University of Wisconsin-Milwaukee


Center for Technology Innovation

 

Program Detail


 

Center for Technology Innovation
UW-Milwaukee
PO Box 742
Milwaukee, WI 53201
Lubar Hall N334

Phone: 414-229-3939
Fax: 414-229-4477

Please direct questions and comments to:
daveh@uwm.edu

Last updated July 20, 2007


Speech Recognition: Primary Human-Computer Interface?

featuring a four-member team from the IBM TJ Watson Research Center, Yorktown Heights, NY

Friday, October 31, 2003
8:30 AM - 4:15 PM
Breakfast and check-in at 8 AM
UWM Lubar School of Business
Lubar Hall, Room N146




A full-day seminar presented by a team from the IBM TJ Watson Research Center, Yorktown Heights, NY. This workshop is being coordinated by Dr. Paul Ambrose, Assistant Professor of Management Information Systems at the School of Business Administration, University of Wisconsin-Milwaukee.

Overview

This workshop will be presented by a team from the IBM TJ Watson Research Center. Topics will include:

Overview of conversational/multimodal technologies (Picheny) - This will be an overview presentation of the activities in the areas of conversational and multimodal technologies at IBM. The main technology components, including recognition, synthesis, dialog management, and natural language understanding, will be described. Key issues affecting practical deployment of these technologies will be highlighted. In addition, a programming model for the creation of usable conversational interfaces will be described that incorporates both voice and GUI in a seamless fashion.

SuperHuman Speech Recognition (Kingsbury) - The goal of IBM’s Superhuman speech recognition project is to develop a domain-independent speech recognition system that matches or exceeds human performance across the full range of possible application domains, acoustic conditions, and speaker characteristics. This will be an overview of the project, including some tutorial material on speech recognition technology, a comparison of human and machine performance on various types of speech, and a discussion of the advances required to achieve "superhuman" accuracy on realistic material.

Text-to-Speech Research (Eide) - The main components of current state-of-the-art text-to-speech systems, including the IBM system, will be described. The evolution of synthetic speech systems coincides with the emergence of standards for markup to drive them; specific proposals for extensions to existing markup languages to enable next-generation synthetic speech will be given. Current and future applications of TTS technology will be discussed.

Audio-Visual Automatic Speech Recognition (Potamianos) - Visual speech information from the speaker’s mouth region has been shown to improve the noise robustness of automatic speech recognizers, thus promising to extend their usability in the human-computer interface. We will review the main components of audio-visual automatic speech recognition, namely the visual front end design and the integration of the two speech-informative streams (audio and visual) for improved automatic speech recognition.

About the speakers

Michael Picheny is the Manager of the Speech and Language Algorithms Group and has worked in the speech recognition area since 1981. He received his Ph.D. from MIT, holds over 20 patents, and has been heavily involved in the development of almost all of IBM's recognition systems.

Brian Kingsbury is a research staff member and received his Ph.D. degree from the University of California-Berkeley. His research interests include the development of speech recognition systems that are robust to different acoustic environments and speaking styles, modeling of speech acoustics, and hardware support for machine perception.

Ellen Eide is a research staff member and is currently involved in text-to-speech research, primarily prosody models and search. She received her Ph.D. from MIT.

Gerasimos Potamianos is a research staff member and received his Ph.D. degree from the Johns Hopkins University. His research interests span the areas of signal and image processing, multimedia signal processing, automatic speech recognition, language modeling, and audio-visual speech processing, recognition, and synthesis. He has five patents filed.

Who should attend?

The topic is important because voice will someday soon become the primary input medium for the computer. The seminar is for IS staff members charged with developing future systems and working with user interfaces, and for MIS and Computer Science students interested in learning more about conversational interface technologies and the research and products being developed at IBM.

Coordinator

This technology event is being coordinated by Dr. Paul Ambrose, Assistant Professor of Management Information Systems at the School of Business Administration, University of Wisconsin-Milwaukee.