Creating the Customized Data Model for Directory Assistance Automation

In the initial planning stages of a Directory Assistance System installation, the list of localities to be included in the locality database must be specified, along with other parameters. This list is then processed by a special-purpose computer program to create the phonetic dictionary, which describes the pronunciations of the entries in the locality database, the state database, and the other listings and phrases the system is to recognize. The language model describes the words and phrases the system must understand and can be customized for each geographic region. One distinctive characteristic of the FGC Speech Recognition System is its ability to generate multiple language models within the same system, so that it can recognize different languages or dialects when required. The acoustic model is a statistical representation of how phonemes (the basic units of pronunciation) are pronounced in the context of the application. It is customized through a training process that uses recorded speech samples collected by the FGC/SRS System during the installation phase and thereafter on a periodic schedule.
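
As a rough illustration of what a phonetic dictionary holds, it can be modeled as a mapping from each listing to one or more phoneme sequences. The sketch below is hypothetical and does not reflect FGC's program or file format: the `PhoneticDictionary` class, the ARPAbet-style phoneme symbols, and the sample city entries are all assumptions chosen for illustration. It shows how alternative pronunciations of the same locality can share a single entry.

```python
from collections import defaultdict


class PhoneticDictionary:
    """Hypothetical phonetic dictionary: listing -> list of pronunciations.

    Each pronunciation is a tuple of ARPAbet-style phoneme symbols.
    Listings are stored case-insensitively (upper-cased keys).
    """

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, listing: str, phonemes: str) -> None:
        """Register one pronunciation (space-separated phonemes) for a listing."""
        pron = tuple(phonemes.split())
        key = listing.upper()
        if pron not in self._entries[key]:  # ignore exact duplicates
            self._entries[key].append(pron)

    def pronunciations(self, listing: str) -> list:
        """Return all known pronunciations, or an empty list if the listing is unknown."""
        # .get() does not create a new key, unlike indexing a defaultdict.
        return list(self._entries.get(listing.upper(), []))

    def __len__(self) -> int:
        return len(self._entries)


# Illustrative entries only; a real dictionary would be built from the
# locality list specified during planning.
lexicon = PhoneticDictionary()
lexicon.add("Dallas", "D AE L AH S")
lexicon.add("Houston", "HH Y UW S T AH N")
lexicon.add("Houston", "Y UW S T AH N")  # alternative pronunciation without the initial /h/
lexicon.add("Austin", "AO S T AH N")

print(lexicon.pronunciations("houston"))
```

Allowing several pronunciations per entry is one simple way to cover regional variants without duplicating the listing itself.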

Training the Model

It may help to think of the acoustic model as a robot that needs to be trained. To train the model, we need to collect thousands of speech samples of actual customers responding to the question, "What city, please?" FGC analysts then review these samples, using a program designed by FGC engineers for this purpose, and accurately record what each caller said. Training the model is a complex process that requires a set of specialized computer programs and particular skills on the part of the speech systems analyst.

The Speech Analysis and Sample Validation Workstation

To customize the acoustic model and train it to recognize responses in different languages or geographic regions, the FGC/SRS system collects over 500,000 speech samples during the preparation and installation phase of a new project. Using the SRS Speech Analysis Workstation Software, trained FGC speech analysts listen to the recorded samples and "tag" each one with various indicators and with a digital representation of the words that were spoken, e.g., "Dallas."

Training the System to Build a Specific Acoustic Model

To deliver the best performance for each installation, a new customized model must be created from speech samples that are as similar as possible to the utterances the final installed system will recognize. Tone, emphasis, range of accents, gender ratio, age ratio, and other factors all shape the acoustic model and therefore final performance. This process relies on specialized computer programs and is referred to as "training" the system, because the system "listens" to hundreds of thousands of speech samples in order to improve its accuracy in recognizing spoken words from the region.
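
The point about gender and age ratios can be made concrete. One common approach, assumed here rather than taken from FGC's process, is stratified sampling: training samples are drawn so that each speaker group appears in the same proportion as in the target region's caller population. The `select_balanced` function, the group labels, and the target shares below are hypothetical.

```python
import random
from collections import defaultdict


def select_balanced(samples, group_of, target_shares, total, seed=0):
    """Draw about `total` samples so each group's share matches `target_shares`.

    samples:       list of sample records (any type)
    group_of:      function mapping a sample to its group label
    target_shares: dict of group label -> desired fraction (should sum to 1.0)

    Quotas are rounded per group, so the result can differ from `total`
    by a sample or two. Raises ValueError if a group has too few samples.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    by_group = defaultdict(list)
    for s in samples:
        by_group[group_of(s)].append(s)

    selected = []
    for group, share in target_shares.items():
        quota = round(share * total)
        pool = by_group.get(group, [])
        if len(pool) < quota:
            raise ValueError(f"only {len(pool)} samples for {group!r}, need {quota}")
        selected.extend(rng.sample(pool, quota))  # sample without replacement
    rng.shuffle(selected)
    return selected


# Hypothetical pool of (sample_id, speaker_group) pairs that over-represents
# one group; the selection rebalances it to a 50/50 split.
pool = [(i, "female") for i in range(600)] + [(i, "male") for i in range(600, 1000)]
chosen = select_balanced(pool, group_of=lambda s: s[1],
                         target_shares={"female": 0.5, "male": 0.5}, total=400)
```

The same function extends to combined strata such as ("female", "18-34") by returning a tuple from `group_of`.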

1. Data Collection Process

The system automatically records and collects actual customer utterances in order to build several statistically valid test sets, which are used to create the customer-specific acoustic model and to monitor speech recognition accuracy and performance. The collection system maintains separate databases for city names, state names, frequently requested listings, and other commonly used phrases.
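
The text does not define what makes a test set "statistically valid." One standard criterion, assumed here for illustration, is that the set be large enough to estimate recognition accuracy within a chosen margin of error E. Using the normal approximation to the binomial, the required number of utterances is n = z^2 * p * (1 - p) / E^2, where p is the expected accuracy and z = 1.96 for roughly 95% confidence. The function name below is hypothetical.

```python
import math


def required_test_set_size(expected_accuracy: float, margin: float, z: float = 1.96) -> int:
    """Number of utterances needed to estimate accuracy within +/- margin.

    Normal approximation to the binomial: n = z^2 * p * (1 - p) / margin^2.
    z = 1.96 corresponds to roughly 95% confidence; expected_accuracy (p) is a
    planning guess, and p = 0.5 gives the most conservative (largest) size.
    """
    if not 0.0 <= expected_accuracy <= 1.0:
        raise ValueError("expected_accuracy must be between 0 and 1")
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be between 0 and 1")
    p = expected_accuracy
    raw = z * z * p * (1.0 - p) / (margin * margin)
    # Round off floating-point noise before taking the ceiling.
    return math.ceil(round(raw, 6))


# Worst case (p = 0.5) versus an expected accuracy of 90%, both at +/- 1 point.
print(required_test_set_size(0.5, 0.01))  # 9604
print(required_test_set_size(0.9, 0.01))  # 3458
```

Because each test set is sized independently, tracking several categories (city names, state names, listings) multiplies the number of utterances that must be collected and tagged.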

2. Log Files for Off-line Analysis

The FGC/SRS System includes a logging facility that stores speech samples together with their recognition results; these logs can be used as input to tuning, benchmarking, and other types of off-line analysis. A Performance Analysis Report can be generated from logged files containing actual customer responses.
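
As a sketch of the kind of summary such a report might contain, the function below compares what the recognizer returned with what an analyst tagged for the same sample. The CSV log layout (`sample_id`, `recognized`, `transcribed`), the case-insensitive comparison, and the `accuracy_report` name are assumptions for illustration, not the FGC/SRS log format.

```python
import csv
import io
from collections import Counter


def accuracy_report(log_text: str) -> dict:
    """Summarize recognition accuracy from a hypothetical CSV log.

    Columns: sample_id, recognized (system output, empty if the utterance
    was rejected), transcribed (the analyst's tag). Accuracy counts
    rejections as not correct.
    """
    total = correct = rejected = 0
    confusions = Counter()  # (transcribed, recognized) pairs that disagree
    for row in csv.DictReader(io.StringIO(log_text)):
        total += 1
        recognized = row["recognized"].strip().lower()
        truth = row["transcribed"].strip().lower()
        if not recognized:
            rejected += 1
        elif recognized == truth:
            correct += 1
        else:
            confusions[(truth, recognized)] += 1
    return {
        "total": total,
        "correct": correct,
        "rejected": rejected,
        "accuracy": correct / total if total else 0.0,
        "top_confusions": confusions.most_common(3),
    }


log = """sample_id,recognized,transcribed
1,Dallas,Dallas
2,Austin,Houston
3,,Waco
4,Houston,Houston
"""
report = accuracy_report(log)
print(report["accuracy"])  # 0.5
```

Listing the most frequent confusions alongside the overall rate points the analyst at the specific localities whose pronunciations or training data may need attention.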



For information on products and services, contact FGC at
1-800-7070-FGC (1-800-707-0342)
or email info@fifthgen.com

Copyright ©2000-2007 Fifth Generation Computer Corporation.
All rights reserved.