Creating the Customized Data Model for Directory Assistance Automation
In the initial planning stages of a Directory Assistance System installation, the list of localities to be included in the locality database must be specified, along with other parameters. This list is then processed by a purpose-built computer program to create the phonetic dictionary. The dictionary describes the pronunciations of the entries in the locality database, the state database, and the other listings and phrases the system must recognize. The language model describes the words and phrases to be understood and can be customized for each geographic region. One distinctive feature of the FGC Speech Recognition System is its ability to generate multiple language models within the same system, so that it can recognize different languages or dialects when required. The acoustic model is a statistical representation of how phonemes (the basic units of pronunciation) are pronounced in the context of the application. It is customized through a training process that uses speech samples recorded by the FGC/SRS System during the installation phase and thereafter on a regular, scheduled basis.
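The following minimal Python sketch shows what the phonetic dictionary and the per-region language models do. The lexicon entries, the ARPAbet-style phoneme symbols, the region names, and every function name are illustrative assumptions, not FGC's format. A production system would likely also need a way to supply pronunciations for names missing from the lexicon, and its language models would be far richer than the uniform phrase list used here.

```python
from dataclasses import dataclass, field

# Hypothetical pronunciation lexicon (ARPAbet-style symbols, illustrative only).
LEXICON = {
    "dallas": ["D", "AE", "L", "AH", "S"],
    "austin": ["AO", "S", "T", "AH", "N"],
    "texas": ["T", "EH", "K", "S", "AH", "S"],
}


@dataclass
class LanguageModel:
    region: str
    # Phrase -> probability; this sketch uses a uniform distribution.
    phrases: dict[str, float] = field(default_factory=dict)


def build_phonetic_dictionary(localities, lexicon):
    """Map each locality to its phoneme sequence; also return names with no entry."""
    dictionary, missing = {}, []
    for name in localities:
        key = name.lower()
        if key in lexicon:
            dictionary[name] = lexicon[key]
        else:
            missing.append(name)  # would need a pronunciation supplied by hand
    return dictionary, missing


def build_language_model(region, phrases):
    """Uniform model over the phrases accepted in one (hypothetical) region."""
    p = 1.0 / len(phrases)
    return LanguageModel(region, {ph: p for ph in phrases})


# Several language models can coexist in one system, keyed by region or dialect.
models = {
    "north-texas": build_language_model("north-texas", ["Dallas", "Fort Worth"]),
    "central-texas": build_language_model("central-texas", ["Austin", "Waco"]),
}
```

Keeping the models in a dictionary keyed by region mirrors the idea of several language models living side by side in one system. The caller simply selects the model that matches the incoming call.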
Training the Model
It may help to compare the acoustic model to a robot that needs to be trained. Training requires thousands of speech samples of actual customers answering the question, "What city, please?" FGC analysts then review these samples with software designed by FGC engineers for this purpose and record exactly what each caller said. Training the model is a complex process that requires a suite of specialized computer programs and the particular skills of a speech systems analyst.
The Speech Analysis and Sample Validation Workstation
To customize the acoustic model and train it to recognize responses in different languages or geographic regions, the FGC/SRS System collects more than 500,000 speech samples during the preparation and installation phase of a new project. Using the SRS Speech Analysis Workstation Software, FGC-trained speech analysts listen to the recorded samples and "tag" each one with various indicators and a machine-readable transcription of the words spoken, e.g., "Dallas."
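A tagged sample can be thought of as a small record that pairs the audio with what the analyst heard. The sketch below assumes a hypothetical indicator set (`CLEAN`, `NOISY`, `TRUNCATED`, `NON_SPEECH`) and a plain-text transcription. The actual indicators used by the SRS Speech Analysis Workstation Software are not described in this document, and the function names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum


class Indicator(Enum):
    # Hypothetical tag set; the real workstation's indicators may differ.
    CLEAN = "clean"
    NOISY = "noisy"
    TRUNCATED = "truncated"
    NON_SPEECH = "non_speech"


@dataclass
class TaggedSample:
    audio_file: str
    transcription: str  # words actually spoken, e.g. "dallas"
    indicators: set[Indicator] = field(default_factory=set)


def tag_sample(audio_file, heard_text, indicators=()):
    """Build a tagged record, normalizing case and whitespace in the transcription."""
    text = " ".join(heard_text.lower().split())
    tags = {Indicator(i) for i in indicators}  # look up each tag by its value
    if not text and Indicator.NON_SPEECH not in tags:
        raise ValueError("an empty transcription must be tagged non_speech")
    return TaggedSample(audio_file, text, tags)


def usable_for_training(sample):
    """Keep samples with words in them and no disqualifying tags."""
    excluded = {Indicator.TRUNCATED, Indicator.NON_SPEECH}
    return bool(sample.transcription) and not (sample.indicators & excluded)
```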
Training the System to Build a Specific Acoustic Model
To deliver the best performance for each installation, a new customized model must be created from speech samples that are as similar as possible to the utterances the final installed system will recognize. Tone, emphasis, range of accents, the ratio of male to female speakers, the age distribution, and other factors all shape the acoustic model and therefore the final performance. This process relies on specialized computer programs and is called "training" the system, because the system "listens" to hundreds of thousands of speech samples to improve its accuracy in recognizing words spoken in that region.
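The sketch below is a drastically simplified illustration of what "training" produces. It fits one Gaussian (mean and variance) per phoneme to labeled one-dimensional feature values, then labels a new value with the most likely phoneme. It is an assumption for explanation only. Real acoustic models typically use multidimensional features and context-dependent statistical models such as hidden Markov models, and nothing here reflects FGC's actual training programs.

```python
import math
from collections import defaultdict


def train_acoustic_model(frames):
    """frames: iterable of (phoneme, feature_value). Returns phoneme -> (mean, var)."""
    groups = defaultdict(list)
    for phoneme, x in frames:
        groups[phoneme].append(x)
    model = {}
    for phoneme, xs in groups.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[phoneme] = (mean, max(var, 1e-6))  # floor avoids zero variance
    return model


def log_likelihood(x, mean, var):
    """Log density of x under a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def classify(model, x):
    """Return the phoneme whose Gaussian gives x the highest likelihood."""
    return max(model, key=lambda p: log_likelihood(x, *model[p]))
```

The means and variances come directly from the training frames. Samples from the target region therefore pull the model toward how that region's callers actually speak, which is why the training samples should match the final installation as closely as possible.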
1. Data Collection Process

The system automatically records and collects actual customer utterances to build several statistically valid data sets. These sets are used both to create the customer-specific acoustic model and to monitor speech recognition accuracy and performance. The collection system maintains separate databases for city names, state names, frequently requested listings, and other commonly used phrases.
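The sketch below uses Python's built-in `sqlite3` module to keep separate per-category stores and set aside a repeatable test set. The category names, the table layout, and the 10% hash-based split are hypothetical choices. Hashing the sample ID approximates a random draw but always places the same sample in the same subset, so accuracy measurements stay comparable from run to run.

```python
import hashlib
import sqlite3

# Hypothetical categories, one table each (city names, state names, listings, phrases).
CATEGORIES = ("city", "state", "listing", "phrase")


def open_store():
    """Create an in-memory database with one table per utterance category."""
    conn = sqlite3.connect(":memory:")
    for cat in CATEGORIES:
        conn.execute(
            f"CREATE TABLE {cat} (sample_id TEXT PRIMARY KEY, audio_file TEXT, subset TEXT)"
        )
    return conn


def assign_subset(sample_id, test_percent=10):
    """Deterministic split: the same sample ID always lands in the same subset."""
    bucket = int(hashlib.sha256(sample_id.encode()).hexdigest(), 16) % 100
    return "test" if bucket < test_percent else "train"


def store_utterance(conn, category, sample_id, audio_file):
    """Insert one utterance into its category's table, tagged train or test."""
    if category not in CATEGORIES:  # also keeps the table name safe to interpolate
        raise ValueError(f"unknown category: {category}")
    conn.execute(
        f"INSERT INTO {category} VALUES (?, ?, ?)",
        (sample_id, audio_file, assign_subset(sample_id)),
    )
```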
2. Log Files for Off-line Analysis

The FGC/SRS System includes a logging facility that stores speech samples together with their recognition results. These logs can serve as input for tuning, benchmarking, and other off-line analysis. A Performance Analysis Report can be generated from the log files of actual customer responses.
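A Performance Analysis Report could start from something as simple as per-category accuracy computed over logged results. The sketch assumes a hypothetical CSV log with `category`, `transcription` (the analyst-verified words), and `recognized` (the system's result) columns. The real FGC/SRS log format is not specified in this document.

```python
import csv
import io
from collections import Counter


def performance_report(log_text):
    """Return {category: (correct, total, accuracy)} from a CSV recognition log."""
    correct, total = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        cat = row["category"]
        total[cat] += 1
        # A result counts as correct when it matches the verified transcription.
        if row["recognized"].strip().lower() == row["transcription"].strip().lower():
            correct[cat] += 1
    return {cat: (correct[cat], total[cat], correct[cat] / total[cat]) for cat in total}


# Hypothetical log excerpt.
LOG = """sample_id,category,transcription,recognized
1,city,dallas,dallas
2,city,austin,boston
3,city,waco,waco
4,state,texas,texas
"""
```

On this excerpt, the city category scores 2 correct out of 3 and the state category 1 out of 1.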