By default, requests to the Engine use a pre-built ASR model with a very large vocabulary and a generic English grammar. Once loaded, these ASR models can be customized: the Engine supports commands for adding out-of-vocabulary words and for updating the bias weights of words already in the vocabulary.
The add-words command allows a client to add new words to a loaded ASR model.
For example, you could add the words "xcommand", "mod9", and "janin" to the "en_video" model with the following command:
echo '{
"command": "add-words",
"asr-model": "en_video",
"words": [
{ "word": "xcommand" },
{ "word": "mod9", "soundslike": "mod nine" },
{ "word": "janin", "phones": "JH AE N IH N"}
]
}' | jq -c . | nc $HOST $PORT
The words field consists of an array of lexical entries. Each lexical entry gives its spelling in the word field (e.g. "janin" in the above). The spelling can be almost any string; it is what the Engine prints when it recognizes that word. If you don't provide any other options, the pronunciation will be generated automatically. See Custom Pronunciations for more details on how to specify how each word is pronounced using either the "soundslike" or "phones" option.
The add-words options include:
Option | Type | Default | Description |
---|---|---|---|
asr-model | string | N/A | The ASR model to modify. |
cost | number | 5.0 | Controls the frequency for all words in the words list. The lower the cost, the more frequently the words appear. |
id | string | None | (Optional) If provided, you can use this string to later remove the added words using drop-words. |
words | array of objects | N/A | A list of words along with their pronunciations. |
Each of the objects in the words array has the following options:
Option | Type | Default | Description |
---|---|---|---|
cost | number | 0.0 | An adjustment for the likelihood of this pronunciation. |
phones | string | N/A | A space-delimited phonetic sequence representing the pronunciation of the word. |
soundslike | string | N/A | English-like "sounds out" pronunciation. |
word | string | N/A | The spelling of the word. |
See Custom Pronunciations for more information on the "soundslike" and "phones" options.
If a word has multiple pronunciations, they can all be added in a single add-words request by including multiple lexical entries, each using the same word but a different phones entry for each variant. The relative likelihoods of these can be tuned with their respective cost values.
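As an illustration of the multiple-pronunciation case, here is a minimal Python sketch that assembles such a request as one line of JSON. The helper name build_add_words is hypothetical, the second phone sequence and the per-entry cost are made-up values for illustration, and the sketch only builds the request; sending it to the Engine is left out.

```python
import json


def build_add_words(asr_model, entries, cost=None, request_id=None):
    """Build an add-words request as a single line of JSON.

    `entries` is a list of dicts with a "word" key and optional
    "phones", "soundslike", or "cost" keys, as in the tables above.
    This helper is a hypothetical convenience, not part of the Engine.
    """
    for entry in entries:
        if "word" not in entry:
            raise ValueError("every lexical entry needs a 'word' field")
    request = {"command": "add-words", "asr-model": asr_model, "words": entries}
    if cost is not None:
        request["cost"] = cost
    if request_id is not None:
        request["id"] = request_id
    return json.dumps(request)


# Two pronunciation variants of the same word. The second phone string and
# its per-entry cost are illustrative assumptions, not tuned values.
line = build_add_words(
    "en_video",
    [
        {"word": "janin", "phones": "JH AE N IH N"},
        {"word": "janin", "phones": "Y AA N IH N", "cost": 1.0},
    ],
)
print(line)
```

The printed line can be piped to the Engine in the same way as the echo examples on this page.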
The response to the add-words request can have the following fields:
Field | Type | Description |
---|---|---|
added | int | The number of unique out-of-vocabulary words added to the ASR model through this request. |
updated | int | The number of unique in-vocabulary words updated in the ASR model through this request. |
When the Engine receives an add-words request, it begins recognizing the added words immediately, including in any recognitions already in progress that use the modified ASR model.
The additions will persist until the Engine is shut down. If you want the Engine to recognize the new words in a future session, you must send the add-words command to the new Engine instance.
Note that the add-words command adds the words regardless of whether they're already in the vocabulary. Adding a word that is already in the vocabulary may increase the likelihood of that word being output (but will never decrease it).
Words already in the vocabulary have been tuned such that they should appear with the correct frequency. For words you add to the vocabulary with add-words, you can specify a cost term that controls the frequency for all words in the words list. If the cost is too low, the new words will appear where they shouldn't, and memory use and run time will increase. If the cost is too high, the new words will fail to appear when they should. The cost is loosely related to the negative of the log of the likelihood of the added words and must be nonnegative. A cost of 0 will almost certainly cause the words to appear much more frequently than desired. Costs above around 14 are high enough that the new words will never appear. The default is 5.
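To get a feel for the scale of the cost values, the sketch below converts a cost into a rough likelihood. It assumes the relationship is exactly cost = -ln(likelihood) with a natural logarithm; the text only says the two are loosely related, so treat the numbers as an order-of-magnitude guide rather than the Engine's actual computation. The function name implied_likelihood is hypothetical.

```python
import math


def implied_likelihood(cost):
    """Rough likelihood implied by a cost, ASSUMING cost = -ln(likelihood).

    The base of the logarithm and the exact relationship are assumptions.
    """
    if cost < 0:
        raise ValueError("cost must be nonnegative")
    return math.exp(-cost)


for cost in (0, 5, 14):
    print(f"cost {cost:>2}: likelihood ~ {implied_likelihood(cost):.2e}")
```

Under this assumption, the default cost of 5 corresponds to a likelihood of roughly 0.0067, and a cost of 14 to less than one in a million, which is in line with the statement that such words effectively never appear.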
If the same word/pronunciation is added multiple times, the version with the lowest cost takes precedence.
You cannot adjust the cost once it has been set. Instead, you must remove the previous words with drop-words (see below) and add them back with the new cost.
We recommend setting the cost high enough that the added word does not slow down processing, but low enough that the added word shows up in the alternatives lists. The accuracy of the speech recognition can then be tuned further by adjusting the bias of the word using bias-words.
See the Tuning section for more information.
Each call to add-words increases the memory used by the Engine by at least 50 MB, regardless of how many words the call adds. You can save memory by including as many words as possible per call, which reduces the number of calls to add-words.
Passing an id (which allows the words to be removed later with a call to drop-words) increases the memory usage slightly. Calling drop-words should free all the memory used by the corresponding call to add-words with the same id.
Words that were added by an add-words call that included an id may later be deleted by calling drop-words with the same id. For example:
echo '{
"command": "add-words",
"asr-model": "en_video",
"id": "proper names",
"words": [
{ "word": "janin" },
{ "word": "stiggs" },
{ "word": "vanceson" }
]
}' | jq -c . | nc $HOST $PORT
echo '{
"command": "drop-words",
"asr-model": "en_video",
"id": "proper names"
}' | jq -c . | nc $HOST $PORT
If you do not provide an id with add-words, there is no way short of unloading and reloading the model to remove the added words.
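Because removal depends on the id, it can help to generate the add-words and drop-words requests together so that they always share the same id, and to put all the words into a single add-words call to limit memory use. The following is a minimal sketch; the helper name add_and_drop_requests is hypothetical, and the code only builds the two JSON lines rather than sending them.

```python
import json


def add_and_drop_requests(asr_model, request_id, words):
    """Return (add_line, drop_line): matching add-words and drop-words requests.

    All words go into ONE add-words request, since each call has a fixed
    memory overhead. Pronunciations are left to be generated automatically.
    """
    add = {
        "command": "add-words",
        "asr-model": asr_model,
        "id": request_id,
        "words": [{"word": word} for word in words],
    }
    drop = {"command": "drop-words", "asr-model": asr_model, "id": request_id}
    return json.dumps(add), json.dumps(drop)


add_line, drop_line = add_and_drop_requests(
    "en_video", "proper names", ["janin", "stiggs", "vanceson"]
)
print(add_line)   # send this first
print(drop_line)  # send this later to remove the same words
```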
Words that are in an ASR model's vocabulary, whether in the base model or added in an add-words request, can be re-weighted or biased with the bias-words command.
The bias-words command will bias the word everywhere in the model, changing the likelihood of the word being output by the recognizer in every context.
The following command will bias the model so that "basketball" and "hoop" are output more often, while "alien" is output less often.
echo '{
"command": "bias-words",
"asr-model": "en_video",
"words": [
{"word":"basketball", "bias":2.3},
{"word":"hoop", "bias":3},
{"word":"alien", "bias":-2}
]
}' | jq -c . | nc $HOST $PORT
The words option takes an array of objects, each specifying a word entry and a bias amount.
Each call to bias-words sets the bias of the given word rather than adding to the previous value, i.e. the bias-words command is idempotent.
A bias can take any value. A positive bias will cause the biased word to be output more frequently by the Engine, and a negative bias will cause the biased word to be output less frequently. By default, all words in a model start with a bias of 0.
Option | Type | Default | Description |
---|---|---|---|
words | array of objects | N/A | A list of words along with biases. |
asr-model | string | N/A | The ASR model to modify. |
Each of the objects in the words array has the following options:
Option | Type | Default | Description |
---|---|---|---|
word | string | N/A | The spelling of the word. |
bias | number | N/A | The new bias for the given word. |
Similar to add-words requests, once the Engine completes a bias-words request, the new biases will immediately apply to all requests using the modified model.
To undo or reset a bias for a word, send a bias-words command to set the bias to the default of 0.
Stopping and restarting the Engine will cause all biases to be reset to 0.
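Since bias-words sets a value rather than accumulating it, a reset is just another bias-words request with a bias of 0. The sketch below builds both kinds of request from a simple mapping; the helper names build_bias_words and build_bias_reset are hypothetical, and the code only produces the JSON lines.

```python
import json


def build_bias_words(asr_model, biases):
    """Build a bias-words request from a {word: bias} mapping."""
    words = [{"word": word, "bias": bias} for word, bias in biases.items()]
    return json.dumps(
        {"command": "bias-words", "asr-model": asr_model, "words": words}
    )


def build_bias_reset(asr_model, words):
    """Build a bias-words request that sets each word back to the default of 0."""
    return build_bias_words(asr_model, {word: 0 for word in words})


print(build_bias_words("en_video", {"basketball": 2.3, "hoop": 3, "alien": -2}))
print(build_bias_reset("en_video", ["basketball", "hoop", "alien"]))
```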
The lookup-word command allows a client to query an ASR model for information about a given word.
echo '{"command":"lookup-word", "word":"euphoria", "asr-model":"en_video"}' | nc $HOST $PORT
Option | Type | Default | Description |
---|---|---|---|
word | string | N/A | The word to look up. |
asr-model | string | N/A | The ASR model to query. |
The Engine responds with fields indicating whether the queried word is in the ASR model's vocabulary. If the word is in the model's vocabulary, the response may have additional fields indicating other information.
{
"bias": 0,
"found": true,
"asr_model": "en_video",
"status": "completed",
"word": "euphoria"
}
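A client might use the lookup-word response to decide whether a word needs add-words or can be tuned directly with bias-words. The following sketch relies only on the found, bias, and word fields shown in the sample response above; it assumes that a word missing from the vocabulary is reported with found set to false, which is not shown on this page. The function name summarize_lookup is hypothetical.

```python
import json


def summarize_lookup(response_line):
    """Summarize a lookup-word response and suggest a follow-up command.

    ASSUMPTION: an out-of-vocabulary word comes back with "found": false
    (or without a "found" field at all).
    """
    response = json.loads(response_line)
    found = bool(response.get("found", False))
    return {
        "word": response.get("word"),
        "in_vocabulary": found,
        "current_bias": response.get("bias") if found else None,
        "suggested_command": "bias-words" if found else "add-words",
    }


sample = (
    '{"bias": 0, "found": true, "asr_model": "en_video", '
    '"status": "completed", "word": "euphoria"}'
)
print(summarize_lookup(sample))
```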
Unfortunately, it is difficult to know a priori what the cost and bias parameters for a given set of words should be, as they depend not only on how common the words are, but also on details of how the base model was constructed and optimized. Finding good values of cost and bias may take some trial and error.
Words that are not in an ASR model's vocabulary should be added to the model with an add-words request.
We recommend adding the words with a relatively high cost so that the newly added words do not show up too often as false positives, and so that the added words do not noticeably slow down the speech recognizer.
The cost should still be low enough that the added words show up in the phrase-alternatives when they are spoken in the audio.
Once the added word shows up reliably in the phrase-alternatives, further tuning should be done with bias-words.
To set the cost of an added word to a higher value, you must first remove the word, either with drop-words (if it was added with an id) or by restarting the Engine, and then call add-words with the new cost.
To tune the bias value, send a recognition request with audio containing the specified words, and with phrase alternatives and biases requested. The bias scores reported for each alternative can then be used to choose the value for the bias-words command. The idea is to set the bias value just above the bias score reported for the correct phrase, so that the correct phrase overtakes the competing alternatives.
For example, suppose the file novavax.wav contains the audio "Trials for Novavax are under way", but "Novavax" is not showing up in the transcript. To correct this, we run the audio through the Engine with "phrase-alternatives": 8 and "phrase-alternatives-bias":true.
(echo '{"phrase-alternatives": 8, "phrase-alternatives-bias":true}'; cat novavax.wav) | nc $HOST $PORT | jq .
Find the phrase where Novavax appears. Here's an example:
...
"alternatives": [
{
"bias": {
"am": 0,
"lm": 0
},
"phrase": "nova vacs"
},
{
"bias": {
"am": 0,
"lm": 2.056
},
"phrase": "Novavax"
},
...
In this case, the best scoring phrase is "nova vacs", which has a total bias of 0.0. The sum for the next best, "Novavax", is 0 + 2.056 = 2.056. This indicates that we can make "Novavax" the best scoring phrase by setting its bias to a value greater than 2.056.
echo '{"command":"bias-words", "asr-model":"en_video", "words":[{"word":"Novavax", "bias":2.1}]}' | nc $HOST $PORT
If we send the previous recognition request again...
(echo '{"phrase-alternatives": 8, "phrase-alternatives-bias":true}'; cat novavax.wav) | nc $HOST $PORT | jq .
...
"alternatives": [
{
"bias": {
"am": 0,
"lm": 0
},
"phrase": "Novavax"
},
{
"bias": {
"am": 0,
"lm": 0.044
},
"phrase": "nova vacs"
},
...
We see that "Novavax" is now the best scoring phrase alternative, as desired.
Note that if the bias is set too high, the biased word might start showing up unexpectedly in undesired places.
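The arithmetic in the Novavax example can be sketched in a few lines of Python. The code sums the am and lm components of the reported bias for the target phrase and adds a small margin. The function names and the default margin of 0.05 are hypothetical, and the sketch assumes that the word's current bias is 0 and that the target phrase is the single word being biased.

```python
def total_bias(alternative):
    """Sum the "am" and "lm" bias components reported for one alternative."""
    bias = alternative["bias"]
    return bias["am"] + bias["lm"]


def suggest_bias(alternatives, target_phrase, margin=0.05):
    """Suggest a bias slightly above the bias score reported for the target.

    ASSUMPTIONS: the target word currently has a bias of 0, and `margin`
    is an arbitrary small safety margin, not an Engine parameter.
    """
    for alternative in alternatives:
        if alternative["phrase"] == target_phrase:
            return round(total_bias(alternative) + margin, 3)
    raise LookupError(f"{target_phrase!r} not found in alternatives")


alternatives = [
    {"bias": {"am": 0, "lm": 0}, "phrase": "nova vacs"},
    {"bias": {"am": 0, "lm": 2.056}, "phrase": "Novavax"},
]
print(suggest_bias(alternatives, "Novavax"))
```

Keeping the margin small follows the note above: the bias should be just large enough to win, since a bias that is too high can make the word show up where it was not spoken.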
©2019-2022 Mod9 Technologies (Version 1.9.5)