Note that the add-grammar command is currently in beta and is subject to change in future releases.
Typical models used in the Engine recognize generic conversational English. The add-grammar command allows you to modify an existing model to recognize a highly structured grammar in addition to the default conversational grammar. A good example is US telephone numbers. The phone number (415) 721-0127 would rarely be spoken as "forty one fifty seven two one hundred one twenty seven", but may well be spoken as "four one five seven twenty one oh one two seven". Exploiting this constrained structure can improve accuracy, especially if the audio quality is poor.
The add-grammar command is similar to the existing custom grammar feature. The difference is that a custom phone number grammar would recognize only phone numbers, whereas add-grammar allows recognition of phone numbers interspersed with conversation. For example, if the audio is just "four one five seven two one oh one two seven", then a custom grammar is appropriate, whereas if the audio is "you can reach me at four one five seven two one oh one two seven thanks", then add-grammar is appropriate.
Also, a custom grammar is sent along with a recognition request and does not modify the model, whereas add-grammar is a separate command that modifies a model. So, like add-words, add-grammar affects all recognitions using that model after the add-grammar command completes (until the Engine is terminated or the model is reloaded).
To add a grammar to a model in an Engine, send the add-grammar command with a "words" option containing the pronunciations of all the words in the grammar, a "grammar" option with the description of the grammar, and an "asr-model" option naming the model to be modified. The "words" and "grammar" options are documented for custom grammars, and we recommend reading through the examples on that page to get a better idea of how a grammar is constructed. Note that for add-grammar, the grammar's "type" option must be "graph".
The following example is an excerpt of the command that would add a phone number grammar to the mod9/en-US_phone-smaller model of an Engine running locally on port 9900. This example is not meant to stand alone; it only demonstrates the format and structure of the add-grammar command.
echo '{"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller",
"words": [
{ "word": "eighty", "phones": "EY T IY" },
{ "word": "four", "phones": "F AO R" },
...
],
"grammar": {
"type": "graph",
"start": 0,
"exits": [ "1", "12", ... ],
"arcs": [
{ "from": "0", "to": "1", "word": "one" },
{ "from": "10", "to": "20", "word": "nineteen"},
...
]
}
}' | jq -c | nc localhost 9900
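The excerpt above elides most words and arcs. As a complete but deliberately small illustration, the bash sketch below generates an add-grammar request for a hypothetical grammar that accepts exactly ten digits spoken one at a time (such as "four one five seven two one oh one two seven") but not forms like "twenty one"; it is not the phone-number-grammar.json used later on this page. Each state number counts the digits heard so far, so every digit word is an arc from state N to state N+1, and only the final state is an exit. The field layout mirrors the excerpt; the function name build_digit_grammar_request and the stress-free CMUdict-style pronunciations are assumptions and may not match every model's phone set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit a complete add-grammar request for a simplified
# grammar that accepts exactly N digits spoken one at a time.
# Assumptions: the field layout mirrors the excerpt above; the pronunciations
# are stress-free CMUdict-style phones and may not match every model.
# No JSON escaping is done, so MODEL must not contain quotes or backslashes.

# Digit words and their pronunciations (parallel arrays, same order).
DIGIT_WORDS=(zero oh one two three four five six seven eight nine)
DIGIT_PHONES=("Z IH R OW" "OW" "W AH N" "T UW" "TH R IY" "F AO R"
              "F AY V" "S IH K S" "S EH V AH N" "EY T" "N AY N")

# Usage: build_digit_grammar_request MODEL [NUM_DIGITS]
# Prints the request as a single line of JSON on stdout.
build_digit_grammar_request() {
  local model="$1" ndigits="${2:-10}" sep="" i state

  printf '{"command": "add-grammar", "asr-model": "%s", "words": [' "$model"
  for i in "${!DIGIT_WORDS[@]}"; do
    printf '%s{"word": "%s", "phones": "%s"}' "$sep" "${DIGIT_WORDS[i]}" "${DIGIT_PHONES[i]}"
    sep=", "
  done

  # State N means "N digits heard so far"; only the final state is an exit.
  printf '], "grammar": {"type": "graph", "start": 0, "exits": ["%d"], "arcs": [' "$ndigits"
  sep=""
  for ((state = 0; state < ndigits; state++)); do
    # Every digit word advances the graph by exactly one state.
    for i in "${!DIGIT_WORDS[@]}"; do
      printf '%s{"from": "%d", "to": "%d", "word": "%s"}' \
        "$sep" "$state" "$((state + 1))" "${DIGIT_WORDS[i]}"
      sep=", "
    done
  done
  printf ']}}\n'
}

# Example (requires an Engine started with mutable models):
#   build_digit_grammar_request mod9/en-US_phone-smaller | nc localhost 9900
```

Keeping the graph a simple chain makes the state/arc structure easy to see; a realistic phone number grammar needs additional arcs for spoken forms like "twenty one".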
If you provide an id string in the add-grammar request, you can later send a drop-grammar command with the same asr-model and id to remove the grammar. Note that providing an id increases memory usage slightly.
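To illustrate the id workflow, here is a bash sketch with two hypothetical helper functions. It assumes, based only on the description above, that "id" is a top-level string option of add-grammar and that drop-grammar accepts the same "asr-model" and "id" options; the exact request shape is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the add/drop workflow described above.
# Assumption: "id" is a top-level string option of add-grammar, and
# drop-grammar takes the same "asr-model" and "id" options.
# No JSON escaping is done, so arguments must not contain quotes or backslashes.

# Top-level options for an add-grammar request tagged with an id; merge them
# with a file that holds the "words" and "grammar" options.
add_grammar_options() {
  local model="$1" id="$2"
  printf '{"command": "add-grammar", "asr-model": "%s", "id": "%s"}\n' "$model" "$id"
}

# Request that removes the grammar added earlier under the same model and id.
drop_grammar_request() {
  local model="$1" id="$2"
  printf '{"command": "drop-grammar", "asr-model": "%s", "id": "%s"}\n' "$model" "$id"
}

# Example (requires jq and a running Engine with mutable models):
#   jq -sc '.[0] + .[1]' phone-number-grammar.json \
#     <(add_grammar_options mod9/en-US_phone-smaller phone) | nc localhost 9900
#   drop_grammar_request mod9/en-US_phone-smaller phone | nc localhost 9900
```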
The following example shows how to add a complete phone number grammar to an existing model in an Engine. It uses the same phone number grammar as the one described for custom grammars.
First, download the phone number grammar.
curl -sO https://mod9.io/phone-number-grammar.json
Next, download an audio file of a person saying, "ah yes this is adam and you can reach me at 415 721-0127 thanks".
curl -sO https://mod9.io/voicemail.wav
Since the add-grammar command modifies a model in a running Engine, it's best to start a new Engine for testing. The example below uses the mod9/en-US_phone-smaller model because the regular models are accurate enough that the difference can be hard to see. Also note --models.mutable=true: this option controls whether clients may modify the Engine's models, so that you can protect an Engine against clients modifying its models unexpectedly. It is enabled here so that add-grammar can modify the model.
docker run -d -p 9900:9900 mod9/asr \
engine --models.asr=mod9/en-US_phone-smaller --models.nlp= --models.mutable=true
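Because docker run -d returns as soon as the container starts, the next command may run before the Engine is ready. The bash sketch below defines a hypothetical helper, wait_for_port, that retries a TCP connection about once per second. It assumes the Engine accepts connections on port 9900 only once it is ready; the probe opens and immediately closes a connection without sending a request, which the Engine may treat as an aborted request.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until something accepts TCP connections on
# HOST:PORT, trying roughly once per second for up to ATTEMPTS tries.
# Assumption: the Engine only accepts connections once it is ready.
# Uses bash's /dev/tcp redirection, so it needs bash (not plain sh).
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-60}" tries=0
  # The subshell opens a connection on fd 3 and closes it again on exit.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$attempts" ]; then
      return 1
    fi
    sleep 1
  done
}

# Example (after the docker run command above):
#   wait_for_port localhost 9900 120 || echo "Engine did not start" >&2
```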
Now run recognition on the audio file without the added phone number grammar and notice the many errors (mostly due to the use of the small model).
cat voicemail.wav | nc localhost 9900 | jq -r .transcript
# oh yeah this is that um and you can reach me for one by seventeen wine zero onto so
To add the phone number grammar, use jq to merge the two top-level options that add-grammar needs, "command" and "asr-model", into the contents of phone-number-grammar.json, and pipe the resulting request to the Engine:
jq -sc '.[0] + .[1]' phone-number-grammar.json <(echo '{"command": "add-grammar", "asr-model": "mod9/en-US_phone-smaller"}') | nc localhost 9900
At this point, the mod9/en-US_phone-smaller model has been modified to support US phone numbers. Recognition with this model on this Engine will now handle not just conversational English, but also phone numbers.
cat voicemail.wav | nc localhost 9900 | jq -r .transcript
# oh yeah this is that um and you can reach me four one five seven two one zero one two seven
©2019-2022 Mod9 Technologies (Version 1.9.5)