Several requests in the Engine, including add-words, custom grammars in the recognize request, and add-grammar (currently in beta), take a list of words as an option. For example, you could add the words "xcommand", "mod9", and "janin" to the "en_video" model with the following command:
echo '{
"command": "add-words",
"asr-model": "en_video",
"words": [
{ "word": "xcommand" },
{ "word": "mod9", "soundslike": "mod nine" },
{ "word": "janin", "phones": "JH AE N IH N" }
]
}' | jq -c . | nc $HOST $PORT
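The same request can be sent from a script rather than the shell. The sketch below assumes only what the `jq -c . | nc` pipeline implies: the Engine accepts a single-line JSON request over a plain TCP connection and replies with JSON. The host, port, and helper name are illustrative, not part of the Engine API.

```python
import json
import socket

def send_request(host: str, port: int, request: dict) -> dict:
    # Serialize to a single line, as `jq -c` does in the shell examples.
    payload = json.dumps(request) + "\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload.encode("utf-8"))
        # Assumption: the Engine replies with one line of JSON.
        reply = sock.makefile("r", encoding="utf-8").readline()
    return json.loads(reply)

request = {
    "command": "add-words",
    "asr-model": "en_video",
    "words": [
        {"word": "xcommand"},
        {"word": "mod9", "soundslike": "mod nine"},
        {"word": "janin", "phones": "JH AE N IH N"},
    ],
}
# response = send_request("localhost", 9900, request)  # host/port illustrative
```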
The "word" field contains the spelling of the word, and is what the Engine outputs. It can be any text, including numerals or Unicode (in UTF-8 encoding).
Note that for "xcommand", no additional information is provided. In this case, the Engine will compute pronunciations automatically. In general, the Engine does a good job of computing pronunciations automatically, but accuracy can be improved by providing additional information in the options. Also, if the "word" field contains anything other than the letters "a" through "z", the automatically generated pronunciation can be inaccurate.
For "mod9", the "soundslike" option was provided. This gives the Engine a hint for how to pronounce "mod9". It is particularly useful for words that contain anything other than the letters "a" through "z". The "soundslike" field should contain only letters and spaces. It works best when "soundslike" is composed of common words, each with a single pronunciation.
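Because "soundslike" must contain only letters and spaces, it can be worth validating hints client-side before sending a request. This small check is illustrative and not part of the Engine API:

```python
import re

# Letters separated by single spaces, per the "soundslike" rules above.
_SOUNDSLIKE_OK = re.compile(r"^[A-Za-z]+( [A-Za-z]+)*$")

def valid_soundslike(hint: str) -> bool:
    """Return True if the hint contains only letters and spaces."""
    return bool(_SOUNDSLIKE_OK.match(hint))
```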
You can also explicitly provide the pronunciation in the "phones" field, as demonstrated in the example for "janin". For ASR models trained by Mod9, the "phones" field describes how the word is spoken and must be a phonetic sequence from CMUdict, which is itself a subset[1] of ARPAbet. Lexical stress is not supported.
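Pronunciations copied from CMUdict as distributed usually carry lexical-stress digits (e.g. "AE1"), which must be removed before use here. The sketch below strips stress digits and checks each symbol against the 39 CMUdict phonemes (ARPAbet minus the symbols listed in the footnote); the helper names are illustrative:

```python
# The 39 CMUdict phonemes (ARPAbet without the symbols absent from CMUdict).
CMUDICT_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)

def strip_stress(phones: str) -> str:
    """Remove lexical-stress digits (0/1/2) from a CMUdict pronunciation."""
    return " ".join(p.rstrip("012") for p in phones.split())

def is_valid_phones(phones: str) -> bool:
    """Check that every symbol is a CMUdict phoneme (after stress removal)."""
    return all(p in CMUDICT_PHONES for p in strip_stress(phones).split())
```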
Although automatically generated pronunciations and "soundslike" hints typically produce good results, using "phones" will produce the most accurate transcriptions. The next section describes how to audit automatically generated pronunciations.
The "pronounce-words" command returns the pronunciations that the Engine would use, given the "words" option. It is useful for manually auditing the automatically generated pronunciations, so that the correct ones can be supplied in the "phones" field of subsequent calls to e.g. "add-words". For example:
echo '{
"command": "pronounce-words",
"words": [
{"word": "xcommand"},
{"word": "mod9", "soundslike": "mod nine"},
{"word": "janin", "phones": "JH AE N IH N"}
]
}' | jq -c . | nc $HOST $PORT
This returns:
{ "status": "completed", "words": [
{ "word": "xcommand", "phones": "Z K AH M AE N D" },
{ "word": "xcommand", "phones": "Z K AA M AH N D" },
{ "word": "xcommand", "phones": "EH K S K AH M AE N D" },
{ "word": "mod9", "phones": "M AO D N AY N" },
{ "word": "janin", "phones": "JH AE N IH N" }
]}
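Since a single word can come back with several candidate pronunciations (as "xcommand" does here), grouping the response by word makes a manual audit easier. This post-processing sketch assumes only the response shape shown above:

```python
import json
from collections import defaultdict

response = json.loads("""{ "status": "completed", "words": [
  { "word": "xcommand", "phones": "Z K AH M AE N D" },
  { "word": "xcommand", "phones": "Z K AA M AH N D" },
  { "word": "xcommand", "phones": "EH K S K AH M AE N D" },
  { "word": "mod9", "phones": "M AO D N AY N" },
  { "word": "janin", "phones": "JH AE N IH N" }
]}""")

# Collect every candidate pronunciation for each word.
by_word = defaultdict(list)
for entry in response["words"]:
    by_word[entry["word"]].append(entry["phones"])

for word, phones in by_word.items():
    print(f"{word}: {len(phones)} pronunciation(s)")
```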
Note that the Engine returned multiple automatically generated pronunciations for "xcommand". This is both because "x" is often pronounced as in "xylophone", and because of inherent ambiguity in pronunciation generation. Although incorrect pronunciations don't typically hurt recognition much as long as the correct pronunciation is also present, for the highest accuracy it is best to select the correct pronunciations and use them in future commands. For example:
echo '{
"command": "add-words",
"asr-model": "en_video",
"words": [
{ "word": "xcommand", "phones": "EH K S K AH M AE N D" },
{ "word": "mod9", "phones": "M AO D N AY N" },
{ "word": "janin", "phones": "JH AE N IH N" }
]
}' | jq -c . | nc $HOST $PORT
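Once the audit choices are made, building the final "add-words" request can be automated. The helper below is hypothetical; only the request structure follows the commands shown above:

```python
# Audited choices: one or more confirmed pronunciations per word.
audited = {
    "xcommand": ["EH K S K AH M AE N D"],
    "mod9": ["M AO D N AY N"],
    "janin": ["JH AE N IH N"],
}

def build_add_words(model: str, choices: dict) -> dict:
    """Build an add-words request body from word -> [phones] choices."""
    words = [
        {"word": w, "phones": p}
        for w, prons in choices.items()
        for p in prons
    ]
    return {"command": "add-words", "asr-model": model, "words": words}

request = build_add_words("en_video", audited)
```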
This command also supports the "g2p-options" or "g2p-cost" request options for more advanced functionality.
[1]: CMUdict is a subset of ARPAbet with the following phonemes absent: AX, AXR, DX, EL, EM, EN, H, IX, NX, Q, UX, WH.
©2019-2022 Mod9 Technologies (1.9.5)