The Engine allows an advanced endpoint-rules request option that can be used
to customize the endpointing system for a "recognize" request in default mode.
As an example, we perform a recognition job with a rule to endpoint any time there is 0.5 seconds of consecutive silence when the current utterance is longer than 7 seconds.
curl -sLO mod9.io/SW_4824_B.wav
(echo '{"cmd":"recognize", "endpoint-rules": {"rule1":{"min-utterance-length":7, "min-trailing-silence":0.5}}}'; \
    cat SW_4824_B.wav) | nc $HOST $PORT

Each time the Engine reads a chunk of audio, it decides whether the utterance it is currently processing has reached an endpoint. Endpointing in Kaldi is implemented as a disjunction (a chained OR) of several endpointing rules.
// Returns a boolean. True if we've reached an endpoint.
// This is called every time the engine reads in a chunk of audio.
EndpointDetected {
    if (rule0.Activated) {
        return true;
    }
    .
    .
    if (rule5.Activated) {
        return true;
    }
    return false;
}

Internally, the Engine has six rules. Every rule is the same function evaluated with different parameters: a conjunction (a chain of ANDs) of several conditions.
// Returns true if this endpointing rule detects an endpoint.
Rule::Activated {
      return
      (contains_nonsilence OR !rule.must_contain_nonsilence) AND
      trailing_silence >= rule.min_trailing_silence AND
      relative_cost <= rule.max_relative_cost AND
      utterance_length >= rule.min_utterance_length AND
      utterance_length <= rule.max_utterance_length;
}

The endpoint rules are customized by overwriting these parameters.
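The two pseudocode functions above can be sketched in Python as follows. This is only an illustration of the OR-of-ANDs logic, not the Engine's implementation: the `EndpointRule` class, its default values, and the `endpoint_detected` helper are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointRule:
    # Field names mirror the request-option keys. The defaults below are
    # placeholders for this sketch, NOT the Engine's actual defaults.
    must_contain_nonsilence: bool = True
    min_trailing_silence: float = 0.0
    max_relative_cost: float = math.inf
    min_utterance_length: float = 0.0
    max_utterance_length: float = math.inf

    def activated(self, contains_nonsilence, trailing_silence,
                  relative_cost, utterance_length):
        """A rule fires only if every one of its conditions holds (AND)."""
        return (
            (contains_nonsilence or not self.must_contain_nonsilence)
            and trailing_silence >= self.min_trailing_silence
            and relative_cost <= self.max_relative_cost
            and self.min_utterance_length <= utterance_length
            and utterance_length <= self.max_utterance_length
        )


def endpoint_detected(rules, **state):
    """An endpoint is detected if any single rule fires (OR)."""
    return any(rule.activated(**state) for rule in rules)


# The rule from the first example: 0.5 s of trailing silence once the
# utterance is longer than 7 s.
rule = EndpointRule(min_utterance_length=7, min_trailing_silence=0.5)
state = dict(contains_nonsilence=True, trailing_silence=0.6, relative_cost=0.0)
print(endpoint_detected([rule], utterance_length=8.0, **state))  # True
print(endpoint_detected([rule], utterance_length=5.0, **state))  # False
```

The second call returns False because the utterance is still shorter than the rule's `min_utterance_length`, even though the trailing silence is long enough.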
When writing endpointing rules, there are a few useful principles to keep in mind.
- The Engine should endpoint and end a segment during long pauses.
- Longer segments generally are more accurate because they have more audio and language context.
- If the system only outputs long segments, there will be high latency in between messages, which might make the engine feel unresponsive, especially when processing live audio in real time.
- As the utterance gets longer, the internal lattice representing the current utterance grows in complexity. Thus it is practical to tolerate shorter and shorter pauses when dealing with longer utterances.
- The latency request option also affects endpointing performance; a shorter latency is better.
Endpoint rules can be passed in as a request option in the initial JSON request
when the request command is "recognize" and batch is false.
NOTE: It is the convention that higher numbered rules deal with longer utterance lengths.
| Field | Type | Description | 
|---|---|---|
| endpoint-rules | object | Additional endpointing options, overriding the defaults and the Engine command line. The accepted keys are "rule0" through "rule5". Example: {"endpoint-rules": {"rule2": {"min-trailing-silence": 0.1}}} | 
The JSON for each endpoint rule has the following fields:
| Field | Type | Description | 
|---|---|---|
| must-contain-nonsilence | boolean | True if the utterance must have a non-silent frame for us to endpoint. | 
| min-trailing-silence | number | Minimum duration in seconds of consecutive silence at the end of the current utterance. Counting restarts whenever nonsilence is encountered. | 
| max-relative-cost | number or "inf" | A non-negative cost that is 0 if it is extremely likely we are at a final state, and higher the less likely we are to be at a final state. This is primarily used for small grammars. | 
| min-utterance-length | number | Minimum number of seconds of the utterance for this rule to apply (before min-utterance-length seconds, do not apply this rule). | 
| max-utterance-length | number | Maximum number of seconds of the utterance for this rule to apply (after max-utterance-length seconds, do not apply this rule). | 
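As a minimal sketch of how a client might assemble this request option, the Python below builds the JSON request line and rejects rule names or field names that the tables above do not list. The `build_recognize_request` helper and the two constant sets are hypothetical and exist only for this illustration; the Engine performs its own validation, which may differ.

```python
import json

# Keys taken from the tables above.
ALLOWED_RULES = {f"rule{i}" for i in range(6)}  # "rule0" through "rule5"
ALLOWED_FIELDS = {
    "must-contain-nonsilence",
    "min-trailing-silence",
    "max-relative-cost",
    "min-utterance-length",
    "max-utterance-length",
}


def build_recognize_request(endpoint_rules):
    """Return the JSON request line for a "recognize" command."""
    for name, fields in endpoint_rules.items():
        if name not in ALLOWED_RULES:
            raise ValueError(f"unknown rule: {name}")
        unknown = set(fields) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unknown fields in {name}: {sorted(unknown)}")
    return json.dumps({"cmd": "recognize", "endpoint-rules": endpoint_rules})


request = build_recognize_request({"rule2": {"min-trailing-silence": 0.1}})
print(request)
# {"cmd": "recognize", "endpoint-rules": {"rule2": {"min-trailing-silence": 0.1}}}
```

Checking keys on the client side catches typos such as "rule6" or "min-trailing-silense" before any audio is sent.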
To implement a hard cut at 40s, we overwrite rule5 so that it is always activated
once utterance_length reaches 40.
curl -sLO mod9.io/SW_4824_B.wav
(echo '{' \
      '  "endpoint-rules":' \
      '    {' \
      '      "rule5":' \
      '        { ' \
      '         "min-utterance-length":40,' \
      '          "max-utterance-length":100,' \
      '          "max-relative-cost":"inf",' \
      '          "min-trailing-silence":0,' \
      '          "must-contain-nonsilence":false' \
      '        }' \
      '    }' \
      '}'; cat SW_4824_B.wav) | nc $HOST $PORT

To prevent endpointing before 20 seconds, set the min-utterance-length of each rule to a duration of 20 or longer.
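Raising min-utterance-length on every rule can be scripted rather than typed by hand. The sketch below generates the override for rule0 through rule5 and prints the request line; `min_length_overrides` is a hypothetical helper, and including "cmd": "recognize" follows the first example in this section.

```python
import json


def min_length_overrides(min_seconds, num_rules=6):
    """Build endpoint-rules that set min-utterance-length on every rule."""
    return {
        f"rule{i}": {"min-utterance-length": min_seconds}
        for i in range(num_rules)
    }


request = {"cmd": "recognize", "endpoint-rules": min_length_overrides(20)}
print(json.dumps(request))
```

The printed line can take the place of the `echo` in the shell pipelines shown earlier. Because a rule fires only while the utterance length lies between its min-utterance-length and max-utterance-length, any rule whose max-utterance-length is below the new minimum can no longer activate.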
©2019-2022 Mod9 Technologies (Version 1.9.9)
