Iteration
gbnf offers a set of rules describing the valid next characters for a state that represents a given GBNF grammar and the input consumed so far. These rules can be used to constrain the set of tokens an LLM may produce.
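As a rough illustration of how such rules could constrain token choice, the sketch below filters a vocabulary by each token's first code point. The names `filterTokens`, `vocab`, and `startsWithF` are hypothetical and not part of gbnf. The predicate stands in for whatever check you derive from the rules. Checking only the first code point is a necessary condition, not a sufficient one: a complete check would also validate the rest of each token against the updated state.

```typescript
// Hypothetical helper: keep only tokens whose first code point is allowed.
// `isAllowed` is a stand-in for a check built from the current rules.
function filterTokens(
  vocab: string[],
  isAllowed: (codePoint: number) => boolean,
): string[] {
  return vocab.filter((token) => {
    const first = token.codePointAt(0);
    // Empty tokens have no first code point and are dropped.
    return first !== undefined && isAllowed(first);
  });
}

// Example: pretend the only valid next character is "f" (code point 102).
const vocab = ["foo", "bar", "f", "zap"];
const startsWithF = (codePoint: number): boolean => codePoint === 102;

console.log(filterTokens(vocab, startsWithF)); // keeps "foo" and "f"
```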
Iterating through input rules
The output of GBNF is a state object representing the given grammar and any input provided so far. Iterating through it produces a set of rules.
value can be a code point or a range of code points.
To update the state, call add with a new string; the resulting set of rules will reflect the new state:
An invalid string will raise an error:
Input rule types
Rules can have one of the following three types:
CHAR
CHAR rules define valid next characters.
The type definition of a CHAR rule is:
value can be either a single number (denoting a single code point) or a range (two numbers, inclusive, denoting a range of code points).
Code points can be translated back into characters with:
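A minimal sketch of one way to do this uses the standard `String.fromCodePoint`. The `CodePointOrRange` type and the `toCharacters` helper are hypothetical names introduced here for illustration. Representing a range as a two-element tuple is an assumption based on the description above, not a confirmed part of the library's types.

```typescript
// Hypothetical shape for a rule's `value`: a single code point, or an
// inclusive [start, end] range. The tuple form is an assumption.
type CodePointOrRange = number | [number, number];

// Expand a value into the characters it covers.
function toCharacters(value: CodePointOrRange): string[] {
  if (typeof value === "number") {
    return [String.fromCodePoint(value)];
  }
  const [start, end] = value;
  const chars: string[] = [];
  for (let codePoint = start; codePoint <= end; codePoint++) {
    chars.push(String.fromCodePoint(codePoint));
  }
  return chars;
}

console.log(toCharacters(102).join(""));      // f
console.log(toCharacters([97, 99]).join("")); // abc
```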
CHAR rules are generated from GBNF like:
root ::= "foo"
Or, for ranges:
root ::= [a-z]
CHAR_EXCLUDE
CHAR_EXCLUDE rules define invalid next characters.
The type definition of a CHAR_EXCLUDE rule is:
value can be either a single number (denoting a single code point) or a range (two numbers, inclusive, denoting a range of code points).
Code points can be translated back into characters with:
CHAR_EXCLUDE rules are generated from GBNF like:
root ::= [^a-f]
NOTE
It is possible to receive both CHAR and CHAR_EXCLUDE rules. Consider the following grammar:
root ::= ("bar") | ("ba" [^r])
Input ba would produce the following two valid rules:
It is up to you how to interpret this.
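One possible interpretation, sketched below, treats the rules as alternatives: a character is acceptable if any CHAR rule covers it or any CHAR_EXCLUDE rule does not cover it. The rule shapes and the `covers` and `isAllowed` helpers are hypothetical, and the two rules in the example (a CHAR rule and a CHAR_EXCLUDE rule, both for "r") are an assumption about what the grammar above yields after the input ba.

```typescript
// Hypothetical rule shapes; the library's actual types may differ.
type CodePointOrRange = number | [number, number];
interface CharRule { type: "CHAR"; value: CodePointOrRange }
interface CharExcludeRule { type: "CHAR_EXCLUDE"; value: CodePointOrRange }
type InputRule = CharRule | CharExcludeRule;

// True when the code point equals the value or falls inside the range.
function covers(value: CodePointOrRange, codePoint: number): boolean {
  if (typeof value === "number") {
    return value === codePoint;
  }
  const [start, end] = value;
  return codePoint >= start && codePoint <= end;
}

// Union interpretation: any single rule accepting the code point is enough.
function isAllowed(rules: InputRule[], codePoint: number): boolean {
  return rules.some((rule) =>
    rule.type === "CHAR"
      ? covers(rule.value, codePoint)
      : !covers(rule.value, codePoint),
  );
}

// Assumed rules after the input "ba": "r" is 114, "x" is 120.
const rulesAfterBa: InputRule[] = [
  { type: "CHAR", value: 114 },
  { type: "CHAR_EXCLUDE", value: 114 },
];

console.log(isAllowed(rulesAfterBa, 114)); // true, via the CHAR rule
console.log(isAllowed(rulesAfterBa, 120)); // true, via the CHAR_EXCLUDE rule
```

Under this reading every next character is acceptable for that input, because the two alternatives together cover "r" and everything except "r". A stricter policy, such as preferring CHAR rules when both kinds are present, is equally possible.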
END
An END rule defines a valid stopping point.
The type definition of an END rule is:
END rules are generated from the end of a grammar. For example:
This will produce an END rule.
More commonly, an END rule will appear alongside other valid rules:
Here, we will receive an END rule as well as a “comma” rule.
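When constraining generation, a mixed result like this can be read as "stopping is permitted, but so is continuing". The sketch below checks for that case. The `InputRule` shape and the `canStop` helper are hypothetical, and the example rules (an END rule plus a CHAR rule for a comma) are an assumption about what such a state might yield.

```typescript
// Hypothetical rule shape; only `type` matters for this check.
interface InputRule {
  type: "CHAR" | "CHAR_EXCLUDE" | "END";
  value?: number | [number, number];
}

// True when the current state is a valid stopping point.
function canStop(rules: InputRule[]): boolean {
  return rules.some((rule) => rule.type === "END");
}

// Assumed rules: an END rule alongside a CHAR rule for "," (code point 44).
const rulesAtListItem: InputRule[] = [
  { type: "END" },
  { type: "CHAR", value: 44 },
];

console.log(canStop(rulesAtListItem));                // true
console.log(canStop([{ type: "CHAR", value: 44 }])); // false
```

A sampler built on this could, for instance, allow an end-of-sequence token only when `canStop` returns true.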
NOTE The above example introduced the concept of rule refs, or rules that reference other defined rules. Rule refs are common for any sufficiently advanced grammar.
GBNF automatically resolves rule refs to their eventual definitions, so rule refs are not exposed in the parse state.