
Output Streaming

Stream Query Progress In Real Time

LMQL supports many forms of communicating query progress and results to the surrounding context, including the ability to stream intermediate values to the caller or a client connected via HTTP.

This chapter first discusses the standard output writers supported out of the box, and then shows how to create your own custom output writers to implement more advanced streaming scenarios.

Standard Output Writers

To simply print the current query output to the standard output, you can use the lmql.printing output writer. This will show query progress during execution, as well as intermediate validation results.

lmql
await lmql.run("'Q: Hello\\n A:[WHAT]'", 
               output_writer=lmql.printing)
output
Q: Hello
A: Hi there! How can I assist you?

 valid=True, final=fin

Alternatively, if you only want to stream the result for a specific variable, you can use the lmql.stream("VAR") output writer. This prints only the value of the variable VAR as it is generated by the query.

lmql
await lmql.run("'{:user} Hello\\n {:assistant}[RESPONSE]'", 
               model="chatgpt",
               output_writer=lmql.stream("RESPONSE"))
output
Hello! How can I assist you today?

Lastly, there are also the options lmql.headless and lmql.silent to disable all input and output, and to disable all output, respectively. The difference between the two is that headless will raise an exception if the query asks for user input (via input()), while silent will ask for user input, but not print anything to the standard output.
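
For example, a query can be run without any console output by passing lmql.silent as the output writer. This is a minimal sketch, assuming lmql.silent is passed via output_writer= in the same way as the other standard writers:

lmql
# sketch: run the query without printing anything to the standard output
await lmql.run("'Q: Hello\\n A:[WHAT]'", 
               output_writer=lmql.silent)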

Custom Output Writer

In addition to the standard output writers, you can also provide your own implementation and pass it via output_writer= when running a query.

The basic interface for an output writer is as follows:

python
class BaseOutputWriter:
    async def input(self, *args):
        """
        Handle a request for user input, where *args are the prompt arguments. This is invoked when a query asks for user input via `await input()`.

        Returns:
            str: The user input.
        """

    async def add_interpreter_head_state(self, variable, head, prompt, where, trace, is_valid, is_final, mask, num_tokens, program_variables): 
        """
        Called whenever the query interpreter progresses in a meaningful way (e.g. a new token was generated, a new variable was added, or a variable was updated).

        Parameters:
            variable (str): 
                The name of the currently active variable.
            head (int): 
                The index of the current interpretation head (deprecated, will always be 0).
            prompt (str): 
                The full interaction trace/prompt of the query.
            where (object): 
                The AST representation of the query's validation condition.
            trace (object): 
                The trace obtained from evaluating 'where' on the current program variables during generation.
            is_valid (bool): 
                Whether the current program variables satisfy the validation condition.
            is_final (bool): 
                Whether the value of 'is_valid' can be considered final (i.e. decoding more tokens will not change it).
            mask (np.ndarray): 
                Currently active token mask.
            num_tokens (int): 
                Number of tokens in the current 'prompt'.
            program_variables (ProgramState): 
                The current program state (lmql.runtime.program_state). E.g. program_variables.variable_values is a mapping of variable names to their current values.
        """

Based on this interface, you can write your own output writer to realize custom streaming behavior. For examples of how this interface can be used, see the implementations of the standard output writers in lmql.runtime.output_writer.
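
For illustration, here is a minimal sketch of a custom output writer that incrementally prints the value of a single variable as it is generated. The VariableStreamer class and its diffing logic are hypothetical; the sketch only relies on the interface documented above, in particular on program_variables.variable_values being a mapping of variable names to their current values.

python
class VariableStreamer:
    """Sketch of a custom output writer that streams the value of one variable."""

    def __init__(self, variable):
        self.variable = variable
        self.printed = ""  # portion of the variable value that has already been printed

    async def input(self, *args):
        # delegate to standard input for interactive queries
        return input(*args)

    async def add_interpreter_head_state(self, variable, head, prompt, where, trace,
                                         is_valid, is_final, mask, num_tokens, program_variables):
        # look up the current value of the tracked variable in the program state
        value = program_variables.variable_values.get(self.variable) or ""
        if value.startswith(self.printed) and len(value) > len(self.printed):
            # print only the newly generated suffix of the variable value
            print(value[len(self.printed):], end="", flush=True)
            self.printed = value

Such a writer could then be passed to a query via output_writer=VariableStreamer("RESPONSE"), analogous to the standard writers shown above.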