
LMQL Release 0.0.5

April 17, 2023

Today we are releasing version 0.0.5 of LMQL. This release focuses on stability and performance improvements. For a detailed list of changes, please see below. We are particularly excited about the first community contributions that have been merged as part of this release, with many more in the works.

lmql==0.0.5 has been published on PyPI, based on the current main branch of the GitHub repository. The updated version has also been deployed to the browser-based Playground.


  • Decoder Performance The argmax and sample decoders have been optimized to run faster, resulting in a 20-30% speed-up on common query workloads. #24.

  • Postprocessing Semantics Internally, LMQL now allows constraints to implement postprocessing semantics. Once a variable has been completed, its value can be converted to a more normalized form in the prompt and to a semantically meaningful data type in the context of the query code. #24.

    For example, when using an INT(<var>) constraint on a generated number, the model is restricted to generating valid integers only, and now, the resulting NUM value is additionally converted to an int value:

       "My favorite number is: [NUM]\n"
       print(type(NUM), NUM * 2) # <class 'int'> 4
       "Number times two is {NUM * 2}"
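Conceptually, this postprocessing step performs two conversions once a variable completes: one to the normalized text kept in the prompt, and one to the typed value exposed to query code. The following is a hedged sketch in plain Python; the function name and structure are illustrative, not LMQL's internal API:

```python
# Illustrative sketch of postprocessing semantics for an INT constraint.
# postprocess_int is a hypothetical name, not part of LMQL's internals.

def postprocess_int(raw: str):
    """Convert the raw generated text of an INT-constrained variable."""
    normalized = raw.strip()   # normalized form written back into the prompt
    value = int(normalized)    # typed value exposed to the query code
    return normalized, value

prompt_text, NUM = postprocess_int(" 2")
print(type(NUM), NUM * 2)  # <class 'int'> 4
```

The key point is that the same generated tokens yield two views: a string for prompt continuation and a native Python value for computation.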
  • Core Interpreter The LMQL core interpreter has been completely reimplemented. This fixes a couple of minor issues and, overall, improves reliability and performance when dealing with branching decoding algorithms. #24.

  • Playground Locally and when used in-browser, the LMQL Playground now streams debugger information from the LMQL interpreter incrementally. This leads to speed-ups when running in the Playground, especially with longer outputs. #27f9a8ad.

  • Other Fixes:

    • When used from within Python (as a decorated function), LMQL code no longer has to be doubly-escaped, e.g. you can now write STOPS_AT(VAR, "\n") instead of STOPS_AT(VAR, "\\n").
    • The LMQL inference API buffers requests that come in during startup, to avoid errors when the server is not yet ready. #15, thanks to @chrispan.
    • OpenAI request parallelization no longer causes a worker-process-related error on Linux systems. #6.
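The startup buffering fix mentioned above can be pictured as follows. This is a minimal, hypothetical sketch (not the actual inference API code) in which requests that arrive before the server is ready are queued and replayed once startup completes:

```python
from collections import deque

class BufferedServer:
    """Hypothetical sketch: queue requests until the backend is ready."""

    def __init__(self):
        self.ready = False
        self.pending = deque()
        self.handled = []

    def handle(self, request):
        if not self.ready:
            self.pending.append(request)  # buffer instead of raising an error
        else:
            self.handled.append(request)

    def startup_complete(self):
        self.ready = True
        while self.pending:               # replay buffered requests in order
            self.handled.append(self.pending.popleft())

server = BufferedServer()
server.handle("req-1")        # arrives during startup, gets buffered
server.startup_complete()
server.handle("req-2")
print(server.handled)         # ['req-1', 'req-2']
```

Buffering preserves arrival order, so early clients see no errors and no reordering once the server comes up.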


Apart from the changes above, we are also working on a number of other features, including:

  • llama.cpp support as started in this PR, thanks to @CircArgs.

  • Support for Type Constraints, e.g. type(VAR) is DataClass, that automatically force the model to produce a value that structurally conforms to the given type. See this Twitter thread for more details.

  • Support for using Antlr parsers during query execution, to force the model to produce a value that conforms to a given grammar.

  • Extending Logit Masking to OpenAI Chat Models. This will enable full support for LMQL constraints with e.g. chatgpt and gpt-4 models. See #25, thanks to @kharvd.
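To make the planned Type Constraints item above more concrete, here is what a target type might look like as a plain Python dataclass. The class and its fields are made up for illustration, and the actual LMQL constraint syntax is still in development:

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Hypothetical target type for a Type Constraint such as
    # type(VAR) is Person: the model would be forced to produce
    # output that parses into exactly this structure.
    name: str
    age: int

# The constrained generation would be equivalent to constructing:
p = Person(name="Ada", age=36)
print(p)  # Person(name='Ada', age=36)
```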