
LMQL becomes simpler and adds llama.cpp

July 13, 2023

Today we are releasing LMQL 0.0.6.5. This update contains a major simplification of the LMQL syntax, moving it much closer to standard Python. It also includes a llama.cpp based inference backend, several bug fixes and other minor improvements.

You can try the latest version of LMQL in your browser at lmql.ai/playground or install it via pip install lmql.

One Line Is All It Takes

Most notably, 0.0.6.5 comes with several simplifications of the core syntax of LMQL. Of course, all changes are backwards compatible, so you can continue to use your existing query code and move to the new version without any changes.

With this, we aim to minimize syntactic overhead, employing sensible defaults to enable more concise programs like the following:

"One line is all it takes [CONTINUATION]"
promptdown

One line is all it takes CONTINUATIONFallin' in love with me.

Sensible Defaults: This is possible because LMQL now automatically assumes argmax decoding and openai/text-davinci-003 as the (configurable) defaults. If you prefer a different model or custom decoder settings, you can still specify them explicitly, e.g. in the @lmql.query decorator function as demonstrated later in this post.

Without any additional configuration, the simple query code above translates to a full LMQL program like this:

argmax "One line is all it takes [CONTINUATION]" from "openai/text-davinci-003"

Inline Constraints

LMQL now allows you to specify multiple inline where constraints. These can refer to local program variables, which means constraints can now depend on previous model outputs.

"A list of awesome Dua Lipa songs:\n"
songs = []

"- New Rules\n"
for i in range(4):
    "-[SONG]\n" where STOPS_BEFORE(SONG, "\n")
    songs.append(SONG)

"Out of these, my favorite is[FAVORITE]" where FAVORITE in songs
promptdown

A list of awesome Dua Lipa songs:⏎ - New Rules - SONGDon't Start Now - SONGIDGAF - SONGBe the One - SONGBlow Your Mind (Mwah) Out of these, my favorite is FAVORITEDon't Start Now

Note also how in this example LMQL code now reads much more like standard Python code, without any additional level of indentation.


@lmql.query functions

The overhauled syntax also makes LMQL much easier on the eyes when used with the @lmql.query function decorator in Python:

```python
import lmql
import json

@lmql.query(model="openai/text-curie-001", temperature=0.9)
def summarize(): 
    '''lmql
    """
    Provide a summary of Dua Lipa, the pop icon:
    {{
      "name": "[STRING_VALUE]",
      "chart_position": [INT_VALUE],
      "top_songs": [[
         "[STRING_VALUE]",
         "[STRING_VALUE]"
      ]]
    }}
    """ where STOPS_BEFORE(STRING_VALUE, '"') and INT(INT_VALUE) and len(TOKENS(INT_VALUE)) < 3
    
    return json.loads(context.prompt.split("pop icon:",1)[1])
    '''

print(summarize()) # {'name': 'Dua Lipa', 'chart_position': 3415, 'top_songs': ['New Rules', 'Havana']}
```


lmql.F Lambda Functions

Based on LMQL's new minimal syntax, we introduce a novel and concise way to write LLM-based lambda functions. This offers a lightweight way to embed small LLM-based utilities into your code, without having to write a full LMQL program.

```python
import lmql

summarize = lmql.F("Summarize the following in a few words: {data}: [SUMMARY]")
main_subject = lmql.F("What is the main subject (noun) of the following text? {data}: [SUBJECT]", 
                      "len(TOKENS(SUBJECT)) < 20")

text = "In LMQL, users can specify high-level, logical constraints ..."

summarize(data=text) # LMQL enables high-level constraints to be enforced during text 
                     # generation, simplifying multi-part prompting and integration.
main_subject(data=text) # Language Model Query Language (LMQL)
```



llama.cpp Inference Backend

LMQL now also fully integrates with the excellent llama.cpp project, a C++ implementation of a number of Transformer-based language models.

Using llama.cpp from LMQL is as simple as specifying it in the from clause of a query:

argmax "Say 'this is a test':[RESPONSE]" from "llama.cpp:<PATH TO WEIGHTS>.bin"

We support both in-process loading of llama.cpp and remote inference via lmql serve-model. To learn more about llama.cpp and how to use it with LMQL, check out the corresponding chapter in the LMQL documentation.
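
For the remote setup, the model can be hosted in a separate, long-running process via lmql serve-model, so that queries do not have to reload the weights on every run. A minimal sketch (the weights path placeholder remains specific to your system):

```
lmql serve-model llama.cpp:<PATH TO WEIGHTS>.bin
```

Queries can then connect to the resulting endpoint instead of loading llama.cpp in-process.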


Other Changes

  • LMQL now includes a random model backend, which randomly samples tokens from the GPT-2 vocabulary. This is useful for debugging and testing, and can be used for data generation in highly constrained query programs (see the sketch after this list).

  • Two caching issues have been fixed, avoiding cache collisions which could lead to repeated model outputs.

  • More robust query string parsing, allowing for reliable escaping of the special characters [, ], { and }.

  • Added support for transformers-based Llama models and the associated (fast) HF tokenizer implementation.

  • Simplified Azure OpenAI support, see the relevant chapter in the documentation.
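
As an illustration of the random model backend mentioned above, the following query sketch generates data purely from random token sampling, with constraints shaping the output. It assumes the backend is addressed by the model name "random", consistent with the other backend identifiers in this post:

```
sample "My favorite number: [N]" from "random" where INT(N) and len(TOKENS(N)) < 3
```

Because the underlying model is random, the constraints do all the work of keeping the output well-formed.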

We thank community members @minosvasilias and @CircArgs for their contribution to this release.