Getting Started

Learn how to get started with LMQL and write your first program.

1. Installation

You can install LMQL locally or use the web-based Playground IDE. For the use of self-hosted models via 🤗 Transformers or llama.cpp, you have to install LMQL locally.

2. Write Your First Query

A very simple Hello World LMQL query looks like this:

"Say 'this is a test':[RESPONSE]" where len(TOKENS(RESPONSE)) < 25

Model Output

Say 'this is a test': This is a test

(Here, the model generated "This is a test" as the value of RESPONSE.)

Note: You can run and experiment with this query in the Playground IDE.

This simple LMQL program consists of a single prompt statement and an associated where clause:

  • Prompt Statement "Say 'this is a test':[RESPONSE]": Prompts are constructed using so-called prompt statements, which look like top-level strings in Python. Template variables like [RESPONSE] are automatically completed by the model. Apart from single-line textual prompts, LMQL also supports multi-part and scripted prompts, e.g. by allowing control flow and branching behavior to control prompt construction. To learn more, see Scripted Prompting.

  • Constraint Clause where len(TOKENS(RESPONSE)) < 25: In this second part of the statement, users can specify logical, high-level constraints on the output. LMQL uses novel evaluation semantics for these constraints to automatically translate high-level constraints like len(TOKENS(RESPONSE)) < 25 into (sub)token masks that can be eagerly enforced during text generation. To learn more, see Constraints.
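To make the idea of a token mask more concrete, here is a minimal sketch in plain Python. It is not LMQL's implementation: the tiny vocabulary, the `toy_scores` stand-in for a model, and the helper names are all hypothetical, and the sketch assumes the end-of-sequence token does not count towards the response length. It only illustrates how a length constraint could be enforced eagerly, by restricting the allowed tokens before each decoding step.

```python
# Illustrative only: a toy decoder loop, not LMQL's actual implementation.
# The vocabulary, scores, and helper names below are hypothetical.

EOS = "<eos>"
VOCAB = ["This", " is", " a", " test", ".", EOS]


def length_mask(tokens_so_far, max_tokens):
    """Return one boolean per vocabulary entry: True means the token is allowed.

    Enforces len(TOKENS(RESPONSE)) < max_tokens: once the response holds
    max_tokens - 1 tokens, only end-of-sequence may follow.
    """
    at_limit = len(tokens_so_far) >= max_tokens - 1
    return [(tok == EOS) if at_limit else True for tok in VOCAB]


def toy_scores(tokens_so_far):
    """Stand-in for a language model: cycles through VOCAB and never picks EOS."""
    preferred = VOCAB[len(tokens_so_far) % (len(VOCAB) - 1)]
    return [1.0 if tok == preferred else 0.0 for tok in VOCAB]


def decode(max_tokens):
    """Greedy (argmax) decoding with the mask applied before each choice."""
    response = []
    while True:
        scores = toy_scores(response)
        mask = length_mask(response, max_tokens)
        allowed = [i for i, ok in enumerate(mask) if ok]
        best = max(allowed, key=lambda i: scores[i])
        if VOCAB[best] == EOS:
            return response
        response.append(VOCAB[best])


short = decode(4)    # the mask forces a stop after three tokens
longer = decode(25)  # the bound used in the query above
```

Without the mask, the toy model would never stop generating. With it, the response always ends before reaching the limit, and no invalid output ever has to be generated and then discarded.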

3. Going Further

Building on your first query above, you may want to add more complex logic, e.g. by adding a second part to the prompt. You may also want to use a different decoding algorithm, e.g. to sample multiple trajectories of your program, or a different model.

Let's extend our initial query to allow for these changes:


sample(temperature=1.2)

"Say 'this is a test'[RESPONSE]" where len(TOKENS(RESPONSE)) < 25

# react to the model output: ask again if "test" is missing
if "test" not in RESPONSE:
    "You did not say 'test', try again:[RESPONSE]" where \
        len(TOKENS(RESPONSE)) < 25
else:
    "Good job"

Going beyond what we have seen so far, this LMQL program extends on the above in a few ways:

  • Decoder Declaration sample(temperature=1.2): Here, we specify the decoding algorithm to use for text generation. In this case we use sample decoding with slightly increased temperature (>1.0). Above, we implicitly relied on deterministic argmax decoding, which is the default in LMQL. To learn more about the different supported decoding algorithms in LMQL (e.g. beam or best_k), please see Decoders.

  • Prompt Program: The main body of the program remains the prompt. As before, we use prompt statements here. Now, however, we also make use of control flow and branching behavior.

    On each LLM call, the concatenation of all prompt statements executed so far forms the prompt used to generate a value for the currently active template variable, such as RESPONSE. This means the LLM is always aware of the full prompt context so far when generating a value for a template variable.

    After a prompt statement has been executed, the contained template variables are automatically exposed to the surrounding program context. This allows you to react to model output and incorporate the results in your program logic. To learn more about this form of interactive prompting, please see Scripted Prompting.
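The execution model described above can be sketched in plain Python. This is an illustration under stated assumptions, not LMQL's runtime or API: `PromptProgram`, `say`, and `stub_model` are hypothetical names, and the stub returns canned text instead of calling an LLM. The sketch shows two things: each template variable is filled using the full prompt accumulated so far, and the filled value is then available to ordinary program logic.

```python
# Illustrative only: mimics how a scripted prompt is assembled, using a
# hypothetical stub in place of a real LLM. None of these names are LMQL APIs.
import re


class PromptProgram:
    def __init__(self, model):
        self.model = model      # callable: full prompt so far -> completion text
        self.prompt = ""        # concatenation of all prompt statements so far
        self.variables = {}     # template variables exposed to the program

    def say(self, statement):
        """Run one prompt statement, filling each [VARIABLE] with model output."""
        # re.split with a capture group alternates literal text and variable names.
        parts = re.split(r"\[([A-Z_]+)\]", statement)
        for i, part in enumerate(parts):
            if i % 2 == 0:
                self.prompt += part                 # literal text
            else:
                value = self.model(self.prompt)     # model sees the full context
                self.variables[part] = value
                self.prompt += value
        return self


def stub_model(prompt):
    """Hypothetical model: only complies after being asked to try again."""
    return " This is a test" if "try again" in prompt else " Hello there"


program = PromptProgram(stub_model)
program.say("Say 'this is a test':[RESPONSE]")
if "test" not in program.variables["RESPONSE"]:
    program.say("\nYou did not say 'test', try again:[RESPONSE]")
else:
    program.say("\nGood job")
```

In this run the stub's first answer lacks the word "test", so the branch issues the follow-up statement. The second call receives both the original instruction and the first answer as context, which is why the stub can react to "try again".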

4. Enjoy

These basic steps should get you started with LMQL. If you need more inspiration before writing your own queries, you can explore the examples included with the Playground IDE or showcased on the LMQL Website.

If you have any questions or requests for documentation, please feel free to reach out to us via our Community Discord, GitHub Issues, or Twitter.