Overview

LMQL is a high-level, front-end language for text generation. This means that LMQL is not tied to any particular text generation model. Instead, it supports a wide range of text generation models on the backend, including OpenAI models, llama.cpp, and Hugging Face Transformers.

Loading Models

To load models in LMQL, you can use the lmql.model(...) function which gives you an lmql.LLM object:

lmql.model("openai/gpt-3.5-turbo-instruct") # OpenAI API model
lmql.model("random", seed=123) # randomly sampling model
lmql.model("llama.cpp:<YOUR_WEIGHTS>.gguf") # llama.cpp model

lmql.model("local:gpt2") # load a `transformers` model in-process
lmql.model("local:gpt2", cuda=True, load_in_4bit=True) # load a `transformers` model in process with additional arguments
lmql.model("gpt2") # access a `transformers` model hosted via `lmql serve-model`

LMQL supports multiple inference backends, each of which has its own set of parameters. For more details on how to use and configure the different backends, please refer to the backend-specific chapters of the documentation.
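As a sketch of how such parameters are passed, backend-specific options can be supplied as additional keyword arguments to lmql.model(...). The parameter names below (e.g. tokenizer and n_ctx for llama.cpp) are backend-specific and shown for illustration only:

# llama.cpp: tokenizer and context-size arguments are forwarded to the llama.cpp backend (illustrative values)
lmql.model("llama.cpp:<YOUR_WEIGHTS>.gguf", tokenizer="<HF_TOKENIZER>", n_ctx=2048)

# transformers: device placement and quantization arguments are forwarded to the in-process model
lmql.model("local:gpt2", cuda=True, load_in_4bit=True)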

Specifying The Model

After creating an lmql.LLM object, you can pass it to a query program to specify the model to use during execution. There are two ways to do this:

Option A: Specifying the Model Externally

You can specify the model and its parameters externally, i.e. separately from the actual program code:

import lmql

# uses 'chatgpt' by default
@lmql.query(model="chatgpt")
def tell_a_joke():
    '''lmql
    """A great good dad joke. A indicates the punchline
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and \
                           STOPS_AT(PUNCHLINE, "\n")
    '''

tell_a_joke() # uses chatgpt
tell_a_joke(model=lmql.model("openai/text-davinci-003")) # uses text-davinci-003

Here, the tell_a_joke query uses ChatGPT by default, but it can still be configured to use a different model by passing one as an argument when the query function is invoked.
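The argument can also be an lmql.LLM object constructed with lmql.model(...). As a minimal sketch, using the local GPT-2 model from above purely as a stand-in:

# construct the model object once and reuse it across invocations
local_model = lmql.model("local:gpt2")
tell_a_joke(model=local_model)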

Option B: Queries with from Clause

You can specify the model as part of the query itself, by using a from clause in combination with the indented syntax. This can be particularly useful if your choice of model is intentional and should be part of your program.

argmax
    "This is a query with a specified 'from'-clause: [RESPONSE]"
from
    "openai/text-ada-001"

Here, we specify "openai/text-ada-001" directly as a string, which is equivalent to passing lmql.model("openai/text-ada-001").
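That is, the snippet above could also be written with an explicit lmql.model(...) call in the from clause (a sketch, assuming the same model):

argmax
    "This is a query with a specified 'from'-clause: [RESPONSE]"
from
    lmql.model("openai/text-ada-001")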

Note that the from keyword is only available with the standalone, indented syntax shown here, where the decoder keyword (e.g. argmax) has to be provided explicitly.

Playground

To specify the model when running in the Playground, use the model dropdown in the top right of the program editor to set or override the model parameter of your query program:

Model selection dropdown in the LMQL Playground.

Adding New Model Backends

Due to the modular design of LMQL, it is easy to add support for new models and backends. If you would like to propose or add support for a new model API or inference engine, please reach out to us via our Community Discord or via hello@lmql.ai.