Mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-20 05:43:55 +00:00
community: Outlines integration (#27449)
In collaboration with @rlouf I built an [Outlines](https://dottxt-ai.github.io/outlines/latest/) integration for LangChain! I think this is really useful for doing any type of structured output locally. [Dottxt](https://dottxt.co) has put a lot of work into optimising this process at a lower level ([outlines-core](https://pypi.org/project/outlines-core/0.1.14/), written in Rust), so I think this is a better alternative to all current approaches in LangChain for structured output. It also implements the `.with_structured_output` method, so it should be a drop-in replacement for a lot of applications.

The integration includes:
- **Outlines LLM class**
- **ChatOutlines class**
- **Tutorial Cookbooks**
- **Documentation Page**
- **Validation and error messages**
- **Exposes Outlines Structured output features**
- **Support for multiple backends**
- **Integration and Unit Tests**

Dependencies: `outlines` + additional dependencies (depending on the backend used)

I am not sure whether the unit tests comply with all requirements; if not, I suggest simply removing them, since I don't see a useful way to do it differently.

### Quick overview:

Chat Models:
<img width="698" alt="image" src="https://github.com/user-attachments/assets/05a499b9-858c-4397-a9ff-165c2b3e7acc">

Structured Output:
<img width="955" alt="image" src="https://github.com/user-attachments/assets/b9fcac11-d3e5-4698-b1ae-8c4cb3d54c45">

---------

Co-authored-by: Vadym Barda <vadym@langchain.dev>
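For a quick sense of the API, here is a minimal usage sketch (the model name, backend, and prompts are only illustrative examples; the notebooks added in this PR contain the full walkthroughs):

```python
from pydantic import BaseModel

from langchain_community.chat_models import ChatOutlines


class AnswerWithJustification(BaseModel):
    answer: str
    justification: str


# Any supported backend works here: transformers (default), llamacpp, vllm, mlxlm
chat = ChatOutlines(model="microsoft/phi-2", backend="transformers")

# Plain chat completion
print(chat.invoke("What will the capital of Mars be called?").content)

# Structured output via the standard LangChain API
structured_chat = chat.with_structured_output(AnswerWithJustification)
print(structured_chat.invoke("What weighs more, a pound of bricks or a pound of feathers?"))
```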
This commit is contained in:
parent 2901fa20cc
commit dee72c46c1
348
docs/docs/integrations/chat/outlines.ipynb
Normal file
@ -0,0 +1,348 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "raw",
|
||||||
|
"id": "afaf8039",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"---\n",
|
||||||
|
"sidebar_label: Outlines\n",
|
||||||
|
"---"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "e49f1e0d",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# ChatOutlines\n",
|
||||||
|
"\n",
|
||||||
|
"This will help you getting started with Outlines [chat models](/docs/concepts/chat_models/). For detailed documentation of all ChatOutlines features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/outlines.chat_models.ChatOutlines.html).\n",
|
||||||
|
"\n",
|
||||||
|
"[Outlines](https://github.com/outlines-dev/outlines) is a library for constrained language generation. It allows you to use large language models (LLMs) with various backends while applying constraints to the generated output.\n",
|
||||||
|
"\n",
|
||||||
|
"## Overview\n",
|
||||||
|
"### Integration details\n",
|
||||||
|
"\n",
|
||||||
|
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
|
||||||
|
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
|
||||||
|
"| [ChatOutlines](https://api.python.langchain.com/en/latest/chat_models/outlines.chat_models.ChatOutlines.html) | [langchain-community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ |  |  |\n",
|
||||||
|
"\n",
|
||||||
|
"### Model features\n",
|
||||||
|
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
|
||||||
|
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
|
||||||
|
"| ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | \n",
|
||||||
|
"\n",
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"To access Outlines models you'll need to have an internet connection to download the model weights from huggingface. Depending on the backend you need to install the required dependencies (see [Outlines docs](https://dottxt-ai.github.io/outlines/latest/installation/))\n",
|
||||||
|
"\n",
|
||||||
|
"### Credentials\n",
|
||||||
|
"\n",
|
||||||
|
"There is no built-in auth mechanism for Outlines.\n",
|
||||||
|
"\n",
|
||||||
|
"### Installation\n",
|
||||||
|
"\n",
|
||||||
|
"The LangChain Outlines integration lives in the `langchain-community` package and requires the `outlines` library:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%pip install -qU langchain-community outlines"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Instantiation\n",
|
||||||
|
"\n",
|
||||||
|
"Now we can instantiate our model object and generate chat completions:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_community.chat_models.outlines import ChatOutlines\n",
|
||||||
|
"\n",
|
||||||
|
"# For llamacpp backend\n",
|
||||||
|
"model = ChatOutlines(model=\"TheBloke/phi-2-GGUF/phi-2.Q4_K_M.gguf\", backend=\"llamacpp\")\n",
|
||||||
|
"\n",
|
||||||
|
"# For vllm backend (not available on Mac)\n",
|
||||||
|
"model = ChatOutlines(model=\"meta-llama/Llama-3.2-1B\", backend=\"vllm\")\n",
|
||||||
|
"\n",
|
||||||
|
"# For mlxlm backend (only available on Mac)\n",
|
||||||
|
"model = ChatOutlines(model=\"mistralai/Ministral-8B-Instruct-2410\", backend=\"mlxlm\")\n",
|
||||||
|
"\n",
|
||||||
|
"# For huggingface transformers backend\n",
|
||||||
|
"model = ChatOutlines(model=\"microsoft/phi-2\") # defaults to transformers backend"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "2b4f3e15",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Invocation"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "62e0dbc3",
|
||||||
|
"metadata": {
|
||||||
|
"tags": []
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_core.messages import HumanMessage\n",
|
||||||
|
"\n",
|
||||||
|
"messages = [HumanMessage(content=\"What will the capital of mars be called?\")]\n",
|
||||||
|
"response = model.invoke(messages)\n",
|
||||||
|
"\n",
|
||||||
|
"response.content"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "18e2bfc0-7e78-4528-a73f-499ac150dca8",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Streaming\n",
|
||||||
|
"\n",
|
||||||
|
"ChatOutlines supports streaming of tokens:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"messages = [HumanMessage(content=\"Count to 10 in French:\")]\n",
|
||||||
|
"\n",
|
||||||
|
"for chunk in model.stream(messages):\n",
|
||||||
|
" print(chunk.content, end=\"\", flush=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "ccc3e2f6",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Chaining"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "5a032003",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||||
|
"\n",
|
||||||
|
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||||
|
" [\n",
|
||||||
|
" (\n",
|
||||||
|
" \"system\",\n",
|
||||||
|
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
|
||||||
|
" ),\n",
|
||||||
|
" (\"human\", \"{input}\"),\n",
|
||||||
|
" ]\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"chain = prompt | model\n",
|
||||||
|
"chain.invoke(\n",
|
||||||
|
" {\n",
|
||||||
|
" \"input_language\": \"English\",\n",
|
||||||
|
" \"output_language\": \"German\",\n",
|
||||||
|
" \"input\": \"I love programming.\",\n",
|
||||||
|
" }\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "d1ee55bc-ffc8-4cfa-801c-993953a08cfd",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Constrained Generation\n",
|
||||||
|
"\n",
|
||||||
|
"ChatOutlines allows you to apply various constraints to the generated output:\n",
|
||||||
|
"\n",
|
||||||
|
"### Regex Constraint"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.regex = r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\"\n",
|
||||||
|
"\n",
|
||||||
|
"response = model.invoke(\"What is the IP address of Google's DNS server?\")\n",
|
||||||
|
"\n",
|
||||||
|
"response.content"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "4a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Type Constraints"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "5a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.type_constraints = int\n",
|
||||||
|
"response = model.invoke(\"What is the answer to life, the universe, and everything?\")\n",
|
||||||
|
"\n",
|
||||||
|
"response.content"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "6a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Pydantic and JSON Schemas"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "7a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from pydantic import BaseModel\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"class Person(BaseModel):\n",
|
||||||
|
" name: str\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"model.json_schema = Person\n",
|
||||||
|
"response = model.invoke(\"Who are the main contributors to LangChain?\")\n",
|
||||||
|
"person = Person.model_validate_json(response.content)\n",
|
||||||
|
"\n",
|
||||||
|
"person"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "8a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Context Free Grammars"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "9a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.grammar = \"\"\"\n",
|
||||||
|
"?start: expression\n",
|
||||||
|
"?expression: term ((\"+\" | \"-\") term)*\n",
|
||||||
|
"?term: factor ((\"*\" | \"/\") factor)*\n",
|
||||||
|
"?factor: NUMBER | \"-\" factor | \"(\" expression \")\"\n",
|
||||||
|
"%import common.NUMBER\n",
|
||||||
|
"%import common.WS\n",
|
||||||
|
"%ignore WS\n",
|
||||||
|
"\"\"\"\n",
|
||||||
|
"response = model.invoke(\"Give me a complex arithmetic expression:\")\n",
|
||||||
|
"\n",
|
||||||
|
"response.content"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "aa5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## LangChain's Structured Output\n",
|
||||||
|
"\n",
|
||||||
|
"You can also use LangChain's Structured Output with ChatOutlines:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"id": "ba5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from pydantic import BaseModel\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"class AnswerWithJustification(BaseModel):\n",
|
||||||
|
" answer: str\n",
|
||||||
|
" justification: str\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"_model = model.with_structured_output(AnswerWithJustification)\n",
|
||||||
|
"result = _model.invoke(\"What weighs more, a pound of bricks or a pound of feathers?\")\n",
|
||||||
|
"\n",
|
||||||
|
"result"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "ca5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## API reference\n",
|
||||||
|
"\n",
|
||||||
|
"For detailed documentation of all ChatOutlines features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/outlines.chat_models.ChatOutlines.html\n",
|
||||||
|
"\n",
|
||||||
|
"## Full Outlines Documentation: \n",
|
||||||
|
"\n",
|
||||||
|
"https://dottxt-ai.github.io/outlines/latest/"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.9"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
268
docs/docs/integrations/llms/outlines.ipynb
Normal file
@ -0,0 +1,268 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Outlines\n",
|
||||||
|
"\n",
|
||||||
|
"This will help you getting started with Outlines LLM. For detailed documentation of all Outlines features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/llms/outlines.llms.Outlines.html).\n",
|
||||||
|
"\n",
|
||||||
|
"[Outlines](https://github.com/outlines-dev/outlines) is a library for constrained language generation. It allows you to use large language models (LLMs) with various backends while applying constraints to the generated output.\n",
|
||||||
|
"\n",
|
||||||
|
"## Overview\n",
|
||||||
|
"\n",
|
||||||
|
"### Integration details\n",
|
||||||
|
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
|
||||||
|
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
|
||||||
|
"| [Outlines](https://python.langchain.com/api_reference/community/llms/langchain_community.llms.outlines.Outlines.html) | [langchain-community](https://python.langchain.com/api_reference/community/index.html) | ✅ | beta | ❌ |  |  |\n",
|
||||||
|
"\n",
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"To access Outlines models you'll need to have an internet connection to download the model weights from huggingface. Depending on the backend you need to install the required dependencies (see [Outlines docs](https://dottxt-ai.github.io/outlines/latest/installation/))\n",
|
||||||
|
"\n",
|
||||||
|
"### Credentials\n",
|
||||||
|
"\n",
|
||||||
|
"There is no built-in auth mechanism for Outlines.\n",
|
||||||
|
"\n",
|
||||||
|
"## Installation\n",
|
||||||
|
"\n",
|
||||||
|
"The LangChain Outlines integration lives in the `langchain-community` package and requires the `outlines` library:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"vscode": {
|
||||||
|
"languageId": "shellscript"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%pip install -qU langchain-community outlines"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Instantiation\n",
|
||||||
|
"\n",
|
||||||
|
"Now we can instantiate our model object and generate chat completions:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_community.llms import Outlines\n",
|
||||||
|
"\n",
|
||||||
|
"# For use with llamacpp backend\n",
|
||||||
|
"model = Outlines(model=\"microsoft/Phi-3-mini-4k-instruct\", backend=\"llamacpp\")\n",
|
||||||
|
"\n",
|
||||||
|
"# For use with vllm backend (not available on Mac)\n",
|
||||||
|
"model = Outlines(model=\"microsoft/Phi-3-mini-4k-instruct\", backend=\"vllm\")\n",
|
||||||
|
"\n",
|
||||||
|
"# For use with mlxlm backend (only available on Mac)\n",
|
||||||
|
"model = Outlines(model=\"microsoft/Phi-3-mini-4k-instruct\", backend=\"mlxlm\")\n",
|
||||||
|
"\n",
|
||||||
|
"# For use with huggingface transformers backend\n",
|
||||||
|
"model = Outlines(\n",
|
||||||
|
" model=\"microsoft/Phi-3-mini-4k-instruct\"\n",
|
||||||
|
") # defaults to backend=\"transformers\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Invocation"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.invoke(\"Hello how are you?\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Chaining"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from langchain_core.prompts import PromptTemplate\n",
|
||||||
|
"\n",
|
||||||
|
"prompt = PromptTemplate.from_template(\"How to say {input} in {output_language}:\\n\")\n",
|
||||||
|
"\n",
|
||||||
|
"chain = prompt | model\n",
|
||||||
|
"chain.invoke(\n",
|
||||||
|
" {\n",
|
||||||
|
" \"output_language\": \"German\",\n",
|
||||||
|
" \"input\": \"I love programming.\",\n",
|
||||||
|
" }\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Streaming\n",
|
||||||
|
"\n",
|
||||||
|
"Outlines supports streaming of tokens:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"for chunk in model.stream(\"Count to 10 in French:\"):\n",
|
||||||
|
" print(chunk, end=\"\", flush=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Constrained Generation\n",
|
||||||
|
"\n",
|
||||||
|
"Outlines allows you to apply various constraints to the generated output:\n",
|
||||||
|
"\n",
|
||||||
|
"#### Regex Constraint"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.regex = r\"((25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d\\d?)\"\n",
|
||||||
|
"response = model.invoke(\"What is the IP address of Google's DNS server?\")\n",
|
||||||
|
"\n",
|
||||||
|
"response"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Type Constraints"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.type_constraints = int\n",
|
||||||
|
"response = model.invoke(\"What is the answer to life, the universe, and everything?\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### JSON Schema"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from pydantic import BaseModel\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"class Person(BaseModel):\n",
|
||||||
|
" name: str\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"model.json_schema = Person\n",
|
||||||
|
"response = model.invoke(\"Who is the author of LangChain?\")\n",
|
||||||
|
"person = Person.model_validate_json(response)\n",
|
||||||
|
"\n",
|
||||||
|
"person"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Grammar Constraint"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model.grammar = \"\"\"\n",
|
||||||
|
"?start: expression\n",
|
||||||
|
"?expression: term ((\"+\" | \"-\") term)\n",
|
||||||
|
"?term: factor ((\"\" | \"/\") factor)\n",
|
||||||
|
"?factor: NUMBER | \"-\" factor | \"(\" expression \")\"\n",
|
||||||
|
"%import common.NUMBER\n",
|
||||||
|
"%import common.WS\n",
|
||||||
|
"%ignore WS\n",
|
||||||
|
"\"\"\"\n",
|
||||||
|
"response = model.invoke(\"Give me a complex arithmetic expression:\")\n",
|
||||||
|
"\n",
|
||||||
|
"response"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## API reference\n",
|
||||||
|
"\n",
|
||||||
|
"For detailed documentation of all ChatOutlines features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/outlines.chat_models.ChatOutlines.html\n",
|
||||||
|
"\n",
|
||||||
|
"## Outlines Documentation: \n",
|
||||||
|
"\n",
|
||||||
|
"https://dottxt-ai.github.io/outlines/latest/"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": ".venv",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.9"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
201
docs/docs/integrations/providers/outlines.mdx
Normal file
@ -0,0 +1,201 @@
|
|||||||
|
# Outlines
|
||||||
|
|
||||||
|
>[Outlines](https://github.com/dottxt-ai/outlines) is a Python library for constrained language generation. It provides a unified interface to various language models and allows for structured generation using techniques like regex matching, type constraints, JSON schemas, and context-free grammars.
|
||||||
|
|
||||||
|
Outlines supports multiple backends, including:
|
||||||
|
- Hugging Face Transformers
|
||||||
|
- llama.cpp
|
||||||
|
- vLLM
|
||||||
|
- MLX
|
||||||
|
|
||||||
|
This integration allows you to use Outlines models with LangChain, providing both LLM and chat model interfaces.
|
||||||
|
|
||||||
|
## Installation and Setup
|
||||||
|
|
||||||
|
To use Outlines with LangChain, you'll need to install the Outlines library:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install outlines
|
||||||
|
```
|
||||||
|
|
||||||
|
Depending on the backend you choose, you may need to install additional dependencies:
|
||||||
|
|
||||||
|
- For Transformers: `pip install transformers torch datasets`
|
||||||
|
- For llama.cpp: `pip install llama-cpp-python`
|
||||||
|
- For vLLM: `pip install vllm`
|
||||||
|
- For MLX: `pip install mlx`
|
||||||
|
|
||||||
|
## LLM
|
||||||
|
|
||||||
|
To use Outlines as an LLM in LangChain, you can use the `Outlines` class:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from langchain_community.llms import Outlines
|
||||||
|
```
|
||||||
|
|
||||||
|
## Chat Models
|
||||||
|
|
||||||
|
To use Outlines as a chat model in LangChain, you can use the `ChatOutlines` class:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from langchain_community.chat_models import ChatOutlines
|
||||||
|
```
|
||||||
|
|
||||||
|
## Model Configuration
|
||||||
|
|
||||||
|
Both `Outlines` and `ChatOutlines` classes share similar configuration options:
|
||||||
|
|
||||||
|
```python
|
||||||
|
model = Outlines(
|
||||||
|
model="meta-llama/Llama-2-7b-chat-hf", # Model identifier
|
||||||
|
backend="transformers", # Backend to use (transformers, llamacpp, vllm, or mlxlm)
|
||||||
|
max_tokens=256, # Maximum number of tokens to generate
|
||||||
|
stop=["\n"], # Optional list of stop strings
|
||||||
|
streaming=True, # Whether to stream the output
|
||||||
|
# Additional parameters for structured generation:
|
||||||
|
regex=None,
|
||||||
|
type_constraints=None,
|
||||||
|
json_schema=None,
|
||||||
|
grammar=None,
|
||||||
|
# Additional model parameters:
|
||||||
|
model_kwargs={"temperature": 0.7}
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Model Identifier
|
||||||
|
|
||||||
|
The `model` parameter can be:
|
||||||
|
- A Hugging Face model name (e.g., "meta-llama/Llama-2-7b-chat-hf")
|
||||||
|
- A local path to a model
|
||||||
|
- For GGUF models, the format is "repo_id/file_name" (e.g., "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf")
|
||||||
|
|
||||||
|
### Backend Options
|
||||||
|
|
||||||
|
The `backend` parameter specifies which backend to use:
|
||||||
|
- `"transformers"`: For Hugging Face Transformers models (default)
|
||||||
|
- `"llamacpp"`: For GGUF models using llama.cpp
|
||||||
|
- `"transformers_vision"`: For vision-language models (e.g., LLaVA)
|
||||||
|
- `"vllm"`: For models using the vLLM library
|
||||||
|
- `"mlxlm"`: For models using the MLX framework
|
||||||
|
|
||||||
|
### Structured Generation
|
||||||
|
|
||||||
|
Outlines provides several methods for structured generation:
|
||||||
|
|
||||||
|
1. **Regex Matching**:
|
||||||
|
```python
|
||||||
|
model = Outlines(
|
||||||
|
model="meta-llama/Llama-2-7b-chat-hf",
|
||||||
|
regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
This will ensure the generated text matches the specified regex pattern (in this case, a valid IP address).
|
||||||
|
|
||||||
|
2. **Type Constraints**:
|
||||||
|
```python
|
||||||
|
model = Outlines(
|
||||||
|
model="meta-llama/Llama-2-7b-chat-hf",
|
||||||
|
type_constraints=int
|
||||||
|
)
|
||||||
|
```
|
||||||
|
This restricts the output to valid Python types (int, float, bool, datetime.date, datetime.time, datetime.datetime).
|
||||||
|
|
||||||
|
3. **JSON Schema**:
|
||||||
|
```python
|
||||||
|
from pydantic import BaseModel
|
||||||
|
|
||||||
|
class Person(BaseModel):
|
||||||
|
name: str
|
||||||
|
age: int
|
||||||
|
|
||||||
|
model = Outlines(
|
||||||
|
model="meta-llama/Llama-2-7b-chat-hf",
|
||||||
|
json_schema=Person
|
||||||
|
)
|
||||||
|
```
|
||||||
|
This ensures the generated output adheres to the specified JSON schema or Pydantic model.
|
||||||
|
|
||||||
|
4. **Context-Free Grammar**:
|
||||||
|
```python
|
||||||
|
model = Outlines(
|
||||||
|
model="meta-llama/Llama-2-7b-chat-hf",
|
||||||
|
grammar="""
|
||||||
|
?start: expression
|
||||||
|
?expression: term (("+" | "-") term)*
|
||||||
|
?term: factor (("*" | "/") factor)*
|
||||||
|
?factor: NUMBER | "-" factor | "(" expression ")"
|
||||||
|
%import common.NUMBER
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
```
|
||||||
|
This generates text that adheres to the specified context-free grammar in EBNF format.
|
||||||
|
|
||||||
|
## Usage Examples
|
||||||
|
|
||||||
|
### LLM Example
|
||||||
|
|
||||||
|
```python
|
||||||
|
from langchain_community.llms import Outlines
|
||||||
|
|
||||||
|
llm = Outlines(model="meta-llama/Llama-2-7b-chat-hf", max_tokens=100)
|
||||||
|
result = llm.invoke("Tell me a short story about a robot.")
|
||||||
|
print(result)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Chat Model Example
|
||||||
|
|
||||||
|
```python
|
||||||
|
from langchain_community.chat_models import ChatOutlines
|
||||||
|
from langchain_core.messages import HumanMessage, SystemMessage
|
||||||
|
|
||||||
|
chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", max_tokens=100)
|
||||||
|
messages = [
|
||||||
|
SystemMessage(content="You are a helpful AI assistant."),
|
||||||
|
HumanMessage(content="What's the capital of France?")
|
||||||
|
]
|
||||||
|
result = chat.invoke(messages)
|
||||||
|
print(result.content)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Streaming Example
|
||||||
|
|
||||||
|
```python
|
||||||
|
from langchain_community.chat_models import ChatOutlines
|
||||||
|
from langchain_core.messages import HumanMessage
|
||||||
|
|
||||||
|
chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf", streaming=True)
|
||||||
|
for chunk in chat.stream("Tell me a joke about programming."):
|
||||||
|
print(chunk.content, end="", flush=True)
|
||||||
|
print()
|
||||||
|
```
|
||||||
|
|
||||||
|
### Structured Output Example
|
||||||
|
|
||||||
|
```python
|
||||||
|
from langchain_community.llms import Outlines
|
||||||
|
from pydantic import BaseModel
|
||||||
|
|
||||||
|
class MovieReview(BaseModel):
|
||||||
|
title: str
|
||||||
|
rating: int
|
||||||
|
summary: str
|
||||||
|
|
||||||
|
llm = Outlines(
|
||||||
|
model="meta-llama/Llama-2-7b-chat-hf",
|
||||||
|
json_schema=MovieReview
|
||||||
|
)
|
||||||
|
result = llm.invoke("Write a short review for the movie 'Inception'.")
|
||||||
|
print(result)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Additional Features
|
||||||
|
|
||||||
|
### Tokenizer Access
|
||||||
|
|
||||||
|
You can access the underlying tokenizer for the model:
|
||||||
|
|
||||||
|
```python
|
||||||
|
tokenizer = llm.tokenizer
|
||||||
|
encoded = tokenizer.encode("Hello, world!")
|
||||||
|
decoded = tokenizer.decode(encoded)
|
||||||
|
```
|
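For context, a short sketch of how the tokenizer access described above might be used together with the wrapper (the model identifier is only an example, and the transformers backend is assumed):

```python
from langchain_community.llms import Outlines

# Defaults to the transformers backend
llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", max_tokens=64)

# The wrapper exposes the underlying tokenizer
tokenizer = llm.tokenizer
encoded = tokenizer.encode("Hello, world!")
print(tokenizer.decode(encoded))
```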
@ -55,6 +55,7 @@ openai<2
openapi-pydantic>=0.3.2,<0.4
oracle-ads>=2.9.1,<3
oracledb>=2.2.0,<3
outlines[test]>=0.1.0,<0.2
pandas>=2.0.1,<3
pdfminer-six>=20221105,<20240706
pgvector>=0.1.6,<0.2
@ -143,6 +143,7 @@ if TYPE_CHECKING:
    from langchain_community.chat_models.openai import (
        ChatOpenAI,
    )
    from langchain_community.chat_models.outlines import ChatOutlines
    from langchain_community.chat_models.pai_eas_endpoint import (
        PaiEasChatEndpoint,
    )
@ -228,6 +229,7 @@ __all__ = [
    "ChatOCIModelDeploymentTGI",
    "ChatOllama",
    "ChatOpenAI",
    "ChatOutlines",
    "ChatPerplexity",
    "ChatReka",
    "ChatPremAI",
@ -294,6 +296,7 @@ _module_lookup = {
    "ChatOCIModelDeploymentTGI": "langchain_community.chat_models.oci_data_science",
    "ChatOllama": "langchain_community.chat_models.ollama",
    "ChatOpenAI": "langchain_community.chat_models.openai",
    "ChatOutlines": "langchain_community.chat_models.outlines",
    "ChatReka": "langchain_community.chat_models.reka",
    "ChatPerplexity": "langchain_community.chat_models.perplexity",
    "ChatSambaNovaCloud": "langchain_community.chat_models.sambanova",
532
libs/community/langchain_community/chat_models/outlines.py
Normal file
@ -0,0 +1,532 @@
|
|||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import importlib.util
|
||||||
|
import platform
|
||||||
|
from collections.abc import AsyncIterator
|
||||||
|
from typing import (
|
||||||
|
Any,
|
||||||
|
Callable,
|
||||||
|
Dict,
|
||||||
|
Iterator,
|
||||||
|
List,
|
||||||
|
Optional,
|
||||||
|
Sequence,
|
||||||
|
Tuple,
|
||||||
|
Type,
|
||||||
|
TypedDict,
|
||||||
|
TypeVar,
|
||||||
|
Union,
|
||||||
|
get_origin,
|
||||||
|
)
|
||||||
|
|
||||||
|
from langchain_core.callbacks import CallbackManagerForLLMRun
|
||||||
|
from langchain_core.callbacks.manager import AsyncCallbackManagerForLLMRun
|
||||||
|
from langchain_core.language_models import LanguageModelInput
|
||||||
|
from langchain_core.language_models.chat_models import BaseChatModel
|
||||||
|
from langchain_core.messages import AIMessage, AIMessageChunk, BaseMessage
|
||||||
|
from langchain_core.output_parsers import JsonOutputParser, PydanticOutputParser
|
||||||
|
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
|
||||||
|
from langchain_core.runnables import Runnable
|
||||||
|
from langchain_core.tools import BaseTool
|
||||||
|
from langchain_core.utils.function_calling import convert_to_openai_tool
|
||||||
|
from pydantic import BaseModel, Field, model_validator
|
||||||
|
from typing_extensions import Literal
|
||||||
|
|
||||||
|
from langchain_community.adapters.openai import convert_message_to_dict
|
||||||
|
|
||||||
|
_BM = TypeVar("_BM", bound=BaseModel)
|
||||||
|
_DictOrPydanticClass = Union[Dict[str, Any], Type[_BM], Type]
|
||||||
|
|
||||||
|
|
||||||
|
class ChatOutlines(BaseChatModel):
|
||||||
|
"""Outlines chat model integration.
|
||||||
|
|
||||||
|
Setup:
|
||||||
|
pip install outlines
|
||||||
|
|
||||||
|
Key init args — client params:
|
||||||
|
backend: Literal["llamacpp", "transformers", "transformers_vision", "vllm", "mlxlm"] = "transformers"
|
||||||
|
Specifies the backend to use for the model.
|
||||||
|
|
||||||
|
Key init args — completion params:
|
||||||
|
model: str
|
||||||
|
Identifier for the model to use with Outlines.
|
||||||
|
max_tokens: int = 256
|
||||||
|
The maximum number of tokens to generate.
|
||||||
|
stop: Optional[List[str]] = None
|
||||||
|
A list of strings to stop generation when encountered.
|
||||||
|
streaming: bool = True
|
||||||
|
Whether to stream the results, token by token.
|
||||||
|
|
||||||
|
See full list of supported init args and their descriptions in the params section.
|
||||||
|
|
||||||
|
Instantiate:
|
||||||
|
from langchain_community.chat_models import ChatOutlines
|
||||||
|
chat = ChatOutlines(model="meta-llama/Llama-2-7b-chat-hf")
|
||||||
|
|
||||||
|
Invoke:
|
||||||
|
chat.invoke([HumanMessage(content="Say foo:")])
|
||||||
|
|
||||||
|
Stream:
|
||||||
|
for chunk in chat.stream([HumanMessage(content="Count to 10:")]):
|
||||||
|
print(chunk.content, end="", flush=True)
|
||||||
|
|
||||||
|
""" # noqa: E501
|
||||||
|
|
||||||
|
client: Any = None # :meta private:
|
||||||
|
|
||||||
|
model: str
|
||||||
|
"""Identifier for the model to use with Outlines.
|
||||||
|
|
||||||
|
The model identifier should be a string specifying:
|
||||||
|
- A Hugging Face model name (e.g., "meta-llama/Llama-2-7b-chat-hf")
|
||||||
|
- A local path to a model
|
||||||
|
- For GGUF models, the format is "repo_id/file_name"
|
||||||
|
(e.g., "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf")
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
- "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf"
|
||||||
|
- "meta-llama/Llama-2-7b-chat-hf"
|
||||||
|
"""
|
||||||
|
|
||||||
|
backend: Literal[
|
||||||
|
"llamacpp", "transformers", "transformers_vision", "vllm", "mlxlm"
|
||||||
|
] = "transformers"
|
||||||
|
"""Specifies the backend to use for the model.
|
||||||
|
|
||||||
|
Supported backends are:
|
||||||
|
- "llamacpp": For GGUF models using llama.cpp
|
||||||
|
- "transformers": For Hugging Face Transformers models (default)
|
||||||
|
- "transformers_vision": For vision-language models (e.g., LLaVA)
|
||||||
|
- "vllm": For models using the vLLM library
|
||||||
|
- "mlxlm": For models using the MLX framework
|
||||||
|
|
||||||
|
Note: Ensure you have the necessary dependencies installed for the chosen backend.
|
||||||
|
The system will attempt to import required packages and may raise an ImportError
|
||||||
|
if they are not available.
|
||||||
|
"""
|
||||||
|
|
||||||
|
max_tokens: int = 256
|
||||||
|
"""The maximum number of tokens to generate."""
|
||||||
|
|
||||||
|
stop: Optional[List[str]] = None
|
||||||
|
"""A list of strings to stop generation when encountered."""
|
||||||
|
|
||||||
|
streaming: bool = True
|
||||||
|
"""Whether to stream the results, token by token."""
|
||||||
|
|
||||||
|
regex: Optional[str] = None
|
||||||
|
"""Regular expression for structured generation.
|
||||||
|
|
||||||
|
If provided, Outlines will guarantee that the generated text matches this regex.
|
||||||
|
This can be useful for generating structured outputs like IP addresses, dates, etc.
|
||||||
|
|
||||||
|
Example: (valid IP address)
|
||||||
|
regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
|
||||||
|
|
||||||
|
Note: Computing the regex index can take some time, so it's recommended to reuse
|
||||||
|
the same regex for multiple generations if possible.
|
||||||
|
|
||||||
|
For more details, see: https://dottxt-ai.github.io/outlines/reference/generation/regex/
|
||||||
|
"""
|
||||||
|
|
||||||
|
type_constraints: Optional[Union[type, str]] = None
|
||||||
|
"""Type constraints for structured generation.
|
||||||
|
|
||||||
|
Restricts the output to valid Python types. Supported types include:
|
||||||
|
int, float, bool, datetime.date, datetime.time, datetime.datetime.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
type_constraints = int
|
||||||
|
|
||||||
|
For more details, see: https://dottxt-ai.github.io/outlines/reference/generation/format/
|
||||||
|
"""
|
||||||
|
|
||||||
|
json_schema: Optional[Union[Any, Dict, Callable]] = None
|
||||||
|
"""Pydantic model, JSON Schema, or callable (function signature)
|
||||||
|
for structured JSON generation.
|
||||||
|
|
||||||
|
Outlines can generate JSON output that follows a specified structure,
|
||||||
|
which is useful for:
|
||||||
|
1. Parsing the answer (e.g., with Pydantic), storing it, or returning it to a user.
|
||||||
|
2. Calling a function with the result.
|
||||||
|
|
||||||
|
You can provide:
|
||||||
|
- A Pydantic model
|
||||||
|
- A JSON Schema (as a Dict)
|
||||||
|
- A callable (function signature)
|
||||||
|
|
||||||
|
The generated JSON will adhere to the specified structure.
|
||||||
|
|
||||||
|
For more details, see: https://dottxt-ai.github.io/outlines/reference/generation/json/
|
||||||
|
"""
|
||||||
|
|
||||||
|
grammar: Optional[str] = None
|
||||||
|
"""Context-free grammar for structured generation.
|
||||||
|
|
||||||
|
If provided, Outlines will generate text that adheres to the specified grammar.
|
||||||
|
The grammar should be defined in EBNF format.
|
||||||
|
|
||||||
|
This can be useful for generating structured outputs like mathematical expressions,
|
||||||
|
programming languages, or custom domain-specific languages.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
grammar = '''
|
||||||
|
?start: expression
|
||||||
|
?expression: term (("+" | "-") term)*
|
||||||
|
?term: factor (("*" | "/") factor)*
|
||||||
|
?factor: NUMBER | "-" factor | "(" expression ")"
|
||||||
|
%import common.NUMBER
|
||||||
|
'''
|
||||||
|
|
||||||
|
Note: Grammar-based generation is currently experimental and may have performance
|
||||||
|
limitations. It uses greedy generation to mitigate these issues.
|
||||||
|
|
||||||
|
For more details and examples, see:
|
||||||
|
https://dottxt-ai.github.io/outlines/reference/generation/cfg/
|
||||||
|
"""
|
||||||
|
|
||||||
|
custom_generator: Optional[Any] = None
|
||||||
|
"""Set your own outlines generator object to override the default behavior."""
|
||||||
|
|
||||||
|
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||||
|
"""Additional parameters to pass to the underlying model.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
model_kwargs = {"temperature": 0.8, "seed": 42}
|
||||||
|
"""
|
||||||
|
|
||||||
|
@model_validator(mode="after")
|
||||||
|
def validate_environment(self) -> "ChatOutlines":
|
||||||
|
"""Validate that outlines is installed and create a model instance."""
|
||||||
|
num_constraints = sum(
|
||||||
|
[
|
||||||
|
bool(self.regex),
|
||||||
|
bool(self.type_constraints),
|
||||||
|
bool(self.json_schema),
|
||||||
|
bool(self.grammar),
|
||||||
|
]
|
||||||
|
)
|
||||||
|
if num_constraints > 1:
|
||||||
|
raise ValueError(
|
||||||
|
"Either none or exactly one of regex, type_constraints, "
|
||||||
|
"json_schema, or grammar can be provided."
|
||||||
|
)
|
||||||
|
return self.build_client()
|
||||||
|
|
||||||
|
def build_client(self) -> "ChatOutlines":
|
||||||
|
try:
|
||||||
|
import outlines.models as models
|
||||||
|
except ImportError:
|
||||||
|
raise ImportError(
|
||||||
|
"Could not import the Outlines library. "
|
||||||
|
"Please install it with `pip install outlines`."
|
||||||
|
)
|
||||||
|
|
||||||
|
def check_packages_installed(
|
||||||
|
packages: List[Union[str, Tuple[str, str]]],
|
||||||
|
) -> None:
|
||||||
|
missing_packages = [
|
||||||
|
pkg if isinstance(pkg, str) else pkg[0]
|
||||||
|
for pkg in packages
|
||||||
|
if importlib.util.find_spec(pkg[1] if isinstance(pkg, tuple) else pkg)
|
||||||
|
is None
|
||||||
|
]
|
||||||
|
if missing_packages:
|
||||||
|
raise ImportError(
|
||||||
|
f"Missing packages: {', '.join(missing_packages)}. "
|
||||||
|
"You can install them with:\n\n"
|
||||||
|
f" pip install {' '.join(missing_packages)}"
|
||||||
|
)
|
||||||
|
|
||||||
|
if self.backend == "llamacpp":
|
||||||
|
check_packages_installed([("llama-cpp-python", "llama_cpp")])
|
||||||
|
if ".gguf" in self.model:
|
||||||
|
creator, repo_name, file_name = self.model.split("/", 2)
|
||||||
|
repo_id = f"{creator}/{repo_name}"
|
||||||
|
else:
|
||||||
|
raise ValueError("GGUF file_name must be provided for llama.cpp.")
|
||||||
|
self.client = models.llamacpp(repo_id, file_name, **self.model_kwargs)
|
||||||
|
elif self.backend == "transformers":
|
||||||
|
check_packages_installed(["transformers", "torch", "datasets"])
|
||||||
|
self.client = models.transformers(
|
||||||
|
model_name=self.model, **self.model_kwargs
|
||||||
|
)
|
||||||
|
elif self.backend == "transformers_vision":
|
||||||
|
if hasattr(models, "transformers_vision"):
|
||||||
|
from transformers import LlavaNextForConditionalGeneration
|
||||||
|
|
||||||
|
self.client = models.transformers_vision(
|
||||||
|
self.model,
|
||||||
|
model_class=LlavaNextForConditionalGeneration,
|
||||||
|
**self.model_kwargs,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
raise ValueError("transformers_vision backend is not supported")
|
||||||
|
elif self.backend == "vllm":
|
||||||
|
if platform.system() == "Darwin":
|
||||||
|
raise ValueError("vLLM backend is not supported on macOS.")
|
||||||
|
check_packages_installed(["vllm"])
|
||||||
|
self.client = models.vllm(self.model, **self.model_kwargs)
|
||||||
|
elif self.backend == "mlxlm":
|
||||||
|
check_packages_installed(["mlx"])
|
||||||
|
self.client = models.mlxlm(self.model, **self.model_kwargs)
|
||||||
|
else:
|
||||||
|
raise ValueError(f"Unsupported backend: {self.backend}")
|
||||||
|
return self
|
||||||
|
|
||||||
|
@property
|
||||||
|
def _llm_type(self) -> str:
|
||||||
|
return "outlines-chat"
|
||||||
|
|
||||||
|
@property
|
||||||
|
def _default_params(self) -> Dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"max_tokens": self.max_tokens,
|
||||||
|
"stop_at": self.stop,
|
||||||
|
**self.model_kwargs,
|
||||||
|
}
|
||||||
|
|
||||||
|
@property
|
||||||
|
def _identifying_params(self) -> Dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"model": self.model,
|
||||||
|
"backend": self.backend,
|
||||||
|
"regex": self.regex,
|
||||||
|
"type_constraints": self.type_constraints,
|
||||||
|
"json_schema": self.json_schema,
|
||||||
|
"grammar": self.grammar,
|
||||||
|
**self._default_params,
|
||||||
|
}
|
||||||
|
|
||||||
|
@property
|
||||||
|
def _generator(self) -> Any:
|
||||||
|
from outlines import generate
|
||||||
|
|
||||||
|
if self.custom_generator:
|
||||||
|
return self.custom_generator
|
||||||
|
constraints = [
|
||||||
|
self.regex,
|
||||||
|
self.type_constraints,
|
||||||
|
self.json_schema,
|
||||||
|
self.grammar,
|
||||||
|
]
|
||||||
|
|
||||||
|
num_constraints = sum(constraint is not None for constraint in constraints)
|
||||||
|
if num_constraints != 1 and num_constraints != 0:
|
||||||
|
raise ValueError(
|
||||||
|
"Either none or exactly one of regex, type_constraints, "
|
||||||
|
"json_schema, or grammar can be provided."
|
||||||
|
)
|
||||||
|
if self.regex:
|
||||||
|
return generate.regex(self.client, regex_str=self.regex)
|
||||||
|
if self.type_constraints:
|
||||||
|
return generate.format(self.client, python_type=self.type_constraints)
|
||||||
|
if self.json_schema:
|
||||||
|
return generate.json(self.client, schema_object=self.json_schema)
|
||||||
|
if self.grammar:
|
||||||
|
return generate.cfg(self.client, cfg_str=self.grammar)
|
||||||
|
return generate.text(self.client)
|
||||||
|
|
||||||
|
def _convert_messages_to_openai_format(
|
||||||
|
self, messages: list[BaseMessage]
|
||||||
|
) -> list[dict]:
|
||||||
|
return [convert_message_to_dict(message) for message in messages]
|
||||||
|
|
||||||
|
def _convert_messages_to_prompt(self, messages: list[BaseMessage]) -> str:
|
||||||
|
"""Convert a list of messages to a single prompt."""
|
||||||
|
if self.backend == "llamacpp": # get base_model_name from gguf repo_id
|
||||||
|
from huggingface_hub import ModelCard
|
||||||
|
|
||||||
|
repo_creator, gguf_repo_name, file_name = self.model.split("/")
|
||||||
|
model_card = ModelCard.load(f"{repo_creator}/{gguf_repo_name}")
|
||||||
|
if hasattr(model_card.data, "base_model"):
|
||||||
|
model_name = model_card.data.base_model
|
||||||
|
else:
|
||||||
|
raise ValueError(f"Base model name not found for {self.model}")
|
||||||
|
else:
|
||||||
|
model_name = self.model
|
||||||
|
|
||||||
|
from transformers import AutoTokenizer
|
||||||
|
|
||||||
|
return AutoTokenizer.from_pretrained(model_name).apply_chat_template(
|
||||||
|
self._convert_messages_to_openai_format(messages),
|
||||||
|
tokenize=False,
|
||||||
|
add_generation_prompt=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
def bind_tools(
|
||||||
|
self,
|
||||||
|
tools: Sequence[Dict[str, Any] | type | Callable[..., Any] | BaseTool],
|
||||||
|
*,
|
||||||
|
tool_choice: Optional[Union[Dict, bool, str]] = None,
|
||||||
|
**kwargs: Any,
|
||||||
|
) -> Runnable[LanguageModelInput, BaseMessage]:
|
||||||
|
"""Bind tool-like objects to this chat model
|
||||||
|
|
||||||
|
tool_choice: does not currently support "any" or "auto" choices as in the
OpenAI tool-calling API. To force a specific tool, pass a dict of the form
{"type": "function", "function": {"name": <<tool_name>>}}.
|
||||||
|
"""
|
||||||
|
formatted_tools = [convert_to_openai_tool(tool) for tool in tools]
|
||||||
|
tool_names = [ft["function"]["name"] for ft in formatted_tools]
|
||||||
|
if tool_choice:
|
||||||
|
if isinstance(tool_choice, dict):
|
||||||
|
if not any(
|
||||||
|
tool_choice["function"]["name"] == name for name in tool_names
|
||||||
|
):
|
||||||
|
raise ValueError(
|
||||||
|
f"Tool choice {tool_choice=} was specified, but the only "
|
||||||
|
f"provided tools were {tool_names}."
|
||||||
|
)
|
||||||
|
elif isinstance(tool_choice, str):
|
||||||
|
chosen = [
|
||||||
|
f for f in formatted_tools if f["function"]["name"] == tool_choice
|
||||||
|
]
|
||||||
|
if not chosen:
|
||||||
|
raise ValueError(
|
||||||
|
f"Tool choice {tool_choice=} was specified, but the only "
|
||||||
|
f"provided tools were {tool_names}."
|
||||||
|
)
|
||||||
|
elif isinstance(tool_choice, bool):
|
||||||
|
if len(formatted_tools) > 1:
|
||||||
|
raise ValueError(
|
||||||
|
"tool_choice=True can only be specified when a single tool is "
|
||||||
|
f"passed in. Received {len(tools)} tools."
|
||||||
|
)
|
||||||
|
tool_choice = formatted_tools[0]
|
||||||
|
|
||||||
|
kwargs["tool_choice"] = tool_choice
|
||||||
|
formatted_tools = [convert_to_openai_tool(tool) for tool in tools]
|
||||||
|
return super().bind_tools(tools=formatted_tools, **kwargs)
|
||||||
|
|
||||||
|
def with_structured_output(
|
||||||
|
self,
|
||||||
|
schema: Optional[_DictOrPydanticClass],
|
||||||
|
*,
|
||||||
|
include_raw: bool = False,
|
||||||
|
**kwargs: Any,
|
||||||
|
) -> Runnable[LanguageModelInput, Union[dict, BaseModel]]:
|
||||||
|
if get_origin(schema) is TypedDict:
|
||||||
|
raise NotImplementedError("TypedDict is not supported yet by Outlines")
|
||||||
|
|
||||||
|
self.json_schema = schema
|
||||||
|
|
||||||
|
if isinstance(schema, type) and issubclass(schema, BaseModel):
|
||||||
|
parser: Union[PydanticOutputParser, JsonOutputParser] = (
|
||||||
|
PydanticOutputParser(pydantic_object=schema)
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
parser = JsonOutputParser()
|
||||||
|
|
||||||
|
if include_raw: # TODO
|
||||||
|
raise NotImplementedError("include_raw is not yet supported")
|
||||||
|
|
||||||
|
return self | parser
|
||||||
|
|
||||||
|
def _generate(
|
||||||
|
self,
|
||||||
|
messages: List[BaseMessage],
|
||||||
|
stop: Optional[List[str]] = None,
|
||||||
|
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||||
|
**kwargs: Any,
|
||||||
|
) -> ChatResult:
|
||||||
|
params = {**self._default_params, **kwargs}
|
||||||
|
if stop:
|
||||||
|
params["stop_at"] = stop
|
||||||
|
|
||||||
|
prompt = self._convert_messages_to_prompt(messages)
|
||||||
|
|
||||||
|
response = ""
|
||||||
|
if self.streaming:
|
||||||
|
for chunk in self._stream(
|
||||||
|
messages=messages,
|
||||||
|
stop=stop,
|
||||||
|
run_manager=run_manager,
|
||||||
|
**kwargs,
|
||||||
|
):
|
||||||
|
if isinstance(chunk.message.content, str):
|
||||||
|
response += chunk.message.content
|
||||||
|
else:
|
||||||
|
raise ValueError(
|
||||||
|
"Invalid content type, only str is supported, "
|
||||||
|
f"got {type(chunk.message.content)}"
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
response = self._generator(prompt, **params)
|
||||||
|
|
||||||
|
message = AIMessage(content=response)
|
||||||
|
generation = ChatGeneration(message=message)
|
||||||
|
return ChatResult(generations=[generation])
|
||||||
|
|
||||||
|
def _stream(
|
||||||
|
self,
|
||||||
|
messages: List[BaseMessage],
|
||||||
|
stop: Optional[List[str]] = None,
|
||||||
|
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||||
|
**kwargs: Any,
|
||||||
|
) -> Iterator[ChatGenerationChunk]:
|
||||||
|
params = {**self._default_params, **kwargs}
|
||||||
|
if stop:
|
||||||
|
params["stop_at"] = stop
|
||||||
|
|
||||||
|
prompt = self._convert_messages_to_prompt(messages)
|
||||||
|
|
||||||
|
for token in self._generator.stream(prompt, **params):
|
||||||
|
if run_manager:
|
||||||
|
run_manager.on_llm_new_token(token)
|
||||||
|
message_chunk = AIMessageChunk(content=token)
|
||||||
|
chunk = ChatGenerationChunk(message=message_chunk)
|
||||||
|
yield chunk
|
||||||
|
|
||||||
|
async def _agenerate(
|
||||||
|
self,
|
||||||
|
messages: List[BaseMessage],
|
||||||
|
stop: List[str] | None = None,
|
||||||
|
run_manager: AsyncCallbackManagerForLLMRun | None = None,
|
||||||
|
**kwargs: Any,
|
||||||
|
) -> ChatResult:
|
||||||
|
if hasattr(self._generator, "agenerate"):
|
||||||
|
params = {**self._default_params, **kwargs}
|
||||||
|
if stop:
|
||||||
|
params["stop_at"] = stop
|
||||||
|
|
||||||
|
prompt = self._convert_messages_to_prompt(messages)
|
||||||
|
response = await self._generator.agenerate(prompt, **params)
|
||||||
|
|
||||||
|
message = AIMessage(content=response)
|
||||||
|
generation = ChatGeneration(message=message)
|
||||||
|
return ChatResult(generations=[generation])
|
||||||
|
elif self.streaming:
|
||||||
|
response = ""
|
||||||
|
async for chunk in self._astream(messages, stop, run_manager, **kwargs):
|
||||||
|
response += chunk.message.content or ""
|
||||||
|
message = AIMessage(content=response)
|
||||||
|
generation = ChatGeneration(message=message)
|
||||||
|
return ChatResult(generations=[generation])
|
||||||
|
else:
|
||||||
|
return await super()._agenerate(messages, stop, run_manager, **kwargs)
|
||||||
|
|
||||||
|
async def _astream(
|
||||||
|
self,
|
||||||
|
messages: List[BaseMessage],
|
||||||
|
stop: List[str] | None = None,
|
||||||
|
run_manager: AsyncCallbackManagerForLLMRun | None = None,
|
||||||
|
**kwargs: Any,
|
||||||
|
) -> AsyncIterator[ChatGenerationChunk]:
|
||||||
|
if hasattr(self._generator, "astream"):
|
||||||
|
params = {**self._default_params, **kwargs}
|
||||||
|
if stop:
|
||||||
|
params["stop_at"] = stop
|
||||||
|
|
||||||
|
prompt = self._convert_messages_to_prompt(messages)
|
||||||
|
|
||||||
|
async for token in self._generator.astream(prompt, **params):
|
||||||
|
if run_manager:
|
||||||
|
await run_manager.on_llm_new_token(token)
|
||||||
|
message_chunk = AIMessageChunk(content=token)
|
||||||
|
chunk = ChatGenerationChunk(message=message_chunk)
|
||||||
|
yield chunk
|
||||||
|
else:
|
||||||
|
async for chunk in super()._astream(messages, stop, run_manager, **kwargs):
|
||||||
|
yield chunk
|
@ -458,6 +458,12 @@ def _import_openlm() -> Type[BaseLLM]:
    return OpenLM


def _import_outlines() -> Type[BaseLLM]:
    from langchain_community.llms.outlines import Outlines

    return Outlines


def _import_pai_eas_endpoint() -> Type[BaseLLM]:
    from langchain_community.llms.pai_eas_endpoint import PaiEasEndpoint
@ -807,6 +813,8 @@ def __getattr__(name: str) -> Any:
        return _import_openllm()
    elif name == "OpenLM":
        return _import_openlm()
    elif name == "Outlines":
        return _import_outlines()
    elif name == "PaiEasEndpoint":
        return _import_pai_eas_endpoint()
    elif name == "Petals":
@ -954,6 +962,7 @@ __all__ = [
    "OpenAIChat",
    "OpenLLM",
    "OpenLM",
    "Outlines",
    "PaiEasEndpoint",
    "Petals",
    "PipelineAI",
@ -1076,6 +1085,7 @@ def get_type_to_cls_dict() -> Dict[str, Callable[[], Type[BaseLLM]]]:
|
|||||||
"vertexai_model_garden": _import_vertex_model_garden,
|
"vertexai_model_garden": _import_vertex_model_garden,
|
||||||
"openllm": _import_openllm,
|
"openllm": _import_openllm,
|
||||||
"openllm_client": _import_openllm,
|
"openllm_client": _import_openllm,
|
||||||
|
"outlines": _import_outlines,
|
||||||
"vllm": _import_vllm,
|
"vllm": _import_vllm,
|
||||||
"vllm_openai": _import_vllm_openai,
|
"vllm_openai": _import_vllm_openai,
|
||||||
"watsonxllm": _import_watsonxllm,
|
"watsonxllm": _import_watsonxllm,
|
||||||
|
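These hunks register the wrapper in the lazy-import machinery of langchain_community.llms. A small sketch of what they enable (resolving the class needs no outlines install, since the heavy imports happen lazily at model-build time):

from langchain_community.llms import Outlines, get_type_to_cls_dict

# __getattr__ resolves "Outlines" and the registry maps "outlines" to the
# same class object.
assert get_type_to_cls_dict()["outlines"]() is Outlines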
314
libs/community/langchain_community/llms/outlines.py
Normal file
@@ -0,0 +1,314 @@
from __future__ import annotations

import importlib.util
import logging
import platform
from typing import Any, Callable, Dict, Iterator, List, Literal, Optional, Tuple, Union

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
from pydantic import BaseModel, Field, model_validator

logger = logging.getLogger(__name__)


class Outlines(LLM):
    """LLM wrapper for the Outlines library."""

    client: Any = None  # :meta private:

    model: str
    """Identifier for the model to use with Outlines.

    The model identifier should be a string specifying:
    - A Hugging Face model name (e.g., "meta-llama/Llama-2-7b-chat-hf")
    - A local path to a model
    - For GGUF models, the format is "repo_id/file_name"
      (e.g., "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf")

    Examples:
    - "TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf"
    - "meta-llama/Llama-2-7b-chat-hf"
    """

    backend: Literal[
        "llamacpp", "transformers", "transformers_vision", "vllm", "mlxlm"
    ] = "transformers"
    """Specifies the backend to use for the model.

    Supported backends are:
    - "llamacpp": For GGUF models using llama.cpp
    - "transformers": For Hugging Face Transformers models (default)
    - "transformers_vision": For vision-language models (e.g., LLaVA)
    - "vllm": For models using the vLLM library
    - "mlxlm": For models using the MLX framework

    Note: Ensure you have the necessary dependencies installed for the chosen backend.
    The system will attempt to import required packages and may raise an ImportError
    if they are not available.
    """

    max_tokens: int = 256
    """The maximum number of tokens to generate."""

    stop: Optional[List[str]] = None
    """A list of strings to stop generation when encountered."""

    streaming: bool = True
    """Whether to stream the results, token by token."""

    regex: Optional[str] = None
    """Regular expression for structured generation.

    If provided, Outlines will guarantee that the generated text matches this regex.
    This can be useful for generating structured outputs like IP addresses, dates, etc.

    Example (valid IP address):
    regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"

    Note: Computing the regex index can take some time, so it's recommended to reuse
    the same regex for multiple generations if possible.

    For more details, see: https://dottxt-ai.github.io/outlines/reference/generation/regex/
    """

    type_constraints: Optional[Union[type, str]] = None
    """Type constraints for structured generation.

    Restricts the output to valid Python types. Supported types include:
    int, float, bool, datetime.date, datetime.time, datetime.datetime.

    Example:
    type_constraints = int

    For more details, see: https://dottxt-ai.github.io/outlines/reference/generation/format/
    """

    json_schema: Optional[Union[BaseModel, Dict, Callable]] = None
    """Pydantic model, JSON Schema, or callable (function signature)
    for structured JSON generation.

    Outlines can generate JSON output that follows a specified structure,
    which is useful for:
    1. Parsing the answer (e.g., with Pydantic), storing it, or returning it to a user.
    2. Calling a function with the result.

    You can provide:
    - A Pydantic model
    - A JSON Schema (as a Dict)
    - A callable (function signature)

    The generated JSON will adhere to the specified structure.

    For more details, see: https://dottxt-ai.github.io/outlines/reference/generation/json/
    """

    grammar: Optional[str] = None
    """Context-free grammar for structured generation.

    If provided, Outlines will generate text that adheres to the specified grammar.
    The grammar should be defined in EBNF format.

    This can be useful for generating structured outputs like mathematical expressions,
    programming languages, or custom domain-specific languages.

    Example:
    grammar = '''
        ?start: expression
        ?expression: term (("+" | "-") term)*
        ?term: factor (("*" | "/") factor)*
        ?factor: NUMBER | "-" factor | "(" expression ")"
        %import common.NUMBER
    '''

    Note: Grammar-based generation is currently experimental and may have performance
    limitations. It uses greedy generation to mitigate these issues.

    For more details and examples, see:
    https://dottxt-ai.github.io/outlines/reference/generation/cfg/
    """

    custom_generator: Optional[Any] = None
    """Set your own outlines generator object to override the default behavior."""

    model_kwargs: Dict[str, Any] = Field(default_factory=dict)
    """Additional parameters to pass to the underlying model.

    Example:
    model_kwargs = {"temperature": 0.8, "seed": 42}
    """

    @model_validator(mode="after")
    def validate_environment(self) -> "Outlines":
        """Validate that outlines is installed and create a model instance."""
        num_constraints = sum(
            [
                bool(self.regex),
                bool(self.type_constraints),
                bool(self.json_schema),
                bool(self.grammar),
            ]
        )
        if num_constraints > 1:
            raise ValueError(
                "Either none or exactly one of regex, type_constraints, "
                "json_schema, or grammar can be provided."
            )
        return self.build_client()

    def build_client(self) -> "Outlines":
        try:
            import outlines.models as models
        except ImportError:
            raise ImportError(
                "Could not import the Outlines library. "
                "Please install it with `pip install outlines`."
            )

        def check_packages_installed(
            packages: List[Union[str, Tuple[str, str]]],
        ) -> None:
            missing_packages = [
                pkg if isinstance(pkg, str) else pkg[0]
                for pkg in packages
                if importlib.util.find_spec(pkg[1] if isinstance(pkg, tuple) else pkg)
                is None
            ]
            if missing_packages:
                raise ImportError(  # todo this is displaying wrong
                    f"Missing packages: {', '.join(missing_packages)}. "
                    "You can install them with:\n\n"
                    f"    pip install {' '.join(missing_packages)}"
                )

        if self.backend == "llamacpp":
            if ".gguf" in self.model:
                creator, repo_name, file_name = self.model.split("/", 2)
                repo_id = f"{creator}/{repo_name}"
            else:  # todo add auto-file-selection if no file is given
                raise ValueError("GGUF file_name must be provided for llama.cpp.")
            check_packages_installed([("llama-cpp-python", "llama_cpp")])
            self.client = models.llamacpp(repo_id, file_name, **self.model_kwargs)
        elif self.backend == "transformers":
            check_packages_installed(["transformers", "torch", "datasets"])
            self.client = models.transformers(self.model, **self.model_kwargs)
        elif self.backend == "transformers_vision":
            check_packages_installed(
                ["transformers", "datasets", "torchvision", "PIL", "flash_attn"]
            )
            from transformers import LlavaNextForConditionalGeneration

            if not hasattr(models, "transformers_vision"):
                raise ValueError(
                    "transformers_vision backend is not supported, "
                    "please install the correct outlines version."
                )
            self.client = models.transformers_vision(
                self.model,
                model_class=LlavaNextForConditionalGeneration,
                **self.model_kwargs,
            )
        elif self.backend == "vllm":
            if platform.system() == "Darwin":
                raise ValueError("vLLM backend is not supported on macOS.")
            check_packages_installed(["vllm"])
            self.client = models.vllm(self.model, **self.model_kwargs)
        elif self.backend == "mlxlm":
            check_packages_installed(["mlx"])
            self.client = models.mlxlm(self.model, **self.model_kwargs)
        else:
            raise ValueError(f"Unsupported backend: {self.backend}")

        return self

    @property
    def _llm_type(self) -> str:
        return "outlines"

    @property
    def _default_params(self) -> Dict[str, Any]:
        return {
            "max_tokens": self.max_tokens,
            "stop_at": self.stop,
            **self.model_kwargs,
        }

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        return {
            "model": self.model,
            "backend": self.backend,
            "regex": self.regex,
            "type_constraints": self.type_constraints,
            "json_schema": self.json_schema,
            "grammar": self.grammar,
            **self._default_params,
        }

    @property
    def _generator(self) -> Any:
        from outlines import generate

        if self.custom_generator:
            return self.custom_generator
        if self.regex:
            return generate.regex(self.client, regex_str=self.regex)
        if self.type_constraints:
            return generate.format(self.client, python_type=self.type_constraints)
        if self.json_schema:
            return generate.json(self.client, schema_object=self.json_schema)
        if self.grammar:
            return generate.cfg(self.client, cfg_str=self.grammar)
        return generate.text(self.client)

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        params = {**self._default_params, **kwargs}
        if stop:
            params["stop_at"] = stop

        response = ""
        if self.streaming:
            for chunk in self._stream(
                prompt=prompt,
                stop=params["stop_at"],
                run_manager=run_manager,
                **params,
            ):
                response += chunk.text
        else:
            response = self._generator(prompt, **params)
        return response

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        params = {**self._default_params, **kwargs}
        if stop:
            params["stop_at"] = stop

        for token in self._generator.stream(prompt, **params):
            if run_manager:
                run_manager.on_llm_new_token(token)
            yield GenerationChunk(text=token)

    @property
    def tokenizer(self) -> Any:
        """Access the tokenizer for the underlying model.

        .encode() to tokenize text.
        .decode() to convert tokens back to text.
        """
        if hasattr(self.client, "tokenizer"):
            return self.client.tokenizer
        raise ValueError("Tokenizer not found")
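For reference, a minimal sketch of using the wrapper above with a regex constraint. The model name, prompt, and regex are taken from the integration tests below and are illustrative; the transformers backend dependencies are assumed to be installed.

from langchain_community.llms.outlines import Outlines

llm = Outlines(
    model="microsoft/Phi-3-mini-4k-instruct",
    backend="transformers",
    regex=r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
    max_tokens=30,
)
# The validator builds the client eagerly; _generator then compiles the regex
# index once, and every invoke() is guaranteed to match it.
print(llm.invoke("Q: What is the IP address of Google's DNS server?\n\nA: "))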
@@ -0,0 +1,177 @@
# flake8: noqa
"""Test ChatOutlines wrapper."""

from typing import Generator
import re
import platform
import pytest

from langchain_community.chat_models.outlines import ChatOutlines
from langchain_core.messages import AIMessage, HumanMessage, BaseMessage
from langchain_core.messages import BaseMessageChunk
from pydantic import BaseModel

from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler


MODEL = "microsoft/Phi-3-mini-4k-instruct"
LLAMACPP_MODEL = "bartowski/qwen2.5-7b-ins-v3-GGUF/qwen2.5-7b-ins-v3-Q4_K_M.gguf"

BACKENDS = ["transformers", "llamacpp"]
if platform.system() != "Darwin":
    BACKENDS.append("vllm")
if platform.system() == "Darwin":
    BACKENDS.append("mlxlm")


@pytest.fixture(params=BACKENDS)
def chat_model(request: pytest.FixtureRequest) -> ChatOutlines:
    if request.param == "llamacpp":
        return ChatOutlines(model=LLAMACPP_MODEL, backend=request.param)
    else:
        return ChatOutlines(model=MODEL, backend=request.param)


def test_chat_outlines_inference(chat_model: ChatOutlines) -> None:
    """Test valid ChatOutlines inference."""
    messages = [HumanMessage(content="Say foo:")]
    output = chat_model.invoke(messages)
    assert isinstance(output, AIMessage)
    assert len(output.content) > 1


def test_chat_outlines_streaming(chat_model: ChatOutlines) -> None:
    """Test streaming tokens from ChatOutlines."""
    messages = [HumanMessage(content="How do you say 'hello' in Spanish?")]
    generator = chat_model.stream(messages)
    stream_results_string = ""
    assert isinstance(generator, Generator)

    for chunk in generator:
        assert isinstance(chunk, BaseMessageChunk)
        if isinstance(chunk.content, str):
            stream_results_string += chunk.content
        else:
            raise ValueError(
                f"Invalid content type, only str is supported, "
                f"got {type(chunk.content)}"
            )
    assert len(stream_results_string.strip()) > 1


def test_chat_outlines_streaming_callback(chat_model: ChatOutlines) -> None:
    """Test that streaming correctly invokes on_llm_new_token callback."""
    MIN_CHUNKS = 5
    callback_handler = FakeCallbackHandler()
    chat_model.callbacks = [callback_handler]
    chat_model.verbose = True
    messages = [HumanMessage(content="Can you count to 10?")]
    chat_model.invoke(messages)
    assert callback_handler.llm_streams >= MIN_CHUNKS


def test_chat_outlines_regex(chat_model: ChatOutlines) -> None:
    """Test regex for generating a valid IP address"""
    ip_regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
    chat_model.regex = ip_regex
    assert chat_model.regex == ip_regex

    messages = [HumanMessage(content="What is the IP address of Google's DNS server?")]
    output = chat_model.invoke(messages)

    assert isinstance(output, AIMessage)
    assert re.match(
        ip_regex, str(output.content)
    ), f"Generated output '{output.content}' is not a valid IP address"


def test_chat_outlines_type_constraints(chat_model: ChatOutlines) -> None:
    """Test type constraints for generating an integer"""
    chat_model.type_constraints = int
    messages = [
        HumanMessage(
            content="What is the answer to life, the universe, and everything?"
        )
    ]
    output = chat_model.invoke(messages)
    assert isinstance(int(str(output.content)), int)


def test_chat_outlines_json(chat_model: ChatOutlines) -> None:
    """Test json for generating a valid JSON object"""

    class Person(BaseModel):
        name: str

    chat_model.json_schema = Person
    messages = [HumanMessage(content="Who are the main contributors to LangChain?")]
    output = chat_model.invoke(messages)
    person = Person.model_validate_json(str(output.content))
    assert isinstance(person, Person)


def test_chat_outlines_grammar(chat_model: ChatOutlines) -> None:
    """Test grammar for generating a valid arithmetic expression"""
    if chat_model.backend == "mlxlm":
        pytest.skip("MLX grammars not yet supported.")

    chat_model.grammar = """
        ?start: expression
        ?expression: term (("+" | "-") term)*
        ?term: factor (("*" | "/") factor)*
        ?factor: NUMBER | "-" factor | "(" expression ")"
        %import common.NUMBER
        %import common.WS
        %ignore WS
    """

    messages = [HumanMessage(content="Give me a complex arithmetic expression:")]
    output = chat_model.invoke(messages)

    # Validate the output is a non-empty string
    assert (
        isinstance(output.content, str) and output.content.strip()
    ), "Output should be a non-empty string"

    # Use a simple regex to check if the output contains basic arithmetic operations and numbers
    assert re.search(
        r"[\d\+\-\*/\(\)]+", output.content
    ), f"Generated output '{output.content}' does not appear to be a valid arithmetic expression"


def test_chat_outlines_with_structured_output(chat_model: ChatOutlines) -> None:
    """Test that ChatOutlines can generate structured outputs"""

    class AnswerWithJustification(BaseModel):
        """An answer to the user question along with justification for the answer."""

        answer: str
        justification: str

    structured_chat_model = chat_model.with_structured_output(AnswerWithJustification)

    result = structured_chat_model.invoke(
        "What weighs more, a pound of bricks or a pound of feathers?"
    )

    assert isinstance(result, AnswerWithJustification)
    assert isinstance(result.answer, str)
    assert isinstance(result.justification, str)
    assert len(result.answer) > 0
    assert len(result.justification) > 0

    structured_chat_model_with_raw = chat_model.with_structured_output(
        AnswerWithJustification, include_raw=True
    )

    result_with_raw = structured_chat_model_with_raw.invoke(
        "What weighs more, a pound of bricks or a pound of feathers?"
    )

    assert isinstance(result_with_raw, dict)
    assert "raw" in result_with_raw
    assert "parsed" in result_with_raw
    assert "parsing_error" in result_with_raw
    assert isinstance(result_with_raw["raw"], BaseMessage)
    assert isinstance(result_with_raw["parsed"], AnswerWithJustification)
    assert result_with_raw["parsing_error"] is None
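A condensed end-user sketch of the structured-output flow exercised by the test above. The schema and question mirror the test; the model name is illustrative and any Pydantic model would work.

from pydantic import BaseModel

from langchain_community.chat_models.outlines import ChatOutlines


class AnswerWithJustification(BaseModel):
    answer: str
    justification: str


chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct")
# with_structured_output constrains generation to the schema and parses the
# result back into the Pydantic model.
structured_chat = chat.with_structured_output(AnswerWithJustification)
result = structured_chat.invoke(
    "What weighs more, a pound of bricks or a pound of feathers?"
)
print(result.answer, "-", result.justification)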
123
libs/community/tests/integration_tests/llms/test_outlines.py
Normal file
@@ -0,0 +1,123 @@
# flake8: noqa
"""Test Outlines wrapper."""

from typing import Generator
import re
import platform
import pytest

from langchain_community.llms.outlines import Outlines
from pydantic import BaseModel

from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler


MODEL = "microsoft/Phi-3-mini-4k-instruct"
LLAMACPP_MODEL = "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf"

BACKENDS = ["transformers", "llamacpp"]
if platform.system() != "Darwin":
    BACKENDS.append("vllm")
if platform.system() == "Darwin":
    BACKENDS.append("mlxlm")


@pytest.fixture(params=BACKENDS)
def llm(request: pytest.FixtureRequest) -> Outlines:
    if request.param == "llamacpp":
        return Outlines(model=LLAMACPP_MODEL, backend=request.param, max_tokens=100)
    else:
        return Outlines(model=MODEL, backend=request.param, max_tokens=100)


def test_outlines_inference(llm: Outlines) -> None:
    """Test valid outlines inference."""
    output = llm.invoke("Say foo:")
    assert isinstance(output, str)
    assert len(output) > 1


def test_outlines_streaming(llm: Outlines) -> None:
    """Test streaming tokens from Outlines."""
    generator = llm.stream("Q: How do you say 'hello' in Spanish?\n\nA:")
    stream_results_string = ""
    assert isinstance(generator, Generator)

    for chunk in generator:
        print(chunk)
        assert isinstance(chunk, str)
        stream_results_string += chunk
    print(stream_results_string)
    assert len(stream_results_string.strip()) > 1


def test_outlines_streaming_callback(llm: Outlines) -> None:
    """Test that streaming correctly invokes on_llm_new_token callback."""
    MIN_CHUNKS = 5

    callback_handler = FakeCallbackHandler()
    llm.callbacks = [callback_handler]
    llm.verbose = True
    llm.invoke("Q: Can you count to 10? A:'1, ")
    assert callback_handler.llm_streams >= MIN_CHUNKS


def test_outlines_regex(llm: Outlines) -> None:
    """Test regex for generating a valid IP address"""
    ip_regex = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
    llm.regex = ip_regex
    assert llm.regex == ip_regex

    output = llm.invoke("Q: What is the IP address of Google's DNS server?\n\nA: ")

    assert isinstance(output, str)

    assert re.match(
        ip_regex, output
    ), f"Generated output '{output}' is not a valid IP address"


def test_outlines_type_constraints(llm: Outlines) -> None:
    """Test type constraints for generating an integer"""
    llm.type_constraints = int
    output = llm.invoke(
        "Q: What is the answer to life, the universe, and everything?\n\nA: "
    )
    assert int(output)


def test_outlines_json(llm: Outlines) -> None:
    """Test json for generating a valid JSON object"""

    class Person(BaseModel):
        name: str

    llm.json_schema = Person
    output = llm.invoke("Q: Who is the author of LangChain?\n\nA: ")
    person = Person.model_validate_json(output)
    assert isinstance(person, Person)


def test_outlines_grammar(llm: Outlines) -> None:
    """Test grammar for generating a valid arithmetic expression"""
    llm.grammar = """
        ?start: expression
        ?expression: term (("+" | "-") term)*
        ?term: factor (("*" | "/") factor)*
        ?factor: NUMBER | "-" factor | "(" expression ")"
        %import common.NUMBER
        %import common.WS
        %ignore WS
    """

    output = llm.invoke("Here is a complex arithmetic expression: ")

    # Validate the output is a non-empty string
    assert (
        isinstance(output, str) and output.strip()
    ), "Output should be a non-empty string"

    # Use a simple regex to check if the output contains basic arithmetic operations and numbers
    assert re.search(
        r"[\d\+\-\*/\(\)]+", output
    ), f"Generated output '{output}' does not appear to be a valid arithmetic expression"
@@ -36,6 +36,7 @@ EXPECTED_ALL = [
    "ChatOCIModelDeploymentTGI",
    "ChatOllama",
    "ChatOpenAI",
    "ChatOutlines",
    "ChatPerplexity",
    "ChatPremAI",
    "ChatSambaNovaCloud",
91
libs/community/tests/unit_tests/chat_models/test_outlines.py
Normal file
@@ -0,0 +1,91 @@
import pytest
from _pytest.monkeypatch import MonkeyPatch
from pydantic import BaseModel, Field

from langchain_community.chat_models.outlines import ChatOutlines


def test_chat_outlines_initialization(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)

    chat = ChatOutlines(
        model="microsoft/Phi-3-mini-4k-instruct",
        max_tokens=42,
        stop=["\n"],
    )
    assert chat.model == "microsoft/Phi-3-mini-4k-instruct"
    assert chat.max_tokens == 42
    assert chat.backend == "transformers"
    assert chat.stop == ["\n"]


def test_chat_outlines_backend_llamacpp(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)
    chat = ChatOutlines(
        model="TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf",
        backend="llamacpp",
    )
    assert chat.backend == "llamacpp"


def test_chat_outlines_backend_vllm(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)
    chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct", backend="vllm")
    assert chat.backend == "vllm"


def test_chat_outlines_backend_mlxlm(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)
    chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct", backend="mlxlm")
    assert chat.backend == "mlxlm"


def test_chat_outlines_with_regex(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)
    regex = r"\d{3}-\d{3}-\d{4}"
    chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct", regex=regex)
    assert chat.regex == regex


def test_chat_outlines_with_type_constraints(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)
    chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct", type_constraints=int)
    assert chat.type_constraints == int  # noqa


def test_chat_outlines_with_json_schema(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)

    class TestSchema(BaseModel):
        name: str = Field(description="A person's name")
        age: int = Field(description="A person's age")

    chat = ChatOutlines(
        model="microsoft/Phi-3-mini-4k-instruct", json_schema=TestSchema
    )
    assert chat.json_schema == TestSchema


def test_chat_outlines_with_grammar(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)

    grammar = """
    ?start: expression
    ?expression: term (("+" | "-") term)*
    ?term: factor (("*" | "/") factor)*
    ?factor: NUMBER | "-" factor | "(" expression ")"
    %import common.NUMBER
    """
    chat = ChatOutlines(model="microsoft/Phi-3-mini-4k-instruct", grammar=grammar)
    assert chat.grammar == grammar


def test_raise_for_multiple_output_constraints(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(ChatOutlines, "build_client", lambda self: self)

    with pytest.raises(ValueError):
        ChatOutlines(
            model="microsoft/Phi-3-mini-4k-instruct",
            type_constraints=int,
            regex=r"\d{3}-\d{3}-\d{4}",
        )
@ -67,6 +67,7 @@ EXPECT_ALL = [
|
|||||||
"OpenAIChat",
|
"OpenAIChat",
|
||||||
"OpenLLM",
|
"OpenLLM",
|
||||||
"OpenLM",
|
"OpenLM",
|
||||||
|
"Outlines",
|
||||||
"PaiEasEndpoint",
|
"PaiEasEndpoint",
|
||||||
"Petals",
|
"Petals",
|
||||||
"PipelineAI",
|
"PipelineAI",
|
||||||
|
92
libs/community/tests/unit_tests/llms/test_outlines.py
Normal file
@@ -0,0 +1,92 @@
import pytest
from _pytest.monkeypatch import MonkeyPatch

from langchain_community.llms.outlines import Outlines


def test_outlines_initialization(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)

    llm = Outlines(
        model="microsoft/Phi-3-mini-4k-instruct",
        max_tokens=42,
        stop=["\n"],
    )
    assert llm.model == "microsoft/Phi-3-mini-4k-instruct"
    assert llm.max_tokens == 42
    assert llm.backend == "transformers"
    assert llm.stop == ["\n"]


def test_outlines_backend_llamacpp(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    llm = Outlines(
        model="TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf",
        backend="llamacpp",
    )
    assert llm.backend == "llamacpp"


def test_outlines_backend_vllm(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", backend="vllm")
    assert llm.backend == "vllm"


def test_outlines_backend_mlxlm(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", backend="mlxlm")
    assert llm.backend == "mlxlm"


def test_outlines_with_regex(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    regex = r"\d{3}-\d{3}-\d{4}"
    llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", regex=regex)
    assert llm.regex == regex


def test_outlines_with_type_constraints(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", type_constraints=int)
    assert llm.type_constraints == int  # noqa


def test_outlines_with_json_schema(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    from pydantic import BaseModel, Field

    class TestSchema(BaseModel):
        name: str = Field(description="A person's name")
        age: int = Field(description="A person's age")

    llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", json_schema=TestSchema)
    assert llm.json_schema == TestSchema


def test_outlines_with_grammar(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    grammar = """
    ?start: expression
    ?expression: term (("+" | "-") term)*
    ?term: factor (("*" | "/") factor)*
    ?factor: NUMBER | "-" factor | "(" expression ")"
    %import common.NUMBER
    """
    llm = Outlines(model="microsoft/Phi-3-mini-4k-instruct", grammar=grammar)
    assert llm.grammar == grammar


def test_raise_for_multiple_output_constraints(monkeypatch: MonkeyPatch) -> None:
    monkeypatch.setattr(Outlines, "build_client", lambda self: self)
    with pytest.raises(ValueError):
        Outlines(
            model="microsoft/Phi-3-mini-4k-instruct",
            type_constraints=int,
            regex=r"\d{3}-\d{3}-\d{4}",
        )

        Outlines(
            model="microsoft/Phi-3-mini-4k-instruct",
            type_constraints=int,
            regex=r"\d{3}-\d{3}-\d{4}",
        )