langchain/docs/docs/how_to/output_parser_yaml.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "72b1b316",
   "metadata": {},
   "source": [
    "# How to parse YAML output\n",
    "\n",
    ":::info Prerequisites\n",
    "\n",
    "This guide assumes familiarity with the following concepts:\n",
    "- [Chat models](/docs/concepts/chat_models)\n",
    "- [Output parsers](/docs/concepts/output_parsers)\n",
    "- [Prompt templates](/docs/concepts/prompt_templates)\n",
    "- [Structured output](/docs/how_to/structured_output)\n",
    "- [Chaining runnables together](/docs/how_to/sequence/)\n",
    "\n",
    ":::\n",
    "\n",
    "LLMs from different providers often have different strengths depending on the specific data they are trained on. This also means that some may be \"better\" and more reliable at generating output in formats other than JSON.\n",
    "\n",
    "This output parser allows users to specify an arbitrary schema and query LLMs for outputs that conform to that schema, using YAML to format their response.\n",
    "\n",
    ":::note\n",
    "Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed YAML.\n",
    ":::\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f142c8ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain langchain-openai\n",
    "\n",
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc479f3a",
   "metadata": {},
   "source": [
    "We use [Pydantic](https://docs.pydantic.dev) with the [`YamlOutputParser`](https://python.langchain.com/api_reference/langchain/output_parsers/langchain.output_parsers.yaml.YamlOutputParser.html#langchain.output_parsers.yaml.YamlOutputParser) to declare our data model and give the model more context as to what type of YAML it should generate:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4ccf45a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Joke(setup=\"Why couldn't the bicycle find its way home?\", punchline='Because it lost its bearings!')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.output_parsers import YamlOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "\n",
    "# Define your desired data structure.\n",
    "class Joke(BaseModel):\n",
    "    setup: str = Field(description=\"question to set up a joke\")\n",
    "    punchline: str = Field(description=\"answer to resolve the joke\")\n",
    "\n",
    "\n",
    "model = ChatOpenAI(temperature=0)\n",
    "\n",
    "# And a query intented to prompt a language model to populate the data structure.\n",
    "joke_query = \"Tell me a joke.\"\n",
    "\n",
    "# Set up a parser + inject instructions into the prompt template.\n",
    "parser = YamlOutputParser(pydantic_object=Joke)\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | model | parser\n",
    "\n",
    "chain.invoke({\"query\": joke_query})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25e2254a",
   "metadata": {},
   "source": [
    "The parser will automatically parse the output YAML and create a Pydantic model with the data. We can see the parser's `format_instructions`, which get added to the prompt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a4d12261",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The output should be formatted as a YAML instance that conforms to the given JSON schema below.\\n\\n# Examples\\n## Schema\\n```\\n{\"title\": \"Players\", \"description\": \"A list of players\", \"type\": \"array\", \"items\": {\"$ref\": \"#/definitions/Player\"}, \"definitions\": {\"Player\": {\"title\": \"Player\", \"type\": \"object\", \"properties\": {\"name\": {\"title\": \"Name\", \"description\": \"Player name\", \"type\": \"string\"}, \"avg\": {\"title\": \"Avg\", \"description\": \"Batting average\", \"type\": \"number\"}}, \"required\": [\"name\", \"avg\"]}}}\\n```\\n## Well formatted instance\\n```\\n- name: John Doe\\n  avg: 0.3\\n- name: Jane Maxfield\\n  avg: 1.4\\n```\\n\\n## Schema\\n```\\n{\"properties\": {\"habit\": { \"description\": \"A common daily habit\", \"type\": \"string\" }, \"sustainable_alternative\": { \"description\": \"An environmentally friendly alternative to the habit\", \"type\": \"string\"}}, \"required\": [\"habit\", \"sustainable_alternative\"]}\\n```\\n## Well formatted instance\\n```\\nhabit: Using disposable water bottles for daily hydration.\\nsustainable_alternative: Switch to a reusable water bottle to reduce plastic waste and decrease your environmental footprint.\\n``` \\n\\nPlease follow the standard YAML formatting conventions with an indent of 2 spaces and make sure that the data types adhere strictly to the following JSON schema: \\n```\\n{\"properties\": {\"setup\": {\"title\": \"Setup\", \"description\": \"question to set up a joke\", \"type\": \"string\"}, \"punchline\": {\"title\": \"Punchline\", \"description\": \"answer to resolve the joke\", \"type\": \"string\"}}, \"required\": [\"setup\", \"punchline\"]}\\n```\\n\\nMake sure to always enclose the YAML output in triple backticks (```). Please do not add anything other than valid YAML output!'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parser.get_format_instructions()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b31ac9ce",
   "metadata": {},
   "source": [
    "You can and should experiment with adding your own formatting hints in the other parts of your prompt to either augment or replace the default instructions.\n",
    "\n",
    "## Next steps\n",
    "\n",
    "You've now learned how to prompt a model to return XML. Next, check out the [broader guide on obtaining structured output](/docs/how_to/structured_output) for other related techniques."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "666ba894",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}