langchain/docs/docs/how_to/output_parser_json.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "72b1b316",
   "metadata": {},
   "source": [
    "# How to parse JSON output\n",
    "\n",
    ":::info Prerequisites\n",
    "\n",
    "This guide assumes familiarity with the following concepts:\n",
    "- [Chat models](/docs/concepts/#chat-models)\n",
    "- [Output parsers](/docs/concepts/#output-parsers)\n",
    "- [Prompt templates](/docs/concepts/#prompt-templates)\n",
    "- [Structured output](/docs/how_to/structured_output)\n",
    "- [Chaining runnables together](/docs/how_to/sequence/)\n",
    "\n",
    ":::\n",
    "\n",
    "While some model providers support [built-in ways to return structured output](/docs/how_to/structured_output), not all do. We can use an output parser to help users specify an arbitrary JSON schema via the prompt, query a model for outputs that conform to that schema, and finally parse the model's output as JSON.\n",
    "\n",
    ":::note\n",
    "Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON.\n",
    ":::"
   ]
  },
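  {
   "cell_type": "markdown",
   "id": "e1a4c7b2",
   "metadata": {},
   "source": [
    "To make the final parsing step concrete, here is a minimal sketch that skips the model and passes a hard-coded string to `JsonOutputParser.parse`. It assumes only that a recent `langchain-core` is installed. The string is a made-up stand-in for a model response.\n",
    "\n",
    "```python\n",
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "\n",
    "parser = JsonOutputParser()\n",
    "\n",
    "# A made-up string standing in for a model's reply (illustrative only).\n",
    "fake_model_output = '{\"setup\": \"Why did the chicken cross the road?\", \"punchline\": \"To get to the other side.\"}'\n",
    "\n",
    "# parse() turns the raw JSON text into a plain Python dict.\n",
    "result = parser.parse(fake_model_output)\n",
    "print(result[\"punchline\"])\n",
    "```\n",
    "\n",
    "In the chains below, the same parsing logic runs automatically on the model's reply as the last step of `prompt | model | parser`."
   ]
  },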
  {
   "cell_type": "markdown",
   "id": "ae909b7a",
   "metadata": {},
   "source": [
    "The [`JsonOutputParser`](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.json.JsonOutputParser.html) is one built-in option for prompting for and then parsing JSON output. While it is similar in functionality to the [`PydanticOutputParser`](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.pydantic.PydanticOutputParser.html), it also supports streaming back partial JSON objects.\n",
    "\n",
    "Here's an example of how it can be used alongside [Pydantic](https://docs.pydantic.dev/) to conveniently declare the expected schema:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd9d9110",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain langchain-openai\n",
    "\n",
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4ccf45a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'setup': \"Why couldn't the bicycle stand up by itself?\",\n",
       " 'punchline': 'Because it was two tired!'}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import ChatOpenAI\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "model = ChatOpenAI(temperature=0)\n",
    "\n",
    "\n",
    "# Define your desired data structure.\n",
    "class Joke(BaseModel):\n",
    "    setup: str = Field(description=\"question to set up a joke\")\n",
    "    punchline: str = Field(description=\"answer to resolve the joke\")\n",
    "\n",
    "\n",
    "# And a query intended to prompt a language model to populate the data structure.\n",
    "joke_query = \"Tell me a joke.\"\n",
    "\n",
    "# Set up a parser + inject instructions into the prompt template.\n",
    "parser = JsonOutputParser(pydantic_object=Joke)\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | model | parser\n",
    "\n",
    "chain.invoke({\"query\": joke_query})"
   ]
  },
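  {
   "cell_type": "markdown",
   "id": "5d2b8e61",
   "metadata": {},
   "source": [
    "The chain above returns a plain `dict`. In this setup, `pydantic_object` is used to build the format instructions, and the parsed output is not validated against `Joke`. If you need a validated object, one option (a sketch, not a step the parser performs for you) is to validate the dict yourself with Pydantic v2's `model_validate`. The dict below is copied from the output above so the snippet runs without calling a model.\n",
    "\n",
    "```python\n",
    "from pydantic import BaseModel, Field, ValidationError\n",
    "\n",
    "\n",
    "class Joke(BaseModel):\n",
    "    setup: str = Field(description=\"question to set up a joke\")\n",
    "    punchline: str = Field(description=\"answer to resolve the joke\")\n",
    "\n",
    "\n",
    "# Output copied from the chain.invoke(...) call above.\n",
    "parsed = {\n",
    "    \"setup\": \"Why couldn't the bicycle stand up by itself?\",\n",
    "    \"punchline\": \"Because it was two tired!\",\n",
    "}\n",
    "\n",
    "# Validate the plain dict against the schema (Pydantic v2 API).\n",
    "joke = Joke.model_validate(parsed)\n",
    "print(joke.punchline)\n",
    "\n",
    "# A dict missing a required field is rejected.\n",
    "try:\n",
    "    Joke.model_validate({\"setup\": \"Knock knock.\"})\n",
    "except ValidationError as err:\n",
    "    print(f\"{err.error_count()} validation error(s)\")\n",
    "```\n",
    "\n",
    "The `PydanticOutputParser` mentioned above performs this kind of validation against the model as part of parsing."
   ]
  },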
  {
   "cell_type": "markdown",
   "id": "51ffa2e3",
   "metadata": {},
   "source": [
    "Note that we are passing `format_instructions` from the parser directly into the prompt. You can and should experiment with adding your own formatting hints in the other parts of your prompt to either augment or replace the default instructions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "72de9c82",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The output should be formatted as a JSON instance that conforms to the JSON schema below.\\n\\nAs an example, for the schema {\"properties\": {\"foo\": {\"title\": \"Foo\", \"description\": \"a list of strings\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}}, \"required\": [\"foo\"]}\\nthe object {\"foo\": [\"bar\", \"baz\"]} is a well-formatted instance of the schema. The object {\"properties\": {\"foo\": [\"bar\", \"baz\"]}} is not well-formatted.\\n\\nHere is the output schema:\\n```\\n{\"properties\": {\"setup\": {\"title\": \"Setup\", \"description\": \"question to set up a joke\", \"type\": \"string\"}, \"punchline\": {\"title\": \"Punchline\", \"description\": \"answer to resolve the joke\", \"type\": \"string\"}}, \"required\": [\"setup\", \"punchline\"]}\\n```'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "parser.get_format_instructions()"
   ]
  },
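  {
   "cell_type": "markdown",
   "id": "9a7f3c40",
   "metadata": {},
   "source": [
    "As one illustration of adding your own hints, the sketch below keeps the parser's default instructions and appends an extra, hand-written rule to the template. The wording of that extra rule is made up, and whether it helps depends on the model. Rendering the prompt locally with `format` lets you inspect exactly what the model would see, without making an API call.\n",
    "\n",
    "```python\n",
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "\n",
    "class Joke(BaseModel):\n",
    "    setup: str = Field(description=\"question to set up a joke\")\n",
    "    punchline: str = Field(description=\"answer to resolve the joke\")\n",
    "\n",
    "\n",
    "parser = JsonOutputParser(pydantic_object=Joke)\n",
    "\n",
    "# Default parser instructions, followed by an extra hand-written hint (illustrative wording).\n",
    "prompt = PromptTemplate(\n",
    "    template=(\n",
    "        \"Answer the user query.\\n\"\n",
    "        \"{format_instructions}\\n\"\n",
    "        \"Respond with the JSON object only, with no text before or after it.\\n\"\n",
    "        \"{query}\\n\"\n",
    "    ),\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "# Render the prompt locally to inspect exactly what the model would receive.\n",
    "rendered = prompt.format(query=\"Tell me a joke.\")\n",
    "print(rendered)\n",
    "```"
   ]
  },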
  {
   "cell_type": "markdown",
   "id": "37d801be",
   "metadata": {},
   "source": [
    "## Streaming\n",
    "\n",
    "As mentioned above, a key difference between the `JsonOutputParser` and the `PydanticOutputParser` is that the `JsonOutputParser` supports streaming partial JSON objects. Here's what that looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0309256d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{}\n",
      "{'setup': ''}\n",
      "{'setup': 'Why'}\n",
      "{'setup': 'Why couldn'}\n",
      "{'setup': \"Why couldn't\"}\n",
      "{'setup': \"Why couldn't the\"}\n",
      "{'setup': \"Why couldn't the bicycle\"}\n",
      "{'setup': \"Why couldn't the bicycle stand\"}\n",
      "{'setup': \"Why couldn't the bicycle stand up\"}\n",
      "{'setup': \"Why couldn't the bicycle stand up by\"}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself\"}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\"}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': ''}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': 'Because'}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': 'Because it'}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': 'Because it was'}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': 'Because it was two'}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': 'Because it was two tired'}\n",
      "{'setup': \"Why couldn't the bicycle stand up by itself?\", 'punchline': 'Because it was two tired!'}\n"
     ]
    }
   ],
   "source": [
    "for s in chain.stream({\"query\": joke_query}):\n",
    "    print(s)"
   ]
  },
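  {
   "cell_type": "markdown",
   "id": "b86e0d15",
   "metadata": {},
   "source": [
    "To observe the same incremental behavior without an API key, you can swap in `FakeListChatModel` from `langchain_core`, a testing-oriented chat model that replays canned responses. This is a sketch under that assumption: the canned JSON reply is made up, and the exact sequence of partial dicts depends on how the fake model chunks its output, so only the final item is meaningful.\n",
    "\n",
    "```python\n",
    "from langchain_core.language_models.fake_chat_models import FakeListChatModel\n",
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "\n",
    "# A made-up JSON reply for the fake model to replay.\n",
    "canned_reply = '{\"setup\": \"Why did the scarecrow win an award?\", \"punchline\": \"Because he was outstanding in his field.\"}'\n",
    "\n",
    "# FakeListChatModel streams its canned response back in small chunks.\n",
    "fake_model = FakeListChatModel(responses=[canned_reply])\n",
    "\n",
    "parser = JsonOutputParser()\n",
    "prompt = PromptTemplate.from_template(\"Answer the user query.\\n{query}\\n\")\n",
    "\n",
    "chain = prompt | fake_model | parser\n",
    "\n",
    "# Each streamed item is the parser's best-effort parse of the JSON received so far.\n",
    "chunks = list(chain.stream({\"query\": \"Tell me a joke.\"}))\n",
    "for chunk in chunks:\n",
    "    print(chunk)\n",
    "```"
   ]
  },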
  {
   "cell_type": "markdown",
   "id": "344bd968",
   "metadata": {},
   "source": [
    "## Without Pydantic\n",
    "\n",
    "You can also use the `JsonOutputParser` without Pydantic. This will prompt the model to return JSON, but doesn't provide specifics about what the schema should be."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "dd3806d1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'response': \"Sure! Here's a joke for you: Why couldn't the bicycle stand up by itself? Because it was two tired!\"}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "joke_query = \"Tell me a joke.\"\n",
    "\n",
    "parser = JsonOutputParser()\n",
    "\n",
    "prompt = PromptTemplate(\n",
    "    template=\"Answer the user query.\\n{format_instructions}\\n{query}\\n\",\n",
    "    input_variables=[\"query\"],\n",
    "    partial_variables={\"format_instructions\": parser.get_format_instructions()},\n",
    ")\n",
    "\n",
    "chain = prompt | model | parser\n",
    "\n",
    "chain.invoke({\"query\": joke_query})"
   ]
  },
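  {
   "cell_type": "markdown",
   "id": "f0c25a9e",
   "metadata": {},
   "source": [
    "Because no schema is given, the format instructions reduce to a short, generic request for JSON, and the keys in the result (here, `response`) are whatever the model chooses. The sketch below prints those instructions and parses a made-up reply with an arbitrary shape, showing that any valid JSON object is returned as-is without schema validation.\n",
    "\n",
    "```python\n",
    "from langchain_core.output_parsers import JsonOutputParser\n",
    "\n",
    "parser = JsonOutputParser()\n",
    "\n",
    "# With no pydantic_object, the instructions only ask for generic JSON.\n",
    "print(parser.get_format_instructions())\n",
    "\n",
    "# Any JSON shape is accepted; nothing is checked against a schema.\n",
    "reply = '{\"jokes\": [\"Knock knock.\", \"Who is there?\"], \"count\": 2}'\n",
    "data = parser.parse(reply)\n",
    "print(data[\"count\"], len(data[\"jokes\"]))\n",
    "```"
   ]
  },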
  {
   "cell_type": "markdown",
   "id": "1eefe12b",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "You've now learned one way to prompt a model to return structured JSON. Next, check out the [broader guide on obtaining structured output](/docs/how_to/structured_output) for other techniques."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}