# Output Parsers

Language models output text, but you often want more structured information back than plain text. This is where output parsers come in.

Output parsers are classes that help structure language model responses. There are two main methods an output parser must implement:

- `get_format_instructions() -> str`: A method which returns a string containing instructions for how the output of a language model should be formatted.
- `parse(str) -> Any`: A method which takes in a string (assumed to be the response from a language model) and parses it into some structure.

An output parser may also implement one optional method:

- `parse_with_prompt(str, PromptValue) -> Any`: A method which takes in a string (assumed to be the response from a language model) and a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is provided mainly so that the output parser can retry or fix the output and use information from the prompt to do so.
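
As a rough illustration of this interface, here is a minimal sketch of a parser that turns a yes/no answer into a `bool`. `YesNoParser` is a hypothetical name, and the class deliberately does not subclass anything from LangChain; it only mirrors the two required method names described above.

```python
class YesNoParser:
    """Hypothetical parser: mirrors the two required methods, nothing more."""

    def get_format_instructions(self) -> str:
        # Text meant to be pasted into the prompt.
        return "Answer with a single word: yes or no."

    def parse(self, text: str) -> bool:
        # Normalize whitespace, trailing punctuation, and case before matching.
        cleaned = text.strip().strip(".!").lower()
        if cleaned == "yes":
            return True
        if cleaned == "no":
            return False
        raise ValueError(f"Expected 'yes' or 'no', got: {text!r}")


yes_no = YesNoParser()
print(yes_no.get_format_instructions())
print(yes_no.parse(" Yes.\n"))
```

Raising an exception on unexpected text, rather than returning a default, keeps the failure visible so that a caller can decide whether to retry.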


Below we go over some examples of output parsers.

In [1]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

## PydanticOutputParser
This output parser allows users to specify an arbitrary JSON schema and query LLMs for JSON outputs that conform to that schema.

Keep in mind that large language models are leaky abstractions! You'll have to use an LLM with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off dramatically.

Use Pydantic to declare your data model. Pydantic's `BaseModel` is like a Python dataclass, but with actual type checking and coercion.

In [2]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

In [3]:
model_name = 'text-davinci-003'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

In [4]:
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    
    # You can add custom validation logic easily with Pydantic.
    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        if not field.endswith('?'):  # also safe when the string is empty
            raise ValueError("Badly formed question!")
        return field

# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

_input = prompt.format_prompt(query=joke_query)

output = model(_input.to_string())

parser.parse(output)

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

In [5]:
# Here's another example, but with a compound typed field.
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
        
actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

_input = prompt.format_prompt(query=actor_query)

output = model(_input.to_string())

parser.parse(output)

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])
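
Both prompts above rely on `partial_variables` to fill in `{format_instructions}` ahead of time, so that only `{query}` is left for the caller. The effect can be approximated in plain Python with `functools.partial` over `str.format`. This is only a sketch of the idea, not how `PromptTemplate` is implemented, and `FAKE_INSTRUCTIONS` is a placeholder string rather than what `get_format_instructions()` actually returns.

```python
from functools import partial

TEMPLATE = "Answer the user query.\n{format_instructions}\n{query}\n"

# Placeholder text; the real instructions come from the parser.
FAKE_INSTRUCTIONS = "Reply with a JSON object."

# Pre-fill one variable now, leaving `query` to be supplied later.
format_prompt = partial(TEMPLATE.format, format_instructions=FAKE_INSTRUCTIONS)

prompt_text = format_prompt(query="Tell me a joke.")
print(prompt_text)
```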

## Fixing Output Parsing Mistakes

The parser above simply tries to parse the LLM response. If the response does not parse correctly, it raises an error.

But we can do more than raise errors. Specifically, we can pass the misformatted output, along with the format instructions, to the model and ask it to fix the output.

For this example, we'll use the `Actor` parser defined above. Here's what happens if we pass it a result that is not formatted the way the parser expects:

In [6]:
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

In [7]:
parser.parse(misformatted)

OutputParserException: Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
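
The error message points at the quoting: the string uses single quotes, which makes it a valid Python literal but not valid JSON. The standard `json` module shows the same failure on its own, independent of LangChain. The `well_formed` string is added here for contrast.

```python
import json

misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
well_formed = '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'

try:
    json.loads(misformatted)
    error_message = None
except json.JSONDecodeError as exc:
    # JSON requires double-quoted property names and strings.
    error_message = str(exc)

print(error_message)
print(json.loads(well_formed))
```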

Now we can construct and use an `OutputFixingParser`. This output parser takes another output parser as an argument, along with an LLM that it uses to try to correct any formatting mistakes.

In [9]:
from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

In [10]:
new_parser.parse(misformatted)

Actor(name='Tom Hanks', film_names=['Forrest Gump'])
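
Conceptually, a fixing parser wraps a try/except around the inner parser and, on failure, asks a model to rewrite the completion. The sketch below illustrates that control flow only; it is not LangChain's implementation. `parse_with_fix` and `fake_llm` are hypothetical names, the repair prompt wording is made up, and `fake_llm` returns a hard-coded string instead of calling a real model.

```python
import json
from typing import Any, Callable


def parse_with_fix(
    completion: str,
    parse: Callable[[str], Any],
    format_instructions: str,
    llm: Callable[[str], str],
) -> Any:
    """Try to parse; on failure, ask the model once for a corrected completion."""
    try:
        return parse(completion)
    except ValueError as exc:
        repair_prompt = (
            f"Instructions:\n{format_instructions}\n\n"
            f"Completion:\n{completion}\n\n"
            f"Error:\n{exc}\n\n"
            "Rewrite the completion so that it satisfies the instructions."
        )
        # A second failure is allowed to propagate to the caller.
        return parse(llm(repair_prompt))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: returns a hard-coded corrected answer.
    return '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
result = parse_with_fix(misformatted, json.loads, "Return a JSON object.", fake_llm)
print(result)
```

Note that the repair prompt contains only the instructions, the bad completion, and the error. The original question is never shown to the model.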

## Fixing Output Parsing Mistakes with the Original Prompt

In some cases it is possible to fix parsing mistakes by looking only at the output, but in other cases it isn't. An example is when the output is not just in the wrong format but is also incomplete. Consider the example below.

In [11]:
template = """Based on the user question, provide an Action and Action Input for what step should be taken.
{format_instructions}
Question: {query}
Response:"""


class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")


parser = PydanticOutputParser(pydantic_object=Action)

In [12]:
# Use the Action / Action Input template defined above.
prompt = PromptTemplate(
    template=template,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

In [13]:
prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")

In [14]:
bad_response = '{"action": "search"}'

If we try to parse this response as is, we will get an error:

In [15]:
parser.parse(bad_response)

OutputParserException: Failed to parse Action from completion {"action": "search"}. Got: 1 validation error for Action
action_input
  field required (type=value_error.missing)

If we use the `OutputFixingParser` to fix this error, it has to guess, because it has no way of knowing what to put for `action_input`.

In [16]:
fix_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

In [17]:
fix_parser.parse(bad_response)

Action(action='search', action_input='keyword')

Instead, we can use the `RetryWithErrorOutputParser`, which passes in the prompt (as well as the original output and the error) to try again to get a better response.

In [19]:
from langchain.output_parsers import RetryWithErrorOutputParser

In [20]:
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=ChatOpenAI())

In [21]:
retry_parser.parse_with_prompt(bad_response, prompt_value)

Action(action='search', action_input='leo di caprios girlfriend')
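
The difference from the fixing approach is which information reaches the model on the second attempt. Here is a minimal sketch, assuming a hypothetical `retry_with_prompt` helper, a simplified `parse_action` check, and a `fake_llm` stand-in (none of these are part of LangChain). The retry prompt includes the original prompt, so the model can see the question it is supposed to answer.

```python
import json
from typing import Callable, Dict, List

REQUIRED_KEYS = ("action", "action_input")


def parse_action(completion: str) -> Dict[str, str]:
    """Parse JSON and require both keys, roughly like the Action model above."""
    data = json.loads(completion)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def retry_with_prompt(
    completion: str, original_prompt: str, llm: Callable[[str], str]
) -> Dict[str, str]:
    try:
        return parse_action(completion)
    except ValueError as exc:
        # Unlike a fix-only prompt, this one carries the original question.
        retry_prompt = (
            f"Prompt:\n{original_prompt}\n\n"
            f"Completion:\n{completion}\n\n"
            f"Error:\n{exc}\n\n"
            "Please try again."
        )
        return parse_action(llm(retry_prompt))


seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a model call; records what it was asked.
    seen_prompts.append(prompt)
    return '{"action": "search", "action_input": "leo di caprios girlfriend"}'


result = retry_with_prompt('{"action": "search"}', "who is leo di caprios gf?", fake_llm)
print(result)
```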

---

# Older, less powerful parsers

## Structured Output Parser

While the Pydantic/JSON parser is more powerful, we initially experimented with data structures having text fields only.

In [16]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

Here we define the response schema we want to receive.

In [17]:
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

We now get a string that contains instructions for how the response should be formatted, and we then insert that into our prompt.

In [18]:
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

We can now use this to format a prompt to send to the language model, and then parse the returned result.

In [19]:
model = OpenAI(temperature=0)

In [20]:
_input = prompt.format_prompt(question="what's the capital of france")
output = model(_input.to_string())

In [21]:
output_parser.parse(output)

{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}

And here's an example of using this with a chat model:

In [22]:
chat_model = ChatOpenAI(temperature=0)

In [23]:
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("answer the users question as best as possible.\n{format_instructions}\n{question}")  
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

In [24]:
_input = prompt.format_prompt(question="what's the capital of france")
output = chat_model(_input.to_messages())

In [25]:
output_parser.parse(output.content)

{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
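
To give a feel for what such a parser does with the model's text, here is a rough sketch. It assumes the model was instructed to reply with a JSON object wrapped in a Markdown `json` code fence, and that every schema name must be present as a key. `parse_fenced_json` is a hypothetical helper, not the library's code.

```python
import json
import re
from typing import Dict, List

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


def parse_fenced_json(text: str, expected_keys: List[str]) -> Dict[str, str]:
    """Extract the JSON object from a fenced json snippet and check its keys."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced json snippet found")
    data = json.loads(match.group(1))
    for key in expected_keys:
        if key not in data:
            raise ValueError(f"missing key: {key}")
    return data


response = (
    f"{FENCE}json\n"
    '{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}\n'
    f"{FENCE}"
)
parsed = parse_fenced_json(response, ["answer", "source"])
print(parsed)
```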

## CommaSeparatedListOutputParser

Here's another parser that is strictly less powerful than Pydantic/JSON parsing.

In [26]:
from langchain.output_parsers import CommaSeparatedListOutputParser

In [27]:
output_parser = CommaSeparatedListOutputParser()

In [28]:
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)

In [29]:
model = OpenAI(temperature=0)

In [30]:
_input = prompt.format(subject="ice cream flavors")
output = model(_input)

In [31]:
output_parser.parse(output)

['Vanilla',
 'Chocolate',
 'Strawberry',
 'Mint Chocolate Chip',
 'Cookies and Cream']
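
The parsing step for a comma-separated reply is simple enough to sketch in a few lines. `parse_comma_separated` below is a hypothetical stand-in written for illustration; the library's own implementation may differ in detail.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split on commas and trim surrounding whitespace from each item."""
    return [item.strip() for item in text.strip().split(",")]


items = parse_comma_separated(
    "Vanilla, Chocolate, Strawberry, Mint Chocolate Chip, Cookies and Cream"
)
print(items)
```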