mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-03 13:43:24 +00:00
# Description

This pull request addresses two problems in the output parsers: some parsers have ambiguous or error-prone output types, and some lack unit tests. Type mismatches in these parsers can cause runtime errors or unexpected behavior, which confuses developers and users. Clarifying the output types should improve stability and reliability. The table below compares my expected `OutputType` for each OutputParser with its actual `OutputType` before this PR. I used it to decide which parsers to fix.

This pull request:

- fixes the `OutputType` of OutputParsers so that it returns the expected type;
  - e.g., before this PR, the `OutputType` property of `EnumOutputParser` raises `TypeError`. This PR introduces logic that derives the `OutputType` from the parser's attributes.
- replaces legacy API calls in OutputParsers, such as `LLMChain.run`, with the modern API, such as `LLMChain.invoke`;
  - Note: for `OutputFixingParser`, `RetryOutputParser`, and `RetryWithErrorOutputParser`, this PR introduces a `legacy` attribute with `False` as the default value to keep backward compatibility.
- adds tests for `OutputFixingParser` and `RetryOutputParser`.
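The attribute-based fix described above can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `SketchParser`, `BoolParser`, `EnumParser`, and `FixingParser` are illustrative stand-ins, not LangChain classes, and the default inference (reading a concrete generic argument from the subclass) is a simplified assumption about how a base class might work. It shows the two patterns from the table: a parser with no inferable type argument reports the class stored in its `enum` attribute, and a wrapper reports its wrapped parser's type.

```python
from enum import Enum
from typing import Any, Generic, Type, TypeVar, get_args

T = TypeVar("T")


class SketchParser(Generic[T]):
    """Hypothetical stand-in for an output-parser base class (not LangChain's code)."""

    @property
    def OutputType(self) -> Any:
        # Default: use a concrete generic argument declared directly on the
        # subclass, e.g. `class BoolParser(SketchParser[bool])`.
        for base in type(self).__dict__.get("__orig_bases__", ()):
            args = get_args(base)
            if args and not isinstance(args[0], TypeVar):
                return args[0]
        raise TypeError(f"{type(self).__name__} doesn't have an inferable OutputType.")


class BoolParser(SketchParser[bool]):
    """Concrete type argument, so the default inference succeeds."""


class EnumParser(SketchParser):
    """No type argument: the default would raise, so derive the type from an attribute."""

    def __init__(self, enum: Type[Enum]) -> None:
        self.enum = enum

    @property
    def OutputType(self) -> Type[Enum]:
        # The parser yields members of `self.enum`, so report that class.
        return self.enum


class FixingParser(SketchParser[T]):
    """Hypothetical wrapper whose output type is the wrapped parser's output type."""

    def __init__(self, parser: SketchParser[T]) -> None:
        self.parser = parser

    @property
    def OutputType(self) -> Any:
        # The default would only find the TypeVar `T` here and raise; delegate instead.
        return self.parser.OutputType


class Color(Enum):
    RED = "red"
    BLUE = "blue"


print(BoolParser().OutputType)                     # <class 'bool'>
print(EnumParser(Color).OutputType)                # <enum 'Color'>
print(FixingParser(EnumParser(Color)).OutputType)  # <enum 'Color'>
```

Delegating keeps a wrapper's reported type in sync with whichever parser it wraps, so no per-wrapper type annotation is needed.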
| Class Name of OutputParser | My Expected `OutputType` (after this PR) | Actual `OutputType` before this PR ([evidence](#evidence)) | Fix Required |
|---------|--------------|---------|--------|
| BooleanOutputParser | `<class 'bool'>` | `<class 'bool'>` | NO |
| CombiningOutputParser | `typing.Dict[str, Any]` | `TypeError` is raised | YES |
| DatetimeOutputParser | `<class 'datetime.datetime'>` | `<class 'datetime.datetime'>` | NO |
| EnumOutputParser(enum=MyEnum) | `MyEnum` | `TypeError` is raised | YES |
| OutputFixingParser | The same type as `self.parser.OutputType` | `~T` | YES |
| CommaSeparatedListOutputParser | `typing.List[str]` | `typing.List[str]` | NO |
| MarkdownListOutputParser | `typing.List[str]` | `typing.List[str]` | NO |
| NumberedListOutputParser | `typing.List[str]` | `typing.List[str]` | NO |
| JsonOutputKeyToolsParser | `typing.Any` | `typing.Any` | NO |
| JsonOutputToolsParser | `typing.Any` | `typing.Any` | NO |
| PydanticToolsParser | `typing.Any` | `typing.Any` | NO |
| PandasDataFrameOutputParser | `typing.Dict[str, Any]` | `TypeError` is raised | YES |
| PydanticOutputParser(pydantic_object=MyModel) | `<class '__main__.MyModel'>` | `<class '__main__.MyModel'>` | NO |
| RegexParser | `typing.Dict[str, str]` | `TypeError` is raised | YES |
| RegexDictParser | `typing.Dict[str, str]` | `TypeError` is raised | YES |
| RetryOutputParser | The same type as `self.parser.OutputType` | `~T` | YES |
| RetryWithErrorOutputParser | The same type as `self.parser.OutputType` | `~T` | YES |
| StructuredOutputParser | `typing.Dict[str, Any]` | `TypeError` is raised | YES |
| YamlOutputParser(pydantic_object=MyModel) | `MyModel` | `~T` | YES |

NOTE: In the "Fix Required" column, "YES" means that the parser's `OutputType` is fixed in this PR, while "NO" means that no fix is required.

# Issue

No issues for this PR.

# Twitter handle

- [hmdev3](https://twitter.com/hmdev3)

# Questions:

1. Should tests for the legacy API `LLMChain.run` be created in the following scripts?
   - libs/langchain/tests/unit_tests/output_parsers/test_fix.py;
   - libs/langchain/tests/unit_tests/output_parsers/test_retry.py.
2. Is there a more appropriate expected output type than the ones I listed in the table above?
   - e.g., the `OutputType` of `CombiningOutputParser` should be SOMETHING...

# Actual outputs (before this PR)

<div id='evidence'></div>
<details><summary>Actual outputs</summary>

## Requirements

- Python==3.9.13
- langchain==0.1.13

```python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import langchain
>>> langchain.__version__
'0.1.13'
>>> from langchain import output_parsers
```

### `BooleanOutputParser`

```python
>>> output_parsers.BooleanOutputParser().OutputType
<class 'bool'>
```

### `CombiningOutputParser`

```python
>>> output_parsers.CombiningOutputParser(parsers=[output_parsers.DatetimeOutputParser(), output_parsers.CommaSeparatedListOutputParser()]).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable CombiningOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `DatetimeOutputParser`

```python
>>> output_parsers.DatetimeOutputParser().OutputType
<class 'datetime.datetime'>
```

### `EnumOutputParser`

```python
>>> from enum import Enum
>>> class MyEnum(Enum):
...     a = 'a'
...     b = 'b'
...
>>> output_parsers.EnumOutputParser(enum=MyEnum).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable EnumOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `OutputFixingParser`

```python
>>> output_parsers.OutputFixingParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```

### `CommaSeparatedListOutputParser`

```python
>>> output_parsers.CommaSeparatedListOutputParser().OutputType
typing.List[str]
```

### `MarkdownListOutputParser`

```python
>>> output_parsers.MarkdownListOutputParser().OutputType
typing.List[str]
```

### `NumberedListOutputParser`

```python
>>> output_parsers.NumberedListOutputParser().OutputType
typing.List[str]
```

### `JsonOutputKeyToolsParser`

```python
>>> output_parsers.JsonOutputKeyToolsParser(key_name='tool').OutputType
typing.Any
```

### `JsonOutputToolsParser`

```python
>>> output_parsers.JsonOutputToolsParser().OutputType
typing.Any
```

### `PydanticToolsParser`

```python
>>> from langchain.pydantic_v1 import BaseModel
>>> class MyModel(BaseModel):
...     a: int
...
>>> output_parsers.PydanticToolsParser(tools=[MyModel, MyModel]).OutputType
typing.Any
```

### `PandasDataFrameOutputParser`

```python
>>> output_parsers.PandasDataFrameOutputParser().OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable PandasDataFrameOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `PydanticOutputParser`

```python
>>> output_parsers.PydanticOutputParser(pydantic_object=MyModel).OutputType
<class '__main__.MyModel'>
```

### `RegexParser`

```python
>>> output_parsers.RegexParser(regex='$', output_keys=['a']).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable RegexParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `RegexDictParser`

```python
>>> output_parsers.RegexDictParser(output_key_to_format={'a':'a'}).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable RegexDictParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `RetryOutputParser`

```python
>>> output_parsers.RetryOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```

### `RetryWithErrorOutputParser`

```python
>>> output_parsers.RetryWithErrorOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```

### `StructuredOutputParser`

```python
>>> from langchain.output_parsers.structured import ResponseSchema
>>> response_schemas = [ResponseSchema(name="foo",description="a list of strings",type="List[string]"),ResponseSchema(name="bar",description="a string",type="string"), ]
>>> output_parsers.StructuredOutputParser.from_response_schemas(response_schemas).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable StructuredOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `YamlOutputParser`

```python
>>> output_parsers.YamlOutputParser(pydantic_object=MyModel).OutputType
~T
```

</details>

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
39 lines
1.1 KiB
Python
from typing import Dict

from langchain.output_parsers.regex import RegexParser

# NOTE: Nearly the same constants are defined in ./test_combining_parser.py
DEF_EXPECTED_RESULT = {
    "confidence": "A",
    "explanation": "Paris is the capital of France according to Wikipedia.",
}

DEF_README = """```json
{
    "answer": "Paris",
    "source": "https://en.wikipedia.org/wiki/France"
}
```

//Confidence: A, Explanation: Paris is the capital of France according to Wikipedia."""


def test_regex_parser_parse() -> None:
    """Test regex parser parse."""
    parser = RegexParser(
        regex=r"Confidence: (A|B|C), Explanation: (.*)",
        output_keys=["confidence", "explanation"],
        default_output_key="noConfidence",
    )
    assert DEF_EXPECTED_RESULT == parser.parse(DEF_README)


def test_regex_parser_output_type() -> None:
    """Test regex parser output type is Dict[str, str]."""
    parser = RegexParser(
        regex=r"Confidence: (A|B|C), Explanation: (.*)",
        output_keys=["confidence", "explanation"],
        default_output_key="noConfidence",
    )
    assert parser.OutputType is Dict[str, str]
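The parse test above relies on the regex's two capture groups lining up with `output_keys`. The following minimal sketch reproduces that mapping, assuming the parser runs `re.search` over the whole text and assigns capture group `i + 1` to `output_keys[i]`. `regex_parse` is a hypothetical helper, not LangChain's `RegexParser`, and it does not model the `default_output_key` fallback.

```python
import re
from typing import Dict, List


def regex_parse(regex: str, output_keys: List[str], text: str) -> Dict[str, str]:
    """Hypothetical sketch of regex-based parsing; not LangChain's implementation."""
    match = re.search(regex, text)  # first match anywhere in the text
    if match is None:
        # Assumption: fail loudly; the real parser's default_output_key
        # handling is not modeled here.
        raise ValueError(f"Could not parse output: {text}")
    # Capture group i + 1 becomes the value for output_keys[i].
    return {key: match.group(i + 1) for i, key in enumerate(output_keys)}


text = (
    '{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/France"}\n'
    "\n"
    "//Confidence: A, Explanation: Paris is the capital of France according to Wikipedia."
)
result = regex_parse(
    r"Confidence: (A|B|C), Explanation: (.*)",
    ["confidence", "explanation"],
    text,
)
print(result)
# {'confidence': 'A', 'explanation': 'Paris is the capital of France according to Wikipedia.'}
```

Because `.` does not match a newline by default, `(.*)` stops at the end of the `//Confidence` line, so the JSON block before it never leaks into the `explanation` value.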