This PR fixes `QuestionListOutputParser` text splitting.

`QuestionListOutputParser` incorrectly splits numbered list text into lines. If the text doesn't end with `\n`, the regex doesn't capture the last item, so the parser returns `n - 1` items and `WebResearchRetriever.llm_chain` generates fewer queries than requested in the search prompt.

How to reproduce:

```python
from langchain.retrievers.web_research import QuestionListOutputParser

parser = QuestionListOutputParser()

good = parser.parse(
    """1. This is line one.
2. This is line two.
"""  # <-- ! (trailing newline)
)

bad = parser.parse(
    """1. This is line one.
2. This is line two."""  # <-- No new line.
)

assert good.lines == ['1. This is line one.\n', '2. This is line two.\n'], good.lines
# Without this fix, the assertion below fails because the last item is dropped.
assert bad.lines == ['1. This is line one.\n', '2. This is line two.'], bad.lines
```

NOTE: The last item will not contain a line break, but this seems OK because the items are stripped in `WebResearchRetriever.clean_search_query()`.
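To illustrate the failure mode outside of LangChain, here is a minimal standalone sketch. It assumes the parser splits text with a pattern that requires a trailing `\n` after each numbered item, and that the fix also accepts end-of-string as a terminator. Both patterns and the helper names `split_requiring_newline` and `split_allowing_end` are assumptions made for illustration, not code copied from the library.

```python
import re

# Assumed shape of the original pattern: every item must end with "\n".
PATTERN_REQUIRING_NEWLINE = r"\d+\..*?\n"
# Assumed shape of the fixed pattern: an item may end with "\n" or at end of text.
PATTERN_ALLOWING_END = r"\d+\..*?(?:\n|$)"


def split_requiring_newline(text: str) -> list[str]:
    """Hypothetical helper: split a numbered list, requiring a newline after each item."""
    return re.findall(PATTERN_REQUIRING_NEWLINE, text)


def split_allowing_end(text: str) -> list[str]:
    """Hypothetical helper: split a numbered list, also accepting end-of-text."""
    return re.findall(PATTERN_ALLOWING_END, text)


with_newline = "1. This is line one.\n2. This is line two.\n"
without_newline = "1. This is line one.\n2. This is line two."

# With a trailing newline, both patterns find both items.
print(split_requiring_newline(with_newline))
print(split_allowing_end(with_newline))

# Without a trailing newline, only the second pattern keeps the last item.
print(split_requiring_newline(without_newline))
print(split_allowing_end(without_newline))

# Stripping each item removes the inconsistent trailing line break.
print([item.strip() for item in split_allowing_end(without_newline)])
```

The last line mirrors the point in the NOTE: once items are stripped downstream, it no longer matters that only some of them end with a line break.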