Mirror of https://github.com/hwchase17/langchain.git
Commit af4fde385c: Merge branch 'master' into pprados/06-pdfplumber
@@ -50,11 +50,6 @@ locally to ensure that it looks good and is free of errors.

If you're unable to build it locally that's okay as well, as you will be able to
see a preview of the documentation on the pull request page.

From the **monorepo root**, run the following command to install the dependencies:

```bash
poetry install --with lint,docs --no-root
```

### Building
@@ -158,14 +153,6 @@ the working directory to the `langchain-community` directory:

cd [root]/libs/langchain-community
```

Set up a virtual environment for the package if you haven't done so already.

Install the dependencies for the package.

```bash
poetry install --with lint
```

Then you can run the following commands to lint and format the in-code documentation:

```bash
docs/docs/integrations/document_loaders/pymupdf4llm.ipynb (new file, 721 lines)
@@ -0,0 +1,721 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: PyMuPDF4LLM\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PyMuPDF4LLMLoader\n",
|
||||
"\n",
|
||||
"This notebook provides a quick overview for getting started with PyMuPDF4LLM [document loader](https://python.langchain.com/docs/concepts/#document-loaders). For detailed documentation of all PyMuPDF4LLMLoader features and configurations head to the [GitHub repository](https://github.com/lakinduboteju/langchain-pymupdf4llm).\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | Local | Serializable | JS support |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| [PyMuPDF4LLMLoader](https://github.com/lakinduboteju/langchain-pymupdf4llm) | [langchain_pymupdf4llm](https://pypi.org/project/langchain-pymupdf4llm) | ✅ | ❌ | ❌ |\n",
|
||||
"\n",
|
||||
"### Loader features\n",
|
||||
"\n",
|
||||
"| Source | Document Lazy Loading | Native Async Support | Extract Images | Extract Tables |\n",
|
||||
"| :---: | :---: | :---: | :---: | :---: |\n",
|
||||
"| PyMuPDF4LLMLoader | ✅ | ❌ | ✅ | ✅ |\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To access PyMuPDF4LLM document loader you'll need to install the `langchain-pymupdf4llm` integration package.\n",
|
||||
"\n",
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"No credentials are required to use PyMuPDF4LLMLoader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"Install **langchain_community** and **langchain-pymupdf4llm**."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU langchain_community langchain-pymupdf4llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Now we can instantiate our model object and load documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_pymupdf4llm import PyMuPDF4LLMLoader\n",
|
||||
"\n",
|
||||
"file_path = \"./example_data/layout-parser-paper.pdf\"\n",
|
||||
"loader = PyMuPDF4LLMLoader(file_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-06-22T01:27:10+00:00', 'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2021-06-22T01:27:10+00:00', 'trapped': '', 'modDate': 'D:20210622012710Z', 'creationDate': 'D:20210622012710Z', 'page': 0}, page_content='```\\nLayoutParser: A Unified Toolkit for Deep\\n\\n## Learning Based Document Image Analysis\\n\\n```\\n\\nZejiang Shen[1] (<28>), Ruochen Zhang[2], Melissa Dell[3], Benjamin Charles Germain\\nLee[4], Jacob Carlson[3], and Weining Li[5]\\n\\n1 Allen Institute for AI\\n```\\n shannons@allenai.org\\n\\n```\\n2 Brown University\\n```\\n ruochen zhang@brown.edu\\n\\n```\\n3 Harvard University\\n_{melissadell,jacob carlson}@fas.harvard.edu_\\n4 University of Washington\\n```\\n bcgl@cs.washington.edu\\n\\n```\\n5 University of Waterloo\\n```\\n w422li@uwaterloo.ca\\n\\n```\\n\\n**Abstract. Recent advances in document image analysis (DIA) have been**\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applications. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\n[The library is publicly available at https://layout-parser.github.io.](https://layout-parser.github.io)\\n\\n**Keywords: Document Image Analysis · Deep Learning · Layout Analysis**\\n\\n - Character Recognition · Open Source library · Toolkit.\\n\\n### 1 Introduction\\n\\n\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\n\\n')"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z',\n",
|
||||
" 'page': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import pprint\n",
|
||||
"\n",
|
||||
"pprint.pp(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lazy Load"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"6"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pages = []\n",
|
||||
"for doc in loader.lazy_load():\n",
|
||||
" pages.append(doc)\n",
|
||||
" if len(pages) >= 10:\n",
|
||||
" # do some paged operation, e.g.\n",
|
||||
" # index.upsert(page)\n",
|
||||
"\n",
|
||||
" pages = []\n",
|
||||
"len(pages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython.display import Markdown, display\n",
|
||||
"\n",
|
||||
"part = pages[0].page_content[778:1189]\n",
|
||||
"print(part)\n",
|
||||
"# Markdown rendering\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z',\n",
|
||||
" 'page': 10}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pprint.pp(pages[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The metadata attribute contains at least the following keys:\n",
|
||||
"- source\n",
|
||||
"- page (if in mode *page*)\n",
|
||||
"- total_page\n",
|
||||
"- creationdate\n",
|
||||
"- creator\n",
|
||||
"- producer\n",
|
||||
"\n",
|
||||
"Additional metadata are specific to each parser.\n",
|
||||
"These pieces of information can be helpful (to categorize your PDFs for example)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Splitting mode & custom pages delimiter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"When loading the PDF file you can split it in two different ways:\n",
|
||||
"- By page\n",
|
||||
"- As a single text flow\n",
|
||||
"\n",
|
||||
"By default PyMuPDF4LLMLoader will split the PDF by page."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract the PDF by page. Each page is extracted as a langchain Document object:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"16\n",
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z',\n",
|
||||
" 'page': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(len(docs))\n",
|
||||
"pprint.pp(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this mode the pdf is split by pages and the resulting Documents metadata contains the `page` (page number). But in some cases we could want to process the pdf as a single text flow (so we don't cut some paragraphs in half). In this case you can use the *single* mode :"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract the whole PDF as a single langchain Document object:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1\n",
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"single\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(len(docs))\n",
|
||||
"pprint.pp(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Logically, in this mode, the `page` (page_number) metadata disappears. Here's how to clearly identify where pages end in the text flow :"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add a custom *pages_delimiter* to identify where are ends of pages in *single* mode:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"single\",\n",
|
||||
" pages_delimiter=\"\\n-------THIS IS A CUSTOM END OF PAGE-------\\n\\n\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[0].page_content[10663:11317]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The default `pages_delimiter` is \\n-----\\n\\n.\n",
|
||||
"But this could simply be \\n, or \\f to clearly indicate a page change, or \\<!-- PAGE BREAK --> for seamless injection in a Markdown viewer without a visual effect."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extract images from the PDF"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can extract images from your PDFs (in text form) with a choice of three different solutions:\n",
|
||||
"- rapidOCR (lightweight Optical Character Recognition tool)\n",
|
||||
"- Tesseract (OCR tool with high precision)\n",
|
||||
"- Multimodal language model\n",
|
||||
"\n",
|
||||
"The result is inserted at the end of text of the page."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract images from the PDF with rapidOCR:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU rapidocr-onnxruntime pillow"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.parsers import RapidOCRBlobParser\n",
|
||||
"\n",
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" extract_images=True,\n",
|
||||
" images_parser=RapidOCRBlobParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[5].page_content[1863:]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Be careful, RapidOCR is designed to work with Chinese and English, not other languages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract images from the PDF with Tesseract:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU pytesseract"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.parsers import TesseractBlobParser\n",
|
||||
"\n",
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" extract_images=True,\n",
|
||||
" images_parser=TesseractBlobParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(docs[5].page_content[1863:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract images from the PDF with multimodal model:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU langchain_openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 39,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
|
||||
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key =\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.parsers import LLMImageBlobParser\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" extract_images=True,\n",
|
||||
" images_parser=LLMImageBlobParser(\n",
|
||||
" model=ChatOpenAI(model=\"gpt-4o-mini\", max_tokens=1024)\n",
|
||||
" ),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(docs[5].page_content[1863:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extract tables from the PDF\n",
|
||||
"\n",
|
||||
"With PyMUPDF4LLM you can extract tables from your PDFs in *markdown* format :"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" # \"lines_strict\" is the default strategy and\n",
|
||||
" # is the most accurate for tables with column and row lines,\n",
|
||||
" # but may not work well with all documents.\n",
|
||||
" # \"lines\" is a less strict strategy that may work better with\n",
|
||||
" # some documents.\n",
|
||||
" # \"text\" is the least strict strategy and may work better\n",
|
||||
" # with documents that do not have tables with lines.\n",
|
||||
" table_strategy=\"lines\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[4].page_content[3210:]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Working with Files\n",
|
||||
"\n",
|
||||
"Many document loaders involve parsing files. The difference between such loaders usually stems from how the file is parsed, rather than how the file is loaded. For example, you can use `open` to read the binary content of either a PDF or a markdown file, but you need different parsing logic to convert that binary data into text.\n",
|
||||
"\n",
|
||||
"As a result, it can be helpful to decouple the parsing logic from the loading logic, which makes it easier to re-use a given parser regardless of how the data was loaded.\n",
|
||||
"You can use this strategy to analyze different files, with the same parsing parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import FileSystemBlobLoader\n",
|
||||
"from langchain_community.document_loaders.generic import GenericLoader\n",
|
||||
"from langchain_pymupdf4llm import PyMuPDF4LLMParser\n",
|
||||
"\n",
|
||||
"loader = GenericLoader(\n",
|
||||
" blob_loader=FileSystemBlobLoader(\n",
|
||||
" path=\"./example_data/\",\n",
|
||||
" glob=\"*.pdf\",\n",
|
||||
" ),\n",
|
||||
" blob_parser=PyMuPDF4LLMParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[0].page_content[:562]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all PyMuPDF4LLMLoader features and configurations head to the GitHub repository: https://github.com/lakinduboteju/langchain-pymupdf4llm"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.21"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
@@ -20,7 +20,7 @@ from langchain_community.chat_models.kinetica import ChatKinetica

The Kinetica vectorstore wrapper leverages Kinetica's native support for [vector
similarity search](https://docs.kinetica.com/7.2/vector_search/).

-See [Kinetica Vectorsore API](/docs/integrations/vectorstores/kinetica) for usage.
+See [Kinetica Vectorstore API](/docs/integrations/vectorstores/kinetica) for usage.

```python
from langchain_community.vectorstores import Kinetica
|
||||
@@ -28,8 +28,8 @@ from langchain_community.vectorstores import Kinetica

## Document Loader

-The Kinetica Document loader can be used to load LangChain Documents from the
-Kinetica database.
+The Kinetica Document loader can be used to load LangChain [Documents](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) from the
+[Kinetica](https://www.kinetica.com/) database.

See [Kinetica Document Loader](/docs/integrations/document_loaders/kinetica) for usage
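For quick reference, the loader can be imported from `langchain_community`. A minimal sketch, assuming the class is exposed as `KineticaLoader` in the `kinetica_loader` module (constructor arguments are omitted here; see the linked page for full usage):

```python
# Import sketch only; check the Kinetica Document Loader page for the exact constructor arguments.
from langchain_community.document_loaders.kinetica_loader import KineticaLoader
```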
docs/docs/integrations/providers/pymupdf4llm.ipynb (new file, 59 lines)
@@ -0,0 +1,59 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PyMuPDF4LLM\n",
|
||||
"\n",
|
||||
"[PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) is aimed to make it easier to extract PDF content in Markdown format, needed for LLM & RAG applications.\n",
|
||||
"\n",
|
||||
"[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm) integrates PyMuPDF4LLM to LangChain as a Document Loader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-pymupdf4llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "y8ku6X96sebl"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_pymupdf4llm import PyMuPDF4LLMLoader, PyMuPDF4LLMParser"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
(File diff suppressed because it is too large.)
@ -888,6 +888,13 @@ const FEATURE_TABLES = {
|
||||
api: "Package",
|
||||
apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html"
|
||||
},
|
||||
{
|
||||
name: "PyMuPDF4LLM",
|
||||
link: "pymupdf4llm",
|
||||
source: "Load PDF content to Markdown using PyMuPDF4LLM",
|
||||
api: "Package",
|
||||
apiLink: "https://github.com/lakinduboteju/langchain-pymupdf4llm"
|
||||
},
|
||||
{
|
||||
name: "PDFMiner",
|
||||
link: "pdfminer",
|
||||
|
@ -95,7 +95,7 @@ class SQLiteVec(VectorStore):
|
||||
)
|
||||
self._connection.execute(
|
||||
f"""
|
||||
CREATE TRIGGER IF NOT EXISTS embed_text
|
||||
CREATE TRIGGER IF NOT EXISTS {self._table}_embed_text
|
||||
AFTER INSERT ON {self._table}
|
||||
BEGIN
|
||||
INSERT INTO {self._table}_vec(rowid, text_embedding)
|
||||
|
@ -56,3 +56,27 @@ def test_sqlitevec_add_extra() -> None:
|
||||
docsearch.add_texts(texts, metadatas)
|
||||
output = docsearch.similarity_search("foo", k=10)
|
||||
assert len(output) == 6
|
||||
|
||||
|
||||
@pytest.mark.requires("sqlite-vec")
|
||||
def test_sqlitevec_search_multiple_tables() -> None:
|
||||
"""Test end to end construction and search with multiple tables."""
|
||||
docsearch_1 = SQLiteVec.from_texts(
|
||||
fake_texts,
|
||||
FakeEmbeddings(),
|
||||
table="table_1",
|
||||
db_file=":memory:", ## change to local storage for testing
|
||||
)
|
||||
|
||||
docsearch_2 = SQLiteVec.from_texts(
|
||||
fake_texts,
|
||||
FakeEmbeddings(),
|
||||
table="table_2",
|
||||
db_file=":memory:",
|
||||
)
|
||||
|
||||
output_1 = docsearch_1.similarity_search("foo", k=1)
|
||||
output_2 = docsearch_2.similarity_search("foo", k=1)
|
||||
|
||||
assert output_1 == [Document(page_content="foo", metadata={})]
|
||||
assert output_2 == [Document(page_content="foo", metadata={})]
|
||||
|
@ -3,13 +3,14 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Any, Optional, TypeVar, Union
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.messages import BaseMessage
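This hunk, and the many similar ones that follow, move imports that are only needed for type annotations under an `if TYPE_CHECKING:` guard so they are no longer imported at runtime. A minimal illustration of the pattern with generic names (not code from this repository):

```python
from __future__ import annotations  # keep annotations as strings so guarded names resolve lazily

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # which avoids import cost and circular-import problems.
    from collections.abc import Sequence


def first_or_none(items: Sequence[int]) -> int | None:
    """The guarded name can still be used freely in annotations."""
    return items[0] if items else None
```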
|
||||
|
@ -2,12 +2,14 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Optional, TextIO, cast
|
||||
from typing import TYPE_CHECKING, Any, Optional, TextIO, cast
|
||||
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
from langchain_core.callbacks import BaseCallbackHandler
|
||||
from langchain_core.utils.input import print_text
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
|
||||
|
||||
class FileCallbackHandler(BaseCallbackHandler):
|
||||
"""Callback Handler that writes to a file.
|
||||
@ -45,9 +47,15 @@ class FileCallbackHandler(BaseCallbackHandler):
|
||||
inputs (Dict[str, Any]): The inputs to the chain.
|
||||
**kwargs (Any): Additional keyword arguments.
|
||||
"""
|
||||
class_name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
|
||||
if "name" in kwargs:
|
||||
name = kwargs["name"]
|
||||
else:
|
||||
if serialized:
|
||||
name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
|
||||
else:
|
||||
name = "<unknown>"
|
||||
print_text(
|
||||
f"\n\n\033[1m> Entering new {class_name} chain...\033[0m",
|
||||
f"\n\n\033[1m> Entering new {name} chain...\033[0m",
|
||||
end="\n",
|
||||
file=self.file,
|
||||
)
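The hunk above changes how the chain name shown in the log line is resolved: an explicit `name` kwarg wins, then the serialized `name` (falling back to the last element of the `id` path), then a placeholder when `serialized` is `None`. A standalone sketch of that resolution order (illustration only, not the handler's actual code):

```python
from typing import Any, Optional


def resolve_chain_name(serialized: Optional[dict], **kwargs: Any) -> str:
    """Mirror the fallback order used when entering a new chain."""
    if "name" in kwargs:
        return kwargs["name"]  # explicit name passed by the caller
    if serialized:
        # serialized name first, then the last element of the id path
        return serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
    return "<unknown>"  # serialized may be None


# resolve_chain_name(None) -> "<unknown>"
# resolve_chain_name({"id": ["langchain", "chains", "LLMChain"]}) -> "LLMChain"
```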
|
||||
|
@ -5,7 +5,6 @@ import functools
|
||||
import logging
|
||||
import uuid
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncGenerator, Coroutine, Generator, Sequence
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
from contextlib import asynccontextmanager, contextmanager
|
||||
from contextvars import copy_context
|
||||
@ -21,7 +20,6 @@ from typing import (
|
||||
from uuid import UUID
|
||||
|
||||
from langsmith.run_helpers import get_tracing_context
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.callbacks.base import (
|
||||
BaseCallbackHandler,
|
||||
@ -39,6 +37,10 @@ from langchain_core.tracers.schemas import Run
|
||||
from langchain_core.utils.env import env_var_is_set
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncGenerator, Coroutine, Generator, Sequence
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
|
||||
|
@ -17,8 +17,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import Union
|
||||
from typing import TYPE_CHECKING, Union
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
@ -29,6 +28,9 @@ from langchain_core.messages import (
|
||||
get_buffer_string,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
|
||||
class BaseChatMessageHistory(ABC):
|
||||
"""Abstract base class for storing chat message history.
|
||||
|
@ -3,16 +3,17 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
from typing import TYPE_CHECKING, Optional
|
||||
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.runnables import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
from langchain_text_splitters import TextSplitter
|
||||
|
||||
from langchain_core.documents.base import Blob
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.documents.base import Blob
|
||||
|
||||
|
||||
class BaseLoader(ABC): # noqa: B024
|
||||
|
@ -8,12 +8,15 @@ In addition, content loading code should provide a lazy loading interface by def
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Iterable
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
# Re-export Blob and PathLike for backwards compatibility
|
||||
from langchain_core.documents.base import Blob as Blob
|
||||
from langchain_core.documents.base import PathLike as PathLike
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Iterable
|
||||
|
||||
|
||||
class BlobLoader(ABC):
|
||||
"""Abstract interface for blob loaders implementation.
|
||||
|
@ -2,15 +2,17 @@ from __future__ import annotations
|
||||
|
||||
import contextlib
|
||||
import mimetypes
|
||||
from collections.abc import Generator
|
||||
from io import BufferedReader, BytesIO
|
||||
from pathlib import PurePath
|
||||
from typing import Any, Literal, Optional, Union, cast
|
||||
from typing import TYPE_CHECKING, Any, Literal, Optional, Union, cast
|
||||
|
||||
from pydantic import ConfigDict, Field, field_validator, model_validator
|
||||
|
||||
from langchain_core.load.serializable import Serializable
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
|
||||
PathLike = Union[str, PurePath]
|
||||
|
||||
|
||||
|
@ -1,15 +1,18 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import Optional
|
||||
from typing import TYPE_CHECKING, Optional
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.runnables import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
|
||||
|
||||
class BaseDocumentCompressor(BaseModel, ABC):
|
||||
"""Base class for document compressors.
|
||||
|
@ -1,12 +1,13 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from langchain_core.runnables.config import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
|
||||
|
@ -7,11 +7,11 @@ from typing import TYPE_CHECKING, Any, Optional
|
||||
|
||||
from pydantic import BaseModel, ConfigDict
|
||||
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.example_selectors.base import BaseExampleSelector
|
||||
from langchain_core.vectorstores import VectorStore
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.embeddings import Embeddings
|
||||
|
||||
|
||||
|
@ -3,14 +3,17 @@ from __future__ import annotations
|
||||
import abc
|
||||
import time
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import Any, Optional, TypedDict
|
||||
from typing import TYPE_CHECKING, Any, Optional, TypedDict
|
||||
|
||||
from langchain_core._api import beta
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.retrievers import BaseRetriever
|
||||
from langchain_core.runnables import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
|
||||
class RecordManager(ABC):
|
||||
"""Abstract base class representing the interface for a record manager.
|
||||
|
@ -4,7 +4,6 @@ import asyncio
|
||||
import inspect
|
||||
import json
|
||||
import typing
|
||||
import uuid
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncIterator, Iterator, Sequence
|
||||
@ -70,6 +69,8 @@ from langchain_core.utils.function_calling import convert_to_openai_tool
|
||||
from langchain_core.utils.pydantic import TypeBaseModel, is_basemodel_subclass
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import uuid
|
||||
|
||||
from langchain_core.output_parsers.base import OutputParserLike
|
||||
from langchain_core.runnables import Runnable, RunnableConfig
|
||||
from langchain_core.tools import BaseTool
|
||||
|
@ -7,12 +7,12 @@ import functools
|
||||
import inspect
|
||||
import json
|
||||
import logging
|
||||
import uuid
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncIterator, Iterator, Sequence
|
||||
from pathlib import Path
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -61,6 +61,9 @@ from langchain_core.prompt_values import ChatPromptValue, PromptValue, StringPro
|
||||
from langchain_core.runnables import RunnableConfig, ensure_config, get_config_list
|
||||
from langchain_core.runnables.config import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import uuid
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union, cast
|
||||
|
||||
from pydantic import ConfigDict, Field, field_validator
|
||||
@ -11,6 +10,8 @@ from langchain_core.utils._merge import merge_dicts, merge_lists
|
||||
from langchain_core.utils.interactive_env import is_interactive_env
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.prompts.chat import ChatPromptTemplate
|
||||
|
||||
|
||||
|
@ -4,14 +4,16 @@ import csv
|
||||
import re
|
||||
from abc import abstractmethod
|
||||
from collections import deque
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
from io import StringIO
|
||||
from typing import TYPE_CHECKING, TypeVar, Union
|
||||
from typing import Optional as Optional
|
||||
from typing import TypeVar, Union
|
||||
|
||||
from langchain_core.messages import BaseMessage
|
||||
from langchain_core.output_parsers.transform import BaseTransformOutputParser
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
T = TypeVar("T")
|
||||
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
@ -19,6 +18,8 @@ from langchain_core.outputs import (
|
||||
from langchain_core.runnables.config import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
from langchain_core.runnables import RunnableConfig
|
||||
|
||||
|
||||
|
@ -1,14 +1,16 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Literal, Union
|
||||
from typing import TYPE_CHECKING, Literal, Union
|
||||
|
||||
from pydantic import model_validator
|
||||
from typing_extensions import Self
|
||||
|
||||
from langchain_core.messages import BaseMessage, BaseMessageChunk
|
||||
from langchain_core.outputs.generation import Generation
|
||||
from langchain_core.utils._merge import merge_dicts
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from typing_extensions import Self
|
||||
|
||||
|
||||
class ChatGeneration(Generation):
|
||||
"""A single chat generation output.
|
||||
|
@ -3,9 +3,8 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from pathlib import Path
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Annotated,
|
||||
Any,
|
||||
Optional,
|
||||
@ -47,6 +46,10 @@ from langchain_core.prompts.string import (
|
||||
from langchain_core.utils import get_colored_text
|
||||
from langchain_core.utils.interactive_env import is_interactive_env
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class BaseMessagePromptTemplate(Serializable, ABC):
|
||||
"""Base class for message prompt templates."""
|
||||
|
@ -2,8 +2,7 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Any, Literal, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, Literal, Optional, Union
|
||||
|
||||
from pydantic import (
|
||||
BaseModel,
|
||||
@ -11,7 +10,6 @@ from pydantic import (
|
||||
Field,
|
||||
model_validator,
|
||||
)
|
||||
from typing_extensions import Self
|
||||
|
||||
from langchain_core.example_selectors import BaseExampleSelector
|
||||
from langchain_core.messages import BaseMessage, get_buffer_string
|
||||
@ -27,6 +25,11 @@ from langchain_core.prompts.string import (
|
||||
get_template_variables,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
from typing_extensions import Self
|
||||
|
||||
|
||||
class _FewShotPromptTemplateMixin(BaseModel):
|
||||
"""Prompt template that contains few shot examples."""
|
||||
|
@ -3,8 +3,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import warnings
|
||||
from pathlib import Path
|
||||
from typing import Any, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union
|
||||
|
||||
from pydantic import BaseModel, model_validator
|
||||
|
||||
@ -16,7 +15,11 @@ from langchain_core.prompts.string import (
|
||||
get_template_variables,
|
||||
mustache_schema,
|
||||
)
|
||||
from langchain_core.runnables.config import RunnableConfig
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
from langchain_core.runnables.config import RunnableConfig
|
||||
|
||||
|
||||
class PromptTemplate(StringPromptTemplate):
|
||||
|
@ -60,7 +60,6 @@ from langchain_core.runnables.config import (
|
||||
run_in_executor,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
from langchain_core.runnables.utils import (
|
||||
AddableDict,
|
||||
AnyConfigurableField,
|
||||
@ -94,6 +93,7 @@ if TYPE_CHECKING:
|
||||
from langchain_core.runnables.fallbacks import (
|
||||
RunnableWithFallbacks as RunnableWithFallbacksT,
|
||||
)
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
from langchain_core.tools import BaseTool
|
||||
from langchain_core.tracers.log_stream import (
|
||||
RunLog,
|
||||
|
@ -7,6 +7,7 @@ from collections.abc import AsyncIterator, Iterator, Sequence
|
||||
from collections.abc import Mapping as Mapping
|
||||
from functools import wraps
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -26,7 +27,6 @@ from langchain_core.runnables.config import (
|
||||
get_executor_for_config,
|
||||
merge_configs,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
from langchain_core.runnables.utils import (
|
||||
AnyConfigurableField,
|
||||
ConfigurableField,
|
||||
@ -39,6 +39,9 @@ from langchain_core.runnables.utils import (
|
||||
get_unique_config_specs,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.runnables.graph import Graph
|
||||
|
||||
|
||||
class DynamicRunnable(RunnableSerializable[Input, Output]):
|
||||
"""Serializable Runnable that can be dynamically configured.
|
||||
|
@ -2,7 +2,6 @@ from __future__ import annotations
|
||||
|
||||
import inspect
|
||||
from collections import defaultdict
|
||||
from collections.abc import Sequence
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
from typing import (
|
||||
@ -18,11 +17,13 @@ from typing import (
|
||||
)
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from langchain_core.utils.pydantic import _IgnoreUnserializable, is_basemodel_subclass
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from langchain_core.runnables.base import Runnable as RunnableType
|
||||
|
||||
|
||||
|
@ -5,7 +5,7 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import inspect
|
||||
import threading
|
||||
from collections.abc import AsyncIterator, Awaitable, Iterator, Mapping
|
||||
from collections.abc import Awaitable
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
@ -32,7 +32,6 @@ from langchain_core.runnables.config import (
|
||||
get_executor_for_config,
|
||||
patch_config,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
from langchain_core.runnables.utils import (
|
||||
AddableDict,
|
||||
ConfigurableFieldSpec,
|
||||
@ -42,10 +41,13 @@ from langchain_core.utils.iter import safetee
|
||||
from langchain_core.utils.pydantic import create_model_v2
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator, Mapping
|
||||
|
||||
from langchain_core.callbacks.manager import (
|
||||
AsyncCallbackManagerForChainRun,
|
||||
CallbackManagerForChainRun,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
|
||||
|
||||
def identity(x: Other) -> Other:
|
||||
|
@ -1,8 +1,9 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import AsyncIterator, Iterator, Mapping
|
||||
from collections.abc import Mapping
|
||||
from itertools import starmap
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -31,6 +32,9 @@ from langchain_core.runnables.utils import (
|
||||
get_unique_config_specs,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
|
||||
class RouterInput(TypedDict):
|
||||
"""Router input.
|
||||
|
@ -2,11 +2,13 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from typing import Any, Literal, Union
|
||||
from typing import TYPE_CHECKING, Any, Literal, Union
|
||||
|
||||
from typing_extensions import NotRequired, TypedDict
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
|
||||
class EventData(TypedDict, total=False):
|
||||
"""Data associated with a streaming event."""
|
||||
|
@ -6,19 +6,11 @@ import ast
|
||||
import asyncio
|
||||
import inspect
|
||||
import textwrap
|
||||
from collections.abc import (
|
||||
AsyncIterable,
|
||||
AsyncIterator,
|
||||
Awaitable,
|
||||
Coroutine,
|
||||
Iterable,
|
||||
Mapping,
|
||||
Sequence,
|
||||
)
|
||||
from functools import lru_cache
|
||||
from inspect import signature
|
||||
from itertools import groupby
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
NamedTuple,
|
||||
@ -30,11 +22,22 @@ from typing import (
|
||||
|
||||
from typing_extensions import TypeGuard, override
|
||||
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
|
||||
# Re-export create-model for backwards compatibility
|
||||
from langchain_core.utils.pydantic import create_model as create_model
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import (
|
||||
AsyncIterable,
|
||||
AsyncIterator,
|
||||
Awaitable,
|
||||
Coroutine,
|
||||
Iterable,
|
||||
Mapping,
|
||||
Sequence,
|
||||
)
|
||||
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
|
||||
Input = TypeVar("Input", contravariant=True)
|
||||
# Output type should implement __concat__, as eg str, list, dict do
|
||||
Output = TypeVar("Output", covariant=True)
|
||||
|
@ -3,12 +3,14 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from enum import Enum
|
||||
from typing import Any, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
|
||||
class Visitor(ABC):
|
||||
"""Defines interface for IR translation using a visitor pattern."""
|
||||
|
@ -4,13 +4,12 @@ import asyncio
|
||||
import functools
|
||||
import inspect
|
||||
import json
|
||||
import uuid
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from contextvars import copy_context
|
||||
from inspect import signature
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Annotated,
|
||||
Any,
|
||||
Callable,
|
||||
@ -68,6 +67,10 @@ from langchain_core.utils.pydantic import (
|
||||
is_pydantic_v2_subclass,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import uuid
|
||||
from collections.abc import Sequence
|
||||
|
||||
FILTERED_ARGS = ("run_manager", "callbacks")
|
||||
|
||||
|
||||
|
@ -1,21 +1,23 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from functools import partial
|
||||
from typing import Literal, Optional, Union
|
||||
from typing import TYPE_CHECKING, Literal, Optional, Union
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.prompts import (
|
||||
BasePromptTemplate,
|
||||
PromptTemplate,
|
||||
aformat_document,
|
||||
format_document,
|
||||
)
|
||||
from langchain_core.retrievers import BaseRetriever
|
||||
from langchain_core.tools.simple import Tool
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.retrievers import BaseRetriever
|
||||
|
||||
|
||||
class RetrieverInput(BaseModel):
|
||||
"""Input to the retriever."""
|
||||
|
@ -3,6 +3,7 @@ from __future__ import annotations
|
||||
from collections.abc import Awaitable
|
||||
from inspect import signature
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -13,7 +14,6 @@ from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.messages import ToolCall
|
||||
from langchain_core.runnables import RunnableConfig, run_in_executor
|
||||
from langchain_core.tools.base import (
|
||||
ArgsSchema,
|
||||
@ -22,6 +22,9 @@ from langchain_core.tools.base import (
|
||||
_get_runnable_config_param,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.messages import ToolCall
|
||||
|
||||
|
||||
class Tool(BaseTool):
|
||||
"""Tool that takes in function or coroutine directly."""
|
||||
|
@ -4,6 +4,7 @@ import textwrap
|
||||
from collections.abc import Awaitable
|
||||
from inspect import signature
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Annotated,
|
||||
Any,
|
||||
Callable,
|
||||
@ -18,7 +19,6 @@ from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.messages import ToolCall
|
||||
from langchain_core.runnables import RunnableConfig, run_in_executor
|
||||
from langchain_core.tools.base import (
|
||||
FILTERED_ARGS,
|
||||
@ -29,6 +29,9 @@ from langchain_core.tools.base import (
|
||||
)
|
||||
from langchain_core.utils.pydantic import is_basemodel_subclass
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.messages import ToolCall
|
||||
|
||||
|
||||
class StructuredTool(BaseTool):
|
||||
"""Tool that can operate on any number of inputs."""
|
||||
|
@ -5,26 +5,27 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import logging
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Optional,
|
||||
Union,
|
||||
)
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
|
||||
from langchain_core.exceptions import TracerException # noqa
|
||||
from langchain_core.messages import BaseMessage
|
||||
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
|
||||
from langchain_core.tracers.core import _TracerCore
|
||||
from langchain_core.tracers.schemas import Run
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.messages import BaseMessage
|
||||
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
|
||||
from langchain_core.tracers.schemas import Run
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Generator
|
||||
from contextlib import contextmanager
|
||||
from contextvars import ContextVar
|
||||
from typing import (
|
||||
@ -18,13 +17,15 @@ from langsmith import utils as ls_utils
|
||||
|
||||
from langchain_core.tracers.langchain import LangChainTracer
|
||||
from langchain_core.tracers.run_collector import RunCollectorCallbackHandler
|
||||
from langchain_core.tracers.schemas import TracerSessionV1
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
|
||||
from langsmith import Client as LangSmithClient
|
||||
|
||||
from langchain_core.callbacks.base import BaseCallbackHandler, Callbacks
|
||||
from langchain_core.callbacks.manager import AsyncCallbackManager, CallbackManager
|
||||
from langchain_core.tracers.schemas import TracerSessionV1
|
||||
|
||||
# for backwards partial compatibility if this is imported by users but unused
|
||||
tracing_callback_var: Any = None
|
||||
|
@ -6,7 +6,6 @@ import logging
import sys
import traceback
from abc import ABC, abstractmethod
from collections.abc import Coroutine, Sequence
from datetime import datetime, timezone
from typing import (
    TYPE_CHECKING,
@ -16,13 +15,9 @@ from typing import (
    Union,
    cast,
)
from uuid import UUID

from tenacity import RetryCallState

from langchain_core.exceptions import TracerException
from langchain_core.load import dumpd
from langchain_core.messages import BaseMessage
from langchain_core.outputs import (
    ChatGeneration,
    ChatGenerationChunk,
@ -32,7 +27,13 @@ from langchain_core.outputs import (
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from collections.abc import Coroutine, Sequence
    from uuid import UUID

    from tenacity import RetryCallState

    from langchain_core.documents import Document
    from langchain_core.messages import BaseMessage

logger = logging.getLogger(__name__)

@ -5,9 +5,8 @@ from __future__ import annotations
import logging
import threading
import weakref
from collections.abc import Sequence
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Any, Optional, Union, cast
from typing import TYPE_CHECKING, Any, Optional, Union, cast
from uuid import UUID

import langsmith
@ -17,7 +16,11 @@ from langchain_core.tracers import langchain as langchain_tracer
from langchain_core.tracers.base import BaseTracer
from langchain_core.tracers.context import tracing_v2_enabled
from langchain_core.tracers.langchain import _get_executor
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from collections.abc import Sequence

    from langchain_core.tracers.schemas import Run

logger = logging.getLogger(__name__)

@ -5,7 +5,6 @@ from __future__ import annotations
import asyncio
import contextlib
import logging
from collections.abc import AsyncIterator, Iterator, Sequence
from typing import (
    TYPE_CHECKING,
    Any,
@ -37,13 +36,15 @@ from langchain_core.runnables.utils import (
    _RootEventFilter,
)
from langchain_core.tracers._streaming import _StreamingCallbackHandler
from langchain_core.tracers.log_stream import LogEntry
from langchain_core.tracers.memory_stream import _MemoryStream
from langchain_core.utils.aiter import aclosing, py_anext

if TYPE_CHECKING:
    from collections.abc import AsyncIterator, Iterator, Sequence

    from langchain_core.documents import Document
    from langchain_core.runnables import Runnable, RunnableConfig
    from langchain_core.tracers.log_stream import LogEntry

logger = logging.getLogger(__name__)

@ -22,12 +22,12 @@ from tenacity import (

from langchain_core.env import get_runtime_environment
from langchain_core.load import dumpd
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
from langchain_core.tracers.base import BaseTracer
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from langchain_core.messages import BaseMessage
    from langchain_core.outputs import ChatGenerationChunk, GenerationChunk

logger = logging.getLogger(__name__)
_LOGGED = set()

@ -5,8 +5,8 @@ import contextlib
import copy
import threading
from collections import defaultdict
from collections.abc import AsyncIterator, Iterator, Sequence
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
    Optional,
@ -14,7 +14,6 @@ from typing import (
    Union,
    overload,
)
from uuid import UUID

import jsonpatch  # type: ignore[import]
from typing_extensions import NotRequired, TypedDict
@ -23,11 +22,16 @@ from langchain_core.load import dumps
from langchain_core.load.load import load
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
from langchain_core.runnables import Runnable, RunnableConfig, ensure_config
from langchain_core.runnables.utils import Input, Output
from langchain_core.tracers._streaming import _StreamingCallbackHandler
from langchain_core.tracers.base import BaseTracer
from langchain_core.tracers.memory_stream import _MemoryStream
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from collections.abc import AsyncIterator, Iterator, Sequence
    from uuid import UUID

    from langchain_core.runnables.utils import Input, Output
    from langchain_core.tracers.schemas import Run


class LogEntry(TypedDict):

@ -1,6 +1,5 @@
from collections.abc import Awaitable
from typing import Callable, Optional, Union
from uuid import UUID
from typing import TYPE_CHECKING, Callable, Optional, Union

from langchain_core.runnables.config import (
    RunnableConfig,
@ -10,6 +9,9 @@ from langchain_core.runnables.config import (
from langchain_core.tracers.base import AsyncBaseTracer, BaseTracer
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from uuid import UUID

Listener = Union[Callable[[Run], None], Callable[[Run, RunnableConfig], None]]
AsyncListener = Union[
    Callable[[Run], Awaitable[None]], Callable[[Run, RunnableConfig], Awaitable[None]]

@ -1,8 +1,10 @@
from __future__ import annotations

from collections.abc import Sequence
from copy import deepcopy
from typing import Any, Optional
from typing import TYPE_CHECKING, Any, Optional

if TYPE_CHECKING:
    from collections.abc import Sequence


def _retrieve_ref(path: str, schema: dict) -> dict:

@ -8,6 +8,7 @@ import logging
from collections.abc import Iterator, Mapping, Sequence
from types import MappingProxyType
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
    Optional,
@ -15,7 +16,8 @@ from typing import (
    cast,
)

from typing_extensions import TypeAlias
if TYPE_CHECKING:
    from typing_extensions import TypeAlias

logger = logging.getLogger(__name__)

@ -9,6 +9,7 @@ from contextlib import nullcontext
from functools import lru_cache, wraps
from types import GenericAlias
from typing import (
    TYPE_CHECKING,
    Any,
    Callable,
    Optional,
@ -29,13 +30,16 @@ from pydantic import (
from pydantic import (
    create_model as _create_model_base,
)
from pydantic.fields import FieldInfo as FieldInfoV2
from pydantic.json_schema import (
    DEFAULT_REF_TEMPLATE,
    GenerateJsonSchema,
    JsonSchemaMode,
    JsonSchemaValue,
)
from pydantic_core import core_schema

if TYPE_CHECKING:
    from pydantic_core import core_schema


def get_pydantic_major_version() -> int:
@ -71,8 +75,8 @@ elif PYDANTIC_MAJOR_VERSION == 2:
    from pydantic.v1.fields import FieldInfo as FieldInfoV1  # type: ignore[assignment]

    # Union type needs to be last assignment to PydanticBaseModel to make mypy happy.
    PydanticBaseModel = Union[BaseModel, pydantic.BaseModel]  # type: ignore
    TypeBaseModel = Union[type[BaseModel], type[pydantic.BaseModel]]  # type: ignore
    PydanticBaseModel = Union[BaseModel, pydantic.BaseModel]  # type: ignore[assignment,misc]
    TypeBaseModel = Union[type[BaseModel], type[pydantic.BaseModel]]  # type: ignore[misc]
else:
    msg = f"Unsupported Pydantic version: {PYDANTIC_MAJOR_VERSION}"
    raise ValueError(msg)
@ -357,7 +361,6 @@ def _create_subset_model(

if PYDANTIC_MAJOR_VERSION == 2:
    from pydantic import BaseModel as BaseModelV2
    from pydantic.fields import FieldInfo as FieldInfoV2
    from pydantic.v1 import BaseModel as BaseModelV1

    @overload

@ -25,7 +25,6 @@ import logging
import math
import warnings
from abc import ABC, abstractmethod
from collections.abc import Collection, Iterable, Iterator, Sequence
from itertools import cycle
from typing import (
    TYPE_CHECKING,
@ -43,6 +42,8 @@ from langchain_core.retrievers import BaseRetriever, LangSmithRetrieverParams
from langchain_core.runnables.config import run_in_executor

if TYPE_CHECKING:
    from collections.abc import Collection, Iterable, Iterator, Sequence

    from langchain_core.callbacks.manager import (
        AsyncCallbackManagerForRetrieverRun,
        CallbackManagerForRetrieverRun,

@ -2,7 +2,6 @@ from __future__ import annotations

import json
import uuid
from collections.abc import Iterator, Sequence
from pathlib import Path
from typing import (
    TYPE_CHECKING,
@ -13,13 +12,15 @@ from typing import (

from langchain_core._api import deprecated
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.load import dumpd, load
from langchain_core.vectorstores import VectorStore
from langchain_core.vectorstores.utils import _cosine_similarity as cosine_similarity
from langchain_core.vectorstores.utils import maximal_marginal_relevance

if TYPE_CHECKING:
    from collections.abc import Iterator, Sequence

    from langchain_core.embeddings import Embeddings
    from langchain_core.indexing import UpsertResponse

@ -77,8 +77,9 @@ target-version = "py39"


[tool.ruff.lint]
select = [ "ANN", "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TID", "TRY", "UP", "W", "YTT",]
ignore = [ "ANN401", "COM812", "UP007", "S110", "S112",]
select = [ "ANN", "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TC", "TID", "TRY", "UP", "W", "YTT",]
ignore = [ "ANN401", "COM812", "UP007", "S110", "S112", "TC001", "TC002", "TC003"]
flake8-type-checking.runtime-evaluated-base-classes = ["pydantic.BaseModel","langchain_core.load.serializable.Serializable","langchain_core.runnables.base.RunnableSerializable"]
flake8-annotations.allow-star-arg-any = true
flake8-annotations.mypy-init-return = true

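The `TC` rules enabled above (ruff's flake8-type-checking checks) are what drive the repeated import moves earlier in this diff: imports that are only needed for annotations are deferred into an `if TYPE_CHECKING:` block. A minimal sketch of the resulting pattern; the module and function below are illustrative examples, not code from this commit:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only imported by type checkers; no runtime cost and no import cycles.
    from collections.abc import Sequence

    from langchain_core.documents import Document


def count_tokens(docs: Sequence[Document]) -> int:
    # Annotations stay strings at runtime because of `from __future__ import annotations`.
    return sum(len(doc.page_content.split()) for doc in docs)
```

Because the guarded block is skipped at runtime, the annotations stay checkable while start-up cost and circular-import risk go down.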
@ -2,7 +2,7 @@

import uuid
from collections.abc import AsyncIterator, Iterator
from typing import Any, Literal, Optional, Union
from typing import TYPE_CHECKING, Any, Literal, Optional, Union

import pytest

@ -30,6 +30,9 @@ from tests.unit_tests.fake.callbacks import (
)
from tests.unit_tests.stubs import _any_id_ai_message, _any_id_ai_message_chunk

if TYPE_CHECKING:
    from langchain_core.outputs.llm_result import LLMResult


@pytest.fixture
def messages() -> list:

@ -7,8 +7,7 @@ the relevant methods.
from __future__ import annotations

import uuid
from collections.abc import Iterable, Sequence
from typing import Any, Optional
from typing import TYPE_CHECKING, Any, Optional

import pytest

@ -16,6 +15,9 @@ from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings, FakeEmbeddings
from langchain_core.vectorstores import VectorStore

if TYPE_CHECKING:
    from collections.abc import Iterable, Sequence


class CustomAddTextsVectorstore(VectorStore):
    """A vectorstore that only implements add texts."""

@ -1,9 +1,9 @@
from __future__ import annotations

import logging
import re
from typing import Any, Dict, List, Optional, Sequence, Tuple

import numpy as np
from langchain_core.callbacks import (
    CallbackManagerForChainRun,
)
@ -23,6 +23,8 @@ from langchain.chains.flare.prompts import (
)
from langchain.chains.llm import LLMChain

logger = logging.getLogger(__name__)


def _extract_tokens_and_log_probs(response: AIMessage) -> Tuple[List[str], List[float]]:
    """Extract tokens and log probabilities from chat model response."""
@ -57,7 +59,24 @@ def _low_confidence_spans(
    min_token_gap: int,
    num_pad_tokens: int,
) -> List[str]:
    _low_idx = np.where(np.exp(log_probs) < min_prob)[0]
    try:
        import numpy as np

        _low_idx = np.where(np.exp(log_probs) < min_prob)[0]
    except ImportError:
        logger.warning(
            "NumPy not found in the current Python environment. FlareChain will use a "
            "pure Python implementation for internal calculations, which may "
            "significantly impact performance, especially for large datasets. For "
            "optimal speed and efficiency, consider installing NumPy: pip install numpy"
        )
        import math

        _low_idx = [  # type: ignore[assignment]
            idx
            for idx, log_prob in enumerate(log_probs)
            if math.exp(log_prob) < min_prob
        ]
    low_idx = [i for i in _low_idx if re.search(r"\w", tokens[i])]
    if len(low_idx) == 0:
        return []

@ -5,9 +5,9 @@ https://arxiv.org/abs/2212.10496

from __future__ import annotations

import logging
from typing import Any, Dict, List, Optional

import numpy as np
from langchain_core.callbacks import CallbackManagerForChainRun
from langchain_core.embeddings import Embeddings
from langchain_core.language_models import BaseLanguageModel
@ -20,6 +20,8 @@ from langchain.chains.base import Chain
from langchain.chains.hyde.prompts import PROMPT_MAP
from langchain.chains.llm import LLMChain

logger = logging.getLogger(__name__)


class HypotheticalDocumentEmbedder(Chain, Embeddings):
    """Generate hypothetical document for query, and then embed that.
@ -54,7 +56,22 @@ class HypotheticalDocumentEmbedder(Chain, Embeddings):

    def combine_embeddings(self, embeddings: List[List[float]]) -> List[float]:
        """Combine embeddings into final embeddings."""
        return list(np.array(embeddings).mean(axis=0))
        try:
            import numpy as np

            return list(np.array(embeddings).mean(axis=0))
        except ImportError:
            logger.warning(
                "NumPy not found in the current Python environment. "
                "HypotheticalDocumentEmbedder will use a pure Python implementation "
                "for internal calculations, which may significantly impact "
                "performance, especially for large datasets. For optimal speed and "
                "efficiency, consider installing NumPy: pip install numpy"
            )
            if not embeddings:
                return []
            num_vectors = len(embeddings)
            return [sum(dim_values) / num_vectors for dim_values in zip(*embeddings)]

    def embed_query(self, text: str) -> List[float]:
        """Generate a hypothetical document and embedded it."""

@ -1,9 +1,11 @@
|
||||
"""A chain for comparing the output of two models using embeddings."""
|
||||
|
||||
import functools
|
||||
import logging
|
||||
from enum import Enum
|
||||
from importlib import util
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import numpy as np
|
||||
from langchain_core.callbacks.manager import (
|
||||
AsyncCallbackManagerForChainRun,
|
||||
CallbackManagerForChainRun,
|
||||
@ -18,6 +20,34 @@ from langchain.evaluation.schema import PairwiseStringEvaluator, StringEvaluator
|
||||
from langchain.schema import RUN_KEY
|
||||
|
||||
|
||||
def _import_numpy() -> Any:
|
||||
try:
|
||||
import numpy as np
|
||||
|
||||
return np
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Could not import numpy, please install with `pip install numpy`."
|
||||
) from e
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@functools.lru_cache(maxsize=1)
|
||||
def _check_numpy() -> bool:
|
||||
if bool(util.find_spec("numpy")):
|
||||
return True
|
||||
logger.warning(
|
||||
"NumPy not found in the current Python environment. "
|
||||
"langchain will use a pure Python implementation for embedding distance "
|
||||
"operations, which may significantly impact performance, especially for large "
|
||||
"datasets. For optimal speed and efficiency, consider installing NumPy: "
|
||||
"pip install numpy"
|
||||
)
|
||||
return False
|
||||
|
||||
|
||||
def _embedding_factory() -> Embeddings:
|
||||
"""Create an Embeddings object.
|
||||
Returns:
|
||||
@ -158,7 +188,7 @@ class _EmbeddingDistanceChainMixin(Chain):
        raise ValueError(f"Invalid metric: {metric}")

    @staticmethod
    def _cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    def _cosine_distance(a: Any, b: Any) -> Any:
        """Compute the cosine distance between two vectors.

        Args:
@ -179,7 +209,7 @@ class _EmbeddingDistanceChainMixin(Chain):
        return 1.0 - cosine_similarity(a, b)

    @staticmethod
    def _euclidean_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _euclidean_distance(a: Any, b: Any) -> Any:
        """Compute the Euclidean distance between two vectors.

        Args:
@ -189,10 +219,15 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Euclidean distance.
        """
        return np.linalg.norm(a - b)
        if _check_numpy():
            import numpy as np

            return np.linalg.norm(a - b)

        return sum((x - y) * (x - y) for x, y in zip(a, b)) ** 0.5

    @staticmethod
    def _manhattan_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _manhattan_distance(a: Any, b: Any) -> Any:
        """Compute the Manhattan distance between two vectors.

        Args:
@ -202,10 +237,14 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Manhattan distance.
        """
        return np.sum(np.abs(a - b))
        if _check_numpy():
            np = _import_numpy()
            return np.sum(np.abs(a - b))

        return sum(abs(x - y) for x, y in zip(a, b))

    @staticmethod
    def _chebyshev_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _chebyshev_distance(a: Any, b: Any) -> Any:
        """Compute the Chebyshev distance between two vectors.

        Args:
@ -215,10 +254,14 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Chebyshev distance.
        """
        return np.max(np.abs(a - b))
        if _check_numpy():
            np = _import_numpy()
            return np.max(np.abs(a - b))

        return max(abs(x - y) for x, y in zip(a, b))

    @staticmethod
    def _hamming_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _hamming_distance(a: Any, b: Any) -> Any:
        """Compute the Hamming distance between two vectors.

        Args:
@ -228,9 +271,13 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Hamming distance.
        """
        return np.mean(a != b)
        if _check_numpy():
            np = _import_numpy()
            return np.mean(a != b)

    def _compute_score(self, vectors: np.ndarray) -> float:
        return sum(1 for x, y in zip(a, b) if x != y) / len(a)

    def _compute_score(self, vectors: Any) -> float:
        """Compute the score based on the distance metric.

        Args:
@ -240,8 +287,11 @@ class _EmbeddingDistanceChainMixin(Chain):
            float: The computed score.
        """
        metric = self._get_metric(self.distance_metric)
        score = metric(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
        return score
        if _check_numpy() and isinstance(vectors, _import_numpy().ndarray):
            score = metric(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
        else:
            score = metric(vectors[0], vectors[1])
        return float(score)

class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
|
||||
@ -292,9 +342,12 @@ class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
vectors = np.array(
|
||||
self.embeddings.embed_documents([inputs["prediction"], inputs["reference"]])
|
||||
vectors = self.embeddings.embed_documents(
|
||||
[inputs["prediction"], inputs["reference"]]
|
||||
)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
@ -313,13 +366,15 @@ class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
embedded = await self.embeddings.aembed_documents(
|
||||
vectors = await self.embeddings.aembed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["reference"],
|
||||
]
|
||||
)
|
||||
vectors = np.array(embedded)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
@ -432,14 +487,15 @@ class PairwiseEmbeddingDistanceEvalChain(
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
vectors = np.array(
|
||||
self.embeddings.embed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["prediction_b"],
|
||||
]
|
||||
)
|
||||
vectors = self.embeddings.embed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["prediction_b"],
|
||||
]
|
||||
)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
@ -458,13 +514,15 @@ class PairwiseEmbeddingDistanceEvalChain(
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
embedded = await self.embeddings.aembed_documents(
|
||||
vectors = await self.embeddings.aembed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["prediction_b"],
|
||||
]
|
||||
)
|
||||
vectors = np.array(embedded)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
|
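The evaluation and chain hunks above all apply the same optional-NumPy pattern: probe for the package once, cache the answer, and fall back to pure Python when it is missing. A minimal self-contained sketch of that pattern; the function names below are illustrative and not part of the langchain API:

```python
import functools
import importlib.util
import logging

logger = logging.getLogger(__name__)


@functools.lru_cache(maxsize=1)
def _has_numpy() -> bool:
    # Cache the probe so the warning is logged at most once per process.
    if importlib.util.find_spec("numpy") is not None:
        return True
    logger.warning("NumPy not installed; falling back to pure Python.")
    return False


def euclidean(a: list[float], b: list[float]) -> float:
    if _has_numpy():
        import numpy as np

        return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    # Pure-Python fallback, slower but dependency-free.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Deferring the `import numpy as np` into the guarded branch is what lets the package drop NumPy from its required dependencies, as the pyproject and lockfile hunks below show.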
@ -1,6 +1,5 @@
from typing import Callable, Dict, Optional, Sequence

import numpy as np
from langchain_core.callbacks.manager import Callbacks
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
@ -69,6 +68,13 @@ class EmbeddingsFilter(BaseDocumentCompressor):
                "To use please install langchain-community "
                "with `pip install langchain-community`."
            )

        try:
            import numpy as np
        except ImportError as e:
            raise ImportError(
                "Could not import numpy, please install with `pip install numpy`."
            ) from e
        stateful_documents = get_stateful_documents(documents)
        embedded_documents = _get_embeddings_from_stateful_docs(
            self.embeddings, stateful_documents
@ -104,6 +110,13 @@ class EmbeddingsFilter(BaseDocumentCompressor):
                "To use please install langchain-community "
                "with `pip install langchain-community`."
            )

        try:
            import numpy as np
        except ImportError as e:
            raise ImportError(
                "Could not import numpy, please install with `pip install numpy`."
            ) from e
        stateful_documents = get_stateful_documents(documents)
        embedded_documents = await _aget_embeddings_from_stateful_docs(
            self.embeddings, stateful_documents

@ -14,8 +14,6 @@ dependencies = [
    "SQLAlchemy<3,>=1.4",
    "requests<3,>=2",
    "PyYAML>=5.3",
    "numpy<2,>=1.26.4; python_version < \"3.12\"",
    "numpy<3,>=1.26.2; python_version >= \"3.12\"",
    "async-timeout<5.0.0,>=4.0.0; python_version < \"3.11\"",
]
name = "langchain"
@ -74,6 +72,7 @@ test = [
    "langchain-openai",
    "toml>=0.10.2",
    "packaging>=24.2",
    "numpy<3,>=1.26.4",
]
codespell = ["codespell<3.0.0,>=2.2.0"]
test_integration = [
@ -102,6 +101,7 @@ typing = [
    "mypy-protobuf<4.0.0,>=3.0.0",
    "langchain-core",
    "langchain-text-splitters",
    "numpy<3,>=1.26.4",
]
dev = [
    "jupyter<2.0.0,>=1.0.0",

libs/langchain/tests/unit_tests/callbacks/test_file.py (new file, 45 lines)
@ -0,0 +1,45 @@
import pathlib
from typing import Any, Dict, List, Optional

import pytest

from langchain.callbacks import FileCallbackHandler
from langchain.chains.base import CallbackManagerForChainRun, Chain


class FakeChain(Chain):
    """Fake chain class for testing purposes."""

    be_correct: bool = True
    the_input_keys: List[str] = ["foo"]
    the_output_keys: List[str] = ["bar"]

    @property
    def input_keys(self) -> List[str]:
        """Input keys."""
        return self.the_input_keys

    @property
    def output_keys(self) -> List[str]:
        """Output key of bar."""
        return self.the_output_keys

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        return {"bar": "bar"}


def test_filecallback(capsys: pytest.CaptureFixture, tmp_path: pathlib.Path) -> Any:
    """Test the file callback handler."""
    p = tmp_path / "output.log"
    handler = FileCallbackHandler(str(p))
    chain_test = FakeChain(callbacks=[handler])
    chain_test.invoke({"foo": "bar"})
    # Assert the output is as expected
    assert p.read_text() == (
        "\n\n\x1b[1m> Entering new FakeChain "
        "chain...\x1b[0m\n\n\x1b[1m> Finished chain.\x1b[0m\n"
    )

@ -37,7 +37,6 @@ def test_required_dependencies(uv_conf: Mapping[str, Any]) -> None:
            "langchain-core",
            "langchain-text-splitters",
            "langsmith",
            "numpy",
            "pydantic",
            "requests",
        ]
@ -82,5 +81,6 @@ def test_test_group_dependencies(uv_conf: Mapping[str, Any]) -> None:
            "requests-mock",
            # TODO: temporary hack since cffi 1.17.1 doesn't work with py 3.9.
            "cffi",
            "numpy",
        ]
    )

@ -2247,8 +2247,6 @@ dependencies = [
    { name = "langchain-core" },
    { name = "langchain-text-splitters" },
    { name = "langsmith" },
    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
    { name = "pydantic" },
    { name = "pyyaml" },
    { name = "requests" },
@ -2329,6 +2327,8 @@ test = [
    { name = "langchain-tests" },
    { name = "langchain-text-splitters" },
    { name = "lark" },
    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
    { name = "packaging" },
    { name = "pandas" },
    { name = "pytest" },
@ -2359,6 +2359,8 @@ typing = [
    { name = "langchain-text-splitters" },
    { name = "mypy" },
    { name = "mypy-protobuf" },
    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
    { name = "types-chardet" },
    { name = "types-pytz" },
    { name = "types-pyyaml" },
@ -2389,8 +2391,6 @@ requires-dist = [
    { name = "langchain-together", marker = "extra == 'together'" },
    { name = "langchain-xai", marker = "extra == 'xai'" },
    { name = "langsmith", specifier = ">=0.1.17,<0.4" },
    { name = "numpy", marker = "python_full_version < '3.12'", specifier = ">=1.26.4,<2" },
    { name = "numpy", marker = "python_full_version >= '3.12'", specifier = ">=1.26.2,<3" },
    { name = "pydantic", specifier = ">=2.7.4,<3.0.0" },
    { name = "pyyaml", specifier = ">=5.3" },
    { name = "requests", specifier = ">=2,<3" },
@ -2422,6 +2422,7 @@ test = [
    { name = "langchain-tests", editable = "../standard-tests" },
    { name = "langchain-text-splitters", editable = "../text-splitters" },
    { name = "lark", specifier = ">=1.1.5,<2.0.0" },
    { name = "numpy", specifier = ">=1.26.4,<3" },
    { name = "packaging", specifier = ">=24.2" },
    { name = "pandas", specifier = ">=2.0.0,<3.0.0" },
    { name = "pytest", specifier = ">=8,<9" },
@ -2452,6 +2453,7 @@ typing = [
    { name = "langchain-text-splitters", editable = "../text-splitters" },
    { name = "mypy", specifier = ">=1.10,<2.0" },
    { name = "mypy-protobuf", specifier = ">=3.0.0,<4.0.0" },
    { name = "numpy", specifier = ">=1.26.4,<3" },
    { name = "types-chardet", specifier = ">=5.0.4.6,<6.0.0.0" },
    { name = "types-pytz", specifier = ">=2023.3.0.0,<2024.0.0.0" },
    { name = "types-pyyaml", specifier = ">=6.0.12.2,<7.0.0.0" },

@ -462,8 +462,11 @@ packages:
  - name: langchain-permit
    path: .
    repo: permitio/langchain-permit
  - name: langchain-pymupdf4llm
    path: .
    repo: lakinduboteju/langchain-pymupdf4llm
  - name: langchain-writer
    path: .
    repo: writer/langchain-writer
    downloads: 0
    downloads_updated_at: '2025-02-24T13:19:19.816059+00:00'
    downloads_updated_at: '2025-02-24T13:19:19.816059+00:00'