Merge branch 'master' into pprados/06-pdfplumber

This commit is contained in:
Philippe PRADOS 2025-02-27 10:18:41 +01:00 committed by GitHub
commit af4fde385c
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
65 changed files with 1958 additions and 781 deletions

View File

@ -50,11 +50,6 @@ locally to ensure that it looks good and is free of errors.
If you're unable to build it locally that's okay as well, as you will be able to
see a preview of the documentation on the pull request page.
From the **monorepo root**, run the following command to install the dependencies:
```bash
poetry install --with lint,docs --no-root
````
### Building
@ -158,14 +153,6 @@ the working directory to the `langchain-community` directory:
cd [root]/libs/langchain-community
```
Set up a virtual environment for the package if you haven't done so already.
Install the dependencies for the package.
```bash
poetry install --with lint
```
Then you can run the following commands to lint and format the in-code documentation:
```bash

View File

@ -0,0 +1,721 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"sidebar_label: PyMuPDF4LLM\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyMuPDF4LLMLoader\n",
"\n",
"This notebook provides a quick overview for getting started with PyMuPDF4LLM [document loader](https://python.langchain.com/docs/concepts/#document-loaders). For detailed documentation of all PyMuPDF4LLMLoader features and configurations head to the [GitHub repository](https://github.com/lakinduboteju/langchain-pymupdf4llm).\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [PyMuPDF4LLMLoader](https://github.com/lakinduboteju/langchain-pymupdf4llm) | [langchain_pymupdf4llm](https://pypi.org/project/langchain-pymupdf4llm) | ✅ | ❌ | ❌ |\n",
"\n",
"### Loader features\n",
"\n",
"| Source | Document Lazy Loading | Native Async Support | Extract Images | Extract Tables |\n",
"| :---: | :---: | :---: | :---: | :---: |\n",
"| PyMuPDF4LLMLoader | ✅ | ❌ | ✅ | ✅ |\n",
"\n",
"## Setup\n",
"\n",
"To access PyMuPDF4LLM document loader you'll need to install the `langchain-pymupdf4llm` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are required to use PyMuPDF4LLMLoader."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community** and **langchain-pymupdf4llm**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain_community langchain-pymupdf4llm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_pymupdf4llm import PyMuPDF4LLMLoader\n",
"\n",
"file_path = \"./example_data/layout-parser-paper.pdf\"\n",
"loader = PyMuPDF4LLMLoader(file_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-06-22T01:27:10+00:00', 'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2021-06-22T01:27:10+00:00', 'trapped': '', 'modDate': 'D:20210622012710Z', 'creationDate': 'D:20210622012710Z', 'page': 0}, page_content='```\\nLayoutParser: A Unified Toolkit for Deep\\n\\n## Learning Based Document Image Analysis\\n\\n```\\n\\nZejiang Shen[1] (<28>), Ruochen Zhang[2], Melissa Dell[3], Benjamin Charles Germain\\nLee[4], Jacob Carlson[3], and Weining Li[5]\\n\\n1 Allen Institute for AI\\n```\\n shannons@allenai.org\\n\\n```\\n2 Brown University\\n```\\n ruochen zhang@brown.edu\\n\\n```\\n3 Harvard University\\n_{melissadell,jacob carlson}@fas.harvard.edu_\\n4 University of Washington\\n```\\n bcgl@cs.washington.edu\\n\\n```\\n5 University of Waterloo\\n```\\n w422li@uwaterloo.ca\\n\\n```\\n\\n**Abstract. Recent advances in document image analysis (DIA) have been**\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applications. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\n[The library is publicly available at https://layout-parser.github.io.](https://layout-parser.github.io)\\n\\n**Keywords: Document Image Analysis · Deep Learning · Layout Analysis**\\n\\n - Character Recognition · Open Source library · Toolkit.\\n\\n### 1 Introduction\\n\\n\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\n\\n')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'producer': 'pdfTeX-1.40.21',\n",
" 'creator': 'LaTeX with hyperref',\n",
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
" 'source': './example_data/layout-parser-paper.pdf',\n",
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
" 'total_pages': 16,\n",
" 'format': 'PDF 1.5',\n",
" 'title': '',\n",
" 'author': '',\n",
" 'subject': '',\n",
" 'keywords': '',\n",
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
" 'trapped': '',\n",
" 'modDate': 'D:20210622012710Z',\n",
" 'creationDate': 'D:20210622012710Z',\n",
" 'page': 0}\n"
]
}
],
"source": [
"import pprint\n",
"\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
" if len(pages) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" pages = []\n",
"len(pages)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import Markdown, display\n",
"\n",
"part = pages[0].page_content[778:1189]\n",
"print(part)\n",
"# Markdown rendering\n",
"display(Markdown(part))"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'producer': 'pdfTeX-1.40.21',\n",
" 'creator': 'LaTeX with hyperref',\n",
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
" 'source': './example_data/layout-parser-paper.pdf',\n",
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
" 'total_pages': 16,\n",
" 'format': 'PDF 1.5',\n",
" 'title': '',\n",
" 'author': '',\n",
" 'subject': '',\n",
" 'keywords': '',\n",
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
" 'trapped': '',\n",
" 'modDate': 'D:20210622012710Z',\n",
" 'creationDate': 'D:20210622012710Z',\n",
" 'page': 10}\n"
]
}
],
"source": [
"pprint.pp(pages[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metadata attribute contains at least the following keys:\n",
"- source\n",
"- page (if in mode *page*)\n",
"- total_page\n",
"- creationdate\n",
"- creator\n",
"- producer\n",
"\n",
"Additional metadata are specific to each parser.\n",
"These pieces of information can be helpful (to categorize your PDFs for example)."
]
},
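{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example (an illustrative sketch reusing the `docs` loaded above), the metadata can be used to group pages by their source file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative: use the metadata to group the loaded pages by source file.\n",
"from collections import defaultdict\n",
"\n",
"pages_by_source = defaultdict(list)\n",
"for doc in docs:\n",
"    pages_by_source[doc.metadata[\"source\"]].append(doc.metadata[\"page\"])\n",
"\n",
"pprint.pp({src: len(nums) for src, nums in pages_by_source.items()})"
]
},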
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Splitting mode & custom pages delimiter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When loading the PDF file you can split it in two different ways:\n",
"- By page\n",
"- As a single text flow\n",
"\n",
"By default PyMuPDF4LLMLoader will split the PDF by page."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract the PDF by page. Each page is extracted as a langchain Document object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"16\n",
"{'producer': 'pdfTeX-1.40.21',\n",
" 'creator': 'LaTeX with hyperref',\n",
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
" 'source': './example_data/layout-parser-paper.pdf',\n",
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
" 'total_pages': 16,\n",
" 'format': 'PDF 1.5',\n",
" 'title': '',\n",
" 'author': '',\n",
" 'subject': '',\n",
" 'keywords': '',\n",
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
" 'trapped': '',\n",
" 'modDate': 'D:20210622012710Z',\n",
" 'creationDate': 'D:20210622012710Z',\n",
" 'page': 0}\n"
]
}
],
"source": [
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
")\n",
"docs = loader.load()\n",
"\n",
"print(len(docs))\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this mode the pdf is split by pages and the resulting Documents metadata contains the `page` (page number). But in some cases we could want to process the pdf as a single text flow (so we don't cut some paragraphs in half). In this case you can use the *single* mode :"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract the whole PDF as a single langchain Document object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"{'producer': 'pdfTeX-1.40.21',\n",
" 'creator': 'LaTeX with hyperref',\n",
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
" 'source': './example_data/layout-parser-paper.pdf',\n",
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
" 'total_pages': 16,\n",
" 'format': 'PDF 1.5',\n",
" 'title': '',\n",
" 'author': '',\n",
" 'subject': '',\n",
" 'keywords': '',\n",
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
" 'trapped': '',\n",
" 'modDate': 'D:20210622012710Z',\n",
" 'creationDate': 'D:20210622012710Z'}\n"
]
}
],
"source": [
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"single\",\n",
")\n",
"docs = loader.load()\n",
"\n",
"print(len(docs))\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Logically, in this mode, the `page` (page_number) metadata disappears. Here's how to clearly identify where pages end in the text flow :"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add a custom *pages_delimiter* to identify where are ends of pages in *single* mode:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"single\",\n",
" pages_delimiter=\"\\n-------THIS IS A CUSTOM END OF PAGE-------\\n\\n\",\n",
")\n",
"docs = loader.load()\n",
"\n",
"part = docs[0].page_content[10663:11317]\n",
"print(part)\n",
"display(Markdown(part))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The default `pages_delimiter` is \\n-----\\n\\n.\n",
"But this could simply be \\n, or \\f to clearly indicate a page change, or \\<!-- PAGE BREAK --> for seamless injection in a Markdown viewer without a visual effect."
]
},
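{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustrative check (reusing the `docs` loaded with the custom delimiter above), you can split the single text flow back into per-page chunks on that delimiter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative: split the single text flow back into per-page chunks\n",
"# using the custom delimiter passed to the loader above.\n",
"chunks = docs[0].page_content.split(\n",
"    \"\\n-------THIS IS A CUSTOM END OF PAGE-------\\n\\n\"\n",
")\n",
"print(len(chunks))  # roughly one chunk per page"
]
},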
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extract images from the PDF"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can extract images from your PDFs (in text form) with a choice of three different solutions:\n",
"- rapidOCR (lightweight Optical Character Recognition tool)\n",
"- Tesseract (OCR tool with high precision)\n",
"- Multimodal language model\n",
"\n",
"The result is inserted at the end of text of the page."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract images from the PDF with rapidOCR:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU rapidocr-onnxruntime pillow"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.parsers import RapidOCRBlobParser\n",
"\n",
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" extract_images=True,\n",
" images_parser=RapidOCRBlobParser(),\n",
")\n",
"docs = loader.load()\n",
"\n",
"part = docs[5].page_content[1863:]\n",
"print(part)\n",
"display(Markdown(part))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Be careful, RapidOCR is designed to work with Chinese and English, not other languages."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract images from the PDF with Tesseract:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU pytesseract"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.parsers import TesseractBlobParser\n",
"\n",
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" extract_images=True,\n",
" images_parser=TesseractBlobParser(),\n",
")\n",
"docs = loader.load()\n",
"\n",
"print(docs[5].page_content[1863:])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Extract images from the PDF with multimodal model:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain_openai"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"from getpass import getpass\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key =\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.parsers import LLMImageBlobParser\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" extract_images=True,\n",
" images_parser=LLMImageBlobParser(\n",
" model=ChatOpenAI(model=\"gpt-4o-mini\", max_tokens=1024)\n",
" ),\n",
")\n",
"docs = loader.load()\n",
"\n",
"print(docs[5].page_content[1863:])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extract tables from the PDF\n",
"\n",
"With PyMUPDF4LLM you can extract tables from your PDFs in *markdown* format :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = PyMuPDF4LLMLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" # \"lines_strict\" is the default strategy and\n",
" # is the most accurate for tables with column and row lines,\n",
" # but may not work well with all documents.\n",
" # \"lines\" is a less strict strategy that may work better with\n",
" # some documents.\n",
" # \"text\" is the least strict strategy and may work better\n",
" # with documents that do not have tables with lines.\n",
" table_strategy=\"lines\",\n",
")\n",
"docs = loader.load()\n",
"\n",
"part = docs[4].page_content[3210:]\n",
"print(part)\n",
"display(Markdown(part))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with Files\n",
"\n",
"Many document loaders involve parsing files. The difference between such loaders usually stems from how the file is parsed, rather than how the file is loaded. For example, you can use `open` to read the binary content of either a PDF or a markdown file, but you need different parsing logic to convert that binary data into text.\n",
"\n",
"As a result, it can be helpful to decouple the parsing logic from the loading logic, which makes it easier to re-use a given parser regardless of how the data was loaded.\n",
"You can use this strategy to analyze different files, with the same parsing parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import FileSystemBlobLoader\n",
"from langchain_community.document_loaders.generic import GenericLoader\n",
"from langchain_pymupdf4llm import PyMuPDF4LLMParser\n",
"\n",
"loader = GenericLoader(\n",
" blob_loader=FileSystemBlobLoader(\n",
" path=\"./example_data/\",\n",
" glob=\"*.pdf\",\n",
" ),\n",
" blob_parser=PyMuPDF4LLMParser(),\n",
")\n",
"docs = loader.load()\n",
"\n",
"part = docs[0].page_content[:562]\n",
"print(part)\n",
"display(Markdown(part))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all PyMuPDF4LLMLoader features and configurations head to the GitHub repository: https://github.com/lakinduboteju/langchain-pymupdf4llm"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.21"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@ -20,7 +20,7 @@ from langchain_community.chat_models.kinetica import ChatKinetica
The Kinetica vectorstore wrapper leverages Kinetica's native support for [vector
similarity search](https://docs.kinetica.com/7.2/vector_search/).
See [Kinetica Vectorsore API](/docs/integrations/vectorstores/kinetica) for usage. See [Kinetica Vectorstore API](/docs/integrations/vectorstores/kinetica) for usage.
```python
from langchain_community.vectorstores import Kinetica
@ -28,8 +28,8 @@ from langchain_community.vectorstores import Kinetica
## Document Loader
The Kinetica Document loader can be used to load LangChain Documents from the The Kinetica Document loader can be used to load LangChain [Documents](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) from the
Kinetica database. [Kinetica](https://www.kinetica.com/) database.
See [Kinetica Document Loader](/docs/integrations/document_loaders/kinetica) for usage

View File

@ -0,0 +1,59 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyMuPDF4LLM\n",
"\n",
"[PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) is aimed to make it easier to extract PDF content in Markdown format, needed for LLM & RAG applications.\n",
"\n",
"[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm) integrates PyMuPDF4LLM to LangChain as a Document Loader."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-pymupdf4llm"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"from langchain_pymupdf4llm import PyMuPDF4LLMLoader, PyMuPDF4LLMParser"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

File diff suppressed because it is too large

View File

@ -888,6 +888,13 @@ const FEATURE_TABLES = {
api: "Package", api: "Package",
apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html" apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html"
}, },
{
name: "PyMuPDF4LLM",
link: "pymupdf4llm",
source: "Load PDF content to Markdown using PyMuPDF4LLM",
api: "Package",
apiLink: "https://github.com/lakinduboteju/langchain-pymupdf4llm"
},
{
name: "PDFMiner",
link: "pdfminer",

View File

@ -95,7 +95,7 @@ class SQLiteVec(VectorStore):
)
self._connection.execute(
f"""
CREATE TRIGGER IF NOT EXISTS embed_text CREATE TRIGGER IF NOT EXISTS {self._table}_embed_text
AFTER INSERT ON {self._table}
BEGIN
INSERT INTO {self._table}_vec(rowid, text_embedding)

View File

@ -56,3 +56,27 @@ def test_sqlitevec_add_extra() -> None:
docsearch.add_texts(texts, metadatas)
output = docsearch.similarity_search("foo", k=10)
assert len(output) == 6
@pytest.mark.requires("sqlite-vec")
def test_sqlitevec_search_multiple_tables() -> None:
"""Test end to end construction and search with multiple tables."""
docsearch_1 = SQLiteVec.from_texts(
fake_texts,
FakeEmbeddings(),
table="table_1",
db_file=":memory:", ## change to local storage for testing
)
docsearch_2 = SQLiteVec.from_texts(
fake_texts,
FakeEmbeddings(),
table="table_2",
db_file=":memory:",
)
output_1 = docsearch_1.similarity_search("foo", k=1)
output_2 = docsearch_2.similarity_search("foo", k=1)
assert output_1 == [Document(page_content="foo", metadata={})]
assert output_2 == [Document(page_content="foo", metadata={})]
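
The practical effect of the per-table trigger name is easiest to see with two stores sharing one database file. A hedged sketch (not code from this change; it assumes write access to a local `vec.db` file and uses the community `FakeEmbeddings` as a stand-in):

```python
# Illustrative: two vector stores backed by different tables in the same
# SQLite file. With the old shared trigger name ("embed_text"), only the
# first table received an insert trigger, so rows added to the second
# table were never embedded into its *_vec shadow table.
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import SQLiteVec

embeddings = FakeEmbeddings(size=1352)  # stand-in embeddings for the sketch

store_a = SQLiteVec.from_texts(["foo"], embeddings, table="table_a", db_file="vec.db")
store_b = SQLiteVec.from_texts(["bar"], embeddings, table="table_b", db_file="vec.db")

print(store_a.similarity_search("foo", k=1))
print(store_b.similarity_search("bar", k=1))
```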

View File

@ -3,13 +3,14 @@
from __future__ import annotations from __future__ import annotations
import logging import logging
from collections.abc import Sequence
from typing import TYPE_CHECKING, Any, Optional, TypeVar, Union from typing import TYPE_CHECKING, Any, Optional, TypeVar, Union
from uuid import UUID
from tenacity import RetryCallState
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import Sequence
from uuid import UUID
from tenacity import RetryCallState
from langchain_core.agents import AgentAction, AgentFinish from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.documents import Document from langchain_core.documents import Document
from langchain_core.messages import BaseMessage from langchain_core.messages import BaseMessage

View File

@ -2,12 +2,14 @@
from __future__ import annotations from __future__ import annotations
from typing import Any, Optional, TextIO, cast from typing import TYPE_CHECKING, Any, Optional, TextIO, cast
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.callbacks import BaseCallbackHandler from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.utils.input import print_text from langchain_core.utils.input import print_text
if TYPE_CHECKING:
from langchain_core.agents import AgentAction, AgentFinish
class FileCallbackHandler(BaseCallbackHandler): class FileCallbackHandler(BaseCallbackHandler):
"""Callback Handler that writes to a file. """Callback Handler that writes to a file.
@ -45,9 +47,15 @@ class FileCallbackHandler(BaseCallbackHandler):
inputs (Dict[str, Any]): The inputs to the chain. inputs (Dict[str, Any]): The inputs to the chain.
**kwargs (Any): Additional keyword arguments. **kwargs (Any): Additional keyword arguments.
""" """
class_name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1]) if "name" in kwargs:
name = kwargs["name"]
else:
if serialized:
name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
else:
name = "<unknown>"
print_text( print_text(
f"\n\n\033[1m> Entering new {class_name} chain...\033[0m", f"\n\n\033[1m> Entering new {name} chain...\033[0m",
end="\n", end="\n",
file=self.file, file=self.file,
) )
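
A standalone sketch of the new name-resolution order (illustrative; the helper name below is made up for this note): an explicit `name` keyword argument wins, then the serialized `name` or the last element of the serialized `id`, and finally an `<unknown>` placeholder when nothing is available.

```python
from typing import Any, Optional


def resolve_chain_name(serialized: Optional[dict], **kwargs: Any) -> str:
    """Mirror the fallback order used in FileCallbackHandler.on_chain_start."""
    if "name" in kwargs:
        return kwargs["name"]
    if serialized:
        return serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
    return "<unknown>"


print(resolve_chain_name(None))  # <unknown>
print(resolve_chain_name({"id": ["langchain", "chains", "LLMChain"]}))  # LLMChain
print(resolve_chain_name({}, name="my_chain"))  # my_chain
```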

View File

@ -5,7 +5,6 @@ import functools
import logging import logging
import uuid import uuid
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import AsyncGenerator, Coroutine, Generator, Sequence
from concurrent.futures import ThreadPoolExecutor from concurrent.futures import ThreadPoolExecutor
from contextlib import asynccontextmanager, contextmanager from contextlib import asynccontextmanager, contextmanager
from contextvars import copy_context from contextvars import copy_context
@ -21,7 +20,6 @@ from typing import (
from uuid import UUID from uuid import UUID
from langsmith.run_helpers import get_tracing_context from langsmith.run_helpers import get_tracing_context
from tenacity import RetryCallState
from langchain_core.callbacks.base import ( from langchain_core.callbacks.base import (
BaseCallbackHandler, BaseCallbackHandler,
@ -39,6 +37,10 @@ from langchain_core.tracers.schemas import Run
from langchain_core.utils.env import env_var_is_set from langchain_core.utils.env import env_var_is_set
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import AsyncGenerator, Coroutine, Generator, Sequence
from tenacity import RetryCallState
from langchain_core.agents import AgentAction, AgentFinish from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.documents import Document from langchain_core.documents import Document
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult

View File

@ -17,8 +17,7 @@
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence from typing import TYPE_CHECKING, Union
from typing import Union
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
@ -29,6 +28,9 @@ from langchain_core.messages import (
get_buffer_string, get_buffer_string,
) )
if TYPE_CHECKING:
from collections.abc import Sequence
class BaseChatMessageHistory(ABC): class BaseChatMessageHistory(ABC):
"""Abstract base class for storing chat message history. """Abstract base class for storing chat message history.

View File

@ -3,16 +3,17 @@
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Iterator
from typing import TYPE_CHECKING, Optional from typing import TYPE_CHECKING, Optional
from langchain_core.documents import Document
from langchain_core.runnables import run_in_executor from langchain_core.runnables import run_in_executor
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import AsyncIterator, Iterator
from langchain_text_splitters import TextSplitter from langchain_text_splitters import TextSplitter
from langchain_core.documents.base import Blob from langchain_core.documents import Document
from langchain_core.documents.base import Blob
class BaseLoader(ABC): # noqa: B024 class BaseLoader(ABC): # noqa: B024

View File

@ -8,12 +8,15 @@ In addition, content loading code should provide a lazy loading interface by def
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Iterable from typing import TYPE_CHECKING
# Re-export Blob and PathLike for backwards compatibility # Re-export Blob and PathLike for backwards compatibility
from langchain_core.documents.base import Blob as Blob from langchain_core.documents.base import Blob as Blob
from langchain_core.documents.base import PathLike as PathLike from langchain_core.documents.base import PathLike as PathLike
if TYPE_CHECKING:
from collections.abc import Iterable
class BlobLoader(ABC): class BlobLoader(ABC):
"""Abstract interface for blob loaders implementation. """Abstract interface for blob loaders implementation.

View File

@ -2,15 +2,17 @@ from __future__ import annotations
import contextlib import contextlib
import mimetypes import mimetypes
from collections.abc import Generator
from io import BufferedReader, BytesIO from io import BufferedReader, BytesIO
from pathlib import PurePath from pathlib import PurePath
from typing import Any, Literal, Optional, Union, cast from typing import TYPE_CHECKING, Any, Literal, Optional, Union, cast
from pydantic import ConfigDict, Field, field_validator, model_validator from pydantic import ConfigDict, Field, field_validator, model_validator
from langchain_core.load.serializable import Serializable from langchain_core.load.serializable import Serializable
if TYPE_CHECKING:
from collections.abc import Generator
PathLike = Union[str, PurePath] PathLike = Union[str, PurePath]

View File

@ -1,15 +1,18 @@
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence from typing import TYPE_CHECKING, Optional
from typing import Optional
from pydantic import BaseModel from pydantic import BaseModel
from langchain_core.callbacks import Callbacks
from langchain_core.documents import Document
from langchain_core.runnables import run_in_executor from langchain_core.runnables import run_in_executor
if TYPE_CHECKING:
from collections.abc import Sequence
from langchain_core.callbacks import Callbacks
from langchain_core.documents import Document
class BaseDocumentCompressor(BaseModel, ABC): class BaseDocumentCompressor(BaseModel, ABC):
"""Base class for document compressors. """Base class for document compressors.

View File

@ -1,12 +1,13 @@
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence
from typing import TYPE_CHECKING, Any from typing import TYPE_CHECKING, Any
from langchain_core.runnables.config import run_in_executor from langchain_core.runnables.config import run_in_executor
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import Sequence
from langchain_core.documents import Document from langchain_core.documents import Document

View File

@ -7,11 +7,11 @@ from typing import TYPE_CHECKING, Any, Optional
from pydantic import BaseModel, ConfigDict from pydantic import BaseModel, ConfigDict
from langchain_core.documents import Document
from langchain_core.example_selectors.base import BaseExampleSelector from langchain_core.example_selectors.base import BaseExampleSelector
from langchain_core.vectorstores import VectorStore from langchain_core.vectorstores import VectorStore
if TYPE_CHECKING: if TYPE_CHECKING:
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings from langchain_core.embeddings import Embeddings

View File

@ -3,14 +3,17 @@ from __future__ import annotations
import abc import abc
import time import time
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence from typing import TYPE_CHECKING, Any, Optional, TypedDict
from typing import Any, Optional, TypedDict
from langchain_core._api import beta from langchain_core._api import beta
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever from langchain_core.retrievers import BaseRetriever
from langchain_core.runnables import run_in_executor from langchain_core.runnables import run_in_executor
if TYPE_CHECKING:
from collections.abc import Sequence
from langchain_core.documents import Document
class RecordManager(ABC): class RecordManager(ABC):
"""Abstract base class representing the interface for a record manager. """Abstract base class representing the interface for a record manager.

View File

@ -4,7 +4,6 @@ import asyncio
import inspect import inspect
import json import json
import typing import typing
import uuid
import warnings import warnings
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Iterator, Sequence from collections.abc import AsyncIterator, Iterator, Sequence
@ -70,6 +69,8 @@ from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_core.utils.pydantic import TypeBaseModel, is_basemodel_subclass from langchain_core.utils.pydantic import TypeBaseModel, is_basemodel_subclass
if TYPE_CHECKING: if TYPE_CHECKING:
import uuid
from langchain_core.output_parsers.base import OutputParserLike from langchain_core.output_parsers.base import OutputParserLike
from langchain_core.runnables import Runnable, RunnableConfig from langchain_core.runnables import Runnable, RunnableConfig
from langchain_core.tools import BaseTool from langchain_core.tools import BaseTool

View File

@ -7,12 +7,12 @@ import functools
import inspect import inspect
import json import json
import logging import logging
import uuid
import warnings import warnings
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import AsyncIterator, Iterator, Sequence from collections.abc import AsyncIterator, Iterator, Sequence
from pathlib import Path from pathlib import Path
from typing import ( from typing import (
TYPE_CHECKING,
Any, Any,
Callable, Callable,
Optional, Optional,
@ -61,6 +61,9 @@ from langchain_core.prompt_values import ChatPromptValue, PromptValue, StringPro
from langchain_core.runnables import RunnableConfig, ensure_config, get_config_list from langchain_core.runnables import RunnableConfig, ensure_config, get_config_list
from langchain_core.runnables.config import run_in_executor from langchain_core.runnables.config import run_in_executor
if TYPE_CHECKING:
import uuid
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)

View File

@ -1,6 +1,5 @@
from __future__ import annotations from __future__ import annotations
from collections.abc import Sequence
from typing import TYPE_CHECKING, Any, Optional, Union, cast from typing import TYPE_CHECKING, Any, Optional, Union, cast
from pydantic import ConfigDict, Field, field_validator from pydantic import ConfigDict, Field, field_validator
@ -11,6 +10,8 @@ from langchain_core.utils._merge import merge_dicts, merge_lists
from langchain_core.utils.interactive_env import is_interactive_env from langchain_core.utils.interactive_env import is_interactive_env
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import Sequence
from langchain_core.prompts.chat import ChatPromptTemplate from langchain_core.prompts.chat import ChatPromptTemplate

View File

@ -4,14 +4,16 @@ import csv
import re import re
from abc import abstractmethod from abc import abstractmethod
from collections import deque from collections import deque
from collections.abc import AsyncIterator, Iterator
from io import StringIO from io import StringIO
from typing import TYPE_CHECKING, TypeVar, Union
from typing import Optional as Optional from typing import Optional as Optional
from typing import TypeVar, Union
from langchain_core.messages import BaseMessage from langchain_core.messages import BaseMessage
from langchain_core.output_parsers.transform import BaseTransformOutputParser from langchain_core.output_parsers.transform import BaseTransformOutputParser
if TYPE_CHECKING:
from collections.abc import AsyncIterator, Iterator
T = TypeVar("T") T = TypeVar("T")

View File

@ -1,6 +1,5 @@
from __future__ import annotations from __future__ import annotations
from collections.abc import AsyncIterator, Iterator
from typing import ( from typing import (
TYPE_CHECKING, TYPE_CHECKING,
Any, Any,
@ -19,6 +18,8 @@ from langchain_core.outputs import (
from langchain_core.runnables.config import run_in_executor from langchain_core.runnables.config import run_in_executor
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import AsyncIterator, Iterator
from langchain_core.runnables import RunnableConfig from langchain_core.runnables import RunnableConfig

View File

@ -1,14 +1,16 @@
from __future__ import annotations from __future__ import annotations
from typing import Literal, Union from typing import TYPE_CHECKING, Literal, Union
from pydantic import model_validator from pydantic import model_validator
from typing_extensions import Self
from langchain_core.messages import BaseMessage, BaseMessageChunk from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.outputs.generation import Generation from langchain_core.outputs.generation import Generation
from langchain_core.utils._merge import merge_dicts from langchain_core.utils._merge import merge_dicts
if TYPE_CHECKING:
from typing_extensions import Self
class ChatGeneration(Generation): class ChatGeneration(Generation):
"""A single chat generation output. """A single chat generation output.

View File

@ -3,9 +3,8 @@
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence
from pathlib import Path
from typing import ( from typing import (
TYPE_CHECKING,
Annotated, Annotated,
Any, Any,
Optional, Optional,
@ -47,6 +46,10 @@ from langchain_core.prompts.string import (
from langchain_core.utils import get_colored_text from langchain_core.utils import get_colored_text
from langchain_core.utils.interactive_env import is_interactive_env from langchain_core.utils.interactive_env import is_interactive_env
if TYPE_CHECKING:
from collections.abc import Sequence
from pathlib import Path
class BaseMessagePromptTemplate(Serializable, ABC): class BaseMessagePromptTemplate(Serializable, ABC):
"""Base class for message prompt templates.""" """Base class for message prompt templates."""

View File

@ -2,8 +2,7 @@
from __future__ import annotations from __future__ import annotations
from pathlib import Path from typing import TYPE_CHECKING, Any, Literal, Optional, Union
from typing import Any, Literal, Optional, Union
from pydantic import ( from pydantic import (
BaseModel, BaseModel,
@ -11,7 +10,6 @@ from pydantic import (
Field, Field,
model_validator, model_validator,
) )
from typing_extensions import Self
from langchain_core.example_selectors import BaseExampleSelector from langchain_core.example_selectors import BaseExampleSelector
from langchain_core.messages import BaseMessage, get_buffer_string from langchain_core.messages import BaseMessage, get_buffer_string
@ -27,6 +25,11 @@ from langchain_core.prompts.string import (
get_template_variables, get_template_variables,
) )
if TYPE_CHECKING:
from pathlib import Path
from typing_extensions import Self
class _FewShotPromptTemplateMixin(BaseModel): class _FewShotPromptTemplateMixin(BaseModel):
"""Prompt template that contains few shot examples.""" """Prompt template that contains few shot examples."""

View File

@ -3,8 +3,7 @@
from __future__ import annotations from __future__ import annotations
import warnings import warnings
from pathlib import Path from typing import TYPE_CHECKING, Any, Optional, Union
from typing import Any, Optional, Union
from pydantic import BaseModel, model_validator from pydantic import BaseModel, model_validator
@ -16,7 +15,11 @@ from langchain_core.prompts.string import (
get_template_variables, get_template_variables,
mustache_schema, mustache_schema,
) )
from langchain_core.runnables.config import RunnableConfig
if TYPE_CHECKING:
from pathlib import Path
from langchain_core.runnables.config import RunnableConfig
class PromptTemplate(StringPromptTemplate): class PromptTemplate(StringPromptTemplate):

View File

@ -60,7 +60,6 @@ from langchain_core.runnables.config import (
run_in_executor, run_in_executor,
) )
from langchain_core.runnables.graph import Graph from langchain_core.runnables.graph import Graph
from langchain_core.runnables.schema import StreamEvent
from langchain_core.runnables.utils import ( from langchain_core.runnables.utils import (
AddableDict, AddableDict,
AnyConfigurableField, AnyConfigurableField,
@ -94,6 +93,7 @@ if TYPE_CHECKING:
from langchain_core.runnables.fallbacks import ( from langchain_core.runnables.fallbacks import (
RunnableWithFallbacks as RunnableWithFallbacksT, RunnableWithFallbacks as RunnableWithFallbacksT,
) )
from langchain_core.runnables.schema import StreamEvent
from langchain_core.tools import BaseTool from langchain_core.tools import BaseTool
from langchain_core.tracers.log_stream import ( from langchain_core.tracers.log_stream import (
RunLog, RunLog,

View File

@ -7,6 +7,7 @@ from collections.abc import AsyncIterator, Iterator, Sequence
from collections.abc import Mapping as Mapping from collections.abc import Mapping as Mapping
from functools import wraps from functools import wraps
from typing import ( from typing import (
TYPE_CHECKING,
Any, Any,
Callable, Callable,
Optional, Optional,
@ -26,7 +27,6 @@ from langchain_core.runnables.config import (
get_executor_for_config, get_executor_for_config,
merge_configs, merge_configs,
) )
from langchain_core.runnables.graph import Graph
from langchain_core.runnables.utils import ( from langchain_core.runnables.utils import (
AnyConfigurableField, AnyConfigurableField,
ConfigurableField, ConfigurableField,
@ -39,6 +39,9 @@ from langchain_core.runnables.utils import (
get_unique_config_specs, get_unique_config_specs,
) )
if TYPE_CHECKING:
from langchain_core.runnables.graph import Graph
class DynamicRunnable(RunnableSerializable[Input, Output]): class DynamicRunnable(RunnableSerializable[Input, Output]):
"""Serializable Runnable that can be dynamically configured. """Serializable Runnable that can be dynamically configured.

View File

@ -2,7 +2,6 @@ from __future__ import annotations
import inspect import inspect
from collections import defaultdict from collections import defaultdict
from collections.abc import Sequence
from dataclasses import dataclass, field from dataclasses import dataclass, field
from enum import Enum from enum import Enum
from typing import ( from typing import (
@ -18,11 +17,13 @@ from typing import (
) )
from uuid import UUID, uuid4 from uuid import UUID, uuid4
from pydantic import BaseModel
from langchain_core.utils.pydantic import _IgnoreUnserializable, is_basemodel_subclass from langchain_core.utils.pydantic import _IgnoreUnserializable, is_basemodel_subclass
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import Sequence
from pydantic import BaseModel
from langchain_core.runnables.base import Runnable as RunnableType from langchain_core.runnables.base import Runnable as RunnableType

View File

@ -5,7 +5,7 @@ from __future__ import annotations
import asyncio import asyncio
import inspect import inspect
import threading import threading
from collections.abc import AsyncIterator, Awaitable, Iterator, Mapping from collections.abc import Awaitable
from typing import ( from typing import (
TYPE_CHECKING, TYPE_CHECKING,
Any, Any,
@ -32,7 +32,6 @@ from langchain_core.runnables.config import (
get_executor_for_config, get_executor_for_config,
patch_config, patch_config,
) )
from langchain_core.runnables.graph import Graph
from langchain_core.runnables.utils import ( from langchain_core.runnables.utils import (
AddableDict, AddableDict,
ConfigurableFieldSpec, ConfigurableFieldSpec,
@ -42,10 +41,13 @@ from langchain_core.utils.iter import safetee
from langchain_core.utils.pydantic import create_model_v2 from langchain_core.utils.pydantic import create_model_v2
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import AsyncIterator, Iterator, Mapping
from langchain_core.callbacks.manager import ( from langchain_core.callbacks.manager import (
AsyncCallbackManagerForChainRun, AsyncCallbackManagerForChainRun,
CallbackManagerForChainRun, CallbackManagerForChainRun,
) )
from langchain_core.runnables.graph import Graph
def identity(x: Other) -> Other: def identity(x: Other) -> Other:

View File

@ -1,8 +1,9 @@
from __future__ import annotations from __future__ import annotations
from collections.abc import AsyncIterator, Iterator, Mapping from collections.abc import Mapping
from itertools import starmap from itertools import starmap
from typing import ( from typing import (
TYPE_CHECKING,
Any, Any,
Callable, Callable,
Optional, Optional,
@ -31,6 +32,9 @@ from langchain_core.runnables.utils import (
get_unique_config_specs, get_unique_config_specs,
) )
if TYPE_CHECKING:
from collections.abc import AsyncIterator, Iterator
class RouterInput(TypedDict): class RouterInput(TypedDict):
"""Router input. """Router input.

View File

@ -2,11 +2,13 @@
from __future__ import annotations from __future__ import annotations
from collections.abc import Sequence from typing import TYPE_CHECKING, Any, Literal, Union
from typing import Any, Literal, Union
from typing_extensions import NotRequired, TypedDict from typing_extensions import NotRequired, TypedDict
if TYPE_CHECKING:
from collections.abc import Sequence
class EventData(TypedDict, total=False): class EventData(TypedDict, total=False):
"""Data associated with a streaming event.""" """Data associated with a streaming event."""

View File

@ -6,19 +6,11 @@ import ast
import asyncio import asyncio
import inspect import inspect
import textwrap import textwrap
from collections.abc import (
AsyncIterable,
AsyncIterator,
Awaitable,
Coroutine,
Iterable,
Mapping,
Sequence,
)
from functools import lru_cache from functools import lru_cache
from inspect import signature from inspect import signature
from itertools import groupby from itertools import groupby
from typing import ( from typing import (
TYPE_CHECKING,
Any, Any,
Callable, Callable,
NamedTuple, NamedTuple,
@ -30,11 +22,22 @@ from typing import (
from typing_extensions import TypeGuard, override from typing_extensions import TypeGuard, override
from langchain_core.runnables.schema import StreamEvent
# Re-export create-model for backwards compatibility # Re-export create-model for backwards compatibility
from langchain_core.utils.pydantic import create_model as create_model from langchain_core.utils.pydantic import create_model as create_model
if TYPE_CHECKING:
from collections.abc import (
AsyncIterable,
AsyncIterator,
Awaitable,
Coroutine,
Iterable,
Mapping,
Sequence,
)
from langchain_core.runnables.schema import StreamEvent
Input = TypeVar("Input", contravariant=True) Input = TypeVar("Input", contravariant=True)
# Output type should implement __concat__, as eg str, list, dict do # Output type should implement __concat__, as eg str, list, dict do
Output = TypeVar("Output", covariant=True) Output = TypeVar("Output", covariant=True)

View File

@ -3,12 +3,14 @@
from __future__ import annotations from __future__ import annotations
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence
from enum import Enum from enum import Enum
from typing import Any, Optional, Union from typing import TYPE_CHECKING, Any, Optional, Union
from pydantic import BaseModel from pydantic import BaseModel
if TYPE_CHECKING:
from collections.abc import Sequence
class Visitor(ABC): class Visitor(ABC):
"""Defines interface for IR translation using a visitor pattern.""" """Defines interface for IR translation using a visitor pattern."""

View File

@ -4,13 +4,12 @@ import asyncio
import functools import functools
import inspect import inspect
import json import json
import uuid
import warnings import warnings
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence
from contextvars import copy_context from contextvars import copy_context
from inspect import signature from inspect import signature
from typing import ( from typing import (
TYPE_CHECKING,
Annotated, Annotated,
Any, Any,
Callable, Callable,
@ -68,6 +67,10 @@ from langchain_core.utils.pydantic import (
is_pydantic_v2_subclass, is_pydantic_v2_subclass,
) )
if TYPE_CHECKING:
import uuid
from collections.abc import Sequence
FILTERED_ARGS = ("run_manager", "callbacks") FILTERED_ARGS = ("run_manager", "callbacks")

View File

@ -1,21 +1,23 @@
from __future__ import annotations from __future__ import annotations
from functools import partial from functools import partial
from typing import Literal, Optional, Union from typing import TYPE_CHECKING, Literal, Optional, Union
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from langchain_core.callbacks import Callbacks
from langchain_core.documents import Document
from langchain_core.prompts import ( from langchain_core.prompts import (
BasePromptTemplate, BasePromptTemplate,
PromptTemplate, PromptTemplate,
aformat_document, aformat_document,
format_document, format_document,
) )
from langchain_core.retrievers import BaseRetriever
from langchain_core.tools.simple import Tool from langchain_core.tools.simple import Tool
if TYPE_CHECKING:
from langchain_core.callbacks import Callbacks
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
class RetrieverInput(BaseModel): class RetrieverInput(BaseModel):
"""Input to the retriever.""" """Input to the retriever."""

View File

@ -3,6 +3,7 @@ from __future__ import annotations
from collections.abc import Awaitable from collections.abc import Awaitable
from inspect import signature from inspect import signature
from typing import ( from typing import (
TYPE_CHECKING,
Any, Any,
Callable, Callable,
Optional, Optional,
@ -13,7 +14,6 @@ from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun, AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun, CallbackManagerForToolRun,
) )
from langchain_core.messages import ToolCall
from langchain_core.runnables import RunnableConfig, run_in_executor from langchain_core.runnables import RunnableConfig, run_in_executor
from langchain_core.tools.base import ( from langchain_core.tools.base import (
ArgsSchema, ArgsSchema,
@ -22,6 +22,9 @@ from langchain_core.tools.base import (
_get_runnable_config_param, _get_runnable_config_param,
) )
if TYPE_CHECKING:
from langchain_core.messages import ToolCall
class Tool(BaseTool): class Tool(BaseTool):
"""Tool that takes in function or coroutine directly.""" """Tool that takes in function or coroutine directly."""

View File

@ -4,6 +4,7 @@ import textwrap
from collections.abc import Awaitable from collections.abc import Awaitable
from inspect import signature from inspect import signature
from typing import ( from typing import (
TYPE_CHECKING,
Annotated, Annotated,
Any, Any,
Callable, Callable,
@ -18,7 +19,6 @@ from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun, AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun, CallbackManagerForToolRun,
) )
from langchain_core.messages import ToolCall
from langchain_core.runnables import RunnableConfig, run_in_executor from langchain_core.runnables import RunnableConfig, run_in_executor
from langchain_core.tools.base import ( from langchain_core.tools.base import (
FILTERED_ARGS, FILTERED_ARGS,
@ -29,6 +29,9 @@ from langchain_core.tools.base import (
) )
from langchain_core.utils.pydantic import is_basemodel_subclass from langchain_core.utils.pydantic import is_basemodel_subclass
if TYPE_CHECKING:
from langchain_core.messages import ToolCall
class StructuredTool(BaseTool): class StructuredTool(BaseTool):
"""Tool that can operate on any number of inputs.""" """Tool that can operate on any number of inputs."""

View File

@ -5,26 +5,27 @@ from __future__ import annotations
import asyncio import asyncio
import logging import logging
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections.abc import Sequence
from typing import ( from typing import (
TYPE_CHECKING, TYPE_CHECKING,
Any, Any,
Optional, Optional,
Union, Union,
) )
from uuid import UUID
from tenacity import RetryCallState
from langchain_core.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler from langchain_core.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
from langchain_core.exceptions import TracerException # noqa from langchain_core.exceptions import TracerException # noqa
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
from langchain_core.tracers.core import _TracerCore from langchain_core.tracers.core import _TracerCore
from langchain_core.tracers.schemas import Run
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import Sequence
from uuid import UUID
from tenacity import RetryCallState
from langchain_core.documents import Document from langchain_core.documents import Document
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
from langchain_core.tracers.schemas import Run
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)

View File

@@ -1,6 +1,5 @@
 from __future__ import annotations
-from collections.abc import Generator
 from contextlib import contextmanager
 from contextvars import ContextVar
 from typing import (
@@ -18,13 +17,15 @@ from langsmith import utils as ls_utils
 from langchain_core.tracers.langchain import LangChainTracer
 from langchain_core.tracers.run_collector import RunCollectorCallbackHandler
-from langchain_core.tracers.schemas import TracerSessionV1
 if TYPE_CHECKING:
+    from collections.abc import Generator
     from langsmith import Client as LangSmithClient
     from langchain_core.callbacks.base import BaseCallbackHandler, Callbacks
     from langchain_core.callbacks.manager import AsyncCallbackManager, CallbackManager
+    from langchain_core.tracers.schemas import TracerSessionV1
 # for backwards partial compatibility if this is imported by users but unused
 tracing_callback_var: Any = None

View File

@@ -6,7 +6,6 @@ import logging
 import sys
 import traceback
 from abc import ABC, abstractmethod
-from collections.abc import Coroutine, Sequence
 from datetime import datetime, timezone
 from typing import (
     TYPE_CHECKING,
@@ -16,13 +15,9 @@ from typing import (
     Union,
     cast,
 )
-from uuid import UUID
-from tenacity import RetryCallState
 from langchain_core.exceptions import TracerException
 from langchain_core.load import dumpd
-from langchain_core.messages import BaseMessage
 from langchain_core.outputs import (
     ChatGeneration,
     ChatGenerationChunk,
@@ -32,7 +27,13 @@ from langchain_core.outputs import (
 from langchain_core.tracers.schemas import Run
 if TYPE_CHECKING:
+    from collections.abc import Coroutine, Sequence
+    from uuid import UUID
+    from tenacity import RetryCallState
     from langchain_core.documents import Document
+    from langchain_core.messages import BaseMessage
 logger = logging.getLogger(__name__)

View File

@@ -5,9 +5,8 @@ from __future__ import annotations
 import logging
 import threading
 import weakref
-from collections.abc import Sequence
 from concurrent.futures import Future, ThreadPoolExecutor, wait
-from typing import Any, Optional, Union, cast
+from typing import TYPE_CHECKING, Any, Optional, Union, cast
 from uuid import UUID
 import langsmith
@@ -17,7 +16,11 @@ from langchain_core.tracers import langchain as langchain_tracer
 from langchain_core.tracers.base import BaseTracer
 from langchain_core.tracers.context import tracing_v2_enabled
 from langchain_core.tracers.langchain import _get_executor
-from langchain_core.tracers.schemas import Run
+if TYPE_CHECKING:
+    from collections.abc import Sequence
+    from langchain_core.tracers.schemas import Run
 logger = logging.getLogger(__name__)

View File

@@ -5,7 +5,6 @@ from __future__ import annotations
 import asyncio
 import contextlib
 import logging
-from collections.abc import AsyncIterator, Iterator, Sequence
 from typing import (
     TYPE_CHECKING,
     Any,
@@ -37,13 +36,15 @@ from langchain_core.runnables.utils import (
     _RootEventFilter,
 )
 from langchain_core.tracers._streaming import _StreamingCallbackHandler
-from langchain_core.tracers.log_stream import LogEntry
 from langchain_core.tracers.memory_stream import _MemoryStream
 from langchain_core.utils.aiter import aclosing, py_anext
 if TYPE_CHECKING:
+    from collections.abc import AsyncIterator, Iterator, Sequence
     from langchain_core.documents import Document
     from langchain_core.runnables import Runnable, RunnableConfig
+    from langchain_core.tracers.log_stream import LogEntry
 logger = logging.getLogger(__name__)

View File

@@ -22,12 +22,12 @@ from tenacity import (
 from langchain_core.env import get_runtime_environment
 from langchain_core.load import dumpd
-from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
 from langchain_core.tracers.base import BaseTracer
 from langchain_core.tracers.schemas import Run
 if TYPE_CHECKING:
     from langchain_core.messages import BaseMessage
+    from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
 logger = logging.getLogger(__name__)
 _LOGGED = set()

View File

@@ -5,8 +5,8 @@ import contextlib
 import copy
 import threading
 from collections import defaultdict
-from collections.abc import AsyncIterator, Iterator, Sequence
 from typing import (
+    TYPE_CHECKING,
     Any,
     Literal,
     Optional,
@@ -14,7 +14,6 @@ from typing import (
     Union,
     overload,
 )
-from uuid import UUID
 import jsonpatch  # type: ignore[import]
 from typing_extensions import NotRequired, TypedDict
@@ -23,11 +22,16 @@ from langchain_core.load import dumps
 from langchain_core.load.load import load
 from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
 from langchain_core.runnables import Runnable, RunnableConfig, ensure_config
-from langchain_core.runnables.utils import Input, Output
 from langchain_core.tracers._streaming import _StreamingCallbackHandler
 from langchain_core.tracers.base import BaseTracer
 from langchain_core.tracers.memory_stream import _MemoryStream
-from langchain_core.tracers.schemas import Run
+if TYPE_CHECKING:
+    from collections.abc import AsyncIterator, Iterator, Sequence
+    from uuid import UUID
+    from langchain_core.runnables.utils import Input, Output
+    from langchain_core.tracers.schemas import Run
 class LogEntry(TypedDict):

View File

@@ -1,6 +1,5 @@
 from collections.abc import Awaitable
-from typing import Callable, Optional, Union
-from uuid import UUID
+from typing import TYPE_CHECKING, Callable, Optional, Union
 from langchain_core.runnables.config import (
     RunnableConfig,
@@ -10,6 +9,9 @@ from langchain_core.runnables.config import (
 from langchain_core.tracers.base import AsyncBaseTracer, BaseTracer
 from langchain_core.tracers.schemas import Run
+if TYPE_CHECKING:
+    from uuid import UUID
 Listener = Union[Callable[[Run], None], Callable[[Run, RunnableConfig], None]]
 AsyncListener = Union[
     Callable[[Run], Awaitable[None]], Callable[[Run, RunnableConfig], Awaitable[None]]

View File

@@ -1,8 +1,10 @@
 from __future__ import annotations
-from collections.abc import Sequence
 from copy import deepcopy
-from typing import Any, Optional
+from typing import TYPE_CHECKING, Any, Optional
+if TYPE_CHECKING:
+    from collections.abc import Sequence
 def _retrieve_ref(path: str, schema: dict) -> dict:

View File

@@ -8,6 +8,7 @@ import logging
 from collections.abc import Iterator, Mapping, Sequence
 from types import MappingProxyType
 from typing import (
+    TYPE_CHECKING,
     Any,
     Literal,
     Optional,
@@ -15,7 +16,8 @@ from typing import (
     cast,
 )
-from typing_extensions import TypeAlias
+if TYPE_CHECKING:
+    from typing_extensions import TypeAlias
 logger = logging.getLogger(__name__)

View File

@@ -9,6 +9,7 @@ from contextlib import nullcontext
 from functools import lru_cache, wraps
 from types import GenericAlias
 from typing import (
+    TYPE_CHECKING,
     Any,
     Callable,
     Optional,
@@ -29,13 +30,16 @@ from pydantic import (
 from pydantic import (
     create_model as _create_model_base,
 )
+from pydantic.fields import FieldInfo as FieldInfoV2
 from pydantic.json_schema import (
     DEFAULT_REF_TEMPLATE,
     GenerateJsonSchema,
     JsonSchemaMode,
     JsonSchemaValue,
 )
-from pydantic_core import core_schema
+if TYPE_CHECKING:
+    from pydantic_core import core_schema
 def get_pydantic_major_version() -> int:
@@ -71,8 +75,8 @@ elif PYDANTIC_MAJOR_VERSION == 2:
     from pydantic.v1.fields import FieldInfo as FieldInfoV1  # type: ignore[assignment]
     # Union type needs to be last assignment to PydanticBaseModel to make mypy happy.
-    PydanticBaseModel = Union[BaseModel, pydantic.BaseModel]  # type: ignore
-    TypeBaseModel = Union[type[BaseModel], type[pydantic.BaseModel]]  # type: ignore
+    PydanticBaseModel = Union[BaseModel, pydantic.BaseModel]  # type: ignore[assignment,misc]
+    TypeBaseModel = Union[type[BaseModel], type[pydantic.BaseModel]]  # type: ignore[misc]
 else:
     msg = f"Unsupported Pydantic version: {PYDANTIC_MAJOR_VERSION}"
     raise ValueError(msg)
@@ -357,7 +361,6 @@ def _create_subset_model(
 if PYDANTIC_MAJOR_VERSION == 2:
     from pydantic import BaseModel as BaseModelV2
-    from pydantic.fields import FieldInfo as FieldInfoV2
     from pydantic.v1 import BaseModel as BaseModelV1
     @overload

View File

@@ -25,7 +25,6 @@ import logging
 import math
 import warnings
 from abc import ABC, abstractmethod
-from collections.abc import Collection, Iterable, Iterator, Sequence
 from itertools import cycle
 from typing import (
     TYPE_CHECKING,
@@ -43,6 +42,8 @@ from langchain_core.retrievers import BaseRetriever, LangSmithRetrieverParams
 from langchain_core.runnables.config import run_in_executor
 if TYPE_CHECKING:
+    from collections.abc import Collection, Iterable, Iterator, Sequence
     from langchain_core.callbacks.manager import (
         AsyncCallbackManagerForRetrieverRun,
         CallbackManagerForRetrieverRun,

View File

@@ -2,7 +2,6 @@ from __future__ import annotations
 import json
 import uuid
-from collections.abc import Iterator, Sequence
 from pathlib import Path
 from typing import (
     TYPE_CHECKING,
@@ -13,13 +12,15 @@ from typing import (
 from langchain_core._api import deprecated
 from langchain_core.documents import Document
-from langchain_core.embeddings import Embeddings
 from langchain_core.load import dumpd, load
 from langchain_core.vectorstores import VectorStore
 from langchain_core.vectorstores.utils import _cosine_similarity as cosine_similarity
 from langchain_core.vectorstores.utils import maximal_marginal_relevance
 if TYPE_CHECKING:
+    from collections.abc import Iterator, Sequence
+    from langchain_core.embeddings import Embeddings
     from langchain_core.indexing import UpsertResponse

View File

@@ -77,8 +77,9 @@ target-version = "py39"
 [tool.ruff.lint]
-select = [ "ANN", "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TID", "TRY", "UP", "W", "YTT",]
-ignore = [ "ANN401", "COM812", "UP007", "S110", "S112",]
+select = [ "ANN", "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TC", "TID", "TRY", "UP", "W", "YTT",]
+ignore = [ "ANN401", "COM812", "UP007", "S110", "S112", "TC001", "TC002", "TC003"]
+flake8-type-checking.runtime-evaluated-base-classes = ["pydantic.BaseModel","langchain_core.load.serializable.Serializable","langchain_core.runnables.base.RunnableSerializable"]
 flake8-annotations.allow-star-arg-any = true
 flake8-annotations.mypy-init-return = true

View File
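This `pyproject.toml` hunk enables ruff's flake8-type-checking (`TC`) rule family, ignores TC001-TC003, and lists base classes whose field annotations are evaluated at runtime, so the linter will not push imports needed by pydantic models or LangChain serializable runnables into `TYPE_CHECKING` blocks. A short sketch of why such imports must stay at module level (illustrative only; the `Price` model is invented, not from the diff):

```python
from pydantic import BaseModel

# Pydantic resolves field annotations when the class is created, so `Decimal`
# must be importable at runtime. Guarding it with `if TYPE_CHECKING:` would make
# building this model fail, which is what runtime-evaluated-base-classes tells
# ruff's TC rules to account for.
from decimal import Decimal


class Price(BaseModel):
    amount: Decimal
    currency: str = "USD"


print(Price(amount=Decimal("9.99")))
```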

@@ -2,7 +2,7 @@
 import uuid
 from collections.abc import AsyncIterator, Iterator
-from typing import Any, Literal, Optional, Union
+from typing import TYPE_CHECKING, Any, Literal, Optional, Union
 import pytest
@@ -30,6 +30,9 @@ from tests.unit_tests.fake.callbacks import (
 )
 from tests.unit_tests.stubs import _any_id_ai_message, _any_id_ai_message_chunk
+if TYPE_CHECKING:
+    from langchain_core.outputs.llm_result import LLMResult
 @pytest.fixture
 def messages() -> list:

View File

@@ -7,8 +7,7 @@ the relevant methods.
 from __future__ import annotations
 import uuid
-from collections.abc import Iterable, Sequence
-from typing import Any, Optional
+from typing import TYPE_CHECKING, Any, Optional
 import pytest
@@ -16,6 +15,9 @@ from langchain_core.documents import Document
 from langchain_core.embeddings import Embeddings, FakeEmbeddings
 from langchain_core.vectorstores import VectorStore
+if TYPE_CHECKING:
+    from collections.abc import Iterable, Sequence
 class CustomAddTextsVectorstore(VectorStore):
     """A vectorstore that only implements add texts."""

View File

@@ -1,9 +1,9 @@
 from __future__ import annotations
+import logging
 import re
 from typing import Any, Dict, List, Optional, Sequence, Tuple
-import numpy as np
 from langchain_core.callbacks import (
     CallbackManagerForChainRun,
 )
@@ -23,6 +23,8 @@ from langchain.chains.flare.prompts import (
 )
 from langchain.chains.llm import LLMChain
+logger = logging.getLogger(__name__)
 def _extract_tokens_and_log_probs(response: AIMessage) -> Tuple[List[str], List[float]]:
     """Extract tokens and log probabilities from chat model response."""
@@ -57,7 +59,24 @@ def _low_confidence_spans(
     min_token_gap: int,
     num_pad_tokens: int,
 ) -> List[str]:
-    _low_idx = np.where(np.exp(log_probs) < min_prob)[0]
+    try:
+        import numpy as np
+        _low_idx = np.where(np.exp(log_probs) < min_prob)[0]
+    except ImportError:
+        logger.warning(
+            "NumPy not found in the current Python environment. FlareChain will use a "
+            "pure Python implementation for internal calculations, which may "
+            "significantly impact performance, especially for large datasets. For "
+            "optimal speed and efficiency, consider installing NumPy: pip install numpy"
+        )
+        import math
+        _low_idx = [  # type: ignore[assignment]
+            idx
+            for idx, log_prob in enumerate(log_probs)
+            if math.exp(log_prob) < min_prob
+        ]
     low_idx = [i for i in _low_idx if re.search(r"\w", tokens[i])]
     if len(low_idx) == 0:
         return []

View File
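The FLARE hunk above makes NumPy optional: the vectorized `np.where`/`np.exp` path is kept when NumPy can be imported, and a `math.exp` list comprehension that yields the same indices is used otherwise. A small self-contained check of that equivalence (the log-probabilities and threshold are made-up sample values):

```python
import math

log_probs = [-0.05, -2.3, -0.01, -1.9]  # hypothetical token log-probabilities
min_prob = 0.2

# Pure-Python fallback: indices of tokens whose probability falls below the threshold.
pure_python = [i for i, lp in enumerate(log_probs) if math.exp(lp) < min_prob]

try:
    import numpy as np

    with_numpy = np.where(np.exp(log_probs) < min_prob)[0].tolist()
    assert with_numpy == pure_python  # both select the low-confidence tokens
except ImportError:
    pass  # without NumPy, the fallback above is the code path FlareChain takes

print(pure_python)  # [1, 3]
```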

@@ -5,9 +5,9 @@ https://arxiv.org/abs/2212.10496
 from __future__ import annotations
+import logging
 from typing import Any, Dict, List, Optional
-import numpy as np
 from langchain_core.callbacks import CallbackManagerForChainRun
 from langchain_core.embeddings import Embeddings
 from langchain_core.language_models import BaseLanguageModel
@@ -20,6 +20,8 @@ from langchain.chains.base import Chain
 from langchain.chains.hyde.prompts import PROMPT_MAP
 from langchain.chains.llm import LLMChain
+logger = logging.getLogger(__name__)
 class HypotheticalDocumentEmbedder(Chain, Embeddings):
     """Generate hypothetical document for query, and then embed that.
@@ -54,7 +56,22 @@ class HypotheticalDocumentEmbedder(Chain, Embeddings):
     def combine_embeddings(self, embeddings: List[List[float]]) -> List[float]:
         """Combine embeddings into final embeddings."""
-        return list(np.array(embeddings).mean(axis=0))
+        try:
+            import numpy as np
+            return list(np.array(embeddings).mean(axis=0))
+        except ImportError:
+            logger.warning(
+                "NumPy not found in the current Python environment. "
+                "HypotheticalDocumentEmbedder will use a pure Python implementation "
+                "for internal calculations, which may significantly impact "
+                "performance, especially for large datasets. For optimal speed and "
+                "efficiency, consider installing NumPy: pip install numpy"
+            )
+            if not embeddings:
+                return []
+            num_vectors = len(embeddings)
+            return [sum(dim_values) / num_vectors for dim_values in zip(*embeddings)]
     def embed_query(self, text: str) -> List[float]:
         """Generate a hypothetical document and embedded it."""

View File

@@ -1,9 +1,11 @@
 """A chain for comparing the output of two models using embeddings."""
+import functools
+import logging
 from enum import Enum
+from importlib import util
 from typing import Any, Dict, List, Optional
-import numpy as np
 from langchain_core.callbacks.manager import (
     AsyncCallbackManagerForChainRun,
     CallbackManagerForChainRun,
@@ -18,6 +20,34 @@ from langchain.evaluation.schema import PairwiseStringEvaluator, StringEvaluator
 from langchain.schema import RUN_KEY
+def _import_numpy() -> Any:
+    try:
+        import numpy as np
+        return np
+    except ImportError as e:
+        raise ImportError(
+            "Could not import numpy, please install with `pip install numpy`."
+        ) from e
+logger = logging.getLogger(__name__)
+@functools.lru_cache(maxsize=1)
+def _check_numpy() -> bool:
+    if bool(util.find_spec("numpy")):
+        return True
+    logger.warning(
+        "NumPy not found in the current Python environment. "
+        "langchain will use a pure Python implementation for embedding distance "
+        "operations, which may significantly impact performance, especially for large "
+        "datasets. For optimal speed and efficiency, consider installing NumPy: "
+        "pip install numpy"
+    )
+    return False
 def _embedding_factory() -> Embeddings:
     """Create an Embeddings object.
     Returns:
@@ -158,7 +188,7 @@ class _EmbeddingDistanceChainMixin(Chain):
         raise ValueError(f"Invalid metric: {metric}")
     @staticmethod
-    def _cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
+    def _cosine_distance(a: Any, b: Any) -> Any:
         """Compute the cosine distance between two vectors.
         Args:
@@ -179,7 +209,7 @@ class _EmbeddingDistanceChainMixin(Chain):
         return 1.0 - cosine_similarity(a, b)
     @staticmethod
-    def _euclidean_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
+    def _euclidean_distance(a: Any, b: Any) -> Any:
         """Compute the Euclidean distance between two vectors.
         Args:
@@ -189,10 +219,15 @@ class _EmbeddingDistanceChainMixin(Chain):
         Returns:
             np.floating: The Euclidean distance.
         """
-        return np.linalg.norm(a - b)
+        if _check_numpy():
+            import numpy as np
+            return np.linalg.norm(a - b)
+        return sum((x - y) * (x - y) for x, y in zip(a, b)) ** 0.5
     @staticmethod
-    def _manhattan_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
+    def _manhattan_distance(a: Any, b: Any) -> Any:
         """Compute the Manhattan distance between two vectors.
         Args:
@@ -202,10 +237,14 @@ class _EmbeddingDistanceChainMixin(Chain):
         Returns:
             np.floating: The Manhattan distance.
         """
-        return np.sum(np.abs(a - b))
+        if _check_numpy():
+            np = _import_numpy()
+            return np.sum(np.abs(a - b))
+        return sum(abs(x - y) for x, y in zip(a, b))
     @staticmethod
-    def _chebyshev_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
+    def _chebyshev_distance(a: Any, b: Any) -> Any:
         """Compute the Chebyshev distance between two vectors.
         Args:
@@ -215,10 +254,14 @@ class _EmbeddingDistanceChainMixin(Chain):
         Returns:
             np.floating: The Chebyshev distance.
         """
-        return np.max(np.abs(a - b))
+        if _check_numpy():
+            np = _import_numpy()
+            return np.max(np.abs(a - b))
+        return max(abs(x - y) for x, y in zip(a, b))
     @staticmethod
-    def _hamming_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
+    def _hamming_distance(a: Any, b: Any) -> Any:
         """Compute the Hamming distance between two vectors.
         Args:
@@ -228,9 +271,13 @@ class _EmbeddingDistanceChainMixin(Chain):
         Returns:
             np.floating: The Hamming distance.
         """
-        return np.mean(a != b)
-    def _compute_score(self, vectors: np.ndarray) -> float:
+        if _check_numpy():
+            np = _import_numpy()
+            return np.mean(a != b)
+        return sum(1 for x, y in zip(a, b) if x != y) / len(a)
+    def _compute_score(self, vectors: Any) -> float:
         """Compute the score based on the distance metric.
         Args:
@@ -240,8 +287,11 @@ class _EmbeddingDistanceChainMixin(Chain):
             float: The computed score.
         """
         metric = self._get_metric(self.distance_metric)
-        score = metric(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
-        return score
+        if _check_numpy() and isinstance(vectors, _import_numpy().ndarray):
+            score = metric(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
+        else:
+            score = metric(vectors[0], vectors[1])
+        return float(score)
 class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
@@ -292,9 +342,12 @@ class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
         Returns:
             Dict[str, Any]: The computed score.
         """
-        vectors = np.array(
-            self.embeddings.embed_documents([inputs["prediction"], inputs["reference"]])
+        vectors = self.embeddings.embed_documents(
+            [inputs["prediction"], inputs["reference"]]
         )
+        if _check_numpy():
+            np = _import_numpy()
+            vectors = np.array(vectors)
         score = self._compute_score(vectors)
         return {"score": score}
@@ -313,13 +366,15 @@ class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
         Returns:
             Dict[str, Any]: The computed score.
         """
-        embedded = await self.embeddings.aembed_documents(
+        vectors = await self.embeddings.aembed_documents(
             [
                 inputs["prediction"],
                 inputs["reference"],
             ]
         )
-        vectors = np.array(embedded)
+        if _check_numpy():
+            np = _import_numpy()
+            vectors = np.array(vectors)
         score = self._compute_score(vectors)
         return {"score": score}
@@ -432,14 +487,15 @@ class PairwiseEmbeddingDistanceEvalChain(
         Returns:
             Dict[str, Any]: The computed score.
         """
-        vectors = np.array(
-            self.embeddings.embed_documents(
-                [
-                    inputs["prediction"],
-                    inputs["prediction_b"],
-                ]
-            )
+        vectors = self.embeddings.embed_documents(
+            [
+                inputs["prediction"],
+                inputs["prediction_b"],
+            ]
         )
+        if _check_numpy():
+            np = _import_numpy()
+            vectors = np.array(vectors)
         score = self._compute_score(vectors)
         return {"score": score}
@@ -458,13 +514,15 @@ class PairwiseEmbeddingDistanceEvalChain(
         Returns:
             Dict[str, Any]: The computed score.
         """
-        embedded = await self.embeddings.aembed_documents(
+        vectors = await self.embeddings.aembed_documents(
            [
                inputs["prediction"],
                inputs["prediction_b"],
            ]
        )
-        vectors = np.array(embedded)
+        if _check_numpy():
+            np = _import_numpy()
+            vectors = np.array(vectors)
         score = self._compute_score(vectors)
         return {"score": score}

View File
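The evaluator hunks above compute every distance two ways: with NumPy when `_check_numpy()` (an `lru_cache`d `importlib.util.find_spec` probe) succeeds, and with plain `zip`-based arithmetic otherwise. A quick sketch confirming the two formulations agree on toy vectors (the vectors are made-up values; the check is illustrative, not part of the change):

```python
a = [1.0, 2.0, 3.0]
b = [2.0, 0.0, 3.0]

# Pure-Python forms matching the fallbacks introduced above.
euclidean = sum((x - y) * (x - y) for x, y in zip(a, b)) ** 0.5  # ~2.236
manhattan = sum(abs(x - y) for x, y in zip(a, b))                # 3.0
chebyshev = max(abs(x - y) for x, y in zip(a, b))                # 2.0
hamming = sum(1 for x, y in zip(a, b) if x != y) / len(a)        # ~0.667

try:
    import numpy as np

    na, nb = np.array(a), np.array(b)
    assert abs(euclidean - np.linalg.norm(na - nb)) < 1e-12
    assert manhattan == np.sum(np.abs(na - nb))
    assert chebyshev == np.max(np.abs(na - nb))
    assert hamming == np.mean(na != nb)
except ImportError:
    pass  # without NumPy only the pure-Python values are available

print(euclidean, manhattan, chebyshev, hamming)
```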

@@ -1,6 +1,5 @@
 from typing import Callable, Dict, Optional, Sequence
-import numpy as np
 from langchain_core.callbacks.manager import Callbacks
 from langchain_core.documents import Document
 from langchain_core.embeddings import Embeddings
@@ -69,6 +68,13 @@ class EmbeddingsFilter(BaseDocumentCompressor):
                 "To use please install langchain-community "
                 "with `pip install langchain-community`."
             )
+        try:
+            import numpy as np
+        except ImportError as e:
+            raise ImportError(
+                "Could not import numpy, please install with `pip install numpy`."
+            ) from e
         stateful_documents = get_stateful_documents(documents)
         embedded_documents = _get_embeddings_from_stateful_docs(
             self.embeddings, stateful_documents
@@ -104,6 +110,13 @@ class EmbeddingsFilter(BaseDocumentCompressor):
                 "To use please install langchain-community "
                 "with `pip install langchain-community`."
             )
+        try:
+            import numpy as np
+        except ImportError as e:
+            raise ImportError(
+                "Could not import numpy, please install with `pip install numpy`."
+            ) from e
         stateful_documents = get_stateful_documents(documents)
         embedded_documents = await _aget_embeddings_from_stateful_docs(
             self.embeddings, stateful_documents

View File

@@ -14,8 +14,6 @@ dependencies = [
     "SQLAlchemy<3,>=1.4",
     "requests<3,>=2",
     "PyYAML>=5.3",
-    "numpy<2,>=1.26.4; python_version < \"3.12\"",
-    "numpy<3,>=1.26.2; python_version >= \"3.12\"",
     "async-timeout<5.0.0,>=4.0.0; python_version < \"3.11\"",
 ]
 name = "langchain"
@@ -74,6 +72,7 @@ test = [
     "langchain-openai",
     "toml>=0.10.2",
     "packaging>=24.2",
+    "numpy<3,>=1.26.4",
 ]
 codespell = ["codespell<3.0.0,>=2.2.0"]
 test_integration = [
@@ -102,6 +101,7 @@ typing = [
     "mypy-protobuf<4.0.0,>=3.0.0",
     "langchain-core",
     "langchain-text-splitters",
+    "numpy<3,>=1.26.4",
 ]
 dev = [
     "jupyter<2.0.0,>=1.0.0",

View File

@@ -0,0 +1,45 @@
+import pathlib
+from typing import Any, Dict, List, Optional
+
+import pytest
+
+from langchain.callbacks import FileCallbackHandler
+from langchain.chains.base import CallbackManagerForChainRun, Chain
+
+
+class FakeChain(Chain):
+    """Fake chain class for testing purposes."""
+
+    be_correct: bool = True
+    the_input_keys: List[str] = ["foo"]
+    the_output_keys: List[str] = ["bar"]
+
+    @property
+    def input_keys(self) -> List[str]:
+        """Input keys."""
+        return self.the_input_keys
+
+    @property
+    def output_keys(self) -> List[str]:
+        """Output key of bar."""
+        return self.the_output_keys
+
+    def _call(
+        self,
+        inputs: Dict[str, str],
+        run_manager: Optional[CallbackManagerForChainRun] = None,
+    ) -> Dict[str, str]:
+        return {"bar": "bar"}
+
+
+def test_filecallback(capsys: pytest.CaptureFixture, tmp_path: pathlib.Path) -> Any:
+    """Test the file callback handler."""
+    p = tmp_path / "output.log"
+    handler = FileCallbackHandler(str(p))
+    chain_test = FakeChain(callbacks=[handler])
+    chain_test.invoke({"foo": "bar"})
+    # Assert the output is as expected
+    assert p.read_text() == (
+        "\n\n\x1b[1m> Entering new FakeChain "
+        "chain...\x1b[0m\n\n\x1b[1m> Finished chain.\x1b[0m\n"
+    )

View File

@@ -37,7 +37,6 @@ def test_required_dependencies(uv_conf: Mapping[str, Any]) -> None:
             "langchain-core",
             "langchain-text-splitters",
             "langsmith",
-            "numpy",
             "pydantic",
             "requests",
         ]
@@ -82,5 +81,6 @@ def test_test_group_dependencies(uv_conf: Mapping[str, Any]) -> None:
             "requests-mock",
             # TODO: temporary hack since cffi 1.17.1 doesn't work with py 3.9.
             "cffi",
+            "numpy",
         ]
     )

View File

@@ -2247,8 +2247,6 @@ dependencies = [
     { name = "langchain-core" },
     { name = "langchain-text-splitters" },
     { name = "langsmith" },
-    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
-    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
     { name = "pydantic" },
     { name = "pyyaml" },
     { name = "requests" },
@@ -2329,6 +2327,8 @@ test = [
     { name = "langchain-tests" },
     { name = "langchain-text-splitters" },
     { name = "lark" },
+    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
+    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
     { name = "packaging" },
     { name = "pandas" },
     { name = "pytest" },
@@ -2359,6 +2359,8 @@ typing = [
     { name = "langchain-text-splitters" },
     { name = "mypy" },
     { name = "mypy-protobuf" },
+    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
+    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
     { name = "types-chardet" },
     { name = "types-pytz" },
     { name = "types-pyyaml" },
@@ -2389,8 +2391,6 @@ requires-dist = [
     { name = "langchain-together", marker = "extra == 'together'" },
     { name = "langchain-xai", marker = "extra == 'xai'" },
     { name = "langsmith", specifier = ">=0.1.17,<0.4" },
-    { name = "numpy", marker = "python_full_version < '3.12'", specifier = ">=1.26.4,<2" },
-    { name = "numpy", marker = "python_full_version >= '3.12'", specifier = ">=1.26.2,<3" },
     { name = "pydantic", specifier = ">=2.7.4,<3.0.0" },
     { name = "pyyaml", specifier = ">=5.3" },
     { name = "requests", specifier = ">=2,<3" },
@@ -2422,6 +2422,7 @@ test = [
     { name = "langchain-tests", editable = "../standard-tests" },
     { name = "langchain-text-splitters", editable = "../text-splitters" },
     { name = "lark", specifier = ">=1.1.5,<2.0.0" },
+    { name = "numpy", specifier = ">=1.26.4,<3" },
     { name = "packaging", specifier = ">=24.2" },
     { name = "pandas", specifier = ">=2.0.0,<3.0.0" },
     { name = "pytest", specifier = ">=8,<9" },
@@ -2452,6 +2453,7 @@ typing = [
     { name = "langchain-text-splitters", editable = "../text-splitters" },
     { name = "mypy", specifier = ">=1.10,<2.0" },
     { name = "mypy-protobuf", specifier = ">=3.0.0,<4.0.0" },
+    { name = "numpy", specifier = ">=1.26.4,<3" },
     { name = "types-chardet", specifier = ">=5.0.4.6,<6.0.0.0" },
     { name = "types-pytz", specifier = ">=2023.3.0.0,<2024.0.0.0" },
     { name = "types-pyyaml", specifier = ">=6.0.12.2,<7.0.0.0" },

View File

@@ -462,8 +462,11 @@ packages:
   - name: langchain-permit
     path: .
     repo: permitio/langchain-permit
+  - name: langchain-pymupdf4llm
+    path: .
+    repo: lakinduboteju/langchain-pymupdf4llm
   - name: langchain-writer
     path: .
     repo: writer/langchain-writer
     downloads: 0
     downloads_updated_at: '2025-02-24T13:19:19.816059+00:00'