Mirror of https://github.com/hwchase17/langchain.git
Commit af4fde385c: Merge branch 'master' into pprados/06-pdfplumber
@@ -50,11 +50,6 @@ locally to ensure that it looks good and is free of errors.

If you're unable to build it locally that's okay as well, as you will be able to
see a preview of the documentation on the pull request page.

From the **monorepo root**, run the following command to install the dependencies:

```bash
poetry install --with lint,docs --no-root
```

### Building
@@ -158,14 +153,6 @@ the working directory to the `langchain-community` directory:

cd [root]/libs/langchain-community
```

Set up a virtual environment for the package if you haven't done so already.

Install the dependencies for the package.

```bash
poetry install --with lint
```

Then you can run the following commands to lint and format the in-code documentation:

```bash
docs/docs/integrations/document_loaders/pymupdf4llm.ipynb (new file, 721 lines)
@@ -0,0 +1,721 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"---\n",
|
||||
"sidebar_label: PyMuPDF4LLM\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PyMuPDF4LLMLoader\n",
|
||||
"\n",
|
||||
"This notebook provides a quick overview for getting started with PyMuPDF4LLM [document loader](https://python.langchain.com/docs/concepts/#document-loaders). For detailed documentation of all PyMuPDF4LLMLoader features and configurations head to the [GitHub repository](https://github.com/lakinduboteju/langchain-pymupdf4llm).\n",
|
||||
"\n",
|
||||
"## Overview\n",
|
||||
"\n",
|
||||
"### Integration details\n",
|
||||
"\n",
|
||||
"| Class | Package | Local | Serializable | JS support |\n",
|
||||
"| :--- | :--- | :---: | :---: | :---: |\n",
|
||||
"| [PyMuPDF4LLMLoader](https://github.com/lakinduboteju/langchain-pymupdf4llm) | [langchain_pymupdf4llm](https://pypi.org/project/langchain-pymupdf4llm) | ✅ | ❌ | ❌ |\n",
|
||||
"\n",
|
||||
"### Loader features\n",
|
||||
"\n",
|
||||
"| Source | Document Lazy Loading | Native Async Support | Extract Images | Extract Tables |\n",
|
||||
"| :---: | :---: | :---: | :---: | :---: |\n",
|
||||
"| PyMuPDF4LLMLoader | ✅ | ❌ | ✅ | ✅ |\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To access PyMuPDF4LLM document loader you'll need to install the `langchain-pymupdf4llm` integration package.\n",
|
||||
"\n",
|
||||
"### Credentials\n",
|
||||
"\n",
|
||||
"No credentials are required to use PyMuPDF4LLMLoader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Installation\n",
|
||||
"\n",
|
||||
"Install **langchain_community** and **langchain-pymupdf4llm**."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU langchain_community langchain-pymupdf4llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Initialization\n",
|
||||
"\n",
|
||||
"Now we can instantiate our model object and load documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_pymupdf4llm import PyMuPDF4LLMLoader\n",
|
||||
"\n",
|
||||
"file_path = \"./example_data/layout-parser-paper.pdf\"\n",
|
||||
"loader = PyMuPDF4LLMLoader(file_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Load"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'producer': 'pdfTeX-1.40.21', 'creator': 'LaTeX with hyperref', 'creationdate': '2021-06-22T01:27:10+00:00', 'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2021-06-22T01:27:10+00:00', 'trapped': '', 'modDate': 'D:20210622012710Z', 'creationDate': 'D:20210622012710Z', 'page': 0}, page_content='```\\nLayoutParser: A Unified Toolkit for Deep\\n\\n## Learning Based Document Image Analysis\\n\\n```\\n\\nZejiang Shen[1] (<28>), Ruochen Zhang[2], Melissa Dell[3], Benjamin Charles Germain\\nLee[4], Jacob Carlson[3], and Weining Li[5]\\n\\n1 Allen Institute for AI\\n```\\n shannons@allenai.org\\n\\n```\\n2 Brown University\\n```\\n ruochen zhang@brown.edu\\n\\n```\\n3 Harvard University\\n_{melissadell,jacob carlson}@fas.harvard.edu_\\n4 University of Washington\\n```\\n bcgl@cs.washington.edu\\n\\n```\\n5 University of Waterloo\\n```\\n w422li@uwaterloo.ca\\n\\n```\\n\\n**Abstract. Recent advances in document image analysis (DIA) have been**\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applications. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digitization pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\n[The library is publicly available at https://layout-parser.github.io.](https://layout-parser.github.io)\\n\\n**Keywords: Document Image Analysis · Deep Learning · Layout Analysis**\\n\\n - Character Recognition · Open Source library · Toolkit.\\n\\n### 1 Introduction\\n\\n\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\n\\n')"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z',\n",
|
||||
" 'page': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import pprint\n",
|
||||
"\n",
|
||||
"pprint.pp(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Lazy Load"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"6"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pages = []\n",
|
||||
"for doc in loader.lazy_load():\n",
|
||||
" pages.append(doc)\n",
|
||||
" if len(pages) >= 10:\n",
|
||||
" # do some paged operation, e.g.\n",
|
||||
" # index.upsert(page)\n",
|
||||
"\n",
|
||||
" pages = []\n",
|
||||
"len(pages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython.display import Markdown, display\n",
|
||||
"\n",
|
||||
"part = pages[0].page_content[778:1189]\n",
|
||||
"print(part)\n",
|
||||
"# Markdown rendering\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z',\n",
|
||||
" 'page': 10}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pprint.pp(pages[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The metadata attribute contains at least the following keys:\n",
|
||||
"- source\n",
|
||||
"- page (if in mode *page*)\n",
|
||||
"- total_page\n",
|
||||
"- creationdate\n",
|
||||
"- creator\n",
|
||||
"- producer\n",
|
||||
"\n",
|
||||
"Additional metadata are specific to each parser.\n",
|
||||
"These pieces of information can be helpful (to categorize your PDFs for example)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Splitting mode & custom pages delimiter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"When loading the PDF file you can split it in two different ways:\n",
|
||||
"- By page\n",
|
||||
"- As a single text flow\n",
|
||||
"\n",
|
||||
"By default PyMuPDF4LLMLoader will split the PDF by page."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract the PDF by page. Each page is extracted as a langchain Document object:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"16\n",
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z',\n",
|
||||
" 'page': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(len(docs))\n",
|
||||
"pprint.pp(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this mode the pdf is split by pages and the resulting Documents metadata contains the `page` (page number). But in some cases we could want to process the pdf as a single text flow (so we don't cut some paragraphs in half). In this case you can use the *single* mode :"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract the whole PDF as a single langchain Document object:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"1\n",
|
||||
"{'producer': 'pdfTeX-1.40.21',\n",
|
||||
" 'creator': 'LaTeX with hyperref',\n",
|
||||
" 'creationdate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'source': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'file_path': './example_data/layout-parser-paper.pdf',\n",
|
||||
" 'total_pages': 16,\n",
|
||||
" 'format': 'PDF 1.5',\n",
|
||||
" 'title': '',\n",
|
||||
" 'author': '',\n",
|
||||
" 'subject': '',\n",
|
||||
" 'keywords': '',\n",
|
||||
" 'moddate': '2021-06-22T01:27:10+00:00',\n",
|
||||
" 'trapped': '',\n",
|
||||
" 'modDate': 'D:20210622012710Z',\n",
|
||||
" 'creationDate': 'D:20210622012710Z'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"single\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(len(docs))\n",
|
||||
"pprint.pp(docs[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Logically, in this mode, the `page` (page_number) metadata disappears. Here's how to clearly identify where pages end in the text flow :"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add a custom *pages_delimiter* to identify where are ends of pages in *single* mode:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"single\",\n",
|
||||
" pages_delimiter=\"\\n-------THIS IS A CUSTOM END OF PAGE-------\\n\\n\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[0].page_content[10663:11317]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The default `pages_delimiter` is \\n-----\\n\\n.\n",
|
||||
"But this could simply be \\n, or \\f to clearly indicate a page change, or \\<!-- PAGE BREAK --> for seamless injection in a Markdown viewer without a visual effect."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extract images from the PDF"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can extract images from your PDFs (in text form) with a choice of three different solutions:\n",
|
||||
"- rapidOCR (lightweight Optical Character Recognition tool)\n",
|
||||
"- Tesseract (OCR tool with high precision)\n",
|
||||
"- Multimodal language model\n",
|
||||
"\n",
|
||||
"The result is inserted at the end of text of the page."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract images from the PDF with rapidOCR:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU rapidocr-onnxruntime pillow"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.parsers import RapidOCRBlobParser\n",
|
||||
"\n",
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" extract_images=True,\n",
|
||||
" images_parser=RapidOCRBlobParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[5].page_content[1863:]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Be careful, RapidOCR is designed to work with Chinese and English, not other languages."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract images from the PDF with Tesseract:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU pytesseract"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.parsers import TesseractBlobParser\n",
|
||||
"\n",
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" extract_images=True,\n",
|
||||
" images_parser=TesseractBlobParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(docs[5].page_content[1863:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Extract images from the PDF with multimodal model:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU langchain_openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 39,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"load_dotenv()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
|
||||
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key =\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.parsers import LLMImageBlobParser\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" extract_images=True,\n",
|
||||
" images_parser=LLMImageBlobParser(\n",
|
||||
" model=ChatOpenAI(model=\"gpt-4o-mini\", max_tokens=1024)\n",
|
||||
" ),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"print(docs[5].page_content[1863:])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extract tables from the PDF\n",
|
||||
"\n",
|
||||
"With PyMUPDF4LLM you can extract tables from your PDFs in *markdown* format :"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PyMuPDF4LLMLoader(\n",
|
||||
" \"./example_data/layout-parser-paper.pdf\",\n",
|
||||
" mode=\"page\",\n",
|
||||
" # \"lines_strict\" is the default strategy and\n",
|
||||
" # is the most accurate for tables with column and row lines,\n",
|
||||
" # but may not work well with all documents.\n",
|
||||
" # \"lines\" is a less strict strategy that may work better with\n",
|
||||
" # some documents.\n",
|
||||
" # \"text\" is the least strict strategy and may work better\n",
|
||||
" # with documents that do not have tables with lines.\n",
|
||||
" table_strategy=\"lines\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[4].page_content[3210:]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Working with Files\n",
|
||||
"\n",
|
||||
"Many document loaders involve parsing files. The difference between such loaders usually stems from how the file is parsed, rather than how the file is loaded. For example, you can use `open` to read the binary content of either a PDF or a markdown file, but you need different parsing logic to convert that binary data into text.\n",
|
||||
"\n",
|
||||
"As a result, it can be helpful to decouple the parsing logic from the loading logic, which makes it easier to re-use a given parser regardless of how the data was loaded.\n",
|
||||
"You can use this strategy to analyze different files, with the same parsing parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import FileSystemBlobLoader\n",
|
||||
"from langchain_community.document_loaders.generic import GenericLoader\n",
|
||||
"from langchain_pymupdf4llm import PyMuPDF4LLMParser\n",
|
||||
"\n",
|
||||
"loader = GenericLoader(\n",
|
||||
" blob_loader=FileSystemBlobLoader(\n",
|
||||
" path=\"./example_data/\",\n",
|
||||
" glob=\"*.pdf\",\n",
|
||||
" ),\n",
|
||||
" blob_parser=PyMuPDF4LLMParser(),\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"part = docs[0].page_content[:562]\n",
|
||||
"print(part)\n",
|
||||
"display(Markdown(part))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## API reference\n",
|
||||
"\n",
|
||||
"For detailed documentation of all PyMuPDF4LLMLoader features and configurations head to the GitHub repository: https://github.com/lakinduboteju/langchain-pymupdf4llm"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.21"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
@@ -20,7 +20,7 @@ from langchain_community.chat_models.kinetica import ChatKinetica

The Kinetica vectorstore wrapper leverages Kinetica's native support for [vector
similarity search](https://docs.kinetica.com/7.2/vector_search/).

-See [Kinetica Vectorsore API](/docs/integrations/vectorstores/kinetica) for usage.
+See [Kinetica Vectorstore API](/docs/integrations/vectorstores/kinetica) for usage.

```python
from langchain_community.vectorstores import Kinetica
|
||||
@@ -28,8 +28,8 @@ from langchain_community.vectorstores import Kinetica

## Document Loader

-The Kinetica Document loader can be used to load LangChain Documents from the
-Kinetica database.
+The Kinetica Document loader can be used to load LangChain [Documents](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) from the
+[Kinetica](https://www.kinetica.com/) database.

See [Kinetica Document Loader](/docs/integrations/document_loaders/kinetica) for usage
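For quick reference, the loader can be imported from `langchain_community`. A minimal sketch, assuming the class is exposed as `KineticaLoader` in the `kinetica_loader` module (constructor arguments are omitted here; see the linked page for full usage):

```python
# Import sketch only; check the Kinetica Document Loader page for the exact constructor arguments.
from langchain_community.document_loaders.kinetica_loader import KineticaLoader
```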
docs/docs/integrations/providers/pymupdf4llm.ipynb (new file, 59 lines)
@@ -0,0 +1,59 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PyMuPDF4LLM\n",
|
||||
"\n",
|
||||
"[PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm) is aimed to make it easier to extract PDF content in Markdown format, needed for LLM & RAG applications.\n",
|
||||
"\n",
|
||||
"[langchain-pymupdf4llm](https://github.com/lakinduboteju/langchain-pymupdf4llm) integrates PyMuPDF4LLM to LangChain as a Document Loader."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -qU langchain-pymupdf4llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "y8ku6X96sebl"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_pymupdf4llm import PyMuPDF4LLMLoader, PyMuPDF4LLMParser"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
(File diff suppressed because it is too large.)
@ -888,6 +888,13 @@ const FEATURE_TABLES = {
|
||||
api: "Package",
|
||||
apiLink: "https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html"
|
||||
},
|
||||
{
|
||||
name: "PyMuPDF4LLM",
|
||||
link: "pymupdf4llm",
|
||||
source: "Load PDF content to Markdown using PyMuPDF4LLM",
|
||||
api: "Package",
|
||||
apiLink: "https://github.com/lakinduboteju/langchain-pymupdf4llm"
|
||||
},
|
||||
{
|
||||
name: "PDFMiner",
|
||||
link: "pdfminer",
|
||||
|
@ -95,7 +95,7 @@ class SQLiteVec(VectorStore):
|
||||
)
|
||||
self._connection.execute(
|
||||
f"""
|
||||
CREATE TRIGGER IF NOT EXISTS embed_text
|
||||
CREATE TRIGGER IF NOT EXISTS {self._table}_embed_text
|
||||
AFTER INSERT ON {self._table}
|
||||
BEGIN
|
||||
INSERT INTO {self._table}_vec(rowid, text_embedding)
|
||||
|
@ -56,3 +56,27 @@ def test_sqlitevec_add_extra() -> None:
|
||||
docsearch.add_texts(texts, metadatas)
|
||||
output = docsearch.similarity_search("foo", k=10)
|
||||
assert len(output) == 6
|
||||
|
||||
|
||||
@pytest.mark.requires("sqlite-vec")
|
||||
def test_sqlitevec_search_multiple_tables() -> None:
|
||||
"""Test end to end construction and search with multiple tables."""
|
||||
docsearch_1 = SQLiteVec.from_texts(
|
||||
fake_texts,
|
||||
FakeEmbeddings(),
|
||||
table="table_1",
|
||||
db_file=":memory:", ## change to local storage for testing
|
||||
)
|
||||
|
||||
docsearch_2 = SQLiteVec.from_texts(
|
||||
fake_texts,
|
||||
FakeEmbeddings(),
|
||||
table="table_2",
|
||||
db_file=":memory:",
|
||||
)
|
||||
|
||||
output_1 = docsearch_1.similarity_search("foo", k=1)
|
||||
output_2 = docsearch_2.similarity_search("foo", k=1)
|
||||
|
||||
assert output_1 == [Document(page_content="foo", metadata={})]
|
||||
assert output_2 == [Document(page_content="foo", metadata={})]
|
||||
|
@ -3,13 +3,14 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Any, Optional, TypeVar, Union
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.messages import BaseMessage
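This hunk, and the many similar ones that follow, move imports that are only needed for type annotations under an `if TYPE_CHECKING:` guard so they are no longer imported at runtime. A minimal illustration of the pattern with generic names (not code from this repository):

```python
from __future__ import annotations  # keep annotations as strings so guarded names resolve lazily

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # which avoids import cost and circular-import problems.
    from collections.abc import Sequence


def first_or_none(items: Sequence[int]) -> int | None:
    """The guarded name can still be used freely in annotations."""
    return items[0] if items else None
```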
|
||||
|
@ -2,12 +2,14 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Optional, TextIO, cast
|
||||
from typing import TYPE_CHECKING, Any, Optional, TextIO, cast
|
||||
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
from langchain_core.callbacks import BaseCallbackHandler
|
||||
from langchain_core.utils.input import print_text
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
|
||||
|
||||
class FileCallbackHandler(BaseCallbackHandler):
|
||||
"""Callback Handler that writes to a file.
|
||||
@ -45,9 +47,15 @@ class FileCallbackHandler(BaseCallbackHandler):
|
||||
inputs (Dict[str, Any]): The inputs to the chain.
|
||||
**kwargs (Any): Additional keyword arguments.
|
||||
"""
|
||||
class_name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
|
||||
if "name" in kwargs:
|
||||
name = kwargs["name"]
|
||||
else:
|
||||
if serialized:
|
||||
name = serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
|
||||
else:
|
||||
name = "<unknown>"
|
||||
print_text(
|
||||
f"\n\n\033[1m> Entering new {class_name} chain...\033[0m",
|
||||
f"\n\n\033[1m> Entering new {name} chain...\033[0m",
|
||||
end="\n",
|
||||
file=self.file,
|
||||
)
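The hunk above changes how the chain name shown in the log line is resolved: an explicit `name` kwarg wins, then the serialized `name` (falling back to the last element of the `id` path), then a placeholder when `serialized` is `None`. A standalone sketch of that resolution order (illustration only, not the handler's actual code):

```python
from typing import Any, Optional


def resolve_chain_name(serialized: Optional[dict], **kwargs: Any) -> str:
    """Mirror the fallback order used when entering a new chain."""
    if "name" in kwargs:
        return kwargs["name"]  # explicit name passed by the caller
    if serialized:
        # serialized name first, then the last element of the id path
        return serialized.get("name", serialized.get("id", ["<unknown>"])[-1])
    return "<unknown>"  # serialized may be None


# resolve_chain_name(None) -> "<unknown>"
# resolve_chain_name({"id": ["langchain", "chains", "LLMChain"]}) -> "LLMChain"
```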
|
||||
|
@ -5,7 +5,6 @@ import functools
|
||||
import logging
|
||||
import uuid
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncGenerator, Coroutine, Generator, Sequence
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
from contextlib import asynccontextmanager, contextmanager
|
||||
from contextvars import copy_context
|
||||
@ -21,7 +20,6 @@ from typing import (
|
||||
from uuid import UUID
|
||||
|
||||
from langsmith.run_helpers import get_tracing_context
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.callbacks.base import (
|
||||
BaseCallbackHandler,
|
||||
@ -39,6 +37,10 @@ from langchain_core.tracers.schemas import Run
|
||||
from langchain_core.utils.env import env_var_is_set
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncGenerator, Coroutine, Generator, Sequence
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.agents import AgentAction, AgentFinish
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
|
||||
|
@ -17,8 +17,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import Union
|
||||
from typing import TYPE_CHECKING, Union
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
@ -29,6 +28,9 @@ from langchain_core.messages import (
|
||||
get_buffer_string,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
|
||||
class BaseChatMessageHistory(ABC):
|
||||
"""Abstract base class for storing chat message history.
|
||||
|
@ -3,16 +3,17 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
from typing import TYPE_CHECKING, Optional
|
||||
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.runnables import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
from langchain_text_splitters import TextSplitter
|
||||
|
||||
from langchain_core.documents.base import Blob
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.documents.base import Blob
|
||||
|
||||
|
||||
class BaseLoader(ABC): # noqa: B024
|
||||
|
@ -8,12 +8,15 @@ In addition, content loading code should provide a lazy loading interface by def
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Iterable
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
# Re-export Blob and PathLike for backwards compatibility
|
||||
from langchain_core.documents.base import Blob as Blob
|
||||
from langchain_core.documents.base import PathLike as PathLike
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Iterable
|
||||
|
||||
|
||||
class BlobLoader(ABC):
|
||||
"""Abstract interface for blob loaders implementation.
|
||||
|
@ -2,15 +2,17 @@ from __future__ import annotations
|
||||
|
||||
import contextlib
|
||||
import mimetypes
|
||||
from collections.abc import Generator
|
||||
from io import BufferedReader, BytesIO
|
||||
from pathlib import PurePath
|
||||
from typing import Any, Literal, Optional, Union, cast
|
||||
from typing import TYPE_CHECKING, Any, Literal, Optional, Union, cast
|
||||
|
||||
from pydantic import ConfigDict, Field, field_validator, model_validator
|
||||
|
||||
from langchain_core.load.serializable import Serializable
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
|
||||
PathLike = Union[str, PurePath]
|
||||
|
||||
|
||||
|
@ -1,15 +1,18 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import Optional
|
||||
from typing import TYPE_CHECKING, Optional
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.runnables import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
|
||||
|
||||
class BaseDocumentCompressor(BaseModel, ABC):
|
||||
"""Base class for document compressors.
|
||||
|
@ -1,12 +1,13 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from langchain_core.runnables.config import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
|
||||
|
@ -7,11 +7,11 @@ from typing import TYPE_CHECKING, Any, Optional
|
||||
|
||||
from pydantic import BaseModel, ConfigDict
|
||||
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.example_selectors.base import BaseExampleSelector
|
||||
from langchain_core.vectorstores import VectorStore
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.embeddings import Embeddings
|
||||
|
||||
|
||||
|
@ -3,14 +3,17 @@ from __future__ import annotations
|
||||
import abc
|
||||
import time
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import Any, Optional, TypedDict
|
||||
from typing import TYPE_CHECKING, Any, Optional, TypedDict
|
||||
|
||||
from langchain_core._api import beta
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.retrievers import BaseRetriever
|
||||
from langchain_core.runnables import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
|
||||
class RecordManager(ABC):
|
||||
"""Abstract base class representing the interface for a record manager.
|
||||
|
@ -4,7 +4,6 @@ import asyncio
|
||||
import inspect
|
||||
import json
|
||||
import typing
|
||||
import uuid
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncIterator, Iterator, Sequence
|
||||
@ -70,6 +69,8 @@ from langchain_core.utils.function_calling import convert_to_openai_tool
|
||||
from langchain_core.utils.pydantic import TypeBaseModel, is_basemodel_subclass
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import uuid
|
||||
|
||||
from langchain_core.output_parsers.base import OutputParserLike
|
||||
from langchain_core.runnables import Runnable, RunnableConfig
|
||||
from langchain_core.tools import BaseTool
|
||||
|
@ -7,12 +7,12 @@ import functools
|
||||
import inspect
|
||||
import json
|
||||
import logging
|
||||
import uuid
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import AsyncIterator, Iterator, Sequence
|
||||
from pathlib import Path
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -61,6 +61,9 @@ from langchain_core.prompt_values import ChatPromptValue, PromptValue, StringPro
|
||||
from langchain_core.runnables import RunnableConfig, ensure_config, get_config_list
|
||||
from langchain_core.runnables.config import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import uuid
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union, cast
|
||||
|
||||
from pydantic import ConfigDict, Field, field_validator
|
||||
@ -11,6 +10,8 @@ from langchain_core.utils._merge import merge_dicts, merge_lists
|
||||
from langchain_core.utils.interactive_env import is_interactive_env
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from langchain_core.prompts.chat import ChatPromptTemplate
|
||||
|
||||
|
||||
|
@ -4,14 +4,16 @@ import csv
|
||||
import re
|
||||
from abc import abstractmethod
|
||||
from collections import deque
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
from io import StringIO
|
||||
from typing import TYPE_CHECKING, TypeVar, Union
|
||||
from typing import Optional as Optional
|
||||
from typing import TypeVar, Union
|
||||
|
||||
from langchain_core.messages import BaseMessage
|
||||
from langchain_core.output_parsers.transform import BaseTransformOutputParser
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
T = TypeVar("T")
|
||||
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
@ -19,6 +18,8 @@ from langchain_core.outputs import (
|
||||
from langchain_core.runnables.config import run_in_executor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
from langchain_core.runnables import RunnableConfig
|
||||
|
||||
|
||||
|
@ -1,14 +1,16 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Literal, Union
|
||||
from typing import TYPE_CHECKING, Literal, Union
|
||||
|
||||
from pydantic import model_validator
|
||||
from typing_extensions import Self
|
||||
|
||||
from langchain_core.messages import BaseMessage, BaseMessageChunk
|
||||
from langchain_core.outputs.generation import Generation
|
||||
from langchain_core.utils._merge import merge_dicts
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from typing_extensions import Self
|
||||
|
||||
|
||||
class ChatGeneration(Generation):
|
||||
"""A single chat generation output.
|
||||
|
@ -3,9 +3,8 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from pathlib import Path
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Annotated,
|
||||
Any,
|
||||
Optional,
|
||||
@ -47,6 +46,10 @@ from langchain_core.prompts.string import (
|
||||
from langchain_core.utils import get_colored_text
|
||||
from langchain_core.utils.interactive_env import is_interactive_env
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class BaseMessagePromptTemplate(Serializable, ABC):
|
||||
"""Base class for message prompt templates."""
|
||||
|
@ -2,8 +2,7 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Any, Literal, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, Literal, Optional, Union
|
||||
|
||||
from pydantic import (
|
||||
BaseModel,
|
||||
@ -11,7 +10,6 @@ from pydantic import (
|
||||
Field,
|
||||
model_validator,
|
||||
)
|
||||
from typing_extensions import Self
|
||||
|
||||
from langchain_core.example_selectors import BaseExampleSelector
|
||||
from langchain_core.messages import BaseMessage, get_buffer_string
|
||||
@ -27,6 +25,11 @@ from langchain_core.prompts.string import (
|
||||
get_template_variables,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
from typing_extensions import Self
|
||||
|
||||
|
||||
class _FewShotPromptTemplateMixin(BaseModel):
|
||||
"""Prompt template that contains few shot examples."""
|
||||
|
@ -3,8 +3,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import warnings
|
||||
from pathlib import Path
|
||||
from typing import Any, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union
|
||||
|
||||
from pydantic import BaseModel, model_validator
|
||||
|
||||
@ -16,7 +15,11 @@ from langchain_core.prompts.string import (
|
||||
get_template_variables,
|
||||
mustache_schema,
|
||||
)
|
||||
from langchain_core.runnables.config import RunnableConfig
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from pathlib import Path
|
||||
|
||||
from langchain_core.runnables.config import RunnableConfig
|
||||
|
||||
|
||||
class PromptTemplate(StringPromptTemplate):
|
||||
|
@ -60,7 +60,6 @@ from langchain_core.runnables.config import (
|
||||
run_in_executor,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
from langchain_core.runnables.utils import (
|
||||
AddableDict,
|
||||
AnyConfigurableField,
|
||||
@ -94,6 +93,7 @@ if TYPE_CHECKING:
|
||||
from langchain_core.runnables.fallbacks import (
|
||||
RunnableWithFallbacks as RunnableWithFallbacksT,
|
||||
)
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
from langchain_core.tools import BaseTool
|
||||
from langchain_core.tracers.log_stream import (
|
||||
RunLog,
|
||||
|
@ -7,6 +7,7 @@ from collections.abc import AsyncIterator, Iterator, Sequence
|
||||
from collections.abc import Mapping as Mapping
|
||||
from functools import wraps
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -26,7 +27,6 @@ from langchain_core.runnables.config import (
|
||||
get_executor_for_config,
|
||||
merge_configs,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
from langchain_core.runnables.utils import (
|
||||
AnyConfigurableField,
|
||||
ConfigurableField,
|
||||
@ -39,6 +39,9 @@ from langchain_core.runnables.utils import (
|
||||
get_unique_config_specs,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.runnables.graph import Graph
|
||||
|
||||
|
||||
class DynamicRunnable(RunnableSerializable[Input, Output]):
|
||||
"""Serializable Runnable that can be dynamically configured.
|
||||
|
@ -2,7 +2,6 @@ from __future__ import annotations
|
||||
|
||||
import inspect
|
||||
from collections import defaultdict
|
||||
from collections.abc import Sequence
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
from typing import (
|
||||
@ -18,11 +17,13 @@ from typing import (
|
||||
)
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from langchain_core.utils.pydantic import _IgnoreUnserializable, is_basemodel_subclass
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from langchain_core.runnables.base import Runnable as RunnableType
|
||||
|
||||
|
||||
|
@ -5,7 +5,7 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import inspect
|
||||
import threading
|
||||
from collections.abc import AsyncIterator, Awaitable, Iterator, Mapping
|
||||
from collections.abc import Awaitable
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
@ -32,7 +32,6 @@ from langchain_core.runnables.config import (
|
||||
get_executor_for_config,
|
||||
patch_config,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
from langchain_core.runnables.utils import (
|
||||
AddableDict,
|
||||
ConfigurableFieldSpec,
|
||||
@ -42,10 +41,13 @@ from langchain_core.utils.iter import safetee
|
||||
from langchain_core.utils.pydantic import create_model_v2
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator, Mapping
|
||||
|
||||
from langchain_core.callbacks.manager import (
|
||||
AsyncCallbackManagerForChainRun,
|
||||
CallbackManagerForChainRun,
|
||||
)
|
||||
from langchain_core.runnables.graph import Graph
|
||||
|
||||
|
||||
def identity(x: Other) -> Other:
|
||||
|
@ -1,8 +1,9 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import AsyncIterator, Iterator, Mapping
|
||||
from collections.abc import Mapping
|
||||
from itertools import starmap
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -31,6 +32,9 @@ from langchain_core.runnables.utils import (
|
||||
get_unique_config_specs,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
|
||||
|
||||
class RouterInput(TypedDict):
|
||||
"""Router input.
|
||||
|
@ -2,11 +2,13 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from typing import Any, Literal, Union
|
||||
from typing import TYPE_CHECKING, Any, Literal, Union
|
||||
|
||||
from typing_extensions import NotRequired, TypedDict
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
|
||||
class EventData(TypedDict, total=False):
|
||||
"""Data associated with a streaming event."""
|
||||
|
@ -6,19 +6,11 @@ import ast
|
||||
import asyncio
|
||||
import inspect
|
||||
import textwrap
|
||||
from collections.abc import (
|
||||
AsyncIterable,
|
||||
AsyncIterator,
|
||||
Awaitable,
|
||||
Coroutine,
|
||||
Iterable,
|
||||
Mapping,
|
||||
Sequence,
|
||||
)
|
||||
from functools import lru_cache
|
||||
from inspect import signature
|
||||
from itertools import groupby
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
NamedTuple,
|
||||
@ -30,11 +22,22 @@ from typing import (
|
||||
|
||||
from typing_extensions import TypeGuard, override
|
||||
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
|
||||
# Re-export create-model for backwards compatibility
|
||||
from langchain_core.utils.pydantic import create_model as create_model
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import (
|
||||
AsyncIterable,
|
||||
AsyncIterator,
|
||||
Awaitable,
|
||||
Coroutine,
|
||||
Iterable,
|
||||
Mapping,
|
||||
Sequence,
|
||||
)
|
||||
|
||||
from langchain_core.runnables.schema import StreamEvent
|
||||
|
||||
Input = TypeVar("Input", contravariant=True)
|
||||
# Output type should implement __concat__, as eg str, list, dict do
|
||||
Output = TypeVar("Output", covariant=True)
|
||||
|
@ -3,12 +3,14 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from enum import Enum
|
||||
from typing import Any, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
|
||||
|
||||
class Visitor(ABC):
|
||||
"""Defines interface for IR translation using a visitor pattern."""
|
||||
|
@ -4,13 +4,12 @@ import asyncio
|
||||
import functools
|
||||
import inspect
|
||||
import json
|
||||
import uuid
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from contextvars import copy_context
|
||||
from inspect import signature
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Annotated,
|
||||
Any,
|
||||
Callable,
|
||||
@ -68,6 +67,10 @@ from langchain_core.utils.pydantic import (
|
||||
is_pydantic_v2_subclass,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import uuid
|
||||
from collections.abc import Sequence
|
||||
|
||||
FILTERED_ARGS = ("run_manager", "callbacks")
|
||||
|
||||
|
||||
|
@ -1,21 +1,23 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from functools import partial
|
||||
from typing import Literal, Optional, Union
|
||||
from typing import TYPE_CHECKING, Literal, Optional, Union
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.prompts import (
|
||||
BasePromptTemplate,
|
||||
PromptTemplate,
|
||||
aformat_document,
|
||||
format_document,
|
||||
)
|
||||
from langchain_core.retrievers import BaseRetriever
|
||||
from langchain_core.tools.simple import Tool
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.callbacks import Callbacks
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.retrievers import BaseRetriever
|
||||
|
||||
|
||||
class RetrieverInput(BaseModel):
|
||||
"""Input to the retriever."""
|
||||
|
@ -3,6 +3,7 @@ from __future__ import annotations
|
||||
from collections.abc import Awaitable
|
||||
from inspect import signature
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Optional,
|
||||
@ -13,7 +14,6 @@ from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.messages import ToolCall
|
||||
from langchain_core.runnables import RunnableConfig, run_in_executor
|
||||
from langchain_core.tools.base import (
|
||||
ArgsSchema,
|
||||
@ -22,6 +22,9 @@ from langchain_core.tools.base import (
|
||||
_get_runnable_config_param,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.messages import ToolCall
|
||||
|
||||
|
||||
class Tool(BaseTool):
|
||||
"""Tool that takes in function or coroutine directly."""
|
||||
|
@ -4,6 +4,7 @@ import textwrap
|
||||
from collections.abc import Awaitable
|
||||
from inspect import signature
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Annotated,
|
||||
Any,
|
||||
Callable,
|
||||
@ -18,7 +19,6 @@ from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.messages import ToolCall
|
||||
from langchain_core.runnables import RunnableConfig, run_in_executor
|
||||
from langchain_core.tools.base import (
|
||||
FILTERED_ARGS,
|
||||
@ -29,6 +29,9 @@ from langchain_core.tools.base import (
|
||||
)
|
||||
from langchain_core.utils.pydantic import is_basemodel_subclass
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain_core.messages import ToolCall
|
||||
|
||||
|
||||
class StructuredTool(BaseTool):
|
||||
"""Tool that can operate on any number of inputs."""
|
||||
|
@ -5,26 +5,27 @@ from __future__ import annotations
|
||||
import asyncio
|
||||
import logging
|
||||
from abc import ABC, abstractmethod
|
||||
from collections.abc import Sequence
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Optional,
|
||||
Union,
|
||||
)
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
|
||||
from langchain_core.exceptions import TracerException # noqa
|
||||
from langchain_core.messages import BaseMessage
|
||||
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
|
||||
from langchain_core.tracers.core import _TracerCore
|
||||
from langchain_core.tracers.schemas import Run
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
from uuid import UUID
|
||||
|
||||
from tenacity import RetryCallState
|
||||
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.messages import BaseMessage
|
||||
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk, LLMResult
|
||||
from langchain_core.tracers.schemas import Run
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
@ -1,6 +1,5 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Generator
|
||||
from contextlib import contextmanager
|
||||
from contextvars import ContextVar
|
||||
from typing import (
|
||||
@ -18,13 +17,15 @@ from langsmith import utils as ls_utils
|
||||
|
||||
from langchain_core.tracers.langchain import LangChainTracer
|
||||
from langchain_core.tracers.run_collector import RunCollectorCallbackHandler
|
||||
from langchain_core.tracers.schemas import TracerSessionV1
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Generator
|
||||
|
||||
from langsmith import Client as LangSmithClient
|
||||
|
||||
from langchain_core.callbacks.base import BaseCallbackHandler, Callbacks
|
||||
from langchain_core.callbacks.manager import AsyncCallbackManager, CallbackManager
|
||||
from langchain_core.tracers.schemas import TracerSessionV1
|
||||
|
||||
# for backwards partial compatibility if this is imported by users but unused
|
||||
tracing_callback_var: Any = None
|
||||
|
@ -6,7 +6,6 @@ import logging
import sys
import traceback
from abc import ABC, abstractmethod
from collections.abc import Coroutine, Sequence
from datetime import datetime, timezone
from typing import (
    TYPE_CHECKING,
@ -16,13 +15,9 @@ from typing import (
    Union,
    cast,
)
from uuid import UUID

from tenacity import RetryCallState

from langchain_core.exceptions import TracerException
from langchain_core.load import dumpd
from langchain_core.messages import BaseMessage
from langchain_core.outputs import (
    ChatGeneration,
    ChatGenerationChunk,
@ -32,7 +27,13 @@ from langchain_core.outputs import (
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from collections.abc import Coroutine, Sequence
    from uuid import UUID

    from tenacity import RetryCallState

    from langchain_core.documents import Document
    from langchain_core.messages import BaseMessage

logger = logging.getLogger(__name__)

@ -5,9 +5,8 @@ from __future__ import annotations
import logging
import threading
import weakref
from collections.abc import Sequence
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Any, Optional, Union, cast
from typing import TYPE_CHECKING, Any, Optional, Union, cast
from uuid import UUID

import langsmith
@ -17,7 +16,11 @@ from langchain_core.tracers import langchain as langchain_tracer
from langchain_core.tracers.base import BaseTracer
from langchain_core.tracers.context import tracing_v2_enabled
from langchain_core.tracers.langchain import _get_executor
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from collections.abc import Sequence

    from langchain_core.tracers.schemas import Run

logger = logging.getLogger(__name__)

@ -5,7 +5,6 @@ from __future__ import annotations
import asyncio
import contextlib
import logging
from collections.abc import AsyncIterator, Iterator, Sequence
from typing import (
    TYPE_CHECKING,
    Any,
@ -37,13 +36,15 @@ from langchain_core.runnables.utils import (
    _RootEventFilter,
)
from langchain_core.tracers._streaming import _StreamingCallbackHandler
from langchain_core.tracers.log_stream import LogEntry
from langchain_core.tracers.memory_stream import _MemoryStream
from langchain_core.utils.aiter import aclosing, py_anext

if TYPE_CHECKING:
    from collections.abc import AsyncIterator, Iterator, Sequence

    from langchain_core.documents import Document
    from langchain_core.runnables import Runnable, RunnableConfig
    from langchain_core.tracers.log_stream import LogEntry

logger = logging.getLogger(__name__)

@ -22,12 +22,12 @@ from tenacity import (

from langchain_core.env import get_runtime_environment
from langchain_core.load import dumpd
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
from langchain_core.tracers.base import BaseTracer
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from langchain_core.messages import BaseMessage
    from langchain_core.outputs import ChatGenerationChunk, GenerationChunk

logger = logging.getLogger(__name__)
_LOGGED = set()

@ -5,8 +5,8 @@ import contextlib
import copy
import threading
from collections import defaultdict
from collections.abc import AsyncIterator, Iterator, Sequence
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
    Optional,
@ -14,7 +14,6 @@ from typing import (
    Union,
    overload,
)
from uuid import UUID

import jsonpatch  # type: ignore[import]
from typing_extensions import NotRequired, TypedDict
@ -23,11 +22,16 @@ from langchain_core.load import dumps
from langchain_core.load.load import load
from langchain_core.outputs import ChatGenerationChunk, GenerationChunk
from langchain_core.runnables import Runnable, RunnableConfig, ensure_config
from langchain_core.runnables.utils import Input, Output
from langchain_core.tracers._streaming import _StreamingCallbackHandler
from langchain_core.tracers.base import BaseTracer
from langchain_core.tracers.memory_stream import _MemoryStream
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from collections.abc import AsyncIterator, Iterator, Sequence
    from uuid import UUID

    from langchain_core.runnables.utils import Input, Output
    from langchain_core.tracers.schemas import Run


class LogEntry(TypedDict):

@ -1,6 +1,5 @@
from collections.abc import Awaitable
from typing import Callable, Optional, Union
from uuid import UUID
from typing import TYPE_CHECKING, Callable, Optional, Union

from langchain_core.runnables.config import (
    RunnableConfig,
@ -10,6 +9,9 @@ from langchain_core.runnables.config import (
from langchain_core.tracers.base import AsyncBaseTracer, BaseTracer
from langchain_core.tracers.schemas import Run

if TYPE_CHECKING:
    from uuid import UUID

Listener = Union[Callable[[Run], None], Callable[[Run, RunnableConfig], None]]
AsyncListener = Union[
    Callable[[Run], Awaitable[None]], Callable[[Run, RunnableConfig], Awaitable[None]]

@ -1,8 +1,10 @@
from __future__ import annotations

from collections.abc import Sequence
from copy import deepcopy
from typing import Any, Optional
from typing import TYPE_CHECKING, Any, Optional

if TYPE_CHECKING:
    from collections.abc import Sequence


def _retrieve_ref(path: str, schema: dict) -> dict:

@ -8,6 +8,7 @@ import logging
from collections.abc import Iterator, Mapping, Sequence
from types import MappingProxyType
from typing import (
    TYPE_CHECKING,
    Any,
    Literal,
    Optional,
@ -15,7 +16,8 @@ from typing import (
    cast,
)

from typing_extensions import TypeAlias
if TYPE_CHECKING:
    from typing_extensions import TypeAlias

logger = logging.getLogger(__name__)

@ -9,6 +9,7 @@ from contextlib import nullcontext
from functools import lru_cache, wraps
from types import GenericAlias
from typing import (
    TYPE_CHECKING,
    Any,
    Callable,
    Optional,
@ -29,13 +30,16 @@ from pydantic import (
from pydantic import (
    create_model as _create_model_base,
)
from pydantic.fields import FieldInfo as FieldInfoV2
from pydantic.json_schema import (
    DEFAULT_REF_TEMPLATE,
    GenerateJsonSchema,
    JsonSchemaMode,
    JsonSchemaValue,
)
from pydantic_core import core_schema

if TYPE_CHECKING:
    from pydantic_core import core_schema


def get_pydantic_major_version() -> int:
@ -71,8 +75,8 @@ elif PYDANTIC_MAJOR_VERSION == 2:
    from pydantic.v1.fields import FieldInfo as FieldInfoV1  # type: ignore[assignment]

    # Union type needs to be last assignment to PydanticBaseModel to make mypy happy.
    PydanticBaseModel = Union[BaseModel, pydantic.BaseModel]  # type: ignore
    TypeBaseModel = Union[type[BaseModel], type[pydantic.BaseModel]]  # type: ignore
    PydanticBaseModel = Union[BaseModel, pydantic.BaseModel]  # type: ignore[assignment,misc]
    TypeBaseModel = Union[type[BaseModel], type[pydantic.BaseModel]]  # type: ignore[misc]
else:
    msg = f"Unsupported Pydantic version: {PYDANTIC_MAJOR_VERSION}"
    raise ValueError(msg)
@ -357,7 +361,6 @@ def _create_subset_model(

if PYDANTIC_MAJOR_VERSION == 2:
    from pydantic import BaseModel as BaseModelV2
    from pydantic.fields import FieldInfo as FieldInfoV2
    from pydantic.v1 import BaseModel as BaseModelV1

    @overload

@ -25,7 +25,6 @@ import logging
import math
import warnings
from abc import ABC, abstractmethod
from collections.abc import Collection, Iterable, Iterator, Sequence
from itertools import cycle
from typing import (
    TYPE_CHECKING,
@ -43,6 +42,8 @@ from langchain_core.retrievers import BaseRetriever, LangSmithRetrieverParams
from langchain_core.runnables.config import run_in_executor

if TYPE_CHECKING:
    from collections.abc import Collection, Iterable, Iterator, Sequence

    from langchain_core.callbacks.manager import (
        AsyncCallbackManagerForRetrieverRun,
        CallbackManagerForRetrieverRun,

@ -2,7 +2,6 @@ from __future__ import annotations

import json
import uuid
from collections.abc import Iterator, Sequence
from pathlib import Path
from typing import (
    TYPE_CHECKING,
@ -13,13 +12,15 @@ from typing import (

from langchain_core._api import deprecated
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.load import dumpd, load
from langchain_core.vectorstores import VectorStore
from langchain_core.vectorstores.utils import _cosine_similarity as cosine_similarity
from langchain_core.vectorstores.utils import maximal_marginal_relevance

if TYPE_CHECKING:
    from collections.abc import Iterator, Sequence

    from langchain_core.embeddings import Embeddings
    from langchain_core.indexing import UpsertResponse

@ -77,8 +77,9 @@ target-version = "py39"


[tool.ruff.lint]
select = [ "ANN", "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TID", "TRY", "UP", "W", "YTT",]
ignore = [ "ANN401", "COM812", "UP007", "S110", "S112",]
select = [ "ANN", "ASYNC", "B", "C4", "COM", "DJ", "E", "EM", "EXE", "F", "FLY", "FURB", "I", "ICN", "INT", "LOG", "N", "NPY", "PD", "PIE", "Q", "RSE", "S", "SIM", "SLOT", "T10", "T201", "TC", "TID", "TRY", "UP", "W", "YTT",]
ignore = [ "ANN401", "COM812", "UP007", "S110", "S112", "TC001", "TC002", "TC003"]
flake8-type-checking.runtime-evaluated-base-classes = ["pydantic.BaseModel","langchain_core.load.serializable.Serializable","langchain_core.runnables.base.RunnableSerializable"]
flake8-annotations.allow-star-arg-any = true
flake8-annotations.mypy-init-return = true

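The `TC` rules enabled above (ruff's flake8-type-checking checks) are what drive the repeated import moves earlier in this diff: imports that are only needed for annotations are deferred into an `if TYPE_CHECKING:` block. A minimal sketch of the resulting pattern; the module and function below are illustrative examples, not code from this commit:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only imported by type checkers; no runtime cost and no import cycles.
    from collections.abc import Sequence

    from langchain_core.documents import Document


def count_tokens(docs: Sequence[Document]) -> int:
    # Annotations stay strings at runtime because of `from __future__ import annotations`.
    return sum(len(doc.page_content.split()) for doc in docs)
```

Because the guarded block is skipped at runtime, the annotations stay checkable while start-up cost and circular-import risk go down.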
@ -2,7 +2,7 @@

import uuid
from collections.abc import AsyncIterator, Iterator
from typing import Any, Literal, Optional, Union
from typing import TYPE_CHECKING, Any, Literal, Optional, Union

import pytest

@ -30,6 +30,9 @@ from tests.unit_tests.fake.callbacks import (
)
from tests.unit_tests.stubs import _any_id_ai_message, _any_id_ai_message_chunk

if TYPE_CHECKING:
    from langchain_core.outputs.llm_result import LLMResult


@pytest.fixture
def messages() -> list:

@ -7,8 +7,7 @@ the relevant methods.
from __future__ import annotations

import uuid
from collections.abc import Iterable, Sequence
from typing import Any, Optional
from typing import TYPE_CHECKING, Any, Optional

import pytest

@ -16,6 +15,9 @@ from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings, FakeEmbeddings
from langchain_core.vectorstores import VectorStore

if TYPE_CHECKING:
    from collections.abc import Iterable, Sequence


class CustomAddTextsVectorstore(VectorStore):
    """A vectorstore that only implements add texts."""

@ -1,9 +1,9 @@
from __future__ import annotations

import logging
import re
from typing import Any, Dict, List, Optional, Sequence, Tuple

import numpy as np
from langchain_core.callbacks import (
    CallbackManagerForChainRun,
)
@ -23,6 +23,8 @@ from langchain.chains.flare.prompts import (
)
from langchain.chains.llm import LLMChain

logger = logging.getLogger(__name__)


def _extract_tokens_and_log_probs(response: AIMessage) -> Tuple[List[str], List[float]]:
    """Extract tokens and log probabilities from chat model response."""
@ -57,7 +59,24 @@ def _low_confidence_spans(
    min_token_gap: int,
    num_pad_tokens: int,
) -> List[str]:
    _low_idx = np.where(np.exp(log_probs) < min_prob)[0]
    try:
        import numpy as np

        _low_idx = np.where(np.exp(log_probs) < min_prob)[0]
    except ImportError:
        logger.warning(
            "NumPy not found in the current Python environment. FlareChain will use a "
            "pure Python implementation for internal calculations, which may "
            "significantly impact performance, especially for large datasets. For "
            "optimal speed and efficiency, consider installing NumPy: pip install numpy"
        )
        import math

        _low_idx = [  # type: ignore[assignment]
            idx
            for idx, log_prob in enumerate(log_probs)
            if math.exp(log_prob) < min_prob
        ]
    low_idx = [i for i in _low_idx if re.search(r"\w", tokens[i])]
    if len(low_idx) == 0:
        return []

@ -5,9 +5,9 @@ https://arxiv.org/abs/2212.10496

from __future__ import annotations

import logging
from typing import Any, Dict, List, Optional

import numpy as np
from langchain_core.callbacks import CallbackManagerForChainRun
from langchain_core.embeddings import Embeddings
from langchain_core.language_models import BaseLanguageModel
@ -20,6 +20,8 @@ from langchain.chains.base import Chain
from langchain.chains.hyde.prompts import PROMPT_MAP
from langchain.chains.llm import LLMChain

logger = logging.getLogger(__name__)


class HypotheticalDocumentEmbedder(Chain, Embeddings):
    """Generate hypothetical document for query, and then embed that.
@ -54,7 +56,22 @@ class HypotheticalDocumentEmbedder(Chain, Embeddings):

    def combine_embeddings(self, embeddings: List[List[float]]) -> List[float]:
        """Combine embeddings into final embeddings."""
        return list(np.array(embeddings).mean(axis=0))
        try:
            import numpy as np

            return list(np.array(embeddings).mean(axis=0))
        except ImportError:
            logger.warning(
                "NumPy not found in the current Python environment. "
                "HypotheticalDocumentEmbedder will use a pure Python implementation "
                "for internal calculations, which may significantly impact "
                "performance, especially for large datasets. For optimal speed and "
                "efficiency, consider installing NumPy: pip install numpy"
            )
            if not embeddings:
                return []
            num_vectors = len(embeddings)
            return [sum(dim_values) / num_vectors for dim_values in zip(*embeddings)]

    def embed_query(self, text: str) -> List[float]:
        """Generate a hypothetical document and embedded it."""

@ -1,9 +1,11 @@
|
||||
"""A chain for comparing the output of two models using embeddings."""
|
||||
|
||||
import functools
|
||||
import logging
|
||||
from enum import Enum
|
||||
from importlib import util
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import numpy as np
|
||||
from langchain_core.callbacks.manager import (
|
||||
AsyncCallbackManagerForChainRun,
|
||||
CallbackManagerForChainRun,
|
||||
@ -18,6 +20,34 @@ from langchain.evaluation.schema import PairwiseStringEvaluator, StringEvaluator
|
||||
from langchain.schema import RUN_KEY
|
||||
|
||||
|
||||
def _import_numpy() -> Any:
|
||||
try:
|
||||
import numpy as np
|
||||
|
||||
return np
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"Could not import numpy, please install with `pip install numpy`."
|
||||
) from e
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@functools.lru_cache(maxsize=1)
|
||||
def _check_numpy() -> bool:
|
||||
if bool(util.find_spec("numpy")):
|
||||
return True
|
||||
logger.warning(
|
||||
"NumPy not found in the current Python environment. "
|
||||
"langchain will use a pure Python implementation for embedding distance "
|
||||
"operations, which may significantly impact performance, especially for large "
|
||||
"datasets. For optimal speed and efficiency, consider installing NumPy: "
|
||||
"pip install numpy"
|
||||
)
|
||||
return False
|
||||
|
||||
|
||||
def _embedding_factory() -> Embeddings:
|
||||
"""Create an Embeddings object.
|
||||
Returns:
|
||||
@ -158,7 +188,7 @@ class _EmbeddingDistanceChainMixin(Chain):
        raise ValueError(f"Invalid metric: {metric}")

    @staticmethod
    def _cosine_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    def _cosine_distance(a: Any, b: Any) -> Any:
        """Compute the cosine distance between two vectors.

        Args:
@ -179,7 +209,7 @@ class _EmbeddingDistanceChainMixin(Chain):
        return 1.0 - cosine_similarity(a, b)

    @staticmethod
    def _euclidean_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _euclidean_distance(a: Any, b: Any) -> Any:
        """Compute the Euclidean distance between two vectors.

        Args:
@ -189,10 +219,15 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Euclidean distance.
        """
        return np.linalg.norm(a - b)
        if _check_numpy():
            import numpy as np

            return np.linalg.norm(a - b)

        return sum((x - y) * (x - y) for x, y in zip(a, b)) ** 0.5

    @staticmethod
    def _manhattan_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _manhattan_distance(a: Any, b: Any) -> Any:
        """Compute the Manhattan distance between two vectors.

        Args:
@ -202,10 +237,14 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Manhattan distance.
        """
        return np.sum(np.abs(a - b))
        if _check_numpy():
            np = _import_numpy()
            return np.sum(np.abs(a - b))

        return sum(abs(x - y) for x, y in zip(a, b))

    @staticmethod
    def _chebyshev_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _chebyshev_distance(a: Any, b: Any) -> Any:
        """Compute the Chebyshev distance between two vectors.

        Args:
@ -215,10 +254,14 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Chebyshev distance.
        """
        return np.max(np.abs(a - b))
        if _check_numpy():
            np = _import_numpy()
            return np.max(np.abs(a - b))

        return max(abs(x - y) for x, y in zip(a, b))

    @staticmethod
    def _hamming_distance(a: np.ndarray, b: np.ndarray) -> np.floating:
    def _hamming_distance(a: Any, b: Any) -> Any:
        """Compute the Hamming distance between two vectors.

        Args:
@ -228,9 +271,13 @@ class _EmbeddingDistanceChainMixin(Chain):
        Returns:
            np.floating: The Hamming distance.
        """
        return np.mean(a != b)
        if _check_numpy():
            np = _import_numpy()
            return np.mean(a != b)

    def _compute_score(self, vectors: np.ndarray) -> float:
        return sum(1 for x, y in zip(a, b) if x != y) / len(a)

    def _compute_score(self, vectors: Any) -> float:
        """Compute the score based on the distance metric.

        Args:
@ -240,8 +287,11 @@ class _EmbeddingDistanceChainMixin(Chain):
            float: The computed score.
        """
        metric = self._get_metric(self.distance_metric)
        score = metric(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
        return score
        if _check_numpy() and isinstance(vectors, _import_numpy().ndarray):
            score = metric(vectors[0].reshape(1, -1), vectors[1].reshape(1, -1)).item()
        else:
            score = metric(vectors[0], vectors[1])
        return float(score)

class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
|
||||
@ -292,9 +342,12 @@ class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
vectors = np.array(
|
||||
self.embeddings.embed_documents([inputs["prediction"], inputs["reference"]])
|
||||
vectors = self.embeddings.embed_documents(
|
||||
[inputs["prediction"], inputs["reference"]]
|
||||
)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
@ -313,13 +366,15 @@ class EmbeddingDistanceEvalChain(_EmbeddingDistanceChainMixin, StringEvaluator):
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
embedded = await self.embeddings.aembed_documents(
|
||||
vectors = await self.embeddings.aembed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["reference"],
|
||||
]
|
||||
)
|
||||
vectors = np.array(embedded)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
@ -432,14 +487,15 @@ class PairwiseEmbeddingDistanceEvalChain(
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
vectors = np.array(
|
||||
self.embeddings.embed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["prediction_b"],
|
||||
]
|
||||
)
|
||||
vectors = self.embeddings.embed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["prediction_b"],
|
||||
]
|
||||
)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
@ -458,13 +514,15 @@ class PairwiseEmbeddingDistanceEvalChain(
|
||||
Returns:
|
||||
Dict[str, Any]: The computed score.
|
||||
"""
|
||||
embedded = await self.embeddings.aembed_documents(
|
||||
vectors = await self.embeddings.aembed_documents(
|
||||
[
|
||||
inputs["prediction"],
|
||||
inputs["prediction_b"],
|
||||
]
|
||||
)
|
||||
vectors = np.array(embedded)
|
||||
if _check_numpy():
|
||||
np = _import_numpy()
|
||||
vectors = np.array(vectors)
|
||||
score = self._compute_score(vectors)
|
||||
return {"score": score}
|
||||
|
||||
|
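The evaluation and chain hunks above all apply the same optional-NumPy pattern: probe for the package once, cache the answer, and fall back to pure Python when it is missing. A minimal self-contained sketch of that pattern; the function names below are illustrative and not part of the langchain API:

```python
import functools
import importlib.util
import logging

logger = logging.getLogger(__name__)


@functools.lru_cache(maxsize=1)
def _has_numpy() -> bool:
    # Cache the probe so the warning is logged at most once per process.
    if importlib.util.find_spec("numpy") is not None:
        return True
    logger.warning("NumPy not installed; falling back to pure Python.")
    return False


def euclidean(a: list[float], b: list[float]) -> float:
    if _has_numpy():
        import numpy as np

        return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    # Pure-Python fallback, slower but dependency-free.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Deferring the `import numpy as np` into the guarded branch is what lets the package drop NumPy from its required dependencies, as the pyproject and lockfile hunks below show.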
@ -1,6 +1,5 @@
from typing import Callable, Dict, Optional, Sequence

import numpy as np
from langchain_core.callbacks.manager import Callbacks
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
@ -69,6 +68,13 @@ class EmbeddingsFilter(BaseDocumentCompressor):
                "To use please install langchain-community "
                "with `pip install langchain-community`."
            )

        try:
            import numpy as np
        except ImportError as e:
            raise ImportError(
                "Could not import numpy, please install with `pip install numpy`."
            ) from e
        stateful_documents = get_stateful_documents(documents)
        embedded_documents = _get_embeddings_from_stateful_docs(
            self.embeddings, stateful_documents
@ -104,6 +110,13 @@ class EmbeddingsFilter(BaseDocumentCompressor):
                "To use please install langchain-community "
                "with `pip install langchain-community`."
            )

        try:
            import numpy as np
        except ImportError as e:
            raise ImportError(
                "Could not import numpy, please install with `pip install numpy`."
            ) from e
        stateful_documents = get_stateful_documents(documents)
        embedded_documents = await _aget_embeddings_from_stateful_docs(
            self.embeddings, stateful_documents

@ -14,8 +14,6 @@ dependencies = [
    "SQLAlchemy<3,>=1.4",
    "requests<3,>=2",
    "PyYAML>=5.3",
    "numpy<2,>=1.26.4; python_version < \"3.12\"",
    "numpy<3,>=1.26.2; python_version >= \"3.12\"",
    "async-timeout<5.0.0,>=4.0.0; python_version < \"3.11\"",
]
name = "langchain"
@ -74,6 +72,7 @@ test = [
    "langchain-openai",
    "toml>=0.10.2",
    "packaging>=24.2",
    "numpy<3,>=1.26.4",
]
codespell = ["codespell<3.0.0,>=2.2.0"]
test_integration = [
@ -102,6 +101,7 @@ typing = [
    "mypy-protobuf<4.0.0,>=3.0.0",
    "langchain-core",
    "langchain-text-splitters",
    "numpy<3,>=1.26.4",
]
dev = [
    "jupyter<2.0.0,>=1.0.0",

libs/langchain/tests/unit_tests/callbacks/test_file.py (new file, 45 lines)
@ -0,0 +1,45 @@
import pathlib
from typing import Any, Dict, List, Optional

import pytest

from langchain.callbacks import FileCallbackHandler
from langchain.chains.base import CallbackManagerForChainRun, Chain


class FakeChain(Chain):
    """Fake chain class for testing purposes."""

    be_correct: bool = True
    the_input_keys: List[str] = ["foo"]
    the_output_keys: List[str] = ["bar"]

    @property
    def input_keys(self) -> List[str]:
        """Input keys."""
        return self.the_input_keys

    @property
    def output_keys(self) -> List[str]:
        """Output key of bar."""
        return self.the_output_keys

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        return {"bar": "bar"}


def test_filecallback(capsys: pytest.CaptureFixture, tmp_path: pathlib.Path) -> Any:
    """Test the file callback handler."""
    p = tmp_path / "output.log"
    handler = FileCallbackHandler(str(p))
    chain_test = FakeChain(callbacks=[handler])
    chain_test.invoke({"foo": "bar"})
    # Assert the output is as expected
    assert p.read_text() == (
        "\n\n\x1b[1m> Entering new FakeChain "
        "chain...\x1b[0m\n\n\x1b[1m> Finished chain.\x1b[0m\n"
    )

@ -37,7 +37,6 @@ def test_required_dependencies(uv_conf: Mapping[str, Any]) -> None:
            "langchain-core",
            "langchain-text-splitters",
            "langsmith",
            "numpy",
            "pydantic",
            "requests",
        ]
@ -82,5 +81,6 @@ def test_test_group_dependencies(uv_conf: Mapping[str, Any]) -> None:
            "requests-mock",
            # TODO: temporary hack since cffi 1.17.1 doesn't work with py 3.9.
            "cffi",
            "numpy",
        ]
    )

@ -2247,8 +2247,6 @@ dependencies = [
    { name = "langchain-core" },
    { name = "langchain-text-splitters" },
    { name = "langsmith" },
    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
    { name = "pydantic" },
    { name = "pyyaml" },
    { name = "requests" },
@ -2329,6 +2327,8 @@ test = [
    { name = "langchain-tests" },
    { name = "langchain-text-splitters" },
    { name = "lark" },
    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
    { name = "packaging" },
    { name = "pandas" },
    { name = "pytest" },
@ -2359,6 +2359,8 @@ typing = [
    { name = "langchain-text-splitters" },
    { name = "mypy" },
    { name = "mypy-protobuf" },
    { name = "numpy", version = "1.26.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" },
    { name = "numpy", version = "2.2.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" },
    { name = "types-chardet" },
    { name = "types-pytz" },
    { name = "types-pyyaml" },
@ -2389,8 +2391,6 @@ requires-dist = [
    { name = "langchain-together", marker = "extra == 'together'" },
    { name = "langchain-xai", marker = "extra == 'xai'" },
    { name = "langsmith", specifier = ">=0.1.17,<0.4" },
    { name = "numpy", marker = "python_full_version < '3.12'", specifier = ">=1.26.4,<2" },
    { name = "numpy", marker = "python_full_version >= '3.12'", specifier = ">=1.26.2,<3" },
    { name = "pydantic", specifier = ">=2.7.4,<3.0.0" },
    { name = "pyyaml", specifier = ">=5.3" },
    { name = "requests", specifier = ">=2,<3" },
@ -2422,6 +2422,7 @@ test = [
    { name = "langchain-tests", editable = "../standard-tests" },
    { name = "langchain-text-splitters", editable = "../text-splitters" },
    { name = "lark", specifier = ">=1.1.5,<2.0.0" },
    { name = "numpy", specifier = ">=1.26.4,<3" },
    { name = "packaging", specifier = ">=24.2" },
    { name = "pandas", specifier = ">=2.0.0,<3.0.0" },
    { name = "pytest", specifier = ">=8,<9" },
@ -2452,6 +2453,7 @@ typing = [
    { name = "langchain-text-splitters", editable = "../text-splitters" },
    { name = "mypy", specifier = ">=1.10,<2.0" },
    { name = "mypy-protobuf", specifier = ">=3.0.0,<4.0.0" },
    { name = "numpy", specifier = ">=1.26.4,<3" },
    { name = "types-chardet", specifier = ">=5.0.4.6,<6.0.0.0" },
    { name = "types-pytz", specifier = ">=2023.3.0.0,<2024.0.0.0" },
    { name = "types-pyyaml", specifier = ">=6.0.12.2,<7.0.0.0" },

@ -462,8 +462,11 @@ packages:
  - name: langchain-permit
    path: .
    repo: permitio/langchain-permit
  - name: langchain-pymupdf4llm
    path: .
    repo: lakinduboteju/langchain-pymupdf4llm
  - name: langchain-writer
    path: .
    repo: writer/langchain-writer
    downloads: 0
    downloads_updated_at: '2025-02-24T13:19:19.816059+00:00'
    downloads_updated_at: '2025-02-24T13:19:19.816059+00:00'