mirror of
				https://github.com/hwchase17/langchain.git
				synced 2025-10-31 07:41:40 +00:00 
			
		
		
		
	# TextLoader auto detect encoding and enhanced exception handling - Add an option to enable encoding detection on `TextLoader`. - The detection is done using `chardet` - The loading is done by trying all detected encodings by order of confidence or raise an exception otherwise. ### New Dependencies: - `chardet` Fixes #4479 ## Before submitting <!-- If you're adding a new integration, include an integration test and an example notebook showing its use! --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: - @eyurtsev --------- Co-authored-by: blob42 <spike@w530>
		
			
				
	
	
		
			572 lines
		
	
	
		
			50 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			572 lines
		
	
	
		
			50 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| {
 | |
|  "cells": [
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "79f24a6b",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "# File Directory\n",
 | |
|     "\n",
 | |
|     "This covers how to use the `DirectoryLoader` to load all documents in a directory. Under the hood, by default this uses the [UnstructuredLoader](./unstructured_file.ipynb)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 1,
 | |
|    "id": "019d8520",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.document_loaders import DirectoryLoader"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "0c76cdc5",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "We can use the `glob` parameter to control which files to load. Note that here it doesn't load the `.rst` file or the `.ipynb` files."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 2,
 | |
|    "id": "891fe56f",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "loader = DirectoryLoader('../', glob=\"**/*.md\")"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 3,
 | |
|    "id": "addfe9cf",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "docs = loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 4,
 | |
|    "id": "b042086d",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "1"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 4,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "len(docs)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "e633d62f",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Show a progress bar"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "43911860",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "By default a progress bar will not be shown. To show a progress bar, install the `tqdm` library (e.g. `pip install tqdm`), and set the `show_progress` parameter to `True`."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 10,
 | |
|    "id": "bb93daac",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stdout",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)\n"
 | |
|      ]
 | |
|     },
 | |
|     {
 | |
|      "name": "stderr",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "0it [00:00, ?it/s]\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "%pip install tqdm\n",
 | |
|     "loader = DirectoryLoader('../', glob=\"**/*.md\", show_progress=True)\n",
 | |
|     "docs = loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "c16ed46a",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Use multithreading"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "attachments": {},
 | |
|    "cell_type": "markdown",
 | |
|    "id": "5752e23e",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "By default the loading happens in one thread. In order to utilize several threads set the `use_multithreading` flag to true."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "id": "f8d84f52",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "loader = DirectoryLoader('../', glob=\"**/*.md\", use_multithreading=True)\n",
 | |
|     "docs = loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "c5652850",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Change loader class\n",
 | |
|     "By default this uses the `UnstructuredLoader` class. However, you can change up the type of loader pretty easily."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 15,
 | |
|    "id": "81c92da3",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.document_loaders import TextLoader"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 6,
 | |
|    "id": "ab38ee36",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "loader = DirectoryLoader('../', glob=\"**/*.md\", loader_cls=TextLoader)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 7,
 | |
|    "id": "25c8740f",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "docs = loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 8,
 | |
|    "id": "38337763",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "1"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 8,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "len(docs)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "598a2805",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "If you need to load Python source code files, use the `PythonLoader`."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 14,
 | |
|    "id": "c558bd73",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "from langchain.document_loaders import PythonLoader"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 13,
 | |
|    "id": "a3cfaba7",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "loader = DirectoryLoader('../../../../../', glob=\"**/*.py\", loader_cls=PythonLoader)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 14,
 | |
|    "id": "e2e1e26a",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "docs = loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 15,
 | |
|    "id": "ffb8ff36",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "691"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 15,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "len(docs)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "6411a0cb",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Auto detect file encodings with TextLoader\n",
 | |
|     "\n",
 | |
|     "In this example we will see some strategies that can be useful when loading a big list of arbitrary files from a directory using the `TextLoader` class.\n",
 | |
|     "\n",
 | |
|     "First to illustrate the problem, let's try to load multiple text with arbitrary encodings."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 16,
 | |
|    "id": "2c787a69",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "path = '../../../../../tests/integration_tests/examples'\n",
 | |
|     "loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "e9001e12",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### A. Default Behavior"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 19,
 | |
|    "id": "b1e88c31",
 | |
|    "metadata": {
 | |
|     "scrolled": true
 | |
|    },
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/html": [
 | |
|        "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">text.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">29</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">26 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span>text = <span style=\"color: #808000; text-decoration-color: #808000\">\"\"</span>                                                                           <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">27 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">with</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">open</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.file_path, encoding=<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.encoding) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> f:                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">28 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">try</span>:                                                                            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>29 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span>text = f.read()                                                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">30 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">UnicodeDecodeError</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> e:                                                 <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">31 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.autodetect_encoding:                                                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">32 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span>detected_encodings = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.detect_file_encodings()                       <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/home/spike/.pyenv/versions/3.9.11/lib/python3.9/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">codecs.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">322</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">decode</span>                         <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 319 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">def</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">decode</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>, <span style=\"color: #00ffff; text-decoration-color: #00ffff\">input</span>, final=<span style=\"color: #0000ff; text-decoration-color: #0000ff\">False</span>):                                                 <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 320 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"># decode input (taking the buffer into account)</span>                                   <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 321 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span>data = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.buffer + <span style=\"color: #00ffff; text-decoration-color: #00ffff\">input</span>                                                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span> 322 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span>(result, consumed) = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>._buffer_decode(data, <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.errors, final)                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 323 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"># keep undecoded input until the next call</span>                                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 324 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.buffer = data[consumed:]                                                     <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 325 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">return</span> result                                                                     <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
 | |
|        "<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">UnicodeDecodeError: </span><span style=\"color: #008000; text-decoration-color: #008000\">'utf-8'</span> codec can't decode byte <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0xca</span> in position <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>: invalid continuation byte\n",
 | |
|        "\n",
 | |
|        "<span style=\"font-style: italic\">The above exception was the direct cause of the following exception:</span>\n",
 | |
|        "\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\"><module></span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">1</span>                                                                                    <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>1 loader.load()                                                                                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">2 </span>                                                                                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">directory.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">84</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">81 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.silent_errors:                                              <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">82 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span>logger.warning(e)                                               <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">83 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">else</span>:                                                               <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>84 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> e                                                         <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">85 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">finally</span>:                                                                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">86 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> pbar:                                                            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">87 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span>pbar.update(<span style=\"color: #0000ff; text-decoration-color: #0000ff\">1</span>)                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">directory.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">78</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">75 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> i.is_file():                                                                 <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">76 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> _is_visible(i.relative_to(p)) <span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">or</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.load_hidden:                       <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">77 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">try</span>:                                                                    <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>78 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span>sub_docs = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.loader_cls(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">str</span>(i), **<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.loader_kwargs).load()     <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">79 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span>docs.extend(sub_docs)                                               <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">80 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">Exception</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> e:                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">81 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.silent_errors:                                              <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">text.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">44</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">41 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">UnicodeDecodeError</span>:                                          <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">42 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">continue</span>                                                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">43 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">else</span>:                                                                       <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>44 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">RuntimeError</span>(<span style=\"color: #808000; text-decoration-color: #808000\">f\"Error loading {</span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.file_path<span style=\"color: #808000; text-decoration-color: #808000\">}\"</span>) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">from</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline\">e</span>            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">45 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">Exception</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> e:                                                          <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">46 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">RuntimeError</span>(<span style=\"color: #808000; text-decoration-color: #808000\">f\"Error loading {</span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.file_path<span style=\"color: #808000; text-decoration-color: #808000\">}\"</span>) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">from</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline\">e</span>                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">47 </span>                                                                                            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
 | |
|        "<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
 | |
|        "<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">RuntimeError: </span>Error loading ..<span style=\"color: #800080; text-decoration-color: #800080\">/../../../../tests/integration_tests/examples/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">example-non-utf8.txt</span>\n",
 | |
|        "</pre>\n"
 | |
|       ],
 | |
|       "text/plain": [
 | |
|        "\u001b[31m╭─\u001b[0m\u001b[31m──────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m \u001b[0m\u001b[31m───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mtext.py\u001b[0m:\u001b[94m29\u001b[0m in \u001b[92mload\u001b[0m                             \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m26 \u001b[0m\u001b[2m│   │   \u001b[0mtext = \u001b[33m\"\u001b[0m\u001b[33m\"\u001b[0m                                                                           \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m27 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[94mwith\u001b[0m \u001b[96mopen\u001b[0m(\u001b[96mself\u001b[0m.file_path, encoding=\u001b[96mself\u001b[0m.encoding) \u001b[94mas\u001b[0m f:                             \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m28 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mtry\u001b[0m:                                                                            \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m29 \u001b[2m│   │   │   │   \u001b[0mtext = f.read()                                                             \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m30 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mUnicodeDecodeError\u001b[0m \u001b[94mas\u001b[0m e:                                                 \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m31 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.autodetect_encoding:                                                \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m32 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0mdetected_encodings = \u001b[96mself\u001b[0m.detect_file_encodings()                       \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[2;33m/home/spike/.pyenv/versions/3.9.11/lib/python3.9/\u001b[0m\u001b[1;33mcodecs.py\u001b[0m:\u001b[94m322\u001b[0m in \u001b[92mdecode\u001b[0m                         \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m 319 \u001b[0m\u001b[2m│   \u001b[0m\u001b[94mdef\u001b[0m \u001b[92mdecode\u001b[0m(\u001b[96mself\u001b[0m, \u001b[96minput\u001b[0m, final=\u001b[94mFalse\u001b[0m):                                                 \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m 320 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[2m# decode input (taking the buffer into account)\u001b[0m                                   \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m 321 \u001b[0m\u001b[2m│   │   \u001b[0mdata = \u001b[96mself\u001b[0m.buffer + \u001b[96minput\u001b[0m                                                        \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 322 \u001b[2m│   │   \u001b[0m(result, consumed) = \u001b[96mself\u001b[0m._buffer_decode(data, \u001b[96mself\u001b[0m.errors, final)                \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m 323 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[2m# keep undecoded input until the next call\u001b[0m                                        \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m 324 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[96mself\u001b[0m.buffer = data[consumed:]                                                     \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m 325 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[94mreturn\u001b[0m result                                                                     \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
 | |
|        "\u001b[1;91mUnicodeDecodeError: \u001b[0m\u001b[32m'utf-8'\u001b[0m codec can't decode byte \u001b[1;36m0xca\u001b[0m in position \u001b[1;36m0\u001b[0m: invalid continuation byte\n",
 | |
|        "\n",
 | |
|        "\u001b[3mThe above exception was the direct cause of the following exception:\u001b[0m\n",
 | |
|        "\n",
 | |
|        "\u001b[31m╭─\u001b[0m\u001b[31m──────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m \u001b[0m\u001b[31m───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m in \u001b[92m<module>\u001b[0m:\u001b[94m1\u001b[0m                                                                                    \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1 loader.load()                                                                                \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m2 \u001b[0m                                                                                             \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mdirectory.py\u001b[0m:\u001b[94m84\u001b[0m in \u001b[92mload\u001b[0m                        \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m81 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.silent_errors:                                              \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m82 \u001b[0m\u001b[2m│   │   │   │   │   │   │   \u001b[0mlogger.warning(e)                                               \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m83 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94melse\u001b[0m:                                                               \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m84 \u001b[2m│   │   │   │   │   │   │   \u001b[0m\u001b[94mraise\u001b[0m e                                                         \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m85 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mfinally\u001b[0m:                                                                \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m86 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mif\u001b[0m pbar:                                                            \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m87 \u001b[0m\u001b[2m│   │   │   │   │   │   │   \u001b[0mpbar.update(\u001b[94m1\u001b[0m)                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mdirectory.py\u001b[0m:\u001b[94m78\u001b[0m in \u001b[92mload\u001b[0m                        \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m75 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mif\u001b[0m i.is_file():                                                                 \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m76 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94mif\u001b[0m _is_visible(i.relative_to(p)) \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m.load_hidden:                       \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m77 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mtry\u001b[0m:                                                                    \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m78 \u001b[2m│   │   │   │   │   │   \u001b[0msub_docs = \u001b[96mself\u001b[0m.loader_cls(\u001b[96mstr\u001b[0m(i), **\u001b[96mself\u001b[0m.loader_kwargs).load()     \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m79 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0mdocs.extend(sub_docs)                                               \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m80 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mException\u001b[0m \u001b[94mas\u001b[0m e:                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m81 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.silent_errors:                                              \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mtext.py\u001b[0m:\u001b[94m44\u001b[0m in \u001b[92mload\u001b[0m                             \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m41 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mUnicodeDecodeError\u001b[0m:                                          \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m42 \u001b[0m\u001b[2m│   │   │   │   │   │   │   \u001b[0m\u001b[94mcontinue\u001b[0m                                                        \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m43 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94melse\u001b[0m:                                                                       \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m44 \u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mraise\u001b[0m \u001b[96mRuntimeError\u001b[0m(\u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mError loading \u001b[0m\u001b[33m{\u001b[0m\u001b[96mself\u001b[0m.file_path\u001b[33m}\u001b[0m\u001b[33m\"\u001b[0m) \u001b[94mfrom\u001b[0m \u001b[4;96me\u001b[0m            \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m45 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mException\u001b[0m \u001b[94mas\u001b[0m e:                                                          \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m46 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94mraise\u001b[0m \u001b[96mRuntimeError\u001b[0m(\u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mError loading \u001b[0m\u001b[33m{\u001b[0m\u001b[96mself\u001b[0m.file_path\u001b[33m}\u001b[0m\u001b[33m\"\u001b[0m) \u001b[94mfrom\u001b[0m \u001b[4;96me\u001b[0m                \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m│\u001b[0m   \u001b[2m47 \u001b[0m                                                                                            \u001b[31m│\u001b[0m\n",
 | |
|        "\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
 | |
|        "\u001b[1;91mRuntimeError: \u001b[0mError loading ..\u001b[35m/../../../../tests/integration_tests/examples/\u001b[0m\u001b[95mexample-non-utf8.txt\u001b[0m\n"
 | |
|       ]
 | |
|      },
 | |
|      "metadata": {},
 | |
|      "output_type": "display_data"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "da554f9a",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "The file `example-non-utf8.txt` uses a different encoding the `load()` function fails with a helpful message indicating which file failed decoding. \n",
 | |
|     "\n",
 | |
|     "With the default behavior of `TextLoader` any failure to load any of the documents will fail the whole loading process and no documents are loaded. "
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "eb844b7e",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### B. Silent fail\n",
 | |
|     "\n",
 | |
|     "We can pass the parameter `silent_errors` to the `DirectoryLoader` to skip the files which could not be loaded and continue the load process."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 30,
 | |
|    "id": "5314ec39",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "name": "stderr",
 | |
|      "output_type": "stream",
 | |
|      "text": [
 | |
|       "Error loading ../../../../../tests/integration_tests/examples/example-non-utf8.txt\n"
 | |
|      ]
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader, silent_errors=True)\n",
 | |
|     "docs = loader.load()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 35,
 | |
|    "id": "216337e5",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',\n",
 | |
|        " '../../../../../tests/integration_tests/examples/example-utf8.txt']"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 35,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "doc_sources = [doc.metadata['source']  for doc in docs]\n",
 | |
|     "doc_sources"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "id": "4cba0e53",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "### C. Auto detect encodings\n",
 | |
|     "\n",
 | |
|     "We can also ask `TextLoader` to auto detect the file encoding before failing, by passing the `autodetect_encoding` to the loader class."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 37,
 | |
|    "id": "96d527d2",
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "text_loader_kwargs={'autodetect_encoding': True}\n",
 | |
|     "loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n",
 | |
|     "docs = loader.load()\n"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": 38,
 | |
|    "id": "f1a136a5",
 | |
|    "metadata": {},
 | |
|    "outputs": [
 | |
|     {
 | |
|      "data": {
 | |
|       "text/plain": [
 | |
|        "['../../../../../tests/integration_tests/examples/example-non-utf8.txt',\n",
 | |
|        " '../../../../../tests/integration_tests/examples/whatsapp_chat.txt',\n",
 | |
|        " '../../../../../tests/integration_tests/examples/example-utf8.txt']"
 | |
|       ]
 | |
|      },
 | |
|      "execution_count": 38,
 | |
|      "metadata": {},
 | |
|      "output_type": "execute_result"
 | |
|     }
 | |
|    ],
 | |
|    "source": [
 | |
|     "doc_sources = [doc.metadata['source']  for doc in docs]\n",
 | |
|     "doc_sources"
 | |
|    ]
 | |
|   }
 | |
|  ],
 | |
|  "metadata": {
 | |
|   "kernelspec": {
 | |
|    "display_name": "Python 3 (ipykernel)",
 | |
|    "language": "python",
 | |
|    "name": "python3"
 | |
|   },
 | |
|   "language_info": {
 | |
|    "codemirror_mode": {
 | |
|     "name": "ipython",
 | |
|     "version": 3
 | |
|    },
 | |
|    "file_extension": ".py",
 | |
|    "mimetype": "text/x-python",
 | |
|    "name": "python",
 | |
|    "nbconvert_exporter": "python",
 | |
|    "pygments_lexer": "ipython3",
 | |
|    "version": "3.9.11"
 | |
|   }
 | |
|  },
 | |
|  "nbformat": 4,
 | |
|  "nbformat_minor": 5
 | |
| }
 |