mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-11 18:16:12 +00:00
``` https://api\.python\.langchain\.com/en/latest/([^/]*)/langchain_([^.]*)\.(.*)\.html([^"]*) https://python.langchain.com/v0.2/api_reference/$2/$1/langchain_$2.$3.html$4 ``` --------- Co-authored-by: Bagatur <baskaryan@gmail.com>
393 lines
18 KiB
Plaintext
393 lines
18 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "9122e4b9-4883-4e6e-940b-ab44a70f0951",
|
|
"metadata": {},
|
|
"source": [
|
|
"# How to load documents from a directory\n",
|
|
"\n",
|
|
"LangChain's [DirectoryLoader](https://python.langchain.com/v0.2/api_reference/community/document_loaders/langchain_community.document_loaders.directory.DirectoryLoader.html) implements functionality for reading files from disk into LangChain [Document](https://python.langchain.com/v0.2/api_reference/core/documents/langchain_core.documents.base.Document.html#langchain_core.documents.base.Document) objects. Here we demonstrate:\n",
|
|
"\n",
|
|
"- How to load from a filesystem, including use of wildcard patterns;\n",
|
|
"- How to use multithreading for file I/O;\n",
|
|
"- How to use custom loader classes to parse specific file types (e.g., code);\n",
|
|
"- How to handle errors, such as those due to decoding."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "1c1e3796-bee8-4882-8065-6b98e48ec53a",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.document_loaders import DirectoryLoader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e3cdb7bb-1f58-4a7a-af83-599443127834",
|
|
"metadata": {},
|
|
"source": [
|
|
"`DirectoryLoader` accepts a `loader_cls` kwarg, which defaults to [UnstructuredLoader](/docs/integrations/document_loaders/unstructured_file). [Unstructured](https://unstructured-io.github.io/unstructured/) supports parsing for a number of formats, such as PDF and HTML. Here we use it to read in a markdown (.md) file.\n",
|
|
"\n",
|
|
"We can use the `glob` parameter to control which files to load. Note that here it doesn't load the `.rst` file or the `.html` files."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "bd2fcd1f-8286-499b-b43a-0c17084ae8ee",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"20"
|
|
]
|
|
},
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"loader = DirectoryLoader(\"../\", glob=\"**/*.md\")\n",
|
|
"docs = loader.load()\n",
|
|
"len(docs)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "9ff1503d-3ac0-4172-99ec-15c9a4a707d8",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Security\n",
|
|
"\n",
|
|
"LangChain has a large ecosystem of integrations with various external resources like local\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(docs[0].page_content[:100])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b8b1cee8-626a-461a-8d33-1c56120f1cc0",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Show a progress bar\n",
|
|
"\n",
|
|
"By default a progress bar will not be shown. To show a progress bar, install the `tqdm` library (e.g. `pip install tqdm`), and set the `show_progress` parameter to `True`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "cfa48224-5d02-4aa7-93c7-ce48241645d5",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 54.56it/s]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"loader = DirectoryLoader(\"../\", glob=\"**/*.md\", show_progress=True)\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5e02c922-6a4b-48e6-8c46-5015553eafbe",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Use multithreading\n",
|
|
"\n",
|
|
"By default the loading happens in one thread. In order to utilize several threads set the `use_multithreading` flag to true."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"id": "aae1c580-6d7c-409c-bfc8-3049fa8bdbf9",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"loader = DirectoryLoader(\"../\", glob=\"**/*.md\", use_multithreading=True)\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5add3f54-f303-4006-90c9-540a90ab8c46",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Change loader class\n",
|
|
"By default this uses the `UnstructuredLoader` class. To customize the loader, specify the loader class in the `loader_cls` kwarg. Below we show an example using [TextLoader](https://python.langchain.com/v0.2/api_reference/community/document_loaders/langchain_community.document_loaders.text.TextLoader.html):"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "d369ee78-ea24-48cc-9f46-1f5cd4b56f48",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.document_loaders import TextLoader\n",
|
|
"\n",
|
|
"loader = DirectoryLoader(\"../\", glob=\"**/*.md\", loader_cls=TextLoader)\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "2863d7dd-2d56-4fef-8bfd-95c48a6b4a71",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"# Security\n",
|
|
"\n",
|
|
"LangChain has a large ecosystem of integrations with various external resources like loc\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"print(docs[0].page_content[:100])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c97ed37b-38c0-4f31-9403-d3a5d5444f78",
|
|
"metadata": {},
|
|
"source": [
|
|
"Notice that while the `UnstructuredLoader` parses Markdown headers, `TextLoader` does not.\n",
|
|
"\n",
|
|
"If you need to load Python source code files, use the `PythonLoader`:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "5ef483a8-57d3-45e5-93be-37c8416c543c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.document_loaders import PythonLoader\n",
|
|
"\n",
|
|
"loader = DirectoryLoader(\"../../../../../\", glob=\"**/*.py\", loader_cls=PythonLoader)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "61dd1428-8246-47e3-b1da-f6a3d6f05566",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Auto-detect file encodings with TextLoader\n",
|
|
"\n",
|
|
"`DirectoryLoader` can help manage errors due to variations in file encodings. Below we will attempt to load in a collection of files, one of which includes non-UTF8 encodings."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 9,
|
|
"id": "e69db7ae-0385-4129-968f-17c42c7a635c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"path = \"../../../../libs/langchain/tests/unit_tests/examples/\"\n",
|
|
"\n",
|
|
"loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e3b61cf0-809b-4c97-b1a4-17c6aa4343e1",
|
|
"metadata": {},
|
|
"source": [
|
|
"### A. Default Behavior\n",
|
|
"\n",
|
|
"By default we raise an error:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"id": "4b8f56be-122a-4c56-86a5-a70631a78ec7",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Error loading file ../../../../libs/langchain/tests/unit_tests/examples/example-non-utf8.txt\n"
|
|
]
|
|
},
|
|
{
|
|
"ename": "RuntimeError",
|
|
"evalue": "Error loading ../../../../libs/langchain/tests/unit_tests/examples/example-non-utf8.txt",
|
|
"output_type": "error",
|
|
"traceback": [
|
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
|
"\u001b[0;31mUnicodeDecodeError\u001b[0m Traceback (most recent call last)",
|
|
"File \u001b[0;32m~/repos/langchain/libs/community/langchain_community/document_loaders/text.py:43\u001b[0m, in \u001b[0;36mTextLoader.lazy_load\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mopen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfile_path, encoding\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mencoding) \u001b[38;5;28;01mas\u001b[39;00m f:\n\u001b[0;32m---> 43\u001b[0m text \u001b[38;5;241m=\u001b[39m \u001b[43mf\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mread\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 44\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mUnicodeDecodeError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n",
|
|
"File \u001b[0;32m~/.pyenv/versions/3.10.4/lib/python3.10/codecs.py:322\u001b[0m, in \u001b[0;36mBufferedIncrementalDecoder.decode\u001b[0;34m(self, input, final)\u001b[0m\n\u001b[1;32m 321\u001b[0m data \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mbuffer \u001b[38;5;241m+\u001b[39m \u001b[38;5;28minput\u001b[39m\n\u001b[0;32m--> 322\u001b[0m (result, consumed) \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_buffer_decode\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43merrors\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mfinal\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 323\u001b[0m \u001b[38;5;66;03m# keep undecoded input until the next call\u001b[39;00m\n",
|
|
"\u001b[0;31mUnicodeDecodeError\u001b[0m: 'utf-8' codec can't decode byte 0xca in position 0: invalid continuation byte",
|
|
"\nThe above exception was the direct cause of the following exception:\n",
|
|
"\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)",
|
|
"Cell \u001b[0;32mIn[10], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mloader\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m~/repos/langchain/libs/community/langchain_community/document_loaders/directory.py:117\u001b[0m, in \u001b[0;36mDirectoryLoader.load\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 115\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mload\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m List[Document]:\n\u001b[1;32m 116\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"Load documents.\"\"\"\u001b[39;00m\n\u001b[0;32m--> 117\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mlist\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlazy_load\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n",
|
|
"File \u001b[0;32m~/repos/langchain/libs/community/langchain_community/document_loaders/directory.py:182\u001b[0m, in \u001b[0;36mDirectoryLoader.lazy_load\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 180\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 181\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m i \u001b[38;5;129;01min\u001b[39;00m items:\n\u001b[0;32m--> 182\u001b[0m \u001b[38;5;28;01myield from\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_lazy_load_file(i, p, pbar)\n\u001b[1;32m 184\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m pbar:\n\u001b[1;32m 185\u001b[0m pbar\u001b[38;5;241m.\u001b[39mclose()\n",
|
|
"File \u001b[0;32m~/repos/langchain/libs/community/langchain_community/document_loaders/directory.py:220\u001b[0m, in \u001b[0;36mDirectoryLoader._lazy_load_file\u001b[0;34m(self, item, path, pbar)\u001b[0m\n\u001b[1;32m 218\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 219\u001b[0m logger\u001b[38;5;241m.\u001b[39merror(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError loading file \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mstr\u001b[39m(item)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 220\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 221\u001b[0m \u001b[38;5;28;01mfinally\u001b[39;00m:\n\u001b[1;32m 222\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m pbar:\n",
|
|
"File \u001b[0;32m~/repos/langchain/libs/community/langchain_community/document_loaders/directory.py:210\u001b[0m, in \u001b[0;36mDirectoryLoader._lazy_load_file\u001b[0;34m(self, item, path, pbar)\u001b[0m\n\u001b[1;32m 208\u001b[0m loader \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mloader_cls(\u001b[38;5;28mstr\u001b[39m(item), \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mloader_kwargs)\n\u001b[1;32m 209\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 210\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m subdoc \u001b[38;5;129;01min\u001b[39;00m loader\u001b[38;5;241m.\u001b[39mlazy_load():\n\u001b[1;32m 211\u001b[0m \u001b[38;5;28;01myield\u001b[39;00m subdoc\n\u001b[1;32m 212\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m:\n",
|
|
"File \u001b[0;32m~/repos/langchain/libs/community/langchain_community/document_loaders/text.py:56\u001b[0m, in \u001b[0;36mTextLoader.lazy_load\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 54\u001b[0m \u001b[38;5;28;01mcontinue\u001b[39;00m\n\u001b[1;32m 55\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m---> 56\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError loading \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfile_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n\u001b[1;32m 57\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 58\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError loading \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mfile_path\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01me\u001b[39;00m\n",
|
|
"\u001b[0;31mRuntimeError\u001b[0m: Error loading ../../../../libs/langchain/tests/unit_tests/examples/example-non-utf8.txt"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "48308077-2d99-4dd6-9bf1-dd1ad6c64b0f",
|
|
"metadata": {},
|
|
"source": [
|
|
"The file `example-non-utf8.txt` uses a different encoding, so the `load()` function fails with a helpful message indicating which file failed decoding.\n",
|
|
"\n",
|
|
"With the default behavior of `TextLoader` any failure to load any of the documents will fail the whole loading process and no documents are loaded.\n",
|
|
"\n",
|
|
"### B. Silent fail\n",
|
|
"\n",
|
|
"We can pass the parameter `silent_errors` to the `DirectoryLoader` to skip the files which could not be loaded and continue the load process."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"id": "b333c652-a7ad-47f4-8be8-d27c18ef11b7",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Error loading file ../../../../libs/langchain/tests/unit_tests/examples/example-non-utf8.txt: Error loading ../../../../libs/langchain/tests/unit_tests/examples/example-non-utf8.txt\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"loader = DirectoryLoader(\n",
|
|
" path, glob=\"**/*.txt\", loader_cls=TextLoader, silent_errors=True\n",
|
|
")\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"id": "b99ef682-b892-4790-8964-40185fea41a2",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['../../../../libs/langchain/tests/unit_tests/examples/example-utf8.txt']"
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"doc_sources = [doc.metadata[\"source\"] for doc in docs]\n",
|
|
"doc_sources"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "da475bff-2f4f-4ea3-a058-2979042c5326",
|
|
"metadata": {},
|
|
"source": [
|
|
"### C. Auto detect encodings\n",
|
|
"\n",
|
|
"We can also ask `TextLoader` to auto detect the file encoding before failing, by passing the `autodetect_encoding` to the loader class."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"id": "832760da-ed9f-4e68-a67c-35493bde2214",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"text_loader_kwargs = {\"autodetect_encoding\": True}\n",
|
|
"loader = DirectoryLoader(\n",
|
|
" path, glob=\"**/*.txt\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs\n",
|
|
")\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"id": "5c4f4dba-f84f-496e-9378-3e6858305619",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['../../../../libs/langchain/tests/unit_tests/examples/example-utf8.txt',\n",
|
|
" '../../../../libs/langchain/tests/unit_tests/examples/example-non-utf8.txt']"
|
|
]
|
|
},
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"doc_sources = [doc.metadata[\"source\"] for doc in docs]\n",
|
|
"doc_sources"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.4"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|