Harrison/unstructured support (#903)

Harrison Chase
2023-02-05 23:02:07 -08:00
committed by GitHub
parent 2a68be3e8d
commit 53d56d7650
16 changed files with 555 additions and 0 deletions

@@ -0,0 +1,86 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Notion\n",
"This notebook covers how to load documents from a Notion database dump.\n",
"\n",
"To get this Notion dump, follow these instructions:\n",
"\n",
"## 🧑 Instructions for ingesting your own dataset\n",
"\n",
"Export your dataset from Notion: click the three dots in the upper-right corner, then click `Export`.\n",
"\n",
"<img src=\"export_notion.png\" alt=\"export\" width=\"200\"/>\n",
"\n",
"When exporting, make sure to select the `Markdown & CSV` format option.\n",
"\n",
"<img src=\"export_format.png\" alt=\"export-format\" width=\"200\"/>\n",
"\n",
"This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository.\n",
"\n",
"Run the following command to extract the archive into a `Notion_DB` directory (replace `Export...` with your own file name as needed).\n",
"\n",
"```shell\n",
"unzip Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -d Notion_DB\n",
"```\n",
"\n",
"Run the following cells to ingest the data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "007c5cbf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import NotionDirectoryLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1caec59",
"metadata": {},
"outputs": [],
"source": [
"loader = NotionDirectoryLoader(\"Notion_DB\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c30ff7",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
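The Notion notebook above extracts the export with `unzip` and then loads it with `NotionDirectoryLoader`. Below is a minimal stdlib-only sketch of that workflow, included to make the moving parts visible. It does the `unzip` step with Python's `zipfile` module and then reads every Markdown page into a document record. It assumes the loader reads each `.md` page (skipping the CSV files that the `Markdown & CSV` export also produces) and records the file path as `source` metadata. `SimpleDocument`, `extract_export`, and `load_markdown_pages` are hypothetical names, not LangChain's API. The real loader returns LangChain `Document` objects instead.

```python
# Stdlib-only sketch of the Notion export workflow (hypothetical helpers,
# NOT LangChain's implementation): unzip the export, then read each
# Markdown page into a simple document record.
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SimpleDocument:
    # Hypothetical stand-in for a LangChain Document: text plus metadata.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def extract_export(zip_path: str, target_dir: str) -> Path:
    """Rough equivalent of `unzip <zip_path> -d <target_dir>`."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def load_markdown_pages(directory: str) -> List[SimpleDocument]:
    """Read every .md file under `directory`, recursively.

    Assumption: only Markdown pages become documents; CSV files are ignored.
    Paths are sorted so the output order is deterministic.
    """
    docs = []
    for path in sorted(Path(directory).glob("**/*.md")):
        text = path.read_text(encoding="utf-8")
        docs.append(SimpleDocument(page_content=text, metadata={"source": str(path)}))
    return docs


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny fake export so the sketch runs without a real Notion dump.
        zip_path = Path(tmp) / "Export-example.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("Page One.md", "# Page One\nHello")
            archive.writestr("Sub/Page Two.md", "# Page Two\nWorld")
            archive.writestr("Table.csv", "Name,Tags\n")  # skipped by the loader
        out_dir = extract_export(str(zip_path), str(Path(tmp) / "Notion_DB"))
        for doc in load_markdown_pages(str(out_dir)):
            print(doc.metadata["source"], len(doc.page_content))
```

Sorting the paths is a choice made for this sketch so that repeated runs return pages in the same order. The real loader may return them in whatever order its directory scan yields.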