docs: streamline LangSmith teasing (#30302)

This can only be reviewed by [hiding
whitespace](https://github.com/langchain-ai/langchain/pull/30302/files?diff=unified&w=1).

The motivation behind this PR is to get my hands on the docs and make
the LangSmith teasing short and clear.

I don't yet know how to do it, but in the future this text could become
a shared include.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Oskar Stark
2025-03-28 20:13:22 +01:00
committed by GitHub
parent dd0faab07e
commit 0d2cea747c
61 changed files with 22329 additions and 22441 deletions

View File

@@ -68,9 +68,7 @@
"cell_type": "markdown",
"id": "f6844fff-3702-4489-ab74-732f69f3b9d7",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -67,9 +67,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -69,9 +69,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -48,9 +48,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -84,9 +84,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -47,9 +47,7 @@
"cell_type": "markdown",
"id": "4a524cff",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -61,18 +61,18 @@
},
{
"cell_type": "code",
"execution_count": 12,
"id": "7f11de02",
"execution_count": null,
"id": "3c2fc2201dc80557",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass()"
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "4c26754b-b3c9-4d93-8f36-43049bd943bf",
"id": "31f2af10e04dec59",
"metadata": {},
"source": [
"## Usage\n",
@@ -82,11 +82,9 @@
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
"metadata": {
"tags": []
},
"execution_count": null,
"id": "fa83b00a929614ad",
"metadata": {},
"outputs": [],
"source": [
"from langchain_cohere import ChatCohere\n",

View File

@@ -71,9 +71,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -60,9 +60,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -58,9 +58,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -68,9 +68,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -55,7 +55,7 @@
"- https://cloud.google.com/docs/authentication/application-default-credentials#GAC\n",
"- https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth\n",
"\n",
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
]
},
{

View File

@@ -58,9 +58,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -60,9 +60,7 @@
"cell_type": "markdown",
"id": "788f37ac",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -81,9 +81,7 @@
"cell_type": "markdown",
"id": "7c695442",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -82,9 +82,7 @@
"cell_type": "markdown",
"id": "52dc8dcb-0a48-4a4e-9947-764116d2ffd4",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",
@@ -426,9 +424,7 @@
"cell_type": "markdown",
"id": "be1688a5",
"metadata": {},
"source": [
"At the moment, some extra processing happens client-side to support larger images like the one above. But for smaller images (and to better illustrate the process going on under the hood), we can directly pass in the image as shown below: "
]
"source": "At the moment, some extra processing happens client-side to support larger images like the one above. But for smaller images (and to better illustrate the process going on under the hood), we can directly pass in the image as shown below:"
},
{
"cell_type": "code",

View File

@@ -61,9 +61,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",
@@ -97,18 +95,18 @@
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Make sure you're using the latest Ollama version for structured outputs. Update by running:",
"id": "b18bd692076f7cf7"
"id": "b18bd692076f7cf7",
"metadata": {},
"source": "Make sure you're using the latest Ollama version for structured outputs. Update by running:"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "%pip install -U ollama",
"id": "b7a05cba95644c2e"
"id": "b7a05cba95644c2e",
"metadata": {},
"outputs": [],
"source": "%pip install -U ollama"
},
{
"cell_type": "markdown",

View File

@@ -61,9 +61,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -46,9 +46,7 @@
"cell_type": "markdown",
"id": "c3b1707a-cf2c-4367-94e3-436c43402503",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -61,9 +61,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -48,9 +48,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -82,9 +82,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -34,9 +34,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -33,9 +33,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -42,9 +42,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -29,9 +29,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -43,9 +43,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -30,24 +30,22 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-21T08:00:08.878423Z",
"start_time": "2025-01-21T08:00:08.876042Z"
}
},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
],
"outputs": [],
"execution_count": 1
]
},
{
"cell_type": "markdown",
@@ -59,14 +57,14 @@
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-21T08:00:12.003718Z",
"start_time": "2025-01-21T08:00:10.291617Z"
}
},
"cell_type": "code",
"source": "%pip install -qU langchain_community pypdf pillow",
"outputs": [
{
"name": "stdout",
@@ -76,11 +74,11 @@
]
}
],
"execution_count": 2
"source": "%pip install -qU langchain_community pypdf pillow"
},
{
"metadata": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
@@ -88,13 +86,15 @@
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-21T08:00:18.512061Z",
"start_time": "2025-01-21T08:00:17.313969Z"
}
},
"cell_type": "code",
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFDirectoryLoader\n",
"\n",
@@ -102,9 +102,7 @@
" \"../../docs/integrations/document_loaders/example_data/layout-parser-paper.pdf\"\n",
")\n",
"loader = PyPDFDirectoryLoader(\"example_data/\")"
],
"outputs": [],
"execution_count": 3
]
},
{
"cell_type": "markdown",
@@ -115,16 +113,13 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-21T08:00:23.549752Z",
"start_time": "2025-01-21T08:00:23.129010Z"
}
},
"source": [
"docs = loader.load()\n",
"docs[0]"
],
"outputs": [
{
"data": {
@@ -137,19 +132,20 @@
"output_type": "execute_result"
}
],
"execution_count": 4
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-21T08:00:26.612346Z",
"start_time": "2025-01-21T08:00:26.609051Z"
}
},
"source": [
"print(docs[0].metadata)"
],
"outputs": [
{
"name": "stdout",
@@ -159,7 +155,9 @@
]
}
],
"execution_count": 5
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
@@ -170,12 +168,14 @@
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2025-01-21T08:00:30.251598Z",
"start_time": "2025-01-21T08:00:29.972141Z"
}
},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
@@ -185,9 +185,7 @@
" # index.upsert(page)\n",
"\n",
" page = []"
],
"outputs": [],
"execution_count": 6
]
},
{
"cell_type": "markdown",
@@ -199,10 +197,10 @@
]
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": ""
}
],

View File

@@ -37,22 +37,22 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:15.370257Z",
"start_time": "2025-02-06T07:04:15.367300Z"
}
},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
],
"outputs": [],
"execution_count": 1
]
},
{
"cell_type": "markdown",
@@ -65,13 +65,13 @@
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:18.630037Z",
"start_time": "2025-02-06T07:04:15.634391Z"
}
},
"source": "%pip install -qU langchain_community pypdfium2",
"outputs": [
{
"name": "stdout",
@@ -81,7 +81,7 @@
]
}
],
"execution_count": 2
"source": "%pip install -qU langchain_community pypdfium2"
},
{
"cell_type": "markdown",
@@ -94,20 +94,20 @@
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:19.594910Z",
"start_time": "2025-02-06T07:04:18.671508Z"
}
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFium2Loader\n",
"\n",
"file_path = \"./example_data/layout-parser-paper.pdf\"\n",
"loader = PyPDFium2Loader(file_path)"
],
"outputs": [],
"execution_count": 3
]
},
{
"cell_type": "markdown",
@@ -118,16 +118,13 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:19.717964Z",
"start_time": "2025-02-06T07:04:19.607741Z"
}
},
"source": [
"docs = loader.load()\n",
"docs[0]"
],
"outputs": [
{
"data": {
@@ -140,21 +137,20 @@
"output_type": "execute_result"
}
],
"execution_count": 4
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:19.784617Z",
"start_time": "2025-02-06T07:04:19.782020Z"
}
},
"source": [
"import pprint\n",
"\n",
"pprint.pp(docs[0].metadata)"
],
"outputs": [
{
"name": "stdout",
@@ -174,7 +170,11 @@
]
}
],
"execution_count": 5
"source": [
"import pprint\n",
"\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
@@ -185,23 +185,13 @@
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:22.359295Z",
"start_time": "2025-02-06T07:04:22.143306Z"
}
},
"source": [
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
" if len(pages) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" pages = []\n",
"len(pages)"
],
"outputs": [
{
"data": {
@@ -214,20 +204,27 @@
"output_type": "execute_result"
}
],
"execution_count": 6
"source": [
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
" if len(pages) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" pages = []\n",
"len(pages)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:23.200681Z",
"start_time": "2025-02-06T07:04:23.189169Z"
}
},
"source": [
"print(pages[0].page_content[:100])\n",
"pprint.pp(pages[0].metadata)"
],
"outputs": [
{
"name": "stdout",
@@ -249,7 +246,10 @@
]
}
],
"execution_count": 7
"source": [
"print(pages[0].page_content[:100])\n",
"pprint.pp(pages[0].metadata)"
]
},
{
"cell_type": "markdown",
@@ -290,21 +290,13 @@
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:27.102894Z",
"start_time": "2025-02-06T07:04:26.941787Z"
}
},
"source": [
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
")\n",
"docs = loader.load()\n",
"print(len(docs))\n",
"pprint.pp(docs[0].metadata)"
],
"outputs": [
{
"name": "stdout",
@@ -325,7 +317,15 @@
]
}
],
"execution_count": 8
"source": [
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
")\n",
"docs = loader.load()\n",
"print(len(docs))\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
@@ -339,21 +339,13 @@
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:29.714085Z",
"start_time": "2025-02-06T07:04:29.646263Z"
}
},
"source": [
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"single\",\n",
")\n",
"docs = loader.load()\n",
"print(len(docs))\n",
"pprint.pp(docs[0].metadata)"
],
"outputs": [
{
"name": "stdout",
@@ -373,7 +365,15 @@
]
}
],
"execution_count": 9
"source": [
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"single\",\n",
")\n",
"docs = loader.load()\n",
"print(len(docs))\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
@@ -387,21 +387,13 @@
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:33.462591Z",
"start_time": "2025-02-06T07:04:33.299846Z"
}
},
"source": [
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"single\",\n",
" pages_delimiter=\"\\n-------THIS IS A CUSTOM END OF PAGE-------\\n\",\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].page_content[:5780])"
],
"outputs": [
{
"name": "stdout",
@@ -502,7 +494,15 @@
]
}
],
"execution_count": 10
"source": [
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"single\",\n",
" pages_delimiter=\"\\n-------THIS IS A CUSTOM END OF PAGE-------\\n\",\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].page_content[:5780])"
]
},
{
"cell_type": "markdown",
@@ -535,15 +535,13 @@
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:04:39.419623Z",
"start_time": "2025-02-06T07:04:37.250297Z"
}
},
"source": [
"%pip install -qU rapidocr-onnxruntime"
],
"outputs": [
{
"name": "stdout",
@@ -553,29 +551,19 @@
]
}
],
"execution_count": 11
"source": [
"%pip install -qU rapidocr-onnxruntime"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:05:02.902374Z",
"start_time": "2025-02-06T07:04:39.500569Z"
}
},
"source": [
"from langchain_community.document_loaders.parsers import RapidOCRBlobParser\n",
"\n",
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" images_inner_format=\"markdown-img\",\n",
" images_parser=RapidOCRBlobParser(),\n",
")\n",
"docs = loader.load()\n",
"\n",
"print(docs[5].page_content)"
],
"outputs": [
{
"name": "stdout",
@@ -649,7 +637,19 @@
]
}
],
"execution_count": 12
"source": [
"from langchain_community.document_loaders.parsers import RapidOCRBlobParser\n",
"\n",
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" images_inner_format=\"markdown-img\",\n",
" images_parser=RapidOCRBlobParser(),\n",
")\n",
"docs = loader.load()\n",
"\n",
"print(docs[5].page_content)"
]
},
{
"cell_type": "markdown",
@@ -663,15 +663,13 @@
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:05:07.656993Z",
"start_time": "2025-02-06T07:05:05.890565Z"
}
},
"source": [
"%pip install -qU pytesseract"
],
"outputs": [
{
"name": "stdout",
@@ -681,28 +679,19 @@
]
}
],
"execution_count": 13
"source": [
"%pip install -qU pytesseract"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:05:16.677336Z",
"start_time": "2025-02-06T07:05:07.724790Z"
}
},
"source": [
"from langchain_community.document_loaders.parsers import TesseractBlobParser\n",
"\n",
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" images_inner_format=\"html-img\",\n",
" images_parser=TesseractBlobParser(),\n",
")\n",
"docs = loader.load()\n",
"print(docs[5].page_content)"
],
"outputs": [
{
"name": "stdout",
@@ -776,7 +765,18 @@
]
}
],
"execution_count": 14
"source": [
"from langchain_community.document_loaders.parsers import TesseractBlobParser\n",
"\n",
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" images_inner_format=\"html-img\",\n",
" images_parser=TesseractBlobParser(),\n",
")\n",
"docs = loader.load()\n",
"print(docs[5].page_content)"
]
},
{
"cell_type": "markdown",
@@ -785,15 +785,13 @@
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:05:57.591688Z",
"start_time": "2025-02-06T07:05:54.591989Z"
}
},
"source": [
"%pip install -qU langchain_openai"
],
"outputs": [
{
"name": "stdout",
@@ -803,23 +801,19 @@
]
}
],
"execution_count": 15
"source": [
"%pip install -qU langchain_openai"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:05:58.280055Z",
"start_time": "2025-02-06T07:05:58.180689Z"
}
},
"source": [
"import os\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
],
"outputs": [
{
"data": {
@@ -832,46 +826,40 @@
"output_type": "execute_result"
}
],
"execution_count": 16
"source": [
"import os\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:05:59.170560Z",
"start_time": "2025-02-06T07:05:59.167117Z"
}
},
"outputs": [],
"source": [
"from getpass import getpass\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key =\")"
],
"outputs": [],
"execution_count": 17
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:07:05.416416Z",
"start_time": "2025-02-06T07:06:00.694853Z"
}
},
"source": [
"from langchain_community.document_loaders.parsers import LLMImageBlobParser\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" images_inner_format=\"markdown-img\",\n",
" images_parser=LLMImageBlobParser(model=ChatOpenAI(model=\"gpt-4o\", max_tokens=1024)),\n",
")\n",
"docs = loader.load()\n",
"print(docs[5].page_content)"
],
"outputs": [
{
"name": "stdout",
@@ -956,7 +944,19 @@
]
}
],
"execution_count": 18
"source": [
"from langchain_community.document_loaders.parsers import LLMImageBlobParser\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"loader = PyPDFium2Loader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" mode=\"page\",\n",
" images_inner_format=\"markdown-img\",\n",
" images_parser=LLMImageBlobParser(model=ChatOpenAI(model=\"gpt-4o\", max_tokens=1024)),\n",
")\n",
"docs = loader.load()\n",
"print(docs[5].page_content)"
]
},
{
"cell_type": "markdown",
@@ -972,28 +972,13 @@
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2025-02-06T07:07:08.394894Z",
"start_time": "2025-02-06T07:07:08.164047Z"
}
},
"source": [
"from langchain_community.document_loaders import FileSystemBlobLoader\n",
"from langchain_community.document_loaders.generic import GenericLoader\n",
"from langchain_community.document_loaders.parsers import PyPDFium2Parser\n",
"\n",
"loader = GenericLoader(\n",
" blob_loader=FileSystemBlobLoader(\n",
" path=\"./example_data/\",\n",
" glob=\"*.pdf\",\n",
" ),\n",
" blob_parser=PyPDFium2Parser(),\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].page_content)\n",
"pprint.pp(docs[0].metadata)"
],
"outputs": [
{
"name": "stdout",
@@ -1060,7 +1045,22 @@
]
}
],
"execution_count": 19
"source": [
"from langchain_community.document_loaders import FileSystemBlobLoader\n",
"from langchain_community.document_loaders.generic import GenericLoader\n",
"from langchain_community.document_loaders.parsers import PyPDFium2Parser\n",
"\n",
"loader = GenericLoader(\n",
" blob_loader=FileSystemBlobLoader(\n",
" path=\"./example_data/\",\n",
" glob=\"*.pdf\",\n",
" ),\n",
" blob_parser=PyPDFium2Parser(),\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].page_content)\n",
"pprint.pp(docs[0].metadata)"
]
},
{
"cell_type": "markdown",

View File

@@ -37,7 +37,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -33,9 +33,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -32,9 +32,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -34,9 +34,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -35,9 +35,7 @@
"cell_type": "markdown",
"id": "fc4ba987",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

View File

@@ -58,9 +58,7 @@
"cell_type": "markdown",
"id": "f5d528fa",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -35,7 +35,7 @@
"metadata": {},
"source": [
"## Setup\n",
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
]
},
{

@@ -55,9 +55,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -63,9 +63,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -54,9 +54,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -54,9 +54,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -103,9 +103,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -53,9 +53,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -53,9 +53,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -55,9 +55,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -55,9 +55,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -53,9 +53,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -54,9 +54,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -57,9 +57,7 @@
"cell_type": "markdown",
"id": "72ee0c4b",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -24,9 +24,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -34,9 +34,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -44,9 +44,7 @@
"cell_type": "markdown",
"id": "36a178eb-1f2c-411e-bf25-0240ead4c62a",
"metadata": {},
"source": [
"Note that if you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -29,9 +29,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -19,7 +19,7 @@
"\n",
"## Setup\n",
"\n",
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
]
},
{

@@ -254,9 +254,7 @@
"cell_type": "markdown",
"id": "7f98392b",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -80,7 +80,7 @@
"source": [
"You can use `VDMS` without any credentials.\n",
"\n",
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
]
},
{

@@ -70,9 +70,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -64,9 +64,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -63,9 +63,7 @@
"cell_type": "markdown",
"id": "4b6e1ca6",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -59,9 +59,7 @@
"cell_type": "markdown",
"id": "c84fb993",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -30,9 +30,7 @@
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing from runs of individual tools, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of individual tools, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",

@@ -75,9 +75,7 @@
"cell_type": "markdown",
"id": "7f98392b",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
"source": "To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
},
{
"cell_type": "code",