mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-14 03:27:29 +00:00
More than 300 files - will fail check_diff. Will merge after Vercel deploy succeeds Still occurrences that need changing - will update more later
119 lines
3.4 KiB
Plaintext
119 lines
3.4 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Nuclia\n",
|
|
"\n",
|
|
">[Nuclia](https://nuclia.com) automatically indexes your unstructured data from any internal and external source, providing optimized search results and generative answers. It can handle video and audio transcription, image content extraction, and document parsing.\n",
|
|
"\n",
|
|
"`Nuclia Understanding API` document transformer splits text into paragraphs and sentences, identifies entities, provides a summary of the text and generates embeddings for all the sentences.\n",
|
|
"\n",
|
|
"To use the Nuclia Understanding API, you need to have a Nuclia account. You can create one for free at [https://nuclia.cloud](https://nuclia.cloud), and then [create a NUA key](https://docs.nuclia.dev/docs/docs/using/understanding/intro).\n",
|
|
"\n",
|
|
"from langchain_community.document_transformers.nuclia_text_transform import NucliaTextTransformer"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install --upgrade --quiet protobuf\n",
|
|
"%pip install --upgrade --quiet nucliadb-protos"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"\n",
|
|
"os.environ[\"NUCLIA_ZONE\"] = \"<YOUR_ZONE>\" # e.g. europe-1\n",
|
|
"os.environ[\"NUCLIA_NUA_KEY\"] = \"<YOUR_API_KEY>\""
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To use the Nuclia document transformer, you need to instantiate a `NucliaUnderstandingAPI` tool with `enable_ml` set to `True`:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.tools.nuclia import NucliaUnderstandingAPI\n",
|
|
"\n",
|
|
"nua = NucliaUnderstandingAPI(enable_ml=True)"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The Nuclia document transformer must be called in async mode, so you need to use the `atransform_documents` method:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import asyncio\n",
|
|
"\n",
|
|
"from langchain_community.document_transformers.nuclia_text_transform import (\n",
|
|
" NucliaTextTransformer,\n",
|
|
")\n",
|
|
"from langchain_core.documents import Document\n",
|
|
"\n",
|
|
"\n",
|
|
"async def process():\n",
|
|
" documents = [\n",
|
|
" Document(page_content=\"<TEXT 1>\", metadata={}),\n",
|
|
" Document(page_content=\"<TEXT 2>\", metadata={}),\n",
|
|
" Document(page_content=\"<TEXT 3>\", metadata={}),\n",
|
|
" ]\n",
|
|
" nuclia_transformer = NucliaTextTransformer(nua)\n",
|
|
" transformed_documents = await nuclia_transformer.atransform_documents(documents)\n",
|
|
" print(transformed_documents)\n",
|
|
"\n",
|
|
"\n",
|
|
"asyncio.run(process())"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.12"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
}
|