mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-07 05:52:15 +00:00
Rspace doc loader (#11511)
**Description:** Add a document loader for the RSpace Electronic Lab Notebook (www.researchspace.com), so that scientific documents and research notes can be easily pulled into Langchain pipelines. **Issue** This is an new contribution, rather than an issue fix. **Dependencies:** There are no new required dependencies. In order to use the loader, clients will need to install rspace_client SDK using `pip install rspace_client` --------- Co-authored-by: richarda23 <richard.c.adams@infinityworks.com> Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
127
docs/docs/integrations/document_loaders/rspace.ipynb
Normal file
127
docs/docs/integrations/document_loaders/rspace.ipynb
Normal file
@@ -0,0 +1,127 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2f1572a5-9f8c-44f1-82f3-ddeee8f55145",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook shows how to use the RSpace document loader to import research notes and documents from RSpace Electronic\n",
|
||||
"Lab Notebook into Langchain pipelines.\n",
|
||||
"\n",
|
||||
"To start you'll need an RSpace account and an API key.\n",
|
||||
"\n",
|
||||
"You can set up a free account at [https://community.researchspace.com](https://community.researchspace.com) or use your institutional RSpace.\n",
|
||||
"\n",
|
||||
"You can get an RSpace API token from your account's profile page. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9e5310d2-a864-4464-bdca-81f30c9d0bdb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install rspace_client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "61b1d1b7-a28c-4fba-83a3-df64baa8b6b8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It's best to store your RSpace API key as an environment variable. \n",
|
||||
"\n",
|
||||
" RSPACE_API_KEY=<YOUR_KEY>\n",
|
||||
"\n",
|
||||
"You'll also need to set the URL of your RSpace installation e.g.\n",
|
||||
"\n",
|
||||
" RSPACE_URL=https://community.researchspace.com\n",
|
||||
"\n",
|
||||
"If you use these exact environment variable names, they will be detected automatically. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "13c19ea4-100f-417e-b52f-7e8730c7c1d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.rspace import RSpaceLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4fd42831-0e79-4068-a5e1-7e2cfc242789",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can import various items from RSpace:\n",
|
||||
"\n",
|
||||
"* A single RSpace structured or basic document. This will map 1-1 to a Langchain document.\n",
|
||||
"* A folder or noteook. All documents inside the notebook or folder are imported as Langchain documents. \n",
|
||||
"* If you have PDF files in the RSpace Gallery, these can be imported individually as well. Under the hood, Langchain's PDF loader will be used and this creates one Langchain document per PDF page. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8e614357-5eca-401b-ab98-ea55b0465009",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## replace these ids with some from your own research notes.\n",
|
||||
"## Make sure to use global ids (with the 2 character prefix). This helps the loader know which API calls to make \n",
|
||||
"## to RSpace API.\n",
|
||||
"\n",
|
||||
"rspace_ids = [\"NB1932027\", \"FL1921314\", \"SD1932029\", \"GL1932384\"]\n",
|
||||
"for rs_id in rspace_ids:\n",
|
||||
" loader = RSpaceLoader(global_id=rs_id)\n",
|
||||
" docs = loader.load()\n",
|
||||
" for doc in docs:\n",
|
||||
" ## the name and ID are added to the 'source' metadata property.\n",
|
||||
" print (doc.metadata)\n",
|
||||
" print(doc.page_content[:500])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1b41758d-24e0-4994-a30f-3acccc7795e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you don't want to use the environment variables as above, you can pass these into the RSpaceLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "aa079ca6-439d-4010-9edd-cd77d8884fab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = RSpaceLoader(global_id=rs_id, api_key=\"MY_API_KEY\", url=\"https://my.researchspace.com\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
Reference in New Issue
Block a user