docs: update notebook for latest Pinecone API + serverless (#21921)

Thank you for contributing to LangChain!

- [x] **PR title**: "docs: update notebook for latest Pinecone API +
serverless"


- [x] **PR message**: The published notebook is incompatible with the latest
`pinecone-client` and is not runnable. Updated it for use with the latest
Pinecone Python SDK, and made it compatible with serverless indexes (the
only index type available on the Pinecone free tier).


- [x] **Add tests and docs**: N/A (tested in Colab)


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
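The incompatibility comes from the rewritten Pinecone client. The updated notebook uses a `Pinecone(api_key=...)` object and `ServerlessSpec` in place of the old module-level `pinecone.create_index(...)` / `pinecone.Index(...)` calls. Below is a minimal sketch of a version guard a notebook could run first. It assumes that major version 3 of `pinecone-client` is where the client-class API appeared. The helper names are hypothetical and are not part of Pinecone or LangChain.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def client_major_version(dist_name: str = "pinecone-client") -> Optional[int]:
    """Return the installed major version of `dist_name`, or None if not installed."""
    try:
        raw = version(dist_name)
    except PackageNotFoundError:
        return None
    # "3.2.2" -> 3; assumes the first dotted component is a plain integer
    return int(raw.split(".")[0])


def supports_client_class_api(major: Optional[int]) -> bool:
    """Assumption: `Pinecone(api_key=...)` and `ServerlessSpec` exist from v3 on."""
    return major is not None and major >= 3
```

In a notebook, `supports_client_class_api(client_major_version())` returning `False` signals that the updated cells need a newer client (`%pip install --upgrade pinecone-client`).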



---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1207328087952499
junefish 2024-05-20 14:51:03 -04:00 committed by GitHub
parent 9c76739425
commit 0614a53d9c
GPG Key ID: B5690EEEBB952194


@@ -24,7 +24,7 @@
"metadata": {},
"outputs": [],
"source": [
-"%pip install --upgrade --quiet pinecone-client pinecone-text"
+"%pip install --upgrade --quiet pinecone-client pinecone-text pinecone-notebooks"
]
},
{
@@ -34,10 +34,14 @@
"metadata": {},
"outputs": [],
"source": [
-"import getpass\n",
+"# Connect to Pinecone and get an API key.\n",
+"from pinecone_notebooks.colab import Authenticate\n",
+"\n",
+"Authenticate()\n",
+"\n",
"import os\n",
"\n",
-"os.environ[\"PINECONE_API_KEY\"] = getpass.getpass(\"Pinecone API Key:\")"
+"api_key = os.environ[\"PINECONE_API_KEY\"]"
]
},
{
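The new cell relies on `Authenticate()` from `pinecone_notebooks.colab` to put the key into `PINECONE_API_KEY`. If that step is skipped, for example when running outside Colab, `os.environ["PINECONE_API_KEY"]` raises `KeyError`. Below is a minimal sketch of an environment-first lookup that falls back to the `getpass` prompt the old cell used. `resolve_api_key` is a hypothetical helper and is not part of `pinecone_notebooks` or LangChain.

```python
import getpass
import os
from typing import Callable


def resolve_api_key(
    env_var: str = "PINECONE_API_KEY",
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the key stored in `env_var`, prompting only if it is unset or empty."""
    key = os.environ.get(env_var)
    if not key:
        key = prompt(f"{env_var}: ")
        os.environ[env_var] = key  # cache it so later cells can read the variable
    return key
```

The same helper would also cover the OpenAI cell via `resolve_api_key("OPENAI_API_KEY")`.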
@@ -52,16 +56,6 @@
")"
]
},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "4577fea1-05e7-47a0-8173-56b0ddaa22bf",
-"metadata": {},
-"outputs": [],
-"source": [
-"os.environ[\"PINECONE_ENVIRONMENT\"] = getpass.getpass(\"Pinecone Environment:\")"
-]
-},
{
"cell_type": "markdown",
"id": "80e2e8e3-0fb5-4bd9-9196-9eada3439a61",
@@ -77,6 +71,8 @@
"metadata": {},
"outputs": [],
"source": [
+"import getpass\n",
+"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
@@ -93,9 +89,7 @@
"id": "95d5d7f9",
"metadata": {},
"source": [
-"You should only have to do this part once.\n",
-"\n",
-"Note: it's important to make sure that the \"context\" field that holds the document text in the metadata is not indexed. Currently you need to specify explicitly the fields you do want to index. For more information checkout Pinecone's [docs](https://docs.pinecone.io/docs/manage-indexes#selective-metadata-indexing)."
+"You should only have to do this part once."
]
},
{
@@ -118,28 +112,21 @@
"source": [
"import os\n",
"\n",
-"import pinecone\n",
+"from pinecone import Pinecone, ServerlessSpec\n",
"\n",
-"api_key = os.getenv(\"PINECONE_API_KEY\") or \"PINECONE_API_KEY\"\n",
+"index_name = \"langchain-pinecone-hybrid-search\"\n",
"\n",
+"# initialize Pinecone client\n",
+"pc = Pinecone(api_key=api_key)\n",
+"\n",
-"index_name = \"langchain-pinecone-hybrid-search\""
-]
-},
-{
-"cell_type": "code",
-"execution_count": 77,
-"id": "cfa3a8d8",
-"metadata": {},
-"outputs": [],
-"source": [
"# create the index\n",
-"pinecone.create_index(\n",
-"    name=index_name,\n",
-"    dimension=1536,  # dimensionality of dense model\n",
-"    metric=\"dotproduct\",  # sparse values supported only for dotproduct\n",
-"    pod_type=\"s1\",\n",
-"    metadata_config={\"indexed\": []},  # see explanation above\n",
-")"
+"if index_name not in pc.list_indexes().names():\n",
+"    pc.create_index(\n",
+"        name=index_name,\n",
+"        dimension=1536,  # dimensionality of dense model\n",
+"        metric=\"dotproduct\",  # sparse values supported only for dotproduct\n",
+"        spec=ServerlessSpec(cloud=\"aws\", region=\"us-east-1\"),\n",
+"    )"
]
},
{
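The `if index_name not in pc.list_indexes().names()` guard makes the setup cell safe to re-run: a second run skips `create_index` instead of trying to create an index that already exists. Below is a sketch of that create-if-missing pattern as a reusable function. It is written against any client that exposes `list_indexes().names()` and `create_index(name=..., **kwargs)`, which are the two calls the updated cell uses. `ensure_index`, `StubClient`, and the stub's duplicate-name error are hypothetical stand-ins, so the logic can be exercised without a Pinecone account.

```python
from typing import Any, Dict, List, Protocol


class IndexList(Protocol):
    def names(self) -> List[str]: ...


class IndexClient(Protocol):
    def list_indexes(self) -> IndexList: ...

    def create_index(self, name: str, **kwargs: Any) -> Any: ...


def ensure_index(client: IndexClient, name: str, **create_kwargs: Any) -> bool:
    """Create index `name` only if it is not listed yet; return True if created."""
    if name in client.list_indexes().names():
        return False  # already present: re-running the cell is a no-op
    client.create_index(name=name, **create_kwargs)
    return True


# Hypothetical in-memory stand-ins for the real client.
class _StubIndexList:
    def __init__(self, names: List[str]) -> None:
        self._names = names

    def names(self) -> List[str]:
        return list(self._names)


class StubClient:
    def __init__(self) -> None:
        self.created: Dict[str, Dict[str, Any]] = {}

    def list_indexes(self) -> _StubIndexList:
        return _StubIndexList(list(self.created))

    def create_index(self, name: str, **kwargs: Any) -> None:
        # Assumption: model a conflict error when the name is already taken.
        if name in self.created:
            raise ValueError(f"index {name!r} already exists")
        self.created[name] = kwargs


client = StubClient()
first = ensure_index(client, "langchain-pinecone-hybrid-search", dimension=1536, metric="dotproduct")
second = ensure_index(client, "langchain-pinecone-hybrid-search", dimension=1536, metric="dotproduct")
```

With the real client, `ensure_index(pc, index_name, dimension=1536, metric="dotproduct", spec=ServerlessSpec(cloud="aws", region="us-east-1"))` would express the same guarded call as the updated cell.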
@@ -147,7 +134,7 @@
"id": "e01549af",
"metadata": {},
"source": [
-"Now that its created, we can use it"
+"Now that the index is created, we can use it."
]
},
{
@@ -157,7 +144,7 @@
"metadata": {},
"outputs": [],
"source": [
-"index = pinecone.Index(index_name)"
+"index = pc.Index(index_name)"
]
},
{
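The `metric="dotproduct"` comment is what makes the index usable for hybrid search. Under a dot product, the score of a query against a record carrying both dense and sparse values splits into a dense term plus a sparse term. Reweighting one side of the query therefore scales only that term. Below is a pure-Python sketch of the arithmetic. It assumes Pinecone's `{"indices": [...], "values": [...]}` sparse layout and a convex query weight `alpha`, which is an assumption for illustration. All helper names are hypothetical, not Pinecone or LangChain APIs.

```python
from typing import Dict, List, Tuple

# Pinecone-style sparse layout: parallel lists of dimension indices and weights.
SparseVector = Dict[str, list]


def dense_dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two sparse vectors; only shared indices contribute."""
    weights = dict(zip(a["indices"], a["values"]))
    return sum(weights.get(i, 0.0) * v for i, v in zip(b["indices"], b["values"]))


def hybrid_score(q_dense: List[float], q_sparse: SparseVector,
                 d_dense: List[float], d_sparse: SparseVector) -> float:
    """Dot product of [dense | sparse] concatenations = dense term + sparse term."""
    return dense_dot(q_dense, d_dense) + sparse_dot(q_sparse, d_sparse)


def convex_scale(dense: List[float], sparse: SparseVector,
                 alpha: float) -> Tuple[List[float], SparseVector]:
    """Scale the query's dense part by alpha and its sparse part by 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [x * alpha for x in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Worked example with made-up numbers.
q_dense, d_dense = [1.0, 0.0], [0.5, 0.5]
q_sparse = {"indices": [3, 7], "values": [1.0, 2.0]}
d_sparse = {"indices": [7, 9], "values": [0.5, 4.0]}

raw = hybrid_score(q_dense, q_sparse, d_dense, d_sparse)  # 0.5 + 1.0 = 1.5
weighted = hybrid_score(*convex_scale(q_dense, q_sparse, 0.5), d_dense, d_sparse)  # 0.25 + 0.5 = 0.75
```

With cosine similarity, each vector would be normalized as a whole, so the dense and sparse parts could not be weighted independently in this way.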