pinecone: init pkg (#16556)

Erick Friis
2024-02-05 11:55:01 -08:00
committed by GitHub
parent 1183769cf7
commit 6ffd5b15bc
22 changed files with 2560 additions and 104 deletions

@@ -13,7 +13,16 @@
"This notebook shows how to use functionality related to the `Pinecone` vector database.\n",
"\n",
"To use Pinecone, you must have an API key. \n",
"Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart)."
"Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"\n",
"Set the following environment variables to make using the `Pinecone` integration easier:\n",
"\n",
"- `PINECONE_API_KEY`: Your Pinecone API key.\n",
"- `PINECONE_INDEX_NAME`: The name of the index you want to use.\n",
"\n",
"And to follow along in this doc, you should also set\n",
"\n",
"- `OPENAI_API_KEY`: Your OpenAI API key, for using `OpenAIEmbeddings`"
]
},
{
@@ -25,74 +34,27 @@
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet pinecone-client langchain-openai tiktoken langchain"
"%pip install --upgrade --quiet langchain-pinecone langchain-openai langchain"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"PINECONE_API_KEY\"] = getpass.getpass(\"Pinecone API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02a536e0-d603-4d79-b18b-1ed562977b40",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"PINECONE_ENV\"] = getpass.getpass(\"Pinecone Environment:\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "320af802-9271-46ee-948f-d2453933d44b",
"id": "42f2ea67",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
"First, let's split our state of the union document into chunked `docs`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffea66e4-bc23-46a9-9580-b348dfe7b7a7",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aac9563e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores import Pinecone\n",
"from langchain_openai import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
@@ -103,43 +65,52 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e104aee",
"cell_type": "markdown",
"id": "3a4d377f",
"metadata": {},
"outputs": [],
"source": [
"import pinecone\n",
"Now let's assume you have your Pinecone index set up with `dimension=1536`.\n",
"\n",
"# initialize pinecone\n",
"pinecone.init(\n",
" api_key=os.getenv(\"PINECONE_API_KEY\"), # find at app.pinecone.io\n",
" environment=os.getenv(\"PINECONE_ENV\"), # next to api key in console\n",
")\n",
"\n",
"index_name = \"langchain-demo\"\n",
"\n",
"# First, check if our index already exists. If it doesn't, we create it\n",
"if index_name not in pinecone.list_indexes():\n",
" # we create a new index\n",
" pinecone.create_index(name=index_name, metric=\"cosine\", dimension=1536)\n",
"# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n",
"docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n",
"\n",
"# if you already have an index, you can load it like this\n",
"# docsearch = Pinecone.from_existing_index(index_name, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
"We can connect to our Pinecone index and insert those chunked docs as contents with `Pinecone.from_documents`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c608226",
"execution_count": 6,
"id": "6e104aee",
"metadata": {},
"outputs": [],
"source": [
"from langchain_pinecone import Pinecone\n",
"\n",
"index_name = \"langchain-test-index\"\n",
"\n",
"docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ffbcb3fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
@@ -156,15 +127,25 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "38a7a60e",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['24631802-4bad-44a7-a4ba-fd71f00cc160']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index = pinecone.Index(\"langchain-demo\")\n",
"vectorstore = Pinecone(index, embeddings.embed_query, \"text\")\n",
"vectorstore = Pinecone(index_name=index_name, embedding=embeddings)\n",
"\n",
"vectorstore.add_texts(\"More text!\")"
"vectorstore.add_texts([\"More text!\"])"
]
},
{
@@ -180,10 +161,91 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"## Document 0\n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"\n",
"## Document 1\n",
"\n",
"And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n",
"\n",
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n",
"\n",
"America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n",
"\n",
"These steps will help blunt gas prices here at home. And I know the news about whats happening can seem alarming. \n",
"\n",
"But I want you to know that we are going to be okay. \n",
"\n",
"When the history of this era is written Putins war on Ukraine will have left Russia weaker and the rest of the world stronger. \n",
"\n",
"While it shouldnt have taken something so terrible for people around the world to see whats at stake now everyone sees it clearly.\n",
"\n",
"## Document 2\n",
"\n",
"We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
"\n",
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
"\n",
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
"\n",
"Officer Mora was 27 years old. \n",
"\n",
"Officer Rivera was 22. \n",
"\n",
"Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.\n",
"\n",
"## Document 3\n",
"\n",
"One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n",
"\n",
"When they came home, many of the worlds fittest and best trained warriors were never the same. \n",
"\n",
"Headaches. Numbness. Dizziness. \n",
"\n",
"A cancer that would put them in a flag-draped coffin. \n",
"\n",
"I know. \n",
"\n",
"One of those soldiers was my son Major Beau Biden. \n",
"\n",
"We dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n",
"\n",
"But Im committed to finding out everything we can. \n",
"\n",
"Committed to military families like Danielle Robinson from Ohio. \n",
"\n",
"The widow of Sergeant First Class Heath Robinson. \n",
"\n",
"He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \n",
"\n",
"Stationed near Baghdad, just yards from burn pits the size of football fields. \n",
"\n",
"Heaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.\n"
]
}
],
"source": [
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
"matched_docs = retriever.get_relevant_documents(query)\n",
@@ -203,15 +265,56 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"id": "9ca82740",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. \n",
"\n",
"2. We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
"\n",
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
"\n",
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
"\n",
"Officer Mora was 27 years old. \n",
"\n",
"Officer Rivera was 22. \n",
"\n",
"Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety. \n",
"\n"
]
}
],
"source": [
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
"for i, doc in enumerate(found_docs):\n",
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0fd750b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -230,7 +333,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.4"
}
},
"nbformat": 4,