From 5a40413bfd928813cf0ebbb61e58847c1b21086c Mon Sep 17 00:00:00 2001
From: junefish
Date: Tue, 21 May 2024 12:35:20 -0400
Subject: [PATCH] docs: add Pinecone tab to vector stores page (#21969)

Thank you for contributing to LangChain!

- [x] **PR title**: docs: add Pinecone tab to [vector stores page](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/).

- [x] **PR message**: Recreation of https://github.com/langchain-ai/langchain/pull/21721. Adds information about PineconeVectorStore to the LangChain vector stores page. Although this page is deprecated, it still shows up prominently in Google search results, so it will still be very helpful to users to have correct information.

![search results](https://github.com/langchain-ai/langchain/assets/19216250/e05d8d74-03da-44a1-b87f-0f8087d3c13a)

- [x] **Add tests and docs**: N/A

- [x] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---
 .../data_connection/vectorstores/index.mdx    | 46 ++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/docs/docs/modules/data_connection/vectorstores/index.mdx b/docs/docs/modules/data_connection/vectorstores/index.mdx
index 060df47026d..2ffebd37956 100644
--- a/docs/docs/modules/data_connection/vectorstores/index.mdx
+++ b/docs/docs/modules/data_connection/vectorstores/index.mdx
@@ -56,6 +56,50 @@ documents = text_splitter.split_documents(raw_documents)
 db = Chroma.from_documents(documents, OpenAIEmbeddings())
 ```
+
+
+
+This walkthrough uses the `Pinecone` vector database, which provides broad functionality to store and search over vectors.
+
+```bash
+pip install langchain-pinecone
+```
+
+We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.
+
+```python
+import os
+import getpass
+
+os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
+```
+
+```python
+from langchain_community.document_loaders import TextLoader
+from langchain_openai import OpenAIEmbeddings
+from langchain_text_splitters import CharacterTextSplitter
+
+# Load the document, split it into chunks, and embed each chunk.
+loader = TextLoader("../../modules/state_of_the_union.txt")
+documents = loader.load()
+text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+docs = text_splitter.split_documents(documents)
+
+embeddings = OpenAIEmbeddings()
+```
+
+Next, go to the [Pinecone console](https://app.pinecone.io) and create a new index with `dimension=1536` called "langchain-test-index". Then, copy the API key and index name.
+
+```python
+from langchain_pinecone import PineconeVectorStore
+
+os.environ['PINECONE_API_KEY'] = ''
+
+index_name = "langchain-test-index"
+
+# Connect to Pinecone index and insert the chunked docs as contents
+docsearch = PineconeVectorStore.from_documents(docs, embeddings, index_name=index_name)
+```
@@ -280,4 +324,4 @@ I’ve worked on these issues a long time.
 I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and safety.
 ```
-
+
\ No newline at end of file
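
Review note: the Pinecone tab in this patch assigns an empty string to `PINECONE_API_KEY`, while the OpenAI key in the same tab is read with `getpass`. The sketch below shows one possible way to handle both keys the same way, assuming the reader runs the walkthrough interactively. `ensure_env` is a hypothetical helper name introduced here for illustration; it is not part of LangChain or the Pinecone client, and the prompt strings are placeholders.

```python
import getpass
import os


def ensure_env(name: str, prompt: str) -> str:
    """Return the value of environment variable `name`.

    Prompts for the value only when the variable is unset or empty.
    Hypothetical helper for illustration; not part of LangChain or Pinecone.
    """
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed key so it does not appear in notebook output.
        value = getpass.getpass(prompt)
        os.environ[name] = value
    return value


# Example usage (commented out so the sketch does not prompt when imported):
# ensure_env("OPENAI_API_KEY", "OpenAI API Key:")
# ensure_env("PINECONE_API_KEY", "Pinecone API Key:")
```

With something like this in place, a key that is already exported in the shell is reused, and an empty placeholder never reaches `PineconeVectorStore.from_documents`.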