# Pinecone Hybrid Search

This notebook shows how to use a retriever that combines Pinecone with hybrid search under the hood.

The logic of this retriever is largely taken from [this blog post](https://www.pinecone.io/learn/hybrid-search-intro/).
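As a rough illustration of the idea (not the retriever's actual implementation), hybrid search can be thought of as weighting a dense dot product and a sparse dot product against each other. The function names `hybrid_scale` and `hybrid_score`, and the convention that `alpha=1` means purely dense, are assumptions made for this sketch.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense vector by alpha and a sparse vector by (1 - alpha).

    dense  -- list of floats
    sparse -- dict mapping dimension index to value
    alpha  -- float in [0, 1]; 1.0 is purely dense, 0.0 is purely sparse
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {i: v * (1 - alpha) for i, v in sparse.items()}
    return scaled_dense, scaled_sparse


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse):
    """Dot product over the dense parts plus dot product over the sparse parts."""
    dense_part = sum(q * d for q, d in zip(q_dense, d_dense))
    sparse_part = sum(v * d_sparse.get(i, 0.0) for i, v in q_sparse.items())
    return dense_part + sparse_part


# Toy query and document vectors (made-up numbers, for illustration only).
q_dense, q_sparse = hybrid_scale([1.0, 2.0], {3: 4.0}, alpha=0.5)
score = hybrid_score(q_dense, q_sparse, [1.0, 1.0], {3: 1.0})
print(score)
```

Scaling only the query side is enough to shift the balance, because both parts of the score are linear in the query vector.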

In [1]:
from langchain.retrievers import PineconeHybridSearchRetriever

## Setup Pinecone

In [3]:
import pinecone  # !pip install pinecone-client

pinecone.init(
    api_key="...",  # API key here
    environment="...",  # find next to API key in console
)
# choose a name for your index
index_name = "..."

You should only have to create the index once.

In [None]:
# create the index
pinecone.create_index(
    name=index_name,
    dimension=1536,  # dimensionality of dense model
    metric="dotproduct",
    pod_type="s1",
)

Now that the index is created, we can connect to it.

In [4]:
index = pinecone.Index(index_name)

## Get embeddings and tokenizers

Embeddings are used for the dense vectors; the tokenizer is used for the sparse vectors.
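To make the sparse side concrete, here is a minimal sketch of turning text into a sparse vector of token-id counts. It uses a hypothetical toy vocabulary (`TOY_VOCAB`) and whitespace splitting instead of the BERT tokenizer so that it runs without any downloads; the retriever's actual sparse encoding may differ.

```python
from collections import Counter

# Hypothetical toy vocabulary standing in for a real tokenizer's vocabulary.
TOY_VOCAB = {"hello": 0, "world": 1, "foo": 2, "bar": 3}


def to_sparse_vector(text, vocab=TOY_VOCAB):
    """Map text to {token_id: count}, skipping out-of-vocabulary words."""
    token_ids = [vocab[word] for word in text.lower().split() if word in vocab]
    return dict(Counter(token_ids))


sparse = to_sparse_vector("hello hello world")
print(sparse)  # {0: 2, 1: 1}
```

Only the non-zero dimensions are stored, which is what makes the representation sparse: a vocabulary may have tens of thousands of entries, but a short text touches only a handful of them.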

In [5]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [6]:
from transformers import BertTokenizerFast  # !pip install transformers

# load BERT tokenizer from Hugging Face
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

## Load Retriever

We can now construct the retriever!

In [7]:
retriever = PineconeHybridSearchRetriever(
    embeddings=embeddings,
    index=index,
    tokenizer=tokenizer,
)

## Add texts (if necessary)

We can optionally add texts to the retriever if they aren't already in the index.

In [8]:
retriever.add_texts(["foo", "bar", "world", "hello"])

## Use Retriever

We can now use the retriever!

In [9]:
result = retriever.get_relevant_documents("foo")

In [10]:
result[0]

Document(page_content='foo', metadata={})