# Embeddings

This notebook goes over how to use the Embedding class in LangChain.

The Embedding class is designed for interfacing with embedding models. There are many embedding providers (OpenAI, Cohere, Hugging Face, etc.), and this class provides a standard interface for all of them.

Embeddings create a vector representation of a piece of text. This is useful because it lets us reason about text in vector space and do things like semantic search, where we look for the pieces of text that are most similar in that space.
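
To make the idea of similarity in vector space concrete, here is a minimal sketch of cosine-similarity search over a few hand-made vectors. The three-dimensional vectors and the `cosine_similarity` helper are illustrative assumptions; real embeddings come from a provider and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" made up for illustration; not produced by any model.
documents = {
    "cats purr": [0.9, 0.1, 0.0],
    "dogs bark": [0.7, 0.3, 0.1],
    "stock prices fell": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.0]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
best_match = ranked[0]
```

With these made-up numbers, the query vector points in nearly the same direction as the "cats purr" vector, so that entry ranks first.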

The base Embedding class in LangChain exposes two methods: `embed_documents` and `embed_query`. The main difference is in their interfaces: `embed_documents` works over multiple documents, while `embed_query` works over a single piece of text. Another reason for keeping them as two separate methods is that some embedding providers use different embedding methods for documents (to be searched over) and queries (the search query itself).
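
The shape of this interface can be sketched with a toy class. `ToyEmbeddings` below is a hypothetical stand-in, not a LangChain class: it counts characters into a small fixed-size vector only to show that `embed_query` takes one string and returns one vector, while `embed_documents` takes a list of strings and returns one vector per string.

```python
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in that mimics the two-method interface."""

    def __init__(self, size: int = 4) -> None:
        self.size = size

    def _embed(self, text: str) -> List[float]:
        # Count characters into `size` buckets; purely illustrative.
        vector = [0.0] * self.size
        for ch in text:
            vector[ord(ch) % self.size] += 1.0
        return vector

    def embed_query(self, text: str) -> List[float]:
        # One string in, one vector out.
        return self._embed(text)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # A list of strings in, a list of vectors out (same order).
        return [self._embed(t) for t in texts]


toy = ToyEmbeddings()
query_vector = toy.embed_query("This is a test document.")
doc_vectors = toy.embed_documents(["This is a test document.", "Another one."])
```

In this toy, both methods share one embedding routine; a real provider may route them to different models.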

## OpenAI

Let's load the OpenAI Embedding class.

In [1]:
from langchain.embeddings import OpenAIEmbeddings

In [2]:
embeddings = OpenAIEmbeddings()

In [3]:
text = "This is a test document."

In [4]:
query_result = embeddings.embed_query(text)

In [5]:
doc_result = embeddings.embed_documents([text])

Let's load the OpenAI Embedding class with first-generation models (e.g. `text-search-ada-doc-001` / `text-search-ada-query-001`). Note: these models are not recommended; see the [OpenAI embeddings guide](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

In [None]:
embeddings = OpenAIEmbeddings(model_name="ada")

In [None]:
text = "This is a test document."

In [None]:
query_result = embeddings.embed_query(text)

In [None]:
doc_result = embeddings.embed_documents([text])

## AzureOpenAI

Let's load the OpenAI Embedding class with environment variables set so that it uses Azure endpoints.

In [None]:
# Set the environment variables the openai package needs in order to reach Azure
import os

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://<your-endpoint>.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "your AzureOpenAI key"
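
Because this configuration lives entirely in environment variables, a missing one may only surface later as a failed request. A small check like the following can catch that early. The helper name `missing_azure_settings` is hypothetical; the three variable names are the ones set in the cell above.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("OPENAI_API_TYPE", "OPENAI_API_BASE", "OPENAI_API_KEY")


def missing_azure_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "OPENAI_API_TYPE": "azure",
    "OPENAI_API_BASE": "https://<your-endpoint>.openai.azure.com/",
}
missing = missing_azure_settings(example_env)
```

Calling `missing_azure_settings()` with no argument checks the current process environment instead.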

In [None]:
embeddings = OpenAIEmbeddings(model="your-embeddings-deployment-name")

In [None]:
text = "This is a test document."

In [None]:
query_result = embeddings.embed_query(text)

In [None]:
doc_result = embeddings.embed_documents([text])

## Cohere

Let's load the Cohere Embedding class.

In [1]:
from langchain.embeddings import CohereEmbeddings

In [2]:
# `cohere_api_key` must already hold your Cohere API key (it is not defined in this notebook)
embeddings = CohereEmbeddings(cohere_api_key=cohere_api_key)

In [3]:
text = "This is a test document."

In [4]:
query_result = embeddings.embed_query(text)

In [5]:
doc_result = embeddings.embed_documents([text])

## Hugging Face Hub
Let's load the Hugging Face Embedding class.

In [7]:
from langchain.embeddings import HuggingFaceEmbeddings

In [16]:
embeddings = HuggingFaceEmbeddings()

In [12]:
text = "This is a test document."

In [13]:
query_result = embeddings.embed_query(text)

In [14]:
doc_result = embeddings.embed_documents([text])

## TensorflowHub
Let's load the TensorflowHub Embedding class.

In [1]:
from langchain.embeddings import TensorflowHubEmbeddings

In [5]:
embeddings = TensorflowHubEmbeddings()

2023-01-30 23:53:01.652176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [6]:
text = "This is a test document."

In [7]:
query_result = embeddings.embed_query(text)

## InstructEmbeddings
Let's load the Hugging Face Instruct Embeddings class.

In [8]:
from langchain.embeddings import HuggingFaceInstructEmbeddings

In [9]:
embeddings = HuggingFaceInstructEmbeddings(
    query_instruction="Represent the query for retrieval: "
)

load INSTRUCTOR_Transformer
max_seq_length  512


In [10]:
text = "This is a test document."

In [11]:
query_result = embeddings.embed_query(text)

## Self Hosted Embeddings
Let's load the SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings classes.

In [None]:
from langchain.embeddings import (
    SelfHostedEmbeddings,
    SelfHostedHuggingFaceEmbeddings,
    SelfHostedHuggingFaceInstructEmbeddings,
)
import runhouse as rh

In [None]:
# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='my-cluster')

In [None]:
embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)

In [6]:
text = "This is a test document."

In [None]:
query_result = embeddings.embed_query(text)

And similarly for SelfHostedHuggingFaceInstructEmbeddings:

In [None]:
embeddings = SelfHostedHuggingFaceInstructEmbeddings(hardware=gpu)

Now let's load an embedding model with a custom load function:

In [12]:
def get_pipeline():
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        pipeline,
    )  # Must be inside the function in notebooks

    model_id = "facebook/bart-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return pipeline("feature-extraction", model=model, tokenizer=tokenizer)


def inference_fn(pipeline, prompt):
    # Return last hidden state of the model
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]
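
To see what `inference_fn` selects, here is a sketch that runs the same function against a fake pipeline. It assumes the feature-extraction output for one string is nested as `[batch][token][hidden]` with a batch of one, and that a list input yields one such structure per string; `fake_pipeline` and its numbers are made up for illustration.

```python
def fake_pipeline(prompt):
    """Hypothetical stand-in for a feature-extraction pipeline.

    Returns nested lists shaped [1][tokens][hidden] for a string,
    and a list of such structures for a list of strings.
    """

    def features(text):
        # One 2-dimensional "hidden state" per whitespace-separated token.
        return [[[float(i), float(len(tok))] for i, tok in enumerate(text.split())]]

    if isinstance(prompt, list):
        return [features(p) for p in prompt]
    return features(prompt)


def inference_fn(pipeline, prompt):
    # Same selection logic as above: take the last token's hidden state.
    if isinstance(prompt, list):
        return [emb[0][-1] for emb in pipeline(prompt)]
    return pipeline(prompt)[0][-1]


single = inference_fn(fake_pipeline, "a test document")
batch = inference_fn(fake_pipeline, ["a test document", "hello"])
```

Under these assumptions, a string input yields one vector and a list input yields one vector per string, which matches what `embed_query` and `embed_documents` need.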

In [None]:
embeddings = SelfHostedEmbeddings(
    model_load_fn=get_pipeline,
    hardware=gpu,
    model_reqs=["./", "torch", "transformers"],
    inference_fn=inference_fn,
)

In [None]:
query_result = embeddings.embed_query(text)

## Fake Embeddings

LangChain also provides a fake embedding class. You can use this to test your pipelines.

In [1]:
from langchain.embeddings import FakeEmbeddings

In [3]:
embeddings = FakeEmbeddings(size=1352)

In [5]:
query_result = embeddings.embed_query("foo")

In [6]:
doc_results = embeddings.embed_documents(["foo"])
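
As a sketch of how a fake embedder helps in tests, the following uses a hypothetical `RandomEmbeddings` stand-in (not the LangChain class) that returns random vectors of a fixed size, so pipeline code can be exercised for shape and plumbing without calling a provider. The `build_index` helper is likewise an assumption made for illustration.

```python
import random
from typing import Dict, List


class RandomEmbeddings:
    """Hypothetical stand-in: random vectors of a fixed size."""

    def __init__(self, size: int, seed: int = 0) -> None:
        self.size = size
        self._rng = random.Random(seed)  # seeded so test runs are repeatable

    def embed_query(self, text: str) -> List[float]:
        return [self._rng.gauss(0.0, 1.0) for _ in range(self.size)]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(t) for t in texts]


def build_index(embedder, texts: List[str]) -> Dict[str, List[float]]:
    """Pipeline step under test: map each text to its vector."""
    return dict(zip(texts, embedder.embed_documents(texts)))


index = build_index(RandomEmbeddings(size=1352), ["foo", "bar"])
```

Because the vectors are random, tests written this way should check structure (keys, lengths, types) rather than similarity scores.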

## SageMaker Endpoint Embeddings

Let's load the SageMaker Endpoint Embeddings class. It can be used if you host, for example, your own Hugging Face model on SageMaker. Learn more [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker).

In [None]:
!pip3 install langchain boto3

## _### TEMPORARY: Showing how to deploy a SageMaker Endpoint from a Hugging Face model ###_ 

In [8]:
!pip install sagemaker --quiet

import os 
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
import boto3
from sagemaker import Session
# get sagemaker execution role to deploy
iam = boto3.client('iam')
role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']
sess = Session()
# create code/ dir
os.makedirs("model/code", exist_ok=True)

In [7]:
%%writefile model/code/inference.py

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Helper: Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def model_fn(model_dir):
    # Load model from Hugging Face Hub
    tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
    model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # unpack model and tokenizer
    model, tokenizer = model_and_tokenizer

    # Tokenize sentences
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embeddings
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

    # return a dictionary, which will be JSON-serializable (only the first input's embedding)
    return {"embeddings": sentence_embeddings[0].tolist()}


Writing model/code/inference.py
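
The masked mean pooling in `inference.py` can be illustrated without PyTorch. The sketch below applies the same idea to plain lists for a single sentence: padded positions (mask 0) are excluded from both the sum and the count. The function name `masked_mean` and the sample numbers are made up for illustration.

```python
from typing import List


def masked_mean(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, ignoring positions where the mask is 0."""
    hidden = len(token_embeddings[0])
    sums = [0.0] * hidden
    for vector, keep in zip(token_embeddings, attention_mask):
        for i in range(hidden):
            sums[i] += vector[i] * keep
    # Clamp the divisor to avoid dividing by zero on an all-padding input.
    count = max(sum(attention_mask), 1e-9)
    return [s / count for s in sums]


# Three token positions; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean(tokens, mask)
```

A plain average over all three positions would be dominated by the padding vector, which is why the mask matters.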


In [9]:
from sagemaker.s3 import S3Uploader
from sagemaker.huggingface.model import HuggingFaceModel

# create model.tar.gz and upload to s3 
parent_dir=os.getcwd()
# change to model dir
os.chdir("model")
# create a gzip-compressed archive of the model directory contents
!tar zcvf model.tar.gz *
# change back to parent dir
os.chdir(parent_dir)


# upload model.tar.gz to s3
s3_model_uri = S3Uploader.upload(local_path="model/model.tar.gz", desired_s3_uri=f"s3://{sess.default_bucket()}/embeddings")

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=s3_model_uri,      # path to your model and script
   role=role,                    # iam role with permissions to create an Endpoint
   transformers_version="4.26",  # transformers version used
   pytorch_version="1.13",       # pytorch version used
   py_version='py39',            # python version used
)

# deploy the endpoint
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.2xlarge")

code/
code/inference.py
----!

In [19]:
predictor.endpoint_name

'huggingface-pytorch-inference-2023-03-21-16-14-03-834'

In [11]:
predictor.predict({"inputs": "This is a test document."})

{'embeddings': [-0.03833858296275139,
  0.12346473336219788,
  -0.028642961755394936,
  0.05365271493792534,
  0.008845399133861065,
  -0.039839327335357666,
  -0.07300589978694916,
  0.04777129739522934,
  -0.03046245686709881,
  0.054979756474494934,
  0.08505291491746902,
  0.03665667772293091,
  -0.0053200023248791695,
  -0.002233208389952779,
  -0.06071101501584053,
  -0.027237888425588608,
  -0.011351668275892735,
  -0.04243773967027664,
  0.009129947051405907,
  0.10081552714109421,
  0.075787253677845,
  0.06911724805831909,
  0.009857476688921452,
  -0.0018377384403720498,
  0.02624901942908764,
  0.03290242329239845,
  -0.07177436351776123,
  0.028384245932102203,
  0.06170952320098877,
  -0.05252952501177788,
  0.033661700785160065,
  0.07446815073490143,
  0.07536035776138306,
  0.03538404032588005,
  0.0671340748667717,
  0.01079804077744484,
  0.08167019486427307,
  0.01656288281083107,
  0.03283063322305679,
  0.03632563352584839,
  0.002172857290133834,
  -0.09895739704

## _### END TEMPORARY: Showing how to deploy a SageMaker Endpoint from a Hugging Face model ###_ 

In [3]:
from typing import Dict
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
import json


class ContentHandler(ContentHandlerBase):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:
        input_str = json.dumps({"inputs": prompt, **model_kwargs})
        return input_str.encode('utf-8')
    
    def transform_output(self, output: bytes) -> str:
        # `output` is consumed as a stream here, so read() it before decoding
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["embeddings"]

content_handler = ContentHandler()


embeddings = SagemakerEndpointEmbeddings(
    # endpoint_name="endpoint-name", 
    # credentials_profile_name="credentials-profile-name", 
    endpoint_name="huggingface-pytorch-inference-2023-03-21-16-14-03-834", 
    region_name="us-east-1", 
    content_handler=content_handler
)
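
The content handler's two methods amount to a JSON round trip: the request is serialized to bytes, and the response body is read, decoded, and unpacked. The sketch below reproduces that logic as plain functions and uses `io.BytesIO` as a stand-in for the response body, which is assumed here to be a file-like object with a `read()` method; the sample response is made up.

```python
import io
import json
from typing import Dict, List


def transform_input(prompt: str, model_kwargs: Dict) -> bytes:
    # Serialize the request in the shape the endpoint's predict_fn expects.
    return json.dumps({"inputs": prompt, **model_kwargs}).encode("utf-8")


def transform_output(output) -> List[float]:
    # `output` is assumed to be file-like, so read() it before decoding.
    response_json = json.loads(output.read().decode("utf-8"))
    return response_json["embeddings"]


request_body = transform_input("foo", {})
# Fake response body standing in for what the endpoint would return.
fake_response = io.BytesIO(json.dumps({"embeddings": [0.1, -0.2]}).encode("utf-8"))
vector = transform_output(fake_response)
```

Testing the handler this way checks the request and response shapes without a deployed endpoint.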

In [5]:
query_result = embeddings.embed_query("foo")

[0.01623339205980301,
 -0.007662336342036724,
 0.018606489524245262,
 0.031968992203474045,
 -0.031003747135400772,
 0.008777972310781479,
 0.1594553291797638,
 -0.009521624073386192,
 0.020200366154313087,
 -0.04545809328556061,
 0.013985812664031982,
 -0.017674963921308517,
 -0.03616964817047119,
 -0.02194339968264103,
 0.021387653425335884,
 0.06459270417690277,
 -0.03659535571932793,
 -0.01213359646499157,
 -0.043666232377290726,
 -0.03515005484223366,
 -0.032629866153001785,
 0.07834123075008392,
 -0.021041689440608025,
 0.03372766822576523,
 -0.024157941341400146,
 -0.010767146944999695,
 -0.042864806950092316,
 0.013539575971662998,
 0.05039731785655022,
 -0.091956727206707,
 0.035494621843099594,
 0.18029741942882538,
 0.01576363667845726,
 -0.04949156939983368,
 -0.003976485226303339,
 0.00032106428989209235,
 0.021849628537893295,
 0.035368386656045914,
 0.04185418039560318,
 0.04899369180202484,
 -0.026651302352547646,
 -0.05650882050395012,
 -0.03276852145791054,
 -0.020723

In [6]:
doc_results = embeddings.embed_documents(["foo"])

In [7]:
doc_results

[[0.01623339205980301,
  -0.007662336342036724,
  0.018606489524245262,
  0.031968992203474045,
  -0.031003747135400772,
  0.008777972310781479,
  0.1594553291797638,
  -0.009521624073386192,
  0.020200366154313087,
  -0.04545809328556061,
  0.013985812664031982,
  -0.017674963921308517,
  -0.03616964817047119,
  -0.02194339968264103,
  0.021387653425335884,
  0.06459270417690277,
  -0.03659535571932793,
  -0.01213359646499157,
  -0.043666232377290726,
  -0.03515005484223366,
  -0.032629866153001785,
  0.07834123075008392,
  -0.021041689440608025,
  0.03372766822576523,
  -0.024157941341400146,
  -0.010767146944999695,
  -0.042864806950092316,
  0.013539575971662998,
  0.05039731785655022,
  -0.091956727206707,
  0.035494621843099594,
  0.18029741942882538,
  0.01576363667845726,
  -0.04949156939983368,
  -0.003976485226303339,
  0.00032106428989209235,
  0.021849628537893295,
  0.035368386656045914,
  0.04185418039560318,
  0.04899369180202484,
  -0.026651302352547646,
  -0.0565088205