# Text embedding models

:::info
Head to [Integrations](/docs/integrations/text_embedding/) for documentation on built-in integrations with text embedding model providers.
:::

The `Embeddings` class is designed for interfacing with text embedding models. There are many embedding model providers (OpenAI, Cohere, Hugging Face, etc.), and this class provides a standard interface for all of them.

Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search, where we look for pieces of text that are most similar in the vector space.

The base `Embeddings` class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former, `.embed_documents`, takes multiple texts as input, while the latter, `.embed_query`, takes a single text. They are kept as two separate methods because some embedding providers use different embedding methods for documents (to be searched over) and queries (the search query itself).
`.embed_query` returns a list of floats, whereas `.embed_documents` returns a list of lists of floats.

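To make the shape of this interface concrete, here is a minimal sketch of a class exposing the same two methods. `ToyEmbeddings` is a hypothetical stand-in written for illustration, not a LangChain class, and its hash-based vectors carry no semantic meaning. The `"passage: "` and `"query: "` prefixes are likewise an assumption, used only to show how a provider might treat documents and queries differently.

```python
import hashlib
from typing import List


class ToyEmbeddings:
    """Hypothetical stand-in that mimics the two-method interface; not a LangChain class."""

    def __init__(self, size: int = 8) -> None:
        # Number of floats per vector (at most 32, the length of a SHA-256 digest).
        self.size = size

    def _embed(self, text: str) -> List[float]:
        # Derive deterministic pseudo-features from a hash; they have no semantic meaning.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.size]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, so the result is a list of lists of floats.
        return [self._embed("passage: " + text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        # A single text in, a single vector (list of floats) out.
        return self._embed("query: " + text)


toy_model = ToyEmbeddings()
doc_vectors = toy_model.embed_documents(["Hi there!", "Oh, hello!"])
query_vector = toy_model.embed_query("Hi there!")
print(len(doc_vectors), len(doc_vectors[0]), len(query_vector))  # 2 8 8
```

Because the two methods apply different prefixes here, the same text produces different vectors depending on whether it is embedded as a document or as a query.
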
## Get started

### Setup

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs>
<TabItem value="openai" label="OpenAI" default>
To start we'll need to install the OpenAI partner package:

```bash
pip install langchain-openai
```

Accessing the API requires an API key, which you can get by creating an account and heading [here](https://platform.openai.com/account/api-keys). Once we have a key, we'll want to set it as an environment variable by running:

```bash
export OPENAI_API_KEY="..."
```

If you'd prefer not to set an environment variable, you can pass the key in directly via the `api_key` named parameter when initializing the `OpenAIEmbeddings` class:

```python
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(api_key="...")
```

Otherwise, you can initialize it without any parameters:
```python
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
```

</TabItem>
<TabItem value="cohere" label="Cohere">

To start we'll need to install the Cohere partner package:

```bash
pip install langchain-cohere
```

Accessing the API requires an API key, which you can get by creating an account and heading [here](https://dashboard.cohere.com/api-keys). Once we have a key, we'll want to set it as an environment variable by running:

```shell
export COHERE_API_KEY="..."
```

If you'd prefer not to set an environment variable, you can pass the key in directly via the `cohere_api_key` named parameter when initializing the `CohereEmbeddings` class:

```python
from langchain_cohere import CohereEmbeddings

embeddings_model = CohereEmbeddings(cohere_api_key="...", model='embed-english-v3.0')
```

Otherwise, you can initialize it as shown below:
```python
from langchain_cohere import CohereEmbeddings

embeddings_model = CohereEmbeddings(model='embed-english-v3.0')
```
Note that the `model` parameter is mandatory when initializing the `CohereEmbeddings` class.

</TabItem>
<TabItem value="huggingface" label="Hugging Face">

To start we'll need to install the Hugging Face partner package:

```bash
pip install langchain-huggingface
```

You can then load any [Sentence Transformers model](https://huggingface.co/models?library=sentence-transformers) from the Hugging Face Hub.

```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
```
</TabItem>
</Tabs>

### `embed_documents`
#### Embed list of texts

Use `.embed_documents` to embed a list of strings, returning a list of embeddings:

```python
embeddings = embeddings_model.embed_documents(
    [
        "Hi there!",
        "Oh, hello!",
        "What's your name?",
        "My friends call me World",
        "Hello World!"
    ]
)
len(embeddings), len(embeddings[0])
```

<CodeOutputBlock language="python">

```
(5, 1536)
```

</CodeOutputBlock>

### `embed_query`
#### Embed single query
Use `.embed_query` to embed a single piece of text (e.g., for the purpose of comparing it to other embedded pieces of text).

```python
embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?")
embedded_query[:5]
```

<CodeOutputBlock language="python">

```
[0.0053587136790156364,
 -0.0004999046213924885,
 0.038883671164512634,
 -0.003001077566295862,
 -0.00900818221271038]
```

</CodeOutputBlock>

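One common way to compare an embedded query with embedded documents is cosine similarity. The sketch below is not part of LangChain: `cosine_similarity` and `rank_by_similarity` are hypothetical helpers written for illustration, and the three-dimensional vectors are made up so the example runs without an API key.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float], doc_vectors: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vector, vector) for vector in doc_vectors]
    return sorted(range(len(doc_vectors)), key=lambda i: scores[i], reverse=True)


# Made-up 3-dimensional vectors standing in for real embeddings.
toy_doc_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query_vector = [0.9, 0.1, 0.0]
print(rank_by_similarity(toy_query_vector, toy_doc_vectors))  # [0, 2, 1]
```

With a real model, the same helper could be called as `rank_by_similarity(embedded_query, embeddings)`, using the values returned by `.embed_query` and `.embed_documents` above.
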