docs[patch]: Expand embeddings docs (#22881)

Jacob Lee 2024-06-13 23:06:07 -07:00 committed by GitHub
parent 73c76b9628
commit 8e89178047
2 changed files with 10 additions and 2 deletions

@@ -425,9 +425,14 @@ For specifics on how to use text splitters, see the [relevant how-to guides here
 ### Embedding models
 <span data-heading-keywords="embedding,embeddings"></span>
-The Embeddings class is a class designed for interfacing with text embedding models. There are lots of embedding model providers (OpenAI, Cohere, Hugging Face, etc) - this class is designed to provide a standard interface for all of them.
+Embedding models create a vector representation of a piece of text. You can think of a vector as an array of numbers that captures the semantic meaning of the text.
+By representing the text in this way, you can perform mathematical operations that allow you to do things like search for other pieces of text that are most similar in meaning.
+These natural language search capabilities underpin many types of [context retrieval](/docs/concepts/#retrieval),
+where we provide an LLM with the relevant data it needs to effectively respond to a query.
-Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
+![](/img/embeddings.png)
+The `Embeddings` class is a class designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc) and local models, and this class is designed to provide a standard interface for all of them.
 The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
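To make the two methods concrete, here is a minimal sketch. It assumes the `langchain-openai` package and an `OPENAI_API_KEY` in the environment; any `Embeddings` implementation exposes the same two methods. The `cosine_similarity` helper is illustrative, not part of the LangChain API:

```python
import math

from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()

# embed_documents: takes multiple texts (the documents to be searched over)
# and returns one vector per text.
doc_vectors = embeddings_model.embed_documents(
    ["Hi there!", "Oh, hello!", "What's your name?"]
)

# embed_query: takes a single text (the search query itself) and returns
# a single vector.
query_vector = embeddings_model.embed_query("What is your name?")


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Illustrative helper: higher values mean closer semantic meaning.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Index of the document whose vector is closest in meaning to the query.
best = max(
    range(len(doc_vectors)),
    key=lambda i: cosine_similarity(query_vector, doc_vectors[i]),
)
```

Keeping the two methods separate lets providers that embed documents and queries differently do so behind the same interface.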
@@ -440,6 +445,9 @@ One of the most common ways to store and search over unstructured data is to emb
 and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
 A vector store takes care of storing embedded data and performing vector search for you.
+Most vector stores can also store metadata about embedded vectors and support filtering on that metadata before
+similarity search, allowing you more control over returned documents.
 Vector stores can be converted to the retriever interface by doing:
 ```python
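# The hunk above ends at an opening code fence, so the snippet itself is not
# shown. What follows is a minimal sketch of storing texts, searching, and
# converting the vector store to a retriever; `MyVectorStore` is a
# placeholder for any concrete implementation and `embeddings_model` is any
# Embeddings instance, as in the sketch above.
vectorstore = MyVectorStore.from_texts(
    ["Hi there!", "Oh, hello!", "What's your name?"],
    embedding=embeddings_model,
)

# Vector search: embeds the query and returns the stored documents whose
# vectors are most similar to it.
docs = vectorstore.similarity_search("What is your name?", k=1)

# Converting the vector store to the retriever interface:
retriever = vectorstore.as_retriever()
docs = retriever.invoke("What is your name?")
```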

BIN docs/static/img/embeddings.png (new vendored file, 105 KiB; binary content not shown)