mirror of https://github.com/hwchase17/langchain.git, synced 2025-06-26 00:23:25 +00:00
DOCS updated data_connection index page (#13426)

- the `Index` section was missed. Created it.
- text simplification

Co-authored-by: Erick Friis <erick@langchain.dev>
parent 7f8fd70ac4
commit 21552628c8
@@ -18,23 +18,23 @@ This encompasses several key modules.
 
 **[Document loaders](/docs/modules/data_connection/document_loaders/)**
 
-Load documents from many different sources.
+**Document loaders** load documents from many different sources.
 LangChain provides over 100 different document loaders as well as integrations with other major providers in the space,
 like AirByte and Unstructured.
-We provide integrations to load all types of documents (HTML, PDF, code) from all types of locations (private s3 buckets, public websites).
+LangChain provides integrations to load all types of documents (HTML, PDF, code) from all types of locations (private S3 buckets, public websites).
 
 **[Document transformers](/docs/modules/data_connection/document_transformers/)**
 
 A key part of retrieval is fetching only the relevant parts of documents.
-This involves several transformation steps in order to best prepare the documents for retrieval.
+This involves several transformation steps to prepare the documents for retrieval.
 One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
-LangChain provides several different algorithms for doing this, as well as logic optimized for specific document types (code, markdown, etc).
+LangChain provides several transformation algorithms for doing this, as well as logic optimized for specific document types (code, markdown, etc).
 
 **[Text embedding models](/docs/modules/data_connection/text_embedding/)**
 
-Another key part of retrieval has become creating embeddings for documents.
+Another key part of retrieval is creating embeddings for documents.
 Embeddings capture the semantic meaning of the text, allowing you to quickly and
-efficiently find other pieces of text that are similar.
+efficiently find other pieces of a text that are similar.
 LangChain provides integrations with over 25 different embedding providers and methods,
 from open-source to proprietary API,
 allowing you to choose the one best suited for your needs.
@@ -51,7 +51,7 @@ LangChain exposes a standard interface, allowing you to easily swap between vect
 
 Once the data is in the database, you still need to retrieve it.
 LangChain supports many different retrieval algorithms and is one of the places where we add the most value.
-We support basic methods that are easy to get started - namely simple semantic search.
+LangChain supports basic methods that are easy to get started - namely simple semantic search.
 However, we have also added a collection of algorithms on top of this to increase performance.
 These include:
 
@@ -60,3 +60,13 @@ These include:
 - [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.
 - And more!
 
+**[Indexing](/docs/modules/data_connection/indexing)**
+
+The LangChain **Indexing API** syncs your data from any source into a vector store,
+helping you:
+
+- Avoid writing duplicated content into the vector store
+- Avoid re-writing unchanged content
+- Avoid re-computing embeddings over unchanged content
+
+All of which should save you time and money, as well as improve your vector search results.
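
The three savings listed in the added Indexing text (no duplicate writes, no re-writes of unchanged content, no re-computed embeddings) can be illustrated with a content-hash check. This is a minimal sketch of the general idea only, not LangChain's Indexing API or its implementation: `content_key`, `sync`, `toy_embed`, and the dict-backed store are hypothetical names introduced for illustration.

```python
import hashlib


def content_key(text: str) -> str:
    """Derive a stable key from the document text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync(docs, store, embed):
    """Write docs into `store`, skipping any text whose hash is already present.

    `store` is a plain dict standing in for a vector store (an assumption),
    mapping content hash -> (text, vector). `embed` is any callable that
    turns text into a vector; it is only called for new content.
    """
    stats = {"added": 0, "skipped": 0}
    for text in docs:
        key = content_key(text)
        if key in store:
            # Duplicate or unchanged content: no write, no embedding call.
            stats["skipped"] += 1
            continue
        store[key] = (text, embed(text))
        stats["added"] += 1
    return stats


# Toy embedding function that records how often it is called.
calls = []


def toy_embed(text):
    calls.append(text)
    return [float(len(text))]


store = {}
first = sync(["alpha", "beta", "alpha"], store, toy_embed)
second = sync(["alpha", "beta", "gamma"], store, toy_embed)
```

On the second run only the new text triggers an embedding call, which is where the time and cost savings described above would come from. A real setup would also need to handle deletions and edited documents, which this sketch leaves out.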