---
sidebar_position: 1
---

# Retrieval

Many LLM applications need user-specific data that is not part of the model's training set. Retrieval Augmented Generation (RAG) addresses this by retrieving external data and passing it to the LLM at generation time. LangChain provides building blocks for RAG applications, from simple to complex. This section of the documentation covers the retrieval step, in which the data is fetched. Although this sounds simple, it can be subtle, and it involves several key modules.

![data_connection_diagram](/img/data_connection.jpg)

**[Document loaders](/docs/modules/data_connection/document_loaders/)**

Document loaders load documents from many different sources. LangChain provides over 100 document loaders, along with integrations with leading providers such as Airbyte and Unstructured. These loaders handle different document types (HTML, PDF, code) from different locations (private S3 buckets, public websites).
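As a rough illustration of what a loader produces, the sketch below reads plain-text files from a directory and wraps each one in a small record that holds the text and its source. It uses only the Python standard library. `ToyDocument` and `load_text_files` are hypothetical names for this example, not LangChain classes; the `page_content` and `metadata` field names mirror LangChain's `Document` object.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_files(directory: str, pattern: str = "*.txt") -> list[ToyDocument]:
    """Read every file in `directory` matching `pattern` into a ToyDocument."""
    documents = []
    # Sort the paths so the output order is deterministic.
    for path in sorted(Path(directory).glob(pattern)):
        documents.append(
            ToyDocument(
                page_content=path.read_text(encoding="utf-8"),
                # Record where the text came from so later steps can cite it.
                metadata={"source": str(path)},
            )
        )
    return documents
```

Keeping the source in `metadata` is what lets later stages, such as metadata filtering, refer back to where a piece of text originated.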

**[Document transformers](/docs/modules/data_connection/document_transformers/)**

Document transformers prepare documents so that only the relevant parts are retrieved. Preparation can involve several transformation steps; a central one is splitting (or chunking) a large document into smaller pieces. LangChain offers several algorithms for this, including ones tailored to specific document types such as code and Markdown.
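To make the chunking step concrete, here is a minimal sketch of fixed-size splitting with overlap, written in plain Python. It is not one of LangChain's splitters, which are more sophisticated (for example, aware of separators and document structure). The function name `chunk_with_overlap` and its parameters are assumptions made for this illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Split `text` into character chunks of at most `chunk_size`,
    where each chunk repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.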

**[Text embedding models](/docs/modules/data_connection/text_embedding/)**

Text embedding models create embeddings, vector representations that capture the semantic meaning of text. Comparing embeddings makes it possible to quickly find other pieces of text that are similar. LangChain integrates with more than 25 embedding providers and methods, so you can choose the one that best fits your needs, and it exposes a standard interface that makes it easy to swap between models.
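The sketch below shows how comparing vectors can surface similar text. Real embedding models are learned and produce dense vectors; here a simple word count over a tiny fixed vocabulary stands in for the model, which is purely an assumption for illustration. `toy_embed` and `cosine_similarity` are hypothetical helpers, not LangChain functions.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Stand-in for an embedding model: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocabulary = ["cat", "dog", "pet", "stock", "market"]
query = toy_embed("my pet cat", vocabulary)
related = toy_embed("a cat is a pet", vocabulary)
unrelated = toy_embed("stock market news", vocabulary)

# The related sentence scores higher than the unrelated one.
print(cosine_similarity(query, related) > cosine_similarity(query, unrelated))
```

With a learned model, sentences can score as similar even when they share no words; the word-count stand-in cannot do that, but the comparison step is the same.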

**[Vector stores](/docs/modules/data_connection/vectorstores/)**

Vector stores are databases designed to store and search embeddings efficiently. LangChain integrates with over 50 vector stores, from open-source local options to cloud-hosted proprietary ones, and exposes a standard interface that makes it easy to switch between them.
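As a minimal sketch of what a vector store does, the class below keeps texts alongside their vectors in memory and answers a query by ranking the stored vectors by cosine similarity. `ToyVectorStore` and `keyword_embed` are hypothetical names for this example, and the keyword-count embedding is an assumed stand-in for a real model. Production vector stores typically use approximate nearest-neighbor indexes instead of the exhaustive scan shown here.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: embeds texts on insert, scans all rows on search."""

    def __init__(self, embed: Callable[[str], Vector]):
        self._embed = embed
        self._rows: list[tuple[str, Vector]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._rows.append((text, self._embed(text)))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        query_vector = self._embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_vector, row[1]),
            reverse=True,  # most similar first
        )
        return [text for text, _ in ranked[:k]]


VOCAB = ["python", "loader", "vector", "store", "embedding"]


def keyword_embed(text: str) -> Vector:
    """Assumed stand-in embedding: counts of a few fixed keywords."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]
```

Because the store only depends on an `embed` callable, swapping in a different embedding function requires no change to the store itself, which is the kind of decoupling a standard interface provides.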

**[Retrievers](/docs/modules/data_connection/retrievers/)**

Once the data is in the database, it still has to be retrieved. LangChain supports many retrieval algorithms, from basic methods such as simple semantic search to more advanced algorithms designed to improve performance. These include:

- [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This creates multiple embeddings per parent document, so you can look up smaller chunks but return the larger context.
- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference that is not purely semantic but expresses logic best represented as a metadata filter. Self-query lets you separate the *semantic* part of a query from the *metadata filters* it contains.
- [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from several different sources, or with several different algorithms. The ensemble retriever lets you do this easily.
- And more!
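
The parent-document idea from the list above can be sketched in a few lines: index small chunks, but hand back the full document that the best-matching chunk came from. This is a simplified illustration, not LangChain's `ParentDocumentRetriever`. The class and helper names are hypothetical, sentence splitting on periods is a deliberate simplification, and word overlap is an assumed stand-in for embedding similarity.

```python
def split_into_sentences(text: str) -> list[str]:
    """Naive splitter: treat each period-terminated span as one chunk."""
    return [part.strip() for part in text.split(".") if part.strip()]


def word_overlap(query: str, chunk: str) -> int:
    """Stand-in for embedding similarity: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


class ToyParentDocumentRetriever:
    """Hypothetical retriever: match on small chunks, return the whole parent."""

    def __init__(self) -> None:
        self._parents: dict[str, str] = {}
        self._chunks: list[tuple[str, str]] = []  # (chunk text, parent id)

    def add_document(self, parent_id: str, text: str) -> None:
        self._parents[parent_id] = text
        for chunk in split_into_sentences(text):
            self._chunks.append((chunk, parent_id))

    def retrieve(self, query: str) -> str:
        if not self._chunks:
            raise ValueError("no documents have been added")
        # Find the single best-matching chunk, then look up its parent.
        _, parent_id = max(self._chunks, key=lambda item: word_overlap(query, item[0]))
        return self._parents[parent_id]


retriever = ToyParentDocumentRetriever()
retriever.add_document("guide", "Install the package with pip. Then configure your API key.")
retriever.add_document("faq", "Billing happens monthly. Refunds take five days.")

# Matches the chunk "Refunds take five days" but returns the whole FAQ document.
print(retriever.retrieve("how long do refunds take"))
```

Small chunks tend to match a query more precisely, while returning the parent gives the LLM the surrounding context that the chunk alone would lack.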