diff --git a/docs/docs/concepts/embedding_models.mdx b/docs/docs/concepts/embedding_models.mdx index e1c57a2c253..d502eeb25d1 100644 --- a/docs/docs/concepts/embedding_models.mdx +++ b/docs/docs/concepts/embedding_models.mdx @@ -1,15 +1,163 @@ # Embedding models -Embedding models create a vector representation of a piece of text. You can think of a vector as an array of numbers that captures the semantic meaning of the text. -By representing the text in this way, you can perform mathematical operations that allow you to do things like search for other pieces of text that are most similar in meaning. -These natural language search capabilities underpin many types of [context retrieval](/docs/concepts/#retrieval), -where we provide an LLM with the relevant data it needs to effectively respond to a query. +:::info[Prerequisites] -![](/img/embeddings.png) +* [Documents](/docs/concepts/retrievers/#interface) -The `Embeddings` class is a class designed for interfacing with text embedding models. There are many different embedding model providers (OpenAI, Cohere, Hugging Face, etc) and local models, and this class is designed to provide a standard interface for all of them. +::: +:::info[Note] -The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself). +This conceptual overview focuses on text-based embedding models. +Embedding models can also be [multi-modal](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings), allowing for the embedding of a wide variety of data types. + +::: -For specifics on how to use embedding models, see the [relevant how-to guides here](/docs/how_to/#embedding-models). +## Overview + +Imagine being able to capture the essence of any text - a tweet, document, or book - in a single, compact representation. +This is the power of embedding models, which lie at the heart of many retrieval systems. +Embedding models transform human language into a format that machines can understand and compare with speed and accuracy. +These models take text as input and produce a fixed-length array of numbers, a numerical fingerprint of the text's semantic meaning. +Embeddings allow search systems to find relevant documents not just based on keyword matches, but on semantic understanding. + +## Key concepts + +![Conceptual Overview](/img/embeddings_concept.png) + +(1) **Embed text as a vector**: Embeddings transform text into a numerical vector representation. + +(2) **Measure similarity**: Embedding vectors can be compared using simple mathematical operations. + +## Embedding data + +### Historical context + +The landscape of embedding models has evolved significantly over the years. +A pivotal moment came in 2018 when Google introduced [BERT (Bidirectional Encoder Representations from Transformers)](https://www.nvidia.com/en-us/glossary/bert/). +BERT applied transformer models to embed text as a simple vector representation, which led to unprecedented performance across various NLP tasks. +However, BERT wasn't optimized for generating sentence embeddings efficiently.
+This limitation spurred the creation of [SBERT (Sentence-BERT)](https://www.sbert.net/examples/training/sts/README.html), which adapted the BERT architecture to generate semantically rich sentence embeddings, easily comparable via similarity metrics like cosine similarity, dramatically reducing the computational overhead for tasks like finding similar sentences. +Today, the embedding model ecosystem is diverse, with numerous providers offering their own implementations. +To navigate this variety, researchers and practitioners often turn to benchmarks like the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) for objective comparisons. + +:::info[Further reading] + +* See the [seminal BERT paper](https://arxiv.org/abs/1810.04805). +* See Cameron Wolfe's [excellent review](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2) of embedding models. +* See the [Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/blog/mteb) leaderboard for a comprehensive overview of embedding models. + +::: + +### LangChain Interface + +Today, there are [many different embedding models](/docs/integrations/text_embedding/). +LangChain provides a universal interface for working with them, providing standard methods for common operations. +This common interface simplifies interaction with various embedding providers through two central methods: + +- `embed_documents`: For embedding multiple texts (documents) +- `embed_query`: For embedding a single text (query) + +This distinction is important, as some providers employ different embedding strategies for documents (which are to be searched) versus queries (the search input itself). +To illustrate, here's a practical example using LangChain's `.embed_documents` method to embed a list of strings: + +```python +from langchain_openai import OpenAIEmbeddings +embeddings_model = OpenAIEmbeddings() +embeddings = embeddings_model.embed_documents( + [ + "Hi there!", + "Oh, hello!", + "What's your name?", + "My friends call me World", + "Hello World!" + ] +) +len(embeddings), len(embeddings[0]) +(5, 1536) +``` + +For convenience, you can also use the `embed_query` method to embed a single text: + +```python +query_embedding = embeddings_model.embed_query("What is the meaning of life?") +``` + +:::info[Further reading] + +* See the full list of [LangChain embedding model integrations](/docs/integrations/text_embedding/). +* See these [how-to guides](/docs/how_to/embed_text) for working with embedding models. + +::: + +## Measure similarity + +Each embedding is essentially a set of coordinates in a vast, abstract space. +In this space, the position of each point (embedding) reflects the meaning of its corresponding text. +Just as similar words might be close to each other in a thesaurus, similar concepts end up close to each other in this embedding space. +This allows for intuitive comparisons between different pieces of text. +By reducing text to these numerical representations, we can use simple mathematical operations to quickly measure how alike two pieces of text are, regardless of their original length or structure. +Some common similarity metrics include: + +- **Cosine Similarity**: Measures the cosine of the angle between two vectors. +- **Euclidean Distance**: Measures the straight-line distance between two points. +- **Dot Product**: Measures the projection of one vector onto another.
+ +As an example, any two embedded texts can be compared with `cosine_similarity`: + +```python +import numpy as np + +def cosine_similarity(vec1, vec2): + dot_product = np.dot(vec1, vec2) + norm_vec1 = np.linalg.norm(vec1) + norm_vec2 = np.linalg.norm(vec2) + return dot_product / (norm_vec1 * norm_vec2) + +# Compare the query embedding with the first document embedding from above +similarity = cosine_similarity(query_embedding, embeddings[0]) +print("Cosine Similarity:", similarity) +``` + +:::info[Further reading] + +* See Simon Willison’s [nice blog post and video](https://simonwillison.net/2023/Oct/23/embeddings/) on embeddings and similarity metrics. +* See [this documentation](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity) from Google on similarity metrics to consider with embeddings. +* See Pinecone's [blog post](https://www.pinecone.io/learn/vector-similarity/) on similarity metrics. +* See OpenAI's [FAQ](https://platform.openai.com/docs/guides/embeddings/faq) on what similarity metric to use with OpenAI embeddings. + +::: + + +### Embedding with higher granularity + +![](/img/embeddings_colbert.png) + +Embedding models compress text into fixed-length (vector) representations, which can put a heavy burden on that single vector to capture the semantic nuance and detail of the document. +In some cases, irrelevant or redundant content can dilute the semantic usefulness of the embedding. +[ColBERT](https://arxiv.org/abs/2004.12832) (Contextualized Late Interaction over BERT) is an innovative approach to address this limitation by using higher granularity embeddings. +Here's how ColBERT works: + +- **Token-level embeddings**: Produce contextually influenced embeddings for each token in the document and the query. +- **MaxSim operation**: For each query token, compute its maximum similarity with all document tokens. +- **Aggregation**: The final relevance score is obtained by summing these maximum similarities across all query tokens. + +This token-wise scoring can yield strong results, especially for tasks requiring precise matching or handling longer documents. +Key advantages of ColBERT: + +- **Improved accuracy**: Token-level interactions can capture more nuanced relationships between query and document. +- **Interpretability**: The token-level matching allows for easier interpretation of why a document was considered relevant. + +However, ColBERT does come with some trade-offs: + +- **Increased computational cost**: Processing and storing token-level embeddings requires more resources. +- **Complexity**: Implementing and optimizing ColBERT can be more challenging than simpler embedding models. + +| Name | When to use | Description | +|------|-------------|-------------| +| [ColBERT](/docs/integrations/providers/ragatouille/#using-colbert-as-a-reranker) | When higher granularity embeddings are needed. | ColBERT uses contextually influenced embeddings for each token in the document and query to get a granular query-document similarity score. [Paper](https://arxiv.org/abs/2112.01488). | + +:::tip + +See our RAG from Scratch video on [ColBERT](https://youtu.be/cN6S0Ehm7_8?feature=shared).
+ +::: \ No newline at end of file diff --git a/docs/docs/concepts/rag.mdx b/docs/docs/concepts/rag.mdx new file mode 100644 index 00000000000..dc459df95a4 --- /dev/null +++ b/docs/docs/concepts/rag.mdx @@ -0,0 +1,98 @@ +# Retrieval Augmented Generation (RAG) + +:::info[Prerequisites] + +* [Retrieval](/docs/concepts/retrieval/) + +::: + +## Overview + +Retrieval Augmented Generation (RAG) is a powerful technique that enhances [language models](/docs/concepts/chat_models/) by combining them with external knowledge bases. +RAG addresses [a key limitation of models](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise): models rely on fixed training datasets, which can lead to outdated or incomplete information. +When given a query, RAG systems first search a knowledge base for relevant information. +The system then incorporates this retrieved information into the model's prompt. +The model uses the provided context to generate a response to the query. +By bridging the gap between vast language models and dynamic, targeted information retrieval, RAG is a powerful technique for building more capable and reliable AI systems. + +## Key Concepts + +![Conceptual Overview](/img/rag_concepts.png) + +(1) **Retrieval system**: Retrieve relevant information from a knowledge base. + +(2) **Adding external knowledge**: Pass retrieved information to a model. + +## Retrieval system + +Models have internal knowledge that is often fixed, or at least not updated frequently due to the high cost of training. +This limits their ability to answer questions about current events, or to provide specific domain knowledge. +To address this, there are various knowledge injection techniques like [fine-tuning](https://hamel.dev/blog/posts/fine_tuning_valuable.html) or continued pre-training. +Both are [costly](https://www.glean.com/blog/how-to-build-an-ai-assistant-for-the-enterprise) and often [poorly suited](https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts) for factual retrieval. +Using a retrieval system offers several advantages: + +- **Up-to-date information**: RAG can access and utilize the latest data, keeping responses current. +- **Domain-specific expertise**: With domain-specific knowledge bases, RAG can provide answers in specific domains. +- **Reduced hallucination**: Grounding responses in retrieved facts helps minimize false or invented information. +- **Cost-effective knowledge integration**: RAG offers a more efficient alternative to expensive model fine-tuning. + +:::info[Further reading] + +See our conceptual guide on [retrieval](/docs/concepts/retrieval/). + +::: + +## Adding external knowledge + +With a retrieval system in place, we need to pass knowledge from this system to the model. +A RAG pipeline typically achieves this through the following steps: + +- Receive an input query. +- Use the retrieval system to search for relevant information based on the query. +- Incorporate the retrieved information into the prompt sent to the LLM. +- Generate a response that leverages the retrieved context. + +As an example, here's a simple RAG workflow that passes information from a [retriever](/docs/concepts/retrievers/) to a [chat model](/docs/concepts/chat_models/): + +```python +from langchain_openai import ChatOpenAI +from langchain_core.messages import SystemMessage, HumanMessage + +# Define a system prompt that tells the model how to use the retrieved context +system_prompt = """You are an assistant for question-answering tasks.
+Use the following pieces of retrieved context to answer the question. +If you don't know the answer, just say that you don't know. +Use three sentences maximum and keep the answer concise. +Context: {context}:""" + +# Define a question +question = """What are the main components of an LLM-powered autonomous agent system?""" + +# Retrieve relevant documents +docs = retriever.invoke(question) + +# Combine the documents into a single string +docs_text = "".join(d.page_content for d in docs) + +# Populate the system prompt with the retrieved context +system_prompt_fmt = system_prompt.format(context=docs_text) + +# Create a model +model = ChatOpenAI(model="gpt-4o", temperature=0) + +# Generate a response +answer = model.invoke([SystemMessage(content=system_prompt_fmt), + HumanMessage(content=question)]) +``` + +:::info[Further reading] + +RAG is a deep area with many possible optimizations and design choices: + +* See [this excellent blog](https://cameronrwolfe.substack.com/p/a-practitioners-guide-to-retrieval?utm_source=profile&utm_medium=reader2) from Cameron Wolfe for a comprehensive overview and history of RAG. +* See our [RAG how-to guides](/docs/how_to/#qa-with-rag). +* See our RAG [tutorials](/docs/tutorials/#working-with-external-knowledge). +* See our RAG from Scratch course, with [code](https://github.com/langchain-ai/rag-from-scratch) and [video playlist](https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x). +* Also, see our RAG from Scratch course [on Freecodecamp](https://youtu.be/sVcwVQRHIc8?feature=shared). + +::: diff --git a/docs/docs/concepts/retrieval.mdx b/docs/docs/concepts/retrieval.mdx new file mode 100644 index 00000000000..dac820d527e --- /dev/null +++ b/docs/docs/concepts/retrieval.mdx @@ -0,0 +1,240 @@ +# Retrieval + +:::info[Prerequisites] + +* [Retrievers](/docs/concepts/retrievers/) +* [Vectorstores](/docs/concepts/vectorstores/) +* [Embeddings](/docs/concepts/embedding_models/) +* [Text splitters](/docs/concepts/text_splitters/) + +::: + +:::danger[Security] + +Some of the concepts reviewed here utilize models to generate queries (e.g., for SQL or graph databases). +There are inherent risks in doing this. +Make sure that your database connection permissions are scoped as narrowly as possible for your application's needs. +This will mitigate, though not eliminate, the risks of building a model-driven system capable of querying databases. +For more on general security best practices, see our [security guide](/docs/security/). + +::: + +## Overview + +Retrieval systems are fundamental to many AI applications, efficiently identifying relevant information from large datasets. +These systems accommodate various data formats: + +- Unstructured text (e.g., documents) is often stored in vector stores or lexical search indexes. +- Structured data is typically housed in relational or graph databases with defined schemas. + +Despite this diversity in data formats, modern AI applications increasingly aim to make all types of data accessible through natural language interfaces. +Models play a crucial role in this process by translating natural language queries into formats compatible with the underlying search index or database. +This translation enables more intuitive and flexible interactions with complex data structures. + +## Key concepts + +![Retrieval](/img/retrieval_concept.png) + +(1) **Query analysis**: A process where models transform or construct search queries to optimize retrieval.
+ +(2) **Information retrieval**: Search queries are used to fetch information from various retrieval systems. + +## Query Analysis + +While users typically prefer to interact with retrieval systems using natural language, retrieval systems may require specific query syntax or benefit from particular keywords. +Query analysis serves as a bridge between raw user input and optimized search queries. Some common applications of query analysis include: + +1. **Query Re-writing**: Queries can be re-written or expanded to improve semantic or lexical searches. +2. **Query Construction**: Search indexes may require structured queries (e.g., SQL for databases). + +Query analysis employs models to transform or construct optimized search queries from raw user input. + +### Query Re-writing + +Retrieval systems should ideally handle a wide spectrum of user inputs, from simple and poorly worded queries to complex, multi-faceted questions. +To achieve this versatility, a popular approach is to use models to transform raw user queries into more effective search queries. +This transformation can range from simple keyword extraction to sophisticated query expansion and reformulation. +Here are some key benefits of using models for query analysis in unstructured data retrieval: + +1. **Query Clarification**: Models can rephrase ambiguous or poorly worded queries for clarity. +2. **Semantic Understanding**: They can capture the intent behind a query, going beyond literal keyword matching. +3. **Query Expansion**: Models can generate related terms or concepts to broaden the search scope. +4. **Complex Query Handling**: They can break down multi-part questions into simpler sub-queries. + +Various techniques have been developed to leverage models for query re-writing, including: + +| Name | When to use | Description | +|------|-------------|-------------| +| [Multi-query](/docs/how_to/MultiQueryRetriever/) | When you want to ensure high recall in retrieval by providing multiple phrasings of a question. | Rewrite the user question with multiple phrasings, retrieve documents for each rewritten question, return the unique documents for all queries. | +| [Decomposition](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a question can be broken down into smaller subproblems. | Decompose a question into a set of subproblems / questions, which can either be solved sequentially (use the answer from first + retrieval to answer the second) or in parallel (consolidate each answer into final answer). | +| [Step-back](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | When a higher-level conceptual understanding is required. | First prompt the LLM to ask a generic step-back question about higher-level concepts or principles, and retrieve relevant facts about them. Use this grounding to help answer the user question. [Paper](https://arxiv.org/pdf/2310.06117). | +| [HyDE](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb) | If you have challenges retrieving relevant documents using the raw user inputs. | Use an LLM to convert questions into hypothetical documents that answer the question.
Use the embedded hypothetical documents to retrieve real documents with the premise that doc-doc similarity search can produce more relevant matches. [Paper](https://arxiv.org/abs/2212.10496). | + +As an example, query decomposition can simply be accomplished using prompting and a structured output that enforces a list of sub-questions. +These can then be run sequentially or in parallel on a downstream retrieval system. + +```python +from typing import List + +from pydantic import BaseModel, Field +from langchain_openai import ChatOpenAI +from langchain_core.messages import SystemMessage, HumanMessage + +# Define a Pydantic model to enforce the output structure +class Questions(BaseModel): + questions: List[str] = Field( + description="A list of sub-questions related to the input query." + ) + +# Create an instance of the model and enforce the output structure +model = ChatOpenAI(model="gpt-4o", temperature=0) +structured_model = model.with_structured_output(Questions) + +# Define the system prompt +system = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n +The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n""" + +# Pass the question to the model +question = """What are the main components of an LLM-powered autonomous agent system?""" +questions = structured_model.invoke([SystemMessage(content=system)]+[HumanMessage(content=question)]) +``` + +:::tip + +See our RAG from Scratch videos for a few different specific approaches: +- [Multi-query](https://youtu.be/JChPi0CRnDY?feature=shared) +- [Decomposition](https://youtu.be/h0OPWlEOank?feature=shared) +- [Step-back](https://youtu.be/xn1jEjRyJ2U?feature=shared) +- [HyDE](https://youtu.be/SaDzIVkYqyY?feature=shared) + +::: + +### Query Construction + +Query analysis can also focus on translating natural language queries into specialized query languages or filters. +This translation is crucial for effectively interacting with various types of databases that house structured or semi-structured data. + +1. **Structured Data examples**: For relational and graph databases, Domain-Specific Languages (DSLs) are used to query data. + - **Text-to-SQL**: [Converts natural language to SQL](https://paperswithcode.com/task/text-to-sql) for relational databases. + - **Text-to-Cypher**: [Converts natural language to Cypher](https://neo4j.com/labs/neodash/2.4/user-guide/extensions/natural-language-queries/) for graph databases. + +2. **Semi-structured Data examples**: For vectorstores, queries can combine semantic search with metadata filtering. + - **Natural Language to Metadata Filters**: Converts user queries into [appropriate metadata filters](https://docs.pinecone.io/guides/data/filter-with-metadata). + +These approaches leverage models to bridge the gap between user intent and the specific query requirements of different data storage systems.
Here are some popular techniques: + +| Name | When to Use | Description | +|------|-------------|-------------| +| [Self Query](/docs/how_to/self_query/) | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). | +| [Text to SQL](/docs/tutorials/sql_qa/) | If users are asking questions that require information housed in a relational database, accessible via SQL. | This uses an LLM to transform user input into a SQL query. | +| [Text-to-Cypher](/docs/tutorials/graph/) | If users are asking questions that require information housed in a graph database, accessible via Cypher. | This uses an LLM to transform user input into a Cypher query. | + +As an example, here is how to use the `SelfQueryRetriever` to convert natural language queries into metadata filters: + +```python +from langchain.retrievers.self_query.base import SelfQueryRetriever +from langchain_openai import ChatOpenAI + +# `vectorstore` and `schema_for_metadata` are assumed to be defined elsewhere +metadata_field_info = schema_for_metadata +document_content_description = "Brief summary of a movie" +llm = ChatOpenAI(temperature=0) +retriever = SelfQueryRetriever.from_llm( + llm, + vectorstore, + document_content_description, + metadata_field_info, +) +``` + +:::info[Further reading] + +* See our tutorials on [text-to-SQL](/docs/tutorials/sql_qa/), [text-to-Cypher](/docs/tutorials/graph/), and [query analysis for metadata filters](/docs/tutorials/query_analysis/). +* See our [blog post overview](https://blog.langchain.dev/query-construction/). +* See our RAG from Scratch video on [query construction](https://youtu.be/kl6NwWYxvbM?feature=shared). + +::: + +## Information Retrieval + +### Common retrieval systems + +#### Lexical search indexes + +Many search engines are based upon matching words in a query to the words in each document. +This approach is called lexical retrieval, using search [algorithms that are typically based upon word frequencies](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2). +The intuition is simple: if a word appears frequently in both the user’s query and a particular document, then this document might be a good match. + +The particular data structure used to implement this is often an [*inverted index*](https://www.geeksforgeeks.org/inverted-index/). +This type of index contains a list of words and a mapping of each word to a list of locations at which it occurs in various documents. +Using this data structure, it is possible to efficiently match the words in search queries to the documents in which they appear. +[BM25](https://en.wikipedia.org/wiki/Okapi_BM25#:~:text=BM25%20is%20a%20bag%2Dof,slightly%20different%20components%20and%20parameters.) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) are [two popular lexical search algorithms](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2).
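+
+As an illustrative sketch, lexical retrieval can be tried with LangChain's BM25 retriever integration (this assumes the `rank_bm25` package is installed; the small corpus below is hypothetical):
+
+```python
+from langchain_community.retrievers import BM25Retriever
+
+# A tiny, hypothetical corpus to index
+texts = [
+    "BM25 ranks documents using term frequency and inverse document frequency",
+    "Vector stores retrieve documents based on embedding similarity",
+    "LangChain provides a standard retriever interface",
+]
+
+# Build an in-memory BM25 index over the texts
+retriever = BM25Retriever.from_texts(texts)
+
+# Lexical search: matches are driven by shared words between query and documents
+docs = retriever.invoke("How does BM25 rank documents?")
+```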
+ +:::info[Further reading] + +* See the [BM25](/docs/integrations/retrievers/bm25/) retriever integration. +* See the [Elasticsearch](/docs/integrations/retrievers/elasticsearch_retriever/) retriever integration. + +::: + +#### Vector indexes + +Vector indexes are an alternative way to index and store unstructured data. +See our conceptual guide on [vectorstores](/docs/concepts/vectorstores/) for a detailed overview. +In short, rather than using word frequencies, vectorstores use an [embedding model](/docs/concepts/embedding_models/) to compress documents into high-dimensional vector representation. +This allows for efficient similarity search over embedding vectors using simple mathematical operations like cosine similarity. + +:::info[Further reading] + +* See our [how-to guide](/docs/how_to/vectorstore_retriever/) for more details on working with vectorstores. +* See our [list of vectorstore integrations](/docs/integrations/vectorstores/). +* See Cameron Wolfe's [blog post](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2) on the basics of vector search. + +::: + +#### Relational databases + +Relational databases are a fundamental type of structured data storage used in many applications. +They organize data into tables with predefined schemas, where each table represents an entity or relationship. +Data is stored in rows (records) and columns (attributes), allowing for efficient querying and manipulation through SQL (Structured Query Language). +Relational databases excel at maintaining data integrity, supporting complex queries, and handling relationships between different data entities. + +:::info[Further reading] + +* See our [tutorial](/docs/tutorials/sql_qa/) for working with SQL databases. +* See our [SQL database toolkit](/docs/integrations/tools/sql_database/). + +::: + +#### Graph databases + +Graph databases are a specialized type of database designed to store and manage highly interconnected data. +Unlike traditional relational databases, graph databases use a flexible structure consisting of nodes (entities), edges (relationships), and properties. +This structure allows for efficient representation and querying of complex, interconnected data. +Graph databases store data in a graph structure, with nodes, edges, and properties. +They are particularly useful for storing and querying complex relationships between data points, such as social networks, supply-chain management, fraud detection, and recommendation services + +:::info[Further reading] + +* See our [tutorial](/docs/tutorials/graph/) for working with graph databases. +* See our [list of graph database integrations](/docs/integrations/graphs/). +* See Neo4j's [starter kit for LangChain](https://neo4j.com/developer-blog/langchain-neo4j-starter-kit/). + +::: + +### Retriever + +LangChain provides a unified interface for interacting with various retrieval systems through the [retriever](/docs/concepts/retrievers/) concept. The interface is straightforward: + +1. Input: A query (string) +2. Output: A list of documents (standardized LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects) + +You can create a retriever using any of the retrieval systems mentioned earlier. The query analysis techniques we discussed are particularly useful here, as they enable natural language interfaces for databases that typically require structured query languages. 
+For example, you can build a retriever for a SQL database using text-to-SQL conversion. This allows a natural language query (string) to be transformed into a SQL query behind the scenes. +Regardless of the underlying retrieval system, all retrievers in LangChain share a common interface. You can use them with the simple `invoke` method: + + +```python +docs = retriever.invoke(query) +``` + +:::info[Further reading] + +* See our [conceptual guide on retrievers](/docs/concepts/retrievers/). +* See our [how-to guide](/docs/how_to/#retrievers) on working with retrievers. + +::: diff --git a/docs/docs/concepts/retrievers.mdx b/docs/docs/concepts/retrievers.mdx index dd9903ec0b1..0c0c4c16020 100644 --- a/docs/docs/concepts/retrievers.mdx +++ b/docs/docs/concepts/retrievers.mdx @@ -2,11 +2,144 @@ -A retriever is an interface that returns documents given an unstructured query. -It is more general than a vector store. -A retriever does not need to be able to store documents, only to return (or retrieve) them. -Retrievers can be created from vector stores, but are also broad enough to include [Wikipedia search](/docs/integrations/retrievers/wikipedia/) and [Amazon Kendra](/docs/integrations/retrievers/amazon_kendra_retriever/). +:::info[Prerequisites] -Retrievers accept a string query as input and return a list of Document's as output. +* [Vectorstores](/docs/concepts/vectorstores/) +* [Embeddings](/docs/concepts/embedding_models/) +* [Text splitters](/docs/concepts/text_splitters/) -For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers). +::: + +## Overview + +Many different types of retrieval systems exist, including vectorstores, graph databases, and relational databases. +With the rise in popularity of large language models, retrieval systems have become an important component of AI applications (e.g., [RAG](/docs/concepts/rag/)). +Because of their importance and variability, LangChain provides a uniform interface for interacting with different types of retrieval systems. +The LangChain [retriever](/docs/concepts/retrievers/) interface is straightforward: + +1. Input: A query (string) +2. Output: A list of documents (standardized LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects) + +## Key concept + +![Retriever](/img/retriever_concept.png) + +All retrievers implement a simple interface for retrieving documents using natural language queries. + +## Interface + +The only requirement for a retriever is the ability to accept a query and return documents. +In particular, LangChain's retriever class only requires that the `_get_relevant_documents` method is implemented, which takes a `query: str` and returns a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects that are most relevant to the query. +The underlying logic used to get relevant documents is specified by the retriever and can be whatever is most useful for the application. + +A LangChain retriever is a [runnable](/docs/how_to/lcel_cheatsheet/), which is a standard interface for LangChain components. +This means that it has a few common methods, including `invoke`, that are used to interact with it.
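+
+To make the interface concrete, here is a minimal sketch of a custom retriever; the `ToyRetriever` class and its keyword-matching logic are illustrative only, not a LangChain built-in:
+
+```python
+from typing import List
+
+from langchain_core.callbacks import CallbackManagerForRetrieverRun
+from langchain_core.documents import Document
+from langchain_core.retrievers import BaseRetriever
+
+
+class ToyRetriever(BaseRetriever):
+    """A toy retriever that returns documents containing the query string."""
+
+    documents: List[Document]  # documents to search over
+    k: int = 2  # maximum number of documents to return
+
+    def _get_relevant_documents(
+        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
+    ) -> List[Document]:
+        matches = [
+            d for d in self.documents if query.lower() in d.page_content.lower()
+        ]
+        return matches[: self.k]
+```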
A retriever can be invoked with a query: + +```python +docs = retriever.invoke(query) +``` + +Retrievers return a list of [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects, which have two attributes: + +* `page_content`: The content of this document. Currently, this is a string. +* `metadata`: Arbitrary metadata associated with this document (e.g., document id, file name, source, etc). + +:::info[Further reading] + +* See our [how-to guide](/docs/how_to/custom_retriever/) on building your own custom retriever. + +::: + +## Common types + +Despite the flexibility of the retriever interface, a few common types of retrieval systems are frequently used. + +### Search APIs + +It's important to note that retrievers don't need to actually *store* documents. +For example, we can build retrievers on top of search APIs that simply return search results! +See our retriever integrations with [Amazon Kendra](https://python.langchain.com/docs/integrations/retrievers/amazon_kendra_retriever/) or [Wikipedia Search](https://python.langchain.com/docs/integrations/retrievers/wikipedia/). + +### Relational or Graph Database + +Retrievers can be built on top of relational or graph databases. +In these cases, [query analysis](/docs/concepts/retrieval/) techniques that construct a structured query from natural language are critical. +For example, you can build a retriever for a SQL database using text-to-SQL conversion. This allows a natural language query (string) to be transformed into a SQL query behind the scenes. + +:::info[Further reading] + +* See our [tutorial](/docs/tutorials/sql_qa/) for context on how to build a retriever using a SQL database and text-to-SQL. +* See our [tutorial](/docs/tutorials/graph/) for context on how to build a retriever using a graph database and text-to-Cypher. + +::: + +### Lexical Search + +As discussed in our conceptual review of [retrieval](/docs/concepts/retrieval/), many search engines are based upon matching words in a query to the words in each document. +[BM25](https://en.wikipedia.org/wiki/Okapi_BM25#:~:text=BM25%20is%20a%20bag%2Dof,slightly%20different%20components%20and%20parameters.) and [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) are [two popular lexical search algorithms](https://cameronrwolfe.substack.com/p/the-basics-of-ai-powered-vector-search?utm_source=profile&utm_medium=reader2). +LangChain has retrievers for many popular lexical search algorithms / engines. + +:::info[Further reading] + +* See the [BM25](/docs/integrations/retrievers/bm25/) retriever integration. +* See the [TF-IDF](/docs/integrations/retrievers/tf_idf/) retriever integration. +* See the [Elasticsearch](/docs/integrations/retrievers/elasticsearch_retriever/) retriever integration. + +::: + +### Vectorstore + +[Vectorstores](/docs/concepts/vectorstores/) are a powerful and efficient way to index and retrieve unstructured data. +A vectorstore can be used as a retriever by calling the `as_retriever()` method. + +```python +vectorstore = MyVectorStore() +retriever = vectorstore.as_retriever() +``` + +## Advanced retrieval patterns + +### Ensemble + +Because the retriever interface is so simple, returning a list of `Document` objects given a search query, it is possible to combine multiple retrievers using ensembling. +This is particularly useful when you have multiple retrievers that are good at finding different types of relevant documents.
+It is easy to create an [ensemble retriever](/docs/how_to/ensemble_retriever/) that combines multiple retrievers with linear weighted scores: + +```python +from langchain.retrievers import EnsembleRetriever + +# initialize the ensemble retriever +# (`bm25_retriever` and `vector_store_retriever` are assumed to be created above) +ensemble_retriever = EnsembleRetriever( + retrievers=[bm25_retriever, vector_store_retriever], weights=[0.5, 0.5] +) +``` + +When ensembling, how do we combine search results from many retrievers? +This motivates the concept of re-ranking, which takes the output of multiple retrievers and combines them using a more sophisticated algorithm such as [Reciprocal Rank Fusion (RRF)](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). + +### Source Document Retention + +Many retrievers utilize some kind of index to make documents easily searchable. +The process of indexing can include a transformation step (e.g., vectorstores often use document splitting). +Whatever transformation is used, it can be very useful to retain a link between the *transformed document* and the original, giving the retriever the ability to return the *original* document. + +![Retrieval with full docs](/img/retriever_full_docs.png) + +This is particularly useful in AI applications, because it ensures no loss in document context for the model. +For example, you may use a small chunk size for indexing documents in a vectorstore. +If you return *only* the chunks as the retrieval result, then the model will have lost the original document context for the chunks. + +LangChain has two different retrievers that can be used to address this challenge. +The [Multi-Vector](/docs/how_to/multi_vector/) retriever allows the user to use any document transformation (e.g., use an LLM to write a summary of the document) for indexing while retaining linkage to the source document. +The [ParentDocument](/docs/how_to/parent_document_retriever/) retriever links document chunks from a text-splitter transformation for indexing while retaining linkage to the source document. + +| Name | Index Type | Uses an LLM | When to Use | Description | +|------|------------|-------------|-------------|-------------| +| [ParentDocument](/docs/how_to/parent_document_retriever/) | Vector store + Document Store | No | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks). | +| [Multi Vector](/docs/how_to/multi_vector/) | Vector store + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself. | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions. | + +:::info[Further reading] + +* See our [how-to guide](/docs/how_to/parent_document_retriever/) on using the ParentDocument retriever.
+* See our [how-to guide](/docs/how_to/multi_vector/) on using the MultiVector retriever. +* See our RAG from Scratch video on the [multi vector retriever](https://youtu.be/gTCU9I6QqCE?feature=shared). + +::: diff --git a/docs/docs/concepts/structured_output.mdx b/docs/docs/concepts/structured_output.mdx new file mode 100644 index 00000000000..b36bd0498d4 --- /dev/null +++ b/docs/docs/concepts/structured_output.mdx @@ -0,0 +1,141 @@ +# Structured output + +## Overview + +For many applications, such as chatbots, models need to respond to users directly in natural language. +However, there are scenarios where we need models to output in a *structured format*. +For example, we might want to store the model output in a database and ensure that the output conforms to the database schema. +This need motivates the concept of structured output, where models can be instructed to respond with a particular output structure. + +![Structured output](/img/structured_output.png) + +## Key Concepts + +**(1) Schema definition:** The output structure is represented as a schema, which can be defined in several ways. +**(2) Returning structured output:** The model is given this schema, and is instructed to return output that conforms to it. + +## Recommended usage + +This pseudo-code illustrates the recommended workflow when using structured output. +LangChain provides a helper function, `with_structured_output()`, that automates the process of binding the schema to the model and parsing the output. +This helper function is available for all model providers that support structured output. + +```python +# Define schema +schema = {"foo": "bar"} +# Bind schema to model +model_with_structure = model.with_structured_output(schema) +# Invoke the model to produce structured output that matches the schema +structured_output = model_with_structure.invoke(user_input) +``` + +## Schema definition + +The central concept is that the output structure of model responses needs to be represented in some way. +While the types of objects you can use depend on the model you're working with, there are common types of objects that are typically allowed or recommended for structured output in Python. + +The simplest and most common format for structured output is a JSON-like structure, which in Python can be represented as a dictionary (dict) or list (list). +JSON objects (or dicts in Python) are often used directly when the tool requires raw, flexible, and minimal-overhead structured data. + +```json +{ + "answer": "The answer to the user's question", + "followup_question": "A followup question the user could ask" +} +``` + +As a second example, [Pydantic](https://docs.pydantic.dev/latest/) is particularly useful for defining structured output schemas because it offers type hints and validation. +Here's an example of a Pydantic schema: + +```python +from pydantic import BaseModel, Field +class ResponseFormatter(BaseModel): + """Always use this tool to structure your response to the user.""" + answer: str = Field(description="The answer to the user's question") + followup_question: str = Field(description="A followup question the user could ask") + +``` + +TODO: There are many other ways to define schemas (Dataclasses, TypedDicts, Custom Classes). How many to cover? How many supported by popular model APIs? + +## Returning structured output + +With a schema defined, we need a way to instruct the model to use it.
+While one approach is to include this schema in the prompt and *ask nicely* for the model to use it, this is not recommended. +Several more powerful methods that utilize native features in the model provider's API are available. + +### Using Tool Calling + +Many [model providers support](/docs/integrations/chat/) tool calling, a concept discussed in more detail in our [tool calling guide](/docs/concepts/tool_calling/). +In short, tool calling involves binding a tool to a model and, when appropriate, the model can *decide* to call this tool and ensure its response conforms to the tool's schema. +With this in mind, the central concept is straightforward: *simply bind our schema to a model as a tool!* +Here is an example using the `ResponseFormatter` schema defined above: + +```python +from langchain_openai import ChatOpenAI +model = ChatOpenAI(model="gpt-4o", temperature=0) +# Bind ResponseFormatter schema as a tool to the model +model_with_tools = model.bind_tools([ResponseFormatter]) +# Invoke the model +ai_msg = model_with_tools.invoke("What is the powerhouse of the cell?") +``` + +The arguments of the tool call are already extracted as a dictionary. +This dictionary can be optionally parsed into a Pydantic object, matching our original `ResponseFormatter` schema. + +```python +# Get the tool call arguments +ai_msg.tool_calls[0]["args"] +{'answer': "The powerhouse of the cell is the mitochondrion. Mitochondria are organelles that generate most of the cell's supply of adenosine triphosphate (ATP), which is used as a source of chemical energy.", + 'followup_question': 'What is the function of ATP in the cell?'} +# Parse the dictionary into a Pydantic object +pydantic_object = ResponseFormatter.model_validate(ai_msg.tool_calls[0]["args"]) +``` + +### JSON mode + +In addition to tool calling, some model providers support a feature called `JSON mode`. +This accepts a JSON schema definition as input and forces the model to produce a conforming JSON output. +You can find a table of model providers that support JSON mode [here](/docs/integrations/chat/). +Here is an example of how to use JSON mode with OpenAI: + +```python +from langchain_openai import ChatOpenAI +model = ChatOpenAI(model="gpt-4o", model_kwargs={ "response_format": { "type": "json_object" } }) +ai_msg = model.invoke("Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]") +ai_msg.content +'\n{\n "random_ints": [23, 47, 89, 15, 34, 76, 58, 3, 62, 91]\n}' +``` + +One important point to flag: the model *still* returns a string, which needs to be parsed into a JSON object. +This can, of course, simply use the `json` library or a JSON output parser if you need more advanced functionality. +See this [how-to guide on the JSON output parser](/docs/how_to/output_parser_json) for more details. + +```python +import json +json_object = json.loads(ai_msg.content) +{'random_ints': [23, 47, 89, 15, 34, 76, 58, 3, 62, 91]} +``` + +## LangChain helper + +There are a few challenges when producing structured output with the above methods: (1) If using tool calling, tool call arguments need to be parsed from a dictionary back to the original schema. +(2) In addition, the model needs to be instructed to *always* use the tool when we want to enforce structured output, which is a provider-specific setting. (3) If using JSON mode, the output needs to be parsed into a JSON object. +With these challenges in mind, LangChain provides a helper function (`with_structured_output()`) to streamline the process.
+ +![Diagram of with structured output](/img/with_structured_output.png) + +This both binds the schema to the model as a tool and parses the output to the specified output schema. + +```python +# Bind the schema to the model +model_with_structure = model.with_structured_output(ResponseFormatter) +# Invoke the model +structured_output = model_with_structure.invoke("What is the powerhouse of the cell?") +# Get back the Pydantic object +structured_output +ResponseFormatter(answer="The powerhouse of the cell is the mitochondrion. Mitochondria are organelles that generate most of the cell's supply of adenosine triphosphate (ATP), which is used as a source of chemical energy.", followup_question='What is the function of ATP in the cell?') +``` + +TODO: We need to explain the choice of implementation under the hood. Seems to be set with the `method` argument. How is default choosen? What if provider only has JSON mode? What inputs schemas are supported? +For more details on usage, see our [how-to guide](/docs/how_to/structured_output/#the-with_structured_output-method). diff --git a/docs/docs/concepts/text_splitters.mdx b/docs/docs/concepts/text_splitters.mdx index 0a25d36d234..eebb3088349 100644 --- a/docs/docs/concepts/text_splitters.mdx +++ b/docs/docs/concepts/text_splitters.mdx @@ -1,18 +1,139 @@ # Text Splitters -Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents. + -When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What "semantically related" means could depend on the type of text. This notebook showcases several ways to do that. +:::info[Prerequisites] -At a high level, text splitters work as following: +* [Documents](/docs/concepts/retrievers/#interface) +* Tokenization -1. Split the text up into small, semantically meaningful chunks (often sentences). -2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function). -3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks). +::: -That means there are two different axes along which you can customize your text splitter: +## Overview -1. How the text is split -2. How the chunk size is measured +Document splitting is often a crucial preprocessing step for many applications. +It involves breaking down large texts into smaller, manageable chunks. +This process offers several benefits, such as ensuring consistent processing of varying document lengths, overcoming input size limitations of models, and improving the quality of text representations used in retrieval systems. +There are several strategies for splitting documents, each with its own advantages. -For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters). +## Key concepts + +![Conceptual Overview](/img/text_splitters.png) + +Text splitters split documents into smaller chunks for use in downstream applications. + +## Why split documents? 
+ +There are several reasons to split documents: + +- **Handling non-uniform document lengths**: Real-world document collections often contain texts of varying sizes. Splitting ensures consistent processing across all documents. +- **Overcoming model limitations**: Many embedding models and language models have maximum input size constraints. Splitting allows us to process documents that would otherwise exceed these limits. +- **Improving representation quality**: For longer documents, the quality of embeddings or other representations may degrade as they try to capture too much information. Splitting can lead to more focused and accurate representations of each section. +- **Enhancing retrieval precision**: In information retrieval systems, splitting can improve the granularity of search results, allowing for more precise matching of queries to relevant document sections. +- **Optimizing computational resources**: Working with smaller chunks of text can be more memory-efficient and allow for better parallelization of processing tasks. + +Now, the next question is *how* to split the documents into chunks! There are several strategies, each with its own advantages. + +:::info[Further reading] + +* See Greg Kamradt's [chunkviz](https://chunkviz.up.railway.app/) to visualize different splitting strategies discussed below. + +::: + +## Approaches + +### Length-based + +The most intuitive strategy is to split documents based on their length. This simple yet effective approach ensures that each chunk doesn't exceed a specified size limit. +Key benefits of length-based splitting: +- Straightforward implementation +- Consistent chunk sizes +- Easily adaptable to different model requirements + +Types of length-based splitting: +- **Token-based**: Splits text based on the number of tokens, which is useful when working with language models. +- **Character-based**: Splits text based on the number of characters, which can be more consistent across different types of text. + +Example implementation using LangChain's `CharacterTextSplitter` with token-based splitting: + +```python +from langchain_text_splitters import CharacterTextSplitter +text_splitter = CharacterTextSplitter.from_tiktoken_encoder( + encoding_name="cl100k_base", chunk_size=100, chunk_overlap=0 +) +texts = text_splitter.split_text(document) +``` + +:::info[Further reading] + +* See the how-to guide for [token-based](/docs/how_to/split_by_token/) splitting. +* See the how-to guide for [character-based](/docs/how_to/character_text_splitter/) splitting. + +::: + +### Text-structured based + +Text is naturally organized into hierarchical units such as paragraphs, sentences, and words. +We can leverage this inherent structure to inform our splitting strategy, creating splits that maintain natural language flow, maintain semantic coherence within each split, and adapt to varying levels of text granularity. +LangChain's [`RecursiveCharacterTextSplitter`](/docs/how_to/recursive_text_splitter/) implements this concept: +- The `RecursiveCharacterTextSplitter` attempts to keep larger units (e.g., paragraphs) intact. +- If a unit exceeds the chunk size, it moves to the next level (e.g., sentences). +- This process continues down to the word level if necessary.
+ +Here is an example usage: + +```python +from langchain_text_splitters import RecursiveCharacterTextSplitter +text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0) +texts = text_splitter.split_text(document) +``` + +:::info[Further reading] + +* See the how-to guide for [recursive text splitting](/docs/how_to/recursive_text_splitter/). + +::: + +### Document-structured based + +Some documents have an inherent structure, such as HTML, Markdown, or JSON files. +In these cases, it's beneficial to split the document based on its structure, as it often naturally groups semantically related text. +Key benefits of structure-based splitting: +- Preserves the logical organization of the document +- Maintains context within each chunk +- Can be more effective for downstream tasks like retrieval or summarization + +Examples of structure-based splitting: +- **Markdown**: Split based on headers (e.g., #, ##, ###) +- **HTML**: Split using tags +- **JSON**: Split by object or array elements +- **Code**: Split by functions, classes, or logical blocks + +:::info[Further reading] + +* See the how-to guide for [Markdown splitting](/docs/how_to/markdown_header_metadata_splitter/). +* See the how-to guide for [Recursive JSON splitting](/docs/how_to/recursive_json_splitter/). +* See the how-to guide for [Code splitting](/docs/how_to/code_splitter/). +* See the how-to guide for [HTML splitting](/docs/how_to/HTML_header_metadata_splitter/). + +::: + +### Semantic meaning based + +Unlike the previous methods, semantic-based splitting actually considers the *content* of the text. +While other approaches use document or text structure as proxies for semantic meaning, this method directly analyzes the text's semantics. +There are several ways to implement this, but conceptually the approach is to split text when there are significant changes in text *meaning*. +As an example, we can use a sliding window approach to generate embeddings, and compare the embeddings to find significant differences: + +- Start with the first few sentences and generate an embedding. +- Move to the next group of sentences and generate another embedding (e.g., using a sliding window approach). +- Compare the embeddings to find significant differences, which indicate potential "break points" between semantic sections. + +This technique helps create chunks that are more semantically coherent, potentially improving the quality of downstream tasks like retrieval or summarization. + +:::info[Further reading] + +* See the how-to guide for [splitting text based on semantic meaning](/docs/how_to/semantic-chunker/). +* See Greg Kamradt's [notebook](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb) showcasing semantic splitting. + +::: diff --git a/docs/docs/concepts/tool_calling.mdx b/docs/docs/concepts/tool_calling.mdx index b1f72fe1fcc..d3b037070e9 100644 --- a/docs/docs/concepts/tool_calling.mdx +++ b/docs/docs/concepts/tool_calling.mdx @@ -1,3 +1,120 @@ # Tool Calling -Place holder \ No newline at end of file +## Overview + +Many AI applications interact directly with humans. In these cases, it is appropriate for models to respond in natural language. +But what about cases where we want a model to also interact *directly* with systems, such as databases or an API? +These systems often have a particular input schema; for example, APIs frequently have a required payload structure. +This need motivates the concept of *tool calling*.
You can use [tool calling](https://platform.openai.com/docs/guides/function-calling/example-use-cases) to request model responses that match a particular schema.
+
+:::info
+You will sometimes hear the term `function calling`. We use this term interchangeably with `tool calling`.
+:::
+
+![Conceptual overview of tool calling](/img/tool_calling_concept.png)
+
+## Key Concepts
+
+**(1) Tool Creation:** The tool needs to be described to the model so that the model knows what arguments to provide when it decides to call the tool. (TODO: @eyurtsev to elaborate.)
+**(2) Tool Binding:** The tool needs to be connected to a model that supports tool calling.
+This gives the model awareness of the tool and the associated input schema required by the tool.
+**(3) Tool Calling:** When appropriate, the model can decide to call a tool and ensure its response conforms to the tool's input schema.
+**(4) Tool Execution:** The tool can be executed using the arguments provided by the model.
+
+![Conceptual parts of tool calling](/img/tool_calling_components.png)
+
+## Recommended usage
+
+This pseudo-code illustrates the recommended workflow for using tool calling.
+Created tools are passed to the `.bind_tools()` method as a list.
+The model can then be called as usual. If a tool call is made, the model's response will contain the tool call arguments.
+The tool call arguments can then be passed to the tool for execution.
+
+```python
+# Tool creation
+tools = [my_tool]
+# Tool binding
+model_with_tools = model.bind_tools(tools)
+# Tool calling
+response = model_with_tools.invoke(user_input)
+# Tool execution (pass the arguments from the first tool call to the tool)
+tool_output = my_tool.invoke(response.tool_calls[0]["args"])
+```
+
+## Tool Creation
+
+TODO: @eyurtsev to add links and summary of the conceptual guide.
+
+## Tool Binding
+
+[Many model providers](https://platform.openai.com/docs/guides/function-calling) support tool calling.
+
+:::tip
+See our [model integration page](/docs/integrations/chat/) for a list of providers that support tool calling.
+:::
+
+The central concept to understand is that LangChain provides a standardized interface for connecting tools to models.
+The `.bind_tools()` method can be used to specify which tools are available for a model to call.
+
+```python
+# tools_list is a list of tools to expose to the model
+model_with_tools = model.bind_tools(tools_list)
+```
+
+As a specific example, let's take a function `multiply` and bind it as a tool to a model that supports tool calling.
+
+```python
+def multiply(a: int, b: int) -> int:
+    """Multiply a and b.
+
+    Args:
+        a: first int
+        b: second int
+    """
+    return a * b
+
+llm_with_tools = tool_calling_model.bind_tools([multiply])
+```
+
+## Tool Calling
+
+![Diagram of a tool call by a model](/img/tool_call_example.png)
+
+A key principle of tool calling is that the model decides when to use a tool based on the input's relevance. The model doesn't always need to call a tool.
+For example, given an unrelated input, the model would not call the tool:
+
+```python
+result = llm_with_tools.invoke("Hello world!")
+```
+
+The result would be an `AIMessage` containing the model's response in natural language (e.g., "Hello!").
+However, if we pass an input *relevant to the tool*, the model should choose to call it:
+
+```python
+result = llm_with_tools.invoke("What is 2 multiplied by 3?")
+```
+
+As before, the output `result` will be an `AIMessage`.
+But, if the tool was called, `result` will have a `tool_calls` attribute.
+This attribute includes everything needed to execute the tool, including the tool name and input arguments:
+
+```
+result.tool_calls
+[{'name': 'multiply', 'args': {'a': 2, 'b': 3}, 'id': 'xxx', 'type': 'tool_call'}]
+```
+
+For more details on usage, see our [how-to guides](/docs/how_to/#tools)!
+
+## Tool execution
+
+TODO: @eyurtsev @vbarda let's discuss this. This should also link to tool execution in LangGraph.
+
+## Best practices
+
+When designing tools to be used by a model, it is important to keep in mind that:
+
+* Models that have explicit [tool-calling APIs](/docs/concepts/#functiontool-calling) will be better at tool calling than non-fine-tuned models.
+* Models will perform better if the tools have well-chosen names and descriptions.
+* Simple, narrowly scoped tools are easier for models to use than complex tools.
+* Asking the model to select from a large list of tools poses challenges for the model.
+
+
diff --git a/docs/docs/concepts/vectorstores.mdx b/docs/docs/concepts/vectorstores.mdx
index 978507b321a..2e16a1c9ff6 100644
--- a/docs/docs/concepts/vectorstores.mdx
+++ b/docs/docs/concepts/vectorstores.mdx
@@ -2,18 +2,187 @@
 
 
-One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors,
-and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query.
-A vector store takes care of storing embedded data and performing vector search for you.
+:::info[Prerequisites]
 
-Most vector stores can also store metadata about embedded vectors and support filtering on that metadata before
-similarity search, allowing you more control over returned documents.
+* [Embeddings](/docs/concepts/embedding_models/)
+* [Text splitters](/docs/concepts/text_splitters/)
 
-Vector stores can be converted to the retriever interface by doing:
+:::
+:::info[Note]
+
+This conceptual overview focuses on text-based indexing and retrieval for simplicity.
+However, embedding models can be [multi-modal](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-multimodal-embeddings)
+and vectorstores can be used to store and retrieve a variety of data types beyond text.
+
+:::
+
+## Overview
+
+Vectorstores are a powerful and efficient way to index and retrieve unstructured data.
+They leverage vector [embeddings](/docs/concepts/embedding_models/), which are numerical representations of unstructured data that capture semantic meaning.
+At their core, vectorstores utilize specialized data structures called vector indices.
+These indices are designed to perform efficient similarity searches over embedding vectors, allowing for rapid retrieval of relevant information based on semantic similarity rather than exact keyword matches.
+
+## Key concept
+
+![Vectorstores](/img/vectorstores.png)
+
+There are [many different types of vectorstores](/docs/integrations/vectorstores/).
+LangChain provides a universal interface for working with them, providing standard methods for common operations.
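+
+To make this concrete, here is a minimal sketch of that shared interface, using the in-memory vectorstore bundled with `langchain-core` (assuming a recent version that provides `InMemoryVectorStore`); other integrations expose the same methods:
+
+```python
+from langchain_core.documents import Document
+from langchain_core.vectorstores import InMemoryVectorStore
+from langchain_openai import OpenAIEmbeddings
+
+# Every integration is initialized with an embedding model
+vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
+
+# Standard method for adding documents
+vector_store.add_documents([Document(page_content="LangChain provides a standard vectorstore interface")])
+
+# Standard method for similarity search over the stored documents
+docs = vector_store.similarity_search("What interface does LangChain provide?", k=1)
+```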
+
+## Adding documents
+
+Using [Pinecone](https://python.langchain.com/api_reference/pinecone/vectorstores/langchain_pinecone.vectorstores.PineconeVectorStore.html#langchain_pinecone.vectorstores.PineconeVectorStore) as an example, we initialize a vectorstore with the [embedding](/docs/concepts/embedding_models/) model we want to use:

```python
-vectorstore = MyVectorStore()
-retriever = vectorstore.as_retriever()
+import os
+
+from pinecone import Pinecone
+from langchain_openai import OpenAIEmbeddings
+from langchain_pinecone import PineconeVectorStore
+
+# Initialize Pinecone
+pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
+
+# Initialize with an embedding model
+# (index_name is the name of an existing Pinecone index)
+vector_store = PineconeVectorStore(index=pc.Index(index_name), embedding=OpenAIEmbeddings())
 ```
 
-For specifics on how to use vector stores, see the [relevant how-to guides here](/docs/how_to/#vector-stores).
+Given a vectorstore, we need the ability to add documents to it.
+The `add_texts` and `add_documents` methods can be used to add texts (strings) and documents (LangChain [Document](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html) objects) to a vectorstore, respectively.
+As an example, we can create a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html).
+`Document` objects all have `page_content` and `metadata` attributes, making them a universal way to store unstructured text and associated metadata.
+
+```python
+from langchain_core.documents import Document
+document_1 = Document(
+    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
+    metadata={"source": "tweet"},
+)
+
+document_2 = Document(
+    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
+    metadata={"source": "news"},
+)
+documents = [document_1, document_2]
+```
+
+When we use the `add_documents` method to add the documents to the vectorstore, the vectorstore will use the provided embedding model to create an embedding of each document.
+What happens if we add the same document twice?
+Many vectorstores support [`upsert`](https://docs.pinecone.io/guides/data/upsert-data) functionality, which combines the functionality of inserting and updating records.
+To use this, we simply supply a unique identifier for each document when we add it to the vectorstore using `add_documents` or `add_texts`.
+If the record doesn't exist, it inserts a new record.
+If the record already exists, it updates the existing record.
+
+```python
+from uuid import uuid4
+
+# Given a list of documents and a vector store
+uuids = [str(uuid4()) for _ in range(len(documents))]
+vector_store.add_documents(documents=documents, ids=uuids)
+```
+
+:::info[Further reading]
+
+* See the [full list of LangChain vectorstore integrations](/docs/integrations/vectorstores/).
+* See Pinecone's [documentation](https://docs.pinecone.io/guides/data/upsert-data) on the `upsert` method.
+
+:::
+
+## Search
+
+Vectorstores embed and store the documents that are added.
+If we pass in a query, the vectorstore will embed the query, perform a similarity search over the embedded documents, and return the most similar ones.
+This captures two important concepts: first, there needs to be a way to measure the similarity between the query and *any* [embedded](/docs/concepts/embedding_models/) document.
+Second, there needs to be an algorithm to efficiently perform this similarity search across *all* embedded documents.
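+
+To build intuition for the first point, here is a small illustrative sketch, using `numpy` directly rather than a vectorstore, that scores documents against a query by the cosine similarity of their embeddings:
+
+```python
+import numpy as np
+from langchain_openai import OpenAIEmbeddings
+
+embeddings_model = OpenAIEmbeddings()
+
+def cosine_similarity(a, b):
+    # Cosine similarity: dot product divided by the product of the vector norms
+    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
+
+doc_vectors = embeddings_model.embed_documents(["I love dogs", "Quantum computing is hard"])
+query_vector = embeddings_model.embed_query("What pets do you like?")
+
+# Higher score means the document is semantically closer to the query
+scores = [cosine_similarity(np.array(query_vector), np.array(d)) for d in doc_vectors]
+```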
+
+### Similarity metrics
+
+A critical advantage of embedding vectors is that they can be compared using simple mathematical operations:
+
+- **Cosine Similarity**: Measures the cosine of the angle between two vectors.
+- **Euclidean Distance**: Measures the straight-line distance between two points.
+- **Dot Product**: Measures the projection of one vector onto another.
+
+The similarity metric can sometimes be selected when initializing the vectorstore.
+As an example, Pinecone allows the user to select the [similarity metric on index creation](/docs/integrations/vectorstores/pinecone/#initialization).
+
+```python
+pc.create_index(
+    name=index_name,
+    dimension=3072,
+    metric="cosine",
+)
+```
+
+:::info[Further reading]
+
+* See [this documentation](https://developers.google.com/machine-learning/clustering/dnn-clustering/supervised-similarity) from Google on similarity metrics to consider with embeddings.
+* See Pinecone's [blog post](https://www.pinecone.io/learn/vector-similarity/) on similarity metrics.
+* See OpenAI's [FAQ](https://platform.openai.com/docs/guides/embeddings/faq) on what similarity metric to use with OpenAI embeddings.
+
+:::
+
+### Similarity Search
+
+Given a similarity metric to measure the distance between the embedded query and any embedded document, we need an algorithm to efficiently search over *all* the embedded documents to find the most similar ones.
+There are various ways to do this. As an example, many vectorstores implement [HNSW (Hierarchical Navigable Small World)](https://www.pinecone.io/learn/series/faiss/hnsw/), a graph-based index structure that allows for efficient similarity search.
+Regardless of the search algorithm used under the hood, the LangChain vectorstore interface has a `similarity_search` method for all integrations.
+This will take the search query, create an embedding, find similar documents, and return them as a list of [Documents](https://api.python.langchain.com/en/latest/documents/langchain_core.documents.base.Document.html).
+
+```python
+query = "my query"
+docs = vectorstore.similarity_search(query)
+```
+
+Many vectorstores support search parameters to be passed with the `similarity_search` method. See the documentation for the specific vectorstore you are using to see what parameters are supported.
+As an example, [Pinecone](https://python.langchain.com/api_reference/pinecone/vectorstores/langchain_pinecone.vectorstores.PineconeVectorStore.html#langchain_pinecone.vectorstores.PineconeVectorStore.similarity_search) accepts several parameters that reflect important general concepts.
+In particular, many vectorstores support [`k`](/docs/integrations/vectorstores/pinecone/#query-directly), which controls the number of Documents to return, and `filter`, which allows for filtering documents by metadata:
+
+- `query` (str): Text to look up documents similar to.
+- `k` (int): Number of Documents to return. Defaults to 4.
+- `filter` (dict | None): Dictionary of argument(s) to filter on metadata.
+
+:::info[Further reading]
+
+* See the [how-to guide](/docs/how_to/vectorstores/) for more details on how to use the `similarity_search` method.
+* See the [integrations page](/docs/integrations/vectorstores/) for more details on arguments that can be passed in to the `similarity_search` method for specific vectorstores.
+
+:::
+
+### Metadata filtering
+
+While vectorstores implement a search algorithm to efficiently search over *all* the embedded documents to find the most similar ones, many also support filtering on metadata.
+This allows structured filters to reduce the size of the similarity search space. These two concepts work well together:
+
+1. **Semantic search**: Query the unstructured data directly, often via embedding or keyword similarity.
+2. **Metadata search**: Apply a structured query to the metadata, filtering for specific documents.
+
+Vectorstore support for metadata filtering is typically dependent on the underlying vector store implementation.
+Here is example usage with [Pinecone](/docs/integrations/vectorstores/pinecone/#query-directly), in which we filter for all documents that have the metadata key `source` with value `tweet`:
+
+```python
+results = vector_store.similarity_search(
+    "LangChain provides abstractions to make working with LLMs easy",
+    k=2,
+    filter={"source": "tweet"},
+)
+```
+
+:::info[Further reading]
+
+* See Pinecone's [documentation](https://docs.pinecone.io/guides/data/filter-with-metadata) on filtering with metadata.
+* See the [list of LangChain vectorstore integrations](/docs/integrations/retrievers/self_query/) that support metadata filtering.
+
+:::
+
+## Advanced search and retrieval techniques
+
+While algorithms like HNSW provide the foundation for efficient similarity search in many cases, additional techniques can be employed to improve search quality and diversity.
+For example, [maximal marginal relevance](https://python.langchain.com/v0.1/docs/modules/model_io/prompts/example_selectors/mmr/) is a re-ranking algorithm applied after the initial similarity search to ensure a more diverse set of results.
+As a second example, some [vector stores](/docs/integrations/retrievers/pinecone_hybrid_search/) offer built-in [hybrid-search](https://docs.pinecone.io/guides/data/understanding-hybrid-search) to combine keyword and semantic similarity search, which marries the benefits of both approaches.
+At the moment, there is no unified way to perform hybrid search using LangChain vectorstores, but it is generally exposed as a keyword argument that is passed in with `similarity_search`.
+See this [how-to guide on hybrid search](/docs/how_to/hybrid/) for more details.
+
+| Name | When to use | Description |
+|------|-------------|-------------|
+| [Hybrid search](/docs/integrations/retrievers/pinecone_hybrid_search/) | When combining keyword-based and semantic similarity. | Hybrid search combines keyword and semantic similarity, marrying the benefits of both approaches. [Paper](https://arxiv.org/abs/2210.11934). |
+| [Maximal Marginal Relevance (MMR)](/docs/integrations/vectorstores/pinecone/#maximal-marginal-relevance-searches) | When needing to diversify search results. | MMR attempts to diversify the results of a search to avoid returning similar and redundant documents.
| + + diff --git a/docs/docs/concepts/why_langchain.mdx b/docs/docs/concepts/why_langchain.mdx new file mode 100644 index 00000000000..e69de29bb2d diff --git a/docs/static/img/embeddings_colbert.png b/docs/static/img/embeddings_colbert.png new file mode 100644 index 00000000000..9b1bc9d1379 Binary files /dev/null and b/docs/static/img/embeddings_colbert.png differ diff --git a/docs/static/img/embeddings_concept.png b/docs/static/img/embeddings_concept.png new file mode 100644 index 00000000000..692ed1d4dc6 Binary files /dev/null and b/docs/static/img/embeddings_concept.png differ diff --git a/docs/static/img/rag_concepts.png b/docs/static/img/rag_concepts.png new file mode 100644 index 00000000000..3093f925f05 Binary files /dev/null and b/docs/static/img/rag_concepts.png differ diff --git a/docs/static/img/retrieval_concept.png b/docs/static/img/retrieval_concept.png new file mode 100644 index 00000000000..93e9db6f4b1 Binary files /dev/null and b/docs/static/img/retrieval_concept.png differ diff --git a/docs/static/img/retrieval_high_level.png b/docs/static/img/retrieval_high_level.png new file mode 100644 index 00000000000..461fe773de0 Binary files /dev/null and b/docs/static/img/retrieval_high_level.png differ diff --git a/docs/static/img/retriever_concept.png b/docs/static/img/retriever_concept.png new file mode 100644 index 00000000000..4a288d3d49b Binary files /dev/null and b/docs/static/img/retriever_concept.png differ diff --git a/docs/static/img/retriever_full_docs.png b/docs/static/img/retriever_full_docs.png new file mode 100644 index 00000000000..a50ef823f5f Binary files /dev/null and b/docs/static/img/retriever_full_docs.png differ diff --git a/docs/static/img/structured_output.png b/docs/static/img/structured_output.png new file mode 100644 index 00000000000..00511a2a111 Binary files /dev/null and b/docs/static/img/structured_output.png differ diff --git a/docs/static/img/text_splitters.png b/docs/static/img/text_splitters.png new file mode 100644 index 00000000000..6f5c06a2174 Binary files /dev/null and b/docs/static/img/text_splitters.png differ diff --git a/docs/static/img/tool_call_example.png b/docs/static/img/tool_call_example.png new file mode 100644 index 00000000000..9e122f43f6e Binary files /dev/null and b/docs/static/img/tool_call_example.png differ diff --git a/docs/static/img/tool_calling_agent.png b/docs/static/img/tool_calling_agent.png new file mode 100644 index 00000000000..12bd9a33701 Binary files /dev/null and b/docs/static/img/tool_calling_agent.png differ diff --git a/docs/static/img/tool_calling_components.png b/docs/static/img/tool_calling_components.png new file mode 100644 index 00000000000..582fd7057c8 Binary files /dev/null and b/docs/static/img/tool_calling_components.png differ diff --git a/docs/static/img/tool_calling_concept.png b/docs/static/img/tool_calling_concept.png new file mode 100644 index 00000000000..7abdee69226 Binary files /dev/null and b/docs/static/img/tool_calling_concept.png differ diff --git a/docs/static/img/vectorstores.png b/docs/static/img/vectorstores.png new file mode 100644 index 00000000000..fb6604c1c81 Binary files /dev/null and b/docs/static/img/vectorstores.png differ diff --git a/docs/static/img/with_structured_output.png b/docs/static/img/with_structured_output.png new file mode 100644 index 00000000000..bf14853dc06 Binary files /dev/null and b/docs/static/img/with_structured_output.png differ