docs: misc retrievers fixes (#9791)

Various miscellaneous fixes to most pages in the 'Retrievers' section of the documentation:

- "VectorStore" and "vectorstore" changed to "vector store" for consistency
- Various spelling, grammar, and formatting improvements for readability

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2025-09-05 13:06:03 +00:00 · 2023-09-03 20:26:49 -07:00
parent 8bc452a466
commit 16945c9922
39 changed files with 148 additions and 163 deletions
--- a/docs/docs_skeleton/docs/get_started/quickstart.mdx
+++ b/docs/docs_skeleton/docs/get_started/quickstart.mdx
@@ -59,8 +59,8 @@ LangChain provides several objects to easily distinguish between different roles
 If none of those roles sound right, there is also a `ChatMessage` class where you can specify the role manually.
 For more information on how to use these different messages most effectively, see our prompting guide.

-LangChain exposes a standard interface for both, but it's useful to understand this difference in order to construct prompts for a given language model.
-The standard interface that LangChain exposes has two methods:
+LangChain provides a standard interface for both, but it's useful to understand this difference in order to construct prompts for a given language model.
+The standard interface that LangChain provides has two methods:
 - `predict`: Takes in a string, returns a string
 - `predict_messages`: Takes in a list of messages, returns a message.

--- a/docs/docs_skeleton/docs/modules/data_connection/document_loaders/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/document_loaders/index.mdx
@@ -11,7 +11,7 @@ Use document loaders to load data from a source as `Document`'s. A `Document` is
 and associated metadata. For example, there are document loaders for loading a simple `.txt` file, for loading the text
 contents of any web page, or even for loading a transcript of a YouTube video.

-Document loaders expose a "load" method for loading data as documents from a configured source. They optionally
+Document loaders provide a "load" method for loading data as documents from a configured source. They optionally
 implement a "lazy load" as well for lazily loading data into memory.

 ## Get started
--- a/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/character_text_splitter.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/character_text_splitter.mdx
@@ -2,8 +2,8 @@

 This is the simplest method. This splits based on characters (by default "\n\n") and measure chunk length by number of characters.

-1. How the text is split: by single character
-2. How the chunk size is measured: by number of characters
+1. How the text is split: by single character.
+2. How the chunk size is measured: by number of characters.

 import Example from "@snippets/modules/data_connection/document_transformers/text_splitters/character_text_splitter.mdx"

--- a/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/code_splitter.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/code_splitter.mdx
@@ -1,6 +1,6 @@
 # Split code

-CodeTextSplitter allows you to split your code with multiple language support. Import enum `Language` and specify the language. 
+CodeTextSplitter allows you to split your code with multiple languages supported. Import enum `Language` and specify the language. 

 import Example from "@snippets/modules/data_connection/document_transformers/text_splitters/code_splitter.mdx"

--- a/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter.mdx
@@ -2,8 +2,8 @@

 This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is `["\n\n", "\n", " ", ""]`. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.

-1. How the text is split: by list of characters
-2. How the chunk size is measured: by number of characters
+1. How the text is split: by list of characters.
+2. How the chunk size is measured: by number of characters.

 import Example from "@snippets/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter.mdx"

--- a/docs/docs_skeleton/docs/modules/data_connection/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/index.mdx
@@ -37,7 +37,7 @@ efficiently find other pieces of text that are similar.
 LangChain provides integrations with over 25 different embedding providers and methods,
 from open-source to proprietary API,
 allowing you to choose the one best suited for your needs.
-LangChain exposes a standard interface, allowing you to easily swap between models.
+LangChain provides a standard interface, allowing you to easily swap between models.

 **[Vector stores](/docs/modules/data_connection/vectorstores/)**

@@ -55,7 +55,7 @@ However, we have also added a collection of algorithms on top of this to increas
 These include:

 - [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.
-- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the *semantic* part of a query from other *metadata filters* present in the query
+- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the *semantic* part of a query from other *metadata filters* present in the query.
 - [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.
 - And more!

--- a/docs/docs_skeleton/docs/modules/data_connection/retrievers/contextual_compression/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/retrievers/contextual_compression/index.mdx
@@ -5,10 +5,10 @@ One challenge with retrieval is that usually you don't know the specific queries
 Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.

 To use the Contextual Compression Retriever, you'll need:
-- a base Retriever
+- a base retriever
 - a Document Compressor

-The Contextual Compression Retriever passes queries to the base Retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of Documents and shortens it by reducing the contents of Documents or dropping Documents altogether.
+The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether.

 ![](https://drive.google.com/uc?id=1CtNgWODXZudxAWSRiWgSGEoTNrUFT98v)

--- a/docs/docs_skeleton/docs/modules/data_connection/retrievers/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/retrievers/index.mdx
@@ -8,7 +8,7 @@ Head to [Integrations](/docs/integrations/retrievers/) for documentation on buil
 :::

 A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store.
-A retriever does not need to be able to store documents, only to return (or retrieve) it. Vector stores can be used
+A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used
 as the backbone of a retriever, but there are other types of retrievers as well.

 ## Get started
--- a/docs/docs_skeleton/docs/modules/data_connection/retrievers/vectorstore.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/retrievers/vectorstore.mdx
@@ -1,9 +1,9 @@
 # Vector store-backed retriever

-A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the Vector Store class to make it conform to the Retriever interface.
+A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class to make it conform to the retriever interface.
 It uses the search methods implemented by a vector store, like similarity search and MMR, to query the texts in the vector store.

-Once you construct a Vector store, it's very easy to construct a retriever. Let's walk through an example.
+Once you construct a vector store, it's very easy to construct a retriever. Let's walk through an example.

 import Example from "@snippets/modules/data_connection/retrievers/how_to/vectorstore.mdx"

--- a/docs/docs_skeleton/docs/modules/data_connection/text_embedding/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/text_embedding/index.mdx
@@ -11,7 +11,7 @@ The Embeddings class is a class designed for interfacing with text embedding mod

 Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

-The base Embeddings class in LangChain exposes two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).
+The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself).

 ## Get started

--- a/docs/docs_skeleton/docs/modules/data_connection/vectorstores/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/vectorstores/index.mdx
@@ -16,7 +16,7 @@ for you.

 ## Get started

-This walkthrough showcases basic functionality related to VectorStores. A key part of working with vector stores is creating the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself with the [text embedding model](/docs/modules/data_connection/text_embedding/) interfaces before diving into this.
+This walkthrough showcases basic functionality related to vector stores. A key part of working with vector stores is creating the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself with the [text embedding model](/docs/modules/data_connection/text_embedding/) interfaces before diving into this.

 import GetStarted from "@snippets/modules/data_connection/vectorstores/get_started.mdx"