Add new types of document transformers (#7379)

- Description: Add two new document transformers that translate documents into different languages and convert documents into Q&A format to improve vector search results. They use OpenAI function calling via the [doctran](https://github.com/psychic-api/doctran/tree/main) library.
- Issue: N/A
- Dependencies: `doctran = "^0.0.5"`
- Tag maintainer: @rlancemartin @eyurtsev @hwchase17
- Twitter handle: @psychicapi or @jfan001

Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in #3182
- Refactored `EmbeddingsRedundantFilter` into its own file under a new `document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer`, and the existing `EmbeddingsRedundantFilter`

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-09-04 12:39:32 +00:00 · 2023-07-12 20:53:30 -07:00
parent f11d845dee
commit 8effd90be0
17 changed files with 985 additions and 6 deletions
--- a/docs/docs_skeleton/docs/modules/data_connection/document_transformers/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/document_transformers/index.mdx
@@ -24,7 +24,7 @@ That means there are two different axes along which you can customize your text
 1. How the text is split
 2. How the chunk size is measured

-## Get started with text splitters
+### Get started with text splitters

 import GetStarted from "@snippets/modules/data_connection/document_transformers/get_started.mdx"

--- a/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/_category_.yml
+++ b/docs/docs_skeleton/docs/modules/data_connection/document_transformers/text_splitters/_category_.yml
@@ -1 +1,2 @@
 label: 'Text splitters'
+position: 0
--- a/docs/docs_skeleton/docs/modules/data_connection/index.mdx
+++ b/docs/docs_skeleton/docs/modules/data_connection/index.mdx
@@ -8,7 +8,7 @@ Many LLM applications require user-specific data that is not part of the model's
 building blocks to load, transform, store and query your data via:

 - [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
-- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, drop redundant documents, and more
+- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
 - [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
 - [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
 - [Retrievers](/docs/modules/data_connection/retrievers/): Query your data