Add new types of document transformers (#7379)

- Description: Add two new document transformers that translates
documents into different languages and converts documents into q&a
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
  - Issue: N/A
  - Dependencies: `doctran = "^0.0.5"`
  - Tag maintainer: @rlancemartin @eyurtsev @hwchase17 
  - Twitter handle: @psychicapi or @jfan001

Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
Jason Fan
2023-07-12 20:53:30 -07:00
committed by GitHub
parent f11d845dee
commit 8effd90be0
17 changed files with 985 additions and 6 deletions

View File

@@ -24,7 +24,7 @@ That means there are two different axes along which you can customize your text
1. How the text is split
2. How the chunk size is measured
## Get started with text splitters
### Get started with text splitters
import GetStarted from "@snippets/modules/data_connection/document_transformers/get_started.mdx"

View File

@@ -8,7 +8,7 @@ Many LLM applications require user-specific data that is not part of the model's
building blocks to load, transform, store and query your data via:
- [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, drop redundant documents, and more
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
- [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
- [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
- [Retrievers](/docs/modules/data_connection/retrievers/): Query your data