Mirror of https://github.com/hwchase17/langchain.git (synced 2025-09-10 07:21:03 +00:00)
Add new types of document transformers (#7379)
- Description: Add two new document transformers that translate documents into different languages and convert documents into Q&A format to improve vector search results. Uses OpenAI function calling via the [doctran](https://github.com/psychic-api/doctran/tree/main) library.
- Issue: N/A
- Dependencies: `doctran = "^0.0.5"`
- Tag maintainer: @rlancemartin @eyurtsev @hwchase17
- Twitter handle: @psychicapi or @jfan001

Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in #3182
- Refactored `EmbeddingsRedundantFilter` to put it in a file under a new `document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as well as the existing `EmbeddingsRedundantFilter`

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
@@ -45,3 +45,13 @@ print(texts[1])
```

</CodeOutputBlock>


## Other transformations:
### Filter redundant docs, translate docs, extract metadata, and more

We can perform a number of transformations on docs beyond simply splitting the text. With the
`EmbeddingsRedundantFilter` we can identify similar documents and filter out redundancies. With integrations like
[doctran](https://github.com/psychic-api/doctran/tree/main) we can do things like translate documents from one language
to another, extract desired properties and add them to metadata, and convert conversational dialogue into a set of
documents in Q/A format.
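To make the idea of redundancy filtering concrete, here is a minimal, self-contained sketch of the general technique: embed each document, then drop any document whose cosine similarity to an already-kept document reaches a threshold. This is not the `EmbeddingsRedundantFilter` implementation and does not use the LangChain API. `toy_embed`, `filter_redundant`, and the `0.95` threshold are hypothetical names and values chosen for illustration, and the bag-of-words vector merely stands in for a real embedding model.

```python
import math
from collections import Counter
from typing import List


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_redundant(docs: List[str], threshold: float = 0.95) -> List[str]:
    """Keep a document only if it is below the threshold against every kept document."""
    kept: List[str] = []
    kept_vectors: List[Counter] = []
    for doc in docs:
        vec = toy_embed(doc)
        if all(cosine_similarity(vec, other) < threshold for other in kept_vectors):
            kept.append(doc)
            kept_vectors.append(vec)
    return kept


docs = [
    "The cat sat on the mat.",
    "the cat sat on the mat.",  # differs from the first only in letter case
    "Vector search retrieves documents by embedding similarity.",
]
unique_docs = filter_redundant(docs)
```

With these inputs the second document is treated as a duplicate of the first, so `unique_docs` holds the first and third entries. A real pipeline would swap `toy_embed` for an actual embedding model and tune the threshold to the data.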