This PR makes two main changes:

1. Rewrites the DocugamiLoader so that it no longer parses the DGML XML format internally and instead uses the `dgml-utils` library, which we are developing separately. This is a very lightweight dependency.
2. Adds MMR as a search type option for the multi-vector retriever, similar to other retrievers. MMR is especially useful when using Docugami for RAG, because we deal with large sets of documents in which a few might be duplicates, and plain similarity-based search does not give great results in many cases.

We are @docugami on Twitter, and I am @tjaffri.

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
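The effect of MMR on near-duplicate documents, as described in the PR notes, can be illustrated with a small standalone sketch. This is not LangChain's implementation: the function names `cosine` and `mmr_select`, the toy 2-D vectors, and the `lambda_mult` value are all assumptions chosen for illustration. The idea is that each pick trades relevance to the query against similarity to documents already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance.

    lambda_mult=1.0 behaves like plain similarity search; lower values
    penalize candidates that resemble ones already selected.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already picked (0.0 if nothing yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): candidates 0 and 1 are exact duplicates.
query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.1], [0.8, -0.6]]

# Plain similarity ranking returns both duplicates.
by_similarity = sorted(
    range(len(candidates)),
    key=lambda i: cosine(query, candidates[i]),
    reverse=True,
)[:2]
print(by_similarity)  # [0, 1]

# MMR skips the duplicate in favor of the less similar document.
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # [0, 2]
```

With these toy vectors, plain similarity spends both result slots on the duplicated document, while the MMR pass keeps one copy and fills the second slot with a different document.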
# Docugami

>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests
> of XML semantic trees representing entire documents. This is a rich representation that includes the semantic and
> structural characteristics of various chunks in the document as an XML tree.

## Installation and Setup

```bash
pip install dgml-utils
```

## Document Loader

See a [usage example](/docs/integrations/document_loaders/docugami).

```python
from langchain.document_loaders import DocugamiLoader
```
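
As a rough sketch of how the loader might be used once imported: the helper name `load_docset` and the placeholder IDs below are hypothetical, and the sketch assumes the loader accepts `docset_id` and `document_ids` arguments and reads its access token from the `DOCUGAMI_API_KEY` environment variable. Check the linked usage example for the options supported by your installed version.

```python
import os


def load_docset(docset_id, document_ids=None):
    """Load chunks for a Docugami docset (hypothetical helper, not part of LangChain)."""
    # Assumption: the loader picks up its token from DOCUGAMI_API_KEY.
    # Fail early with a clear message instead of a failed API call.
    if not os.environ.get("DOCUGAMI_API_KEY"):
        raise RuntimeError("Set DOCUGAMI_API_KEY before loading from Docugami")

    # Imported lazily so this module can be imported without langchain installed.
    from langchain.document_loaders import DocugamiLoader

    loader = DocugamiLoader(docset_id=docset_id, document_ids=document_ids)
    return loader.load()


if __name__ == "__main__" and os.environ.get("DOCUGAMI_API_KEY"):
    # Placeholder IDs: replace with a docset and document from your own workspace.
    docs = load_docset("your-docset-id", document_ids=["your-document-id"])
    print(f"Loaded {len(docs)} chunks")
```

Passing `document_ids=None` is intended to leave the selection of documents to the loader's default behavior for the docset.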