langchain/docs/docs/integrations/providers/docugami.mdx
Taqi Jaffri 144710ad9a
langchain[minor]: Updated DocugamiLoader, includes breaking changes (#13265)
There are the following main changes in this PR:

1. Rewrite of the DocugamiLoader to not do any XML parsing of the DGML
format internally, and instead use the `dgml-utils` library we are
separately working on. This is a very lightweight dependency.
2. Added MMR search type as an option to multi-vector retriever, similar
to other retrievers. MMR is especially useful when using Docugami for
RAG since we deal with large sets of documents within which a few might
be duplicates and straight similarity based search doesn't give great
results in many cases.

We are @docugami on twitter, and I am @tjaffri

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2023-11-28 15:56:22 -08:00

21 lines
550 B
Plaintext

# Docugami
>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests
> of XML semantic trees representing entire documents. This is a rich representation that includes the semantic and
> structural characteristics of various chunks in the document as an XML tree.
## Installation and Setup
```bash
pip install dgml-utils
```
## Document Loader
See a [usage example](/docs/integrations/document_loaders/docugami).
```python
from langchain.document_loaders import DocugamiLoader
```