mirror of https://github.com/hwchase17/langchain.git (synced 2025-07-06 05:08:20 +00:00)
docs: unstructured no longer requires installing detectron2 from source (#5524)
# Update Unstructured docs to remove the `detectron2` install instructions

Removes `detectron2` installation instructions from the Unstructured docs because installing `detectron2` is no longer required for `unstructured>=0.7.0`. The `detectron2` model now runs using the ONNX runtime.

## Who can review?

@hwchase17 @eyurtsev
parent d765d77e9b
commit 4c8aad0d1b
@@ -4,8 +4,7 @@
 [Unstructured.IO](https://www.unstructured.io/) extracts clean text from raw source documents like
 PDFs and Word documents.
 This page covers how to use the [`unstructured`](https://github.com/Unstructured-IO/unstructured)
 ecosystem within LangChain.
 
-
 ## Installation and Setup
 
@@ -20,12 +19,6 @@ its dependencies running locally.
 - `tesseract-ocr`(images and PDFs)
 - `libreoffice` (MS Office docs)
 - `pandoc` (EPUBs)
-- If you are parsing PDFs using the `"hi_res"` strategy, run the following to install the `detectron2` model, which
-`unstructured` uses for layout detection:
-- `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@e2ce8dc#egg=detectron2"`
-- If `detectron2` is not installed, `unstructured` will fallback to processing PDFs
-using the `"fast"` strategy, which uses `pdfminer` directly and doesn't require
-`detectron2`.
 
 If you want to get up and running with less set up, you can
 simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or
@@ -19,7 +19,6 @@
 "source": [
 "# # Install package\n",
 "!pip install \"unstructured[local-inference]\"\n",
-"!pip install \"detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2\"\n",
 "!pip install layoutparser[layoutmodels,tesseract]"
 ]
 },
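Note for readers applying this change: the commit message ties the removal to `unstructured>=0.7.0`. A minimal sketch of how one might check whether an environment still falls under the old instructions is shown here. The helper names `parse_release` and `needs_detectron2_from_source` are hypothetical and not part of `unstructured` or LangChain; only the `0.7.0` threshold comes from the commit message, and pre-release suffixes are ignored as a simplifying assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from the commit message: detectron2 no longer needs a
# from-source install for unstructured>=0.7.0.
ONNX_MIN_VERSION = (0, 7, 0)


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric release components of a version string.

    Suffixes such as ".dev1" or "rc1" are dropped (an assumption made to keep
    this sketch stdlib-only), so "0.7.0rc1" is treated like "0.7.0".
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def needs_detectron2_from_source(version_string: str) -> bool:
    """True if the given unstructured version predates the 0.7.0 threshold."""
    release = parse_release(version_string)
    # Pad short versions such as "0.7" to three components before comparing.
    release = release + (0,) * (3 - len(release))
    return release[:3] < ONNX_MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("unstructured")
    except PackageNotFoundError:
        print("unstructured is not installed")
    else:
        print(installed, needs_detectron2_from_source(installed))
```

Running the script prints the installed `unstructured` version followed by whether the old from-source `detectron2` step would still apply to it.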