mirror of https://github.com/hwchase17/langchain.git (synced 2025-05-21 23:17:48 +00:00)
fix: to rag-semi-structured template (#14568)
**Description:** Fixes to the rag-semi-structured template.
- Added required libraries.
- pdfminer caused issues when installing with pip; pdfminer.six works best.
- Changed the demo PDF name from llama2 to llava.
parent a019183a01
commit a4992ffada
@@ -8,7 +8,7 @@ authors = [
 readme = "README.md"
 
 [tool.poetry.dependencies]
-python = ">=3.8.1,<4.0"
+python = ">=3.9,<3.11"
 langchain = ">=0.0.325"
 tiktoken = ">=0.5.1"
 chromadb = ">=0.4.14"
@@ -16,6 +16,12 @@ openai = "<2"
 unstructured = ">=0.10.19"
 pdf2image = ">=1.16.3"
 pdfminer = "^20191125"
+opencv-python = "^4.8.1.78"
+pandas = "^2.1.4"
+pytesseract = "^0.3.10"
+pdfminer-six = "^20221105"
+unstructured-pytesseract = "^0.3.12"
+unstructured-inference = "^0.7.18"
 
 [tool.poetry.group.dev.dependencies]
 langchain-cli = ">=0.0.15"
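The narrowed `python = ">=3.9,<3.11"` constraint means the template is only expected to install on Python 3.9 and 3.10. As a minimal sketch of what that range admits (the helper `python_supported` is hypothetical and not part of the template), the same bounds can be checked at runtime:

```python
import sys

# Bounds copied from the pyproject.toml constraint python = ">=3.9,<3.11".
MIN_VERSION = (3, 9)   # inclusive lower bound
MAX_VERSION = (3, 11)  # exclusive upper bound


def python_supported(version=None):
    """Return True if (major, minor) falls inside >=3.9,<3.11.

    Hypothetical helper for illustration only; it defaults to the
    version of the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    # Compare only (major, minor); patch releases do not affect this range.
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION


if __name__ == "__main__":
    print(python_supported())
```

A check like this only mirrors the declared range; Poetry itself enforces the constraint at install time.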
@@ -16,7 +16,7 @@ from unstructured.partition.pdf import partition_pdf
 # Path to docs
 path = "docs"
 raw_pdf_elements = partition_pdf(
-    filename=path + "LLaMA2.pdf",
+    filename=path + "/LLaVA.pdf",
     # Unstructured first finds embedded image blocks
     extract_images_in_pdf=False,
     # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles
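The old line concatenated `path + "LLaMA2.pdf"` with no separator, which yields `docsLLaMA2.pdf`; the change adds the `/` by hand. An alternative sketch, which is not what this commit does, is to let `pathlib` join the pieces. The helper name `pdf_path` is hypothetical, and `partition_pdf` is deliberately not called here:

```python
from pathlib import Path


def pdf_path(docs_dir, pdf_name):
    """Join a directory and a file name with the correct separator.

    Hypothetical helper for illustration; not part of the template.
    """
    return Path(docs_dir) / pdf_name


if __name__ == "__main__":
    # Plain string concatenation silently drops the separator.
    print("docs" + "LLaMA2.pdf")
    # pathlib inserts it; as_posix() renders forward slashes on any OS.
    print(pdf_path("docs", "LLaVA.pdf").as_posix())
```

If this approach were used in the template, the result would presumably be passed as `filename=str(pdf_path(path, "LLaVA.pdf"))`.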