mirror of
https://github.com/hwchase17/langchain.git
synced 2026-03-19 19:44:08 +00:00
Semi structured RAG
This template performs RAG on semi-structured data (e.g., a PDF with text and tables).
See this blog post for useful background context.
Data loading
We use partition_pdf from Unstructured to extract both table and text elements.
This will require some system-level package installations, e.g., on Mac:
brew install tesseract poppler
Chroma
Chroma is an open-source vector database.
This template will create and add documents to the vector database in chain.py.
These documents can be loaded from many sources.
LLM
Be sure that OPENAI_API_KEY is set in order to the OpenAI models.
Adding the template
Create your LangServe app:
langchain serve new my-app
cd my-app
Add template:
langchain serve add rag-semi-structured
Start server:
langchain start
See Jupyter notebook rag_semi_structured for various way to connect to the template.