Update multi-modal template README.md (#14991)
# rag-chroma-multi-modal
Multi-modal LLMs enable visual assistants that can perform question-answering about images.
This template creates a visual assistant for slide decks, which often contain visuals such as graphs or figures.
It uses OpenCLIP embeddings to embed all of the slide images and stores them in Chroma.
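
For orientation, the ingestion step looks roughly like this (a minimal sketch, not the template's exact `ingest.py`; the image directory, collection name, and persistence path are illustrative):

```python
from pathlib import Path

from langchain_community.vectorstores import Chroma
from langchain_experimental.open_clip import OpenCLIPEmbeddings

# Gather the slide images exported from the deck (directory is illustrative).
image_uris = sorted(str(p) for p in Path("docs/img").glob("*.jpg"))

# Embed each image with OpenCLIP and index it in a persistent Chroma collection.
vectorstore = Chroma(
    collection_name="multi-modal-rag",
    persist_directory="./chroma_db",
    embedding_function=OpenCLIPEmbeddings(),
)
vectorstore.add_images(uris=image_uris)
```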
Given a question, relevant slides are retrieved and passed to GPT-4V for answer synthesis.
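
Answer synthesis can be sketched as follows, assuming a collection built as above; the question, `k`, and model settings are illustrative, and the template's own chain is the authoritative version:

```python
from langchain_community.chat_models import ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.messages import HumanMessage
from langchain_experimental.open_clip import OpenCLIPEmbeddings

# Reopen the persisted collection built at ingestion time.
vectorstore = Chroma(
    collection_name="multi-modal-rag",
    persist_directory="./chroma_db",
    embedding_function=OpenCLIPEmbeddings(),
)

question = "What was the company's Q3 revenue?"  # illustrative question

# Retrieve the most similar slides; Chroma returns them base64-encoded
# when they were indexed with add_images.
docs = vectorstore.as_retriever(search_kwargs={"k": 2}).get_relevant_documents(question)

# Pass the question plus the retrieved slide images to GPT-4V for answer synthesis.
model = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=1024)
message = HumanMessage(
    content=[{"type": "text", "text": question}]
    + [
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{doc.page_content}"},
        }
        for doc in docs
    ]
)
print(model.invoke([message]).content)
```
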

## Input

...

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/rag-chroma-multi-modal")
```
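
The remote chain can then be invoked like any local runnable (the example question is illustrative):

```python
response = runnable.invoke("How many customers does Datadog have?")
print(response)
```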