Update multi-modal multi-vector template README.md (#14992)
parent 448b4d3522
commit 8996d1a65d
@@ -1,13 +1,15 @@
# rag-chroma-multi-modal-multi-vector

-Multi-modal LLMs enable text-to-image retrieval and question-answering over images.
+Multi-modal LLMs enable visual assistants that can perform question-answering about images.

-You can ask questions in natural language about a collection of photos, retrieve relevant ones, and have a multi-modal LLM answer questions about the retrieved images.
+This template creates a visual assistant for slide decks, which often contain visuals such as graphs or figures.

-This template performs text-to-image retrieval for question-answering about a slide deck, which often contains visual elements that are not captured in standard RAG.
+It uses GPT-4V to create image summaries for each slide, embeds the summaries, and stores them in Chroma.

-This will use GPT-4V for image captioning and answer synthesis.
+Given a question, relevant slides are retrieved and passed to GPT-4V for answer synthesis.

![image]

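The indexing approach described in the updated text above (GPT-4V slide summaries embedded into Chroma, with the raw slide images kept alongside them) is LangChain's multi-vector retriever pattern. Below is a minimal sketch of that pattern, not the template's actual ingest code: the `slide_images_b64` input, the summary prompt, the in-memory docstore, and the collection name are placeholder assumptions, and import paths can differ across LangChain versions.

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.chat_models import ChatOpenAI
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage

# Placeholder input: base64-encoded JPEGs, one per slide.
slide_images_b64: list[str] = []

# GPT-4V writes a short text summary of each slide image.
gpt4v = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=256)


def summarize_slide(image_b64: str) -> str:
    response = gpt4v.invoke(
        [
            HumanMessage(
                content=[
                    {"type": "text", "text": "Summarize this slide for retrieval."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ]
            )
        ]
    )
    return response.content


summaries = [summarize_slide(img) for img in slide_images_b64]

# The summaries are embedded and stored in Chroma; the raw images go into a
# separate docstore, linked to their summaries by a shared id.
id_key = "doc_id"
retriever = MultiVectorRetriever(
    vectorstore=Chroma(
        collection_name="slide-summaries", embedding_function=OpenAIEmbeddings()
    ),
    docstore=InMemoryStore(),
    id_key=id_key,
)

ids = [str(uuid.uuid4()) for _ in slide_images_b64]
retriever.vectorstore.add_documents(
    [Document(page_content=s, metadata={id_key: i}) for s, i in zip(summaries, ids)]
)
retriever.docstore.mset(list(zip(ids, slide_images_b64)))
```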
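And a sketch of the retrieval and answer-synthesis step described above, continuing from the indexing sketch (it reuses `retriever` and `gpt4v`). The retriever matches the question against the text summaries but hands back the raw base64 slide images, which are then sent to GPT-4V together with the question; the prompt wording and the single-slide cutoff are assumptions rather than the template's actual chain.

```python
from langchain_core.messages import HumanMessage

question = "What does the revenue chart show?"  # placeholder question

# Similarity search runs over the summaries; the docstore returns the raw
# base64 slide images linked to the best-matching summaries.
retrieved_slides = retriever.get_relevant_documents(question)[:1]

# Build one multi-modal message: the question plus the retrieved slide image(s).
content = [{"type": "text", "text": f"Answer from the slides provided: {question}"}]
for image_b64 in retrieved_slides:
    content.append(
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}}
    )

# GPT-4V synthesizes the answer from the question and the slide image(s).
answer = gpt4v.invoke([HumanMessage(content=content)])
print(answer.content)
```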
## Input

@@ -124,4 +126,4 @@ We can access the template from code with:

from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/rag-chroma-multi-modal-multi-vector")
```
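Not part of the README diff, but as a usage note: once `runnable` is constructed as above, it can be invoked like any other runnable. The plain question string below is an assumption about the chain's input schema; check the template's playground for the exact format.

```python
# Hypothetical invocation; these RAG templates typically accept a question string.
response = runnable.invoke("What does the chart on the third slide show?")
print(response)
```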