Update multi-modal multi-vector template README.md (#14992)

2025-07-03 11:47:49 +00:00 · 2023-12-20 20:07:12 -08:00 · 2023-12-20 20:07:12 -08:00 · 8996d1a65d
commit 8996d1a65d
parent 448b4d3522
1 changed files with 7 additions and 5 deletions
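The README text in this change describes a multi-vector retrieval flow. Each slide gets a GPT-4V summary, and the summaries are embedded and stored in Chroma. A question is matched against those summaries, and the matching slides themselves are handed to GPT-4V. The following is a minimal offline sketch of that indexing idea only. It is not the template's code. `MultiVectorIndex`, `embed`, and `cosine` are hypothetical names. A bag-of-words vector stands in for the real embedding model, and an in-memory list stands in for Chroma.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins: in the template, GPT-4V writes the summaries and a real
# embedding model plus Chroma store the vectors. Here word counts and an in-memory
# list play those roles so the retrieval flow can run without any services.

def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MultiVectorIndex:
    """Search over slide summaries, but return the original slide each summary describes."""

    def __init__(self) -> None:
        self.vectors: list[tuple[Counter, str]] = []  # (summary embedding, slide id)
        self.docstore: dict[str, str] = {}  # slide id -> raw slide (e.g. an image path)

    def add(self, slide_id: str, raw_slide: str, summary: str) -> None:
        # Keep the raw slide for answer synthesis; index only its summary.
        self.docstore[slide_id] = raw_slide
        self.vectors.append((embed(summary), slide_id))

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        # Rank summaries by similarity to the question, then map back to raw slides.
        q = embed(question)
        ranked = sorted(self.vectors, key=lambda pair: cosine(q, pair[0]), reverse=True)
        return [self.docstore[slide_id] for _, slide_id in ranked[:k]]

index = MultiVectorIndex()
index.add("s1", "slides/slide_01.jpg", "Bar chart of quarterly revenue growth by region")
index.add("s2", "slides/slide_02.jpg", "Team photo at the company offsite")
index.add("s3", "slides/slide_03.jpg", "Line graph of revenue versus operating cost")
print(index.retrieve("How did revenue change by region?", k=1))  # ['slides/slide_01.jpg']
```

The retriever searches text summaries, but what it returns is the raw slide. In the template, that image would then go to GPT-4V together with the question.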
--- a/templates/rag-chroma-multi-modal-multi-vector/README.md
+++ b/templates/rag-chroma-multi-modal-multi-vector/README.md
@@ -1,13 +1,15 @@
 # rag-chroma-multi-modal-multi-vector
-Multi-modal LLMs enable text-to-image retrieval and question-answering over images. 
+Multi-modal LLMs enable visual assistants that can perform question-answering about images. 
-You can ask questions in natural language about a collection of photos, retrieve relevant ones, and have a multi-modal LLM answer questions about the retrieved images.
+This template creates a visual assistant for slide decks, which often contain visuals such as graphs or figures.
-This template performs text-to-image retrieval for question-answering about a slide deck, which often contains visual elements that are not captured in standard RAG.
+It uses GPT-4V to create image summaries for each slide, embeds the summaries, and stores them in Chroma.
-This will use GPT-4V for image captioning and answer synthesis.
+Given a question, relevant slides are retrieved and passed to GPT-4V for answer synthesis.
 ![mm-captioning](https://github.com/langchain-ai/langchain/assets/122662504/5277ef6b-d637-43c7-8dc1-9b1567470503)
 ## Input