From 1db7450bc29b2c9596909dfd4531851ded1f465e Mon Sep 17 00:00:00 2001
From: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
Date: Wed, 20 Dec 2023 20:07:20 -0800
Subject: [PATCH] Update Gemini template README.md (#14993)

---
 templates/rag-gemini-multi-modal/README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/templates/rag-gemini-multi-modal/README.md b/templates/rag-gemini-multi-modal/README.md
index 49d736fb0f7..8d890ca440a 100644
--- a/templates/rag-gemini-multi-modal/README.md
+++ b/templates/rag-gemini-multi-modal/README.md
@@ -1,13 +1,15 @@
 # rag-gemini-multi-modal
 
-Multi-modal LLMs enable text-to-image retrieval and question-answering over images.
+Multi-modal LLMs enable visual assistants that can perform question-answering about images.
 
-You can ask questions in natural language about a collection of photos, retrieve relevant ones, and have a multi-modal LLM answer questions about the retrieved images.
+This template creates a visual assistant for slide decks, which often contain visuals such as graphs or figures.
 
-This template performs text-to-image retrieval for question-answering about a slide deck, which often contains visual elements that are not captured in standard RAG.
+It uses OpenCLIP embeddings to embed all of the slide images and stores them in Chroma.
 
-This will use OpenCLIP embeddings and [Google Gemini](https://deepmind.google/technologies/gemini/#introduction) for answer synthesis.
+Given a question, relevant slides are retrieved and passed to [Google Gemini](https://deepmind.google/technologies/gemini/#introduction) for answer synthesis.
+
+![mm-mmembd](https://github.com/langchain-ai/langchain/assets/122662504/b9e69bef-d687-4ecf-a599-937e559d5184)
 
 ## Input
 
@@ -112,4 +114,4 @@ We can access the template from code with:
 from langserve.client import RemoteRunnable
 
 runnable = RemoteRunnable("http://localhost:8000/rag-gemini-multi-modal")
-```
\ No newline at end of file
+```
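
For context on the flow the updated README describes (embed slide images with OpenCLIP, store them in Chroma, retrieve relevant slides for a question), a rough sketch is shown below. This code is not part of the patch: the class and method names (`OpenCLIPEmbeddings`, `Chroma.add_images`) and the `docs/` image path are assumptions based on the libraries the README names, and the template's own ingest script remains the authoritative version.

```python
# Rough sketch (not from this patch) of the indexing/retrieval flow the
# README describes. Assumes OpenCLIP embeddings via langchain_experimental,
# the Chroma vector store's add_images helper, and a docs/ folder of slide
# images exported as PNGs.
from pathlib import Path

from langchain_community.vectorstores import Chroma
from langchain_experimental.open_clip import OpenCLIPEmbeddings

# Embed every exported slide image with OpenCLIP and store it in Chroma.
image_uris = sorted(str(p) for p in Path("docs/").glob("*.png"))
vectorstore = Chroma(
    collection_name="multi-modal-rag",
    embedding_function=OpenCLIPEmbeddings(),
)
vectorstore.add_images(uris=image_uris)

# Given a question, retrieve the most relevant slides; the template's chain
# then passes the retrieved images to Gemini for answer synthesis.
retriever = vectorstore.as_retriever()
docs = retriever.get_relevant_documents("What does the revenue chart show?")
```

The answer-synthesis step (sending the retrieved slide images together with the question to Gemini) is handled by the template's chain, which is what the `rag-gemini-multi-modal` endpoint in the second hunk exposes.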