embeddings: nomic embed vision (#22482)

Thank you for contributing to LangChain!

**Description:** Adds Langchain support for Nomic Embed Vision
**Twitter handle:** nomic_ai,zach_nussbaum

- [x] **Add tests and docs**: If you're adding a new integration, please include
  1. a test for the integration, preferably unit tests that do not rely on network access,
  2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-09-12 00:11:17 +00:00 · 2024-06-05 12:47:17 -04:00
parent 3280a5b49b
commit 14f3014cce
9 changed files with 543 additions and 29 deletions
--- a/templates/rag-multi-modal-local/README.md
+++ b/templates/rag-multi-modal-local/README.md
@@ -7,11 +7,11 @@ With the release of open source, multi-modal LLMs it's possible to build this ki

 This template demonstrates how to perform private visual search and question-answering over a collection of your photos.

-It uses OpenCLIP embeddings to embed all of the photos and stores them in Chroma.
+It uses [`nomic-embed-vision-v1`](https://huggingface.co/nomic-ai/nomic-embed-vision-v1) multi-modal embeddings to embed the images and `Ollama` for question-answering.
 
 Given a question, relevant photos are retrieved and passed to an open source multi-modal LLM of your choice for answer synthesis.
 
-![Diagram illustrating the visual search process with OpenCLIP embeddings and multi-modal LLM for question-answering, featuring example food pictures and a matcha soft serve answer trace.](https://github.com/langchain-ai/langchain/assets/122662504/da543b21-052c-4c43-939e-d4f882a45d75 "Visual Search Process Diagram")
+![Diagram illustrating the visual search process with nomic-embed-vision-v1 embeddings and multi-modal LLM for question-answering, featuring example food pictures and a matcha soft serve answer trace.](https://github.com/langchain-ai/langchain/assets/122662504/da543b21-052c-4c43-939e-d4f882a45d75 "Visual Search Process Diagram")

 ## Input

@@ -34,22 +34,23 @@ python ingest.py

 ## Storage

-This template will use [OpenCLIP](https://github.com/mlfoundations/open_clip) multi-modal embeddings to embed the images.
-
-You can select different embedding model options (see results [here](https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv)).
+This template will use [nomic-embed-vision-v1](https://huggingface.co/nomic-ai/nomic-embed-vision-v1) multi-modal embeddings to embed the images.

 The first time you run the app, it will automatically download the multimodal embedding model.

-By default, LangChain will use an embedding model with moderate performance but lower memory requirments, `ViT-H-14`.

-You can choose alternative `OpenCLIPEmbeddings` models in `rag_chroma_multi_modal/ingest.py`:
+You can choose alternative models in `rag_chroma_multi_modal/ingest.py`, such as `OpenCLIPEmbeddings`.
 ```
+from langchain_experimental.open_clip import OpenCLIPEmbeddings
+
+embedding_function=OpenCLIPEmbeddings(
+        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
+        )
+
 vectorstore_mmembd = Chroma(
    collection_name="multi-modal-rag",
    persist_directory=str(re_vectorstore_path),
-    embedding_function=OpenCLIPEmbeddings(
-        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
-    ),
+    embedding_function=embedding_function
 )
 ```

--- a/templates/rag-multi-modal-local/ingest.py
+++ b/templates/rag-multi-modal-local/ingest.py
@@ -2,7 +2,7 @@ import os
 from pathlib import Path

 from langchain_community.vectorstores import Chroma
-from langchain_experimental.open_clip import OpenCLIPEmbeddings
+from langchain_nomic import NomicMultimodalEmbeddings

 # Load images
 img_dump_path = Path(__file__).parent / "docs/"
@@ -21,7 +21,9 @@ re_vectorstore_path = vectorstore.relative_to(Path.cwd())

 # Load embedding function
 print("Loading embedding function")
-embedding = OpenCLIPEmbeddings(model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k")
+embedding = NomicMultimodalEmbeddings(
+    vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
+)

 # Create chroma
 vectorstore_mmembd = Chroma(
--- a/templates/rag-multi-modal-local/rag_multi_modal_local/chain.py
+++ b/templates/rag-multi-modal-local/rag_multi_modal_local/chain.py
@@ -9,7 +9,7 @@ from langchain_core.messages import HumanMessage
 from langchain_core.output_parsers import StrOutputParser
 from langchain_core.pydantic_v1 import BaseModel
 from langchain_core.runnables import RunnableLambda, RunnablePassthrough
-from langchain_experimental.open_clip import OpenCLIPEmbeddings
+from langchain_nomic import NomicMultimodalEmbeddings
 from PIL import Image


@@ -102,8 +102,8 @@ def multi_modal_rag_chain(retriever):
 vectorstore_mmembd = Chroma(
    collection_name="multi-modal-rag",
    persist_directory=str(Path(__file__).parent.parent / "chroma_db_multi_modal"),
-    embedding_function=OpenCLIPEmbeddings(
-        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
+    embedding_function=NomicMultimodalEmbeddings(
+        vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
    ),
 )
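
Note on the swap above: the three hunks change only the object passed as `embedding_function`, which suggests the vector store depends only on a small embedding interface. The following is a minimal sketch of that idea, not the template's code. It assumes the interface consists of `embed_image(uris=...)` for ingestion and `embed_query(text)` for search. `ToyMultimodalEmbeddings`, `VOCAB`, `cosine`, and `retrieve` are hypothetical names introduced for illustration, and the toy class embeds file names rather than image pixels so that it runs without downloading a model.

```python
import math
import re
from typing import List

# Hypothetical vocabulary for the toy embedding; a real model learns its own space.
VOCAB = ["matcha", "ramen", "sushi", "soft", "serve", "bowl"]


class ToyMultimodalEmbeddings:
    """Hypothetical stand-in for a multi-modal embedding class.

    It embeds the words found in a string (a question or an image file name),
    so text and "images" land in the same vector space.
    """

    def _embed(self, text: str) -> List[float]:
        tokens = re.findall(r"[a-z]+", text.lower())
        return [float(tokens.count(word)) for word in VOCAB]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(text) for text in texts]

    def embed_image(self, uris: List[str]) -> List[List[float]]:
        # Assumption: images are identified by URI, as in the ingest script.
        return [self._embed(uri) for uri in uris]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(embedding, uris: List[str], question: str, k: int = 1) -> List[str]:
    """Rank image URIs by similarity between the question and each image vector."""
    image_vectors = embedding.embed_image(uris=uris)
    query_vector = embedding.embed_query(question)
    ranked = sorted(
        zip(uris, image_vectors),
        key=lambda pair: cosine(query_vector, pair[1]),
        reverse=True,
    )
    return [uri for uri, _ in ranked[:k]]


if __name__ == "__main__":
    photos = ["docs/ramen_bowl.jpg", "docs/matcha_soft_serve.jpg"]
    print(retrieve(ToyMultimodalEmbeddings(), photos, "What flavor is the matcha soft serve?"))
```

Because `retrieve` only calls `embed_image` and `embed_query`, any embedding object exposing those two methods could be passed in its place. One practical consequence, under the same assumption: vectors produced by different embedding models are not comparable, so a store ingested with one model would likely need to be re-ingested after switching to another.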