embeddings: nomic embed vision (#22482)
Thank you for contributing to LangChain!

**Description:** Adds LangChain support for Nomic Embed Vision.
**Twitter handle:** nomic_ai, zach_nussbaum

- [x] **Add tests and docs**: If you're adding a new integration, please include
  1. a test for the integration, preferably unit tests that do not rely on network access,
  2. an example notebook showing its use. It lives in the `docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
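The change is easiest to see in isolation. Below is a minimal sketch of using the new multimodal embeddings class on its own; the class name and constructor arguments are taken from the template diff further down, while the `embed_query`/`embed_documents` calls assume the class follows LangChain's standard `Embeddings` interface.

```
from langchain_nomic import NomicMultimodalEmbeddings

# Class and parameters as used in the template changes below.
embedding = NomicMultimodalEmbeddings(
    vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
)

# Assumes the standard LangChain Embeddings interface for text inputs.
query_vector = embedding.embed_query("a photo of a dog on the beach")
doc_vectors = embedding.embed_documents(["caption one", "caption two"])
```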
@@ -7,11 +7,11 @@ With the release of open source, multi-modal LLMs it's possible to build this ki
 
 This template demonstrates how to perform private visual search and question-answering over a collection of your photos.
 
-It uses OpenCLIP embeddings to embed all of the photos and stores them in Chroma.
+It uses [`nomic-embed-vision-v1`](https://huggingface.co/nomic-ai/nomic-embed-vision-v1) multi-modal embeddings to embed the images and `Ollama` for question-answering.
 
 Given a question, relevant photos are retrieved and passed to an open source multi-modal LLM of your choice for answer synthesis.
 
 
 
 
 ## Input
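To make the answer-synthesis step described above concrete, here is a hedged sketch of passing one retrieved photo to a local multi-modal model via `Ollama`. The model name (`bakllava`), the file path, and the base64 message format are illustrative assumptions, not lines taken from this diff.

```
import base64

from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage

# Assumption: a local multi-modal model served by Ollama (e.g. bakllava).
llm = ChatOllama(model="bakllava", temperature=0)

# Encode one retrieved photo as base64 (path is hypothetical).
with open("docs/photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Multi-modal message: the question text plus the image payload.
msg = HumanMessage(
    content=[
        {"type": "text", "text": "What is happening in this photo?"},
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{image_b64}"},
    ]
)
print(llm.invoke([msg]).content)
```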
@@ -34,22 +34,23 @@ python ingest.py
 
 ## Storage
 
-This template will use [OpenCLIP](https://github.com/mlfoundations/open_clip) multi-modal embeddings to embed the images.
-
-You can select different embedding model options (see results [here](https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv)).
+This template will use [nomic-embed-vision-v1](https://huggingface.co/nomic-ai/nomic-embed-vision-v1) multi-modal embeddings to embed the images.
 
 The first time you run the app, it will automatically download the multimodal embedding model.
 
-By default, LangChain will use an embedding model with moderate performance but lower memory requirements, `ViT-H-14`.
-
-You can choose alternative `OpenCLIPEmbeddings` models in `rag_chroma_multi_modal/ingest.py`:
+You can choose alternative models in `rag_chroma_multi_modal/ingest.py`, such as `OpenCLIPEmbeddings`.
 
 ```
+from langchain_experimental.open_clip import OpenCLIPEmbeddings
+
+embedding_function=OpenCLIPEmbeddings(
+    model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
+)
+
 vectorstore_mmembd = Chroma(
     collection_name="multi-modal-rag",
     persist_directory=str(re_vectorstore_path),
-    embedding_function=OpenCLIPEmbeddings(
-        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
-    ),
+    embedding_function=embedding_function
 )
 ```
@@ -2,7 +2,7 @@ import os
 from pathlib import Path
 
 from langchain_community.vectorstores import Chroma
-from langchain_experimental.open_clip import OpenCLIPEmbeddings
+from langchain_nomic import NomicMultimodalEmbeddings
 
 # Load images
 img_dump_path = Path(__file__).parent / "docs/"
@@ -21,7 +21,9 @@ re_vectorstore_path = vectorstore.relative_to(Path.cwd())
 
 # Load embedding function
 print("Loading embedding function")
-embedding = OpenCLIPEmbeddings(model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k")
+embedding = NomicMultimodalEmbeddings(
+    vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
+)
 
 # Create chroma
 vectorstore_mmembd = Chroma(
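The hunk above ends mid-statement. For context, here is a sketch of how the pieces plausibly fit together in `ingest.py`, reusing the names defined in the hunk (`embedding`, `re_vectorstore_path`, `img_dump_path`); the `image_uris` variable and the `add_images` call are assumptions based on Chroma's multi-modal API, not lines shown in this diff.

```
# Sketch only: completing the vector store created above.
vectorstore_mmembd = Chroma(
    collection_name="multi-modal-rag",
    persist_directory=str(re_vectorstore_path),
    embedding_function=embedding,
)

# Hypothetical: URIs of the images dumped into docs/ earlier in the script.
image_uris = sorted(str(p) for p in img_dump_path.glob("*.jpg"))

# Chroma can embed and store images directly from their URIs.
vectorstore_mmembd.add_images(uris=image_uris)
```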
@@ -9,7 +9,7 @@ from langchain_core.messages import HumanMessage
 from langchain_core.output_parsers import StrOutputParser
 from langchain_core.pydantic_v1 import BaseModel
 from langchain_core.runnables import RunnableLambda, RunnablePassthrough
-from langchain_experimental.open_clip import OpenCLIPEmbeddings
+from langchain_nomic import NomicMultimodalEmbeddings
 from PIL import Image
 
 
@@ -102,8 +102,8 @@ def multi_modal_rag_chain(retriever):
 vectorstore_mmembd = Chroma(
     collection_name="multi-modal-rag",
     persist_directory=str(Path(__file__).parent.parent / "chroma_db_multi_modal"),
-    embedding_function=OpenCLIPEmbeddings(
-        model_name="ViT-H-14", checkpoint="laion2b_s32b_b79k"
+    embedding_function=NomicMultimodalEmbeddings(
+        vision_model="nomic-embed-vision-v1", text_model="nomic-embed-text-v1"
     ),
 )
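Downstream of this change, the store is consumed the same way regardless of the embedding backend. A brief, hypothetical usage sketch, reusing `vectorstore_mmembd` and `multi_modal_rag_chain` from the hunk above; the question string is purely illustrative.

```
# Sketch: turn the multi-modal vector store into a retriever and run the chain.
retriever_mmembd = vectorstore_mmembd.as_retriever()
chain = multi_modal_rag_chain(retriever_mmembd)

# Hypothetical question over the indexed photos.
answer = chain.invoke("Which photos show the birthday cake?")
print(answer)
```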