feat(chroma): Add Chroma Cloud support (#32125)

* Adding support for more Chroma client options (`HttpClient` and
`CloudClient`). This includes adding arguments necessary for
instantiating these clients.
* Adding support for Chroma's new persisted collection configuration (we
moved index configuration into this new construct).
* Delegate `Settings` configuration to Chroma's client constructors.
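
A minimal sketch of how the Chroma Cloud arguments could be gathered before constructing the vector store. The environment variable names and the keyword names (`chroma_cloud_api_key`, `tenant`, `database`) are taken from the notebook changes in this commit; the helper `chroma_cloud_kwargs` is hypothetical and not part of `langchain-chroma`.

```python
import os

# Environment variable -> keyword argument, as used in the notebook's
# Chroma Cloud example. The mapping itself is only illustrative.
ENV_TO_KWARG = {
    "CHROMA_API_KEY": "chroma_cloud_api_key",
    "CHROMA_TENANT": "tenant",
    "CHROMA_DATABASE": "database",
}


def chroma_cloud_kwargs(env=None):
    """Hypothetical helper: collect Chroma Cloud settings from the environment.

    Raises RuntimeError listing every variable that is unset or empty, so a
    misconfigured environment fails before any client is created.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}
```

Assuming the variables are set, the result could be passed along as `Chroma(collection_name="example_collection", embedding_function=embeddings, **chroma_cloud_kwargs())`.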
itaismith
2025-07-22 12:14:15 -07:00
committed by GitHub
parent 3fc27e7a95
commit 09769373b3
2 changed files with 467 additions and 161 deletions

@@ -11,6 +11,13 @@
"\n",
">[Chroma](https://docs.trychroma.com/getting-started) is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. View the full docs of `Chroma` at [this page](https://docs.trychroma.com/reference/py-collection), and find the API reference for the LangChain integration at [this page](https://python.langchain.com/api_reference/chroma/vectorstores/langchain_chroma.vectorstores.Chroma.html).\n",
"\n",
":::info Chroma Cloud\n",
"\n",
"Chroma Cloud powers serverless vector and full-text search. It's extremely fast, cost-effective, scalable and painless. Create a DB and try it out in under 30 seconds with $5 of free credits.\n",
"\n",
"[Get started with Chroma Cloud](https://trychroma.com/signup)\n",
":::\n",
"\n",
"## Setup\n",
"\n",
"To access `Chroma` vector stores you'll need to install the `langchain-chroma` integration package."
@@ -33,7 +40,15 @@
"source": [
"### Credentials\n",
"\n",
"You can use the `Chroma` vector store without any credentials, simply installing the package above is enough!"
"You can use the `Chroma` vector store without any credentials; simply installing the package above is enough!\n",
"\n",
"If you are a [Chroma Cloud](https://trychroma.com/signup) user, set your `CHROMA_TENANT`, `CHROMA_DATABASE`, and `CHROMA_API_KEY` environment variables.\n",
"\n",
"When you install the `chromadb` package you also get access to the Chroma CLI, which can set these for you. First, [log in](https://docs.trychroma.com/docs/cli/login) via the CLI, and then use the [`connect` command](https://docs.trychroma.com/docs/cli/db):\n",
"\n",
"```bash\n",
"chroma db connect [db_name] --env-file\n",
"```"
]
},
{
@@ -73,7 +88,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "d3ed0a9a",
"metadata": {},
"outputs": [],
@@ -85,9 +100,19 @@
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
]
},
{
"cell_type": "markdown",
"id": "c6a43e25-227c-4e89-909f-3654fe2710fc",
"metadata": {},
"source": [
"#### Running Locally (In-Memory)\n",
"\n",
"You can get a Chroma server running in memory by simply instantiating a `Chroma` instance with a collection name and your embeddings provider:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"id": "3ea11a7b",
"metadata": {},
"outputs": [],
@@ -97,7 +122,104 @@
"vector_store = Chroma(\n",
" collection_name=\"example_collection\",\n",
" embedding_function=embeddings,\n",
" persist_directory=\"./chroma_langchain_db\", # Where to save data locally, remove if not necessary\n",
")"
]
},
{
"cell_type": "markdown",
"id": "92d04cda-e8cc-48aa-9680-470304e3ff4c",
"metadata": {},
"source": [
"If you don't need data persistence, this is a great option for experimenting while building your AI application with LangChain."
]
},
{
"cell_type": "markdown",
"id": "ad6adc53-4b3f-458e-8e2e-efcc3f99f0c5",
"metadata": {},
"source": [
"#### Running Locally (with Data Persistence)\n",
"\n",
"You can provide the `persist_directory` argument to save your data across multiple runs of your program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a858e77-fd6d-44f0-840f-8f71eaeae6f7",
"metadata": {},
"outputs": [],
"source": [
"from langchain_chroma import Chroma\n",
"\n",
"vector_store = Chroma(\n",
" collection_name=\"example_collection\",\n",
" embedding_function=embeddings,\n",
" persist_directory=\"./chroma_langchain_db\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "47bf272e-af0b-450e-8a86-3e8292273cde",
"metadata": {},
"source": [
"#### Connecting to a Chroma Server\n",
"\n",
"If you have a Chroma server running locally, or you have [deployed](https://docs.trychroma.com/guides/deploy/client-server-mode) one yourself, you can connect to it by providing the `host` argument.\n",
"\n",
"For example, you can start a Chroma server running locally with `chroma run`, and then connect to it with `host='localhost'`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "679d619f-b8ee-4abb-8ac0-77ec859ddff1",
"metadata": {},
"outputs": [],
"source": [
"from langchain_chroma import Chroma\n",
"\n",
"vector_store = Chroma(\n",
" collection_name=\"example_collection\",\n",
" embedding_function=embeddings,\n",
" host=\"localhost\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e3c06ed9-c010-4764-bd6e-2a0c71201d5b",
"metadata": {},
"source": [
"For other deployments you can use the `port`, `ssl`, and `headers` arguments to customize your connection."
]
},
{
"cell_type": "markdown",
"id": "0f3238e1-ca57-482d-878d-b09bd2c8015c",
"metadata": {},
"source": [
"#### Chroma Cloud\n",
"\n",
"Chroma Cloud users can also build with LangChain. Provide your `Chroma` instance with your Chroma Cloud API key, tenant, and DB name:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e080d2d2-c501-467e-9842-e2045d86cdb5",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_chroma import Chroma\n",
"\n",
"vector_store = Chroma(\n",
" collection_name=\"example_collection\",\n",
" embedding_function=embeddings,\n",
" chroma_cloud_api_key=os.getenv(\"CHROMA_API_KEY\"),\n",
" tenant=os.getenv(\"CHROMA_TENANT\"),\n",
" database=os.getenv(\"CHROMA_DATABASE\"),\n",
")"
]
},
@@ -111,21 +233,132 @@
"You can also initialize from a `Chroma` client, which is particularly useful if you want easier access to the underlying database."
]
},
{
"cell_type": "markdown",
"id": "38e9f893-60df-4a4f-b570-2d1c463cc1e4",
"metadata": {},
"source": [
"#### Running Locally (In-Memory)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3fe4457f",
"execution_count": null,
"id": "09bfb62f-7c6b-43d3-a69a-0601899c6942",
"metadata": {},
"outputs": [],
"source": [
"import chromadb\n",
"\n",
"persistent_client = chromadb.PersistentClient()\n",
"collection = persistent_client.get_or_create_collection(\"collection_name\")\n",
"collection.add(ids=[\"1\", \"2\", \"3\"], documents=[\"a\", \"b\", \"c\"])\n",
"client = chromadb.Client()"
]
},
{
"cell_type": "markdown",
"id": "f3eac2de-0cca-4d57-b67d-04cc78bb59c1",
"metadata": {},
"source": [
"#### Running Locally (with Data Persistence)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ffc7f2ad-0d6c-4911-a4cf-a82bf7649478",
"metadata": {},
"outputs": [],
"source": [
"import chromadb\n",
"\n",
"client = chromadb.PersistentClient(path=\"./chroma_langchain_db\")"
]
},
{
"cell_type": "markdown",
"id": "41cc98d5-94f3-4a2f-903e-61c4a38d8f9c",
"metadata": {},
"source": [
"#### Connecting to a Chroma Server\n",
"\n",
"For example, if you are running a Chroma server locally (using `chroma run`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb5828e3-c0a5-4f97-8d2e-23d82257743e",
"metadata": {},
"outputs": [],
"source": [
"import chromadb\n",
"\n",
"client = chromadb.HttpClient(host=\"localhost\", port=8000, ssl=False)"
]
},
{
"cell_type": "markdown",
"id": "254ecfdb-f247-4a3d-a52a-e515b17b7ba2",
"metadata": {},
"source": [
"#### Chroma Cloud"
]
},
{
"cell_type": "markdown",
"id": "fbbf8042-7ae7-4221-96e3-dc2048dd0f45",
"metadata": {},
"source": [
"After setting your `CHROMA_API_KEY`, `CHROMA_TENANT`, and `CHROMA_DATABASE`, you can simply instantiate:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89e86a01-a347-4041-a4a1-01eecd299235",
"metadata": {},
"outputs": [],
"source": [
"import chromadb\n",
"\n",
"client = chromadb.CloudClient()"
]
},
{
"cell_type": "markdown",
"id": "8fdd8bbb-45ab-43d8-bdc1-7220b14cfc52",
"metadata": {},
"source": [
"#### Access your Chroma DB"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6da21a1a-8d0d-4a4b-bac5-008839e89540",
"metadata": {},
"outputs": [],
"source": [
"collection = client.get_or_create_collection(\"collection_name\")\n",
"collection.add(ids=[\"1\", \"2\", \"3\"], documents=[\"a\", \"b\", \"c\"])"
]
},
{
"cell_type": "markdown",
"id": "581906ba-8082-450c-a3c4-19284539980b",
"metadata": {},
"source": [
"#### Create a Chroma Vectorstore"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fe4457f",
"metadata": {},
"outputs": [],
"source": [
"vector_store_from_client = Chroma(\n",
" client=persistent_client,\n",
" client=client,\n",
" collection_name=\"collection_name\",\n",
" embedding_function=embeddings,\n",
")"
@@ -147,30 +380,10 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"id": "da279339",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['f22ed484-6db3-4b76-adb1-18a777426cd6',\n",
" 'e0d5bab4-6453-4511-9a37-023d9d288faa',\n",
" '877d76b8-3580-4d9e-a13f-eed0fa3d134a',\n",
" '26eaccab-81ce-4c0a-8e76-bf542647df18',\n",
" 'bcaa8239-7986-4050-bf40-e14fb7dab997',\n",
" 'cdc44b38-a83f-4e49-b249-7765b334e09d',\n",
" 'a7a35354-2687-4bc2-8242-3849a4d18d34',\n",
" '8780caf1-d946-4f27-a707-67d037e9e1d8',\n",
" 'dec6af2a-7326-408f-893d-7d7d717dfda9',\n",
" '3b18e210-bb59-47a0-8e17-c8e51176ea5e']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from uuid import uuid4\n",
"\n",
@@ -265,7 +478,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "ef5dbd1e",
"metadata": {},
"outputs": [],
@@ -301,7 +514,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "56f17791",
"metadata": {},
"outputs": [],
@@ -327,19 +540,10 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "e2b96fcf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n",
"* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n"
]
}
],
"outputs": [],
"source": [
"results = vector_store.similarity_search(\n",
" \"LangChain provides abstractions to make working with LLMs easy\",\n",
@@ -362,18 +566,10 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "2768a331",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* [SIM=1.726390] The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]\n"
]
}
],
"outputs": [],
"source": [
"results = vector_store.similarity_search_with_score(\n",
" \"Will it be hot tomorrow?\", k=1, filter={\"source\": \"news\"}\n",
@@ -394,18 +590,10 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"id": "8ea434a5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* I had chocolate chip pancakes and fried eggs for breakfast this morning. [{'source': 'tweet'}]\n"
]
}
],
"outputs": [],
"source": [
"results = vector_store.similarity_search_by_vector(\n",
" embedding=embeddings.embed_query(\"I love green eggs and ham!\"), k=1\n",
@@ -430,21 +618,10 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "7b6f7867",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"retriever = vector_store.as_retriever(\n",
" search_type=\"mmr\", search_kwargs={\"k\": 1, \"fetch_k\": 5}\n",
@@ -493,7 +670,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.12.0"
}
},
"nbformat": 4,