DOCS: partners/chroma: Fix documentation around chroma query filter syntax (#31058)

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"

**Description**:
* Starting to put together some PRs to fix the typing around
`langchain-chroma` `filter` and `where_document` query filtering, as
mentioned in:

https://github.com/langchain-ai/langchain/issues/30879
https://github.com/langchain-ai/langchain/issues/30507

The typing of `dict[str, str]` is both too restrictive (it marks valid
filter expressions as ill-typed) and too permissive (it allows illegal
filter expressions). That's not what this PR addresses, though. This PR
just removes from the documentation some example filters that are
illegal or syntactically incorrect: (a) dictionaries with keys like
`$contains` where the key is missing quotation marks; (b) dictionaries
with multiple entries, e.g. `{"foo": "bar", "qux": "baz"}`, which are
illegal in Chroma filter syntax and will raise an exception. Filter
dictionaries in Chroma must have one and only one key. Again, this is
just the documentation issue, which is the lowest-hanging fruit. I also
think we need to update the types for `filter` and `where_document` to
be, at the very least, `dict[str, Any]`, or, since we have access to
Chroma's types, `Where` and `WhereDocument`. This has a wider blast
radius, though, so I'm starting small.
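
To make the single-key rule concrete, here is a minimal sketch in plain Python. The helper names `has_single_top_level_key` and `combine_with_and` are hypothetical and are not part of `langchain-chroma` or Chroma; the sketch only mirrors the rule described above and assumes Chroma's `$and` operator takes a list of single-key conditions.

```python
from typing import Any


def has_single_top_level_key(where: dict[str, Any]) -> bool:
    """Hypothetical check: a filter dict must have exactly one top-level key."""
    return len(where) == 1


def combine_with_and(conditions: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rewrite a multi-key dict as one "$and" expression."""
    if len(conditions) <= 1:
        return dict(conditions)
    return {"$and": [{key: value} for key, value in conditions.items()]}


# Two top-level keys: the shape the old docstring suggested.
illegal = {"color": "red", "price": 4.20}

# The same intent with a single top-level key. Note the list value and the
# float, neither of which fits the current `dict[str, str]` annotation.
legal = combine_with_and(illegal)

# Document filter: the operator must be a quoted string key.
# Writing {$contains: "hello"} is not even valid Python.
where_document = {"$contains": "hello"}
```

This only checks the outer shape of a filter; it does not replace Chroma's own validation of operators and value types.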

This PR does not fix the issues mentioned above; it just starts to get
the ball rolling by cleaning up the documentation.



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Really Him <hesereallyhim@proton.me>
Really Him 2025-04-30 17:51:07 -04:00 committed by GitHub
parent ed7cd3c5c4
commit 918c950737
GPG Key ID: B5690EEEBB952194

@@ -378,10 +378,10 @@ class Chroma(VectorStore):
query_texts: List of query texts.
query_embeddings: List of query embeddings.
n_results: Number of results to return. Defaults to 4.
-where: dict used to filter results by
-e.g. {"color" : "red", "price": 4.20}.
-where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+where: dict used to filter results by metadata.
+E.g. {"color" : "red"}.
+where_document: dict used to filter by the document contents.
+E.g. {"$contains": "hello"}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -417,7 +417,7 @@ class Chroma(VectorStore):
uris: File path to the image.
metadatas: Optional list of metadatas.
When querying, you can filter on this metadata.
-ids: Optional list of IDs.
+ids: Optional list of IDs. (Items without IDs will be assigned UUIDs)
kwargs: Additional keyword arguments to pass.
Returns:
@@ -507,7 +507,7 @@ class Chroma(VectorStore):
texts: Texts to add to the vectorstore.
metadatas: Optional list of metadatas.
When querying, you can filter on this metadata.
-ids: Optional list of IDs.
+ids: Optional list of IDs. (Items without IDs will be assigned UUIDs)
kwargs: Additional keyword arguments.
Returns:
@@ -619,8 +619,8 @@ class Chroma(VectorStore):
embedding: Embedding to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
filter: Filter by metadata. Defaults to None.
-where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+where_document: dict used to filter by the document contents.
+E.g. {"$contains": "hello"}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -650,7 +650,7 @@ class Chroma(VectorStore):
k: Number of Documents to return. Defaults to 4.
filter: Filter by metadata. Defaults to None.
where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+E.g. {"$contains": "hello"}}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -680,8 +680,8 @@ class Chroma(VectorStore):
query: Query text to search for.
k: Number of results to return. Defaults to 4.
filter: Filter by metadata. Defaults to None.
-where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+where_document: dict used to filter by document contents.
+E.g. {"$contains": "hello"}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -722,8 +722,8 @@ class Chroma(VectorStore):
query: Query text to search for.
k: Number of results to return. Defaults to 4.
filter: Filter by metadata. Defaults to None.
-where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+where_document: dict used to filter by the document contents.
+E.g. {"$contains": "hello"}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -904,8 +904,8 @@ class Chroma(VectorStore):
to maximum diversity and 1 to minimum diversity.
Defaults to 0.5.
filter: Filter by metadata. Defaults to None.
-where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+where_document: dict used to filter by the document contents.
+E.g. {"$contains": "hello"}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -955,8 +955,8 @@ class Chroma(VectorStore):
to maximum diversity and 1 to minimum diversity.
Defaults to 0.5.
filter: Filter by metadata. Defaults to None.
-where_document: dict used to filter by the documents.
-E.g. {$contains: {"text": "hello"}}.
+where_document: dict used to filter by the document contents.
+E.g. {"$contains": "hello"}.
kwargs: Additional keyword arguments to pass to Chroma collection query.
Returns:
@@ -1012,7 +1012,7 @@ class Chroma(VectorStore):
offset: The offset to start returning results from.
Useful for paging results with limit. Optional.
where_document: A WhereDocument type dict used to filter by the documents.
-E.g. `{$contains: "hello"}`. Optional.
+E.g. `{"$contains": "hello"}`. Optional.
include: A list of what to include in the results.
Can contain `"embeddings"`, `"metadatas"`, `"documents"`.
Ids are always included.