Fix update_document function, add test and documentation. (#5359)

# Fix for `update_document` Function in Chroma

## Summary

This pull request addresses an issue with the `update_document` function in the Chroma class, as described in [#5031](https://github.com/hwchase17/langchain/issues/5031#issuecomment-1562577947). The issue was identified as an `AttributeError` raised when calling `update_document` due to a missing corresponding method in the `Collection` object. This fix refactors the `update_document` method in `Chroma` to correctly interact with the `Collection` object.

## Changes

1. Fixed the `update_document` method in the `Chroma` class to correctly call methods on the `Collection` object.
2. Added the corresponding test `test_chroma_update_document` in `tests/integration_tests/vectorstores/test_chroma.py` to reflect the updated method call.
3. Added an example and explanation of how to use the `update_document` function in the Jupyter notebook tutorial for Chroma.

## Test Plan

All existing tests pass after this change. In addition, the `test_chroma_update_document` test case now correctly checks the functionality of `update_document`, ensuring that the function works as expected and updates the content of documents correctly.

## Reviewers

@dev2049

This fix will ensure that users are able to use the `update_document` function as expected, without encountering the previous `AttributeError`. This will enhance the usability and reliability of the Chroma class for all users.

Thank you for considering this pull request. I look forward to your feedback and suggestions.
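The failure mode described in the summary can be illustrated with a small, self-contained sketch. It does not reproduce the actual `Chroma` or `Collection` code, which is not shown in this description. `FakeCollection`, `VectorStoreWrapper`, `update_document_broken`, and the stand-in `Document` are hypothetical names introduced only for illustration, and the assumption that the collection exposes a batch-style `update` method is exactly that, an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Stand-in document type (hypothetical, not LangChain's class)."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class FakeCollection:
    """Hypothetical collection that only offers batch-style add/update."""

    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}
        self.metadatas: Dict[str, Dict[str, str]] = {}

    def add(
        self, ids: List[str], documents: List[str], metadatas: List[Dict[str, str]]
    ) -> None:
        for doc_id, text, meta in zip(ids, documents, metadatas):
            self.documents[doc_id] = text
            self.metadatas[doc_id] = meta

    def update(
        self, ids: List[str], documents: List[str], metadatas: List[Dict[str, str]]
    ) -> None:
        for doc_id, text, meta in zip(ids, documents, metadatas):
            if doc_id not in self.documents:
                raise KeyError(f"unknown id: {doc_id}")
            self.documents[doc_id] = text
            self.metadatas[doc_id] = meta


class VectorStoreWrapper:
    """Hypothetical wrapper that delegates to a collection object."""

    def __init__(self, collection: FakeCollection) -> None:
        self._collection = collection

    def update_document_broken(self, document_id: str, document: Document) -> None:
        # Calls a method the collection does not define, so Python raises
        # AttributeError at call time (the class of bug the summary describes).
        self._collection.update_document(  # type: ignore[attr-defined]
            document_id, document.page_content, document.metadata
        )

    def update_document(self, document_id: str, document: Document) -> None:
        # Delegates to the method the collection actually provides,
        # wrapping the single document in one-element lists.
        self._collection.update(
            ids=[document_id],
            documents=[document.page_content],
            metadatas=[document.metadata],
        )
```

In this sketch the wrapper's working method adapts a single-document call to the collection's batch interface, which is one plausible shape for such a fix; the real change may differ in detail (for example, it may also need to pass embeddings).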
2025-09-16 23:13:31 +00:00 · 2023-05-29 14:39:25 +01:00
parent e455ba4ed5
commit 44b48d9518
3 changed files with 124 additions and 5 deletions
--- a/tests/integration_tests/vectorstores/test_chroma.py
+++ b/tests/integration_tests/vectorstores/test_chroma.py
@@ -160,3 +160,37 @@ def test_chroma_with_include_parameter() -> None:
    assert output["embeddings"] is not None
    output = docsearch.get()
    assert output["embeddings"] is None
+
+
+def test_chroma_update_document() -> None:
+    """Test the update_document function in the Chroma class."""
+
+    # Initial document content and id
+    initial_content = "foo"
+    document_id = "doc1"
+
+    # Create an instance of Document with initial content and metadata
+    original_doc = Document(page_content=initial_content, metadata={"page": "0"})
+
+    # Initialize a Chroma instance with the original document
+    docsearch = Chroma.from_documents(
+        collection_name="test_collection",
+        documents=[original_doc],
+        embedding=FakeEmbeddings(),
+        ids=[document_id],
+    )
+
+    # Define updated content for the document
+    updated_content = "updated foo"
+
+    # Create a new Document with the updated content (the id is passed to update_document)
+    updated_doc = Document(page_content=updated_content, metadata={"page": "0"})
+
+    # Update the document in the Chroma instance
+    docsearch.update_document(document_id=document_id, document=updated_doc)
+
+    # Perform a similarity search with the updated content
+    output = docsearch.similarity_search(updated_content, k=1)
+
+    # Assert that the updated document is returned by the search
+    assert output == [Document(page_content=updated_content, metadata={"page": "0"})]