langchain/libs/community/langchain_community/retrievers/needle.py
Jan Heimes ef365543cb
community: add Needle retriever and document loader integration (#28157)
- [x] **PR title**: "community: add Needle retriever and document loader
integration"

- [x] **PR message**:
  - **Description:** This PR adds a new integration for Needle, which
    includes:
    - **NeedleRetriever**: A retriever for fetching documents from Needle
      collections.
    - **NeedleLoader**: A document loader for managing and loading documents
      into Needle collections.
    - Example notebooks demonstrating usage have been added in:
      - `docs/docs/integrations/retrievers/needle.ipynb`
      - `docs/docs/integrations/document_loaders/needle.ipynb`
  - **Dependencies:** The `needle-python` package is required as an
    external dependency for accessing Needle's API. It has been added to the
    extended testing dependencies list.
  - **Twitter handle:** Feel free to mention me if this PR gets announced:
    [needlexai](https://x.com/NeedlexAI).

- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. Unit tests have been added for both `NeedleRetriever` and
`NeedleLoader` in `libs/community/tests/unit_tests`. These tests mock
API calls to avoid relying on network access.
2. Example notebooks have been added to `docs/docs/integrations/`,
showcasing both retriever and loader functionality.

- [x] **Lint and test**: Run `make format`, `make lint`, and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
  - `make format`: Passed
  - `make lint`: Passed
  - `make test`: Passed (requires `needle-python` to be installed locally;
    this package is not added to LangChain dependencies).

Additional guidelines:
- [x] Optional dependencies are imported only within functions.
- [x] No dependencies have been added to pyproject.toml files except for
those required for unit tests.
- [x] The PR does not touch more than one package.
- [x] Changes are fully backwards compatible.
- [x] Community additions are not re-imported into LangChain core.


---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 22:06:25 +00:00

97 lines
3.3 KiB
Python

from typing import Any, List, Optional  # noqa: I001

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from pydantic import BaseModel, Field


class NeedleRetriever(BaseRetriever, BaseModel):
    """
    NeedleRetriever retrieves relevant documents or context from a Needle collection
    based on a search query.

    Setup:
        Install the `needle-python` library and set your Needle API key.

        .. code-block:: bash

            pip install needle-python
            export NEEDLE_API_KEY="your-api-key"

    Key init args:
        - `needle_api_key` (Optional[str]): The API key for authenticating with Needle.
        - `collection_id` (str): The ID of the Needle collection to search in.
        - `client` (Optional[NeedleClient]): An optional instance of the NeedleClient.

    Usage:
        .. code-block:: python

            from langchain_community.retrievers.needle import NeedleRetriever

            retriever = NeedleRetriever(
                needle_api_key="your-api-key",
                collection_id="your-collection-id"
            )

            results = retriever.invoke("example query")
            for doc in results:
                print(doc.page_content)
    """

    client: Optional[Any] = None
    """Optional instance of NeedleClient."""
    needle_api_key: Optional[str] = Field(None, description="Needle API Key")
    collection_id: Optional[str] = Field(
        ..., description="The ID of the Needle collection to search in"
    )

    def _initialize_client(self) -> None:
        """
        Initialize the NeedleClient with the provided API key.

        If a client instance is already provided, this method does nothing.
        """
        try:
            from needle.v1 import NeedleClient
        except ImportError:
            raise ImportError("Please install with `pip install needle-python`.")

        if not self.client:
            self.client = NeedleClient(api_key=self.needle_api_key)

    def _search_collection(self, query: str) -> List[Document]:
        """
        Search the Needle collection for relevant documents.

        Args:
            query (str): The search query used to find relevant documents.

        Returns:
            List[Document]: A list of documents matching the search query.
        """
        self._initialize_client()
        if self.client is None:
            raise ValueError("NeedleClient is not initialized. Provide an API key.")

        results = self.client.collections.search(
            collection_id=self.collection_id, text=query
        )
        docs = [Document(page_content=result.content) for result in results]
        return docs

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        """
        Retrieve relevant documents based on the query.

        Args:
            query (str): The query string used to search the collection.

        Returns:
            List[Document]: A list of documents relevant to the query.
        """
        # The `run_manager` parameter is included to match the superclass signature,
        # but it is not used in this implementation.
        return self._search_collection(query)
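
The search path above calls `client.collections.search(collection_id=..., text=...)` and keeps only each result's `content`. A minimal sketch of that mapping, runnable without network access or the `needle-python` package, might look like the following. `SearchResult`, `FakeCollections`, `FakeNeedleClient`, and `search_page_contents` are hypothetical stand-ins introduced for illustration; they are not part of `needle-python` or LangChain, and the substring matching is only a placeholder for whatever ranking the real service performs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchResult:
    """Hypothetical stand-in for a search result; only `.content` is read."""

    content: str


class FakeCollections:
    """Hypothetical stub exposing the same `search` keywords used by the retriever."""

    def __init__(self, corpus: Dict[str, List[str]]) -> None:
        self._corpus = corpus

    def search(self, collection_id: str, text: str) -> List[SearchResult]:
        # Placeholder matching: case-insensitive substring test per chunk.
        chunks = self._corpus.get(collection_id, [])
        return [SearchResult(c) for c in chunks if text.lower() in c.lower()]


class FakeNeedleClient:
    """Hypothetical client with a `collections` attribute, like the one used above."""

    def __init__(self, corpus: Dict[str, List[str]]) -> None:
        self.collections = FakeCollections(corpus)


def search_page_contents(
    client: FakeNeedleClient, collection_id: str, query: str
) -> List[str]:
    """Mirror the retriever's mapping: one page content string per search result."""
    results = client.collections.search(collection_id=collection_id, text=query)
    return [result.content for result in results]


client = FakeNeedleClient(
    {"demo": ["Needle indexes documents.", "LangChain composes retrievers."]}
)
contents = search_page_contents(client, "demo", "retrievers")
print(contents)
```

Note that in the retriever itself, `_initialize_client` imports `needle.v1` before checking whether a client was supplied, so `needle-python` has to be importable even when a pre-built `client` is passed in.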