diff --git a/docs/docs/integrations/providers/zotero.ipynb b/docs/docs/integrations/providers/zotero.ipynb new file mode 100644 index 00000000000..9d20b7e9732 --- /dev/null +++ b/docs/docs/integrations/providers/zotero.ipynb @@ -0,0 +1,35 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Zotero\n", + "\n", + "[Zotero](https://www.zotero.org/) is an open source reference management system intended for managing bibliographic data and related research materials. You can connect to your personal library, as well as shared group libraries, via the [API](https://www.zotero.org/support/dev/web_api/v3/start). This retriever implementation utilizes [PyZotero](https://github.com/urschrei/pyzotero) to access libraries. \n", + "\n", + "\n", + "## Installation\n", + "\n", + "```bash\n", + "pip install pyzotero\n", + "```\n", + "\n", + "## Retriever\n", + "\n", + "See a [usage example](/docs/integrations/retrievers/zotero).\n", + "\n", + "```python\n", + "from langchain_zotero_retriever.retrievers import ZoteroRetriever\n", + "```" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/docs/integrations/retrievers/zotero.ipynb b/docs/docs/integrations/retrievers/zotero.ipynb new file mode 100644 index 00000000000..bed2330996f --- /dev/null +++ b/docs/docs/integrations/retrievers/zotero.ipynb @@ -0,0 +1,260 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "afaf8039", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: Zotero\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "e49f1e0d", + "metadata": {}, + "source": [ + "# ZoteroRetriever\n", + "\n", + "This will help you getting started with the Zotero [retriever](/docs/concepts/retrievers). For detailed documentation of all ZoteroRetriever features and configurations head to the [Github page](https://github.com/TimBMK/langchain-zotero-retriever).\n", + "\n", + "### Integration details\n", + "\n", + "\n", + "| Retriever | Source | Package |\n", + "| :--- | :--- | :---: |\n", + "[ZoteroRetriever](https://github.com/TimBMK/langchain-zotero-retriever) | [Zotero API](https://www.zotero.org/support/dev/web_api/v3/start) | langchain-community |\n", + "\n", + "## Setup\n" + ] + }, + { + "cell_type": "markdown", + "id": "72ee0c4b-9764-423a-9dbf-95129e185210", + "metadata": {}, + "source": [ + "If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a15d341e-3e26-4ca3-830b-5aab30ed66de", + "metadata": {}, + "outputs": [], + "source": [ + "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n", + "# os.environ[\"LANGSMITH_TRACING\"] = \"true\"" + ] + }, + { + "cell_type": "markdown", + "id": "0730d6a1-c893-4840-9817-5e5251676d5d", + "metadata": {}, + "source": [ + "### Installation\n", + "\n", + "This retriever lives in the `langchain-zotero-retriever` package. We also require the `pyzotero` dependency:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "652d6238-1f87-422a-b135-f5abbb8652fc", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU langchain-zotero-retriever pyzotero" + ] + }, + { + "cell_type": "markdown", + "id": "a38cde65-254d-4219-a441-068766c0d4b5", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "`ZoteroRetriever` parameters include:\n", + "- `k`: Number of results to include (Default: 50)\n", + "- `type`: Type of search to perform. \"Top\" retrieves top level Zotero library items, \"items\" returns any Zotero library items. (Default: top)\n", + "- `get_fulltext`: Retrieves full texts if they are attached to the items in the library. If False, or no text is attached, returns an empty string as page_content. (Default: True)\n", + "- `library_id`: ID of the Zotero library to search. Required to connect to a library.\n", + "- `library_type`: Type of library to search. \"user\" for personal library, \"group\" for shared group libraries. (Default: user)\n", + "- `api_key`: Zotero API key if not set as an environment variable. Optional, required to access non-public group libraries or your personal library. Fetched automatically if provided as ZOTERO_API_KEY environment variable.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70cc8e65-2a02-408a-bbc6-8ef649057d82", + "metadata": {}, + "outputs": [], + "source": [ + "from langchain_zotero_retriever.retrievers import ZoteroRetriever\n", + "\n", + "retriever = ZoteroRetriever(\n", + " k=10,\n", + " library_id=\"2319375\", # a public group library that does not require an API key for access\n", + " library_type=\"group\", # set this to \"user\" if you are using a personal library. Personal libraries require an API key\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5c5f2839-4020-424e-9fc9-07777eede442", + "metadata": {}, + "source": [ + "## Usage\n", + "\n", + "Apart from the `query`, the retriever provides these additional search parameters:\n", + "- `itemType`: Type of item to search for (e.g. \"book\" or \"journalArticle\")\n", + "- `tag`: for searching over tags attached to library items (see search syntax for combining multiple tags)\n", + "- `qmode`: Search mode to use. Changes what the query searches over. \"everything\" includes full-text content. \"titleCreatorYear\" to search over title, authors and year.\n", + "- `since`: Return only objects modified after the specified library version. Defaults to return everything.\n", + "\n", + "For Search Syntax, see Zotero API Documentation: https://www.zotero.org/support/dev/web_api/v3/basics#search_syntax\n", + "\n", + "For the full API schema (including available itemTypes) see: https://github.com/zotero/zotero-schema" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51a60dbe-9f2e-4e04-bb62-23968f17164a", + "metadata": {}, + "outputs": [], + "source": [ + "query = \"Zuboff\"\n", + "\n", + "retriever.invoke(query)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "add7bbb0", + "metadata": {}, + "outputs": [], + "source": [ + "tags = [\n", + " \"Surveillance\",\n", + " \"Digital Capitalism\",\n", + "] # note that providing tags as a list will result in a logical AND operation\n", + "\n", + "retriever.invoke(\"\", tag=tags)" + ] + }, + { + "cell_type": "markdown", + "id": "d1ee55bc-ffc8-4cfa-801c-993953a08cfd", + "metadata": {}, + "source": [ + "## Use within a chain\n", + "\n", + "Due to the way the Zotero API search operates, directly passing a user question to the ZoteroRetriever will often not return satisfactory results. For use in chains or agentic frameworks, it is recommended to turn the ZoteroRetriever into a [tool](https://python.langchain.com/docs/how_to/custom_tools/#creating-tools-from-functions). This way, the LLM can turn the user query into a more concise search query for the API. Furthermore, this allows the LLM to fill in additional search parameters, such as tag or item type." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b19a04d", + "metadata": {}, + "outputs": [], + "source": [ + "from typing import List, Optional, Union\n", + "\n", + "from langchain_core.output_parsers import PydanticToolsParser\n", + "from langchain_core.tools import StructuredTool, tool\n", + "from langchain_openai import ChatOpenAI\n", + "\n", + "\n", + "def retrieve(\n", + " query: str,\n", + " itemType: Optional[str],\n", + " tag: Optional[Union[str, List[str]]],\n", + " qmode: str = \"everything\",\n", + " since: Optional[int] = None,\n", + "):\n", + " retrieved_docs = retriever.invoke(\n", + " query, itemType=itemType, tag=tag, qmode=qmode, since=since\n", + " )\n", + " serialized_docs = \"\\n\\n\".join(\n", + " (\n", + " f\"Metadata: { {key: doc.metadata[key] for key in doc.metadata if key != 'abstractNote'} }\\n\"\n", + " f\"Abstract: {doc.metadata['abstractNote']}\\n\"\n", + " )\n", + " for doc in retrieved_docs\n", + " )\n", + "\n", + " return serialized_docs, retrieved_docs\n", + "\n", + "\n", + "description = \"\"\"Search and return relevant documents from a Zotero library. The following search parameters can be used:\n", + "\n", + " Args:\n", + " query: str: The search query to be used. Try to keep this specific and short, e.g. a specific topic or author name\n", + " itemType: Optional. Type of item to search for (e.g. \"book\" or \"journalArticle\"). Multiple types can be passed as a string seperated by \"||\", e.g. \"book || journalArticle\". Defaults to all types.\n", + " tag: Optional. For searching over tags attached to library items. If documents tagged with multiple tags are to be retrieved, pass them as a list. If documents with any of the tags are to be retrieved, pass them as a string separated by \"||\", e.g. \"tag1 || tag2\"\n", + " qmode: Search mode to use. Changes what the query searches over. \"everything\" includes full-text content. \"titleCreatorYear\" to search over title, authors and year. Defaults to \"everything\".\n", + " since: Return only objects modified after the specified library version. Defaults to return everything.\n", + " \"\"\"\n", + "\n", + "retriever_tool = StructuredTool.from_function(\n", + " func=retrieve,\n", + " name=\"retrieve\",\n", + " description=description,\n", + " return_direct=True,\n", + ")\n", + "\n", + "\n", + "llm = ChatOpenAI(model=\"gpt-4o-mini-2024-07-18\")\n", + "\n", + "llm_with_tools = llm.bind_tools([retrieve])\n", + "\n", + "q = \"What journal articles do I have on Surveillance in the zotero library?\"\n", + "\n", + "chain = llm_with_tools | PydanticToolsParser(tools=[retrieve])\n", + "\n", + "chain.invoke(q)" + ] + }, + { + "cell_type": "markdown", + "id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all ZoteroRetriever features and configurations head to the [Github page](https://github.com/TimBMK/langchain-zotero-retriever).\n", + "\n", + "For detailed documentation on the Zotero API, head to the [Zotero API reference](https://www.zotero.org/support/dev/web_api/v3/start)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/packages.yml b/libs/packages.yml index 6fe2e8bf6f6..51ac31ed454 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -526,4 +526,9 @@ packages: provider_page: dell - name: langchain-tavily path: . - repo: tavily-ai/langchain-tavily \ No newline at end of file + repo: tavily-ai/langchain-tavily +- name: langchain-zotero-retriever + path: . + repo: TimBMK/langchain-zotero-retriever + name_title: Zotero + provider_page: zotero