
community: add cognee retriever ()

This PR adds a new cognee integration: knowledge-graph-based retrieval
that lets developers ingest documents into cognee's knowledge graph,
process them, and then retrieve context via CogneeRetriever (a minimal
usage sketch follows the list below).
It includes:
- the langchain_cognee package with a CogneeRetriever class
- a test for the integration, demonstrating how to create, process, and
retrieve with cognee
- an example notebook showing its use, which lives in the
`docs/docs/integrations` directory.
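
A minimal sketch of the intended workflow (names and parameters as used in
the included example notebook):

```python
from langchain_cognee import CogneeRetriever
from langchain_core.documents import Document

# Ingest documents into cognee's knowledge graph, process them, then retrieve context
retriever = CogneeRetriever(
    llm_api_key="sk-...",  # replace with your OpenAI API key
    dataset_name="my_dataset",
    k=3,
)
retriever.add_documents([Document(page_content="Elon Musk is the CEO of SpaceX.")])
retriever.process_data()

results = retriever.invoke("Tell me about Elon Musk")
for doc in results:
    print(doc.page_content)
```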


Followed additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

Thank you for the review!

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Authored by Hande on 2025-02-20 20:15:23 +03:00, committed by GitHub
parent 97dd5f45ae
commit d8bab89e6e
3 changed files with 305 additions and 0 deletions
Changed paths: docs/docs/integrations/providers, docs/docs/integrations/retrievers, libs


@@ -0,0 +1,27 @@
# Cognee

Cognee implements scalable, modular ECL (Extract, Cognify, Load) pipelines that allow
you to interconnect and retrieve past conversations, documents, and audio
transcriptions while reducing hallucinations, developer effort, and cost.

Cognee merges graph and vector databases to uncover hidden relationships and new
patterns in your data. You can automatically model, load, and retrieve entities and
objects representing your business domain and analyze their relationships, uncovering
insights that neither vector stores nor graph stores alone can provide.

Try it in a Google Colab <a href="https://colab.research.google.com/drive/1g-Qnx6l_ecHZi0IOw23rg0qC4TYvEvWZ?usp=sharing">notebook</a> or have a look at the <a href="https://docs.cognee.ai">documentation</a>.

If you have questions, join the cognee <a href="https://discord.gg/NQPKmU5CCg">Discord</a> community.

Have you seen cognee's <a href="https://github.com/topoteretes/cognee-starter">starter repo</a>? Check it out!
## Installation and Setup

```bash
pip install langchain-cognee
```

## Retrievers

See details on available retrievers [here](/docs/integrations/retrievers/cognee).
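
For example, import the retriever as shown in the example notebook:

```python
from langchain_cognee import CogneeRetriever
```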


@@ -0,0 +1,275 @@
{
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Cognee\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# CogneeRetriever\n",
"\n",
"This will help you getting started with the Cognee [retriever](/docs/concepts/retrievers). For detailed documentation of all CogneeRetriever features and configurations head to the [API reference](https://python.langchain.com/api_reference/community/retrievers/langchain_community.retrievers.cognee.CogneeRetriever.html).\n",
"\n",
"### Integration details\n",
"\n",
"Bring-your-own data (i.e., index and search a custom corpus of documents):\n",
"\n",
"| Retriever | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: |\n",
"[CogneeRetriever](https://python.langchain.com/api_reference/community/retrievers/langchain_community.retrievers.cognee.CogneeRetriever.html) | ✅ | ❌ | langchain-cognee |\n",
"\n",
"## Setup\n",
"\n",
"For cognee default setup, only thing you need is your OpenAI API key. \n"
]
},
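{
"cell_type": "markdown",
"id": "f2a4c6e8",
"metadata": {},
"source": [
"For example, you can provide the key as an environment variable (a minimal sketch; whether cognee reads `OPENAI_API_KEY` automatically is an assumption, and you can also pass the key directly to `CogneeRetriever` via `llm_api_key` below):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d5e6f7",
"metadata": {},
"outputs": [],
"source": [
"# Assumption: the key is read from the OPENAI_API_KEY environment variable;\n",
"# alternatively, pass it to CogneeRetriever(llm_api_key=...) as shown later.\n",
"import getpass\n",
"import os\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
"    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
]
},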
{
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "0730d6a1-c893-4840-9817-5e5251676d5d",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"This retriever lives in the `langchain-cognee` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-cognee"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8bcb1e7",
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our retriever:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70cc8e65-2a02-408a-bbc6-8ef649057d82",
"metadata": {},
"outputs": [],
"source": [
"from langchain_cognee import CogneeRetriever\n",
"\n",
"retriever = CogneeRetriever(\n",
" llm_api_key=\"sk-\", # OpenAI API Key\n",
" dataset_name=\"my_dataset\",\n",
" k=3,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5c5f2839-4020-424e-9fc9-07777eede442",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Add some documents, process them, and then run queries. Cognee retrieves relevant knowledge to your queries and generates final answers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51a60dbe-9f2e-4e04-bb62-23968f17164a",
"metadata": {},
"outputs": [],
"source": [
"# Example of adding and processing documents\n",
"from langchain_core.documents import Document\n",
"\n",
"docs = [\n",
" Document(page_content=\"Elon Musk is the CEO of SpaceX.\"),\n",
" Document(page_content=\"SpaceX focuses on rockets and space travel.\"),\n",
"]\n",
"\n",
"retriever.add_documents(docs)\n",
"retriever.process_data()\n",
"\n",
"# Now let's query the retriever\n",
"query = \"Tell me about Elon Musk\"\n",
"results = retriever.invoke(query)\n",
"\n",
"for idx, doc in enumerate(results, start=1):\n",
" print(f\"Doc {idx}: {doc.page_content}\")"
]
},
{
"cell_type": "markdown",
"id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e",
"metadata": {},
"source": [
"## Use within a chain\n",
"\n",
"Like other retrievers, CogneeRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n",
"\n",
"We will need a LLM or chat model:\n",
"\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
"\n",
"<ChatModelTabs customVarName=\"llm\" />"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25b647a3-f8f2-4541-a289-7a241e43f9df",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae",
"metadata": {},
"outputs": [],
"source": [
"from langchain_cognee import CogneeRetriever\n",
"from langchain_core.documents import Document\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Instantiate the retriever with your Cognee config\n",
"retriever = CogneeRetriever(llm_api_key=\"sk-\", dataset_name=\"my_dataset\", k=3)\n",
"\n",
"# Optionally, prune/reset the dataset for a clean slate\n",
"retriever.prune()\n",
"\n",
"# Add some documents\n",
"docs = [\n",
" Document(page_content=\"Elon Musk is the CEO of SpaceX.\"),\n",
" Document(page_content=\"SpaceX focuses on space travel.\"),\n",
"]\n",
"retriever.add_documents(docs)\n",
"retriever.process_data()\n",
"\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Answer the question based only on the context provided.\n",
"\n",
"Context: {context}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d47c37dd-5c11-416c-a3b6-bec413cd70e8",
"metadata": {},
"outputs": [],
"source": [
"answer = chain.invoke(\"What companies do Elon Musk own?\")\n",
"\n",
"print(\"\\nFinal chain answer:\\n\", answer)"
]
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"TODO: add link to API reference."
]
},
{
"cell_type": "markdown",
"id": "a8dbdd72",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain-cognee-wqM4bUfz-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -445,6 +445,9 @@ packages:
  repo: Shikenso-Analytics/langchain-discord
  downloads: 1
  downloads_updated_at: '2025-02-15T16:00:00.000000+00:00'
- name: langchain-cognee
  repo: topoteretes/langchain-cognee
  path: .
- name: langchain-prolog
  path: .
  repo: apisani1/langchain-prolog