
community: add cognee retriever ()

This PR adds a new cognee integration: knowledge-graph-based retrieval
that lets developers ingest documents into cognee's knowledge graph,
process them, and then retrieve context via CogneeRetriever (a minimal
usage sketch follows the list below).
It includes:
- the langchain_cognee package with a CogneeRetriever class
- a test for the integration, demonstrating how to create, process, and
retrieve with cognee
- an example notebook showing its use, which lives in the
`docs/docs/integrations` directory.
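
A minimal sketch of the intended workflow (names and parameters as used in
the included example notebook):

```python
from langchain_cognee import CogneeRetriever
from langchain_core.documents import Document

# Ingest documents into cognee's knowledge graph, process them, then retrieve context
retriever = CogneeRetriever(
    llm_api_key="sk-...",  # replace with your OpenAI API key
    dataset_name="my_dataset",
    k=3,
)
retriever.add_documents([Document(page_content="Elon Musk is the CEO of SpaceX.")])
retriever.process_data()

results = retriever.invoke("Tell me about Elon Musk")
for doc in results:
    print(doc.page_content)
```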


Followed additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

Thank you for the review!

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
Authored by Hande on 2025-02-20 20:15:23 +03:00, committed by GitHub
parent 97dd5f45ae
commit d8bab89e6e
3 changed files with 305 additions and 0 deletions
Changed paths: docs/docs/integrations/providers, docs/docs/integrations/retrievers, libs


@@ -0,0 +1,27 @@
# Cognee

Cognee implements scalable, modular ECL (Extract, Cognify, Load) pipelines that allow
you to interconnect and retrieve past conversations, documents, and audio
transcriptions while reducing hallucinations, developer effort, and cost.

Cognee merges graph and vector databases to uncover hidden relationships and new
patterns in your data. You can automatically model, load, and retrieve entities and
objects representing your business domain and analyze their relationships, uncovering
insights that neither vector stores nor graph stores alone can provide.

Try it in a Google Colab <a href="https://colab.research.google.com/drive/1g-Qnx6l_ecHZi0IOw23rg0qC4TYvEvWZ?usp=sharing">notebook</a> or have a look at the <a href="https://docs.cognee.ai">documentation</a>.

If you have questions, join the cognee <a href="https://discord.gg/NQPKmU5CCg">Discord</a> community.

Have you seen cognee's <a href="https://github.com/topoteretes/cognee-starter">starter repo</a>? Check it out!
## Installation and Setup

```bash
pip install langchain-cognee
```

## Retrievers

See details on available retrievers [here](/docs/integrations/retrievers/cognee).
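
For example, import the retriever as shown in the example notebook:

```python
from langchain_cognee import CogneeRetriever
```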


@@ -0,0 +1,275 @@
{
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Cognee\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# CogneeRetriever\n",
"\n",
"This will help you getting started with the Cognee [retriever](/docs/concepts/retrievers). For detailed documentation of all CogneeRetriever features and configurations head to the [API reference](https://python.langchain.com/api_reference/community/retrievers/langchain_community.retrievers.cognee.CogneeRetriever.html).\n",
"\n",
"### Integration details\n",
"\n",
"Bring-your-own data (i.e., index and search a custom corpus of documents):\n",
"\n",
"| Retriever | Self-host | Cloud offering | Package |\n",
"| :--- | :--- | :---: | :---: |\n",
"[CogneeRetriever](https://python.langchain.com/api_reference/community/retrievers/langchain_community.retrievers.cognee.CogneeRetriever.html) | ✅ | ❌ | langchain-cognee |\n",
"\n",
"## Setup\n",
"\n",
"For cognee default setup, only thing you need is your OpenAI API key. \n"
]
},
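{
"cell_type": "markdown",
"id": "f2a4c6e8",
"metadata": {},
"source": [
"For example, you can provide the key as an environment variable (a minimal sketch; whether cognee reads `OPENAI_API_KEY` automatically is an assumption, and you can also pass the key directly to `CogneeRetriever` via `llm_api_key` below):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d5e6f7",
"metadata": {},
"outputs": [],
"source": [
"# Assumption: the key is read from the OPENAI_API_KEY environment variable;\n",
"# alternatively, pass it to CogneeRetriever(llm_api_key=...) as shown later.\n",
"import getpass\n",
"import os\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
"    os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
]
},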
{
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"metadata": {},
"source": [
"If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "0730d6a1-c893-4840-9817-5e5251676d5d",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"This retriever lives in the `langchain-cognee` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-cognee"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8bcb1e7",
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our retriever:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70cc8e65-2a02-408a-bbc6-8ef649057d82",
"metadata": {},
"outputs": [],
"source": [
"from langchain_cognee import CogneeRetriever\n",
"\n",
"retriever = CogneeRetriever(\n",
" llm_api_key=\"sk-\", # OpenAI API Key\n",
" dataset_name=\"my_dataset\",\n",
" k=3,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5c5f2839-4020-424e-9fc9-07777eede442",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"Add some documents, process them, and then run queries. Cognee retrieves relevant knowledge to your queries and generates final answers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51a60dbe-9f2e-4e04-bb62-23968f17164a",
"metadata": {},
"outputs": [],
"source": [
"# Example of adding and processing documents\n",
"from langchain_core.documents import Document\n",
"\n",
"docs = [\n",
" Document(page_content=\"Elon Musk is the CEO of SpaceX.\"),\n",
" Document(page_content=\"SpaceX focuses on rockets and space travel.\"),\n",
"]\n",
"\n",
"retriever.add_documents(docs)\n",
"retriever.process_data()\n",
"\n",
"# Now let's query the retriever\n",
"query = \"Tell me about Elon Musk\"\n",
"results = retriever.invoke(query)\n",
"\n",
"for idx, doc in enumerate(results, start=1):\n",
" print(f\"Doc {idx}: {doc.page_content}\")"
]
},
{
"cell_type": "markdown",
"id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e",
"metadata": {},
"source": [
"## Use within a chain\n",
"\n",
"Like other retrievers, CogneeRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n",
"\n",
"We will need a LLM or chat model:\n",
"\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
"\n",
"<ChatModelTabs customVarName=\"llm\" />"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25b647a3-f8f2-4541-a289-7a241e43f9df",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae",
"metadata": {},
"outputs": [],
"source": [
"from langchain_cognee import CogneeRetriever\n",
"from langchain_core.documents import Document\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Instantiate the retriever with your Cognee config\n",
"retriever = CogneeRetriever(llm_api_key=\"sk-\", dataset_name=\"my_dataset\", k=3)\n",
"\n",
"# Optionally, prune/reset the dataset for a clean slate\n",
"retriever.prune()\n",
"\n",
"# Add some documents\n",
"docs = [\n",
" Document(page_content=\"Elon Musk is the CEO of SpaceX.\"),\n",
" Document(page_content=\"SpaceX focuses on space travel.\"),\n",
"]\n",
"retriever.add_documents(docs)\n",
"retriever.process_data()\n",
"\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Answer the question based only on the context provided.\n",
"\n",
"Context: {context}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"chain = (\n",
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d47c37dd-5c11-416c-a3b6-bec413cd70e8",
"metadata": {},
"outputs": [],
"source": [
"answer = chain.invoke(\"What companies do Elon Musk own?\")\n",
"\n",
"print(\"\\nFinal chain answer:\\n\", answer)"
]
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"TODO: add link to API reference."
]
},
{
"cell_type": "markdown",
"id": "a8dbdd72",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain-cognee-wqM4bUfz-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -445,6 +445,9 @@ packages:
  repo: Shikenso-Analytics/langchain-discord
  downloads: 1
  downloads_updated_at: '2025-02-15T16:00:00.000000+00:00'
- name: langchain-cognee
  repo: topoteretes/langchain-cognee
  path: .
- name: langchain-prolog
  path: .
  repo: apisani1/langchain-prolog