mirror of
https://github.com/hwchase17/langchain.git
synced 2025-05-06 07:38:50 +00:00
community: add cognee retriever (#29878)
This PR adds a new cognee integration: knowledge-graph-based retrieval that enables developers to ingest documents into cognee’s knowledge graph, process them, and then retrieve context via CogneeRetriever.

It includes:
- the langchain_cognee package with a CogneeRetriever class
- a test for the integration, demonstrating how to create, process, and retrieve with cognee
- an example notebook showing its use, located in the `docs/docs/integrations` directory

Followed additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.

Thank you for the review!

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
parent 97dd5f45ae
commit d8bab89e6e
27
docs/docs/integrations/providers/cognee.mdx
Normal file
@ -0,0 +1,27 @@
# Cognee

Cognee implements scalable, modular ECL (Extract, Cognify, Load) pipelines that allow
you to interconnect and retrieve past conversations, documents, and audio
transcriptions while reducing hallucinations, developer effort, and cost.

Cognee merges graph and vector databases to uncover hidden relationships and new
patterns in your data. You can automatically model, load, and retrieve entities and
objects representing your business domain and analyze their relationships, uncovering
insights that neither vector stores nor graph stores alone can provide.

Try it in a Google Colab <a href="https://colab.research.google.com/drive/1g-Qnx6l_ecHZi0IOw23rg0qC4TYvEvWZ?usp=sharing">notebook</a> or have a look at the <a href="https://docs.cognee.ai">documentation</a>.

If you have questions, join the cognee <a href="https://discord.gg/NQPKmU5CCg">Discord</a> community.

Have you seen cognee's <a href="https://github.com/topoteretes/cognee-starter">starter repo</a>? Check it out!


## Installation and Setup

```bash
pip install langchain-cognee
```

## Retrievers

See details on the available retrievers [here](/docs/integrations/retrievers/cognee).
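
As a rough sketch of how the pieces fit together (not part of the original page): the retriever calls below (`CogneeRetriever(llm_api_key=..., dataset_name=..., k=...)`, `add_documents`, `process_data`, `invoke`) mirror the example notebook added in this PR, while `TextDoc`, `make_retriever`, and `ingest_and_query` are hypothetical names introduced only for illustration. The optional `langchain_cognee` import is kept inside a function, so the helper can be exercised with any object that exposes the same three methods.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class TextDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str


def make_retriever(api_key: str, dataset_name: str = "my_dataset", k: int = 3) -> Any:
    """Build a CogneeRetriever, importing the optional dependency lazily."""
    # Assumes `pip install langchain-cognee` has been run.
    from langchain_cognee import CogneeRetriever

    return CogneeRetriever(llm_api_key=api_key, dataset_name=dataset_name, k=k)


def ingest_and_query(retriever: Any, docs: Iterable[Any], query: str) -> List[str]:
    """Add documents, process them, then return the text of each retrieved result."""
    retriever.add_documents(list(docs))  # ingest the documents into the dataset
    retriever.process_data()  # processing step, as in the notebook example
    return [doc.page_content for doc in retriever.invoke(query)]
```

With the real retriever, pass `langchain_core.documents.Document` objects rather than `TextDoc`, for example `ingest_and_query(make_retriever(api_key), docs, "Tell me about Elon Musk")`, where `api_key` holds your OpenAI API key.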
275
docs/docs/integrations/retrievers/cognee.ipynb
Normal file
@ -0,0 +1,275 @@
{
 "cells": [
  {
   "cell_type": "raw",
   "id": "afaf8039",
   "metadata": {},
   "source": [
    "---\n",
    "sidebar_label: Cognee\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e49f1e0d",
   "metadata": {},
   "source": [
    "# CogneeRetriever\n",
    "\n",
    "This will help you get started with the Cognee [retriever](/docs/concepts/retrievers). For detailed documentation of all CogneeRetriever features and configurations, head to the [API reference](https://python.langchain.com/api_reference/community/retrievers/langchain_community.retrievers.cognee.CogneeRetriever.html).\n",
    "\n",
    "### Integration details\n",
    "\n",
    "Bring-your-own data (i.e., index and search a custom corpus of documents):\n",
    "\n",
    "| Retriever | Self-host | Cloud offering | Package |\n",
    "| :--- | :--- | :---: | :---: |\n",
    "| [CogneeRetriever](https://python.langchain.com/api_reference/community/retrievers/langchain_community.retrievers.cognee.CogneeRetriever.html) | ✅ | ❌ | langchain-cognee |\n",
    "\n",
    "## Setup\n",
    "\n",
    "For the default cognee setup, the only thing you need is your OpenAI API key.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72ee0c4b-9764-423a-9dbf-95129e185210",
   "metadata": {},
   "source": [
    "If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
    "# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0730d6a1-c893-4840-9817-5e5251676d5d",
   "metadata": {},
   "source": [
    "### Installation\n",
    "\n",
    "This retriever lives in the `langchain-cognee` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "652d6238-1f87-422a-b135-f5abbb8652fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU langchain-cognee"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8bcb1e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a38cde65-254d-4219-a441-068766c0d4b5",
   "metadata": {},
   "source": [
    "## Instantiation\n",
    "\n",
    "Now we can instantiate our retriever:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70cc8e65-2a02-408a-bbc6-8ef649057d82",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_cognee import CogneeRetriever\n",
    "\n",
    "retriever = CogneeRetriever(\n",
    "    llm_api_key=\"sk-\",  # OpenAI API Key\n",
    "    dataset_name=\"my_dataset\",\n",
    "    k=3,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c5f2839-4020-424e-9fc9-07777eede442",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "Add some documents, process them, and then run queries. Cognee retrieves knowledge relevant to your queries and generates final answers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51a60dbe-9f2e-4e04-bb62-23968f17164a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of adding and processing documents\n",
    "from langchain_core.documents import Document\n",
    "\n",
    "docs = [\n",
    "    Document(page_content=\"Elon Musk is the CEO of SpaceX.\"),\n",
    "    Document(page_content=\"SpaceX focuses on rockets and space travel.\"),\n",
    "]\n",
    "\n",
    "retriever.add_documents(docs)\n",
    "retriever.process_data()\n",
    "\n",
    "# Now let's query the retriever\n",
    "query = \"Tell me about Elon Musk\"\n",
    "results = retriever.invoke(query)\n",
    "\n",
    "for idx, doc in enumerate(results, start=1):\n",
    "    print(f\"Doc {idx}: {doc.page_content}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e",
   "metadata": {},
   "source": [
    "## Use within a chain\n",
    "\n",
    "Like other retrievers, CogneeRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n",
    "\n",
    "We will need an LLM or chat model:\n",
    "\n",
    "import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
    "\n",
    "<ChatModelTabs customVarName=\"llm\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25b647a3-f8f2-4541-a289-7a241e43f9df",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-4o-mini\", temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_cognee import CogneeRetriever\n",
    "from langchain_core.documents import Document\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "\n",
    "# Instantiate the retriever with your Cognee config\n",
    "retriever = CogneeRetriever(llm_api_key=\"sk-\", dataset_name=\"my_dataset\", k=3)\n",
    "\n",
    "# Optionally, prune/reset the dataset for a clean slate\n",
    "retriever.prune()\n",
    "\n",
    "# Add some documents\n",
    "docs = [\n",
    "    Document(page_content=\"Elon Musk is the CEO of SpaceX.\"),\n",
    "    Document(page_content=\"SpaceX focuses on space travel.\"),\n",
    "]\n",
    "retriever.add_documents(docs)\n",
    "retriever.process_data()\n",
    "\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\n",
    "    \"\"\"Answer the question based only on the context provided.\n",
    "\n",
    "Context: {context}\n",
    "\n",
    "Question: {question}\"\"\"\n",
    ")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d47c37dd-5c11-416c-a3b6-bec413cd70e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "answer = chain.invoke(\"What companies does Elon Musk own?\")\n",
    "\n",
    "print(\"\\nFinal chain answer:\\n\", answer)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
   "metadata": {},
   "source": [
    "## API reference\n",
    "\n",
    "TODO: add link to API reference."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8dbdd72",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-cognee-wqM4bUfz-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
@ -445,6 +445,9 @@ packages:
    repo: Shikenso-Analytics/langchain-discord
    downloads: 1
    downloads_updated_at: '2025-02-15T16:00:00.000000+00:00'
  - name: langchain-cognee
    repo: topoteretes/langchain-cognee
    path: .
  - name: langchain-prolog
    path: .
    repo: apisani1/langchain-prolog