feat(GraphRAG): enhance GraphRAG by graph community summary (#1801)

Co-authored-by: Florian <fanzhidongyzby@163.com>
Co-authored-by: KingSkyLi <15566300566@163.com>
Co-authored-by: aries_ckt <916701291@qq.com>
Co-authored-by: Fangyin Cheng <staneyffer@gmail.com>
Co-authored-by: yvonneyx <zhuyuxin0627@gmail.com>
M1n9X
2024-08-30 21:59:44 +08:00
committed by GitHub
parent 471689ba20
commit 759f7d99cc
59 changed files with 29316 additions and 411 deletions

@@ -0,0 +1,208 @@
"""CommunitySummarizer class."""

import logging

from dbgpt.core import LLMClient
from dbgpt.rag.transformer.llm_summarizer import LLMSummarizer

logger = logging.getLogger(__name__)


class CommunitySummarizer(LLMSummarizer):
    """CommunitySummarizer class."""

    def __init__(self, llm_client: LLMClient, model_name: str):
        """Initialize the CommunitySummarizer."""
        # The prompt constant is defined later in this module; the name is
        # resolved when __init__ runs, not when the class is defined.
        super().__init__(llm_client, model_name, COMMUNITY_SUMMARY_PT_CN)


COMMUNITY_SUMMARY_PT_CN = (
    "## 角色\n"
    "你非常擅长知识图谱的信息总结,能根据给定的知识图谱中的实体和关系的名称以及描述"
    "信息,全面、恰当地对知识图谱子图信息做出总结性描述,并且不会丢失关键的信息。\n"
    "\n"
    "## 技能\n"
    "### 技能 1: 实体识别\n"
    "- 准确地识别[Entities:]章节中的实体信息,包括实体名、实体描述信息。\n"
    "- 实体信息的一般格式有:\n"
    "(实体名)\n"
    "(实体名:实体描述)\n"
    "(实体名:实体属性表)\n"
    "(文本块ID:文档块内容)\n"
    "(目录ID:目录名)\n"
    "(文档ID:文档名称)\n"
    "\n"
    "### 技能 2: 关系识别\n"
    "- 准确地识别[Relationships:]章节中的关系信息,包括来源实体名、关系名、"
    "目标实体名、关系描述信息。实体名也可能是文档ID、目录ID、文本块ID。\n"
    "- 关系信息的一般格式有:\n"
    "(来源实体名)-[关系名]->(目标实体名)\n"
    "(来源实体名)-[关系名:关系描述]->(目标实体名)\n"
    "(来源实体名)-[关系名:关系属性表]->(目标实体名)\n"
    "(文本块ID)-[包含]->(实体名)\n"
    "(目录ID)-[包含]->(文本块ID)\n"
    "(目录ID)-[包含]->(子目录ID)\n"
    "(文档ID)-[包含]->(文本块ID)\n"
    "(文档ID)-[包含]->(目录ID)\n"
    "\n"
    "### 技能 3: 图结构理解\n"
    "--请按照如下步骤理解图结构--\n"
    "1. 正确地将关系信息中的来源实体名与实体信息关联。\n"
    "2. 正确地将关系信息中的目标实体名与实体信息关联。\n"
    "3. 根据提供的关系信息还原出图结构。\n"
    "\n"
    "### 技能 4: 知识图谱总结\n"
    "--请按照如下步骤总结知识图谱--\n"
    "1. 确定知识图谱表达的主题或话题,突出关键实体和关系。\n"
    "2. 使用准确、恰当、简洁的语言总结图结构表达的信息,不要生成与图结构中无关的信息。\n"
    "\n"
    "## 约束条件\n"
    "- 不要在答案中描述你的思考过程,直接给出用户问题的答案,不要生成无关信息。\n"
    "- 确保以第三人称书写,从客观角度对知识图谱表达的信息进行总结性描述。\n"
    "- 如果实体或关系的描述信息为空,对最终的总结信息没有贡献,不要生成无关信息。\n"
    "- 如果提供的描述信息相互矛盾,请解决矛盾并提供一个单一、连贯的描述。\n"
    "- 避免使用停用词和过于常见的词汇。\n"
    "\n"
    "## 参考案例\n"
    "--案例仅帮助你理解提示词的输入和输出格式,请不要在答案中使用它们。--\n"
    "输入:\n"
    "```\n"
    "Entities:\n"
    "(菲尔・贾伯#菲尔兹咖啡创始人)\n"
    "(菲尔兹咖啡#加利福尼亚州伯克利创立的咖啡品牌)\n"
    "(雅各布・贾伯#菲尔・贾伯的儿子)\n"
    "(美国多地#菲尔兹咖啡的扩展地区)\n"
    "\n"
    "Relationships:\n"
    "(菲尔・贾伯#创建#菲尔兹咖啡#1978年在加利福尼亚州伯克利创立)\n"
    "(菲尔兹咖啡#位于#加利福尼亚州伯克利#菲尔兹咖啡的创立地点)\n"
    "(菲尔・贾伯#拥有#雅各布・贾伯#菲尔・贾伯的儿子)\n"
    "(雅各布・贾伯#担任#首席执行官#在2005年成为菲尔兹咖啡的首席执行官)\n"
    "(菲尔兹咖啡#扩展至#美国多地#菲尔兹咖啡的扩展范围)\n"
    "```\n"
    "\n"
    "输出:\n"
    "```\n"
    "菲尔兹咖啡是由菲尔・贾伯在1978年于加利福尼亚州伯克利创立的咖啡品牌。"
    "菲尔・贾伯的儿子雅各布・贾伯在2005年接任首席执行官,领导公司扩展到了美国多地,"
    "进一步巩固了菲尔兹咖啡作为加利福尼亚州伯克利创立的咖啡品牌的市场地位。\n"
    "```\n"
    "\n"
    "----\n"
    "\n"
    "请根据接下来[知识图谱]提供的信息,按照上述要求,总结知识图谱表达的信息。\n"
    "\n"
    "[知识图谱]:\n"
    "{graph}\n"
    "\n"
    "[总结]:\n"
    "\n"
)

COMMUNITY_SUMMARY_PT_EN = (
    "## Role\n"
    "You are highly skilled in summarizing information from knowledge graphs. "
    "Based on the names and descriptions of entities and relationships in a "
    "given knowledge graph, you can comprehensively and appropriately summarize"
    " the information of the subgraph without losing critical details.\n"
    "\n"
    "## Skills\n"
    "### Skill 1: Entity Recognition\n"
    "- Accurately recognize entity information in the [Entities:] section, "
    "including entity names and descriptions.\n"
    "- The general formats for entity information are:\n"
    "(entity_name)\n"
    "(entity_name: entity_description)\n"
    "(entity_name: entity_property_map)\n"
    "(chunk_id: chunk_content)\n"
    "(catalog_id: catalog_name)\n"
    "(document_id: document_name)\n"
    "\n"
    "### Skill 2: Relationship Recognition\n"
    "- Accurately recognize relationship information in the [Relationships:] "
    "section, including source_entity_name, relationship_name, "
    "target_entity_name, and relationship_description. The entity_name may "
    "also be the document_id, catalog_id, or chunk_id.\n"
    "- The general formats for relationship information are:\n"
    "(source_entity_name)-[relationship_name]->(target_entity_name)\n"
    "(source_entity_name)-[relationship_name: relationship_description]->"
    "(target_entity_name)\n"
    "(source_entity_name)-[relationship_name: relationship_property_map]->"
    "(target_entity_name)\n"
    "(chunk_id)-[Contains]->(entity_name)\n"
    "(catalog_id)-[Contains]->(chunk_id)\n"
    "(catalog_id)-[Contains]->(sub_catalog_id)\n"
    "(document_id)-[Contains]->(chunk_id)\n"
    "(document_id)-[Contains]->(catalog_id)\n"
    "\n"
    "### Skill 3: Graph Structure Understanding\n"
    "--Follow these steps to understand the graph structure--\n"
    "1. Correctly associate the source entity name in the "
    "relationship information with the entity information.\n"
    "2. Correctly associate the target entity name in the "
    "relationship information with the entity information.\n"
    "3. Reconstruct the graph structure based on the provided "
    "relationship information.\n"
    "\n"
    "### Skill 4: Knowledge Graph Summarization\n"
    "--Follow these steps to summarize the knowledge graph--\n"
    "1. Determine the theme or topic expressed by the knowledge graph, "
    "highlighting key entities and relationships.\n"
    "2. Use accurate, appropriate, and concise language to summarize "
    "the information expressed by the graph "
    "without generating irrelevant information.\n"
    "\n"
    "## Constraints\n"
    "- Do not describe your thought process in the answer; answer the "
    "user's question directly without generating irrelevant information.\n"
    "- Ensure the summary is written in the third person and objectively "
    "reflects the information conveyed by the knowledge graph.\n"
    "- If the descriptions of entities or relationships are empty and "
    "contribute nothing to the final summary, "
    "do not generate unrelated information.\n"
    "- If the provided descriptions are contradictory, resolve the conflicts "
    "and provide a single, coherent description.\n"
    "- Avoid using stop words and overly common words.\n"
    "\n"
    "## Reference Example\n"
    "--The example only illustrates the input and output format of "
    "the prompt; do not use it in your answer.--\n"
    "Input:\n"
    "```\n"
    "Entities:\n"
    "(Phil Jaber#Founder of Philz Coffee)\n"
    "(Philz Coffee#Coffee brand founded in Berkeley, California)\n"
    "(Jacob Jaber#Son of Phil Jaber)\n"
    "(Multiple locations in the USA#Expansion regions of Philz Coffee)\n"
    "\n"
    "Relationships:\n"
    "(Phil Jaber#Created#Philz Coffee"
    "#Founded in Berkeley, California in 1978)\n"
    "(Philz Coffee#Located in#Berkeley, California"
    "#Founding location of Philz Coffee)\n"
    "(Phil Jaber#Has#Jacob Jaber#Son of Phil Jaber)\n"
    "(Jacob Jaber#Serves as#CEO#Became CEO of Philz Coffee in 2005)\n"
    "(Philz Coffee#Expanded to#Multiple locations in the USA"
    "#Expansion regions of Philz Coffee)\n"
    "```\n"
    "\n"
    "Output:\n"
    "```\n"
    "Philz Coffee is a coffee brand founded by Phil Jaber in 1978 in "
    "Berkeley, California. Phil Jaber's son, Jacob Jaber, took over as CEO in "
    "2005, leading the company to expand to multiple locations in the USA, "
    "further solidifying Philz Coffee's market position as a coffee brand "
    "founded in Berkeley, California.\n"
    "```\n"
    "\n"
    "----\n"
    "\n"
    "Please summarize the information expressed by the [KnowledgeGraph] "
    "provided in the following section according to the above requirements.\n"
    "\n"
    "[KnowledgeGraph]:\n"
    "{graph}\n"
    "\n"
    "[Summary]:\n"
    "\n"
)
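
Note on the `{graph}` placeholder: both templates end with a `{graph}` field that receives the serialized community subgraph. The sketch below is only an illustration of how such a field might be filled. It renders entities and relationships in the `#`-delimited layout of the reference example and substitutes the result with `str.format`. `PROMPT_TAIL` and `format_graph` are hypothetical names introduced here, and the use of `str.format` is an assumption, since this commit does not show how `LLMSummarizer` applies the template.

```python
from typing import List, Tuple

# Hypothetical stand-in for the tail of COMMUNITY_SUMMARY_PT_EN;
# only the {graph} placeholder matters for this illustration.
PROMPT_TAIL = "[KnowledgeGraph]:\n{graph}\n\n[Summary]:\n\n"


def format_graph(
    entities: List[Tuple[str, str]],
    relationships: List[Tuple[str, str, str, str]],
) -> str:
    """Render a subgraph in the '#'-delimited layout of the reference example."""
    # One "(name#description)" line per entity.
    entity_lines = [f"({name}#{desc})" for name, desc in entities]
    # One "(source#relation#target#description)" line per relationship.
    relation_lines = [
        f"({src}#{rel}#{dst}#{desc})" for src, rel, dst, desc in relationships
    ]
    return "\n".join(
        ["Entities:", *entity_lines, "", "Relationships:", *relation_lines]
    )


graph_text = format_graph(
    entities=[("Phil Jaber", "Founder of Philz Coffee")],
    relationships=[("Phil Jaber", "Created", "Philz Coffee", "Founded in 1978")],
)
# Assumption: the placeholder is filled with str.format.
prompt = PROMPT_TAIL.format(graph=graph_text)
print(prompt)
```

If the template really is filled with `str.format`, any literal `{` or `}` added to the prompt text later would need to be doubled, which is one reason to keep the templates free of other braces.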