diff --git a/README.md b/README.md
index 954bba8d9..6fcef0518 100644
--- a/README.md
+++ b/README.md
@@ -33,42 +33,71 @@

-[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
+[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**微信**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
 ## What is DB-GPT?
-DB-GPT is an open-source framework designed for the realm of large language models (LLMs) within the database field. Its primary purpose is to provide infrastructure that simplifies and streamlines the development of database-related applications. This is accomplished through the development of various technical capabilities, including:
+DB-GPT is an open-source, data-domain large model framework. Its purpose is to build the infrastructure for the large model domain by developing a variety of technical capabilities, including multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration. These capabilities aim to simplify and facilitate the construction of large model applications around databases.
-1. **SMMF(Service-oriented Multi-model Management Framework)**
-2. **Text2SQL Fine-tuning**
-3. **RAG(Retrieval Augmented Generation) framework and optimization**
-4. **Data-Driven Agents framework collaboration**
-5. **GBI(Generative Business intelligence)**
-
-DB-GPT simplifies the creation of these applications based on large language models (LLMs) and databases.
-
-In the era of Data 3.0, enterprises and developers can take the ability to create customized applications with minimal coding, which harnesses the power of large language models (LLMs) and databases.
+In the Data 3.0 era, based on models and databases, enterprises and developers can build their own bespoke applications with less code.
+### Data Agents
+![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)
 ## Contents
-- [Install](#install)
-- [Demo](#demo)
 - [Introduction](#introduction)
+- [Install](#install)
 - [Features](#features)
 - [Contribution](#contribution)
-- [Roadmap](#roadmap)
 - [Contact](#contact-information)
-[DB-GPT Youtube Video](https://www.youtube.com/watch?v=f5_g0OObZBQ)
+## Introduction
+The architecture of DB-GPT is shown in the following figure:
-## Demo
-##### Chat Data
-![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
+

+ +

-##### Chat Excel
-![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
+The core capabilities include the following parts:
+
+- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed capability. DB-GPT already implements a RAG-based framework, so users can build knowledge-based applications on top of DB-GPT's RAG capabilities.
+
+- **GBI (Generative Business Intelligence)**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology for building enterprise report analysis and business insights.
+
+- **Fine-tuning Framework**: Model fine-tuning is indispensable for any enterprise deploying models in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. Recent fine-tuning runs reached 82.5% accuracy on the Spider dataset.
+
+- **Data-Driven Multi-Agents Framework**: DB-GPT offers a data-driven, self-evolving fine-tuning framework that aims to make decisions and act on data continuously.
+
+- **Data Factory**: The Data Factory cleans and processes trustworthy knowledge and data for the era of large models.
+
+- **Data Sources**: Integrates various data sources so that production business data connects seamlessly to the core capabilities of DB-GPT.
+
+### SubModule
+- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Text-to-SQL workflow with high performance by applying Supervised Fine-Tuning (SFT) on Large Language Models (LLMs).
+
+#### Text2SQL Finetune
+- support llms
+  - [x] LLaMA
+  - [x] LLaMA-2
+  - [x] BLOOM
+  - [x] BLOOMZ
+  - [x] Falcon
+  - [x] Baichuan
+  - [x] Baichuan2
+  - [x] InternLM
+  - [x] Qwen
+  - [x] XVERSE
+  - [x] ChatGLM2
+
+- SFT Accuracy
+As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
+
+[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
+
+- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT Plugins that can run Auto-GPT plugin directly
+- [GPT-Vis](https://github.com/eosphoros-ai/GPT-Vis) Visualization protocol
 ## Install
 ![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -120,26 +149,7 @@ At present, we have introduced several key features to showcase our current capa
 - Support Datasources
   - [Datasources](http://docs.dbgpt.site/docs/modules/connections)
-## Introduction
-The architecture of DB-GPT is shown in the following figure:
-

- -

-
-The core capabilities primarily consist of the following components:
-1. Multi-Models: We support multiple Large Language Models (LLMs) such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, QWen, Vicuna, and proxy models like ChatGPT, Baichuan, Tongyi, Wenxin, and more.
-2. Knowledge-Based QA: Our system enables high-quality intelligent Q&A based on local documents such as PDFs, Word documents, Excel files, and other data sources.
-3. Embedding: We offer unified data vector storage and indexing. Data is embedded as vectors and stored in vector databases, allowing for content similarity search.
-4. Multi-Datasources: This feature connects different modules and data sources, facilitating data flow and interaction.
-5. Multi-Agents: Our platform provides Agent and plugin mechanisms, empowering users to customize and enhance the system's behaviour.
-6. Privacy & Security: Rest assured that there is no risk of data leakage, and your data is 100% private and secure.
-7. Text2SQL: We enhance Text-to-SQL performance through Supervised Fine-Tuning (SFT) applied to Large Language Models (LLMs).
-
-### SubModule
-- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Text-to-SQL workflow with high performance by applying Supervised Fine-Tuning (SFT) on Large Language Models (LLMs).
-- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT Plugins that can run Auto-GPT plugin directly
-- [DB-GPT-Web](https://github.com/eosphoros-ai/DB-GPT-Web) ChatUI for DB-GPT
 ## Image
 🌐 [AutoDL Image](https://www.codewithgpu.com/i/eosphoros-ai/DB-GPT/dbgpt)
@@ -151,106 +161,8 @@ The core capabilities primarily consist of the following components:
 ## Contribution
 - Please run `black .` before submitting the code.
-- To check detailed guidelines for new contributions, please refer to [how to contribute](https://github.com/csunny/DB-GPT/blob/main/CONTRIBUTING.md)
+- To check detailed guidelines for new contributions, please refer to [how to contribute](https://github.com/eosphoros-ai/DB-GPT/blob/main/CONTRIBUTING.md)
-## RoadMap
-
-

- -

-
-### KBQA RAG optimization
-- [x] Multi Documents
-  - [x] PDF
-  - [x] Excel, CSV
-  - [x] Word
-  - [x] Text
-  - [x] MarkDown
-  - [ ] Code
-  - [ ] Images
-
-- [x] RAG
-- [ ] Graph Database
-  - [ ] Neo4j Graph
-  - [ ] Nebula Graph
-- [x] Multi-Vector Database
-  - [x] Chroma
-  - [x] Milvus
-  - [x] Weaviate
-  - [x] PGVector
-  - [ ] Elasticsearch
-  - [ ] ClickHouse
-  - [ ] Faiss
-
-- [ ] Testing and Evaluation Capability Building
-  - [ ] Knowledge QA datasets
-  - [ ] Question collection [easy, medium, hard]:
-  - [ ] Scoring mechanism
-  - [ ] Testing and evaluation using Excel + DB datasets
-
-### Multi Datasource Support
-
-- Multi Datasource Support
-  - [x] MySQL
-  - [x] PostgreSQL
-  - [x] Spark
-  - [x] DuckDB
-  - [x] Sqlite
-  - [x] MSSQL
-  - [x] ClickHouse
-  - [ ] Oracle
-  - [ ] Redis
-  - [ ] MongoDB
-  - [ ] HBase
-  - [x] Doris
-  - [ ] DB2
-  - [ ] Couchbase
-  - [ ] Elasticsearch
-  - [ ] OceanBase
-  - [ ] TiDB
-  - [ ] StarRocks
-
-### Multi-Models And vLLM
-- [x] [Cluster Deployment](https://docs.dbgpt.site/docs/installation/model_service/cluster)
-- [x] [Fastchat Support](https://github.com/lm-sys/FastChat)
-- [x] [vLLM Support](https://docs.dbgpt.site/docs/installation/advanced_usage/vLLM_inference)
-- [ ] Cloud-native environment and support for Ray environment
-- [ ] Service Registry(eg:nacos)
-- [ ] Compatibility with OpenAI's interfaces
-- [ ] Expansion and optimization of embedding models
-
-### Agents market and Plugins
-- [x] multi-agents framework
-- [x] custom plugin development
-- [x] plugin market
-- [ ] Integration with CoT
-- [ ] Enrich plugin sample library
-- [ ] Support for AutoGPT protocol
-- [ ] Integration of multi-agents and visualization capabilities, defining LLM+Vis new standards
-
-### Cost and Observability
-- [x] [debugging](https://docs.dbgpt.site/docs/application_manual/advanced_tutorial/debugging)
-- [ ] Observability
-- [ ] cost & budgets
-
-### Text2SQL Finetune
-- support llms
-  - [x] LLaMA
-  - [x] LLaMA-2
-  - [x] BLOOM
-  - [x] BLOOMZ
-  - [x] Falcon
-  - [x] Baichuan
-  - [x] Baichuan2
-  - [x] InternLM
-  - [x] Qwen
-  - [x] XVERSE
-  - [x] ChatGLM2
-
-- SFT Accuracy
-As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
-
-[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
 ## Licence
 The MIT License (MIT)
@@ -272,8 +184,4 @@ If you find `DB-GPT` useful for your research or development, please cite the fo
 We are working on building a community, if you have any ideas for building the community, feel free to contact us.
 [![](https://dcbadge.vercel.app/api/server/7uQnPuveTY?compact=true&style=flat)](https://discord.gg/7uQnPuveTY)
-

- -

 [![Star History Chart](https://api.star-history.com/svg?repos=csunny/DB-GPT&type=Date)](https://star-history.com/#csunny/DB-GPT)
diff --git a/README.zh.md b/README.zh.md
index 2696080e8..f34db33c5 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -8,19 +8,19 @@

-stars
+stars
-forks
+forks
 License: MIT
-Release Notes
+Release Notes
-Open Issues
+Open Issues
 Discord
@@ -33,39 +33,56 @@

-[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx) | [**微信**](https://github.com/csunny/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
+[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx) | [**微信**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
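 ## What is DB-GPT?
-DB-GPT is an open-source large model framework for the database domain. Its purpose is to build the infrastructure for the large model domain by developing a variety of technical capabilities, including multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration, making it simpler and more convenient to build large model applications around databases.
-
+DB-GPT is an open-source, data-domain large model framework. Its purpose is to build the infrastructure for the large model domain by developing a variety of technical capabilities, including multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration, making it simpler and more convenient to build large model applications around databases.
 In the Data 3.0 era, based on models and databases, enterprises and developers can build their own bespoke applications with less code.
-## Contents
+## Demo
-- [Install](#安装)
-- [Demo](#效果演示)
+### Data Agents
+![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)
+
+
+## Contents
 - [Architecture](#架构方案)
+- [Install](#安装)
 - [Features](#特性一览)
 - [Contribution](#贡献)
 - [Roadmap](#路线图)
 - [Contact Us](#联系我们)
-[DB-GPT video introduction](https://www.bilibili.com/video/BV1au41157bj/?spm_id_from=333.337.search-card.all.click&vd_source=7792e22c03b7da3c556a450eb42c8a0f)
+## Architecture
-## Demo
-
-##### Chat Data
-![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
-
-##### Chat Excel
-![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
-
-#### Generate analysis charts from natural-language conversation
-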

- +

+

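+The core capabilities include the following parts:
+- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed capability. DB-GPT already implements a RAG-based framework, so users can build knowledge-based applications on top of DB-GPT's RAG capabilities.
+
+- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology for building enterprise report analysis and business insights.
+
+- **Fine-tuning Framework**: Model fine-tuning is indispensable for any enterprise deploying models in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. In recent fine-tuning runs, accuracy on Spider reached 82.5%.
+
+- **Data-Driven Multi-Agents Framework**: DB-GPT offers a data-driven, self-evolving fine-tuning framework that aims to make decisions and act on data continuously.
+
+- **Data Factory**: The Data Factory cleans and processes trustworthy knowledge and data for the era of large models.
+
+- **Data Sources**: Integrates various data sources so that production business data connects seamlessly to the core capabilities of DB-GPT.
+
+### RAG Production Deployment Architecture
+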

+ +

+
+### Submodules
+- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Continuously improves Text2SQL performance through fine-tuning
+- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugin repository, compatible with Auto-GPT
+- [GPT-Vis](https://github.com/eosphoros-ai/DB-GPT-Web) Visualization protocol
+
 ## Install
 ![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -84,7 +101,7 @@ DB-GPT是一个开源的数据库领域大模
   - [**Chat Excel**](https://www.yuque.com/eosphoros/dbgpt-docs/prugoype0xd2g4bb)
   - [**Chat Database**](https://www.yuque.com/eosphoros/dbgpt-docs/wswpv3zcm2c9snmg)
   - [**Report Analysis**](https://www.yuque.com/eosphoros/dbgpt-docs/vsv49p33eg4p5xc1)
-  - [**Plugins**](https://www.yuque.com/eosphoros/dbgpt-docs/pom41m7oqtdd57hm)
+  - [**Agents**](https://www.yuque.com/eosphoros/dbgpt-docs/pom41m7oqtdd57hm)
 - [**Model Service Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/vubxiv9cqed5mc6o)
   - [**Standalone Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/kwg1ed88lu5fgawb)
   - [**Cluster Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/gmbp9619ytyn2v1s)
@@ -137,34 +154,6 @@ DB-GPT是一个开源的数据库领域大模
 - [Supported Datasources](https://www.yuque.com/eosphoros/dbgpt-docs/rc4r27ybmdwg9472)
-## Architecture
-The overall architecture of DB-GPT is shown in the following figure:
-

- -

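-
-The core capabilities include the following parts:
-- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed capability. DB-GPT already implements a RAG-based framework, so users can build knowledge-based applications on top of DB-GPT's RAG capabilities.
-
-- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology for building enterprise report analysis and business insights.
-
-- **Fine-tuning Framework**: Model fine-tuning is indispensable for any enterprise deploying models in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. In recent fine-tuning runs, accuracy on Spider reached 82.5%.
-
-- **Data-Driven Multi-Agents Framework**: DB-GPT offers a data-driven, self-evolving fine-tuning framework that aims to make decisions and act on data continuously.
-
-- **Data Factory**: The Data Factory cleans and processes trustworthy knowledge and data for the era of large models.
-
-- **Data Sources**: Integrates various data sources so that production business data connects seamlessly to the core capabilities of DB-GPT.
-
-### RAG Production Deployment Architecture
-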

- -

-
-### Submodules
-- [DB-GPT-Hub](https://github.com/csunny/DB-GPT-Hub) Continuously improves Text2SQL performance through fine-tuning
-- [DB-GPT-Plugins](https://github.com/csunny/DB-GPT-Plugins) DB-GPT plugin repository, compatible with Auto-GPT
-- [DB-GPT-Web](https://github.com/csunny/DB-GPT-Web) Multi-platform interactive front-end UI
 ## Image
@@ -180,7 +169,11 @@ DB-GPT是一个开源的数据库领域大模
 ### Multi-Model Usage
-[Usage Guide](https://www.yuque.com/eosphoros/dbgpt-docs/huzgcf2abzvqy8uv)
+- [Usage Guide](https://www.yuque.com/eosphoros/dbgpt-docs/huzgcf2abzvqy8uv)
+
+### Data Agents Usage
+
+- [Data Agents](https://www.yuque.com/eosphoros/dbgpt-docs/gwz4rayfuwz78fbq)
 # Contribution
 > Please run `black .` before submitting code.
@@ -193,10 +186,6 @@ The MIT License (MIT)
 # Roadmap
-

- -

-
-### Knowledge Base RAG Retrieval Optimization
 - [x] Multi Documents
diff --git a/assets/dbgpt.png b/assets/dbgpt.png
new file mode 100644
index 000000000..e99be8d9a
Binary files /dev/null and b/assets/dbgpt.png differ
diff --git a/assets/roadmap.jpg b/assets/roadmap.jpg
deleted file mode 100644
index 4b845dd75..000000000
Binary files a/assets/roadmap.jpg and /dev/null differ
diff --git a/assets/wechat.jpg b/assets/wechat.jpg
index e29d4f7ea..06beffc13 100644
Binary files a/assets/wechat.jpg and b/assets/wechat.jpg differ
diff --git a/examples/awel/simple_nl_schema_sql_chart_example.py b/examples/awel/simple_nl_schema_sql_chart_example.py
index 54f5dbb3e..47bb3fc3c 100644
--- a/examples/awel/simple_nl_schema_sql_chart_example.py
+++ b/examples/awel/simple_nl_schema_sql_chart_example.py
@@ -228,7 +228,7 @@ class ChartDrawOperator(MapOperator[Any, Any]):
         return str(df)
 
 
-with (DAG("simple_nl_schema_sql_chart_example") as dag):
+with DAG("simple_nl_schema_sql_chart_example") as dag:
     trigger = HttpTrigger(
         "/examples/rag/schema_linking", methods="POST", request_body=TriggerReqBody
     )