docs: readme update & contact (#1097)

magic.chen 2024-01-22 09:54:26 +08:00 committed by GitHub
parent 4f833634df
commit 1484981b72
GPG Key ID: B5690EEEBB952194
6 changed files with 96 additions and 199 deletions

192
README.md

@ -33,42 +33,71 @@
</p>
[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**微信**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
</div>
## What is DB-GPT?
DB-GPT is an open-source framework designed for the realm of large language models (LLMs) within the database field. Its primary purpose is to provide infrastructure that simplifies and streamlines the development of database-related applications. This is accomplished through the development of various technical capabilities, including:
DB-GPT is an open-source, data-domain large model framework. Its purpose is to build the infrastructure for the large model domain by developing a variety of technical capabilities, including multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration. These capabilities aim to simplify and facilitate the construction of large model applications around databases.
1. **SMMF(Service-oriented Multi-model Management Framework)**
2. **Text2SQL Fine-tuning**
3. **RAG(Retrieval Augmented Generation) framework and optimization**
4. **Data-Driven Agents framework collaboration**
5. **GBI(Generative Business intelligence)**
DB-GPT simplifies the creation of these applications based on large language models (LLMs) and databases.
In the era of Data 3.0, enterprises and developers can create customized applications with minimal coding by harnessing the power of large language models (LLMs) and databases.
In the Data 3.0 era, based on models and databases, enterprises and developers can build their own bespoke applications with less code.
### Data Agents
![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)
## Contents
- [Install](#install)
- [Demo](#demo)
- [Introduction](#introduction)
- [Install](#install)
- [Features](#features)
- [Contribution](#contribution)
- [Roadmap](#roadmap)
- [Contact](#contact-information)
[DB-GPT Youtube Video](https://www.youtube.com/watch?v=f5_g0OObZBQ)
## Introduction
The architecture of DB-GPT is shown in the following figure:
## Demo
##### Chat Data
![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
<p align="center">
<img src="./assets/dbgpt.png" width="800" />
</p>
##### Chat Excel
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
The core capabilities include the following parts:
- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed area. DB-GPT has already implemented a RAG-based framework, allowing users to build knowledge-based applications using the RAG capabilities of DB-GPT.
- **GBI (Generative Business Intelligence)**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology to build enterprise report analysis and business insights.
- **Fine-tuning Framework**: Model fine-tuning is an indispensable capability for any enterprise working in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. In recent fine-tuning efforts, accuracy on the Spider dataset has reached 82.5%.
- **Data-Driven Multi-Agents Framework**: DB-GPT offers a data-driven self-evolving fine-tuning framework, aiming to continuously make decisions and execute based on data.
- **Data Factory**: The Data Factory is mainly about cleaning and processing trustworthy knowledge and data in the era of large models.
- **Data Sources**: Integrating various data sources to seamlessly connect production business data to the core capabilities of DB-GPT.
### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Text-to-SQL workflow with high performance by applying Supervised Fine-Tuning (SFT) on Large Language Models (LLMs).
#### Text2SQL Finetune
- support llms
- [x] LLaMA
- [x] LLaMA-2
- [x] BLOOM
- [x] BLOOMZ
- [x] Falcon
- [x] Baichuan
- [x] Baichuan2
- [x] InternLM
- [x] Qwen
- [x] XVERSE
- [x] ChatGLM2
- SFT Accuracy
As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugins that can run Auto-GPT plugins directly
- [GPT-Vis](https://github.com/eosphoros-ai/GPT-Vis) Visualization protocol
## Install
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@ -120,26 +149,7 @@ At present, we have introduced several key features to showcase our current capa
- Support Datasources
- [Datasources](http://docs.dbgpt.site/docs/modules/connections)
## Introduction
The architecture of DB-GPT is shown in the following figure:
<p align="center">
<img src="./assets/DB-GPT.png" width="800" />
</p>
The core capabilities primarily consist of the following components:
1. Multi-Models: We support multiple Large Language Models (LLMs) such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, QWen, Vicuna, and proxy models like ChatGPT, Baichuan, Tongyi, Wenxin, and more.
2. Knowledge-Based QA: Our system enables high-quality intelligent Q&A based on local documents such as PDFs, Word documents, Excel files, and other data sources.
3. Embedding: We offer unified data vector storage and indexing. Data is embedded as vectors and stored in vector databases, allowing for content similarity search.
4. Multi-Datasources: This feature connects different modules and data sources, facilitating data flow and interaction.
5. Multi-Agents: Our platform provides Agent and plugin mechanisms, empowering users to customize and enhance the system's behaviour.
6. Privacy & Security: Rest assured that there is no risk of data leakage, and your data is 100% private and secure.
7. Text2SQL: We enhance Text-to-SQL performance through Supervised Fine-Tuning (SFT) applied to Large Language Models (LLMs).
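The embedding item in the list above (data embedded as vectors, then searched by content similarity) can be illustrated with a small sketch. This is not DB-GPT's implementation: the bag-of-words `embed` function, the in-memory `VectorStore` class, and the sample texts are all hypothetical stand-ins for a real embedding model and a real vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, text):
        # Embed once at insert time and keep the vector next to the text.
        self._items.append((text, embed(text)))

    def search(self, query, k=1):
        # Rank stored texts by similarity to the query vector.
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


store = VectorStore()
store.add("orders table stores customer purchases")
store.add("weather report for the coming week")
top = store.search("which table stores purchases", k=1)
```

A real deployment would swap `embed` for a learned embedding model and `VectorStore` for one of the vector databases listed in the roadmap; only the embed, store, and rank flow carries over.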
### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Text-to-SQL workflow with high performance by applying Supervised Fine-Tuning (SFT) on Large Language Models (LLMs).
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugins that can run Auto-GPT plugins directly
- [DB-GPT-Web](https://github.com/eosphoros-ai/DB-GPT-Web) ChatUI for DB-GPT
## Image
🌐 [AutoDL Image](https://www.codewithgpu.com/i/eosphoros-ai/DB-GPT/dbgpt)
@ -151,106 +161,8 @@ The core capabilities primarily consist of the following components:
## Contribution
- Please run `black .` before submitting the code.
- For detailed guidelines on new contributions, please refer to [how to contribute](https://github.com/csunny/DB-GPT/blob/main/CONTRIBUTING.md)
- For detailed guidelines on new contributions, please refer to [how to contribute](https://github.com/eosphoros-ai/DB-GPT/blob/main/CONTRIBUTING.md)
## RoadMap
<p align="left">
<img src="./assets/roadmap.jpg" width="800px" />
</p>
### KBQA RAG optimization
- [x] Multi Documents
- [x] PDF
- [x] Excel, CSV
- [x] Word
- [x] Text
- [x] MarkDown
- [ ] Code
- [ ] Images
- [x] RAG
- [ ] Graph Database
- [ ] Neo4j Graph
- [ ] Nebula Graph
- [x] Multi-Vector Database
- [x] Chroma
- [x] Milvus
- [x] Weaviate
- [x] PGVector
- [ ] Elasticsearch
- [ ] ClickHouse
- [ ] Faiss
- [ ] Testing and Evaluation Capability Building
- [ ] Knowledge QA datasets
- [ ] Question collection [easy, medium, hard]:
- [ ] Scoring mechanism
- [ ] Testing and evaluation using Excel + DB datasets
### Multi Datasource Support
- Multi Datasource Support
- [x] MySQL
- [x] PostgreSQL
- [x] Spark
- [x] DuckDB
- [x] Sqlite
- [x] MSSQL
- [x] ClickHouse
- [ ] Oracle
- [ ] Redis
- [ ] MongoDB
- [ ] HBase
- [x] Doris
- [ ] DB2
- [ ] Couchbase
- [ ] Elasticsearch
- [ ] OceanBase
- [ ] TiDB
- [ ] StarRocks
### Multi-Models And vLLM
- [x] [Cluster Deployment](https://docs.dbgpt.site/docs/installation/model_service/cluster)
- [x] [Fastchat Support](https://github.com/lm-sys/FastChat)
- [x] [vLLM Support](https://docs.dbgpt.site/docs/installation/advanced_usage/vLLM_inference)
- [ ] Cloud-native environment and support for Ray environment
- [ ] Service Registry(eg:nacos)
- [ ] Compatibility with OpenAI's interfaces
- [ ] Expansion and optimization of embedding models
### Agents market and Plugins
- [x] multi-agents framework
- [x] custom plugin development
- [x] plugin market
- [ ] Integration with CoT
- [ ] Enrich plugin sample library
- [ ] Support for AutoGPT protocol
- [ ] Integration of multi-agents and visualization capabilities, defining LLM+Vis new standards
### Cost and Observability
- [x] [debugging](https://docs.dbgpt.site/docs/application_manual/advanced_tutorial/debugging)
- [ ] Observability
- [ ] cost & budgets
### Text2SQL Finetune
- support llms
- [x] LLaMA
- [x] LLaMA-2
- [x] BLOOM
- [x] BLOOMZ
- [x] Falcon
- [x] Baichuan
- [x] Baichuan2
- [x] InternLM
- [x] Qwen
- [x] XVERSE
- [x] ChatGLM2
- SFT Accuracy
As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
## Licence
The MIT License (MIT)
@ -272,8 +184,4 @@ If you find `DB-GPT` useful for your research or development, please cite the fo
We are working on building a community. If you have any ideas for building it, feel free to contact us.
[![](https://dcbadge.vercel.app/api/server/7uQnPuveTY?compact=true&style=flat)](https://discord.gg/7uQnPuveTY)
<p align="center">
<img src="./assets/wechat.jpg" width="300px" />
</p>
[![Star History Chart](https://api.star-history.com/svg?repos=csunny/DB-GPT&type=Date)](https://star-history.com/#csunny/DB-GPT)


@ -8,19 +8,19 @@
<div align="center">
<p>
<a href="https://github.com/eosphoros-ai/DB-GPT">
<img alt="stars" src="https://img.shields.io/github/stars/csunny/db-gpt?style=social" />
<img alt="stars" src="https://img.shields.io/github/stars/eosphoros-ai/db-gpt?style=social" />
</a>
<a href="https://github.com/eosphoros-ai/DB-GPT">
<img alt="forks" src="https://img.shields.io/github/forks/csunny/db-gpt?style=social" />
<img alt="forks" src="https://img.shields.io/github/forks/eosphoros-ai/db-gpt?style=social" />
</a>
<a href="https://opensource.org/licenses/MIT">
<img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-yellow.svg" />
</a>
<a href="https://github.com/eosphoros-ai/DB-GPT/releases">
<img alt="Release Notes" src="https://img.shields.io/github/release/csunny/DB-GPT" />
<img alt="Release Notes" src="https://img.shields.io/github/release/eosphoros-ai/DB-GPT" />
</a>
<a href="https://github.com/eosphoros-ai/DB-GPT/issues">
<img alt="Open Issues" src="https://img.shields.io/github/issues-raw/csunny/DB-GPT" />
<img alt="Open Issues" src="https://img.shields.io/github/issues-raw/eosphoros-ai/DB-GPT" />
</a>
<a href="https://discord.gg/7uQnPuveTY">
<img alt="Discord" src="https://dcbadge.vercel.app/api/server/7uQnPuveTY?compact=true&style=flat" />
@ -33,39 +33,56 @@
</a>
</p>
[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx) | [**WeChat**](https://github.com/csunny/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
[**English**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://www.yuque.com/eosphoros/dbgpt-docs/bex30nsv60ru0fmx) | [**WeChat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
</div>
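## What is DB-GPT?
DB-GPT is an open-source large model framework for the database domain. Its purpose is to build infrastructure for the large model domain by developing technical capabilities such as multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration, making it simpler and more convenient to build large model applications around databases.
DB-GPT is an open-source, data-domain large model framework. Its purpose is to build infrastructure for the large model domain by developing technical capabilities such as multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration, making it simpler and more convenient to build large model applications around databases.
In the Data 3.0 era, based on models and databases, enterprises and developers can build their own bespoke applications with less code.
## Contents
## Demo
- [Install](#安装)
- [Demo](#效果演示)
### Data Agents
![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)
## Contents
- [Architecture](#架构方案)
- [Install](#安装)
- [Features](#特性一览)
- [Contribution](#贡献)
- [Roadmap](#路线图)
- [Contact](#联系我们)
[DB-GPT Video Introduction](https://www.bilibili.com/video/BV1au41157bj/?spm_id_from=333.337.search-card.all.click&vd_source=7792e22c03b7da3c556a450eb42c8a0f)
## Architecture
## Demo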
##### Chat Data
![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
##### Chat Excel
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
#### Generate analysis charts from natural-language conversation
<p align="left">
<img src="./assets/dashboard.png" width="800px" />
<p align="center">
<img src="./assets/dbgpt.png" width="800px" />
</p>
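The core capabilities include the following parts:
- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed area. DB-GPT has implemented a RAG-based framework, and users can build knowledge applications on DB-GPT's RAG capabilities.
- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology for building enterprise report analysis and business insights.
- **Fine-tuning framework**: Model fine-tuning is an indispensable capability for any enterprise working in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. In recent fine-tuning, accuracy on Spider has reached 82.5%.
- **Data-driven Multi-Agents framework**: DB-GPT provides a data-driven, self-evolving fine-tuning framework whose goal is to continuously make decisions and execute based on data.
- **Data factory**: The data factory mainly cleans and processes trustworthy knowledge and data in the era of large models.
- **Data sources**: Connects to various data sources so that production business data flows seamlessly into DB-GPT's core capabilities.
### RAG production practice architecture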
<p align="center">
<img src="./assets/RAG-IN-ACTION.jpg" width="800px" />
</p>
### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Continuously improves Text2SQL performance through fine-tuning
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugin repository, compatible with Auto-GPT
- [GPT-Vis](https://github.com/eosphoros-ai/DB-GPT-Web) Visualization protocol
## Install
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@ -84,7 +101,7 @@ DB-GPT是一个开源的数据库领域大模型框架。目的是构建大模
- [**Chat Excel**](https://www.yuque.com/eosphoros/dbgpt-docs/prugoype0xd2g4bb)
- [**Chat Database**](https://www.yuque.com/eosphoros/dbgpt-docs/wswpv3zcm2c9snmg)
- [**Report Analysis**](https://www.yuque.com/eosphoros/dbgpt-docs/vsv49p33eg4p5xc1)
- [**Plugins**](https://www.yuque.com/eosphoros/dbgpt-docs/pom41m7oqtdd57hm)
- [**Agents**](https://www.yuque.com/eosphoros/dbgpt-docs/pom41m7oqtdd57hm)
- [**Model Service Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/vubxiv9cqed5mc6o)
- [**Standalone Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/kwg1ed88lu5fgawb)
- [**Cluster Deployment**](https://www.yuque.com/eosphoros/dbgpt-docs/gmbp9619ytyn2v1s)
@ -137,34 +154,6 @@ DB-GPT是一个开源的数据库领域大模型框架。目的是构建大模
- [Supported Datasources](https://www.yuque.com/eosphoros/dbgpt-docs/rc4r27ybmdwg9472)
## Architecture
The overall architecture of DB-GPT is shown in the following figure:
<p align="center">
<img src="./assets/DB-GPT_zh.png" width="800px" />
</p>
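The core capabilities include the following parts:
- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed area. DB-GPT has implemented a RAG-based framework, and users can build knowledge applications on DB-GPT's RAG capabilities.
- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology for building enterprise report analysis and business insights.
- **Fine-tuning framework**: Model fine-tuning is an indispensable capability for any enterprise working in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. In recent fine-tuning, accuracy on Spider has reached 82.5%.
- **Data-driven Multi-Agents framework**: DB-GPT provides a data-driven, self-evolving fine-tuning framework whose goal is to continuously make decisions and execute based on data.
- **Data factory**: The data factory mainly cleans and processes trustworthy knowledge and data in the era of large models.
- **Data sources**: Connects to various data sources so that production business data flows seamlessly into DB-GPT's core capabilities.
### RAG production practice architecture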
<p align="center">
<img src="./assets/RAG-IN-ACTION.jpg" width="800px" />
</p>
### SubModule
- [DB-GPT-Hub](https://github.com/csunny/DB-GPT-Hub) Continuously improves Text2SQL performance through fine-tuning
- [DB-GPT-Plugins](https://github.com/csunny/DB-GPT-Plugins) DB-GPT plugin repository, compatible with Auto-GPT
- [DB-GPT-Web](https://github.com/csunny/DB-GPT-Web) Multi-platform interactive front-end UI
## Image
@ -180,7 +169,11 @@ DB-GPT是一个开源的数据库领域大模型框架。目的是构建大模
### Multi-model usage
[Usage Guide](https://www.yuque.com/eosphoros/dbgpt-docs/huzgcf2abzvqy8uv)
- [Usage Guide](https://www.yuque.com/eosphoros/dbgpt-docs/huzgcf2abzvqy8uv)
### Data Agents usage
- [Data Agents](https://www.yuque.com/eosphoros/dbgpt-docs/gwz4rayfuwz78fbq)
# Contribution
> Please run `black .` before submitting code.
@ -193,10 +186,6 @@ The MIT License (MIT)
# Roadmap
<p align="left">
<img src="./assets/roadmap.jpg" width="800px" />
</p>
### Knowledge base RAG retrieval optimization
- [x] Multi Documents

Binary file added: assets/dbgpt.png (244 KiB)
Binary file removed: name not shown (68 KiB)
Binary file modified: name not shown (213 KiB before, 219 KiB after)


@ -228,7 +228,7 @@ class ChartDrawOperator(MapOperator[Any, Any]):
return str(df)
with (DAG("simple_nl_schema_sql_chart_example") as dag):
with DAG("simple_nl_schema_sql_chart_example") as dag:
trigger = HttpTrigger(
"/examples/rag/schema_linking", methods="POST", request_body=TriggerReqBody
)
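The hunk above replaces `with (DAG(...) as dag):` with `with DAG(...) as dag:`. The commit does not state why, so the following is only a sketch under the assumption that interpreter compatibility is the concern: parenthesized `with` items became an official part of the grammar in Python 3.10, while the unparenthesized form parses on every Python 3 release. `fake_dag` is a hypothetical stand-in for `DAG`, not part of the project.

```python
from contextlib import contextmanager


@contextmanager
def fake_dag(name):
    """Hypothetical stand-in for DAG: yields its name as the bound object."""
    yield name


# Unparenthesized form: valid on every Python 3 version.
with fake_dag("example") as dag:
    plain_result = dag

# Parenthesized form: compiled from a string so that an older interpreter
# raises SyntaxError here instead of failing to load the whole file.
source = 'with (fake_dag("example") as dag):\n    paren_result = dag\n'
try:
    namespace = {"fake_dag": fake_dag}
    exec(compile(source, "<paren-with>", "exec"), namespace)
    paren_supported = True
except SyntaxError:
    paren_supported = False
```

On an interpreter where `paren_supported` ends up `False`, a module containing the parenthesized form would fail at import time, which is why the plain form is the safer choice for example code.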