docs: readme update & contact (#1097)

magic.chen
2024-01-22 09:54:26 +08:00
committed by GitHub
parent 4f833634df
commit 1484981b72
6 changed files with 96 additions and 199 deletions

192
README.md

@@ -33,42 +33,71 @@
</p>
[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Documents**](https://docs.dbgpt.site) | [**微信**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Community**](https://github.com/eosphoros-ai/community) | [**Paper**](https://arxiv.org/pdf/2312.17449.pdf)
</div>
## What is DB-GPT?
DB-GPT is an open-source framework designed for the realm of large language models (LLMs) within the database field. Its primary purpose is to provide infrastructure that simplifies and streamlines the development of database-related applications. This is accomplished through the development of various technical capabilities, including:
DB-GPT is an open-source, data-domain large model framework. Its purpose is to build the infrastructure for the large model domain by developing a variety of technical capabilities, including multi-model management, Text2SQL performance optimization, RAG framework and optimization, and Multi-Agents framework collaboration. These capabilities aim to simplify and facilitate the construction of large model applications around databases.
1. **SMMF (Service-oriented Multi-model Management Framework)**
2. **Text2SQL Fine-tuning**
3. **RAG (Retrieval Augmented Generation) framework and optimization**
4. **Data-Driven Agents framework collaboration**
5. **GBI (Generative Business Intelligence)**
DB-GPT simplifies the creation of these applications based on large language models (LLMs) and databases.
In the era of Data 3.0, enterprises and developers can create customized applications with minimal coding by harnessing the power of large language models (LLMs) and databases.
In the Data 3.0 era, based on models and databases, enterprises and developers can build their own bespoke applications with less code.
### Data Agents
![data agents](https://github.com/eosphoros-ai/DB-GPT/assets/17919400/ced393b4-9180-437a-90c5-b43633cda8cb)
## Contents
- [Install](#install)
- [Demo](#demo)
- [Introduction](#introduction)
- [Install](#install)
- [Features](#features)
- [Contribution](#contribution)
- [Roadmap](#roadmap)
- [Contact](#contact-information)
[DB-GPT Youtube Video](https://www.youtube.com/watch?v=f5_g0OObZBQ)
## Introduction
The architecture of DB-GPT is shown in the following figure:
## Demo
##### Chat Data
![chatdata](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/1f77079e-d018-4eee-982b-9b6a66bf1063)
<p align="center">
<img src="./assets/dbgpt.png" width="800" />
</p>
##### Chat Excel
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/3044e83b-a71e-41fe-a1e2-98e479e0ab59)
The core capabilities include the following parts:
- **RAG (Retrieval Augmented Generation)**: RAG is currently the most widely deployed and most urgently needed capability. DB-GPT already provides a RAG-based framework that lets users build knowledge-based applications.
- **GBI (Generative Business Intelligence)**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data intelligence technology to build enterprise report analysis and business insights.
- **Fine-tuning Framework**: Model fine-tuning is an indispensable capability for any enterprise working in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project. Recent fine-tuning efforts achieved an accuracy of 82.5% on the Spider dataset.
- **Data-Driven Multi-Agents Framework**: DB-GPT offers a data-driven, self-evolving multi-agents framework that aims to continuously make decisions and execute them based on data.
- **Data Factory**: The Data Factory is mainly about cleaning and processing trustworthy knowledge and data in the era of large models.
- **Data Sources**: Integrating various data sources to seamlessly connect production business data to the core capabilities of DB-GPT.
### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Text-to-SQL workflow with high performance by applying Supervised Fine-Tuning (SFT) on Large Language Models (LLMs).
#### Text2SQL Finetune
- Supported LLMs
- [x] LLaMA
- [x] LLaMA-2
- [x] BLOOM
- [x] BLOOMZ
- [x] Falcon
- [x] Baichuan
- [x] Baichuan2
- [x] InternLM
- [x] Qwen
- [x] XVERSE
- [x] ChatGLM2
- SFT Accuracy
As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugins that can run Auto-GPT plugins directly
- [GPT-Vis](https://github.com/eosphoros-ai/GPT-Vis) Visualization protocol
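
The Text2SQL results above are reported as execution accuracy. As a rough illustration of that metric, the sketch below runs a predicted query and a reference query against the same database and counts a hit when both return the same rows. It is not the DB-GPT-Hub or Spider evaluation code: the function names, the demo table, and the choice to ignore row order are all assumptions made for this example.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True when both queries yield the same multiset of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A prediction that fails to execute counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: row order is ignored, duplicates are not.
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, p, g) for p, g in pairs)
    return hits / len(pairs)


def build_demo_db():
    """Hypothetical in-memory table used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ann", 31), ("Bo", 45), ("Cy", 27)],
    )
    return conn


DEMO_PAIRS = [
    # Different SQL text, same result rows: counted as a hit.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Both execute, but the counts differ: counted as a miss.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age < 40"),
    # Misspelled column, so the prediction does not execute: a miss.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]

if __name__ == "__main__":
    print(execution_accuracy(build_demo_db(), DEMO_PAIRS))
```

Comparing result sets rather than SQL strings means two differently written queries still count as equivalent when they return the same rows.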
## Install
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -120,26 +149,7 @@ At present, we have introduced several key features to showcase our current capa
- Support Datasources
- [Datasources](http://docs.dbgpt.site/docs/modules/connections)
## Introduction
The architecture of DB-GPT is shown in the following figure:
<p align="center">
<img src="./assets/DB-GPT.png" width="800" />
</p>
The core capabilities primarily consist of the following components:
1. Multi-Models: We support multiple Large Language Models (LLMs) such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, QWen, Vicuna, and proxy models like ChatGPT, Baichuan, Tongyi, Wenxin, and more.
2. Knowledge-Based QA: Our system enables high-quality intelligent Q&A based on local documents such as PDFs, Word documents, Excel files, and other data sources.
3. Embedding: We offer unified data vector storage and indexing. Data is embedded as vectors and stored in vector databases, allowing for content similarity search.
4. Multi-Datasources: This feature connects different modules and data sources, facilitating data flow and interaction.
5. Multi-Agents: Our platform provides Agent and plugin mechanisms, empowering users to customize and enhance the system's behaviour.
6. Privacy & Security: Rest assured that there is no risk of data leakage, and your data is 100% private and secure.
7. Text2SQL: We enhance Text-to-SQL performance through Supervised Fine-Tuning (SFT) applied to Large Language Models (LLMs).
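
The embedding item above describes storing data as vectors and searching by content similarity. The sketch below shows that idea in miniature, under loud assumptions: `embed` is a toy bag-of-words counter rather than a learned embedding model, and `ToyVectorStore` is a hypothetical in-memory stand-in for a vector database, not a DB-GPT API.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps each text next to its vector."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, top_k=1):
        """Return the top_k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(q, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("reset the database password")
    store.add("export the monthly sales report")
    store.add("configure the vector index")
    print(store.search("how to export a sales report"))
```

A real deployment would replace `embed` with an embedding model and the list scan with an indexed vector database, but the ranking step stays the same in spirit.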
### SubModule
- [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) Text-to-SQL workflow with high performance by applying Supervised Fine-Tuning (SFT) on Large Language Models (LLMs).
- [DB-GPT-Plugins](https://github.com/eosphoros-ai/DB-GPT-Plugins) DB-GPT plugins that can run Auto-GPT plugins directly
- [DB-GPT-Web](https://github.com/eosphoros-ai/DB-GPT-Web) ChatUI for DB-GPT
## Image
🌐 [AutoDL Image](https://www.codewithgpu.com/i/eosphoros-ai/DB-GPT/dbgpt)
@@ -151,106 +161,8 @@ The core capabilities primarily consist of the following components:
## Contribution
- Please run `black .` before submitting the code.
- To check detailed guidelines for new contributions, please refer to [how to contribute](https://github.com/csunny/DB-GPT/blob/main/CONTRIBUTING.md)
- To check detailed guidelines for new contributions, please refer to [how to contribute](https://github.com/eosphoros-ai/DB-GPT/blob/main/CONTRIBUTING.md)
## RoadMap
<p align="left">
<img src="./assets/roadmap.jpg" width="800px" />
</p>
### KBQA RAG optimization
- [x] Multi Documents
- [x] PDF
- [x] Excel, CSV
- [x] Word
- [x] Text
- [x] Markdown
- [ ] Code
- [ ] Images
- [x] RAG
- [ ] Graph Database
- [ ] Neo4j Graph
- [ ] Nebula Graph
- [x] Multi-Vector Database
- [x] Chroma
- [x] Milvus
- [x] Weaviate
- [x] PGVector
- [ ] Elasticsearch
- [ ] ClickHouse
- [ ] Faiss
- [ ] Testing and Evaluation Capability Building
- [ ] Knowledge QA datasets
- [ ] Question collection [easy, medium, hard]:
- [ ] Scoring mechanism
- [ ] Testing and evaluation using Excel + DB datasets
### Multi Datasource Support
- Multi Datasource Support
- [x] MySQL
- [x] PostgreSQL
- [x] Spark
- [x] DuckDB
- [x] SQLite
- [x] MSSQL
- [x] ClickHouse
- [ ] Oracle
- [ ] Redis
- [ ] MongoDB
- [ ] HBase
- [x] Doris
- [ ] DB2
- [ ] Couchbase
- [ ] Elasticsearch
- [ ] OceanBase
- [ ] TiDB
- [ ] StarRocks
### Multi-Models And vLLM
- [x] [Cluster Deployment](https://docs.dbgpt.site/docs/installation/model_service/cluster)
- [x] [Fastchat Support](https://github.com/lm-sys/FastChat)
- [x] [vLLM Support](https://docs.dbgpt.site/docs/installation/advanced_usage/vLLM_inference)
- [ ] Cloud-native environment and support for Ray environment
- [ ] Service Registry (e.g., Nacos)
- [ ] Compatibility with OpenAI's interfaces
- [ ] Expansion and optimization of embedding models
### Agents market and Plugins
- [x] multi-agents framework
- [x] custom plugin development
- [x] plugin market
- [ ] Integration with CoT
- [ ] Enrich plugin sample library
- [ ] Support for AutoGPT protocol
- [ ] Integration of multi-agents and visualization capabilities, defining LLM+Vis new standards
### Cost and Observability
- [x] [debugging](https://docs.dbgpt.site/docs/application_manual/advanced_tutorial/debugging)
- [ ] Observability
- [ ] cost & budgets
### Text2SQL Finetune
- Supported LLMs
- [x] LLaMA
- [x] LLaMA-2
- [x] BLOOM
- [x] BLOOMZ
- [x] Falcon
- [x] Baichuan
- [x] Baichuan2
- [x] InternLM
- [x] Qwen
- [x] XVERSE
- [x] ChatGLM2
- SFT Accuracy
As of October 10, 2023, through the fine-tuning of an open-source model with 13 billion parameters using this project, we have achieved execution accuracy on the Spider dataset that surpasses even GPT-4!
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
## Licence
The MIT License (MIT)
@@ -272,8 +184,4 @@ If you find `DB-GPT` useful for your research or development, please cite the fo
We are working on building a community. If you have any ideas for building the community, feel free to contact us.
[![](https://dcbadge.vercel.app/api/server/7uQnPuveTY?compact=true&style=flat)](https://discord.gg/7uQnPuveTY)
<p align="center">
<img src="./assets/wechat.jpg" width="300px" />
</p>
[![Star History Chart](https://api.star-history.com/svg?repos=csunny/DB-GPT&type=Date)](https://star-history.com/#csunny/DB-GPT)