Readme: add english version (#46)

Update: Add an English-version README file, and update the README.
Authored by magic.chen on 2023-05-16 20:31:01 +08:00, committed by GitHub.
3 changed files with 95 additions and 83 deletions

View File

@@ -1,4 +1,4 @@
# DB-GPT ![GitHub Repo stars](https://img.shields.io/github/stars/csunny/db-gpt?style=social)
---
@@ -6,103 +6,97 @@
[![Star History Chart](https://api.star-history.com/svg?repos=csunny/DB-GPT)](https://star-history.com/#csunny/DB-GPT)
## What is DB-GPT?
As large models are released and iterated upon, they are becoming increasingly intelligent. However, using them raises significant data security and privacy challenges: our sensitive data and environments must remain fully under our own control, with no data privacy leaks or security risks. For this reason, we launched the DB-GPT project to build a complete private large model solution for all database-based scenarios. Because the solution supports local deployment, it can be used not only in standalone private environments, but can also be deployed and isolated independently per business module, keeping the capabilities of large models absolutely private, secure, and controllable.
DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
## Features
We have released multiple key features, listed below to demonstrate our current capabilities:
- SQL language capabilities
  - SQL generation
  - SQL diagnosis
- Private domain Q&A and data processing
  - Database knowledge Q&A
  - Data processing
- Plugins
  - Support for custom plugins to execute tasks, with native support for Auto-GPT plugins, for example:
    - Automatic SQL execution to obtain query results
    - Automatic crawling and learning of knowledge
- Unified vector storage/indexing of the knowledge base
  - Support for unstructured data such as PDF, Markdown, CSV, and Web URLs
## Demo
Run on an RTX 4090 GPU. [YouTube](https://www.youtube.com/watch?v=1PWI6F89LPo)
### Run
<p align="center">
<img src="./assets/demo_en.gif" width="600px" />
</p>
### SQL Generation
1. Generate Create Table SQL
<p align="center">
<img src="./assets/SQL_Gen_CreateTable_en.png" width="600px" />
</p>
2. Generate executable SQL: first select the corresponding database, and the model then generates SQL from that database's schema information. A successful run looks like this:
<p align="center">
<img src="./assets/exeable_en.png" width="600px" />
</p>
### Q&A
<p align="center">
<img src="./assets/DB_QA_en.png" width="600px" />
</p>
1. Q&A based on the default built-in knowledge base
   - TODO
2. Add your own knowledge base
   - TODO
3. Learn by crawling data from the internet
   - TODO
## Introduction
DB-GPT builds a large model runtime environment on top of [FastChat](https://github.com/lm-sys/FastChat) and offers a large language model powered by [Vicuna](https://huggingface.co/Tribbiani/vicuna-7b). In addition, it provides private domain knowledge base question answering through LangChain, and it supports additional plugins, with native support for Auto-GPT plugins.

The overall architecture of DB-GPT is shown in the following figure:
<p align="center">
<img src="./assets/DB-GPT.png" width="600px" />
</p>
The core capabilities mainly consist of the following parts:
1. Knowledge base capability: supports private domain knowledge base question answering.
2. LLMs management: provides a large model runtime environment based on FastChat.
3. Unified data vector storage and indexing: provides a uniform way to store and index various data types.
4. Connections: connects different modules and data sources to enable data flow and interaction.
5. Agent and plugins: provides Agent and plugin mechanisms that allow users to customize and extend the system's behavior.
6. Prompt generation and optimization: automatically generates high-quality prompts and optimizes them to improve the system's response efficiency.
7. Multi-platform product interface: supports multiple client products, such as web, mobile, and desktop applications.
Below is a brief introduction to each module:
### Knowledge base capability
As the knowledge base is currently the most significant user demand scenario, we natively support the construction and processing of knowledge bases. We also provide multiple knowledge base management strategies in this project, such as:
1. Default built-in knowledge base
2. Custom addition of knowledge bases
3. Various usage scenarios such as constructing knowledge bases through plugin capabilities and web crawling

Users only need to organize their knowledge documents to build the knowledge base required by the large model using our existing capabilities.
### LLMs Management
@@ -138,6 +132,7 @@ As our project has the ability to achieve ChatGPT performance of over 85%, there
| RTX 4090 | 24 GB | Smooth conversation inference |
| RTX 3090 | 24 GB | Smooth conversation inference, better than V100 |
| V100 | 16 GB | Conversation inference possible, noticeable stutter |
### 2. Install
This project relies on a local MySQL database service, which you need to install locally. We recommend using Docker for installation.
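For reference, a single Docker command along these lines is usually enough to bring up a local instance; the container name, root password, and port mapping below are placeholders rather than values mandated by the project, so adjust them to your environment:
```bash
# Sketch only: start a local MySQL 8 container for DB-GPT to connect to.
# The name, password, and host port are placeholders -- change them as needed.
docker run -d --name dbgpt-mysql \
  -e MYSQL_ROOT_PASSWORD=change_me \
  -p 3306:3306 \
  mysql:8
```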
@@ -162,36 +157,41 @@ It is recommended to set the Python package path to avoid runtime errors due to
```
echo "/root/workspace/DB-GPT" > /root/miniconda3/env/dbgpt_env/lib/python3.10/site-packages/dbgpt.pth
```
Notice: replace the paths above with your own DB-GPT checkout path and conda environment path.
### 3. Run
You can refer to this document to obtain the Vicuna weights: [Vicuna](https://github.com/lm-sys/FastChat/blob/main/README.md#model-weights).
If you have difficulty with this step, you can also directly use the model from [this link](https://huggingface.co/Tribbiani/vicuna-7b) as a replacement.
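As one possible shortcut (an illustration, not the project's official procedure), the fallback weights linked above can be pulled directly from the Hugging Face Hub with git-lfs:
```bash
# Assumes git and git-lfs are installed; this downloads several GB of model files.
git lfs install
git clone https://huggingface.co/Tribbiani/vicuna-7b
```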
1. Run server
```bash
$ python pilot/server/llmserver.py
```
Run gradio webui
```bash
$ python pilot/server/webserver.py
```
Notice: the webserver needs to connect to the llmserver, so you must edit pilot/configs/model_config.py and change VICUNA_MODEL_SERVER = "http://127.0.0.1:8000" to your llmserver address. This is very important.
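As an illustration of the change described above (assuming the default value shipped in the file and a hypothetical llmserver host of 192.168.1.10), the edit can be applied with a one-line substitution; editing the file by hand works just as well:
```bash
# Point the webserver at the machine running llmserver (host and port are placeholders).
sed -i 's#VICUNA_MODEL_SERVER = "http://127.0.0.1:8000"#VICUNA_MODEL_SERVER = "http://192.168.1.10:8000"#' pilot/configs/model_config.py
```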
## Usage Instructions
We provide a Gradio user interface, which allows you to use DB-GPT through a web UI. Additionally, we have prepared several reference articles (written in Chinese) that introduce the code and principles of the project.
- [LLM Practical In Action Series (1) — Combined Langchain-Vicuna Application Practical](https://medium.com/@cfqcsunny/llm-practical-in-action-series-1-combined-langchain-vicuna-application-practical-701cd0413c9f)
## Acknowledgement
This project owes its achievements to the technical community, especially the following projects:
- [FastChat](https://github.com/lm-sys/FastChat) for providing chat services
- [vicuna-13b](https://huggingface.co/Tribbiani/vicuna-13b) as the base model
- [langchain](https://github.com/hwchase17/langchain) tool chain
- [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT) universal plugin template
- [Hugging Face](https://huggingface.co/) for large model management
- [Chroma](https://github.com/chroma-core/chroma) for vector storage
- [Milvus](https://milvus.io/) for distributed vector storage
- ChatGLM as the base model
- [llama-index](https://github.com/jerryjliu/llama_index) for enhancing database-related knowledge using [in-context learning](https://arxiv.org/abs/2301.00234) based on existing knowledge bases
<!-- GITCONTRIBUTOR_START -->
@@ -208,3 +208,11 @@ This project follows the git-contributor [spec](https://github.com/xudafeng/git-
## Licence
The MIT License (MIT)
## Contact Information
We are working on building a community; if you have any ideas about building it, feel free to contact us.
| Name | Email |
| --------- | ---------------------- |
| yushun06 | my_prophet@hotmail.com |
| csunny | cfqcsunny@gmail.com |

View File

@@ -1,18 +1,15 @@
# DB-GPT ![GitHub Repo stars](https://img.shields.io/github/stars/csunny/db-gpt?style=social)
[English Edition](README.en.md)
[![Star History Chart](https://api.star-history.com/svg?repos=csunny/DB-GPT)](https://star-history.com/#csunny/DB-GPT)
## What is DB-GPT?
As large models are released and iterated upon, they are becoming increasingly intelligent. However, using them raises significant data security and privacy challenges: our private data and environments must remain fully under our own control, with no data privacy leaks or security risks. For this reason, we launched the DB-GPT project to build a complete private large model solution for all database-based scenarios. Because the solution supports local deployment, it can be used not only in standalone private environments, but can also be deployed and isolated independently per business module, keeping the capabilities of large models absolutely private, secure, and controllable.

DB-GPT is an open-source, database-oriented GPT experimental project that uses a localized GPT large model to interact with your data and environment, with no risk of data leakage: 100% private, 100% secure.

## Features
We have released multiple key features; they are listed below to demonstrate our current capabilities.
@@ -27,8 +24,7 @@ DB-GPT 是一个开源的以数据库为基础的GPT实验项目使用本地
- Automatic SQL execution to obtain query results
- Automatic crawling and learning of knowledge
- Unified vector storage/indexing of the knowledge base
- Support for unstructured data, including PDF, Markdown, CSV, and Web URLs

## Demo
@@ -58,12 +54,19 @@ DB-GPT 是一个开源的以数据库为基础的GPT实验项目使用本地
<img src="./assets/exeable.png" width="600px" /> <img src="./assets/exeable.png" width="600px" />
</p> </p>
3. 自动分析执行SQL输出运行结果
<p align="center">
<img src="./assets/AUTO-DB-GPT.png" width="600px" />
</p>
### 数据库问答 ### 数据库问答
<p align="center"> <p align="center">
<img src="./assets/DB_QA.png" width="600px" /> <img src="./assets/DB_QA.png" width="600px" />
</p> </p>
1. 基于默认内置知识库问答 1. 基于默认内置知识库问答
<p align="center"> <p align="center">
@@ -89,13 +92,13 @@ DB-GPT基于 [FastChat](https://github.com/lm-sys/FastChat) 构建大模型运
</p>
The core capabilities mainly consist of the following parts:
1. Knowledge base capability: supports private domain knowledge base question answering.
2. Large model management capability: provides a large model runtime environment based on FastChat.
3. Unified data vector storage and indexing: provides a uniform way to store and index various data types.
4. Connection module: connects different modules and data sources to enable data flow and interaction.
5. Agent and plugins: provides Agent and plugin mechanisms that allow users to customize and extend the system's behavior.
6. Automatic prompt generation and optimization: automatically generates high-quality prompts and optimizes them to improve the system's response efficiency.
7. Multi-platform product interface: supports multiple client products, such as web, mobile, and desktop applications.
Below is a brief introduction to each module:
@@ -175,6 +178,7 @@ python llmserver.py
```bash
$ python webserver.py
```
Notice: before starting the webserver, modify VICUNA_MODEL_SERVER = "http://127.0.0.1:8000" in pilot/configs/model_config.py and set the address to your server's address.
## Usage Instructions
@@ -190,7 +194,7 @@ $ python webserver.py
- [FastChat](https://github.com/lm-sys/FastChat) for providing chat services
- [vicuna-13b](https://huggingface.co/Tribbiani/vicuna-13b) as the base model
- [langchain](https://github.com/hwchase17/langchain) tool chain
- [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT) universal plugin template
- [Hugging Face](https://huggingface.co/) for large model management
- [Chroma](https://github.com/chroma-core/chroma) for vector storage
- [Milvus](https://milvus.io/) for distributed vector storage

BIN assets/Auto-DB-GPT.png (new file, 88 KiB)
Binary file not shown.