Interact with your data and environment using a local GPT: no data leaks, 100% private, 100% secure.

DB-GPT: Revolutionizing Database Interactions with Private LLM Technology

What is DB-GPT?

DB-GPT is an experimental open-source project that uses locally deployed GPT-style large models to interact with your data and environment. Because the models run locally, there is no risk of data leakage, and your data stays 100% private and secure.


DB-GPT YouTube Video

Demo

Run on an RTX 4090 GPU.


Chat with your data and generate charts.


Features

We have released several key features so far, which demonstrate our current capabilities:

  • SQL language capabilities

    • SQL generation
    • SQL diagnosis
  • Private domain Q&A and data processing

    • Knowledge management (we currently support many document formats: txt, pdf, md, html, doc, ppt, and url)
  • ChatDB

  • ChatExcel

  • ChatDashboard

  • Multi-Agents & Plugins

  • Unified vector storage/indexing of knowledge base

    • Support for unstructured data such as PDF, TXT, Markdown, CSV, DOC, PPT, and WebURL
  • Multi-LLM support, currently covering the following large language models:

    • 🔥 InternLM(7b,20b)
    • 🔥 Baichuan2(7b,13b)
    • 🔥 Vicuna-v1.5(7b,13b)
    • 🔥 llama-2(7b,13b,70b)
    • WizardLM-v1.2(13b)
    • Vicuna (7b,13b)
    • ChatGLM-6b (int4,int8)
    • ChatGLM2-6b (int4,int8)
    • guanaco(7b,13b,33b)
    • Gorilla(7b,13b)
    • baichuan(7b,13b)
  • Support for LLMs accessed through API proxies

    • ChatGPT
    • Tongyi
    • Wenxin
    • Spark
    • MiniMax
    • ChatGLM
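The knowledge-base features accept several document formats. As an illustration only, here is a minimal Python sketch that checks whether a source would be accepted, assuming the format is inferred from the file extension or from an http(s) URL. The extension set is taken from the lists above; `is_supported_source` is a hypothetical helper, not part of DB-GPT, and the project's real loader selection may work differently.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions collected from the feature list above (txt, pdf, md, html, doc, ppt, csv).
# Hypothetical: DB-GPT's actual loader dispatch may differ.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".md", ".html", ".doc", ".ppt", ".csv"}


def is_supported_source(source: str) -> bool:
    """Return True if `source` is a web URL or a file with a listed extension."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        # Web pages (the "url" / "WebURL" entry) are accepted by scheme alone.
        return True
    # Compare the suffix case-insensitively so "Report.PDF" also matches.
    return Path(source).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for src in ["notes.md", "Report.PDF", "https://example.com/faq", "data.xlsx"]:
        print(src, is_supported_source(src))
```

Checking the scheme first keeps URLs such as `https://example.com/page.html` from being judged by their path suffix.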

Introduction

DB-GPT builds a large-model operating system on top of FastChat and provides a large language model powered by Vicuna. In addition, we provide private domain knowledge base question-answering. We also support additional plugins, and the design natively supports Auto-GPT plugins. Our vision is to make it easier and more convenient to build applications around databases and LLMs.

The overall architecture of DB-GPT is shown in the following figure:

The core capabilities mainly consist of the following parts:

  1. Knowledge base capability: Supports private domain knowledge base question-answering capability.
  2. Large-scale model management capability: Provides a large model operating environment based on FastChat.
  3. Unified data vector storage and indexing: Provides a uniform way to store and index various data types.
  4. Connection module: Used to connect different modules and data sources to achieve data flow and interaction.
  5. Agent and plugins: Provides Agent and plugin mechanisms, allowing users to customize and enhance the system's behavior.
  6. Prompt generation and optimization: Automatically generates high-quality prompts and optimizes them to improve system response efficiency.
  7. Multi-platform product interface: Supports various client products, such as web, mobile applications, and desktop applications.
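To make the prompt-generation capability more concrete for SQL tasks, here is a minimal sketch of schema-aware prompt construction: table schemas and a user question are rendered into one prompt string for a model. The function name `build_text2sql_prompt` and the template wording are illustrative assumptions, not DB-GPT's actual prompts or API.

```python
from typing import Dict, List


def build_text2sql_prompt(
    question: str, schemas: Dict[str, List[str]], dialect: str = "SQLite"
) -> str:
    """Render table schemas and a question into a single prompt (hypothetical template)."""
    # One line per table, e.g. "users(id, name)"; sorted so the prompt is stable.
    schema_lines = [f"{table}({', '.join(cols)})" for table, cols in sorted(schemas.items())]
    lines = [
        f"You are a {dialect} expert. Given the tables below, write one SQL query "
        "that answers the question. Return only SQL.",
        "Tables:",
        *schema_lines,
        f"Question: {question}",
        "SQL:",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_text2sql_prompt(
        "How many users signed up in 2023?",
        {"users": ["id", "name", "created_at"]},
    )
    print(prompt)
```

Ending the prompt with `SQL:` nudges a completion-style model to continue with the query itself; a real system would likely also add few-shot examples or retrieved knowledge.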

SubModule

Image

🌐 AutoDL Image

Install

Docker Linux macOS Windows

Quickstart

Language Switching

To switch languages, set the LANGUAGE parameter in the .env configuration file. The default is English; the supported values are en (English) and zh (Chinese), and more languages will be added later.
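For example, a line `LANGUAGE=zh` in `.env` selects Chinese. The sketch below shows one way such a setting could be read and validated, assuming a simple `KEY=VALUE` file where the last assignment wins. `parse_language` and `read_language` are hypothetical helpers, not DB-GPT's configuration loader, and the parser is deliberately minimal rather than a full dotenv implementation.

```python
from pathlib import Path

# Values documented above: "en" (default) and "zh".
SUPPORTED_LANGUAGES = {"en", "zh"}


def parse_language(env_text: str, default: str = "en") -> str:
    """Return LANGUAGE from .env-style text, falling back to `default`.

    Handles KEY=VALUE lines, blank lines, '#' comments and optional quotes only.
    """
    value = default
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "LANGUAGE":
            # Later assignments override earlier ones.
            value = val.strip().strip('"').strip("'")
    if value not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported LANGUAGE {value!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return value


def read_language(env_path: str = ".env") -> str:
    """Read LANGUAGE from a .env file if it exists, otherwise return the default."""
    path = Path(env_path)
    return parse_language(path.read_text(encoding="utf-8")) if path.exists() else "en"
```

Failing loudly on an unknown value surfaces a typo such as `LANGUAGE=cn` at startup instead of silently falling back to English.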

Contribution

RoadMap

KBQA RAG optimization

  • [ ] KnowledgeGraph

Multi Datasource Support

DataSource     Supported  Notes
MySQL          Yes
PostgreSQL     Yes
Spark          Yes
DuckDB         Yes
SQLite         Yes
MSSQL          Yes
ClickHouse     Yes
Oracle         No         TODO
Redis          No         TODO
MongoDB        No         TODO
HBase          No         TODO
Doris          No         TODO
DB2            No         TODO
Couchbase      No         TODO
Elasticsearch  No         TODO
OceanBase      No         TODO
TiDB           No         TODO
StarRocks      No         TODO
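SQLite is one of the supported data sources. Chat-with-database features typically need the table schema before a model can write SQL against it. As an illustration, here is a minimal sketch that reads table and column names with Python's built-in `sqlite3` module and `PRAGMA table_info`. The helper `sqlite_schema` is hypothetical and is not DB-GPT's connector API.

```python
import sqlite3
from typing import Dict, List


def sqlite_schema(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map each user table in a SQLite database to its column names (hypothetical helper)."""
    # Skip SQLite's internal tables such as sqlite_sequence.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema: Dict[str, List[str]] = {}
    for table in tables:
        # Quote the identifier; PRAGMA table_info yields one row per column,
        # and index 1 of each row is the column name.
        quoted = '"' + table.replace('"', '""') + '"'
        schema[table] = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
    return schema


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, amount REAL)")
    print(sqlite_schema(conn))
```

Other engines in the table expose similar metadata (for example through `information_schema`), so a per-source connector would swap only the introspection queries.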

Multi-Models and vLLM

Agents market and Plugins

  • multi-agents framework
  • custom plugin development
  • plugin market

Text2SQL Finetune

LLMs       Size                          Module           Template
LLaMA      7B/13B/33B/65B                q_proj,v_proj    -
LLaMA-2    7B/13B/70B                    q_proj,v_proj    llama2
BLOOM      560M/1.1B/1.7B/3B/7.1B/176B   query_key_value  -
BLOOMZ     560M/1.1B/1.7B/3B/7.1B/176B   query_key_value  -
Falcon     7B/40B                        query_key_value  -
Baichuan   7B/13B                        W_pack           baichuan
Baichuan2  7B/13B                        W_pack           baichuan2
InternLM   7B                            q_proj,v_proj    intern
Qwen       7B                            c_attn           chatml
XVERSE     13B                           q_proj,v_proj    xverse
ChatGLM2   6B                            query_key_value  chatglm2
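The Module column appears to list the attention projection layers that fine-tuning targets per model family (the kind of names one would pass as `target_modules` to a LoRA configuration such as Hugging Face PEFT's `LoraConfig`); this reading is an assumption, not something the table states. The sketch below encodes the rows above as a lookup so a training script can pick modules and template by family name. `FINETUNE_CONFIG` and `finetune_settings` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# (modules, prompt template) per model family, copied from the table above.
# None stands for "-" (no template listed).
FINETUNE_CONFIG: Dict[str, Tuple[List[str], Optional[str]]] = {
    "LLaMA": (["q_proj", "v_proj"], None),
    "LLaMA-2": (["q_proj", "v_proj"], "llama2"),
    "BLOOM": (["query_key_value"], None),
    "BLOOMZ": (["query_key_value"], None),
    "Falcon": (["query_key_value"], None),
    "Baichuan": (["W_pack"], "baichuan"),
    "Baichuan2": (["W_pack"], "baichuan2"),
    "InternLM": (["q_proj", "v_proj"], "intern"),
    "Qwen": (["c_attn"], "chatml"),
    "XVERSE": (["q_proj", "v_proj"], "xverse"),
    "ChatGLM2": (["query_key_value"], "chatglm2"),
}


def finetune_settings(model_family: str) -> Tuple[List[str], Optional[str]]:
    """Return (modules, template) for a model family, matched case-insensitively."""
    for name, (modules, template) in FINETUNE_CONFIG.items():
        if name.lower() == model_family.lower():
            # Return a copy so callers cannot mutate the shared table.
            return list(modules), template
    raise KeyError(f"No fine-tuning settings listed for {model_family!r}")


if __name__ == "__main__":
    print(finetune_settings("qwen"))
```

Keeping the table as data rather than branching logic means adding a model family is a one-line change.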

Datasets

Datasets                License       Link
academic                Not Found     https://github.com/jkkummerfeld/text2sql-data
advising                CC-BY-4.0     https://github.com/jkkummerfeld/text2sql-data
atis                    Not Found     https://github.com/jkkummerfeld/text2sql-data
restaurants             Not Found     https://github.com/jkkummerfeld/text2sql-data
scholar                 Not Found     https://github.com/jkkummerfeld/text2sql-data
imdb                    Not Found     https://github.com/jkkummerfeld/text2sql-data
yelp                    Not Found     https://github.com/jkkummerfeld/text2sql-data
criteria2sql            Apache-2.0    https://github.com/xiaojingyu92/Criteria2SQL
css                     CC-BY-4.0     https://huggingface.co/datasets/zhanghanchong/css
eICU                    CC-BY-4.0     https://github.com/glee4810/EHRSQL
mimic_iii               CC-BY-4.0     https://github.com/glee4810/EHRSQL
geonucleardata          CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
greatermanchestercrime  CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
studentmathscore        CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
thehistoryofbaseball    CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
uswildfires             CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
whatcdhiphop            CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
worldsoccerdatabase     CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
pesticide               CC-BY-SA-4.0  https://github.com/chiahsuan156/KaggleDBQA
mimicsql_data           MIT           https://github.com/wangpinggl/TREQS
nvbench                 MIT           https://github.com/TsinghuaDatabaseGroup/nvBench
sede                    Apache-2.0    https://github.com/hirupert/sede
spider                  CC-BY-SA-4.0  https://huggingface.co/datasets/spider
sql_create_context      CC-BY-4.0     https://huggingface.co/datasets/b-mc2/sql-create-context
squall                  CC-BY-SA-4.0  https://github.com/tzshi/squall
wikisql                 BSD 3-Clause  https://github.com/salesforce/WikiSQL
BIRD                    Not Found     https://bird-bench.github.io/
CHASE                   MIT           https://xjtu-intsoft.github.io/chase/
cosql                   Not Found     https://yale-lily.github.io/cosql/

More information about Text2SQL fine-tuning

License

The MIT License (MIT)

Contact Information

We are working on building a community. If you have any ideas about building the community, feel free to contact us.
