Merge remote-tracking branch 'origin/dbgpt_doc' into dev

# Conflicts:
#	pilot/server/webserver.py
yhjun1026 2023-05-25 11:40:40 +08:00
commit 0e4955a62a
30 changed files with 494 additions and 25 deletions

.readthedocs.yaml (new file)

@ -0,0 +1,22 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.10"
sphinx:
configuration: docs/conf.py
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/requirements.txt
- method: pip
path: .
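The Read the Docs configuration above can also be exercised locally before pushing; a minimal sketch, assuming docs/requirements.txt has been installed and using the docs/Makefile added in this commit:

```bash
# Build the Sphinx docs locally, mirroring the Read the Docs build
pip install -r docs/requirements.txt
cd docs && make html   # output ends up in docs/_build/html
```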


@ -191,8 +191,8 @@ To use multiple models, modify the LLM_MODEL parameter in the .env configuration
2.Run the knowledge repository script in the tools directory.
```bash
python tools/knowledge_init.py
--vector_name : the name of your vector store (default: default)
--append : append mode; True appends to an existing store, False does not (default: False)
@ -202,6 +202,15 @@ python tools/knowledge_init.py
3.Add the knowledge repository in the interface by entering the name of your knowledge repository (if not specified, enter "default") so you can use it for Q&A based on your knowledge base.
Note that the default vector model used is text2vec-large-chinese (which is a large model, so if your personal computer configuration is not enough, it is recommended to use text2vec-base-chinese). Therefore, ensure that you download the model and place it in the models directory.
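As a rough sketch of that download step (the Hugging Face repository and the target folder name are assumptions — adjust them to match your LLM_MODEL_CONFIG entry), the embedding model can be fetched with git-lfs:

```bash
# Illustrative: fetch the default embedding model into ./models (requires git-lfs)
git lfs install
git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese models/text2vec-large-chinese
```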
If nltk-related errors occur while using the knowledge base, you need to download the NLTK data packages. For more details, please refer to the [nltk documentation](https://www.nltk.org/data.html).
Run the Python interpreter and type the commands:
```python
>>> import nltk
>>> nltk.download()
```
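If you prefer a non-interactive download, the same data can be fetched from the command line; the package name below is only an example and depends on the error message you see:

```bash
# Download a specific NLTK data package without the interactive chooser
python -m nltk.downloader punkt
```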
## Acknowledgement
The achievements of this project are thanks to the technical community, especially the following projects:


@ -115,7 +115,6 @@ DB-GPT builds its large model runtime environment on [FastChat](https://github.com/lm-sys/FastChat)
Users only need to organize their knowledge documents, and they can use our existing capabilities to build the knowledge base required by large models.
### Large Model Management Capability
In the underlying large model integration, we have designed an open interface that supports integration with various large models. At the same time, we have a very strict control and review mechanism for the effectiveness of the integrated models. In terms of accuracy, the integrated models need to align with the capability of ChatGPT at a level of 85% or higher. We use higher standards to select models, hoping to save users the cumbersome testing and evaluation steps during use.
@ -188,7 +187,7 @@ $ python webserver.py
### Multi-Model Usage
In the .env configuration file, modify the LLM_MODEL parameter to switch between models.
### Build your own knowledge base:
1. Place personal knowledge files or folders into the pilot/datasets directory.
@ -204,6 +203,14 @@ python tools/knowledge_init.py
3. Add the knowledge base in the interface by entering your knowledge base name (if none was specified, enter "default"), and you can then ask questions based on your knowledge base.
Note that the default vector model here is text2vec-large-chinese (a fairly large model; if your personal computer's configuration is not sufficient, text2vec-base-chinese is recommended), so make sure to download the model and place it in the models directory.
If you encounter nltk-related errors while using the knowledge base, you need to download the NLTK data packages. For more details, see the [nltk documentation](https://www.nltk.org/data.html).
Run the Python interpreter and type the commands:
```python
>>> import nltk
>>> nltk.download()
```
## Acknowledgement
The achievements of this project are thanks to the technical community, especially the following projects:
@ -225,14 +232,12 @@ python tools/knowledge_init.py
<!-- GITCONTRIBUTOR_START -->
## Contributors
|[<img src="https://avatars.githubusercontent.com/u/17919400?v=4" width="100px;"/><br/><sub><b>csunny</b></sub>](https://github.com/csunny)<br/>|[<img src="https://avatars.githubusercontent.com/u/1011681?v=4" width="100px;"/><br/><sub><b>xudafeng</b></sub>](https://github.com/xudafeng)<br/>|[<img src="https://avatars.githubusercontent.com/u/7636723?s=96&v=4" width="100px;"/><br/><sub><b>明天</b></sub>](https://github.com/yhjun1026)<br/> | [<img src="https://avatars.githubusercontent.com/u/13723926?v=4" width="100px;"/><br/><sub><b>Aries-ckt</b></sub>](https://github.com/Aries-ckt)<br/>|[<img src="https://avatars.githubusercontent.com/u/95130644?v=4" width="100px;"/><br/><sub><b>thebigbone</b></sub>](https://github.com/thebigbone)<br/>|
| :---: | :---: | :---: | :---: |:---: |
This project follows the git-contributor [spec](https://github.com/xudafeng/git-contributor); auto-generated at `Fri May 19 2023 00:24:18 GMT+0800`.
This project follows the git-contributor [spec](https://github.com/xudafeng/git-contributor), auto updated at `Sun May 14 2023 23:02:43 GMT+0800`.
<!-- GITCONTRIBUTOR_END -->

docs/Makefile (new file)

@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/conf.py (new file)

@ -0,0 +1,56 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import toml
project = "DB-GPT"
copyright = "2023, csunny"
author = "csunny"
with open("../pyproject.toml") as f:
    data = toml.load(f)
version = data["tool"]["poetry"]["version"]
release = version
html_title = project + " " + version
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.autodoc.typehints",
"sphinx.ext.autosummary",
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinxcontrib.autodoc_pydantic",
"myst_nb",
"sphinx_copybutton",
"sphinx_panels",
"IPython.sphinxext.ipython_console_highlighting",
]
source_suffix = [".ipynb", ".html", ".md", ".rst"]
autodoc_pydantic_model_show_json = False
autodoc_pydantic_field_list_validators = False
autodoc_pydantic_config_members = False
autodoc_pydantic_model_show_config_summary = False
autodoc_pydantic_model_show_validator_members = False
autodoc_pydantic_model_show_field_summary = False
autodoc_pydantic_model_members = False
autodoc_pydantic_model_undoc_members = False
templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
html_theme = "sphinx_book_theme"
html_static_path = ["_static"]

docs/ecosystem.md (new file)

@ -0,0 +1 @@
# Ecosystem


@ -0,0 +1,2 @@
# Concepts


@ -0,0 +1,51 @@
# Quickstart Guide
This tutorial gives you a quick walkthrough of using DB-GPT with your own environment and data.
## Installation
To get started, install DB-GPT with the following steps.
### 1. Hardware Requirements
Because the project targets at least 85% of ChatGPT's capability, there are certain hardware requirements. Overall, however, the project can be deployed and used on consumer-grade graphics cards. The specific hardware requirements for deployment are as follows:
| GPU | VRAM Size | Performance |
| --------- | --------- | ------------------------------------------- |
| RTX 4090 | 24 GB | Smooth conversation inference |
| RTX 3090 | 24 GB | Smooth conversation inference, better than V100 |
| V100 | 16 GB | Conversation inference possible, noticeable stutter |
### 2. Install
This project relies on a local MySQL database service, which you need to install locally. We recommend using Docker for installation.
```bash
$ docker run --name=mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=aa12345678 -dit mysql:latest
```
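To confirm the container is reachable before starting DB-GPT, you can connect with any MySQL client; a quick check, assuming the mysql command-line client is installed locally:

```bash
# Connect to the container started above (root password: aa12345678)
mysql -h 127.0.0.1 -P 3306 -u root -p
```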
We use [Chroma embedding database](https://github.com/chroma-core/chroma) as the default for our vector database, so there is no need for special installation. If you choose to connect to other databases, you can follow our tutorial for installation and configuration.
For the entire installation process of DB-GPT, we use the miniconda3 virtual environment. Create a virtual environment and install the Python dependencies.
```
python>=3.10
conda create -n dbgpt_env python=3.10
conda activate dbgpt_env
pip install -r requirements.txt
```
### 3. Run
You can refer to this document to obtain the Vicuna weights: [Vicuna](https://github.com/lm-sys/FastChat/blob/main/README.md#model-weights) .
If you have difficulty with this step, you can also directly use the model from [this link](https://huggingface.co/Tribbiani/vicuna-7b) as a replacement.
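A sketch of that replacement route (the target directory name is an assumption and should match the corresponding LLM_MODEL_CONFIG entry):

```bash
# Clone the replacement weights into ./models (requires git-lfs; the download is large)
git lfs install
git clone https://huggingface.co/Tribbiani/vicuna-7b models/vicuna-7b
```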
1. Run server
```bash
$ python pilot/server/llmserver.py
```
Run gradio webui
```bash
$ python pilot/server/webserver.py
```
Notice: the webserver needs to connect to the llmserver, so you must edit the .env file and change MODEL_SERVER = "http://127.0.0.1:8000" to your llmserver address. This is very important.
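For example, if the llmserver runs on another machine, the relevant .env line might look like this (the IP address is illustrative):

```bash
# .env — point the webserver at your llmserver instance
MODEL_SERVER=http://192.168.1.10:8000
```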


@ -0,0 +1,6 @@
# Tutorials
-------------
This is a collection of DB-GPT tutorials on Medium.
Coming soon...

docs/index.rst (new file)

@ -0,0 +1,155 @@
.. DB-GPT documentation master file, created by
sphinx-quickstart on Wed May 24 11:50:49 2023.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to DB-GPT!
==================================
| As large models are released and iterated upon, they are becoming increasingly intelligent. However, in the process of using large models, we face significant challenges in data security and privacy. We need to ensure that our sensitive data and environments remain completely controlled and avoid any data privacy leaks or security risks. Based on this, we have launched the DB-GPT project to build a complete private large model solution for all database-based scenarios. This solution supports local deployment, allowing it to be applied not only in independent private environments but also to be independently deployed and isolated according to business modules, ensuring that the ability of large models is absolutely private, secure, and controllable.
| **DB-GPT** is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.
| **Features**
Currently, we have released multiple key features, which are listed below to demonstrate our current capabilities:
- SQL language capabilities
- SQL generation
- SQL diagnosis
- Private domain Q&A and data processing
- Database knowledge Q&A
- Data processing
- Plugins
- Support custom plugin execution tasks and natively support the Auto-GPT plugin, such as:
- Unified vector storage/indexing of knowledge base
- Support for unstructured data such as PDF, Markdown, CSV, and WebURL
- Multi LLMs Support
- Supports multiple large language models, currently supporting Vicuna (7b, 13b), ChatGLM-6b (int4, int8)
- TODO: codegen2, codet5p
Getting Started
-----------------
| How to get started using DB-GPT to interact with your data and environment.
- `Quickstart Guide <./getting_started/getting_started.html>`_
| Concepts and terminology
- `Concepts and terminology <./getting_started/concepts.html>`_
| Coming soon...
- `Tutorials <./getting_started/tutorials.html>`_
.. toctree::
:maxdepth: 2
:caption: Getting Started
:hidden:
getting_started/getting_started.md
getting_started/concepts.md
getting_started/tutorials.md
Modules
---------
| These modules are the core abstractions with which we can interact with data and environment smoothly.
They are very important for DB-GPT, which also provides standard, extensible interfaces.
| The docs for each module contain quickstart examples, how to guides, reference docs, and conceptual guides.
| The modules are as follows
- `LLMs <./modules/llms.html>`_: Supports multi-model management and integrations.
- `Prompts <./modules/prompts.html>`_: Prompt management, optimization, and serialization for multiple databases.
- `Plugins <./modules/plugins.html>`_: Plugin management and scheduling.
- `Knowledge <./modules/knownledge.html>`_: Knowledge management, embedding, and search.
- `Connections <./modules/connections.html>`_: Supports connections to multiple databases; manage connections and interact with them.
.. toctree::
:maxdepth: 2
:caption: Modules
:name: modules
:hidden:
./modules/llms.md
./modules/prompts.md
./modules/plugins.md
./modules/connections.md
./modules/knownledge.md
Use Cases
---------
| Best Practices and built-in implementations for common DB-GPT use cases:
- `SQL generation and diagnosis <./use_cases/sql_generation_and_diagnosis.html>`_: SQL generation and diagnosis.
- `Knowledge based QA <./use_cases/knownledge_based_qa.html>`_: An important scenario for users to chat with database documents, code, bugs, and schemas.
- `Chatbots <./use_cases/chatbots.html>`_: Language models love to chat; use multiple models to chat.
- `Querying Database Data <./use_cases/query_database_data.html>`_: Query and analyze data from databases and generate charts.
- `Interacting with apis <./use_cases/interacting_with_api.html>`_: Interact with APIs, such as creating a table, deploying a database cluster, or creating a database.
- `Tool use with plugins <./use_cases/tool_use_with_plugin>`_: Use plugin-provided tools to manage databases autonomously.
.. toctree::
:maxdepth: 2
:caption: Use Cases
:name: use_cases
:hidden:
./use_cases/sql_generation_and_diagnosis.md
./use_cases/knownledge_based_qa.md
./use_cases/chatbots.md
./use_cases/query_database_data.md
./use_cases/interacting_with_api.md
./use_cases/tool_use_with_plugin.md
Reference
-----------
| Full documentation on all methods, classes, installation methods, and integration setups for DB-GPT.
.. toctree::
:maxdepth: 1
:caption: Reference
:name: reference
:hidden:
./reference.md
Ecosystem
----------
| Guides for how other companies/products can be used with DB-GPT
.. toctree::
:maxdepth: 1
:glob:
:caption: Ecosystem
:name: ecosystem
:hidden:
./ecosystem.md
Resources
----------
| Additional resources we think may be useful as you develop your application!
- `Discord <https://discord.com/invite/twmZk3vv>`_: if you have problems or ideas, come talk to us on Discord.
.. toctree::
:maxdepth: 1
:caption: Resources
:name: resources
:hidden:


@ -1 +0,0 @@
#

docs/make.bat (new file)

@ -0,0 +1,35 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd


@ -0,0 +1,4 @@
# Connections
In order to interact more conveniently with users' private environments, the project has designed a connection module, which can support connection to databases, Excel, knowledge bases, and other environments to achieve information and data exchange.

docs/modules/index.md (new file)

@ -0,0 +1,3 @@
# Vector storage and indexing
In order to facilitate the management of knowledge after vectorization, we have built-in multiple vector storage engines, from memory-based Chroma to distributed Milvus. Users can choose different storage engines according to their own scenario needs. The storage of knowledge vectors is the cornerstone of AI capability enhancement. As the intermediate language for interaction between humans and large language models, vectors play a very important role in this project.


@ -0,0 +1,25 @@
# Knowledge
As the knowledge base is currently the most significant user demand scenario, we natively support the construction and processing of knowledge bases. At the same time, we also provide multiple knowledge base management strategies in this project, such as:
1. Default built-in knowledge base
2. Custom addition of knowledge bases
3. Various usage scenarios such as constructing knowledge bases through plugin capabilities and web crawling. Users only need to organize the knowledge documents, and they can use our existing capabilities to build the knowledge base required for the large model.
### Create your own knowledge repository
1.Place personal knowledge files or folders in the pilot/datasets directory.
2.Run the knowledge repository script in the tools directory.
```bash
python tools/knowledge_init.py
--vector_name : the name of your vector store (default: default)
--append : append mode; True appends to an existing store, False does not (default: False)
```
3.Add the knowledge repository in the interface by entering the name of your knowledge repository (if not specified, enter "default") so you can use it for Q&A based on your knowledge base.
Note that the default vector model used is text2vec-large-chinese (which is a large model, so if your personal computer configuration is not enough, it is recommended to use text2vec-base-chinese). Therefore, ensure that you download the model and place it in the models directory.
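For instance, using the knowledge_init.py parameters described above, building a store for a personal document set might look like the following; the store name my_docs is purely illustrative and the exact flag syntax follows the parameter list in step 2:

```bash
# Build (or append to) a vector store named "my_docs" from the files in pilot/datasets
python tools/knowledge_init.py --vector_name my_docs --append True
```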

docs/modules/llms.md (new file)

@ -0,0 +1,11 @@
# LLMs
In the underlying large model integration, we have designed an open interface that supports integration with various large models. At the same time, we have a very strict control and evaluation mechanism for the effectiveness of the integrated models. In terms of accuracy, the integrated models need to align with the capability of ChatGPT at a level of 85% or higher. We use higher standards to select models, hoping to save users the cumbersome testing and evaluation process in the process of use.
## Multi LLMs Usage
To use multiple models, modify the LLM_MODEL parameter in the .env configuration file to switch between the models.
Notice: you can create the .env file from .env.template with a command like this:
```
cp .env.template .env
```
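After creating the file, switching models is a one-line edit; the value below comes from the default configuration and is only an example:

```bash
# In .env — set LLM_MODEL to a key defined in LLM_MODEL_CONFIG
LLM_MODEL=vicuna-13b
```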

docs/modules/plugins.md (new file)

@ -0,0 +1,3 @@
# Plugins
The capabilities of agents and plugins are the core of whether large models can operate autonomously. In this project, we natively support the plugin mode, so large models can automatically achieve their goals. At the same time, to take full advantage of the community's work, the plugins in this project natively support the Auto-GPT plugin ecosystem; that is, Auto-GPT plugins can run directly in our project.

docs/modules/prompts.md (new file)

@ -0,0 +1,3 @@
# Prompts
The prompt is a very important part of the interaction between the large model and the user, and to a certain extent it determines the quality and accuracy of the answers generated by the large model. In this project, we automatically optimize the prompt according to user input and usage scenarios, making it easier and more effective for users to use large language models.

docs/modules/server.md (new file)

@ -0,0 +1,3 @@
# Server
TODO: In terms of terminal display, we will provide a multi-platform product interface, including PC, mobile phone, command line, Slack and other platforms.

docs/reference.md (new file)

@ -0,0 +1 @@
# Reference

docs/requirements.txt (new file)

@ -0,0 +1,15 @@
autodoc_pydantic==1.8.0
myst_parser
nbsphinx==0.8.9
sphinx==4.5.0
recommonmark
sphinx_intl
sphinx-autobuild==2021.3.14
sphinx_book_theme
sphinx_rtd_theme==1.0.0
sphinx-typlog-theme==0.8.0
sphinx-panels
toml
myst_nb
sphinx_copybutton
pydata-sphinx-theme==0.13.1


@ -0,0 +1 @@
# Chatbot


@ -0,0 +1 @@
# Interacting with api


@ -0,0 +1 @@
# Knownledge based qa


@ -0,0 +1 @@
# Query database data


@ -0,0 +1 @@
# SQL generation and diagnosis


@ -0,0 +1 @@
# Tool use with plugin


@ -37,14 +37,6 @@ LLM_MODEL_CONFIG = {
"sentence-transforms": os.path.join(MODEL_PATH, "all-MiniLM-L6-v2"),
}
VECTOR_SEARCH_TOP_K = 20
LLM_MODEL = "vicuna-13b"
LIMIT_MODEL_CONCURRENCY = 5
MAX_POSITION_EMBEDDINGS = 4096
# VICUNA_MODEL_SERVER = "http://121.41.227.141:8000"
VICUNA_MODEL_SERVER = "http://120.79.27.110:8000"
# Load model config
ISLOAD_8BIT = True
ISDEBUG = False

pyproject.toml (new file)

@ -0,0 +1,49 @@
[tool.poetry]
name = "db-gpt"
version = "0.0.6"
description = "Interact with your data and environment privately"
authors = []
readme = "README.md"
license = "MIT"
packages = [{include = "db_gpt"}]
repository = "https://www.github.com/csunny/DB-GPT"
[tool.poetry.dependencies]
python = "^3.10"
accelerate = "^0.16"
[tool.poetry.group.docs.dependencies]
autodoc_pydantic = "^1.8.0"
myst_parser = "^0.18.1"
nbsphinx = "^0.8.9"
sphinx = "^4.5.0"
sphinx-autobuild = "^2021.3.14"
sphinx_book_theme = "^0.3.3"
sphinx_rtd_theme = "^1.0.0"
sphinx-typlog-theme = "^0.8.0"
sphinx-panels = "^0.6.0"
toml = "^0.10.2"
myst-nb = "^0.17.1"
linkchecker = "^10.2.1"
sphinx-copybutton = "^0.5.1"
[tool.poetry.group.test.dependencies]
# The only dependencies that should be added are
# dependencies used for running tests (e.g., pytest, freezegun, response).
# Any dependencies that do not meet that criteria will be removed.
pytest = "^7.3.0"
pytest-cov = "^4.0.0"
pytest-dotenv = "^0.5.2"
duckdb-engine = "^0.7.0"
pytest-watcher = "^0.2.6"
freezegun = "^1.2.2"
responses = "^0.22.0"
pytest-asyncio = "^0.20.3"
lark = "^1.1.5"
pytest-mock = "^3.10.0"
pytest-socket = "^0.6.0"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
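As a sketch, with Poetry 1.2 or newer a plain install pulls in the main, docs, and test groups declared above:

```bash
# Install the project together with the docs and test dependency groups
poetry install
```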


@ -1,4 +1,3 @@
accelerate==0.16.0
torch==2.0.0
accelerate==0.16.0
aiohttp==3.8.4
@ -24,15 +23,12 @@ pycocotools==2.0.6
pyparsing==3.0.9
python-dateutil==2.8.2
pyyaml==6.0
regex==2022.10.31
tokenizers==0.13.2
tqdm==4.64.1
transformers==4.28.0
timm==0.6.13
spacy==3.5.1
webdataset==0.2.48
scikit-learn==1.2.2
scipy==1.10.1
yarl==1.8.2
zipp==3.14.0
omegaconf==2.3.0
@ -41,7 +37,6 @@ iopath==0.1.10
tenacity==8.2.2
peft
pycocoevalcap
sentence-transformers
cpm_kernels
umap-learn
notebook
@ -51,12 +46,10 @@ wandb
llama-index==0.5.27
pymysql
unstructured==0.6.3
pytesseract==0.3.10
grpcio==1.47.5
auto-gpt-plugin-template
pymdown-extensions
mkdocs
requests
gTTS==2.3.1
langchain
nltk
@ -69,7 +62,6 @@ colorama
playsound
distro
pypdf
milvus-cli==0.3.2
# Testing dependencies
pytest
@ -79,4 +71,5 @@ pytest-benchmark
pytest-cov
pytest-integration
pytest-mock
pytest-recording
pytesseract==0.3.10