From c3a3a1908de175faca4ac5637bb294b93a702fe1 Mon Sep 17 00:00:00 2001 From: csunny Date: Fri, 19 May 2023 22:11:40 +0800 Subject: [PATCH 01/16] docs: add script for documents load --- README.md | 2 +- README.zh.md | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index cd7aaff9b..986867ec8 100644 --- a/README.md +++ b/README.md @@ -177,7 +177,7 @@ Notice: the webserver need to connect llmserver, so you need change the .env f We provide a user interface for Gradio, which allows you to use DB-GPT through our user interface. Additionally, we have prepared several reference articles (written in Chinese) that introduce the code and principles related to our project. - [LLM Practical In Action Series (1) — Combined Langchain-Vicuna Application Practical](https://medium.com/@cfqcsunny/llm-practical-in-action-series-1-combined-langchain-vicuna-application-practical-701cd0413c9f) -####Create your own knowledge repository: +### Create your own knowledge repository: 1.Place personal knowledge files or folders in the pilot/datasets directory. diff --git a/README.zh.md b/README.zh.md index 2c6ecca43..b24288d1b 100644 --- a/README.zh.md +++ b/README.zh.md @@ -180,7 +180,8 @@ $ python webserver.py 2. [大模型实战系列(2) —— DB-GPT 阿里云部署指南](https://zhuanlan.zhihu.com/p/629467580) 3. 
[大模型实战系列(3) —— DB-GPT插件模型原理与使用](https://zhuanlan.zhihu.com/p/629623125) -####打造属于你的知识库: + +### 打造属于你的知识库: 1、将个人知识文件或者文件夹放入pilot/datasets目录中 @@ -196,6 +197,7 @@ python tools/knowledge_init.py 3、在界面上新增知识库输入你的知识库名(如果没指定输入default),就可以根据你的知识库进行问答 注意,这里默认向量模型是text2vec-large-chinese(模型比较大,如果个人电脑配置不够建议采用text2vec-base-chinese),因此确保需要将模型download下来放到models目录中。 + ## 感谢 项目取得的成果,需要感谢技术社区,尤其以下项目。 From 9e9d7932fa9c389a4b638812e578b67a0fa38da0 Mon Sep 17 00:00:00 2001 From: csunny Date: Sat, 20 May 2023 08:57:58 +0800 Subject: [PATCH 02/16] add requirements --- requirements.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index 410d3129c..8f7524f36 100644 --- a/requirements.txt +++ b/requirements.txt @@ -67,6 +67,8 @@ colorama playsound distro pypdf +paddleocr +paddlepaddle==2.4.2 # Testing dependencies pytest @@ -76,4 +78,4 @@ pytest-benchmark pytest-cov pytest-integration pytest-mock -pytest-recording \ No newline at end of file +pytest-recording From 5a434d3be46d894d6c67f292da10ba6d25a84387 Mon Sep 17 00:00:00 2001 From: csunny Date: Sat, 20 May 2023 10:06:30 +0800 Subject: [PATCH 03/16] update readme --- README.md | 13 +++++++++++-- README.zh.md | 7 +++++++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 17d1577b6..5278787ee 100644 --- a/README.md +++ b/README.md @@ -183,8 +183,8 @@ We provide a user interface for Gradio, which allows you to use DB-GPT through o 2.Run the knowledge repository script in the tools directory. -``` -python tools/knowledge_init.py +```bash +$ python tools/knowledge_init.py --vector_name : your vector store name default_value:default --append: append mode, True:append, False: not append default_value:False @@ -194,6 +194,15 @@ python tools/knowledge_init.py 3.Add the knowledge repository in the interface by entering the name of your knowledge repository (if not specified, enter "default") so you can use it for Q&A based on your knowledge base. 
Note that the default vector model used is text2vec-large-chinese (which is a large model, so if your personal computer configuration is not enough, it is recommended to use text2vec-base-chinese). Therefore, ensure that you download the model and place it in the models directory. + +If nltk-related errors occur during the use of the knowledge base, you need to install the nltk toolkit. For more details, please refer to: [nltk documents](https://www.nltk.org/data.html) +Run the Python interpreter and type the commands: + +```bash +>>> import nltk +>>> nltk.download() +``` + ## Acknowledgement The achievements of this project are thanks to the technical community, especially the following projects: diff --git a/README.zh.md b/README.zh.md index 6d7b5ad49..3c0331dbe 100644 --- a/README.zh.md +++ b/README.zh.md @@ -196,6 +196,13 @@ python tools/knowledge_init.py 注意,这里默认向量模型是text2vec-large-chinese(模型比较大,如果个人电脑配置不够建议采用text2vec-base-chinese),因此确保需要将模型download下来放到models目录中。 +如果在使用知识库时遇到与nltk相关的错误,您需要安装nltk工具包。更多详情,请参见:[nltk文档](https://www.nltk.org/data.html) +Run the Python interpreter and type the commands: +```bash +>>> import nltk +>>> nltk.download() +``` + ## 感谢 项目取得的成果,需要感谢技术社区,尤其以下项目。 From 1ec1eeb9e093bffe4e38711891afd0b4115ec2db Mon Sep 17 00:00:00 2001 From: csunny Date: Tue, 23 May 2023 22:11:20 +0800 Subject: [PATCH 04/16] docs: add readthe doc --- .readthedocs.yaml | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 .readthedocs.yaml diff --git a/.readthedocs.yaml b/.readthedocs.yaml new file mode 100644 index 000000000..e2645f949 --- /dev/null +++ b/.readthedocs.yaml @@ -0,0 +1,20 @@ +# .readthedocs.yaml +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +# Required +version: 2 + +# Set the version of Python and other tools you might need +build: + os: ubuntu-22.04 + tools: + python: "3.11" + +mkdocs: + configuration: mkdocs.yml + +# Optionally declare the Python 
requirements required to build your docs +python: + install: + - requirements: docs/requirements.txt From d3718d7277f7f6f831d0a9378e2692fcf0838dce Mon Sep 17 00:00:00 2001 From: csunny Date: Wed, 24 May 2023 12:06:16 +0800 Subject: [PATCH 05/16] docs: init --- docs/Makefile | 20 ++++++++++++++++++ docs/conf.py | 49 +++++++++++++++++++++++++++++++++++++++++++ docs/index.rst | 20 ++++++++++++++++++ docs/introduct.md | 1 - docs/make.bat | 35 +++++++++++++++++++++++++++++++ docs/reference.md | 0 docs/requirements.txt | 15 +++++++++++++ pyproject.toml | 15 +++++++++++++ 8 files changed, 154 insertions(+), 1 deletion(-) create mode 100644 docs/Makefile create mode 100644 docs/conf.py create mode 100644 docs/index.rst delete mode 100644 docs/introduct.md create mode 100644 docs/make.bat create mode 100644 docs/reference.md create mode 100644 docs/requirements.txt create mode 100644 pyproject.toml diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 000000000..d4bb2cbb9 --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 000000000..f09019218 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,49 @@ +# Configuration file for the Sphinx documentation builder. 
+# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Project information ----------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information + +project = 'DB-GPT' +copyright = '2023, csunny' +author = 'csunny' +release = '0.0.6' + +# -- General configuration --------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration + +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.autodoc.typehints", + "sphinx.ext.autosummary", + "sphinx.ext.napoleon", + "sphinx.ext.viewcode", + "sphinxcontrib.autodoc_pydantic", + "myst_nb", + "sphinx_copybutton", + "sphinx_panels", + "IPython.sphinxext.ipython_console_highlighting", +] +source_suffix = [".ipynb", ".html", ".md", ".rst"] + +autodoc_pydantic_model_show_json = False +autodoc_pydantic_field_list_validators = False +autodoc_pydantic_config_members = False +autodoc_pydantic_model_show_config_summary = False +autodoc_pydantic_model_show_validator_members = False +autodoc_pydantic_model_show_field_summary = False +autodoc_pydantic_model_members = False +autodoc_pydantic_model_undoc_members = False + +templates_path = ['_templates'] +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + + + +# -- Options for HTML output ------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output + +html_theme = 'sphinx_book_theme' +html_static_path = ['_static'] diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 000000000..f973b626d --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,20 @@ +.. DB-GPT documentation master file, created by + sphinx-quickstart on Wed May 24 11:50:49 2023. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. 
+ +Welcome to DB-GPT's documentation! +================================== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` diff --git a/docs/introduct.md b/docs/introduct.md deleted file mode 100644 index 4287ca861..000000000 --- a/docs/introduct.md +++ /dev/null @@ -1 +0,0 @@ -# \ No newline at end of file diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 000000000..32bb24529 --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=. +set BUILDDIR=_build + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. 
+ echo.If you don't have Sphinx installed, grab it from + echo.https://www.sphinx-doc.org/ + exit /b 1 +) + +if "%1" == "" goto help + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/reference.md b/docs/reference.md new file mode 100644 index 000000000..e69de29bb diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 000000000..4b7c0d72f --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,15 @@ +autodoc_pydantic==1.8.0 +myst_parser +nbsphinx==0.8.9 +sphinx==4.5.0 +recommonmark +sphinx_intl +sphinx-autobuild==2021.3.14 +sphinx_book_theme +sphinx_rtd_theme==1.0.0 +sphinx-typlog-theme==0.8.0 +sphinx-panels +toml +myst_nb +sphinx_copybutton +pydata-sphinx-theme==0.13.1 \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 000000000..c335294f0 --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,15 @@ +[tool.poetry] +name = "db-gpt" +version = "0.0.6" +description = "" +authors = ["csunny "] +readme = "README.md" +packages = [{include = "db_gpt"}] + +[tool.poetry.dependencies] +python = "^3.10" + + +[build-system] +requires = ["poetry-core"] +build-backend = "poetry.core.masonry.api" From 05fc55c11637e2bdfe83f42004718ebde93e94f5 Mon Sep 17 00:00:00 2001 From: csunny Date: Wed, 24 May 2023 16:17:54 +0800 Subject: [PATCH 06/16] docs: add getting started --- .readthedocs.yaml | 6 ++-- docs/getting_started/concepts.md | 3 ++ docs/getting_started/getting_started.md | 7 ++++ docs/getting_started/tutorials.md | 6 ++++ docs/index.rst | 46 +++++++++++++++++++++---- docs/modules/embedding.md | 1 + docs/modules/knownledge.md | 1 + docs/modules/llms.md | 1 + docs/modules/plugins.md | 1 + docs/modules/prompts.md | 1 + docs/modules/server.md | 1 + 11 files changed, 65 insertions(+), 9 deletions(-) create mode 100644 docs/getting_started/concepts.md create mode 100644 
docs/getting_started/getting_started.md create mode 100644 docs/getting_started/tutorials.md create mode 100644 docs/modules/embedding.md create mode 100644 docs/modules/knownledge.md create mode 100644 docs/modules/llms.md create mode 100644 docs/modules/plugins.md create mode 100644 docs/modules/prompts.md create mode 100644 docs/modules/server.md diff --git a/.readthedocs.yaml b/.readthedocs.yaml index e2645f949..7f0ba5f6a 100644 --- a/.readthedocs.yaml +++ b/.readthedocs.yaml @@ -11,10 +11,12 @@ build: tools: python: "3.11" -mkdocs: - configuration: mkdocs.yml +sphinx: + configuration: docs/conf.py # Optionally declare the Python requirements required to build your docs python: install: - requirements: docs/requirements.txt + - method: pip + path: . diff --git a/docs/getting_started/concepts.md b/docs/getting_started/concepts.md new file mode 100644 index 000000000..4c2532afa --- /dev/null +++ b/docs/getting_started/concepts.md @@ -0,0 +1,3 @@ +# Concepts + + diff --git a/docs/getting_started/getting_started.md b/docs/getting_started/getting_started.md new file mode 100644 index 000000000..f407a277f --- /dev/null +++ b/docs/getting_started/getting_started.md @@ -0,0 +1,7 @@ +# Quickstart Guide + +This tutorial gives you a quick walkthrough of using DB-GPT with your environment and data. + +## Installation + +To get started, install DB-GPT with the following command. diff --git a/docs/getting_started/tutorials.md b/docs/getting_started/tutorials.md new file mode 100644 index 000000000..9583cda90 --- /dev/null +++ b/docs/getting_started/tutorials.md @@ -0,0 +1,6 @@ +# Tutorials +------------- + +This is a collection of DB-GPT tutorials on Medium. + +Coming soon... \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index f973b626d..4ece9459f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,18 +3,50 @@ You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. 
-Welcome to DB-GPT's documentation! +Welcome to DB-GPT! ================================== +| As large models are released and iterated upon, they are becoming increasingly intelligent. However, in the process of using large models, we face significant challenges in data security and privacy. We need to ensure that our sensitive data and environments remain completely controlled and avoid any data privacy leaks or security risks. Based on this, we have launched the DB-GPT project to build a complete private large model solution for all database-based scenarios. This solution supports local deployment, allowing it to be applied not only in independent private environments but also to be independently deployed and isolated according to business modules, ensuring that the ability of large models is absolutely private, secure, and controllable. + +| **DB-GPT** is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure. + +Getting Started +----------------- +| How to get started using DB-GPT to interact with your data and environment. +- `Quickstart Guide <./getting_started/getting_started.html>`_ + +| Concepts and terminology + +- `Concepts and terminology <./getting_started/concepts.html>`_ .. 
toctree:: :maxdepth: 2 - :caption: Contents: + :caption: Getting Started + :hidden: + + getting_started/getting_started.md + getting_started/concepts.md + getting_started/tutorials.md + + +Modules +--------- -Indices and tables -================== +Use Cases +--------- + + +Reference +----------- + + + +Ecosystem +---------- + + +Resources +---------- + -* :ref:`genindex` -* :ref:`modindex` -* :ref:`search` diff --git a/docs/modules/embedding.md b/docs/modules/embedding.md new file mode 100644 index 000000000..50418880b --- /dev/null +++ b/docs/modules/embedding.md @@ -0,0 +1 @@ +# Embedding \ No newline at end of file diff --git a/docs/modules/knownledge.md b/docs/modules/knownledge.md new file mode 100644 index 000000000..460818e6c --- /dev/null +++ b/docs/modules/knownledge.md @@ -0,0 +1 @@ +# Knownledge \ No newline at end of file diff --git a/docs/modules/llms.md b/docs/modules/llms.md new file mode 100644 index 000000000..e73655f25 --- /dev/null +++ b/docs/modules/llms.md @@ -0,0 +1 @@ +# LLMs \ No newline at end of file diff --git a/docs/modules/plugins.md b/docs/modules/plugins.md new file mode 100644 index 000000000..f39a3a0c3 --- /dev/null +++ b/docs/modules/plugins.md @@ -0,0 +1 @@ +# Plugins \ No newline at end of file diff --git a/docs/modules/prompts.md b/docs/modules/prompts.md new file mode 100644 index 000000000..3010b0341 --- /dev/null +++ b/docs/modules/prompts.md @@ -0,0 +1 @@ +# Prompts \ No newline at end of file diff --git a/docs/modules/server.md b/docs/modules/server.md new file mode 100644 index 000000000..ee9929c84 --- /dev/null +++ b/docs/modules/server.md @@ -0,0 +1 @@ +# Server \ No newline at end of file From 9edd279b56dd4bc4198852f73d183066af2a0d8c Mon Sep 17 00:00:00 2001 From: csunny Date: Wed, 24 May 2023 21:38:54 +0800 Subject: [PATCH 07/16] add use cases --- docs/ecosystem.md | 1 + docs/index.rst | 82 +++++++++++++++++++ docs/modules/connections.md | 1 + docs/reference.md | 1 + docs/use_cases/chatbots.md | 1 + 
docs/use_cases/interacting_with_api.md | 1 + docs/use_cases/knownledge_based_qa.md | 1 + docs/use_cases/query_database_data.md | 1 + .../use_cases/sql_generation_and_diagnosis.md | 1 + docs/use_cases/tool_use_with_plugin.md | 1 + 10 files changed, 91 insertions(+) create mode 100644 docs/ecosystem.md create mode 100644 docs/modules/connections.md create mode 100644 docs/use_cases/chatbots.md create mode 100644 docs/use_cases/interacting_with_api.md create mode 100644 docs/use_cases/knownledge_based_qa.md create mode 100644 docs/use_cases/query_database_data.md create mode 100644 docs/use_cases/sql_generation_and_diagnosis.md create mode 100644 docs/use_cases/tool_use_with_plugin.md diff --git a/docs/ecosystem.md b/docs/ecosystem.md new file mode 100644 index 000000000..9d6bd4150 --- /dev/null +++ b/docs/ecosystem.md @@ -0,0 +1 @@ +# Ecosystem \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index 4ece9459f..95877b81d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -18,6 +18,9 @@ Getting Started - `Concepts and terminology <./getting_started/concepts.html>`_ +| Coming soon... + +- `Tutorials <./getting_started/tutorials.html>`_ .. toctree:: :maxdepth: 2 :caption: Getting Started @@ -31,22 +34,101 @@ Getting Started Modules --------- +| These modules are the core abstractions through which we interact smoothly with data and the environment. +They are central to DB-GPT, which also provides standard, extensible interfaces. +| The docs for each module contain quickstart examples, how-to guides, reference docs, and conceptual guides. + +| The modules are as follows: + +- `LLMs <./modules/llms.html>`_: Management and integration of multiple models. + +- `Prompts <./modules/prompts.html>`_: Prompt management, optimization, and serialization for multiple databases. + +- `Plugins <./modules/plugins.html>`_: Plugin management and scheduling. + +- `Knowledge <./modules/knownledge.html>`_: Knowledge management, embedding, and search. 
+ +- `Connections <./modules/connections.html>`_: Support for connecting to multiple databases; manage connections and interact with them. + +.. toctree:: + :maxdepth: 2 + :caption: Modules + :name: modules + :hidden: + + ./modules/llms.md + ./modules/prompts.md + ./modules/plugins.md + ./modules/connections.md + ./modules/knownledge.md Use Cases --------- +| Best practices and built-in implementations for common DB-GPT use cases: + +- `SQL generation and diagnosis <./use_cases/sql_generation_and_diagnosis.html>`_: SQL generation and diagnosis. + +- `Knowledge Based QA <./use_cases/knownledge_based_qa.html>`_: An important scenario in which users chat with database documents, code, bugs, and schemas. + +- `Chatbots <./use_cases/chatbots.html>`_: Language models love to chat; use multiple models to chat. + +- `Querying Database Data <./use_cases/query_database_data.html>`_: Query and analyze data from databases and produce charts. + +- `Interacting with APIs <./use_cases/interacting_with_api.html>`_: Interact with APIs, for example to create a table, deploy a database cluster, or create a database. + +- `Tool use with plugins <./use_cases/tool_use_with_plugin.html>`_: Use tools through plugins to manage databases autonomously. + +.. toctree:: + :maxdepth: 2 + :caption: Use Cases + :name: use_cases + :hidden: + + ./use_cases/sql_generation_and_diagnosis.md + ./use_cases/knownledge_based_qa.md + ./use_cases/chatbots.md + ./use_cases/query_database_data.md + ./use_cases/interacting_with_api.md + ./use_cases/tool_use_with_plugin.md Reference ----------- +| Full documentation on all methods, classes, installation methods, and integration setups for DB-GPT. +.. toctree:: + :maxdepth: 1 + :caption: Reference + :name: reference + :hidden: + ./reference.md Ecosystem ---------- +| Guides for how other companies/products can be used with DB-GPT + +.. 
toctree:: + :maxdepth: 1 + :glob: + :caption: Ecosystem + :name: ecosystem + :hidden: + + ./ecosystem.md + Resources ---------- +| Additional resources we think may be useful as you develop your application! +- `Discord `_: if you have any problems or ideas, you can discuss them on Discord. + +.. toctree:: + :maxdepth: 1 + :caption: Resources + :name: resources + :hidden: diff --git a/docs/modules/connections.md b/docs/modules/connections.md new file mode 100644 index 000000000..e2bfe7401 --- /dev/null +++ b/docs/modules/connections.md @@ -0,0 +1 @@ +# Connections \ No newline at end of file diff --git a/docs/reference.md b/docs/reference.md index e69de29bb..4a938e09d 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -0,0 +1 @@ +# Reference \ No newline at end of file diff --git a/docs/use_cases/chatbots.md b/docs/use_cases/chatbots.md new file mode 100644 index 000000000..547ae67cc --- /dev/null +++ b/docs/use_cases/chatbots.md @@ -0,0 +1 @@ +# Chatbot \ No newline at end of file diff --git a/docs/use_cases/interacting_with_api.md b/docs/use_cases/interacting_with_api.md new file mode 100644 index 000000000..65f69ed2a --- /dev/null +++ b/docs/use_cases/interacting_with_api.md @@ -0,0 +1 @@ +# Interacting with api \ No newline at end of file diff --git a/docs/use_cases/knownledge_based_qa.md b/docs/use_cases/knownledge_based_qa.md new file mode 100644 index 000000000..c9e25f385 --- /dev/null +++ b/docs/use_cases/knownledge_based_qa.md @@ -0,0 +1 @@ +# Knowledge based QA \ No newline at end of file diff --git a/docs/use_cases/query_database_data.md b/docs/use_cases/query_database_data.md new file mode 100644 index 000000000..fa25f7de7 --- /dev/null +++ b/docs/use_cases/query_database_data.md @@ -0,0 +1 @@ +# Query database data \ No newline at end of file diff --git a/docs/use_cases/sql_generation_and_diagnosis.md b/docs/use_cases/sql_generation_and_diagnosis.md new file mode 100644 index 000000000..f0448edd0 --- /dev/null +++ 
b/docs/use_cases/sql_generation_and_diagnosis.md @@ -0,0 +1 @@ +# SQL generation and diagnosis \ No newline at end of file diff --git a/docs/use_cases/tool_use_with_plugin.md b/docs/use_cases/tool_use_with_plugin.md new file mode 100644 index 000000000..8aa053daf --- /dev/null +++ b/docs/use_cases/tool_use_with_plugin.md @@ -0,0 +1 @@ +# Tool use with plugin \ No newline at end of file From 4021e9a597b88cb0f211ffd29121f20fa83f934d Mon Sep 17 00:00:00 2001 From: csunny Date: Wed, 24 May 2023 21:59:04 +0800 Subject: [PATCH 08/16] docs: add docs for db-gpt --- docs/conf.py | 10 +++++++++- docs/index.rst | 21 +++++++++++++++++++++ pyproject.toml | 38 ++++++++++++++++++++++++++++++++++++-- 3 files changed, 66 insertions(+), 3 deletions(-) diff --git a/docs/conf.py b/docs/conf.py index f09019218..1aef5c03e 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -6,10 +6,18 @@ # -- Project information ----------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information +import toml + project = 'DB-GPT' copyright = '2023, csunny' author = 'csunny' -release = '0.0.6' + +with open("../pyproject.toml") as f: + data = toml.load(f) + +version = data["tool"]["poetry"]["version"] +release = version +html_title = project + " " + version # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration diff --git a/docs/index.rst b/docs/index.rst index 95877b81d..603ac2049 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,6 +9,27 @@ Welcome to DB-GPT! | **DB-GPT** is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure. 
+| **Features** +Currently, we have released multiple key features, which are listed below to demonstrate our current capabilities: + +- SQL language capabilities + - SQL generation + - SQL diagnosis + +- Private domain Q&A and data processing + - Database knowledge Q&A + - Data processing + +- Plugins + - Support custom plugin execution tasks and natively support the Auto-GPT plugin. + +- Unified vector storage/indexing of knowledge base + - Support for unstructured data such as PDF, Markdown, CSV, and WebURL + +- Multi LLMs Support + - Supports multiple large language models, currently supporting Vicuna (7b, 13b), ChatGLM-6b (int4, int8) + - TODO: codegen2, codet5p + Getting Started ----------------- | How to get started using DB-GPT to interact with your data and environment. diff --git a/pyproject.toml b/pyproject.toml index c335294f0..12834cedd 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,15 +1,49 @@ [tool.poetry] name = "db-gpt" version = "0.0.6" -description = "" -authors = ["csunny "] +description = "Interact with your data and environment privately" +authors = [] readme = "README.md" +license = "MIT" packages = [{include = "db_gpt"}] +repository = "https://www.github.com/csunny/DB-GPT" [tool.poetry.dependencies] python = "^3.10" +accelerate = "^0.16" +[tool.poetry.group.docs.dependencies] +autodoc_pydantic = "^1.8.0" +myst_parser = "^0.18.1" +nbsphinx = "^0.8.9" +sphinx = "^4.5.0" +sphinx-autobuild = "^2021.3.14" +sphinx_book_theme = "^0.3.3" +sphinx_rtd_theme = "^1.0.0" +sphinx-typlog-theme = "^0.8.0" +sphinx-panels = "^0.6.0" +toml = "^0.10.2" +myst-nb = "^0.17.1" +linkchecker = "^10.2.1" +sphinx-copybutton = "^0.5.1" + +[tool.poetry.group.test.dependencies] +# The only dependencies that should be added are +# dependencies used for running tests (e.g., pytest, freezegun, response). +# Any dependencies that do not meet that criteria will be removed. 
+pytest = "^7.3.0" +pytest-cov = "^4.0.0" +pytest-dotenv = "^0.5.2" +duckdb-engine = "^0.7.0" +pytest-watcher = "^0.2.6" +freezegun = "^1.2.2" +responses = "^0.22.0" +pytest-asyncio = "^0.20.3" +lark = "^1.1.5" +pytest-mock = "^3.10.0" +pytest-socket = "^0.6.0" + [build-system] requires = ["poetry-core"] build-backend = "poetry.core.masonry.api" From 340069aee6a5ef28938325a9bff43a5d6870ea34 Mon Sep 17 00:00:00 2001 From: csunny Date: Wed, 24 May 2023 22:21:16 +0800 Subject: [PATCH 09/16] docs: add module description --- docs/getting_started/concepts.md | 1 - docs/getting_started/getting_started.md | 46 ++++++++++++++++++++++++- docs/modules/connections.md | 5 ++- docs/modules/embedding.md | 1 - docs/modules/index.md | 3 ++ docs/modules/knownledge.md | 7 +++- docs/modules/llms.md | 4 ++- docs/modules/plugins.md | 4 ++- docs/modules/prompts.md | 4 ++- docs/modules/server.md | 4 ++- 10 files changed, 70 insertions(+), 9 deletions(-) delete mode 100644 docs/modules/embedding.md create mode 100644 docs/modules/index.md diff --git a/docs/getting_started/concepts.md b/docs/getting_started/concepts.md index 4c2532afa..e834417d3 100644 --- a/docs/getting_started/concepts.md +++ b/docs/getting_started/concepts.md @@ -1,3 +1,2 @@ # Concepts - diff --git a/docs/getting_started/getting_started.md b/docs/getting_started/getting_started.md index f407a277f..caf566967 100644 --- a/docs/getting_started/getting_started.md +++ b/docs/getting_started/getting_started.md @@ -4,4 +4,48 @@ This tutorial gives you a quick walkthrough about use DB-GPT with you environmen ## Installation -To get started, install DB-GPT with the following command. +To get started, install DB-GPT with the following steps. + +### 1. Hardware Requirements +As our project has the ability to achieve ChatGPT performance of over 85%, there are certain hardware requirements. However, overall, the project can be deployed and used on consumer-grade graphics cards. 
The specific hardware requirements for deployment are as follows: + +| GPU | VRAM Size | Performance | +| --------- | --------- | ------------------------------------------- | +| RTX 4090 | 24 GB | Smooth conversation inference | +| RTX 3090 | 24 GB | Smooth conversation inference, better than V100 | +| V100 | 16 GB | Conversation inference possible, noticeable stutter | + +### 2. Install + +This project relies on a local MySQL database service, which you need to install locally. We recommend using Docker for installation. + +```bash +$ docker run --name=mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=aa12345678 -dit mysql:latest +``` +We use [Chroma embedding database](https://github.com/chroma-core/chroma) as the default for our vector database, so there is no need for special installation. If you choose to connect to other databases, you can follow our tutorial for installation and configuration. +For the entire installation process of DB-GPT, we use the miniconda3 virtual environment. Create a virtual environment and install the Python dependencies. + +```bash +# requires python>=3.10 +conda create -n dbgpt_env python=3.10 +conda activate dbgpt_env +pip install -r requirements.txt +``` + +### 3. Run +You can refer to this document to obtain the Vicuna weights: [Vicuna](https://github.com/lm-sys/FastChat/blob/main/README.md#model-weights). + +If you have difficulty with this step, you can also directly use the model from [this link](https://huggingface.co/Tribbiani/vicuna-7b) as a replacement. + +1. Run the LLM server +```bash +$ python pilot/server/llmserver.py +``` + +2. Run the Gradio web UI + +```bash +$ python pilot/server/webserver.py +``` + +Notice: the webserver needs to connect to the llmserver, so you need to edit the .env file and change MODEL_SERVER = "http://127.0.0.1:8000" to your own address. This step is very important. 
\ No newline at end of file diff --git a/docs/modules/connections.md b/docs/modules/connections.md index e2bfe7401..041120d26 100644 --- a/docs/modules/connections.md +++ b/docs/modules/connections.md @@ -1 +1,4 @@ -# Connections \ No newline at end of file +# Connections + +In order to interact more conveniently with users' private environments, the project has designed a connection module, which can support connection to databases, Excel, knowledge bases, and other environments to achieve information and data exchange. + diff --git a/docs/modules/embedding.md b/docs/modules/embedding.md deleted file mode 100644 index 50418880b..000000000 --- a/docs/modules/embedding.md +++ /dev/null @@ -1 +0,0 @@ -# Embedding \ No newline at end of file diff --git a/docs/modules/index.md b/docs/modules/index.md new file mode 100644 index 000000000..abbe16823 --- /dev/null +++ b/docs/modules/index.md @@ -0,0 +1,3 @@ +# Vector storage and indexing + +In order to facilitate the management of knowledge after vectorization, we have built-in multiple vector storage engines, from memory-based Chroma to distributed Milvus. Users can choose different storage engines according to their own scenario needs. The storage of knowledge vectors is the cornerstone of AI capability enhancement. As the intermediate language for interaction between humans and large language models, vectors play a very important role in this project. \ No newline at end of file diff --git a/docs/modules/knownledge.md b/docs/modules/knownledge.md index 460818e6c..d83d19517 100644 --- a/docs/modules/knownledge.md +++ b/docs/modules/knownledge.md @@ -1 +1,6 @@ -# Knownledge \ No newline at end of file +# Knownledge + +As the knowledge base is currently the most significant user demand scenario, we natively support the construction and processing of knowledge bases. At the same time, we also provide multiple knowledge base management strategies in this project, such as: +1. Default built-in knowledge base +2. 
Custom addition of knowledge bases +3. Various usage scenarios such as constructing knowledge bases through plugin capabilities and web crawling. Users only need to organize the knowledge documents, and they can use our existing capabilities to build the knowledge base required for the large model. diff --git a/docs/modules/llms.md b/docs/modules/llms.md index e73655f25..26224b9b5 100644 --- a/docs/modules/llms.md +++ b/docs/modules/llms.md @@ -1 +1,3 @@ -# LLMs \ No newline at end of file +# LLMs + +In the underlying large model integration, we have designed an open interface that supports integration with various large models. At the same time, we have a very strict control and evaluation mechanism for the effectiveness of the integrated models. In terms of accuracy, the integrated models need to align with the capability of ChatGPT at a level of 85% or higher. We use higher standards to select models, hoping to save users the cumbersome testing and evaluation process in the process of use. \ No newline at end of file diff --git a/docs/modules/plugins.md b/docs/modules/plugins.md index f39a3a0c3..e4a95d3be 100644 --- a/docs/modules/plugins.md +++ b/docs/modules/plugins.md @@ -1 +1,3 @@ -# Plugins \ No newline at end of file +# Plugins + +The ability of Agent and Plugin is the core of whether large models can be automated. In this project, we natively support the plugin mode, and large models can automatically achieve their goals. At the same time, in order to give full play to the advantages of the community, the plugins used in this project natively support the Auto-GPT plugin ecology, that is, Auto-GPT plugins can directly run in our project. 
\ No newline at end of file diff --git a/docs/modules/prompts.md b/docs/modules/prompts.md index 3010b0341..647b93658 100644 --- a/docs/modules/prompts.md +++ b/docs/modules/prompts.md @@ -1 +1,3 @@ -# Prompts \ No newline at end of file +# Prompts + +Prompt is a very important part of the interaction between the large model and the user, and to a certain extent, it determines the quality and accuracy of the answer generated by the large model. In this project, we will automatically optimize the corresponding prompt according to user input and usage scenarios, making it easier and more efficient for users to use large language models. \ No newline at end of file diff --git a/docs/modules/server.md b/docs/modules/server.md index ee9929c84..ad1623c65 100644 --- a/docs/modules/server.md +++ b/docs/modules/server.md @@ -1 +1,3 @@ -# Server \ No newline at end of file +# Server + +TODO: In terms of terminal display, we will provide a multi-platform product interface, including PC, mobile phone, command line, Slack and other platforms. \ No newline at end of file From f95a15187b3a570dabddc0a8ea767718ee776898 Mon Sep 17 00:00:00 2001 From: csunny Date: Wed, 24 May 2023 22:27:27 +0800 Subject: [PATCH 10/16] docs: multi llms and qa update --- docs/modules/knownledge.md | 19 +++++++++++++++++++ docs/modules/llms.md | 10 +++++++++- 2 files changed, 28 insertions(+), 1 deletion(-) diff --git a/docs/modules/knownledge.md b/docs/modules/knownledge.md index d83d19517..f33438226 100644 --- a/docs/modules/knownledge.md +++ b/docs/modules/knownledge.md @@ -4,3 +4,22 @@ As the knowledge base is currently the most significant user demand scenario, we 1. Default built-in knowledge base 2. Custom addition of knowledge bases 3. Various usage scenarios such as constructing knowledge bases through plugin capabilities and web crawling. Users only need to organize the knowledge documents, and they can use our existing capabilities to build the knowledge base required for the large model. 
+ + +### Create your own knowledge repository + +1. Place personal knowledge files or folders in the pilot/datasets directory. + +2. Run the knowledge repository script in the tools directory. + +``` +python tools/knowledge_init.py + +--vector_name: your vector store name, default_value: default +--append: append mode, True: append, False: do not append, default_value: False + +``` + +3. Add the knowledge repository in the interface by entering the name of your knowledge repository (if not specified, enter "default") so you can use it for Q&A based on your knowledge base. + +Note that the default vector model is text2vec-large-chinese. It is a large model, so if your computer does not have enough resources, text2vec-base-chinese is recommended instead. In either case, download the model and place it in the models directory. \ No newline at end of file diff --git a/docs/modules/llms.md b/docs/modules/llms.md index 26224b9b5..b4d57579f 100644 --- a/docs/modules/llms.md +++ b/docs/modules/llms.md @@ -1,3 +1,11 @@ # LLMs -In the underlying large model integration, we have designed an open interface that supports integration with various large models. At the same time, we have a very strict control and evaluation mechanism for the effectiveness of the integrated models. In terms of accuracy, the integrated models need to align with the capability of ChatGPT at a level of 85% or higher. We use higher standards to select models, hoping to save users the cumbersome testing and evaluation process in the process of use. \ No newline at end of file +In the underlying large model integration, we have designed an open interface that supports integration with various large models. At the same time, we have a very strict control and evaluation mechanism for the effectiveness of the integrated models. In terms of accuracy, the integrated models need to align with the capability of ChatGPT at a level of 85% or higher.
We hold models to this higher standard so that users are spared a cumbersome testing and evaluation process of their own. + +## Multi LLMs Usage +To use multiple models, modify the LLM_MODEL parameter in the .env configuration file to switch between the models. + +Note: you can create the .env file from .env.template with a command like this: +``` +cp .env.template .env +``` \ No newline at end of file From 79c5bda22d7c3b2d4ea330eeae6b479bc5c35f6e Mon Sep 17 00:00:00 2001 From: csunny Date: Thu, 25 May 2023 01:04:34 +0800 Subject: [PATCH 11/16] lint: format --- docs/conf.py | 15 +++++++-------- requirements.txt | 1 - 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/conf.py b/docs/conf.py index 1aef5c03e..ba156d994 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -8,9 +8,9 @@ import toml -project = 'DB-GPT' -copyright = '2023, csunny' -author = 'csunny' +project = "DB-GPT" +copyright = "2023, csunny" +author = "csunny" with open("../pyproject.toml") as f: data = toml.load(f) @@ -45,13 +45,12 @@ autodoc_pydantic_model_show_field_summary = False autodoc_pydantic_model_members = False autodoc_pydantic_model_undoc_members = False -templates_path = ['_templates'] -exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] - +templates_path = ["_templates"] +exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] # -- Options for HTML output ------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output -html_theme = 'sphinx_book_theme' -html_static_path = ['_static'] +html_theme = "sphinx_book_theme" +html_static_path = ["_static"] diff --git a/requirements.txt b/requirements.txt index 685661026..f41a187b4 100644 --- a/requirements.txt +++ b/requirements.txt @@ -56,7 +56,6 @@ pytesseract==0.3.10 auto-gpt-plugin-template pymdown-extensions mkdocs -requests gTTS==2.3.1 langchain nltk From 1733884cafd4379172516b9849afa934dbe8b75c Mon Sep 17 00:00:00 2001 From: csunny Date:
Thu, 25 May 2023 01:06:40 +0800 Subject: [PATCH 12/16] db: open dblist get --- pilot/server/webserver.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pilot/server/webserver.py b/pilot/server/webserver.py index 15d360ec7..04737f6f2 100644 --- a/pilot/server/webserver.py +++ b/pilot/server/webserver.py @@ -697,7 +697,7 @@ if __name__ == "__main__": # 配置初始化 cfg = Config() - # dbs = get_database_list() + dbs = get_database_list() cfg.set_plugins(scan_plugins(cfg, cfg.debug_mode)) # 加载插件可执行命令 From d0e3ae09f0274ac383a02237b579f236c749bee7 Mon Sep 17 00:00:00 2001 From: csunny Date: Thu, 25 May 2023 10:39:19 +0800 Subject: [PATCH 13/16] bug: fix package --- requirements.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/requirements.txt b/requirements.txt index 5ec153277..c6e641529 100644 --- a/requirements.txt +++ b/requirements.txt @@ -41,7 +41,6 @@ iopath==0.1.10 tenacity==8.2.2 peft pycocoevalcap -sentence-transformers cpm_kernels umap-learn notebook @@ -51,7 +50,7 @@ wandb llama-index==0.5.27 pymysql unstructured==0.6.3 -pytesseract==0.3.10 +grpcio==1.47.5 auto-gpt-plugin-template pymdown-extensions @@ -79,3 +78,4 @@ pytest-cov pytest-integration pytest-mock pytest-recording +pytesseract==0.3.10 From 6068dfd88b18a93c285de233db01436d48461745 Mon Sep 17 00:00:00 2001 From: csunny Date: Thu, 25 May 2023 10:50:40 +0800 Subject: [PATCH 14/16] fix: requirements --- requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index c6e641529..ed12c7f78 100644 --- a/requirements.txt +++ b/requirements.txt @@ -67,7 +67,7 @@ colorama playsound distro pypdf -milvus-cli==0.3.2 +# milvus-cli==0.3.2 # Testing dependencies pytest From 87eb830e87d70264f8e0e0d41d41e5c779cca6a4 Mon Sep 17 00:00:00 2001 From: csunny Date: Thu, 25 May 2023 11:02:04 +0800 Subject: [PATCH 15/16] rm extra package --- requirements.txt | 6 ------ 1 file changed, 6 deletions(-) diff --git a/requirements.txt 
b/requirements.txt index ed12c7f78..28da42929 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,4 +1,3 @@ -accelerate==0.16.0 torch==2.0.0 accelerate==0.16.0 aiohttp==3.8.4 @@ -24,15 +23,12 @@ pycocotools==2.0.6 pyparsing==3.0.9 python-dateutil==2.8.2 pyyaml==6.0 -regex==2022.10.31 tokenizers==0.13.2 tqdm==4.64.1 transformers==4.28.0 timm==0.6.13 spacy==3.5.1 webdataset==0.2.48 -scikit-learn==1.2.2 -scipy==1.10.1 yarl==1.8.2 zipp==3.14.0 omegaconf==2.3.0 @@ -54,7 +50,6 @@ grpcio==1.47.5 auto-gpt-plugin-template pymdown-extensions -mkdocs gTTS==2.3.1 langchain nltk @@ -67,7 +62,6 @@ colorama playsound distro pypdf -# milvus-cli==0.3.2 # Testing dependencies pytest From 5f0caa5319c7177397a6f7e85648d98aa5b23fa5 Mon Sep 17 00:00:00 2001 From: csunny Date: Thu, 25 May 2023 11:14:58 +0800 Subject: [PATCH 16/16] fix --- .readthedocs.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.readthedocs.yaml b/.readthedocs.yaml index 7f0ba5f6a..008aa7131 100644 --- a/.readthedocs.yaml +++ b/.readthedocs.yaml @@ -9,7 +9,7 @@ version: 2 build: os: ubuntu-22.04 tools: - python: "3.11" + python: "3.10" sphinx: configuration: docs/conf.py