mirror of
https://github.com/csunny/DB-GPT.git
synced 2025-08-09 12:18:12 +00:00
feat:dbgpt api 0.3.0 (#319)

1. EmbeddingEngine: provide knowledge_embedding() and similar_search()
2. Multi-source embedding
3. Docs for installation
4. Fix Chroma exit bug
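The commit message names two new calls, knowledge_embedding() and similar_search(), without showing their shape. The sketch below is an illustrative mock, not the real DB-GPT class: every name and parameter here is an assumption drawn only from the commit message, and the bag-of-words "embedding" is a stand-in so the control flow runs without any model downloads.

```python
# Mock of the EmbeddingEngine call shape named in this commit
# (knowledge_embedding / similar_search). NOT the real DB-GPT class:
# embeddings are faked with bag-of-words counts so this runs anywhere.
from collections import Counter
import math


class MockEmbeddingEngine:
    def __init__(self, knowledge_source: list[str]):
        self.knowledge_source = knowledge_source
        self.store: list[tuple[str, Counter]] = []

    def _embed(self, text: str) -> Counter:
        # stand-in for a real embedding model (e.g. text2vec, all-MiniLM)
        return Counter(text.lower().split())

    def knowledge_embedding(self) -> int:
        # embed every source document into the "vector store"
        self.store = [(doc, self._embed(doc)) for doc in self.knowledge_source]
        return len(self.store)

    def similar_search(self, query: str, topk: int = 1) -> list[str]:
        q = self._embed(query)

        def cosine(a: Counter, b: Counter) -> float:
            dot = sum(a[t] * b[t] for t in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.store, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:topk]]


engine = MockEmbeddingEngine([
    "DB-GPT supports chat with knowledge base",
    "Milvus is a vector database",
])
engine.knowledge_embedding()
print(engine.similar_search("vector database", topk=1))
# -> ['Milvus is a vector database']
```

With the real EmbeddingEngine, the embedding step would write into the configured vector store (Chroma by default per these docs) and similar_search would query it.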
This commit is contained in:
commit 75115f1175

README.md (10 lines changed)
@@ -62,16 +62,6 @@ https://github.com/csunny/DB-GPT/assets/13723926/55f31781-1d49-4757-b96e-7ef6d3d
 <img src="./assets/chat_knowledge.png" width="800px" />
 </p>
 
-## Releases
-- [2023/07/06]🔥🔥🔥Brand-new DB-GPT product with a brand-new web UI. [documents](https://db-gpt.readthedocs.io/en/latest/getting_started/getting_started.html)
-- [2023/06/25]🔥support chatglm2-6b model. [documents](https://db-gpt.readthedocs.io/en/latest/modules/llms.html)
-- [2023/06/14] support gpt4all model, which can run at M1/M2, or cpu machine. [documents](https://db-gpt.readthedocs.io/en/latest/modules/llms.html)
-- [2023/06/01]🔥 On the basis of the Vicuna-13B basic model, task chain calls are implemented through plugins. For example, the implementation of creating a database with a single sentence.[demo](./assets/auto_plugin.gif)
-- [2023/06/01]🔥 QLoRA guanaco(7b, 13b, 33b) support.
-- [2023/05/28] Learning from crawling data from the Internet [demo](./assets/dbgpt_demo.gif)
-- [2023/05/21] Generate SQL and execute it automatically. [demo](./assets/chat-data.gif)
-- [2023/05/15] Chat with documents. [demo](./assets/new_knownledge_en.gif)
-- [2023/05/06] SQL generation and diagnosis. [demo](./assets/demo_en.gif)
-
 ## Features
 
README.zh.md (11 lines changed)
@@ -65,17 +65,6 @@ https://github.com/csunny/DB-GPT/assets/13723926/55f31781-1d49-4757-b96e-7ef6d3d
 <img src="./assets/chat_knowledge.png" width="800px" />
 </p>
 
-## 最新发布
-- [2023/07/06]🔥🔥🔥 全新的DB-GPT产品。 [使用文档](https://db-gpt.readthedocs.io/projects/db-gpt-docs-zh-cn/zh_CN/latest/getting_started/getting_started.html)
-- [2023/06/25]🔥 支持ChatGLM2-6B模型。 [使用文档](https://db-gpt.readthedocs.io/projects/db-gpt-docs-zh-cn/zh_CN/latest/modules/llms.html)
-- [2023/06/14]🔥 支持gpt4all模型,可以在M1/M2 或者CPU机器上运行。 [使用文档](https://db-gpt.readthedocs.io/projects/db-gpt-docs-zh-cn/zh_CN/latest/modules/llms.html)
-- [2023/06/01]🔥 在Vicuna-13B基础模型的基础上,通过插件实现任务链调用。例如单句创建数据库的实现.
-- [2023/06/01]🔥 QLoRA guanaco(原驼)支持, 支持4090运行33B
-- [2023/05/28]🔥根据URL进行对话 [演示](./assets/chat_url_zh.gif)
-- [2023/05/21] SQL生成与自动执行. [演示](./assets/auto_sql.gif)
-- [2023/05/15] 知识库对话 [演示](./assets/new_knownledge.gif)
-- [2023/05/06] SQL生成与诊断 [演示](./assets/演示.gif)
-
 ## 特性一览
 
 目前我们已经发布了多种关键的特性,这里一一列举展示一下当前发布的能力。
@@ -25,22 +25,25 @@ $ docker run --name=mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=aa12345678 -dit my
 We use [Chroma embedding database](https://github.com/chroma-core/chroma) as the default for our vector database, so there is no need for special installation. If you choose to connect to other databases, you can follow our tutorial for installation and configuration.
 For the entire installation process of DB-GPT, we use the miniconda3 virtual environment. Create a virtual environment and install the Python dependencies.
 
-```
+```bash
 python>=3.10
 conda create -n dbgpt_env python=3.10
 conda activate dbgpt_env
 pip install -r requirements.txt
 ```
 Before use DB-GPT Knowledge Management
-```
+```bash
 python -m spacy download zh_core_web_sm
 
 ```
 
 Once the environment is installed, we have to create a new folder "models" in the DB-GPT project, and then we can put all the models downloaded from huggingface in this directory
 
+```{tip}
 Notice make sure you have install git-lfs
 ```
 
+```bash
 git clone https://huggingface.co/Tribbiani/vicuna-13b
 git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
 git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese
@@ -49,7 +52,7 @@ git clone https://huggingface.co/THUDM/chatglm2-6b
 
 The model files are large and will take a long time to download. During the download, let's configure the .env file, which needs to be copied and created from the .env.template
 
-```
+```{tip}
 cp .env.template .env
 ```
 
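The .env steps this hunk documents (copy the template, then set the model and vector store type) can be sketched as one shell snippet. The two keys shown are the ones named in these docs (LLM_MODEL and VECTOR_STORE_TYPE); the values are examples only, and the `touch` fallback exists purely so the snippet runs outside a DB-GPT checkout.

```shell
# Create .env from the template (fall back to an empty file outside a checkout)
cp .env.template .env 2>/dev/null || touch .env

# Keys named in these docs; values are examples only
cat >> .env <<'EOF'
LLM_MODEL=vicuna-13b
VECTOR_STORE_TYPE=Chroma
EOF
```

VECTOR_STORE_TYPE may be Chroma (the default) or Milvus (version > 2.1) per the strings in this commit.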
docs/getting_started/installation.md (new file, 35 lines)
@@ -0,0 +1,35 @@
+# Installation
+DB-GPT provides a third-party Python API package that you can integrate into your own code.
+
+### Installation from Pip
+
+You can simply pip install:
+```bash
+pip install -i https://pypi.org/simple/ db-gpt==0.3.0
+```
+
+```{tip}
+Notice:make sure python>=3.10
+```
+
+### Environment Setup
+
+By default, if you use the EmbeddingEngine api
+
+you will prepare embedding models from huggingface
+
+```{tip}
+Notice make sure you have install git-lfs
+```
+
+```bash
+git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
+
+git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese
+```
+version:
+- db-gpt0.3.0
+- [embedding_engine api](https://db-gpt.readthedocs.io/en/latest/modules/knowledge.html)
+- [multi source embedding](https://db-gpt.readthedocs.io/en/latest/modules/knowledge/pdf/pdf_embedding.html)
+- [vector connector](https://db-gpt.readthedocs.io/en/latest/modules/vector.html)
@@ -48,6 +48,7 @@ Getting Started
 :hidden:
 
 getting_started/getting_started.md
+getting_started/installation.md
 getting_started/concepts.md
 getting_started/tutorials.md
 
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: DB-GPT 0.3.0\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2023-07-05 17:51+0800\n"
+"POT-Creation-Date: 2023-07-13 15:39+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -19,29 +19,29 @@ msgstr ""
 "Content-Transfer-Encoding: 8bit\n"
 "Generated-By: Babel 2.12.1\n"
 
-#: ../../getting_started/getting_started.md:1 2e1519d628044c07b384e8bbe441863a
+#: ../../getting_started/getting_started.md:1 0b2e795438a3413c875fd80191e85bad
 msgid "Quickstart Guide"
 msgstr "使用指南"
 
-#: ../../getting_started/getting_started.md:3 00e8dc6e242d4f3b8b2fbc5e06f1f14e
+#: ../../getting_started/getting_started.md:3 7b84c9776f8a4f9fb55afc640f37f45c
 msgid ""
 "This tutorial gives you a quick walkthrough about use DB-GPT with you "
 "environment and data."
 msgstr "本教程为您提供了关于如何使用DB-GPT的使用指南。"
 
-#: ../../getting_started/getting_started.md:5 4b4473a5fbd64cef996d82fa36abe136
+#: ../../getting_started/getting_started.md:5 1b2880e1ef674bfdbf39ac9f330aeec9
 msgid "Installation"
 msgstr "安装"
 
-#: ../../getting_started/getting_started.md:7 5ab3187dd2134afe958d83a431c98f43
+#: ../../getting_started/getting_started.md:7 d0a8c6654bfe4bbdb0eb40ceb2ea3388
 msgid "To get started, install DB-GPT with the following steps."
 msgstr "请按照以下步骤安装DB-GPT"
 
-#: ../../getting_started/getting_started.md:9 7286e3a0da00450c9a6e9f29dbd27130
+#: ../../getting_started/getting_started.md:9 0a4e0b06c7fe49a9b2ca56ba2eb7b8ba
 msgid "1. Hardware Requirements"
 msgstr "1. 硬件要求"
 
-#: ../../getting_started/getting_started.md:10 3f3d279ca8a54c8c8ed16af3e0ffb281
+#: ../../getting_started/getting_started.md:10 2b42f6546ef141f696943ba2120584e5
 msgid ""
 "As our project has the ability to achieve ChatGPT performance of over "
 "85%, there are certain hardware requirements. However, overall, the "
@@ -49,62 +49,62 @@ msgid ""
 "specific hardware requirements for deployment are as follows:"
 msgstr "由于我们的项目有能力达到85%以上的ChatGPT性能,所以对硬件有一定的要求。但总体来说,我们在消费级的显卡上即可完成项目的部署使用,具体部署的硬件说明如下:"
 
-#: ../../getting_started/getting_started.md 6e1e882511254687bd46fe45447794d1
+#: ../../getting_started/getting_started.md 4df0c44eff8741f39ca0fdeff222f90c
 msgid "GPU"
 msgstr "GPU"
 
-#: ../../getting_started/getting_started.md f0ee9919e1254bcdbe6e489a5fbf450f
+#: ../../getting_started/getting_started.md b740a2991ce546cca43a426b760e9901
 msgid "VRAM Size"
 msgstr "显存大小"
 
-#: ../../getting_started/getting_started.md eed88601ef0b49b58d95b89928a3810e
+#: ../../getting_started/getting_started.md 222b91ff82f14d12acaac5aa238758c8
 msgid "Performance"
 msgstr "显存大小"
 
-#: ../../getting_started/getting_started.md 4f717383ef2d4e2da9ee2d1c148aa6c5
+#: ../../getting_started/getting_started.md c2d2ae6a4c964c4f90a9009160754782
 msgid "RTX 4090"
 msgstr "RTX 4090"
 
-#: ../../getting_started/getting_started.md d2d9bd1b57694404b39cdef49fd5b570
-#: d7d914b8d5e34ac192b94d48f0ee1781
+#: ../../getting_started/getting_started.md 529220ec6a294e449dc460ba2e8829a1
+#: 5e0c5900842e4d66b2064b13cc31a3ad
 msgid "24 GB"
 msgstr "24 GB"
 
-#: ../../getting_started/getting_started.md cb86730ab05e4172941c3e771384c4ba
+#: ../../getting_started/getting_started.md 84d29eef342f4d6282295c0e32487548
 msgid "Smooth conversation inference"
 msgstr "可以流畅的进行对话推理,无卡顿"
 
-#: ../../getting_started/getting_started.md 3e32d5c38bf6499cbfedb80944549114
+#: ../../getting_started/getting_started.md 5a10effe322e4afb8315415c04dc05a4
 msgid "RTX 3090"
 msgstr "RTX 3090"
 
-#: ../../getting_started/getting_started.md 1d3caa2a06844997ad55d20863559e9f
+#: ../../getting_started/getting_started.md 8924059525ab43329a8bb6659e034d5e
 msgid "Smooth conversation inference, better than V100"
 msgstr "可以流畅进行对话推理,有卡顿感,但好于V100"
 
-#: ../../getting_started/getting_started.md b80ec359bd004d5f801ec09ca3b2d0ff
+#: ../../getting_started/getting_started.md 10f5bc076f524127a956d7a23f3666ba
 msgid "V100"
 msgstr "V100"
 
-#: ../../getting_started/getting_started.md aed55a6b8c8d49d9b9c02bfd5c10b062
+#: ../../getting_started/getting_started.md 7d664e81984847c7accd08db93fad404
 msgid "16 GB"
 msgstr "16 GB"
 
-#: ../../getting_started/getting_started.md dcd6daab75fe4bf8b8dd19ea785f0bd6
+#: ../../getting_started/getting_started.md 86765bc9ab01409fb7f5edf04f9b32a5
 msgid "Conversation inference possible, noticeable stutter"
 msgstr "可以进行对话推理,有明显卡顿"
 
-#: ../../getting_started/getting_started.md:18 e39a4b763ed74cea88d54d163ea72ce0
+#: ../../getting_started/getting_started.md:18 a0ac5591c0ac4ac6a385e562353daf22
 msgid "2. Install"
 msgstr "2. 安装"
 
-#: ../../getting_started/getting_started.md:20 9beba274b78a46c6aafb30173372b334
+#: ../../getting_started/getting_started.md:20 a64a9a5945074ece872509f8cb425da9
 msgid ""
 "This project relies on a local MySQL database service, which you need to "
 "install locally. We recommend using Docker for installation."
 msgstr "本项目依赖一个本地的 MySQL 数据库服务,你需要本地安装,推荐直接使用 Docker 安装。"
 
-#: ../../getting_started/getting_started.md:25 3bce689bb49043eca5b9aa3c5525eaac
+#: ../../getting_started/getting_started.md:25 11e799a372ab4d0f8269cd7be98bebc6
 msgid ""
 "We use [Chroma embedding database](https://github.com/chroma-core/chroma)"
 " as the default for our vector database, so there is no need for special "
@@ -117,11 +117,11 @@ msgstr ""
 "向量数据库我们默认使用的是Chroma内存数据库,所以无需特殊安装,如果有需要连接其他的同学,可以按照我们的教程进行安装配置。整个DB-"
 "GPT的安装过程,我们使用的是miniconda3的虚拟环境。创建虚拟环境,并安装python依赖包"
 
-#: ../../getting_started/getting_started.md:34 61ad49740d0b49afa254cb2d10a0d2ae
+#: ../../getting_started/getting_started.md:34 dcab69c83d4c48b9bb19c4336ee74a66
 msgid "Before use DB-GPT Knowledge Management"
 msgstr "使用知识库管理功能之前"
 
-#: ../../getting_started/getting_started.md:40 656041e456f248a0a472be06357d7f89
+#: ../../getting_started/getting_started.md:40 735aeb6ae8aa4344b7ff679548279acc
 msgid ""
 "Once the environment is installed, we have to create a new folder "
 "\"models\" in the DB-GPT project, and then we can put all the models "
@@ -130,29 +130,33 @@ msgstr ""
 "环境安装完成后,我们必须在DB-"
 "GPT项目中创建一个新文件夹\"models\",然后我们可以把从huggingface下载的所有模型放到这个目录下。"
 
-#: ../../getting_started/getting_started.md:42 4dfb7d63fdf544f2bf9dd8663efa8d31
+#: ../../getting_started/getting_started.md:43 7cbefe131b24488b9be39b3e8ed4f563
 #, fuzzy
 msgid "Notice make sure you have install git-lfs"
 msgstr "确保你已经安装了git-lfs"
 
-#: ../../getting_started/getting_started.md:50 a52c137b8ef54b7ead41a2d8ff81d457
+#: ../../getting_started/getting_started.md:53 54ec90ebb969475988451cd66e6ff412
 msgid ""
 "The model files are large and will take a long time to download. During "
 "the download, let's configure the .env file, which needs to be copied and"
 " created from the .env.template"
 msgstr "模型文件很大,需要很长时间才能下载。在下载过程中,让我们配置.env文件,它需要从。env.template中复制和创建。"
 
-#: ../../getting_started/getting_started.md:56 db87d872a47047dc8cd1de390d068ed4
+#: ../../getting_started/getting_started.md:56 9bdadbee88af4683a4eb7b4f221fb4b8
+msgid "cp .env.template .env"
+msgstr "cp .env.template .env"
+
+#: ../../getting_started/getting_started.md:59 6357c4a0154b4f08a079419ac408442d
 msgid ""
 "You can configure basic parameters in the .env file, for example setting "
 "LLM_MODEL to the model to be used"
 msgstr "您可以在.env文件中配置基本参数,例如将LLM_MODEL设置为要使用的模型。"
 
-#: ../../getting_started/getting_started.md:58 c8865a327b4b44daa55813479c743e3c
+#: ../../getting_started/getting_started.md:61 2f349f3ed3184b849ade2a15d5bf0c6c
 msgid "3. Run"
 msgstr "3. 运行"
 
-#: ../../getting_started/getting_started.md:59 e81dabe730134753a4daa05a7bdd44af
+#: ../../getting_started/getting_started.md:62 fe408e4405bd48288e2e746386615925
 msgid ""
 "You can refer to this document to obtain the Vicuna weights: "
 "[Vicuna](https://github.com/lm-sys/FastChat/blob/main/README.md#model-"
@@ -161,7 +165,7 @@ msgstr ""
 "关于基础模型, 可以根据[Vicuna](https://github.com/lm-"
 "sys/FastChat/blob/main/README.md#model-weights) 合成教程进行合成。"
 
-#: ../../getting_started/getting_started.md:61 714cbc9485ea47d0a06aa1a31b9af3e3
+#: ../../getting_started/getting_started.md:64 c0acfe28007f459ca21174f968763fa3
 msgid ""
 "If you have difficulty with this step, you can also directly use the "
 "model from [this link](https://huggingface.co/Tribbiani/vicuna-7b) as a "
@@ -170,11 +174,11 @@ msgstr ""
 "如果此步有困难的同学,也可以直接使用[此链接](https://huggingface.co/Tribbiani/vicuna-"
 "7b)上的模型进行替代。"
 
-#: ../../getting_started/getting_started.md:63 2b8f6985fe1a414e95d334d3ee9d0878
+#: ../../getting_started/getting_started.md:66 cc0f4c4e43f24b679f857a8d937528ee
 msgid "prepare server sql script"
 msgstr "准备db-gpt server sql脚本"
 
-#: ../../getting_started/getting_started.md:69 7cb9beb0e15a46759dbcb4606dcb6867
+#: ../../getting_started/getting_started.md:72 386948064fe646f2b9f51a262dd64bf2
 msgid ""
 "set .env configuration set your vector store type, "
 "eg:VECTOR_STORE_TYPE=Chroma, now we support Chroma and Milvus(version > "
@@ -183,17 +187,17 @@ msgstr ""
 "在.env文件设置向量数据库环境变量,eg:VECTOR_STORE_TYPE=Chroma, 目前我们支持了 Chroma and "
 "Milvus(version >2.1) "
 
-#: ../../getting_started/getting_started.md:72 cdb7ef30e8c9441293e8b3fd95d621ed
+#: ../../getting_started/getting_started.md:75 e6f6b06459944f2d8509703af365c664
 #, fuzzy
 msgid "Run db-gpt server"
 msgstr "运行模型服务"
 
-#: ../../getting_started/getting_started.md:77 e7bb3001d46b458aa0c522c4a7a8d45b
+#: ../../getting_started/getting_started.md:80 489b595dc08a459ca2fd83b1389d3bbd
 #, fuzzy
 msgid "Open http://localhost:5000 with your browser to see the product."
 msgstr "打开浏览器访问http://localhost:5000"
 
-#: ../../getting_started/getting_started.md:79 68c55e3ecfc642f2869a9917ec65904c
+#: ../../getting_started/getting_started.md:82 699afb01c9f243ab837cdc73252f624c
 msgid ""
 "If you want to access an external LLM service, you need to 1.set the "
 "variables LLM_MODEL=YOUR_MODEL_NAME "
@@ -201,11 +205,11 @@ msgid ""
 "file. 2.execute dbgpt_server.py in light mode"
 msgstr "如果你想访问外部的大模型服务,1.需要在.env文件设置模型名和外部模型服务地址。2.使用light模式启动服务"
 
-#: ../../getting_started/getting_started.md:86 474aea4023bb44dd970773b110bbf0ee
+#: ../../getting_started/getting_started.md:89 7df7f3870e1140d3a17dc322a46d6476
 msgid ""
 "If you want to learn about dbgpt-webui, read https://github.com/csunny"
 "/DB-GPT/tree/new-page-framework/datacenter"
-msgstr "如果你想了解DB-GPT前端服务,访问https://github.com/csunny"
-"/DB-GPT/tree/new-page-framework/datacenter"
+msgstr ""
+"如果你想了解DB-GPT前端服务,访问https://github.com/csunny/DB-GPT/tree/new-page-"
+"framework/datacenter"
@@ -0,0 +1,85 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2023, csunny
+# This file is distributed under the same license as the DB-GPT package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2023.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: DB-GPT 👏👏 0.3.0\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2023-07-13 15:39+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.12.1\n"
+
+#: ../../getting_started/installation.md:1 bc5bfc8ebfc847c5a22f2346357cf747
+msgid "Installation"
+msgstr "安装dbgpt包指南"
+
+#: ../../getting_started/installation.md:2 1aaef0db5ee9426aa337021d782666af
+msgid ""
+"DB-GPT provides a third-party Python API package that you can integrate "
+"into your own code."
+msgstr "DB-GPT提供了python第三方包,你可以在你的代码中引入"
+
+#: ../../getting_started/installation.md:4 de542f259e20441991a0e5a7d52769b8
+msgid "Installation from Pip"
+msgstr "使用pip安装"
+
+#: ../../getting_started/installation.md:6 3357f019aa8249b292162de92757eec4
+msgid "You can simply pip install:"
+msgstr "你可以使用pip install"
+
+#: ../../getting_started/installation.md:12 9c610d593608452f9d7d8d7e462251e3
+msgid "Notice:make sure python>=3.10"
+msgstr "注意:确保你的python版本>=3.10"
+
+#: ../../getting_started/installation.md:15 b2ed238c29bb40cba990068e8d7ceae7
+msgid "Environment Setup"
+msgstr "环境设置"
+
+#: ../../getting_started/installation.md:17 4804ad4d8edf44f49b1d35b271635fad
+msgid "By default, if you use the EmbeddingEngine api"
+msgstr "如果你想使用EmbeddingEngine api"
+
+#: ../../getting_started/installation.md:19 2205f69ec60d4f73bb3a93a583928455
+msgid "you will prepare embedding models from huggingface"
+msgstr "你需要从huggingface下载embedding models"
+
+#: ../../getting_started/installation.md:22 693c18a83f034dcc8c263674418bcde2
+msgid "Notice make sure you have install git-lfs"
+msgstr "确保你已经安装了git-lfs"
+
+#: ../../getting_started/installation.md:30 dd8d0880b55e4c48bfc414f8cbdda268
+msgid "version:"
+msgstr "版本:"
+
+#: ../../getting_started/installation.md:31 731e634b96164efbbc1ce9fa88361b12
+msgid "db-gpt0.3.0"
+msgstr "db-gpt0.3.0"
+
+#: ../../getting_started/installation.md:32 38fb635be4554d94b527c6762253d46d
+msgid ""
+"[embedding_engine api](https://db-"
+"gpt.readthedocs.io/en/latest/modules/knowledge.html)"
+msgstr "[embedding_engine api](https://db-gpt.readthedocs.io/en/latest/modules/knowledge.html)"
+
+#: ../../getting_started/installation.md:33 a60b0ffe21a74ebca05529dc1dd1ba99
+msgid ""
+"[multi source embedding](https://db-"
+"gpt.readthedocs.io/en/latest/modules/knowledge/pdf/pdf_embedding.html)"
+msgstr "[multi source embedding](https://db-gpt.readthedocs.io/en/latest/modules/knowledge/pdf/pdf_embedding.html)"
+
+#: ../../getting_started/installation.md:34 3c752c9305414719bc3f561cf18a75af
+msgid ""
+"[vector connector](https://db-"
+"gpt.readthedocs.io/en/latest/modules/vector.html)"
+msgstr "[vector connector](https://db-gpt.readthedocs.io/en/latest/modules/vector.html)"
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: DB-GPT 0.3.0\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2023-06-30 17:16+0800\n"
+"POT-Creation-Date: 2023-07-12 16:23+0800\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -19,25 +19,25 @@ msgstr ""
 "Content-Transfer-Encoding: 8bit\n"
 "Generated-By: Babel 2.12.1\n"
 
-#: ../../getting_started/tutorials.md:1 e494f27e68fd40efa2864a532087cfef
+#: ../../getting_started/tutorials.md:1 cb100b89a2a747cd90759e415c737070
 msgid "Tutorials"
 msgstr "教程"
 
-#: ../../getting_started/tutorials.md:4 8eecfbf3240b44fcb425034600316cea
+#: ../../getting_started/tutorials.md:4 dbc2a2346b384cc3930086f97181b14b
 msgid "This is a collection of DB-GPT tutorials on Medium."
 msgstr "这是知乎上DB-GPT教程的集合。"
 
-#: ../../getting_started/tutorials.md:6 a40601867a3d4ce886a197f2f337ec0f
+#: ../../getting_started/tutorials.md:6 67e5b6dbac654d428e6a8be9d1ec6473
 msgid ""
 "DB-GPT is divided into several functions, including chat with knowledge "
 "base, execute SQL, chat with database, and execute plugins."
 msgstr "DB-GPT包含以下功能,和知识库聊天,执行SQL,和数据库聊天以及执行插件。"
 
-#: ../../getting_started/tutorials.md:8 493e6f56a75d45ef8bb15d3049a24994
+#: ../../getting_started/tutorials.md:8 744aaec68aa3413c9b17b09714476d32
 msgid "Introduction"
 msgstr "介绍"
 
-#: ../../getting_started/tutorials.md:9 4526a793cdb94b8f99f41c48cd5ee453
+#: ../../getting_started/tutorials.md:9 305bcf5e847a4322a2834b84fa3c694a
 #, fuzzy
 msgid "[What is DB-GPT](https://www.youtube.com/watch?v=QszhVJerc0I)"
 msgstr ""
@@ -45,12 +45,12 @@ msgstr ""
 "GPT](https://www.bilibili.com/video/BV1SM4y1a7Nj/?buvid=551b023900b290f9497610b2155a2668&is_story_h5=false&mid=%2BVyE%2Fwau5woPcUKieCWS0A%3D%3D&p=1&plat_id=116&share_from=ugc&share_medium=iphone&share_plat=ios&share_session_id=5D08B533-82A4-4D40-9615-7826065B4574&share_source=GENERIC&share_tag=s_i&timestamp=1686307943&unique_k=bhO3lgQ&up_id=31375446)"
 " by csunny (https://github.com/csunny/DB-GPT)"
 
-#: ../../getting_started/tutorials.md:11 95313384e5da4f5db96ac990596b2e73
+#: ../../getting_started/tutorials.md:11 22fdc6937b2248ae8f5a7ef385aa55d9
 #, fuzzy
 msgid "Knowledge"
 msgstr "知识库"
 
-#: ../../getting_started/tutorials.md:13 e7a141f4df8d4974b0797dd7723c4658
+#: ../../getting_started/tutorials.md:13 9bbf0f5aece64389b93b16235abda58e
 #, fuzzy
 msgid ""
 "[How to Create your own knowledge repository](https://db-"
@@ -59,55 +59,55 @@ msgstr ""
 "[怎么创建自己的知识库](https://db-"
 "gpt.readthedocs.io/en/latest/modules/knowledge.html)"
 
-#: ../../getting_started/tutorials.md:15 f7db5b05a2db44e6a98b7d0df0a6f4ee
+#: ../../getting_started/tutorials.md:15 ae201d75a3aa485e99b258103245db1c
 #, fuzzy
 msgid ""
 msgstr "[新增知识库演示](../../assets/new_knownledge_en.gif)"
 
-#: ../../getting_started/tutorials.md:15 1a1647a7ca23423294823529301dd75f
+#: ../../getting_started/tutorials.md:15 e7bfb3396f7b42f1a1be9f29df1773a2
 #, fuzzy
 msgid "Add new Knowledge demonstration"
 msgstr "[新增知识库演示](../../assets/new_knownledge_en.gif)"
 
-#: ../../getting_started/tutorials.md:17 de26224a814e4c6798d3a342b0f0fe3a
+#: ../../getting_started/tutorials.md:17 d37acc0486ec40309e7e944bb0458b0a
 msgid "SQL Generation"
 msgstr "SQL生成"
 
-#: ../../getting_started/tutorials.md:18 f8fe82c554424239beb522f94d285c52
+#: ../../getting_started/tutorials.md:18 86a328c9e15f46679a2611f7162f9fbe
 #, fuzzy
 msgid ""
 msgstr "[sql生成演示](../../assets/demo_en.gif)"
|
msgstr "[sql生成演示](../../assets/demo_en.gif)"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:18 41e932b692074fccb8059cadb0ed320e
|
#: ../../getting_started/tutorials.md:18 03bc8d7320be44f0879a553a324ec26f
|
||||||
#, fuzzy
|
#, fuzzy
|
||||||
msgid "sql generation demonstration"
|
msgid "sql generation demonstration"
|
||||||
msgstr "[sql生成演示](../../assets/demo_en.gif)"
|
msgstr "[sql生成演示](../../assets/demo_en.gif)"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:20 78bda916272f4cf99e9b26b4d9ba09ab
|
#: ../../getting_started/tutorials.md:20 5f3b241f24634c09880d5de014f64f1b
|
||||||
msgid "SQL Execute"
|
msgid "SQL Execute"
|
||||||
msgstr "SQL执行"
|
msgstr "SQL执行"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:21 53cc83de34784c3c8d4d8204eacccbe9
|
#: ../../getting_started/tutorials.md:21 13a16debf2624f44bfb2e0453c11572d
|
||||||
#, fuzzy
|
#, fuzzy
|
||||||
msgid ""
|
msgid ""
|
||||||
msgstr "[sql execute 演示](../../assets/auto_sql_en.gif)"
|
msgstr "[sql execute 演示](../../assets/auto_sql_en.gif)"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:21 535c06f487ed4d15a6cdd17a0154d798
|
#: ../../getting_started/tutorials.md:21 2d9673cfd48b49a5b1942fdc9de292bf
|
||||||
#, fuzzy
|
#, fuzzy
|
||||||
msgid "sql execute demonstration"
|
msgid "sql execute demonstration"
|
||||||
msgstr "SQL执行演示"
|
msgstr "SQL执行演示"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:23 0482e6155dc44843adc3a3aa77528f03
|
#: ../../getting_started/tutorials.md:23 8cc0c647ad804969b470b133708de37f
|
||||||
#, fuzzy
|
#, fuzzy
|
||||||
msgid "Plugins"
|
msgid "Plugins"
|
||||||
msgstr "DB插件"
|
msgstr "DB插件"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:24 632617dd88fe4688b789fbb941686c0f
|
#: ../../getting_started/tutorials.md:24 cad5cc0cb94b42a1a6619bbd2a8b9f4c
|
||||||
#, fuzzy
|
#, fuzzy
|
||||||
msgid ""
|
msgid ""
|
||||||
msgstr "[db plugins 演示](../../assets/dbgpt_bytebase_plugin.gif)"
|
msgstr "[db plugins 演示](../../assets/dbgpt_bytebase_plugin.gif)"
|
||||||
|
|
||||||
#: ../../getting_started/tutorials.md:24 020ff499469145f0a34ac468fff91948
|
#: ../../getting_started/tutorials.md:24 adeee7ea37b743c9b251976124520725
|
||||||
msgid "db plugins demonstration"
|
msgid "db plugins demonstration"
|
||||||
msgstr "DB插件演示"
|
msgstr "DB插件演示"
|
||||||
|
|
||||||
|
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: DB-GPT 0.3.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -19,12 +19,12 @@ msgstr ""
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.12.1\n"

#: ../../modules/knowledge.rst:2 ../../modules/knowledge.rst:136
#: 3cc8fa6e9fbd4d889603d99424e9529a
msgid "Knowledge"
msgstr "知识"

#: ../../modules/knowledge.rst:4 0465a393d9d541958c39c1d07c885d1f
#, fuzzy
msgid ""
"As the knowledge base is currently the most significant user demand "
@@ -32,55 +32,90 @@ msgid ""
"knowledge bases. At the same time, we also provide multiple knowledge "
"base management strategies in this project, such as pdf knowledge,md "
"knowledge, txt knowledge, word knowledge, ppt knowledge:"
msgstr ""
"由于知识库是当前用户需求最显著的场景,我们原生支持知识库的构建和处理。同时,我们还在本项目中提供了多种知识库管理策略,如:pdf,md , "
"txt, word, ppt"

#: ../../modules/knowledge.rst:6 e670cbe14d8e4da88ba935e4120c31e0
msgid ""
"We currently support many document formats: raw text, txt, pdf, md, html,"
" doc, ppt, and url. In the future, we will continue to support more types"
" of knowledge, including audio, video, various databases, and big data "
"sources. Of course, we look forward to your active participation in "
"contributing code."
msgstr ""

#: ../../modules/knowledge.rst:9 e0bf601a1a0c458297306db6ff79f931
msgid "**Create your own knowledge repository**"
msgstr "创建你自己的知识库"

#: ../../modules/knowledge.rst:11 bb26708135d44615be3c1824668010f6
msgid "1.prepare"
msgstr "准备"

#: ../../modules/knowledge.rst:13 c150a0378f3e4625908fa0d8a25860e9
#, fuzzy
msgid ""
"We currently support many document formats: TEXT(raw text), "
"DOCUMENT(.txt, .pdf, .md, .doc, .ppt, .html), and URL."
msgstr "当前支持txt, pdf, md, html, doc, ppt, url文档格式"

#: ../../modules/knowledge.rst:15 7f9f02a93d5d4325b3d2d976f4bb28a0
msgid "before execution:"
msgstr "开始前"

#: ../../modules/knowledge.rst:24 59699a8385e04982a992cf0d71f6dcd5
#, fuzzy
msgid ""
"2.prepare embedding model, you can download from https://huggingface.co/."
" Notice you have installed git-lfs."
msgstr ""
"提前准备Embedding Model, 你可以在https://huggingface.co/进行下载,注意:你需要先安装git-lfs.eg:"
" git clone https://huggingface.co/THUDM/chatglm2-6b"

#: ../../modules/knowledge.rst:27 2be1a17d0b54476b9dea080d244fd747
msgid ""
"eg: git clone https://huggingface.co/sentence-transformers/all-"
"MiniLM-L6-v2"
msgstr "eg: git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2"

#: ../../modules/knowledge.rst:33 d328f6e243624c9488ebd27c9324621b
msgid ""
"3.prepare vector_store instance and vector store config, now we support "
"Chroma, Milvus and Weaviate."
msgstr "提前准备向量数据库环境,目前支持Chroma, Milvus and Weaviate向量数据库"

#: ../../modules/knowledge.rst:63 44f97154eff647d399fd30b6f9e3b867
msgid ""
"3.init Url Type EmbeddingEngine api and embedding your document into "
"vector store in your code."
msgstr "初始化 Url类型 EmbeddingEngine api, 将url文档embedding向量化到向量数据库"

#: ../../modules/knowledge.rst:75 e2581b414f0148bca88253c7af9cd591
msgid "If you want to add your source_reader or text_splitter, do this:"
msgstr "如果你想手动添加你自定义的source_reader和text_splitter, 请参考:"

#: ../../modules/knowledge.rst:95 74c110414f924bbfa3d512e45ba2f30f
#, fuzzy
msgid ""
"4.init Document Type EmbeddingEngine api and embedding your document into"
" vector store in your code. Document type can be .txt, .pdf, .md, .doc, "
".ppt."
msgstr ""
"初始化 文档型类型 EmbeddingEngine api, 将文档embedding向量化到向量数据库(文档可以是.txt, .pdf, "
".md, .html, .doc, .ppt)"

#: ../../modules/knowledge.rst:108 0afd40098d5f4dfd9e44fe1d8004da25
msgid ""
"5.init TEXT Type EmbeddingEngine api and embedding your document into "
"vector store in your code."
msgstr "初始化TEXT类型 EmbeddingEngine api, 将文档embedding向量化到向量数据库"

#: ../../modules/knowledge.rst:120 a66961bf3efd41fa8ea938129446f5a5
msgid "4.similar search based on your knowledge base. ::"
msgstr "在知识库进行相似性搜索"

#: ../../modules/knowledge.rst:126 b7066f408378450db26770f83fbd2716
msgid ""
"Note that the default vector model used is text2vec-large-chinese (which "
"is a large model, so if your personal computer configuration is not "
@@ -90,9 +125,79 @@ msgstr ""
"注意,这里默认向量模型是text2vec-large-chinese(模型比较大,如果个人电脑配置不够建议采用text2vec-base-"
"chinese),因此确保需要将模型download下来放到models目录中。"

#: ../../modules/knowledge.rst:128 58481d55cab74936b6e84b24c39b1674
#, fuzzy
msgid ""
"`pdf_embedding <./knowledge/pdf/pdf_embedding.html>`_: supported pdf "
"embedding."
msgstr "pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding."

#: ../../modules/knowledge.rst:129 fbb013c4f1bc46af910c91292f6690cf
#, fuzzy
msgid ""
"`markdown_embedding <./knowledge/markdown/markdown_embedding.html>`_: "
"supported markdown embedding."
msgstr "pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding."

#: ../../modules/knowledge.rst:130 59d45732f4914d16b4e01aee0992edf7
#, fuzzy
msgid ""
"`word_embedding <./knowledge/word/word_embedding.html>`_: supported word "
"embedding."
msgstr "pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding."

#: ../../modules/knowledge.rst:131 df0e6f311861423e885b38e020a7c0f0
#, fuzzy
msgid ""
"`url_embedding <./knowledge/url/url_embedding.html>`_: supported url "
"embedding."
msgstr "pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding."

#: ../../modules/knowledge.rst:132 7c550c1f5bc34fe9986731fb465e12cd
#, fuzzy
msgid ""
"`ppt_embedding <./knowledge/ppt/ppt_embedding.html>`_: supported ppt "
"embedding."
msgstr "pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding."

#: ../../modules/knowledge.rst:133 8648684cb191476faeeb548389f79050
#, fuzzy
msgid ""
"`string_embedding <./knowledge/string/string_embedding.html>`_: supported"
" raw text embedding."
msgstr "pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding."

#~ msgid "before execution: python -m spacy download zh_core_web_sm"
#~ msgstr "在执行之前请先执行python -m spacy download zh_core_web_sm"

#~ msgid "2.Run the knowledge repository script in the tools directory."
#~ msgstr "3.在tools目录执行知识入库脚本"

#~ msgid ""
#~ "python tools/knowledge_init.py note : "
#~ "--vector_name : your vector store name"
#~ " default_value:default"
#~ msgstr ""

#~ msgid ""
#~ "3.Add the knowledge repository in the"
#~ " interface by entering the name of"
#~ " your knowledge repository (if not "
#~ "specified, enter \"default\") so you can"
#~ " use it for Q&A based on your"
#~ " knowledge base."
#~ msgstr "如果选择新增知识库,在界面上新增知识库输入你的知识库名"

#~ msgid ""
#~ "1.Place personal knowledge files or "
#~ "folders in the pilot/datasets directory."
#~ msgstr "1.将个人知识文件或文件夹放在pilot/datasets目录中。"

#~ msgid ""
#~ "2.Update your .env, set your vector "
#~ "store type, VECTOR_STORE_TYPE=Chroma (now only"
#~ " support Chroma and Milvus, if you"
#~ " set Milvus, please set MILVUS_URL "
#~ "and MILVUS_PORT)"
#~ msgstr "2.更新你的.env,设置你的向量存储类型,VECTOR_STORE_TYPE=Chroma(现在只支持Chroma和Milvus,如果你设置了Milvus,请设置MILVUS_URL和MILVUS_PORT)"
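The knowledge.po entries above walk through the EmbeddingEngine workflow: embed documents into a vector store, then run a similarity search over them. As a rough, self-contained illustration of that flow only — this is not DB-GPT's actual API; `ToyVectorStore`, `knowledge_embedding`, and `similar_search` are hypothetical stand-ins, and the bag-of-words "embedding" replaces a real model such as text2vec:

```python
# Toy illustration of the embed-then-similar-search flow described above.
# NOT DB-GPT's EmbeddingEngine: all names here are hypothetical stand-ins,
# and the "embedding" is a bag-of-words vector instead of a real model.
import math
from collections import Counter


class ToyVectorStore:
    """In-memory store: keeps each text chunk with its word-count vector."""

    def __init__(self):
        self.chunks = []  # list of (text, Counter) pairs

    def embed(self, text):
        # Stand-in embedding: word-frequency vector of the lowercased text.
        return Counter(text.lower().split())

    def knowledge_embedding(self, documents):
        # Mirrors "embedding your document into vector store".
        for doc in documents:
            self.chunks.append((doc, self.embed(doc)))

    def similar_search(self, query, top_k=1):
        # Rank stored chunks by cosine similarity against the query vector.
        q = self.embed(query)

        def cos(a, b):
            dot = sum(a[w] * b[w] for w in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.chunks, key=lambda c: cos(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


store = ToyVectorStore()
store.knowledge_embedding([
    "Chroma is a vector store",
    "Milvus is a vector database",
    "DB-GPT supports plugins",
])
print(store.similar_search("which vector store", top_k=1))
# → ['Chroma is a vector store']
```

The same two-call shape (embed once, query many times) is what the catalog's steps 3–5 and "4.similar search based on your knowledge base" describe for the real engine.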
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: DB-GPT 0.3.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -20,12 +20,13 @@ msgstr ""
"Generated-By: Babel 2.12.1\n"

#: ../../modules/knowledge/markdown/markdown_embedding.md:1
#: 6d4eb4d8566b4dbaa301715148342aca
#, fuzzy
msgid "Markdown"
msgstr "MarkdownEmbedding"

#: ../../modules/knowledge/markdown/markdown_embedding.md:3
#: 050625646fa14cb1822d0d430fdf06ec
msgid ""
"markdown embedding can import md text into a vector knowledge base. The "
"entire embedding process includes the read (loading data), data_process "
@@ -36,20 +37,20 @@ msgstr ""
"数据预处理data_process()和数据进向量数据库index_to_store()"

#: ../../modules/knowledge/markdown/markdown_embedding.md:5
#: af1313489c164e968def2f5f1716a522
msgid "inheriting the SourceEmbedding"
msgstr "继承SourceEmbedding"

#: ../../modules/knowledge/markdown/markdown_embedding.md:18
#: aebe894f955b44f3ac677ce50d47c846
#, fuzzy
msgid ""
"implement read() and data_process() read() method allows you to read data"
" and split data into chunk"
msgstr "实现read方法可以加载数据"

#: ../../modules/knowledge/markdown/markdown_embedding.md:41
#: d53a087726be4a0dbb8dadbeb772442b
msgid "data_process() method allows you to pre processing your ways"
msgstr "实现data_process方法可以进行数据预处理"
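The markdown, pdf, ppt, url, and word catalogs all repeat the same contract: inherit SourceEmbedding, implement read() (load the source and split it into chunks) and optionally data_process() (pre-process chunks), then index the result into the vector store. A minimal sketch of that template-method shape — the class names and the list-backed "store" are hypothetical illustrations, not DB-GPT's real classes:

```python
# Sketch of the "inherit SourceEmbedding, implement read()/data_process()"
# pattern the catalogs describe. Names are hypothetical, not DB-GPT's API.
from abc import ABC, abstractmethod


class SourceEmbeddingSketch(ABC):
    """Template pipeline: read -> data_process -> index_to_store."""

    @abstractmethod
    def read(self):
        """Load the source and split it into chunks."""

    def data_process(self, chunks):
        # Optional hook: pre-process chunks (default: strip whitespace).
        return [c.strip() for c in chunks]

    def index_to_store(self, store):
        # Drive the whole pipeline; `store` is just a list standing in
        # for a vector store.
        store.extend(self.data_process(self.read()))
        return store


class MarkdownSketch(SourceEmbeddingSketch):
    def __init__(self, text):
        self.text = text

    def read(self):
        # Naive split: one chunk per "# " heading section.
        return [s for s in self.text.split("# ") if s]


store = []
MarkdownSketch("# Intro\nhello\n# Usage\nworld\n").index_to_store(store)
print(store)
# → ['Intro\nhello', 'Usage\nworld']
```

A pdf/ppt/word/url source would differ only in its read() implementation, which is exactly why the five catalogs share the same three msgids.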
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: DB-GPT 0.3.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -20,12 +20,12 @@ msgstr ""
"Generated-By: Babel 2.12.1\n"

#: ../../modules/knowledge/pdf/pdf_embedding.md:1
#: edf96281acc04612a3384b451dc71391
msgid "PDF"
msgstr ""

#: ../../modules/knowledge/pdf/pdf_embedding.md:3
#: fdc7396cc2eb4186bb28ea8c491738bc
#, fuzzy
msgid ""
"pdfembedding can import PDF text into a vector knowledge base. The entire"
@@ -37,20 +37,23 @@ msgstr ""
"数据预处理data_process()和数据进向量数据库index_to_store()"

#: ../../modules/knowledge/pdf/pdf_embedding.md:5
#: d4950371bace43d8957bce9757d77b6e
msgid "inheriting the SourceEmbedding"
msgstr "继承SourceEmbedding"

#: ../../modules/knowledge/pdf/pdf_embedding.md:18
#: 990c46bba6f3438da542417e4addb96f
#, fuzzy
msgid ""
"implement read() and data_process() read() method allows you to read data"
" and split data into chunk"
msgstr "实现read方法可以加载数据"

#: ../../modules/knowledge/pdf/pdf_embedding.md:39
#: 29cf5a37da2f4ad7ab66750970f62d3f
msgid "data_process() method allows you to pre processing your ways"
msgstr "实现data_process方法可以进行数据预处理"

#~ msgid "PDFEmbedding"
#~ msgstr ""
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: DB-GPT 0.3.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -20,12 +20,12 @@ msgstr ""
"Generated-By: Babel 2.12.1\n"

#: ../../modules/knowledge/ppt/ppt_embedding.md:1
#: 86b98a120d0d4796a034c47a23ec8a03
msgid "PPT"
msgstr ""

#: ../../modules/knowledge/ppt/ppt_embedding.md:3
#: af78e8c3a6c24bf79e03da41c6d13fba
msgid ""
"ppt embedding can import ppt text into a vector knowledge base. The "
"entire embedding process includes the read (loading data), data_process "
@@ -36,20 +36,23 @@ msgstr ""
"数据预处理data_process()和数据进向量数据库index_to_store()"

#: ../../modules/knowledge/ppt/ppt_embedding.md:5
#: 0ddb5ec40a4e4864b63e7f578c2f3c34
msgid "inheriting the SourceEmbedding"
msgstr "继承SourceEmbedding"

#: ../../modules/knowledge/ppt/ppt_embedding.md:23
#: b74741f4a1814fe19842985a3f960972
#, fuzzy
msgid ""
"implement read() and data_process() read() method allows you to read data"
" and split data into chunk"
msgstr "实现read方法可以加载数据"

#: ../../modules/knowledge/ppt/ppt_embedding.md:44
#: bc1e705c60cd4dde921150cb814ac8ae
msgid "data_process() method allows you to pre processing your ways"
msgstr "实现data_process方法可以进行数据预处理"

#~ msgid "PPTEmbedding"
#~ msgstr ""
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: DB-GPT 0.3.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -20,12 +20,12 @@ msgstr ""
"Generated-By: Babel 2.12.1\n"

#: ../../modules/knowledge/url/url_embedding.md:1
#: c1db535b997f4a90a75806f389200a4e
msgid "URL"
msgstr ""

#: ../../modules/knowledge/url/url_embedding.md:3
#: a4e3929be4964c35b7d169eaae8f29fe
msgid ""
"url embedding can import PDF text into a vector knowledge base. The "
"entire embedding process includes the read (loading data), data_process "
@@ -36,20 +36,23 @@ msgstr ""
"数据предел处理data_process()和数据进向量数据库index_to_store()"

#: ../../modules/knowledge/url/url_embedding.md:5
#: 0c0be35a31e84e76a60e9e4ffb61a414
msgid "inheriting the SourceEmbedding"
msgstr "继承SourceEmbedding"

#: ../../modules/knowledge/url/url_embedding.md:23
#: f9916af3adee4da2988e5ed1912f2bdd
#, fuzzy
msgid ""
"implement read() and data_process() read() method allows you to read data"
" and split data into chunk"
msgstr "实现read方法可以加载数据"

#: ../../modules/knowledge/url/url_embedding.md:44
#: 56c0720ae3d840069daad2ba7edc8122
msgid "data_process() method allows you to pre processing your ways"
msgstr "实现data_process方法可以进行数据预处理"

#~ msgid "URL Embedding"
#~ msgstr ""
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: DB-GPT 0.3.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -20,12 +20,12 @@ msgstr ""
"Generated-By: Babel 2.12.1\n"

#: ../../modules/knowledge/word/word_embedding.md:1
#: fa236aa8d2e5471d8436e0ec60f906e8
msgid "Word"
msgstr ""

#: ../../modules/knowledge/word/word_embedding.md:3
#: 02d0c183f7f646a7b74e22d0166c8718
msgid ""
"word embedding can import word doc/docx text into a vector knowledge "
"base. The entire embedding process includes the read (loading data), "
@@ -36,20 +36,23 @@ msgstr ""
"数据预处理data_process()和数据进向量数据库index_to_store()"

#: ../../modules/knowledge/word/word_embedding.md:5
#: ffa094cb7739457d88666c5b624bf078
msgid "inheriting the SourceEmbedding"
msgstr "继承SourceEmbedding"

#: ../../modules/knowledge/word/word_embedding.md:18
#: 146f03d86fd147b7847b7b907d52b408
#, fuzzy
msgid ""
"implement read() and data_process() read() method allows you to read data"
" and split data into chunk"
msgstr "实现read方法可以加载数据"

#: ../../modules/knowledge/word/word_embedding.md:39
#: b29a213855af4446a64aadc5a3b76739
msgid "data_process() method allows you to pre processing your ways"
msgstr "实现data_process方法可以进行数据预处理"

#~ msgid "WordEmbedding"
#~ msgstr ""
@ -8,7 +8,7 @@ msgid ""
|
|||||||
msgstr ""
|
msgstr ""
|
||||||
"Project-Id-Version: DB-GPT 0.3.0\n"
|
"Project-Id-Version: DB-GPT 0.3.0\n"
|
||||||
"Report-Msgid-Bugs-To: \n"
|
"Report-Msgid-Bugs-To: \n"
|
||||||
"POT-Creation-Date: 2023-06-13 11:38+0800\n"
|
"POT-Creation-Date: 2023-07-13 15:39+0800\n"
|
||||||
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
||||||
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
||||||
"Language: zh_CN\n"
|
"Language: zh_CN\n"
|
||||||
@ -19,40 +19,43 @@ msgstr ""
|
|||||||
"Content-Transfer-Encoding: 8bit\n"
|
"Content-Transfer-Encoding: 8bit\n"
|
||||||
"Generated-By: Babel 2.12.1\n"
|
"Generated-By: Babel 2.12.1\n"
|
||||||
|
|
||||||
#: ../../use_cases/knownledge_based_qa.md:1 ddfe412b92e14324bdc11ffe58114e5f
|
#~ msgid "Knownledge based qa"
|
||||||
msgid "Knownledge based qa"
|
#~ msgstr "知识问答"
|
||||||
msgstr "知识问答"
|
|
||||||
|
|
||||||
#: ../../use_cases/knownledge_based_qa.md:3 48635316cc704a779089ff7b5cb9a836
|
#~ msgid ""
|
||||||
msgid ""
|
#~ "Chat with your own knowledge is a"
|
||||||
"Chat with your own knowledge is a very interesting thing. In the usage "
|
#~ " very interesting thing. In the usage"
|
||||||
"scenarios of this chapter, we will introduce how to build your own "
|
#~ " scenarios of this chapter, we will"
|
||||||
"knowledge base through the knowledge base API. Firstly, building a "
|
#~ " introduce how to build your own "
|
||||||
"knowledge store can currently be initialized by executing \"python "
|
#~ "knowledge base through the knowledge "
|
||||||
"tool/knowledge_init.py\" to initialize the content of your own knowledge "
|
#~ "base API. Firstly, building a knowledge"
|
||||||
"base, which was introduced in the previous knowledge base module. Of "
|
#~ " store can currently be initialized "
|
||||||
"course, you can also call our provided knowledge embedding API to store "
|
#~ "by executing \"python tool/knowledge_init.py\" "
|
||||||
"knowledge."
|
#~ "to initialize the content of your "
|
||||||
msgstr ""
|
#~ "own knowledge base, which was introduced"
|
||||||
"用自己的知识聊天是一件很有趣的事情。在本章的使用场景中,我们将介绍如何通过知识库API构建自己的知识库。首先,构建知识存储目前可以通过执行“python"
|
#~ " in the previous knowledge base "
|
||||||
" "
|
#~ "module. Of course, you can also "
|
||||||
"tool/knowledge_init.py”来初始化您自己的知识库的内容,这在前面的知识库模块中已经介绍过了。当然,你也可以调用我们提供的知识嵌入API来存储知识。"
|
#~ "call our provided knowledge embedding "
|
||||||
|
#~ "API to store knowledge."
|
||||||
|
#~ msgstr ""
|
||||||
|
#~ "用自己的知识聊天是一件很有趣的事情。在本章的使用场景中,我们将介绍如何通过知识库API构建自己的知识库。首先,构建知识存储目前可以通过执行“python"
|
||||||
|
#~ " "
|
||||||
|
#~ "tool/knowledge_init.py”来初始化您自己的知识库的内容,这在前面的知识库模块中已经介绍过了。当然,你也可以调用我们提供的知识嵌入API来存储知识。"
|
||||||
|
|
||||||
#: ../../use_cases/knownledge_based_qa.md:6 0a5c68429c9343cf8b88f4f1dddb18eb
|
#~ msgid ""
|
||||||
#, fuzzy
|
#~ "We currently support many document "
|
||||||
msgid ""
|
#~ "formats: txt, pdf, md, html, doc, "
|
||||||
"We currently support many document formats: txt, pdf, md, html, doc, ppt,"
|
#~ "ppt, and url."
|
||||||
" and url."
|
#~ msgstr "“我们目前支持四种文件格式: txt, pdf, url, 和md。"
|
||||||
msgstr "“我们目前支持四种文件格式: txt, pdf, url, 和md。"
|
|
||||||
|
|
||||||
#: ../../use_cases/knownledge_based_qa.md:20 83f3544c06954e5cbc0cc7788f699eb1
|
#~ msgid ""
|
||||||
msgid ""
|
#~ "Now we currently support vector "
|
||||||
"Now we currently support vector databases: Chroma (default) and Milvus. "
|
#~ "databases: Chroma (default) and Milvus. "
|
||||||
"You can switch between them by modifying the \"VECTOR_STORE_TYPE\" field "
|
#~ "You can switch between them by "
|
||||||
"in the .env file."
|
#~ "modifying the \"VECTOR_STORE_TYPE\" field in"
|
||||||
msgstr "“我们目前支持向量数据库:Chroma(默认)和Milvus。你可以通过修改.env文件中的“VECTOR_STORE_TYPE”参数在它们之间切换。"
|
#~ " the .env file."
|
||||||
|
#~ msgstr "“我们目前支持向量数据库:Chroma(默认)和Milvus。你可以通过修改.env文件中的“VECTOR_STORE_TYPE”参数在它们之间切换。"
|
||||||
|
|
||||||
#: ../../use_cases/knownledge_based_qa.md:31 ac12f26b81384fc4bf44ccce1c0d86b4
|
#~ msgid "Below is an example of using the knowledge base API to query knowledge:"
|
||||||
msgid "Below is an example of using the knowledge base API to query knowledge:"
|
#~ msgstr "下面是一个使用知识库API进行查询的例子:"
|
||||||
msgstr "下面是一个使用知识库API进行查询的例子:"
|
|
||||||
|
|
||||||
|
@ -3,28 +3,134 @@ Knowledge
|
|||||||
|
|
||||||
| As the knowledge base is currently the most significant user demand scenario, we natively support the construction and processing of knowledge bases. We also provide multiple knowledge base management strategies in this project, such as pdf knowledge, md knowledge, txt knowledge, word knowledge, and ppt knowledge:
|
| As the knowledge base is currently the most significant user demand scenario, we natively support the construction and processing of knowledge bases. We also provide multiple knowledge base management strategies in this project, such as pdf knowledge, md knowledge, txt knowledge, word knowledge, and ppt knowledge:
|
||||||
|
|
||||||
|
We currently support many document formats: raw text, txt, pdf, md, html, doc, ppt, and url.
|
||||||
|
In the future, we will continue to support more types of knowledge, including audio, video, various databases, and big data sources. Of course, we look forward to your active participation in contributing code.
|
||||||
|
|
||||||
**Create your own knowledge repository**
|
**Create your own knowledge repository**
|
||||||
|
|
||||||
1.Place personal knowledge files or folders in the pilot/datasets directory.
|
1.Prepare the environment.
|
||||||
|
|
||||||
We currently support many document formats: txt, pdf, md, html, doc, ppt, and url.
|
We currently support many document formats: TEXT(raw text), DOCUMENT(.txt, .pdf, .md, .doc, .ppt, .html), and URL.
|
||||||
|
|
||||||
before execution: python -m spacy download zh_core_web_sm
|
before execution:
|
||||||
|
|
||||||
2.Update your .env, set your vector store type, VECTOR_STORE_TYPE=Chroma
|
::
|
||||||
(now only support Chroma and Milvus, if you set Milvus, please set MILVUS_URL and MILVUS_PORT)
|
|
||||||
|
|
||||||
2.Run the knowledge repository script in the tools directory.
|
pip install db-gpt -i https://pypi.org/
|
||||||
|
python -m spacy download zh_core_web_sm
|
||||||
|
from pilot import EmbeddingEngine, KnowledgeType
|
||||||
|
|
||||||
python tools/knowledge_init.py
|
|
||||||
note : --vector_name : your vector store name default_value:default
|
|
||||||
|
|
||||||
3.Add the knowledge repository in the interface by entering the name of your knowledge repository (if not specified, enter "default") so you can use it for Q&A based on your knowledge base.
|
2.Prepare an embedding model; you can download one from https://huggingface.co/.
|
||||||
|
Make sure you have installed git-lfs first.
|
||||||
|
|
||||||
|
eg: git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
embedding_model = "your_embedding_model_path/all-MiniLM-L6-v2"
|
||||||
|
|
||||||
|
3.Prepare the vector store config; Chroma, Milvus, and Weaviate are currently supported.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
#Chroma
|
||||||
|
vector_store_config = {
|
||||||
|
"vector_store_type":"Chroma",
|
||||||
|
"vector_store_name":"your_name",#you can define yourself
|
||||||
|
"chroma_persist_path":"your_persist_dir"
|
||||||
|
}
|
||||||
|
#Milvus
|
||||||
|
vector_store_config = {
|
||||||
|
"vector_store_type":"Milvus",
|
||||||
|
"vector_store_name":"your_name",#you can define yourself
|
||||||
|
"milvus_url":"your_url",
|
||||||
|
"milvus_port":"your_port",
|
||||||
|
"milvus_username":"your_username", #optional
|
||||||
|
"milvus_password":"your_password", #optional
|
||||||
|
"milvus_secure":"your_secure" #optional
|
||||||
|
}
|
||||||
|
#Weaviate
|
||||||
|
vector_store_config = {
|
||||||
|
"vector_store_type":"Weaviate",
|
||||||
|
"vector_store_name":"your_name",#you can define yourself
|
||||||
|
"weaviate_url":"your_url",
|
||||||
|
"weaviate_port":"your_port",
|
||||||
|
"weaviate_username":"your_username", #optional
|
||||||
|
"weaviate_password":"your_password", #optional
|
||||||
|
}
|
||||||
|
|
||||||
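All three configs above carry `vector_store_type` and `vector_store_name` plus a few store-specific keys. A small validation sketch (a hypothetical helper, not part of the DB-GPT API) that checks the per-store required keys before handing the dict to `EmbeddingEngine`:

```python
# Hypothetical helper: validate a vector_store_config dict before use.
REQUIRED_KEYS = {
    "Chroma": {"chroma_persist_path"},
    "Milvus": {"milvus_url", "milvus_port"},
    "Weaviate": {"weaviate_url", "weaviate_port"},
}

def check_vector_store_config(config: dict) -> None:
    store_type = config.get("vector_store_type")
    if store_type not in REQUIRED_KEYS:
        raise ValueError(f"unsupported vector store: {store_type!r}")
    missing = REQUIRED_KEYS[store_type] - config.keys()
    if missing:
        raise ValueError(f"missing keys for {store_type}: {sorted(missing)}")

# Passes silently for a complete Chroma config.
check_vector_store_config({
    "vector_store_type": "Chroma",
    "vector_store_name": "my_kb",
    "chroma_persist_path": "/tmp/chroma",
})
```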
|
4.Initialize a URL-type EmbeddingEngine and embed your document into the vector store in your code.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
url = "https://db-gpt.readthedocs.io/en/latest/getting_started/getting_started.html"
|
||||||
|
embedding_engine = EmbeddingEngine(
|
||||||
|
knowledge_source=url,
|
||||||
|
knowledge_type=KnowledgeType.URL.value,
|
||||||
|
model_name=embedding_model,
|
||||||
|
vector_store_config=vector_store_config)
|
||||||
|
embedding_engine.knowledge_embedding()
|
||||||
|
|
||||||
|
If you want to supply your own source_reader or text_splitter, do this:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
url = "https://db-gpt.readthedocs.io/en/latest/getting_started/getting_started.html"
|
||||||
|
|
||||||
|
source_reader = WebBaseLoader(web_path=self.file_path)
|
||||||
|
text_splitter = RecursiveCharacterTextSplitter(
|
||||||
|
chunk_size=100, chunk_overlap=50
|
||||||
|
)
|
||||||
|
embedding_engine = EmbeddingEngine(
|
||||||
|
knowledge_source=url,
|
||||||
|
knowledge_type=KnowledgeType.URL.value,
|
||||||
|
model_name=embedding_model,
|
||||||
|
vector_store_config=vector_store_config,
|
||||||
|
source_reader=source_reader,
|
||||||
|
text_splitter=text_splitter
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
5.Initialize a Document-type EmbeddingEngine and embed your document into the vector store in your code.
|
||||||
|
Document type can be .txt, .pdf, .md, .doc, .ppt.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
document_path = "your_path/test.md"
|
||||||
|
embedding_engine = EmbeddingEngine(
|
||||||
|
knowledge_source=document_path,
|
||||||
|
knowledge_type=KnowledgeType.DOCUMENT.value,
|
||||||
|
model_name=embedding_model,
|
||||||
|
vector_store_config=vector_store_config)
|
||||||
|
embedding_engine.knowledge_embedding()
|
||||||
|
|
||||||
|
6.Initialize a TEXT-type EmbeddingEngine and embed your text into the vector store in your code.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
raw_text = "a long passage"
|
||||||
|
embedding_engine = EmbeddingEngine(
|
||||||
|
knowledge_source=raw_text,
|
||||||
|
knowledge_type=KnowledgeType.TEXT.value,
|
||||||
|
model_name=embedding_model,
|
||||||
|
vector_store_config=vector_store_config)
|
||||||
|
embedding_engine.knowledge_embedding()
|
||||||
|
|
||||||
|
7.Run a similarity search against your knowledge base.
|
||||||
|
::
|
||||||
|
query = "please introduce the oceanbase"
|
||||||
|
topk = 5
|
||||||
|
docs = embedding_engine.similar_search(query, topk)
|
||||||
|
|
||||||
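Conceptually, `similar_search` ranks the stored chunks by vector closeness to the query embedding and returns the top `topk`. The ranking idea can be shown with toy vectors and cosine similarity (the vectors and the `toy_similar_search` helper are illustrative only, not the DB-GPT implementation):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def toy_similar_search(query_vec, chunks, topk):
    # chunks: list of (text, embedding) pairs; rank by similarity to the query.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:topk]]

chunks = [("about oceanbase", [1.0, 0.1]), ("about weather", [0.0, 1.0])]
print(toy_similar_search([1.0, 0.0], chunks, topk=1))  # ['about oceanbase']
```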
Note that the default vector model used is text2vec-large-chinese (which is a large model, so if your personal computer configuration is not enough, it is recommended to use text2vec-base-chinese). Therefore, ensure that you download the model and place it in the models directory.
|
Note that the default vector model used is text2vec-large-chinese (which is a large model, so if your personal computer configuration is not enough, it is recommended to use text2vec-base-chinese). Therefore, ensure that you download the model and place it in the models directory.
|
||||||
|
|
||||||
- `pdf_embedding <./knowledge/pdf_embedding.html>`_: supported pdf embedding.
|
- `pdf_embedding <./knowledge/pdf/pdf_embedding.html>`_: supported pdf embedding.
|
||||||
|
- `markdown_embedding <./knowledge/markdown/markdown_embedding.html>`_: supported markdown embedding.
|
||||||
|
- `word_embedding <./knowledge/word/word_embedding.html>`_: supported word embedding.
|
||||||
|
- `url_embedding <./knowledge/url/url_embedding.html>`_: supported url embedding.
|
||||||
|
- `ppt_embedding <./knowledge/ppt/ppt_embedding.html>`_: supported ppt embedding.
|
||||||
|
- `string_embedding <./knowledge/string/string_embedding.html>`_: supported raw text embedding.
|
||||||
|
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
@ -37,4 +143,5 @@ Note that the default vector model used is text2vec-large-chinese (which is a la
|
|||||||
./knowledge/markdown/markdown_embedding.md
|
./knowledge/markdown/markdown_embedding.md
|
||||||
./knowledge/word/word_embedding.md
|
./knowledge/word/word_embedding.md
|
||||||
./knowledge/url/url_embedding.md
|
./knowledge/url/url_embedding.md
|
||||||
./knowledge/ppt/ppt_embedding.md
|
./knowledge/ppt/ppt_embedding.md
|
||||||
|
./knowledge/string/string_embedding.md
|
@ -1,4 +1,4 @@
|
|||||||
MarkdownEmbedding
|
Markdown
|
||||||
==================================
|
==================================
|
||||||
markdown embedding can import md text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
markdown embedding can import md text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
||||||
|
|
||||||
@ -6,13 +6,14 @@ inheriting the SourceEmbedding
|
|||||||
|
|
||||||
```
|
```
|
||||||
class MarkdownEmbedding(SourceEmbedding):
|
class MarkdownEmbedding(SourceEmbedding):
|
||||||
"""pdf embedding for read pdf document."""
|
"""markdown embedding for read markdown document."""
|
||||||
|
|
||||||
def __init__(self, file_path, vector_store_config):
|
def __init__(self, file_path, vector_store_config, text_splitter):
|
||||||
"""Initialize with pdf path."""
|
"""Initialize with markdown path."""
|
||||||
super().__init__(file_path, vector_store_config)
|
super().__init__(file_path, vector_store_config, text_splitter)
|
||||||
self.file_path = file_path
|
self.file_path = file_path
|
||||||
self.vector_store_config = vector_store_config
|
self.vector_store_config = vector_store_config
|
||||||
|
self.text_splitter = text_splitter or None
|
||||||
```
|
```
|
||||||
implement read() and data_process()
|
implement read() and data_process()
|
||||||
read() method allows you to read data and split data into chunk
|
read() method allows you to read data and split data into chunk
|
||||||
@ -22,12 +23,19 @@ read() method allows you to read data and split data into chunk
|
|||||||
def read(self):
|
def read(self):
|
||||||
"""Load from markdown path."""
|
"""Load from markdown path."""
|
||||||
loader = EncodeTextLoader(self.file_path)
|
loader = EncodeTextLoader(self.file_path)
|
||||||
textsplitter = SpacyTextSplitter(
|
if self.text_splitter is None:
|
||||||
pipeline="zh_core_web_sm",
|
try:
|
||||||
chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
|
self.text_splitter = SpacyTextSplitter(
|
||||||
chunk_overlap=100,
|
pipeline="zh_core_web_sm",
|
||||||
)
|
chunk_size=100,
|
||||||
return loader.load_and_split(textsplitter)
|
chunk_overlap=100,
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
self.text_splitter = RecursiveCharacterTextSplitter(
|
||||||
|
chunk_size=100, chunk_overlap=50
|
||||||
|
)
|
||||||
|
|
||||||
|
return loader.load_and_split(self.text_splitter)
|
||||||
```
|
```
|
||||||
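The try/except in `read()` above prefers the spaCy sentence-aware splitter and falls back to a character splitter with `chunk_size=100, chunk_overlap=50`. The overlapping-window behavior itself can be shown in a few lines of plain Python (this is only the sliding-window idea, not langchain's actual splitting algorithm):

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    # Advance the window by chunk_size - chunk_overlap so neighbors share text.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_with_overlap("a" * 250, chunk_size=100, chunk_overlap=50)
print([len(c) for c in chunks])
```

Overlap keeps a sentence that straddles a boundary fully visible in at least one chunk, which is why both splitter configurations above use a non-zero `chunk_overlap`.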
|
|
||||||
data_process() method allows you to pre-process the data in your own way
|
data_process() method allows you to pre-process the data in your own way
|
||||||
|
@ -1,4 +1,4 @@
|
|||||||
PDFEmbedding
|
PDF
|
||||||
==================================
|
==================================
|
||||||
pdf embedding can import PDF text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
pdf embedding can import PDF text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
||||||
|
|
||||||
@ -7,11 +7,12 @@ inheriting the SourceEmbedding
|
|||||||
class PDFEmbedding(SourceEmbedding):
|
class PDFEmbedding(SourceEmbedding):
|
||||||
"""pdf embedding for read pdf document."""
|
"""pdf embedding for read pdf document."""
|
||||||
|
|
||||||
def __init__(self, file_path, vector_store_config):
|
def __init__(self, file_path, vector_store_config, text_splitter):
|
||||||
"""Initialize with pdf path."""
|
"""Initialize with pdf path."""
|
||||||
super().__init__(file_path, vector_store_config)
|
super().__init__(file_path, vector_store_config, text_splitter)
|
||||||
self.file_path = file_path
|
self.file_path = file_path
|
||||||
self.vector_store_config = vector_store_config
|
self.vector_store_config = vector_store_config
|
||||||
|
self.text_splitter = text_splitter or None
|
||||||
```
|
```
|
||||||
|
|
||||||
implement read() and data_process()
|
implement read() and data_process()
|
||||||
@ -21,15 +22,19 @@ read() method allows you to read data and split data into chunk
|
|||||||
def read(self):
|
def read(self):
|
||||||
"""Load from pdf path."""
|
"""Load from pdf path."""
|
||||||
loader = PyPDFLoader(self.file_path)
|
loader = PyPDFLoader(self.file_path)
|
||||||
# textsplitter = CHNDocumentSplitter(
|
if self.text_splitter is None:
|
||||||
# pdf=True, sentence_size=CFG.KNOWLEDGE_CHUNK_SIZE
|
try:
|
||||||
# )
|
self.text_splitter = SpacyTextSplitter(
|
||||||
textsplitter = SpacyTextSplitter(
|
pipeline="zh_core_web_sm",
|
||||||
pipeline="zh_core_web_sm",
|
chunk_size=100,
|
||||||
chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
|
chunk_overlap=100,
|
||||||
chunk_overlap=100,
|
)
|
||||||
)
|
except Exception:
|
||||||
return loader.load_and_split(textsplitter)
|
self.text_splitter = RecursiveCharacterTextSplitter(
|
||||||
|
chunk_size=100, chunk_overlap=50
|
||||||
|
)
|
||||||
|
|
||||||
|
return loader.load_and_split(self.text_splitter)
|
||||||
```
|
```
|
||||||
data_process() method allows you to pre-process the data in your own way
|
data_process() method allows you to pre-process the data in your own way
|
||||||
```
|
```
|
||||||
|
@ -1,4 +1,4 @@
|
|||||||
PPTEmbedding
|
PPT
|
||||||
==================================
|
==================================
|
||||||
ppt embedding can import ppt text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
ppt embedding can import ppt text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
||||||
|
|
||||||
@ -7,11 +7,17 @@ inheriting the SourceEmbedding
|
|||||||
class PPTEmbedding(SourceEmbedding):
|
class PPTEmbedding(SourceEmbedding):
|
||||||
"""ppt embedding for read ppt document."""
|
"""ppt embedding for read ppt document."""
|
||||||
|
|
||||||
def __init__(self, file_path, vector_store_config):
|
def __init__(
|
||||||
"""Initialize with pdf path."""
|
self,
|
||||||
super().__init__(file_path, vector_store_config)
|
file_path,
|
||||||
|
vector_store_config,
|
||||||
|
text_splitter: Optional[TextSplitter] = None,
|
||||||
|
):
|
||||||
|
"""Initialize with ppt path."""
|
||||||
|
super().__init__(file_path, vector_store_config, text_splitter=text_splitter)
|
||||||
self.file_path = file_path
|
self.file_path = file_path
|
||||||
self.vector_store_config = vector_store_config
|
self.vector_store_config = vector_store_config
|
||||||
|
self.text_splitter = text_splitter or None
|
||||||
```
|
```
|
||||||
|
|
||||||
implement read() and data_process()
|
implement read() and data_process()
|
||||||
@ -21,12 +27,19 @@ read() method allows you to read data and split data into chunk
|
|||||||
def read(self):
|
def read(self):
|
||||||
"""Load from ppt path."""
|
"""Load from ppt path."""
|
||||||
loader = UnstructuredPowerPointLoader(self.file_path)
|
loader = UnstructuredPowerPointLoader(self.file_path)
|
||||||
textsplitter = SpacyTextSplitter(
|
if self.text_splitter is None:
|
||||||
pipeline="zh_core_web_sm",
|
try:
|
||||||
chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
|
self.text_splitter = SpacyTextSplitter(
|
||||||
chunk_overlap=200,
|
pipeline="zh_core_web_sm",
|
||||||
)
|
chunk_size=100,
|
||||||
return loader.load_and_split(textsplitter)
|
chunk_overlap=100,
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
self.text_splitter = RecursiveCharacterTextSplitter(
|
||||||
|
chunk_size=100, chunk_overlap=50
|
||||||
|
)
|
||||||
|
|
||||||
|
return loader.load_and_split(self.text_splitter)
|
||||||
```
|
```
|
||||||
data_process() method allows you to pre-process the data in your own way
|
data_process() method allows you to pre-process the data in your own way
|
||||||
```
|
```
|
||||||
|
41
docs/modules/knowledge/string/string_embedding.md
Normal file
41
docs/modules/knowledge/string/string_embedding.md
Normal file
@ -0,0 +1,41 @@
|
|||||||
|
String
|
||||||
|
==================================
|
||||||
|
string embedding can import a long raw text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
||||||
|
|
||||||
|
inheriting the SourceEmbedding
|
||||||
|
```
|
||||||
|
class StringEmbedding(SourceEmbedding):
|
||||||
|
"""string embedding for read string document."""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
file_path,
|
||||||
|
vector_store_config,
|
||||||
|
text_splitter: Optional[TextSplitter] = None,
|
||||||
|
):
|
||||||
|
"""Initialize with raw text."""
|
||||||
|
super().__init__(file_path=file_path, vector_store_config=vector_store_config)
|
||||||
|
self.file_path = file_path
|
||||||
|
self.vector_store_config = vector_store_config
|
||||||
|
self.text_splitter = text_splitter or None
|
||||||
|
```
|
||||||
|
|
||||||
|
implement read() and data_process()
|
||||||
|
read() method allows you to read data and split data into chunk
|
||||||
|
```
|
||||||
|
@register
|
||||||
|
def read(self):
|
||||||
|
"""Load from String path."""
|
||||||
|
metadata = {"source": "raw text"}
|
||||||
|
return [Document(page_content=self.file_path, metadata=metadata)]
|
||||||
|
```
|
||||||
|
data_process() method allows you to pre-process the data in your own way
|
||||||
|
```
|
||||||
|
@register
|
||||||
|
def data_process(self, documents: List[Document]):
|
||||||
|
i = 0
|
||||||
|
for d in documents:
|
||||||
|
documents[i].page_content = d.page_content.replace("\n", "")
|
||||||
|
i += 1
|
||||||
|
return documents
|
||||||
|
```
|
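The `data_process()` hook above strips embedded newlines from each chunk. The same loop can be run standalone with a minimal stand-in for langchain's `Document` (the stub dataclass is illustrative, used only so the sketch runs without langchain installed):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Document:
    # Minimal stand-in for langchain.schema.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)

def data_process(documents: List[Document]) -> List[Document]:
    # Same logic as StringEmbedding.data_process: drop embedded newlines in place.
    for i, d in enumerate(documents):
        documents[i].page_content = d.page_content.replace("\n", "")
    return documents

docs = data_process([Document(page_content="line one\nline two")])
print(docs[0].page_content)  # line oneline two
```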
@ -1,4 +1,4 @@
|
|||||||
URL Embedding
|
URL
|
||||||
==================================
|
==================================
|
||||||
url embedding can import web page text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
url embedding can import web page text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
||||||
|
|
||||||
@ -7,11 +7,17 @@ inheriting the SourceEmbedding
|
|||||||
class URLEmbedding(SourceEmbedding):
|
class URLEmbedding(SourceEmbedding):
|
||||||
"""url embedding for read url document."""
|
"""url embedding for read url document."""
|
||||||
|
|
||||||
def __init__(self, file_path, vector_store_config):
|
def __init__(
|
||||||
"""Initialize with url path."""
|
self,
|
||||||
super().__init__(file_path, vector_store_config)
|
file_path,
|
||||||
|
vector_store_config,
|
||||||
|
text_splitter: Optional[TextSplitter] = None,
|
||||||
|
):
|
||||||
|
"""Initialize with url path."""
|
||||||
|
super().__init__(file_path, vector_store_config, text_splitter=text_splitter)
|
||||||
self.file_path = file_path
|
self.file_path = file_path
|
||||||
self.vector_store_config = vector_store_config
|
self.vector_store_config = vector_store_config
|
||||||
|
self.text_splitter = text_splitter or None
|
||||||
```
|
```
|
||||||
|
|
||||||
implement read() and data_process()
|
implement read() and data_process()
|
||||||
@ -21,15 +27,19 @@ read() method allows you to read data and split data into chunk
|
|||||||
def read(self):
|
def read(self):
|
||||||
"""Load from url path."""
|
"""Load from url path."""
|
||||||
loader = WebBaseLoader(web_path=self.file_path)
|
loader = WebBaseLoader(web_path=self.file_path)
|
||||||
if CFG.LANGUAGE == "en":
|
if self.text_splitter is None:
|
||||||
text_splitter = CharacterTextSplitter(
|
try:
|
||||||
chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
|
self.text_splitter = SpacyTextSplitter(
|
||||||
chunk_overlap=20,
|
pipeline="zh_core_web_sm",
|
||||||
length_function=len,
|
chunk_size=100,
|
||||||
)
|
chunk_overlap=100,
|
||||||
else:
|
)
|
||||||
text_splitter = CHNDocumentSplitter(pdf=True, sentence_size=1000)
|
except Exception:
|
||||||
return loader.load_and_split(text_splitter)
|
self.text_splitter = RecursiveCharacterTextSplitter(
|
||||||
|
chunk_size=100, chunk_overlap=50
|
||||||
|
)
|
||||||
|
|
||||||
|
return loader.load_and_split(self.text_splitter)
|
||||||
```
|
```
|
||||||
data_process() method allows you to pre-process the data in your own way
|
data_process() method allows you to pre-process the data in your own way
|
||||||
```
|
```
|
||||||
|
@ -1,4 +1,4 @@
|
|||||||
WordEmbedding
|
Word
|
||||||
==================================
|
==================================
|
||||||
word embedding can import word doc/docx text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
word embedding can import word doc/docx text into a vector knowledge base. The entire embedding process includes the read (loading data), data_process (data processing), and index_to_store (embedding to the vector database) methods.
|
||||||
|
|
||||||
@ -7,11 +7,12 @@ inheriting the SourceEmbedding
|
|||||||
class WordEmbedding(SourceEmbedding):
|
class WordEmbedding(SourceEmbedding):
|
||||||
"""word embedding for read word document."""
|
"""word embedding for read word document."""
|
||||||
|
|
||||||
def __init__(self, file_path, vector_store_config):
|
def __init__(self, file_path, vector_store_config, text_splitter):
|
||||||
"""Initialize with word path."""
|
"""Initialize with word path."""
|
||||||
super().__init__(file_path, vector_store_config)
|
super().__init__(file_path, vector_store_config, text_splitter)
|
||||||
self.file_path = file_path
|
self.file_path = file_path
|
||||||
self.vector_store_config = vector_store_config
|
self.vector_store_config = vector_store_config
|
||||||
|
self.text_splitter = text_splitter or None
|
||||||
```
|
```
|
||||||
|
|
||||||
implement read() and data_process()
|
implement read() and data_process()
|
||||||
@ -21,10 +22,19 @@ read() method allows you to read data and split data into chunk
|
|||||||
def read(self):
|
def read(self):
|
||||||
"""Load from word path."""
|
"""Load from word path."""
|
||||||
loader = UnstructuredWordDocumentLoader(self.file_path)
|
loader = UnstructuredWordDocumentLoader(self.file_path)
|
||||||
textsplitter = CHNDocumentSplitter(
|
if self.text_splitter is None:
|
||||||
pdf=True, sentence_size=CFG.KNOWLEDGE_CHUNK_SIZE
|
try:
|
||||||
)
|
self.text_splitter = SpacyTextSplitter(
|
||||||
return loader.load_and_split(textsplitter)
|
pipeline="zh_core_web_sm",
|
||||||
|
chunk_size=100,
|
||||||
|
chunk_overlap=100,
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
self.text_splitter = RecursiveCharacterTextSplitter(
|
||||||
|
chunk_size=100, chunk_overlap=50
|
||||||
|
)
|
||||||
|
|
||||||
|
return loader.load_and_split(self.text_splitter)
|
||||||
```
|
```
|
||||||
data_process() method allows you to pre-process the data in your own way
|
data_process() method allows you to pre-process the data in your own way
|
||||||
```
|
```
|
||||||
|
@ -1,43 +0,0 @@
|
|||||||
# Knownledge based qa
|
|
||||||
|
|
||||||
Chat with your own knowledge is a very interesting thing. In the usage scenarios of this chapter, we will introduce how to build your own knowledge base through the knowledge base API. Firstly, building a knowledge store can currently be initialized by executing "python tool/knowledge_init.py" to initialize the content of your own knowledge base, which was introduced in the previous knowledge base module. Of course, you can also call our provided knowledge embedding API to store knowledge.
|
|
||||||
|
|
||||||
|
|
||||||
We currently support many document formats: txt, pdf, md, html, doc, ppt, and url.
|
|
||||||
```
|
|
||||||
vector_store_config = {
|
|
||||||
"vector_store_name": name
|
|
||||||
}
|
|
||||||
|
|
||||||
file_path = "your file path"
|
|
||||||
|
|
||||||
knowledge_embedding_client = KnowledgeEmbedding(file_path=file_path, model_name=LLM_MODEL_CONFIG["text2vec"], vector_store_config=vector_store_config)
|
|
||||||
|
|
||||||
knowledge_embedding_client.knowledge_embedding()
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
Now we currently support vector databases: Chroma (default) and Milvus. You can switch between them by modifying the "VECTOR_STORE_TYPE" field in the .env file.
|
|
||||||
```
|
|
||||||
#*******************************************************************#
|
|
||||||
#** VECTOR STORE SETTINGS **#
|
|
||||||
#*******************************************************************#
|
|
||||||
VECTOR_STORE_TYPE=Chroma
|
|
||||||
#MILVUS_URL=127.0.0.1
|
|
||||||
#MILVUS_PORT=19530
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
Below is an example of using the knowledge base API to query knowledge:
|
|
||||||
|
|
||||||
```
|
|
||||||
vector_store_config = {
|
|
||||||
"vector_store_name": name
|
|
||||||
}
|
|
||||||
|
|
||||||
query = "your query"
|
|
||||||
|
|
||||||
knowledge_embedding_client = KnowledgeEmbedding(file_path="", model_name=LLM_MODEL_CONFIG["text2vec"], vector_store_config=vector_store_config)
|
|
||||||
|
|
||||||
knowledge_embedding_client.similar_search(query, 10)
|
|
||||||
```
|
|
@ -1,3 +1,4 @@
|
|||||||
from pilot.embedding_engine import SourceEmbedding, register
|
from pilot.embedding_engine import SourceEmbedding, register
|
||||||
|
from pilot.embedding_engine import EmbeddingEngine, KnowledgeType
|
||||||
|
|
||||||
__all__ = ["SourceEmbedding", "register"]
|
__all__ = ["SourceEmbedding", "register", "EmbeddingEngine", "KnowledgeType"]
|
||||||
|
@ -344,7 +344,14 @@ class Database:
|
|||||||
return [
|
return [
|
||||||
d[0]
|
d[0]
|
||||||
for d in results
|
for d in results
|
||||||
if d[0] not in ["information_schema", "performance_schema", "sys", "mysql"]
|
if d[0]
|
||||||
|
not in [
|
||||||
|
"information_schema",
|
||||||
|
"performance_schema",
|
||||||
|
"sys",
|
||||||
|
"mysql",
|
||||||
|
"knowledge_management",
|
||||||
|
]
|
||||||
]
|
]
|
||||||
|
|
||||||
def convert_sql_write_to_select(self, write_sql):
|
def convert_sql_write_to_select(self, write_sql):
|
||||||
@ -421,7 +428,13 @@ class Database:
|
|||||||
session = self._db_sessions()
|
session = self._db_sessions()
|
||||||
cursor = session.execute(text(f"SHOW CREATE TABLE {table_name}"))
|
cursor = session.execute(text(f"SHOW CREATE TABLE {table_name}"))
|
||||||
ans = cursor.fetchall()
|
ans = cursor.fetchall()
|
||||||
return ans[0][1]
|
res = ans[0][1]
|
||||||
|
res = re.sub(r"\s*ENGINE\s*=\s*InnoDB\s*", " ", res, flags=re.IGNORECASE)
|
||||||
|
res = re.sub(
|
||||||
|
r"\s*DEFAULT\s*CHARSET\s*=\s*\w+\s*", " ", res, flags=re.IGNORECASE
|
||||||
|
)
|
||||||
|
res = re.sub(r"\s*COLLATE\s*=\s*\w+\s*", " ", res, flags=re.IGNORECASE)
|
||||||
|
return res
|
||||||
|
|
||||||
def get_fields(self, table_name):
|
def get_fields(self, table_name):
|
||||||
"""Get column fields about specified table."""
|
"""Get column fields about specified table."""
|
||||||
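The three `re.sub` calls added in the hunk above strip MySQL-specific table options (`ENGINE`, `DEFAULT CHARSET`, `COLLATE`) out of the `SHOW CREATE TABLE` DDL before it is used downstream. A standalone sketch of the same cleanup (the function name `simplify_create_table` and the sample DDL are illustrative, not from DB-GPT):

```python
import re

def simplify_create_table(ddl: str) -> str:
    # Apply the same three substitutions as the diff above, then trim
    # the leftover whitespace.
    ddl = re.sub(r"\s*ENGINE\s*=\s*InnoDB\s*", " ", ddl, flags=re.IGNORECASE)
    ddl = re.sub(r"\s*DEFAULT\s*CHARSET\s*=\s*\w+\s*", " ", ddl, flags=re.IGNORECASE)
    ddl = re.sub(r"\s*COLLATE\s*=\s*\w+\s*", " ", ddl, flags=re.IGNORECASE)
    return ddl.strip()

raw = "CREATE TABLE `user` (`id` int) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci"
print(simplify_create_table(raw))  # CREATE TABLE `user` (`id` int)
```

Each pattern also swallows the surrounding whitespace, which is why a single space is substituted back in.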

@@ -1,3 +1,5 @@
 from pilot.embedding_engine.source_embedding import SourceEmbedding, register
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine
+from pilot.embedding_engine.knowledge_type import KnowledgeType
 
-__all__ = ["SourceEmbedding", "register"]
+__all__ = ["SourceEmbedding", "register", "EmbeddingEngine", "KnowledgeType"]

@@ -2,6 +2,11 @@ from typing import Dict, List, Optional
 
 from langchain.document_loaders import CSVLoader
 from langchain.schema import Document
+from langchain.text_splitter import (
+    TextSplitter,
+    SpacyTextSplitter,
+    RecursiveCharacterTextSplitter,
+)
 
 from pilot.embedding_engine import SourceEmbedding, register
 
@@ -13,19 +18,36 @@ class CSVEmbedding(SourceEmbedding):
         self,
         file_path,
         vector_store_config,
-        embedding_args: Optional[Dict] = None,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
     ):
         """Initialize with csv path."""
-        super().__init__(file_path, vector_store_config)
+        super().__init__(
+            file_path, vector_store_config, source_reader=None, text_splitter=None
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
-        self.embedding_args = embedding_args
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None
 
     @register
     def read(self):
         """Load from csv path."""
-        loader = CSVLoader(file_path=self.file_path)
-        return loader.load()
+        if self.source_reader is None:
+            self.source_reader = CSVLoader(self.file_path)
+        if self.text_splitter is None:
+            try:
+                self.text_splitter = SpacyTextSplitter(
+                    pipeline="zh_core_web_sm",
+                    chunk_size=100,
+                    chunk_overlap=100,
+                )
+            except Exception:
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
+                )
+
+        return self.source_reader.load_and_split(self.text_splitter)
 
     @register
     def data_process(self, documents: List[Document]):

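The rewritten `read` above follows an "injected dependency with lazy default" shape: use the caller-supplied reader/splitter when one was passed in, otherwise construct a default, with a try/except chain falling back from the Spacy splitter to a character splitter. A dependency-free sketch of that shape (names like `build_preferred_splitter` and `SimpleSplitter` are illustrative stand-ins, not DB-GPT APIs):

```python
class SimpleSplitter:
    """Fallback splitter: fixed-size character chunks, like RecursiveCharacterTextSplitter."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size

    def split(self, text):
        return [text[i:i + self.chunk_size] for i in range(0, len(text), self.chunk_size)]


def build_preferred_splitter():
    # Stand-in for SpacyTextSplitter, which raises when the
    # zh_core_web_sm pipeline is not installed.
    raise RuntimeError("spacy pipeline not installed")


def read(text, splitter=None):
    if splitter is None:              # honor an injected splitter if given
        try:
            splitter = build_preferred_splitter()
        except Exception:             # otherwise fall back to the simple default
            splitter = SimpleSplitter(chunk_size=100)
    return splitter.split(text)


print(read("a" * 250))  # falls back; chunks of 100, 100, and 50 characters
```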
@@ -2,21 +2,28 @@ from typing import Optional
 
 from chromadb.errors import NotEnoughElementsException
 from langchain.embeddings import HuggingFaceEmbeddings
+from langchain.text_splitter import TextSplitter
 
-from pilot.configs.config import Config
 from pilot.embedding_engine.knowledge_type import get_knowledge_embedding, KnowledgeType
 from pilot.vector_store.connector import VectorStoreConnector
 
-CFG = Config()
 
-
-class KnowledgeEmbedding:
+class EmbeddingEngine:
+    """EmbeddingEngine provides a chained process (read -> text_split -> data_process -> index_store)
+    for embedding knowledge documents into a vector store.
+    1. knowledge_embedding: embed a knowledge document source into the vector store (Chroma, Milvus, Weaviate).
+    2. similar_search: similarity search against the vector store.
+    How to use: https://db-gpt.readthedocs.io/en/latest/modules/knowledge.html
+    How to integrate: https://db-gpt.readthedocs.io/en/latest/modules/knowledge/pdf/pdf_embedding.html
+    """
+
     def __init__(
         self,
         model_name,
         vector_store_config,
         knowledge_type: Optional[str] = KnowledgeType.DOCUMENT.value,
         knowledge_source: Optional[str] = None,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
     ):
         """Initialize with knowledge embedding client, model_name, vector_store_config, knowledge_type, knowledge_source"""
         self.knowledge_source = knowledge_source
@@ -25,27 +32,36 @@ class KnowledgeEmbedding:
         self.knowledge_type = knowledge_type
         self.embeddings = HuggingFaceEmbeddings(model_name=self.model_name)
         self.vector_store_config["embeddings"] = self.embeddings
+        self.source_reader = source_reader
+        self.text_splitter = text_splitter
 
     def knowledge_embedding(self):
+        """Source embedding is a chained process: read -> text_split -> data_process -> index_store."""
         self.knowledge_embedding_client = self.init_knowledge_embedding()
         self.knowledge_embedding_client.source_embedding()
 
     def knowledge_embedding_batch(self, docs):
+        """Deprecated."""
         # docs = self.knowledge_embedding_client.read_batch()
         return self.knowledge_embedding_client.index_to_store(docs)
 
     def read(self):
+        """Deprecated."""
         self.knowledge_embedding_client = self.init_knowledge_embedding()
         return self.knowledge_embedding_client.read_batch()
 
     def init_knowledge_embedding(self):
         return get_knowledge_embedding(
-            self.knowledge_type, self.knowledge_source, self.vector_store_config
+            self.knowledge_type,
+            self.knowledge_source,
+            self.vector_store_config,
+            self.source_reader,
+            self.text_splitter,
         )
 
     def similar_search(self, text, topk):
         vector_client = VectorStoreConnector(
-            CFG.VECTOR_STORE_TYPE, self.vector_store_config
+            self.vector_store_config["vector_store_type"], self.vector_store_config
         )
         try:
             ans = vector_client.similar_search(text, topk)
@@ -55,12 +71,12 @@ class KnowledgeEmbedding:
 
     def vector_exist(self):
         vector_client = VectorStoreConnector(
-            CFG.VECTOR_STORE_TYPE, self.vector_store_config
+            self.vector_store_config["vector_store_type"], self.vector_store_config
         )
         return vector_client.vector_name_exists()
 
     def delete_by_ids(self, ids):
         vector_client = VectorStoreConnector(
-            CFG.VECTOR_STORE_TYPE, self.vector_store_config
+            self.vector_store_config["vector_store_type"], self.vector_store_config
         )
         vector_client.delete_by_ids(ids=ids)

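The `EmbeddingEngine` introduced above wraps a four-step chain (read -> text_split -> data_process -> index_store) behind two calls, `knowledge_embedding` and `similar_search`. A toy, dependency-free version of that chain, with a plain list standing in for the vector store and word overlap standing in for embedding similarity (the `ToyEngine` name and all internals are illustrative, not DB-GPT code):

```python
class ToyEngine:
    def __init__(self):
        self.store = []  # stands in for Chroma/Milvus/Weaviate

    def knowledge_embedding(self, text, chunk_size=20):
        # read is the text itself here; then text_split -> data_process -> index_store
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        chunks = [c.strip().lower() for c in chunks if c.strip()]
        self.store.extend(chunks)

    def similar_search(self, query, topk):
        # crude similarity: count shared words between the query and each chunk
        tokens = set(query.lower().split())
        overlap = lambda c: len(tokens & set(c.split()))
        return sorted(self.store, key=overlap, reverse=True)[:topk]


engine = ToyEngine()
engine.knowledge_embedding("DB-GPT embeds knowledge documents into a vector store")
print(engine.similar_search("vector store", 2))
```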
@@ -11,6 +11,7 @@ from pilot.embedding_engine.word_embedding import WordEmbedding
 DocumentEmbeddingType = {
     ".txt": (MarkdownEmbedding, {}),
     ".md": (MarkdownEmbedding, {}),
+    ".html": (MarkdownEmbedding, {}),
     ".pdf": (PDFEmbedding, {}),
     ".doc": (WordEmbedding, {}),
     ".docx": (WordEmbedding, {}),
@@ -25,10 +26,23 @@ class KnowledgeType(Enum):
     URL = "URL"
     TEXT = "TEXT"
     OSS = "OSS"
+    S3 = "S3"
     NOTION = "NOTION"
+    MYSQL = "MYSQL"
+    TIDB = "TIDB"
+    CLICKHOUSE = "CLICKHOUSE"
+    OCEANBASE = "OCEANBASE"
+    ELASTICSEARCH = "ELASTICSEARCH"
+    HIVE = "HIVE"
+    PRESTO = "PRESTO"
+    KAFKA = "KAFKA"
+    SPARK = "SPARK"
+    YOUTUBE = "YOUTUBE"
 
 
-def get_knowledge_embedding(knowledge_type, knowledge_source, vector_store_config):
+def get_knowledge_embedding(
+    knowledge_type, knowledge_source, vector_store_config, source_reader, text_splitter
+):
     match knowledge_type:
         case KnowledgeType.DOCUMENT.value:
             extension = "." + knowledge_source.rsplit(".", 1)[-1]
@@ -37,6 +51,8 @@ def get_knowledge_embedding(knowledge_type, knowledge_source, vector_store_confi
             embedding = knowledge_class(
                 knowledge_source,
                 vector_store_config=vector_store_config,
+                source_reader=source_reader,
+                text_splitter=text_splitter,
                 **knowledge_args,
             )
             return embedding
@@ -45,18 +61,43 @@ def get_knowledge_embedding(knowledge_type, knowledge_source, vector_store_confi
             embedding = URLEmbedding(
                 file_path=knowledge_source,
                 vector_store_config=vector_store_config,
+                source_reader=source_reader,
+                text_splitter=text_splitter,
             )
             return embedding
         case KnowledgeType.TEXT.value:
             embedding = StringEmbedding(
                 file_path=knowledge_source,
                 vector_store_config=vector_store_config,
+                source_reader=source_reader,
+                text_splitter=text_splitter,
             )
             return embedding
         case KnowledgeType.OSS.value:
             raise Exception("OSS have not integrate")
+        case KnowledgeType.S3.value:
+            raise Exception("S3 have not integrate")
         case KnowledgeType.NOTION.value:
             raise Exception("NOTION have not integrate")
+        case KnowledgeType.MYSQL.value:
+            raise Exception("MYSQL have not integrate")
+        case KnowledgeType.TIDB.value:
+            raise Exception("TIDB have not integrate")
+        case KnowledgeType.CLICKHOUSE.value:
+            raise Exception("CLICKHOUSE have not integrate")
+        case KnowledgeType.OCEANBASE.value:
+            raise Exception("OCEANBASE have not integrate")
+        case KnowledgeType.ELASTICSEARCH.value:
+            raise Exception("ELASTICSEARCH have not integrate")
+        case KnowledgeType.HIVE.value:
+            raise Exception("HIVE have not integrate")
+        case KnowledgeType.PRESTO.value:
+            raise Exception("PRESTO have not integrate")
+        case KnowledgeType.KAFKA.value:
+            raise Exception("KAFKA have not integrate")
+        case KnowledgeType.SPARK.value:
+            raise Exception("SPARK have not integrate")
+        case KnowledgeType.YOUTUBE.value:
+            raise Exception("YOUTUBE have not integrate")
         case _:
             raise Exception("unknown knowledge type")

@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 import os
-from typing import List
+from typing import List, Optional
 
 import markdown
 from bs4 import BeautifulSoup
@@ -10,48 +10,50 @@ from langchain.text_splitter import (
     SpacyTextSplitter,
     CharacterTextSplitter,
     RecursiveCharacterTextSplitter,
+    TextSplitter,
 )
 
-from pilot.configs.config import Config
 from pilot.embedding_engine import SourceEmbedding, register
 from pilot.embedding_engine.EncodeTextLoader import EncodeTextLoader
 
-CFG = Config()
-
 
 class MarkdownEmbedding(SourceEmbedding):
     """markdown embedding for read markdown document."""
 
-    def __init__(self, file_path, vector_store_config):
-        """Initialize with markdown path."""
-        super().__init__(file_path, vector_store_config)
+    def __init__(
+        self,
+        file_path,
+        vector_store_config,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
+    ):
+        """Initialize raw text word path."""
+        super().__init__(
+            file_path, vector_store_config, source_reader=None, text_splitter=None
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
-        # self.encoding = encoding
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None
 
     @register
     def read(self):
         """Load from markdown path."""
-        loader = EncodeTextLoader(self.file_path)
-        if CFG.LANGUAGE == "en":
-            text_splitter = RecursiveCharacterTextSplitter(
-                chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-                chunk_overlap=20,
-                length_function=len,
-            )
-        else:
+        if self.source_reader is None:
+            self.source_reader = EncodeTextLoader(self.file_path)
+        if self.text_splitter is None:
             try:
-                text_splitter = SpacyTextSplitter(
+                self.text_splitter = SpacyTextSplitter(
                     pipeline="zh_core_web_sm",
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                    chunk_size=100,
                     chunk_overlap=100,
                 )
             except Exception:
-                text_splitter = RecursiveCharacterTextSplitter(
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE, chunk_overlap=50
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
                 )
-        return loader.load_and_split(text_splitter)
+
+        return self.source_reader.load_and_split(self.text_splitter)
 
     @register
     def data_process(self, documents: List[Document]):

@@ -1,56 +1,55 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
-from typing import List
+from typing import List, Optional
 
 from langchain.document_loaders import PyPDFLoader
 from langchain.schema import Document
-from langchain.text_splitter import SpacyTextSplitter, RecursiveCharacterTextSplitter
+from langchain.text_splitter import (
+    SpacyTextSplitter,
+    RecursiveCharacterTextSplitter,
+    TextSplitter,
+)
 
-from pilot.configs.config import Config
 from pilot.embedding_engine import SourceEmbedding, register
 
-CFG = Config()
-
 
 class PDFEmbedding(SourceEmbedding):
     """pdf embedding for read pdf document."""
 
-    def __init__(self, file_path, vector_store_config):
-        """Initialize with pdf path."""
-        super().__init__(file_path, vector_store_config)
+    def __init__(
+        self,
+        file_path,
+        vector_store_config,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
+    ):
+        """Initialize pdf word path."""
+        super().__init__(
+            file_path, vector_store_config, source_reader=None, text_splitter=None
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None
 
     @register
     def read(self):
         """Load from pdf path."""
-        loader = PyPDFLoader(self.file_path)
-        # textsplitter = CHNDocumentSplitter(
-        #     pdf=True, sentence_size=CFG.KNOWLEDGE_CHUNK_SIZE
-        # )
-        # textsplitter = SpacyTextSplitter(
-        #     pipeline="zh_core_web_sm",
-        #     chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-        #     chunk_overlap=100,
-        # )
-        if CFG.LANGUAGE == "en":
-            text_splitter = RecursiveCharacterTextSplitter(
-                chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-                chunk_overlap=20,
-                length_function=len,
-            )
-        else:
+        if self.source_reader is None:
+            self.source_reader = PyPDFLoader(self.file_path)
+        if self.text_splitter is None:
             try:
-                text_splitter = SpacyTextSplitter(
+                self.text_splitter = SpacyTextSplitter(
                     pipeline="zh_core_web_sm",
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                    chunk_size=100,
                     chunk_overlap=100,
                 )
             except Exception:
-                text_splitter = RecursiveCharacterTextSplitter(
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE, chunk_overlap=50
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
                 )
-        return loader.load_and_split(text_splitter)
+
+        return self.source_reader.load_and_split(self.text_splitter)
 
     @register
     def data_process(self, documents: List[Document]):

@@ -1,54 +1,55 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
-from typing import List
+from typing import List, Optional
 
 from langchain.document_loaders import UnstructuredPowerPointLoader
 from langchain.schema import Document
-from langchain.text_splitter import SpacyTextSplitter, RecursiveCharacterTextSplitter
+from langchain.text_splitter import (
+    SpacyTextSplitter,
+    RecursiveCharacterTextSplitter,
+    TextSplitter,
+)
 
-from pilot.configs.config import Config
 from pilot.embedding_engine import SourceEmbedding, register
-from pilot.embedding_engine.chn_document_splitter import CHNDocumentSplitter
-
-CFG = Config()
 
 
 class PPTEmbedding(SourceEmbedding):
     """ppt embedding for read ppt document."""
 
-    def __init__(self, file_path, vector_store_config):
-        """Initialize with pdf path."""
-        super().__init__(file_path, vector_store_config)
+    def __init__(
+        self,
+        file_path,
+        vector_store_config,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
+    ):
+        """Initialize ppt word path."""
+        super().__init__(
+            file_path, vector_store_config, source_reader=None, text_splitter=None
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None
 
     @register
     def read(self):
         """Load from ppt path."""
-        loader = UnstructuredPowerPointLoader(self.file_path)
-        # textsplitter = SpacyTextSplitter(
-        #     pipeline="zh_core_web_sm",
-        #     chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-        #     chunk_overlap=200,
-        # )
-        if CFG.LANGUAGE == "en":
-            text_splitter = RecursiveCharacterTextSplitter(
-                chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-                chunk_overlap=20,
-                length_function=len,
-            )
-        else:
+        if self.source_reader is None:
+            self.source_reader = UnstructuredPowerPointLoader(self.file_path)
+        if self.text_splitter is None:
             try:
-                text_splitter = SpacyTextSplitter(
+                self.text_splitter = SpacyTextSplitter(
                     pipeline="zh_core_web_sm",
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                    chunk_size=100,
                     chunk_overlap=100,
                 )
             except Exception:
-                text_splitter = RecursiveCharacterTextSplitter(
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE, chunk_overlap=50
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
                 )
-        return loader.load_and_split(text_splitter)
+
+        return self.source_reader.load_and_split(self.text_splitter)
 
     @register
     def data_process(self, documents: List[Document]):

@@ -4,11 +4,11 @@ from abc import ABC, abstractmethod
 from typing import Dict, List, Optional
 
 from chromadb.errors import NotEnoughElementsException
-from pilot.configs.config import Config
+from langchain.text_splitter import TextSplitter
 
 from pilot.vector_store.connector import VectorStoreConnector
 
 registered_methods = []
-CFG = Config()
 
 
 def register(method):
@@ -25,12 +25,16 @@ class SourceEmbedding(ABC):
     def __init__(
         self,
         file_path,
-        vector_store_config,
+        vector_store_config: {},
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
         embedding_args: Optional[Dict] = None,
     ):
         """Initialize with Loader url, model_name, vector_store_config"""
         self.file_path = file_path
         self.vector_store_config = vector_store_config
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None
         self.embedding_args = embedding_args
         self.embeddings = vector_store_config["embeddings"]
@@ -44,8 +48,8 @@ class SourceEmbedding(ABC):
         """pre process data."""
 
     @register
-    def text_split(self, text):
-        """text split chunk"""
+    def text_splitter(self, text_splitter: TextSplitter):
+        """add text split chunk"""
         pass
 
     @register
@@ -57,7 +61,7 @@ class SourceEmbedding(ABC):
     def index_to_store(self, docs):
         """index to vector store"""
         self.vector_client = VectorStoreConnector(
-            CFG.VECTOR_STORE_TYPE, self.vector_store_config
+            self.vector_store_config["vector_store_type"], self.vector_store_config
         )
         return self.vector_client.load_document(docs)
@@ -65,7 +69,7 @@ class SourceEmbedding(ABC):
     def similar_search(self, doc, topk):
         """vector store similarity_search"""
         self.vector_client = VectorStoreConnector(
-            CFG.VECTOR_STORE_TYPE, self.vector_store_config
+            self.vector_store_config["vector_store_type"], self.vector_store_config
         )
         try:
             ans = self.vector_client.similar_search(doc, topk)
@@ -75,7 +79,7 @@ class SourceEmbedding(ABC):
 
     def vector_name_exist(self):
         self.vector_client = VectorStoreConnector(
-            CFG.VECTOR_STORE_TYPE, self.vector_store_config
+            self.vector_store_config["vector_store_type"], self.vector_store_config
         )
         return self.vector_client.vector_name_exists()

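`SourceEmbedding` above collects its pipeline steps (`read`, `data_process`, `index_to_store`, ...) into a module-level `registered_methods` list via the tiny `@register` decorator, so a driver can run the registered steps in order. A self-contained sketch of that mechanism; the chaining loop in `source_embedding` below is a simplified guess at the dispatch, not DB-GPT's exact code:

```python
registered_methods = []


def register(method):
    # Decorators in the class body run at class-definition time,
    # so this records the step names in declaration order.
    registered_methods.append(method.__name__)
    return method


class Pipeline:
    @register
    def read(self):
        return ["raw doc"]

    @register
    def data_process(self, docs):
        return [d.upper() for d in docs]

    def source_embedding(self):
        # Feed each registered step's output into the next one.
        result = None
        for name in registered_methods:
            step = getattr(self, name)
            result = step() if result is None else step(result)
        return result


print(Pipeline().source_embedding())  # ['RAW DOC']
```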
@@ -1,24 +1,55 @@
-from typing import List
+from typing import List, Optional
 
 from langchain.schema import Document
+from langchain.text_splitter import (
+    TextSplitter,
+    SpacyTextSplitter,
+    RecursiveCharacterTextSplitter,
+)
 
-from pilot import SourceEmbedding, register
+from pilot.embedding_engine import SourceEmbedding, register
 
 
 class StringEmbedding(SourceEmbedding):
     """string embedding for read string document."""
 
-    def __init__(self, file_path, vector_store_config):
-        """Initialize with pdf path."""
-        super().__init__(file_path, vector_store_config)
+    def __init__(
+        self,
+        file_path,
+        vector_store_config,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
+    ):
+        """Initialize raw text word path."""
+        super().__init__(
+            file_path=file_path,
+            vector_store_config=vector_store_config,
+            source_reader=None,
+            text_splitter=None,
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None
 
     @register
     def read(self):
         """Load from String path."""
-        metadata = {"source": "db_summary"}
-        return [Document(page_content=self.file_path, metadata=metadata)]
+        metadata = {"source": "raw text"}
+        docs = [Document(page_content=self.file_path, metadata=metadata)]
+        if self.text_splitter is None:
+            try:
+                self.text_splitter = SpacyTextSplitter(
+                    pipeline="zh_core_web_sm",
+                    chunk_size=500,
+                    chunk_overlap=100,
+                )
+            except Exception:
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
+                )
+            return self.text_splitter.split_documents(docs)
+        return docs
 
     @register
     def data_process(self, documents: List[Document]):

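`StringEmbedding` falls back to `RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=50)` when no splitter is injected. The effect of `chunk_size`/`chunk_overlap` can be shown with a bare-bones character splitter (`split_with_overlap` is illustrative and far simpler than the langchain implementation, which also prefers splitting on separators):

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    # Each chunk starts (chunk_size - chunk_overlap) characters after the
    # previous one, so consecutive chunks share chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps sentence fragments from being cut cleanly in half at every boundary, at the cost of storing some text twice.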
@@ -1,49 +1,54 @@
-from typing import List
+from typing import List, Optional

 from bs4 import BeautifulSoup
 from langchain.document_loaders import WebBaseLoader
 from langchain.schema import Document
-from langchain.text_splitter import SpacyTextSplitter, RecursiveCharacterTextSplitter
+from langchain.text_splitter import (
+    SpacyTextSplitter,
+    RecursiveCharacterTextSplitter,
+    TextSplitter,
+)

-from pilot.configs.config import Config
-from pilot.configs.model_config import KNOWLEDGE_CHUNK_SPLIT_SIZE
 from pilot.embedding_engine import SourceEmbedding, register
-from pilot.embedding_engine.chn_document_splitter import CHNDocumentSplitter

-CFG = Config()


 class URLEmbedding(SourceEmbedding):
     """url embedding for read url document."""

-    def __init__(self, file_path, vector_store_config):
-        """Initialize with url path."""
-        super().__init__(file_path, vector_store_config)
+    def __init__(
+        self,
+        file_path,
+        vector_store_config,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
+    ):
+        """Initialize url word path."""
+        super().__init__(
+            file_path, vector_store_config, source_reader=None, text_splitter=None
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None

     @register
     def read(self):
         """Load from url path."""
-        loader = WebBaseLoader(web_path=self.file_path)
-        if CFG.LANGUAGE == "en":
-            text_splitter = RecursiveCharacterTextSplitter(
-                chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-                chunk_overlap=20,
-                length_function=len,
-            )
-        else:
+        if self.source_reader is None:
+            self.source_reader = WebBaseLoader(web_path=self.file_path)
+        if self.text_splitter is None:
             try:
-                text_splitter = SpacyTextSplitter(
+                self.text_splitter = SpacyTextSplitter(
                     pipeline="zh_core_web_sm",
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                    chunk_size=100,
                     chunk_overlap=100,
                 )
             except Exception:
-                text_splitter = RecursiveCharacterTextSplitter(
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE, chunk_overlap=50
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
                 )
-        return loader.load_and_split(text_splitter)
+        return self.source_reader.load_and_split(self.text_splitter)

     @register
     def data_process(self, documents: List[Document]):
@@ -1,48 +1,55 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
-from typing import List
+from typing import List, Optional

 from langchain.document_loaders import UnstructuredWordDocumentLoader
 from langchain.schema import Document
-from langchain.text_splitter import SpacyTextSplitter, RecursiveCharacterTextSplitter
+from langchain.text_splitter import (
+    SpacyTextSplitter,
+    RecursiveCharacterTextSplitter,
+    TextSplitter,
+)

-from pilot.configs.config import Config
 from pilot.embedding_engine import SourceEmbedding, register

-CFG = Config()


 class WordEmbedding(SourceEmbedding):
     """word embedding for read word document."""

-    def __init__(self, file_path, vector_store_config):
+    def __init__(
+        self,
+        file_path,
+        vector_store_config,
+        source_reader: Optional = None,
+        text_splitter: Optional[TextSplitter] = None,
+    ):
         """Initialize with word path."""
-        super().__init__(file_path, vector_store_config)
+        super().__init__(
+            file_path, vector_store_config, source_reader=None, text_splitter=None
+        )
         self.file_path = file_path
         self.vector_store_config = vector_store_config
+        self.source_reader = source_reader or None
+        self.text_splitter = text_splitter or None

     @register
     def read(self):
         """Load from word path."""
-        loader = UnstructuredWordDocumentLoader(self.file_path)
-        if CFG.LANGUAGE == "en":
-            text_splitter = RecursiveCharacterTextSplitter(
-                chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
-                chunk_overlap=20,
-                length_function=len,
-            )
-        else:
+        if self.source_reader is None:
+            self.source_reader = UnstructuredWordDocumentLoader(self.file_path)
+        if self.text_splitter is None:
             try:
-                text_splitter = SpacyTextSplitter(
+                self.text_splitter = SpacyTextSplitter(
                     pipeline="zh_core_web_sm",
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                    chunk_size=100,
                     chunk_overlap=100,
                 )
             except Exception:
-                text_splitter = RecursiveCharacterTextSplitter(
-                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE, chunk_overlap=50
+                self.text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=100, chunk_overlap=50
                 )
-        return loader.load_and_split(text_splitter)
+        return self.source_reader.load_and_split(self.text_splitter)

     @register
     def data_process(self, documents: List[Document]):
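The fallback branch in these `read()` methods splits text into fixed-size chunks with overlap (`chunk_size=100, chunk_overlap=50`). A minimal, dependency-free sketch of what that fixed-size splitting does — this is an illustration of the chunking arithmetic, not langchain's `RecursiveCharacterTextSplitter` implementation:

```python
def split_text(text: str, chunk_size: int = 100, chunk_overlap: int = 50) -> list:
    """Fixed-size chunking with overlap: each chunk starts
    chunk_size - chunk_overlap characters after the previous one, so
    consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Stop once the remaining text is already covered by the previous chunk.
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]
```

For a 250-character input with the diff's fallback parameters this yields four 100-character chunks, each sharing its first 50 characters with the tail of the previous chunk.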
@@ -50,7 +50,7 @@ prompt = PromptTemplate(
     output_parser=DbChatOutputParser(
         sep=PROMPT_SEP, is_stream_out=PROMPT_NEED_NEED_STREAM_OUT
     ),
-    example_selector=sql_data_example,
+    # example_selector=sql_data_example,
    temperature=PROMPT_TEMPERATURE,
 )
 CFG.prompt_templates.update({prompt.template_scene: prompt})
@@ -17,7 +17,7 @@ from pilot.configs.model_config import (
 )

 from pilot.scene.chat_knowledge.custom.prompt import prompt
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine

 CFG = Config()

@@ -37,10 +37,10 @@ class ChatNewKnowledge(BaseChat):
         self.knowledge_name = knowledge_name
         vector_store_config = {
             "vector_store_name": knowledge_name,
-            "text_field": "content",
-            "vector_store_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         }
-        self.knowledge_embedding_client = KnowledgeEmbedding(
+        self.knowledge_embedding_client = EmbeddingEngine(
             model_name=LLM_MODEL_CONFIG["text2vec"],
             vector_store_config=vector_store_config,
         )
@@ -19,7 +19,7 @@ from pilot.configs.model_config import (
 )

 from pilot.scene.chat_knowledge.default.prompt import prompt
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine

 CFG = Config()

@@ -38,9 +38,10 @@ class ChatDefaultKnowledge(BaseChat):
         )
         vector_store_config = {
             "vector_store_name": "default",
-            "vector_store_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         }
-        self.knowledge_embedding_client = KnowledgeEmbedding(
+        self.knowledge_embedding_client = EmbeddingEngine(
             model_name=LLM_MODEL_CONFIG["text2vec"],
             vector_store_config=vector_store_config,
         )
@@ -18,7 +18,7 @@ from pilot.configs.model_config import (
 )

 from pilot.scene.chat_knowledge.url.prompt import prompt
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine

 CFG = Config()

@@ -38,9 +38,10 @@ class ChatUrlKnowledge(BaseChat):
         self.url = url
         vector_store_config = {
             "vector_store_name": url.replace(":", ""),
-            "vector_store_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         }
-        self.knowledge_embedding_client = KnowledgeEmbedding(
+        self.knowledge_embedding_client = EmbeddingEngine(
             model_name=LLM_MODEL_CONFIG[CFG.EMBEDDING_MODEL],
             vector_store_config=vector_store_config,
             knowledge_type=KnowledgeType.URL.value,
@@ -19,7 +19,7 @@ from pilot.configs.model_config import (
 )

 from pilot.scene.chat_knowledge.v1.prompt import prompt
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine

 CFG = Config()

@@ -38,9 +38,10 @@ class ChatKnowledge(BaseChat):
         )
         vector_store_config = {
             "vector_store_name": knowledge_space,
-            "vector_store_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         }
-        self.knowledge_embedding_client = KnowledgeEmbedding(
+        self.knowledge_embedding_client = EmbeddingEngine(
             model_name=LLM_MODEL_CONFIG[CFG.EMBEDDING_MODEL],
             vector_store_config=vector_store_config,
         )
@@ -1,3 +1,4 @@
+import atexit
 import traceback
 import os
 import shutil
@@ -36,7 +37,7 @@ CFG = Config()
 logger = build_logger("webserver", LOGDIR + "webserver.log")


-def signal_handler(sig, frame):
+def signal_handler():
     print("in order to avoid chroma db atexit problem")
     os._exit(0)

@@ -96,7 +97,6 @@ if __name__ == "__main__":
         action="store_true",
         help="enable light mode",
     )
-    signal.signal(signal.SIGINT, signal_handler)

     # init server config
     args = parser.parse_args()
@@ -114,3 +114,4 @@ if __name__ == "__main__":
     import uvicorn

     uvicorn.run(app, host="0.0.0.0", port=args.port)
+    signal.signal(signal.SIGINT, signal_handler())
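The chroma exit fix above relies on `os._exit()` terminating the interpreter immediately, so the atexit hooks that ChromaDB registers never run; the rewritten server calls the zero-argument `signal_handler()` once `uvicorn.run()` returns, forcing that hard exit on shutdown. A sketch of the original SIGINT-handler form of the pattern (kept generic here, not DB-GPT's exact server wiring):

```python
import os
import signal


def signal_handler(sig, frame):
    """Exit without running atexit hooks: os._exit() terminates the
    interpreter immediately, unlike sys.exit(), so cleanup callbacks
    registered via atexit (e.g. by ChromaDB) are skipped."""
    print("in order to avoid chroma db atexit problem")
    os._exit(0)


# signal.signal expects a callable taking (signum, frame); passing the
# function object registers it for Ctrl-C.
signal.signal(signal.SIGINT, signal_handler)
```

Note the difference between `signal.signal(signal.SIGINT, signal_handler)` (register for later) and `signal_handler()` (invoke now): the commit uses the latter deliberately, as a hard exit executed after the server loop ends.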
@@ -10,7 +10,7 @@ from pilot.configs.config import Config
 from pilot.configs.model_config import LLM_MODEL_CONFIG, KNOWLEDGE_UPLOAD_ROOT_PATH

 from pilot.openapi.api_v1.api_view_model import Result
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine

 from pilot.server.knowledge.service import KnowledgeService
 from pilot.server.knowledge.request.request import (
@@ -143,7 +143,7 @@ def document_list(space_name: str, query_request: ChunkQueryRequest):
 @router.post("/knowledge/{vector_name}/query")
 def similar_query(space_name: str, query_request: KnowledgeQueryRequest):
     print(f"Received params: {space_name}, {query_request}")
-    client = KnowledgeEmbedding(
+    client = EmbeddingEngine(
         model_name=embeddings, vector_store_config={"vector_store_name": space_name}
     )
     docs = client.similar_search(query_request.query, query_request.top_k)
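Conceptually, the `similar_search(query, top_k)` call above asks the vector store for the `top_k` stored chunks whose embeddings are closest to the query embedding. A toy, self-contained illustration of that ranking step using cosine similarity — the real engine delegates this to the configured store (Chroma or Milvus), and the function and vectors here are invented for the sketch:

```python
import math


def similar_search(query_vec, doc_vecs, top_k):
    """Rank named document vectors by cosine similarity to query_vec and
    return the names of the top_k closest matches."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    ranked = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

With `doc_vecs = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}` and query `[1, 0]`, the top-2 result is `["a", "c"]`: `a` matches exactly, `c` at 45 degrees, `b` is orthogonal.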
@@ -1,9 +1,11 @@
 import threading
 from datetime import datetime

+from langchain.text_splitter import RecursiveCharacterTextSplitter, SpacyTextSplitter
+
 from pilot.configs.config import Config
-from pilot.configs.model_config import LLM_MODEL_CONFIG
+from pilot.configs.model_config import LLM_MODEL_CONFIG, KNOWLEDGE_UPLOAD_ROOT_PATH
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine
 from pilot.logs import logger
 from pilot.server.knowledge.chunk_db import (
     DocumentChunkEntity,
@@ -122,13 +124,34 @@ class KnowledgeService:
             raise Exception(
                 f" doc:{doc.doc_name} status is {doc.status}, can not sync"
             )
-        client = KnowledgeEmbedding(
+
+        if CFG.LANGUAGE == "en":
+            text_splitter = RecursiveCharacterTextSplitter(
+                chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                chunk_overlap=20,
+                length_function=len,
+            )
+        else:
+            try:
+                text_splitter = SpacyTextSplitter(
+                    pipeline="zh_core_web_sm",
+                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE,
+                    chunk_overlap=100,
+                )
+            except Exception:
+                text_splitter = RecursiveCharacterTextSplitter(
+                    chunk_size=CFG.KNOWLEDGE_CHUNK_SIZE, chunk_overlap=50
+                )
+        client = EmbeddingEngine(
             knowledge_source=doc.content,
             knowledge_type=doc.doc_type.upper(),
             model_name=LLM_MODEL_CONFIG[CFG.EMBEDDING_MODEL],
             vector_store_config={
                 "vector_store_name": space_name,
+                "vector_store_type": CFG.VECTOR_STORE_TYPE,
+                "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
             },
+            text_splitter=text_splitter,
         )
         chunk_docs = client.read()
         # update document status
@@ -37,7 +37,7 @@ from pilot.conversation import (

 from pilot.server.gradio_css import code_highlight_css
 from pilot.server.gradio_patch import Chatbot as grChatbot
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine
 from pilot.utils import build_logger
 from pilot.vector_store.extract_tovec import (
     get_vector_storelist,
@@ -659,13 +659,14 @@ def knowledge_embedding_store(vs_id, files):
         shutil.move(
             file.name, os.path.join(KNOWLEDGE_UPLOAD_ROOT_PATH, vs_id, filename)
         )
-    knowledge_embedding_client = KnowledgeEmbedding(
+    knowledge_embedding_client = EmbeddingEngine(
         knowledge_source=os.path.join(KNOWLEDGE_UPLOAD_ROOT_PATH, vs_id, filename),
         knowledge_type=KnowledgeType.DOCUMENT.value,
         model_name=LLM_MODEL_CONFIG["text2vec"],
         vector_store_config={
             "vector_store_name": vector_store_name["vs_name"],
-            "vector_store_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         },
     )
     knowledge_embedding_client.knowledge_embedding()
@@ -4,10 +4,10 @@ import uuid
 from langchain.embeddings import HuggingFaceEmbeddings, logger

 from pilot.configs.config import Config
-from pilot.configs.model_config import LLM_MODEL_CONFIG
+from pilot.configs.model_config import LLM_MODEL_CONFIG, KNOWLEDGE_UPLOAD_ROOT_PATH
 from pilot.scene.base import ChatScene
 from pilot.scene.base_chat import BaseChat
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine
 from pilot.embedding_engine.string_embedding import StringEmbedding
 from pilot.summary.mysql_db_summary import MysqlSummary
 from pilot.scene.chat_factory import ChatFactory
@@ -33,6 +33,8 @@ class DBSummaryClient:
         )
         vector_store_config = {
             "vector_store_name": dbname + "_summary",
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
             "embeddings": embeddings,
         }
         embedding = StringEmbedding(
@@ -60,6 +62,8 @@ class DBSummaryClient:
         ) in db_summary_client.get_table_summary().items():
             table_vector_store_config = {
                 "vector_store_name": dbname + "_" + table_name + "_ts",
+                "vector_store_type": CFG.VECTOR_STORE_TYPE,
+                "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
                 "embeddings": embeddings,
             }
             embedding = StringEmbedding(
@@ -73,8 +77,10 @@ class DBSummaryClient:
     def get_db_summary(self, dbname, query, topk):
         vector_store_config = {
             "vector_store_name": dbname + "_profile",
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         }
-        knowledge_embedding_client = KnowledgeEmbedding(
+        knowledge_embedding_client = EmbeddingEngine(
             model_name=LLM_MODEL_CONFIG[CFG.EMBEDDING_MODEL],
             vector_store_config=vector_store_config,
         )
@@ -86,8 +92,11 @@ class DBSummaryClient:
         """get user query related tables info"""
         vector_store_config = {
             "vector_store_name": dbname + "_summary",
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
         }
-        knowledge_embedding_client = KnowledgeEmbedding(
+        knowledge_embedding_client = EmbeddingEngine(
             model_name=LLM_MODEL_CONFIG[CFG.EMBEDDING_MODEL],
             vector_store_config=vector_store_config,
         )
@@ -109,9 +118,11 @@ class DBSummaryClient:
         for table in related_tables:
             vector_store_config = {
                 "vector_store_name": dbname + "_" + table + "_ts",
+                "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+                "vector_store_type": CFG.VECTOR_STORE_TYPE,
+                "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
             }
-            knowledge_embedding_client = KnowledgeEmbedding(
-                file_path="",
+            knowledge_embedding_client = EmbeddingEngine(
                 model_name=LLM_MODEL_CONFIG[CFG.EMBEDDING_MODEL],
                 vector_store_config=vector_store_config,
             )
@@ -128,6 +139,8 @@ class DBSummaryClient:
     def init_db_profile(self, db_summary_client, dbname, embeddings):
         profile_store_config = {
             "vector_store_name": dbname + "_profile",
+            "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+            "vector_store_type": CFG.VECTOR_STORE_TYPE,
             "embeddings": embeddings,
         }
         embedding = StringEmbedding(
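Two of the hunks above add `"chroma_persist_path"` twice inside the same dict literal. This is harmless in Python: a dict literal keeps only one entry per key, with the value of the last occurrence at the position of the first. A quick demonstration (the `"/tmp/kb"` path is just a stand-in value for the sketch):

```python
# Duplicate keys in a dict literal collapse silently; no error is raised.
vector_store_config = {
    "vector_store_name": "db1_summary",
    "chroma_persist_path": "/tmp/kb",
    "vector_store_type": "Chroma",
    "chroma_persist_path": "/tmp/kb",
}
```

The resulting dict has three keys, so the duplicated line in the commit is redundant rather than incorrect.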
@@ -1,7 +1,6 @@
 import os

 from langchain.vectorstores import Chroma
-from pilot.configs.model_config import KNOWLEDGE_UPLOAD_ROOT_PATH
 from pilot.logs import logger
 from pilot.vector_store.vector_store_base import VectorStoreBase

@@ -13,7 +12,7 @@ class ChromaStore(VectorStoreBase):
         self.ctx = ctx
         self.embeddings = ctx["embeddings"]
         self.persist_dir = os.path.join(
-            KNOWLEDGE_UPLOAD_ROOT_PATH, ctx["vector_store_name"] + ".vectordb"
+            ctx["chroma_persist_path"], ctx["vector_store_name"] + ".vectordb"
         )
         self.vector_store_client = Chroma(
             persist_directory=self.persist_dir, embedding_function=self.embeddings
@@ -1,12 +1,18 @@
 from pilot.vector_store.chroma_store import ChromaStore

-# from pilot.vector_store.milvus_store import MilvusStore
+from pilot.vector_store.milvus_store import MilvusStore

-connector = {"Chroma": ChromaStore, "Milvus": None}
+connector = {"Chroma": ChromaStore, "Milvus": MilvusStore}


 class VectorStoreConnector:
-    """vector store connector, can connect different vector db provided load document api_v1 and similar search api_v1."""
+    """VectorStoreConnector, can connect different vector db provided load document api_v1 and similar search api_v1.
+    1.load_document:knowledge document source into vector store.(Chroma, Milvus, Weaviate)
+    2.similar_search: similarity search from vector_store
+    how to use reference:https://db-gpt.readthedocs.io/en/latest/modules/vector.html
+    how to integrate:https://db-gpt.readthedocs.io/en/latest/modules/vector/milvus/milvus.html
+
+    """

     def __init__(self, vector_store_type, ctx: {}) -> None:
         """initialize vector store connector."""
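The connector above is a plain registry: a dict maps a store name ("Chroma", "Milvus") to its class, and the connector instantiates whichever class the config names. A self-contained sketch of that dispatch shape — the two store classes here are trivial stand-ins, not the real `pilot.vector_store` implementations:

```python
class ChromaStore:  # stand-in for pilot.vector_store.chroma_store.ChromaStore
    def __init__(self, ctx):
        self.ctx = ctx


class MilvusStore:  # stand-in for pilot.vector_store.milvus_store.MilvusStore
    def __init__(self, ctx):
        self.ctx = ctx


# Registry: store-type name -> concrete class.
connector = {"Chroma": ChromaStore, "Milvus": MilvusStore}


class VectorStoreConnector:
    """Pick and instantiate the concrete store class named in the config."""

    def __init__(self, vector_store_type, ctx):
        self.vector_store_client = connector[vector_store_type](ctx)


store = VectorStoreConnector("Chroma", {"vector_store_name": "default"})
```

Adding a new backend then only requires registering another class in `connector`; callers keep passing a `vector_store_type` string and a `ctx` dict.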
@@ -3,13 +3,9 @@ from typing import Any, Iterable, List, Optional, Tuple
 from langchain.docstore.document import Document
 from pymilvus import Collection, DataType, connections, utility

-from pilot.configs.config import Config
 from pilot.vector_store.vector_store_base import VectorStoreBase


-CFG = Config()
-
-
 class MilvusStore(VectorStoreBase):
     """Milvus database"""

@@ -22,10 +18,10 @@ class MilvusStore(VectorStoreBase):
         # self.configure(cfg)

         connect_kwargs = {}
-        self.uri = CFG.MILVUS_URL
-        self.port = CFG.MILVUS_PORT
-        self.username = CFG.MILVUS_USERNAME
-        self.password = CFG.MILVUS_PASSWORD
+        self.uri = ctx.get("milvus_url", None)
+        self.port = ctx.get("milvus_port", None)
+        self.username = ctx.get("milvus_username", None)
+        self.password = ctx.get("milvus_password", None)
         self.collection_name = ctx.get("vector_store_name", None)
         self.secure = ctx.get("secure", None)
         self.embedding = ctx.get("embeddings", None)
setup.py
@@ -17,9 +17,9 @@ def parse_requirements(file_name: str) -> List[str]:


 setuptools.setup(
-    name="DB-GPT",
+    name="db-gpt",
     packages=find_packages(),
-    version="0.3.0",
+    version="0.3.1",
     author="csunny",
     author_email="cfqcsunny@gmail.com",
     description="DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment."
tests/unit/embedding_engine/test_url_embedding.py (new file)
@@ -0,0 +1,20 @@
+from pilot import EmbeddingEngine, KnowledgeType
+
+url = "https://db-gpt.readthedocs.io/en/latest/getting_started/getting_started.html"
+embedding_model = "text2vec"
+vector_store_type = "Chroma"
+chroma_persist_path = "your_persist_path"
+vector_store_config = {
+    "vector_store_name": url.replace(":", ""),
+    "vector_store_type": vector_store_type,
+    "chroma_persist_path": chroma_persist_path,
+}
+embedding_engine = EmbeddingEngine(
+    knowledge_source=url,
+    knowledge_type=KnowledgeType.URL.value,
+    model_name=embedding_model,
+    vector_store_config=vector_store_config,
+)
+
+# embedding url content to vector store
+embedding_engine.knowledge_embedding()
@@ -15,8 +15,9 @@ from pilot.configs.config import Config
 from pilot.configs.model_config import (
     DATASETS_DIR,
     LLM_MODEL_CONFIG,
+    KNOWLEDGE_UPLOAD_ROOT_PATH,
 )
-from pilot.embedding_engine.knowledge_embedding import KnowledgeEmbedding
+from pilot.embedding_engine.embedding_engine import EmbeddingEngine

 knowledge_space_service = KnowledgeService()

@@ -37,7 +38,7 @@ class LocalKnowledgeInit:
         for root, _, files in os.walk(file_path, topdown=False):
             for file in files:
                 filename = os.path.join(root, file)
-                ke = KnowledgeEmbedding(
+                ke = EmbeddingEngine(
                     knowledge_source=filename,
                     knowledge_type=KnowledgeType.DOCUMENT.value,
                     model_name=self.model_name,
@@ -68,7 +69,11 @@ if __name__ == "__main__":
     args = parser.parse_args()
     vector_name = args.vector_name
     store_type = CFG.VECTOR_STORE_TYPE
-    vector_store_config = {"vector_store_name": vector_name}
+    vector_store_config = {
+        "vector_store_name": vector_name,
+        "vector_store_type": CFG.VECTOR_STORE_TYPE,
+        "chroma_persist_path": KNOWLEDGE_UPLOAD_ROOT_PATH,
+    }
     print(vector_store_config)
     kv = LocalKnowledgeInit(vector_store_config=vector_store_config)
     kv.knowledge_persist(file_path=DATASETS_DIR)