feat: embedding api

1. embedding_engine: add source_reader param
2. docs update
3. fix chroma exit bug
@@ -25,22 +25,25 @@ $ docker run --name=mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=aa12345678 -dit my
We use [Chroma embedding database](https://github.com/chroma-core/chroma) as the default for our vector database, so there is no need for special installation. If you choose to connect to other databases, you can follow our tutorial for installation and configuration.
For the entire installation process of DB-GPT, we use the miniconda3 virtual environment. Create a virtual environment and install the Python dependencies.

```bash
python>=3.10
conda create -n dbgpt_env python=3.10
conda activate dbgpt_env
pip install -r requirements.txt
```
Before using DB-GPT Knowledge Management, run:

```bash
python -m spacy download zh_core_web_sm
```
Once the environment is installed, create a new folder named "models" in the DB-GPT project, and put all the models downloaded from HuggingFace into this directory.

```{tip}
Notice: make sure you have installed git-lfs.
```

```bash
git clone https://huggingface.co/Tribbiani/vicuna-13b
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese
```
@@ -4,11 +4,13 @@ DB-GPT provides a third-party Python API package that you can integrate into you
### Installation from Pip
You can simply install it with pip:

```bash
pip install -i https://pypi.org/simple/ db-gpt==0.3.0
```

```{tip}
Notice: make sure python>=3.10
```

### Environment Setup
@@ -16,8 +18,11 @@ By default, if you use the EmbeddingEngine api

you will need to prepare embedding models from HuggingFace.

```{tip}
Notice: make sure you have installed git-lfs.
```

```bash
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
git clone https://huggingface.co/GanymedeNil/text2vec-large-chinese
```
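
Once cloned, the embedding model is referenced through the `model_name` parameter of the EmbeddingEngine shown in the knowledge examples. A minimal sketch, assuming the models were cloned into a local `models/` folder; the folder layout and the use of a local path as `model_name` are assumptions, not part of the official API:

```python
import os

# Assumed layout: embedding models cloned into a local "models" folder,
# as in the git clone commands above.
MODEL_ROOT = os.path.join(os.getcwd(), "models")
embedding_model = os.path.join(MODEL_ROOT, "text2vec-large-chinese")

# This path (or, typically, a HuggingFace model id) is what gets passed as
# model_name=embedding_model when constructing the EmbeddingEngine below.
print(embedding_model)
```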
@@ -4,13 +4,13 @@ Knowledge
As the knowledge base is currently the most significant user demand scenario, we natively support the construction and processing of knowledge bases. At the same time, we also provide multiple knowledge base management strategies in this project, such as pdf, md, txt, word, and ppt knowledge.
We currently support many document formats: raw text, txt, pdf, md, html, doc, ppt, and url.
In the future, we will continue to support more types of knowledge, including audio, video, various databases, and big data sources. Of course, we look forward to your active participation in contributing code.
**Create your own knowledge repository**
1. Prepare

We currently support many document formats: TEXT (raw text), DOCUMENT (.txt, .pdf, .md, .doc, .ppt, .html), and URL.
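
As a rough illustration of how these source kinds map onto the API used below, here is a minimal sketch. Only `KnowledgeType.URL` is confirmed by the example further down; the `TEXT` and `DOCUMENT` member names and the import path are assumptions:

```python
# Minimal sketch: picking a knowledge type per source kind.
# The import path and the TEXT/DOCUMENT member names are assumptions;
# only KnowledgeType.URL appears in the example below.
from pilot.embedding_engine.knowledge_type import KnowledgeType

sources = {
    KnowledgeType.TEXT.value: "DB-GPT is a database-focused GPT project ...",  # raw text
    KnowledgeType.DOCUMENT.value: "./docs/demo.md",                            # a local document
    KnowledgeType.URL.value: "https://db-gpt.readthedocs.io/en/latest/",       # a web page
}

for knowledge_type, knowledge_source in sources.items():
    print(knowledge_type, "->", knowledge_source)
```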
Before execution:

@@ -72,12 +72,13 @@ eg: git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
vector_store_config=vector_store_config)
embedding_engine.knowledge_embedding()

If you want to add your source_reader or text_splitter, do this:

::
url = "https://db-gpt.readthedocs.io/en/latest/getting_started/getting_started.html"
source_reader = WebBaseLoader(web_path=url)
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=100, chunk_overlap=50
)
@@ -86,6 +87,7 @@ If you want to add your text_splitter, do this:
knowledge_type=KnowledgeType.URL.value,
model_name=embedding_model,
vector_store_config=vector_store_config,
source_reader=source_reader,
text_splitter=text_splitter
)
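
Pieced together, the custom source_reader / text_splitter flow looks roughly like the sketch below. The import paths, the `knowledge_source` parameter name, and the contents of `vector_store_config` are assumptions not confirmed by this diff; only `knowledge_type`, `model_name`, `vector_store_config`, `source_reader`, `text_splitter`, and `knowledge_embedding()` appear in the snippet above.

```python
# A hedged, self-contained sketch of the custom source_reader/text_splitter flow.
# Import paths, the knowledge_source parameter name, and the vector_store_config
# keys are assumptions; adapt them to your installed db-gpt version.
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from pilot.embedding_engine.embedding_engine import EmbeddingEngine  # assumed import path
from pilot.embedding_engine.knowledge_type import KnowledgeType      # assumed import path

url = "https://db-gpt.readthedocs.io/en/latest/getting_started/getting_started.html"
embedding_model = "./models/text2vec-large-chinese"                  # local model cloned earlier
vector_store_config = {"vector_store_name": "url_knowledge_space"}   # assumed minimal config

# Custom reader and splitter, as in the snippet above.
source_reader = WebBaseLoader(web_path=url)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=50)

embedding_engine = EmbeddingEngine(
    knowledge_source=url,                     # assumed parameter name for the source
    knowledge_type=KnowledgeType.URL.value,
    model_name=embedding_model,
    vector_store_config=vector_store_config,
    source_reader=source_reader,
    text_splitter=text_splitter,
)
embedding_engine.knowledge_embedding()        # read -> split -> embed -> store
```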