docs: update chat_knowledge.md and wechat.jpg (#1035)

2025-09-14 13:40:54 +00:00 · 2024-01-05 16:37:33 +08:00
parent 186b6a5668
commit 5bc419679b
3 changed files with 18 additions and 1 deletion
--- a/assets/wechat.jpg
+++ b/assets/wechat.jpg
--- a/docs/docs/application/started_tutorial/chat_knowledge.md
+++ b/docs/docs/application/started_tutorial/chat_knowledge.md
@@ -56,6 +56,23 @@ and click Process, it will take a few minutes to complete the document segmentat
  <img src={'/img/chat_knowledge/doc_segmentation.png'} width="720px" />
 </p>

+:::tip
+The following segmentation strategies are available:
+
+- **Automatic**: the document is segmented automatically according to its type.
+- **Chunk size**: the document is split into fixed-size chunks.
+  - chunk size: the number of words in each chunk. The default is 512 words.
+  - chunk overlap: the number of words shared by adjacent chunks. The default is 50 words.
+- **Separator**: the document is split at a separator.
+  - separator: the separator used to split the document. The default is `\n`.
+  - enable_merge: whether to merge the separator-split chunks according to chunk size after splitting. The default is `False`.
+- **Page**: page segmentation. Only .pdf and .pptx documents are supported.
+- **Paragraph**: paragraph segmentation. Only .docx documents are supported.
+  - separator: the paragraph separator of the document. The default is `\n`.
+- **Markdown header**: segmentation by Markdown headers. Only .md documents are supported.
+
+:::
+

 ### Waiting for document vectorization

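The chunk size and chunk overlap settings in the tip above can be illustrated with a minimal Python sketch. It is not DB-GPT's own splitter. It assumes that "words" means whitespace-separated tokens and that each chunk starts `chunk_size - chunk_overlap` words after the previous one. `split_by_chunk_size` is a hypothetical helper name used only for illustration.

```python
from typing import List


def split_by_chunk_size(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> List[str]:
    """Hypothetical helper: split text into chunks of at most `chunk_size` words.

    Adjacent chunks share `chunk_overlap` words. "Words" are assumed to be
    whitespace-separated tokens; this is an illustration, not DB-GPT's splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1000))
    for chunk in split_by_chunk_size(sample):
        words = chunk.split()
        print(len(words), words[0], words[-1])
```

With the defaults, a 1,000-word text yields three chunks of 512, 512 and 76 words, and each pair of adjacent chunks shares 50 words. The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing those words twice.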
--- a/setup.py
+++ b/setup.py
@@ -638,7 +638,7 @@ init_install_requires()
 setuptools.setup(
    name="db-gpt",
    packages=find_packages(exclude=("tests", "*.tests", "*.tests.*", "examples")),
-    version="0.4.4",
+    version="0.4.5",
    author="csunny",
    author_email="cfqcsunny@gmail.com",
    description="DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment."
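The Separator strategy and its `enable_merge` option, described in the tip in `chat_knowledge.md`, can be sketched in Python as follows. This is an assumption-laden illustration, not DB-GPT's implementation. It assumes that empty pieces are dropped, that merging greedily packs consecutive pieces while their total word count stays within `chunk_size`, and that a single piece longer than `chunk_size` is kept whole. `split_by_separator` is a hypothetical helper name.

```python
from typing import List


def split_by_separator(
    text: str,
    separator: str = "\n",
    enable_merge: bool = False,
    chunk_size: int = 512,
) -> List[str]:
    """Hypothetical helper: split at `separator`, optionally merging small pieces.

    Assumptions (not taken from DB-GPT): empty pieces are dropped, word counts
    use whitespace tokens, and an oversized piece is kept whole, not re-split.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    if not enable_merge:
        return pieces

    merged: List[str] = []
    current: List[str] = []
    current_words = 0
    for piece in pieces:
        n = len(piece.split())
        # Start a new chunk if adding this piece would exceed chunk_size.
        if current and current_words + n > chunk_size:
            merged.append(separator.join(current))
            current, current_words = [], 0
        current.append(piece)
        current_words += n
    if current:
        merged.append(separator.join(current))
    return merged


if __name__ == "__main__":
    doc = "a b\nc d e\n\nf"
    print(split_by_separator(doc))
    print(split_by_separator(doc, enable_merge=True, chunk_size=5))
```

Without merging, every non-empty line becomes its own chunk, which can produce many tiny chunks. With `enable_merge` enabled, neighbouring lines are packed together up to the chunk size, so each chunk carries more context.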