docs: update chat_knowledge.md and wechat.jpg (#1035)

Aries-ckt
2024-01-05 16:37:33 +08:00
committed by GitHub
parent 186b6a5668
commit 5bc419679b
3 changed files with 18 additions and 1 deletions

@@ -56,6 +56,23 @@ and click Process, it will take a few minutes to complete the document segmentat
<img src={'/img/chat_knowledge/doc_segmentation.png'} width="720px" />
</p>
:::tip
**Automatic: The document is automatically segmented according to the document type.**
**Chunk size: segmentation by a fixed chunk size.**
- chunk size: The number of words in each segment of the document. The default is 512 words.
- chunk overlap: The number of words overlapped between each segment of the document. The default is 50 words.
**Separator: segmentation by separator.**
- separator: The separator of the document. The default is `\n`.
- enable_merge: Whether to merge the chunks produced by the separator split according to chunk_size after splitting. The default is `False`.
**Page: page segmentation; only supports .pdf and .pptx documents.**
**Paragraph: paragraph segmentation; only supports .docx documents.**
- separator: The paragraph separator of the document. The default is `\n`.
**Markdown header: markdown header segmentation; only supports .md documents.**
:::
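
To make the chunk size, chunk overlap, separator, and enable_merge options above more concrete, here is a minimal Python sketch of how such strategies could behave. It is an illustration only, not the project's actual implementation: the function names `split_by_chunk_size` and `split_by_separator` are hypothetical, and it assumes that "words" means whitespace-delimited tokens.

```python
from typing import List


def split_by_chunk_size(
    text: str, chunk_size: int = 512, chunk_overlap: int = 50
) -> List[str]:
    """Hypothetical helper: split text into windows of `chunk_size` words,
    where consecutive windows share `chunk_overlap` words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()  # assumption: a "word" is a whitespace-delimited token
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def split_by_separator(
    text: str,
    separator: str = "\n",
    enable_merge: bool = False,
    chunk_size: int = 512,
) -> List[str]:
    """Hypothetical helper: split on `separator`; optionally merge adjacent
    pieces as long as the merged chunk stays within `chunk_size` words."""
    pieces = [p for p in text.split(separator) if p.strip()]
    if not enable_merge:
        return pieces
    merged: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in pieces:
        n = len(piece.split())
        # Close the current chunk if adding this piece would exceed chunk_size.
        if current and current_len + n > chunk_size:
            merged.append(separator.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += n
    if current:
        merged.append(separator.join(current))
    return merged
```

With the defaults, each chunk-size segment starts 462 words after the previous one (512 minus the 50-word overlap), so the last 50 words of one segment reappear at the start of the next. Leaving enable_merge at `False` keeps every separator-delimited piece as its own chunk, however short it is.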
### Waiting for document vectorization