From 922693caba2ddddc970d922f6894a70776d466e7 Mon Sep 17 00:00:00 2001
From: Leonid Ganeline
Date: Tue, 19 Dec 2023 07:58:16 -0800
Subject: [PATCH] docs: `chunkviz` reference (#14802)

Added a reference to the `Chunkviz` utility.
---
 .../data_connection/document_transformers/index.mdx | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/docs/modules/data_connection/document_transformers/index.mdx b/docs/docs/modules/data_connection/document_transformers/index.mdx
index bf7e4a05372..b6cabe2d3df 100644
--- a/docs/docs/modules/data_connection/document_transformers/index.mdx
+++ b/docs/docs/modules/data_connection/document_transformers/index.mdx
@@ -39,7 +39,6 @@ In addition to controlling which characters you can split on, you can also contr
 - `chunk_overlap`: the maximum overlap between chunks. It can be nice to have some overlap to maintain some continuity between chunks (e.g. do a sliding window).
 - `add_start_index`: whether to include the starting position of each chunk within the original document in the metadata.
 
-
 ```python
 # This is a long document we can split up.
 with open('../../state_of_the_union.txt') as f:
@@ -79,6 +78,13 @@ print(texts[1])
+### Evaluate text splitters
+
+You can evaluate text splitters with the [Chunkviz utility](https://www.chunkviz.com/) created by `Greg Kamradt`.
+`Chunkviz` is a great tool for visualizing how your text splitter is working. It will show you how your text is
+being split up and help in tuning up the splitting parameters.
+
+
 ## Other transformations:
 ### Filter redundant docs, translate docs, extract metadata, and more