langchain/docs
mziru 9e3c1d4463
add HTMLHeaderTextSplitter (#11039)
Description: Similar in concept to the `MarkdownHeaderTextSplitter`, the
`HTMLHeaderTextSplitter` is a "structure-aware" chunker that splits text
at the element level and attaches, as metadata, every header that applies
to a given chunk. It can return chunks element by element or combine
elements that share the same metadata, with two objectives: (a) keeping
semantically related text grouped together and (b) preserving the
context-rich information encoded in document structure. It can be used
with other text splitters as part of a chunking pipeline.
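
The idea described above can be illustrated with a minimal sketch. This is not the code added in this commit (which depends on lxml); it uses only the standard library `html.parser`, and the names `HeaderAwareParser`, `split_html`, and the tuple-based return shape are hypothetical, chosen for illustration. It assumes flat, well-formed HTML with `h1` to `h6` headers that contain no nested tags.

```python
from html.parser import HTMLParser


class HeaderAwareParser(HTMLParser):
    """Hypothetical illustration of header-aware chunking (not the langchain code)."""

    def __init__(self, headers_to_split_on):
        super().__init__()
        self.labels = dict(headers_to_split_on)  # tag name -> metadata key
        self.active = {}    # header tag -> header text currently in scope
        self.elements = []  # one (text, metadata) pair per text-bearing element
        self._tag = None    # tag whose text is being buffered
        self._buf = []

    def _flush(self):
        # Normalize whitespace and emit whatever text has been buffered.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if not text:
            return
        if self._tag in self.labels:
            # A new header ends every header at the same or a deeper level.
            # Comparing "h1" < "h2" as strings is enough for h1..h6.
            self.active = {t: v for t, v in self.active.items() if t < self._tag}
            self.active[self._tag] = text
        else:
            meta = {self.labels[t]: v for t, v in self.active.items()}
            self.elements.append((text, meta))

    def handle_starttag(self, tag, attrs):
        self._flush()
        self._tag = tag

    def handle_endtag(self, tag):
        self._flush()
        self._tag = None

    def handle_data(self, data):
        self._buf.append(data)


def split_html(html, headers_to_split_on, return_each_element=False):
    """Return (text, metadata) chunks, per element or merged by equal metadata."""
    parser = HeaderAwareParser(headers_to_split_on)
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text
    if return_each_element:
        return parser.elements
    chunks = []
    for text, meta in parser.elements:
        if chunks and chunks[-1][1] == meta:
            # Adjacent elements under the same headers are combined.
            chunks[-1] = (chunks[-1][0] + " " + text, meta)
        else:
            chunks.append((text, meta))
    return chunks


if __name__ == "__main__":
    sample = "<h1>Intro</h1><p>A</p><p>B</p><h2>Detail</h2><p>C</p>"
    for text, meta in split_html(sample, [("h1", "Header 1"), ("h2", "Header 2")]):
        print(meta, text)
```

Each chunk carries the headers that were in scope when its text appeared, so a downstream splitter can cut long chunks further without losing that context.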

Dependency: `lxml` Python package

Maintainer: @hwchase17

Twitter handle: @MartinZirulnik

---------

Co-authored-by: PresidioVantage <github@presidiovantage.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-04 09:24:25 -04:00
_scripts llm feat table revision (#10947) 2023-09-22 10:29:12 -07:00
api_reference add model feat table (#10921) 2023-09-22 01:10:27 -07:00
docs_skeleton Use term keyword according to the official python doc glossary (#11338) 2023-10-03 12:56:08 -07:00
extras add HTMLHeaderTextSplitter (#11039) 2023-10-04 09:24:25 -04:00
snippets Docs: improve similarity search examples (#11298) 2023-10-03 21:47:08 -04:00
.local_build.sh
vercel_requirements.txt