Commit Graph

81 Commits

Author SHA1 Message Date
Maxime Grenu
8951c01fe8 fix(text-splitters): prevent JSFrameworkTextSplitter from mutating self._separators on each split_text() call (#35316) 2026-02-18 17:51:42 -05:00
corridor-security[bot]
1493b4c5ee fix: Server-Side Request Forgery (SSRF) in HTMLHeaderTextSplitter.split_text_from_url (#35196) 2026-02-12 18:48:05 -05:00
Katha
253398ebca feat(text-splitters): add model_kwargs to SentenceTransformersTokenTextSplitter (#35113) 2026-02-11 12:26:58 -05:00
Rohan Disa
16f2c7d13b fix(text-splitters): reverse preserved elements iterator in HTMLSemanticPreservingSplitter (#34080) 2026-02-02 18:25:39 -05:00
Mason Daugherty
51f13f7bff style(text-splitters): lint (#34865) 2026-01-23 23:07:36 -05:00
Christophe Bornet
fd69425439 style(text-splitters): fix some ruff preview rules (#34665)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-01-09 17:28:18 -05:00
Julia (Juli) Huang
cd5b36456a fix(text-splitters): HTMLSemanticPreservingSplitter nested preserved … (#34587)
Summary
Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve
elements nested inside non-container tags. With these changes, preserved
elements are now correctly detected and handled at any nesting depth.

Root Cause
`_process_element()` only recursed into a small set of hard-coded
container tags (`html`, `body`, `div`, `main`). For other tags, the
subtree was flattened into text, preventing nested preserved elements
(inside `<p>`, `<section>`, `<article>`, etc.) from being detected.


Fix
- Updated traversal logic in _process_element (html.py) to recursively
process child elements for any tag that contains nested elements
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only

Tests
Adds regression tests covering preserved elements nested inside
non-container tags, including:
- table inside section
- nested divs
- code inside paragraph

All existing tests pass (make lint, format, test, etc).

Breaking changes
None.

Fixes
Fixes #31569

Disclaimer
GitHub Copilot was used to assist with test case design in
test_text_splitters.py and documentation comments; all code logic was
manually implemented and reviewed.

---------

Co-authored-by: julih <julih@julihs-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-01-05 10:28:27 -05:00
Christophe Bornet
5884fb9523 style(text-splitters,standard-tests,cli): add ruff TC and RUF012 rules (#34495)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-12-27 01:41:33 -06:00
rari404
0f940d74b2 feat(text-splitters): add R programming language support (#34241) 2025-12-12 13:34:22 -05:00
Christophe Bornet
ef79c26f18 chore(cli,standard-tests,text-splitters): fix some ruff TC rules (#33934)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-11-12 14:06:31 -05:00
Mason Daugherty
d40e340479 chore: attribute package change versions (#33854)
Needed to disambiguate for within inherited docs
2025-11-06 16:57:30 -05:00
Mason Daugherty
123e29dc26 style: more refs fixes (#33730) 2025-10-29 16:34:46 -04:00
Mason Daugherty
e5e1d6c705 style: more refs work (#33707) 2025-10-28 14:43:28 -04:00
Mason Daugherty
db7f2db1ae feat(infra): langchain docs MCP (#33636) 2025-10-22 11:50:35 -04:00
Mason Daugherty
e731ba1e47 style: more refs work (#33616) 2025-10-20 18:40:19 -04:00
ccurme
3152d25811 fix: support python 3.14 in various projects (#33575)
Co-authored-by: cbornet <cbornet@hotmail.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-17 11:06:23 -04:00
Mason Daugherty
26e0a00c4c style: more work for refs (#33508)
Largely:
- Remove explicit `"Default is x"` since new refs show default inferred
from sig
- Inline code (useful for eventual parsing)
- Fix code block rendering (indentations)
2025-10-15 18:46:55 -04:00
Christophe Bornet
83901b30e3 chore(text-splitters): remove arg types from docstrings (#33406)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-10 11:37:53 -04:00
Mason Daugherty
6fc21afbc9 style: .. code-block:: admonition translations (#33400)
biiiiiiiiiiiiiiiigggggggg pass
2025-10-09 16:52:58 -04:00
Mason Daugherty
d8a680ee57 style: address Sphinx double-backtick snippet syntax (#33389) 2025-10-09 13:35:51 -04:00
Mason Daugherty
b6132fc23e style: remove more Optional syntax (#33371) 2025-10-08 23:28:43 -04:00
Mason Daugherty
d13823043d style: monorepo pass for refs (#33359)
* Delete some double backticks previously used by Sphinx (not done
everywhere yet)
* Fix some code blocks / dropdowns

Ignoring CLI CI for now
2025-10-08 18:41:39 -04:00
Christophe Bornet
20e04fc3dd chore(text-splitters): cleanup ruff config (#33247)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-06 17:02:31 -04:00
Mason Daugherty
ae5b105d11 docs: v1 docs updates (#33173)
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2025-10-02 18:46:26 -04:00
Mason Daugherty
5e8cb58e6a refactor(text-splitters): drop python 3.9 (#33212) 2025-10-02 13:51:10 -04:00
Mason Daugherty
eaa6dcce9e release: v1.0.0 (#32567)
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2025-10-02 10:49:42 -04:00
Hyunjoon Jeong
9cc85387d1 fix(text-splitters): add validation to prevent infinite loop and prevent empty token splitter (#32205)
### Description
1) Add validation to prevent infinite loop condition when
```tokenizer.tokens_per_chunk > tokenizer.chunk_overlap```
2) Avoid empty decoded chunk when splitter appends tokens

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-09-11 16:55:32 -04:00
Christophe Bornet
0c3e8ccd0e chore(text-splitters): select ALL rules with exclusions (#32325)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-08 14:46:09 +00:00
Christophe Bornet
e0a4af8d8b docs(text-splitters): fix some docstrings (#32767) 2025-08-31 13:46:11 -05:00
Maitrey Talware
622337a297 docs(docs): fixed typos in documentations (#32661)
Minor typo fixes. (Not linked to current open issues)
2025-08-25 10:02:53 -04:00
Keyu Chen
03138f41a0 feat(text-splitters): add optional custom header pattern support (#31887)
## Description

This PR adds support for custom header patterns in
`MarkdownHeaderTextSplitter`, allowing users to define non-standard
Markdown header formats (like `**Header**`) and specify their hierarchy
levels.

**Issue:** Fixes #22738

**Dependencies:** None - this change has no new dependencies

**Key Changes:**
- Added optional `custom_header_patterns` parameter to support
non-standard header formats
- Enable splitting on patterns like `**Header**` and `***Header***`
- Maintain full backward compatibility with existing usage
- Added comprehensive tests for custom and mixed header scenarios

## Example Usage

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("**", "Chapter"),
    ("***", "Section"),
]

custom_header_patterns = {
    "**": 1,   # Level 1 headers
    "***": 2,  # Level 2 headers
}

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    custom_header_patterns=custom_header_patterns,
)

# Now **Chapter 1** is treated as a level 1 header
# And ***Section 1.1*** is treated as a level 2 header
```

## Testing

-  Added unit tests for custom header patterns
-  Added tests for mixed standard and custom headers
-  All existing tests pass (backward compatibility maintained)
-  Linting and formatting checks pass

---

The implementation provides a flexible solution while maintaining the
simplicity of the existing API. Users can continue using the splitter
exactly as before, with the new functionality being entirely opt-in
through the `custom_header_patterns` parameter.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 10:10:49 -04:00
Christophe Bornet
4656f727da chore(text-splitters): add mypy warn_unreachable (#32558) 2025-08-15 09:45:20 -04:00
Mason Daugherty
457ce9c4b0 feat(text-splitters): ruff fixes and rules (#32502) 2025-08-11 13:28:22 -04:00
Mason Daugherty
c31236264e chore: formatting across codebase (#32466) 2025-08-08 10:20:10 -04:00
Mason Daugherty
96cbd90cba fix: formatting issues in docstrings (#32265)
Ensures proper reStructuredText formatting by adding the required blank
line before closing docstring quotes, which resolves the "Block quote
ends without a blank line; unexpected unindent" warning.
2025-07-27 23:37:47 -04:00
tanwirahmad
622bb05751 fix(langchain): class HTMLSemanticPreservingSplitter ignores the text inside the div tag (#32213)
**Description:** We collect the text from the "html", "body", "div", and
"main" nodes, if they have any.

**Issue:** Fixes #32206.
2025-07-24 10:09:03 -04:00
Fabio Fontana
fd168e1c11 feat(text-splitters): add Visual Basic 6 support (#31173)
### **Description**

Add Visual Basic 6 support.

---

### **Issue**

No specific issue addressed.

---

### **Dependencies**

No additional dependencies required.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-14 13:51:16 +00:00
Christophe Bornet
060fc0e3c9 text-splitters: Add ruff rules FBT (#31935)
See [flake8-boolean-trap
(FBT)](https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt)
2025-07-09 18:36:58 -04:00
Michael Li
5b3e29f809 text splitters: add chunk_size and chunk_overlap validations (#31916)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
2025-07-08 12:22:33 -04:00
Christophe Bornet
451c90fefa text-splitters: Ruff autofixes (#31858)
Auto-fixes from ruff with rule `ALL`
2025-07-07 10:06:08 -04:00
Christophe Bornet
802d2bf249 text-splitters: Add ruff rule UP (pyupgrade) (#31841)
See https://docs.astral.sh/ruff/rules/#pyupgrade-up
All auto-fixed except `typing.AbstractSet` -> `collections.abc.Set`
2025-07-03 10:11:35 -04:00
Cole Murray
43eef43550 security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819)
## Summary
- Removes the `xslt_path` parameter from HTMLSectionSplitter to
eliminate XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE
attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed

  ## Context
This PR addresses a critical XXE vulnerability discovered in the
HTMLSectionSplitter component. The vulnerability allowed attackers to:
- Read sensitive local files (SSH keys, passwords, configuration files)
  - Perform Server-Side Request Forgery (SSRF) attacks
  - Exfiltrate data to attacker-controlled servers

  ## Changes Made
1. **Removed `xslt_path` parameter** - This eliminates the primary
attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent
XXE attacks even with the default XSLT:
     - `no_network=True` - Blocks network access
- `resolve_entities=False` - Prevents entity expansion -
`load_dtd=False` - Disables DTD processing -
`XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT
transformations

3. **Added security tests** - New test file `test_html_security.py` with
comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the
removed `xslt_path` parameter

  ## Test Plan
  - [x] All existing tests pass
  - [x] New security tests verify XXE attacks are blocked
  - [x] Code passes linting and formatting checks
  - [x] Tested with both old and new versions of lxml


Twitter handle: @_colemurray
2025-07-02 15:24:08 -04:00
Raghu Kapur
2c9859956a text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622)
**Description:**

Previously, when transitioning from a deeper Markdown header (e.g., ###)
to a shallower one (e.g., ##), the
ExperimentalMarkdownSyntaxTextSplitter retained the deeper header in the
metadata.

This commit updates the `_resolve_header_stack` method to remove headers
at the same or deeper levels before appending the current header. As a
result, each chunk now reflects only the active header context.

Fixes unexpected metadata leakage across sections in nested Markdown
documents.

Additionally, test cases have been updated to:
- Validate correct header resolution and metadata assignment.
- Cover edge cases with nested headers and horizontal rules.

**Issue:** 
Fixes [#31596](https://github.com/langchain-ai/langchain/issues/31596)

**Dependencies:**
None

**Twitter handle:** -> [_RaghuKapur](https://twitter.com/_RaghuKapur)

**LinkedIn:** ->
[https://www.linkedin.com/in/raghukapur/](https://www.linkedin.com/in/raghukapur/)
2025-06-20 15:52:17 -04:00
Tom-Trumper
532e6455e9 text-splitters: Add keep_separator arg to HTMLSemanticPreservingSplitter (#31588)
### Description
Add keep_separator arg to HTMLSemanticPreservingSplitter and pass value
to instance of RecursiveCharacterTextSplitter used under the hood.
### Issue
Documents returned by `HTMLSemanticPreservingSplitter.split_text(text)`
are defaulted to use separators at beginning of page_content. [See third
and fourth document in example output from how-to
guide](https://python.langchain.com/docs/how_to/split_html/#using-htmlsemanticpreservingsplitter):
```
[Document(metadata={'Header 1': 'Main Title'}, page_content='This is an introductory paragraph with some basic content.'),
 Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content='This section introduces the topic'),
 Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content='. Below is a list: First item Second item Third item with bold text and a link Subsection 1.1: Details This subsection provides additional details'),
 Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content=". Here's a table: Header 1 Header 2 Header 3 Row 1, Cell 1 Row 1, Cell 2 Row 1, Cell 3 Row 2, Cell 1 Row 2, Cell 2 Row 2, Cell 3"),
 Document(metadata={'Header 2': 'Section 2: Media Content'}, page_content='This section contains an image and a video: ![image:example_image_link.mp4](example_image_link.mp4) ![video:example_video_link.mp4](example_video_link.mp4)'),
 Document(metadata={'Header 2': 'Section 3: Code Example'}, page_content='This section contains a code block: <code:html> <div> <p>This is a paragraph inside a div.</p> </div> </code>'),
 Document(metadata={'Header 2': 'Conclusion'}, page_content='This is the conclusion of the document.')]
```
### Dependencies
None

@ttrumper3
2025-06-14 17:56:14 -04:00
Christophe Bornet
eab8484a80 text-splitters[patch]: fix some import-untyped errors (#31030) 2025-05-15 11:34:22 -04:00
Sumin Shin
683da2c9e9 text-splitters: Fix regex separator merge bug in CharacterTextSplitter (#31137)
**Description:**
Fix the merge logic in `CharacterTextSplitter.split_text` so that when
using a regex lookahead separator (`is_separator_regex=True`) with
`keep_separator=False`, the raw pattern is not re-inserted between
chunks.

**Issue:**
Fixes #31136 

**Dependencies:**
None

**Twitter handle:**
None

Since this is my first open-source PR, please feel free to point out any
mistakes, and I'll be eager to make corrections.
2025-05-10 15:42:03 -04:00
Christophe Bornet
8c5ae108dd text-splitters: Set strict mypy rules (#30900)
* Add strict mypy rules
* Fix mypy violations
* Add error codes to all type ignores
* Add ruff rule PGH003
* Bump mypy version to 1.15
2025-04-22 20:41:24 -07:00
keshavshrikant
e8be3cca5c fix huggingface tokenizer default length function (#30185)
#30184
2025-03-31 11:54:30 -04:00
Ahmed Tammaa
f23c3e2444 text-splitters[patch]: Refactor HTMLHeaderTextSplitter for Enhanced Maintainability and Readability (#29397)
Please see PR #27678 for context

## Overview

This pull request presents a refactor of the `HTMLHeaderTextSplitter`
class aimed at improving its maintainability and readability. The
primary enhancements include simplifying the internal structure by
consolidating multiple private helper functions into a single private
method, thereby reducing complexity and making the codebase easier to
understand and extend. Importantly, all existing functionalities and
public interfaces remain unchanged.

## PR Goals

1. **Simplify Internal Logic**:
- **Consolidation of Private Methods**: The original implementation
utilized multiple private helper functions (`_header_level`,
`_dom_depth`, `_get_elements`) to manage different aspects of HTML
parsing and document generation. This fragmentation increased cognitive
load and potential maintenance overhead.
- **Streamlined Processing**: By merging these functionalities into a
single private method (`_generate_documents`), the class now offers a
more straightforward flow, making it easier for developers to trace and
understand the processing steps. (Thanks to @eyurtsev)

2. **Enhance Readability**:
- **Clearer Method Responsibilities**: With fewer private methods, each
method now has a more focused responsibility. The primary logic resides
within `_generate_documents`, which handles both HTML traversal and
document creation in a cohesive manner.
- **Reduced Redundancy**: Eliminating redundant checks and consolidating
logic reduces the code's verbosity, making it more concise without
sacrificing clarity.

3. **Improve Maintainability**:
- **Easier Debugging and Extension**: A simplified internal structure
allows for quicker identification of issues and easier implementation of
future enhancements or feature additions.
- **Consistent Header Management**: The new implementation ensures that
headers are managed consistently within a single context, reducing the
likelihood of bugs related to header scope and hierarchy.

4. **Maintain Backward Compatibility**:
- **Unchanged Public Interface**: All public methods (`split_text`,
`split_text_from_url`, `split_text_from_file`) and their signatures
remain unchanged, ensuring that existing integrations and usage patterns
are unaffected.
- **Preserved Docstrings**: Comprehensive docstrings are retained,
providing clear documentation for users and developers alike.

## Detailed Changes

1. **Removed Redundant Private Methods**:
- **Eliminated `_header_level`, `_dom_depth`, and `_get_elements`**:
These methods were merged into the `_generate_documents` method,
centralizing the logic for HTML parsing and document generation.

2. **Consolidated Document Generation Logic**:
- **Single Private Method `_generate_documents`**: This method now
handles the entire process of parsing HTML, tracking active headers,
managing document chunks, and yielding `Document` instances. This
consolidation reduces the number of moving parts and simplifies the
overall processing flow.

3. **Simplified Header Management**:
- **Immediate Header Scope Handling**: Headers are now managed within
the traversal loop of `_generate_documents`, ensuring that headers are
added or removed from the active headers dictionary in real-time based
on their DOM depth and hierarchy.
- **Removed `chunk_dom_depth` Attribute**: The need to track chunk DOM
depth separately has been eliminated, as header scopes are now directly
managed within the traversal logic.

4. **Streamlined Chunk Finalization**:
- **Enhanced `finalize_chunk` Function**: The chunk finalization process
has been simplified to directly yield a single `Document` when needed,
without maintaining an intermediate list. This change reduces
unnecessary list operations and makes the logic more straightforward.

5. **Improved Variable Naming and Flow**:
- **Descriptive Variable Names**: Variables such as `current_chunk` and
`node_text` provide clear insights into their roles within the
processing logic.
- **Direct Header Removal Logic**: Headers that are out of scope are
removed immediately during traversal, ensuring that the active headers
dictionary remains accurate and up-to-date.

6. **Preserved Comprehensive Docstrings**:
- **Unchanged Documentation**: All existing docstrings, including
class-level and method-level documentation, remain intact. This ensures
that users and developers continue to have access to detailed usage
instructions and method explanations.

## Testing

All existing test cases from `test_html_header_text_splitter.py` have
been executed against the refactored code. The results confirm that:

- **Functionality Remains Intact**: The splitter continues to accurately
parse HTML content, respect header hierarchies, and produce the expected
`Document` objects with correct metadata.
- **Backward Compatibility is Maintained**: No changes were required in
the test cases, and all tests pass without modifications, demonstrating
that the refactor does not introduce any regressions or alter existing
behaviors.


This example remains fully operational and behaves as before, returning
a list of `Document` objects with the expected metadata and content
splits.

## Conclusion

This refactor achieves a more maintainable and readable codebase by
simplifying the internal structure of the `HTMLHeaderTextSplitter`
class. By consolidating multiple private methods into a single, cohesive
private method, the class becomes easier to understand, debug, and
extend. All existing functionalities are preserved, and comprehensive
tests confirm that the refactor maintains the expected behavior. These
changes align with LangChain’s standards for clean, maintainable, and
efficient code.

---

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-03-28 15:36:00 -04:00
Ben
789db7398b text-splitters: Add JSFrameworkTextSplitter for Handling JavaScript Framework Code (#28972)
## Description
This pull request introduces a new text splitter,
`JSFrameworkTextSplitter`, to the Langchain library. The
`JSFrameworkTextSplitter` extends the `RecursiveCharacterTextSplitter`
to handle JavaScript framework code effectively, including React (JSX),
Vue, and Svelte. It identifies and utilizes framework-specific component
tags and syntax elements as splitting points, alongside standard
JavaScript syntax. This ensures that code is divided at natural
boundaries, enhancing the parsing and processing of JavaScript and
framework-specific code.

### Key Features
- Supports React (JSX), Vue, and Svelte frameworks.
- Identifies and uses framework-specific tags and syntax elements as
natural splitting points.
- Extends the existing `RecursiveCharacterTextSplitter` for seamless
integration.

## Issue
No specific issue addressed.

## Dependencies
No additional dependencies required.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-17 23:32:33 +00:00