Commit Graph

53 Commits

Author SHA1 Message Date
Katha
253398ebca feat(text-splitters): add model_kwargs to SentenceTransformersTokenTextSplitter (#35113) 2026-02-11 12:26:58 -05:00
dependabot[bot]
d6e86aa748 chore(deps): bump the other-deps group across 3 directories with 12 updates (#35127)
Bumps the other-deps group with 4 updates in the /libs/model-profiles
directory: [pytest](https://github.com/pytest-dev/pytest),
[pytest-watcher](https://github.com/olzhasar/pytest-watcher),
[ruff](https://github.com/astral-sh/ruff) and
[mypy](https://github.com/python/mypy).
Bumps the other-deps group with 3 updates in the /libs/standard-tests
directory: [pytest](https://github.com/pytest-dev/pytest),
[ruff](https://github.com/astral-sh/ruff) and
[pytest-codspeed](https://github.com/CodSpeedHQ/pytest-codspeed).
Bumps the other-deps group with 6 updates in the /libs/text-splitters
directory:

| Package | From | To |
| --- | --- | --- |
| [pytest](https://github.com/pytest-dev/pytest) | `8.4.2` | `9.0.2` |
| [pytest-watcher](https://github.com/olzhasar/pytest-watcher) | `0.4.3`
| `0.6.3` |
| [ruff](https://github.com/astral-sh/ruff) | `0.14.11` | `0.15.0` |
| [types-requests](https://github.com/typeshed-internal/stub_uploader) |
`2.32.4.20250913` | `2.32.4.20260107` |
| [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/bs4/) |
`4.14.2` | `4.14.3` |
| [transformers](https://github.com/huggingface/transformers) | `4.56.2`
| `5.1.0` |


Updates `pytest` from 8.4.2 to 9.0.2
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest/releases">pytest's
releases</a>.</em></p>
<blockquote>
<h2>9.0.2</h2>
<h1>pytest 9.0.2 (2025-12-06)</h1>
<h2>Bug fixes</h2>
<ul>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13896">#13896</a>:
The terminal progress feature added in pytest 9.0.0 has been disabled by
default, except on Windows, due to compatibility issues with some
terminal emulators.</p>
<p>You may enable it again by passing <code>-p terminalprogress</code>.
We may enable it by default again once compatibility improves in the
future.</p>
<p>Additionally, when the environment variable <code>TERM</code> is
<code>dumb</code>, the escape codes are no longer emitted, even if the
plugin is enabled.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13904">#13904</a>:
Fixed the TOML type of the <code>tmp_path_retention_count</code>
settings in the API reference from number to string.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13946">#13946</a>:
The private <code>config.inicfg</code> attribute was changed in a
breaking manner in pytest 9.0.0.
Due to its usage in the ecosystem, it is now restored to working order
using a compatibility shim.
It will be deprecated in pytest 9.1 and removed in pytest 10.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13965">#13965</a>:
Fixed quadratic-time behavior when handling <code>unittest</code>
subtests in Python 3.10.</p>
</li>
</ul>
<h2>Improved documentation</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/4492">#4492</a>:
The API Reference now contains cross-reference-able documentation of
<code>pytest's command-line flags
&lt;command-line-flags&gt;</code>.</li>
</ul>
<h2>9.0.1</h2>
<h1>pytest 9.0.1 (2025-11-12)</h1>
<h2>Bug fixes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13895">#13895</a>:
Restore support for skipping tests via <code>raise
unittest.SkipTest</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13896">#13896</a>:
The terminal progress plugin added in pytest 9.0 is now automatically
disabled when iTerm2 is detected, it generated desktop notifications
instead of the desired functionality.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13904">#13904</a>:
Fixed the TOML type of the verbosity settings in the API reference from
number to string.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13910">#13910</a>:
Fixed <!-- raw HTML omitted -->UserWarning: Do not expect
file_or_dir<!-- raw HTML omitted --> on some earlier Python 3.12 and
3.13 point versions.</li>
</ul>
<h2>Packaging updates and notes for downstreams</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>:
The tox configuration has been adjusted to make sure the desired
version string can be passed into its <code>package_env</code> through
the <code>SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST</code> environment
variable as a part of the release process -- by
<code>webknjaz</code>.</li>
</ul>
<h2>Contributor-facing changes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13891">#13891</a>,
<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13942">#13942</a>:
The CI/CD part of the release automation is now capable of
creating GitHub Releases without having a Git checkout on
disk -- by <code>bluetech</code> and <code>webknjaz</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>:
The tox configuration has been adjusted to make sure the desired
version string can be passed into its <code>package_env</code> through
the <code>SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST</code> environment
variable as a part of the release process -- by
<code>webknjaz</code>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3d10b5148e"><code>3d10b51</code></a>
Prepare release version 9.0.2</li>
<li><a
href="188750b725"><code>188750b</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14030">#14030</a>
from pytest-dev/patchback/backports/9.0.x/1e4b01d1f...</li>
<li><a
href="b7d7bef90c"><code>b7d7bef</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14014">#14014</a>
from bluetech/compat-note</li>
<li><a
href="bd08e85ac7"><code>bd08e85</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14013">#14013</a>
from pytest-dev/patchback/backports/9.0.x/922b60377...</li>
<li><a
href="bc78386299"><code>bc78386</code></a>
Add CLI options reference documentation (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13930">#13930</a>)</li>
<li><a
href="5a4e398ce8"><code>5a4e398</code></a>
Fix docs typo (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/14005">#14005</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/14008">#14008</a>)</li>
<li><a
href="d7ae6df394"><code>d7ae6df</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14006">#14006</a>
from pytest-dev/maintenance/update-plugin-list-tmpl...</li>
<li><a
href="556f6a22e1"><code>556f6a2</code></a>
pre-commit: fix rst-lint after new release (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13999">#13999</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/14001">#14001</a>)</li>
<li><a
href="c60fbe63a2"><code>c60fbe6</code></a>
Fix quadratic-time behavior when handling <code>unittest</code> subtests
in Python 3.10 ...</li>
<li><a
href="73d9b01118"><code>73d9b01</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13995">#13995</a>
from nicoddemus/patchback/backports/9.0.x/1b5200c0f...</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest/compare/8.4.2...9.0.2">compare
view</a></li>
</ul>
</details>
<br />

Updates `pytest-watcher` from 0.4.3 to 0.6.3
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/olzhasar/pytest-watcher/releases">pytest-watcher's
releases</a>.</em></p>
<blockquote>
<h2>v0.6.3</h2>
<h3>Features</h3>
<ul>
<li>Add debug mode activated with <code>PTW_DEBUG</code> environment
variable and improve log messages.</li>
</ul>
<h3>Bugfixes</h3>
<ul>
<li>Fix terminal flushing after menu and header prints.</li>
<li>Use monotonic clock for trigger detection to avoid misbehavior on
clock changes.</li>
</ul>
<h2>v0.6.2</h2>
<h3>Bugfixes</h3>
<ul>
<li>Allow specifying blank patterns via CLI</li>
<li>Fix duplicate command entries in menu</li>
</ul>
<h2>v0.6.1</h2>
<h3>Bugfixes</h3>
<ul>
<li>Trigger tests in interactive mode for carriage return character</li>
</ul>
<h3>Improved Documentation</h3>
<ul>
<li>Add contributing guide</li>
</ul>
<h3>Misc</h3>
<ul>
<li>Integrate <a
href="https://towncrier.readthedocs.io/en/stable/index.html">towncrier</a>
into the development process</li>
</ul>
<h2>v0.6.0</h2>
<h2>Features</h2>
<ul>
<li>Add <code>notify-on-failure</code> flag (and config option) to emit
BEL symbol on test suite failure.</li>
</ul>
<h2>Infrastructure</h2>
<ul>
<li>Migrate from poetry to uv.</li>
<li>Remove tox.</li>
</ul>
<h2>v0.5.0</h2>
<h2>Fixes</h2>
<ul>
<li>Merge arguments passed to the runner from config and CLI instead of
overriding.</li>
</ul>
<h2>Changes</h2>
<ul>
<li>Drop support for Python 3.7 &amp; 3.8</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/olzhasar/pytest-watcher/blob/master/CHANGELOG.md">pytest-watcher's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/olzhasar/pytest-watcher/releases/tag/0.6.3">0.6.3</a>
- 2026-01-11</h2>
<h3>Features</h3>
<ul>
<li>Add debug mode activated with <code>PTW_DEBUG</code> environment
variable and improve log messages.</li>
</ul>
<h3>Bugfixes</h3>
<ul>
<li>Fix terminal flushing after menu and header prints.</li>
<li>Use monotonic clock for trigger detection to avoid misbehavior on
clock changes.</li>
</ul>
<h2><a
href="https://github.com/olzhasar/pytest-watcher/releases/tag/0.6.2">0.6.2</a>
- 2025-12-28</h2>
<h3>Bugfixes</h3>
<ul>
<li>Allow specifying blank patterns via CLI</li>
<li>Fix duplicate command entries in menu</li>
</ul>
<h2><a
href="https://github.com/olzhasar/pytest-watcher/releases/tag/0.6.1">0.6.1</a>
- 2025-12-26</h2>
<h3>Bugfixes</h3>
<ul>
<li>Trigger tests in interactive mode for carriage return character</li>
</ul>
<h3>Improved Documentation</h3>
<ul>
<li>Add contributing guide</li>
</ul>
<h3>Misc</h3>
<ul>
<li>Integrate <a
href="https://towncrier.readthedocs.io/en/stable/index.html">towncrier</a>
into the development process</li>
</ul>
<h2><a
href="https://github.com/olzhasar/pytest-watcher/releases/tag/0.6.0">0.6.0</a>
- 2025-12-22</h2>
<h3>Features</h3>
<ul>
<li>Add notify-on-failure flag (and config option) to emit BEL symbol on
test suite failure.</li>
</ul>
<h3>Infrastructure</h3>
<ul>
<li>Migrate from <code>poetry</code> to <code>uv</code>.</li>
<li>Remove <code>tox</code>.</li>
</ul>
<h2><a
href="https://github.com/olzhasar/pytest-watcher/releases/tag/0.5.0">0.5.0</a>
- 2025-12-21</h2>
<h3>Fixes</h3>
<ul>
<li>Merge arguments passed to the runner from config and CLI instead of
overriding.</li>
</ul>
<h3>Changes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c52925b613"><code>c52925b</code></a>
release v0.6.3</li>
<li><a
href="23d49893f7"><code>23d4989</code></a>
Add debug mode. Improve log messages</li>
<li><a
href="e3dffa1cb3"><code>e3dffa1</code></a>
Fix terminal flushing after menu and header prints</li>
<li><a
href="0eeaf6080e"><code>0eeaf60</code></a>
Use monotonic clock for trigger detection</li>
<li><a
href="5ed9d0e262"><code>5ed9d0e</code></a>
Update CHANGELOG. Fix changelog_reader action</li>
<li><a
href="756f005f5d"><code>756f005</code></a>
release v0.6.2</li>
<li><a
href="902aa9e07b"><code>902aa9e</code></a>
Merge pull request <a
href="https://redirect.github.com/olzhasar/pytest-watcher/issues/51">#51</a>
from olzhasar/fix-duplicate-menu</li>
<li><a
href="e6b20d35b9"><code>e6b20d3</code></a>
Allow specifying empty patterns via CLI</li>
<li><a
href="2d522dabf9"><code>2d522da</code></a>
Fix duplicate menu entries</li>
<li><a
href="171e6f1282"><code>171e6f1</code></a>
Fix towncrier CHANGELOG versioning</li>
<li>Additional commits viewable in <a
href="https://github.com/olzhasar/pytest-watcher/compare/v0.4.3...v0.6.3">compare
view</a></li>
</ul>
</details>
<br />

Updates `pytest-asyncio` from 1.2.0 to 1.3.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest-asyncio/releases">pytest-asyncio's
releases</a>.</em></p>
<blockquote>
<h2>pytest-asyncio 1.3.0</h2>
<h1><a
href="https://github.com/pytest-dev/pytest-asyncio/tree/1.3.0">1.3.0</a>
- 2025-11-10</h1>
<h2>Removed</h2>
<ul>
<li>Support for Python 3.9 (<a
href="https://redirect.github.com/pytest-dev/pytest-asyncio/issues/1278">#1278</a>)</li>
</ul>
<h2>Added</h2>
<ul>
<li>Support for pytest 9 (<a
href="https://redirect.github.com/pytest-dev/pytest-asyncio/issues/1279">#1279</a>)</li>
</ul>
<h2>Notes for Downstream Packagers</h2>
<ul>
<li>Tested Python versions include free threaded Python 3.14t (<a
href="https://redirect.github.com/pytest-dev/pytest-asyncio/issues/1274">#1274</a>)</li>
<li>Tests are run in the same pytest process, instead of spawning a
subprocess with <code>pytest.Pytester.runpytest_subprocess</code>. This
prevents the test suite from accidentally using a system installation of
pytest-asyncio, which could result in test errors. (<a
href="https://redirect.github.com/pytest-dev/pytest-asyncio/issues/1275">#1275</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="2e9695fcf8"><code>2e9695f</code></a>
docs: Compile changelog for v1.3.0</li>
<li><a
href="dd0e9ba3fa"><code>dd0e9ba</code></a>
docs: Reference correct issue in news fragment.</li>
<li><a
href="4c31abe5bf"><code>4c31abe</code></a>
Build(deps): Bump nh3 from 0.3.1 to 0.3.2</li>
<li><a
href="13e94770d7"><code>13e9477</code></a>
Link to migration guides from changelog</li>
<li><a
href="4d2cf3c36f"><code>4d2cf3c</code></a>
tests: handle Python 3.14 DefaultEventLoopPolicy deprecation
warnings</li>
<li><a
href="ee3549b6ef"><code>ee3549b</code></a>
test: Remove obsolete test for the event_loop fixture.</li>
<li><a
href="7a67c82c5a"><code>7a67c82</code></a>
tests: Fix failing test by preventing warning conversion to error.</li>
<li><a
href="a17b689a75"><code>a17b689</code></a>
test: add pytest config to isolated test directories</li>
<li><a
href="18afc9df5a"><code>18afc9d</code></a>
fix(tests): replace runpytest_subprocess with runpytest</li>
<li><a
href="cdc6bd1de7"><code>cdc6bd1</code></a>
Add support for pytest 9 and drop Python 3.9 support</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest-asyncio/compare/v1.2.0...v1.3.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `syrupy` from 4.9.1 to 5.1.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/syrupy-project/syrupy/releases">syrupy's
releases</a>.</em></p>
<blockquote>
<h2>v5.1.0</h2>
<h1><a
href="https://github.com/syrupy-project/syrupy/compare/v5.0.0...v5.1.0">5.1.0</a>
(2026-01-25)</h1>
<h3>Features</h3>
<ul>
<li>add serializer plugin system; plugins for data models (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1062">#1062</a>)
(<a
href="df9bc8f6b3">df9bc8f</a>)</li>
</ul>
<h2>v5.0.0</h2>
<h2>Syrupy 5.0.0</h2>
<p><em>(2025-09-28)</em></p>
<p>This release introduces new features, bug fixes, and a major license
change. It also includes several <strong>breaking changes</strong>, so
please review those carefully before upgrading.</p>
<hr />
<h3>New Features </h3>
<ul>
<li><strong>Add <code>--snapshot-dirname</code> option:</strong> A new
option, <code>--snapshot-dirname</code>, is now available to change the
default directory snapshots are stored in. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/810">#810</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/810">syrupy-project/syrupy#810</a>))</li>
<li><strong>Remove private underscore prefix:</strong> The unnecessary
underscore prefixes have been removed from public methods for better
code clarity. ([<a
href="8cfc9059d3">8cfc905</a>](<a
href="8cfc9059d3</a>))</li>
</ul>
<hr />
<h3>Bug Fixes 🐛</h3>
<ul>
<li><strong>Fix terminal summary for <code>xdist</code>
workers:</strong> Resolves an issue where the terminal summary was not
displayed correctly with <code>xdist</code> workers. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/978">#978</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/978">syrupy-project/syrupy#978</a>))</li>
<li><strong>Ensure <code>pytest_assertrepr_compare</code> hook is called
first:</strong> This change ensures that Syrupy's assertion hook takes
precedence, improving compatibility with other plugins. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/984">#984</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/984">syrupy-project/syrupy#984</a>))</li>
</ul>
<hr />
<h3>Breaking Changes ⚠️</h3>
<ul>
<li>
<p><strong>License change:</strong> The project has switched to the more
permissive <strong>MIT license</strong>. This change applies to all
versions from 5.0.0 and beyond. If you need to use the previous Apache
2.0 license, you must continue to use Syrupy versions 4.x or earlier.
([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/945">#945</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/945">syrupy-project/syrupy#945</a>))</p>
</li>
<li>
<p><strong>Python and pytest version requirements:</strong> Syrupy now
requires <strong>Python 3.10</strong> or higher. Support for Python 3.8
has been dropped as it reached its end of life in October 2024. The
minimum required version of <strong>pytest is v8</strong>. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/904">#904</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/904">syrupy-project/syrupy#904</a>),
[<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1024">#1024</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1024">syrupy-project/syrupy#1024</a>))</p>
</li>
<li>
<p><strong>Method and constant name changes:</strong> Several methods
and constants have been renamed for improved clarity and to align with
public API standards.</p>
<ul>
<li>
<p><strong><code>SnapshotCollectionStorage</code></strong></p>
<ul>
<li><code>_read_snapshot_collection</code> -&gt;
<code>read_snapshot_collection</code></li>
<li><code>_read_snapshot_data_from_location</code> -&gt;
<code>read_snapshot_data_from_location</code></li>
<li><code>_write_snapshot_collection</code> -&gt;
<code>write_snapshot_collection</code></li>
<li><code>_get_file_basename</code> -&gt;
<code>get_file_basename</code></li>
<li><code>_file_extension</code> -&gt; <code>file_extension</code></li>
</ul>
</li>
<li>
<p><strong><code>AmberDataSerializer</code></strong></p>
<ul>
<li><code>_snapshot_sort_key</code> -&gt;
<code>snapshot_sort_key</code></li>
</ul>
</li>
<li>
<p><strong>Constants</strong></p>
<ul>
<li><code>SNAPSHOT_EMPTY_FOSSIL_KEY</code> -&gt;
<code>SNAPSHOT_EMPTY_COLLECTION_KEY</code></li>
<li><code>SNAPSHOT_UNKNOWN_FOSSIL_KEY</code> -&gt;
<code>SNAPSHOT_UNKNOWN_COLLECTION_KEY</code></li>
</ul>
</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/syrupy-project/syrupy/blob/main/CHANGELOG.md">syrupy's
changelog</a>.</em></p>
<blockquote>
<h1><a
href="https://github.com/syrupy-project/syrupy/compare/v5.0.0...v5.1.0">5.1.0</a>
(2026-01-25)</h1>
<h3>Features</h3>
<ul>
<li>add serializer plugin system; plugins for data models (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1062">#1062</a>)
(<a
href="df9bc8f6b3">df9bc8f</a>)</li>
</ul>
<h1><a
href="https://github.com/syrupy-project/syrupy/compare/v4.9.1...v5.0.0">5.0.0</a>
(2025-09-28)</h1>
<ul>
<li>Switch to MIT license (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/945">#945</a>)
(<a
href="d74d340f88">d74d340</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Block terminal summary for xdist workers. (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/978">#978</a>)
(<a
href="33a848df7c">33a848d</a>)</li>
<li>ensure syrupy's pytest_assertrepr_compare hook is called first. (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/984">#984</a>)
(<a
href="eb0024d339">eb0024d</a>)</li>
</ul>
<h3>Code Refactoring</h3>
<ul>
<li>remove incorrect private underscore prefix from public methods (<a
href="8cfc9059d3">8cfc905</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>add --snapshot-dirname option, close <a
href="https://redirect.github.com/syrupy-project/syrupy/issues/810">#810</a>
(<a
href="27135c7c86">27135c7</a>)</li>
<li>drop support for py3.8, raise min. pytest to v8 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/904">#904</a>)
(<a
href="a879ff15ad">a879ff1</a>)</li>
<li>update min. python version to 3.10 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1024">#1024</a>)
(<a
href="16b4113cd5">16b4113</a>)</li>
</ul>
<h3>BREAKING CHANGES</h3>
<ul>
<li>The following methods have been renamed:</li>
</ul>
<p>SnapshotCollectionStorage</p>
<ul>
<li>_read_snapshot_collection -&gt; read_snapshot_collection</li>
<li>_read_snapshot_data_from_location -&gt;
read_snapshot_data_from_location</li>
<li>_write_snapshot_collection -&gt; write_snapshot_collection</li>
<li>_get_file_basename -&gt; get_file_basename</li>
<li>_file_extension -&gt; file_extension</li>
</ul>
<p>AmberDataSerializer</p>
<ul>
<li>_snapshot_sort_key -&gt; snapshot_sort_key</li>
</ul>
<p>Renamed constants to improve clarity:</p>
<p>constants</p>
<ul>
<li>SNAPSHOT_EMPTY_FOSSIL_KEY -&gt; SNAPSHOT_EMPTY_COLLECTION_KEY</li>
<li>SNAPSHOT_UNKNOWN_FOSSIL_KEY -&gt;
SNAPSHOT_UNKNOWN_COLLECTION_KEY</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7096efdee6"><code>7096efd</code></a>
chore(release): 5.1.0 [skip ci]</li>
<li><a
href="07aa00dd48"><code>07aa00d</code></a>
chore(deps): update dependency attrs to v25 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1063">#1063</a>)</li>
<li><a
href="1f29ae061e"><code>1f29ae0</code></a>
docs: add bwrob as a contributor for code (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1064">#1064</a>)</li>
<li><a
href="df9bc8f6b3"><code>df9bc8f</code></a>
feat: add serializer plugin system; plugins for data models (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1062">#1062</a>)</li>
<li><a
href="841257deaf"><code>841257d</code></a>
chore(deps): update dependency coverage to v7.13.1 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1061">#1061</a>)</li>
<li><a
href="2d8dfa7f7b"><code>2d8dfa7</code></a>
chore(deps): update codecov/codecov-action action to v5.5.2 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1056">#1056</a>)</li>
<li><a
href="f5f9ef7702"><code>f5f9ef7</code></a>
chore(deps): update dependency debugpy to v1.8.18 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1057">#1057</a>)</li>
<li><a
href="eaeb6ae11f"><code>eaeb6ae</code></a>
chore(deps): update dependency pytest to v9.0.2 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1055">#1055</a>)</li>
<li><a
href="263b23b768"><code>263b23b</code></a>
chore(deps): update python docker tag to v3.14.1 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1054">#1054</a>)</li>
<li><a
href="a0dd77b023"><code>a0dd77b</code></a>
chore(deps): update actions/checkout action to v6.0.1 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1053">#1053</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/syrupy-project/syrupy/compare/v4.9.1...v5.1.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `ruff` from 0.12.12 to 0.15.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.15.0</h2>
<h2>Release Notes</h2>
<p>Released on 2026-02-03.</p>
<p>Check out the <a href="https://astral.sh/blog/ruff-v0.15.0">blog
post</a> for a migration guide and overview of the changes!</p>
<h3>Breaking changes</h3>
<ul>
<li>
<p>Ruff now formats your code according to the 2026 style guide. See the
formatter section below or in the blog post for a detailed list of
changes.</p>
</li>
<li>
<p>The linter now supports block suppression comments. For example, to
suppress <code>N803</code> for all parameters in this function:</p>
<pre lang="python"><code># ruff: disable[N803]
def foo(
    legacyArg1,
    legacyArg2,
    legacyArg3,
    legacyArg4,
): ...
# ruff: enable[N803]
</code></pre>
<p>See the <a
href="https://docs.astral.sh/ruff/linter/#block-level">documentation</a>
for more details.</p>
</li>
<li>
<p>The <code>ruff:alpine</code> Docker image is now based on Alpine 3.23
(up from 3.21).</p>
</li>
<li>
<p>The <code>ruff:debian</code> and <code>ruff:debian-slim</code> Docker
images are now based on Debian 13 &quot;Trixie&quot; instead of Debian
12 &quot;Bookworm.&quot;</p>
</li>
<li>
<p>Binaries for the <code>ppc64</code> (64-bit big-endian PowerPC)
architecture are no longer included in our releases. It should still be
possible to build Ruff manually for this platform, if needed.</p>
</li>
<li>
<p>Ruff now resolves all <code>extend</code>ed configuration files
before falling back on a default Python version.</p>
</li>
</ul>
<h3>Stabilization</h3>
<p>The following rules have been stabilized and are no longer in
preview:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-http-call-httpx-in-async-function"><code>blocking-http-call-httpx-in-async-function</code></a>
(<code>ASYNC212</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-path-method-in-async-function"><code>blocking-path-method-in-async-function</code></a>
(<code>ASYNC240</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-input-in-async-function"><code>blocking-input-in-async-function</code></a>
(<code>ASYNC250</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/map-without-explicit-strict"><code>map-without-explicit-strict</code></a>
(<code>B912</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/if-exp-instead-of-or-operator"><code>if-exp-instead-of-or-operator</code></a>
(<code>FURB110</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/single-item-membership-test"><code>single-item-membership-test</code></a>
(<code>FURB171</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/missing-maxsplit-arg"><code>missing-maxsplit-arg</code></a>
(<code>PLC0207</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/unnecessary-lambda"><code>unnecessary-lambda</code></a>
(<code>PLW0108</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/unnecessary-empty-iterable-within-deque-call"><code>unnecessary-empty-iterable-within-deque-call</code></a>
(<code>RUF037</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/in-empty-collection"><code>in-empty-collection</code></a>
(<code>RUF060</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/legacy-form-pytest-raises"><code>legacy-form-pytest-raises</code></a>
(<code>RUF061</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/non-octal-permissions"><code>non-octal-permissions</code></a>
(<code>RUF064</code>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.15.0</h2>
<p>Released on 2026-02-03.</p>
<p>Check out the <a href="https://astral.sh/blog/ruff-v0.15.0">blog
post</a> for a migration
guide and overview of the changes!</p>
<h3>Breaking changes</h3>
<ul>
<li>
<p>Ruff now formats your code according to the 2026 style guide. See the
formatter section below or in the blog post for a detailed list of
changes.</p>
</li>
<li>
<p>The linter now supports block suppression comments. For example, to
suppress <code>N803</code> for all parameters in this function:</p>
<pre lang="python"><code># ruff: disable[N803]
def foo(
    legacyArg1,
    legacyArg2,
    legacyArg3,
    legacyArg4,
): ...
# ruff: enable[N803]
</code></pre>
<p>See the <a
href="https://docs.astral.sh/ruff/linter/#block-level">documentation</a>
for more details.</p>
</li>
<li>
<p>The <code>ruff:alpine</code> Docker image is now based on Alpine 3.23
(up from 3.21).</p>
</li>
<li>
<p>The <code>ruff:debian</code> and <code>ruff:debian-slim</code> Docker
images are now based on Debian 13 &quot;Trixie&quot; instead of Debian
12 &quot;Bookworm.&quot;</p>
</li>
<li>
<p>Binaries for the <code>ppc64</code> (64-bit big-endian PowerPC)
architecture are no longer included in our releases. It should still be
possible to build Ruff manually for this platform, if needed.</p>
</li>
<li>
<p>Ruff now resolves all <code>extend</code>ed configuration files
before falling back on a default Python version.</p>
</li>
</ul>
<h3>Stabilization</h3>
<p>The following rules have been stabilized and are no longer in
preview:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-http-call-httpx-in-async-function"><code>blocking-http-call-httpx-in-async-function</code></a>
(<code>ASYNC212</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-path-method-in-async-function"><code>blocking-path-method-in-async-function</code></a>
(<code>ASYNC240</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-input-in-async-function"><code>blocking-input-in-async-function</code></a>
(<code>ASYNC250</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/map-without-explicit-strict"><code>map-without-explicit-strict</code></a>
(<code>B912</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/if-exp-instead-of-or-operator"><code>if-exp-instead-of-or-operator</code></a>
(<code>FURB110</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/single-item-membership-test"><code>single-item-membership-test</code></a>
(<code>FURB171</code>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ce5f7b6127"><code>ce5f7b6</code></a>
Bump 0.15.0 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23055">#23055</a>)</li>
<li><a
href="b4e40f539c"><code>b4e40f5</code></a>
[ty] Fix <code>__contains__</code> to respect descriptors (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23056">#23056</a>)</li>
<li><a
href="848cb72dc1"><code>848cb72</code></a>
[ty] Fix narrowing of nonlocal variables with conditional assignments
(<a
href="https://redirect.github.com/astral-sh/ruff/issues/22966">#22966</a>)</li>
<li><a
href="da7f33af22"><code>da7f33a</code></a>
[ty] Add a diagnostic for <code>Final</code> without assignment (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23001">#23001</a>)</li>
<li><a
href="e65f9a6b03"><code>e65f9a6</code></a>
Document markdown formatting feature (<a
href="https://redirect.github.com/astral-sh/ruff/issues/22990">#22990</a>)</li>
<li><a
href="c0c1b985c9"><code>c0c1b98</code></a>
Format markdown code blocks with line-by-line regex parse (<a
href="https://redirect.github.com/astral-sh/ruff/issues/22996">#22996</a>)</li>
<li><a
href="9f8f3e196b"><code>9f8f3e1</code></a>
Allow positional-only params with defaults in method overrides (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23037">#23037</a>)</li>
<li><a
href="ef83810e11"><code>ef83810</code></a>
[ty] ecosystem-analyzer: Support bare git repositories (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23054">#23054</a>)</li>
<li><a
href="54dfee4cb8"><code>54dfee4</code></a>
Customize where the <code>fix_title</code> sub-diagnostic appears (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23044">#23044</a>)</li>
<li><a
href="b53460799b"><code>b534607</code></a>
2026 Ruff Formatter Style (<a
href="https://redirect.github.com/astral-sh/ruff/issues/22735">#22735</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.12.12...0.15.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `mypy` from 1.18.2 to 1.19.1
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/python/mypy/blob/master/CHANGELOG.md">mypy's
changelog</a>.</em></p>
<blockquote>
<h3>Mypy 1.19.1</h3>
<ul>
<li>Fix noncommutative joins with bounded TypeVars (Shantanu, PR <a
href="https://redirect.github.com/python/mypy/pull/20345">20345</a>)</li>
<li>Respect output format for cached runs by serializing raw errors in
cache metas (Ivan Levkivskyi, PR <a
href="https://redirect.github.com/python/mypy/pull/20372">20372</a>)</li>
<li>Allow <code>types.NoneType</code> in match cases (A5rocks, PR <a
href="https://redirect.github.com/python/mypy/pull/20383">20383</a>)</li>
<li>Fix mypyc generator regression with empty tuple (BobTheBuidler, PR
<a
href="https://redirect.github.com/python/mypy/pull/20371">20371</a>)</li>
<li>Fix crash involving Unpack-ed TypeVarTuple (Shantanu, PR <a
href="https://redirect.github.com/python/mypy/pull/20323">20323</a>)</li>
<li>Fix crash on star import of redefinition (Ivan Levkivskyi, PR <a
href="https://redirect.github.com/python/mypy/pull/20333">20333</a>)</li>
<li>Fix crash on typevar with forward ref used in other module (Ivan
Levkivskyi, PR <a
href="https://redirect.github.com/python/mypy/pull/20334">20334</a>)</li>
<li>Fail with an explicit error on PyPy (Ivan Levkivskyi, PR <a
href="https://redirect.github.com/python/mypy/pull/20389">20389</a>)</li>
</ul>
<h3>Acknowledgements</h3>
<p>Thanks to all mypy contributors who contributed to this release:</p>
<ul>
<li>A5rocks</li>
<li>BobTheBuidler</li>
<li>bzoracler</li>
<li>Chainfire</li>
<li>Christoph Tyralla</li>
<li>David Foster</li>
<li>Frank Dana</li>
<li>Guo Ci</li>
<li>iap</li>
<li>Ivan Levkivskyi</li>
<li>James Hilton-Balfe</li>
<li>jhance</li>
<li>Joren Hammudoglu</li>
<li>Jukka Lehtosalo</li>
<li>KarelKenens</li>
<li>Kevin Kannammalil</li>
<li>Marc Mueller</li>
<li>Michael Carlstrom</li>
<li>Michael J. Sullivan</li>
<li>Piotr Sawicki</li>
<li>Randolf Scholz</li>
<li>Shantanu</li>
<li>Sigve Sebastian Farstad</li>
<li>sobolevn</li>
<li>Stanislav Terliakov</li>
<li>Stephen Morton</li>
<li>Theodore Ando</li>
<li>Thiago J. Barbalho</li>
<li>wyattscarpenter</li>
</ul>
<p>I’d also like to thank my employer, Dropbox, for supporting mypy
development.</p>
<h2>Mypy 1.18</h2>
<p>We’ve just uploaded mypy 1.18.1 to the Python Package Index (<a
href="https://pypi.org/project/mypy/">PyPI</a>).
Mypy is a static type checker for Python. This release includes new
features, performance</p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="412c19a6bd"><code>412c19a</code></a>
Bump version to 1.19.1</li>
<li><a
href="20aea0a6ca"><code>20aea0a</code></a>
Update changelog for 1.19.1 (<a
href="https://redirect.github.com/python/mypy/issues/20414">#20414</a>)</li>
<li><a
href="2b23b50752"><code>2b23b50</code></a>
Serialize raw errors in cache metas (<a
href="https://redirect.github.com/python/mypy/issues/20372">#20372</a>)</li>
<li><a
href="f60f90fb88"><code>f60f90f</code></a>
Fail on PyPy in main instead of setup.py (<a
href="https://redirect.github.com/python/mypy/issues/20389">#20389</a>)</li>
<li><a
href="58d485b4ea"><code>58d485b</code></a>
Fail with an explicit error on PyPy (<a
href="https://redirect.github.com/python/mypy/issues/20384">#20384</a>)</li>
<li><a
href="a4b31a2678"><code>a4b31a2</code></a>
Allow <code>types.NoneType</code> in match cases (<a
href="https://redirect.github.com/python/mypy/issues/20383">#20383</a>)</li>
<li><a
href="8a6eff4784"><code>8a6eff4</code></a>
[mypyc] fix generator regression with empty tuple (<a
href="https://redirect.github.com/python/mypy/issues/20371">#20371</a>)</li>
<li><a
href="70eceea682"><code>70eceea</code></a>
Fix noncommutative joins with bounded TypeVars (<a
href="https://redirect.github.com/python/mypy/issues/20345">#20345</a>)</li>
<li><a
href="3890fc49bf"><code>3890fc4</code></a>
Fix crash involving Unpack-ed TypeVarTuple (<a
href="https://redirect.github.com/python/mypy/issues/20323">#20323</a>)</li>
<li><a
href="c93d917a86"><code>c93d917</code></a>
Fix crash on star import of redefinition (<a
href="https://redirect.github.com/python/mypy/issues/20333">#20333</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/python/mypy/compare/v1.18.2...v1.19.1">compare
view</a></li>
</ul>
</details>
<br />

Updates `pytest` from 8.4.2 to 9.0.2
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest/releases">pytest's
releases</a>.</em></p>
<blockquote>
<h2>9.0.2</h2>
<h1>pytest 9.0.2 (2025-12-06)</h1>
<h2>Bug fixes</h2>
<ul>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13896">#13896</a>:
The terminal progress feature added in pytest 9.0.0 has been disabled by
default, except on Windows, due to compatibility issues with some
terminal emulators.</p>
<p>You may enable it again by passing <code>-p terminalprogress</code>.
We may enable it by default again once compatibility improves in the
future.</p>
<p>Additionally, when the environment variable <code>TERM</code> is
<code>dumb</code>, the escape codes are no longer emitted, even if the
plugin is enabled.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13904">#13904</a>:
Fixed the TOML type of the <code>tmp_path_retention_count</code>
settings in the API reference from number to string.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13946">#13946</a>:
The private <code>config.inicfg</code> attribute was changed in a
breaking manner in pytest 9.0.0.
Due to its usage in the ecosystem, it is now restored to working order
using a compatibility shim.
It will be deprecated in pytest 9.1 and removed in pytest 10.</p>
</li>
<li>
<p><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13965">#13965</a>:
Fixed quadratic-time behavior when handling <code>unittest</code>
subtests in Python 3.10.</p>
</li>
</ul>
<h2>Improved documentation</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/4492">#4492</a>:
The API Reference now contains cross-reference-able documentation of
<code>pytest's command-line flags
&lt;command-line-flags&gt;</code>.</li>
</ul>
<h2>9.0.1</h2>
<h1>pytest 9.0.1 (2025-11-12)</h1>
<h2>Bug fixes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13895">#13895</a>:
Restore support for skipping tests via <code>raise
unittest.SkipTest</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13896">#13896</a>:
The terminal progress plugin added in pytest 9.0 is now automatically
disabled when iTerm2 is detected, it generated desktop notifications
instead of the desired functionality.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13904">#13904</a>:
Fixed the TOML type of the verbosity settings in the API reference from
number to string.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13910">#13910</a>:
Fixed <!-- raw HTML omitted -->UserWarning: Do not expect
file_or_dir<!-- raw HTML omitted --> on some earlier Python 3.12 and
3.13 point versions.</li>
</ul>
<h2>Packaging updates and notes for downstreams</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>:
The tox configuration has been adjusted to make sure the desired
version string can be passed into its <code>package_env</code> through
the <code>SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST</code> environment
variable as a part of the release process -- by
<code>webknjaz</code>.</li>
</ul>
<h2>Contributor-facing changes</h2>
<ul>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13891">#13891</a>,
<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13942">#13942</a>:
The CI/CD part of the release automation is now capable of
creating GitHub Releases without having a Git checkout on
disk -- by <code>bluetech</code> and <code>webknjaz</code>.</li>
<li><a
href="https://redirect.github.com/pytest-dev/pytest/issues/13933">#13933</a>:
The tox configuration has been adjusted to make sure the desired
version string can be passed into its <code>package_env</code> through
the <code>SETUPTOOLS_SCM_PRETEND_VERSION_FOR_PYTEST</code> environment
variable as a part of the release process -- by
<code>webknjaz</code>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3d10b5148e"><code>3d10b51</code></a>
Prepare release version 9.0.2</li>
<li><a
href="188750b725"><code>188750b</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14030">#14030</a>
from pytest-dev/patchback/backports/9.0.x/1e4b01d1f...</li>
<li><a
href="b7d7bef90c"><code>b7d7bef</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14014">#14014</a>
from bluetech/compat-note</li>
<li><a
href="bd08e85ac7"><code>bd08e85</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14013">#14013</a>
from pytest-dev/patchback/backports/9.0.x/922b60377...</li>
<li><a
href="bc78386299"><code>bc78386</code></a>
Add CLI options reference documentation (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13930">#13930</a>)</li>
<li><a
href="5a4e398ce8"><code>5a4e398</code></a>
Fix docs typo (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/14005">#14005</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/14008">#14008</a>)</li>
<li><a
href="d7ae6df394"><code>d7ae6df</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/14006">#14006</a>
from pytest-dev/maintenance/update-plugin-list-tmpl...</li>
<li><a
href="556f6a22e1"><code>556f6a2</code></a>
pre-commit: fix rst-lint after new release (<a
href="https://redirect.github.com/pytest-dev/pytest/issues/13999">#13999</a>)
(<a
href="https://redirect.github.com/pytest-dev/pytest/issues/14001">#14001</a>)</li>
<li><a
href="c60fbe63a2"><code>c60fbe6</code></a>
Fix quadratic-time behavior when handling <code>unittest</code> subtests
in Python 3.10 ...</li>
<li><a
href="73d9b01118"><code>73d9b01</code></a>
Merge pull request <a
href="https://redirect.github.com/pytest-dev/pytest/issues/13995">#13995</a>
from nicoddemus/patchback/backports/9.0.x/1b5200c0f...</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest/compare/8.4.2...9.0.2">compare
view</a></li>
</ul>
</details>
<br />

Updates `syrupy` from 4.9.1 to 5.1.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/syrupy-project/syrupy/releases">syrupy's
releases</a>.</em></p>
<blockquote>
<h2>v5.1.0</h2>
<h1><a
href="https://github.com/syrupy-project/syrupy/compare/v5.0.0...v5.1.0">5.1.0</a>
(2026-01-25)</h1>
<h3>Features</h3>
<ul>
<li>add serializer plugin system; plugins for data models (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1062">#1062</a>)
(<a
href="df9bc8f6b3">df9bc8f</a>)</li>
</ul>
<h2>v5.0.0</h2>
<h2>Syrupy 5.0.0</h2>
<p><em>(2025-09-28)</em></p>
<p>This release introduces new features, bug fixes, and a major license
change. It also includes several <strong>breaking changes</strong>, so
please review those carefully before upgrading.</p>
<hr />
<h3>New Features </h3>
<ul>
<li><strong>Add <code>--snapshot-dirname</code> option:</strong> A new
option, <code>--snapshot-dirname</code>, is now available to change the
default directory snapshots are stored in. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/810">#810</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/810">syrupy-project/syrupy#810</a>))</li>
<li><strong>Remove private underscore prefix:</strong> The unnecessary
underscore prefixes have been removed from public methods for better
code clarity. ([<a
href="8cfc9059d3">8cfc905</a>](<a
href="8cfc9059d3</a>))</li>
</ul>
<hr />
<h3>Bug Fixes 🐛</h3>
<ul>
<li><strong>Fix terminal summary for <code>xdist</code>
workers:</strong> Resolves an issue where the terminal summary was not
displayed correctly with <code>xdist</code> workers. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/978">#978</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/978">syrupy-project/syrupy#978</a>))</li>
<li><strong>Ensure <code>pytest_assertrepr_compare</code> hook is called
first:</strong> This change ensures that Syrupy's assertion hook takes
precedence, improving compatibility with other plugins. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/984">#984</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/984">syrupy-project/syrupy#984</a>))</li>
</ul>
<hr />
<h3>Breaking Changes ⚠️</h3>
<ul>
<li>
<p><strong>License change:</strong> The project has switched to the more
permissive <strong>MIT license</strong>. This change applies to all
versions from 5.0.0 and beyond. If you need to use the previous Apache
2.0 license, you must continue to use Syrupy versions 4.x or earlier.
([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/945">#945</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/945">syrupy-project/syrupy#945</a>))</p>
</li>
<li>
<p><strong>Python and pytest version requirements:</strong> Syrupy now
requires <strong>Python 3.10</strong> or higher. Support for Python 3.8
has been dropped as it reached its end of life in October 2024. The
minimum required version of <strong>pytest is v8</strong>. ([<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/904">#904</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/904">syrupy-project/syrupy#904</a>),
[<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1024">#1024</a>](<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1024">syrupy-project/syrupy#1024</a>))</p>
</li>
<li>
<p><strong>Method and constant name changes:</strong> Several methods
and constants have been renamed for improved clarity and to align with
public API standards.</p>
<ul>
<li>
<p><strong><code>SnapshotCollectionStorage</code></strong></p>
<ul>
<li><code>_read_snapshot_collection</code> -&gt;
<code>read_snapshot_collection</code></li>
<li><code>_read_snapshot_data_from_location</code> -&gt;
<code>read_snapshot_data_from_location</code></li>
<li><code>_write_snapshot_collection</code> -&gt;
<code>write_snapshot_collection</code></li>
<li><code>_get_file_basename</code> -&gt;
<code>get_file_basename</code></li>
<li><code>_file_extension</code> -&gt; <code>file_extension</code></li>
</ul>
</li>
<li>
<p><strong><code>AmberDataSerializer</code></strong></p>
<ul>
<li><code>_snapshot_sort_key</code> -&gt;
<code>snapshot_sort_key</code></li>
</ul>
</li>
<li>
<p><strong>Constants</strong></p>
<ul>
<li><code>SNAPSHOT_EMPTY_FOSSIL_KEY</code> -&gt;
<code>SNAPSHOT_EMPTY_COLLECTION_KEY</code></li>
<li><code>SNAPSHOT_UNKNOWN_FOSSIL_KEY</code> -&gt;
<code>SNAPSHOT_UNKNOWN_COLLECTION_KEY</code></li>
</ul>
</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/syrupy-project/syrupy/blob/main/CHANGELOG.md">syrupy's
changelog</a>.</em></p>
<blockquote>
<h1><a
href="https://github.com/syrupy-project/syrupy/compare/v5.0.0...v5.1.0">5.1.0</a>
(2026-01-25)</h1>
<h3>Features</h3>
<ul>
<li>add serializer plugin system; plugins for data models (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1062">#1062</a>)
(<a
href="df9bc8f6b3">df9bc8f</a>)</li>
</ul>
<h1><a
href="https://github.com/syrupy-project/syrupy/compare/v4.9.1...v5.0.0">5.0.0</a>
(2025-09-28)</h1>
<ul>
<li>Switch to MIT license (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/945">#945</a>)
(<a
href="d74d340f88">d74d340</a>)</li>
</ul>
<h3>Bug Fixes</h3>
<ul>
<li>Block terminal summary for xdist workers. (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/978">#978</a>)
(<a
href="33a848df7c">33a848d</a>)</li>
<li>ensure syrupy's pytest_assertrepr_compare hook is called first. (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/984">#984</a>)
(<a
href="eb0024d339">eb0024d</a>)</li>
</ul>
<h3>Code Refactoring</h3>
<ul>
<li>remove incorrect private underscore prefix from public methods (<a
href="8cfc9059d3">8cfc905</a>)</li>
</ul>
<h3>Features</h3>
<ul>
<li>add --snapshot-dirname option, close <a
href="https://redirect.github.com/syrupy-project/syrupy/issues/810">#810</a>
(<a
href="27135c7c86">27135c7</a>)</li>
<li>drop support for py3.8, raise min. pytest to v8 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/904">#904</a>)
(<a
href="a879ff15ad">a879ff1</a>)</li>
<li>update min. python version to 3.10 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1024">#1024</a>)
(<a
href="16b4113cd5">16b4113</a>)</li>
</ul>
<h3>BREAKING CHANGES</h3>
<ul>
<li>The following methods have been renamed:</li>
</ul>
<p>SnapshotCollectionStorage</p>
<ul>
<li>_read_snapshot_collection -&gt; read_snapshot_collection</li>
<li>_read_snapshot_data_from_location -&gt;
read_snapshot_data_from_location</li>
<li>_write_snapshot_collection -&gt; write_snapshot_collection</li>
<li>_get_file_basename -&gt; get_file_basename</li>
<li>_file_extension -&gt; file_extension</li>
</ul>
<p>AmberDataSerializer</p>
<ul>
<li>_snapshot_sort_key -&gt; snapshot_sort_key</li>
</ul>
<p>Renamed constants to improve clarity:</p>
<p>constants</p>
<ul>
<li>SNAPSHOT_EMPTY_FOSSIL_KEY -&gt; SNAPSHOT_EMPTY_COLLECTION_KEY</li>
<li>SNAPSHOT_UNKNOWN_FOSSIL_KEY -&gt;
SNAPSHOT_UNKNOWN_COLLECTION_KEY</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="7096efdee6"><code>7096efd</code></a>
chore(release): 5.1.0 [skip ci]</li>
<li><a
href="07aa00dd48"><code>07aa00d</code></a>
chore(deps): update dependency attrs to v25 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1063">#1063</a>)</li>
<li><a
href="1f29ae061e"><code>1f29ae0</code></a>
docs: add bwrob as a contributor for code (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1064">#1064</a>)</li>
<li><a
href="df9bc8f6b3"><code>df9bc8f</code></a>
feat: add serializer plugin system; plugins for data models (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1062">#1062</a>)</li>
<li><a
href="841257deaf"><code>841257d</code></a>
chore(deps): update dependency coverage to v7.13.1 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1061">#1061</a>)</li>
<li><a
href="2d8dfa7f7b"><code>2d8dfa7</code></a>
chore(deps): update codecov/codecov-action action to v5.5.2 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1056">#1056</a>)</li>
<li><a
href="f5f9ef7702"><code>f5f9ef7</code></a>
chore(deps): update dependency debugpy to v1.8.18 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1057">#1057</a>)</li>
<li><a
href="eaeb6ae11f"><code>eaeb6ae</code></a>
chore(deps): update dependency pytest to v9.0.2 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1055">#1055</a>)</li>
<li><a
href="263b23b768"><code>263b23b</code></a>
chore(deps): update python docker tag to v3.14.1 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1054">#1054</a>)</li>
<li><a
href="a0dd77b023"><code>a0dd77b</code></a>
chore(deps): update actions/checkout action to v6.0.1 (<a
href="https://redirect.github.com/syrupy-project/syrupy/issues/1053">#1053</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/syrupy-project/syrupy/compare/v4.9.1...v5.1.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `ruff` from 0.14.11 to 0.15.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.15.0</h2>
<h2>Release Notes</h2>
<p>Released on 2026-02-03.</p>
<p>Check out the <a href="https://astral.sh/blog/ruff-v0.15.0">blog
post</a> for a migration guide and overview of the changes!</p>
<h3>Breaking changes</h3>
<ul>
<li>
<p>Ruff now formats your code according to the 2026 style guide. See the
formatter section below or in the blog post for a detailed list of
changes.</p>
</li>
<li>
<p>The linter now supports block suppression comments. For example, to
suppress <code>N803</code> for all parameters in this function:</p>
<pre lang="python"><code># ruff: disable[N803]
def foo(
    legacyArg1,
    legacyArg2,
    legacyArg3,
    legacyArg4,
): ...
# ruff: enable[N803]
</code></pre>
<p>See the <a
href="https://docs.astral.sh/ruff/linter/#block-level">documentation</a>
for more details.</p>
</li>
<li>
<p>The <code>ruff:alpine</code> Docker image is now based on Alpine 3.23
(up from 3.21).</p>
</li>
<li>
<p>The <code>ruff:debian</code> and <code>ruff:debian-slim</code> Docker
images are now based on Debian 13 &quot;Trixie&quot; instead of Debian
12 &quot;Bookworm.&quot;</p>
</li>
<li>
<p>Binaries for the <code>ppc64</code> (64-bit big-endian PowerPC)
architecture are no longer included in our releases. It should still be
possible to build Ruff manually for this platform, if needed.</p>
</li>
<li>
<p>Ruff now resolves all <code>extend</code>ed configuration files
before falling back on a default Python version.</p>
</li>
</ul>
<h3>Stabilization</h3>
<p>The following rules have been stabilized and are no longer in
preview:</p>
<ul>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-http-call-httpx-in-async-function"><code>blocking-http-call-httpx-in-async-function</code></a>
(<code>ASYNC212</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-path-method-in-async-function"><code>blocking-path-method-in-async-function</code></a>
(<code>ASYNC240</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/blocking-input-in-async-function"><code>blocking-input-in-async-function</code></a>
(<code>ASYNC250</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/map-without-explicit-strict"><code>map-without-explicit-strict</code></a>
(<code>B912</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/if-exp-instead-of-or-operator"><code>if-exp-instead-of-or-operator</code></a>
(<code>FURB110</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/single-item-membership-test"><code>single-item-membership-test</code></a>
(<code>FURB171</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/missing-maxsplit-arg"><code>missing-maxsplit-arg</code></a>
(<code>PLC0207</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/unnecessary-lambda"><code>unnecessary-lambda</code></a>
(<code>PLW0108</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/unnecessary-empty-iterable-within-deque-call"><code>unnecessary-empty-iterable-within-deque-call</code></a>
(<code>RUF037</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/in-empty-collection"><code>in-empty-collection</code></a>
(<code>RUF060</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/legacy-form-pytest-raises"><code>legacy-form-pytest-raises</code></a>
(<code>RUF061</code>)</li>
<li><a
href="https://docs.astral.sh/ruff/rules/non-octal-permissions"><code>non-octal-permissions</code></a>
(<code>RUF064</code>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.15.0</h2>
<p>Released on 2026-02-03.</p>
<p>Check out the <a href="https://astral.sh/blog/ruff-v0.15.0">blog
post</a> for a migration
guide and overview of the changes!</p>
<h3>Breaking changes</h3>
<ul>
<li>
<p>Ruff now formats your code according to the 2026 style guide. See the
formatter section below or in the blog post for a detailed list of
changes.</p>
</li>
<li>
<p>The linter now supports block suppression comments. For example, to
suppress <code>N803</code> for all parameters in this function:</p>
<pre lang="python"><code># ruff: disable[N803]
def foo(
    legacyArg1,
    legacyArg2,
    legacyArg3,
    legacyArg4,
): ...
# ruff: enable[N803]
</code></pre>
<p>See the <a
href="https://docs.astral.sh/ruff/linter/#block-level">documentation</a>
for more details.</p>
</li>
<li>
<p>The <code>ruff:alpine</code> Docker image is now based on Alpine 3.23
(up from 3.21).</p>
</li>
<li>
<p>The <code>ruff:debian</code> and <code>ruff:debian-slim</code> Docker
images are now based on Debian 13 &quot;Trixie&quot; instead of Debian
12 &quot;Bookworm.&quot;</p>
</li>
<li>
<p>Binaries for the <code>ppc64</code> (64-bit big-endian PowerPC)
architecture are no longer included in our releases. It should still be
possible to build Ruff manually for this platform, if needed.</p>
</li>
<li>
<p>Ruff now resolves all <code>extend</code>ed configuration files
before falling back on a default Python version.</p>
</li>
</ul>
<h3>Stabilization</h3>
<p>The following rules have been stabilized and are no longer in
preview:</p>
<ul>
<li><a href="h...

_Description has been truncated_

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: John Kennedy <65985482+jkennedyvz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 23:30:23 -08:00
Rohan Disa
16f2c7d13b fix(text-splitters): reverse preserved elements iterator in HTMLSemanticPreservingSplitter (#34080) 2026-02-02 18:25:39 -05:00
Bodhi Russell Silberling
cb09f94e82 fix(text-splitters): fix typo 'seperator' -> 'separator' in test docs… (#34857)
…tring

(Replace this entire block of text)

Read the full contributing guidelines:
https://docs.langchain.com/oss/python/contributing/overview

Thank you for contributing to LangChain! Follow these steps to have your
pull request considered as ready for review.

1. PR title: Should follow the format: TYPE(SCOPE): DESCRIPTION

  - Examples:
    - fix(anthropic): resolve flag parsing error
    - feat(core): add multi-tenant support
    - test(openai): update API usage tests
- Allowed TYPE and SCOPE values:
https://github.com/langchain-ai/langchain/blob/master/.github/workflows/pr_lint.yml#L15-L33

2. PR description:

  - Write 1-2 sentences summarizing the change.
- If this PR addresses a specific issue, please include "Fixes
#ISSUE_NUMBER" in the description to automatically close the issue when
the PR is merged.
  - If there are any breaking changes, please clearly describe them.
- If this PR depends on another PR being merged first, please include
"Depends on #PR_NUMBER" in the description.

3. Run `make format`, `make lint` and `make test` from the root of the
package(s) you've modified.

  - We will not consider a PR unless these three are passing in CI.

Additional guidelines:

- We ask that if you use generative AI for your contribution, you
include a disclaimer.
- PRs should not touch more than one package unless absolutely
necessary.
- Do not update the `uv.lock` files or add dependencies to
`pyproject.toml` files (even optional ones) unless you have explicit
permission to do so by a maintainer.
2026-01-23 13:59:23 -05:00
Christophe Bornet
fd69425439 style(text-splitters): fix some ruff preview rules (#34665)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
2026-01-09 17:28:18 -05:00
Manas karthik
048de6dfb6 test(text-splitters): add edge case tests for CharacterTextSplitter (#34628) 2026-01-07 11:06:44 -05:00
Julia (Juli) Huang
cd5b36456a fix(text-splitters): HTMLSemanticPreservingSplitter nested preserved … (#34587)
Summary
Fixes an issue where HTMLSemanticPreservingSplitter failed to preserve
elements nested inside non-container tags. With these changes, preserved
elements are now correctly detected and handled at any nesting depth.

Root Cause
`_process_element()` only recursed into a small set of hard-coded
container tags (`html`, `body`, `div`, `main`). For other tags, the
subtree was flattened into text, preventing nested preserved elements
(inside `<p>`, `<section>`, `<article>`, etc.) from being detected.


Fix
- Updated traversal logic in _process_element (html.py) to recursively
process child elements for any tag that contains nested elements
- Avoided duplicate text extraction
- Preserved correct placeholder ordering
- Treated leaf nodes as text only

Tests
Adds regression tests covering preserved elements nested inside
non-container tags, including:
- table inside section
- nested divs
- code inside paragraph

All existing tests pass (make lint, format, test, etc).

Breaking changes
None.

Fixes
Fixes #31569

Disclaimer
GitHub Copilot was used to assist with test case design in
test_text_splitters.py and documentation comments; all code logic was
manually implemented and reviewed.

---------

Co-authored-by: julih <julih@julihs-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <github@mdrxy.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2026-01-05 10:28:27 -05:00
rari404
0f940d74b2 feat(text-splitters): add R programming language support (#34241) 2025-12-12 13:34:22 -05:00
Christophe Bornet
ef79c26f18 chore(cli,standard-tests,text-splitters): fix some ruff TC rules (#33934)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-11-12 14:06:31 -05:00
Mason Daugherty
e731ba1e47 style: more refs work (#33616) 2025-10-20 18:40:19 -04:00
Christophe Bornet
83901b30e3 chore(text-splitters): remove arg types from docstrings (#33406)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-10 11:37:53 -04:00
Mason Daugherty
6fc21afbc9 style: .. code-block:: admonition translations (#33400)
biiiiiiiiiiiiiiiigggggggg pass
2025-10-09 16:52:58 -04:00
Mason Daugherty
eaa6dcce9e release: v1.0.0 (#32567)
Co-authored-by: Mohammad Mohtashim <45242107+keenborder786@users.noreply.github.com>
Co-authored-by: Caspar Broekhuizen <caspar@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Sadra Barikbin <sadraqazvin1@yahoo.com>
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2025-10-02 10:49:42 -04:00
Mason Daugherty
e3efd1e891 test(text-splitters): capture beta warnings (#33113) 2025-09-25 01:30:20 -04:00
Mason Daugherty
d6769cf032 test(text-splitters): resolve pytest marker warning (#33112)
#33111
2025-09-25 01:29:42 -04:00
Hyunjoon Jeong
9cc85387d1 fix(text-splitters): add validation to prevent infinite loop and prevent empty token splitter (#32205)
### Description
1) Add validation to prevent infinite loop condition when
```tokenizer.tokens_per_chunk > tokenizer.chunk_overlap```
2) Avoid empty decoded chunk when splitter appends tokens

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-09-11 16:55:32 -04:00
Christophe Bornet
8b90eae455 chore(text-splitters): enable ruff docstring-code-format (#32854) 2025-09-08 16:40:11 -04:00
Christophe Bornet
0c3e8ccd0e chore(text-splitters): select ALL rules with exclusions (#32325)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-08 14:46:09 +00:00
Mason Daugherty
6b5fdfb804 release(text-splitters): 0.3.11 (#32770)
Fixes #32747

SpaCy integration test fixture was trying to use pip to download the
SpaCy language model (`en_core_web_sm`), but uv-managed environments
don't include pip by default. Fail test if not installed as opposed to
downloading.
2025-08-31 23:00:05 +00:00
Keyu Chen
03138f41a0 feat(text-splitters): add optional custom header pattern support (#31887)
## Description

This PR adds support for custom header patterns in
`MarkdownHeaderTextSplitter`, allowing users to define non-standard
Markdown header formats (like `**Header**`) and specify their hierarchy
levels.

**Issue:** Fixes #22738

**Dependencies:** None - this change has no new dependencies

**Key Changes:**
- Added optional `custom_header_patterns` parameter to support
non-standard header formats
- Enable splitting on patterns like `**Header**` and `***Header***`
- Maintain full backward compatibility with existing usage
- Added comprehensive tests for custom and mixed header scenarios

## Example Usage

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("**", "Chapter"),
    ("***", "Section"),
]

custom_header_patterns = {
    "**": 1,   # Level 1 headers
    "***": 2,  # Level 2 headers
}

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    custom_header_patterns=custom_header_patterns,
)

# Now **Chapter 1** is treated as a level 1 header
# And ***Section 1.1*** is treated as a level 2 header
```

## Testing

-  Added unit tests for custom header patterns
-  Added tests for mixed standard and custom headers
-  All existing tests pass (backward compatibility maintained)
-  Linting and formatting checks pass

---

The implementation provides a flexible solution while maintaining the
simplicity of the existing API. Users can continue using the splitter
exactly as before, with the new functionality being entirely opt-in
through the `custom_header_patterns` parameter.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 10:10:49 -04:00
Mason Daugherty
c31236264e chore: formatting across codebase (#32466) 2025-08-08 10:20:10 -04:00
Mason Daugherty
96cbd90cba fix: formatting issues in docstrings (#32265)
Ensures proper reStructuredText formatting by adding the required blank
line before closing docstring quotes, which resolves the "Block quote
ends without a blank line; unexpected unindent" warning.
2025-07-27 23:37:47 -04:00
Fabio Fontana
fd168e1c11 feat(text-splitters): add Visual Basic 6 support (#31173)
### **Description**

Add Visual Basic 6 support.

---

### **Issue**

No specific issue addressed.

---

### **Dependencies**

No additional dependencies required.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-07-14 13:51:16 +00:00
Christophe Bornet
060fc0e3c9 text-splitters: Add ruff rules FBT (#31935)
See [flake8-boolean-trap
(FBT)](https://docs.astral.sh/ruff/rules/#flake8-boolean-trap-fbt)
2025-07-09 18:36:58 -04:00
Michael Li
5b3e29f809 text splitters: add chunk_size and chunk_overlap validations (#31916)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
  - Example: "core: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
2025-07-08 12:22:33 -04:00
Christophe Bornet
451c90fefa text-splitters: Ruff autofixes (#31858)
Auto-fixes from ruff with rule `ALL`
2025-07-07 10:06:08 -04:00
Christophe Bornet
802d2bf249 text-splitters: Add ruff rule UP (pyupgrade) (#31841)
See https://docs.astral.sh/ruff/rules/#pyupgrade-up
All auto-fixed except `typing.AbstractSet` -> `collections.abc.Set`
2025-07-03 10:11:35 -04:00
Cole Murray
43eef43550 security: Remove xslt_path and harden XML parsers in HTMLSectionSplitter: package: langchain-text-splitters (#31819)
## Summary
- Removes the `xslt_path` parameter from HTMLSectionSplitter to
eliminate XXE attack vector
- Hardens XML/HTML parsers with secure configurations to prevent XXE
attacks
- Adds comprehensive security tests to ensure the vulnerability is fixed

  ## Context
This PR addresses a critical XXE vulnerability discovered in the
HTMLSectionSplitter component. The vulnerability allowed attackers to:
- Read sensitive local files (SSH keys, passwords, configuration files)
  - Perform Server-Side Request Forgery (SSRF) attacks
  - Exfiltrate data to attacker-controlled servers

  ## Changes Made
1. **Removed `xslt_path` parameter** - This eliminates the primary
attack vector where users could supply malicious XSLT files
2. **Hardened XML parsers** - Added security configurations to prevent
XXE attacks even with the default XSLT:
     - `no_network=True` - Blocks network access
- `resolve_entities=False` - Prevents entity expansion -
`load_dtd=False` - Disables DTD processing -
`XSLTAccessControl.DENY_ALL` - Blocks all file/network I/O in XSLT
transformations

3. **Added security tests** - New test file `test_html_security.py` with
comprehensive tests for various XXE attack vectors
4. **Updated existing tests** - Modified tests that were using the
removed `xslt_path` parameter

  ## Test Plan
  - [x] All existing tests pass
  - [x] New security tests verify XXE attacks are blocked
  - [x] Code passes linting and formatting checks
  - [x] Tested with both old and new versions of lxml


Twitter handle: @_colemurray
2025-07-02 15:24:08 -04:00
Raghu Kapur
2c9859956a text-splitters: fix stale header metadata in ExperimentalMarkdownSyntaxTextSplitter (#31622)
**Description:**

Previously, when transitioning from a deeper Markdown header (e.g., ###)
to a shallower one (e.g., ##), the
ExperimentalMarkdownSyntaxTextSplitter retained the deeper header in the
metadata.

This commit updates the `_resolve_header_stack` method to remove headers
at the same or deeper levels before appending the current header. As a
result, each chunk now reflects only the active header context.

Fixes unexpected metadata leakage across sections in nested Markdown
documents.

Additionally, test cases have been updated to:
- Validate correct header resolution and metadata assignment.
- Cover edge cases with nested headers and horizontal rules.

**Issue:** 
Fixes [#31596](https://github.com/langchain-ai/langchain/issues/31596)

**Dependencies:**
None

**Twitter handle:** -> [_RaghuKapur](https://twitter.com/_RaghuKapur)

**LinkedIn:** ->
[https://www.linkedin.com/in/raghukapur/](https://www.linkedin.com/in/raghukapur/)
2025-06-20 15:52:17 -04:00
Tom-Trumper
532e6455e9 text-splitters: Add keep_separator arg to HTMLSemanticPreservingSplitter (#31588)
### Description
Add keep_separator arg to HTMLSemanticPreservingSplitter and pass value
to instance of RecursiveCharacterTextSplitter used under the hood.
### Issue
Documents returned by `HTMLSemanticPreservingSplitter.split_text(text)`
are defaulted to use separators at beginning of page_content. [See third
and fourth document in example output from how-to
guide](https://python.langchain.com/docs/how_to/split_html/#using-htmlsemanticpreservingsplitter):
```
[Document(metadata={'Header 1': 'Main Title'}, page_content='This is an introductory paragraph with some basic content.'),
 Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content='This section introduces the topic'),
 Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content='. Below is a list: First item Second item Third item with bold text and a link Subsection 1.1: Details This subsection provides additional details'),
 Document(metadata={'Header 2': 'Section 1: Introduction'}, page_content=". Here's a table: Header 1 Header 2 Header 3 Row 1, Cell 1 Row 1, Cell 2 Row 1, Cell 3 Row 2, Cell 1 Row 2, Cell 2 Row 2, Cell 3"),
 Document(metadata={'Header 2': 'Section 2: Media Content'}, page_content='This section contains an image and a video: ![image:example_image_link.mp4](example_image_link.mp4) ![video:example_video_link.mp4](example_video_link.mp4)'),
 Document(metadata={'Header 2': 'Section 3: Code Example'}, page_content='This section contains a code block: <code:html> <div> <p>This is a paragraph inside a div.</p> </div> </code>'),
 Document(metadata={'Header 2': 'Conclusion'}, page_content='This is the conclusion of the document.')]
```
### Dependencies
None

@ttrumper3
2025-06-14 17:56:14 -04:00
Sumin Shin
683da2c9e9 text-splitters: Fix regex separator merge bug in CharacterTextSplitter (#31137)
**Description:**
Fix the merge logic in `CharacterTextSplitter.split_text` so that when
using a regex lookahead separator (`is_separator_regex=True`) with
`keep_separator=False`, the raw pattern is not re-inserted between
chunks.

**Issue:**
Fixes #31136 

**Dependencies:**
None

**Twitter handle:**
None

Since this is my first open-source PR, please feel free to point out any
mistakes, and I'll be eager to make corrections.
2025-05-10 15:42:03 -04:00
Christophe Bornet
8c5ae108dd text-splitters: Set strict mypy rules (#30900)
* Add strict mypy rules
* Fix mypy violations
* Add error codes to all type ignores
* Add ruff rule PGH003
* Bump mypy version to 1.15
2025-04-22 20:41:24 -07:00
Ben
789db7398b text-splitters: Add JSFrameworkTextSplitter for Handling JavaScript Framework Code (#28972)
## Description
This pull request introduces a new text splitter,
`JSFrameworkTextSplitter`, to the Langchain library. The
`JSFrameworkTextSplitter` extends the `RecursiveCharacterTextSplitter`
to handle JavaScript framework code effectively, including React (JSX),
Vue, and Svelte. It identifies and utilizes framework-specific component
tags and syntax elements as splitting points, alongside standard
JavaScript syntax. This ensures that code is divided at natural
boundaries, enhancing the parsing and processing of JavaScript and
framework-specific code.

### Key Features
- Supports React (JSX), Vue, and Svelte frameworks.
- Identifies and uses framework-specific tags and syntax elements as
natural splitting points.
- Extends the existing `RecursiveCharacterTextSplitter` for seamless
integration.

## Issue
No specific issue addressed.

## Dependencies
No additional dependencies required.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-03-17 23:32:33 +00:00
ccurme
d172984c91 infra: migrate to uv (#29566) 2025-02-06 13:36:26 -05:00
Christophe Bornet
836c791829 text-splitters: Bump ruff version to 0.9 (#29231)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:27:58 +00:00
Ahmed Tammaa
d3ed9b86be text-splitters[minor]: Replace lxml and XSLT with BeautifulSoup in HTMLHeaderTextSplitter for Improved Large HTML File Processing (#27678)
This pull request updates the `HTMLHeaderTextSplitter` by replacing the
`split_text_from_file` method's implementation. The original method used
`lxml` and XSLT for processing HTML files, which caused
`lxml.etree.xsltapplyerror maxhead` when handling large HTML documents
due to limitations in the XSLT processor. Fixes #13149

By switching to BeautifulSoup (`bs4`), we achieve:

- **Improved Performance and Reliability:** BeautifulSoup efficiently
processes large HTML files without the errors associated with `lxml` and
XSLT.
- **Simplified Dependencies:** Removes the dependency on `lxml` and
external XSLT files, relying instead on the widely used `beautifulsoup4`
library.
- **Maintained Functionality:** The new method replicates the original
behavior, ensuring compatibility with existing code and preserving the
extraction of content and metadata.

**Issue:**

This change addresses issues related to processing large HTML files with
the existing `HTMLHeaderTextSplitter` implementation. It resolves
problems where users encounter lxml.etree.xsltapplyerror maxhead due to
large HTML documents.

**Dependencies:**

- **BeautifulSoup (`beautifulsoup4`):** The `beautifulsoup4` library is
now used for parsing HTML content.
  - Installation: `pip install beautifulsoup4`

**Code Changes:**

Updated the `split_text_from_file` method in `HTMLHeaderTextSplitter` as
follows:

```python
def split_text_from_file(self, file: Any) -> List[Document]:
    """Split HTML file using BeautifulSoup.

    Args:
        file: HTML file path or file-like object.

    Returns:
        List of Document objects with page_content and metadata.
    """
    from bs4 import BeautifulSoup
    from langchain.docstore.document import Document
    import bs4

    # Read the HTML content from the file or file-like object
    if isinstance(file, str):
        with open(file, 'r', encoding='utf-8') as f:
            html_content = f.read()
    else:
        # Assuming file is a file-like object
        html_content = file.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the header tags and their corresponding metadata keys
    headers_to_split_on = [tag[0] for tag in self.headers_to_split_on]
    header_mapping = dict(self.headers_to_split_on)

    documents = []

    # Find the body of the document
    body = soup.body if soup.body else soup

    # Find all header tags in the order they appear
    all_headers = body.find_all(headers_to_split_on)

    # If there's content before the first header, collect it
    first_header = all_headers[0] if all_headers else None
    if first_header:
        pre_header_content = ''
        for elem in first_header.find_all_previous():
            if isinstance(elem, bs4.Tag):
                text = elem.get_text(separator=' ', strip=True)
                if text:
                    pre_header_content = text + ' ' + pre_header_content
        if pre_header_content.strip():
            documents.append(Document(
                page_content=pre_header_content.strip(),
                metadata={}  # No metadata since there's no header
            ))
    else:
        # If no headers are found, return the whole content
        full_text = body.get_text(separator=' ', strip=True)
        if full_text.strip():
            documents.append(Document(
                page_content=full_text.strip(),
                metadata={}
            ))
        return documents

    # Process each header and its associated content
    for header in all_headers:
        current_metadata = {}
        header_name = header.name
        header_text = header.get_text(separator=' ', strip=True)
        current_metadata[header_mapping[header_name]] = header_text

        # Collect all sibling elements until the next header of the same or higher level
        content_elements = []
        for sibling in header.find_next_siblings():
            if sibling.name in headers_to_split_on:
                # Stop at the next header
                break
            if isinstance(sibling, bs4.Tag):
                content_elements.append(sibling)

        # Get the text content of the collected elements
        current_content = ''
        for elem in content_elements:
            text = elem.get_text(separator=' ', strip=True)
            if text:
                current_content += text + ' '

        # Create a Document if there is content
        if current_content.strip():
            documents.append(Document(
                page_content=current_content.strip(),
                metadata=current_metadata.copy()
            ))
        else:
            # If there's no content, but we have metadata, still create a Document
            documents.append(Document(
                page_content='',
                metadata=current_metadata.copy()
            ))

    return documents
```

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 16:10:37 -05:00
Luke
f69695069d text_splitters: Add HTMLSemanticPreservingSplitter (#25911)
**Description:** 

With current HTML splitters, they rely on secondary use of the
`RecursiveCharacterSplitter` to further chunk the document into
manageable chunks. The issue with this is it fails to maintain important
structures such as tables, lists, etc within HTML.

This Implementation of a HTML splitter, allows the user to define a
maximum chunk size, HTML elements to preserve in full, options to
preserve `<a>` href links in the output and custom handlers.

The core splitting begins with headers, similar to `HTMLHeaderSplitter`.
If these sections exceed the length of the `max_chunk_size` further
recursive splitting is triggered. During this splitting, elements listed
to preserve, will be excluded from the splitting process. This can cause
chunks to be slightly larger then the max size, depending on preserved
length. However, all contextual relevance of the preserved item remains
intact.

**Custom Handlers**: Sometimes, companies such as Atlassian have custom
HTML elements, that are not parsed by default with `BeautifulSoup`.
Custom handlers allows a user to provide a function to be ran whenever a
specific html tag is encountered. This allows the user to preserve and
gather information within custom html tags that `bs4` will potentially
miss during extraction.

**Dependencies:** User will need to install `bs4` in their project to
utilise this class

I have also added in `how_to` and unit tests, which require `bs4` to
run, otherwise they will be skipped.

Flowchart of process:


![HTMLSemanticPreservingSplitter](https://github.com/user-attachments/assets/20873c36-22ed-4c80-884b-d3c6f433f5a7)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 12:09:22 -05:00
Yuxin Chen
3256b5d6ae text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373)
- **Description:** 
This PR resolves an issue with the
`ExperimentalMarkdownSyntaxTextSplitter` class, which retains the
internal state across multiple calls to the `split_text` method. This
behaviour caused an unintended accumulation of chunks in `self`
variables, leading to incorrect outputs when processing multiple
Markdown files sequentially.

- Modified `libs\text-splitters\langchain_text_splitters\markdown.py` to
reset the relevant internal attributes at the start of each `split_text`
invocation. This ensures each call processes the input independently.
- Added unit tests in
`libs\text-splitters\tests\unit_tests\test_text_splitters.py` to verify
the fix and ensure the state does not persist across calls.

- **Issue:**  
Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).

- **Dependencies:**
No additional dependencies are introduced with this change.


- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.  
- [x] Ran `make format`, `make lint`, and `make test` to ensure
compliance with project standards.

---------

Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 20:27:59 +00:00
Antonio Lanza
b2102b8cc4 text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True (#27782)
This PR closes #27781

# Problem
The current implementation of `NLTKTextSplitter` is using
`sent_tokenize`. However, this `sent_tokenize` doesn't handle chars
between 2 tokenized sentences... hence, this behavior throws errors when
we are using `add_start_index=True`, as described in issue #27781. In
particular:
```python
from nltk.tokenize import sent_tokenize

output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
output2 = sent_tokenize("Innovation drives our success.        Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output2)
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```

# Solution
With this new `use_span_tokenize` parameter, we can use NLTK to create
sentences (with `span_tokenize`), but also add extra chars to be sure
that we still can map the chunks to the original text.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-16 19:53:15 +00:00
Matthew DeGenaro
66828f4ecc text-splitters[patch]: Modified SpacyTextSplitter to fully keep whitespace when strip_whitespace is false (#23272)
Previously, regardless of whether or not strip_whitespace was set to
true or false, the strip text method in the SpacyTextSplitter class used
`sent.text` to get the sentence. I modified this to include a ternary
such that if strip_whitespace is false, it uses `sent.text_with_ws`
I also modified the project.toml to include the spacy pipeline package
and to lock the numpy version, as higher versions break spacy.

- **Issue:** N/a
- **Dependencies:** None
2024-09-02 21:15:56 +00:00
ccurme
bc557a5663 text-splitters[patch]: fix typing for keep_separator (#25706) 2024-08-23 17:22:02 +00:00
Tamir Zitman
b3e1378f2b langchain : text_splitters Added PowerShell (#24582)
- **Description:** Added PowerShell support for text splitters language
include docs relevant update
  - **Issue:** None
  - **Dependencies:** None

---------

Co-authored-by: tzitman <tamir.zitman@intel.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-30 16:13:52 +00:00
bilk0h
3d54784e6d text-splitters: Fix/recursive json splitter data persistence issue (#21529)
Thank you for contributing to LangChain!

**Description:** Noticed an issue with when I was calling
`RecursiveJsonSplitter().split_json()` multiple times that I was getting
weird results. I found an issue where `chunks` list in the `_json_split`
method. If chunks is not provided when _json_split (which is the case
when split_json calls _json_split) then the same list is used for
subsequent calls to `_json_split`.


You can see this in the test case i also added to this commit.

Output should be: 
```
[{'a': 1, 'b': 2}]
[{'c': 3, 'd': 4}]
```

Instead you get:
```
[{'a': 1, 'b': 2}]
[{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
```

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-06-18 20:21:55 -07:00
Ryan Elston
86ee4f0daa text-splitters: Introduce Experimental Markdown Syntax Splitter (#22257)
#### Description
This MR defines a `ExperimentalMarkdownSyntaxTextSplitter` class. The
main goal is to replicate the functionality of the original
`MarkdownHeaderTextSplitter` which extracts the header stack as metadata
but with one critical difference: it keeps the whitespace of the
original text intact.

This draft reimplements the `MarkdownHeaderTextSplitter` with a very
different algorithmic approach. Instead of marking up each line of the
text individually and aggregating them back together into chunks, this
method builds each chunk sequentially and applies the metadata to each
chunk. This makes the implementation simpler. However, since it's
designed to keep white space intact its not a full drop in replacement
for the original. Since it is a radical implementation change to the
original code and I would like to get feedback to see if this is a
worthwhile replacement, should be it's own class, or is not a good idea
at all.

Note: I implemented the `return_each_line` parameter but I don't think
it's a necessary feature. I'd prefer to remove it.

This implementation also adds the following additional features:
- Splits out code blocks and includes the language in the `"Code"`
metadata key
- Splits text on the horizontal rule `---` as well
- The `headers_to_split_on` parameter is now optional - with sensible
defaults that can be overridden.

#### Issue
Keeping the whitespace keeps the paragraphs structure and the formatting
of the code blocks intact which allows the caller much more flexibility
in how they want to further split the individuals sections of the
resulting documents. This addresses the issues brought up by the
community in the following issues:
- https://github.com/langchain-ai/langchain/issues/20823
- https://github.com/langchain-ai/langchain/issues/19436
- https://github.com/langchain-ai/langchain/issues/22256

#### Dependencies
N/A

#### Twitter handle
@RyanElston

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-06-18 19:44:00 -07:00
Jiejun Tan
c8c67dde6f text-splitters[patch]: Fix HTMLSectionSplitter (#22812)
Update former pull request:
https://github.com/langchain-ai/langchain/pull/22654.

Modified `langchain_text_splitters.HTMLSectionSplitter`, where in the
latest version `dict` data structure is used to store sections from a
html document, in function `split_html_by_headers`. The header/section
element names serve as dict keys. This can be a problem when duplicate
header/section element names are present in a single html document.
Latter ones can replace former ones with the same name. Therefore some
contents can be miss after html text splitting is conducted.

Using a list to store sections can hopefully solve the problem. A Unit
test considering duplicate header names has been added.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-06-14 22:40:39 +00:00
Tom Clelford
c599732e1a text-splitters[patch]: fix HTMLSectionSplitter parsing of xslt paths (#22176)
## Description
This PR allows passing the HTMLSectionSplitter paths to xslt files. It
does so by fixing two trivial bugs with how passed paths were being
handled. It also changes the default value of the param `xslt_path` to
`None` so the special case where the file was part of the langchain
package could be handled.

## Issue
#22175
2024-06-03 20:26:59 +00:00
Christos Boulmpasakos
c3bcfad66d text-splitters[patch]: Extend TextSplitter:keep_separator functionality (#21130)
**Description:** Added extra functionality to `CharacterTextSplitter`,
`TextSplitter` classes.
The user can select whether to append the separator to the previous
chunk with `keep_separator='end' ` or else prepend to the next chunk.
Previous functionality prepended by default to next chunk.
  
**Issue:** Fixes #20908

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-05-22 13:17:45 -07:00
Lei Zhang
2cd907ad7e text-splitters[patch]: fix MarkdownHeaderTextSplitter fails to parse headers with non-printable characters (#20645)
Description: MarkdownHeaderTextSplitter Fails to Parse Headers with
non-printable characters. more #20643

The following is the official test case. Just replacing `# Foo\n\n` with
`\ufeff# Foo\n\n` will cause the test case to fail.

chunk metadata is empty

```python
def test_md_header_text_splitter_1() -> None:
    """Test markdown splitter by header: Case 1."""

    markdown_document = (
        "\ufeff# Foo\n\n"
        "    ## Bar\n\n"
        "Hi this is Jim\n\n"
        "Hi this is Joe\n\n"
        " ## Baz\n\n"
        " Hi this is Molly"
    )
    headers_to_split_on = [
        ("#", "Header 1"),
        ("##", "Header 2"),
    ]
    markdown_splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on,
    )
    output = markdown_splitter.split_text(markdown_document)
    expected_output = [
        Document(
            page_content="Hi this is Jim  \nHi this is Joe",
            metadata={"Header 1": "Foo", "Header 2": "Bar"},
        ),
        Document(
            page_content="Hi this is Molly",
            metadata={"Header 1": "Foo", "Header 2": "Baz"},
        ),
    ]
    assert output == expected_output
```

twitter: @coolbeevip

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-25 00:07:42 +00:00
saberuster
160bcaeb93 text-splitters[minor]: Add lua code splitting (#20421)
- **Description:** Complete the support for Lua code in
langchain.text_splitter module.
- **Dependencies:** No
- **Twitter handle:** @saberuster

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-13 22:42:51 +00:00
Mahdi Setayesh
c28efb878c text-splitters[minor]: Adding a new section aware splitter to langchain (#16526)
- **Description:** the layout of html pages can be variant based on the
bootstrap framework or the styles of the pages. So we need to have a
splitter to transform the html tags to a proper layout and then split
the html content based on the provided list of tags to determine its
html sections. We are using BS4 library along with xslt structure to
split the html content using an section aware approach.
  - **Dependencies:** No new dependencies
  - **Twitter handle:** @m_setayesh

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-01 20:32:26 +00:00