mirror of
https://github.com/hwchase17/langchain.git
synced 2026-05-17 21:08:00 +00:00
## Summary Removes two incorrect separators from `get_separators_for_language()` in `RecursiveCharacterTextSplitter`: - **C#**: `"\nimplements "` is a Java keyword. C# uses `:` for interface implementation. This separator never matches valid C# source code. - **Elixir**: `"\nwhile "` does not exist in Elixir. The language uses recursion and `Enum.reduce_while/3` instead of while loops. Both are dead separators that silently degrade chunking quality by occupying positions in the separator priority list without contributing useful split points. ## Tests Added two targeted tests: - `test_csharp_separators_no_java_keywords`: verifies `"\nimplements "` is not in the C# separator list - `test_elixir_separators_no_while`: verifies `"\nwhile "` is not in the Elixir separator list Existing `test_csharp_code_splitter` continues to pass (no change to expected output since `implements` never matched valid C# code). Full suite: 129 passed, 0 failed. Fixes #37030