mirror of https://github.com/hwchase17/langchain.git (synced 2025-09-04 20:46:45 +00:00)
text-splitters: Fix/recursive json splitter data persistence issue (#21529)
**Description:** When calling `RecursiveJsonSplitter().split_json()` multiple times, I was getting unexpected results. The cause is the `chunks` list in the `_json_split` method: if `chunks` is not provided when `_json_split` is called (which is the case when `split_json` calls `_json_split`), the same list is reused for subsequent calls to `_json_split`. The test case added in this commit demonstrates this.

Output should be:

```
[{'a': 1, 'b': 2}]
[{'c': 3, 'd': 4}]
```

Instead you get:

```
[{'a': 1, 'b': 2}]
[{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
```

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
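The symptom described in the commit message can be reproduced outside LangChain. The sketch below assumes the shared list came from a mutable default argument, which is consistent with the description but is not shown in this hunk. `split_shared` and `split_fresh` are hypothetical stand-ins, not the library's `_json_split`.

```python
from typing import Any, Dict, List, Optional


def split_shared(
    data: Dict[str, Any], chunks: List[Dict[str, Any]] = [{}]
) -> List[Dict[str, Any]]:
    # Hypothetical stand-in. The default list is built once, when the function
    # is defined, so every call that omits `chunks` writes into the same object.
    chunks[-1].update(data)
    return chunks


def split_fresh(
    data: Dict[str, Any], chunks: Optional[List[Dict[str, Any]]] = None
) -> List[Dict[str, Any]]:
    # Hypothetical stand-in. A new list is created on each call that omits `chunks`.
    chunks = chunks if chunks is not None else [{}]
    chunks[-1].update(data)
    return chunks


# Copy each result right away, because the shared list keeps mutating.
shared_first = [dict(c) for c in split_shared({"a": 1, "b": 2})]
shared_second = [dict(c) for c in split_shared({"c": 3, "d": 4})]

fresh_first = split_fresh({"a": 1, "b": 2})
fresh_second = split_fresh({"c": 3, "d": 4})

print(shared_first, shared_second)
print(fresh_first, fresh_second)
```

With the shared default, the second call still sees the keys from the first call. With the `None` sentinel, each call starts from an empty chunk.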
@@ -55,7 +55,7 @@ class RecursiveJsonSplitter:
         Split json into maximum size dictionaries while preserving structure.
         """
         current_path = current_path or []
-        chunks = chunks or [{}]
+        chunks = chunks if chunks is not None else [{}]
         if isinstance(data, dict):
             for key, value in data.items():
                 new_path = current_path + [key]
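The changed line swaps a truthiness check for an explicit `None` check. As general Python behavior, the two differ when a caller passes an empty list: `or` discards it and substitutes a new one, while `is not None` keeps the caller's object. Whether `_json_split` is ever called with an empty list is not shown in this hunk, so the sketch below uses hypothetical helpers only to illustrate the difference.

```python
from typing import List, Optional


def collect_with_or(item: str, chunks: Optional[List[str]] = None) -> List[str]:
    # An empty list is falsy, so `or` replaces it with a brand-new list.
    chunks = chunks or []
    chunks.append(item)
    return chunks


def collect_with_none_check(item: str, chunks: Optional[List[str]] = None) -> List[str]:
    # Only a missing argument (None) triggers the fallback.
    chunks = chunks if chunks is not None else []
    chunks.append(item)
    return chunks


caller_list_or: List[str] = []
result_or = collect_with_or("x", caller_list_or)

caller_list_check: List[str] = []
result_check = collect_with_none_check("x", caller_list_check)

# With `or`, the caller's list is left untouched and a different list is returned.
print(caller_list_or, result_or is caller_list_or)
# With the None check, the caller's list receives the item.
print(caller_list_check, result_check is caller_list_check)
```

The explicit check makes the intent unambiguous: fall back to a default only when no list was supplied.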