text-splitters: Fix/recursive json splitter data persistence issue (#21529)

Thank you for contributing to LangChain!

**Description:** Noticed an issue with when I was calling
`RecursiveJsonSplitter().split_json()` multiple times that I was getting
weird results. I found an issue where `chunks` list in the `_json_split`
method. If chunks is not provided when _json_split (which is the case
when split_json calls _json_split) then the same list is used for
subsequent calls to `_json_split`.


You can see this in the test case i also added to this commit.

Output should be: 
```
[{'a': 1, 'b': 2}]
[{'c': 3, 'd': 4}]
```

Instead you get:
```
[{'a': 1, 'b': 2}]
[{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
```

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
This commit is contained in:
bilk0h
2024-06-18 20:21:55 -07:00
committed by GitHub
parent 9ab7a6df39
commit 3d54784e6d
2 changed files with 22 additions and 1 deletions

View File

@@ -55,7 +55,7 @@ class RecursiveJsonSplitter:
Split json into maximum size dictionaries while preserving structure.
"""
current_path = current_path or []
chunks = chunks or [{}]
chunks = chunks if chunks is not None else [{}]
if isinstance(data, dict):
for key, value in data.items():
new_path = current_path + [key]