core: Handle unterminated escape character when parsing partial JSON (#29065)

**Description**
Currently, when parsing partial JSON, if a string ends with an escape
character, the whole key/value pair is removed. For example:

```
>>> from langchain_core.utils.json import parse_partial_json
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> 
>>> parse_partial_json(my_str)
{'foo': 'bar'}
```

With this fix, I would expect `parse_partial_json()` to return:
```
>>> from langchain_core.utils.json import parse_partial_json
>>> 
>>> my_str = '{"foo": "bar", "baz": "qux\\'
>>> parse_partial_json(my_str)
{'foo': 'bar', 'baz': 'qux'}
```
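To illustrate why a dangling backslash is a problem, here is a minimal sketch using only the standard-library `json` module. It is not the `langchain_core` implementation; it only shows that appending a closing quote after a trailing backslash produces `\"`, an escaped quote, so the string never terminates, while dropping the backslash first gives valid JSON.

```python
import json

partial = '{"foo": "bar", "baz": "qux\\'  # ends with a single backslash

# Naive completion: close the string and the object as-is.
naive = partial + '"}'
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    # The trailing backslash escapes the appended quote,
    # so the string value is never closed.
    naive_ok = False

# Dropping the dangling backslash before closing yields valid JSON.
fixed = partial[:-1] + '"}'
result = json.loads(fixed)

print(naive_ok)  # False
print(result)    # {'foo': 'bar', 'baz': 'qux'}
```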

Notes:
1. It could be argued that the current behavior is still desirable.
2. I ran into this issue while streaming output from an LLM, when a
chunk happened to end with `\\`.
3. I haven't included tests. Will do if the change is accepted.
4. This is especially troublesome when this function is used by

187131c55c/libs/core/langchain_core/output_parsers/transform.py (L111)

since, for example, if the received sequence of chunks is
`{"foo": "b`, `ar\\`, then the first call to `self.parse_result()`
returns:
```
{"foo": "b"}
```
and the second time:
```
{}
```
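The two-chunk sequence above can be reproduced with a small, self-contained sketch. `complete_partial_json` is a hypothetical, heavily simplified stand-in for `parse_partial_json()` (it only handles input that ends inside a string value or after a complete value), written to show the effect of dropping an unterminated escape character before closing the string. It is an assumption for illustration, not the library code.

```python
import json
from typing import Any


def complete_partial_json(s: str) -> Any:
    """Hypothetical, simplified completion of a truncated JSON document."""
    closers = []       # closing brackets still owed, innermost last
    chars = []         # characters kept for the final parse
    in_string = False
    escaped = False    # True if the previous char was an unpaired backslash

    for ch in s:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
        chars.append(ch)

    if in_string:
        if escaped:
            chars.pop()  # drop the unterminated escape character
        chars.append('"')

    return json.loads("".join(chars) + "".join(reversed(closers)))


# Simulate a stream: parse the accumulated text after each chunk.
chunks = ['{"foo": "b', 'ar\\']
buffer = ""
results = []
for chunk in chunks:
    buffer += chunk
    results.append(complete_partial_json(buffer))

print(results)  # [{'foo': 'b'}, {'foo': 'bar'}]
```

With the escape handling in place, the second result extends the first instead of collapsing to `{}`.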

Co-authored-by: Erick Friis <erick@langchain.dev>
Bruno Alvisio 2025-02-07 15:18:21 -08:00 committed by GitHub
parent 0040d93b09
commit 3eaf561561
2 changed files with 3 additions and 0 deletions

@ -94,6 +94,8 @@ def parse_partial_json(s: str, *, strict: bool = False) -> Any:
    # If we're still inside a string at the end of processing,
    # we need to close the string.
    if is_inside_string:
        if escaped:  # Remove unterminated escape character
            new_chars.pop()
        new_chars.append('"')
    # Reverse the stack to get the closing characters.

@ -242,6 +242,7 @@ TEST_CASES_PARTIAL = [
    ('{"foo": "bar", "bar":', '{"foo": "bar"}'),
    ('{"foo": "bar", "bar"', '{"foo": "bar"}'),
    ('{"foo": "bar", ', '{"foo": "bar"}'),
    ('{"foo":"bar\\', '{"foo": "bar"}'),
]