langchain/libs/community/tests/unit_tests/document_loaders/parsers
Anusha Karkhanis 26bdf40072
Langchain_Community: SQL LanguageParser (#28430)
## Description
(This PR has contributions from @khushiDesai, @ashvini8, and
@ssumaiyaahmed).

This PR addresses **Issue #11229**, which requests SQL support in
document parsing. The support is integrated into the generic TreeSitter
parsing library, allowing LangChain users to load SQL codebases as
smaller, manageable "documents."
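The sketch below is a rough, stdlib-only illustration of what loading SQL as "smaller, manageable documents" can look like: one record per statement, plus one simplified record for the whole file. It is not the `LanguageParser` API. The `split_sql_into_documents` helper and the metadata keys (`source`, `language`, `content_type`) are hypothetical. The sketch also assumes every statement ends with `;` and that no `;` appears inside a string or comment, which a tree-sitter grammar does not need to assume.

```python
from typing import Any, Dict, List


def split_sql_into_documents(source: str, path: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: one document per SQL statement, plus a simplified view.

    Assumes every statement ends with ';' and that no ';' appears inside a
    string literal or comment (a real parser such as tree-sitter does not
    need this assumption).
    """
    statements = [s.strip() for s in source.split(";") if s.strip()]
    docs: List[Dict[str, Any]] = [
        {
            "page_content": stmt + ";",
            "metadata": {"source": path, "language": "sql", "content_type": "statement"},
        }
        for stmt in statements
    ]
    # Simplified view of the whole file: each statement becomes a one-line comment.
    simplified = "\n".join(f"-- Code for: {stmt.splitlines()[0]}" for stmt in statements)
    docs.append(
        {
            "page_content": simplified,
            "metadata": {"source": path, "language": "sql", "content_type": "simplified_code"},
        }
    )
    return docs


sql = """
CREATE TABLE users (id INT PRIMARY KEY, name TEXT);
INSERT INTO users VALUES (1, 'Ada');
SELECT name FROM users;
"""

for doc in split_sql_into_documents(sql, "schema.sql"):
    print(doc["metadata"]["content_type"], "|", doc["page_content"].splitlines()[0])
```

The naive `;` split is the weak point of this sketch. A grammar-based segmenter can tell a statement terminator apart from a semicolon inside a string literal or comment.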

This pull request adds a new ```SQLSegmenter``` class, which implements
the SQL-specific segmentation used by LanguageParser.

## Issue
**Issue #11229**: Add support for a variety of languages to
LanguageParser

## Testing
We created a file ```test_sql.py``` with several tests to ensure the
```SQLSegmenter``` is functional. Below are the tests we added:

- ```def test_is_valid```: Checks SQL validity.
- ```def test_extract_functions_classes```: Checks that individual SQL
statements are extracted.
- ```def test_simplify_code```: Checks simplification of SQL code into
comments.
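To make the three tested behaviours concrete, here is a stdlib-only sketch. `NaiveSQLSegmenter` is a hypothetical class, not the `SQLSegmenter` added by this PR. It reuses the method names that the tests are named after, but approximates them with a character scan instead of a tree-sitter parse tree. It splits on `;` outside single-quoted strings and `--` comments, and it treats "closed string literals plus balanced parentheses" as a stand-in for validity.

```python
from typing import List, Tuple


class NaiveSQLSegmenter:
    """Hypothetical stdlib-only stand-in for a SQL segmenter (not tree-sitter)."""

    def __init__(self, code: str) -> None:
        self.code = code

    def _scan(self) -> Tuple[List[str], bool]:
        """Split on ';' outside quotes/comments and report whether the code looks well formed."""
        statements: List[str] = []
        current: List[str] = []
        in_string = in_comment = False
        depth = 0
        balanced = True
        for i, ch in enumerate(self.code):
            if in_comment:
                in_comment = ch != "\n"  # a '--' comment ends at the newline
            elif in_string:
                in_string = ch != "'"  # an escaped '' simply re-opens on the next char
            elif ch == "-" and self.code[i + 1 : i + 2] == "-":
                in_comment = True
            elif ch == "'":
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                balanced = balanced and depth >= 0
            elif ch == ";":
                stmt = "".join(current).strip()
                if stmt:
                    statements.append(stmt + ";")
                current = []
                continue  # the terminator was re-attached to the statement above
            current.append(ch)
        tail = "".join(current).strip()
        if tail:
            statements.append(tail)
        return statements, balanced and depth == 0 and not in_string

    def is_valid(self) -> bool:
        # Rough check only: closed string literals and balanced parentheses.
        return self._scan()[1]

    def extract_functions_classes(self) -> List[str]:
        # Name kept from the tests; here it returns one string per SQL statement.
        return self._scan()[0]

    def simplify_code(self) -> str:
        # Replace every statement with a one-line placeholder comment.
        return "\n".join(
            f"-- Code for: {stmt.splitlines()[0]}" for stmt in self.extract_functions_classes()
        )


sql = (
    "CREATE TABLE notes (id INT, body TEXT);\n"
    "INSERT INTO notes -- seed row; not a boundary\n"
    "VALUES (1, 'a;b');\n"
    "SELECT body FROM notes WHERE id = 1;"
)
seg = NaiveSQLSegmenter(sql)
print(seg.is_valid())
print(seg.simplify_code())
```

The `;` inside the comment and the `;` inside the `'a;b'` literal are both ignored as statement boundaries, so the input yields three statements. A parse tree makes the same distinction more robustly and also covers constructs this scan ignores, such as double-quoted identifiers and block comments.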

---------

Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com>
Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com>
Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com>
Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-19 20:30:57 +00:00
| Name | Last commit | Date |
|---|---|---|
| language | Langchain_Community: SQL LanguageParser (#28430) | 2024-12-19 20:30:57 +00:00 |
| __init__.py | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 2023-12-11 13:53:30 -08:00 |
| test_azure_whisper_parser.py | community: add AzureOpenAIWhisperParser (#27796) | 2024-10-31 12:37:41 -04:00 |
| test_doc_intelligence.py | infra: update mypy 1.10, ruff 0.5 (#23721) | 2024-07-03 10:33:27 -07:00 |
| test_generic.py | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 2023-12-11 13:53:30 -08:00 |
| test_html_parsers.py | infra: update mypy 1.10, ruff 0.5 (#23721) | 2024-07-03 10:33:27 -07:00 |
| test_pdf_parsers.py | community[patch]: move pdf text tests to integration (#18746) | 2024-03-07 10:34:22 -08:00 |
| test_public_api.py | community[minor]: New documents loader for visio files (with extension .vsdx) (#16171) | 2024-01-22 22:07:03 -08:00 |
| test_vsdx_parser.py | infra: update mypy 1.10, ruff 0.5 (#23721) | 2024-07-03 10:33:27 -07:00 |