langchain/libs/community/tests/unit_tests/document_loaders
Hiros 8f5e72de05
community: Correctly handle multi-element rich text (#25762)
**Description:**

- Add `_concatenate_rich_text` method to combine all elements in rich text
arrays
- Update `load_page` method to use `_concatenate_rich_text` for rich text
properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text

This fix prevents truncation of content after backticks or other
formatting elements.

**Issue:**

With the Notion DB Loader, the text of `rich_text` and `title` properties
was truncated after the first element, because the loader read only the
first element of the rich text array.
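
The truncation can be illustrated with a minimal sketch. It assumes each rich text element carries a `plain_text` key, as in the Notion API; the function names and the sample data are hypothetical and are not the code from this commit.

```python
from typing import Any, Dict, List, Optional


def first_element_only(rich_text: List[Dict[str, Any]]) -> Optional[str]:
    """Behaviour described in the issue: only the first element is read."""
    return rich_text[0]["plain_text"] if rich_text else None


def concatenate_rich_text(rich_text: List[Dict[str, Any]]) -> str:
    """Join the plain_text of every element, preserving order."""
    return "".join(item.get("plain_text", "") for item in rich_text)


# Hypothetical property value: plain text, inline code, then plain text again.
# Inline formatting such as backticks splits the text into separate elements.
sample = [
    {"type": "text", "plain_text": "Run ", "annotations": {"code": False}},
    {"type": "text", "plain_text": "pip install", "annotations": {"code": True}},
    {"type": "text", "plain_text": " first.", "annotations": {"code": False}},
]

truncated = first_element_only(sample)
full = concatenate_rich_text(sample)
```

Here `truncated` holds only the text before the inline code, while `full` holds the whole sentence.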

**Dependencies:** None.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 20:20:27 +00:00
..
blob_loaders multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
loaders multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
parsers community: add AzureOpenAIWhisperParser (#27796) 2024-10-31 12:37:41 -04:00
sample_documents community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_docs community: Fix CSVLoader columns is None (#20701) 2024-05-22 12:57:46 -07:00
__init__.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_airbyte.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_arcgis_loader.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_assemblyai.py Merge pull request #18421 2024-03-06 13:16:05 -05:00
test_bibtex.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_bshtml.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_confluence.py community: add include_labels option to ConfluenceLoader (#28259) 2024-12-09 19:35:01 +00:00
test_couchbase.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_csv_loader.py community[patch]: added content_columns option to CSVLoader (#23809) 2024-09-02 20:25:53 +00:00
test_cube_semantic.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_detect_encoding.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_directory_loader.py community: Fix CSVLoader columns is None (#20701) 2024-05-22 12:57:46 -07:00
test_directory.py community: glob multiple patterns when using DirectoryLoader (#22852) 2024-06-18 09:24:50 -07:00
test_evernote_loader.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_generic_loader.py infra: update mypy 1.10, ruff 0.5 (#23721) 2024-07-03 10:33:27 -07:00
test_git.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_github.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
test_hugging_face_model.py community[minor]: add hugging_face_model document loader (#17323) 2024-02-28 20:05:35 +00:00
test_hugging_face.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_imports.py community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
test_json_loader.py community: fixes json loader not getting texts with json standard (#27327) 2024-12-12 19:33:45 +00:00
test_lakefs.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
test_mediawikidump.py infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
test_mhtml.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_mongodb.py community: Enhance MongoDBLoader with flexible metadata and optimized field extraction (#23376) 2024-09-17 10:23:17 -04:00
test_needle.py community: add Needle retriever and document loader integration (#28157) 2024-12-03 22:06:25 +00:00
test_notebook.py community[patch]: add NotebookLoader unit test (#17721) 2024-03-29 00:27:46 +00:00
test_notiondb_loader.py community: Correctly handle multi-element rich text (#25762) 2024-12-16 20:20:27 +00:00
test_obsidian.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_onenote.py multiple: pydantic 2 compatibility, v0.3 (#26443) 2024-09-13 14:38:45 -07:00
test_oracleadb.py community[minor]: add oracle autonomous database doc loader integration (#19536) 2024-03-26 17:02:18 -07:00
test_pdf.py community[patch]: add to pypdf tests and run in CI (#26663) 2024-09-19 14:45:49 +00:00
test_pebblo.py [community] Added PebbloTextLoader for loading text data in PebbloSafeLoader (#26582) 2024-09-19 09:59:04 -04:00
test_psychic.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
test_readthedoc.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_recursive_url_loader.py community[patch]: recursive url loader fix and unit tests (#22521) 2024-06-05 17:56:20 -07:00
test_rspace_loader.py community[patch]: Add missing annotations (#24890) 2024-07-31 18:13:44 +00:00
test_rss.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_trello.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 2023-12-11 13:53:30 -08:00
test_web_base.py community[patch]: add web loader tests (#26728) 2024-09-20 18:29:54 -04:00
test_youtube.py community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710) 2024-06-11 17:44:36 +00:00