mirror of
https://github.com/hwchase17/langchain.git
synced 2025-04-29 12:25:37 +00:00
# Description Implements the `atransform_documents` method for `MarkdownifyTransformer` using the `asyncio` built-in library for concurrency. Note that this is mainly for API completeness when working with async frameworks rather than for performance, since the `markdownify` function is not I/O bound because it works with `Document` objects already in memory. # Issue Fixes #27865 # Dependencies No new dependencies added, but [`markdownify`](https://github.com/matthewwithanm/python-markdownify) is required since this PR updates the `markdownify` integration. # Tests and docs - Tests added - I did not modify the docstrings since they already described the basic functionality, and [the API docs also already included a description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents). If it would be helpful, I would be happy to update the docstrings and/or the API docs. # Lint and test - [x] format - [x] lint - [x] test I ran formatting with `make format`, linting with `make lint`, and confirmed that tests pass using `make test`. Note that some unit tests pass in CI but may fail when running `make_test`. Those unit tests are: - `test_extract_html` (and `test_extract_html_async`) - `test_strip_tags` (and `test_strip_tags_async`) - `test_convert_tags` (and `test_convert_tags_async`) The reason for the difference is that there are trailing spaces when the tests are run in the CI checks, and no trailing spaces when run with `make test`. I ensured that the tests pass in CI, but they may fail with `make test` due to the addition of trailing spaces. --------- Co-authored-by: Erick Friis <erick@langchain.dev> |
||
---|---|---|
.. | ||
xsl | ||
__init__.py | ||
beautiful_soup_transformer.py | ||
doctran_text_extract.py | ||
doctran_text_qa.py | ||
doctran_text_translate.py | ||
embeddings_redundant_filter.py | ||
google_translate.py | ||
html2text.py | ||
long_context_reorder.py | ||
markdownify.py | ||
nuclia_text_transform.py | ||
openai_functions.py |