langchain/tests/unit_tests
Gordon Clark 96f3dff050 MediaWiki docloader improvements + unit tests (#5879)
Starting over from #5654 because I utterly borked the poetry.lock file.

Adds new parameters to the MWDumpLoader class:

* skip_redirects (bool) Tells the loader to skip articles that redirect
to other articles. False by default.
* stop_on_error (bool) When False, tells the parser to skip any page that
causes a parse error instead of stopping. True by default.
* namespaces (List[int]) Tells the parser which namespaces to parse.
Contains namespaces from -2 to 15 by default.

Default values are chosen to preserve backwards compatibility.
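
To illustrate how the three options might interact, here is a minimal
stdlib-only sketch. It is not the MWDumpLoader implementation: `Page`,
`parse_text`, and `load_pages` are hypothetical names invented for this
example, and it assumes that stop_on_error=True re-raises a parse error
while stop_on_error=False skips the offending page.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one page in a MediaWiki dump."""
    title: str
    namespace: int
    text: str
    is_redirect: bool = False


def parse_text(text: str) -> str:
    """Toy parser: unbalanced '[[' simulates a parse error."""
    if text.count("[[") != text.count("]]"):
        raise ValueError("unbalanced link markup")
    return text.replace("[[", "").replace("]]", "")


def load_pages(
    pages: Iterable[Page],
    namespaces: Optional[List[int]] = None,
    skip_redirects: bool = False,
    stop_on_error: bool = True,
) -> Iterator[str]:
    # Assumed default, per the description above: namespaces -2 to 15.
    allowed = set(range(-2, 16)) if namespaces is None else set(namespaces)
    for page in pages:
        if page.namespace not in allowed:
            continue  # namespace filtered out
        if skip_redirects and page.is_redirect:
            continue  # redirect filtered out
        try:
            yield parse_text(page.text)
        except ValueError:
            if stop_on_error:
                raise
            # Otherwise skip the unparsable page and keep going.


pages = [
    Page("Main", 0, "Hello [[world]]"),
    Page("Old", 0, "#REDIRECT [[Main]]", is_redirect=True),
    Page("Talk:Main", 1, "chatter"),
    Page("Broken", 0, "oops [[unclosed"),
]

kept = list(
    load_pages(pages, namespaces=[0], skip_redirects=True, stop_on_error=False)
)
print(kept)  # ['Hello world']
```

With the defaults, the same input would raise on the "Broken" page, which
matches the intent that existing callers see no change in behaviour unless
they opt in to the new options.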

Sample dump XML and full unit test coverage (with extended tests that
pass!) also included!

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-15 10:49:36 -04:00