**Description:** Repair the Wikipedia document loader's `load_max_docs` parameter and improve test coverage.

**Issue:** The Wikipedia document loader did not respect the `load_max_docs` parameter (not previously reported) and always returned at most 10 documents. The cause was that the API wrapper (in `utilities/wikipedia.py`) did not pass `top_k_results` to the underlying [Wikipedia library](https://wikipedia.readthedocs.io/en/latest/code.html#module-wikipedia), which returns 10 results by default.

The default number of results for the document loader has been reduced from 100 to 25, because loading 100 results takes a very long time and makes an inconvenient default. Arguably it should be 10.

In addition, the loader's documentation stated that there was a hard limit of 300 on the number of documents returned. In fact, 300 is the maximum Wikipedia query length (in characters) set by the API wrapper; it does not limit the number of documents.

Tests have been added for the document loader (previously missing), along with tests checking that each class returns the correct number of documents, both by default and when the default is overridden. The `assert_docs` test has also been repaired: it now correctly checks for the default metadata (which includes `source` in recent releases).

**Dependencies:** nil

**Tag maintainer:** @leo-gan

**Twitter handle:** @queenvictoria
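To make the bug concrete, here is a minimal, self-contained sketch. It does not import the real `wikipedia` package or LangChain's actual wrapper; `FakeWikiClient`, `WikipediaWrapperSketch`, `search_titles_buggy` and `search_titles_fixed` are hypothetical names used only for illustration. The fake client mimics `wikipedia.search(query, results=10)`, which returns at most `results` titles and defaults to 10. As a result, omitting `results` caps the output at 10 whatever `top_k_results` is set to.

```python
from dataclasses import dataclass
from typing import List

# Query-length cap (in characters) applied by the wrapper before searching.
# Value of 300 taken from the description above; it limits the query, not the results.
WIKIPEDIA_MAX_QUERY_LENGTH = 300


class FakeWikiClient:
    """Hypothetical stand-in for the `wikipedia` module.

    Mimics `wikipedia.search(query, results=10)`: it returns at most
    `results` titles and falls back to 10 when `results` is omitted.
    """

    def __init__(self, corpus_size: int = 100) -> None:
        self.corpus_size = corpus_size  # how many matching pages "exist"
        self.last_query = ""  # records the query string actually sent

    def search(self, query: str, results: int = 10) -> List[str]:
        self.last_query = query
        return [f"Result {i}" for i in range(min(results, self.corpus_size))]


@dataclass
class WikipediaWrapperSketch:
    """Hypothetical, simplified model of the API wrapper's search step."""

    wiki_client: FakeWikiClient
    top_k_results: int = 3

    def search_titles_buggy(self, query: str) -> List[str]:
        # Before the fix: `results` is omitted, so the client's default of 10
        # caps the output no matter how large `top_k_results` is.
        titles = self.wiki_client.search(query[:WIKIPEDIA_MAX_QUERY_LENGTH])
        return titles[: self.top_k_results]

    def search_titles_fixed(self, query: str) -> List[str]:
        # After the fix: `top_k_results` is forwarded as `results`.
        titles = self.wiki_client.search(
            query[:WIKIPEDIA_MAX_QUERY_LENGTH], results=self.top_k_results
        )
        return titles[: self.top_k_results]


if __name__ == "__main__":
    wrapper = WikipediaWrapperSketch(wiki_client=FakeWikiClient(), top_k_results=25)
    print(len(wrapper.search_titles_buggy("HUNTER X HUNTER")))  # 10
    print(len(wrapper.search_titles_fixed("HUNTER X HUNTER")))  # 25
```

The only difference between the two methods is forwarding `results=self.top_k_results`. The 300-character slice shortens the query string sent to the client and has no effect on how many titles come back.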