**Problem statement:** the
[document_loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html#)
section is too long and hard to comprehend.
**Proposal:** group document_loaders by 3 classes: (see `Files changed`
tab)
UPDATE: I've completely reworked the document_loader classification.
Now this PR changes only one file!
FYI @eyurtsev @hwchase17
We're fans of the LangChain framework thus we wanted to make sure we
provide an easy way for our customers to be able to utilize this
framework for their LLM-powered applications at our platform.
# Add option to `load_huggingface_tool`
Expose a method to load a huggingface Tool from the HF hub
---------
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
# ODF File Loader
Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.
### Testing
The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).
```python
from langchain.document_loaders import UnstructuredODTLoader
loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()
loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
# Minor Wording Documentation Change
```python
agent_chain.run("When's my friend Eric's surname?")
# Answer with 'Zhu'
```
is change to
```python
agent_chain.run("What's my friend Eric's surname?")
# Answer with 'Zhu'
```
I think when is a residual of the old query that was "When’s my friends
Eric`s birthday?".
# Fix grammar in Text Splitters docs
Just a small fix of grammar in the documentation:
"That means there two different axes" -> "That means there are two
different axes"
Related: #4028, I opened a new PR because (1) I was unable to unstage
mistakenly committed files (I'm not familiar with git enough to resolve
this issue), (2) I felt closing the original PR and opening a new PR
would be more appropriate if I changed the class name.
This PR creates HumanInputLLM(HumanLLM in #4028), a simple LLM wrapper
class that returns user input as the response. I also added a simple
Jupyter notebook regarding how and why to use this LLM wrapper. In the
notebook, I went over how to use this LLM wrapper and showed example of
testing `WikipediaQueryRun` using HumanInputLLM.
I believe this LLM wrapper will be useful especially for debugging,
educational or testing purpose.
- Added the `Wikipedia` document loader. It is based on the existing
`unilities/WikipediaAPIWrapper`
- Added a respective ut-s and example notebook
- Sorted list of classes in __init__
- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.
---------
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>