Commit Graph

17 Commits

Author SHA1 Message Date
Klein Tahiraj
d3d4503ce2
Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
2023-03-22 15:19:42 -07:00
Harrison Chase
45f05fc939
Harrison/blackboard loader (#1737)
Co-authored-by: Aidan Holland <thehappydinoa@gmail.com>
2023-03-17 08:02:44 -07:00
Tim Asp
b3234bf3b0
cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615)
`OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of
the pdf loaders.

Because they're all similar, I propose moving all to `pdy.py` and the
same docs/examples page.

Additionally, `PagedPDFSplitter` naming doesn't match the pattern the
rest of the loaders follow, so I renamed to `PyPDFLoader` and had it
inherit from `BasePDFLoader` so it can now load from remote file
sources.
2023-03-13 23:06:50 -07:00
Tim Asp
72ef69d1ba
Add new iFixit document loader (#1333)
iFixit is a wikipedia-like site that has a huge amount of open content
on how to fix things, questions/answers for common troubleshooting and
"things" related content that is more technical in nature. All content
is licensed under CC-BY-SA-NC 3.0

Adding docs from iFixit as context for user questions like "I dropped my
phone in water, what do I do?" or "My macbook pro is making a whining
noise, what's wrong with it?" can yield significantly better responses
than context free response from LLMs.
2023-02-27 20:40:20 -08:00
Ingo Kleiber
fd9975dad7
add CoNLL-U document loader (#1297)
I've added a simple
[CoNLL-U](https://universaldependencies.org/format.html) document
loader. CoNLL-U is a common format for NLP tasks and is used, for
example, in the Universal Dependencies treebank corpora. The loader
reads a single file in standard CoNLL-U format and returns a document.
2023-02-26 17:27:00 -08:00
Dennis Antela Martinez
23243ae69c
add gitbook document loader (#1180)
Added a GitBook document loader. It lets you both, (1) fetch text from
any single GitBook page, or (2) fetch all relative paths and return
their respective content in Documents.

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
2023-02-20 20:05:04 -08:00
Harrison Chase
d5f3dfa1e1
Harrison/hn loader (#1130)
Co-authored-by: William X <william.y.xuan@gmail.com>
2023-02-17 15:15:02 -08:00
Harrison Chase
98186ef180
Harrison/evernote nb (#1078)
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
2023-02-15 22:47:30 -08:00
Harrison Chase
2e96704d59
Harrison/airbyte (#989)
Co-authored-by: zanderchase <zanderchase@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
2023-02-10 18:08:00 -08:00
zanderchase
c2d1d903fa
Zander/online pdf loader (#984) 2023-02-10 15:42:30 -08:00
Harrison Chase
c64f98e2bb
Harrison/format agent instructions (#973)
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
2023-02-10 10:07:26 -08:00
zanderchase
8e126bc9bd
adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
Harrison Chase
3e1901e1aa
gutenberg books (#946)
Co-authored-by: zanderchase <zander@unfold.ag>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-08 12:00:47 -08:00
Harrison Chase
44ecec3896
Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
Harrison Chase
637c0d6508
Harrison/obsidian (#920) 2023-02-06 22:21:16 -08:00
Harrison Chase
2ec25ddd4c
add unstructured examples (#913) 2023-02-06 18:13:46 -08:00
Harrison Chase
53d56d7650
Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00