- **Description:** Added `ref` query parameter so data is not loaded
only from the default branch but any branch passed
---------
Co-authored-by: Osama Mehdi <mehdi@hm.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR adds tiny improvements to the `GithubFileLoader` document loader
and its code sample, addressing the following issues:
1. Currently, the `file_extension` argument of `GithubFileLoader` does
not change its behavior at all.
1. The `GithubFileLoader` sample code in
`docs/docs/integrations/document_loaders/github.ipynb` does not work as
it stands.
The respective solutions I propose are the following:
1. Remove `file_extension` argument from `GithubFileLoader`.
1. Specify the branch as `master` (not the default `main`) and rename
`documents` as `document`.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Description-
- Changed the GitHub endpoint as existing was not working and giving 404
not found error
- Also the existing function was failing if file_filter is not passed as
the tree api return all paths including directory as well, and when
get_file_content was iterating over these path, the function was failing
for directory as the api was returning list of files inside the
directory, so added a condition to ignore the paths if it a directory
- Fixes this issue -
https://github.com/langchain-ai/langchain/issues/17453
Co-authored-by: Radhika Bansal <Radhika.Bansal@veritas.com>
### Description
support load any github file content based on file extension.
Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.
### Twitter handle
my twitter: @shufanhaotop
---------
Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>