{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GitHub\n",
"\n",
"This notebook shows how to load issues and pull requests (PRs) for a given repository on [GitHub](https://github.com/). It also shows how to load files from a given GitHub repository. We will use the LangChain Python repository as an example."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup access token"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To access the GitHub API, you need a personal access token, which you can create at https://github.com/settings/tokens?type=beta. You can either set this token as the environment variable ``GITHUB_PERSONAL_ACCESS_TOKEN``, in which case it is picked up automatically, or pass it in directly at initialization as the ``access_token`` named parameter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# If you haven't set your access token as an environment variable, pass it in here.\n",
"from getpass import getpass\n",
"\n",
"ACCESS_TOKEN = getpass()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Issues and PRs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import GitHubIssuesLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = GitHubIssuesLoader(\n",
" repo=\"langchain-ai/langchain\",\n",
" access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.\n",
" creator=\"UmerHA\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load all issues and PRs created by \"UmerHA\".\n",
"\n",
"Here's a list of all filters you can use:\n",
"- include_prs\n",
"- milestone\n",
"- state\n",
"- assignee\n",
"- creator\n",
"- mentioned\n",
"- labels\n",
"- sort\n",
"- direction\n",
"- since\n",
"\n",
"For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)\n",
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Only load issues"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the GitHub API considers pull requests to also be issues. To get only 'pure' issues (i.e., no pull requests), use `include_prs=False`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = GitHubIssuesLoader(\n",
" repo=\"langchain-ai/langchain\",\n",
" access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.\n",
" creator=\"UmerHA\",\n",
" include_prs=False,\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)\n",
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load GitHub File Content\n",
"\n",
"The code below loads all Markdown files in the repo `langchain-ai/langchain`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import GithubFileLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = GithubFileLoader(\n",
" repo=\"langchain-ai/langchain\", # the repo name\n",
" branch=\"master\", # the branch to read from; this repo's default branch is master\n",
" access_token=ACCESS_TOKEN,\n",
" github_api_url=\"https://api.github.com\",\n",
" file_filter=lambda file_path: file_path.endswith(\n",
" \".md\"\n",
" ), # load all Markdown files.\n",
")\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example output for one of the documents:\n",
"\n",
"```json\n",
"document.metadata: \n",
" {\n",
" \"path\": \"README.md\",\n",
" \"sha\": \"82f1c4ea88ecf8d2dfsfx06a700e84be4\",\n",
" \"source\": \"https://github.com/langchain-ai/langchain/blob/master/README.md\"\n",
" }\n",
"document.page_content:\n",
" mock content\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}