{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# GitHub\n", "\n", "This notebook shows how to load issues and pull requests (PRs) for a given repository on [GitHub](https://github.com/). It also shows how to load files from a given repository. We will use the LangChain Python repository as an example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up access token" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To access the GitHub API, you need a personal access token; you can create one here: https://github.com/settings/tokens?type=beta. You can either set this token as the environment variable `GITHUB_PERSONAL_ACCESS_TOKEN`, in which case it is picked up automatically, or pass it in directly at initialization via the `access_token` named parameter." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# If you haven't set your access token as an environment variable, pass it in here.\n", "from getpass import getpass\n", "\n", "ACCESS_TOKEN = getpass()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Issues and PRs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_community.document_loaders import GitHubIssuesLoader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loader = GitHubIssuesLoader(\n", " repo=\"langchain-ai/langchain\",\n", " access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.\n", " creator=\"UmerHA\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load all issues and PRs created by \"UmerHA\".\n", "\n", "Here's a list of all filters you can use:\n", "- include_prs\n", "- milestone\n", "- state\n", "- assignee\n", "- creator\n", "- mentioned\n", "- labels\n", "- sort\n", "- direction\n", "- since\n", "\n", "For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "docs = loader.load()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(docs[0].page_content)\n", "print(docs[0].metadata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Only load issues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the GitHub API considers pull requests to also be issues. To only get 'pure' issues (i.e., no pull requests), use `include_prs=False`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loader = GitHubIssuesLoader(\n", " repo=\"langchain-ai/langchain\",\n", " access_token=ACCESS_TOKEN, # delete/comment out this argument if you've set the access token as an env var.\n", " creator=\"UmerHA\",\n", " include_prs=False,\n", ")\n", "docs = loader.load()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(docs[0].page_content)\n", "print(docs[0].metadata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load GitHub File Content\n", "\n", "The code below loads all Markdown files in the repo `langchain-ai/langchain`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain_community.document_loaders import GithubFileLoader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loader = GithubFileLoader(\n", " repo=\"langchain-ai/langchain\", # the repo name\n", " branch=\"master\", # the branch to load files from\n", " access_token=ACCESS_TOKEN,\n", " github_api_url=\"https://api.github.com\",\n", " file_filter=lambda file_path: file_path.endswith(\n", " \".md\"\n", " ), # load all Markdown files.\n", ")\n", "documents = loader.load()" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "Example output for one of the documents:\n", "\n", "```json\n", "documents[0].metadata:\n", " {\n", " \"path\": \"README.md\",\n", " \"sha\": \"82f1c4ea88ecf8d2dfsfx06a700e84be4\",\n", " \"source\": \"https://github.com/langchain-ai/langchain/blob/master/README.md\"\n", " }\n", "documents[0].page_content:\n", " mock content\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }