mirror of
				https://github.com/hwchase17/langchain.git
				synced 2025-10-31 07:41:40 +00:00 
			
		
		
		
	# Creates GitHubLoader (#5257) GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub. Fixes #5257 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
		
			
				
	
	
		
			262 lines
		
	
	
		
			7.1 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GitHub\n",
    "\n",
    "This notebook shows how you can load issues and pull requests (PRs) for a given repository on [GitHub](https://github.com/). We will use the LangChain Python repository as an example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up an access token"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access the GitHub API, you need a personal access token, which you can create at https://github.com/settings/tokens?type=beta. You can either set this token as the environment variable `GITHUB_PERSONAL_ACCESS_TOKEN`, in which case it is picked up automatically, or pass it in directly at initialization as the `access_token` named parameter."
   ]
  },
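  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want the notebook to work with either option, a small helper can read `GITHUB_PERSONAL_ACCESS_TOKEN` first and only prompt with `getpass` when the variable is missing or empty. `resolve_github_token` below is a hypothetical sketch for illustration, not part of LangChain. With it, `ACCESS_TOKEN = resolve_github_token()` could stand in for the plain `getpass()` call in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "\n",
    "def resolve_github_token(env_var=\"GITHUB_PERSONAL_ACCESS_TOKEN\", prompt=getpass):\n",
    "    \"\"\"Hypothetical helper: return the token from `env_var`, prompting only if it is unset or empty.\"\"\"\n",
    "    token = os.environ.get(env_var)\n",
    "    if not token:\n",
    "        # Fall back to an interactive prompt so the token is never hard-coded in the notebook.\n",
    "        token = prompt()\n",
    "    return token"
   ]
  },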
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# If you haven't set your access token as an environment variable, pass it in here.\n",
    "from getpass import getpass\n",
    "\n",
    "ACCESS_TOKEN = getpass()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Issues and PRs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain.document_loaders import GitHubIssuesLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = GitHubIssuesLoader(\n",
    "    repo=\"hwchase17/langchain\",\n",
    "    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.\n",
    "    creator=\"UmerHA\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load all issues and PRs created by \"UmerHA\".\n",
    "\n",
    "Here's a list of all filters you can use:\n",
    "- `include_prs`\n",
    "- `milestone`\n",
    "- `state`\n",
    "- `assignee`\n",
    "- `creator`\n",
    "- `mentioned`\n",
    "- `labels`\n",
    "- `sort`\n",
    "- `direction`\n",
    "- `since`\n",
    "\n",
    "For more info, see https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues."
   ]
  },
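  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for what these filters do, the sketch below turns them into a request URL for GitHub's *List repository issues* endpoint. It assumes three things: each filter is passed through as a query parameter of the same name, `labels` is sent as one comma-separated string (the format the GitHub REST API expects), and `include_prs` is applied on the client side because it is not a GitHub query parameter. `build_issues_url` is a hypothetical helper, not the loader's actual code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.parse import urlencode\n",
    "\n",
    "\n",
    "def build_issues_url(repo, **filters):\n",
    "    \"\"\"Hypothetical sketch: build a GitHub REST API URL from loader-style filters.\"\"\"\n",
    "    # include_prs is not a GitHub query parameter, so it is not sent (assumed client-side).\n",
    "    filters.pop(\"include_prs\", None)\n",
    "    # GitHub expects labels as a single comma-separated string.\n",
    "    if isinstance(filters.get(\"labels\"), list):\n",
    "        filters[\"labels\"] = \",\".join(filters[\"labels\"])\n",
    "    # Drop unset filters so GitHub applies its own defaults.\n",
    "    params = {key: value for key, value in filters.items() if value is not None}\n",
    "    return f\"https://api.github.com/repos/{repo}/issues?{urlencode(params)}\"\n",
    "\n",
    "\n",
    "example_url = build_issues_url(\n",
    "    \"hwchase17/langchain\",\n",
    "    creator=\"UmerHA\",\n",
    "    state=\"all\",\n",
    "    labels=[\"enhancement\", \"doc loader\"],\n",
    "    include_prs=False,\n",
    ")\n",
    "example_url"
   ]
  },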
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Creates GitHubLoader (#5257)\r\n",
      "\r\n",
      "GitHubLoader is a DocumentLoader that loads issues and PRs from GitHub.\r\n",
      "\r\n",
      "Fixes #5257\r\n",
      "\r\n",
      "Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:\r\n",
      "DataLoaders\r\n",
      "- @eyurtsev\r\n",
      "\n",
      "{'url': 'https://github.com/hwchase17/langchain/pull/5408', 'title': 'DocumentLoader for GitHub', 'creator': 'UmerHA', 'created_at': '2023-05-29T14:50:53Z', 'comments': 0, 'state': 'open', 'labels': ['enhancement', 'lgtm', 'doc loader'], 'assignee': None, 'milestone': None, 'locked': False, 'number': 5408, 'is_pull_request': True}\n"
     ]
    }
   ],
   "source": [
    "print(docs[0].page_content)\n",
    "print(docs[0].metadata)"
   ]
  },
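  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The metadata printed above includes fields such as `labels`, `state` and `is_pull_request`, so you can also narrow down documents after loading. `filter_docs` below is a hypothetical helper, not part of LangChain. For example, `filter_docs(docs, label='doc loader', pull_requests=True)` would keep only PRs carrying the `doc loader` label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_docs(docs, label=None, state=None, pull_requests=None):\n",
    "    \"\"\"Hypothetical helper: keep documents whose metadata matches every filter that is not None.\"\"\"\n",
    "    selected = []\n",
    "    for doc in docs:\n",
    "        meta = doc.metadata\n",
    "        # Skip documents that fail any of the requested checks.\n",
    "        if label is not None and label not in meta[\"labels\"]:\n",
    "            continue\n",
    "        if state is not None and meta[\"state\"] != state:\n",
    "            continue\n",
    "        if pull_requests is not None and meta[\"is_pull_request\"] != pull_requests:\n",
    "            continue\n",
    "        selected.append(doc)\n",
    "    return selected"
   ]
  },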
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Only load issues"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the GitHub API considers pull requests to be issues as well. To get only 'pure' issues (i.e., no pull requests), use `include_prs=False`."
   ]
  },
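  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "GitHub's REST API marks pull requests returned by its issues endpoints with a `pull_request` key, so pure issues can be recognized by that key's absence. The sketch below applies this check to raw API objects. `drop_pull_requests` and the trimmed sample payload are hypothetical illustrations, not the loader's own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def drop_pull_requests(items):\n",
    "    \"\"\"Hypothetical helper: keep only 'pure' issues from a list of GitHub REST API issue objects.\"\"\"\n",
    "    # Pull requests returned by the issues endpoints carry a \"pull_request\" key.\n",
    "    return [item for item in items if \"pull_request\" not in item]\n",
    "\n",
    "\n",
    "# Heavily trimmed sample based on the two items shown in this notebook.\n",
    "sample_items = [\n",
    "    {\"number\": 5408, \"pull_request\": {}},  # a PR (real objects carry URLs in this field)\n",
    "    {\"number\": 5027},  # a plain issue\n",
    "]\n",
    "issue_numbers = [item[\"number\"] for item in drop_pull_requests(sample_items)]\n",
    "issue_numbers"
   ]
  },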
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = GitHubIssuesLoader(\n",
    "    repo=\"hwchase17/langchain\",\n",
    "    access_token=ACCESS_TOKEN,  # delete/comment out this argument if you've set the access token as an env var.\n",
    "    creator=\"UmerHA\",\n",
    "    include_prs=False,\n",
    ")\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "### System Info\n",
      "\n",
      "LangChain version = 0.0.167\r\n",
      "Python version = 3.11.0\r\n",
      "System = Windows 11 (using Jupyter)\n",
      "\n",
      "### Who can help?\n",
      "\n",
      "- @hwchase17\r\n",
      "- @agola11\r\n",
      "- @UmerHA (I have a fix ready, will submit a PR)\n",
      "\n",
      "### Information\n",
      "\n",
      "- [ ] The official example notebooks/scripts\n",
      "- [X] My own modified scripts\n",
      "\n",
      "### Related Components\n",
      "\n",
      "- [X] LLMs/Chat Models\n",
      "- [ ] Embedding Models\n",
      "- [X] Prompts / Prompt Templates / Prompt Selectors\n",
      "- [ ] Output Parsers\n",
      "- [ ] Document Loaders\n",
      "- [ ] Vector Stores / Retrievers\n",
      "- [ ] Memory\n",
      "- [ ] Agents / Agent Executors\n",
      "- [ ] Tools / Toolkits\n",
      "- [ ] Chains\n",
      "- [ ] Callbacks/Tracing\n",
      "- [ ] Async\n",
      "\n",
      "### Reproduction\n",
      "\n",
      "```\r\n",
      "import os\r\n",
      "os.environ[\"OPENAI_API_KEY\"] = \"...\"\r\n",
      "\r\n",
      "from langchain.chains import LLMChain\r\n",
      "from langchain.chat_models import ChatOpenAI\r\n",
      "from langchain.prompts import PromptTemplate\r\n",
      "from langchain.prompts.chat import ChatPromptTemplate\r\n",
      "from langchain.schema import messages_from_dict\r\n",
      "\r\n",
      "role_strings = [\r\n",
      "    (\"system\", \"you are a bird expert\"), \r\n",
      "    (\"human\", \"which bird has a point beak?\")\r\n",
      "]\r\n",
      "prompt = ChatPromptTemplate.from_role_strings(role_strings)\r\n",
      "chain = LLMChain(llm=ChatOpenAI(), prompt=prompt)\r\n",
      "chain.run({})\r\n",
      "```\n",
      "\n",
      "### Expected behavior\n",
      "\n",
      "Chain should run\n",
      "{'url': 'https://github.com/hwchase17/langchain/issues/5027', 'title': \"ChatOpenAI models don't work with prompts created via ChatPromptTemplate.from_role_strings\", 'creator': 'UmerHA', 'created_at': '2023-05-20T10:39:18Z', 'comments': 1, 'state': 'open', 'labels': [], 'assignee': None, 'milestone': None, 'locked': False, 'number': 5027, 'is_pull_request': False}\n"
     ]
    }
   ],
   "source": [
    "print(docs[0].page_content)\n",
    "print(docs[0].metadata)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}