langchain/docs/versioned_docs/version-0.2.x/integrations/document_loaders/confluence.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Confluence\n",
    "\n",
    ">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n",
    "\n",
    "A loader for `Confluence` pages.\n",
    "\n",
    "\n",
    "This currently supports `username/api_key`, `Oauth2 login`. Additionally, on-prem installations also support `token` authentication. \n",
    "\n",
    "\n",
    "Specify a list `page_id`-s and/or `space_key` to load in the corresponding pages into Document objects, if both are specified the union of both sets will be returned.\n",
    "\n",
    "\n",
    "You can also specify a boolean `include_attachments` to include attachments, this is set to False by default, if set to True all attachments will be downloaded and ConfluenceReader will extract the text from the attachments and add it to the Document object. Currently supported attachment types are: `PDF`, `PNG`, `JPEG/JPG`, `SVG`, `Word` and `Excel`.\n",
    "\n",
    "Hint: `space_key` and `page_id` can both be found in the URL of a page in Confluence - https://yoursite.atlassian.com/wiki/spaces/<space_key>/pages/<page_id>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before using ConfluenceLoader make sure you have the latest version of the atlassian-python-api package installed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet  atlassian-python-api"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Username and Password or Username and API Token (Atlassian Cloud only)\n",
    "\n",
    "This example authenticates using either a username and password or, if you're connecting to an Atlassian Cloud hosted version of Confluence, a username and an API Token.\n",
    "You can generate an API token at: https://id.atlassian.com/manage-profile/security/api-tokens.\n",
    "\n",
    "The `limit` parameter specifies how many documents will be retrieved in a single call, not how many documents will be retrieved in total.\n",
    "By default the code will return up to 1000 documents in 50 documents batches. To control the total number of documents use the `max_pages` parameter. \n",
    "Plese note the maximum value for the `limit` parameter in the atlassian-python-api package is currently 100.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import ConfluenceLoader\n",
    "\n",
    "loader = ConfluenceLoader(\n",
    "    url=\"https://yoursite.atlassian.com/wiki\", username=\"me\", api_key=\"12345\"\n",
    ")\n",
    "documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Personal Access Token (Server/On-Prem only)\n",
    "\n",
    "This method is valid for the Data Center/Server on-prem edition only.\n",
    "For more information on how to generate a Personal Access Token (PAT) check the official Confluence documentation at: https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html.\n",
    "When using a PAT you provide only the token value, you cannot provide a username. \n",
    "Please note that ConfluenceLoader will run under the permissions of the user that generated the PAT and will only be able to load documents for which said user has access to.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import ConfluenceLoader\n",
    "\n",
    "loader = ConfluenceLoader(url=\"https://yoursite.atlassian.com/wiki\", token=\"12345\")\n",
    "documents = loader.load(\n",
    "    space_key=\"SPACE\", include_attachments=True, limit=50, max_pages=50\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  },
  "vscode": {
   "interpreter": {
    "hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}