mirror of
				https://github.com/hwchase17/langchain.git
				synced 2025-10-26 21:33:56 +00:00 
			
		
		
		
	# Add Mastodon toots loader. Loader works either with public toots, or Mastodon app credentials. Toot text and user info is loaded. I've also added integration test for this new loader as it works with public data, and a notebook with example output run now. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
		
			
				
	
	
		
			127 lines
		
	
	
		
			3.8 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			127 lines
		
	
	
		
			3.8 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| {
 | ||
|  "cells": [
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "66a7777e",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "# Mastodon\n",
 | ||
|     "\n",
 | ||
|     ">[Mastodon](https://joinmastodon.org/) is a federated social media and social networking service.\n",
 | ||
|     "\n",
 | ||
|     "This loader fetches the text from the \"toots\" of a list of `Mastodon` accounts, using the `Mastodon.py` Python package.\n",
 | ||
|     "\n",
 | ||
|     "Public accounts can the queried by default without any authentication. If non-public accounts or instances are queried, you have to register an application for your account which gets you an access token, and set that token and your account's API base URL.\n",
 | ||
|     "\n",
 | ||
|     "Then you need to pass in the Mastodon account names you want to extract, in the `@account@instance` format."
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": null,
 | ||
|    "id": "9ec8a3b3",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "from langchain.document_loaders import MastodonTootsLoader"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 2,
 | ||
|    "id": "43128d8d",
 | ||
|    "metadata": {},
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "#!pip install Mastodon.py"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 3,
 | ||
|    "id": "35d6809a",
 | ||
|    "metadata": {
 | ||
|     "pycharm": {
 | ||
|      "name": "#%%\n"
 | ||
|     }
 | ||
|    },
 | ||
|    "outputs": [],
 | ||
|    "source": [
 | ||
|     "loader = MastodonTootsLoader(\n",
 | ||
|     "    mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
 | ||
|     "    number_toots=50,  # Default value is 100\n",
 | ||
|     ")\n",
 | ||
|     "\n",
 | ||
|     "# Or set up access information to use a Mastodon app.\n",
 | ||
|     "# Note that the access token can either be passed into\n",
 | ||
|     "# constructor or you can set the envirovnment \"MASTODON_ACCESS_TOKEN\".\n",
 | ||
|     "# loader = MastodonTootsLoader(\n",
 | ||
|     "#     access_token=\"<ACCESS TOKEN OF MASTODON APP>\",\n",
 | ||
|     "#     api_base_url=\"<API BASE URL OF MASTODON APP INSTANCE>\",\n",
 | ||
|     "#     mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
 | ||
|     "#     number_toots=50,  # Default value is 100\n",
 | ||
|     "# )"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "code",
 | ||
|    "execution_count": 6,
 | ||
|    "id": "05fe33b9",
 | ||
|    "metadata": {
 | ||
|     "pycharm": {
 | ||
|      "name": "#%%\n"
 | ||
|     }
 | ||
|    },
 | ||
|    "outputs": [
 | ||
|     {
 | ||
|      "name": "stdout",
 | ||
|      "output_type": "stream",
 | ||
|      "text": [
 | ||
|       "<p>It is tough to leave this behind and go back to reality. And some people live here! I’m sure there are downsides but it sounds pretty good to me right now.</p>\n",
 | ||
|       "================================================================================\n",
 | ||
|       "<p>I wish we could stay here a little longer, but it is time to go home 🥲</p>\n",
 | ||
|       "================================================================================\n",
 | ||
|       "<p>Last day of the honeymoon. And it’s <a href=\"https://mastodon.social/tags/caturday\" class=\"mention hashtag\" rel=\"tag\">#<span>caturday</span></a>! This cute tabby came to the restaurant to beg for food and got some chicken.</p>\n",
 | ||
|       "================================================================================\n"
 | ||
|      ]
 | ||
|     }
 | ||
|    ],
 | ||
|    "source": [
 | ||
|     "documents = loader.load()\n",
 | ||
|     "for doc in documents[:3]:\n",
 | ||
|     "    print(doc.page_content)\n",
 | ||
|     "    print(\"=\" * 80)"
 | ||
|    ]
 | ||
|   },
 | ||
|   {
 | ||
|    "cell_type": "markdown",
 | ||
|    "id": "322bb6a1",
 | ||
|    "metadata": {},
 | ||
|    "source": [
 | ||
|     "The toot texts (the documents' `page_content`) is by default HTML as returned by the Mastodon API."
 | ||
|    ]
 | ||
|   }
 | ||
|  ],
 | ||
|  "metadata": {
 | ||
|   "kernelspec": {
 | ||
|    "display_name": "Python 3 (ipykernel)",
 | ||
|    "language": "python",
 | ||
|    "name": "python3"
 | ||
|   },
 | ||
|   "language_info": {
 | ||
|    "codemirror_mode": {
 | ||
|     "name": "ipython",
 | ||
|     "version": 3
 | ||
|    },
 | ||
|    "file_extension": ".py",
 | ||
|    "mimetype": "text/x-python",
 | ||
|    "name": "python",
 | ||
|    "nbconvert_exporter": "python",
 | ||
|    "pygments_lexer": "ipython3",
 | ||
|    "version": "3.11.3"
 | ||
|   }
 | ||
|  },
 | ||
|  "nbformat": 4,
 | ||
|  "nbformat_minor": 5
 | ||
| }
 |