{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "MwTWzDxYgbrR" }, "source": [ "# Athena\n", "\n", ">[Amazon Athena](https://aws.amazon.com/athena/) is a serverless, interactive analytics service built\n", ">on open-source frameworks, supporting open-table and file formats. `Athena` provides a simplified,\n", ">flexible way to analyze petabytes of data where it lives. Analyze data or build applications\n", ">from an Amazon Simple Storage Service (S3) data lake and 30 data sources, including on-premises data\n", ">sources or other cloud systems using SQL or Python. `Athena` is built on open-source `Trino`\n", ">and `Presto` engines and `Apache Spark` frameworks, with no provisioning or configuration effort required.\n", "\n", "This notebook goes over how to load documents from `AWS Athena`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up\n", "\n", "Follow [instructions to set up an AWS accoung](https://docs.aws.amazon.com/athena/latest/ug/setting-up.html).\n", "\n", "Install a python library:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "F0zaLR3xgWmO" }, "outputs": [], "source": [ "! pip install boto3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "076NLjfngoWJ" }, "outputs": [], "source": [ "from langchain_community.document_loaders.athena import AthenaLoader" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XpMRQwU9gu44" }, "outputs": [], "source": [ "database_name = \"my_database\"\n", "s3_output_path = \"s3://my_bucket/query_results/\"\n", "query = \"SELECT * FROM my_table\"\n", "profile_name = \"my_profile\"\n", "\n", "loader = AthenaLoader(\n", " query=query,\n", " database=database_name,\n", " s3_output_uri=s3_output_path,\n", " profile_name=profile_name,\n", ")\n", "\n", "documents = loader.load()\n", "print(documents)" ] }, { "cell_type": "markdown", "metadata": { "id": "5IBapL3ejoEt" }, "source": [ "Example with metadata columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wMx6nI1qjryD" }, "outputs": [], "source": [ "database_name = \"my_database\"\n", "s3_output_path = \"s3://my_bucket/query_results/\"\n", "query = \"SELECT * FROM my_table\"\n", "profile_name = \"my_profile\"\n", "metadata_columns = [\"_row\", \"_created_at\"]\n", "\n", "loader = AthenaLoader(\n", " query=query,\n", " database=database_name,\n", " s3_output_uri=s3_output_path,\n", " profile_name=profile_name,\n", " metadata_columns=metadata_columns,\n", ")\n", "\n", "documents = loader.load()\n", "print(documents)" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }