Add multi-input Reddit search tool (#13893)

- **Description:** Added a tool called RedditSearchRun and an
accompanying API wrapper, which searches Reddit for posts with support
for time filtering, post sorting, query string and subreddit filtering.
  - **Issue:** #13891 
  - **Dependencies:** `praw` module is used to search Reddit
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed
  - **Twitter handle:** None.

  Hello,

This is our first PR and we hope that our changes will be helpful to the
community. We have run `make format`, `make lint` and `make test`
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the `praw` package which is already used by
RedditPostsLoader in LangChain. Nonetheless, we have added integration
tests and edited unit tests to test our changes. An example notebook is
also provided. These changes were put together by me, @Anika2000,
@CharlesXu123, and @Jeremy-Cheng-stack

Thank you in advance to the maintainers for their time.

---------

Co-authored-by: What-Is-A-Username <49571870+What-Is-A-Username@users.noreply.github.com>
Co-authored-by: Anika2000 <anika.sultana@mail.utoronto.ca>
Co-authored-by: Jeremy Cheng <81793294+Jeremy-Cheng-stack@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This commit is contained in:
Cheng (William) Huang
2023-11-29 20:16:40 -05:00
committed by GitHub
parent 00a6e8962c
commit a00db4b28f
10 changed files with 600 additions and 2 deletions

View File

@@ -0,0 +1,262 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reddit Search \n",
"\n",
"In this notebook, we learn how the Reddit search tool works. \n",
"First make sure that you have installed praw with the command below: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"!pip install praw"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then you need to set you need to set up the proper API keys and environment variables. You would need to create a Reddit user account and get credentials. So, create a Reddit user account by going to https://www.reddit.com and signing up. \n",
"Then get your credentials by going to https://www.reddit.com/prefs/apps and creating an app. \n",
"You should have your client_id and secret from creating the app. Now, you can paste those strings in client_id and client_secret variable. \n",
"Note: You can put any string for user_agent "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client_id = \"\"\n",
"client_secret = \"\"\n",
"user_agent = \"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools.reddit_search.tool import RedditSearchRun\n",
"from langchain.utilities.reddit_search import RedditSearchAPIWrapper\n",
"\n",
"search = RedditSearchRun(\n",
" api_wrapper=RedditSearchAPIWrapper(\n",
" reddit_client_id=client_id,\n",
" reddit_client_secret=client_secret,\n",
" reddit_user_agent=user_agent,\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then set your queries for example, what subreddit you want to query, how many posts you want to be returned, how you would like the result to be sorted etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools.reddit_search.tool import RedditSearchSchema\n",
"\n",
"search_params = RedditSearchSchema(\n",
" query=\"beginner\", sort=\"new\", time_filter=\"week\", subreddit=\"python\", limit=\"2\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally run the search and get your results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"result = search.run(tool_input=search_params.dict())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is an example of printing the result. \n",
"Note: You may get different output depending on the newest post in the subreddit but the formatting should be similar."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"> Searching r/python found 2 posts:\n",
"> Post Title: 'Setup Github Copilot in Visual Studio Code'\n",
"> User: Feisty-Recording-715\n",
"> Subreddit: r/Python:\n",
"> Text body: 🛠️ This tutorial is perfect for beginners looking to strengthen their understanding of version control or for experienced developers seeking a quick reference for GitHub setup in Visual Studio Code.\n",
">\n",
">🎓 By the end of this video, you'll be equipped with the skills to confidently manage your codebase, collaborate with others, and contribute to open-source projects on GitHub.\n",
">\n",
">\n",
">Video link: https://youtu.be/IdT1BhrSfdo?si=mV7xVpiyuhlD8Zrw\n",
">\n",
">Your feedback is welcome\n",
"> Post URL: https://www.reddit.com/r/Python/comments/1823wr7/setup_github_copilot_in_visual_studio_code/\n",
"> Post Category: N/A.\n",
"> Score: 0\n",
">\n",
">Post Title: 'A Chinese Checkers game made with pygame and PySide6, with custom bots support'\n",
">User: HenryChess\n",
">Subreddit: r/Python:\n",
"> Text body: GitHub link: https://github.com/henrychess/pygame-chinese-checkers\n",
">\n",
">I'm not sure if this counts as beginner or intermediate. I think I'm still in the beginner zone, so I flair it as beginner.\n",
">\n",
">This is a Chinese Checkers (aka Sternhalma) game for 2 to 3 players. The bots I wrote are easy to beat, as they're mainly for debugging the game logic part of the code. However, you can write up your own custom bots. There is a guide at the github page.\n",
"> Post URL: https://www.reddit.com/r/Python/comments/181xq0u/a_chinese_checkers_game_made_with_pygame_and/\n",
"> Post Category: N/A.\n",
" > Score: 1\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using tool with an agent chain\n",
"\n",
"Reddit search functionality is also provided as a multi-input tool. In this example, we adapt [existing code from the docs](https://python.langchain.com/docs/modules/agents/how_to/sharedmemory_for_tools), and use ChatOpenAI to create an agent chain with memory. This agent chain is able to pull information from Reddit and use these posts to respond to subsequent input. \n",
"\n",
"To run the example, add your reddit API access information and also get an OpenAI key from the [OpenAI API](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Adapted code from https://python.langchain.com/docs/modules/agents/how_to/sharedmemory_for_tools\n",
"\n",
"from langchain.agents import AgentExecutor, StructuredChatAgent, Tool\n",
"from langchain.chains import LLMChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.memory import ConversationBufferMemory, ReadOnlySharedMemory\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.tools.reddit_search.tool import RedditSearchRun\n",
"from langchain.utilities.reddit_search import RedditSearchAPIWrapper\n",
"\n",
"# Provide keys for Reddit\n",
"client_id = \"\"\n",
"client_secret = \"\"\n",
"user_agent = \"\"\n",
"# Provide key for OpenAI\n",
"openai_api_key = \"\"\n",
"\n",
"template = \"\"\"This is a conversation between a human and a bot:\n",
"\n",
"{chat_history}\n",
"\n",
"Write a summary of the conversation for {input}:\n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(input_variables=[\"input\", \"chat_history\"], template=template)\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\")\n",
"\n",
"prefix = \"\"\"Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"Begin!\"\n",
"\n",
"{chat_history}\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"tools = [\n",
" RedditSearchRun(\n",
" api_wrapper=RedditSearchAPIWrapper(\n",
" reddit_client_id=client_id,\n",
" reddit_client_secret=client_secret,\n",
" reddit_user_agent=user_agent,\n",
" )\n",
" )\n",
"]\n",
"\n",
"prompt = StructuredChatAgent.create_prompt(\n",
" prefix=prefix,\n",
" tools=tools,\n",
" suffix=suffix,\n",
" input_variables=[\"input\", \"chat_history\", \"agent_scratchpad\"],\n",
")\n",
"\n",
"llm = ChatOpenAI(temperature=0, openai_api_key=openai_api_key)\n",
"\n",
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
"agent = StructuredChatAgent(llm_chain=llm_chain, verbose=True, tools=tools)\n",
"agent_chain = AgentExecutor.from_agent_and_tools(\n",
" agent=agent, verbose=True, memory=memory, tools=tools\n",
")\n",
"\n",
"# Answering the first prompt requires usage of the Reddit search tool.\n",
"agent_chain.run(input=\"What is the newest post on r/langchain for the week?\")\n",
"# Answering the subsequent prompt uses memory.\n",
"agent_chain.run(input=\"Who is the author of the post?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.5 64-bit ('langchaindev')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
},
"vscode": {
"interpreter": {
"hash": "3929050b09828356c9f5ebaf862d05c053d8228eddbc70f990c168e54dd824ba"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}