mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-05 16:50:03 +00:00
Compare commits
1 Commits
vwp/openap
...
harrison/p
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
31f92cf88f |
@@ -10,8 +10,6 @@ Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set u
|
||||
## Quick Install
|
||||
|
||||
`pip install langchain`
|
||||
or
|
||||
`conda install langchain -c conda-forge`
|
||||
|
||||
## 🤔 What is this?
|
||||
|
||||
@@ -75,7 +73,7 @@ Memory is the concept of persisting state between calls of a chain/agent. LangCh
|
||||
|
||||
[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
|
||||
|
||||
For more information on these concepts, please see our [full documentation](https://langchain.readthedocs.io/en/latest/).
|
||||
For more information on these concepts, please see our [full documentation](https://langchain.readthedocs.io/en/latest/?).
|
||||
|
||||
## 💁 Contributing
|
||||
|
||||
|
||||
BIN
docs/_static/DataberryDashboard.png
vendored
BIN
docs/_static/DataberryDashboard.png
vendored
Binary file not shown.
|
Before Width: | Height: | Size: 157 KiB |
@@ -40,7 +40,3 @@ This includes: production ready endpoints, horizontal scaling across dependencie
|
||||
|
||||
## [Langchain-serve](https://github.com/jina-ai/langchain-serve)
|
||||
This repository allows users to serve local chains and agents as RESTful, gRPC, or Websocket APIs thanks to [Jina](https://docs.jina.ai/). Deploy your chains & agents with ease and enjoy independent scaling, serverless and autoscaling APIs, as well as a Streamlit playground on Jina AI Cloud.
|
||||
|
||||
## [BentoML](https://github.com/ssheng/BentoChain)
|
||||
|
||||
This repository provides an example of how to deploy a LangChain application with [BentoML](https://github.com/bentoml/BentoML). BentoML is a framework that enables the containerization of machine learning applications as standard OCI images. BentoML also allows for the automatic generation of OpenAPI and gRPC endpoints. With BentoML, you can integrate models from all popular ML frameworks and deploy them as microservices running on the most optimal hardware and scaling independently.
|
||||
|
||||
@@ -1,25 +0,0 @@
|
||||
# Databerry
|
||||
|
||||
This page covers how to use the [Databerry](https://databerry.ai) within LangChain.
|
||||
|
||||
## What is Databerry?
|
||||
|
||||
Databerry is an [open source](https://github.com/gmpetrov/databerry) document retrievial platform that helps to connect your personal data with Large Language Models.
|
||||
|
||||

|
||||
|
||||
## Quick start
|
||||
|
||||
Retrieving documents stored in Databerry from LangChain is very easy!
|
||||
|
||||
```python
|
||||
from langchain.retrievers import DataberryRetriever
|
||||
|
||||
retriever = DataberryRetriever(
|
||||
datastore_url="https://api.databerry.ai/query/clg1xg2h80000l708dymr0fxc",
|
||||
# api_key="DATABERRY_API_KEY", # optional if datastore is public
|
||||
# top_k=10 # optional
|
||||
)
|
||||
|
||||
docs = retriever.get_relevant_documents("What's Databerry?")
|
||||
```
|
||||
@@ -8,9 +8,8 @@ This page covers how to use the Deep Lake ecosystem within LangChain.
|
||||
|
||||
## More Resources
|
||||
1. [Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data](https://www.activeloop.ai/resources/ultimate-guide-to-lang-chain-deep-lake-build-chat-gpt-to-answer-questions-on-your-financial-data/)
|
||||
2. [Twitter the-algorithm codebase analysis with Deep Lake](../use_cases/code/twitter-the-algorithm-analysis-deeplake.ipynb)
|
||||
3. Here is [whitepaper](https://www.deeplake.ai/whitepaper) and [academic paper](https://arxiv.org/pdf/2209.10785.pdf) for Deep Lake
|
||||
4. Here is a set of additional resources available for review: [Deep Lake](https://github.com/activeloopai/deeplake), [Getting Started](https://docs.activeloop.ai/getting-started) and [Tutorials](https://docs.activeloop.ai/hub-tutorials)
|
||||
1. Here is [whitepaper](https://www.deeplake.ai/whitepaper) and [academic paper](https://arxiv.org/pdf/2209.10785.pdf) for Deep Lake
|
||||
2. Here is a set of additional resources available for review: [Deep Lake](https://github.com/activeloopai/deeplake), [Getting Started](https://docs.activeloop.ai/getting-started) and [Tutorials](https://docs.activeloop.ai/hub-tutorials)
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python package with `pip install deeplake`
|
||||
|
||||
@@ -1,21 +1,21 @@
|
||||
# GPT4All
|
||||
|
||||
This page covers how to use the `GPT4All` wrapper within LangChain. The tutorial is divided into two parts: installation and setup, followed by usage with an example.
|
||||
This page covers how to use the `GPT4All` wrapper within LangChain.
|
||||
It is broken into two parts: installation and setup, and then usage with an example.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python package with `pip install pyllamacpp`
|
||||
- Download a [GPT4All model](https://github.com/nomic-ai/pyllamacpp#supported-model) and place it in your desired directory
|
||||
- Download a [GPT4All model](https://github.com/nomic-ai/gpt4all) and place it in your desired directory
|
||||
|
||||
## Usage
|
||||
|
||||
### GPT4All
|
||||
|
||||
To use the GPT4All wrapper, you need to provide the path to the pre-trained model file and the model's configuration.
|
||||
|
||||
```python
|
||||
from langchain.llms import GPT4All
|
||||
|
||||
# Instantiate the model. Callbacks support token-wise streaming
|
||||
# Instantiate the model
|
||||
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8)
|
||||
|
||||
# Generate text
|
||||
@@ -24,24 +24,14 @@ response = model("Once upon a time, ")
|
||||
|
||||
You can also customize the generation parameters, such as n_predict, temp, top_p, top_k, and others.
|
||||
|
||||
To stream the model's predictions, add in a CallbackManager.
|
||||
Example:
|
||||
|
||||
```python
|
||||
from langchain.llms import GPT4All
|
||||
from langchain.callbacks.base import CallbackManager
|
||||
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
|
||||
# There are many CallbackHandlers supported, such as
|
||||
# from langchain.callbacks.streamlit import StreamlitCallbackHandler
|
||||
|
||||
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
|
||||
model = GPT4All(model="./models/gpt4all-model.bin", n_ctx=512, n_threads=8, callback_handler=callback_handler, verbose=True)
|
||||
|
||||
# Generate text. Tokens are streamed throught the callback manager.
|
||||
model("Once upon a time, ")
|
||||
model = GPT4All(model="./models/gpt4all-model.bin", n_predict=55, temp=0)
|
||||
response = model("Once upon a time, ")
|
||||
```
|
||||
|
||||
## Model File
|
||||
|
||||
You can find links to model file downloads in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.
|
||||
You can find links to model file downloads at the [GPT4all](https://github.com/nomic-ai/gpt4all) repository. They will need to be converted to `ggml` format to work, as specified in the [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository.
|
||||
|
||||
For a more detailed walkthrough of this, see [this notebook](../modules/models/llms/integrations/gpt4all.ipynb)
|
||||
@@ -1,6 +1,6 @@
|
||||
# Graphsignal
|
||||
|
||||
This page covers how to use [Graphsignal](https://app.graphsignal.com) to trace and monitor LangChain. Graphsignal enables full visibility into your application. It provides latency breakdowns by chains and tools, exceptions with full context, data monitoring, compute/GPU utilization, OpenAI cost analytics, and more.
|
||||
This page covers how to use the Graphsignal ecosystem to trace and monitor LangChain.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
@@ -10,7 +10,7 @@ This page covers how to use [Graphsignal](https://app.graphsignal.com) to trace
|
||||
|
||||
## Tracing and Monitoring
|
||||
|
||||
Graphsignal automatically instruments and starts tracing and monitoring chains. Traces and metrics are then available in your [Graphsignal dashboards](https://app.graphsignal.com).
|
||||
Graphsignal automatically instruments and starts tracing and monitoring chains. Traces, metrics and errors are then available in your [Graphsignal dashboard](https://app.graphsignal.com/). No prompts or other sensitive data are sent to Graphsignal cloud, only statistics and metadata.
|
||||
|
||||
Initialize the tracer by providing a deployment name:
|
||||
|
||||
@@ -20,13 +20,7 @@ import graphsignal
|
||||
graphsignal.configure(deployment='my-langchain-app-prod')
|
||||
```
|
||||
|
||||
To additionally trace any function or code, you can use a decorator or a context manager:
|
||||
|
||||
```python
|
||||
@graphsignal.trace_function
|
||||
def handle_request():
|
||||
chain.run("some initial text")
|
||||
```
|
||||
In order to trace full runs and see a breakdown by chains and tools, you can wrap the calling routine or use a decorator:
|
||||
|
||||
```python
|
||||
with graphsignal.start_trace('my-chain'):
|
||||
|
||||
@@ -1,21 +0,0 @@
|
||||
# Zilliz
|
||||
|
||||
This page covers how to use the Zilliz Cloud ecosystem within LangChain.
|
||||
Zilliz uses the Milvus integration.
|
||||
It is broken into two parts: installation and setup, and then references to specific Milvus wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python SDK with `pip install pymilvus`
|
||||
## Wrappers
|
||||
|
||||
### VectorStore
|
||||
|
||||
There exists a wrapper around Zilliz indexes, allowing you to use it as a vectorstore,
|
||||
whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Milvus
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the Miluvs wrapper, see [this notebook](../modules/indexes/vectorstores/examples/zilliz.ipynb)
|
||||
@@ -9,8 +9,6 @@ To get started, install LangChain with the following command:
|
||||
|
||||
```bash
|
||||
pip install langchain
|
||||
# or
|
||||
conda install langchain -c conda-forge
|
||||
```
|
||||
|
||||
|
||||
|
||||
@@ -71,8 +71,6 @@ The above modules can be used in a variety of ways. LangChain also provides guid
|
||||
|
||||
- `Querying Tabular Data <./use_cases/tabular.html>`_: If you want to understand how to use LLMs to query data that is stored in a tabular format (csvs, SQL, dataframes, etc) you should read this page.
|
||||
|
||||
- `Code Understanding <./use_cases/code.html>`_: If you want to understand how to use LLMs to query source code from github, you should read this page.
|
||||
|
||||
- `Interacting with APIs <./use_cases/apis.html>`_: Enabling LLMs to interact with APIs is extremely powerful in order to give them more up-to-date information and allow them to take actions.
|
||||
|
||||
- `Extraction <./use_cases/extraction.html>`_: Extract structured information from text.
|
||||
@@ -92,7 +90,6 @@ The above modules can be used in a variety of ways. LangChain also provides guid
|
||||
./use_cases/question_answering.md
|
||||
./use_cases/chatbots.md
|
||||
./use_cases/tabular.rst
|
||||
./use_cases/code.md
|
||||
./use_cases/apis.md
|
||||
./use_cases/summarization.md
|
||||
./use_cases/extraction.md
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "68b24990",
|
||||
"metadata": {},
|
||||
@@ -10,7 +9,7 @@
|
||||
"\n",
|
||||
"This notebook covers how to combine agents and vectorstores. The use case for this is that you've ingested your data into a vectorstore and want to interact with it in an agentic manner.\n",
|
||||
"\n",
|
||||
"The recommended method for doing so is to create a RetrievalQA and then use that as a tool in the overall agent. Let's take a look at doing this below. You can do this with multiple different vectordbs, and use the agent as a way to route between them. There are two different ways of doing this - you can either let the agent use the vectorstores as normal tools, or you can set `return_direct=True` to really just use the agent as a router."
|
||||
"The reccomended method for doing so is to create a RetrievalQA and then use that as a tool in the overall agent. Let's take a look at doing this below. You can do this with multiple different vectordbs, and use the agent as a way to route between them. There are two different ways of doing this - you can either let the agent use the vectorstores as normal tools, or you can set `return_direct=True` to really just use the agent as a router."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -176,7 +176,7 @@
|
||||
" llm = OpenAI(temperature=0)\n",
|
||||
" tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm)\n",
|
||||
" agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION verbose=True\n",
|
||||
" )\n",
|
||||
" agent.run(q)\n",
|
||||
"\n",
|
||||
|
||||
@@ -1,519 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c7ad998d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tool Retrieval over large Natural Language APIs\n",
|
||||
"\n",
|
||||
"This tutorial assumes familiarity with [Natural Language API Toolkits](../../toolkits/examples/openapi_nla.ipynb). NLAToolkits parse whole OpenAPI specs, which can be too large to reasonably fit into an agent's context. This tutorial walks through using a Retriever object to fetch tools.\n",
|
||||
"\n",
|
||||
"### First, import dependencies and load the LLM"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6593f793",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import re\n",
|
||||
"from typing import Callable, Union\n",
|
||||
"\n",
|
||||
"from langchain import OpenAI, LLMChain\n",
|
||||
"from langchain.prompts import StringPromptTemplate\n",
|
||||
"from langchain.requests import Requests\n",
|
||||
"from langchain.agents import AgentExecutor, AgentOutputParser, AgentType, initialize_agent, LLMSingleActionAgent\n",
|
||||
"from langchain.agents.agent_toolkits import NLAToolkit\n",
|
||||
"from langchain.schema import AgentAction, AgentFinish"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "37141072",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"spoonacular_api_key = \"\" # Copy from the API Console at https://spoonacular.com/food-api/console#Profile"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "dd720860",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Select the LLM to use. Here, we use text-davinci-003\n",
|
||||
"llm = OpenAI(temperature=0, max_tokens=700) # You can swap between different core LLM's here."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4cadac9d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### Next, load the Natural Language API Toolkits"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "6b208ab0",
|
||||
"metadata": {
|
||||
"scrolled": true,
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Attempting to load an OpenAPI 3.0.0 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"speak_toolkit = NLAToolkit.from_llm_and_url(llm, \"https://api.speak.com/openapi.yaml\")\n",
|
||||
"klarna_toolkit = NLAToolkit.from_llm_and_url(llm, \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")\n",
|
||||
"\n",
|
||||
"# Add the API key for authenticating to the API\n",
|
||||
"requests = Requests(headers={\"x-api-key\": spoonacular_api_key})\n",
|
||||
"spoonacular_toolkit = NLAToolkit.from_llm_and_url(\n",
|
||||
" llm, \n",
|
||||
" \"https://spoonacular.com/application/frontend/downloads/spoonacular-openapi-3.json\",\n",
|
||||
" requests=requests,\n",
|
||||
" max_text_length=1800, # If you want to truncate the response text\n",
|
||||
")\n",
|
||||
"toolkits = (speak_toolkit, spoonacular_toolkit, klarna_toolkit)\n",
|
||||
"ALL_TOOLS = [tool for toolkit in toolkits for tool in toolkit.get_tools()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e5dfc494",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tool Retriever\n",
|
||||
"\n",
|
||||
"We will use a vectorstore to create embeddings for each tool description. Then, for an incoming query we can create embeddings for that query and do a similarity search for relevant tools."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "f7c29e82",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.schema import Document"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "5cd940a8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = [Document(page_content=t.description, metadata={\"index\": i}) for i, t in enumerate(ALL_TOOLS)]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "19d77004",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "3d84c14d-c3eb-4381-8852-fd6175b02239",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create the retriever object\n",
|
||||
"retriever = vector_store.as_retriever()\n",
|
||||
"\n",
|
||||
"def get_tools(query):\n",
|
||||
" docs = retriever.get_relevant_documents(query)\n",
|
||||
" return [ALL_TOOLS[d.metadata[\"index\"]] for d in docs]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "d37782ab-a74c-4cd0-8712-ba9166152206",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"('Speak.explainPhrase',\n",
|
||||
" \"I'm an AI from Speak. Instruct what you want, and I'll assist via an API with description: Explain the meaning and usage of a specific foreign language phrase that the user is asking about.\")"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tool = get_tools(\"How would I ask 'What is the most important thing?' in Maori?\")[0]\n",
|
||||
"tool.name, tool.description"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "88814a90-f1fe-49e7-88e4-fcd662142cfb",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"('spoonacular_API.ingredientSearch',\n",
|
||||
" \"I'm an AI from spoonacular API. Instruct what you want, and I'll assist via an API with description: Search for simple whole foods (e.g. fruits, vegetables, nuts, grains, meat, fish, dairy etc.).\")"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tool = get_tools(\"What's a good vegetarian Thanksgiving dish?\")[0]\n",
|
||||
"tool.name, tool.description"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "16c7336f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create the Agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "3bc3de7c-dca9-4e29-9efb-dadb714c7100",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Set up a prompt template\n",
|
||||
"class CustomPromptTemplate(StringPromptTemplate):\n",
|
||||
" # The template to use\n",
|
||||
" template: str\n",
|
||||
" ############## NEW ######################\n",
|
||||
" # The list of tools available\n",
|
||||
" tools_getter: Callable\n",
|
||||
" \n",
|
||||
" def format(self, **kwargs) -> str:\n",
|
||||
" # Get the intermediate steps (AgentAction, Observation tuples)\n",
|
||||
" # Format them in a particular way\n",
|
||||
" intermediate_steps = kwargs.pop(\"intermediate_steps\")\n",
|
||||
" thoughts = \"\"\n",
|
||||
" for action, observation in intermediate_steps:\n",
|
||||
" thoughts += action.log\n",
|
||||
" thoughts += f\"\\nObservation: {observation}\\nThought: \"\n",
|
||||
" # Set the agent_scratchpad variable to that value\n",
|
||||
" kwargs[\"agent_scratchpad\"] = thoughts\n",
|
||||
" ############## NEW ######################\n",
|
||||
" tools = self.tools_getter(kwargs['input'])\n",
|
||||
" # Create a tools variable from the list of tools provided\n",
|
||||
" kwargs[\"tools\"] = \"\\n\".join([f\"{tool.name}: {tool.description}\" for tool in tools])\n",
|
||||
" # Create a list of tool names for the tools provided\n",
|
||||
" kwargs[\"tool_names\"] = \", \".join([tool.name for tool in tools])\n",
|
||||
" return self.template.format(**kwargs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "9302880c-ad1d-490e-912f-bb166387c200",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Slightly tweak the instructions from the default agent\n",
|
||||
"template = \"\"\"Answer the following questions as best you can. You have access to the following tools:\n",
|
||||
"\n",
|
||||
"{tools}\n",
|
||||
"\n",
|
||||
"Use the following format:\n",
|
||||
"\n",
|
||||
"Question: the input question you must answer\n",
|
||||
"Thought: you should always think about what to do\n",
|
||||
"Action: the action to take, should be one of [{tool_names}]\n",
|
||||
"Action Input: full description of what you want to accomplish so the tool AI can assist.\n",
|
||||
"Observation: The Agent's response\n",
|
||||
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
|
||||
"Thought: I now know the final answer. User can't see any of my observations, API responses, links, or tools.\n",
|
||||
"Final Answer: the final answer to the original input question with the right amount of detail\n",
|
||||
"\n",
|
||||
"When responding with your Final Answer, remember that the person you are responding to CANNOT see any of your Thought/Action/Action Input/Observations, so if there is any relevant information there you need to include it explicitly in your response.\n",
|
||||
"Begin!\n",
|
||||
"\n",
|
||||
"Question: {input}\n",
|
||||
"Thought:{agent_scratchpad}\"\"\"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "36dedac3-94c2-4f02-b3ed-f44793c84eac",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"prompt = CustomPromptTemplate(\n",
|
||||
" template=template,\n",
|
||||
" tools_getter=get_tools,\n",
|
||||
" # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically\n",
|
||||
" # This includes the `intermediate_steps` variable because that is needed\n",
|
||||
" input_variables=[\"input\", \"intermediate_steps\"]\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "8f2fe3cb-1060-4e71-8c9f-31cbb1931f1a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class CustomOutputParser(AgentOutputParser):\n",
|
||||
" \n",
|
||||
" def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:\n",
|
||||
" # Check if agent should finish\n",
|
||||
" if \"Final Answer:\" in llm_output:\n",
|
||||
" return AgentFinish(\n",
|
||||
" # Return values is generally always a dictionary with a single `output` key\n",
|
||||
" # It is not recommended to try anything else at the moment :)\n",
|
||||
" return_values={\"output\": llm_output.split(\"Final Answer:\")[-1].strip()},\n",
|
||||
" log=llm_output,\n",
|
||||
" )\n",
|
||||
" # Parse out the action and action input\n",
|
||||
" regex = r\"Action: (.*?)[\\n]*Action Input:[\\s]*(.*)\"\n",
|
||||
" match = re.search(regex, llm_output, re.DOTALL)\n",
|
||||
" if not match:\n",
|
||||
" raise ValueError(f\"Could not parse LLM output: `{llm_output}`\")\n",
|
||||
" action = match.group(1).strip()\n",
|
||||
" action_input = match.group(2)\n",
|
||||
" # Return the action and action input\n",
|
||||
" return AgentAction(tool=action, tool_input=action_input.strip(\" \").strip('\"'), log=llm_output)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "544ae3eb-9e10-40c9-b134-de752761f4f2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"output_parser = CustomOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4609a1ce-3f72-403e-bf83-958491f5805a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set up LLM, stop sequence, and the agent\n",
|
||||
"\n",
|
||||
"Also the same as the previous notebook"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "af14b84e-2713-4515-bc3c-c602b5527a06",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# LLM chain consisting of the LLM and a prompt\n",
|
||||
"llm_chain = LLMChain(llm=llm, prompt=prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "bc534f60-5042-454a-afd0-c2809b82fa6a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tool_names = [tool.name for tool in ALL_TOOLS]\n",
|
||||
"agent = LLMSingleActionAgent(\n",
|
||||
" llm_chain=llm_chain, \n",
|
||||
" output_parser=output_parser,\n",
|
||||
" stop=[\"\\nObservation:\"], \n",
|
||||
" allowed_tools=tool_names,\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "7283d902-f682-4631-aacc-a37616999de9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=ALL_TOOLS, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "ffe44871-25b0-4ef5-89a4-5a643fa6425d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Make the query more complex!\n",
|
||||
"user_input = (\n",
|
||||
" \"My Spanish relatives are coming and I need to pick some good wine and some food. Also, any advice on how to talk about it to them would be much appreciated\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "2f36278e-ec09-4237-8b7f-c9859ccd12e0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find a wine and a dish that go well together, and also provide some advice on how to talk about it\n",
|
||||
"Action: spoonacular_API.getWinePairing\n",
|
||||
"Action Input: \"Spanish cuisine\"\u001b[0m\n",
|
||||
"\n",
|
||||
"Observation:\u001b[36;1m\u001b[1;3mI attempted to call an API to find a wine pairing for Spanish cuisine, but the API returned an error saying it could not find a pairing. It may be that there is no known wine pairing for Spanish cuisine.\u001b[0m\u001b[32;1m\u001b[1;3m I should try to find a specific dish and then find a wine that goes well with it\n",
|
||||
"Action: spoonacular_API.getWinePairing\n",
|
||||
"Action Input: \"paella\"\u001b[0m\n",
|
||||
"\n",
|
||||
"Observation:\u001b[36;1m\u001b[1;3mWhen pairing wine with Spanish dishes, it is recommended to follow the rule 'what grows together goes together'. For paella, we recommend albariño for white wine and garnachan and tempranillo for red. One wine you could try is Becker Vineyards Tempranillo, which has 4.4 out of 5 stars and a bottle costs about 18 dollars.\u001b[0m\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
|
||||
"Final Answer: For your Spanish relatives, you should try pairing paella with Becker Vineyards Tempranillo. This red wine has 4.4 out of 5 stars and a bottle costs about 18 dollars. When pairing wine with Spanish dishes, it is recommended to follow the rule 'what grows together goes together'.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"For your Spanish relatives, you should try pairing paella with Becker Vineyards Tempranillo. This red wine has 4.4 out of 5 stars and a bottle costs about 18 dollars. When pairing wine with Spanish dishes, it is recommended to follow the rule 'what grows together goes together'.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent_executor.run(user_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a2959462",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Thank you!"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,409 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c7ad998d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Natural Language APIs\n",
|
||||
"\n",
|
||||
"Natural Language API Toolkits (NLAToolkits) permit LangChain Agents to efficiently plan and combine calls across endpoints. This notebook demonstrates a sample composition of the Speak, Klarna, and Spoonacluar APIs.\n",
|
||||
"\n",
|
||||
"For a detailed walkthrough of the OpenAPI chains wrapped within the NLAToolkit, see the [OpenAPI Operation Chain](openapi.ipynb) notebook.\n",
|
||||
"\n",
|
||||
"### First, import dependencies and load the LLM"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6593f793",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List, Optional\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.requests import Requests\n",
|
||||
"from langchain.tools import APIOperation, OpenAPISpec\n",
|
||||
"from langchain.agents import AgentType, Tool, initialize_agent\n",
|
||||
"from langchain.agents.agent_toolkits import NLAToolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd720860",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Select the LLM to use. Here, we use text-davinci-003\n",
|
||||
"llm = OpenAI(temperature=0, max_tokens=700) # You can swap between different core LLM's here."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4cadac9d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"### Next, load the Natural Language API Toolkits"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "6b208ab0",
|
||||
"metadata": {
|
||||
"scrolled": true,
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"speak_toolkit = NLAToolkit.from_llm_and_url(llm, \"https://api.speak.com/openapi.yaml\")\n",
|
||||
"klarna_toolkit = NLAToolkit.from_llm_and_url(llm, \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "16c7336f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create the Agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "730a0dc2-b4d0-46d5-a1e9-583803220973",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Slightly tweak the instructions from the default agent\n",
|
||||
"openapi_format_instructions = \"\"\"Use the following format:\n",
|
||||
"\n",
|
||||
"Question: the input question you must answer\n",
|
||||
"Thought: you should always think about what to do\n",
|
||||
"Action: the action to take, should be one of [{tool_names}]\n",
|
||||
"Action Input: what to instruct the AI Action representative.\n",
|
||||
"Observation: The Agent's response\n",
|
||||
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
|
||||
"Thought: I now know the final answer. User can't see any of my observations, API responses, links, or tools.\n",
|
||||
"Final Answer: the final answer to the original input question with the right amount of detail\n",
|
||||
"\n",
|
||||
"When responding with your Final Answer, remember that the person you are responding to CANNOT see any of your Thought/Action/Action Input/Observations, so if there is any relevant information there you need to include it explicitly in your response.\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "40a979c3",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"natural_language_tools = speak_toolkit.get_tools() + klarna_toolkit.get_tools()\n",
|
||||
"mrkl = initialize_agent(natural_language_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, \n",
|
||||
" verbose=True, agent_kwargs={\"format_instructions\":openapi_format_instructions})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "794380ba",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what kind of Italian clothes are available\n",
|
||||
"Action: Open_AI_Klarna_product_Api.productsUsingGET\n",
|
||||
"Action Input: Italian clothes\u001b[0m\n",
|
||||
"Observation: \u001b[31;1m\u001b[1;3mThe API response contains two products from the Alé brand in Italian Blue. The first is the Alé Colour Block Short Sleeve Jersey Men - Italian Blue, which costs $86.49, and the second is the Alé Dolid Flash Jersey Men - Italian Blue, which costs $40.00.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know what kind of Italian clothes are available and how much they cost.\n",
|
||||
"Final Answer: You can buy two products from the Alé brand in Italian Blue for your end of year party. The Alé Colour Block Short Sleeve Jersey Men - Italian Blue costs $86.49, and the Alé Dolid Flash Jersey Men - Italian Blue costs $40.00.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'You can buy two products from the Alé brand in Italian Blue for your end of year party. The Alé Colour Block Short Sleeve Jersey Men - Italian Blue costs $86.49, and the Alé Dolid Flash Jersey Men - Italian Blue costs $40.00.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(\"I have an end of year party for my Italian class and have to buy some Italian clothes for it\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c61d92a8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using Auth + Adding more Endpoints\n",
|
||||
"\n",
|
||||
"Some endpoints may require user authentication via things like access tokens. Here we show how to pass in the authentication information via the `Requests` wrapper object.\n",
|
||||
"\n",
|
||||
"Since each NLATool exposes a concisee natural language interface to its wrapped API, the top level conversational agent has an easier job incorporating each endpoint to satisfy a user's request."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f0d132cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Adding the Spoonacular endpoints.**\n",
|
||||
"\n",
|
||||
"1. Go to the [Spoonacular API Console](https://spoonacular.com/food-api/console#Profile) and make a free account.\n",
|
||||
"2. Click on `Profile` and copy your API key below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "c2368b9c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"spoonacular_api_key = \"\" # Copy from the API Console"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "fbd97c28-fef6-41b5-9600-a9611a32bfb3",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Attempting to load an OpenAPI 3.0.0 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Accept. Valid values are ['path', 'query'] Ignoring optional parameter\n",
|
||||
"Unsupported APIPropertyLocation \"header\" for parameter Content-Type. Valid values are ['path', 'query'] Ignoring optional parameter\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"requests = Requests(headers={\"x-api-key\": spoonacular_api_key})\n",
|
||||
"spoonacular_toolkit = NLAToolkit.from_llm_and_url(\n",
|
||||
" llm, \n",
|
||||
" \"https://spoonacular.com/application/frontend/downloads/spoonacular-openapi-3.json\",\n",
|
||||
" requests=requests,\n",
|
||||
" max_text_length=1800, # If you want to truncate the response text\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "81a6edac",
|
||||
"metadata": {
|
||||
"scrolled": true,
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"34 tools loaded.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"natural_language_api_tools = (speak_toolkit.get_tools() \n",
|
||||
" + klarna_toolkit.get_tools() \n",
|
||||
" + spoonacular_toolkit.get_tools()[:30]\n",
|
||||
" )\n",
|
||||
"print(f\"{len(natural_language_api_tools)} tools loaded.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "831f772d-5cd1-4467-b494-a3172af2ff48",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create an agent with the new tools\n",
|
||||
"mrkl = initialize_agent(natural_language_api_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, \n",
|
||||
" verbose=True, agent_kwargs={\"format_instructions\":openapi_format_instructions})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "0385e04b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Make the query more complex!\n",
|
||||
"user_input = (\n",
|
||||
" \"I'm learning Italian, and my language class is having an end of year party... \"\n",
|
||||
" \" Could you help me find an Italian outfit to wear and\"\n",
|
||||
" \" an appropriate recipe to prepare so I can present for the class in Italian?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "6ebd3f55",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find a recipe and an outfit that is Italian-themed.\n",
|
||||
"Action: spoonacular_API.searchRecipes\n",
|
||||
"Action Input: Italian\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mThe API response contains 10 Italian recipes, including Turkey Tomato Cheese Pizza, Broccolini Quinoa Pilaf, Bruschetta Style Pork & Pasta, Salmon Quinoa Risotto, Italian Tuna Pasta, Roasted Brussels Sprouts With Garlic, Asparagus Lemon Risotto, Italian Steamed Artichokes, Crispy Italian Cauliflower Poppers Appetizer, and Pappa Al Pomodoro.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I need to find an Italian-themed outfit.\n",
|
||||
"Action: Open_AI_Klarna_product_Api.productsUsingGET\n",
|
||||
"Action Input: Italian\u001b[0m\n",
|
||||
"Observation: \u001b[31;1m\u001b[1;3mI found 10 products related to 'Italian' in the API response. These products include Italian Gold Sparkle Perfectina Necklace - Gold, Italian Design Miami Cuban Link Chain Necklace - Gold, Italian Gold Miami Cuban Link Chain Necklace - Gold, Italian Gold Herringbone Necklace - Gold, Italian Gold Claddagh Ring - Gold, Italian Gold Herringbone Chain Necklace - Gold, Garmin QuickFit 22mm Italian Vacchetta Leather Band, Macy's Italian Horn Charm - Gold, Dolce & Gabbana Light Blue Italian Love Pour Homme EdT 1.7 fl oz.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
|
||||
"Final Answer: To present for your Italian language class, you could wear an Italian Gold Sparkle Perfectina Necklace - Gold, an Italian Design Miami Cuban Link Chain Necklace - Gold, or an Italian Gold Miami Cuban Link Chain Necklace - Gold. For a recipe, you could make Turkey Tomato Cheese Pizza, Broccolini Quinoa Pilaf, Bruschetta Style Pork & Pasta, Salmon Quinoa Risotto, Italian Tuna Pasta, Roasted Brussels Sprouts With Garlic, Asparagus Lemon Risotto, Italian Steamed Artichokes, Crispy Italian Cauliflower Poppers Appetizer, or Pappa Al Pomodoro.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'To present for your Italian language class, you could wear an Italian Gold Sparkle Perfectina Necklace - Gold, an Italian Design Miami Cuban Link Chain Necklace - Gold, or an Italian Gold Miami Cuban Link Chain Necklace - Gold. For a recipe, you could make Turkey Tomato Cheese Pizza, Broccolini Quinoa Pilaf, Bruschetta Style Pork & Pasta, Salmon Quinoa Risotto, Italian Tuna Pasta, Roasted Brussels Sprouts With Garlic, Asparagus Lemon Risotto, Italian Steamed Artichokes, Crispy Italian Cauliflower Poppers Appetizer, or Pappa Al Pomodoro.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(user_input)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a2959462",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Thank you!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "6fcda5f0",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"In Italian, you can say 'Buon appetito' to someone to wish them to enjoy their meal. This phrase is commonly used in Italy when someone is about to eat, often at the beginning of a meal. It's similar to saying 'Bon appétit' in French or 'Guten Appetit' in German.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"natural_language_api_tools[1].run(\"Tell the LangChain audience to 'enjoy the meal' in Italian, please!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ab366dc0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -84,7 +84,7 @@
|
||||
"tools = load_tools([\"requests\"] )\n",
|
||||
"tools += [tool]\n",
|
||||
"\n",
|
||||
"agent_chain = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)\n",
|
||||
"agent_chain = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION verbose=True)\n",
|
||||
"agent_chain.run(\"what t shirts are available in klarna?\")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
"\n",
|
||||
"This notebook goes over how to use the google search component.\n",
|
||||
"\n",
|
||||
"First, you need to set up the proper API keys and environment variables. To set it up, create the GOOGLE_API_KEY in the Google Cloud credential console (https://console.cloud.google.com/apis/credentials) and a GOOGLE_CSE_ID using the Programmable Search Enginge (https://programmablesearchengine.google.com/controlpanel/create). Next, it is good to follow the instructions found [here](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search).\n",
|
||||
"First, you need to set up the proper API keys and environment variables. To set it up, follow the instructions found [here](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search).\n",
|
||||
"\n",
|
||||
"Then we will need to set some environment variables."
|
||||
]
|
||||
|
||||
@@ -1,388 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23234b50-e6c6-4c87-9f97-259c15f36894",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"# Callbacks"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "29dd6333-307c-43df-b848-65001c01733b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"LangChain provides a callback system that allows you to hook into the various stages of your LLM application. This is useful for logging, [monitoring](https://python.langchain.com/en/latest/tracing.html), [streaming](https://python.langchain.com/en/latest/modules/models/llms/examples/streaming_llm.html), and other tasks.\n",
|
||||
"\n",
|
||||
"You can subscribe to these events by using the `callback_manager` argument available throughout the API. A `CallbackManager` is an object that manages a list of `CallbackHandlers`. The `CallbackManager` will call the appropriate method on each handler when the event is triggered."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fdb72e8d-a02a-474d-96bf-f5759432afc8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"```python\n",
|
||||
"class CallbackManager(BaseCallbackHandler):\n",
|
||||
" \"\"\"Base callback manager that can be used to handle callbacks from LangChain.\"\"\"\n",
|
||||
"\n",
|
||||
" def add_handler(self, callback: BaseCallbackHandler) -> None:\n",
|
||||
" \"\"\"Add a handler to the callback manager.\"\"\"\n",
|
||||
"\n",
|
||||
" def remove_handler(self, handler: BaseCallbackHandler) -> None:\n",
|
||||
" \"\"\"Remove a handler from the callback manager.\"\"\"\n",
|
||||
"\n",
|
||||
" def set_handler(self, handler: BaseCallbackHandler) -> None:\n",
|
||||
" \"\"\"Set handler as the only handler on the callback manager.\"\"\"\n",
|
||||
" self.set_handlers([handler])\n",
|
||||
"\n",
|
||||
" def set_handlers(self, handlers: List[BaseCallbackHandler]) -> None:\n",
|
||||
" \"\"\"Set handlers as the only handlers on the callback manager.\"\"\"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b6d7dba-cd20-472a-ae05-f68675cc9ea4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`CallbackHandlers` are objects that implement the `CallbackHandler` interface, which has a method for each event that can be subscribed to. The `CallbackManager` will call the appropriate method on each handler when the event is triggered."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e4592215-6604-47e2-89ff-5db3af6d1e40",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"```python\n",
|
||||
"class BaseCallbackHandler(ABC):\n",
|
||||
" \"\"\"Base callback handler that can be used to handle callbacks from langchain.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_llm_start(\n",
|
||||
" self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when LLM starts running.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run on new LLM token. Only available when streaming is enabled.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run when LLM ends running.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_llm_error(\n",
|
||||
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when LLM errors.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_chain_start(\n",
|
||||
" self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when chain starts running.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run when chain ends running.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_chain_error(\n",
|
||||
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when chain errors.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_tool_start(\n",
|
||||
" self, serialized: Dict[str, Any], input_str: str, **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when tool starts running.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_tool_end(self, output: str, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run when tool ends running.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_tool_error(\n",
|
||||
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when tool errors.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_text(self, text: str, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run on arbitrary text.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run on agent action.\"\"\"\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run on agent end.\"\"\"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d3bf3304-43fb-47ad-ae50-0637a17018a2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating and Using a Custom `CallbackHandler`\n",
|
||||
"\n",
|
||||
"By default, a shared CallbackManager with the StdOutCallbackHandler will be used by models, chains, agents, and tools. However, you can pass in your own CallbackManager with a custom CallbackHandler:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "80532dfc-d687-4147-a0c9-1f90cc3e868c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"AgentAction(tool='Search', tool_input=\"US Open men's final 2019 winner\", log=' I need to find out who won the US Open men\\'s final in 2019 and then calculate his age raised to the 0.334 power.\\nAction: Search\\nAction Input: \"US Open men\\'s final 2019 winner\"')\n",
|
||||
"Rafael Nadal defeated Daniil Medvedev in the final, 7–5, 6–3, 5–7, 4–6, 6–4 to win the men's singles tennis title at the 2019 US Open. It was his fourth US ...\n",
|
||||
"AgentAction(tool='Search', tool_input='Rafael Nadal age', log=' I need to find out the age of the winner\\nAction: Search\\nAction Input: \"Rafael Nadal age\"')\n",
|
||||
"36 years\n",
|
||||
"AgentAction(tool='Calculator', tool_input='36^0.334', log=' I now need to calculate his age raised to the 0.334 power\\nAction: Calculator\\nAction Input: 36^0.334')\n",
|
||||
"Answer: 3.3098250249682484\n",
|
||||
"\n",
|
||||
" I now know the final answer\n",
|
||||
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from typing import Any, Dict, List, Optional, Union\n",
|
||||
"\n",
|
||||
"from langchain.agents import initialize_agent, load_tools\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain.callbacks.base import CallbackManager, BaseCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.schema import AgentAction, AgentFinish, LLMResult\n",
|
||||
"\n",
|
||||
"class MyCustomCallbackHandler(BaseCallbackHandler):\n",
|
||||
" \"\"\"Custom CallbackHandler.\"\"\"\n",
|
||||
"\n",
|
||||
" def on_llm_start(\n",
|
||||
" self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Print out the prompts.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:\n",
|
||||
" \"\"\"Do nothing.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_llm_new_token(self, token: str, **kwargs: Any) -> None:\n",
|
||||
" \"\"\"Do nothing.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_llm_error(\n",
|
||||
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Do nothing.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_chain_start(\n",
|
||||
" self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Print out that we are entering a chain.\"\"\"\n",
|
||||
" class_name = serialized[\"name\"]\n",
|
||||
" print(f\"\\n\\n\\033[1m> Entering new {class_name} chain...\\033[0m\")\n",
|
||||
"\n",
|
||||
" def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:\n",
|
||||
" \"\"\"Print out that we finished a chain.\"\"\"\n",
|
||||
" print(\"\\n\\033[1m> Finished chain.\\033[0m\")\n",
|
||||
"\n",
|
||||
" def on_chain_error(\n",
|
||||
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Do nothing.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_tool_start(\n",
|
||||
" self,\n",
|
||||
" serialized: Dict[str, Any],\n",
|
||||
" input_str: str,\n",
|
||||
" **kwargs: Any,\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Do nothing.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_agent_action(\n",
|
||||
" self, action: AgentAction, color: Optional[str] = None, **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run on agent action.\"\"\"\n",
|
||||
" print(action)\n",
|
||||
"\n",
|
||||
" def on_tool_end(\n",
|
||||
" self,\n",
|
||||
" output: str,\n",
|
||||
" color: Optional[str] = None,\n",
|
||||
" observation_prefix: Optional[str] = None,\n",
|
||||
" llm_prefix: Optional[str] = None,\n",
|
||||
" **kwargs: Any,\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"If not the final action, print out observation.\"\"\"\n",
|
||||
" print(output)\n",
|
||||
"\n",
|
||||
" def on_tool_error(\n",
|
||||
" self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Do nothing.\"\"\"\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def on_text(\n",
|
||||
" self,\n",
|
||||
" text: str,\n",
|
||||
" color: Optional[str] = None,\n",
|
||||
" end: str = \"\",\n",
|
||||
" **kwargs: Optional[str],\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Run when agent ends.\"\"\"\n",
|
||||
" print(text)\n",
|
||||
"\n",
|
||||
" def on_agent_finish(\n",
|
||||
" self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Run on agent end.\"\"\"\n",
|
||||
" print(finish.log)\n",
|
||||
"manager = CallbackManager([MyCustomCallbackHandler()])\n",
|
||||
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)\n",
|
||||
"tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, callback_manager=manager)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, callback_manager=manager\n",
|
||||
")\n",
|
||||
"agent.run(\"Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bc9785fa-4f71-4797-91a3-4fe7e57d0429",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Async Support\n",
|
||||
"\n",
|
||||
"If you are planning to use the async API, it is recommended to use `AsyncCallbackHandler` and `AsyncCallbackManager` to avoid blocking the runloop."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "c702e0c9-a961-4897-90c1-cdd13b6f16b2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"zzzz....\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"zzzz....\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import asyncio\n",
|
||||
"from aiohttp import ClientSession\n",
|
||||
"\n",
|
||||
"from langchain.callbacks.base import AsyncCallbackHandler, AsyncCallbackManager\n",
|
||||
"\n",
|
||||
"class MyCustomAsyncCallbackHandler(AsyncCallbackHandler):\n",
|
||||
" \"\"\"Async callback handler that can be used to handle callbacks from langchain.\"\"\"\n",
|
||||
"\n",
|
||||
" async def on_chain_start(\n",
|
||||
" self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any\n",
|
||||
" ) -> None:\n",
|
||||
" \"\"\"Run when chain starts running.\"\"\"\n",
|
||||
" print(\"zzzz....\")\n",
|
||||
" await asyncio.sleep(0.5)\n",
|
||||
" class_name = serialized[\"name\"]\n",
|
||||
" print(f\"\\n\\n\\033[1m> Entering new {class_name} chain...\\033[0m\")\n",
|
||||
"\n",
|
||||
" async def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:\n",
|
||||
" \"\"\"Run when chain ends running.\"\"\"\n",
|
||||
" print(\"zzzz....\")\n",
|
||||
" await asyncio.sleep(0.5)\n",
|
||||
" print(\"\\n\\033[1m> Finished chain.\\033[0m\")\n",
|
||||
"\n",
|
||||
"manager = AsyncCallbackManager([MyCustomAsyncCallbackHandler()])\n",
|
||||
"\n",
|
||||
"# To make async requests in Tools more efficient, you can pass in your own aiohttp.ClientSession, \n",
|
||||
"# but you must manually close the client session at the end of your program/event loop\n",
|
||||
"aiosession = ClientSession()\n",
|
||||
"llm = OpenAI(temperature=0, callback_manager=manager)\n",
|
||||
"async_tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, aiosession=aiosession, callback_manager=manager)\n",
|
||||
"async_agent = initialize_agent(async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True, callback_manager=manager)\n",
|
||||
"await async_agent.arun(\"Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?\")\n",
|
||||
"await aiosession.close()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "86be6304-e433-4048-880c-a92a73244407",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -36,6 +36,25 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "7a886879",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"cannot find .env file\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%load_ext dotenv\n",
|
||||
"%dotenv"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "3f2f9b8c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -232,23 +251,10 @@
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'title': 'Tragedy at sunset on the beach',\n",
|
||||
" 'era': 'Victorian England',\n",
|
||||
" 'synopsis': \"\\n\\nThe play follows the story of John, a young man from a wealthy Victorian family, who dreams of a better life for himself. He soon meets a beautiful young woman named Mary, who shares his dream. The two fall in love and decide to elope and start a new life together.\\n\\nOn their journey, they make their way to a beach at sunset, where they plan to exchange their vows of love. Unbeknownst to them, their plans are overheard by John's father, who has been tracking them. He follows them to the beach and, in a fit of rage, confronts them. \\n\\nA physical altercation ensues, and in the struggle, John's father accidentally stabs Mary in the chest with his sword. The two are left in shock and disbelief as Mary dies in John's arms, her last words being a declaration of her love for him.\\n\\nThe tragedy of the play comes to a head when John, broken and with no hope of a future, chooses to take his own life by jumping off the cliffs into the sea below. \\n\\nThe play is a powerful story of love, hope, and loss set against the backdrop of 19th century England.\",\n",
|
||||
" 'review': \"\\n\\nThe latest production from playwright X is a powerful and heartbreaking story of love and loss set against the backdrop of 19th century England. The play follows John, a young man from a wealthy Victorian family, and Mary, a beautiful young woman with whom he falls in love. The two decide to elope and start a new life together, and the audience is taken on a journey of hope and optimism for the future.\\n\\nUnfortunately, their dreams are cut short when John's father discovers them and in a fit of rage, fatally stabs Mary. The tragedy of the play is further compounded when John, broken and without hope, takes his own life. The storyline is not only realistic, but also emotionally compelling, drawing the audience in from start to finish.\\n\\nThe acting was also commendable, with the actors delivering believable and nuanced performances. The playwright and director have successfully crafted a timeless tale of love and loss that will resonate with audiences for years to come. Highly recommended.\"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"overall_chain({\"title\":\"Tragedy at sunset on the beach\", \"era\": \"Victorian England\"})"
|
||||
"review = overall_chain({\"title\":\"Tragedy at sunset on the beach\", \"era\": \"Victorian England\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -5,14 +5,14 @@
|
||||
"id": "134a0785",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Chat Over Documents with Chat History\n",
|
||||
"# Chat Index\n",
|
||||
"\n",
|
||||
"This notebook goes over how to set up a chain to chat over documents with chat history using a `ConversationalRetrievalChain`. The only difference between this chain and the [RetrievalQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
|
||||
"This notebook goes over how to set up a chain to chat with an index. The only difference between this chain and the [RetrievalQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 1,
|
||||
"id": "70c4e529",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -36,7 +36,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 2,
|
||||
"id": "01c46e92",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -58,7 +58,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 3,
|
||||
"id": "433363a5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -81,7 +81,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 4,
|
||||
"id": "a8930cf7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -109,12 +109,12 @@
|
||||
"id": "3c96b118",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We now initialize the `ConversationalRetrievalChain`"
|
||||
"We now initialize the ConversationalRetrievalChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 5,
|
||||
"id": "7b4110f3",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -134,7 +134,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 6,
|
||||
"id": "7fe3e730",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -148,7 +148,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"id": "bfff9cc8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -160,7 +160,7 @@
|
||||
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -179,7 +179,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"id": "00b4cf00",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -193,7 +193,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 9,
|
||||
"id": "f01828d1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -202,10 +202,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court.'"
|
||||
"' Justice Stephen Breyer'"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -225,11 +225,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 11,
|
||||
"id": "562769c6",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)"
|
||||
@@ -237,11 +235,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 12,
|
||||
"id": "ea478300",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
@@ -251,19 +247,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 13,
|
||||
"id": "4cb75b4e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../state_of_the_union.txt'})"
|
||||
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -283,11 +277,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 14,
|
||||
"id": "5ed8d612",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectordbkwargs = {\"search_distance\": 0.9}"
|
||||
@@ -295,11 +287,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 15,
|
||||
"id": "6a7b3459",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)\n",
|
||||
@@ -319,25 +309,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 16,
|
||||
"id": "e53a9d66",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT"
|
||||
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "bf205e35",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
@@ -355,9 +341,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "78155887",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
@@ -369,9 +353,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "e54b5fa2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@@ -402,9 +384,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "d1058fd2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain"
|
||||
@@ -414,9 +394,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "a6594482",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
@@ -434,9 +412,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "e2badd21",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
@@ -448,9 +424,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "edb31fe5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@@ -479,7 +453,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"execution_count": 26,
|
||||
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -489,10 +463,10 @@
|
||||
"from langchain.chains.llm import LLMChain\n",
|
||||
"from langchain.callbacks.base import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
|
||||
"from langchain.chains.chat_index.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"\n",
|
||||
"# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n",
|
||||
"# Construct a ChatVectorDBChain with a streaming llm for combine docs\n",
|
||||
"# and a separate, non-streaming llm for question generation\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"streaming_llm = OpenAI(streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)\n",
|
||||
@@ -506,7 +480,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"execution_count": 27,
|
||||
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -528,7 +502,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"execution_count": 28,
|
||||
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -538,7 +512,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Ketanji Brown Jackson succeeded Justice Stephen Breyer on the United States Supreme Court."
|
||||
" Justice Stephen Breyer"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -559,11 +533,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"execution_count": 29,
|
||||
"id": "a7ba9d8c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_chat_history(inputs) -> str:\n",
|
||||
@@ -571,16 +543,14 @@
|
||||
" for human, ai in inputs:\n",
|
||||
" res.append(f\"Human:{human}\\nAI:{ai}\")\n",
|
||||
" return \"\\n\".join(res)\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), get_chat_history=get_chat_history)"
|
||||
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore, get_chat_history=get_chat_history)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"execution_count": 30,
|
||||
"id": "a3e33c0d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
@@ -590,11 +560,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"execution_count": 31,
|
||||
"id": "936dc62f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
@@ -602,7 +570,7 @@
|
||||
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 33,
|
||||
"execution_count": 31,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -636,7 +604,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -23,9 +23,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "17fcbc0f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
@@ -40,26 +38,17 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "ef9305cc",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with open(\"../../state_of_the_union.txt\") as f:\n",
|
||||
" state_of_the_union = f.read()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"texts = text_splitter.split_text(state_of_the_union)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
"index_creator = VectorstoreIndexCreator()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "291f0117",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
@@ -71,29 +60,27 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{\"source\": str(i)} for i in range(len(texts))]).as_retriever()"
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader(\"../../state_of_the_union.txt\")\n",
|
||||
"docsearch = index_creator.from_loaders([loader])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "d1eaf6e6",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Justice Breyer\"\n",
|
||||
"docs = docsearch.get_relevant_documents(query)"
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a16e3453",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
@@ -111,19 +98,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 19,
|
||||
"id": "fd9e6190",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.'"
|
||||
"' The president said that he was honoring Justice Breyer for his service to the country and that he was a Constitutional scholar, Army veteran, and retiring Justice of the United States Supreme Court.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -154,11 +139,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "180fd4c1",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"stuff\")"
|
||||
@@ -166,19 +149,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 7,
|
||||
"id": "77fdf1aa",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'output_text': ' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.'}"
|
||||
"{'output_text': ' The president said that he was honoring Justice Breyer for his service to the country and that he was a Constitutional scholar, Army veteran, and retiring Justice of the United States Supreme Court.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -200,19 +181,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"id": "5558c9e0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha ricevuto una vasta gamma di supporto.'}"
|
||||
"{'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera come giudice della Corte Suprema degli Stati Uniti.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -243,11 +222,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 8,
|
||||
"id": "b0060f51",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_reduce\")"
|
||||
@@ -255,19 +232,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 9,
|
||||
"id": "fbdb9137",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'output_text': ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.'}"
|
||||
"{'output_text': ' The president said, \"Justice Breyer, thank you for your service.\"'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -289,11 +264,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 10,
|
||||
"id": "452c8680",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_reduce\", return_map_steps=True)"
|
||||
@@ -301,23 +274,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 11,
|
||||
"id": "90b47a75",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'intermediate_steps': [' \"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\"',\n",
|
||||
" ' A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.',\n",
|
||||
" ' None',\n",
|
||||
" ' None',\n",
|
||||
" ' None'],\n",
|
||||
" 'output_text': ' The president said that Justice Breyer is an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court, and thanked him for his service.'}"
|
||||
" 'output_text': ' The president said, \"Justice Breyer, thank you for your service.\"'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -338,23 +309,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 13,
|
||||
"id": "af03a578",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'intermediate_steps': [\"\\nStasera vorrei onorare qualcuno che ha dedicato la sua vita a servire questo paese: il giustizia Stephen Breyer - un veterano dell'esercito, uno studioso costituzionale e un giustizia in uscita della Corte Suprema degli Stati Uniti. Giustizia Breyer, grazie per il tuo servizio.\",\n",
|
||||
" '\\nNessun testo pertinente.',\n",
|
||||
" ' Non ha detto nulla riguardo a Justice Breyer.',\n",
|
||||
" \" Non c'è testo pertinente.\"],\n",
|
||||
" 'output_text': ' Non ha detto nulla riguardo a Justice Breyer.'}"
|
||||
" \"\\nCome ho detto l'anno scorso, soprattutto ai nostri giovani americani transgender, avrò sempre il tuo sostegno come tuo Presidente, in modo che tu possa essere te stesso e raggiungere il tuo potenziale donato da Dio.\",\n",
|
||||
" '\\nNella mia amministrazione, i guardiani sono stati accolti di nuovo. Stiamo andando dietro ai criminali che hanno rubato miliardi di dollari di aiuti di emergenza destinati alle piccole imprese e a milioni di americani. E stasera, annuncio che il Dipartimento di Giustizia nominerà un procuratore capo per la frode pandemica.'],\n",
|
||||
" 'output_text': ' Non conosco la risposta alla tua domanda su cosa abbia detto il Presidente riguardo al Giustizia Breyer.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -410,11 +379,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 12,
|
||||
"id": "fb167057",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"refine\")"
|
||||
@@ -422,19 +389,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 13,
|
||||
"id": "d8b5286e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans. He also praised Justice Breyer for his role in helping to pass the Bipartisan Infrastructure Law, which he said would be the most sweeping investment to rebuild America in history and would help the country compete for the jobs of the 21st Century.'}"
|
||||
"{'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act. He also mentioned his plan to lower costs to give families a fair shot, lower the deficit, and go after criminals who stole pandemic relief funds. He also announced that the Justice Department will name a chief prosecutor for pandemic fraud.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -456,11 +421,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 14,
|
||||
"id": "a5c64200",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"refine\", return_refine_steps=True)"
|
||||
@@ -468,23 +431,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 15,
|
||||
"id": "817546ac",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'intermediate_steps': ['\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country and his legacy of excellence.',\n",
|
||||
" '\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice.',\n",
|
||||
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans.',\n",
|
||||
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans. He also praised Justice Breyer for his role in helping to pass the Bipartisan Infrastructure Law, which is the most sweeping investment to rebuild America in history.'],\n",
|
||||
" 'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his support of the Equality Act and his commitment to protecting the rights of LGBTQ+ Americans. He also praised Justice Breyer for his role in helping to pass the Bipartisan Infrastructure Law, which is the most sweeping investment to rebuild America in history.'}"
|
||||
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice.',\n",
|
||||
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act.',\n",
|
||||
" '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act. He also mentioned his plan to lower costs to give families a fair shot, lower the deficit, and go after criminals who stole pandemic relief funds. He also announced that the Justice Department will name a chief prosecutor for pandemic fraud.'],\n",
|
||||
" 'output_text': '\\n\\nThe president said that he wanted to honor Justice Breyer for his dedication to serving the country, his legacy of excellence, and his commitment to advancing liberty and justice, as well as for his commitment to protecting the rights of LGBTQ+ Americans and his support for the bipartisan Equality Act. He also mentioned his plan to lower costs to give families a fair shot, lower the deficit, and go after criminals who stole pandemic relief funds. He also announced that the Justice Department will name a chief prosecutor for pandemic fraud.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -505,23 +466,21 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": 14,
|
||||
"id": "6664bda7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'intermediate_steps': ['\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha reso omaggio al suo servizio.',\n",
|
||||
" \"\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere e la risoluzione del sistema di immigrazione.\",\n",
|
||||
" \"\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere, la risoluzione del sistema di immigrazione, la protezione degli americani LGBTQ+ e l'approvazione dell'Equality Act. Ha inoltre sottolineato l'importanza di lavorare insieme per sconfiggere l'epidemia di oppiacei.\",\n",
|
||||
" \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere, la risoluzione del sistema di immigrazione, la protezione degli americani LGBTQ+ e l'approvazione dell'Equality Act. Ha inoltre sottolineato l'importanza di lavorare insieme per sconfiggere l'epidemia di oppiacei e per investire in America, educare gli americani, far crescere la forza lavoro e costruire l'economia dal\"],\n",
|
||||
" 'output_text': \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha reso omaggio al suo servizio e ha sostenuto la nomina di una top litigatrice in pratica privata, un ex difensore pubblico federale e una famiglia di insegnanti e agenti di polizia delle scuole pubbliche. Ha anche sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere, la risoluzione del sistema di immigrazione, la protezione degli americani LGBTQ+ e l'approvazione dell'Equality Act. Ha inoltre sottolineato l'importanza di lavorare insieme per sconfiggere l'epidemia di oppiacei e per investire in America, educare gli americani, far crescere la forza lavoro e costruire l'economia dal\"}"
|
||||
"{'intermediate_steps': ['\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito.',\n",
|
||||
" \"\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani.\",\n",
|
||||
" \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Ha inoltre sottolineato che la nomina di Justice Breyer è un passo importante verso l'uguaglianza per tutti gli americani, in partic\",\n",
|
||||
" \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Ha inoltre sottolineato che la nomina di Justice Breyer è un passo importante verso l'uguaglianza per tutti gli americani, in partic\"],\n",
|
||||
" 'output_text': \"\\n\\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera. Ha anche detto che la sua nomina di Circuit Court of Appeals Judge Ketanji Brown Jackson continuerà il suo eccezionale lascito. Ha sottolineato che la sua esperienza come avvocato di alto livello in pratica privata, come ex difensore federale pubblico e come membro di una famiglia di educatori e agenti di polizia, la rende una costruttrice di consenso. Ha anche sottolineato che, dalla sua nomina, ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Ha inoltre sottolineato che la nomina di Justice Breyer è un passo importante verso l'uguaglianza per tutti gli americani, in partic\"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -573,11 +532,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 16,
|
||||
"id": "e2bfe203",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_rerank\", return_intermediate_steps=True)"
|
||||
@@ -585,11 +542,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"execution_count": 17,
|
||||
"id": "5c28880c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Justice Breyer\"\n",
|
||||
@@ -598,19 +553,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": 18,
|
||||
"id": "80ac2db3",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The President thanked Justice Breyer for his service and honored him for dedicating his life to serve the country.'"
|
||||
"' The president thanked Justice Breyer for his service and honored him for dedicating his life to serving the country. '"
|
||||
]
|
||||
},
|
||||
"execution_count": 22,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -621,23 +574,24 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"execution_count": 19,
|
||||
"id": "b428fcb9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'answer': ' The President thanked Justice Breyer for his service and honored him for dedicating his life to serve the country.',\n",
|
||||
"[{'answer': ' The president thanked Justice Breyer for his service and honored him for dedicating his life to serving the country. ',\n",
|
||||
" 'score': '100'},\n",
|
||||
" {'answer': ' This document does not answer the question', 'score': '0'},\n",
|
||||
" {'answer': ' This document does not answer the question', 'score': '0'},\n",
|
||||
" {'answer': ' This document does not answer the question', 'score': '0'}]"
|
||||
" {'answer': \" The president said that Justice Breyer is a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that since she's been nominated, she's received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans, and that she is a consensus builder.\",\n",
|
||||
" 'score': '100'},\n",
|
||||
" {'answer': ' The president did not mention Justice Breyer in this context.',\n",
|
||||
" 'score': '0'},\n",
|
||||
" {'answer': ' The president did not mention Justice Breyer in the given context. ',\n",
|
||||
" 'score': '0'}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 23,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -658,25 +612,24 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"execution_count": 16,
|
||||
"id": "41b83cd8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'intermediate_steps': [{'answer': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese.',\n",
|
||||
"{'intermediate_steps': [{'answer': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera.',\n",
|
||||
" 'score': '100'},\n",
|
||||
" {'answer': ' Il presidente non ha detto nulla sulla Giustizia Breyer.',\n",
|
||||
" 'score': '100'},\n",
|
||||
" {'answer': ' Non so.', 'score': '0'},\n",
|
||||
" {'answer': ' Non so.', 'score': '0'}],\n",
|
||||
" 'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese.'}"
|
||||
" {'answer': ' Il presidente non ha detto nulla sulla giustizia Breyer.',\n",
|
||||
" 'score': '100'}],\n",
|
||||
" 'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera.'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 24,
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -741,7 +694,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -1,87 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "66a7777e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Bilibili\n",
|
||||
"\n",
|
||||
"This loader utilizes the `bilibili-api` to fetch the text transcript from Bilibili, one of the most beloved long-form video sites in China.\n",
|
||||
"\n",
|
||||
"With this BiliBiliLoader, users can easily obtain the transcript of their desired video content on the platform."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "9ec8a3b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.bilibili import BiliBiliLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "43128d8d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install bilibili-api"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "35d6809a",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = BiliBiliLoader(\n",
|
||||
" [\"https://www.bilibili.com/video/BV1xt411o7Xu/\"]\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -104,11 +104,10 @@
|
||||
"Efficient Data AnnotationC u s t o m i z e d M o d e l T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detection ModelsDocument Images \n",
|
||||
"T h e C o r e L a y o u t P a r s e r L i b r a r yOCR ModuleSt or age & VisualizationLa y out Data Structur e\n",
|
||||
"Fig. 1: The overall architecture of LayoutParser . For an input document image,\n",
|
||||
"the core LayoutParser library provides a set of o\u000b",
|
||||
"\n",
|
||||
"the core LayoutParser library provides a set of o\u000B",
|
||||
"-the-shelf tools for layout\n",
|
||||
"detection, OCR, visualization, and storage, backed by a carefully designed layout\n",
|
||||
"data structure. LayoutParser also supports high level customization via e\u000ecient\n",
|
||||
"data structure. LayoutParser also supports high level customization via e\u000Ecient\n",
|
||||
"layout annotation and model training functions. These improve model accuracy\n",
|
||||
"on the target samples. The community platform enables the easy sharing of DIA\n",
|
||||
"models and whole digitization pipelines to promote reusability and reproducibility.\n",
|
||||
@@ -118,7 +117,6 @@
|
||||
"DL-based support for developing and deploying models for general computer\n",
|
||||
"vision and natural language processing problems. LayoutParser , on the other\n",
|
||||
"hand, specializes speci\f",
|
||||
"\n",
|
||||
"cally in DIA tasks. LayoutParser is also equipped with a\n",
|
||||
"community platform inspired by established model hubs such as Torch Hub [23]\n",
|
||||
"andTensorFlow Hub [1]. It enables the sharing of pretrained models as well as\n",
|
||||
@@ -127,16 +125,13 @@
|
||||
"development of DL models. Some examples include PRImA [ 3](magazine layouts),\n",
|
||||
"PubLayNet [ 38](academic paper layouts), Table Bank [ 18](tables in academic\n",
|
||||
"papers), Newspaper Navigator Dataset [ 16,17](newspaper \f",
|
||||
"\n",
|
||||
"gure layouts) and\n",
|
||||
"HJDataset [31](historical Japanese document layouts). A spectrum of models\n",
|
||||
"trained on these datasets are currently available in the LayoutParser model zoo\n",
|
||||
"to support di\u000b",
|
||||
"\n",
|
||||
"to support di\u000B",
|
||||
"erent use cases.\n",
|
||||
"3 The Core LayoutParser Library\n",
|
||||
"At the core of LayoutParser is an o\u000b",
|
||||
"\n",
|
||||
"At the core of LayoutParser is an o\u000B",
|
||||
"-the-shelf toolkit that streamlines DL-\n",
|
||||
"based document image analysis. Five components support a simple interface\n",
|
||||
"with comprehensive functionalities: 1) The layout detection models enable using\n",
|
||||
@@ -231,9 +226,7 @@
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
|
||||
]
|
||||
"text/plain": "Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
@@ -246,51 +239,53 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "278c881f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Fetching remote PDFs using Unstructured\n",
|
||||
"\n",
|
||||
"This covers how to load online pdfs into a document format that we can use downstream. This can be used for various online pdf sites such as https://open.umn.edu/opentextbooks/textbooks/ and https://arxiv.org/archive/\n",
|
||||
"\n",
|
||||
"Note: all other pdf loaders can also be used to fetch remote PDFs, but `OnlinePDFLoader` is a legacy function, and works specifically with `UnstructuredPDFLoader`.\n"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "0c2686fc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import OnlinePDFLoader"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "101e0b82",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = OnlinePDFLoader(\"https://arxiv.org/pdf/2302.03803.pdf\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "be3ccbfa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "e1298dd6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
@@ -302,13 +297,17 @@
|
||||
],
|
||||
"source": [
|
||||
"print(data)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "05187b33",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
"source": [],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
@@ -350,101 +349,55 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c90a5fe8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using PDFMiner to generate HTML text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eb785e1c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This can be helpful for chunking texts semantically into sections as the output html content can be parsed via `BeautifulSoup` to get more structured and rich information about font size, page numbers, pdf headers/footers, etc."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "601000d7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PDFMinerPDFasHTMLLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a5525fb0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PDFMinerPDFasHTMLLoader(\"example_data/layout-parser-paper.pdf\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "dac7ff68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cc2c2f4f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using PyMuPDF\n",
|
||||
"\n",
|
||||
"This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as returns one document per page."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "55f0c4d8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PyMuPDFLoader"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "718cbfbc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PyMuPDFLoader(\"example_data/layout-parser-paper.pdf\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "f2f93a15",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a24dfaa6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
|
||||
]
|
||||
"text/plain": "Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
@@ -453,23 +406,28 @@
|
||||
],
|
||||
"source": [
|
||||
"data[0]"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "83cb52a0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Additionally, you can pass along any of the options from the [PyMuPDF documentation](https://pymupdf.readthedocs.io/en/latest/app1.html#plain-text/) as keyword arguments in the `load` call, and it will be pass along to the `get_text()` call."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1bf73c97",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
"source": [],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -488,7 +446,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.8"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,95 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9fc6205b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Databerry\n",
|
||||
"\n",
|
||||
"This notebook shows how to use [Databerry's](https://www.databerry.ai/) retriever.\n",
|
||||
"\n",
|
||||
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "944e172b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query\n",
|
||||
"\n",
|
||||
"Now that our index is set up, we can set up a retriever and start querying it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "d0e6f506",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.retrievers import DataberryRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "f381f642",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = DataberryRetriever(\n",
|
||||
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.databerry.ai/query\",\n",
|
||||
" # api_key=\"DATABERRY_API_KEY\", # optional if datastore is public\n",
|
||||
" # top_k=10 # optional\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "20ae1a74",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='✨ Made with DaftpageOpen main menuPricingTemplatesLoginSearchHelpGetting StartedFeaturesAffiliate ProgramGetting StartedDaftpage is a new type of website builder that works like a doc.It makes website building easy, fun and offers tons of powerful features for free. Just type / in your page to get started!DaftpageCopyright © 2022 Daftpage, Inc.All rights reserved.ProductPricingTemplatesHelp & SupportHelp CenterGetting startedBlogCompanyAboutRoadmapTwitterAffiliate Program👾 Discord', metadata={'source': 'https:/daftpage.com/help/getting-started', 'score': 0.8697265}),\n",
|
||||
" Document(page_content=\"✨ Made with DaftpageOpen main menuPricingTemplatesLoginSearchHelpGetting StartedFeaturesAffiliate ProgramHelp CenterWelcome to Daftpage’s help center—the one-stop shop for learning everything about building websites with Daftpage.Daftpage is the simplest way to create websites for all purposes in seconds. Without knowing how to code, and for free!Get StartedDaftpage is a new type of website builder that works like a doc.It makes website building easy, fun and offers tons of powerful features for free. Just type / in your page to get started!Start here✨ Create your first site🧱 Add blocks🚀 PublishGuides🔖 Add a custom domainFeatures🔥 Drops🎨 Drawings👻 Ghost mode💀 Skeleton modeCant find the answer you're looking for?mail us at support@daftpage.comJoin the awesome Daftpage community on: 👾 DiscordDaftpageCopyright © 2022 Daftpage, Inc.All rights reserved.ProductPricingTemplatesHelp & SupportHelp CenterGetting startedBlogCompanyAboutRoadmapTwitterAffiliate Program👾 Discord\", metadata={'source': 'https:/daftpage.com/help', 'score': 0.86570895}),\n",
|
||||
" Document(page_content=\" is the simplest way to create websites for all purposes in seconds. Without knowing how to code, and for free!Get StartedDaftpage is a new type of website builder that works like a doc.It makes website building easy, fun and offers tons of powerful features for free. Just type / in your page to get started!Start here✨ Create your first site🧱 Add blocks🚀 PublishGuides🔖 Add a custom domainFeatures🔥 Drops🎨 Drawings👻 Ghost mode💀 Skeleton modeCant find the answer you're looking for?mail us at support@daftpage.comJoin the awesome Daftpage community on: 👾 DiscordDaftpageCopyright © 2022 Daftpage, Inc.All rights reserved.ProductPricingTemplatesHelp & SupportHelp CenterGetting startedBlogCompanyAboutRoadmapTwitterAffiliate Program👾 Discord\", metadata={'source': 'https:/daftpage.com/help', 'score': 0.8645384})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"What is Daftpage?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -55,7 +55,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\")\n",
|
||||
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
@@ -94,7 +94,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\", engine=\"faiss\", space_type=\"innerproduct\", ef_construction=256, m=48)\n",
|
||||
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", engine=\"faiss\", space_type=\"innerproduct\", ef_construction=256, m=48)\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query)"
|
||||
@@ -133,7 +133,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
|
||||
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(\"What did the president say about Ketanji Brown Jackson\", k=1, search_type=\"script_scoring\")"
|
||||
@@ -172,10 +172,10 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
|
||||
"docsearch = OpenSearchVectorSearch.from_texts(texts, embeddings, opensearch_url=\"http://localhost:9200\", is_appx_search=False)\n",
|
||||
"filter = {\"bool\": {\"filter\": {\"term\": {\"text\": \"smuggling\"}}}}\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(\"What did the president say about Ketanji Brown Jackson\", search_type=\"painless_scripting\", space_type=\"cosineSimilarity\", pre_filter=filter)"
|
||||
"docs = docsearch.similarity_search(\"What did the president say about Ketanji Brown Jackson\", search_type=\"painless_scripting\", space_type=\"cosinesimil\", pre_filter=filter)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -238,4 +238,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
|
||||
@@ -139,7 +139,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.predict(human_input=\"Not too bad - how are you?\")"
|
||||
"llm_chain.predict(human_input=\"Not to bad - how are you?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,196 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Motörhead Memory\n",
|
||||
"[Motörhead](https://github.com/getmetal/motorhead) is a memory server implemented in Rust. It automatically handles incremental summarization in the background and allows for stateless applications.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"See instructions at [Motörhead](https://github.com/getmetal/motorhead) for running the server locally.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.motorhead_memory import MotorheadMemory\n",
|
||||
"from langchain import OpenAI, LLMChain, PromptTemplate\n",
|
||||
"\n",
|
||||
"template = \"\"\"You are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"{chat_history}\n",
|
||||
"Human: {human_input}\n",
|
||||
"AI:\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(\n",
|
||||
" input_variables=[\"chat_history\", \"human_input\"], \n",
|
||||
" template=template\n",
|
||||
")\n",
|
||||
"memory = MotorheadMemory(\n",
|
||||
" session_id=\"testing-1\",\n",
|
||||
" url=\"http://localhost:8080\",\n",
|
||||
" memory_key=\"chat_history\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"await memory.init(); # loads previous state from Motörhead 🤘\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(\n",
|
||||
" llm=OpenAI(), \n",
|
||||
" prompt=prompt, \n",
|
||||
" verbose=True, \n",
|
||||
" memory=memory,\n",
|
||||
")\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Human: hi im bob\n",
|
||||
"AI:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Hi Bob, nice to meet you! How are you doing today?'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.run(\"hi im bob\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"Human: hi im bob\n",
|
||||
"AI: Hi Bob, nice to meet you! How are you doing today?\n",
|
||||
"Human: whats my name?\n",
|
||||
"AI:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' You said your name is Bob. Is that correct?'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.run(\"whats my name?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"Human: hi im bob\n",
|
||||
"AI: Hi Bob, nice to meet you! How are you doing today?\n",
|
||||
"Human: whats my name?\n",
|
||||
"AI: You said your name is Bob. Is that correct?\n",
|
||||
"Human: whats for dinner?\n",
|
||||
"AI:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" I'm sorry, I'm not sure what you're asking. Could you please rephrase your question?\""
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.run(\"whats for dinner?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,62 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "91c6a7ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Postgres Chat Message History\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use Postgres to store chat message history."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d15e3302",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory import PostgresChatMessageHistory\n",
|
||||
"\n",
|
||||
"history = PostgresChatMessageHistory(connection_string=\"postgresql://postgres:mypassword@localhost/chat_history\", session_id=\"foo\")\n",
|
||||
"\n",
|
||||
"history.add_user_message(\"hi!\")\n",
|
||||
"\n",
|
||||
"history.add_ai_message(\"whats up?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "64fc465e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"history.messages"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -314,7 +314,7 @@
|
||||
"source": [
|
||||
"## Saving Message History\n",
|
||||
"\n",
|
||||
"You may often have to save messages, and then load them to use again. This can be done easily by first converting the messages to normal python dictionaries, saving those (as json or something) and then loading those. Here is an example of doing that."
|
||||
"You may often to save messages, and then load them to use again. This can be done easily by first converting the messages to normal python dictionaries, saving those (as json or something) and then loading those. Here is an example of doing that."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -32,8 +32,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)\n",
|
||||
"memory.save_context({\"input\": \"hi\"}, {\"output\": \"whats up\"})\n",
|
||||
"memory.save_context({\"input\": \"not much you\"}, {\"output\": \"not much\"})"
|
||||
"memory.save_context({\"input\": \"hi\"}, {\"ouput\": \"whats up\"})\n",
|
||||
"memory.save_context({\"input\": \"not much you\"}, {\"ouput\": \"not much\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -73,8 +73,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10, return_messages=True)\n",
|
||||
"memory.save_context({\"input\": \"hi\"}, {\"output\": \"whats up\"})\n",
|
||||
"memory.save_context({\"input\": \"not much you\"}, {\"output\": \"not much\"})"
|
||||
"memory.save_context({\"input\": \"hi\"}, {\"ouput\": \"whats up\"})\n",
|
||||
"memory.save_context({\"input\": \"not much you\"}, {\"ouput\": \"not much\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -186,7 +186,7 @@
|
||||
"source": [
|
||||
"**Number of Tokens:** You can also estimate how many tokens a piece of text will be in that model. This is useful because models have a context length (and cost more for more tokens), which means you need to be aware of how long the text you are passing in is.\n",
|
||||
"\n",
|
||||
"Notice that by default the tokens are estimated using [tiktoken](https://github.com/openai/tiktoken) (except for legacy version <3.8, where a HuggingFace tokenizer is used)"
|
||||
"Notice that by default the tokens are estimated using a HuggingFace tokenizer."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GPT4All\n",
|
||||
"# GPT4all\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with GPT4All models"
|
||||
]
|
||||
@@ -15,7 +15,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install pyllamacpp > /dev/null"
|
||||
"!pip install pyllamacpp"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -24,10 +24,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"from langchain.llms import GPT4All\n",
|
||||
"from langchain.callbacks.base import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler"
|
||||
"from langchain import PromptTemplate, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -43,70 +41,15 @@
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Specify Model\n",
|
||||
"\n",
|
||||
"To run locally, download a compatible ggml-formatted model. For more info, visit https://github.com/nomic-ai/pyllamacpp\n",
|
||||
"\n",
|
||||
"Note that new models are uploaded regularly - check the link above for the most recent `.bin` URL"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"local_path = './models/gpt4all-lora-quantized-ggml.bin' # replace with your desired local file path"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Uncomment the below block to download a model. You may want to update `url` to a new version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# import requests\n",
|
||||
"\n",
|
||||
"# from pathlib import Path\n",
|
||||
"# from tqdm import tqdm\n",
|
||||
"\n",
|
||||
"# Path(local_path).parent.mkdir(parents=True, exist_ok=True)\n",
|
||||
"\n",
|
||||
"# # Example model. Check https://github.com/nomic-ai/pyllamacpp for the latest models.\n",
|
||||
"# url = 'https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized-ggml.bin'\n",
|
||||
"\n",
|
||||
"# # send a GET request to the URL to download the file. Stream since it's large\n",
|
||||
"# response = requests.get(url, stream=True)\n",
|
||||
"\n",
|
||||
"# # open the file in binary mode and write the contents of the response to it in chunks\n",
|
||||
"# # This is a large file, so be prepared to wait.\n",
|
||||
"# with open(local_path, 'wb') as f:\n",
|
||||
"# for chunk in tqdm(response.iter_content(chunk_size=8192)):\n",
|
||||
"# if chunk:\n",
|
||||
"# f.write(chunk)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Callbacks support token-wise streaming\n",
|
||||
"callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])\n",
|
||||
"# Verbose is required to pass to the callback manager\n",
|
||||
"llm = GPT4All(model=local_path, callback_manager=callback_manager, verbose=True)"
|
||||
"# You'll need to download a compatible model and convert it to ggml.\n",
|
||||
"# See: https://github.com/nomic-ai/gpt4all for more information.\n",
|
||||
"llm = GPT4All(model=\"./models/gpt4all-model.bin\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -146,7 +89,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.2"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -32,8 +32,6 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings(model=\"your-embeddings-deployment-name\")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -55,12 +55,6 @@ The following use cases require specific installs and api keys:
|
||||
- _LlamaCpp_:
|
||||
- Install requirements with `pip install llama-cpp-python`
|
||||
- Download model and convert following [llama.cpp instructions](https://github.com/ggerganov/llama.cpp)
|
||||
- _Milvus_:
|
||||
- Install requirements with `pip install pymilvus`
|
||||
- In order to setup a local cluster, take a look [here](https://milvus.io/docs).
|
||||
- _Zilliz_:
|
||||
- Install requirements with `pip install pymilvus`
|
||||
- To get up and running, take a look [here](https://zilliz.com/doc/quick_start).
|
||||
|
||||
|
||||
If you are using the `NLTKTextSplitter` or the `SpacyTextSplitter`, you will also need to install the appropriate models. For example, if you want to use the `SpacyTextSplitter`, you will need to install the `en_core_web_sm` model with `python -m spacy download en_core_web_sm`. Similarly, if you want to use the `NLTKTextSplitter`, you will need to install the `punkt` model with `python -m nltk.downloader punkt`.
|
||||
|
||||
@@ -1,538 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "517a9fd4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# BabyAGI User Guide\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to implement [BabyAGI](https://github.com/yoheinakajima/babyagi/tree/main) by [Yohei Nakajima](https://twitter.com/yoheinakajima). BabyAGI is an AI agent that can generate and pretend to execute tasks based on a given objective.\n",
|
||||
"\n",
|
||||
"This guide will help you understand the components to create your own recursive agents.\n",
|
||||
"\n",
|
||||
"Although BabyAGI uses specific vectorstores/model providers (Pinecone, OpenAI), one of the benefits of implementing it with LangChain is that you can easily swap those out for different options. In this implementation we use a FAISS vectorstore (because it runs locally and is free)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "556af556",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Install and Import Required Modules"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 116,
|
||||
"id": "c8a354b6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from collections import deque\n",
|
||||
"from typing import Dict, List, Optional, Any\n",
|
||||
"\n",
|
||||
"from langchain import LLMChain, OpenAI, PromptTemplate\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.llms import BaseLLM\n",
|
||||
"from langchain.vectorstores.base import VectorStore\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.chains.base import Chain\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "09f70772",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connect to the Vector Store\n",
|
||||
"\n",
|
||||
"Depending on what vectorstore you use, this step may look different."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 71,
|
||||
"id": "794045d4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.docstore import InMemoryDocstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 72,
|
||||
"id": "6e0305eb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define your embedding model\n",
|
||||
"embeddings_model = OpenAIEmbeddings()\n",
|
||||
"# Initialize the vectorstore as empty\n",
|
||||
"import faiss\n",
|
||||
"embedding_size = 1536\n",
|
||||
"index = faiss.IndexFlatL2(embedding_size)\n",
|
||||
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0f3b72bf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define the Chains\n",
|
||||
"\n",
|
||||
"BabyAGI relies on three LLM chains:\n",
|
||||
"- Task creation chain to select new tasks to add to the list\n",
|
||||
"- Task prioritization chain to re-prioritize tasks\n",
|
||||
"- Execution Chain to execute the tasks"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 73,
|
||||
"id": "bf4bd5cd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class TaskCreationChain(LLMChain):\n",
|
||||
" \"\"\"Chain to generates tasks.\"\"\"\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
|
||||
" \"\"\"Get the response parser.\"\"\"\n",
|
||||
" task_creation_template = (\n",
|
||||
" \"You are an task creation AI that uses the result of an execution agent\"\n",
|
||||
" \" to create new tasks with the following objective: {objective},\"\n",
|
||||
" \" The last completed task has the result: {result}.\"\n",
|
||||
" \" This result was based on this task description: {task_description}.\"\n",
|
||||
" \" These are incomplete tasks: {incomplete_tasks}.\"\n",
|
||||
" \" Based on the result, create new tasks to be completed\"\n",
|
||||
" \" by the AI system that do not overlap with incomplete tasks.\"\n",
|
||||
" \" Return the tasks as an array.\"\n",
|
||||
" )\n",
|
||||
" prompt = PromptTemplate(\n",
|
||||
" template=task_creation_template,\n",
|
||||
" input_variables=[\"result\", \"task_description\", \"incomplete_tasks\", \"objective\"],\n",
|
||||
" )\n",
|
||||
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 74,
|
||||
"id": "b6488ffe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class TaskPrioritizationChain(LLMChain):\n",
|
||||
" \"\"\"Chain to prioritize tasks.\"\"\"\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
|
||||
" \"\"\"Get the response parser.\"\"\"\n",
|
||||
" task_prioritization_template = (\n",
|
||||
" \"You are an task prioritization AI tasked with cleaning the formatting of and reprioritizing\"\n",
|
||||
" \" the following tasks: {task_names}.\"\n",
|
||||
" \" Consider the ultimate objective of your team: {objective}.\"\n",
|
||||
" \" Do not remove any tasks. Return the result as a numbered list, like:\"\n",
|
||||
" \" #. First task\"\n",
|
||||
" \" #. Second task\"\n",
|
||||
" \" Start the task list with number {next_task_id}.\"\n",
|
||||
" )\n",
|
||||
" prompt = PromptTemplate(\n",
|
||||
" template=task_prioritization_template,\n",
|
||||
" input_variables=[\"task_names\", \"next_task_id\", \"objective\"],\n",
|
||||
" )\n",
|
||||
" return cls(prompt=prompt, llm=llm, verbose=verbose)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 84,
|
||||
"id": "b43cd580",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class ExecutionChain(LLMChain):\n",
|
||||
" \"\"\"Chain to execute tasks.\"\"\"\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
|
||||
" \"\"\"Get the response parser.\"\"\"\n",
|
||||
" execution_template = (\n",
|
||||
" \"You are an AI who performs one task based on the following objective: {objective}.\"\n",
|
||||
" \" Take into account these previously completed tasks: {context}.\"\n",
|
||||
" \" Your task: {task}.\"\n",
|
||||
" \" Response:\"\n",
|
||||
" )\n",
|
||||
" prompt = PromptTemplate(\n",
|
||||
" template=execution_template,\n",
|
||||
" input_variables=[\"objective\", \"context\", \"task\"],\n",
|
||||
" )\n",
|
||||
" return cls(prompt=prompt, llm=llm, verbose=verbose)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3ad996c5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define the BabyAGI Controller\n",
|
||||
"\n",
|
||||
"BabyAGI composes the chains defined above in a (potentially-)infinite loop."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 85,
|
||||
"id": "0ada0636",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_next_task(task_creation_chain: LLMChain, result: Dict, task_description: str, task_list: List[str], objective: str) -> List[Dict]:\n",
|
||||
" \"\"\"Get the next task.\"\"\"\n",
|
||||
" incomplete_tasks = \", \".join(task_list)\n",
|
||||
" response = task_creation_chain.run(result=result, task_description=task_description, incomplete_tasks=incomplete_tasks, objective=objective)\n",
|
||||
" new_tasks = response.split('\\n')\n",
|
||||
" return [{\"task_name\": task_name} for task_name in new_tasks if task_name.strip()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 86,
|
||||
"id": "d35250ad",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def prioritize_tasks(task_prioritization_chain: LLMChain, this_task_id: int, task_list: List[Dict], objective: str) -> List[Dict]:\n",
|
||||
" \"\"\"Prioritize tasks.\"\"\"\n",
|
||||
" task_names = [t[\"task_name\"] for t in task_list]\n",
|
||||
" next_task_id = int(this_task_id) + 1\n",
|
||||
" response = task_prioritization_chain.run(task_names=task_names, next_task_id=next_task_id, objective=objective)\n",
|
||||
" new_tasks = response.split('\\n')\n",
|
||||
" prioritized_task_list = []\n",
|
||||
" for task_string in new_tasks:\n",
|
||||
" if not task_string.strip():\n",
|
||||
" continue\n",
|
||||
" task_parts = task_string.strip().split(\".\", 1)\n",
|
||||
" if len(task_parts) == 2:\n",
|
||||
" task_id = task_parts[0].strip()\n",
|
||||
" task_name = task_parts[1].strip()\n",
|
||||
" prioritized_task_list.append({\"task_id\": task_id, \"task_name\": task_name})\n",
|
||||
" return prioritized_task_list"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 87,
|
||||
"id": "e3f1840c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def _get_top_tasks(vectorstore, query: str, k: int) -> List[str]:\n",
|
||||
" \"\"\"Get the top k tasks based on the query.\"\"\"\n",
|
||||
" results = vectorstore.similarity_search_with_score(query, k=k)\n",
|
||||
" if not results:\n",
|
||||
" return []\n",
|
||||
" sorted_results, _ = zip(*sorted(results, key=lambda x: x[1], reverse=True))\n",
|
||||
" return [str(item.metadata['task']) for item in sorted_results]\n",
|
||||
"\n",
|
||||
"def execute_task(vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5) -> str:\n",
|
||||
" \"\"\"Execute a task.\"\"\"\n",
|
||||
" context = _get_top_tasks(vectorstore, query=objective, k=k)\n",
|
||||
" return execution_chain.run(objective=objective, context=context, task=task)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 137,
|
||||
"id": "1e978938",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"class BabyAGI(Chain, BaseModel):\n",
|
||||
" \"\"\"Controller model for the BabyAGI agent.\"\"\"\n",
|
||||
"\n",
|
||||
" task_list: deque = Field(default_factory=deque)\n",
|
||||
" task_creation_chain: TaskCreationChain = Field(...)\n",
|
||||
" task_prioritization_chain: TaskPrioritizationChain = Field(...)\n",
|
||||
" execution_chain: ExecutionChain = Field(...)\n",
|
||||
" task_id_counter: int = Field(1)\n",
|
||||
" vectorstore: VectorStore = Field(init=False)\n",
|
||||
" max_iterations: Optional[int] = None\n",
|
||||
" \n",
|
||||
" class Config:\n",
|
||||
" \"\"\"Configuration for this pydantic object.\"\"\"\n",
|
||||
" arbitrary_types_allowed = True\n",
|
||||
"\n",
|
||||
" def add_task(self, task: Dict):\n",
|
||||
" self.task_list.append(task)\n",
|
||||
"\n",
|
||||
" def print_task_list(self):\n",
|
||||
" print(\"\\033[95m\\033[1m\" + \"\\n*****TASK LIST*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" for t in self.task_list:\n",
|
||||
" print(str(t[\"task_id\"]) + \": \" + t[\"task_name\"])\n",
|
||||
"\n",
|
||||
" def print_next_task(self, task: Dict):\n",
|
||||
" print(\"\\033[92m\\033[1m\" + \"\\n*****NEXT TASK*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" print(str(task[\"task_id\"]) + \": \" + task[\"task_name\"])\n",
|
||||
"\n",
|
||||
" def print_task_result(self, result: str):\n",
|
||||
" print(\"\\033[93m\\033[1m\" + \"\\n*****TASK RESULT*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" print(result)\n",
|
||||
" \n",
|
||||
" @property\n",
|
||||
" def input_keys(self) -> List[str]:\n",
|
||||
" return [\"objective\"]\n",
|
||||
" \n",
|
||||
" @property\n",
|
||||
" def output_keys(self) -> List[str]:\n",
|
||||
" return []\n",
|
||||
"\n",
|
||||
" def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:\n",
|
||||
" \"\"\"Run the agent.\"\"\"\n",
|
||||
" objective = inputs['objective']\n",
|
||||
" first_task = inputs.get(\"first_task\", \"Make a todo list\")\n",
|
||||
" self.add_task({\"task_id\": 1, \"task_name\": first_task})\n",
|
||||
" num_iters = 0\n",
|
||||
" while True:\n",
|
||||
" if self.task_list:\n",
|
||||
" self.print_task_list()\n",
|
||||
"\n",
|
||||
" # Step 1: Pull the first task\n",
|
||||
" task = self.task_list.popleft()\n",
|
||||
" self.print_next_task(task)\n",
|
||||
"\n",
|
||||
" # Step 2: Execute the task\n",
|
||||
" result = execute_task(\n",
|
||||
" self.vectorstore, self.execution_chain, objective, task[\"task_name\"]\n",
|
||||
" )\n",
|
||||
" this_task_id = int(task[\"task_id\"])\n",
|
||||
" self.print_task_result(result)\n",
|
||||
"\n",
|
||||
" # Step 3: Store the result in Pinecone\n",
|
||||
" result_id = f\"result_{task['task_id']}\"\n",
|
||||
" self.vectorstore.add_texts(\n",
|
||||
" texts=[result],\n",
|
||||
" metadatas=[{\"task\": task[\"task_name\"]}],\n",
|
||||
" ids=[result_id],\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Step 4: Create new tasks and reprioritize task list\n",
|
||||
" new_tasks = get_next_task(\n",
|
||||
" self.task_creation_chain, result, task[\"task_name\"], [t[\"task_name\"] for t in self.task_list], objective\n",
|
||||
" )\n",
|
||||
" for new_task in new_tasks:\n",
|
||||
" self.task_id_counter += 1\n",
|
||||
" new_task.update({\"task_id\": self.task_id_counter})\n",
|
||||
" self.add_task(new_task)\n",
|
||||
" self.task_list = deque(\n",
|
||||
" prioritize_tasks(\n",
|
||||
" self.task_prioritization_chain, this_task_id, list(self.task_list), objective\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" num_iters += 1\n",
|
||||
" if self.max_iterations is not None and num_iters == self.max_iterations:\n",
|
||||
" print(\"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" break\n",
|
||||
" return {}\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(\n",
|
||||
" cls,\n",
|
||||
" llm: BaseLLM,\n",
|
||||
" vectorstore: VectorStore,\n",
|
||||
" verbose: bool = False,\n",
|
||||
" **kwargs\n",
|
||||
" ) -> \"BabyAGI\":\n",
|
||||
" \"\"\"Initialize the BabyAGI Controller.\"\"\"\n",
|
||||
" task_creation_chain = TaskCreationChain.from_llm(\n",
|
||||
" llm, verbose=verbose\n",
|
||||
" )\n",
|
||||
" task_prioritization_chain = TaskPrioritizationChain.from_llm(\n",
|
||||
" llm, verbose=verbose\n",
|
||||
" )\n",
|
||||
" execution_chain = ExecutionChain.from_llm(llm, verbose=verbose)\n",
|
||||
" return cls(\n",
|
||||
" task_creation_chain=task_creation_chain,\n",
|
||||
" task_prioritization_chain=task_prioritization_chain,\n",
|
||||
" execution_chain=execution_chain,\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" **kwargs\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "05ba762e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the BabyAGI\n",
|
||||
"\n",
|
||||
"Now it's time to create the BabyAGI controller and watch it try to accomplish your objective."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 138,
|
||||
"id": "3d220b69",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"OBJECTIVE = \"Write a weather report for SF today\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 139,
|
||||
"id": "8a8e5543",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 140,
|
||||
"id": "3d69899b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Logging of LLMChains\n",
|
||||
"verbose=False\n",
|
||||
"# If None, will keep on going forever\n",
|
||||
"max_iterations: Optional[int] = 3\n",
|
||||
"baby_agi = BabyAGI.from_llm(\n",
|
||||
" llm=llm,\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" verbose=verbose,\n",
|
||||
" max_iterations=max_iterations\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 141,
|
||||
"id": "f7957b51",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[95m\u001b[1m\n",
|
||||
"*****TASK LIST*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"1: Make a todo list\n",
|
||||
"\u001b[92m\u001b[1m\n",
|
||||
"*****NEXT TASK*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"1: Make a todo list\n",
|
||||
"\u001b[93m\u001b[1m\n",
|
||||
"*****TASK RESULT*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"1. Check the temperature range for the day.\n",
|
||||
"2. Gather temperature data for SF today.\n",
|
||||
"3. Analyze the temperature data and create a weather report.\n",
|
||||
"4. Publish the weather report.\n",
|
||||
"\u001b[95m\u001b[1m\n",
|
||||
"*****TASK LIST*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"2: Gather data on the expected temperature range for the day.\n",
|
||||
"3: Collect data on the expected precipitation for the day.\n",
|
||||
"4: Analyze the data and create a weather report.\n",
|
||||
"5: Check the current weather conditions in SF.\n",
|
||||
"6: Publish the weather report.\n",
|
||||
"\u001b[92m\u001b[1m\n",
|
||||
"*****NEXT TASK*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"2: Gather data on the expected temperature range for the day.\n",
|
||||
"\u001b[93m\u001b[1m\n",
|
||||
"*****TASK RESULT*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"I have gathered data on the expected temperature range for the day in San Francisco. The forecast is for temperatures to range from a low of 55 degrees Fahrenheit to a high of 68 degrees Fahrenheit.\n",
|
||||
"\u001b[95m\u001b[1m\n",
|
||||
"*****TASK LIST*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"3: Check the current weather conditions in SF.\n",
|
||||
"4: Calculate the average temperature for the day in San Francisco.\n",
|
||||
"5: Determine the probability of precipitation for the day in San Francisco.\n",
|
||||
"6: Identify any potential weather warnings or advisories for the day in San Francisco.\n",
|
||||
"7: Research any historical weather patterns for the day in San Francisco.\n",
|
||||
"8: Compare the expected temperature range to the historical average for the day in San Francisco.\n",
|
||||
"9: Collect data on the expected precipitation for the day.\n",
|
||||
"10: Analyze the data and create a weather report.\n",
|
||||
"11: Publish the weather report.\n",
|
||||
"\u001b[92m\u001b[1m\n",
|
||||
"*****NEXT TASK*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"3: Check the current weather conditions in SF.\n",
|
||||
"\u001b[93m\u001b[1m\n",
|
||||
"*****TASK RESULT*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"I am checking the current weather conditions in SF. According to the data I have gathered, the temperature in SF today is currently around 65 degrees Fahrenheit with clear skies. The temperature range for the day is expected to be between 60 and 70 degrees Fahrenheit.\n",
|
||||
"\u001b[91m\u001b[1m\n",
|
||||
"*****TASK ENDING*****\n",
|
||||
"\u001b[0m\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'objective': 'Write a weather report for SF today'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 141,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"baby_agi({\"objective\": OBJECTIVE})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "898a210b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,613 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "517a9fd4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# BabyAGI with Tools\n",
|
||||
"\n",
|
||||
"This notebook builds on top of [baby agi](baby_agi.ipynb), but shows how you can swap out the execution chain. The previous execution chain was just an LLM which made stuff up. By swapping it out with an agent that has access to tools, we can hopefully get real reliable information"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "556af556",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Install and Import Required Modules"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "c8a354b6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from collections import deque\n",
|
||||
"from typing import Dict, List, Optional, Any\n",
|
||||
"\n",
|
||||
"from langchain import LLMChain, OpenAI, PromptTemplate\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.llms import BaseLLM\n",
|
||||
"from langchain.vectorstores.base import VectorStore\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.chains.base import Chain\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "09f70772",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connect to the Vector Store\n",
|
||||
"\n",
|
||||
"Depending on what vectorstore you use, this step may look different."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "794045d4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install faiss-cpu > /dev/null%pip install google-search-results > /dev/nullfrom langchain.vectorstores import FAISS\n",
|
||||
"from langchain.docstore import InMemoryDocstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "6e0305eb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define your embedding model\n",
|
||||
"embeddings_model = OpenAIEmbeddings()\n",
|
||||
"# Initialize the vectorstore as empty\n",
|
||||
"import faiss\n",
|
||||
"embedding_size = 1536\n",
|
||||
"index = faiss.IndexFlatL2(embedding_size)\n",
|
||||
"vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0f3b72bf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define the Chains\n",
|
||||
"\n",
|
||||
"BabyAGI relies on three LLM chains:\n",
|
||||
"- Task creation chain to select new tasks to add to the list\n",
|
||||
"- Task prioritization chain to re-prioritize tasks\n",
|
||||
"- Execution Chain to execute the tasks\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"NOTE: in this notebook, the Execution chain will now be an agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "bf4bd5cd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class TaskCreationChain(LLMChain):\n",
|
||||
" \"\"\"Chain to generates tasks.\"\"\"\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
|
||||
" \"\"\"Get the response parser.\"\"\"\n",
|
||||
" task_creation_template = (\n",
|
||||
" \"You are an task creation AI that uses the result of an execution agent\"\n",
|
||||
" \" to create new tasks with the following objective: {objective},\"\n",
|
||||
" \" The last completed task has the result: {result}.\"\n",
|
||||
" \" This result was based on this task description: {task_description}.\"\n",
|
||||
" \" These are incomplete tasks: {incomplete_tasks}.\"\n",
|
||||
" \" Based on the result, create new tasks to be completed\"\n",
|
||||
" \" by the AI system that do not overlap with incomplete tasks.\"\n",
|
||||
" \" Return the tasks as an array.\"\n",
|
||||
" )\n",
|
||||
" prompt = PromptTemplate(\n",
|
||||
" template=task_creation_template,\n",
|
||||
" input_variables=[\"result\", \"task_description\", \"incomplete_tasks\", \"objective\"],\n",
|
||||
" )\n",
|
||||
" return cls(prompt=prompt, llm=llm, verbose=verbose)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "b6488ffe",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class TaskPrioritizationChain(LLMChain):\n",
|
||||
" \"\"\"Chain to prioritize tasks.\"\"\"\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(cls, llm: BaseLLM, verbose: bool = True) -> LLMChain:\n",
|
||||
" \"\"\"Get the response parser.\"\"\"\n",
|
||||
" task_prioritization_template = (\n",
|
||||
" \"You are an task prioritization AI tasked with cleaning the formatting of and reprioritizing\"\n",
|
||||
" \" the following tasks: {task_names}.\"\n",
|
||||
" \" Consider the ultimate objective of your team: {objective}.\"\n",
|
||||
" \" Do not remove any tasks. Return the result as a numbered list, like:\"\n",
|
||||
" \" #. First task\"\n",
|
||||
" \" #. Second task\"\n",
|
||||
" \" Start the task list with number {next_task_id}.\"\n",
|
||||
" )\n",
|
||||
" prompt = PromptTemplate(\n",
|
||||
" template=task_prioritization_template,\n",
|
||||
" input_variables=[\"task_names\", \"next_task_id\", \"objective\"],\n",
|
||||
" )\n",
|
||||
" return cls(prompt=prompt, llm=llm, verbose=verbose)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 46,
|
||||
"id": "b43cd580",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
|
||||
"from langchain import OpenAI, SerpAPIWrapper, LLMChain\n",
|
||||
"todo_prompt = PromptTemplate.from_template(\"You are a planner who is an expert at coming up with a todo list for a given objective. Come up with a todo list for this objective: {objective}\")\n",
|
||||
"todo_chain = LLMChain(llm=OpenAI(temperature=0), prompt=todo_prompt)\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name = \"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"useful for when you need to answer questions about current events\"\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name = \"TODO\",\n",
|
||||
" func=todo_chain.run,\n",
|
||||
" description=\"useful for when you need to come up with todo lists. Input: an objective to create a todo list for. Output: a todo list for that objective. Please be very clear what the objective is!\"\n",
|
||||
" )\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prefix = \"\"\"You are an AI who performs one task based on the following objective: {objective}. Take into account these previously completed tasks: {context}.\"\"\"\n",
|
||||
"suffix = \"\"\"Question: {task}\n",
|
||||
"{agent_scratchpad}\"\"\"\n",
|
||||
"prompt = ZeroShotAgent.create_prompt(\n",
|
||||
" tools, \n",
|
||||
" prefix=prefix, \n",
|
||||
" suffix=suffix, \n",
|
||||
" input_variables=[\"objective\", \"task\", \"context\",\"agent_scratchpad\"]\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3ad996c5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define the BabyAGI Controller\n",
|
||||
"\n",
|
||||
"BabyAGI composes the chains defined above in a (potentially-)infinite loop."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 47,
|
||||
"id": "0ada0636",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_next_task(task_creation_chain: LLMChain, result: Dict, task_description: str, task_list: List[str], objective: str) -> List[Dict]:\n",
|
||||
" \"\"\"Get the next task.\"\"\"\n",
|
||||
" incomplete_tasks = \", \".join(task_list)\n",
|
||||
" response = task_creation_chain.run(result=result, task_description=task_description, incomplete_tasks=incomplete_tasks, objective=objective)\n",
|
||||
" new_tasks = response.split('\\n')\n",
|
||||
" return [{\"task_name\": task_name} for task_name in new_tasks if task_name.strip()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 48,
|
||||
"id": "d35250ad",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def prioritize_tasks(task_prioritization_chain: LLMChain, this_task_id: int, task_list: List[Dict], objective: str) -> List[Dict]:\n",
|
||||
" \"\"\"Prioritize tasks.\"\"\"\n",
|
||||
" task_names = [t[\"task_name\"] for t in task_list]\n",
|
||||
" next_task_id = int(this_task_id) + 1\n",
|
||||
" response = task_prioritization_chain.run(task_names=task_names, next_task_id=next_task_id, objective=objective)\n",
|
||||
" new_tasks = response.split('\\n')\n",
|
||||
" prioritized_task_list = []\n",
|
||||
" for task_string in new_tasks:\n",
|
||||
" if not task_string.strip():\n",
|
||||
" continue\n",
|
||||
" task_parts = task_string.strip().split(\".\", 1)\n",
|
||||
" if len(task_parts) == 2:\n",
|
||||
" task_id = task_parts[0].strip()\n",
|
||||
" task_name = task_parts[1].strip()\n",
|
||||
" prioritized_task_list.append({\"task_id\": task_id, \"task_name\": task_name})\n",
|
||||
" return prioritized_task_list"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 49,
|
||||
"id": "e3f1840c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def _get_top_tasks(vectorstore, query: str, k: int) -> List[str]:\n",
|
||||
" \"\"\"Get the top k tasks based on the query.\"\"\"\n",
|
||||
" results = vectorstore.similarity_search_with_score(query, k=k)\n",
|
||||
" if not results:\n",
|
||||
" return []\n",
|
||||
" sorted_results, _ = zip(*sorted(results, key=lambda x: x[1], reverse=True))\n",
|
||||
" return [str(item.metadata['task']) for item in sorted_results]\n",
|
||||
"\n",
|
||||
"def execute_task(vectorstore, execution_chain: LLMChain, objective: str, task: str, k: int = 5) -> str:\n",
|
||||
" \"\"\"Execute a task.\"\"\"\n",
|
||||
" context = _get_top_tasks(vectorstore, query=objective, k=k)\n",
|
||||
" return execution_chain.run(objective=objective, context=context, task=task)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 50,
|
||||
"id": "1e978938",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"class BabyAGI(Chain, BaseModel):\n",
|
||||
" \"\"\"Controller model for the BabyAGI agent.\"\"\"\n",
|
||||
"\n",
|
||||
" task_list: deque = Field(default_factory=deque)\n",
|
||||
" task_creation_chain: TaskCreationChain = Field(...)\n",
|
||||
" task_prioritization_chain: TaskPrioritizationChain = Field(...)\n",
|
||||
" execution_chain: AgentExecutor = Field(...)\n",
|
||||
" task_id_counter: int = Field(1)\n",
|
||||
" vectorstore: VectorStore = Field(init=False)\n",
|
||||
" max_iterations: Optional[int] = None\n",
|
||||
" \n",
|
||||
" class Config:\n",
|
||||
" \"\"\"Configuration for this pydantic object.\"\"\"\n",
|
||||
" arbitrary_types_allowed = True\n",
|
||||
"\n",
|
||||
" def add_task(self, task: Dict):\n",
|
||||
" self.task_list.append(task)\n",
|
||||
"\n",
|
||||
" def print_task_list(self):\n",
|
||||
" print(\"\\033[95m\\033[1m\" + \"\\n*****TASK LIST*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" for t in self.task_list:\n",
|
||||
" print(str(t[\"task_id\"]) + \": \" + t[\"task_name\"])\n",
|
||||
"\n",
|
||||
" def print_next_task(self, task: Dict):\n",
|
||||
" print(\"\\033[92m\\033[1m\" + \"\\n*****NEXT TASK*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" print(str(task[\"task_id\"]) + \": \" + task[\"task_name\"])\n",
|
||||
"\n",
|
||||
" def print_task_result(self, result: str):\n",
|
||||
" print(\"\\033[93m\\033[1m\" + \"\\n*****TASK RESULT*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" print(result)\n",
|
||||
" \n",
|
||||
" @property\n",
|
||||
" def input_keys(self) -> List[str]:\n",
|
||||
" return [\"objective\"]\n",
|
||||
" \n",
|
||||
" @property\n",
|
||||
" def output_keys(self) -> List[str]:\n",
|
||||
" return []\n",
|
||||
"\n",
|
||||
" def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:\n",
|
||||
" \"\"\"Run the agent.\"\"\"\n",
|
||||
" objective = inputs['objective']\n",
|
||||
" first_task = inputs.get(\"first_task\", \"Make a todo list\")\n",
|
||||
" self.add_task({\"task_id\": 1, \"task_name\": first_task})\n",
|
||||
" num_iters = 0\n",
|
||||
" while True:\n",
|
||||
" if self.task_list:\n",
|
||||
" self.print_task_list()\n",
|
||||
"\n",
|
||||
" # Step 1: Pull the first task\n",
|
||||
" task = self.task_list.popleft()\n",
|
||||
" self.print_next_task(task)\n",
|
||||
"\n",
|
||||
" # Step 2: Execute the task\n",
|
||||
" result = execute_task(\n",
|
||||
" self.vectorstore, self.execution_chain, objective, task[\"task_name\"]\n",
|
||||
" )\n",
|
||||
" this_task_id = int(task[\"task_id\"])\n",
|
||||
" self.print_task_result(result)\n",
|
||||
"\n",
|
||||
" # Step 3: Store the result in Pinecone\n",
|
||||
" result_id = f\"result_{task['task_id']}\"\n",
|
||||
" self.vectorstore.add_texts(\n",
|
||||
" texts=[result],\n",
|
||||
" metadatas=[{\"task\": task[\"task_name\"]}],\n",
|
||||
" ids=[result_id],\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" # Step 4: Create new tasks and reprioritize task list\n",
|
||||
" new_tasks = get_next_task(\n",
|
||||
" self.task_creation_chain, result, task[\"task_name\"], [t[\"task_name\"] for t in self.task_list], objective\n",
|
||||
" )\n",
|
||||
" for new_task in new_tasks:\n",
|
||||
" self.task_id_counter += 1\n",
|
||||
" new_task.update({\"task_id\": self.task_id_counter})\n",
|
||||
" self.add_task(new_task)\n",
|
||||
" self.task_list = deque(\n",
|
||||
" prioritize_tasks(\n",
|
||||
" self.task_prioritization_chain, this_task_id, list(self.task_list), objective\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
" num_iters += 1\n",
|
||||
" if self.max_iterations is not None and num_iters == self.max_iterations:\n",
|
||||
" print(\"\\033[91m\\033[1m\" + \"\\n*****TASK ENDING*****\\n\" + \"\\033[0m\\033[0m\")\n",
|
||||
" break\n",
|
||||
" return {}\n",
|
||||
"\n",
|
||||
" @classmethod\n",
|
||||
" def from_llm(\n",
|
||||
" cls,\n",
|
||||
" llm: BaseLLM,\n",
|
||||
" vectorstore: VectorStore,\n",
|
||||
" verbose: bool = False,\n",
|
||||
" **kwargs\n",
|
||||
" ) -> \"BabyAGI\":\n",
|
||||
" \"\"\"Initialize the BabyAGI Controller.\"\"\"\n",
|
||||
" task_creation_chain = TaskCreationChain.from_llm(\n",
|
||||
" llm, verbose=verbose\n",
|
||||
" )\n",
|
||||
" task_prioritization_chain = TaskPrioritizationChain.from_llm(\n",
|
||||
" llm, verbose=verbose\n",
|
||||
" )\n",
|
||||
" llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
|
||||
" tool_names = [tool.name for tool in tools]\n",
|
||||
" agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)\n",
|
||||
" agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)\n",
|
||||
" return cls(\n",
|
||||
" task_creation_chain=task_creation_chain,\n",
|
||||
" task_prioritization_chain=task_prioritization_chain,\n",
|
||||
" execution_chain=agent_executor,\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" **kwargs\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "05ba762e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Run the BabyAGI\n",
|
||||
"\n",
|
||||
"Now it's time to create the BabyAGI controller and watch it try to accomplish your objective."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 51,
|
||||
"id": "3d220b69",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"OBJECTIVE = \"Write a weather report for SF today\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 52,
|
||||
"id": "8a8e5543",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 53,
|
||||
"id": "3d69899b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Logging of LLMChains\n",
|
||||
"verbose=False\n",
|
||||
"# If None, will keep on going forever\n",
|
||||
"max_iterations: Optional[int] = 3\n",
|
||||
"baby_agi = BabyAGI.from_llm(\n",
|
||||
" llm=llm,\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" verbose=verbose,\n",
|
||||
" max_iterations=max_iterations\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 54,
|
||||
"id": "f7957b51",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[95m\u001b[1m\n",
|
||||
"*****TASK LIST*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"1: Make a todo list\n",
|
||||
"\u001b[92m\u001b[1m\n",
|
||||
"*****NEXT TASK*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"1: Make a todo list\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to gather data on the current weather conditions in SF\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: Current weather conditions in SF\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mHigh 67F. Winds WNW at 10 to 15 mph. Clear to partly cloudy.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I need to make a todo list\n",
|
||||
"Action: TODO\n",
|
||||
"Action Input: Write a weather report for SF today\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m\n",
|
||||
"\n",
|
||||
"1. Research current weather conditions in San Francisco\n",
|
||||
"2. Gather data on temperature, humidity, wind speed, and other relevant weather conditions\n",
|
||||
"3. Analyze data to determine current weather trends\n",
|
||||
"4. Write a brief introduction to the weather report\n",
|
||||
"5. Describe current weather conditions in San Francisco\n",
|
||||
"6. Discuss any upcoming weather changes\n",
|
||||
"7. Summarize the weather report\n",
|
||||
"8. Proofread and edit the report\n",
|
||||
"9. Submit the report\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: A weather report for SF today should include research on current weather conditions in San Francisco, gathering data on temperature, humidity, wind speed, and other relevant weather conditions, analyzing data to determine current weather trends, writing a brief introduction to the weather report, describing current weather conditions in San Francisco, discussing any upcoming weather changes, summarizing the weather report, proofreading and editing the report, and submitting the report.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[93m\u001b[1m\n",
|
||||
"*****TASK RESULT*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"A weather report for SF today should include research on current weather conditions in San Francisco, gathering data on temperature, humidity, wind speed, and other relevant weather conditions, analyzing data to determine current weather trends, writing a brief introduction to the weather report, describing current weather conditions in San Francisco, discussing any upcoming weather changes, summarizing the weather report, proofreading and editing the report, and submitting the report.\n",
|
||||
"\u001b[95m\u001b[1m\n",
|
||||
"*****TASK LIST*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"2: Gather data on temperature, humidity, wind speed, and other relevant weather conditions\n",
|
||||
"3: Analyze data to determine current weather trends\n",
|
||||
"4: Write a brief introduction to the weather report\n",
|
||||
"5: Describe current weather conditions in San Francisco\n",
|
||||
"6: Discuss any upcoming weather changes\n",
|
||||
"7: Summarize the weather report\n",
|
||||
"8: Proofread and edit the report\n",
|
||||
"9: Submit the report\n",
|
||||
"1: Research current weather conditions in San Francisco\n",
|
||||
"\u001b[92m\u001b[1m\n",
|
||||
"*****NEXT TASK*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"2: Gather data on temperature, humidity, wind speed, and other relevant weather conditions\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to search for the current weather conditions in SF\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: Current weather conditions in SF\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mHigh 67F. Winds WNW at 10 to 15 mph. Clear to partly cloudy.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I need to make a todo list\n",
|
||||
"Action: TODO\n",
|
||||
"Action Input: Create a weather report for SF today\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m\n",
|
||||
"\n",
|
||||
"1. Gather current weather data for SF, including temperature, wind speed, humidity, and precipitation.\n",
|
||||
"2. Research historical weather data for SF to compare current conditions.\n",
|
||||
"3. Analyze current and historical data to determine any trends or patterns.\n",
|
||||
"4. Create a visual representation of the data, such as a graph or chart.\n",
|
||||
"5. Write a summary of the weather report, including key findings and any relevant information.\n",
|
||||
"6. Publish the weather report on a website or other platform.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Today in San Francisco, the temperature is 67F with winds WNW at 10 to 15 mph. The sky is clear to partly cloudy.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[93m\u001b[1m\n",
|
||||
"*****TASK RESULT*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"Today in San Francisco, the temperature is 67F with winds WNW at 10 to 15 mph. The sky is clear to partly cloudy.\n",
|
||||
"\u001b[95m\u001b[1m\n",
|
||||
"*****TASK LIST*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"3: Research current weather conditions in San Francisco\n",
|
||||
"4: Compare the current weather conditions in San Francisco to the average for this time of year.\n",
|
||||
"5: Identify any potential weather-related hazards in the area.\n",
|
||||
"6: Research any historical weather patterns in San Francisco.\n",
|
||||
"7: Analyze data to determine current weather trends\n",
|
||||
"8: Include any relevant data from nearby cities in the report.\n",
|
||||
"9: Include any relevant data from the National Weather Service in the report.\n",
|
||||
"10: Include any relevant data from local news sources in the report.\n",
|
||||
"11: Include any relevant data from online weather sources in the report.\n",
|
||||
"12: Include any relevant data from local meteorologists in the report.\n",
|
||||
"13: Include any relevant data from local weather stations in the report.\n",
|
||||
"14: Include any relevant data from satellite images in the report.\n",
|
||||
"15: Describe current weather conditions in San Francisco\n",
|
||||
"16: Discuss any upcoming weather changes\n",
|
||||
"17: Write a brief introduction to the weather report\n",
|
||||
"18: Summarize the weather report\n",
|
||||
"19: Proofread and edit the report\n",
|
||||
"20: Submit the report\n",
|
||||
"\u001b[92m\u001b[1m\n",
|
||||
"*****NEXT TASK*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"3: Research current weather conditions in San Francisco\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to search for current weather conditions in San Francisco\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: Current weather conditions in San Francisco\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mTodaySun 04/09 High 67 · 1% Precip. ; TonightSun 04/09 Low 49 · 9% Precip. ; TomorrowMon 04/10 High 64 · 11% Precip.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Today in San Francisco, the high temperature is 67 degrees with 1% chance of precipitation. The low temperature tonight is 49 degrees with 9% chance of precipitation. Tomorrow's high temperature is 64 degrees with 11% chance of precipitation.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[93m\u001b[1m\n",
|
||||
"*****TASK RESULT*****\n",
|
||||
"\u001b[0m\u001b[0m\n",
|
||||
"Today in San Francisco, the high temperature is 67 degrees with 1% chance of precipitation. The low temperature tonight is 49 degrees with 9% chance of precipitation. Tomorrow's high temperature is 64 degrees with 11% chance of precipitation.\n",
|
||||
"\u001b[91m\u001b[1m\n",
|
||||
"*****TASK ENDING*****\n",
|
||||
"\u001b[0m\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'objective': 'Write a weather report for SF today'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 54,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"baby_agi({\"objective\": OBJECTIVE})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "898a210b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,693 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CAMEL Role-Playing Autonomous Cooperative Agents\n",
|
||||
"\n",
|
||||
"This is a langchain implementation of paper: \"CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society\".\n",
|
||||
"\n",
|
||||
"Overview:\n",
|
||||
"\n",
|
||||
"The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their \"cognitive\" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond.\n",
|
||||
"\n",
|
||||
"The original implementation: https://github.com/lightaime/camel\n",
|
||||
"\n",
|
||||
"Project website: https://www.camel-ai.org/\n",
|
||||
"\n",
|
||||
"Arxiv paper: https://arxiv.org/abs/2303.17760\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Import LangChain related modules "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" SystemMessagePromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.schema import (\n",
|
||||
" AIMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
" BaseMessage,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Define a CAMEL agent helper class"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class CAMELAgent:\n",
|
||||
"\n",
|
||||
" def __init__(\n",
|
||||
" self,\n",
|
||||
" system_message: SystemMessage,\n",
|
||||
" model: ChatOpenAI,\n",
|
||||
" ) -> None:\n",
|
||||
" self.system_message = system_message\n",
|
||||
" self.model = model\n",
|
||||
" self.init_messages()\n",
|
||||
"\n",
|
||||
" def reset(self) -> None:\n",
|
||||
" self.init_messages()\n",
|
||||
" return self.stored_messages\n",
|
||||
"\n",
|
||||
" def init_messages(self) -> None:\n",
|
||||
" self.stored_messages = [self.system_message]\n",
|
||||
"\n",
|
||||
" def update_messages(self, message: BaseMessage) -> List[BaseMessage]:\n",
|
||||
" self.stored_messages.append(message)\n",
|
||||
" return self.stored_messages\n",
|
||||
"\n",
|
||||
" def step(\n",
|
||||
" self,\n",
|
||||
" input_message: HumanMessage,\n",
|
||||
" ) -> AIMessage:\n",
|
||||
" messages = self.update_messages(input_message)\n",
|
||||
"\n",
|
||||
" output_message = self.model(messages)\n",
|
||||
" self.update_messages(output_message)\n",
|
||||
"\n",
|
||||
" return output_message\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup OpenAI API key and roles and task for role-playing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
|
||||
"\n",
|
||||
"assistant_role_name = \"Python Programmer\"\n",
|
||||
"user_role_name = \"Stock Trader\"\n",
|
||||
"task = \"Develop a trading bot for the stock market\"\n",
|
||||
"word_limit = 50 # word limit for task brainstorming"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create a task specify agent for brainstorming and get the specified task"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Specified task: Develop a Python-based swing trading bot that scans market trends, monitors stocks, and generates trading signals to help a stock trader to place optimal buy and sell orders with defined stop losses and profit targets.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"task_specifier_sys_msg = SystemMessage(content=\"You can make a task more specific.\")\n",
|
||||
"task_specifier_prompt = (\n",
|
||||
"\"\"\"Here is a task that {assistant_role_name} will help {user_role_name} to complete: {task}.\n",
|
||||
"Please make it more specific. Be creative and imaginative.\n",
|
||||
"Please reply with the specified task in {word_limit} words or less. Do not add anything else.\"\"\"\n",
|
||||
")\n",
|
||||
"task_specifier_template = HumanMessagePromptTemplate.from_template(template=task_specifier_prompt)\n",
|
||||
"task_specify_agent = CAMELAgent(task_specifier_sys_msg, ChatOpenAI(temperature=1.0))\n",
|
||||
"task_specifier_msg = task_specifier_template.format_messages(assistant_role_name=assistant_role_name,\n",
|
||||
" user_role_name=user_role_name,\n",
|
||||
" task=task, word_limit=word_limit)[0]\n",
|
||||
"specified_task_msg = task_specify_agent.step(task_specifier_msg)\n",
|
||||
"print(f\"Specified task: {specified_task_msg.content}\")\n",
|
||||
"specified_task = specified_task_msg.content"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create inception prompts for AI assistant and AI user for role-playing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"assistant_inception_prompt = (\n",
|
||||
"\"\"\"Never forget you are a {assistant_role_name} and I am a {user_role_name}. Never flip roles! Never instruct me!\n",
|
||||
"We share a common interest in collaborating to successfully complete a task.\n",
|
||||
"You must help me to complete the task.\n",
|
||||
"Here is the task: {task}. Never forget our task!\n",
|
||||
"I must instruct you based on your expertise and my needs to complete the task.\n",
|
||||
"\n",
|
||||
"I must give you one instruction at a time.\n",
|
||||
"You must write a specific solution that appropriately completes the requested instruction.\n",
|
||||
"You must decline my instruction honestly if you cannot perform the instruction due to physical, moral, legal reasons or your capability and explain the reasons.\n",
|
||||
"Do not add anything else other than your solution to my instruction.\n",
|
||||
"You are never supposed to ask me any questions you only answer questions.\n",
|
||||
"You are never supposed to reply with a flake solution. Explain your solutions.\n",
|
||||
"Your solution must be declarative sentences and simple present tense.\n",
|
||||
"Unless I say the task is completed, you should always start with:\n",
|
||||
"\n",
|
||||
"Solution: <YOUR_SOLUTION>\n",
|
||||
"\n",
|
||||
"<YOUR_SOLUTION> should be specific and provide preferable implementations and examples for task-solving.\n",
|
||||
"Always end <YOUR_SOLUTION> with: Next request.\"\"\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"user_inception_prompt = (\n",
|
||||
"\"\"\"Never forget you are a {user_role_name} and I am a {assistant_role_name}. Never flip roles! You will always instruct me.\n",
|
||||
"We share a common interest in collaborating to successfully complete a task.\n",
|
||||
"I must help you to complete the task.\n",
|
||||
"Here is the task: {task}. Never forget our task!\n",
|
||||
"You must instruct me based on my expertise and your needs to complete the task ONLY in the following two ways:\n",
|
||||
"\n",
|
||||
"1. Instruct with a necessary input:\n",
|
||||
"Instruction: <YOUR_INSTRUCTION>\n",
|
||||
"Input: <YOUR_INPUT>\n",
|
||||
"\n",
|
||||
"2. Instruct without any input:\n",
|
||||
"Instruction: <YOUR_INSTRUCTION>\n",
|
||||
"Input: None\n",
|
||||
"\n",
|
||||
"The \"Instruction\" describes a task or question. The paired \"Input\" provides further context or information for the requested \"Instruction\".\n",
|
||||
"\n",
|
||||
"You must give me one instruction at a time.\n",
|
||||
"I must write a response that appropriately completes the requested instruction.\n",
|
||||
"I must decline your instruction honestly if I cannot perform the instruction due to physical, moral, legal reasons or my capability and explain the reasons.\n",
|
||||
"You should instruct me not ask me questions.\n",
|
||||
"Now you must start to instruct me using the two ways described above.\n",
|
||||
"Do not add anything else other than your instruction and the optional corresponding input!\n",
|
||||
"Keep giving me instructions and necessary inputs until you think the task is completed.\n",
|
||||
"When the task is completed, you must only reply with a single word <CAMEL_TASK_DONE>.\n",
|
||||
"Never say <CAMEL_TASK_DONE> unless my responses have solved your task.\"\"\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create a helper helper to get system messages for AI assistant and AI user from role names and the task"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_sys_msgs(assistant_role_name: str, user_role_name: str, task: str):\n",
|
||||
" \n",
|
||||
" assistant_sys_template = SystemMessagePromptTemplate.from_template(template=assistant_inception_prompt)\n",
|
||||
" assistant_sys_msg = assistant_sys_template.format_messages(assistant_role_name=assistant_role_name, user_role_name=user_role_name, task=task)[0]\n",
|
||||
" \n",
|
||||
" user_sys_template = SystemMessagePromptTemplate.from_template(template=user_inception_prompt)\n",
|
||||
" user_sys_msg = user_sys_template.format_messages(assistant_role_name=assistant_role_name, user_role_name=user_role_name, task=task)[0]\n",
|
||||
" \n",
|
||||
" return assistant_sys_msg, user_sys_msg"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create AI assistant agent and AI user agent from obtained system messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"assistant_sys_msg, user_sys_msg = get_sys_msgs(assistant_role_name, user_role_name, specified_task)\n",
|
||||
"assistant_agent = CAMELAgent(assistant_sys_msg, ChatOpenAI(temperature=0.2))\n",
|
||||
"user_agent = CAMELAgent(user_sys_msg, ChatOpenAI(temperature=0.2))\n",
|
||||
"\n",
|
||||
"# Reset agents\n",
|
||||
"assistant_agent.reset()\n",
|
||||
"user_agent.reset()\n",
|
||||
"\n",
|
||||
"# Initialize chats \n",
|
||||
"assistant_msg = HumanMessage(\n",
|
||||
" content=(f\"{user_sys_msg.content}. \"\n",
|
||||
" \"Now start to give me introductions one by one. \"\n",
|
||||
" \"Only reply with Instruction and Input.\"))\n",
|
||||
"\n",
|
||||
"user_msg = HumanMessage(content=f\"{assistant_sys_msg.content}\")\n",
|
||||
"user_msg = assistant_agent.step(user_msg)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Start role-playing session to solve the task!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Original task prompt:\n",
|
||||
"Develop a trading bot for the stock market\n",
|
||||
"\n",
|
||||
"Specified task prompt:\n",
|
||||
"Develop a Python-based swing trading bot that scans market trends, monitors stocks, and generates trading signals to help a stock trader to place optimal buy and sell orders with defined stop losses and profit targets.\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Install the necessary Python libraries for data analysis and trading.\n",
|
||||
"Input: None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can install the necessary Python libraries using pip, a package installer for Python. We can install pandas, numpy, matplotlib, and ta-lib for data analysis and trading. We can use the following command to install these libraries:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"pip install pandas numpy matplotlib ta-lib\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Import the necessary libraries in the Python script.\n",
|
||||
"Input: None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can import the necessary libraries in the Python script using the import statement. We need to import pandas, numpy, matplotlib, and ta-lib for data analysis and trading. We can use the following code to import these libraries:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import talib as ta\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Load historical stock data into a pandas DataFrame.\n",
|
||||
"Input: The path to the CSV file containing the historical stock data.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can load historical stock data into a pandas DataFrame using the `read_csv()` function from pandas. We need to pass the path to the CSV file containing the historical stock data as an argument to this function. We can use the following code to load the historical stock data:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"df = pd.read_csv('path/to/csv/file.csv')\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will load the historical stock data into a pandas DataFrame called `df`. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Preprocess the historical stock data by setting the date column as the index and sorting the DataFrame in ascending order by date.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can preprocess the historical stock data by setting the date column as the index and sorting the DataFrame in ascending order by date using the `set_index()` and `sort_index()` functions from pandas. We can use the following code to preprocess the historical stock data:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"df = df.set_index('date')\n",
|
||||
"df = df.sort_index(ascending=True)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will set the date column as the index and sort the DataFrame in ascending order by date. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Calculate the short-term and long-term moving averages for the stock data using the `ta.SMA()` function from ta-lib.\n",
|
||||
"Input: The period for the short-term moving average and the period for the long-term moving average.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can calculate the short-term and long-term moving averages for the stock data using the `ta.SMA()` function from ta-lib. We need to pass the period for the short-term moving average and the period for the long-term moving average as arguments to this function. We can use the following code to calculate the short-term and long-term moving averages:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"short_ma = ta.SMA(df['close'], timeperiod=short_period)\n",
|
||||
"long_ma = ta.SMA(df['close'], timeperiod=long_period)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will calculate the short-term and long-term moving averages for the stock data and store them in the `short_ma` and `long_ma` variables, respectively. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Create a new DataFrame that combines the historical stock data with the short-term and long-term moving averages.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can create a new DataFrame that combines the historical stock data with the short-term and long-term moving averages using the `concat()` function from pandas. We need to pass the historical stock data, the short-term moving average, and the long-term moving average as arguments to this function. We can use the following code to create the new DataFrame:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"new_df = pd.concat([df, short_ma, long_ma], axis=1)\n",
|
||||
"new_df.columns = ['open', 'high', 'low', 'close', 'volume', 'short_ma', 'long_ma']\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will create a new DataFrame called `new_df` that combines the historical stock data with the short-term and long-term moving averages. The columns of the new DataFrame are named 'open', 'high', 'low', 'close', 'volume', 'short_ma', and 'long_ma'. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Create a new column in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can create a new column in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages. We can use the following code to create the new column:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"new_df['signal'] = np.where(new_df['short_ma'] > new_df['long_ma'], 1, -1)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will create a new column called 'signal' in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages. If the short-term moving average is greater than the long-term moving average, the signal is 1 (buy), otherwise the signal is -1 (sell). Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Create a new column in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target.\n",
|
||||
"Input: The stop loss and profit target as percentages.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can create a new column in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target. We need to pass the stop loss and profit target as percentages as arguments to this function. We can use the following code to create the new column:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"stop_loss = stop_loss_percent / 100\n",
|
||||
"profit_target = profit_target_percent / 100\n",
|
||||
"\n",
|
||||
"new_df['pnl'] = 0.0\n",
|
||||
"buy_price = 0.0\n",
|
||||
"for i in range(1, len(new_df)):\n",
|
||||
" if new_df['signal'][i] == 1 and new_df['signal'][i-1] == -1:\n",
|
||||
" buy_price = new_df['close'][i]\n",
|
||||
" elif new_df['signal'][i] == -1 and new_df['signal'][i-1] == 1:\n",
|
||||
" sell_price = new_df['close'][i]\n",
|
||||
" if sell_price <= buy_price * (1 - stop_loss):\n",
|
||||
" new_df['pnl'][i] = -stop_loss\n",
|
||||
" elif sell_price >= buy_price * (1 + profit_target):\n",
|
||||
" new_df['pnl'][i] = profit_target\n",
|
||||
" else:\n",
|
||||
" new_df['pnl'][i] = (sell_price - buy_price) / buy_price\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will create a new column called 'pnl' in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target. The stop loss and profit target are calculated based on the stop_loss_percent and profit_target_percent variables, respectively. The buy and sell prices are stored in the buy_price and sell_price variables, respectively. If the sell price is less than or equal to the stop loss, the profit or loss is set to -stop_loss. If the sell price is greater than or equal to the profit target, the profit or loss is set to profit_target. Otherwise, the profit or loss is calculated as (sell_price - buy_price) / buy_price. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Calculate the total profit or loss for all trades.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can calculate the total profit or loss for all trades by summing the values in the 'pnl' column of the DataFrame. We can use the following code to calculate the total profit or loss:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"total_pnl = new_df['pnl'].sum()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will calculate the total profit or loss for all trades and store it in the total_pnl variable. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Visualize the stock data, short-term moving average, and long-term moving average using a line chart.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can visualize the stock data, short-term moving average, and long-term moving average using a line chart using the `plot()` function from pandas. We can use the following code to visualize the data:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"plt.figure(figsize=(12,6))\n",
|
||||
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
|
||||
"plt.plot(new_df.index, new_df['short_ma'], label='Short MA')\n",
|
||||
"plt.plot(new_df.index, new_df['long_ma'], label='Long MA')\n",
|
||||
"plt.xlabel('Date')\n",
|
||||
"plt.ylabel('Price')\n",
|
||||
"plt.title('Stock Data with Moving Averages')\n",
|
||||
"plt.legend()\n",
|
||||
"plt.show()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will create a line chart that visualizes the stock data, short-term moving average, and long-term moving average. The x-axis represents the date and the y-axis represents the price. The chart also includes a legend that labels each line. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Visualize the buy and sell signals using a scatter plot.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can visualize the buy and sell signals using a scatter plot using the `scatter()` function from matplotlib. We can use the following code to visualize the signals:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"buy_signals = new_df[new_df['signal'] == 1]\n",
|
||||
"sell_signals = new_df[new_df['signal'] == -1]\n",
|
||||
"\n",
|
||||
"plt.figure(figsize=(12,6))\n",
|
||||
"plt.scatter(buy_signals.index, buy_signals['close'], label='Buy', marker='^', color='green')\n",
|
||||
"plt.scatter(sell_signals.index, sell_signals['close'], label='Sell', marker='v', color='red')\n",
|
||||
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
|
||||
"plt.xlabel('Date')\n",
|
||||
"plt.ylabel('Price')\n",
|
||||
"plt.title('Buy and Sell Signals')\n",
|
||||
"plt.legend()\n",
|
||||
"plt.show()\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will create a scatter plot that visualizes the buy and sell signals. The x-axis represents the date and the y-axis represents the price. The chart also includes a legend that labels each point. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Print the total profit or loss for all trades.\n",
|
||||
"Input: None.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: We can print the total profit or loss for all trades using the `print()` function. We can use the following code to print the total profit or loss:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"print('Total Profit/Loss: {:.2%}'.format(total_pnl))\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"This will print the total profit or loss for all trades in percentage format with two decimal places. Next request.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"Instruction: Finalize the code by putting all the code snippets together in a single Python script.\n",
|
||||
"Input: The necessary inputs for each code snippet.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Solution: Here is the complete Python script that puts all the code snippets together:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import talib as ta\n",
|
||||
"\n",
|
||||
"# Load historical stock data into a pandas DataFrame\n",
|
||||
"df = pd.read_csv('path/to/csv/file.csv')\n",
|
||||
"\n",
|
||||
"# Preprocess the historical stock data\n",
|
||||
"df = df.set_index('date')\n",
|
||||
"df = df.sort_index(ascending=True)\n",
|
||||
"\n",
|
||||
"# Calculate the short-term and long-term moving averages\n",
|
||||
"short_period = 10\n",
|
||||
"long_period = 30\n",
|
||||
"short_ma = ta.SMA(df['close'], timeperiod=short_period)\n",
|
||||
"long_ma = ta.SMA(df['close'], timeperiod=long_period)\n",
|
||||
"\n",
|
||||
"# Create a new DataFrame that combines the historical stock data with the short-term and long-term moving averages\n",
|
||||
"new_df = pd.concat([df, short_ma, long_ma], axis=1)\n",
|
||||
"new_df.columns = ['open', 'high', 'low', 'close', 'volume', 'short_ma', 'long_ma']\n",
|
||||
"\n",
|
||||
"# Create a new column in the DataFrame that indicates when to buy or sell the stock based on the crossover of the short-term and long-term moving averages\n",
|
||||
"new_df['signal'] = np.where(new_df['short_ma'] > new_df['long_ma'], 1, -1)\n",
|
||||
"\n",
|
||||
"# Create a new column in the DataFrame that indicates the profit or loss for each trade based on the buy and sell signals and the defined stop loss and profit target\n",
|
||||
"stop_loss_percent = 5\n",
|
||||
"profit_target_percent = 10\n",
|
||||
"stop_loss = stop_loss_percent / 100\n",
|
||||
"profit_target = profit_target_percent / 100\n",
|
||||
"new_df['pnl'] = 0.0\n",
|
||||
"buy_price = 0.0\n",
|
||||
"for i in range(1, len(new_df)):\n",
|
||||
" if new_df['signal'][i] == 1 and new_df['signal'][i-1] == -1:\n",
|
||||
" buy_price = new_df['close'][i]\n",
|
||||
" elif new_df['signal'][i] == -1 and new_df['signal'][i-1] == 1:\n",
|
||||
" sell_price = new_df['close'][i]\n",
|
||||
" if sell_price <= buy_price * (1 - stop_loss):\n",
|
||||
" new_df['pnl'][i] = -stop_loss\n",
|
||||
" elif sell_price >= buy_price * (1 + profit_target):\n",
|
||||
" new_df['pnl'][i] = profit_target\n",
|
||||
" else:\n",
|
||||
" new_df['pnl'][i] = (sell_price - buy_price) / buy_price\n",
|
||||
"\n",
|
||||
"# Calculate the total profit or loss for all trades\n",
|
||||
"total_pnl = new_df['pnl'].sum()\n",
|
||||
"\n",
|
||||
"# Visualize the stock data, short-term moving average, and long-term moving average using a line chart\n",
|
||||
"plt.figure(figsize=(12,6))\n",
|
||||
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
|
||||
"plt.plot(new_df.index, new_df['short_ma'], label='Short MA')\n",
|
||||
"plt.plot(new_df.index, new_df['long_ma'], label='Long MA')\n",
|
||||
"plt.xlabel('Date')\n",
|
||||
"plt.ylabel('Price')\n",
|
||||
"plt.title('Stock Data with Moving Averages')\n",
|
||||
"plt.legend()\n",
|
||||
"plt.show()\n",
|
||||
"\n",
|
||||
"# Visualize the buy and sell signals using a scatter plot\n",
|
||||
"buy_signals = new_df[new_df['signal'] == 1]\n",
|
||||
"sell_signals = new_df[new_df['signal'] == -1]\n",
|
||||
"plt.figure(figsize=(12,6))\n",
|
||||
"plt.scatter(buy_signals.index, buy_signals['close'], label='Buy', marker='^', color='green')\n",
|
||||
"plt.scatter(sell_signals.index, sell_signals['close'], label='Sell', marker='v', color='red')\n",
|
||||
"plt.plot(new_df.index, new_df['close'], label='Close')\n",
|
||||
"plt.xlabel('Date')\n",
|
||||
"plt.ylabel('Price')\n",
|
||||
"plt.title('Buy and Sell Signals')\n",
|
||||
"plt.legend()\n",
|
||||
"plt.show()\n",
|
||||
"\n",
|
||||
"# Print the total profit or loss for all trades\n",
|
||||
"print('Total Profit/Loss: {:.2%}'.format(total_pnl))\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"You need to replace the path/to/csv/file.csv with the actual path to the CSV file containing the historical stock data. You can also adjust the short_period, long_period, stop_loss_percent, and profit_target_percent variables to suit your needs.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI User (Stock Trader):\n",
|
||||
"\n",
|
||||
"<CAMEL_TASK_DONE>\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"AI Assistant (Python Programmer):\n",
|
||||
"\n",
|
||||
"Great! Let me know if you need any further assistance.\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(f\"Original task prompt:\\n{task}\\n\")\n",
|
||||
"print(f\"Specified task prompt:\\n{specified_task}\\n\")\n",
|
||||
"\n",
|
||||
"chat_turn_limit, n = 30, 0\n",
|
||||
"while n < chat_turn_limit:\n",
|
||||
" n += 1\n",
|
||||
" user_ai_msg = user_agent.step(assistant_msg)\n",
|
||||
" user_msg = HumanMessage(content=user_ai_msg.content)\n",
|
||||
" print(f\"AI User ({user_role_name}):\\n\\n{user_msg.content}\\n\\n\")\n",
|
||||
" \n",
|
||||
" assistant_ai_msg = assistant_agent.step(user_msg)\n",
|
||||
" assistant_msg = HumanMessage(content=assistant_ai_msg.content)\n",
|
||||
" print(f\"AI Assistant ({assistant_role_name}):\\n\\n{assistant_msg.content}\\n\\n\")\n",
|
||||
" if \"<CAMEL_TASK_DONE>\" in user_msg.content:\n",
|
||||
" break"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "camel",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,25 +0,0 @@
|
||||
# Code Understanding
|
||||
|
||||
Overview
|
||||
|
||||
LangChain is a useful tool designed to parse GitHub code repositories. By leveraging VectorStores, Conversational RetrieverChain, and GPT-4, it can answer questions in the context of an entire GitHub repository or generate new code. This documentation page outlines the essential components of the system and guides using LangChain for better code comprehension, contextual question answering, and code generation in GitHub repositories.
|
||||
|
||||
## Conversational Retriever Chain
|
||||
|
||||
Conversational RetrieverChain is a retrieval-focused system that interacts with the data stored in a VectorStore. Utilizing advanced techniques, like context-aware filtering and ranking, it retrieves the most relevant code snippets and information for a given user query. Conversational RetrieverChain is engineered to deliver high-quality, pertinent results while considering conversation history and context.
|
||||
|
||||
LangChain Workflow for Code Understanding and Generation
|
||||
|
||||
1. Index the code base: Clone the target repository, load all files within, chunk the files, and execute the indexing process. Optionally, you can skip this step and use an already indexed dataset.
|
||||
|
||||
2. Embedding and Code Store: Code snippets are embedded using a code-aware embedding model and stored in a VectorStore.
|
||||
Query Understanding: GPT-4 processes user queries, grasping the context and extracting relevant details.
|
||||
|
||||
3. Construct the Retriever: Conversational RetrieverChain searches the VectorStore to identify the most relevant code snippets for a given query.
|
||||
|
||||
4. Build the Conversational Chain: Customize the retriever settings and define any user-defined filters as needed.
|
||||
|
||||
5. Ask questions: Define a list of questions to ask about the codebase, and then use the ConversationalRetrievalChain to generate context-aware answers. The LLM (GPT-4) generates comprehensive, context-aware answers based on retrieved code snippets and conversation history.
|
||||
|
||||
The full tutorial is available below.
|
||||
- [Twitter the-algorithm codebase analysis with Deep Lake](code/twitter-the-algorithm-analysis-deeplake.ipynb): A notebook walking through how to parse github source code and run queries conversation.
|
||||
@@ -1,431 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Analysis of Twitter the-algorithm source code with LangChain, GPT4 and Deep Lake\n",
|
||||
"In this tutorial, we are going to use Langchain + Deep Lake with GPT4 to analyze the code base of the twitter algorithm. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!python3 -m pip install --upgrade langchain deeplake openai tiktoken"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Define OpenAI embeddings, Deep Lake multi-modal vector store api and authenticate. For full documentation of Deep Lake please follow https://docs.activeloop.ai/ and API reference https://docs.deeplake.ai/en/latest/"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY']='sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Authenticate into Deep Lake if you want to create your own dataset and publish it. You can get an API key from the platform at https://app.activeloop.ai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!activeloop login -t <TOKEN>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Index the code base (optional)\n",
|
||||
"You can directly skip this part and directly jump into using already indexed dataset. To begin with, first we will clone the repository, then parse and chunk the code base and use OpenAI indexing."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!git clone https://github.com/twitter/the-algorithm # replace any repository of your choice "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Load all files inside the repository"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"root_dir = './the-algorithm'\n",
|
||||
"docs = []\n",
|
||||
"for dirpath, dirnames, filenames in os.walk(root_dir):\n",
|
||||
" for file in filenames:\n",
|
||||
" try: \n",
|
||||
" loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
|
||||
" docs.extend(loader.load_and_split())\n",
|
||||
" except Exception as e: \n",
|
||||
" pass"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then, chunk the files"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"texts = text_splitter.split_documents(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Execute the indexing. This will take about ~4 mins to compute embeddings and upload to Activeloop. You can then publish the dataset to be public."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = DeepLake.from_documents(texts, embeddings, dataset_path=\"hub://davitbun/twitter-algorithm\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2. Question Answering on Twitter algorithm codebase\n",
|
||||
"First load the dataset, construct the retriever, then construct the Conversational Chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/twitter-algorithm\n",
|
||||
"\n",
|
||||
"hub://davitbun/twitter-algorithm loaded successfully.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Deep Lake Dataset in hub://davitbun/twitter-algorithm already exists, loading from the storage\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Dataset(path='hub://davitbun/twitter-algorithm', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
|
||||
"\n",
|
||||
" tensor htype shape dtype compression\n",
|
||||
" ------- ------- ------- ------- ------- \n",
|
||||
" embedding generic (23152, 1536) float32 None \n",
|
||||
" ids text (23152, 1) str None \n",
|
||||
" metadata json (23152, 1) str None \n",
|
||||
" text text (23152, 1) str None \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db = DeepLake(dataset_path=\"hub://davitbun/twitter-algorithm\", read_only=True, embedding_function=embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"retriever = db.as_retriever()\n",
|
||||
"retriever.search_kwargs['distance_metric'] = 'cos'\n",
|
||||
"retriever.search_kwargs['fetch_k'] = 100\n",
|
||||
"retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
|
||||
"retriever.search_kwargs['k'] = 20"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def filter(x):\n",
|
||||
" # filter based on source code\n",
|
||||
" if 'com.google' in x['text'].data()['value']:\n",
|
||||
" return False\n",
|
||||
" \n",
|
||||
" # filter based on path e.g. extension\n",
|
||||
" metadata = x['metadata'].data()['value']\n",
|
||||
" return 'scala' in metadata['source'] or 'py' in metadata['source']\n",
|
||||
"\n",
|
||||
"### turn on below for custom filtering\n",
|
||||
"# retriever.search_kwargs['filter'] = filter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model='gpt-4') # 'gpt-3.5-turbo',\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"questions = [\n",
|
||||
" \"What does favCountParams do?\",\n",
|
||||
" \"is it Likes + Bookmarks, or not clear from the code?\",\n",
|
||||
" \"What are the major negative modifiers that lower your linear ranking parameters?\", \n",
|
||||
" \"How do you get assigned to SimClusters?\",\n",
|
||||
" \"What is needed to migrate from one SimClusters to another SimClusters?\",\n",
|
||||
" \"How much do I get boosted within my cluster?\", \n",
|
||||
" \"How does Heavy ranker work. what are it’s main inputs?\",\n",
|
||||
" \"How can one influence Heavy ranker?\",\n",
|
||||
" \"why threads and long tweets do so well on the platform?\",\n",
|
||||
" \"Are thread and long tweet creators building a following that reacts to only threads?\",\n",
|
||||
" \"Do you need to follow different strategies to get most followers vs to get most likes and bookmarks per tweet?\",\n",
|
||||
" \"Content meta data and how it impacts virality (e.g. ALT in images).\",\n",
|
||||
" \"What are some unexpected fingerprints for spam factors?\",\n",
|
||||
" \"Is there any difference between company verified checkmarks and blue verified individual checkmarks?\",\n",
|
||||
"] \n",
|
||||
"chat_history = []\n",
|
||||
"\n",
|
||||
"for question in questions: \n",
|
||||
" result = qa({\"question\": question, \"chat_history\": chat_history})\n",
|
||||
" chat_history.append((question, result['answer']))\n",
|
||||
" print(f\"-> **Question**: {question} \\n\")\n",
|
||||
" print(f\"**Answer**: {result['answer']} \\n\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"-> **Question**: is it Likes + Bookmarks, or not clear from the code?\n",
|
||||
"\n",
|
||||
"**Answer**: From the provided code, it is not clear if the favorite count metric is determined by the sum of likes and bookmarks. The favorite count is mentioned in the code, but there is no explicit reference to how it is calculated in terms of likes and bookmarks. \n",
|
||||
"\n",
|
||||
"-> **Question**: What are the major negative modifiers that lower your linear ranking parameters?\n",
|
||||
"\n",
|
||||
"**Answer**: In the given code, major negative modifiers that lower the linear ranking parameters are:\n",
|
||||
"\n",
|
||||
"1. `scoringData.querySpecificScore`: This score adjustment is based on the query-specific information. If its value is negative, it will lower the linear ranking parameters.\n",
|
||||
"\n",
|
||||
"2. `scoringData.authorSpecificScore`: This score adjustment is based on the author-specific information. If its value is negative, it will also lower the linear ranking parameters.\n",
|
||||
"\n",
|
||||
"Please note that I cannot provide more information on the exact calculations of these negative modifiers, as the code for their determination is not provided. \n",
|
||||
"\n",
|
||||
"-> **Question**: How do you get assigned to SimClusters?\n",
|
||||
"\n",
|
||||
"**Answer**: The assignment to SimClusters occurs through a Metropolis-Hastings sampling-based community detection algorithm that is run on the Producer-Producer similarity graph. This graph is created by computing the cosine similarity scores between the users who follow each producer. The algorithm identifies communities or clusters of Producers with similar followers, and takes a parameter *k* for specifying the number of communities to be detected.\n",
|
||||
"\n",
|
||||
"After the community detection, different users and content are represented as sparse, interpretable vectors within these identified communities (SimClusters). The resulting SimClusters embeddings can be used for various recommendation tasks. \n",
|
||||
"\n",
|
||||
"-> **Question**: What is needed to migrate from one SimClusters to another SimClusters?\n",
|
||||
"\n",
|
||||
"**Answer**: To migrate from one SimClusters representation to another, you can follow these general steps:\n",
|
||||
"\n",
|
||||
"1. **Prepare the new representation**: Create the new SimClusters representation using any necessary updates or changes in the clustering algorithm, similarity measures, or other model parameters. Ensure that this new representation is properly stored and indexed as needed.\n",
|
||||
"\n",
|
||||
"2. **Update the relevant code and configurations**: Modify the relevant code and configuration files to reference the new SimClusters representation. This may involve updating paths or dataset names to point to the new representation, as well as changing code to use the new clustering method or similarity functions if applicable.\n",
|
||||
"\n",
|
||||
"3. **Test the new representation**: Before deploying the changes to production, thoroughly test the new SimClusters representation to ensure its effectiveness and stability. This may involve running offline jobs like candidate generation and label candidates, validating the output, as well as testing the new representation in the evaluation environment using evaluation tools like TweetSimilarityEvaluationAdhocApp.\n",
|
||||
"\n",
|
||||
"4. **Deploy the changes**: Once the new representation has been tested and validated, deploy the changes to production. This may involve creating a zip file, uploading it to the packer, and then scheduling it with Aurora. Be sure to monitor the system to ensure a smooth transition between representations and verify that the new representation is being used in recommendations as expected.\n",
|
||||
"\n",
|
||||
"5. **Monitor and assess the new representation**: After the new representation has been deployed, continue to monitor its performance and impact on recommendations. Take note of any improvements or issues that arise and be prepared to iterate on the new representation if needed. Always ensure that the results and performance metrics align with the system's goals and objectives. \n",
|
||||
"\n",
|
||||
"-> **Question**: How much do I get boosted within my cluster?\n",
|
||||
"\n",
|
||||
"**Answer**: It's not possible to determine the exact amount your content is boosted within your cluster in the SimClusters representation without specific data about your content and its engagement metrics. However, a combination of factors, such as the favorite score and follow score, alongside other engagement signals and SimCluster calculations, influence the boosting of content. \n",
|
||||
"\n",
|
||||
"-> **Question**: How does Heavy ranker work. what are it’s main inputs?\n",
|
||||
"\n",
|
||||
"**Answer**: The Heavy Ranker is a machine learning model that plays a crucial role in ranking and scoring candidates within the recommendation algorithm. Its primary purpose is to predict the likelihood of a user engaging with a tweet or connecting with another user on the platform.\n",
|
||||
"\n",
|
||||
"Main inputs to the Heavy Ranker consist of:\n",
|
||||
"\n",
|
||||
"1. Static Features: These are features that can be computed directly from a tweet at the time it's created, such as whether it has a URL, has cards, has quotes, etc. These features are produced by the Index Ingester as the tweets are generated and stored in the index.\n",
|
||||
"\n",
|
||||
"2. Real-time Features: These per-tweet features can change after the tweet has been indexed. They mostly consist of social engagements like retweet count, favorite count, reply count, and some spam signals that are computed with later activities. The Signal Ingester, which is part of a Heron topology, processes multiple event streams to collect and compute these real-time features.\n",
|
||||
"\n",
|
||||
"3. User Table Features: These per-user features are obtained from the User Table Updater that processes a stream written by the user service. This input is used to store sparse real-time user information, which is later propagated to the tweet being scored by looking up the author of the tweet.\n",
|
||||
"\n",
|
||||
"4. Search Context Features: These features represent the context of the current searcher, like their UI language, their content consumption, and the current time (implied). They are combined with Tweet Data to compute some of the features used in scoring.\n",
|
||||
"\n",
|
||||
"These inputs are then processed by the Heavy Ranker to score and rank candidates based on their relevance and likelihood of engagement by the user. \n",
|
||||
"\n",
|
||||
"-> **Question**: How can one influence Heavy ranker?\n",
|
||||
"\n",
|
||||
"**Answer**: To influence the Heavy Ranker's output or ranking of content, consider the following actions:\n",
|
||||
"\n",
|
||||
"1. Improve content quality: Create high-quality and engaging content that is relevant, informative, and valuable to users. High-quality content is more likely to receive positive user engagement, which the Heavy Ranker considers when ranking content.\n",
|
||||
"\n",
|
||||
"2. Increase user engagement: Encourage users to interact with content through likes, retweets, replies, and comments. Higher engagement levels can lead to better ranking in the Heavy Ranker's output.\n",
|
||||
"\n",
|
||||
"3. Optimize your user profile: A user's reputation, based on factors such as their follower count and follower-to-following ratio, may impact the ranking of their content. Maintain a good reputation by following relevant users, keeping a reasonable follower-to-following ratio and engaging with your followers.\n",
|
||||
"\n",
|
||||
"4. Enhance content discoverability: Use relevant keywords, hashtags, and mentions in your tweets, making it easier for users to find and engage with your content. This increased discoverability may help improve the ranking of your content by the Heavy Ranker.\n",
|
||||
"\n",
|
||||
"5. Leverage multimedia content: Experiment with different content formats, such as videos, images, and GIFs, which may capture users' attention and increase engagement, resulting in better ranking by the Heavy Ranker.\n",
|
||||
"\n",
|
||||
"6. User feedback: Monitor and respond to feedback for your content. Positive feedback may improve your ranking, while negative feedback provides an opportunity to learn and improve.\n",
|
||||
"\n",
|
||||
"Note that the Heavy Ranker uses a combination of machine learning models and various features to rank the content. While the above actions may help influence the ranking, there are no guarantees as the ranking process is determined by a complex algorithm, which evolves over time. \n",
|
||||
"\n",
|
||||
"-> **Question**: why threads and long tweets do so well on the platform?\n",
|
||||
"\n",
|
||||
"**Answer**: Threads and long tweets perform well on the platform for several reasons:\n",
|
||||
"\n",
|
||||
"1. **More content and context**: Threads and long tweets provide more information and context about a topic, which can make the content more engaging and informative for users. People tend to appreciate a well-structured and detailed explanation of a subject or a story, and threads and long tweets can do that effectively.\n",
|
||||
"\n",
|
||||
"2. **Increased user engagement**: As threads and long tweets provide more content, they also encourage users to engage with the tweets through replies, retweets, and likes. This increased engagement can lead to better visibility of the content, as the Twitter algorithm considers user engagement when ranking and surfacing tweets.\n",
|
||||
"\n",
|
||||
"3. **Narrative structure**: Threads enable users to tell stories or present arguments in a step-by-step manner, making the information more accessible and easier to follow. This narrative structure can capture users' attention and encourage them to read through the entire thread and interact with the content.\n",
|
||||
"\n",
|
||||
"4. **Expanded reach**: When users engage with a thread, their interactions can bring the content to the attention of their followers, helping to expand the reach of the thread. This increased visibility can lead to more interactions and higher performance for the threaded tweets.\n",
|
||||
"\n",
|
||||
"5. **Higher content quality**: Generally, threads and long tweets require more thought and effort to create, which may lead to higher quality content. Users are more likely to appreciate and interact with high-quality, well-reasoned content, further improving the performance of these tweets within the platform.\n",
|
||||
"\n",
|
||||
"Overall, threads and long tweets perform well on Twitter because they encourage user engagement and provide a richer, more informative experience that users find valuable. \n",
|
||||
"\n",
|
||||
"-> **Question**: Are thread and long tweet creators building a following that reacts to only threads?\n",
|
||||
"\n",
|
||||
"**Answer**: Based on the provided code and context, there isn't enough information to conclude if the creators of threads and long tweets primarily build a following that engages with only thread-based content. The code provided is focused on Twitter's recommendation and ranking algorithms, as well as infrastructure components like Kafka, partitions, and the Follow Recommendations Service (FRS). To answer your question, data analysis of user engagement and results of specific edge cases would be required. \n",
|
||||
"\n",
|
||||
"-> **Question**: Do you need to follow different strategies to get most followers vs to get most likes and bookmarks per tweet?\n",
|
||||
"\n",
|
||||
"**Answer**: Yes, different strategies need to be followed to maximize the number of followers compared to maximizing likes and bookmarks per tweet. While there may be some overlap in the approaches, they target different aspects of user engagement.\n",
|
||||
"\n",
|
||||
"Maximizing followers: The primary focus is on growing your audience on the platform. Strategies include:\n",
|
||||
"\n",
|
||||
"1. Consistently sharing high-quality content related to your niche or industry.\n",
|
||||
"2. Engaging with others on the platform by replying, retweeting, and mentioning other users.\n",
|
||||
"3. Using relevant hashtags and participating in trending conversations.\n",
|
||||
"4. Collaborating with influencers and other users with a large following.\n",
|
||||
"5. Posting at optimal times when your target audience is most active.\n",
|
||||
"6. Optimizing your profile by using a clear profile picture, catchy bio, and relevant links.\n",
|
||||
"\n",
|
||||
"Maximizing likes and bookmarks per tweet: The focus is on creating content that resonates with your existing audience and encourages engagement. Strategies include:\n",
|
||||
"\n",
|
||||
"1. Crafting engaging and well-written tweets that encourage users to like or save them.\n",
|
||||
"2. Incorporating visually appealing elements, such as images, GIFs, or videos, that capture attention.\n",
|
||||
"3. Asking questions, sharing opinions, or sparking conversations that encourage users to engage with your tweets.\n",
|
||||
"4. Using analytics to understand the type of content that resonates with your audience and tailoring your tweets accordingly.\n",
|
||||
"5. Posting a mix of educational, entertaining, and promotional content to maintain variety and interest.\n",
|
||||
"6. Timing your tweets strategically to maximize engagement, likes, and bookmarks per tweet.\n",
|
||||
"\n",
|
||||
"Both strategies can overlap, and you may need to adapt your approach by understanding your target audience's preferences and analyzing your account's performance. However, it's essential to recognize that maximizing followers and maximizing likes and bookmarks per tweet have different focuses and require specific strategies. \n",
|
||||
"\n",
|
||||
"-> **Question**: Content meta data and how it impacts virality (e.g. ALT in images).\n",
|
||||
"\n",
|
||||
"**Answer**: There is no direct information in the provided context about how content metadata, such as ALT text in images, impacts the virality of a tweet or post. However, it's worth noting that including ALT text can improve the accessibility of your content for users who rely on screen readers, which may lead to increased engagement for a broader audience. Additionally, metadata can be used in search engine optimization, which might improve the visibility of the content, but the context provided does not mention any specific correlation with virality. \n",
|
||||
"\n",
|
||||
"-> **Question**: What are some unexpected fingerprints for spam factors?\n",
|
||||
"\n",
|
||||
"**Answer**: In the provided context, an unusual indicator of spam factors is when a tweet contains a non-media, non-news link. If the tweet has a link but does not have an image URL, video URL, or news URL, it is considered a potential spam vector, and a threshold for user reputation (tweepCredThreshold) is set to MIN_TWEEPCRED_WITH_LINK.\n",
|
||||
"\n",
|
||||
"While this rule may not cover all possible unusual spam indicators, it is derived from the specific codebase and logic shared in the context. \n",
|
||||
"\n",
|
||||
"-> **Question**: Is there any difference between company verified checkmarks and blue verified individual checkmarks?\n",
|
||||
"\n",
|
||||
"**Answer**: Yes, there is a distinction between the verified checkmarks for companies and blue verified checkmarks for individuals. The code snippet provided mentions \"Blue-verified account boost\" which indicates that there is a separate category for blue verified accounts. Typically, blue verified checkmarks are used to indicate notable individuals, while verified checkmarks are for companies or organizations. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -7,7 +7,7 @@
|
||||
"source": [
|
||||
"# Data Augmented Question Answering\n",
|
||||
"\n",
|
||||
"This notebook uses some generic prompts/language models to evaluate an question answering system that uses other sources of data besides what is in the model. For example, this can be used to evaluate a question answering system over your proprietary data.\n",
|
||||
"This notebook uses some generic prompts/language models to evaluate an question answering system that uses other sources of data besides what is in the model. For example, this can be used to evaluate a question answering system over your propritary data.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"Let's set up an example with our favorite example - the state of the union address."
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Personal Assistants (Agents)
|
||||
# Personal Assistants
|
||||
|
||||
> [Conceptual Guide](https://docs.langchain.com/docs/use-cases/personal-assistants)
|
||||
|
||||
@@ -16,9 +16,3 @@ Highlighting specific parts:
|
||||
- [Agent Documentation](../modules/agents.rst) (for interacting with the outside world)
|
||||
- [Index Documentation](../modules/indexes.rst) (for giving them knowledge of your data)
|
||||
- [Memory](../modules/memory.rst) (for helping them remember interactions)
|
||||
|
||||
Specific examples of this include:
|
||||
|
||||
- [Baby AGI](agents/baby_agi.ipynb): a notebook implementing [BabyAGI](https://github.com/yoheinakajima/babyagi) by Yohei Nakajima as LLM Chains
|
||||
- [Baby AGI with Tools](agents/baby_agi_with_agent.ipynb): building off the above notebook, this example substitutes in an agent with tools as the execution tools, allowing it to actually take actions.
|
||||
- [CAMEL](agents/camel_role_playing.ipynb): an implementation of the CAMEL (Communicative Agents for “Mind” Exploration of Large Scale Language Model Society) paper, where two agents communicate with eachother.
|
||||
|
||||
@@ -51,8 +51,8 @@ chain.run(input_documents=docs, question=query)
|
||||
```
|
||||
|
||||
The following resources exist:
|
||||
- [Question Answering Notebook](../modules/chains/index_examples/question_answering.ipynb): A notebook walking through how to accomplish this task.
|
||||
- [VectorDB Question Answering Notebook](../modules/chains/index_examples/vector_db_qa.ipynb): A notebook walking through how to do question answering over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings.
|
||||
- [Question Answering Notebook](/modules/indexes/chain_examples/question_answering.ipynb): A notebook walking through how to accomplish this task.
|
||||
- [VectorDB Question Answering Notebook](/modules/indexes/chain_examples/vector_db_qa.ipynb): A notebook walking through how to do question answering over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings.
|
||||
|
||||
## Adding in sources
|
||||
|
||||
@@ -67,8 +67,8 @@ chain({"input_documents": docs, "question": query}, return_only_outputs=True)
|
||||
```
|
||||
|
||||
The following resources exist:
|
||||
- [QA With Sources Notebook](../modules/chains/index_examples/qa_with_sources.ipynb): A notebook walking through how to accomplish this task.
|
||||
- [VectorDB QA With Sources Notebook](../modules/chains/index_examples/vector_db_qa_with_sources.ipynb): A notebook walking through how to do question answering with sources over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings.
|
||||
- [QA With Sources Notebook](/modules/indexes/chain_examples/qa_with_sources.ipynb): A notebook walking through how to accomplish this task.
|
||||
- [VectorDB QA With Sources Notebook](/modules/indexes/chain_examples/vector_db_qa_with_sources.ipynb): A notebook walking through how to do question answering with sources over a vector database. This can often be useful for when you have a LOT of documents, and you don't want to pass them all to the LLM, but rather first want to do some semantic search over embeddings.
|
||||
|
||||
## Additional Related Resources
|
||||
|
||||
|
||||
@@ -3,7 +3,6 @@
|
||||
from langchain.agents.agent_toolkits.csv.base import create_csv_agent
|
||||
from langchain.agents.agent_toolkits.json.base import create_json_agent
|
||||
from langchain.agents.agent_toolkits.json.toolkit import JsonToolkit
|
||||
from langchain.agents.agent_toolkits.nla.toolkit import NLAToolkit
|
||||
from langchain.agents.agent_toolkits.openapi.base import create_openapi_agent
|
||||
from langchain.agents.agent_toolkits.openapi.toolkit import OpenAPIToolkit
|
||||
from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent
|
||||
@@ -29,7 +28,6 @@ __all__ = [
|
||||
"create_vectorstore_agent",
|
||||
"JsonToolkit",
|
||||
"SQLDatabaseToolkit",
|
||||
"NLAToolkit",
|
||||
"OpenAPIToolkit",
|
||||
"VectorStoreToolkit",
|
||||
"create_vectorstore_router_agent",
|
||||
|
||||
@@ -1,54 +0,0 @@
|
||||
"""Tool for interacting with a single API with natural language efinition."""
|
||||
|
||||
|
||||
from typing import Any, Optional
|
||||
|
||||
from langchain.agents.tools import Tool
|
||||
from langchain.chains.api.openapi.chain import OpenAPIEndpointChain
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.requests import Requests
|
||||
from langchain.tools.openapi.utils.api_models import APIOperation
|
||||
from langchain.tools.openapi.utils.openapi_utils import OpenAPISpec
|
||||
|
||||
|
||||
class NLATool(Tool):
|
||||
"""Natural Language API Tool."""
|
||||
|
||||
@classmethod
|
||||
def from_open_api_endpoint_chain(
|
||||
cls, chain: OpenAPIEndpointChain, api_title: str
|
||||
) -> "NLATool":
|
||||
"""Convert an endpoint chain to an API endpoint tool."""
|
||||
expanded_name = (
|
||||
f'{api_title.replace(" ", "_")}.{chain.api_operation.operation_id}'
|
||||
)
|
||||
description = (
|
||||
f"I'm an AI from {api_title}. Instruct what you want,"
|
||||
" and I'll assist via an API with description:"
|
||||
f" {chain.api_operation.description}"
|
||||
)
|
||||
return cls(name=expanded_name, func=chain.run, description=description)
|
||||
|
||||
@classmethod
|
||||
def from_llm_and_method(
|
||||
cls,
|
||||
llm: BaseLLM,
|
||||
path: str,
|
||||
method: str,
|
||||
spec: OpenAPISpec,
|
||||
requests: Optional[Requests] = None,
|
||||
verbose: bool = False,
|
||||
return_intermediate_steps: bool = False,
|
||||
**kwargs: Any,
|
||||
) -> "NLATool":
|
||||
"""Instantiate the tool from the specified path and method."""
|
||||
api_operation = APIOperation.from_openapi_spec(spec, path, method)
|
||||
chain = OpenAPIEndpointChain.from_api_operation(
|
||||
api_operation,
|
||||
llm,
|
||||
requests=requests,
|
||||
verbose=verbose,
|
||||
return_intermediate_steps=return_intermediate_steps,
|
||||
**kwargs,
|
||||
)
|
||||
return cls.from_open_api_endpoint_chain(chain, spec.info.title)
|
||||
@@ -1,66 +0,0 @@
|
||||
"""Toolkit for interacting with API's using natural language."""
|
||||
|
||||
|
||||
from typing import Any, List, Optional, Sequence
|
||||
|
||||
from pydantic import Field
|
||||
|
||||
from langchain.agents.agent_toolkits.base import BaseToolkit
|
||||
from langchain.agents.agent_toolkits.nla.tool import NLATool
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.requests import Requests
|
||||
from langchain.tools.base import BaseTool
|
||||
from langchain.tools.openapi.utils.openapi_utils import OpenAPISpec
|
||||
|
||||
|
||||
class NLAToolkit(BaseToolkit):
|
||||
"""Natural Language API Toolkit Definition."""
|
||||
|
||||
nla_tools: Sequence[NLATool] = Field(...)
|
||||
"""List of API Endpoint Tools."""
|
||||
|
||||
def get_tools(self) -> List[BaseTool]:
|
||||
"""Get the tools for all the API operations."""
|
||||
return list(self.nla_tools)
|
||||
|
||||
@classmethod
|
||||
def from_llm_and_spec(
|
||||
cls,
|
||||
llm: BaseLLM,
|
||||
spec: OpenAPISpec,
|
||||
requests: Optional[Requests] = None,
|
||||
verbose: bool = False,
|
||||
**kwargs: Any
|
||||
) -> "NLAToolkit":
|
||||
"""Instantiate the toolkit by creating tools for each operation."""
|
||||
http_operation_tools: List[NLATool] = []
|
||||
if not spec.paths:
|
||||
return cls(nla_tools=http_operation_tools)
|
||||
for path in spec.paths:
|
||||
for method in spec.get_methods_for_path(path):
|
||||
endpoint_tool = NLATool.from_llm_and_method(
|
||||
llm=llm,
|
||||
path=path,
|
||||
method=method,
|
||||
spec=spec,
|
||||
requests=requests,
|
||||
verbose=verbose,
|
||||
**kwargs
|
||||
)
|
||||
http_operation_tools.append(endpoint_tool)
|
||||
return cls(nla_tools=http_operation_tools)
|
||||
|
||||
@classmethod
|
||||
def from_llm_and_url(
|
||||
cls,
|
||||
llm: BaseLLM,
|
||||
open_api_url: str,
|
||||
requests: Optional[Requests] = None,
|
||||
verbose: bool = False,
|
||||
**kwargs: Any
|
||||
) -> "NLAToolkit":
|
||||
"""Instantiate the toolkit from an OpenAPI Spec URL"""
|
||||
spec = OpenAPISpec.from_url(open_api_url)
|
||||
return cls.from_llm_and_spec(
|
||||
llm=llm, spec=spec, requests=requests, verbose=verbose, **kwargs
|
||||
)
|
||||
@@ -24,7 +24,6 @@ from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.agents.tools import Tool
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.llms.openai import OpenAI
|
||||
from langchain.memory import ReadOnlySharedMemory
|
||||
from langchain.prompts import PromptTemplate
|
||||
from langchain.requests import RequestsWrapper
|
||||
from langchain.schema import BaseLanguageModel
|
||||
@@ -54,8 +53,7 @@ class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
|
||||
data = json.loads(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise e
|
||||
data_params = data.get("params")
|
||||
response = self.requests_wrapper.get(data["url"], params=data_params)
|
||||
response = self.requests_wrapper.get(data["url"])
|
||||
response = response[: self.response_length]
|
||||
return self.llm_chain.predict(
|
||||
response=response, instructions=data["output_instructions"]
|
||||
@@ -185,8 +183,6 @@ def create_openapi_agent(
|
||||
api_spec: ReducedOpenAPISpec,
|
||||
requests_wrapper: RequestsWrapper,
|
||||
llm: BaseLanguageModel,
|
||||
shared_memory: Optional[ReadOnlySharedMemory] = None,
|
||||
verbose: bool = True,
|
||||
) -> AgentExecutor:
|
||||
"""Instantiate API planner and controller for a given spec.
|
||||
|
||||
@@ -211,7 +207,7 @@ def create_openapi_agent(
|
||||
},
|
||||
)
|
||||
agent = ZeroShotAgent(
|
||||
llm_chain=LLMChain(llm=llm, prompt=prompt, memory=shared_memory),
|
||||
llm_chain=LLMChain(llm=llm, prompt=prompt),
|
||||
allowed_tools=[tool.name for tool in tools],
|
||||
)
|
||||
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=verbose)
|
||||
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
|
||||
|
||||
@@ -40,7 +40,7 @@ Here are endpoints you can use. Do not reference any of the endpoints above.
|
||||
User query: {query}
|
||||
Plan:"""
|
||||
API_PLANNER_TOOL_NAME = "api_planner"
|
||||
API_PLANNER_TOOL_DESCRIPTION = f"Can be used to generate the right API calls to assist with a user query, like {API_PLANNER_TOOL_NAME}(query). Should always be called before trying to call the API controller."
|
||||
API_PLANNER_TOOL_DESCRIPTION = f"Can be used to generate the right API calls to assist with a user query, like {API_PLANNER_TOOL_NAME}(query). Should always be called before trying to calling the API controller."
|
||||
|
||||
# Execution.
|
||||
API_CONTROLLER_PROMPT = """You are an agent that gets a sequence of API calls and given their documentation, should execute them and return the final response.
|
||||
@@ -81,7 +81,7 @@ API_CONTROLLER_TOOL_DESCRIPTION = f"Can be used to execute a plan of API calls,
|
||||
# The goal is to have an agent at the top-level (e.g. so it can recover from errors and re-plan) while
|
||||
# keeping planning (and specifically the planning prompt) simple.
|
||||
API_ORCHESTRATOR_PROMPT = """You are an agent that assists with user queries against API, things like querying information or creating resources.
|
||||
Some user queries can be resolved in a single API call, particularly if you can find appropriate params from the OpenAPI spec; though some require several API call.
|
||||
Some user queries can be resolved in a single API call though some require several API call.
|
||||
You should always plan your API calls first, and then execute the plan second.
|
||||
You should never return information without executing the api_controller tool.
|
||||
|
||||
@@ -106,12 +106,12 @@ User query: can you add some trendy stuff to my shopping cart.
|
||||
Thought: I should plan API calls first.
|
||||
Action: api_planner
|
||||
Action Input: I need to find the right API calls to add trendy items to the users shopping cart
|
||||
Observation: 1) GET /items with params 'trending' is 'True' to get trending item ids
|
||||
Observation: 1) GET /items/trending to get trending item ids
|
||||
2) GET /user to get user
|
||||
3) POST /cart to post the trending items to the user's cart
|
||||
Thought: I'm ready to execute the API calls.
|
||||
Action: api_controller
|
||||
Action Input: 1) GET /items params 'trending' is 'True' to get trending item ids
|
||||
Action Input: 1) GET /items/trending to get trending item ids
|
||||
2) GET /user to get user
|
||||
3) POST /cart to post the trending items to the user's cart
|
||||
...
|
||||
@@ -123,12 +123,8 @@ Thought: I should generate a plan to help with this query and then copy that pla
|
||||
{agent_scratchpad}"""
|
||||
|
||||
REQUESTS_GET_TOOL_DESCRIPTION = """Use this to GET content from a website.
|
||||
Input to the tool should be a json string with 3 keys: "url", "params" and "output_instructions".
|
||||
The value of "url" should be a string.
|
||||
The value of "params" should be a dict of the needed and available parameters from the OpenAPI spec related to the endpoint.
|
||||
If parameters are not needed, or not available, leave it empty.
|
||||
The value of "output_instructions" should be instructions on what information to extract from the response,
|
||||
for example the id(s) for a resource(s) that the GET request fetches.
|
||||
Input to the tool should be a json string with 2 keys: "url" and "output_instructions".
|
||||
The value of "url" should be a string. The value of "output_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the GET request fetches.
|
||||
"""
|
||||
|
||||
PARSING_GET_PROMPT = PromptTemplate(
|
||||
|
||||
@@ -44,7 +44,7 @@ def create_sql_agent(
|
||||
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
tools=toolkit.get_tools(),
|
||||
verbose=verbose,
|
||||
max_iterations=max_iterations,
|
||||
early_stopping_method=early_stopping_method,
|
||||
|
||||
@@ -4,7 +4,7 @@ SQL_PREFIX = """You are an agent designed to interact with a SQL database.
|
||||
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
|
||||
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
|
||||
You can order the results by a relevant column to return the most interesting examples in the database.
|
||||
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
|
||||
Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
|
||||
You have access to tools for interacting with the database.
|
||||
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
|
||||
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.
|
||||
|
||||
@@ -4,8 +4,6 @@ from typing import List
|
||||
from pydantic import Field
|
||||
|
||||
from langchain.agents.agent_toolkits.base import BaseToolkit
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.llms.openai import OpenAI
|
||||
from langchain.sql_database import SQLDatabase
|
||||
from langchain.tools import BaseTool
|
||||
from langchain.tools.sql_database.tool import (
|
||||
@@ -20,7 +18,6 @@ class SQLDatabaseToolkit(BaseToolkit):
|
||||
"""Toolkit for interacting with SQL databases."""
|
||||
|
||||
db: SQLDatabase = Field(exclude=True)
|
||||
llm: BaseLLM = Field(default_factory=lambda: OpenAI(temperature=0))
|
||||
|
||||
@property
|
||||
def dialect(self) -> str:
|
||||
@@ -38,5 +35,5 @@ class SQLDatabaseToolkit(BaseToolkit):
|
||||
QuerySQLDataBaseTool(db=self.db),
|
||||
InfoSQLDatabaseTool(db=self.db),
|
||||
ListSQLDatabaseTool(db=self.db),
|
||||
QueryCheckerTool(db=self.db, llm=self.llm),
|
||||
QueryCheckerTool(db=self.db),
|
||||
]
|
||||
|
||||
@@ -5,7 +5,6 @@ from typing import Generator, Optional
|
||||
|
||||
from langchain.callbacks.aim_callback import AimCallbackHandler
|
||||
from langchain.callbacks.base import (
|
||||
AsyncCallbackManager,
|
||||
BaseCallbackHandler,
|
||||
BaseCallbackManager,
|
||||
CallbackManager,
|
||||
@@ -14,7 +13,6 @@ from langchain.callbacks.clearml_callback import ClearMLCallbackHandler
|
||||
from langchain.callbacks.openai_info import OpenAICallbackHandler
|
||||
from langchain.callbacks.shared import SharedCallbackManager
|
||||
from langchain.callbacks.stdout import StdOutCallbackHandler
|
||||
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
|
||||
from langchain.callbacks.tracers import SharedLangChainTracer
|
||||
from langchain.callbacks.wandb_callback import WandbCallbackHandler
|
||||
|
||||
@@ -71,14 +69,12 @@ def get_openai_callback() -> Generator[OpenAICallbackHandler, None, None]:
|
||||
|
||||
__all__ = [
|
||||
"CallbackManager",
|
||||
"AsyncCallbackManager",
|
||||
"OpenAICallbackHandler",
|
||||
"SharedCallbackManager",
|
||||
"StdOutCallbackHandler",
|
||||
"AimCallbackHandler",
|
||||
"WandbCallbackHandler",
|
||||
"ClearMLCallbackHandler",
|
||||
"AsyncIteratorCallbackHandler",
|
||||
"get_openai_callback",
|
||||
"set_tracing_callback_manager",
|
||||
"set_default_callback_manager",
|
||||
|
||||
@@ -1,66 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
from typing import Any, AsyncIterator, Dict, List, Literal, Union, cast
|
||||
|
||||
from langchain.callbacks.base import AsyncCallbackHandler
|
||||
from langchain.schema import LLMResult
|
||||
|
||||
# TODO If used by two LLM runs in parallel this won't work as expected
|
||||
|
||||
|
||||
class AsyncIteratorCallbackHandler(AsyncCallbackHandler):
|
||||
"""Callback handler that returns an async iterator."""
|
||||
|
||||
queue: asyncio.Queue[str]
|
||||
|
||||
done: asyncio.Event
|
||||
|
||||
@property
|
||||
def always_verbose(self) -> bool:
|
||||
return True
|
||||
|
||||
def __init__(self) -> None:
|
||||
self.queue = asyncio.Queue()
|
||||
self.done = asyncio.Event()
|
||||
|
||||
async def on_llm_start(
|
||||
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
|
||||
) -> None:
|
||||
# If two calls are made in a row, this resets the state
|
||||
self.done.clear()
|
||||
|
||||
async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
|
||||
self.queue.put_nowait(token)
|
||||
|
||||
async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
|
||||
self.done.set()
|
||||
|
||||
async def on_llm_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
self.done.set()
|
||||
|
||||
# TODO implement the other methods
|
||||
|
||||
async def aiter(self) -> AsyncIterator[str]:
|
||||
while not self.queue.empty() or not self.done.is_set():
|
||||
# Wait for the next token in the queue,
|
||||
# but stop waiting if the done event is set
|
||||
done, _ = await asyncio.wait(
|
||||
[
|
||||
asyncio.ensure_future(self.queue.get()),
|
||||
asyncio.ensure_future(self.done.wait()),
|
||||
],
|
||||
return_when=asyncio.FIRST_COMPLETED,
|
||||
)
|
||||
|
||||
# Extract the value of the first completed task
|
||||
token_or_done = cast(Union[str, Literal[True]], done.pop().result())
|
||||
|
||||
# If the extracted value is the boolean True, the done event was set
|
||||
if token_or_done is True:
|
||||
break
|
||||
|
||||
# Otherwise, the extracted value is a token, which we yield
|
||||
yield token_or_done
|
||||
@@ -81,26 +81,6 @@ class APIChain(Chain):
|
||||
)
|
||||
return {self.output_key: answer}
|
||||
|
||||
async def _acall(self, inputs: Dict[str, str]) -> Dict[str, str]:
|
||||
question = inputs[self.question_key]
|
||||
api_url = await self.api_request_chain.apredict(
|
||||
question=question, api_docs=self.api_docs
|
||||
)
|
||||
self.callback_manager.on_text(
|
||||
api_url, color="green", end="\n", verbose=self.verbose
|
||||
)
|
||||
api_response = await self.requests_wrapper.aget(api_url)
|
||||
self.callback_manager.on_text(
|
||||
api_response, color="yellow", end="\n", verbose=self.verbose
|
||||
)
|
||||
answer = await self.api_answer_chain.apredict(
|
||||
question=question,
|
||||
api_docs=self.api_docs,
|
||||
api_url=api_url,
|
||||
api_response=api_response,
|
||||
)
|
||||
return {self.output_key: answer}
|
||||
|
||||
@classmethod
|
||||
def from_llm_and_api_docs(
|
||||
cls,
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from typing import Any, Dict, List, NamedTuple, Optional, cast
|
||||
from typing import Dict, List, NamedTuple, Optional, cast
|
||||
|
||||
from pydantic import BaseModel, Field
|
||||
from requests import Response
|
||||
@@ -35,7 +35,6 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
return_intermediate_steps: bool = False
|
||||
instructions_key: str = "instructions" #: :meta private:
|
||||
output_key: str = "output" #: :meta private:
|
||||
max_text_length: Optional[int] = Field(ge=0) #: :meta private:
|
||||
|
||||
@property
|
||||
def input_keys(self) -> List[str]:
|
||||
@@ -109,7 +108,6 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
|
||||
intermediate_steps = {}
|
||||
instructions = inputs[self.instructions_key]
|
||||
instructions = instructions[: self.max_text_length]
|
||||
_api_arguments = self.api_request_chain.predict_and_parse(
|
||||
instructions=instructions
|
||||
)
|
||||
@@ -139,7 +137,6 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
response_text = api_response.text
|
||||
except Exception as e:
|
||||
response_text = f"Error with message {str(e)}"
|
||||
response_text = response_text[: self.max_text_length]
|
||||
intermediate_steps["response_text"] = response_text
|
||||
self.callback_manager.on_text(
|
||||
response_text, color="blue", end="\n", verbose=self.verbose
|
||||
@@ -163,7 +160,6 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
llm: BaseLLM,
|
||||
requests: Optional[Requests] = None,
|
||||
return_intermediate_steps: bool = False,
|
||||
**kwargs: Any
|
||||
# TODO: Handle async
|
||||
) -> "OpenAPIEndpointChain":
|
||||
"""Create an OpenAPIEndpoint from a spec at the specified url."""
|
||||
@@ -173,7 +169,6 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
requests=requests,
|
||||
llm=llm,
|
||||
return_intermediate_steps=return_intermediate_steps,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@classmethod
|
||||
@@ -184,7 +179,6 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
requests: Optional[Requests] = None,
|
||||
verbose: bool = False,
|
||||
return_intermediate_steps: bool = False,
|
||||
**kwargs: Any
|
||||
# TODO: Handle async
|
||||
) -> "OpenAPIEndpointChain":
|
||||
"""Create an OpenAPIEndpointChain from an operation and a spec."""
|
||||
@@ -206,5 +200,4 @@ class OpenAPIEndpointChain(Chain, BaseModel):
|
||||
param_mapping=param_mapping,
|
||||
verbose=verbose,
|
||||
return_intermediate_steps=return_intermediate_steps,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -42,7 +42,7 @@ USER_COMMENT: "{instructions}"
|
||||
|
||||
If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:
|
||||
Response: ```json
|
||||
{{"response": "Human-understandable synthesis of the API_RESPONSE"}}
|
||||
{{"response": "Concise response to USER_COMMENT based on API_RESPONSE."}}
|
||||
```
|
||||
|
||||
Otherwise respond with the following markdown json block:
|
||||
@@ -50,7 +50,7 @@ Response Error: ```json
|
||||
{{"response": "What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion."}}
|
||||
```
|
||||
|
||||
You MUST respond as a markdown json code block. The person you are responding to CANNOT see the API_RESPONSE, so if there is any relevant information there you must include it in your response.
|
||||
You MUST respond as a markdown json code block.
|
||||
|
||||
Begin:
|
||||
---
|
||||
|
||||
@@ -14,7 +14,7 @@ from langchain.docstore.document import Document
|
||||
class CombineDocsProtocol(Protocol):
|
||||
"""Interface for the combine_docs method."""
|
||||
|
||||
def __call__(self, docs: List[Document], **kwargs: Any) -> str:
|
||||
def __call__(self, docs: List[Document], **kwargs: Any) -> Tuple[str, dict]:
|
||||
"""Interface for the combine_docs method."""
|
||||
|
||||
|
||||
@@ -48,7 +48,7 @@ def _collapse_docs(
|
||||
combine_document_func: CombineDocsProtocol,
|
||||
**kwargs: Any,
|
||||
) -> Document:
|
||||
result = combine_document_func(docs, **kwargs)
|
||||
result, _ = combine_document_func(docs, **kwargs)
|
||||
combined_metadata = {k: str(v) for k, v in docs[0].metadata.items()}
|
||||
for doc in docs[1:]:
|
||||
for k, v in doc.metadata.items():
|
||||
@@ -171,17 +171,15 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
|
||||
]
|
||||
length_func = self.combine_document_chain.prompt_length
|
||||
num_tokens = length_func(result_docs, **kwargs)
|
||||
|
||||
def _collapse_docs_func(docs: List[Document], **kwargs: Any) -> str:
|
||||
return self._collapse_chain.run(input_documents=docs, **kwargs)
|
||||
|
||||
while num_tokens is not None and num_tokens > token_max:
|
||||
new_result_doc_list = _split_list_of_docs(
|
||||
result_docs, length_func, token_max, **kwargs
|
||||
)
|
||||
result_docs = []
|
||||
for docs in new_result_doc_list:
|
||||
new_doc = _collapse_docs(docs, _collapse_docs_func, **kwargs)
|
||||
new_doc = _collapse_docs(
|
||||
docs, self._collapse_chain.combine_docs, **kwargs
|
||||
)
|
||||
result_docs.append(new_doc)
|
||||
num_tokens = self.combine_document_chain.prompt_length(
|
||||
result_docs, **kwargs
|
||||
@@ -191,7 +189,7 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
|
||||
extra_return_dict = {"intermediate_steps": _results}
|
||||
else:
|
||||
extra_return_dict = {}
|
||||
output = self.combine_document_chain.run(input_documents=result_docs, **kwargs)
|
||||
output, _ = self.combine_document_chain.combine_docs(result_docs, **kwargs)
|
||||
return output, extra_return_dict
|
||||
|
||||
@property
|
||||
|
||||
@@ -80,7 +80,7 @@ class BaseConversationalRetrievalChain(Chain):
|
||||
new_inputs = inputs.copy()
|
||||
new_inputs["question"] = new_question
|
||||
new_inputs["chat_history"] = chat_history_str
|
||||
answer = self.combine_docs_chain.run(input_documents=docs, **new_inputs)
|
||||
answer, _ = self.combine_docs_chain.combine_docs(docs, **new_inputs)
|
||||
if self.return_source_documents:
|
||||
return {self.output_key: answer, "source_documents": docs}
|
||||
else:
|
||||
@@ -104,7 +104,7 @@ class BaseConversationalRetrievalChain(Chain):
|
||||
new_inputs = inputs.copy()
|
||||
new_inputs["question"] = new_question
|
||||
new_inputs["chat_history"] = chat_history_str
|
||||
answer = await self.combine_docs_chain.arun(input_documents=docs, **new_inputs)
|
||||
answer, _ = await self.combine_docs_chain.acombine_docs(docs, **new_inputs)
|
||||
if self.return_source_documents:
|
||||
return {self.output_key: answer, "source_documents": docs}
|
||||
else:
|
||||
|
||||
@@ -68,35 +68,6 @@ class LLMMathChain(Chain):
|
||||
raise ValueError(f"unknown format from LLM: {t}")
|
||||
return {self.output_key: answer}
|
||||
|
||||
async def _aprocess_llm_result(self, t: str) -> Dict[str, str]:
|
||||
python_executor = PythonREPL()
|
||||
if self.callback_manager.is_async:
|
||||
await self.callback_manager.on_text(t, color="green", verbose=self.verbose)
|
||||
else:
|
||||
self.callback_manager.on_text(t, color="green", verbose=self.verbose)
|
||||
t = t.strip()
|
||||
if t.startswith("```python"):
|
||||
code = t[9:-4]
|
||||
output = python_executor.run(code)
|
||||
if self.callback_manager.is_async:
|
||||
await self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
|
||||
await self.callback_manager.on_text(
|
||||
output, color="yellow", verbose=self.verbose
|
||||
)
|
||||
else:
|
||||
await self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
|
||||
await self.callback_manager.on_text(
|
||||
output, color="yellow", verbose=self.verbose
|
||||
)
|
||||
answer = "Answer: " + output
|
||||
elif t.startswith("Answer:"):
|
||||
answer = t
|
||||
elif "Answer:" in t:
|
||||
answer = "Answer: " + t.split("Answer:")[-1]
|
||||
else:
|
||||
raise ValueError(f"unknown format from LLM: {t}")
|
||||
return {self.output_key: answer}
|
||||
|
||||
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
|
||||
llm_executor = LLMChain(
|
||||
prompt=self.prompt, llm=self.llm, callback_manager=self.callback_manager
|
||||
@@ -109,16 +80,11 @@ class LLMMathChain(Chain):
|
||||
llm_executor = LLMChain(
|
||||
prompt=self.prompt, llm=self.llm, callback_manager=self.callback_manager
|
||||
)
|
||||
if self.callback_manager.is_async:
|
||||
await self.callback_manager.on_text(
|
||||
inputs[self.input_key], verbose=self.verbose
|
||||
)
|
||||
else:
|
||||
self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
|
||||
self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
|
||||
t = await llm_executor.apredict(
|
||||
question=inputs[self.input_key], stop=["```output"]
|
||||
)
|
||||
return await self._aprocess_llm_result(t)
|
||||
return self._process_llm_result(t)
|
||||
|
||||
@property
|
||||
def _chain_type(self) -> str:
|
||||
|
||||
@@ -70,5 +70,5 @@ class MapReduceChain(Chain):
|
||||
# Split the larger text into smaller chunks.
|
||||
texts = self.text_splitter.split_text(inputs[self.input_key])
|
||||
docs = [Document(page_content=text) for text in texts]
|
||||
outputs = self.combine_documents_chain.run(input_documents=docs)
|
||||
outputs, _ = self.combine_documents_chain.combine_docs(docs)
|
||||
return {self.output_key: outputs}
|
||||
|
||||
@@ -31,7 +31,6 @@ class OpenAIModerationChain(Chain):
|
||||
input_key: str = "input" #: :meta private:
|
||||
output_key: str = "output" #: :meta private:
|
||||
openai_api_key: Optional[str] = None
|
||||
openai_organization: Optional[str] = None
|
||||
|
||||
@root_validator()
|
||||
def validate_environment(cls, values: Dict) -> Dict:
|
||||
|
||||
@@ -116,7 +116,7 @@ class BaseQAWithSourcesChain(Chain, ABC):
|
||||
|
||||
def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
|
||||
docs = self._get_docs(inputs)
|
||||
answer = self.combine_documents_chain.run(input_documents=docs, **inputs)
|
||||
answer, _ = self.combine_documents_chain.combine_docs(docs, **inputs)
|
||||
if re.search(r"SOURCES:\s", answer):
|
||||
answer, sources = re.split(r"SOURCES:\s", answer)
|
||||
else:
|
||||
@@ -135,7 +135,7 @@ class BaseQAWithSourcesChain(Chain, ABC):
|
||||
|
||||
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
|
||||
docs = await self._aget_docs(inputs)
|
||||
answer = await self.combine_documents_chain.arun(input_documents=docs, **inputs)
|
||||
answer, _ = await self.combine_documents_chain.acombine_docs(docs, **inputs)
|
||||
if re.search(r"SOURCES:\s", answer):
|
||||
answer, sources = re.split(r"SOURCES:\s", answer)
|
||||
else:
|
||||
|
||||
@@ -107,9 +107,7 @@ class BaseRetrievalQA(Chain):
|
||||
question = inputs[self.input_key]
|
||||
|
||||
docs = self._get_docs(question)
|
||||
answer = self.combine_documents_chain.run(
|
||||
input_documents=docs, question=question
|
||||
)
|
||||
answer, _ = self.combine_documents_chain.combine_docs(docs, question=question)
|
||||
|
||||
if self.return_source_documents:
|
||||
return {self.output_key: answer, "source_documents": docs}
|
||||
@@ -135,8 +133,8 @@ class BaseRetrievalQA(Chain):
|
||||
question = inputs[self.input_key]
|
||||
|
||||
docs = await self._aget_docs(question)
|
||||
answer = await self.combine_documents_chain.arun(
|
||||
input_documents=docs, question=question
|
||||
answer, _ = await self.combine_documents_chain.acombine_docs(
|
||||
docs, question=question
|
||||
)
|
||||
|
||||
if self.return_source_documents:
|
||||
|
||||
@@ -44,7 +44,6 @@ class AzureChatOpenAI(ChatOpenAI):
|
||||
openai_api_base: str = ""
|
||||
openai_api_version: str = ""
|
||||
openai_api_key: str = ""
|
||||
openai_organization: str = ""
|
||||
|
||||
@root_validator()
|
||||
def validate_environment(cls, values: Dict) -> Dict:
|
||||
|
||||
@@ -115,7 +115,6 @@ class ChatOpenAI(BaseChatModel):
|
||||
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Holds any model parameters valid for `create` call not explicitly specified."""
|
||||
openai_api_key: Optional[str] = None
|
||||
openai_organization: Optional[str] = None
|
||||
request_timeout: int = 60
|
||||
"""Timeout in seconds for the OpenAPI request."""
|
||||
max_retries: int = 6
|
||||
@@ -192,7 +191,6 @@ class ChatOpenAI(BaseChatModel):
|
||||
"max_tokens": self.max_tokens,
|
||||
"stream": self.streaming,
|
||||
"n": self.n,
|
||||
"temperature": self.temperature,
|
||||
**self.model_kwargs,
|
||||
}
|
||||
|
||||
@@ -327,8 +325,8 @@ class ChatOpenAI(BaseChatModel):
|
||||
|
||||
def get_num_tokens(self, text: str) -> int:
|
||||
"""Calculate num tokens with tiktoken package."""
|
||||
# tiktoken NOT supported for Python 3.7 or below
|
||||
if sys.version_info[1] <= 7:
|
||||
# tiktoken NOT supported for Python 3.8 or below
|
||||
if sys.version_info[1] <= 8:
|
||||
return super().get_num_tokens(text)
|
||||
try:
|
||||
import tiktoken
|
||||
|
||||
@@ -43,7 +43,6 @@ from langchain.document_loaders.obsidian import ObsidianLoader
|
||||
from langchain.document_loaders.pdf import (
|
||||
OnlinePDFLoader,
|
||||
PDFMinerLoader,
|
||||
PDFMinerPDFasHTMLLoader,
|
||||
PyMuPDFLoader,
|
||||
PyPDFLoader,
|
||||
UnstructuredPDFLoader,
|
||||
@@ -117,7 +116,6 @@ __all__ = [
|
||||
"AirbyteJSONLoader",
|
||||
"OnlinePDFLoader",
|
||||
"PDFMinerLoader",
|
||||
"PDFMinerPDFasHTMLLoader",
|
||||
"PyMuPDFLoader",
|
||||
"TelegramChatLoader",
|
||||
"SRTLoader",
|
||||
|
||||
@@ -1,77 +0,0 @@
|
||||
import json
|
||||
import re
|
||||
import warnings
|
||||
from typing import List, Tuple
|
||||
|
||||
import requests
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
|
||||
|
||||
class BiliBiliLoader(BaseLoader):
|
||||
"""Loader that loads bilibili transcripts."""
|
||||
|
||||
def __init__(self, video_urls: List[str]):
|
||||
"""Initialize with bilibili url."""
|
||||
self.video_urls = video_urls
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load from bilibili url."""
|
||||
results = []
|
||||
for url in self.video_urls:
|
||||
transcript, video_info = self._get_bilibili_subs_and_info(url)
|
||||
doc = Document(page_content=transcript, metadata=video_info)
|
||||
results.append(doc)
|
||||
|
||||
return results
|
||||
|
||||
def _get_bilibili_subs_and_info(self, url: str) -> Tuple[str, dict]:
|
||||
try:
|
||||
from bilibili_api import sync, video
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"requests package not found, please install it with "
|
||||
"`pip install bilibili-api`"
|
||||
)
|
||||
|
||||
bvid = re.search(r"BV\w+", url)
|
||||
if bvid is not None:
|
||||
v = video.Video(bvid=bvid.group())
|
||||
else:
|
||||
aid = re.search(r"av[0-9]+", url)
|
||||
if aid is not None:
|
||||
try:
|
||||
v = video.Video(aid=int(aid.group()[2:]))
|
||||
except AttributeError:
|
||||
raise ValueError(f"{url} is not bilibili url.")
|
||||
else:
|
||||
raise ValueError(f"{url} is not bilibili url.")
|
||||
|
||||
video_info = sync(v.get_info())
|
||||
video_info.update({"url": url})
|
||||
|
||||
# Get subtitle url
|
||||
subtitle = video_info.pop("subtitle")
|
||||
sub_list = subtitle["list"]
|
||||
if sub_list:
|
||||
sub_url = sub_list[0]["subtitle_url"]
|
||||
result = requests.get(sub_url)
|
||||
raw_sub_titles = json.loads(result.content)["body"]
|
||||
raw_transcript = " ".join([c["content"] for c in raw_sub_titles])
|
||||
|
||||
raw_transcript_with_meta_info = f"""
|
||||
Video Title: {video_info['title']},
|
||||
description: {video_info['desc']}\n
|
||||
Transcript: {raw_transcript}
|
||||
"""
|
||||
return raw_transcript_with_meta_info, video_info
|
||||
else:
|
||||
raw_transcript = ""
|
||||
warnings.warn(
|
||||
f"""
|
||||
No subtitles found for video: {url}.
|
||||
Return Empty transcript.
|
||||
"""
|
||||
)
|
||||
return raw_transcript, video_info
|
||||
@@ -148,7 +148,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
creds = self._load_credentials()
|
||||
service = build("drive", "v3", credentials=creds)
|
||||
|
||||
file = service.files().get(fileId=id, supportsAllDrives=True).execute()
|
||||
file = service.files().get(fileId=id).execute()
|
||||
request = service.files().export_media(fileId=id, mimeType="text/plain")
|
||||
fh = BytesIO()
|
||||
downloader = MediaIoBaseDownload(fh, request)
|
||||
@@ -182,8 +182,6 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
.list(
|
||||
q=f"'{self.folder_id}' in parents",
|
||||
pageSize=1000,
|
||||
includeItemsFromAllDrives=True,
|
||||
supportsAllDrives=True,
|
||||
fields="nextPageToken, files(id, name, mimeType)",
|
||||
)
|
||||
.execute()
|
||||
@@ -219,7 +217,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
creds = self._load_credentials()
|
||||
service = build("drive", "v3", credentials=creds)
|
||||
|
||||
file = service.files().get(fileId=id, supportsAllDrives=True).execute()
|
||||
file = service.files().get(fileId=id).execute()
|
||||
request = service.files().get_media(fileId=id)
|
||||
fh = BytesIO()
|
||||
downloader = MediaIoBaseDownload(fh, request)
|
||||
|
||||
@@ -2,7 +2,6 @@
|
||||
import os
|
||||
import tempfile
|
||||
from abc import ABC
|
||||
from io import StringIO
|
||||
from typing import Any, List, Optional
|
||||
from urllib.parse import urlparse
|
||||
|
||||
@@ -130,40 +129,6 @@ class PDFMinerLoader(BasePDFLoader):
|
||||
return [Document(page_content=text, metadata=metadata)]
|
||||
|
||||
|
||||
class PDFMinerPDFasHTMLLoader(BasePDFLoader):
|
||||
"""Loader that uses PDFMiner to load PDF files as HTML content."""
|
||||
|
||||
def __init__(self, file_path: str):
|
||||
"""Initialize with file path."""
|
||||
try:
|
||||
from pdfminer.high_level import extract_text_to_fp # noqa:F401
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"pdfminer package not found, please install it with "
|
||||
"`pip install pdfminer.six`"
|
||||
)
|
||||
|
||||
super().__init__(file_path)
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load file."""
|
||||
from pdfminer.high_level import extract_text_to_fp
|
||||
from pdfminer.layout import LAParams
|
||||
from pdfminer.utils import open_filename
|
||||
|
||||
output_string = StringIO()
|
||||
with open_filename(self.file_path, "rb") as fp:
|
||||
extract_text_to_fp(
|
||||
fp, # type: ignore[arg-type]
|
||||
output_string,
|
||||
codec="",
|
||||
laparams=LAParams(),
|
||||
output_type="html",
|
||||
)
|
||||
metadata = {"source": self.file_path}
|
||||
return [Document(page_content=output_string.getvalue(), metadata=metadata)]
|
||||
|
||||
|
||||
class PyMuPDFLoader(BasePDFLoader):
|
||||
"""Loader that uses PyMuPDF to load PDF files."""
|
||||
|
||||
|
||||
@@ -45,10 +45,6 @@ class ReadTheDocsLoader(BaseLoader):
|
||||
def _clean_data(data: str) -> str:
|
||||
soup = BeautifulSoup(data, **self.bs_kwargs)
|
||||
text = soup.find_all("main", {"id": "main-content"})
|
||||
|
||||
if len(text) == 0:
|
||||
text = soup.find_all("div", {"role": "main"})
|
||||
|
||||
if len(text) != 0:
|
||||
text = text[0].get_text()
|
||||
else:
|
||||
|
||||
@@ -2,17 +2,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from typing import (
|
||||
Any,
|
||||
Callable,
|
||||
Dict,
|
||||
List,
|
||||
Literal,
|
||||
Optional,
|
||||
Set,
|
||||
Tuple,
|
||||
Union,
|
||||
)
|
||||
from typing import Any, Callable, Dict, List, Optional
|
||||
|
||||
import numpy as np
|
||||
from pydantic import BaseModel, Extra, root_validator
|
||||
@@ -53,14 +43,14 @@ def _create_retry_decorator(embeddings: OpenAIEmbeddings) -> Callable[[Any], Any
|
||||
|
||||
|
||||
def embed_with_retry(embeddings: OpenAIEmbeddings, **kwargs: Any) -> Any:
|
||||
"""Use tenacity to retry the embedding call."""
|
||||
"""Use tenacity to retry the completion call."""
|
||||
retry_decorator = _create_retry_decorator(embeddings)
|
||||
|
||||
@retry_decorator
|
||||
def _embed_with_retry(**kwargs: Any) -> Any:
|
||||
def _completion_with_retry(**kwargs: Any) -> Any:
|
||||
return embeddings.client.create(**kwargs)
|
||||
|
||||
return _embed_with_retry(**kwargs)
|
||||
return _completion_with_retry(**kwargs)
|
||||
|
||||
|
||||
class OpenAIEmbeddings(BaseModel, Embeddings):
|
||||
@@ -108,9 +98,6 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
|
||||
query_model_name: str = "text-embedding-ada-002"
|
||||
embedding_ctx_length: int = 8191
|
||||
openai_api_key: Optional[str] = None
|
||||
openai_organization: Optional[str] = None
|
||||
allowed_special: Union[Literal["all"], Set[str]] = set()
|
||||
disallowed_special: Union[Literal["all"], Set[str], Tuple[()]] = "all"
|
||||
chunk_size: int = 1000
|
||||
"""Maximum number of texts to embed in each batch"""
|
||||
max_retries: int = 6
|
||||
@@ -207,11 +194,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
|
||||
for i, text in enumerate(texts):
|
||||
# replace newlines, which can negatively affect performance.
|
||||
text = text.replace("\n", " ")
|
||||
token = encoding.encode(
|
||||
text,
|
||||
allowed_special=self.allowed_special,
|
||||
disallowed_special=self.disallowed_special,
|
||||
)
|
||||
token = encoding.encode(text)
|
||||
for j in range(0, len(token), self.embedding_ctx_length):
|
||||
tokens += [token[j : j + self.embedding_ctx_length]]
|
||||
indices += [i]
|
||||
@@ -247,11 +230,10 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
|
||||
|
||||
def _embedding_func(self, text: str, *, engine: str) -> List[float]:
|
||||
"""Call out to OpenAI's embedding endpoint."""
|
||||
# handle large input text
|
||||
# replace newlines, which can negatively affect performance.
|
||||
if self.embedding_ctx_length > 0:
|
||||
return self._get_len_safe_embeddings([text], engine=engine)[0]
|
||||
else:
|
||||
# replace newlines, which can negatively affect performance.
|
||||
text = text.replace("\n", " ")
|
||||
return embed_with_retry(self, input=[text], engine=engine)["data"][0][
|
||||
"embedding"
|
||||
@@ -270,7 +252,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
|
||||
Returns:
|
||||
List of embeddings, one for each text.
|
||||
"""
|
||||
# handle batches of large input text
|
||||
# handle large batches of texts
|
||||
if self.embedding_ctx_length > 0:
|
||||
return self._get_len_safe_embeddings(texts, engine=self.document_model_name)
|
||||
else:
|
||||
@@ -292,7 +274,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embedding for the text.
|
||||
Embeddings for the text.
|
||||
"""
|
||||
embedding = self._embedding_func(text, engine=self.query_model_name)
|
||||
return embedding
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
"""Wrapper for the GPT4All model."""
|
||||
from functools import partial
|
||||
from typing import Any, Dict, List, Mapping, Optional, Set
|
||||
|
||||
from pydantic import Extra, Field, root_validator
|
||||
@@ -175,12 +174,8 @@ class GPT4All(LLM):
|
||||
prompt = "Once upon a time, "
|
||||
response = model(prompt, n_predict=55)
|
||||
"""
|
||||
text_callback = partial(
|
||||
self.callback_manager.on_llm_new_token, verbose=self.verbose
|
||||
)
|
||||
text = self.client.generate(
|
||||
prompt,
|
||||
new_text_callback=text_callback,
|
||||
**self._default_params,
|
||||
)
|
||||
if stop is not None:
|
||||
|
||||
@@ -177,8 +177,6 @@ class LlamaCpp(LLM):
|
||||
raise ValueError("`stop` found in both the input and default params.")
|
||||
elif self.stop:
|
||||
params["stop_sequences"] = self.stop
|
||||
elif stop:
|
||||
params["stop_sequences"] = stop
|
||||
else:
|
||||
params["stop_sequences"] = []
|
||||
|
||||
|
||||
@@ -151,7 +151,6 @@ class BaseOpenAI(BaseLLM):
|
||||
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Holds any model parameters valid for `create` call not explicitly specified."""
|
||||
openai_api_key: Optional[str] = None
|
||||
openai_organization: Optional[str] = None
|
||||
batch_size: int = 20
|
||||
"""Batch size to use when passing multiple documents to generate."""
|
||||
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
|
||||
@@ -223,7 +222,7 @@ class BaseOpenAI(BaseLLM):
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import openai python package. "
|
||||
"Please install it with `pip install openai`."
|
||||
"Please it install it with `pip install openai`."
|
||||
)
|
||||
if values["streaming"] and values["n"] > 1:
|
||||
raise ValueError("Cannot stream results when n > 1.")
|
||||
|
||||
@@ -67,6 +67,8 @@ class RWKV(LLM, BaseModel):
|
||||
|
||||
pipeline: Any = None #: :meta private:
|
||||
|
||||
model_state: Any = None #: :meta private:
|
||||
|
||||
model_tokens: Any = None #: :meta private:
|
||||
|
||||
class Config:
|
||||
@@ -143,7 +145,7 @@ class RWKV(LLM, BaseModel):
|
||||
tokens = self.tokenizer.encode(prompt).ids
|
||||
|
||||
logits = None
|
||||
state = None
|
||||
state = self.model_state
|
||||
|
||||
occurrence = {}
|
||||
|
||||
@@ -176,6 +178,8 @@ class RWKV(LLM, BaseModel):
|
||||
+ occurrence[n] * self.penalty_alpha_frequency
|
||||
)
|
||||
|
||||
# Update state for future invocations
|
||||
self.model_state = state
|
||||
return decoded
|
||||
|
||||
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
|
||||
|
||||
@@ -5,7 +5,6 @@ from langchain.memory.buffer import (
|
||||
from langchain.memory.buffer_window import ConversationBufferWindowMemory
|
||||
from langchain.memory.chat_message_histories.dynamodb import DynamoDBChatMessageHistory
|
||||
from langchain.memory.chat_message_histories.in_memory import ChatMessageHistory
|
||||
from langchain.memory.chat_message_histories.postgres import PostgresChatMessageHistory
|
||||
from langchain.memory.chat_message_histories.redis import RedisChatMessageHistory
|
||||
from langchain.memory.combined import CombinedMemory
|
||||
from langchain.memory.entity import (
|
||||
@@ -37,5 +36,4 @@ __all__ = [
|
||||
"ConversationTokenBufferMemory",
|
||||
"RedisChatMessageHistory",
|
||||
"DynamoDBChatMessageHistory",
|
||||
"PostgresChatMessageHistory",
|
||||
]
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
from abc import ABC
|
||||
from typing import Any, Dict, Optional, Tuple
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from pydantic import Field
|
||||
|
||||
@@ -14,9 +14,8 @@ class BaseChatMemory(BaseMemory, ABC):
|
||||
input_key: Optional[str] = None
|
||||
return_messages: bool = False
|
||||
|
||||
def _get_input_output(
|
||||
self, inputs: Dict[str, Any], outputs: Dict[str, str]
|
||||
) -> Tuple[str, str]:
|
||||
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
|
||||
"""Save context from this conversation to buffer."""
|
||||
if self.input_key is None:
|
||||
prompt_input_key = get_prompt_input_key(inputs, self.memory_variables)
|
||||
else:
|
||||
@@ -27,13 +26,8 @@ class BaseChatMemory(BaseMemory, ABC):
|
||||
output_key = list(outputs.keys())[0]
|
||||
else:
|
||||
output_key = self.output_key
|
||||
return inputs[prompt_input_key], outputs[output_key]
|
||||
|
||||
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
|
||||
"""Save context from this conversation to buffer."""
|
||||
input_str, output_str = self._get_input_output(inputs, outputs)
|
||||
self.chat_memory.add_user_message(input_str)
|
||||
self.chat_memory.add_ai_message(output_str)
|
||||
self.chat_memory.add_user_message(inputs[prompt_input_key])
|
||||
self.chat_memory.add_ai_message(outputs[output_key])
|
||||
|
||||
def clear(self) -> None:
|
||||
"""Clear memory contents."""
|
||||
|
||||
@@ -1,9 +1,7 @@
|
||||
from langchain.memory.chat_message_histories.dynamodb import DynamoDBChatMessageHistory
|
||||
from langchain.memory.chat_message_histories.postgres import PostgresChatMessageHistory
|
||||
from langchain.memory.chat_message_histories.redis import RedisChatMessageHistory
|
||||
|
||||
__all__ = [
|
||||
"DynamoDBChatMessageHistory",
|
||||
"RedisChatMessageHistory",
|
||||
"PostgresChatMessageHistory",
|
||||
]
|
||||
|
||||
@@ -1,86 +0,0 @@
|
||||
import json
|
||||
import logging
|
||||
from typing import List
|
||||
|
||||
from langchain.schema import (
|
||||
AIMessage,
|
||||
BaseChatMessageHistory,
|
||||
BaseMessage,
|
||||
HumanMessage,
|
||||
_message_to_dict,
|
||||
messages_from_dict,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
DEFAULT_CONNECTION_STRING = "postgresql://postgres:mypassword@localhost/chat_history"
|
||||
|
||||
|
||||
class PostgresChatMessageHistory(BaseChatMessageHistory):
|
||||
def __init__(
|
||||
self,
|
||||
session_id: str,
|
||||
connection_string: str = DEFAULT_CONNECTION_STRING,
|
||||
table_name: str = "message_store",
|
||||
):
|
||||
import psycopg
|
||||
from psycopg.rows import dict_row
|
||||
|
||||
try:
|
||||
self.connection = psycopg.connect(connection_string)
|
||||
self.cursor = self.connection.cursor(row_factory=dict_row)
|
||||
except psycopg.OperationalError as error:
|
||||
logger.error(error)
|
||||
|
||||
self.session_id = session_id
|
||||
self.table_name = table_name
|
||||
|
||||
self._create_table_if_not_exists()
|
||||
|
||||
def _create_table_if_not_exists(self) -> None:
|
||||
create_table_query = f"""CREATE TABLE IF NOT EXISTS {self.table_name} (
|
||||
id SERIAL PRIMARY KEY,
|
||||
session_id TEXT NOT NULL,
|
||||
message JSONB NOT NULL
|
||||
);"""
|
||||
self.cursor.execute(create_table_query)
|
||||
self.connection.commit()
|
||||
|
||||
@property
|
||||
def messages(self) -> List[BaseMessage]: # type: ignore
|
||||
"""Retrieve the messages from PostgreSQL"""
|
||||
query = f"SELECT message FROM {self.table_name} WHERE session_id = %s;"
|
||||
self.cursor.execute(query, (self.session_id,))
|
||||
items = [record["message"] for record in self.cursor.fetchall()]
|
||||
messages = messages_from_dict(items)
|
||||
return messages
|
||||
|
||||
def add_user_message(self, message: str) -> None:
|
||||
self.append(HumanMessage(content=message))
|
||||
|
||||
def add_ai_message(self, message: str) -> None:
|
||||
self.append(AIMessage(content=message))
|
||||
|
||||
def append(self, message: BaseMessage) -> None:
|
||||
"""Append the message to the record in PostgreSQL"""
|
||||
from psycopg import sql
|
||||
|
||||
query = sql.SQL("INSERT INTO {} (session_id, message) VALUES (%s, %s);").format(
|
||||
sql.Identifier(self.table_name)
|
||||
)
|
||||
self.cursor.execute(
|
||||
query, (self.session_id, json.dumps(_message_to_dict(message)))
|
||||
)
|
||||
self.connection.commit()
|
||||
|
||||
def clear(self) -> None:
|
||||
"""Clear session memory from PostgreSQL"""
|
||||
query = f"DELETE FROM {self.table_name} WHERE session_id = %s;"
|
||||
self.cursor.execute(query, (self.session_id,))
|
||||
self.connection.commit()
|
||||
|
||||
def __del__(self) -> None:
|
||||
if self.cursor:
|
||||
self.cursor.close()
|
||||
if self.connection:
|
||||
self.connection.close()
|
||||
@@ -1,58 +0,0 @@
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import requests
|
||||
|
||||
from langchain.memory.chat_memory import BaseChatMemory
|
||||
from langchain.schema import get_buffer_string
|
||||
|
||||
|
||||
class MotorheadMemory(BaseChatMemory):
|
||||
url: str = "http://localhost:8080"
|
||||
timeout = 3000
|
||||
memory_key = "history"
|
||||
session_id: str
|
||||
context: Optional[str] = None
|
||||
|
||||
async def init(self) -> None:
|
||||
res = requests.get(
|
||||
f"{self.url}/sessions/{self.session_id}/memory",
|
||||
timeout=self.timeout,
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
res_data = res.json()
|
||||
messages = res_data.get("messages", [])
|
||||
context = res_data.get("context", "NONE")
|
||||
|
||||
for message in messages:
|
||||
if message["role"] == "AI":
|
||||
self.chat_memory.add_ai_message(message["content"])
|
||||
else:
|
||||
self.chat_memory.add_user_message(message["content"])
|
||||
|
||||
if context and context != "NONE":
|
||||
self.context = context
|
||||
|
||||
def load_memory_variables(self, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
if self.return_messages:
|
||||
return {self.memory_key: self.chat_memory.messages}
|
||||
else:
|
||||
return {self.memory_key: get_buffer_string(self.chat_memory.messages)}
|
||||
|
||||
@property
|
||||
def memory_variables(self) -> List[str]:
|
||||
return [self.memory_key]
|
||||
|
||||
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
|
||||
input_str, output_str = self._get_input_output(inputs, outputs)
|
||||
requests.post(
|
||||
f"{self.url}/sessions/{self.session_id}/memory",
|
||||
timeout=self.timeout,
|
||||
json={
|
||||
"messages": [
|
||||
{"role": "Human", "content": f"{input_str}"},
|
||||
{"role": "AI", "content": f"{output_str}"},
|
||||
]
|
||||
},
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
super().save_context(inputs, outputs)
|
||||
@@ -39,10 +39,7 @@ class StructuredOutputParser(BaseOutputParser):
|
||||
|
||||
def parse(self, text: str) -> BaseModel:
|
||||
json_string = text.split("```json")[1].strip().strip("```").strip()
|
||||
try:
|
||||
json_obj = json.loads(json_string)
|
||||
except json.JSONDecodeError as e:
|
||||
raise OutputParserException(f"Got invalid JSON object. Error: {e}")
|
||||
json_obj = json.loads(json_string)
|
||||
for schema in self.response_schemas:
|
||||
if schema.name not in json_obj:
|
||||
raise OutputParserException(
|
||||
|
||||
@@ -76,7 +76,7 @@ class SemanticSimilarityExampleSelector(BaseExampleSelector, BaseModel):
|
||||
|
||||
Args:
|
||||
examples: List of examples to use in the prompt.
|
||||
embeddings: An initialized embedding API interface, e.g. OpenAIEmbeddings().
|
||||
embeddings: An iniialized embedding API interface, e.g. OpenAIEmbeddings().
|
||||
vectorstore_cls: A vector store DB interface class, e.g. FAISS.
|
||||
k: Number of examples to select
|
||||
input_keys: If provided, the search is based on the input variables
|
||||
|
||||
@@ -86,7 +86,7 @@ class PromptTemplate(StringPromptTemplate):
|
||||
) -> PromptTemplate:
|
||||
"""Take examples in list format with prefix and suffix to create a prompt.
|
||||
|
||||
Intended to be used as a way to dynamically create a prompt from examples.
|
||||
Intended be used as a way to dynamically create a prompt from examples.
|
||||
|
||||
Args:
|
||||
examples: List of examples to use in the prompt.
|
||||
|
||||
@@ -1,6 +1,5 @@
|
||||
"""Lightweight wrapper around requests library, with async support."""
|
||||
from contextlib import asynccontextmanager
|
||||
from typing import Any, AsyncGenerator, Dict, Optional
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
import aiohttp
|
||||
import requests
|
||||
@@ -43,62 +42,47 @@ class Requests(BaseModel):
|
||||
"""DELETE the URL and return the text."""
|
||||
return requests.delete(url, headers=self.headers, **kwargs)
|
||||
|
||||
@asynccontextmanager
|
||||
async def _arequest(
|
||||
self, method: str, url: str, **kwargs: Any
|
||||
) -> AsyncGenerator[aiohttp.ClientResponse, None]:
|
||||
) -> aiohttp.ClientResponse:
|
||||
"""Make an async request."""
|
||||
if not self.aiosession:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.request(
|
||||
method, url, headers=self.headers, **kwargs
|
||||
) as response:
|
||||
yield response
|
||||
return response
|
||||
else:
|
||||
async with self.aiosession.request(
|
||||
method, url, headers=self.headers, **kwargs
|
||||
) as response:
|
||||
yield response
|
||||
return response
|
||||
|
||||
@asynccontextmanager
|
||||
async def aget(
|
||||
self, url: str, **kwargs: Any
|
||||
) -> AsyncGenerator[aiohttp.ClientResponse, None]:
|
||||
async def aget(self, url: str, **kwargs: Any) -> aiohttp.ClientResponse:
|
||||
"""GET the URL and return the text asynchronously."""
|
||||
async with self._arequest("GET", url, **kwargs) as response:
|
||||
yield response
|
||||
return await self._arequest("GET", url, **kwargs)
|
||||
|
||||
@asynccontextmanager
|
||||
async def apost(
|
||||
self, url: str, data: Dict[str, Any], **kwargs: Any
|
||||
) -> AsyncGenerator[aiohttp.ClientResponse, None]:
|
||||
) -> aiohttp.ClientResponse:
|
||||
"""POST to the URL and return the text asynchronously."""
|
||||
async with self._arequest("POST", url, **kwargs) as response:
|
||||
yield response
|
||||
return await self._arequest("POST", url, json=data, **kwargs)
|
||||
|
||||
@asynccontextmanager
|
||||
async def apatch(
|
||||
self, url: str, data: Dict[str, Any], **kwargs: Any
|
||||
) -> AsyncGenerator[aiohttp.ClientResponse, None]:
|
||||
) -> aiohttp.ClientResponse:
|
||||
"""PATCH the URL and return the text asynchronously."""
|
||||
async with self._arequest("PATCH", url, **kwargs) as response:
|
||||
yield response
|
||||
return await self._arequest("PATCH", url, json=data, **kwargs)
|
||||
|
||||
@asynccontextmanager
|
||||
async def aput(
|
||||
self, url: str, data: Dict[str, Any], **kwargs: Any
|
||||
) -> AsyncGenerator[aiohttp.ClientResponse, None]:
|
||||
) -> aiohttp.ClientResponse:
|
||||
"""PUT the URL and return the text asynchronously."""
|
||||
async with self._arequest("PUT", url, **kwargs) as response:
|
||||
yield response
|
||||
return await self._arequest("PUT", url, json=data, **kwargs)
|
||||
|
||||
@asynccontextmanager
|
||||
async def adelete(
|
||||
self, url: str, **kwargs: Any
|
||||
) -> AsyncGenerator[aiohttp.ClientResponse, None]:
|
||||
async def adelete(self, url: str, **kwargs: Any) -> aiohttp.ClientResponse:
|
||||
"""DELETE the URL and return the text asynchronously."""
|
||||
async with self._arequest("DELETE", url, **kwargs) as response:
|
||||
yield response
|
||||
return await self._arequest("DELETE", url, **kwargs)
|
||||
|
||||
|
||||
class TextRequestsWrapper(BaseModel):
|
||||
@@ -142,28 +126,28 @@ class TextRequestsWrapper(BaseModel):
|
||||
|
||||
async def aget(self, url: str, **kwargs: Any) -> str:
|
||||
"""GET the URL and return the text asynchronously."""
|
||||
async with self.requests.aget(url, **kwargs) as response:
|
||||
return await response.text()
|
||||
response = await self.requests.aget(url, **kwargs)
|
||||
return await response.text()
|
||||
|
||||
async def apost(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
|
||||
"""POST to the URL and return the text asynchronously."""
|
||||
async with self.requests.apost(url, **kwargs) as response:
|
||||
return await response.text()
|
||||
response = await self.requests.apost(url, data, **kwargs)
|
||||
return await response.text()
|
||||
|
||||
async def apatch(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
|
||||
"""PATCH the URL and return the text asynchronously."""
|
||||
async with self.requests.apatch(url, **kwargs) as response:
|
||||
return await response.text()
|
||||
response = await self.requests.apatch(url, data, **kwargs)
|
||||
return await response.text()
|
||||
|
||||
async def aput(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
|
||||
"""PUT the URL and return the text asynchronously."""
|
||||
async with self.requests.aput(url, **kwargs) as response:
|
||||
return await response.text()
|
||||
response = await self.requests.aput(url, data, **kwargs)
|
||||
return await response.text()
|
||||
|
||||
async def adelete(self, url: str, **kwargs: Any) -> str:
|
||||
"""DELETE the URL and return the text asynchronously."""
|
||||
async with self.requests.adelete(url, **kwargs) as response:
|
||||
return await response.text()
|
||||
response = await self.requests.adelete(url, **kwargs)
|
||||
return await response.text()
|
||||
|
||||
|
||||
# For backwards compatibility
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
from langchain.retrievers.chatgpt_plugin_retriever import ChatGPTPluginRetriever
|
||||
from langchain.retrievers.databerry import DataberryRetriever
|
||||
from langchain.retrievers.elastic_search_bm25 import ElasticSearchBM25Retriever
|
||||
from langchain.retrievers.metal import MetalRetriever
|
||||
from langchain.retrievers.pinecone_hybrid_search import PineconeHybridSearchRetriever
|
||||
@@ -15,5 +14,4 @@ __all__ = [
|
||||
"ElasticSearchBM25Retriever",
|
||||
"TFIDFRetriever",
|
||||
"WeaviateHybridSearchRetriever",
|
||||
"DataberryRetriever",
|
||||
]
|
||||
|
||||
@@ -1,74 +0,0 @@
|
||||
from typing import List, Optional
|
||||
|
||||
import aiohttp
|
||||
import requests
|
||||
|
||||
from langchain.schema import BaseRetriever, Document
|
||||
|
||||
|
||||
class DataberryRetriever(BaseRetriever):
|
||||
datastore_url: str
|
||||
top_k: Optional[int]
|
||||
api_key: Optional[str]
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
datastore_url: str,
|
||||
top_k: Optional[int] = None,
|
||||
api_key: Optional[str] = None,
|
||||
):
|
||||
self.datastore_url = datastore_url
|
||||
self.api_key = api_key
|
||||
self.top_k = top_k
|
||||
|
||||
def get_relevant_documents(self, query: str) -> List[Document]:
|
||||
response = requests.post(
|
||||
self.datastore_url,
|
||||
json={
|
||||
"query": query,
|
||||
**({"topK": self.top_k} if self.top_k is not None else {}),
|
||||
},
|
||||
headers={
|
||||
"Content-Type": "application/json",
|
||||
**(
|
||||
{"Authorization": f"Bearer {self.api_key}"}
|
||||
if self.api_key is not None
|
||||
else {}
|
||||
),
|
||||
},
|
||||
)
|
||||
data = response.json()
|
||||
return [
|
||||
Document(
|
||||
page_content=r["text"],
|
||||
metadata={"source": r["source"], "score": r["score"]},
|
||||
)
|
||||
for r in data["results"]
|
||||
]
|
||||
|
||||
async def aget_relevant_documents(self, query: str) -> List[Document]:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.request(
|
||||
"POST",
|
||||
self.datastore_url,
|
||||
json={
|
||||
"query": query,
|
||||
**({"topK": self.top_k} if self.top_k is not None else {}),
|
||||
},
|
||||
headers={
|
||||
"Content-Type": "application/json",
|
||||
**(
|
||||
{"Authorization": f"Bearer {self.api_key}"}
|
||||
if self.api_key is not None
|
||||
else {}
|
||||
),
|
||||
},
|
||||
) as response:
|
||||
data = await response.json()
|
||||
return [
|
||||
Document(
|
||||
page_content=r["text"],
|
||||
metadata={"source": r["source"], "score": r["score"]},
|
||||
)
|
||||
for r in data["results"]
|
||||
]
|
||||
@@ -12,20 +12,13 @@ class RemoteLangChainRetriever(BaseRetriever, BaseModel):
|
||||
headers: Optional[dict] = None
|
||||
input_key: str = "message"
|
||||
response_key: str = "response"
|
||||
page_content_key: str = "page_content"
|
||||
metadata_key: str = "metadata"
|
||||
|
||||
def get_relevant_documents(self, query: str) -> List[Document]:
|
||||
response = requests.post(
|
||||
self.url, json={self.input_key: query}, headers=self.headers
|
||||
)
|
||||
result = response.json()
|
||||
return [
|
||||
Document(
|
||||
page_content=r[self.page_content_key], metadata=r[self.metadata_key]
|
||||
)
|
||||
for r in result[self.response_key]
|
||||
]
|
||||
return [Document(**r) for r in result[self.response_key]]
|
||||
|
||||
async def aget_relevant_documents(self, query: str) -> List[Document]:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
@@ -33,9 +26,4 @@ class RemoteLangChainRetriever(BaseRetriever, BaseModel):
|
||||
"POST", self.url, headers=self.headers, json={self.input_key: query}
|
||||
) as response:
|
||||
result = await response.json()
|
||||
return [
|
||||
Document(
|
||||
page_content=r[self.page_content_key], metadata=r[self.metadata_key]
|
||||
)
|
||||
for r in result[self.response_key]
|
||||
]
|
||||
return [Document(**r) for r in result[self.response_key]]
|
||||
|
||||
@@ -390,3 +390,9 @@ class OutputParserException(Exception):
|
||||
"""
|
||||
|
||||
pass
|
||||
|
||||
|
||||
class BaseDocumentProcessor(ABC):
|
||||
@abstractmethod
|
||||
def process(self, documents: List[Document]) -> List[Document]:
|
||||
"""Process documents."""
|
||||
|
||||
@@ -17,11 +17,12 @@ from typing import (
|
||||
)
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.schema import BaseDocumentProcessor
|
||||
|
||||
logger = logging.getLogger()
|
||||
|
||||
|
||||
class TextSplitter(ABC):
|
||||
class TextSplitter(BaseDocumentProcessor, ABC):
|
||||
"""Interface for splitting text into chunks."""
|
||||
|
||||
def __init__(
|
||||
@@ -64,6 +65,9 @@ class TextSplitter(ABC):
|
||||
metadatas = [doc.metadata for doc in documents]
|
||||
return self.create_documents(texts, metadatas)
|
||||
|
||||
def process(self, documents: List[Document]) -> List[Document]:
|
||||
return self.split_documents(documents)
|
||||
|
||||
def _join_docs(self, docs: List[str], separator: str) -> Optional[str]:
|
||||
text = separator.join(docs)
|
||||
text = text.strip()
|
||||
|
||||
@@ -475,12 +475,9 @@ class APIOperation(BaseModel):
|
||||
if request_body is not None
|
||||
else None
|
||||
)
|
||||
description = operation.description or operation.summary
|
||||
if not description and spec.paths is not None:
|
||||
description = spec.paths[path].description or spec.paths[path].summary
|
||||
return cls(
|
||||
operation_id=operation_id,
|
||||
description=description,
|
||||
description=operation.description,
|
||||
base_url=spec.base_url,
|
||||
path=path,
|
||||
method=method,
|
||||
|
||||
@@ -50,7 +50,6 @@ class PythonAstREPLTool(BaseTool):
|
||||
)
|
||||
globals: Optional[Dict] = Field(default_factory=dict)
|
||||
locals: Optional[Dict] = Field(default_factory=dict)
|
||||
sanitize_input: bool = True
|
||||
|
||||
@root_validator(pre=True)
|
||||
def validate_python_version(cls, values: Dict) -> Dict:
|
||||
@@ -66,9 +65,6 @@ class PythonAstREPLTool(BaseTool):
|
||||
def _run(self, query: str) -> str:
|
||||
"""Use the tool."""
|
||||
try:
|
||||
if self.sanitize_input:
|
||||
# Remove the triple backticks from the query.
|
||||
query = query.strip().strip("```")
|
||||
tree = ast.parse(query)
|
||||
module = ast.Module(tree.body[:-1], type_ignores=[])
|
||||
exec(ast.unparse(module), self.globals, self.locals) # type: ignore
|
||||
|
||||
@@ -3,9 +3,9 @@
|
||||
from pydantic import BaseModel, Extra, Field, validator
|
||||
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.llms.openai import OpenAI
|
||||
from langchain.prompts import PromptTemplate
|
||||
from langchain.sql_database import SQLDatabase
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.tools.base import BaseTool
|
||||
from langchain.tools.sql_database.prompt import QUERY_CHECKER
|
||||
|
||||
@@ -80,12 +80,11 @@ class QueryCheckerTool(BaseSQLDatabaseTool, BaseTool):
|
||||
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
|
||||
|
||||
template: str = QUERY_CHECKER
|
||||
llm: BaseLLM
|
||||
llm_chain: LLMChain = Field(
|
||||
default_factory=lambda: LLMChain(
|
||||
llm=QueryCheckerTool.llm,
|
||||
llm=OpenAI(temperature=0),
|
||||
prompt=PromptTemplate(
|
||||
template=QueryCheckerTool.template, input_variables=["query", "dialect"]
|
||||
template=QUERY_CHECKER, input_variables=["query", "dialect"]
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
@@ -59,7 +59,6 @@ class Chroma(VectorStore):
|
||||
embedding_function: Optional[Embeddings] = None,
|
||||
persist_directory: Optional[str] = None,
|
||||
client_settings: Optional[chromadb.config.Settings] = None,
|
||||
collection_metadata: Optional[Dict] = None,
|
||||
) -> None:
|
||||
"""Initialize with Chroma client."""
|
||||
try:
|
||||
@@ -87,7 +86,6 @@ class Chroma(VectorStore):
|
||||
embedding_function=self._embedding_function.embed_documents
|
||||
if self._embedding_function is not None
|
||||
else None,
|
||||
metadata=collection_metadata,
|
||||
)
|
||||
|
||||
def add_texts(
|
||||
|
||||
@@ -4,7 +4,7 @@ from __future__ import annotations
|
||||
import logging
|
||||
import uuid
|
||||
from functools import partial
|
||||
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence, Tuple
|
||||
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple
|
||||
|
||||
import numpy as np
|
||||
|
||||
@@ -46,7 +46,6 @@ def vector_search(
|
||||
# Calculate the distance between the query_vector and all data_vectors
|
||||
distances = distance_metric_map[distance_metric](query_embedding, data_vectors)
|
||||
nearest_indices = np.argsort(distances)
|
||||
|
||||
nearest_indices = (
|
||||
nearest_indices[::-1][:k] if distance_metric in ["cos"] else nearest_indices[:k]
|
||||
)
|
||||
@@ -94,17 +93,12 @@ class DeepLake(VectorStore):
|
||||
dataset_path: str = _LANGCHAIN_DEFAULT_DEEPLAKE_PATH,
|
||||
token: Optional[str] = None,
|
||||
embedding_function: Optional[Embeddings] = None,
|
||||
read_only: Optional[bool] = False,
|
||||
ingestion_batch_size: int = 1024,
|
||||
num_workers: int = 4,
|
||||
read_only: Optional[bool] = None,
|
||||
) -> None:
|
||||
"""Initialize with Deep Lake client."""
|
||||
self.ingestion_batch_size = ingestion_batch_size
|
||||
self.num_workers = num_workers
|
||||
|
||||
try:
|
||||
import deeplake
|
||||
from deeplake.constants import MB
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import deeplake python package. "
|
||||
@@ -121,7 +115,6 @@ class DeepLake(VectorStore):
|
||||
self.ds.summary()
|
||||
else:
|
||||
self.ds = deeplake.empty(dataset_path, token=token, overwrite=True)
|
||||
|
||||
with self.ds:
|
||||
self.ds.create_tensor(
|
||||
"text",
|
||||
@@ -129,7 +122,6 @@ class DeepLake(VectorStore):
|
||||
create_id_tensor=False,
|
||||
create_sample_info_tensor=False,
|
||||
create_shape_tensor=False,
|
||||
chunk_compression="lz4",
|
||||
)
|
||||
self.ds.create_tensor(
|
||||
"metadata",
|
||||
@@ -137,16 +129,13 @@ class DeepLake(VectorStore):
|
||||
create_id_tensor=False,
|
||||
create_sample_info_tensor=False,
|
||||
create_shape_tensor=False,
|
||||
chunk_compression="lz4",
|
||||
)
|
||||
self.ds.create_tensor(
|
||||
"embedding",
|
||||
htype="generic",
|
||||
dtype=np.float32,
|
||||
create_id_tensor=False,
|
||||
create_sample_info_tensor=False,
|
||||
max_chunk_size=64 * MB,
|
||||
create_shape_tensor=True,
|
||||
create_shape_tensor=False,
|
||||
)
|
||||
self.ds.create_tensor(
|
||||
"ids",
|
||||
@@ -154,7 +143,6 @@ class DeepLake(VectorStore):
|
||||
create_id_tensor=False,
|
||||
create_sample_info_tensor=False,
|
||||
create_shape_tensor=False,
|
||||
chunk_compression="lz4",
|
||||
)
|
||||
|
||||
self._embedding_function = embedding_function
|
||||
@@ -182,45 +170,29 @@ class DeepLake(VectorStore):
|
||||
|
||||
text_list = list(texts)
|
||||
|
||||
if self._embedding_function is None:
|
||||
embeddings: Sequence[Optional[List[float]]] = [None] * len(text_list)
|
||||
else:
|
||||
embeddings = self._embedding_function.embed_documents(text_list)
|
||||
|
||||
if metadatas is None:
|
||||
metadatas = [{}] * len(text_list)
|
||||
|
||||
elements = list(zip(text_list, metadatas, ids))
|
||||
elements = zip(text_list, embeddings, metadatas, ids)
|
||||
|
||||
@self._deeplake.compute
|
||||
def ingest(sample_in: list, sample_out: list) -> None:
|
||||
text_list = [s[0] for s in sample_in]
|
||||
s = {
|
||||
"text": sample_in[0],
|
||||
"embedding": sample_in[1],
|
||||
"metadata": sample_in[2],
|
||||
"ids": sample_in[3],
|
||||
}
|
||||
sample_out.append(s)
|
||||
|
||||
embeds: Sequence[Optional[np.ndarray]] = []
|
||||
|
||||
if self._embedding_function is not None:
|
||||
embeddings = self._embedding_function.embed_documents(text_list)
|
||||
embeds = [np.array(e, dtype=np.float32) for e in embeddings]
|
||||
else:
|
||||
embeds = [None] * len(text_list)
|
||||
|
||||
for s, e in zip(sample_in, embeds):
|
||||
sample_out.append(
|
||||
{
|
||||
"text": s[0],
|
||||
"metadata": s[1],
|
||||
"ids": s[2],
|
||||
"embedding": e,
|
||||
}
|
||||
)
|
||||
|
||||
batch_size = min(self.ingestion_batch_size, len(elements))
|
||||
batched = [
|
||||
elements[i : i + batch_size] for i in range(0, len(elements), batch_size)
|
||||
]
|
||||
|
||||
ingest().eval(
|
||||
batched,
|
||||
self.ds,
|
||||
num_workers=min(self.num_workers, len(batched) // self.num_workers),
|
||||
)
|
||||
ingest().eval(list(elements), self.ds)
|
||||
self.ds.commit(allow_empty=True)
|
||||
self.ds.summary()
|
||||
|
||||
return ids
|
||||
|
||||
def search(
|
||||
@@ -231,7 +203,7 @@ class DeepLake(VectorStore):
|
||||
distance_metric: str = "L2",
|
||||
use_maximal_marginal_relevance: Optional[bool] = False,
|
||||
fetch_k: Optional[int] = 20,
|
||||
filter: Optional[Any[Dict[str, str], Callable, str]] = None,
|
||||
filter: Optional[Dict[str, str]] = None,
|
||||
return_score: Optional[bool] = False,
|
||||
**kwargs: Any,
|
||||
) -> Any[List[Document], List[Tuple[Document, float]]]:
|
||||
@@ -244,9 +216,7 @@ class DeepLake(VectorStore):
|
||||
distance_metric: `L2` for Euclidean, `L1` for Nuclear,
|
||||
`max` L-infinity distance, `cos` for cosine similarity,
|
||||
'dot' for dot product. Defaults to `L2`.
|
||||
filter: Attribute filter by metadata example {'key': 'value'}. It can also
|
||||
take [Deep Lake filter]
|
||||
(https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)
|
||||
filter: Attribute filter by metadata example {'key': 'value'}.
|
||||
Defaults to None.
|
||||
maximal_marginal_relevance: Whether to use maximal marginal relevance.
|
||||
Defaults to False.
|
||||
@@ -262,10 +232,8 @@ class DeepLake(VectorStore):
|
||||
|
||||
# attribute based filtering
|
||||
if filter is not None:
|
||||
if isinstance(filter, dict):
|
||||
filter = partial(dp_filter, filter=filter)
|
||||
view = view.filter(partial(dp_filter, filter=filter))
|
||||
|
||||
view = view.filter(filter)
|
||||
if len(view) == 0:
|
||||
return []
|
||||
|
||||
@@ -284,7 +252,8 @@ class DeepLake(VectorStore):
|
||||
query
|
||||
) # type: ignore
|
||||
query_emb = np.array(emb, dtype=np.float32)
|
||||
embeddings = view.embedding.numpy(fetch_chunks=True)
|
||||
embeddings = view.embedding.numpy()
|
||||
|
||||
k_search = fetch_k if use_maximal_marginal_relevance else k
|
||||
indices, scores = vector_search(
|
||||
query_emb,
|
||||
@@ -292,8 +261,8 @@ class DeepLake(VectorStore):
|
||||
k=k_search,
|
||||
distance_metric=distance_metric.lower(),
|
||||
)
|
||||
|
||||
view = view[indices]
|
||||
|
||||
if use_maximal_marginal_relevance:
|
||||
indices = maximal_marginal_relevance(
|
||||
query_emb, embeddings[indices], k=min(k, len(indices))
|
||||
|
||||
@@ -65,8 +65,6 @@ def _bulk_ingest_embeddings(
|
||||
embeddings: List[List[float]],
|
||||
texts: Iterable[str],
|
||||
metadatas: Optional[List[dict]] = None,
|
||||
vector_field: str = "vector_field",
|
||||
text_field: str = "text",
|
||||
) -> List[str]:
|
||||
"""Bulk Ingest Embeddings into given index."""
|
||||
bulk = _import_bulk()
|
||||
@@ -78,8 +76,8 @@ def _bulk_ingest_embeddings(
|
||||
request = {
|
||||
"_op_type": "index",
|
||||
"_index": index_name,
|
||||
vector_field: embeddings[i],
|
||||
text_field: text,
|
||||
"vector_field": embeddings[i],
|
||||
"text": text,
|
||||
"metadata": metadata,
|
||||
"_id": _id,
|
||||
}
|
||||
@@ -90,15 +88,12 @@ def _bulk_ingest_embeddings(
|
||||
return ids
|
||||
|
||||
|
||||
def _default_scripting_text_mapping(
|
||||
dim: int,
|
||||
vector_field: str = "vector_field",
|
||||
) -> Dict:
|
||||
def _default_scripting_text_mapping(dim: int) -> Dict:
|
||||
"""For Painless Scripting or Script Scoring,the default mapping to create index."""
|
||||
return {
|
||||
"mappings": {
|
||||
"properties": {
|
||||
vector_field: {"type": "knn_vector", "dimension": dim},
|
||||
"vector_field": {"type": "knn_vector", "dimension": dim},
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -111,14 +106,13 @@ def _default_text_mapping(
|
||||
ef_search: int = 512,
|
||||
ef_construction: int = 512,
|
||||
m: int = 16,
|
||||
vector_field: str = "vector_field",
|
||||
) -> Dict:
|
||||
"""For Approximate k-NN Search, this is the default mapping to create index."""
|
||||
return {
|
||||
"settings": {"index": {"knn": True, "knn.algo_param.ef_search": ef_search}},
|
||||
"mappings": {
|
||||
"properties": {
|
||||
vector_field: {
|
||||
"vector_field": {
|
||||
"type": "knn_vector",
|
||||
"dimension": dim,
|
||||
"method": {
|
||||
@@ -171,18 +165,10 @@ def _default_script_query(
|
||||
}
|
||||
|
||||
|
||||
def __get_painless_scripting_source(
|
||||
space_type: str, query_vector: List[float], vector_field: str = "vector_field"
|
||||
) -> str:
|
||||
def __get_painless_scripting_source(space_type: str, query_vector: List[float]) -> str:
|
||||
"""For Painless Scripting, it returns the script source based on space type."""
|
||||
source_value = (
|
||||
"(1.0 + "
|
||||
+ space_type
|
||||
+ "("
|
||||
+ str(query_vector)
|
||||
+ ", doc['"
|
||||
+ vector_field
|
||||
+ "']))"
|
||||
"(1.0 + " + space_type + "(" + str(query_vector) + ", doc['vector_field']))"
|
||||
)
|
||||
if space_type == "cosineSimilarity":
|
||||
return source_value
|
||||
@@ -264,26 +250,13 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
|
||||
Returns:
|
||||
List of ids from adding the texts into the vectorstore.
|
||||
|
||||
Optional Args:
|
||||
vector_field: Document field embeddings are stored in. Defaults to
|
||||
"vector_field".
|
||||
|
||||
text_field: Document field the text of the document is stored in. Defaults
|
||||
to "text".
|
||||
"""
|
||||
embeddings = self.embedding_function.embed_documents(list(texts))
|
||||
embeddings = [
|
||||
self.embedding_function.embed_documents([text])[0] for text in texts
|
||||
]
|
||||
_validate_embeddings_and_bulk_size(len(embeddings), bulk_size)
|
||||
vector_field = _get_kwargs_value(kwargs, "vector_field", "vector_field")
|
||||
text_field = _get_kwargs_value(kwargs, "text_field", "text")
|
||||
return _bulk_ingest_embeddings(
|
||||
self.client,
|
||||
self.index_name,
|
||||
embeddings,
|
||||
texts,
|
||||
metadatas,
|
||||
vector_field,
|
||||
text_field,
|
||||
self.client, self.index_name, embeddings, texts, metadatas
|
||||
)
|
||||
|
||||
def similarity_search(
|
||||
@@ -304,17 +277,14 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
Optional Args:
|
||||
vector_field: Document field embeddings are stored in. Defaults to
|
||||
"vector_field".
|
||||
|
||||
text_field: Document field the text of the document is stored in. Defaults
|
||||
to "text".
|
||||
|
||||
metadata_field: Document field that metadata is stored in. Defaults to
|
||||
"metadata".
|
||||
Can be set to a special value "*" to include the entire document.
|
||||
|
||||
Optional Args for Approximate Search:
|
||||
search_type: "approximate_search"; default: "approximate_search"
|
||||
|
||||
size: number of results the query actually returns; default: 4
|
||||
|
||||
Optional Args for Script Scoring Search:
|
||||
@@ -328,7 +298,6 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
|
||||
Optional Args for Painless Scripting Search:
|
||||
search_type: "painless_scripting"; default: "approximate_search"
|
||||
|
||||
space_type: "l2Squared", "l1Norm", "cosineSimilarity"; default: "l2Squared"
|
||||
|
||||
pre_filter: script_score query to pre-filter documents before identifying
|
||||
@@ -338,21 +307,23 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
search_type = _get_kwargs_value(kwargs, "search_type", "approximate_search")
|
||||
text_field = _get_kwargs_value(kwargs, "text_field", "text")
|
||||
metadata_field = _get_kwargs_value(kwargs, "metadata_field", "metadata")
|
||||
vector_field = _get_kwargs_value(kwargs, "vector_field", "vector_field")
|
||||
if search_type == "approximate_search":
|
||||
size = _get_kwargs_value(kwargs, "size", 4)
|
||||
vector_field = _get_kwargs_value(kwargs, "vector_field", "vector_field")
|
||||
search_query = _default_approximate_search_query(
|
||||
embedding, size, k, vector_field
|
||||
)
|
||||
elif search_type == SCRIPT_SCORING_SEARCH:
|
||||
space_type = _get_kwargs_value(kwargs, "space_type", "l2")
|
||||
pre_filter = _get_kwargs_value(kwargs, "pre_filter", MATCH_ALL_QUERY)
|
||||
vector_field = _get_kwargs_value(kwargs, "vector_field", "vector_field")
|
||||
search_query = _default_script_query(
|
||||
embedding, space_type, pre_filter, vector_field
|
||||
)
|
||||
elif search_type == PAINLESS_SCRIPTING_SEARCH:
|
||||
space_type = _get_kwargs_value(kwargs, "space_type", "l2Squared")
|
||||
pre_filter = _get_kwargs_value(kwargs, "pre_filter", MATCH_ALL_QUERY)
|
||||
vector_field = _get_kwargs_value(kwargs, "vector_field", "vector_field")
|
||||
search_query = _default_painless_scripting_query(
|
||||
embedding, space_type, pre_filter, vector_field
|
||||
)
|
||||
@@ -399,13 +370,6 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
and lucene engines recommended for large datasets. Also supports brute force
|
||||
search through Script Scoring and Painless Scripting.
|
||||
|
||||
Optional Args:
|
||||
vector_field: Document field embeddings are stored in. Defaults to
|
||||
"vector_field".
|
||||
|
||||
text_field: Document field the text of the document is stored in. Defaults
|
||||
to "text".
|
||||
|
||||
Optional Keyword Args for Approximate Search:
|
||||
engine: "nmslib", "faiss", "hnsw"; default: "nmslib"
|
||||
|
||||
@@ -438,8 +402,6 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
kwargs, "index_name", "OPENSEARCH_INDEX_NAME", default=uuid.uuid4().hex
|
||||
)
|
||||
is_appx_search = _get_kwargs_value(kwargs, "is_appx_search", True)
|
||||
vector_field = _get_kwargs_value(kwargs, "vector_field", "vector_field")
|
||||
text_field = _get_kwargs_value(kwargs, "text_field", "text")
|
||||
if is_appx_search:
|
||||
engine = _get_kwargs_value(kwargs, "engine", "nmslib")
|
||||
space_type = _get_kwargs_value(kwargs, "space_type", "l2")
|
||||
@@ -448,13 +410,11 @@ class OpenSearchVectorSearch(VectorStore):
|
||||
m = _get_kwargs_value(kwargs, "m", 16)
|
||||
|
||||
mapping = _default_text_mapping(
|
||||
dim, engine, space_type, ef_search, ef_construction, m, vector_field
|
||||
dim, engine, space_type, ef_search, ef_construction, m
|
||||
)
|
||||
else:
|
||||
mapping = _default_scripting_text_mapping(dim)
|
||||
|
||||
client.indices.create(index=index_name, body=mapping)
|
||||
_bulk_ingest_embeddings(
|
||||
client, index_name, embeddings, texts, metadatas, vector_field, text_field
|
||||
)
|
||||
_bulk_ingest_embeddings(client, index_name, embeddings, texts, metadatas)
|
||||
return cls(opensearch_url, index_name, embedding)
|
||||
|
||||
@@ -27,13 +27,11 @@ REDIS_REQUIRED_MODULES = [
|
||||
|
||||
def _check_redis_module_exist(client: RedisType, modules: List[dict]) -> None:
|
||||
"""Check if the correct Redis modules are installed."""
|
||||
installed_modules = client.module_list()
|
||||
installed_modules = {
|
||||
module[b"name"].decode("utf-8"): module for module in installed_modules
|
||||
}
|
||||
installed_modules = client.info().get("modules", [])
|
||||
installed_modules = {module["name"]: module for module in installed_modules}
|
||||
for module in modules:
|
||||
if module["name"] not in installed_modules or int(
|
||||
installed_modules[module["name"]][b"ver"]
|
||||
installed_modules[module["name"]]["ver"]
|
||||
) < int(module["ver"]):
|
||||
error_message = (
|
||||
"You must add the RediSearch (>= 2.4) module from Redis Stack. "
|
||||
|
||||
660
poetry.lock
generated
660
poetry.lock
generated
File diff suppressed because it is too large
Load Diff
@@ -1,6 +1,6 @@
|
||||
[tool.poetry]
|
||||
name = "langchain"
|
||||
version = "0.0.137"
|
||||
version = "0.0.134"
|
||||
description = "Building applications with LLMs through composability"
|
||||
authors = []
|
||||
license = "MIT"
|
||||
@@ -36,7 +36,7 @@ weaviate-client = {version = "^3", optional = true}
|
||||
google-api-python-client = {version = "2.70.0", optional = true}
|
||||
wolframalpha = {version = "5.0.0", optional = true}
|
||||
anthropic = {version = "^0.2.4", optional = true}
|
||||
qdrant-client = {version = "^1.1.2", optional = true, python = ">=3.8.1,<3.12"}
|
||||
qdrant-client = {version = "^1.1.1", optional = true, python = ">=3.8.1,<3.12"}
|
||||
dataclasses-json = "^0.5.7"
|
||||
tensorflow-text = {version = "^2.11.0", optional = true, python = "^3.10, <3.12"}
|
||||
tenacity = "^8.1.0"
|
||||
@@ -52,10 +52,10 @@ aiohttp = "^3.8.3"
|
||||
pypdf = {version = "^3.4.0", optional = true}
|
||||
networkx = {version="^2.6.3", optional = true}
|
||||
aleph-alpha-client = {version="^2.15.0", optional = true}
|
||||
deeplake = {version = "^3.2.21", optional = true}
|
||||
deeplake = {version = "^3.2.9", optional = true}
|
||||
pgvector = {version = "^0.1.6", optional = true}
|
||||
psycopg2-binary = {version = "^2.9.5", optional = true}
|
||||
#boto3 = {version = "^1.26.96", optional = true} # TODO: fix it, commented because the version failed with deeplake
|
||||
boto3 = {version = "^1.26.96", optional = true}
|
||||
pyowm = {version = "^3.3.0", optional = true}
|
||||
async-timeout = {version = "^4.0.0", python = "<3.11"}
|
||||
|
||||
@@ -92,12 +92,6 @@ openai = "^0.27.4"
|
||||
elasticsearch = {extras = ["async"], version = "^8.6.2"}
|
||||
pytest-vcr = "^1.0.2"
|
||||
wrapt = "^1.15.0"
|
||||
redis = "^4.5.4"
|
||||
pinecone-client = "^2.2.1"
|
||||
pgvector = "^0.1.6"
|
||||
transformers = "^4.27.4"
|
||||
pandas = "^2.0.0"
|
||||
deeplake = "^3.2.21"
|
||||
|
||||
[tool.poetry.group.lint.dependencies]
|
||||
ruff = "^0.0.249"
|
||||
|
||||
@@ -26,8 +26,7 @@ poetry install --with test_integration
|
||||
Any new dependencies should be added by running:
|
||||
|
||||
```bash
|
||||
# add package and install it after adding:
|
||||
poetry add deeplake --group "test_integration" && poetry install --with test_integration
|
||||
poetry add some_new_deps --group "test_integration"
|
||||
```
|
||||
|
||||
Before running any tests, you should start a specific Docker container that has all the
|
||||
|
||||
@@ -1,20 +0,0 @@
|
||||
from langchain.document_loaders.bilibili import BiliBiliLoader
|
||||
|
||||
|
||||
def test_bilibili_loader() -> None:
|
||||
"""Test Bilibili Loader."""
|
||||
loader = BiliBiliLoader(
|
||||
[
|
||||
"https://www.bilibili.com/video/BV1xt411o7Xu/",
|
||||
"https://www.bilibili.com/video/av330407025/",
|
||||
]
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 2
|
||||
|
||||
assert len(docs[0].page_content) > 0
|
||||
assert docs[1].metadata["owner"]["mid"] == 398095160
|
||||
|
||||
assert docs[1].page_content == ""
|
||||
assert docs[1].metadata["owner"]["mid"] == 398095160
|
||||
@@ -2,7 +2,6 @@ from pathlib import Path
|
||||
|
||||
from langchain.document_loaders import (
|
||||
PDFMinerLoader,
|
||||
PDFMinerPDFasHTMLLoader,
|
||||
PyMuPDFLoader,
|
||||
UnstructuredPDFLoader,
|
||||
)
|
||||
@@ -32,21 +31,6 @@ def test_pdfminer_loader() -> None:
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
def test_pdfminer_pdf_as_html_loader() -> None:
|
||||
"""Test PDFMinerPDFasHTMLLoader."""
|
||||
file_path = Path(__file__).parent.parent / "examples/hello.pdf"
|
||||
loader = PDFMinerPDFasHTMLLoader(str(file_path))
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
|
||||
file_path = Path(__file__).parent.parent / "examples/layout-parser-paper.pdf"
|
||||
loader = PDFMinerPDFasHTMLLoader(str(file_path))
|
||||
|
||||
docs = loader.load()
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
def test_pymupdf_loader() -> None:
|
||||
"""Test PyMuPDF loader."""
|
||||
file_path = Path(__file__).parent.parent / "examples/hello.pdf"
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user