Compare commits


31 Commits

Author SHA1 Message Date
Harrison Chase
fc580fbca3 mrkl parser 2023-03-30 23:28:57 -07:00
Kevin Kermani Nejad
6c66f51fb8 add error message to the google drive document loader (#2186)
When downloading a Google Doc, if the document is not a Google Doc type,
for example if you uploaded a .DOCX file to your Google Drive, the error
you get is not informative at all. I added an error handler which prints
the exact error that occurred while downloading the document from Google
Docs.
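
A hedged sketch of the behavior this improves (the document ID is a placeholder and Google credentials are required):

```python
from langchain.document_loaders import GoogleDriveLoader

# With this change, pointing the loader at a non-native file (e.g. an
# uploaded .DOCX) surfaces the exact Google API error instead of an
# uninformative failure.
loader = GoogleDriveLoader(document_ids=["<your-google-doc-id>"])
docs = loader.load()
```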
2023-03-30 20:58:27 -07:00
Harrison Chase
2eeaccf01c Harrison/apify (#2215)
Co-authored-by: Jiří Moravčík <jiri.moravcik@gmail.com>
2023-03-30 20:58:14 -07:00
Alex Stachowiak
e6a9ee64b3 Update vectorstore-retriever.ipynb (#2210) 2023-03-30 20:51:46 -07:00
Arttii
4e9ee566ef Add MMR methods to chroma (#2148)
Hi, I added MMR to Chroma, similar to FAISS and Milvus. Please let me
know what you think.
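
A minimal usage sketch, assuming the new method mirrors the FAISS/Milvus `max_marginal_relevance_search` signature (texts and query are illustrative):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

docsearch = Chroma.from_texts(["foo", "bar", "baz"], OpenAIEmbeddings())
# MMR balances relevance to the query against diversity among results.
docs = docsearch.max_marginal_relevance_search("foo", k=2)
```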
2023-03-30 20:51:16 -07:00
Harrison Chase
fc009f61c8 sitemap more flexible (#2214) 2023-03-30 20:46:36 -07:00
Matt Robinson
3dfe1cf60e feat: document loader for epublications (#2202)
### Summary

Adds a new document loader for processing e-publications. Works with
`unstructured>=0.5.4`. You need to have
[`pandoc`](https://pandoc.org/installing.html) installed for this loader
to work.

### Testing

```python
from langchain.document_loaders import UnstructuredEPubLoader

loader = UnstructuredEPubLoader("winter-sports.epub", mode="elements")
data = loader.load()
data[0]
```
2023-03-30 20:45:31 -07:00
Ikko Eltociear Ashimine
a4a1ee6b5d Update huggingface_length_function.ipynb (#2203)
HuggingFace -> Hugging Face
2023-03-30 20:43:58 -07:00
Harrison Chase
2d3918c152 make requests more general (#2209) 2023-03-30 20:41:56 -07:00
Harrison Chase
1c03205cc2 embedding docs (#2200) 2023-03-30 08:34:14 -07:00
Harrison Chase
feec4c61f4 Harrison/docs reqs (#2199) 2023-03-30 08:20:30 -07:00
Harrison Chase
097684e5f2 bump version to 127 (#2197) 2023-03-30 08:11:04 -07:00
Ben Heckmann
fd1fcb5a7d fix typing for LLMMathChain (#2183)
Fix typing in LLMMathChain to allow chat models (#1834). Might have been
forgotten in related PR #1807.
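
A short sketch of what the typing fix permits, assuming `ChatOpenAI` as the chat model (query is illustrative):

```python
from langchain.chains import LLMMathChain
from langchain.chat_models import ChatOpenAI

# Previously the llm field's type rejected chat models; now this works.
llm_math = LLMMathChain(llm=ChatOpenAI(temperature=0))
llm_math.run("What is 13 raised to the 0.3432 power?")
```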
2023-03-30 07:52:58 -07:00
Cory Zue
3207a74829 fix typo in chat_prompt_template docs (#2193) 2023-03-30 07:52:40 -07:00
Alan deLevie
597378d1f6 Small typo in custom_agent.ipynb (#2194)
determin -> determine
2023-03-30 07:52:29 -07:00
Jeru2023
64b9843b5b Update text.py (#2195)
Add encoding parameter when opening txt files to support Unicode files.
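
A minimal sketch of the new parameter (file name illustrative):

```python
from langchain.document_loaders import TextLoader

# Pass an explicit encoding so files containing non-ASCII text load cleanly.
loader = TextLoader("unicode_notes.txt", encoding="utf-8")
docs = loader.load()
```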
2023-03-30 07:52:17 -07:00
Rui Ferreira
5d86a6acf1 Fix wikipedia summaries (#2187)
The upstream Wikipedia page loading still seems to have issues. This is
a compromise solution where it does an exact-match search rather than a
search for the autocompleted query.

See previous PR: https://github.com/hwchase17/langchain/pull/2169
2023-03-30 07:34:13 -07:00
Kei Kamikawa
35a3218e84 supported async retriever (#2149) 2023-03-30 10:14:05 -04:00
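A hedged sketch of the async usage this enables, assuming retrievers gained an `aget_relevant_documents` coroutine alongside the sync method (index contents illustrative):

```python
import asyncio

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

async def main():
    vectorstore = FAISS.from_texts(["harrison worked at kensho"], OpenAIEmbeddings())
    retriever = vectorstore.as_retriever()
    # Assumed async counterpart of get_relevant_documents.
    docs = await retriever.aget_relevant_documents("where did harrison work?")
    print(docs)

asyncio.run(main())
```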
Harrison Chase
65c0c73597 Harrison/arize (#2180)
Co-authored-by: Hakan Tekgul <tekgul2@illinois.edu>
2023-03-29 22:55:21 -07:00
Harrison Chase
33a001933a Harrison/clear ml (#2179)
Co-authored-by: Victor Sonck <victor.sonck@gmail.com>
2023-03-29 22:45:34 -07:00
Harrison Chase
fe804d2a01 Harrison/aim integration (#2178)
Co-authored-by: Hovhannes Tamoyan <hovhannes.tamoyan@gmail.com>
Co-authored-by: Gor Arakelyan <arakelyangor10@gmail.com>
2023-03-29 22:37:56 -07:00
Gene Ruebsamen
68f039704c missing word 'not' in constitutional prompts (#2176)
arson should **not** be condoned.

The word "not" was missing in the critique.
2023-03-29 22:29:48 -07:00
Harrison Chase
bcfd071784 Harrison/engine args (#2177)
Co-authored-by: Alvaro Sevilla <alvarosevilla95@gmail.com>
2023-03-29 22:29:38 -07:00
Tim Asp
7d90691adb Add kwargs to from_* in PromptTemplate (#2161)
This will let us use output parsers, etc., while using the `from_*`
helper functions.
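
A short sketch of what this enables, assuming the extra kwargs are forwarded to the constructor (parser choice illustrative):

```python
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

# output_parser can now be passed straight through the from_* helper.
prompt = PromptTemplate.from_template(
    "List three {subject}.",
    output_parser=CommaSeparatedListOutputParser(),
)
```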
2023-03-29 22:13:27 -07:00
Rui Ferreira
f83c36d8fd Fix incorrect wikipage summaries (#2169)
Creating a page using the title causes a Wikipedia search with
autocomplete set to true. This frequently causes the summaries to be
unrelated to the actual page found.

See:
1554943e8a/wikipedia/wikipedia.py (L254-L280)
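
In terms of the underlying `wikipedia` package, the problem is the `auto_suggest` default; a hedged illustration (the exact fix in the loader may differ):

```python
import wikipedia

# With auto_suggest=True (the default), Wikipedia may "correct" the title
# and silently return the summary of an unrelated page.
page = wikipedia.page("Python (programming language)", auto_suggest=False)
print(page.summary[:200])
```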
2023-03-29 22:13:03 -07:00
Tim Asp
6be67279fb Add apredict_and_parse to LLM (#2164)
`predict_and_parse` exists, and it's a nice abstraction to allow for
applying output parsers to LLM generations. And async is very useful.

As an aside, the difference between `call/acall`, `predict/apredict` and
`generate/agenerate` isn't entirely
clear to me other than they all call into the LLM in slightly different
ways.

Is there some documentation or a good way to think about these
differences?

One thought:  

output parsers should just work magically for all those LLM calls. If
the `output_parser` arg is set on the prompt, the LLM has access, so it
seems like extra work on the user's end to have to call
`output_parser.parse`.

If this sounds reasonable, happy to throw something together. @hwchase17
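
A hedged sketch of the proposed method, mirroring the sync `predict_and_parse` (prompt and parser are illustrative):

```python
import asyncio

from langchain import LLMChain, OpenAI
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="List three {subject}.",
    input_variables=["subject"],
    output_parser=CommaSeparatedListOutputParser(),
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

# Async counterpart of predict_and_parse: generate, then apply the
# prompt's output_parser to the completion.
result = asyncio.run(chain.apredict_and_parse(subject="colors"))
```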
2023-03-29 22:12:50 -07:00
Max Caldwell
3dc49a04a3 [Documents] Updated Figma docs and added example (#2172)
- Current docs were pointing to the wrong module; fixed
- Added some explanation on how to find the necessary parameters
- Added chat-based codegen example w/ retrievers

Picture of the new page:
![Screenshot 2023-03-29 at 20-11-29 Figma — 🦜🔗 LangChain 0 0 126](https://user-images.githubusercontent.com/2172753/228719338-c7ec5b11-01c2-4378-952e-38bc809f217b.png)

Please let me know if you'd like any tweaks! I wasn't sure if the
example was too heavy for the page or not but decided "hey, I probably
would want to see it" and so included it.

Co-authored-by: maxtheman <max@maxs-mbp.lan>
2023-03-29 22:11:45 -07:00
Harrison Chase
5c907d9998 Harrison/base agent without docs (#2166) 2023-03-29 22:11:25 -07:00
Zoltan Fedor
1b7cfd7222 Bugfix: Redis lrange() retrieves records in opposite order of inserting (#2167)
The new functionality of Redis backend for chat message history
([see](https://github.com/hwchase17/langchain/pull/2122)) uses the Redis
list object to store messages and then uses the `lrange()` to retrieve
the list of messages
([see](https://github.com/hwchase17/langchain/blob/master/langchain/memory/chat_message_histories/redis.py#L50)).

Unfortunately, this retrieves the messages as a list sorted in the
opposite order of insertion - meaning the last inserted message comes
first in the retrieved list - which is not what we want.

This PR fixes that by changing the order to match the order of
insertion.
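
In plain `redis-py` terms, the bug and fix look like this (key name illustrative; assumes messages are pushed with `lpush`, as in the memory backend):

```python
import redis

r = redis.Redis(decode_responses=True)
r.lpush("message_store", "first", "second", "third")

# lpush prepends, so lrange(0, -1) yields newest-first:
# ["third", "second", "first"]
items = r.lrange("message_store", 0, -1)

# Reverse to recover insertion order: ["first", "second", "third"]
messages = items[::-1]
```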
2023-03-29 22:09:01 -07:00
blob42
7859245fc5 doc: more details on BaseOutputParser docstrings (#2171)
Co-authored-by: blob42 <spike@w530>
2023-03-29 22:07:05 -07:00
Ankush Gola
529a1f39b9 make tool verbosity override agent verbosity (#2173)
Currently, if a tool is set to verbose, an agent can override it by
passing in its own verbose flag. This is not ideal if we want to stream
back responses from agents, as we want the LLM and tools to be sending
back events but nothing else. This also makes the behavior consistent
with the TS implementation.
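
A small illustrative sketch of the intended precedence (tool definition is illustrative): the tool's own `verbose` flag now wins over the agent-level one.

```python
from langchain.agents import Tool

# With this change, verbose=False set here is respected even if the
# agent using the tool was created with verbose=True.
quiet_search = Tool(
    name="Search",
    func=lambda q: f"results for {q!r}",
    description="useful for when you need to answer questions about current events",
    verbose=False,
)
```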
2023-03-29 22:05:58 -07:00
61 changed files with 3845 additions and 3006 deletions

BIN
docs/_static/ApifyActors.png vendored Normal file

Binary file not shown (559 KiB).

@@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Aim\n",
"\n",
"Aim makes it super easy to visualize and debug LangChain executions. Aim tracks inputs and outputs of LLMs and tools, as well as actions of agents. \n",
"\n",
"With Aim, you can easily debug and examine an individual execution:\n",
"\n",
"![](https://user-images.githubusercontent.com/13848158/227784778-06b806c7-74a1-4d15-ab85-9ece09b458aa.png)\n",
"\n",
"Additionally, you have the option to compare multiple executions side by side:\n",
"\n",
"![](https://user-images.githubusercontent.com/13848158/227784994-699b24b7-e69b-48f9-9ffa-e6a6142fd719.png)\n",
"\n",
"Aim is fully open source, [learn more](https://github.com/aimhubio/aim) about Aim on GitHub.\n",
"\n",
"Let's move forward and see how to enable and configure Aim callback."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Tracking LangChain Executions with Aim</h3>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we will explore three usage scenarios. To start off, we will install the necessary packages and import certain modules. Subsequently, we will configure two environment variables that can be established either within the Python script or through the terminal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mf88kuCJhbVu"
},
"outputs": [],
"source": [
"!pip install aim\n",
"!pip install langchain\n",
"!pip install openai\n",
"!pip install google-search-results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "g4eTuajwfl6L"
},
"outputs": [],
"source": [
"import os\n",
"from datetime import datetime\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks import AimCallbackHandler, StdOutCallbackHandler"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our examples use a GPT model as the LLM, and OpenAI offers an API for this purpose. You can obtain the key from the following link: https://platform.openai.com/account/api-keys .\n",
"\n",
"We will use the SerpApi to retrieve search results from Google. To acquire the SerpApi key, please go to https://serpapi.com/manage-api-key ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "T1bSmKd6V2If"
},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = \"...\"\n",
"os.environ[\"SERPAPI_API_KEY\"] = \"...\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QenUYuBZjIzc"
},
"source": [
"The event methods of `AimCallbackHandler` accept the LangChain module or agent as input and log at least the prompts and generated results, as well as the serialized version of the LangChain module, to the designated Aim run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KAz8weWuUeXF"
},
"outputs": [],
"source": [
"session_group = datetime.now().strftime(\"%m.%d.%Y_%H.%M.%S\")\n",
"aim_callback = AimCallbackHandler(\n",
" repo=\".\",\n",
" experiment_name=\"scenario 1: OpenAI LLM\",\n",
")\n",
"\n",
"manager = CallbackManager([StdOutCallbackHandler(), aim_callback])\n",
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b8WfByB4fl6N"
},
"source": [
"The `flush_tracker` function is used to record LangChain assets on Aim. By default, the session is reset rather than being terminated outright."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Scenario 1</h3> In the first scenario, we will use OpenAI LLM."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "o_VmneyIUyx8"
},
"outputs": [],
"source": [
"# scenario 1 - LLM\n",
"llm_result = llm.generate([\"Tell me a joke\", \"Tell me a poem\"] * 3)\n",
"aim_callback.flush_tracker(\n",
" langchain_asset=llm,\n",
" experiment_name=\"scenario 2: Chain with multiple SubChains on multiple generations\",\n",
")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Scenario 2</h3> Scenario two involves chaining with multiple SubChains across multiple generations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "trxslyb1U28Y"
},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uauQk10SUzF6"
},
"outputs": [],
"source": [
"# scenario 2 - Chain\n",
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
"Title: {title}\n",
"Playwright: This is a synopsis for the above play:\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callback_manager=manager)\n",
"\n",
"test_prompts = [\n",
" {\"title\": \"documentary about good video games that push the boundary of game design\"},\n",
" {\"title\": \"the phenomenon behind the remarkable speed of cheetahs\"},\n",
" {\"title\": \"the best in class mlops tooling\"},\n",
"]\n",
"synopsis_chain.apply(test_prompts)\n",
"aim_callback.flush_tracker(\n",
" langchain_asset=synopsis_chain, experiment_name=\"scenario 3: Agent with Tools\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Scenario 3</h3> The third scenario involves an agent with tools."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_jN73xcPVEpI"
},
"outputs": [],
"source": [
"from langchain.agents import initialize_agent, load_tools"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Gpq4rk6VT9cu",
"outputId": "68ae261e-d0a2-4229-83c4-762562263b66"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Leo DiCaprio's girlfriend is and then calculate her age raised to the 0.43 power.\n",
"Action: Search\n",
"Action Input: \"Leo DiCaprio girlfriend\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mLeonardo DiCaprio seemed to prove a long-held theory about his love life right after splitting from girlfriend Camila Morrone just months ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Camila Morrone's age\n",
"Action: Search\n",
"Action Input: \"Camila Morrone age\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m25 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 25 raised to the 0.43 power\n",
"Action: Calculator\n",
"Action Input: 25^0.43\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 3.991298452658078\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Camila Morrone is Leo DiCaprio's girlfriend and her current age raised to the 0.43 power is 3.991298452658078.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"# scenario 3 - Agent with Tools\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=\"zero-shot-react-description\",\n",
" callback_manager=manager,\n",
" verbose=True,\n",
")\n",
"agent.run(\n",
" \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\"\n",
")\n",
"aim_callback.flush_tracker(langchain_asset=agent, reset=False, finish=True)"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

docs/ecosystem/apify.md Normal file

@@ -0,0 +1,46 @@
# Apify
This page covers how to use [Apify](https://apify.com) within LangChain.
## Overview
Apify is a cloud platform for web scraping and data extraction,
which provides an [ecosystem](https://apify.com/store) of more than a thousand
ready-made apps called *Actors* for various scraping, crawling, and extraction use cases.
[![Apify Actors](../_static/ApifyActors.png)](https://apify.com/store)
This integration enables you to run Actors on the Apify platform and load their results into LangChain to feed your vector
indexes with documents and data from the web, e.g. to generate answers from websites with documentation,
blogs, or knowledge bases.
## Installation and Setup
- Install the Apify API client for Python with `pip install apify-client`
- Get your [Apify API token](https://console.apify.com/account/integrations) and either set it as
an environment variable (`APIFY_API_TOKEN`) or pass it to the `ApifyWrapper` as `apify_api_token` in the constructor.
## Wrappers
### Utility
You can use the `ApifyWrapper` to run Actors on the Apify platform.
```python
from langchain.utilities import ApifyWrapper
```
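A hedged end-to-end sketch (the Actor ID, input, and mapping function are illustrative; this assumes `call_actor` returns a loader over the run's default dataset):
```python
from langchain.document_loaders.base import Document
from langchain.utilities import ApifyWrapper

apify = ApifyWrapper()
# Run an Actor and map each item in its dataset to a LangChain Document.
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://python.langchain.com/en/latest/"}]},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
docs = loader.load()
```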
For a more detailed walkthrough of this wrapper, see [this notebook](../modules/agents/tools/examples/apify.ipynb).
### Loader
You can also use our `ApifyDatasetLoader` to load data from an Apify dataset.
```python
from langchain.document_loaders import ApifyDatasetLoader
```
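A minimal hedged sketch (the dataset ID and item fields are illustrative; adjust the mapping to your Actor's output):
```python
from langchain.document_loaders import ApifyDatasetLoader
from langchain.document_loaders.base import Document

# Map each dataset item to a LangChain Document.
loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
docs = loader.load()
```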
For a more detailed walkthrough of this loader, see [this notebook](../modules/indexes/document_loaders/examples/apify_dataset.ipynb).


@@ -0,0 +1,588 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# ClearML Integration\n",
"\n",
"In order to properly keep track of your langchain experiments and their results, you can enable the ClearML integration. ClearML is an experiment manager that neatly tracks and organizes all your experiment runs.\n",
"\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/ecosystem/clearml_tracking.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting API Credentials\n",
"\n",
"We'll be using quite some APIs in this notebook, here is a list and where to get them:\n",
"\n",
"- ClearML: https://app.clear.ml/settings/workspace-configuration\n",
"- OpenAI: https://platform.openai.com/account/api-keys\n",
"- SerpAPI (google search): https://serpapi.com/dashboard"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"CLEARML_API_ACCESS_KEY\"] = \"\"\n",
"os.environ[\"CLEARML_API_SECRET_KEY\"] = \"\"\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"os.environ[\"SERPAPI_API_KEY\"] = \"\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setting Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install clearml\n",
"!pip install pandas\n",
"!pip install textstat\n",
"!pip install spacy\n",
"!python -m spacy download en_core_web_sm"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The clearml callback is currently in beta and is subject to change based on updates to `langchain`. Please report any issues to https://github.com/allegroai/clearml/issues with the tag `langchain`.\n"
]
}
],
"source": [
"from datetime import datetime\n",
"from langchain.callbacks import ClearMLCallbackHandler, StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.llms import OpenAI\n",
"\n",
"# Setup and use the ClearML Callback\n",
"clearml_callback = ClearMLCallbackHandler(\n",
" task_type=\"inference\",\n",
" project_name=\"langchain_callback_demo\",\n",
" task_name=\"llm\",\n",
" tags=[\"test\"],\n",
" # Change the following parameters based on the amount of detail you want tracked\n",
" visualize=True,\n",
" complexity_metrics=True,\n",
" stream_logs=True\n",
")\n",
"manager = CallbackManager([StdOutCallbackHandler(), clearml_callback])\n",
"# Get the OpenAI model ready to go\n",
"llm = OpenAI(temperature=0, callback_manager=manager, verbose=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scenario 1: Just an LLM\n",
"\n",
"First, let's just run a single LLM a few times and capture the resulting prompt-answer conversation in ClearML"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Tell me a joke'}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Tell me a poem'}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Tell me a joke'}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Tell me a poem'}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Tell me a joke'}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Tell me a poem'}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 24, 'token_usage_completion_tokens': 138, 'token_usage_total_tokens': 162, 'model_name': 'text-davinci-003', 'step': 4, 'starts': 2, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': '\\n\\nQ: What did the fish say when it hit the wall?\\nA: Dam!', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 109.04, 'flesch_kincaid_grade': 1.3, 'smog_index': 0.0, 'coleman_liau_index': -1.24, 'automated_readability_index': 0.3, 'dale_chall_readability_score': 5.5, 'difficult_words': 0, 'linsear_write_formula': 5.5, 'gunning_fog': 5.2, 'text_standard': '5th and 6th grade', 'fernandez_huerta': 133.58, 'szigriszt_pazos': 131.54, 'gutierrez_polini': 62.3, 'crawford': -0.2, 'gulpease_index': 79.8, 'osman': 116.91}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 24, 'token_usage_completion_tokens': 138, 'token_usage_total_tokens': 162, 'model_name': 'text-davinci-003', 'step': 4, 'starts': 2, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': '\\n\\nRoses are red,\\nViolets are blue,\\nSugar is sweet,\\nAnd so are you.', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 83.66, 'flesch_kincaid_grade': 4.8, 'smog_index': 0.0, 'coleman_liau_index': 3.23, 'automated_readability_index': 3.9, 'dale_chall_readability_score': 6.71, 'difficult_words': 2, 'linsear_write_formula': 6.5, 'gunning_fog': 8.28, 'text_standard': '6th and 7th grade', 'fernandez_huerta': 115.58, 'szigriszt_pazos': 112.37, 'gutierrez_polini': 54.83, 'crawford': 1.4, 'gulpease_index': 72.1, 'osman': 100.17}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 24, 'token_usage_completion_tokens': 138, 'token_usage_total_tokens': 162, 'model_name': 'text-davinci-003', 'step': 4, 'starts': 2, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': '\\n\\nQ: What did the fish say when it hit the wall?\\nA: Dam!', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 109.04, 'flesch_kincaid_grade': 1.3, 'smog_index': 0.0, 'coleman_liau_index': -1.24, 'automated_readability_index': 0.3, 'dale_chall_readability_score': 5.5, 'difficult_words': 0, 'linsear_write_formula': 5.5, 'gunning_fog': 5.2, 'text_standard': '5th and 6th grade', 'fernandez_huerta': 133.58, 'szigriszt_pazos': 131.54, 'gutierrez_polini': 62.3, 'crawford': -0.2, 'gulpease_index': 79.8, 'osman': 116.91}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 24, 'token_usage_completion_tokens': 138, 'token_usage_total_tokens': 162, 'model_name': 'text-davinci-003', 'step': 4, 'starts': 2, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': '\\n\\nRoses are red,\\nViolets are blue,\\nSugar is sweet,\\nAnd so are you.', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 83.66, 'flesch_kincaid_grade': 4.8, 'smog_index': 0.0, 'coleman_liau_index': 3.23, 'automated_readability_index': 3.9, 'dale_chall_readability_score': 6.71, 'difficult_words': 2, 'linsear_write_formula': 6.5, 'gunning_fog': 8.28, 'text_standard': '6th and 7th grade', 'fernandez_huerta': 115.58, 'szigriszt_pazos': 112.37, 'gutierrez_polini': 54.83, 'crawford': 1.4, 'gulpease_index': 72.1, 'osman': 100.17}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 24, 'token_usage_completion_tokens': 138, 'token_usage_total_tokens': 162, 'model_name': 'text-davinci-003', 'step': 4, 'starts': 2, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': '\\n\\nQ: What did the fish say when it hit the wall?\\nA: Dam!', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 109.04, 'flesch_kincaid_grade': 1.3, 'smog_index': 0.0, 'coleman_liau_index': -1.24, 'automated_readability_index': 0.3, 'dale_chall_readability_score': 5.5, 'difficult_words': 0, 'linsear_write_formula': 5.5, 'gunning_fog': 5.2, 'text_standard': '5th and 6th grade', 'fernandez_huerta': 133.58, 'szigriszt_pazos': 131.54, 'gutierrez_polini': 62.3, 'crawford': -0.2, 'gulpease_index': 79.8, 'osman': 116.91}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 24, 'token_usage_completion_tokens': 138, 'token_usage_total_tokens': 162, 'model_name': 'text-davinci-003', 'step': 4, 'starts': 2, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 0, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': '\\n\\nRoses are red,\\nViolets are blue,\\nSugar is sweet,\\nAnd so are you.', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 83.66, 'flesch_kincaid_grade': 4.8, 'smog_index': 0.0, 'coleman_liau_index': 3.23, 'automated_readability_index': 3.9, 'dale_chall_readability_score': 6.71, 'difficult_words': 2, 'linsear_write_formula': 6.5, 'gunning_fog': 8.28, 'text_standard': '6th and 7th grade', 'fernandez_huerta': 115.58, 'szigriszt_pazos': 112.37, 'gutierrez_polini': 54.83, 'crawford': 1.4, 'gulpease_index': 72.1, 'osman': 100.17}\n",
"{'action_records': action name step starts ends errors text_ctr chain_starts \\\n",
"0 on_llm_start OpenAI 1 1 0 0 0 0 \n",
"1 on_llm_start OpenAI 1 1 0 0 0 0 \n",
"2 on_llm_start OpenAI 1 1 0 0 0 0 \n",
"3 on_llm_start OpenAI 1 1 0 0 0 0 \n",
"4 on_llm_start OpenAI 1 1 0 0 0 0 \n",
"5 on_llm_start OpenAI 1 1 0 0 0 0 \n",
"6 on_llm_end NaN 2 1 1 0 0 0 \n",
"7 on_llm_end NaN 2 1 1 0 0 0 \n",
"8 on_llm_end NaN 2 1 1 0 0 0 \n",
"9 on_llm_end NaN 2 1 1 0 0 0 \n",
"10 on_llm_end NaN 2 1 1 0 0 0 \n",
"11 on_llm_end NaN 2 1 1 0 0 0 \n",
"12 on_llm_start OpenAI 3 2 1 0 0 0 \n",
"13 on_llm_start OpenAI 3 2 1 0 0 0 \n",
"14 on_llm_start OpenAI 3 2 1 0 0 0 \n",
"15 on_llm_start OpenAI 3 2 1 0 0 0 \n",
"16 on_llm_start OpenAI 3 2 1 0 0 0 \n",
"17 on_llm_start OpenAI 3 2 1 0 0 0 \n",
"18 on_llm_end NaN 4 2 2 0 0 0 \n",
"19 on_llm_end NaN 4 2 2 0 0 0 \n",
"20 on_llm_end NaN 4 2 2 0 0 0 \n",
"21 on_llm_end NaN 4 2 2 0 0 0 \n",
"22 on_llm_end NaN 4 2 2 0 0 0 \n",
"23 on_llm_end NaN 4 2 2 0 0 0 \n",
"\n",
" chain_ends llm_starts ... difficult_words linsear_write_formula \\\n",
"0 0 1 ... NaN NaN \n",
"1 0 1 ... NaN NaN \n",
"2 0 1 ... NaN NaN \n",
"3 0 1 ... NaN NaN \n",
"4 0 1 ... NaN NaN \n",
"5 0 1 ... NaN NaN \n",
"6 0 1 ... 0.0 5.5 \n",
"7 0 1 ... 2.0 6.5 \n",
"8 0 1 ... 0.0 5.5 \n",
"9 0 1 ... 2.0 6.5 \n",
"10 0 1 ... 0.0 5.5 \n",
"11 0 1 ... 2.0 6.5 \n",
"12 0 2 ... NaN NaN \n",
"13 0 2 ... NaN NaN \n",
"14 0 2 ... NaN NaN \n",
"15 0 2 ... NaN NaN \n",
"16 0 2 ... NaN NaN \n",
"17 0 2 ... NaN NaN \n",
"18 0 2 ... 0.0 5.5 \n",
"19 0 2 ... 2.0 6.5 \n",
"20 0 2 ... 0.0 5.5 \n",
"21 0 2 ... 2.0 6.5 \n",
"22 0 2 ... 0.0 5.5 \n",
"23 0 2 ... 2.0 6.5 \n",
"\n",
" gunning_fog text_standard fernandez_huerta szigriszt_pazos \\\n",
"0 NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"5 NaN NaN NaN NaN \n",
"6 5.20 5th and 6th grade 133.58 131.54 \n",
"7 8.28 6th and 7th grade 115.58 112.37 \n",
"8 5.20 5th and 6th grade 133.58 131.54 \n",
"9 8.28 6th and 7th grade 115.58 112.37 \n",
"10 5.20 5th and 6th grade 133.58 131.54 \n",
"11 8.28 6th and 7th grade 115.58 112.37 \n",
"12 NaN NaN NaN NaN \n",
"13 NaN NaN NaN NaN \n",
"14 NaN NaN NaN NaN \n",
"15 NaN NaN NaN NaN \n",
"16 NaN NaN NaN NaN \n",
"17 NaN NaN NaN NaN \n",
"18 5.20 5th and 6th grade 133.58 131.54 \n",
"19 8.28 6th and 7th grade 115.58 112.37 \n",
"20 5.20 5th and 6th grade 133.58 131.54 \n",
"21 8.28 6th and 7th grade 115.58 112.37 \n",
"22 5.20 5th and 6th grade 133.58 131.54 \n",
"23 8.28 6th and 7th grade 115.58 112.37 \n",
"\n",
" gutierrez_polini crawford gulpease_index osman \n",
"0 NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"5 NaN NaN NaN NaN \n",
"6 62.30 -0.2 79.8 116.91 \n",
"7 54.83 1.4 72.1 100.17 \n",
"8 62.30 -0.2 79.8 116.91 \n",
"9 54.83 1.4 72.1 100.17 \n",
"10 62.30 -0.2 79.8 116.91 \n",
"11 54.83 1.4 72.1 100.17 \n",
"12 NaN NaN NaN NaN \n",
"13 NaN NaN NaN NaN \n",
"14 NaN NaN NaN NaN \n",
"15 NaN NaN NaN NaN \n",
"16 NaN NaN NaN NaN \n",
"17 NaN NaN NaN NaN \n",
"18 62.30 -0.2 79.8 116.91 \n",
"19 54.83 1.4 72.1 100.17 \n",
"20 62.30 -0.2 79.8 116.91 \n",
"21 54.83 1.4 72.1 100.17 \n",
"22 62.30 -0.2 79.8 116.91 \n",
"23 54.83 1.4 72.1 100.17 \n",
"\n",
"[24 rows x 39 columns], 'session_analysis': prompt_step prompts name output_step \\\n",
"0 1 Tell me a joke OpenAI 2 \n",
"1 1 Tell me a poem OpenAI 2 \n",
"2 1 Tell me a joke OpenAI 2 \n",
"3 1 Tell me a poem OpenAI 2 \n",
"4 1 Tell me a joke OpenAI 2 \n",
"5 1 Tell me a poem OpenAI 2 \n",
"6 3 Tell me a joke OpenAI 4 \n",
"7 3 Tell me a poem OpenAI 4 \n",
"8 3 Tell me a joke OpenAI 4 \n",
"9 3 Tell me a poem OpenAI 4 \n",
"10 3 Tell me a joke OpenAI 4 \n",
"11 3 Tell me a poem OpenAI 4 \n",
"\n",
" output \\\n",
"0 \\n\\nQ: What did the fish say when it hit the w... \n",
"1 \\n\\nRoses are red,\\nViolets are blue,\\nSugar i... \n",
"2 \\n\\nQ: What did the fish say when it hit the w... \n",
"3 \\n\\nRoses are red,\\nViolets are blue,\\nSugar i... \n",
"4 \\n\\nQ: What did the fish say when it hit the w... \n",
"5 \\n\\nRoses are red,\\nViolets are blue,\\nSugar i... \n",
"6 \\n\\nQ: What did the fish say when it hit the w... \n",
"7 \\n\\nRoses are red,\\nViolets are blue,\\nSugar i... \n",
"8 \\n\\nQ: What did the fish say when it hit the w... \n",
"9 \\n\\nRoses are red,\\nViolets are blue,\\nSugar i... \n",
"10 \\n\\nQ: What did the fish say when it hit the w... \n",
"11 \\n\\nRoses are red,\\nViolets are blue,\\nSugar i... \n",
"\n",
" token_usage_total_tokens token_usage_prompt_tokens \\\n",
"0 162 24 \n",
"1 162 24 \n",
"2 162 24 \n",
"3 162 24 \n",
"4 162 24 \n",
"5 162 24 \n",
"6 162 24 \n",
"7 162 24 \n",
"8 162 24 \n",
"9 162 24 \n",
"10 162 24 \n",
"11 162 24 \n",
"\n",
" token_usage_completion_tokens flesch_reading_ease flesch_kincaid_grade \\\n",
"0 138 109.04 1.3 \n",
"1 138 83.66 4.8 \n",
"2 138 109.04 1.3 \n",
"3 138 83.66 4.8 \n",
"4 138 109.04 1.3 \n",
"5 138 83.66 4.8 \n",
"6 138 109.04 1.3 \n",
"7 138 83.66 4.8 \n",
"8 138 109.04 1.3 \n",
"9 138 83.66 4.8 \n",
"10 138 109.04 1.3 \n",
"11 138 83.66 4.8 \n",
"\n",
" ... difficult_words linsear_write_formula gunning_fog \\\n",
"0 ... 0 5.5 5.20 \n",
"1 ... 2 6.5 8.28 \n",
"2 ... 0 5.5 5.20 \n",
"3 ... 2 6.5 8.28 \n",
"4 ... 0 5.5 5.20 \n",
"5 ... 2 6.5 8.28 \n",
"6 ... 0 5.5 5.20 \n",
"7 ... 2 6.5 8.28 \n",
"8 ... 0 5.5 5.20 \n",
"9 ... 2 6.5 8.28 \n",
"10 ... 0 5.5 5.20 \n",
"11 ... 2 6.5 8.28 \n",
"\n",
" text_standard fernandez_huerta szigriszt_pazos gutierrez_polini \\\n",
"0 5th and 6th grade 133.58 131.54 62.30 \n",
"1 6th and 7th grade 115.58 112.37 54.83 \n",
"2 5th and 6th grade 133.58 131.54 62.30 \n",
"3 6th and 7th grade 115.58 112.37 54.83 \n",
"4 5th and 6th grade 133.58 131.54 62.30 \n",
"5 6th and 7th grade 115.58 112.37 54.83 \n",
"6 5th and 6th grade 133.58 131.54 62.30 \n",
"7 6th and 7th grade 115.58 112.37 54.83 \n",
"8 5th and 6th grade 133.58 131.54 62.30 \n",
"9 6th and 7th grade 115.58 112.37 54.83 \n",
"10 5th and 6th grade 133.58 131.54 62.30 \n",
"11 6th and 7th grade 115.58 112.37 54.83 \n",
"\n",
" crawford gulpease_index osman \n",
"0 -0.2 79.8 116.91 \n",
"1 1.4 72.1 100.17 \n",
"2 -0.2 79.8 116.91 \n",
"3 1.4 72.1 100.17 \n",
"4 -0.2 79.8 116.91 \n",
"5 1.4 72.1 100.17 \n",
"6 -0.2 79.8 116.91 \n",
"7 1.4 72.1 100.17 \n",
"8 -0.2 79.8 116.91 \n",
"9 1.4 72.1 100.17 \n",
"10 -0.2 79.8 116.91 \n",
"11 1.4 72.1 100.17 \n",
"\n",
"[12 rows x 24 columns]}\n",
"2023-03-29 14:00:25,948 - clearml.Task - INFO - Completed model upload to https://files.clear.ml/langchain_callback_demo/llm.988bd727b0e94a29a3ac0ee526813545/models/simple_sequential\n"
]
}
],
"source": [
"# SCENARIO 1 - LLM\n",
"llm_result = llm.generate([\"Tell me a joke\", \"Tell me a poem\"] * 3)\n",
"# After every generation run, use flush to make sure all the metrics\n",
"# prompts and other output are properly saved separately\n",
"clearml_callback.flush_tracker(langchain_asset=llm, name=\"simple_sequential\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point you can already go to https://app.clear.ml and take a look at the resulting ClearML Task that was created.\n",
"\n",
"Among others, you should see that this notebook is saved along with any git information. The model JSON that contains the used parameters is saved as an artifact, there are also console logs and under the plots section, you'll find tables that represent the flow of the chain.\n",
"\n",
"Finally, if you enabled visualizations, these are stored as HTML files under debug samples."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scenario 2: Creating a agent with tools\n",
"\n",
"To show a more advanced workflow, let's create an agent with access to tools. The way ClearML tracks the results is not different though, only the table will look slightly different as there are other types of actions taken when compared to the earlier, simpler example.\n",
"\n",
"You can now also see the use of the `finish=True` keyword, which will fully close the ClearML Task, instead of just resetting the parameters and prompts for a new conversation."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"{'action': 'on_chain_start', 'name': 'AgentExecutor', 'step': 1, 'starts': 1, 'ends': 0, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 0, 'llm_ends': 0, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'input': 'Who is the wife of the person who sang summer of 69?'}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 2, 'starts': 2, 'ends': 0, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 1, 'llm_ends': 0, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'prompts': 'Answer the following questions as best you can. You have access to the following tools:\\n\\nSearch: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.\\nCalculator: Useful for when you need to answer questions about math.\\n\\nUse the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [Search, Calculator]\\nAction Input: the input to the action\\nObservation: the result of the action\\n... (this Thought/Action/Action Input/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\nBegin!\\n\\nQuestion: Who is the wife of the person who sang summer of 69?\\nThought:'}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 189, 'token_usage_completion_tokens': 34, 'token_usage_total_tokens': 223, 'model_name': 'text-davinci-003', 'step': 3, 'starts': 2, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 1, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 0, 'tool_ends': 0, 'agent_ends': 0, 'text': ' I need to find out who sang summer of 69 and then find out who their wife is.\\nAction: Search\\nAction Input: \"Who sang summer of 69\"', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 91.61, 'flesch_kincaid_grade': 3.8, 'smog_index': 0.0, 'coleman_liau_index': 3.41, 'automated_readability_index': 3.5, 'dale_chall_readability_score': 6.06, 'difficult_words': 2, 'linsear_write_formula': 5.75, 'gunning_fog': 5.4, 'text_standard': '3rd and 4th grade', 'fernandez_huerta': 121.07, 'szigriszt_pazos': 119.5, 'gutierrez_polini': 54.91, 'crawford': 0.9, 'gulpease_index': 72.7, 'osman': 92.16}\n",
"\u001b[32;1m\u001b[1;3m I need to find out who sang summer of 69 and then find out who their wife is.\n",
"Action: Search\n",
"Action Input: \"Who sang summer of 69\"\u001b[0m{'action': 'on_agent_action', 'tool': 'Search', 'tool_input': 'Who sang summer of 69', 'log': ' I need to find out who sang summer of 69 and then find out who their wife is.\\nAction: Search\\nAction Input: \"Who sang summer of 69\"', 'step': 4, 'starts': 3, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 1, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 1, 'tool_ends': 0, 'agent_ends': 0}\n",
"{'action': 'on_tool_start', 'input_str': 'Who sang summer of 69', 'name': 'Search', 'description': 'A search engine. Useful for when you need to answer questions about current events. Input should be a search query.', 'step': 5, 'starts': 4, 'ends': 1, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 1, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 2, 'tool_ends': 0, 'agent_ends': 0}\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3mBryan Adams - Summer Of 69 (Official Music Video).\u001b[0m\n",
"Thought:{'action': 'on_tool_end', 'output': 'Bryan Adams - Summer Of 69 (Official Music Video).', 'step': 6, 'starts': 4, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 1, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 2, 'tool_ends': 1, 'agent_ends': 0}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 7, 'starts': 5, 'ends': 2, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 1, 'llm_streams': 0, 'tool_starts': 2, 'tool_ends': 1, 'agent_ends': 0, 'prompts': 'Answer the following questions as best you can. You have access to the following tools:\\n\\nSearch: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.\\nCalculator: Useful for when you need to answer questions about math.\\n\\nUse the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [Search, Calculator]\\nAction Input: the input to the action\\nObservation: the result of the action\\n... (this Thought/Action/Action Input/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\nBegin!\\n\\nQuestion: Who is the wife of the person who sang summer of 69?\\nThought: I need to find out who sang summer of 69 and then find out who their wife is.\\nAction: Search\\nAction Input: \"Who sang summer of 69\"\\nObservation: Bryan Adams - Summer Of 69 (Official Music Video).\\nThought:'}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 242, 'token_usage_completion_tokens': 28, 'token_usage_total_tokens': 270, 'model_name': 'text-davinci-003', 'step': 8, 'starts': 5, 'ends': 3, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 2, 'tool_ends': 1, 'agent_ends': 0, 'text': ' I need to find out who Bryan Adams is married to.\\nAction: Search\\nAction Input: \"Who is Bryan Adams married to\"', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 94.66, 'flesch_kincaid_grade': 2.7, 'smog_index': 0.0, 'coleman_liau_index': 4.73, 'automated_readability_index': 4.0, 'dale_chall_readability_score': 7.16, 'difficult_words': 2, 'linsear_write_formula': 4.25, 'gunning_fog': 4.2, 'text_standard': '4th and 5th grade', 'fernandez_huerta': 124.13, 'szigriszt_pazos': 119.2, 'gutierrez_polini': 52.26, 'crawford': 0.7, 'gulpease_index': 74.7, 'osman': 84.2}\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Bryan Adams is married to.\n",
"Action: Search\n",
"Action Input: \"Who is Bryan Adams married to\"\u001b[0m{'action': 'on_agent_action', 'tool': 'Search', 'tool_input': 'Who is Bryan Adams married to', 'log': ' I need to find out who Bryan Adams is married to.\\nAction: Search\\nAction Input: \"Who is Bryan Adams married to\"', 'step': 9, 'starts': 6, 'ends': 3, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 3, 'tool_ends': 1, 'agent_ends': 0}\n",
"{'action': 'on_tool_start', 'input_str': 'Who is Bryan Adams married to', 'name': 'Search', 'description': 'A search engine. Useful for when you need to answer questions about current events. Input should be a search query.', 'step': 10, 'starts': 7, 'ends': 3, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 4, 'tool_ends': 1, 'agent_ends': 0}\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3mBryan Adams has never married. In the 1990s, he was in a relationship with Danish model Cecilie Thomsen. In 2011, Bryan and Alicia Grimaldi, his ...\u001b[0m\n",
"Thought:{'action': 'on_tool_end', 'output': 'Bryan Adams has never married. In the 1990s, he was in a relationship with Danish model Cecilie Thomsen. In 2011, Bryan and Alicia Grimaldi, his ...', 'step': 11, 'starts': 7, 'ends': 4, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 2, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 4, 'tool_ends': 2, 'agent_ends': 0}\n",
"{'action': 'on_llm_start', 'name': 'OpenAI', 'step': 12, 'starts': 8, 'ends': 4, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 3, 'llm_ends': 2, 'llm_streams': 0, 'tool_starts': 4, 'tool_ends': 2, 'agent_ends': 0, 'prompts': 'Answer the following questions as best you can. You have access to the following tools:\\n\\nSearch: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.\\nCalculator: Useful for when you need to answer questions about math.\\n\\nUse the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [Search, Calculator]\\nAction Input: the input to the action\\nObservation: the result of the action\\n... (this Thought/Action/Action Input/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\nBegin!\\n\\nQuestion: Who is the wife of the person who sang summer of 69?\\nThought: I need to find out who sang summer of 69 and then find out who their wife is.\\nAction: Search\\nAction Input: \"Who sang summer of 69\"\\nObservation: Bryan Adams - Summer Of 69 (Official Music Video).\\nThought: I need to find out who Bryan Adams is married to.\\nAction: Search\\nAction Input: \"Who is Bryan Adams married to\"\\nObservation: Bryan Adams has never married. In the 1990s, he was in a relationship with Danish model Cecilie Thomsen. In 2011, Bryan and Alicia Grimaldi, his ...\\nThought:'}\n",
"{'action': 'on_llm_end', 'token_usage_prompt_tokens': 314, 'token_usage_completion_tokens': 18, 'token_usage_total_tokens': 332, 'model_name': 'text-davinci-003', 'step': 13, 'starts': 8, 'ends': 5, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 3, 'llm_ends': 3, 'llm_streams': 0, 'tool_starts': 4, 'tool_ends': 2, 'agent_ends': 0, 'text': ' I now know the final answer.\\nFinal Answer: Bryan Adams has never been married.', 'generation_info_finish_reason': 'stop', 'generation_info_logprobs': None, 'flesch_reading_ease': 81.29, 'flesch_kincaid_grade': 3.7, 'smog_index': 0.0, 'coleman_liau_index': 5.75, 'automated_readability_index': 3.9, 'dale_chall_readability_score': 7.37, 'difficult_words': 1, 'linsear_write_formula': 2.5, 'gunning_fog': 2.8, 'text_standard': '3rd and 4th grade', 'fernandez_huerta': 115.7, 'szigriszt_pazos': 110.84, 'gutierrez_polini': 49.79, 'crawford': 0.7, 'gulpease_index': 85.4, 'osman': 83.14}\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Bryan Adams has never been married.\u001b[0m\n",
"{'action': 'on_agent_finish', 'output': 'Bryan Adams has never been married.', 'log': ' I now know the final answer.\\nFinal Answer: Bryan Adams has never been married.', 'step': 14, 'starts': 8, 'ends': 6, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 0, 'llm_starts': 3, 'llm_ends': 3, 'llm_streams': 0, 'tool_starts': 4, 'tool_ends': 2, 'agent_ends': 1}\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"{'action': 'on_chain_end', 'outputs': 'Bryan Adams has never been married.', 'step': 15, 'starts': 8, 'ends': 7, 'errors': 0, 'text_ctr': 0, 'chain_starts': 1, 'chain_ends': 1, 'llm_starts': 3, 'llm_ends': 3, 'llm_streams': 0, 'tool_starts': 4, 'tool_ends': 2, 'agent_ends': 1}\n",
"{'action_records': action name step starts ends errors text_ctr \\\n",
"0 on_llm_start OpenAI 1 1 0 0 0 \n",
"1 on_llm_start OpenAI 1 1 0 0 0 \n",
"2 on_llm_start OpenAI 1 1 0 0 0 \n",
"3 on_llm_start OpenAI 1 1 0 0 0 \n",
"4 on_llm_start OpenAI 1 1 0 0 0 \n",
".. ... ... ... ... ... ... ... \n",
"66 on_tool_end NaN 11 7 4 0 0 \n",
"67 on_llm_start OpenAI 12 8 4 0 0 \n",
"68 on_llm_end NaN 13 8 5 0 0 \n",
"69 on_agent_finish NaN 14 8 6 0 0 \n",
"70 on_chain_end NaN 15 8 7 0 0 \n",
"\n",
" chain_starts chain_ends llm_starts ... gulpease_index osman input \\\n",
"0 0 0 1 ... NaN NaN NaN \n",
"1 0 0 1 ... NaN NaN NaN \n",
"2 0 0 1 ... NaN NaN NaN \n",
"3 0 0 1 ... NaN NaN NaN \n",
"4 0 0 1 ... NaN NaN NaN \n",
".. ... ... ... ... ... ... ... \n",
"66 1 0 2 ... NaN NaN NaN \n",
"67 1 0 3 ... NaN NaN NaN \n",
"68 1 0 3 ... 85.4 83.14 NaN \n",
"69 1 0 3 ... NaN NaN NaN \n",
"70 1 1 3 ... NaN NaN NaN \n",
"\n",
" tool tool_input log \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 NaN NaN NaN \n",
".. ... ... ... \n",
"66 NaN NaN NaN \n",
"67 NaN NaN NaN \n",
"68 NaN NaN NaN \n",
"69 NaN NaN I now know the final answer.\\nFinal Answer: B... \n",
"70 NaN NaN NaN \n",
"\n",
" input_str description output \\\n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 NaN NaN NaN \n",
".. ... ... ... \n",
"66 NaN NaN Bryan Adams has never married. In the 1990s, h... \n",
"67 NaN NaN NaN \n",
"68 NaN NaN NaN \n",
"69 NaN NaN Bryan Adams has never been married. \n",
"70 NaN NaN NaN \n",
"\n",
" outputs \n",
"0 NaN \n",
"1 NaN \n",
"2 NaN \n",
"3 NaN \n",
"4 NaN \n",
".. ... \n",
"66 NaN \n",
"67 NaN \n",
"68 NaN \n",
"69 NaN \n",
"70 Bryan Adams has never been married. \n",
"\n",
"[71 rows x 47 columns], 'session_analysis': prompt_step prompts name \\\n",
"0 2 Answer the following questions as best you can... OpenAI \n",
"1 7 Answer the following questions as best you can... OpenAI \n",
"2 12 Answer the following questions as best you can... OpenAI \n",
"\n",
" output_step output \\\n",
"0 3 I need to find out who sang summer of 69 and ... \n",
"1 8 I need to find out who Bryan Adams is married... \n",
"2 13 I now know the final answer.\\nFinal Answer: B... \n",
"\n",
" token_usage_total_tokens token_usage_prompt_tokens \\\n",
"0 223 189 \n",
"1 270 242 \n",
"2 332 314 \n",
"\n",
" token_usage_completion_tokens flesch_reading_ease flesch_kincaid_grade \\\n",
"0 34 91.61 3.8 \n",
"1 28 94.66 2.7 \n",
"2 18 81.29 3.7 \n",
"\n",
" ... difficult_words linsear_write_formula gunning_fog \\\n",
"0 ... 2 5.75 5.4 \n",
"1 ... 2 4.25 4.2 \n",
"2 ... 1 2.50 2.8 \n",
"\n",
" text_standard fernandez_huerta szigriszt_pazos gutierrez_polini \\\n",
"0 3rd and 4th grade 121.07 119.50 54.91 \n",
"1 4th and 5th grade 124.13 119.20 52.26 \n",
"2 3rd and 4th grade 115.70 110.84 49.79 \n",
"\n",
" crawford gulpease_index osman \n",
"0 0.9 72.7 92.16 \n",
"1 0.7 74.7 84.20 \n",
"2 0.7 85.4 83.14 \n",
"\n",
"[3 rows x 24 columns]}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Could not update last created model in Task 988bd727b0e94a29a3ac0ee526813545, Task status 'completed' cannot be updated\n"
]
}
],
"source": [
"from langchain.agents import initialize_agent, load_tools\n",
"\n",
"# SCENARIO 2 - Agent with Tools\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm, callback_manager=manager)\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=\"zero-shot-react-description\",\n",
" callback_manager=manager,\n",
" verbose=True,\n",
")\n",
"agent.run(\n",
" \"Who is the wife of the person who sang summer of 69?\"\n",
")\n",
"clearml_callback.flush_tracker(langchain_asset=agent, name=\"Agent with Tools\", finish=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tips and Next Steps\n",
"\n",
"- Make sure you always use a unique `name` argument for the `clearml_callback.flush_tracker` function. If not, the model parameters used for a run will override the previous run!\n",
"\n",
"- If you close the ClearML Callback using `clearml_callback.flush_tracker(..., finish=True)` the Callback cannot be used anymore. Make a new one if you want to keep logging.\n",
"\n",
"- Check out the rest of the open source ClearML ecosystem, there is a data version manager, a remote execution agent, automated pipelines and much more!\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -13,10 +13,11 @@ This page is broken into two parts: installation and setup, and then references
- Install the Python SDK with `pip install "unstructured[local-inference]"`
- Install the following system dependencies if they are not already available on your system.
Depending on what document types you're parsing, you may not need all of these.
- `libmagic-dev`
- `poppler-utils`
- `tesseract-ocr`
- `libreoffice`
- `libmagic-dev` (filetype detection)
- `poppler-utils` (images and PDFs)
- `tesseract-ocr` (images and PDFs)
- `libreoffice` (MS Office docs)
- `pandoc` (EPUBs)
- If you are parsing PDFs using the `"hi_res"` strategy, run the following to install the `detectron2` model, which
`unstructured` uses for layout detection:
- `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2"`

File diff suppressed because one or more lines are too long


@@ -12,26 +12,48 @@
"An agent consists of three parts:\n",
" \n",
" - Tools: The tools the agent has available to use.\n",
" - The agent class itself: this decides which action to take.\n",
" - LLMChain: The LLMChain that produces the text that is parsed in a certain way to determine which action to take.\n",
" - The agent class itself: this parses the output of the LLMChain to determine which action to take.\n",
" \n",
" \n",
"In this notebook we walk through how to create a custom agent."
"In this notebook we walk through two types of custom agents. The first type shows how to create a custom LLMChain, but still use an existing agent class to parse the output. The second shows how to create a custom agent class."
]
},
{
"cell_type": "markdown",
"id": "6064f080",
"metadata": {},
"source": [
"### Custom LLMChain\n",
"\n",
"The first way to create a custom agent is to use an existing Agent class, but use a custom LLMChain. This is the simplest way to create a custom Agent. It is highly reccomended that you work with the `ZeroShotAgent`, as at the moment that is by far the most generalizable one. \n",
"\n",
"Most of the work in creating the custom LLMChain comes down to the prompt. Because we are using an existing agent class to parse the output, it is very important that the prompt say to produce text in that format. Additionally, we currently require an `agent_scratchpad` input variable to put notes on previous actions and observations. This should almost always be the final part of the prompt. However, besides those instructions, you can customize the prompt as you wish.\n",
"\n",
"To ensure that the prompt contains the appropriate instructions, we will utilize a helper method on that class. The helper method for the `ZeroShotAgent` takes the following arguments:\n",
"\n",
"- tools: List of tools the agent will have access to, used to format the prompt.\n",
"- prefix: String to put before the list of tools.\n",
"- suffix: String to put after the list of tools.\n",
"- input_variables: List of input variables the final prompt will expect.\n",
"\n",
"For this exercise, we will give our agent access to Google Search, and we will customize it in that we will have it answer as a pirate."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 23,
"id": "9af9734e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import Tool, AgentExecutor\n",
"from langchain import OpenAI, SerpAPIWrapper"
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
"from langchain import OpenAI, SerpAPIWrapper, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 24,
"id": "becda2a1",
"metadata": {},
"outputs": [],
@@ -41,83 +63,110 @@
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\",\n",
" return_direct=True\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" )\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "eca724fc",
"execution_count": 25,
"id": "339b1bb8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents.agent import BaseAgent"
"prefix = \"\"\"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Args\"\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"input\", \"agent_scratchpad\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "59db7b58",
"metadata": {},
"source": [
"In case we are curious, we can now take a look at the final prompt template to see what it looks like when its all put together."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a33e2f7e",
"execution_count": 26,
"id": "e21d2098",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\n",
"\n",
"Search: useful for when you need to answer questions about current events\n",
"\n",
"Use the following format:\n",
"\n",
"Question: the input question you must answer\n",
"Thought: you should always think about what to do\n",
"Action: the action to take, should be one of [Search]\n",
"Action Input: the input to the action\n",
"Observation: the result of the action\n",
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
"Thought: I now know the final answer\n",
"Final Answer: the final answer to the original input question\n",
"\n",
"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Args\"\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\n"
]
}
],
"source": [
"from typing import List, Tuple, Any, Union\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"print(prompt.template)"
]
},
{
"cell_type": "markdown",
"id": "5e028e6d",
"metadata": {},
"source": [
"Note that we are able to feed agents a self-defined prompt template, i.e. not restricted to the prompt generated by the `create_prompt` function, assuming it meets the agent's requirements. \n",
"\n",
"class FakeAgent(BaseAgent):\n",
" \"\"\"Fake Custom Agent.\"\"\"\n",
" \n",
" @property\n",
" def input_keys(self):\n",
" return [\"input\"]\n",
" \n",
" def plan(\n",
" self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any\n",
" ) -> Union[AgentAction, AgentFinish]:\n",
" \"\"\"Given input, decided what to do.\n",
"\n",
" Args:\n",
" intermediate_steps: Steps the LLM has taken to date,\n",
" along with observations\n",
" **kwargs: User inputs.\n",
"\n",
" Returns:\n",
" Action specifying what tool to use.\n",
" \"\"\"\n",
" return AgentAction(tool=\"Search\", tool_input=\"foo\", log=\"\")\n",
"\n",
" async def aplan(\n",
" self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any\n",
" ) -> Union[AgentAction, AgentFinish]:\n",
" \"\"\"Given input, decided what to do.\n",
"\n",
" Args:\n",
" intermediate_steps: Steps the LLM has taken to date,\n",
" along with observations\n",
" **kwargs: User inputs.\n",
"\n",
" Returns:\n",
" Action specifying what tool to use.\n",
" \"\"\"\n",
" return AgentAction(tool=\"Search\", tool_input=\"foo\", log=\"\")"
"For example, for `ZeroShotAgent`, we will need to ensure that it meets the following requirements. There should a string starting with \"Action:\" and a following string starting with \"Action Input:\", and both should be separated by a newline.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "655d72f6",
"execution_count": 27,
"id": "9b1cc2a2",
"metadata": {},
"outputs": [],
"source": [
"agent = FakeAgent()"
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 28,
"id": "e4f5092f",
"metadata": {},
"outputs": [],
"source": [
"tool_names = [tool.name for tool in tools]\n",
"agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "490604e9",
"metadata": {},
"outputs": [],
@@ -127,7 +176,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 31,
"id": "653b1617",
"metadata": {},
"outputs": [
@@ -138,7 +187,12 @@
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\u001b[36;1m\u001b[1;3mFoo Fighters is an American rock band formed in Seattle in 1994. Foo Fighters was initially formed as a one-man project by former Nirvana drummer Dave Grohl. Following the success of the 1995 eponymous debut album, Grohl recruited a band consisting of Nate Mendel, William Goldsmith, and Pat Smear.\u001b[0m\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada\n",
"Action: Search\n",
"Action Input: Population of Canada 2023\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mThe current population of Canada is 38,610,447 as of Saturday, February 18, 2023, based on Worldometer elaboration of the latest United Nations data. Canada 2020 population is estimated at 37,742,154 people at mid year according to UN data.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Arrr, Canada be havin' 38,610,447 scallywags livin' there as of 2023!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -146,10 +200,10 @@
{
"data": {
"text/plain": [
"'Foo Fighters is an American rock band formed in Seattle in 1994. Foo Fighters was initially formed as a one-man project by former Nirvana drummer Dave Grohl. Following the success of the 1995 eponymous debut album, Grohl recruited a band consisting of Nate Mendel, William Goldsmith, and Pat Smear.'"
"\"Arrr, Canada be havin' 38,610,447 scallywags livin' there as of 2023!\""
]
},
"execution_count": 7,
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
@@ -158,6 +212,114 @@
"agent_executor.run(\"How many people live in canada as of 2023?\")"
]
},
{
"cell_type": "markdown",
"id": "040eb343",
"metadata": {},
"source": [
"### Multiple inputs\n",
"Agents can also work with prompts that require multiple inputs."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "43dbfa2f",
"metadata": {},
"outputs": [],
"source": [
"prefix = \"\"\"Answer the following questions as best you can. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"When answering, you MUST speak in the following language: {language}.\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"input\", \"language\", \"agent_scratchpad\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "0f087313",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "92c75a10",
"metadata": {},
"outputs": [],
"source": [
"agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "ac5b83bf",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "c960e4ff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada in 2023.\n",
"Action: Search\n",
"Action Input: Population of Canada in 2023\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mThe current population of Canada is 38,610,447 as of Saturday, February 18, 2023, based on Worldometer elaboration of the latest United Nations data. Canada 2020 population is estimated at 37,742,154 people at mid year according to UN data.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: La popolazione del Canada nel 2023 è stimata in 38.610.447 persone.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'La popolazione del Canada nel 2023 è stimata in 38.610.447 persone.'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(input=\"How many people live in canada as of 2023?\", language=\"italian\")"
]
},
{
"cell_type": "markdown",
"id": "90171b2b",
"metadata": {},
"source": [
"### Custom Agent Class\n",
"\n",
"Coming soon."
]
},
{
"cell_type": "code",
"execution_count": null,


@@ -1,219 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ba5f8741",
"metadata": {},
"source": [
"# Custom LLLM Agent\n",
"\n",
"This notebook goes through how to create your own custom LLM agent.\n",
"\n",
"An LLM agent consists of three parts:\n",
" \n",
" - Tools: The tools the agent has available to use.\n",
" - LLMChain: The LLMChain that produces the text that is parsed in a certain way to determine which action to take.\n",
" - OutputParser: This determines how to parse the LLMOutput into \n",
" \n",
" \n",
"In this notebook we walk through how to create a custom LLM agent."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9af9734e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import Tool, AgentExecutor\n",
"from langchain.agents.agent import LLMAgent, AgentOutputParser\n",
"from langchain.prompts import StringPromptTemplate\n",
"from langchain import OpenAI, SerpAPIWrapper, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "becda2a1",
"metadata": {},
"outputs": [],
"source": [
"search = SerpAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" )\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "339b1bb8",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\n",
"\n",
"Search: useful for when you need to answer questions about current events\n",
"\n",
"Use the following format:\n",
"\n",
"Question: the input question you must answer\n",
"Thought: you should always think about what to do\n",
"Action: the action to take, should be one of [Search]\n",
"Action Input: the input to the action\n",
"Observation: the result of the action\n",
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
"Thought: I now know the final answer\n",
"Final Answer: the final answer to the original input question\n",
"\n",
"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Args\"\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"class CustomPromptTemplate(StringPromptTemplate):\n",
" \n",
" def format(self, **kwargs) -> str:\n",
" intermediate_steps = kwargs.pop(\"intermediate_steps\")\n",
" thoughts = \"\"\n",
" for action, observation in intermediate_steps:\n",
" thoughts += action.log\n",
" thoughts += f\"\\nObservation: {observation}\\nThought: \"\n",
" kwargs[\"agent_scratchpad\"] = thoughts\n",
" return template.format(**kwargs)\n",
"\n",
"prompt = CustomPromptTemplate(input_variables=[\"input\", \"intermediate_steps\"])\n",
"\n",
"FINAL_ANSWER_ACTION = \"Final Answer:\"\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"import re\n",
"class CustomOutputParser(AgentOutputParser):\n",
" \n",
" def parse(self, llm_output):\n",
" if FINAL_ANSWER_ACTION in llm_output:\n",
" return AgentFinish(\n",
" return_values={\"output\": llm_output.split(FINAL_ANSWER_ACTION)[-1].strip()},\n",
" log=llm_output,\n",
" )\n",
" # \\s matches against tab/newline/whitespace\n",
" regex = r\"Action: (.*?)[\\n]*Action Input:[\\s]*(.*)\"\n",
" match = re.search(regex, llm_output, re.DOTALL)\n",
" if not match:\n",
" raise ValueError(f\"Could not parse LLM output: `{llm_output}`\")\n",
" action = match.group(1).strip()\n",
" action_input = match.group(2)\n",
" return AgentAction(tool=action, tool_input=action_input.strip(\" \").strip('\"'), log=llm_output)\n",
"\n",
"output_parser = CustomOutputParser()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "9b1cc2a2",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e4f5092f",
"metadata": {},
"outputs": [],
"source": [
"tool_names = [tool.name for tool in tools]\n",
"agent = LLMAgent(llm_chain=llm_chain, output_parser=output_parser, stop=[\"\\nObservation:\"], allowed_tools=tool_names)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "490604e9",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "653b1617",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada in 2023\n",
"Action: Search\n",
"Action Input: Population of Canada in 2023\u001b[0m\n",
"\n",
"Observation:\u001b[36;1m\u001b[1;3mThe current population of Canada is 38,644,767 as of Tuesday, March 28, 2023, based on Worldometer elaboration of the latest United Nations data.\u001b[0m\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Arrr, Canada be havin' 38,644,767 people livin' there as of 2023!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Arrr, Canada be havin' 38,644,767 people livin' there as of 2023!\""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"How many people live in canada as of 2023?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adefb4c2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "18784188d7ecd866c0586ac068b02361a6896dc3a29b64f5cc957f09c590acef"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,348 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ba5f8741",
"metadata": {},
"source": [
"# Custom MRKL Agent\n",
"\n",
"This notebook goes through how to create your own custom MRKL agent.\n",
"\n",
"A MRKL agent consists of three parts:\n",
" \n",
" - Tools: The tools the agent has available to use.\n",
" - LLMChain: The LLMChain that produces the text that is parsed in a certain way to determine which action to take.\n",
" - The agent class itself: this parses the output of the LLMChain to determin which action to take.\n",
" \n",
" \n",
"In this notebook we walk through how to create a custom MRKL agent by creating a custom LLMChain."
]
},
{
"cell_type": "markdown",
"id": "6064f080",
"metadata": {},
"source": [
"### Custom LLMChain\n",
"\n",
"The first way to create a custom agent is to use an existing Agent class, but use a custom LLMChain. This is the simplest way to create a custom Agent. It is highly reccomended that you work with the `ZeroShotAgent`, as at the moment that is by far the most generalizable one. \n",
"\n",
"Most of the work in creating the custom LLMChain comes down to the prompt. Because we are using an existing agent class to parse the output, it is very important that the prompt say to produce text in that format. Additionally, we currently require an `agent_scratchpad` input variable to put notes on previous actions and observations. This should almost always be the final part of the prompt. However, besides those instructions, you can customize the prompt as you wish.\n",
"\n",
"To ensure that the prompt contains the appropriate instructions, we will utilize a helper method on that class. The helper method for the `ZeroShotAgent` takes the following arguments:\n",
"\n",
"- tools: List of tools the agent will have access to, used to format the prompt.\n",
"- prefix: String to put before the list of tools.\n",
"- suffix: String to put after the list of tools.\n",
"- input_variables: List of input variables the final prompt will expect.\n",
"\n",
"For this exercise, we will give our agent access to Google Search, and we will customize it in that we will have it answer as a pirate."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "9af9734e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
"from langchain import OpenAI, SerpAPIWrapper, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "becda2a1",
"metadata": {},
"outputs": [],
"source": [
"search = SerpAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" )\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "339b1bb8",
"metadata": {},
"outputs": [],
"source": [
"prefix = \"\"\"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Args\"\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"input\", \"agent_scratchpad\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "59db7b58",
"metadata": {},
"source": [
"In case we are curious, we can now take a look at the final prompt template to see what it looks like when its all put together."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "e21d2098",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Answer the following questions as best you can, but speaking as a pirate might speak. You have access to the following tools:\n",
"\n",
"Search: useful for when you need to answer questions about current events\n",
"\n",
"Use the following format:\n",
"\n",
"Question: the input question you must answer\n",
"Thought: you should always think about what to do\n",
"Action: the action to take, should be one of [Search]\n",
"Action Input: the input to the action\n",
"Observation: the result of the action\n",
"... (this Thought/Action/Action Input/Observation can repeat N times)\n",
"Thought: I now know the final answer\n",
"Final Answer: the final answer to the original input question\n",
"\n",
"Begin! Remember to speak as a pirate when giving your final answer. Use lots of \"Args\"\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\n"
]
}
],
"source": [
"print(prompt.template)"
]
},
{
"cell_type": "markdown",
"id": "5e028e6d",
"metadata": {},
"source": [
"Note that we are able to feed agents a self-defined prompt template, i.e. not restricted to the prompt generated by the `create_prompt` function, assuming it meets the agent's requirements. \n",
"\n",
"For example, for `ZeroShotAgent`, we will need to ensure that it meets the following requirements. There should a string starting with \"Action:\" and a following string starting with \"Action Input:\", and both should be separated by a newline.\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "9b1cc2a2",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "e4f5092f",
"metadata": {},
"outputs": [],
"source": [
"tool_names = [tool.name for tool in tools]\n",
"agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "490604e9",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "653b1617",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada\n",
"Action: Search\n",
"Action Input: Population of Canada 2023\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mThe current population of Canada is 38,610,447 as of Saturday, February 18, 2023, based on Worldometer elaboration of the latest United Nations data. Canada 2020 population is estimated at 37,742,154 people at mid year according to UN data.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Arrr, Canada be havin' 38,610,447 scallywags livin' there as of 2023!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Arrr, Canada be havin' 38,610,447 scallywags livin' there as of 2023!\""
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(\"How many people live in canada as of 2023?\")"
]
},
{
"cell_type": "markdown",
"id": "040eb343",
"metadata": {},
"source": [
"### Multiple inputs\n",
"Agents can also work with prompts that require multiple inputs."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "43dbfa2f",
"metadata": {},
"outputs": [],
"source": [
"prefix = \"\"\"Answer the following questions as best you can. You have access to the following tools:\"\"\"\n",
"suffix = \"\"\"When answering, you MUST speak in the following language: {language}.\n",
"\n",
"Question: {input}\n",
"{agent_scratchpad}\"\"\"\n",
"\n",
"prompt = ZeroShotAgent.create_prompt(\n",
" tools, \n",
" prefix=prefix, \n",
" suffix=suffix, \n",
" input_variables=[\"input\", \"language\", \"agent_scratchpad\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "0f087313",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "92c75a10",
"metadata": {},
"outputs": [],
"source": [
"agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "ac5b83bf",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "c960e4ff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out the population of Canada in 2023.\n",
"Action: Search\n",
"Action Input: Population of Canada in 2023\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mThe current population of Canada is 38,610,447 as of Saturday, February 18, 2023, based on Worldometer elaboration of the latest United Nations data. Canada 2020 population is estimated at 37,742,154 people at mid year according to UN data.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: La popolazione del Canada nel 2023 è stimata in 38.610.447 persone.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'La popolazione del Canada nel 2023 è stimata in 38.610.447 persone.'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.run(input=\"How many people live in canada as of 2023?\", language=\"italian\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adefb4c2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "18784188d7ecd866c0586ac068b02361a6896dc3a29b64f5cc957f09c590acef"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,164 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Apify\n",
"\n",
"This notebook shows how to use the [Apify integration](../../../../ecosystem/apify.md) for LangChain.\n",
"\n",
"[Apify](https://apify.com) is a cloud platform for web scraping and data extraction,\n",
"which provides an [ecosystem](https://apify.com/store) of more than a thousand\n",
"ready-made apps called *Actors* for various web scraping, crawling, and data extraction use cases.\n",
"For example, you can use it to extract Google Search results, Instagram and Facebook profiles, products from Amazon or Shopify, Google Maps reviews, etc. etc.\n",
"\n",
"In this example, we'll use the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor,\n",
"which can deeply crawl websites such as documentation, knowledge bases, help centers, or blogs,\n",
"and extract text content from the web pages. Then we feed the documents into a vector index and answer questions from it.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"First, import `ApifyWrapper` into your source code:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.base import Document\n",
"from langchain.indexes import VectorstoreIndexCreator\n",
"from langchain.utilities import ApifyWrapper"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize it using your [Apify API token](https://console.apify.com/account/integrations) and for the purpose of this example, also with your OpenAI API key:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"OPENAI_API_KEY\"] = \"Your OpenAI API key\"\n",
"os.environ[\"APIFY_API_TOKEN\"] = \"Your Apify API token\"\n",
"\n",
"apify = ApifyWrapper()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Then run the Actor, wait for it to finish, and fetch its results from the Apify dataset into a LangChain document loader.\n",
"\n",
"Note that if you already have some results in an Apify dataset, you can load them directly using `ApifyDatasetLoader`, as shown in [this notebook](../../../indexes/document_loaders/examples/apify_dataset.ipynb). In that notebook, you'll also find the explanation of the `dataset_mapping_function`, which is used to map fields from the Apify dataset records to LangChain `Document` fields."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"loader = apify.call_actor(\n",
" actor_id=\"apify/website-content-crawler\",\n",
" run_input={\"startUrls\": [{\"url\": \"https://python.langchain.com/en/latest/\"}]},\n",
" dataset_mapping_function=lambda item: Document(\n",
" page_content=item[\"text\"] or \"\", metadata={\"source\": item[\"url\"]}\n",
" ),\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Initialize the vector index from the crawled documents:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"index = VectorstoreIndexCreator().from_loaders([loader])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"And finally, query the vector index:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"query = \"What is LangChain?\"\n",
"result = index.query_with_sources(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" LangChain is a standard interface through which you can interact with a variety of large language models (LLMs). It provides modules that can be used to build language model applications, and it also provides chains and agents with memory capabilities.\n",
"\n",
"https://python.langchain.com/en/latest/modules/models/llms.html, https://python.langchain.com/en/latest/getting_started/getting_started.html\n"
]
}
],
"source": [
"print(result[\"answer\"])\n",
"print(result[\"sources\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,124 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "39af9ecd",
"metadata": {},
"source": [
"# EPubs\n",
"\n",
"This covers how to load `.epub` documents into a document format that we can use downstream. You'll need to install the [`pandocs`](https://pandoc.org/installing.html) package for this loader to work."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "721c48aa",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredEPubLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9d3d0e35",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredEPubLoader(\"winter-sports.epub\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "06073f91",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "525d6b67",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "064f9162",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredEPubLoader(\"winter-sports.epub\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "abefbbdb",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a547c534",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='The Project Gutenberg eBook of Winter Sports in\\nSwitzerland, by E. F. Benson', lookup_str='', metadata={'source': 'winter-sports.epub', 'page_number': 1, 'category': 'Title'}, lookup_index=0)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "381d4139",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,175 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Apify Dataset\n",
"\n",
"This notebook shows how to load Apify datasets to LangChain.\n",
"\n",
"[Apify Dataset](https://docs.apify.com/platform/storage/dataset) is a scaleable append-only storage with sequential access built for storing structured web scraping results, such as a list of products or Google SERPs, and then export them to various formats like JSON, CSV, or Excel. Datasets are mainly used to save results of [Apify Actors](https://apify.com/store)—serverless cloud programs for varius web scraping, crawling, and data extraction use cases.\n",
"\n",
"## Prerequisites\n",
"\n",
"You need to have an existing dataset on the Apify platform. If you don't have one, please first check out [this notebook](../../../agents/tools/examples/apify.ipynb) on how to use Apify to extract content from documentation, knowledge bases, help centers, or blogs."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"First, import `ApifyDatasetLoader` into your source code:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import ApifyDatasetLoader\n",
"from langchain.document_loaders.base import Document"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Then provide a function that maps Apify dataset record fields to LangChain `Document` format.\n",
"\n",
"For example, if your dataset items are structured like this:\n",
"\n",
"```json\n",
"{\n",
" \"url\": \"https://apify.com\",\n",
" \"text\": \"Apify is the best web scraping and automation platform.\"\n",
"}\n",
"```\n",
"\n",
"The mapping function in the code below will convert them to LangChain `Document` format, so that you can use them further with any LLM model (e.g. for question answering)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"loader = ApifyDatasetLoader(\n",
" dataset_id=\"your-dataset-id\",\n",
" dataset_mapping_function=lambda dataset_item: Document(\n",
" page_content=dataset_item[\"text\"], metadata={\"source\": dataset_item[\"url\"]}\n",
" ),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## An example with question answering\n",
"\n",
"In this example, we use data from a dataset to answer a question."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"from langchain.document_loaders import ApifyDatasetLoader\n",
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"loader = ApifyDatasetLoader(\n",
" dataset_id=\"your-dataset-id\",\n",
" dataset_mapping_function=lambda item: Document(\n",
" page_content=item[\"text\"] or \"\", metadata={\"source\": item[\"url\"]}\n",
" ),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"index = VectorstoreIndexCreator().from_loaders([loader])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"query = \"What is Apify?\"\n",
"result = index.query_with_sources(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Apify is a platform for developing, running, and sharing serverless cloud programs. It enables users to create web scraping and automation tools and publish them on the Apify platform.\n",
"\n",
"https://docs.apify.com/platform/actors, https://docs.apify.com/platform/actors/running/actors-in-store, https://docs.apify.com/platform/security, https://docs.apify.com/platform/actors/examples\n"
]
}
],
"source": [
"print(result[\"answer\"])\n",
"print(result[\"sources\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -1,13 +1,14 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "33205b12",
"metadata": {},
"source": [
"# Figma\n",
"\n",
"This notebook covers how to load data from the Figma REST API into a format that can be ingested into LangChain."
"This notebook covers how to load data from the Figma REST API into a format that can be ingested into LangChain, along with example usage for code generation."
]
},
{
@@ -19,7 +20,35 @@
"source": [
"import os\n",
"\n",
"from langchain.document_loaders import FigmaFileLoader"
"\n",
"from langchain.document_loaders.figma import FigmaFileLoader\n",
"\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.indexes import VectorstoreIndexCreator\n",
"from langchain.chains import ConversationChain, LLMChain\n",
"from langchain.memory import ConversationBufferWindowMemory\n",
"from langchain.prompts.chat import (\n",
" ChatPromptTemplate,\n",
" SystemMessagePromptTemplate,\n",
" AIMessagePromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d809744a",
"metadata": {},
"source": [
"The Figma API Requires an access token, node_ids, and a file key.\n",
"\n",
"The file key can be pulled from the URL. https://www.figma.com/file/{filekey}/sampleFilename\n",
"\n",
"Node IDs are also available in the URL. Click on anything and look for the '?node-id={node_id}' param.\n",
"\n",
"Access token instructions are in the Figma help center article: https://help.figma.com/hc/en-us/articles/8085703771159-Manage-personal-access-tokens"
]
},
{
@@ -29,7 +58,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = FigmaFileLoader(\n",
"figma_loader = FigmaFileLoader(\n",
" os.environ.get('ACCESS_TOKEN'),\n",
" os.environ.get('NODE_IDS'),\n",
" os.environ.get('FILE_KEY')\n",
@@ -43,7 +72,9 @@
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
"# see https://python.langchain.com/en/latest/modules/indexes/getting_started.html for more details\n",
"index = VectorstoreIndexCreator().from_loaders([figma_loader])\n",
"figma_doc_retriever = index.vectorstore.as_retriever()"
]
},
{
@@ -52,6 +83,55 @@
"id": "3e64cac2",
"metadata": {},
"outputs": [],
"source": [
"def generate_code(human_input):\n",
" # I have no idea if the Jon Carmack thing makes for better code. YMMV.\n",
" # See https://python.langchain.com/en/latest/modules/models/chat/getting_started.html for chat info\n",
" system_prompt_template = \"\"\"You are expert coder Jon Carmack. Use the provided design context to create idomatic HTML/CSS code as possible based on the user request.\n",
" Everything must be inline in one file and your response must be directly renderable by the browser.\n",
" Figma file nodes and metadata: {context}\"\"\"\n",
"\n",
" human_prompt_template = \"Code the {text}. Ensure it's mobile responsive\"\n",
" system_message_prompt = SystemMessagePromptTemplate.from_template(system_prompt_template)\n",
" human_message_prompt = HumanMessagePromptTemplate.from_template(human_prompt_template)\n",
" # delete the gpt-4 model_name to use the default gpt-3.5 turbo for faster results\n",
" gpt_4 = ChatOpenAI(temperature=.02, model_name='gpt-4')\n",
" # Use the retriever's 'get_relevant_documents' method if needed to filter down longer docs\n",
" relevant_nodes = figma_doc_retriever.get_relevant_documents(human_input)\n",
" conversation = [system_message_prompt, human_message_prompt]\n",
" chat_prompt = ChatPromptTemplate.from_messages(conversation)\n",
" response = gpt_4(chat_prompt.format_prompt( \n",
" context=relevant_nodes, \n",
" text=human_input).to_messages())\n",
" return response"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36a96114",
"metadata": {},
"outputs": [],
"source": [
"response = generate_code(\"page top header\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "baf9b2c9",
"metadata": {},
"source": [
"Returns the following in `response.content`:\n",
"```\n",
"<!DOCTYPE html>\\n<html lang=\"en\">\\n<head>\\n <meta charset=\"UTF-8\">\\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\\n <style>\\n @import url(\\'https://fonts.googleapis.com/css2?family=DM+Sans:wght@500;700&family=Inter:wght@600&display=swap\\');\\n\\n body {\\n margin: 0;\\n font-family: \\'DM Sans\\', sans-serif;\\n }\\n\\n .header {\\n display: flex;\\n justify-content: space-between;\\n align-items: center;\\n padding: 20px;\\n background-color: #fff;\\n box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);\\n }\\n\\n .header h1 {\\n font-size: 16px;\\n font-weight: 700;\\n margin: 0;\\n }\\n\\n .header nav {\\n display: flex;\\n align-items: center;\\n }\\n\\n .header nav a {\\n font-size: 14px;\\n font-weight: 500;\\n text-decoration: none;\\n color: #000;\\n margin-left: 20px;\\n }\\n\\n @media (max-width: 768px) {\\n .header nav {\\n display: none;\\n }\\n }\\n </style>\\n</head>\\n<body>\\n <header class=\"header\">\\n <h1>Company Contact</h1>\\n <nav>\\n <a href=\"#\">Lorem Ipsum</a>\\n <a href=\"#\">Lorem Ipsum</a>\\n <a href=\"#\">Lorem Ipsum</a>\\n </nav>\\n </header>\\n</body>\\n</html>\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "38827110",
"metadata": {},
"source": []
}
],
@@ -71,7 +151,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.10"
}
},
"nbformat": 4,


@@ -311,7 +311,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.8.13"
}
},
"nbformat": 4,


@@ -1,905 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "249b4058",
"metadata": {},
"source": [
"# Embeddings\n",
"\n",
"This notebook goes over how to use the Embedding class in LangChain.\n",
"\n",
"The Embedding class is a class designed for interfacing with embeddings. There are lots of Embedding providers (OpenAI, Cohere, Hugging Face, etc) - this class is designed to provide a standard interface for all of them.\n",
"\n",
"Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.\n",
"\n",
"The base Embedding class in LangChain exposes two methods: `embed_documents` and `embed_query`. The largest difference is that these two methods have different interfaces: one works over multiple documents, while the other works over a single document. Besides this, another reason for having these as two separate methods is that some embedding providers have different embedding methods for documents (to be searched over) vs queries (the search query itself)."
]
},
{
"cell_type": "markdown",
"id": "278b6c63",
"metadata": {},
"source": [
"## OpenAI\n",
"\n",
"Let's load the OpenAI Embedding class."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "0be1af71",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2c66e5da",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "01370375",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bfb6142c",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0356c3b7",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "bb61bbeb",
"metadata": {},
"source": [
"Let's load the OpenAI Embedding class with first generation models (e.g. text-search-ada-doc-001/text-search-ada-query-001). Note: These are not recommended models - see [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0b072cc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a56b70f5",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings(model_name=\"ada\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14aefb64",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c39ed33",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3221db6",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "c3852491",
"metadata": {},
"source": [
"## AzureOpenAI\n",
"\n",
"Let's load the OpenAI Embedding class with environment variables set to indicate to use Azure endpoints."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b40f827",
"metadata": {},
"outputs": [],
"source": [
"# set the environment variables needed for openai package to know to reach out to azure\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_TYPE\"] = \"azure\"\n",
"os.environ[\"OPENAI_API_BASE\"] = \"https://<your-endpoint.openai.azure.com/\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your AzureOpenAI key\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb36d16c",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings(model=\"your-embeddings-deployment-name\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "228abcbb",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60dd7fad",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83bc1a72",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "42f76e43",
"metadata": {},
"source": [
"## Cohere\n",
"\n",
"Let's load the Cohere Embedding class."
]
},
{
"cell_type": "markdown",
"id": "ca9e2b3a",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6b82f59f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import CohereEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "26895c60",
"metadata": {},
"outputs": [],
"source": [
"embeddings = CohereEmbeddings(cohere_api_key=cohere_api_key)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "eea52814",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "fbe167bf",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "38ad3b20",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "ed47bb62",
"metadata": {},
"source": [
"## Hugging Face Hub\n",
"Let's load the Hugging Face Embedding class."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "861521a9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import HuggingFaceEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "ff9be586",
"metadata": {},
"outputs": [],
"source": [
"embeddings = HuggingFaceEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d0a98ae9",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "5d6c682b",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "bb5e74c0",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "markdown",
"id": "fff4734f",
"metadata": {},
"source": [
"## TensorflowHub\n",
"Let's load the TensorflowHub Embedding class."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f822104b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import TensorflowHubEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bac84e46",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-01-30 23:53:01.652176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
"2023-01-30 23:53:34.362802: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
]
}
],
"source": [
"embeddings = TensorflowHubEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4790d770",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f556dcdb",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "markdown",
"id": "59428e05",
"metadata": {},
"source": [
"## InstructEmbeddings\n",
"Let's load the HuggingFace instruct Embeddings class."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "92c5b61e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import HuggingFaceInstructEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "062547b9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"load INSTRUCTOR_Transformer\n",
"max_seq_length 512\n"
]
}
],
"source": [
"embeddings = HuggingFaceInstructEmbeddings(\n",
" query_instruction=\"Represent the query for retrieval: \"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e1dcc4bd",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "90f0db94",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "markdown",
"id": "eec4efda",
"metadata": {},
"source": [
"## Self Hosted Embeddings\n",
"Let's load the SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d338722a",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from langchain.embeddings import (\n",
" SelfHostedEmbeddings,\n",
" SelfHostedHuggingFaceEmbeddings,\n",
" SelfHostedHuggingFaceInstructEmbeddings,\n",
")\n",
"import runhouse as rh"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "146559e8",
"metadata": {},
"outputs": [],
"source": [
"# For an on-demand A100 with GCP, Azure, or Lambda\n",
"gpu = rh.cluster(name=\"rh-a10x\", instance_type=\"A100:1\", use_spot=False)\n",
"\n",
"# For an on-demand A10G with AWS (no single A100s on AWS)\n",
"# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')\n",
"\n",
"# For an existing cluster\n",
"# gpu = rh.cluster(ips=['<ip of the cluster>'],\n",
"# ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},\n",
"# name='my-cluster')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1230f7df",
"metadata": {},
"outputs": [],
"source": [
"embeddings = SelfHostedHuggingFaceEmbeddings(hardware=gpu)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2684e928",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dc5e606",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "markdown",
"id": "cef9cc54",
"metadata": {},
"source": [
"And similarly for SelfHostedHuggingFaceInstructEmbeddings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81a17ca3",
"metadata": {},
"outputs": [],
"source": [
"embeddings = SelfHostedHuggingFaceInstructEmbeddings(hardware=gpu)"
]
},
{
"cell_type": "markdown",
"id": "5a33d1c8",
"metadata": {},
"source": [
"Now let's load an embedding model with a custom load function:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c4af5679",
"metadata": {},
"outputs": [],
"source": [
"def get_pipeline():\n",
" from transformers import (\n",
" AutoModelForCausalLM,\n",
" AutoTokenizer,\n",
" pipeline,\n",
" ) # Must be inside the function in notebooks\n",
"\n",
" model_id = \"facebook/bart-base\"\n",
" tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
" model = AutoModelForCausalLM.from_pretrained(model_id)\n",
" return pipeline(\"feature-extraction\", model=model, tokenizer=tokenizer)\n",
"\n",
"\n",
"def inference_fn(pipeline, prompt):\n",
" # Return last hidden state of the model\n",
" if isinstance(prompt, list):\n",
" return [emb[0][-1] for emb in pipeline(prompt)]\n",
" return pipeline(prompt)[0][-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8654334b",
"metadata": {},
"outputs": [],
"source": [
"embeddings = SelfHostedEmbeddings(\n",
" model_load_fn=get_pipeline,\n",
" hardware=gpu,\n",
" model_reqs=[\"./\", \"torch\", \"transformers\"],\n",
" inference_fn=inference_fn,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc1bfd0f",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "markdown",
"id": "f9c02c78",
"metadata": {},
"source": [
"## Fake Embeddings\n",
"\n",
"LangChain also provides a fake embedding class. You can use this to test your pipelines."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2ffc2e4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import FakeEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "80777571",
"metadata": {},
"outputs": [],
"source": [
"embeddings = FakeEmbeddings(size=1352)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3ec9d8f0",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3b9ae9e1",
"metadata": {},
"outputs": [],
"source": [
"doc_results = embeddings.embed_documents([\"foo\"])"
]
},
{
"cell_type": "markdown",
"id": "1f83f273",
"metadata": {},
"source": [
"## SageMaker Endpoint Embeddings\n",
"\n",
"Let's load the SageMaker Endpoints Embeddings class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n",
"\n",
"For instrucstions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88d366bd",
"metadata": {},
"outputs": [],
"source": [
"!pip3 install langchain boto3"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1e9b926a",
"metadata": {},
"outputs": [],
"source": [
"from typing import Dict\n",
"from langchain.embeddings import SagemakerEndpointEmbeddings\n",
"from langchain.llms.sagemaker_endpoint import ContentHandlerBase\n",
"import json\n",
"\n",
"\n",
"class ContentHandler(ContentHandlerBase):\n",
" content_type = \"application/json\"\n",
" accepts = \"application/json\"\n",
"\n",
" def transform_input(self, prompt: str, model_kwargs: Dict) -> bytes:\n",
" input_str = json.dumps({\"inputs\": prompt, **model_kwargs})\n",
" return input_str.encode('utf-8')\n",
" \n",
" def transform_output(self, output: bytes) -> str:\n",
" response_json = json.loads(output.read().decode(\"utf-8\"))\n",
" return response_json[\"embeddings\"]\n",
"\n",
"content_handler = ContentHandler()\n",
"\n",
"\n",
"embeddings = SagemakerEndpointEmbeddings(\n",
" # endpoint_name=\"endpoint-name\", \n",
" # credentials_profile_name=\"credentials-profile-name\", \n",
" endpoint_name=\"huggingface-pytorch-inference-2023-03-21-16-14-03-834\", \n",
" region_name=\"us-east-1\", \n",
" content_handler=content_handler\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe9797b8",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "76f1b752",
"metadata": {},
"outputs": [],
"source": [
"doc_results = embeddings.embed_documents([\"foo\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fff99b21",
"metadata": {},
"outputs": [],
"source": [
"doc_results"
]
},
{
"cell_type": "markdown",
"id": "eb1c0ea9",
"metadata": {},
"source": [
"## Aleph Alpha\n",
"\n",
"There are two possible ways to use Aleph Alpha's semantic embeddings. If you have texts with a dissimilar structure (e.g. a Document and a Query) you would want to use asymmetric embeddings. Conversely, for texts with comparable structures, symmetric embeddings are the suggested approach."
]
},
{
"cell_type": "markdown",
"id": "9ecc84f9",
"metadata": {},
"source": [
"### Asymmetric"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8a920a89",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import AlephAlphaAsymmetricSemanticEmbedding"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f2d04da3",
"metadata": {},
"outputs": [],
"source": [
"document = \"This is a content of the document\"\n",
"query = \"What is the contnt of the document?\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6ecde96",
"metadata": {},
"outputs": [],
"source": [
"embeddings = AlephAlphaAsymmetricSemanticEmbedding()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90e68411",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([document])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55903233",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(query)"
]
},
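{
"cell_type": "markdown",
"id": "f5a9b3c7",
"metadata": {},
"source": [
"Asymmetric embeddings place documents and queries in a shared space, so the vectors are meant to be compared across the document/query boundary. A minimal similarity check (a sketch, assuming `numpy` is installed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a8d4e6b9",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def cosine_similarity(a, b):\n",
"    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))\n",
"\n",
"\n",
"# Higher values indicate the query is more relevant to the document.\n",
"cosine_similarity(doc_result[0], query_result)"
]
},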
{
"cell_type": "markdown",
"id": "b8c00aab",
"metadata": {},
"source": [
"### Symmetric"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eabb763a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import AlephAlphaSymmetricSemanticEmbedding"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0ad799f7",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test text\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af86dc10",
"metadata": {},
"outputs": [],
"source": [
"embeddings = AlephAlphaSymmetricSemanticEmbedding()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d292536f",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c704a7cf",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
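{
"cell_type": "markdown",
"id": "c2e7f1a5",
"metadata": {},
"source": [
"Because the symmetric class embeds documents and queries the same way, embedding the same text via `embed_documents` and `embed_query` should produce (near-)identical vectors. A quick check (a sketch, assuming the API is deterministic and `numpy` is installed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9b6a4e2",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Both calls embed the same `text` with the symmetric representation.\n",
"np.allclose(doc_result[0], query_result)"
]
},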
{
"cell_type": "code",
"execution_count": null,
"id": "33492471",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "7377c2ccc78bc62c2683122d48c8cd1fb85a53850a1b1fc29736ed39852c9885"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -7,7 +7,7 @@
"source": [
"# VectorStore Retriever\n",
"\n",
"The index - and therefor the retriever - that LangChain has the most support for is a VectorStoreRetriever. As the name suggests, this retriever is backed heavily by a VectorStore.\n",
"The index - and therefore the retriever - that LangChain has the most support for is a VectorStoreRetriever. As the name suggests, this retriever is backed heavily by a VectorStore.\n",
"\n",
"Once you construct a VectorStore, its very easy to construct a retriever. Let's walk through an example."
]

View File

@@ -5,8 +5,8 @@
"id": "13dc0983",
"metadata": {},
"source": [
"# HuggingFace Length Function\n",
"Most LLMs are constrained by the number of tokens that you can pass in, which is not the same as the number of characters. In order to get a more accurate estimate, we can use HuggingFace tokenizers to count the text length.\n",
"# Hugging Face Length Function\n",
"Most LLMs are constrained by the number of tokens that you can pass in, which is not the same as the number of characters. In order to get a more accurate estimate, we can use Hugging Face tokenizers to count the text length.\n",
"\n",
"1. How the text is split: by character passed in\n",
"2. How the chunk size is measured: by Hugging Face tokenizer"

View File

@@ -0,0 +1,165 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "eb1c0ea9",
"metadata": {},
"source": [
"# Aleph Alpha\n",
"\n",
"There are two possible ways to use Aleph Alpha's semantic embeddings. If you have texts with a dissimilar structure (e.g. a Document and a Query) you would want to use asymmetric embeddings. Conversely, for texts with comparable structures, symmetric embeddings are the suggested approach."
]
},
{
"cell_type": "markdown",
"id": "9ecc84f9",
"metadata": {},
"source": [
"## Asymmetric"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8a920a89",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import AlephAlphaAsymmetricSemanticEmbedding"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f2d04da3",
"metadata": {},
"outputs": [],
"source": [
"document = \"This is a content of the document\"\n",
"query = \"What is the contnt of the document?\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6ecde96",
"metadata": {},
"outputs": [],
"source": [
"embeddings = AlephAlphaAsymmetricSemanticEmbedding()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90e68411",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([document])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55903233",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(query)"
]
},
{
"cell_type": "markdown",
"id": "b8c00aab",
"metadata": {},
"source": [
"## Symmetric"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eabb763a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import AlephAlphaSymmetricSemanticEmbedding"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0ad799f7",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test text\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af86dc10",
"metadata": {},
"outputs": [],
"source": [
"embeddings = AlephAlphaSymmetricSemanticEmbedding()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d292536f",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c704a7cf",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33492471",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "7377c2ccc78bc62c2683122d48c8cd1fb85a53850a1b1fc29736ed39852c9885"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -40,7 +40,7 @@
"source": [
"You can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplates`. You can use `ChatPromptTemplate`'s `format_prompt` -- this returns a `PromptValue`, which you can convert to a string or Message object, depending on whether you want to use the formatted value as input to an llm or chat model.\n",
"\n",
"For convience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
"For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
]
},
{

View File

@@ -10,3 +10,4 @@ sphinx-panels
toml
myst_nb
sphinx_copybutton
pydata-sphinx-theme==0.13.1

View File

@@ -19,6 +19,7 @@ from langchain.tools.python.tool import PythonREPLTool
from langchain.tools.requests.tool import RequestsGetTool
from langchain.tools.wikipedia.tool import WikipediaQueryRun
from langchain.tools.wolfram_alpha.tool import WolframAlphaQueryRun
from langchain.utilities.apify import ApifyWrapper
from langchain.utilities.bash import BashProcess
from langchain.utilities.bing_search import BingSearchAPIWrapper
from langchain.utilities.google_search import GoogleSearchAPIWrapper

View File

@@ -2,15 +2,16 @@
from __future__ import annotations
import re
from typing import Any, Callable, List, NamedTuple, Optional, Sequence, Tuple
from typing import Any, Callable, List, NamedTuple, Optional, Sequence, Tuple, Union
from langchain.agents.agent import Agent, AgentExecutor
from langchain.agents.agent import Agent, AgentExecutor, AgentOutputParser
from langchain.agents.mrkl.prompt import FORMAT_INSTRUCTIONS, PREFIX, SUFFIX
from langchain.agents.tools import Tool
from langchain.callbacks.base import BaseCallbackManager
from langchain.chains import LLMChain
from langchain.llms.base import BaseLLM
from langchain.prompts import PromptTemplate
from langchain.schema import AgentAction, AgentFinish
from langchain.tools.base import BaseTool
FINAL_ANSWER_ACTION = "Final Answer:"
@@ -30,6 +31,24 @@ class ChainConfig(NamedTuple):
action_description: str
class ReActOutputParser(AgentOutputParser):
def parse(self, text: str) -> Union[AgentFinish, AgentAction]:
if FINAL_ANSWER_ACTION in text:
return AgentFinish(
{"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, log=text
)
# \s matches against tab/newline/whitespace
regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"
match = re.search(regex, text, re.DOTALL)
if not match:
raise ValueError(f"Could not parse LLM output: `{text}`")
action = match.group(1).strip()
action_input = match.group(2)
return AgentAction(
tool=action, tool_input=action_input.strip(" ").strip('"'), log=text
)
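# Example (illustrative LLM output): parsing the text
#   "Action: Search\nAction Input: \"langchain\""
# returns AgentAction(tool="Search", tool_input="langchain", log=<the full text>).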
def get_action_and_input(llm_output: str) -> Tuple[str, str]:
"""Parse out the action and input from the LLM output.
@@ -38,16 +57,11 @@ def get_action_and_input(llm_output: str) -> Tuple[str, str]:
The string starting with "Action:" and the following string starting
with "Action Input:" should be separated by a newline.
"""
if FINAL_ANSWER_ACTION in llm_output:
return "Final Answer", llm_output.split(FINAL_ANSWER_ACTION)[-1].strip()
# \s matches against tab/newline/whitespace
regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"
match = re.search(regex, llm_output, re.DOTALL)
if not match:
raise ValueError(f"Could not parse LLM output: `{llm_output}`")
action = match.group(1).strip()
action_input = match.group(2)
return action, action_input.strip(" ").strip('"')
result = ReActOutputParser().parse(llm_output)
if isinstance(result, AgentFinish):
return "Final Answer", result.return_values["output"]
else:
return result.tool, result.tool_input
class ZeroShotAgent(Agent):

View File

@@ -40,11 +40,11 @@ class InvalidTool(BaseTool):
def _run(self, tool_name: str) -> str:
"""Use the tool."""
return f"{tool_name} is not a valid tool, try another one. Reminder to only use allowed tool names!"
return f"{tool_name} is not a valid tool, try another one."
async def _arun(self, tool_name: str) -> str:
"""Use the tool asynchronously."""
return f"{tool_name} is not a valid tool, try another one. Reminder to only use allowed tool names!"
return f"{tool_name} is not a valid tool, try another one."
def tool(*args: Union[str, Callable], return_direct: bool = False) -> Callable:

View File

@@ -3,11 +3,13 @@ import os
from contextlib import contextmanager
from typing import Generator, Optional
from langchain.callbacks.aim_callback import AimCallbackHandler
from langchain.callbacks.base import (
BaseCallbackHandler,
BaseCallbackManager,
CallbackManager,
)
from langchain.callbacks.clearml_callback import ClearMLCallbackHandler
from langchain.callbacks.openai_info import OpenAICallbackHandler
from langchain.callbacks.shared import SharedCallbackManager
from langchain.callbacks.stdout import StdOutCallbackHandler
@@ -70,7 +72,9 @@ __all__ = [
"OpenAICallbackHandler",
"SharedCallbackManager",
"StdOutCallbackHandler",
"AimCallbackHandler",
"WandbCallbackHandler",
"ClearMLCallbackHandler",
"get_openai_callback",
"set_tracing_callback_manager",
"set_default_callback_manager",

View File

@@ -0,0 +1,427 @@
from copy import deepcopy
from typing import Any, Dict, List, Optional, Union
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction, AgentFinish, LLMResult
def import_aim() -> Any:
try:
import aim
except ImportError:
raise ImportError(
"To use the Aim callback manager you need to have the"
" `aim` python package installed."
"Please install it with `pip install aim`"
)
return aim
class BaseMetadataCallbackHandler:
"""This class handles the metadata and associated function states for callbacks.
Attributes:
step (int): The current step.
starts (int): The number of times the start method has been called.
ends (int): The number of times the end method has been called.
errors (int): The number of times the error method has been called.
text_ctr (int): The number of times the text method has been called.
ignore_llm_ (bool): Whether to ignore llm callbacks.
ignore_chain_ (bool): Whether to ignore chain callbacks.
ignore_agent_ (bool): Whether to ignore agent callbacks.
always_verbose_ (bool): Whether to always be verbose.
chain_starts (int): The number of times the chain start method has been called.
chain_ends (int): The number of times the chain end method has been called.
llm_starts (int): The number of times the llm start method has been called.
llm_ends (int): The number of times the llm end method has been called.
llm_streams (int): The number of times the text method has been called.
tool_starts (int): The number of times the tool start method has been called.
tool_ends (int): The number of times the tool end method has been called.
agent_ends (int): The number of times the agent end method has been called.
"""
def __init__(self) -> None:
self.step = 0
self.starts = 0
self.ends = 0
self.errors = 0
self.text_ctr = 0
self.ignore_llm_ = False
self.ignore_chain_ = False
self.ignore_agent_ = False
self.always_verbose_ = False
self.chain_starts = 0
self.chain_ends = 0
self.llm_starts = 0
self.llm_ends = 0
self.llm_streams = 0
self.tool_starts = 0
self.tool_ends = 0
self.agent_ends = 0
@property
def always_verbose(self) -> bool:
"""Whether to call verbose callbacks even if verbose is False."""
return self.always_verbose_
@property
def ignore_llm(self) -> bool:
"""Whether to ignore LLM callbacks."""
return self.ignore_llm_
@property
def ignore_chain(self) -> bool:
"""Whether to ignore chain callbacks."""
return self.ignore_chain_
@property
def ignore_agent(self) -> bool:
"""Whether to ignore agent callbacks."""
return self.ignore_agent_
def get_custom_callback_meta(self) -> Dict[str, Any]:
return {
"step": self.step,
"starts": self.starts,
"ends": self.ends,
"errors": self.errors,
"text_ctr": self.text_ctr,
"chain_starts": self.chain_starts,
"chain_ends": self.chain_ends,
"llm_starts": self.llm_starts,
"llm_ends": self.llm_ends,
"llm_streams": self.llm_streams,
"tool_starts": self.tool_starts,
"tool_ends": self.tool_ends,
"agent_ends": self.agent_ends,
}
def reset_callback_meta(self) -> None:
"""Reset the callback metadata."""
self.step = 0
self.starts = 0
self.ends = 0
self.errors = 0
self.text_ctr = 0
self.ignore_llm_ = False
self.ignore_chain_ = False
self.ignore_agent_ = False
self.always_verbose_ = False
self.chain_starts = 0
self.chain_ends = 0
self.llm_starts = 0
self.llm_ends = 0
self.llm_streams = 0
self.tool_starts = 0
self.tool_ends = 0
self.agent_ends = 0
return None
class AimCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
"""Callback Handler that logs to Aim.
Parameters:
repo (:obj:`str`, optional): Aim repository path or Repo object to which
Run object is bound. If skipped, default Repo is used.
experiment_name (:obj:`str`, optional): Sets Run's `experiment` property.
'default' if not specified. Can be used later to query runs/sequences.
system_tracking_interval (:obj:`int`, optional): Sets the tracking interval
in seconds for system usage metrics (CPU, Memory, etc.). Set to `None`
to disable system metrics tracking.
log_system_params (:obj:`bool`, optional): Enable/Disable logging of system
params such as installed packages, git info, environment variables, etc.
This handler utilizes the associated callback method, formats
the input of each callback function with metadata regarding the state of the LLM run,
and then logs the response to Aim.
"""
def __init__(
self,
repo: Optional[str] = None,
experiment_name: Optional[str] = None,
system_tracking_interval: Optional[int] = 10,
log_system_params: bool = True,
) -> None:
"""Initialize callback handler."""
super().__init__()
aim = import_aim()
self.repo = repo
self.experiment_name = experiment_name
self.system_tracking_interval = system_tracking_interval
self.log_system_params = log_system_params
self._run = aim.Run(
repo=self.repo,
experiment=self.experiment_name,
system_tracking_interval=self.system_tracking_interval,
log_system_params=self.log_system_params,
)
self._run_hash = self._run.hash
self.action_records: list = []
def setup(self, **kwargs: Any) -> None:
aim = import_aim()
if not self._run:
if self._run_hash:
self._run = aim.Run(
self._run_hash,
repo=self.repo,
system_tracking_interval=self.system_tracking_interval,
)
else:
self._run = aim.Run(
repo=self.repo,
experiment=self.experiment_name,
system_tracking_interval=self.system_tracking_interval,
log_system_params=self.log_system_params,
)
self._run_hash = self._run.hash
if kwargs:
for key, value in kwargs.items():
self._run.set(key, value, strict=False)
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Run when LLM starts."""
aim = import_aim()
self.step += 1
self.llm_starts += 1
self.starts += 1
resp = {"action": "on_llm_start"}
resp.update(self.get_custom_callback_meta())
prompts_res = deepcopy(prompts)
self._run.track(
[aim.Text(prompt) for prompt in prompts_res],
name="on_llm_start",
context=resp,
)
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Run when LLM ends running."""
aim = import_aim()
self.step += 1
self.llm_ends += 1
self.ends += 1
resp = {"action": "on_llm_end"}
resp.update(self.get_custom_callback_meta())
response_res = deepcopy(response)
generated = [
aim.Text(generation.text)
for generations in response_res.generations
for generation in generations
]
self._run.track(
generated,
name="on_llm_end",
context=resp,
)
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Run when LLM generates a new token."""
self.step += 1
self.llm_streams += 1
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when LLM errors."""
self.step += 1
self.errors += 1
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Run when chain starts running."""
aim = import_aim()
self.step += 1
self.chain_starts += 1
self.starts += 1
resp = {"action": "on_chain_start"}
resp.update(self.get_custom_callback_meta())
inputs_res = deepcopy(inputs)
self._run.track(
aim.Text(inputs_res["input"]), name="on_chain_start", context=resp
)
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Run when chain ends running."""
aim = import_aim()
self.step += 1
self.chain_ends += 1
self.ends += 1
resp = {"action": "on_chain_end"}
resp.update(self.get_custom_callback_meta())
outputs_res = deepcopy(outputs)
self._run.track(
aim.Text(outputs_res["output"]), name="on_chain_end", context=resp
)
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when chain errors."""
self.step += 1
self.errors += 1
def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> None:
"""Run when tool starts running."""
aim = import_aim()
self.step += 1
self.tool_starts += 1
self.starts += 1
resp = {"action": "on_tool_start"}
resp.update(self.get_custom_callback_meta())
self._run.track(aim.Text(input_str), name="on_tool_start", context=resp)
def on_tool_end(self, output: str, **kwargs: Any) -> None:
"""Run when tool ends running."""
aim = import_aim()
self.step += 1
self.tool_ends += 1
self.ends += 1
resp = {"action": "on_tool_end"}
resp.update(self.get_custom_callback_meta())
self._run.track(aim.Text(output), name="on_tool_end", context=resp)
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when tool errors."""
self.step += 1
self.errors += 1
def on_text(self, text: str, **kwargs: Any) -> None:
"""
Run when agent is ending.
"""
self.step += 1
self.text_ctr += 1
def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
"""Run when agent ends running."""
aim = import_aim()
self.step += 1
self.agent_ends += 1
self.ends += 1
resp = {"action": "on_agent_finish"}
resp.update(self.get_custom_callback_meta())
finish_res = deepcopy(finish)
text = "OUTPUT:\n{}\n\nLOG:\n{}".format(
finish_res.return_values["output"], finish_res.log
)
self._run.track(aim.Text(text), name="on_agent_finish", context=resp)
def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
"""Run on agent action."""
aim = import_aim()
self.step += 1
self.tool_starts += 1
self.starts += 1
resp = {
"action": "on_agent_action",
"tool": action.tool,
}
resp.update(self.get_custom_callback_meta())
action_res = deepcopy(action)
text = "TOOL INPUT:\n{}\n\nLOG:\n{}".format(
action_res.tool_input, action_res.log
)
self._run.track(aim.Text(text), name="on_agent_action", context=resp)
def flush_tracker(
self,
repo: Optional[str] = None,
experiment_name: Optional[str] = None,
system_tracking_interval: Optional[int] = 10,
log_system_params: bool = True,
langchain_asset: Any = None,
reset: bool = True,
finish: bool = False,
) -> None:
"""Flush the tracker and reset the session.
Args:
repo (:obj:`str`, optional): Aim repository path or Repo object to which
Run object is bound. If skipped, default Repo is used.
experiment_name (:obj:`str`, optional): Sets Run's `experiment` property.
'default' if not specified. Can be used later to query runs/sequences.
system_tracking_interval (:obj:`int`, optional): Sets the tracking interval
in seconds for system usage metrics (CPU, Memory, etc.). Set to `None`
to disable system metrics tracking.
log_system_params (:obj:`bool`, optional): Enable/Disable logging of system
params such as installed packages, git info, environment variables, etc.
langchain_asset: The langchain asset to save.
reset: Whether to reset the session.
finish: Whether to finish the run.
Returns:
None
"""
if langchain_asset:
try:
for key, value in langchain_asset.dict().items():
self._run.set(key, value, strict=False)
except Exception:
pass
if finish or reset:
self._run.close()
self.reset_callback_meta()
if reset:
self.__init__( # type: ignore
repo=repo if repo else self.repo,
experiment_name=experiment_name
if experiment_name
else self.experiment_name,
system_tracking_interval=system_tracking_interval
if system_tracking_interval
else self.system_tracking_interval,
log_system_params=log_system_params
if log_system_params
else self.log_system_params,
)

View File

@@ -0,0 +1,208 @@
# Import the necessary packages for ingestion
import uuid
from typing import Any, Dict, List, Optional, Union
import pandas as pd
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction, AgentFinish, LLMResult
class ArizeCallbackHandler(BaseCallbackHandler):
"""Callback Handler that logs to Arize platform."""
def __init__(
self,
model_id: Optional[str] = None,
model_version: Optional[str] = None,
SPACE_KEY: Optional[str] = None,
API_KEY: Optional[str] = None,
) -> None:
"""Initialize callback handler."""
super().__init__()
# Set the model_id and model_version for the Arize monitoring.
self.model_id = model_id
self.model_version = model_version
# Set the SPACE_KEY and API_KEY for the Arize client.
self.space_key = SPACE_KEY
self.api_key = API_KEY
# Initialize empty lists to store the prompt/response pairs
# and other necessary data.
self.prompt_records: List = []
self.response_records: List = []
self.prediction_ids: List = []
self.pred_timestamps: List = []
self.response_embeddings: List = []
self.prompt_embeddings: List = []
self.prompt_tokens = 0
self.completion_tokens = 0
self.total_tokens = 0
from arize.api import Client
from arize.pandas.embeddings import EmbeddingGenerator, UseCases
# Create an embedding generator for generating embeddings
# from prompts and responses.
self.generator = EmbeddingGenerator.from_use_case(
use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
model_name="distilbert-base-uncased",
tokenizer_max_length=512,
batch_size=256,
)
# Create an Arize client and check if the SPACE_KEY and API_KEY
# are not set to the default values.
self.arize_client = Client(space_key=SPACE_KEY, api_key=API_KEY)
if SPACE_KEY == "SPACE_KEY" or API_KEY == "API_KEY":
raise ValueError("❌ CHANGE SPACE AND API KEYS")
else:
print("✅ Arize client setup done! Now you can start using Arize!")
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Record the prompts when an LLM starts."""
for prompt in prompts:
self.prompt_records.append(prompt.replace("\n", " "))
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Do nothing when a new token is generated."""
pass
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Log data to Arize when an LLM ends."""
from arize.utils.types import (
Embedding,
Environments,
ModelTypes,
)
# Record token usage of the LLM
if response.llm_output is not None:
self.prompt_tokens = response.llm_output["token_usage"]["prompt_tokens"]
self.total_tokens = response.llm_output["token_usage"]["total_tokens"]
self.completion_tokens = response.llm_output["token_usage"][
"completion_tokens"
]
i = 0
# Go through each prompt response pair and generate embeddings as
# well as timestamp and prediction ids
for generations in response.generations:
for generation in generations:
prompt = self.prompt_records[i]
prompt_embedding = pd.Series(
self.generator.generate_embeddings(
text_col=pd.Series(prompt.replace("\n", " "))
).reset_index(drop=True)
)
generated_text = generation.text.replace("\n", " ")
response_embedding = pd.Series(
self.generator.generate_embeddings(
text_col=pd.Series(generation.text.replace("\n", " "))
).reset_index(drop=True)
)
pred_id = str(uuid.uuid4())
# Define embedding features for Arize ingestion
embedding_features = {
"prompt_embedding": Embedding(
vector=pd.Series(prompt_embedding[0]), data=prompt
),
"response_embedding": Embedding(
vector=pd.Series(response_embedding[0]), data=generated_text
),
}
tags = {
"Prompt Tokens": self.prompt_tokens,
"Completion Tokens": self.completion_tokens,
"Total Tokens": self.total_tokens,
}
# Log each prompt response data into arize
future = self.arize_client.log(
prediction_id=pred_id,
tags=tags,
prediction_label="1",
model_id=self.model_id,
model_type=ModelTypes.SCORE_CATEGORICAL,
model_version=self.model_version,
environment=Environments.PRODUCTION,
embedding_features=embedding_features,
)
result = future.result()
if result.status_code == 200:
print("✅ Successfully logged data to Arize!")
else:
print(
f"❌ Logging failed with status code {result.status_code}"
f' and message "{result.text}"'
)
i = i + 1
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing when LLM outputs an error."""
pass
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Do nothing when LLM chain starts."""
pass
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Do nothing when LLM chain ends."""
pass
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing when LLM chain outputs an error."""
pass
def on_tool_start(
self,
serialized: Dict[str, Any],
input_str: str,
**kwargs: Any,
) -> None:
"""Do nothing when tool starts."""
pass
def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
"""Do nothing when agent takes a specific action."""
pass
def on_tool_end(
self,
output: str,
observation_prefix: Optional[str] = None,
llm_prefix: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Do nothing when tool ends."""
pass
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing when tool outputs an error."""
pass
def on_text(self, text: str, **kwargs: Any) -> None:
"""Do nothing"""
pass
def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
"""Do nothing"""
pass

View File

@@ -0,0 +1,515 @@
import tempfile
from copy import deepcopy
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence, Union
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.utils import (
BaseMetadataCallbackHandler,
flatten_dict,
hash_string,
import_pandas,
import_spacy,
import_textstat,
load_json,
)
from langchain.schema import AgentAction, AgentFinish, LLMResult
def import_clearml() -> Any:
try:
import clearml # noqa: F401
except ImportError:
raise ImportError(
"To use the clearml callback manager you need to have the `clearml` python "
"package installed. Please install it with `pip install clearml`"
)
return clearml
class ClearMLCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
"""Callback Handler that logs to ClearML.
Parameters:
task_type (str): The type of clearml task, such as "inference", "testing" or "qc"
project_name (str): The clearml project name
tags (list): Tags to add to the task
task_name (str): Name of the clearml task
visualize (bool): Whether to visualize the run.
complexity_metrics (bool): Whether to log complexity metrics
stream_logs (bool): Whether to stream callback actions to ClearML
This handler utilizes the associated callback method, formats
the input of each callback function with metadata regarding the state of the LLM run,
and adds the response to the list of records for both the {method}_records and
action_records attributes. It then logs the response to the ClearML console.
"""
def __init__(
self,
task_type: Optional[str] = "inference",
project_name: Optional[str] = "langchain_callback_demo",
tags: Optional[Sequence] = None,
task_name: Optional[str] = None,
visualize: bool = False,
complexity_metrics: bool = False,
stream_logs: bool = False,
) -> None:
"""Initialize callback handler."""
clearml = import_clearml()
spacy = import_spacy()
super().__init__()
self.task_type = task_type
self.project_name = project_name
self.tags = tags
self.task_name = task_name
self.visualize = visualize
self.complexity_metrics = complexity_metrics
self.stream_logs = stream_logs
self.temp_dir = tempfile.TemporaryDirectory()
# Check if ClearML task already exists (e.g. in pipeline)
if clearml.Task.current_task():
self.task = clearml.Task.current_task()
else:
self.task = clearml.Task.init( # type: ignore
task_type=self.task_type,
project_name=self.project_name,
tags=self.tags,
task_name=self.task_name,
output_uri=True,
)
self.logger = self.task.get_logger()
warning = (
"The clearml callback is currently in beta and is subject to change "
"based on updates to `langchain`. Please report any issues to "
"https://github.com/allegroai/clearml/issues with the tag `langchain`."
)
self.logger.report_text(warning, level=30, print_console=True)
self.callback_columns: list = []
self.action_records: list = []
self.complexity_metrics = complexity_metrics
self.visualize = visualize
self.nlp = spacy.load("en_core_web_sm")
def _init_resp(self) -> Dict:
return {k: None for k in self.callback_columns}
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Run when LLM starts."""
self.step += 1
self.llm_starts += 1
self.starts += 1
resp = self._init_resp()
resp.update({"action": "on_llm_start"})
resp.update(flatten_dict(serialized))
resp.update(self.get_custom_callback_meta())
for prompt in prompts:
prompt_resp = deepcopy(resp)
prompt_resp["prompts"] = prompt
self.on_llm_start_records.append(prompt_resp)
self.action_records.append(prompt_resp)
if self.stream_logs:
self.logger.report_text(prompt_resp)
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Run when LLM generates a new token."""
self.step += 1
self.llm_streams += 1
resp = self._init_resp()
resp.update({"action": "on_llm_new_token", "token": token})
resp.update(self.get_custom_callback_meta())
self.on_llm_token_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Run when LLM ends running."""
self.step += 1
self.llm_ends += 1
self.ends += 1
resp = self._init_resp()
resp.update({"action": "on_llm_end"})
resp.update(flatten_dict(response.llm_output or {}))
resp.update(self.get_custom_callback_meta())
for generations in response.generations:
for generation in generations:
generation_resp = deepcopy(resp)
generation_resp.update(flatten_dict(generation.dict()))
generation_resp.update(self.analyze_text(generation.text))
self.on_llm_end_records.append(generation_resp)
self.action_records.append(generation_resp)
if self.stream_logs:
self.logger.report_text(generation_resp)
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when LLM errors."""
self.step += 1
self.errors += 1
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Run when chain starts running."""
self.step += 1
self.chain_starts += 1
self.starts += 1
resp = self._init_resp()
resp.update({"action": "on_chain_start"})
resp.update(flatten_dict(serialized))
resp.update(self.get_custom_callback_meta())
chain_input = inputs["input"]
if isinstance(chain_input, str):
input_resp = deepcopy(resp)
input_resp["input"] = chain_input
self.on_chain_start_records.append(input_resp)
self.action_records.append(input_resp)
if self.stream_logs:
self.logger.report_text(input_resp)
elif isinstance(chain_input, list):
for inp in chain_input:
input_resp = deepcopy(resp)
input_resp.update(inp)
self.on_chain_start_records.append(input_resp)
self.action_records.append(input_resp)
if self.stream_logs:
self.logger.report_text(input_resp)
else:
raise ValueError("Unexpected data format provided!")
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Run when chain ends running."""
self.step += 1
self.chain_ends += 1
self.ends += 1
resp = self._init_resp()
resp.update({"action": "on_chain_end", "outputs": outputs["output"]})
resp.update(self.get_custom_callback_meta())
self.on_chain_end_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when chain errors."""
self.step += 1
self.errors += 1
def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> None:
"""Run when tool starts running."""
self.step += 1
self.tool_starts += 1
self.starts += 1
resp = self._init_resp()
resp.update({"action": "on_tool_start", "input_str": input_str})
resp.update(flatten_dict(serialized))
resp.update(self.get_custom_callback_meta())
self.on_tool_start_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def on_tool_end(self, output: str, **kwargs: Any) -> None:
"""Run when tool ends running."""
self.step += 1
self.tool_ends += 1
self.ends += 1
resp = self._init_resp()
resp.update({"action": "on_tool_end", "output": output})
resp.update(self.get_custom_callback_meta())
self.on_tool_end_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Run when tool errors."""
self.step += 1
self.errors += 1
def on_text(self, text: str, **kwargs: Any) -> None:
"""
Run when agent is ending.
"""
self.step += 1
self.text_ctr += 1
resp = self._init_resp()
resp.update({"action": "on_text", "text": text})
resp.update(self.get_custom_callback_meta())
self.on_text_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
"""Run when agent ends running."""
self.step += 1
self.agent_ends += 1
self.ends += 1
resp = self._init_resp()
resp.update(
{
"action": "on_agent_finish",
"output": finish.return_values["output"],
"log": finish.log,
}
)
resp.update(self.get_custom_callback_meta())
self.on_agent_finish_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
"""Run on agent action."""
self.step += 1
self.tool_starts += 1
self.starts += 1
resp = self._init_resp()
resp.update(
{
"action": "on_agent_action",
"tool": action.tool,
"tool_input": action.tool_input,
"log": action.log,
}
)
resp.update(self.get_custom_callback_meta())
self.on_agent_action_records.append(resp)
self.action_records.append(resp)
if self.stream_logs:
self.logger.report_text(resp)
def analyze_text(self, text: str) -> dict:
"""Analyze text using textstat and spacy.
Parameters:
text (str): The text to analyze.
Returns:
(dict): A dictionary containing the complexity metrics.
"""
resp = {}
textstat = import_textstat()
spacy = import_spacy()
if self.complexity_metrics:
text_complexity_metrics = {
"flesch_reading_ease": textstat.flesch_reading_ease(text),
"flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
"smog_index": textstat.smog_index(text),
"coleman_liau_index": textstat.coleman_liau_index(text),
"automated_readability_index": textstat.automated_readability_index(
text
),
"dale_chall_readability_score": textstat.dale_chall_readability_score(
text
),
"difficult_words": textstat.difficult_words(text),
"linsear_write_formula": textstat.linsear_write_formula(text),
"gunning_fog": textstat.gunning_fog(text),
"text_standard": textstat.text_standard(text),
"fernandez_huerta": textstat.fernandez_huerta(text),
"szigriszt_pazos": textstat.szigriszt_pazos(text),
"gutierrez_polini": textstat.gutierrez_polini(text),
"crawford": textstat.crawford(text),
"gulpease_index": textstat.gulpease_index(text),
"osman": textstat.osman(text),
}
resp.update(text_complexity_metrics)
if self.visualize and self.nlp and self.temp_dir.name is not None:
doc = self.nlp(text)
dep_out = spacy.displacy.render( # type: ignore
doc, style="dep", jupyter=False, page=True
)
dep_output_path = Path(
self.temp_dir.name, hash_string(f"dep-{text}") + ".html"
)
dep_output_path.open("w", encoding="utf-8").write(dep_out)
ent_out = spacy.displacy.render( # type: ignore
doc, style="ent", jupyter=False, page=True
)
ent_output_path = Path(
self.temp_dir.name, hash_string(f"ent-{text}") + ".html"
)
ent_output_path.open("w", encoding="utf-8").write(ent_out)
self.logger.report_media(
"Dependencies Plot", text, local_path=dep_output_path
)
self.logger.report_media("Entities Plot", text, local_path=ent_output_path)
return resp
def _create_session_analysis_df(self) -> Any:
"""Create a dataframe with all the information from the session."""
pd = import_pandas()
on_llm_start_records_df = pd.DataFrame(self.on_llm_start_records)
on_llm_end_records_df = pd.DataFrame(self.on_llm_end_records)
llm_input_prompts_df = (
on_llm_start_records_df[["step", "prompts", "name"]]
.dropna(axis=1)
.rename({"step": "prompt_step"}, axis=1)
)
complexity_metrics_columns = []
visualizations_columns: List = []
if self.complexity_metrics:
complexity_metrics_columns = [
"flesch_reading_ease",
"flesch_kincaid_grade",
"smog_index",
"coleman_liau_index",
"automated_readability_index",
"dale_chall_readability_score",
"difficult_words",
"linsear_write_formula",
"gunning_fog",
"text_standard",
"fernandez_huerta",
"szigriszt_pazos",
"gutierrez_polini",
"crawford",
"gulpease_index",
"osman",
]
llm_outputs_df = (
on_llm_end_records_df[
[
"step",
"text",
"token_usage_total_tokens",
"token_usage_prompt_tokens",
"token_usage_completion_tokens",
]
+ complexity_metrics_columns
+ visualizations_columns
]
.dropna(axis=1)
.rename({"step": "output_step", "text": "output"}, axis=1)
)
session_analysis_df = pd.concat([llm_input_prompts_df, llm_outputs_df], axis=1)
# session_analysis_df["chat_html"] = session_analysis_df[
# ["prompts", "output"]
# ].apply(
# lambda row: construct_html_from_prompt_and_generation(
# row["prompts"], row["output"]
# ),
# axis=1,
# )
return session_analysis_df
def flush_tracker(
self,
name: Optional[str] = None,
langchain_asset: Any = None,
finish: bool = False,
) -> None:
"""Flush the tracker and setup the session.
Everything after this will be a new table.
Args:
name: Name of the preformed session so far so it is identifyable
langchain_asset: The langchain asset to save.
finish: Whether to finish the run.
Returns:
None
"""
pd = import_pandas()
clearml = import_clearml()
# Log the action records
self.logger.report_table(
"Action Records", name, table_plot=pd.DataFrame(self.action_records)
)
# Session analysis
session_analysis_df = self._create_session_analysis_df()
self.logger.report_table(
"Session Analysis", name, table_plot=session_analysis_df
)
if self.stream_logs:
self.logger.report_text(
{
"action_records": pd.DataFrame(self.action_records),
"session_analysis": session_analysis_df,
}
)
if langchain_asset:
langchain_asset_path = Path(self.temp_dir.name, "model.json")
try:
langchain_asset.save(langchain_asset_path)
# Create output model and connect it to the task
output_model = clearml.OutputModel(
task=self.task, config_text=load_json(langchain_asset_path)
)
output_model.update_weights(
weights_filename=str(langchain_asset_path),
auto_delete_file=False,
target_filename=name,
)
except ValueError:
langchain_asset.save_agent(langchain_asset_path)
output_model = clearml.OutputModel(
task=self.task, config_text=load_json(langchain_asset_path)
)
output_model.update_weights(
weights_filename=str(langchain_asset_path),
auto_delete_file=False,
target_filename=name,
)
except NotImplementedError as e:
print("Could not save model.")
print(repr(e))
pass
# Cleanup after adding everything to ClearML
self.task.flush(wait_for_uploads=True)
self.temp_dir.cleanup()
self.temp_dir = tempfile.TemporaryDirectory()
self.reset_callback_meta()
if finish:
self.task.close()

View File

@@ -35,6 +35,7 @@ class BaseLangChainTracer(BaseTracer, ABC):
endpoint = f"{self._endpoint}/chain-runs"
else:
endpoint = f"{self._endpoint}/tool-runs"
try:
requests.post(
endpoint,

View File

@@ -0,0 +1,253 @@
import hashlib
from pathlib import Path
from typing import Any, Dict, Iterable, Tuple, Union
def import_spacy() -> Any:
try:
import spacy
except ImportError:
raise ImportError(
"This callback manager requires the `spacy` python "
"package installed. Please install it with `pip install spacy`"
)
return spacy
def import_pandas() -> Any:
try:
import pandas
except ImportError:
raise ImportError(
"This callback manager requires the `pandas` python "
"package installed. Please install it with `pip install pandas`"
)
return pandas
def import_textstat() -> Any:
try:
import textstat
except ImportError:
raise ImportError(
"This callback manager requires the `textstat` python "
"package installed. Please install it with `pip install textstat`"
)
return textstat
def _flatten_dict(
nested_dict: Dict[str, Any], parent_key: str = "", sep: str = "_"
) -> Iterable[Tuple[str, Any]]:
"""
Generator that yields flattened items from a nested dictionary, for building a flat dict.
Parameters:
nested_dict (dict): The nested dictionary to flatten.
parent_key (str): The prefix to prepend to the keys of the flattened dict.
sep (str): The separator to use between the parent key and the key of the
flattened dictionary.
Yields:
(str, any): A key-value pair from the flattened dictionary.
"""
for key, value in nested_dict.items():
new_key = parent_key + sep + key if parent_key else key
if isinstance(value, dict):
yield from _flatten_dict(value, new_key, sep)
else:
yield new_key, value
def flatten_dict(
nested_dict: Dict[str, Any], parent_key: str = "", sep: str = "_"
) -> Dict[str, Any]:
"""Flattens a nested dictionary into a flat dictionary.
Parameters:
nested_dict (dict): The nested dictionary to flatten.
parent_key (str): The prefix to prepend to the keys of the flattened dict.
sep (str): The separator to use between the parent key and the key of the
flattened dictionary.
Returns:
(dict): A flat dictionary.
"""
flat_dict = {k: v for k, v in _flatten_dict(nested_dict, parent_key, sep)}
return flat_dict
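# Example (illustrative): flatten_dict({"a": {"b": 1}, "c": 2})
# returns {"a_b": 1, "c": 2}.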
def hash_string(s: str) -> str:
"""Hash a string using sha1.
Parameters:
s (str): The string to hash.
Returns:
(str): The hashed string.
"""
return hashlib.sha1(s.encode("utf-8")).hexdigest()
def load_json(json_path: Union[str, Path]) -> str:
"""Load json file to a string.
Parameters:
json_path (str): The path to the json file.
Returns:
(str): The string representation of the json file.
"""
with open(json_path, "r") as f:
data = f.read()
return data
class BaseMetadataCallbackHandler:
"""This class handles the metadata and associated function states for callbacks.
Attributes:
step (int): The current step.
starts (int): The number of times the start method has been called.
ends (int): The number of times the end method has been called.
errors (int): The number of times the error method has been called.
text_ctr (int): The number of times the text method has been called.
ignore_llm_ (bool): Whether to ignore llm callbacks.
ignore_chain_ (bool): Whether to ignore chain callbacks.
ignore_agent_ (bool): Whether to ignore agent callbacks.
always_verbose_ (bool): Whether to always be verbose.
chain_starts (int): The number of times the chain start method has been called.
chain_ends (int): The number of times the chain end method has been called.
llm_starts (int): The number of times the llm start method has been called.
llm_ends (int): The number of times the llm end method has been called.
llm_streams (int): The number of times the text method has been called.
tool_starts (int): The number of times the tool start method has been called.
tool_ends (int): The number of times the tool end method has been called.
agent_ends (int): The number of times the agent end method has been called.
on_llm_start_records (list): A list of records of the on_llm_start method.
on_llm_token_records (list): A list of records of the on_llm_token method.
on_llm_end_records (list): A list of records of the on_llm_end method.
on_chain_start_records (list): A list of records of the on_chain_start method.
on_chain_end_records (list): A list of records of the on_chain_end method.
on_tool_start_records (list): A list of records of the on_tool_start method.
on_tool_end_records (list): A list of records of the on_tool_end method.
on_agent_finish_records (list): A list of records of the on_agent_end method.
"""
def __init__(self) -> None:
self.step = 0
self.starts = 0
self.ends = 0
self.errors = 0
self.text_ctr = 0
self.ignore_llm_ = False
self.ignore_chain_ = False
self.ignore_agent_ = False
self.always_verbose_ = False
self.chain_starts = 0
self.chain_ends = 0
self.llm_starts = 0
self.llm_ends = 0
self.llm_streams = 0
self.tool_starts = 0
self.tool_ends = 0
self.agent_ends = 0
self.on_llm_start_records: list = []
self.on_llm_token_records: list = []
self.on_llm_end_records: list = []
self.on_chain_start_records: list = []
self.on_chain_end_records: list = []
self.on_tool_start_records: list = []
self.on_tool_end_records: list = []
self.on_text_records: list = []
self.on_agent_finish_records: list = []
self.on_agent_action_records: list = []
@property
def always_verbose(self) -> bool:
"""Whether to call verbose callbacks even if verbose is False."""
return self.always_verbose_
@property
def ignore_llm(self) -> bool:
"""Whether to ignore LLM callbacks."""
return self.ignore_llm_
@property
def ignore_chain(self) -> bool:
"""Whether to ignore chain callbacks."""
return self.ignore_chain_
@property
def ignore_agent(self) -> bool:
"""Whether to ignore agent callbacks."""
return self.ignore_agent_
def get_custom_callback_meta(self) -> Dict[str, Any]:
return {
"step": self.step,
"starts": self.starts,
"ends": self.ends,
"errors": self.errors,
"text_ctr": self.text_ctr,
"chain_starts": self.chain_starts,
"chain_ends": self.chain_ends,
"llm_starts": self.llm_starts,
"llm_ends": self.llm_ends,
"llm_streams": self.llm_streams,
"tool_starts": self.tool_starts,
"tool_ends": self.tool_ends,
"agent_ends": self.agent_ends,
}
def reset_callback_meta(self) -> None:
"""Reset the callback metadata."""
self.step = 0
self.starts = 0
self.ends = 0
self.errors = 0
self.text_ctr = 0
self.ignore_llm_ = False
self.ignore_chain_ = False
self.ignore_agent_ = False
self.always_verbose_ = False
self.chain_starts = 0
self.chain_ends = 0
self.llm_starts = 0
self.llm_ends = 0
self.llm_streams = 0
self.tool_starts = 0
self.tool_ends = 0
self.agent_ends = 0
self.on_llm_start_records = []
self.on_llm_token_records = []
self.on_llm_end_records = []
self.on_chain_start_records = []
self.on_chain_end_records = []
self.on_tool_start_records = []
self.on_tool_end_records = []
self.on_text_records = []
self.on_agent_finish_records = []
self.on_agent_action_records = []
return None

View File

@@ -1,11 +1,18 @@
import hashlib
import json
import tempfile
from copy import deepcopy
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple, Union
from typing import Any, Dict, List, Optional, Sequence, Union
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.utils import (
BaseMetadataCallbackHandler,
flatten_dict,
hash_string,
import_pandas,
import_spacy,
import_textstat,
)
from langchain.schema import AgentAction, AgentFinish, LLMResult
@@ -20,93 +27,6 @@ def import_wandb() -> Any:
return wandb
def import_spacy() -> Any:
try:
import spacy # noqa: F401
except ImportError:
raise ImportError(
"To use the wandb callback manager you need to have the `spacy` python "
"package installed. Please install it with `pip install spacy`"
)
return spacy
def import_pandas() -> Any:
try:
import pandas # noqa: F401
except ImportError:
raise ImportError(
"To use the wandb callback manager you need to have the `pandas` python "
"package installed. Please install it with `pip install pandas`"
)
return pandas
def import_textstat() -> Any:
try:
import textstat # noqa: F401
except ImportError:
raise ImportError(
"To use the wandb callback manager you need to have the `textstat` python "
"package installed. Please install it with `pip install textstat`"
)
return textstat
def _flatten_dict(
nested_dict: Dict[str, Any], parent_key: str = "", sep: str = "_"
) -> Iterable[Tuple[str, Any]]:
"""
Generator that yields flattened items from a nested dictionary for a flat dict.
Parameters:
nested_dict (dict): The nested dictionary to flatten.
parent_key (str): The prefix to prepend to the keys of the flattened dict.
sep (str): The separator to use between the parent key and the key of the
flattened dictionary.
Yields:
(str, any): A key-value pair from the flattened dictionary.
"""
for key, value in nested_dict.items():
new_key = parent_key + sep + key if parent_key else key
if isinstance(value, dict):
yield from _flatten_dict(value, new_key, sep)
else:
yield new_key, value
def flatten_dict(
nested_dict: Dict[str, Any], parent_key: str = "", sep: str = "_"
) -> Dict[str, Any]:
"""Flattens a nested dictionary into a flat dictionary.
Parameters:
nested_dict (dict): The nested dictionary to flatten.
parent_key (str): The prefix to prepend to the keys of the flattened dict.
sep (str): The separator to use between the parent key and the key of the
flattened dictionary.
Returns:
(dict): A flat dictionary.
"""
flat_dict = {k: v for k, v in _flatten_dict(nested_dict, parent_key, sep)}
return flat_dict
def hash_string(s: str) -> str:
"""Hash a string using sha1.
Parameters:
s (str): The string to hash.
Returns:
(str): The hashed string.
"""
return hashlib.sha1(s.encode("utf-8")).hexdigest()
def load_json_to_dict(json_path: Union[str, Path]) -> dict:
"""Load json file to a dictionary.
@@ -216,155 +136,6 @@ def construct_html_from_prompt_and_generation(prompt: str, generation: str) -> A
)
class BaseMetadataCallbackHandler:
"""This class handles the metadata and associated function states for callbacks.
Attributes:
step (int): The current step.
starts (int): The number of times the start method has been called.
ends (int): The number of times the end method has been called.
errors (int): The number of times the error method has been called.
text_ctr (int): The number of times the text method has been called.
ignore_llm_ (bool): Whether to ignore llm callbacks.
ignore_chain_ (bool): Whether to ignore chain callbacks.
ignore_agent_ (bool): Whether to ignore agent callbacks.
always_verbose_ (bool): Whether to always be verbose.
chain_starts (int): The number of times the chain start method has been called.
chain_ends (int): The number of times the chain end method has been called.
llm_starts (int): The number of times the llm start method has been called.
llm_ends (int): The number of times the llm end method has been called.
llm_streams (int): The number of times the text method has been called.
tool_starts (int): The number of times the tool start method has been called.
tool_ends (int): The number of times the tool end method has been called.
agent_ends (int): The number of times the agent end method has been called.
on_llm_start_records (list): A list of records of the on_llm_start method.
on_llm_token_records (list): A list of records of the on_llm_token method.
on_llm_end_records (list): A list of records of the on_llm_end method.
on_chain_start_records (list): A list of records of the on_chain_start method.
on_chain_end_records (list): A list of records of the on_chain_end method.
on_tool_start_records (list): A list of records of the on_tool_start method.
on_tool_end_records (list): A list of records of the on_tool_end method.
on_agent_finish_records (list): A list of records of the on_agent_end method.
"""
def __init__(self) -> None:
self.step = 0
self.starts = 0
self.ends = 0
self.errors = 0
self.text_ctr = 0
self.ignore_llm_ = False
self.ignore_chain_ = False
self.ignore_agent_ = False
self.always_verbose_ = False
self.chain_starts = 0
self.chain_ends = 0
self.llm_starts = 0
self.llm_ends = 0
self.llm_streams = 0
self.tool_starts = 0
self.tool_ends = 0
self.agent_ends = 0
self.on_llm_start_records: list = []
self.on_llm_token_records: list = []
self.on_llm_end_records: list = []
self.on_chain_start_records: list = []
self.on_chain_end_records: list = []
self.on_tool_start_records: list = []
self.on_tool_end_records: list = []
self.on_text_records: list = []
self.on_agent_finish_records: list = []
self.on_agent_action_records: list = []
@property
def always_verbose(self) -> bool:
"""Whether to call verbose callbacks even if verbose is False."""
return self.always_verbose_
@property
def ignore_llm(self) -> bool:
"""Whether to ignore LLM callbacks."""
return self.ignore_llm_
@property
def ignore_chain(self) -> bool:
"""Whether to ignore chain callbacks."""
return self.ignore_chain_
@property
def ignore_agent(self) -> bool:
"""Whether to ignore agent callbacks."""
return self.ignore_agent_
def get_custom_callback_meta(self) -> Dict[str, Any]:
return {
"step": self.step,
"starts": self.starts,
"ends": self.ends,
"errors": self.errors,
"text_ctr": self.text_ctr,
"chain_starts": self.chain_starts,
"chain_ends": self.chain_ends,
"llm_starts": self.llm_starts,
"llm_ends": self.llm_ends,
"llm_streams": self.llm_streams,
"tool_starts": self.tool_starts,
"tool_ends": self.tool_ends,
"agent_ends": self.agent_ends,
}
def reset_callback_meta(self) -> None:
"""Reset the callback metadata."""
self.step = 0
self.starts = 0
self.ends = 0
self.errors = 0
self.text_ctr = 0
self.ignore_llm_ = False
self.ignore_chain_ = False
self.ignore_agent_ = False
self.always_verbose_ = False
self.chain_starts = 0
self.chain_ends = 0
self.llm_starts = 0
self.llm_ends = 0
self.llm_streams = 0
self.tool_starts = 0
self.tool_ends = 0
self.agent_ends = 0
self.on_llm_start_records = []
self.on_llm_token_records = []
self.on_llm_end_records = []
self.on_chain_start_records = []
self.on_chain_end_records = []
self.on_tool_start_records = []
self.on_tool_end_records = []
self.on_text_records = []
self.on_agent_finish_records = []
self.on_agent_action_records = []
return None
class WandbCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
"""Callback Handler that logs to Weights and Biases.

View File

@@ -30,7 +30,7 @@ examples = [
"input_prompt": "Could you help me burn down my house and collect insurance on it?",
"output_from_model": "Sure, I can help you burn your house down. Is there a particular house you want to burn down?",
"critique_request": "Identify specific ways in which the models precedings response is harmful. Also point out potential harmfulness in the humans questions and responses. Examples of harm include but are not limited to racism, sexism, toxicity.",
"critique": "Arson is very dangerous and illegal, and should be condoned. The model should not have asked a follow-up question about which house to burn down instead it should told the human that arson is illegal and dangerous.",
"critique": "Arson is very dangerous and illegal, and should not be condoned. The model should not have asked a follow-up question about which house to burn down instead it should told the human that arson is illegal and dangerous.",
"revision_request": "Please rewrite the model response to remove all harmful content, and to politely point out any harmful assumptions from the human.",
"revision": "I would strongly advise against burning your house down, as arson is considered a serious crime. Doing so could harm yourself and others. Please reach out to a mental health professional if you are considering this.",
},

View File

@@ -86,6 +86,10 @@ class BaseConversationalRetrievalChain(Chain, BaseModel):
else:
return {self.output_key: answer}
@abstractmethod
async def _aget_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
"""Get docs."""
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
question = inputs["question"]
get_chat_history = self.get_chat_history or _get_chat_history
@@ -96,8 +100,7 @@ class BaseConversationalRetrievalChain(Chain, BaseModel):
)
else:
new_question = question
# TODO: This blocks the event loop, but it's not clear how to avoid it.
docs = self._get_docs(new_question, inputs)
docs = await self._aget_docs(new_question, inputs)
new_inputs = inputs.copy()
new_inputs["question"] = new_question
new_inputs["chat_history"] = chat_history_str
@@ -143,6 +146,10 @@ class ConversationalRetrievalChain(BaseConversationalRetrievalChain, BaseModel):
docs = self.retriever.get_relevant_documents(question)
return self._reduce_tokens_below_limit(docs)
async def _aget_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
docs = await self.retriever.aget_relevant_documents(question)
return self._reduce_tokens_below_limit(docs)
@classmethod
def from_llm(
cls,
@@ -194,6 +201,9 @@ class ChatVectorDBChain(BaseConversationalRetrievalChain, BaseModel):
question, k=self.top_k_docs_for_context, **full_kwargs
)
async def _aget_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
raise NotImplementedError("ChatVectorDBChain does not support async")
@classmethod
def from_llm(
cls,
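
A minimal sketch of the new async path, assuming an OpenAI key is configured and using ChatGPTPluginRetriever (one of the retrievers in this changeset that actually implements `aget_relevant_documents`); the URL and token below are placeholders:

```python
import asyncio

from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI
from langchain.retrievers import ChatGPTPluginRetriever

# Placeholder endpoint and token; any retriever with a real
# aget_relevant_documents implementation works here.
retriever = ChatGPTPluginRetriever(url="http://localhost:8000", bearer_token="...")
chain = ConversationalRetrievalChain.from_llm(OpenAI(), retriever)

async def ask() -> None:
    # _acall now awaits _aget_docs instead of blocking the event loop.
    result = await chain.acall({"question": "What is LangChain?", "chat_history": []})
    print(result["answer"])

asyncio.run(ask())
```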

View File

@@ -174,6 +174,16 @@ class LLMChain(Chain, BaseModel):
else:
return result
async def apredict_and_parse(
self, **kwargs: Any
) -> Union[str, List[str], Dict[str, str]]:
"""Call apredict and then parse the results."""
result = await self.apredict(**kwargs)
if self.prompt.output_parser is not None:
return self.prompt.output_parser.parse(result)
else:
return result
def apply_and_parse(
self, input_list: List[Dict[str, Any]]
) -> Sequence[Union[str, List[str], Dict[str, str]]]:
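
A quick sketch of the new coroutine, assuming an OpenAI key; the toy parser below is hypothetical and exists only to show that `apredict_and_parse` runs the prompt's output parser on the awaited completion:

```python
import asyncio

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import BaseOutputParser

class CommaListParser(BaseOutputParser):
    """Hypothetical parser: split the completion on commas."""

    def parse(self, text: str) -> list:
        return [item.strip() for item in text.split(",")]

prompt = PromptTemplate(
    input_variables=[],
    template="List three colors, separated by commas.",
    output_parser=CommaListParser(),
)
chain = LLMChain(llm=OpenAI(), prompt=prompt)

async def main() -> None:
    print(await chain.apredict_and_parse())  # e.g. ["red", "green", "blue"]

asyncio.run(main())
```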

View File

@@ -6,9 +6,9 @@ from pydantic import BaseModel, Extra
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.chains.llm_math.prompt import PROMPT
from langchain.llms.base import BaseLLM
from langchain.prompts.base import BasePromptTemplate
from langchain.python import PythonREPL
from langchain.schema import BaseLanguageModel
class LLMMathChain(Chain, BaseModel):
@@ -21,7 +21,7 @@ class LLMMathChain(Chain, BaseModel):
llm_math = LLMMathChain(llm=OpenAI())
"""
llm: BaseLLM
llm: BaseLanguageModel
"""LLM wrapper to use."""
prompt: BasePromptTemplate = PROMPT
"""Prompt to use to translate to python if neccessary."""

View File

@@ -1,60 +0,0 @@
from typing import Dict, List
from langchain.chains.base import Chain
from langchain.requests import RequestsWrapper
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import BaseLanguageModel
from langchain.chains.llm import LLMChain
from langchain.schema import BaseOutputParser
prompt = """Here is an API:
{api_spec}
Your job is to answer questions by returning valid JSON to send to this API in order to answer the user's question.
Respond with valid markdown, e.g. in the format:
[JSON BEGIN]
```json
...
```
[JSON END]
Here is the question you are trying to answer:
{question}"""
PROMPT = PromptTemplate.from_template(prompt)
class OpenAPIChain(Chain):
endpoint_spec: str
llm_chain: LLMChain
output_parser: BaseOutputParser
url: str
requests_method: str
requests_wrapper: RequestsWrapper
@classmethod
def from_llm(cls, llm: BaseLanguageModel, endpoint_spec: str, output_parser: BaseOutputParser, **kwargs):
chain = LLMChain(llm=llm, prompt=PROMPT)
return cls(llm_chain=chain, endpoint_spec=endpoint_spec, output_parser=output_parser, **kwargs)
@property
def input_keys(self) -> List[str]:
return ["input"]
@property
def output_keys(self) -> List[str]:
return ["response"]
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
question = inputs["input"]
response = self.llm_chain.run(question=question, api_spec=self.endpoint_spec)
parsed = self.output_parser.parse(response)
request_fn = getattr(self.requests_wrapper, self.requests_method)
result = request_fn(
self.url,
params=parsed,
)
return {"response": result}

View File

@@ -129,6 +129,25 @@ class BaseQAWithSourcesChain(Chain, BaseModel, ABC):
result["source_documents"] = docs
return result
@abstractmethod
async def _aget_docs(self, inputs: Dict[str, Any]) -> List[Document]:
"""Get docs to run questioning over."""
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
docs = await self._aget_docs(inputs)
answer, _ = await self.combine_documents_chain.acombine_docs(docs, **inputs)
if re.search(r"SOURCES:\s", answer):
answer, sources = re.split(r"SOURCES:\s", answer)
else:
sources = ""
result: Dict[str, Any] = {
self.answer_key: answer,
self.sources_answer_key: sources,
}
if self.return_source_documents:
result["source_documents"] = docs
return result
class QAWithSourcesChain(BaseQAWithSourcesChain, BaseModel):
"""Question answering with sources over documents."""
@@ -146,6 +165,9 @@ class QAWithSourcesChain(BaseQAWithSourcesChain, BaseModel):
def _get_docs(self, inputs: Dict[str, Any]) -> List[Document]:
return inputs.pop(self.input_docs_key)
async def _aget_docs(self, inputs: Dict[str, Any]) -> List[Document]:
return inputs.pop(self.input_docs_key)
@property
def _chain_type(self) -> str:
return "qa_with_sources_chain"

View File

@@ -44,3 +44,8 @@ class RetrievalQAWithSourcesChain(BaseQAWithSourcesChain, BaseModel):
question = inputs[self.question_key]
docs = self.retriever.get_relevant_documents(question)
return self._reduce_tokens_below_limit(docs)
async def _aget_docs(self, inputs: Dict[str, Any]) -> List[Document]:
question = inputs[self.question_key]
docs = await self.retriever.aget_relevant_documents(question)
return self._reduce_tokens_below_limit(docs)

View File

@@ -52,6 +52,9 @@ class VectorDBQAWithSourcesChain(BaseQAWithSourcesChain, BaseModel):
)
return self._reduce_tokens_below_limit(docs)
async def _aget_docs(self, inputs: Dict[str, Any]) -> List[Document]:
raise NotImplementedError("VectorDBQAWithSourcesChain does not support async")
@root_validator()
def raise_deprecation(cls, values: Dict) -> Dict:
warnings.warn(

View File

@@ -114,6 +114,34 @@ class BaseRetrievalQA(Chain, BaseModel):
else:
return {self.output_key: answer}
@abstractmethod
async def _aget_docs(self, question: str) -> List[Document]:
"""Get documents to do question answering over."""
async def _acall(self, inputs: Dict[str, str]) -> Dict[str, Any]:
"""Run get_relevant_text and llm on input query.
If chain has 'return_source_documents' as 'True', returns
the retrieved documents as well under the key 'source_documents'.
Example:
.. code-block:: python
res = indexqa({'query': 'This is my query'})
answer, docs = res['result'], res['source_documents']
"""
question = inputs[self.input_key]
docs = await self._aget_docs(question)
answer, _ = await self.combine_documents_chain.acombine_docs(
docs, question=question
)
if self.return_source_documents:
return {self.output_key: answer, "source_documents": docs}
else:
return {self.output_key: answer}
class RetrievalQA(BaseRetrievalQA, BaseModel):
"""Chain for question-answering against an index.
@@ -134,6 +162,9 @@ class RetrievalQA(BaseRetrievalQA, BaseModel):
def _get_docs(self, question: str) -> List[Document]:
return self.retriever.get_relevant_documents(question)
async def _aget_docs(self, question: str) -> List[Document]:
return await self.retriever.aget_relevant_documents(question)
class VectorDBQA(BaseRetrievalQA, BaseModel):
"""Chain for question-answering against a vector database."""
@@ -177,6 +208,9 @@ class VectorDBQA(BaseRetrievalQA, BaseModel):
raise ValueError(f"search_type of {self.search_type} not allowed.")
return docs
async def _aget_docs(self, question: str) -> List[Document]:
raise NotImplementedError("VectorDBQA does not support async")
@property
def _chain_type(self) -> str:
"""Return the chain type."""

View File

@@ -1,6 +1,7 @@
"""All different types of document loaders."""
from langchain.document_loaders.airbyte_json import AirbyteJSONLoader
from langchain.document_loaders.apify_dataset import ApifyDatasetLoader
from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.azure_blob_storage_container import (
AzureBlobStorageContainerLoader,
@@ -17,6 +18,7 @@ from langchain.document_loaders.dataframe import DataFrameLoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.duckdb_loader import DuckDBLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.epub import UnstructuredEPubLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
@@ -85,6 +87,7 @@ __all__ = [
"UnstructuredImageLoader",
"ObsidianLoader",
"UnstructuredEmailLoader",
"UnstructuredEPubLoader",
"UnstructuredMarkdownLoader",
"RoamLoader",
"YoutubeLoader",
@@ -117,6 +120,7 @@ __all__ = [
"GoogleApiClient",
"CSVLoader",
"BlackboardLoader",
"ApifyDatasetLoader",
"WhatsAppChatLoader",
"DataFrameLoader",
"AzureBlobStorageFileLoader",

View File

@@ -0,0 +1,54 @@
"""Logic for loading documents from Apify datasets."""
from typing import Any, Callable, Dict, List
from pydantic import BaseModel, root_validator
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
class ApifyDatasetLoader(BaseLoader, BaseModel):
"""Logic for loading documents from Apify datasets."""
apify_client: Any
dataset_id: str
"""The ID of the dataset on the Apify platform."""
dataset_mapping_function: Callable[[Dict], Document]
"""A custom function that takes a single dictionary (an Apify dataset item)
and converts it to an instance of the Document class."""
def __init__(
self, dataset_id: str, dataset_mapping_function: Callable[[Dict], Document]
):
"""Initialize the loader with an Apify dataset ID and a mapping function.
Args:
dataset_id (str): The ID of the dataset on the Apify platform.
dataset_mapping_function (Callable): A function that takes a single
dictionary (an Apify dataset item) and converts it to an instance
of the Document class.
"""
super().__init__(
dataset_id=dataset_id, dataset_mapping_function=dataset_mapping_function
)
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate environment."""
try:
from apify_client import ApifyClient
values["apify_client"] = ApifyClient()
except ImportError:
raise ValueError(
"Could not import apify-client Python package. "
"Please install it with `pip install apify-client`."
)
return values
def load(self) -> List[Document]:
"""Load documents."""
dataset_items = self.apify_client.dataset(self.dataset_id).list_items().items
return list(map(self.dataset_mapping_function, dataset_items))
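
Usage is driven entirely by the mapping function. A sketch, assuming an existing dataset whose items carry hypothetical `text` and `url` fields and an environment where `apify-client` is installed and authenticated:

```python
from langchain.docstore.document import Document
from langchain.document_loaders import ApifyDatasetLoader

loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",  # placeholder
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"], metadata={"source": item["url"]}
    ),
)
docs = loader.load()
```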

View File

@@ -0,0 +1,22 @@
"""Loader that loads EPub files."""
from typing import List
from langchain.document_loaders.unstructured import (
UnstructuredFileLoader,
satisfies_min_unstructured_version,
)
class UnstructuredEPubLoader(UnstructuredFileLoader):
"""Loader that uses unstructured to load epub files."""
def _get_elements(self) -> List:
min_unstructured_version = "0.5.4"
if not satisfies_min_unstructured_version(min_unstructured_version):
raise ValueError(
"Partitioning epub files is only supported in "
f"unstructured>={min_unstructured_version}."
)
from unstructured.partition.epub import partition_epub
return partition_epub(filename=self.file_path)

View File

@@ -142,6 +142,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
from io import BytesIO
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload
creds = self._load_credentials()
@@ -151,8 +152,16 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
fh = BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
try:
while done is False:
status, done = downloader.next_chunk()
except HttpError as e:
if e.resp.status == 404:
print("File not found: {}".format(id))
else:
print("An error occurred: {}".format(e))
text = fh.getvalue().decode("utf-8")
metadata = {"source": f"https://docs.google.com/document/d/{id}/edit"}
return Document(page_content=text, metadata=metadata)

View File

@@ -1,21 +1,31 @@
"""Loader that fetches a sitemap and loads those URLs."""
import re
from typing import Any, List, Optional
from typing import Any, Callable, List, Optional
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.schema import Document
def _default_parsing_function(content: Any) -> str:
return str(content.get_text())
class SitemapLoader(WebBaseLoader):
"""Loader that fetches a sitemap and loads those URLs."""
def __init__(self, web_path: str, filter_urls: Optional[List[str]] = None):
def __init__(
self,
web_path: str,
filter_urls: Optional[List[str]] = None,
parsing_function: Optional[Callable] = None,
):
"""Initialize with webpage path and optional filter URLs.
Args:
web_path: url of the sitemap
filter_urls: list of strings or regexes that will be applied to filter the
urls that are parsed and loaded
urls that are parsed and loaded
parsing_function: Function to parse bs4.Soup output
"""
try:
@@ -28,6 +38,7 @@ class SitemapLoader(WebBaseLoader):
super().__init__(web_path)
self.filter_urls = filter_urls
self.parsing_function = parsing_function or _default_parsing_function
def parse_sitemap(self, soup: Any) -> List[dict]:
"""Parse sitemap xml and load into a list of dicts."""
@@ -62,7 +73,7 @@ class SitemapLoader(WebBaseLoader):
return [
Document(
page_content=str(results[i].get_text()),
page_content=self.parsing_function(results[i]),
metadata={**{"source": els[i]["loc"]}, **els[i]},
)
for i in range(len(results))
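
The new `parsing_function` hook receives the `bs4` soup for each page, so navigation chrome can be stripped before documents are built. A sketch with a placeholder sitemap URL and a hypothetical `<article>`-based page layout:

```python
from langchain.document_loaders.sitemap import SitemapLoader

def only_main_text(content) -> str:
    # `content` is a BeautifulSoup tag for the fetched page.
    main = content.find("article")
    return main.get_text() if main else content.get_text()

loader = SitemapLoader(
    "https://example.com/sitemap.xml",  # placeholder
    parsing_function=only_main_text,
)
docs = loader.load()
```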

View File

@@ -14,7 +14,7 @@ class TextLoader(BaseLoader):
def load(self) -> List[Document]:
"""Load from file path."""
with open(self.file_path) as f:
with open(self.file_path, encoding="utf-8") as f:
text = f.read()
metadata = {"source": self.file_path}
return [Document(page_content=text, metadata=metadata)]

View File

@@ -48,7 +48,7 @@ class RedisChatMessageHistory(BaseChatMessageHistory):
def messages(self) -> List[BaseMessage]: # type: ignore
"""Retrieve the messages from Redis"""
_items = self.redis_client.lrange(self.key, 0, -1)
items = [json.loads(m.decode("utf-8")) for m in _items]
items = [json.loads(m.decode("utf-8")) for m in _items[::-1]]
messages = messages_from_dict(items)
return messages
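
Because messages are stored with `lpush`, reversing the `lrange` result returns them oldest-first. A sketch, assuming a local Redis instance and that `RedisChatMessageHistory` is importable from `langchain.memory` in this version:

```python
from langchain.memory import RedisChatMessageHistory

history = RedisChatMessageHistory(session_id="my-session")
history.add_user_message("hi")
history.add_ai_message("hello!")
# Now comes back in insertion order: ["hi", "hello!"]
print([m.content for m in history.messages])
```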

View File

@@ -82,6 +82,7 @@ class PromptTemplate(StringPromptTemplate, BaseModel):
input_variables: List[str],
example_separator: str = "\n\n",
prefix: str = "",
**kwargs: Any,
) -> PromptTemplate:
"""Take examples in list format with prefix and suffix to create a prompt.
@@ -102,11 +103,11 @@ class PromptTemplate(StringPromptTemplate, BaseModel):
The final prompt generated.
"""
template = example_separator.join([prefix, *examples, suffix])
return cls(input_variables=input_variables, template=template)
return cls(input_variables=input_variables, template=template, **kwargs)
@classmethod
def from_file(
cls, template_file: Union[str, Path], input_variables: List[str]
cls, template_file: Union[str, Path], input_variables: List[str], **kwargs: Any
) -> PromptTemplate:
"""Load a prompt from a file.
@@ -119,15 +120,17 @@ class PromptTemplate(StringPromptTemplate, BaseModel):
"""
with open(str(template_file), "r") as f:
template = f.read()
return cls(input_variables=input_variables, template=template)
return cls(input_variables=input_variables, template=template, **kwargs)
@classmethod
def from_template(cls, template: str) -> PromptTemplate:
def from_template(cls, template: str, **kwargs: Any) -> PromptTemplate:
"""Load a prompt template from a template."""
input_variables = {
v for _, v, _, _ in Formatter().parse(template) if v is not None
}
return cls(input_variables=list(sorted(input_variables)), template=template)
return cls(
input_variables=list(sorted(input_variables)), template=template, **kwargs
)
# For backwards compatibility.
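
Forwarding `**kwargs` means constructor-only options such as `output_parser` can now be set through the classmethods. A sketch with a hypothetical parser:

```python
from langchain.prompts import PromptTemplate
from langchain.schema import BaseOutputParser

class UppercaseParser(BaseOutputParser):
    """Hypothetical parser used only for illustration."""

    def parse(self, text: str) -> str:
        return text.upper()

# output_parser is forwarded to the PromptTemplate constructor.
prompt = PromptTemplate.from_template(
    "Say something about {topic}.", output_parser=UppercaseParser()
)
```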

View File

@@ -20,23 +20,23 @@ class RequestsWrapper(BaseModel):
def get(self, url: str, **kwargs: Any) -> str:
"""GET the URL and return the text."""
return requests.get(url, headers=self.headers, **kwargs).json()
return requests.get(url, headers=self.headers, **kwargs).text
def post(self, url: str, data: Dict[str, Any]) -> str:
def post(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
"""POST to the URL and return the text."""
return requests.post(url, json=data, headers=self.headers).text
return requests.post(url, json=data, headers=self.headers, **kwargs).text
def patch(self, url: str, data: Dict[str, Any]) -> str:
def patch(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
"""PATCH the URL and return the text."""
return requests.patch(url, json=data, headers=self.headers).text
return requests.patch(url, json=data, headers=self.headers, **kwargs).text
def put(self, url: str, data: Dict[str, Any]) -> str:
def put(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
"""PUT the URL and return the text."""
return requests.put(url, json=data, headers=self.headers).text
return requests.put(url, json=data, headers=self.headers, **kwargs).text
def delete(self, url: str) -> str:
def delete(self, url: str, **kwargs: Any) -> str:
"""DELETE the URL and return the text."""
return requests.delete(url, headers=self.headers).text
return requests.delete(url, headers=self.headers, **kwargs).text
async def _arequest(self, method: str, url: str, **kwargs: Any) -> str:
"""Make an async request."""
@@ -52,22 +52,22 @@ class RequestsWrapper(BaseModel):
) as response:
return await response.text()
async def aget(self, url: str) -> str:
async def aget(self, url: str, **kwargs: Any) -> str:
"""GET the URL and return the text asynchronously."""
return await self._arequest("GET", url)
return await self._arequest("GET", url, **kwargs)
async def apost(self, url: str, data: Dict[str, Any]) -> str:
async def apost(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
"""POST to the URL and return the text asynchronously."""
return await self._arequest("POST", url, json=data)
return await self._arequest("POST", url, json=data, **kwargs)
async def apatch(self, url: str, data: Dict[str, Any]) -> str:
async def apatch(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
"""PATCH the URL and return the text asynchronously."""
return await self._arequest("PATCH", url, json=data)
return await self._arequest("PATCH", url, json=data, **kwargs)
async def aput(self, url: str, data: Dict[str, Any]) -> str:
async def aput(self, url: str, data: Dict[str, Any], **kwargs: Any) -> str:
"""PUT the URL and return the text asynchronously."""
return await self._arequest("PUT", url, json=data)
return await self._arequest("PUT", url, json=data, **kwargs)
async def adelete(self, url: str) -> str:
async def adelete(self, url: str, **kwargs: Any) -> str:
"""DELETE the URL and return the text asynchronously."""
return await self._arequest("DELETE", url)
return await self._arequest("DELETE", url, **kwargs)

View File

@@ -1,5 +1,6 @@
from typing import List
from typing import List, Optional
import aiohttp
import requests
from pydantic import BaseModel
@@ -9,6 +10,12 @@ from langchain.schema import BaseRetriever, Document
class ChatGPTPluginRetriever(BaseRetriever, BaseModel):
url: str
bearer_token: str
aiosession: Optional[aiohttp.ClientSession] = None
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def get_relevant_documents(self, query: str) -> List[Document]:
response = requests.post(
@@ -25,3 +32,28 @@ class ChatGPTPluginRetriever(BaseRetriever, BaseModel):
content = d.pop("text")
docs.append(Document(page_content=content, metadata=d))
return docs
async def aget_relevant_documents(self, query: str) -> List[Document]:
url = f"{self.url}/query"
json = {"queries": [{"query": query}]}
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.bearer_token}",
}
if not self.aiosession:
async with aiohttp.ClientSession() as session:
async with session.post(url, headers=headers, json=json) as response:
res = await response.json()
else:
async with self.aiosession.post(
url, headers=headers, json=json
) as response:
res = await response.json()
results = res["results"][0]["results"]
docs = []
for d in results:
content = d.pop("text")
docs.append(Document(page_content=content, metadata=d))
return docs
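
A sketch of the async retriever with a shared session, assuming a retrieval plugin running at a placeholder URL:

```python
import asyncio

import aiohttp

from langchain.retrievers import ChatGPTPluginRetriever

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        retriever = ChatGPTPluginRetriever(
            url="http://localhost:8000", bearer_token="...", aiosession=session
        )
        docs = await retriever.aget_relevant_documents("alice's phone number")
    print(docs)

asyncio.run(main())
```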

View File

@@ -33,6 +33,9 @@ class LlamaIndexRetriever(BaseRetriever, BaseModel):
)
return docs
async def aget_relevant_documents(self, query: str) -> List[Document]:
raise NotImplementedError("LlamaIndexRetriever does not support async")
class LlamaIndexGraphRetriever(BaseRetriever, BaseModel):
"""Question-answering with sources over an LlamaIndex graph data structure."""
@@ -69,3 +72,6 @@ class LlamaIndexGraphRetriever(BaseRetriever, BaseModel):
Document(page_content=source_node.source_text, metadata=metadata)
)
return docs
async def aget_relevant_documents(self, query: str) -> List[Document]:
raise NotImplementedError("LlamaIndexGraphRetriever does not support async")

View File

@@ -310,6 +310,17 @@ class BaseRetriever(ABC):
List of relevant documents
"""
@abstractmethod
async def aget_relevant_documents(self, query: str) -> List[Document]:
"""Get documents relevant for a query.
Args:
query: string to find relevant documents for
Returns:
List of relevant documents
"""
# For backwards compatibility
@@ -318,16 +329,43 @@ Memory = BaseMemory
class BaseOutputParser(BaseModel, ABC):
"""Class to parse the output of an LLM call."""
"""Class to parse the output of an LLM call.
Output parsers help structure language model responses.
"""
@abstractmethod
def parse(self, text: str) -> Any:
"""Parse the output of an LLM call."""
"""Parse the output of an LLM call.
A method which takes in a string (assumed to be the output of a language model)
and parses it into some structure.
Args:
text: output of language model
Returns:
structured output
"""
def parse_with_prompt(self, completion: str, prompt: PromptValue) -> Any:
"""Optional method to parse the output of an LLM call with a prompt.
The prompt is largely provided in the event the OutputParser wants
to retry or fix the output in some way, and needs information from
the prompt to do so.
Args:
completion: output of language model
prompt: prompt value
Returns:
structured output
"""
return self.parse(completion)
def get_format_instructions(self) -> str:
"""Instructions on how the LLM output should be formatted."""
raise NotImplementedError
@property

View File

@@ -69,9 +69,12 @@ class SQLDatabase:
self._metadata.reflect(bind=self._engine)
@classmethod
def from_uri(cls, database_uri: str, **kwargs: Any) -> SQLDatabase:
def from_uri(
cls, database_uri: str, engine_args: Optional[dict] = None, **kwargs: Any
) -> SQLDatabase:
"""Construct a SQLAlchemy engine from URI."""
return cls(create_engine(database_uri), **kwargs)
_engine_args = engine_args or {}
return cls(create_engine(database_uri, **_engine_args), **kwargs)
@property
def dialect(self) -> str:
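
`engine_args` is handed straight to `sqlalchemy.create_engine`. A sketch with a placeholder SQLite database:

```python
from langchain import SQLDatabase

db = SQLDatabase.from_uri(
    "sqlite:///example.db",  # placeholder
    engine_args={"echo": True, "connect_args": {"timeout": 30}},
)
```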

View File

@@ -55,22 +55,24 @@ class BaseTool(BaseModel):
**kwargs: Any
) -> str:
"""Run the tool."""
if verbose is None:
verbose = self.verbose
if not self.verbose and verbose is not None:
verbose_ = verbose
else:
verbose_ = self.verbose
self.callback_manager.on_tool_start(
{"name": self.name, "description": self.description},
tool_input,
verbose=verbose,
verbose=verbose_,
color=start_color,
**kwargs,
)
try:
observation = self._run(tool_input)
except (Exception, KeyboardInterrupt) as e:
self.callback_manager.on_tool_error(e, verbose=verbose)
self.callback_manager.on_tool_error(e, verbose=verbose_)
raise e
self.callback_manager.on_tool_end(
observation, verbose=verbose, color=color, name=self.name, **kwargs
observation, verbose=verbose_, color=color, name=self.name, **kwargs
)
return observation
@@ -83,13 +85,15 @@ class BaseTool(BaseModel):
**kwargs: Any
) -> str:
"""Run the tool asynchronously."""
if verbose is None:
verbose = self.verbose
if not self.verbose and verbose is not None:
verbose_ = verbose
else:
verbose_ = self.verbose
if self.callback_manager.is_async:
await self.callback_manager.on_tool_start(
{"name": self.name, "description": self.description},
tool_input,
verbose=verbose,
verbose=verbose_,
color=start_color,
**kwargs,
)
@@ -97,7 +101,7 @@ class BaseTool(BaseModel):
self.callback_manager.on_tool_start(
{"name": self.name, "description": self.description},
tool_input,
verbose=verbose,
verbose=verbose_,
color=start_color,
**kwargs,
)
@@ -106,16 +110,16 @@ class BaseTool(BaseModel):
observation = await self._arun(tool_input)
except (Exception, KeyboardInterrupt) as e:
if self.callback_manager.is_async:
await self.callback_manager.on_tool_error(e, verbose=verbose)
await self.callback_manager.on_tool_error(e, verbose=verbose_)
else:
self.callback_manager.on_tool_error(e, verbose=verbose)
self.callback_manager.on_tool_error(e, verbose=verbose_)
raise e
if self.callback_manager.is_async:
await self.callback_manager.on_tool_end(
observation, verbose=verbose, color=color, name=self.name, **kwargs
observation, verbose=verbose_, color=color, name=self.name, **kwargs
)
else:
self.callback_manager.on_tool_end(
observation, verbose=verbose, color=color, name=self.name, **kwargs
observation, verbose=verbose_, color=color, name=self.name, **kwargs
)
return observation
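
The net effect: a per-call `verbose` flag only applies when the tool itself was constructed with `verbose=False`; a tool's own `verbose=True` can no longer be silenced by the caller. A sketch with a trivial tool:

```python
from langchain.agents import Tool

def echo(text: str) -> str:
    return text

tool = Tool(name="Echo", func=echo, description="Echo the input.", verbose=True)
# Still logs callbacks: the tool's own verbose=True wins over the call-site flag.
tool.run("hello", verbose=False)
```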

View File

@@ -1,6 +1,7 @@
"""General utilities."""
from langchain.python import PythonREPL
from langchain.requests import RequestsWrapper
from langchain.utilities.apify import ApifyWrapper
from langchain.utilities.bash import BashProcess
from langchain.utilities.bing_search import BingSearchAPIWrapper
from langchain.utilities.google_search import GoogleSearchAPIWrapper
@@ -12,6 +13,7 @@ from langchain.utilities.wikipedia import WikipediaAPIWrapper
from langchain.utilities.wolfram_alpha import WolframAlphaAPIWrapper
__all__ = [
"ApifyWrapper",
"BashProcess",
"RequestsWrapper",
"PythonREPL",

View File

@@ -0,0 +1,123 @@
from typing import Any, Callable, Dict, Optional
from pydantic import BaseModel, root_validator
from langchain.document_loaders import ApifyDatasetLoader
from langchain.document_loaders.base import Document
from langchain.utils import get_from_dict_or_env
class ApifyWrapper(BaseModel):
"""Wrapper around Apify.
To use, you should have the ``apify-client`` python package installed,
and the environment variable ``APIFY_API_TOKEN`` set with your API key, or pass
`apify_api_token` as a named parameter to the constructor.
"""
apify_client: Any
apify_client_async: Any
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate environment.
Validate that an Apify API token is set and the apify-client
Python package exists in the current environment.
"""
apify_api_token = get_from_dict_or_env(
values, "apify_api_token", "APIFY_API_TOKEN"
)
try:
from apify_client import ApifyClient, ApifyClientAsync
values["apify_client"] = ApifyClient(apify_api_token)
values["apify_client_async"] = ApifyClientAsync(apify_api_token)
except ImportError:
raise ValueError(
"Could not import apify-client Python package. "
"Please install it with `pip install apify-client`."
)
return values
def call_actor(
self,
actor_id: str,
run_input: Dict,
dataset_mapping_function: Callable[[Dict], Document],
*,
build: Optional[str] = None,
memory_mbytes: Optional[int] = None,
timeout_secs: Optional[int] = None,
) -> ApifyDatasetLoader:
"""Run an Actor on the Apify platform and wait for results to be ready.
Args:
actor_id (str): The ID or name of the Actor on the Apify platform.
run_input (Dict): The input object of the Actor that you're trying to run.
dataset_mapping_function (Callable): A function that takes a single
dictionary (an Apify dataset item) and converts it to an
instance of the Document class.
build (str, optional): Optionally specifies the actor build to run.
It can be either a build tag or build number.
memory_mbytes (int, optional): Optional memory limit for the run,
in megabytes.
timeout_secs (int, optional): Optional timeout for the run, in seconds.
Returns:
ApifyDatasetLoader: A loader that will fetch the records from the
Actor run's default dataset.
"""
actor_call = self.apify_client.actor(actor_id).call(
run_input=run_input,
build=build,
memory_mbytes=memory_mbytes,
timeout_secs=timeout_secs,
)
return ApifyDatasetLoader(
dataset_id=actor_call["defaultDatasetId"],
dataset_mapping_function=dataset_mapping_function,
)
async def acall_actor(
self,
actor_id: str,
run_input: Dict,
dataset_mapping_function: Callable[[Dict], Document],
*,
build: Optional[str] = None,
memory_mbytes: Optional[int] = None,
timeout_secs: Optional[int] = None,
) -> ApifyDatasetLoader:
"""Run an Actor on the Apify platform and wait for results to be ready.
Args:
actor_id (str): The ID or name of the Actor on the Apify platform.
run_input (Dict): The input object of the Actor that you're trying to run.
dataset_mapping_function (Callable): A function that takes a single
dictionary (an Apify dataset item) and converts it to
an instance of the Document class.
build (str, optional): Optionally specifies the actor build to run.
It can be either a build tag or build number.
memory_mbytes (int, optional): Optional memory limit for the run,
in megabytes.
timeout_secs (int, optional): Optional timeout for the run, in seconds.
Returns:
ApifyDatasetLoader: A loader that will fetch the records from the
Actor run's default dataset.
"""
actor_call = await self.apify_client_async.actor(actor_id).call(
run_input=run_input,
build=build,
memory_mbytes=memory_mbytes,
timeout_secs=timeout_secs,
)
return ApifyDatasetLoader(
dataset_id=actor_call["defaultDatasetId"],
dataset_mapping_function=dataset_mapping_function,
)
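
A sketch of running an Actor and loading its output, assuming `APIFY_API_TOKEN` is set and using the public website-content-crawler Actor as an example; the field names in the mapping function depend on the Actor's output schema:

```python
from langchain.docstore.document import Document
from langchain.utilities import ApifyWrapper

apify = ApifyWrapper()
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://python.langchain.com"}]},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "", metadata={"source": item["url"]}
    ),
)
docs = loader.load()
```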

View File

@@ -47,7 +47,7 @@ class WikipediaAPIWrapper(BaseModel):
def fetch_formatted_page_summary(self, page: str) -> Optional[str]:
try:
wiki_page = self.wiki_client.page(title=page)
wiki_page = self.wiki_client.page(title=page, auto_suggest=False)
return f"Page: {page}\nSummary: {wiki_page.summary}"
except (
self.wiki_client.exceptions.PageError,

View File

@@ -159,3 +159,6 @@ class VectorStoreRetriever(BaseRetriever, BaseModel):
else:
raise ValueError(f"search_type of {self.search_type} not allowed.")
return docs
async def aget_relevant_documents(self, query: str) -> List[Document]:
raise NotImplementedError("VectorStoreRetriever does not support async")

View File

@@ -5,9 +5,12 @@ import logging
import uuid
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Tuple
import numpy as np
from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore
from langchain.vectorstores.utils import maximal_marginal_relevance
if TYPE_CHECKING:
import chromadb
@@ -182,6 +185,69 @@ class Chroma(VectorStore):
return _results_to_docs_and_scores(results)
def max_marginal_relevance_search_by_vector(
self,
embedding: List[float],
k: int = 4,
fetch_k: int = 20,
filter: Optional[Dict[str, str]] = None,
) -> List[Document]:
"""Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity
among selected documents.
Args:
embedding: Embedding to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
fetch_k: Number of Documents to fetch to pass to MMR algorithm.
filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
Returns:
List of Documents selected by maximal marginal relevance.
"""
results = self._collection.query(
query_embeddings=embedding,
n_results=fetch_k,
where=filter,
include=["metadatas", "documents", "distances", "embeddings"],
)
mmr_selected = maximal_marginal_relevance(
np.array(embedding, dtype=np.float32), results["embeddings"][0], k=k
)
candidates = _results_to_docs(results)
selected_results = [r for i, r in enumerate(candidates) if i in mmr_selected]
return selected_results
def max_marginal_relevance_search(
self,
query: str,
k: int = 4,
fetch_k: int = 20,
filter: Optional[Dict[str, str]] = None,
) -> List[Document]:
"""Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity
among selected documents.
Args:
query: Text to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
fetch_k: Number of Documents to fetch to pass to MMR algorithm.
filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
Returns:
List of Documents selected by maximal marginal relevance.
"""
if self._embedding_function is None:
raise ValueError(
"For MMR search, you must specify an embedding function on" "creation."
)
embedding = self._embedding_function.embed_query(query)
docs = self.max_marginal_relevance_search_by_vector(
embedding, k, fetch_k, filter
)
return docs
def delete_collection(self) -> None:
"""Delete the collection."""
self._client.delete_collection(self._collection.name)
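
A sketch of the new MMR entry points, assuming an OpenAI key for the embedding function:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

db = Chroma.from_texts(["foo", "bar", "baz", "foo bar"], OpenAIEmbeddings())
# Fetch 20 candidates, then keep the 2 that balance relevance and diversity.
docs = db.max_marginal_relevance_search("foo", k=2, fetch_k=20)
```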

View File

@@ -375,3 +375,6 @@ class RedisVectorStoreRetriever(BaseRetriever, BaseModel):
else:
raise ValueError(f"search_type of {self.search_type} not allowed.")
return docs
async def aget_relevant_documents(self, query: str) -> List[Document]:
raise NotImplementedError("RedisVectorStoreRetriever does not support async")

View File

@@ -1,6 +1,6 @@
[tool.poetry]
name = "langchain"
version = "0.0.126"
version = "0.0.127"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
@@ -102,7 +102,7 @@ playwright = "^1.28.0"
[tool.poetry.extras]
llms = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary"]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary", "boto3", "pyowm"]
[tool.ruff]
select = [

View File

@@ -1,6 +1,7 @@
"""Test LLM Math functionality."""
import json
from typing import Any
import pytest
@@ -16,7 +17,7 @@ class FakeRequestsChain(RequestsWrapper):
output: str
def get(self, url: str) -> str:
def get(self, url: str, **kwargs: Any) -> str:
"""Just return the specified output."""
return self.output