Mirror of https://github.com/hwchase17/langchain.git (synced 2026-02-09 18:51:07 +00:00)

Compare commits: vwp/simpli...ankush/cal (41 commits)

| SHA1 |
|---|
| f090b94eb5 |
| 9a7488a5ce |
| 20ec1173f4 |
| 949729ff5c |
| c5a7a85a4e |
| 3c6fa9126a |
| d784401215 |
| 71a7c16ee0 |
| d1f65d8dc1 |
| 8b3df18bcc |
| 6655f43282 |
| 28d6277396 |
| db45970a66 |
| 4c572ffe95 |
| 001b147450 |
| 8441cff1d7 |
| 6258f72a00 |
| 14a611775c |
| 80b3fdf2f7 |
| 6632188606 |
| 6afb463e9b |
| 47c2ec2d0b |
| 342b671d05 |
| 983a213bdc |
| 22603d19e0 |
| 373ad49157 |
| bc66b3fb8d |
| 3bae595182 |
| 8d07ba0d51 |
| b61f50665e |
| 0ad76c3380 |
| bd9e0f3934 |
| 359fb8fa3a |
| 4c8aad0d1b |
| d765d77e9b |
| af41cdfc8b |
| 226a7521ed |
| 5ffa924488 |
| 6b47aaab82 |
| f39340ff6b |
| ea09c0846f |

`.github/PULL_REQUEST_TEMPLATE.md` (vendored, 8 changed lines)

@@ -1,5 +1,3 @@
# Your PR Title (What it does)

<!--
Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution.

@@ -14,7 +12,7 @@ Finally, we'd love to show appreciation for your contribution - if you'd like us

Fixes # (issue)

## Before submitting
#### Before submitting

<!-- If you're adding a new integration, please include:

@@ -28,9 +26,9 @@ etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

## Who can review?
#### Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:
Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

`.gitignore` (vendored, 5 changed lines)

@@ -149,4 +149,7 @@ wandb/

# integration test artifacts
data_map*
[('_type', 'fake'), ('stop', None)]
[('_type', 'fake'), ('stop', None)]

# Replit files
*replit*

@@ -1,13 +1,14 @@
# Tutorials

This is a collection of `LangChain` tutorials mostly on `YouTube`.
⛓ icon marks a new addition [last update 2023-05-15]

⛓ icon marks a new video [last update 2023-05-15]
### DeepLearning.AI course
⛓ [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain) by Harrison Chase presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)

###
### Handbook
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**

###
### Tutorials
[LangChain Tutorials](https://www.youtube.com/watch?v=FuqdVNB_8c0&list=PL9V0lbeJ69brU-ojMpU1Y7Ic58Tap0Cw6) by [Edrick](https://www.youtube.com/@edrickdch):
- ⛓ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)

@@ -108,4 +109,4 @@ LangChain by [Chat with data](https://www.youtube.com/@chatwithdata)
- ⛓ [Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations](https://youtu.be/CyuUlf54wTs)

---------------------
⛓ icon marks a new video [last update 2023-05-15]
⛓ icon marks a new addition [last update 2023-05-15]

`docs/integrations/agent_with_wandb_tracing.ipynb` (Normal file, 184 changed lines)

File diff suppressed because one or more lines are too long

`docs/integrations/argilla.md` (Normal file, 29 changed lines)

@@ -0,0 +1,29 @@
# Argilla

>[Argilla](https://argilla.io/) is an open-source data curation platform for LLMs.
> Using Argilla, everyone can build robust language models through faster data curation
> using both human and machine feedback. We provide support for each step in the MLOps cycle,
> from data labeling to model monitoring.

## Installation and Setup

First, you'll need to install the `argilla` Python package as follows:

```bash
pip install argilla --upgrade
```

If you already have an Argilla Server running, then you're good to go. If you don't, refer to
[Argilla - 🚀 Quickstart](https://docs.argilla.io/en/latest/getting_started/quickstart.html#Running-Argilla-Quickstart) to deploy Argilla on Hugging Face Spaces, locally, or on a server.

## Tracking

See a [usage example of `ArgillaCallbackHandler`](../modules/callbacks/examples/examples/argilla.ipynb).

```python
from langchain.callbacks import ArgillaCallbackHandler
```

`docs/integrations/discord.md` (Normal file, 30 changed lines)

@@ -0,0 +1,30 @@
# Discord

>[Discord](https://discord.com/) is a VoIP and instant messaging social platform. Users have the ability to communicate
> with voice calls, video calls, text messaging, media and files in private chats or as part of communities called
> "servers". A server is a collection of persistent chat rooms and voice channels which can be accessed via invite links.

## Installation and Setup

First, you need to install the `pandas` Python package.

```bash
pip install pandas
```

Follow these steps to download your `Discord` data:

1. Go to your **User Settings**
2. Then go to **Privacy and Safety**
3. Scroll to **Request all of my Data** and click the **Request Data** button

It can take up to 30 days to receive your data. You'll receive an email at the address registered
with Discord; that email contains a download button you can use to download your personal Discord data.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/discord.ipynb).

```python
from langchain.document_loaders import DiscordChatLoader
```

@@ -1,25 +1,20 @@
# Docugami

>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests of
> XML semantic trees representing entire documents.
> This is a rich representation that includes the semantic and
>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests
> of XML semantic trees representing entire documents. This is a rich representation that includes the semantic and
> structural characteristics of various chunks in the document as an XML tree.

## Installation and Setup

## Quick start

1. Create a Docugami workspace: <a href="http://www.docugami.com">http://www.docugami.com</a> (free trials available)
2. Add your documents (PDF, DOCX or DOC) and allow Docugami to ingest and cluster them into sets of similar documents, e.g. NDAs, Lease Agreements, and Service Agreements. There is no fixed set of document types supported by the system, the clusters created depend on your particular documents, and you can [change the docset assignments](https://help.docugami.com/home/working-with-the-doc-sets-view) later.
3. Create an access token via the Developer Playground for your workspace. Detailed instructions: https://help.docugami.com/home/docugami-api
4. Explore the Docugami API at <a href="https://api-docs.docugami.com">https://api-docs.docugami.com</a> to get a list of your processed docset IDs, or just the document IDs for a particular docset.
6. Use the DocugamiLoader as detailed in [this notebook](../modules/indexes/document_loaders/examples/docugami.ipynb), to get rich semantic chunks for your documents.
7. Optionally, build and publish one or more [reports or abstracts](https://help.docugami.com/home/reports). This helps Docugami improve the semantic XML with better tags based on your preferences, which are then added to the DocugamiLoader output as metadata. Use techniques like [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) to do high accuracy Document QA.
```bash
pip install lxml
```

## Advantages vs Other Chunking Techniques
## Document Loader

Appropriate chunking of your documents is critical for retrieval from documents. Many chunking techniques exist, including simple ones that rely on whitespace and recursive chunk splitting based on character length. Docugami offers a different approach:
See a [usage example](../modules/indexes/document_loaders/examples/docugami.ipynb).

1. **Intelligent Chunking:** Docugami breaks down every document into a hierarchical semantic XML tree of chunks of varying sizes, from single words or numerical values to entire sections. These chunks follow the semantic contours of the document, providing a more meaningful representation than arbitrary length or simple whitespace-based chunking.
2. **Structured Representation:** In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.
3. **Semantic Annotations:** Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.
4. **Additional Metadata:** Chunks are also annotated with additional metadata, if a user has been using Docugami. This additional metadata can be used for high-accuracy Document QA without context window restrictions. See detailed code walk-through in [this notebook](../modules/indexes/document_loaders/examples/docugami.ipynb).
```python
from langchain.document_loaders import DocugamiLoader
```
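
The structure-aware chunking described above can be illustrated, very loosely, with a generic XML walk: each leaf element becomes a chunk, and the path of element names leading to it serves as metadata. This is a minimal sketch using the standard library's `xml.etree.ElementTree`; the lease snippet, its tag names (`Lease`, `Landlord`, `Tenant`, `RenewalDate`), and the `leaf_chunks` helper are hypothetical and do not reflect Docugami's actual XML schema or what `DocugamiLoader` returns.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical document; Docugami's real semantic XML is different and richer.
SAMPLE = """<Lease>
  <Parties>
    <Landlord>Acme Properties LLC</Landlord>
    <Tenant>Jane Doe</Tenant>
  </Parties>
  <Term>
    <RenewalDate>2024-01-01</RenewalDate>
  </Term>
</Lease>"""


def leaf_chunks(xml_text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: (path of tags, text) for every leaf element with text."""
    chunks: List[Tuple[str, str]] = []

    def walk(element: ET.Element, path: str) -> None:
        here = f"{path}/{element.tag}"
        children = list(element)
        # Only leaves with non-whitespace text become chunks.
        if not children and element.text and element.text.strip():
            chunks.append((here, element.text.strip()))
        for child in children:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return chunks


for path, text in leaf_chunks(SAMPLE):
    print(path, "=", text)
```

Because every chunk carries its path, a question like "which party is the `Landlord` in each lease?" reduces to filtering on that metadata, which is the kind of consistent cross-document lookup the list describes.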

`docs/integrations/duckdb.md` (Normal file, 19 changed lines)

@@ -0,0 +1,19 @@
# DuckDB

>[DuckDB](https://duckdb.org/) is an in-process SQL OLAP database management system.

## Installation and Setup

First, you need to install the `duckdb` Python package.

```bash
pip install duckdb
```

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/duckdb.ipynb).

```python
from langchain.document_loaders import DuckDBLoader
```
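
`DuckDBLoader` turns the result of a SQL query into documents. To show that row-to-document idea without extra dependencies, the sketch below deliberately swaps in Python's built-in `sqlite3` for `duckdb`; the `rows_to_documents` helper, its parameters, the `column: value` text layout, and the `page_content`/`metadata` dict shape are illustrative assumptions, not the loader's actual implementation. See the linked notebook for the loader's real options.

```python
import sqlite3
from typing import Dict, List, Optional, Sequence


def rows_to_documents(
    conn: sqlite3.Connection,
    query: str,
    page_content_columns: Optional[Sequence[str]] = None,
    metadata_columns: Sequence[str] = (),
) -> List[Dict[str, object]]:
    """Hypothetical helper: run `query` and turn each row into a document-like dict."""
    cursor = conn.execute(query)
    column_names = [desc[0] for desc in cursor.description]
    # By default every selected column goes into the text.
    content_cols = list(page_content_columns or column_names)
    documents = []
    for row in cursor.fetchall():
        record = dict(zip(column_names, row))
        page_content = "\n".join(f"{col}: {record[col]}" for col in content_cols)
        metadata = {col: record[col] for col in metadata_columns}
        documents.append({"page_content": page_content, "metadata": metadata})
    return documents


# Usage with an in-memory SQLite table standing in for a DuckDB database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (team TEXT, payroll REAL)")
conn.executemany("INSERT INTO teams VALUES (?, ?)", [("Nationals", 81.34), ("Reds", 82.20)])
for doc in rows_to_documents(conn, "SELECT team, payroll FROM teams", metadata_columns=["team"]):
    print(doc)
```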

`docs/integrations/evernote.md` (Normal file, 20 changed lines)

@@ -0,0 +1,20 @@
# EverNote

>[EverNote](https://evernote.com/) is intended for archiving and creating notes in which photos, audio and saved web content can be embedded. Notes are stored in virtual "notebooks" and can be tagged, annotated, edited, searched, and exported.

## Installation and Setup

First, you need to install the `lxml` and `html2text` Python packages.

```bash
pip install lxml
pip install html2text
```

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/evernote.ipynb).

```python
from langchain.document_loaders import EverNoteLoader
```
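
Evernote exports notes as `.enex` files: XML with one `<note>` element per note, whose `<content>` holds the note body as HTML-like ENML. The loader depends on `lxml` and `html2text` for this; the sketch below shows the same idea with only the standard library (`xml.etree.ElementTree` and `html.parser`) as a stand-in. The tiny inline export and the `parse_enex` helper are hypothetical and simplified; real exports also carry an XML declaration, a DOCTYPE, timestamps, tags, and attachments.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collects the non-blank text nodes of an HTML/ENML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.parts.append(data.strip())


def enml_to_text(enml: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(enml)
    extractor.close()
    return "\n".join(extractor.parts)


def parse_enex(enex_xml: str) -> List[Dict[str, str]]:
    """Hypothetical helper: return a {'title', 'content'} dict per <note>."""
    root = ET.fromstring(enex_xml)
    notes = []
    for note in root.iter("note"):
        title = note.findtext("title", default="")
        enml = note.findtext("content", default="")  # CDATA arrives as plain text
        notes.append({"title": title, "content": enml_to_text(enml)})
    return notes


SAMPLE_ENEX = """<en-export>
  <note>
    <title>Shopping list</title>
    <content><![CDATA[<en-note><div>Milk</div><div>Eggs</div></en-note>]]></content>
  </note>
</en-export>"""

print(parse_enex(SAMPLE_ENEX))
```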

`docs/integrations/facebook_chat.md` (Normal file, 21 changed lines)

@@ -0,0 +1,21 @@
# Facebook Chat

>[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and
> platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its
> messaging service in 2010.

## Installation and Setup

First, you need to install the `pandas` Python package.

```bash
pip install pandas
```

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/facebook_chat.ipynb).

```python
from langchain.document_loaders import FacebookChatLoader
```
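
A Messenger data export contains a JSON file per conversation with a `messages` list. The sketch below flattens one such file into a single transcript, one `sender on date: text` entry per text message. The field names (`sender_name`, `timestamp_ms`, `content`) reflect the export format as we understand it; the output layout and the `transcript_from_export` helper are illustrative assumptions, not the exact behavior of `FacebookChatLoader`. Timestamps are rendered in UTC so the output does not depend on the machine's time zone.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def format_message(message: Dict[str, Any]) -> str:
    """Render one message as 'sender on YYYY-MM-DD HH:MM:SS: text' (UTC)."""
    sent_at = datetime.fromtimestamp(message["timestamp_ms"] / 1000, tz=timezone.utc)
    return f"{message['sender_name']} on {sent_at:%Y-%m-%d %H:%M:%S}: {message['content']}"


def transcript_from_export(raw_json: str) -> str:
    """Hypothetical helper: join all text messages of one exported conversation."""
    data = json.loads(raw_json)
    entries = [
        format_message(m)
        for m in data.get("messages", [])
        # Skip photos, stickers, etc. that carry no text content.
        if isinstance(m.get("content"), str) and m["content"]
    ]
    return "\n\n".join(entries)


sample = json.dumps({
    "participants": [{"name": "User 1"}, {"name": "User 2"}],
    "messages": [
        {"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"},
        {"sender_name": "User 1", "timestamp_ms": 1675597435669, "photos": [{"uri": "photo.jpg"}]},
    ],
})
print(transcript_from_export(sample))
```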

`docs/integrations/figma.md` (Normal file, 21 changed lines)

@@ -0,0 +1,21 @@
# Figma

>[Figma](https://www.figma.com/) is a collaborative web application for interface design.

## Installation and Setup

The Figma API requires an `access token`, `node_ids`, and a `file key`.

The `file key` can be taken from the file URL: `https://www.figma.com/file/{filekey}/sampleFilename`.

`Node IDs` are also available in the URL. Click on any element and look for the `?node-id={node_id}` parameter.

For the `access token`, follow these [instructions](https://help.figma.com/hc/en-us/articles/8085703771159-Manage-personal-access-tokens).

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/figma.ipynb).

```python
from langchain.document_loaders import FigmaFileLoader
```
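
The file key and node IDs described above can be pulled out of a Figma link with the standard library. This is a minimal sketch: the `parse_figma_url` helper is hypothetical, and it assumes links of the `https://www.figma.com/file/{filekey}/{name}?node-id={node_id}` form shown above. `parse_qs` also decodes percent-escapes, so `0%3A1` comes back as `0:1`.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_figma_url(url: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (file_key, node_ids) from a Figma file URL."""
    parsed = urlsplit(url)
    # Path looks like /file/{filekey}/{sampleFilename}
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] != "file":
        raise ValueError(f"not a Figma file URL: {url!r}")
    file_key = parts[1]
    node_ids = parse_qs(parsed.query).get("node-id", [])
    return file_key, node_ids


key, nodes = parse_figma_url("https://www.figma.com/file/AbC123xyz/sampleFilename?node-id=0%3A1")
print(key, nodes)
```

The extracted values can then be supplied to `FigmaFileLoader` together with your access token; check the linked notebook for the loader's exact arguments.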

`docs/integrations/git.md` (Normal file, 19 changed lines)

@@ -0,0 +1,19 @@
# Git

>[Git](https://en.wikipedia.org/wiki/Git) is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development.

## Installation and Setup

First, you need to install the `GitPython` Python package.

```bash
pip install GitPython
```

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/git.ipynb).

```python
from langchain.document_loaders import GitLoader
```

`docs/integrations/gitbook.md` (Normal file, 15 changed lines)

@@ -0,0 +1,15 @@
# GitBook

>[GitBook](https://docs.gitbook.com/) is a modern documentation platform where teams can document everything from products to internal knowledge bases and APIs.

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/gitbook.ipynb).

```python
from langchain.document_loaders import GitbookLoader
```

`docs/integrations/google_bigquery.md` (Normal file, 20 changed lines)

@@ -0,0 +1,20 @@
# Google BigQuery

>[Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
`BigQuery` is a part of the `Google Cloud Platform`.

## Installation and Setup

First, you need to install the `google-cloud-bigquery` Python package.

```bash
pip install google-cloud-bigquery
```

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/google_bigquery.ipynb).

```python
from langchain.document_loaders import BigQueryLoader
```

`docs/integrations/google_cloud_storage.md` (Normal file, 26 changed lines)

@@ -0,0 +1,26 @@
# Google Cloud Storage

>[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.

## Installation and Setup

First, you need to install the `google-cloud-storage` Python package.

```bash
pip install google-cloud-storage
```

## Document Loader

There are two loaders for `Google Cloud Storage`: the `Directory` and the `File` loaders.

See a [usage example for the `GCSDirectoryLoader`](../modules/indexes/document_loaders/examples/google_cloud_storage_directory.ipynb).

```python
from langchain.document_loaders import GCSDirectoryLoader
```

See a [usage example for the `GCSFileLoader`](../modules/indexes/document_loaders/examples/google_cloud_storage_file.ipynb).

```python
from langchain.document_loaders import GCSFileLoader
```
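
Both loaders address objects by bucket plus object name (a prefix for the directory loader, a single object for the file loader). If you start from a `gs://bucket/path` URI, a small standard-library helper can split it. `split_gcs_uri` is hypothetical, and how its output maps onto each loader's arguments should be checked against the linked notebooks.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: 'gs://my-bucket/a/b.txt' -> ('my-bucket', 'a/b.txt').

    Assumes object names contain no '?' or '#', which urlsplit would treat
    as query or fragment delimiters.
    """
    parsed = urlsplit(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"expected a gs://bucket/... URI, got {uri!r}")
    # Object names have no leading slash; an empty name means "the whole bucket".
    return parsed.netloc, parsed.path.lstrip("/")


print(split_gcs_uri("gs://my-bucket/reports/2023/summary.txt"))
```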

`docs/integrations/google_drive.md` (Normal file, 22 changed lines)

@@ -0,0 +1,22 @@
# Google Drive

>[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.

Currently, only `Google Docs` are supported.

## Installation and Setup

First, you need to install several Python packages.

```bash
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
```

## Document Loader

See a [usage example and authorization instructions](../modules/indexes/document_loaders/examples/google_drive.ipynb).

```python
from langchain.document_loaders import GoogleDriveLoader
```
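
Google Docs are identified by a document ID, which appears in the document's link (`https://docs.google.com/document/d/<DOCUMENT_ID>/edit`). The `extract_doc_id` helper below is a hypothetical sketch that pulls that ID out with a regular expression; check the linked notebook for which loader arguments accept document or folder IDs.

```python
import re
from typing import Optional

# Matches the ID segment in links like https://docs.google.com/document/d/<ID>/edit
_DOC_ID_RE = re.compile(r"docs\.google\.com/document/d/([A-Za-z0-9_-]+)")


def extract_doc_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the Google Docs document ID, or None."""
    match = _DOC_ID_RE.search(url)
    return match.group(1) if match else None


print(extract_doc_id("https://docs.google.com/document/d/1AbC-xyz_123/edit#heading=h.0"))
```

Returning `None` for non-Docs links (spreadsheets, slides) fits the note above that only Google Docs are currently supported.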

`docs/integrations/gutenberg.md` (Normal file, 15 changed lines)

@@ -0,0 +1,15 @@
# Gutenberg

>[Project Gutenberg](https://www.gutenberg.org/about/) is an online library of free eBooks.

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/gutenberg.ipynb).

```python
from langchain.document_loaders import GutenbergLoader
```

`docs/integrations/hacker_news.md` (Normal file, 18 changed lines)

@@ -0,0 +1,18 @@
# Hacker News

>[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as `HN`) is a social news
> website focusing on computer science and entrepreneurship. It is run by the investment fund and startup
> incubator `Y Combinator`. In general, content that can be submitted is defined as "anything that gratifies
> one's intellectual curiosity."

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/hacker_news.ipynb).

```python
from langchain.document_loaders import HNLoader
```

`docs/integrations/ifixit.md` (Normal file, 16 changed lines)

@@ -0,0 +1,16 @@
# iFixit

>[iFixit](https://www.ifixit.com) is the largest, open repair community on the web. The site contains nearly 100k
> repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under `CC-BY-NC-SA 3.0`.

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/ifixit.ipynb).

```python
from langchain.document_loaders import IFixitLoader
```

`docs/integrations/imsdb.md` (Normal file, 16 changed lines)

@@ -0,0 +1,16 @@
# IMSDb

>[IMSDb](https://imsdb.com/) is the `Internet Movie Script Database`.
>
## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/imsdb.ipynb).

```python
from langchain.document_loaders import IMSDbLoader
```

`docs/integrations/mediawikidump.md` (Normal file, 31 changed lines)

@@ -0,0 +1,31 @@
# MediaWikiDump

>[MediaWiki XML Dumps](https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) contain the content of a wiki
> (wiki pages with all their revisions), without the site-related data. An XML dump does not create a full backup
> of the wiki database: the dump does not contain user accounts, images, edit logs, etc.

## Installation and Setup

We need to install several Python packages.

`mediawiki-utilities` supports XML schema 0.11 only in unmerged branches, so install `python-mwtypes` from that branch:

```bash
pip install -qU git+https://github.com/mediawiki-utilities/python-mwtypes@updates_schema_0.11
```

The `mwxml` package from `mediawiki-utilities` has a bug (a fix PR is pending), so install the patched branch:

```bash
pip install -qU git+https://github.com/gdedrouas/python-mwxml@xml_format_0.11
pip install -qU mwparserfromhell
```

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/mediawikidump.ipynb).

```python
from langchain.document_loaders import MWDumpLoader
```
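
A MediaWiki XML dump stores each page as a `<page>` element holding a `<title>` and one or more `<revision>` elements whose `<text>` is raw wikitext, all inside the export namespace. The packages above do the real parsing for the loader; purely to show the shape of the data, here is a standard-library sketch that lists each page's title and the text of its last listed revision. The namespace URI assumes schema 0.11, and the inline sample and `iter_pages` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Assumed namespace for export schema 0.11.
NS = {"mw": "http://www.mediawiki.org/xml/export-0.11/"}

SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" version="0.11">
  <page>
    <title>Main Page</title>
    <ns>0</ns>
    <revision><id>1</id><text>Old text</text></revision>
    <revision><id>2</id><text>Welcome to the '''wiki'''.</text></revision>
  </page>
</mediawiki>"""


def iter_pages(dump_xml: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (title, wikitext of the last listed revision) per page."""
    root = ET.fromstring(dump_xml)
    for page in root.findall("mw:page", NS):
        title = page.findtext("mw:title", default="", namespaces=NS)
        revisions = page.findall("mw:revision", NS)
        text = revisions[-1].findtext("mw:text", default="", namespaces=NS) if revisions else ""
        yield title, text


for title, text in iter_pages(SAMPLE_DUMP):
    print(title, "->", text)
```

Real dumps can be very large, so a streaming parser such as `xml.etree.ElementTree.iterparse` would be more appropriate than loading the whole file with `fromstring`.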

`docs/integrations/microsoft_onedrive.md` (Normal file, 22 changed lines)

@@ -0,0 +1,22 @@
# Microsoft OneDrive

>[Microsoft OneDrive](https://en.wikipedia.org/wiki/OneDrive) (formerly `SkyDrive`) is a file-hosting service operated by Microsoft.

## Installation and Setup

First, you need to install the `o365` Python package.

```bash
pip install o365
```

Then follow the instructions [here](../modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb).

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb).

```python
from langchain.document_loaders import OneDriveLoader
```

`docs/integrations/microsoft_powerpoint.md` (Normal file, 16 changed lines)

@@ -0,0 +1,16 @@
# Microsoft PowerPoint

>[Microsoft PowerPoint](https://en.wikipedia.org/wiki/Microsoft_PowerPoint) is a presentation program by Microsoft.

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/microsoft_powerpoint.ipynb).

```python
from langchain.document_loaders import UnstructuredPowerPointLoader
```

`docs/integrations/microsoft_word.md` (Normal file, 16 changed lines)

@@ -0,0 +1,16 @@
# Microsoft Word

>[Microsoft Word](https://www.microsoft.com/en-us/microsoft-365/word) is a word processor developed by Microsoft.

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/microsoft_word.ipynb).

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader
```

`docs/integrations/modern_treasury.md` (Normal file, 19 changed lines)

@@ -0,0 +1,19 @@
# Modern Treasury

>[Modern Treasury](https://www.moderntreasury.com/) simplifies complex payment operations. It is a unified platform to power products and processes that move money.
>- Connect to banks and payment systems
>- Track transactions and balances in real-time
>- Automate payment operations for scale

## Installation and Setup

There isn't any special setup for it.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/modern_treasury.ipynb).

```python
from langchain.document_loaders import ModernTreasuryLoader
```

`docs/integrations/notion.md` (Normal file, 27 changed lines)

@@ -0,0 +1,27 @@
# Notion DB

>[Notion](https://www.notion.so/) is a collaboration platform with modified Markdown support that integrates kanban
> boards, tasks, wikis and databases. It is an all-in-one workspace for notetaking, knowledge and data management,
> and project and task management.

## Installation and Setup

All instructions are in the examples below.

## Document Loader

We have two different loaders: `NotionDirectoryLoader` and `NotionDBLoader`.

See a [usage example for the NotionDirectoryLoader](../modules/indexes/document_loaders/examples/notion.ipynb).

```python
from langchain.document_loaders import NotionDirectoryLoader
```
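
`NotionDirectoryLoader` reads a Notion workspace that has been exported to Markdown files in a local folder. As a rough stand-in for that idea, the standard-library sketch below walks a folder recursively, reads every `.md` file, and records its path as the source; `load_markdown_dir` and the dict layout are hypothetical, not the loader's implementation.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def load_markdown_dir(path: str) -> List[Dict[str, object]]:
    """Hypothetical helper: one record per Markdown file found under `path`."""
    records = []
    # "**/*.md" matches files at the top level and in any subfolder.
    for md_file in sorted(Path(path).glob("**/*.md")):
        text = md_file.read_text(encoding="utf-8")
        records.append({"page_content": text, "metadata": {"source": str(md_file)}})
    return records


with tempfile.TemporaryDirectory() as export_dir:
    # Stand-in for an unzipped Notion Markdown export.
    (Path(export_dir) / "Project Plan.md").write_text("# Project Plan\n\nShip v1.", encoding="utf-8")
    for record in load_markdown_dir(export_dir):
        print(record["metadata"]["source"], len(record["page_content"]))
```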

See a [usage example for the NotionDBLoader](../modules/indexes/document_loaders/examples/notiondb.ipynb).

```python
from langchain.document_loaders import NotionDBLoader
```

`docs/integrations/obsidian.md` (Normal file, 19 changed lines)

@@ -0,0 +1,19 @@
# Obsidian

>[Obsidian](https://obsidian.md/) is a powerful and extensible knowledge base
that works on top of your local folder of plain text files.

## Installation and Setup

All instructions are in the examples below.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/obsidian.ipynb).

```python
from langchain.document_loaders import ObsidianLoader
```
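
An Obsidian vault is a folder of Markdown notes, and a note may begin with a YAML front-matter block delimited by `---` lines. The standard-library sketch below separates that block from the note body. It is deliberately simplified (it understands only flat `key: value` lines, where a real implementation would use a YAML parser), and `split_front_matter` is a hypothetical helper rather than part of `ObsidianLoader`.

```python
import re
from typing import Dict, Tuple

# Front matter: a '---' line at the very top, key/value lines, then a closing '---' line.
FRONT_MATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def split_front_matter(note: str) -> Tuple[Dict[str, str], str]:
    """Hypothetical helper: return (front-matter dict, remaining Markdown body)."""
    match = FRONT_MATTER_RE.match(note)
    if not match:
        return {}, note
    metadata = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not simple 'key: value' pairs
            metadata[key.strip()] = value.strip()
    return metadata, note[match.end():]


note = "---\ntags: meeting\nauthor: Ada\n---\n# Weekly sync\nNotes here."
print(split_front_matter(note))
```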

@@ -1,19 +1,25 @@
# Psychic

This page covers how to use [Psychic](https://www.psychic.dev/) within LangChain.
>[Psychic](https://www.psychic.dev/) is a platform for integrating with SaaS tools like `Notion`, `Zendesk`,
> `Confluence`, and `Google Drive` via OAuth and syncing documents from these applications to your SQL or vector
> database. You can think of it like Plaid for unstructured data.

## What is Psychic?
## Installation and Setup

Psychic is a platform for integrating with your customer’s SaaS tools like Notion, Zendesk, Confluence, and Google Drive via OAuth and syncing documents from these applications to your SQL or vector database. You can think of it like Plaid for unstructured data. Psychic is easy to set up - you use it by importing the react library and configuring it with your Sidekick API key, which you can get from the [Psychic dashboard](https://dashboard.psychic.dev/). When your users connect their applications, you can view these connections from the dashboard and retrieve data using the server-side libraries.

## Quick start
```bash
pip install psychicapi
```

Psychic is easy to set up - you import the `react` library and configure it with your `Sidekick API` key, which you get
from the [Psychic dashboard](https://dashboard.psychic.dev/). When you connect the applications, you can
view these connections from the dashboard and retrieve data using the server-side libraries.

1. Create an account in the [dashboard](https://dashboard.psychic.dev/).
2. Use the [react library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend react app. Users will use this to connect their SaaS apps.
3. Once your user has created a connection, you can use the langchain PsychicLoader by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb)
2. Use the [react library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend react app. You will use this to connect the SaaS apps.
3. Once you have created a connection, you can use the `PsychicLoader` by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb)

# Advantages vs Other Document Loaders
## Advantages vs Other Document Loaders

1. **Universal API:** Instead of building OAuth flows and learning the APIs for every SaaS app, you integrate Psychic once and leverage our universal API to retrieve data.
2. **Data Syncs:** Data in your customers' SaaS apps can get stale fast. With Psychic you can configure webhooks to keep your documents up to date on a daily or realtime basis.

`docs/integrations/reddit.md` (Normal file, 22 changed lines)

@@ -0,0 +1,22 @@
# Reddit

>[Reddit](https://www.reddit.com) is an American social news aggregation, content rating, and discussion website.

## Installation and Setup

First, you need to install the `praw` Python package.

```bash
pip install praw
```

Make a [Reddit Application](https://www.reddit.com/prefs/apps/) and initialize the loader with your Reddit API credentials.

## Document Loader

See a [usage example](../modules/indexes/document_loaders/examples/reddit.ipynb).

```python
from langchain.document_loaders import RedditPostsLoader
```

@@ -4,8 +4,7 @@
[Unstructured.IO](https://www.unstructured.io/) extracts clean text from raw source documents like
PDFs and Word documents.
This page covers how to use the [`unstructured`](https://github.com/Unstructured-IO/unstructured)
ecosystem within LangChain.

ecosystem within LangChain.

## Installation and Setup

@@ -20,12 +19,6 @@ its dependencies running locally.
- `tesseract-ocr`(images and PDFs)
- `libreoffice` (MS Office docs)
- `pandoc` (EPUBs)
- If you are parsing PDFs using the `"hi_res"` strategy, run the following to install the `detectron2` model, which
  `unstructured` uses for layout detection:
  - `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@e2ce8dc#egg=detectron2"`
  - If `detectron2` is not installed, `unstructured` will fallback to processing PDFs
    using the `"fast"` strategy, which uses `pdfminer` directly and doesn't require
    `detectron2`.

If you want to get up and running with less set up, you can
simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -8,9 +9,15 @@
|
||||
"\n",
|
||||
"This notebook goes over how to track your LangChain experiments into one centralized Weights and Biases dashboard. To learn more about prompt engineering and the callback please refer to this Report which explains both alongside the resultant dashboards you can expect to see.\n",
"\n",
"Run in Colab: https://colab.research.google.com/drive/1DXH4beT4HFaRKy_Vm4PoxhXVDRf7Ym8L?usp=sharing\n",
"\n",
"View Report: https://wandb.ai/a-sh0ts/langchain_callback_demo/reports/Prompt-Engineering-LLMs-with-LangChain-and-W-B--VmlldzozNjk1NTUw#👋-how-to-build-a-callback-in-langchain-for-better-prompt-engineering"
"<a href=\"https://colab.research.google.com/drive/1DXH4beT4HFaRKy_Vm4PoxhXVDRf7Ym8L?usp=sharing\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"\n",
"[View Report](https://wandb.ai/a-sh0ts/langchain_callback_demo/reports/Prompt-Engineering-LLMs-with-LangChain-and-W-B--VmlldzozNjk1NTUw#👋-how-to-build-a-callback-in-langchain-for-better-prompt-engineering\n",
") \n",
"\n",
"\n",
"**Note**: _the `WandbCallbackHandler` is being deprecated in favour of the `WandbTracer`_ . In future please use the `WandbTracer` as it is more flexible and allows for more granular logging. To know more about the `WandbTracer` refer to the [agent_with_wandb_tracing.ipynb](https://python.langchain.com/en/latest/integrations/agent_with_wandb_tracing.html) notebook or use the following [colab notebook](http://wandb.me/prompts-quickstart). To know more about Weights & Biases Prompts refer to the following [prompts documentation](https://docs.wandb.ai/guides/prompts)."
]
},
{
@@ -54,6 +61,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -75,6 +83,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "cxBFfZR8d9FC"
@@ -90,6 +99,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -200,6 +210,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "Q-65jwrDeK6w"
@@ -217,6 +228,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [

@@ -9,8 +9,8 @@
"\n",
"This notebook goes over adding memory to **both** an Agent and its tools. Before going through this notebook, please walk through the following notebooks, as this one builds on top of both of them:\n",
"\n",
"- [Adding memory to an LLM Chain](../../memory/examples/adding_memory.ipynb)\n",
"- [Custom Agents](custom_agent.ipynb)\n",
"- [Adding memory to an LLM Chain](../../../memory/examples/adding_memory.ipynb)\n",
"- [Custom Agents](../../agents/custom_agent.ipynb)\n",
"\n",
"We are going to create a custom Agent. The agent has access to a conversation memory, a search tool, and a summarization tool. The summarization tool also needs access to the conversation memory."
]

@@ -36,7 +36,7 @@ The first category of how-to guides here cover specific parts of working with ag
   :glob:
   :hidden:

   ./examples/*
   ./agents/examples/*


Agent Toolkits
@@ -46,26 +46,26 @@ The next set of examples covers agents with toolkits.
As opposed to the examples above, these examples are not intended to show off an agent `type`,
but rather to show off an agent applied to a particular use case.

`SQLDatabase Agent <./agent_toolkits/sql_database.html>`_: This notebook covers how to interact with an arbitrary SQL database using an agent.
`SQLDatabase Agent <./toolkits/sql_database.html>`_: This notebook covers how to interact with an arbitrary SQL database using an agent.

`JSON Agent <./agent_toolkits/json.html>`_: This notebook covers how to interact with a JSON dictionary using an agent.
`JSON Agent <./toolkits/json.html>`_: This notebook covers how to interact with a JSON dictionary using an agent.

`OpenAPI Agent <./agent_toolkits/openapi.html>`_: This notebook covers how to interact with an arbitrary OpenAPI endpoint using an agent.
`OpenAPI Agent <./toolkits/openapi.html>`_: This notebook covers how to interact with an arbitrary OpenAPI endpoint using an agent.

`VectorStore Agent <./agent_toolkits/vectorstore.html>`_: This notebook covers how to interact with VectorStores using an agent.
`VectorStore Agent <./toolkits/vectorstore.html>`_: This notebook covers how to interact with VectorStores using an agent.

`Python Agent <./agent_toolkits/python.html>`_: This notebook covers how to produce and execute Python code using an agent.
`Python Agent <./toolkits/python.html>`_: This notebook covers how to produce and execute Python code using an agent.

`Pandas DataFrame Agent <./agent_toolkits/pandas.html>`_: This notebook covers how to do question answering over a pandas dataframe using an agent. Under the hood, this calls the Python agent.
`Pandas DataFrame Agent <./toolkits/pandas.html>`_: This notebook covers how to do question answering over a pandas dataframe using an agent. Under the hood, this calls the Python agent.

`CSV Agent <./agent_toolkits/csv.html>`_: This notebook covers how to do question answering over a CSV file. Under the hood, this calls the Pandas DataFrame agent.
`CSV Agent <./toolkits/csv.html>`_: This notebook covers how to do question answering over a CSV file. Under the hood, this calls the Pandas DataFrame agent.

.. toctree::
   :maxdepth: 1
   :glob:
   :hidden:

   ./agent_toolkits/*
   ./toolkits/*


Agent Types

94
docs/modules/agents/tools/examples/brave_search.ipynb
Normal file
@@ -0,0 +1,94 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "eda326e4",
"metadata": {},
"source": [
"# Brave Search\n",
"\n",
"This notebook goes over how to use the Brave Search tool."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a4c896e5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools import BraveSearch"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6784d37c",
"metadata": {},
"outputs": [],
"source": [
"api_key = \"...\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5b14008a",
"metadata": {},
"outputs": [],
"source": [
"tool = BraveSearch.from_api_key(api_key=api_key, search_kwargs={\"count\": 3})"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f11937b2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'[{\"title\": \"Barack Obama - Wikipedia\", \"link\": \"https://en.wikipedia.org/wiki/Barack_Obama\", \"snippet\": \"Outside of politics, <strong>Obama</strong> has published three bestselling books: Dreams from My Father (1995), The Audacity of Hope (2006) and A Promised Land (2020). Rankings by scholars and historians, in which he has been featured since 2010, place him in the <strong>middle</strong> to upper tier of American presidents.\"}, {\"title\": \"Obama\\'s Middle Name -- My Last Name -- is \\'Hussein.\\' So?\", \"link\": \"https://www.cair.com/cair_in_the_news/obamas-middle-name-my-last-name-is-hussein-so/\", \"snippet\": \"Many Americans understand that common names don\\\\u2019t only come in the form of a \\\\u201cSmith\\\\u201d or a \\\\u201cJohnson.\\\\u201d Perhaps, they have a neighbor, mechanic or teacher named Hussein. Or maybe they\\\\u2019ve seen fashion designer Hussein Chalayan in the pages of Vogue or recall <strong>King Hussein</strong>, our ally in the Middle East.\"}, {\"title\": \"What\\'s up with Obama\\'s middle name? - Quora\", \"link\": \"https://www.quora.com/Whats-up-with-Obamas-middle-name\", \"snippet\": \"Answer (1 of 15): A better question would be, \\\\u201cWhat\\\\u2019s up with Obama\\\\u2019s first name?\\\\u201d President <strong>Barack Hussein Obama</strong>\\\\u2019s father\\\\u2019s name was <strong>Barack Hussein Obama</strong>. He was named after his father. Hussein, Obama\\\\u2019s middle name, is a very common Arabic name, meaning "good," "handsome," or "beautiful."\"}]'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tool.run(\"obama middle name\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da9c63d5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
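
The `tool.run("obama middle name")` cell in the Brave Search notebook above returns a single string which, judging from the printed output, is a JSON-encoded list of results with `title`, `link` and `snippet` keys, where `snippet` carries `<strong>` highlight markup. A minimal sketch, assuming that output shape (it is not presented here as a documented contract of `BraveSearch`), that turns such a string into clean records with only the standard library; `parse_brave_results` is a hypothetical helper name:

```python
import html
import json
import re
from typing import Dict, List

# Matches simple opening/closing HTML tags such as <strong> and </strong>.
_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def parse_brave_results(raw: str) -> List[Dict[str, str]]:
    """Parse a JSON-encoded list of search results into plain-text dicts.

    Assumes `raw` looks like the BraveSearch output printed above: a JSON
    array of objects with "title", "link" and "snippet" keys.
    """
    results = []
    for item in json.loads(raw):
        snippet = _TAG_RE.sub("", item.get("snippet", ""))  # drop highlight tags
        results.append(
            {
                "title": item.get("title", ""),
                "link": item.get("link", ""),
                # Decode any HTML entities (e.g. &quot;) left in the snippet.
                "snippet": html.unescape(snippet),
            }
        )
    return results


# Illustrative input, trimmed from the first result printed above.
sample = json.dumps(
    [
        {
            "title": "Barack Obama - Wikipedia",
            "link": "https://en.wikipedia.org/wiki/Barack_Obama",
            "snippet": "place him in the <strong>middle</strong> to upper tier",
        }
    ]
)

for r in parse_brave_results(sample):
    print(r["title"], "->", r["link"])
    print(r["snippet"])
```

A regular expression is adequate for the simple `<strong>` highlighting seen in this output; for arbitrary HTML, a real parser such as the standard library's `html.parser` would be more robust.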
@@ -184,7 +184,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.9.1"
},
"vscode": {
"interpreter": {
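
The `human_approval.ipynb` walkthrough that follows relies on two hooks passed to `HumanApprovalCallbackHandler`: a `should_check` predicate that selects which tools need approval, and an `approve` function that prompts the user. The stand-alone sketch below mirrors that control flow without LangChain so the decision logic can be exercised in isolation. `ApprovalGate`, `RejectedError`, `approve_with_answer` and the `ask` parameter are hypothetical names introduced for illustration; they are not part of the LangChain API.

```python
from typing import Any, Callable, Dict


class RejectedError(Exception):
    """Hypothetical stand-in for LangChain's HumanRejectedException."""


def default_approve(tool_input: str, ask: Callable[[str], str] = input) -> bool:
    # Same rule as the notebook: only 'y'/'yes' (any case) counts as approval.
    msg = (
        "Do you approve of the following input? "
        "Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no."
    )
    msg += "\n\n" + tool_input + "\n"
    return ask(msg).lower() in ("yes", "y")


class ApprovalGate:
    """Hypothetical gate mimicking the should_check/approve hooks."""

    def __init__(
        self,
        should_check: Callable[[Dict[str, Any]], bool] = lambda _: True,
        approve: Callable[[str], bool] = default_approve,
    ) -> None:
        self.should_check = should_check
        self.approve = approve

    def on_tool_start(self, serialized: Dict[str, Any], input_str: str) -> None:
        # Only consult the human for tools selected by should_check.
        if self.should_check(serialized) and not self.approve(input_str):
            raise RejectedError(
                f"Inputs {input_str} to tool {serialized} were rejected."
            )


def only_terminal(serialized: Dict[str, Any]) -> bool:
    # Require approval only for the shell tool, which is named "terminal".
    return serialized.get("name") == "terminal"


def approve_with_answer(answer: str) -> Callable[[str], bool]:
    # Builds an approver that replies with a canned answer instead of prompting.
    def _approve(tool_input: str) -> bool:
        if tool_input == "echo 'Hello World'":
            return True  # known-safe command is auto-approved
        return default_approve(tool_input, ask=lambda _msg: answer)

    return _approve


gate = ApprovalGate(should_check=only_terminal, approve=approve_with_answer("no"))
gate.on_tool_start({"name": "wikipedia"}, "Konrad Adenauer")  # not checked
gate.on_tool_start({"name": "terminal"}, "echo 'Hello World'")  # auto-approved
try:
    gate.on_tool_start({"name": "terminal"}, "ls /private")
except RejectedError as err:
    print(err)
```

Injecting `ask` keeps the prompt separate from the decision logic, which lets the rejection path be tested without a terminal; the notebook itself calls `input` directly.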
322
docs/modules/agents/tools/human_approval.ipynb
Normal file
@@ -0,0 +1,322 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "144e77fe",
"metadata": {},
"source": [
"# Human-in-the-loop Tool Validation\n",
"\n",
"This walkthrough demonstrates how to add human validation to any Tool. We'll do this using the `HumanApprovalCallbackHandler`.\n",
"\n",
"Let's suppose we need to make use of the ShellTool. Adding this tool to an automated flow poses obvious risks. Let's see how we could enforce manual human approval of inputs going into this tool.\n",
"\n",
"**Note**: We generally recommend against using the ShellTool. There are many ways to misuse it, and it's not required for most use cases. We employ it here only for demonstration purposes."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ad84c682",
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks import HumanApprovalCallbackHandler\n",
"from langchain.tools import ShellTool"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "70090dd6",
"metadata": {},
"outputs": [],
"source": [
"tool = ShellTool()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "20d5175f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello World!\n",
"\n"
]
}
],
"source": [
"print(tool.run('echo Hello World!'))"
]
},
{
"cell_type": "markdown",
"id": "e0475dd6",
"metadata": {},
"source": [
"## Adding Human Approval\n",
"Adding the default HumanApprovalCallbackHandler to the tool requires the user to manually approve every input to the tool before the command is actually executed."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f1c88793",
"metadata": {},
"outputs": [],
"source": [
"tool = ShellTool(callbacks=[HumanApprovalCallbackHandler()])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f749815d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Do you approve of the following input? Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n",
"\n",
"ls /usr\n",
"yes\n",
"\u001b[35mX11\u001b[m\u001b[m\n",
"\u001b[35mX11R6\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36mbin\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36mlib\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36mlibexec\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36mlocal\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36msbin\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36mshare\u001b[m\u001b[m\n",
"\u001b[1m\u001b[36mstandalone\u001b[m\u001b[m\n",
"\n"
]
}
],
"source": [
"print(tool.run(\"ls /usr\"))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "b6e455d1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Do you approve of the following input? Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n",
"\n",
"ls /private\n",
"no\n"
]
},
{
"ename": "HumanRejectedException",
"evalue": "Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mHumanRejectedException\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[17], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mls /private\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m)\n",
"File \u001b[0;32m~/langchain/langchain/tools/base.py:257\u001b[0m, in \u001b[0;36mBaseTool.run\u001b[0;34m(self, tool_input, verbose, start_color, color, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 255\u001b[0m \u001b[38;5;66;03m# TODO: maybe also pass through run_manager is _run supports kwargs\u001b[39;00m\n\u001b[1;32m 256\u001b[0m new_arg_supported \u001b[38;5;241m=\u001b[39m signature(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run)\u001b[38;5;241m.\u001b[39mparameters\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_manager\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 257\u001b[0m run_manager \u001b[38;5;241m=\u001b[39m \u001b[43mcallback_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mon_tool_start\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 258\u001b[0m \u001b[43m \u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mname\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdescription\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdescription\u001b[49m\u001b[43m}\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 259\u001b[0m \u001b[43m \u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43misinstance\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m 
\u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 260\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstart_color\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 261\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 262\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 264\u001b[0m tool_args, tool_kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_to_args_and_kwargs(parsed_input)\n",
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:672\u001b[0m, in \u001b[0;36mCallbackManager.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 669\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m run_id \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 670\u001b[0m run_id \u001b[38;5;241m=\u001b[39m uuid4()\n\u001b[0;32m--> 672\u001b[0m \u001b[43m_handle_event\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 673\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandlers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 674\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mon_tool_start\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 675\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mignore_agent\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 676\u001b[0m \u001b[43m \u001b[49m\u001b[43mserialized\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 677\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_str\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 678\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 679\u001b[0m \u001b[43m \u001b[49m\u001b[43mparent_run_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparent_run_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 680\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 681\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 683\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m CallbackManagerForToolRun(\n\u001b[1;32m 684\u001b[0m run_id, 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minheritable_handlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparent_run_id\n\u001b[1;32m 685\u001b[0m )\n",
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:157\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 155\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m handler\u001b[38;5;241m.\u001b[39mraise_error:\n\u001b[0;32m--> 157\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 158\u001b[0m logging\u001b[38;5;241m.\u001b[39mwarning(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError in \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mevent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m callback: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:139\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m ignore_condition_name \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(\n\u001b[1;32m 137\u001b[0m handler, ignore_condition_name\n\u001b[1;32m 138\u001b[0m ):\n\u001b[0;32m--> 139\u001b[0m \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mhandler\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mevent_name\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 140\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 141\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m event_name \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mon_chat_model_start\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n",
"File \u001b[0;32m~/langchain/langchain/callbacks/human.py:48\u001b[0m, in \u001b[0;36mHumanApprovalCallbackHandler.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mon_tool_start\u001b[39m(\n\u001b[1;32m 39\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 40\u001b[0m serialized: Dict[\u001b[38;5;28mstr\u001b[39m, Any],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 46\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Any:\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_check(serialized) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_approve(input_str):\n\u001b[0;32m---> 48\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HumanRejectedException(\n\u001b[1;32m 49\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInputs \u001b[39m\u001b[38;5;132;01m{\u001b[39;00minput_str\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to tool \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mserialized\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m were rejected.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 50\u001b[0m )\n",
"\u001b[0;31mHumanRejectedException\u001b[0m: Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected."
]
}
],
"source": [
"print(tool.run(\"ls /private\"))"
]
},
{
"cell_type": "markdown",
"id": "a3b092ec",
"metadata": {},
"source": [
"## Configuring Human Approval\n",
"\n",
"Let's suppose we have an agent that takes in multiple tools, and we want it to trigger human approval requests only on certain tools and certain inputs. We can configure our callback handler to do just this."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4521c581",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import load_tools\n",
"from langchain.agents import initialize_agent\n",
"from langchain.agents import AgentType\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "9e8d5428",
"metadata": {},
"outputs": [],
"source": [
"def _should_check(serialized_obj: dict) -> bool:\n",
"    # Only require approval on ShellTool.\n",
"    return serialized_obj.get(\"name\") == \"terminal\"\n",
"\n",
"def _approve(_input: str) -> bool:\n",
"    if _input == \"echo 'Hello World'\":\n",
"        return True\n",
"    msg = (\n",
"        \"Do you approve of the following input? \"\n",
"        \"Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\"\n",
"    )\n",
"    msg += \"\\n\\n\" + _input + \"\\n\"\n",
"    resp = input(msg)\n",
"    return resp.lower() in (\"yes\", \"y\")\n",
"\n",
"callbacks = [HumanApprovalCallbackHandler(should_check=_should_check, approve=_approve)]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "9922898e",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"tools = load_tools([\"wikipedia\", \"llm-math\", \"terminal\"], llm=llm)\n",
"agent = initialize_agent(\n",
"    tools, \n",
"    llm, \n",
"    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, \n",
")"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "e69ea402",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Konrad Adenauer became Chancellor of Germany in 1949, 74 years ago.'"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.\", callbacks=callbacks)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "25182a7e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Hello World'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"print 'Hello World' in the terminal\", callbacks=callbacks)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "2f5a93d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Do you approve of the following input? Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n",
"\n",
"ls /private\n",
"no\n"
]
},
{
"ename": "HumanRejectedException",
"evalue": "Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mHumanRejectedException\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[39], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43magent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mlist all directories in /private\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcallbacks\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/langchain/langchain/chains/base.py:236\u001b[0m, in \u001b[0;36mChain.run\u001b[0;34m(self, callbacks, *args, **kwargs)\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(args) \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 235\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m`run` supports only one positional argument.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 236\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43margs\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcallbacks\u001b[49m\u001b[43m)\u001b[49m[\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys[\u001b[38;5;241m0\u001b[39m]]\n\u001b[1;32m 238\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m kwargs \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m args:\n\u001b[1;32m 239\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m(kwargs, callbacks\u001b[38;5;241m=\u001b[39mcallbacks)[\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys[\u001b[38;5;241m0\u001b[39m]]\n",
"File \u001b[0;32m~/langchain/langchain/chains/base.py:140\u001b[0m, in \u001b[0;36mChain.__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks)\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mKeyboardInterrupt\u001b[39;00m, \u001b[38;5;167;01mException\u001b[39;00m) \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 139\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_error(e)\n\u001b[0;32m--> 140\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 141\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_end(outputs)\n\u001b[1;32m 142\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprep_outputs(inputs, outputs, return_only_outputs)\n",
"File \u001b[0;32m~/langchain/langchain/chains/base.py:134\u001b[0m, in \u001b[0;36mChain.__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks)\u001b[0m\n\u001b[1;32m 128\u001b[0m run_manager \u001b[38;5;241m=\u001b[39m callback_manager\u001b[38;5;241m.\u001b[39mon_chain_start(\n\u001b[1;32m 129\u001b[0m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mname\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m},\n\u001b[1;32m 130\u001b[0m inputs,\n\u001b[1;32m 131\u001b[0m )\n\u001b[1;32m 132\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 133\u001b[0m outputs \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m--> 134\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m new_arg_supported\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_call(inputs)\n\u001b[1;32m 137\u001b[0m )\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mKeyboardInterrupt\u001b[39;00m, \u001b[38;5;167;01mException\u001b[39;00m) \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 139\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_error(e)\n",
"File \u001b[0;32m~/langchain/langchain/agents/agent.py:953\u001b[0m, in \u001b[0;36mAgentExecutor._call\u001b[0;34m(self, inputs, run_manager)\u001b[0m\n\u001b[1;32m 951\u001b[0m \u001b[38;5;66;03m# We now enter the agent loop (until it returns something).\u001b[39;00m\n\u001b[1;32m 952\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_continue(iterations, time_elapsed):\n\u001b[0;32m--> 953\u001b[0m next_step_output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_take_next_step\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 954\u001b[0m \u001b[43m \u001b[49m\u001b[43mname_to_tool_map\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 955\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor_mapping\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 956\u001b[0m \u001b[43m \u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 957\u001b[0m \u001b[43m \u001b[49m\u001b[43mintermediate_steps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 958\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 959\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 960\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(next_step_output, AgentFinish):\n\u001b[1;32m 961\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_return(\n\u001b[1;32m 962\u001b[0m next_step_output, intermediate_steps, run_manager\u001b[38;5;241m=\u001b[39mrun_manager\n\u001b[1;32m 963\u001b[0m )\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/agents/agent.py:820\u001b[0m, in \u001b[0;36mAgentExecutor._take_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 818\u001b[0m tool_run_kwargs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mllm_prefix\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 819\u001b[0m \u001b[38;5;66;03m# We then call the tool on the tool input to get an observation\u001b[39;00m\n\u001b[0;32m--> 820\u001b[0m observation \u001b[38;5;241m=\u001b[39m \u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 821\u001b[0m \u001b[43m \u001b[49m\u001b[43magent_action\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtool_input\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 822\u001b[0m \u001b[43m \u001b[49m\u001b[43mverbose\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mverbose\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 823\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcolor\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 824\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_child\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 825\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mtool_run_kwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 826\u001b[0m \u001b[43m 
\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 827\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 828\u001b[0m tool_run_kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39magent\u001b[38;5;241m.\u001b[39mtool_run_logging_kwargs()\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/tools/base.py:257\u001b[0m, in \u001b[0;36mBaseTool.run\u001b[0;34m(self, tool_input, verbose, start_color, color, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 255\u001b[0m \u001b[38;5;66;03m# TODO: maybe also pass through run_manager is _run supports kwargs\u001b[39;00m\n\u001b[1;32m 256\u001b[0m new_arg_supported \u001b[38;5;241m=\u001b[39m signature(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run)\u001b[38;5;241m.\u001b[39mparameters\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_manager\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 257\u001b[0m run_manager \u001b[38;5;241m=\u001b[39m \u001b[43mcallback_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mon_tool_start\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 258\u001b[0m \u001b[43m \u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mname\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdescription\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdescription\u001b[49m\u001b[43m}\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 259\u001b[0m \u001b[43m \u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43misinstance\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m 
\u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 260\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstart_color\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 261\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 262\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 264\u001b[0m tool_args, tool_kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_to_args_and_kwargs(parsed_input)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:672\u001b[0m, in \u001b[0;36mCallbackManager.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 669\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m run_id \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 670\u001b[0m run_id \u001b[38;5;241m=\u001b[39m uuid4()\n\u001b[0;32m--> 672\u001b[0m \u001b[43m_handle_event\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 673\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandlers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 674\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mon_tool_start\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 675\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mignore_agent\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 676\u001b[0m \u001b[43m \u001b[49m\u001b[43mserialized\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 677\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_str\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 678\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 679\u001b[0m \u001b[43m \u001b[49m\u001b[43mparent_run_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparent_run_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 680\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 681\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 683\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m CallbackManagerForToolRun(\n\u001b[1;32m 684\u001b[0m run_id, 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minheritable_handlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparent_run_id\n\u001b[1;32m 685\u001b[0m )\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:157\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 155\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m handler\u001b[38;5;241m.\u001b[39mraise_error:\n\u001b[0;32m--> 157\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 158\u001b[0m logging\u001b[38;5;241m.\u001b[39mwarning(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError in \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mevent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m callback: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:139\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m ignore_condition_name \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(\n\u001b[1;32m 137\u001b[0m handler, ignore_condition_name\n\u001b[1;32m 138\u001b[0m ):\n\u001b[0;32m--> 139\u001b[0m \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mhandler\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mevent_name\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 140\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 141\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m event_name \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mon_chat_model_start\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/human.py:48\u001b[0m, in \u001b[0;36mHumanApprovalCallbackHandler.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mon_tool_start\u001b[39m(\n\u001b[1;32m 39\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 40\u001b[0m serialized: Dict[\u001b[38;5;28mstr\u001b[39m, Any],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 46\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Any:\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_check(serialized) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_approve(input_str):\n\u001b[0;32m---> 48\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HumanRejectedException(\n\u001b[1;32m 49\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInputs \u001b[39m\u001b[38;5;132;01m{\u001b[39;00minput_str\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to tool \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mserialized\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m were rejected.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 50\u001b[0m )\n",
|
||||
"\u001b[0;31mHumanRejectedException\u001b[0m: Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"list all directories in /private\", callbacks=callbacks)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c0b47e26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
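The traceback above shows how a human rejection surfaces. `HumanApprovalCallbackHandler.on_tool_start` raises `HumanRejectedException`. Because that handler has `raise_error` set, `_handle_event` re-raises the exception instead of logging it, and the error unwinds through `BaseTool.run` and `AgentExecutor._call`.

The sketch below reproduces that dispatch pattern in plain Python so it runs without LangChain. It is illustrative only:

- `ApprovalHandler` and `handle_event` are hypothetical stand-ins, not the library's classes.
- The local `HumanRejectedException` only imitates the real exception.

```python
import logging
from typing import Any, Callable, Dict, List


class HumanRejectedException(Exception):
    """Illustrative stand-in for the exception shown in the traceback."""


class ApprovalHandler:
    """Hypothetical handler mirroring the approval check in the traceback."""

    # When True, errors raised by this handler propagate instead of being logged.
    raise_error = True

    def __init__(
        self,
        approve: Callable[[str], bool],
        should_check: Callable[[Dict[str, Any]], bool],
    ) -> None:
        self._approve = approve
        self._should_check = should_check

    def on_tool_start(self, serialized: Dict[str, Any], input_str: str, **kwargs: Any) -> None:
        # Only ask for approval for the tools selected by `should_check`.
        if self._should_check(serialized) and not self._approve(input_str):
            raise HumanRejectedException(
                f"Inputs {input_str} to tool {serialized} were rejected."
            )


def handle_event(handlers: List[Any], event_name: str, *args: Any, **kwargs: Any) -> None:
    """Dispatch `event_name` to every handler; re-raise only if the handler opts in."""
    for handler in handlers:
        try:
            getattr(handler, event_name)(*args, **kwargs)
        except NotImplementedError:
            # Handlers may leave an event unimplemented; skip them.
            continue
        except Exception as e:
            if getattr(handler, "raise_error", False):
                raise
            logging.warning(f"Error in {event_name} callback: {e}")


# Usage: reject any terminal command that touches /private.
handler = ApprovalHandler(
    approve=lambda cmd: "/private" not in cmd,
    should_check=lambda tool: tool["name"] == "terminal",
)
try:
    handle_event([handler], "on_tool_start", {"name": "terminal"}, "ls /private")
except HumanRejectedException as err:
    print(f"Agent stopped: {err}")
```

In the real notebook, the equivalent step is to wrap `agent.run(...)` in `try`/`except HumanRejectedException`, so that a rejection ends the run gracefully instead of surfacing as a traceback. The exception appears to be defined in `langchain/callbacks/human.py`, where the traceback shows it being raised.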
|
||||
423
docs/modules/callbacks/examples/argilla.ipynb
Normal file
@@ -0,0 +1,423 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Argilla\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
">[Argilla](https://argilla.io/) is an open-source data curation platform for LLMs.\n",
|
||||
"> Using Argilla, everyone can build robust language models through faster data curation \n",
|
||||
"> using both human and machine feedback. We provide support for each step in the MLOps cycle, \n",
|
||||
"> from data labeling to model monitoring.\n",
|
||||
"\n",
|
||||
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/modules/callbacks/examples/argilla.ipynb\">\n",
|
||||
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
|
||||
"</a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this guide we will demonstrate how to track the inputs and responses of your LLM to generate a dataset in Argilla, using the `ArgillaCallbackHandler`.\n",
|
||||
"\n",
|
||||
"It's useful to keep track of the inputs and outputs of your LLMs to generate datasets for future fine-tuning. This is especially useful when you're using an LLM to generate data for a specific task, such as question answering, summarization, or translation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install argilla --upgrade\n",
|
||||
"!pip install openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Getting API Credentials\n",
|
||||
"\n",
|
||||
"To get the Argilla API credentials, follow these steps:\n",
|
||||
"\n",
|
||||
"1. Go to your Argilla UI.\n",
|
||||
"2. Click on your profile picture and go to \"My settings\".\n",
|
||||
"3. Then copy the API Key.\n",
|
||||
"\n",
|
||||
"The Argilla API URL is the same as the URL of your Argilla UI.\n",
|
||||
"\n",
|
||||
"To get the OpenAI API credentials, please visit https://platform.openai.com/account/api-keys"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"ARGILLA_API_URL\"] = \"...\"\n",
|
||||
"os.environ[\"ARGILLA_API_KEY\"] = \"...\"\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"...\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup Argilla\n",
|
||||
"\n",
|
||||
"To use the `ArgillaCallbackHandler`, you will need to create a new `FeedbackDataset` in Argilla to keep track of your LLM experiments. To do so, use the following code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import argilla as rg"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from packaging.version import parse as parse_version\n",
|
||||
"\n",
|
||||
"if parse_version(rg.__version__) < parse_version(\"1.8.0\"):\n",
|
||||
" raise RuntimeError(\n",
|
||||
" \"`FeedbackDataset` is only available in Argilla v1.8.0 or higher, please \"\n",
|
||||
" \"upgrade `argilla` with `pip install argilla --upgrade`.\"\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dataset = rg.FeedbackDataset(\n",
|
||||
" fields=[\n",
|
||||
" rg.TextField(name=\"prompt\"),\n",
|
||||
" rg.TextField(name=\"response\"),\n",
|
||||
" ],\n",
|
||||
" questions=[\n",
|
||||
" rg.RatingQuestion(\n",
|
||||
" name=\"response-rating\",\n",
|
||||
" description=\"How would you rate the quality of the response?\",\n",
|
||||
" values=[1, 2, 3, 4, 5],\n",
|
||||
" required=True,\n",
|
||||
" ),\n",
|
||||
" rg.TextQuestion(\n",
|
||||
" name=\"response-feedback\",\n",
|
||||
" description=\"What feedback do you have for the response?\",\n",
|
||||
" required=False,\n",
|
||||
" ),\n",
|
||||
" ],\n",
|
||||
" guidelines=\"You're asked to rate the quality of the response and provide feedback.\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"rg.init(\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"dataset.push_to_argilla(\"langchain-dataset\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> 📌 NOTE: at the moment, only prompt-response pairs are supported as `FeedbackDataset.fields`, so the `ArgillaCallbackHandler` only tracks the prompt (i.e. the LLM input) and the response (i.e. the LLM output)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tracking"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To use the `ArgillaCallbackHandler`, you can either use the following code or reproduce one of the examples presented in the following sections."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks import ArgillaCallbackHandler\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scenario 1: Tracking an LLM\n",
|
||||
"\n",
|
||||
"First, let's just run a single LLM a few times and capture the resulting prompt-response pairs in Argilla."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LLMResult(generations=[[Generation(text='\\n\\nQ: What did the fish say when he hit the wall? \\nA: Dam.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nThe Moon \\n\\nThe moon is high in the midnight sky,\\nSparkling like a star above.\\nThe night so peaceful, so serene,\\nFilling up the air with love.\\n\\nEver changing and renewing,\\nA never-ending light of grace.\\nThe moon remains a constant view,\\nA reminder of life’s gentle pace.\\n\\nThrough time and space it guides us on,\\nA never-fading beacon of hope.\\nThe moon shines down on us all,\\nAs it continues to rise and elope.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nQ. What did one magnet say to the other magnet?\\nA. \"I find you very attractive!\"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text=\"\\n\\nThe world is charged with the grandeur of God.\\nIt will flame out, like shining from shook foil;\\nIt gathers to a greatness, like the ooze of oil\\nCrushed. Why do men then now not reck his rod?\\n\\nGenerations have trod, have trod, have trod;\\nAnd all is seared with trade; bleared, smeared with toil;\\nAnd wears man's smudge and shares man's smell: the soil\\nIs bare now, nor can foot feel, being shod.\\n\\nAnd for all this, nature is never spent;\\nThere lives the dearest freshness deep down things;\\nAnd though the last lights off the black West went\\nOh, morning, at the brown brink eastward, springs —\\n\\nBecause the Holy Ghost over the bent\\nWorld broods with warm breast and with ah! 
bright wings.\\n\\n~Gerard Manley Hopkins\", generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nQ: What did one ocean say to the other ocean?\\nA: Nothing, they just waved.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text=\"\\n\\nA poem for you\\n\\nOn a field of green\\n\\nThe sky so blue\\n\\nA gentle breeze, the sun above\\n\\nA beautiful world, for us to love\\n\\nLife is a journey, full of surprise\\n\\nFull of joy and full of surprise\\n\\nBe brave and take small steps\\n\\nThe future will be revealed with depth\\n\\nIn the morning, when dawn arrives\\n\\nA fresh start, no reason to hide\\n\\nSomewhere down the road, there's a heart that beats\\n\\nBelieve in yourself, you'll always succeed.\", generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'completion_tokens': 504, 'total_tokens': 528, 'prompt_tokens': 24}, 'model_name': 'text-davinci-003'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"callbacks = [StdOutCallbackHandler(), argilla_callback]\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
|
||||
"llm.generate([\"Tell me a joke\", \"Tell me a poem\"] * 3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scenario 2: Tracking an LLM in a chain\n",
|
||||
"\n",
|
||||
"Next, we can create a chain using a prompt template and track the initial prompt and the final response in Argilla."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
|
||||
"Title: Documentary about Bigfoot in Paris\n",
|
||||
"Playwright: This is a synopsis for the above play:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'text': \"\\n\\nDocumentary about Bigfoot in Paris focuses on the story of a documentary filmmaker and their search for evidence of the legendary Bigfoot creature in the city of Paris. The play follows the filmmaker as they explore the city, meeting people from all walks of life who have had encounters with the mysterious creature. Through their conversations, the filmmaker unravels the story of Bigfoot and finds out the truth about the creature's presence in Paris. As the story progresses, the filmmaker learns more and more about the mysterious creature, as well as the different perspectives of the people living in the city, and what they think of the creature. In the end, the filmmaker's findings lead them to some surprising and heartwarming conclusions about the creature's existence and the importance it holds in the lives of the people in Paris.\"}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"callbacks = [StdOutCallbackHandler(), argilla_callback]\n",
|
||||
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
|
||||
"\n",
|
||||
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
|
||||
"Title: {title}\n",
|
||||
"Playwright: This is a synopsis for the above play:\"\"\"\n",
|
||||
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
|
||||
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)\n",
|
||||
"\n",
|
||||
"test_prompts = [{\"title\": \"Documentary about Bigfoot in Paris\"}]\n",
|
||||
"synopsis_chain.apply(test_prompts)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scenario 3: Using an Agent with Tools\n",
|
||||
"\n",
|
||||
"Finally, as a more advanced workflow, you can create an agent that uses some tools. The `ArgillaCallbackHandler` will keep track of the input and the output, but not of the intermediate steps/thoughts: given a prompt, it logs the original prompt and the final response to that prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> Note that for this scenario we'll be using the Google Search API (Serp API), so you will need to install `google-search-results` with `pip install google-search-results` and set the Serp API key as `os.environ[\"SERPAPI_API_KEY\"] = \"...\"` (you can find it at https://serpapi.com/dashboard). Otherwise, the example below won't work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to answer a historical question\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: \"who was the first president of the United States of America\" \u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mGeorge Washington\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m George Washington was the first president\n",
|
||||
"Final Answer: George Washington was the first president of the United States of America.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'George Washington was the first president of the United States of America.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"callbacks = [StdOutCallbackHandler(), argilla_callback]\n",
|
||||
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
|
||||
"\n",
|
||||
"tools = load_tools([\"serpapi\"], llm=llm, callbacks=callbacks)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
" callbacks=callbacks,\n",
|
||||
")\n",
|
||||
"agent.run(\"Who was the first president of the United States of America?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
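In Scenario 1, the handler has to match each prompt sent in `on_llm_start` with the generations returned in `on_llm_end`. The `LLMResult` printed in that scenario holds one inner list of `Generation` objects per prompt: six prompts, six inner lists.

The following is a minimal sketch of that pairing, not the actual `ArgillaCallbackHandler` implementation. Its assumptions:

- `PromptResponseRecorder` is hypothetical.
- `Generation` and `LLMResult` are simplified stand-ins for LangChain's classes.
- Records go into a local list instead of being pushed to the `langchain-dataset` dataset.

```python
from dataclasses import dataclass
from typing import Any, Dict, List
from uuid import UUID, uuid4


@dataclass
class Generation:
    """Simplified stand-in for a single LLM completion."""
    text: str


@dataclass
class LLMResult:
    """Simplified stand-in: one inner list of generations per input prompt."""
    generations: List[List[Generation]]


class PromptResponseRecorder:
    """Hypothetical handler that pairs each prompt with its response(s).

    Records use the field names of the FeedbackDataset created in this
    notebook ("prompt" and "response") but are only buffered locally.
    """

    def __init__(self) -> None:
        self._prompts: Dict[UUID, List[str]] = {}
        self.records: List[Dict[str, str]] = []

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], *, run_id: UUID, **kwargs: Any
    ) -> None:
        # Remember the prompts until the matching end event for this run arrives.
        self._prompts[run_id] = prompts

    def on_llm_end(self, response: LLMResult, *, run_id: UUID, **kwargs: Any) -> None:
        prompts = self._prompts.pop(run_id, [])
        # response.generations[i] holds the completions for prompts[i].
        for prompt, generations in zip(prompts, response.generations):
            for generation in generations:
                self.records.append({"prompt": prompt, "response": generation.text.strip()})


# Usage: simulate one `llm.generate` call with two prompts.
recorder = PromptResponseRecorder()
run_id = uuid4()
recorder.on_llm_start({}, ["Tell me a joke", "Tell me a poem"], run_id=run_id)
recorder.on_llm_end(
    LLMResult(
        [
            [Generation("\n\nQ: What did the fish say when he hit the wall?\nA: Dam.")],
            [Generation("\n\nThe Moon\n\nThe moon is high in the midnight sky,")],
        ]
    ),
    run_id=run_id,
)
print(recorder.records)
```

Keying the buffered prompts by `run_id` keeps the pairing correct even if several LLM calls overlap.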
|
||||
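Scenario 3 logs only the original prompt and the final answer, not the agent's intermediate thoughts. One way such filtering could work is to keep only chain runs that have no `parent_run_id`. Nested runs, such as tool runs (the traceback in the previous notebook shows `on_tool_start` receiving `parent_run_id=self.parent_run_id`), are reported with their enclosing run as parent.

This is a hypothetical sketch, not necessarily how `ArgillaCallbackHandler` decides what to log:

- `TopLevelChainRecorder` is not a LangChain class.
- Joining the input and output dict values into single strings is an arbitrary choice.

```python
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


class TopLevelChainRecorder:
    """Hypothetical handler that records only the outermost chain run."""

    def __init__(self) -> None:
        self._inputs: Dict[UUID, Dict[str, Any]] = {}
        self.records: List[Dict[str, str]] = []

    def on_chain_start(
        self,
        serialized: Dict[str, Any],
        inputs: Dict[str, Any],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        **kwargs: Any,
    ) -> None:
        # Nested runs have a parent; only the top-level run is remembered.
        if parent_run_id is None:
            self._inputs[run_id] = inputs

    def on_chain_end(
        self,
        outputs: Dict[str, Any],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        **kwargs: Any,
    ) -> None:
        inputs = self._inputs.pop(run_id, None)
        if parent_run_id is not None or inputs is None:
            return  # intermediate step: nothing to log
        self.records.append(
            {
                "prompt": " ".join(str(v) for v in inputs.values()),
                "response": " ".join(str(v) for v in outputs.values()),
            }
        )


# Usage: simulate an agent run that contains one nested step.
recorder = TopLevelChainRecorder()
agent_run, step_run = uuid4(), uuid4()
question = "Who was the first president of the United States of America?"
recorder.on_chain_start({}, {"input": question}, run_id=agent_run)
recorder.on_chain_start({}, {"input": question}, run_id=step_run, parent_run_id=agent_run)
recorder.on_chain_end(
    {"text": "I need to answer a historical question"}, run_id=step_run, parent_run_id=agent_run
)
recorder.on_chain_end(
    {"output": "George Washington was the first president of the United States of America."},
    run_id=agent_run,
)
print(recorder.records)
```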
@@ -9,7 +9,7 @@
|
||||
"\n",
|
||||
"LangChain provides async support for Chains by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
|
||||
"\n",
|
||||
"Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](../indexes/chain_examples/question_answering.html). Async support for other chains is on the roadmap."
|
||||
"Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](../index_examples/question_answering.ipynb). Async support for other chains is on the roadmap."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -104,7 +104,7 @@
|
||||
"s = time.perf_counter()\n",
|
||||
"generate_serially()\n",
|
||||
"elapsed = time.perf_counter() - s\n",
|
||||
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')"
|
||||
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
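The async notebook touched by this hunk times serial generation with `time.perf_counter`. The speed-up from methods such as `arun` and `acall` comes from overlapping I/O-bound calls on a single asyncio event loop.

Below is a minimal sketch of the serial-versus-concurrent comparison. Its assumptions:

- `fake_llm_call` is a hypothetical coroutine that sleeps instead of calling an LLM, so the example runs offline.
- `generate_serially` here is an async rewrite for comparison, not the notebook's function.

```python
import asyncio
import time
from typing import Any, Callable, Coroutine, List, Tuple


async def fake_llm_call(prompt: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for an async LLM call such as `arun`."""
    await asyncio.sleep(delay)  # simulated network latency; yields to the event loop
    return f"response to {prompt!r}"


async def generate_serially(prompts: List[str]) -> List[str]:
    # Each call starts only after the previous one has finished.
    return [await fake_llm_call(p) for p in prompts]


async def generate_concurrently(prompts: List[str]) -> List[str]:
    # All calls are in flight at once; gather returns results in input order.
    return list(await asyncio.gather(*(fake_llm_call(p) for p in prompts)))


def timed(
    fn: Callable[[List[str]], Coroutine[Any, Any, List[str]]], prompts: List[str]
) -> Tuple[List[str], float]:
    s = time.perf_counter()
    results = asyncio.run(fn(prompts))
    return results, time.perf_counter() - s


prompts = [f"prompt {i}" for i in range(5)]
serial_results, serial_elapsed = timed(generate_serially, prompts)
concurrent_results, concurrent_elapsed = timed(generate_concurrently, prompts)
print(f"Serial executed in {serial_elapsed:0.2f} seconds.")
print(f"Concurrent executed in {concurrent_elapsed:0.2f} seconds.")
```

The serial version takes roughly the sum of the delays, while the concurrent one takes roughly the longest single delay. Real LLM calls will vary with network and rate limits.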
@@ -88,7 +88,7 @@ We don't need any access permissions to these datasets and services.
|
||||
|
||||
|
||||
Proprietary dataset or service loaders
|
||||
------------------------------
|
||||
--------------------------------------
|
||||
These datasets and services are not from the public domain.
|
||||
These loaders mostly transform data from specific formats of applications or cloud services,
|
||||
for example **Google Drive**.
|
||||
|
||||
@@ -0,0 +1,256 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f08772b0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Alibaba Cloud MaxCompute\n",
|
||||
"\n",
|
||||
">[Alibaba Cloud MaxCompute](https://www.alibabacloud.com/product/maxcompute) (previously known as ODPS) is a general purpose, fully managed, multi-tenancy data processing platform for large-scale data warehousing. MaxCompute supports various data importing solutions and distributed computing models, enabling users to effectively query massive datasets, reduce production costs, and ensure data security.\n",
|
||||
"\n",
|
||||
"The `MaxComputeLoader` lets you execute a MaxCompute SQL query and load the results as one document per row."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "067b7213",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Collecting pyodps\n",
|
||||
" Downloading pyodps-0.11.4.post0-cp39-cp39-macosx_10_9_universal2.whl (2.0 MB)\n",
|
||||
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m0m\n",
|
||||
"\u001b[?25hRequirement already satisfied: charset-normalizer>=2 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (3.1.0)\n",
|
||||
"Requirement already satisfied: urllib3<2.0,>=1.26.0 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (1.26.15)\n",
|
||||
"Requirement already satisfied: idna>=2.5 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (3.4)\n",
|
||||
"Requirement already satisfied: certifi>=2017.4.17 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (2023.5.7)\n",
|
||||
"Installing collected packages: pyodps\n",
|
||||
"Successfully installed pyodps-0.11.4.post0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!pip install pyodps"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "19641457",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Basic Usage\n",
|
||||
"To instantiate the loader, you'll need a SQL query to execute, your MaxCompute endpoint and project name, and your access ID and secret access key. The access ID and secret access key can either be passed in directly via the `access_id` and `secret_access_key` parameters or set as the environment variables `MAX_COMPUTE_ACCESS_ID` and `MAX_COMPUTE_SECRET_ACCESS_KEY`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "71a0da4b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import MaxComputeLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d4770c4a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"base_query = \"\"\"\n",
|
||||
"SELECT *\n",
|
||||
"FROM (\n",
|
||||
" SELECT 1 AS id, 'content1' AS content, 'meta_info1' AS meta_info\n",
|
||||
" UNION ALL\n",
|
||||
" SELECT 2 AS id, 'content2' AS content, 'meta_info2' AS meta_info\n",
|
||||
" UNION ALL\n",
|
||||
" SELECT 3 AS id, 'content3' AS content, 'meta_info3' AS meta_info\n",
|
||||
") mydata;\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1616c174",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"endpoint=\"<ENDPOINT>\"\n",
|
||||
"project=\"<PROJECT>\"\n",
|
||||
"ACCESS_ID = \"<ACCESS ID>\"\n",
|
||||
"SECRET_ACCESS_KEY = \"<SECRET ACCESS KEY>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "e5c25041",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = MaxComputeLoader.from_params(\n",
|
||||
" base_query,\n",
|
||||
" endpoint,\n",
|
||||
" project,\n",
|
||||
" access_id=ACCESS_ID,\n",
|
||||
" secret_access_key=SECRET_ACCESS_KEY,\n",
|
||||
")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "311e74ea",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Document(page_content='id: 1\\ncontent: content1\\nmeta_info: meta_info1', metadata={}), Document(page_content='id: 2\\ncontent: content2\\nmeta_info: meta_info2', metadata={}), Document(page_content='id: 3\\ncontent: content3\\nmeta_info: meta_info3', metadata={})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "a4d8c388",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"id: 1\n",
|
||||
"content: content1\n",
|
||||
"meta_info: meta_info1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "f2422e6c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "85e07e28",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying Which Columns are Content vs Metadata\n",
|
||||
"You can configure which subset of columns should be loaded as the contents of the Document and which as the metadata using the `page_content_columns` and `metadata_columns` parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "a7b9d726",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = MaxComputeLoader.from_params(\n",
|
||||
" base_query,\n",
|
||||
" endpoint,\n",
|
||||
" project,\n",
|
||||
" page_content_columns=[\"content\"], # Specify Document page content\n",
|
||||
" metadata_columns=[\"id\", \"meta_info\"], # Specify Document metadata\n",
|
||||
" access_id=ACCESS_ID,\n",
|
||||
" secret_access_key=SECRET_ACCESS_KEY,\n",
|
||||
")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "532c19e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"content: content1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "5fe4990a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'id': 1, 'meta_info': 'meta_info1'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].metadata)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
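Judging from the MaxCompute notebook outputs above, each row becomes one document whose text is a set of `column: value` lines, and `page_content_columns`/`metadata_columns` choose which columns go where. The sketch below mimics that mapping with plain dictionaries so it runs without `pyodps` or credentials. `row_to_document` and the `Doc` dataclass are hypothetical names, not `MaxComputeLoader` internals, and the default of putting every column into the page content is an assumption based on the first output above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def row_to_document(
    row: Dict[str, Any],
    page_content_columns: Optional[Sequence[str]] = None,
    metadata_columns: Optional[Sequence[str]] = None,
) -> Doc:
    # Assumption: with no column lists, every column goes into the page content
    # and the metadata stays empty, matching the first output above.
    content_cols = list(page_content_columns) if page_content_columns else list(row)
    meta_cols = list(metadata_columns) if metadata_columns else []
    page_content = "\n".join(f"{col}: {row[col]}" for col in content_cols)
    metadata = {col: row[col] for col in meta_cols}
    return Doc(page_content=page_content, metadata=metadata)


# The same three rows the notebook's UNION ALL query produces.
rows: List[Dict[str, Any]] = [
    {"id": i, "content": f"content{i}", "meta_info": f"meta_info{i}"} for i in (1, 2, 3)
]

docs = [row_to_document(r) for r in rows]
split_docs = [row_to_document(r, ["content"], ["id", "meta_info"]) for r in rows]
print(split_docs[0])
```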
@@ -5,22 +5,47 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Docugami\n",
|
||||
"This notebook covers how to load documents from `Docugami`. See [here](../../../../ecosystem/docugami.md) for more details, and the advantages of using this system over alternative data loaders.\n",
|
||||
"This notebook covers how to load documents from `Docugami`. It also describes the advantages of using this system over alternative data loaders.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"1. Follow the Quick Start section in [this document](../../../../ecosystem/docugami.md)\n",
|
||||
"2. Grab an access token for your workspace, and make sure it is set as the DOCUGAMI_API_KEY environment variable\n",
|
||||
"1. Install the necessary Python packages.\n",
|
||||
"2. Grab an access token for your workspace, and make sure it is set as the `DOCUGAMI_API_KEY` environment variable.\n",
|
||||
"3. Grab some docset and document IDs for your processed documents, as described here: https://help.docugami.com/home/docugami-api"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You need the lxml package to use the DocugamiLoader\n",
|
||||
"!poetry run pip -q install lxml"
|
||||
"!pip install lxml"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Quick start\n",
|
||||
"\n",
|
||||
"1. Create a [Docugami workspace](http://www.docugami.com) (free trials available)\n",
|
||||
"2. Add your documents (PDF, DOCX or DOC) and allow Docugami to ingest and cluster them into sets of similar documents, e.g. NDAs, Lease Agreements, and Service Agreements. There is no fixed set of document types supported by the system; the clusters created depend on your particular documents, and you can [change the docset assignments](https://help.docugami.com/home/working-with-the-doc-sets-view) later.\n",
|
||||
"3. Create an access token via the Developer Playground for your workspace. [Detailed instructions](https://help.docugami.com/home/docugami-api)\n",
|
||||
"4. Explore the [Docugami API](https://api-docs.docugami.com) to get a list of your processed docset IDs, or just the document IDs for a particular docset. \n",
|
||||
"5. Use the DocugamiLoader as detailed below to get rich semantic chunks for your documents.\n",
|
||||
"6. Optionally, build and publish one or more [reports or abstracts](https://help.docugami.com/home/reports). This helps Docugami improve the semantic XML with better tags based on your preferences, which are then added to the DocugamiLoader output as metadata. Use techniques like the [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) to do high-accuracy Document QA.\n",
|
||||
"\n",
|
||||
"## Advantages vs Other Chunking Techniques\n",
|
||||
"\n",
|
||||
"Appropriate chunking of your documents is critical for retrieval. Many chunking techniques exist, including simple ones that rely on whitespace and recursive chunk splitting based on character length. Docugami offers a different approach:\n",
|
||||
"\n",
|
||||
"1. **Intelligent Chunking:** Docugami breaks down every document into a hierarchical semantic XML tree of chunks of varying sizes, from single words or numerical values to entire sections. These chunks follow the semantic contours of the document, providing a more meaningful representation than arbitrary length or simple whitespace-based chunking.\n",
|
||||
"2. **Structured Representation:** In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.\n",
|
||||
"3. **Semantic Annotations:** Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in a set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.\n",
|
||||
"4. **Additional Metadata:** Chunks are also annotated with additional metadata if the user has been using Docugami (for example, by publishing reports or abstracts). This additional metadata can be used for high-accuracy Document QA without context window restrictions. See the detailed code walk-through below.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -112,7 +137,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!poetry run pip -q install openai tiktoken chromadb "
|
||||
"!poetry run pip -q install openai tiktoken chromadb"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -292,7 +317,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can use a [self-querying retriever](../../retrievers/examples/self_query_retriever.ipynb) to improve our query accuracy, using this additional metadata:"
|
||||
"We can use a [self-querying retriever](../../retrievers/examples/self_query.ipynb) to improve our query accuracy, using this additional metadata:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -339,7 +364,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's run the same question again. It returns the correct result since all the chunks have metadata key/value pairs on them carrying key information about the document even if this infromation is physically very far away from the source chunk used to generate the answer."
|
||||
"Let's run the same question again. It returns the correct result since all the chunks have metadata key/value pairs on them carrying key information about the document even if this information is physically very far away from the source chunk used to generate the answer."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -398,7 +423,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
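The Docugami notebook's "Intelligent Chunking" and "Structured Representation" points describe chunks that follow an XML tree rather than character counts. As a rough illustration only (not Docugami's algorithm or output schema), the sketch below walks a small hand-written XML document with the standard library and emits each leaf element's text with its tag path, so structural context travels with the chunk as metadata. The XML tags and the `leaf_chunks` helper are invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Invented sample; real Docugami output uses its own tags and attributes.
SAMPLE = """
<Lease>
  <Parties>
    <Landlord>Acme Properties LLC</Landlord>
    <Tenant>Jane Doe</Tenant>
  </Parties>
  <Term>
    <RenewalDate>2024-01-01</RenewalDate>
    <Termination>Either party may terminate with 30 days' notice.</Termination>
  </Term>
</Lease>
"""


def leaf_chunks(xml_text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (text, metadata) pairs for every leaf element that has text."""
    root = ET.fromstring(xml_text.strip())
    chunks: List[Tuple[str, Dict[str, str]]] = []

    def walk(elem: ET.Element, path: List[str]) -> None:
        path = path + [elem.tag]
        children = list(elem)
        # Leaves become chunks; their tag path records where they sit in the tree.
        if not children and elem.text and elem.text.strip():
            chunks.append((elem.text.strip(), {"xpath": "/" + "/".join(path), "tag": elem.tag}))
        for child in children:
            walk(child, path)

    walk(root, [])
    return chunks


for text, meta in leaf_chunks(SAMPLE):
    print(meta["xpath"], "->", text)
```

A real system would also emit larger chunks for whole sections; emitting only leaves keeps the sketch short.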
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Facebook Chat\n",
|
||||
"# Facebook Chat\n",
|
||||
"\n",
|
||||
">[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its messaging service in 2010.\n",
|
||||
"\n",
|
||||
|
||||
@@ -1,17 +1,18 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PySpack DataFrame Loader\n",
|
||||
"# PySpark DataFrame Loader\n",
|
||||
"\n",
|
||||
"This shows how to load data from a PySpark DataFrame"
|
||||
"This notebook goes over how to load data from a [PySpark](https://spark.apache.org/docs/latest/api/python/) DataFrame."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -20,7 +21,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -29,16 +30,26 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Setting default log level to \"WARN\".\n",
|
||||
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
|
||||
"23/05/31 14:08:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"spark = SparkSession.builder.getOrCreate()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -47,7 +58,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -56,7 +67,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -65,9 +76,56 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Stage 8:> (0 + 1) / 1]\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': ' 81.34', ' \"Wins\"': ' 98'}),\n",
|
||||
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': ' 82.20', ' \"Wins\"': ' 97'}),\n",
|
||||
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': ' 197.96', ' \"Wins\"': ' 95'}),\n",
|
||||
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': ' 117.62', ' \"Wins\"': ' 94'}),\n",
|
||||
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': ' 83.31', ' \"Wins\"': ' 94'}),\n",
|
||||
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': ' 55.37', ' \"Wins\"': ' 94'}),\n",
|
||||
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': ' 120.51', ' \"Wins\"': ' 93'}),\n",
|
||||
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': ' 81.43', ' \"Wins\"': ' 93'}),\n",
|
||||
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': ' 64.17', ' \"Wins\"': ' 90'}),\n",
|
||||
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': ' 154.49', ' \"Wins\"': ' 89'}),\n",
|
||||
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': ' 132.30', ' \"Wins\"': ' 88'}),\n",
|
||||
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': ' 110.30', ' \"Wins\"': ' 88'}),\n",
|
||||
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': ' 95.14', ' \"Wins\"': ' 86'}),\n",
|
||||
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': ' 96.92', ' \"Wins\"': ' 85'}),\n",
|
||||
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': ' 97.65', ' \"Wins\"': ' 83'}),\n",
|
||||
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': ' 174.54', ' \"Wins\"': ' 81'}),\n",
|
||||
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': ' 74.28', ' \"Wins\"': ' 81'}),\n",
|
||||
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': ' 63.43', ' \"Wins\"': ' 79'}),\n",
|
||||
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': ' 55.24', ' \"Wins\"': ' 76'}),\n",
|
||||
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': ' 81.97', ' \"Wins\"': ' 75'}),\n",
|
||||
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': ' 93.35', ' \"Wins\"': ' 74'}),\n",
|
||||
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': ' 75.48', ' \"Wins\"': ' 73'}),\n",
|
||||
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': ' 60.91', ' \"Wins\"': ' 72'}),\n",
|
||||
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': ' 118.07', ' \"Wins\"': ' 69'}),\n",
|
||||
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': ' 173.18', ' \"Wins\"': ' 69'}),\n",
|
||||
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': ' 78.43', ' \"Wins\"': ' 68'}),\n",
|
||||
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': ' 94.08', ' \"Wins\"': ' 66'}),\n",
|
||||
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': ' 78.06', ' \"Wins\"': ' 64'}),\n",
|
||||
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': ' 88.19', ' \"Wins\"': ' 61'}),\n",
|
||||
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': ' 60.65', ' \"Wins\"': ' 55'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
@@ -89,7 +147,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
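In the PySpark `loader.load()` output above, metadata keys look like `' "Payroll (millions)"'` and values like `' 81.34'`: a leading space and literal quote characters survive. That is what a CSV parser produces when each comma is followed by a space and the parser does not skip it. The sketch below reproduces this with Python's standard `csv` module on a made-up two-row sample (the notebook's CSV-reading cell is not shown in this diff) and shows that `skipinitialspace=True` yields clean keys. `load_rows` and `to_documents` are hypothetical helpers. For Spark itself, `DataFrameReader.csv` has an `ignoreLeadingWhiteSpace` option that may help, but that is untested here.

```python
import csv
import io
from typing import Dict, List, Tuple

# Made-up sample in the same shape as the output above: a space follows each comma.
SAMPLE_CSV = '"Team", "Payroll (millions)", "Wins"\n"Nationals", 81.34, 98\n"Reds", 82.20, 97\n'


def load_rows(text: str, skip_initial_space: bool) -> List[Dict[str, str]]:
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=skip_initial_space)
    return list(reader)


def to_documents(rows: List[Dict[str, str]], page_content_column: str) -> List[Tuple[str, Dict[str, str]]]:
    # One (page_content, metadata) pair per row; the other columns become metadata.
    return [
        (row[page_content_column], {k: v for k, v in row.items() if k != page_content_column})
        for row in rows
    ]


raw = to_documents(load_rows(SAMPLE_CSV, skip_initial_space=False), "Team")
clean = to_documents(load_rows(SAMPLE_CSV, skip_initial_space=True), "Team")
print(raw[0])    # keys keep the leading space and the quote characters
print(clean[0])  # ('Nationals', {'Payroll (millions)': '81.34', 'Wins': '98'})
```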
@@ -6,7 +6,7 @@
|
||||
"source": [
|
||||
"# Reddit\n",
|
||||
"\n",
|
||||
">[Reddit (reddit)](www.reddit.com) is an American social news aggregation, content rating, and discussion website.\n",
|
||||
">[Reddit](https://www.reddit.com) is an American social news aggregation, content rating, and discussion website.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This loader fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package.\n",
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
"\n",
|
||||
"`SitemapLoader` extends `WebBaseLoader`: it loads a sitemap from a given URL, then scrapes and loads all pages in the sitemap, returning each page as a Document.\n",
|
||||
"\n",
|
||||
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scrapped server, or don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but it may cause the server to block you. Be careful!"
|
||||
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, control the scraped server, or don't care about load, you can raise this limit. Note that while this will speed up the scraping process, it may cause the server to block you. Be careful!"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -63,6 +63,25 @@
|
||||
"docs = sitemap_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can change the `requests_per_second` parameter to increase the max concurrent requests, and use `requests_kwargs` to pass kwargs when sending requests."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sitemap_loader.requests_per_second = 2\n",
|
||||
"# Optional: avoid `[SSL: CERTIFICATE_VERIFY_FAILED]` issue\n",
|
||||
"sitemap_loader.requests_kwargs = {\"verify\": False}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
|
||||
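The sitemap notebook's `requests_per_second` setting caps how fast pages are fetched. One simple way to enforce such a cap, sketched below, is to fetch URLs in batches of at most `requests_per_second` and pause one second between batches. This is an illustration under that assumption, not `SitemapLoader`'s actual implementation; `plan_batches`, `fetch_throttled`, and `fake_fetch` are hypothetical names, and `fake_fetch` does no network access.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


def plan_batches(urls: Sequence[str], requests_per_second: int) -> List[List[str]]:
    """Split urls into consecutive batches of at most requests_per_second items."""
    if requests_per_second < 1:
        raise ValueError("requests_per_second must be at least 1")
    return [list(urls[i:i + requests_per_second]) for i in range(0, len(urls), requests_per_second)]


async def fetch_throttled(
    urls: Sequence[str],
    fetch: Callable[[str], Awaitable[str]],
    requests_per_second: int = 2,
    interval: float = 1.0,
) -> List[str]:
    """Fetch each batch concurrently, pausing `interval` seconds between batches."""
    results: List[str] = []
    batches = plan_batches(urls, requests_per_second)
    for i, batch in enumerate(batches):
        results.extend(await asyncio.gather(*(fetch(url) for url in batch)))
        if i < len(batches) - 1:
            await asyncio.sleep(interval)
    return results


async def fake_fetch(url: str) -> str:
    # Hypothetical fetcher that just echoes the URL.
    await asyncio.sleep(0)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(3)]
    pages = asyncio.run(fetch_throttled(urls, fake_fetch))
    print(len(pages))
```

Batching is coarse: one slow request delays its whole batch. A token-bucket limiter smooths this out at the cost of more code.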
@@ -19,7 +19,6 @@
|
||||
"source": [
|
||||
"# # Install package\n",
|
||||
"!pip install \"unstructured[local-inference]\"\n",
|
||||
"!pip install \"detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2\"\n",
|
||||
"!pip install layoutparser[layoutmodels,tesseract]"
|
||||
]
|
||||
},
|
||||
|
||||
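The Qdrant self-querying notebook prints structured queries such as `filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5)` and `Operation(operator=<Operator.AND: 'and'>, arguments=[...])`. To make the semantics of those filters concrete, the sketch below evaluates the same kind of tree against the demo movies' metadata in plain Python. The `Comparison`/`Operation` classes here are simplified, hypothetical stand-ins (not LangChain imports), and the real retriever translates the filter into a Qdrant query rather than filtering in memory. Treating a document that lacks the attribute as non-matching is an assumption.

```python
import operator as op
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union


# Simplified, hypothetical mirrors of the printed filter objects.
@dataclass
class Comparison:
    comparator: str  # "eq", "ne", "gt", "gte", "lt", "lte"
    attribute: str
    value: Any


@dataclass
class Operation:
    operator: str  # "and" or "or"
    arguments: List["Filter"]


Filter = Union[Comparison, Operation]

COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": op.eq, "ne": op.ne, "gt": op.gt, "gte": op.ge, "lt": op.lt, "lte": op.le,
}


def matches(f: Filter, metadata: Dict[str, Any]) -> bool:
    if isinstance(f, Comparison):
        if f.attribute not in metadata:
            return False  # assumption: a missing field never matches
        return COMPARATORS[f.comparator](metadata[f.attribute], f.value)
    results = (matches(arg, metadata) for arg in f.arguments)
    return all(results) if f.operator == "and" else any(results)


# Metadata of the six demo movies from the notebook.
movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {"year": 1979, "rating": 9.9, "director": "Andrei Tarkovsky", "genre": "science fiction"},
]


def years_matching(f: Filter) -> List[int]:
    return [m["year"] for m in movies if matches(f, m)]


high = Comparison("gt", "rating", 8.5)
scifi_high = Operation("and", [high, Comparison("eq", "genre", "science fiction")])
toys = Operation("and", [Comparison("gt", "year", 1990), Comparison("lt", "year", 2005),
                         Comparison("eq", "genre", "animated")])

print(years_matching(high))        # [2006, 1979]
print(years_matching(scifi_high))  # [1979]
print(years_matching(toys))        # [1995]
```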
396
docs/modules/indexes/retrievers/examples/qdrant_self_query.ipynb
Normal file
@@ -0,0 +1,396 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "13afcae7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Self-querying with Qdrant\n",
|
||||
"\n",
|
||||
">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. `Qdrant` is tailored to extended filtering support.\n",
|
||||
"\n",
|
||||
"In this notebook, we'll demo the `SelfQueryRetriever` wrapped around a Qdrant vector store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68e75fb9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating a Qdrant vectorstore\n",
|
||||
"First we'll want to create a Qdrant vector store and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
|
||||
"\n",
|
||||
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `qdrant-client` package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "63a8af5b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install lark qdrant-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "83811610-7df3-4ede-b268-68a6a83ba9e2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings`, so we have to get the OpenAI API key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd01b61b-7d32-4a55-85d6-b2d2d4f18840",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# import os\n",
|
||||
"# import getpass\n",
|
||||
"\n",
|
||||
"# os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "cb4a5787",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.vectorstores import Qdrant\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "bcbe04d9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = [\n",
|
||||
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"}),\n",
|
||||
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
|
||||
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
|
||||
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
|
||||
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"year\": 1995, \"genre\": \"animated\"}),\n",
|
||||
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"year\": 1979, \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": \"science fiction\"})\n",
|
||||
"]\n",
|
||||
"vectorstore = Qdrant.from_documents(\n",
|
||||
" docs, \n",
|
||||
" embeddings, \n",
|
||||
" location=\":memory:\", # Local mode with in-memory storage only\n",
|
||||
" collection_name=\"my_documents\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ecaab6d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating our self-querying retriever\n",
|
||||
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "86e34dbf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"\n",
|
||||
"metadata_field_info=[\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"genre\",\n",
|
||||
" description=\"The genre of the movie\", \n",
|
||||
" type=\"string or list[string]\", \n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"year\",\n",
|
||||
" description=\"The year the movie was released\", \n",
|
||||
" type=\"integer\", \n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"director\",\n",
|
||||
" description=\"The name of the movie director\", \n",
|
||||
" type=\"string\", \n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"rating\",\n",
|
||||
" description=\"A 1-10 rating for the movie\",\n",
|
||||
" type=\"float\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"document_content_description = \"Brief summary of a movie\"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ea9df8d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Testing it out\n",
|
||||
"And now we can try actually using our retriever!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "38a126e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='dinosaur' filter=None limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n",
|
||||
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example only specifies a relevant query\n",
|
||||
"retriever.get_relevant_documents(\"What are some movies about dinosaurs\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "fc3f1e6e",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5) limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example only specifies a filter\n",
|
||||
"retriever.get_relevant_documents(\"I want to watch a movie rated higher than 8.5\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "b19d4da0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example specifies a query and a filter\n",
|
||||
"retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "f900e40e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='science fiction')]) limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example specifies a composite filter\n",
|
||||
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "12a51522",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='toys' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='year', value=1990), Comparison(comparator=<Comparator.LT: 'lt'>, attribute='year', value=2005), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='animated')]) limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example specifies a query and composite filter\n",
|
||||
"retriever.get_relevant_documents(\"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Filter k\n",
|
||||
"\n",
|
||||
"We can also use the self-query retriever to specify `k`, the number of documents to fetch.\n",
|
||||
"\n",
|
||||
"We can do this by passing `enable_limit=True` to the constructor."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "bff36b88-b506-4877-9c63-e5a1a8d78e64",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, \n",
|
||||
" vectorstore, \n",
|
||||
" document_content_description, \n",
|
||||
" metadata_field_info, \n",
|
||||
" enable_limit=True,\n",
|
||||
" verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "2758d229-4f97-499c-819f-888acaf8ee10",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='dinosaur' filter=None limit=2\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example only specifies a relevant query\n",
|
||||
"retriever.get_relevant_documents(\"what are two movies about dinosaurs\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
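In the cells above, the retriever prints each parsed request as `query=... filter=... limit=...`. One way to read the `filter` part is as a small expression tree that is evaluated against each document's metadata. The sketch below is a rough mental model under stated assumptions, not LangChain's implementation. `Comparison`, `Operation`, `matches` and `apply_structured_query` are hypothetical stand-ins that mirror the printed repr, and the sample metadata is copied from the example documents. The sketch skips the semantic ranking on `query`, which the vector store performs; list order simply stands in for that ranking before `limit` truncates the result.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union

# Hypothetical stand-ins that mirror the printed structured-query repr.
# These are NOT LangChain's own classes.
@dataclass
class Comparison:
    comparator: str  # "eq", "ne", "gt", "gte", "lt", "lte"
    attribute: str
    value: Any

@dataclass
class Operation:
    operator: str  # "and" or "or"
    arguments: List["Filter"]

Filter = Union[Comparison, Operation]

_COMPARATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}

def matches(flt: Optional[Filter], metadata: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies flt; a None filter matches everything."""
    if flt is None:
        return True
    if isinstance(flt, Comparison):
        # A document that lacks the attribute cannot satisfy the comparison.
        if flt.attribute not in metadata:
            return False
        return _COMPARATORS[flt.comparator](metadata[flt.attribute], flt.value)
    results = (matches(arg, metadata) for arg in flt.arguments)
    return all(results) if flt.operator == "and" else any(results)

def apply_structured_query(docs, flt=None, limit=None):
    """Keep docs whose metadata matches flt, then keep at most `limit` of them."""
    kept = [d for d in docs if matches(flt, d["metadata"])]
    return kept if limit is None else kept[:limit]

# Metadata copied from the example documents shown in the notebook outputs.
docs = [
    {"page_content": "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
     "metadata": {"year": 1993, "rating": 7.7, "genre": "science fiction"}},
    {"page_content": "Toys come alive and have a blast doing so",
     "metadata": {"year": 1995, "genre": "animated"}},
    {"page_content": "A bunch of normal-sized women are supremely wholesome and some men pine after them",
     "metadata": {"year": 2019, "director": "Greta Gerwig", "rating": 8.3}},
    {"page_content": "Three men walk into the Zone, three men walk out of the Zone",
     "metadata": {"year": 1979, "rating": 9.9, "director": "Andrei Tarkovsky",
                  "genre": "science fiction"}},
    {"page_content": "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
     "metadata": {"year": 2006, "director": "Satoshi Kon", "rating": 8.6}},
]

# The composite filter printed for "highly rated (above 8.5) science fiction".
high_rated_scifi = Operation("and", [
    Comparison("gt", "rating", 8.5),
    Comparison("eq", "genre", "science fiction"),
])
for d in apply_structured_query(docs, high_rated_scifi):
    print(d["metadata"]["director"])  # prints: Andrei Tarkovsky
```

As in the notebook's composite-filter example, only the Tarkovsky document passes both conditions. The Satoshi Kon document is rated above 8.5 but has no `genre` key, so it fails the equality check.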
|
||||
@@ -1,238 +1,580 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ElasticSearch\n",
|
||||
"\n",
|
||||
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Elasticsearch` database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "81f43794-f002-477c-9b68-4975df30e718",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out [Elasticsearch installation instructions](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html).\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that does not require\n",
|
||||
"login credentials, pass the Elasticsearch URL and index name along with the\n",
|
||||
"embedding object to the constructor.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=\"http://localhost:9200\",\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that requires login credentials,\n",
|
||||
"including Elastic Cloud, use the Elasticsearch URL format\n",
|
||||
"https://username:password@es_host:9243. For example, to connect to Elastic\n",
|
||||
"Cloud, create the Elasticsearch URL with the required authentication details and\n",
|
||||
"pass it to the ElasticVectorSearch constructor as the named parameter\n",
|
||||
"elasticsearch_url.\n",
|
||||
"\n",
|
||||
"You can obtain your Elastic Cloud URL and login credentials by logging in to the\n",
|
||||
"Elastic Cloud console at https://cloud.elastic.co, selecting your deployment, and\n",
|
||||
"navigating to the \"Deployments\" page.\n",
|
||||
"\n",
|
||||
"To obtain your Elastic Cloud password for the default \"elastic\" user:\n",
|
||||
"1. Log in to the Elastic Cloud console at https://cloud.elastic.co\n",
|
||||
"2. Go to \"Security\" > \"Users\"\n",
|
||||
"3. Locate the \"elastic\" user and click \"Edit\"\n",
|
||||
"4. Click \"Reset password\"\n",
|
||||
"5. Follow the prompts to reset the password\n",
|
||||
"\n",
|
||||
"Format for Elastic Cloud URLs is\n",
|
||||
"https://username:password@cluster_id.region_id.gcp.cloud.es.io:9243.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
" elastic_host = \"cluster_id.region_id.gcp.cloud.es.io\"\n",
|
||||
" elasticsearch_url = f\"https://username:password@{elastic_host}:9243\"\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=elasticsearch_url,\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install elasticsearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
"cells": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key: ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import ElasticVectorSearch\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12eb86d8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = ElasticVectorSearch.from_documents(docs, embeddings, elasticsearch_url=\"http://localhost:9200\")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "4b172de8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {
|
||||
"id": "683953b3"
|
||||
},
|
||||
"source": [
|
||||
"# ElasticSearch\n",
|
||||
"\n",
|
||||
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Elasticsearch` database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
|
||||
"\n",
|
||||
"We cannot let this happen. \n",
|
||||
"\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticVectorSearch class"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409"
|
||||
},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "81f43794-f002-477c-9b68-4975df30e718",
|
||||
"metadata": {
|
||||
"id": "81f43794-f002-477c-9b68-4975df30e718"
|
||||
},
|
||||
"source": [
|
||||
"Check out [Elasticsearch installation instructions](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html).\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that does not require\n",
|
||||
"login credentials, pass the Elasticsearch URL and index name along with the\n",
|
||||
"embedding object to the constructor.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=\"http://localhost:9200\",\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that requires login credentials,\n",
|
||||
"including Elastic Cloud, use the Elasticsearch URL format\n",
|
||||
"https://username:password@es_host:9243. For example, to connect to Elastic\n",
|
||||
"Cloud, create the Elasticsearch URL with the required authentication details and\n",
|
||||
"pass it to the ElasticVectorSearch constructor as the named parameter\n",
|
||||
"elasticsearch_url.\n",
|
||||
"\n",
|
||||
"You can obtain your Elastic Cloud URL and login credentials by logging in to the\n",
|
||||
"Elastic Cloud console at https://cloud.elastic.co, selecting your deployment, and\n",
|
||||
"navigating to the \"Deployments\" page.\n",
|
||||
"\n",
|
||||
"To obtain your Elastic Cloud password for the default \"elastic\" user:\n",
|
||||
"1. Log in to the Elastic Cloud console at https://cloud.elastic.co\n",
|
||||
"2. Go to \"Security\" > \"Users\"\n",
|
||||
"3. Locate the \"elastic\" user and click \"Edit\"\n",
|
||||
"4. Click \"Reset password\"\n",
|
||||
"5. Follow the prompts to reset the password\n",
|
||||
"\n",
|
||||
"Format for Elastic Cloud URLs is\n",
|
||||
"https://username:password@cluster_id.region_id.gcp.cloud.es.io:9243.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
" elastic_host = \"cluster_id.region_id.gcp.cloud.es.io\"\n",
|
||||
" elasticsearch_url = f\"https://username:password@{elastic_host}:9243\"\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=elasticsearch_url,\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install elasticsearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"outputId": "fd16b37f-cb76-40a9-b83f-eab58dd0d912"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key: ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0"
|
||||
},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "aac9563e"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import ElasticVectorSearch\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "a3c3999a"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12eb86d8",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "12eb86d8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = ElasticVectorSearch.from_documents(docs, embeddings, elasticsearch_url=\"http://localhost:9200\")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4b172de8",
|
||||
"metadata": {
|
||||
"id": "4b172de8",
|
||||
"outputId": "ca05a209-4514-4b5c-f6cb-2348f58c19a2"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
|
||||
"\n",
|
||||
"We cannot let this happen. \n",
|
||||
"\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticKnnSearch class\n",
|
||||
"The `ElasticKnnSearch` class stores vectors and documents in Elasticsearch for use with approximate [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip install langchain elasticsearch"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"from langchain.vectorstores.elastic_vector_search import ElasticKnnSearch\n",
|
||||
"from langchain.embeddings import ElasticsearchEmbeddings\n",
|
||||
"from elasticsearch import Elasticsearch"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize ElasticsearchEmbeddings\n",
|
||||
"model_id = \"<model_id_from_es>\" \n",
|
||||
"dims = 768  # placeholder: replace with the embedding dimension of your model\n",
|
||||
"es_cloud_id = \"ESS_CLOUD_ID\"\n",
|
||||
"es_user = \"es_user\"\n",
|
||||
"es_password = \"es_pass\"\n",
|
||||
"test_index = \"<index_name>\"\n",
|
||||
"#input_field = \"your_input_field\" # if different from 'text_field'"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Generate embedding object\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
|
||||
" model_id,\n",
|
||||
" #input_field=input_field,\n",
|
||||
" es_cloud_id=es_cloud_id,\n",
|
||||
" es_user=es_user,\n",
|
||||
" es_password=es_password,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
"    es_cloud_id=es_cloud_id,\n",
|
||||
"    es_user=es_user,\n",
|
||||
"    es_password=es_password,\n",
|
||||
"    index_name=test_index,\n",
|
||||
"    embedding=embeddings\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test adding vectors"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `add_texts` method\n",
|
||||
"texts = [\"Hello, world!\", \"Machine learning is fun.\", \"I love Python.\"]\n",
|
||||
"knn_search.add_texts(texts)\n",
|
||||
"\n",
|
||||
"# Test `from_texts` method\n",
|
||||
"new_texts = [\"This is a new text.\", \"Elasticsearch is powerful.\", \"Python is great for data analysis.\"]\n",
|
||||
"knn_search.from_texts(new_texts, dims=dims)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test kNN search using a query vector builder"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query=query, model_id=model_id, k=2)\n",
|
||||
"print(f\"kNN search results for query '{query}': {knn_result}\")\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")\n",
|
||||
"\n",
|
||||
"# Test `knn_hybrid_search` method\n",
|
||||
"query = \"Hello\"\n",
|
||||
"hybrid_result = knn_search.knn_hybrid_search(query=query, model_id=model_id, k=2)\n",
|
||||
"print(f\"Hybrid search results for query '{query}': {hybrid_result}\")\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{hybrid_result['hits']['hits'][0]['_source']['text']}'\")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test kNN search using a pre-generated vector\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Generate embedding for tests\n",
|
||||
"query_text = 'Hello'\n",
|
||||
"query_embedding = embeddings.embed_query(query_text)\n",
|
||||
"print(f\"Length of embedding: {len(query_embedding)}\\nFirst two items in embedding: {query_embedding[:2]}\")\n",
|
||||
"\n",
|
||||
"# Test kNN search\n",
|
||||
"knn_result = knn_search.knn_search(query_vector=query_embedding, k=2)\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")\n",
|
||||
"\n",
|
||||
"# Test hybrid search: requires both the query text (query) and query_vector\n",
|
||||
"knn_result = knn_search.knn_hybrid_search(query_vector=query_embedding, query=query_text, k=2)\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test `source` option"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query=query, model_id=model_id, k=2, source=False)\n",
|
||||
"assert '_source' not in knn_result['hits']['hits'][0]\n",
|
||||
"\n",
|
||||
"# Test `knn_hybrid_search` method\n",
|
||||
"query = \"Hello\"\n",
|
||||
"hybrid_result = knn_search.knn_hybrid_search(query=query, model_id=model_id, k=2, source=False)\n",
|
||||
"assert '_source' not in hybrid_result['hits']['hits'][0]"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test `fields` option"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query=query, model_id=model_id, k=2, fields=['text'])\n",
|
||||
"assert 'text' in knn_result['hits']['hits'][0]['fields'].keys()\n",
|
||||
"\n",
|
||||
"# Test `knn_hybrid_search` method\n",
|
||||
"query = \"Hello\"\n",
|
||||
"hybrid_result = knn_search.knn_hybrid_search(query=query, model_id=model_id, k=2, fields=['text'])\n",
|
||||
"assert 'text' in hybrid_result['hits']['hits'][0]['fields'].keys()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### Test with an Elasticsearch client connection instead of `es_cloud_id`"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Create Elasticsearch connection\n",
|
||||
"es_connection = Elasticsearch(\n",
|
||||
" hosts=['https://es_cluster_url:port'], \n",
|
||||
" basic_auth=('user', 'password')\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Instantiate ElasticsearchEmbeddings using es_connection\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_es_connection(\n",
|
||||
" model_id,\n",
|
||||
" es_connection,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
"    es_connection=es_connection,\n",
|
||||
"    index_name=test_index,\n",
|
||||
"    embedding=embeddings\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query=query, model_id=model_id, k=2)\n",
|
||||
"print(f\"kNN search results for query '{query}': {knn_result}\")\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0kgyaL6QrYVF"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "0kgyaL6QrYVF"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"colab": {
|
||||
"provenance": []
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
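The kNN cell above indexes `knn_result['hits']['hits'][0]['_source']['text']` directly, which raises an `IndexError` when the search returns no hits. Below is a minimal sketch of a safer lookup. It assumes the result follows the standard Elasticsearch search-response shape (`hits.hits[*]._source`). The helper `top_hit_text` and the sample response are hypothetical.

```python
from typing import Any, Dict, Optional


def top_hit_text(result: Dict[str, Any], field: str = "text") -> Optional[str]:
    """Return `field` from the highest-ranked hit, or None if there is none.

    Assumes an Elasticsearch-style response: {"hits": {"hits": [{"_source": {...}}]}}.
    """
    hits = result.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[0].get("_source", {}).get(field)


# Hypothetical response shaped like a k=2 kNN search result.
sample = {
    "hits": {
        "hits": [
            {"_score": 0.92, "_source": {"text": "Hello world"}},
            {"_score": 0.81, "_source": {"text": "Hi there"}},
        ]
    }
}
print(top_hit_text(sample))                  # Hello world
print(top_hit_text({"hits": {"hits": []}}))  # None
```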
@@ -401,17 +401,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "525e3582",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Metadata filtering\n",
|
||||
"\n",
|
||||
"Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. These filters can also be used in LangChain by passing an additional `filter` parameter to both the `similarity_search_with_score` and `similarity_search` methods."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1c2c58dc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```python\n",
|
||||
"from qdrant_client.http import models as rest\n",
|
||||
@@ -419,10 +420,7 @@
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))\n",
|
||||
"```"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
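The `rest.Filter(...)` argument is elided in the notebook. The sketch below uses plain Python to show the JSON shape of a Qdrant filter with `must` conditions (`{"key": ..., "match": {"value": ...}}`, as in Qdrant's filtering docs). It also includes a toy evaluator to illustrate that every `must` condition has to match. The sketch assumes LangChain stores document metadata under the `metadata` payload key, so keys look like `metadata.source`. The evaluator is a simplification, not Qdrant's matching engine, and the payloads are made up.

```python
from typing import Any, Dict, List


def get_path(payload: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as "metadata.source" through nested dicts."""
    value: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(payload: Dict[str, Any], flt: Dict[str, List[Dict[str, Any]]]) -> bool:
    """Toy check: every `must` condition must match exactly (no should/must_not)."""
    return all(
        get_path(payload, cond["key"]) == cond["match"]["value"]
        for cond in flt.get("must", [])
    )


# Filter in Qdrant's JSON form; the key assumes metadata lives under "metadata".
state_of_union_filter = {
    "must": [{"key": "metadata.source", "match": {"value": "state_of_the_union.txt"}}]
}

payloads = [
    {"page_content": "...", "metadata": {"source": "state_of_the_union.txt"}},
    {"page_content": "...", "metadata": {"source": "other.txt"}},
]
print([matches(p, state_of_union_filter) for p in payloads])  # [True, False]
```

With `qdrant_client` installed, the typed equivalent should be along the lines of `rest.Filter(must=[rest.FieldCondition(key="metadata.source", match=rest.MatchValue(value="state_of_the_union.txt"))])`.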
@@ -683,7 +681,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -10,7 +10,7 @@
|
||||
"This notebook goes over adding memory to an Agent. Before going through this notebook, please walk through the following notebooks, as this one builds on top of both of them:\n",
|
||||
"\n",
|
||||
"- [Adding memory to an LLM Chain](adding_memory.ipynb)\n",
|
||||
"- [Custom Agents](../../agents/examples/custom_agent.ipynb)\n",
|
||||
"- [Custom Agents](../../agents/agents/custom_agent.ipynb)\n",
|
||||
"\n",
|
||||
"In order to add a memory to an agent we are going to do the following steps:\n",
|
||||
"\n",
|
||||
|
||||
@@ -10,8 +10,8 @@
|
||||
"This notebook goes over adding memory to an Agent where the memory uses an external message store. Before going through this notebook, please walk through the following notebooks, as this one builds on top of all of them:\n",
|
||||
"\n",
|
||||
"- [Adding memory to an LLM Chain](adding_memory.ipynb)\n",
|
||||
"- [Custom Agents](../../agents/examples/custom_agent.ipynb)\n",
|
||||
"- [Agent with Memory](agetn_with_memory.ipynb)\n",
|
||||
"- [Custom Agents](../../agents/agents/custom_agent.ipynb)\n",
|
||||
"- [Agent with Memory](agent_with_memory.ipynb)\n",
|
||||
"\n",
|
||||
"In order to add a memory with an external message store to an agent we are going to do the following steps:\n",
|
||||
"\n",
|
||||
|
||||
198
docs/modules/memory/examples/motorhead_memory_managed.ipynb
Normal file
@@ -0,0 +1,198 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Motörhead Memory (Managed)\n",
|
||||
"[Motörhead](https://github.com/getmetal/motorhead) is a memory server implemented in Rust. It automatically handles incremental summarization in the background and allows for stateless applications.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"See the [Motörhead documentation](https://docs.getmetal.io/motorhead/introduction) for instructions on running the managed version of Motörhead. You can retrieve your `api_key` and `client_id` by creating an account on [Metal](https://getmetal.io).\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.motorhead_memory import MotorheadMemory\n",
|
||||
"from langchain import OpenAI, LLMChain, PromptTemplate\n",
|
||||
"\n",
|
||||
"template = \"\"\"You are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"{chat_history}\n",
|
||||
"Human: {human_input}\n",
|
||||
"AI:\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(\n",
|
||||
" input_variables=[\"chat_history\", \"human_input\"], \n",
|
||||
" template=template\n",
|
||||
")\n",
|
||||
"memory = MotorheadMemory(\n",
|
||||
" api_key=\"YOUR_API_KEY\",\n",
|
||||
" client_id=\"YOUR_CLIENT_ID\",\n",
|
||||
" session_id=\"testing-1\",\n",
|
||||
" memory_key=\"chat_history\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"await memory.init()  # loads previous state from Motörhead 🤘\n",
|
||||
"\n",
|
||||
"llm_chain = LLMChain(\n",
|
||||
" llm=OpenAI(), \n",
|
||||
" prompt=prompt, \n",
|
||||
" verbose=True, \n",
|
||||
" memory=memory,\n",
|
||||
")\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Human: hi im bob\n",
|
||||
"AI:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Hi Bob, nice to meet you! How are you doing today?'"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.run(\"hi im bob\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"Human: hi im bob\n",
|
||||
"AI: Hi Bob, nice to meet you! How are you doing today?\n",
|
||||
"Human: whats my name?\n",
|
||||
"AI:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' You said your name is Bob. Is that correct?'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.run(\"whats my name?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a chatbot having a conversation with a human.\n",
|
||||
"\n",
|
||||
"Human: hi im bob\n",
|
||||
"AI: Hi Bob, nice to meet you! How are you doing today?\n",
|
||||
"Human: whats my name?\n",
|
||||
"AI: You said your name is Bob. Is that correct?\n",
|
||||
"Human: whats for dinner?\n",
|
||||
"AI:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" I'm sorry, I'm not sure what you're asking. Could you please rephrase your question?\""
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_chain.run(\"whats for dinner?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
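Motörhead keeps the conversation on the server and injects it into the prompt through `memory_key="chat_history"`. The transcript above shows the rendered history as alternating `Human:` / `AI:` lines. The sketch below is a hypothetical, purely local stand-in that reproduces that rendering with a bounded message window. `ToyChatMemory` is not part of LangChain or Motörhead; its `save_context` / `load_memory_variables` names only mirror LangChain's memory interface. It does not attempt Motörhead's background summarization.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ToyChatMemory:
    """Hypothetical local stand-in: keeps the last `window` exchanges in a deque."""

    def __init__(self, memory_key: str = "chat_history", window: int = 10) -> None:
        self.memory_key = memory_key
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=window)

    def save_context(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        # Store one (human, ai) exchange; the oldest drops out once the window is full.
        self.turns.append((inputs["human_input"], outputs["text"]))

    def load_memory_variables(self, _inputs: Dict[str, str]) -> Dict[str, str]:
        # Render the history the same way the notebook's formatted prompt shows it.
        lines = []
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {self.memory_key: "\n".join(lines)}


template = """You are a chatbot having a conversation with a human.

{chat_history}
Human: {human_input}
AI:"""

memory = ToyChatMemory()
memory.save_context(
    {"human_input": "hi im bob"},
    {"text": "Hi Bob, nice to meet you! How are you doing today?"},
)
variables = memory.load_memory_variables({})
print(template.format(human_input="whats my name?", **variables))
```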
@@ -12,12 +12,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 1,
|
||||
"id": "ac95c968",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector\n",
|
||||
"from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector, SemanticSimilarityExampleSelector\n",
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.prompts import FewShotPromptTemplate, PromptTemplate\n",
|
||||
@@ -39,7 +39,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 2,
|
||||
"id": "db579bea",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -66,7 +66,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 3,
|
||||
"id": "cd76e344",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -94,7 +94,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 4,
|
||||
"id": "cf82956b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -107,8 +107,8 @@
|
||||
"Input: happy\n",
|
||||
"Output: sad\n",
|
||||
"\n",
|
||||
"Input: windy\n",
|
||||
"Output: calm\n",
|
||||
"Input: sunny\n",
|
||||
"Output: gloomy\n",
|
||||
"\n",
|
||||
"Input: worried\n",
|
||||
"Output:\n"
|
||||
@@ -116,7 +116,18 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Let's compare this to what we would just get if we went solely off of similarity\n",
|
||||
"# Let's compare this to what we would just get if we went solely off of similarity,\n",
|
||||
"# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.\n",
|
||||
"example_selector = SemanticSimilarityExampleSelector.from_examples(\n",
|
||||
" # This is the list of examples available to select from.\n",
|
||||
" examples, \n",
|
||||
" # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
|
||||
" OpenAIEmbeddings(), \n",
|
||||
" # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
|
||||
" FAISS, \n",
|
||||
" # This is the number of examples to produce.\n",
|
||||
" k=2\n",
|
||||
")\n",
|
||||
"similar_prompt = FewShotPromptTemplate(\n",
|
||||
" # We provide an ExampleSelector instead of examples.\n",
|
||||
" example_selector=example_selector,\n",
|
||||
@@ -125,7 +136,6 @@
|
||||
" suffix=\"Input: {adjective}\\nOutput:\", \n",
|
||||
" input_variables=[\"adjective\"],\n",
|
||||
")\n",
|
||||
"similar_prompt.example_selector.k = 2\n",
|
||||
"print(similar_prompt.format(adjective=\"worried\"))"
|
||||
]
|
||||
},
|
||||
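The cell above compares `MaxMarginalRelevanceExampleSelector` with `SemanticSimilarityExampleSelector`. The sketch below shows the difference without embeddings or FAISS by running greedy maximal marginal relevance over toy 2-D vectors. At each step it picks the candidate that maximizes λ·sim(query, d) − (1 − λ)·max sim(d, already selected). The vectors and λ = 0.5 are made-up assumptions. LangChain's selector applies the idea to real embeddings, and its defaults may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top_k_similar(query: Vector, candidates: Dict[str, Vector], k: int) -> List[str]:
    """Pure similarity search: the k candidates closest to the query."""
    ranked = sorted(candidates, key=lambda n: cosine(query, candidates[n]), reverse=True)
    return ranked[:k]


def mmr(
    query: Vector, candidates: Dict[str, Vector], k: int, lambda_mult: float = 0.5
) -> List[str]:
    """Greedy maximal marginal relevance: trade relevance against redundancy."""
    selected: List[str] = []
    remaining = list(candidates)

    def score(name: str) -> float:
        relevance = cosine(query, candidates[name])
        redundancy = max(
            (cosine(candidates[name], candidates[s]) for s in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = {
    "close": [1.0, 0.1],
    "near_duplicate": [1.0, 0.2],
    "different": [1.0, -1.0],
}
print(top_k_similar(query, candidates, k=2))  # ['close', 'near_duplicate']
print(mmr(query, candidates, k=2))            # ['close', 'different']
```

With `lambda_mult=1.0` the redundancy term vanishes, and MMR reduces to the plain similarity ranking.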
@@ -154,7 +164,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
==========
|
||||
====================
|
||||
Experimental Modules
|
||||
==========
|
||||
====================
|
||||
|
||||
This module contains experimental modules and reproductions of existing work using LangChain primitives.
|
||||
|
||||
|
||||
64
docs/templates/integration.md
vendored
Normal file
@@ -0,0 +1,64 @@
|
||||
|
||||
[comment: Please see a reference example here: "docs/integrations/arxiv.md"]::
|
||||
[comment: Use this template to create a new .md file in "docs/integrations/"]::
|
||||
|
||||
# Title_REPLACE_ME
|
||||
|
||||
[comment: Only one Title/H1 is allowed!]::
|
||||
|
||||
>
|
||||
|
||||
[comment: Description: After reading this description, a reader should be able to decide whether this integration is worth trying or reading further, OR]::
|
||||
[comment: whether to move on to the next integration doc.]::
|
||||
[comment: The description should include a link to the source for further reading.]::
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
[comment: Installation and Setup: All necessary additional package installations and setup for tokens, etc.]::
|
||||
|
||||
```bash
|
||||
pip install package_name_REPLACE_ME
|
||||
```
|
||||
|
||||
[comment: OR this text:]::
|
||||
There isn't any special setup for it.
|
||||
|
||||
|
||||
[comment: The next H2/## sections are named after the integration modules, like "LLM", "Text Embedding Models", etc.]::
|
||||
[comment: see "Modules" in the "index.html" page]::
|
||||
[comment: Each H2 section should include a link to one or more examples and a Python snippet that imports the integration class]::
|
||||
[comment: Below are several example sections. Remove all unnecessary sections. Add all necessary sections not provided here.]::
|
||||
|
||||
## LLM
|
||||
|
||||
See a [usage example](../modules/models/llms/integrations/INCLUDE_REAL_NAME.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.llms import integration_class_REPLACE_ME
|
||||
```
|
||||
|
||||
|
||||
## Text Embedding Models
|
||||
|
||||
See a [usage example](../modules/models/text_embedding/examples/INCLUDE_REAL_NAME.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.embeddings import integration_class_REPLACE_ME
|
||||
```
|
||||
|
||||
|
||||
## Chat Models
|
||||
|
||||
See a [usage example](../modules/models/chat/integrations/INCLUDE_REAL_NAME.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.chat_models import integration_class_REPLACE_ME
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/INCLUDE_REAL_NAME.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import integration_class_REPLACE_ME
|
||||
```
|
||||
@@ -18,7 +18,7 @@ Markdown code snippet formatted in the following schema:
|
||||
|
||||
```json
|
||||
{{{{
|
||||
"action": string \\ The action to take. Must be one of {tool_names}
|
||||
"action": string, \\ The action to take. Must be one of {tool_names}
|
||||
"action_input": string \\ The input to the action
|
||||
}}}}
|
||||
```
|
||||
|
||||
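The prompt change adds the missing comma after the `"action"` line. Without that comma, a reply that copies the schema is not valid JSON. The sketch below pulls such a snippet out of a markdown reply using only the standard `json` module. `parse_action` and the sample replies are hypothetical; this is not LangChain's output parser.

```python
import json
from typing import Any, Dict

FENCE = "`" * 3  # built at runtime so the example can sit inside a markdown code fence


def parse_action(reply: str) -> Dict[str, Any]:
    """Parse the action JSON, with or without a surrounding json code fence."""
    text = reply.strip()
    opener = FENCE + "json"
    start = text.find(opener)
    if start != -1:
        text = text[start + len(opener):]
        end = text.find(FENCE)
        if end != -1:
            text = text[:end]
    return json.loads(text)


good = FENCE + 'json\n{\n    "action": "Search",\n    "action_input": "weather in SF"\n}\n' + FENCE
bad = good.replace('"Search",', '"Search"')  # same reply without the comma

print(parse_action(good))  # {'action': 'Search', 'action_input': 'weather in SF'}
try:
    parse_action(bad)
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```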
@@ -1,11 +1,14 @@
|
||||
"""Callback handlers that allow listening to events in LangChain."""
|
||||
|
||||
from langchain.callbacks.aim_callback import AimCallbackHandler
|
||||
from langchain.callbacks.argilla_callback import ArgillaCallbackHandler
|
||||
from langchain.callbacks.clearml_callback import ClearMLCallbackHandler
|
||||
from langchain.callbacks.comet_ml_callback import CometCallbackHandler
|
||||
from langchain.callbacks.human import HumanApprovalCallbackHandler
|
||||
from langchain.callbacks.manager import (
|
||||
get_openai_callback,
|
||||
tracing_enabled,
|
||||
wandb_tracing_enabled,
|
||||
)
|
||||
from langchain.callbacks.mlflow_callback import MlflowCallbackHandler
|
||||
from langchain.callbacks.openai_info import OpenAICallbackHandler
|
||||
@@ -15,6 +18,7 @@ from langchain.callbacks.wandb_callback import WandbCallbackHandler
|
||||
from langchain.callbacks.whylabs_callback import WhyLabsCallbackHandler
|
||||
|
||||
__all__ = [
|
||||
"ArgillaCallbackHandler",
|
||||
"OpenAICallbackHandler",
|
||||
"StdOutCallbackHandler",
|
||||
"AimCallbackHandler",
|
||||
@@ -26,4 +30,6 @@ __all__ = [
|
||||
"AsyncIteratorCallbackHandler",
|
||||
"get_openai_callback",
|
||||
"tracing_enabled",
|
||||
"wandb_tracing_enabled",
|
||||
"HumanApprovalCallbackHandler",
|
||||
]
|
||||
|
||||
316
langchain/callbacks/argilla_callback.py
Normal file
@@ -0,0 +1,316 @@
|
||||
import os
|
||||
import warnings
|
||||
from typing import Any, Dict, List, Optional, Union
|
||||
|
||||
from langchain.callbacks.base import BaseCallbackHandler
|
||||
from langchain.schema import AgentAction, AgentFinish, LLMResult
|
||||
|
||||
|
||||
class ArgillaCallbackHandler(BaseCallbackHandler):
|
||||
"""Callback handler that logs records to Argilla.
|
||||
|
||||
Args:
|
||||
dataset_name: name of the `FeedbackDataset` in Argilla. Note that it must
|
||||
exist in advance. If you need help on how to create a `FeedbackDataset` in
|
||||
Argilla, please visit
|
||||
https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html.
|
||||
workspace_name: name of the workspace in Argilla where the specified
|
||||
`FeedbackDataset` lives. Defaults to `None`, which means that the
|
||||
default workspace will be used.
|
||||
api_url: URL of the Argilla Server that we want to use, and where the
|
||||
`FeedbackDataset` lives. Defaults to `None`, which means that either
|
||||
`ARGILLA_API_URL` environment variable or the default http://localhost:6900
|
||||
will be used.
|
||||
api_key: API Key to connect to the Argilla Server. Defaults to `None`, which
|
||||
means that either `ARGILLA_API_KEY` environment variable or the default
|
||||
`argilla.apikey` will be used.
|
||||
|
||||
Raises:
|
||||
ImportError: if the `argilla` package is not installed.
|
||||
ConnectionError: if the connection to Argilla fails.
|
||||
FileNotFoundError: if the `FeedbackDataset` retrieval from Argilla fails.
|
||||
|
||||
Examples:
|
||||
>>> from langchain.llms import OpenAI
|
||||
>>> from langchain.callbacks import ArgillaCallbackHandler
|
||||
>>> argilla_callback = ArgillaCallbackHandler(
|
||||
... dataset_name="my-dataset",
|
||||
... workspace_name="my-workspace",
|
||||
... api_url="http://localhost:6900",
|
||||
... api_key="argilla.apikey",
|
||||
... )
|
||||
>>> llm = OpenAI(
|
||||
... temperature=0,
|
||||
... callbacks=[argilla_callback],
|
||||
... verbose=True,
|
||||
... openai_api_key="API_KEY_HERE",
|
||||
... )
|
||||
>>> llm.generate([
|
||||
... "What is the best NLP-annotation tool out there? (no bias at all)",
|
||||
... ])
|
||||
"Argilla, no doubt about it."
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
dataset_name: str,
|
||||
workspace_name: Optional[str] = None,
|
||||
api_url: Optional[str] = None,
|
||||
api_key: Optional[str] = None,
|
||||
) -> None:
|
||||
"""Initializes the `ArgillaCallbackHandler`.
|
||||
|
||||
Args:
|
||||
dataset_name: name of the `FeedbackDataset` in Argilla. Note that it must
|
||||
exist in advance. If you need help on how to create a `FeedbackDataset`
|
||||
in Argilla, please visit
|
||||
https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html.
|
||||
workspace_name: name of the workspace in Argilla where the specified
|
||||
`FeedbackDataset` lives. Defaults to `None`, which means that the
|
||||
default workspace will be used.
|
||||
api_url: URL of the Argilla Server that we want to use, and where the
|
||||
`FeedbackDataset` lives. Defaults to `None`, which means that either
|
||||
`ARGILLA_API_URL` environment variable or the default
|
||||
http://localhost:6900 will be used.
|
||||
api_key: API Key to connect to the Argilla Server. Defaults to `None`, which
|
||||
means that either `ARGILLA_API_KEY` environment variable or the default
|
||||
`argilla.apikey` will be used.
|
||||
|
||||
Raises:
|
||||
ImportError: if the `argilla` package is not installed.
|
||||
ConnectionError: if the connection to Argilla fails.
|
||||
FileNotFoundError: if the `FeedbackDataset` retrieval from Argilla fails.
|
||||
"""
|
||||
|
||||
super().__init__()
|
||||
|
||||
# Import Argilla (not via `import_argilla` to keep hints in IDEs)
|
||||
try:
|
||||
import argilla as rg # noqa: F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"To use the Argilla callback manager you need to have the `argilla` "
|
||||
"Python package installed. Please install it with `pip install argilla`"
|
||||
)
|
||||
|
||||
# Warn if Argilla will fall back to its default `api_url` or `api_key`
|
||||
if api_url is None and os.getenv("ARGILLA_API_URL") is None:
|
||||
warnings.warn(
|
||||
(
|
||||
"Since `api_url` is None, and the env var `ARGILLA_API_URL` is not"
|
||||
" set, it will default to `http://localhost:6900`."
|
||||
),
|
||||
)
|
||||
if api_key is None and os.getenv("ARGILLA_API_KEY") is None:
|
||||
warnings.warn(
|
||||
(
|
||||
"Since `api_key` is None, and the env var `ARGILLA_API_KEY` is not"
|
||||
" set, it will default to `argilla.apikey`."
|
||||
),
|
||||
)
|
||||
|
||||
# Connect to Argilla with the provided credentials, if applicable
|
||||
try:
|
||||
rg.init(
|
||||
api_key=api_key,
|
||||
api_url=api_url,
|
||||
)
|
||||
except Exception as e:
|
||||
raise ConnectionError(
|
||||
f"Could not connect to Argilla with exception: '{e}'.\n"
|
||||
"Please check your `api_key` and `api_url`, and make sure that "
|
||||
"the Argilla server is up and running. If the problem persists "
|
||||
"please report it to https://github.com/argilla-io/argilla/issues "
|
||||
"with the label `langchain`."
|
||||
) from e
|
||||
|
||||
# Set the Argilla variables
|
||||
self.dataset_name = dataset_name
|
||||
self.workspace_name = workspace_name or rg.get_workspace()
|
||||
|
||||
# Retrieve the `FeedbackDataset` from Argilla (without existing records)
|
||||
try:
|
||||
self.dataset = rg.FeedbackDataset.from_argilla(
|
||||
name=self.dataset_name,
|
||||
workspace=self.workspace_name,
|
||||
with_records=False,
|
||||
)
|
||||
except Exception as e:
|
||||
raise FileNotFoundError(
|
||||
"`FeedbackDataset` retrieval from Argilla failed with exception:"
|
||||
f" '{e}'.\nPlease check that the dataset with"
|
||||
f" name={self.dataset_name} in the"
|
||||
f" workspace={self.workspace_name} exists in advance. If you need help"
|
||||
" on how to create a `langchain`-compatible `FeedbackDataset` in"
|
||||
" Argilla, please visit"
|
||||
" https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html." # noqa: E501
|
||||
" If the problem persists please report it to"
|
||||
" https://github.com/argilla-io/argilla/issues with the label"
|
||||
" `langchain`."
|
||||
) from e
|
||||
|
||||
supported_fields = ["prompt", "response"]
|
||||
if supported_fields != [field.name for field in self.dataset.fields]:
|
||||
raise ValueError(
|
||||
f"`FeedbackDataset` with name={self.dataset_name} in the"
|
||||
f" workspace={self.workspace_name} "
|
||||
"had fields that are not supported yet for the `langchain` integration."
|
||||
" Supported fields are: "
|
||||
f"{supported_fields}, and the current `FeedbackDataset` fields are"
|
||||
f" {[field.name for field in self.dataset.fields]}. "
|
||||
"For more information on how to create a `langchain`-compatible"
|
||||
" `FeedbackDataset` in Argilla, please visit"
|
||||
" https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html." # noqa: E501
|
||||
)
|
||||
|
||||
self.prompts: Dict[str, List[str]] = {}
|
||||
|
||||
warnings.warn(
|
||||
(
|
||||
"The `ArgillaCallbackHandler` is currently in beta and is subject to "
|
||||
"change based on updates to `langchain`. Please report any issues to "
|
||||
"https://github.com/argilla-io/argilla/issues with the tag `langchain`."
|
||||
),
|
||||
)
|
||||
|
||||
def on_llm_start(
|
||||
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
|
||||
) -> None:
|
||||
"""Save the prompts in memory when an LLM starts."""
|
||||
self.prompts.update({str(kwargs["parent_run_id"] or kwargs["run_id"]): prompts})
|
||||
|
||||
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
|
||||
"""Do nothing when a new token is generated."""
|
||||
pass
|
||||
|
||||
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
|
||||
"""Log records to Argilla when an LLM ends."""
|
||||
# Do nothing if there's a parent_run_id, since we will log the records when
|
||||
# the chain ends
|
||||
if kwargs["parent_run_id"]:
|
||||
return
|
||||
|
||||
# Creates the records and adds them to the `FeedbackDataset`
|
||||
prompts = self.prompts[str(kwargs["run_id"])]
|
||||
for prompt, generations in zip(prompts, response.generations):
|
||||
self.dataset.add_records(
|
||||
records=[
|
||||
{
|
||||
"fields": {
|
||||
"prompt": prompt,
|
||||
"response": generation.text.strip(),
|
||||
},
|
||||
}
|
||||
for generation in generations
|
||||
]
|
||||
)
|
||||
|
||||
# Push the records to Argilla
|
||||
self.dataset.push_to_argilla()
|
||||
|
||||
# Pop current run from `self.prompts`
|
||||
self.prompts.pop(str(kwargs["run_id"]))
|
||||
|
||||
def on_llm_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing when LLM outputs an error."""
|
||||
pass
|
||||
|
||||
def on_chain_start(
|
||||
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
|
||||
) -> None:
|
||||
"""Save the prompts in memory when an LLM chain starts."""
|
||||
if "input" in inputs:
|
||||
self.prompts.update(
|
||||
{
|
||||
str(kwargs["parent_run_id"] or kwargs["run_id"]): (
|
||||
inputs["input"]
|
||||
if isinstance(inputs["input"], list)
|
||||
else [inputs["input"]]
|
||||
)
|
||||
}
|
||||
)
|
||||
|
||||
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
|
||||
"""Log records to Argilla when an LLM chain ends."""
|
||||
prompts = self.prompts[str(kwargs["parent_run_id"] or kwargs["run_id"])]
|
||||
if "outputs" in outputs:
|
||||
# Creates the records and adds them to the `FeedbackDataset`
|
||||
self.dataset.add_records(
|
||||
records=[
|
||||
{
|
||||
"fields": {
|
||||
"prompt": prompt,
|
||||
"response": output["text"].strip(),
|
||||
},
|
||||
}
|
||||
for prompt, output in zip(prompts, outputs["outputs"])
|
||||
]
|
||||
)
|
||||
elif "output" in outputs:
|
||||
# Creates the records and adds them to the `FeedbackDataset`
|
||||
self.dataset.add_records(
|
||||
records=[
|
||||
{
|
||||
"fields": {
|
||||
"prompt": " ".join(prompts),
|
||||
"response": outputs["output"].strip(),
|
||||
},
|
||||
}
|
||||
]
|
||||
)
|
||||
else:
|
||||
raise ValueError(
|
||||
"The `outputs` dictionary did not contain the expected keys `outputs` "
|
||||
"or `output`."
|
||||
)
|
||||
|
||||
# Push the records to Argilla
|
||||
self.dataset.push_to_argilla()
|
||||
|
||||
# Pop current run from `self.prompts`
|
||||
self.prompts.pop(str(kwargs["parent_run_id"] or kwargs["run_id"]))
|
||||
|
||||
def on_chain_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing when LLM chain outputs an error."""
|
||||
pass
|
||||
|
||||
def on_tool_start(
|
||||
self,
|
||||
serialized: Dict[str, Any],
|
||||
input_str: str,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
"""Do nothing when tool starts."""
|
||||
pass
|
||||
|
||||
def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
|
||||
"""Do nothing when agent takes a specific action."""
|
||||
pass
|
||||
|
||||
def on_tool_end(
|
||||
self,
|
||||
output: str,
|
||||
observation_prefix: Optional[str] = None,
|
||||
llm_prefix: Optional[str] = None,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
"""Do nothing when tool ends."""
|
||||
pass
|
||||
|
||||
def on_tool_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing when tool outputs an error."""
|
||||
pass
|
||||
|
||||
def on_text(self, text: str, **kwargs: Any) -> None:
|
||||
"""Do nothing when arbitrary text is received."""
|
||||
pass
|
||||
|
||||
def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
|
||||
"""Do nothing when the agent finishes."""
|
||||
pass
|
||||
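`ArgillaCallbackHandler` caches prompts per run ID in `on_llm_start` (or `on_chain_start`). When the run completes, it turns each prompt/response pair into a record whose `fields` match the dataset's `prompt` and `response` fields. The sketch below reproduces only that bookkeeping in plain Python, with lists and dicts standing in for LangChain's `LLMResult` generations and Argilla's `FeedbackDataset`. Nothing here calls Argilla, and `build_llm_records` / `build_chain_records` are hypothetical helpers.

```python
from typing import Any, Dict, List
from uuid import uuid4


def build_llm_records(
    prompts: List[str], generations: List[List[str]]
) -> List[Dict[str, Any]]:
    """One record per generation, paired with the prompt that produced it."""
    return [
        {"fields": {"prompt": prompt, "response": text.strip()}}
        for prompt, texts in zip(prompts, generations)
        for text in texts
    ]


def build_chain_records(prompts: List[str], outputs: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mirror the `outputs` / `output` branches of on_chain_end."""
    if "outputs" in outputs:
        return [
            {"fields": {"prompt": prompt, "response": out["text"].strip()}}
            for prompt, out in zip(prompts, outputs["outputs"])
        ]
    if "output" in outputs:
        return [{"fields": {"prompt": " ".join(prompts), "response": outputs["output"].strip()}}]
    raise ValueError("expected an `outputs` or `output` key")


# Prompts are cached under the run id when the run starts...
cache: Dict[str, List[str]] = {}
run_id = str(uuid4())
cache[run_id] = ["What is Argilla?"]

# ...and popped once the records for that run have been built.
records = build_llm_records(cache.pop(run_id), [[" Sample answer. "]])
print(records)
```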
@@ -188,6 +188,8 @@ class BaseCallbackHandler(
|
||||
):
|
||||
"""Base callback handler that can be used to handle callbacks from langchain."""
|
||||
|
||||
raise_error: bool = False
|
||||
|
||||
@property
|
||||
def ignore_llm(self) -> bool:
|
||||
"""Whether to ignore LLM callbacks."""
|
||||
@@ -208,6 +210,10 @@ class BaseCallbackHandler(
|
||||
"""Whether to ignore chat model callbacks."""
|
||||
return False
|
||||
|
||||
def cleanup(self) -> None:
|
||||
"""Cleanup callback handler."""
|
||||
pass
|
||||
|
||||
|
||||
class AsyncCallbackHandler(BaseCallbackHandler):
|
||||
"""Async callback handler that can be used to handle callbacks from langchain."""
|
||||
@@ -359,6 +365,10 @@ class AsyncCallbackHandler(BaseCallbackHandler):
|
||||
) -> None:
|
||||
"""Run on agent end."""
|
||||
|
||||
async def cleanup(self) -> None:
|
||||
"""Cleanup callback handler."""
|
||||
pass
|
||||
|
||||
|
||||
class BaseCallbackManager(CallbackManagerMixin):
|
||||
"""Base callback manager that can be used to handle callbacks from LangChain."""
|
||||
|
||||
50
langchain/callbacks/human.py
Normal file
@@ -0,0 +1,50 @@
|
||||
from typing import Any, Callable, Dict, Optional
|
||||
from uuid import UUID
|
||||
|
||||
from langchain.callbacks.base import BaseCallbackHandler
|
||||
|
||||
|
||||
def _default_approve(_input: str) -> bool:
|
||||
msg = (
|
||||
"Do you approve of the following input? "
|
||||
"Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no."
|
||||
)
|
||||
msg += "\n\n" + _input + "\n"
|
||||
resp = input(msg)
|
||||
return resp.lower() in ("yes", "y")
|
||||
|
||||
|
||||
def _default_true(_: Dict[str, Any]) -> bool:
|
||||
return True
|
||||
|
||||
|
||||
class HumanRejectedException(Exception):
|
||||
"""Exception to raise when a person manually reviews and rejects a value."""
|
||||
|
||||
|
||||
class HumanApprovalCallbackHandler(BaseCallbackHandler):
|
||||
"""Callback for manually validating values."""
|
||||
|
||||
raise_error: bool = True
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
approve: Callable[[Any], bool] = _default_approve,
|
||||
should_check: Callable[[Dict[str, Any]], bool] = _default_true,
|
||||
):
|
||||
self._approve = approve
|
||||
self._should_check = should_check
|
||||
|
||||
def on_tool_start(
|
||||
self,
|
||||
serialized: Dict[str, Any],
|
||||
input_str: str,
|
||||
*,
|
||||
run_id: UUID,
|
||||
parent_run_id: Optional[UUID] = None,
|
||||
**kwargs: Any,
|
||||
) -> Any:
|
||||
if self._should_check(serialized) and not self._approve(input_str):
|
||||
raise HumanRejectedException(
|
||||
f"Inputs {input_str} to tool {serialized} were rejected."
|
||||
)
|
||||
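`HumanApprovalCallbackHandler` takes two callables. `should_check(serialized)` decides whether a tool call needs review, and `approve(input_str)` decides whether the call may proceed. A rejection raises `HumanRejectedException`, and because the handler sets `raise_error = True`, that exception propagates instead of being logged. The sketch below shows custom callables you might pass in. It assumes `serialized` carries the tool's `name`, and it uses a hypothetical allow-list of read-only shell commands. Both functions are plain Python, so they can be tested without LangChain.

```python
import shlex
from typing import Any, Dict

# Hypothetical policy: only the "terminal" tool is reviewed, and only a few
# read-only commands without shell metacharacters are approved automatically.
REVIEWED_TOOLS = {"terminal"}
READ_ONLY_COMMANDS = {"ls", "pwd", "whoami"}
SHELL_METACHARACTERS = set(";&|<>`$")


def should_check(serialized: Dict[str, Any]) -> bool:
    """Review a tool call only when the tool's name is on the reviewed list."""
    return serialized.get("name") in REVIEWED_TOOLS


def approve(input_str: str) -> bool:
    """Approve allow-listed commands; reject anything that chains or redirects."""
    if SHELL_METACHARACTERS & set(input_str):
        return False
    try:
        words = shlex.split(input_str)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(words) and words[0] in READ_ONLY_COMMANDS


# Wiring it up (requires langchain; shown for orientation only):
# from langchain.callbacks import HumanApprovalCallbackHandler
# handler = HumanApprovalCallbackHandler(approve=approve, should_check=should_check)
# A rejected call then raises HumanRejectedException from on_tool_start.

print(approve("ls -la"), approve("ls && rm -rf /"))  # True False
```

A stricter variant could fall back to an interactive prompt, like the default `_default_approve`, instead of rejecting everything outside the allow-list.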
@@ -23,8 +23,8 @@ from langchain.callbacks.openai_info import OpenAICallbackHandler
|
||||
from langchain.callbacks.stdout import StdOutCallbackHandler
|
||||
from langchain.callbacks.tracers.langchain import LangChainTracer
|
||||
from langchain.callbacks.tracers.langchain_v1 import LangChainTracerV1, TracerSessionV1
|
||||
from langchain.callbacks.tracers.schemas import TracerSession
|
||||
from langchain.callbacks.tracers.stdout import ConsoleCallbackHandler
|
||||
from langchain.callbacks.tracers.wandb import WandbTracer
|
||||
from langchain.schema import (
|
||||
AgentAction,
|
||||
AgentFinish,
|
||||
@@ -44,6 +44,12 @@ tracing_callback_var: ContextVar[
|
||||
] = ContextVar( # noqa: E501
|
||||
"tracing_callback", default=None
|
||||
)
|
||||
wandb_tracing_callback_var: ContextVar[
|
||||
Optional[WandbTracer]
|
||||
] = ContextVar( # noqa: E501
|
||||
"tracing_wandb_callback", default=None
|
||||
)
|
||||
|
||||
tracing_v2_callback_var: ContextVar[
|
||||
Optional[LangChainTracer]
|
||||
] = ContextVar( # noqa: E501
|
||||
@@ -76,31 +82,37 @@ def tracing_enabled(
tracing_callback_var.set(None)


@contextmanager
def wandb_tracing_enabled(
session_name: str = "default",
) -> Generator[None, None, None]:
"""Get WandbTracer in a context manager."""
cb = WandbTracer()
wandb_tracing_callback_var.set(cb)
yield None
wandb_tracing_callback_var.set(None)


@contextmanager
def tracing_v2_enabled(
session_name: Optional[str] = None,
*,
example_id: Optional[Union[str, UUID]] = None,
tenant_id: Optional[str] = None,
session_extra: Optional[Dict[str, Any]] = None,
) -> Generator[TracerSession, None, None]:
) -> Generator[None, None, None]:
"""Get the experimental tracer handler in a context manager."""
# Issue a warning that this is experimental
warnings.warn(
"The experimental tracing v2 is in development. "
"The tracing v2 API is in development. "
"This is not yet stable and may change in the future."
)
if isinstance(example_id, str):
example_id = UUID(example_id)
cb = LangChainTracer(
tenant_id=tenant_id,
session_name=session_name,
example_id=example_id,
session_extra=session_extra,
session_name=session_name,
)
session = cb.ensure_session()
tracing_v2_callback_var.set(cb)
yield session
yield
tracing_v2_callback_var.set(None)


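Both context managers above follow the same pattern. Each one sets a module-level `ContextVar` to a fresh tracer, yields, and then sets the variable back to `None`. As written, an exception raised inside the `with` block skips that reset line.

The sketch below shows one alternative that restores the previous value with the token returned by `ContextVar.set`, inside `try`/`finally`. `DummyTracer`, `tracer_var`, and `tracer_enabled` are hypothetical stand-ins for `WandbTracer` and its context variable.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional


class DummyTracer:
    """Hypothetical placeholder for a tracer such as WandbTracer."""


tracer_var: ContextVar[Optional[DummyTracer]] = ContextVar("tracer", default=None)


@contextmanager
def tracer_enabled() -> Iterator[DummyTracer]:
    cb = DummyTracer()
    token = tracer_var.set(cb)
    try:
        yield cb
    finally:
        # Runs on normal exit and when the with-body raises, restoring
        # whatever value the variable held before.
        tracer_var.reset(token)
```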
@@ -135,6 +147,9 @@ def _handle_event(
else:
logger.warning(f"Error in {event_name} callback: {e}")
except Exception as e:
handler.cleanup()
if handler.raise_error:
raise e
logging.warning(f"Error in {event_name} callback: {e}")


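In `_handle_event`, a failing handler first gets a chance to clean up. The error is then re-raised when `handler.raise_error` is set, and logged otherwise. The sketch below reproduces that flow with a hypothetical `Handler` class and `handle_event` function. It uses a bare `raise`, which re-raises the exception currently being handled, in place of `raise e`.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


class Handler:
    """Hypothetical handler exposing the attributes the hunk relies on."""

    def __init__(self, raise_error: bool) -> None:
        self.raise_error = raise_error
        self.cleaned_up = False

    def on_event(self, payload: Any) -> None:
        # Always fails, to exercise the error path.
        raise ValueError(f"bad payload: {payload}")

    def cleanup(self) -> None:
        self.cleaned_up = True


def handle_event(handlers: List[Handler], payload: Any) -> None:
    for handler in handlers:
        try:
            handler.on_event(payload)
        except Exception as e:
            # Let the handler release resources before deciding what to do.
            handler.cleanup()
            if handler.raise_error:
                raise
            logger.warning(f"Error in on_event callback: {e}")
```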
@@ -169,6 +184,12 @@ async def _ahandle_event_for_handler(
else:
logger.warning(f"Error in {event_name} callback: {e}")
except Exception as e:
if asyncio.iscoroutinefunction(handler.cleanup):
await handler.cleanup()
else:
await asyncio.get_event_loop().run_in_executor(None, handler.cleanup)
if handler.raise_error:
raise e
logger.warning(f"Error in {event_name} callback: {e}")


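The async variant has to handle `cleanup` being either a coroutine function or a plain blocking function. The sketch below shows that dispatch with a hypothetical `run_cleanup` helper. It calls `asyncio.get_running_loop()` rather than `get_event_loop()`, because it always runs inside a coroutine where a loop is already active.

```python
import asyncio
from typing import Any, Callable


async def run_cleanup(cleanup: Callable[[], Any]) -> None:
    if asyncio.iscoroutinefunction(cleanup):
        # Async cleanup: await it directly.
        await cleanup()
    else:
        # Blocking cleanup: run it in the default thread pool so the
        # event loop is not blocked while it executes.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, cleanup)
```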
@@ -831,12 +852,17 @@ def _configure(
callback_manager.add_handler(handler, False)

tracer = tracing_callback_var.get()
wandb_tracer = wandb_tracing_callback_var.get()
open_ai = openai_callback_var.get()
tracing_enabled_ = (
os.environ.get("LANGCHAIN_TRACING") is not None
or tracer is not None
or os.environ.get("LANGCHAIN_HANDLER") is not None
)
wandb_tracing_enabled_ = (
os.environ.get("LANGCHAIN_WANDB_TRACING") is not None
or wandb_tracer is not None
)

tracer_v2 = tracing_v2_callback_var.get()
tracing_v2_enabled_ = (
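`_configure` treats W&B tracing as enabled in two cases: the `LANGCHAIN_WANDB_TRACING` environment variable is present, or a tracer has been placed in `wandb_tracing_callback_var`. Because the check is `is not None`, any value of the variable turns tracing on, including "false". The sketch below reproduces the same check with a hypothetical `wandb_var` and `wandb_tracing_enabled_flag`.

```python
import os
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-in for wandb_tracing_callback_var.
wandb_var: ContextVar[Optional[object]] = ContextVar("wandb_tracer", default=None)


def wandb_tracing_enabled_flag() -> bool:
    # Only the presence of the environment variable is checked, not its value.
    return (
        os.environ.get("LANGCHAIN_WANDB_TRACING") is not None
        or wandb_var.get() is not None
    )
```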
@@ -851,6 +877,7 @@ def _configure(
or debug
or tracing_enabled_
or tracing_v2_enabled_
or wandb_tracing_enabled_
or open_ai is not None
):
if verbose and not any(
@@ -876,6 +903,14 @@ def _configure(
handler = LangChainTracerV1()
handler.load_session(tracer_session)
callback_manager.add_handler(handler, True)
if wandb_tracing_enabled_ and not any(
isinstance(handler, WandbTracer) for handler in callback_manager.handlers
):
if wandb_tracer:
callback_manager.add_handler(wandb_tracer, True)
else:
handler = WandbTracer()
callback_manager.add_handler(handler, True)
if tracing_v2_enabled_ and not any(
isinstance(handler, LangChainTracer)
for handler in callback_manager.handlers
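The W&B branch above adds a tracer only when no handler of that type is registered yet. It prefers the instance stored in the context variable and otherwise creates a new one. The sketch below uses hypothetical `Tracer`, `Manager`, and `ensure_tracer` stand-ins. It tests `existing is not None` where the hunk tests truthiness (`if wandb_tracer:`); the two behave the same for ordinary objects.

```python
from typing import List, Optional


class Tracer:
    """Hypothetical tracer type standing in for WandbTracer."""


class Manager:
    """Hypothetical minimal callback manager with an inheritable flag."""

    def __init__(self) -> None:
        self.handlers: List[object] = []
        self.inheritable: List[object] = []

    def add_handler(self, handler: object, inherit: bool = True) -> None:
        self.handlers.append(handler)
        if inherit:
            self.inheritable.append(handler)


def ensure_tracer(manager: Manager, existing: Optional[Tracer]) -> None:
    # Skip entirely if a tracer of this type is already registered.
    if any(isinstance(h, Tracer) for h in manager.handlers):
        return
    # Reuse the context-scoped instance when present, else build a new one.
    manager.add_handler(existing if existing is not None else Tracer(), True)
```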
@@ -885,7 +920,6 @@ def _configure(
else:
try:
handler = LangChainTracer(session_name=tracer_session)
handler.ensure_session()
callback_manager.add_handler(handler, True)
except Exception as e:
logger.warning(

@@ -3,5 +3,11 @@
from langchain.callbacks.tracers.langchain import LangChainTracer
from langchain.callbacks.tracers.langchain_v1 import LangChainTracerV1
from langchain.callbacks.tracers.stdout import ConsoleCallbackHandler
from langchain.callbacks.tracers.wandb import WandbTracer

__all__ = ["LangChainTracer", "LangChainTracerV1", "ConsoleCallbackHandler"]
__all__ = [
"LangChainTracer",
"LangChainTracerV1",
"ConsoleCallbackHandler",
"WandbTracer",
]

@@ -25,10 +25,8 @@ from langchain.callbacks.tracers.schemas import (
RunTypeEnum,
RunUpdate,
TracerSession,
TracerSessionCreate,
)
from langchain.schema import BaseMessage, messages_to_dict
from langchain.utils import raise_for_status_with_text

logger = logging.getLogger(__name__)

@@ -65,49 +63,13 @@ retry_decorator = retry(
)


@retry_decorator
def _get_tenant_id(
tenant_id: Optional[str], endpoint: Optional[str], headers: Optional[dict]
) -> str:
"""Get the tenant ID for the LangChain API."""
tenant_id_: Optional[str] = tenant_id or os.getenv("LANGCHAIN_TENANT_ID")
if tenant_id_:
return tenant_id_
endpoint_ = endpoint or get_endpoint()
headers_ = headers or get_headers()
response = None
try:
response = requests.get(endpoint_ + "/tenants", headers=headers_)
raise_for_status_with_text(response)
except HTTPError as e:
if response is not None and response.status_code == 500:
raise LangChainTracerAPIError(
f"Failed to get tenant ID from LangChain API. {e}"
)
else:
raise LangChainTracerUserError(
f"Failed to get tenant ID from LangChain API. {e}"
)
except Exception as e:
raise LangChainTracerError(
f"Failed to get tenant ID from LangChain API. {e}"
) from e

tenants: List[Dict[str, Any]] = response.json()
if not tenants:
raise ValueError(f"No tenants found for URL {endpoint_}")
return tenants[0]["id"]


class LangChainTracer(BaseTracer):
"""An implementation of the SharedTracer that POSTS to the langchain endpoint."""

def __init__(
self,
tenant_id: Optional[str] = None,
example_id: Optional[UUID] = None,
session_name: Optional[str] = None,
session_extra: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> None:
"""Initialize the LangChain tracer."""
@@ -115,10 +77,8 @@ class LangChainTracer(BaseTracer):
self.session: Optional[TracerSession] = None
self._endpoint = get_endpoint()
self._headers = get_headers()
self.tenant_id = tenant_id
self.example_id = example_id
self.session_name = session_name or os.getenv("LANGCHAIN_SESSION", "default")
self.session_extra = session_extra
# set max_workers to 1 to process tasks in order
self.executor = ThreadPoolExecutor(max_workers=1)

@@ -149,62 +109,20 @@ class LangChainTracer(BaseTracer):
self._start_trace(chat_model_run)
self._on_chat_model_start(chat_model_run)

def ensure_tenant_id(self) -> str:
"""Load or use the tenant ID."""
tenant_id = self.tenant_id or _get_tenant_id(
self.tenant_id, self._endpoint, self._headers
)
self.tenant_id = tenant_id
return tenant_id

@retry_decorator
def ensure_session(self) -> TracerSession:
"""Upsert a session."""
if self.session is not None:
return self.session
tenant_id = self.ensure_tenant_id()
url = f"{self._endpoint}/sessions?upsert=true"
session_create = TracerSessionCreate(
name=self.session_name, extra=self.session_extra, tenant_id=tenant_id
)
response = None
try:
response = requests.post(
url,
data=session_create.json(),
headers=self._headers,
)
response.raise_for_status()
except HTTPError as e:
if response is not None and response.status_code == 500:
raise LangChainTracerAPIError(
f"Failed to upsert session to LangChain API. {e}"
)
else:
raise LangChainTracerUserError(
f"Failed to upsert session to LangChain API. {e}"
)
except Exception as e:
raise LangChainTracerError(
f"Failed to upsert session to LangChain API. {e}"
) from e
self.session = TracerSession(**response.json())
return self.session

def _persist_run(self, run: Run) -> None:
"""Persist a run."""
"""The Langchain Tracer uses Post/Patch rather than persist."""

@retry_decorator
def _persist_run_single(self, run: Run) -> None:
"""Persist a run."""
session = self.ensure_session()
if run.parent_run_id is None:
run.reference_example_id = self.example_id
run_dict = run.dict()
del run_dict["child_runs"]
run_create = RunCreate(**run_dict, session_id=session.id)
run_create = RunCreate(**run_dict, session_name=self.session_name)
response = None
try:
# TODO: Add retries when async
response = requests.post(
f"{self._endpoint}/runs",
data=run_create.json(),

@@ -36,12 +36,6 @@ class TracerSessionBase(TracerSessionV1Base):
tenant_id: UUID


class TracerSessionCreate(TracerSessionBase):
"""A creation class for TracerSession."""

id: Optional[UUID]


class TracerSession(TracerSessionBase):
"""TracerSessionV1 schema for the V2 API."""

@@ -136,7 +130,7 @@ class Run(RunBase):

class RunCreate(RunBase):
name: str
session_id: UUID
session_name: Optional[str] = None

@root_validator(pre=True)
def add_runtime_env(cls, values: Dict[str, Any]) -> Dict[str, Any]:

265
langchain/callbacks/tracers/wandb.py
Normal file
@@ -0,0 +1,265 @@
"""A Tracer Implementation that records activity to Weights & Biases."""
from __future__ import annotations

from typing import (
TYPE_CHECKING,
Any,
Dict,
List,
Optional,
Sequence,
TypedDict,
Union,
)

from langchain.callbacks.tracers.base import BaseTracer
from langchain.callbacks.tracers.schemas import Run, RunTypeEnum

if TYPE_CHECKING:
from wandb import Settings as WBSettings
from wandb.sdk.data_types import trace_tree
from wandb.sdk.lib.paths import StrPath
from wandb.wandb_run import Run as WBRun


PRINT_WARNINGS = True


def _convert_lc_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
if run.run_type == RunTypeEnum.llm:
return _convert_llm_run_to_wb_span(trace_tree, run)
elif run.run_type == RunTypeEnum.chain:
return _convert_chain_run_to_wb_span(trace_tree, run)
elif run.run_type == RunTypeEnum.tool:
return _convert_tool_run_to_wb_span(trace_tree, run)
else:
return _convert_run_to_wb_span(trace_tree, run)


def _convert_llm_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
base_span = _convert_run_to_wb_span(trace_tree, run)

base_span.results = [
trace_tree.Result(
inputs={"prompt": prompt},
outputs={
f"gen_{g_i}": gen["text"]
for g_i, gen in enumerate(run.outputs["generations"][ndx])
}
if (
run.outputs is not None
and len(run.outputs["generations"]) > ndx
and len(run.outputs["generations"][ndx]) > 0
)
else None,
)
for ndx, prompt in enumerate(run.inputs["prompts"] or [])
]
base_span.span_kind = trace_tree.SpanKind.LLM

return base_span


def _convert_chain_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
base_span = _convert_run_to_wb_span(trace_tree, run)

base_span.results = [trace_tree.Result(inputs=run.inputs, outputs=run.outputs)]
base_span.child_spans = [
_convert_lc_run_to_wb_span(trace_tree, child_run)
for child_run in run.child_runs
]
base_span.span_kind = (
trace_tree.SpanKind.AGENT
if "agent" in run.serialized.get("name", "").lower()
else trace_tree.SpanKind.CHAIN
)

return base_span


def _convert_tool_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
base_span = _convert_run_to_wb_span(trace_tree, run)
base_span.results = [trace_tree.Result(inputs=run.inputs, outputs=run.outputs)]
base_span.child_spans = [
_convert_lc_run_to_wb_span(trace_tree, child_run)
for child_run in run.child_runs
]
base_span.span_kind = trace_tree.SpanKind.TOOL

return base_span


def _convert_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
attributes = {**run.extra} if run.extra else {}
attributes["execution_order"] = run.execution_order

return trace_tree.Span(
span_id=str(run.id) if run.id is not None else None,
name=run.serialized.get("name"),
start_time_ms=int(run.start_time.timestamp() * 1000),
end_time_ms=int(run.end_time.timestamp() * 1000),
status_code=trace_tree.StatusCode.SUCCESS
if run.error is None
else trace_tree.StatusCode.ERROR,
status_message=run.error,
attributes=attributes,
)


def _replace_type_with_kind(data: Any) -> Any:
if isinstance(data, dict):
# W&B TraceTree expects "_kind" instead of "_type" since `_type` is special
# in W&B.
if "_type" in data:
_type = data.pop("_type")
data["_kind"] = _type
return {k: _replace_type_with_kind(v) for k, v in data.items()}
elif isinstance(data, list):
return [_replace_type_with_kind(v) for v in data]
elif isinstance(data, tuple):
return tuple(_replace_type_with_kind(v) for v in data)
elif isinstance(data, set):
return {_replace_type_with_kind(v) for v in data}
else:
return data


class WandbRunArgs(TypedDict):
job_type: Optional[str]
dir: Optional[StrPath]
config: Union[Dict, str, None]
project: Optional[str]
entity: Optional[str]
reinit: Optional[bool]
tags: Optional[Sequence]
group: Optional[str]
name: Optional[str]
notes: Optional[str]
magic: Optional[Union[dict, str, bool]]
config_exclude_keys: Optional[List[str]]
config_include_keys: Optional[List[str]]
anonymous: Optional[str]
mode: Optional[str]
allow_val_change: Optional[bool]
resume: Optional[Union[bool, str]]
force: Optional[bool]
tensorboard: Optional[bool]
sync_tensorboard: Optional[bool]
monitor_gym: Optional[bool]
save_code: Optional[bool]
id: Optional[str]
settings: Union[WBSettings, Dict[str, Any], None]


class WandbTracer(BaseTracer):
"""Callback Handler that logs to Weights and Biases.

This handler will log the model architecture and run traces to Weights and Biases.
This will ensure that all LangChain activity is logged to W&B.
"""

_run: Optional[WBRun] = None
_run_args: Optional[WandbRunArgs] = None

def __init__(self, run_args: Optional[WandbRunArgs] = None, **kwargs: Any) -> None:
"""Initializes the WandbTracer.

Parameters:
run_args: (dict, optional) Arguments to pass to `wandb.init()`. If not
provided, `wandb.init()` will be called with only
`settings={"silent": True}`. Please refer to the `wandb.init`
documentation for more details.

To use W&B to monitor all LangChain activity, add this tracer like any other
LangChain callback:
```
from wandb.integration.langchain import WandbTracer

tracer = WandbTracer()
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[tracer])
# ...end of notebook / script:
tracer.finish()
```
"""
super().__init__(**kwargs)
try:
import wandb
from wandb.sdk.data_types import trace_tree
except ImportError as e:
raise ImportError(
"Could not import wandb python package. "
"Please install it with `pip install wandb`."
) from e
self._wandb = wandb
self._trace_tree = trace_tree
self._run_args = run_args
self._ensure_run(should_print_url=(wandb.run is None))

def finish(self) -> None:
"""Waits for all asynchronous processes to finish and data to upload.

Proxy for `wandb.finish()`.
"""
self._wandb.finish()

def _log_trace_from_run(self, run: Run) -> None:
"""Logs a LangChain Run to W&B as a W&B Trace."""
self._ensure_run()

try:
root_span = _convert_lc_run_to_wb_span(self._trace_tree, run)
except Exception as e:
if PRINT_WARNINGS:
self._wandb.termwarn(
f"Skipping trace saving - unable to safely convert LangChain Run "
f"into W&B Trace due to: {e}"
)
return

model_dict = None

# TODO: Add something like this once we have a way to get the clean serialized
# parent dict from a run:
# serialized_parent = safely_get_span_producing_model(run)
# if serialized_parent is not None:
# model_dict = safely_convert_model_to_dict(serialized_parent)

model_trace = self._trace_tree.WBTraceTree(
root_span=root_span,
model_dict=model_dict,
)
if self._wandb.run is not None:
self._wandb.run.log({"langchain_trace": model_trace})

def _ensure_run(self, should_print_url: bool = False) -> None:
"""Ensures an active W&B run exists.

If not, will start a new run with the provided run_args.
"""
if self._wandb.run is None:
# Make a shallow copy of the run args, so we don't modify the original
run_args = self._run_args or {} # type: ignore
run_args: dict = {**run_args} # type: ignore

# Prefer to run in silent mode since W&B has a lot of output
# which can be undesirable when dealing with text-based models.
if "settings" not in run_args: # type: ignore
run_args["settings"] = {"silent": True} # type: ignore

# Start the run
self._wandb.init(**run_args)
if self._wandb.run is not None:
if should_print_url:
run_url = self._wandb.run.settings.run_url
self._wandb.termlog(
f"Streaming LangChain activity to W&B at {run_url}\n"
"`WandbTracer` is currently in beta.\n"
"Please report any issues to "
"https://github.com/wandb/wandb/issues with the tag "
"`langchain`."
)

self._wandb.run._label(repo="langchain")

def _persist_run(self, run: "Run") -> None:
"""Persist a run."""
self._log_trace_from_run(run)
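`_replace_type_with_kind` in the file above renames `_type` keys to `_kind` recursively. It does this by popping `_type` from the caller's dict, so its input is mutated in place. The sketch below is a copy-only variant. The name `replace_type_with_kind` is hypothetical and used for illustration only; it is not part of the file.

```python
from typing import Any


def replace_type_with_kind(data: Any) -> Any:
    """Return a copy of `data` with every "_type" key renamed to "_kind"."""
    if isinstance(data, dict):
        # Build a new dict instead of popping from the input.
        return {
            ("_kind" if k == "_type" else k): replace_type_with_kind(v)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [replace_type_with_kind(v) for v in data]
    if isinstance(data, tuple):
        return tuple(replace_type_with_kind(v) for v in data)
    if isinstance(data, set):
        return {replace_type_with_kind(v) for v in data}
    return data
```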
@@ -200,9 +200,9 @@ class WandbCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
notes=self.notes,
)
warning = (
"The wandb callback is currently in beta and is subject to change "
"based on updates to `langchain`. Please report any issues to "
"https://github.com/wandb/wandb/issues with the tag `langchain`. "
"DEPRECATION: The `WandbCallbackHandler` will soon be deprecated in favor "
"of the `WandbTracer`. Please update your code to use the `WandbTracer` "
"instead."
)
wandb.termwarn(
warning,

@@ -33,6 +33,7 @@ class StructuredQueryOutputParser(BaseOutputParser[StructuredQuery]):
def parse(self, text: str) -> StructuredQuery:
try:
expected_keys = ["query", "filter"]
allowed_keys = ["query", "filter", "limit"]
parsed = parse_and_check_json_markdown(text, expected_keys)
if len(parsed["query"]) == 0:
parsed["query"] = " "
@@ -40,10 +41,10 @@ class StructuredQueryOutputParser(BaseOutputParser[StructuredQuery]):
parsed["filter"] = None
else:
parsed["filter"] = self.ast_parse(parsed["filter"])
if not parsed.get("limit"):
parsed.pop("limit", None)
return StructuredQuery(
query=parsed["query"],
filter=parsed["filter"],
limit=parsed.get("limit"),
**{k: v for k, v in parsed.items() if k in allowed_keys}
)
except Exception as e:
raise OutputParserException(

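After this change, `parse` does three things to the parsed JSON: it forwards only keys listed in `allowed_keys`, replaces an empty query with a single space, and drops a falsy `limit`. The sketch below shows that post-processing as a hypothetical `normalize_parsed` helper that works on the already-parsed dict. Filter parsing is left out.

```python
from typing import Any, Dict

ALLOWED_KEYS = ["query", "filter", "limit"]


def normalize_parsed(parsed: Dict[str, Any]) -> Dict[str, Any]:
    out = dict(parsed)  # shallow copy; the caller's dict is left unchanged
    if len(out["query"]) == 0:
        # An empty query is replaced by a single space, as in `parse`.
        out["query"] = " "
    if not out.get("limit"):
        # A missing, null, or zero limit is removed entirely.
        out.pop("limit", None)
    # Keep only the keys StructuredQuery is allowed to receive.
    return {k: v for k, v in out.items() if k in ALLOWED_KEYS}
```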
@@ -3,7 +3,7 @@ from __future__ import annotations

from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, List, Optional, Sequence
from typing import Any, List, Optional, Sequence, Union

from pydantic import BaseModel

@@ -14,6 +14,20 @@ class Visitor(ABC):
allowed_comparators: Optional[Sequence[Comparator]] = None
allowed_operators: Optional[Sequence[Operator]] = None

def _validate_func(self, func: Union[Operator, Comparator]) -> None:
if isinstance(func, Operator) and self.allowed_operators is not None:
if func not in self.allowed_operators:
raise ValueError(
f"Received disallowed operator {func}. Allowed "
f"operators are {self.allowed_operators}"
)
if isinstance(func, Comparator) and self.allowed_comparators is not None:
if func not in self.allowed_comparators:
raise ValueError(
f"Received disallowed comparator {func}. Allowed "
f"comparators are {self.allowed_comparators}"
)

@abstractmethod
def visit_operation(self, operation: Operation) -> Any:
"""Translate an Operation."""

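`_validate_func` treats `None` as "no restriction". Otherwise it checks membership in the relevant allowlist, depending on whether `func` is an `Operator` or a `Comparator`. The sketch below is self-contained and uses simplified stand-in enums. The member names and values are assumptions, not the library's full definitions.

```python
from enum import Enum
from typing import Optional, Sequence, Union


class Operator(str, Enum):
    """Simplified stand-in for the query-constructor Operator enum."""

    AND = "and"
    OR = "or"


class Comparator(str, Enum):
    """Simplified stand-in for the query-constructor Comparator enum."""

    EQ = "eq"
    GT = "gt"


def validate_func(
    func: Union[Operator, Comparator],
    allowed_operators: Optional[Sequence[Operator]] = None,
    allowed_comparators: Optional[Sequence[Comparator]] = None,
) -> None:
    # None means "no restriction"; a sequence acts as an allowlist.
    if isinstance(func, Operator) and allowed_operators is not None:
        if func not in allowed_operators:
            raise ValueError(
                f"Received disallowed operator {func}. "
                f"Allowed operators are {allowed_operators}"
            )
    if isinstance(func, Comparator) and allowed_comparators is not None:
        if func not in allowed_comparators:
            raise ValueError(
                f"Received disallowed comparator {func}. "
                f"Allowed comparators are {allowed_comparators}"
            )
```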
@@ -28,6 +28,7 @@ class ChatAnthropic(BaseChatModel, _AnthropicCommon):

Example:
.. code-block:: python

import anthropic
from langchain.chat_models import ChatAnthropic
model = ChatAnthropic(model="<model_name>", anthropic_api_key="my-api-key")

@@ -30,6 +30,7 @@ class AzureChatOpenAI(ChatOpenAI):
`35-turbo-dev`, the constructor should look like:

.. code-block:: python

AzureChatOpenAI(
deployment_name="35-turbo-dev",
openai_api_version="2023-03-15-preview",

@@ -20,6 +20,7 @@ class PromptLayerChatOpenAI(ChatOpenAI):

All parameters that can be passed to the OpenAI LLM can also
be passed here. The PromptLayerChatOpenAI adds two optional

parameters:
``pl_tags``: List of strings to tag the request with.
``return_pl_id``: If True, the PromptLayer request ID will be

@@ -10,8 +10,9 @@ from typing import (
Callable,
Dict,
Iterator,
List,
Mapping,
Optional,
Sequence,
Tuple,
Union,
)
@@ -24,14 +25,23 @@ from requests import Response
from tenacity import retry, stop_after_attempt, wait_fixed

from langchain.base_language import BaseLanguageModel
from langchain.callbacks.tracers.schemas import Run, TracerSession
from langchain.callbacks.tracers.schemas import Run as TracerRun
from langchain.callbacks.tracers.schemas import TracerSession
from langchain.chains.base import Chain
from langchain.client.models import (
APIFeedbackSource,
Dataset,
DatasetCreate,
Example,
ExampleCreate,
ExampleUpdate,
Feedback,
FeedbackCreate,
FeedbackSourceBase,
FeedbackSourceType,
ListFeedbackQueryParams,
ListRunsQueryParams,
ModelFeedbackSource,
)
from langchain.client.runner_utils import arun_on_examples, run_on_examples
from langchain.utils import raise_for_status_with_text, xor_args
@@ -44,6 +54,10 @@ logger = logging.getLogger(__name__)
MODEL_OR_CHAIN_FACTORY = Union[Callable[[], Chain], BaseLanguageModel]


class Run(TracerRun):
id: UUID


def _get_link_stem(url: str) -> str:
scheme = urlsplit(url).scheme
netloc_prefix = urlsplit(url).netloc.split(":")[0]
@@ -65,7 +79,6 @@ class LangChainPlusClient(BaseSettings):

api_key: Optional[str] = Field(default=None, env="LANGCHAIN_API_KEY")
api_url: str = Field(default="http://localhost:1984", env="LANGCHAIN_ENDPOINT")
tenant_id: Optional[str] = None

@root_validator(pre=True)
def validate_api_key_if_hosted(cls, values: Dict[str, Any]) -> Dict[str, Any]:
@@ -77,31 +90,8 @@ class LangChainPlusClient(BaseSettings):
raise ValueError(
"API key must be provided when using hosted LangChain+ API"
)
tenant_id = values.get("tenant_id")
if not tenant_id:
values["tenant_id"] = LangChainPlusClient._get_seeded_tenant_id(
api_url, api_key
)
return values

@staticmethod
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
def _get_seeded_tenant_id(api_url: str, api_key: Optional[str]) -> str:
"""Get the tenant ID from the seeded tenant."""
url = f"{api_url}/tenants"
headers = {"x-api-key": api_key} if api_key else {}
response = requests.get(url, headers=headers)
try:
raise_for_status_with_text(response)
except Exception as e:
raise ValueError(
"Unable to get default tenant ID. Please manually provide."
) from e
results: List[dict] = response.json()
if len(results) == 0:
raise ValueError("No seeded tenant found")
return results[0]["id"]

@staticmethod
def _get_session_name(
session_name: Optional[str],
@@ -139,18 +129,10 @@ class LangChainPlusClient(BaseSettings):
headers["x-api-key"] = self.api_key
return headers

@property
def query_params(self) -> Dict[str, Any]:
"""Get the query parameters for the API request."""
return {"tenant_id": self.tenant_id}

def _get(self, path: str, params: Optional[Dict[str, Any]] = None) -> Response:
"""Make a GET request."""
query_params = self.query_params
if params:
query_params.update(params)
return requests.get(
f"{self.api_url}{path}", headers=self._headers, params=query_params
f"{self.api_url}{path}", headers=self._headers, params=params
)

def upload_dataframe(
@@ -158,8 +140,8 @@ class LangChainPlusClient(BaseSettings):
df: pd.DataFrame,
name: str,
description: str,
input_keys: List[str],
output_keys: List[str],
input_keys: Sequence[str],
output_keys: Sequence[str],
) -> Dataset:
"""Upload a dataframe as individual examples to the LangChain+ API."""
dataset = self.create_dataset(dataset_name=name, description=description)
@@ -173,8 +155,8 @@ class LangChainPlusClient(BaseSettings):
self,
csv_file: Union[str, Tuple[str, BytesIO]],
description: str,
input_keys: List[str],
output_keys: List[str],
input_keys: Sequence[str],
output_keys: Sequence[str],
) -> Dataset:
"""Upload a CSV file to the LangChain+ API."""
files = {"file": csv_file}
@@ -182,7 +164,6 @@ class LangChainPlusClient(BaseSettings):
"input_keys": ",".join(input_keys),
"output_keys": ",".join(output_keys),
"description": description,
"tenant_id": self.tenant_id,
}
response = requests.post(
self.api_url + "/datasets/upload",
@@ -223,10 +204,7 @@ class LangChainPlusClient(BaseSettings):
query_params = ListRunsQueryParams(
session_id=session_id, run_type=run_type, **kwargs
)
filtered_params = {
k: v for k, v in query_params.dict().items() if v is not None
}
response = self._get("/runs", params=filtered_params)
response = self._get("/runs", params=query_params.dict(exclude_none=True))
raise_for_status_with_text(response)
yield from [Run(**run) for run in response.json()]

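This hunk replaces a hand-written filter over `query_params.dict()` with pydantic's `dict(exclude_none=True)`. Both drop fields whose value is `None` and keep falsy values such as `False` or `0`. The sketch below uses only the standard library, with a hypothetical `RunsQuery` dataclass standing in for `ListRunsQueryParams`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class RunsQuery:
    """Hypothetical stand-in for ListRunsQueryParams."""

    session_id: Optional[str] = None
    run_type: Optional[str] = None
    error: Optional[bool] = None


def to_query_params(query: RunsQuery) -> Dict[str, Any]:
    # Same effect as pydantic's .dict(exclude_none=True): filters that were
    # never set are omitted, while explicit False or 0 values are kept.
    return {k: v for k, v in asdict(query).items() if v is not None}
```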
@@ -237,7 +215,7 @@ class LangChainPlusClient(BaseSettings):
) -> TracerSession:
"""Read a session from the LangChain+ API."""
path = "/sessions"
params: Dict[str, Any] = {"limit": 1, "tenant_id": self.tenant_id}
params: Dict[str, Any] = {"limit": 1}
if session_id is not None:
path += f"/{session_id}"
elif session_name is not None:
@@ -279,10 +257,11 @@ class LangChainPlusClient(BaseSettings):
raise_for_status_with_text(response)
return None

def create_dataset(self, dataset_name: str, description: str) -> Dataset:
def create_dataset(
self, dataset_name: str, *, description: Optional[str] = None
) -> Dataset:
"""Create a dataset in the LangChain+ API."""
dataset = DatasetCreate(
tenant_id=self.tenant_id,
name=dataset_name,
description=description,
)
@@ -300,7 +279,7 @@ class LangChainPlusClient(BaseSettings):
self, *, dataset_name: Optional[str] = None, dataset_id: Optional[str] = None
) -> Dataset:
path = "/datasets"
params: Dict[str, Any] = {"limit": 1, "tenant_id": self.tenant_id}
params: Dict[str, Any] = {"limit": 1}
if dataset_id is not None:
path += f"/{dataset_id}"
elif dataset_name is not None:
@@ -394,6 +373,110 @@ class LangChainPlusClient(BaseSettings):
raise_for_status_with_text(response)
yield from [Example(**dataset) for dataset in response.json()]

def update_example(
self,
example_id: str,
*,
inputs: Optional[Dict[str, Any]] = None,
outputs: Optional[Mapping[str, Any]] = None,
dataset_id: Optional[str] = None,
) -> Dict[str, Any]:
"""Update a specific example."""
example = ExampleUpdate(
inputs=inputs,
outputs=outputs,
dataset_id=dataset_id,
)
response = requests.patch(
f"{self.api_url}/examples/{example_id}",
headers=self._headers,
data=example.json(exclude_none=True),
)
raise_for_status_with_text(response)
return response.json()

def create_feedback(
self,
run_id: str,
key: str,
*,
score: Union[float, int, bool, None] = None,
value: Union[float, int, bool, str, dict, None] = None,
correction: Union[str, dict, None] = None,
comment: Union[str, None] = None,
source_info: Optional[Dict[str, Any]] = None,
feedback_source_type: FeedbackSourceType = FeedbackSourceType.API,
) -> Feedback:
"""Create feedback in the LangChain+ API.

Args:
run_id: The ID of the run to provide feedback on.
key: The name of the metric, tag, or 'aspect' this
feedback is about.
score: The score to rate this run on the metric
or aspect.
value: The display value or non-numeric value for this feedback.
correction: The proper ground truth for this run.
comment: A comment about this feedback.
source_info: Information about the source of this feedback.
feedback_source_type: The type of feedback source.
"""
if feedback_source_type == FeedbackSourceType.API:
feedback_source: FeedbackSourceBase = APIFeedbackSource(
metadata=source_info
)
elif feedback_source_type == FeedbackSourceType.MODEL:
feedback_source = ModelFeedbackSource(metadata=source_info)
else:
raise ValueError(f"Unknown feedback source type {feedback_source_type}")
feedback = FeedbackCreate(
run_id=run_id,
key=key,
score=score,
value=value,
correction=correction,
comment=comment,
feedback_source=feedback_source,
)
response = requests.post(
self.api_url + "/feedback",
headers=self._headers,
data=feedback.json(),
)
raise_for_status_with_text(response)
return Feedback(**feedback.dict())

@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def read_feedback(self, feedback_id: str) -> Feedback:
|
||||
"""Read a feedback from the LangChain+ API."""
|
||||
response = self._get(f"/feedback/{feedback_id}")
|
||||
raise_for_status_with_text(response)
|
||||
return Feedback(**response.json())
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def list_feedback(
|
||||
self,
|
||||
*,
|
||||
run_ids: Optional[Sequence[Union[str, UUID]]] = None,
|
||||
**kwargs: Any,
|
||||
) -> Iterator[Feedback]:
|
||||
"""List the feedback objects on the LangChain+ API."""
|
||||
params = ListFeedbackQueryParams(
|
||||
run=run_ids,
|
||||
**kwargs,
|
||||
)
|
||||
response = self._get("/feedback", params=params.dict(exclude_none=True))
|
||||
raise_for_status_with_text(response)
|
||||
yield from [Feedback(**feedback) for feedback in response.json()]
|
||||
|
||||
def delete_feedback(self, feedback_id: str) -> None:
|
||||
"""Delete a feedback by ID."""
|
||||
response = requests.delete(
|
||||
f"{self.api_url}/feedback/{feedback_id}",
|
||||
headers=self._headers,
|
||||
)
|
||||
raise_for_status_with_text(response)
|
||||
|
||||
async def arun_on_dataset(
|
||||
self,
|
||||
dataset_name: str,
|
||||
|
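The source-type dispatch in `create_feedback` can be reproduced without LangChain or pydantic. The sketch below is a minimal sketch under stated assumptions: `SourceType`, `ApiSource`, `ModelSource`, and `build_feedback_payload` are hypothetical stand-ins, not client APIs, and plain frozen dataclasses take the place of the pydantic models. It shows the enum selecting a source class and the payload being assembled before it would be POSTed to `/feedback`. One detail to keep in mind: pydantic v1, like dataclasses, treats `ClassVar` annotations as class attributes rather than fields. The `type` tag on the source models above may therefore be absent from `feedback.json()`, so the sketch adds it explicitly.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, ClassVar, Dict, Optional
from uuid import uuid4


class SourceType(Enum):
    # Hypothetical mirror of FeedbackSourceType
    API = "api"
    MODEL = "model"


@dataclass(frozen=True)
class ApiSource:
    # ClassVar: stored on the class, not an instance field
    type: ClassVar[str] = "api"
    metadata: Optional[Dict[str, Any]] = None


@dataclass(frozen=True)
class ModelSource:
    type: ClassVar[str] = "model"
    metadata: Optional[Dict[str, Any]] = None


def build_feedback_payload(
    run_id: str,
    key: str,
    *,
    score: Optional[float] = None,
    source_info: Optional[Dict[str, Any]] = None,
    source_type: SourceType = SourceType.API,
) -> Dict[str, Any]:
    """Hypothetical helper: pick a source class, then assemble the payload."""
    if source_type == SourceType.API:
        source: Any = ApiSource(metadata=source_info)
    elif source_type == SourceType.MODEL:
        source = ModelSource(metadata=source_info)
    else:
        raise ValueError(f"Unknown feedback source type {source_type}")
    return {
        "id": str(uuid4()),  # client-side ID, like FeedbackCreate's default_factory
        "run_id": run_id,
        "key": key,
        "score": score,
        # asdict() skips ClassVar attributes, so record the type explicitly
        "feedback_source": {"type": source.type, **asdict(source)},
    }


payload = build_feedback_payload(
    "run-123", "correctness", score=1, source_type=SourceType.MODEL
)
print(payload["feedback_source"])
```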
@@ -1,6 +1,7 @@
from datetime import datetime
from typing import Any, Dict, List, Optional
from uuid import UUID
from enum import Enum
from typing import Any, ClassVar, Dict, List, Mapping, Optional, Sequence, Union
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, root_validator

@@ -14,6 +15,9 @@ class ExampleBase(BaseModel):
    inputs: Dict[str, Any]
    outputs: Optional[Dict[str, Any]] = Field(default=None)

    class Config:
        frozen = True


class ExampleCreate(ExampleBase):
    """Example create model."""
@@ -31,12 +35,25 @@ class Example(ExampleBase):
    runs: List[Run] = Field(default_factory=list)


class ExampleUpdate(BaseModel):
    """Update class for Example."""

    dataset_id: Optional[UUID] = None
    inputs: Optional[Dict[str, Any]] = None
    outputs: Optional[Dict[str, Any]] = None

    class Config:
        frozen = True


class DatasetBase(BaseModel):
    """Dataset base model."""

    tenant_id: UUID
    name: str
    description: str
    description: Optional[str] = None

    class Config:
        frozen = True


class DatasetCreate(DatasetBase):
@@ -50,6 +67,7 @@ class Dataset(DatasetBase):
    """Dataset ORM model."""

    id: UUID
    tenant_id: UUID
    created_at: datetime
    modified_at: Optional[datetime] = Field(default=None)

@@ -57,9 +75,6 @@ class Dataset(DatasetBase):
class ListRunsQueryParams(BaseModel):
    """Query params for GET /runs endpoint."""

    class Config:
        extra = "forbid"

    id: Optional[List[UUID]]
    """Filter runs by id."""
    parent_run: Optional[UUID]
@@ -89,7 +104,11 @@ class ListRunsQueryParams(BaseModel):
        description="Query Runs that ended >= this time",
    )

    @root_validator
    class Config:
        extra = "forbid"
        frozen = True

    @root_validator(allow_reuse=True)
    def validate_time_range(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """Validate that start_time <= end_time."""
        start_time = values.get("start_time")
@@ -97,3 +116,91 @@ class ListRunsQueryParams(BaseModel):
        if start_time and end_time and start_time > end_time:
            raise ValueError("start_time must be <= end_time")
        return values


class FeedbackSourceBase(BaseModel):
    type: ClassVar[str]
    metadata: Optional[Dict[str, Any]] = None

    class Config:
        frozen = True


class APIFeedbackSource(FeedbackSourceBase):
    """API feedback source."""

    type: ClassVar[str] = "api"


class ModelFeedbackSource(FeedbackSourceBase):
    """Model feedback source."""

    type: ClassVar[str] = "model"


class FeedbackSourceType(Enum):
    """Feedback source type."""

    API = "api"
    """General feedback submitted from the API."""
    MODEL = "model"
    """Model-assisted feedback."""


class FeedbackBase(BaseModel):
    """Feedback schema."""

    created_at: datetime = Field(default_factory=datetime.utcnow)
    """The time the feedback was created."""
    modified_at: datetime = Field(default_factory=datetime.utcnow)
    """The time the feedback was last modified."""
    run_id: UUID
    """The associated run ID this feedback is logged for."""
    key: str
    """The metric name, tag, or aspect to provide feedback on."""
    score: Union[float, int, bool, None] = None
    """Value or score to assign the run."""
    value: Union[float, int, bool, str, dict, None] = None
    """The display value, tag or other value for the feedback if not a metric."""
    comment: Optional[str] = None
    """Comment or explanation for the feedback."""
    correction: Union[str, dict, None] = None
    """Correction for the run."""
    feedback_source: Optional[
        Union[APIFeedbackSource, ModelFeedbackSource, Mapping[str, Any]]
    ] = None
    """The source of the feedback."""

    class Config:
        frozen = True


class FeedbackCreate(FeedbackBase):
    """Schema used for creating feedback."""

    id: UUID = Field(default_factory=uuid4)

    feedback_source: APIFeedbackSource
    """The source of the feedback."""


class Feedback(FeedbackBase):
    """Schema for getting feedback."""

    id: UUID
    feedback_source: Optional[Dict] = None
    """The source of the feedback."""


class ListFeedbackQueryParams(BaseModel):
    """Query Params for listing feedbacks."""

    run: Optional[Sequence[UUID]] = None
    limit: int = 100
    offset: int = 0

    class Config:
        """Config for query params."""

        extra = "forbid"
        frozen = True

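`list_feedback` passes `params.dict(exclude_none=True)` to `self._get`. Assuming that helper forwards the dict to `requests`, which encodes list values as repeated keys, the query string can be sketched with the standard library. `FeedbackQuery` and `to_query_string` are hypothetical stand-ins for `ListFeedbackQueryParams` and the HTTP layer, and `urlencode(..., doseq=True)` substitutes for whatever encoding `requests` performs internally.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Sequence
from urllib.parse import urlencode
from uuid import UUID


@dataclass(frozen=True)
class FeedbackQuery:
    # Hypothetical stand-in for ListFeedbackQueryParams
    run: Optional[Sequence[UUID]] = None
    limit: int = 100
    offset: int = 0


def to_query_string(query: FeedbackQuery) -> str:
    # Drop None values, mirroring params.dict(exclude_none=True)
    params: Dict[str, Any] = {
        f.name: getattr(query, f.name)
        for f in fields(query)
        if getattr(query, f.name) is not None
    }
    if "run" in params:
        params["run"] = [str(run_id) for run_id in params["run"]]
    # doseq=True repeats the key once per list item: run=a&run=b
    return urlencode(params, doseq=True)


print(to_query_string(FeedbackQuery(limit=10)))
```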
@@ -151,7 +151,7 @@ async def _arun_llm_or_chain(
                )
            else:
                chain = llm_or_chain_factory()
                output = await chain.arun(example.inputs, callbacks=callbacks)
                output = await chain.acall(example.inputs, callbacks=callbacks)
            outputs.append(output)
        except Exception as e:
            logger.warning(f"Chain failed for example {example.id}. Error: {e}")
@@ -214,7 +214,6 @@ async def _tracer_initializer(session_name: Optional[str]) -> Optional[LangChain
    """
    if session_name:
        tracer = LangChainTracer(session_name=session_name)
        tracer.ensure_session()
        return tracer
    else:
        return None
@@ -326,7 +325,7 @@ def run_llm_or_chain(
                output: Any = run_llm(llm_or_chain_factory, example.inputs, callbacks)
            else:
                chain = llm_or_chain_factory()
                output = chain.run(example.inputs, callbacks=callbacks)
                output = chain(example.inputs, callbacks=callbacks)
            outputs.append(output)
        except Exception as e:
            logger.warning(f"Chain failed for example {example.id}. Error: {e}")

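The runner change from `chain.run(...)`/`chain.arun(...)` to `chain(...)`/`chain.acall(...)` matters most for chains with more than one output key. `ToyChain` below is a hypothetical stand-in, not LangChain's `Chain`. It mimics two behaviors as we understand them: `run` returns a single value and refuses chains that do not have exactly one output key, while calling the chain returns a dict of the inputs merged with every output.

```python
from typing import Any, Dict, Tuple


class ToyChain:
    """Hypothetical stand-in for a chain with two output keys."""

    output_keys: Tuple[str, ...] = ("answer", "steps")

    def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        question = inputs["question"]
        return {"answer": question.upper(), "steps": ["uppercased"]}

    def __call__(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Return the inputs merged with all outputs
        return {**inputs, **self._call(inputs)}

    def run(self, inputs: Dict[str, Any]) -> Any:
        # Only usable when there is exactly one output key
        if len(self.output_keys) != 1:
            raise ValueError(
                "`run` not supported when there is not exactly one output key."
            )
        return self._call(inputs)[self.output_keys[0]]


print(ToyChain()({"question": "hi"}))
```

With two output keys, `run` raises, whereas the call form still returns a full record for each example.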
0
langchain/client/utils.py
Normal file
@@ -33,6 +33,7 @@ from langchain.document_loaders.email import (
from langchain.document_loaders.epub import UnstructuredEPubLoader
from langchain.document_loaders.evernote import EverNoteLoader
from langchain.document_loaders.facebook_chat import FacebookChatLoader
from langchain.document_loaders.figma import FigmaFileLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.git import GitLoader
@@ -48,10 +49,12 @@ from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.image_captions import ImageCaptionLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.iugu import IuguLoader
from langchain.document_loaders.joplin import JoplinLoader
from langchain.document_loaders.json_loader import JSONLoader
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
from langchain.document_loaders.mastodon import MastodonTootsLoader
from langchain.document_loaders.max_compute import MaxComputeLoader
from langchain.document_loaders.mediawikidump import MWDumpLoader
from langchain.document_loaders.modern_treasury import ModernTreasuryLoader
from langchain.document_loaders.notebook import NotebookLoader
@@ -60,6 +63,7 @@ from langchain.document_loaders.notiondb import NotionDBLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.odt import UnstructuredODTLoader
from langchain.document_loaders.onedrive import OneDriveLoader
from langchain.document_loaders.onedrive_file import OneDriveFileLoader
from langchain.document_loaders.pdf import (
    MathpixPDFLoader,
    OnlinePDFLoader,
@@ -152,10 +156,11 @@ __all__ = [
    "DuckDBLoader",
    "EverNoteLoader",
    "FacebookChatLoader",
    "FigmaFileLoader",
    "GCSDirectoryLoader",
    "GCSFileLoader",
    "GitLoader",
    "GitHubIssuesLoader",
    "GitLoader",
    "GitbookLoader",
    "GoogleApiClient",
    "GoogleApiYoutubeLoader",
@@ -167,17 +172,20 @@ __all__ = [
    "IFixitLoader",
    "IMSDbLoader",
    "ImageCaptionLoader",
    "JoplinLoader",
    "IuguLoader",
    "JSONLoader",
    "JoplinLoader",
    "MWDumpLoader",
    "MastodonTootsLoader",
    "MathpixPDFLoader",
    "MaxComputeLoader",
    "ModernTreasuryLoader",
    "NotebookLoader",
    "NotionDBLoader",
    "NotionDirectoryLoader",
    "ObsidianLoader",
    "OneDriveLoader",
    "OneDriveFileLoader",
    "OnlinePDFLoader",
    "OutlookMessageLoader",
    "PDFMinerLoader",
@@ -185,6 +193,7 @@ __all__ = [
    "PDFPlumberLoader",
    "PagedPDFSplitter",
    "PlaywrightURLLoader",
    "PsychicLoader",
    "PyMuPDFLoader",
    "PyPDFDirectoryLoader",
    "PyPDFLoader",
@@ -200,11 +209,13 @@ __all__ = [
    "SeleniumURLLoader",
    "SitemapLoader",
    "SlackDirectoryLoader",
    "TelegramChatFileLoader",
    "TelegramChatApiLoader",
    "SpreedlyLoader",
    "StripeLoader",
    "TelegramChatApiLoader",
    "TelegramChatFileLoader",
    "TelegramChatLoader",
    "TextLoader",
    "ToMarkdownLoader",
    "TomlLoader",
    "TrelloLoader",
    "TwitterTweetLoader",
@@ -228,7 +239,4 @@ __all__ = [
    "WhatsAppChatLoader",
    "WikipediaLoader",
    "YoutubeLoader",
    "TelegramChatLoader",
    "ToMarkdownLoader",
    "PsychicLoader",
]

82
langchain/document_loaders/max_compute.py
Normal file
@@ -0,0 +1,82 @@
from __future__ import annotations

from typing import Any, Iterator, List, Optional, Sequence

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.utilities.max_compute import MaxComputeAPIWrapper


class MaxComputeLoader(BaseLoader):
    """Loads a query result from Alibaba Cloud MaxCompute table into documents."""

    def __init__(
        self,
        query: str,
        api_wrapper: MaxComputeAPIWrapper,
        *,
        page_content_columns: Optional[Sequence[str]] = None,
        metadata_columns: Optional[Sequence[str]] = None,
    ):
        """Initialize Alibaba Cloud MaxCompute document loader.

        Args:
            query: SQL query to execute.
            api_wrapper: MaxCompute API wrapper.
            page_content_columns: The columns to write into the `page_content` of the
                Document. If unspecified, all columns will be written to `page_content`.
            metadata_columns: The columns to write into the `metadata` of the Document.
                If unspecified, all columns not added to `page_content` will be written.
        """
        self.query = query
        self.api_wrapper = api_wrapper
        self.page_content_columns = page_content_columns
        self.metadata_columns = metadata_columns

    @classmethod
    def from_params(
        cls,
        query: str,
        endpoint: str,
        project: str,
        *,
        access_id: Optional[str] = None,
        secret_access_key: Optional[str] = None,
        **kwargs: Any,
    ) -> MaxComputeLoader:
        """Convenience constructor that builds the MaxCompute API wrapper from
        given parameters.

        Args:
            query: SQL query to execute.
            endpoint: MaxCompute endpoint.
            project: A project is a basic organizational unit of MaxCompute, which is
                similar to a database.
            access_id: MaxCompute access ID. Should be passed in directly or set as the
                environment variable `MAX_COMPUTE_ACCESS_ID`.
            secret_access_key: MaxCompute secret access key. Should be passed in
                directly or set as the environment variable
                `MAX_COMPUTE_SECRET_ACCESS_KEY`.
        """
        api_wrapper = MaxComputeAPIWrapper.from_params(
            endpoint, project, access_id=access_id, secret_access_key=secret_access_key
        )
        return cls(query, api_wrapper, **kwargs)

    def lazy_load(self) -> Iterator[Document]:
        for row in self.api_wrapper.query(self.query):
            if self.page_content_columns:
                page_content_data = {
                    k: v for k, v in row.items() if k in self.page_content_columns
                }
            else:
                page_content_data = row
            page_content = "\n".join(f"{k}: {v}" for k, v in page_content_data.items())
            if self.metadata_columns:
                metadata = {k: v for k, v in row.items() if k in self.metadata_columns}
            else:
                metadata = {k: v for k, v in row.items() if k not in page_content_data}
            yield Document(page_content=page_content, metadata=metadata)

    def load(self) -> List[Document]:
        return list(self.lazy_load())
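The per-row column split in `MaxComputeLoader.lazy_load` can be tested in isolation. `split_row` below is a hypothetical helper that reproduces that logic on a plain dict, with no MaxCompute connection or `Document` class. By default every column goes to the page content and the metadata ends up empty. Selecting content columns moves the remaining columns into the metadata.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def split_row(
    row: Dict[str, Any],
    page_content_columns: Optional[Sequence[str]] = None,
    metadata_columns: Optional[Sequence[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper mirroring MaxComputeLoader.lazy_load for one row."""
    if page_content_columns:
        content_data = {k: v for k, v in row.items() if k in page_content_columns}
    else:
        content_data = row
    # One "column: value" line per selected column
    page_content = "\n".join(f"{k}: {v}" for k, v in content_data.items())
    if metadata_columns:
        metadata = {k: v for k, v in row.items() if k in metadata_columns}
    else:
        # Default: every column not already used for page_content
        metadata = {k: v for k, v in row.items() if k not in content_data}
    return page_content, metadata


row = {"id": 1, "title": "Hello", "body": "World"}
print(split_row(row, page_content_columns=["title", "body"]))
```

With the real loader, the same selection would be passed as keyword arguments, for example `MaxComputeLoader.from_params(query, endpoint, project, page_content_columns=["title", "body"])`.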
@@ -2,7 +2,7 @@
import asyncio
import logging
import warnings
from typing import Any, List, Optional, Union
from typing import Any, Dict, List, Optional, Union

import aiohttp
import requests
@@ -47,6 +47,9 @@ class WebBaseLoader(BaseLoader):
    default_parser: str = "html.parser"
    """Default parser to use for BeautifulSoup."""

    requests_kwargs: Dict[str, Any] = {}
    """kwargs for requests"""

    def __init__(
        self, web_path: Union[str, List[str]], header_template: Optional[dict] = None
    ):
@@ -170,7 +173,7 @@ class WebBaseLoader(BaseLoader):

        self._check_parser(parser)

        html_doc = self.session.get(url)
        html_doc = self.session.get(url, **self.requests_kwargs)
        html_doc.encoding = html_doc.apparent_encoding
        return BeautifulSoup(html_doc.text, parser)


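`requests_kwargs` is declared above as a class attribute with a `{}` default. Assuming `WebBaseLoader` does not copy it per instance in `__init__`, mutating it in place on one loader changes the single dict that every loader shares. Assigning a new dict to the instance avoids this. The `Loader` class below is a hypothetical stand-in that isolates this plain-Python behavior.

```python
from typing import Any, Dict


class Loader:
    """Hypothetical stand-in for a class with a class-level dict default."""

    requests_kwargs: Dict[str, Any] = {}


a, b = Loader(), Loader()
# In-place mutation changes the one dict shared through the class
a.requests_kwargs["verify"] = False
print(b.requests_kwargs)  # {'verify': False}

# Rebinding creates a per-instance attribute and leaves the class dict alone
c = Loader()
c.requests_kwargs = {"timeout": 10}
print(Loader.requests_kwargs)  # {'verify': False}
```

In practice this suggests writing `loader.requests_kwargs = {"verify": False}` rather than `loader.requests_kwargs["verify"] = False`.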
@@ -149,6 +149,7 @@ class AlephAlphaSymmetricSemanticEmbedding(AlephAlphaAsymmetricSemanticEmbedding
    queries are embedded with a SemanticRepresentation.Symmetric
    Example:
        .. code-block:: python

            from langchain.embeddings import AlephAlphaSymmetricSemanticEmbedding

            embeddings = AlephAlphaSymmetricSemanticEmbedding()

@@ -68,6 +68,10 @@ class BedrockEmbeddings(BaseModel, Embeddings):
    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that AWS credentials and python package exist in environment."""

        if values["client"] is not None:
            return values

        try:
            import boto3


@@ -66,30 +66,32 @@ class ElasticsearchEmbeddings(Embeddings):
            es_user: (str, optional): Elasticsearch username.
            es_password: (str, optional): Elasticsearch password.

        Example Usage:
            from langchain.embeddings import ElasticsearchEmbeddings
        Example:
            .. code-block:: python

            # Define the model ID and input field name (if different from default)
            model_id = "your_model_id"
            # Optional, only if different from 'text_field'
            input_field = "your_input_field"
                from langchain.embeddings import ElasticsearchEmbeddings

            # Credentials can be passed in two ways. Either set the env vars
            # ES_CLOUD_ID, ES_USER, ES_PASSWORD and they will be automatically pulled
            # in, or pass them in directly as kwargs.
            embeddings = ElasticsearchEmbeddings.from_credentials(
                model_id,
                input_field=input_field,
                # es_cloud_id="foo",
                # es_user="bar",
                # es_password="baz",
            )
                # Define the model ID and input field name (if different from default)
                model_id = "your_model_id"
                # Optional, only if different from 'text_field'
                input_field = "your_input_field"

            documents = [
                "This is an example document.",
                "Another example document to generate embeddings for.",
            ]
            embeddings_generator.embed_documents(documents)
                # Credentials can be passed in two ways. Either set the env vars
                # ES_CLOUD_ID, ES_USER, ES_PASSWORD and they will be automatically
                # pulled in, or pass them in directly as kwargs.
                embeddings = ElasticsearchEmbeddings.from_credentials(
                    model_id,
                    input_field=input_field,
                    # es_cloud_id="foo",
                    # es_user="bar",
                    # es_password="baz",
                )

                documents = [
                    "This is an example document.",
                    "Another example document to generate embeddings for.",
                ]
                embeddings.embed_documents(documents)
        """
        try:
            from elasticsearch import Elasticsearch
@@ -135,32 +137,35 @@ class ElasticsearchEmbeddings(Embeddings):
        Returns:
            ElasticsearchEmbeddings: An instance of the ElasticsearchEmbeddings class.

        Example Usage:
            from elasticsearch import Elasticsearch
            from langchain.embeddings import ElasticsearchEmbeddings
        Example:
            .. code-block:: python

            # Define the model ID and input field name (if different from default)
            model_id = "your_model_id"
            # Optional, only if different from 'text_field'
            input_field = "your_input_field"
                from elasticsearch import Elasticsearch

            # Create Elasticsearch connection
            es_connection = Elasticsearch(
                hosts=["localhost:9200"], http_auth=("user", "password")
            )
                from langchain.embeddings import ElasticsearchEmbeddings

            # Instantiate ElasticsearchEmbeddings using the existing connection
            embeddings = ElasticsearchEmbeddings.from_es_connection(
                model_id,
                es_connection,
                input_field=input_field,
            )
                # Define the model ID and input field name (if different from default)
                model_id = "your_model_id"
                # Optional, only if different from 'text_field'
                input_field = "your_input_field"

            documents = [
                "This is an example document.",
                "Another example document to generate embeddings for.",
            ]
            embeddings_generator.embed_documents(documents)
                # Create Elasticsearch connection
                es_connection = Elasticsearch(
                    hosts=["localhost:9200"], http_auth=("user", "password")
                )

                # Instantiate ElasticsearchEmbeddings using the existing connection
                embeddings = ElasticsearchEmbeddings.from_es_connection(
                    model_id,
                    es_connection,
                    input_field=input_field,
                )

                documents = [
                    "This is an example document.",
                    "Another example document to generate embeddings for.",
                ]
                embeddings.embed_documents(documents)
        """
        # Importing MlClient from elasticsearch.client within the method to
        # avoid unnecessary import if the method is not used

@@ -1,7 +1,7 @@
"""LLM Chain specifically for evaluating question answering."""
from __future__ import annotations

from typing import Any, List
from typing import Any, List, Sequence

from langchain import PromptTemplate
from langchain.base_language import BaseLanguageModel
@@ -41,8 +41,8 @@ class QAEvalChain(LLMChain):

    def evaluate(
        self,
        examples: List[dict],
        predictions: List[dict],
        examples: Sequence[dict],
        predictions: Sequence[dict],
        question_key: str = "query",
        answer_key: str = "answer",
        prediction_key: str = "result",

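Widening `examples` and `predictions` from `List[dict]` to `Sequence[dict]` means tuples and other read-only sequences now type-check too. The hypothetical `build_eval_inputs` below pairs examples with predictions by position under the default key names from the signature above. It is not QAEvalChain's code, and the length check is an added assumption.

```python
from typing import Any, Dict, List, Sequence


def build_eval_inputs(
    examples: Sequence[dict],
    predictions: Sequence[dict],
    question_key: str = "query",
    answer_key: str = "answer",
    prediction_key: str = "result",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: pair each example with its prediction by position."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    return [
        {
            "query": example[question_key],
            "answer": example[answer_key],
            "result": prediction[prediction_key],
        }
        for example, prediction in zip(examples, predictions)
    ]


# A tuple is a valid Sequence[dict]
print(build_eval_inputs(({"query": "2+2?", "answer": "4"},), ({"result": "4"},)))
```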
@@ -86,10 +86,10 @@
{
"data": {
"text/html": [
"<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
"<a href=\"https://dev.langchain.plus\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
],
"text/plain": [
"LangChainPlusClient (API URL: http://localhost:8000)"
"LangChainPlusClient (API URL: https://dev.api.langchain.plus)"
]
},
"execution_count": 1,
@@ -101,7 +101,6 @@
"import os\n",
"from langchain.client import LangChainPlusClient\n",
"\n",
"import os\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_SESSION\"] = \"Tracing Walkthrough\"\n",
"# os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.langchain.plus\" # Uncomment this line if you want to use the hosted version\n",
@@ -142,60 +141,59 @@
"name": "stdout",
"output_type": "stream",
"text": [
"39,566,248\n",
"Anwar Hadid is Dua Lipa's boyfriend and his age raised to the 0.43 power is approximately 3.87.\n",
"LLMMathChain._evaluate(\"\n",
"(age ** 0.43)\n",
"\") raised error: 'age'. Please try again with a valid numerical expression\n",
"The distance between Paris and Boston is 3448 miles.\n",
"The total number of points scored in the 2023 super bowl raised to the .23 power is approximately 3.457460415669602.\n",
"LLMMathChain._evaluate(\"\n",
"(total number of points scored in the 2023 super bowl)**0.23\n",
"\") raised error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression\n"
"unknown format from LLM: Sorry, I cannot answer this question as it requires information that is not currently available.\n",
"unknown format from LLM: Sorry, as an AI language model, I do not have access to personal information such as someone's age. Please provide a different math problem.\n",
"unknown format from LLM: As an AI language model, I do not have information on future events such as the 2023 super bowl. Therefore, I cannot provide a solution to this question.\n",
"unknown format from LLM: This is not a math problem and cannot be translated into a mathematical expression.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 63c89b8bad9b172227d890620cdec651 in your message.).\n",
"Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID e3dd37877de500d7defe699f8411b3dd in your message.).\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"1.9347796717823205\n",
"1.2600907451828602 (inches)\n",
"LLMMathChain._evaluate(\"\n",
"round(0.2791714614499425, 2)\n",
"\") raised error: 'VariableNode' object is not callable. Please try again with a valid numerical expression\n"
]
"data": {
"text/plain": [
"['The population of Canada as of 2023 is estimated to be 39,566,248.',\n",
" \"Anwar Hadid's age raised to the 0.43 power is approximately 3.87.\",\n",
" ValueError(\"unknown format from LLM: Sorry, as an AI language model, I do not have access to personal information such as someone's age. Please provide a different math problem.\"),\n",
" 'The distance between Paris and Boston is 3448 miles.',\n",
" ValueError('unknown format from LLM: Sorry, I cannot answer this question as it requires information that is not currently available.'),\n",
" ValueError('unknown format from LLM: As an AI language model, I do not have information on future events such as the 2023 super bowl. Therefore, I cannot provide a solution to this question.'),\n",
" '15 points were scored more in the 2023 Super Bowl than in the 2022 Super Bowl.',\n",
" '1.9347796717823205',\n",
" ValueError('unknown format from LLM: This is not a math problem and cannot be translated into a mathematical expression.'),\n",
" '0.2791714614499425']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inputs = [\n",
"'How many people live in canada as of 2023?',\n",
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
" 'how far is it from paris to boston in miles',\n",
" 'what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?',\n",
" 'what was the total number of points scored in the 2023 super bowl raised to the .23 power?',\n",
" 'how many more points were scored in the 2023 super bowl than in the 2022 super bowl?',\n",
" 'what is 153 raised to .1312 power?',\n",
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
" 'what is 1213 divided by 4345?'\n",
"]\n",
"import asyncio\n",
"\n",
"for input_example in inputs:\n",
"inputs = [\n",
" \"How many people live in canada as of 2023?\",\n",
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
" \"how far is it from paris to boston in miles\",\n",
" \"what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?\",\n",
" \"what was the total number of points scored in the 2023 super bowl raised to the .23 power?\",\n",
" \"how many more points were scored in the 2023 super bowl than in the 2022 super bowl?\",\n",
" \"what is 153 raised to .1312 power?\",\n",
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
" \"what is 1213 divided by 4345?\",\n",
"]\n",
"results = []\n",
"\n",
"async def arun(agent, input_example):\n",
" try:\n",
" print(agent.run(input_example))\n",
" return await agent.arun(input_example)\n",
" except Exception as e:\n",
" # The agent sometimes makes mistakes! These will be captured by the tracing.\n",
" print(e)\n",
" "
" return e\n",
"for input_example in inputs:\n",
" results.append(arun(agent, input_example))\n",
"await asyncio.gather(*results) "
]
},
{
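The updated notebook cell wraps each `agent.arun` call so that a failure is returned rather than raised. It then awaits all the calls together with `asyncio.gather`. The script-level sketch below uses a hypothetical `answer` coroutine in place of the agent and `asyncio.run` in place of the notebook's top-level `await`. Because every failure comes back as a value, `gather` returns one result per input, in input order, instead of propagating the first exception. Passing `return_exceptions=True` to `gather` would have a similar effect without the wrapper.

```python
import asyncio
from typing import Any, List


async def answer(question: str) -> str:
    """Hypothetical stand-in for agent.arun; fails on some inputs."""
    await asyncio.sleep(0)  # yield to the event loop, as a real API call would
    if "boyfriend" in question:
        raise ValueError(f"cannot answer: {question}")
    return question.upper()


async def arun(question: str) -> Any:
    # Return the exception instead of raising it, so gather() still
    # produces a result for every input
    try:
        return await answer(question)
    except Exception as e:
        return e


async def main(questions: List[str]) -> List[Any]:
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(arun(q) for q in questions))


results = asyncio.run(main(["what is 2+2?", "who is her boyfriend?"]))
print(results)
```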
@@ -217,42 +215,31 @@
},
"outputs": [],
"source": [
"dataset_name = \"calculator-example-dataset\""
"dataset_name = \"calculator-example-dataset-2\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "c0e12629-bca5-4438-8665-890d0cb9cc4a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"runs = client.list_runs(\n",
" session_name=os.environ[\"LANGCHAIN_SESSION\"],\n",
" run_type=\"chain\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"if dataset_name not in set([dataset.name for dataset in client.list_datasets()]):\n",
" dataset = client.create_dataset(dataset_name, description=\"A calculator example dataset\")\n",
" # List all \"Chain\" runs in the current session \n",
" runs = client.list_runs(\n",
" session_name=os.environ[\"LANGCHAIN_SESSION\"],\n",
" run_type=\"chain\")\n",
" for run in runs:\n",
" if run.name == \"AgentExecutor\":\n",
" # We will only use examples from the top level AgentExecutor run here.\n",
" client.create_example(inputs=run.inputs, outputs=run.outputs, dataset_id=dataset.id)"
"if dataset_name in set([dataset.name for dataset in client.list_datasets()]):\n",
" client.delete_dataset(dataset_name=dataset_name)\n",
"dataset = client.create_dataset(dataset_name, description=\"A calculator example dataset\")\n",
"runs = client.list_runs(\n",
" session_name=os.environ[\"LANGCHAIN_SESSION\"],\n",
" execution_order=1, # Only return the top-level runs\n",
" error=False, # Only runs that succeed\n",
")\n",
"for run in runs:\n",
" try:\n",
" client.create_example(inputs=run.inputs, outputs=run.outputs, dataset_id=dataset.id)\n",
" except:\n",
" pass"
]
},
{
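The revised dataset cell deletes any dataset that already has the same name and recreates it. It then copies top-level successful runs in as examples and silences any failure with a bare `except:`. The sketch below reproduces that flow against a hypothetical in-memory `InMemoryClient`. Its method names only echo the notebook's calls and are not the real client's signatures. The sketch catches `Exception` and logs the error, because a bare `except:` would also swallow `KeyboardInterrupt` and `SystemExit`.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class InMemoryClient:
    """Hypothetical stand-in for the LangChain+ client (not its real API)."""

    datasets: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def delete_dataset(self, name: str) -> None:
        self.datasets.pop(name, None)

    def create_dataset(self, name: str) -> str:
        self.datasets[name] = []
        return name

    def create_example(
        self, dataset: str, inputs: Dict[str, Any], outputs: Dict[str, Any]
    ) -> None:
        if not inputs:
            raise ValueError("example has no inputs")
        self.datasets[dataset].append({"inputs": inputs, "outputs": outputs})


def rebuild_dataset(
    client: InMemoryClient, name: str, runs: List[Dict[str, Any]]
) -> int:
    """Drop and recreate a dataset, then copy runs into it as examples."""
    if name in client.datasets:
        client.delete_dataset(name)
    dataset = client.create_dataset(name)
    added = 0
    for run in runs:
        try:
            client.create_example(dataset, run["inputs"], run["outputs"])
            added += 1
        except Exception as e:  # narrower than a bare `except:`
            logger.warning("Skipping run: %s", e)
    return added
```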
@@ -286,7 +273,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"id": "1baa677c-5642-4378-8e01-3aa1647f19d6",
"metadata": {
"tags": []
@@ -299,7 +286,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"id": "60d14593-c61f-449f-a38f-772ca43707c2",
"metadata": {
"tags": []
@@ -317,7 +304,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 8,
"id": "52a7ea76-79ca-4765-abf7-231e884040d6",
"metadata": {
"tags": []
@@ -353,7 +340,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 9,
"id": "c2b59104-b90e-466a-b7ea-c5bd0194263b",
"metadata": {
"tags": []
||||
@@ -381,7 +368,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 10,
|
||||
"id": "112d7bdf-7e50-4c1a-9285-5bac8473f2ee",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -418,7 +405,7 @@
|
||||
"\n",
|
||||
"Returns:\n",
|
||||
" A dictionary mapping example ids to the model outputs.\n",
|
||||
"\u001b[0;31mFile:\u001b[0m ~/Code/langchain/langchain/client/langchain.py\n",
|
||||
"\u001b[0;31mFile:\u001b[0m ~/code/lc/lckg/langchain/client/langchain.py\n",
|
||||
"\u001b[0;31mType:\u001b[0m method"
|
||||
]
|
||||
},
|
||||
@@ -432,7 +419,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 11,
|
||||
"id": "6e10f823",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -442,7 +429,12 @@
|
||||
"# Since chains can be stateful (e.g. they can have memory), we need to provide\n",
|
||||
"# a way to initialize a new chain for each row in the dataset. This is done\n",
|
||||
"# by passing in a factory function that returns a new chain for each row.\n",
|
||||
"chain_factory = lambda: initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)\n",
|
||||
"chain_factory = lambda: initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
" verbose=False,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# If your chain is NOT stateful, your lambda can return the object directly\n",
|
||||
"# to improve runtime performance. For example:\n",
|
||||
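The comments in this hunk explain why `llm_or_chain_factory` takes a factory: a stateful chain reused across rows would leak memory from one example into the next. The sketch below shows that effect with a hypothetical `StatefulEchoChain` (not a LangChain class) and a hypothetical `run_rows` helper that calls the factory once per row.

```python
from typing import Callable, Dict, List


class StatefulEchoChain:
    """Hypothetical chain with memory: it remembers every input it has seen."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def run(self, text: str) -> str:
        self.memory.append(text)
        # The output depends on accumulated memory, so shared state leaks across rows.
        return f"{text} (seen {len(self.memory)} inputs)"


def run_rows(factory: Callable[[], StatefulEchoChain], rows: List[str]) -> Dict[str, str]:
    results: Dict[str, str] = {}
    for row in rows:
        chain = factory()  # ask the factory for a chain for this row
        results[row] = chain.run(row)
    return results


rows = ["a", "b", "c"]

# Correct for stateful chains: each call builds a new chain, so no memory is shared.
fresh = run_rows(lambda: StatefulEchoChain(), rows)

# Only safe for stateless chains: the lambda returns the same object every time.
shared_chain = StatefulEchoChain()
shared = run_rows(lambda: shared_chain, rows)
```

With the fresh factory every row sees exactly one input; with the shared object the third row already "remembers" the first two, which is the contamination the notebook comment warns about.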
@@ -451,28 +443,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 12,
|
||||
"id": "a8088b7d-3ab6-4279-94c8-5116fe7cee33",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 1\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example 604fbd32-7cbe-4dd4-9ddd-fd5ab5c01566. Error: LLMMathChain._evaluate(\"\n",
|
||||
"(age ** 0.43)\n",
|
||||
"\") raised error: 'age'. Please try again with a valid numerical expression\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
@@ -484,25 +460,55 @@
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example 4c82b6a4-d8ce-4129-8229-7f4e2f76294c. Error: LLMMathChain._evaluate(\"\n",
|
||||
"(total number of points scored in the 2023 super bowl)**0.23\n",
|
||||
"\") raised error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression\n"
|
||||
"Chain failed for example 898af6aa-ea39-4959-9ecd-9b9f1ffee31c. Error: LLMMathChain._evaluate(\"\n",
|
||||
"round(0.2791714614499425, 2)\n",
|
||||
"\") raised error: 'VariableNode' object is not callable. Please try again with a valid numerical expression\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 10\r"
|
||||
"Processed examples: 5\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Chain failed for example ffb8071d-60e4-49ca-aa9f-5ec03ea78f2d. Error: unknown format from LLM: This is not a math problem and cannot be translated into a mathematical expression.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 6\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 29fc448d09a0f240719eb1dbb95db18d in your message.).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 7\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"evaluation_session_name = \"Search + Calculator Agent Evaluation\"\n",
|
||||
"chain_results = await client.arun_on_dataset(\n",
|
||||
" dataset_name=dataset_name,\n",
|
||||
" llm_or_chain_factory=chain_factory,\n",
|
||||
" concurrency_level=5, # Optional, sets the number of examples to run at a time\n",
|
||||
" verbose=True\n",
|
||||
" verbose=True,\n",
|
||||
" session_name=evaluation_session_name # Optional, a unique session name will be generated if not provided\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Sometimes, the agent will error due to parsing issues, incompatible tool inputs, etc.\n",
|
||||
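The `arun_on_dataset` call above uses `concurrency_level=5`, described as the number of examples to run at a time, and the stderr output shows individual examples failing ("Chain failed for example ...") without stopping the batch. The sketch below reproduces that behavior with `asyncio.Semaphore`. It is an assumption about the semantics, not the client's actual implementation. `run_bounded`, `fake_chain` and the `{"Error": ...}` result shape are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def run_bounded(
    examples: Dict[str, Any],
    make_task: Callable[[Any], Awaitable[Any]],
    concurrency_level: int = 5,
) -> Dict[str, Any]:
    """Run one task per example, at most `concurrency_level` at a time.

    A failure is recorded for that example instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(concurrency_level)
    results: Dict[str, Any] = {}

    async def worker(example_id: str, inputs: Any) -> None:
        async with semaphore:  # waits here while `concurrency_level` tasks are running
            try:
                results[example_id] = await make_task(inputs)
            except Exception as exc:  # log and keep going with the other examples
                print(f"Chain failed for example {example_id}. Error: {exc}")
                results[example_id] = {"Error": str(exc)}

    await asyncio.gather(*(worker(eid, inp) for eid, inp in examples.items()))
    return results


# Usage: track how many fake chain calls are in flight at once.
in_flight = 0
peak = 0


async def fake_chain(x: int) -> int:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    if x < 0:
        raise ValueError("negative input")
    return x * 2


results = asyncio.run(
    run_bounded({str(i): i for i in range(-1, 9)}, fake_chain, concurrency_level=3)
)
```

Inside a notebook, where an event loop is already running, you would write `await run_bounded(...)` instead of `asyncio.run(...)`, matching the `await client.arun_on_dataset(...)` call above.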
@@ -511,18 +517,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2737458-b20c-4288-8790-1f4a8d237b2a",
|
||||
"metadata": {},
|
||||
"id": "cdacd159-eb4d-49e9-bb2a-c55322c40ed4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Reviewing the Chain Results\n",
|
||||
"### Reviewing the Chain Results\n",
|
||||
"\n",
|
||||
"You can review the results of the run by clicking the link to the tracing UI below and navigating to the session\n",
|
||||
"with the title 'calculator-example-dataset-AgentExecutor-YYYY-MM-DD-HH-MM-SS'"
|
||||
"with the title **\"Search + Calculator Agent Evaluation\"**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 13,
|
||||
"id": "136db492-d6ca-4215-96f9-439c23538241",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -531,13 +539,13 @@
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
|
||||
"<a href=\"https://dev.langchain.plus\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
|
||||
],
|
||||
"text/plain": [
|
||||
"LangChainPlusClient (API URL: http://localhost:8000)"
|
||||
"LangChainPlusClient (API URL: https://dev.api.langchain.plus)"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -549,226 +557,70 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c70cceb5-aa53-4851-bb12-386f092191f9",
|
||||
"id": "63ed6561-6574-43b3-a653-fe410aa8a617",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Running a Chat Model over a Traced Dataset\n",
|
||||
"## Running an Evaluation Chain\n",
|
||||
"\n",
|
||||
"We've shown how to run a _chain_ over a dataset, but you can also run an LLM or chat model over a dataset formed from runs.\n",
|
||||
"Manually comparing the results of chains in the UI is effective, but it can be time-consuming.\n",
|
||||
"It's easier to leverage AI-assisted feedback to evaluate your agent's performance.\n",
|
||||
"\n",
|
||||
"First, we'll show an example using a ChatModel. This is useful for things like:\n",
|
||||
"- Comparing results under different decoding parameters\n",
|
||||
"- Comparing model providers\n",
|
||||
"- Testing for regressions in model behavior\n",
|
||||
"- Running multiple times with a nonzero temperature to gauge stability\n",
|
||||
"A few ways of doing this include:\n",
|
||||
"- Adding ground-truth answers as outputs to the dataset and evaluating relative to those references.\n",
|
||||
"- Evaluating the overall agent trajectory based on the tool usage and intermediate steps.\n",
|
||||
"- Evaluating performance based on 'context' such as retrieved documents or tool results.\n",
|
||||
"- Evaluating 'aspects' of the agent's response in a reference-free manner using targeted agent prompts.\n",
|
||||
" \n",
|
||||
"Below, we show how to run an evaluation chain that compares the model output with the ground-truth answers.\n",
|
||||
"\n",
|
||||
"To speed things up, we'll upload a dataset we've previously captured directly to the tracing service."
|
||||
"**Note: the feedback API is currently experimental and subject to change.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "64490d7c-9a18-49ed-a3ac-36049c522cb4",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--two-player-dnd-cc62c3037e2d9250/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "44f3c72015944e2ea4c39516350ea15c",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/1 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>generations</th>\n",
|
||||
" <th>messages</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
|
||||
" <td>[{'data': {'content': 'Here is the topic for a...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
|
||||
" <td>[{'data': {'content': 'Here is the topic for a...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
|
||||
" <td>[{'data': {'content': 'Here is the topic for a...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
|
||||
" <td>[{'data': {'content': 'Here is the topic for a...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>[[{'generation_info': None, 'message': {'conte...</td>\n",
|
||||
" <td>[{'data': {'content': 'Here is the topic for a...</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" generations \\\n",
|
||||
"0 [[{'generation_info': None, 'message': {'conte... \n",
|
||||
"1 [[{'generation_info': None, 'message': {'conte... \n",
|
||||
"2 [[{'generation_info': None, 'message': {'conte... \n",
|
||||
"3 [[{'generation_info': None, 'message': {'conte... \n",
|
||||
"4 [[{'generation_info': None, 'message': {'conte... \n",
|
||||
"\n",
|
||||
" messages \n",
|
||||
"0 [{'data': {'content': 'Here is the topic for a... \n",
|
||||
"1 [{'data': {'content': 'Here is the topic for a... \n",
|
||||
"2 [{'data': {'content': 'Here is the topic for a... \n",
|
||||
"3 [{'data': {'content': 'Here is the topic for a... \n",
|
||||
"4 [{'data': {'content': 'Here is the topic for a... "
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import pandas as pd\n",
|
||||
"from langchain.evaluation.loading import load_dataset\n",
|
||||
"\n",
|
||||
"chat_dataset = load_dataset(\"two-player-dnd\")\n",
|
||||
"chat_df = pd.DataFrame(chat_dataset)\n",
|
||||
"chat_df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "348acd86-a927-4d60-8d52-02e64585e4fc",
|
||||
"execution_count": 14,
|
||||
"id": "35db4025-9183-4e5f-ba14-0b1b380f49c7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_dataset_name = \"two-player-dnd\"\n",
|
||||
"from langchain.evaluation.qa import QAEvalChain\n",
|
||||
"\n",
|
||||
"if chat_dataset_name not in set([dataset.name for dataset in client.list_datasets()]):\n",
|
||||
" client.upload_dataframe(chat_df, \n",
|
||||
" name=chat_dataset_name,\n",
|
||||
" description=\"An example dataset traced from chat models in a multiagent bidding dialogue\",\n",
|
||||
" input_keys=[\"messages\"],\n",
|
||||
" output_keys=[\"generations\"],\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "927a43b8-e4f9-4220-b75d-33e310bc318b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Reviewing behavior with temperature\n",
|
||||
"eval_llm = ChatOpenAI(model=\"gpt-4\")\n",
|
||||
"chain = QAEvalChain.from_llm(eval_llm)\n",
|
||||
"\n",
|
||||
"Here, we will set `num_repetitions > 1` and set the temperature to 0.3 to see the variety of response types for each example.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "a69dd183-ad5e-473d-b631-db90706e837f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatAnthropic\n",
|
||||
"\n",
|
||||
"chat_model = ChatAnthropic(temperature=.3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "063da2a9-3692-4b7b-8edb-e474824fe416",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Processed examples: 36\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_model_results = await client.arun_on_dataset(\n",
|
||||
" dataset_name=chat_dataset_name,\n",
|
||||
" llm_or_chain_factory=chat_model,\n",
|
||||
" concurrency_level=5, # Optional, sets the number of examples to run at a time\n",
|
||||
" num_repetitions=3,\n",
|
||||
" verbose=True\n",
|
||||
"examples = []\n",
|
||||
"predictions = []\n",
|
||||
"run_ids = []\n",
|
||||
"for run in client.list_runs(session_name=evaluation_session_name, execution_order=1, error=False):\n",
|
||||
" if run.reference_example_id is None or not run.outputs:\n",
|
||||
" continue\n",
|
||||
" run_ids.append(run.id)\n",
|
||||
" example = client.read_example(run.reference_example_id)\n",
|
||||
" examples.append({**run.inputs, **example.outputs})\n",
|
||||
" predictions.append(\n",
|
||||
" run.outputs\n",
|
||||
" )\n",
|
||||
" \n",
|
||||
"evaluation_results = chain.evaluate(\n",
|
||||
" examples,\n",
|
||||
" predictions,\n",
|
||||
" question_key=\"input\",\n",
|
||||
" answer_key=\"output\",\n",
|
||||
" prediction_key=\"output\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The 'experimental tracing v2' warning is expected, as we are still actively developing the v2 tracing API \n",
|
||||
"# Since we are running examples concurrently, you may run into some RateLimit warnings from your model\n",
|
||||
"# provider. In most cases, the tests will still run to completion (the wrappers have backoff)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "de7bfe08-215c-4328-b9b0-631d9a41f0e8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Reviewing the Chat Model Results\n",
|
||||
"\n",
|
||||
"You can review the latest runs by clicking on the link below and navigating to the \"two-player-dnd\" session."
|
||||
"for run_id, result in zip(run_ids, evaluation_results):\n",
|
||||
" score = {\"CORRECT\": 1, \"INCORRECT\": 0}.get(result[\"text\"].strip(), 0)\n",
|
||||
" client.create_feedback(run_id, \"Accuracy\", score=score)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "5b7a81f2-d19d-438b-a4bb-5678f746b965",
|
||||
"execution_count": 15,
|
||||
"id": "8696f167-dc75-4ef8-8bb3-ac1ce8324f30",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
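The evaluation code in this hunk pairs each top-level run with its reference example, passes the merged records to `QAEvalChain.evaluate`, and converts each grade into an "Accuracy" feedback score. A minimal sketch of the pairing and scoring logic follows, with plain dicts standing in for run and example objects. `build_eval_inputs` and `grade_to_score` are hypothetical helpers. Stripping whitespace and upper-casing the grade is an assumption that the grader may pad or vary the case of its `CORRECT`/`INCORRECT` verdict.

```python
from typing import Any, Dict, List, Tuple

GRADE_TO_SCORE = {"CORRECT": 1, "INCORRECT": 0}


def grade_to_score(grade_text: str) -> int:
    # Normalize whitespace and case so " CORRECT\n" still maps to 1; unknown grades score 0.
    return GRADE_TO_SCORE.get(grade_text.strip().upper(), 0)


def build_eval_inputs(
    runs: List[Dict[str, Any]],
    reference_outputs: Dict[str, Dict[str, Any]],
) -> Tuple[List[str], List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Pair each run with its reference example, skipping unusable runs."""
    run_ids: List[str] = []
    examples: List[Dict[str, Any]] = []
    predictions: List[Dict[str, Any]] = []
    for run in runs:
        ref_id = run.get("reference_example_id")
        if ref_id is None or not run.get("outputs"):
            continue  # no ground truth to compare against, or no prediction
        run_ids.append(run["id"])
        # Question from the run inputs, ground-truth answer from the reference example.
        examples.append({**run["inputs"], **reference_outputs[ref_id]})
        predictions.append(run["outputs"])
    return run_ids, examples, predictions
```

Keeping `run_ids` in the same order as `examples` and `predictions` is what lets the later `zip(run_ids, evaluation_results)` attach each score to the right run.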
@@ -776,243 +628,13 @@
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
|
||||
"<a href=\"https://dev.langchain.plus\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
|
||||
],
|
||||
"text/plain": [
|
||||
"LangChainPlusClient (API URL: http://localhost:8000)"
|
||||
"LangChainPlusClient (API URL: https://dev.api.langchain.plus)"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7896cbeb-345f-430b-ab5e-e108973174f8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Running an LLM over a Traced Dataset\n",
|
||||
"\n",
|
||||
"You can run an LLM over a dataset in much the same way as the chain and chat models, provided the dataset you've captured is in the appropriate format. We've cached one for you here, but using application-specific traces will be much more useful for your use cases."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "d6805d0b-4612-4671-bffb-e6978992bd40",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(model_name='text-curie-001', temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "5d7cb243-40c3-44dd-8158-a7b910441e9f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--state-of-the-union-completions-5347290a406c64c8/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "5ce2168f975241fbae82a76b4d70e4c4",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/1 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>generations</th>\n",
|
||||
" <th>ground_truth</th>\n",
|
||||
" <th>prompt</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>[[{'generation_info': {'finish_reason': 'stop'...</td>\n",
|
||||
" <td>The pandemic has been punishing. \\n\\nAnd so ma...</td>\n",
|
||||
" <td>Putin may circle Kyiv with tanks, but he will ...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>[[]]</td>\n",
|
||||
" <td>With a duty to one another to the American peo...</td>\n",
|
||||
" <td>Madam Speaker, Madam Vice President, our First...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>[[{'generation_info': {'finish_reason': 'stop'...</td>\n",
|
||||
" <td>He thought he could roll into Ukraine and the ...</td>\n",
|
||||
" <td>With a duty to one another to the American peo...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>[[]]</td>\n",
|
||||
" <td>And the costs and the threats to America and t...</td>\n",
|
||||
" <td>Please rise if you are able and show that, Yes...</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>[[{'generation_info': {'finish_reason': 'stop'...</td>\n",
|
||||
" <td>Please rise if you are able and show that, Yes...</td>\n",
|
||||
" <td>Groups of citizens blocking tanks with their b...</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" generations \\\n",
|
||||
"0 [[{'generation_info': {'finish_reason': 'stop'... \n",
|
||||
"1 [[]] \n",
|
||||
"2 [[{'generation_info': {'finish_reason': 'stop'... \n",
|
||||
"3 [[]] \n",
|
||||
"4 [[{'generation_info': {'finish_reason': 'stop'... \n",
|
||||
"\n",
|
||||
" ground_truth \\\n",
|
||||
"0 The pandemic has been punishing. \\n\\nAnd so ma... \n",
|
||||
"1 With a duty to one another to the American peo... \n",
|
||||
"2 He thought he could roll into Ukraine and the ... \n",
|
||||
"3 And the costs and the threats to America and t... \n",
|
||||
"4 Please rise if you are able and show that, Yes... \n",
|
||||
"\n",
|
||||
" prompt \n",
|
||||
"0 Putin may circle Kyiv with tanks, but he will ... \n",
|
||||
"1 Madam Speaker, Madam Vice President, our First... \n",
|
||||
"2 With a duty to one another to the American peo... \n",
|
||||
"3 Please rise if you are able and show that, Yes... \n",
|
||||
"4 Groups of citizens blocking tanks with their b... "
|
||||
]
|
||||
},
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"completions_dataset = load_dataset(\"state-of-the-union-completions\")\n",
|
||||
"completions_df = pd.DataFrame(completions_dataset)\n",
|
||||
"completions_df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "c7dcc1b2-7aef-44c0-ba0f-c812279099a5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"completions_dataset_name = \"state-of-the-union-completions\"\n",
|
||||
"\n",
|
||||
"if completions_dataset_name not in set([dataset.name for dataset in client.list_datasets()]):\n",
|
||||
" client.upload_dataframe(completions_df, \n",
|
||||
" name=completions_dataset_name,\n",
|
||||
" description=\"An example dataset traced from completion endpoints over the state of the union address\",\n",
|
||||
" input_keys=[\"prompt\"],\n",
|
||||
" output_keys=[\"generations\"],\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "e946138e-bf7c-43d7-861d-9c5740c933fa",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"50 processed\r"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# We also offer a synchronous method for running examples if a chain or llm's async methods aren't yet implemented\n",
|
||||
"completions_model_results = client.run_on_dataset(\n",
|
||||
" dataset_name=completions_dataset_name,\n",
|
||||
" llm_or_chain_factory=llm,\n",
|
||||
" num_repetitions=1,\n",
|
||||
" verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cc86e8e6-cee2-429e-942b-289284d14816",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Reviewing the LLM Results\n",
|
||||
"\n",
|
||||
"You can once again inspect the latest runs by clicking on the link below and navigating to the session for the \"state-of-the-union-completions\" dataset."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "2bf96f17-74c1-4f7d-8458-ae5ab5c6bd36",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<a href=\"http://localhost\", target=\"_blank\" rel=\"noopener\">LangChain+ Client</a>"
|
||||
],
|
||||
"text/plain": [
|
||||
"LangChainPlusClient (API URL: http://localhost:8000)"
|
||||
]
|
||||
},
|
||||
"execution_count": 25,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -1024,7 +646,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "df80cd88-cd6f-4fdc-965f-f74600e1f286",
|
||||
"id": "daf7dc7f-a5b0-49be-a695-2a87e283e588",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@@ -1046,7 +668,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
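The notebook output above includes "Retrying ... in 1.0 seconds as it raised RateLimitError", and a notebook comment notes that the model wrappers have backoff. The sketch below shows the general idea of retrying with exponentially growing delays. It is a generic illustration, not LangChain's retry wrapper. `RateLimitError` here is a locally defined hypothetical class, and `call_with_backoff` is a hypothetical helper. The injectable `sleep` parameter lets the usage example record delays instead of actually waiting.

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a model provider's rate-limit exception."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of base_delay * 2**attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the error
            delay = base_delay * (2 ** attempt)
            print(f"Retrying in {delay} seconds as it raised RateLimitError: {exc}")
            sleep(delay)
    raise AssertionError("unreachable")


# Usage: a flaky call that is rate limited twice, then succeeds.
attempts: Dict[str, int] = {"n": 0}


def flaky_completion() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("That model is currently overloaded")
    return "ok"


recorded_delays: List[float] = []
result = call_with_backoff(flaky_completion, sleep=recorded_delays.append)
```

Because most rate-limit errors clear within a few seconds, a short exponential schedule like this usually lets a concurrent evaluation run to completion, which is what the notebook comment relies on.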
@@ -107,6 +107,7 @@ class Anthropic(LLM, _AnthropicCommon):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
import anthropic
|
||||
from langchain.llms import Anthropic
|
||||
model = Anthropic(model="<model_name>", anthropic_api_key="my-api-key")
|
||||
|
||||
@@ -18,6 +18,7 @@ class Anyscale(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import Anyscale
|
||||
anyscale = Anyscale(anyscale_service_url="SERVICE_URL",
|
||||
anyscale_service_route="SERVICE_ROUTE",
|
||||
|
||||
@@ -23,6 +23,7 @@ class Banana(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import Banana
|
||||
banana = Banana(model_key="")
|
||||
"""
|
||||
|
||||
@@ -31,24 +31,28 @@ class Beam(LLM):
|
||||
The wrapper can then be called as follows, where the name, cpu, memory, gpu,
|
||||
python version, and python packages can be updated accordingly. Once deployed,
|
||||
the instance can be called.
|
||||
llm = Beam(model_name="gpt2",
|
||||
name="langchain-gpt2",
|
||||
cpu=8,
|
||||
memory="32Gi",
|
||||
gpu="A10G",
|
||||
python_version="python3.8",
|
||||
python_packages=[
|
||||
"diffusers[torch]>=0.10",
|
||||
"transformers",
|
||||
"torch",
|
||||
"pillow",
|
||||
"accelerate",
|
||||
"safetensors",
|
||||
"xformers",],
|
||||
max_length=50)
|
||||
|
||||
llm._deploy()
|
||||
call_result = llm._call(input)
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
llm = Beam(model_name="gpt2",
|
||||
name="langchain-gpt2",
|
||||
cpu=8,
|
||||
memory="32Gi",
|
||||
gpu="A10G",
|
||||
python_version="python3.8",
|
||||
python_packages=[
|
||||
"diffusers[torch]>=0.10",
|
||||
"transformers",
|
||||
"torch",
|
||||
"pillow",
|
||||
"accelerate",
|
||||
"safetensors",
|
||||
"xformers",],
|
||||
max_length=50)
|
||||
llm._deploy()
|
||||
call_result = llm._call(input)
|
||||
|
||||
"""
|
||||
|
||||
model_name: str = ""
|
||||
|
||||
@@ -99,6 +99,11 @@ class Bedrock(LLM):
|
||||
@root_validator()
|
||||
def validate_environment(cls, values: Dict) -> Dict:
|
||||
"""Validate that AWS credentials and the python package exist in the environment."""
|
||||
|
||||
# Skip creating new client if passed in constructor
|
||||
if values["client"] is not None:
|
||||
return values
|
||||
|
||||
try:
|
||||
import boto3
|
||||
|
||||
|
||||
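The Bedrock hunk adds an early return so that a `client` passed to the constructor is used as-is instead of being replaced by a freshly built boto3 client. The sketch below shows that pattern as a plain function rather than a pydantic `root_validator`. `make_client` and `make_stub_client` are hypothetical stand-ins for the boto3 session and client setup, which is not reproduced here.

```python
from typing import Any, Callable, Dict, List


def validate_environment(
    values: Dict[str, Any], make_client: Callable[[], Any]
) -> Dict[str, Any]:
    """Sketch of the 'skip creating a client if one was passed in' pattern."""
    # An injected client (preconfigured, or a stub in tests) is kept untouched.
    if values.get("client") is not None:
        return values
    # Otherwise build one; make_client stands in for the boto3 setup.
    values["client"] = make_client()
    return values


created: List[object] = []


def make_stub_client() -> object:
    # Hypothetical factory that records every client it builds.
    client = object()
    created.append(client)
    return client


injected = object()
kept = validate_environment({"client": injected}, make_stub_client)
built = validate_environment({"client": None}, make_stub_client)
```

Returning early means the factory is never called for an injected client, so code paths that need AWS credentials are skipped entirely when a caller supplies its own client.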
@@ -23,6 +23,7 @@ class CerebriumAI(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import CerebriumAI
|
||||
cerebrium = CerebriumAI(endpoint_url="")
|
||||
|
||||
|
||||
@@ -22,6 +22,7 @@ class GooseAI(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import GooseAI
|
||||
gooseai = GooseAI(model_name="gpt-neo-20b")
|
||||
|
||||
|
||||
@@ -92,6 +92,9 @@ class GPT4All(LLM):
|
||||
"""Leave (n_ctx * context_erase) tokens
|
||||
starting from beginning if the context has run out."""
|
||||
|
||||
allow_download: bool = False
|
||||
"""If model does not exist in ~/.cache/gpt4all/, download it."""
|
||||
|
||||
client: Any = None #: :meta private:
|
||||
|
||||
class Config:
|
||||
@@ -145,7 +148,7 @@ class GPT4All(LLM):
|
||||
model_name,
|
||||
model_path=model_path or None,
|
||||
model_type=values["backend"],
|
||||
allow_download=False,
|
||||
allow_download=values["allow_download"],
|
||||
)
|
||||
if values["n_threads"] is not None:
|
||||
# set n_threads
|
||||
|
||||
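In the GPT4All hunk, the hard-coded `allow_download=False` is replaced by `values["allow_download"]`, so the new field decides whether a missing model may be fetched into `~/.cache/gpt4all/`. The sketch below illustrates that gating logic only. `resolve_model` and `fake_download` are hypothetical helpers. The real download is performed by the gpt4all package and is not shown.

```python
import os
import tempfile
from typing import Callable, Optional


def resolve_model(
    model_name: str,
    model_path: Optional[str],
    allow_download: bool,
    download: Callable[[str, str], None],
) -> str:
    """Return a local path to the model, downloading only if explicitly allowed."""
    directory = model_path or os.path.expanduser("~/.cache/gpt4all/")
    full_path = os.path.join(directory, model_name)
    if os.path.exists(full_path):
        return full_path  # already cached: no download needed
    if not allow_download:
        raise FileNotFoundError(f"{full_path} not found and allow_download is False")
    download(model_name, full_path)
    return full_path


def fake_download(name: str, path: str) -> None:
    # Hypothetical downloader: just creates an empty file at the target path.
    with open(path, "w"):
        pass


# Usage: a temporary directory stands in for the model cache.
with tempfile.TemporaryDirectory() as cache_dir:
    try:
        resolve_model("model.bin", cache_dir, allow_download=False, download=fake_download)
        blocked = False
    except FileNotFoundError:
        blocked = True
    path = resolve_model("model.bin", cache_dir, allow_download=True, download=fake_download)
    cached_path = resolve_model("model.bin", cache_dir, allow_download=False, download=fake_download)
```

Defaulting the field to `False` keeps the previous behavior, so nothing is downloaded unless a user opts in.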
@@ -172,7 +172,7 @@ class LlamaCpp(LLM):
|
||||
|
||||
def _get_parameters(self, stop: Optional[List[str]] = None) -> Dict[str, Any]:
|
||||
"""
|
||||
Performs sanity check, preparing paramaters in format needed by llama_cpp.
|
||||
Performs sanity check, preparing parameters in format needed by llama_cpp.
|
||||
|
||||
Args:
|
||||
stop (Optional[List[str]]): List of stop sequences for llama_cpp.
|
||||
@@ -238,7 +238,7 @@ class LlamaCpp(LLM):
|
||||
) -> Generator[Dict, None, None]:
|
||||
"""Yields results objects as they are generated in real time.
|
||||
|
||||
BETA: this is a beta feature while we figure out the right abstraction:
|
||||
BETA: this is a beta feature while we figure out the right abstraction.
|
||||
Once that happens, this interface could change.
|
||||
|
||||
It also calls the callback manager's on_llm_new_token event with
|
||||
|
||||
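The LlamaCpp `stream` docstring says the method yields result objects as they are generated and also calls the callback manager's `on_llm_new_token` event. The sketch below shows that shape with a fixed token list in place of llama_cpp and a plain callable in place of the callback manager. `stream_tokens` and the `{"text": ...}` result shape are assumptions made for illustration.

```python
from typing import Callable, Dict, Generator, Iterable, List, Optional


def stream_tokens(
    tokens: Iterable[str],
    on_new_token: Optional[Callable[[str], None]] = None,
) -> Generator[Dict[str, str], None, None]:
    """Yield one result object per token, notifying a callback as each arrives."""
    for token in tokens:
        # Notify before yielding, so the callback sees the token as soon as it exists.
        if on_new_token is not None:
            on_new_token(token)
        yield {"text": token}


seen: List[str] = []
chunks = list(stream_tokens(["Hello", ",", " world"], on_new_token=seen.append))
```

Because this is a generator, nothing runs until the caller starts iterating, and the callback fires once per token as the consumer advances.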
@@ -22,6 +22,7 @@ class Modal(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import Modal
|
||||
modal = Modal(endpoint_url="")
|
||||
|
||||
|
||||
@@ -23,6 +23,7 @@ class Petals(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import Petals
|
||||
petals = Petals()
|
||||
|
||||
|
||||
@@ -23,6 +23,7 @@ class PipelineAI(LLM, BaseModel):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain import PipelineAI
|
||||
pipeline = PipelineAI(pipeline_key="")
|
||||
"""
|
||||
|
||||
@@ -19,8 +19,10 @@ class PredictionGuard(LLM):
|
||||
it as a named parameter to the constructor. To use Prediction Guard's API along
|
||||
with OpenAI models, set the environment variable ``OPENAI_API_KEY`` with your
|
||||
OpenAI API key as well.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
pgllm = PredictionGuard(model="MPT-7B-Instruct",
|
||||
token="my-access-token",
|
||||
output={
|
||||
|
||||
@@ -20,6 +20,7 @@ class PromptLayerOpenAI(OpenAI):
|
||||
|
||||
All parameters that can be passed to the OpenAI LLM can also
|
||||
be passed here. The PromptLayerOpenAI LLM adds two optional
|
||||
|
||||
parameters:
|
||||
``pl_tags``: List of strings to tag the request with.
|
||||
``return_pl_id``: If True, the PromptLayer request ID will be
|
||||
@@ -124,6 +125,7 @@ class PromptLayerOpenAIChat(OpenAIChat):
|
||||
|
||||
All parameters that can be passed to the OpenAIChat LLM can also
|
||||
be passed here. The PromptLayerOpenAIChat adds two optional
|
||||
|
||||
parameters:
|
||||
``pl_tags``: List of strings to tag the request with.
|
||||
``return_pl_id``: If True, the PromptLayer request ID will be
|
||||
|
||||
@@ -23,6 +23,7 @@ class Replicate(LLM):
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.llms import Replicate
|
||||
replicate = Replicate(model="stability-ai/stable-diffusion: \
|
||||
27b93a2413e7f36cd83da926f365628\
|
||||
|
||||
@@ -20,6 +20,7 @@ DEFAULT_PORT = 9042
|
||||
|
||||
class CassandraChatMessageHistory(BaseChatMessageHistory):
|
||||
"""Chat message history that stores history in Cassandra.
|
||||
|
||||
Args:
|
||||
contact_points: list of IPs to connect to Cassandra cluster
|
||||
session_id: arbitrary key that is used to store the messages
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.