mirror of
https://github.com/hwchase17/langchain.git
synced 2026-04-21 19:27:58 +00:00
Compare commits
59 Commits
vwp/evalua
...
vwp/simpli
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
a077fddda0 | ||
|
|
17df1cbf70 | ||
|
|
ece2a04ced | ||
|
|
a047400ca1 | ||
|
|
77de60366b | ||
|
|
36457c7d5b | ||
|
|
65619ce90c | ||
|
|
ec4c508d50 | ||
|
|
c9dc343062 | ||
|
|
153df9dc36 | ||
|
|
4b57854de2 | ||
|
|
a0f0602aec | ||
|
|
05e137e39e | ||
|
|
75d09059e5 | ||
|
|
354176e835 | ||
|
|
f1519460f6 | ||
|
|
3065fab122 | ||
|
|
5a179a6dda | ||
|
|
cc30791a2a | ||
|
|
62446aacb5 | ||
|
|
7054826246 | ||
|
|
cade6f0fde | ||
|
|
395e359ecf | ||
|
|
add07a3db5 | ||
|
|
6ba8fcc5d6 | ||
|
|
f0ac0b83b1 | ||
|
|
8805b090b0 | ||
|
|
de93fc91ab | ||
|
|
5cd2f833c3 | ||
|
|
fb379025d2 | ||
|
|
e84f9ea812 | ||
|
|
f8b6bcb4a3 | ||
|
|
7c4a0aa97a | ||
|
|
0cc2bd49d9 | ||
|
|
da04f08882 | ||
|
|
cb0a5cd98c | ||
|
|
368ffb3334 | ||
|
|
75efcc7e6d | ||
|
|
8ccce24bf4 | ||
|
|
118194b6f9 | ||
|
|
e80fb4085b | ||
|
|
ede5f1a78b | ||
|
|
e8f75af025 | ||
|
|
9fd0b5f699 | ||
|
|
d1fd6db81a | ||
|
|
9526699c62 | ||
|
|
31070269fd | ||
|
|
35f4bbf432 | ||
|
|
d0e4def7bf | ||
|
|
93df74f10b | ||
|
|
e3f2a4b88f | ||
|
|
ca4fc19a3c | ||
|
|
91c814df4a | ||
|
|
d26cea93b6 | ||
|
|
6bf778c2f1 | ||
|
|
8b1339293c | ||
|
|
e884da2851 | ||
|
|
b9a20ddc34 | ||
|
|
7bf07c1b69 |
8
.github/PULL_REQUEST_TEMPLATE.md
vendored
8
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -1,3 +1,5 @@
|
||||
# Your PR Title (What it does)
|
||||
|
||||
<!--
|
||||
Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution.
|
||||
|
||||
@@ -12,7 +14,7 @@ Finally, we'd love to show appreciation for your contribution - if you'd like us
|
||||
|
||||
Fixes # (issue)
|
||||
|
||||
#### Before submitting
|
||||
## Before submitting
|
||||
|
||||
<!-- If you're adding a new integration, please include:
|
||||
|
||||
@@ -26,9 +28,9 @@ etc:
|
||||
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
|
||||
-->
|
||||
|
||||
#### Who can review?
|
||||
## Who can review?
|
||||
|
||||
Tag maintainers/contributors who might be interested:
|
||||
Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:
|
||||
|
||||
<!-- For a quicker response, figure out the right person to tag with @
|
||||
|
||||
|
||||
5
.gitignore
vendored
5
.gitignore
vendored
@@ -149,7 +149,4 @@ wandb/
|
||||
|
||||
# integration test artifacts
|
||||
data_map*
|
||||
\[('_type', 'fake'), ('stop', None)]
|
||||
|
||||
# Replit files
|
||||
*replit*
|
||||
\[('_type', 'fake'), ('stop', None)]
|
||||
@@ -2,7 +2,6 @@
|
||||
|
||||
⚡ Building applications with LLMs through composability ⚡
|
||||
|
||||
[](https://github.com/hwchase17/langchain/releases)
|
||||
[](https://github.com/hwchase17/langchain/actions/workflows/lint.yml)
|
||||
[](https://github.com/hwchase17/langchain/actions/workflows/test.yml)
|
||||
[](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml)
|
||||
@@ -13,8 +12,6 @@
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
|
||||
[](https://codespaces.new/hwchase17/langchain)
|
||||
[](https://star-history.com/#hwchase17/langchain)
|
||||
[](https://libraries.io/github/hwchase17/langchain)
|
||||
[](https://github.com/hwchase17/langchain/issues)
|
||||
|
||||
|
||||
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwchase17/langchainjs).
|
||||
|
||||
@@ -1,14 +1,13 @@
|
||||
# Tutorials
|
||||
|
||||
⛓ icon marks a new addition [last update 2023-05-15]
|
||||
This is a collection of `LangChain` tutorials mostly on `YouTube`.
|
||||
|
||||
### DeepLearning.AI course
|
||||
⛓[LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain) by Harrison Chase presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)
|
||||
⛓ icon marks a new video [last update 2023-05-15]
|
||||
|
||||
### Handbook
|
||||
###
|
||||
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**
|
||||
|
||||
### Tutorials
|
||||
###
|
||||
[LangChain Tutorials](https://www.youtube.com/watch?v=FuqdVNB_8c0&list=PL9V0lbeJ69brU-ojMpU1Y7Ic58Tap0Cw6) by [Edrick](https://www.youtube.com/@edrickdch):
|
||||
- ⛓ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
|
||||
|
||||
@@ -109,4 +108,4 @@ LangChain by [Chat with data](https://www.youtube.com/@chatwithdata)
|
||||
- ⛓ [Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations](https://youtu.be/CyuUlf54wTs)
|
||||
|
||||
---------------------
|
||||
⛓ icon marks a new addition [last update 2023-05-15]
|
||||
⛓ icon marks a new video [last update 2023-05-15]
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -1,18 +0,0 @@
|
||||
# Annoy
|
||||
|
||||
> [Annoy](https://github.com/spotify/annoy) (`Approximate Nearest Neighbors Oh Yeah`) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install annoy
|
||||
```
|
||||
|
||||
|
||||
## Vectorstore
|
||||
|
||||
See a [usage example](../modules/indexes/vectorstores/examples/annoy.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Annoy
|
||||
```
|
||||
@@ -1,29 +0,0 @@
|
||||
# Argilla
|
||||
|
||||

|
||||
|
||||
>[Argilla](https://argilla.io/) is an open-source data curation platform for LLMs.
|
||||
> Using Argilla, everyone can build robust language models through faster data curation
|
||||
> using both human and machine feedback. We provide support for each step in the MLOps cycle,
|
||||
> from data labeling to model monitoring.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you'll need to install the `argilla` Python package as follows:
|
||||
|
||||
```bash
|
||||
pip install argilla --upgrade
|
||||
```
|
||||
|
||||
If you already have an Argilla Server running, then you're good to go; but if
|
||||
you don't, follow the next steps to install it.
|
||||
|
||||
If you don't you can refer to [Argilla - 🚀 Quickstart](https://docs.argilla.io/en/latest/getting_started/quickstart.html#Running-Argilla-Quickstart) to deploy Argilla either on HuggingFace Spaces, locally, or on a server.
|
||||
|
||||
## Tracking
|
||||
|
||||
See a [usage example of `ArgillaCallbackHandler`](../modules/callbacks/examples/examples/argilla.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.callbacks import ArgillaCallbackHandler
|
||||
```
|
||||
@@ -26,11 +26,3 @@ See a [usage example](../modules/indexes/document_loaders/examples/arxiv.ipynb).
|
||||
```python
|
||||
from langchain.document_loaders import ArxivLoader
|
||||
```
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/arxiv.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import ArxivRetriever
|
||||
```
|
||||
|
||||
@@ -1,24 +0,0 @@
|
||||
# Azure Cognitive Search
|
||||
|
||||
>[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
|
||||
|
||||
>Search is foundational to any app that surfaces text to users, where common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:
|
||||
>- A search engine for full text search over a search index containing user-owned content
|
||||
>- Rich indexing, with lexical analysis and optional AI enrichment for content extraction and transformation
|
||||
>- Rich query syntax for text search, fuzzy search, autocomplete, geo-search and more
|
||||
>- Programmability through REST APIs and client libraries in Azure SDKs
|
||||
>- Azure integration at the data layer, machine learning layer, and AI (Cognitive Services)
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
See [set up instructions](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal).
|
||||
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/azure_cognitive_search.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import AzureCognitiveSearchRetriever
|
||||
```
|
||||
@@ -1,4 +1,4 @@
|
||||
# Bedrock
|
||||
# Amazon Bedrock
|
||||
|
||||
>[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.
|
||||
|
||||
|
||||
@@ -1,23 +0,0 @@
|
||||
# Cassandra
|
||||
|
||||
>[Cassandra](https://en.wikipedia.org/wiki/Apache_Cassandra) is a free and open-source, distributed, wide-column
|
||||
> store, NoSQL database management system designed to handle large amounts of data across many commodity servers,
|
||||
> providing high availability with no single point of failure. `Cassandra` offers support for clusters spanning
|
||||
> multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
|
||||
> `Cassandra` was designed to implement a combination of `Amazon's Dynamo` distributed storage and replication
|
||||
> techniques combined with `Google's Bigtable` data and storage engine model.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install cassandra-drive
|
||||
```
|
||||
|
||||
|
||||
## Memory
|
||||
|
||||
See a [usage example](../modules/memory/examples/cassandra_chat_message_history.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.memory import CassandraChatMessageHistory
|
||||
```
|
||||
@@ -1,29 +1,20 @@
|
||||
# Chroma
|
||||
|
||||
>[Chroma](https://docs.trychroma.com/getting-started) is a database for building AI applications with embeddings.
|
||||
This page covers how to use the Chroma ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Chroma wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python package with `pip install chromadb`
|
||||
## Wrappers
|
||||
|
||||
```bash
|
||||
pip install chromadb
|
||||
```
|
||||
|
||||
|
||||
## VectorStore
|
||||
### VectorStore
|
||||
|
||||
There exists a wrapper around Chroma vector databases, allowing you to use it as a vectorstore,
|
||||
whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Chroma
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/indexes/vectorstores/getting_started.ipynb)
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/chroma_self_query.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import SelfQueryRetriever
|
||||
```
|
||||
|
||||
@@ -1,38 +1,25 @@
|
||||
# Cohere
|
||||
|
||||
>[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models
|
||||
> that help companies improve human-machine interactions.
|
||||
This page covers how to use the Cohere ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Cohere wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python SDK :
|
||||
```bash
|
||||
pip install cohere
|
||||
```
|
||||
- Install the Python SDK with `pip install cohere`
|
||||
- Get an Cohere api key and set it as an environment variable (`COHERE_API_KEY`)
|
||||
|
||||
Get a [Cohere api key](https://dashboard.cohere.ai/) and set it as an environment variable (`COHERE_API_KEY`)
|
||||
## Wrappers
|
||||
|
||||
|
||||
## LLM
|
||||
### LLM
|
||||
|
||||
There exists an Cohere LLM wrapper, which you can access with
|
||||
See a [usage example](../modules/models/llms/integrations/cohere.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.llms import Cohere
|
||||
```
|
||||
|
||||
## Text Embedding Model
|
||||
### Embeddings
|
||||
|
||||
There exists an Cohere Embedding model, which you can access with
|
||||
There exists an Cohere Embeddings wrapper, which you can access with
|
||||
```python
|
||||
from langchain.embeddings import CohereEmbeddings
|
||||
```
|
||||
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/cohere.ipynb)
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/cohere-reranker.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers.document_compressors import CohereRerank
|
||||
```
|
||||
|
||||
@@ -1,17 +1,25 @@
|
||||
# Databerry
|
||||
|
||||
>[Databerry](https://databerry.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
|
||||
This page covers how to use the [Databerry](https://databerry.ai) within LangChain.
|
||||
|
||||
## What is Databerry?
|
||||
|
||||
## Installation and Setup
|
||||
Databerry is an [open source](https://github.com/gmpetrov/databerry) document retrievial platform that helps to connect your personal data with Large Language Models.
|
||||
|
||||
We need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url.
|
||||
We need the [API Key](https://docs.databerry.ai/api-reference/authentication).
|
||||

|
||||
|
||||
## Retriever
|
||||
## Quick start
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/databerry.ipynb).
|
||||
Retrieving documents stored in Databerry from LangChain is very easy!
|
||||
|
||||
```python
|
||||
from langchain.retrievers import DataberryRetriever
|
||||
|
||||
retriever = DataberryRetriever(
|
||||
datastore_url="https://api.databerry.ai/query/clg1xg2h80000l708dymr0fxc",
|
||||
# api_key="DATABERRY_API_KEY", # optional if datastore is public
|
||||
# top_k=10 # optional
|
||||
)
|
||||
|
||||
docs = retriever.get_relevant_documents("What's Databerry?")
|
||||
```
|
||||
|
||||
@@ -1,30 +0,0 @@
|
||||
# Discord
|
||||
|
||||
>[Discord](https://discord.com/) is a VoIP and instant messaging social platform. Users have the ability to communicate
|
||||
> with voice calls, video calls, text messaging, media and files in private chats or as part of communities called
|
||||
> "servers". A server is a collection of persistent chat rooms and voice channels which can be accessed via invite links.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install pandas
|
||||
```
|
||||
|
||||
Follow these steps to download your `Discord` data:
|
||||
|
||||
1. Go to your **User Settings**
|
||||
2. Then go to **Privacy and Safety**
|
||||
3. Head over to the **Request all of my Data** and click on **Request Data** button
|
||||
|
||||
It might take 30 days for you to receive your data. You'll receive an email at the address which is registered
|
||||
with Discord. That email will have a download button using which you would be able to download your personal Discord data.
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/discord.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import DiscordChatLoader
|
||||
```
|
||||
@@ -1,20 +1,25 @@
|
||||
# Docugami
|
||||
|
||||
>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests
|
||||
> of XML semantic trees representing entire documents. This is a rich representation that includes the semantic and
|
||||
>[Docugami](https://docugami.com) converts business documents into a Document XML Knowledge Graph, generating forests of
|
||||
> XML semantic trees representing entire documents.
|
||||
> This is a rich representation that includes the semantic and
|
||||
> structural characteristics of various chunks in the document as an XML tree.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
## Quick start
|
||||
|
||||
```bash
|
||||
pip install lxml
|
||||
```
|
||||
1. Create a Docugami workspace: <a href="http://www.docugami.com">http://www.docugami.com</a> (free trials available)
|
||||
2. Add your documents (PDF, DOCX or DOC) and allow Docugami to ingest and cluster them into sets of similar documents, e.g. NDAs, Lease Agreements, and Service Agreements. There is no fixed set of document types supported by the system, the clusters created depend on your particular documents, and you can [change the docset assignments](https://help.docugami.com/home/working-with-the-doc-sets-view) later.
|
||||
3. Create an access token via the Developer Playground for your workspace. Detailed instructions: https://help.docugami.com/home/docugami-api
|
||||
4. Explore the Docugami API at <a href="https://api-docs.docugami.com">https://api-docs.docugami.com</a> to get a list of your processed docset IDs, or just the document IDs for a particular docset.
|
||||
6. Use the DocugamiLoader as detailed in [this notebook](../modules/indexes/document_loaders/examples/docugami.ipynb), to get rich semantic chunks for your documents.
|
||||
7. Optionally, build and publish one or more [reports or abstracts](https://help.docugami.com/home/reports). This helps Docugami improve the semantic XML with better tags based on your preferences, which are then added to the DocugamiLoader output as metadata. Use techniques like [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) to do high accuracy Document QA.
|
||||
|
||||
## Document Loader
|
||||
## Advantages vs Other Chunking Techniques
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/docugami.ipynb).
|
||||
Appropriate chunking of your documents is critical for retrieval from documents. Many chunking techniques exist, including simple ones that rely on whitespace and recursive chunk splitting based on character length. Docugami offers a different approach:
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import DocugamiLoader
|
||||
```
|
||||
1. **Intelligent Chunking:** Docugami breaks down every document into a hierarchical semantic XML tree of chunks of varying sizes, from single words or numerical values to entire sections. These chunks follow the semantic contours of the document, providing a more meaningful representation than arbitrary length or simple whitespace-based chunking.
|
||||
2. **Structured Representation:** In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.
|
||||
3. **Semantic Annotations:** Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.
|
||||
4. **Additional Metadata:** Chunks are also annotated with additional metadata, if a user has been using Docugami. This additional metadata can be used for high-accuracy Document QA without context window restrictions. See detailed code walk-through in [this notebook](../modules/indexes/document_loaders/examples/docugami.ipynb).
|
||||
|
||||
@@ -1,19 +0,0 @@
|
||||
# DuckDB
|
||||
|
||||
>[DuckDB](https://duckdb.org/) is an in-process SQL OLAP database management system.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install `duckdb` python package.
|
||||
|
||||
```bash
|
||||
pip install duckdb
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/duckdb.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import DuckDBLoader
|
||||
```
|
||||
@@ -1,24 +0,0 @@
|
||||
# Elasticsearch
|
||||
|
||||
>[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine.
|
||||
> It provides a distributed, multi-tenant-capable full-text search engine with an HTTP web interface and schema-free
|
||||
> JSON documents.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install elasticsearch
|
||||
```
|
||||
|
||||
## Retriever
|
||||
|
||||
>In information retrieval, [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) (BM is an abbreviation of best matching) is a ranking function used by search engines to estimate the relevance of documents to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.
|
||||
|
||||
>The name of the actual ranking function is BM25. The fuller name, Okapi BM25, includes the name of the first system to use it, which was the Okapi information retrieval system, implemented at London's City University in the 1980s and 1990s. BM25 and its newer variants, e.g. BM25F (a version of BM25 that can take document structure and anchor text into account), represent TF-IDF-like retrieval functions used in document retrieval.
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/elastic_search_bm25.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import ElasticSearchBM25Retriever
|
||||
```
|
||||
@@ -1,20 +0,0 @@
|
||||
# EverNote
|
||||
|
||||
>[EverNote](https://evernote.com/) is intended for archiving and creating notes in which photos, audio and saved web content can be embedded. Notes are stored in virtual "notebooks" and can be tagged, annotated, edited, searched, and exported.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install `lxml` and `html2text` python packages.
|
||||
|
||||
```bash
|
||||
pip install lxml
|
||||
pip install html2text
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/evernote.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import EverNoteLoader
|
||||
```
|
||||
@@ -1,21 +0,0 @@
|
||||
# Facebook Chat
|
||||
|
||||
>[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and
|
||||
> platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its
|
||||
> messaging service in 2010.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install `pandas` python package.
|
||||
|
||||
```bash
|
||||
pip install pandas
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/facebook_chat.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import FacebookChatLoader
|
||||
```
|
||||
@@ -1,21 +0,0 @@
|
||||
# Figma
|
||||
|
||||
>[Figma](https://www.figma.com/) is a collaborative web application for interface design.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
The Figma API requires an `access token`, `node_ids`, and a `file key`.
|
||||
|
||||
The `file key` can be pulled from the URL. https://www.figma.com/file/{filekey}/sampleFilename
|
||||
|
||||
`Node IDs` are also available in the URL. Click on anything and look for the '?node-id={node_id}' param.
|
||||
|
||||
`Access token` [instructions](https://help.figma.com/hc/en-us/articles/8085703771159-Manage-personal-access-tokens).
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/figma.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import FigmaFileLoader
|
||||
```
|
||||
@@ -1,19 +0,0 @@
|
||||
# Git
|
||||
|
||||
>[Git](https://en.wikipedia.org/wiki/Git) is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install `GitPython` python package.
|
||||
|
||||
```bash
|
||||
pip install GitPython
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/git.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GitLoader
|
||||
```
|
||||
@@ -1,15 +0,0 @@
|
||||
# GitBook
|
||||
|
||||
>[GitBook](https://docs.gitbook.com/) is a modern documentation platform where teams can document everything from products to internal knowledge bases and APIs.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/gitbook.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GitbookLoader
|
||||
```
|
||||
@@ -1,20 +0,0 @@
|
||||
# Google BigQuery
|
||||
|
||||
>[Google BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.
|
||||
`BigQuery` is a part of the `Google Cloud Platform`.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install `google-cloud-bigquery` python package.
|
||||
|
||||
```bash
|
||||
pip install google-cloud-bigquery
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/google_bigquery.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import BigQueryLoader
|
||||
```
|
||||
@@ -1,26 +0,0 @@
|
||||
# Google Cloud Storage
|
||||
|
||||
>[Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install `google-cloud-bigquery` python package.
|
||||
|
||||
```bash
|
||||
pip install google-cloud-storage
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
There are two loaders for the `Google Cloud Storage`: the `Directory` and the `File` loaders.
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/google_cloud_storage_directory.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GCSDirectoryLoader
|
||||
```
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/google_cloud_storage_file.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GCSFileLoader
|
||||
```
|
||||
@@ -1,22 +0,0 @@
|
||||
# Google Drive
|
||||
|
||||
>[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.
|
||||
|
||||
Currently, only `Google Docs` are supported.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install several python package.
|
||||
|
||||
```bash
|
||||
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example and authorizing instructions](../modules/indexes/document_loaders/examples/google_drive.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GoogleDriveLoader
|
||||
```
|
||||
@@ -1,15 +0,0 @@
|
||||
# Gutenberg
|
||||
|
||||
>[Project Gutenberg](https://www.gutenberg.org/about/) is an online library of free eBooks.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/gutenberg.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import GutenbergLoader
|
||||
```
|
||||
@@ -1,18 +0,0 @@
|
||||
# Hacker News
|
||||
|
||||
>[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as `HN`) is a social news
|
||||
> website focusing on computer science and entrepreneurship. It is run by the investment fund and startup
|
||||
> incubator `Y Combinator`. In general, content that can be submitted is defined as "anything that gratifies
|
||||
> one's intellectual curiosity."
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/hacker_news.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import HNLoader
|
||||
```
|
||||
@@ -1,16 +0,0 @@
|
||||
# iFixit
|
||||
|
||||
>[iFixit](https://www.ifixit.com) is the largest, open repair community on the web. The site contains nearly 100k
|
||||
> repair manuals, 200k Questions & Answers on 42k devices, and all the data is licensed under `CC-BY-NC-SA 3.0`.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/ifixit.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import IFixitLoader
|
||||
```
|
||||
@@ -1,16 +0,0 @@
|
||||
# IMSDb
|
||||
|
||||
>[IMSDb](https://imsdb.com/) is the `Internet Movie Script Database`.
|
||||
>
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/imsdb.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import IMSDbLoader
|
||||
```
|
||||
@@ -1,31 +0,0 @@
|
||||
# MediaWikiDump
|
||||
|
||||
>[MediaWiki XML Dumps](https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps) contain the content of a wiki
|
||||
> (wiki pages with all their revisions), without the site-related data. A XML dump does not create a full backup
|
||||
> of the wiki database, the dump does not contain user accounts, images, edit logs, etc.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
We need to install several python packages.
|
||||
|
||||
The `mediawiki-utilities` supports XML schema 0.11 in unmerged branches.
|
||||
```bash
|
||||
pip install -qU git+https://github.com/mediawiki-utilities/python-mwtypes@updates_schema_0.11
|
||||
```
|
||||
|
||||
The `mediawiki-utilities mwxml` has a bug, fix PR pending.
|
||||
|
||||
```bash
|
||||
pip install -qU git+https://github.com/gdedrouas/python-mwxml@xml_format_0.11
|
||||
pip install -qU mwparserfromhell
|
||||
```
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/mediawikidump.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import MWDumpLoader
|
||||
```
|
||||
@@ -1,22 +0,0 @@
|
||||
# Microsoft OneDrive
|
||||
|
||||
>[Microsoft OneDrive](https://en.wikipedia.org/wiki/OneDrive) (formerly `SkyDrive`) is a file-hosting service operated by Microsoft.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install a python package.
|
||||
|
||||
```bash
|
||||
pip install o365
|
||||
```
|
||||
|
||||
Then follow instructions [here](../modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb).
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/microsoft_onedrive.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import OneDriveLoader
|
||||
```
|
||||
@@ -1,16 +0,0 @@
|
||||
# Microsoft PowerPoint
|
||||
|
||||
>[Microsoft PowerPoint](https://en.wikipedia.org/wiki/Microsoft_PowerPoint) is a presentation program by Microsoft.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/microsoft_powerpoint.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import UnstructuredPowerPointLoader
|
||||
```
|
||||
@@ -1,16 +0,0 @@
|
||||
# Microsoft Word
|
||||
|
||||
>[Microsoft Word](https://www.microsoft.com/en-us/microsoft-365/word) is a word processor developed by Microsoft.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/microsoft_word.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import UnstructuredWordDocumentLoader
|
||||
```
|
||||
@@ -1,19 +0,0 @@
|
||||
# Modern Treasury
|
||||
|
||||
>[Modern Treasury](https://www.moderntreasury.com/) simplifies complex payment operations. It is a unified platform to power products and processes that move money.
|
||||
>- Connect to banks and payment systems
|
||||
>- Track transactions and balances in real-time
|
||||
>- Automate payment operations for scale
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/modern_treasury.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import ModernTreasuryLoader
|
||||
```
|
||||
@@ -1,21 +1,20 @@
|
||||
# Momento
|
||||
|
||||
>[Momento Cache](https://docs.momentohq.com/) is the world's first truly serverless caching service. It provides instant elasticity, scale-to-zero
|
||||
> capability, and blazing-fast performance.
|
||||
> With Momento Cache, you grab the SDK, you get an end point, input a few lines into your code, and you're off and running.
|
||||
|
||||
This page covers how to use the [Momento](https://gomomento.com) ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Momento wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Sign up for a free account [here](https://docs.momentohq.com/getting-started) and get an auth token
|
||||
- Install the Momento Python SDK with `pip install momento`
|
||||
|
||||
## Wrappers
|
||||
|
||||
## Cache
|
||||
### Cache
|
||||
|
||||
The Cache wrapper allows for [Momento](https://gomomento.com) to be used as a serverless, distributed, low-latency cache for LLM prompts and responses.
|
||||
|
||||
#### Standard Cache
|
||||
|
||||
The standard cache is the go-to use case for [Momento](https://gomomento.com) users in any environment.
|
||||
|
||||
@@ -45,10 +44,10 @@ cache_name = "langchain"
|
||||
langchain.llm_cache = MomentoCache(cache_client, cache_name)
|
||||
```
|
||||
|
||||
## Memory
|
||||
### Memory
|
||||
|
||||
Momento can be used as a distributed memory store for LLMs.
|
||||
|
||||
### Chat Message History Memory
|
||||
#### Chat Message History Memory
|
||||
|
||||
See [this notebook](../modules/memory/examples/momento_chat_message_history.ipynb) for a walkthrough of how to use Momento as a memory store for chat message history.
|
||||
|
||||
@@ -1,27 +0,0 @@
|
||||
# Notion DB
|
||||
|
||||
>[Notion](https://www.notion.so/) is a collaboration platform with modified Markdown support that integrates kanban
|
||||
> boards, tasks, wikis and databases. It is an all-in-one workspace for notetaking, knowledge and data management,
|
||||
> and project and task management.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
All instructions are in examples below.
|
||||
|
||||
## Document Loader
|
||||
|
||||
We have two different loaders: `NotionDirectoryLoader` and `NotionDBLoader`.
|
||||
|
||||
See a [usage example for the NotionDirectoryLoader](../modules/indexes/document_loaders/examples/notion.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import NotionDirectoryLoader
|
||||
```
|
||||
|
||||
See a [usage example for the NotionDBLoader](../modules/indexes/document_loaders/examples/notiondb.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import NotionDBLoader
|
||||
```
|
||||
@@ -1,19 +0,0 @@
|
||||
# Obsidian
|
||||
|
||||
>[Obsidian](https://obsidian.md/) is a powerful and extensible knowledge base
|
||||
that works on top of your local folder of plain text files.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
All instructions are in examples below.
|
||||
|
||||
## Document Loader
|
||||
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/obsidian.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import ObsidianLoader
|
||||
```
|
||||
|
||||
@@ -71,11 +71,3 @@ See a [usage example](../modules/indexes/document_loaders/examples/chatgpt_loade
|
||||
```python
|
||||
from langchain.document_loaders.chatgpt import ChatGPTLoader
|
||||
```
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/chatgpt-plugin.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import ChatGPTPluginRetriever
|
||||
```
|
||||
|
||||
@@ -4,19 +4,17 @@ This page covers how to use the Pinecone ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Pinecone wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
Install the Python SDK:
|
||||
```bash
|
||||
pip install pinecone-client
|
||||
```
|
||||
- Install the Python SDK with `pip install pinecone-client`
|
||||
## Wrappers
|
||||
|
||||
|
||||
## Vectorstore
|
||||
### VectorStore
|
||||
|
||||
There exists a wrapper around Pinecone indexes, allowing you to use it as a vectorstore,
|
||||
whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Pinecone
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of the Pinecone vectorstore, see [this notebook](../modules/indexes/vectorstores/examples/pinecone.ipynb)
|
||||
For a more detailed walkthrough of the Pinecone wrapper, see [this notebook](../modules/indexes/vectorstores/examples/pinecone.ipynb)
|
||||
|
||||
@@ -1,25 +1,19 @@
|
||||
# Psychic
|
||||
|
||||
>[Psychic](https://www.psychic.dev/) is a platform for integrating with SaaS tools like `Notion`, `Zendesk`,
|
||||
> `Confluence`, and `Google Drive` via OAuth and syncing documents from these applications to your SQL or vector
|
||||
> database. You can think of it like Plaid for unstructured data.
|
||||
This page covers how to use [Psychic](https://www.psychic.dev/) within LangChain.
|
||||
|
||||
## Installation and Setup
|
||||
## What is Psychic?
|
||||
|
||||
```bash
|
||||
pip install psychicapi
|
||||
```
|
||||
Psychic is a platform for integrating with your customer’s SaaS tools like Notion, Zendesk, Confluence, and Google Drive via OAuth and syncing documents from these applications to your SQL or vector database. You can think of it like Plaid for unstructured data. Psychic is easy to set up - you use it by importing the react library and configuring it with your Sidekick API key, which you can get from the [Psychic dashboard](https://dashboard.psychic.dev/). When your users connect their applications, you can view these connections from the dashboard and retrieve data using the server-side libraries.
|
||||
|
||||
## Quick start
|
||||
|
||||
Psychic is easy to set up - you import the `react` library and configure it with your `Sidekick API` key, which you get
|
||||
from the [Psychic dashboard](https://dashboard.psychic.dev/). When you connect the applications, you
|
||||
view these connections from the dashboard and retrieve data using the server-side libraries.
|
||||
|
||||
1. Create an account in the [dashboard](https://dashboard.psychic.dev/).
|
||||
2. Use the [react library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend react app. You will use this to connect the SaaS apps.
|
||||
3. Once you have created a connection, you can use the `PsychicLoader` by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb)
|
||||
2. Use the [react library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend react app. Users will use this to connect their SaaS apps.
|
||||
3. Once your user has created a connection, you can use the langchain PsychicLoader by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb)
|
||||
|
||||
|
||||
## Advantages vs Other Document Loaders
|
||||
# Advantages vs Other Document Loaders
|
||||
|
||||
1. **Universal API:** Instead of building OAuth flows and learning the APIs for every SaaS app, you integrate Psychic once and leverage our universal API to retrieve data.
|
||||
2. **Data Syncs:** Data in your customers' SaaS apps can get stale fast. With Psychic you can configure webhooks to keep your documents up to date on a daily or realtime basis.
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
# Reddit
|
||||
|
||||
>[Reddit](www.reddit.com) is an American social news aggregation, content rating, and discussion website.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
First, you need to install a python package.
|
||||
|
||||
```bash
|
||||
pip install praw
|
||||
```
|
||||
|
||||
Make a [Reddit Application](https://www.reddit.com/prefs/apps/) and initialize the loader with with your Reddit API credentials.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/reddit.ipynb).
|
||||
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import RedditPostsLoader
|
||||
```
|
||||
@@ -1,17 +0,0 @@
|
||||
# Roam
|
||||
|
||||
>[ROAM](https://roamresearch.com/) is a note-taking tool for networked thought, designed to create a personal knowledge base.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/roam.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import RoamLoader
|
||||
```
|
||||
@@ -1,17 +0,0 @@
|
||||
# Slack
|
||||
|
||||
>[Slack](https://slack.com/) is an instant messaging program.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/slack.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import SlackDirectoryLoader
|
||||
```
|
||||
@@ -1,20 +0,0 @@
|
||||
# spaCy
|
||||
|
||||
>[spaCy](https://spacy.io/) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install spacy
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Text Splitter
|
||||
|
||||
See a [usage example](../modules/indexes/text_splitters/examples/spacy.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.llms import SpacyTextSplitter
|
||||
```
|
||||
@@ -1,15 +0,0 @@
|
||||
# Spreedly
|
||||
|
||||
>[Spreedly](https://docs.spreedly.com/) is a service that allows you to securely store credit cards and use them to transact against any number of payment gateways and third party APIs. It does this by simultaneously providing a card tokenization/vault service as well as a gateway and receiver integration service. Payment methods tokenized by Spreedly are stored at `Spreedly`, allowing you to independently store a card and then pass that card to different end points based on your business requirements.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
See [setup instructions](../modules/indexes/document_loaders/examples/spreedly.ipynb).
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/spreedly.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import SpreedlyLoader
|
||||
```
|
||||
@@ -1,16 +0,0 @@
|
||||
# Stripe
|
||||
|
||||
>[Stripe](https://stripe.com/en-ca) is an Irish-American financial services and software as a service (SaaS) company. It offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
See [setup instructions](../modules/indexes/document_loaders/examples/stripe.ipynb).
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/stripe.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import StripeLoader
|
||||
```
|
||||
@@ -1,17 +0,0 @@
|
||||
# Telegram
|
||||
|
||||
>[Telegram Messenger](https://web.telegram.org/a/) is a globally accessible freemium, cross-platform, encrypted, cloud-based and centralized instant messaging service. The application also provides optional end-to-end encrypted chats and video calling, VoIP, file sharing and several other features.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
See [setup instructions](../modules/indexes/document_loaders/examples/telegram.ipynb).
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/telegram.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import TelegramChatFileLoader
|
||||
from langchain.document_loaders import TelegramChatApiLoader
|
||||
```
|
||||
@@ -1,16 +0,0 @@
|
||||
# 2Markdown
|
||||
|
||||
>[2markdown](https://2markdown.com/) service transforms website content into structured markdown files.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
We need the `API key`. See [instructions how to get it](https://2markdown.com/login).
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/tomarkdown.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import ToMarkdownLoader
|
||||
```
|
||||
@@ -1,22 +0,0 @@
|
||||
# Trello
|
||||
|
||||
>[Trello](https://www.atlassian.com/software/trello) is a web-based project management and collaboration tool that allows individuals and teams to organize and track their tasks and projects. It provides a visual interface known as a "board" where users can create lists and cards to represent their tasks and activities.
|
||||
>The TrelloLoader allows us to load cards from a `Trello` board.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install py-trello beautifulsoup4
|
||||
```
|
||||
|
||||
See [setup instructions](../modules/indexes/document_loaders/examples/trello.ipynb).
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/trello.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import TrelloLoader
|
||||
```
|
||||
@@ -1,21 +0,0 @@
|
||||
# Twitter
|
||||
|
||||
>[Twitter](https://twitter.com/) is an online social media and social networking service.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install tweepy
|
||||
```
|
||||
|
||||
We must initialize the loader with the `Twitter API` token, and we need to set up the Twitter `username`.
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/twitter.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import TwitterTweetLoader
|
||||
```
|
||||
@@ -4,7 +4,8 @@
|
||||
[Unstructured.IO](https://www.unstructured.io/) extracts clean text from raw source documents like
|
||||
PDFs and Word documents.
|
||||
This page covers how to use the [`unstructured`](https://github.com/Unstructured-IO/unstructured)
|
||||
ecosystem within LangChain.
|
||||
ecosystem within LangChain.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
@@ -19,6 +20,12 @@ its dependencies running locally.
|
||||
- `tesseract-ocr`(images and PDFs)
|
||||
- `libreoffice` (MS Office docs)
|
||||
- `pandoc` (EPUBs)
|
||||
- If you are parsing PDFs using the `"hi_res"` strategy, run the following to install the `detectron2` model, which
|
||||
`unstructured` uses for layout detection:
|
||||
- `pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@e2ce8dc#egg=detectron2"`
|
||||
- If `detectron2` is not installed, `unstructured` will fallback to processing PDFs
|
||||
using the `"fast"` strategy, which uses `pdfminer` directly and doesn't require
|
||||
`detectron2`.
|
||||
|
||||
If you want to get up and running with less set up, you can
|
||||
simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or
|
||||
|
||||
@@ -1,21 +0,0 @@
|
||||
# Vespa
|
||||
|
||||
>[Vespa](https://vespa.ai/) is a fully featured search engine and vector database.
|
||||
> It supports vector search (ANN), lexical search, and search in structured data, all in the same query.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install pyvespa
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/vespa.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import VespaRetriever
|
||||
```
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -9,15 +8,9 @@
|
||||
"\n",
|
||||
"This notebook goes over how to track your LangChain experiments into one centralized Weights and Biases dashboard. To learn more about prompt engineering and the callback please refer to this Report which explains both alongside the resultant dashboards you can expect to see.\n",
|
||||
"\n",
|
||||
"Run in Colab: https://colab.research.google.com/drive/1DXH4beT4HFaRKy_Vm4PoxhXVDRf7Ym8L?usp=sharing\n",
|
||||
"\n",
|
||||
"<a href=\"https://colab.research.google.com/drive/1DXH4beT4HFaRKy_Vm4PoxhXVDRf7Ym8L?usp=sharing\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"[View Report](https://wandb.ai/a-sh0ts/langchain_callback_demo/reports/Prompt-Engineering-LLMs-with-LangChain-and-W-B--VmlldzozNjk1NTUw#👋-how-to-build-a-callback-in-langchain-for-better-prompt-engineering\n",
|
||||
") \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"**Note**: _the `WandbCallbackHandler` is being deprecated in favour of the `WandbTracer`_ . In future please use the `WandbTracer` as it is more flexible and allows for more granular logging. To know more about the `WandbTracer` refer to the [agent_with_wandb_tracing.ipynb](https://python.langchain.com/en/latest/integrations/agent_with_wandb_tracing.html) notebook or use the following [colab notebook](http://wandb.me/prompts-quickstart). To know more about Weights & Biases Prompts refer to the following [prompts documentation](https://docs.wandb.ai/guides/prompts)."
|
||||
"View Report: https://wandb.ai/a-sh0ts/langchain_callback_demo/reports/Prompt-Engineering-LLMs-with-LangChain-and-W-B--VmlldzozNjk1NTUw#👋-how-to-build-a-callback-in-langchain-for-better-prompt-engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -61,7 +54,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -83,7 +75,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "cxBFfZR8d9FC"
|
||||
@@ -99,7 +90,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -210,7 +200,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Q-65jwrDeK6w"
|
||||
@@ -228,7 +217,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
|
||||
@@ -1,21 +0,0 @@
|
||||
# Weather
|
||||
|
||||
>[OpenWeatherMap](https://openweathermap.org/) is an open source weather service provider.
|
||||
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install pyowm
|
||||
```
|
||||
|
||||
We must set up the `OpenWeatherMap API token`.
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/weather.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import WeatherDataLoader
|
||||
```
|
||||
@@ -1,18 +0,0 @@
|
||||
# WhatsApp
|
||||
|
||||
>[WhatsApp](https://www.whatsapp.com/) (also called `WhatsApp Messenger`) is a freeware, cross-platform, centralized instant messaging (IM) and voice-over-IP (VoIP) service. It allows users to send text and voice messages, make voice and video calls, and share images, documents, user locations, and other content.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There isn't any special setup for it.
|
||||
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/whatsapp_chat.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import WhatsAppChatLoader
|
||||
```
|
||||
@@ -1,28 +0,0 @@
|
||||
# Wikipedia
|
||||
|
||||
>[Wikipedia](https://wikipedia.org/) is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. `Wikipedia` is the largest and most-read reference work in history.
|
||||
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install wikipedia
|
||||
```
|
||||
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/wikipedia.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import WikipediaLoader
|
||||
```
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/wikipedia.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import WikipediaRetriever
|
||||
```
|
||||
@@ -1,22 +0,0 @@
|
||||
# YouTube
|
||||
|
||||
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.
|
||||
> We download the `YouTube` transcripts and video information.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install youtube-transcript-api
|
||||
pip install pytube
|
||||
```
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/youtube_transcript.ipynb).
|
||||
|
||||
|
||||
## Document Loader
|
||||
|
||||
See a [usage example](../modules/indexes/document_loaders/examples/youtube_transcript.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.document_loaders import YoutubeLoader
|
||||
from langchain.document_loaders import GoogleApiYoutubeLoader
|
||||
```
|
||||
@@ -1,28 +0,0 @@
|
||||
# Zep
|
||||
|
||||
>[Zep](https://docs.getzep.com/) - A long-term memory store for LLM applications.
|
||||
|
||||
>`Zep` stores, summarizes, embeds, indexes, and enriches conversational AI chat histories, and exposes them via simple, low-latency APIs.
|
||||
>- Long-term memory persistence, with access to historical messages irrespective of your summarization strategy.
|
||||
>- Auto-summarization of memory messages based on a configurable message window. A series of summaries are stored, providing flexibility for future summarization strategies.
|
||||
>- Vector search over memories, with messages automatically embedded on creation.
|
||||
>- Auto-token counting of memories and summaries, allowing finer-grained control over prompt assembly.
|
||||
>- Python and JavaScript SDKs.
|
||||
|
||||
|
||||
`Zep` [project](https://github.com/getzep/zep)
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
```bash
|
||||
pip install zep_python
|
||||
```
|
||||
|
||||
|
||||
## Retriever
|
||||
|
||||
See a [usage example](../modules/indexes/retrievers/examples/zep_memorystore.ipynb).
|
||||
|
||||
```python
|
||||
from langchain.retrievers import ZepRetriever
|
||||
```
|
||||
@@ -1,20 +1,19 @@
|
||||
# Zilliz
|
||||
|
||||
>[Zilliz Cloud](https://zilliz.com/doc/quick_start) is a fully managed service on cloud for `LF AI Milvus®`,
|
||||
|
||||
This page covers how to use the Zilliz Cloud ecosystem within LangChain.
|
||||
Zilliz uses the Milvus integration.
|
||||
It is broken into two parts: installation and setup, and then references to specific Milvus wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
- Install the Python SDK with `pip install pymilvus`
|
||||
## Wrappers
|
||||
|
||||
Install the Python SDK:
|
||||
```bash
|
||||
pip install pymilvus
|
||||
```
|
||||
### VectorStore
|
||||
|
||||
## Vectorstore
|
||||
|
||||
A wrapper around Zilliz indexes allows you to use it as a vectorstore,
|
||||
There exists a wrapper around Zilliz indexes, allowing you to use it as a vectorstore,
|
||||
whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Milvus
|
||||
```
|
||||
|
||||
@@ -5,101 +5,108 @@ Agents
|
||||
`Conceptual Guide <https://docs.langchain.com/docs/components/agents>`_
|
||||
|
||||
|
||||
Some applications require not just a predetermined chain of calls to LLMs/other tools,
|
||||
Some applications will require not just a predetermined chain of calls to LLMs/other tools,
|
||||
but potentially an unknown chain that depends on the user's input.
|
||||
In these types of chains, there is an **agent** which has access to a suite of **tools**.
|
||||
In these types of chains, there is a “agent” which has access to a suite of tools.
|
||||
Depending on the user input, the agent can then decide which, if any, of these tools to call.
|
||||
|
||||
At the moment, there are two main types of agents:
|
||||
|
||||
1. **Action Agents**: these agents decide the actions to take and execute that actions one action at a time.
|
||||
2. **Plan-and-Execute Agents**: these agents first decide a plan of actions to take, and then execute those actions one at a time.
|
||||
1. "Action Agents": these agents decide an action to take and take that action one step at a time
|
||||
2. "Plan-and-Execute Agents": these agents first decide a plan of actions to take, and then execute those actions one at a time.
|
||||
|
||||
When should you use each one? Action Agents are more conventional, and good for small tasks.
|
||||
For more complex or long running tasks, the initial planning step helps to maintain long term objectives and focus.
|
||||
However, that comes at the expense of generally more calls and higher latency.
|
||||
These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in charge
|
||||
of the execution for the Plan and Execute agent.
|
||||
For more complex or long running tasks, the initial planning step helps to maintain long term objectives and focus. However, that comes at the expense of generally more calls and higher latency.
|
||||
These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in charge of the execution for the Plan and Execute agent.
|
||||
|
||||
Action Agents
|
||||
-------------
|
||||
|
||||
High level pseudocode of the Action Agents:
|
||||
High level pseudocode of agents looks something like:
|
||||
|
||||
- The **user input** is received
|
||||
- The **agent** decides which **tool** - if any - to use, and what the **tool input** should be
|
||||
- That **tool** is then called with the **tool input**, and an **observation** is recorded (the output of this calling)
|
||||
- That history of **tool**, **tool input**, and **observation** is passed back into the **agent**, and it decides the next step
|
||||
- This is repeated until the **agent** decides it no longer needs to use a **tool**, and then it responds directly to the user.
|
||||
- Some user input is received
|
||||
- The `agent` decides which `tool` - if any - to use, and what the input to that tool should be
|
||||
- That `tool` is then called with that `tool input`, and an `observation` is recorded (this is just the output of calling that tool with that tool input)
|
||||
- That history of `tool`, `tool input`, and `observation` is passed back into the `agent`, and it decides what step to take next
|
||||
- This is repeated until the `agent` decides it no longer needs to use a `tool`, and then it responds directly to the user.
|
||||
|
||||
The different abstractions involved in agents are as follows:
|
||||
|
||||
The different abstractions involved in agents are:
|
||||
- Agent: this is where the logic of the application lives. Agents expose an interface that takes in user input along with a list of previous steps the agent has taken, and returns either an `AgentAction` or `AgentFinish`
|
||||
- `AgentAction` corresponds to the tool to use and the input to that tool
|
||||
- `AgentFinish` means the agent is done, and has information around what to return to the user
|
||||
- Tools: these are the actions an agent can take. What tools you give an agent highly depend on what you want the agent to do
|
||||
- Toolkits: these are groups of tools designed for a specific use case. For example, in order for an agent to interact with a SQL database in the best way it may need access to one tool to execute queries and another tool to inspect tables.
|
||||
- Agent Executor: this wraps an agent and a list of tools. This is responsible for the loop of running the agent iteratively until the stopping criteria is met.
|
||||
|
||||
- **Agent**: this is where the logic of the application lives. Agents expose an interface that takes in user input
|
||||
along with a list of previous steps the agent has taken, and returns either an **AgentAction** or **AgentFinish**
|
||||
The most important abstraction of the four above to understand is that of the agent.
|
||||
Although an agent can be defined in whatever way one chooses, the typical way to construct an agent is with:
|
||||
|
||||
- **AgentAction** corresponds to the tool to use and the input to that tool
|
||||
- **AgentFinish** means the agent is done, and has information around what to return to the user
|
||||
- **Tools**: these are the actions an agent can take. What tools you give an agent highly depend on what you want the agent to do
|
||||
- **Toolkits**: these are groups of tools designed for a specific use case. For example, in order for an agent to
|
||||
interact with a SQL database in the best way it may need access to one tool to execute queries and another tool to inspect tables.
|
||||
- **Agent Executor**: this wraps an agent and a list of tools. This is responsible for the loop of running the agent
|
||||
iteratively until the stopping criteria is met.
|
||||
|
||||
|
||||
|
|
||||
- `Getting Started <./agents/getting_started.html>`_: An overview of agents. It covers how to use all things related to agents in an end-to-end manner.
|
||||
|
||||
|
||||
|
|
||||
**Agent Construction:**
|
||||
|
||||
Although an agent can be constructed in many way, the typical way to construct an agent is with:
|
||||
|
||||
- **PromptTemplate**: this is responsible for taking the user input and previous steps and constructing a prompt
|
||||
to send to the language model
|
||||
- **Language Model**: this takes the prompt constructed by the PromptTemplate and returns some output
|
||||
- **Output Parser**: this takes the output of the Language Model and parses it into an **AgentAction** or **AgentFinish** object.
|
||||
|
||||
|
||||
|
|
||||
**Additional Documentation:**
|
||||
|
||||
|
||||
- `Tools <./agents/tools.html>`_: Different types of **tools** LangChain supports natively. We also cover how to add your own tools.
|
||||
|
||||
- `Agents <./agents/agents.html>`_: Different types of **agents** LangChain supports natively. We also cover how to
|
||||
modify and create your own agents.
|
||||
|
||||
- `Toolkits <./agents/toolkits.html>`_: Various **toolkits** that LangChain supports out of the box, and how to
|
||||
create an agent from them.
|
||||
|
||||
- `Agent Executor <./agents/agent_executors.html>`_: The **Agent Executor** class, which is responsible for calling
|
||||
the agent and tools in a loop. We go over different ways to customize this, and options you can use for more control.
|
||||
|
||||
|
||||
Plan-and-Execute Agents
|
||||
-----------------------
|
||||
High level pseudocode of the **Plan-and-Execute Agents**:
|
||||
|
||||
- The **user input** is received
|
||||
- The **planner** lists out the steps to take
|
||||
- The **executor** goes through the list of steps, executing them
|
||||
|
||||
The most typical implementation is to have the planner be a language model, and the executor be an action agent.
|
||||
|
||||
|
|
||||
- `Plan-and-Execute Agents <./agents/plan_and_execute.html>`_
|
||||
- PromptTemplate: this is responsible for taking the user input and previous steps and constructing a prompt to send to the language model
|
||||
- Language Model: this takes the prompt constructed by the PromptTemplate and returns some output
|
||||
- Output Parser: this takes the output of the Language Model and parses it into an `AgentAction` or `AgentFinish` object.
|
||||
|
||||
In this section of documentation, we first start with a Getting Started notebook to cover how to use all things related to agents in an end-to-end manner.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
|
||||
./agents/getting_started.ipynb
|
||||
|
||||
|
||||
We then split the documentation into the following sections:
|
||||
|
||||
**Tools**
|
||||
|
||||
In this section we cover the different types of tools LangChain supports natively.
|
||||
We then cover how to add your own tools.
|
||||
|
||||
|
||||
**Agents**
|
||||
|
||||
In this section we cover the different types of agents LangChain supports natively.
|
||||
We then cover how to modify and create your own agents.
|
||||
|
||||
|
||||
**Toolkits**
|
||||
|
||||
In this section we go over the various toolkits that LangChain supports out of the box,
|
||||
and how to create an agent from them.
|
||||
|
||||
|
||||
**Agent Executor**
|
||||
|
||||
In this section we go over the Agent Executor class, which is responsible for calling
|
||||
the agent and tools in a loop. We go over different ways to customize this, and options you
|
||||
can use for more control.
|
||||
|
||||
**Go Deeper**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
./agents/tools.rst
|
||||
./agents/agents.rst
|
||||
./agents/toolkits.rst
|
||||
./agents/agent_executors.rst
|
||||
|
||||
Plan-and-Execute Agents
|
||||
-----------------------
|
||||
|
||||
High level pseudocode of agents looks something like:
|
||||
|
||||
- Some user input is received
|
||||
- The planner lists out the steps to take
|
||||
- The executor goes through the list of steps, executing them
|
||||
|
||||
The most typical implementation is to have the planner be a language model,
|
||||
and the executor be an action agent.
|
||||
|
||||
**Go Deeper**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
./agents/plan_and_execute.ipynb
|
||||
|
||||
|
||||
@@ -9,8 +9,8 @@
|
||||
"\n",
|
||||
"This notebook goes over adding memory to **both** of an Agent and its tools. Before going through this notebook, please walk through the following notebooks, as this will build on top of both of them:\n",
|
||||
"\n",
|
||||
"- [Adding memory to an LLM Chain](../../../memory/examples/adding_memory.ipynb)\n",
|
||||
"- [Custom Agents](../../agents/custom_agent.ipynb)\n",
|
||||
"- [Adding memory to an LLM Chain](../../memory/examples/adding_memory.ipynb)\n",
|
||||
"- [Custom Agents](custom_agent.ipynb)\n",
|
||||
"\n",
|
||||
"We are going to create a custom Agent. The agent has access to a conversation memory, search tool, and a summarization tool. And, the summarization tool also needs access to the conversation memory."
|
||||
]
|
||||
|
||||
@@ -36,7 +36,7 @@ The first category of how-to guides here cover specific parts of working with ag
|
||||
:glob:
|
||||
:hidden:
|
||||
|
||||
./agents/examples/*
|
||||
./examples/*
|
||||
|
||||
|
||||
Agent Toolkits
|
||||
@@ -46,26 +46,26 @@ The next set of examples covers agents with toolkits.
|
||||
As opposed to the examples above, these examples are not intended to show off an agent `type`,
|
||||
but rather to show off an agent applied to particular use case.
|
||||
|
||||
`SQLDatabase Agent <./toolkits/sql_database.html>`_: This notebook covers how to interact with an arbitrary SQL database using an agent.
|
||||
`SQLDatabase Agent <./agent_toolkits/sql_database.html>`_: This notebook covers how to interact with an arbitrary SQL database using an agent.
|
||||
|
||||
`JSON Agent <./toolkits/json.html>`_: This notebook covers how to interact with a JSON dictionary using an agent.
|
||||
`JSON Agent <./agent_toolkits/json.html>`_: This notebook covers how to interact with a JSON dictionary using an agent.
|
||||
|
||||
`OpenAPI Agent <./toolkits/openapi.html>`_: This notebook covers how to interact with an arbitrary OpenAPI endpoint using an agent.
|
||||
`OpenAPI Agent <./agent_toolkits/openapi.html>`_: This notebook covers how to interact with an arbitrary OpenAPI endpoint using an agent.
|
||||
|
||||
`VectorStore Agent <./toolkits/vectorstore.html>`_: This notebook covers how to interact with VectorStores using an agent.
|
||||
`VectorStore Agent <./agent_toolkits/vectorstore.html>`_: This notebook covers how to interact with VectorStores using an agent.
|
||||
|
||||
`Python Agent <./toolkits/python.html>`_: This notebook covers how to produce and execute python code using an agent.
|
||||
`Python Agent <./agent_toolkits/python.html>`_: This notebook covers how to produce and execute python code using an agent.
|
||||
|
||||
`Pandas DataFrame Agent <./toolkits/pandas.html>`_: This notebook covers how to do question answering over a pandas dataframe using an agent. Under the hood this calls the Python agent..
|
||||
`Pandas DataFrame Agent <./agent_toolkits/pandas.html>`_: This notebook covers how to do question answering over a pandas dataframe using an agent. Under the hood this calls the Python agent..
|
||||
|
||||
`CSV Agent <./toolkits/csv.html>`_: This notebook covers how to do question answering over a csv file. Under the hood this calls the Pandas DataFrame agent.
|
||||
`CSV Agent <./agent_toolkits/csv.html>`_: This notebook covers how to do question answering over a csv file. Under the hood this calls the Pandas DataFrame agent.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:glob:
|
||||
:hidden:
|
||||
|
||||
./toolkits/*
|
||||
./agent_toolkits/*
|
||||
|
||||
|
||||
Agent Types
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "23234b50-e6c6-4c87-9f97-259c15f36894",
|
||||
"metadata": {
|
||||
@@ -12,7 +11,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "29dd6333-307c-43df-b848-65001c01733b",
|
||||
"metadata": {},
|
||||
@@ -38,7 +36,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "19a813f7",
|
||||
"metadata": {},
|
||||
@@ -87,7 +84,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "53a743b8",
|
||||
"metadata": {},
|
||||
@@ -96,12 +92,11 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "23602c62",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"By default, we assume that the token sequence ``\"Final\", \"Answer\", \":\"`` indicates that the agent has reached an answers. We can, however, also pass a custom sequence to use as answer prefix."
|
||||
"By default, we assume that the token sequence ``\"\\nFinal\", \" Answer\", \":\"`` indicates that the agent has reached an answers. We can, however, also pass a custom sequence to use as answer prefix."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -113,75 +108,26 @@
|
||||
"source": [
|
||||
"llm = OpenAI(\n",
|
||||
" streaming=True,\n",
|
||||
" callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=[\"The\", \"answer\", \":\"])],\n",
|
||||
" callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=[\"\\nThe\", \" answer\", \":\"])],\n",
|
||||
" temperature=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "b1a96cc0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For convenience, the callback automatically strips whitespaces and new line characters when comparing to `answer_prefix_tokens`. I.e., if `answer_prefix_tokens = [\"The\", \" answer\", \":\"]` then both `[\"\\nThe\", \" answer\", \":\"]` and `[\"The\", \" answer\", \":\"]` would be recognized a the answer prefix."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9278b522",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you don't know the tokenized version of your answer prefix, you can determine it with the following code:"
|
||||
"Be aware you likely need to include whitespaces and new line characters in your token. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2f8f0640",
|
||||
"id": "9278b522",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.base import BaseCallbackHandler\n",
|
||||
"\n",
|
||||
"class MyCallbackHandler(BaseCallbackHandler):\n",
|
||||
" def on_llm_new_token(self, token, **kwargs) -> None:\n",
|
||||
" # print every token on a new line\n",
|
||||
" print(f\"#{token}#\")\n",
|
||||
"\n",
|
||||
"llm = OpenAI(streaming=True, callbacks=[MyCallbackHandler()])\n",
|
||||
"tools = load_tools([\"wikipedia\", \"llm-math\"], llm=llm)\n",
|
||||
"agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)\n",
|
||||
"agent.run(\"It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "61190e58",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Also streaming the answer prefixes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1255776f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"When the parameter `stream_prefix = True` is set, the answer prefix itself will also be streamed. This can be useful when the answer prefix itself is part of the answer. For example, when your answer is a JSON like\n",
|
||||
"\n",
|
||||
"`\n",
|
||||
"{\n",
|
||||
" \"action\": \"Final answer\",\n",
|
||||
" \"action_input\": \"Konrad Adenauer became Chancellor 74 years ago.\"\n",
|
||||
"}\n",
|
||||
"`\n",
|
||||
"\n",
|
||||
"and you don't only want the action_input to be streamed, but the entire JSON."
|
||||
]
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -1,94 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eda326e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Brave Search\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use the Brave Search tool."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "a4c896e5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools import BraveSearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "6784d37c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"api_key = \"...\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5b14008a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tool = BraveSearch.from_api_key(api_key=api_key, search_kwargs={\"count\": 3})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "f11937b2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'[{\"title\": \"Barack Obama - Wikipedia\", \"link\": \"https://en.wikipedia.org/wiki/Barack_Obama\", \"snippet\": \"Outside of politics, <strong>Obama</strong> has published three bestselling books: Dreams from My Father (1995), The Audacity of Hope (2006) and A Promised Land (2020). Rankings by scholars and historians, in which he has been featured since 2010, place him in the <strong>middle</strong> to upper tier of American presidents.\"}, {\"title\": \"Obama\\'s Middle Name -- My Last Name -- is \\'Hussein.\\' So?\", \"link\": \"https://www.cair.com/cair_in_the_news/obamas-middle-name-my-last-name-is-hussein-so/\", \"snippet\": \"Many Americans understand that common names don\\\\u2019t only come in the form of a \\\\u201cSmith\\\\u201d or a \\\\u201cJohnson.\\\\u201d Perhaps, they have a neighbor, mechanic or teacher named Hussein. Or maybe they\\\\u2019ve seen fashion designer Hussein Chalayan in the pages of Vogue or recall <strong>King Hussein</strong>, our ally in the Middle East.\"}, {\"title\": \"What\\'s up with Obama\\'s middle name? - Quora\", \"link\": \"https://www.quora.com/Whats-up-with-Obamas-middle-name\", \"snippet\": \"Answer (1 of 15): A better question would be, \\\\u201cWhat\\\\u2019s up with Obama\\\\u2019s first name?\\\\u201d President <strong>Barack Hussein Obama</strong>\\\\u2019s father\\\\u2019s name was <strong>Barack Hussein Obama</strong>. He was named after his father. Hussein, Obama\\\\u2019s middle name, is a very common Arabic name, meaning "good," "handsome," or "beautiful."\"}]'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tool.run(\"obama middle name\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "da9c63d5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -184,7 +184,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.11.2"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -1,86 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "64f20f38",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PubMed Tool\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use PubMed as a tool\n",
|
||||
"\n",
|
||||
"PubMed® comprises more than 35 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "c80b9273",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools import PubmedQueryRun"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "f203c965",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tool = PubmedQueryRun()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "baee7a2a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Published: <Year>2023</Year><Month>May</Month><Day>31</Day>\\nTitle: Dermatology in the wake of an AI revolution: who gets a say?\\nSummary: \\n\\nPublished: <Year>2023</Year><Month>May</Month><Day>30</Day>\\nTitle: What is ChatGPT and what do we do with it? Implications of the age of AI for nursing and midwifery practice and education: An editorial.\\nSummary: \\n\\nPublished: <Year>2023</Year><Month>Jun</Month><Day>02</Day>\\nTitle: The Impact of ChatGPT on the Nursing Profession: Revolutionizing Patient Care and Education.\\nSummary: The nursing field has undergone notable changes over time and is projected to undergo further modifications in the future, owing to the advent of sophisticated technologies and growing healthcare needs. The advent of ChatGPT, an AI-powered language model, is expected to exert a significant influence on the nursing profession, specifically in the domains of patient care and instruction. The present article delves into the ramifications of ChatGPT within the nursing domain and accentuates its capacity and constraints to transform the discipline.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tool.run(\"chatgpt\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "965903ba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,322 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "144e77fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Human-in-the-loop Tool Validation\n",
|
||||
"\n",
|
||||
"This walkthrough demonstrates how to add Human validation to any Tool. We'll do this using the `HumanApprovalCallbackhandler`.\n",
|
||||
"\n",
|
||||
"Let's suppose we need to make use of the ShellTool. Adding this tool to an automated flow poses obvious risks. Let's see how we could enforce manual human approval of inputs going into this tool.\n",
|
||||
"\n",
|
||||
"**Note**: We generally recommend against using the ShellTool. There's a lot of ways to misuse it, and it's not required for most use cases. We employ it here only for demonstration purposes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "ad84c682",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks import HumanApprovalCallbackHandler\n",
|
||||
"from langchain.tools import ShellTool"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "70090dd6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tool = ShellTool()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "20d5175f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Hello World!\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(tool.run('echo Hello World!'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e0475dd6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Adding Human Approval\n",
|
||||
"Adding the default HumanApprovalCallbackHandler to the tool will make it so that a user has to manually approve every input to the tool before the command is actually executed."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "f1c88793",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tool = ShellTool(callbacks=[HumanApprovalCallbackHandler()])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "f749815d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Do you approve of the following input? Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n",
|
||||
"\n",
|
||||
"ls /usr\n",
|
||||
"yes\n",
|
||||
"\u001b[35mX11\u001b[m\u001b[m\n",
|
||||
"\u001b[35mX11R6\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36mbin\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36mlib\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36mlibexec\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36mlocal\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36msbin\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36mshare\u001b[m\u001b[m\n",
|
||||
"\u001b[1m\u001b[36mstandalone\u001b[m\u001b[m\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(tool.run(\"ls /usr\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "b6e455d1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Do you approve of the following input? Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n",
|
||||
"\n",
|
||||
"ls /private\n",
|
||||
"no\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"ename": "HumanRejectedException",
|
||||
"evalue": "Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected.",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mHumanRejectedException\u001b[0m Traceback (most recent call last)",
|
||||
"Cell \u001b[0;32mIn[17], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mls /private\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/tools/base.py:257\u001b[0m, in \u001b[0;36mBaseTool.run\u001b[0;34m(self, tool_input, verbose, start_color, color, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 255\u001b[0m \u001b[38;5;66;03m# TODO: maybe also pass through run_manager is _run supports kwargs\u001b[39;00m\n\u001b[1;32m 256\u001b[0m new_arg_supported \u001b[38;5;241m=\u001b[39m signature(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run)\u001b[38;5;241m.\u001b[39mparameters\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_manager\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 257\u001b[0m run_manager \u001b[38;5;241m=\u001b[39m \u001b[43mcallback_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mon_tool_start\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 258\u001b[0m \u001b[43m \u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mname\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdescription\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdescription\u001b[49m\u001b[43m}\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 259\u001b[0m \u001b[43m \u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43misinstance\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 260\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstart_color\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 261\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 262\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 264\u001b[0m tool_args, tool_kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_to_args_and_kwargs(parsed_input)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:672\u001b[0m, in \u001b[0;36mCallbackManager.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 669\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m run_id \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 670\u001b[0m run_id \u001b[38;5;241m=\u001b[39m uuid4()\n\u001b[0;32m--> 672\u001b[0m \u001b[43m_handle_event\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 673\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandlers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 674\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mon_tool_start\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 675\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mignore_agent\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 676\u001b[0m \u001b[43m \u001b[49m\u001b[43mserialized\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 677\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_str\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 678\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 679\u001b[0m \u001b[43m \u001b[49m\u001b[43mparent_run_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparent_run_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 680\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 681\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 683\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m CallbackManagerForToolRun(\n\u001b[1;32m 684\u001b[0m run_id, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minheritable_handlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparent_run_id\n\u001b[1;32m 685\u001b[0m )\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:157\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 155\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m handler\u001b[38;5;241m.\u001b[39mraise_error:\n\u001b[0;32m--> 157\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 158\u001b[0m logging\u001b[38;5;241m.\u001b[39mwarning(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError in \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mevent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m callback: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:139\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m ignore_condition_name \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(\n\u001b[1;32m 137\u001b[0m handler, ignore_condition_name\n\u001b[1;32m 138\u001b[0m ):\n\u001b[0;32m--> 139\u001b[0m \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mhandler\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mevent_name\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 140\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 141\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m event_name \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mon_chat_model_start\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/human.py:48\u001b[0m, in \u001b[0;36mHumanApprovalCallbackHandler.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mon_tool_start\u001b[39m(\n\u001b[1;32m 39\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 40\u001b[0m serialized: Dict[\u001b[38;5;28mstr\u001b[39m, Any],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 46\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Any:\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_check(serialized) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_approve(input_str):\n\u001b[0;32m---> 48\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HumanRejectedException(\n\u001b[1;32m 49\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInputs \u001b[39m\u001b[38;5;132;01m{\u001b[39;00minput_str\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to tool \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mserialized\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m were rejected.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 50\u001b[0m )\n",
|
||||
"\u001b[0;31mHumanRejectedException\u001b[0m: Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(tool.run(\"ls /private\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a3b092ec",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Configuring Human Approval\n",
|
||||
"\n",
|
||||
"Let's suppose we have an agent that takes in multiple tools, and we want it to only trigger human approval requests on certain tools and certain inputs. We can configure out callback handler to do just this."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4521c581",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import load_tools\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "9e8d5428",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def _should_check(serialized_obj: dict) -> bool:\n",
|
||||
" # Only require approval on ShellTool.\n",
|
||||
" return serialized_obj.get(\"name\") == \"terminal\"\n",
|
||||
"\n",
|
||||
"def _approve(_input: str) -> bool:\n",
|
||||
" if _input == \"echo 'Hello World'\":\n",
|
||||
" return True\n",
|
||||
" msg = (\n",
|
||||
" \"Do you approve of the following input? \"\n",
|
||||
" \"Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\"\n",
|
||||
" )\n",
|
||||
" msg += \"\\n\\n\" + _input + \"\\n\"\n",
|
||||
" resp = input(msg)\n",
|
||||
" return resp.lower() in (\"yes\", \"y\")\n",
|
||||
"\n",
|
||||
"callbacks = [HumanApprovalCallbackHandler(should_check=_should_check, approve=_approve)]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"id": "9922898e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"tools = load_tools([\"wikipedia\", \"llm-math\", \"terminal\"], llm=llm)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, \n",
|
||||
" llm, \n",
|
||||
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, \n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"id": "e69ea402",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Konrad Adenauer became Chancellor of Germany in 1949, 74 years ago.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 38,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.\", callbacks=callbacks)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"id": "25182a7e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Hello World'"
|
||||
]
|
||||
},
|
||||
"execution_count": 36,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"print 'Hello World' in the terminal\", callbacks=callbacks)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"id": "2f5a93d0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Do you approve of the following input? Anything except 'Y'/'Yes' (case-insensitive) will be treated as a no.\n",
|
||||
"\n",
|
||||
"ls /private\n",
|
||||
"no\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"ename": "HumanRejectedException",
|
||||
"evalue": "Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected.",
|
||||
"output_type": "error",
|
||||
"traceback": [
|
||||
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
||||
"\u001b[0;31mHumanRejectedException\u001b[0m Traceback (most recent call last)",
|
||||
"Cell \u001b[0;32mIn[39], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43magent\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mlist all directories in /private\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcallbacks\u001b[49m\u001b[43m)\u001b[49m\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/chains/base.py:236\u001b[0m, in \u001b[0;36mChain.run\u001b[0;34m(self, callbacks, *args, **kwargs)\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(args) \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 235\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m`run` supports only one positional argument.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 236\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43margs\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcallbacks\u001b[49m\u001b[43m)\u001b[49m[\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys[\u001b[38;5;241m0\u001b[39m]]\n\u001b[1;32m 238\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m kwargs \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m args:\n\u001b[1;32m 239\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m(kwargs, callbacks\u001b[38;5;241m=\u001b[39mcallbacks)[\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys[\u001b[38;5;241m0\u001b[39m]]\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/chains/base.py:140\u001b[0m, in \u001b[0;36mChain.__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks)\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mKeyboardInterrupt\u001b[39;00m, \u001b[38;5;167;01mException\u001b[39;00m) \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 139\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_error(e)\n\u001b[0;32m--> 140\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 141\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_end(outputs)\n\u001b[1;32m 142\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mprep_outputs(inputs, outputs, return_only_outputs)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/chains/base.py:134\u001b[0m, in \u001b[0;36mChain.__call__\u001b[0;34m(self, inputs, return_only_outputs, callbacks)\u001b[0m\n\u001b[1;32m 128\u001b[0m run_manager \u001b[38;5;241m=\u001b[39m callback_manager\u001b[38;5;241m.\u001b[39mon_chain_start(\n\u001b[1;32m 129\u001b[0m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mname\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m},\n\u001b[1;32m 130\u001b[0m inputs,\n\u001b[1;32m 131\u001b[0m )\n\u001b[1;32m 132\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 133\u001b[0m outputs \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m--> 134\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m new_arg_supported\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_call(inputs)\n\u001b[1;32m 137\u001b[0m )\n\u001b[1;32m 138\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m (\u001b[38;5;167;01mKeyboardInterrupt\u001b[39;00m, \u001b[38;5;167;01mException\u001b[39;00m) \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 139\u001b[0m run_manager\u001b[38;5;241m.\u001b[39mon_chain_error(e)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/agents/agent.py:953\u001b[0m, in \u001b[0;36mAgentExecutor._call\u001b[0;34m(self, inputs, run_manager)\u001b[0m\n\u001b[1;32m 951\u001b[0m \u001b[38;5;66;03m# We now enter the agent loop (until it returns something).\u001b[39;00m\n\u001b[1;32m 952\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_continue(iterations, time_elapsed):\n\u001b[0;32m--> 953\u001b[0m next_step_output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_take_next_step\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 954\u001b[0m \u001b[43m \u001b[49m\u001b[43mname_to_tool_map\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 955\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor_mapping\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 956\u001b[0m \u001b[43m \u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 957\u001b[0m \u001b[43m \u001b[49m\u001b[43mintermediate_steps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 958\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 959\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 960\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(next_step_output, AgentFinish):\n\u001b[1;32m 961\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_return(\n\u001b[1;32m 962\u001b[0m next_step_output, intermediate_steps, run_manager\u001b[38;5;241m=\u001b[39mrun_manager\n\u001b[1;32m 963\u001b[0m )\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/agents/agent.py:820\u001b[0m, in \u001b[0;36mAgentExecutor._take_next_step\u001b[0;34m(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)\u001b[0m\n\u001b[1;32m 818\u001b[0m tool_run_kwargs[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mllm_prefix\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 819\u001b[0m \u001b[38;5;66;03m# We then call the tool on the tool input to get an observation\u001b[39;00m\n\u001b[0;32m--> 820\u001b[0m observation \u001b[38;5;241m=\u001b[39m \u001b[43mtool\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 821\u001b[0m \u001b[43m \u001b[49m\u001b[43magent_action\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtool_input\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 822\u001b[0m \u001b[43m \u001b[49m\u001b[43mverbose\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mverbose\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 823\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcolor\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 824\u001b[0m \u001b[43m \u001b[49m\u001b[43mcallbacks\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_child\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[43mrun_manager\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mNone\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 825\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mtool_run_kwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 826\u001b[0m \u001b[43m \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 827\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 828\u001b[0m tool_run_kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39magent\u001b[38;5;241m.\u001b[39mtool_run_logging_kwargs()\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/tools/base.py:257\u001b[0m, in \u001b[0;36mBaseTool.run\u001b[0;34m(self, tool_input, verbose, start_color, color, callbacks, **kwargs)\u001b[0m\n\u001b[1;32m 255\u001b[0m \u001b[38;5;66;03m# TODO: maybe also pass through run_manager is _run supports kwargs\u001b[39;00m\n\u001b[1;32m 256\u001b[0m new_arg_supported \u001b[38;5;241m=\u001b[39m signature(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run)\u001b[38;5;241m.\u001b[39mparameters\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_manager\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m--> 257\u001b[0m run_manager \u001b[38;5;241m=\u001b[39m \u001b[43mcallback_manager\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mon_tool_start\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 258\u001b[0m \u001b[43m \u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mname\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mdescription\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mdescription\u001b[49m\u001b[43m}\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 259\u001b[0m \u001b[43m \u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mif\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43misinstance\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m)\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01melse\u001b[39;49;00m\u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mtool_input\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 260\u001b[0m \u001b[43m \u001b[49m\u001b[43mcolor\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstart_color\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 261\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 262\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 263\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 264\u001b[0m tool_args, tool_kwargs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_to_args_and_kwargs(parsed_input)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:672\u001b[0m, in \u001b[0;36mCallbackManager.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 669\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m run_id \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 670\u001b[0m run_id \u001b[38;5;241m=\u001b[39m uuid4()\n\u001b[0;32m--> 672\u001b[0m \u001b[43m_handle_event\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 673\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mhandlers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 674\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mon_tool_start\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 675\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mignore_agent\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m 676\u001b[0m \u001b[43m \u001b[49m\u001b[43mserialized\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 677\u001b[0m \u001b[43m \u001b[49m\u001b[43minput_str\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 678\u001b[0m \u001b[43m \u001b[49m\u001b[43mrun_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mrun_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 679\u001b[0m \u001b[43m \u001b[49m\u001b[43mparent_run_id\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mparent_run_id\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 680\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 681\u001b[0m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 683\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m CallbackManagerForToolRun(\n\u001b[1;32m 684\u001b[0m run_id, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mhandlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minheritable_handlers, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mparent_run_id\n\u001b[1;32m 685\u001b[0m )\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:157\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 155\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 156\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m handler\u001b[38;5;241m.\u001b[39mraise_error:\n\u001b[0;32m--> 157\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[1;32m 158\u001b[0m logging\u001b[38;5;241m.\u001b[39mwarning(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mError in \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mevent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m callback: \u001b[39m\u001b[38;5;132;01m{\u001b[39;00me\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/manager.py:139\u001b[0m, in \u001b[0;36m_handle_event\u001b[0;34m(handlers, event_name, ignore_condition_name, *args, **kwargs)\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 136\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m ignore_condition_name \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(\n\u001b[1;32m 137\u001b[0m handler, ignore_condition_name\n\u001b[1;32m 138\u001b[0m ):\n\u001b[0;32m--> 139\u001b[0m \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mhandler\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mevent_name\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 140\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mNotImplementedError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 141\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m event_name \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mon_chat_model_start\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n",
|
||||
"File \u001b[0;32m~/langchain/langchain/callbacks/human.py:48\u001b[0m, in \u001b[0;36mHumanApprovalCallbackHandler.on_tool_start\u001b[0;34m(self, serialized, input_str, run_id, parent_run_id, **kwargs)\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mon_tool_start\u001b[39m(\n\u001b[1;32m 39\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 40\u001b[0m serialized: Dict[\u001b[38;5;28mstr\u001b[39m, Any],\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs: Any,\n\u001b[1;32m 46\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Any:\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_should_check(serialized) \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_approve(input_str):\n\u001b[0;32m---> 48\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m HumanRejectedException(\n\u001b[1;32m 49\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mInputs \u001b[39m\u001b[38;5;132;01m{\u001b[39;00minput_str\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m to tool \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mserialized\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m were rejected.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 50\u001b[0m )\n",
|
||||
"\u001b[0;31mHumanRejectedException\u001b[0m: Inputs ls /private to tool {'name': 'terminal', 'description': 'Run shell commands on this MacOS machine.'} were rejected."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"list all directories in /private\", callbacks=callbacks)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c0b47e26",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,423 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Argilla\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
">[Argilla](https://argilla.io/) is an open-source data curation platform for LLMs.\n",
|
||||
"> Using Argilla, everyone can build robust language models through faster data curation \n",
|
||||
"> using both human and machine feedback. We provide support for each step in the MLOps cycle, \n",
|
||||
"> from data labeling to model monitoring.\n",
|
||||
"\n",
|
||||
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/hwchase17/langchain/blob/master/docs/modules/callbacks/examples/argilla.ipynb\">\n",
|
||||
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
|
||||
"</a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this guide we will demonstrate how to track the inputs and reponses of your LLM to generate a dataset in Argilla, using the `ArgillaCallbackHandler`.\n",
|
||||
"\n",
|
||||
"It's useful to keep track of the inputs and outputs of your LLMs to generate datasets for future fine-tuning. This is especially useful when you're using a LLM to generate data for a specific task, such as question answering, summarization, or translation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install argilla --upgrade\n",
|
||||
"!pip install openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Getting API Credentials\n",
|
||||
"\n",
|
||||
"To get the Argilla API credentials, follow the next steps:\n",
|
||||
"\n",
|
||||
"1. Go to your Argilla UI.\n",
|
||||
"2. Click on your profile picture and go to \"My settings\".\n",
|
||||
"3. Then copy the API Key.\n",
|
||||
"\n",
|
||||
"In Argilla the API URL will be the same as the URL of your Argilla UI.\n",
|
||||
"\n",
|
||||
"To get the OpenAI API credentials, please visit https://platform.openai.com/account/api-keys"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"ARGILLA_API_URL\"] = \"...\"\n",
|
||||
"os.environ[\"ARGILLA_API_KEY\"] = \"...\"\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"...\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup Argilla\n",
|
||||
"\n",
|
||||
"To use the `ArgillaCallbackHandler` we will need to create a new `FeedbackDataset` in Argilla to keep track of your LLM experiments. To do so, please use the following code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import argilla as rg"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from packaging.version import parse as parse_version\n",
|
||||
"\n",
|
||||
"if parse_version(rg.__version__) < parse_version(\"1.8.0\"):\n",
|
||||
" raise RuntimeError(\n",
|
||||
" \"`FeedbackDataset` is only available in Argilla v1.8.0 or higher, please \"\n",
|
||||
" \"upgrade `argilla` as `pip install argilla --upgrade`.\"\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dataset = rg.FeedbackDataset(\n",
|
||||
" fields=[\n",
|
||||
" rg.TextField(name=\"prompt\"),\n",
|
||||
" rg.TextField(name=\"response\"),\n",
|
||||
" ],\n",
|
||||
" questions=[\n",
|
||||
" rg.RatingQuestion(\n",
|
||||
" name=\"response-rating\",\n",
|
||||
" description=\"How would you rate the quality of the response?\",\n",
|
||||
" values=[1, 2, 3, 4, 5],\n",
|
||||
" required=True,\n",
|
||||
" ),\n",
|
||||
" rg.TextQuestion(\n",
|
||||
" name=\"response-feedback\",\n",
|
||||
" description=\"What feedback do you have for the response?\",\n",
|
||||
" required=False,\n",
|
||||
" ),\n",
|
||||
" ],\n",
|
||||
" guidelines=\"You're asked to rate the quality of the response and provide feedback.\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"rg.init(\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"dataset.push_to_argilla(\"langchain-dataset\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> 📌 NOTE: at the moment, just the prompt-response pairs are supported as `FeedbackDataset.fields`, so the `ArgillaCallbackHandler` will just track the prompt i.e. the LLM input, and the response i.e. the LLM output."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tracking"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To use the `ArgillaCallbackHandler` you can either use the following code, or just reproduce one of the examples presented in the following sections."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks import ArgillaCallbackHandler\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scenario 1: Tracking an LLM\n",
|
||||
"\n",
|
||||
"First, let's just run a single LLM a few times and capture the resulting prompt-response pairs in Argilla."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LLMResult(generations=[[Generation(text='\\n\\nQ: What did the fish say when he hit the wall? \\nA: Dam.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nThe Moon \\n\\nThe moon is high in the midnight sky,\\nSparkling like a star above.\\nThe night so peaceful, so serene,\\nFilling up the air with love.\\n\\nEver changing and renewing,\\nA never-ending light of grace.\\nThe moon remains a constant view,\\nA reminder of life’s gentle pace.\\n\\nThrough time and space it guides us on,\\nA never-fading beacon of hope.\\nThe moon shines down on us all,\\nAs it continues to rise and elope.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nQ. What did one magnet say to the other magnet?\\nA. \"I find you very attractive!\"', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text=\"\\n\\nThe world is charged with the grandeur of God.\\nIt will flame out, like shining from shook foil;\\nIt gathers to a greatness, like the ooze of oil\\nCrushed. Why do men then now not reck his rod?\\n\\nGenerations have trod, have trod, have trod;\\nAnd all is seared with trade; bleared, smeared with toil;\\nAnd wears man's smudge and shares man's smell: the soil\\nIs bare now, nor can foot feel, being shod.\\n\\nAnd for all this, nature is never spent;\\nThere lives the dearest freshness deep down things;\\nAnd though the last lights off the black West went\\nOh, morning, at the brown brink eastward, springs —\\n\\nBecause the Holy Ghost over the bent\\nWorld broods with warm breast and with ah! bright wings.\\n\\n~Gerard Manley Hopkins\", generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\nQ: What did one ocean say to the other ocean?\\nA: Nothing, they just waved.', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text=\"\\n\\nA poem for you\\n\\nOn a field of green\\n\\nThe sky so blue\\n\\nA gentle breeze, the sun above\\n\\nA beautiful world, for us to love\\n\\nLife is a journey, full of surprise\\n\\nFull of joy and full of surprise\\n\\nBe brave and take small steps\\n\\nThe future will be revealed with depth\\n\\nIn the morning, when dawn arrives\\n\\nA fresh start, no reason to hide\\n\\nSomewhere down the road, there's a heart that beats\\n\\nBelieve in yourself, you'll always succeed.\", generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'completion_tokens': 504, 'total_tokens': 528, 'prompt_tokens': 24}, 'model_name': 'text-davinci-003'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"callbacks = [StdOutCallbackHandler(), argilla_callback]\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
|
||||
"llm.generate([\"Tell me a joke\", \"Tell me a poem\"] * 3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scenario 2: Tracking an LLM in a chain\n",
|
||||
"\n",
|
||||
"Then we can create a chain using a prompt template, and then track the initial prompt and the final response in Argilla."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mYou are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
|
||||
"Title: Documentary about Bigfoot in Paris\n",
|
||||
"Playwright: This is a synopsis for the above play:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'text': \"\\n\\nDocumentary about Bigfoot in Paris focuses on the story of a documentary filmmaker and their search for evidence of the legendary Bigfoot creature in the city of Paris. The play follows the filmmaker as they explore the city, meeting people from all walks of life who have had encounters with the mysterious creature. Through their conversations, the filmmaker unravels the story of Bigfoot and finds out the truth about the creature's presence in Paris. As the story progresses, the filmmaker learns more and more about the mysterious creature, as well as the different perspectives of the people living in the city, and what they think of the creature. In the end, the filmmaker's findings lead them to some surprising and heartwarming conclusions about the creature's existence and the importance it holds in the lives of the people in Paris.\"}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"callbacks = [StdOutCallbackHandler(), argilla_callback]\n",
|
||||
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
|
||||
"\n",
|
||||
"template = \"\"\"You are a playwright. Given the title of play, it is your job to write a synopsis for that title.\n",
|
||||
"Title: {title}\n",
|
||||
"Playwright: This is a synopsis for the above play:\"\"\"\n",
|
||||
"prompt_template = PromptTemplate(input_variables=[\"title\"], template=template)\n",
|
||||
"synopsis_chain = LLMChain(llm=llm, prompt=prompt_template, callbacks=callbacks)\n",
|
||||
"\n",
|
||||
"test_prompts = [{\"title\": \"Documentary about Bigfoot in Paris\"}]\n",
|
||||
"synopsis_chain.apply(test_prompts)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scenario 3: Using an Agent with Tools\n",
|
||||
"\n",
|
||||
"Finally, as a more advanced workflow, you can create an agent that uses some tools. So that `ArgillaCallbackHandler` will keep track of the input and the output, but not about the intermediate steps/thoughts, so that given a prompt we log the original prompt and the final response to that given prompt."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> Note that for this scenario we'll be using Google Search API (Serp API) so you will need to both install `google-search-results` as `pip install google-search-results`, and to set the Serp API Key as `os.environ[\"SERPAPI_API_KEY\"] = \"...\"` (you can find it at https://serpapi.com/dashboard), otherwise the example below won't work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to answer a historical question\n",
|
||||
"Action: Search\n",
|
||||
"Action Input: \"who was the first president of the United States of America\" \u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mGeorge Washington\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m George Washington was the first president\n",
|
||||
"Final Answer: George Washington was the first president of the United States of America.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'George Washington was the first president of the United States of America.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
|
||||
"from langchain.callbacks import ArgillaCallbackHandler, StdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"argilla_callback = ArgillaCallbackHandler(\n",
|
||||
" dataset_name=\"langchain-dataset\",\n",
|
||||
" api_url=os.environ[\"ARGILLA_API_URL\"],\n",
|
||||
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
|
||||
")\n",
|
||||
"callbacks = [StdOutCallbackHandler(), argilla_callback]\n",
|
||||
"llm = OpenAI(temperature=0.9, callbacks=callbacks)\n",
|
||||
"\n",
|
||||
"tools = load_tools([\"serpapi\"], llm=llm, callbacks=callbacks)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
" callbacks=callbacks,\n",
|
||||
")\n",
|
||||
"agent.run(\"Who was the first president of the United States of America?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,175 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "63b87b91",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Logging to file\n",
|
||||
"This example shows how to print logs to file. It shows how to use the `FileCallbackHandler`, which does the same thing as [`StdOutCallbackHandler`](https://python.langchain.com/en/latest/modules/callbacks/getting_started.html#using-an-existing-handler), but instead writes the output to file. It also uses the `loguru` library to log other outputs that are not captured by the handler."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "6cb156cc",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3m1 + 2 = \u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32m2023-06-01 18:36:38.929\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36m__main__\u001b[0m:\u001b[36m<module>\u001b[0m:\u001b[36m20\u001b[0m - \u001b[1m\n",
|
||||
"\n",
|
||||
"3\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from loguru import logger\n",
|
||||
"\n",
|
||||
"from langchain.callbacks import FileCallbackHandler\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"logfile = 'output.log'\n",
|
||||
"\n",
|
||||
"logger.add(logfile, colorize=True, enqueue=True)\n",
|
||||
"handler = FileCallbackHandler(logfile)\n",
|
||||
"\n",
|
||||
"llm = OpenAI()\n",
|
||||
"prompt = PromptTemplate.from_template(\"1 + {number} = \")\n",
|
||||
"\n",
|
||||
"# this chain will both print to stdout (because verbose=True) and write to 'output.log'\n",
|
||||
"# if verbose=False, the FileCallbackHandler will still write to 'output.log'\n",
|
||||
"chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler], verbose=True)\n",
|
||||
"answer = chain.run(number=2)\n",
|
||||
"logger.info(answer)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9c50d54f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can open the file `output.log` to see that the output has been captured."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "aa32dc0a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install ansi2html > /dev/null"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "4af00719",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n",
|
||||
"<html>\n",
|
||||
"<head>\n",
|
||||
"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n",
|
||||
"<title></title>\n",
|
||||
"<style type=\"text/css\">\n",
|
||||
".ansi2html-content { display: inline; white-space: pre-wrap; word-wrap: break-word; }\n",
|
||||
".body_foreground { color: #AAAAAA; }\n",
|
||||
".body_background { background-color: #000000; }\n",
|
||||
".inv_foreground { color: #000000; }\n",
|
||||
".inv_background { background-color: #AAAAAA; }\n",
|
||||
".ansi1 { font-weight: bold; }\n",
|
||||
".ansi3 { font-style: italic; }\n",
|
||||
".ansi32 { color: #00aa00; }\n",
|
||||
".ansi36 { color: #00aaaa; }\n",
|
||||
"</style>\n",
|
||||
"</head>\n",
|
||||
"<body class=\"body_foreground body_background\" style=\"font-size: normal;\" >\n",
|
||||
"<pre class=\"ansi2html-content\">\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"<span class=\"ansi1\">> Entering new LLMChain chain...</span>\n",
|
||||
"Prompt after formatting:\n",
|
||||
"<span class=\"ansi1 ansi32\"></span><span class=\"ansi1 ansi3 ansi32\">1 + 2 = </span>\n",
|
||||
"\n",
|
||||
"<span class=\"ansi1\">> Finished chain.</span>\n",
|
||||
"<span class=\"ansi32\">2023-06-01 18:36:38.929</span> | <span class=\"ansi1\">INFO </span> | <span class=\"ansi36\">__main__</span>:<span class=\"ansi36\"><module></span>:<span class=\"ansi36\">20</span> - <span class=\"ansi1\">\n",
|
||||
"\n",
|
||||
"3</span>\n",
|
||||
"\n",
|
||||
"</pre>\n",
|
||||
"</body>\n",
|
||||
"\n",
|
||||
"</html>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"<IPython.core.display.HTML object>"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from IPython.display import display, HTML\n",
|
||||
"from ansi2html import Ansi2HTMLConverter\n",
|
||||
"\n",
|
||||
"with open('output.log', 'r') as f:\n",
|
||||
" content = f.read()\n",
|
||||
"\n",
|
||||
"conv = Ansi2HTMLConverter()\n",
|
||||
"html = conv.convert(content, full=True)\n",
|
||||
"\n",
|
||||
"display(HTML(html))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -6,13 +6,14 @@ Chains
|
||||
|
||||
|
||||
Using an LLM in isolation is fine for some simple applications,
|
||||
but more complex applications require chaining LLMs - either with each other or with other experts.
|
||||
LangChain provides a standard interface for **Chains**, as well as several common implementations of chains.
|
||||
but many more complex ones require chaining LLMs - either with each other or with other experts.
|
||||
LangChain provides a standard interface for Chains, as well as some common implementations of chains for ease of use.
|
||||
|
||||
|
|
||||
- `Getting Started <./chains/getting_started.html>`_: An overview of chains.
|
||||
The following sections of documentation are provided:
|
||||
|
||||
- `How-To Guides <./chains/how_to_guides.html>`_: How-to guides about various types of chains.
|
||||
- `Getting Started <./chains/getting_started.html>`_: A getting started guide for chains, to get you up and running quickly.
|
||||
|
||||
- `How-To Guides <./chains/how_to_guides.html>`_: A collection of how-to guides. These highlight how to use various types of chains.
|
||||
|
||||
- `Reference <../reference/modules/chains.html>`_: API reference documentation for all Chain classes.
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
"\n",
|
||||
"LangChain provides async support for Chains by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
|
||||
"\n",
|
||||
"Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](../index_examples/question_answering.ipynb). Async support for other chains is on the roadmap."
|
||||
"Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](../indexes/chain_examples/question_answering.html). Async support for other chains is on the roadmap."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -104,7 +104,7 @@
|
||||
"s = time.perf_counter()\n",
|
||||
"generate_serially()\n",
|
||||
"elapsed = time.perf_counter() - s\n",
|
||||
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')\n"
|
||||
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
@@ -21,7 +21,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "e9db25f3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -318,141 +318,6 @@
|
||||
"chain({\"input_documents\": docs}, return_only_outputs=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "b882e209",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## The custom `MapReduceChain`\n",
|
||||
"\n",
|
||||
"**Multi input prompt**\n",
|
||||
"\n",
|
||||
"You can also use prompt with multi input. In this example, we will use a MapReduce chain to answer specifc question about our code."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "f7ad9ee2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain\n",
|
||||
"from langchain.chains.combine_documents.stuff import StuffDocumentsChain\n",
|
||||
"\n",
|
||||
"map_template_string = \"\"\"Give the following python code information, generate a description that explains what the code does and also mention the time complexity.\n",
|
||||
"Code:\n",
|
||||
"{code}\n",
|
||||
"\n",
|
||||
"Return the the description in the following format:\n",
|
||||
"name of the function: description of the function\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"reduce_template_string = \"\"\"Give the following following python fuctions name and their descritpion, answer the following question\n",
|
||||
"{code_description}\n",
|
||||
"Question: {question}\n",
|
||||
"Answer:\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"MAP_PROMPT = PromptTemplate(input_variables=[\"code\"], template=map_template_string)\n",
|
||||
"REDUCE_PROMPT = PromptTemplate(input_variables=[\"code_description\", \"question\"], template=reduce_template_string)\n",
|
||||
"\n",
|
||||
"llm = OpenAI()\n",
|
||||
"\n",
|
||||
"map_llm_chain = LLMChain(llm=llm, prompt=MAP_PROMPT)\n",
|
||||
"reduce_llm_chain = LLMChain(llm=llm, prompt=REDUCE_PROMPT)\n",
|
||||
"\n",
|
||||
"generative_result_reduce_chain = StuffDocumentsChain(\n",
|
||||
" llm_chain=reduce_llm_chain,\n",
|
||||
" document_variable_name=\"code_description\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"combine_documents = MapReduceDocumentsChain(\n",
|
||||
" llm_chain=map_llm_chain,\n",
|
||||
" combine_document_chain=generative_result_reduce_chain,\n",
|
||||
" document_variable_name=\"code\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"map_reduce = MapReduceChain(\n",
|
||||
" combine_documents_chain=combine_documents,\n",
|
||||
" text_splitter=CharacterTextSplitter(separator=\"\\n##\\n\", chunk_size=100, chunk_overlap=0),\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "0d4caccb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"code = \"\"\"\n",
|
||||
"def bubblesort(list):\n",
|
||||
" for iter_num in range(len(list)-1,0,-1):\n",
|
||||
" for idx in range(iter_num):\n",
|
||||
" if list[idx]>list[idx+1]:\n",
|
||||
" temp = list[idx]\n",
|
||||
" list[idx] = list[idx+1]\n",
|
||||
" list[idx+1] = temp\n",
|
||||
" return list\n",
|
||||
"##\n",
|
||||
"def insertion_sort(InputList):\n",
|
||||
" for i in range(1, len(InputList)):\n",
|
||||
" j = i-1\n",
|
||||
" nxt_element = InputList[i]\n",
|
||||
" while (InputList[j] > nxt_element) and (j >= 0):\n",
|
||||
" InputList[j+1] = InputList[j]\n",
|
||||
" j=j-1\n",
|
||||
" InputList[j+1] = nxt_element\n",
|
||||
" return InputList\n",
|
||||
"##\n",
|
||||
"def shellSort(input_list):\n",
|
||||
" gap = len(input_list) // 2\n",
|
||||
" while gap > 0:\n",
|
||||
" for i in range(gap, len(input_list)):\n",
|
||||
" temp = input_list[i]\n",
|
||||
" j = i\n",
|
||||
" while j >= gap and input_list[j - gap] > temp:\n",
|
||||
" input_list[j] = input_list[j - gap]\n",
|
||||
" j = j-gap\n",
|
||||
" input_list[j] = temp\n",
|
||||
" gap = gap//2\n",
|
||||
" return input_list\n",
|
||||
"\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "d5a9a35b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Created a chunk of size 247, which is longer than the specified 100\n",
|
||||
"Created a chunk of size 267, which is longer than the specified 100\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'shellSort has a better time complexity than both bubblesort and insertion_sort, as it has a time complexity of O(n^2), while the other two have a time complexity of O(n^2).'"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"map_reduce.run(input_text=code, question=\"Which function has a better time complexity?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f61350f9",
|
||||
@@ -605,7 +470,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.16"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -5,41 +5,53 @@ Indexes
|
||||
`Conceptual Guide <https://docs.langchain.com/docs/components/indexing>`_
|
||||
|
||||
|
||||
**Indexes** refer to ways to structure documents so that LLMs can best interact with them.
|
||||
Indexes refer to ways to structure documents so that LLMs can best interact with them.
|
||||
This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.
|
||||
|
||||
The most common way that indexes are used in chains is in a "retrieval" step.
|
||||
This step refers to taking a user's query and returning the most relevant documents.
|
||||
We draw this distinction because (1) an index can be used for other things besides retrieval, and
|
||||
(2) retrieval can use other logic besides an index to find relevant documents.
|
||||
We therefore have a concept of a **Retriever** interface - this is the interface that most chains work with.
|
||||
We draw this distinction because (1) an index can be used for other things besides retrieval, and (2) retrieval can use other logic besides an index to find relevant documents.
|
||||
We therefore have a concept of a "Retriever" interface - this is the interface that most chains work with.
|
||||
|
||||
Most of the time when we talk about indexes and retrieval we are talking about indexing and retrieving
|
||||
unstructured data (like text documents).
|
||||
For interacting with structured data (SQL tables, etc) or APIs, please see the corresponding use case
|
||||
sections for links to relevant functionality.
|
||||
Most of the time when we talk about indexes and retrieval we are talking about indexing and retrieving unstructured data (like text documents).
|
||||
For interacting with structured data (SQL tables, etc) or APIs, please see the corresponding use case sections for links to relevant functionality.
|
||||
The primary index and retrieval types supported by LangChain are currently centered around vector databases, and therefore
|
||||
a lot of the functionality we dive deep on those topics.
|
||||
|
||||
|
|
||||
- `Getting Started <./indexes/getting_started.html>`_: An overview of the indexes.
|
||||
For an overview of everything related to this, please see the below notebook for getting started:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
./indexes/getting_started.ipynb
|
||||
|
||||
We then provide a deep dive on the four main components.
|
||||
|
||||
**Document Loaders**
|
||||
|
||||
How to load documents from a variety of sources.
|
||||
|
||||
**Text Splitters**
|
||||
|
||||
An overview of the abstractions and implementions around splitting text.
|
||||
|
||||
|
||||
Index Types
|
||||
---------------------
|
||||
**VectorStores**
|
||||
|
||||
- `Document Loaders <./indexes/document_loaders.html>`_: How to load documents from a variety of sources.
|
||||
An overview of VectorStores and the many integrations LangChain provides.
|
||||
|
||||
- `Text Splitters <./indexes/text_splitters.html>`_: An overview and different types of the **Text Splitters**.
|
||||
|
||||
- `VectorStores <./indexes/vectorstores.html>`_: An overview and different types of the **Vector Stores**.
|
||||
**Retrievers**
|
||||
|
||||
- `Retrievers <./indexes/retrievers.html>`_: An overview and different types of the **Retrievers**.
|
||||
An overview of Retrievers and the implementations LangChain provides.
|
||||
|
||||
Go Deeper
|
||||
---------
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:hidden:
|
||||
|
||||
./indexes/getting_started.ipynb
|
||||
./indexes/document_loaders.rst
|
||||
./indexes/text_splitters.rst
|
||||
./indexes/vectorstores.rst
|
||||
|
||||
@@ -88,7 +88,7 @@ We don't need any access permissions to these datasets and services.
|
||||
|
||||
|
||||
Proprietary dataset or service loaders
|
||||
--------------------------------------
|
||||
------------------------------
|
||||
These datasets and services are not from the public domain.
|
||||
These loaders mostly transform data from specific formats of applications or cloud services,
|
||||
for example **Google Drive**.
|
||||
|
||||
@@ -1,256 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f08772b0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Alibaba Cloud MaxCompute\n",
|
||||
"\n",
|
||||
">[Alibaba Cloud MaxCompute](https://www.alibabacloud.com/product/maxcompute) (previously known as ODPS) is a general purpose, fully managed, multi-tenancy data processing platform for large-scale data warehousing. MaxCompute supports various data importing solutions and distributed computing models, enabling users to effectively query massive datasets, reduce production costs, and ensure data security.\n",
|
||||
"\n",
|
||||
"The `MaxComputeLoader` lets you execute a MaxCompute SQL query and loads the results as one document per row."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "067b7213",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Collecting pyodps\n",
|
||||
" Downloading pyodps-0.11.4.post0-cp39-cp39-macosx_10_9_universal2.whl (2.0 MB)\n",
|
||||
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m0m\n",
|
||||
"\u001b[?25hRequirement already satisfied: charset-normalizer>=2 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (3.1.0)\n",
|
||||
"Requirement already satisfied: urllib3<2.0,>=1.26.0 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (1.26.15)\n",
|
||||
"Requirement already satisfied: idna>=2.5 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (3.4)\n",
|
||||
"Requirement already satisfied: certifi>=2017.4.17 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (2023.5.7)\n",
|
||||
"Installing collected packages: pyodps\n",
|
||||
"Successfully installed pyodps-0.11.4.post0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!pip install pyodps"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "19641457",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Basic Usage\n",
|
||||
"To instantiate the loader you'll need a SQL query to execute, your MaxCompute endpoint and project name, and you access ID and secret access key. The access ID and secret access key can either be passed in direct via the `access_id` and `secret_access_key` parameters or they can be set as environment variables `MAX_COMPUTE_ACCESS_ID` and `MAX_COMPUTE_SECRET_ACCESS_KEY`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "71a0da4b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import MaxComputeLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "d4770c4a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"base_query = \"\"\"\n",
|
||||
"SELECT *\n",
|
||||
"FROM (\n",
|
||||
" SELECT 1 AS id, 'content1' AS content, 'meta_info1' AS meta_info\n",
|
||||
" UNION ALL\n",
|
||||
" SELECT 2 AS id, 'content2' AS content, 'meta_info2' AS meta_info\n",
|
||||
" UNION ALL\n",
|
||||
" SELECT 3 AS id, 'content3' AS content, 'meta_info3' AS meta_info\n",
|
||||
") mydata;\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1616c174",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"endpoint=\"<ENDPOINT>\"\n",
|
||||
"project=\"<PROJECT>\"\n",
|
||||
"ACCESS_ID = \"<ACCESS ID>\"\n",
|
||||
"SECRET_ACCESS_KEY = \"<SECRET ACCESS KEY>\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "e5c25041",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = MaxComputeLoader.from_params(\n",
|
||||
" base_query,\n",
|
||||
" endpoint,\n",
|
||||
" project,\n",
|
||||
" access_id=ACCESS_ID,\n",
|
||||
" secret_access_key=SECRET_ACCESS_KEY,\n",
|
||||
"\n",
|
||||
")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "311e74ea",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Document(page_content='id: 1\\ncontent: content1\\nmeta_info: meta_info1', metadata={}), Document(page_content='id: 2\\ncontent: content2\\nmeta_info: meta_info2', metadata={}), Document(page_content='id: 3\\ncontent: content3\\nmeta_info: meta_info3', metadata={})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "a4d8c388",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"id: 1\n",
|
||||
"content: content1\n",
|
||||
"meta_info: meta_info1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "f2422e6c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].metadata)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "85e07e28",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying Which Columns are Content vs Metadata\n",
|
||||
"You can configure which subset of columns should be loaded as the contents of the Document and which as the metadata using the `page_content_columns` and `metadata_columns` parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "a7b9d726",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = MaxComputeLoader.from_params(\n",
|
||||
" base_query,\n",
|
||||
" endpoint,\n",
|
||||
" project,\n",
|
||||
" page_content_columns=[\"content\"], # Specify Document page content\n",
|
||||
" metadata_columns=[\"id\", \"meta_info\"], # Specify Document metadata\n",
|
||||
" access_id=ACCESS_ID,\n",
|
||||
" secret_access_key=SECRET_ACCESS_KEY,\n",
|
||||
")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "532c19e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"content: content1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "5fe4990a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'id': 1, 'meta_info': 'meta_info1'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(data[0].metadata)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,18 +1,15 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"source": [
|
||||
"# Confluence\n",
|
||||
"\n",
|
||||
">[Confluence](https://www.atlassian.com/software/confluence) is a wiki collaboration platform that saves and organizes all of the project-related material. `Confluence` is a knowledge base that primarily handles content management activities. \n",
|
||||
"\n",
|
||||
"A loader for `Confluence` pages.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This currently supports `username/api_key`, `Oauth2 login`. Additionally, on-prem installations also support `token` authentication. \n",
|
||||
"A loader for `Confluence` pages currently supports both `username/api_key` and `Oauth2 login`.\n",
|
||||
"See [instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Specify a list `page_id`-s and/or `space_key` to load in the corresponding pages into Document objects, if both are specified the union of both sets will be returned.\n",
|
||||
@@ -23,17 +20,9 @@
|
||||
"Hint: `space_key` and `page_id` can both be found in the URL of a page in Confluence - https://yoursite.atlassian.com/wiki/spaces/<space_key>/pages/<page_id>\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Before using ConfluenceLoader make sure you have the latest version of the atlassian-python-api package installed:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
@@ -42,29 +31,6 @@
|
||||
"#!pip install atlassian-python-api"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Username and Password or Username and API Token (Atlassian Cloud only)\n",
|
||||
"\n",
|
||||
"This example authenticates using either a username and password or, if you're connecting to an Atlassian Cloud hosted version of Confluence, a username and an API Token.\n",
|
||||
"You can generate an API token at: https://id.atlassian.com/manage-profile/security/api-tokens.\n",
|
||||
"\n",
|
||||
"The `limit` parameter specifies how many documents will be retrieved in a single call, not how many documents will be retrieved in total.\n",
|
||||
"By default the code will return up to 1000 documents in 50 documents batches. To control the total number of documents use the `max_pages` parameter. \n",
|
||||
"Plese note the maximum value for the `limit` parameter in the atlassian-python-api package is currently 100. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -80,34 +46,6 @@
|
||||
")\n",
|
||||
"documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Personal Access Token (Server/On-Prem only)\n",
|
||||
"\n",
|
||||
"This method is valid for the Data Center/Server on-prem edition only.\n",
|
||||
"For more information on how to generate a Personal Access Token (PAT) check the official Confluence documentation at: https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html.\n",
|
||||
"When using a PAT you provide only the token value, you cannot provide a username. \n",
|
||||
"Please note that ConfluenceLoader will run under the permissions of the user that generated the PAT and will only be able to load documents for which said user has access to. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import ConfluenceLoader\n",
|
||||
"\n",
|
||||
"loader = ConfluenceLoader(\n",
|
||||
" url=\"https://yoursite.atlassian.com/wiki\",\n",
|
||||
" token=\"12345\"\n",
|
||||
")\n",
|
||||
"documents = loader.load(space_key=\"SPACE\", include_attachments=True, limit=50, max_pages=50)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -126,7 +64,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.13"
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -5,47 +5,22 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Docugami\n",
|
||||
"This notebook covers how to load documents from `Docugami`. It provides the advantages of using this system over alternative data loaders.\n",
|
||||
"This notebook covers how to load documents from `Docugami`. See [here](../../../../ecosystem/docugami.md) for more details, and the advantages of using this system over alternative data loaders.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"1. Install necessary python packages.\n",
|
||||
"2. Grab an access token for your workspace, and make sure it is set as the `DOCUGAMI_API_KEY` environment variable.\n",
|
||||
"1. Follow the Quick Start section in [this document](../../../../ecosystem/docugami.md)\n",
|
||||
"2. Grab an access token for your workspace, and make sure it is set as the DOCUGAMI_API_KEY environment variable\n",
|
||||
"3. Grab some docset and document IDs for your processed documents, as described here: https://help.docugami.com/home/docugami-api"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You need the lxml package to use the DocugamiLoader\n",
|
||||
"!pip install lxml"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Quick start\n",
|
||||
"\n",
|
||||
"1. Create a [Docugami workspace](http://www.docugami.com) (free trials available)\n",
|
||||
"2. Add your documents (PDF, DOCX or DOC) and allow Docugami to ingest and cluster them into sets of similar documents, e.g. NDAs, Lease Agreements, and Service Agreements. There is no fixed set of document types supported by the system, the clusters created depend on your particular documents, and you can [change the docset assignments](https://help.docugami.com/home/working-with-the-doc-sets-view) later.\n",
|
||||
"3. Create an access token via the Developer Playground for your workspace. [Detailed instructions](https://help.docugami.com/home/docugami-api)\n",
|
||||
"4. Explore the [Docugami API](https://api-docs.docugami.com) to get a list of your processed docset IDs, or just the document IDs for a particular docset. \n",
|
||||
"6. Use the DocugamiLoader as detailed below, to get rich semantic chunks for your documents.\n",
|
||||
"7. Optionally, build and publish one or more [reports or abstracts](https://help.docugami.com/home/reports). This helps Docugami improve the semantic XML with better tags based on your preferences, which are then added to the DocugamiLoader output as metadata. Use techniques like [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) to do high accuracy Document QA.\n",
|
||||
"\n",
|
||||
"## Advantages vs Other Chunking Techniques\n",
|
||||
"\n",
|
||||
"Appropriate chunking of your documents is critical for retrieval from documents. Many chunking techniques exist, including simple ones that rely on whitespace and recursive chunk splitting based on character length. Docugami offers a different approach:\n",
|
||||
"\n",
|
||||
"1. **Intelligent Chunking:** Docugami breaks down every document into a hierarchical semantic XML tree of chunks of varying sizes, from single words or numerical values to entire sections. These chunks follow the semantic contours of the document, providing a more meaningful representation than arbitrary length or simple whitespace-based chunking.\n",
|
||||
"2. **Structured Representation:** In addition, the XML tree indicates the structural contours of every document, using attributes denoting headings, paragraphs, lists, tables, and other common elements, and does that consistently across all supported document formats, such as scanned PDFs or DOCX files. It appropriately handles long-form document characteristics like page headers/footers or multi-column flows for clean text extraction.\n",
|
||||
"3. **Semantic Annotations:** Chunks are annotated with semantic tags that are coherent across the document set, facilitating consistent hierarchical queries across multiple documents, even if they are written and formatted differently. For example, in set of lease agreements, you can easily identify key provisions like the Landlord, Tenant, or Renewal Date, as well as more complex information such as the wording of any sub-lease provision or whether a specific jurisdiction has an exception section within a Termination Clause.\n",
|
||||
"4. **Additional Metadata:** Chunks are also annotated with additional metadata, if a user has been using Docugami. This additional metadata can be used for high-accuracy Document QA without context window restrictions. See detailed code walk-through below.\n"
|
||||
"!poetry run pip -q install lxml"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -137,7 +112,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!poetry run pip -q install openai tiktoken chromadb"
|
||||
"!poetry run pip -q install openai tiktoken chromadb "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -317,7 +292,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can use a [self-querying retriever](../../retrievers/examples/self_query.ipynb) to improve our query accuracy, using this additional metadata:"
|
||||
"We can use a [self-querying retriever](../../retrievers/examples/self_query_retriever.ipynb) to improve our query accuracy, using this additional metadata:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -364,7 +339,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's run the same question again. It returns the correct result since all the chunks have metadata key/value pairs on them carrying key information about the document even if this information is physically very far away from the source chunk used to generate the answer."
|
||||
"Let's run the same question again. It returns the correct result since all the chunks have metadata key/value pairs on them carrying key information about the document even if this infromation is physically very far away from the source chunk used to generate the answer."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -423,7 +398,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
Binary file not shown.
@@ -1,79 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "22a849cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Microsoft Excel\n",
|
||||
"\n",
|
||||
"The `UnstructuredExcelLoader` is used to load `Microsoft Excel` files. The loader works with both `.xlsx` and `.xls` files. The page content will be the raw text of the Excel file. If you use the loader in `\"elements\"` mode, an HTML representation of the Excel file will be available in the document metadata under the `text_as_html` key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e6616e3a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredExcelLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a654e4d9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='\\n \\n \\n Team\\n Location\\n Stanley Cups\\n \\n \\n Blues\\n STL\\n 1\\n \\n \\n Flyers\\n PHI\\n 2\\n \\n \\n Maple Leafs\\n TOR\\n 13\\n \\n \\n', metadata={'source': 'example_data/stanley-cups.xlsx', 'filename': 'stanley-cups.xlsx', 'file_directory': 'example_data', 'filetype': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 'page_number': 1, 'page_name': 'Stanley Cups', 'text_as_html': '<table border=\"1\" class=\"dataframe\">\\n <tbody>\\n <tr>\\n <td>Team</td>\\n <td>Location</td>\\n <td>Stanley Cups</td>\\n </tr>\\n <tr>\\n <td>Blues</td>\\n <td>STL</td>\\n <td>1</td>\\n </tr>\\n <tr>\\n <td>Flyers</td>\\n <td>PHI</td>\\n <td>2</td>\\n </tr>\\n <tr>\\n <td>Maple Leafs</td>\\n <td>TOR</td>\\n <td>13</td>\\n </tr>\\n </tbody>\\n</table>', 'category': 'Table'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = UnstructuredExcelLoader(\n",
|
||||
" \"example_data/stanley-cups.xlsx\",\n",
|
||||
" mode=\"elements\"\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9ab94bde",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Facebook Chat\n",
|
||||
"### Facebook Chat\n",
|
||||
"\n",
|
||||
">[Messenger](https://en.wikipedia.org/wiki/Messenger_(software)) is an American proprietary instant messaging app and platform developed by `Meta Platforms`. Originally developed as `Facebook Chat` in 2008, the company revamped its messaging service in 2010.\n",
|
||||
"\n",
|
||||
|
||||
@@ -1,18 +1,17 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PySpark DataFrame Loader\n",
|
||||
"# PySpack DataFrame Loader\n",
|
||||
"\n",
|
||||
"This notebook goes over how to load data from a [PySpark](https://spark.apache.org/docs/latest/api/python/) DataFrame."
|
||||
"This shows how to load data from a PySpark DataFrame"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -21,7 +20,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -30,26 +29,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Setting default log level to \"WARN\".\n",
|
||||
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
|
||||
"23/05/31 14:08:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"spark = SparkSession.builder.getOrCreate()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -58,7 +47,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -67,7 +56,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -76,56 +65,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Stage 8:> (0 + 1) / 1]\r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': ' 81.34', ' \"Wins\"': ' 98'}),\n",
|
||||
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': ' 82.20', ' \"Wins\"': ' 97'}),\n",
|
||||
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': ' 197.96', ' \"Wins\"': ' 95'}),\n",
|
||||
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': ' 117.62', ' \"Wins\"': ' 94'}),\n",
|
||||
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': ' 83.31', ' \"Wins\"': ' 94'}),\n",
|
||||
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': ' 55.37', ' \"Wins\"': ' 94'}),\n",
|
||||
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': ' 120.51', ' \"Wins\"': ' 93'}),\n",
|
||||
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': ' 81.43', ' \"Wins\"': ' 93'}),\n",
|
||||
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': ' 64.17', ' \"Wins\"': ' 90'}),\n",
|
||||
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': ' 154.49', ' \"Wins\"': ' 89'}),\n",
|
||||
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': ' 132.30', ' \"Wins\"': ' 88'}),\n",
|
||||
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': ' 110.30', ' \"Wins\"': ' 88'}),\n",
|
||||
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': ' 95.14', ' \"Wins\"': ' 86'}),\n",
|
||||
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': ' 96.92', ' \"Wins\"': ' 85'}),\n",
|
||||
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': ' 97.65', ' \"Wins\"': ' 83'}),\n",
|
||||
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': ' 174.54', ' \"Wins\"': ' 81'}),\n",
|
||||
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': ' 74.28', ' \"Wins\"': ' 81'}),\n",
|
||||
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': ' 63.43', ' \"Wins\"': ' 79'}),\n",
|
||||
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': ' 55.24', ' \"Wins\"': ' 76'}),\n",
|
||||
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': ' 81.97', ' \"Wins\"': ' 75'}),\n",
|
||||
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': ' 93.35', ' \"Wins\"': ' 74'}),\n",
|
||||
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': ' 75.48', ' \"Wins\"': ' 73'}),\n",
|
||||
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': ' 60.91', ' \"Wins\"': ' 72'}),\n",
|
||||
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': ' 118.07', ' \"Wins\"': ' 69'}),\n",
|
||||
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': ' 173.18', ' \"Wins\"': ' 69'}),\n",
|
||||
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': ' 78.43', ' \"Wins\"': ' 68'}),\n",
|
||||
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': ' 94.08', ' \"Wins\"': ' 66'}),\n",
|
||||
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': ' 78.06', ' \"Wins\"': ' 64'}),\n",
|
||||
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': ' 88.19', ' \"Wins\"': ' 61'}),\n",
|
||||
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': ' 60.65', ' \"Wins\"': ' 55'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
@@ -147,7 +89,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
"source": [
|
||||
"# Reddit\n",
|
||||
"\n",
|
||||
">[Reddit](www.reddit.com) is an American social news aggregation, content rating, and discussion website.\n",
|
||||
">[Reddit (reddit)](www.reddit.com) is an American social news aggregation, content rating, and discussion website.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This loader fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package.\n",
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
"\n",
|
||||
"Extends from the `WebBaseLoader`, `SitemapLoader` loads a sitemap from a given URL, and then scrape and load all pages in the sitemap, returning each page as a Document.\n",
|
||||
"\n",
|
||||
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scrapped server, or don't care about load. Note, while this will speed up the scraping process, but it may cause the server to block you. Be careful!"
|
||||
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scrapped server, or don't care about load, you can change the `requests_per_second` parameter to increase the max concurrent requests. Note, while this will speed up the scraping process, but it may cause the server to block you. Be careful!"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -63,25 +63,6 @@
|
||||
"docs = sitemap_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can change the `requests_per_second` parameter to increase the max concurrent requests. and use `requests_kwargs` to pass kwargs when send requests."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sitemap_loader.requests_per_second = 2\n",
|
||||
"# Optional: avoid `[SSL: CERTIFICATE_VERIFY_FAILED]` issue\n",
|
||||
"sitemap_loader.requests_kwargs = {\"verify\": False}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
|
||||
@@ -54,6 +54,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3e64cac2",
|
||||
"metadata": {},
|
||||
@@ -116,7 +117,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -171,7 +171,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -19,6 +19,7 @@
|
||||
"source": [
|
||||
"# # Install package\n",
|
||||
"!pip install \"unstructured[local-inference]\"\n",
|
||||
"!pip install \"detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2\"\n",
|
||||
"!pip install layoutparser[layoutmodels,tesseract]"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# WhatsApp Chat\n",
|
||||
"### WhatsApp Chat\n",
|
||||
"\n",
|
||||
">[WhatsApp](https://www.whatsapp.com/) (also called `WhatsApp Messenger`) is a freeware, cross-platform, centralized instant messaging (IM) and voice-over-IP (VoIP) service. It allows users to send text and voice messages, make voice and video calls, and share images, documents, user locations, and other content.\n",
|
||||
"\n",
|
||||
|
||||
@@ -5,16 +5,7 @@
|
||||
"id": "1edb9e6b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Cognitive Search\n",
|
||||
"\n",
|
||||
">[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
|
||||
"\n",
|
||||
">Search is foundational to any app that surfaces text to users, where common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:\n",
|
||||
">- A search engine for full text search over a search index containing user-owned content\n",
|
||||
">- Rich indexing, with lexical analysis and optional AI enrichment for content extraction and transformation\n",
|
||||
">- Rich query syntax for text search, fuzzy search, autocomplete, geo-search and more\n",
|
||||
">- Programmability through REST APIs and client libraries in Azure SDKs\n",
|
||||
">- Azure integration at the data layer, machine learning layer, and AI (Cognitive Services)\n",
|
||||
"# Azure Cognitive Search Retriever\n",
|
||||
"\n",
|
||||
"This notebook shows how to use Azure Cognitive Search (ACS) within LangChain."
|
||||
]
|
||||
@@ -129,7 +120,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
@@ -1,80 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3df0dcf8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PubMed Retriever\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use PubMed as a retriever\n",
|
||||
"\n",
|
||||
"PubMed® comprises more than 35 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "aecaff63",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.retrievers import PubMedRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "f2f7e8d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = PubMedRetriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "ed115aa1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='', metadata={'uid': '37268021', 'title': 'Dermatology in the wake of an AI revolution: who gets a say?', 'pub_date': '<Year>2023</Year><Month>May</Month><Day>31</Day>'}),\n",
|
||||
" Document(page_content='', metadata={'uid': '37267643', 'title': 'What is ChatGPT and what do we do with it? Implications of the age of AI for nursing and midwifery practice and education: An editorial.', 'pub_date': '<Year>2023</Year><Month>May</Month><Day>30</Day>'}),\n",
|
||||
" Document(page_content='The nursing field has undergone notable changes over time and is projected to undergo further modifications in the future, owing to the advent of sophisticated technologies and growing healthcare needs. The advent of ChatGPT, an AI-powered language model, is expected to exert a significant influence on the nursing profession, specifically in the domains of patient care and instruction. The present article delves into the ramifications of ChatGPT within the nursing domain and accentuates its capacity and constraints to transform the discipline.', metadata={'uid': '37266721', 'title': 'The Impact of ChatGPT on the Nursing Profession: Revolutionizing Patient Care and Education.', 'pub_date': '<Year>2023</Year><Month>Jun</Month><Day>02</Day>'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"chatgpt\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,396 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "13afcae7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Self-querying with Qdrant\n",
|
||||
"\n",
|
||||
">[Qdrant](https://qdrant.tech/documentation/) (read: quadrant ) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. `Qdrant` is tailored to extended filtering support. It makes it useful \n",
|
||||
"\n",
|
||||
"In the notebook we'll demo the `SelfQueryRetriever` wrapped around a Qdrant vector store. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68e75fb9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating a Qdrant vectorstore\n",
|
||||
"First we'll want to create a Chroma VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
|
||||
"\n",
|
||||
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `qdrant-client` package."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "63a8af5b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install lark qdrant-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "83811610-7df3-4ede-b268-68a6a83ba9e2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "dd01b61b-7d32-4a55-85d6-b2d2d4f18840",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# import os\n",
|
||||
"# import getpass\n",
|
||||
"\n",
|
||||
"# os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "cb4a5787",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.vectorstores import Qdrant\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "bcbe04d9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = [\n",
|
||||
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"}),\n",
|
||||
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
|
||||
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
|
||||
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
|
||||
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"year\": 1995, \"genre\": \"animated\"}),\n",
|
||||
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"year\": 1979, \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": \"science fiction\", \"rating\": 9.9})\n",
|
||||
"]\n",
|
||||
"vectorstore = Qdrant.from_documents(\n",
|
||||
" docs, \n",
|
||||
" embeddings, \n",
|
||||
" location=\":memory:\", # Local mode with in-memory storage only\n",
|
||||
" collection_name=\"my_documents\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ecaab6d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating our self-querying retriever\n",
|
||||
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "86e34dbf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain.chains.query_constructor.base import AttributeInfo\n",
|
||||
"\n",
|
||||
"metadata_field_info=[\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"genre\",\n",
|
||||
" description=\"The genre of the movie\", \n",
|
||||
" type=\"string or list[string]\", \n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"year\",\n",
|
||||
" description=\"The year the movie was released\", \n",
|
||||
" type=\"integer\", \n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"director\",\n",
|
||||
" description=\"The name of the movie director\", \n",
|
||||
" type=\"string\", \n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"rating\",\n",
|
||||
" description=\"A 1-10 rating for the movie\",\n",
|
||||
" type=\"float\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"document_content_description = \"Brief summary of a movie\"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ea9df8d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Testing it out\n",
|
||||
"And now we can try actually using our retriever!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "38a126e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='dinosaur' filter=None limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n",
|
||||
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example only specifies a relevant query\n",
|
||||
"retriever.get_relevant_documents(\"What are some movies about dinosaurs\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "fc3f1e6e",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5) limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example only specifies a filter\n",
|
||||
"retriever.get_relevant_documents(\"I want to watch a movie rated higher than 8.5\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "b19d4da0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example specifies a query and a filter\n",
|
||||
"retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "f900e40e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='science fiction')]) limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example specifies a composite filter\n",
|
||||
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "12a51522",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='toys' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.GT: 'gt'>, attribute='year', value=1990), Comparison(comparator=<Comparator.LT: 'lt'>, attribute='year', value=2005), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre', value='animated')]) limit=None\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example specifies a query and composite filter\n",
|
||||
"retriever.get_relevant_documents(\"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Filter k\n",
|
||||
"\n",
|
||||
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n",
|
||||
"\n",
|
||||
"We can do this by passing `enable_limit=True` to the constructor."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "bff36b88-b506-4877-9c63-e5a1a8d78e64",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, \n",
|
||||
" vectorstore, \n",
|
||||
" document_content_description, \n",
|
||||
" metadata_field_info, \n",
|
||||
" enable_limit=True,\n",
|
||||
" verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "2758d229-4f97-499c-819f-888acaf8ee10",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"query='dinosaur' filter=None limit=2\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction'}),\n",
|
||||
" Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# This example only specifies a relevant query\n",
|
||||
"retriever.get_relevant_documents(\"what are two movies about dinosaurs\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9fc6205b",
|
||||
"metadata": {},
|
||||
@@ -13,6 +14,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "51489529-5dcd-4b86-bda6-de0a39d8ffd1",
|
||||
"metadata": {},
|
||||
@@ -21,6 +23,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1435c804-069d-4ade-9a7b-006b97b767c1",
|
||||
"metadata": {},
|
||||
@@ -41,6 +44,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6c15470b-a16b-4e0d-bc6a-6998bafbb5a4",
|
||||
"metadata": {},
|
||||
@@ -54,6 +58,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ae3c3d16",
|
||||
"metadata": {},
|
||||
@@ -62,6 +67,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6fafb73b-d6ec-4822-b161-edf0aaf5224a",
|
||||
"metadata": {},
|
||||
@@ -145,6 +151,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2670363b-3806-4c7e-b14d-90a4d5d2a200",
|
||||
"metadata": {},
|
||||
@@ -266,7 +273,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -2,15 +2,21 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Zep\n",
|
||||
"# Zep Memory\n",
|
||||
"\n",
|
||||
">[Zep](https://docs.getzep.com/) - A long-term memory store for LLM applications.\n",
|
||||
"## Retriever Example\n",
|
||||
"\n",
|
||||
"More on `Zep`:\n",
|
||||
"This notebook demonstrates how to search historical chat message histories using the [Zep Long-term Memory Store](https://getzep.github.io/).\n",
|
||||
"\n",
|
||||
"`Zep` stores, summarizes, embeds, indexes, and enriches conversational AI chat histories, and exposes them via simple, low-latency APIs.\n",
|
||||
"We'll demonstrate:\n",
|
||||
"\n",
|
||||
"1. Adding conversation history to the Zep memory store.\n",
|
||||
"2. Vector search over the conversation history.\n",
|
||||
"\n",
|
||||
"More on Zep:\n",
|
||||
"\n",
|
||||
"Zep stores, summarizes, embeds, indexes, and enriches conversational AI chat histories, and exposes them via simple, low-latency APIs.\n",
|
||||
"\n",
|
||||
"Key Features:\n",
|
||||
"\n",
|
||||
@@ -22,37 +28,15 @@
|
||||
"\n",
|
||||
"Zep's Go Extractor model is easily extensible, with a simple, clean interface available to build new enrichment functionality, such as summarizers, entity extractors, embedders, and more.\n",
|
||||
"\n",
|
||||
"`Zep` project: [https://github.com/getzep/zep](https://github.com/getzep/zep)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Retriever Example\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to search historical chat message histories using the [Zep Long-term Memory Store](https://getzep.github.io/).\n",
|
||||
"\n",
|
||||
"We'll demonstrate:\n",
|
||||
"\n",
|
||||
"1. Adding conversation history to the Zep memory store.\n",
|
||||
"2. Vector search over the conversation history.\n",
|
||||
"\n"
|
||||
]
|
||||
"Zep project: [https://github.com/getzep/zep](https://github.com/getzep/zep)\n"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:27.863217Z",
|
||||
"start_time": "2023-05-25T15:03:25.690273Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.chat_message_histories import ZepChatMessageHistory\n",
|
||||
@@ -61,30 +45,29 @@
|
||||
"\n",
|
||||
"# Set this to your Zep server URL\n",
|
||||
"ZEP_API_URL = \"http://localhost:8000\""
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:27.863217Z",
|
||||
"start_time": "2023-05-25T15:03:25.690273Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialize the Zep Chat Message History Class and add a chat message history to the memory store\n",
|
||||
"\n",
|
||||
"**NOTE:** Unlike other Retrievers, the content returned by the Zep Retriever is session/user specific. A `session_id` is required when instantiating the Retriever."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:29.118416Z",
|
||||
"start_time": "2023-05-25T15:03:29.022464Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"session_id = str(uuid4()) # This is a unique identifier for the user/session\n",
|
||||
@@ -94,21 +77,18 @@
|
||||
" session_id=session_id,\n",
|
||||
" url=ZEP_API_URL,\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:29.118416Z",
|
||||
"start_time": "2023-05-25T15:03:29.022464Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:30.271181Z",
|
||||
"start_time": "2023-05-25T15:03:30.180442Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Preload some messages into the memory. The default message window is 12 messages. We want to push beyond this to demonstrate auto-summarization.\n",
|
||||
@@ -177,42 +157,35 @@
|
||||
" if msg[\"role\"] == \"human\"\n",
|
||||
" else AIMessage(content=msg[\"content\"])\n",
|
||||
" )\n"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:30.271181Z",
|
||||
"start_time": "2023-05-25T15:03:30.180442Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Use the Zep Retriever to vector search over the Zep memory\n",
|
||||
"\n",
|
||||
"Zep provides native vector search over historical conversation memory. Embedding happens automatically.\n",
|
||||
"\n",
|
||||
"NOTE: Embedding of messages occurs asynchronously, so the first query may not return results. Subsequent queries will return results as the embeddings are generated."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:32.979155Z",
|
||||
"start_time": "2023-05-25T15:03:32.590310Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7759001673780126, 'uuid': '3a82a02f-056e-4c6a-b960-67ebdf3b2b93', 'created_at': '2023-05-25T15:03:30.2041Z', 'role': 'human', 'token_count': 8}),\n",
|
||||
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602262941130749, 'uuid': 'a2fc9c21-0897-46c8-bef7-6f5c0f71b04a', 'created_at': '2023-05-25T15:03:30.248065Z', 'role': 'ai', 'token_count': 27}),\n",
|
||||
" Document(page_content='Who were her contemporaries?', metadata={'score': 0.757553366415519, 'uuid': '41f9c41a-a205-41e1-b48b-a0a4cd943fc8', 'created_at': '2023-05-25T15:03:30.243995Z', 'role': 'human', 'token_count': 8}),\n",
|
||||
" Document(page_content='Octavia Estelle Butler (June 22, 1947 – February 24, 2006) was an American science fiction author.', metadata={'score': 0.7546211059317948, 'uuid': '34678311-0098-4f1a-8fd4-5615ac692deb', 'created_at': '2023-05-25T15:03:30.231427Z', 'role': 'ai', 'token_count': 31}),\n",
|
||||
" Document(page_content='Which books of hers were made into movies?', metadata={'score': 0.7496714959247069, 'uuid': '18046c3a-9666-4d3e-b4f0-43d1394732b7', 'created_at': '2023-05-25T15:03:30.236837Z', 'role': 'human', 'token_count': 11})]"
|
||||
]
|
||||
"text/plain": "[Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7759001673780126, 'uuid': '3a82a02f-056e-4c6a-b960-67ebdf3b2b93', 'created_at': '2023-05-25T15:03:30.2041Z', 'role': 'human', 'token_count': 8}),\n Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602262941130749, 'uuid': 'a2fc9c21-0897-46c8-bef7-6f5c0f71b04a', 'created_at': '2023-05-25T15:03:30.248065Z', 'role': 'ai', 'token_count': 27}),\n Document(page_content='Who were her contemporaries?', metadata={'score': 0.757553366415519, 'uuid': '41f9c41a-a205-41e1-b48b-a0a4cd943fc8', 'created_at': '2023-05-25T15:03:30.243995Z', 'role': 'human', 'token_count': 8}),\n Document(page_content='Octavia Estelle Butler (June 22, 1947 – February 24, 2006) was an American science fiction author.', metadata={'score': 0.7546211059317948, 'uuid': '34678311-0098-4f1a-8fd4-5615ac692deb', 'created_at': '2023-05-25T15:03:30.231427Z', 'role': 'ai', 'token_count': 31}),\n Document(page_content='Which books of hers were made into movies?', metadata={'score': 0.7496714959247069, 'uuid': '18046c3a-9666-4d3e-b4f0-43d1394732b7', 'created_at': '2023-05-25T15:03:30.236837Z', 'role': 'human', 'token_count': 11})]"
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
@@ -229,38 +202,31 @@
|
||||
")\n",
|
||||
"\n",
|
||||
"await zep_retriever.aget_relevant_documents(\"Who wrote Parable of the Sower?\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:32.979155Z",
|
||||
"start_time": "2023-05-25T15:03:32.590310Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also use the Zep sync API to retrieve results:"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:34.713354Z",
|
||||
"start_time": "2023-05-25T15:03:34.577974Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.8897321402776546, 'uuid': '1c09603a-52c1-40d7-9d69-29f26256029c', 'created_at': '2023-05-25T15:03:30.268257Z', 'role': 'ai', 'token_count': 56}),\n",
|
||||
" Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.8857628682610436, 'uuid': 'f6706e8c-6c91-452f-8c1b-9559fd924657', 'created_at': '2023-05-25T15:03:30.265302Z', 'role': 'human', 'token_count': 23}),\n",
|
||||
" Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7759670375149477, 'uuid': '3a82a02f-056e-4c6a-b960-67ebdf3b2b93', 'created_at': '2023-05-25T15:03:30.2041Z', 'role': 'human', 'token_count': 8}),\n",
|
||||
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602854653476563, 'uuid': 'a2fc9c21-0897-46c8-bef7-6f5c0f71b04a', 'created_at': '2023-05-25T15:03:30.248065Z', 'role': 'ai', 'token_count': 27}),\n",
|
||||
" Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7595293992240313, 'uuid': 'f22f2498-6118-4c74-8718-aa89ccd7e3d6', 'created_at': '2023-05-25T15:03:30.261198Z', 'role': 'ai', 'token_count': 18})]"
|
||||
]
|
||||
"text/plain": "[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.8897321402776546, 'uuid': '1c09603a-52c1-40d7-9d69-29f26256029c', 'created_at': '2023-05-25T15:03:30.268257Z', 'role': 'ai', 'token_count': 56}),\n Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.8857628682610436, 'uuid': 'f6706e8c-6c91-452f-8c1b-9559fd924657', 'created_at': '2023-05-25T15:03:30.265302Z', 'role': 'human', 'token_count': 23}),\n Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7759670375149477, 'uuid': '3a82a02f-056e-4c6a-b960-67ebdf3b2b93', 'created_at': '2023-05-25T15:03:30.2041Z', 'role': 'human', 'token_count': 8}),\n Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602854653476563, 'uuid': 'a2fc9c21-0897-46c8-bef7-6f5c0f71b04a', 'created_at': '2023-05-25T15:03:30.248065Z', 'role': 'ai', 'token_count': 27}),\n Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7595293992240313, 'uuid': 'f22f2498-6118-4c74-8718-aa89ccd7e3d6', 'created_at': '2023-05-25T15:03:30.261198Z', 'role': 'ai', 'token_count': 18})]"
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
@@ -269,44 +235,48 @@
|
||||
],
|
||||
"source": [
|
||||
"zep_retriever.get_relevant_documents(\"Who wrote Parable of the Sower?\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-25T15:03:34.713354Z",
|
||||
"start_time": "2023-05-25T15:03:34.577974Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"outputs": [],
|
||||
"source": [],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-18T20:09:21.298710Z",
|
||||
"start_time": "2023-05-18T20:09:21.297169Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
|
||||
@@ -1,131 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73dbcdb9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SentenceTransformersTokenTextSplitter\n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use the `SentenceTransformersTokenTextSplitter` text splitter.\n",
|
||||
"\n",
|
||||
"Language models have a token limit. You should not exceed the token limit. When you split your text into chunks it is therefore a good idea to count the number of tokens. There are many tokenizers. When you count tokens in your text you should use the same tokenizer as used in the language model. \n",
|
||||
"\n",
|
||||
"The `SentenceTransformersTokenTextSplitter` is a specialized text splitter for use with the sentence-transformer models. The default behaviour is to split the text into chunks that fit the token window of the sentence transformer model that you would like to use."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "9dd5419e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import SentenceTransformersTokenTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "b43e5d54",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0)\n",
|
||||
"text = \"Lorem \""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "1df84cb4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"2\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"count_start_and_stop_tokens = 2\n",
|
||||
"text_token_count = splitter.count_tokens(text=text) - count_start_and_stop_tokens\n",
|
||||
"print(text_token_count)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "d7ad2213",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"tokens in text to split: 514\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"token_multiplier = splitter.maximum_tokens_per_chunk // text_token_count + 1\n",
|
||||
"\n",
|
||||
"# `text_to_split` does not fit in a single chunk\n",
|
||||
"text_to_split = text * token_multiplier\n",
|
||||
"\n",
|
||||
"print(f\"tokens in text to split: {splitter.count_tokens(text=text_to_split)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "818aea04",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"lorem\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"text_chunks = splitter.split_text(text=text_to_split)\n",
|
||||
"\n",
|
||||
"print(text_chunks[1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e9ba4f23",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,580 +1,238 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {
|
||||
"id": "683953b3"
|
||||
},
|
||||
"source": [
|
||||
"# ElasticSearch\n",
|
||||
"\n",
|
||||
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Elasticsearch` database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticVectorSearch class"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409"
|
||||
},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "81f43794-f002-477c-9b68-4975df30e718",
|
||||
"metadata": {
|
||||
"id": "81f43794-f002-477c-9b68-4975df30e718"
|
||||
},
|
||||
"source": [
|
||||
"Check out [Elasticsearch installation instructions](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html).\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that does not require\n",
|
||||
"login credentials, pass the Elasticsearch URL and index name along with the\n",
|
||||
"embedding object to the constructor.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=\"http://localhost:9200\",\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that requires login credentials,\n",
|
||||
"including Elastic Cloud, use the Elasticsearch URL format\n",
|
||||
"https://username:password@es_host:9243. For example, to connect to Elastic\n",
|
||||
"Cloud, create the Elasticsearch URL with the required authentication details and\n",
|
||||
"pass it to the ElasticVectorSearch constructor as the named parameter\n",
|
||||
"elasticsearch_url.\n",
|
||||
"\n",
|
||||
"You can obtain your Elastic Cloud URL and login credentials by logging in to the\n",
|
||||
"Elastic Cloud console at https://cloud.elastic.co, selecting your deployment, and\n",
|
||||
"navigating to the \"Deployments\" page.\n",
|
||||
"\n",
|
||||
"To obtain your Elastic Cloud password for the default \"elastic\" user:\n",
|
||||
"1. Log in to the Elastic Cloud console at https://cloud.elastic.co\n",
|
||||
"2. Go to \"Security\" > \"Users\"\n",
|
||||
"3. Locate the \"elastic\" user and click \"Edit\"\n",
|
||||
"4. Click \"Reset password\"\n",
|
||||
"5. Follow the prompts to reset the password\n",
|
||||
"\n",
|
||||
"Format for Elastic Cloud URLs is\n",
|
||||
"https://username:password@cluster_id.region_id.gcp.cloud.es.io:9243.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
" elastic_host = \"cluster_id.region_id.gcp.cloud.es.io\"\n",
|
||||
" elasticsearch_url = f\"https://username:password@{elastic_host}:9243\"\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=elasticsearch_url,\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install elasticsearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"outputId": "fd16b37f-cb76-40a9-b83f-eab58dd0d912"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key: ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0"
|
||||
},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "aac9563e"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import ElasticVectorSearch\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "a3c3999a"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12eb86d8",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "12eb86d8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = ElasticVectorSearch.from_documents(docs, embeddings, elasticsearch_url=\"http://localhost:9200\")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4b172de8",
|
||||
"metadata": {
|
||||
"id": "4b172de8",
|
||||
"outputId": "ca05a209-4514-4b5c-f6cb-2348f58c19a2"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
|
||||
"\n",
|
||||
"We cannot let this happen. \n",
|
||||
"\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticKnnSearch Class\n",
|
||||
"The `ElasticKnnSearch` implements features allowing storing vectors and documents in Elasticsearch for use with approximate [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip install langchain elasticsearch"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"from langchain.vectorstores.elastic_vector_search import ElasticKnnSearch\n",
|
||||
"from langchain.embeddings import ElasticsearchEmbeddings\n",
|
||||
"import elasticsearch"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize ElasticsearchEmbeddings\n",
|
||||
"model_id = \"<model_id_from_es>\" \n",
|
||||
"dims = dim_count\n",
|
||||
"es_cloud_id = \"ESS_CLOUD_ID\"\n",
|
||||
"es_user = \"es_user\"\n",
|
||||
"es_password = \"es_pass\"\n",
|
||||
"test_index = \"<index_name>\"\n",
|
||||
"#input_field = \"your_input_field\" # if different from 'text_field'"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Generate embedding object\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
|
||||
" model_id,\n",
|
||||
" #input_field=input_field,\n",
|
||||
" es_cloud_id=es_cloud_id,\n",
|
||||
" es_user=es_user,\n",
|
||||
" es_password=es_password,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
"\tes_cloud_id=es_cloud_id, \n",
|
||||
"\tes_user=es_user, \n",
|
||||
"\tes_password=es_password, \n",
|
||||
"\tindex_name= test_index, \n",
|
||||
"\tembedding= embeddings\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test adding vectors"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `add_texts` method\n",
|
||||
"texts = [\"Hello, world!\", \"Machine learning is fun.\", \"I love Python.\"]\n",
|
||||
"knn_search.add_texts(texts)\n",
|
||||
"\n",
|
||||
"# Test `from_texts` method\n",
|
||||
"new_texts = [\"This is a new text.\", \"Elasticsearch is powerful.\", \"Python is great for data analysis.\"]\n",
|
||||
"knn_search.from_texts(new_texts, dims=dims)"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test knn search using query vector builder "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query = query, model_id= model_id, k=2)\n",
|
||||
"print(f\"kNN search results for query '{query}': {knn_result}\")\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")\n",
|
||||
"\n",
|
||||
"# Test `hybrid_search` method\n",
|
||||
"query = \"Hello\"\n",
|
||||
"hybrid_result = knn_search.knn_hybrid_search(query = query, model_id= model_id, k=2)\n",
|
||||
"print(f\"Hybrid search results for query '{query}': {hybrid_result}\")\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{hybrid_result['hits']['hits'][0]['_source']['text']}'\")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test knn search using pre generated vector \n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Generate embedding for tests\n",
|
||||
"query_text = 'Hello'\n",
|
||||
"query_embedding = embeddings.embed_query(query_text)\n",
|
||||
"print(f\"Length of embedding: {len(query_embedding)}\\nFirst two items in embedding: {query_embedding[:2]}\")\n",
|
||||
"\n",
|
||||
"# Test knn Search\n",
|
||||
"knn_result = knn_search.knn_search(query_vector = query_embedding, k=2)\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")\n",
|
||||
"\n",
|
||||
"# Test hybrid search - Requires both query_text and query_vector\n",
|
||||
"knn_result = knn_search.knn_hybrid_search(query_vector = query_embedding, query=query_text, k=2)\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test source option"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query = query, model_id= model_id, k=2, source=False)\n",
|
||||
"assert not '_source' in knn_result['hits']['hits'][0].keys()\n",
|
||||
"\n",
|
||||
"# Test `hybrid_search` method\n",
|
||||
"query = \"Hello\"\n",
|
||||
"hybrid_result = knn_search.knn_hybrid_search(query = query, model_id= model_id, k=2, source=False)\n",
|
||||
"assert not '_source' in hybrid_result['hits']['hits'][0].keys()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Test fields option "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query = query, model_id= model_id, k=2, fields=['text'])\n",
|
||||
"assert 'text' in knn_result['hits']['hits'][0]['fields'].keys()\n",
|
||||
"\n",
|
||||
"# Test `hybrid_search` method\n",
|
||||
"query = \"Hello\"\n",
|
||||
"hybrid_result = knn_search.knn_hybrid_search(query = query, model_id= model_id, k=2, fields=['text'])\n",
|
||||
"assert 'text' in hybrid_result['hits']['hits'][0]['fields'].keys()"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### Test with es client connection rather than cloud_id "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Create Elasticsearch connection\n",
|
||||
"es_connection = Elasticsearch(\n",
|
||||
" hosts=['https://es_cluster_url:port'], \n",
|
||||
" basic_auth=('user', 'password')\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Instantiate ElasticsearchEmbeddings using es_connection\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_es_connection(\n",
|
||||
" model_id,\n",
|
||||
" es_connection,\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
"\tes_connection = es_connection,\n",
|
||||
"\tindex_name= test_index, \n",
|
||||
"\tembedding= embeddings\n",
|
||||
")"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
"knn_result = knn_search.knn_search(query = query, model_id= model_id, k=2)\n",
|
||||
"print(f\"kNN search results for query '{query}': {knn_result}\")\n",
|
||||
"print(f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\")\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0kgyaL6QrYVF"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "0kgyaL6QrYVF"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"colab": {
|
||||
"provenance": []
|
||||
}
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ElasticSearch\n",
|
||||
"\n",
|
||||
">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Elasticsearch` database."
|
||||
]
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Installation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "81f43794-f002-477c-9b68-4975df30e718",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Check out [Elasticsearch installation instructions](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html).\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that does not require\n",
|
||||
"login credentials, pass the Elasticsearch URL and index name along with the\n",
|
||||
"embedding object to the constructor.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=\"http://localhost:9200\",\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"To connect to an Elasticsearch instance that requires login credentials,\n",
|
||||
"including Elastic Cloud, use the Elasticsearch URL format\n",
|
||||
"https://username:password@es_host:9243. For example, to connect to Elastic\n",
|
||||
"Cloud, create the Elasticsearch URL with the required authentication details and\n",
|
||||
"pass it to the ElasticVectorSearch constructor as the named parameter\n",
|
||||
"elasticsearch_url.\n",
|
||||
"\n",
|
||||
"You can obtain your Elastic Cloud URL and login credentials by logging in to the\n",
|
||||
"Elastic Cloud console at https://cloud.elastic.co, selecting your deployment, and\n",
|
||||
"navigating to the \"Deployments\" page.\n",
|
||||
"\n",
|
||||
"To obtain your Elastic Cloud password for the default \"elastic\" user:\n",
|
||||
"1. Log in to the Elastic Cloud console at https://cloud.elastic.co\n",
|
||||
"2. Go to \"Security\" > \"Users\"\n",
|
||||
"3. Locate the \"elastic\" user and click \"Edit\"\n",
|
||||
"4. Click \"Reset password\"\n",
|
||||
"5. Follow the prompts to reset the password\n",
|
||||
"\n",
|
||||
"Format for Elastic Cloud URLs is\n",
|
||||
"https://username:password@cluster_id.region_id.gcp.cloud.es.io:9243.\n",
|
||||
"\n",
|
||||
"Example:\n",
|
||||
"```python\n",
|
||||
" from langchain import ElasticVectorSearch\n",
|
||||
" from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
" embedding = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
" elastic_host = \"cluster_id.region_id.gcp.cloud.es.io\"\n",
|
||||
" elasticsearch_url = f\"https://username:password@{elastic_host}:9243\"\n",
|
||||
" elastic_vector_search = ElasticVectorSearch(\n",
|
||||
" elasticsearch_url=elasticsearch_url,\n",
|
||||
" index_name=\"test_index\",\n",
|
||||
" embedding=embedding\n",
|
||||
" )\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install elasticsearch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key: ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import ElasticVectorSearch\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12eb86d8",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = ElasticVectorSearch.from_documents(docs, embeddings, elasticsearch_url=\"http://localhost:9200\")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "4b172de8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
|
||||
"\n",
|
||||
"We cannot let this happen. \n",
|
||||
"\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
|
||||
@@ -118,14 +118,15 @@
|
||||
"\n",
|
||||
"db_name = \"lanchain_db\"\n",
|
||||
"collection_name = \"langchain_col\"\n",
|
||||
"collection = client[db_name][collection_name]\n",
|
||||
"namespace = f\"{db_name}.{collection_name}\"\n",
|
||||
"index_name = \"langchain_demo\"\n",
|
||||
"\n",
|
||||
"# insert the documents in MongoDB Atlas with their embedding\n",
|
||||
"docsearch = MongoDBAtlasVectorSearch.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" collection=collection,\n",
|
||||
" client=client,\n",
|
||||
" namespace=namespace,\n",
|
||||
" index_name=index_name\n",
|
||||
")\n",
|
||||
"\n",
|
||||
|
||||
@@ -24,48 +24,13 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 60,
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Requirement already satisfied: pgvector in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (0.1.8)\n",
|
||||
"Requirement already satisfied: numpy in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from pgvector) (1.24.3)\n",
|
||||
"Requirement already satisfied: openai in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (0.27.7)\n",
|
||||
"Requirement already satisfied: requests>=2.20 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from openai) (2.28.2)\n",
|
||||
"Requirement already satisfied: tqdm in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from openai) (4.65.0)\n",
|
||||
"Requirement already satisfied: aiohttp in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from openai) (3.8.4)\n",
|
||||
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (3.1.0)\n",
|
||||
"Requirement already satisfied: idna<4,>=2.5 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (3.4)\n",
|
||||
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (1.26.15)\n",
|
||||
"Requirement already satisfied: certifi>=2017.4.17 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.20->openai) (2023.5.7)\n",
|
||||
"Requirement already satisfied: attrs>=17.3.0 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (23.1.0)\n",
|
||||
"Requirement already satisfied: multidict<7.0,>=4.5 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (6.0.4)\n",
|
||||
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (4.0.2)\n",
|
||||
"Requirement already satisfied: yarl<2.0,>=1.0 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.9.2)\n",
|
||||
"Requirement already satisfied: frozenlist>=1.1.1 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.3.3)\n",
|
||||
"Requirement already satisfied: aiosignal>=1.1.2 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from aiohttp->openai) (1.3.1)\n",
|
||||
"Requirement already satisfied: psycopg2-binary in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (2.9.6)\n",
|
||||
"Requirement already satisfied: tiktoken in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (0.4.0)\n",
|
||||
"Requirement already satisfied: regex>=2022.1.18 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from tiktoken) (2023.5.5)\n",
|
||||
"Requirement already satisfied: requests>=2.26.0 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from tiktoken) (2.28.2)\n",
|
||||
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.1.0)\n",
|
||||
"Requirement already satisfied: idna<4,>=2.5 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (3.4)\n",
|
||||
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (1.26.15)\n",
|
||||
"Requirement already satisfied: certifi>=2017.4.17 in /Users/joyeed/langchain/langchain/.venv/lib/python3.9/site-packages (from requests>=2.26.0->tiktoken) (2023.5.7)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Pip install necessary package\n",
|
||||
"!pip install pgvector\n",
|
||||
"!pip install openai\n",
|
||||
"!pip install psycopg2-binary\n",
|
||||
"!pip install tiktoken"
|
||||
"!pip install pgvector"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -77,17 +42,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
@@ -97,7 +54,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 61,
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
@@ -108,7 +65,7 @@
|
||||
"False"
|
||||
]
|
||||
},
|
||||
"execution_count": 61,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -122,7 +79,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 62,
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
@@ -137,7 +94,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 63,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -172,29 +129,6 @@
|
||||
"# postgresql+psycopg2://username:password@localhost:5432/database_name"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 64,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# ## PGVector needs the connection string to the database.\n",
|
||||
"# ## We will load it from the environment variables.\n",
|
||||
"# import os\n",
|
||||
"# CONNECTION_STRING = PGVector.connection_string_from_db_params(\n",
|
||||
"# driver=os.environ.get(\"PGVECTOR_DRIVER\", \"psycopg2\"),\n",
|
||||
"# host=os.environ.get(\"PGVECTOR_HOST\", \"localhost\"),\n",
|
||||
"# port=int(os.environ.get(\"PGVECTOR_PORT\", \"5432\")),\n",
|
||||
"# database=os.environ.get(\"PGVECTOR_DATABASE\", \"rd-embeddings\"),\n",
|
||||
"# user=os.environ.get(\"PGVECTOR_USER\", \"admin\"),\n",
|
||||
"# password=os.environ.get(\"PGVECTOR_PASSWORD\", \"password\"),\n",
|
||||
"# )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# ## Example\n",
|
||||
"# # postgresql+psycopg2://username:password@localhost:5432/database_name"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -211,7 +145,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 69,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -231,7 +165,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 70,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -239,7 +173,7 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.6076804864602984\n",
|
||||
"Score: 0.6076628081132506\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
@@ -249,7 +183,7 @@
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.6076804864602984\n",
|
||||
"Score: 0.6076628081132506\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
@@ -259,32 +193,24 @@
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.659062774389974\n",
|
||||
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
|
||||
"Score: 0.6076804780049968\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
|
||||
"\n",
|
||||
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
|
||||
"\n",
|
||||
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.659062774389974\n",
|
||||
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
|
||||
"Score: 0.6076804780049968\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
|
||||
"\n",
|
||||
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
|
||||
"\n",
|
||||
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"--------------------------------------------------------------------------------\n"
|
||||
]
|
||||
}
|
||||
@@ -298,6 +224,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -305,6 +232,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -313,14 +241,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 55,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data=docs\n",
|
||||
"api_key=os.environ['OPENAI_API_KEY']\n",
|
||||
"db = PGVector.from_documents(\n",
|
||||
" documents=docs,\n",
|
||||
" documents=data,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" collection_name=collection_name,\n",
|
||||
" connection_string=connection_string,\n",
|
||||
@@ -331,6 +257,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -339,14 +266,10 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 56,
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"connection_string = CONNECTION_STRING \n",
|
||||
"embedding=embeddings\n",
|
||||
"collection_name=\"state_of_the_union\"\n",
|
||||
"from langchain.vectorstores.pgvector import DistanceStrategy\n",
|
||||
"store = PGVector(\n",
|
||||
" connection_string=connection_string, \n",
|
||||
" embedding_function=embedding, \n",
|
||||
@@ -356,127 +279,6 @@
|
||||
"\n",
|
||||
"retriever = store.as_retriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"vectorstore=<langchain.vectorstores.pgvector.PGVector object at 0x7fe9a1b1c670> search_type='similarity' search_kwargs={}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(retriever)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 83,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6075870262188066), (Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6075870262188066), (Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6589478388546668), (Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6589478388546668)]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# When we have an existing PG VEctor \n",
|
||||
"DEFAULT_DISTANCE_STRATEGY = DistanceStrategy.EUCLIDEAN\n",
|
||||
"db1 = PGVector.from_existing_index(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" collection_name=\"state_of_the_union\",\n",
|
||||
" distance_strategy=DEFAULT_DISTANCE_STRATEGY,\n",
|
||||
" pre_delete_collection = False,\n",
|
||||
" connection_string=CONNECTION_STRING,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs_with_score: List[Tuple[Document, float]] = db1.similarity_search_with_score(query)\n",
|
||||
"print(docs_with_score)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 81,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.6075870262188066\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.6075870262188066\n",
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.6589478388546668\n",
|
||||
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
|
||||
"\n",
|
||||
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
|
||||
"\n",
|
||||
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
|
||||
"\n",
|
||||
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
|
||||
"\n",
|
||||
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
|
||||
"\n",
|
||||
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"--------------------------------------------------------------------------------\n",
|
||||
"Score: 0.6589478388546668\n",
|
||||
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
|
||||
"\n",
|
||||
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
|
||||
"\n",
|
||||
"We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
|
||||
"\n",
|
||||
"We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
|
||||
"\n",
|
||||
"We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
|
||||
"\n",
|
||||
"We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
|
||||
"--------------------------------------------------------------------------------\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for doc, score in docs_with_score:\n",
|
||||
" print(\"-\" * 80)\n",
|
||||
" print(\"Score: \", score)\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"-\" * 80)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -495,7 +297,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.7"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -401,18 +401,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "525e3582",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Metadata filtering\n",
|
||||
"\n",
|
||||
"Qdrant has an [extensive filtering system](https://qdrant.tech/documentation/concepts/filtering/) with rich type support. It is also possible to use the filters in Langchain, by passing an additional param to both the `similarity_search_with_score` and `similarity_search` methods."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1c2c58dc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```python\n",
|
||||
"from qdrant_client.http import models as rest\n",
|
||||
@@ -420,7 +419,10 @@
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = qdrant.similarity_search_with_score(query, filter=rest.Filter(...))\n",
|
||||
"```"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
@@ -681,7 +683,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -9,15 +9,16 @@ By default, Chains and Agents are stateless,
|
||||
meaning that they treat each incoming query independently (as are the underlying LLMs and chat models).
|
||||
In some applications (chatbots being a GREAT example) it is highly important
|
||||
to remember previous interactions, both at a short term but also at a long term level.
|
||||
The **Memory** does exactly that.
|
||||
The concept of “Memory” exists to do exactly that.
|
||||
|
||||
LangChain provides memory components in two forms.
|
||||
First, LangChain provides helper utilities for managing and manipulating previous chat messages.
|
||||
These are designed to be modular and useful regardless of how they are used.
|
||||
Secondly, LangChain provides easy ways to incorporate these utilities into chains.
|
||||
|
||||
|
|
||||
- `Getting Started <./memory/getting_started.html>`_: An overview of different types of memory.
|
||||
The following sections of documentation are provided:
|
||||
|
||||
- `Getting Started <./memory/getting_started.html>`_: An overview of how to get started with different types of memory.
|
||||
|
||||
- `How-To Guides <./memory/how_to_guides.html>`_: A collection of how-to guides. These highlight different types of memory, as well as how to use memory in chains.
|
||||
|
||||
@@ -27,7 +28,6 @@ Secondly, LangChain provides easy ways to incorporate these utilities into chain
|
||||
:maxdepth: 1
|
||||
:caption: Memory
|
||||
:name: Memory
|
||||
:hidden:
|
||||
|
||||
./memory/getting_started.html
|
||||
./memory/getting_started.ipynb
|
||||
./memory/how_to_guides.rst
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user