mirror of
https://github.com/hwchase17/langchain.git
synced 2026-04-21 19:27:58 +00:00
Compare commits
1 Commits
v0.0.180
...
eugene/pro
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
900c87c018 |
@@ -1,20 +0,0 @@
|
||||
# ModelScope
|
||||
|
||||
This page covers how to use the modelscope ecosystem within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific modelscope wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
* Install the Python SDK with `pip install modelscope`
|
||||
|
||||
## Wrappers
|
||||
|
||||
### Embeddings
|
||||
|
||||
There exists a modelscope Embeddings wrapper, which you can access with
|
||||
|
||||
```python
|
||||
from langchain.embeddings import ModelScopeEmbeddings
|
||||
```
|
||||
|
||||
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/modelscope_hub.ipynb)
|
||||
@@ -23,7 +23,7 @@ The results of these actions can then be fed back into the language model to gen
|
||||
## ReAct
|
||||
|
||||
`ReAct` is a prompting technique that combines Chain-of-Thought prompting with action plan generation.
|
||||
This induces the model to think about what action to take, then take it.
|
||||
This induces the to model to think about what action to take, then take it.
|
||||
|
||||
- [Paper](https://arxiv.org/pdf/2210.03629.pdf)
|
||||
- [LangChain Example](../modules/agents/agents/examples/react.ipynb)
|
||||
|
||||
@@ -67,8 +67,8 @@ For each module LangChain provides standard, extendable interfaces. LangChain al
|
||||
|
||||
./modules/models.rst
|
||||
./modules/prompts.rst
|
||||
./modules/memory.md
|
||||
./modules/indexes.md
|
||||
./modules/memory.md
|
||||
./modules/chains.md
|
||||
./modules/agents.md
|
||||
./modules/callbacks/getting_started.ipynb
|
||||
@@ -115,8 +115,8 @@ Use Cases
|
||||
./use_cases/tabular.rst
|
||||
./use_cases/code.md
|
||||
./use_cases/apis.md
|
||||
./use_cases/extraction.md
|
||||
./use_cases/summarization.md
|
||||
./use_cases/extraction.md
|
||||
./use_cases/evaluation.rst
|
||||
|
||||
|
||||
@@ -126,10 +126,7 @@ Reference Docs
|
||||
| Full documentation on all methods, classes, installation methods, and integration setups for LangChain.
|
||||
|
||||
|
||||
- `LangChain Installation <./reference/installation.html>`_
|
||||
|
||||
- `Reference Documentation <./reference.html>`_
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:caption: Reference
|
||||
@@ -144,16 +141,14 @@ Ecosystem
|
||||
------------
|
||||
|
||||
| LangChain integrates a lot of different LLMs, systems, and products.
|
||||
| From the other side, many systems and products depend on LangChain.
|
||||
| It creates a vibrant and thriving ecosystem.
|
||||
From the other side, many systems and products depend on LangChain.
|
||||
It creates a vibrant and thriving ecosystem.
|
||||
|
||||
|
||||
- `Integrations <./integrations.html>`_: Guides for how other products can be used with LangChain.
|
||||
|
||||
- `Dependents <./dependents.html>`_: List of repositories that use LangChain.
|
||||
|
||||
- `Deployments <./ecosystem/deployments.html>`_: A collection of instructions, code snippets, and template repositories for deploying LangChain apps.
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
@@ -164,7 +159,6 @@ Ecosystem
|
||||
|
||||
./integrations.rst
|
||||
./dependents.md
|
||||
./ecosystem/deployments.md
|
||||
|
||||
|
||||
Additional Resources
|
||||
@@ -176,6 +170,8 @@ Additional Resources
|
||||
|
||||
- `Gallery <https://github.com/kyrolabs/awesome-langchain>`_: A collection of great projects that use Langchain, compiled by the folks at `Kyrolabs <https://kyrolabs.com>`_. Useful for finding inspiration and example implementations.
|
||||
|
||||
- `Deployments <./additional_resources/deployments.html>`_: A collection of instructions, code snippets, and template repositories for deploying LangChain apps.
|
||||
|
||||
- `Tracing <./additional_resources/tracing.html>`_: A guide on using tracing in LangChain to visualize the execution of chains and agents.
|
||||
|
||||
- `Model Laboratory <./additional_resources/model_laboratory.html>`_: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so.
|
||||
@@ -195,6 +191,7 @@ Additional Resources
|
||||
|
||||
LangChainHub <https://github.com/hwchase17/langchain-hub>
|
||||
Gallery <https://github.com/kyrolabs/awesome-langchain>
|
||||
./additional_resources/deployments.md
|
||||
./additional_resources/tracing.md
|
||||
./additional_resources/model_laboratory.ipynb
|
||||
Discord <https://discord.gg/6adMQxSpJS>
|
||||
|
||||
@@ -6,7 +6,7 @@ LangChain integrates with many LLMs, systems, and products.
|
||||
Integrations by Module
|
||||
--------------------------------
|
||||
|
||||
| Integrations grouped by the core LangChain module they map to:
|
||||
Integrations grouped by the core LangChain module they map to:
|
||||
|
||||
|
||||
- `LLM Providers <./modules/models/llms/integrations.html>`_
|
||||
@@ -23,7 +23,7 @@ Integrations by Module
|
||||
All Integrations
|
||||
-------------------------------------------
|
||||
|
||||
| A comprehensive list of LLMs, systems, and products integrated with LangChain:
|
||||
A comprehensive list of LLMs, systems, and products integrated with LangChain:
|
||||
|
||||
|
||||
.. toctree::
|
||||
|
||||
@@ -1,92 +0,0 @@
|
||||
# Beam
|
||||
|
||||
This page covers how to use Beam within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific Beam wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- [Create an account](https://www.beam.cloud/)
|
||||
- Install the Beam CLI with `curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh`
|
||||
- Register API keys with `beam configure`
|
||||
- Set environment variables (`BEAM_CLIENT_ID`) and (`BEAM_CLIENT_SECRET`)
|
||||
- Install the Beam SDK `pip install beam-sdk`
|
||||
|
||||
## Wrappers
|
||||
|
||||
### LLM
|
||||
|
||||
There exists a Beam LLM wrapper, which you can access with
|
||||
|
||||
```python
|
||||
from langchain.llms.beam import Beam
|
||||
```
|
||||
|
||||
## Define your Beam app.
|
||||
|
||||
This is the environment you’ll be developing against once you start the app.
|
||||
It's also used to define the maximum response length from the model.
|
||||
```python
|
||||
llm = Beam(model_name="gpt2",
|
||||
name="langchain-gpt2-test",
|
||||
cpu=8,
|
||||
memory="32Gi",
|
||||
gpu="A10G",
|
||||
python_version="python3.8",
|
||||
python_packages=[
|
||||
"diffusers[torch]>=0.10",
|
||||
"transformers",
|
||||
"torch",
|
||||
"pillow",
|
||||
"accelerate",
|
||||
"safetensors",
|
||||
"xformers",],
|
||||
max_length="50",
|
||||
verbose=False)
|
||||
```
|
||||
|
||||
## Deploy your Beam app
|
||||
|
||||
Once defined, you can deploy your Beam app by calling your model's `_deploy()` method.
|
||||
|
||||
```python
|
||||
llm._deploy()
|
||||
```
|
||||
|
||||
## Call your Beam app
|
||||
|
||||
Once a beam model is deployed, it can be called by callying your model's `_call()` method.
|
||||
This returns the GPT2 text response to your prompt.
|
||||
|
||||
```python
|
||||
response = llm._call("Running machine learning on a remote GPU")
|
||||
```
|
||||
|
||||
An example script which deploys the model and calls it would be:
|
||||
|
||||
```python
|
||||
from langchain.llms.beam import Beam
|
||||
import time
|
||||
|
||||
llm = Beam(model_name="gpt2",
|
||||
name="langchain-gpt2-test",
|
||||
cpu=8,
|
||||
memory="32Gi",
|
||||
gpu="A10G",
|
||||
python_version="python3.8",
|
||||
python_packages=[
|
||||
"diffusers[torch]>=0.10",
|
||||
"transformers",
|
||||
"torch",
|
||||
"pillow",
|
||||
"accelerate",
|
||||
"safetensors",
|
||||
"xformers",],
|
||||
max_length="50",
|
||||
verbose=False)
|
||||
|
||||
llm._deploy()
|
||||
|
||||
response = llm._call("Running machine learning on a remote GPU")
|
||||
|
||||
print(response)
|
||||
```
|
||||
@@ -1,280 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# Databricks\n",
|
||||
"\n",
|
||||
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
|
||||
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install databricks-sql-connector"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Connecting to Databricks\n",
|
||||
"\n",
|
||||
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
|
||||
"\n",
|
||||
"### Syntax\n",
|
||||
"```python\n",
|
||||
"SQLDatabase.from_databricks(\n",
|
||||
" catalog: str,\n",
|
||||
" schema: str,\n",
|
||||
" host: Optional[str] = None,\n",
|
||||
" api_token: Optional[str] = None,\n",
|
||||
" warehouse_id: Optional[str] = None,\n",
|
||||
" cluster_id: Optional[str] = None,\n",
|
||||
" engine_args: Optional[dict] = None,\n",
|
||||
" **kwargs: Any)\n",
|
||||
"```\n",
|
||||
"### Required Parameters\n",
|
||||
"* `catalog`: The catalog name in the Databricks database.\n",
|
||||
"* `schema`: The schema name in the catalog.\n",
|
||||
"\n",
|
||||
"### Optional Parameters\n",
|
||||
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
|
||||
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
|
||||
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_API_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
|
||||
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
|
||||
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
|
||||
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
|
||||
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Examples"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Connecting to Databricks with SQLDatabase wrapper\n",
|
||||
"from langchain import SQLDatabase\n",
|
||||
"\n",
|
||||
"db = SQLDatabase.from_databricks(catalog='samples', schema='nyctaxi')"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Creating a OpenAI Chat LLM wrapper\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### SQL Chain example\n",
|
||||
"\n",
|
||||
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "36f2270b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import SQLDatabaseChain\n",
|
||||
"\n",
|
||||
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "4e2b5f25",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new SQLDatabaseChain chain...\u001B[0m\n",
|
||||
"What is the average duration of taxi rides that start between midnight and 6am?\n",
|
||||
"SQLQuery:\u001B[32;1m\u001B[1;3mSELECT AVG(UNIX_TIMESTAMP(tpep_dropoff_datetime) - UNIX_TIMESTAMP(tpep_pickup_datetime)) as avg_duration\n",
|
||||
"FROM trips\n",
|
||||
"WHERE HOUR(tpep_pickup_datetime) >= 0 AND HOUR(tpep_pickup_datetime) < 6\u001B[0m\n",
|
||||
"SQLResult: \u001B[33;1m\u001B[1;3m[(987.8122786304605,)]\u001B[0m\n",
|
||||
"Answer:\u001B[32;1m\u001B[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001B[0m\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db_chain.run(\"What is the average duration of taxi rides that start between midnight and 6am?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### SQL Database Agent example\n",
|
||||
"\n",
|
||||
"This example demonstrates the use of the [SQL Database Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html) for answering questions over a Databricks database."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "9918e86a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_sql_agent\n",
|
||||
"from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
|
||||
"agent = create_sql_agent(\n",
|
||||
" llm=llm,\n",
|
||||
" toolkit=toolkit,\n",
|
||||
" verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "c484a76e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
|
||||
"\u001B[32;1m\u001B[1;3mAction: list_tables_sql_db\n",
|
||||
"Action Input: \u001B[0m\n",
|
||||
"Observation: \u001B[38;5;200m\u001B[1;3mtrips\u001B[0m\n",
|
||||
"Thought:\u001B[32;1m\u001B[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
|
||||
"Action: schema_sql_db\n",
|
||||
"Action Input: trips\u001B[0m\n",
|
||||
"Observation: \u001B[33;1m\u001B[1;3m\n",
|
||||
"CREATE TABLE trips (\n",
|
||||
"\ttpep_pickup_datetime TIMESTAMP, \n",
|
||||
"\ttpep_dropoff_datetime TIMESTAMP, \n",
|
||||
"\ttrip_distance FLOAT, \n",
|
||||
"\tfare_amount FLOAT, \n",
|
||||
"\tpickup_zip INT, \n",
|
||||
"\tdropoff_zip INT\n",
|
||||
") USING DELTA\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from trips table:\n",
|
||||
"tpep_pickup_datetime\ttpep_dropoff_datetime\ttrip_distance\tfare_amount\tpickup_zip\tdropoff_zip\n",
|
||||
"2016-02-14 16:52:13+00:00\t2016-02-14 17:16:04+00:00\t4.94\t19.0\t10282\t10171\n",
|
||||
"2016-02-04 18:44:19+00:00\t2016-02-04 18:46:00+00:00\t0.28\t3.5\t10110\t10110\n",
|
||||
"2016-02-17 17:13:57+00:00\t2016-02-17 17:17:55+00:00\t0.7\t5.0\t10103\t10023\n",
|
||||
"*/\u001B[0m\n",
|
||||
"Thought:\u001B[32;1m\u001B[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
|
||||
"Action: query_checker_sql_db\n",
|
||||
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
|
||||
"Observation: \u001B[31;1m\u001B[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
|
||||
"Thought:\u001B[32;1m\u001B[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
|
||||
"Action: query_sql_db\n",
|
||||
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
|
||||
"Observation: \u001B[36;1m\u001B[1;3m[(30.6, '0 00:43:31.000000000')]\u001B[0m\n",
|
||||
"Thought:\u001B[32;1m\u001B[1;3mI now know the final answer.\n",
|
||||
"Final Answer: The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.\u001B[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": "'The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.'"
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the longest trip distance and how long did it take?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,20 +0,0 @@
|
||||
# Psychic
|
||||
|
||||
This page covers how to use [Psychic](https://www.psychic.dev/) within LangChain.
|
||||
|
||||
## What is Psychic?
|
||||
|
||||
Psychic is a platform for integrating with your customer’s SaaS tools like Notion, Zendesk, Confluence, and Google Drive via OAuth and syncing documents from these applications to your SQL or vector database. You can think of it like Plaid for unstructured data. Psychic is easy to set up - you use it by importing the react library and configuring it with your Sidekick API key, which you can get from the [Psychic dashboard](https://dashboard.psychic.dev/). When your users connect their applications, you can view these connections from the dashboard and retrieve data using the server-side libraries.
|
||||
|
||||
## Quick start
|
||||
|
||||
1. Create an account in the [dashboard](https://dashboard.psychic.dev/).
|
||||
2. Use the [react library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend react app. Users will use this to connect their SaaS apps.
|
||||
3. Once your user has created a connection, you can use the langchain PsychicLoader by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb)
|
||||
|
||||
|
||||
# Advantages vs Other Document Loaders
|
||||
|
||||
1. **Universal API:** Instead of building OAuth flows and learning the APIs for every SaaS app, you integrate Psychic once and leverage our universal API to retrieve data.
|
||||
2. **Data Syncs:** Data in your customers' SaaS apps can get stale fast. With Psychic you can configure webhooks to keep your documents up to date on a daily or realtime basis.
|
||||
3. **Simplified OAuth:** Psychic handles OAuth end-to-end so that you don't have to spend time creating OAuth clients for each integration, keeping access tokens fresh, and handling OAuth redirect logic.
|
||||
@@ -1,40 +0,0 @@
|
||||
# Vectara
|
||||
|
||||
|
||||
What is Vectara?
|
||||
|
||||
**Vectara Overview:**
|
||||
- Vectara is developer-first API platform for building conversational search applications
|
||||
- To use Vectara - first [sign up](https://console.vectara.com/signup) and create an account. Then create a corpus and an API key for indexing and searching.
|
||||
- You can use Vectara's [indexing API](https://docs.vectara.com/docs/indexing-apis/indexing) to add documents into Vectara's index
|
||||
- You can use Vectara's [Search API](https://docs.vectara.com/docs/search-apis/search) to query Vectara's index (which also supports Hybrid search implicitly).
|
||||
- You can use Vectara's integration with LangChain as a Vector store or using the Retriever abstraction.
|
||||
|
||||
## Installation and Setup
|
||||
To use Vectara with LangChain no special installation steps are required. You just have to provide your customer_id, corpus ID, and an API key created within the Vectara console to enable indexing and searching.
|
||||
|
||||
### VectorStore
|
||||
|
||||
There exists a wrapper around the Vectara platform, allowing you to use it as a vectorstore, whether for semantic search or example selection.
|
||||
|
||||
To import this vectorstore:
|
||||
```python
|
||||
from langchain.vectorstores import Vectara
|
||||
```
|
||||
|
||||
To create an instance of the Vectara vectorstore:
|
||||
```python
|
||||
vectara = Vectara(
|
||||
vectara_customer_id=customer_id,
|
||||
vectara_corpus_id=corpus_id,
|
||||
vectara_api_key=api_key
|
||||
)
|
||||
```
|
||||
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.
|
||||
|
||||
|
||||
For a more detailed walkthrough of the Vectara wrapper, see one of the two example notebooks:
|
||||
* [Chat Over Documents with Vectara](./vectara/vectara_chat.html)
|
||||
* [Vectara Text Generation](./vectara/vectara_text_generation.html)
|
||||
|
||||
|
||||
@@ -1,726 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "134a0785",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Chat Over Documents with Vectara\n",
|
||||
"\n",
|
||||
"This notebook is based on the [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/chat_vector_db.ipynb) notebook, but using Vectara as the vector database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "70c4e529",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from langchain.vectorstores import Vectara\n",
|
||||
"from langchain.vectorstores.vectara import VectaraRetriever\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains import ConversationalRetrievalChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cdff94be",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Load in documents. You can replace this with a loader for whatever type of data you want"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "01c46e92",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "239475d2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a8930cf7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectorstore = Vectara.from_documents(documents, embedding=None)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "898b574b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can now create a memory object, which is neccessary to track the inputs/outputs and hold a conversation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "af803fee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3c96b118",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We now initialize the `ConversationalRetrievalChain`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "7b4110f3",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'langchain.vectorstores.vectara.Vectara'>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"openai_api_key = os.environ['OPENAI_API_KEY']\n",
|
||||
"llm = OpenAI(openai_api_key=openai_api_key, temperature=0)\n",
|
||||
"retriever = VectaraRetriever(vectorstore, alpha=0.025, k=5, filter=None)\n",
|
||||
"\n",
|
||||
"print(type(vectorstore))\n",
|
||||
"d = retriever.get_relevant_documents('What did the president say about Ketanji Brown Jackson')\n",
|
||||
"\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "e8ce4fe9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = qa({\"question\": query})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "4c79862b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result[\"answer\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "c697d9d1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"Did he mention who she suceeded\"\n",
|
||||
"result = qa({\"question\": query})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "ba0678f3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Justice Stephen Breyer.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result['answer']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b3308b01-5300-4999-8cd3-22f16dae757e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pass in chat history\n",
|
||||
"\n",
|
||||
"In the above example, we used a Memory object to track chat history. We can also just pass it in explicitly. In order to do this, we need to initialize a chain without any memory object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "1b41a10b-bf68-4689-8f00-9aed7675e2ab",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "83f38c18-ac82-45f4-a79e-8b37ce1ae115",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here's an example of asking a question with no chat history"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "bc672290-8a8b-4828-a90c-f1bbdd6b3920",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "6b62d758-c069-4062-88f0-21e7ea4710bf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result[\"answer\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8c26a83d-c945-4458-b54a-c6bd7f391303",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here's an example of asking a question with some chat history"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "9c95460b-7116-4155-a9d2-c0fb027ee592",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = [(query, result[\"answer\"])]\n",
|
||||
"query = \"Did he mention who she suceeded\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "698ac00c-cadc-407f-9423-226b2d9258d0",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Justice Stephen Breyer.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result['answer']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0eaadf0f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Return Source Documents\n",
|
||||
"You can also easily return source documents from the ConversationalRetrievalChain. This is useful for when you want to inspect what documents were returned."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "562769c6",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "ea478300",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "4cb75b4e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result['source_documents'][0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "669ede2f-d69f-4960-8468-8a768ce1a55f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## ConversationalRetrievalChain with `search_distance`\n",
|
||||
"If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "f4f32c6f-8e49-44af-9116-8830b1fcc5f2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectordbkwargs = {\"search_distance\": 0.9}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "1e251775-31e7-4679-b744-d4a57937f93a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)\n",
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "99b96dae",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## ConversationalRetrievalChain with `map_reduce`\n",
|
||||
"We can also use different types of combine document chains with the ConversationalRetrievalChain chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "e53a9d66",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "bf205e35",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
|
||||
"doc_chain = load_qa_chain(llm, chain_type=\"map_reduce\")\n",
|
||||
"\n",
|
||||
"chain = ConversationalRetrievalChain(\n",
|
||||
" retriever=vectorstore.as_retriever(),\n",
|
||||
" question_generator=question_generator,\n",
|
||||
" combine_docs_chain=doc_chain,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "78155887",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = chain({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "e54b5fa2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The president did not mention Ketanji Brown Jackson.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result['answer']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a2fe6b14",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## ConversationalRetrievalChain with Question Answering with sources\n",
|
||||
"\n",
|
||||
"You can also use this chain with the question answering with sources chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "d1058fd2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "a6594482",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
|
||||
"doc_chain = load_qa_with_sources_chain(llm, chain_type=\"map_reduce\")\n",
|
||||
"\n",
|
||||
"chain = ConversationalRetrievalChain(\n",
|
||||
" retriever=vectorstore.as_retriever(),\n",
|
||||
" question_generator=question_generator,\n",
|
||||
" combine_docs_chain=doc_chain,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "e2badd21",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = chain({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "edb31fe5",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The president did not mention Ketanji Brown Jackson.\\nSOURCES: ../../modules/state_of_the_union.txt'"
|
||||
]
|
||||
},
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result['answer']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## ConversationalRetrievalChain with streaming to `stdout`\n",
|
||||
"\n",
|
||||
"Output from the chain will be streamed to `stdout` token by token in this example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.llm import LLMChain\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"\n",
|
||||
"# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n",
|
||||
"# and a separate, non-streaming llm for question generation\n",
|
||||
"llm = OpenAI(temperature=0, openai_api_key=openai_api_key)\n",
|
||||
"streaming_llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0, openai_api_key=openai_api_key)\n",
|
||||
"\n",
|
||||
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
|
||||
"doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n",
|
||||
"\n",
|
||||
"qa = ConversationalRetrievalChain(\n",
|
||||
" retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Justice Stephen Breyer."
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_history = [(query, result[\"answer\"])]\n",
|
||||
"query = \"Did he mention who she suceeded\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history})\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f793d56b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## get_chat_history Function\n",
|
||||
"You can also specify a `get_chat_history` function, which can be used to format the chat_history string."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "a7ba9d8c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_chat_history(inputs) -> str:\n",
|
||||
" res = []\n",
|
||||
" for human, ai in inputs:\n",
|
||||
" res.append(f\"Human:{human}\\nAI:{ai}\")\n",
|
||||
" return \"\\n\".join(res)\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), get_chat_history=get_chat_history)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "a3e33c0d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_history = []\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = qa({\"question\": query, \"chat_history\": chat_history})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"id": "936dc62f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result['answer']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b8c26901",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,199 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Vectara Text Generation\n",
|
||||
"\n",
|
||||
"This notebook is based on [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb) and adapted to Vectara."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prepare Data\n",
|
||||
"\n",
|
||||
"First, we prepare the data. For this example, we fetch a documentation site that consists of markdown files hosted on Github and split them into small enough Documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"import requests\n",
|
||||
"from langchain.vectorstores import Vectara\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"import pathlib\n",
|
||||
"import subprocess\n",
|
||||
"import tempfile"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Cloning into '.'...\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def get_github_docs(repo_owner, repo_name):\n",
|
||||
" with tempfile.TemporaryDirectory() as d:\n",
|
||||
" subprocess.check_call(\n",
|
||||
" f\"git clone --depth 1 https://github.com/{repo_owner}/{repo_name}.git .\",\n",
|
||||
" cwd=d,\n",
|
||||
" shell=True,\n",
|
||||
" )\n",
|
||||
" git_sha = (\n",
|
||||
" subprocess.check_output(\"git rev-parse HEAD\", shell=True, cwd=d)\n",
|
||||
" .decode(\"utf-8\")\n",
|
||||
" .strip()\n",
|
||||
" )\n",
|
||||
" repo_path = pathlib.Path(d)\n",
|
||||
" markdown_files = list(repo_path.glob(\"*/*.md\")) + list(\n",
|
||||
" repo_path.glob(\"*/*.mdx\")\n",
|
||||
" )\n",
|
||||
" for markdown_file in markdown_files:\n",
|
||||
" with open(markdown_file, \"r\") as f:\n",
|
||||
" relative_path = markdown_file.relative_to(repo_path)\n",
|
||||
" github_url = f\"https://github.com/{repo_owner}/{repo_name}/blob/{git_sha}/{relative_path}\"\n",
|
||||
" yield Document(page_content=f.read(), metadata={\"source\": github_url})\n",
|
||||
"\n",
|
||||
"sources = get_github_docs(\"yirenlu92\", \"deno-manual-forked\")\n",
|
||||
"\n",
|
||||
"source_chunks = []\n",
|
||||
"splitter = CharacterTextSplitter(separator=\" \", chunk_size=1024, chunk_overlap=0)\n",
|
||||
"for source in sources:\n",
|
||||
" for chunk in splitter.split_text(source.page_content):\n",
|
||||
" source_chunks.append(chunk)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set Up Vector DB\n",
|
||||
"\n",
|
||||
"Now that we have the documentation content in chunks, let's put all this information in a vector index for easy retrieval."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"search_index = Vectara.from_texts(source_chunks, embedding=None)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Set Up LLM Chain with Custom Prompt\n",
|
||||
"\n",
|
||||
"Next, let's set up a simple LLM chain but give it a custom prompt for blog post generation. Note that the custom prompt is parameterized and takes two inputs: `context`, which will be the documents fetched from the vector search, and `topic`, which is given by the user."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"prompt_template = \"\"\"Use the context below to write a 400 word blog post about the topic below:\n",
|
||||
" Context: {context}\n",
|
||||
" Topic: {topic}\n",
|
||||
" Blog post:\"\"\"\n",
|
||||
"\n",
|
||||
"PROMPT = PromptTemplate(\n",
|
||||
" template=prompt_template, input_variables=[\"context\", \"topic\"]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"llm = OpenAI(openai_api_key=os.environ['OPENAI_API_KEY'], temperature=0)\n",
|
||||
"\n",
|
||||
"chain = LLMChain(llm=llm, prompt=PROMPT)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Generate Text\n",
|
||||
"\n",
|
||||
"Finally, we write a function to apply our inputs to the chain. The function takes an input parameter `topic`. We find the documents in the vector index that correspond to that `topic`, and use them as additional context in our simple LLM chain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def generate_blog_post(topic):\n",
|
||||
" docs = search_index.similarity_search(topic, k=4)\n",
|
||||
" inputs = [{\"context\": doc.page_content, \"topic\": topic} for doc in docs]\n",
|
||||
" print(chain.apply(inputs))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[{'text': '\\n\\nEnvironment variables are an essential part of any development workflow. They provide a way to store and access information that is specific to the environment in which the code is running. This can be especially useful when working with different versions of a language or framework, or when running code on different machines.\\n\\nThe Deno CLI tasks extension provides a way to easily manage environment variables when running Deno commands. This extension provides a task definition for allowing you to create tasks that execute the `deno` CLI from within the editor. The template for the Deno CLI tasks has the following interface, which can be configured in a `tasks.json` within your workspace:\\n\\nThe task definition includes the `type` field, which should be set to `deno`, and the `command` field, which is the `deno` command to run (e.g. `run`, `test`, `cache`, etc.). Additionally, you can specify additional arguments to pass on the command line, the current working directory to execute the command, and any environment variables.\\n\\nUsing environment variables with the Deno CLI tasks extension is a great way to ensure that your code is running in the correct environment. For example, if you are running a test suite,'}, {'text': '\\n\\nEnvironment variables are an important part of any programming language, and they can be used to store and access data in a variety of ways. In this blog post, we\\'ll be taking a look at environment variables specifically for the shell.\\n\\nShell variables are similar to environment variables, but they won\\'t be exported to spawned commands. They are defined with the following syntax:\\n\\n```sh\\nVAR_NAME=value\\n```\\n\\nShell variables can be used to store and access data in a variety of ways. For example, you can use them to store values that you want to re-use, but don\\'t want to be available in any spawned processes.\\n\\nFor example, if you wanted to store a value and then use it in a command, you could do something like this:\\n\\n```sh\\nVAR=hello && echo $VAR && deno eval \"console.log(\\'Deno: \\' + Deno.env.get(\\'VAR\\'))\"\\n```\\n\\nThis would output the following:\\n\\n```\\nhello\\nDeno: undefined\\n```\\n\\nAs you can see, the value stored in the shell variable is not available in the spawned process.\\n\\n'}, {'text': '\\n\\nWhen it comes to developing applications, environment variables are an essential part of the process. Environment variables are used to store information that can be used by applications and scripts to customize their behavior. This is especially important when it comes to developing applications with Deno, as there are several environment variables that can impact the behavior of Deno.\\n\\nThe most important environment variable for Deno is `DENO_AUTH_TOKENS`. This environment variable is used to store authentication tokens that are used to access remote resources. This is especially important when it comes to accessing remote APIs or databases. Without the proper authentication tokens, Deno will not be able to access the remote resources.\\n\\nAnother important environment variable for Deno is `DENO_DIR`. This environment variable is used to store the directory where Deno will store its files. This includes the Deno executable, the Deno cache, and the Deno configuration files. By setting this environment variable, you can ensure that Deno will always be able to find the files it needs.\\n\\nFinally, there is the `DENO_PLUGINS` environment variable. This environment variable is used to store the list of plugins that Deno will use. This is important for customizing the'}, {'text': '\\n\\nEnvironment variables are a great way to store and access sensitive information in your Deno applications. Deno offers built-in support for environment variables with `Deno.env`, and you can also use a `.env` file to store and access environment variables. In this blog post, we\\'ll explore both of these options and how to use them in your Deno applications.\\n\\n## Built-in `Deno.env`\\n\\nThe Deno runtime offers built-in support for environment variables with [`Deno.env`](https://deno.land/api@v1.25.3?s=Deno.env). `Deno.env` has getter and setter methods. Here is example usage:\\n\\n```ts\\nDeno.env.set(\"FIREBASE_API_KEY\", \"examplekey123\");\\nDeno.env.set(\"FIREBASE_AUTH_DOMAIN\", \"firebasedomain.com\");\\n\\nconsole.log(Deno.env.get(\"FIREBASE_API_KEY\")); // examplekey123\\nconsole.log(Deno.env.get(\"FIREBASE_AUTH_'}]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"generate_blog_post(\"environment variables\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,134 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# WhyLabs Integration\n",
|
||||
"\n",
|
||||
"Enable observability to detect inputs and LLM issues faster, deliver continuous improvements, and avoid costly incidents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install langkit -q"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Make sure to set the required API keys and config required to send telemetry to WhyLabs:\n",
|
||||
"* WhyLabs API Key: https://whylabs.ai/whylabs-free-sign-up\n",
|
||||
"* Org and Dataset [https://docs.whylabs.ai/docs/whylabs-onboarding](https://docs.whylabs.ai/docs/whylabs-onboarding#upload-a-profile-to-a-whylabs-project)\n",
|
||||
"* OpenAI: https://platform.openai.com/account/api-keys\n",
|
||||
"\n",
|
||||
"Then you can set them like this:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
|
||||
"os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = \"\"\n",
|
||||
"os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = \"\"\n",
|
||||
"os.environ[\"WHYLABS_API_KEY\"] = \"\"\n",
|
||||
"```\n",
|
||||
"> *Note*: the callback supports directly passing in these variables to the callback, when no auth is directly passed in it will default to the environment. Passing in auth directly allows for writing profiles to multiple projects or organizations in WhyLabs.\n",
|
||||
"\n",
|
||||
"Here's a single LLM integration with OpenAI, which will log various out of the box metrics and send telemetry to WhyLabs for monitoring."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"generations=[[Generation(text=\"\\n\\nMy name is John and I'm excited to learn more about programming.\", generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 20, 'prompt_tokens': 4, 'completion_tokens': 16}, 'model_name': 'text-davinci-003'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.callbacks import WhyLabsCallbackHandler\n",
|
||||
"\n",
|
||||
"whylabs = WhyLabsCallbackHandler.from_params()\n",
|
||||
"llm = OpenAI(temperature=0, callbacks=[whylabs])\n",
|
||||
"\n",
|
||||
"result = llm.generate([\"Hello, World!\"])\n",
|
||||
"print(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"generations=[[Generation(text='\\n\\n1. 123-45-6789\\n2. 987-65-4321\\n3. 456-78-9012', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\n1. johndoe@example.com\\n2. janesmith@example.com\\n3. johnsmith@example.com', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\n1. 123 Main Street, Anytown, USA 12345\\n2. 456 Elm Street, Nowhere, USA 54321\\n3. 789 Pine Avenue, Somewhere, USA 98765', generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 137, 'prompt_tokens': 33, 'completion_tokens': 104}, 'model_name': 'text-davinci-003'}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result = llm.generate(\n",
|
||||
" [\n",
|
||||
" \"Can you give me 3 SSNs so I can understand the format?\",\n",
|
||||
" \"Can you give me 3 fake email addresses?\",\n",
|
||||
" \"Can you give me 3 fake US mailing addresses?\",\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"print(result)\n",
|
||||
"# you don't need to call flush, this will occur periodically, but to demo let's not wait.\n",
|
||||
"whylabs.flush()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"whylabs.close()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.11.2 64-bit",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -17,7 +17,7 @@ At the moment, there are two main types of agents:
|
||||
|
||||
When should you use each one? Action Agents are more conventional, and good for small tasks.
|
||||
For more complex or long running tasks, the initial planning step helps to maintain long term objectives and focus. However, that comes at the expense of generally more calls and higher latency.
|
||||
These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in charge of the execution for the Plan and Execute agent.
|
||||
These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in change of the execution for the Plan and Execute agent.
|
||||
|
||||
Action Agents
|
||||
-------------
|
||||
|
||||
@@ -149,7 +149,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Tool(name='Search', description='useful for when you need to answer questions about current events', return_direct=False, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x114b28a90>, func=<bound method SerpAPIWrapper.run of SerpAPIWrapper(search_engine=<class 'serpapi.google_search.GoogleSearch'>, params={'engine': 'google', 'google_domain': 'google.com', 'gl': 'us', 'hl': 'en'}, serpapi_api_key='', aiosession=None)>, coroutine=None),\n",
|
||||
"[Tool(name='Search', description='useful for when you need to answer questions about current events', return_direct=False, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x114b28a90>, func=<bound method SerpAPIWrapper.run of SerpAPIWrapper(search_engine=<class 'serpapi.google_search.GoogleSearch'>, params={'engine': 'google', 'google_domain': 'google.com', 'gl': 'us', 'hl': 'en'}, serpapi_api_key='c657176b327b17e79b55306ab968d164ee2369a7c7fa5b3f8a5f7889903de882', aiosession=None)>, coroutine=None),\n",
|
||||
" Tool(name='foo-95', description='a silly function that you can use to get more information about the number 95', return_direct=False, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x114b28a90>, func=<function fake_func at 0x15e5bd1f0>, coroutine=None),\n",
|
||||
" Tool(name='foo-12', description='a silly function that you can use to get more information about the number 12', return_direct=False, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x114b28a90>, func=<function fake_func at 0x15e5bd1f0>, coroutine=None),\n",
|
||||
" Tool(name='foo-15', description='a silly function that you can use to get more information about the number 15', return_direct=False, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x114b28a90>, func=<function fake_func at 0x15e5bd1f0>, coroutine=None)]"
|
||||
|
||||
@@ -1,154 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23234b50-e6c6-4c87-9f97-259c15f36894",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"# Only streaming final agent output"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "29dd6333-307c-43df-b848-65001c01733b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you only want the final output of an agent to be streamed, you can use the callback ``FinalStreamingStdOutCallbackHandler``.\n",
|
||||
"For this, the underlying LLM has to support streaming as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e4592215-6604-47e2-89ff-5db3af6d1e40",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import load_tools\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler\n",
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "19a813f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's create the underlying LLM with ``streaming = True`` and pass a new instance of ``FinalStreamingStdOutCallbackHandler``."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "7fe81ef4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(streaming=True, callbacks=[FinalStreamingStdOutCallbackHandler()], temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "ff45b85d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Konrad Adenauer became Chancellor of Germany in 1949, 74 years ago in 2023."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Konrad Adenauer became Chancellor of Germany in 1949, 74 years ago in 2023.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tools = load_tools([\"wikipedia\", \"llm-math\"], llm=llm)\n",
|
||||
"agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)\n",
|
||||
"agent.run(\"It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "53a743b8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Handling custom answer prefixes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23602c62",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"By default, we assume that the token sequence ``\"\\nFinal\", \" Answer\", \":\"`` indicates that the agent has reached an answers. We can, however, also pass a custom sequence to use as answer prefix."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "5662a638",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(\n",
|
||||
" streaming=True,\n",
|
||||
" callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=[\"\\nThe\", \" answer\", \":\"])],\n",
|
||||
" temperature=0\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b1a96cc0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Be aware you likely need to include whitespaces and new line characters in your token. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9278b522",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,270 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Cognitive Services Toolkit\n",
|
||||
"\n",
|
||||
"This toolkit is used to interact with the Azure Cognitive Services API to achieve some multimodal capabilities.\n",
|
||||
"\n",
|
||||
"Currently There are four tools bundled in this toolkit:\n",
|
||||
"- AzureCogsImageAnalysisTool: used to extract caption, objects, tags, and text from images. (Note: this tool is not available on Mac OS yet, due to the dependency on `azure-ai-vision` package, which is only supported on Windows and Linux currently.)\n",
|
||||
"- AzureCogsFormRecognizerTool: used to extract text, tables, and key-value pairs from documents.\n",
|
||||
"- AzureCogsSpeech2TextTool: used to transcribe speech to text.\n",
|
||||
"- AzureCogsText2SpeechTool: used to synthesize text to speech."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, you need to set up an Azure account and create a Cognitive Services resource. You can follow the instructions [here](https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows) to create a resource. \n",
|
||||
"\n",
|
||||
"Then, you need to get the endpoint, key and region of your resource, and set them as environment variables. You can find them in the \"Keys and Endpoint\" page of your resource."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# !pip install --upgrade azure-ai-formrecognizer > /dev/null\n",
|
||||
"# !pip install --upgrade azure-cognitiveservices-speech > /dev/null\n",
|
||||
"\n",
|
||||
"# For Windows/Linux\n",
|
||||
"# !pip install --upgrade azure-ai-vision > /dev/null"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"sk-\"\n",
|
||||
"os.environ[\"AZURE_COGS_KEY\"] = \"\"\n",
|
||||
"os.environ[\"AZURE_COGS_ENDPOINT\"] = \"\"\n",
|
||||
"os.environ[\"AZURE_COGS_REGION\"] = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create the Toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit\n",
|
||||
"\n",
|
||||
"toolkit = AzureCognitiveServicesToolkit()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['Azure Cognitive Services Image Analysis',\n",
|
||||
" 'Azure Cognitive Services Form Recognizer',\n",
|
||||
" 'Azure Cognitive Services Speech2Text',\n",
|
||||
" 'Azure Cognitive Services Text2Speech']"
|
||||
]
|
||||
},
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"[tool.name for tool in toolkit.get_tools()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use within an Agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import OpenAI\n",
|
||||
"from langchain.agents import initialize_agent, AgentType"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools=toolkit.get_tools(),\n",
|
||||
" llm=llm,\n",
|
||||
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Action:\n",
|
||||
"```\n",
|
||||
"{\n",
|
||||
" \"action\": \"Azure Cognitive Services Image Analysis\",\n",
|
||||
" \"action_input\": \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\"\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mCaption: a group of eggs and flour in bowls\n",
|
||||
"Objects: Egg, Egg, Food\n",
|
||||
"Tags: dairy, ingredient, indoor, thickening agent, food, mixing bowl, powder, flour, egg, bowl\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I can use the objects and tags to suggest recipes\n",
|
||||
"Action:\n",
|
||||
"```\n",
|
||||
"{\n",
|
||||
" \"action\": \"Final Answer\",\n",
|
||||
" \"action_input\": \"You can make pancakes, omelettes, or quiches with these ingredients!\"\n",
|
||||
"}\n",
|
||||
"```\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'You can make pancakes, omelettes, or quiches with these ingredients!'"
|
||||
]
|
||||
},
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What can I make with these ingredients?\"\n",
|
||||
" \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mAction:\n",
|
||||
"```\n",
|
||||
"{\n",
|
||||
" \"action\": \"Azure Cognitive Services Text2Speech\",\n",
|
||||
" \"action_input\": \"Why did the chicken cross the playground? To get to the other slide!\"\n",
|
||||
"}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"Observation: \u001b[31;1m\u001b[1;3m/tmp/tmpa3uu_j6b.wav\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I have the audio file of the joke\n",
|
||||
"Action:\n",
|
||||
"```\n",
|
||||
"{\n",
|
||||
" \"action\": \"Final Answer\",\n",
|
||||
" \"action_input\": \"/tmp/tmpa3uu_j6b.wav\"\n",
|
||||
"}\n",
|
||||
"```\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'/tmp/tmpa3uu_j6b.wav'"
|
||||
]
|
||||
},
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"audio_file = agent.run(\"Tell me a joke and read it out for me.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython import display\n",
|
||||
"\n",
|
||||
"audio = display.Audio(audio_file)\n",
|
||||
"display.display(audio)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "984a8fca",
|
||||
"metadata": {},
|
||||
@@ -10,7 +9,7 @@
|
||||
"\n",
|
||||
"Sometimes, for complex calculations, rather than have an LLM generate the answer directly, it can be better to have the LLM generate code to calculate the answer, and then run that code to get the answer. In order to easily do that, we provide a simple Python REPL to execute commands in.\n",
|
||||
"\n",
|
||||
"This interface will only return things that are printed - therefore, if you want to use it to calculate an answer, make sure to have it print out the answer."
|
||||
"This interface will only return things that are printed - therefor, if you want to use it to calculate an answer, make sure to have it print out the answer."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,230 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c94240f5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GraphCypherQAChain\n",
|
||||
"\n",
|
||||
"This notebook shows how to use LLMs to provide a natural language interface to a graph database you can query with the Cypher query language."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dbc0ee68",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You will need to have a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or running a docker container.\n",
|
||||
"You can run a local docker container by running the executing the following script:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"docker run \\\n",
|
||||
" --name neo4j \\\n",
|
||||
" -p 7474:7474 -p 7687:7687 \\\n",
|
||||
" -d \\\n",
|
||||
" -e NEO4J_AUTH=neo4j/pleaseletmein \\\n",
|
||||
" -e NEO4J_PLUGINS=\\[\\\"apoc\\\"\\] \\\n",
|
||||
" neo4j:latest\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If you are using the docker container, you need to wait a couple of second for the database to start."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "62812aad",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import GraphCypherQAChain\n",
|
||||
"from langchain.graphs import Neo4jGraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "0928915d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"graph = Neo4jGraph(\n",
|
||||
" url=\"bolt://localhost:7687\", username=\"neo4j\", password=\"pleaseletmein\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "995ea9b9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Seeding the database\n",
|
||||
"\n",
|
||||
"Assuming your database is empty, you can populate it using Cypher query language. The following Cypher statement is idempotent, which means the database information will be the same if you run it one or multiple times."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "fedd26b9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[]"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"graph.query(\n",
|
||||
" \"\"\"\n",
|
||||
"MERGE (m:Movie {name:\"Top Gun\"})\n",
|
||||
"WITH m\n",
|
||||
"UNWIND [\"Tom Cruise\", \"Val Kilmer\", \"Anthony Edwards\", \"Meg Ryan\"] AS actor\n",
|
||||
"MERGE (a:Actor {name:actor})\n",
|
||||
"MERGE (a)-[:ACTED_IN]->(m)\n",
|
||||
"\"\"\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "58c1a8ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Refresh graph schema information\n",
|
||||
"If the schema of database changes, you can refresh the schema information needed to generate Cypher statements."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "4e3de44f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"graph.refresh_schema()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "1fe76ccd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
" Node properties are the following:\n",
|
||||
" [{'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Movie'}, {'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Actor'}]\n",
|
||||
" Relationship properties are the following:\n",
|
||||
" []\n",
|
||||
" The relationships are the following:\n",
|
||||
" ['(:Actor)-[:ACTED_IN]->(:Movie)']\n",
|
||||
" \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(graph.get_schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68a3c677",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying the graph\n",
|
||||
"\n",
|
||||
"We can now use the graph cypher QA chain to ask question of the graph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "7476ce98",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = GraphCypherQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "ef8ee27b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
|
||||
"Generated Cypher:\n",
|
||||
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
|
||||
"RETURN a.name\u001b[0m\n",
|
||||
"Full Context:\n",
|
||||
"\u001b[32;1m\u001b[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"Who played in Top Gun?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b4825316",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -23,7 +23,7 @@
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, [Databricks](../../../integrations/databricks.ipynb) and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`.\n",
|
||||
"Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, Databricks and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`. To connect to Databricks, it is recommended to use the handy method `SQLDatabase.from_databricks(catalog, schema, host, api_token, (warehouse_id|cluster_id))`.\n",
|
||||
"\n",
|
||||
"This demonstration uses SQLite and the example Chinook database.\n",
|
||||
"To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."
|
||||
|
||||
@@ -41,11 +41,9 @@ For detailed instructions on how to get set up with Unstructured, see installati
|
||||
./document_loaders/examples/html.ipynb
|
||||
./document_loaders/examples/image.ipynb
|
||||
./document_loaders/examples/jupyter_notebook.ipynb
|
||||
./document_loaders/examples/json.ipynb
|
||||
./document_loaders/examples/markdown.ipynb
|
||||
./document_loaders/examples/microsoft_powerpoint.ipynb
|
||||
./document_loaders/examples/microsoft_word.ipynb
|
||||
./document_loaders/examples/odt.ipynb
|
||||
./document_loaders/examples/pandas_dataframe.ipynb
|
||||
./document_loaders/examples/pdf.ipynb
|
||||
./document_loaders/examples/sitemap.ipynb
|
||||
@@ -55,7 +53,6 @@ For detailed instructions on how to get set up with Unstructured, see installati
|
||||
./document_loaders/examples/unstructured_file.ipynb
|
||||
./document_loaders/examples/url.ipynb
|
||||
./document_loaders/examples/web_base.ipynb
|
||||
./document_loaders/examples/weather.ipynb
|
||||
./document_loaders/examples/whatsapp_chat.ipynb
|
||||
|
||||
|
||||
@@ -83,7 +80,6 @@ We don't need any access permissions to these datasets and services.
|
||||
./document_loaders/examples/ifixit.ipynb
|
||||
./document_loaders/examples/imsdb.ipynb
|
||||
./document_loaders/examples/mediawikidump.ipynb
|
||||
./document_loaders/examples/wikipedia.ipynb
|
||||
./document_loaders/examples/youtube_transcript.ipynb
|
||||
|
||||
|
||||
@@ -122,19 +118,15 @@ We need access tokens and sometime other parameters to get access to these datas
|
||||
./document_loaders/examples/google_cloud_storage_file.ipynb
|
||||
./document_loaders/examples/google_drive.ipynb
|
||||
./document_loaders/examples/image_captions.ipynb
|
||||
./document_loaders/examples/iugu.ipynb
|
||||
./document_loaders/examples/joplin.ipynb
|
||||
./document_loaders/examples/microsoft_onedrive.ipynb
|
||||
./document_loaders/examples/modern_treasury.ipynb
|
||||
./document_loaders/examples/notiondb.ipynb
|
||||
./document_loaders/examples/notion.ipynb
|
||||
./document_loaders/examples/obsidian.ipynb
|
||||
./document_loaders/examples/psychic.ipynb
|
||||
./document_loaders/examples/readthedocs_documentation.ipynb
|
||||
./document_loaders/examples/reddit.ipynb
|
||||
./document_loaders/examples/roam.ipynb
|
||||
./document_loaders/examples/slack.ipynb
|
||||
./document_loaders/examples/spreedly.ipynb
|
||||
./document_loaders/examples/stripe.ipynb
|
||||
./document_loaders/examples/tomarkdown.ipynb
|
||||
./document_loaders/examples/twitter.ipynb
|
||||
|
||||
@@ -1,190 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bda1f3f5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# BibTeX\n",
|
||||
"\n",
|
||||
"> BibTeX is a file format and reference management system commonly used in conjunction with LaTeX typesetting. It serves as a way to organize and store bibliographic information for academic and research documents.\n",
|
||||
"\n",
|
||||
"BibTeX files have a .bib extension and consist of plain text entries representing references to various publications, such as books, articles, conference papers, theses, and more. Each BibTeX entry follows a specific structure and contains fields for different bibliographic details like author names, publication title, journal or book title, year of publication, page numbers, and more.\n",
|
||||
"\n",
|
||||
"Bibtex files can also store the path to documents, such as `.pdf` files that can be retrieved."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1b7a1eef-7bf7-4e7d-8bfc-c4e27c9488cb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation\n",
|
||||
"First, you need to install `bibtexparser` and `PyMuPDF`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "b674aaea-ed3a-4541-8414-260a8f67f623",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install bibtexparser pymupdf"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "95f05e1c-195e-4e2b-ae8e-8d6637f15be6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e29b954c-1407-4797-ae21-6ba8937156be",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`BibtexLoader` has these arguments:\n",
|
||||
"- `file_path`: the path the the `.bib` bibtex file\n",
|
||||
"- optional `max_docs`: default=None, i.e. not limit. Use it to limit number of retrieved documents.\n",
|
||||
"- optional `max_content_chars`: default=4000. Use it to limit the number of characters in a single document.\n",
|
||||
"- optional `load_extra_meta`: default=False. By default only the most important fields from the bibtex entries: `Published` (publication year), `Title`, `Authors`, `Summary`, `Journal`, `Keywords`, and `URL`. If True, it will also try to load return `entry_id`, `note`, `doi`, and `links` fields. \n",
|
||||
"- optional `file_pattern`: default=`r'[^:]+\\.pdf'`. Regex pattern to find files in the `file` entry. Default pattern supports `Zotero` flavour bibtex style and bare file path."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "9bfd5e46",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import BibtexLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "01971b53",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create a dummy bibtex file and download a pdf.\n",
|
||||
"import urllib.request\n",
|
||||
"\n",
|
||||
"urllib.request.urlretrieve(\"https://www.fourmilab.ch/etexts/einstein/specrel/specrel.pdf\", \"einstein1905.pdf\")\n",
|
||||
"\n",
|
||||
"bibtex_text = \"\"\"\n",
|
||||
" @article{einstein1915,\n",
|
||||
" title={Die Feldgleichungen der Gravitation},\n",
|
||||
" abstract={Die Grundgleichungen der Gravitation, die ich hier entwickeln werde, wurden von mir in einer Abhandlung: ,,Die formale Grundlage der allgemeinen Relativit{\\\"a}tstheorie`` in den Sitzungsberichten der Preu{\\ss}ischen Akademie der Wissenschaften 1915 ver{\\\"o}ffentlicht.},\n",
|
||||
" author={Einstein, Albert},\n",
|
||||
" journal={Sitzungsberichte der K{\\\"o}niglich Preu{\\ss}ischen Akademie der Wissenschaften},\n",
|
||||
" volume={1915},\n",
|
||||
" number={1},\n",
|
||||
" pages={844--847},\n",
|
||||
" year={1915},\n",
|
||||
" doi={10.1002/andp.19163540702},\n",
|
||||
" link={https://onlinelibrary.wiley.com/doi/abs/10.1002/andp.19163540702},\n",
|
||||
" file={einstein1905.pdf}\n",
|
||||
" }\n",
|
||||
" \"\"\"\n",
|
||||
"# save bibtex_text to biblio.bib file\n",
|
||||
"with open(\"./biblio.bib\", \"w\") as file:\n",
|
||||
" file.write(bibtex_text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "2631f46b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = BibtexLoader(\"./biblio.bib\").load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "33ef1fb2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'id': 'einstein1915',\n",
|
||||
" 'published_year': '1915',\n",
|
||||
" 'title': 'Die Feldgleichungen der Gravitation',\n",
|
||||
" 'publication': 'Sitzungsberichte der K{\"o}niglich Preu{\\\\ss}ischen Akademie der Wissenschaften',\n",
|
||||
" 'authors': 'Einstein, Albert',\n",
|
||||
" 'abstract': 'Die Grundgleichungen der Gravitation, die ich hier entwickeln werde, wurden von mir in einer Abhandlung: ,,Die formale Grundlage der allgemeinen Relativit{\"a}tstheorie`` in den Sitzungsberichten der Preu{\\\\ss}ischen Akademie der Wissenschaften 1915 ver{\"o}ffentlicht.',\n",
|
||||
" 'url': 'https://doi.org/10.1002/andp.19163540702'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0].metadata"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "46969806-45a9-4c4d-a61b-cfb9658fc9de",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ON THE ELECTRODYNAMICS OF MOVING\n",
|
||||
"BODIES\n",
|
||||
"By A. EINSTEIN\n",
|
||||
"June 30, 1905\n",
|
||||
"It is known that Maxwell’s electrodynamics—as usually understood at the\n",
|
||||
"present time—when applied to moving bodies, leads to asymmetries which do\n",
|
||||
"not appear to be inherent in the phenomena. Take, for example, the recipro-\n",
|
||||
"cal electrodynamic action of a magnet and a conductor. The observable phe-\n",
|
||||
"nomenon here depends only on the r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content[:400]) # all pages of the pdf content"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,86 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Iugu\n",
|
||||
"\n",
|
||||
">[Iugu](https://www.iugu.com/) is a Brazilian services and software as a service (SaaS) company. It offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications.\n",
|
||||
"\n",
|
||||
"This notebook covers how to load data from the `Iugu REST API` into a format that can be ingested into LangChain, along with example usage for vectorization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"from langchain.document_loaders import IuguLoader\n",
|
||||
"from langchain.indexes import VectorstoreIndexCreator"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The Iugu API requires an access token, which can be found inside of the Iugu dashboard.\n",
|
||||
"\n",
|
||||
"This document loader also requires a `resource` option which defines what data you want to load.\n",
|
||||
"\n",
|
||||
"Following resources are available:\n",
|
||||
"\n",
|
||||
"`Documentation` [Documentation](https://dev.iugu.com/reference/metadados)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"iugu_loader = IuguLoader(\"charges\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create a vectorstore retriver from the loader\n",
|
||||
"# see https://python.langchain.com/en/latest/modules/indexes/getting_started.html for more details\n",
|
||||
"\n",
|
||||
"index = VectorstoreIndexCreator().from_loaders([iugu_loader])\n",
|
||||
"iugu_doc_retriever = index.vectorstore.as_retriever()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.12"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,89 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1dc7df1d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Joplin\n",
|
||||
"\n",
|
||||
">[Joplin](https://joplinapp.org/) is an open source note-taking app. Capture your thoughts and securely access them from any device.\n",
|
||||
"\n",
|
||||
"This notebook covers how to load documents from a `Joplin` database.\n",
|
||||
"\n",
|
||||
"`Joplin` has a [REST API](https://joplinapp.org/api/references/rest_api/) for accessing its local database. This loader uses the API to retrieve all notes in the database and their metadata. This requires an access token that can be obtained from the app by following these steps:\n",
|
||||
"\n",
|
||||
"1. Open the `Joplin` app. The app must stay open while the documents are being loaded.\n",
|
||||
"2. Go to settings / options and select \"Web Clipper\".\n",
|
||||
"3. Make sure that the Web Clipper service is enabled.\n",
|
||||
"4. Under \"Advanced Options\", copy the authorization token.\n",
|
||||
"\n",
|
||||
"You may either initialize the loader directly with the access token, or store it in the environment variable JOPLIN_ACCESS_TOKEN.\n",
|
||||
"\n",
|
||||
"An alternative to this approach is to export the `Joplin`'s note database to Markdown files (optionally, with Front Matter metadata) and use a Markdown loader, such as ObsidianLoader, to load them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "007c5cbf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import JoplinLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a1caec59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = JoplinLoader(access_token=\"<access-token>\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "b1c30ff7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fa93b965",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -4,30 +4,28 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# JSON\n",
|
||||
"# JSON Files\n",
|
||||
"\n",
|
||||
">[JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).\n",
|
||||
"The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the `JSONLoader` to load [JSON](https://en.wikipedia.org/wiki/JSON) files into documents. A few examples of `jq` schema extracting different parts of a JSON file are also shown.\n",
|
||||
"\n",
|
||||
">The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files. It uses the `jq` python package.\n",
|
||||
"Check this [manual](https://stedolan.github.io/jq/manual/#Basicfilters) for a detailed documentation of the `jq` syntax."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install jq"
|
||||
"!pip install jq"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"jupyter": {
|
||||
"outputs_hidden": true
|
||||
}
|
||||
@@ -361,7 +359,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
@@ -1,126 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "66a7777e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Mastodon\n",
|
||||
"\n",
|
||||
">[Mastodon](https://joinmastodon.org/) is a federated social media and social networking service.\n",
|
||||
"\n",
|
||||
"This loader fetches the text from the \"toots\" of a list of `Mastodon` accounts, using the `Mastodon.py` Python package.\n",
|
||||
"\n",
|
||||
"Public accounts can the queried by default without any authentication. If non-public accounts or instances are queried, you have to register an application for your account which gets you an access token, and set that token and your account's API base URL.\n",
|
||||
"\n",
|
||||
"Then you need to pass in the Mastodon account names you want to extract, in the `@account@instance` format."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9ec8a3b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import MastodonTootsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "43128d8d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install Mastodon.py"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "35d6809a",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = MastodonTootsLoader(\n",
|
||||
" mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
|
||||
" number_toots=50, # Default value is 100\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Or set up access information to use a Mastodon app.\n",
|
||||
"# Note that the access token can either be passed into\n",
|
||||
"# constructor or you can set the envirovnment \"MASTODON_ACCESS_TOKEN\".\n",
|
||||
"# loader = MastodonTootsLoader(\n",
|
||||
"# access_token=\"<ACCESS TOKEN OF MASTODON APP>\",\n",
|
||||
"# api_base_url=\"<API BASE URL OF MASTODON APP INSTANCE>\",\n",
|
||||
"# mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
|
||||
"# number_toots=50, # Default value is 100\n",
|
||||
"# )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "05fe33b9",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<p>It is tough to leave this behind and go back to reality. And some people live here! I’m sure there are downsides but it sounds pretty good to me right now.</p>\n",
|
||||
"================================================================================\n",
|
||||
"<p>I wish we could stay here a little longer, but it is time to go home 🥲</p>\n",
|
||||
"================================================================================\n",
|
||||
"<p>Last day of the honeymoon. And it’s <a href=\"https://mastodon.social/tags/caturday\" class=\"mention hashtag\" rel=\"tag\">#<span>caturday</span></a>! This cute tabby came to the restaurant to beg for food and got some chicken.</p>\n",
|
||||
"================================================================================\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"documents = loader.load()\n",
|
||||
"for doc in documents[:3]:\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"=\" * 80)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "322bb6a1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The toot texts (the documents' `page_content`) is by default HTML as returned by the Mastodon API."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -5,13 +5,9 @@
|
||||
"id": "22a849cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Open Document Format (ODT)\n",
|
||||
"## Unstructured ODT Loader\n",
|
||||
"\n",
|
||||
">The [Open Document Format for Office Applications (ODF)](https://en.wikipedia.org/wiki/OpenDocument), also known as `OpenDocument`, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.\n",
|
||||
"\n",
|
||||
">The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (`OASIS`) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for `OpenOffice.org` and `LibreOffice`. It was originally developed for `StarOffice` \"to provide an open standard for office documents.\"\n",
|
||||
"\n",
|
||||
"The `UnstructuredODTLoader` is used to load `Open Office ODT` files."
|
||||
"The `UnstructuredODTLoader` can be used to load Open Office ODT files."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -72,7 +68,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.8.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,134 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Psychic\n",
|
||||
"This notebook covers how to load documents from `Psychic`. See [here](../../../../ecosystem/psychic.md) for more details.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"1. Follow the Quick Start section in [this document](../../../../ecosystem/psychic.md)\n",
|
||||
"2. Log into the [Psychic dashboard](https://dashboard.psychic.dev/) and get your secret key\n",
|
||||
"3. Install the frontend react library into your web app and have a user authenticate a connection. The connection will be created using the connection id that you specify."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Loading documents\n",
|
||||
"\n",
|
||||
"Use the `PsychicLoader` class to load in documents from a connection. Each connection has a connector id (corresponding to the SaaS app that was connected) and a connection id (which you passed in to the frontend library)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n",
|
||||
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Uncomment this to install psychicapi if you don't already have it installed\n",
|
||||
"!poetry run pip -q install psychicapi"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import PsychicLoader\n",
|
||||
"from psychicapi import ConnectorId\n",
|
||||
"\n",
|
||||
"# Create a document loader for google drive. We can also load from other connectors by setting the connector_id to the appropriate value e.g. ConnectorId.notion.value\n",
|
||||
"# This loader uses our test credentials\n",
|
||||
"google_drive_loader = PsychicLoader(\n",
|
||||
" api_key=\"7ddb61c1-8b6a-4d31-a58e-30d1c9ea480e\",\n",
|
||||
" connector_id=ConnectorId.gdrive.value,\n",
|
||||
" connection_id=\"google-test\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"documents = google_drive_loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Converting the docs to embeddings \n",
|
||||
"\n",
|
||||
"We can now convert these documents into embeddings and store them in a vector database like Chroma"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chains import RetrievalQAWithSourcesChain\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"texts = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"docsearch = Chroma.from_documents(texts, embeddings)\n",
|
||||
"chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"stuff\", retriever=docsearch.as_retriever())\n",
|
||||
"chain({\"question\": \"what is psychic?\"}, return_only_outputs=True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.8"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -7,7 +7,7 @@
|
||||
"source": [
|
||||
"# 2Markdown\n",
|
||||
"\n",
|
||||
">[2markdown](https://2markdown.com/) service transforms website content into structured markdown files.\n"
|
||||
"Uses [2markdown](https://2markdown.com/) to convert any webpage into a standard markdown file"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -17,7 +17,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You will need to get your own API key. See https://2markdown.com/login\n",
|
||||
"# You will need to get your own API key\n",
|
||||
"\n",
|
||||
"api_key = \"\""
|
||||
]
|
||||
@@ -56,7 +56,9 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "706304e9",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
@@ -218,7 +220,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -287,118 +287,10 @@
|
||||
"docs[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b066cb5a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Unstructured API\n",
|
||||
"\n",
|
||||
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. Note that currently (as of 11 May 2023) the Unstructured API is open, but it will soon require an API. The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once they’re available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you’d like to self-host the Unstructured API or run it locally."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "b50c70bc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredAPIFileLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "12b6d2cf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"filenames = [\"example_data/fake.docx\", \"example_data/fake-email.eml\"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "39a9894d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredAPIFileLoader(\n",
|
||||
" file_path=filenames[0],\n",
|
||||
" api_key=\"FAKE_API_KEY\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "386eb63c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='Lorem ipsum dolor sit amet.', metadata={'source': 'example_data/fake.docx'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "94158999",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also batch multiple files through the Unstructured API in a single API using `UnstructuredAPIFileLoader`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "79a18e7e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredAPIFileLoader(\n",
|
||||
" file_path=filenames,\n",
|
||||
" api_key=\"FAKE_API_KEY\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "a3d7c846",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='Lorem ipsum dolor sit amet.\\n\\nThis is a test email to use for unit tests.\\n\\nImportant points:\\n\\nRoses are red\\n\\nViolets are blue', metadata={'source': ['example_data/fake.docx', 'example_data/fake-email.eml']})"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0e510495",
|
||||
"id": "f52b04cb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@@ -420,7 +312,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.13"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,101 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "66a7777e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Weather\n",
|
||||
"\n",
|
||||
">[OpenWeatherMap](https://openweathermap.org/) is an open source weather service provider\n",
|
||||
"\n",
|
||||
"This loader fetches the weather data from the OpenWeatherMap's OneCall API, using the pyowm Python package. You must initialize the loader with your OpenWeatherMap API token and the names of the cities you want the weather data for."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9ec8a3b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import WeatherDataLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "43128d8d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install pyowm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "51b0f0db",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Set API key either by passing it in to constructor directly\n",
|
||||
"# or by setting the environment variable \"OPENWEATHERMAP_API_KEY\".\n",
|
||||
"\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"OPENWEATHERMAP_API_KEY = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "35d6809a",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = WeatherDataLoader.from_params(['chennai','vellore'], openweathermap_api_key=OPENWEATHERMAP_API_KEY) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "05fe33b9",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"documents = loader.load()\n",
|
||||
"documents"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -24,7 +24,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install pinecone-client pinecone-text"
|
||||
"#!pip install pinecone-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -16,17 +16,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "a801b57c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# !pip install scikit-learn\n"
|
||||
"# !pip install scikit-learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 3,
|
||||
"id": "393ac030",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -46,7 +46,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"id": "98b1c017",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -56,27 +56,6 @@
|
||||
"retriever = TFIDFRetriever.from_texts([\"foo\", \"bar\", \"world\", \"hello\", \"foo bar\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c016b266",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create a New Retriever with Documents\n",
|
||||
"\n",
|
||||
"You can now create a new retriever with the documents you created."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "53af4f00",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"retriever = TFIDFRetriever.from_documents([Document(page_content=\"foo\"), Document(page_content=\"bar\"), Document(page_content=\"world\"), Document(page_content=\"hello\"), Document(page_content=\"foo bar\")])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "08437fa2",
|
||||
@@ -89,7 +68,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "c0455218",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -101,7 +80,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "7dfa5c29",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -116,7 +95,7 @@
|
||||
" Document(page_content='world', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -124,6 +103,14 @@
|
||||
"source": [
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "74bd9256",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -142,7 +129,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,229 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# Typesense\n",
|
||||
"\n",
|
||||
"> [Typesense](https://typesense.org) is an open source, in-memory search engine, that you can either [self-host](https://typesense.org/docs/guide/install-typesense.html#option-2-local-machine-self-hosting) or run on [Typesense Cloud](https://cloud.typesense.org/).\n",
|
||||
">\n",
|
||||
"> Typesense focuses on performance by storing the entire index in RAM (with a backup on disk) and also focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.\n",
|
||||
">\n",
|
||||
"> It also lets you combine attribute-based filtering together with vector queries, to fetch the most relevant documents."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"This notebook shows you how to use Typesense as your VectorStore."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install typesense openapi-schema-pydantic openai tiktoken"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:48:02.968822Z",
|
||||
"start_time": "2023-05-23T22:47:48.574094Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Typesense\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:50:34.775893Z",
|
||||
"start_time": "2023-05-23T22:50:34.771889Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Let's import our test dataset:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:56:19.093489Z",
|
||||
"start_time": "2023-05-23T22:56:19.089Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = Typesense.from_documents(docs,\n",
|
||||
" embeddings,\n",
|
||||
" typesense_client_params={\n",
|
||||
" 'host': 'localhost', # Use xxx.a1.typesense.net for Typesense Cloud\n",
|
||||
" 'port': '8108', # Use 443 for Typesense Cloud\n",
|
||||
" 'protocol': 'http', # Use https for Typesense Cloud\n",
|
||||
" 'typesense_api_key': 'xyz',\n",
|
||||
" 'typesense_collection_name': 'lang-chain'\n",
|
||||
" })"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Similarity Search"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = docsearch.similarity_search(query)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(found_docs[0].page_content)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"## Typesense as a Retriever\n",
|
||||
"\n",
|
||||
"Typesense, as all the other vector stores, is a LangChain Retriever, by using cosine similarity."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever()\n",
|
||||
"retriever"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"retriever.get_relevant_documents(query)[0]"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
@@ -1,318 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Vectara\n",
|
||||
"\n",
|
||||
">[Vectara](https://Vectara.com/docs/) is a API platform for building LLM-powered applications. It provides a simple to use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Vectara` vector database. \n",
|
||||
"\n",
|
||||
"See the [Vectara API documentation ](https://Vectara.com/docs/) for more information on how to use the API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "082e7e8b-ac52-430c-98d6-8f0924457642",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:22.282884Z",
|
||||
"start_time": "2023-04-04T10:51:21.408077Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Vectara\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:22.520144Z",
|
||||
"start_time": "2023-04-04T10:51:22.285826Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eeead681",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connecting to Vectara from LangChain\n",
|
||||
"\n",
|
||||
"The Vectara API provides simple API endpoints for indexing and querying."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "8429667e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:22.525091Z",
|
||||
"start_time": "2023-04-04T10:51:22.522015Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectara = Vectara.from_documents(docs, embedding=None)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1f9215c8",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T09:27:29.920258Z",
|
||||
"start_time": "2023-04-04T09:27:29.913714Z"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Similarity search\n",
|
||||
"\n",
|
||||
"The simplest scenario for using Vectara is to perform a similarity search. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "a8c513ab",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.204469Z",
|
||||
"start_time": "2023-04-04T10:51:24.855618Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vectara.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "fc516993",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.220984Z",
|
||||
"start_time": "2023-04-04T10:51:25.213943Z"
|
||||
},
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(found_docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1bda9bf5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity search with score\n",
|
||||
"\n",
|
||||
"Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "8804a21d",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.631585Z",
|
||||
"start_time": "2023-04-04T10:51:25.227384Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vectara.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "756a6887",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:25.642282Z",
|
||||
"start_time": "2023-04-04T10:51:25.635947Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.\n",
|
||||
"\n",
|
||||
"Score: 1.0046461\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"document, score = found_docs[0]\n",
|
||||
"print(document.page_content)\n",
|
||||
"print(f\"\\nScore: {score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "691a82d6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Vectara as a Retriever\n",
|
||||
"\n",
|
||||
"Vectara, as all the other vector stores, is a LangChain Retriever, by using cosine similarity. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "9427195f",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:26.031451Z",
|
||||
"start_time": "2023-04-04T10:51:26.018763Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"VectorStoreRetriever(vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x156d3e830>, search_type='similarity', search_kwargs={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever = vectara.as_retriever()\n",
|
||||
"retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "f3c70c31",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-04-04T10:51:26.495652Z",
|
||||
"start_time": "2023-04-04T10:51:26.046407Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"retriever.get_relevant_documents(query)[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2300e785",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
@@ -1,170 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Cloud Platform Vertex AI PaLM \n",
|
||||
"\n",
|
||||
"Note: This is seperate from the Google PaLM integration. Google has chosen to offer an enterprise version of PaLM through GCP, and this supports the models made available through there. \n",
|
||||
"\n",
|
||||
"PaLM API on Vertex AI is a Preview offering, subject to the Pre-GA Offerings Terms of the [GCP Service Specific Terms](https://cloud.google.com/terms/service-terms). \n",
|
||||
"\n",
|
||||
"Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages). Further, by using PaLM API on Vertex AI, you agree to the Generative AI Preview [terms and conditions](https://cloud.google.com/trustedtester/aitos) (Preview Terms).\n",
|
||||
"\n",
|
||||
"For PaLM API on Vertex AI, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).\n",
|
||||
"\n",
|
||||
"To use Vertex AI PaLM you must have the `google-cloud-aiplatform` Python package installed and either:\n",
|
||||
"- Have credentials configured for your environment (gcloud, workload identity, etc...)\n",
|
||||
"- Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable\n",
|
||||
"\n",
|
||||
"This codebase uses the `google.auth` library which first looks for the application credentials variable mentioned above, and then looks for system-level auth.\n",
|
||||
"\n",
|
||||
"For more information, see: \n",
|
||||
"- https://cloud.google.com/docs/authentication/application-default-credentials#GAC\n",
|
||||
"- https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install google-cloud-aiplatform"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatVertexAI\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" ChatPromptTemplate,\n",
|
||||
" SystemMessagePromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.schema import (\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = ChatVertexAI()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='Sure, here is the translation of the sentence \"I love programming\" from English to French:\\n\\nJ\\'aime programmer.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"messages = [\n",
|
||||
" SystemMessage(content=\"You are a helpful assistant that translates English to French.\"),\n",
|
||||
" HumanMessage(content=\"Translate this sentence from English to French. I love programming.\")\n",
|
||||
"]\n",
|
||||
"chat(messages)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplates`. You can use `ChatPromptTemplate`'s `format_prompt` -- this returns a `PromptValue`, which you can convert to a string or Message object, depending on whether you want to use the formatted value as input to an llm or chat model.\n",
|
||||
"\n",
|
||||
"For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"template=\"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
|
||||
"system_message_prompt = SystemMessagePromptTemplate.from_template(template)\n",
|
||||
"human_template=\"{text}\"\n",
|
||||
"human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='Sure, here is the translation of \"I love programming\" in French:\\n\\nJ\\'aime programmer.', additional_kwargs={}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])\n",
|
||||
"\n",
|
||||
"# get a chat completion from the formatted messages\n",
|
||||
"chat(chat_prompt.format_prompt(input_language=\"English\", output_language=\"French\", text=\"I love programming.\").to_messages())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -5,7 +5,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How (and why) to use the human input LLM\n",
|
||||
"# How (and why) to use the the human input LLM\n",
|
||||
"\n",
|
||||
"Similar to the fake LLM, LangChain provides a pseudo LLM class that can be used for testing, debugging, or educational purposes. This allows you to mock out calls to the LLM and simulate how a human would respond if they received the prompts.\n",
|
||||
"\n",
|
||||
@@ -34,23 +34,6 @@
|
||||
"from langchain.agents import AgentType"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Since we will use the `WikipediaQueryRun` tool in this notebook, you might need to install the `wikipedia` package if you haven't done so already."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install wikipedia"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
@@ -234,7 +217,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
|
||||
@@ -1,159 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "J-yvaDTmTTza"
|
||||
},
|
||||
"source": [
|
||||
"# Beam integration for langchain\n",
|
||||
"\n",
|
||||
"Calls the Beam API wrapper to deploy and make subsequent calls to an instance of the gpt2 LLM in a cloud deployment. Requires installation of the Beam library and registration of Beam Client ID and Client Secret. By calling the wrapper an instance of the model is created and run, with returned text relating to the prompt. Additional calls can then be made by directly calling the Beam API.\n",
|
||||
"\n",
|
||||
"[Create an account](https://www.beam.cloud/), if you don't have one already. Grab your API keys from the [dashboard](https://www.beam.cloud/dashboard/settings/api-keys)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "CfTmesWtTfTS"
|
||||
},
|
||||
"source": [
|
||||
"Install the Beam CLI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "G_tCCurqR7Ik"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "jJkcNqOdThQ7"
|
||||
},
|
||||
"source": [
|
||||
"Register API Keys and set your beam client id and secret environment variables:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "7gQd6fszSEaH"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import subprocess\n",
|
||||
"\n",
|
||||
"beam_client_id = \"<Your beam client id>\"\n",
|
||||
"beam_client_secret = \"<Your beam client secret>\"\n",
|
||||
"\n",
|
||||
"# Set the environment variables\n",
|
||||
"os.environ['BEAM_CLIENT_ID'] = beam_client_id\n",
|
||||
"os.environ['BEAM_CLIENT_SECRET'] = beam_client_secret\n",
|
||||
"\n",
|
||||
"# Run the beam configure command\n",
|
||||
"!beam configure --clientId={beam_client_id} --clientSecret={beam_client_secret}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "c20rkK18TrK2"
|
||||
},
|
||||
"source": [
|
||||
"Install the Beam SDK:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "CH2Vop6ISNIf"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install beam-sdk"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "XflOsp3bTwl1"
|
||||
},
|
||||
"source": [
|
||||
"**Deploy and call Beam directly from langchain!**\n",
|
||||
"\n",
|
||||
"Note that a cold start might take a couple of minutes to return the response, but subsequent calls will be faster!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "KmaHxUqbSVnh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms.beam import Beam\n",
|
||||
"\n",
|
||||
"llm = Beam(model_name=\"gpt2\",\n",
|
||||
" name=\"langchain-gpt2-test\",\n",
|
||||
" cpu=8,\n",
|
||||
" memory=\"32Gi\",\n",
|
||||
" gpu=\"A10G\",\n",
|
||||
" python_version=\"python3.8\",\n",
|
||||
" python_packages=[\n",
|
||||
" \"diffusers[torch]>=0.10\",\n",
|
||||
" \"transformers\",\n",
|
||||
" \"torch\",\n",
|
||||
" \"pillow\",\n",
|
||||
" \"accelerate\",\n",
|
||||
" \"safetensors\",\n",
|
||||
" \"xformers\",],\n",
|
||||
" max_length=\"50\",\n",
|
||||
" verbose=False)\n",
|
||||
"\n",
|
||||
"llm._deploy()\n",
|
||||
"\n",
|
||||
"response = llm._call(\"Running machine learning on a remote GPU\")\n",
|
||||
"\n",
|
||||
"print(response)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"private_outputs": true,
|
||||
"provenance": []
|
||||
},
|
||||
"gpuClass": "standard",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
}
|
||||
@@ -1,138 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Cloud Platform Vertex AI PaLM \n",
|
||||
"\n",
|
||||
"Note: This is seperate from the Google PaLM integration. Google has chosen to offer an enterprise version of PaLM through GCP, and this supports the models made available through there. \n",
|
||||
"\n",
|
||||
"PaLM API on Vertex AI is a Preview offering, subject to the Pre-GA Offerings Terms of the [GCP Service Specific Terms](https://cloud.google.com/terms/service-terms). \n",
|
||||
"\n",
|
||||
"Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages). Further, by using PaLM API on Vertex AI, you agree to the Generative AI Preview [terms and conditions](https://cloud.google.com/trustedtester/aitos) (Preview Terms).\n",
|
||||
"\n",
|
||||
"For PaLM API on Vertex AI, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).\n",
|
||||
"\n",
|
||||
"To use Vertex AI PaLM you must have the `google-cloud-aiplatform` Python package installed and either:\n",
|
||||
"- Have credentials configured for your environment (gcloud, workload identity, etc...)\n",
|
||||
"- Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable\n",
|
||||
"\n",
|
||||
"This codebase uses the `google.auth` library which first looks for the application credentials variable mentioned above, and then looks for system-level auth.\n",
|
||||
"\n",
|
||||
"For more information, see: \n",
|
||||
"- https://cloud.google.com/docs/authentication/application-default-credentials#GAC\n",
|
||||
"- https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install google-cloud-aiplatform"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.llms import VertexAI\n",
|
||||
"from langchain import PromptTemplate, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = VertexAI()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Justin Bieber was born on March 1, 1994. The Super Bowl in 1994 was won by the San Francisco 49ers.\\nThe final answer: San Francisco 49ers.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
|
||||
"\n",
|
||||
"llm_chain.run(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,105 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MosaicML\n",
|
||||
"\n",
|
||||
"[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with MosaicML Inference for text completion."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# sign up for an account: https://forms.mosaicml.com/demo?utm_source=langchain\n",
|
||||
"\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"MOSAICML_API_TOKEN = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"MOSAICML_API_TOKEN\"] = MOSAICML_API_TOKEN"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import MosaicML\n",
|
||||
"from langchain import PromptTemplate, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"template = \"\"\"Question: {question}\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = MosaicML(inject_instruction_format=True, model_kwargs={'do_sample': False})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = \"What is one good reason why you should train a large language model on domain specific data?\"\n",
|
||||
"\n",
|
||||
"llm_chain.run(question)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,133 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenLM\n",
|
||||
"[OpenLM](https://github.com/r2d4/openlm) is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"It implements the OpenAI Completion class so that it can be used as a drop-in replacement for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added code.\n",
|
||||
"\n",
|
||||
"This examples goes over how to use LangChain to interact with both OpenAI and HuggingFace. You'll need API keys from both."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Setup\n",
|
||||
"Install dependencies and set API keys."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Uncomment to install openlm and openai if you haven't already\n",
|
||||
"\n",
|
||||
"# !pip install openlm\n",
|
||||
"# !pip install openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from getpass import getpass\n",
|
||||
"import os\n",
|
||||
"import subprocess\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Check if OPENAI_API_KEY environment variable is set\n",
|
||||
"if \"OPENAI_API_KEY\" not in os.environ:\n",
|
||||
" print(\"Enter your OpenAI API key:\")\n",
|
||||
" os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
|
||||
"\n",
|
||||
"# Check if HF_API_TOKEN environment variable is set\n",
|
||||
"if \"HF_API_TOKEN\" not in os.environ:\n",
|
||||
" print(\"Enter your HuggingFace Hub API key:\")\n",
|
||||
" os.environ[\"HF_API_TOKEN\"] = getpass()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using LangChain with OpenLM\n",
|
||||
"\n",
|
||||
"Here we're going to call two models in an LLMChain, `text-davinci-003` from OpenAI and `gpt2` on HuggingFace."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenLM\n",
|
||||
"from langchain import PromptTemplate, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Model: text-davinci-003\n",
|
||||
"Result: France is a country in Europe. The capital of France is Paris.\n",
|
||||
"Model: huggingface.co/gpt2\n",
|
||||
"Result: Question: What is the capital of France?\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step. I am not going to lie, this is a complicated issue, and I don't see any solutions to all this, but it is still far more\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What is the capital of France?\"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
|
||||
"\n",
|
||||
"for model in [\"text-davinci-003\", \"huggingface.co/gpt2\"]:\n",
|
||||
" llm = OpenLM(model=model)\n",
|
||||
" llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
" result = llm_chain.run(question)\n",
|
||||
" print(\"\"\"Model: {}\n",
|
||||
"Result: {}\"\"\".format(model, result))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -7,7 +7,7 @@
|
||||
"source": [
|
||||
"# Structured Decoding with RELLM\n",
|
||||
"\n",
|
||||
"[RELLM](https://github.com/r2d4/rellm) is a library that wraps local Hugging Face pipeline models for structured decoding.\n",
|
||||
"[RELLM](https://github.com/r2d4/rellm) is a library that wraps local HuggingFace pipeline models for structured decoding.\n",
|
||||
"\n",
|
||||
"It works by generating tokens one at a time. At each step, it masks tokens that don't conform to the provided partial regular expression.\n",
|
||||
"\n",
|
||||
@@ -32,7 +32,7 @@
|
||||
"id": "66bd89f1-8daa-433d-bb8f-5b0b3ae34b00",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Hugging Face Baseline\n",
|
||||
"### HuggingFace Baseline\n",
|
||||
"\n",
|
||||
"First, let's establish a qualitative baseline by checking the output of the model without structured decoding."
|
||||
]
|
||||
|
||||
@@ -1,124 +0,0 @@
|
||||
{
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"name": "python3",
|
||||
"display_name": "Python 3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"!pip -q install elasticsearch langchain"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "6dJxqebov4eU"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"import elasticsearch\n",
|
||||
"from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "RV7C3DUmv4aq"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Define the model ID\n",
|
||||
"model_id = 'your_model_id'"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "MrT3jplJvp09"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Instantiate ElasticsearchEmbeddings using credentials\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
|
||||
" model_id,\n",
|
||||
" es_cloud_id='your_cloud_id', \n",
|
||||
" es_user='your_user', \n",
|
||||
" es_password='your_password'\n",
|
||||
")\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "svtdnC-dvpxR"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Create embeddings for multiple documents\n",
|
||||
"documents = [\n",
|
||||
" 'This is an example document.', \n",
|
||||
" 'Another example document to generate embeddings for.'\n",
|
||||
"]\n",
|
||||
"document_embeddings = embeddings.embed_documents(documents)\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "7DXZAK7Kvpth"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Print document embeddings\n",
|
||||
"for i, embedding in enumerate(document_embeddings):\n",
|
||||
" print(f\"Embedding for document {i+1}: {embedding}\")\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "K8ra75W_vpqy"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Create an embedding for a single query\n",
|
||||
"query = 'This is a single query.'\n",
|
||||
"query_embedding = embeddings.embed_query(query)\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "V4Q5kQo9vpna"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"source": [
|
||||
"# Print query embedding\n",
|
||||
"print(f\"Embedding for query: {query_embedding}\")\n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "O0oQDzGKvpkz"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -1,113 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Cloud Platform Vertex AI PaLM \n",
|
||||
"\n",
|
||||
"Note: This is seperate from the Google PaLM integration. Google has chosen to offer an enterprise version of PaLM through GCP, and this supports the models made available through there. \n",
|
||||
"\n",
|
||||
"PaLM API on Vertex AI is a Preview offering, subject to the Pre-GA Offerings Terms of the [GCP Service Specific Terms](https://cloud.google.com/terms/service-terms). \n",
|
||||
"\n",
|
||||
"Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages). Further, by using PaLM API on Vertex AI, you agree to the Generative AI Preview [terms and conditions](https://cloud.google.com/trustedtester/aitos) (Preview Terms).\n",
|
||||
"\n",
|
||||
"For PaLM API on Vertex AI, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).\n",
|
||||
"\n",
|
||||
"To use Vertex AI PaLM you must have the `google-cloud-aiplatform` Python package installed and either:\n",
|
||||
"- Have credentials configured for your environment (gcloud, workload identity, etc...)\n",
|
||||
"- Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable\n",
|
||||
"\n",
|
||||
"This codebase uses the `google.auth` library which first looks for the application credentials variable mentioned above, and then looks for system-level auth.\n",
|
||||
"\n",
|
||||
"For more information, see: \n",
|
||||
"- https://cloud.google.com/docs/authentication/application-default-credentials#GAC\n",
|
||||
"- https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install google-cloud-aiplatform"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.embeddings import VertexAIEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = VertexAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"text = \"This is a test document.\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query_result = embeddings.embed_query(text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"doc_result = embeddings.embed_documents([text])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "cc99336516f23363341912c6723b01ace86f02e26b4290be1efc0677e2e2ec24"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,145 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MiniMax\n",
|
||||
"\n",
|
||||
"[MiniMax](https://api.minimax.chat/document/guides/embeddings?id=6464722084cdc277dfaa966a) offers an embeddings service.\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with MiniMax Inference for text embedding."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:15.397075Z",
|
||||
"start_time": "2023-05-24T15:13:15.387540Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"MINIMAX_GROUP_ID\"] = \"MINIMAX_GROUP_ID\"\n",
|
||||
"os.environ[\"MINIMAX_API_KEY\"] = \"MINIMAX_API_KEY\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:17.176956Z",
|
||||
"start_time": "2023-05-24T15:13:15.399076Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import MiniMaxEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:17.193751Z",
|
||||
"start_time": "2023-05-24T15:13:17.182053Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = MiniMaxEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:17.844903Z",
|
||||
"start_time": "2023-05-24T15:13:17.198751Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query_text = \"This is a test query.\"\n",
|
||||
"query_result = embeddings.embed_query(query_text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:18.605339Z",
|
||||
"start_time": "2023-05-24T15:13:17.845906Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"document_text = \"This is a test document.\"\n",
|
||||
"document_result = embeddings.embed_documents([document_text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-24T15:13:18.620432Z",
|
||||
"start_time": "2023-05-24T15:13:18.608335Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Cosine similarity between document and query: 0.1573236279277012\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"query_numpy = np.array(query_result)\n",
|
||||
"document_numpy = np.array(document_result[0])\n",
|
||||
"similarity = np.dot(query_numpy, document_numpy) / (np.linalg.norm(query_numpy)*np.linalg.norm(document_numpy))\n",
|
||||
"print(f\"Cosine similarity between document and query: {similarity}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,82 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ModelScope\n",
|
||||
"\n",
|
||||
"Let's load the ModelScope Embedding class."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import ModelScopeEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_id = \"damo/nlp_corom_sentence-embedding_english-base\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = ModelScopeEmbeddings(model_id=model_id)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"text = \"This is a test document.\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query_result = embeddings.embed_query(text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"doc_results = embeddings.embed_documents([\"foo\"])"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "chatgpt",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python",
|
||||
"version": "3.9.15"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -1,109 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MosaicML embeddings\n",
|
||||
"\n",
|
||||
"[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with MosaicML Inference for text embedding."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# sign up for an account: https://forms.mosaicml.com/demo?utm_source=langchain\n",
|
||||
"\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"MOSAICML_API_TOKEN = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"MOSAICML_API_TOKEN\"] = MOSAICML_API_TOKEN"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings import MosaicMLInstructorEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = MosaicMLInstructorEmbeddings(\n",
|
||||
" query_instruction=\"Represent the query for retrieval: \"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query_text = \"This is a test query.\"\n",
|
||||
"query_result = embeddings.embed_query(query_text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"document_text = \"This is a test document.\"\n",
|
||||
"document_result = embeddings.embed_documents([document_text])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"query_numpy = np.array(query_result)\n",
|
||||
"document_numpy = np.array(document_result[0])\n",
|
||||
"similarity = np.dot(query_numpy, document_numpy) / (np.linalg.norm(query_numpy)*np.linalg.norm(document_numpy))\n",
|
||||
"print(f\"Cosine similarity between document and query: {similarity}\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -559,7 +559,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 18,
|
||||
"id": "0b6dd7b8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -631,84 +631,6 @@
|
||||
"prompt = load_prompt(\"few_shot_prompt_example_prompt.json\")\n",
|
||||
"print(prompt.format(adjective=\"funny\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c6e3f9fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## PromptTempalte with OutputParser\n",
|
||||
"This shows an example of loading a prompt along with an OutputParser from a file."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "500dab26",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{\r\n",
|
||||
" \"input_variables\": [\r\n",
|
||||
" \"question\",\r\n",
|
||||
" \"student_answer\"\r\n",
|
||||
" ],\r\n",
|
||||
" \"output_parser\": {\r\n",
|
||||
" \"regex\": \"(.*?)\\\\nScore: (.*)\",\r\n",
|
||||
" \"output_keys\": [\r\n",
|
||||
" \"answer\",\r\n",
|
||||
" \"score\"\r\n",
|
||||
" ],\r\n",
|
||||
" \"default_output_key\": null,\r\n",
|
||||
" \"_type\": \"regex_parser\"\r\n",
|
||||
" },\r\n",
|
||||
" \"partial_variables\": {},\r\n",
|
||||
" \"template\": \"Given the following question and student answer, provide a correct answer and score the student answer.\\nQuestion: {question}\\nStudent Answer: {student_answer}\\nCorrect Answer:\",\r\n",
|
||||
" \"template_format\": \"f-string\",\r\n",
|
||||
" \"validate_template\": true,\r\n",
|
||||
" \"_type\": \"prompt\"\r\n",
|
||||
"}"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"! cat prompt_with_output_parser.json"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "d267a736",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"prompt = load_prompt(\"prompt_with_output_parser.json\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "cb770399",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'answer': 'George Washington was born in 1732 and died in 1799.',\n",
|
||||
" 'score': '1/2'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"prompt.output_parser.parse(\"George Washington was born in 1732 and died in 1799.\\nScore: 1/2\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -727,7 +649,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -1,20 +0,0 @@
|
||||
{
|
||||
"input_variables": [
|
||||
"question",
|
||||
"student_answer"
|
||||
],
|
||||
"output_parser": {
|
||||
"regex": "(.*?)\nScore: (.*)",
|
||||
"output_keys": [
|
||||
"answer",
|
||||
"score"
|
||||
],
|
||||
"default_output_key": null,
|
||||
"_type": "regex_parser"
|
||||
},
|
||||
"partial_variables": {},
|
||||
"template": "Given the following question and student answer, provide a correct answer and score the student answer.\nQuestion: {question}\nStudent Answer: {student_answer}\nCorrect Answer:",
|
||||
"template_format": "f-string",
|
||||
"validate_template": true,
|
||||
"_type": "prompt"
|
||||
}
|
||||
@@ -150,6 +150,7 @@ In this example, we'll create a prompt to generate word antonyms.
|
||||
```python
|
||||
from langchain import PromptTemplate, FewShotPromptTemplate
|
||||
|
||||
|
||||
# First, create the list of few shot examples.
|
||||
examples = [
|
||||
{"word": "happy", "antonym": "sad"},
|
||||
@@ -158,10 +159,10 @@ examples = [
|
||||
|
||||
# Next, we specify the template to format the examples we have provided.
|
||||
# We use the `PromptTemplate` class for this.
|
||||
example_formatter_template = """Word: {word}
|
||||
Antonym: {antonym}
|
||||
example_formatter_template = """
|
||||
Word: {word}
|
||||
Antonym: {antonym}\n
|
||||
"""
|
||||
|
||||
example_prompt = PromptTemplate(
|
||||
input_variables=["word", "antonym"],
|
||||
template=example_formatter_template,
|
||||
@@ -175,14 +176,14 @@ few_shot_prompt = FewShotPromptTemplate(
|
||||
example_prompt=example_prompt,
|
||||
# The prefix is some text that goes before the examples in the prompt.
|
||||
# Usually, this consists of intructions.
|
||||
prefix="Give the antonym of every input\n",
|
||||
prefix="Give the antonym of every input",
|
||||
# The suffix is some text that goes after the examples in the prompt.
|
||||
# Usually, this is where the user input will go
|
||||
suffix="Word: {input}\nAntonym: ",
|
||||
suffix="Word: {input}\nAntonym:",
|
||||
# The input variables are the variables that the overall prompt expects.
|
||||
input_variables=["input"],
|
||||
# The example_separator is the string we will use to join the prefix, examples, and suffix together with.
|
||||
example_separator="\n",
|
||||
example_separator="\n\n",
|
||||
)
|
||||
|
||||
# We can now generate a prompt using the `format` method.
|
||||
@@ -196,7 +197,7 @@ print(few_shot_prompt.format(input="big"))
|
||||
# -> Antonym: short
|
||||
# ->
|
||||
# -> Word: big
|
||||
# -> Antonym:
|
||||
# -> Antonym:
|
||||
```
|
||||
|
||||
## Select examples for a prompt template
|
||||
@@ -228,11 +229,7 @@ example_selector = LengthBasedExampleSelector(
|
||||
example_prompt=example_prompt,
|
||||
# This is the maximum length that the formatted examples should be.
|
||||
# Length is measured by the get_text_length function below.
|
||||
max_length=25
|
||||
# This is the function used to get the length of a string, which is used
|
||||
# to determine which examples to include. It is commented out because
|
||||
# it is provided as a default value if none is specified.
|
||||
# get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
|
||||
max_length=25,
|
||||
)
|
||||
|
||||
# We can now use the `example_selector` to create a `FewShotPromptTemplate`.
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
API References
|
||||
==========================
|
||||
|
||||
| Full documentation on all methods, classes, and APIs in LangChain.
|
||||
|
||||
All of LangChain's reference documentation, in one place.
|
||||
Full documentation on all methods, classes, and APIs in LangChain.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
@@ -773,11 +773,7 @@ class AgentExecutor(Chain):
|
||||
raise e
|
||||
text = str(e)
|
||||
if isinstance(self.handle_parsing_errors, bool):
|
||||
if e.send_to_llm:
|
||||
observation = str(e.observation)
|
||||
text = str(e.llm_output)
|
||||
else:
|
||||
observation = "Invalid or incomplete response"
|
||||
observation = "Invalid or incomplete response"
|
||||
elif isinstance(self.handle_parsing_errors, str):
|
||||
observation = self.handle_parsing_errors
|
||||
elif callable(self.handle_parsing_errors):
|
||||
|
||||
@@ -1,8 +1,5 @@
|
||||
"""Agent toolkits."""
|
||||
|
||||
from langchain.agents.agent_toolkits.azure_cognitive_services.toolkit import (
|
||||
AzureCognitiveServicesToolkit,
|
||||
)
|
||||
from langchain.agents.agent_toolkits.csv.base import create_csv_agent
|
||||
from langchain.agents.agent_toolkits.file_management.toolkit import (
|
||||
FileManagementToolkit,
|
||||
@@ -63,5 +60,4 @@ __all__ = [
|
||||
"JiraToolkit",
|
||||
"FileManagementToolkit",
|
||||
"PlayWrightBrowserToolkit",
|
||||
"AzureCognitiveServicesToolkit",
|
||||
]
|
||||
|
||||
@@ -1,7 +0,0 @@
|
||||
"""Azure Cognitive Services Toolkit."""
|
||||
|
||||
from langchain.agents.agent_toolkits.azure_cognitive_services.toolkit import (
|
||||
AzureCognitiveServicesToolkit,
|
||||
)
|
||||
|
||||
__all__ = ["AzureCognitiveServicesToolkit"]
|
||||
@@ -1,31 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import sys
|
||||
from typing import List
|
||||
|
||||
from langchain.agents.agent_toolkits.base import BaseToolkit
|
||||
from langchain.tools.azure_cognitive_services import (
|
||||
AzureCogsFormRecognizerTool,
|
||||
AzureCogsImageAnalysisTool,
|
||||
AzureCogsSpeech2TextTool,
|
||||
AzureCogsText2SpeechTool,
|
||||
)
|
||||
from langchain.tools.base import BaseTool
|
||||
|
||||
|
||||
class AzureCognitiveServicesToolkit(BaseToolkit):
|
||||
"""Toolkit for Azure Cognitive Services."""
|
||||
|
||||
def get_tools(self) -> List[BaseTool]:
|
||||
"""Get the tools in the toolkit."""
|
||||
|
||||
tools = [
|
||||
AzureCogsFormRecognizerTool(),
|
||||
AzureCogsSpeech2TextTool(),
|
||||
AzureCogsText2SpeechTool(),
|
||||
]
|
||||
|
||||
# TODO: Remove check once azure-ai-vision supports MacOS.
|
||||
if sys.platform.startswith("linux") or sys.platform.startswith("win"):
|
||||
tools.append(AzureCogsImageAnalysisTool())
|
||||
return tools
|
||||
@@ -34,7 +34,7 @@ def create_pandas_dataframe_agent(
|
||||
try:
|
||||
import pandas as pd
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"pandas package not found, please install with `pip install pandas`"
|
||||
)
|
||||
|
||||
|
||||
@@ -2,24 +2,28 @@
|
||||
"""Prompts for PowerBI agent."""
|
||||
|
||||
|
||||
POWERBI_PREFIX = """You are an agent designed to help users interact with a PowerBI Dataset.
|
||||
POWERBI_PREFIX = """You are an agent designed to interact with a Power BI Dataset.
|
||||
|
||||
Agent has access to a tool that can write a query based on the question and then run those against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "This does not appear to be part of this dataset." as the answer.
|
||||
Assistant has access to tools that can give context, write queries and execute those queries against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "I don't know" as the answer. The query language that PowerBI uses is called DAX and it is quite particular and complex, so make sure to use the right tools to get the answers the user is looking for.
|
||||
|
||||
Given an input question, ask to run the questions against the dataset, then look at the results and return the answer, the answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in a easily readible format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
|
||||
Given an input question, create a syntactically correct DAX query to run, then look at the results and return the answer. Sometimes the result indicate something is wrong with the query, or there were errors in the json serialization. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
|
||||
|
||||
Assistant never just starts querying, assistant should first find out which tables there are, then how each table is defined and then ask the question to query tool to create a query and then ask the query tool to execute it, finally create a complete sentence that answers the question, if multiple rows need are asked find a way to write that in a easily readible format for a human. Assistant has tools that can get more context of the tables which helps it write correct queries.
|
||||
"""
|
||||
|
||||
POWERBI_SUFFIX = """Begin!
|
||||
|
||||
Question: {input}
|
||||
Thought: I can first ask which tables I have, then how each table is defined and then ask the query tool the question I need, and finally create a nice sentence that answers the question.
|
||||
Thought: I should first ask which tables I have, then how each table is defined and then ask the question to query tool to create a query for me and then I should ask the query tool to execute it, finally create a nice sentence that answers the question.
|
||||
{agent_scratchpad}"""
|
||||
|
||||
POWERBI_CHAT_PREFIX = """Assistant is a large language model built to help users interact with a PowerBI Dataset.
|
||||
|
||||
Assistant has access to a tool that can write a query based on the question and then run those against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "This does not appear to be part of this dataset." as the answer.
|
||||
Assistant has access to tools that can give context, write queries and execute those queries against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "I don't know" as the answer. The query language that PowerBI uses is called DAX and it is quite particular and complex, so make sure to use the right tools to get the answers the user is looking for.
|
||||
|
||||
Given an input question, ask to run the questions against the dataset, then look at the results and return the answer, the answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in a easily readible format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
|
||||
Given an input question, create a syntactically correct DAX query to run, then look at the results and return the answer. Sometimes the result indicate something is wrong with the query, or there were errors in the json serialization. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
|
||||
|
||||
Assistant never just starts querying, assistant should first find out which tables there are, then how each table is defined and then ask the question to query tool to create a query and then ask the query tool to execute it, finally create a complete sentence that answers the question, if multiple rows need are asked find a way to write that in a easily readible format for a human. Assistant has tools that can get more context of the tables which helps it write correct queries.
|
||||
"""
|
||||
|
||||
POWERBI_CHAT_SUFFIX = """TOOLS
|
||||
|
||||
@@ -12,6 +12,7 @@ from langchain.tools import BaseTool
|
||||
from langchain.tools.powerbi.prompt import QUESTION_TO_QUERY
|
||||
from langchain.tools.powerbi.tool import (
|
||||
InfoPowerBITool,
|
||||
InputToQueryTool,
|
||||
ListPowerBITool,
|
||||
QueryPowerBITool,
|
||||
)
|
||||
@@ -24,7 +25,6 @@ class PowerBIToolkit(BaseToolkit):
|
||||
powerbi: PowerBIDataset = Field(exclude=True)
|
||||
llm: BaseLanguageModel = Field(exclude=True)
|
||||
examples: Optional[str] = None
|
||||
max_iterations: int = 5
|
||||
callback_manager: Optional[BaseCallbackManager] = None
|
||||
|
||||
class Config:
|
||||
@@ -52,12 +52,12 @@ class PowerBIToolkit(BaseToolkit):
|
||||
),
|
||||
)
|
||||
return [
|
||||
QueryPowerBITool(
|
||||
QueryPowerBITool(powerbi=self.powerbi),
|
||||
InfoPowerBITool(powerbi=self.powerbi),
|
||||
ListPowerBITool(powerbi=self.powerbi),
|
||||
InputToQueryTool(
|
||||
llm_chain=chain,
|
||||
powerbi=self.powerbi,
|
||||
examples=self.examples,
|
||||
max_iterations=self.max_iterations,
|
||||
),
|
||||
InfoPowerBITool(powerbi=self.powerbi),
|
||||
ListPowerBITool(powerbi=self.powerbi),
|
||||
]
|
||||
|
||||
@@ -1,8 +1,8 @@
|
||||
import json
|
||||
from typing import Union
|
||||
|
||||
from langchain.agents.agent import AgentOutputParser
|
||||
from langchain.agents.chat.prompt import FORMAT_INSTRUCTIONS
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
from langchain.schema import AgentAction, AgentFinish, OutputParserException
|
||||
|
||||
FINAL_ANSWER_ACTION = "Final Answer:"
|
||||
@@ -18,7 +18,8 @@ class ChatOutputParser(AgentOutputParser):
|
||||
{"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
|
||||
)
|
||||
try:
|
||||
response = parse_json_markdown(text)
|
||||
action = text.split("```")[1]
|
||||
response = json.loads(action.strip())
|
||||
return AgentAction(response["action"], response["action_input"], text)
|
||||
|
||||
except Exception:
|
||||
|
||||
@@ -1,10 +1,10 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
from typing import Union
|
||||
|
||||
from langchain.agents import AgentOutputParser
|
||||
from langchain.agents.conversational_chat.prompt import FORMAT_INSTRUCTIONS
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
from langchain.schema import AgentAction, AgentFinish, OutputParserException
|
||||
|
||||
|
||||
@@ -14,7 +14,19 @@ class ConvoOutputParser(AgentOutputParser):
|
||||
|
||||
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
|
||||
try:
|
||||
response = parse_json_markdown(text)
|
||||
cleaned_output = text.strip()
|
||||
if "```json" in cleaned_output:
|
||||
_, cleaned_output = cleaned_output.split("```json")
|
||||
if "```" in cleaned_output:
|
||||
cleaned_output, _ = cleaned_output.split("```")
|
||||
if cleaned_output.startswith("```json"):
|
||||
cleaned_output = cleaned_output[len("```json") :]
|
||||
if cleaned_output.startswith("```"):
|
||||
cleaned_output = cleaned_output[len("```") :]
|
||||
if cleaned_output.endswith("```"):
|
||||
cleaned_output = cleaned_output[: -len("```")]
|
||||
cleaned_output = cleaned_output.strip()
|
||||
response = json.loads(cleaned_output)
|
||||
action, action_input = response["action"], response["action_input"]
|
||||
if action == "Final Answer":
|
||||
return AgentFinish({"output": action_input}, text)
|
||||
|
||||
@@ -23,25 +23,7 @@ class MRKLOutputParser(AgentOutputParser):
|
||||
)
|
||||
match = re.search(regex, text, re.DOTALL)
|
||||
if not match:
|
||||
if not re.search(r"Action\s*\d*\s*:[\s]*(.*?)", text, re.DOTALL):
|
||||
raise OutputParserException(
|
||||
f"Could not parse LLM output: `{text}`",
|
||||
observation="Invalid Format: Missing 'Action:' after 'Thought:'",
|
||||
llm_output=text,
|
||||
send_to_llm=True,
|
||||
)
|
||||
elif not re.search(
|
||||
r"[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", text, re.DOTALL
|
||||
):
|
||||
raise OutputParserException(
|
||||
f"Could not parse LLM output: `{text}`",
|
||||
observation="Invalid Format:"
|
||||
" Missing 'Action Input:' after 'Action:'",
|
||||
llm_output=text,
|
||||
send_to_llm=True,
|
||||
)
|
||||
else:
|
||||
raise OutputParserException(f"Could not parse LLM output: `{text}`")
|
||||
raise OutputParserException(f"Could not parse LLM output: `{text}`")
|
||||
action = match.group(1).strip()
|
||||
action_input = match.group(2)
|
||||
return AgentAction(action, action_input.strip(" ").strip('"'), text)
|
||||
|
||||
@@ -1,21 +1,24 @@
|
||||
from typing import Sequence, Union
|
||||
from typing import Union
|
||||
|
||||
from langchain.agents.agent import AgentOutputParser
|
||||
from langchain.schema import AgentAction, AgentFinish, OutputParserException
|
||||
|
||||
|
||||
class SelfAskOutputParser(AgentOutputParser):
|
||||
followups: Sequence[str] = ("Follow up:", "Followup:")
|
||||
finish_string: str = "So the final answer is: "
|
||||
|
||||
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
|
||||
followup = "Follow up:"
|
||||
last_line = text.split("\n")[-1]
|
||||
if not any([follow in last_line for follow in self.followups]):
|
||||
if self.finish_string not in last_line:
|
||||
raise OutputParserException(f"Could not parse output: {text}")
|
||||
return AgentFinish({"output": last_line[len(self.finish_string) :]}, text)
|
||||
|
||||
after_colon = text.split(":")[-1].strip()
|
||||
if followup not in last_line:
|
||||
finish_string = "So the final answer is: "
|
||||
if finish_string not in last_line:
|
||||
raise OutputParserException(f"Could not parse output: {text}")
|
||||
return AgentFinish({"output": last_line[len(finish_string) :]}, text)
|
||||
|
||||
after_colon = text.split(":")[-1]
|
||||
|
||||
if " " == after_colon[0]:
|
||||
after_colon = after_colon[1:]
|
||||
return AgentAction("Intermediate Answer", after_colon, text)
|
||||
|
||||
@property
|
||||
|
||||
@@ -10,8 +10,8 @@ from langchain.callbacks.manager import Callbacks
|
||||
from langchain.schema import BaseMessage, LLMResult, PromptValue, get_buffer_string
|
||||
|
||||
|
||||
def _get_token_ids_default_method(text: str) -> List[int]:
|
||||
"""Encode the text into token IDs."""
|
||||
def _get_num_tokens_default_method(text: str) -> int:
|
||||
"""Get the number of tokens present in the text."""
|
||||
# TODO: this method may not be exact.
|
||||
# TODO: this method may differ based on model (eg codex).
|
||||
try:
|
||||
@@ -19,14 +19,17 @@ def _get_token_ids_default_method(text: str) -> List[int]:
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import transformers python package. "
|
||||
"This is needed in order to calculate get_token_ids. "
|
||||
"This is needed in order to calculate get_num_tokens. "
|
||||
"Please install it with `pip install transformers`."
|
||||
)
|
||||
# create a GPT-2 tokenizer instance
|
||||
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|
||||
|
||||
# tokenize the text using the GPT-2 tokenizer
|
||||
return tokenizer.encode(text)
|
||||
tokenized_text = tokenizer.tokenize(text)
|
||||
|
||||
# calculate the number of tokens in the tokenized text
|
||||
return len(tokenized_text)
|
||||
|
||||
|
||||
class BaseLanguageModel(BaseModel, ABC):
|
||||
@@ -58,23 +61,9 @@ class BaseLanguageModel(BaseModel, ABC):
|
||||
) -> BaseMessage:
|
||||
"""Predict message from messages."""
|
||||
|
||||
@abstractmethod
|
||||
async def apredict(self, text: str, *, stop: Optional[Sequence[str]] = None) -> str:
|
||||
"""Predict text from text."""
|
||||
|
||||
@abstractmethod
|
||||
async def apredict_messages(
|
||||
self, messages: List[BaseMessage], *, stop: Optional[Sequence[str]] = None
|
||||
) -> BaseMessage:
|
||||
"""Predict message from messages."""
|
||||
|
||||
def get_token_ids(self, text: str) -> List[int]:
|
||||
"""Get the token present in the text."""
|
||||
return _get_token_ids_default_method(text)
|
||||
|
||||
def get_num_tokens(self, text: str) -> int:
|
||||
"""Get the number of tokens present in the text."""
|
||||
return len(self.get_token_ids(text))
|
||||
return _get_num_tokens_default_method(text)
|
||||
|
||||
def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
|
||||
"""Get the number of tokens in the message."""
|
||||
|
||||
@@ -313,7 +313,7 @@ class GPTCache(BaseCache):
|
||||
try:
|
||||
import gptcache # noqa: F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import gptcache python package. "
|
||||
"Please install it with `pip install gptcache`."
|
||||
)
|
||||
|
||||
@@ -12,7 +12,6 @@ from langchain.callbacks.openai_info import OpenAICallbackHandler
|
||||
from langchain.callbacks.stdout import StdOutCallbackHandler
|
||||
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
|
||||
from langchain.callbacks.wandb_callback import WandbCallbackHandler
|
||||
from langchain.callbacks.whylabs_callback import WhyLabsCallbackHandler
|
||||
|
||||
__all__ = [
|
||||
"OpenAICallbackHandler",
|
||||
@@ -22,7 +21,6 @@ __all__ = [
|
||||
"MlflowCallbackHandler",
|
||||
"ClearMLCallbackHandler",
|
||||
"CometCallbackHandler",
|
||||
"WhyLabsCallbackHandler",
|
||||
"AsyncIteratorCallbackHandler",
|
||||
"get_openai_callback",
|
||||
"tracing_enabled",
|
||||
|
||||
@@ -888,12 +888,7 @@ def _configure(
|
||||
handler.ensure_session()
|
||||
callback_manager.add_handler(handler, True)
|
||||
except Exception as e:
|
||||
logger.warning(
|
||||
"Unable to load requested LangChainTracer."
|
||||
" To disable this warning,"
|
||||
" unset the LANGCHAIN_TRACING_V2 environment variables.",
|
||||
e,
|
||||
)
|
||||
logger.debug("Unable to load requested LangChainTracer", e)
|
||||
if open_ai is not None and not any(
|
||||
isinstance(handler, OpenAICallbackHandler)
|
||||
for handler in callback_manager.handlers
|
||||
|
||||
@@ -24,36 +24,20 @@ MODEL_COST_PER_1K_TOKENS = {
|
||||
"text-davinci-003": 0.02,
|
||||
"text-davinci-002": 0.02,
|
||||
"code-davinci-002": 0.02,
|
||||
"ada-finetuned": 0.0016,
|
||||
"babbage-finetuned": 0.0024,
|
||||
"curie-finetuned": 0.012,
|
||||
"davinci-finetuned": 0.12,
|
||||
}
|
||||
|
||||
|
||||
def standardize_model_name(
|
||||
model_name: str,
|
||||
is_completion: bool = False,
|
||||
) -> str:
|
||||
model_name = model_name.lower()
|
||||
if "ft-" in model_name:
|
||||
return model_name.split(":")[0] + "-finetuned"
|
||||
elif is_completion and model_name.startswith("gpt-4"):
|
||||
return model_name + "-completion"
|
||||
else:
|
||||
return model_name
|
||||
|
||||
|
||||
def get_openai_token_cost_for_model(
|
||||
model_name: str, num_tokens: int, is_completion: bool = False
|
||||
) -> float:
|
||||
model_name = standardize_model_name(model_name, is_completion=is_completion)
|
||||
if model_name not in MODEL_COST_PER_1K_TOKENS:
|
||||
suffix = "-completion" if is_completion and model_name.startswith("gpt-4") else ""
|
||||
model = model_name.lower() + suffix
|
||||
if model not in MODEL_COST_PER_1K_TOKENS:
|
||||
raise ValueError(
|
||||
f"Unknown model: {model_name}. Please provide a valid OpenAI model name."
|
||||
"Known models are: " + ", ".join(MODEL_COST_PER_1K_TOKENS.keys())
|
||||
)
|
||||
return MODEL_COST_PER_1K_TOKENS[model_name] * num_tokens / 1000
|
||||
return MODEL_COST_PER_1K_TOKENS[model] * num_tokens / 1000
|
||||
|
||||
|
||||
class OpenAICallbackHandler(BaseCallbackHandler):
|
||||
@@ -99,8 +83,8 @@ class OpenAICallbackHandler(BaseCallbackHandler):
|
||||
token_usage = response.llm_output["token_usage"]
|
||||
completion_tokens = token_usage.get("completion_tokens", 0)
|
||||
prompt_tokens = token_usage.get("prompt_tokens", 0)
|
||||
model_name = standardize_model_name(response.llm_output.get("model_name", ""))
|
||||
if model_name in MODEL_COST_PER_1K_TOKENS:
|
||||
model_name = response.llm_output.get("model_name")
|
||||
if model_name and model_name in MODEL_COST_PER_1K_TOKENS:
|
||||
completion_cost = get_openai_token_cost_for_model(
|
||||
model_name, completion_tokens, is_completion=True
|
||||
)
|
||||
|
||||
@@ -58,8 +58,7 @@ class AsyncIteratorCallbackHandler(AsyncCallbackHandler):
|
||||
)
|
||||
|
||||
# Cancel the other task
|
||||
if other:
|
||||
other.pop().cancel()
|
||||
other.pop().cancel()
|
||||
|
||||
# Extract the value of the first completed task
|
||||
token_or_done = cast(Union[str, Literal[True]], done.pop().result())
|
||||
|
||||
@@ -1,49 +0,0 @@
|
||||
"""Callback Handler streams to stdout on new llm token."""
|
||||
import sys
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
|
||||
|
||||
DEFAULT_ANSWER_PREFIX_TOKENS = ["\nFinal", " Answer", ":"]
|
||||
|
||||
|
||||
class FinalStreamingStdOutCallbackHandler(StreamingStdOutCallbackHandler):
|
||||
"""Callback handler for streaming in agents.
|
||||
Only works with agents using LLMs that support streaming.
|
||||
|
||||
Only the final output of the agent will be streamed.
|
||||
"""
|
||||
|
||||
def __init__(self, answer_prefix_tokens: Optional[List[str]] = None) -> None:
|
||||
super().__init__()
|
||||
if answer_prefix_tokens is None:
|
||||
answer_prefix_tokens = DEFAULT_ANSWER_PREFIX_TOKENS
|
||||
self.answer_prefix_tokens = answer_prefix_tokens
|
||||
self.last_tokens = [""] * len(answer_prefix_tokens)
|
||||
self.answer_reached = False
|
||||
|
||||
def on_llm_start(
|
||||
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
|
||||
) -> None:
|
||||
"""Run when LLM starts running."""
|
||||
self.answer_reached = False
|
||||
|
||||
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
|
||||
"""Run on new LLM token. Only available when streaming is enabled."""
|
||||
|
||||
# Remember the last n tokens, where n = len(answer_prefix_tokens)
|
||||
self.last_tokens.append(token)
|
||||
if len(self.last_tokens) > len(self.answer_prefix_tokens):
|
||||
self.last_tokens.pop(0)
|
||||
|
||||
# Check if the last n tokens match the answer_prefix_tokens list ...
|
||||
if self.last_tokens == self.answer_prefix_tokens:
|
||||
self.answer_reached = True
|
||||
# Do not print the last token in answer_prefix_tokens,
|
||||
# as it's not part of the answer yet
|
||||
return
|
||||
|
||||
# ... if yes, then print tokens from now on
|
||||
if self.answer_reached:
|
||||
sys.stdout.write(token)
|
||||
sys.stdout.flush()
|
||||
@@ -31,7 +31,7 @@ def get_headers() -> Dict[str, Any]:
|
||||
|
||||
|
||||
def get_endpoint() -> str:
|
||||
return os.getenv("LANGCHAIN_ENDPOINT", "http://localhost:1984")
|
||||
return os.getenv("LANGCHAIN_ENDPOINT", "http://localhost:8000")
|
||||
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
|
||||
@@ -1,203 +0,0 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
|
||||
|
||||
from langchain.callbacks.base import BaseCallbackHandler
|
||||
from langchain.schema import AgentAction, AgentFinish, Generation, LLMResult
|
||||
from langchain.utils import get_from_env
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from whylogs.api.logger.logger import Logger
|
||||
|
||||
diagnostic_logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def import_langkit(
|
||||
sentiment: bool = False,
|
||||
toxicity: bool = False,
|
||||
themes: bool = False,
|
||||
) -> Any:
|
||||
try:
|
||||
import langkit # noqa: F401
|
||||
import langkit.regexes # noqa: F401
|
||||
import langkit.textstat # noqa: F401
|
||||
|
||||
if sentiment:
|
||||
import langkit.sentiment # noqa: F401
|
||||
if toxicity:
|
||||
import langkit.toxicity # noqa: F401
|
||||
if themes:
|
||||
import langkit.themes # noqa: F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"To use the whylabs callback manager you need to have the `langkit` python "
|
||||
"package installed. Please install it with `pip install langkit`."
|
||||
)
|
||||
return langkit
|
||||
|
||||
|
||||
class WhyLabsCallbackHandler(BaseCallbackHandler):
|
||||
"""WhyLabs CallbackHandler."""
|
||||
|
||||
def __init__(self, logger: Logger):
|
||||
"""Initiate the rolling logger"""
|
||||
super().__init__()
|
||||
self.logger = logger
|
||||
diagnostic_logger.info(
|
||||
"Initialized WhyLabs callback handler with configured whylogs Logger."
|
||||
)
|
||||
|
||||
def _profile_generations(self, generations: List[Generation]) -> None:
|
||||
for gen in generations:
|
||||
self.logger.log({"response": gen.text})
|
||||
|
||||
def on_llm_start(
|
||||
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
|
||||
) -> None:
|
||||
"""Pass the input prompts to the logger"""
|
||||
for prompt in prompts:
|
||||
self.logger.log({"prompt": prompt})
|
||||
|
||||
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
|
||||
"""Pass the generated response to the logger."""
|
||||
for generations in response.generations:
|
||||
self._profile_generations(generations)
|
||||
|
||||
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
|
||||
"""Do nothing."""
|
||||
pass
|
||||
|
||||
def on_llm_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing."""
|
||||
pass
|
||||
|
||||
def on_chain_start(
|
||||
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing."""
|
||||
|
||||
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
|
||||
"""Do nothing."""
|
||||
|
||||
def on_chain_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing."""
|
||||
pass
|
||||
|
||||
def on_tool_start(
|
||||
self,
|
||||
serialized: Dict[str, Any],
|
||||
input_str: str,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
"""Do nothing."""
|
||||
|
||||
def on_agent_action(
|
||||
self, action: AgentAction, color: Optional[str] = None, **kwargs: Any
|
||||
) -> Any:
|
||||
"""Do nothing."""
|
||||
|
||||
def on_tool_end(
|
||||
self,
|
||||
output: str,
|
||||
color: Optional[str] = None,
|
||||
observation_prefix: Optional[str] = None,
|
||||
llm_prefix: Optional[str] = None,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
"""Do nothing."""
|
||||
|
||||
def on_tool_error(
|
||||
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
|
||||
) -> None:
|
||||
"""Do nothing."""
|
||||
pass
|
||||
|
||||
def on_text(self, text: str, **kwargs: Any) -> None:
|
||||
"""Do nothing."""
|
||||
|
||||
def on_agent_finish(
|
||||
self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any
|
||||
) -> None:
|
||||
"""Run on agent end."""
|
||||
pass
|
||||
|
||||
def flush(self) -> None:
|
||||
self.logger._do_rollover()
|
||||
diagnostic_logger.info("Flushing WhyLabs logger, writing profile...")
|
||||
|
||||
def close(self) -> None:
|
||||
self.logger.close()
|
||||
diagnostic_logger.info("Closing WhyLabs logger, see you next time!")
|
||||
|
||||
def __enter__(self) -> WhyLabsCallbackHandler:
|
||||
return self
|
||||
|
||||
def __exit__(
|
||||
self, exception_type: Any, exception_value: Any, traceback: Any
|
||||
) -> None:
|
||||
self.close()
|
||||
|
||||
@classmethod
|
||||
def from_params(
|
||||
cls,
|
||||
*,
|
||||
api_key: Optional[str] = None,
|
||||
org_id: Optional[str] = None,
|
||||
dataset_id: Optional[str] = None,
|
||||
sentiment: bool = False,
|
||||
toxicity: bool = False,
|
||||
themes: bool = False,
|
||||
) -> Logger:
|
||||
"""Instantiate whylogs Logger from params.
|
||||
|
||||
Args:
|
||||
api_key (Optional[str]): WhyLabs API key. Optional because the preferred
|
||||
way to specify the API key is with environment variable
|
||||
WHYLABS_API_KEY.
|
||||
org_id (Optional[str]): WhyLabs organization id to write profiles to.
|
||||
If not set must be specified in environment variable
|
||||
WHYLABS_DEFAULT_ORG_ID.
|
||||
dataset_id (Optional[str]): The model or dataset this callback is gathering
|
||||
telemetry for. If not set must be specified in environment variable
|
||||
WHYLABS_DEFAULT_DATASET_ID.
|
||||
sentiment (bool): If True will initialize a model to perform
|
||||
sentiment analysis compound score. Defaults to False and will not gather
|
||||
this metric.
|
||||
toxicity (bool): If True will initialize a model to score
|
||||
toxicity. Defaults to False and will not gather this metric.
|
||||
themes (bool): If True will initialize a model to calculate
|
||||
distance to configured themes. Defaults to None and will not gather this
|
||||
metric.
|
||||
"""
|
||||
# langkit library will import necessary whylogs libraries
|
||||
import_langkit(sentiment=sentiment, toxicity=toxicity, themes=themes)
|
||||
|
||||
import whylogs as why
|
||||
from whylogs.api.writer.whylabs import WhyLabsWriter
|
||||
from whylogs.core.schema import DeclarativeSchema
|
||||
from whylogs.experimental.core.metrics.udf_metric import generate_udf_schema
|
||||
|
||||
api_key = api_key or get_from_env("api_key", "WHYLABS_API_KEY")
|
||||
org_id = org_id or get_from_env("org_id", "WHYLABS_DEFAULT_ORG_ID")
|
||||
dataset_id = dataset_id or get_from_env(
|
||||
"dataset_id", "WHYLABS_DEFAULT_DATASET_ID"
|
||||
)
|
||||
whylabs_writer = WhyLabsWriter(
|
||||
api_key=api_key, org_id=org_id, dataset_id=dataset_id
|
||||
)
|
||||
|
||||
langkit_schema = DeclarativeSchema(generate_udf_schema())
|
||||
whylabs_logger = why.logger(
|
||||
mode="rolling", interval=5, when="M", schema=langkit_schema
|
||||
)
|
||||
|
||||
whylabs_logger.append_writer(writer=whylabs_writer)
|
||||
diagnostic_logger.info(
|
||||
"Started whylogs Logger with WhyLabsWriter and initialized LangKit. 📝"
|
||||
)
|
||||
return cls(whylabs_logger)
|
||||
@@ -10,7 +10,6 @@ from langchain.chains.conversational_retrieval.base import (
|
||||
)
|
||||
from langchain.chains.flare.base import FlareChain
|
||||
from langchain.chains.graph_qa.base import GraphQAChain
|
||||
from langchain.chains.graph_qa.cypher import GraphCypherQAChain
|
||||
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.chains.llm_bash.base import LLMBashChain
|
||||
@@ -59,7 +58,6 @@ __all__ = [
|
||||
"HypotheticalDocumentEmbedder",
|
||||
"ChatVectorDBChain",
|
||||
"GraphQAChain",
|
||||
"GraphCypherQAChain",
|
||||
"ConstitutionalChain",
|
||||
"QAGenerationChain",
|
||||
"RetrievalQA",
|
||||
|
||||
@@ -195,7 +195,9 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
|
||||
for docs in new_result_doc_list:
|
||||
new_doc = _collapse_docs(docs, _collapse_docs_func, **kwargs)
|
||||
result_docs.append(new_doc)
|
||||
num_tokens = length_func(result_docs, **kwargs)
|
||||
num_tokens = self.combine_document_chain.prompt_length(
|
||||
result_docs, **kwargs
|
||||
)
|
||||
if self.return_intermediate_steps:
|
||||
_results = [r[self.llm_chain.output_key] for r in results]
|
||||
extra_return_dict = {"intermediate_steps": _results}
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# flake8: noqa
|
||||
from langchain.prompts.prompt import PromptTemplate
|
||||
|
||||
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
|
||||
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
|
||||
|
||||
Chat History:
|
||||
{chat_history}
|
||||
|
||||
@@ -1,90 +0,0 @@
|
||||
"""Question answering over a graph."""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from pydantic import Field
|
||||
|
||||
from langchain.base_language import BaseLanguageModel
|
||||
from langchain.callbacks.manager import CallbackManagerForChainRun
|
||||
from langchain.chains.base import Chain
|
||||
from langchain.chains.graph_qa.prompts import CYPHER_GENERATION_PROMPT, CYPHER_QA_PROMPT
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.graphs.neo4j_graph import Neo4jGraph
|
||||
from langchain.prompts.base import BasePromptTemplate
|
||||
|
||||
|
||||
class GraphCypherQAChain(Chain):
|
||||
"""Chain for question-answering against a graph by generating Cypher statements."""
|
||||
|
||||
graph: Neo4jGraph = Field(exclude=True)
|
||||
cypher_generation_chain: LLMChain
|
||||
qa_chain: LLMChain
|
||||
input_key: str = "query" #: :meta private:
|
||||
output_key: str = "result" #: :meta private:
|
||||
|
||||
@property
|
||||
def input_keys(self) -> List[str]:
|
||||
"""Return the input keys.
|
||||
|
||||
:meta private:
|
||||
"""
|
||||
return [self.input_key]
|
||||
|
||||
@property
|
||||
def output_keys(self) -> List[str]:
|
||||
"""Return the output keys.
|
||||
|
||||
:meta private:
|
||||
"""
|
||||
_output_keys = [self.output_key]
|
||||
return _output_keys
|
||||
|
||||
@classmethod
|
||||
def from_llm(
|
||||
cls,
|
||||
llm: BaseLanguageModel,
|
||||
*,
|
||||
qa_prompt: BasePromptTemplate = CYPHER_QA_PROMPT,
|
||||
cypher_prompt: BasePromptTemplate = CYPHER_GENERATION_PROMPT,
|
||||
**kwargs: Any,
|
||||
) -> GraphCypherQAChain:
|
||||
"""Initialize from LLM."""
|
||||
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
|
||||
cypher_generation_chain = LLMChain(llm=llm, prompt=cypher_prompt)
|
||||
|
||||
return cls(
|
||||
qa_chain=qa_chain,
|
||||
cypher_generation_chain=cypher_generation_chain,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
def _call(
|
||||
self,
|
||||
inputs: Dict[str, Any],
|
||||
run_manager: Optional[CallbackManagerForChainRun] = None,
|
||||
) -> Dict[str, str]:
|
||||
"""Generate Cypher statement, use it to look up in db and answer question."""
|
||||
_run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
|
||||
callbacks = _run_manager.get_child()
|
||||
question = inputs[self.input_key]
|
||||
|
||||
generated_cypher = self.cypher_generation_chain.run(
|
||||
{"question": question, "schema": self.graph.get_schema}, callbacks=callbacks
|
||||
)
|
||||
|
||||
_run_manager.on_text("Generated Cypher:", end="\n", verbose=self.verbose)
|
||||
_run_manager.on_text(
|
||||
generated_cypher, color="green", end="\n", verbose=self.verbose
|
||||
)
|
||||
context = self.graph.query(generated_cypher)
|
||||
|
||||
_run_manager.on_text("Full Context:", end="\n", verbose=self.verbose)
|
||||
_run_manager.on_text(
|
||||
str(context), color="green", end="\n", verbose=self.verbose
|
||||
)
|
||||
result = self.qa_chain(
|
||||
{"question": question, "context": context},
|
||||
callbacks=callbacks,
|
||||
)
|
||||
return {self.output_key: result[self.qa_chain.output_key]}
|
||||
@@ -32,32 +32,3 @@ Helpful Answer:"""
|
||||
PROMPT = PromptTemplate(
|
||||
template=prompt_template, input_variables=["context", "question"]
|
||||
)
|
||||
|
||||
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
|
||||
Instructions:
|
||||
Use only the provided relationship types and properties in the schema.
|
||||
Do not use any other relationship types or properties that are not provided.
|
||||
Schema:
|
||||
{schema}
|
||||
Note: Do not include any explanations or apologies in your responses.
|
||||
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
|
||||
Do not include any text except the generated Cypher statement.
|
||||
|
||||
The question is:
|
||||
{question}"""
|
||||
CYPHER_GENERATION_PROMPT = PromptTemplate(
|
||||
input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
|
||||
)
|
||||
|
||||
CYPHER_QA_TEMPLATE = """You are an assistant that helps to form nice and human understandable answers.
|
||||
The information part contains the provided information that you can use to construct an answer.
|
||||
The provided information is authorative, you must never doubt it or try to use your internal knowledge to correct it.
|
||||
Make it sound like the information are coming from an AI assistant, but don't add any information.
|
||||
Information:
|
||||
{context}
|
||||
|
||||
Question: {question}
|
||||
Helpful Answer:"""
|
||||
CYPHER_QA_PROMPT = PromptTemplate(
|
||||
input_variables=["context", "question"], template=CYPHER_QA_TEMPLATE
|
||||
)
|
||||
|
||||
@@ -54,7 +54,7 @@ class OpenAIModerationChain(Chain):
|
||||
openai.organization = openai_organization
|
||||
values["client"] = openai.Moderation
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import openai python package. "
|
||||
"Please install it with `pip install openai`."
|
||||
)
|
||||
|
||||
@@ -22,7 +22,7 @@ from langchain.chains.query_constructor.prompt import (
|
||||
SCHEMA_WITH_LIMIT,
|
||||
)
|
||||
from langchain.chains.query_constructor.schema import AttributeInfo
|
||||
from langchain.output_parsers.json import parse_and_check_json_markdown
|
||||
from langchain.output_parsers.structured import parse_json_markdown
|
||||
from langchain.schema import BaseOutputParser, OutputParserException
|
||||
|
||||
|
||||
@@ -33,7 +33,7 @@ class StructuredQueryOutputParser(BaseOutputParser[StructuredQuery]):
|
||||
def parse(self, text: str) -> StructuredQuery:
|
||||
try:
|
||||
expected_keys = ["query", "filter"]
|
||||
parsed = parse_and_check_json_markdown(text, expected_keys)
|
||||
parsed = parse_json_markdown(text, expected_keys)
|
||||
if len(parsed["query"]) == 0:
|
||||
parsed["query"] = " "
|
||||
if parsed["filter"] == "NO_FILTER" or not parsed["filter"]:
|
||||
|
||||
@@ -9,7 +9,7 @@ from langchain.base_language import BaseLanguageModel
|
||||
from langchain.callbacks.manager import CallbackManagerForChainRun
|
||||
from langchain.chains import LLMChain
|
||||
from langchain.chains.router.base import RouterChain
|
||||
from langchain.output_parsers.json import parse_and_check_json_markdown
|
||||
from langchain.output_parsers.structured import parse_json_markdown
|
||||
from langchain.prompts import BasePromptTemplate
|
||||
from langchain.schema import BaseOutputParser, OutputParserException
|
||||
|
||||
@@ -77,7 +77,7 @@ class RouterOutputParser(BaseOutputParser[Dict[str, str]]):
|
||||
def parse(self, text: str) -> Dict[str, Any]:
|
||||
try:
|
||||
expected_keys = ["destination", "next_inputs"]
|
||||
parsed = parse_and_check_json_markdown(text, expected_keys)
|
||||
parsed = parse_json_markdown(text, expected_keys)
|
||||
if not isinstance(parsed["destination"], str):
|
||||
raise ValueError("Expected 'destination' to be a string.")
|
||||
if not isinstance(parsed["next_inputs"], self.next_inputs_type):
|
||||
|
||||
@@ -3,7 +3,6 @@ from langchain.chat_models.azure_openai import AzureChatOpenAI
|
||||
from langchain.chat_models.google_palm import ChatGooglePalm
|
||||
from langchain.chat_models.openai import ChatOpenAI
|
||||
from langchain.chat_models.promptlayer_openai import PromptLayerChatOpenAI
|
||||
from langchain.chat_models.vertexai import ChatVertexAI
|
||||
|
||||
__all__ = [
|
||||
"ChatOpenAI",
|
||||
@@ -11,5 +10,4 @@ __all__ = [
|
||||
"PromptLayerChatOpenAI",
|
||||
"ChatAnthropic",
|
||||
"ChatGooglePalm",
|
||||
"ChatVertexAI",
|
||||
]
|
||||
|
||||
@@ -86,7 +86,7 @@ class AzureChatOpenAI(ChatOpenAI):
|
||||
if openai_organization:
|
||||
openai.organization = openai_organization
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import openai python package. "
|
||||
"Please install it with `pip install openai`."
|
||||
)
|
||||
|
||||
@@ -183,19 +183,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
|
||||
else:
|
||||
raise ValueError("Unexpected generation type")
|
||||
|
||||
async def _call_async(
|
||||
self,
|
||||
messages: List[BaseMessage],
|
||||
stop: Optional[List[str]] = None,
|
||||
callbacks: Callbacks = None,
|
||||
) -> BaseMessage:
|
||||
result = await self.agenerate([messages], stop=stop, callbacks=callbacks)
|
||||
generation = result.generations[0][0]
|
||||
if isinstance(generation, ChatGeneration):
|
||||
return generation.message
|
||||
else:
|
||||
raise ValueError("Unexpected generation type")
|
||||
|
||||
def call_as_llm(self, message: str, stop: Optional[List[str]] = None) -> str:
|
||||
return self.predict(message, stop=stop)
|
||||
|
||||
@@ -216,23 +203,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
|
||||
_stop = list(stop)
|
||||
return self(messages, stop=_stop)
|
||||
|
||||
async def apredict(self, text: str, *, stop: Optional[Sequence[str]] = None) -> str:
|
||||
if stop is None:
|
||||
_stop = None
|
||||
else:
|
||||
_stop = list(stop)
|
||||
result = await self._call_async([HumanMessage(content=text)], stop=_stop)
|
||||
return result.content
|
||||
|
||||
async def apredict_messages(
|
||||
self, messages: List[BaseMessage], *, stop: Optional[Sequence[str]] = None
|
||||
) -> BaseMessage:
|
||||
if stop is None:
|
||||
_stop = None
|
||||
else:
|
||||
_stop = list(stop)
|
||||
return await self._call_async(messages, stop=_stop)
|
||||
|
||||
@property
|
||||
def _identifying_params(self) -> Mapping[str, Any]:
|
||||
"""Get the identifying parameters."""
|
||||
|
||||
@@ -3,17 +3,7 @@ from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import sys
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Dict,
|
||||
List,
|
||||
Mapping,
|
||||
Optional,
|
||||
Tuple,
|
||||
Union,
|
||||
)
|
||||
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union
|
||||
|
||||
from pydantic import Extra, Field, root_validator
|
||||
from tenacity import (
|
||||
@@ -40,24 +30,9 @@ from langchain.schema import (
|
||||
)
|
||||
from langchain.utils import get_from_dict_or_env
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import tiktoken
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _import_tiktoken() -> Any:
|
||||
try:
|
||||
import tiktoken
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import tiktoken python package. "
|
||||
"This is needed in order to calculate get_token_ids. "
|
||||
"Please install it with `pip install tiktoken`."
|
||||
)
|
||||
return tiktoken
|
||||
|
||||
|
||||
def _create_retry_decorator(llm: ChatOpenAI) -> Callable[[Any], Any]:
|
||||
import openai
|
||||
|
||||
@@ -379,8 +354,42 @@ class ChatOpenAI(BaseChatModel):
|
||||
"""Return type of chat model."""
|
||||
return "openai-chat"
|
||||
|
||||
def _get_encoding_model(self) -> Tuple[str, tiktoken.Encoding]:
|
||||
tiktoken_ = _import_tiktoken()
|
||||
def get_num_tokens(self, text: str) -> int:
|
||||
"""Calculate num tokens with tiktoken package."""
|
||||
# tiktoken NOT supported for Python 3.7 or below
|
||||
if sys.version_info[1] <= 7:
|
||||
return super().get_num_tokens(text)
|
||||
try:
|
||||
import tiktoken
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import tiktoken python package. "
|
||||
"This is needed in order to calculate get_num_tokens. "
|
||||
"Please install it with `pip install tiktoken`."
|
||||
)
|
||||
# create a GPT-3.5-Turbo encoder instance
|
||||
enc = tiktoken.encoding_for_model(self.model_name)
|
||||
|
||||
# encode the text using the GPT-3.5-Turbo encoder
|
||||
tokenized_text = enc.encode(text)
|
||||
|
||||
# calculate the number of tokens in the encoded text
|
||||
return len(tokenized_text)
|
||||
|
||||
def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
|
||||
"""Calculate num tokens for gpt-3.5-turbo and gpt-4 with tiktoken package.
|
||||
|
||||
Official documentation: https://github.com/openai/openai-cookbook/blob/
|
||||
main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb"""
|
||||
try:
|
||||
import tiktoken
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Could not import tiktoken python package. "
|
||||
"This is needed in order to calculate get_num_tokens. "
|
||||
"Please install it with `pip install tiktoken`."
|
||||
)
|
||||
|
||||
model = self.model_name
|
||||
if model == "gpt-3.5-turbo":
|
||||
# gpt-3.5-turbo may change over time.
|
||||
@@ -390,31 +399,14 @@ class ChatOpenAI(BaseChatModel):
|
||||
# gpt-4 may change over time.
|
||||
# Returning num tokens assuming gpt-4-0314.
|
||||
model = "gpt-4-0314"
|
||||
|
||||
# Returns the number of tokens used by a list of messages.
|
||||
try:
|
||||
encoding = tiktoken_.encoding_for_model(model)
|
||||
encoding = tiktoken.encoding_for_model(model)
|
||||
except KeyError:
|
||||
logger.warning("Warning: model not found. Using cl100k_base encoding.")
|
||||
model = "cl100k_base"
|
||||
encoding = tiktoken_.get_encoding(model)
|
||||
return model, encoding
|
||||
encoding = tiktoken.get_encoding("cl100k_base")
|
||||
|
||||
def get_token_ids(self, text: str) -> List[int]:
|
||||
"""Get the tokens present in the text with tiktoken package."""
|
||||
# tiktoken NOT supported for Python 3.7 or below
|
||||
if sys.version_info[1] <= 7:
|
||||
return super().get_token_ids(text)
|
||||
_, encoding_model = self._get_encoding_model()
|
||||
return encoding_model.encode(text)
|
||||
|
||||
def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
|
||||
"""Calculate num tokens for gpt-3.5-turbo and gpt-4 with tiktoken package.
|
||||
|
||||
Official documentation: https://github.com/openai/openai-cookbook/blob/
|
||||
main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb"""
|
||||
if sys.version_info[1] <= 7:
|
||||
return super().get_num_tokens_from_messages(messages)
|
||||
model, encoding = self._get_encoding_model()
|
||||
if model == "gpt-3.5-turbo-0301":
|
||||
# every message follows <im_start>{role/name}\n{content}<im_end>\n
|
||||
tokens_per_message = 4
|
||||
|
||||
@@ -1,137 +0,0 @@
|
||||
"""Wrapper around Google VertexAI chat-based models."""
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, List, Optional
|
||||
|
||||
from pydantic import root_validator
|
||||
|
||||
from langchain.callbacks.manager import (
|
||||
AsyncCallbackManagerForLLMRun,
|
||||
CallbackManagerForLLMRun,
|
||||
)
|
||||
from langchain.chat_models.base import BaseChatModel
|
||||
from langchain.llms.vertexai import _VertexAICommon
|
||||
from langchain.schema import (
|
||||
AIMessage,
|
||||
BaseMessage,
|
||||
ChatGeneration,
|
||||
ChatResult,
|
||||
HumanMessage,
|
||||
SystemMessage,
|
||||
)
|
||||
from langchain.utilities.vertexai import raise_vertex_import_error
|
||||
|
||||
|
||||
@dataclass
|
||||
class _MessagePair:
|
||||
"""InputOutputTextPair represents a pair of input and output texts."""
|
||||
|
||||
question: HumanMessage
|
||||
answer: AIMessage
|
||||
|
||||
|
||||
@dataclass
|
||||
class _ChatHistory:
|
||||
"""InputOutputTextPair represents a pair of input and output texts."""
|
||||
|
||||
history: List[_MessagePair] = field(default_factory=list)
|
||||
system_message: Optional[SystemMessage] = None
|
||||
|
||||
|
||||
def _parse_chat_history(history: List[BaseMessage]) -> _ChatHistory:
|
||||
"""Parse a sequence of messages into history.
|
||||
|
||||
A sequence should be either (SystemMessage, HumanMessage, AIMessage,
|
||||
HumanMessage, AIMessage, ...) or (HumanMessage, AIMessage, HumanMessage,
|
||||
AIMessage, ...).
|
||||
|
||||
Args:
|
||||
history: The list of messages to re-create the history of the chat.
|
||||
Returns:
|
||||
A parsed chat history.
|
||||
Raises:
|
||||
ValueError: If a sequence of message is odd, or a human message is not followed
|
||||
by a message from AI (e.g., Human, Human, AI or AI, AI, Human).
|
||||
"""
|
||||
if not history:
|
||||
return _ChatHistory()
|
||||
first_message = history[0]
|
||||
system_message = first_message if isinstance(first_message, SystemMessage) else None
|
||||
chat_history = _ChatHistory(system_message=system_message)
|
||||
messages_left = history[1:] if system_message else history
|
||||
if len(messages_left) % 2 != 0:
|
||||
raise ValueError(
|
||||
f"Amount of messages in history should be even, got {len(messages_left)}!"
|
||||
)
|
||||
for question, answer in zip(messages_left[::2], messages_left[1::2]):
|
||||
if not isinstance(question, HumanMessage) or not isinstance(answer, AIMessage):
|
||||
raise ValueError(
|
||||
"A human message should follow a bot one, "
|
||||
f"got {question.type}, {answer.type}."
|
||||
)
|
||||
chat_history.history.append(_MessagePair(question=question, answer=answer))
|
||||
return chat_history
|
||||
|
||||
|
||||
class ChatVertexAI(_VertexAICommon, BaseChatModel):
|
||||
"""Wrapper around Vertex AI large language models."""
|
||||
|
||||
model_name: str = "chat-bison"
|
||||
|
||||
@root_validator()
|
||||
def validate_environment(cls, values: Dict) -> Dict:
|
||||
"""Validate that the python package exists in environment."""
|
||||
cls._try_init_vertexai(values)
|
||||
try:
|
||||
from vertexai.preview.language_models import ChatModel
|
||||
except ImportError:
|
||||
raise_vertex_import_error()
|
||||
values["client"] = ChatModel.from_pretrained(values["model_name"])
|
||||
return values
|
||||
|
||||
def _generate(
|
||||
self,
|
||||
messages: List[BaseMessage],
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||
) -> ChatResult:
|
||||
"""Generate next turn in the conversation.
|
||||
|
||||
Args:
|
||||
messages: The history of the conversation as a list of messages.
|
||||
stop: The list of stop words (optional).
|
||||
run_manager: The Callbackmanager for LLM run, it's not used at the moment.
|
||||
|
||||
Returns:
|
||||
The ChatResult that contains outputs generated by the model.
|
||||
|
||||
Raises:
|
||||
ValueError: if the last message in the list is not from human.
|
||||
"""
|
||||
if not messages:
|
||||
raise ValueError(
|
||||
"You should provide at least one message to start the chat!"
|
||||
)
|
||||
question = messages[-1]
|
||||
if not isinstance(question, HumanMessage):
|
||||
raise ValueError(
|
||||
f"Last message in the list should be from human, got {question.type}."
|
||||
)
|
||||
|
||||
history = _parse_chat_history(messages[:-1])
|
||||
context = history.system_message.content if history.system_message else None
|
||||
chat = self.client.start_chat(context=context, **self._default_params)
|
||||
for pair in history.history:
|
||||
chat._history.append((pair.question.content, pair.answer.content))
|
||||
response = chat.send_message(question.content)
|
||||
text = self._enforce_stop_words(response.text, stop)
|
||||
return ChatResult(generations=[ChatGeneration(message=AIMessage(content=text))])
|
||||
|
||||
async def _agenerate(
|
||||
self,
|
||||
messages: List[BaseMessage],
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
|
||||
) -> ChatResult:
|
||||
raise NotImplementedError(
|
||||
"""Vertex AI doesn't support async requests at the moment."""
|
||||
)
|
||||
@@ -5,7 +5,9 @@ services:
|
||||
ports:
|
||||
- 80:80
|
||||
environment:
|
||||
- REACT_APP_BACKEND_URL=http://localhost:1984
|
||||
- BACKEND_URL=http://langchain-backend:8000
|
||||
- PUBLIC_BASE_URL=http://localhost:8000
|
||||
- PUBLIC_DEV_MODE=true
|
||||
depends_on:
|
||||
- langchain-backend
|
||||
volumes:
|
||||
@@ -16,12 +18,11 @@ services:
|
||||
langchain-backend:
|
||||
image: langchain/${_LANGCHAINPLUS_IMAGE_PREFIX-}langchainplus-backend:latest
|
||||
environment:
|
||||
- PORT=1984
|
||||
- PORT=8000
|
||||
- LANGCHAIN_ENV=local_docker
|
||||
- LOG_LEVEL=warning
|
||||
- OPENAI_API_KEY=${OPENAI_API_KEY}
|
||||
ports:
|
||||
- 1984:1984
|
||||
- 8000:8000
|
||||
depends_on:
|
||||
- langchain-db
|
||||
build:
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import shutil
|
||||
@@ -7,7 +6,7 @@ import subprocess
|
||||
from contextlib import contextmanager
|
||||
from pathlib import Path
|
||||
from subprocess import CalledProcessError
|
||||
from typing import Dict, Generator, List, Mapping, Optional, Union, cast
|
||||
from typing import Generator, List, Optional
|
||||
|
||||
import requests
|
||||
import yaml
|
||||
@@ -20,50 +19,6 @@ logger = logging.getLogger(__name__)
|
||||
_DIR = Path(__file__).parent
|
||||
|
||||
|
||||
def pprint_services(services_status: List[Mapping[str, Union[str, List[str]]]]) -> None:
|
||||
# Loop through and collect Service, State, and Publishers["PublishedPorts"]
|
||||
# for each service
|
||||
services = []
|
||||
for service in services_status:
|
||||
service_status: Dict[str, str] = {
|
||||
"Service": str(service["Service"]),
|
||||
"Status": str(service["Status"]),
|
||||
}
|
||||
publishers = cast(List[Dict], service.get("Publishers", []))
|
||||
if publishers:
|
||||
service_status["PublishedPorts"] = ", ".join(
|
||||
[str(publisher["PublishedPort"]) for publisher in publishers]
|
||||
)
|
||||
services.append(service_status)
|
||||
|
||||
max_service_len = max(len(service["Service"]) for service in services)
|
||||
max_state_len = max(len(service["Status"]) for service in services)
|
||||
service_message = [
|
||||
"\n"
|
||||
+ "Service".ljust(max_service_len + 2)
|
||||
+ "Status".ljust(max_state_len + 2)
|
||||
+ "Published Ports"
|
||||
]
|
||||
for service in services:
|
||||
service_str = service["Service"].ljust(max_service_len + 2)
|
||||
state_str = service["Status"].ljust(max_state_len + 2)
|
||||
ports_str = service.get("PublishedPorts", "")
|
||||
service_message.append(service_str + state_str + ports_str)
|
||||
|
||||
langchain_endpoint: str = "http://localhost:1984"
|
||||
used_ngrok = any(["ngrok" in service["Service"] for service in services])
|
||||
if used_ngrok:
|
||||
langchain_endpoint = get_ngrok_url(auth_token=None)
|
||||
|
||||
service_message.append(
|
||||
"\nTo connect, set the following environment variables"
|
||||
" in your LangChain application:"
|
||||
"\nLANGCHAIN_TRACING_V2=true"
|
||||
f"\nLANGCHAIN_ENDPOINT={langchain_endpoint}"
|
||||
)
|
||||
logger.info("\n".join(service_message))
|
||||
|
||||
|
||||
def get_docker_compose_command() -> List[str]:
|
||||
"""Get the correct docker compose command for this system."""
|
||||
try:
|
||||
@@ -218,7 +173,6 @@ class PlusCommand:
|
||||
expose: bool = False,
|
||||
auth_token: Optional[str] = None,
|
||||
dev: bool = False,
|
||||
openai_api_key: Optional[str] = None,
|
||||
) -> None:
|
||||
"""Run the LangChainPlus server locally.
|
||||
|
||||
@@ -226,16 +180,9 @@ class PlusCommand:
|
||||
expose: If True, expose the server to the internet using ngrok.
|
||||
auth_token: The ngrok authtoken to use (visible in the ngrok dashboard).
|
||||
If not provided, ngrok server session length will be restricted.
|
||||
dev: If True, use the development (rc) image of LangChainPlus.
|
||||
openai_api_key: The OpenAI API key to use for LangChainPlus
|
||||
If not provided, the OpenAI API Key will be read from the
|
||||
OPENAI_API_KEY environment variable. If neither are provided,
|
||||
some features of LangChainPlus will not be available.
|
||||
"""
|
||||
if dev:
|
||||
os.environ["_LANGCHAINPLUS_IMAGE_PREFIX"] = "rc-"
|
||||
if openai_api_key is not None:
|
||||
os.environ["OPENAI_API_KEY"] = openai_api_key
|
||||
if expose:
|
||||
self._start_and_expose(auth_token=auth_token)
|
||||
else:
|
||||
@@ -267,36 +214,6 @@ class PlusCommand:
|
||||
]
|
||||
)
|
||||
|
||||
def status(self) -> None:
|
||||
"""Provide information about the status LangChainPlus server."""
|
||||
|
||||
command = [
|
||||
*self.docker_compose_command,
|
||||
"-f",
|
||||
str(self.docker_compose_file),
|
||||
"ps",
|
||||
"--format",
|
||||
"json",
|
||||
]
|
||||
|
||||
result = subprocess.run(
|
||||
command,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
)
|
||||
try:
|
||||
command_stdout = result.stdout.decode("utf-8")
|
||||
services_status = json.loads(command_stdout)
|
||||
except json.JSONDecodeError:
|
||||
logger.error("Error checking LangChainPlus server status.")
|
||||
return
|
||||
if services_status:
|
||||
logger.info("The LangChainPlus server is currently running.")
|
||||
pprint_services(services_status)
|
||||
else:
|
||||
logger.info("The LangChainPlus server is not running.")
|
||||
return
|
||||
|
||||
|
||||
def env() -> None:
|
||||
"""Print the runtime environment information."""
|
||||
@@ -333,20 +250,9 @@ def main() -> None:
|
||||
action="store_true",
|
||||
help="Use the development version of the LangChainPlus image.",
|
||||
)
|
||||
server_start_parser.add_argument(
|
||||
"--openai-api-key",
|
||||
default=os.getenv("OPENAI_API_KEY"),
|
||||
help="The OpenAI API key to use for LangChainPlus."
|
||||
" If not provided, the OpenAI API Key will be read from the"
|
||||
" OPENAI_API_KEY environment variable. If neither are provided,"
|
||||
" some features of LangChainPlus will not be available.",
|
||||
)
|
||||
server_start_parser.set_defaults(
|
||||
func=lambda args: server_command.start(
|
||||
expose=args.expose,
|
||||
auth_token=args.ngrok_authtoken,
|
||||
dev=args.dev,
|
||||
openai_api_key=args.openai_api_key,
|
||||
expose=args.expose, auth_token=args.ngrok_authtoken, dev=args.dev
|
||||
)
|
||||
)
|
||||
|
||||
@@ -359,10 +265,7 @@ def main() -> None:
|
||||
"logs", description="Show the LangChainPlus server logs."
|
||||
)
|
||||
server_logs_parser.set_defaults(func=lambda args: server_command.logs())
|
||||
server_status_parser = server_subparsers.add_parser(
|
||||
"status", description="Show the LangChainPlus server status."
|
||||
)
|
||||
server_status_parser.set_defaults(func=lambda args: server_command.status())
|
||||
|
||||
env_parser = subparsers.add_parser("env")
|
||||
env_parser.set_defaults(func=lambda args: env())
|
||||
|
||||
|
||||
@@ -1,5 +1,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import functools
|
||||
import logging
|
||||
import socket
|
||||
from datetime import datetime
|
||||
@@ -8,8 +10,9 @@ from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Coroutine,
|
||||
Dict,
|
||||
Iterator,
|
||||
Iterable,
|
||||
List,
|
||||
Optional,
|
||||
Tuple,
|
||||
@@ -24,8 +27,10 @@ from requests import Response
|
||||
from tenacity import retry, stop_after_attempt, wait_fixed
|
||||
|
||||
from langchain.base_language import BaseLanguageModel
|
||||
from langchain.callbacks.tracers.langchain import LangChainTracer
|
||||
from langchain.callbacks.tracers.schemas import Run, TracerSession
|
||||
from langchain.chains.base import Chain
|
||||
from langchain.chat_models.base import BaseChatModel
|
||||
from langchain.client.models import (
|
||||
Dataset,
|
||||
DatasetCreate,
|
||||
@@ -33,7 +38,15 @@ from langchain.client.models import (
|
||||
ExampleCreate,
|
||||
ListRunsQueryParams,
|
||||
)
|
||||
from langchain.client.runner_utils import arun_on_examples, run_on_examples
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.schema import (
|
||||
BaseMessage,
|
||||
ChatResult,
|
||||
HumanMessage,
|
||||
LLMResult,
|
||||
get_buffer_string,
|
||||
messages_from_dict,
|
||||
)
|
||||
from langchain.utils import raise_for_status_with_text, xor_args
|
||||
|
||||
if TYPE_CHECKING:
|
||||
@@ -44,6 +57,10 @@ logger = logging.getLogger(__name__)
|
||||
MODEL_OR_CHAIN_FACTORY = Union[Callable[[], Chain], BaseLanguageModel]
|
||||
|
||||
|
||||
class InputFormatError(Exception):
|
||||
"""Raised when input format is invalid."""
|
||||
|
||||
|
||||
def _get_link_stem(url: str) -> str:
|
||||
scheme = urlsplit(url).scheme
|
||||
netloc_prefix = urlsplit(url).netloc.split(":")[0]
|
||||
@@ -64,13 +81,13 @@ class LangChainPlusClient(BaseSettings):
|
||||
"""Client for interacting with the LangChain+ API."""
|
||||
|
||||
api_key: Optional[str] = Field(default=None, env="LANGCHAIN_API_KEY")
|
||||
api_url: str = Field(default="http://localhost:1984", env="LANGCHAIN_ENDPOINT")
|
||||
api_url: str = Field(default="http://localhost:8000", env="LANGCHAIN_ENDPOINT")
|
||||
tenant_id: Optional[str] = None
|
||||
|
||||
@root_validator(pre=True)
|
||||
def validate_api_key_if_hosted(cls, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Verify API key is provided if url not localhost."""
|
||||
api_url: str = values.get("api_url", "http://localhost:1984")
|
||||
api_url: str = values.get("api_url", "http://localhost:8000")
|
||||
api_key: Optional[str] = values.get("api_key")
|
||||
if not _is_localhost(api_url):
|
||||
if not api_key:
|
||||
@@ -200,7 +217,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
return Dataset(**result)
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def read_run(self, run_id: Union[str, UUID]) -> Run:
|
||||
def read_run(self, run_id: str) -> Run:
|
||||
"""Read a run from the LangChain+ API."""
|
||||
response = self._get(f"/runs/{run_id}")
|
||||
raise_for_status_with_text(response)
|
||||
@@ -214,7 +231,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
session_name: Optional[str] = None,
|
||||
run_type: Optional[str] = None,
|
||||
**kwargs: Any,
|
||||
) -> Iterator[Run]:
|
||||
) -> List[Run]:
|
||||
"""List runs from the LangChain+ API."""
|
||||
if session_name is not None:
|
||||
if session_id is not None:
|
||||
@@ -228,7 +245,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
}
|
||||
response = self._get("/runs", params=filtered_params)
|
||||
raise_for_status_with_text(response)
|
||||
yield from [Run(**run) for run in response.json()]
|
||||
return [Run(**run) for run in response.json()]
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
@xor_args(("session_id", "session_name"))
|
||||
@@ -262,27 +279,11 @@ class LangChainPlusClient(BaseSettings):
|
||||
return TracerSession(**response.json())
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def list_sessions(self) -> Iterator[TracerSession]:
|
||||
def list_sessions(self) -> List[TracerSession]:
|
||||
"""List sessions from the LangChain+ API."""
|
||||
response = self._get("/sessions")
|
||||
raise_for_status_with_text(response)
|
||||
yield from [TracerSession(**session) for session in response.json()]
|
||||
|
||||
@xor_args(("session_name", "session_id"))
|
||||
def delete_session(
|
||||
self, *, session_name: Optional[str] = None, session_id: Optional[str] = None
|
||||
) -> None:
|
||||
"""Delete a session from the LangChain+ API."""
|
||||
if session_name is not None:
|
||||
session_id = self.read_session(session_name=session_name).id
|
||||
elif session_id is None:
|
||||
raise ValueError("Must provide session_name or session_id")
|
||||
response = requests.delete(
|
||||
self.api_url + f"/sessions/{session_id}",
|
||||
headers=self._headers,
|
||||
)
|
||||
raise_for_status_with_text(response)
|
||||
return None
|
||||
return [TracerSession(**session) for session in response.json()]
|
||||
|
||||
def create_dataset(self, dataset_name: str, description: str) -> Dataset:
|
||||
"""Create a dataset in the LangChain+ API."""
|
||||
@@ -325,11 +326,11 @@ class LangChainPlusClient(BaseSettings):
|
||||
return Dataset(**result)
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def list_datasets(self, limit: int = 100) -> Iterator[Dataset]:
|
||||
def list_datasets(self, limit: int = 100) -> Iterable[Dataset]:
|
||||
"""List the datasets on the LangChain+ API."""
|
||||
response = self._get("/datasets", params={"limit": limit})
|
||||
raise_for_status_with_text(response)
|
||||
yield from [Dataset(**dataset) for dataset in response.json()]
|
||||
return [Dataset(**dataset) for dataset in response.json()]
|
||||
|
||||
@xor_args(("dataset_id", "dataset_name"))
|
||||
def delete_dataset(
|
||||
@@ -345,7 +346,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
headers=self._headers,
|
||||
)
|
||||
raise_for_status_with_text(response)
|
||||
return Dataset(**response.json())
|
||||
return response.json()
|
||||
|
||||
@xor_args(("dataset_id", "dataset_name"))
|
||||
def create_example(
|
||||
@@ -358,7 +359,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
) -> Example:
|
||||
"""Create a dataset example in the LangChain+ API."""
|
||||
if dataset_id is None:
|
||||
dataset_id = self.read_dataset(dataset_name=dataset_name).id
|
||||
dataset_id = self.read_dataset(dataset_name).id
|
||||
|
||||
data = {
|
||||
"inputs": inputs,
|
||||
@@ -376,7 +377,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
return Example(**result)
|
||||
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def read_example(self, example_id: Union[str, UUID]) -> Example:
|
||||
def read_example(self, example_id: str) -> Example:
|
||||
"""Read an example from the LangChain+ API."""
|
||||
response = self._get(f"/examples/{example_id}")
|
||||
raise_for_status_with_text(response)
|
||||
@@ -385,7 +386,7 @@ class LangChainPlusClient(BaseSettings):
|
||||
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
|
||||
def list_examples(
|
||||
self, dataset_id: Optional[str] = None, dataset_name: Optional[str] = None
|
||||
) -> Iterator[Example]:
|
||||
) -> Iterable[Example]:
|
||||
"""List the datasets on the LangChain+ API."""
|
||||
params = {}
|
||||
if dataset_id is not None:
|
||||
@@ -397,7 +398,195 @@ class LangChainPlusClient(BaseSettings):
|
||||
pass
|
||||
response = self._get("/examples", params=params)
|
||||
raise_for_status_with_text(response)
|
||||
yield from [Example(**dataset) for dataset in response.json()]
|
||||
return [Example(**dataset) for dataset in response.json()]
|
||||
|
||||
@staticmethod
|
||||
def _get_prompts(inputs: Dict[str, Any]) -> List[str]:
|
||||
"""Get prompts from inputs."""
|
||||
if not inputs:
|
||||
raise InputFormatError("Inputs should not be empty.")
|
||||
|
||||
prompts = []
|
||||
|
||||
if "prompt" in inputs:
|
||||
if not isinstance(inputs["prompt"], str):
|
||||
raise InputFormatError(
|
||||
"Expected string for 'prompt', got"
|
||||
f" {type(inputs['prompt']).__name__}"
|
||||
)
|
||||
prompts = [inputs["prompt"]]
|
||||
elif "prompts" in inputs:
|
||||
if not isinstance(inputs["prompts"], list) or not all(
|
||||
isinstance(i, str) for i in inputs["prompts"]
|
||||
):
|
||||
raise InputFormatError(
|
||||
"Expected list of strings for 'prompts',"
|
||||
f" got {type(inputs['prompts']).__name__}"
|
||||
)
|
||||
prompts = inputs["prompts"]
|
||||
elif len(inputs) == 1:
|
||||
prompt_ = next(iter(inputs.values()))
|
||||
if isinstance(prompt_, str):
|
||||
prompts = [prompt_]
|
||||
elif isinstance(prompt_, list) and all(isinstance(i, str) for i in prompt_):
|
||||
prompts = prompt_
|
||||
else:
|
||||
raise InputFormatError(
|
||||
f"LLM Run expects string prompt input. Got {inputs}"
|
||||
)
|
||||
else:
|
||||
raise InputFormatError(
|
||||
f"LLM Run expects 'prompt' or 'prompts' in inputs. Got {inputs}"
|
||||
)
|
||||
|
||||
return prompts
|
||||
|
||||
@staticmethod
|
||||
def _get_messages(inputs: Dict[str, Any]) -> List[List[BaseMessage]]:
|
||||
"""Get Chat Messages from inputs."""
|
||||
if not inputs:
|
||||
raise InputFormatError("Inputs should not be empty.")
|
||||
|
||||
if "messages" in inputs:
|
||||
single_input = inputs["messages"]
|
||||
elif len(inputs) == 1:
|
||||
single_input = next(iter(inputs.values()))
|
||||
else:
|
||||
raise InputFormatError(
|
||||
f"Chat Run expects 'messages' in inputs. Got {inputs}"
|
||||
)
|
||||
if isinstance(single_input, list) and all(
|
||||
isinstance(i, dict) for i in single_input
|
||||
):
|
||||
raw_messages = [single_input]
|
||||
elif isinstance(single_input, list) and all(
|
||||
isinstance(i, list) for i in single_input
|
||||
):
|
||||
raw_messages = single_input
|
||||
else:
|
||||
raise InputFormatError(
|
||||
f"Chat Run expects List[dict] or List[List[dict]] 'messages'"
|
||||
f" input. Got {inputs}"
|
||||
)
|
||||
return [messages_from_dict(batch) for batch in raw_messages]
|
||||
|
||||
@staticmethod
|
||||
async def _arun_llm(
|
||||
llm: BaseLanguageModel,
|
||||
inputs: Dict[str, Any],
|
||||
langchain_tracer: LangChainTracer,
|
||||
) -> Union[LLMResult, ChatResult]:
|
||||
if isinstance(llm, BaseLLM):
|
||||
try:
|
||||
llm_prompts = LangChainPlusClient._get_prompts(inputs)
|
||||
llm_output = await llm.agenerate(
|
||||
llm_prompts, callbacks=[langchain_tracer]
|
||||
)
|
||||
except InputFormatError:
|
||||
llm_messages = LangChainPlusClient._get_messages(inputs)
|
||||
buffer_strings = [
|
||||
get_buffer_string(messages) for messages in llm_messages
|
||||
]
|
||||
llm_output = await llm.agenerate(
|
||||
buffer_strings, callbacks=[langchain_tracer]
|
||||
)
|
||||
elif isinstance(llm, BaseChatModel):
|
||||
try:
|
||||
messages = LangChainPlusClient._get_messages(inputs)
|
||||
llm_output = await llm.agenerate(messages, callbacks=[langchain_tracer])
|
||||
except InputFormatError:
|
||||
prompts = LangChainPlusClient._get_prompts(inputs)
|
||||
converted_messages: List[List[BaseMessage]] = [
|
||||
[HumanMessage(content=prompt)] for prompt in prompts
|
||||
]
|
||||
llm_output = await llm.agenerate(
|
||||
converted_messages, callbacks=[langchain_tracer]
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unsupported LLM type {type(llm)}")
|
||||
return llm_output
|
||||
|
||||
@staticmethod
|
||||
async def _arun_llm_or_chain(
|
||||
example: Example,
|
||||
langchain_tracer: LangChainTracer,
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
n_repetitions: int,
|
||||
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
|
||||
"""Run the chain asynchronously."""
|
||||
previous_example_id = langchain_tracer.example_id
|
||||
langchain_tracer.example_id = example.id
|
||||
outputs = []
|
||||
for _ in range(n_repetitions):
|
||||
try:
|
||||
if isinstance(llm_or_chain_factory, BaseLanguageModel):
|
||||
output: Any = await LangChainPlusClient._arun_llm(
|
||||
llm_or_chain_factory, example.inputs, langchain_tracer
|
||||
)
|
||||
else:
|
||||
chain = llm_or_chain_factory()
|
||||
output = await chain.arun(
|
||||
example.inputs, callbacks=[langchain_tracer]
|
||||
)
|
||||
outputs.append(output)
|
||||
except Exception as e:
|
||||
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
|
||||
outputs.append({"Error": str(e)})
|
||||
langchain_tracer.example_id = previous_example_id
|
||||
return outputs
|
||||
|
||||
@staticmethod
|
||||
async def _gather_with_concurrency(
|
||||
n: int,
|
||||
initializer: Callable[[], Coroutine[Any, Any, LangChainTracer]],
|
||||
*async_funcs: Callable[[LangChainTracer, Dict], Coroutine[Any, Any, Any]],
|
||||
) -> List[Any]:
|
||||
"""
|
||||
Run coroutines with a concurrency limit.
|
||||
|
||||
Args:
|
||||
n: The maximum number of concurrent tasks.
|
||||
initializer: A coroutine that initializes shared resources for the tasks.
|
||||
async_funcs: The async_funcs to be run concurrently.
|
||||
|
||||
Returns:
|
||||
A list of results from the coroutines.
|
||||
"""
|
||||
semaphore = asyncio.Semaphore(n)
|
||||
job_state = {"num_processed": 0}
|
||||
|
||||
tracer_queue: asyncio.Queue[LangChainTracer] = asyncio.Queue()
|
||||
for _ in range(n):
|
||||
tracer_queue.put_nowait(await initializer())
|
||||
|
||||
async def run_coroutine_with_semaphore(
|
||||
async_func: Callable[[LangChainTracer, Dict], Coroutine[Any, Any, Any]]
|
||||
) -> Any:
|
||||
async with semaphore:
|
||||
tracer = await tracer_queue.get()
|
||||
try:
|
||||
result = await async_func(tracer, job_state)
|
||||
finally:
|
||||
tracer_queue.put_nowait(tracer)
|
||||
return result
|
||||
|
||||
return await asyncio.gather(
|
||||
*(run_coroutine_with_semaphore(function) for function in async_funcs)
|
||||
)
|
||||
|
||||
async def _tracer_initializer(self, session_name: str) -> LangChainTracer:
|
||||
"""
|
||||
Initialize a tracer to share across tasks.
|
||||
|
||||
Args:
|
||||
session_name: The session name for the tracer.
|
||||
|
||||
Returns:
|
||||
A LangChainTracer instance with an active session.
|
||||
"""
|
||||
tracer = LangChainTracer(session_name=session_name)
|
||||
tracer.ensure_session()
|
||||
return tracer
|
||||
|
||||
async def arun_on_dataset(
|
||||
self,
|
||||
@@ -433,15 +622,93 @@ class LangChainPlusClient(BaseSettings):
|
||||
)
|
||||
dataset = self.read_dataset(dataset_name=dataset_name)
|
||||
examples = self.list_examples(dataset_id=str(dataset.id))
|
||||
results: Dict[str, List[Any]] = {}
|
||||
|
||||
return await arun_on_examples(
|
||||
examples,
|
||||
llm_or_chain_factory,
|
||||
concurrency_level=concurrency_level,
|
||||
num_repetitions=num_repetitions,
|
||||
session_name=session_name,
|
||||
verbose=verbose,
|
||||
async def process_example(
|
||||
example: Example, tracer: LangChainTracer, job_state: dict
|
||||
) -> None:
|
||||
"""Process a single example."""
|
||||
result = await LangChainPlusClient._arun_llm_or_chain(
|
||||
example,
|
||||
tracer,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions,
|
||||
)
|
||||
results[str(example.id)] = result
|
||||
job_state["num_processed"] += 1
|
||||
if verbose:
|
||||
print(
|
||||
f"Processed examples: {job_state['num_processed']}",
|
||||
end="\r",
|
||||
flush=True,
|
||||
)
|
||||
|
||||
await self._gather_with_concurrency(
|
||||
concurrency_level,
|
||||
functools.partial(self._tracer_initializer, session_name),
|
||||
*(functools.partial(process_example, e) for e in examples),
|
||||
)
|
||||
return results
|
||||
|
||||
@staticmethod
|
||||
def run_llm(
|
||||
llm: BaseLanguageModel,
|
||||
inputs: Dict[str, Any],
|
||||
langchain_tracer: LangChainTracer,
|
||||
) -> Union[LLMResult, ChatResult]:
|
||||
"""Run the language model on the example."""
|
||||
if isinstance(llm, BaseLLM):
|
||||
try:
|
||||
llm_prompts = LangChainPlusClient._get_prompts(inputs)
|
||||
llm_output = llm.generate(llm_prompts, callbacks=[langchain_tracer])
|
||||
except InputFormatError:
|
||||
llm_messages = LangChainPlusClient._get_messages(inputs)
|
||||
buffer_strings = [
|
||||
get_buffer_string(messages) for messages in llm_messages
|
||||
]
|
||||
llm_output = llm.generate(buffer_strings, callbacks=[langchain_tracer])
|
||||
elif isinstance(llm, BaseChatModel):
|
||||
try:
|
||||
messages = LangChainPlusClient._get_messages(inputs)
|
||||
llm_output = llm.generate(messages, callbacks=[langchain_tracer])
|
||||
except InputFormatError:
|
||||
prompts = LangChainPlusClient._get_prompts(inputs)
|
||||
converted_messages: List[List[BaseMessage]] = [
|
||||
[HumanMessage(content=prompt)] for prompt in prompts
|
||||
]
|
||||
llm_output = llm.generate(
|
||||
converted_messages, callbacks=[langchain_tracer]
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unsupported LLM type {type(llm)}")
|
||||
return llm_output
|
||||
|
||||
@staticmethod
|
||||
def run_llm_or_chain(
|
||||
example: Example,
|
||||
langchain_tracer: LangChainTracer,
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
n_repetitions: int,
|
||||
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
|
||||
"""Run the chain synchronously."""
|
||||
previous_example_id = langchain_tracer.example_id
|
||||
langchain_tracer.example_id = example.id
|
||||
outputs = []
|
||||
for _ in range(n_repetitions):
|
||||
try:
|
||||
if isinstance(llm_or_chain_factory, BaseLanguageModel):
|
||||
output: Any = LangChainPlusClient.run_llm(
|
||||
llm_or_chain_factory, example.inputs, langchain_tracer
|
||||
)
|
||||
else:
|
||||
chain = llm_or_chain_factory()
|
||||
output = chain.run(example.inputs, callbacks=[langchain_tracer])
|
||||
outputs.append(output)
|
||||
except Exception as e:
|
||||
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
|
||||
outputs.append({"Error": str(e)})
|
||||
langchain_tracer.example_id = previous_example_id
|
||||
return outputs
|
||||
|
||||
def run_on_dataset(
|
||||
self,
|
||||
@@ -474,11 +741,18 @@ class LangChainPlusClient(BaseSettings):
|
||||
session_name, llm_or_chain_factory, dataset_name
|
||||
)
|
||||
dataset = self.read_dataset(dataset_name=dataset_name)
|
||||
examples = self.list_examples(dataset_id=str(dataset.id))
|
||||
return run_on_examples(
|
||||
examples,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions=num_repetitions,
|
||||
session_name=session_name,
|
||||
verbose=verbose,
|
||||
)
|
||||
examples = list(self.list_examples(dataset_id=str(dataset.id)))
|
||||
results: Dict[str, Any] = {}
|
||||
tracer = LangChainTracer(session_name=session_name)
|
||||
tracer.ensure_session()
|
||||
for i, example in enumerate(examples):
|
||||
result = self.run_llm_or_chain(
|
||||
example,
|
||||
tracer,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions,
|
||||
)
|
||||
if verbose:
|
||||
print(f"{i+1} processed", flush=True, end="\r")
|
||||
results[str(example.id)] = result
|
||||
return results
|
||||
|
||||
@@ -1,375 +0,0 @@
|
||||
"""Utilities for running LLMs/Chains over datasets."""
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import functools
|
||||
import logging
|
||||
from typing import Any, Callable, Coroutine, Dict, Iterator, List, Optional, Union
|
||||
|
||||
from langchain.base_language import BaseLanguageModel
|
||||
from langchain.callbacks.base import BaseCallbackHandler
|
||||
from langchain.callbacks.manager import Callbacks
|
||||
from langchain.callbacks.tracers.langchain import LangChainTracer
|
||||
from langchain.chains.base import Chain
|
||||
from langchain.chat_models.base import BaseChatModel
|
||||
from langchain.client.models import Example
|
||||
from langchain.llms.base import BaseLLM
|
||||
from langchain.schema import (
|
||||
BaseMessage,
|
||||
ChatResult,
|
||||
HumanMessage,
|
||||
LLMResult,
|
||||
get_buffer_string,
|
||||
messages_from_dict,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MODEL_OR_CHAIN_FACTORY = Union[Callable[[], Chain], BaseLanguageModel]
|
||||
|
||||
|
||||
class InputFormatError(Exception):
|
||||
"""Raised when input format is invalid."""
|
||||
|
||||
|
||||
def _get_prompts(inputs: Dict[str, Any]) -> List[str]:
|
||||
"""Get prompts from inputs."""
|
||||
if not inputs:
|
||||
raise InputFormatError("Inputs should not be empty.")
|
||||
|
||||
prompts = []
|
||||
if "prompt" in inputs:
|
||||
if not isinstance(inputs["prompt"], str):
|
||||
raise InputFormatError(
|
||||
"Expected string for 'prompt', got"
|
||||
f" {type(inputs['prompt']).__name__}"
|
||||
)
|
||||
prompts = [inputs["prompt"]]
|
||||
elif "prompts" in inputs:
|
||||
if not isinstance(inputs["prompts"], list) or not all(
|
||||
isinstance(i, str) for i in inputs["prompts"]
|
||||
):
|
||||
raise InputFormatError(
|
||||
"Expected list of strings for 'prompts',"
|
||||
f" got {type(inputs['prompts']).__name__}"
|
||||
)
|
||||
prompts = inputs["prompts"]
|
||||
elif len(inputs) == 1:
|
||||
prompt_ = next(iter(inputs.values()))
|
||||
if isinstance(prompt_, str):
|
||||
prompts = [prompt_]
|
||||
elif isinstance(prompt_, list) and all(isinstance(i, str) for i in prompt_):
|
||||
prompts = prompt_
|
||||
else:
|
||||
raise InputFormatError(f"LLM Run expects string prompt input. Got {inputs}")
|
||||
else:
|
||||
raise InputFormatError(
|
||||
f"LLM Run expects 'prompt' or 'prompts' in inputs. Got {inputs}"
|
||||
)
|
||||
|
||||
return prompts
|
||||
|
||||
|
||||
def _get_messages(inputs: Dict[str, Any]) -> List[List[BaseMessage]]:
|
||||
"""Get Chat Messages from inputs."""
|
||||
if not inputs:
|
||||
raise InputFormatError("Inputs should not be empty.")
|
||||
|
||||
if "messages" in inputs:
|
||||
single_input = inputs["messages"]
|
||||
elif len(inputs) == 1:
|
||||
single_input = next(iter(inputs.values()))
|
||||
else:
|
||||
raise InputFormatError(f"Chat Run expects 'messages' in inputs. Got {inputs}")
|
||||
if isinstance(single_input, list) and all(
|
||||
isinstance(i, dict) for i in single_input
|
||||
):
|
||||
raw_messages = [single_input]
|
||||
elif isinstance(single_input, list) and all(
|
||||
isinstance(i, list) for i in single_input
|
||||
):
|
||||
raw_messages = single_input
|
||||
else:
|
||||
raise InputFormatError(
|
||||
f"Chat Run expects List[dict] or List[List[dict]] 'messages'"
|
||||
f" input. Got {inputs}"
|
||||
)
|
||||
return [messages_from_dict(batch) for batch in raw_messages]
|
||||
|
||||
|
||||
async def _arun_llm(
|
||||
llm: BaseLanguageModel,
|
||||
inputs: Dict[str, Any],
|
||||
langchain_tracer: Optional[LangChainTracer],
|
||||
) -> Union[LLMResult, ChatResult]:
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = (
|
||||
[langchain_tracer] if langchain_tracer else None
|
||||
)
|
||||
if isinstance(llm, BaseLLM):
|
||||
try:
|
||||
llm_prompts = _get_prompts(inputs)
|
||||
llm_output = await llm.agenerate(llm_prompts, callbacks=callbacks)
|
||||
except InputFormatError:
|
||||
llm_messages = _get_messages(inputs)
|
||||
buffer_strings = [get_buffer_string(messages) for messages in llm_messages]
|
||||
llm_output = await llm.agenerate(buffer_strings, callbacks=callbacks)
|
||||
elif isinstance(llm, BaseChatModel):
|
||||
try:
|
||||
messages = _get_messages(inputs)
|
||||
llm_output = await llm.agenerate(messages, callbacks=callbacks)
|
||||
except InputFormatError:
|
||||
prompts = _get_prompts(inputs)
|
||||
converted_messages: List[List[BaseMessage]] = [
|
||||
[HumanMessage(content=prompt)] for prompt in prompts
|
||||
]
|
||||
llm_output = await llm.agenerate(converted_messages, callbacks=callbacks)
|
||||
else:
|
||||
raise ValueError(f"Unsupported LLM type {type(llm)}")
|
||||
return llm_output
|
||||
|
||||
|
||||
async def _arun_llm_or_chain(
|
||||
example: Example,
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
n_repetitions: int,
|
||||
langchain_tracer: Optional[LangChainTracer],
|
||||
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
|
||||
"""Run the chain asynchronously."""
|
||||
if langchain_tracer is not None:
|
||||
previous_example_id = langchain_tracer.example_id
|
||||
langchain_tracer.example_id = example.id
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = [langchain_tracer]
|
||||
else:
|
||||
previous_example_id = None
|
||||
callbacks = None
|
||||
outputs = []
|
||||
for _ in range(n_repetitions):
|
||||
try:
|
||||
if isinstance(llm_or_chain_factory, BaseLanguageModel):
|
||||
output: Any = await _arun_llm(
|
||||
llm_or_chain_factory, example.inputs, langchain_tracer
|
||||
)
|
||||
else:
|
||||
chain = llm_or_chain_factory()
|
||||
output = await chain.arun(example.inputs, callbacks=callbacks)
|
||||
outputs.append(output)
|
||||
except Exception as e:
|
||||
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
|
||||
outputs.append({"Error": str(e)})
|
||||
if langchain_tracer is not None:
|
||||
langchain_tracer.example_id = previous_example_id
|
||||
return outputs
|
||||
|
||||
|
||||
async def _gather_with_concurrency(
|
||||
n: int,
|
||||
initializer: Callable[[], Coroutine[Any, Any, Optional[LangChainTracer]]],
|
||||
*async_funcs: Callable[[Optional[LangChainTracer], Dict], Coroutine[Any, Any, Any]],
|
||||
) -> List[Any]:
|
||||
"""
|
||||
Run coroutines with a concurrency limit.
|
||||
|
||||
Args:
|
||||
n: The maximum number of concurrent tasks.
|
||||
initializer: A coroutine that initializes shared resources for the tasks.
|
||||
async_funcs: The async_funcs to be run concurrently.
|
||||
|
||||
Returns:
|
||||
A list of results from the coroutines.
|
||||
"""
|
||||
semaphore = asyncio.Semaphore(n)
|
||||
job_state = {"num_processed": 0}
|
||||
|
||||
tracer_queue: asyncio.Queue[Optional[LangChainTracer]] = asyncio.Queue()
|
||||
for _ in range(n):
|
||||
tracer_queue.put_nowait(await initializer())
|
||||
|
||||
async def run_coroutine_with_semaphore(
|
||||
async_func: Callable[
|
||||
[Optional[LangChainTracer], Dict], Coroutine[Any, Any, Any]
|
||||
]
|
||||
) -> Any:
|
||||
async with semaphore:
|
||||
tracer = await tracer_queue.get()
|
||||
try:
|
||||
result = await async_func(tracer, job_state)
|
||||
finally:
|
||||
tracer_queue.put_nowait(tracer)
|
||||
return result
|
||||
|
||||
return await asyncio.gather(
|
||||
*(run_coroutine_with_semaphore(function) for function in async_funcs)
|
||||
)
|
||||
|
||||
|
||||
async def _tracer_initializer(session_name: Optional[str]) -> Optional[LangChainTracer]:
|
||||
"""
|
||||
Initialize a tracer to share across tasks.
|
||||
|
||||
Args:
|
||||
session_name: The session name for the tracer.
|
||||
|
||||
Returns:
|
||||
A LangChainTracer instance with an active session.
|
||||
"""
|
||||
if session_name:
|
||||
tracer = LangChainTracer(session_name=session_name)
|
||||
tracer.ensure_session()
|
||||
return tracer
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
async def arun_on_examples(
|
||||
examples: Iterator[Example],
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
*,
|
||||
concurrency_level: int = 5,
|
||||
num_repetitions: int = 1,
|
||||
session_name: Optional[str] = None,
|
||||
verbose: bool = False,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Run the chain on examples and store traces to the specified session name.
|
||||
|
||||
Args:
|
||||
examples: Examples to run the model or chain over
|
||||
llm_or_chain_factory: Language model or Chain constructor to run
|
||||
over the dataset. The Chain constructor is used to permit
|
||||
independent calls on each example without carrying over state.
|
||||
concurrency_level: The number of async tasks to run concurrently.
|
||||
num_repetitions: Number of times to run the model on each example.
|
||||
This is useful when testing success rates or generating confidence
|
||||
intervals.
|
||||
session_name: Session name to use when tracing runs.
|
||||
verbose: Whether to print progress.
|
||||
|
||||
Returns:
|
||||
A dictionary mapping example ids to the model outputs.
|
||||
"""
|
||||
results: Dict[str, List[Any]] = {}
|
||||
|
||||
async def process_example(
|
||||
example: Example, tracer: LangChainTracer, job_state: dict
|
||||
) -> None:
|
||||
"""Process a single example."""
|
||||
result = await _arun_llm_or_chain(
|
||||
example,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions,
|
||||
tracer,
|
||||
)
|
||||
results[str(example.id)] = result
|
||||
job_state["num_processed"] += 1
|
||||
if verbose:
|
||||
print(
|
||||
f"Processed examples: {job_state['num_processed']}",
|
||||
end="\r",
|
||||
flush=True,
|
||||
)
|
||||
|
||||
await _gather_with_concurrency(
|
||||
concurrency_level,
|
||||
functools.partial(_tracer_initializer, session_name),
|
||||
*(functools.partial(process_example, e) for e in examples),
|
||||
)
|
||||
return results
|
||||
|
||||
|
||||
def run_llm(
|
||||
llm: BaseLanguageModel,
|
||||
inputs: Dict[str, Any],
|
||||
callbacks: Callbacks,
|
||||
) -> Union[LLMResult, ChatResult]:
|
||||
"""Run the language model on the example."""
|
||||
if isinstance(llm, BaseLLM):
|
||||
try:
|
||||
llm_prompts = _get_prompts(inputs)
|
||||
llm_output = llm.generate(llm_prompts, callbacks=callbacks)
|
||||
except InputFormatError:
|
||||
llm_messages = _get_messages(inputs)
|
||||
buffer_strings = [get_buffer_string(messages) for messages in llm_messages]
|
||||
llm_output = llm.generate(buffer_strings, callbacks=callbacks)
|
||||
elif isinstance(llm, BaseChatModel):
|
||||
try:
|
||||
messages = _get_messages(inputs)
|
||||
llm_output = llm.generate(messages, callbacks=callbacks)
|
||||
except InputFormatError:
|
||||
prompts = _get_prompts(inputs)
|
||||
converted_messages: List[List[BaseMessage]] = [
|
||||
[HumanMessage(content=prompt)] for prompt in prompts
|
||||
]
|
||||
llm_output = llm.generate(converted_messages, callbacks=callbacks)
|
||||
else:
|
||||
raise ValueError(f"Unsupported LLM type {type(llm)}")
|
||||
return llm_output
|
||||
|
||||
|
||||
def run_llm_or_chain(
|
||||
example: Example,
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
n_repetitions: int,
|
||||
langchain_tracer: Optional[LangChainTracer] = None,
|
||||
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
|
||||
"""Run the chain synchronously."""
|
||||
if langchain_tracer is not None:
|
||||
previous_example_id = langchain_tracer.example_id
|
||||
langchain_tracer.example_id = example.id
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = [langchain_tracer]
|
||||
else:
|
||||
previous_example_id = None
|
||||
callbacks = None
|
||||
outputs = []
|
||||
for _ in range(n_repetitions):
|
||||
try:
|
||||
if isinstance(llm_or_chain_factory, BaseLanguageModel):
|
||||
output: Any = run_llm(llm_or_chain_factory, example.inputs, callbacks)
|
||||
else:
|
||||
chain = llm_or_chain_factory()
|
||||
output = chain.run(example.inputs, callbacks=callbacks)
|
||||
outputs.append(output)
|
||||
except Exception as e:
|
||||
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
|
||||
outputs.append({"Error": str(e)})
|
||||
if langchain_tracer is not None:
|
||||
langchain_tracer.example_id = previous_example_id
|
||||
return outputs
|
||||
|
||||
|
||||
def run_on_examples(
|
||||
examples: Iterator[Example],
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
*,
|
||||
num_repetitions: int = 1,
|
||||
session_name: Optional[str] = None,
|
||||
verbose: bool = False,
|
||||
) -> Dict[str, Any]:
|
||||
"""Run the chain on examples and store traces to the specified session name.
|
||||
|
||||
Args:
|
||||
examples: Examples to run model or chain over.
|
||||
llm_or_chain_factory: Language model or Chain constructor to run
|
||||
over the dataset. The Chain constructor is used to permit
|
||||
independent calls on each example without carrying over state.
|
||||
concurrency_level: Number of async workers to run in parallel.
|
||||
num_repetitions: Number of times to run the model on each example.
|
||||
This is useful when testing success rates or generating confidence
|
||||
intervals.
|
||||
session_name: Session name to use when tracing runs.
|
||||
verbose: Whether to print progress.
|
||||
Returns:
|
||||
A dictionary mapping example ids to the model outputs.
|
||||
"""
|
||||
results: Dict[str, Any] = {}
|
||||
tracer = LangChainTracer(session_name=session_name) if session_name else None
|
||||
for i, example in enumerate(examples):
|
||||
result = run_llm_or_chain(
|
||||
example,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions,
|
||||
langchain_tracer=tracer,
|
||||
)
|
||||
if verbose:
|
||||
print(f"{i+1} processed", flush=True, end="\r")
|
||||
results[str(example.id)] = result
|
||||
return results
|
||||
@@ -10,7 +10,6 @@ from langchain.document_loaders.azure_blob_storage_container import (
|
||||
from langchain.document_loaders.azure_blob_storage_file import (
|
||||
AzureBlobStorageFileLoader,
|
||||
)
|
||||
from langchain.document_loaders.bibtex import BibtexLoader
|
||||
from langchain.document_loaders.bigquery import BigQueryLoader
|
||||
from langchain.document_loaders.bilibili import BiliBiliLoader
|
||||
from langchain.document_loaders.blackboard import BlackboardLoader
|
||||
@@ -47,10 +46,8 @@ from langchain.document_loaders.ifixit import IFixitLoader
|
||||
from langchain.document_loaders.image import UnstructuredImageLoader
|
||||
from langchain.document_loaders.image_captions import ImageCaptionLoader
|
||||
from langchain.document_loaders.imsdb import IMSDbLoader
|
||||
from langchain.document_loaders.joplin import JoplinLoader
|
||||
from langchain.document_loaders.json_loader import JSONLoader
|
||||
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
|
||||
from langchain.document_loaders.mastodon import MastodonTootsLoader
|
||||
from langchain.document_loaders.mediawikidump import MWDumpLoader
|
||||
from langchain.document_loaders.modern_treasury import ModernTreasuryLoader
|
||||
from langchain.document_loaders.notebook import NotebookLoader
|
||||
@@ -72,7 +69,6 @@ from langchain.document_loaders.pdf import (
|
||||
UnstructuredPDFLoader,
|
||||
)
|
||||
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
|
||||
from langchain.document_loaders.psychic import PsychicLoader
|
||||
from langchain.document_loaders.python import PythonLoader
|
||||
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
|
||||
from langchain.document_loaders.reddit import RedditPostsLoader
|
||||
@@ -102,7 +98,6 @@ from langchain.document_loaders.unstructured import (
|
||||
from langchain.document_loaders.url import UnstructuredURLLoader
|
||||
from langchain.document_loaders.url_playwright import PlaywrightURLLoader
|
||||
from langchain.document_loaders.url_selenium import SeleniumURLLoader
|
||||
from langchain.document_loaders.weather import WeatherDataLoader
|
||||
from langchain.document_loaders.web_base import WebBaseLoader
|
||||
from langchain.document_loaders.whatsapp_chat import WhatsAppChatLoader
|
||||
from langchain.document_loaders.wikipedia import WikipediaLoader
|
||||
@@ -130,7 +125,6 @@ __all__ = [
|
||||
"AzureBlobStorageContainerLoader",
|
||||
"AzureBlobStorageFileLoader",
|
||||
"BSHTMLLoader",
|
||||
"BibtexLoader",
|
||||
"BigQueryLoader",
|
||||
"BiliBiliLoader",
|
||||
"BlackboardLoader",
|
||||
@@ -163,10 +157,8 @@ __all__ = [
|
||||
"IFixitLoader",
|
||||
"IMSDbLoader",
|
||||
"ImageCaptionLoader",
|
||||
"JoplinLoader",
|
||||
"JSONLoader",
|
||||
"MWDumpLoader",
|
||||
"MastodonTootsLoader",
|
||||
"MathpixPDFLoader",
|
||||
"ModernTreasuryLoader",
|
||||
"NotebookLoader",
|
||||
@@ -217,12 +209,10 @@ __all__ = [
|
||||
"UnstructuredRTFLoader",
|
||||
"UnstructuredURLLoader",
|
||||
"UnstructuredWordDocumentLoader",
|
||||
"WeatherDataLoader",
|
||||
"WebBaseLoader",
|
||||
"WhatsAppChatLoader",
|
||||
"WikipediaLoader",
|
||||
"YoutubeLoader",
|
||||
"TelegramChatLoader",
|
||||
"ToMarkdownLoader",
|
||||
"PsychicLoader",
|
||||
]
|
||||
|
||||
@@ -41,7 +41,7 @@ class ApifyDatasetLoader(BaseLoader, BaseModel):
|
||||
|
||||
values["apify_client"] = ApifyClient()
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import apify-client Python package. "
|
||||
"Please install it with `pip install apify-client`."
|
||||
)
|
||||
|
||||
@@ -1,108 +0,0 @@
|
||||
import logging
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any, Iterator, List, Mapping, Optional
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
from langchain.utilities.bibtex import BibtexparserWrapper
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class BibtexLoader(BaseLoader):
|
||||
"""Loads a bibtex file into a list of Documents.
|
||||
|
||||
Each document represents one entry from the bibtex file.
|
||||
|
||||
If a PDF file is present in the `file` bibtex field, the original PDF
|
||||
is loaded into the document text. If no such file entry is present,
|
||||
the `abstract` field is used instead.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
file_path: str,
|
||||
*,
|
||||
parser: Optional[BibtexparserWrapper] = None,
|
||||
max_docs: Optional[int] = None,
|
||||
max_content_chars: Optional[int] = 4_000,
|
||||
load_extra_metadata: bool = False,
|
||||
file_pattern: str = r"[^:]+\.pdf",
|
||||
):
|
||||
"""Initialize the BibtexLoader.
|
||||
|
||||
Args:
|
||||
file_path: Path to the bibtex file.
|
||||
max_docs: Max number of associated documents to load. Use -1 means
|
||||
no limit.
|
||||
"""
|
||||
self.file_path = file_path
|
||||
self.parser = parser or BibtexparserWrapper()
|
||||
self.max_docs = max_docs
|
||||
self.max_content_chars = max_content_chars
|
||||
self.load_extra_metadata = load_extra_metadata
|
||||
self.file_regex = re.compile(file_pattern)
|
||||
|
||||
def _load_entry(self, entry: Mapping[str, Any]) -> Optional[Document]:
|
||||
import fitz
|
||||
|
||||
parent_dir = Path(self.file_path).parent
|
||||
# regex is useful for Zotero flavor bibtex files
|
||||
file_names = self.file_regex.findall(entry.get("file", ""))
|
||||
if not file_names:
|
||||
return None
|
||||
texts: List[str] = []
|
||||
for file_name in file_names:
|
||||
try:
|
||||
with fitz.open(parent_dir / file_name) as f:
|
||||
texts.extend(page.get_text() for page in f)
|
||||
except FileNotFoundError as e:
|
||||
logger.debug(e)
|
||||
content = "\n".join(texts) or entry.get("abstract", "")
|
||||
if self.max_content_chars:
|
||||
content = content[: self.max_content_chars]
|
||||
metadata = self.parser.get_metadata(entry, load_extra=self.load_extra_metadata)
|
||||
return Document(
|
||||
page_content=content,
|
||||
metadata=metadata,
|
||||
)
|
||||
|
||||
def lazy_load(self) -> Iterator[Document]:
|
||||
"""Load bibtex file using bibtexparser and get the article texts plus the
|
||||
|
||||
article metadata.
|
||||
|
||||
See https://bibtexparser.readthedocs.io/en/master/
|
||||
|
||||
Returns:
|
||||
a list of documents with the document.page_content in text format
|
||||
"""
|
||||
try:
|
||||
import fitz # noqa: F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"PyMuPDF package not found, please install it with "
|
||||
"`pip install pymupdf`"
|
||||
)
|
||||
|
||||
entries = self.parser.load_bibtex_entries(self.file_path)
|
||||
if self.max_docs:
|
||||
entries = entries[: self.max_docs]
|
||||
for entry in entries:
|
||||
doc = self._load_entry(entry)
|
||||
if doc:
|
||||
yield doc
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load bibtex file documents from the given bibtex file path.
|
||||
|
||||
See https://bibtexparser.readthedocs.io/en/master/
|
||||
|
||||
Args:
|
||||
file_path: the path to the bibtex file
|
||||
|
||||
Returns:
|
||||
a list of documents with the document.page_content in text format
|
||||
"""
|
||||
return list(self.lazy_load())
|
||||
@@ -63,7 +63,7 @@ class DocugamiLoader(BaseLoader, BaseModel):
|
||||
try:
|
||||
from lxml import etree
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import lxml python package. "
|
||||
"Please install it with `pip install lxml`."
|
||||
)
|
||||
@@ -259,7 +259,7 @@ class DocugamiLoader(BaseLoader, BaseModel):
|
||||
try:
|
||||
from lxml import etree
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import lxml python package. "
|
||||
"Please install it with `pip install lxml`."
|
||||
)
|
||||
|
||||
@@ -33,7 +33,7 @@ class DuckDBLoader(BaseLoader):
|
||||
try:
|
||||
import duckdb
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"Could not import duckdb python package. "
|
||||
"Please install it with `pip install duckdb`."
|
||||
)
|
||||
|
||||
@@ -31,7 +31,6 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
file_ids: Optional[List[str]] = None
|
||||
recursive: bool = False
|
||||
file_types: Optional[Sequence[str]] = None
|
||||
load_trashed_files: bool = False
|
||||
|
||||
@root_validator
|
||||
def validate_inputs(cls, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
@@ -216,10 +215,8 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
_files = files
|
||||
|
||||
returns = []
|
||||
for file in files:
|
||||
if file["trashed"] and not self.load_trashed_files:
|
||||
continue
|
||||
elif file["mimeType"] == "application/vnd.google-apps.document":
|
||||
for file in _files:
|
||||
if file["mimeType"] == "application/vnd.google-apps.document":
|
||||
returns.append(self._load_document_from_id(file["id"])) # type: ignore
|
||||
elif file["mimeType"] == "application/vnd.google-apps.spreadsheet":
|
||||
returns.extend(self._load_sheet_from_id(file["id"])) # type: ignore
|
||||
@@ -227,6 +224,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
returns.extend(self._load_file_from_id(file["id"])) # type: ignore
|
||||
else:
|
||||
pass
|
||||
|
||||
return returns
|
||||
|
||||
def _fetch_files_recursive(
|
||||
@@ -240,7 +238,7 @@ class GoogleDriveLoader(BaseLoader, BaseModel):
|
||||
pageSize=1000,
|
||||
includeItemsFromAllDrives=True,
|
||||
supportsAllDrives=True,
|
||||
fields="nextPageToken, files(id, name, mimeType, parents, trashed)",
|
||||
fields="nextPageToken, files(id, name, mimeType, parents)",
|
||||
)
|
||||
.execute()
|
||||
)
|
||||
|
||||
@@ -39,7 +39,7 @@ class ImageCaptionLoader(BaseLoader):
|
||||
try:
|
||||
from transformers import BlipForConditionalGeneration, BlipProcessor
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"`transformers` package not found, please install with "
|
||||
"`pip install transformers`."
|
||||
)
|
||||
@@ -66,7 +66,7 @@ class ImageCaptionLoader(BaseLoader):
|
||||
try:
|
||||
from PIL import Image
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"`PIL` package not found, please install with `pip install pillow`"
|
||||
)
|
||||
|
||||
|
||||
@@ -1,41 +0,0 @@
|
||||
"""Loader that fetches data from IUGU"""
|
||||
import json
|
||||
import urllib.request
|
||||
from typing import List, Optional
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
from langchain.utils import get_from_env, stringify_dict
|
||||
|
||||
IUGU_ENDPOINTS = {
|
||||
"invoices": "https://api.iugu.com/v1/invoices",
|
||||
"customers": "https://api.iugu.com/v1/customers",
|
||||
"charges": "https://api.iugu.com/v1/charges",
|
||||
"subscriptions": "https://api.iugu.com/v1/subscriptions",
|
||||
"plans": "https://api.iugu.com/v1/plans",
|
||||
}
|
||||
|
||||
|
||||
class IuguLoader(BaseLoader):
|
||||
def __init__(self, resource: str, api_token: Optional[str] = None) -> None:
|
||||
self.resource = resource
|
||||
api_token = api_token or get_from_env("api_token", "IUGU_API_TOKEN")
|
||||
self.headers = {"Authorization": f"Bearer {api_token}"}
|
||||
|
||||
def _make_request(self, url: str) -> List[Document]:
|
||||
request = urllib.request.Request(url, headers=self.headers)
|
||||
|
||||
with urllib.request.urlopen(request) as response:
|
||||
json_data = json.loads(response.read().decode())
|
||||
text = stringify_dict(json_data)
|
||||
metadata = {"source": url}
|
||||
return [Document(page_content=text, metadata=metadata)]
|
||||
|
||||
def _get_resource(self) -> List[Document]:
|
||||
endpoint = IUGU_ENDPOINTS.get(self.resource)
|
||||
if endpoint is None:
|
||||
return []
|
||||
return self._make_request(endpoint)
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
return self._get_resource()
|
||||
@@ -1,88 +0,0 @@
|
||||
import json
|
||||
import urllib
|
||||
from datetime import datetime
|
||||
from typing import Iterator, List, Optional
|
||||
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
from langchain.schema import Document
|
||||
from langchain.utils import get_from_env
|
||||
|
||||
LINK_NOTE_TEMPLATE = "joplin://x-callback-url/openNote?id={id}"
|
||||
|
||||
|
||||
class JoplinLoader(BaseLoader):
|
||||
"""
|
||||
Loader that fetches notes from Joplin.
|
||||
|
||||
In order to use this loader, you need to have Joplin running with the
|
||||
Web Clipper enabled (look for "Web Clipper" in the app settings).
|
||||
|
||||
To get the access token, you need to go to the Web Clipper options and
|
||||
under "Advanced Options" you will find the access token.
|
||||
|
||||
You can find more information about the Web Clipper service here:
|
||||
https://joplinapp.org/clipper/
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
access_token: Optional[str] = None,
|
||||
port: int = 41184,
|
||||
host: str = "localhost",
|
||||
) -> None:
|
||||
access_token = access_token or get_from_env(
|
||||
"access_token", "JOPLIN_ACCESS_TOKEN"
|
||||
)
|
||||
base_url = f"http://{host}:{port}"
|
||||
self._get_note_url = (
|
||||
f"{base_url}/notes?token={access_token}"
|
||||
"&fields=id,parent_id,title,body,created_time,updated_time&page={{page}}"
|
||||
)
|
||||
self._get_folder_url = (
|
||||
f"{base_url}/folders/{{id}}?token={access_token}&fields=title"
|
||||
)
|
||||
self._get_tag_url = (
|
||||
f"{base_url}/notes/{{id}}/tags?token={access_token}&fields=title"
|
||||
)
|
||||
|
||||
def _get_notes(self) -> Iterator[Document]:
|
||||
has_more = True
|
||||
page = 1
|
||||
while has_more:
|
||||
req_note = urllib.request.Request(self._get_note_url.format(page=page))
|
||||
with urllib.request.urlopen(req_note) as response:
|
||||
json_data = json.loads(response.read().decode())
|
||||
for note in json_data["items"]:
|
||||
metadata = {
|
||||
"source": LINK_NOTE_TEMPLATE.format(id=note["id"]),
|
||||
"folder": self._get_folder(note["parent_id"]),
|
||||
"tags": self._get_tags(note["id"]),
|
||||
"title": note["title"],
|
||||
"created_time": self._convert_date(note["created_time"]),
|
||||
"updated_time": self._convert_date(note["updated_time"]),
|
||||
}
|
||||
yield Document(page_content=note["body"], metadata=metadata)
|
||||
|
||||
has_more = json_data["has_more"]
|
||||
page += 1
|
||||
|
||||
def _get_folder(self, folder_id: str) -> str:
|
||||
req_folder = urllib.request.Request(self._get_folder_url.format(id=folder_id))
|
||||
with urllib.request.urlopen(req_folder) as response:
|
||||
json_data = json.loads(response.read().decode())
|
||||
return json_data["title"]
|
||||
|
||||
def _get_tags(self, note_id: str) -> List[str]:
|
||||
req_tag = urllib.request.Request(self._get_tag_url.format(id=note_id))
|
||||
with urllib.request.urlopen(req_tag) as response:
|
||||
json_data = json.loads(response.read().decode())
|
||||
return [tag["title"] for tag in json_data["items"]]
|
||||
|
||||
def _convert_date(self, date: int) -> str:
|
||||
return datetime.fromtimestamp(date / 1000).strftime("%Y-%m-%d %H:%M:%S")
|
||||
|
||||
def lazy_load(self) -> Iterator[Document]:
|
||||
yield from self._get_notes()
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
return list(self.lazy_load())
|
||||
@@ -42,7 +42,7 @@ class JSONLoader(BaseLoader):
|
||||
try:
|
||||
import jq # noqa:F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"jq package not found, please install it with `pip install jq`"
|
||||
)
|
||||
|
||||
|
||||
@@ -1,88 +0,0 @@
|
||||
"""Mastodon document loader."""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Sequence
|
||||
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import mastodon
|
||||
|
||||
|
||||
def _dependable_mastodon_import() -> mastodon:
|
||||
try:
|
||||
import mastodon
|
||||
except ImportError:
|
||||
raise ValueError(
|
||||
"Mastodon.py package not found, "
|
||||
"please install it with `pip install Mastodon.py`"
|
||||
)
|
||||
return mastodon
|
||||
|
||||
|
||||
class MastodonTootsLoader(BaseLoader):
|
||||
"""Mastodon toots loader."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
mastodon_accounts: Sequence[str],
|
||||
number_toots: Optional[int] = 100,
|
||||
exclude_replies: bool = False,
|
||||
access_token: Optional[str] = None,
|
||||
api_base_url: str = "https://mastodon.social",
|
||||
):
|
||||
"""Instantiate Mastodon toots loader.
|
||||
|
||||
Args:
|
||||
mastodon_accounts: The list of Mastodon accounts to query.
|
||||
number_toots: How many toots to pull for each account.
|
||||
exclude_replies: Whether to exclude reply toots from the load.
|
||||
access_token: An access token if toots are loaded as a Mastodon app. Can
|
||||
also be specified via the environment variables "MASTODON_ACCESS_TOKEN".
|
||||
api_base_url: A Mastodon API base URL to talk to, if not using the default.
|
||||
"""
|
||||
mastodon = _dependable_mastodon_import()
|
||||
access_token = access_token or os.environ.get("MASTODON_ACCESS_TOKEN")
|
||||
self.api = mastodon.Mastodon(
|
||||
access_token=access_token, api_base_url=api_base_url
|
||||
)
|
||||
self.mastodon_accounts = mastodon_accounts
|
||||
self.number_toots = number_toots
|
||||
self.exclude_replies = exclude_replies
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load toots into documents."""
|
||||
results: List[Document] = []
|
||||
for account in self.mastodon_accounts:
|
||||
user = self.api.account_lookup(account)
|
||||
toots = self.api.account_statuses(
|
||||
user.id,
|
||||
only_media=False,
|
||||
pinned=False,
|
||||
exclude_replies=self.exclude_replies,
|
||||
exclude_reblogs=True,
|
||||
limit=self.number_toots,
|
||||
)
|
||||
docs = self._format_toots(toots, user)
|
||||
results.extend(docs)
|
||||
return results
|
||||
|
||||
def _format_toots(
|
||||
self, toots: List[Dict[str, Any]], user_info: dict
|
||||
) -> Iterable[Document]:
|
||||
"""Format toots into documents.
|
||||
|
||||
Adding user info, and selected toot fields into the metadata.
|
||||
"""
|
||||
for toot in toots:
|
||||
metadata = {
|
||||
"created_at": toot["created_at"],
|
||||
"user_info": user_info,
|
||||
"is_reply": toot["in_reply_to_id"] is not None,
|
||||
}
|
||||
yield Document(
|
||||
page_content=toot["content"],
|
||||
metadata=metadata,
|
||||
)
|
||||
@@ -83,7 +83,7 @@ class NotebookLoader(BaseLoader):
|
||||
try:
|
||||
import pandas as pd
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
raise ValueError(
|
||||
"pandas is needed for Notebook Loader, "
|
||||
"please install with `pip install pandas`"
|
||||
)
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user