Mirror of https://github.com/hwchase17/langchain.git, synced 2026-02-08 18:19:21 +00:00.

Compare commits (66 commits)
| SHA1 |
|---|
| e323d0cfb1 |
| 01fa2d8117 |
| 8e126bc9bd |
| c71027e725 |
| e85c53ce68 |
| 3e1901e1aa |
| 6a4f602156 |
| 6023d5be09 |
| a306baacd1 |
| 44ecec3896 |
| bc7e56e8df |
| afc7f1b892 |
| d43250bfa5 |
| bc53c928fc |
| 637c0d6508 |
| 1e56879d38 |
| 6bd1529cb7 |
| 2584663e44 |
| cc20b9425e |
| cea380174f |
| 87fad8fc00 |
| e2b834e427 |
| f95cedc443 |
| ba5a2f06b9 |
| 2ec25ddd4c |
| 31b054f69d |
| 93a091cfb8 |
| 3aa53b44dd |
| 82c080c6e6 |
| 71e662e88d |
| 53d56d7650 |
| 2a68be3e8d |
| 8217a2f26c |
| 7658263bfb |
| 32b11101d3 |
| 1614c5f5fd |
| a2b699dcd2 |
| 7cc44b3bdb |
| 0b9f086d36 |
| bcfbc7a818 |
| 1dd0733515 |
| 4c79100b15 |
| 777aaff841 |
| e9ef08862d |
| 364b771743 |
| 483441d305 |
| 8df6b68093 |
| 3f48eed5bd |
| 933441cc52 |
| 4a8f5cdf4b |
| 523ad2e6bd |
| fc0cfd7d1f |
| 4d32441b86 |
| 23d5f64bda |
| 0de55048b7 |
| d564308e0f |
| 576609e665 |
| 3f952eb597 |
| ba26a879e0 |
| bfabd1d5c0 |
| f3508228df |
| b4eb043b81 |
| 06438794e1 |
| 9f8e05ffd4 |
| b0d560be56 |
| ebea40ce86 |
@@ -22,3 +22,18 @@ This repo serves as a template for how to deploy a LangChain app with Gradio.

It implements a chatbot interface, with a "Bring-Your-Own-Token" approach (nice for not racking up big bills).

It also contains instructions for how to deploy this app on the Hugging Face platform.

This is heavily influenced by James Weaver's [excellent examples](https://huggingface.co/JavaFXpert).

## [Beam](https://github.com/slai-labs/get-beam/tree/main/examples/langchain-question-answering)

This repo serves as a template for how to deploy a LangChain app with [Beam](https://beam.cloud).

It implements a Question Answering app and contains instructions for deploying the app as a serverless REST API.

## [Vercel](https://github.com/homanp/vercel-langchain)

A minimal example of how to run LangChain on Vercel using Flask.

## [SteamShip](https://github.com/steamship-core/steamship-langchain/)

This repository contains LangChain adapters for Steamship, enabling LangChain developers to rapidly deploy their apps on Steamship.

This includes: production-ready endpoints, horizontal scaling across dependencies, persistent storage of app state, multi-tenancy support, etc.
@@ -77,6 +77,17 @@ Open Source

+++

A Jupyter notebook demonstrating how you could create a semantic search engine on documents in one of your Google Folders.

---

.. link-button:: https://github.com/venuv/langchain_semantic_search
    :type: url
    :text: Google Folder Semantic Search
    :classes: stretched-link btn-lg

+++

Build a GitHub support bot with GPT3, LangChain, and Python.

---

@@ -188,6 +199,17 @@ Open Source

+++

This repo is a simple demonstration of using LangChain to do fact-checking with prompt chaining.

---

.. link-button:: https://github.com/arc53/docsgpt
    :type: url
    :text: DocsGPT
    :classes: stretched-link btn-lg

+++

Answer questions about the documentation of any project.

Misc. Colab Notebooks
~~~~~~~~~~~~~~~~~~~~~
@@ -162,7 +162,7 @@ This is one of the simpler types of chains, but understanding how it works will

`````{dropdown} Agents: Dynamically call chains based on user input

So for the chains we've looked at run in a predetermined order.
So far the chains we've looked at run in a predetermined order.

Agents no longer do so: they use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning to the user.

@@ -179,6 +179,20 @@ In order to load agents, you should understand the following concepts:

**Tools**: For a list of predefined tools and their specifications, see [here](../modules/agents/tools.md).

For this example, you will also need to install the SerpAPI Python package.

```bash
pip install google-search-results
```

And set the appropriate environment variables.

```python
import os
os.environ["SERPAPI_API_KEY"] = "..."
```

Now we can get started!

```python
from langchain.agents import load_tools
```
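Putting the pieces of this hunk together, a minimal end-to-end sketch might look like the following. Every call here (`load_tools`, `initialize_agent` with `agent="zero-shot-react-description"`, `agent.run`) appears verbatim elsewhere in this diff; only the assembled script is illustrative.

```python
import os

from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

os.environ["SERPAPI_API_KEY"] = "..."  # your SerpAPI key

# An LLM, two tools, and a zero-shot ReAct agent, as shown in the hunk above.
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# Illustrative question; the agent decides which tool to call and in what order.
agent.run("Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?")
```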
@@ -51,6 +51,8 @@ These modules are, in increasing order of complexity:

- `LLMs <./modules/llms.html>`_: This includes a generic interface for all LLMs, and common utilities for working with LLMs.

- `Document Loaders <./modules/document_loaders.html>`_: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources.

- `Utils <./modules/utils.html>`_: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application.

- `Chains <./modules/chains.html>`_: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.

@@ -68,6 +70,7 @@ These modules are, in increasing order of complexity:

./modules/prompts.md
./modules/llms.md
./modules/document_loaders.md
./modules/utils.md
./modules/chains.md
./modules/agents.md
423 docs/modules/agents/examples/async_agent.ipynb (Normal file)
@@ -0,0 +1,423 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6fb92deb-d89e-439b-855d-c7f2607d794b",
"metadata": {},
"source": [
"# Async API for Agent\n",
"\n",
"LangChain provides async support for Agents by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
"Async methods are currently supported for the following `Tools`: [`SerpAPIWrapper`](https://github.com/hwchase17/langchain/blob/master/langchain/serpapi.py) and [`LLMMathChain`](https://github.com/hwchase17/langchain/blob/master/langchain/chains/llm_math/base.py). Async support for other agent tools is on the roadmap.\n",
"\n",
"For `Tool`s that have a `coroutine` implemented (the two mentioned above), the `AgentExecutor` will `await` them directly. Otherwise, the `AgentExecutor` will call the `Tool`'s `func` via `asyncio.get_event_loop().run_in_executor` to avoid blocking the main runloop.\n",
"\n",
"You can use `arun` to call an `AgentExecutor` asynchronously."
]
},
{
"cell_type": "markdown",
"id": "97800378-cc34-4283-9bd0-43f336bc914c",
"metadata": {},
"source": [
"## Serial vs. Concurrent Execution\n",
"\n",
"In this example, we kick off agents to answer some questions serially vs. concurrently. You can see that concurrent execution significantly speeds this up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da5df06c-af6f-4572-b9f5-0ab971c16487",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import asyncio\n",
"import time\n",
"\n",
"from langchain.agents import initialize_agent, load_tools\n",
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.stdout import StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.tracers import LangChainTracer\n",
"from aiohttp import ClientSession\n",
"\n",
"questions = [\n",
" \"Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?\",\n",
" \"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\",\n",
" \"Who won the most recent formula 1 grand prix? What is their age raised to the 0.23 power?\",\n",
" \"Who won the US Open women's final in 2019? What is her age raised to the 0.34 power?\",\n",
" \"Who is Beyonce's husband? What is his age raised to the 0.19 power?\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fd4c294e-b1d6-44b8-b32e-2765c017e503",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.\n",
"Action: Search\n",
"Action Input: \"US Open men's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mRafael Nadal\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Rafael Nadal's age\n",
"Action: Search\n",
"Action Input: \"Rafael Nadal age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m36 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 36 raised to the 0.334 power\n",
"Action: Calculator\n",
"Action Input: 36^0.334\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 3.3098250249682484\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Olivia Wilde boyfriend\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mJason Sudeikis\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis is an American actor, comedian, writer, and producer. In the 1990s, he began his career in improv comedy and performed with ComedySportz, iO Chicago, and The Second City.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' exact age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age exact\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis. (1975-09-18) September 18, 1975 (age 47). Fairfax, Virginia, U.S. · Fort Scott Community College · Actor; comedian; producer; writer · 1997– ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now have the information I need to calculate the age raised to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 47^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.4242784855673896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the grand prix and then calculate their age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Formula 1 Grand Prix Winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mMax Emilian Verstappen is a Belgian-Dutch racing driver and the 2021 and 2022 Formula One World Champion. He competes under the Dutch flag in Formula One with Red Bull Racing. Verstappen is the son of racing drivers Jos Verstappen, who also competed in Formula One, and Sophie Kumpen.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Max Emilian Verstappen's age.\n",
"Action: Search\n",
"Action Input: \"Max Emilian Verstappen age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m25 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now need to calculate 25 raised to the 0.23 power.\n",
"Action: Calculator\n",
"Action Input: 25^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.096651272316035\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Max Emilian Verstappen, who is 25 years old, won the most recent Formula 1 Grand Prix and his age raised to the 0.23 power is 2.096651272316035.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the US Open women's final in 2019 and then calculate her age raised to the 0.34 power.\n",
"Action: Search\n",
"Action Input: \"US Open women's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Andreescu defeated Serena Williams in the final, 6–3, 7–5 to win the women's singles tennis title at the 2019 US Open. It was her first major title, and she became the first Canadian, as well as the first player born in the 2000s, to win a major singles title.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Bianca Andreescu's age.\n",
"Action: Search\n",
"Action Input: \"Bianca Andreescu age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Vanessa Andreescu is a Canadian-Romanian professional tennis player. She has a career-high ranking of No. 4 in the world, and is the highest-ranked Canadian in the history of the Women's Tennis Association.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the age of Bianca Andreescu.\n",
"Action: Calculator\n",
"Action Input: 19^0.34\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.7212987634680084\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Bianca Andreescu, aged 19, won the US Open women's final in 2019. Her age raised to the 0.34 power is 2.7212987634680084.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Beyonce's husband is and then calculate his age raised to the 0.19 power.\n",
"Action: Search\n",
"Action Input: \"Who is Beyonce's husband?\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mJay-Z\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jay-Z's age\n",
"Action: Search\n",
"Action Input: \"How old is Jay-Z?\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m53 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 53 raised to the 0.19 power\n",
"Action: Calculator\n",
"Action Input: 53^0.19\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.12624064206896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jay-Z is Beyonce's husband and his age raised to the 0.19 power is 2.12624064206896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Serial executed in 94.83 seconds.\n"
]
}
],
"source": [
"def generate_serially():\n",
" for q in questions:\n",
" llm = OpenAI(temperature=0)\n",
" tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm)\n",
" agent = initialize_agent(\n",
" tools, llm, agent=\"zero-shot-react-description\", verbose=True\n",
" )\n",
" agent.run(q)\n",
"\n",
"s = time.perf_counter()\n",
"generate_serially()\n",
"elapsed = time.perf_counter() - s\n",
"print(f\"Serial executed in {elapsed:0.2f} seconds.\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "076d7b85-45ec-465d-8b31-c2ad119c3438",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m I need to find out who Beyonce's husband is and then calculate his age raised to the 0.19 power.\n",
"Action: Search\n",
"Action Input: \"Who is Beyonce's husband?\"\u001b[0m\u001b[31;1m\u001b[1;3m I need to find out who won the grand prix and then calculate their age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Formula 1 Grand Prix Winner\"\u001b[0m\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Olivia Wilde boyfriend\"\u001b[0m\u001b[38;5;200m\u001b[1;3m I need to find out who won the US Open women's final in 2019 and then calculate her age raised to the 0.34 power.\n",
"Action: Search\n",
"Action Input: \"US Open women's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mJay-Z\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3mMax Emilian Verstappen is a Belgian-Dutch racing driver and the 2021 and 2022 Formula One World Champion. He competes under the Dutch flag in Formula One with Red Bull Racing. Verstappen is the son of racing drivers Jos Verstappen, who also competed in Formula One, and Sophie Kumpen.\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3mJason Sudeikis\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Andreescu defeated Serena Williams in the final, 6–3, 7–5 to win the women's singles tennis title at the 2019 US Open. It was her first major title, and she became the first Canadian, as well as the first player born in the 2000s, to win a major singles title.\u001b[0m\n",
"Thought:\u001b[31;1m\u001b[1;3m I need to find out Max Emilian Verstappen's age.\n",
"Action: Search\n",
"Action Input: \"Max Emilian Verstappen age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m25 years\u001b[0m\n",
"Thought:\u001b[38;5;200m\u001b[1;3m I need to find out Bianca Andreescu's age.\n",
"Action: Search\n",
"Action Input: \"Bianca Andreescu age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Vanessa Andreescu is a Canadian-Romanian professional tennis player. She has a career-high ranking of No. 4 in the world, and is the highest-ranked Canadian in the history of the Women's Tennis Association.\u001b[0m\n",
"Thought:\u001b[36;1m\u001b[1;3m I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.\n",
"Action: Search\n",
"Action Input: \"US Open men's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mRafael Nadal\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis is an American actor, comedian, writer, and producer. In the 1990s, he began his career in improv comedy and performed with ComedySportz, iO Chicago, and The Second City.\u001b[0m\n",
"Thought:\u001b[33;1m\u001b[1;3m I need to find out Jay-Z's age\n",
"Action: Search\n",
"Action Input: \"How old is Jay-Z?\"\u001b[0m\u001b[36;1m\u001b[1;3m I need to find out Rafael Nadal's age\n",
"Action: Search\n",
"Action Input: \"Rafael Nadal age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m36 years\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3m53 years\u001b[0m\n",
"Thought:\u001b[38;5;200m\u001b[1;3m I now know the age of Bianca Andreescu.\n",
"Action: Calculator\n",
"Action Input: 19^0.34\u001b[0m\u001b[31;1m\u001b[1;3m I now need to calculate 25 raised to the 0.23 power.\n",
"Action: Calculator\n",
"Action Input: 25^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.7212987634680084\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' exact age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age exact\"\u001b[0m\u001b[33;1m\u001b[1;3m I need to calculate 53 raised to the 0.19 power\n",
"Action: Calculator\n",
"Action Input: 53^0.19\u001b[0m\u001b[36;1m\u001b[1;3m I need to calculate 36 raised to the 0.334 power\n",
"Action: Calculator\n",
"Action Input: 36^0.334\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis. (1975-09-18) September 18, 1975 (age 47). Fairfax, Virginia, U.S. · Fort Scott Community College · Actor; comedian; producer; writer · 1997– ...\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.096651272316035\n",
"\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.12624064206896\n",
"\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 3.3098250249682484\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now have the information I need to calculate the age raised to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 47^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.4242784855673896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Bianca Andreescu, aged 19, won the US Open women's final in 2019. Her age raised to the 0.34 power is 2.7212987634680084.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jay-Z is Beyonce's husband and his age raised to the 0.19 power is 2.12624064206896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Max Emilian Verstappen, who is 25 years old, won the most recent Formula 1 Grand Prix and his age raised to the 0.23 power is 2.096651272316035.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Concurrent executed in 25.06 seconds.\n"
]
}
],
"source": [
"async def generate_concurrently():\n",
" agents = []\n",
" # To make async requests in Tools more efficient, you can pass in your own aiohttp.ClientSession, \n",
" # but you must manually close the client session at the end of your program/event loop\n",
" aiosession = ClientSession()\n",
" colors = [\"blue\", \"green\", \"red\", \"pink\", \"yellow\"]\n",
" for color in colors:\n",
" # Use a custom CallbackManager to print in different colors.\n",
" manager = CallbackManager([StdOutCallbackHandler(color=color)])\n",
" llm = OpenAI(temperature=0, callback_manager=manager)\n",
" async_tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, aiosession=aiosession)\n",
" agents.append(\n",
" initialize_agent(async_tools, llm, agent=\"zero-shot-react-description\", verbose=True, callback_manager=manager)\n",
" )\n",
" tasks = [async_agent.arun(q) for async_agent, q in zip(agents, questions)]\n",
" await asyncio.gather(*tasks)\n",
" await aiosession.close()\n",
"\n",
"s = time.perf_counter()\n",
"# If running this outside of Jupyter, use asyncio.run(generate_concurrently())\n",
"await generate_concurrently()\n",
"elapsed = time.perf_counter() - s\n",
"print(f\"Concurrent executed in {elapsed:0.2f} seconds.\")"
]
},
{
"cell_type": "markdown",
"id": "97ef285c-4a43-4a4e-9698-cd52a1bc56c9",
"metadata": {},
"source": [
"## Using Tracing with Asynchronous Agents\n",
"\n",
"To use tracing with async agents, you must pass in a custom `CallbackManager` with `LangChainTracer` to each agent running asynchronously. This way, you avoid collisions while the trace is being collected."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "44bda05a-d33e-4e91-9a71-a0f3f96aae95",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.\n",
"Action: Search\n",
"Action Input: \"US Open men's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mRafael Nadal\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Rafael Nadal's age\n",
"Action: Search\n",
"Action Input: \"Rafael Nadal age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m36 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 36 raised to the 0.334 power\n",
"Action: Calculator\n",
"Action Input: 36^0.334\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 3.3098250249682484\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"# To make async requests in Tools more efficient, you can pass in your own aiohttp.ClientSession, \n",
"# but you must manually close the client session at the end of your program/event loop\n",
"aiosession = ClientSession()\n",
"tracer = LangChainTracer()\n",
"tracer.load_default_session()\n",
"manager = CallbackManager([StdOutCallbackHandler(), tracer])\n",
"\n",
"# Pass the manager into the llm if you want llm calls traced.\n",
"llm = OpenAI(temperature=0, callback_manager=manager)\n",
"\n",
"async_tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, aiosession=aiosession)\n",
"async_agent = initialize_agent(async_tools, llm, agent=\"zero-shot-react-description\", verbose=True, callback_manager=manager)\n",
"await async_agent.arun(questions[0])\n",
"await aiosession.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
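The `run_in_executor` fallback described in the notebook's intro is easy to see in isolation. A minimal standalone sketch (pure asyncio, no LangChain; the tool names are illustrative):

```python
import asyncio
import time

def blocking_tool(query):
    # A Tool that only implements a synchronous `func`.
    time.sleep(1)
    return f"sync result for {query}"

async def async_tool(query):
    # A Tool that implements a `coroutine`.
    await asyncio.sleep(1)
    return f"async result for {query}"

async def main():
    loop = asyncio.get_running_loop()  # modern spelling of asyncio.get_event_loop()
    results = await asyncio.gather(
        async_tool("a"),                                 # awaited directly
        loop.run_in_executor(None, blocking_tool, "b"),  # offloaded to a thread
    )
    print(results)  # both finish in about 1 second total instead of 2

asyncio.run(main())
```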
@@ -17,6 +17,7 @@ The first category of how-to guides here covers specific parts of working with agents.

`Max Iterations <./examples/max_iterations.html>`_: How to restrict an agent to a certain number of iterations.

`Asynchronous <./examples/async_agent.html>`_: Covering asynchronous functionality.

The next set of examples are all end-to-end agents for specific applications.
In all examples there is an Agent with a particular set of tools.

132 docs/modules/chains/async_chain.ipynb (Normal file)
@@ -0,0 +1,132 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "593f7553-7038-498e-96d4-8255e5ce34f0",
"metadata": {},
"source": [
"# Async API for Chain\n",
"\n",
"LangChain provides async support for Chains by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
"Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`). Async support for other chains is on the roadmap."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c19c736e-ca74-4726-bb77-0a849bcc2960",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"BrightSmile Toothpaste Company\n",
"\n",
"\n",
"BrightSmile Toothpaste Co.\n",
"\n",
"\n",
"BrightSmile Toothpaste\n",
"\n",
"\n",
"Gleaming Smile Inc.\n",
"\n",
"\n",
"SparkleSmile Toothpaste\n",
"\u001b[1mConcurrent executed in 1.54 seconds.\u001b[0m\n",
"\n",
"\n",
"BrightSmile Toothpaste Co.\n",
"\n",
"\n",
"MintyFresh Toothpaste Co.\n",
"\n",
"\n",
"SparkleSmile Toothpaste.\n",
"\n",
"\n",
"Pearly Whites Toothpaste Co.\n",
"\n",
"\n",
"BrightSmile Toothpaste.\n",
"\u001b[1mSerial executed in 6.38 seconds.\u001b[0m\n"
]
}
],
"source": [
"import asyncio\n",
"import time\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import LLMChain\n",
"\n",
"\n",
"def generate_serially():\n",
" llm = OpenAI(temperature=0.9)\n",
" prompt = PromptTemplate(\n",
" input_variables=[\"product\"],\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" )\n",
" chain = LLMChain(llm=llm, prompt=prompt)\n",
" for _ in range(5):\n",
" resp = chain.run(product=\"toothpaste\")\n",
" print(resp)\n",
"\n",
"\n",
"async def async_generate(chain):\n",
" resp = await chain.arun(product=\"toothpaste\")\n",
" print(resp)\n",
"\n",
"\n",
"async def generate_concurrently():\n",
" llm = OpenAI(temperature=0.9)\n",
" prompt = PromptTemplate(\n",
" input_variables=[\"product\"],\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" )\n",
" chain = LLMChain(llm=llm, prompt=prompt)\n",
" tasks = [async_generate(chain) for _ in range(5)]\n",
" await asyncio.gather(*tasks)\n",
"\n",
"s = time.perf_counter()\n",
"# If running this outside of Jupyter, use asyncio.run(generate_concurrently())\n",
"await generate_concurrently()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Concurrent executed in {elapsed:0.2f} seconds.\" + '\\033[0m')\n",
"\n",
"s = time.perf_counter()\n",
"generate_serially()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
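The notebook's intro also lists `apredict` and `acall`. A short hedged sketch of those entry points, assuming they mirror the synchronous `predict` and `__call__` and that `"text"` is `LLMChain`'s default output key:

```python
import asyncio

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

async def main():
    llm = OpenAI(temperature=0.9)
    prompt = PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    )
    chain = LLMChain(llm=llm, prompt=prompt)
    # `apredict` mirrors `predict`: keyword args in, string out.
    print(await chain.apredict(product="toothpaste"))
    # `acall` mirrors `__call__`: dict in, dict of outputs out.
    result = await chain.acall({"product": "toothpaste"})
    print(result["text"])  # assumed default output key of LLMChain

asyncio.run(main())
```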
178 docs/modules/chains/combine_docs_examples/analyze_document.ipynb (Normal file)
@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ad719b65",
"metadata": {},
"source": [
"# Analyze Document\n",
"\n",
"The AnalyzeDocumentChain is more of an end-to-end chain: it takes in a single document, splits it up, and then runs it through a CombineDocumentsChain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "15e1a8a2",
"metadata": {},
"outputs": [],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()"
]
},
{
"cell_type": "markdown",
"id": "14da4012",
"metadata": {},
"source": [
"## Summarize\n",
"Let's take a look at it in action below, using it to summarize a long document."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "765d6326",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI\n",
"from langchain.chains.summarize import load_summarize_chain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"summary_chain = load_summarize_chain(llm, chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3a3d3ebc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import AnalyzeDocumentChain"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "97178aad",
"metadata": {},
"outputs": [],
"source": [
"summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=summary_chain)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2e5a7bf7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" In this speech, President Biden addresses the American people and the world, discussing the recent aggression of Russia's Vladimir Putin in Ukraine and the US response. He outlines economic sanctions and other measures taken to hold Putin accountable, and announces the US Department of Justice's task force to go after the crimes of Russian oligarchs. He also announces plans to fight inflation and lower costs for families, invest in American manufacturing, and provide military, economic, and humanitarian assistance to Ukraine. He calls for immigration reform, protecting the rights of women, and advancing the rights of LGBTQ+ Americans, and pays tribute to military families. He concludes with optimism for the future of America.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"summarize_document_chain.run(state_of_the_union)"
]
},
{
"cell_type": "markdown",
"id": "35739404",
"metadata": {},
"source": [
"## Question Answering\n",
"Let's take a look at this using a question answering chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8b9b7705",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.question_answering import load_qa_chain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "60c309a8",
"metadata": {},
"outputs": [],
"source": [
"qa_chain = load_qa_chain(llm, chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ba1fc940",
"metadata": {},
"outputs": [],
"source": [
"qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9aa1fbde",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' The president thanked Justice Breyer for his service.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa_document_chain.run(input_document=state_of_the_union, question=\"what did the president say about justice breyer?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7eb02f1e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
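Conceptually, the chain above is split-then-combine. A rough sketch of the equivalent manual steps, using the same `qa_chain` and the splitter class shown elsewhere in this diff (the `Document` import path is an assumption for this era of the library):

```python
from langchain.docstore.document import Document  # assumed import path
from langchain.text_splitter import CharacterTextSplitter

# Roughly what AnalyzeDocumentChain automates: split the raw text, wrap the
# pieces as Documents, and hand them to the combine-documents chain.
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = [Document(page_content=t) for t in splitter.split_text(state_of_the_union)]
answer = qa_chain.run(input_documents=docs, question="what did the president say about justice breyer?")
```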
165 docs/modules/chains/combine_docs_examples/chat_vector_db.ipynb (Normal file)
@@ -0,0 +1,165 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "134a0785",
"metadata": {},
"source": [
"# Chat Vector DB\n",
"\n",
"This notebook goes over how to set up a chain to chat with a vector database. The only difference between this chain and the [VectorDBQAChain](./vector_db_qa.ipynb) is that this one allows passing in a chat history, which can be used to allow for follow-up questions."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "70c4e529",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import ChatVectorDBChain"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a8930cf7",
"metadata": {},
"outputs": [],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"vectorstore = FAISS.from_texts(texts, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7b4110f3",
"metadata": {},
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore)"
]
},
{
"cell_type": "markdown",
"id": "3872432d",
"metadata": {},
"source": [
"Here's an example of asking a question with no chat history"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7fe3e730",
"metadata": {},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "bfff9cc8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "9e46edf7",
"metadata": {},
"source": [
"Here's an example of asking a question with some chat history"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "00b4cf00",
"metadata": {},
"outputs": [],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she succeeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f01828d1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' Justice Stephen Breyer'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0f869c6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
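A small sketch of how the `chat_history` tuples accumulate across turns, using the same `qa` chain and questions as the notebook above:

```python
# Each turn feeds the prior (question, answer) pairs back in, which is what
# lets the chain resolve follow-ups like "who did she succeed?".
chat_history = []
for query in [
    "What did the president say about Ketanji Brown Jackson",
    "Did he mention who she succeeded",
]:
    result = qa({"question": query, "chat_history": chat_history})
    chat_history.append((query, result["answer"]))
    print(result["answer"])
```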
@@ -21,6 +21,24 @@
"from langchain import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a58e15e",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)"
]
},
{
"cell_type": "markdown",
"id": "095adc76",
"metadata": {},
"source": [
"## Math Prompt"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -28,7 +46,6 @@
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)\n",
"pal_chain = PALChain.from_math_prompt(llm, verbose=True)"
]
},
@@ -64,7 +81,7 @@
" result = total_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished PALChain chain.\u001b[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -82,6 +99,14 @@
"pal_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "0269d20a",
"metadata": {},
"source": [
"## Colored Objects"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -89,7 +114,6 @@
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)\n",
"pal_chain = PALChain.from_colored_object_prompt(llm, verbose=True)"
]
},
@@ -147,10 +171,94 @@
"pal_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "fc3d7f10",
"metadata": {},
"source": [
"## Intermediate Steps\n",
"You can also use the intermediate steps flag to return the executed code that generates the answer."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9d2d9c61",
"metadata": {},
"outputs": [],
"source": [
"pal_chain = PALChain.from_colored_object_prompt(llm, verbose=True, return_intermediate_steps=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b29b971b",
"metadata": {},
"outputs": [],
"source": [
"question = \"On the desk, you see two blue booklets, two purple booklets, and two yellow pairs of sunglasses. If I remove all the pairs of sunglasses from the desk, how many purple items remain on it?\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a2c40c28",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new PALChain chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m# Put objects into a list to record ordering\n",
"objects = []\n",
"objects += [('booklet', 'blue')] * 2\n",
"objects += [('booklet', 'purple')] * 2\n",
"objects += [('sunglasses', 'yellow')] * 2\n",
"\n",
"# Remove all pairs of sunglasses\n",
"objects = [object for object in objects if object[0] != 'sunglasses']\n",
"\n",
"# Count number of purple objects\n",
"num_purple = len([object for object in objects if object[1] == 'purple'])\n",
"answer = num_purple\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"result = pal_chain({\"question\": question})"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "efddd033",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"# Put objects into a list to record ordering\\nobjects = []\\nobjects += [('booklet', 'blue')] * 2\\nobjects += [('booklet', 'purple')] * 2\\nobjects += [('sunglasses', 'yellow')] * 2\\n\\n# Remove all pairs of sunglasses\\nobjects = [object for object in objects if object[0] != 'sunglasses']\\n\\n# Count number of purple objects\\nnum_purple = len([object for object in objects if object[1] == 'purple'])\\nanswer = num_purple\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['intermediate_steps']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ab20fec",
"id": "dfd88594",
"metadata": {},
"outputs": [],
"source": []
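Pulling the pieces of the PAL hunks above together, a hedged sketch of the intermediate-steps flow. The constructor calls and the `question` string come from the hunks; the `PALChain` import path and the `"result"` output key are assumptions about this era of the library:

```python
from langchain import OpenAI
from langchain.chains import PALChain  # assumed import path

llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)

# return_intermediate_steps=True makes the chain return the generated
# program alongside the answer, so call the chain as a dict, not .run().
pal_chain = PALChain.from_colored_object_prompt(llm, verbose=True, return_intermediate_steps=True)
result = pal_chain({"question": question})

print(result["intermediate_steps"])  # the Python program the LLM wrote
print(result["result"])              # assumed key for the executed answer
```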
@@ -56,6 +56,14 @@
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "3d1e692e",
"metadata": {},
"source": [
"**NOTE:** For data-sensitive projects, you can specify `return_direct=True` in the `SQLDatabaseChain` initialization to directly return the output of the SQL query without any additional formatting. This prevents the LLM from seeing any contents within the database. Note, however, that the LLM still has access to the database schema (i.e. dialect, table and key names) by default."
]
},
{
"cell_type": "code",
"execution_count": 3,
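The `return_direct` note in the hunk above is worth making concrete. A hedged sketch, with the constructor keyword taken from the note itself and the rest of the calls from this notebook's diff:

```python
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///../../../../notebooks/Chinook.db")
llm = OpenAI(temperature=0)

# With return_direct=True the chain returns the raw SQL result (e.g. "[(8,)]")
# instead of a natural-language answer, so the LLM never sees row contents,
# only the schema it used to write the query.
db_chain = SQLDatabaseChain(llm=llm, database=db, return_direct=True)
db_chain.run("How many employees are there?")
```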
@@ -85,15 +93,15 @@
|
||||
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
|
||||
"How many employees are there? \n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Employee;\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(9,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m There are 9 employees.\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(8,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m There are 8 employees.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' There are 9 employees.'"
|
||||
"' There are 8 employees.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
@@ -168,15 +176,15 @@
|
||||
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
|
||||
"How many employees are there in the foobar table? \n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Employee;\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(9,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m There are 9 employees in the foobar table.\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(8,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m There are 8 employees in the foobar table.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' There are 9 employees in the foobar table.'"
|
||||
"' There are 8 employees in the foobar table.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
@@ -210,7 +218,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 9,
|
||||
"id": "78b6af4d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -223,18 +231,18 @@
|
||||
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
|
||||
"How many employees are there in the foobar table? \n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Employee;\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(9,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m There are 9 employees in the foobar table.\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(8,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m There are 8 employees in the foobar table.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[' SELECT COUNT(*) FROM Employee;', '[(9,)]']"
|
||||
"[' SELECT COUNT(*) FROM Employee;', '[(8,)]']"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -255,7 +263,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 10,
|
||||
"id": "6adaa799",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -265,7 +273,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 11,
|
||||
"id": "edfc8a8e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -277,8 +285,8 @@
|
||||
"\n",
|
||||
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
|
||||
"What are some example tracks by composer Johann Sebastian Bach? \n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Name FROM Track WHERE Composer = 'Johann Sebastian Bach' LIMIT 3;\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace',), ('Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria',), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude',)]\u001b[0m\n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Name, Composer FROM Track WHERE Composer = 'Johann Sebastian Bach' LIMIT 3;\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 'Johann Sebastian Bach')]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3m Examples of tracks by Johann Sebastian Bach include 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', and 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude'.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -289,7 +297,7 @@
"' Examples of tracks by Johann Sebastian Bach include \\'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\\', \\'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria\\', and \\'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\\'.'"
]
},
"execution_count": 8,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -303,13 +311,13 @@
"id": "bcc5e936",
"metadata": {},
"source": [
"## Adding first row of each table\n",
"Sometimes, the format of the data is not obvious and it is optimal to include the first row of the table in the prompt to allow the LLM to understand the data before providing a final query. Here we will use this feature to let the LLM know that artists are saved with their full names."
"## Adding example rows from each table\n",
"Sometimes, the format of the data is not obvious and it is optimal to include a sample of rows from the tables in the prompt to allow the LLM to understand the data before providing a final query. Here we will use this feature to let the LLM know that artists are saved with their full names by providing two rows from the `Track` table."
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"id": "9a22ee47",
"metadata": {},
"outputs": [],
@@ -317,12 +325,40 @@
"db = SQLDatabase.from_uri(\n",
" \"sqlite:///../../../../notebooks/Chinook.db\", \n",
" include_tables=['Track'], # we include only one table to save tokens in the prompt :)\n",
" sample_row_in_table_info=True)"
" sample_rows_in_table_info=2)"
]
},
{
"cell_type": "markdown",
"id": "952c0b4d",
"metadata": {},
"source": [
"The sample rows are added to the prompt after each corresponding table's column information:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 13,
"id": "9de86267",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table 'Track' has columns: TrackId (INTEGER), Name (NVARCHAR(200)), AlbumId (INTEGER), MediaTypeId (INTEGER), GenreId (INTEGER), Composer (NVARCHAR(220)), Milliseconds (INTEGER), Bytes (INTEGER), UnitPrice (NUMERIC(10, 2)). Here is an example of 2 rows from this table (long strings are truncated):\n",
"1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99\n",
"2 Balls to the Wall 2 2 1 None 342562 5510424 0.99\n"
]
}
],
"source": [
"print(db.table_info)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "bcb7a489",
"metadata": {},
"outputs": [],
@@ -332,7 +368,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 15,
"id": "81e05d82",
"metadata": {},
"outputs": [
@@ -344,20 +380,19 @@
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"What are some example tracks by Bach? \n",
"SQLQuery:Table 'Track' has columns: TrackId (INTEGER), Name (NVARCHAR(200)), AlbumId (INTEGER), MediaTypeId (INTEGER), GenreId (INTEGER), Composer (NVARCHAR(220)), Milliseconds (INTEGER), Bytes (INTEGER), UnitPrice (NUMERIC(10, 2)). Here is an example row for this table (long strings are truncated): ['1', 'For Those About To Rock (We Salute You)', '1', '1', '1', 'Angus Young, Malcolm Young, Brian Johnson', '343719', '11170334', '0.99'].\n",
"\u001b[32;1m\u001b[1;3m SELECT TrackId, Name, Composer FROM Track WHERE Composer LIKE '%Bach%' ORDER BY Name LIMIT 5;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(1709, 'American Woman', 'B. Cummings/G. Peterson/M.J. Kale/R. Bachman'), (3408, 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Johann Sebastian Bach'), (3433, 'Concerto No.2 in F Major, BWV1047, I. Allegro', 'Johann Sebastian Bach'), (3407, 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), (3490, 'Partita in E Major, BWV 1006A: I. Prelude', 'Johann Sebastian Bach')]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Some example tracks by Bach are 'American Woman', 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Concerto No.2 in F Major, BWV1047, I. Allegro', 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', and 'Partita in E Major, BWV 1006A: I. Prelude'.\u001b[0m\n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Name, Composer FROM Track WHERE Composer LIKE '%Bach%' LIMIT 5;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[('American Woman', 'B. Cummings/G. Peterson/M.J. Kale/R. Bachman'), ('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 'Johann Sebastian Bach'), ('Toccata and Fugue in D Minor, BWV 565: I. Toccata', 'Johann Sebastian Bach')]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Some example tracks by Bach are 'American Woman', 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', and 'Toccata and Fugue in D Minor, BWV 565: I. Toccata'.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Some example tracks by Bach are \\'American Woman\\', \\'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria\\', \\'Concerto No.2 in F Major, BWV1047, I. Allegro\\', \\'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\\', and \\'Partita in E Major, BWV 1006A: I. Prelude\\'.'"
"' Some example tracks by Bach are \\'American Woman\\', \\'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\\', \\'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria\\', \\'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\\', and \\'Toccata and Fugue in D Minor, BWV 565: I. Toccata\\'.'"
]
},
"execution_count": 13,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -446,6 +481,10 @@
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
@@ -461,7 +500,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.2"
}
},
"nbformat": 4,
@@ -121,10 +121,51 @@
"llm_chain.predict(adjective=\"sad\", subject=\"ducks\")"
]
},
{
"cell_type": "markdown",
"id": "672f59d4",
"metadata": {},
"source": [
"## From string\n",
"You can also construct an LLMChain from a string template directly."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f8bc262e",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Write a {adjective} poem about {subject}.\"\"\"\n",
"llm_chain = LLMChain.from_string(llm=OpenAI(temperature=0), template=template)\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cb164a76",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"\\n\\nThe ducks swim in the pond,\\nTheir feathers so soft and warm,\\nBut they can't help but feel so forlorn.\\n\\nTheir quacks echo in the air,\\nBut no one is there to hear,\\nFor they have no one to share.\\n\\nThe ducks paddle around in circles,\\nTheir heads hung low in despair,\\nFor they have no one to care.\\n\\nThe ducks look up to the sky,\\nBut no one is there to see,\\nFor they have no one to be.\\n\\nThe ducks drift away in the night,\\nTheir hearts filled with sorrow and pain,\\nFor they have no one to gain.\""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_chain.predict(adjective=\"sad\", subject=\"ducks\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8310cdaa",
"id": "9f0adbc7",
"metadata": {},
"outputs": [],
"source": []
@@ -9,6 +9,7 @@ They are broken up into three categories:
1. `Generic Chains <./generic_how_to.html>`_: Generic chains that are meant to help build other chains rather than serve a particular purpose.
2. `CombineDocuments Chains <./combine_docs_how_to.html>`_: Chains aimed at making it easy to work with documents (question answering, summarization, etc).
3. `Utility Chains <./utility_how_to.html>`_: Chains consisting of an LLMChain interacting with a specific util.
4. `Asynchronous <./async_chain.html>`_: Covering asynchronous functionality.

.. toctree::
   :maxdepth: 1
29 docs/modules/document_loaders.rst Normal file
@@ -0,0 +1,29 @@
Document Loaders
==========================

Combining language models with your own text data is a powerful way to differentiate them.
The first step in doing this is to load the data into "documents" - a fancy way of saying some pieces of text.
This module is aimed at making this easy.

A primary driver of a lot of this is the `Unstructured <https://github.com/Unstructured-IO/unstructured>`_ python package.
This package is a great way to transform all types of files - text, powerpoint, images, html, pdf, etc - into text data.

For detailed instructions on how to get set up with Unstructured, see installation guidelines `here <https://github.com/Unstructured-IO/unstructured#coffee-getting-started>`_.

The following sections of documentation are provided:

- `Key Concepts <./document_loaders/key_concepts.html>`_: A conceptual guide going over the various concepts related to loading documents.

- `How-To Guides <./document_loaders/how_to_guides.html>`_: A collection of how-to guides. These highlight different types of loaders.


.. toctree::
   :maxdepth: 1
   :caption: Document Loaders
   :name: Document Loaders
   :hidden:

   ./document_loaders/key_concepts.md
   ./document_loaders/how_to_guides.rst
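The loading flow this page describes reduces to two calls. Here is a minimal sketch, assuming only the `UnstructuredFileLoader` shown in the example notebooks below; the file path is hypothetical:

```python
# Minimal sketch of the document-loading flow, assuming langchain and the
# `unstructured` package are installed; "my_report.pdf" is a hypothetical file.
from langchain.document_loaders import UnstructuredFileLoader

loader = UnstructuredFileLoader("my_report.pdf")
docs = loader.load()  # a list of Document objects with page_content and metadata

print(len(docs))
print(docs[0].page_content[:100])
```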
93 docs/modules/document_loaders/examples/azlyrics.ipynb Normal file
@@ -0,0 +1,93 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9c31caff",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# AZLyrics\n",
|
||||
"This covers how to load AZLyrics webpages into a document format that we can use downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "7e6f5726",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import AZLyricsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a0df4c24",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = AZLyricsLoader(\"https://www.azlyrics.com/lyrics/mileycyrus/flowers.html\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8cd61b6e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "162fd286",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content=\"Miley Cyrus - Flowers Lyrics | AZLyrics.com\\n\\r\\nWe were good, we were gold\\nKinda dream that can't be sold\\nWe were right till we weren't\\nBuilt a home and watched it burn\\n\\nI didn't wanna leave you\\nI didn't wanna lie\\nStarted to cry but then remembered I\\n\\nI can buy myself flowers\\nWrite my name in the sand\\nTalk to myself for hours\\nSay things you don't understand\\nI can take myself dancing\\nAnd I can hold my own hand\\nYeah, I can love me better than you can\\n\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby\\n\\nPaint my nails, cherry red\\nMatch the roses that you left\\nNo remorse, no regret\\nI forgive every word you said\\n\\nI didn't wanna leave you, baby\\nI didn't wanna fight\\nStarted to cry but then remembered I\\n\\nI can buy myself flowers\\nWrite my name in the sand\\nTalk to myself for hours, yeah\\nSay things you don't understand\\nI can take myself dancing\\nAnd I can hold my own hand\\nYeah, I can love me better than you can\\n\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI\\n\\nI didn't wanna wanna leave you\\nI didn't wanna fight\\nStarted to cry but then remembered I\\n\\nI can buy myself flowers\\nWrite my name in the sand\\nTalk to myself for hours (Yeah)\\nSay things you don't understand\\nI can take myself dancing\\nAnd I can hold my own hand\\nYeah, I can love me better than\\nYeah, I can love me better than you can, uh\\n\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby (Than you can)\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI\\n\", lookup_str='', metadata={'source': 'https://www.azlyrics.com/lyrics/mileycyrus/flowers.html'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6358000c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
101 docs/modules/document_loaders/examples/directory_loader.ipynb Normal file
@@ -0,0 +1,101 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "79f24a6b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Directory Loader\n",
|
||||
"This covers how to use the DirectoryLoader to load all documents in a directory. Under the hood, this uses the [UnstructuredLoader](./unstructured_file.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "019d8520",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import DirectoryLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0c76cdc5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can use the `glob` parameter to control which files to load. Note that here it doesn't load the `.rst` file or the `.ipynb` files."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "891fe56f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = DirectoryLoader('../', glob=\"**/*.md\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "addfe9cf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "b042086d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "cbc8256b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
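A short follow-up to the `glob` note in the notebook above: the same loader restricted to a different extension. A hedged sketch; "./data" is a hypothetical directory.

```python
# A minimal sketch, assuming the DirectoryLoader shown above; "./data" is hypothetical.
from langchain.document_loaders import DirectoryLoader

# glob controls which files are picked up; everything else is skipped.
loader = DirectoryLoader("./data", glob="**/*.txt")
docs = loader.load()
print(len(docs))
```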
|
||||
94 docs/modules/document_loaders/examples/email.ipynb Normal file
@@ -0,0 +1,94 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9fdbd55d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Email\n",
|
||||
"\n",
|
||||
"This notebook shows how to load email (`.eml`) files."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "40cd9806",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredEmailLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "2d20b852",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredEmailLoader('example_data/fake-email.eml')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "579fa702",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "90c1d899",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='This is a test email to use for unit tests.\\n\\nImportant points:\\n\\nRoses are red\\n\\nViolets are blue', lookup_str='', metadata={'source': 'example_data/fake-email.eml'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "4ef9a5f4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,9 @@
<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>
@@ -0,0 +1,20 @@
MIME-Version: 1.0
Date: Fri, 16 Dec 2022 17:04:16 -0500
Message-ID: <CADc-_xaLB2FeVQ7mNsoX+NJb_7hAJhBKa_zet-rtgPGenj0uVw@mail.gmail.com>
Subject: Test Email
From: Matthew Robinson <mrobinson@unstructured.io>
To: Matthew Robinson <mrobinson@unstructured.io>
Content-Type: multipart/alternative; boundary="00000000000095c9b205eff92630"

--00000000000095c9b205eff92630
Content-Type: text/plain; charset="UTF-8"

This is a test email to use for unit tests.
Important points:
- Roses are red
- Violets are blue
--00000000000095c9b205eff92630
Content-Type: text/html; charset="UTF-8"

<div dir="ltr"><div>This is a test email to use for unit tests.</div><div><br></div><div>Important points:</div><div><ul><li>Roses are red</li><li>Violets are blue</li></ul></div></div>

--00000000000095c9b205eff92630--
Binary file not shown.
BIN docs/modules/document_loaders/examples/example_data/fake.docx Normal file
Binary file not shown.
Binary file not shown.
156 docs/modules/document_loaders/examples/gcs_directory.ipynb Normal file
@@ -0,0 +1,156 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0ef41fd4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GCS Directory\n",
|
||||
"\n",
|
||||
"This covers how to load document objects from an Google Cloud Storage (GCS) directory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5cfb25c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import GCSDirectoryLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "93a4d0f1",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# !pip install google-cloud-storage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "633dc839",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GCSDirectoryLoader(project_name=\"aist\", bucket=\"testing-hwc\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a863467d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
|
||||
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n",
|
||||
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
|
||||
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpz37njh7u/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17c0dcbb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying a prefix\n",
|
||||
"You can also specify a prefix for more finegrained control over what files to load."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "b3143c89",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GCSDirectoryLoader(project_name=\"aist\", bucket=\"testing-hwc\", prefix=\"fake\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "226ac6f5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
|
||||
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n",
|
||||
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
|
||||
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpylg6291i/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f9c0734f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
104 docs/modules/document_loaders/examples/gcs_file.ipynb Normal file
@@ -0,0 +1,104 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0ef41fd4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GCS File Storage\n",
|
||||
"\n",
|
||||
"This covers how to load document objects from an Google Cloud Storage (GCS) file object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5cfb25c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import GCSFileLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "93a4d0f1",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# !pip install google-cloud-storage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "633dc839",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GCSFileLoader(project_name=\"aist\", bucket=\"testing-hwc\", blob=\"fake.docx\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a863467d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
|
||||
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmp3srlf8n8/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "eba3002d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
84 docs/modules/document_loaders/examples/googledrive.ipynb Normal file
@@ -0,0 +1,84 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b0ed136e-6983-4893-ae1b-b75753af05f8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Drive\n",
|
||||
"This notebook covers how to load documents from Google Drive. Currently, only Google Docs are supported.\n",
|
||||
"\n",
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"1. Create a Google Cloud project or use an existing project\n",
|
||||
"1. Enable the [Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com)\n",
|
||||
"1. [Authorize credentials for desktop app](https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application)\n",
|
||||
"1. `pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib`\n",
|
||||
"\n",
|
||||
"## 🧑 Instructions for ingesting your Google Docs data\n",
|
||||
"By default, the `GoogleDriveLoader` expects the `credentials.json` file to be `~/.credentials/credentials.json`, but this is configurable using the `credentials_file` keyword argument. Same thing with `token.json`. Note that `token.json` will be created automatically the first time you use the loader.\n",
|
||||
"\n",
|
||||
"`GoogleDriveLoader` can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL:\n",
|
||||
"* Folder: https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5 -> folder id is `\"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5\"`\n",
|
||||
"* Document: https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit -> document id is `\"1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw\"`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "878928a6-a5ae-4f74-b351-64e3b01733fe",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import GoogleDriveLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "2216c83f-68e4-4d2f-8ea2-5878fb18bbe7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GoogleDriveLoader(folder_id=\"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "8f3b6aa0-b45d-4e37-8c50-5bebe70fdb9d",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
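As referenced in the prerequisites above, a hedged sketch of loading specific Google Docs by id rather than by folder. The `document_ids` keyword is an assumption based on the notebook's prose (only `credentials_file` and `folder_id` appear verbatim); the id is the illustrative one from the markdown cell.

```python
# A minimal sketch; `document_ids` is an assumed keyword argument based on the
# notebook's prose, and the id below is the illustrative one from the markdown cell.
from langchain.document_loaders import GoogleDriveLoader

loader = GoogleDriveLoader(
    document_ids=["1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw"],
    credentials_file="~/.credentials/credentials.json",  # the documented default location
)
docs = loader.load()
```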
|
||||
83 docs/modules/document_loaders/examples/gutenberg.ipynb Normal file
@@ -0,0 +1,83 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bda1f3f5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gutenberg\n",
|
||||
"\n",
|
||||
"This covers how to load links to Gutenberg e-books into a document format that we can use downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "9bfd5e46",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import GutenbergLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "700e4ef2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = GutenbergLoader('https://www.gutenberg.org/cache/epub/69972/pg69972.txt')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "b6f28930",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7d436441",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3b74d755",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
94 docs/modules/document_loaders/examples/html.ipynb Normal file
@@ -0,0 +1,94 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2dfc4698",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# HTML\n",
|
||||
"\n",
|
||||
"This covers how to load HTML documents into a document format that we can use downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "24b434b5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredHTMLLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "00f46fda",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredHTMLLoader(\"example_data/fake-content.html\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "b68a26b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "34de48fa",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='My First Heading\\n\\nMy first paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "79b1bce4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
94 docs/modules/document_loaders/examples/imsdb.ipynb Normal file
File diff suppressed because one or more lines are too long
94 docs/modules/document_loaders/examples/microsoft_word.ipynb Normal file
@@ -0,0 +1,94 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "34c90eed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Microsoft Word\n",
|
||||
"\n",
|
||||
"This notebook shows how to load text from Microsoft word documents."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "28ded768",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredDocxLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "f1f26035",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredDocxLoader('example_data/fake.docx')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "2c87dde9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "0e4a884c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'example_data/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "61953c83",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
82 docs/modules/document_loaders/examples/notion.ipynb Normal file
@@ -0,0 +1,82 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1dc7df1d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Notion\n",
|
||||
"This notebook covers how to load documents from a Notion database dump.\n",
|
||||
"\n",
|
||||
"In order to get this notion dump, follow these instructions:\n",
|
||||
"\n",
|
||||
"## 🧑 Instructions for ingesting your own dataset\n",
|
||||
"\n",
|
||||
"Export your dataset from Notion. You can do this by clicking on the three dots in the upper right hand corner and then clicking `Export`.\n",
|
||||
"\n",
|
||||
"When exporting, make sure to select the `Markdown & CSV` format option.\n",
|
||||
"\n",
|
||||
"This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository.\n",
|
||||
"\n",
|
||||
"Run the following command to unzip the zip file (replace the `Export...` with your own file name as needed).\n",
|
||||
"\n",
|
||||
"```shell\n",
|
||||
"unzip Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -d Notion_DB\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Run the following command to ingest the data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "007c5cbf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import NotionDirectoryLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a1caec59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = NotionDirectoryLoader(\"Notion_DB\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b1c30ff7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
66 docs/modules/document_loaders/examples/obsidian.ipynb Normal file
@@ -0,0 +1,66 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1dc7df1d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Obsidian\n",
|
||||
"This notebook covers how to load documents from an Obsidian database.\n",
|
||||
"\n",
|
||||
"Since Obsidian is just stored on disk as a folder of Markdown files, the loader just takes a path to this directory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "007c5cbf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import ObsidianLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a1caec59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = ObsidianLoader(\"<path-to-obsidian>\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b1c30ff7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
73 docs/modules/document_loaders/examples/pdf.ipynb Normal file
@@ -0,0 +1,73 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f70e6118",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PDF\n",
|
||||
"\n",
|
||||
"This covers how to load pdfs into a document format that we can use downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "0cc0cd42",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredPDFLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "082d557c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredPDFLoader(\"example_data/layout-parser-paper.pdf\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5c41106f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "54fb6b62",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
94 docs/modules/document_loaders/examples/powerpoint.ipynb Normal file
@@ -0,0 +1,94 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "39af9ecd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PowerPoint\n",
|
||||
"\n",
|
||||
"This covers how to load PowerPoint documents into a document format that we can use downstream."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "721c48aa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredPowerPointLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "9d3d0e35",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = UnstructuredPowerPointLoader(\"example_data/fake-power-point.pptx\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "06073f91",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "c9adc5cb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Adding a Bullet Slide\\n\\nFind the bullet slide layout\\n\\nUse _TextFrame.text for first bullet\\n\\nUse _TextFrame.add_paragraph() for subsequent bullets\\n\\nHere is a lot of text!\\n\\nHere is some text in a text box!', lookup_str='', metadata={'source': 'example_data/fake-power-point.pptx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0c55f1cf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,78 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17812129",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ReadTheDocs Documentation\n",
|
||||
"This notebook covers how to load content from html that was generated as part of a Read-The-Docs build.\n",
|
||||
"\n",
|
||||
"For an example of this in the wild, see [here](https://github.com/hwchase17/chat-langchain).\n",
|
||||
"\n",
|
||||
"This assumes that the html has already been scraped into a folder. This can be done by uncommenting and running the following command"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "84696e27",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "92dd950b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import ReadTheDocsLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "494567c3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = ReadTheDocsLoader(\"rtdocs\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e2e6d6f0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
78 docs/modules/document_loaders/examples/roam.ipynb Normal file
@@ -0,0 +1,78 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1dc7df1d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Roam\n",
|
||||
"This notebook covers how to load documents from a Roam database. This takes a lot of inspiration from the example repo [here](https://github.com/JimmyLv/roam-qa).\n",
|
||||
"\n",
|
||||
"## 🧑 Instructions for ingesting your own dataset\n",
|
||||
"\n",
|
||||
"Export your dataset from Roam Research. You can do this by clicking on the three dots in the upper right hand corner and then clicking `Export`.\n",
|
||||
"\n",
|
||||
"When exporting, make sure to select the `Markdown & CSV` format option.\n",
|
||||
"\n",
|
||||
"This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository.\n",
|
||||
"\n",
|
||||
"Run the following command to unzip the zip file (replace the `Export...` with your own file name as needed).\n",
|
||||
"\n",
|
||||
"```shell\n",
|
||||
"unzip Roam-Export-1675782732639.zip -d Roam_DB\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "007c5cbf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import RoamLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a1caec59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = ObsidianLoader(\"Roam_DB\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b1c30ff7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docs = loader.load()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
134 docs/modules/document_loaders/examples/s3_directory.ipynb Normal file
@@ -0,0 +1,134 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a634365e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# s3 Directory\n",
|
||||
"\n",
|
||||
"This covers how to load document objects from an s3 directory object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "2f0cd6a5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import S3DirectoryLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "49815096",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install boto3"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "321cc7f1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = S3DirectoryLoader(\"testing-hwc\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "2b11d155",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpaa9xl6ch/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0690c40a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Specifying a prefix\n",
|
||||
"You can also specify a prefix for more finegrained control over what files to load."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "72d44781",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = S3DirectoryLoader(\"testing-hwc\", prefix=\"fake\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "2d3c32db",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpujbkzf_l/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "885dc280",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
94 docs/modules/document_loaders/examples/s3_file.ipynb Normal file
@@ -0,0 +1,94 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "66a7777e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# s3 File\n",
|
||||
"\n",
|
||||
"This covers how to load document objects from an s3 file object."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "9ec8a3b3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import S3FileLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "43128d8d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install boto3"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "35d6809a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = S3FileLoader(\"testing-hwc\", \"fake.docx\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "efd6be84",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpxvave6wl/fake.docx'}, lookup_index=0)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "93689594",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -0,0 +1,72 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "20deed05",
"metadata": {},
"source": [
"# Unstructured File Loader\n",
|
||||
"This notebook covers how to use Unstructured to load files of many types. Unstructured currently supports loading of text files, powerpoints, html, pdfs, images, and more."
|
||||
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "79d3e549",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredFileLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2593d1dc",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredFileLoader(\"../../state_of_the_union.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fe34e941",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24e577e5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
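The notebook above only exercises a plain-text file, while the intro claims support for PDFs, HTML, and images. A minimal sketch of pointing the same loader at one of those other formats; `example.pdf` is a hypothetical local file, and the exact Unstructured extras needed for PDF parsing are an assumption here:

```python
from langchain.document_loaders import UnstructuredFileLoader

# Hypothetical input file; any format Unstructured can parse should load the same way.
loader = UnstructuredFileLoader("example.pdf")
docs = loader.load()
print(docs[0].page_content[:200])  # first 200 characters of the extracted text
```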
117
docs/modules/document_loaders/examples/web_base.ipynb
Normal file
File diff suppressed because one or more lines are too long
137
docs/modules/document_loaders/examples/youtube.ipynb
Normal file
@@ -0,0 +1,137 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "df770c72",
"metadata": {},
"source": [
"# YouTube\n",
"\n",
"How to load documents from YouTube transcripts."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da4a867f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import YoutubeLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "34a25b57",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# !pip install youtube-transcript-api"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bc8b308a",
"metadata": {},
"outputs": [],
"source": [
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d073dd36",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LADIES AND GENTLEMEN, PEDRO PASCAL! [ CHEERS AND APPLAUSE ] >> THANK YOU, THANK YOU. THANK YOU VERY MUCH. I\\'M SO EXCITED TO BE HERE. THANK YOU. I SPENT THE LAST YEAR SHOOTING A SHOW CALLED \"THE LAST OF US\" ON HBO. FOR SOME HBO SHOES, YOU GET TO SHOOT IN A FIVE STAR ITALIAN RESORT SURROUNDED BY BEAUTIFUL PEOPLE, BUT I SAID, NO, THAT\\'S TOO EASY. I WANT TO SHOOT IN A FREEZING CANADIAN FOREST WHILE BEING CHASED AROUND BY A GUY WHOSE HEAD LOOKS LIKE A GENITAL WART. IT IS AN HONOR BEING A PART OF THESE HUGE FRANCHISEs LIKE \"GAME OF THRONES\" AND \"STAR WARS,\" BUT I\\'M STILL GETTING USED TO PEOPLE RECOGNIZING ME. THE OTHER DAY, A GUY STOPPED ME ON THE STREET AND SAYS, MY SON LOVES \"THE MANDALORIAN\" AND THE NEXT THING I KNOW, I\\'M FACE TIMING WITH A 6-YEAR-OLD WHO HAS NO IDEA WHO I AM BECAUSE MY CHARACTER WEARS A MASK THE ENTIRE SHOW. THE GUY IS LIKE, DO THE MANDO VOICE, BUT IT\\'S LIKE A BEDROOM VOICE. WITHOUT THE MASK, IT JUST SOUNDS PORNY. PEOPLE WALKING BY ON THE STREET SEE ME WHISPERING TO A 6-YEAR-OLD KID. I CAN BRING YOU IN WARM, OR I CAN BRING YOU IN COLD. EVEN THOUGH I CAME TO THE U.S. WHEN I WAS LITTLE, I WAS BORN IN CHILE, AND I HAVE 34 FIRST COUSINS WHO ARE STILL THERE. THEY\\'RE VERY PROUD OF ME. I KNOW THEY\\'RE PROUD BECAUSE THEY GIVE MY PHONE NUMBER TO EVERY PERSON THEY MEET, WHICH MEANS EVERY DAY, SOMEONE IN SANTIAGO WILL TEXT ME STUFF LIKE, CAN YOU COME TO MY WEDDING, OR CAN YOU SING MY PRIEST HAPPY BIRTHDAY, OR IS BABY YODA MEAN IN REAL LIFE. SO I HAVE TO BE LIKE NO, NO, AND HIS NAME IS GROGU. BUT MY COUSINS WEREN\\'T ALWAYS SO PROUD. EARLY IN MY CAREER, I PLAYED SMALL PARTS IN EVERY CRIME SHOW. I EVEN PLAYED TWO DIFFERENT CHARACTERS ON \"LAW AND ORDER.\" TITO CABASSA WHO LOOKED LIKE THIS. AND ONE YEAR LATER, I PLAYED REGGIE LUCKMAN WHO LOOKS LIKE THIS. AND THAT, MY FRIENDS, IS CALLED RANGE. BUT IT IS AMAZING TO BE HERE, LIKE I SAID. I WAS BORN IN CHILE, AND NINE MONTHS LATER, MY PARENTS FLED AND BROUGHT ME AND MY SISTER TO THE U.S. THEY WERE SO BRAVE, AND WITHOUT THEM, I WOULDN\\'T BE HERE IN THIS WONDERFUL COUNTRY, AND I CERTAINLY WOULDN\\'T BE STANDING HERE WITH YOU ALL TONIGHT. SO TO ALL MY FAMILY WATCHING IN CHILE, I WANT TO SAY [ SPEAKING NON-ENGLISH ] WHICH MEANS, I LOVE YOU, I MISS YOU, AND STOP GIVING OUT MY PHONE NUMBER. WE\\'VE GOT AN AMAZING SHOW FOR YOU TONIGHT. COLDPLAY IS HERE, SO STICK', lookup_str='', metadata={'source': 'QsYGlZkevEg', 'title': 'Pedro Pascal Monologue - SNL', 'description': 'First-time host Pedro Pascal talks about filming The Last of Us and being recognized by fans.\\n\\nSaturday Night Live. Stream now on Peacock: https://pck.tv/3uQxh4q\\n\\nSubscribe to SNL: https://goo.gl/tUsXwM\\nStream Current Full Episodes: http://www.nbc.com/saturday-night-live\\n\\nWATCH PAST SNL SEASONS\\nGoogle Play - http://bit.ly/SNLGooglePlay\\niTunes - http://bit.ly/SNLiTunes\\n\\nSNL ON SOCIAL\\nSNL Instagram: http://instagram.com/nbcsnl\\nSNL Facebook: https://www.facebook.com/snl\\nSNL Twitter: https://twitter.com/nbcsnl\\nSNL TikTok: https://www.tiktok.com/@nbcsnl\\n\\nGET MORE NBC\\nLike NBC: http://Facebook.com/NBC\\nFollow NBC: http://Twitter.com/NBC\\nNBC Tumblr: http://NBCtv.tumblr.com/\\nYouTube: http://www.youtube.com/nbc\\nNBC Instagram: http://instagram.com/nbc\\n\\n#SNL #PedroPascal #SNL48 #Coldplay', 'view_count': 1175057, 'thumbnail_url': 'https://i.ytimg.com/vi/QsYGlZkevEg/sddefault.jpg', 'publish_date': datetime.datetime(2023, 2, 4, 0, 0), 'length': 224, 'author': 'Saturday Night Live'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "6b278a1b",
"metadata": {},
"source": [
"## Add video info"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ba28af69",
"metadata": {},
"outputs": [],
"source": [
"# ! pip install pytube"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "9b8ea390",
"metadata": {},
"outputs": [],
"source": [
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "97b98e92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LADIES AND GENTLEMEN, PEDRO PASCAL! [ CHEERS AND APPLAUSE ] >> THANK YOU, THANK YOU. THANK YOU VERY MUCH. I\\'M SO EXCITED TO BE HERE. THANK YOU. I SPENT THE LAST YEAR SHOOTING A SHOW CALLED \"THE LAST OF US\" ON HBO. FOR SOME HBO SHOES, YOU GET TO SHOOT IN A FIVE STAR ITALIAN RESORT SURROUNDED BY BEAUTIFUL PEOPLE, BUT I SAID, NO, THAT\\'S TOO EASY. I WANT TO SHOOT IN A FREEZING CANADIAN FOREST WHILE BEING CHASED AROUND BY A GUY WHOSE HEAD LOOKS LIKE A GENITAL WART. IT IS AN HONOR BEING A PART OF THESE HUGE FRANCHISEs LIKE \"GAME OF THRONES\" AND \"STAR WARS,\" BUT I\\'M STILL GETTING USED TO PEOPLE RECOGNIZING ME. THE OTHER DAY, A GUY STOPPED ME ON THE STREET AND SAYS, MY SON LOVES \"THE MANDALORIAN\" AND THE NEXT THING I KNOW, I\\'M FACE TIMING WITH A 6-YEAR-OLD WHO HAS NO IDEA WHO I AM BECAUSE MY CHARACTER WEARS A MASK THE ENTIRE SHOW. THE GUY IS LIKE, DO THE MANDO VOICE, BUT IT\\'S LIKE A BEDROOM VOICE. WITHOUT THE MASK, IT JUST SOUNDS PORNY. PEOPLE WALKING BY ON THE STREET SEE ME WHISPERING TO A 6-YEAR-OLD KID. I CAN BRING YOU IN WARM, OR I CAN BRING YOU IN COLD. EVEN THOUGH I CAME TO THE U.S. WHEN I WAS LITTLE, I WAS BORN IN CHILE, AND I HAVE 34 FIRST COUSINS WHO ARE STILL THERE. THEY\\'RE VERY PROUD OF ME. I KNOW THEY\\'RE PROUD BECAUSE THEY GIVE MY PHONE NUMBER TO EVERY PERSON THEY MEET, WHICH MEANS EVERY DAY, SOMEONE IN SANTIAGO WILL TEXT ME STUFF LIKE, CAN YOU COME TO MY WEDDING, OR CAN YOU SING MY PRIEST HAPPY BIRTHDAY, OR IS BABY YODA MEAN IN REAL LIFE. SO I HAVE TO BE LIKE NO, NO, AND HIS NAME IS GROGU. BUT MY COUSINS WEREN\\'T ALWAYS SO PROUD. EARLY IN MY CAREER, I PLAYED SMALL PARTS IN EVERY CRIME SHOW. I EVEN PLAYED TWO DIFFERENT CHARACTERS ON \"LAW AND ORDER.\" TITO CABASSA WHO LOOKED LIKE THIS. AND ONE YEAR LATER, I PLAYED REGGIE LUCKMAN WHO LOOKS LIKE THIS. AND THAT, MY FRIENDS, IS CALLED RANGE. BUT IT IS AMAZING TO BE HERE, LIKE I SAID. I WAS BORN IN CHILE, AND NINE MONTHS LATER, MY PARENTS FLED AND BROUGHT ME AND MY SISTER TO THE U.S. THEY WERE SO BRAVE, AND WITHOUT THEM, I WOULDN\\'T BE HERE IN THIS WONDERFUL COUNTRY, AND I CERTAINLY WOULDN\\'T BE STANDING HERE WITH YOU ALL TONIGHT. SO TO ALL MY FAMILY WATCHING IN CHILE, I WANT TO SAY [ SPEAKING NON-ENGLISH ] WHICH MEANS, I LOVE YOU, I MISS YOU, AND STOP GIVING OUT MY PHONE NUMBER. WE\\'VE GOT AN AMAZING SHOW FOR YOU TONIGHT. COLDPLAY IS HERE, SO STICK', lookup_str='', metadata={'source': 'QsYGlZkevEg', 'title': 'Pedro Pascal Monologue - SNL', 'description': 'First-time host Pedro Pascal talks about filming The Last of Us and being recognized by fans.\\n\\nSaturday Night Live. Stream now on Peacock: https://pck.tv/3uQxh4q\\n\\nSubscribe to SNL: https://goo.gl/tUsXwM\\nStream Current Full Episodes: http://www.nbc.com/saturday-night-live\\n\\nWATCH PAST SNL SEASONS\\nGoogle Play - http://bit.ly/SNLGooglePlay\\niTunes - http://bit.ly/SNLiTunes\\n\\nSNL ON SOCIAL\\nSNL Instagram: http://instagram.com/nbcsnl\\nSNL Facebook: https://www.facebook.com/snl\\nSNL Twitter: https://twitter.com/nbcsnl\\nSNL TikTok: https://www.tiktok.com/@nbcsnl\\n\\nGET MORE NBC\\nLike NBC: http://Facebook.com/NBC\\nFollow NBC: http://Twitter.com/NBC\\nNBC Tumblr: http://NBCtv.tumblr.com/\\nYouTube: http://www.youtube.com/nbc\\nNBC Instagram: http://instagram.com/nbc\\n\\n#SNL #PedroPascal #SNL48 #Coldplay', 'view_count': 1175057, 'thumbnail_url': 'https://i.ytimg.com/vi/QsYGlZkevEg/sddefault.jpg', 'publish_date': datetime.datetime(2023, 2, 4, 0, 0), 'length': 224, 'author': 'Saturday Night Live'}, lookup_index=0)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
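Transcripts returned this way can be very long, so a common next step is chunking them before handing them to a chain. A minimal sketch combining the loader above with the `TokenTextSplitter` introduced elsewhere in this changeset; the chunk sizes are illustrative:

```python
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import TokenTextSplitter

loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=QsYGlZkevEg")
docs = loader.load()

# Split the transcript into ~500-token chunks so it fits downstream context windows.
splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(docs[0].page_content)
print(len(chunks))
```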
55
docs/modules/document_loaders/how_to_guides.rst
Normal file
@@ -0,0 +1,55 @@
How To Guides
====================================

There are a lot of different document loaders that LangChain supports. Below are how-to guides for working with them.

`File Loader <./examples/unstructured_file.html>`_: A walkthrough of how to use Unstructured to load files of arbitrary types (pdfs, txt, html, etc).

`Directory Loader <./examples/directory_loader.html>`_: A walkthrough of how to use Unstructured to load files from a given directory.

`Notion <./examples/notion.html>`_: A walkthrough of how to load data from an arbitrary Notion DB.

`ReadTheDocs <./examples/readthedocs_documentation.html>`_: A walkthrough of how to load data for documentation generated by ReadTheDocs.

`HTML <./examples/html.html>`_: A walkthrough of how to load data from an HTML file.

`PDF <./examples/pdf.html>`_: A walkthrough of how to load data from a PDF file.

`PowerPoint <./examples/powerpoint.html>`_: A walkthrough of how to load data from a PowerPoint file.

`Email <./examples/email.html>`_: A walkthrough of how to load data from an email (`.eml`) file.

`GoogleDrive <./examples/googledrive.html>`_: A walkthrough of how to load data from Google Drive.

`Microsoft Word <./examples/microsoft_word.html>`_: A walkthrough of how to load data from Microsoft Word files.

`Obsidian <./examples/obsidian.html>`_: A walkthrough of how to load data from an Obsidian file dump.

`Roam <./examples/roam.html>`_: A walkthrough of how to load data from a Roam file export.

`YouTube <./examples/youtube.html>`_: A walkthrough of how to load the transcript from a YouTube video.

`s3 File <./examples/s3_file.html>`_: A walkthrough of how to load a file from s3.

`s3 Directory <./examples/s3_directory.html>`_: A walkthrough of how to load all files in a directory from s3.

`GCS File <./examples/gcs_file.html>`_: A walkthrough of how to load a file from Google Cloud Storage (GCS).

`GCS Directory <./examples/gcs_directory.html>`_: A walkthrough of how to load all files in a directory from Google Cloud Storage (GCS).

`Web Base <./examples/web_base.html>`_: A walkthrough of how to load all text data from webpages.

`IMSDb <./examples/imsdb.html>`_: A walkthrough of how to load all text data from the IMSDb webpage.

`AZLyrics <./examples/azlyrics.html>`_: A walkthrough of how to load all text data from an AZLyrics webpage.

`College Confidential <./examples/college_confidential.html>`_: A walkthrough of how to load all text data from a College Confidential webpage.

`Gutenberg <./examples/gutenberg.html>`_: A walkthrough of how to load data from a Gutenberg ebook text.

.. toctree::
   :maxdepth: 1
   :glob:
   :hidden:

   examples/*
12
docs/modules/document_loaders/key_concepts.md
Normal file
@@ -0,0 +1,12 @@
# Key Concepts

## Document
This class is a container for document information. It contains two parts:
- `page_content`: The content of the actual page itself.
- `metadata`: The metadata associated with the document. This can be things like the file path, the URL, etc.

## Loader
This base class is a way to load documents. It exposes a `load` method that returns `Document` objects.

## [Unstructured](https://github.com/Unstructured-IO/unstructured)
Unstructured is a Python package specifically focused on transformations from raw documents to text.
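A minimal sketch of how these two concepts fit together: a loader only needs to return `Document` objects from `load`. The `ListLoader` name and the inline texts below are illustrative, not part of LangChain.

```python
from typing import List

from langchain.docstore.document import Document


class ListLoader:
    """Toy loader: wraps in-memory strings as Documents (illustrative only)."""

    def __init__(self, texts: List[str], source: str):
        self.texts = texts
        self.source = source

    def load(self) -> List[Document]:
        # Each Document pairs page_content with metadata, as described above.
        return [
            Document(page_content=text, metadata={"source": self.source})
            for text in self.texts
        ]


docs = ListLoader(["hello", "world"], source="in-memory").load()
print(docs[0].page_content, docs[0].metadata)
```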
150
docs/modules/llms/async_llm.ipynb
Normal file
@@ -0,0 +1,150 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f6574496-b360-4ffa-9523-7fd34a590164",
"metadata": {},
"source": [
"# Async API for LLM\n",
"\n",
"LangChain provides async support for LLMs by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
"Async support is particularly useful for calling multiple LLMs concurrently, as these calls are network-bound. Currently, only `OpenAI` is supported, but async support for other LLMs is on the roadmap.\n",
"\n",
"You can use the `agenerate` method to call an OpenAI LLM asynchronously."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5e49e96c-0f88-466d-b3d3-ea0966bdf19e",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"I'm doing well. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"I am doing quite well. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing great, thank you! How about you?\n",
"\n",
"\n",
"I'm doing well, thanks for asking. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\u001b[1mConcurrent executed in 1.93 seconds.\u001b[0m\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing great, thank you. How about you?\n",
"\u001b[1mSerial executed in 10.54 seconds.\u001b[0m\n"
]
}
],
"source": [
"import time\n",
"import asyncio\n",
"\n",
"from langchain.llms import OpenAI\n",
"\n",
"def generate_serially():\n",
"    llm = OpenAI(temperature=0.9)\n",
"    for _ in range(10):\n",
"        resp = llm.generate([\"Hello, how are you?\"])\n",
"        print(resp.generations[0][0].text)\n",
"\n",
"\n",
"async def async_generate(llm):\n",
"    resp = await llm.agenerate([\"Hello, how are you?\"])\n",
"    print(resp.generations[0][0].text)\n",
"\n",
"\n",
"async def generate_concurrently():\n",
"    llm = OpenAI(temperature=0.9)\n",
"    tasks = [async_generate(llm) for _ in range(10)]\n",
"    await asyncio.gather(*tasks)\n",
"\n",
"\n",
"s = time.perf_counter()\n",
"# If running this outside of Jupyter, use asyncio.run(generate_concurrently())\n",
"await generate_concurrently()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Concurrent executed in {elapsed:0.2f} seconds.\" + '\\033[0m')\n",
"\n",
"s = time.perf_counter()\n",
"generate_serially()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
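As the comment in the cell notes, top-level `await` only works inside Jupyter. A minimal sketch of the same concurrent pattern as a standalone script, using `asyncio.run`; the APIs are taken straight from the notebook above:

```python
import asyncio

from langchain.llms import OpenAI


async def main() -> None:
    llm = OpenAI(temperature=0.9)
    # Fire off ten generations concurrently; agenerate is the async twin of generate.
    tasks = [llm.agenerate(["Hello, how are you?"]) for _ in range(10)]
    results = await asyncio.gather(*tasks)
    for result in results:
        print(result.generations[0][0].text)


if __name__ == "__main__":
    asyncio.run(main())
```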
@@ -7,6 +7,7 @@ They are split into two categories:

1. `Generic Functionality <./generic_how_to.html>`_: Covering generic functionality all LLMs should have.
2. `Integrations <./integrations.html>`_: Covering integrations with various LLM providers.
3. `Asynchronous <./async_llm.html>`_: Covering asynchronous functionality.

.. toctree::
   :maxdepth: 1

@@ -5,9 +5,9 @@
"id": "959300d4",
"metadata": {},
"source": [
"# HuggingFace Hub\n",
"# Hugging Face Hub\n",
"\n",
"This example showcases how to connect to the HuggingFace Hub."
"This example showcases how to connect to the Hugging Face Hub."
]
},
{
@@ -9,7 +9,7 @@
"\n",
"This notebook walks through using an agent optimized for conversation. Other agents are often optimized for using tools to figure out the best response, which is not ideal in a conversational setting where you may want the agent to be able to chat with the user as well.\n",
"\n",
"This is accomplisehd with a specific type of agent (`conversational-react-description`) which expects to be used with a memory component."
"This is accomplished with a specific type of agent (`conversational-react-description`) which expects to be used with a memory component."
]
},
{
168
docs/modules/prompts/examples/custom_prompt_template.ipynb
Normal file
@@ -0,0 +1,168 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c75efab3",
"metadata": {},
"source": [
"# Create a custom prompt template\n",
"\n",
"Let's suppose we want the LLM to generate English language explanations of a function given its name. To achieve this task, we will create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.\n",
"\n",
"## Why are custom prompt templates needed?\n",
"\n",
"LangChain provides a set of default prompt templates that can be used to generate prompts for a variety of tasks. However, there may be cases where the default prompt templates do not meet your needs. For example, you may want to create a prompt template with specific dynamic instructions for your language model. In such cases, you can create a custom prompt template.\n",
"\n",
"Take a look at the current set of default prompt templates [here](../getting_started.md)."
]
},
{
"cell_type": "markdown",
"id": "5d56ce86",
"metadata": {},
"source": [
"## Create a custom prompt template\n",
"\n",
"The only two requirements for all prompt templates are:\n",
"\n",
"1. They have a input_variables attribute that exposes what input variables this prompt template expects.\n",
|
||||
"2. They expose a format method which takes in keyword arguments corresponding to the expected input_variables and returns the formatted prompt.\n",
|
||||
"\n",
|
||||
"Let's create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.\n",
|
||||
"\n",
|
||||
"First, let's create a function that will return the source code of a function given its name."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "c831e1ce",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import inspect\n",
|
||||
"\n",
|
||||
"def get_source_code(function_name):\n",
|
||||
" # Get the source code of the function\n",
|
||||
" return inspect.getsource(function_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c2c8f4ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we'll create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "3ad1efdc",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts import BasePromptTemplate\n",
|
||||
"from pydantic import BaseModel, validator\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class FunctionExplainerPromptTemplate(BasePromptTemplate, BaseModel):\n",
|
||||
" \"\"\" A custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function. \"\"\"\n",
|
||||
"\n",
|
||||
" @validator(\"input_variables\")\n",
|
||||
" def validate_input_variables(cls, v):\n",
|
||||
" \"\"\" Validate that the input variables are correct. \"\"\"\n",
|
||||
" if len(v) != 1 or \"function_name\" not in v:\n",
|
||||
" raise ValueError(\"function_name must be the only input_variable.\")\n",
|
||||
" return v\n",
|
||||
"\n",
|
||||
" def format(self, **kwargs) -> str:\n",
|
||||
" # Get the source code of the function\n",
|
||||
" source_code = get_source_code(kwargs[\"function_name\"])\n",
|
||||
"\n",
|
||||
" # Generate the prompt to be sent to the language model\n",
|
||||
" prompt = f\"\"\"\n",
|
||||
" Given the function name and source code, generate an English language explanation of the function.\n",
|
||||
" Function Name: {kwargs[\"function_name\"].__name__}\n",
|
||||
" Source Code:\n",
|
||||
" {source_code}\n",
|
||||
" Explanation:\n",
|
||||
" \"\"\"\n",
|
||||
" return prompt\n",
|
||||
" \n",
|
||||
" def _prompt_type(self):\n",
|
||||
" return \"function-explainer\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7fcbf6ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Use the custom prompt template\n",
|
||||
"\n",
|
||||
"Now that we have created a custom prompt template, we can use it to generate prompts for our task."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "bd836cda",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
" Given the function name and source code, generate an English language explanation of the function.\n",
|
||||
" Function Name: get_source_code\n",
|
||||
" Source Code:\n",
|
||||
" def get_source_code(function_name):\n",
|
||||
" # Get the source code of the function\n",
|
||||
" return inspect.getsource(function_name)\n",
|
||||
"\n",
|
||||
" Explanation:\n",
|
||||
" \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"fn_explainer = FunctionExplainerPromptTemplate(input_variables=[\"function_name\"])\n",
|
||||
"\n",
|
||||
"# Generate a prompt for the function \"get_source_code\"\n",
|
||||
"prompt = fn_explainer.format(function_name=get_source_code)\n",
|
||||
"print(prompt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7f3161c6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
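The notebook stops at printing the formatted prompt. Wiring the custom template into a model is the obvious next step; a minimal sketch, reusing the names defined in the notebook above and assuming an OpenAI key is configured (the chain setup is an assumption, not part of the notebook):

```python
from langchain.chains import LLMChain
from langchain.llms import OpenAI

# Reuse the custom template and target function from the notebook above.
fn_explainer = FunctionExplainerPromptTemplate(input_variables=["function_name"])
chain = LLMChain(llm=OpenAI(temperature=0), prompt=fn_explainer)
print(chain.run(function_name=get_source_code))
```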
@@ -1,75 +0,0 @@
# Create a custom prompt template

Let's suppose we want the LLM to generate English language explanations of a function given its name. To achieve this task, we will create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.

## Why are custom prompt templates needed?

LangChain provides a set of default prompt templates that can be used to generate prompts for a variety of tasks. However, there may be cases where the default prompt templates do not meet your needs. For example, you may want to create a prompt template with specific dynamic instructions for your language model. In such cases, you can create a custom prompt template.

:::{note}
Take a look at the current set of default prompt templates [here](../getting_started.md).
:::
<!-- TODO(shreya): Add correct link here. -->

## Create a custom prompt template

The only two requirements for all prompt templates are:

1. They have a input_variables attribute that exposes what input variables this prompt template expects.
2. They expose a format method which takes in keyword arguments corresponding to the expected input_variables and returns the formatted prompt.

Let's create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.

First, let's create a function that will return the source code of a function given its name.

```python
import inspect

def get_source_code(function_name):
    # Get the source code of the function
    return inspect.getsource(function_name)
```

Next, we'll create a custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function.

```python
from langchain.prompts import BasePromptTemplate
from pydantic import BaseModel, validator


class FunctionExplainerPromptTemplate(BasePromptTemplate, BaseModel):
    """ A custom prompt template that takes in the function name as input, and formats the prompt template to provide the source code of the function. """

    @validator("input_variables")
    def validate_input_variables(cls, v):
        """ Validate that the input variables are correct. """
        if len(v) != 1 or "function_name" not in v:
            raise ValueError("function_name must be the only input_variable.")
        return v

    def format(self, **kwargs) -> str:
        # Get the source code of the function
        source_code = get_source_code(kwargs["function_name"])

        # Generate the prompt to be sent to the language model
        prompt = f"""
        Given the function name and source code, generate an English language explanation of the function.
        Function Name: {kwargs["function_name"].__name__}
        Source Code:
        {source_code}
        Explanation:
        """
        return prompt
```

## Use the custom prompt template

Now that we have created a custom prompt template, we can use it to generate prompts for our task.

```python
fn_explainer = FunctionExplainerPromptTemplate(input_variables=["function_name"])

# Generate a prompt for the function "get_source_code"
prompt = fn_explainer.format(function_name=get_source_code)
print(prompt)
```
@@ -23,7 +23,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "8244ff60",
"metadata": {},
"outputs": [],
@@ -81,7 +81,7 @@
"    template=\"Input: {input}\\nOutput: {output}\",\n",
")\n",
"example_selector = LengthBasedExampleSelector(\n",
"    # These are the examples is has available to choose from.\n",
"    # These are the examples it has available to choose from.\n",
"    examples=examples, \n",
"    # This is the PromptTemplate being used to format the examples.\n",
"    example_prompt=example_prompt, \n",
@@ -439,10 +439,242 @@
"print(similar_prompt.format(adjective=\"worried\"))"
]
},
{
"cell_type": "markdown",
"id": "4aaeed2f",
"metadata": {},
"source": [
"## NGram Overlap ExampleSelector\n",
"\n",
"The NGramOverlapExampleSelector selects and orders examples based on which examples are most similar to the input, according to an ngram overlap score. The ngram overlap score is a float between 0.0 and 1.0, inclusive. \n",
"\n",
|
||||
"The selector allows for a threshold score to be set. Examples with an ngram overlap score less than or equal to the threshold are excluded. The threshold is set to -1.0, by default, so will not exclude any examples, only reorder them. Setting the threshold to 0.0 will exclude examples that have no ngram overlaps with the input.\n"
|
||||
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9cbc0acc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4f318f4b",
"metadata": {},
"outputs": [],
"source": [
"# These are examples of a fictional translation task.\n",
"examples = [\n",
"    {\"input\": \"See Spot run.\", \"output\": \"Ver correr a Spot.\"},\n",
"    {\"input\": \"My dog barks.\", \"output\": \"Mi perro ladra.\"},\n",
"    {\"input\": \"Spot can run.\", \"output\": \"Spot puede correr.\"},\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bf75e0fe",
"metadata": {},
"outputs": [],
"source": [
"example_prompt = PromptTemplate(\n",
"    input_variables=[\"input\", \"output\"],\n",
"    template=\"Input: {input}\\nOutput: {output}\",\n",
")\n",
"example_selector = NGramOverlapExampleSelector(\n",
"    # These are the examples it has available to choose from.\n",
"    examples=examples, \n",
"    # This is the PromptTemplate being used to format the examples.\n",
"    example_prompt=example_prompt, \n",
" # This is the threshold, at which selector stops.\n",
|
||||
" # It is set to -1.0 by default.\n",
|
||||
" threshold=-1.0,\n",
|
||||
" # For negative threshold:\n",
|
||||
" # Selector sorts examples by ngram overlap score, and excludes none.\n",
|
||||
" # For threshold greater than 1.0:\n",
|
||||
" # Selector excludes all examples, and returns an empty list.\n",
|
||||
" # For threshold equal to 0.0:\n",
|
||||
" # Selector sorts examples by ngram overlap score,\n",
|
||||
" # and excludes those with no ngram overlap with input.\n",
|
||||
")\n",
|
||||
"dynamic_prompt = FewShotPromptTemplate(\n",
|
||||
" # We provide an ExampleSelector instead of examples.\n",
|
||||
" example_selector=example_selector,\n",
|
||||
" example_prompt=example_prompt,\n",
|
||||
" prefix=\"Give the Spanish translation of every input\",\n",
|
||||
" suffix=\"Input: {sentence}\\nOutput:\", \n",
|
||||
" input_variables=[\"sentence\"],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "83fb218a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Give the Spanish translation of every input\n",
|
||||
"\n",
|
||||
"Input: Spot can run.\n",
|
||||
"Output: Spot puede correr.\n",
|
||||
"\n",
|
||||
"Input: See Spot run.\n",
|
||||
"Output: Ver correr a Spot.\n",
|
||||
"\n",
|
||||
"Input: My dog barks.\n",
|
||||
"Output: Mi perro ladra.\n",
|
||||
"\n",
|
||||
"Input: Spot can run fast.\n",
|
||||
"Output:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# An example input with large ngram overlap with \"Spot can run.\"\n",
|
||||
"# and no overlap with \"My dog barks.\"\n",
|
||||
"print(dynamic_prompt.format(sentence=\"Spot can run fast.\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "485f5307",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Give the Spanish translation of every input\n",
|
||||
"\n",
|
||||
"Input: Spot can run.\n",
|
||||
"Output: Spot puede correr.\n",
|
||||
"\n",
|
||||
"Input: See Spot run.\n",
|
||||
"Output: Ver correr a Spot.\n",
|
||||
"\n",
|
||||
"Input: Spot plays fetch.\n",
|
||||
"Output: Spot juega a buscar.\n",
|
||||
"\n",
|
||||
"Input: My dog barks.\n",
|
||||
"Output: Mi perro ladra.\n",
|
||||
"\n",
|
||||
"Input: Spot can run fast.\n",
|
||||
"Output:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# You can add examples to NGramOverlapExampleSelector as well.\n",
|
||||
"new_example = {\"input\": \"Spot plays fetch.\", \"output\": \"Spot juega a buscar.\"}\n",
|
||||
"\n",
|
||||
"example_selector.add_example(new_example)\n",
|
||||
"print(dynamic_prompt.format(sentence=\"Spot can run fast.\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "606ce697",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Give the Spanish translation of every input\n",
|
||||
"\n",
|
||||
"Input: Spot can run.\n",
|
||||
"Output: Spot puede correr.\n",
|
||||
"\n",
|
||||
"Input: See Spot run.\n",
|
||||
"Output: Ver correr a Spot.\n",
|
||||
"\n",
|
||||
"Input: Spot plays fetch.\n",
|
||||
"Output: Spot juega a buscar.\n",
|
||||
"\n",
|
||||
"Input: Spot can run fast.\n",
|
||||
"Output:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# You can set a threshold at which examples are excluded.\n",
|
||||
"# For example, setting threshold equal to 0.0\n",
|
||||
"# excludes examples with no ngram overlaps with input.\n",
|
||||
"# Since \"My dog barks.\" has no ngram overlaps with \"Spot can run fast.\"\n",
|
||||
"# it is excluded.\n",
|
||||
"example_selector.threshold=0.0\n",
|
||||
"print(dynamic_prompt.format(sentence=\"Spot can run fast.\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 87,
|
||||
"id": "7f8d72f7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Give the Spanish translation of every input\n",
|
||||
"\n",
|
||||
"Input: Spot can run.\n",
|
||||
"Output: Spot puede correr.\n",
|
||||
"\n",
|
||||
"Input: Spot plays fetch.\n",
|
||||
"Output: Spot juega a buscar.\n",
|
||||
"\n",
|
||||
"Input: Spot can play fetch.\n",
|
||||
"Output:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Setting small nonzero threshold\n",
|
||||
"example_selector.threshold=0.09\n",
|
||||
"print(dynamic_prompt.format(sentence=\"Spot can play fetch.\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 88,
|
||||
"id": "09633aa8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Give the Spanish translation of every input\n",
|
||||
"\n",
|
||||
"Input: Spot can play fetch.\n",
|
||||
"Output:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Setting threshold greater than 1.0\n",
|
||||
"example_selector.threshold=1.0+1e-9\n",
|
||||
"print(dynamic_prompt.format(sentence=\"Spot can play fetch.\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c746d6f4",
|
||||
"id": "39f30097",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
@@ -151,6 +151,47 @@
"multiple_input_prompt.format(adjective=\"funny\", content=\"chickens\")"
]
},
{
"cell_type": "markdown",
"id": "cc991ad2",
"metadata": {},
"source": [
"## From Template\n",
"You can also easily load a prompt template by just specifying the template, and not worrying about the input variables."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d0a0756c",
"metadata": {},
"outputs": [],
"source": [
"template = \"Tell me a {adjective} joke about {content}.\"\n",
"multiple_input_prompt = PromptTemplate.from_template(template)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "59046640",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PromptTemplate(input_variables=['adjective', 'content'], output_parser=None, template='Tell me a {adjective} joke about {content}.', template_format='f-string', validate_template=True)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"multiple_input_prompt"
]
},
{
"cell_type": "markdown",
"id": "b2dd6154",
@@ -291,6 +332,69 @@
"print(prompt_from_string_examples.format(adjective=\"big\"))"
]
},
{
"cell_type": "markdown",
"id": "874b7575",
"metadata": {},
"source": [
"## Few Shot Prompts with Templates\n",
|
||||
"We can also construct few shot prompt templates where the prefix and suffix themselves are prompt templates"
|
||||
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e710115f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import FewShotPromptWithTemplates"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5bf23a65",
"metadata": {},
"outputs": [],
"source": [
"prefix = PromptTemplate(input_variables=[\"content\"], template=\"This is a test about {content}.\")\n",
"suffix = PromptTemplate(input_variables=[\"new_content\"], template=\"Now you try to talk about {new_content}.\")\n",
"\n",
"prompt = FewShotPromptWithTemplates(\n",
"    suffix=suffix,\n",
"    prefix=prefix,\n",
"    input_variables=[\"content\", \"new_content\"],\n",
"    examples=examples,\n",
"    example_prompt=example_prompt,\n",
"    example_separator=\"\\n\",\n",
")\n",
"output = prompt.format(content=\"animals\", new_content=\"party\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d4036351",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This is a test about animals.\n",
"Input: happy\n",
"Output: sad\n",
"Input: tall\n",
"Output: short\n",
"Now you try to talk about party.\n"
]
}
],
"source": [
"print(output)"
]
},
{
"cell_type": "markdown",
"id": "bf038596",
@@ -19,11 +19,6 @@ The user guide here shows more advanced workflows and how to use the library in

.. toctree::
   :maxdepth: 1
   :glob:

@@ -255,10 +255,68 @@
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "markdown",
"id": "59428e05",
"metadata": {},
"source": [
"## InstructEmbeddings\n",
"Let's load the HuggingFace instruct Embeddings class."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "92c5b61e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import HuggingFaceInstructEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "062547b9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"load INSTRUCTOR_Transformer\n",
"max_seq_length 512\n"
]
}
],
"source": [
"embeddings = HuggingFaceInstructEmbeddings(query_instruction=\"Represent the query for retrieval: \")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e1dcc4bd",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "90f0db94",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90f0db94",
"id": "a961cdb5",
"metadata": {},
"outputs": [],
"source": []
@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "b118c9dc",
"metadata": {},
@@ -152,7 +151,7 @@
"metadata": {},
"source": [
"## Document creation\n",
"We can also use the text splitter to create \"Documents\" directly. Documents a way of bundling pieces of text with associated metadata so that chains can interact with them. We can also create documents with empty metadata though!\n",
"We can also use the text splitter to create \"Documents\" directly. Documents are a way of bundling pieces of text with associated metadata so that chains can interact with them. We can also create documents with empty metadata though!\n",
"\n",
"In the below example, we pass two pieces of text to get split up (we pass two just to show off the interface of splitting multiple pieces of text)."
]
@@ -476,10 +475,59 @@
"print(texts[0])"
]
},
{
"cell_type": "markdown",
"id": "53049ff5",
"metadata": {},
"source": [
"## Token Text Splitter"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a1a118b1",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import TokenTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ef37c5d3",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5750228a",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Madam Speaker, Madam Vice President, our\n"
]
}
],
"source": [
"texts = text_splitter.split_text(state_of_the_union)\n",
"print(texts[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1a118b1",
"id": "0905c1de",
"metadata": {},
"outputs": [],
"source": []
@@ -487,7 +535,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -501,7 +549,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12 (main, Mar 26 2022, 15:51:15) \n[Clang 13.1.6 (clang-1316.0.21.2)]"
"version": "3.10.9"
},
"vscode": {
"interpreter": {
@@ -51,7 +51,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 3,
"id": "015f4ff5",
"metadata": {
"pycharm": {
@@ -210,39 +210,27 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 4,
"id": "b58b3955",
"metadata": {},
"outputs": [],
"source": [
"import pickle"
"docsearch.save_local(\"faiss_index\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1897e23d",
"execution_count": 5,
"id": "ca72c650",
"metadata": {},
"outputs": [],
"source": [
"with open(\"foo.pkl\", 'wb') as f:\n",
"    pickle.dump(docsearch, f)"
"new_docsearch = FAISS.load_local(\"faiss_index\", embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bf3732f1",
"metadata": {},
"outputs": [],
"source": [
"with open(\"foo.pkl\", 'rb') as f:\n",
"    new_docsearch = pickle.load(f)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 6,
"id": "5bf2ee24",
"metadata": {},
"outputs": [],
@@ -252,7 +240,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 7,
"id": "edc2aad1",
"metadata": {},
"outputs": [
@@ -262,7 +250,7 @@
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 18,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
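This hunk swaps ad-hoc pickling for the index's own persistence methods. Put together as a script, the new roundtrip looks roughly like this; the paths and the embeddings object are placeholders:

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_texts(["hello world"], embeddings)

# Persist the index to a directory, then restore it with the same embeddings.
docsearch.save_local("faiss_index")
new_docsearch = FAISS.load_local("faiss_index", embeddings)
print(new_docsearch.similarity_search("hello")[0].page_content)
```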
@@ -483,7 +471,10 @@
"import pinecone \n",
"\n",
"# initialize pinecone\n",
"pinecone.init(api_key=\"\", environment=\"us-west1-gcp\")\n",
"pinecone.init(\n",
"    api_key=\"YOUR_API_KEY\", # find at app.pinecone.io\n",
"    environment=\"YOUR_ENV\" # next to api key in console\n",
")\n",
"\n",
"index_name = \"langchain-demo\"\n",
"\n",
@@ -566,10 +557,74 @@
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "6c3ec797",
"metadata": {},
"source": [
"## Milvus\n",
"To run, you should have a Milvus instance up and running: https://milvus.io/docs/install_standalone-docker.md"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "be347313",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Milvus"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f2eee23f",
"metadata": {},
"outputs": [],
"source": [
"vector_db = Milvus.from_texts(\n",
"    texts,\n",
"    embeddings,\n",
"    connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "06bdb701",
"metadata": {},
"outputs": [],
"source": [
"docs = vector_db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7b3e94aa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={}, lookup_index=0)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ffd66e2",
"id": "4af5a071",
"metadata": {},
"outputs": [],
"source": []
@@ -6,7 +6,7 @@ These agents can be used to power the next generation of personal assistants -
systems that intelligently understand what you mean, and then can take actions to help you accomplish your goal.

Agents are a core use of LangChain - so much so that there is a whole module dedicated to them.
Therefor, we recommend that you check out that documentation for detailed instruction on how to work
Therefore, we recommend that you check out that documentation for detailed instruction on how to work
with them.

- [Agent Documentation](../modules/agents.rst)
@@ -22,7 +22,7 @@ from langchain.chains import (
    VectorDBQAWithSourcesChain,
)
from langchain.docstore import InMemoryDocstore, Wikipedia
from langchain.llms import Cohere, HuggingFaceHub, OpenAI
from langchain.llms import Anthropic, Cohere, HuggingFaceHub, OpenAI
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.prompts import (
    BasePromptTemplate,
@@ -50,6 +50,7 @@ __all__ = [
|
||||
"SerpAPIChain",
|
||||
"GoogleSearchAPIWrapper",
|
||||
"WolframAlphaAPIWrapper",
|
||||
"Anthropic",
|
||||
"Cohere",
|
||||
"OpenAI",
|
||||
"BasePromptTemplate",
|
||||
|
||||
@@ -1,6 +1,7 @@
"""Chain that takes in an input and produces an action and action input."""
from __future__ import annotations

+import asyncio
import json
import logging
from abc import abstractmethod
@@ -71,6 +72,19 @@ class Agent(BaseModel):
            tool=parsed_output[0], tool_input=parsed_output[1], log=full_output
        )

+    async def _aget_next_action(self, full_inputs: Dict[str, str]) -> AgentAction:
+        full_output = await self.llm_chain.apredict(**full_inputs)
+        parsed_output = self._extract_tool_and_input(full_output)
+        while parsed_output is None:
+            full_output = self._fix_text(full_output)
+            full_inputs["agent_scratchpad"] += full_output
+            output = await self.llm_chain.apredict(**full_inputs)
+            full_output += output
+            parsed_output = self._extract_tool_and_input(full_output)
+        return AgentAction(
+            tool=parsed_output[0], tool_input=parsed_output[1], log=full_output
+        )
+
    def plan(
        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
    ) -> Union[AgentAction, AgentFinish]:
@@ -84,15 +98,40 @@ class Agent(BaseModel):
        Returns:
            Action specifying what tool to use.
        """
-        thoughts = self._construct_scratchpad(intermediate_steps)
-        new_inputs = {"agent_scratchpad": thoughts, "stop": self._stop}
-        full_inputs = {**kwargs, **new_inputs}
-
+        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
        action = self._get_next_action(full_inputs)
        if action.tool == self.finish_tool_name:
            return AgentFinish({"output": action.tool_input}, action.log)
        return action

+    async def aplan(
+        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
+    ) -> Union[AgentAction, AgentFinish]:
+        """Given input, decided what to do.
+
+        Args:
+            intermediate_steps: Steps the LLM has taken to date,
+                along with observations
+            **kwargs: User inputs.
+
+        Returns:
+            Action specifying what tool to use.
+        """
+        full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
+        action = await self._aget_next_action(full_inputs)
+        if action.tool == self.finish_tool_name:
+            return AgentFinish({"output": action.tool_input}, action.log)
+        return action
+
+    def get_full_inputs(
+        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
+    ) -> Dict[str, Any]:
+        """Create the full inputs for the LLMChain from intermediate steps."""
+        thoughts = self._construct_scratchpad(intermediate_steps)
+        new_inputs = {"agent_scratchpad": thoughts, "stop": self._stop}
+        full_inputs = {**kwargs, **new_inputs}
+        return full_inputs
+
    def prepare_for_new_call(self) -> None:
        """Prepare the agent for new call, if needed."""
        pass
@@ -338,6 +377,14 @@ class AgentExecutor(Chain, BaseModel):

    def _call(self, inputs: Dict[str, str]) -> Dict[str, Any]:
        """Run text through and get agent response."""
+        # Make sure that every tool is synchronous (not a coroutine)
+        for tool in self.tools:
+            if asyncio.iscoroutinefunction(tool.func):
+                raise ValueError(
+                    "Tools cannot be asynchronous for `run` method. "
+                    "Please use `arun` instead."
+                )
+
        # Do any preparation necessary when receiving a new input.
        self.agent.prepare_for_new_call()
        # Construct a mapping of tool name to tool for easy lookup
@@ -399,3 +446,81 @@ class AgentExecutor(Chain, BaseModel):
            self.early_stopping_method, intermediate_steps, **inputs
        )
        return self._return(output, intermediate_steps)
+
+    async def _acall(self, inputs: Dict[str, str]) -> Dict[str, str]:
+        """Run text through and get agent response."""
+        # Make sure that every tool is asynchronous (a coroutine)
+        for tool in self.tools:
+            if tool.coroutine and not asyncio.iscoroutinefunction(tool.coroutine):
+                raise ValueError(
+                    "The coroutine for the tool must be a coroutine function."
+                )
+
+        # Do any preparation necessary when receiving a new input.
+        self.agent.prepare_for_new_call()
+        # Construct a mapping of tool name to tool for easy lookup
+        name_to_tool_map = {tool.name: tool for tool in self.tools}
+        # We construct a mapping from each tool to a color, used for logging.
+        color_mapping = get_color_mapping(
+            [tool.name for tool in self.tools], excluded_colors=["green"]
+        )
+        intermediate_steps: List[Tuple[AgentAction, str]] = []
+        # Let's start tracking the iterations the agent has gone through
+        iterations = 0
+        # We now enter the agent loop (until it returns something).
+        while self._should_continue(iterations):
+            # Call the LLM to see what to do.
+            output = await self.agent.aplan(intermediate_steps, **inputs)
+            # If the tool chosen is the finishing tool, then we end and return.
+            if isinstance(output, AgentFinish):
+                return self._return(output, intermediate_steps)
+
+            # Otherwise we lookup the tool
+            if output.tool in name_to_tool_map:
+                tool = name_to_tool_map[output.tool]
+                self.callback_manager.on_tool_start(
+                    {"name": str(tool.func)[:60] + "..."},
+                    output,
+                    verbose=self.verbose,
+                )
+                try:
+                    # We then call the tool on the tool input to get an observation
+                    observation = (
+                        await tool.coroutine(output.tool_input)
+                        if tool.coroutine
+                        # If the tool is not a coroutine, we run it in the executor
+                        # to avoid blocking the event loop.
+                        else await asyncio.get_event_loop().run_in_executor(
+                            None, tool.func, output.tool_input
+                        )
+                    )
+                    color = color_mapping[output.tool]
+                    return_direct = tool.return_direct
+                except (KeyboardInterrupt, Exception) as e:
+                    self.callback_manager.on_tool_error(e, verbose=self.verbose)
+                    raise e
+            else:
+                self.callback_manager.on_tool_start(
+                    {"name": "N/A"}, output, verbose=self.verbose
+                )
+                observation = f"{output.tool} is not a valid tool, try another one."
+                color = None
+                return_direct = False
+            llm_prefix = "" if return_direct else self.agent.llm_prefix
+            self.callback_manager.on_tool_end(
+                observation,
+                color=color,
+                observation_prefix=self.agent.observation_prefix,
+                llm_prefix=llm_prefix,
+                verbose=self.verbose,
+            )
+            intermediate_steps.append((output, observation))
+            if return_direct:
+                # Set the log to "" because we do not want to log it.
+                output = AgentFinish({self.agent.return_values[0]: observation}, "")
+                return self._return(output, intermediate_steps)
+            iterations += 1
+        output = self.agent.return_stopped_response(
+            self.early_stopping_method, intermediate_steps, **inputs
+        )
+        return self._return(output, intermediate_steps)
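The `_acall` loop above is what the new async entry points on `Chain` (further down in this diff) ultimately drive. A rough usage sketch, assuming the `agent` executor from the earlier sketch was built with async-capable tools:

```python
import asyncio

async def main() -> None:
    # aplan() is awaited for each step; tools without a coroutine are pushed
    # onto the default executor so the event loop is not blocked.
    result = await agent.arun("What is 2 raised to the 0.43 power?")
    print(result)

asyncio.run(main())
```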
@@ -65,9 +65,10 @@ def _get_pal_colored_objects(llm: BaseLLM) -> Tool:

def _get_llm_math(llm: BaseLLM) -> Tool:
    return Tool(
-        "Calculator",
-        LLMMathChain(llm=llm).run,
-        "Useful for when you need to answer questions about math.",
+        name="Calculator",
+        description="Useful for when you need to answer questions about math.",
+        func=LLMMathChain(llm=llm, callback_manager=llm.callback_manager).run,
+        coroutine=LLMMathChain(llm=llm, callback_manager=llm.callback_manager).arun,
    )


@@ -132,9 +133,10 @@ def _get_google_search(**kwargs: Any) -> Tool:

def _get_serpapi(**kwargs: Any) -> Tool:
    return Tool(
-        "Search",
-        SerpAPIWrapper(**kwargs).run,
-        "A search engine. Useful for when you need to answer questions about current events. Input should be a search query.",
+        name="Search",
+        description="A search engine. Useful for when you need to answer questions about current events. Input should be a search query.",
+        func=SerpAPIWrapper(**kwargs).run,
+        coroutine=SerpAPIWrapper(**kwargs).arun,
    )


@@ -145,7 +147,7 @@ _EXTRA_LLM_TOOLS = {
_EXTRA_OPTIONAL_TOOLS = {
    "wolfram-alpha": (_get_wolfram_alpha, ["wolfram_alpha_appid"]),
    "google-search": (_get_google_search, ["google_api_key", "google_cse_id"]),
-    "serpapi": (_get_serpapi, ["serpapi_api_key"]),
+    "serpapi": (_get_serpapi, ["serpapi_api_key", "aiosession"]),
}
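The new `"aiosession"` extra key suggests callers can hand `load_tools` a shared aiohttp session for async searches. A hedged sketch, assuming `SerpAPIWrapper` accepts an `aiosession` argument (as the key above implies) and SERPAPI_API_KEY is set:

```python
import aiohttp

from langchain.agents import load_tools
from langchain.llms import OpenAI

async def run_search() -> str:
    async with aiohttp.ClientSession() as session:
        # The extra key is forwarded, i.e. SerpAPIWrapper(aiosession=session).
        (search_tool,) = load_tools(
            ["serpapi"], llm=OpenAI(temperature=0), aiosession=session
        )
        return await search_tool.coroutine("latest Milvus release")
```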
@@ -40,7 +40,7 @@ def get_action_and_input(llm_output: str) -> Tuple[str, str]:
    if FINAL_ANSWER_ACTION in llm_output:
        return "Final Answer", llm_output.split(FINAL_ANSWER_ACTION)[-1].strip()
    regex = r"Action: (.*?)\nAction Input: (.*)"
-    match = re.search(regex, llm_output)
+    match = re.search(regex, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    action = match.group(1).strip()
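Why `re.DOTALL` matters here: without it, `.` stops at the first newline, so a multi-line Action Input (e.g. a code snippet) is silently truncated. A self-contained check:

```python
import re

llm_output = "Action: Python REPL\nAction Input: print(1)\nprint(2)"
regex = r"Action: (.*?)\nAction Input: (.*)"

# Without DOTALL, the input is cut off at the first newline.
assert re.search(regex, llm_output).group(2) == "print(1)"
# With DOTALL, the full multi-line input is captured.
assert re.search(regex, llm_output, re.DOTALL).group(2) == "print(1)\nprint(2)"
```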
@@ -1,7 +1,8 @@
"""Interface for tools."""
+import asyncio
from dataclasses import dataclass
from inspect import signature
-from typing import Any, Callable, Optional, Union
+from typing import Any, Awaitable, Callable, Optional, Union


@dataclass
@@ -12,9 +13,13 @@ class Tool:
    func: Callable[[str], str]
    description: Optional[str] = None
    return_direct: bool = False
+    # If the tool has a coroutine, then we can use this to run it asynchronously
+    coroutine: Optional[Callable[[str], Awaitable[str]]] = None

    def __call__(self, *args: Any, **kwargs: Any) -> str:
        """Make tools callable by piping through to `func`."""
+        if asyncio.iscoroutinefunction(self.func):
+            raise TypeError("Coroutine cannot be called directly")
        return self.func(*args, **kwargs)
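With the new `coroutine` field, a tool can carry a sync and an async implementation side by side. A minimal sketch (the echo functions are illustrative):

```python
from langchain.agents.tools import Tool

def echo(text: str) -> str:
    return f"echo: {text}"

async def aecho(text: str) -> str:
    return f"echo: {text}"

tool = Tool(
    name="Echo",
    func=echo,        # used by the sync executor path
    description="Repeats the input back.",
    coroutine=aecho,  # awaited directly by _acall when present
)
```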
@@ -9,6 +9,10 @@ from langchain.schema import AgentAction, AgentFinish, LLMResult
class StdOutCallbackHandler(BaseCallbackHandler):
    """Callback Handler that prints to std out."""

+    def __init__(self, color: str = "green") -> None:
+        """Initialize callback handler."""
+        self.color = color
+
    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> None:
@@ -50,7 +54,7 @@ class StdOutCallbackHandler(BaseCallbackHandler):
        **kwargs: Any,
    ) -> None:
        """Print out the log in specified color."""
-        print_text(action.log, color=color)
+        print_text(action.log, color=color if color else self.color)

    def on_tool_end(
        self,
@@ -62,7 +66,7 @@ class StdOutCallbackHandler(BaseCallbackHandler):
    ) -> None:
        """If not the final action, print out observation."""
        print_text(f"\n{observation_prefix}")
-        print_text(output, color=color)
+        print_text(output, color=color if color else self.color)
        print_text(f"\n{llm_prefix}")

    def on_tool_error(
@@ -79,10 +83,10 @@ class StdOutCallbackHandler(BaseCallbackHandler):
        **kwargs: Optional[str],
    ) -> None:
        """Run when agent ends."""
-        print_text(text, color=color, end=end)
+        print_text(text, color=color if color else self.color, end=end)

    def on_agent_finish(
        self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any
    ) -> None:
        """Run on agent end."""
-        print_text(finish.log, color=color, end="\n")
+        print_text(finish.log, color=color if self.color else color, end="\n")
@@ -1,5 +1,7 @@
"""Chains are easily reusable components which can be linked together."""
from langchain.chains.api.base import APIChain
+from langchain.chains.chat_vector_db.base import ChatVectorDBChain
+from langchain.chains.combine_documents.base import AnalyzeDocumentChain
from langchain.chains.conversation.base import ConversationChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
@@ -22,7 +24,6 @@ from langchain.chains.transform import TransformChain
from langchain.chains.vector_db_qa.base import VectorDBQA

__all__ = [
    "APIChain",
    "ConversationChain",
    "LLMChain",
    "LLMBashChain",
@@ -42,5 +43,7 @@ __all__ = [
    "OpenAIModerationChain",
    "SQLDatabaseSequentialChain",
    "load_chain",
+    "AnalyzeDocumentChain",
    "HypotheticalDocumentEmbedder",
+    "ChatVectorDBChain",
]
@@ -111,6 +111,10 @@ class Chain(BaseModel, ABC):
    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        """Run the logic of this chain and return the output."""

+    async def _acall(self, inputs: Dict[str, str]) -> Dict[str, str]:
+        """Run the logic of this chain and return the output."""
+        raise NotImplementedError("Async call not supported for this chain type.")
+
    def __call__(
        self, inputs: Union[Dict[str, Any], Any], return_only_outputs: bool = False
    ) -> Dict[str, Any]:
@@ -125,6 +129,65 @@ class Chain(BaseModel, ABC):
            chain will be returned. Defaults to False.

        """
+        inputs = self.prep_inputs(inputs)
+        self.callback_manager.on_chain_start(
+            {"name": self.__class__.__name__},
+            inputs,
+            verbose=self.verbose,
+        )
+        try:
+            outputs = self._call(inputs)
+        except (KeyboardInterrupt, Exception) as e:
+            self.callback_manager.on_chain_error(e, verbose=self.verbose)
+            raise e
+        self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
+        return self.prep_outputs(inputs, outputs, return_only_outputs)
+
+    async def acall(
+        self, inputs: Union[Dict[str, Any], Any], return_only_outputs: bool = False
+    ) -> Dict[str, Any]:
+        """Run the logic of this chain and add to output if desired.
+
+        Args:
+            inputs: Dictionary of inputs, or single input if chain expects
+                only one param.
+            return_only_outputs: boolean for whether to return only outputs in the
+                response. If True, only new keys generated by this chain will be
+                returned. If False, both input keys and new keys generated by this
+                chain will be returned. Defaults to False.
+
+        """
+        inputs = self.prep_inputs(inputs)
+        self.callback_manager.on_chain_start(
+            {"name": self.__class__.__name__},
+            inputs,
+            verbose=self.verbose,
+        )
+        try:
+            outputs = await self._acall(inputs)
+        except (KeyboardInterrupt, Exception) as e:
+            self.callback_manager.on_chain_error(e, verbose=self.verbose)
+            raise e
+        self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
+        return self.prep_outputs(inputs, outputs, return_only_outputs)
+
+    def prep_outputs(
+        self,
+        inputs: Dict[str, str],
+        outputs: Dict[str, str],
+        return_only_outputs: bool = False,
+    ) -> Dict[str, str]:
+        """Validate and prep outputs."""
+        self._validate_outputs(outputs)
+        if self.memory is not None:
+            self.memory.save_context(inputs, outputs)
+        if return_only_outputs:
+            return outputs
+        else:
+            return {**inputs, **outputs}
+
+    def prep_inputs(self, inputs: Union[Dict[str, Any], Any]) -> Dict[str, str]:
+        """Validate and prep inputs."""
        if not isinstance(inputs, dict):
            _input_keys = set(self.input_keys)
            if self.memory is not None:
@@ -143,24 +206,7 @@ class Chain(BaseModel, ABC):
            external_context = self.memory.load_memory_variables(inputs)
            inputs = dict(inputs, **external_context)
        self._validate_inputs(inputs)
-        self.callback_manager.on_chain_start(
-            {"name": self.__class__.__name__},
-            inputs,
-            verbose=self.verbose,
-        )
-        try:
-            outputs = self._call(inputs)
-        except (KeyboardInterrupt, Exception) as e:
-            self.callback_manager.on_chain_error(e, verbose=self.verbose)
-            raise e
-        self.callback_manager.on_chain_end(outputs, verbose=self.verbose)
-        self._validate_outputs(outputs)
-        if self.memory is not None:
-            self.memory.save_context(inputs, outputs)
-        if return_only_outputs:
-            return outputs
-        else:
-            return {**inputs, **outputs}
+        return inputs

    def apply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
        """Call the chain on all inputs in the list."""
@@ -187,6 +233,27 @@ class Chain(BaseModel, ABC):
            f" but not both. Got args: {args} and kwargs: {kwargs}."
        )

+    async def arun(self, *args: str, **kwargs: str) -> str:
+        """Run the chain as text in, text out or multiple variables, text out."""
+        if len(self.output_keys) != 1:
+            raise ValueError(
+                f"`run` not supported when there is not exactly "
+                f"one output key. Got {self.output_keys}."
+            )
+
+        if args and not kwargs:
+            if len(args) != 1:
+                raise ValueError("`run` supports only one positional argument.")
+            return (await self.acall(args[0]))[self.output_keys[0]]
+
+        if kwargs and not args:
+            return (await self.acall(kwargs))[self.output_keys[0]]
+
+        raise ValueError(
+            f"`run` supported with either positional arguments or keyword arguments"
+            f" but not both. Got args: {args} and kwargs: {kwargs}."
+        )
+
    def dict(self, **kwargs: Any) -> Dict:
        """Return dictionary representation of chain."""
        if self.memory is not None:
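`acall` and `arun` mirror the sync entry points. A small usage sketch with an `LLMChain` (assumes OPENAI_API_KEY):

```python
import asyncio

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=OpenAI(temperature=0.9), prompt=prompt)

async def main() -> None:
    # arun -> acall -> the chain's _acall, with the same callbacks as the sync path.
    print(await chain.arun(product="colorful socks"))

asyncio.run(main())
```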
1 langchain/chains/chat_vector_db/__init__.py Normal file
@@ -0,0 +1 @@
"""Chain for chatting with a vector database."""

85 langchain/chains/chat_vector_db/base.py Normal file
@@ -0,0 +1,85 @@
"""Chain for chatting with a vector database."""
from __future__ import annotations

from typing import Any, Dict, List, Tuple

from pydantic import BaseModel

from langchain.chains.base import Chain
from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.llms.base import BaseLLM
from langchain.prompts.base import BasePromptTemplate
from langchain.vectorstores.base import VectorStore


def _get_chat_history(chat_history: List[Tuple[str, str]]) -> str:
    buffer = ""
    for human_s, ai_s in chat_history:
        human = "Human: " + human_s
        ai = "Assistant: " + ai_s
        buffer += "\n" + "\n".join([human, ai])
    return buffer


class ChatVectorDBChain(Chain, BaseModel):
    """Chain for chatting with a vector database."""

    vectorstore: VectorStore
    combine_docs_chain: BaseCombineDocumentsChain
    question_generator: LLMChain
    output_key: str = "answer"

    @property
    def _chain_type(self) -> str:
        return "chat-vector-db"

    @property
    def input_keys(self) -> List[str]:
        """Input keys."""
        return ["question", "chat_history"]

    @property
    def output_keys(self) -> List[str]:
        """Output keys."""
        return [self.output_key]

    @classmethod
    def from_llm(
        cls,
        llm: BaseLLM,
        vectorstore: VectorStore,
        condense_question_prompt: BasePromptTemplate = CONDENSE_QUESTION_PROMPT,
        qa_prompt: BasePromptTemplate = QA_PROMPT,
        chain_type: str = "stuff",
    ) -> ChatVectorDBChain:
        """Load chain from LLM."""
        doc_chain = load_qa_chain(
            llm,
            chain_type=chain_type,
            prompt=qa_prompt,
        )
        condense_question_chain = LLMChain(llm=llm, prompt=condense_question_prompt)
        return cls(
            vectorstore=vectorstore,
            combine_docs_chain=doc_chain,
            question_generator=condense_question_chain,
        )

    def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        question = inputs["question"]
        chat_history_str = _get_chat_history(inputs["chat_history"])
        if chat_history_str:
            new_question = self.question_generator.run(
                question=question, chat_history=chat_history_str
            )
        else:
            new_question = question
        docs = self.vectorstore.similarity_search(new_question, k=4)
        new_inputs = inputs.copy()
        new_inputs["question"] = new_question
        new_inputs["chat_history"] = chat_history_str
        answer, _ = self.combine_docs_chain.combine_docs(docs, **new_inputs)
        return {self.output_key: answer}
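A hedged usage sketch for the new chain (assumes a `vectorstore` built as in the notebook above; chat history is a list of (human, ai) tuples):

```python
from langchain.chains import ChatVectorDBChain
from langchain.llms import OpenAI

qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore)

chat_history = []
question = "What did the president say about Justice Breyer?"
result = qa({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
```

On a follow-up turn, the non-empty history makes `_call` first condense the question into a standalone one before retrieval.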
20 langchain/chains/chat_vector_db/prompts.py Normal file
@@ -0,0 +1,20 @@
# flake8: noqa
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
QA_PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
@@ -3,10 +3,11 @@
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple

-from pydantic import BaseModel
+from pydantic import BaseModel, Field

from langchain.chains.base import Chain
from langchain.docstore.document import Document
+from langchain.text_splitter import RecursiveCharacterTextSplitter, TextSplitter


class BaseCombineDocumentsChain(Chain, BaseModel):
@@ -49,3 +50,36 @@ class BaseCombineDocumentsChain(Chain, BaseModel):
        output, extra_return_dict = self.combine_docs(docs, **other_keys)
        extra_return_dict[self.output_key] = output
        return extra_return_dict
+
+
+class AnalyzeDocumentChain(Chain, BaseModel):
+    """Chain that splits documents, then analyzes it in pieces."""
+
+    input_key: str = "input_document"  #: :meta private:
+    output_key: str = "output_text"  #: :meta private:
+    text_splitter: TextSplitter = Field(default_factory=RecursiveCharacterTextSplitter)
+    combine_docs_chain: BaseCombineDocumentsChain
+
+    @property
+    def input_keys(self) -> List[str]:
+        """Expect input key.
+
+        :meta private:
+        """
+        return [self.input_key]
+
+    @property
+    def output_keys(self) -> List[str]:
+        """Return output key.
+
+        :meta private:
+        """
+        return [self.output_key]
+
+    def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
+        document = inputs[self.input_key]
+        docs = self.text_splitter.create_documents([document])
+        # Other keys are assumed to be needed for LLM prediction
+        other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
+        other_keys[self.combine_docs_chain.input_key] = docs
+        return self.combine_docs_chain(other_keys, return_only_outputs=True)

@@ -27,7 +27,6 @@ class ConversationChain(LLMChain, BaseModel):

    input_key: str = "input"  #: :meta private:
    output_key: str = "response"  #: :meta private:
-    buffer: str = ""  #: :meta private:

    class Config:
        """Configuration for this pydantic object."""
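A short sketch of how the `AnalyzeDocumentChain` added above is meant to be driven (assumes a summarize chain via `load_summarize_chain` and a long `state_of_the_union` string as read earlier):

```python
from langchain.chains import AnalyzeDocumentChain
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI

summary_chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce")
summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=summary_chain)
# Splits the raw text with the default RecursiveCharacterTextSplitter,
# then hands the pieces to the combine-docs chain.
summarize_document_chain.run(state_of_the_union)
```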
@@ -1,5 +1,5 @@
"""Chain that just formats a prompt and calls an LLM."""
-from typing import Any, Dict, List, Sequence, Union
+from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

from pydantic import BaseModel, Extra

@@ -7,6 +7,7 @@ from langchain.chains.base import Chain
from langchain.input import get_colored_text
from langchain.llms.base import BaseLLM
from langchain.prompts.base import BasePromptTemplate
+from langchain.prompts.prompt import PromptTemplate
from langchain.schema import LLMResult


@@ -54,6 +55,20 @@ class LLMChain(Chain, BaseModel):

    def generate(self, input_list: List[Dict[str, Any]]) -> LLMResult:
        """Generate LLM result from inputs."""
+        prompts, stop = self.prep_prompts(input_list)
+        response = self.llm.generate(prompts, stop=stop)
+        return response
+
+    async def agenerate(self, input_list: List[Dict[str, Any]]) -> LLMResult:
+        """Generate LLM result from inputs."""
+        prompts, stop = self.prep_prompts(input_list)
+        response = await self.llm.agenerate(prompts, stop=stop)
+        return response
+
+    def prep_prompts(
+        self, input_list: List[Dict[str, Any]]
+    ) -> Tuple[List[str], Optional[List[str]]]:
+        """Prepare prompts from inputs."""
        stop = None
        if "stop" in input_list[0]:
            stop = input_list[0]["stop"]
@@ -69,12 +84,20 @@ class LLMChain(Chain, BaseModel):
                "If `stop` is present in any inputs, should be present in all."
            )
            prompts.append(prompt)
-        response = self.llm.generate(prompts, stop=stop)
-        return response
+        return prompts, stop

    def apply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
        """Utilize the LLM generate method for speed gains."""
        response = self.generate(input_list)
        return self.create_outputs(response)

+    async def aapply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
+        """Utilize the LLM generate method for speed gains."""
+        response = await self.agenerate(input_list)
+        return self.create_outputs(response)
+
    def create_outputs(self, response: LLMResult) -> List[Dict[str, str]]:
        """Create outputs from response."""
        outputs = []
        for generation in response.generations:
            # Get the text of the top generated string.
@@ -85,6 +108,9 @@ class LLMChain(Chain, BaseModel):
    def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        return self.apply([inputs])[0]

+    async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
+        return (await self.aapply([inputs]))[0]
+
    def predict(self, **kwargs: Any) -> str:
        """Format prompt with kwargs and pass to LLM.

@@ -101,6 +127,22 @@ class LLMChain(Chain, BaseModel):
        """
        return self(kwargs)[self.output_key]

+    async def apredict(self, **kwargs: Any) -> str:
+        """Format prompt with kwargs and pass to LLM.
+
+        Args:
+            **kwargs: Keys to pass to prompt template.
+
+        Returns:
+            Completion from LLM.
+
+        Example:
+            .. code-block:: python
+
+                completion = llm.predict(adjective="funny")
+        """
+        return (await self.acall(kwargs))[self.output_key]
+
    def predict_and_parse(self, **kwargs: Any) -> Union[str, List[str], Dict[str, str]]:
        """Call predict and then parse the results."""
        result = self.predict(**kwargs)
@@ -126,3 +168,9 @@ class LLMChain(Chain, BaseModel):
    @property
    def _chain_type(self) -> str:
        return "llm_chain"
+
+    @classmethod
+    def from_string(cls, llm: BaseLLM, template: str) -> Chain:
+        """Create LLMChain from LLM and template."""
+        prompt_template = PromptTemplate.from_template(template)
+        return cls(llm=llm, prompt=prompt_template)
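Two of the `LLMChain` additions in one sketch: `from_string` for quick construction and `apredict` for the async path:

```python
import asyncio

from langchain.chains import LLMChain
from langchain.llms import OpenAI

chain = LLMChain.from_string(OpenAI(temperature=0.9), "Tell me a {adjective} joke")

print(chain.predict(adjective="funny"))              # sync path
print(asyncio.run(chain.apredict(adjective="dry")))  # async path via acall/_acall
```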
@@ -50,11 +50,8 @@ class LLMMathChain(Chain, BaseModel):
        """
        return [self.output_key]

-    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
-        llm_executor = LLMChain(prompt=self.prompt, llm=self.llm)
+    def _process_llm_result(self, t: str) -> Dict[str, str]:
        python_executor = PythonREPL()
-        self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
-        t = llm_executor.predict(question=inputs[self.input_key], stop=["```output"])
        self.callback_manager.on_text(t, color="green", verbose=self.verbose)
        t = t.strip()
        if t.startswith("```python"):
@@ -69,6 +66,24 @@ class LLMMathChain(Chain, BaseModel):
            raise ValueError(f"unknown format from LLM: {t}")
        return {self.output_key: answer}

+    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
+        llm_executor = LLMChain(
+            prompt=self.prompt, llm=self.llm, callback_manager=self.callback_manager
+        )
+        self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
+        t = llm_executor.predict(question=inputs[self.input_key], stop=["```output"])
+        return self._process_llm_result(t)
+
+    async def _acall(self, inputs: Dict[str, str]) -> Dict[str, str]:
+        llm_executor = LLMChain(
+            prompt=self.prompt, llm=self.llm, callback_manager=self.callback_manager
+        )
+        self.callback_manager.on_text(inputs[self.input_key], verbose=self.verbose)
+        t = await llm_executor.apredict(
+            question=inputs[self.input_key], stop=["```output"]
+        )
+        return self._process_llm_result(t)
+
    @property
    def _chain_type(self) -> str:
        return "llm_math_chain"
@@ -440,7 +440,7 @@ def load_chain_from_config(config: dict, **kwargs: Any) -> Chain:
def load_chain(path: Union[str, Path], **kwargs: Any) -> Chain:
    """Unified method for loading a chain from LangChainHub or local fs."""
    if hub_result := try_load_from_hub(
-        path, _load_chain_from_file, "chains", {"json", "yaml"}
+        path, _load_chain_from_file, "chains", {"json", "yaml"}, **kwargs
    ):
        return hub_result
    else:
@@ -336,7 +336,7 @@ class Crawler:
            element_node_value = strings[node_value[index]]
            if (
                element_node_value == "|"
-            ):  # commonly used as a seperator, does not add much context - lets save ourselves some token space
+            ):  # commonly used as a separator, does not add much context - lets save ourselves some token space
                continue
            elif (
                node_name == "input"
@@ -4,7 +4,7 @@ As in https://arxiv.org/pdf/2211.10435.pdf.
"""
from __future__ import annotations

-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Extra

@@ -24,7 +24,10 @@ class PALChain(Chain, BaseModel):
    prompt: BasePromptTemplate
    stop: str = "\n\n"
    get_answer_expr: str = "print(solution())"
+    python_globals: Optional[Dict[str, Any]] = None
+    python_locals: Optional[Dict[str, Any]] = None
    output_key: str = "result"  #: :meta private:
+    return_intermediate_steps: bool = False

    class Config:
        """Configuration for this pydantic object."""
@@ -46,7 +49,10 @@ class PALChain(Chain, BaseModel):

        :meta private:
        """
-        return [self.output_key]
+        if not self.return_intermediate_steps:
+            return [self.output_key]
+        else:
+            return [self.output_key, "intermediate_steps"]

    def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
        llm_chain = LLMChain(llm=self.llm, prompt=self.prompt)
@@ -54,9 +60,12 @@ class PALChain(Chain, BaseModel):
        self.callback_manager.on_text(
            code, color="green", end="\n", verbose=self.verbose
        )
-        repl = PythonREPL()
+        repl = PythonREPL(_globals=self.python_globals, _locals=self.python_locals)
        res = repl.run(code + f"\n{self.get_answer_expr}")
-        return {self.output_key: res.strip()}
+        output = {self.output_key: res.strip()}
+        if self.return_intermediate_steps:
+            output["intermediate_steps"] = code
+        return output

    @classmethod
    def from_math_prompt(cls, llm: BaseLLM, **kwargs: Any) -> PALChain:
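A hedged sketch of the new `PALChain` knobs (`from_math_prompt` forwards extra kwargs to the constructor):

```python
from langchain.chains import PALChain
from langchain.llms import OpenAI

pal_chain = PALChain.from_math_prompt(
    OpenAI(temperature=0, max_tokens=512),
    return_intermediate_steps=True,  # also surface the generated program
)
question = (
    "If Olivia has 23 dollars and buys five bagels for 3 dollars each, "
    "how much money does she have left?"
)
result = pal_chain({"question": question})
print(result["intermediate_steps"])  # the Python the LLM wrote
print(result["result"])
```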
@@ -21,7 +21,7 @@ class SQLDatabaseChain(Chain, BaseModel):

            from langchain import SQLDatabaseChain, OpenAI, SQLDatabase
            db = SQLDatabase(...)
-            db_chain = SelfAskWithSearchChain(llm=OpenAI(), database=db)
+            db_chain = SQLDatabaseChain(llm=OpenAI(), database=db)
    """

    llm: BaseLLM
@@ -35,6 +35,9 @@ class SQLDatabaseChain(Chain, BaseModel):
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:
    return_intermediate_steps: bool = False
    """Whether or not to return the intermediate steps along with the final answer."""
+    return_direct: bool = False
+    """Whether or not to return the result of querying the SQL table directly."""

    class Config:
        """Configuration for this pydantic object."""
@@ -83,11 +86,17 @@ class SQLDatabaseChain(Chain, BaseModel):
        intermediate_steps.append(result)
        self.callback_manager.on_text("\nSQLResult: ", verbose=self.verbose)
        self.callback_manager.on_text(result, color="yellow", verbose=self.verbose)
-        self.callback_manager.on_text("\nAnswer:", verbose=self.verbose)
-        input_text += f"{sql_cmd}\nSQLResult: {result}\nAnswer:"
-        llm_inputs["input"] = input_text
-        final_result = llm_chain.predict(**llm_inputs)
-        self.callback_manager.on_text(final_result, color="green", verbose=self.verbose)
+        # If return direct, we just set the final result equal to the sql query
+        if self.return_direct:
+            final_result = result
+        else:
+            self.callback_manager.on_text("\nAnswer:", verbose=self.verbose)
+            input_text += f"{sql_cmd}\nSQLResult: {result}\nAnswer:"
+            llm_inputs["input"] = input_text
+            final_result = llm_chain.predict(**llm_inputs)
+            self.callback_manager.on_text(
+                final_result, color="green", verbose=self.verbose
+            )
        chain_result: Dict[str, Any] = {self.output_key: final_result}
        if self.return_intermediate_steps:
            chain_result["intermediate_steps"] = intermediate_steps
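A sketch of the new `SQLDatabaseChain` flags (the SQLite URI is illustrative; `SQLDatabase.from_uri` is the usual constructor in the docs):

```python
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
db_chain = SQLDatabaseChain(
    llm=OpenAI(temperature=0),
    database=db,
    return_intermediate_steps=True,  # keep the SQL command and raw result
    return_direct=True,              # skip the final LLM answer step
)
out = db_chain("How many employees are there?")
print(out["result"])
print(out["intermediate_steps"])
```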
@@ -26,7 +26,7 @@ PROMPT = PromptTemplate(
    template=_DEFAULT_TEMPLATE,
)

-_DECIDER_TEMPLATE = """Given the below input question and list of potential tables, output a comma separated list of the table names that may be neccessary to answer this question.
+_DECIDER_TEMPLATE = """Given the below input question and list of potential tables, output a comma separated list of the table names that may be necessary to answer this question.

Question: {query}
49 langchain/document_loaders/__init__.py Normal file
@@ -0,0 +1,49 @@
"""All different types of document loaders."""

from langchain.document_loaders.azlyrics import AZLyricsLoader
from langchain.document_loaders.college_confidential import CollegeConfidentialLoader
from langchain.document_loaders.directory import DirectoryLoader
from langchain.document_loaders.docx import UnstructuredDocxLoader
from langchain.document_loaders.email import UnstructuredEmailLoader
from langchain.document_loaders.gcs_directory import GCSDirectoryLoader
from langchain.document_loaders.gcs_file import GCSFileLoader
from langchain.document_loaders.googledrive import GoogleDriveLoader
from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.pdf import UnstructuredPDFLoader
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.roam import RoamLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.unstructured import UnstructuredFileLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.youtube import YoutubeLoader

__all__ = [
    "UnstructuredFileLoader",
    "DirectoryLoader",
    "NotionDirectoryLoader",
    "ReadTheDocsLoader",
    "GoogleDriveLoader",
    "UnstructuredHTMLLoader",
    "UnstructuredPowerPointLoader",
    "UnstructuredPDFLoader",
    "ObsidianLoader",
    "UnstructuredDocxLoader",
    "UnstructuredEmailLoader",
    "RoamLoader",
    "YoutubeLoader",
    "S3FileLoader",
    "S3DirectoryLoader",
    "GCSFileLoader",
    "GCSDirectoryLoader",
    "WebBaseLoader",
    "IMSDbLoader",
    "AZLyricsLoader",
    "CollegeConfidentialLoader",
    "GutenbergLoader",
]
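All of these loaders share the `BaseLoader` surface defined in `base.py` below: `load()` returning `Document`s and `load_and_split()` for chunking. A minimal sketch (the path and glob are illustrative; the Unstructured-based loaders need `pip install unstructured`):

```python
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader("docs/", glob="**/*.pdf")
documents = loader.load()         # -> List[Document], one per parsed file
chunks = loader.load_and_split()  # same, split with RecursiveCharacterTextSplitter
```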
22 langchain/document_loaders/azlyrics.py Normal file
@@ -0,0 +1,22 @@
"""Loader that loads AZLyrics."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.web_base import WebBaseLoader


class AZLyricsLoader(WebBaseLoader):
    """Loader that loads AZLyrics webpages."""

    def __init__(self, web_path: str):
        """Initialize with webpage path."""
        self.web_path = web_path

    def load(self) -> List[Document]:
        """Load webpage."""
        soup = self.scrape()
        title = soup.title.text
        lyrics = soup.find_all("div", {"class": ""})[2].text
        text = title + lyrics
        metadata = {"source": self.web_path}
        return [Document(page_content=text, metadata=metadata)]
26 langchain/document_loaders/base.py Normal file
@@ -0,0 +1,26 @@
"""Base loader class."""

from abc import ABC, abstractmethod
from typing import List, Optional

from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter, TextSplitter


class BaseLoader(ABC):
    """Base loader class."""

    @abstractmethod
    def load(self) -> List[Document]:
        """Load data into document objects."""

    def load_and_split(
        self, text_splitter: Optional[TextSplitter] = None
    ) -> List[Document]:
        """Load documents and split into chunks."""
        if text_splitter is None:
            _text_splitter: TextSplitter = RecursiveCharacterTextSplitter()
        else:
            _text_splitter = text_splitter
        docs = self.load()
        return _text_splitter.split_documents(docs)
20 langchain/document_loaders/college_confidential.py Normal file
@@ -0,0 +1,20 @@
"""Loader that loads College Confidential."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.web_base import WebBaseLoader


class CollegeConfidentialLoader(WebBaseLoader):
    """Loader that loads College Confidential webpages."""

    def __init__(self, web_path: str):
        """Initialize with webpage path."""
        self.web_path = web_path

    def load(self) -> List[Document]:
        """Load webpage."""
        soup = self.scrape()
        text = soup.select_one("main[class='skin-handler']").text
        metadata = {"source": self.web_path}
        return [Document(page_content=text, metadata=metadata)]
26 langchain/document_loaders/directory.py Normal file
@@ -0,0 +1,26 @@
"""Loading logic for loading documents from a directory."""
from pathlib import Path
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.unstructured import UnstructuredFileLoader


class DirectoryLoader(BaseLoader):
    """Loading logic for loading documents from a directory."""

    def __init__(self, path: str, glob: str = "**/*"):
        """Initialize with path to directory and how to glob over it."""
        self.path = path
        self.glob = glob

    def load(self) -> List[Document]:
        """Load documents."""
        p = Path(self.path)
        docs = []
        for i in p.glob(self.glob):
            if i.is_file():
                sub_docs = UnstructuredFileLoader(str(i)).load()
                docs.extend(sub_docs)
        return docs
29 langchain/document_loaders/docx.py Normal file
@@ -0,0 +1,29 @@
"""Loader that loads Microsoft Word files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class UnstructuredDocxLoader(BaseLoader):
    """Loader that uses unstructured to load Microsoft Word files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        try:
            import unstructured  # noqa:F401
        except ImportError:
            raise ValueError(
                "unstructured package not found, please install it with "
                "`pip install unstructured`"
            )
        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from unstructured.partition.docx import partition_docx

        elements = partition_docx(filename=self.file_path)
        text = "\n\n".join([str(el) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
29 langchain/document_loaders/email.py Normal file
@@ -0,0 +1,29 @@
"""Loader that loads email files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class UnstructuredEmailLoader(BaseLoader):
    """Loader that uses unstructured to load email files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        try:
            import unstructured  # noqa:F401
        except ImportError:
            raise ValueError(
                "unstructured package not found, please install it with "
                "`pip install unstructured`"
            )
        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from unstructured.partition.email import partition_email

        elements = partition_email(filename=self.file_path)
        text = "\n\n".join([str(el) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
32 langchain/document_loaders/gcs_directory.py Normal file
@@ -0,0 +1,32 @@
"""Loading logic for loading documents from a GCS directory."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.gcs_file import GCSFileLoader


class GCSDirectoryLoader(BaseLoader):
    """Loading logic for loading documents from GCS."""

    def __init__(self, project_name: str, bucket: str, prefix: str = ""):
        """Initialize with bucket and key name."""
        self.project_name = project_name
        self.bucket = bucket
        self.prefix = prefix

    def load(self) -> List[Document]:
        """Load documents."""
        try:
            from google.cloud import storage
        except ImportError:
            raise ValueError(
                "Could not import google-cloud-storage python package. "
                "Please install it with `pip install google-cloud-storage`."
            )
        client = storage.Client(project=self.project_name)
        docs = []
        for blob in client.list_blobs(self.bucket, prefix=self.prefix):
            loader = GCSFileLoader(self.project_name, self.bucket, blob.name)
            docs.extend(loader.load())
        return docs
40 langchain/document_loaders/gcs_file.py Normal file
@@ -0,0 +1,40 @@
"""Loading logic for loading documents from a GCS file."""
import tempfile
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.unstructured import UnstructuredFileLoader


class GCSFileLoader(BaseLoader):
    """Loading logic for loading documents from GCS."""

    def __init__(self, project_name: str, bucket: str, blob: str):
        """Initialize with bucket and key name."""
        self.bucket = bucket
        self.blob = blob
        self.project_name = project_name

    def load(self) -> List[Document]:
        """Load documents."""
        try:
            from google.cloud import storage
        except ImportError:
            raise ValueError(
                "Could not import google-cloud-storage python package. "
                "Please install it with `pip install google-cloud-storage`."
            )

        # Initialise a client
        storage_client = storage.Client(self.project_name)
        # Create a bucket object for our bucket
        bucket = storage_client.get_bucket(self.bucket)
        # Create a blob object from the filepath
        blob = bucket.blob(self.blob)
        with tempfile.TemporaryDirectory() as temp_dir:
            file_path = f"{temp_dir}/{self.blob}"
            # Download the file to a destination
            blob.download_to_filename(file_path)
            loader = UnstructuredFileLoader(file_path)
            return loader.load()
141 langchain/document_loaders/googledrive.py Normal file
@@ -0,0 +1,141 @@
"""Loader that loads data from Google Drive."""

# Prerequisites:
# 1. Create a Google Cloud project
# 2. Enable the Google Drive API:
#    https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com
# 3. Authorize credentials for desktop app:
#    https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application  # noqa: E501

from pathlib import Path
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, root_validator, validator

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader

SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


class GoogleDriveLoader(BaseLoader, BaseModel):
    """Loader that loads Google Docs from Google Drive."""

    credentials_path: Path = Path.home() / ".credentials" / "credentials.json"
    token_path: Path = Path.home() / ".credentials" / "token.json"
    folder_id: Optional[str] = None
    document_ids: Optional[List[str]] = None

    @root_validator
    def validate_folder_id_or_document_ids(
        cls, values: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Validate that either folder_id or document_ids is set, but not both."""
        if values.get("folder_id") and values.get("document_ids"):
            raise ValueError("Cannot specify both folder_id and document_ids")
        if not values.get("folder_id") and not values.get("document_ids"):
            raise ValueError("Must specify either folder_id or document_ids")
        return values

    @validator("credentials_path")
    def validate_credentials_path(cls, v: Any, **kwargs: Any) -> Any:
        """Validate that credentials_path exists."""
        if not v.exists():
            raise ValueError(f"credentials_path {v} does not exist")
        return v

    def _load_credentials(self) -> Any:
        """Load credentials."""
        # Adapted from https://developers.google.com/drive/api/v3/quickstart/python
        try:
            from google.auth.transport.requests import Request
            from google.oauth2.credentials import Credentials
            from google_auth_oauthlib.flow import InstalledAppFlow
        except ImportError:
            raise ImportError(
                "You must run"
                "`pip install --upgrade "
                "google-api-python-client google-auth-httplib2 "
                "google-auth-oauthlib`"
                "to use the Google Drive loader."
            )

        creds = None
        if self.token_path.exists():
            creds = Credentials.from_authorized_user_file(str(self.token_path), SCOPES)

        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(
                    str(self.credentials_path), SCOPES
                )
                creds = flow.run_local_server(port=0)
            with open(self.token_path, "w") as token:
                token.write(creds.to_json())

        return creds

    def _load_document_from_id(self, id: str) -> Document:
        """Load a document from an ID."""
        from io import BytesIO

        from googleapiclient.discovery import build
        from googleapiclient.http import MediaIoBaseDownload

        creds = self._load_credentials()
        service = build("drive", "v3", credentials=creds)

        request = service.files().export_media(fileId=id, mimeType="text/plain")
        fh = BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
        text = fh.getvalue().decode("utf-8")
        metadata = {"source": f"https://docs.google.com/document/d/{id}/edit"}
        return Document(page_content=text, metadata=metadata)

    def _load_documents_from_folder(self) -> List[Document]:
        """Load documents from a folder."""
        from googleapiclient.discovery import build

        creds = self._load_credentials()
        service = build("drive", "v3", credentials=creds)

        results = (
            service.files()
            .list(
                q=f"'{self.folder_id}' in parents",
                pageSize=1000,
                fields="nextPageToken, files(id, name, mimeType)",
            )
            .execute()
        )
        items = results.get("files", [])

        docs = []
        for item in items:
            # Only support Google Docs for now
            if item["mimeType"] == "application/vnd.google-apps.document":
                docs.append(self._load_document_from_id(item["id"]))
        return docs

    def _load_documents_from_ids(self) -> List[Document]:
        """Load documents from a list of IDs."""
        if not self.document_ids:
            raise ValueError("document_ids must be set")

        docs = []
        for doc_id in self.document_ids:
            docs.append(self._load_document_from_id(doc_id))
        return docs

    def load(self) -> List[Document]:
        """Load documents."""
        if self.folder_id:
            return self._load_documents_from_folder()
        else:
            return self._load_documents_from_ids()
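A usage sketch for the Drive loader (the folder ID is a placeholder; credentials follow the prerequisites listed at the top of the file):

```python
from langchain.document_loaders import GoogleDriveLoader

# Exactly one of folder_id / document_ids may be set (the root_validator enforces this).
loader = GoogleDriveLoader(folder_id="FOLDER_ID_FROM_THE_DRIVE_URL")
docs = loader.load()  # exports each Google Doc in the folder as plain text
```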
28 langchain/document_loaders/gutenberg.py Normal file
@@ -0,0 +1,28 @@
"""Loader that loads .txt web files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class GutenbergLoader(BaseLoader):
    """Loader that uses urllib to load .txt web files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        if not file_path.startswith("https://www.gutenberg.org"):
            raise ValueError("file path must start with 'https://www.gutenberg.org'")

        if not file_path.endswith(".txt"):
            raise ValueError("file path must end with '.txt'")

        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from urllib.request import urlopen

        elements = urlopen(self.file_path)
        text = "\n\n".join([str(el.decode("utf-8-sig")) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
29 langchain/document_loaders/html.py Normal file
@@ -0,0 +1,29 @@
"""Loader that loads HTML files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class UnstructuredHTMLLoader(BaseLoader):
    """Loader that uses unstructured to load HTML files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        try:
            import unstructured  # noqa:F401
        except ImportError:
            raise ValueError(
                "unstructured package not found, please install it with "
                "`pip install unstructured`"
            )
        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from unstructured.partition.html import partition_html

        elements = partition_html(filename=self.file_path)
        text = "\n\n".join([str(el) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
20 langchain/document_loaders/imsdb.py Normal file
@@ -0,0 +1,20 @@
"""Loader that loads IMSDb."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.web_base import WebBaseLoader


class IMSDbLoader(WebBaseLoader):
    """Loader that loads IMSDb webpages."""

    def __init__(self, web_path: str):
        """Initialize with webpage path."""
        self.web_path = web_path

    def load(self) -> List[Document]:
        """Load webpage."""
        soup = self.scrape()
        text = soup.select_one("td[class='scrtext']").text
        metadata = {"source": self.web_path}
        return [Document(page_content=text, metadata=metadata)]
25  langchain/document_loaders/notion.py  Normal file
@@ -0,0 +1,25 @@
"""Loader that loads Notion directory dump."""
from pathlib import Path
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class NotionDirectoryLoader(BaseLoader):
    """Loader that loads Notion directory dump."""

    def __init__(self, path: str):
        """Initialize with path."""
        self.file_path = path

    def load(self) -> List[Document]:
        """Load documents."""
        ps = list(Path(self.file_path).glob("**/*.md"))
        docs = []
        for p in ps:
            with open(p) as f:
                text = f.read()
            metadata = {"source": str(p)}
            docs.append(Document(page_content=text, metadata=metadata))
        return docs
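A sketch of loading an exported Notion workspace (the directory name is hypothetical); the Obsidian and Roam loaders below share this exact pattern, differing only in class name:

```python
from langchain.document_loaders.notion import NotionDirectoryLoader

loader = NotionDirectoryLoader("Notion_DB")  # hypothetical export directory
docs = loader.load()  # one Document per Markdown file, found recursively
```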
25  langchain/document_loaders/obsidian.py  Normal file
@@ -0,0 +1,25 @@
"""Loader that loads Obsidian directory dump."""
from pathlib import Path
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class ObsidianLoader(BaseLoader):
    """Loader that loads Obsidian files from disk."""

    def __init__(self, path: str):
        """Initialize with path."""
        self.file_path = path

    def load(self) -> List[Document]:
        """Load documents."""
        ps = list(Path(self.file_path).glob("**/*.md"))
        docs = []
        for p in ps:
            with open(p) as f:
                text = f.read()
            metadata = {"source": str(p)}
            docs.append(Document(page_content=text, metadata=metadata))
        return docs
29  langchain/document_loaders/pdf.py  Normal file
@@ -0,0 +1,29 @@
"""Loader that loads PDF files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class UnstructuredPDFLoader(BaseLoader):
    """Loader that uses unstructured to load PDF files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        try:
            import unstructured  # noqa:F401
        except ImportError:
            raise ValueError(
                "unstructured package not found, please install it with "
                "`pip install unstructured`"
            )
        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from unstructured.partition.pdf import partition_pdf

        elements = partition_pdf(filename=self.file_path)
        text = "\n\n".join([str(el) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
29  langchain/document_loaders/powerpoint.py  Normal file
@@ -0,0 +1,29 @@
"""Loader that loads powerpoint files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class UnstructuredPowerPointLoader(BaseLoader):
    """Loader that uses unstructured to load powerpoint files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        try:
            import unstructured  # noqa:F401
        except ImportError:
            raise ValueError(
                "unstructured package not found, please install it with "
                "`pip install unstructured`"
            )
        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from unstructured.partition.pptx import partition_pptx

        elements = partition_pptx(filename=self.file_path)
        text = "\n\n".join([str(el) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
37  langchain/document_loaders/readthedocs.py  Normal file
@@ -0,0 +1,37 @@
"""Loader that loads ReadTheDocs documentation directory dump."""
from pathlib import Path
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class ReadTheDocsLoader(BaseLoader):
    """Loader that loads ReadTheDocs documentation directory dump."""

    def __init__(self, path: str):
        """Initialize path."""
        self.file_path = path

    def load(self) -> List[Document]:
        """Load documents."""
        from bs4 import BeautifulSoup

        def _clean_data(data: str) -> str:
            soup = BeautifulSoup(data)
            text = soup.find_all("main", {"id": "main-content"})
            if len(text) != 0:
                text = text[0].get_text()
            else:
                text = ""
            return "\n".join([t for t in text.split("\n") if t])

        docs = []
        for p in Path(self.file_path).rglob("*"):
            if p.is_dir():
                continue
            with open(p) as f:
                text = _clean_data(f.read())
            metadata = {"source": str(p)}
            docs.append(Document(page_content=text, metadata=metadata))
        return docs
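A sketch under the assumption that the documentation was mirrored to disk first; the directory name and mirror command are hypothetical:

```python
from langchain.document_loaders.readthedocs import ReadTheDocsLoader

# Hypothetical local mirror, e.g. produced beforehand with:
#   wget -r -A.html https://langchain.readthedocs.io/en/latest/
loader = ReadTheDocsLoader("rtdocs")
docs = loader.load()  # one cleaned Document per HTML file in the tree
```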
25  langchain/document_loaders/roam.py  Normal file
@@ -0,0 +1,25 @@
"""Loader that loads Roam directory dump."""
from pathlib import Path
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class RoamLoader(BaseLoader):
    """Loader that loads Roam files from disk."""

    def __init__(self, path: str):
        """Initialize with path."""
        self.file_path = path

    def load(self) -> List[Document]:
        """Load documents."""
        ps = list(Path(self.file_path).glob("**/*.md"))
        docs = []
        for p in ps:
            with open(p) as f:
                text = f.read()
            metadata = {"source": str(p)}
            docs.append(Document(page_content=text, metadata=metadata))
        return docs
32  langchain/document_loaders/s3_directory.py  Normal file
@@ -0,0 +1,32 @@
"""Loading logic for loading documents from an s3 directory."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.s3_file import S3FileLoader


class S3DirectoryLoader(BaseLoader):
    """Loading logic for loading documents from s3."""

    def __init__(self, bucket: str, prefix: str = ""):
        """Initialize with bucket and key name."""
        self.bucket = bucket
        self.prefix = prefix

    def load(self) -> List[Document]:
        """Load documents."""
        try:
            import boto3
        except ImportError:
            raise ValueError(
                "Could not import boto3 python package. "
                "Please install it with `pip install boto3`."
            )
        s3 = boto3.resource("s3")
        bucket = s3.Bucket(self.bucket)
        docs = []
        for obj in bucket.objects.filter(Prefix=self.prefix):
            loader = S3FileLoader(self.bucket, obj.key)
            docs.extend(loader.load())
        return docs
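A usage sketch with a hypothetical bucket and prefix; boto3 resolves AWS credentials from the environment as usual:

```python
from langchain.document_loaders.s3_directory import S3DirectoryLoader

# Hypothetical bucket/prefix; each matching object is fetched via S3FileLoader.
loader = S3DirectoryLoader("my-bucket", prefix="invoices")
docs = loader.load()
```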
32  langchain/document_loaders/s3_file.py  Normal file
@@ -0,0 +1,32 @@
"""Loading logic for loading documents from an s3 file."""
import tempfile
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.unstructured import UnstructuredFileLoader


class S3FileLoader(BaseLoader):
    """Loading logic for loading documents from s3."""

    def __init__(self, bucket: str, key: str):
        """Initialize with bucket and key name."""
        self.bucket = bucket
        self.key = key

    def load(self) -> List[Document]:
        """Load documents."""
        try:
            import boto3
        except ImportError:
            raise ValueError(
                "Could not import boto3 python package. "
                "Please install it with `pip install boto3`."
            )
        s3 = boto3.client("s3")
        with tempfile.TemporaryDirectory() as temp_dir:
            file_path = f"{temp_dir}/{self.key}"
            s3.download_file(self.bucket, self.key, file_path)
            loader = UnstructuredFileLoader(file_path)
            return loader.load()
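And the single-object variant, again with hypothetical names; the object is downloaded to a temporary directory and handed to `UnstructuredFileLoader`. Note that the download path is built as `f"{temp_dir}/{self.key}"`, so keys containing slashes would need the intermediate directories to exist:

```python
from langchain.document_loaders.s3_file import S3FileLoader

loader = S3FileLoader("my-bucket", "q1-report.pdf")  # hypothetical bucket/key
docs = loader.load()
```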
29  langchain/document_loaders/unstructured.py  Normal file
@@ -0,0 +1,29 @@
"""Loader that uses unstructured to load files."""
from typing import List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class UnstructuredFileLoader(BaseLoader):
    """Loader that uses unstructured to load files."""

    def __init__(self, file_path: str):
        """Initialize with file path."""
        try:
            import unstructured  # noqa:F401
        except ImportError:
            raise ValueError(
                "unstructured package not found, please install it with "
                "`pip install unstructured`"
            )
        self.file_path = file_path

    def load(self) -> List[Document]:
        """Load file."""
        from unstructured.partition.auto import partition

        elements = partition(filename=self.file_path)
        text = "\n\n".join([str(el) for el in elements])
        metadata = {"source": self.file_path}
        return [Document(page_content=text, metadata=metadata)]
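This is the generic entry point that the HTML, PDF, and PowerPoint loaders above specialize: `partition` dispatches on the detected file type. A sketch with a hypothetical path:

```python
from langchain.document_loaders.unstructured import UnstructuredFileLoader

loader = UnstructuredFileLoader("state_of_the_union.txt")  # hypothetical file
docs = loader.load()
```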
29  langchain/document_loaders/web_base.py  Normal file
@@ -0,0 +1,29 @@
"""Web base loader class."""
from typing import List

import requests
from bs4 import BeautifulSoup

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class WebBaseLoader(BaseLoader):
    """Loader that uses requests and Beautiful Soup to load webpages."""

    def __init__(self, web_path: str):
        """Initialize with webpage path."""
        self.web_path = web_path

    def scrape(self) -> BeautifulSoup:
        """Scrape data from webpage and return it in BeautifulSoup format."""
        html_doc = requests.get(self.web_path)
        soup = BeautifulSoup(html_doc.text, "html.parser")
        return soup

    def load(self) -> List[Document]:
        """Load data into document objects."""
        soup = self.scrape()
        text = soup.get_text()
        metadata = {"source": self.web_path}
        return [Document(page_content=text, metadata=metadata)]
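A sketch of direct use (the URL is illustrative); subclasses such as `IMSDbLoader` above reuse `scrape()` and override only the text extraction:

```python
from langchain.document_loaders.web_base import WebBaseLoader

loader = WebBaseLoader("https://example.com")  # illustrative URL
docs = loader.load()  # full page text via soup.get_text()
```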
76  langchain/document_loaders/youtube.py  Normal file
@@ -0,0 +1,76 @@
"""Loader that loads YouTube transcript."""
from __future__ import annotations

from typing import Any, List

from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader


class YoutubeLoader(BaseLoader):
    """Loader that loads Youtube transcripts."""

    def __init__(self, video_id: str, add_video_info: bool = False):
        """Initialize with YouTube video ID."""
        self.video_id = video_id
        self.add_video_info = add_video_info

    @classmethod
    def from_youtube_url(cls, youtube_url: str, **kwargs: Any) -> YoutubeLoader:
        """Parse out video id from YouTube url."""
        video_id = youtube_url.split("youtube.com/watch?v=")[-1]
        return cls(video_id, **kwargs)

    def load(self) -> List[Document]:
        """Load documents."""
        try:
            from youtube_transcript_api import YouTubeTranscriptApi
        except ImportError:
            raise ImportError(
                "Could not import youtube_transcript_api python package. "
                "Please install it with `pip install youtube-transcript-api`."
            )

        metadata = {"source": self.video_id}

        if self.add_video_info:
            # Get more video meta info
            # Such as title, description, thumbnail url, publish_date
            video_info = self._get_video_info()
            metadata.update(video_info)

        transcript_pieces = YouTubeTranscriptApi.get_transcript(self.video_id)
        transcript = " ".join([t["text"].strip(" ") for t in transcript_pieces])

        return [Document(page_content=transcript, metadata=metadata)]

    def _get_video_info(self) -> dict:
        """Get important video information.

        Components are:
            - title
            - description
            - thumbnail url
            - publish_date
            - channel_author
            - and more.
        """
        try:
            from pytube import YouTube
        except ImportError:
            raise ImportError(
                "Could not import pytube python package. "
                "Please install it with `pip install pytube`."
            )
        yt = YouTube(f"https://www.youtube.com/watch?v={self.video_id}")
        video_info = {
            "title": yt.title,
            "description": yt.description,
            "view_count": yt.views,
            "thumbnail_url": yt.thumbnail_url,
            "publish_date": yt.publish_date,
            "length": yt.length,
            "author": yt.author,
        }
        return video_info
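A usage sketch; the video URL is illustrative, and passing `add_video_info=True` additionally requires pytube as shown in `_get_video_info`:

```python
from langchain.document_loaders.youtube import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # illustrative URL
    add_video_info=True,  # also pulls title, views, etc. via pytube
)
docs = loader.load()  # one Document with the joined transcript
```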
Some files were not shown because too many files have changed in this diff.