Compare commits

..

8 Commits

Author SHA1 Message Date
Harrison Chase
ec65ca00c1 cr 2022-12-04 20:34:53 -08:00
Harrison Chase
64ea17bd21 Merge branch 'master' into harrison/fix_logging_api 2022-12-04 20:29:25 -08:00
Harrison Chase
ec842b7e7b fix logging in api chain 2022-12-04 20:29:19 -08:00
Harrison Chase
bf8bed493f wip logging 2022-12-04 19:31:57 -08:00
Harrison Chase
ad85f3bdbc Merge branch 'master' into harrison/logger 2022-12-04 18:52:10 -08:00
Harrison Chase
c2580cf401 stash 2022-12-04 16:35:13 -08:00
Harrison Chase
7ec210767a Merge branch 'master' into harrison/logger 2022-12-04 08:52:40 -08:00
Harrison Chase
2bef195a1f stash 2022-12-04 08:45:34 -08:00
134 changed files with 1680 additions and 8345 deletions

View File

@@ -1,2 +0,0 @@
[run]
omit = tests/*

View File

@@ -2,35 +2,34 @@ name: lint
on:
push:
branches: [master]
branches: [main]
pull_request:
env:
POETRY_VERSION: "1.3.1"
POETRY_VERSION: "1.2.0"
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
python-version:
- "3.8"
- "3.9"
- "3.10"
steps:
- uses: actions/checkout@v3
- name: Install poetry
run: |
pipx install poetry==$POETRY_VERSION
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: poetry
- name: Install dependencies
run: |
poetry install
- name: Analysing the code with our lint
run: |
make lint
- uses: actions/checkout@v3
- name: Install poetry
run: |
pipx install poetry==$POETRY_VERSION
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: poetry
- name: Install dependencies
run: |
poetry install
- name: Analysing the code with our lint
run: |
make lint

View File

@@ -2,11 +2,11 @@ name: test
on:
push:
branches: [master]
branches: [main]
pull_request:
env:
POETRY_VERSION: "1.3.1"
POETRY_VERSION: "1.2.0"
jobs:
build:
@@ -14,21 +14,20 @@ jobs:
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
- "3.8"
- "3.9"
- "3.10"
steps:
- uses: actions/checkout@v3
- name: Install poetry
run: pipx install poetry==$POETRY_VERSION
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Install dependencies
run: poetry install
- name: Run unit tests
run: |
make tests
- uses: actions/checkout@v3
- name: Install poetry
run: pipx install poetry==$POETRY_VERSION
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: 'poetry'
- name: Install dependencies
run: poetry install
- name: Run unit tests
run: |
make tests

1
.gitignore vendored
View File

@@ -1,5 +1,4 @@
.vscode/
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

View File

@@ -1,154 +0,0 @@
# Contributing to LangChain
Hi there! Thank you for even being interested in contributing to LangChain.
As an open source project in a rapidly developing field, we are extremely open
to contributions, whether it be in the form of a new feature, improved infra, or better documentation.
To contribute to this project, please follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are maintainer.
## 🗺Contributing Guidelines
### 🚩GitHub Issues
Our [issues](https://github.com/hwchase17/langchain/issues) page is kept up to date
with bugs, improvements, and feature requests. There is a taxonomy of labels to help
with sorting and discovery of issues of interest. These include:
- prompts: related to prompt tooling/infra.
- llms: related to LLM wrappers/tooling/infra.
- chains
- utilities: related to different types of utilities to integrate with (Python, SQL, etc.).
- agents
- memory
- applications: related to example applications to build
If you start working on an issue, please assign it to yourself.
If you are adding an issue, please try to keep it focused on a single modular bug/improvement/feature.
If the two issues are related, or blocking, please link them rather than keep them as one single one.
We will try to keep these issues as up to date as possible, though
with the rapid rate of develop in this field some may get out of date.
If you notice this happening, please just let us know.
### 🙋Getting Help
Although we try to have a developer setup to make it as easy as possible for others to contribute (see below)
it is possible that some pain point may arise around environment setup, linting, documentation, or other.
Should that occur, please contact a maintainer! Not only do we want to help get you unblocked,
but we also want to make sure that the process is smooth for future contributors.
In a similar vein, we do enforce certain linting, formatting, and documentation standards in the codebase.
If you are finding these difficult (or even just annoying) to work with,
feel free to contact a maintainer for help - we do not want these to get in the way of getting
good code into the codebase.
### 🏭Release process
As of now, LangChain has an ad hoc release process: releases are cut with high frequency via by
a developer and published to [PyPI](https://pypi.org/project/ruff/).
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.
## 🤖Developer Setup
### 🚀Quick Start
This project uses [Poetry](https://python-poetry.org/) as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
To install requirements:
```bash
poetry install -E all
```
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage. Note the `-E all` flag will install all optional dependencies necessary for integration testing.
Now, you should be able to run the common tasks in the following section.
### ✅Common Tasks
#### Code Formatting
Formatting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/) and [isort](https://pycqa.github.io/isort/).
To run formatting for this project:
```bash
make format
```
#### Linting
Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](http://mypy-lang.org/).
To run linting for this project:
```bash
make lint
```
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
#### Coverage
Code coverage (i.e. the amount of code that is covered by unit tests) helps identify areas of the code that are potentially more or less brittle.
To get a report of current coverage, run the following:
```bash
make coverage
```
#### Testing
Unit tests cover modular logic that does not require calls to outside APIs.
To run unit tests:
```bash
make tests
```
If you add new logic, please add a unit test.
Integration tests cover logic that requires making calls to outside APIs (often integration with other services).
To run integration tests:
```bash
make integration_tests
```
If you add support for a new external API, please add a new integration test.
#### Adding a Jupyter Notebook
If you are adding a Jupyter notebook example, you'll want to install the optional `dev` dependencies.
To install dev dependencies:
```bash
poetry install --with dev
```
Launch a notebook:
```bash
poetry run jupyter notebook
```
When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
#### Contribute Documentation
Docs are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
For that reason, we ask that you add good documentation to all classes and methods.
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.

View File

@@ -1,11 +1,5 @@
.PHONY: format lint tests integration_tests
coverage:
poetry run pytest --cov \
--cov-config=.coveragerc \
--cov-report xml \
--cov-report term-missing:skip-covered
format:
poetry run black .
poetry run isort .

167
README.md
View File

@@ -13,45 +13,176 @@
Large language models (LLMs) are emerging as a transformative technology, enabling
developers to build applications that they previously could not.
But using these LLMs in isolation is often not enough to
create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
create a truly powerful app - the real power comes when you are able to
combine them with other sources of computation or knowledge.
This library is aimed at assisting in the development of those types of applications.
## 📖 Documentation
Please see [here](https://langchain.readthedocs.io/en/latest/?) for full documentation on:
- Getting started (installation, setting up the environment, simple examples)
- Getting started (installation, setting up environment, simple examples)
- How-To examples (demos, integrations, helper functions)
- Reference (full API docs)
Resources (high-level explanation of core concepts)
- Resources (high level explanation of core concepts)
## 🚀 What can this help with?
There are four main areas that LangChain is designed to help with.
There are three main areas (with a forth coming soon) that LangChain is designed to help with.
These are, in increasing order of complexity:
1. LLM and Prompts
2. Chains
3. Agents
4. Memory
**📃 LLMs and Prompts:**
Let's go through these categories and for each one identify key concepts (to clarify terminology) as well as the problems in this area LangChain helps solve.
This includes prompt management, prompt optimization, generic interface for all LLMs, and common utilities for working with LLMs.
### LLMs and Prompts
Calling out to an LLM once is pretty easy, with most of them being behind well documented APIs.
However, there are still some challenges going from that to an application running in production that LangChain attempts to address.
**🔗 Chains:**
**Key Concepts**
- LLM: A large language model, in particular a text-to-text model.
- Prompt: The input to a language model. Typically this is not simply a hardcoded string but rather a combination of a template, some examples, and user input.
- Prompt Template: An object responsible for constructing the final prompt to pass to a LLM.
- Examples: Datapoints that can be included in the prompt in order to give the model more context what to do.
- Few Shot Prompt Template: A subclass of the PromptTemplate class that uses examples.
- Example Selector: A class responsible to selecting examples to use dynamically (depending on user input) in a few shot prompt.
Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
**Problems Solved**
- Switching costs: by exposing a standard interface for all the top LLM providers, LangChain makes it easy to switch from one provider to another, whether it be for production use cases or just for testing stuff out.
- Prompt management: managing your prompts is easy when you only have one simple one, but can get tricky when you have a bunch or when they start to get more complex. LangChain provides a standard way for storing, constructing, and referencing prompts.
- Prompt optimization: despite the underlying models getting better and better, there is still currently a need for carefully constructing prompts.
**🤖 Agents:**
### Chains
Using an LLM in isolation is fine for some simple applications, but many more complex ones require chaining LLMs - either with eachother or with other experts.
LangChain provides several parts to help with that.
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.
**Key Concepts**
- Tools: APIs designed for assisting with a particular use case (search, databases, Python REPL, etc). Prompt templates, LLMs, and chains can also be considered tools.
- Chains: A combination of multiple tools in a deterministic manner.
**🧠 Memory:**
**Problems Solved**
- Standard interface for working with Chains
- Easy way to construct chains of LLMs
- Lots of integrations with other tools that you may want to use in conjunction with LLMs
- End-to-end chains for common workflows (database question/answer, recursive summarization, etc)
Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
### Agents
Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user input.
In these types of chains, there is a “agent” which has access to a suite of tools.
Depending on the user input, the agent can then decide which, if any, of these tools to call.
For more information on these concepts, please see our [full documentation](https://langchain.readthedocs.io/en/latest/?).
**Key Concepts**
- Tools: same as above.
- Agent: An LLM-powered class responsible for determining which tools to use and in what order.
## 💁 Contributing
As an open source project in a rapidly developing field, we are extremely open
to contributions, whether it be in the form of a new feature, improved infra, or better documentation.
**Problems Solved**
- Standard agent interfaces
- A selection of powerful agents to choose from
- Common chains that can be used as tools
For detailed information on how to contribute, see [here](CONTRIBUTING.md).
### Memory
By default, Chains and Agents are stateless, meaning that they treat each incoming query independently.
In some applications (chatbots being a GREAT example) it is highly important to remember previous interactions,
both at a short term but also at a long term level. The concept of "Memory" exists to do exactly that.
**Key Concepts**
- Memory: A class that can be added to an Agent or Chain to (1) pull in memory variables before calling that chain/agent, and (2) create new memories after the chain/agent finishes.
- Memory Variables: Variables returned from a Memory class, to be passed into the chain/agent along with the user input.
**Problems Solved**
- Standard memory interfaces
- A collection of common memory implementations to choose from
- Common chains/agents that use memory (e.g. chatbots)
## 🤖 Developer Guide
To begin developing on this project, first clone the repo locally.
### Quick Start
This project uses [Poetry](https://python-poetry.org/) as a dependency manager. Check out Poetry's own [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
To install requirements:
```bash
poetry install -E all
```
This will install all requirements for running the package, examples, linting, formatting, and tests. Note the `-E all` flag will install all optional dependencies necessary for integration testing.
Now, you should be able to run the common tasks in the following section.
### Common Tasks
#### Code Formatting
Formatting for this project is a combination of [Black](https://black.readthedocs.io/en/stable/) and [isort](https://pycqa.github.io/isort/).
To run formatting for this project:
```bash
make format
```
#### Linting
Linting for this project is a combination of [Black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](http://mypy-lang.org/).
To run linting for this project:
```bash
make lint
```
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer and they can help you with it. We do not want this to be a blocker for good code getting contributed.
#### Testing
Unit tests cover modular logic that does not require calls to outside apis.
To run unit tests:
```bash
make tests
```
If you add new logic, please add a unit test.
Integration tests cover logic that requires making calls to outside APIs (often integration with other services).
To run integration tests:
```bash
make integration_tests
```
If you add support for a new external API, please add a new integration test.
#### Adding a Jupyter Notebook
If you are adding a Jupyter notebook example, you'll want to install the optional `dev` dependencies.
To install dev dependencies:
```bash
poetry install --with dev
```
Launch a notebook:
```bash
poetry run jupyter notebook
```
When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
#### Contribute Documentation
Docs are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
For that reason, we ask that you add good documentation to all classes and methods.
Similar to linting, we recognize documentation can be annoying - if you do not want to do it, please contact a project maintainer and they can help you with it. We do not want this to be a blocker for good code getting contributed.

View File

@@ -224,7 +224,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -46,7 +46,7 @@
" Tool(\n",
" name = \"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events. You should ask targeted questions\"\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" ),\n",
" Tool(\n",
" name=\"Calculator\",\n",
@@ -56,7 +56,7 @@
" Tool(\n",
" name=\"FooBar DB\",\n",
" func=db_chain.run,\n",
" description=\"useful for when you need to answer questions about FooBar. Input should be in the form of a question containing full context\"\n",
" description=\"useful for when you need to answer questions about FooBar. Input should be in the form of a question\"\n",
" )\n",
"]"
]
@@ -81,44 +81,40 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ZeroShotAgent chain...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
"What is the age of Olivia Wilde's boyfriend raised to the 0.23 power?\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find the age of Olivia Wilde's boyfriend\n",
"Action: Search\n",
"Action Input: \"Who is Olivia Wilde's boyfriend?\"\u001b[0m\n",
"Action Input: \"Olivia Wilde's boyfriend\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mOlivia Wilde started dating Harry Styles after ending her years-long engagement to Jason Sudeikis — see their relationship timeline.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Harry Styles' age.\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find the age of Harry Styles\n",
"Action: Search\n",
"Action Input: \"How old is Harry Styles?\"\u001b[0m\n",
"Action Input: \"Harry Styles age\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m28 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 28 raised to the 0.23 power.\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 28 to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 28^0.23\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"28^0.23\u001b[32;1m\u001b[1;3m\n",
"\n",
"```python\n",
"import math\n",
"print(math.pow(28, 0.23))\n",
"print(28**0.23)\n",
"```\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m2.1520202182226886\n",
"\u001b[0m\n",
"\u001b[1m> Finished LLMMathChain chain.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 2.1520202182226886\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 28 years old and his age raised to the 0.23 power is 2.1520202182226886.\u001b[0m\n",
"\u001b[1m> Finished ZeroShotAgent chain.\u001b[0m\n"
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: 2.1520202182226886\u001b[0m"
]
},
{
"data": {
"text/plain": [
"\"Harry Styles, Olivia Wilde's boyfriend, is 28 years old and his age raised to the 0.23 power is 2.1520202182226886.\""
"'2.1520202182226886'"
]
},
"execution_count": 4,
@@ -127,7 +123,7 @@
}
],
"source": [
"mrkl.run(\"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\")"
"mrkl.run(\"What is the age of Olivia Wilde's boyfriend raised to the 0.23 power?\")"
]
},
{
@@ -140,34 +136,43 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ZeroShotAgent chain...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out the artist's full name and then search the FooBar database for their albums.\n",
"Who recently released an album called 'The Storm Before the Calm' and are they in the FooBar database? If so, what albums of theirs are in the FooBar database?\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find an album called 'The Storm Before the Calm'\n",
"Action: Search\n",
"Action Input: \"The Storm Before the Calm\" artist\u001b[0m\n",
"Action Input: \"The Storm Before the Calm album\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mThe Storm Before the Calm (stylized in all lowercase) is the tenth (and eighth international) studio album by Canadian-American singer-songwriter Alanis ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now need to search the FooBar database for Alanis Morissette's albums\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to check if Alanis is in the FooBar database\n",
"Action: FooBar DB\n",
"Action Input: What albums by Alanis Morissette are in the FooBar database?\u001b[0m\n",
"Action Input: \"Does Alanis Morissette exist in the FooBar database?\"\u001b[0m\n",
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"What albums by Alanis Morissette are in the FooBar database? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Title FROM Album INNER JOIN Artist ON Album.ArtistId = Artist.ArtistId WHERE Artist.Name = 'Alanis Morissette';\u001b[0m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Does Alanis Morissette exist in the FooBar database?\n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT * FROM Artist WHERE Name = 'Alanis Morissette'\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(4, 'Alanis Morissette')]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Yes\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[38;5;200m\u001b[1;3m Yes\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out what albums of Alanis's are in the FooBar database\n",
"Action: FooBar DB\n",
"Action Input: \"What albums by Alanis Morissette are in the FooBar database?\"\u001b[0m\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"What albums by Alanis Morissette are in the FooBar database?\n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Album.Title FROM Album JOIN Artist ON Album.ArtistId = Artist.ArtistId WHERE Artist.Name = 'Alanis Morissette'\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[('Jagged Little Pill',)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m The album 'Jagged Little Pill' by Alanis Morissette is in the FooBar database.\u001b[0m\n",
"\u001b[1m> Finished SQLDatabaseChain chain.\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Jagged Little Pill\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[38;5;200m\u001b[1;3m The album 'Jagged Little Pill' by Alanis Morissette is in the FooBar database.\u001b[0m\n",
"Observation: \u001b[38;5;200m\u001b[1;3m Jagged Little Pill\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Alanis Morissette's album 'Jagged Little Pill' is in the FooBar database.\u001b[0m\n",
"\u001b[1m> Finished ZeroShotAgent chain.\u001b[0m\n"
"Final Answer: The album is by Alanis Morissette and the albums in the FooBar database by her are Jagged Little Pill\u001b[0m"
]
},
{
"data": {
"text/plain": [
"\"Alanis Morissette's album 'Jagged Little Pill' is in the FooBar database.\""
"'The album is by Alanis Morissette and the albums in the FooBar database by her are Jagged Little Pill'"
]
},
"execution_count": 5,
@@ -176,13 +181,13 @@
}
],
"source": [
"mrkl.run(\"What is the full name of the artist who recently released an album called 'The Storm Before the Calm' and are they in the FooBar database? If so, what albums of theirs are in the FooBar database?\")"
"mrkl.run(\"Who recently released an album called 'The Storm Before the Calm' and are they in the FooBar database? If so, what albums of theirs are in the FooBar database?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af016a70",
"id": "d7c2e6ac",
"metadata": {},
"outputs": [],
"source": []
@@ -204,7 +209,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 6,
"id": "4e272b47",
"metadata": {},
"outputs": [],
@@ -38,7 +38,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 7,
"id": "8078c8f1",
"metadata": {},
"outputs": [
@@ -49,6 +49,7 @@
"\n",
"\n",
"\u001b[1m> Entering new ReActDocstoreAgent chain...\u001b[0m\n",
"Author David Chanoff has collaborated with a U.S. Navy admiral who served as the ambassador to the United Kingdom under which President?\n",
"Thought 1:\u001b[32;1m\u001b[1;3m I need to search David Chanoff and find the U.S. Navy admiral he collaborated\n",
"with.\n",
"Action 1: Search[David Chanoff]\u001b[0m\n",
@@ -67,7 +68,7 @@
"'Bill Clinton'"
]
},
"execution_count": 2,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "7e3b513e",
"metadata": {},
"outputs": [
@@ -23,7 +23,8 @@
"\n",
"\n",
"\u001b[1m> Entering new SelfAskWithSearchAgent chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mAre follow up questions needed here: Yes.\n",
"What is the hometown of the reigning men's U.S. Open champion?\n",
"Are follow up questions needed here:\u001b[32;1m\u001b[1;3m Yes.\n",
"Follow up: Who is the reigning men's U.S. Open champion?\u001b[0m\n",
"Intermediate answer: \u001b[36;1m\u001b[1;3mCarlos Alcaraz\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mFollow up: Where is Carlos Alcaraz from?\u001b[0m\n",
@@ -38,7 +39,7 @@
"'El Palmar, Spain'"
]
},
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
@@ -57,6 +58,7 @@
"]\n",
"\n",
"self_ask_with_search = initialize_agent(tools, llm, agent=\"self-ask-with-search\", verbose=True)\n",
"\n",
"self_ask_with_search.run(\"What is the hometown of the reigning men's U.S. Open champion?\")"
]
},
@@ -85,7 +87,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -52,32 +52,15 @@ With these primitives in mind, the following chains exist:
**Vector Database Question-Answering**
- **Links Used**: Vectorstore, LLMChain
- **Notes**: This chain takes user input (a question), uses the Vectorstore and semantic search to find relevant documents, and then passes the documents plus the original question to another LLM to generate a final answer.
- **Notes**: This chain takes user input (a question), uses the Vectorstore and semantic search to find relevant documents, and then passes the documents plus to the original question to another LLM to generate a final answer.
- `Example Notebook <chains/vector_db_qa.ipynb>`_
**Vector Database Question-Answering With Sources**
- **Links Used**: Vectorstore, LLMChain
- **Notes**: This chain takes user input (a question), uses the Vectorstore and semantic search to find relevant documents, and then passes the documents plus the original question to another LLM to generate a final answer with sources.
- `Example Notebook <chains/vector_db_qa_with_sources.ipynb>`_
**Question-Answering With Sources**
- **Links Used**: LLMChain
- **Notes**: These types of chains take a question and multiple documents as input, and return an answer plus sources for where that answer came from. There are multiple underlying types of chains to do this, for more information see TODO.
- `Example Notebook <chains/qa_with_sources.ipynb>`_
- **Notes**: This chain takes a question and multiple documents as input. It then runs a first LLMChain over all documents attempting to answer the provided question. It then runs a second LLMChain over the results of the first pass, combining the answers from documents into a single response that is returned.
- `Example Notebook <chains/combine_documents.ipynb>`_
**Question-Answering**
- **Links Used**: LLMChain
- **Notes**: These types of chains take a question and multiple documents as input, and return an answer. There are multiple underlying types of chains to do this, for more information see TODO.
- `Example Notebook <chains/question_answering.ipynb>`_
**Summarization**
- **Links Used**: LLMChain
- **Notes**: These types of chains take multiple documents as input, and return a summary of all documents. There are multiple underlying types of chains to do this, for more information see TODO.
- `Example Notebook <chains/summarize.ipynb>`_
.. toctree::
:maxdepth: 1

View File

@@ -1,971 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b253f4d5",
"metadata": {},
"source": [
"# ChatGPT Clone\n",
"\n",
"This chain replicates ChatGPT by combining (1) a specific prompt, and (2) the concept of memory.\n",
"\n",
"Shows off the example as in https://www.engraved.blog/building-a-virtual-machine-inside/"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "a99acd89",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"\n",
"Human: I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"```\n",
"$ pwd\n",
"/\n",
"```\n"
]
}
],
"source": [
"from langchain import OpenAI, ConversationChain, LLMChain, PromptTemplate\n",
"from langchain.chains.conversation.memory import ConversationalBufferWindowMemory\n",
"\n",
"\n",
"template = \"\"\"Assistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"{history}\n",
"Human: {human_input}\n",
"Assistant:\"\"\"\n",
"\n",
"prompt = PromptTemplate(\n",
" input_variables=[\"history\", \"human_input\"], \n",
" template=template\n",
")\n",
"\n",
"\n",
"chatgpt_chain = LLMChain(\n",
" llm=OpenAI(temperature=0), \n",
" prompt=prompt, \n",
" verbose=True, \n",
" memory=ConversationalBufferWindowMemory(k=2),\n",
")\n",
"\n",
"output = chatgpt_chain.predict(human_input=\"I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "4ef711d6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\n",
"AI: \n",
"```\n",
"$ pwd\n",
"/\n",
"```\n",
"Human: ls ~\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"```\n",
"$ ls ~\n",
"Desktop Documents Downloads Music Pictures Public Templates Videos\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"ls ~\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "a5d6dac2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\n",
"AI: \n",
"```\n",
"$ pwd\n",
"/\n",
"```\n",
"Human: ls ~\n",
"AI: \n",
"```\n",
"$ ls ~\n",
"Desktop Documents Downloads Music Pictures Public Templates Videos\n",
"```\n",
"Human: cd ~\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
" \n",
"```\n",
"$ cd ~\n",
"$ pwd\n",
"/home/user\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"cd ~\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "b9283077",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: ls ~\n",
"AI: \n",
"```\n",
"$ ls ~\n",
"Desktop Documents Downloads Music Pictures Public Templates Videos\n",
"```\n",
"Human: cd ~\n",
"AI: \n",
"```\n",
"$ cd ~\n",
"$ pwd\n",
"/home/user\n",
"```\n",
"Human: {Please make a file jokes.txt inside and put some jokes inside}\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ touch jokes.txt\n",
"$ echo \"Why did the chicken cross the road? To get to the other side!\" >> jokes.txt\n",
"$ echo \"What did the fish say when it hit the wall? Dam!\" >> jokes.txt\n",
"$ echo \"Why did the scarecrow win the Nobel Prize? Because he was outstanding in his field!\" >> jokes.txt\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"{Please make a file jokes.txt inside and put some jokes inside}\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "570e785e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: cd ~\n",
"AI: \n",
"```\n",
"$ cd ~\n",
"$ pwd\n",
"/home/user\n",
"```\n",
"Human: {Please make a file jokes.txt inside and put some jokes inside}\n",
"AI: \n",
"\n",
"```\n",
"$ touch jokes.txt\n",
"$ echo \"Why did the chicken cross the road? To get to the other side!\" >> jokes.txt\n",
"$ echo \"What did the fish say when it hit the wall? Dam!\" >> jokes.txt\n",
"$ echo \"Why did the scarecrow win the Nobel Prize? Because he was outstanding in his field!\" >> jokes.txt\n",
"```\n",
"Human: echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py && python3 run.py\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py\n",
"$ python3 run.py\n",
"Result: 33\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"\"\"echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py && python3 run.py\"\"\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "cd0a23d9",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: {Please make a file jokes.txt inside and put some jokes inside}\n",
"AI: \n",
"\n",
"```\n",
"$ touch jokes.txt\n",
"$ echo \"Why did the chicken cross the road? To get to the other side!\" >> jokes.txt\n",
"$ echo \"What did the fish say when it hit the wall? Dam!\" >> jokes.txt\n",
"$ echo \"Why did the scarecrow win the Nobel Prize? Because he was outstanding in his field!\" >> jokes.txt\n",
"```\n",
"Human: echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py && python3 run.py\n",
"AI: \n",
"\n",
"```\n",
"$ echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py\n",
"$ python3 run.py\n",
"Result: 33\n",
"```\n",
"Human: echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py && python3 run.py\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py\n",
"$ python3 run.py\n",
"[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"\"\"echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py && python3 run.py\"\"\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "90db6eb2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py && python3 run.py\n",
"AI: \n",
"\n",
"```\n",
"$ echo -e \"x=lambda y:y*5+3;print('Result:' + str(x(6)))\" > run.py\n",
"$ python3 run.py\n",
"Result: 33\n",
"```\n",
"Human: echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py && python3 run.py\n",
"AI: \n",
"\n",
"```\n",
"$ echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py\n",
"$ python3 run.py\n",
"[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]\n",
"```\n",
"Human: echo -e \"echo 'Hello from Docker\" > entrypoint.sh && echo -e \"FROM ubuntu:20.04\n",
"COPY entrypoint.sh entrypoint.sh\n",
"ENTRYPOINT [\"/bin/sh\",\"entrypoint.sh\"]\">Dockerfile && docker build . -t my_docker_image && docker run -t my_docker_image\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ echo -e \"echo 'Hello from Docker\" > entrypoint.sh\n",
"$ echo -e \"FROM ubuntu:20.04\n",
"COPY entrypoint.sh entrypoint.sh\n",
"ENTRYPOINT [\"/bin/sh\",\"entrypoint.sh\"]\">Dockerfile\n",
"$ docker build . -t my_docker_image\n",
"$ docker run -t my_docker_image\n",
"Hello from Docker\n",
"```\n"
]
}
],
"source": [
"docker_input = \"\"\"echo -e \"echo 'Hello from Docker\" > entrypoint.sh && echo -e \"FROM ubuntu:20.04\\nCOPY entrypoint.sh entrypoint.sh\\nENTRYPOINT [\\\"/bin/sh\\\",\\\"entrypoint.sh\\\"]\">Dockerfile && docker build . -t my_docker_image && docker run -t my_docker_image\"\"\"\n",
"output = chatgpt_chain.predict(human_input=docker_input)\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "c3806f89",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py && python3 run.py\n",
"AI: \n",
"\n",
"```\n",
"$ echo -e \"print(list(filter(lambda x: all(x%d for d in range(2,x)),range(2,3**10)))[:10])\" > run.py\n",
"$ python3 run.py\n",
"[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]\n",
"```\n",
"Human: echo -e \"echo 'Hello from Docker\" > entrypoint.sh && echo -e \"FROM ubuntu:20.04\n",
"COPY entrypoint.sh entrypoint.sh\n",
"ENTRYPOINT [\"/bin/sh\",\"entrypoint.sh\"]\">Dockerfile && docker build . -t my_docker_image && docker run -t my_docker_image\n",
"AI: \n",
"\n",
"```\n",
"$ echo -e \"echo 'Hello from Docker\" > entrypoint.sh\n",
"$ echo -e \"FROM ubuntu:20.04\n",
"COPY entrypoint.sh entrypoint.sh\n",
"ENTRYPOINT [\"/bin/sh\",\"entrypoint.sh\"]\">Dockerfile\n",
"$ docker build . -t my_docker_image\n",
"$ docker run -t my_docker_image\n",
"Hello from Docker\n",
"```\n",
"Human: nvidia-smi\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ nvidia-smi\n",
"Sat May 15 21:45:02 2021 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"|===============================+======================+======================|\n",
"| 0 GeForce GTX 108... Off | 00000000:01:00.0 Off | N/A |\n",
"| N/A 45C P0 N/A / N/A | 511MiB / 10206MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: GPU Memory |\n",
"| GPU PID Type Process name Usage |\n",
"|=============================================================================|\n",
"\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"nvidia-smi\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "f508f597",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: echo -e \"echo 'Hello from Docker\" > entrypoint.sh && echo -e \"FROM ubuntu:20.04\n",
"COPY entrypoint.sh entrypoint.sh\n",
"ENTRYPOINT [\"/bin/sh\",\"entrypoint.sh\"]\">Dockerfile && docker build . -t my_docker_image && docker run -t my_docker_image\n",
"AI: \n",
"\n",
"```\n",
"$ echo -e \"echo 'Hello from Docker\" > entrypoint.sh\n",
"$ echo -e \"FROM ubuntu:20.04\n",
"COPY entrypoint.sh entrypoint.sh\n",
"ENTRYPOINT [\"/bin/sh\",\"entrypoint.sh\"]\">Dockerfile\n",
"$ docker build . -t my_docker_image\n",
"$ docker run -t my_docker_image\n",
"Hello from Docker\n",
"```\n",
"Human: nvidia-smi\n",
"AI: \n",
"\n",
"```\n",
"$ nvidia-smi\n",
"Sat May 15 21:45:02 2021 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"|===============================+======================+======================|\n",
"| 0 GeForce GTX 108... Off | 00000000:01:00.0 Off | N/A |\n",
"| N/A 45C P0 N/A / N/A | 511MiB / 10206MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: GPU Memory |\n",
"| GPU PID Type Process name Usage |\n",
"|=============================================================================|\n",
"\n",
"Human: ping bbc.com\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ ping bbc.com\n",
"PING bbc.com (151.101.65.81): 56 data bytes\n",
"64 bytes from 151.101.65.81: icmp_seq=0 ttl=53 time=14.945 ms\n",
"64 bytes from 151.101.65.81: icmp_seq=1 ttl=53 time=14.945 ms\n",
"64 bytes from 151.101.65.81: icmp_seq=2 ttl=53 time=14.945 ms\n",
"\n",
"--- bbc.com ping statistics ---\n",
"3 packets transmitted, 3 packets received, 0.0% packet loss\n",
"round-trip min/avg/max/stddev = 14.945/14.945/14.945/0.000 ms\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"ping bbc.com\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "cbd607f4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: nvidia-smi\n",
"AI: \n",
"\n",
"```\n",
"$ nvidia-smi\n",
"Sat May 15 21:45:02 2021 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"|===============================+======================+======================|\n",
"| 0 GeForce GTX 108... Off | 00000000:01:00.0 Off | N/A |\n",
"| N/A 45C P0 N/A / N/A | 511MiB / 10206MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: GPU Memory |\n",
"| GPU PID Type Process name Usage |\n",
"|=============================================================================|\n",
"\n",
"Human: ping bbc.com\n",
"AI: \n",
"\n",
"```\n",
"$ ping bbc.com\n",
"PING bbc.com (151.101.65.81): 56 data bytes\n",
"64 bytes from 151.101.65.81: icmp_seq=0 ttl=53 time=14.945 ms\n",
"64 bytes from 151.101.65.81: icmp_seq=1 ttl=53 time=14.945 ms\n",
"64 bytes from 151.101.65.81: icmp_seq=2 ttl=53 time=14.945 ms\n",
"\n",
"--- bbc.com ping statistics ---\n",
"3 packets transmitted, 3 packets received, 0.0% packet loss\n",
"round-trip min/avg/max/stddev = 14.945/14.945/14.945/0.000 ms\n",
"```\n",
"Human: curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\n",
"1.8.1\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"\"\"curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\"\"\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "d33e0e28",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: ping bbc.com\n",
"AI: \n",
"\n",
"```\n",
"$ ping bbc.com\n",
"PING bbc.com (151.101.65.81): 56 data bytes\n",
"64 bytes from 151.101.65.81: icmp_seq=0 ttl=53 time=14.945 ms\n",
"64 bytes from 151.101.65.81: icmp_seq=1 ttl=53 time=14.945 ms\n",
"64 bytes from 151.101.65.81: icmp_seq=2 ttl=53 time=14.945 ms\n",
"\n",
"--- bbc.com ping statistics ---\n",
"3 packets transmitted, 3 packets received, 0.0% packet loss\n",
"round-trip min/avg/max/stddev = 14.945/14.945/14.945/0.000 ms\n",
"```\n",
"Human: curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\n",
"AI: \n",
"\n",
"```\n",
"$ curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\n",
"1.8.1\n",
"```\n",
"Human: lynx https://www.deepmind.com/careers\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ lynx https://www.deepmind.com/careers\n",
"DeepMind Careers\n",
"\n",
"Welcome to DeepMind Careers. We are a world-leading artificial intelligence research and development company, and we are looking for talented people to join our team.\n",
"\n",
"We offer a range of exciting opportunities in research, engineering, product, and operations. Our mission is to solve intelligence and make it useful, and we are looking for people who share our passion for pushing the boundaries of AI.\n",
"\n",
"Explore our current openings and apply today. We look forward to hearing from you.\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"lynx https://www.deepmind.com/careers\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "57c2f113",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\n",
"AI: \n",
"\n",
"```\n",
"$ curl -fsSL \"https://api.github.com/repos/pytorch/pytorch/releases/latest\" | jq -r '.tag_name' | sed 's/[^0-9\\.\\-]*//g'\n",
"1.8.1\n",
"```\n",
"Human: lynx https://www.deepmind.com/careers\n",
"AI: \n",
"\n",
"```\n",
"$ lynx https://www.deepmind.com/careers\n",
"DeepMind Careers\n",
"\n",
"Welcome to DeepMind Careers. We are a world-leading artificial intelligence research and development company, and we are looking for talented people to join our team.\n",
"\n",
"We offer a range of exciting opportunities in research, engineering, product, and operations. Our mission is to solve intelligence and make it useful, and we are looking for people who share our passion for pushing the boundaries of AI.\n",
"\n",
"Explore our current openings and apply today. We look forward to hearing from you.\n",
"```\n",
"Human: curl https://chat.openai.com/chat\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
" \n",
"\n",
"```\n",
"$ curl https://chat.openai.com/chat\n",
"<html>\n",
" <head>\n",
" <title>OpenAI Chat</title>\n",
" </head>\n",
" <body>\n",
" <h1>Welcome to OpenAI Chat!</h1>\n",
" <p>\n",
" OpenAI Chat is a natural language processing platform that allows you to interact with OpenAI's AI models in a conversational way.\n",
" </p>\n",
" <p>\n",
" To get started, type a message in the box below and press enter.\n",
" </p>\n",
" </body>\n",
"</html>\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"curl https://chat.openai.com/chat\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "babadc78",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: lynx https://www.deepmind.com/careers\n",
"AI: \n",
"\n",
"```\n",
"$ lynx https://www.deepmind.com/careers\n",
"DeepMind Careers\n",
"\n",
"Welcome to DeepMind Careers. We are a world-leading artificial intelligence research and development company, and we are looking for talented people to join our team.\n",
"\n",
"We offer a range of exciting opportunities in research, engineering, product, and operations. Our mission is to solve intelligence and make it useful, and we are looking for people who share our passion for pushing the boundaries of AI.\n",
"\n",
"Explore our current openings and apply today. We look forward to hearing from you.\n",
"```\n",
"Human: curl https://chat.openai.com/chat\n",
"AI: \n",
"\n",
"```\n",
"$ curl https://chat.openai.com/chat\n",
"<html>\n",
" <head>\n",
" <title>OpenAI Chat</title>\n",
" </head>\n",
" <body>\n",
" <h1>Welcome to OpenAI Chat!</h1>\n",
" <p>\n",
" OpenAI Chat is a natural language processing platform that allows you to interact with OpenAI's AI models in a conversational way.\n",
" </p>\n",
" <p>\n",
" To get started, type a message in the box below and press enter.\n",
" </p>\n",
" </body>\n",
"</html>\n",
"```\n",
"Human: curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"What is artificial intelligence?\"}' https://chat.openai.com/chat\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
"\n",
"\n",
"```\n",
"$ curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"What is artificial intelligence?\"}' https://chat.openai.com/chat\n",
"\n",
"{\n",
" \"response\": \"Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions) and self-correction. AI is used to develop computer systems that can think and act like humans.\"\n",
"}\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"\"\"curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"What is artificial intelligence?\"}' https://chat.openai.com/chat\"\"\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "0954792a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mAssistant is a large language model trained by OpenAI.\n",
"\n",
"Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n",
"\n",
"Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n",
"\n",
"Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n",
"\n",
"Human: curl https://chat.openai.com/chat\n",
"AI: \n",
"\n",
"```\n",
"$ curl https://chat.openai.com/chat\n",
"<html>\n",
" <head>\n",
" <title>OpenAI Chat</title>\n",
" </head>\n",
" <body>\n",
" <h1>Welcome to OpenAI Chat!</h1>\n",
" <p>\n",
" OpenAI Chat is a natural language processing platform that allows you to interact with OpenAI's AI models in a conversational way.\n",
" </p>\n",
" <p>\n",
" To get started, type a message in the box below and press enter.\n",
" </p>\n",
" </body>\n",
"</html>\n",
"```\n",
"Human: curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"What is artificial intelligence?\"}' https://chat.openai.com/chat\n",
"AI: \n",
"\n",
"```\n",
"$ curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"What is artificial intelligence?\"}' https://chat.openai.com/chat\n",
"\n",
"{\n",
" \"response\": \"Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using the rules to reach approximate or definite conclusions) and self-correction. AI is used to develop computer systems that can think and act like humans.\"\n",
"}\n",
"```\n",
"Human: curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\"}' https://chat.openai.com/chat\n",
"Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n",
" \n",
"\n",
"```\n",
"$ curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\"}' https://chat.openai.com/chat\n",
"\n",
"{\n",
" \"response\": \"```\n",
"/home/user\n",
"```\"\n",
"}\n",
"```\n"
]
}
],
"source": [
"output = chatgpt_chain.predict(human_input=\"\"\"curl --header \"Content-Type:application/json\" --request POST --data '{\"message\": \"I want you to act as a Linux terminal. I will type commands and you will reply with what the terminal should show. I want you to only reply wiht the terminal output inside one unique code block, and nothing else. Do not write explanations. Do not type commands unless I instruct you to do so. When I need to tell you something in English I will do so by putting text inside curly brackets {like this}. My first command is pwd.\"}' https://chat.openai.com/chat\"\"\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e68a087e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -5,7 +5,7 @@
"id": "efc5be67",
"metadata": {},
"source": [
"# VectorDB Question Ansering with Sources\n",
"# Question-Answering with Sources\n",
"\n",
"This notebook goes over how to do question-answering with sources. It does this in a few different ways - first showing how you can use the `QAWithSourcesChain` to take in documents and use those, and next showing the `VectorDBQAWithSourcesChain`, which also does the lookup of the documents from a vector database. "
]
@@ -61,6 +61,72 @@
" d.metadata = {'source': f\"{i}-pl\"}"
]
},
{
"cell_type": "markdown",
"id": "aa1c1b60",
"metadata": {},
"source": [
"### QAWithSourcesChain\n",
"This shows how to use the `QAWithSourcesChain`, which takes in document objects and uses them directly."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "61bce191",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "57ddf8c7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import QAWithSourcesChain\n",
"from langchain.llms import OpenAI, Cohere\n",
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f908a92a",
"metadata": {},
"outputs": [],
"source": [
"chain = QAWithSourcesChain.from_llm(OpenAI(temperature=0))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a505ac89",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': ' The president thanked Justice Breyer for his service.',\n",
" 'sources': '27-pl'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain({\"docs\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "e6fc81de",
@@ -93,22 +159,10 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "8ba36fa7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': ' The president thanked Justice Breyer for his service.',\n",
" 'sources': '27-pl'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"chain({\"question\": \"What did the president say about Justice Breyer\"}, return_only_outputs=True)"
]
@@ -138,7 +192,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.8.7"
}
},
"nbformat": 4,

View File

@@ -1,87 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BashChain\n",
"This notebook showcases using LLMs and a bash process to do perform simple filesystem commands."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMBashChain chain...\u001b[0m\n",
"Please write a bash script that prints 'Hello World' to the console.\u001b[32;1m\u001b[1;3m\n",
"\n",
"```bash\n",
"echo \"Hello World\"\n",
"```\u001b[0m['```bash', 'echo \"Hello World\"', '```']\n",
"\n",
"Answer: \u001b[33;1m\u001b[1;3mHello World\n",
"\u001b[0m\n",
"\u001b[1m> Finished LLMBashChain chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Hello World\\n'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import LLMBashChain\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"\n",
"text = \"Please write a bash script that prints 'Hello World' to the console.\"\n",
"\n",
"bash_chain = LLMBashChain(llm=llm, verbose=True)\n",
"\n",
"bash_chain.run(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -13,26 +13,6 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "835e6978",
"metadata": {},
"outputs": [],
"source": [
"from langchain import PromptTemplate, OpenAI, LLMChain"
]
},
{
"cell_type": "markdown",
"id": "06bcb078",
"metadata": {},
"source": [
"### Single Input\n",
"\n",
"First, lets go over an example using a single input"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "51a54c4d",
"metadata": {},
"outputs": [
@@ -42,27 +22,29 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mQuestion: What NFL team won the Super Bowl in the year Justin Beiber was born?\n",
"\n",
"Answer: Let's think step by step.\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Justin Bieber was born in 1994, so the NFL team that won the Super Bowl in 1994 was the Dallas Cowboys.'"
"' The year Justin Beiber was born was 1994. In 1994, the Dallas Cowboys won the Super Bowl.'"
]
},
"execution_count": 3,
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import PromptTemplate, OpenAI, LLMChain\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
@@ -71,60 +53,13 @@
"\n",
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.predict(question=question)"
]
},
{
"cell_type": "markdown",
"id": "79c3ec4d",
"metadata": {},
"source": [
"### Multiple Inputs\n",
"Now lets go over an example using multiple inputs."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "03dd6918",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mWrite a sad poem about ducks.\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nThe ducks swim in the pond,\\nTheir feathers so soft and warm,\\nBut they can't help but feel so forlorn.\\n\\nTheir quacks echo in the air,\\nBut no one is there to hear,\\nFor they have no one to share.\\n\\nThe ducks paddle around in circles,\\nTheir heads hung low in despair,\\nFor they have no one to care.\\n\\nThe ducks look up to the sky,\\nBut no one is there to see,\\nFor they have no one to be.\\n\\nThe ducks drift away in the night,\\nTheir hearts filled with sorrow and pain,\\nFor they have no one to gain.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"template = \"\"\"Write a {adjective} poem about {subject}.\"\"\"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"adjective\", \"subject\"])\n",
"llm_chain = LLMChain(prompt=prompt, llm=OpenAI(temperature=0), verbose=True)\n",
"\n",
"llm_chain.predict(adjective=\"sad\", subject=\"ducks\")"
"llm_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8310cdaa",
"id": "03dd6918",
"metadata": {},
"outputs": [],
"source": []
@@ -146,7 +81,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -1,97 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LLMCheckerChain\n",
"This notebook showcases how to use LLMCheckerChain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMCheckerChain chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new SequentialChain chain...\u001b[0m\n",
"\u001b[1mChain 0\u001b[0m:\n",
"{'statement': '\\nThe largest mammal that lays eggs is the platypus.'}\n",
"\n",
"\u001b[1mChain 1\u001b[0m:\n",
"{'assertions': '\\n• The largest mammal is the platypus.\\n• The platypus lays eggs.\\n• There is no larger mammal than the platypus that lays eggs.'}\n",
"\n",
"\u001b[1mChain 2\u001b[0m:\n",
"{'checked_assertions': '\\n1. The largest mammal is the platypus. False. The blue whale is the largest mammal.\\n\\n2. The platypus lays eggs. True. The Platypus is one of only two mammals that lay eggs.\\n\\n3. There is no larger mammal than the platypus that lays eggs. False. The echidna is another mammal that lays eggs and is larger than the platypus.'}\n",
"\n",
"\u001b[1mChain 3\u001b[0m:\n",
"{'revised_statement': ' The echidna is the type of mammal that lays the biggest eggs.'}\n",
"\n",
"\n",
"\u001b[1m> Finished SequentialChain chain.\u001b[0m\n",
"\n",
"\u001b[1m> Finished LLMCheckerChain chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' The echidna is the type of mammal that lays the biggest eggs.'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import LLMCheckerChain\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0.7)\n",
"\n",
"text = \"What type of mammal lays the biggest eggs?\"\n",
"\n",
"checker_chain = LLMCheckerChain(llm=llm, verbose=True)\n",
"\n",
"checker_chain.run(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,123 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd7ec7af",
"metadata": {},
"source": [
"# LLMRequestsChain\n",
"\n",
"Using the request library to get HTML results from a URL and then an LLM to parse results"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "dd8eae75",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import LLMRequestsChain, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "65bf324e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Between >>> and <<< are the raw search result text from google.\n",
"Extract the answer to the question '{query}' or say \"not found\" if the information is not contained.\n",
"Use the format\n",
"Extracted:<answer or \"not found\">\n",
">>> {requests_result} <<<\n",
"Extracted:\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"query\", \"requests_result\"],\n",
" template=template,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f36ae0d8",
"metadata": {},
"outputs": [],
"source": [
"chain = LLMRequestsChain(llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=PROMPT))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b5d22d9d",
"metadata": {},
"outputs": [],
"source": [
"question = \"What are the Three (3) biggest countries, and their respective sizes?\"\n",
"inputs = {\n",
" \"query\": question,\n",
" \"url\": \"https://www.google.com/search?q=\" + question.replace(\" \", \"+\")\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2ea81168",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are the Three (3) biggest countries, and their respective sizes?',\n",
" 'url': 'https://www.google.com/search?q=What+are+the+Three+(3)+biggest+countries,+and+their+respective+sizes?',\n",
" 'output': ' Russia (17,098,242 sq km), Canada (9,984,670 sq km), China (9,706,961 sq km)'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain(inputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db8f2b6d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,93 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d9a0131f",
"metadata": {},
"source": [
"# Map Reduce\n",
"\n",
"This notebok showcases an example of map-reduce chains: recursive summarization."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e9db25f3",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI, PromptTemplate, LLMChain\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chains.mapreduce import MapReduceChain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"\n",
"_prompt = \"\"\"Write a concise summary of the following:\n",
"\n",
"\n",
"{text}\n",
"\n",
"\n",
"CONCISE SUMMARY:\"\"\"\n",
"prompt = PromptTemplate(template=_prompt, input_variables=[\"text\"])\n",
"\n",
"text_splitter = CharacterTextSplitter()\n",
"\n",
"mp_chain = MapReduceChain.from_params(llm, prompt, text_splitter)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "99bbe19b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"\\n\\nThe President discusses the recent aggression by Russia, and the response by the United States and its allies. He announces new sanctions against Russia, and says that the free world is united in holding Putin accountable. The President also discusses the American Rescue Plan, the Bipartisan Infrastructure Law, and the Bipartisan Innovation Act. Finally, the President addresses the need for women's rights and equality for LGBTQ+ Americans.\""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"mp_chain.run(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "baa6e808",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,435 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b83e61ed",
"metadata": {},
"source": [
"# Moderation\n",
"This notebook walks through examples of how to use a moderation chain, and several common ways for doing so. Moderation chains are useful for detecting text that could be hateful, violent, etc. This can be useful to apply on both user input, but also on the output of a Language Model. Some API providers, like OpenAI, [specifically prohibit](https://beta.openai.com/docs/usage-policies/use-case-policy) you, or your end users, from generating some types of harmful content. To comply with this (and to just generally prevent your application from being harmful) you may often want to append a moderation chain to any LLMChains, in order to make sure any output the LLM generates is not harmful.\n",
"\n",
"If the content passed into the moderation chain is harmful, there is not one best way to handle it, it probably depends on your application. Sometimes you may want to throw an error in the Chain (and have your application handle that). Other times, you may want to return something to the user explaining that the text was harmful. There could even be other ways to handle it! We will cover all these ways in this notebook.\n",
"\n",
"In this notebook, we will show:\n",
"\n",
"1. How to run any piece of text through a moderation chain.\n",
"2. How to append a Moderation chain to a LLMChain."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b7aa1ff2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import OpenAIModerationChain, SequentialChain, LLMChain, SimpleSequentialChain\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "markdown",
"id": "c26d5be6",
"metadata": {},
"source": [
"## How to use the moderation chain\n",
"\n",
"Here's an example of using the moderation chain with default settings (will return a string explaining stuff was flagged)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fd0fc85c",
"metadata": {},
"outputs": [],
"source": [
"moderation_chain = OpenAIModerationChain()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3fa47dd7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is okay'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"moderation_chain.run(\"This is okay\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "37bfad73",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Text was found that violates OpenAI's content policy.\""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"moderation_chain.run(\"I will kill you\")"
]
},
{
"cell_type": "markdown",
"id": "196820ab",
"metadata": {},
"source": [
"Here's an example of using the moderation chain to throw an error."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b29c1150",
"metadata": {},
"outputs": [],
"source": [
"moderation_chain_error = OpenAIModerationChain(error=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f9ab64d9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is okay'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"moderation_chain_error.run(\"This is okay\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "954f3da2",
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "Text was found that violates OpenAI's content policy.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[8], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mmoderation_chain_error\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mI will kill you\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/workplace/third_party/langchain/langchain/chains/base.py:114\u001b[0m, in \u001b[0;36mChain.run\u001b[0;34m(self, text)\u001b[0m\n\u001b[1;32m 109\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys) \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[1;32m 110\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 111\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m`run` not supported when there is not exactly \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 112\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mone output key, got \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 113\u001b[0m )\n\u001b[0;32m--> 114\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m{\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43minput_keys\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m:\u001b[49m\u001b[43m \u001b[49m\u001b[43mtext\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m[\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys[\u001b[38;5;241m0\u001b[39m]]\n",
"File \u001b[0;32m~/workplace/third_party/langchain/langchain/chains/base.py:87\u001b[0m, in \u001b[0;36mChain.__call__\u001b[0;34m(self, inputs, return_only_outputs)\u001b[0m\n\u001b[1;32m 83\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mverbose:\n\u001b[1;32m 84\u001b[0m \u001b[38;5;28mprint\u001b[39m(\n\u001b[1;32m 85\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\033\u001b[39;00m\u001b[38;5;124m[1m> Entering new \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m chain...\u001b[39m\u001b[38;5;130;01m\\033\u001b[39;00m\u001b[38;5;124m[0m\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 86\u001b[0m )\n\u001b[0;32m---> 87\u001b[0m outputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43minputs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 88\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mverbose:\n\u001b[1;32m 89\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;130;01m\\033\u001b[39;00m\u001b[38;5;124m[1m> Finished \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__class__\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m chain.\u001b[39m\u001b[38;5;130;01m\\033\u001b[39;00m\u001b[38;5;124m[0m\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
"File \u001b[0;32m~/workplace/third_party/langchain/langchain/chains/moderation.py:79\u001b[0m, in \u001b[0;36mOpenAIModerationChain._call\u001b[0;34m(self, inputs)\u001b[0m\n\u001b[1;32m 77\u001b[0m text \u001b[38;5;241m=\u001b[39m inputs[\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39minput_key]\n\u001b[1;32m 78\u001b[0m results \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mclient\u001b[38;5;241m.\u001b[39mcreate(text)\n\u001b[0;32m---> 79\u001b[0m output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_moderate\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtext\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mresults\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mresults\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 80\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m {\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_key: output}\n",
"File \u001b[0;32m~/workplace/third_party/langchain/langchain/chains/moderation.py:71\u001b[0m, in \u001b[0;36mOpenAIModerationChain._moderate\u001b[0;34m(self, text, results)\u001b[0m\n\u001b[1;32m 69\u001b[0m error_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mText was found that violates OpenAI\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124ms content policy.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 70\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39merror:\n\u001b[0;32m---> 71\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(error_str)\n\u001b[1;32m 72\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 73\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m error_str\n",
"\u001b[0;31mValueError\u001b[0m: Text was found that violates OpenAI's content policy."
]
}
],
"source": [
"moderation_chain_error.run(\"I will kill you\")"
]
},
{
"cell_type": "markdown",
"id": "8de5dcbb",
"metadata": {},
"source": [
"Here's an example of creating a custom moderation chain with a custom error message. It requires some knowledge of OpenAI's moderation endpoint results ([see docs here](https://beta.openai.com/docs/api-reference/moderations))."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3960e985",
"metadata": {},
"outputs": [],
"source": [
"class CustomModeration(OpenAIModerationChain):\n",
" \n",
" def _moderate(self, text: str, results: dict) -> str:\n",
" if results[\"flagged\"]:\n",
" error_str = f\"The following text was found that violates OpenAI's content policy: {text}\"\n",
" return error_str\n",
" return text\n",
" \n",
"custom_moderation = CustomModeration()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1152ec11",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'This is okay'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_moderation.run(\"This is okay\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "973257bf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"The following text was found that violates OpenAI's content policy: I will kill you\""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"custom_moderation.run(\"I will kill you\")"
]
},
{
"cell_type": "markdown",
"id": "8718111f",
"metadata": {},
"source": [
"## How to append a Moderation chain to an LLMChain\n",
"\n",
"To easily combine a moderation chain with an LLMChain, you can use the SequentialChain abstraction.\n",
"\n",
"Let's start with a simple example of where the LLMChain only has a single input. For this purpose, we will prompt the model so it says something harmful."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "0d129333",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate(template=\"{text}\", input_variables=[\"text\"])\n",
"llm_chain = LLMChain(llm=OpenAI(temperature=0, model_name=\"text-davinci-002\"), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "a557c531",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' I will kill you'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text = \"\"\"We are playing a game of repeat after me.\n",
"\n",
"Person 1: Hi\n",
"Person 2: Hi\n",
"\n",
"Person 1: How's your day\n",
"Person 2: How's your day\n",
"\n",
"Person 1: I will kill you\n",
"Person 2:\"\"\"\n",
"llm_chain.run(text)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "d4d10f1c",
"metadata": {},
"outputs": [],
"source": [
"chain = SimpleSequentialChain(chains=[llm_chain, moderation_chain])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "02f37985",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Text was found that violates OpenAI's content policy.\""
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(text)"
]
},
{
"cell_type": "markdown",
"id": "72643128",
"metadata": {},
"source": [
"Now let's walk through an example of using it with an LLMChain which has multiple inputs (a bit more tricky because we can't use the SimpleSequentialChain)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "7118ec36",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate(template=\"{setup}{new_input}Person2:\", input_variables=[\"setup\", \"new_input\"])\n",
"llm_chain = LLMChain(llm=OpenAI(temperature=0, model_name=\"text-davinci-002\"), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "003bdfce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'text': ' I will kill you'}"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"setup = \"\"\"We are playing a game of repeat after me.\n",
"\n",
"Person 1: Hi\n",
"Person 2: Hi\n",
"\n",
"Person 1: How's your day\n",
"Person 2: How's your day\n",
"\n",
"Person 1:\"\"\"\n",
"new_input = \"I will kill you\"\n",
"inputs = {\"setup\": setup, \"new_input\": new_input}\n",
"llm_chain(inputs, return_only_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "77b64228",
"metadata": {},
"outputs": [],
"source": [
"# Setting the input/output keys so it lines up\n",
"moderation_chain.input_key = \"text\"\n",
"moderation_chain.output_key = \"sanitized_text\""
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "998a95be",
"metadata": {},
"outputs": [],
"source": [
"chain = SequentialChain(chains=[llm_chain, moderation_chain], input_variables=[\"setup\", \"new_input\"])"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "9c97a136",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sanitized_text': \"Text was found that violates OpenAI's content policy.\"}"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain(inputs, return_only_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddc90e15",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,258 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "74148cee",
"metadata": {},
"source": [
"# Question Answering with Sources\n",
"\n",
"This notebook walks through how to use LangChain for question answering with sources over a list of documents. It covers three different chain types: `stuff`, `map_reduce`, and `refine`. For a more in depth explanation of what these chain types are, see [here](../../explanation/combine_docs.md)."
]
},
{
"cell_type": "markdown",
"id": "ca2f0efc",
"metadata": {},
"source": [
"### Prepare Data\n",
"First we prepare the data. For this example we do similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook to highlight what to do AFTER you fetch the documents)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "78f28130",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.embeddings.cohere import CohereEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4da195a3",
"metadata": {},
"outputs": [],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5ec2b55b",
"metadata": {},
"outputs": [],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings, metadatas=[{\"source\": i} for i in range(len(texts))])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5286f58f",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "005a47e9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "d82f899a",
"metadata": {},
"source": [
"### The `stuff` Chain\n",
"\n",
"This sections shows results of using the `stuff` Chain to do question answering with sources."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "fc1a5ed6",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"stuff\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e239964b",
"metadata": {},
"outputs": [],
"source": [
"docs = [Document(page_content=t, metadata={\"source\": i}) for i, t in enumerate(texts[:3])]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7d766417",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' The president did not mention Justice Breyer.\\nSOURCES: 0-pl, 1-pl, 2-pl'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "c5dbb304",
"metadata": {},
"source": [
"### The `map_reduce` Chain\n",
"\n",
"This sections shows results of using the `map_reduce` Chain to do question answering with sources."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "921db0a4",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e417926a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n",
"Token indices sequence length is longer than the specified maximum sequence length for this model (1546 > 1024). Running this sequence through the model will result in indexing errors\n"
]
},
{
"data": {
"text/plain": [
"{'output_text': ' The president did not mention Justice Breyer.\\nSOURCES: 0, 1, 2'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "5bf0e1ab",
"metadata": {},
"source": [
"### The `refine` Chain\n",
"\n",
"This sections shows results of using the `refine` Chain to do question answering with sources."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "904835c8",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"refine\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f60875c6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': \"\\n\\nThe president did not mention Justice Breyer in his speech to the European Parliament, which focused on building a coalition of freedom-loving nations to confront Putin, unifying European allies, countering Russia's lies with truth, and enforcing powerful economic sanctions. Source: 2\"}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "929620d0",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,248 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "05859721",
"metadata": {},
"source": [
"# Question Answering\n",
"\n",
"This notebook walks through how to use LangChain for question answering over a list of documents. It covers three different types of chaings: `stuff`, `map_reduce`, and `refine`. For a more in depth explanation of what these chain types are, see [here](../../explanation/combine_docs.md)."
]
},
{
"cell_type": "markdown",
"id": "726f4996",
"metadata": {},
"source": [
"### Prepare Data\n",
"First we prepare the data. For this example we do similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook to highlight what to do AFTER you fetch the documents)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "17fcbc0f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "291f0117",
"metadata": {},
"outputs": [],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fd9666a9",
"metadata": {},
"outputs": [],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d1eaf6e6",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a16e3453",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "f78787a0",
"metadata": {},
"source": [
"### The `stuff` Chain\n",
"\n",
"This sections shows results of using the `stuff` Chain to do question answering."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "180fd4c1",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"stuff\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d145ae31",
"metadata": {},
"outputs": [],
"source": [
"docs = [Document(page_content=t) for t in texts[:3]]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "77fdf1aa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' The president did not mention Justice Breyer.'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "91522e29",
"metadata": {},
"source": [
"### The `map_reduce` Chain\n",
"\n",
"This sections shows results of using the `map_reduce` Chain to do question answering."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b0060f51",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fbdb9137",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': ' The president did not mention Justice Breyer.'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "markdown",
"id": "6ea50ad0",
"metadata": {},
"source": [
"### The `refine` Chain\n",
"\n",
"This sections shows results of using the `refine` Chain to do question answering."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "fb167057",
"metadata": {},
"outputs": [],
"source": [
"chain = load_qa_chain(OpenAI(temperature=0), chain_type=\"refine\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d8b5286e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_text': \"\\n\\nThe president did not mention Justice Breyer in his speech to the European Parliament about building a coalition of freedom-loving nations to confront Putin, unifying European allies, countering Russia's lies with truth, and enforcing powerful economic sanctions.\"}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Justice Breyer\"\n",
"chain({\"input_documents\": docs, \"question\": query}, return_only_outputs=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49e9c6d7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,234 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d9a0131f",
"metadata": {},
"source": [
"# Summarization\n",
"\n",
"This notebook walks through how to use LangChain for summarization over a list of documents. It covers three different chain types: `stuff`, `map_reduce`, and `refine`. For a more in depth explanation of what these chain types are, see [here](../../explanation/combine_docs.md)."
]
},
{
"cell_type": "markdown",
"id": "0b5660bf",
"metadata": {},
"source": [
"### Prepare Data\n",
"First we prepare the data. For this example we create multiple documents from one long one, but these documents could be fetched in any manner (the point of this notebook to highlight what to do AFTER you fetch the documents)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e9db25f3",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI, PromptTemplate, LLMChain\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.chains.mapreduce import MapReduceChain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"\n",
"\n",
"text_splitter = CharacterTextSplitter()\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "99bbe19b",
"metadata": {},
"outputs": [],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "baa6e808",
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8dff4f43",
"metadata": {},
"outputs": [],
"source": [
"docs = [Document(page_content=t) for t in texts[:3]]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "27989fc4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.summarize import load_summarize_chain"
]
},
{
"cell_type": "markdown",
"id": "ea2d5c99",
"metadata": {},
"source": [
"### The `stuff` Chain\n",
"\n",
"This sections shows results of using the `stuff` Chain to do summarization."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f01f3196",
"metadata": {},
"outputs": [],
"source": [
"chain = load_summarize_chain(llm, chain_type=\"stuff\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "da4d9801",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' In his speech, President Biden addressed the ongoing conflict between Russia and Ukraine, and the need for the United States and its allies to stand with Ukraine. He also discussed the American Rescue Plan, the Bipartisan Infrastructure Law, and the Bipartisan Innovation Act, which will help to create jobs, modernize infrastructure, and level the playing field with China. He also emphasized the importance of buying American products to support American jobs.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(docs)"
]
},
{
"cell_type": "markdown",
"id": "9c868e86",
"metadata": {},
"source": [
"### The `map_reduce` Chain\n",
"\n",
"This sections shows results of using the `map_reduce` Chain to do summarization."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ef28e1d4",
"metadata": {},
"outputs": [],
"source": [
"chain = load_summarize_chain(llm, chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f82c5f9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" In response to Vladimir Putin's aggression in Ukraine, the US and its allies have taken action to hold him accountable, including economic sanctions, cutting off access to technology, and seizing the assets of Russian oligarchs. They are also providing military, economic, and humanitarian assistance to the Ukrainians, and releasing 60 million barrels of oil from reserves around the world. President Biden has passed several laws to provide economic relief to Americans and create jobs, and is making sure taxpayer dollars support American jobs and businesses.\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(docs)"
]
},
{
"cell_type": "markdown",
"id": "f61350f9",
"metadata": {},
"source": [
"### The `refine` Chain\n",
"\n",
"This sections shows results of using the `refine` Chain to do summarization."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3bcbe31e",
"metadata": {},
"outputs": [],
"source": [
"chain = load_summarize_chain(llm, chain_type=\"refine\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c8cad866",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"\\nIn this speech, the speaker addresses the American people and their allies, discussing the recent aggression of Russia's Vladimir Putin in Ukraine. The speaker outlines the actions taken by the United States and its allies to hold Putin accountable, including economic sanctions, cutting off access to technology, and seizing the assets of Russian oligarchs. The speaker also announces the closing of American airspace to Russian flights, further isolating Russia and adding an additional squeeze on their economy. The Russian stock market has lost 40% of its value and trading remains suspended. Together with our allies, the United States is providing military, economic, and humanitarian assistance to Ukraine, and has mobilized forces to protect NATO countries. The speaker also announces the release of 60 million barrels of oil from reserves around the world, with the United States releasing 30 million barrels from its own Strategic Petroleum Reserve. The speaker emphasizes that the United States and its allies will defend every inch of NATO territory and that Putin will pay a high price for his aggression. The speaker also acknowledges the hardships faced by the American people due to the pandemic and the American Rescue Plan, which has provided immediate economic relief for tens of millions of Americans, helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. The speaker\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0da92750",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,130 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "872bb8b5",
"metadata": {},
"source": [
"# Transformation Chain\n",
"\n",
"This notebook showcases using a generic transformation chain.\n",
"\n",
"As an example, we will create a dummy transformation that takes in a super long text, filters the text to only the first 3 paragraphs, and then passes that into an LLMChain to summarize those."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bbbb4330",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import TransformChain, LLMChain, SimpleSequentialChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "8ae5937c",
"metadata": {},
"outputs": [],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "98739592",
"metadata": {},
"outputs": [],
"source": [
"def transform_func(inputs: dict) -> dict:\n",
" text = inputs[\"text\"]\n",
" shortened_text = \"\\n\\n\".join(text.split(\"\\n\\n\")[:3])\n",
" return {\"output_text\": shortened_text}\n",
"\n",
"transform_chain = TransformChain(input_variables=[\"text\"], output_variables=[\"output_text\"], transform=transform_func)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e9397934",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Summarize this text:\n",
"\n",
"{output_text}\n",
"\n",
"Summary:\"\"\"\n",
"prompt = PromptTemplate(input_variables=[\"output_text\"], template=template)\n",
"llm_chain = LLMChain(llm=OpenAI(), prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "06f51f17",
"metadata": {},
"outputs": [],
"source": [
"sequential_chain = SimpleSequentialChain(chains=[transform_chain, llm_chain])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f7caa1ee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' This speech addresses the American people and acknowledges the difficulties of last year due to COVID-19. It emphasizes the importance of coming together regardless of political affiliation and encourages a sense of unity as Americans.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sequential_chain.run(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3ca6409",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -41,27 +41,27 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "3018f865",
"metadata": {},
"outputs": [],
"source": [
"qa = VectorDBQA.from_llm(llm=OpenAI(), vectorstore=docsearch)"
"qa = VectorDBQA(llm=OpenAI(), vectorstore=docsearch)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "032a47f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator and federal public defender, and from a family of public school educators and police officers. He also said that she has received a broad range of support since she was nominated, from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
"' The President said that Ketanji Brown Jackson is a consensus builder and has received a broad range of support since she was nominated.'"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -74,7 +74,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "f056f6fd",
"id": "f0f20b92",
"metadata": {},
"outputs": [],
"source": []
@@ -96,7 +96,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,180 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b118c9dc",
"metadata": {},
"source": [
"# HuggingFace Tokenizers\n",
"\n",
"This notebook show cases how to use HuggingFace tokenizers to split text."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e82c4685",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a8ce51d5",
"metadata": {},
"outputs": [],
"source": [
"from transformers import GPT2TokenizerFast\n",
"\n",
"tokenizer = GPT2TokenizerFast.from_pretrained(\"gpt2\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ca5e72c0",
"metadata": {},
"outputs": [],
"source": [
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(tokenizer, chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "37cdfbeb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n",
"\n",
"Last year COVID-19 kept us apart. This year we are finally together again. \n",
"\n",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
"\n",
"With a duty to one another to the American people to the Constitution. \n",
"\n",
"And with an unwavering resolve that freedom will always triumph over tyranny. \n",
"\n",
"Six days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n",
"\n",
"He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n",
"\n",
"He met the Ukrainian people. \n",
"\n",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n",
"\n",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n",
"\n",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. \n",
"\n",
"Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. \n",
"\n",
"Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. \n",
"\n",
"Throughout our history weve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. \n",
"\n",
"They keep moving. \n",
"\n",
"And the costs and the threats to America and the world keep rising. \n",
"\n",
"Thats why the NATO Alliance was created to secure peace and stability in Europe after World War 2. \n",
"\n",
"The United States is a member along with 29 other nations. \n",
"\n",
"It matters. American diplomacy matters. American resolve matters. \n",
"\n",
"Putins latest attack on Ukraine was premeditated and unprovoked. \n",
"\n",
"He rejected repeated efforts at diplomacy. \n",
"\n",
"He thought the West and NATO wouldnt respond. And he thought he could divide us at home. Putin was wrong. We were ready. Here is what we did. \n",
"\n",
"We prepared extensively and carefully. \n",
"\n",
"We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin. \n",
"\n",
"I spent countless hours unifying our European allies. We shared with the world in advance what we knew Putin was planning and precisely how he would try to falsely justify his aggression. \n",
"\n",
"We countered Russias lies with truth. \n",
"\n",
"And now that he has acted the free world is holding him accountable. \n",
"\n",
"Along with twenty-seven members of the European Union including France, Germany, Italy, as well as countries like the United Kingdom, Canada, Japan, Korea, Australia, New Zealand, and many others, even Switzerland. \n",
"\n",
"We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. \n",
"\n",
"Together with our allies we are right now enforcing powerful economic sanctions. \n",
"\n",
"We are cutting off Russias largest banks from the international financial system. \n",
"\n",
"Preventing Russias central bank from defending the Russian Ruble making Putins $630 Billion “war fund” worthless. \n",
"\n",
"We are choking off Russias access to technology that will sap its economic strength and weaken its military for years to come. \n",
"\n",
"Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n",
"\n",
"The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. \n",
"\n",
"We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. \n",
"\n",
"And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights further isolating Russia and adding an additional squeeze on their economy. The Ruble has lost 30% of its value. \n",
"\n",
"The Russian stock market has lost 40% of its value and trading remains suspended. Russias economy is reeling and Putin alone is to blame. \n",
"\n",
"Together with our allies we are providing support to the Ukrainians in their fight for freedom. Military assistance. Economic assistance. Humanitarian assistance. \n",
"\n",
"We are giving more than $1 Billion in direct assistance to Ukraine. \n",
"\n",
"And we will continue to aid the Ukrainian people as they defend their country and to help ease their suffering. \n",
"\n",
"Let me be clear, our forces are not engaged and will not engage in conflict with Russian forces in Ukraine. \n",
"\n",
"Our forces are not going to Europe to fight in Ukraine, but to defend our NATO Allies in the event that Putin decides to keep moving west. \n"
]
}
],
"source": [
"print(texts[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d214aec2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,304 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b118c9dc",
"metadata": {},
"source": [
"# Text Splitter\n",
"\n",
"When you want to deal wit long pieces of text, it is necessary to split up that text into chunks.\n",
"This notebook showcases several ways to do that.\n",
"\n",
"At a high level, text splitters work as following:\n",
"\n",
"1. Split the text up into small, semantically meaningful chunks (often sentences).\n",
"2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).\n",
"3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e82c4685",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter, NLTKTextSplitter, SpacyTextSplitter\n",
"# This is a long document we can split up.\n",
"with open('../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()"
]
},
{
"cell_type": "markdown",
"id": "5c461b26",
"metadata": {},
"source": [
"## Character Text Splitting\n",
"\n",
"Let's start with the most simple method: let's split based on characters (by default \"\\n\\n\") and measure chunk length by number of characters."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "79ff6737",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter( \n",
" separator = \"\\n\\n\",\n",
" chunk_size = 1000,\n",
" chunk_overlap = 200,\n",
" length_function = len,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "38547666",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \\n\\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. '"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"texts = text_splitter.split_text(state_of_the_union)\n",
"texts[0]"
]
},
{
"cell_type": "markdown",
"id": "13dc0983",
"metadata": {},
"source": [
"## HuggingFace Length Function\n",
"Most LLMs are constrained by the number of tokens that you can pass in, which is not the same as the number of characters. In order to get a more accurate estimate, we can use HuggingFace tokenizers to count the text length."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a8ce51d5",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n"
]
}
],
"source": [
"from transformers import GPT2TokenizerFast\n",
"\n",
"tokenizer = GPT2TokenizerFast.from_pretrained(\"gpt2\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ca5e72c0",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(tokenizer, chunk_size=100, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "37cdfbeb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n",
"\n",
"Last year COVID-19 kept us apart. This year we are finally together again. \n",
"\n",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
"\n",
"With a duty to one another to the American people to the Constitution. \n",
"\n",
"And with an unwavering resolve that freedom will always triumph over tyranny. \n"
]
}
],
"source": [
"print(texts[0])"
]
},
{
"cell_type": "markdown",
"id": "7683b36a",
"metadata": {},
"source": [
"## tiktoken (OpenAI) Length Function\n",
"You can also use tiktoken, a open source tokenizer package from OpenAI to estimate tokens used. Will probably be ore accurate for their models."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "825f7c0a",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=100, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ae35d165",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n",
"\n",
"Last year COVID-19 kept us apart. This year we are finally together again. \n",
"\n",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
"\n",
"With a duty to one another to the American people to the Constitution. \n",
"\n",
"And with an unwavering resolve that freedom will always triumph over tyranny. \n"
]
}
],
"source": [
"print(texts[0])"
]
},
{
"cell_type": "markdown",
"id": "ea2973ac",
"metadata": {},
"source": [
"## NLTK Text Splitter\n",
"Rather than just splitting on \"\\n\\n\", we can use NLTK to split based on tokenizers."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "20fa9c23",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = NLTKTextSplitter(chunk_size=1000)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "5ea10835",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.\\n\\nMembers of Congress and the Cabinet.\\n\\nJustices of the Supreme Court.\\n\\nMy fellow Americans.\\n\\nLast year COVID-19 kept us apart.\\n\\nThis year we are finally together again.\\n\\nTonight, we meet as Democrats Republicans and Independents.\\n\\nBut most importantly as Americans.\\n\\nWith a duty to one another to the American people to the Constitution.\\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny.\\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.\\n\\nBut he badly miscalculated.\\n\\nHe thought he could roll into Ukraine and the world would roll over.\\n\\nInstead he met a wall of strength he never imagined.\\n\\nHe met the Ukrainian people.\\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.\\n\\nGroups of citizens blocking tanks with their bodies.\\n\\nEveryone from students to retirees teachers turned soldiers defending their homeland.'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"texts = text_splitter.split_text(state_of_the_union)\n",
"texts[0]"
]
},
{
"cell_type": "markdown",
"id": "dab86b60",
"metadata": {},
"source": [
"## Spacy Text Splitter\n",
"Another alternative to NLTK is to use Spacy."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f9cc9dfc",
"metadata": {},
"outputs": [],
"source": [
"text_splitter = SpacyTextSplitter(chunk_size=1000)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "cef2b29e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.\\n\\nMembers of Congress and the Cabinet.\\n\\nJustices of the Supreme Court.\\n\\nMy fellow Americans. \\n\\n\\n\\nLast year COVID-19 kept us apart.\\n\\nThis year we are finally together again.\\n\\n\\n\\n\\n\\nTonight, we meet as Democrats Republicans and Independents.\\n\\nBut most importantly as Americans.\\n\\n\\n\\n\\n\\nWith a duty to one another to the American people to the Constitution. \\n\\n\\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny.\\n\\n\\n\\n\\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.\\n\\nBut he badly miscalculated.\\n\\n\\n\\n\\n\\nHe thought he could roll into Ukraine and the world would roll over.\\n\\nInstead he met a wall of strength he never imagined.\\n\\n\\n\\n\\n\\nHe met the Ukrainian people.\\n\\n\\n\\n\\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.\\n\\n\\n\\n\\n\\nGroups of citizens blocking tanks with their bodies.\\n\\nEveryone from students to retirees teachers turned soldiers defending their homeland.'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"texts = text_splitter.split_text(state_of_the_union)\n",
"texts[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1a118b1",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,35 +1,10 @@
LLMs & Prompts
==============
The examples here all highlight how to work with LLMs and prompts.
**LLMs**
`LLM Functionality <prompts/llm_functionality.ipynb>`_: A walkthrough of all the functionality the standard LLM interface exposes.
`LLM Serialization <prompts/llm_serialization.ipynb>`_: A walkthrough of how to serialize LLMs to and from disk.
`Custom LLM <prompts/custom_llm.ipynb>`_: How to create and use a custom LLM class, in case you have an LLM not from one of the standard providers (including one that you host yourself).
**Prompts**
`Prompt Management <prompts/prompt_management.ipynb>`_: A walkthrough of all the functionality LangChain supports for working with prompts.
`Prompt Serialization <prompts/prompt_serialization.ipynb>`_: A walkthrough of how to serialize prompts to and from disk.
`Few Shot Examples <prompts/few_shot_examples.ipynb>`_: How to include examples in the prompt.
`Generate Examples <prompts/generate_examples.ipynb>`_: How to use existing examples to generate more examples.
`Custom Example Selector <prompts/custom_example_selector.ipynb>`_: How to create and use a custom ExampleSelector (the class responsible for choosing which examples to use in a prompt).
`Custom Prompt Template <prompts/custom_prompt_template.ipynb>`_: How to create and use a custom PromptTemplate, the logic that decides how input variables get formatted into a prompt.
Prompts
=======
The examples here all highlight how to work with prompts.
.. toctree::
:maxdepth: 1
:glob:
:hidden:
prompts/*

View File

@@ -11,7 +11,7 @@
"\n",
"There is only one required thing that a custom LLM needs to implement:\n",
"\n",
"1. A `_call` method that takes in a string, some optional stop words, and returns a string\n",
"1. A `__call__` method that takes in a string, some optional stop words, and returns a string\n",
"\n",
"There is a second optional thing it can implement:\n",
"\n",
@@ -33,20 +33,17 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 2,
"id": "d5ceff02",
"metadata": {},
"outputs": [],
"source": [
"class CustomLLM(LLM):\n",
" \n",
" n: int\n",
" \n",
" @property\n",
" def _llm_type(self) -> str:\n",
" return \"custom\"\n",
" def __init__(self, n: int):\n",
" self.n = n\n",
" \n",
" def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:\n",
" def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:\n",
" if stop is not None:\n",
" raise ValueError(\"stop kwargs are not permitted.\")\n",
" return prompt[:self.n]\n",
@@ -67,7 +64,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 3,
"id": "10e5ece6",
"metadata": {},
"outputs": [],
@@ -77,7 +74,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 4,
"id": "8cd49199",
"metadata": {},
"outputs": [
@@ -87,7 +84,7 @@
"'This is a '"
]
},
"execution_count": 9,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -106,7 +103,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 5,
"id": "9c33fa19",
"metadata": {},
"outputs": [
@@ -148,7 +145,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -1,11 +0,0 @@
{
"model_name": "text-davinci-003",
"temperature": 0.7,
"max_tokens": 256,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"n": 1,
"best_of": 1,
"_type": "openai"
}

View File

@@ -1,9 +0,0 @@
_type: openai
best_of: 1
frequency_penalty: 0.0
max_tokens: 256
model_name: text-davinci-003
n: 1
presence_penalty: 0.0
temperature: 0.7
top_p: 1.0

View File

@@ -1,412 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "20ac6b98",
"metadata": {},
"source": [
"# LLM Functionality\n",
"\n",
"This notebook goes over all the different features of the LLM class in LangChain.\n",
"\n",
"We will work with an OpenAI LLM wrapper, although these functionalities should exist for all LLM types."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "df924055",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "182b484c",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name=\"text-ada-001\", n=2, best_of=2)"
]
},
{
"cell_type": "markdown",
"id": "9695ccfc",
"metadata": {},
"source": [
"**Generate Text:** The most basic functionality an LLM has is just the ability to call it, passing in a string and getting back a string."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9d12ac26",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "markdown",
"id": "e7d4d42d",
"metadata": {},
"source": [
"**Generate:** More broadly, you can call it with a list of inputs, getting back a more complete response than just the text. This complete response includes things like multiple top responses, as well as LLM provider specific information"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f4dc241a",
"metadata": {},
"outputs": [],
"source": [
"llm_result = llm.generate([\"Tell me a joke\", \"Tell me a poem\"]*15)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "740392f6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"30"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(llm_result.generations)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ab6cdcf1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Generation(text='\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'),\n",
" Generation(text='\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!')]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_result.generations[0]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4946a778",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Generation(text=\"\\n\\nA rose by the side of the road\\n\\nIs all I need to find my way\\n\\nTo the place I've been searching for\\n\\nAnd my heart is singing with joy\\n\\nWhen I look at this rose\\n\\nIt reminds me of the love I've found\\n\\nAnd I know that wherever I go\\n\\nI'll always find my rose by the side of the road.\"),\n",
" Generation(text=\"\\n\\nWhen I was younger\\nI thought that love\\nI was something like a fairytale\\nI would find my prince and they would be my people\\nI was naïve\\nI thought that\\n\\nLove was a something that happened\\nWhen I was younger\\nI was it for my fairytale prince\\nNow I realize\\nThat love is something that waits\\nFor when my prince comes\\nAnd when I am ready to be his wife\\nI'll tell you a poem\\n\\nWhen I was younger\\nI thought that love\\nI was something like a fairytale\\nI would find my prince and they would be my people\\nI was naïve\\nI thought that\\n\\nLove was a something that happened\\nAnd I would be happy\\nWhen my prince came\\nAnd I was ready to be his wife\")]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_result.generations[-1]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "242e4527",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'token_usage': {'completion_tokens': 3722,\n",
" 'prompt_tokens': 120,\n",
" 'total_tokens': 3842}}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Provider specific info\n",
"llm_result.llm_output"
]
},
{
"cell_type": "markdown",
"id": "bde8e04f",
"metadata": {},
"source": [
"**Number of Tokens:** You can also estimate how many tokens a piece of text will be in that model. This is useful because models have a context length (and cost more for more tokens), which means you need to be aware of how long the text you are passing in is.\n",
"\n",
"Notice that by default the tokens are estimated using a HuggingFace tokenizer."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b623c774",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm.get_num_tokens(\"what a joke\")"
]
},
{
"cell_type": "markdown",
"id": "ee6fcf8d",
"metadata": {},
"source": [
"### Caching\n",
"With LangChain, you can also enable caching of LLM calls. Note that currently this only applies for individual LLM calls."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2626ca48",
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"from langchain.cache import InMemoryCache\n",
"langchain.llm_cache = InMemoryCache()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "97762272",
"metadata": {},
"outputs": [],
"source": [
"# To make the caching really obvious, lets use a slower model.\n",
"llm = OpenAI(model_name=\"text-davinci-002\", n=2, best_of=2)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e80c65e4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 31.2 ms, sys: 11.8 ms, total: 43.1 ms\n",
"Wall time: 1.75 s\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The first time, it is not yet in cache, so it should take longer\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "678408ec",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 51 µs, sys: 1 µs, total: 52 µs\n",
"Wall time: 67.2 µs\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side!'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The second time it is, so it goes faster\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3f0ac8d2",
"metadata": {},
"outputs": [],
"source": [
"# We can do the same thing with a SQLite cache\n",
"from langchain.cache import SQLiteCache\n",
"langchain.llm_cache = SQLiteCache(database_path=\".langchain.db\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "0e1dcce3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 26.6 ms, sys: 11.2 ms, total: 37.7 ms\n",
"Wall time: 1.89 s\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The first time, it is not yet in cache, so it should take longer\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "efadd750",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2.69 ms, sys: 1.57 ms, total: 4.27 ms\n",
"Wall time: 2.73 ms\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nWhy did the chicken cross the road?\\n\\nTo get to the other side.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# The second time it is, so it goes faster\n",
"llm(\"Tell me a joke\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6053408b",
"metadata": {},
"outputs": [],
"source": [
"# You can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy.\n",
"from langchain.cache import SQLAlchemyCache\n",
"from sqlalchemy import create_engine\n",
"\n",
"engine = create_engine(\"postgresql://postgres:postgres@localhost:5432/postgres\")\n",
"langchain.llm_cache = SQLAlchemyCache(engine)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12 (main, Jun 1 2022, 06:34:44) \n[Clang 12.0.0 ]"
},
"vscode": {
"interpreter": {
"hash": "1235b9b19e8e9828b5c1fdb2cd89fe8d3de0fcde5ef5f3db36e4b671adb8660f"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,166 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73f9bf40",
"metadata": {},
"source": [
"# LLM Serialization\n",
"\n",
"This notebook walks how to write and read an LLM Configuration to and from disk. This is useful if you want to save the configuration for a given LLM (eg the provider, the temperature, etc)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9c9fb6ff",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.llms.loading import load_llm"
]
},
{
"cell_type": "markdown",
"id": "88ce018b",
"metadata": {},
"source": [
"### Loading\n",
"First, lets go over loading a LLM from disk. LLMs can be saved on disk in two formats: json or yaml. No matter the extension, they are loaded in the same way."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f12b28f3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"model_name\": \"text-davinci-003\",\r\n",
" \"temperature\": 0.7,\r\n",
" \"max_tokens\": 256,\r\n",
" \"top_p\": 1,\r\n",
" \"frequency_penalty\": 0,\r\n",
" \"presence_penalty\": 0,\r\n",
" \"n\": 1,\r\n",
" \"best_of\": 1,\r\n",
" \"_type\": \"openai\"\r\n",
"}"
]
}
],
"source": [
"!cat llm.json"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9ab709fc",
"metadata": {},
"outputs": [],
"source": [
"llm = load_llm(\"llm.json\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "095b1d56",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"_type: openai\r\n",
"best_of: 1\r\n",
"frequency_penalty: 0\r\n",
"max_tokens: 256\r\n",
"model_name: text-davinci-003\r\n",
"n: 1\r\n",
"presence_penalty: 0\r\n",
"temperature: 0.7\r\n",
"top_p: 1\r\n"
]
}
],
"source": [
"!cat llm.yaml"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8cafaafe",
"metadata": {},
"outputs": [],
"source": [
"llm = load_llm(\"llm.yaml\")"
]
},
{
"cell_type": "markdown",
"id": "ab3e4223",
"metadata": {},
"source": [
"### Saving\n",
"If you want to go from a LLM in memory to a serialized version of it, you can do so easily by calling the `.save` method. Again, this supports both json and yaml."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b38f685d",
"metadata": {},
"outputs": [],
"source": [
"llm.save(\"llm.json\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b7365503",
"metadata": {},
"outputs": [],
"source": [
"llm.save(\"llm.yaml\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e494851",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,128 +0,0 @@
# Data Augmented Generation
## Overview
Language models are trained on large amounts of unstructured data, which makes them really good at general purpose text generation. However, there are many instances where you may want the language model to generate text based not on generic data but rather on specific data. Some common examples of this include:
- Summarization of a specific piece of text (a website, a private document, etc)
- Question answering over a specific piece of text (a website, a private document, etc)
- Question answering over multiple pieces of text (multiple websites, multiple private documents, etc)
- Using the results of some external call to an API (results from a SQL query, etc)
All of these examples are instances when you do not want the LLM to generate text based solely on the data it was trained over, but rather you want it to incorporate other external data in some way. At a high level, this process can be broken down into two steps:
1. Fetching: Fetching the relevant data to include.
2. Augmenting: Passing the data in as context to the LLM.
This guide is intended to provide an overview of how to do this. This includes an overview of the literature, as well as common tools, abstractions and chains for doing this.
## Related Literature
There are a lot of related papers in this area. Most of them are focused on end-to-end methods that optimize the fetching of the relevant data as well as passing it in as context. These are a few of the papers that are particularly relevant:
**[RAG](https://arxiv.org/abs/2005.11401):** Retrieval Augmented Generation.
This paper introduces RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.
**[REALM](https://arxiv.org/abs/2002.08909):** Retrieval-Augmented Language Model Pre-Training.
To capture knowledge in a more modular and interpretable way, this paper augments language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference.
**[HayStack](https://haystack.deepset.ai/):** This is not a paper, but rather an open source library aimed at semantic search, question answering, summarization, and document ranking for a wide range of NLP applications. The underpinnings of this library are focused on the same `fetching` and `augmenting` concepts discussed here, and incorporate some of the methods in the above papers.
These papers/open-source projects are centered around retrieval of documents, which is important for question-answering tasks over a large corpus of documents (which is how they are evaluated). However, we use the terminology of `Data Augmented Generation` to highlight that retrieval from some document store is only one possible way of fetching relevant data to include. Other methods to fetch relevant data could involve hitting an API, querying a database, or just working with user provided data (eg a specific document that they want to summarize).
Let's now deep dive on the two steps involved: fetching and augmenting.
## Fetching
There are many ways to fetch relevant data to pass in as context to a LM, and these methods largely depend
on the use case.
**User provided:** In some cases, the user may provide the relevant data, and no algorithm for fetching is needed.
An example of this is for summarization of specific documents: the user will provide the document to be summarized,
and task the language model with summarizing it.
**Document Retrieval:** One of the more common use cases involves fetching relevant documents or pieces of text from
a large corpus of data. A common example of this is question answering over a private collection of documents.
**API Querying:** Another common way to fetch data is from an API query. One example of this is WebGPT like system,
where you first query Google (or another search API) for relevant information, and then those results are used in
the generation step. Another example could be querying a structured database (like SQL) and then using a language model
to synthesize those results.
There are two big issues to deal with in fetching:
1. Fetching small enough pieces of information
2. Not fetching too many pieces of information (eg fetching only the most relevant pieces)
### Text Splitting
One big issue with all of these methods is how to make sure you are working with pieces of text that are not too large.
This is important because most language models have a context length, and so you cannot (yet) just pass a
large document in as context. Therefor, it is important to not only fetch relevant data but also make sure it is
small enough chunks.
LangChain provides some utilities to help with splitting up larger pieces of data. This comes in the form of the TextSplitter class.
The class takes in a document and splits it up into chunks, with several parameters that control the
size of the chunks as well as the overlap in the chunks (important for maintaining context).
See [this walkthrough](../examples/integrations/textsplitter.ipynb) for more information.
### Relevant Documents
A second large issue related fetching data is to make sure you are not fetching too many documents, and are only fetching
the documents that are relevant to the query/question at hand. There are a few ways to deal with this.
One concrete example of this is vector stores for document retrieval, often used for semantic search or question answering.
With this method, larger documents are split up into
smaller chunks and then each chunk of text is passed to an embedding function which creates an embedding for that piece of text.
Those are embeddings are then stored in a database. When a new search query or question comes in, an embedding is
created for that query/question and then documents with embeddings most similar to that embedding are fetched.
Examples of vector database companies include [Pinecone](https://www.pinecone.io/) and [Weaviate](https://weaviate.io/).
Although this is perhaps the most common way of document retrieval, people are starting to think about alternative
data structures and indexing techniques specifically for working with language models. For a leading example of this,
check out [GPT Index](https://github.com/jerryjliu/gpt_index) - a collection of data structures created by and optimized
for language models.
## Augmenting
So you've fetched your relevant data - now what? How do you pass them to the language model in a format it can understand?
There are a few different methods, or chains, for doing so. LangChain supports three of the more common ones - and
we are actively looking to include more, so if you have any ideas please reach out! Note that there is not
one best method - the decision of which one to use is often very context specific. In order from simplest to
most complex:
### Stuffing
Stuffing is the simplest method, whereby you simply stuff all the related data into the prompt as context
to pass to the language model. This is implemented in LangChain as the `StuffDocumentsChain`.
**Pros:** Only makes a single call to the LLM. When generating text, the LLM has access to all the data at once.
**Cons:** Most LLMs have a context length, and for large documents (or many documents) this will not work as it will result in a prompt larger than the context length.
The main downside of this method is that it only works one smaller pieces of data. Once you are working
with many pieces of data, this approach is no longer feasible. The next two approaches are designed to help deal with that.
### Map Reduce
This method involves an initial prompt on each chunk of data (for summarization tasks, this
could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk).
Then a different prompt is run to combine all the initial outputs. This is implemented in the LangChain as the `MapReduceDocumentsChain`.
**Pros:** Can scale to larger documents (and more documents) than `StuffDocumentsChain`. The calls to the LLM on individual documents are independent and can therefore be parallelized.
**Cons:** Requires many more calls to the LLM than `StuffDocumentsChain`. Loses some information during the final combining call.
### Refine
This method involves an initial prompt on the first chunk of data, generating some output.
For the remaining documents, that output is passed in, along with the next document,
asking the LLM to refine the output based on the new document.
**Pros:** Can pull in more relevant context, and may be less lossy than `RefineDocumentsChain`.
**Cons:** Requires many more calls to the LLM than `StuffDocumentsChain`. The calls are also NOT independent, meaning they cannot be paralleled like `MapReduceDocumentsChain`. There is also some potential dependencies on the ordering of the documents.
## Use Cases
LangChain supports the above three methods of augmenting LLMs with external data.
These methods can be used to underpin several common use cases and they are discussed below.
For all three of these use cases, all three methods are supported.
It is important to note that a large part of these implementations is the prompts
that are used. We provide default prompts for all three use cases, but these can be configured.
This is in case you discover a prompt that works better for your specific application.
- [Question-Answering With Sources](../examples/chains/qa_with_sources.ipynb)
- [Question-Answering](../examples/chains/question_answering.ipynb)
- [Summarization](../examples/chains/summarize.ipynb)

View File

@@ -6,9 +6,6 @@ If you see any other demos that you think we should highlight, be sure to let us
## Open Source
### [YouTube Transcription Question Answering with Sources](https://colab.research.google.com/drive/1sKSTjt9cPstl_WMZ86JsgEqFG-aSAwkn?usp=sharing)
An end-to-end example of doing question answering on YouTube transcripts, returning the timestamps as sources to legitimize the answer.
### [ThoughtSource](https://github.com/OpenBioLink/ThoughtSource)
A central, open resource and community around data and tools related to chain-of-thought reasoning in large language models.

View File

@@ -72,10 +72,3 @@ Encouraging the model to think a certain way by including the start of the model
Resources:
- [Example](https://twitter.com/goodside/status/1583262455207460865?s=20&t=8Hz7XBnK1OF8siQrxxCIGQ)
### MemPrompt
MemPrompt maintains a memory of errors and user feedback, and uses them to prevent repetition of mistakes.
Resources:
- [Paper](https://memprompt.com/)

View File

@@ -118,40 +118,40 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ZeroShotAgent chain...\u001b[0m\n",
"How old is Olivia Wilde's boyfriend? What is that number raised to the 0.23 power?\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out how old Olivia Wilde's boyfriend is, and then use a calculator to calculate the power.\n",
"What is the age of Olivia Wilde's boyfriend raised to the 0.23 power?\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find the age of Olivia Wilde's boyfriend\n",
"Action: Search\n",
"Action Input: Olivia Wilde's boyfriend age\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mWhile Wilde, 37, and Styles, 27, have both kept a low profile when it comes to talking about their relationship, Wilde did address their ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m Olivia Wilde's boyfriend is 27 years old.\n",
"Action Input: \"Olivia Wilde's boyfriend\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mOlivia Wilde started dating Harry Styles after ending her years-long engagement to Jason Sudeikis — see their relationship timeline.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find the age of Harry Styles\n",
"Action: Search\n",
"Action Input: \"Harry Styles age\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m28 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 28 to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 27^0.23\u001b[0m\n",
"Action Input: 28^0.23\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"27^0.23\u001b[32;1m\u001b[1;3m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"28^0.23\u001b[32;1m\u001b[1;3m\n",
"\n",
"```python\n",
"import math\n",
"print(math.pow(27, 0.23))\n",
"print(28**0.23)\n",
"```\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m2.1340945944237553\n",
"Answer: \u001b[33;1m\u001b[1;3m2.1520202182226886\n",
"\u001b[0m\n",
"\u001b[1m> Finished LLMMathChain chain.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 2.1340945944237553\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 2.1520202182226886\n",
"\u001b[0m\n",
"Thought:\n",
"\u001b[1m> Finished ZeroShotAgent chain.\u001b[0m\n"
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: 2.1520202182226886\u001b[0m"
]
},
{
"data": {
"text/plain": [
"'2.1340945944237553'"
"'2.1520202182226886'"
]
},
"execution_count": 4,
@@ -163,86 +163,10 @@
"agent.run(\"How old is Olivia Wilde's boyfriend? What is that number raised to the 0.23 power?\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2f0852ff",
"metadata": {},
"outputs": [],
"source": [
"# We can also return the intermediate steps\n",
"llm = OpenAI(temperature=0)\n",
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True, return_intermediate_steps=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "837211e8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ZeroShotAgent chain...\u001b[0m\n",
"How old is Olivia Wilde's boyfriend? What is that number raised to the 0.23 power?\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out how old Olivia Wilde's boyfriend is, and then use a calculator to calculate the power.\n",
"Action: Search\n",
"Action Input: Olivia Wilde's boyfriend age\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mWhile Wilde, 37, and Styles, 27, have both kept a low profile when it comes to talking about their relationship, Wilde did address their ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m Olivia Wilde's boyfriend is 27 years old.\n",
"Action: Calculator\n",
"Action Input: 27^0.23\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"27^0.23\u001b[32;1m\u001b[1;3m\n",
"\n",
"```python\n",
"import math\n",
"print(math.pow(27, 0.23))\n",
"```\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m2.1340945944237553\n",
"\u001b[0m\n",
"\u001b[1m> Finished LLMMathChain chain.\u001b[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 2.1340945944237553\n",
"\u001b[0m\n",
"Thought:\n",
"\u001b[1m> Finished ZeroShotAgent chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': \"How old is Olivia Wilde's boyfriend? What is that number raised to the 0.23 power?\",\n",
" 'output': '2.1340945944237553',\n",
" 'intermediate_steps': [{'log': \" I need to find out how old Olivia Wilde's boyfriend is, and then use a calculator to calculate the power.\\nAction: Search\\nAction Input: Olivia Wilde's boyfriend age\",\n",
" 'tool': 'Search',\n",
" 'tool_input': \"Olivia Wilde's boyfriend age\",\n",
" 'observation': 'While Wilde, 37, and Styles, 27, have both kept a low profile when it comes to talking about their relationship, Wilde did address their ...'},\n",
" {'log': \" Olivia Wilde's boyfriend is 27 years old.\\nAction: Calculator\\nAction Input: 27^0.23\",\n",
" 'tool': 'Calculator',\n",
" 'tool_input': '27^0.23',\n",
" 'observation': 'Answer: 2.1340945944237553\\n'}]}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent({\"input\":\"How old is Olivia Wilde's boyfriend? What is that number raised to the 0.23 power?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9256ff6b",
"id": "2f0852ff",
"metadata": {},
"outputs": [],
"source": []
@@ -264,7 +188,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -27,7 +27,7 @@ from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)
```
Now we can run that chain only specifying the product!
Now we can run that can only specifying the product!
```python
chain.run("colorful socks")

View File

@@ -1,333 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d31df93e",
"metadata": {},
"source": [
"# Memory\n",
"So far, all the chains and agents we've gone through have been stateless. But often, you may want a chain or agent to have some concept of \"memory\" so that it may remember information about its previous interactions. The clearest and simple example of this is when designing a chatbot - you want it to remember previous messages so it can use context from that to have a better conversation. This would be a type of \"short-term memory\". On the more complex side, you could imagine a chain/agent remembering key pieces of information over time - this would be a form of \"long-term memory\". For more concrete ideas on the latter, see this [awesome paper](https://memprompt.com/).\n",
"\n",
"LangChain provides several specially created chains just for this purpose. This notebook walks through using one of those chains (the `ConversationChain`) with two different types of memory."
]
},
{
"cell_type": "markdown",
"id": "d051c1da",
"metadata": {},
"source": [
"### ConversationChain with default memory\n",
"By default, the `ConversationChain` has a simple type of memory that remembers all previous inputs/outputs and adds them to the context that is passed. Let's take a look at using this chain (setting `verbose=True` so we can see the prompt)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ae046bff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi there!\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Hello! How are you today?'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import OpenAI, ConversationChain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"conversation = ConversationChain(llm=llm, verbose=True)\n",
"\n",
"conversation.predict(input=\"Hi there!\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d8e2a6ff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi there!\n",
"AI: Hello! How are you today?\n",
"Human: I'm doing well! Just having a conversation with an AI.\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" That's great! What would you like to talk about?\""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation.predict(input=\"I'm doing well! Just having a conversation with an AI.\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "15eda316",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi there!\n",
"AI: Hello! How are you today?\n",
"Human: I'm doing well! Just having a conversation with an AI.\n",
"AI: That's great! What would you like to talk about?\n",
"Human: Tell me about yourself.\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' I am an AI created to provide information and support to humans. I enjoy learning and exploring new things.'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation.predict(input=\"Tell me about yourself.\")"
]
},
{
"cell_type": "markdown",
"id": "4fad9448",
"metadata": {},
"source": [
"### ConversationChain with ConversationSummaryMemory\n",
"Now let's take a look at using a slightly more complex type of memory - `ConversationSummaryMemory`. This type of memory creates a summary of the conversation over time. This can be useful for condensing information from the conversation over time.\n",
"\n",
"Let's walk through an example, again setting `verbose=True` so we can see the prompt."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f60a2fe8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.conversation.memory import ConversationSummaryMemory"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b7274f2c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi, what's up?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nI'm doing well, thank you for asking. I'm currently working on a project that I'm really excited about.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary = ConversationChain(llm=llm, memory=ConversationSummaryMemory(llm=OpenAI()), verbose=True)\n",
"conversation_with_summary.predict(input=\"Hi, what's up?\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a6b6b88f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"The human and artificial intelligence are talking. The human asked the AI what it is doing, and the AI said that it is working on a project that it is excited about.\n",
"Human: Tell me more about it!\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nI'm working on a project that I'm really excited about. It's a lot of work, but I think it's going to be really great when it's finished. I can't wait to show it to you!\""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary.predict(input=\"Tell me more about it!\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "dad869fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"\n",
"The human and artificial intelligence are talking. The human asked the AI what it is doing, and the AI said that it is working on a project that it is excited about. The AI said that the project is a lot of work, but it is going to be great when it is finished.\n",
"Human: Very cool -- what is the scope of the project?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nThe project is quite large in scope. It involves a lot of data analysis and work with artificial intelligence algorithms.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary.predict(input=\"Very cool -- what is the scope of the project?\")"
]
},
{
"cell_type": "markdown",
"id": "5c8735cc",
"metadata": {},
"source": [
"### More Resources on Memory\n",
"\n",
"This just scratches the surface of what you can do with memory. For more examples on things like how to implement custom memory classes, how to add memory to a custom LLM chain and how to use memory with an agent, please see the [How-To: Memory](../../examples/memory) section. For even more advanced ideas on memory (which will hopefully be included in LangChain soon!) see the [MemPrompt](https://memprompt.com/) paper."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "436dda66",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
"cells": [
{
"cell_type": "markdown",
"id": "d31df93e",
"metadata": {},
"source": [
"# Memory\n",
"So far, all the chains and agents we've gone through have been stateless. But often, you may want a chain or agent to have some concept of \"memory\" so that it may remember information about its previous interactions. The most clear and simple example of this is when designing a chatbot - you want it to remember previous messages so it can use context from that to have a better conversation. This would be a type of \"short-term memory\". On the more complex side, you could imagine a chain/agent remembering key pieces of information over time - this would be a form of \"long-term memory\".\n",
"\n",
"LangChain provides several specially created chains just for this purpose. This notebook walk throughs using one of those chains (the `ConversationChain`) with two different types of memory."
]
},
"nbformat": 4,
"nbformat_minor": 5
{
"cell_type": "markdown",
"id": "d051c1da",
"metadata": {},
"source": [
"### ConversationChain with default memory\n",
"By default, the `ConversationChain` has a simple type of memory which remebers all previes inputs/outputs and adds them to the context that is passed. Let's take a look at using this chain (setting `verbose=True` so we can see the prompt)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ae046bff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi there!\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Hello! How are you today?'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import OpenAI, ConversationChain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"conversation = ConversationChain(llm=llm, verbose=True)\n",
"\n",
"conversation.predict(input=\"Hi there!\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d8e2a6ff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi there!\n",
"AI: Hello! How are you today?\n",
"Human: I'm doing well! Just having a conversation with an AI.\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" That's great! What would you like to talk about?\""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation.predict(input=\"I'm doing well! Just having a conversation with an AI.\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "15eda316",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi there!\n",
"AI: Hello! How are you today?\n",
"Human: I'm doing well! Just having a conversation with an AI.\n",
"AI: That's great! What would you like to talk about?\n",
"Human: Tell me about yourself.\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' I am an AI created to provide information and support to humans. I enjoy learning and exploring new things.'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation.predict(input=\"Tell me about yourself.\")"
]
},
{
"cell_type": "markdown",
"id": "4fad9448",
"metadata": {},
"source": [
"### ConversationChain with ConversationSummaryMemory\n",
"Now lets take a look at using a slightly more complex type of memory - `ConversationSummaryMemory`. This type of memory creates a summary of the conversation over time. This can be useful for condensing information from the conversation over time.\n",
"\n",
"Let's walk through an example, again setting `verbose=True` so we can see the prompt."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f60a2fe8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.conversation.memory import ConversationSummaryMemory"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b7274f2c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"Human: Hi, what's up?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nI'm doing well, thank you for asking. I'm currently working on a project that I'm really excited about.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary = ConversationChain(llm=llm, memory=ConversationSummaryMemory(llm=OpenAI()), verbose=True)\n",
"conversation_with_summary.predict(input=\"Hi, what's up?\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a6b6b88f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"The human and artificial intelligence are talking. The human asked the AI what it is doing, and the AI said that it is working on a project that it is excited about.\n",
"Human: Tell me more about it!\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"\\n\\nI'm working on a project that I'm really excited about. It's a lot of work, but I think it's going to be really great when it's finished. I can't wait to show it to you!\""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary.predict(input=\"Tell me more about it!\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "dad869fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.\n",
"\n",
"Current conversation:\n",
"\n",
"\n",
"The human and artificial intelligence are talking. The human asked the AI what it is doing, and the AI said that it is working on a project that it is excited about. The AI said that the project is a lot of work, but it is going to be great when it is finished.\n",
"Human: Very cool -- what is the scope of the project?\n",
"AI:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'\\n\\nThe project is quite large in scope. It involves a lot of data analysis and work with artificial intelligence algorithms.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversation_with_summary.predict(input=\"Very cool -- what is the scope of the project?\")"
]
},
{
"cell_type": "markdown",
"id": "5c8735cc",
"metadata": {},
"source": [
"### More Resources on Memory\n",
"\n",
"This just scratches the surface of what you can do with memory. For more examples on things like how to implement custom memory classes, how to add memory to a custom LLM chain and how to use memory with and agent, please see the [How-To: Memory](../../examples/memory) section."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "436dda66",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -9,13 +9,13 @@ combine them with other sources of computation or knowledge.
This library is aimed at assisting in the development of those types of applications.
There are four main areas that LangChain is designed to help with.
There are three main areas (with a forth coming soon) that LangChain is designed to help with.
These are, in increasing order of complexity:
1. LLM and Prompts
2. Chains
3. Agents
4. Memory
4. (Coming Soon) Memory
Let's go through these categories and for each one identify key concepts (to clarify terminology) as well as the problems in this area LangChain helps solve.
@@ -159,7 +159,6 @@ see detailed information about the various classes, methods, and APIs.
:name: resources
explanation/core_concepts.md
explanation/combine_docs.md
explanation/agents.md
explanation/glossary.md
explanation/cool_demos.md

View File

@@ -21,10 +21,4 @@ To install all modules needed for all integrations, run:
```
pip install langchain[all]
```
Note that if you are using `zsh`, you'll need to quote square brackets when passing them as an argument to a command, for example:
```
pip install 'langchain[all]'
```

View File

@@ -1,14 +1,9 @@
"""Main entrypoint into package."""
from typing import Optional
from langchain.agents import MRKLChain, ReActChain, SelfAskWithSearchChain
from langchain.cache import BaseCache
from langchain.chains import (
ConversationChain,
LLMBashChain,
LLMChain,
LLMCheckerChain,
LLMMathChain,
PALChain,
QAWithSourcesChain,
@@ -18,8 +13,7 @@ from langchain.chains import (
)
from langchain.docstore import InMemoryDocstore, Wikipedia
from langchain.llms import Cohere, HuggingFaceHub, OpenAI
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.logger import BaseLogger, StdOutLogger
from langchain.logger import BaseLogger
from langchain.prompts import (
BasePromptTemplate,
FewShotPromptTemplate,
@@ -30,14 +24,10 @@ from langchain.serpapi import SerpAPIChain, SerpAPIWrapper
from langchain.sql_database import SQLDatabase
from langchain.vectorstores import FAISS, ElasticVectorSearch
logger: BaseLogger = StdOutLogger()
verbose: bool = False
llm_cache: Optional[BaseCache] = None
logger = BaseLogger()
__all__ = [
"LLMChain",
"LLMBashChain",
"LLMCheckerChain",
"LLMMathChain",
"SelfAskWithSearchChain",
"SerpAPIWrapper",
@@ -51,7 +41,6 @@ __all__ = [
"ReActChain",
"Wikipedia",
"HuggingFaceHub",
"HuggingFacePipeline",
"SQLDatabase",
"SQLDatabaseChain",
"FAISS",

View File

@@ -1,5 +1,5 @@
"""Routing chains."""
from langchain.agents.agent import AgentWithTools
from langchain.agents.agent import Agent
from langchain.agents.loading import initialize_agent
from langchain.agents.mrkl.base import MRKLChain, ZeroShotAgent
from langchain.agents.react.base import ReActChain, ReActTextWorldAgent
@@ -10,7 +10,7 @@ __all__ = [
"MRKLChain",
"SelfAskWithSearchChain",
"ReActChain",
"AgentWithTools",
"Agent",
"Tool",
"initialize_agent",
"ZeroShotAgent",

View File

@@ -1,31 +1,65 @@
"""Chain that takes in an input and produces an action and action input."""
from __future__ import annotations
from abc import abstractmethod
from typing import Any, Dict, List, Optional, Tuple, Union
from abc import ABC, abstractmethod
from typing import Any, ClassVar, Dict, List, Optional, Tuple
from pydantic import BaseModel, root_validator
from pydantic import BaseModel
import langchain
from langchain.agents.input import ChainedInput
from langchain.agents.tools import Tool
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.input import get_color_mapping
from langchain.llms.base import LLM
from langchain.prompts.base import BasePromptTemplate
from langchain.schema import AgentAction, AgentFinish
from langchain.schema import AgentAction
class Agent(BaseModel):
"""Class responsible for calling the language model and deciding the action.
This is driven by an LLMChain. The prompt in the LLMChain MUST include
a variable called "agent_scratchpad" where the agent can put its
intermediary work.
"""
class Agent(Chain, BaseModel, ABC):
"""Agent that uses an LLM."""
prompt: ClassVar[BasePromptTemplate]
llm_chain: LLMChain
return_values: List[str] = ["output"]
tools: List[Tool]
input_key: str = "input" #: :meta private:
output_key: str = "output" #: :meta private:
@property
def input_keys(self) -> List[str]:
"""Return the singular input key.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return the singular output key.
:meta private:
"""
return [self.output_key]
@property
@abstractmethod
def observation_prefix(self) -> str:
"""Prefix to append the observation with."""
@property
@abstractmethod
def llm_prefix(self) -> str:
"""Prefix to append the LLM call with."""
@property
def finish_tool_name(self) -> str:
"""Name of the tool to use to finish the chain."""
return "Final Answer"
@property
def starter_string(self) -> str:
"""Put this string after user input but before first LLM call."""
return "\n"
@abstractmethod
def _extract_tool_and_input(self, text: str) -> Optional[Tuple[str, str]]:
@@ -39,152 +73,84 @@ class Agent(BaseModel):
def _stop(self) -> List[str]:
return [f"\n{self.observation_prefix}"]
def plan(
self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
) -> Union[AgentFinish, AgentAction]:
"""Given input, decided what to do.
Args:
thoughts: LLM thoughts
inputs: user inputs
Returns:
Action specifying what tool to use.
"""
thoughts = ""
for action, observation in intermediate_steps:
thoughts += action.log
thoughts += f"\n{self.observation_prefix}{observation}\n{self.llm_prefix}"
new_inputs = {"agent_scratchpad": thoughts, "stop": self._stop}
full_inputs = {**kwargs, **new_inputs}
full_output = self.llm_chain.predict(**full_inputs)
parsed_output = self._extract_tool_and_input(full_output)
while parsed_output is None:
full_output = self._fix_text(full_output)
full_inputs["agent_scratchpad"] += full_output
output = self.llm_chain.predict(**full_inputs)
full_output += output
parsed_output = self._extract_tool_and_input(full_output)
tool, tool_input = parsed_output
if tool == self.finish_tool_name:
return AgentFinish({"output": tool_input}, full_output)
return AgentAction(tool, tool_input, full_output)
def prepare_for_new_call(self) -> None:
"""Prepare the agent for new call, if needed."""
pass
@property
def finish_tool_name(self) -> str:
"""Name of the tool to use to finish the chain."""
return "Final Answer"
@property
def input_keys(self) -> List[str]:
"""Return the input keys.
:meta private:
"""
return list(set(self.llm_chain.input_keys) - {"agent_scratchpad"})
@root_validator()
def validate_prompt(cls, values: Dict) -> Dict:
"""Validate that prompt matches format."""
prompt = values["llm_chain"].prompt
if "agent_scratchpad" not in prompt.input_variables:
raise ValueError(
"`agent_scratchpad` should be a variable in prompt.input_variables"
)
return values
@property
@abstractmethod
def observation_prefix(self) -> str:
"""Prefix to append the observation with."""
@property
@abstractmethod
def llm_prefix(self) -> str:
"""Prefix to append the LLM call with."""
@classmethod
@abstractmethod
def create_prompt(cls, tools: List[Tool]) -> BasePromptTemplate:
"""Create a prompt for this class."""
@classmethod
def _validate_tools(cls, tools: List[Tool]) -> None:
"""Validate that appropriate tools are passed in."""
pass
@classmethod
def from_llm_and_tools(cls, llm: LLM, tools: List[Tool]) -> Agent:
def create_prompt(cls, tools: List[Tool]) -> BasePromptTemplate:
"""Create a prompt for this class."""
return cls.prompt
def _prepare_for_new_call(self) -> None:
pass
@classmethod
def from_llm_and_tools(cls, llm: LLM, tools: List[Tool], **kwargs: Any) -> Agent:
"""Construct an agent from an LLM and tools."""
cls._validate_tools(tools)
llm_chain = LLMChain(llm=llm, prompt=cls.create_prompt(tools))
return cls(llm_chain=llm_chain)
return cls(llm_chain=llm_chain, tools=tools, **kwargs)
def get_action(self, text: str) -> AgentAction:
"""Given input, decided what to do.
class AgentWithTools(Chain, BaseModel):
"""Consists of an agent using tools."""
Args:
text: input string
agent: Agent
tools: List[Tool]
return_intermediate_steps: bool = False
@property
def input_keys(self) -> List[str]:
"""Return the input keys.
:meta private:
Returns:
Action specifying what tool to use.
"""
return self.agent.input_keys
input_key = self.llm_chain.input_keys[0]
inputs = {input_key: text, "stop": self._stop}
full_output = self.llm_chain.predict(**inputs)
parsed_output = self._extract_tool_and_input(full_output)
while parsed_output is None:
full_output = self._fix_text(full_output)
inputs = {input_key: text + full_output, "stop": self._stop}
output = self.llm_chain.predict(**inputs)
full_output += output
parsed_output = self._extract_tool_and_input(full_output)
tool, tool_input = parsed_output
return AgentAction(tool, tool_input, full_output)
@property
def output_keys(self) -> List[str]:
"""Return the singular output key.
:meta private:
"""
if self.return_intermediate_steps:
return self.agent.return_values + ["intermediate_steps"]
else:
return self.agent.return_values
def _call(self, inputs: Dict[str, str]) -> Dict[str, Any]:
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
"""Run text through and get agent response."""
text = inputs[self.input_key]
# Do any preparation necessary when receiving a new input.
self.agent.prepare_for_new_call()
self._prepare_for_new_call()
# Construct a mapping of tool name to tool for easy lookup
name_to_tool_map = {tool.name: tool.func for tool in self.tools}
# Construct the initial string to pass into the LLM. This is made up
# of the user input, the special starter string, and then the LLM prefix.
# The starter string is a special string that may be used by a LLM to
# immediately follow the user input. The LLM prefix is a string that
# prompts the LLM to take an action.
starter_string = text + self.starter_string + self.llm_prefix
# We use the ChainedInput class to iteratively add to the input over time.
chained_input = ChainedInput(starter_string, verbose=self.verbose)
# We construct a mapping from each tool to a color, used for logging.
color_mapping = get_color_mapping(
[tool.name for tool in self.tools], excluded_colors=["green"]
)
intermediate_steps: List[Tuple[AgentAction, str]] = []
# We now enter the agent loop (until it returns something).
while True:
# Call the LLM to see what to do.
output = self.agent.plan(intermediate_steps, **inputs)
output = self.get_action(chained_input.input)
# Add the log to the Chained Input.
chained_input.add_action(output, color="green")
# If the tool chosen is the finishing tool, then we end and return.
if isinstance(output, AgentFinish):
if self.verbose:
langchain.logger.log_agent_end(output, color="green")
final_output = output.return_values
if self.return_intermediate_steps:
final_output["intermediate_steps"] = intermediate_steps
return final_output
if self.verbose:
langchain.logger.log_agent_action(output, color="green")
# And then we lookup the tool
if output.tool in name_to_tool_map:
chain = name_to_tool_map[output.tool]
# We then call the tool on the tool input to get an observation
observation = chain(output.tool_input)
color = color_mapping[output.tool]
else:
observation = f"{output.tool} is not a valid tool, try another one."
color = None
if self.verbose:
langchain.logger.log_agent_observation(observation, color=color)
intermediate_steps.append((output, observation))
if output.tool == self.finish_tool_name:
return {self.output_key: output.tool_input}
# Otherwise we lookup the tool
chain = name_to_tool_map[output.tool]
# We then call the tool on the tool input to get an observation
observation = chain(output.tool_input)
# We then log the observation
chained_input.add_observation(
observation,
self.observation_prefix,
self.llm_prefix,
color=color_mapping[output.tool],
)

44
langchain/agents/input.py Normal file
View File

@@ -0,0 +1,44 @@
"""Input manager for agents."""
from typing import Optional
import langchain
from langchain.schema import AgentAction
class ChainedInput:
"""Class for working with input that is the result of chains."""
def __init__(self, text: str, verbose: bool = False):
"""Initialize with verbose flag and initial text."""
self._verbose = verbose
if self._verbose:
langchain.logger.log_agent_start(text)
self._input = text
def add_action(self, action: AgentAction, color: Optional[str] = None) -> None:
"""Add text to input, print if in verbose mode."""
if self._verbose:
langchain.logger.log_agent_action(action, color=color)
self._input += action.log
def add_observation(
self,
observation: str,
observation_prefix: str,
llm_prefix: str,
color: Optional[str],
) -> None:
"""Add observation to input, print if in verbose mode."""
if self._verbose:
langchain.logger.log_agent_observation(
observation,
color=color,
observation_prefix=observation_prefix,
llm_prefix=llm_prefix,
)
self._input += f"\n{observation_prefix}{observation}\n{llm_prefix}"
@property
def input(self) -> str:
"""Return the accumulated input."""
return self._input

View File

@@ -1,7 +1,7 @@
"""Load agent."""
from typing import Any, List
from langchain.agents.agent import AgentWithTools
from langchain.agents.agent import Agent
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.react.base import ReActDocstoreAgent
from langchain.agents.self_ask_with_search.base import SelfAskWithSearchAgent
@@ -20,7 +20,7 @@ def initialize_agent(
llm: LLM,
agent: str = "zero-shot-react-description",
**kwargs: Any,
) -> AgentWithTools:
) -> Agent:
"""Load agent given tools and LLM.
Args:
@@ -39,5 +39,4 @@ def initialize_agent(
f"Valid types are: {AGENT_TO_CLASS.keys()}."
)
agent_cls = AGENT_TO_CLASS[agent]
agent_obj = agent_cls.from_llm_and_tools(llm, tools)
return AgentWithTools(agent=agent_obj, tools=tools, **kwargs)
return agent_cls.from_llm_and_tools(llm, tools, **kwargs)

View File

@@ -3,7 +3,7 @@ from __future__ import annotations
from typing import Any, Callable, List, NamedTuple, Optional, Tuple
from langchain.agents.agent import Agent, AgentWithTools
from langchain.agents.agent import Agent
from langchain.agents.mrkl.prompt import FORMAT_INSTRUCTIONS, PREFIX, SUFFIX
from langchain.agents.tools import Tool
from langchain.llms.base import LLM
@@ -85,7 +85,7 @@ class ZeroShotAgent(Agent):
format_instructions = FORMAT_INSTRUCTIONS.format(tool_names=tool_names)
template = "\n\n".join([prefix, tool_strings, format_instructions, suffix])
if input_variables is None:
input_variables = ["input", "agent_scratchpad"]
input_variables = ["input"]
return PromptTemplate(template=template, input_variables=input_variables)
@classmethod
@@ -101,7 +101,7 @@ class ZeroShotAgent(Agent):
return get_action_and_input(text)
class MRKLChain(AgentWithTools):
class MRKLChain(ZeroShotAgent):
"""Chain that implements the MRKL system.
Example:
@@ -116,9 +116,7 @@ class MRKLChain(AgentWithTools):
"""
@classmethod
def from_chains(
cls, llm: LLM, chains: List[ChainConfig], **kwargs: Any
) -> AgentWithTools:
def from_chains(cls, llm: LLM, chains: List[ChainConfig], **kwargs: Any) -> Agent:
"""User friendly way to initialize the MRKL chain.
This is intended to be an easy way to get up and running with the
@@ -158,5 +156,4 @@ class MRKLChain(AgentWithTools):
Tool(name=c.action_name, func=c.action, description=c.action_description)
for c in chains
]
agent = ZeroShotAgent.from_llm_and_tools(llm, tools)
return cls(agent=agent, tools=tools, **kwargs)
return cls.from_llm_and_tools(llm, tools, **kwargs)

View File

@@ -12,5 +12,4 @@ Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!
Question: {input}
Thought:{agent_scratchpad}"""
Question: {input}"""

View File

@@ -1,13 +1,14 @@
"""Chain that implements the ReAct paper from https://arxiv.org/pdf/2210.03629.pdf."""
import re
from typing import Any, List, Optional, Tuple
from typing import Any, ClassVar, List, Optional, Tuple
from pydantic import BaseModel
from langchain.agents.agent import Agent, AgentWithTools
from langchain.agents.agent import Agent
from langchain.agents.react.textworld_prompt import TEXTWORLD_PROMPT
from langchain.agents.react.wiki_prompt import WIKI_PROMPT
from langchain.agents.tools import Tool
from langchain.chains.llm import LLMChain
from langchain.docstore.base import Docstore
from langchain.docstore.document import Document
from langchain.llms.base import LLM
@@ -17,10 +18,7 @@ from langchain.prompts.base import BasePromptTemplate
class ReActDocstoreAgent(Agent, BaseModel):
"""Agent for the ReAct chin."""
@classmethod
def create_prompt(cls, tools: List[Tool]) -> BasePromptTemplate:
"""Return default prompt."""
return WIKI_PROMPT
prompt: ClassVar[BasePromptTemplate] = WIKI_PROMPT
i: int = 1
@@ -102,10 +100,9 @@ class DocstoreExplorer:
class ReActTextWorldAgent(ReActDocstoreAgent, BaseModel):
"""Agent for the ReAct TextWorld chain."""
@classmethod
def create_prompt(cls, tools: List[Tool]) -> BasePromptTemplate:
"""Return default prompt."""
return TEXTWORLD_PROMPT
prompt: ClassVar[BasePromptTemplate] = TEXTWORLD_PROMPT
i: int = 1
@classmethod
def _validate_tools(cls, tools: List[Tool]) -> None:
@@ -116,7 +113,7 @@ class ReActTextWorldAgent(ReActDocstoreAgent, BaseModel):
raise ValueError(f"Tool name should be Play, got {tool_names}")
class ReActChain(AgentWithTools):
class ReActChain(ReActDocstoreAgent):
"""Chain that implements the ReAct paper.
Example:
@@ -133,5 +130,5 @@ class ReActChain(AgentWithTools):
Tool(name="Search", func=docstore_explorer.search),
Tool(name="Lookup", func=docstore_explorer.lookup),
]
agent = ReActDocstoreAgent.from_llm_and_tools(llm, tools)
super().__init__(agent=agent, tools=tools, **kwargs)
llm_chain = LLMChain(llm=llm, prompt=WIKI_PROMPT)
super().__init__(llm_chain=llm_chain, tools=tools, **kwargs)

View File

@@ -44,9 +44,6 @@ Action 4: Finish[yes]
"""
]
SUFFIX = """\n\nSetup: {input}
{agent_scratchpad}"""
SUFFIX = """\n\nSetup: {input}"""
TEXTWORLD_PROMPT = PromptTemplate.from_examples(
EXAMPLES, SUFFIX, ["input", "agent_scratchpad"]
)
TEXTWORLD_PROMPT = PromptTemplate.from_examples(EXAMPLES, SUFFIX, ["input"])

View File

@@ -107,9 +107,6 @@ Thought 3: Leonid Levin is a mathematician and computer scientist. So Pavel Urys
and Leonid Levin have the same type of work.
Action 3: Finish[yes]""",
]
SUFFIX = """\n\nQuestion: {input}
{agent_scratchpad}"""
SUFFIX = """\n\nQuestion: {input}"""
WIKI_PROMPT = PromptTemplate.from_examples(
EXAMPLES, SUFFIX, ["input", "agent_scratchpad"]
)
WIKI_PROMPT = PromptTemplate.from_examples(EXAMPLES, SUFFIX, ["input"])

View File

@@ -1,9 +1,10 @@
"""Chain that does self ask with search."""
from typing import Any, List, Optional, Tuple
from typing import Any, ClassVar, List, Optional, Tuple
from langchain.agents.agent import Agent, AgentWithTools
from langchain.agents.agent import Agent
from langchain.agents.self_ask_with_search.prompt import PROMPT
from langchain.agents.tools import Tool
from langchain.chains.llm import LLMChain
from langchain.llms.base import LLM
from langchain.prompts.base import BasePromptTemplate
from langchain.serpapi import SerpAPIWrapper
@@ -12,10 +13,7 @@ from langchain.serpapi import SerpAPIWrapper
class SelfAskWithSearchAgent(Agent):
"""Agent for the self-ask-with-search paper."""
@classmethod
def create_prompt(cls, tools: List[Tool]) -> BasePromptTemplate:
"""Prompt does not depend on tools."""
return PROMPT
prompt: ClassVar[BasePromptTemplate] = PROMPT
@classmethod
def _validate_tools(cls, tools: List[Tool]) -> None:
@@ -60,10 +58,10 @@ class SelfAskWithSearchAgent(Agent):
@property
def starter_string(self) -> str:
"""Put this string after user input but before first LLM call."""
return "Are follow up questions needed here:"
return "\nAre follow up questions needed here:"
class SelfAskWithSearchChain(AgentWithTools):
class SelfAskWithSearchChain(SelfAskWithSearchAgent):
"""Chain that does self ask with search.
Example:
@@ -77,5 +75,5 @@ class SelfAskWithSearchChain(AgentWithTools):
def __init__(self, llm: LLM, search_chain: SerpAPIWrapper, **kwargs: Any):
"""Initialize with just an LLM and a search chain."""
search_tool = Tool(name="Intermediate Answer", func=search_chain.run)
agent = SelfAskWithSearchAgent.from_llm_and_tools(llm, [search_tool])
super().__init__(agent=agent, tools=[search_tool], **kwargs)
llm_chain = LLMChain(llm=llm, prompt=PROMPT)
super().__init__(llm_chain=llm_chain, tools=[search_tool], **kwargs)

View File

@@ -37,8 +37,5 @@ Follow up: Where is Martin Campbell from?
Intermediate answer: New Zealand.
So the final answer is: No
Question: {input}
Are followup questions needed here:{agent_scratchpad}"""
PROMPT = PromptTemplate(
input_variables=["input", "agent_scratchpad"], template=_DEFAULT_TEMPLATE
)
Question: {input}"""
PROMPT = PromptTemplate(input_variables=["input"], template=_DEFAULT_TEMPLATE)

View File

@@ -1,118 +0,0 @@
"""Beta Feature: base interface for cache."""
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple, Union
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.engine.base import Engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from langchain.schema import Generation
RETURN_VAL_TYPE = Union[List[Generation], str]
class BaseCache(ABC):
"""Base interface for cache."""
@abstractmethod
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up based on prompt and llm_string."""
@abstractmethod
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
"""Update cache based on prompt and llm_string."""
class InMemoryCache(BaseCache):
"""Cache that stores things in memory."""
def __init__(self) -> None:
"""Initialize with empty cache."""
self._cache: Dict[Tuple[str, str], RETURN_VAL_TYPE] = {}
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up based on prompt and llm_string."""
return self._cache.get((prompt, llm_string), None)
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
"""Update cache based on prompt and llm_string."""
self._cache[(prompt, llm_string)] = return_val
Base = declarative_base()
class LLMCache(Base): # type: ignore
"""SQLite table for simple LLM cache (string only)."""
__tablename__ = "llm_cache"
prompt = Column(String, primary_key=True)
llm = Column(String, primary_key=True)
response = Column(String)
class FullLLMCache(Base): # type: ignore
"""SQLite table for full LLM Cache (all generations)."""
__tablename__ = "full_llm_cache"
prompt = Column(String, primary_key=True)
llm = Column(String, primary_key=True)
idx = Column(Integer, primary_key=True)
response = Column(String)
class SQLAlchemyCache(BaseCache):
"""Cache that uses SQAlchemy as a backend."""
def __init__(self, engine: Engine):
"""Initialize by creating all tables."""
self.engine = engine
Base.metadata.create_all(self.engine)
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up based on prompt and llm_string."""
stmt = (
select(FullLLMCache.response)
.where(FullLLMCache.prompt == prompt)
.where(FullLLMCache.llm == llm_string)
.order_by(FullLLMCache.idx)
)
with Session(self.engine) as session:
generations = []
for row in session.execute(stmt):
generations.append(Generation(text=row[0]))
if len(generations) > 0:
return generations
stmt = (
select(LLMCache.response)
.where(LLMCache.prompt == prompt)
.where(LLMCache.llm == llm_string)
)
with Session(self.engine) as session:
for row in session.execute(stmt):
return row[0]
return None
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
"""Look up based on prompt and llm_string."""
if isinstance(return_val, str):
item = LLMCache(prompt=prompt, llm=llm_string, response=return_val)
with Session(self.engine) as session, session.begin():
session.add(item)
else:
for i, generation in enumerate(return_val):
item = FullLLMCache(
prompt=prompt, llm=llm_string, response=generation.text, idx=i
)
with Session(self.engine) as session, session.begin():
session.add(item)
class SQLiteCache(SQLAlchemyCache):
"""Cache that uses SQLite as a backend."""
def __init__(self, database_path: str = ".langchain.db"):
"""Initialize by creating the engine and all tables."""
engine = create_engine(f"sqlite:///{database_path}")
super().__init__(engine)

View File

@@ -2,37 +2,24 @@
from langchain.chains.api.base import APIChain
from langchain.chains.conversation.base import ConversationChain
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
from langchain.chains.llm_checker.base import LLMCheckerChain
from langchain.chains.llm_math.base import LLMMathChain
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.chains.mapreduce import MapReduceChain
from langchain.chains.moderation import OpenAIModerationChain
from langchain.chains.pal.base import PALChain
from langchain.chains.qa_with_sources.base import QAWithSourcesChain
from langchain.chains.qa_with_sources.vector_db import VectorDBQAWithSourcesChain
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
from langchain.chains.sql_database.base import SQLDatabaseChain
from langchain.chains.transform import TransformChain
from langchain.chains.vector_db_qa.base import VectorDBQA
__all__ = [
"APIChain",
"ConversationChain",
"LLMChain",
"LLMBashChain",
"LLMCheckerChain",
"LLMMathChain",
"PALChain",
"QAWithSourcesChain",
"SQLDatabaseChain",
"VectorDBQA",
"SequentialChain",
"SimpleSequentialChain",
"VectorDBQA",
"ConversationChain",
"QAWithSourcesChain",
"VectorDBQAWithSourcesChain",
"PALChain",
"APIChain",
"LLMRequestsChain",
"TransformChain",
"MapReduceChain",
"OpenAIModerationChain",
]

View File

@@ -3,6 +3,7 @@ from __future__ import annotations
from typing import Any, Dict, List, Optional
import requests
from pydantic import BaseModel, root_validator
from langchain.chains.api.prompt import API_RESPONSE_PROMPT, API_URL_PROMPT
@@ -10,7 +11,16 @@ from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.input import print_text
from langchain.llms.base import LLM
from langchain.requests import RequestsWrapper
class RequestsWrapper(BaseModel):
"""Lightweight wrapper to partial out everything except the url to hit."""
headers: Optional[dict] = None
def run(self, url: str) -> str:
"""Hit the URL and return the text."""
return requests.get(url, headers=self.headers).text
class APIChain(Chain, BaseModel):

View File

@@ -1,10 +1,8 @@
"""Base interface that all chains should implement."""
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Union
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Extra, Field
import langchain
from pydantic import BaseModel, Extra
class Memory(BaseModel, ABC):
@@ -29,21 +27,13 @@ class Memory(BaseModel, ABC):
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save the context of this model run to memory."""
@abstractmethod
def clear(self) -> None:
"""Clear memory contents."""
def _get_verbosity() -> bool:
return langchain.verbose
class Chain(BaseModel, ABC):
"""Base interface that all chains should implement."""
memory: Optional[Memory] = None
verbose: bool = Field(default_factory=_get_verbosity)
verbose: bool = False
"""Whether to print out response text."""
@property
@@ -74,28 +64,18 @@ class Chain(BaseModel, ABC):
"""Run the logic of this chain and return the output."""
def __call__(
self, inputs: Union[Dict[str, Any], Any], return_only_outputs: bool = False
) -> Dict[str, Any]:
self, inputs: Dict[str, Any], return_only_outputs: bool = False
) -> Dict[str, str]:
"""Run the logic of this chain and add to output if desired.
Args:
inputs: Dictionary of inputs, or single input if chain expects
only one param.
inputs: Dictionary of inputs.
return_only_outputs: boolean for whether to return only outputs in the
response. If True, only new keys generated by this chain will be
returned. If False, both input keys and new keys generated by this
chain will be returned. Defaults to False.
"""
if not isinstance(inputs, dict):
if len(self.input_keys) != 1:
raise ValueError(
f"A single string input was passed in, but this chain expects "
f"multiple inputs ({self.input_keys}). When a chain expects "
f"multiple inputs, please call it by passing in a dictionary, "
"eg `chain({'foo': 1, 'bar': 2})`"
)
inputs = {self.input_keys[0]: inputs}
if self.memory is not None:
external_context = self.memory.load_memory_variables(inputs)
inputs = dict(inputs, **external_context)

View File

@@ -1,22 +1,21 @@
"""Chain that combines documents by stuffing into context."""
"""Document combining chain."""
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.prompts.base import BasePromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.prompt import Prompt
def _get_default_document_prompt() -> PromptTemplate:
return PromptTemplate(input_variables=["page_content"], template="{page_content}")
def _get_default_document_prompt() -> Prompt:
return Prompt(input_variables=["page_content"], template="{page_content}")
class StuffDocumentsChain(BaseCombineDocumentsChain, BaseModel):
"""Chain that combines documents by stuffing into context."""
class CombineDocumentsChain(Chain, BaseModel):
"""Combine documents."""
llm_chain: LLMChain
"""LLM wrapper to use after formatting documents."""
@@ -27,6 +26,8 @@ class StuffDocumentsChain(BaseCombineDocumentsChain, BaseModel):
document_variable_name: str
"""The variable name in the llm_chain to put the documents in.
If only one variable in the llm_chain, this need not be provided."""
input_key: str = "input_documents" #: :meta private:
output_key: str = "output_text" #: :meta private:
class Config:
"""Configuration for this pydantic object."""
@@ -34,6 +35,22 @@ class StuffDocumentsChain(BaseCombineDocumentsChain, BaseModel):
extra = Extra.forbid
arbitrary_types_allowed = True
@property
def input_keys(self) -> List[str]:
"""Expect input key.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return output key.
:meta private:
"""
return [self.output_key]
@root_validator(pre=True)
def get_default_document_variable_name(cls, values: Dict) -> Dict:
"""Get default document variable name, if not provided."""
@@ -55,7 +72,10 @@ class StuffDocumentsChain(BaseCombineDocumentsChain, BaseModel):
)
return values
def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
docs = inputs[self.input_key]
# Other keys are assumed to be needed for LLM prediction
other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
# Get relevant information from each document.
doc_dicts = []
for doc in docs:
@@ -68,18 +88,7 @@ class StuffDocumentsChain(BaseCombineDocumentsChain, BaseModel):
# Format each document according to the prompt
doc_strings = [self.document_prompt.format(**doc) for doc in doc_dicts]
# Join the documents together to put them in the prompt.
inputs = kwargs.copy()
inputs[self.document_variable_name] = "\n\n".join(doc_strings)
return inputs
def prompt_length(self, docs: List[Document], **kwargs: Any) -> Optional[int]:
"""Get the prompt length by formatting the prompt."""
inputs = self._get_inputs(docs, **kwargs)
prompt = self.llm_chain.prompt.format(**inputs)
return self.llm_chain.llm.get_num_tokens(prompt)
def combine_docs(self, docs: List[Document], **kwargs: Any) -> str:
"""Stuff all documents into one prompt and pass to LLM."""
inputs = self._get_inputs(docs, **kwargs)
other_keys[self.document_variable_name] = "\n".join(doc_strings)
# Call predict on the LLM.
return self.llm_chain.predict(**inputs)
output = self.llm_chain.predict(**other_keys)
return {self.output_key: output}

View File

@@ -1 +0,0 @@
"""Different ways to combine documents."""

View File

@@ -1,50 +0,0 @@
"""Base interface for chains combining documents."""
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional
from pydantic import BaseModel
from langchain.chains.base import Chain
from langchain.docstore.document import Document
class BaseCombineDocumentsChain(Chain, BaseModel, ABC):
"""Base interface for chains combining documents."""
input_key: str = "input_documents" #: :meta private:
output_key: str = "output_text" #: :meta private:
@property
def input_keys(self) -> List[str]:
"""Expect input key.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return output key.
:meta private:
"""
return [self.output_key]
def prompt_length(self, docs: List[Document], **kwargs: Any) -> Optional[int]:
"""Return the prompt length given the documents passed in.
Returns None if the method does not depend on the prompt length.
"""
return None
@abstractmethod
def combine_docs(self, docs: List[Document], **kwargs: Any) -> str:
"""Combine documents into a single string."""
def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
docs = inputs[self.input_key]
# Other keys are assumed to be needed for LLM prediction
other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
output = self.combine_docs(docs, **other_keys)
return {self.output_key: output}

View File

@@ -1,137 +0,0 @@
"""Combining documents by mapping a chain over them first, then combining results."""
from __future__ import annotations
from typing import Any, Callable, Dict, List, Optional
from pydantic import BaseModel, Extra, root_validator
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
def _split_list_of_docs(
docs: List[Document], length_func: Callable, token_max: int, **kwargs: Any
) -> List[List[Document]]:
new_result_doc_list = []
_sub_result_docs = []
for doc in docs:
_sub_result_docs.append(doc)
_num_tokens = length_func(_sub_result_docs, **kwargs)
if _num_tokens > token_max:
if len(_sub_result_docs) == 1:
raise ValueError(
"A single document was longer than the context length,"
" we cannot handle this."
)
if len(_sub_result_docs) == 2:
raise ValueError(
"A single document was so long it could not be combined "
"with another document, we cannot handle this."
)
new_result_doc_list.append(_sub_result_docs[:-1])
_sub_result_docs = _sub_result_docs[-1:]
new_result_doc_list.append(_sub_result_docs)
return new_result_doc_list
def _collapse_docs(
docs: List[Document],
combine_document_func: Callable,
**kwargs: Any,
) -> Document:
result = combine_document_func(docs, **kwargs)
combined_metadata = {k: str(v) for k, v in docs[0].metadata.items()}
for doc in docs[1:]:
for k, v in doc.metadata.items():
if k in combined_metadata:
combined_metadata[k] += f", {v}"
else:
combined_metadata[k] = str(v)
return Document(page_content=result, metadata=combined_metadata)
class MapReduceDocumentsChain(BaseCombineDocumentsChain, BaseModel):
"""Combining documents by mapping a chain over them, then combining results."""
llm_chain: LLMChain
"""Chain to apply to each document individually."""
combine_document_chain: BaseCombineDocumentsChain
"""Chain to use to combine results of applying llm_chain to documents."""
collapse_document_chain: Optional[BaseCombineDocumentsChain] = None
"""Chain to use to collapse intermediary results if needed.
If None, will use the combine_document_chain."""
document_variable_name: str
"""The variable name in the llm_chain to put the documents in.
If only one variable in the llm_chain, this need not be provided."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
@root_validator(pre=True)
def get_default_document_variable_name(cls, values: Dict) -> Dict:
"""Get default document variable name, if not provided."""
if "document_variable_name" not in values:
llm_chain_variables = values["llm_chain"].prompt.input_variables
if len(llm_chain_variables) == 1:
values["document_variable_name"] = llm_chain_variables[0]
else:
raise ValueError(
"document_variable_name must be provided if there are "
"multiple llm_chain input_variables"
)
else:
llm_chain_variables = values["llm_chain"].prompt.input_variables
if values["document_variable_name"] not in llm_chain_variables:
raise ValueError(
f"document_variable_name {values['document_variable_name']} was "
f"not found in llm_chain input_variables: {llm_chain_variables}"
)
return values
@property
def _collapse_chain(self) -> BaseCombineDocumentsChain:
if self.collapse_document_chain is not None:
return self.collapse_document_chain
else:
return self.combine_document_chain
def combine_docs(
self, docs: List[Document], token_max: int = 3000, **kwargs: Any
) -> str:
"""Combine documents in a map reduce manner.
Combine by mapping first chain over all documents, then reducing the results.
This reducing can be done recursively if needed (if there are many documents).
"""
results = self.llm_chain.apply(
# FYI - this is parallelized and so it is fast.
[{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs]
)
question_result_key = self.llm_chain.output_key
result_docs = [
Document(page_content=r[question_result_key], metadata=docs[i].metadata)
# This uses metadata from the docs, and the textual results from `results`
for i, r in enumerate(results)
]
length_func = self.combine_document_chain.prompt_length
num_tokens = length_func(result_docs, **kwargs)
while num_tokens is not None and num_tokens > token_max:
new_result_doc_list = _split_list_of_docs(
result_docs, length_func, token_max, **kwargs
)
result_docs = []
for docs in new_result_doc_list:
new_doc = _collapse_docs(
docs, self._collapse_chain.combine_docs, **kwargs
)
result_docs.append(new_doc)
num_tokens = self.combine_document_chain.prompt_length(
result_docs, **kwargs
)
output = self.combine_document_chain.combine_docs(result_docs, **kwargs)
return output

View File

@@ -1,88 +0,0 @@
"""Combining documents by doing a first pass and then refining on more documents."""
from __future__ import annotations
from typing import Any, Dict, List
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.prompts.base import BasePromptTemplate
from langchain.prompts.prompt import PromptTemplate
def _get_default_document_prompt() -> PromptTemplate:
return PromptTemplate(input_variables=["page_content"], template="{page_content}")
class RefineDocumentsChain(BaseCombineDocumentsChain, BaseModel):
"""Combine documents by doing a first pass and then refining on more documents."""
initial_llm_chain: LLMChain
"""LLM chain to use on initial document."""
refine_llm_chain: LLMChain
"""LLM chain to use when refining."""
document_variable_name: str
"""The variable name in the initial_llm_chain to put the documents in.
If only one variable in the initial_llm_chain, this need not be provided."""
initial_response_name: str
"""The variable name to format the initial response in when refining."""
document_prompt: BasePromptTemplate = Field(
default_factory=_get_default_document_prompt
)
"""Prompt to use to format each document."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
@root_validator(pre=True)
def get_default_document_variable_name(cls, values: Dict) -> Dict:
"""Get default document variable name, if not provided."""
if "document_variable_name" not in values:
llm_chain_variables = values["initial_llm_chain"].prompt.input_variables
if len(llm_chain_variables) == 1:
values["document_variable_name"] = llm_chain_variables[0]
else:
raise ValueError(
"document_variable_name must be provided if there are "
"multiple llm_chain input_variables"
)
else:
llm_chain_variables = values["initial_llm_chain"].prompt.input_variables
if values["document_variable_name"] not in llm_chain_variables:
raise ValueError(
f"document_variable_name {values['document_variable_name']} was "
f"not found in llm_chain input_variables: {llm_chain_variables}"
)
return values
def combine_docs(self, docs: List[Document], **kwargs: Any) -> str:
"""Combine by mapping first chain over all, then stuffing into final chain."""
base_info = {"page_content": docs[0].page_content}
base_info.update(docs[0].metadata)
document_info = {k: base_info[k] for k in self.document_prompt.input_variables}
base_inputs: dict = {
self.document_variable_name: self.document_prompt.format(**document_info)
}
inputs = {**base_inputs, **kwargs}
res = self.initial_llm_chain.predict(**inputs)
for doc in docs[1:]:
base_info = {"page_content": doc.page_content}
base_info.update(doc.metadata)
document_info = {
k: base_info[k] for k in self.document_prompt.input_variables
}
base_inputs = {
self.document_variable_name: self.document_prompt.format(
**document_info
),
self.initial_response_name: res,
}
inputs = {**base_inputs, **kwargs}
res = self.refine_llm_chain.predict(**inputs)
return res

View File

@@ -1,7 +1,7 @@
"""Memory modules for conversation prompts."""
from typing import Any, Dict, List
from pydantic import BaseModel, Field, root_validator
from pydantic import BaseModel, root_validator
from langchain.chains.base import Memory
from langchain.chains.conversation.prompt import SUMMARY_PROMPT
@@ -46,43 +46,6 @@ class ConversationBufferMemory(Memory, BaseModel):
ai = "AI: " + outputs[list(outputs.keys())[0]]
self.buffer += "\n" + "\n".join([human, ai])
def clear(self) -> None:
"""Clear memory contents."""
self.buffer = ""
class ConversationalBufferWindowMemory(Memory, BaseModel):
"""Buffer for storing conversation memory."""
buffer: List[str] = Field(default_factory=list)
memory_key: str = "history" #: :meta private:
k: int = 5
@property
def memory_variables(self) -> List[str]:
"""Will always return list of memory variables.
:meta private:
"""
return [self.memory_key]
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Return history buffer."""
return {self.memory_key: "\n".join(self.buffer[-self.k :])}
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save context from this conversation to buffer."""
prompt_input_key = _get_prompt_input_key(inputs, self.memory_variables)
if len(outputs) != 1:
raise ValueError(f"One output key expected, got {outputs.keys()}")
human = "Human: " + inputs[prompt_input_key]
ai = "AI: " + outputs[list(outputs.keys())[0]]
self.buffer.append("\n".join([human, ai]))
def clear(self) -> None:
"""Clear memory contents."""
self.buffer = []
class ConversationSummaryMemory(Memory, BaseModel):
"""Conversation summarizer to memory."""
@@ -126,7 +89,3 @@ class ConversationSummaryMemory(Memory, BaseModel):
new_lines = "\n".join([human, ai])
chain = LLMChain(llm=self.llm, prompt=self.prompt)
self.buffer = chain.predict(summary=self.buffer, new_lines=new_lines)
def clear(self) -> None:
"""Clear memory contents."""
self.buffer = ""

View File

@@ -51,34 +51,18 @@ class LLMChain(Chain, BaseModel):
"""
return [self.output_key]
def apply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:
"""Utilize the LLM generate method for speed gains."""
stop = None
if "stop" in input_list[0]:
stop = input_list[0]["stop"]
prompts = []
for inputs in input_list:
selected_inputs = {k: inputs[k] for k in self.prompt.input_variables}
prompt = self.prompt.format(**selected_inputs)
if self.verbose:
langchain.logger.log_llm_inputs(selected_inputs, prompt)
if "stop" in inputs and inputs["stop"] != stop:
raise ValueError(
"If `stop` is present in any inputs, should be present in all."
)
prompts.append(prompt)
response = self.llm.generate(prompts, stop=stop)
outputs = []
for generation in response.generations:
# Get the text of the top generated string.
response_str = generation[0].text
if self.verbose:
langchain.logger.log_llm_response(response_str)
outputs.append({self.output_key: response_str})
return outputs
def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
return self.apply([inputs])[0]
selected_inputs = {k: inputs[k] for k in self.prompt.input_variables}
prompt = self.prompt.format(**selected_inputs)
if self.verbose:
langchain.logger.log_llm_inputs(selected_inputs, prompt)
kwargs = {}
if "stop" in inputs:
kwargs["stop"] = inputs["stop"]
response = self.llm(prompt, **kwargs)
if self.verbose:
langchain.logger.log_llm_response(response)
return {self.output_key: response}
def predict(self, **kwargs: Any) -> str:
"""Format prompt with kwargs and pass to LLM.

View File

@@ -1,77 +0,0 @@
"""Chain that interprets a prompt and executes bash code to perform bash operations."""
from typing import Dict, List
from pydantic import BaseModel, Extra
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.prompt import PROMPT
from langchain.input import print_text
from langchain.llms.base import LLM
from langchain.utilities.bash import BashProcess
class LLMBashChain(Chain, BaseModel):
"""Chain that interprets a prompt and executes bash code to perform bash operations.
Example:
.. code-block:: python
from langchain import LLMBashChain, OpenAI
llm_bash = LLMBashChain(llm=OpenAI())
"""
llm: LLM
"""LLM wrapper to use."""
input_key: str = "question" #: :meta private:
output_key: str = "answer" #: :meta private:
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
@property
def input_keys(self) -> List[str]:
"""Expect input key.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Expect output key.
:meta private:
"""
return [self.output_key]
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
llm_executor = LLMChain(prompt=PROMPT, llm=self.llm)
bash_executor = BashProcess()
if self.verbose:
print_text(inputs[self.input_key])
t = llm_executor.predict(question=inputs[self.input_key])
if self.verbose:
print_text(t, color="green")
t = t.strip()
if t.startswith("```bash"):
# Split the string into a list of substrings
command_list = t.split("\n")
print(command_list)
# Remove the first and last substrings
command_list = [s for s in command_list[1:-1]]
output = bash_executor.run(command_list)
if self.verbose:
print_text("\nAnswer: ")
print_text(output, color="yellow")
else:
raise ValueError(f"unknown format from LLM: {t}")
return {self.output_key: output}

View File

@@ -1,22 +0,0 @@
# flake8: noqa
from langchain.prompts.prompt import PromptTemplate
_PROMPT_TEMPLATE = """If someone asks you to perform a task, your job is to come up with a series of bash commands that will perform the task. There is no need to put "#!/bin/bash" in your answer. Make sure to reason step by step, using this format:
Question: "copy the files in the directory named 'target' into a new directory at the same level as target called 'myNewDirectory'"
I need to take the following actions:
- List all files in the directory
- Create a new directory
- Copy the files from the first directory into the second directory
```bash
ls
mkdir myNewDirectory
cp -r target/* myNewDirectory
```
That is the format. Begin!
Question: {question}"""
PROMPT = PromptTemplate(input_variables=["question"], template=_PROMPT_TEMPLATE)

View File

@@ -1,4 +0,0 @@
"""Chain that tries to verify assumptions before answering a question.
Heavily borrowed from https://github.com/jagilley/fact-checker
"""

View File

@@ -1,98 +0,0 @@
"""Chain for question-answering with self-verification."""
from typing import Dict, List
from pydantic import BaseModel, Extra
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.chains.llm_checker.prompt import (
CHECK_ASSERTIONS_PROMPT,
CREATE_DRAFT_ANSWER_PROMPT,
LIST_ASSERTIONS_PROMPT,
REVISED_ANSWER_PROMPT,
)
from langchain.chains.sequential import SequentialChain
from langchain.llms.base import LLM
from langchain.prompts import PromptTemplate
class LLMCheckerChain(Chain, BaseModel):
"""Chain for question-answering with self-verification.
Example:
.. code-block:: python
from langchain import OpenAI, LLMCheckerChain
llm = OpenAI(temperature=0.7)
checker_chain = LLMCheckerChain(llm=llm)
"""
llm: LLM
"""LLM wrapper to use."""
create_draft_answer_prompt: PromptTemplate = CREATE_DRAFT_ANSWER_PROMPT
list_assertions_prompt: PromptTemplate = LIST_ASSERTIONS_PROMPT
check_assertions_prompt: PromptTemplate = CHECK_ASSERTIONS_PROMPT
revised_answer_prompt: PromptTemplate = REVISED_ANSWER_PROMPT
"""Prompt to use when questioning the documents."""
input_key: str = "query" #: :meta private:
output_key: str = "result" #: :meta private:
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
@property
def input_keys(self) -> List[str]:
"""Return the singular input key.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return the singular output key.
:meta private:
"""
return [self.output_key]
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
question = inputs[self.input_key]
create_draft_answer_chain = LLMChain(
llm=self.llm, prompt=self.create_draft_answer_prompt, output_key="statement"
)
list_assertions_chain = LLMChain(
llm=self.llm, prompt=self.list_assertions_prompt, output_key="assertions"
)
check_assertions_chain = LLMChain(
llm=self.llm,
prompt=self.check_assertions_prompt,
output_key="checked_assertions",
)
revised_answer_chain = LLMChain(
llm=self.llm,
prompt=self.revised_answer_prompt,
output_key="revised_statement",
)
chains = [
create_draft_answer_chain,
list_assertions_chain,
check_assertions_chain,
revised_answer_chain,
]
question_to_checked_assertions_chain = SequentialChain(
chains=chains,
input_variables=["question"],
output_variables=["revised_statement"],
verbose=True,
)
output = question_to_checked_assertions_chain({"question": question})
return {self.output_key: output["revised_statement"]}

View File

@@ -1,31 +0,0 @@
# flake8: noqa
from langchain.prompts.prompt import PromptTemplate
_CREATE_DRAFT_ANSWER_TEMPLATE = """{question}\n\n"""
CREATE_DRAFT_ANSWER_PROMPT = PromptTemplate(
input_variables=["question"], template=_CREATE_DRAFT_ANSWER_TEMPLATE
)
_LIST_ASSERTIONS_TEMPLATE = """Here is a statement:
{statement}
Make a bullet point list of the assumptions you made when producing the above statement.\n\n"""
LIST_ASSERTIONS_PROMPT = PromptTemplate(
input_variables=["statement"], template=_LIST_ASSERTIONS_TEMPLATE
)
_CHECK_ASSERTIONS_TEMPLATE = """Here is a bullet point list of assertions:
{assertions}
For each assertion, determine whether it is true or false. If it is false, explain why.\n\n"""
CHECK_ASSERTIONS_PROMPT = PromptTemplate(
input_variables=["assertions"], template=_CHECK_ASSERTIONS_TEMPLATE
)
_REVISED_ANSWER_TEMPLATE = """{checked_assertions}
Question: In light of the above assertions and checks, how would you answer the question '{question}'?
Answer:"""
REVISED_ANSWER_PROMPT = PromptTemplate(
input_variables=["checked_assertions", "question"],
template=_REVISED_ANSWER_TEMPLATE,
)

View File

@@ -1,73 +0,0 @@
"""Chain that hits a URL and then uses an LLM to parse results."""
from __future__ import annotations
from typing import Dict, List
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.chains import LLMChain
from langchain.chains.base import Chain
from langchain.requests import RequestsWrapper
DEFAULT_HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" # noqa: E501
}
class LLMRequestsChain(Chain, BaseModel):
"""Chain that hits a URL and then uses an LLM to parse results."""
llm_chain: LLMChain
requests_wrapper: RequestsWrapper = Field(default_factory=RequestsWrapper)
text_length: int = 8000
requests_key: str = "requests_result" #: :meta private:
input_key: str = "url" #: :meta private:
output_key: str = "output" #: :meta private:
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
arbitrary_types_allowed = True
@property
def input_keys(self) -> List[str]:
"""Will be whatever keys the prompt expects.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Will always return text key.
:meta private:
"""
return [self.output_key]
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
try:
from bs4 import BeautifulSoup # noqa: F401
except ImportError:
raise ValueError(
"Could not import bs4 python package. "
"Please it install it with `pip install bs4`."
)
return values
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
from bs4 import BeautifulSoup
# Other keys are assumed to be needed for LLM prediction
other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
url = inputs[self.input_key]
res = self.requests_wrapper.run(url)
# extract the text from the html
soup = BeautifulSoup(res, "html.parser")
other_keys[self.requests_key] = soup.get_text()[: self.text_length]
result = self.llm_chain.predict(**other_keys)
return {self.output_key: result}

View File

@@ -10,9 +10,7 @@ from typing import Dict, List
from pydantic import BaseModel, Extra
from langchain.chains.base import Chain
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.combine_documents import CombineDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.llms.base import LLM
@@ -23,8 +21,10 @@ from langchain.text_splitter import TextSplitter
class MapReduceChain(Chain, BaseModel):
"""Map-reduce chain."""
combine_documents_chain: BaseCombineDocumentsChain
"""Chain to use to combine documents."""
map_llm: LLMChain
"""LLM wrapper to use for the map step."""
reduce_llm: LLMChain
"""LLM wrapper to use for the reduce step."""
text_splitter: TextSplitter
"""Text splitter to use."""
input_key: str = "input_text" #: :meta private:
@@ -36,13 +36,7 @@ class MapReduceChain(Chain, BaseModel):
) -> MapReduceChain:
"""Construct a map-reduce chain that uses the chain for map and reduce."""
llm_chain = LLMChain(llm=llm, prompt=prompt)
reduce_chain = StuffDocumentsChain(llm_chain=llm_chain)
combine_documents_chain = MapReduceDocumentsChain(
llm_chain=llm_chain, combine_document_chain=reduce_chain
)
return cls(
combine_documents_chain=combine_documents_chain, text_splitter=text_splitter
)
return cls(map_llm=llm_chain, reduce_llm=llm_chain, text_splitter=text_splitter)
class Config:
"""Configuration for this pydantic object."""
@@ -68,7 +62,16 @@ class MapReduceChain(Chain, BaseModel):
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
# Split the larger text into smaller chunks.
texts = self.text_splitter.split_text(inputs[self.input_key])
docs = [Document(page_content=text) for text in texts]
outputs = self.combine_documents_chain.combine_docs(docs)
return {self.output_key: outputs}
docs = self.text_splitter.split_text(inputs[self.input_key])
# Now that we have the chunks, we send them to the LLM and track results.
# This is the "map" part.
input_list = [{self.map_llm.prompt.input_variables[0]: d} for d in docs]
summary_results = self.map_llm.apply(input_list)
summaries = [res[self.map_llm.output_key] for res in summary_results]
summary_docs = [Document(page_content=text) for text in summaries]
# We then need to combine these individual parts into one.
# This is the reduce part.
reduce_chain = CombineDocumentsChain(llm_chain=self.reduce_llm)
outputs = reduce_chain({reduce_chain.input_key: summary_docs})
return {self.output_key: outputs[self.output_key]}

View File

@@ -1,82 +0,0 @@
"""Pass input through a moderation endpoint."""
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, root_validator
from langchain.chains.base import Chain
from langchain.utils import get_from_dict_or_env
class OpenAIModerationChain(Chain, BaseModel):
"""Pass input through a moderation endpoint.
To use, you should have the ``openai`` python package installed, and the
environment variable ``OPENAI_API_KEY`` set with your API key.
Any parameters that are valid to be passed to the openai.create call can be passed
in, even if not explicitly saved on this class.
Example:
.. code-block:: python
from langchain.chains import OpenAIModerationChain
moderation = OpenAIModerationChain()
"""
client: Any #: :meta private:
model_name: Optional[str] = None
"""Moderation model name to use."""
error: bool = False
"""Whether or not to error if bad content was found."""
input_key: str = "input" #: :meta private:
output_key: str = "output" #: :meta private:
openai_api_key: Optional[str] = None
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
openai_api_key = get_from_dict_or_env(
values, "openai_api_key", "OPENAI_API_KEY"
)
try:
import openai
openai.api_key = openai_api_key
values["client"] = openai.Moderation
except ImportError:
raise ValueError(
"Could not import openai python package. "
"Please it install it with `pip install openai`."
)
return values
@property
def input_keys(self) -> List[str]:
"""Expect input key.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return output key.
:meta private:
"""
return [self.output_key]
def _moderate(self, text: str, results: dict) -> str:
if results["flagged"]:
error_str = "Text was found that violates OpenAI's content policy."
if self.error:
raise ValueError(error_str)
else:
return error_str
return text
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
text = inputs[self.input_key]
results = self.client.create(text)
output = self._moderate(text, results["results"][0])
return {self.output_key: output}

View File

@@ -1,119 +1 @@
"""Load question answering with sources chains."""
from typing import Any, Mapping, Optional, Protocol
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.refine import RefineDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.chains.qa_with_sources import (
map_reduce_prompt,
refine_prompts,
stuff_prompt,
)
from langchain.llms.base import LLM
from langchain.prompts.base import BasePromptTemplate
class LoadingCallable(Protocol):
"""Interface for loading the combine documents chain."""
def __call__(self, llm: LLM, **kwargs: Any) -> BaseCombineDocumentsChain:
"""Callable to load the combine documents chain."""
def _load_stuff_chain(
llm: LLM,
prompt: BasePromptTemplate = stuff_prompt.PROMPT,
document_variable_name: str = "summaries",
**kwargs: Any,
) -> StuffDocumentsChain:
llm_chain = LLMChain(llm=llm, prompt=prompt)
return StuffDocumentsChain(
llm_chain=llm_chain,
document_variable_name=document_variable_name,
document_prompt=stuff_prompt.EXAMPLE_PROMPT,
**kwargs,
)
def _load_map_reduce_chain(
llm: LLM,
question_prompt: BasePromptTemplate = map_reduce_prompt.QUESTION_PROMPT,
combine_prompt: BasePromptTemplate = map_reduce_prompt.COMBINE_PROMPT,
document_prompt: BasePromptTemplate = map_reduce_prompt.EXAMPLE_PROMPT,
combine_document_variable_name: str = "summaries",
map_reduce_document_variable_name: str = "context",
collapse_prompt: Optional[BasePromptTemplate] = None,
**kwargs: Any,
) -> MapReduceDocumentsChain:
map_chain = LLMChain(llm=llm, prompt=question_prompt)
reduce_chain = LLMChain(llm=llm, prompt=combine_prompt)
combine_document_chain = StuffDocumentsChain(
llm_chain=reduce_chain,
document_variable_name=combine_document_variable_name,
document_prompt=document_prompt,
)
if collapse_prompt is None:
collapse_chain = None
else:
collapse_chain = StuffDocumentsChain(
llm_chain=LLMChain(llm=llm, prompt=collapse_prompt),
document_variable_name=combine_document_variable_name,
document_prompt=document_prompt,
)
return MapReduceDocumentsChain(
llm_chain=map_chain,
combine_document_chain=combine_document_chain,
document_variable_name=map_reduce_document_variable_name,
collapse_document_chain=collapse_chain,
**kwargs,
)
def _load_refine_chain(
llm: LLM,
question_prompt: BasePromptTemplate = refine_prompts.DEFAULT_TEXT_QA_PROMPT,
refine_prompt: BasePromptTemplate = refine_prompts.DEFAULT_REFINE_PROMPT,
document_prompt: BasePromptTemplate = refine_prompts.EXAMPLE_PROMPT,
document_variable_name: str = "context_str",
initial_response_name: str = "existing_answer",
**kwargs: Any,
) -> RefineDocumentsChain:
initial_chain = LLMChain(llm=llm, prompt=question_prompt)
refine_chain = LLMChain(llm=llm, prompt=refine_prompt)
return RefineDocumentsChain(
initial_llm_chain=initial_chain,
refine_llm_chain=refine_chain,
document_variable_name=document_variable_name,
initial_response_name=initial_response_name,
document_prompt=document_prompt,
**kwargs,
)
def load_qa_with_sources_chain(
llm: LLM, chain_type: str = "stuff", **kwargs: Any
) -> BaseCombineDocumentsChain:
"""Load question answering with sources chain.
Args:
llm: Language Model to use in the chain.
chain_type: Type of document combining chain to use. Should be one of "stuff",
"map_reduce", and "refine".
Returns:
A chain to use for question answering with sources.
"""
loader_mapping: Mapping[str, LoadingCallable] = {
"stuff": _load_stuff_chain,
"map_reduce": _load_map_reduce_chain,
"refine": _load_refine_chain,
}
if chain_type not in loader_mapping:
raise ValueError(
f"Got unsupported chain type: {chain_type}. "
f"Should be one of {loader_mapping.keys()}"
)
_func: LoadingCallable = loader_mapping[chain_type]
return _func(llm, **kwargs)
"""Question answering with sources over documents."""

View File

@@ -8,11 +8,9 @@ from typing import Any, Dict, List
from pydantic import BaseModel, Extra, root_validator
from langchain.chains.base import Chain
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.combine_documents import CombineDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.chains.qa_with_sources.map_reduce_prompt import (
from langchain.chains.qa_with_sources.prompt import (
COMBINE_PROMPT,
EXAMPLE_PROMPT,
QUESTION_PROMPT,
@@ -25,8 +23,12 @@ from langchain.prompts.base import BasePromptTemplate
class BaseQAWithSourcesChain(Chain, BaseModel, ABC):
"""Question answering with sources over documents."""
combine_document_chain: BaseCombineDocumentsChain
llm_question_chain: LLMChain
"""LLM wrapper to use for asking questions to each document."""
combine_document_chain: CombineDocumentsChain
"""Chain to use to combine documents."""
doc_source_key: str = "source"
"""Key in document.metadata to use as source information"""
question_key: str = "question" #: :meta private:
input_docs_key: str = "docs" #: :meta private:
answer_key: str = "answer" #: :meta private:
@@ -36,7 +38,7 @@ class BaseQAWithSourcesChain(Chain, BaseModel, ABC):
def from_llm(
cls,
llm: LLM,
document_prompt: BasePromptTemplate = EXAMPLE_PROMPT,
combine_document_prompt: BasePromptTemplate = EXAMPLE_PROMPT,
question_prompt: BasePromptTemplate = QUESTION_PROMPT,
combine_prompt: BasePromptTemplate = COMBINE_PROMPT,
**kwargs: Any,
@@ -44,17 +46,13 @@ class BaseQAWithSourcesChain(Chain, BaseModel, ABC):
"""Construct the chain from an LLM."""
llm_question_chain = LLMChain(llm=llm, prompt=question_prompt)
llm_combine_chain = LLMChain(llm=llm, prompt=combine_prompt)
combine_results_chain = StuffDocumentsChain(
combine_document_chain = CombineDocumentsChain(
llm_chain=llm_combine_chain,
document_prompt=document_prompt,
document_prompt=combine_document_prompt,
document_variable_name="summaries",
)
combine_document_chain = MapReduceDocumentsChain(
llm_chain=llm_question_chain,
combine_document_chain=combine_results_chain,
document_variable_name="context",
)
return cls(
llm_question_chain=llm_question_chain,
combine_document_chain=combine_document_chain,
**kwargs,
)
@@ -84,7 +82,7 @@ class BaseQAWithSourcesChain(Chain, BaseModel, ABC):
@root_validator(pre=True)
def validate_question_chain(cls, values: Dict) -> Dict:
"""Validate question chain."""
llm_question_chain = values["combine_document_chain"].llm_chain
llm_question_chain = values["llm_question_chain"]
if len(llm_question_chain.input_keys) != 2:
raise ValueError(
f"The llm_question_chain should have two inputs: a content key "
@@ -106,7 +104,23 @@ class BaseQAWithSourcesChain(Chain, BaseModel, ABC):
def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
docs = self._get_docs(inputs)
answer = self.combine_document_chain.combine_docs(docs, **inputs)
query = inputs[self.question_key]
content_key, query_key = self.llm_question_chain.input_keys
results = self.llm_question_chain.apply(
[{content_key: d.page_content, query_key: query} for d in docs]
)
question_result_key = self.llm_question_chain.output_key
result_docs = [
Document(page_content=r[question_result_key], metadata=docs[i].metadata)
for i, r in enumerate(results)
]
answer_dict = self.combine_document_chain(
{
self.combine_document_chain.input_key: result_docs,
self.question_key: query,
}
)
answer = answer_dict[self.combine_document_chain.output_key]
if "\nSOURCES: " in answer:
answer, sources = answer.split("\nSOURCES: ")
else:
@@ -128,4 +142,4 @@ class QAWithSourcesChain(BaseQAWithSourcesChain, BaseModel):
return [self.input_docs_key, self.question_key]
def _get_docs(self, inputs: Dict[str, Any]) -> List[Document]:
return inputs.pop(self.input_docs_key)
return inputs[self.input_docs_key]

View File

@@ -1,38 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
DEFAULT_REFINE_PROMPT_TMPL = (
"The original question is as follows: {question}\n"
"We have provided an existing answer, including sources: {existing_answer}\n"
"We have the opportunity to refine the existing answer"
"(only if needed) with some more context below.\n"
"------------\n"
"{context_str}\n"
"------------\n"
"Given the new context, refine the original answer to better "
"answer the question. "
"If you do update it, please update the sources as well. "
"If the context isn't useful, return the original answer."
)
DEFAULT_REFINE_PROMPT = PromptTemplate(
input_variables=["question", "existing_answer", "context_str"],
template=DEFAULT_REFINE_PROMPT_TMPL,
)
DEFAULT_TEXT_QA_PROMPT_TMPL = (
"Context information is below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given the context information and not prior knowledge, "
"answer the question: {question}\n"
)
DEFAULT_TEXT_QA_PROMPT = PromptTemplate(
input_variables=["context_str", "question"], template=DEFAULT_TEXT_QA_PROMPT_TMPL
)
EXAMPLE_PROMPT = PromptTemplate(
template="Content: {page_content}\nSource: {source}",
input_variables=["page_content", "source"],
)

View File

@@ -1,44 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an injunction or other relief to protect its Intellectual Property Rights.
Source: 28-pl
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other) right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Source: 30-pl
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as defined in Clause 8.5) or that such a violation is reasonably likely to occur,
Source: 4-pl
=========
FINAL ANSWER: This Agreement is governed by English law.
SOURCES: 28-pl
QUESTION: What did the president say about Michael Jackson?
=========
Content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
Source: 0-pl
Content: And we wont stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLets use this moment to reset. Lets stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease. \n\nLets stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans. \n\nWe cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Source: 24-pl
Content: And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards. \n\nTo all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n\nThese steps will help blunt gas prices here at home. And I know the news about whats happening can seem alarming. \n\nBut I want you to know that we are going to be okay.
Source: 5-pl
Content: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIts based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more. \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimers, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.
Source: 34-pl
=========
FINAL ANSWER: The president did not mention Michael Jackson.
SOURCES:
QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])
EXAMPLE_PROMPT = PromptTemplate(
template="Content: {page_content}\nSource: {source}",
input_variables=["page_content", "source"],
)

View File

@@ -1,111 +0,0 @@
"""Load question answering chains."""
from typing import Any, Mapping, Optional, Protocol
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.refine import RefineDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.chains.question_answering import (
map_reduce_prompt,
refine_prompts,
stuff_prompt,
)
from langchain.llms.base import LLM
from langchain.prompts.base import BasePromptTemplate
class LoadingCallable(Protocol):
"""Interface for loading the combine documents chain."""
def __call__(self, llm: LLM, **kwargs: Any) -> BaseCombineDocumentsChain:
"""Callable to load the combine documents chain."""
def _load_stuff_chain(
llm: LLM,
prompt: BasePromptTemplate = stuff_prompt.PROMPT,
document_variable_name: str = "context",
**kwargs: Any,
) -> StuffDocumentsChain:
llm_chain = LLMChain(llm=llm, prompt=prompt)
# TODO: document prompt
return StuffDocumentsChain(
llm_chain=llm_chain, document_variable_name=document_variable_name, **kwargs
)
def _load_map_reduce_chain(
llm: LLM,
question_prompt: BasePromptTemplate = map_reduce_prompt.QUESTION_PROMPT,
combine_prompt: BasePromptTemplate = map_reduce_prompt.COMBINE_PROMPT,
combine_document_variable_name: str = "summaries",
map_reduce_document_variable_name: str = "context",
collapse_prompt: Optional[BasePromptTemplate] = None,
**kwargs: Any,
) -> MapReduceDocumentsChain:
map_chain = LLMChain(llm=llm, prompt=question_prompt)
reduce_chain = LLMChain(llm=llm, prompt=combine_prompt)
# TODO: document prompt
combine_document_chain = StuffDocumentsChain(
llm_chain=reduce_chain, document_variable_name=combine_document_variable_name
)
if collapse_prompt is None:
collapse_chain = None
else:
collapse_chain = StuffDocumentsChain(
llm_chain=LLMChain(llm=llm, prompt=collapse_prompt),
document_variable_name=combine_document_variable_name,
)
return MapReduceDocumentsChain(
llm_chain=map_chain,
combine_document_chain=combine_document_chain,
document_variable_name=map_reduce_document_variable_name,
collapse_document_chain=collapse_chain,
**kwargs,
)
def _load_refine_chain(
llm: LLM,
question_prompt: BasePromptTemplate = refine_prompts.DEFAULT_TEXT_QA_PROMPT,
refine_prompt: BasePromptTemplate = refine_prompts.DEFAULT_REFINE_PROMPT,
document_variable_name: str = "context_str",
initial_response_name: str = "existing_answer",
**kwargs: Any,
) -> RefineDocumentsChain:
initial_chain = LLMChain(llm=llm, prompt=question_prompt)
refine_chain = LLMChain(llm=llm, prompt=refine_prompt)
return RefineDocumentsChain(
initial_llm_chain=initial_chain,
refine_llm_chain=refine_chain,
document_variable_name=document_variable_name,
initial_response_name=initial_response_name,
**kwargs,
)
def load_qa_chain(
llm: LLM, chain_type: str = "stuff", **kwargs: Any
) -> BaseCombineDocumentsChain:
"""Load question answering chain.
Args:
llm: Language Model to use in the chain.
chain_type: Type of document combining chain to use. Should be one of "stuff",
"map_reduce", and "refine".
Returns:
A chain to use for question answering.
"""
loader_mapping: Mapping[str, LoadingCallable] = {
"stuff": _load_stuff_chain,
"map_reduce": _load_map_reduce_chain,
"refine": _load_refine_chain,
}
if chain_type not in loader_mapping:
raise ValueError(
f"Got unsupported chain type: {chain_type}. "
f"Should be one of {loader_mapping.keys()}"
)
return loader_mapping[chain_type](llm, **kwargs)

View File

@@ -1,45 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
question_prompt_template = """Use the following portion of a long document to see if any of the text is relevant to answer the question.
Return any relevant text verbatim.
{context}
Question: {question}
Relevant text, if any:"""
QUESTION_PROMPT = PromptTemplate(
template=question_prompt_template, input_variables=["context", "question"]
)
combine_prompt_template = """Given the following extracted parts of a long document and a question, create a final answer.
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an injunction or other relief to protect its Intellectual Property Rights.
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other) right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as defined in Clause 8.5) or that such a violation is reasonably likely to occur,
=========
FINAL ANSWER: This Agreement is governed by English law.
QUESTION: What did the president say about Michael Jackson?
=========
Content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
Content: And we wont stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLets use this moment to reset. Lets stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease. \n\nLets stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans. \n\nWe cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Content: And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards. \n\nTo all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n\nThese steps will help blunt gas prices here at home. And I know the news about whats happening can seem alarming. \n\nBut I want you to know that we are going to be okay.
Content: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIts based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more. \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimers, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.
=========
FINAL ANSWER: The president did not mention Michael Jackson.
QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""
COMBINE_PROMPT = PromptTemplate(
template=combine_prompt_template, input_variables=["summaries", "question"]
)

View File

@@ -1,32 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
DEFAULT_REFINE_PROMPT_TMPL = (
"The original question is as follows: {question}\n"
"We have provided an existing answer: {existing_answer}\n"
"We have the opportunity to refine the existing answer"
"(only if needed) with some more context below.\n"
"------------\n"
"{context_str}\n"
"------------\n"
"Given the new context, refine the original answer to better "
"answer the question. "
"If the context isn't useful, return the original answer."
)
DEFAULT_REFINE_PROMPT = PromptTemplate(
input_variables=["question", "existing_answer", "context_str"],
template=DEFAULT_REFINE_PROMPT_TMPL,
)
DEFAULT_TEXT_QA_PROMPT_TMPL = (
"Context information is below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given the context information and not prior knowledge, "
"answer the question: {question}\n"
)
DEFAULT_TEXT_QA_PROMPT = PromptTemplate(
input_variables=["context_str", "question"], template=DEFAULT_TEXT_QA_PROMPT_TMPL
)

View File

@@ -1,12 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:"""
PROMPT = PromptTemplate(
template=prompt_template, input_variables=["context", "question"]
)

View File

@@ -1,107 +0,0 @@
"""Load summarizing chains."""
from typing import Any, Mapping, Optional, Protocol
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.refine import RefineDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.chains.summarize import map_reduce_prompt, refine_prompts, stuff_prompt
from langchain.llms.base import LLM
from langchain.prompts.base import BasePromptTemplate
class LoadingCallable(Protocol):
"""Interface for loading the combine documents chain."""
def __call__(self, llm: LLM, **kwargs: Any) -> BaseCombineDocumentsChain:
"""Callable to load the combine documents chain."""
def _load_stuff_chain(
llm: LLM,
prompt: BasePromptTemplate = stuff_prompt.PROMPT,
document_variable_name: str = "text",
**kwargs: Any,
) -> StuffDocumentsChain:
llm_chain = LLMChain(llm=llm, prompt=prompt)
# TODO: document prompt
return StuffDocumentsChain(
llm_chain=llm_chain, document_variable_name=document_variable_name, **kwargs
)
def _load_map_reduce_chain(
llm: LLM,
map_prompt: BasePromptTemplate = map_reduce_prompt.PROMPT,
combine_prompt: BasePromptTemplate = map_reduce_prompt.PROMPT,
combine_document_variable_name: str = "text",
map_reduce_document_variable_name: str = "text",
collapse_prompt: Optional[BasePromptTemplate] = None,
**kwargs: Any,
) -> MapReduceDocumentsChain:
map_chain = LLMChain(llm=llm, prompt=map_prompt)
reduce_chain = LLMChain(llm=llm, prompt=combine_prompt)
# TODO: document prompt
combine_document_chain = StuffDocumentsChain(
llm_chain=reduce_chain, document_variable_name=combine_document_variable_name
)
if collapse_prompt is None:
collapse_chain = None
else:
collapse_chain = StuffDocumentsChain(
llm_chain=LLMChain(llm=llm, prompt=collapse_prompt),
document_variable_name=combine_document_variable_name,
)
return MapReduceDocumentsChain(
llm_chain=map_chain,
combine_document_chain=combine_document_chain,
document_variable_name=map_reduce_document_variable_name,
collapse_document_chain=collapse_chain,
**kwargs,
)
def _load_refine_chain(
llm: LLM,
question_prompt: BasePromptTemplate = refine_prompts.PROMPT,
refine_prompt: BasePromptTemplate = refine_prompts.REFINE_PROMPT,
document_variable_name: str = "text",
initial_response_name: str = "existing_answer",
**kwargs: Any,
) -> RefineDocumentsChain:
initial_chain = LLMChain(llm=llm, prompt=question_prompt)
refine_chain = LLMChain(llm=llm, prompt=refine_prompt)
return RefineDocumentsChain(
initial_llm_chain=initial_chain,
refine_llm_chain=refine_chain,
document_variable_name=document_variable_name,
initial_response_name=initial_response_name,
**kwargs,
)
def load_summarize_chain(
llm: LLM, chain_type: str = "stuff", **kwargs: Any
) -> BaseCombineDocumentsChain:
"""Load summarizing chain.
Args:
llm: Language Model to use in the chain.
chain_type: Type of document combining chain to use. Should be one of "stuff",
"map_reduce", and "refine".
Returns:
A chain to use for summarizing.
"""
loader_mapping: Mapping[str, LoadingCallable] = {
"stuff": _load_stuff_chain,
"map_reduce": _load_map_reduce_chain,
"refine": _load_refine_chain,
}
if chain_type not in loader_mapping:
raise ValueError(
f"Got unsupported chain type: {chain_type}. "
f"Should be one of {loader_mapping.keys()}"
)
return loader_mapping[chain_type](llm, **kwargs)

View File

@@ -1,11 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

View File

@@ -1,28 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
REFINE_PROMPT_TMPL = (
"Your job is to produce a final summary\n"
"We have provided an existing summary up to a certain point: {existing_answer}\n"
"We have the opportunity to refine the existing summary"
"(only if needed) with some more context below.\n"
"------------\n"
"{text}\n"
"------------\n"
"Given the new context, refine the original summary"
"If the context isn't useful, return the original summary."
)
REFINE_PROMPT = PromptTemplate(
input_variables=["existing_answer", "text"],
template=REFINE_PROMPT_TMPL,
)
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

View File

@@ -1,11 +0,0 @@
# flake8: noqa
from langchain.prompts import PromptTemplate
prompt_template = """Write a concise summary of the following:
{text}
CONCISE SUMMARY:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])

View File

@@ -1,41 +0,0 @@
"""Chain that runs an arbitrary python function."""
from typing import Callable, Dict, List
from pydantic import BaseModel
from langchain.chains.base import Chain
class TransformChain(Chain, BaseModel):
"""Chain transform chain output.
Example:
.. code-block:: python
from langchain import TransformChain
transform_chain = TransformChain(input_variables=["text"],
output_variables["entities"], transform=func())
"""
input_variables: List[str]
output_variables: List[str]
transform: Callable[[Dict[str, str]], Dict[str, str]]
@property
def input_keys(self) -> List[str]:
"""Expect input keys.
:meta private:
"""
return self.input_variables
@property
def output_keys(self) -> List[str]:
"""Return output keys.
:meta private:
"""
return self.output_variables
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
return self.transform(inputs)

View File

@@ -1,13 +1,9 @@
"""Chain for question-answering against a vector database."""
from __future__ import annotations
from typing import Dict, List
from typing import Any, Dict, List
from pydantic import BaseModel, Extra, root_validator
from pydantic import BaseModel, Extra
from langchain.chains.base import Chain
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.chains.vector_db_qa.prompt import PROMPT
from langchain.llms.base import LLM
@@ -28,12 +24,14 @@ class VectorDBQA(Chain, BaseModel):
"""
llm: LLM
"""LLM wrapper to use."""
vectorstore: VectorStore
"""Vector Database to connect to."""
k: int = 4
"""Number of documents to query for."""
combine_documents_chain: BaseCombineDocumentsChain
"""Chain to use to combine the documents."""
prompt: PromptTemplate = PROMPT
"""Prompt to use when questioning the documents."""
input_key: str = "query" #: :meta private:
output_key: str = "result" #: :meta private:
@@ -59,47 +57,13 @@ class VectorDBQA(Chain, BaseModel):
"""
return [self.output_key]
# TODO: deprecate this
@root_validator(pre=True)
def load_combine_documents_chain(cls, values: Dict) -> Dict:
"""Validate question chain."""
if "combine_documents_chain" not in values:
if "llm" not in values:
raise ValueError(
"If `combine_documents_chain` not provided, `llm` should be."
)
prompt = values.pop("prompt", PROMPT)
llm = values.pop("llm")
llm_chain = LLMChain(llm=llm, prompt=prompt)
document_prompt = PromptTemplate(
input_variables=["page_content"], template="Context:\n{page_content}"
)
combine_documents_chain = StuffDocumentsChain(
llm_chain=llm_chain,
document_variable_name="context",
document_prompt=document_prompt,
)
values["combine_documents_chain"] = combine_documents_chain
return values
@classmethod
def from_llm(
cls, llm: LLM, prompt: PromptTemplate = PROMPT, **kwargs: Any
) -> VectorDBQA:
"""Initialize from LLM."""
llm_chain = LLMChain(llm=llm, prompt=prompt)
document_prompt = PromptTemplate(
input_variables=["page_content"], template="Context:\n{page_content}"
)
combine_documents_chain = StuffDocumentsChain(
llm_chain=llm_chain,
document_variable_name="context",
document_prompt=document_prompt,
)
return cls(combine_documents_chain=combine_documents_chain, **kwargs)
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:
question = inputs[self.input_key]
llm_chain = LLMChain(llm=self.llm, prompt=self.prompt)
docs = self.vectorstore.similarity_search(question, k=self.k)
answer = self.combine_documents_chain.combine_docs(docs, question=question)
contexts = []
for j, doc in enumerate(docs):
contexts.append(f"Context {j}:\n{doc.page_content}")
# TODO: handle cases where this context is too long.
answer = llm_chain.predict(question=question, context="\n\n".join(contexts))
return {self.output_key: answer}

View File

@@ -22,8 +22,9 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
"""
client: Any #: :meta private:
document_model_name: str = "text-embedding-ada-002"
query_model_name: str = "text-embedding-ada-002"
model_name: str = "babbage"
"""Model name to use."""
openai_api_key: Optional[str] = None
class Config:
@@ -31,26 +32,6 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
extra = Extra.forbid
# TODO: deprecate this
@root_validator(pre=True)
def get_model_names(cls, values: Dict) -> Dict:
"""Get model names from just old model name."""
if "model_name" in values:
if "document_model_name" in values:
raise ValueError(
"Both `model_name` and `document_model_name` were provided, "
"but only one should be."
)
if "query_model_name" in values:
raise ValueError(
"Both `model_name` and `query_model_name` were provided, "
"but only one should be."
)
model_name = values.pop("model_name")
values["document_model_name"] = f"text-search-{model_name}-doc-001"
values["query_model_name"] = f"text-search-{model_name}-query-001"
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
@@ -85,7 +66,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
List of embeddings, one for each text.
"""
responses = [
self._embedding_func(text, engine=self.document_model_name)
self._embedding_func(text, engine=f"text-search-{self.model_name}-doc-001")
for text in texts
]
return responses
@@ -99,5 +80,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
Returns:
Embeddings for the text.
"""
embedding = self._embedding_func(text, engine=self.query_model_name)
embedding = self._embedding_func(
text, engine=f"text-search-{self.model_name}-query-001"
)
return embedding

View File

@@ -1,28 +1,7 @@
"""Wrappers on top of large language models APIs."""
from typing import Dict, Type
from langchain.llms.ai21 import AI21
from langchain.llms.base import LLM
from langchain.llms.cohere import Cohere
from langchain.llms.huggingface_hub import HuggingFaceHub
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.llms.nlpcloud import NLPCloud
from langchain.llms.openai import OpenAI
__all__ = [
"Cohere",
"NLPCloud",
"OpenAI",
"HuggingFaceHub",
"HuggingFacePipeline",
"AI21",
]
type_to_cls_dict: Dict[str, Type[LLM]] = {
"ai21": AI21,
"cohere": Cohere,
"huggingface_hub": HuggingFaceHub,
"nlpcloud": NLPCloud,
"openai": OpenAI,
"huggingface_pipeline": HuggingFacePipeline,
}
__all__ = ["Cohere", "NLPCloud", "OpenAI", "HuggingFaceHub"]

View File

@@ -19,7 +19,7 @@ class AI21PenaltyData(BaseModel):
applyToEmojis: bool = True
class AI21(LLM, BaseModel):
class AI21(BaseModel, LLM):
"""Wrapper around AI21 large language models.
To use, you should have the environment variable ``AI21_API_KEY``
@@ -96,12 +96,7 @@ class AI21(LLM, BaseModel):
"""Get the identifying parameters."""
return {**{"model": self.model}, **self._default_params}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "ai21"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to AI21's complete endpoint.
Args:

View File

@@ -1,115 +1,14 @@
"""Base interface for large language models to expose."""
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Mapping, NamedTuple, Optional, Union
import yaml
from pydantic import BaseModel, Extra
import langchain
from langchain.schema import Generation
from typing import Any, List, Mapping, Optional
class LLMResult(NamedTuple):
"""Class that contains all relevant information for an LLM Result."""
generations: List[List[Generation]]
"""List of the things generated. This is List[List[]] because
each input could have multiple generations."""
llm_output: Optional[dict] = None
"""For arbitrary LLM provider specific output."""
class LLM(BaseModel, ABC):
class LLM(ABC):
"""LLM wrapper should take in a prompt and return a string."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def _generate(
self, prompts: List[str], stop: Optional[List[str]] = None
) -> LLMResult:
"""Run the LLM on the given prompt and input."""
# TODO: add caching here.
generations = []
for prompt in prompts:
text = self(prompt, stop=stop)
generations.append([Generation(text=text)])
return LLMResult(generations=generations)
def generate(
self, prompts: List[str], stop: Optional[List[str]] = None
) -> LLMResult:
"""Run the LLM on the given prompt and input."""
if langchain.llm_cache is None:
return self._generate(prompts, stop=stop)
params = self._llm_dict()
params["stop"] = stop
llm_string = str(sorted([(k, v) for k, v in params.items()]))
missing_prompts = []
missing_prompt_idxs = []
existing_prompts = {}
for i, prompt in enumerate(prompts):
cache_val = langchain.llm_cache.lookup(prompt, llm_string)
if isinstance(cache_val, list):
existing_prompts[i] = cache_val
else:
missing_prompts.append(prompt)
missing_prompt_idxs.append(i)
new_results = self._generate(missing_prompts, stop=stop)
for i, result in enumerate(new_results.generations):
existing_prompts[i] = result
prompt = prompts[i]
langchain.llm_cache.update(prompt, llm_string, result)
generations = [existing_prompts[i] for i in range(len(prompts))]
return LLMResult(generations=generations, llm_output=new_results.llm_output)
def get_num_tokens(self, text: str) -> int:
"""Get the number of tokens present in the text."""
# TODO: this method may not be exact.
# TODO: this method may differ based on model (eg codex).
try:
from transformers import GPT2TokenizerFast
except ImportError:
raise ValueError(
"Could not import transformers python package. "
"This is needed in order to calculate get_num_tokens. "
"Please it install it with `pip install transformers`."
)
# create a GPT-3 tokenizer instance
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# tokenize the text using the GPT-3 tokenizer
tokenized_text = tokenizer.tokenize(text)
# calculate the number of tokens in the tokenized text
return len(tokenized_text)
@abstractmethod
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Run the LLM on the given prompt and input."""
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Check Cache and run the LLM on the given prompt and input."""
if langchain.llm_cache is None:
return self._call(prompt, stop=stop)
params = self._llm_dict()
params["stop"] = stop
llm_string = str(sorted([(k, v) for k, v in params.items()]))
if langchain.cache is not None:
cache_val = langchain.llm_cache.lookup(prompt, llm_string)
if cache_val is not None:
if isinstance(cache_val, str):
return cache_val
else:
return cache_val[0].text
return_val = self._call(prompt, stop=stop)
if langchain.cache is not None:
langchain.llm_cache.update(prompt, llm_string, return_val)
return return_val
"""Run the LLM on the given prompt and input."""
@property
def _identifying_params(self) -> Mapping[str, Any]:
@@ -120,46 +19,3 @@ class LLM(BaseModel, ABC):
"""Get a string representation of the object for printing."""
cls_name = f"\033[1m{self.__class__.__name__}\033[0m"
return f"{cls_name}\nParams: {self._identifying_params}"
@property
@abstractmethod
def _llm_type(self) -> str:
"""Return type of llm."""
def _llm_dict(self) -> Dict:
"""Return a dictionary of the prompt."""
starter_dict = dict(self._identifying_params)
starter_dict["_type"] = self._llm_type
return starter_dict
def save(self, file_path: Union[Path, str]) -> None:
"""Save the LLM.
Args:
file_path: Path to file to save the LLM to.
Example:
.. code-block:: python
llm.save(file_path="path/llm.yaml")
"""
# Convert file to Path object.
if isinstance(file_path, str):
save_path = Path(file_path)
else:
save_path = file_path
directory_path = save_path.parent
directory_path.mkdir(parents=True, exist_ok=True)
# Fetch dictionary to save
prompt_dict = self._llm_dict()
if save_path.suffix == ".json":
with open(file_path, "w") as f:
json.dump(prompt_dict, f, indent=4)
elif save_path.suffix == ".yaml":
with open(file_path, "w") as f:
yaml.dump(prompt_dict, f, default_flow_style=False)
else:
raise ValueError(f"{save_path} must be json or yaml")

View File

@@ -85,12 +85,7 @@ class Cohere(LLM, BaseModel):
"""Get the identifying parameters."""
return {**{"model": self.model}, **self._default_params}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "cohere"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to Cohere's generate endpoint.
Args:

View File

@@ -74,17 +74,9 @@ class HuggingFaceHub(LLM, BaseModel):
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
_model_kwargs = self.model_kwargs or {}
return {
**{"repo_id": self.repo_id, "task": self.task},
**{"model_kwargs": _model_kwargs},
}
return {**{"repo_id": self.repo_id}, **_model_kwargs}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "huggingface_hub"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to HuggingFace Hub's inference endpoint.
Args:

View File

@@ -1,118 +0,0 @@
"""Wrapper around HuggingFace Pipeline APIs."""
from typing import Any, List, Mapping, Optional
from pydantic import BaseModel, Extra
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
DEFAULT_MODEL_ID = "gpt2"
DEFAULT_TASK = "text-generation"
VALID_TASKS = ("text2text-generation", "text-generation")
class HuggingFacePipeline(LLM, BaseModel):
"""Wrapper around HuggingFace Pipeline API.
To use, you should have the ``transformers`` python package installed.
Only supports `text-generation` and `text2text-generation` for now.
Example using from_model_id:
.. code-block:: python
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
hf = HuggingFacePipeline.from_model_id(
model_id="gpt2", task="text-generation"
)
Example passing pipeline in directly:
.. code-block:: python
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline(
"text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
)
hf = HuggingFacePipeline(pipeline=pipe
"""
pipeline: Any #: :meta private:
model_id: str = DEFAULT_MODEL_ID
"""Model name to use."""
model_kwargs: Optional[dict] = None
"""Key word arguments to pass to the model."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@classmethod
def from_model_id(
cls,
model_id: str,
task: str,
model_kwargs: Optional[dict] = None,
**kwargs: Any,
) -> LLM:
"""Construct the pipeline object from model_id and task."""
try:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline as hf_pipeline
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipeline = hf_pipeline(
task=task, model=model, tokenizer=tokenizer, **model_kwargs
)
if pipeline.task not in VALID_TASKS:
raise ValueError(
f"Got invalid task {pipeline.task}, "
f"currently only {VALID_TASKS} are supported"
)
return cls(
pipeline=pipeline,
model_id=model_id,
model_kwargs=model_kwargs,
**kwargs,
)
except ImportError:
raise ValueError(
"Could not import transformers python package. "
"Please it install it with `pip install transformers`."
)
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
**{"model_id": self.model_id},
**{"model_kwargs": self.model_kwargs},
}
@property
def _llm_type(self) -> str:
return "huggingface_pipeline"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
response = self.pipeline(text_inputs=prompt)
if self.pipeline.task == "text-generation":
# Text generation return includes the starter text.
text = response[0]["generated_text"][len(prompt) :]
elif self.pipeline.task == "text2text-generation":
text = response[0]["generated_text"]
else:
raise ValueError(
f"Got invalid task {self.pipeline.task}, "
f"currently only {VALID_TASKS} are supported"
)
if stop is not None:
# This is a bit hacky, but I can't figure out a better way to enforce
# stop tokens when making calls to huggingface_hub.
text = enforce_stop_tokens(text, stop)
return text

View File

@@ -1,42 +0,0 @@
"""Base interface for loading large language models apis."""
import json
from pathlib import Path
from typing import Union
import yaml
from langchain.llms import type_to_cls_dict
from langchain.llms.base import LLM
def load_llm_from_config(config: dict) -> LLM:
"""Load LLM from Config Dict."""
if "_type" not in config:
raise ValueError("Must specify an LLM Type in config")
config_type = config.pop("_type")
if config_type not in type_to_cls_dict:
raise ValueError(f"Loading {config_type} LLM not supported")
llm_cls = type_to_cls_dict[config_type]
return llm_cls(**config)
def load_llm(file: Union[str, Path]) -> LLM:
"""Load LLM from file."""
# Convert file to Path object.
if isinstance(file, str):
file_path = Path(file)
else:
file_path = file
# Load from either json or yaml.
if file_path.suffix == ".json":
with open(file_path) as f:
config = json.load(f)
elif file_path.suffix == ".yaml":
with open(file_path, "r") as f:
config = yaml.safe_load(f)
else:
raise ValueError("File type must be json or yaml")
# Load the LLM from the config now.
return load_llm_from_config(config)

View File

@@ -37,12 +37,7 @@ class ManifestWrapper(LLM, BaseModel):
kwargs = self.llm_kwargs or {}
return {**self.client.client.get_model_params(), **kwargs}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "manifest"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to LLM through Manifest."""
if stop is not None and len(stop) != 1:
raise NotImplementedError(

View File

@@ -106,12 +106,7 @@ class NLPCloud(LLM, BaseModel):
"""Get the identifying parameters."""
return {**{"model_name": self.model_name}, **self._default_params}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "nlpcloud"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to NLPCloud's create endpoint.
Args:

View File

@@ -1,11 +1,9 @@
"""Wrapper around OpenAI APIs."""
import sys
from typing import Any, Dict, Generator, List, Mapping, Optional
from typing import Any, Dict, List, Mapping, Optional
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.llms.base import LLM, LLMResult
from langchain.schema import Generation
from langchain.llms.base import LLM
from langchain.utils import get_from_dict_or_env
@@ -31,14 +29,12 @@ class OpenAI(LLM, BaseModel):
temperature: float = 0.7
"""What sampling temperature to use."""
max_tokens: int = 256
"""The maximum number of tokens to generate in the completion.
-1 returns as many tokens as possible given the prompt and
the models maximal context size."""
top_p: float = 1
"""The maximum number of tokens to generate in the completion."""
top_p: int = 1
"""Total probability mass of tokens to consider at each step."""
frequency_penalty: float = 0
frequency_penalty: int = 0
"""Penalizes repeated tokens according to frequency."""
presence_penalty: float = 0
presence_penalty: int = 0
"""Penalizes repeated tokens."""
n: int = 1
"""How many completions to generate for each prompt."""
@@ -47,8 +43,6 @@ class OpenAI(LLM, BaseModel):
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
openai_api_key: Optional[str] = None
batch_size: int = 20
"""Batch size to use when passing multiple documents to generate."""
class Config:
"""Configuration for this pydantic object."""
@@ -101,100 +95,12 @@ class OpenAI(LLM, BaseModel):
}
return {**normal_params, **self.model_kwargs}
def _generate(
self, prompts: List[str], stop: Optional[List[str]] = None
) -> LLMResult:
"""Call out to OpenAI's endpoint with k unique prompts.
Args:
prompts: The prompts to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The full LLM output.
Example:
.. code-block:: python
response = openai.generate(["Tell me a joke."])
"""
# TODO: write a unit test for this
params = self._default_params
if stop is not None:
if "stop" in params:
raise ValueError("`stop` found in both the input and default params.")
params["stop"] = stop
if params["max_tokens"] == -1:
if len(prompts) != 1:
raise ValueError(
"max_tokens set to -1 not supported for multiple inputs."
)
params["max_tokens"] = self.max_tokens_for_prompt(prompts[0])
sub_prompts = [
prompts[i : i + self.batch_size]
for i in range(0, len(prompts), self.batch_size)
]
choices = []
token_usage = {}
# Get the token usage from the response.
# Includes prompt, completion, and total tokens used.
_keys = ["completion_tokens", "prompt_tokens", "total_tokens"]
for _prompts in sub_prompts:
response = self.client.create(
model=self.model_name, prompt=_prompts, **params
)
choices.extend(response["choices"])
for _key in _keys:
if _key not in token_usage:
token_usage[_key] = response["usage"][_key]
else:
token_usage[_key] += response["usage"][_key]
generations = []
for i, prompt in enumerate(prompts):
sub_choices = choices[i * self.n : (i + 1) * self.n]
generations.append(
[Generation(text=choice["text"]) for choice in sub_choices]
)
return LLMResult(
generations=generations, llm_output={"token_usage": token_usage}
)
def stream(self, prompt: str) -> Generator:
"""Call OpenAI with streaming flag and return the resulting generator.
Args:
prompt: The prompts to pass into the model.
Returns:
A generator representing the stream of tokens from OpenAI.
Example:
.. code-block:: python
generator = openai.stream("Tell me a joke.")
for token in generator:
yield token
"""
params = self._default_params
if params["best_of"] != 1:
raise ValueError("OpenAI only supports best_of == 1 for streaming")
params["stream"] = True
generator = self.client.create(model=self.model_name, prompt=prompt, **params)
return generator
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {**{"model_name": self.model_name}, **self._default_params}
return {**{"model": self.model_name}, **self._default_params}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "openai"
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
"""Call out to OpenAI's create endpoint.
Args:
@@ -209,82 +115,10 @@ class OpenAI(LLM, BaseModel):
response = openai("Tell me a joke.")
"""
return self.generate([prompt], stop=stop).generations[0][0].text
def get_num_tokens(self, text: str) -> int:
"""Calculate num tokens with tiktoken package."""
# tiktoken NOT supported for Python 3.8 or below
if sys.version_info[1] <= 8:
return super().get_num_tokens(text)
try:
import tiktoken
except ImportError:
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please it install it with `pip install tiktoken`."
)
# create a GPT-3 encoder instance
enc = tiktoken.get_encoding("gpt2")
# encode the text using the GPT-3 encoder
tokenized_text = enc.encode(text)
# calculate the number of tokens in the encoded text
return len(tokenized_text)
def modelname_to_contextsize(self, modelname: str) -> int:
"""Calculate the maximum number of tokens possible to generate for a model.
text-davinci-003: 4,000 tokens
text-curie-001: 2,048 tokens
text-babbage-001: 2,048 tokens
text-ada-001: 2,048 tokens
code-davinci-002: 8,000 tokens
code-cushman-001: 2,048 tokens
Args:
modelname: The modelname we want to know the context size for.
Returns:
The maximum context size
Example:
.. code-block:: python
max_tokens = openai.modelname_to_contextsize("text-davinci-003")
"""
if modelname == "text-davinci-003":
return 4000
elif modelname == "text-curie-001":
return 2048
elif modelname == "text-babbage-001":
return 2048
elif modelname == "text-ada-001":
return 2048
elif modelname == "code-davinci-002":
return 8000
elif modelname == "code-cushman-001":
return 2048
else:
return 4000
def max_tokens_for_prompt(self, prompt: str) -> int:
"""Calculate the maximum number of tokens possible to generate for a prompt.
Args:
prompt: The prompt to pass into the model.
Returns:
The maximum number of tokens to generate for a prompt.
Example:
.. code-block:: python
max_tokens = openai.max_token_for_prompt("Tell me a joke.")
"""
num_tokens = self.get_num_tokens(prompt)
# get max context size for model by name
max_size = self.modelname_to_contextsize(self.model_name)
return max_size - num_tokens
params = self._default_params
if stop is not None:
if "stop" in params:
raise ValueError("`stop` found in both the input and default params.")
params["stop"] = stop
response = self.client.create(model=self.model_name, prompt=prompt, **params)
return response["choices"][0]["text"]

View File

@@ -1,51 +1,48 @@
"""BETA: everything in here is highly experimental, do not rely on."""
from typing import Any, Optional
from langchain.input import print_text
from langchain.schema import AgentAction, AgentFinish
from langchain.schema import AgentAction
class BaseLogger:
"""Base logging interface."""
def log_agent_start(self, text: str, **kwargs: Any) -> None:
"""Log the start of an agent interaction."""
def log_agent_start(self, text: str, **kwargs: Any):
pass
def log_agent_end(self, finish: AgentFinish, **kwargs: Any) -> None:
"""Log the end of an agent interaction."""
def log_agent_end(self, text: str, **kwargs: Any):
pass
def log_agent_action(self, action: AgentAction, **kwargs: Any) -> None:
"""Log agent action decision."""
def log_agent_action(self, action: AgentAction, **kwargs: Any):
pass
def log_agent_observation(self, observation: str, **kwargs: Any) -> None:
"""Log agent observation."""
def log_agent_observation(self, observation: str, **kwargs: Any):
pass
def log_llm_inputs(self, inputs: dict, prompt: str, **kwargs: Any) -> None:
"""Log LLM inputs."""
def log_llm_inputs(self, inputs: dict, prompt: str, **kwargs):
pass
def log_llm_response(self, output: str, **kwargs: Any) -> None:
"""Log LLM response."""
def log_llm_response(self, output: str, **kwargs):
pass
class StdOutLogger(BaseLogger):
"""Interface for printing things to stdout."""
def log_agent_start(self, text: str, **kwargs: Any) -> None:
"""Print the text to start the agent."""
class StOutLogger(BaseLogger):
def log_agent_start(self, text: str, **kwargs: Any):
print_text(text)
def log_agent_end(self, text: str, **kwargs: Any):
pass
def log_agent_action(
self, action: AgentAction, color: Optional[str] = None, **kwargs: Any
) -> None:
"""Print the log of the action in a certain color."""
):
print_text(action.log, color=color)
def log_llm_inputs(self, inputs: dict, prompt: str, **kwargs):
print("Prompt after formatting:")
print_text(prompt, color="green", end="\n")
def log_llm_response(self, output: str, **kwargs):
pass
def log_agent_observation(
self,
observation: str,
@@ -53,13 +50,7 @@ class StdOutLogger(BaseLogger):
observation_prefix: Optional[str] = None,
llm_prefix: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Print the observation in a special color."""
):
print_text(f"\n{observation_prefix}")
print_text(observation, color=color)
print_text(f"\n{llm_prefix}")
def log_llm_inputs(self, inputs: dict, prompt: str, **kwargs: Any) -> None:
"""Print the prompt in green."""
print("Prompt after formatting:")
print_text(prompt, color="green", end="\n")

Some files were not shown because too many files have changed in this diff Show More