Compare commits


2 Commits

Author        SHA1        Message                       Date
Lance Martin  487488f540  Updates                       2023-10-24 15:53:42 -07:00
Lance Martin  ae374461a6  Text-to-SQL+semantic search   2023-10-23 09:09:43 -07:00
3138 changed files with 48629 additions and 262541 deletions

View File

@@ -17,16 +17,13 @@ For more info, check out the [GitHub documentation](https://docs.github.com/en/f
## VS Code Dev Containers
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
```
Note: If you click this link you will open the main repo and not your local cloned repo, you can use this link and replace with your username and cloned repo name:
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
```
Then you will have a local cloned repo where you can contribute and then create pull requests.
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
You can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
1. If this is your first time using a development container, please ensure your system meets the pre-reqs (i.e. have Docker installed) in the [getting started steps](https://aka.ms/vscode-remote/containers/getting-started).

View File

@@ -1,132 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
conduct@langchain.dev.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

View File

@@ -1,19 +1,20 @@
# Contributing to LangChain
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
As an open source project in a rapidly developing field, we are extremely open
to contributions, whether they be in the form of new features, improved infra, better documentation, or bug fixes.
## 🗺️ Guidelines
### 👩‍💻 Contributing Code
To contribute to this project, please follow the ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
To contribute to this project, please follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are a maintainer.
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
maintainers.
Pull requests cannot land without passing the formatting, linting, and testing checks first. See [Testing](#testing) and
Pull requests cannot land without passing the formatting, linting and testing checks first. See [Testing](#testing) and
[Formatting and Linting](#formatting-and-linting) for how to run these checks locally.
It's essential that we maintain great documentation and testing. If you:
@@ -26,14 +27,16 @@ It's essential that we maintain great documentation and testing. If you:
- Add a demo notebook in `docs/modules`.
- Add unit and integration tests.
We are a small, progress-oriented team. If there's something you'd like to add or change, opening a pull request is the
We're a small, building-oriented team. If there's something you'd like to add or change, opening a pull request is the
best way to get our attention.
### 🚩GitHub Issues
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date with bugs, improvements, and feature requests.
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date
with bugs, improvements, and feature requests.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help organize issues.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help
organize issues.
If you start working on an issue, please assign it to yourself.
@@ -56,12 +59,12 @@ we do not want these to get in the way of getting good code into the codebase.
## 🚀 Quick Start
This quick start guide explains how to run the repository locally.
This quick start describes running the repository locally.
For a [development container](https://containers.dev/), see the [.devcontainer folder](https://github.com/langchain-ai/langchain/tree/master/.devcontainer).
### Dependency Management: Poetry and other env/dependency managers
This project utilizes [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
This project uses [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
❗Note: *Before installing Poetry*, if you use `Conda`, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
@@ -72,11 +75,11 @@ tell Poetry to use the virtualenv python environment (`poetry config virtualenvs
### Core vs. Experimental
This repository contains two separate projects:
- `langchain`: core langchain code, abstractions, and use cases.
- `langchain.experimental`: see the [Experimental README](https://github.com/langchain-ai/langchain/tree/master/libs/experimental/README.md) for more information.
There are two separate projects in this repository:
- `langchain`: core langchain code, abstractions, and use cases
- `langchain.experimental`: see the [Experimental README](../libs/experimental/README.md) for more information.
Each of these has its own development environment. Docs are run from the top-level makefile, but development
Each of these has their own development environment. Docs are run from the top-level makefile, but development
is split across separate test & release flows.
For this quickstart, start with langchain core:
@@ -126,7 +129,7 @@ To run unit tests in Docker:
make docker_tests
```
There are also [integration tests and code-coverage](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests/README.md) available.
There are also [integration tests and code-coverage](../libs/langchain/tests/README.md) available.
### Formatting and Linting
@@ -134,21 +137,14 @@ Run these locally before submitting a PR; the CI system will check also.
#### Code Formatting
Formatting for this project is done via [ruff](https://docs.astral.sh/ruff/rules/).
Formatting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/) and [ruff](https://docs.astral.sh/ruff/rules/).
To run formatting for docs, cookbook and templates:
To run formatting for this project:
```bash
make format
```
To run formatting for a library, run the same command from the relevant library directory:
```bash
cd libs/{LIBRARY}
make format
```
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
```bash
@@ -159,21 +155,14 @@ This is especially useful when you have made changes to a subset of the project
#### Linting
Linting for this project is done via a combination of [ruff](https://docs.astral.sh/ruff/rules/) and [mypy](http://mypy-lang.org/).
Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [ruff](https://docs.astral.sh/ruff/rules/), and [mypy](http://mypy-lang.org/).
To run linting for docs, cookbook and templates:
To run linting for this project:
```bash
make lint
```
To run linting for a library, run the same command from the relevant library directory:
```bash
cd libs/{LIBRARY}
make lint
```
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
```bash
@@ -214,10 +203,6 @@ ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogy
Langchain relies heavily on optional dependencies to keep the Langchain package lightweight.
You only need to add a new dependency if a **unit test** relies on the package.
If your package is only required for **integration tests**, then you can skip these
steps and leave all pyproject.toml and poetry.lock files alone.
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
that most users won't have it installed.
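
Since most users won't have an optional package installed, library code typically guards the import and fails only when the feature is actually used. A minimal sketch of that pattern (`somepkg` is a hypothetical dependency, not one named in this diff):

```python
try:
    import somepkg  # hypothetical optional dependency
except ImportError:
    somepkg = None


def feature_needing_somepkg():
    # Fail with a clear message only when the optional feature is used.
    if somepkg is None:
        raise ImportError("This feature requires somepkg: `pip install somepkg`")
    # ... use somepkg here
```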
@@ -297,7 +282,7 @@ make docs_build
make api_docs_build
```
Finally, run the link checker to ensure all links are valid:
Finally, you can run the linkchecker to make sure all links are valid:
```bash
make docs_linkcheck
@@ -306,8 +291,8 @@ make api_docs_linkcheck
### Verify Documentation changes
After pushing documentation changes to the repository, you can preview and verify that the changes are
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.
After pushing documentation changes to the repository, you can preview and verify that the changes are
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.
This will take you to a preview of the documentation changes.
This preview is created by [Vercel](https://vercel.com/docs/getting-started-with-vercel).
@@ -322,4 +307,4 @@ even patch releases may contain [non-backwards-compatible changes](https://semve
### 🌟 Recognition
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or through another means.
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.

View File

@@ -1,57 +0,0 @@
name: compile-integration-test
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.6.1"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: compile-integration
- name: Install integration dependencies
shell: bash
run: poetry install --with=test_integration
- name: Check integration tests compile
shell: bash
run: poetry run pytest -m compile tests/integration_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -7,21 +7,20 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
langchain-location:
required: false
type: string
description: "Relative path to the langchain library folder"
env:
POETRY_VERSION: "1.6.1"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
# This env var allows us to get inline annotations when ruff has complaints.
RUFF_OUTPUT_FORMAT: github
jobs:
build:
runs-on: ubuntu-latest
env:
# This number is set "by eye": we want it to be big enough
# so that it's bigger than the number of commits in any reasonable PR,
# and also as small as possible since increasing the number makes
# the initial `git fetch` slower.
FETCH_DEPTH: 50
strategy:
matrix:
# Only lint on the min and max supported Python versions.
@@ -36,6 +35,51 @@ jobs:
- "3.11"
steps:
- uses: actions/checkout@v4
with:
# Fetch the last FETCH_DEPTH commits, so the mtime-changing script
# can accurately set the mtimes of files modified in the last FETCH_DEPTH commits.
fetch-depth: ${{ env.FETCH_DEPTH }}
- name: Restore workdir file mtimes to last-edited commit date
id: restore-mtimes
# This is needed to make black caching work.
# Black's cache uses file (mtime, size) to check whether a lookup is a cache hit.
# Without this command, files in the repo would have the current time as the modified time,
# since the previous action step just created them.
# This command resets the mtime to the last time the files were modified in git instead,
# which is a high-quality and stable representation of the last modification date.
run: |
# Important considerations:
# - These commands run at base of the repo, since we never `cd` to the `WORKDIR`.
# - We only want to alter mtimes for Python files, since that's all black checks.
# - We don't need to alter mtimes for directories, since black doesn't look at those.
# - We also only alter mtimes inside the `WORKDIR` since that's all we'll lint.
# - This should run before `poetry install`, because poetry's venv also contains
# Python files, and we don't want to alter their mtimes since they aren't linted.
# Ensure we fail on non-zero exits and on undefined variables.
# Also print executed commands, for easier debugging.
set -eux
# Restore the mtimes of Python files in the workdir based on git history.
.github/tools/git-restore-mtime --no-directories "$WORKDIR/**/*.py"
# Since CI only does a partial fetch (to `FETCH_DEPTH`) for efficiency,
# the local git repo doesn't have full history. There are probably files
# that were last modified in a commit *older than* the oldest fetched commit.
# After `git-restore-mtime`, such files have a mtime set to the oldest fetched commit.
#
# As new commits get added, that timestamp will keep moving forward.
# If left unchanged, this will make `black` think that the files were edited
# more recently than its cache suggests. Instead, we can set their mtime
# to a fixed date in the far past that won't change and won't cause cache misses in black.
#
# For all workdir Python files modified in or before the oldest few fetched commits,
# make their mtime be 2000-01-01 00:00:00.
OLDEST_COMMIT="$(git log --reverse '--pretty=format:%H' | head -1)"
OLDEST_COMMIT_TIME="$(git show -s '--format=%ai' "$OLDEST_COMMIT")"
find "$WORKDIR" -name '*.py' -type f -not -newermt "$OLDEST_COMMIT_TIME" -exec touch -c -m -t '200001010000' '{}' '+'
echo "oldest-commit=$OLDEST_COMMIT" >> "$GITHUB_OUTPUT"
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
@@ -68,15 +112,26 @@ jobs:
# It doesn't matter how you change it, any change will cause a cache-bust.
working-directory: ${{ inputs.working-directory }}
run: |
poetry install --with lint,typing
poetry install --with dev,lint,test,typing
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.langchain-location }}
env:
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
if: ${{ inputs.working-directory != 'libs/langchain' }}
run: |
poetry run pip install -e "$LANGCHAIN_LOCATION"
pip install -e ../langchain
- name: Restore black cache
uses: actions/cache@v3
env:
CACHE_BASE: black-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
with:
path: |
${{ env.WORKDIR }}/.black_cache
key: ${{ env.CACHE_BASE }}-${{ steps.restore-mtimes.outputs.oldest-commit }}
restore-keys:
# If we can't find an exact match for our cache key, accept any with this prefix.
${{ env.CACHE_BASE }}-
- name: Get .mypy_cache to speed up mypy
uses: actions/cache@v3
@@ -89,5 +144,7 @@ jobs:
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
env:
BLACK_CACHE_DIR: .black_cache
run: |
make lint
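
For context on the mtime-restoration step above: Black treats a file as a cache hit only when its recorded (mtime, size) pair is unchanged since the last run, which is why fresh CI checkouts would otherwise bust the cache. A rough sketch of the idea (illustrative only, not Black's actual code):

```python
import os


def is_cache_hit(path: str, cached: dict) -> bool:
    # `cached` maps path -> (mtime, size) as recorded on the previous run.
    # A fresh checkout resets every mtime to "now", so each lookup would
    # miss without the mtime-restoration step in the workflow above.
    st = os.stat(path)
    return cached.get(path) == (st.st_mtime, st.st_size)
```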

View File

@@ -7,10 +7,6 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
langchain-location:
required: false
type: string
description: "Relative path to the langchain library folder"
env:
POETRY_VERSION: "1.6.1"
@@ -44,14 +40,6 @@ jobs:
shell: bash
run: poetry install
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.langchain-location }}
env:
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
run: |
poetry run pip install -e "$LANGCHAIN_LOCATION"
- name: Install the opposite major version of pydantic
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
shell: bash

View File

@@ -9,120 +9,13 @@ on:
description: "From which folder this pipeline executes"
env:
PYTHON_VERSION: "3.10"
POETRY_VERSION: "1.6.1"
jobs:
build:
if_release:
# Disallow publishing from branches that aren't `master`.
if: github.ref == 'refs/heads/master'
runs-on: ubuntu-latest
outputs:
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
# Per the trusted publishing GitHub Action:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: poetry build
working-directory: ${{ inputs.working-directory }}
- name: Upload build
uses: actions/upload-artifact@v3
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Check Version
id: check-version
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
test-pypi-publish:
needs:
- build
uses:
./.github/workflows/_test_release.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
pre-release-checks:
needs:
- build
- test-pypi-publish
runs-on: ubuntu-latest
steps:
# We explicitly *don't* set up caching here. This ensures our tests are
# maximally sensitive to catching breakage.
#
# For example, here's a way that caching can cause a falsely-passing test:
# - Make the langchain package manifest no longer list a dependency package
# as a requirement. This means it won't be installed by `pip install`,
# and attempting to use it would cause a crash.
# - That dependency used to be required, so it may have been cached.
# When restoring the venv packages from cache, that dependency gets included.
# - Tests pass, because the dependency is present even though it wasn't specified.
# - The package is published, and it breaks on the missing dependency when
# used in the real world.
- uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Test published package
shell: bash
env:
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
VERSION: ${{ needs.build.outputs.version }}
# Here we use:
# - The default regular PyPI index as the *primary* index, meaning
# that it takes priority (https://pypi.org/simple)
# - The test PyPI index as an extra index, so that any dependencies that
# are not found on test PyPI can be resolved and installed anyway.
# (https://test.pypi.org/simple). This will include the PKG_NAME==VERSION
# package because VERSION will not have been uploaded to regular PyPI yet.
#
# TODO: add more in-depth pre-publish tests after testing that importing works
run: |
pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION"
# Replace all dashes in the package name with underscores,
# since that's how Python imports packages with dashes in the name.
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
publish:
needs:
- build
- test-pypi-publish
- pre-release-checks
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
@@ -131,65 +24,28 @@ jobs:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
- uses: actions/download-artifact@v3
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
mark-release:
needs:
- build
- test-pypi-publish
- pre-release-checks
- publish
runs-on: ubuntu-latest
permissions:
# This permission is needed by `ncipollo/release-action` to
# create the GitHub release.
# This permission is needed by `ncipollo/release-action` to create the GitHub release.
contents: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
python-version: "3.10"
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
- uses: actions/download-artifact@v3
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Build project for distribution
run: poetry build
- name: Check Version
id: check-version
run: |
echo version=$(poetry version --short) >> $GITHUB_OUTPUT
- name: Create Release
uses: ncipollo/release-action@v1
if: ${{ inputs.working-directory == 'libs/langchain' }}
@@ -198,5 +54,11 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}
draft: false
generateReleaseNotes: true
tag: v${{ needs.build.outputs.version }}
tag: v${{ steps.check-version.outputs.version }}
commit: master
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true

View File

@@ -7,10 +7,6 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
langchain-location:
required: false
type: string
description: "Relative path to the langchain library folder"
env:
POETRY_VERSION: "1.6.1"
@@ -42,20 +38,19 @@ jobs:
- name: Install dependencies
shell: bash
run: poetry install --with test
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.langchain-location }}
env:
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
run: |
poetry run pip install -e "$LANGCHAIN_LOCATION"
run: poetry install
- name: Run core tests
shell: bash
run: |
make test
run: make test
- name: Install integration dependencies
shell: bash
run: poetry install --with=test_integration
- name: Check integration tests compile
shell: bash
run: poetry run pytest -m compile tests/integration_tests
- name: Ensure the tests did not create any additional files
shell: bash

View File

@@ -1,95 +0,0 @@
name: test-release
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.6.1"
PYTHON_VERSION: "3.10"
jobs:
build:
if: github.ref == 'refs/heads/master'
runs-on: ubuntu-latest
outputs:
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
# Per the trusted publishing GitHub Action:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: poetry build
working-directory: ${{ inputs.working-directory }}
- name: Upload build
uses: actions/upload-artifact@v3
with:
name: test-dist
path: ${{ inputs.working-directory }}/dist/
- name: Check Version
id: check-version
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
publish:
needs:
- build
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
#
# Trusted publishing has to also be configured on PyPI for each package:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v3
with:
name: test-dist
path: ${{ inputs.working-directory }}/dist/
- name: Publish to test PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
repository-url: https://test.pypi.org/legacy/
# We overwrite any existing distributions with the same name and version.
# This is *only for CI use* and is *extremely dangerous* otherwise!
# https://github.com/pypa/gh-action-pypi-publish#tolerating-release-package-file-duplicates
skip-existing: true

View File

@@ -1,17 +1,11 @@
---
name: Docs, templates, cookbook lint
name: Documentation Lint
on:
push:
branches: [ master ]
branches: [master]
pull_request:
paths:
- 'docs/**'
- 'templates/**'
- 'cookbook/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/doc_lint.yml'
workflow_dispatch:
branches: [master]
jobs:
check:
@@ -19,17 +13,10 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v2
- name: Run import check
run: |
# We should not encourage imports directly from main init file
# Except for hub
git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: "."
secrets: inherit
git grep 'from langchain import' docs/{docs,snippets} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
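
To illustrate what this grep enforces: imports should come from submodules, with `hub` as the only import allowed from the package root. A sketch (module paths as used elsewhere in this compare):

```python
from langchain import hub                     # allowed: hub is explicitly whitelisted
from langchain.chat_models import ChatOllama  # allowed: submodule import

# from langchain import OpenAI                # would fail the CI grep check above
```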

View File

@@ -3,8 +3,6 @@ import toml
pyproject_toml = toml.load("pyproject.toml")
# Extract the ignore words list (adjust the key as per your TOML structure)
ignore_words_list = (
pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
)
ignore_words_list = pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
print(f"::set-output name=ignore_words_list::{ignore_words_list}")
print(f"::set-output name=ignore_words_list::{ignore_words_list}")

View File

@@ -3,19 +3,17 @@ name: libs/langchain CI
on:
push:
branches: [master]
branches: [ master ]
pull_request:
paths:
- ".github/actions/poetry_setup/action.yml"
- ".github/tools/**"
- ".github/workflows/_lint.yml"
- ".github/workflows/_test.yml"
- ".github/workflows/_pydantic_compatibility.yml"
- ".github/workflows/langchain_ci.yml"
- "libs/*"
- "libs/langchain/**"
- "libs/core/**"
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/_pydantic_compatibility.yml'
- '.github/workflows/langchain_ci.yml'
- 'libs/langchain/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
@@ -33,25 +31,22 @@ env:
jobs:
lint:
uses: ./.github/workflows/_lint.yml
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/langchain
secrets: inherit
test:
uses: ./.github/workflows/_test.yml
with:
working-directory: libs/langchain
secrets: inherit
compile-integration-tests:
uses: ./.github/workflows/_compile_integration_test.yml
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/langchain
secrets: inherit
pydantic-compatibility:
uses: ./.github/workflows/_pydantic_compatibility.yml
uses:
./.github/workflows/_pydantic_compatibility.yml
with:
working-directory: libs/langchain
secrets: inherit
@@ -86,11 +81,6 @@ jobs:
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
- name: Install langchain core editable
shell: bash
run: |
poetry run pip install -e ../core
- name: Run extended tests
run: make extended_tests

View File

@@ -1,47 +0,0 @@
---
name: libs/cli CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/_pydantic_compatibility.yml'
- '.github/workflows/langchain_cli_ci.yml'
- 'libs/cli/**'
- 'libs/*'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
WORKDIR: "libs/cli"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/cli
langchain-location: ../langchain
secrets: inherit
test:
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/cli
secrets: inherit

View File

@@ -1,13 +0,0 @@
---
name: libs/cli Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/cli
secrets: inherit

View File

@@ -1,52 +0,0 @@
---
name: libs/langchain core CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/_pydantic_compatibility.yml'
- '.github/workflows/langchain_core_ci.yml'
- 'libs/core/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
WORKDIR: "libs/core"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/core
secrets: inherit
test:
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/core
secrets: inherit
pydantic-compatibility:
uses:
./.github/workflows/_pydantic_compatibility.yml
with:
working-directory: libs/core
secrets: inherit

View File

@@ -1,13 +0,0 @@
---
name: libs/core Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/core
secrets: inherit

View File

@@ -3,19 +3,17 @@ name: libs/experimental CI
on:
push:
branches: [master]
branches: [ master ]
pull_request:
paths:
- ".github/actions/poetry_setup/action.yml"
- ".github/tools/**"
- ".github/workflows/_lint.yml"
- ".github/workflows/_test.yml"
- ".github/workflows/langchain_experimental_ci.yml"
- "libs/*"
- "libs/experimental/**"
- "libs/langchain/**"
- "libs/core/**"
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/langchain_experimental_ci.yml'
- 'libs/langchain/**'
- 'libs/experimental/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
@@ -33,19 +31,15 @@ env:
jobs:
lint:
uses: ./.github/workflows/_lint.yml
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/experimental
secrets: inherit
test:
uses: ./.github/workflows/_test.yml
with:
working-directory: libs/experimental
secrets: inherit
compile-integration-tests:
uses: ./.github/workflows/_compile_integration_test.yml
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/experimental
secrets: inherit
@@ -86,7 +80,6 @@ jobs:
echo "Editably installing langchain outside of poetry, to avoid messing up lockfile..."
poetry run pip install -e ../langchain
poetry run pip install -e ../core
- name: Run tests
run: make test

View File

@@ -1,13 +0,0 @@
---
name: Experimental Test Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_test_release.yml
with:
working-directory: libs/experimental
secrets: inherit

View File

@@ -1,13 +0,0 @@
---
name: Test Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_test_release.yml
with:
working-directory: libs/langchain
secrets: inherit

View File

@@ -55,10 +55,6 @@ jobs:
poetry install --with=test_integration
poetry run pip install google-cloud-aiplatform
poetry run pip install "boto3>=1.28.57"
if [[ ${{ matrix.python-version }} != "3.8" ]]
then
poetry run pip install fireworks-ai
fi
- name: Run tests
shell: bash
@@ -68,8 +64,7 @@ jobs:
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_DEPLOYMENT_NAME }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
AZURE_OPENAI_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_DEPLOYMENT_NAME }}
run: |
make scheduled_tests

View File

@@ -1,37 +0,0 @@
---
name: templates CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/templates_ci.yml'
- 'templates/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.6.1"
WORKDIR: "templates"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: templates
langchain-location: ../libs/langchain
secrets: inherit

.gitignore
View File

@@ -178,4 +178,3 @@ docs/docs/build
docs/docs/node_modules
docs/docs/yarn.lock
_dist
docs/docs/templates

View File

@@ -1,18 +1,9 @@
# Migrating
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
This migration has already started, but we are remaining backwards compatible until 7/28.
On that date, we will remove functionality from `langchain`.
Read more about the motivation and the progress [here](https://github.com/langchain-ai/langchain/discussions/8043).
### Migrating to `langchain_experimental`
# Migrating to `langchain_experimental`
We are moving any experimental components of LangChain, or components with vulnerability issues, into `langchain_experimental`.
This guide covers how to migrate.
### Installation
## Installation
Previously:
@@ -22,7 +13,7 @@ Now (only if you want to access things in experimental):
`pip install -U langchain langchain_experimental`
### Things in `langchain.experimental`
## Things in `langchain.experimental`
Previously:
@@ -32,7 +23,7 @@ Now:
`from langchain_experimental import ...`
### PALChain
## PALChain
Previously:
@@ -42,7 +33,7 @@ Now:
`from langchain_experimental.pal_chain import PALChain`
### SQLDatabaseChain
## SQLDatabaseChain
Previously:
@@ -56,7 +47,7 @@ Alternatively, if you are just interested in using the query generation part of
`from langchain.chains import create_sql_query_chain`
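
A minimal usage sketch of `create_sql_query_chain` (assumes a local SQLite file and an `OPENAI_API_KEY`; the database URI is illustrative, reusing the `nba_roster.db` example from the notebook in this compare):

```python
from langchain.chains import create_sql_query_chain
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///nba_roster.db")  # illustrative database
chain = create_sql_query_chain(ChatOpenAI(), db)
print(chain.invoke({"question": "What team is Klay Thompson on?"}))
```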
### `load_prompt` for Python files
## `load_prompt` for Python files
Note: this only applies if you want to load Python files as prompts.
If you want to load json/yaml files, no change is needed.

View File

@@ -37,18 +37,6 @@ spell_check:
spell_fix:
poetry run codespell --toml pyproject.toml -w
######################
# LINTING AND FORMATTING
######################
lint:
poetry run ruff docs templates cookbook
poetry run ruff format docs templates cookbook --diff
format format_diff:
poetry run ruff format docs templates cookbook
poetry run ruff --select I --fix docs templates cookbook
######################
# HELP
######################

View File

@@ -15,72 +15,71 @@
[![Dependency Status](https://img.shields.io/librariesio/github/langchain-ai/langchain)](https://libraries.io/github/langchain-ai/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/issues)
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to get off the waitlist or speak with our sales team.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to get off the waitlist or speak with our sales team
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
This migration has already started, but we are remaining backwards compatible until 7/28.
On that date, we will remove functionality from `langchain`.
Read more about the motivation and the progress [here](https://github.com/langchain-ai/langchain/discussions/8043).
Read how to migrate your code [here](MIGRATE.md).
## Quick Install
With pip:
```bash
pip install langchain
```
`pip install langchain`
or
`pip install langsmith && conda install langchain -c conda-forge`
With conda:
```bash
pip install langsmith && conda install langchain -c conda-forge
```
## 🤔 What is this?
## 🤔 What is LangChain?
Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
**LangChain** is a framework for developing applications powered by language models. It enables applications that:
- **Are context-aware**: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
This library aims to assist in the development of those types of applications. Common examples of these applications include:
This framework consists of several parts.
- **LangChain Libraries**: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- **[LangChain Templates](templates)**: A collection of easily deployable reference architectures for a wide variety of tasks.
- **[LangServe](https://github.com/langchain-ai/langserve)**: A library for deploying LangChain chains as a REST API.
- **[LangSmith](https://smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
**This repo contains the `langchain` ([here](libs/langchain)), `langchain-experimental` ([here](libs/experimental)), and `langchain-cli` ([here](libs/cli)) Python packages, as well as [LangChain Templates](templates).**
![LangChain Stack](docs/static/img/langchain_stack.png)
## 🧱 What can you build with LangChain?
**❓ Retrieval augmented generation**
**❓ Question Answering over specific documents**
- [Documentation](https://python.langchain.com/docs/use_cases/question_answering/)
- End-to-end Example: [Chat LangChain](https://chat.langchain.com) and [repo](https://github.com/langchain-ai/chat-langchain)
- End-to-end Example: [Question Answering over Notion Database](https://github.com/hwchase17/notion-qa)
**💬 Analyzing structured data**
**💬 Chatbots**
- [Documentation](https://python.langchain.com/docs/use_cases/qa_structured/sql)
- End-to-end Example: [SQL Llama2 Template](https://github.com/langchain-ai/langchain/tree/master/templates/sql-llama2)
- [Documentation](https://python.langchain.com/docs/use_cases/chatbots/)
- End-to-end Example: [Chat-LangChain](https://github.com/langchain-ai/chat-langchain)
**🤖 Chatbots**
**🤖 Agents**
- [Documentation](https://python.langchain.com/docs/use_cases/chatbots)
- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
- [Documentation](https://python.langchain.com/docs/modules/agents/)
- End-to-end Example: [GPT+WolframAlpha](https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain)
And much more! Head to the [Use cases](https://python.langchain.com/docs/use_cases/) section of the docs for more.
## 📖 Documentation
## 🚀 How does LangChain help?
The main value props of the LangChain libraries are:
1. **Components**: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
Please see [here](https://python.langchain.com) for full documentation on:
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
- Getting started (installation, setting up the environment, simple examples)
- How-To examples (demos, integrations, helper functions)
- Reference (full API docs)
- Resources (high-level explanation of core concepts)
Components fall into the following **modules**:
## 🚀 What can this help with?
**📃 Model I/O:**
There are six main areas that LangChain is designed to help with.
These are, in increasing order of complexity:
**📃 LLMs and Prompts:**
This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
**📚 Retrieval:**
**🔗 Chains:**
Chains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
**📚 Data Augmented Generation:**
Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
@@ -88,16 +87,15 @@ Data Augmented Generation involves specific types of chains that first interact
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
## 📖 Documentation
**🧠 Memory:**
Please see [here](https://python.langchain.com) for full documentation, which includes:
Memory refers to persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
- [Getting started](https://python.langchain.com/docs/get_started/introduction): installation, setting up the environment, simple examples
- Overview of the [interfaces](https://python.langchain.com/docs/expression_language/), [modules](https://python.langchain.com/docs/modules/) and [integrations](https://python.langchain.com/docs/integrations/providers)
- [Use case](https://python.langchain.com/docs/use_cases/qa_structured/sql) walkthroughs and best practice [guides](https://python.langchain.com/docs/guides/adapters/openai)
- [LangSmith](https://python.langchain.com/docs/langsmith/), [LangServe](https://python.langchain.com/docs/langserve), and [LangChain Template](https://python.langchain.com/docs/templates/) overviews
- [Reference](https://api.python.langchain.com): full API docs
**🧐 Evaluation:**
[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is by using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
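
As a concrete illustration of a chain (a minimal sketch in the LCEL style used in the notebook later in this compare; assumes an `OPENAI_API_KEY` is set):

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI() | StrOutputParser()  # prompt -> model -> string output
print(chain.invoke({"text": "LangChain composes LLM calls with tools and data."}))
```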
For more information on these concepts, please see our [full documentation](https://python.langchain.com).
## 💁 Contributing

View File

@@ -47,7 +47,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 8,
"id": "6a75a5c6-34ee-4ab9-a664-d9b432d812ee",
"metadata": {},
"outputs": [
@@ -60,26 +60,28 @@
}
],
"source": [
"# Local\n",
"# Local \n",
"from langchain.chat_models import ChatOllama\n",
"\n",
"llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n",
"llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n",
"\n",
"# API\n",
"from getpass import getpass\n",
"from langchain.llms import Replicate\n",
"\n",
"# REPLICATE_API_TOKEN = getpass()\n",
"# os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n",
"replicate_id = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n",
"llama2_chat_replicate = Replicate(\n",
" model=replicate_id, input={\"temperature\": 0.01, \"max_length\": 500, \"top_p\": 1}\n",
" model=replicate_id,\n",
" input={\"temperature\": 0.01, \n",
" \"max_length\": 500, \n",
" \"top_p\": 1}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 12,
"id": "ce96f7ea-b3d5-44e1-9fa5-a79e04a9e1fb",
"metadata": {},
"outputs": [],
@@ -102,20 +104,17 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 13,
"id": "025bdd82-3bb1-4948-bc7c-c3ccd94fd05c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.utilities import SQLDatabase\n",
"\n",
"db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info=0)\n",
"\n",
"db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info= 0)\n",
"\n",
"def get_schema(_):\n",
" return db.get_table_info()\n",
"\n",
"\n",
"def run_query(query):\n",
" return db.run(query)"
]
@@ -132,7 +131,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 14,
"id": "5a4933ea-d9c0-4b0a-8177-ba4490c6532b",
"metadata": {},
"outputs": [
@@ -142,7 +141,7 @@
"' SELECT \"Team\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'"
]
},
"execution_count": 4,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@@ -150,29 +149,26 @@
"source": [
"# Prompt\n",
"from langchain.prompts import ChatPromptTemplate\n",
"\n",
"template = \"\"\"Based on the table schema below, write a SQL query that would answer the user's question:\n",
"{schema}\n",
"\n",
"Question: {question}\n",
"SQL Query:\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
" (\"human\", template),\n",
" ]\n",
")\n",
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
" (\"human\", template)\n",
"])\n",
"\n",
"# Chain to query\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"sql_response = (\n",
" RunnablePassthrough.assign(schema=get_schema)\n",
" | prompt\n",
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
" | StrOutputParser()\n",
")\n",
" RunnablePassthrough.assign(schema=get_schema)\n",
" | prompt\n",
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
" | StrOutputParser()\n",
" )\n",
"\n",
"sql_response.invoke({\"question\": \"What team is Klay Thompson on?\"})"
]
@@ -213,23 +209,18 @@
"Question: {question}\n",
"SQL Query: {query}\n",
"SQL Response: {response}\"\"\"\n",
"prompt_response = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\",\n",
" ),\n",
" (\"human\", template),\n",
" ]\n",
")\n",
"prompt_response = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\"),\n",
" (\"human\", template)\n",
"])\n",
"\n",
"full_chain = (\n",
" RunnablePassthrough.assign(query=sql_response)\n",
" RunnablePassthrough.assign(query=sql_response) \n",
" | RunnablePassthrough.assign(\n",
" schema=get_schema,\n",
" response=lambda x: db.run(x[\"query\"]),\n",
" )\n",
" | prompt_response\n",
" | prompt_response \n",
" | llm\n",
")\n",
"\n",
@@ -259,8 +250,8 @@
},
{
"cell_type": "code",
"execution_count": 7,
"id": "022868f2-128e-42f5-8d90-d3bb2f11d994",
"execution_count": 19,
"id": "1985aa1c-eb8f-4fb1-a54f-c8aa10744687",
"metadata": {},
"outputs": [
{
@@ -269,7 +260,7 @@
"' SELECT \"Team\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'"
]
},
"execution_count": 7,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -278,44 +269,61 @@
"# Prompt\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"\n",
"template = \"\"\"Given an input question, convert it to a SQL query. No pre-amble. Based on the table schema below, write a SQL query that would answer the user's question:\n",
"template = \"\"\"Based on the table schema below, write a SQL query that would answer the user's question:\n",
"{schema}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", template),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"Question: {question}\n",
"SQL Query:\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", template)\n",
"])\n",
"\n",
"memory = ConversationBufferMemory(return_messages=True)\n",
"\n",
"# Chain to query with memory\n",
"# Chain to query with memory \n",
"from langchain.schema.runnable import RunnableLambda\n",
"\n",
"sql_chain = (\n",
" RunnablePassthrough.assign(\n",
" schema=get_schema,\n",
" history=RunnableLambda(lambda x: memory.load_memory_variables(x)[\"history\"]),\n",
" )\n",
" | prompt\n",
" schema=get_schema,\n",
" history=RunnableLambda(lambda x: memory.load_memory_variables(x)[\"history\"])\n",
" )| prompt\n",
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
" | StrOutputParser()\n",
")\n",
"\n",
"\n",
"def save(input_output):\n",
" output = {\"output\": input_output.pop(\"output\")}\n",
" memory.save_context(input_output, output)\n",
" return output[\"output\"]\n",
"\n",
"\n",
" return output['output']\n",
" \n",
"sql_response_memory = RunnablePassthrough.assign(output=sql_chain) | save\n",
"sql_response_memory.invoke({\"question\": \"What team is Klay Thompson on?\"})"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "0b45818a-1498-441d-b82d-23c29428c2bb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' SELECT \"SALARY\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql_response_memory.invoke({\"question\": \"What is his salary?\"})"
]
},
{
"cell_type": "code",
"execution_count": 21,
@@ -341,23 +349,18 @@
"Question: {question}\n",
"SQL Query: {query}\n",
"SQL Response: {response}\"\"\"\n",
"prompt_response = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\",\n",
" ),\n",
" (\"human\", template),\n",
" ]\n",
")\n",
"prompt_response = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\"),\n",
" (\"human\", template)\n",
"])\n",
"\n",
"full_chain = (\n",
" RunnablePassthrough.assign(query=sql_response_memory)\n",
" RunnablePassthrough.assign(query=sql_response_memory) \n",
" | RunnablePassthrough.assign(\n",
" schema=get_schema,\n",
" response=lambda x: db.run(x[\"query\"]),\n",
" )\n",
" | prompt_response\n",
" | prompt_response \n",
" | llm\n",
")\n",
"\n",

File diff suppressed because one or more lines are too long

View File

@@ -4,24 +4,22 @@ Example code for building applications with LangChain, with an emphasis on more
Notebook | Description
:- | :-
[LLaMA2_sql_chat.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/LLaMA2_sql_chat.ipynb) | Build a chat application that interacts with a SQL database using an open source llm (llama2), specifically demonstrated on an SQLite database containing rosters.
[LLaMA2_sql_chat.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/LLaMA2_sql_chat.ipynb) | Build a chat application that interacts with a sql database using an open source llm (llama2), specifically demonstrated on a sqlite database containing nba rosters.
[Semi_Structured_RAG.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_Structured_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data, including text and tables, using unstructured for parsing, multi-vector retriever for storing, and lcel for implementing chains.
[Semi_structured_and_multi_moda...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using unstructured for parsing, multi-vector retriever for storage and retrieval, and lcel for implementing chains.
[Semi_structured_multi_modal_RA...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using various tools and methods such as unstructured for parsing, multi-vector retriever for storing, lcel for implementing chains, and open source language models like llama2, llava, and gpt4all.
[analyze_document.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/analyze_document.ipynb) | Analyze a single long document.
[autogpt/autogpt.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/autogpt/autogpt.ipynb) | Implement autogpt, a language model, with langchain primitives such as llms, prompttemplates, vectorstores, embeddings, and tools.
[autogpt/marathon_times.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/autogpt/marathon_times.ipynb) | Implement autogpt for finding winning marathon times.
[baby_agi.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/baby_agi.ipynb) | Implement babyagi, an ai agent that can generate and execute tasks based on a given objective, with the flexibility to swap out specific vectorstores/model providers.
[baby_agi_with_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/baby_agi_with_agent.ipynb) | Swap out the execution chain in the babyagi notebook with an agent that has access to tools, aiming to obtain more reliable information.
[camel_role_playing.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/camel_role_playing.ipynb) | Implement the camel framework for creating autonomous cooperative agents in large-scale language models, using role-playing and inception prompting to guide chat agents towards task completion.
[camel_role_playing.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/camel_role_playing.ipynb) | Implement the camel framework for creating autonomous cooperative agents in large scale language models, using role-playing and inception prompting to guide chat agents towards task completion.
[causal_program_aided_language_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/causal_program_aided_language_model.ipynb) | Implement the causal program-aided language (cpal) chain, which improves upon the program-aided language (pal) by incorporating causal structure to prevent hallucination in language models, particularly when dealing with complex narratives and math problems with nested dependencies.
[code-analysis-deeplake.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/code-analysis-deeplake.ipynb) | Analyze its own code base with the help of gpt and activeloop's deep lake.
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval.ipynb) | Build a custom agent that can interact with ai plugins by retrieving tools and creating natural language wrappers around openapi endpoints.
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval_using_plugnplai.ipynb) | Build a custom agent with plugin retrieval functionality, utilizing ai plugins from the `plugnplai` directory.
[databricks_sql_db.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/databricks_sql_db.ipynb) | Connect to databricks runtimes and databricks sql.
[deeplake_semantic_search_over_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/deeplake_semantic_search_over_chat.ipynb) | Perform semantic search and question-answering over a group chat using activeloop's deep lake with gpt4.
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with elasticsearch analytics databases in natural language and build search queries via the elasticsearch dsl API.
[extraction_openai_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/extraction_openai_tools.ipynb) | Structured Data Extraction with OpenAI Tools
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with elasticsearch analytics databases in natural language and build search queries via the elasticsearch dsl api.
[forward_looking_retrieval_augm...](https://github.com/langchain-ai/langchain/tree/master/cookbook/forward_looking_retrieval_augmented_generation.ipynb) | Implement the forward-looking active retrieval augmented generation (flare) method, which generates answers to questions, identifies uncertain tokens, generates hypothetical questions based on these tokens, and retrieves relevant documents to continue generating the answer.
[generative_agents_interactive_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/generative_agents_interactive_simulacra_of_human_behavior.ipynb) | Implement a generative agent that simulates human behavior, based on a research paper, using a time-weighted memory object backed by a langchain retriever.
[gymnasium_agent_simulation.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/gymnasium_agent_simulation.ipynb) | Create a simple agent-environment interaction loop in simulated environments like text-based games with gymnasium.
@@ -39,19 +37,16 @@ Notebook | Description
[multiagent_authoritarian.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_authoritarian.ipynb) | Implement a multi-agent simulation where a privileged agent controls the conversation, including deciding who speaks and when the conversation ends, in the context of a simulated news network.
[multiagent_bidding.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_bidding.ipynb) | Implement a multi-agent simulation where agents bid to speak, with the highest bidder speaking next, demonstrated through a fictitious presidential debate example.
[myscale_vector_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/myscale_vector_sql.ipynb) | Access and interact with the myscale integrated vector database, which can enhance the performance of language model (llm) applications.
[openai_functions_retrieval_qa....](https://github.com/langchain-ai/langchain/tree/master/cookbook/openai_functions_retrieval_qa.ipynb) | Structure response output in a question-answering system by incorporating openai functions into a retrieval pipeline.
[openai_v1_cookbook.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/openai_v1_cookbook.ipynb) | Explore new functionality released alongside the V1 release of the OpenAI Python library.
[openai_functions_retrieval_qa....](https://github.com/langchain-ai/langchain/tree/master/cookbook/openai_functions_retrieval_qa.ipynb) | Structure response output in a question answering system by incorporating openai functions into a retrieval pipeline.
[petting_zoo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/petting_zoo.ipynb) | Create multi-agent simulations with simulated environments using the petting zoo library.
[plan_and_execute_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/plan_and_execute_agent.ipynb) | Create plan-and-execute agents that accomplish objectives by planning tasks with a language model (llm) and executing them with a separate agent.
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented-generation (rag) on a PostgreSQL database using pgvector.
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware ai sales agent, salesgpt, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
[smart_llm.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/smart_llm.ipynb) | Implement a smartllmchain, a self-critique chain that generates multiple output proposals, critiques them to find the best one, and then improves upon it to produce a final output.
[tree_of_thought.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/tree_of_thought.ipynb) | Query a large language model using the tree of thought technique.
[twitter-the-algorithm-analysis...](https://github.com/langchain-ai/langchain/tree/master/cookbook/twitter-the-algorithm-analysis-deeplake.ipynb) | Analyze the source code of the Twitter algorithm with the help of gpt4 and activeloop's deep lake.
[twitter-the-algorithm-analysis...](https://github.com/langchain-ai/langchain/tree/master/cookbook/twitter-the-algorithm-analysis-deeplake.ipynb) | Analyze the source code of the twitter algorithm with the help of gpt4 and activeloop's deep lake.
[two_agent_debate_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_agent_debate_tools.ipynb) | Simulate multi-agent dialogues where the agents can utilize various tools.
[two_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_player_dnd.ipynb) | Simulate a two-player dungeons & dragons game, where a dialogue simulator class is used to coordinate the dialogue between the protagonist and the dungeon master.
[two_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_player_dnd.ipynb) | Simulate a two-player dungeons & dragons game, where a dialoguesimulator class is used to coordinate the dialogue between the protagonist and the dungeon master.
[wikibase_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/wikibase_agent.ipynb) | Create a simple wikibase agent that utilizes sparql generation, with testing done on http://wikidata.org.

View File

@@ -60,7 +60,7 @@
"metadata": {},
"outputs": [],
"source": [
"! brew install tesseract\n",
"! brew install tesseract \n",
"! brew install poppler"
]
},
@@ -102,29 +102,27 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Any\n",
"\n",
"from lxml import html\n",
"from pydantic import BaseModel\n",
"from typing import Any, Optional\n",
"from unstructured.partition.pdf import partition_pdf\n",
"\n",
"# Get elements\n",
"raw_pdf_elements = partition_pdf(\n",
" filename=path + \"LLaMA2.pdf\",\n",
" # Unstructured first finds embedded image blocks\n",
" extract_images_in_pdf=False,\n",
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document\n",
" infer_table_structure=True,\n",
" # Post processing to aggregate text once we have the title\n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars\n",
" max_characters=4000,\n",
" new_after_n_chars=3800,\n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path,\n",
")"
"raw_pdf_elements = partition_pdf(filename=path+\"LLaMA2.pdf\",\n",
" # Unstructured first finds embedded image blocks\n",
" extract_images_in_pdf=False,\n",
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document \n",
" infer_table_structure=True, \n",
" # Post processing to aggregate text once we have the title \n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars \n",
" max_characters=4000, \n",
" new_after_n_chars=3800, \n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path)"
]
},
{
@@ -192,7 +190,6 @@
" type: str\n",
" text: Any\n",
"\n",
"\n",
"# Categorize by type\n",
"categorized_elements = []\n",
"for element in raw_pdf_elements:\n",
@@ -262,14 +259,14 @@
"metadata": {},
"outputs": [],
"source": [
"# Prompt\n",
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
"# Prompt \n",
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
"prompt = ChatPromptTemplate.from_template(prompt_text)\n",
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
"\n",
"# Summary chain\n",
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
"summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()"
"# Summary chain \n",
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
]
},
{
@@ -317,15 +314,17 @@
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.schema.document import Document\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
"vectorstore = Chroma(\n",
" collection_name=\"summaries\",\n",
" embedding_function=OpenAIEmbeddings()\n",
")\n",
"\n",
"# The storage layer for the parent documents\n",
"store = InMemoryStore()\n",
@@ -333,26 +332,20 @@
"\n",
"# The retriever (empty to start)\n",
"retriever = MultiVectorRetriever(\n",
" vectorstore=vectorstore,\n",
" docstore=store,\n",
" vectorstore=vectorstore, \n",
" docstore=store, \n",
" id_key=id_key,\n",
")\n",
"\n",
"# Add texts\n",
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
"summary_texts = [\n",
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
" for i, s in enumerate(text_summaries)\n",
"]\n",
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
"retriever.vectorstore.add_documents(summary_texts)\n",
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
"\n",
"# Add tables\n",
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
"summary_tables = [\n",
" Document(page_content=s, metadata={id_key: table_ids[i]})\n",
" for i, s in enumerate(table_summaries)\n",
"]\n",
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
"retriever.vectorstore.add_documents(summary_tables)\n",
"retriever.docstore.mset(list(zip(table_ids, tables)))"
]
@@ -374,6 +367,7 @@
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
@@ -384,13 +378,13 @@
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"# LLM\n",
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
"\n",
"# RAG pipeline\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | model\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | model \n",
" | StrOutputParser()\n",
")"
]
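The visible hunks for this notebook end at the pipeline definition: summaries are embedded for similarity search while the docstore keeps the full originals keyed by `doc_id`. A short, hypothetical invocation of the assembled chain (the question string is made up for illustration):

```python
# Illustrative only: `chain` is the RAG pipeline defined in the cell above,
# so retrieval returns full text/table chunks, not just their summaries.
answer = chain.invoke("How many training tokens were used for LLaMA-2?")
print(answer)
```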

View File

@@ -92,30 +92,28 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Any\n",
"\n",
"from lxml import html\n",
"from pydantic import BaseModel\n",
"from typing import Any, Optional\n",
"from unstructured.partition.pdf import partition_pdf\n",
"\n",
"# Get elements\n",
"raw_pdf_elements = partition_pdf(\n",
" filename=path + \"LLaVA.pdf\",\n",
" # Using pdf format to find embedded image blocks\n",
" extract_images_in_pdf=True,\n",
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document\n",
" infer_table_structure=True,\n",
" # Post processing to aggregate text once we have the title\n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars\n",
" # Hard max on chunks\n",
" max_characters=4000,\n",
" new_after_n_chars=3800,\n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path,\n",
")"
"raw_pdf_elements = partition_pdf(filename=path+\"LLaVA.pdf\",\n",
" # Using pdf format to find embedded image blocks\n",
" extract_images_in_pdf=True,\n",
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document \n",
" infer_table_structure=True, \n",
" # Post processing to aggregate text once we have the title \n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars \n",
" # Hard max on chunks\n",
" max_characters=4000, \n",
" new_after_n_chars=3800, \n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path)"
]
},
{
@@ -172,7 +170,6 @@
" type: str\n",
" text: Any\n",
"\n",
"\n",
"# Categorize by type\n",
"categorized_elements = []\n",
"for element in raw_pdf_elements:\n",
@@ -223,14 +220,14 @@
"metadata": {},
"outputs": [],
"source": [
"# Prompt\n",
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\\n",
"# Prompt \n",
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
"prompt = ChatPromptTemplate.from_template(prompt_text)\n",
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
"\n",
"# Summary chain\n",
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
"summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()"
"# Summary chain \n",
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
]
},
{
@@ -313,7 +310,7 @@
" # Execute the command and save the output to the defined output file\n",
" /Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
"\n",
"done\n"
"done"
]
},
{
@@ -337,8 +334,7 @@
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import os\n",
"import os, glob\n",
"\n",
"# Get all .txt file summaries\n",
"file_paths = glob.glob(os.path.expanduser(os.path.join(path, \"*.txt\")))\n",
@@ -346,11 +342,11 @@
"# Read each file and store its content in a list\n",
"img_summaries = []\n",
"for file_path in file_paths:\n",
" with open(file_path, \"r\") as file:\n",
" with open(file_path, 'r') as file:\n",
" img_summaries.append(file.read())\n",
"\n",
"# Remove any logging prior to summary\n",
"logging_header = \"clip_model_load: total allocated memory: 201.27 MB\\n\\n\"\n",
"logging_header=\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\"\n",
"cleaned_img_summary = [s.split(logging_header, 1)[1].strip() for s in img_summaries]"
]
},
@@ -372,15 +368,17 @@
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.schema.document import Document\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
"vectorstore = Chroma(\n",
" collection_name=\"summaries\",\n",
" embedding_function=OpenAIEmbeddings()\n",
")\n",
"\n",
"# The storage layer for the parent documents\n",
"store = InMemoryStore()\n",
@@ -388,26 +386,20 @@
"\n",
"# The retriever (empty to start)\n",
"retriever = MultiVectorRetriever(\n",
" vectorstore=vectorstore,\n",
" docstore=store,\n",
" vectorstore=vectorstore, \n",
" docstore=store, \n",
" id_key=id_key,\n",
")\n",
"\n",
"# Add texts\n",
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
"summary_texts = [\n",
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
" for i, s in enumerate(text_summaries)\n",
"]\n",
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
"retriever.vectorstore.add_documents(summary_texts)\n",
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
"\n",
"# Add tables\n",
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
"summary_tables = [\n",
" Document(page_content=s, metadata={id_key: table_ids[i]})\n",
" for i, s in enumerate(table_summaries)\n",
"]\n",
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
"retriever.vectorstore.add_documents(summary_tables)\n",
"retriever.docstore.mset(list(zip(table_ids, tables)))"
]
@@ -431,12 +423,9 @@
"source": [
"# Add image summaries\n",
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
"summary_img = [\n",
" Document(page_content=s, metadata={id_key: img_ids[i]})\n",
" for i, s in enumerate(cleaned_img_summary)\n",
"]\n",
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
"retriever.vectorstore.add_documents(summary_img)\n",
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary)))"
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary))) "
]
},
{
@@ -460,19 +449,10 @@
"source": [
"# Add images\n",
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
"summary_img = [\n",
" Document(page_content=s, metadata={id_key: img_ids[i]})\n",
" for i, s in enumerate(cleaned_img_summary)\n",
"]\n",
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
"retriever.vectorstore.add_documents(summary_img)\n",
"### Fetch images\n",
"retriever.docstore.mset(\n",
" list(\n",
" zip(\n",
" img_ids,\n",
" )\n",
" )\n",
")"
"retriever.docstore.mset(list(zip(img_ids, ### image ### ))) "
]
},
{
@@ -562,9 +542,7 @@
],
"source": [
"# We can retrieve this table\n",
"retriever.get_relevant_documents(\n",
" \"What are results for LLaMA across across domains / subjects?\"\n",
")[1]"
"retriever.get_relevant_documents(\"What are results for LLaMA across across domains / subjects?\")[1]"
]
},
{
@@ -614,9 +592,7 @@
}
],
"source": [
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[\n",
" 1\n",
"]"
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[1]"
]
},
{
@@ -646,6 +622,7 @@
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
@@ -656,15 +633,15 @@
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"# Option 1: LLM\n",
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
"# Option 2: Multi-modal LLM\n",
"# model = GPT4-V or LLaVA\n",
"\n",
"# RAG pipeline\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | model\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | model \n",
" | StrOutputParser()\n",
")"
]
@@ -687,9 +664,7 @@
}
],
"source": [
"chain.invoke(\n",
" \"What is the performance of LLaVa across across multiple image domains / subjects?\"\n",
")"
"chain.invoke(\"What is the performance of LLaVa across across multiple image domains / subjects?\")"
]
},
{
@@ -738,7 +713,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.16"
}
},
"nbformat": 4,

View File

@@ -82,33 +82,32 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Any\n",
"\n",
"import pandas as pd\n",
"from lxml import html\n",
"from pydantic import BaseModel\n",
"from typing import Any, Optional\n",
"from unstructured.partition.pdf import partition_pdf\n",
"\n",
"# Path to save images\n",
"path = \"/Users/rlm/Desktop/Papers/LLaVA/\"\n",
"\n",
"# Get elements\n",
"raw_pdf_elements = partition_pdf(\n",
" filename=path + \"LLaVA.pdf\",\n",
" # Using pdf format to find embedded image blocks\n",
" extract_images_in_pdf=True,\n",
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document\n",
" infer_table_structure=True,\n",
" # Post processing to aggregate text once we have the title\n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars\n",
" # Hard max on chunks\n",
" max_characters=4000,\n",
" new_after_n_chars=3800,\n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path,\n",
")"
"raw_pdf_elements = partition_pdf(filename=path+\"LLaVA.pdf\",\n",
" # Using pdf format to find embedded image blocks\n",
" extract_images_in_pdf=True,\n",
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document \n",
" infer_table_structure=True, \n",
" # Post processing to aggregate text once we have the title \n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars \n",
" # Hard max on chunks\n",
" max_characters=4000, \n",
" new_after_n_chars=3800, \n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path)"
]
},
{
@@ -166,7 +165,6 @@
" type: str\n",
" text: Any\n",
"\n",
"\n",
"# Categorize by type\n",
"categorized_elements = []\n",
"for element in raw_pdf_elements:\n",
@@ -221,14 +219,14 @@
"metadata": {},
"outputs": [],
"source": [
"# Prompt\n",
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\\n",
"# Prompt \n",
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
"prompt = ChatPromptTemplate.from_template(prompt_text)\n",
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
"\n",
"# Summary chain\n",
"# Summary chain \n",
"model = ChatOllama(model=\"llama2:13b-chat\")\n",
"summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()"
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
]
},
{
@@ -311,7 +309,7 @@
" # Execute the command and save the output to the defined output file\n",
" /Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
"\n",
"done\n"
"done"
]
},
{
@@ -321,8 +319,7 @@
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import os\n",
"import os, glob\n",
"\n",
"# Get all .txt files in the directory\n",
"file_paths = glob.glob(os.path.expanduser(os.path.join(path, \"*.txt\")))\n",
@@ -330,14 +327,11 @@
"# Read each file and store its content in a list\n",
"img_summaries = []\n",
"for file_path in file_paths:\n",
" with open(file_path, \"r\") as file:\n",
" with open(file_path, 'r') as file:\n",
" img_summaries.append(file.read())\n",
"\n",
"# Clean up residual logging\n",
"cleaned_img_summary = [\n",
" s.split(\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\", 1)[1].strip()\n",
" for s in img_summaries\n",
"]"
"cleaned_img_summary = [s.split(\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\", 1)[1].strip() for s in img_summaries]"
]
},
{
@@ -375,26 +369,26 @@
],
"source": [
"import uuid\n",
"\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.schema.document import Document\n",
"from langchain.embeddings import GPT4AllEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(\n",
" collection_name=\"summaries\", embedding_function=GPT4AllEmbeddings()\n",
" collection_name=\"summaries\",\n",
" embedding_function=GPT4AllEmbeddings()\n",
")\n",
"\n",
"# The storage layer for the parent documents\n",
"store = InMemoryStore() # <- Can we extend this to images\n",
"store = InMemoryStore() # <- Can we extend this to images \n",
"id_key = \"doc_id\"\n",
"\n",
"# The retriever (empty to start)\n",
"retriever = MultiVectorRetriever(\n",
" vectorstore=vectorstore,\n",
" docstore=store,\n",
" vectorstore=vectorstore, \n",
" docstore=store, \n",
" id_key=id_key,\n",
")"
]
@@ -418,32 +412,21 @@
"source": [
"# Add texts\n",
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
"summary_texts = [\n",
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
" for i, s in enumerate(text_summaries)\n",
"]\n",
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
"retriever.vectorstore.add_documents(summary_texts)\n",
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
"\n",
"# Add tables\n",
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
"summary_tables = [\n",
" Document(page_content=s, metadata={id_key: table_ids[i]})\n",
" for i, s in enumerate(table_summaries)\n",
"]\n",
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
"retriever.vectorstore.add_documents(summary_tables)\n",
"retriever.docstore.mset(list(zip(table_ids, tables)))\n",
"\n",
"# Add images\n",
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
"summary_img = [\n",
" Document(page_content=s, metadata={id_key: img_ids[i]})\n",
" for i, s in enumerate(cleaned_img_summary)\n",
"]\n",
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
"retriever.vectorstore.add_documents(summary_img)\n",
"retriever.docstore.mset(\n",
" list(zip(img_ids, cleaned_img_summary))\n",
") # Store the image summary as the raw document"
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary))) # Store the image summary as the raw document"
]
},
{
@@ -501,9 +484,7 @@
}
],
"source": [
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[\n",
" 0\n",
"]"
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[0]"
]
},
{
@@ -532,6 +513,7 @@
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
@@ -548,9 +530,9 @@
"\n",
"# RAG pipeline\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | model\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | model \n",
" | StrOutputParser()\n",
")"
]
@@ -573,9 +555,7 @@
}
],
"source": [
"chain.invoke(\n",
" \"What is the performance of LLaVa across across multiple image domains / subjects?\"\n",
")"
"chain.invoke(\"What is the performance of LLaVa across across multiple image domains / subjects?\")"
]
},
{
@@ -604,9 +584,7 @@
}
],
"source": [
"chain.invoke(\n",
" \"Explain any images / figures in the paper with playful and creative examples.\"\n",
")"
"chain.invoke(\"Explain any images / figures in the paper with playful and creative examples.\")"
]
},
{

File diff suppressed because one or more lines are too long

View File

@@ -1,105 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f69d4a4c-137d-47e9-bea1-786afce9c1c0",
"metadata": {},
"source": [
"# Analyze a single long document\n",
"\n",
"The AnalyzeDocumentChain takes in a single document, splits it up, and then runs it through a CombineDocumentsChain."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2a0707ce-6d2d-471b-bc33-64da32a7b3f0",
"metadata": {},
"outputs": [],
"source": [
"with open(\"../docs/docs/modules/state_of_the_union.txt\") as f:\n",
" state_of_the_union = f.read()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ca14d161-2d5b-4a6c-a296-77d8ce4b28cd",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import AnalyzeDocumentChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "9f97406c-85a9-45fb-99ce-9138c0ba3731",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"qa_chain = load_qa_chain(llm, chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0871a753-f5bb-4b4f-a394-f87f2691f659",
"metadata": {},
"outputs": [],
"source": [
"qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e6f86428-3c2c-46a0-a57c-e22826fdbf91",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The President said, \"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\"'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa_document_chain.run(\n",
" input_document=state_of_the_union,\n",
" question=\"what did the president say about justice breyer?\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -27,10 +27,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import Tool\n",
"from langchain.tools.file_management.read import ReadFileTool\n",
"from langchain.tools.file_management.write import WriteFileTool\n",
"from langchain.utilities import SerpAPIWrapper\n",
"from langchain.agents import Tool\n",
"from langchain.tools.file_management.write import WriteFileTool\n",
"from langchain.tools.file_management.read import ReadFileTool\n",
"\n",
"search = SerpAPIWrapper()\n",
"tools = [\n",
@@ -61,9 +61,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import FAISS"
"from langchain.embeddings import OpenAIEmbeddings"
]
},
{
@@ -100,8 +100,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain_experimental.autonomous_agents import AutoGPT"
"from langchain_experimental.autonomous_agents import AutoGPT\n",
"from langchain.chat_models import ChatOpenAI"
]
},
{

View File

@@ -34,15 +34,16 @@
"outputs": [],
"source": [
"# General\n",
"import asyncio\n",
"import os\n",
"\n",
"import nest_asyncio\n",
"import pandas as pd\n",
"from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.docstore.document import Document\n",
"from langchain_experimental.autonomous_agents import AutoGPT\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent\n",
"from langchain.docstore.document import Document\n",
"import asyncio\n",
"import nest_asyncio\n",
"\n",
"\n",
"# Needed synce jupyter runs an async eventloop\n",
"nest_asyncio.apply()"
@@ -91,7 +92,6 @@
"import os\n",
"from contextlib import contextmanager\n",
"from typing import Optional\n",
"\n",
"from langchain.agents import tool\n",
"from langchain.tools.file_management.read import ReadFileTool\n",
"from langchain.tools.file_management.write import WriteFileTool\n",
@@ -223,13 +223,14 @@
},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources.loading import (\n",
" BaseCombineDocumentsChain,\n",
" load_qa_with_sources_chain,\n",
")\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.tools import BaseTool, DuckDuckGoSearchRun\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"from pydantic import Field\n",
"from langchain.chains.qa_with_sources.loading import (\n",
" load_qa_with_sources_chain,\n",
" BaseCombineDocumentsChain,\n",
")\n",
"\n",
"\n",
"def _get_text_splitter():\n",
@@ -310,9 +311,10 @@
"source": [
"# Memory\n",
"import faiss\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.tools.human.tool import HumanInputRun\n",
"\n",
"embeddings_model = OpenAIEmbeddings()\n",
"embedding_size = 1536\n",

View File

@@ -29,10 +29,16 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"import os\n",
"from collections import deque\n",
"from typing import Dict, List, Optional, Any\n",
"\n",
"from langchain.chains import LLMChain\nfrom langchain.llms import OpenAI\nfrom langchain.prompts import PromptTemplate\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.llms import BaseLLM\n",
"from langchain.schema.vectorstore import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain\n",
"from langchain_experimental.autonomous_agents import BabyAGI"
]
},
@@ -53,8 +59,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore import InMemoryDocstore\n",
"from langchain.vectorstores import FAISS"
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore"
]
},
{

View File

@@ -25,12 +25,16 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"import os\n",
"from collections import deque\n",
"from typing import Dict, List, Optional, Any\n",
"\n",
"from langchain.chains import LLMChain\n",
"from langchain.chains import LLMChain\nfrom langchain.llms import OpenAI\nfrom langchain.prompts import PromptTemplate\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.llms import BaseLLM\n",
"from langchain.schema.vectorstore import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain\n",
"from langchain_experimental.autonomous_agents import BabyAGI"
]
},
@@ -62,8 +66,8 @@
"source": [
"%pip install faiss-cpu > /dev/null\n",
"%pip install google-search-results > /dev/null\n",
"from langchain.docstore import InMemoryDocstore\n",
"from langchain.vectorstores import FAISS"
"from langchain.vectorstores import FAISS\n",
"from langchain.docstore import InMemoryDocstore"
]
},
{
@@ -106,10 +110,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import AgentExecutor, Tool, ZeroShotAgent\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.utilities import SerpAPIWrapper\n",
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
"from langchain.llms import OpenAI\nfrom langchain.utilities import SerpAPIWrapper\nfrom langchain.chains import LLMChain\n",
"\n",
"todo_prompt = PromptTemplate.from_template(\n",
" \"You are a planner who is an expert at coming up with a todo list for a given objective. Come up with a todo list for this objective: {objective}\"\n",

View File

@@ -35,17 +35,16 @@
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts.chat import (\n",
" HumanMessagePromptTemplate,\n",
" SystemMessagePromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
")\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" BaseMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" BaseMessage,\n",
")"
]
},

View File

@@ -47,9 +47,10 @@
"outputs": [],
"source": [
"from IPython.display import SVG\n",
"from langchain.llms import OpenAI\n",
"\n",
"from langchain_experimental.cpal.base import CPALChain\n",
"from langchain_experimental.pal_chain import PALChain\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0, max_tokens=512)\n",
"cpal_chain = CPALChain.from_univariate_prompt(llm=llm, verbose=True)\n",

View File

@@ -177,7 +177,7 @@
" try:\n",
" loader = TextLoader(os.path.join(dirpath, file), encoding=\"utf-8\")\n",
" docs.extend(loader.load_and_split())\n",
" except Exception:\n",
" except Exception as e:\n",
" pass\n",
"print(f\"{len(docs)}\")"
]
@@ -648,7 +648,7 @@
{
"data": {
"text/plain": [
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={})"
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='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', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={})"
]
},
"execution_count": 13,
@@ -717,6 +717,7 @@
"source": [
"from langchain.vectorstores import DeepLake\n",
"\n",
"\n",
"username = \"<USERNAME_OR_ORG>\"\n",
"\n",
"\n",
@@ -833,12 +834,10 @@
},
"outputs": [],
"source": [
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"model = ChatOpenAI(\n",
" model_name=\"gpt-3.5-turbo-0613\"\n",
") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
]
},

View File

@@ -32,20 +32,19 @@
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"from typing import Union\n",
"\n",
"from langchain.agents import (\n",
" Tool,\n",
" AgentExecutor,\n",
" AgentOutputParser,\n",
" LLMSingleActionAgent,\n",
" AgentOutputParser,\n",
")\n",
"from langchain.agents.agent_toolkits import NLAToolkit\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import StringPromptTemplate\n",
"from langchain.llms import OpenAI\nfrom langchain.utilities import SerpAPIWrapper\nfrom langchain.chains import LLMChain\n",
"from typing import List, Union\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"from langchain.tools.plugin import AIPlugin"
"from langchain.agents.agent_toolkits import NLAToolkit\n",
"from langchain.tools.plugin import AIPlugin\n",
"import re"
]
},
{
@@ -114,9 +113,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.schema import Document\n",
"from langchain.vectorstores import FAISS"
"from langchain.schema import Document"
]
},
{

View File

@@ -56,21 +56,20 @@
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"from typing import Union\n",
"\n",
"import plugnplai\n",
"from langchain.agents import (\n",
" Tool,\n",
" AgentExecutor,\n",
" AgentOutputParser,\n",
" LLMSingleActionAgent,\n",
" AgentOutputParser,\n",
")\n",
"from langchain.agents.agent_toolkits import NLAToolkit\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import StringPromptTemplate\n",
"from langchain.llms import OpenAI\nfrom langchain.utilities import SerpAPIWrapper\nfrom langchain.chains import LLMChain\n",
"from typing import List, Union\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"from langchain.tools.plugin import AIPlugin"
"from langchain.agents.agent_toolkits import NLAToolkit\n",
"from langchain.tools.plugin import AIPlugin\n",
"import re\n",
"import plugnplai"
]
},
{
@@ -138,9 +137,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.schema import Document\n",
"from langchain.vectorstores import FAISS"
"from langchain.schema import Document"
]
},
{

View File

@@ -48,17 +48,18 @@
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"from langchain.chains import RetrievalQA\n",
"import getpass\n",
"from langchain.document_loaders import PyPDFLoader, TextLoader\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.text_splitter import (\n",
" CharacterTextSplitter,\n",
" RecursiveCharacterTextSplitter,\n",
" CharacterTextSplitter,\n",
")\n",
"from langchain.vectorstores import DeepLake\n",
"from langchain.chains import ConversationalRetrievalChain, RetrievalQA\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
"activeloop_token = getpass.getpass(\"Activeloop Token:\")\n",

File diff suppressed because one or more lines are too long

View File

@@ -38,8 +38,9 @@
"outputs": [],
"source": [
"from elasticsearch import Elasticsearch\n",
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n",
"from langchain.chat_models import ChatOpenAI"
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain"
]
},
{
@@ -111,6 +112,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE\n",
"from langchain.prompts.prompt import PromptTemplate\n",
"\n",
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",

View File

@@ -1,214 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2def22ea",
"metadata": {},
"source": [
"# Extraction with OpenAI Tools\n",
"\n",
"Performing extraction has never been easier! OpenAI's tool calling ability is the perfect thing to use as it allows for extracting multiple different elements from text that are different types. \n",
"\n",
"Models after 1106 use tools and support \"parallel function calling\" which makes this super easy."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5c628496",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"from langchain.chains.openai_tools import create_extraction_chain_pydantic\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.pydantic_v1 import BaseModel"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "afe9657b",
"metadata": {},
"outputs": [],
"source": [
"# Make sure to use a recent model that supports tools\n",
"model = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bc0ca3b6",
"metadata": {},
"outputs": [],
"source": [
"# Pydantic is an easy way to define a schema\n",
"class Person(BaseModel):\n",
" \"\"\"Information about people to extract.\"\"\"\n",
"\n",
" name: str\n",
" age: Optional[int] = None"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2036af68",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain_pydantic(Person, model)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1748ad21",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Person(name='jane', age=2), Person(name='bob', age=3)]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"input\": \"jane is 2 and bob is 3\"})"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c8262ce5",
"metadata": {},
"outputs": [],
"source": [
"# Let's define another element\n",
"class Class(BaseModel):\n",
" \"\"\"Information about classes to extract.\"\"\"\n",
"\n",
" teacher: str\n",
" students: List[str]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "4973c104",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain_pydantic([Person, Class], model)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e976a15e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Person(name='jane', age=2),\n",
" Person(name='bob', age=3),\n",
" Class(teacher='Mrs Sampson', students=['jane', 'bob'])]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"input\": \"jane is 2 and bob is 3 and they are in Mrs Sampson's class\"})"
]
},
{
"cell_type": "markdown",
"id": "6575a7d6",
"metadata": {},
"source": [
"## Under the hood\n",
"\n",
"Under the hood, this is a simple chain:"
]
},
{
"cell_type": "markdown",
"id": "b8ba83e5",
"metadata": {},
"source": [
"```python\n",
"from typing import Union, List, Type, Optional\n",
"\n",
"from langchain.output_parsers.openai_tools import PydanticToolsParser\n",
"from langchain.utils.openai_functions import convert_pydantic_to_openai_tool\n",
"from langchain.schema.runnable import Runnable\n",
"from langchain.pydantic_v1 import BaseModel\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.messages import SystemMessage\n",
"from langchain.schema.language_model import BaseLanguageModel\n",
"\n",
"_EXTRACTION_TEMPLATE = \"\"\"Extract and save the relevant entities mentioned \\\n",
"in the following passage together with their properties.\n",
"\n",
"If a property is not present and is not required in the function parameters, do not include it in the output.\"\"\" # noqa: E501\n",
"\n",
"\n",
"def create_extraction_chain_pydantic(\n",
" pydantic_schemas: Union[List[Type[BaseModel]], Type[BaseModel]],\n",
" llm: BaseLanguageModel,\n",
" system_message: str = _EXTRACTION_TEMPLATE,\n",
") -> Runnable:\n",
" if not isinstance(pydantic_schemas, list):\n",
" pydantic_schemas = [pydantic_schemas]\n",
" prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", system_message),\n",
" (\"user\", \"{input}\")\n",
" ])\n",
" tools = [convert_pydantic_to_openai_tool(p) for p in pydantic_schemas]\n",
" model = llm.bind(tools=tools)\n",
" chain = prompt | model | PydanticToolsParser(tools=pydantic_schemas)\n",
" return chain\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2eac6b68",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -56,8 +56,7 @@
"source": [
"import os\n",
"\n",
"os.environ[\"SERPER_API_KEY\"] = \"\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"\""
"os.environ[\"SERPER_API_KEY\"] = \"\"os.environ[\"OPENAI_API_KEY\"] = \"\""
]
},
{
@@ -67,16 +66,21 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Any, List\n",
"import re\n",
"\n",
"import numpy as np\n",
"\n",
"from langchain.schema import BaseRetriever\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForRetrieverRun,\n",
" CallbackManagerForRetrieverRun,\n",
")\n",
"from langchain.utilities import GoogleSerperAPIWrapper\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI\n",
"from langchain.schema import BaseRetriever, Document\n",
"from langchain.utilities import GoogleSerperAPIWrapper"
"from langchain.schema import Document\n",
"from typing import Any, List"
]
},
{

View File

@@ -46,13 +46,14 @@
"source": [
"from datetime import datetime, timedelta\n",
"from typing import List\n",
"from termcolor import colored\n",
"\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.docstore import InMemoryDocstore\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers import TimeWeightedVectorStoreRetriever\n",
"from langchain.vectorstores import FAISS\n",
"from termcolor import colored"
"from langchain.vectorstores import FAISS"
]
},
{
@@ -152,7 +153,6 @@
"outputs": [],
"source": [
"import math\n",
"\n",
"import faiss\n",
"\n",
"\n",

View File

@@ -27,12 +27,18 @@
"metadata": {},
"outputs": [],
"source": [
"import gymnasium as gym\n",
"import inspect\n",
"import tenacity\n",
"from langchain.output_parsers import RegexParser\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
")"
" BaseMessage,\n",
")\n",
"from langchain.output_parsers import RegexParser"
]
},
{
@@ -125,7 +131,7 @@
" ):\n",
" with attempt:\n",
" action = self._act()\n",
" except tenacity.RetryError:\n",
" except tenacity.RetryError as e:\n",
" action = self.random_action()\n",
" return action"
]

View File

@@ -77,7 +77,6 @@
"source": [
"from langchain.llms import OpenAI\n",
"from langchain_experimental.autonomous_agents import HuggingGPT\n",
"\n",
"# %env OPENAI_API_BASE=http://localhost:8000/v1"
]
},

View File

@@ -20,9 +20,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import HypotheticalDocumentEmbedder, LLMChain\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.chains import LLMChain, HypotheticalDocumentEmbedder\n",
"from langchain.prompts import PromptTemplate"
]
},

View File

@@ -50,7 +50,6 @@
"# pick and configure the LLM of your choice\n",
"\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(model=\"text-davinci-003\")"
]
},
@@ -86,8 +85,8 @@
"\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"meal\", \"text_to_personalize\", \"user\", \"preference\"],\n",
" template=PROMPT_TEMPLATE,\n",
" input_variables=[\"meal\", \"text_to_personalize\", \"user\", \"preference\"], \n",
" template=PROMPT_TEMPLATE\n",
")"
]
},
@@ -106,7 +105,7 @@
"source": [
"import langchain_experimental.rl_chain as rl_chain\n",
"\n",
"chain = rl_chain.PickBest.from_llm(llm=llm, prompt=PROMPT)"
"chain = rl_chain.PickBest.from_llm(llm=llm, prompt=PROMPT)\n"
]
},
{
@@ -123,10 +122,10 @@
"outputs": [],
"source": [
"response = chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs \\\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs \\\n",
" believe you will love it!\",\n",
")"
]
@@ -194,10 +193,10 @@
"for _ in range(5):\n",
" try:\n",
" response = chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" )\n",
" except Exception as e:\n",
" print(e)\n",
@@ -224,16 +223,12 @@
"metadata": {},
"outputs": [],
"source": [
"scoring_criteria_template = (\n",
" \"Given {preference} rank how good or bad this selection is {meal}\"\n",
")\n",
"scoring_criteria_template = \"Given {preference} rank how good or bad this selection is {meal}\"\n",
"\n",
"chain = rl_chain.PickBest.from_llm(\n",
" llm=llm,\n",
" prompt=PROMPT,\n",
" selection_scorer=rl_chain.AutoSelectionScorer(\n",
" llm=llm, scoring_criteria_template_str=scoring_criteria_template\n",
" ),\n",
" selection_scorer=rl_chain.AutoSelectionScorer(llm=llm, scoring_criteria_template_str=scoring_criteria_template),\n",
")"
]
},
@@ -260,16 +255,14 @@
],
"source": [
"response = chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
")\n",
"print(response[\"response\"])\n",
"selection_metadata = response[\"selection_metadata\"]\n",
"print(\n",
" f\"selected index: {selection_metadata.selected.index}, score: {selection_metadata.selected.score}\"\n",
")"
"print(f\"selected index: {selection_metadata.selected.index}, score: {selection_metadata.selected.score}\")"
]
},
{
@@ -287,8 +280,8 @@
"source": [
"class CustomSelectionScorer(rl_chain.SelectionScorer):\n",
" def score_response(\n",
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent\n",
" ) -> float:\n",
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent) -> float:\n",
"\n",
" print(event.based_on)\n",
" print(event.to_select_from)\n",
"\n",
@@ -343,10 +336,10 @@
],
"source": [
"response = chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
")"
]
},
@@ -377,10 +370,9 @@
" return 1.0\n",
" else:\n",
" return 0.0\n",
"\n",
" def score_response(\n",
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent\n",
" ) -> float:\n",
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent) -> float:\n",
"\n",
" selected_meal = event.to_select_from[\"meal\"][event.selected.index]\n",
"\n",
" if \"Tom\" in event.based_on[\"user\"]:\n",
@@ -402,7 +394,7 @@
" prompt=PROMPT,\n",
" selection_scorer=CustomSelectionScorer(),\n",
" metrics_step=5,\n",
" metrics_window_size=5, # rolling window average\n",
" metrics_window_size=5, # rolling window average\n",
")\n",
"\n",
"random_chain = rl_chain.PickBest.from_llm(\n",
@@ -410,8 +402,8 @@
" prompt=PROMPT,\n",
" selection_scorer=CustomSelectionScorer(),\n",
" metrics_step=5,\n",
" metrics_window_size=5, # rolling window average\n",
" policy=rl_chain.PickBestRandomPolicy, # set the random policy instead of default\n",
" metrics_window_size=5, # rolling window average\n",
" policy=rl_chain.PickBestRandomPolicy # set the random policy instead of default\n",
")"
]
},
@@ -424,29 +416,29 @@
"for _ in range(20):\n",
" try:\n",
" chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" )\n",
" random_chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" )\n",
"\n",
" \n",
" chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Anna\"),\n",
" preference=rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Anna\"),\n",
" preference = rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" )\n",
" random_chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Anna\"),\n",
" preference=rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Anna\"),\n",
" preference = rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" )\n",
" except Exception as e:\n",
" print(e)"
@@ -485,17 +477,12 @@
],
"source": [
"from matplotlib import pyplot as plt\n",
"\n",
"chain.metrics.to_pandas()[\"score\"].plot(label=\"default learning policy\")\n",
"random_chain.metrics.to_pandas()[\"score\"].plot(label=\"random selection policy\")\n",
"chain.metrics.to_pandas()['score'].plot(label=\"default learning policy\")\n",
"random_chain.metrics.to_pandas()['score'].plot(label=\"random selection policy\")\n",
"plt.legend()\n",
"\n",
"print(\n",
" f\"The final average score for the default policy, calculated over a rolling window, is: {chain.metrics.to_pandas()['score'].iloc[-1]}\"\n",
")\n",
"print(\n",
" f\"The final average score for the random policy, calculated over a rolling window, is: {random_chain.metrics.to_pandas()['score'].iloc[-1]}\"\n",
")"
"print(f\"The final average score for the default policy, calculated over a rolling window, is: {chain.metrics.to_pandas()['score'].iloc[-1]}\")\n",
"print(f\"The final average score for the random policy, calculated over a rolling window, is: {random_chain.metrics.to_pandas()['score'].iloc[-1]}\")"
]
},
{
@@ -790,8 +777,8 @@
}
],
"source": [
"from langchain.globals import set_debug\n",
"from langchain.prompts.prompt import PromptTemplate\n",
"from langchain.globals import set_debug\n",
"\n",
"set_debug(True)\n",
"\n",
@@ -816,10 +803,10 @@
")\n",
"\n",
"chain.run(\n",
" meal=rl_chain.ToSelectFrom(meals),\n",
" user=rl_chain.BasedOn(\"Tom\"),\n",
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
" meal = rl_chain.ToSelectFrom(meals),\n",
" user = rl_chain.BasedOn(\"Tom\"),\n",
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
")"
]
}

View File

@@ -43,8 +43,8 @@
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain_experimental.llm_bash.base import LLMBashChain\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"\n",
@@ -70,7 +70,7 @@
"outputs": [],
"source": [
"from langchain.prompts.prompt import PromptTemplate\n",
"from langchain_experimental.llm_bash.prompt import BashOutputParser\n",
"from langchain.chains.llm_bash.prompt import BashOutputParser\n",
"\n",
"_PROMPT_TEMPLATE = \"\"\"If someone asks you to perform a task, your job is to come up with a series of bash commands that will perform the task. There is no need to put \"#!/bin/bash\" in your answer. Make sure to reason step by step, using this format:\n",
"Question: \"copy the files in the directory named 'target' into a new directory at the same level as target called 'myNewDirectory'\"\n",
@@ -185,6 +185,7 @@
"source": [
"from langchain_experimental.llm_bash.bash import BashProcess\n",
"\n",
"\n",
"persistent_process = BashProcess(persistent=True)\n",
"bash_chain = LLMBashChain.from_llm(llm, bash_process=persistent_process, verbose=True)\n",
"\n",

View File

@@ -45,8 +45,7 @@
}
],
"source": [
"from langchain.chains import LLMMathChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.llms import OpenAI\nfrom langchain.chains import LLMMathChain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"llm_math = LLMMathChain.from_llm(llm, verbose=True)\n",

View File

@@ -56,10 +56,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.memory import ConversationBufferWindowMemory\n",
"from langchain.prompts import PromptTemplate"
"from langchain.llms import OpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n",
"from langchain.memory import ConversationBufferWindowMemory"
]
},
{
@@ -154,13 +152,13 @@
" for j in range(max_iters):\n",
" print(f\"(Step {j+1}/{max_iters})\")\n",
" print(f\"Assistant: {output}\")\n",
" print(\"Human: \")\n",
" print(f\"Human: \")\n",
" human_input = input()\n",
" if any(phrase in human_input.lower() for phrase in key_phrases):\n",
" break\n",
" output = chain.predict(human_input=human_input)\n",
" if success_phrase in human_input.lower():\n",
" print(\"You succeeded! Thanks for playing!\")\n",
" print(f\"You succeeded! Thanks for playing!\")\n",
" return\n",
" meta_chain = initialize_meta_chain()\n",
" meta_output = meta_chain.predict(chat_history=get_chat_history(chain.memory))\n",
@@ -168,7 +166,7 @@
" instructions = get_new_instructions(meta_output)\n",
" print(f\"New Instructions: {instructions}\")\n",
" print(\"\\n\" + \"#\" * 80 + \"\\n\")\n",
" print(\"You failed! Thanks for playing!\")"
" print(f\"You failed! Thanks for playing!\")"
]
},
{

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -29,10 +29,9 @@
"metadata": {},
"outputs": [],
"source": [
"from steamship import Block, Steamship\n",
"import re\n",
"\n",
"from IPython.display import Image\n",
"from steamship import Block, Steamship"
"from IPython.display import Image"
]
},
{
@@ -42,8 +41,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import AgentType, initialize_agent\n",
"from langchain.llms import OpenAI\n",
"from langchain.agents import initialize_agent\n",
"from langchain.agents import AgentType\n",
"from langchain.tools import SteamshipImageGenerationTool"
]
},

View File

@@ -26,12 +26,13 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Callable, List\n",
"\n",
"from typing import List, Dict, Callable\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" BaseMessage,\n",
")"
]
},

View File

@@ -27,20 +27,26 @@
"metadata": {},
"outputs": [],
"source": [
"from collections import OrderedDict\n",
"import functools\n",
"import random\n",
"from collections import OrderedDict\n",
"from typing import Callable, List\n",
"\n",
"import re\n",
"import tenacity\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.output_parsers import RegexParser\n",
"from typing import List, Dict, Callable\n",
"\n",
"from langchain.prompts import (\n",
" ChatPromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
" PromptTemplate,\n",
")\n",
"from langchain.chains import LLMChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.output_parsers import RegexParser\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" BaseMessage,\n",
")"
]
},

View File

@@ -24,15 +24,17 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Callable, List\n",
"\n",
"from langchain.prompts import PromptTemplate\n",
"import re\n",
"import tenacity\n",
"from typing import List, Dict, Callable\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.output_parsers import RegexParser\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" BaseMessage,\n",
")"
]
},

View File

@@ -27,17 +27,19 @@
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"\n",
"from os import environ\n",
"\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.utilities import SQLDatabase\n",
"import getpass\n",
"from typing import Dict, Any\n",
"from langchain.llms import OpenAI\nfrom langchain.utilities import SQLDatabase\nfrom langchain.chains import LLMChain\n",
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
"from sqlalchemy import MetaData, create_engine\n",
"from sqlalchemy import create_engine, Column, MetaData\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"MYSCALE_HOST = \"msc-4a9e710a.us-east-1.aws.staging.myscale.cloud\"\n",
"\n",
"from sqlalchemy import create_engine\n",
"\n",
"MYSCALE_HOST = \"msc-1decbcc9.us-east-1.aws.staging.myscale.cloud\"\n",
"MYSCALE_PORT = 443\n",
"MYSCALE_USER = \"chatdata\"\n",
"MYSCALE_PASSWORD = \"myscale_rocks\"\n",
@@ -74,8 +76,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks import StdOutCallbackHandler\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.callbacks import StdOutCallbackHandler\n",
"\n",
"from langchain.utilities.sql_database import SQLDatabase\n",
"from langchain_experimental.sql.prompt import MYSCALE_PROMPT\n",
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
@@ -116,16 +120,14 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain_experimental.retrievers.vector_sql_database import (\n",
" VectorSQLDatabaseChainRetriever,\n",
")\n",
"from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain\n",
"\n",
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
"from langchain_experimental.retrievers.vector_sql_database \\\n",
" import VectorSQLDatabaseChainRetriever\n",
"from langchain_experimental.sql.prompt import MYSCALE_PROMPT\n",
"from langchain_experimental.sql.vector_sql import (\n",
" VectorSQLDatabaseChain,\n",
" VectorSQLRetrieveAllOutputParser,\n",
")\n",
"from langchain_experimental.sql.vector_sql import VectorSQLRetrieveAllOutputParser\n",
"\n",
"output_parser_retrieve_all = VectorSQLRetrieveAllOutputParser.from_embeddings(\n",
" output_parser.model\n",
@@ -142,9 +144,7 @@
")\n",
"\n",
"# You need all those keys to get docs\n",
"retriever = VectorSQLDatabaseChainRetriever(\n",
" sql_db_chain=chain, page_content_key=\"abstract\"\n",
")\n",
"retriever = VectorSQLDatabaseChainRetriever(sql_db_chain=chain, page_content_key=\"abstract\")\n",
"\n",
"document_with_metadata_prompt = PromptTemplate(\n",
" input_variables=[\"page_content\", \"id\", \"title\", \"authors\", \"pubdate\", \"categories\"],\n",
@@ -162,10 +162,8 @@
" },\n",
" return_source_documents=True,\n",
")\n",
"ans = chain(\n",
" \"Please give me 10 papers to ask what is PageRank?\",\n",
" callbacks=[StdOutCallbackHandler()],\n",
")\n",
"ans = chain(\"Please give me 10 papers to ask what is PageRank?\",\n",
" callbacks=[StdOutCallbackHandler()])\n",
"print(ans[\"answer\"])"
]
},

View File

@@ -50,10 +50,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import create_qa_with_sources_chain\n",
"from langchain.chains.combine_documents.stuff import StuffDocumentsChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import PromptTemplate"
"from langchain.chains.combine_documents.stuff import StuffDocumentsChain\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import create_qa_with_sources_chain"
]
},
{
@@ -230,8 +230,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import ConversationalRetrievalChain, LLMChain\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.chains import LLMChain\n",
"\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\\\n",
@@ -356,10 +357,12 @@
"source": [
"from typing import List\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"from langchain.chains.openai_functions import create_qa_with_structure_chain\n",
"\n",
"from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate\n",
"from langchain.schema import HumanMessage, SystemMessage\n",
"from pydantic import BaseModel, Field"
"from langchain.schema import SystemMessage, HumanMessage"
]
},
{

View File

@@ -1,506 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f970f757-ec76-4bf0-90cd-a2fb68b945e3",
"metadata": {},
"source": [
"# Exploring OpenAI V1 functionality\n",
"\n",
"On 11.06.23 OpenAI released a number of new features, and along with it bumped their Python SDK to 1.0.0. This notebook shows off the new features and how to use them with LangChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee897729-263a-4073-898f-bb4cf01ed829",
"metadata": {},
"outputs": [],
"source": [
"# need openai>=1.1.0, langchain>=0.0.335, langchain-experimental>=0.0.39\n",
"!pip install -U openai langchain langchain-experimental"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c3e067ce-7a43-47a7-bc89-41f1de4cf136",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema.messages import HumanMessage, SystemMessage"
]
},
{
"cell_type": "markdown",
"id": "fa7e7e95-90a1-4f73-98fe-10c4b4e0951b",
"metadata": {},
"source": [
"## [Vision](https://platform.openai.com/docs/guides/vision)\n",
"\n",
"OpenAI released multi-modal models, which can take a sequence of text and images as input."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1c8c3965-d3c9-4186-b5f3-5e67855ef916",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='The image appears to be a diagram representing the architecture or components of a software system or framework related to language processing, possibly named LangChain or associated with a project or product called LangChain, based on the prominent appearance of that term. The diagram is organized into several layers or aspects, each containing various elements or modules:\\n\\n1. **Protocol**: This may be the foundational layer, which includes \"LCEL\" and terms like parallelization, fallbacks, tracing, batching, streaming, async, and composition. These seem related to communication and execution protocols for the system.\\n\\n2. **Integrations Components**: This layer includes \"Model I/O\" with elements such as the model, output parser, prompt, and example selector. It also has a \"Retrieval\" section with a document loader, retriever, embedding model, vector store, and text splitter. Lastly, there\\'s an \"Agent Tooling\" section. These components likely deal with the interaction with external data, models, and tools.\\n\\n3. **Application**: The application layer features \"LangChain\" with chains, agents, agent executors, and common application logic. This suggests that the system uses a modular approach with chains and agents to process language tasks.\\n\\n4. **Deployment**: This contains \"Lang')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat = ChatOpenAI(model=\"gpt-4-vision-preview\", max_tokens=256)\n",
"chat.invoke(\n",
" [\n",
" HumanMessage(\n",
" content=[\n",
" {\"type\": \"text\", \"text\": \"What is this image showing\"},\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": \"https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/static/img/langchain_stack.png\",\n",
" \"detail\": \"auto\",\n",
" },\n",
" },\n",
" ]\n",
" )\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "210f8248-fcf3-4052-a4a3-0684e08f8785",
"metadata": {},
"source": [
"## [OpenAI assistants](https://platform.openai.com/docs/assistants/overview)\n",
"\n",
"> The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling\n",
"\n",
"\n",
"You can interact with OpenAI Assistants using OpenAI tools or custom tools. When using exclusively OpenAI tools, you can just invoke the assistant directly and get final answers. When using custom tools, you can run the assistant and tool execution loop using the built-in AgentExecutor or easily write your own executor.\n",
"\n",
"Below we show the different ways to interact with Assistants. As a simple example, let's build a math tutor that can write and run code."
]
},
{
"cell_type": "markdown",
"id": "318da28d-4cec-42ab-ae3e-76d95bb34fa5",
"metadata": {},
"source": [
"### Using only OpenAI tools"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a9064bbe-d9f7-4a29-a7b3-73933b3197e7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents.openai_assistant import OpenAIAssistantRunnable"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7a20a008-49ac-46d2-aa26-b270118af5ea",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[ThreadMessage(id='msg_g9OJv0rpPgnc3mHmocFv7OVd', assistant_id='asst_hTwZeNMMphxzSOqJ01uBMsJI', content=[MessageContentText(text=Text(annotations=[], value='The result of \\\\(10 - 4^{2.7}\\\\) is approximately \\\\(-32.224\\\\).'), type='text')], created_at=1699460600, file_ids=[], metadata={}, object='thread.message', role='assistant', run_id='run_nBIT7SiAwtUfSCTrQNSPLOfe', thread_id='thread_14n4GgXwxgNL0s30WJW5F6p0')]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"interpreter_assistant = OpenAIAssistantRunnable.create_assistant(\n",
" name=\"langchain assistant\",\n",
" instructions=\"You are a personal math tutor. Write and run code to answer math questions.\",\n",
" tools=[{\"type\": \"code_interpreter\"}],\n",
" model=\"gpt-4-1106-preview\",\n",
")\n",
"output = interpreter_assistant.invoke({\"content\": \"What's 10 - 4 raised to the 2.7\"})\n",
"output"
]
},
{
"cell_type": "markdown",
"id": "a8ddd181-ac63-4ab6-a40d-a236120379c1",
"metadata": {},
"source": [
"### As a LangChain agent with arbitrary tools\n",
"\n",
"Now let's recreate this functionality using our own tools. For this example we'll use the [E2B sandbox runtime tool](https://e2b.dev/docs?ref=landing-page-get-started)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee4cc355-f2d6-4c51-bcf7-f502868357d3",
"metadata": {},
"outputs": [],
"source": [
"!pip install e2b duckduckgo-search"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "48681ac7-b267-48d4-972c-8a7df8393a21",
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools import DuckDuckGoSearchRun, E2BDataAnalysisTool\n",
"\n",
"tools = [E2BDataAnalysisTool(api_key=\"...\"), DuckDuckGoSearchRun()]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1c01dd79-dd3e-4509-a2e2-009a7f99f16a",
"metadata": {},
"outputs": [],
"source": [
"agent = OpenAIAssistantRunnable.create_assistant(\n",
" name=\"langchain assistant e2b tool\",\n",
" instructions=\"You are a personal math tutor. Write and run code to answer math questions. You can also search the internet.\",\n",
" tools=tools,\n",
" model=\"gpt-4-1106-preview\",\n",
" as_agent=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1ac71d8b-4b4b-4f98-b826-6b3c57a34166",
"metadata": {},
"source": [
"#### Using AgentExecutor"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1f137f94-801f-4766-9ff5-2de9df5e8079",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'content': \"What's the weather in SF today divided by 2.7\",\n",
" 'output': \"The weather in San Francisco today is reported to have temperatures as high as 66 °F. To get the temperature divided by 2.7, we will calculate that:\\n\\n66 °F / 2.7 = 24.44 °F\\n\\nSo, when the high temperature of 66 °F is divided by 2.7, the result is approximately 24.44 °F. Please note that this doesn't have a meteorological meaning; it's purely a mathematical operation based on the given temperature.\"}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.agents import AgentExecutor\n",
"\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools)\n",
"agent_executor.invoke({\"content\": \"What's the weather in SF today divided by 2.7\"})"
]
},
{
"cell_type": "markdown",
"id": "2d0a0b1d-c1b3-4b50-9dce-1189b51a6206",
"metadata": {},
"source": [
"#### Custom execution"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c0475fa7-b6c1-4331-b8e2-55407466c724",
"metadata": {},
"outputs": [],
"source": [
"agent = OpenAIAssistantRunnable.create_assistant(\n",
" name=\"langchain assistant e2b tool\",\n",
" instructions=\"You are a personal math tutor. Write and run code to answer math questions.\",\n",
" tools=tools,\n",
" model=\"gpt-4-1106-preview\",\n",
" as_agent=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b76cb669-6aba-4827-868f-00aa960026f2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.agent import AgentFinish\n",
"\n",
"\n",
"def execute_agent(agent, tools, input):\n",
" tool_map = {tool.name: tool for tool in tools}\n",
" response = agent.invoke(input)\n",
" while not isinstance(response, AgentFinish):\n",
" tool_outputs = []\n",
" for action in response:\n",
" tool_output = tool_map[action.tool].invoke(action.tool_input)\n",
" print(action.tool, action.tool_input, tool_output, end=\"\\n\\n\")\n",
" tool_outputs.append(\n",
" {\"output\": tool_output, \"tool_call_id\": action.tool_call_id}\n",
" )\n",
" response = agent.invoke(\n",
" {\n",
" \"tool_outputs\": tool_outputs,\n",
" \"run_id\": action.run_id,\n",
" \"thread_id\": action.thread_id,\n",
" }\n",
" )\n",
"\n",
" return response"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7946116a-b82f-492e-835e-ca958a8949a5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"e2b_data_analysis {'python_code': 'print(10 - 4 ** 2.7)'} {\"stdout\": \"-32.22425314473263\", \"stderr\": \"\", \"artifacts\": []}\n",
"\n",
"\\( 10 - 4^{2.7} \\) is approximately \\(-32.22425314473263\\).\n"
]
}
],
"source": [
"response = execute_agent(agent, tools, {\"content\": \"What's 10 - 4 raised to the 2.7\"})\n",
"print(response.return_values[\"output\"])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f2744a56-9f4f-4899-827a-fa55821c318c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7\\nprint(result + 17.241)'} {\"stdout\": \"-14.983253144732629\", \"stderr\": \"\", \"artifacts\": []}\n",
"\n",
"When you add \\( 17.241 \\) to \\( 10 - 4^{2.7} \\), the result is approximately \\( -14.98325314473263 \\).\n"
]
}
],
"source": [
"next_response = execute_agent(\n",
" agent, tools, {\"content\": \"now add 17.241\", \"thread_id\": response.thread_id}\n",
")\n",
"print(next_response.return_values[\"output\"])"
]
},
{
"cell_type": "markdown",
"id": "71c34763-d1e7-4b9a-a9d7-3e4cc0dfc2c4",
"metadata": {},
"source": [
"## [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode)\n",
"\n",
"Constrain the model to only generate valid JSON. Note that you must include a system message with instructions to use JSON for this mode to work.\n",
"\n",
"Only works with certain models. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db6072c4-f3f3-415d-872b-71ea9f3c02bb",
"metadata": {},
"outputs": [],
"source": [
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\").bind(\n",
" response_format={\"type\": \"json_object\"}\n",
")\n",
"\n",
"output = chat.invoke(\n",
" [\n",
" SystemMessage(\n",
" content=\"Extract the 'name' and 'origin' of any companies mentioned in the following statement. Return a JSON list.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Google was founded in the USA, while Deepmind was founded in the UK\"\n",
" ),\n",
" ]\n",
")\n",
"print(output.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08e00ccf-b991-4249-846b-9500a0ccbfa0",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"json.loads(output.content)"
]
},
{
"cell_type": "markdown",
"id": "aa9a94d9-4319-4ab7-a979-c475ce6b5f50",
"metadata": {},
"source": [
"## [System fingerprint](https://platform.openai.com/docs/guides/text-generation/reproducible-outputs)\n",
"\n",
"OpenAI sometimes changes model configurations in a way that impacts outputs. Whenever this happens, the system_fingerprint associated with a generation will change."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1281883c-bf8f-4665-89cd-4f33ccde69ab",
"metadata": {},
"outputs": [],
"source": [
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")\n",
"output = chat.generate(\n",
" [\n",
" [\n",
" SystemMessage(\n",
" content=\"Extract the 'name' and 'origin' of any companies mentioned in the following statement. Return a JSON list.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Google was founded in the USA, while Deepmind was founded in the UK\"\n",
" ),\n",
" ]\n",
" ]\n",
")\n",
"print(output.llm_output)"
]
},
{
"cell_type": "markdown",
"id": "aa6565be-985d-4127-848e-c3bca9d7b434",
"metadata": {},
"source": [
"## Breaking changes to Azure classes\n",
"\n",
"OpenAI V1 rewrote their clients and separated Azure and OpenAI clients. This has led to some changes in LangChain interfaces when using OpenAI V1.\n",
"\n",
"BREAKING CHANGES:\n",
"- To use Azure embeddings with OpenAI V1, you'll need to use the new `AzureOpenAIEmbeddings` instead of the existing `OpenAIEmbeddings`. `OpenAIEmbeddings` continue to work when using Azure with `openai<1`.\n",
"```python\n",
"from langchain.embeddings import AzureOpenAIEmbeddings\n",
"```\n",
"\n",
"\n",
"RECOMMENDED CHANGES:\n",
"- When using `AzureChatOpenAI` or `AzureOpenAI`, if passing in an Azure endpoint (eg https://example-resource.azure.openai.com/) this should be specified via the `azure_endpoint` parameter or the `AZURE_OPENAI_ENDPOINT`. We're maintaining backwards compatibility for now with specifying this via `openai_api_base`/`base_url` or env var `OPENAI_API_BASE` but this shouldn't be relied upon.\n",
"- When using Azure chat or embedding models, pass in API keys either via `openai_api_key` parameter or `AZURE_OPENAI_API_KEY` parameter. We're maintaining backwards compatibility for now with specifying this via `OPENAI_API_KEY` but this shouldn't be relied upon."
]
},
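{
"cell_type": "markdown",
"id": "9f1c2b3a-7e6d-4a5b-8c9d-0e1f2a3b4c5d",
"metadata": {},
"source": [
"As a minimal sketch of the recommended configuration: all endpoint, key, version, and deployment values below are placeholders, and exact parameter names may vary slightly by version."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a2b3c4d-8f7e-4b6a-9d0e-1f2a3b4c5d6e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import AzureChatOpenAI\n",
"from langchain.embeddings import AzureOpenAIEmbeddings\n",
"\n",
"chat = AzureChatOpenAI(\n",
"    azure_endpoint=\"https://example-resource.azure.openai.com/\",  # or set AZURE_OPENAI_ENDPOINT\n",
"    openai_api_key=\"...\",  # or set AZURE_OPENAI_API_KEY\n",
"    openai_api_version=\"2023-07-01-preview\",  # placeholder API version\n",
"    deployment_name=\"my-chat-deployment\",  # placeholder deployment name\n",
")\n",
"\n",
"embeddings = AzureOpenAIEmbeddings(\n",
"    azure_endpoint=\"https://example-resource.azure.openai.com/\",\n",
"    openai_api_key=\"...\",\n",
"    azure_deployment=\"my-embeddings-deployment\",  # placeholder deployment name\n",
")"
]
},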
{
"cell_type": "markdown",
"id": "49944887-3972-497e-8da2-6d32d44345a9",
"metadata": {},
"source": [
"## Tools\n",
"\n",
"Use tools for parallel function calling."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "916292d8-0f89-40a6-af1c-5a1122327de8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[GetCurrentWeather(location='New York, NY', unit='fahrenheit'),\n",
" GetCurrentWeather(location='Los Angeles, CA', unit='fahrenheit'),\n",
" GetCurrentWeather(location='San Francisco, CA', unit='fahrenheit')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Literal\n",
"\n",
"from langchain.output_parsers.openai_tools import PydanticToolsParser\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.pydantic_v1 import BaseModel, Field\n",
"from langchain.utils.openai_functions import convert_pydantic_to_openai_tool\n",
"\n",
"\n",
"class GetCurrentWeather(BaseModel):\n",
" \"\"\"Get the current weather in a location.\"\"\"\n",
"\n",
" location: str = Field(description=\"The city and state, e.g. San Francisco, CA\")\n",
" unit: Literal[\"celsius\", \"fahrenheit\"] = Field(\n",
" default=\"fahrenheit\", description=\"The temperature unit, default to fahrenheit\"\n",
" )\n",
"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", \"You are a helpful assistant\"), (\"user\", \"{input}\")]\n",
")\n",
"model = ChatOpenAI(model=\"gpt-3.5-turbo-1106\").bind(\n",
" tools=[convert_pydantic_to_openai_tool(GetCurrentWeather)]\n",
")\n",
"chain = prompt | model | PydanticToolsParser(tools=[GetCurrentWeather])\n",
"\n",
"chain.invoke({\"input\": \"what's the weather in NYC, LA, and SF\"})"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv",
"language": "python",
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -45,14 +45,14 @@
"source": [
"import collections\n",
"import inspect\n",
"\n",
"import tenacity\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.output_parsers import RegexParser\n",
"from langchain.schema import (\n",
" HumanMessage,\n",
" SystemMessage,\n",
")"
")\n",
"from langchain.output_parsers import RegexParser"
]
},
{
@@ -146,7 +146,7 @@
" ):\n",
" with attempt:\n",
" action = self._act()\n",
" except tenacity.RetryError:\n",
" except tenacity.RetryError as e:\n",
" action = self.random_action()\n",
" return action"
]

View File

@@ -34,11 +34,7 @@
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI\n",
"from langchain.utilities import DuckDuckGoSearchAPIWrapper\n",
"from langchain_experimental.plan_and_execute import (\n",
" PlanAndExecute,\n",
" load_agent_executor,\n",
" load_chat_planner,\n",
")"
"from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner"
]
},
{
@@ -60,16 +56,16 @@
"llm = OpenAI(temperature=0)\n",
"llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)\n",
"tools = [\n",
" Tool(\n",
" name=\"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\",\n",
" ),\n",
" Tool(\n",
" name=\"Calculator\",\n",
" func=llm_math_chain.run,\n",
" description=\"useful for when you need to answer questions about math\",\n",
" ),\n",
" Tool(\n",
" name=\"Search\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\"\n",
" ),\n",
" Tool(\n",
" name=\"Calculator\",\n",
" func=llm_math_chain.run,\n",
" description=\"useful for when you need to answer questions about math\"\n",
" ),\n",
"]"
]
},
@@ -220,9 +216,7 @@
}
],
"source": [
"agent.run(\n",
" \"Who is the current prime minister of the UK? What is their current age raised to the 0.43 power?\"\n",
")"
"agent.run(\"Who is the current prime minister of the UK? What is their current age raised to the 0.43 power?\")"
]
},
{

View File

@@ -55,7 +55,6 @@
"source": [
"# Setup API keys for Kay and OpenAI\n",
"from getpass import getpass\n",
"\n",
"KAY_API_KEY = getpass()\n",
"OPENAI_API_KEY = getpass()"
]
@@ -68,7 +67,6 @@
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"KAY_API_KEY\"] = KAY_API_KEY\n",
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
@@ -85,9 +83,7 @@
"from langchain.retrievers import KayAiRetriever\n",
"\n",
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
"retriever = KayAiRetriever.create(\n",
" dataset_id=\"company\", data_types=[\"PressRelease\"], num_contexts=6\n",
")\n",
"retriever = KayAiRetriever.create(dataset_id=\"company\", data_types=[\"PressRelease\"], num_contexts=6)\n",
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
]
},
@@ -120,7 +116,7 @@
"# More sample questions in the Playground on https://kay.ai\n",
"questions = [\n",
" \"How is the healthcare industry adopting generative AI tools?\",\n",
" # \"What are some recent challenges faced by the renewable energy sector?\",\n",
" #\"What are some recent challenges faced by the renewable energy sector?\",\n",
"]\n",
"chat_history = []\n",
"\n",

View File

@@ -17,8 +17,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain_experimental.pal_chain import PALChain"
"from langchain_experimental.pal_chain import PALChain\n",
"from langchain.llms import OpenAI"
]
},
{

View File

@@ -1,181 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG based on Qianfan and BES"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook is an implementation of Retrieval augmented generation (RAG) using Baidu Qianfan Platform combined with Baidu ElasricSearch, where the original data is located on BOS.\n",
"## Baidu Qianfan\n",
"Baidu AI Cloud Qianfan Platform is a one-stop large model development and service operation platform for enterprise developers. Qianfan not only provides including the model of Wenxin Yiyan (ERNIE-Bot) and the third-party open-source models, but also provides various AI development tools and the whole set of development environment, which facilitates customers to use and develop large model applications easily.\n",
"\n",
"## Baidu ElasticSearch\n",
"[Baidu Cloud VectorSearch](https://cloud.baidu.com/doc/BES/index.html?from=productToDoc) is a fully managed, enterprise-level distributed search and analysis service which is 100% compatible to open source. Baidu Cloud VectorSearch provides low-cost, high-performance, and reliable retrieval and analysis platform level product services for structured/unstructured data. As a vector database , it supports multiple index types and similarity distance methods. "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation and Setup\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install qianfan\n",
"#!pip install bce-python-sdk\n",
"#!pip install elasticsearch == 7.11.0"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from baidubce.auth.bce_credentials import BceCredentials\n",
"from baidubce.bce_client_configuration import BceClientConfiguration\n",
"from langchain.document_loaders.baiducloud_bos_directory import BaiduBOSDirectoryLoader\n",
"from langchain.embeddings.huggingface import HuggingFaceEmbeddings\n",
"from langchain.llms.baidu_qianfan_endpoint import QianfanLLMEndpoint\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.vectorstores import BESVectorStore"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Document loading"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bos_host = \"your bos eddpoint\"\n",
"access_key_id = \"your bos access ak\"\n",
"secret_access_key = \"your bos access sk\"\n",
"\n",
"# create BceClientConfiguration\n",
"config = BceClientConfiguration(\n",
" credentials=BceCredentials(access_key_id, secret_access_key), endpoint=bos_host\n",
")\n",
"\n",
"loader = BaiduBOSDirectoryLoader(conf=config, bucket=\"llm-test\", prefix=\"llm/\")\n",
"documents = loader.load()\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)\n",
"split_docs = text_splitter.split_documents(documents)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Embedding and VectorStore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = HuggingFaceEmbeddings(model_name=\"shibing624/text2vec-base-chinese\")\n",
"embeddings.client = sentence_transformers.SentenceTransformer(embeddings.model_name)\n",
"\n",
"db = BESVectorStore.from_documents(\n",
" documents=split_docs,\n",
" embedding=embeddings,\n",
" bes_url=\"your bes url\",\n",
" index_name=\"test-index\",\n",
" vector_query_field=\"vector\",\n",
")\n",
"\n",
"db.client.indices.refresh(index=\"test-index\")\n",
"retriever = db.as_retriever()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## QA Retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = QianfanLLMEndpoint(\n",
" model=\"ERNIE-Bot\",\n",
" qianfan_ak=\"your qianfan ak\",\n",
" qianfan_sk=\"your qianfan sk\",\n",
" streaming=True,\n",
")\n",
"qa = RetrievalQA.from_chain_type(\n",
" llm=llm, chain_type=\"refine\", retriever=retriever, return_source_documents=True\n",
")\n",
"\n",
"query = \"什么是张量?\"\n",
"print(qa.run(query))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
 张量Tensor是一个数学概念">
"> A tensor is a mathematical concept used to represent multi-dimensional data. It is an array that can hold multiple numeric values and can be a scalar, a vector, a matrix, and so on. In deep learning and artificial intelligence, tensors are commonly used to represent the inputs, outputs, and weights of neural networks."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.17"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -30,10 +30,10 @@
"outputs": [],
"source": [
"import pinecone\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import Pinecone\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"\n",
"pinecone.init(api_key=\"...\", environment=\"...\")"
"pinecone.init(api_key=\"...\",environment=\"...\")"
]
},
{
@@ -53,7 +53,7 @@
" \"doc7\": \"Climate change: The science and models.\",\n",
" \"doc8\": \"Global warming: A subset of climate change.\",\n",
" \"doc9\": \"How climate change affects daily weather.\",\n",
" \"doc10\": \"The history of climate change activism.\",\n",
" \"doc10\": \"The history of climate change activism.\"\n",
"}"
]
},
@@ -64,9 +64,7 @@
"metadata": {},
"outputs": [],
"source": [
"vectorstore = Pinecone.from_texts(\n",
" list(all_documents.values()), OpenAIEmbeddings(), index_name=\"rag-fusion\"\n",
")"
"vectorstore = Pinecone.from_texts(list(all_documents.values()), OpenAIEmbeddings(), index_name='rag-fusion')"
]
},
{
@@ -87,6 +85,7 @@
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser"
]
},
@@ -99,7 +98,7 @@
"source": [
"from langchain import hub\n",
"\n",
"prompt = hub.pull(\"langchain-ai/rag-fusion-query-generation\")"
"prompt = hub.pull('langchain-ai/rag-fusion-query-generation')"
]
},
{
@@ -123,9 +122,7 @@
"metadata": {},
"outputs": [],
"source": [
"generate_queries = (\n",
" prompt | ChatOpenAI(temperature=0) | StrOutputParser() | (lambda x: x.split(\"\\n\"))\n",
")"
"generate_queries = prompt | ChatOpenAI(temperature=0) | StrOutputParser() | (lambda x: x.split(\"\\n\"))"
]
},
{
@@ -174,8 +171,6 @@
"outputs": [],
"source": [
"from langchain.load import dumps, loads\n",
"\n",
"\n",
"def reciprocal_rank_fusion(results: list[list], k=60):\n",
" fused_scores = {}\n",
" for docs in results:\n",
@@ -186,12 +181,9 @@
" fused_scores[doc_str] = 0\n",
" previous_score = fused_scores[doc_str]\n",
" fused_scores[doc_str] += 1 / (rank + k)\n",
"\n",
" reranked_results = [\n",
" (loads(doc), score)\n",
" for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)\n",
" ]\n",
" return reranked_results"
" \n",
" reranked_results = [(loads(doc), score) for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)]\n",
" return reranked_results "
]
},
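{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch (assuming the `vectorstore`, `generate_queries`, and `reciprocal_rank_fusion` defined above), these pieces typically compose into the full RAG-fusion chain like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = vectorstore.as_retriever()\n",
"\n",
"# Each generated query is run against the retriever independently,\n",
"# then the ranked lists are fused with reciprocal rank fusion.\n",
"chain = generate_queries | retriever.map() | reciprocal_rank_fusion\n",
"\n",
"# \"original_query\" is assumed to be the input variable of the hub prompt\n",
"chain.invoke({\"original_query\": \"impact of climate change\"})"
]
},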
{

View File

@@ -13,13 +13,21 @@
"### Overall workflow\n",
"\n",
"1. Generating embeddings for a specific column\n",
" \n",
"2. Storing the embeddings in a new column (if column has low cardinality, it's better to use another table containing unique values and their embeddings)\n",
"3. Querying using standard SQL queries with [PGVector](https://github.com/pgvector/pgvector) extension which allows using L2 distance (`<->`), Cosine distance (`<=>` or cosine similarity using `1 - <=>`) and Inner product (`<#>`)\n",
" \n",
"3. Querying using standard SQL queries with [PGVector](https://github.com/pgvector/pgvector) extension which allows using:\n",
"* L2 distance (`<->`)\n",
"* Cosine distance (`<=>` or cosine similarity using `1 - <=>`)\n",
"* Inner product (`<#>`)\n",
" \n",
"4. Running standard SQL query\n",
"\n",
"### Requirements\n",
"\n",
"We will need a PostgreSQL database with [pgvector](https://github.com/pgvector/pgvector) extension enabled. For this example, we will use a `Chinook` database using a local PostgreSQL server."
"We will need a PostgreSQL database with [pgvector](https://github.com/pgvector/pgvector) extension enabled. \n",
"\n",
"For this example, we will use a `Chinook` database using a local PostgreSQL server."
]
},
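{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a one-time setup sketch (assuming superuser access to the same `vectordb` database used below), the extension can be enabled with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run once against the target database, e.g. via psql:\n",
"#   psql -d vectordb -c 'CREATE EXTENSION IF NOT EXISTS vector;'"
]
},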
{
@@ -28,12 +36,9 @@
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\") or getpass.getpass(\n",
" \"OpenAI API Key:\"\n",
")"
"import getpass\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get('OPENAI_API_KEY') or getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -42,10 +47,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.sql_database import SQLDatabase\n",
"\n",
"CONNECTION_STRING = \"postgresql+psycopg2://postgres:test@localhost:5432/vectordb\" # Replace with your own\n",
"from langchain.chat_models import ChatOpenAI\n",
"CONNECTION_STRING = \"postgresql+psycopg2://postgres:test@localhost:5432/vectordb\" # Replace with your own\n",
"db = SQLDatabase.from_uri(CONNECTION_STRING)"
]
},
@@ -62,12 +66,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For this example, we will run queries based on semantic meaning of song titles. In order to do this, let's start by adding a new column in the table for storing the embeddings:"
"For this example, we will run queries based on semantic meaning of song titles. \n",
"\n",
"In order to do this, let's start by adding a new column in the table for storing the embeddings:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
@@ -84,32 +90,14 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"\n",
"embeddings_model = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3503"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings_model = OpenAIEmbeddings()\n",
"\n",
"tracks = db.run('SELECT \"Name\" FROM \"Track\"')\n",
"song_titles = [s[0] for s in eval(tracks)]\n",
"title_embeddings = embeddings_model.embed_documents(song_titles)\n",
@@ -126,19 +114,16 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"from tqdm import tqdm\n",
"\n",
"for i in tqdm(range(len(title_embeddings))):\n",
" title = titles[i].replace(\"'\", \"''\")\n",
" title = titles[i].replace(\"'\",\"''\")\n",
" embedding = title_embeddings[i]\n",
" sql_command = (\n",
" f'UPDATE \"Track\" SET \"embeddings\" = ARRAY{embedding} WHERE \"Name\" ='\n",
" + f\"'{title}'\"\n",
" )\n",
" sql_command = f'UPDATE \"Track\" SET \"embeddings\" = ARRAY{embedding} WHERE \"Name\" =' + f\"'{title}'\"\n",
" db.run(sql_command)"
]
},
@@ -152,7 +137,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 21,
"metadata": {},
"outputs": [
{
@@ -161,20 +146,24 @@
"'[(\"Tomorrow\\'s Dream\",), (\\'Remember Tomorrow\\',), (\\'Remember Tomorrow\\',), (\\'The Best Is Yet To Come\\',), (\"Thinking \\'Bout Tomorrow\",)]'"
]
},
"execution_count": 7,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeded_title = embeddings_model.embed_query(\"hope about the future\")\n",
"query = (\n",
" 'SELECT \"Track\".\"Name\" FROM \"Track\" WHERE \"Track\".\"embeddings\" IS NOT NULL ORDER BY \"embeddings\" <-> '\n",
" + f\"'{embeded_title}' LIMIT 5\"\n",
")\n",
"query = 'SELECT \"Track\".\"Name\" FROM \"Track\" WHERE \"Track\".\"embeddings\" IS NOT NULL ORDER BY \"embeddings\" <-> ' + f\"'{embeded_title}' LIMIT 5\"\n",
"db.run(query)"
]
},
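{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch, the same search works with the other operators listed above; only the operator in the `ORDER BY` clause changes (`<=>` for cosine distance, `<#>` for inner product, which pgvector returns negated):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cosine distance variant of the query above\n",
"cosine_query = 'SELECT \"Track\".\"Name\" FROM \"Track\" WHERE \"Track\".\"embeddings\" IS NOT NULL ORDER BY \"embeddings\" <=> ' + f\"'{embeded_title}' LIMIT 5\"\n",
"db.run(cosine_query)\n",
"\n",
"# Inner product variant\n",
"inner_product_query = 'SELECT \"Track\".\"Name\" FROM \"Track\" WHERE \"Track\".\"embeddings\" IS NOT NULL ORDER BY \"embeddings\" <#> ' + f\"'{embeded_title}' LIMIT 5\"\n",
"db.run(inner_product_query)"
]
},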
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see the the song titles are conceptually similar to our search term `\"hope about the future\"`."
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -193,14 +182,13 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def get_schema(_):\n",
" return db.get_table_info()\n",
"\n",
"\n",
"def run_query(query):\n",
" return db.run(query)"
]
@@ -210,12 +198,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's build the **prompt** we will use. This prompt is an extension from [text-to-postgres-sql](https://smith.langchain.com/hub/jacob/text-to-postgres-sql?organizationId=f9b614b8-5c3a-4e7c-afbc-6d7ad4fd8892) prompt"
"Now let's build the **prompt** we will use:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
@@ -245,12 +233,15 @@
"Only use the following tables:\n",
"\n",
"{schema}\n",
"\n",
"QUESTION: {question}\n",
"SQLQuery:\n",
"\n",
"\"\"\"\n",
"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", template), (\"human\", \"{question}\")]\n",
")"
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
" (\"human\", template)\n",
"])"
]
},
{
@@ -263,49 +254,34 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 22,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/manuelsoria/miniconda3/envs/auto-gpt/lib/python3.8/site-packages/langchain/utilities/sql_database.py:112: SAWarning: Did not recognize type 'vector' of column 'title_embedding'\n",
" self._metadata.reflect(\n",
"/Users/manuelsoria/miniconda3/envs/auto-gpt/lib/python3.8/site-packages/langchain/utilities/sql_database.py:112: SAWarning: Did not recognize type 'vector' of column 'embeddings'\n",
" self._metadata.reflect(\n"
]
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"db = SQLDatabase.from_uri(\n",
" CONNECTION_STRING\n",
") # We reconnect to db so the new columns are loaded as well.\n",
"llm = ChatOpenAI(model_name=\"gpt-4\", temperature=0)\n",
"db = SQLDatabase.from_uri(CONNECTION_STRING) # We reconnect to db so the new columns are loaded as well.\n",
"llm = ChatOpenAI(model_name='gpt-4', temperature=0)\n",
"\n",
"sql_query_chain = (\n",
" RunnablePassthrough.assign(schema=get_schema)\n",
" | prompt\n",
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'SQLQuery: SELECT \"Track\".\"Name\" FROM \"Track\" JOIN \"Genre\" ON \"Track\".\"GenreId\" = \"Genre\".\"GenreId\" WHERE \"Genre\".\"Name\" = \\'Rock\\' ORDER BY \"Track\".\"embeddings\" <-> \\'[dispair]\\' LIMIT 5'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql_query_chain.invoke(\n",
" {\n",
" \"question\": \"Which are the 5 rock songs with titles about deep feeling of dispair?\"\n",
" }\n",
")"
" RunnablePassthrough.assign(schema=get_schema)\n",
" | prompt\n",
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
" | StrOutputParser()\n",
" )"
]
},
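One detail worth calling out in the cell above: `llm.bind(stop=["\nSQLResult:"])` attaches the stop sequence to every subsequent call, so generation halts before the model starts inventing a `SQLResult:` section of its own. A minimal illustration (names reused from the notebook):

```python
# .bind returns a new runnable with the kwargs baked in; `llm` itself is
# left untouched, so it can still be reused elsewhere without the stop token.
sql_llm = llm.bind(stop=["\nSQLResult:"])
# sql_llm.invoke(...) now always passes stop=["\nSQLResult:"] to the model.
```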
{
@@ -318,28 +294,23 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"from langchain.schema.runnable import RunnableLambda\n",
"\n",
"\n",
"# Inject the embedding for any words within brackets \n",
"def replace_brackets(match):\n",
" words_inside_brackets = match.group(1).split(\", \")\n",
" embedded_words = [\n",
" str(embeddings_model.embed_query(word)) for word in words_inside_brackets\n",
" ]\n",
" words_inside_brackets = match.group(1).split(', ')\n",
" embedded_words = [str(embeddings_model.embed_query(word)) for word in words_inside_brackets]\n",
" return \"', '\".join(embedded_words)\n",
"\n",
"\n",
"def get_query(query):\n",
" sql_query = re.sub(r\"\\[([\\w\\s,]+)\\]\", replace_brackets, query)\n",
" sql_query = re.sub(r'\\[([\\w\\s,]+)\\]', replace_brackets, query)\n",
" return sql_query\n",
"\n",
"\n",
" \n",
"template = \"\"\"Based on the table schema below, question, sql query, and sql response, write a natural language response:\n",
"{schema}\n",
"\n",
@@ -347,17 +318,18 @@
"SQL Query: {query}\n",
"SQL Response: {response}\"\"\"\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", template), (\"human\", \"{question}\")]\n",
")\n",
"prompt_response = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\"),\n",
" (\"human\", template)\n",
"])\n",
"\n",
"full_chain = (\n",
" RunnablePassthrough.assign(query=sql_query_chain)\n",
" | RunnablePassthrough.assign(\n",
" schema=get_schema,\n",
" response=RunnableLambda(lambda x: db.run(get_query(x[\"query\"]))),\n",
" )\n",
" | prompt\n",
" )\n",
" | prompt_response \n",
" | llm\n",
")"
]
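To see what `replace_brackets` and `get_query` actually do, here is a small self-contained check (not part of the notebook) with the embedding call stubbed out so it runs without an API key:

```python
import re

def fake_embed_query(word):
    # Stand-in for embeddings_model.embed_query
    return [0.1, 0.2, 0.3]

def replace_brackets_demo(match):
    words = match.group(1).split(", ")
    return "', '".join(str(fake_embed_query(w)) for w in words)

sql = "SELECT \"Name\" FROM \"Track\" ORDER BY \"embeddings\" <-> '[despair]' LIMIT 5"
print(re.sub(r"\[([\w\s,]+)\]", replace_brackets_demo, sql))
# The [despair] placeholder is replaced by its embedding vector, yielding
# SQL that pgvector can execute directly.
```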
@@ -388,26 +360,22 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"The 5 rock songs with titles that convey a deep feeling of despair are 'Sea Of Sorrow', 'Surrender', 'Indifference', 'Hard Luck Woman', and 'Desire'.\")"
"AIMessage(content=\"The 5 rock songs with titles about deep feeling of despair are 'Sea Of Sorrow', 'Surrender', 'Indifference', 'Hard Luck Woman', and 'Desire'.\")"
]
},
"execution_count": 11,
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_chain.invoke(\n",
" {\n",
" \"question\": \"Which are the 5 rock songs with titles about deep feeling of dispair?\"\n",
" }\n",
")"
"full_chain.invoke({\"question\":\"Which are the 5 rock songs with titles about deep feeling of dispair?\"})"
]
},
{
@@ -449,11 +417,7 @@
}
],
"source": [
"full_chain.invoke(\n",
" {\n",
" \"question\": \"I want to know the 3 albums which have the most amount of songs in the top 150 saddest songs\"\n",
" }\n",
")"
"full_chain.invoke({\"question\": \"I want to know the 3 albums which have the most amount of songs in the top 150 saddest songs\"})"
]
},
{
@@ -483,11 +447,7 @@
}
],
"source": [
"full_chain.invoke(\n",
" {\n",
" \"question\": \"I need the 6 albums with shortest title, as long as they contain songs which are in the 20 saddest song list.\"\n",
" }\n",
")"
"full_chain.invoke({\"question\": \"I need the 6 albums with shortest title, as long as they contain songs which are in the 20 saddest song list.\"})"
]
},
{
@@ -523,13 +483,7 @@
}
],
"source": [
"print(\n",
" sql_query_chain.invoke(\n",
" {\n",
" \"question\": \"I need the 6 albums with shortest title, as long as they contain songs which are in the 20 saddest song list.\"\n",
" }\n",
" )\n",
")"
"print(sql_query_chain.invoke({\"question\": \"I need the 6 albums with shortest title, as long as they contain songs which are in the 20 saddest song list.\"}))"
]
},
{
@@ -577,12 +531,9 @@
"album_titles = [title[0] for title in eval(albums)]\n",
"album_title_embeddings = embeddings_model.embed_documents(album_titles)\n",
"for i in tqdm(range(len(album_title_embeddings))):\n",
" album_title = album_titles[i].replace(\"'\", \"''\")\n",
" album_title = album_titles[i].replace(\"'\",\"''\")\n",
" album_embedding = album_title_embeddings[i]\n",
" sql_command = (\n",
" f'UPDATE \"Album\" SET \"embeddings\" = ARRAY{album_embedding} WHERE \"Title\" ='\n",
" + f\"'{album_title}'\"\n",
" )\n",
" sql_command = f'UPDATE \"Album\" SET \"embeddings\" = ARRAY{album_embedding} WHERE \"Title\" =' + f\"'{album_title}'\"\n",
" db.run(sql_command)"
]
},
@@ -606,10 +557,7 @@
],
"source": [
"embeded_title = embeddings_model.embed_query(\"hope about the future\")\n",
"query = (\n",
" 'SELECT \"Album\".\"Title\" FROM \"Album\" WHERE \"Album\".\"embeddings\" IS NOT NULL ORDER BY \"embeddings\" <-> '\n",
" + f\"'{embeded_title}' LIMIT 5\"\n",
")\n",
"query = 'SELECT \"Album\".\"Title\" FROM \"Album\" WHERE \"Album\".\"embeddings\" IS NOT NULL ORDER BY \"embeddings\" <-> ' + f\"'{embeded_title}' LIMIT 5\"\n",
"db.run(query)"
]
},
@@ -627,9 +575,7 @@
"metadata": {},
"outputs": [],
"source": [
"db = SQLDatabase.from_uri(\n",
" CONNECTION_STRING\n",
") # We reconnect to db so the new columns are loaded as well."
"db = SQLDatabase.from_uri(CONNECTION_STRING) # We reconnect to db so the new columns are loaded as well."
]
},
{
@@ -649,11 +595,7 @@
}
],
"source": [
"full_chain.invoke(\n",
" {\n",
" \"question\": \"I want to know songs about breakouts obtained from top 5 albums about love\"\n",
" }\n",
")"
"full_chain.invoke({\"question\": \"I want to know songs about breakouts obtained from top 5 albums about love\"})"
]
},
{
@@ -681,9 +623,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -31,10 +31,12 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from operator import itemgetter\n",
"\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain.schema.runnable import RunnablePassthrough, RunnableLambda\n",
"from langchain.utilities import DuckDuckGoSearchAPIWrapper"
]
},
@@ -72,9 +74,9 @@
"outputs": [],
"source": [
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | model\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | model \n",
" | StrOutputParser()\n",
")"
]
@@ -243,7 +245,6 @@
"source": [
"# Parser to remove the `**`\n",
"\n",
"\n",
"def _parse(text):\n",
" return text.strip(\"**\")"
]
@@ -289,10 +290,9 @@
"rewrite_retrieve_read_chain = (\n",
" {\n",
" \"context\": {\"x\": RunnablePassthrough()} | rewriter | retriever,\n",
" \"question\": RunnablePassthrough(),\n",
" }\n",
" | prompt\n",
" | model\n",
" \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | model \n",
" | StrOutputParser()\n",
")"
]

View File

@@ -42,22 +42,22 @@
"OPENAI_API_KEY = \"sk-xx\"\n",
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY\n",
"\n",
"from typing import Any, Callable, Dict, List, Union\n",
"\n",
"from langchain.agents import AgentExecutor, LLMSingleActionAgent, Tool\n",
"from langchain.agents.agent import AgentOutputParser\n",
"from langchain.agents.conversational.prompt import FORMAT_INSTRUCTIONS\n",
"from langchain.chains import LLMChain, RetrievalQA\n",
"from typing import Dict, List, Any, Union, Callable\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n",
"from langchain.llms import BaseLLM\n",
"from langchain.chains.base import Chain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.llms import BaseLLM, OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.prompts.base import StringPromptTemplate\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"from langchain.agents import Tool, LLMSingleActionAgent, AgentExecutor\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.chains import RetrievalQA\n",
"from langchain.vectorstores import Chroma\n",
"from pydantic import BaseModel, Field"
"from langchain.llms import OpenAI\n",
"from langchain.prompts.base import StringPromptTemplate\n",
"from langchain.agents.agent import AgentOutputParser\n",
"from langchain.agents.conversational.prompt import FORMAT_INSTRUCTIONS\n",
"from langchain.schema import AgentAction, AgentFinish"
]
},
{

View File

@@ -1,175 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e93283d1",
"metadata": {},
"source": [
"# Selecting LLMs based on Context Length\n",
"\n",
"Different LLMs have different context lengths. As a very immediate an practical example, OpenAI has two versions of GPT-3.5-Turbo: one with 4k context, another with 16k context. This notebook shows how to route between them based on input."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "cc453450",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain.schema.prompt import PromptValue"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1cec6a10",
"metadata": {},
"outputs": [],
"source": [
"short_context_model = ChatOpenAI(model=\"gpt-3.5-turbo\")\n",
"long_context_model = ChatOpenAI(model=\"gpt-3.5-turbo-16k\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "772da153",
"metadata": {},
"outputs": [],
"source": [
"def get_context_length(prompt: PromptValue):\n",
" messages = prompt.to_messages()\n",
" tokens = short_context_model.get_num_tokens_from_messages(messages)\n",
" return tokens"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "db771e20",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate.from_template(\"Summarize this passage: {context}\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "af057e2f",
"metadata": {},
"outputs": [],
"source": [
"def choose_model(prompt: PromptValue):\n",
" context_len = get_context_length(prompt)\n",
" if context_len < 30:\n",
" print(\"short model\")\n",
" return short_context_model\n",
" else:\n",
" print(\"long model\")\n",
" return long_context_model"
]
},
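A note on why this works (an aside, not in the notebook): composing a plain function like `choose_model` with `|` coerces it into a `RunnableLambda`, and a `Runnable` returned from that function is itself invoked on the same input, which is what routes each prompt to the chosen model. The explicit spelling would be:

```python
from langchain.schema.runnable import RunnableLambda

# Explicit form of the coercion performed by `prompt | choose_model | ...`.
router = RunnableLambda(choose_model)
routed_chain = prompt | router | StrOutputParser()
```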
{
"cell_type": "code",
"execution_count": 25,
"id": "84f3e07d",
"metadata": {},
"outputs": [],
"source": [
"chain = prompt | choose_model | StrOutputParser()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "d8b14f8f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"short model\n"
]
},
{
"data": {
"text/plain": [
"'The passage mentions that a frog visited a pond.'"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"context\": \"a frog went to a pond\"})"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "70ebd3dd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"long model\n"
]
},
{
"data": {
"text/plain": [
"'The passage describes a frog that moved from one pond to another and perched on a log.'"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\n",
" {\"context\": \"a frog went to a pond and sat on a log and went to a different pond\"}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7e29fef",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -7,33 +7,9 @@
"source": [
"# Building hotel room search with self-querying retrieval\n",
"\n",
"In this example we'll walk through how to build and iterate on a hotel room search service that leverages an LLM to generate structured filter queries that can then be passed to a vector store.\n",
"\n",
"For an introduction to self-querying retrieval [check out the docs](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query)."
]
},
{
"cell_type": "markdown",
"id": "d621de99-d993-4f4b-b94a-d02b2c7ad4e0",
"metadata": {},
"source": [
"## Imports and data prep\n",
"\n",
"In this example we use `ChatOpenAI` for the model and `ElasticsearchStore` for the vector store, but these can be swapped out with an LLM/ChatModel and [any VectorStore that support self-querying](https://python.langchain.com/docs/integrations/retrievers/self_query/).\n",
"\n",
"Download data from: https://www.kaggle.com/datasets/keshavramaiah/hotel-recommendation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ecd1fbb-bdba-420b-bcc7-5ea8a232ab11",
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain lark openai elasticsearch pandas"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -51,14 +27,8 @@
"metadata": {},
"outputs": [],
"source": [
"details = (\n",
" pd.read_csv(\"~/Downloads/archive/Hotel_details.csv\")\n",
" .drop_duplicates(subset=\"hotelid\")\n",
" .set_index(\"hotelid\")\n",
")\n",
"attributes = pd.read_csv(\n",
" \"~/Downloads/archive/Hotel_Room_attributes.csv\", index_col=\"id\"\n",
")\n",
"details = pd.read_csv(\"~/Downloads/archive/Hotel_details.csv\").drop_duplicates(subset=\"hotelid\").set_index(\"hotelid\")\n",
"attributes = pd.read_csv(\"~/Downloads/archive/Hotel_Room_attributes.csv\", index_col=\"id\")\n",
"price = pd.read_csv(\"~/Downloads/archive/hotels_RoomPrice.csv\", index_col=\"id\")"
]
},
@@ -214,20 +184,9 @@
}
],
"source": [
"latest_price = price.drop_duplicates(subset=\"refid\", keep=\"last\")[\n",
" [\n",
" \"hotelcode\",\n",
" \"roomtype\",\n",
" \"onsiterate\",\n",
" \"roomamenities\",\n",
" \"maxoccupancy\",\n",
" \"mealinclusiontype\",\n",
" ]\n",
"]\n",
"latest_price = price.drop_duplicates(subset=\"refid\", keep=\"last\")[[\"hotelcode\", \"roomtype\", \"onsiterate\", \"roomamenities\", \"maxoccupancy\", \"mealinclusiontype\"]]\n",
"latest_price[\"ratedescription\"] = attributes.loc[latest_price.index][\"ratedescription\"]\n",
"latest_price = latest_price.join(\n",
" details[[\"hotelname\", \"city\", \"country\", \"starrating\"]], on=\"hotelcode\"\n",
")\n",
"latest_price = latest_price.join(details[[\"hotelname\", \"city\", \"country\", \"starrating\"]], on=\"hotelcode\")\n",
"latest_price = latest_price.rename({\"ratedescription\": \"roomdescription\"}, axis=1)\n",
"latest_price[\"mealsincluded\"] = ~latest_price[\"mealinclusiontype\"].isnull()\n",
"latest_price.pop(\"hotelcode\")\n",
@@ -261,7 +220,7 @@
"res = model.predict(\n",
" \"Below is a table with information about hotel rooms. \"\n",
" \"Return a JSON list with an entry for each column. Each entry should have \"\n",
" '{\"name\": \"column name\", \"description\": \"column description\", \"type\": \"column data type\"}'\n",
" \"{\\\"name\\\": \\\"column name\\\", \\\"description\\\": \\\"column description\\\", \\\"type\\\": \\\"column data type\\\"}\"\n",
" f\"\\n\\n{latest_price.head()}\\n\\nJSON:\\n\"\n",
")"
]
@@ -355,15 +314,9 @@
"metadata": {},
"outputs": [],
"source": [
"attribute_info[-2][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['starrating'].value_counts().index.tolist())}\"\n",
"attribute_info[3][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['maxoccupancy'].value_counts().index.tolist())}\"\n",
"attribute_info[-3][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['country'].value_counts().index.tolist())}\""
"attribute_info[-2]['description'] += f\". Valid values are {sorted(latest_price['starrating'].value_counts().index.tolist())}\"\n",
"attribute_info[3]['description'] += f\". Valid values are {sorted(latest_price['maxoccupancy'].value_counts().index.tolist())}\"\n",
"attribute_info[-3]['description'] += f\". Valid values are {sorted(latest_price['country'].value_counts().index.tolist())}\""
]
},
{
@@ -431,10 +384,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.query_constructor.base import (\n",
" get_query_constructor_prompt,\n",
" load_query_constructor_runnable,\n",
")"
"from langchain.chains.query_constructor.base import get_query_constructor_prompt, load_query_constructor_runnable"
]
},
{
@@ -618,9 +568,7 @@
"metadata": {},
"outputs": [],
"source": [
"chain = load_query_constructor_runnable(\n",
" ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0), doc_contents, attribute_info\n",
")"
"chain = load_query_constructor_runnable(ChatOpenAI(model='gpt-3.5-turbo', temperature=0), doc_contents, attribute_info)"
]
},
{
@@ -662,11 +610,7 @@
}
],
"source": [
"chain.invoke(\n",
" {\n",
" \"query\": \"Find a 2-person room in Vienna or London, preferably with meals included and AC\"\n",
" }\n",
")"
"chain.invoke({\"query\": \"Find a 2-person room in Vienna or London, preferably with meals included and AC\"})"
]
},
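For readers unfamiliar with the output shape here: `load_query_constructor_runnable` builds a chain that emits a structured query object (in this version of LangChain, `StructuredQuery` from `langchain.chains.query_constructor.ir`) with `query`, `filter`, and `limit` fields. A hedged sketch of inspecting it:

```python
# Attribute names assume langchain.chains.query_constructor.ir.StructuredQuery.
structured = chain.invoke(
    {"query": "Find a 2-person room in Vienna or London, preferably with meals included and AC"}
)
print(structured.query)   # free-text portion used for the vector search
print(structured.filter)  # structured comparisons, e.g. over country and maxoccupancy
```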
{
@@ -688,12 +632,10 @@
"metadata": {},
"outputs": [],
"source": [
"attribute_info[-3][\n",
" \"description\"\n",
"] += \". NOTE: Only use the 'eq' operator if a specific country is mentioned. If a region is mentioned, include all relevant countries in filter.\"\n",
"attribute_info[-3]['description'] += \". NOTE: Only use the 'eq' operator if a specific country is mentioned. If a region is mentioned, include all relevant countries in filter.\"\n",
"chain = load_query_constructor_runnable(\n",
" ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
" doc_contents,\n",
" ChatOpenAI(model='gpt-3.5-turbo', temperature=0), \n",
" doc_contents, \n",
" attribute_info,\n",
")"
]
@@ -738,12 +680,10 @@
"source": [
"content_attr = [\"roomtype\", \"roomamenities\", \"roomdescription\", \"hotelname\"]\n",
"doc_contents = \"A detailed description of a hotel room, including information about the room type and room amenities.\"\n",
"filter_attribute_info = tuple(\n",
" ai for ai in attribute_info if ai[\"name\"] not in content_attr\n",
")\n",
"filter_attribute_info = tuple(ai for ai in attribute_info if ai[\"name\"] not in content_attr)\n",
"chain = load_query_constructor_runnable(\n",
" ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
" doc_contents,\n",
" ChatOpenAI(model='gpt-3.5-turbo', temperature=0), \n",
" doc_contents, \n",
" filter_attribute_info,\n",
")"
]
@@ -766,11 +706,7 @@
}
],
"source": [
"chain.invoke(\n",
" {\n",
" \"query\": \"Find a 2-person room in Vienna or London, preferably with meals included and AC\"\n",
" }\n",
")"
"chain.invoke({\"query\": \"Find a 2-person room in Vienna or London, preferably with meals included and AC\"})"
]
},
{
@@ -900,22 +836,14 @@
"examples = [\n",
" (\n",
" \"I want a hotel in the Balkans with a king sized bed and a hot tub. Budget is $300 a night\",\n",
" {\n",
" \"query\": \"king-sized bed, hot tub\",\n",
" \"filter\": 'and(in(\"country\", [\"Bulgaria\", \"Greece\", \"Croatia\", \"Serbia\"]), lte(\"onsiterate\", 300))',\n",
" },\n",
" {\"query\": \"king-sized bed, hot tub\", \"filter\": 'and(in(\"country\", [\"Bulgaria\", \"Greece\", \"Croatia\", \"Serbia\"]), lte(\"onsiterate\", 300))'}\n",
" ),\n",
" (\n",
" \"A room with breakfast included for 3 people, at a Hilton\",\n",
" {\n",
" \"query\": \"Hilton\",\n",
" \"filter\": 'and(eq(\"mealsincluded\", true), gte(\"maxoccupancy\", 3))',\n",
" },\n",
" {\"query\": \"Hilton\", \"filter\": 'and(eq(\"mealsincluded\", true), gte(\"maxoccupancy\", 3))'}\n",
" ),\n",
"]\n",
"prompt = get_query_constructor_prompt(\n",
" doc_contents, filter_attribute_info, examples=examples\n",
")\n",
"prompt = get_query_constructor_prompt(doc_contents, filter_attribute_info, examples=examples)\n",
"print(prompt.format(query=\"{query}\"))"
]
},
@@ -927,10 +855,10 @@
"outputs": [],
"source": [
"chain = load_query_constructor_runnable(\n",
" ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
" doc_contents,\n",
" ChatOpenAI(model='gpt-3.5-turbo', temperature=0), \n",
" doc_contents, \n",
" filter_attribute_info,\n",
" examples=examples,\n",
" examples=examples\n",
")"
]
},
@@ -952,11 +880,7 @@
}
],
"source": [
"chain.invoke(\n",
" {\n",
" \"query\": \"Find a 2-person room in Vienna or London, preferably with meals included and AC\"\n",
" }\n",
")"
"chain.invoke({\"query\": \"Find a 2-person room in Vienna or London, preferably with meals included and AC\"})"
]
},
{
@@ -1008,11 +932,7 @@
}
],
"source": [
"chain.invoke(\n",
" {\n",
" \"query\": \"I want to stay somewhere highly rated along the coast. I want a room with a patio and a fireplace.\"\n",
" }\n",
")"
"chain.invoke({\"query\": \"I want to stay somewhere highly rated along the coast. I want a room with a patio and a fireplace.\"})"
]
},
{
@@ -1033,11 +953,11 @@
"outputs": [],
"source": [
"chain = load_query_constructor_runnable(\n",
" ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
" doc_contents,\n",
" ChatOpenAI(model='gpt-3.5-turbo', temperature=0), \n",
" doc_contents, \n",
" filter_attribute_info,\n",
" examples=examples,\n",
" fix_invalid=True,\n",
" fix_invalid=True\n",
")"
]
},
@@ -1059,11 +979,7 @@
}
],
"source": [
"chain.invoke(\n",
" {\n",
" \"query\": \"I want to stay somewhere highly rated along the coast. I want a room with a patio and a fireplace.\"\n",
" }\n",
")"
"chain.invoke({\"query\": \"I want to stay somewhere highly rated along the coast. I want a room with a patio and a fireplace.\"})"
]
},
{
@@ -1084,6 +1000,7 @@
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.schema import Document\n",
"from langchain.vectorstores import ElasticsearchStore\n",
"\n",
"embeddings = OpenAIEmbeddings()"
@@ -1115,8 +1032,8 @@
"# docs.append(doc)\n",
"# vecstore = ElasticsearchStore.from_documents(\n",
"# docs,\n",
"# embeddings,\n",
"# es_url=\"http://localhost:9200\",\n",
"# embeddings, \n",
"# es_url=\"http://localhost:9200\", \n",
"# index_name=\"hotel_rooms\",\n",
"# # strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
"# # hybrid=True,\n",
@@ -1132,9 +1049,9 @@
"outputs": [],
"source": [
"vecstore = ElasticsearchStore(\n",
" \"hotel_rooms\",\n",
" embedding=embeddings,\n",
" es_url=\"http://localhost:9200\",\n",
" \"hotel_rooms\", \n",
" embedding=embeddings, \n",
" es_url=\"http://localhost:9200\", \n",
" # strategy=ElasticsearchStore.ApproxRetrievalStrategy(hybrid=True) # seems to not be available in community version\n",
")"
]
@@ -1148,9 +1065,7 @@
"source": [
"from langchain.retrievers import SelfQueryRetriever\n",
"\n",
"retriever = SelfQueryRetriever(\n",
" query_constructor=chain, vectorstore=vecstore, verbose=True\n",
")"
"retriever = SelfQueryRetriever(query_constructor=chain, vectorstore=vecstore, verbose=True)"
]
},
{
@@ -1227,9 +1142,7 @@
}
],
"source": [
"results = retriever.get_relevant_documents(\n",
" \"I want to stay somewhere highly rated along the coast. I want a room with a patio and a fireplace.\"\n",
")\n",
"results = retriever.get_relevant_documents(\"I want to stay somewhere highly rated along the coast. I want a room with a patio and a fireplace.\")\n",
"for res in results:\n",
" print(res.page_content)\n",
" print(\"\\n\" + \"-\" * 20 + \"\\n\")"

View File

@@ -51,8 +51,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain_experimental.smart_llm import SmartLLMChain"
]
},

View File

@@ -11,7 +11,7 @@
"\n",
"Read the paper [here](https://arxiv.org/abs/2310.06117)\n",
"\n",
"See an excellent blog post on this by Cobus Greyling [here](https://cobusgreyling.medium.com/a-new-prompt-engineering-technique-has-been-introduced-called-step-back-prompting-b00e8954cacb)\n",
"See an excelent blog post on this by Cobus Greyling [here](https://cobusgreyling.medium.com/a-new-prompt-engineering-technique-has-been-introduced-called-step-back-prompting-b00e8954cacb)\n",
"\n",
"In this cookbook we will replicate this technique. We modify the prompts used slightly to work better with chat models."
]
@@ -40,11 +40,11 @@
"examples = [\n",
" {\n",
" \"input\": \"Could the members of The Police perform lawful arrests?\",\n",
" \"output\": \"what can the members of The Police do?\",\n",
" \"output\": \"what can the members of The Police do?\"\n",
" },\n",
" {\n",
" \"input\": \"Jan Sindels was born in what country?\",\n",
" \"output\": \"what is Jan Sindels personal history?\",\n",
" \"input\": \"Jan Sindels was born in what country?\", \n",
" \"output\": \"what is Jan Sindels personal history?\"\n",
" },\n",
"]\n",
"# We now transform these to example messages\n",
@@ -67,18 +67,13 @@
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"\"\"You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:\"\"\",\n",
" ),\n",
" # Few shot examples\n",
" few_shot_prompt,\n",
" # New question\n",
" (\"user\", \"{question}\"),\n",
" ]\n",
")"
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\", \"\"\"You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:\"\"\"),\n",
" # Few shot examples\n",
" few_shot_prompt,\n",
" # New question\n",
" (\"user\", \"{question}\"),\n",
"])"
]
},
{
@@ -131,8 +126,8 @@
"source": [
"from langchain.utilities import DuckDuckGoSearchAPIWrapper\n",
"\n",
"search = DuckDuckGoSearchAPIWrapper(max_results=4)\n",
"\n",
"search = DuckDuckGoSearchAPIWrapper(max_results=4)\n",
"\n",
"def retriever(query):\n",
" return search.run(query)"
@@ -216,19 +211,14 @@
"metadata": {},
"outputs": [],
"source": [
"chain = (\n",
" {\n",
" # Retrieve context using the normal question\n",
" \"normal_context\": RunnableLambda(lambda x: x[\"question\"]) | retriever,\n",
" # Retrieve context using the step-back question\n",
" \"step_back_context\": question_gen | retriever,\n",
" # Pass on the question\n",
" \"question\": lambda x: x[\"question\"],\n",
" }\n",
" | response_prompt\n",
" | ChatOpenAI(temperature=0)\n",
" | StrOutputParser()\n",
")"
"chain = {\n",
" # Retrieve context using the normal question\n",
" \"normal_context\": RunnableLambda(lambda x: x['question']) | retriever,\n",
" # Retrieve context using the step-back question\n",
" \"step_back_context\": question_gen | retriever,\n",
" # Pass on the question\n",
" \"question\": lambda x: x[\"question\"]\n",
"} | response_prompt | ChatOpenAI(temperature=0) | StrOutputParser()"
]
},
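One detail worth spelling out: the dict literal at the head of this chain is coerced by LCEL into a runnable map, so the normal and step-back retrievals both run before the prompt is formatted. A more explicit, equivalent sketch (assuming `RunnableMap` from this LangChain version):

```python
from langchain.schema.runnable import RunnableLambda, RunnableMap

context = RunnableMap(
    {
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        "step_back_context": question_gen | retriever,
        "question": lambda x: x["question"],
    }
)
chain = context | response_prompt | ChatOpenAI(temperature=0) | StrOutputParser()
```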
{
@@ -283,17 +273,12 @@
"metadata": {},
"outputs": [],
"source": [
"chain = (\n",
" {\n",
" # Retrieve context using the normal question (only the first 3 results)\n",
" \"normal_context\": RunnableLambda(lambda x: x[\"question\"]) | retriever,\n",
" # Pass on the question\n",
" \"question\": lambda x: x[\"question\"],\n",
" }\n",
" | response_prompt\n",
" | ChatOpenAI(temperature=0)\n",
" | StrOutputParser()\n",
")"
"chain = {\n",
" # Retrieve context using the normal question (only the first 3 results)\n",
" \"normal_context\": RunnableLambda(lambda x: x['question']) | retriever,\n",
" # Pass on the question\n",
" \"question\": lambda x: x[\"question\"]\n",
"} | response_prompt | ChatOpenAI(temperature=0) | StrOutputParser()"
]
},
{

View File

@@ -51,7 +51,7 @@
}
],
"source": [
"sudoku_puzzle = \"3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1\"\n",
"sudoku_puzzle = \"3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1\"\n",
"sudoku_solution = \"3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1\"\n",
"problem_description = f\"\"\"\n",
"{sudoku_puzzle}\n",
@@ -64,7 +64,7 @@
"- Keep the known digits from previous valid thoughts in place.\n",
"- Each thought can be a partial or the final solution.\n",
"\"\"\".strip()\n",
"print(problem_description)"
"print(problem_description)\n"
]
},
{
@@ -84,17 +84,13 @@
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"from typing import Tuple\n",
"\n",
"from langchain_experimental.tot.checker import ToTChecker\n",
"from langchain_experimental.tot.thought import ThoughtValidity\n",
"\n",
"import re\n",
"\n",
"class MyChecker(ToTChecker):\n",
" def evaluate(\n",
" self, problem_description: str, thoughts: Tuple[str, ...] = ()\n",
" ) -> ThoughtValidity:\n",
" def evaluate(self, problem_description: str, thoughts: Tuple[str, ...] = ()) -> ThoughtValidity:\n",
" last_thought = thoughts[-1]\n",
" clean_solution = last_thought.replace(\" \", \"\").replace('\"', \"\")\n",
" regex_solution = clean_solution.replace(\"*\", \".\").replace(\"|\", \"\\\\|\")\n",
@@ -120,22 +116,10 @@
"outputs": [],
"source": [
"checker = MyChecker()\n",
"assert (\n",
" checker.evaluate(\"\", (\"3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1\",))\n",
" == ThoughtValidity.VALID_INTERMEDIATE\n",
")\n",
"assert (\n",
" checker.evaluate(\"\", (\"3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1\",))\n",
" == ThoughtValidity.VALID_FINAL\n",
")\n",
"assert (\n",
" checker.evaluate(\"\", (\"3,4,1,2|1,2,3,4|2,1,4,3|4,3,*,1\",))\n",
" == ThoughtValidity.VALID_INTERMEDIATE\n",
")\n",
"assert (\n",
" checker.evaluate(\"\", (\"3,4,1,2|1,2,3,4|2,1,4,3|4,*,3,1\",))\n",
" == ThoughtValidity.INVALID\n",
")"
"assert checker.evaluate(\"\", (\"3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1\",)) == ThoughtValidity.VALID_INTERMEDIATE\n",
"assert checker.evaluate(\"\", (\"3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1\",)) == ThoughtValidity.VALID_FINAL\n",
"assert checker.evaluate(\"\", (\"3,4,1,2|1,2,3,4|2,1,4,3|4,3,*,1\",)) == ThoughtValidity.VALID_INTERMEDIATE\n",
"assert checker.evaluate(\"\", (\"3,4,1,2|1,2,3,4|2,1,4,3|4,*,3,1\",)) == ThoughtValidity.INVALID"
]
},
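The checker's core trick is that `*` cells become `.` in a regex, so every later thought must preserve the digits already known. A standalone illustration (not in the notebook):

```python
import re

prev = "3,*,*,2|1,*,3,*|*,1,*,3|4,*,*,1"
regex = prev.replace("*", ".").replace("|", "\\|")

# A thought that keeps all known digits matches the pattern...
assert re.match(regex, "3,4,1,2|1,2,3,4|2,1,4,3|4,3,2,1")
# ...while one that overwrites a known digit (the leading 3) does not.
assert re.match(regex, "1,4,3,2|1,2,3,4|2,1,4,3|4,3,2,1") is None
```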
{
@@ -219,9 +203,7 @@
"source": [
"from langchain_experimental.tot.base import ToTChain\n",
"\n",
"tot_chain = ToTChain(\n",
" llm=llm, checker=MyChecker(), k=30, c=5, verbose=True, verbose_llm=False\n",
")\n",
"tot_chain = ToTChain(llm=llm, checker=MyChecker(), k=30, c=5, verbose=True, verbose_llm=False)\n",
"tot_chain.run(problem_description=problem_description)"
]
},

View File

@@ -34,8 +34,8 @@
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"import getpass\n",
"\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
@@ -109,7 +109,6 @@
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"root_dir = \"./the-algorithm\"\n",
@@ -119,7 +118,7 @@
" try:\n",
" loader = TextLoader(os.path.join(dirpath, file), encoding=\"utf-8\")\n",
" docs.extend(loader.load_and_split())\n",
" except Exception:\n",
" except Exception as e:\n",
" pass"
]
},
@@ -3808,8 +3807,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo-0613\") # switch to 'gpt-4'\n",
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"

View File

@@ -22,14 +22,17 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Callable, List\n",
"\n",
"from typing import List, Dict, Callable\n",
"from langchain.chains import ConversationChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.prompts.prompt import PromptTemplate\n",
"from langchain.schema import (\n",
" AIMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" BaseMessage,\n",
")"
]
},
@@ -46,7 +49,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import AgentType, initialize_agent, load_tools"
"from langchain.agents import Tool\n",
"from langchain.agents import initialize_agent\n",
"from langchain.agents import AgentType\n",
"from langchain.agents import load_tools"
]
},
{

View File

@@ -22,8 +22,7 @@
"metadata": {},
"outputs": [],
"source": [
"from typing import Callable, List\n",
"\n",
"from typing import List, Dict, Callable\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema import (\n",
" HumanMessage,\n",

View File

@@ -35,7 +35,7 @@
"tags": []
},
"source": [
"### API keys and other secrets\n",
"### API keys and other secrats\n",
"\n",
"We use an `.ini` file, like this: \n",
"```\n",
@@ -192,10 +192,10 @@
" return current\n",
"\n",
"\n",
"from typing import Optional\n",
"\n",
"import requests\n",
"\n",
"from typing import Optional\n",
"\n",
"\n",
"def vocab_lookup(\n",
" search: str,\n",
@@ -319,10 +319,9 @@
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from typing import Any, Dict, List\n",
"\n",
"import requests\n",
"from typing import List, Dict, Any\n",
"import json\n",
"\n",
"\n",
"def run_sparql(\n",
@@ -390,18 +389,17 @@
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"from typing import List, Union\n",
"\n",
"from langchain.agents import (\n",
" AgentExecutor,\n",
" AgentOutputParser,\n",
" LLMSingleActionAgent,\n",
" Tool,\n",
" AgentExecutor,\n",
" LLMSingleActionAgent,\n",
" AgentOutputParser,\n",
")\n",
"from langchain.chains import LLMChain\n",
"from langchain.prompts import StringPromptTemplate\n",
"from langchain.schema import AgentAction, AgentFinish"
"from langchain.llms import OpenAI\nfrom langchain.chains import LLMChain\n",
"from typing import List, Union\n",
"from langchain.schema import AgentAction, AgentFinish\n",
"import re"
]
},
{

View File

@@ -14,8 +14,6 @@ cd ../_dist
poetry run python scripts/model_feat_table.py
poetry run nbdoc_build --srcdir docs
cp ../cookbook/README.md src/pages/cookbook.mdx
cp ../.github/CONTRIBUTING.md docs/contributing.md
wget https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O docs/langserve.md
poetry run python scripts/generate_api_reference_links.py
yarn install
yarn start

View File

@@ -15,11 +15,3 @@ pre {
#my-component-root *, #headlessui-portal-root * {
z-index: 10000;
}
table.longtable code {
white-space: normal;
}
table.longtable td {
max-width: 600px;
}

View File

@@ -2,9 +2,9 @@
import importlib
import inspect
import typing
from enum import Enum
from pathlib import Path
from typing import Dict, List, Literal, Optional, Sequence, TypedDict, Union
from typing import TypedDict, Sequence, List, Dict, Literal, Union, Optional
from enum import Enum
from pydantic import BaseModel
@@ -13,10 +13,8 @@ HERE = Path(__file__).parent
PKG_DIR = ROOT_DIR / "libs" / "langchain" / "langchain"
EXP_DIR = ROOT_DIR / "libs" / "experimental" / "langchain_experimental"
CORE_DIR = ROOT_DIR / "libs" / "core" / "langchain_core"
WRITE_FILE = HERE / "api_reference.rst"
EXP_WRITE_FILE = HERE / "experimental_api_reference.rst"
CORE_WRITE_FILE = HERE / "core_api_reference.rst"
ClassKind = Literal["TypedDict", "Regular", "Pydantic", "enum"]
@@ -294,17 +292,6 @@ def _document_langchain_experimental() -> None:
def _document_langchain_core() -> None:
"""Document the langchain_core package."""
# Generate core_api_reference.rst
core_members = _load_package_modules(CORE_DIR)
core_doc = ".. _core_api_reference:\n\n" + _construct_doc(
"langchain_core", core_members
)
with open(CORE_WRITE_FILE, "w") as f:
f.write(core_doc)
def _document_langchain() -> None:
"""Document the main langchain package."""
# load top level module members
lc_members = _load_package_modules(PKG_DIR)
@@ -319,6 +306,7 @@ def _document_langchain() -> None:
"agents.output_parsers": agents["output_parsers"],
"agents.format_scratchpad": agents["format_scratchpad"],
"tools.render": tools["render"],
"schema.runnable": schema["runnable"],
}
)
@@ -330,9 +318,8 @@ def _document_langchain() -> None:
def main() -> None:
"""Generate the reference.rst file for each package."""
_document_langchain()
_document_langchain_experimental()
_document_langchain_core()
_document_langchain_experimental()
if __name__ == "__main__":

File diff suppressed because one or more lines are too long

View File

@@ -1,6 +1,5 @@
-e libs/langchain
-e libs/experimental
-e libs/core
pydantic<2
autodoc_pydantic==1.8.0
myst_parser

View File

@@ -34,9 +34,6 @@
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('api_reference') }}">API</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('core_api_reference') }}">Core</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('experimental_api_reference') }}">Experimental</a>
</li>

BIN docs/docs/_static/ApifyActors.png (new vendored file; binary not shown, 559 KiB)
BIN docs/docs/_static/ChaindeskDashboard.png (new vendored file; binary not shown, 157 KiB)
BIN docs/docs/_static/HeliconeDashboard.png (new vendored file; binary not shown, 235 KiB)
BIN docs/docs/_static/HeliconeKeys.png (new vendored file; binary not shown, 148 KiB)
BIN docs/docs/_static/MetalDash.png (new vendored file; binary not shown, 3.5 MiB)
BIN (unnamed image file; binary not shown, 18 KiB)
BIN (unnamed image file; binary not shown, 85 KiB)
BIN docs/docs/_static/apple-touch-icon.png (new vendored file; binary not shown, 16 KiB)

21
docs/docs/_static/css/custom.css vendored Normal file
View File

@@ -0,0 +1,21 @@
pre {
white-space: break-spaces;
}
@media (min-width: 1200px) {
.container,
.container-lg,
.container-md,
.container-sm,
.container-xl {
max-width: 2560px !important;
}
}
#my-component-root *, #headlessui-portal-root * {
z-index: 10000;
}
.content-container p {
margin: revert;
}

BIN docs/docs/_static/favicon-16x16.png (new vendored file; binary not shown, 542 B)
BIN docs/docs/_static/favicon-32x32.png (new vendored file; binary not shown, 1.2 KiB)
BIN docs/docs/_static/favicon.ico (new vendored file; binary not shown, 15 KiB)

Some files were not shown because too many files have changed in this diff.