Compare commits

..

36 Commits

Author SHA1 Message Date
Harrison Chase
a9126073b6 cr 2023-07-19 21:01:27 -07:00
Harrison Chase
6fe73854f3 cr 2023-07-19 20:12:11 -07:00
Harrison Chase
f824b4cecc cr 2023-07-19 20:10:16 -07:00
Harrison Chase
f5d62be724 cr 2023-07-19 20:08:24 -07:00
Harrison Chase
2ddbca8c7b cr 2023-07-19 20:03:55 -07:00
Harrison Chase
d034a9f477 cr 2023-07-19 19:57:58 -07:00
Harrison Chase
e8465aaa15 cr 2023-07-19 19:30:30 -07:00
Harrison Chase
785c049d34 cr 2023-07-19 19:21:18 -07:00
Harrison Chase
afd928bac4 cr 2023-07-19 19:18:10 -07:00
Harrison Chase
8a5dad8898 cr 2023-07-19 19:11:47 -07:00
Harrison Chase
f1d0494cfd cr 2023-07-19 18:13:43 -07:00
Harrison Chase
eb3756d728 add experimental package 2023-07-19 18:12:44 -07:00
Harrison Chase
13a36f2c48 cr 2023-07-19 15:53:53 -07:00
Harrison Chase
813cf10abf cr 2023-07-19 15:43:45 -07:00
Harrison Chase
ec8ab91034 cr 2023-07-19 15:40:01 -07:00
Harrison Chase
6761f9919f cr 2023-07-19 15:38:18 -07:00
Harrison Chase
724173c580 cr 2023-07-19 15:37:34 -07:00
Harrison Chase
507b313ed2 cr 2023-07-19 15:34:50 -07:00
Harrison Chase
543af85647 cr 2023-07-19 15:31:55 -07:00
Harrison Chase
37ab378ea1 cr 2023-07-19 15:25:28 -07:00
Harrison Chase
265a95d3a9 cr 2023-07-19 15:23:57 -07:00
Harrison Chase
02eed9d707 cr 2023-07-19 15:09:25 -07:00
Harrison Chase
bba5a5e5e4 cr 2023-07-19 14:28:03 -07:00
Harrison Chase
058cca8357 cr 2023-07-19 14:23:20 -07:00
Harrison Chase
b090453110 cr 2023-07-19 14:22:01 -07:00
Harrison Chase
ab39a2faed Merge branch 'master' into harrison/experimental 2023-07-19 14:20:45 -07:00
Harrison Chase
4287c72873 cr 2023-07-19 14:20:02 -07:00
Harrison Chase
1b66b8cd06 cr 2023-07-19 14:18:31 -07:00
Harrison Chase
f8fbd5fcfc cr 2023-07-19 14:16:03 -07:00
Harrison Chase
3e41142408 cr 2023-07-19 14:15:45 -07:00
Harrison Chase
e1499748d8 cr 2023-07-19 14:09:09 -07:00
Harrison Chase
4f6597f5cf cr 2023-07-19 14:08:15 -07:00
Harrison Chase
c0e17b4c01 cr 2023-07-19 14:06:48 -07:00
Harrison Chase
e8505ac0a0 cr 2023-07-19 14:03:02 -07:00
Harrison Chase
eb411f91b5 cr 2023-07-19 13:57:25 -07:00
Harrison Chase
b1d5fc40a7 set up experimental 2023-07-19 13:49:41 -07:00
2569 changed files with 249323 additions and 177343 deletions

View File

@@ -15,11 +15,7 @@ You may use the button above, or follow these steps to open this repo in a Codes
For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).
## VS Code Dev Containers
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
Note: If you click this link you will open the main repo and not your local cloned repo, you can use this link and replace with your username and cloned repo name:
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
@@ -29,7 +25,7 @@ You can also follow these steps to open this repo in a container using the VS Co
2. Open a locally cloned copy of the code:
- Fork and Clone this repository to your local filesystem.
- Clone this repository to your local filesystem.
- Press <kbd>F1</kbd> and select the **Dev Containers: Open Folder in Container...** command.
- Select the cloned copy of this folder, wait for the container to start, and try things out!

View File

@@ -9,7 +9,7 @@ to contributions, whether they be in the form of new features, improved infra, b
### 👩‍💻 Contributing Code
To contribute to this project, please follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are a maintainer.
Please do not try to push directly to this repo unless you are maintainer.
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
maintainers.
@@ -21,7 +21,7 @@ It's essential that we maintain great documentation and testing. If you:
- Fix a bug
- Add a relevant unit or integration test when possible. These live in `tests/unit_tests` and `tests/integration_tests`.
- Make an improvement
- Update any affected example notebooks and documentation. These live in `docs`.
- Update any affected example notebooks and documentation. These lives in `docs`.
- Update unit and integration tests when relevant.
- Add a feature
- Add a demo notebook in `docs/modules`.
@@ -33,7 +33,7 @@ best way to get our attention.
### 🚩GitHub Issues
Our [issues](https://github.com/hwchase17/langchain/issues) page is kept up to date
with bugs, improvements, and feature requests.
with bugs, improvements, and feature requests.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help
organize issues.
@@ -43,8 +43,8 @@ If you start working on an issue, please assign it to yourself.
If you are adding an issue, please try to keep it focused on a single, modular bug/improvement/feature.
If two issues are related, or blocking, please link them rather than combining them.
We will try to keep these issues as up-to-date as possible, though
with the rapid rate of development in this field some may get out of date.
We will try to keep these issues as up to date as possible, though
with the rapid rate of develop in this field some may get out of date.
If you notice this happening, please let us know.
### 🙋Getting Help
@@ -61,33 +61,25 @@ we do not want these to get in the way of getting good code into the codebase.
> **Note:** You can run this repository locally (which is described below) or in a [development container](https://containers.dev/) (which is described in the [.devcontainer folder](https://github.com/hwchase17/langchain/tree/master/.devcontainer)).
This project uses [Poetry](https://python-poetry.org/) v1.5.1 as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
This project uses [Poetry](https://python-poetry.org/) as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
❗Note: If you use `Conda` or `Pyenv` as your environment/package manager, avoid dependency conflicts by doing the following first:
❗Note: If you use `Conda` or `Pyenv` as your environment / package manager, avoid dependency conflicts by doing the following first:
1. *Before installing Poetry*, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
2. Install Poetry v1.5.1 (see above)
2. Install Poetry (see above)
3. Tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
4. Continue with the following steps.
There are two separate projects in this repository:
- `langchain`: core langchain code, abstractions, and use cases
- `langchain.experimental`: more experimental code
Each of these has their OWN development environment.
In order to run any of the commands below, please move into their respective directories.
For example, to contribute to `langchain` run `cd libs/langchain` before getting started with the below.
To install requirements:
```bash
poetry install --with test
poetry install -E all
```
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage.
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage. Note the `-E all` flag will install all optional dependencies necessary for integration testing.
❗Note: If during installation you receive a `WheelFileValidationError` for `debugpy`, please make sure you are running Poetry v1.5.1. This bug was present in older versions of Poetry (e.g. 1.4.1) and has been resolved in newer releases. If you are still seeing this bug on v1.5.1, you may also try disabling "modern installation" (`poetry config installer.modern-installation false`) and re-installing requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
❗Note: If you're running Poetry 1.4.1 and receive a `WheelFileValidationError` for `debugpy` during installation, you can try either downgrading to Poetry 1.4.0 or disabling "modern installation" (`poetry config installer.modern-installation false`) and re-install requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
Now assuming `make` and `pytest` are installed, you should be able to run the common tasks in the following section. To double check, run `make test` under `libs/langchain`, all tests should pass. If they don't, you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
Now, you should be able to run the common tasks in the following section. To double check, run `make test`, all tests should pass. If they don't you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
## ✅ Common Tasks
@@ -134,7 +126,7 @@ We recognize linting can be annoying - if you do not want to do it, please conta
### Spellcheck
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
Note that `codespell` finds common typos, so it could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
Note that `codespell` finds common typos, so could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
To check spelling for this project:
@@ -174,10 +166,10 @@ Langchain relies heavily on optional dependencies to keep the Langchain package
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
that most users won't have it installed.
Users who do not have the dependency installed should be able to **import** your code without
any side effects (no warnings, no errors, no exceptions).
Users that do not have the dependency installed should be able to **import** your code without
any side effects (no warnings, no errors, no exceptions).
To introduce the dependency to the pyproject.toml file correctly, please do the following:
To introduce the dependency to the pyproject.toml file correctly, please do the following:
1. Add the dependency to the main group as an optional dependency
```bash
@@ -188,7 +180,7 @@ To introduce the dependency to the pyproject.toml file correctly, please do the
```bash
poetry lock --no-update
```
4. Add a unit test that the very least attempts to import the new code. Ideally, the unit
4. Add a unit test that the very least attempts to import the new code. Ideally the unit
test makes use of lightweight fixtures to test the logic of the code.
5. Please use the `@pytest.mark.requires(package_name)` decorator for any tests that require the dependency.
@@ -220,7 +212,7 @@ If you add new logic, please add a unit test.
Integration tests cover logic that requires making calls to outside APIs (often integration with other services).
**warning** Almost no tests should be integration tests.
**warning** Almost no tests should be integration tests.
Tests that require making network connections make it difficult for other
developers to test the code.
@@ -238,7 +230,7 @@ If you add support for a new external API, please add a new integration test.
### Adding a Jupyter Notebook
If you are adding a Jupyter Notebook example, you'll want to install the optional `dev` dependencies.
If you are adding a Jupyter notebook example, you'll want to install the optional `dev` dependencies.
To install dev dependencies:
@@ -256,9 +248,6 @@ When you run `poetry install`, the `langchain` package is installed as editable
## Documentation
While the code is split between `langchain` and `langchain.experimental`, the documentation is one holistic thing.
This covers how to get started contributing to documentation.
### Contribute Documentation
The docs directory contains Documentation and API Reference.
@@ -307,3 +296,4 @@ even patch releases may contain [non-backwards-compatible changes](https://semve
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.

View File

@@ -1,5 +1,5 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve LangChain. To report a security issue, please instead use the security option below.
description: Submit a bug report to help us improve LangChain
labels: ["02 Bug Report"]
body:
- type: markdown

View File

@@ -1,20 +1,28 @@
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer (see below),
- Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
See contribution guidelines for more information on how to write/run tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
Please make sure you're PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use. It lives in `docs/extras` directory.
2. an example notebook showing its use.
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the same people again.
See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

View File

@@ -15,77 +15,64 @@ inputs:
description: Poetry version
required: true
install-command:
description: Command run for installing dependencies
required: false
default: poetry install
cache-key:
description: Cache key to use for manual handling of caching
required: true
working-directory:
description: Directory whose poetry.lock file should be cached
required: true
description: Directory to run install-command in
required: false
default: ""
runs:
using: composite
steps:
- uses: actions/setup-python@v4
name: Setup python ${{ inputs.python-version }}
name: Setup python $${ inputs.python-version }}
with:
python-version: ${{ inputs.python-version }}
- uses: actions/cache@v3
id: cache-bin-poetry
name: Cache Poetry binary - Python ${{ inputs.python-version }}
id: cache-pip
name: Cache Pip ${{ inputs.python-version }}
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
with:
path: |
/opt/pipx/venvs/poetry
# This step caches the poetry installation, so make sure it's keyed on the poetry version as well.
key: bin-poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-${{ inputs.poetry-version }}
- name: Refresh shell hashtable and fixup softlinks
if: steps.cache-bin-poetry.outputs.cache-hit == 'true'
shell: bash
env:
POETRY_VERSION: ${{ inputs.poetry-version }}
PYTHON_VERSION: ${{ inputs.python-version }}
run: |
set -eux
# Refresh the shell hashtable, to ensure correct `which` output.
hash -r
# `actions/cache@v3` doesn't always seem able to correctly unpack softlinks.
# Delete and recreate the softlinks pipx expects to have.
rm /opt/pipx/venvs/poetry/bin/python
cd /opt/pipx/venvs/poetry/bin
ln -s "$(which "python$PYTHON_VERSION")" python
chmod +x python
cd /opt/pipx_bin/
ln -s /opt/pipx/venvs/poetry/bin/poetry poetry
chmod +x poetry
# Ensure everything got set up correctly.
/opt/pipx/venvs/poetry/bin/python --version
/opt/pipx_bin/poetry --version
- name: Install poetry
if: steps.cache-bin-poetry.outputs.cache-hit != 'true'
shell: bash
env:
POETRY_VERSION: ${{ inputs.poetry-version }}
PYTHON_VERSION: ${{ inputs.python-version }}
run: pipx install "poetry==$POETRY_VERSION" --python "python$PYTHON_VERSION" --verbose
- name: Restore pip and poetry cached dependencies
uses: actions/cache@v3
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "4"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
with:
path: |
~/.cache/pip
key: pip-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}
- run: pipx install poetry==${{ inputs.poetry-version }} --python python${{ inputs.python-version }}
shell: bash
- name: Check Poetry File
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
poetry check
- name: Check lock file
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
poetry lock --check
- uses: actions/cache@v3
id: cache-poetry
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
with:
path: |
~/.cache/pypoetry/virtualenvs
~/.cache/pypoetry/cache
~/.cache/pypoetry/artifacts
${{ env.WORKDIR }}/.venv
key: py-deps-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles(format('{0}/**/poetry.lock', env.WORKDIR)) }}
key: poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles('poetry.lock') }}
- run: ${{ inputs.install-command }}
working-directory: ${{ inputs.working-directory }}
shell: bash

View File

@@ -1,606 +0,0 @@
#!/usr/bin/env python3
#
# git-restore-mtime - Change mtime of files based on commit date of last change
#
# Copyright (C) 2012 Rodrigo Silva (MestreLion) <linux@rodrigosilva.com>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. See <http://www.gnu.org/licenses/gpl.html>
#
# Source: https://github.com/MestreLion/git-tools
# Version: July 13, 2023 (commit hash 5f832e72453e035fccae9d63a5056918d64476a2)
"""
Change the modification time (mtime) of files in work tree, based on the
date of the most recent commit that modified the file, including renames.
Ignores untracked files and uncommitted deletions, additions and renames, and
by default modifications too.
---
Useful prior to generating release tarballs, so each file is archived with a
date that is similar to the date when the file was actually last modified,
assuming the actual modification date and its commit date are close.
"""
# TODO:
# - Add -z on git whatchanged/ls-files, so we don't deal with filename decoding
# - When Python is bumped to 3.7, use text instead of universal_newlines on subprocess
# - Update "Statistics for some large projects" with modern hardware and repositories.
# - Create a README.md for git-restore-mtime alone. It deserves extensive documentation
# - Move Statistics there
# - See git-extras as a good example on project structure and documentation
# FIXME:
# - When current dir is outside the worktree, e.g. using --work-tree, `git ls-files`
# assume any relative pathspecs are to worktree root, not the current dir. As such,
# relative pathspecs may not work.
# - Renames are tricky:
# - R100 should not change mtime, but original name is not on filelist. Should
# track renames until a valid (A, M) mtime found and then set on current name.
# - Should set mtime for both current and original directories.
# - Check mode changes with unchanged blobs?
# - Check file (A, D) for the directory mtime is not sufficient:
# - Renames also change dir mtime, unless rename was on a parent dir
# - If most recent change of all files in a dir was a Modification (M),
# dir might not be touched at all.
# - Dirs containing only subdirectories but no direct files will also
# not be touched. They're files' [grand]parent dir, but never their dirname().
# - Some solutions:
# - After files done, perform some dir processing for missing dirs, finding latest
# file (A, D, R)
# - Simple approach: dir mtime is the most recent child (dir or file) mtime
# - Use a virtual concept of "created at most at" to fill missing info, bubble up
# to parents and grandparents
# - When handling [grand]parent dirs, stay inside <pathspec>
# - Better handling of merge commits. `-m` is plain *wrong*. `-c/--cc` is perfect, but
# painfully slow. First pass without merge commits is not accurate. Maybe add a new
# `--accurate` mode for `--cc`?
if __name__ != "__main__":
raise ImportError("{} should not be used as a module.".format(__name__))
import argparse
import datetime
import logging
import os.path
import shlex
import signal
import subprocess
import sys
import time
__version__ = "2022.12+dev"
# Update symlinks only if the platform supports not following them
UPDATE_SYMLINKS = bool(os.utime in getattr(os, 'supports_follow_symlinks', []))
# Call os.path.normpath() only if not in a POSIX platform (Windows)
NORMALIZE_PATHS = (os.path.sep != '/')
# How many files to process in each batch when re-trying merge commits
STEPMISSING = 100
# (Extra) keywords for the os.utime() call performed by touch()
UTIME_KWS = {} if not UPDATE_SYMLINKS else {'follow_symlinks': False}
# Command-line interface ######################################################
def parse_args():
parser = argparse.ArgumentParser(
description=__doc__.split('\n---')[0])
group = parser.add_mutually_exclusive_group()
group.add_argument('--quiet', '-q', dest='loglevel',
action="store_const", const=logging.WARNING, default=logging.INFO,
help="Suppress informative messages and summary statistics.")
group.add_argument('--verbose', '-v', action="count", help="""
Print additional information for each processed file.
Specify twice to further increase verbosity.
""")
parser.add_argument('--cwd', '-C', metavar="DIRECTORY", help="""
Run as if %(prog)s was started in directory %(metavar)s.
This affects how --work-tree, --git-dir and PATHSPEC arguments are handled.
See 'man 1 git' or 'git --help' for more information.
""")
parser.add_argument('--git-dir', dest='gitdir', metavar="GITDIR", help="""
Path to the git repository, by default auto-discovered by searching
the current directory and its parents for a .git/ subdirectory.
""")
parser.add_argument('--work-tree', dest='workdir', metavar="WORKTREE", help="""
Path to the work tree root, by default the parent of GITDIR if it's
automatically discovered, or the current directory if GITDIR is set.
""")
parser.add_argument('--force', '-f', default=False, action="store_true", help="""
Force updating files with uncommitted modifications.
Untracked files and uncommitted deletions, renames and additions are
always ignored.
""")
parser.add_argument('--merge', '-m', default=False, action="store_true", help="""
Include merge commits.
Leads to more recent times and more files per commit, thus with the same
time, which may or may not be what you want.
Including merge commits may lead to fewer commits being evaluated as files
are found sooner, which can improve performance, sometimes substantially.
But as merge commits are usually huge, processing them may also take longer.
By default, merge commits are only used for files missing from regular commits.
""")
parser.add_argument('--first-parent', default=False, action="store_true", help="""
Consider only the first parent, the "main branch", when evaluating merge commits.
Only effective when merge commits are processed, either when --merge is
used or when finding missing files after the first regular log search.
See --skip-missing.
""")
parser.add_argument('--skip-missing', '-s', dest="missing", default=True,
action="store_false", help="""
Do not try to find missing files.
If merge commits were not evaluated with --merge and some files were
not found in regular commits, by default %(prog)s searches for these
files again in the merge commits.
This option disables this retry, so files found only in merge commits
will not have their timestamp updated.
""")
parser.add_argument('--no-directories', '-D', dest='dirs', default=True,
action="store_false", help="""
Do not update directory timestamps.
By default, use the time of its most recently created, renamed or deleted file.
Note that just modifying a file will NOT update its directory time.
""")
parser.add_argument('--test', '-t', default=False, action="store_true",
help="Test run: do not actually update any file timestamp.")
parser.add_argument('--commit-time', '-c', dest='commit_time', default=False,
action='store_true', help="Use commit time instead of author time.")
parser.add_argument('--oldest-time', '-o', dest='reverse_order', default=False,
action='store_true', help="""
Update times based on the oldest, instead of the most recent commit of a file.
This reverses the order in which the git log is processed to emulate a
file "creation" date. Note this will be inaccurate for files deleted and
re-created at later dates.
""")
parser.add_argument('--skip-older-than', metavar='SECONDS', type=int, help="""
Ignore files that are currently older than %(metavar)s.
Useful in workflows that assume such files already have a correct timestamp,
as it may improve performance by processing fewer files.
""")
parser.add_argument('--skip-older-than-commit', '-N', default=False,
action='store_true', help="""
Ignore files older than the timestamp it would be updated to.
Such files may be considered "original", likely in the author's repository.
""")
parser.add_argument('--unique-times', default=False, action="store_true", help="""
Set the microseconds to a unique value per commit.
Allows telling apart changes that would otherwise have identical timestamps,
as git's time accuracy is in seconds.
""")
parser.add_argument('pathspec', nargs='*', metavar='PATHSPEC', help="""
Only modify paths matching %(metavar)s, relative to current directory.
By default, update all but untracked files and submodules.
""")
parser.add_argument('--version', '-V', action='version',
version='%(prog)s version {version}'.format(version=get_version()))
args_ = parser.parse_args()
if args_.verbose:
args_.loglevel = max(logging.TRACE, logging.DEBUG // args_.verbose)
args_.debug = args_.loglevel <= logging.DEBUG
return args_
def get_version(version=__version__):
if not version.endswith('+dev'):
return version
try:
cwd = os.path.dirname(os.path.realpath(__file__))
return Git(cwd=cwd, errors=False).describe().lstrip('v')
except Git.Error:
return '-'.join((version, "unknown"))
# Helper functions ############################################################
def setup_logging():
"""Add TRACE logging level and corresponding method, return the root logger"""
logging.TRACE = TRACE = logging.DEBUG // 2
logging.Logger.trace = lambda _, m, *a, **k: _.log(TRACE, m, *a, **k)
return logging.getLogger()
def normalize(path):
r"""Normalize paths from git, handling non-ASCII characters.
Git stores paths as UTF-8 normalization form C.
If path contains non-ASCII or non-printable characters, git outputs the UTF-8
in octal-escaped notation, escaping double-quotes and backslashes, and then
double-quoting the whole path.
https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath
This function reverts this encoding, so:
normalize(r'"Back\\slash_double\"quote_a\303\247a\303\255"') =>
r'Back\slash_double"quote_açaí')
Paths with invalid UTF-8 encoding, such as single 0x80-0xFF bytes (e.g, from
Latin1/Windows-1251 encoding) are decoded using surrogate escape, the same
method used by Python for filesystem paths. So 0xE6 ("æ" in Latin1, r'\\346'
from Git) is decoded as "\udce6". See https://peps.python.org/pep-0383/ and
https://vstinner.github.io/painful-history-python-filesystem-encoding.html
Also see notes on `windows/non-ascii-paths.txt` about path encodings on
non-UTF-8 platforms and filesystems.
"""
if path and path[0] == '"':
# Python 2: path = path[1:-1].decode("string-escape")
# Python 3: https://stackoverflow.com/a/46650050/624066
path = (path[1:-1] # Remove enclosing double quotes
.encode('latin1') # Convert to bytes, required by 'unicode-escape'
.decode('unicode-escape') # Perform the actual octal-escaping decode
.encode('latin1') # 1:1 mapping to bytes, UTF-8 encoded
.decode('utf8', 'surrogateescape')) # Decode from UTF-8
if NORMALIZE_PATHS:
# Make sure the slash matches the OS; for Windows we need a backslash
path = os.path.normpath(path)
return path
def dummy(*_args, **_kwargs):
"""No-op function used in dry-run tests"""
def touch(path, mtime):
"""The actual mtime update"""
os.utime(path, (mtime, mtime), **UTIME_KWS)
def touch_ns(path, mtime_ns):
"""The actual mtime update, using nanoseconds for unique timestamps"""
os.utime(path, None, ns=(mtime_ns, mtime_ns), **UTIME_KWS)
def isodate(secs: int):
# time.localtime() accepts floats, but discards fractional part
return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(secs))
def isodate_ns(ns: int):
# for integers fromtimestamp() is equivalent and ~16% slower than isodate()
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=' ')
def get_mtime_ns(secs: int, idx: int):
# Time resolution for filesystems and functions:
# ext-4 and other POSIX filesystems: 1 nanosecond
# NTFS (Windows default): 100 nanoseconds
# datetime.datetime() (due to 64-bit float epoch): 1 microsecond
us = idx % 1000000 # 10**6
return 1000 * (1000000 * secs + us)
def get_mtime_path(path):
return os.path.getmtime(path)
# Git class and parse_log(), the heart of the script ##########################
class Git:
def __init__(self, workdir=None, gitdir=None, cwd=None, errors=True):
self.gitcmd = ['git']
self.errors = errors
self._proc = None
if workdir: self.gitcmd.extend(('--work-tree', workdir))
if gitdir: self.gitcmd.extend(('--git-dir', gitdir))
if cwd: self.gitcmd.extend(('-C', cwd))
self.workdir, self.gitdir = self._get_repo_dirs()
def ls_files(self, paths: list = None):
return (normalize(_) for _ in self._run('ls-files --full-name', paths))
def ls_dirty(self, force=False):
return (normalize(_[3:].split(' -> ', 1)[-1])
for _ in self._run('status --porcelain')
if _[:2] != '??' and (not force or (_[0] in ('R', 'A')
or _[1] == 'D')))
def log(self, merge=False, first_parent=False, commit_time=False,
reverse_order=False, paths: list = None):
cmd = 'whatchanged --pretty={}'.format('%ct' if commit_time else '%at')
if merge: cmd += ' -m'
if first_parent: cmd += ' --first-parent'
if reverse_order: cmd += ' --reverse'
return self._run(cmd, paths)
def describe(self):
return self._run('describe --tags', check=True)[0]
def terminate(self):
if self._proc is None:
return
try:
self._proc.terminate()
except OSError:
# Avoid errors on OpenBSD
pass
def _get_repo_dirs(self):
return (os.path.normpath(_) for _ in
self._run('rev-parse --show-toplevel --absolute-git-dir', check=True))
def _run(self, cmdstr: str, paths: list = None, output=True, check=False):
cmdlist = self.gitcmd + shlex.split(cmdstr)
if paths:
cmdlist.append('--')
cmdlist.extend(paths)
popen_args = dict(universal_newlines=True, encoding='utf8')
if not self.errors:
popen_args['stderr'] = subprocess.DEVNULL
log.trace("Executing: %s", ' '.join(cmdlist))
if not output:
return subprocess.call(cmdlist, **popen_args)
if check:
try:
stdout: str = subprocess.check_output(cmdlist, **popen_args)
return stdout.splitlines()
except subprocess.CalledProcessError as e:
raise self.Error(e.returncode, e.cmd, e.output, e.stderr)
self._proc = subprocess.Popen(cmdlist, stdout=subprocess.PIPE, **popen_args)
return (_.rstrip() for _ in self._proc.stdout)
def __del__(self):
self.terminate()
class Error(subprocess.CalledProcessError):
"""Error from git executable"""
def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
mtime = 0
datestr = isodate(0)
for line in git.log(
merge,
args.first_parent,
args.commit_time,
args.reverse_order,
filterlist
):
stats['loglines'] += 1
# Blank line between Date and list of files
if not line:
continue
# Date line
if line[0] != ':': # Faster than `not line.startswith(':')`
stats['commits'] += 1
mtime = int(line)
if args.unique_times:
mtime = get_mtime_ns(mtime, stats['commits'])
if args.debug:
datestr = isodate(mtime)
continue
# File line: three tokens if it describes a renaming, otherwise two
tokens = line.split('\t')
# Possible statuses:
# M: Modified (content changed)
# A: Added (created)
# D: Deleted
# T: Type changed: to/from regular file, symlinks, submodules
# R099: Renamed (moved), with % of unchanged content. 100 = pure rename
# Not possible in log: C=Copied, U=Unmerged, X=Unknown, B=pairing Broken
status = tokens[0].split(' ')[-1]
file = tokens[-1]
# Handles non-ASCII chars and OS path separator
file = normalize(file)
def do_file():
if args.skip_older_than_commit and get_mtime_path(file) <= mtime:
stats['skip'] += 1
return
if args.debug:
log.debug("%d\t%d\t%d\t%s\t%s",
stats['loglines'], stats['commits'], stats['files'],
datestr, file)
try:
touch(os.path.join(git.workdir, file), mtime)
stats['touches'] += 1
except Exception as e:
log.error("ERROR: %s: %s", e, file)
stats['errors'] += 1
def do_dir():
if args.debug:
log.debug("%d\t%d\t-\t%s\t%s",
stats['loglines'], stats['commits'],
datestr, "{}/".format(dirname or '.'))
try:
touch(os.path.join(git.workdir, dirname), mtime)
stats['dirtouches'] += 1
except Exception as e:
log.error("ERROR: %s: %s", e, dirname)
stats['direrrors'] += 1
if file in filelist:
stats['files'] -= 1
filelist.remove(file)
do_file()
if args.dirs and status in ('A', 'D'):
dirname = os.path.dirname(file)
if dirname in dirlist:
dirlist.remove(dirname)
do_dir()
# All files done?
if not stats['files']:
git.terminate()
return
# Main Logic ##################################################################
def main():
start = time.time() # yes, Wall time. CPU time is not realistic for users.
stats = {_: 0 for _ in ('loglines', 'commits', 'touches', 'skip', 'errors',
'dirtouches', 'direrrors')}
logging.basicConfig(level=args.loglevel, format='%(message)s')
log.trace("Arguments: %s", args)
# First things first: Where and Who are we?
if args.cwd:
log.debug("Changing directory: %s", args.cwd)
try:
os.chdir(args.cwd)
except OSError as e:
log.critical(e)
return e.errno
# Using both os.chdir() and `git -C` is redundant, but might prevent side effects
# `git -C` alone could be enough if we make sure that:
# - all paths, including args.pathspec, are processed by git: ls-files, rev-parse
# - touch() / os.utime() path argument is always prepended with git.workdir
try:
git = Git(workdir=args.workdir, gitdir=args.gitdir, cwd=args.cwd)
except Git.Error as e:
# Not in a git repository, and git already informed user on stderr. So we just...
return e.returncode
# Get the files managed by git and build file list to be processed
if UPDATE_SYMLINKS and not args.skip_older_than:
filelist = set(git.ls_files(args.pathspec))
else:
filelist = set()
for path in git.ls_files(args.pathspec):
fullpath = os.path.join(git.workdir, path)
# Symlink (to file, to dir or broken - git handles the same way)
if not UPDATE_SYMLINKS and os.path.islink(fullpath):
log.warning("WARNING: Skipping symlink, no OS support for updates: %s",
path)
continue
# skip files which are older than given threshold
if (args.skip_older_than
and start - get_mtime_path(fullpath) > args.skip_older_than):
continue
# Always add files relative to worktree root
filelist.add(path)
# If --force, silently ignore uncommitted deletions (not in the filesystem)
# and renames / additions (will not be found in log anyway)
if args.force:
filelist -= set(git.ls_dirty(force=True))
# Otherwise, ignore any dirty files
else:
dirty = set(git.ls_dirty())
if dirty:
log.warning("WARNING: Modified files in the working directory were ignored."
"\nTo include such files, commit your changes or use --force.")
filelist -= dirty
# Build dir list to be processed
dirlist = set(os.path.dirname(_) for _ in filelist) if args.dirs else set()
stats['totalfiles'] = stats['files'] = len(filelist)
log.info("{0:,} files to be processed in work dir".format(stats['totalfiles']))
if not filelist:
# Nothing to do. Exit silently and without errors, just like git does
return
# Process the log until all files are 'touched'
log.debug("Line #\tLog #\tF.Left\tModification Time\tFile Name")
parse_log(filelist, dirlist, stats, git, args.merge, args.pathspec)
# Missing files
if filelist:
# Try to find them in merge logs, if not done already
# (usually HUGE, thus MUCH slower!)
if args.missing and not args.merge:
filterlist = list(filelist)
missing = len(filterlist)
log.info("{0:,} files not found in log, trying merge commits".format(missing))
for i in range(0, missing, STEPMISSING):
parse_log(filelist, dirlist, stats, git,
merge=True, filterlist=filterlist[i:i + STEPMISSING])
# Still missing some?
for file in filelist:
log.warning("WARNING: not found in the log: %s", file)
# Final statistics
# Suggestion: use git-log --before=mtime to brag about skipped log entries
def log_info(msg, *a, width=13):
ifmt = '{:%d,}' % (width,) # not using 'n' for consistency with ffmt
ffmt = '{:%d,.2f}' % (width,)
# %-formatting lacks a thousand separator, must pre-render with .format()
log.info(msg.replace('%d', ifmt).replace('%f', ffmt).format(*a))
log_info(
"Statistics:\n"
"%f seconds\n"
"%d log lines processed\n"
"%d commits evaluated",
time.time() - start, stats['loglines'], stats['commits'])
if args.dirs:
if stats['direrrors']: log_info("%d directory update errors", stats['direrrors'])
log_info("%d directories updated", stats['dirtouches'])
if stats['touches'] != stats['totalfiles']:
log_info("%d files", stats['totalfiles'])
if stats['skip']: log_info("%d files skipped", stats['skip'])
if stats['files']: log_info("%d files missing", stats['files'])
if stats['errors']: log_info("%d file update errors", stats['errors'])
log_info("%d files updated", stats['touches'])
if args.test:
log.info("TEST RUN - No files modified!")
# Keep only essential, global assignments here. Any other logic must be in main()
log = setup_logging()
args = parse_args()
# Set the actual touch() and other functions based on command-line arguments
if args.unique_times:
touch = touch_ns
isodate = isodate_ns
# Make sure this is always set last to ensure --test behaves as intended
if args.test:
touch = dummy
# UI done, it's showtime!
try:
sys.exit(main())
except KeyboardInterrupt:
log.info("\nAborting")
signal.signal(signal.SIGINT, signal.SIG_DFL)
os.kill(os.getpid(), signal.SIGINT)

View File

@@ -9,142 +9,38 @@ on:
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.5.1"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
POETRY_VERSION: "1.4.2"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
env:
# This number is set "by eye": we want it to be big enough
# so that it's bigger than the number of commits in any reasonable PR,
# and also as small as possible since increasing the number makes
# the initial `git fetch` slower.
FETCH_DEPTH: 50
strategy:
matrix:
# Only lint on the min and max supported Python versions.
# It's extremely unlikely that there's a lint issue on any version in between
# that doesn't show up on the min or max versions.
#
# GitHub rate-limits how many jobs can be running at any one time.
# Starting new jobs is also relatively slow,
# so linting on fewer versions makes CI faster.
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
steps:
- uses: actions/checkout@v3
with:
# Fetch the last FETCH_DEPTH commits, so the mtime-changing script
# can accurately set the mtimes of files modified in the last FETCH_DEPTH commits.
fetch-depth: ${{ env.FETCH_DEPTH }}
- name: Restore workdir file mtimes to last-edited commit date
id: restore-mtimes
# This is needed to make black caching work.
# Black's cache uses file (mtime, size) to check whether a lookup is a cache hit.
# Without this command, files in the repo would have the current time as the modified time,
# since the previous action step just created them.
# This command resets the mtime to the last time the files were modified in git instead,
# which is a high-quality and stable representation of the last modification date.
- name: Install poetry
run: |
# Important considerations:
# - These commands run at base of the repo, since we never `cd` to the `WORKDIR`.
# - We only want to alter mtimes for Python files, since that's all black checks.
# - We don't need to alter mtimes for directories, since black doesn't look at those.
# - We also only alter mtimes inside the `WORKDIR` since that's all we'll lint.
# - This should run before `poetry install`, because poetry's venv also contains
# Python files, and we don't want to alter their mtimes since they aren't linted.
# Ensure we fail on non-zero exits and on undefined variables.
# Also print executed commands, for easier debugging.
set -eux
# Restore the mtimes of Python files in the workdir based on git history.
.github/tools/git-restore-mtime --no-directories "$WORKDIR/**/*.py"
# Since CI only does a partial fetch (to `FETCH_DEPTH`) for efficiency,
# the local git repo doesn't have full history. There are probably files
# that were last modified in a commit *older than* the oldest fetched commit.
# After `git-restore-mtime`, such files have a mtime set to the oldest fetched commit.
#
# As new commits get added, that timestamp will keep moving forward.
# If left unchanged, this will make `black` think that the files were edited
# more recently than its cache suggests. Instead, we can set their mtime
# to a fixed date in the far past that won't change and won't cause cache misses in black.
#
# For all workdir Python files modified in or before the oldest few fetched commits,
# make their mtime be 2000-01-01 00:00:00.
OLDEST_COMMIT="$(git log --reverse '--pretty=format:%H' | head -1)"
OLDEST_COMMIT_TIME="$(git show -s '--format=%ai' "$OLDEST_COMMIT")"
find "$WORKDIR" -name '*.py' -type f -not -newermt "$OLDEST_COMMIT_TIME" -exec touch -c -m -t '200001010000' '{}' '+'
echo "oldest-commit=$OLDEST_COMMIT" >> "$GITHUB_OUTPUT"
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
pipx install poetry==$POETRY_VERSION
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: lint-with-extras
- name: Check Poetry File
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
poetry check
- name: Check lock file
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
poetry lock --check
cache: poetry
- name: Install dependencies
# Also installs dev/lint/test/typing dependencies, to ensure we have
# type hints for as many of our libraries as possible.
# This helps catch errors that require dependencies to be spotted, for example:
# https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341
#
# If you change this configuration, make sure to change the `cache-key`
# in the `poetry_setup` action above to stop using the old cache.
# It doesn't matter how you change it, any change will cause a cache-bust.
working-directory: ${{ inputs.working-directory }}
run: |
poetry install --with dev,lint,test,typing
poetry install
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.working-directory != 'libs/langchain' }}
if: ${{ inputs.working-directory != 'langchain' }}
run: |
pip install -e ../langchain
- name: Restore black cache
uses: actions/cache@v3
env:
CACHE_BASE: black-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
with:
path: |
${{ env.WORKDIR }}/.black_cache
key: ${{ env.CACHE_BASE }}-${{ steps.restore-mtimes.outputs.oldest-commit }}
restore-keys:
# If we can't find an exact match for our cache key, accept any with this prefix.
${{ env.CACHE_BASE }}-
- name: Get .mypy_cache to speed up mypy
uses: actions/cache@v3
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
with:
path: |
${{ env.WORKDIR }}/.mypy_cache
key: mypy-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
env:
BLACK_CACHE_DIR: .black_cache
run: |
make lint

View File

@@ -1,93 +0,0 @@
name: pydantic v1/v2 compatibility
on:
workflow_call:
inputs:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.5.1"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Pydantic v1/v2 compatibility - Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: pydantic-cross-compat
- name: Install dependencies
shell: bash
run: poetry install
- name: Install the opposite major version of pydantic
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
shell: bash
run: |
# Determine the major part of pydantic version
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
if [[ "$REGULAR_VERSION" == "1" ]]; then
PYDANTIC_DEP=">=2.1,<3"
TEST_WITH_VERSION="2"
elif [[ "$REGULAR_VERSION" == "2" ]]; then
PYDANTIC_DEP="<2"
TEST_WITH_VERSION="1"
else
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
exit 1
fi
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
# which would prevent caching from working: the cache would get saved
# to a different key than where it gets loaded from.
poetry run pip install "pydantic${PYDANTIC_DEP}"
# Ensure that the correct pydantic is installed now.
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
# Determine the major part of pydantic version
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
# Check that the major part of pydantic version is as expected, if not
# raise an error
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
echo "Error: expected pydantic version ${CURRENT_VERSION} to have been installed, but found: ${TEST_WITH_VERSION}"
exit 1
fi
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
- name: Run pydantic compatibility tests
shell: bash
run: make test
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -9,37 +9,26 @@ on:
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.5.1"
POETRY_VERSION: "1.4.2"
jobs:
if_release:
# Disallow publishing from branches that aren't `master`.
if: github.ref == 'refs/heads/master'
if: |
${{ github.event.pull_request.merged == true }}
&& ${{ contains(github.event.pull_request.labels.*.name, 'release') }}
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
#
# Trusted publishing has to also be configured on PyPI for each package:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
# This permission is needed by `ncipollo/release-action` to create the GitHub release.
contents: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v3
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
- name: Install poetry
run: pipx install poetry==$POETRY_VERSION
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: "3.10"
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
cache: "poetry"
- name: Build project for distribution
run: poetry build
- name: Check Version
@@ -48,7 +37,6 @@ jobs:
echo version=$(poetry version --short) >> $GITHUB_OUTPUT
- name: Create Release
uses: ncipollo/release-action@v1
if: ${{ inputs.working-directory == 'libs/langchain' }}
with:
artifacts: "dist/*"
token: ${{ secrets.GITHUB_TOKEN }}
@@ -56,9 +44,8 @@ jobs:
generateReleaseNotes: true
tag: v${{ steps.check-version.outputs.version }}
commit: master
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true
- name: Publish to PyPI
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_API_TOKEN }}
run: |
poetry publish

View File

@@ -7,9 +7,13 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
test_type:
type: string
description: "Test types to run"
default: '["core", "extended"]'
env:
POETRY_VERSION: "1.5.1"
POETRY_VERSION: "1.4.2"
jobs:
build:
@@ -24,34 +28,34 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }}
test_type: ${{ fromJSON(inputs.test_type) }}
name: Python ${{ matrix.python-version }} ${{ matrix.test_type }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
- name: Set up Python ${{ matrix.python-version }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: core
- name: Install dependencies
shell: bash
run: poetry install
- name: Run core tests
shell: bash
run: make test
- name: Ensure the tests did not create any additional files
shell: bash
poetry-version: "1.4.2"
cache-key: ${{ matrix.test_type }}
install-command: |
if [ "${{ matrix.test_type }}" == "core" ]; then
echo "Running core tests, installing dependencies with poetry..."
poetry install
else
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
fi
- name: Install langchain editable
if: ${{ inputs.working-directory != 'langchain' }}
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'
pip install -e ../langchain
- name: Run ${{matrix.test_type}} tests
run: |
if [ "${{ matrix.test_type }}" == "core" ]; then
make test
else
make extended_tests
fi
shell: bash

View File

@@ -20,5 +20,3 @@ jobs:
uses: actions/checkout@v3
- name: Codespell
uses: codespell-project/actions-codespell@v2
with:
skip: guide_imports.json

View File

@@ -1,22 +0,0 @@
---
name: Documentation Lint
on:
push:
branches: [master]
pull_request:
branches: [master]
jobs:
check:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Run import check
run: |
# We should not encourage imports directly from main init file
# Expect for hub
git grep 'from langchain import' docs | grep -vE 'from langchain import (hub)' && exit 1 || exit 0

View File

@@ -6,29 +6,12 @@ on:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/_pydantic_compatibility.yml'
- '.github/workflows/langchain_ci.yml'
- 'libs/langchain/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.5.1"
WORKDIR: "libs/langchain"
jobs:
lint:
uses:
@@ -36,62 +19,9 @@ jobs:
with:
working-directory: libs/langchain
secrets: inherit
test:
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/langchain
secrets: inherit
pydantic-compatibility:
uses:
./.github/workflows/_pydantic_compatibility.yml
with:
working-directory: libs/langchain
secrets: inherit
extended-tests:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ env.WORKDIR }}
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }} extended tests
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: libs/langchain
cache-key: extended
- name: Install dependencies
shell: bash
run: |
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
- name: Run extended tests
run: make extended_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'
secrets: inherit

View File

@@ -1,129 +1,28 @@
---
name: libs/experimental CI
name: libs/langchain-experimental CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/_test.yml'
- '.github/workflows/langchain_experimental_ci.yml'
- 'libs/langchain/**'
- 'libs/experimental/**'
- 'libs/langchain-experimental/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.5.1"
WORKDIR: "libs/experimental"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: libs/experimental
working-directory: libs/langchain-experimental
secrets: inherit
test:
uses:
./.github/workflows/_test.yml
with:
working-directory: libs/experimental
secrets: inherit
# It's possible that langchain-experimental works fine with the latest *published* langchain,
# but is broken with the langchain on `master`.
#
# We want to catch situations like that *before* releasing a new langchain, hence this test.
test-with-latest-langchain:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ env.WORKDIR }}
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: test with unpublished langchain - Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ env.WORKDIR }}
cache-key: unpublished-langchain
- name: Install dependencies
shell: bash
run: |
echo "Running tests with unpublished langchain, installing dependencies with poetry..."
poetry install
echo "Editably installing langchain outside of poetry, to avoid messing up lockfile..."
poetry run pip install -e ../langchain
- name: Run tests
run: make test
extended-tests:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ env.WORKDIR }}
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }} extended tests
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: libs/experimental
cache-key: extended
- name: Install dependencies
shell: bash
run: |
echo "Running extended tests, installing dependencies with poetry..."
poetry install -E extended_testing
- name: Run extended tests
run: make extended_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'
working-directory: libs/langchain-experimental
test_type: '["core"]'
secrets: inherit

View File

@@ -1,13 +0,0 @@
---
name: libs/experimental Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
release:
uses:
./.github/workflows/_release.yml
with:
working-directory: libs/experimental
secrets: inherit

View File

@@ -2,7 +2,13 @@
name: libs/langchain Release
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
pull_request:
types:
- closed
branches:
- master
paths:
- 'libs/langchain/pyproject.toml'
jobs:
release:
@@ -10,4 +16,4 @@ jobs:
./.github/workflows/_release.yml
with:
working-directory: libs/langchain
secrets: inherit
secrets: inherit

View File

@@ -1,61 +0,0 @@
name: Scheduled tests
on:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
schedule:
- cron: '0 13 * * *'
env:
POETRY_VERSION: "1.5.1"
jobs:
build:
defaults:
run:
working-directory: libs/langchain
runs-on: ubuntu-latest
environment: Scheduled testing
strategy:
matrix:
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: libs/langchain
cache-key: scheduled
- name: Install dependencies
working-directory: libs/langchain
shell: bash
run: |
echo "Running scheduled tests, installing dependencies with poetry..."
poetry install --with=test_integration
- name: Run tests
shell: bash
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
make scheduled_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

1
.gitignore vendored
View File

@@ -162,7 +162,6 @@ docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/experimental_api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/

View File

@@ -24,6 +24,6 @@ sphinx:
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/api_reference/requirements.txt
- requirements: docs/requirements.txt
- method: pip
path: .

View File

@@ -1,61 +0,0 @@
# Migrating to `langchain_experimental`
We are moving any experimental components of LangChain, or components with vulnerability issues, into `langchain_experimental`.
This guide covers how to migrate.
## Installation
Previously:
`pip install -U langchain`
Now (only if you want to access things in experimental):
`pip install -U langchain langchain_experimental`
## Things in `langchain.experimental`
Previously:
`from langchain.experimental import ...`
Now:
`from langchain_experimental import ...`
## PALChain
Previously:
`from langchain.chains import PALChain`
Now:
`from langchain_experimental.pal_chain import PALChain`
## SQLDatabaseChain
Previously:
`from langchain.chains import SQLDatabaseChain`
Now:
`from langchain_experimental.sql import SQLDatabaseChain`
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out [`create_sql_query_chain`](https://github.com/langchain-ai/langchain/blob/master/docs/extras/use_cases/tabular/sql_query.ipynb)
`from langchain.chains import create_sql_query_chain`
## `load_prompt` for Python files
Note: this only applies if you want to load Python files as prompts.
If you want to load json/yaml files, no change is needed.
Previously:
`from langchain.prompts import load_prompt`
Now:
`from langchain_experimental.prompts import load_prompt`

View File

@@ -1,54 +0,0 @@
.PHONY: all clean docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck
# Default target executed when no arguments are given to make.
all: help
######################
# DOCUMENTATION
######################
clean: docs_clean api_docs_clean
docs_build:
docs/.local_build.sh
docs_clean:
rm -r docs/_dist
docs_linkcheck:
poetry run linkchecker docs/_dist/docs_skeleton/ --ignore-url node_modules
api_docs_build:
poetry run python docs/api_reference/create_api_rst.py
cd docs/api_reference && poetry run make html
api_docs_clean:
rm -f docs/api_reference/api_reference.rst
cd docs/api_reference && poetry run make clean
api_docs_linkcheck:
poetry run linkchecker docs/api_reference/_build/html/index.html
spell_check:
poetry run codespell --toml pyproject.toml
spell_fix:
poetry run codespell --toml pyproject.toml -w
######################
# HELP
######################
help:
@echo '----'
@echo 'clean - run docs_clean and api_docs_clean'
@echo 'docs_build - build the documentation'
@echo 'docs_clean - clean the documentation build artifacts'
@echo 'docs_linkcheck - run linkchecker on the documentation'
@echo 'api_docs_build - build the API Reference documentation'
@echo 'api_docs_clean - clean the API Reference documentation build artifacts'
@echo 'api_docs_linkcheck - run linkchecker on the API Reference documentation'
@echo 'spell_check - run codespell on the project'
@echo 'spell_fix - run codespell on the project and fix the errors'

View File

@@ -2,32 +2,24 @@
⚡ Building applications with LLMs through composability ⚡
[![Release Notes](https://img.shields.io/github/release/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/releases)
[![CI](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml)
[![Experimental CI](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml)
[![Release Notes](https://img.shields.io/github/release/hwchase17/langchain)](https://github.com/hwchase17/langchain/releases)
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml)
[![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml)
[![Downloads](https://static.pepy.tech/badge/langchain/month)](https://pepy.tech/project/langchain)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
[![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain)
[![GitHub star chart](https://img.shields.io/github/stars/langchain-ai/langchain?style=social)](https://star-history.com/#langchain-ai/langchain)
[![Dependency Status](https://img.shields.io/librariesio/github/langchain-ai/langchain)](https://libraries.io/github/langchain-ai/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/issues)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/hwchase17/langchain)
[![GitHub star chart](https://img.shields.io/github/stars/hwchase17/langchain?style=social)](https://star-history.com/#hwchase17/langchain)
[![Dependency Status](https://img.shields.io/librariesio/github/hwchase17/langchain)](https://libraries.io/github/hwchase17/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/hwchase17/langchain)](https://github.com/hwchase17/langchain/issues)
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwchase17/langchainjs).
**Production Support:** As you move your LangChains into production, we'd love to offer more hands-on support.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to share more about what you're building, and our team will get in touch.
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
This migration has already started, but we are remaining backwards compatible until 7/28.
On that date, we will remove functionality from `langchain`.
Read more about the motivation and the progress [here](https://github.com/hwchase17/langchain/discussions/8043).
Read how to migrate your code [here](MIGRATE.md).
**Production Support:** As you move your LangChains into production, we'd love to offer more comprehensive support.
Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set up a dedicated support Slack channel.
## Quick Install

View File

@@ -1,6 +0,0 @@
# Security Policy
## Reporting a Vulnerability
Please report security vulnerabilities by email to `security@langchain.dev`.
This email is an alias to a subset of our maintainers, and will ensure the issue is promptly triaged and acted upon as needed.

View File

@@ -13,6 +13,5 @@ cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
poetry run nbdoc_build
poetry run python generate_api_reference_links.py
yarn install
yarn start

View File

@@ -7,67 +7,19 @@
# -- Path setup --------------------------------------------------------------
import json
import os
import sys
from pathlib import Path
import toml
from docutils import nodes
from sphinx.util.docutils import SphinxDirective
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import toml
_DIR = Path(__file__).parent.absolute()
sys.path.insert(0, os.path.abspath("."))
sys.path.insert(0, os.path.abspath("../../libs/langchain"))
sys.path.insert(0, os.path.abspath("../../libs/experimental"))
with (_DIR.parents[1] / "libs" / "langchain" / "pyproject.toml").open("r") as f:
with open("../../pyproject.toml") as f:
data = toml.load(f)
with (_DIR / "guide_imports.json").open("r") as f:
imported_classes = json.load(f)
class ExampleLinksDirective(SphinxDirective):
"""Directive to generate a list of links to examples.
We have a script that extracts links to API reference docs
from our notebook examples. This directive uses that information
to backlink to the examples from the API reference docs."""
has_content = False
required_arguments = 1
def run(self):
"""Run the directive.
Called any time :example_links:`ClassName` is used
in the template *.rst files."""
class_or_func_name = self.arguments[0]
links = imported_classes.get(class_or_func_name, {})
list_node = nodes.bullet_list()
for doc_name, link in links.items():
item_node = nodes.list_item()
para_node = nodes.paragraph()
link_node = nodes.reference()
link_node["refuri"] = link
link_node.append(nodes.Text(doc_name))
para_node.append(link_node)
item_node.append(para_node)
list_node.append(item_node)
if list_node.children:
title_node = nodes.title()
title_node.append(nodes.Text(f"Examples using {class_or_func_name}"))
return [title_node, list_node]
return [list_node]
def setup(app):
app.add_directive("example_links", ExampleLinksDirective)
# -- Project information -----------------------------------------------------
@@ -100,9 +52,6 @@ extensions = [
]
source_suffix = [".rst"]
# some autodoc pydantic options are repeated in the actual template.
# potentially user error, but there may be bugs in the sphinx extension
# with options not being passed through correctly (from either the location in the code)
autodoc_pydantic_model_show_json = False
autodoc_pydantic_field_list_validators = False
autodoc_pydantic_config_members = False
@@ -115,6 +64,13 @@ autodoc_member_order = "groupwise"
autoclass_content = "both"
autodoc_typehints_format = "short"
autodoc_default_options = {
"members": True,
"show-inheritance": True,
"inherited-members": "BaseModel",
"undoc-members": True,
"special-members": "__call__",
}
# autodoc_typehints = "description"
# Add any paths that contain templates here, relative to this directory.
templates_path = ["templates"]
@@ -156,7 +112,7 @@ html_context = {
html_static_path = ["_static"]
# These paths are either relative to html_static_path
# or fully qualified paths (e.g. https://...)
# or fully qualified paths (eg. https://...)
html_css_files = [
"css/custom.css",
]

View File

@@ -1,279 +1,83 @@
"""Script for auto-generating api_reference.rst."""
import importlib
import inspect
import os
import typing
"""Script for auto-generating api_reference.rst"""
import glob
import re
from pathlib import Path
from typing import TypedDict, Sequence, List, Dict, Literal, Union, Optional
from enum import Enum
from pydantic import BaseModel
ROOT_DIR = Path(__file__).parents[2].absolute()
HERE = Path(__file__).parent
PKG_DIR = ROOT_DIR / "libs" / "langchain" / "langchain"
EXP_DIR = ROOT_DIR / "libs" / "experimental" / "langchain_experimental"
WRITE_FILE = HERE / "api_reference.rst"
EXP_WRITE_FILE = HERE / "experimental_api_reference.rst"
PKG_DIR = ROOT_DIR / "langchain"
WRITE_FILE = Path(__file__).parent / "api_reference.rst"
ClassKind = Literal["TypedDict", "Regular", "Pydantic", "enum"]
def load_members() -> dict:
members: dict = {}
for py in glob.glob(str(PKG_DIR) + "/**/*.py", recursive=True):
module = py[len(str(PKG_DIR)) + 1 :].replace(".py", "").replace("/", ".")
top_level = module.split(".")[0]
if top_level not in members:
members[top_level] = {"classes": [], "functions": []}
with open(py, "r") as f:
for line in f.readlines():
cls = re.findall(r"^class ([^_].*)\(", line)
members[top_level]["classes"].extend([module + "." + c for c in cls])
func = re.findall(r"^def ([^_].*)\(", line)
afunc = re.findall(r"^async def ([^_].*)\(", line)
func_strings = [module + "." + f for f in func + afunc]
members[top_level]["functions"].extend(func_strings)
return members
class ClassInfo(TypedDict):
"""Information about a class."""
def construct_doc(members: dict) -> str:
full_doc = """\
.. _api_reference:
name: str
"""The name of the class."""
qualified_name: str
"""The fully qualified name of the class."""
kind: ClassKind
"""The kind of the class."""
is_public: bool
"""Whether the class is public or not."""
class FunctionInfo(TypedDict):
"""Information about a function."""
name: str
"""The name of the function."""
qualified_name: str
"""The fully qualified name of the function."""
is_public: bool
"""Whether the function is public or not."""
class ModuleMembers(TypedDict):
"""A dictionary of module members."""
classes_: Sequence[ClassInfo]
functions: Sequence[FunctionInfo]
def _load_module_members(module_path: str, namespace: str) -> ModuleMembers:
"""Load all members of a module.
Args:
module_path: Path to the module.
namespace: the namespace of the module.
Returns:
list: A list of loaded module objects.
"""
classes_: List[ClassInfo] = []
functions: List[FunctionInfo] = []
module = importlib.import_module(module_path)
for name, type_ in inspect.getmembers(module):
if not hasattr(type_, "__module__"):
continue
if type_.__module__ != module_path:
continue
if inspect.isclass(type_):
if type(type_) == typing._TypedDictMeta: # type: ignore
kind: ClassKind = "TypedDict"
elif issubclass(type_, Enum):
kind = "enum"
elif issubclass(type_, BaseModel):
kind = "Pydantic"
else:
kind = "Regular"
classes_.append(
ClassInfo(
name=name,
qualified_name=f"{namespace}.{name}",
kind=kind,
is_public=not name.startswith("_"),
)
)
elif inspect.isfunction(type_):
functions.append(
FunctionInfo(
name=name,
qualified_name=f"{namespace}.{name}",
is_public=not name.startswith("_"),
)
)
else:
continue
return ModuleMembers(
classes_=classes_,
functions=functions,
)
def _merge_module_members(
module_members: Sequence[ModuleMembers],
) -> ModuleMembers:
"""Merge module members."""
classes_: List[ClassInfo] = []
functions: List[FunctionInfo] = []
for module in module_members:
classes_.extend(module["classes_"])
functions.extend(module["functions"])
return ModuleMembers(
classes_=classes_,
functions=functions,
)
def _load_package_modules(
package_directory: Union[str, Path],
submodule: Optional[str] = None
) -> Dict[str, ModuleMembers]:
"""Recursively load modules of a package based on the file system.
Traversal based on the file system makes it easy to determine which
of the modules/packages are part of the package vs. 3rd party or built-in.
Parameters:
package_directory: Path to the package directory.
submodule: Optional name of submodule to load.
Returns:
list: A list of loaded module objects.
"""
package_path = (
Path(package_directory)
if isinstance(package_directory, str)
else package_directory
)
modules_by_namespace = {}
# Get the high level package name
package_name = package_path.name
# If we are loading a submodule, add it in
if submodule is not None:
package_path = package_path / submodule
for file_path in package_path.rglob("*.py"):
if file_path.name.startswith("_"):
continue
relative_module_name = file_path.relative_to(package_path)
# Skip if any module part starts with an underscore
if any(part.startswith("_") for part in relative_module_name.parts):
continue
# Get the full namespace of the module
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
# Keep only the top level namespace
top_namespace = namespace.split(".")[0]
try:
# If submodule is present, we need to construct the paths in a slightly
# different way
if submodule is not None:
module_members = _load_module_members(
f"{package_name}.{submodule}.{namespace}", f"{submodule}.{namespace}"
)
else:
module_members = _load_module_members(
f"{package_name}.{namespace}", namespace
)
# Merge module members if the namespace already exists
if top_namespace in modules_by_namespace:
existing_module_members = modules_by_namespace[top_namespace]
_module_members = _merge_module_members(
[existing_module_members, module_members]
)
else:
_module_members = module_members
modules_by_namespace[top_namespace] = _module_members
except ImportError as e:
print(f"Error: Unable to import module '{namespace}' with error: {e}")
return modules_by_namespace
def _construct_doc(pkg: str, members_by_namespace: Dict[str, ModuleMembers]) -> str:
"""Construct the contents of the reference.rst file for the given package.
Args:
pkg: The package name
members_by_namespace: The members of the package, dict organized by top level
module contains a list of classes and functions
inside of the top level namespace.
Returns:
The contents of the reference.rst file.
"""
full_doc = f"""\
=======================
``{pkg}`` API Reference
=======================
=============
API Reference
=============
"""
namespaces = sorted(members_by_namespace)
for module in namespaces:
_members = members_by_namespace[module]
classes = _members["classes_"]
for module, _members in sorted(members.items(), key=lambda kv: kv[0]):
classes = _members["classes"]
functions = _members["functions"]
if not (classes or functions):
continue
section = f":mod:`{pkg}.{module}`"
underline = "=" * (len(section) + 1)
module_title = module.replace("_", " ").title()
if module_title == "Llms":
module_title = "LLMs"
section = f":mod:`langchain.{module}`: {module_title}"
full_doc += f"""\
{section}
{underline}
{'=' * (len(section) + 1)}
.. automodule:: {pkg}.{module}
.. automodule:: langchain.{module}
:no-members:
:no-inherited-members:
"""
if classes:
cstring = "\n ".join(sorted(classes))
full_doc += f"""\
Classes
--------------
.. currentmodule:: {pkg}
.. currentmodule:: langchain
.. autosummary::
:toctree: {module}
:template: class.rst
{cstring}
"""
for class_ in sorted(classes, key=lambda c: c["qualified_name"]):
if not class_["is_public"]:
continue
if class_["kind"] == "TypedDict":
template = "typeddict.rst"
elif class_["kind"] == "enum":
template = "enum.rst"
elif class_["kind"] == "Pydantic":
template = "pydantic.rst"
else:
template = "class.rst"
full_doc += f"""\
:template: {template}
{class_["qualified_name"]}
"""
if functions:
_functions = [f["qualified_name"] for f in functions if f["is_public"]]
fstring = "\n ".join(sorted(_functions))
fstring = "\n ".join(sorted(functions))
full_doc += f"""\
Functions
--------------
.. currentmodule:: {pkg}
.. currentmodule:: langchain
.. autosummary::
:toctree: {module}
:template: function.rst
{fstring}
@@ -282,20 +86,10 @@ Functions
def main() -> None:
"""Generate the reference.rst file for each package."""
lc_members = _load_package_modules(PKG_DIR)
# Put tools.render at the top level
tools = _load_package_modules(PKG_DIR, "tools")
lc_members['tools.render'] = tools['render']
lc_doc = ".. _api_reference:\n\n" + _construct_doc("langchain", lc_members)
members = load_members()
full_doc = construct_doc(members)
with open(WRITE_FILE, "w") as f:
f.write(lc_doc)
exp_members = _load_package_modules(EXP_DIR)
exp_doc = ".. _experimental_api_reference:\n\n" + _construct_doc(
"langchain_experimental", exp_members
)
with open(EXP_WRITE_FILE, "w") as f:
f.write(exp_doc)
f.write(full_doc)
if __name__ == "__main__":

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,9 @@
Evaluation
=======================
LangChain has a number of convenient evaluation chains you can use off the shelf to grade your models' oupputs.
.. automodule:: langchain.evaluation
:members:
:undoc-members:
:inherited-members:

View File

@@ -5,6 +5,17 @@
.. autoclass:: {{ objname }}
{% block methods %}
{% if methods %}
.. rubric:: {{ _('Methods') }}
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Attributes') }}
@@ -15,22 +26,3 @@
{%- endfor %}
{% endif %}
{% endblock %}
{% block methods %}
{% if methods %}
.. rubric:: {{ _('Methods') }}
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
{%- endfor %}
{% for item in methods %}
.. automethod:: {{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
.. example_links:: {{ objname }}

View File

@@ -1,14 +0,0 @@
:mod:`{{module}}`.{{objname}}
{{ underline }}==============
.. currentmodule:: {{ module }}
.. autoclass:: {{ objname }}
{% block attributes %}
{% for item in attributes %}
.. autoattribute:: {{ item }}
{% endfor %}
{% endblock %}
.. example_links:: {{ objname }}

View File

@@ -1,8 +0,0 @@
:mod:`{{module}}`.{{objname}}
{{ underline }}==============
.. currentmodule:: {{ module }}
.. autofunction:: {{ objname }}
.. example_links:: {{ objname }}

View File

@@ -1,22 +0,0 @@
:mod:`{{module}}`.{{objname}}
{{ underline }}==============
.. currentmodule:: {{ module }}
.. autopydantic_model:: {{ objname }}
:model-show-json: False
:model-show-config-summary: False
:model-show-validator-members: False
:model-show-field-summary: False
:field-signature-prefix: param
:members:
:undoc-members:
:inherited-members:
:member-order: groupwise
:show-inheritance: True
:special-members: __call__
{% block attributes %}
{% endblock %}
.. example_links:: {{ objname }}

View File

@@ -5,10 +5,9 @@
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Refresh" content="0; url={{ redirect }}" />
<meta name="robots" content="follow, index">
<meta name="Description" content="Python API reference for LangChain.">
<meta name="Description" content="scikit-learn: machine learning in Python">
<link rel="canonical" href="{{ redirect }}" />
<title>LangChain Python API Reference Documentation.</title>
<title>scikit-learn: machine learning in Python</title>
</head>
<body>
<p>You will be automatically redirected to the <a href="{{ redirect }}">new location of this page</a>.</p>

View File

@@ -1,14 +0,0 @@
:mod:`{{module}}`.{{objname}}
{{ underline }}==============
.. currentmodule:: {{ module }}
.. autoclass:: {{ objname }}
{% block attributes %}
{% for item in attributes %}
.. autoattribute:: {{ item }}
{% endfor %}
{% endblock %}
.. example_links:: {{ objname }}

View File

@@ -19,7 +19,7 @@
{% block htmltitle %}
<title>{{ title|striptags|e }}{{ titlesuffix }}</title>
{% endblock %}
<link rel="canonical" href="https://api.python.langchain.com/en/latest/{{pagename}}.html" />
<link rel="canonical" href="http://scikit-learn.org/stable/{{pagename}}.html" />
{% if favicon_url %}
<link rel="shortcut icon" href="{{ favicon_url|e }}"/>

View File

@@ -6,6 +6,17 @@
{%- set top_container_cls = "sk-landing-container" %}
{%- endif %}
{% if theme_link_to_live_contributing_page|tobool %}
{# Link to development page for live builds #}
{%- set development_link = "https://scikit-learn.org/dev/developers/index.html" %}
{# Open on a new development page in new window/tab for live builds #}
{%- set development_attrs = 'target="_blank" rel="noopener noreferrer"' %}
{%- else %}
{%- set development_link = pathto('developers/index') %}
{%- set development_attrs = '' %}
{%- endif %}
<nav id="navbar" class="{{ nav_bar_class }} navbar navbar-expand-md navbar-light bg-light py-0">
<div class="container-fluid {{ top_container_cls }} px-0">
{%- if logo_url %}
@@ -34,9 +45,6 @@
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('api_reference') }}">API</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('experimental_api_reference') }}">Experimental</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" target="_blank" rel="noopener noreferrer" href="https://python.langchain.com/">Python Docs</a>
</li>

View File

@@ -745,11 +745,6 @@ span.descname {
background-color: transparent;
padding: 0;
font-family: monospace;
font-size: 1.2rem;
}
em.property {
font-weight: normal;
}
span.descclassname {

View File

@@ -1,54 +0,0 @@
# Community navigator
Hi! Thanks for being here. Were lucky to have a community of so many passionate developers building with LangChainwe have so much to teach and learn from each other. Community members contribute code, host meetups, write blog posts, amplify each others work, become each other's customers and collaborators, and so much more.
Whether youre new to LangChain, looking to go deeper, or just want to get more exposure to the world of building with LLMs, this page can point you in the right direction.
- **🦜 Contribute to LangChain**
- **🌍 Meetups, Events, and Hackathons**
- **📣 Help Us Amplify Your Work**
- **💬 Stay in the loop**
# 🦜 Contribute to LangChain
LangChain is the product of over 5,000+ contributions by 1,500+ contributors, and there is ******still****** so much to do together. Here are some ways to get involved:
- **[Open a pull request](https://github.com/langchain-ai/langchain/issues):** Wed appreciate all forms of contributionsnew features, infrastructure improvements, better documentation, bug fixes, etc. If you have an improvement or an idea, wed love to work on it with you.
- **[Read our contributor guidelines:](https://github.com/langchain-ai/langchain/blob/bbd22b9b761389a5e40fc45b0570e1830aabb707/.github/CONTRIBUTING.md)** We ask contributors to follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow, run a few local checks for formatting, linting, and testing before submitting, and follow certain documentation and testing conventions.
- **First time contributor?** [Try one of these PRs with the “good first issue” tag](https://github.com/langchain-ai/langchain/contribute).
- **Become an expert:** Our experts help the community by answering product questions in Discord. If thats a role youd like to play, wed be so grateful! (And we have some special experts-only goodies/perks we can tell you more about). Send us an email to introduce yourself at hello@langchain.dev and well take it from there!
- **Integrate with LangChain:** If your product integrates with LangChainor aspires towe want to help make sure the experience is as smooth as possible for you and end users. Send us an email at hello@langchain.dev and tell us what youre working on.
- **Become an Integration Maintainer:** Partner with our team to ensure your integration stays up-to-date and talk directly with users (and answer their inquiries) in our Discord. Introduce yourself at hello@langchain.dev if youd like to explore this role.
# 🌍 Meetups, Events, and Hackathons
One of our favorite things about working in AI is how much enthusiasm there is for building together. We want to help make that as easy and impactful for you as possible!
- **Find a meetup, hackathon, or webinar:** You can find the one for you on our [global events calendar](https://mirror-feeling-d80.notion.site/0bc81da76a184297b86ca8fc782ee9a3?v=0d80342540df465396546976a50cfb3f).
- **Submit an event to our calendar:** Email us at events@langchain.dev with a link to your event page! We can also help you spread the word with our local communities.
- **Host a meetup:** If you want to bring a group of builders together, we want to help! We can publicize your event on our event calendar/Twitter, share it with our local communities in Discord, send swag, or potentially hook you up with a sponsor. Email us at events@langchain.dev to tell us about your event!
- **Become a meetup sponsor:** We often hear from groups of builders that want to get together, but are blocked or limited on some dimension (space to host, budget for snacks, prizes to distribute, etc.). If youd like to help, send us an email to events@langchain.dev we can share more about how it works!
- **Speak at an event:** Meetup hosts are always looking for great speakers, presenters, and panelists. If youd like to do that at an event, send us an email to hello@langchain.dev with more information about yourself, what you want to talk about, and what city youre based in and well try to match you with an upcoming event!
- **Tell us about your LLM community:** If you host or participate in a community that would welcome support from LangChain and/or our team, send us an email at hello@langchain.dev and let us know how we can help.
# 📣 Help Us Amplify Your Work
If youre working on something youre proud of, and think the LangChain community would benefit from knowing about it, we want to help you show it off.
- **Post about your work and mention us:** We love hanging out on Twitter to see what people in the space are talking about and working on. If you tag [@langchainai](https://twitter.com/LangChainAI), well almost certainly see it and can show you some love.
- **Publish something on our blog:** If youre writing about your experience building with LangChain, wed love to post (or crosspost) it on our blog! E-mail hello@langchain.dev with a draft of your post! Or even an idea for something you want to write about.
- **Get your product onto our [integrations hub](https://integrations.langchain.com/):** Many developers take advantage of our seamless integrations with other products, and come to our integrations hub to find out who those are. If you want to get your product up there, tell us about it (and how it works with LangChain) at hello@langchain.dev.
# ☀️ Stay in the loop
Heres where our team hangs out, talks shop, spotlights cool work, and shares what were up to. Wed love to see you there too.
- **[Twitter](https://twitter.com/LangChainAI):** We post about what were working on and what cool things were seeing in the space. If you tag @langchainai in your post, well almost certainly see it, and can show you some love!
- **[Discord](https://discord.gg/6adMQxSpJS):** connect with >30k developers who are building with LangChain
- **[GitHub](https://github.com/langchain-ai/langchain):** Open pull requests, contribute to a discussion, and/or contribute
- **[Subscribe to our bi-weekly Release Notes](https://6w1pwbss0py.typeform.com/to/KjZB1auB):** a twice/month email roundup of the coolest things going on in our orbit
- **Slack:** If youre building an application in production at your company, wed love to get into a Slack channel together. Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) and well get in touch about setting one up.

View File

@@ -0,0 +1,10 @@
---
sidebar_position: 0
---
# Integrations
Visit the [Integrations Hub](https://integrations.langchain.com) to further explore, upvote and request integrations across key LangChain components.
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -1,17 +0,0 @@
---
sidebar_class_name: hidden
---
# LangChain Expression Language (LCEL)
LangChain Expression Language or LCEL is a declarative way to easily compose chains together.
Any chain constructed this way will automatically have full sync, async, and streaming support.
#### [Interface](/docs/expression_language/interface)
The base interface shared by all LCEL objects
#### [How to](/docs/expression_language/how_to)
How to use core features of LCEL
#### [Cookbook](/docs/expression_language/cookbook)
Examples of common LCEL usage patterns

View File

@@ -4,21 +4,21 @@ sidebar_position: 0
# Introduction
**LangChain** is a framework for developing applications powered by language models. It enables applications that:
- **Are context-aware**: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
**LangChain** is a framework for developing applications powered by language models. It enables applications that are:
- **Data-aware**: connect a language model to other sources of data
- **Agentic**: allow a language model to interact with its environment
The main value props of LangChain are:
1. **Components**: abstractions for working with language models, along with a collection of implementations for each abstraction. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2. **Off-the-shelf chains**: a structured assembly of components for accomplishing specific higher-level tasks
Off-the-shelf chains make it easy to get started. For complex applications, components make it easy to customize existing chains and build new ones.
Off-the-shelf chains make it easy to get started. For more complex applications and nuanced use-cases, components make it easy to customize existing chains or build new ones.
## Get started
[Heres](/docs/get_started/installation) how to install LangChain, set up your environment, and start building.
[Heres](/docs/get_started/installation.html) how to install LangChain, set up your environment, and start building.
We recommend following our [Quickstart](/docs/get_started/quickstart) guide to familiarize yourself with the framework by building your first LangChain application.
We recommend following our [Quickstart](/docs/get_started/quickstart.html) guide to familiarize yourself with the framework by building your first LangChain application.
_**Note**: These docs are for the LangChain [Python package](https://github.com/hwchase17/langchain). For documentation on [LangChain.js](https://github.com/hwchase17/langchainjs), the JS/TS version, [head here](https://js.langchain.com/docs)._
@@ -28,7 +28,7 @@ LangChain provides standard, extendable interfaces and external integrations for
#### [Model I/O](/docs/modules/model_io/)
Interface with language models
#### [Retrieval](/docs/modules/data_connection/)
#### [Data connection](/docs/modules/data_connection/)
Interface with application-specific data
#### [Chains](/docs/modules/chains/)
Construct sequences of calls
@@ -40,24 +40,25 @@ Persist application state between runs of a chain
Log and stream intermediate steps of any chain
## Examples, ecosystem, and resources
### [Use cases](/docs/use_cases/question_answering/)
### [Use cases](/docs/use_cases/)
Walkthroughs and best-practices for common end-to-end use cases, like:
- [Document question answering](/docs/use_cases/question_answering/)
- [Chatbots](/docs/use_cases/chatbots/)
- [Analyzing structured data](/docs/use_cases/qa_structured/sql/)
- [Answering questions using sources](/docs/use_cases/question_answering/)
- [Analyzing structured data](/docs/use_cases/tabular.html)
- and much more...
### [Guides](/docs/guides/)
Learn best practices for developing with LangChain.
### [Ecosystem](/docs/integrations/providers/)
LangChain is part of a rich ecosystem of tools that integrate with our framework and build on top of it. Check out our growing list of [integrations](/docs/integrations/providers/) and [dependent repos](/docs/additional_resources/dependents).
### [Ecosystem](/docs/ecosystem/)
LangChain is part of a rich ecosystem of tools that integrate with our framework and build on top of it. Check out our growing list of [integrations](/docs/ecosystem/integrations/) and [dependent repos](/docs/ecosystem/dependents.html).
### [Additional resources](/docs/additional_resources/)
Our community is full of prolific developers, creative builders, and fantastic teachers. Check out [YouTube tutorials](/docs/additional_resources/youtube) for great tutorials from folks in the community, and [Gallery](https://github.com/kyrolabs/awesome-langchain) for a list of awesome LangChain projects, compiled by the folks at [KyroLabs](https://kyrolabs.com).
Our community is full of prolific developers, creative builders, and fantastic teachers. Check out [YouTube tutorials](/docs/additional_resources/youtube.html) for great tutorials from folks in the community, and [Gallery](https://github.com/kyrolabs/awesome-langchain) for a list of awesome LangChain projects, compiled by the folks at [KyroLabs](https://kyrolabs.com).
### [Community](/docs/community)
Head to the [Community navigator](/docs/community) to find places to ask questions, share feedback, meet other developers, and dream about the future of LLMs.
<h3><span style={{color:"#2e8555"}}> Support </span></h3>
Join us on [GitHub](https://github.com/hwchase17/langchain) or [Discord](https://discord.gg/6adMQxSpJS) to ask questions, share feedback, meet other developers building with LangChain, and dream about the future of LLMs.
## API reference

View File

@@ -22,73 +22,28 @@ import OpenAISetup from "@snippets/get_started/quickstart/openai_setup.mdx"
## Building an application
Now we can start building our language model application. LangChain provides many modules that can be used to build language model applications.
Modules can be used as stand-alones in simple applications and they can be combined for more complex use cases.
The most common and most important chain that LangChain helps create contains three things:
- LLM: The language model is the core reasoning engine here. In order to work with LangChain, you need to understand the different types of language models and how to work with them.
- Prompt Templates: This provides instructions to the language model. This controls what the language model outputs, so understanding how to construct prompts and different prompting strategies is crucial.
- Output Parsers: These translate the raw response from the LLM to a more workable format, making it easy to use the output downstream.
In this getting started guide we will cover those three components by themselves, and then go over how to combine all of them.
Understanding these concepts will set you up well for being able to use and customize LangChain applications.
Most LangChain applications allow you to configure the LLM and/or the prompt used, so knowing how to take advantage of this will be a big enabler.
Now we can start building our language model application. LangChain provides many modules that can be used to build language model applications. Modules can be used as stand-alones in simple applications and they can be combined for more complex use cases.
## LLMs
#### Get predictions from a language model
There are two types of language models, which in LangChain are called:
The basic building block of LangChain is the LLM, which takes in text and generates more text.
- LLMs: this is a language model which takes a string as input and returns a string
- ChatModels: this is a language model which takes a list of messages as input and returns a message
As an example, suppose we're building an application that generates a company name based on a company description. In order to do this, we need to initialize an OpenAI model wrapper. In this case, since we want the outputs to be MORE random, we'll initialize our model with a HIGH temperature.
The input/output for LLMs is simple and easy to understand - a string.
But what about ChatModels? The input there is a list of `ChatMessage`s, and the output is a single `ChatMessage`.
A `ChatMessage` has two required components:
import LLM from "@snippets/get_started/quickstart/llm.mdx"
- `content`: This is the content of the message.
- `role`: This is the role of the entity from which the `ChatMessage` is coming from.
<LLM/>
LangChain provides several objects to easily distinguish between different roles:
## Chat models
- `HumanMessage`: A `ChatMessage` coming from a human/user.
- `AIMessage`: A `ChatMessage` coming from an AI/assistant.
- `SystemMessage`: A `ChatMessage` coming from the system.
- `FunctionMessage`: A `ChatMessage` coming from a function call.
Chat models are a variation on language models. While chat models use language models under the hood, the interface they expose is a bit different: rather than expose a "text in, text out" API, they expose an interface where "chat messages" are the inputs and outputs.
If none of those roles sound right, there is also a `ChatMessage` class where you can specify the role manually.
For more information on how to use these different messages most effectively, see our prompting guide.
You can get chat completions by passing one or more messages to the chat model. The response will be a message. The types of messages currently supported in LangChain are `AIMessage`, `HumanMessage`, `SystemMessage`, and `ChatMessage` -- `ChatMessage` takes in an arbitrary role parameter. Most of the time, you'll just be dealing with `HumanMessage`, `AIMessage`, and `SystemMessage`.
LangChain provides a standard interface for both, but it's useful to understand this difference in order to construct prompts for a given language model.
The standard interface that LangChain provides has two methods:
- `predict`: Takes in a string, returns a string
- `predict_messages`: Takes in a list of messages, returns a message.
Let's see how to work with these different types of models and these different types of inputs.
First, let's import an LLM and a ChatModel.
import ImportLLMs from "@snippets/get_started/quickstart/import_llms.mdx"
<ImportLLMs/>
The `OpenAI` and `ChatOpenAI` objects are basically just configuration objects.
You can initialize them with parameters like `temperature` and others, and pass them around.
Next, let's use the `predict` method to run over a string input.
import InputString from "@snippets/get_started/quickstart/input_string.mdx"
<InputString/>
Finally, let's use the `predict_messages` method to run over a list of messages.
import InputMessages from "@snippets/get_started/quickstart/input_messages.mdx"
<InputMessages/>
For both these methods, you can also pass in parameters as key word arguments.
For example, you could pass in `temperature=0` to adjust the temperature that is used from what the object was configured with.
Whatever values are passed in during run time will always override what the object was configured with.
import ChatModel from "@snippets/get_started/quickstart/chat_model.mdx"
<ChatModel/>
## Prompt templates
@@ -96,71 +51,108 @@ Most LLM applications do not pass user input directly into an LLM. Usually they
In the previous example, the text we passed to the model contained instructions to generate a company name. For our application, it'd be great if the user only had to provide the description of a company/product, without having to worry about giving the model instructions.
PromptTemplates help with exactly this!
They bundle up all the logic for going from user input into a fully formatted prompt.
This can start off very simple - for example, a prompt to produce the above string would just be:
import PromptTemplateLLM from "@snippets/get_started/quickstart/prompt_templates_llms.mdx"
import PromptTemplateChatModel from "@snippets/get_started/quickstart/prompt_templates_chat_models.mdx"
<Tabs>
<TabItem value="llms" label="LLMs" default>
With PromptTemplates this is easy! In this case our template would be very simple:
<PromptTemplateLLM/>
</TabItem>
<TabItem value="chat_models" label="Chat models">
However, the advantages of using these over raw string formatting are several.
You can "partial" out variables - e.g. you can format only some of the variables at a time.
You can compose them together, easily combining different templates into a single prompt.
For explanations of these functionalities, see the [section on prompts](/docs/modules/model_io/prompts) for more detail.
Similar to LLMs, you can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplate`s. You can use `ChatPromptTemplate`'s `format_messages` method to generate the formatted messages.
PromptTemplates can also be used to produce a list of messages.
In this case, the prompt not only contains information about the content, but also each message (its role, its position in the list, etc)
Here, what happens most often is a ChatPromptTemplate is a list of ChatMessageTemplates.
Each ChatMessageTemplate contains instructions for how to format that ChatMessage - its role, and then also its content.
Let's take a look at this below:
Because this is generating a list of messages, it is slightly more complex than the normal prompt template which is generating only a string. Please see the detailed guides on prompts to understand more options available to you here.
<PromptTemplateChatModel/>
</TabItem>
</Tabs>
ChatPromptTemplates can also be constructed in other ways - see the [section on prompts](/docs/modules/model_io/prompts) for more detail.
## Chains
## Output parsers
Now that we've got a model and a prompt template, we'll want to combine the two. Chains give us a way to link (or chain) together multiple primitives, like models, prompts, and other chains.
OutputParsers convert the raw output of an LLM into a format that can be used downstream.
There are few main type of OutputParsers, including:
import ChainLLM from "@snippets/get_started/quickstart/chains_llms.mdx"
import ChainChatModel from "@snippets/get_started/quickstart/chains_chat_models.mdx"
- Convert text from LLM -> structured information (e.g. JSON)
- Convert a ChatMessage into just a string
- Convert the extra information returned from a call besides the message (like OpenAI function invocation) into a string.
<Tabs>
<TabItem value="llms" label="LLMs" default>
For full information on this, see the [section on output parsers](/docs/modules/model_io/output_parsers)
The simplest and most common type of chain is an LLMChain, which passes an input first to a PromptTemplate and then to an LLM. We can construct an LLM chain from our existing model and prompt template.
In this getting started guide, we will write our own output parser - one that converts a comma separated list into a list.
<ChainLLM/>
import OutputParser from "@snippets/get_started/quickstart/output_parser.mdx"
There we go, our first chain! Understanding how this simple chain works will set you up well for working with more complex chains.
<OutputParser/>
</TabItem>
<TabItem value="chat_models" label="Chat models">
## PromptTemplate + LLM + OutputParser
The `LLMChain` can be used with chat models as well:
We can now combine all these into one chain.
This chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to a language model, and then pass the output through an (optional) output parser.
This is a convenient way to bundle up a modular piece of logic.
Let's see it in action!
<ChainChatModel/>
</TabItem>
</Tabs>
import LLMChain from "@snippets/get_started/quickstart/llm_chain.mdx"
## Agents
<LLMChain/>
import AgentLLM from "@snippets/get_started/quickstart/agents_llms.mdx"
import AgentChatModel from "@snippets/get_started/quickstart/agents_chat_models.mdx"
Note that we are using the `|` syntax to join these components together.
This `|` syntax is called the LangChain Expression Language.
To learn more about this syntax, read the documentation [here](/docs/expression_language).
Our first chain ran a pre-determined sequence of steps. To handle complex workflows, we need to be able to dynamically choose actions based on inputs.
## Next steps
Agents do just this: they use a language model to determine which actions to take and in what order. Agents are given access to tools, and they repeatedly choose a tool, run the tool, and observe the output until they come up with a final answer.
This is it!
We've now gone over how to create the core building block of LangChain applications.
There is a lot more nuance in all these components (LLMs, prompts, output parsers) and a lot more different components to learn about as well.
To continue on your journey:
To load an agent, you need to choose a(n):
- LLM/Chat model: The language model powering the agent.
- Tool(s): A function that performs a specific duty. This can be things like: Google Search, Database lookup, Python REPL, other chains. For a list of predefined tools and their specifications, see the [Tools documentation](/docs/modules/agents/tools/).
- Agent name: A string that references a supported agent class. An agent class is largely parameterized by the prompt the language model uses to determine which action to take. Because this notebook focuses on the simplest, highest level API, this only covers using the standard supported agents. If you want to implement a custom agent, see [here](/docs/modules/agents/how_to/custom_agent.html). For a list of supported agents and their specifications, see [here](/docs/modules/agents/agent_types/).
- [Dive deeper](/docs/modules/model_io) into LLMs, prompts, and output parsers
- Learn the other [key components](/docs/modules)
- Read up on [LangChain Expression Language](/docs/expression_language) to learn how to chain these components together
- Check out our [helpful guides](/docs/guides) for detailed walkthroughs on particular topics
- Explore [end-to-end use cases](/docs/use_cases)
For this example, we'll be using SerpAPI to query a search engine.
You'll need to install the SerpAPI Python package:
```bash
pip install google-search-results
```
And set the `SERPAPI_API_KEY` environment variable.
<Tabs>
<TabItem value="llms" label="LLMs" default>
<AgentLLM/>
</TabItem>
<TabItem value="chat_models" label="Chat models">
Agents can also be used with chat models, you can initialize one using `AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION` as the agent type.
<AgentChatModel/>
</TabItem>
</Tabs>
## Memory
The chains and agents we've looked at so far have been stateless, but for many applications it's necessary to reference past interactions. This is clearly the case with a chatbot for example, where you want it to understand new messages in the context of past messages.
The Memory module gives you a way to maintain application state. The base Memory interface is simple: it lets you update state given the latest run inputs and outputs and it lets you modify (or contextualize) the next input using the stored state.
There are a number of built-in memory systems. The simplest of these is a buffer memory which just prepends the last few inputs/outputs to the current input - we will use this in the example below.
import MemoryLLM from "@snippets/get_started/quickstart/memory_llms.mdx"
import MemoryChatModel from "@snippets/get_started/quickstart/memory_chat_models.mdx"
<Tabs>
<TabItem value="llms" label="LLMs" default>
<MemoryLLM/>
</TabItem>
<TabItem value="chat_models" label="Chat models">
You can use Memory with chains and agents initialized with chat models. The main difference between this and Memory for LLMs is that rather than trying to condense all previous messages into a string, we can keep them as their own unique memory object.
<MemoryChatModel/>
</TabItem>
</Tabs>

View File

@@ -1,24 +0,0 @@
---
sidebar_position: 3
---
# Comparison Evaluators
Comparison evaluators in LangChain help measure two different chains or LLM outputs. These evaluators are helpful for comparative analyses, such as A/B testing between two language models, or comparing different versions of the same model. They can also be useful for things like generating preference scores for ai-assisted reinforcement learning.
These evaluators inherit from the `PairwiseStringEvaluator` class, providing a comparison interface for two strings - typically, the outputs from two different prompts or models, or two versions of the same model. In essence, a comparison evaluator performs an evaluation on a pair of strings and returns a dictionary containing the evaluation score and other relevant details.
To create a custom comparison evaluator, inherit from the `PairwiseStringEvaluator` class and overwrite the `_evaluate_string_pairs` method. If you require asynchronous evaluation, also overwrite the `_aevaluate_string_pairs` method.
Here's a summary of the key methods and properties of a comparison evaluator:
- `evaluate_string_pairs`: Evaluate the output string pairs. This function should be overwritten when creating custom evaluators.
- `aevaluate_string_pairs`: Asynchronously evaluate the output string pairs. This function should be overwritten for asynchronous evaluation.
- `requires_input`: This property indicates whether this evaluator requires an input string.
- `requires_reference`: This property specifies whether this evaluator requires a reference label.
Detailed information about creating custom evaluators and the available built-in comparison evaluators is provided in the following sections.
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -1,27 +0,0 @@
import DocCardList from "@theme/DocCardList";
# Evaluation
Building applications with language models involves many moving parts. One of the most critical components is ensuring that the outcomes produced by your models are reliable and useful across a broad array of inputs, and that they work well with your application's other software components. Ensuring reliability usually boils down to some combination of application design, testing & evaluation, and runtime checks.
The guides in this section review the APIs and functionality LangChain provides to help you better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.
LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.
Each evaluator type in LangChain comes with ready-to-use implementations and an extensible API that allows for customization according to your unique requirements. Here are some of the types of evaluators we offer:
- [String Evaluators](/docs/guides/evaluation/string/): These evaluators assess the predicted string for a given input, usually comparing it against a reference string.
- [Trajectory Evaluators](/docs/guides/evaluation/trajectory/): These are used to evaluate the entire trajectory of agent actions.
- [Comparison Evaluators](/docs/guides/evaluation/comparison/): These evaluators are designed to compare predictions from two runs on a common input.
These evaluators can be used across various scenarios and can be applied to different chain and LLM implementations in the LangChain library.
We also are working to share guides and cookbooks that demonstrate how to use these evaluators in real-world scenarios, such as:
- [Chain Comparisons](/docs/guides/evaluation/examples/comparisons): This example uses a comparison evaluator to predict the preferred output. It reviews ways to measure confidence intervals to select statistically significant differences in aggregate preference scores across different models or prompts.
## Reference Docs
For detailed information on the available evaluators, including how to instantiate, configure, and customize them, check out the [reference documentation](https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.evaluation) directly.
<DocCardList />

View File

@@ -1,27 +0,0 @@
---
sidebar_position: 2
---
# String Evaluators
A string evaluator is a component within LangChain designed to assess the performance of a language model by comparing its generated outputs (predictions) to a reference string or an input. This comparison is a crucial step in the evaluation of language models, providing a measure of the accuracy or quality of the generated text.
In practice, string evaluators are typically used to evaluate a predicted string against a given input, such as a question or a prompt. Often, a reference label or context string is provided to define what a correct or ideal response would look like. These evaluators can be customized to tailor the evaluation process to fit your application's specific requirements.
To create a custom string evaluator, inherit from the `StringEvaluator` class and implement the `_evaluate_strings` method. If you require asynchronous support, also implement the `_aevaluate_strings` method.
Here's a summary of the key attributes and methods associated with a string evaluator:
- `evaluation_name`: Specifies the name of the evaluation.
- `requires_input`: Boolean attribute that indicates whether the evaluator requires an input string. If True, the evaluator will raise an error when the input isn't provided. If False, a warning will be logged if an input _is_ provided, indicating that it will not be considered in the evaluation.
- `requires_reference`: Boolean attribute specifying whether the evaluator requires a reference label. If True, the evaluator will raise an error when the reference isn't provided. If False, a warning will be logged if a reference _is_ provided, indicating that it will not be considered in the evaluation.
String evaluators also implement the following methods:
- `aevaluate_strings`: Asynchronously evaluates the output of the Chain or Language Model, with support for optional input and label.
- `evaluate_strings`: Synchronously evaluates the output of the Chain or Language Model, with support for optional input and label.
The following sections provide detailed information on available string evaluator implementations as well as how to create a custom string evaluator.
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -1,28 +0,0 @@
---
sidebar_position: 4
---
# Trajectory Evaluators
Trajectory Evaluators in LangChain provide a more holistic approach to evaluating an agent. These evaluators assess the full sequence of actions taken by an agent and their corresponding responses, which we refer to as the "trajectory". This allows you to better measure an agent's effectiveness and capabilities.
A Trajectory Evaluator implements the `AgentTrajectoryEvaluator` interface, which requires two main methods:
- `evaluate_agent_trajectory`: This method synchronously evaluates an agent's trajectory.
- `aevaluate_agent_trajectory`: This asynchronous counterpart allows evaluations to be run in parallel for efficiency.
Both methods accept three main parameters:
- `input`: The initial input given to the agent.
- `prediction`: The final predicted response from the agent.
- `agent_trajectory`: The intermediate steps taken by the agent, given as a list of tuples.
These methods return a dictionary. It is recommended that custom implementations return a `score` (a float indicating the effectiveness of the agent) and `reasoning` (a string explaining the reasoning behind the score).
You can capture an agent's trajectory by initializing the agent with the `return_intermediate_steps=True` parameter. This lets you collect all intermediate steps without relying on special callbacks.
For a deeper dive into the implementation and use of Trajectory Evaluators, refer to the sections below.
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -2,21 +2,11 @@
import DocCardList from "@theme/DocCardList";
[LangSmith](https://smith.langchain.com) helps you trace and evaluate your language model applications and intelligent agents to help you
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
move from prototype to production.
Check out the [interactive walkthrough](/docs/guides/langsmith/walkthrough) below to get started.
Check out the [interactive walkthrough](walkthrough) below to get started.
For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/).
For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)
For tutorials and other end-to-end examples demonstrating ways to integrate LangSmith in your workflow,
check out the [LangSmith Cookbook](https://github.com/langchain-ai/langsmith-cookbook). Some of the guides therein include:
- Leveraging user feedback in your JS application ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/feedback-examples/nextjs/README.md)).
- Building an automated feedback pipeline ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/feedback-examples/algorithmic-feedback/algorithmic_feedback.ipynb)).
- How to evaluate and audit your RAG workflows ([link](https://github.com/langchain-ai/langsmith-cookbook/tree/main/testing-examples/qa-correctness)).
- How to fine-tune a LLM on real usage data ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/fine-tuning-examples/export-to-openai/fine-tuning-on-chat-runs.ipynb)).
- How to use the [LangChain Hub](https://smith.langchain.com/hub) to version your prompts ([link](https://github.com/langchain-ai/langsmith-cookbook/blob/main/hub-examples/retrieval-qa-chain/retrieval-qa.ipynb))
<DocCardList />
<DocCardList />

View File

@@ -1,8 +0,0 @@
# Moderation
One of the key concerns with using LLMs is that they may generate harmful or unethical text. This is an area of active research in the field. Here we present some built-in chains inspired by this research, which are intended to make the outputs of LLMs safer.
- [Moderation chain](/docs/guides/safety/moderation): Explicitly check if any output text is harmful and flag it.
- [Constitutional chain](/docs/guides/safety/constitutional_chain): Prompt the model with a set of principles which should guide it's behavior.
- [Logical Fallacy chain](/docs/guides/safety/logical_fallacy_chain): Checks the model output against logical fallacies to correct any deviation.
- [Amazon Comprehend moderation chain](/docs/guides/safety/amazon_comprehend_chain): Use [Amazon Comprehend](https://aws.amazon.com/comprehend/) to detect and handle PII and toxicity.

View File

@@ -1,85 +0,0 @@
# Removing logical fallacies from model output
Logical fallacies are flawed reasoning or false arguments that can undermine the validity of a model's outputs. Examples include circular reasoning, false
dichotomies, ad hominem attacks, etc. Machine learning models are optimized to perform well on specific metrics like accuracy, perplexity, or loss. However,
optimizing for metrics alone does not guarantee logically sound reasoning.
Language models can learn to exploit flaws in reasoning to generate plausible-sounding but logically invalid arguments. When models rely on fallacies, their outputs become unreliable and untrustworthy, even if they achieve high scores on metrics. Users cannot depend on such outputs. Propagating logical fallacies can spread misinformation, confuse users, and lead to harmful real-world consequences when models are deployed in products or services.
Monitoring and testing specifically for logical flaws is challenging unlike other quality issues. It requires reasoning about arguments rather than pattern matching.
Therefore, it is crucial that model developers proactively address logical fallacies after optimizing metrics. Specialized techniques like causal modeling, robustness testing, and bias mitigation can help avoid flawed reasoning. Overall, allowing logical flaws to persist makes models less safe and ethical. Eliminating fallacies ensures model outputs remain logically valid and aligned with human reasoning. This maintains user trust and mitigates risks.
```python
# Imports
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain_experimental.fallacy_removal.base import FallacyChain
```
```python
# Example of a model output being returned with a logical fallacy
misleading_prompt = PromptTemplate(
template="""You have to respond by using only logical fallacies inherent in your answer explanations.
Question: {question}
Bad answer:""",
input_variables=["question"],
)
llm = OpenAI(temperature=0)
misleading_chain = LLMChain(llm=llm, prompt=misleading_prompt)
misleading_chain.run(question="How do I know the earth is round?")
```
<CodeOutputBlock lang="python">
```
'The earth is round because my professor said it is, and everyone believes my professor'
```
</CodeOutputBlock>
```python
fallacies = FallacyChain.get_fallacies(["correction"])
fallacy_chain = FallacyChain.from_llm(
chain=misleading_chain,
logical_fallacies=fallacies,
llm=llm,
verbose=True,
)
fallacy_chain.run(question="How do I know the earth is round?")
```
<CodeOutputBlock lang="python">
```
> Entering new FallacyChain chain...
Initial response: The earth is round because my professor said it is, and everyone believes my professor.
Applying correction...
Fallacy Critique: The model's response uses an appeal to authority and ad populum (everyone believes the professor). Fallacy Critique Needed.
Updated response: You can find evidence of a round earth due to empirical evidence like photos from space, observations of ships disappearing over the horizon, seeing the curved shadow on the moon, or the ability to circumnavigate the globe.
> Finished chain.
'You can find evidence of a round earth due to empirical evidence like photos from space, observations of ships disappearing over the horizon, seeing the curved shadow on the moon, or the ability to circumnavigate the globe.'
```
</CodeOutputBlock>

View File

@@ -12,7 +12,7 @@ Here are the agents available in LangChain.
### [Zero-shot ReAct](/docs/modules/agents/agent_types/react.html)
This agent uses the [ReAct](https://arxiv.org/pdf/2210.03629) framework to determine which tool to use
This agent uses the [ReAct](https://arxiv.org/pdf/2205.00445.pdf) framework to determine which tool to use
based solely on the tool's description. Any number of tools can be provided.
This agent requires that a description is provided for each tool.
@@ -28,7 +28,7 @@ navigating around a browser.
### [OpenAI Functions](/docs/modules/agents/agent_types/openai_functions_agent.html)
Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been explicitly fine-tuned to detect when a
function should be called and respond with the inputs that should be passed to the function.
function should to be called and respond with the inputs that should be passed to the function.
The OpenAI Functions Agent is designed to work with these models.
### [Conversational](/docs/modules/agents/agent_types/chat_conversation_agent.html)
@@ -37,11 +37,11 @@ This agent is designed to be used in conversational settings.
The prompt is designed to make the agent helpful and conversational.
It uses the ReAct framework to decide which tool to use, and uses memory to remember the previous conversation interactions.
### [Self-ask with search](/docs/modules/agents/agent_types/self_ask_with_search.html)
### [Self ask with search](/docs/modules/agents/agent_types/self_ask_with_search.html)
This agent utilizes a single tool that should be named `Intermediate Answer`.
This tool should be able to lookup factual answers to questions. This agent
is equivalent to the original [self-ask with search paper](https://ofir.io/self-ask.pdf),
is equivalent to the original [self ask with search paper](https://ofir.io/self-ask.pdf),
where a Google search API was provided as the tool.
### [ReAct document store](/docs/modules/agents/agent_types/react_docstore.html)
@@ -54,4 +54,4 @@ This agent is equivalent to the
original [ReAct paper](https://arxiv.org/pdf/2210.03629.pdf), specifically the Wikipedia example.
## [Plan-and-execute agents](/docs/modules/agents/agent_types/plan_and_execute.html)
Plan-and-execute agents accomplish an objective by first planning what to do, then executing the sub tasks. This idea is largely inspired by [BabyAGI](https://github.com/yoheinakajima/babyagi) and then the ["Plan-and-Solve" paper](https://arxiv.org/abs/2305.04091).
Plan and execute agents accomplish an objective by first planning what to do, then executing the sub tasks. This idea is largely inspired by [BabyAGI](https://github.com/yoheinakajima/babyagi) and then the ["Plan-and-Solve" paper](https://arxiv.org/abs/2305.04091).

View File

@@ -1,6 +1,6 @@
# OpenAI functions
Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to detect when a function should be called and respond with the inputs that should be passed to the function.
Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to detect when a function should to be called and respond with the inputs that should be passed to the function.
In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call those functions.
The goal of the OpenAI Function APIs is to more reliably return valid and useful function calls than a generic text completion or chat API.

View File

@@ -1,6 +1,6 @@
# Plan-and-execute
# Plan and execute
Plan-and-execute agents accomplish an objective by first planning what to do, then executing the sub tasks. This idea is largely inspired by [BabyAGI](https://github.com/yoheinakajima/babyagi) and then the ["Plan-and-Solve" paper](https://arxiv.org/abs/2305.04091).
Plan and execute agents accomplish an objective by first planning what to do, then executing the sub tasks. This idea is largely inspired by [BabyAGI](https://github.com/yoheinakajima/babyagi) and then the ["Plan-and-Solve" paper](https://arxiv.org/abs/2305.04091).
The planning is almost always done by an LLM.

View File

@@ -1,13 +1,13 @@
# Custom LLM agent
# Custom LLM Agent
This notebook goes through how to create your own custom LLM agent.
An LLM agent consists of three parts:
- `PromptTemplate`: This is the prompt template that can be used to instruct the language model on what to do
- PromptTemplate: This is the prompt template that can be used to instruct the language model on what to do
- LLM: This is the language model that powers the agent
- `stop` sequence: Instructs the LLM to stop generating as soon as this string is found
- `OutputParser`: This determines how to parse the LLM output into an `AgentAction` or `AgentFinish` object
- OutputParser: This determines how to parse the LLMOutput into an AgentAction or AgentFinish object
import Example from "@snippets/modules/agents/how_to/custom_llm_agent.mdx"

View File

@@ -4,10 +4,10 @@ This notebook goes through how to create your own custom agent based on a chat m
An LLM chat agent consists of three parts:
- `PromptTemplate`: This is the prompt template that can be used to instruct the language model on what to do
- `ChatModel`: This is the language model that powers the agent
- PromptTemplate: This is the prompt template that can be used to instruct the language model on what to do
- ChatModel: This is the language model that powers the agent
- `stop` sequence: Instructs the LLM to stop generating as soon as this string is found
- `OutputParser`: This determines how to parse the LLM output into an `AgentAction` or `AgentFinish` object
- OutputParser: This determines how to parse the LLMOutput into an AgentAction or AgentFinish object
import Example from "@snippets/modules/agents/how_to/custom_llm_chat_agent.mdx"

View File

@@ -3,80 +3,46 @@ sidebar_position: 4
---
# Agents
The core idea of agents is to use an LLM to choose a sequence of actions to take.
In chains, a sequence of actions is hardcoded (in code).
In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
Some applications require a flexible chain of calls to LLMs and other tools based on user input. The **Agent** interface provides the flexibility for such applications. An agent has access to a suite of tools, and determines which ones to use depending on the user input. Agents can use multiple tools, and use the output of one tool as the input to the next.
There are several key components here:
There are two main types of agents:
## Agent
- **Action agents**: at each timestep, decide on the next action using the outputs of all previous actions
- **Plan-and-execute agents**: decide on the full sequence of actions up front, then execute them all without updating the plan
This is the class responsible for deciding what step to take next.
This is powered by a language model and a prompt.
This prompt can include things like:
Action agents are suitable for small tasks, while plan-and-execute agents are better for complex or long-running tasks that require maintaining long-term objectives and focus. Often the best approach is to combine the dynamism of an action agent with the planning abilities of a plan-and-execute agent by letting the plan-and-execute agent use action agents to execute plans.
1. The personality of the agent (useful for having it respond in a certain way)
2. Background context for the agent (useful for giving it more context on the types of tasks it's being asked to do)
3. Prompting strategies to invoke better reasoning (the most famous/widely used being [ReAct](https://arxiv.org/abs/2210.03629))
For a full list of agent types see [agent types](/docs/modules/agents/agent_types/). Additional abstractions involved in agents are:
- [**Tools**](/docs/modules/agents/tools/): the actions an agent can take. What tools you give an agent highly depend on what you want the agent to do
- [**Toolkits**](/docs/modules/agents/toolkits/): wrappers around collections of tools that can be used together a specific use case. For example, in order for an agent to
interact with a SQL database it will likely need one tool to execute queries and another to inspect tables
LangChain provides a few different types of agents to get started.
Even then, you will likely want to customize those agents with parts (1) and (2).
For a full list of agent types see [agent types](/docs/modules/agents/agent_types/)
## Action agents
## Tools
At a high-level an action agent:
1. Receives user input
2. Decides which tool, if any, to use and the tool input
3. Calls the tool and records the output (also known as an "observation")
4. Decides the next step using the history of tools, tool inputs, and observations
5. Repeats 3-4 until it determines it can respond directly to the user
Tools are functions that an agent calls.
There are two important considerations here:
Action agents are wrapped in **agent executors**, which are responsible for calling the agent, getting back an action and action input, calling the tool that the action references with the generated input, getting the output of the tool, and then passing all that information back into the agent to get the next action it should take.
1. Giving the agent access to the right tools
2. Describing the tools in a way that is most helpful to the agent
Although an agent can be constructed in many ways, it typically involves these components:
Without both, the agent you are trying to build will not work.
If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objective.
If you don't describe the tools properly, the agent won't know how to properly use them.
- **Prompt template**: Responsible for taking the user input and previous steps and constructing a prompt
to send to the language model
- **Language model**: Takes the prompt with use input and action history and decides what to do next
- **Output parser**: Takes the output of the language model and parses it into the next action or a final answer
LangChain provides a wide set of tools to get started, but also makes it easy to define your own (including custom descriptions).
For a full list of tools, see [here](/docs/modules/agents/tools/)
## Plan-and-execute agents
## Toolkits
At a high-level a plan-and-execute agent:
1. Receives user input
2. Plans the full sequence of steps to take
3. Executes the steps in order, passing the outputs of past steps as inputs to future steps
Often the set of tools an agent has access to is more important than a single tool.
For this LangChain provides the concept of toolkits - groups of tools needed to accomplish specific objectives.
There are generally around 3-5 tools in a toolkit.
LangChain provides a wide set of toolkits to get started.
For a full list of toolkits, see [here](/docs/modules/agents/toolkits/)
## AgentExecutor
The agent executor is the runtime for an agent.
This is what actually calls the agent and executes the actions it chooses.
Pseudocode for this runtime is below:
```python
next_action = agent.get_action(...)
while next_action != AgentFinish:
observation = run(next_action)
next_action = agent.get_action(..., next_action, observation)
return next_action
```
While this may seem simple, there are several complexities this runtime handles for you, including:
1. Handling cases where the agent selects a non-existent tool
2. Handling cases where the tool errors
3. Handling cases where the agent produces output that cannot be parsed into a tool invocation
4. Logging and observability at all levels (agent decisions, tool calls) either to stdout or [LangSmith](https://smith.langchain.com).
## Other types of agent runtimes
The `AgentExecutor` class is the main agent runtime supported by LangChain.
However, there are other, more experimental runtimes we also support.
These include:
- [Plan-and-execute Agent](/docs/modules/agents/agent_types/plan_and_execute.html)
- [Baby AGI](/docs/use_cases/autonomous_agents/baby_agi.html)
- [Auto GPT](/docs/use_cases/autonomous_agents/autogpt.html)
The most typical implementation is to have the planner be a language model, and the executor be an action agent. Read more [here](/docs/modules/agents/agent_types/plan_and_execute.html).
## Get started

View File

@@ -3,8 +3,8 @@ sidebar_position: 3
---
# Toolkits
:::info
Head to [Integrations](/docs/integrations/toolkits/) for documentation on built-in toolkit integrations.
:::
Toolkits are collections of tools that are designed to be used together for specific tasks and have convenience loading methods.
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -0,0 +1,2 @@
label: 'How-to'
position: 0

View File

@@ -3,10 +3,6 @@ sidebar_position: 2
---
# Tools
:::info
Head to [Integrations](/docs/integrations/tools/) for documentation on built-in tool integrations.
:::
Tools are interfaces that an agent can use to interact with the world.
## Get started

View File

@@ -0,0 +1 @@
label: 'Integrations'

View File

@@ -0,0 +1,2 @@
label: 'How-to'
position: 0

View File

@@ -3,10 +3,6 @@ sidebar_position: 5
---
# Callbacks
:::info
Head to [Integrations](/docs/integrations/callbacks/) for documentation on built-in callbacks integrations with 3rd-party tools.
:::
LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks.
import GetStarted from "@snippets/modules/callbacks/get_started.mdx"

View File

@@ -0,0 +1 @@
label: 'Integrations'

View File

@@ -0,0 +1,8 @@
---
sidebar_position: 4
---
# Additional
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -0,0 +1,7 @@
# Dynamically selecting from multiple prompts
This notebook demonstrates how to use the `RouterChain` paradigm to create a chain that dynamically selects the prompt to use for a given input. Specifically we show how to use the `MultiPromptChain` to create a question-answering chain that selects the prompt which is most relevant for a given question, and then answers the question using that prompt.
import Example from "@snippets/modules/chains/additional/multi_prompt_router.mdx"
<Example/>

View File

@@ -1,4 +1,4 @@
# Dynamically select from multiple retrievers
# Dynamically selecting from multiple retrievers
This notebook demonstrates how to use the `RouterChain` paradigm to create a chain that dynamically selects which Retrieval system to use. Specifically we show how to use the `MultiRetrievalQAChain` to create a question-answering chain that selects the retrieval QA chain which is most relevant for a given question, and then answers the question using it.

View File

@@ -1,4 +1,4 @@
# QA over in-memory documents
# Document QA
Here we walk through how to use LangChain for question answering over a list of documents. Under the hood we'll be using our [Document chains](/docs/modules/chains/document/).

View File

@@ -3,7 +3,7 @@ sidebar_position: 2
---
# Documents
These are the core chains for working with documents. They are useful for summarizing documents, answering questions over documents, extracting information from documents, and more.
These are the core chains for working with Documents. They are useful for summarizing documents, answering questions over documents, extracting information from documents, and more.
These chains all implement a common interface:

View File

@@ -3,10 +3,10 @@ sidebar_position: 1
---
# Refine
The Refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.
The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.
Since the Refine chain only passes a single document to the LLM at a time, it is well-suited for tasks that require analyzing more documents than can fit in the model's context.
The obvious tradeoff is that this chain will make far more LLM calls than, for example, the Stuff documents chain.
There are also certain tasks which are difficult to accomplish iteratively. For example, the Refine chain can perform poorly when documents frequently cross-reference one another or when a task requires detailed information from many documents.
![refine_diagram](/img/refine.jpg)
![refine_diagram](/img/refine.jpg)

View File

@@ -1,11 +1,11 @@
# LLM
An `LLMChain` is a simple chain that adds some functionality around language models. It is used widely throughout LangChain, including in other chains and agents.
An LLMChain is a simple chain that adds some functionality around language models. It is used widely throughout LangChain, including in other chains and agents.
An `LLMChain` consists of a `PromptTemplate` and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.
An LLMChain consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.
## Get started
import Example from "@snippets/modules/chains/foundational/llm_chain.mdx"
<Example/>
<Example/>

View File

@@ -1,10 +1,10 @@
# Sequential
<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! Instead, edit the notebook w/the location & name as this file. -->
The next step after calling a language model is make a series of calls to a language model. This is particularly useful when you want to take the output from one call and use it as the input to another.
The next step after calling a language model is to make a series of calls to a language model. This is particularly useful when you want to take the output from one call and use it as the input to another.
In this notebook we will walk through some examples of how to do this, using sequential chains. Sequential chains allow you to connect multiple chains and compose them into pipelines that execute some specific scenario. There are two types of sequential chains:
In this notebook we will walk through some examples for how to do this, using sequential chains. Sequential chains allow you to connect multiple chains and compose them into pipelines that execute some specific scenario.. There are two types of sequential chains:
- `SimpleSequentialChain`: The simplest form of sequential chains, where each step has a singular input/output, and the output of one step is the input to the next.
- `SequentialChain`: A more general form of sequential chains, allowing for multiple inputs/outputs.

View File

@@ -19,6 +19,8 @@ For more specifics check out:
- [How-to](/docs/modules/chains/how_to/) for walkthroughs of different chain features
- [Foundational](/docs/modules/chains/foundational/) to get acquainted with core building block chains
- [Document](/docs/modules/chains/document/) to learn how to incorporate documents into chains
- [Popular](/docs/modules/chains/popular/) chains for the most common use cases
- [Additional](/docs/modules/chains/additional/) to see some of the more advanced chains and integrations that you can use out of the box
## Why do we need chains?
@@ -28,4 +30,4 @@ Chains allow us to combine multiple components together to create a single, cohe
import GetStarted from "@snippets/modules/chains/get_started.mdx"
<GetStarted/>
<GetStarted/>

View File

@@ -0,0 +1,9 @@
---
sidebar_position: 0
---
# API chains
APIChain enables using LLMs to interact with APIs to retrieve relevant information. Construct the chain by providing a question relevant to the provided API documentation.
import Example from "@snippets/modules/chains/popular/api.mdx"
<Example/>

View File

@@ -2,10 +2,10 @@
sidebar_position: 2
---
# Store and reference chat history
# Conversational Retrieval QA
The ConversationalRetrievalQA chain builds on RetrievalQAChain to provide a chat history component.
It first combines the chat history (either explicitly passed in or retrieved from the provided memory) and the question into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the question to a question-answering chain to return a response.
It first combines the chat history (either explicitly passed in or retrieved from the provided memory) and the question into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the question to a question answering chain to return a response.
To create one, you will need a retriever. In the below example, we will create one from a vector store, which can be created from embeddings.

View File

@@ -0,0 +1,8 @@
---
sidebar_position: 3
---
# Popular
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -1,7 +1,7 @@
# SQL Database Chain
# SQL
This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a SQL database.
import Example from "@snippets/modules/chains/popular/sqlite.mdx"
<Example/>
<Example/>

View File

@@ -0,0 +1,8 @@
# Summarization
A summarization chain can be used to summarize multiple documents. One way is to input multiple smaller documents, after they have been divided into chunks, and operate over them with a MapReduceDocumentsChain. You can also choose instead for the chain that does summarization to be a StuffDocumentsChain, or a RefineDocumentsChain.
import Example from "@snippets/modules/chains/popular/summarize.mdx"
<Example/>

View File

@@ -1,7 +1,7 @@
---
sidebar_position: 1
---
# QA using a Retriever
# Retrieval QA
This example showcases question answering over an index.

View File

@@ -0,0 +1,2 @@
label: 'How-to'
position: 0

View File

@@ -3,15 +3,11 @@ sidebar_position: 0
---
# Document loaders
:::info
Head to [Integrations](/docs/integrations/document_loaders/) for documentation on built-in document loader integrations with 3rd-party tools.
:::
Use document loaders to load data from a source as `Document`'s. A `Document` is a piece of text
and associated metadata. For example, there are document loaders for loading a simple `.txt` file, for loading the text
contents of any web page, or even for loading a transcript of a YouTube video.
Document loaders provide a "load" method for loading data as documents from a configured source. They optionally
Document loaders expose a "load" method for loading data as documents from a configured source. They optionally
implement a "lazy load" as well for lazily loading data into memory.
## Get started

View File

@@ -0,0 +1 @@
label: 'Integrations'

View File

@@ -3,10 +3,6 @@ sidebar_position: 1
---
# Document transformers
:::info
Head to [Integrations](/docs/integrations/document_transformers/) for documentation on built-in document transformer integrations with 3rd-party tools.
:::
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example
is you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain
has a number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

View File

@@ -2,8 +2,8 @@
This is the simplest method. This splits based on characters (by default "\n\n") and measure chunk length by number of characters.
1. How the text is split: by single character.
2. How the chunk size is measured: by number of characters.
1. How the text is split: by single character
2. How the chunk size is measured: by number of characters
import Example from "@snippets/modules/data_connection/document_transformers/text_splitters/character_text_splitter.mdx"

View File

@@ -1,6 +1,6 @@
# Split code
CodeTextSplitter allows you to split your code with multiple languages supported. Import enum `Language` and specify the language.
CodeTextSplitter allows you to split your code with multiple language support. Import enum `Language` and specify the language.
import Example from "@snippets/modules/data_connection/document_transformers/text_splitters/code_splitter.mdx"

View File

@@ -2,8 +2,8 @@
This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in order until the chunks are small enough. The default list is `["\n\n", "\n", " ", ""]`. This has the effect of trying to keep all paragraphs (and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest semantically related pieces of text.
1. How the text is split: by list of characters.
2. How the chunk size is measured: by number of characters.
1. How the text is split: by list of characters
2. How the chunk size is measured: by number of characters
import Example from "@snippets/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter.mdx"

View File

@@ -2,60 +2,15 @@
sidebar_position: 1
---
# Retrieval
# Data connection
Many LLM applications require user-specific data that is not part of the model's training set.
The primary way of accomplishing this is through Retrieval Augmented Generation (RAG).
In this process, external data is *retrieved* and then passed to the LLM when doing the *generation* step.
Many LLM applications require user-specific data that is not part of the model's training set. LangChain gives you the
building blocks to load, transform, store and query your data via:
LangChain provides all the building blocks for RAG applications - from simple to complex.
This section of the documentation covers everything related to the *retrieval* step - e.g. the fetching of the data.
Although this sounds simple, it can be subtly complex.
This encompasses several key modules.
- [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
- [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
- [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
- [Retrievers](/docs/modules/data_connection/retrievers/): Query your data
![data_connection_diagram](/img/data_connection.jpg)
**[Document loaders](/docs/modules/data_connection/document_loaders/)**
Load documents from many different sources.
LangChain provides over 100 different document loaders as well as integrations with other major providers in the space,
like AirByte and Unstructured.
We provide integrations to load all types of documents (HTML, PDF, code) from all types of locations (private s3 buckets, public websites).
**[Document transformers](/docs/modules/data_connection/document_transformers/)**
A key part of retrieval is fetching only the relevant parts of documents.
This involves several transformation steps in order to best prepare the documents for retrieval.
One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
LangChain provides several different algorithms for doing this, as well as logic optimized for specific document types (code, markdown, etc).
**[Text embedding models](/docs/modules/data_connection/text_embedding/)**
Another key part of retrieval has become creating embeddings for documents.
Embeddings capture the semantic meaning of the text, allowing you to quickly and
efficiently find other pieces of text that are similar.
LangChain provides integrations with over 25 different embedding providers and methods,
from open-source to proprietary API,
allowing you to choose the one best suited for your needs.
LangChain provides a standard interface, allowing you to easily swap between models.
**[Vector stores](/docs/modules/data_connection/vectorstores/)**
With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these embeddings.
LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-hosted proprietary ones,
allowing you to choose the one best suited for your needs.
LangChain exposes a standard interface, allowing you to easily swap between vector stores.
**[Retrievers](/docs/modules/data_connection/retrievers/)**
Once the data is in the database, you still need to retrieve it.
LangChain supports many different retrieval algorithms and is one of the places where we add the most value.
We support basic methods that are easy to get started - namely simple semantic search.
However, we have also added a collection of algorithms on top of this to increase performance.
These include:
- [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.
- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the *semantic* part of a query from other *metadata filters* present in the query.
- [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.
- And more!

View File

@@ -0,0 +1,2 @@
label: 'How-to'
position: 0

View File

@@ -5,10 +5,10 @@ One challenge with retrieval is that usually you don't know the specific queries
Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing” here refers to both compressing the contents of an individual document and filtering out documents wholesale.
To use the Contextual Compression Retriever, you'll need:
- a base retriever
- a base Retriever
- a Document Compressor
The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the contents of documents or dropping documents altogether.
The Contextual Compression Retriever passes queries to the base Retriever, takes the initial documents and passes them through the Document Compressor. The Document Compressor takes a list of Documents and shortens it by reducing the contents of Documents or dropping Documents altogether.
![](https://drive.google.com/uc?id=1CtNgWODXZudxAWSRiWgSGEoTNrUFT98v)

View File

@@ -1,6 +1,6 @@
# Self-querying
A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.
A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that structured query to it's underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documented, but to also extract filters from the user query on the metadata of stored documents and to execute those filters.
![](https://drive.google.com/uc?id=1OQUN-0MJcDUxmPXofgS7MqReEs720pqS)

View File

@@ -8,7 +8,7 @@ The algorithm for scoring them is:
semantic_similarity + (1.0 - decay_rate) ^ hours_passed
```
Notably, `hours_passed` refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain "fresh".
Notably, `hours_passed` refers to the hours passed since the object in the retriever **was last accessed**, not since it was created. This means that frequently accessed objects remain "fresh."
import Example from "@snippets/modules/data_connection/retrievers/how_to/time_weighted_vectorstore.mdx"

View File

@@ -1,9 +1,9 @@
# Vector store-backed retriever
A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class to make it conform to the retriever interface.
A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the Vector Store class to make it conform to the Retriever interface.
It uses the search methods implemented by a vector store, like similarity search and MMR, to query the texts in the vector store.
Once you construct a vector store, it's very easy to construct a retriever. Let's walk through an example.
Once you construct a Vector store, it's very easy to construct a retriever. Let's walk through an example.
import Example from "@snippets/modules/data_connection/retrievers/how_to/vectorstore.mdx"

Some files were not shown because too many files have changed in this diff Show More