mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-05 00:30:18 +00:00
Compare commits
1 Commits
bagatur/rf
...
eugene/fix
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
ff1de3d347 |
@@ -12,7 +12,7 @@
|
||||
|
||||
// The optional 'workspaceFolder' property is the path VS Code should open by default when
|
||||
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
|
||||
"workspaceFolder": "/workspaces/langchain",
|
||||
"workspaceFolder": "/workspaces/${localWorkspaceFolderBasename}",
|
||||
|
||||
// Prevent the container from shutting down
|
||||
"overrideCommand": true
|
||||
|
||||
@@ -6,7 +6,7 @@ services:
|
||||
context: ..
|
||||
volumes:
|
||||
# Update this to wherever you want VS Code to mount the folder of your project
|
||||
- ..:/workspaces/langchain:cached
|
||||
- ..:/workspaces:cached
|
||||
networks:
|
||||
- langchain-network
|
||||
# environment:
|
||||
|
||||
376
.github/CONTRIBUTING.md
vendored
376
.github/CONTRIBUTING.md
vendored
@@ -3,4 +3,378 @@
|
||||
Hi there! Thank you for even being interested in contributing to LangChain.
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
|
||||
|
||||
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
|
||||
## 🗺️ Guidelines
|
||||
|
||||
### 👩💻 Contributing Code
|
||||
|
||||
To contribute to this project, please follow the ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
|
||||
Please do not try to push directly to this repo unless you are a maintainer.
|
||||
|
||||
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
|
||||
maintainers.
|
||||
|
||||
Pull requests cannot land without passing the formatting, linting, and testing checks first. See [Testing](#testing) and
|
||||
[Formatting and Linting](#formatting-and-linting) for how to run these checks locally.
|
||||
|
||||
It's essential that we maintain great documentation and testing. If you:
|
||||
- Fix a bug
|
||||
- Add a relevant unit or integration test when possible. These live in `tests/unit_tests` and `tests/integration_tests`.
|
||||
- Make an improvement
|
||||
- Update any affected example notebooks and documentation. These live in `docs`.
|
||||
- Update unit and integration tests when relevant.
|
||||
- Add a feature
|
||||
- Add a demo notebook in `docs/docs/`.
|
||||
- Add unit and integration tests.
|
||||
|
||||
We are a small, progress-oriented team. If there's something you'd like to add or change, opening a pull request is the
|
||||
best way to get our attention.
|
||||
|
||||
### 🚩GitHub Issues
|
||||
|
||||
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date with bugs, improvements, and feature requests.
|
||||
|
||||
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help organize issues.
|
||||
|
||||
If you start working on an issue, please assign it to yourself.
|
||||
|
||||
If you are adding an issue, please try to keep it focused on a single, modular bug/improvement/feature.
|
||||
If two issues are related, or blocking, please link them rather than combining them.
|
||||
|
||||
We will try to keep these issues as up-to-date as possible, though
|
||||
with the rapid rate of development in this field some may get out of date.
|
||||
If you notice this happening, please let us know.
|
||||
|
||||
### 🙋Getting Help
|
||||
|
||||
Our goal is to have the simplest developer setup possible. Should you experience any difficulty getting setup, please
|
||||
contact a maintainer! Not only do we want to help get you unblocked, but we also want to make sure that the process is
|
||||
smooth for future contributors.
|
||||
|
||||
In a similar vein, we do enforce certain linting, formatting, and documentation standards in the codebase.
|
||||
If you are finding these difficult (or even just annoying) to work with, feel free to contact a maintainer for help -
|
||||
we do not want these to get in the way of getting good code into the codebase.
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
This quick start guide explains how to run the repository locally.
|
||||
For a [development container](https://containers.dev/), see the [.devcontainer folder](https://github.com/langchain-ai/langchain/tree/master/.devcontainer).
|
||||
|
||||
### Dependency Management: Poetry and other env/dependency managers
|
||||
|
||||
This project utilizes [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
|
||||
|
||||
❗Note: *Before installing Poetry*, if you use `Conda`, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
|
||||
|
||||
Install Poetry: **[documentation on how to install it](https://python-poetry.org/docs/#installation)**.
|
||||
|
||||
❗Note: If you use `Conda` or `Pyenv` as your environment/package manager, after installing Poetry,
|
||||
tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
|
||||
|
||||
### Different packages
|
||||
|
||||
This repository contains multiple packages:
|
||||
- `langchain-core`: Base interfaces for key abstractions as well as logic for combining them in chains (LangChain Expression Language).
|
||||
- `langchain-community`: Third-party integrations of various components.
|
||||
- `langchain`: Chains, agents, and retrieval logic that makes up the cognitive architecture of your applications.
|
||||
- `langchain-experimental`: Components and chains that are experimental, either in the sense that the techniques are novel and still being tested, or they require giving the LLM more access than would be possible in most production systems.
|
||||
|
||||
Each of these has its own development environment. Docs are run from the top-level makefile, but development
|
||||
is split across separate test & release flows.
|
||||
|
||||
For this quickstart, start with langchain:
|
||||
|
||||
```bash
|
||||
cd libs/langchain
|
||||
```
|
||||
|
||||
### Local Development Dependencies
|
||||
|
||||
Install langchain development requirements (for running langchain, running examples, linting, formatting, tests, and coverage):
|
||||
|
||||
```bash
|
||||
poetry install --with test
|
||||
```
|
||||
|
||||
Then verify dependency installation:
|
||||
|
||||
```bash
|
||||
make test
|
||||
```
|
||||
|
||||
If the tests don't pass, you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
|
||||
|
||||
If during installation you receive a `WheelFileValidationError` for `debugpy`, please make sure you are running
|
||||
Poetry v1.6.1+. This bug was present in older versions of Poetry (e.g. 1.4.1) and has been resolved in newer releases.
|
||||
If you are still seeing this bug on v1.6.1, you may also try disabling "modern installation"
|
||||
(`poetry config installer.modern-installation false`) and re-installing requirements.
|
||||
See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
|
||||
|
||||
### Testing
|
||||
|
||||
_some test dependencies are optional; see section about optional dependencies_.
|
||||
|
||||
Unit tests cover modular logic that does not require calls to outside APIs.
|
||||
If you add new logic, please add a unit test.
|
||||
|
||||
To run unit tests:
|
||||
|
||||
```bash
|
||||
make test
|
||||
```
|
||||
|
||||
To run unit tests in Docker:
|
||||
|
||||
```bash
|
||||
make docker_tests
|
||||
```
|
||||
|
||||
There are also [integration tests and code-coverage](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests/README.md) available.
|
||||
|
||||
### Only develop langchain_core or langchain_experimental
|
||||
|
||||
If you are only developing `langchain_core` or `langchain_experimental`, you can simply install the dependencies for the respective projects and run tests:
|
||||
|
||||
```bash
|
||||
cd libs/core
|
||||
poetry install --with test
|
||||
make test
|
||||
```
|
||||
|
||||
Or:
|
||||
|
||||
```bash
|
||||
cd libs/experimental
|
||||
poetry install --with test
|
||||
make test
|
||||
```
|
||||
|
||||
### Formatting and Linting
|
||||
|
||||
Run these locally before submitting a PR; the CI system will check also.
|
||||
|
||||
#### Code Formatting
|
||||
|
||||
Formatting for this project is done via [ruff](https://docs.astral.sh/ruff/rules/).
|
||||
|
||||
To run formatting for docs, cookbook and templates:
|
||||
|
||||
```bash
|
||||
make format
|
||||
```
|
||||
|
||||
To run formatting for a library, run the same command from the relevant library directory:
|
||||
|
||||
```bash
|
||||
cd libs/{LIBRARY}
|
||||
make format
|
||||
```
|
||||
|
||||
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
|
||||
|
||||
```bash
|
||||
make format_diff
|
||||
```
|
||||
|
||||
This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.
|
||||
|
||||
#### Linting
|
||||
|
||||
Linting for this project is done via a combination of [ruff](https://docs.astral.sh/ruff/rules/) and [mypy](http://mypy-lang.org/).
|
||||
|
||||
To run linting for docs, cookbook and templates:
|
||||
|
||||
```bash
|
||||
make lint
|
||||
```
|
||||
|
||||
To run linting for a library, run the same command from the relevant library directory:
|
||||
|
||||
```bash
|
||||
cd libs/{LIBRARY}
|
||||
make lint
|
||||
```
|
||||
|
||||
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
|
||||
|
||||
```bash
|
||||
make lint_diff
|
||||
```
|
||||
|
||||
This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
|
||||
|
||||
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
|
||||
|
||||
#### Spellcheck
|
||||
|
||||
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
|
||||
Note that `codespell` finds common typos, so it could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
|
||||
|
||||
To check spelling for this project:
|
||||
|
||||
```bash
|
||||
make spell_check
|
||||
```
|
||||
|
||||
To fix spelling in place:
|
||||
|
||||
```bash
|
||||
make spell_fix
|
||||
```
|
||||
|
||||
If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.
|
||||
|
||||
```python
|
||||
[tool.codespell]
|
||||
...
|
||||
# Add here:
|
||||
ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
|
||||
```
|
||||
|
||||
## Working with Optional Dependencies
|
||||
|
||||
Langchain relies heavily on optional dependencies to keep the Langchain package lightweight.
|
||||
|
||||
You only need to add a new dependency if a **unit test** relies on the package.
|
||||
If your package is only required for **integration tests**, then you can skip these
|
||||
steps and leave all pyproject.toml and poetry.lock files alone.
|
||||
|
||||
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
|
||||
that most users won't have it installed.
|
||||
|
||||
Users who do not have the dependency installed should be able to **import** your code without
|
||||
any side effects (no warnings, no errors, no exceptions).
|
||||
|
||||
To introduce the dependency to the pyproject.toml file correctly, please do the following:
|
||||
|
||||
1. Add the dependency to the main group as an optional dependency
|
||||
```bash
|
||||
poetry add --optional [package_name]
|
||||
```
|
||||
2. Open pyproject.toml and add the dependency to the `extended_testing` extra
|
||||
3. Relock the poetry file to update the extra.
|
||||
```bash
|
||||
poetry lock --no-update
|
||||
```
|
||||
4. Add a unit test that the very least attempts to import the new code. Ideally, the unit
|
||||
test makes use of lightweight fixtures to test the logic of the code.
|
||||
5. Please use the `@pytest.mark.requires(package_name)` decorator for any tests that require the dependency.
|
||||
|
||||
## Adding a Jupyter Notebook
|
||||
|
||||
If you are adding a Jupyter Notebook example, you'll want to install the optional `dev` dependencies.
|
||||
|
||||
To install dev dependencies:
|
||||
|
||||
```bash
|
||||
poetry install --with dev
|
||||
```
|
||||
|
||||
Launch a notebook:
|
||||
|
||||
```bash
|
||||
poetry run jupyter notebook
|
||||
```
|
||||
|
||||
When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
|
||||
|
||||
## Documentation
|
||||
|
||||
While the code is split between `langchain` and `langchain.experimental`, the documentation is one holistic thing.
|
||||
This covers how to get started contributing to documentation.
|
||||
|
||||
From the top-level of this repo, install documentation dependencies:
|
||||
|
||||
```bash
|
||||
poetry install
|
||||
```
|
||||
|
||||
### Contribute Documentation
|
||||
|
||||
The docs directory contains Documentation and API Reference.
|
||||
|
||||
Documentation is built using [Docusaurus 2](https://docusaurus.io/).
|
||||
|
||||
API Reference are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
|
||||
For that reason, we ask that you add good documentation to all classes and methods.
|
||||
|
||||
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
|
||||
|
||||
### Build Documentation Locally
|
||||
|
||||
In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
|
||||
|
||||
Before building the documentation, it is always a good idea to clean the build directory:
|
||||
|
||||
```bash
|
||||
make docs_clean
|
||||
make api_docs_clean
|
||||
```
|
||||
|
||||
Next, you can build the documentation as outlined below:
|
||||
|
||||
```bash
|
||||
make docs_build
|
||||
make api_docs_build
|
||||
```
|
||||
|
||||
Finally, run the link checker to ensure all links are valid:
|
||||
|
||||
```bash
|
||||
make docs_linkcheck
|
||||
make api_docs_linkcheck
|
||||
```
|
||||
|
||||
### Verify Documentation changes
|
||||
|
||||
After pushing documentation changes to the repository, you can preview and verify that the changes are
|
||||
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.
|
||||
This will take you to a preview of the documentation changes.
|
||||
This preview is created by [Vercel](https://vercel.com/docs/getting-started-with-vercel).
|
||||
|
||||
## 📕 Releases & Versioning
|
||||
|
||||
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
|
||||
a maintainer and published to [PyPI](https://pypi.org/).
|
||||
The different packages are versioned slightly differently.
|
||||
|
||||
### `langchain-core`
|
||||
|
||||
`langchain-core` is currently on version `0.1.x`.
|
||||
|
||||
As `langchain-core` contains the base abstractions and runtime for the whole LangChain ecosystem, we will communicate any breaking changes with advance notice and version bumps. The exception for this is anything in `langchain_core.beta`. The reason for `langchain_core.beta` is that given the rate of change of the field, being able to move quickly is still a priority, and this module is our attempt to do so.
|
||||
|
||||
Minor version increases will occur for:
|
||||
|
||||
- Breaking changes for any public interfaces NOT in `langchain_core.beta`
|
||||
|
||||
Patch version increases will occur for:
|
||||
|
||||
- Bug fixes
|
||||
- New features
|
||||
- Any changes to private interfaces
|
||||
- Any changes to `langchain_core.beta`
|
||||
|
||||
### `langchain`
|
||||
|
||||
`langchain` is currently on version `0.0.x`
|
||||
|
||||
All changes will be accompanied by a patch version increase. Any changes to public interfaces are nearly always done in a backwards compatible way and will be communicated ahead of time when they are not backwards compatible.
|
||||
|
||||
We are targeting January 2024 for a release of `langchain` v0.1, at which point `langchain` will adopt the same versioning policy as `langchain-core`.
|
||||
|
||||
### `langchain-community`
|
||||
|
||||
`langchain-community` is currently on version `0.0.x`
|
||||
|
||||
All changes will be accompanied by a patch version increase.
|
||||
|
||||
### `langchain-experimental`
|
||||
|
||||
`langchain-experimental` is currently on version `0.0.x`
|
||||
|
||||
All changes will be accompanied by a patch version increase.
|
||||
|
||||
## 🌟 Recognition
|
||||
|
||||
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
|
||||
If you have a Twitter account you would like us to mention, please let us know in the PR or through another means.
|
||||
|
||||
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
@@ -1,38 +0,0 @@
|
||||
labels: [idea]
|
||||
body:
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I searched existing ideas and did not find a similar one
|
||||
required: true
|
||||
- label: I added a very descriptive title
|
||||
required: true
|
||||
- label: I've clearly described the feature request and motivation for it
|
||||
required: true
|
||||
- type: textarea
|
||||
id: feature-request
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Feature request
|
||||
description: |
|
||||
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
|
||||
- type: textarea
|
||||
id: motivation
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Motivation
|
||||
description: |
|
||||
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
|
||||
- type: textarea
|
||||
id: proposal
|
||||
validations:
|
||||
required: false
|
||||
attributes:
|
||||
label: Proposal (If applicable)
|
||||
description: |
|
||||
If you would like to propose a solution, please describe it here.
|
||||
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
@@ -1,122 +0,0 @@
|
||||
labels: [Question]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain 🦜️🔗!
|
||||
|
||||
Please follow these instructions, fill every question, and do every step. 🙏
|
||||
|
||||
We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
|
||||
this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
|
||||
|
||||
By asking questions in a structured way (following this) it will be much easier for us to help you.
|
||||
|
||||
There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎
|
||||
|
||||
As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones.
|
||||
|
||||
That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓
|
||||
|
||||
Relevant links to check before opening a question to see if your question has already been answered, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this question.
|
||||
required: true
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
- type: checkboxes
|
||||
id: help
|
||||
attributes:
|
||||
label: Commit to Help
|
||||
description: |
|
||||
After submitting this, I commit to one of:
|
||||
|
||||
* Read open questions until I find 2 where I can help someone and add a comment to help there.
|
||||
* I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
|
||||
* Once my question is answered, I will mark the answer as "accepted".
|
||||
options:
|
||||
- label: I commit to help with one of those options 👆
|
||||
required: true
|
||||
- type: textarea
|
||||
id: example
|
||||
attributes:
|
||||
label: Example Code
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
render: python
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: description
|
||||
attributes:
|
||||
label: Description
|
||||
description: |
|
||||
What is the problem, question, or error?
|
||||
|
||||
Write a short description explaining what you are doing, what you expect to happen, and what is currently happening.
|
||||
placeholder: |
|
||||
* I'm trying to use the `langchain` library to do X.
|
||||
* I expect to see Y.
|
||||
* Instead, it does Z.
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages, don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
184
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
184
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
@@ -1,120 +1,106 @@
|
||||
name: "\U0001F41B Bug Report"
|
||||
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions.
|
||||
description: Submit a bug report to help us improve LangChain. To report a security issue, please instead use the security option below.
|
||||
labels: ["02 Bug Report"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: >
|
||||
Thank you for taking the time to file a bug report.
|
||||
|
||||
Use this to report bugs in LangChain.
|
||||
|
||||
If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions)
|
||||
to ask for help with your issue.
|
||||
|
||||
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
Thank you for taking the time to file a bug report. Before creating a new
|
||||
issue, please make sure to take a few moments to check the issue tracker
|
||||
for existing issues about the bug.
|
||||
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
label: System Info
|
||||
description: Please share your system info with us.
|
||||
placeholder: LangChain version, platform, python version, ...
|
||||
validations:
|
||||
required: true
|
||||
|
||||
- type: textarea
|
||||
id: who-can-help
|
||||
attributes:
|
||||
label: Who can help?
|
||||
description: |
|
||||
Your issue will be replied to more quickly if you can figure out the right person to tag with @
|
||||
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
|
||||
|
||||
The core maintainers strive to read all issues, but tagging them will help them prioritize.
|
||||
|
||||
Please tag fewer than 3 people.
|
||||
|
||||
@hwchase17 - project lead
|
||||
|
||||
Tracing / Callbacks
|
||||
- @agola11
|
||||
|
||||
Async
|
||||
- @agola11
|
||||
|
||||
DataLoader Abstractions
|
||||
- @eyurtsev
|
||||
|
||||
LLM/Chat Wrappers
|
||||
- @hwchase17
|
||||
- @agola11
|
||||
|
||||
Tools / Toolkits
|
||||
- ...
|
||||
|
||||
placeholder: "@Username ..."
|
||||
|
||||
- type: checkboxes
|
||||
id: information-scripts-examples
|
||||
attributes:
|
||||
label: Information
|
||||
description: "The problem arises when using:"
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
- label: I am sure that this is a bug in LangChain rather than my code.
|
||||
required: true
|
||||
- label: The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
|
||||
required: true
|
||||
- label: "The official example notebooks/scripts"
|
||||
- label: "My own modified scripts"
|
||||
|
||||
- type: checkboxes
|
||||
id: related-components
|
||||
attributes:
|
||||
label: Related Components
|
||||
description: "Select the components related to the issue (if applicable):"
|
||||
options:
|
||||
- label: "LLMs/Chat Models"
|
||||
- label: "Embedding Models"
|
||||
- label: "Prompts / Prompt Templates / Prompt Selectors"
|
||||
- label: "Output Parsers"
|
||||
- label: "Document Loaders"
|
||||
- label: "Vector Stores / Retrievers"
|
||||
- label: "Memory"
|
||||
- label: "Agents / Agent Executors"
|
||||
- label: "Tools / Toolkits"
|
||||
- label: "Chains"
|
||||
- label: "Callbacks/Tracing"
|
||||
- label: "Async"
|
||||
|
||||
- type: textarea
|
||||
id: reproduction
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Example Code
|
||||
label: Reproduction
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
Please provide a [code sample](https://stackoverflow.com/help/minimal-reproducible-example) that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
|
||||
If you have code snippets, error messages, stack traces please provide them here as well.
|
||||
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
|
||||
Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
The following code:
|
||||
|
||||
```python
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
Steps to reproduce the behavior:
|
||||
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
```
|
||||
- type: textarea
|
||||
id: error
|
||||
validations:
|
||||
required: false
|
||||
attributes:
|
||||
label: Error Message and Stack Trace (if applicable)
|
||||
description: |
|
||||
If you are reporting an error, please include the full error message and stack trace.
|
||||
placeholder: |
|
||||
Exception + full stack trace
|
||||
- type: textarea
|
||||
id: description
|
||||
attributes:
|
||||
label: Description
|
||||
description: |
|
||||
What is the problem, question, or error?
|
||||
|
||||
Write a short description telling what you are doing, what you expect to happen, and what is currently happening.
|
||||
placeholder: |
|
||||
* I'm trying to use the `langchain` library to do X.
|
||||
* I expect to see Y.
|
||||
* Instead, it does Z.
|
||||
id: expected-behavior
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages, don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
label: Expected behavior
|
||||
description: "A clear and concise description of what you would expect to happen."
|
||||
|
||||
11
.github/ISSUE_TEMPLATE/config.yml
vendored
11
.github/ISSUE_TEMPLATE/config.yml
vendored
@@ -1,15 +1,6 @@
|
||||
blank_issues_enabled: false
|
||||
blank_issues_enabled: true
|
||||
version: 2.1
|
||||
contact_links:
|
||||
- name: 🤔 Question or Problem
|
||||
about: Ask a question or ask about a problem in GitHub Discussions.
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/q-a
|
||||
- name: Discord
|
||||
url: https://discord.gg/6adMQxSpJS
|
||||
about: General community discussions
|
||||
- name: Feature Request
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/ideas
|
||||
about: Suggest a feature or an idea
|
||||
- name: Show and tell
|
||||
about: Show what you built with LangChain
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/show-and-tell
|
||||
|
||||
45
.github/ISSUE_TEMPLATE/documentation.yml
vendored
45
.github/ISSUE_TEMPLATE/documentation.yml
vendored
@@ -4,55 +4,16 @@ title: "DOC: <Please write a comprehensive title after the 'DOC: ' prefix>"
|
||||
labels: [03 - Documentation]
|
||||
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: >
|
||||
Thank you for taking the time to report an issue in the documentation.
|
||||
|
||||
Only report issues with documentation here, explain if there are
|
||||
any missing topics or if you found a mistake in the documentation.
|
||||
|
||||
Do **NOT** use this to ask usage questions or reporting issues with your code.
|
||||
|
||||
If you have usage questions or need help solving some problem,
|
||||
please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions).
|
||||
|
||||
If you're in the wrong place, here are some helpful links to find a better
|
||||
place to ask your question:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: input
|
||||
id: url
|
||||
attributes:
|
||||
label: URL
|
||||
description: URL to documentation
|
||||
validations:
|
||||
required: false
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checklist
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I included a link to the documentation page I am referring to (if applicable).
|
||||
required: true
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue with current documentation:"
|
||||
description: >
|
||||
Please make sure to leave a reference to the document/code you're
|
||||
referring to. Feel free to include names of classes, functions, methods
|
||||
or concepts you'd like to see documented more.
|
||||
referring to.
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Idea or request for content:"
|
||||
description: >
|
||||
Please describe as clearly as possible what topics you think are missing
|
||||
from the current documentation.
|
||||
from the current documentation.
|
||||
30
.github/ISSUE_TEMPLATE/feature-request.yml
vendored
Normal file
30
.github/ISSUE_TEMPLATE/feature-request.yml
vendored
Normal file
@@ -0,0 +1,30 @@
|
||||
name: "\U0001F680 Feature request"
|
||||
description: Submit a proposal/request for a new LangChain feature
|
||||
labels: ["02 Feature Request"]
|
||||
body:
|
||||
- type: textarea
|
||||
id: feature-request
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Feature request
|
||||
description: |
|
||||
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
|
||||
|
||||
- type: textarea
|
||||
id: motivation
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Motivation
|
||||
description: |
|
||||
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
|
||||
|
||||
- type: textarea
|
||||
id: contribution
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Your contribution
|
||||
description: |
|
||||
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md)
|
||||
18
.github/ISSUE_TEMPLATE/other.yml
vendored
Normal file
18
.github/ISSUE_TEMPLATE/other.yml
vendored
Normal file
@@ -0,0 +1,18 @@
|
||||
name: Other Issue
|
||||
description: Raise an issue that wouldn't be covered by the other templates.
|
||||
title: "Issue: <Please write a comprehensive title after the 'Issue: ' prefix>"
|
||||
labels: [04 - Other]
|
||||
|
||||
body:
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue you'd like to raise."
|
||||
description: >
|
||||
Please describe the issue you'd like to raise as clearly as possible.
|
||||
Make sure to include any relevant links or references.
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Suggestion:"
|
||||
description: >
|
||||
Please outline a suggestion to improve the issue here.
|
||||
25
.github/ISSUE_TEMPLATE/privileged.yml
vendored
25
.github/ISSUE_TEMPLATE/privileged.yml
vendored
@@ -1,25 +0,0 @@
|
||||
name: 🔒 Privileged
|
||||
description: You are a LangChain maintainer, or was asked directly by a maintainer to create an issue here. If not, check the other options.
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain! 🚀
|
||||
|
||||
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
|
||||
|
||||
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
|
||||
or are a regular contributor to LangChain with previous merged pull requests.
|
||||
- type: checkboxes
|
||||
id: privileged
|
||||
attributes:
|
||||
label: Privileged issue
|
||||
description: Confirm that you are allowed to create an issue here.
|
||||
options:
|
||||
- label: I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here.
|
||||
required: true
|
||||
- type: textarea
|
||||
id: content
|
||||
attributes:
|
||||
label: Issue Content
|
||||
description: Add the content of the issue here.
|
||||
37
.github/PULL_REQUEST_TEMPLATE.md
vendored
37
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -1,29 +1,20 @@
|
||||
Thank you for contributing to LangChain!
|
||||
<!-- Thank you for contributing to LangChain!
|
||||
|
||||
- [ ] **PR title**: "package: description"
|
||||
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
|
||||
- Example: "community: add foobar LLM"
|
||||
Replace this entire comment with:
|
||||
- **Description:** a description of the change,
|
||||
- **Issue:** the issue # it fixes (if applicable),
|
||||
- **Dependencies:** any dependencies required for this change,
|
||||
- **Tag maintainer:** for a quicker response, tag the relevant maintainer (see below),
|
||||
- **Twitter handle:** we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
|
||||
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
|
||||
|
||||
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
|
||||
- **Description:** a description of the change
|
||||
- **Issue:** the issue # it fixes, if applicable
|
||||
- **Dependencies:** any dependencies required for this change
|
||||
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
See contribution guidelines for more information on how to write/run tests, lint, etc:
|
||||
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
|
||||
|
||||
|
||||
- [ ] **Add tests and docs**: If you're adding a new integration, please include
|
||||
If you're adding a new integration, please include:
|
||||
1. a test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
|
||||
2. an example notebook showing its use. It lives in `docs/extras` directory.
|
||||
|
||||
|
||||
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
|
||||
|
||||
Additional guidelines:
|
||||
- Make sure optional dependencies are imported within a function.
|
||||
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
|
||||
- Most PRs should not touch more than one package.
|
||||
- Changes should be backwards compatible.
|
||||
- If you are adding something to community, do not re-import it in langchain.
|
||||
|
||||
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
|
||||
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.
|
||||
-->
|
||||
|
||||
7
.github/actions/people/Dockerfile
vendored
7
.github/actions/people/Dockerfile
vendored
@@ -1,7 +0,0 @@
|
||||
FROM python:3.9
|
||||
|
||||
RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.1,<6.0.0"
|
||||
|
||||
COPY ./app /app
|
||||
|
||||
CMD ["python", "/app/main.py"]
|
||||
11
.github/actions/people/action.yml
vendored
11
.github/actions/people/action.yml
vendored
@@ -1,11 +0,0 @@
|
||||
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/action.yml
|
||||
name: "Generate LangChain People"
|
||||
description: "Generate the data for the LangChain People page"
|
||||
author: "Jacob Lee <jacob@langchain.dev>"
|
||||
inputs:
|
||||
token:
|
||||
description: 'User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}'
|
||||
required: true
|
||||
runs:
|
||||
using: 'docker'
|
||||
image: 'Dockerfile'
|
||||
643
.github/actions/people/app/main.py
vendored
643
.github/actions/people/app/main.py
vendored
@@ -1,643 +0,0 @@
|
||||
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/app/main.py
|
||||
|
||||
import logging
|
||||
import subprocess
|
||||
import sys
|
||||
from collections import Counter
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Container, Dict, List, Set, Union
|
||||
|
||||
import httpx
|
||||
import yaml
|
||||
from github import Github
|
||||
from pydantic import BaseModel, SecretStr
|
||||
from pydantic_settings import BaseSettings
|
||||
|
||||
github_graphql_url = "https://api.github.com/graphql"
|
||||
questions_category_id = "DIC_kwDOIPDwls4CS6Ve"
|
||||
|
||||
# discussions_query = """
|
||||
# query Q($after: String, $category_id: ID) {
|
||||
# repository(name: "langchain", owner: "langchain-ai") {
|
||||
# discussions(first: 100, after: $after, categoryId: $category_id) {
|
||||
# edges {
|
||||
# cursor
|
||||
# node {
|
||||
# number
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# title
|
||||
# createdAt
|
||||
# comments(first: 100) {
|
||||
# nodes {
|
||||
# createdAt
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# isAnswer
|
||||
# replies(first: 10) {
|
||||
# nodes {
|
||||
# createdAt
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# """
|
||||
|
||||
# issues_query = """
|
||||
# query Q($after: String) {
|
||||
# repository(name: "langchain", owner: "langchain-ai") {
|
||||
# issues(first: 100, after: $after) {
|
||||
# edges {
|
||||
# cursor
|
||||
# node {
|
||||
# number
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# title
|
||||
# createdAt
|
||||
# state
|
||||
# comments(first: 100) {
|
||||
# nodes {
|
||||
# createdAt
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# """
|
||||
|
||||
prs_query = """
|
||||
query Q($after: String) {
|
||||
repository(name: "langchain", owner: "langchain-ai") {
|
||||
pullRequests(first: 100, after: $after, states: MERGED) {
|
||||
edges {
|
||||
cursor
|
||||
node {
|
||||
changedFiles
|
||||
additions
|
||||
deletions
|
||||
number
|
||||
labels(first: 100) {
|
||||
nodes {
|
||||
name
|
||||
}
|
||||
}
|
||||
author {
|
||||
login
|
||||
avatarUrl
|
||||
url
|
||||
... on User {
|
||||
twitterUsername
|
||||
}
|
||||
}
|
||||
title
|
||||
createdAt
|
||||
state
|
||||
reviews(first:100) {
|
||||
nodes {
|
||||
author {
|
||||
login
|
||||
avatarUrl
|
||||
url
|
||||
... on User {
|
||||
twitterUsername
|
||||
}
|
||||
}
|
||||
state
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
"""
|
||||
|
||||
|
||||
class Author(BaseModel):
|
||||
login: str
|
||||
avatarUrl: str
|
||||
url: str
|
||||
twitterUsername: Union[str, None] = None
|
||||
|
||||
|
||||
# Issues and Discussions
|
||||
|
||||
|
||||
class CommentsNode(BaseModel):
|
||||
createdAt: datetime
|
||||
author: Union[Author, None] = None
|
||||
|
||||
|
||||
class Replies(BaseModel):
|
||||
nodes: List[CommentsNode]
|
||||
|
||||
|
||||
class DiscussionsCommentsNode(CommentsNode):
|
||||
replies: Replies
|
||||
|
||||
|
||||
class Comments(BaseModel):
|
||||
nodes: List[CommentsNode]
|
||||
|
||||
|
||||
class DiscussionsComments(BaseModel):
|
||||
nodes: List[DiscussionsCommentsNode]
|
||||
|
||||
|
||||
class IssuesNode(BaseModel):
|
||||
number: int
|
||||
author: Union[Author, None] = None
|
||||
title: str
|
||||
createdAt: datetime
|
||||
state: str
|
||||
comments: Comments
|
||||
|
||||
|
||||
class DiscussionsNode(BaseModel):
|
||||
number: int
|
||||
author: Union[Author, None] = None
|
||||
title: str
|
||||
createdAt: datetime
|
||||
comments: DiscussionsComments
|
||||
|
||||
|
||||
class IssuesEdge(BaseModel):
|
||||
cursor: str
|
||||
node: IssuesNode
|
||||
|
||||
|
||||
class DiscussionsEdge(BaseModel):
|
||||
cursor: str
|
||||
node: DiscussionsNode
|
||||
|
||||
|
||||
class Issues(BaseModel):
|
||||
edges: List[IssuesEdge]
|
||||
|
||||
|
||||
class Discussions(BaseModel):
|
||||
edges: List[DiscussionsEdge]
|
||||
|
||||
|
||||
class IssuesRepository(BaseModel):
|
||||
issues: Issues
|
||||
|
||||
|
||||
class DiscussionsRepository(BaseModel):
|
||||
discussions: Discussions
|
||||
|
||||
|
||||
class IssuesResponseData(BaseModel):
|
||||
repository: IssuesRepository
|
||||
|
||||
|
||||
class DiscussionsResponseData(BaseModel):
|
||||
repository: DiscussionsRepository
|
||||
|
||||
|
||||
class IssuesResponse(BaseModel):
|
||||
data: IssuesResponseData
|
||||
|
||||
|
||||
class DiscussionsResponse(BaseModel):
|
||||
data: DiscussionsResponseData
|
||||
|
||||
|
||||
# PRs
|
||||
|
||||
|
||||
class LabelNode(BaseModel):
|
||||
name: str
|
||||
|
||||
|
||||
class Labels(BaseModel):
|
||||
nodes: List[LabelNode]
|
||||
|
||||
|
||||
class ReviewNode(BaseModel):
|
||||
author: Union[Author, None] = None
|
||||
state: str
|
||||
|
||||
|
||||
class Reviews(BaseModel):
|
||||
nodes: List[ReviewNode]
|
||||
|
||||
|
||||
class PullRequestNode(BaseModel):
|
||||
number: int
|
||||
labels: Labels
|
||||
author: Union[Author, None] = None
|
||||
changedFiles: int
|
||||
additions: int
|
||||
deletions: int
|
||||
title: str
|
||||
createdAt: datetime
|
||||
state: str
|
||||
reviews: Reviews
|
||||
# comments: Comments
|
||||
|
||||
|
||||
class PullRequestEdge(BaseModel):
|
||||
cursor: str
|
||||
node: PullRequestNode
|
||||
|
||||
|
||||
class PullRequests(BaseModel):
|
||||
edges: List[PullRequestEdge]
|
||||
|
||||
|
||||
class PRsRepository(BaseModel):
|
||||
pullRequests: PullRequests
|
||||
|
||||
|
||||
class PRsResponseData(BaseModel):
|
||||
repository: PRsRepository
|
||||
|
||||
|
||||
class PRsResponse(BaseModel):
|
||||
data: PRsResponseData
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
input_token: SecretStr
|
||||
github_repository: str
|
||||
httpx_timeout: int = 30
|
||||
|
||||
|
||||
def get_graphql_response(
|
||||
*,
|
||||
settings: Settings,
|
||||
query: str,
|
||||
after: Union[str, None] = None,
|
||||
category_id: Union[str, None] = None,
|
||||
) -> Dict[str, Any]:
|
||||
headers = {"Authorization": f"token {settings.input_token.get_secret_value()}"}
|
||||
# category_id is only used by one query, but GraphQL allows unused variables, so
|
||||
# keep it here for simplicity
|
||||
variables = {"after": after, "category_id": category_id}
|
||||
response = httpx.post(
|
||||
github_graphql_url,
|
||||
headers=headers,
|
||||
timeout=settings.httpx_timeout,
|
||||
json={"query": query, "variables": variables, "operationName": "Q"},
|
||||
)
|
||||
if response.status_code != 200:
|
||||
logging.error(
|
||||
f"Response was not 200, after: {after}, category_id: {category_id}"
|
||||
)
|
||||
logging.error(response.text)
|
||||
raise RuntimeError(response.text)
|
||||
data = response.json()
|
||||
if "errors" in data:
|
||||
logging.error(f"Errors in response, after: {after}, category_id: {category_id}")
|
||||
logging.error(data["errors"])
|
||||
logging.error(response.text)
|
||||
raise RuntimeError(response.text)
|
||||
return data
|
||||
|
||||
|
||||
# def get_graphql_issue_edges(*, settings: Settings, after: Union[str, None] = None):
|
||||
# data = get_graphql_response(settings=settings, query=issues_query, after=after)
|
||||
# graphql_response = IssuesResponse.model_validate(data)
|
||||
# return graphql_response.data.repository.issues.edges
|
||||
|
||||
|
||||
# def get_graphql_question_discussion_edges(
|
||||
# *,
|
||||
# settings: Settings,
|
||||
# after: Union[str, None] = None,
|
||||
# ):
|
||||
# data = get_graphql_response(
|
||||
# settings=settings,
|
||||
# query=discussions_query,
|
||||
# after=after,
|
||||
# category_id=questions_category_id,
|
||||
# )
|
||||
# graphql_response = DiscussionsResponse.model_validate(data)
|
||||
# return graphql_response.data.repository.discussions.edges
|
||||
|
||||
|
||||
def get_graphql_pr_edges(*, settings: Settings, after: Union[str, None] = None):
|
||||
if after is None:
|
||||
print("Querying PRs...")
|
||||
else:
|
||||
print(f"Querying PRs with cursor {after}...")
|
||||
data = get_graphql_response(
|
||||
settings=settings,
|
||||
query=prs_query,
|
||||
after=after
|
||||
)
|
||||
graphql_response = PRsResponse.model_validate(data)
|
||||
return graphql_response.data.repository.pullRequests.edges
|
||||
|
||||
|
||||
# def get_issues_experts(settings: Settings):
|
||||
# issue_nodes: List[IssuesNode] = []
|
||||
# issue_edges = get_graphql_issue_edges(settings=settings)
|
||||
|
||||
# while issue_edges:
|
||||
# for edge in issue_edges:
|
||||
# issue_nodes.append(edge.node)
|
||||
# last_edge = issue_edges[-1]
|
||||
# issue_edges = get_graphql_issue_edges(settings=settings, after=last_edge.cursor)
|
||||
|
||||
# commentors = Counter()
|
||||
# last_month_commentors = Counter()
|
||||
# authors: Dict[str, Author] = {}
|
||||
|
||||
# now = datetime.now(tz=timezone.utc)
|
||||
# one_month_ago = now - timedelta(days=30)
|
||||
|
||||
# for issue in issue_nodes:
|
||||
# issue_author_name = None
|
||||
# if issue.author:
|
||||
# authors[issue.author.login] = issue.author
|
||||
# issue_author_name = issue.author.login
|
||||
# issue_commentors = set()
|
||||
# for comment in issue.comments.nodes:
|
||||
# if comment.author:
|
||||
# authors[comment.author.login] = comment.author
|
||||
# if comment.author.login != issue_author_name:
|
||||
# issue_commentors.add(comment.author.login)
|
||||
# for author_name in issue_commentors:
|
||||
# commentors[author_name] += 1
|
||||
# if issue.createdAt > one_month_ago:
|
||||
# last_month_commentors[author_name] += 1
|
||||
|
||||
# return commentors, last_month_commentors, authors
|
||||
|
||||
|
||||
# def get_discussions_experts(settings: Settings):
|
||||
# discussion_nodes: List[DiscussionsNode] = []
|
||||
# discussion_edges = get_graphql_question_discussion_edges(settings=settings)
|
||||
|
||||
# while discussion_edges:
|
||||
# for discussion_edge in discussion_edges:
|
||||
# discussion_nodes.append(discussion_edge.node)
|
||||
# last_edge = discussion_edges[-1]
|
||||
# discussion_edges = get_graphql_question_discussion_edges(
|
||||
# settings=settings, after=last_edge.cursor
|
||||
# )
|
||||
|
||||
# commentors = Counter()
|
||||
# last_month_commentors = Counter()
|
||||
# authors: Dict[str, Author] = {}
|
||||
|
||||
# now = datetime.now(tz=timezone.utc)
|
||||
# one_month_ago = now - timedelta(days=30)
|
||||
|
||||
# for discussion in discussion_nodes:
|
||||
# discussion_author_name = None
|
||||
# if discussion.author:
|
||||
# authors[discussion.author.login] = discussion.author
|
||||
# discussion_author_name = discussion.author.login
|
||||
# discussion_commentors = set()
|
||||
# for comment in discussion.comments.nodes:
|
||||
# if comment.author:
|
||||
# authors[comment.author.login] = comment.author
|
||||
# if comment.author.login != discussion_author_name:
|
||||
# discussion_commentors.add(comment.author.login)
|
||||
# for reply in comment.replies.nodes:
|
||||
# if reply.author:
|
||||
# authors[reply.author.login] = reply.author
|
||||
# if reply.author.login != discussion_author_name:
|
||||
# discussion_commentors.add(reply.author.login)
|
||||
# for author_name in discussion_commentors:
|
||||
# commentors[author_name] += 1
|
||||
# if discussion.createdAt > one_month_ago:
|
||||
# last_month_commentors[author_name] += 1
|
||||
# return commentors, last_month_commentors, authors
|
||||
|
||||
|
||||
# def get_experts(settings: Settings):
|
||||
# (
|
||||
# discussions_commentors,
|
||||
# discussions_last_month_commentors,
|
||||
# discussions_authors,
|
||||
# ) = get_discussions_experts(settings=settings)
|
||||
# commentors = discussions_commentors
|
||||
# last_month_commentors = discussions_last_month_commentors
|
||||
# authors = {**discussions_authors}
|
||||
# return commentors, last_month_commentors, authors
|
||||
|
||||
|
||||
def _logistic(x, k):
|
||||
return x / (x + k)
|
||||
|
||||
|
||||
def get_contributors(settings: Settings):
|
||||
pr_nodes: List[PullRequestNode] = []
|
||||
pr_edges = get_graphql_pr_edges(settings=settings)
|
||||
|
||||
while pr_edges:
|
||||
for edge in pr_edges:
|
||||
pr_nodes.append(edge.node)
|
||||
last_edge = pr_edges[-1]
|
||||
pr_edges = get_graphql_pr_edges(settings=settings, after=last_edge.cursor)
|
||||
|
||||
contributors = Counter()
|
||||
contributor_scores = Counter()
|
||||
recent_contributor_scores = Counter()
|
||||
reviewers = Counter()
|
||||
authors: Dict[str, Author] = {}
|
||||
|
||||
for pr in pr_nodes:
|
||||
pr_reviewers: Set[str] = set()
|
||||
for review in pr.reviews.nodes:
|
||||
if review.author:
|
||||
authors[review.author.login] = review.author
|
||||
pr_reviewers.add(review.author.login)
|
||||
for reviewer in pr_reviewers:
|
||||
reviewers[reviewer] += 1
|
||||
if pr.author:
|
||||
authors[pr.author.login] = pr.author
|
||||
contributors[pr.author.login] += 1
|
||||
files_changed = pr.changedFiles
|
||||
lines_changed = pr.additions + pr.deletions
|
||||
score = _logistic(files_changed, 20) + _logistic(lines_changed, 100)
|
||||
contributor_scores[pr.author.login] += score
|
||||
three_months_ago = (datetime.now(timezone.utc) - timedelta(days=3*30))
|
||||
if pr.createdAt > three_months_ago:
|
||||
recent_contributor_scores[pr.author.login] += score
|
||||
return contributors, contributor_scores, recent_contributor_scores, reviewers, authors
|
||||
|
||||
|
||||
def get_top_users(
|
||||
*,
|
||||
counter: Counter,
|
||||
min_count: int,
|
||||
authors: Dict[str, Author],
|
||||
skip_users: Container[str],
|
||||
):
|
||||
users = []
|
||||
for commentor, count in counter.most_common():
|
||||
if commentor in skip_users:
|
||||
continue
|
||||
if count >= min_count:
|
||||
author = authors[commentor]
|
||||
users.append(
|
||||
{
|
||||
"login": commentor,
|
||||
"count": count,
|
||||
"avatarUrl": author.avatarUrl,
|
||||
"twitterUsername": author.twitterUsername,
|
||||
"url": author.url,
|
||||
}
|
||||
)
|
||||
return users
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
settings = Settings()
|
||||
logging.info(f"Using config: {settings.model_dump_json()}")
|
||||
g = Github(settings.input_token.get_secret_value())
|
||||
repo = g.get_repo(settings.github_repository)
|
||||
# question_commentors, question_last_month_commentors, question_authors = get_experts(
|
||||
# settings=settings
|
||||
# )
|
||||
contributors, contributor_scores, recent_contributor_scores, reviewers, pr_authors = get_contributors(
|
||||
settings=settings
|
||||
)
|
||||
# authors = {**question_authors, **pr_authors}
|
||||
authors = {**pr_authors}
|
||||
maintainers_logins = {
|
||||
"hwchase17",
|
||||
"agola11",
|
||||
"baskaryan",
|
||||
"hinthornw",
|
||||
"nfcampos",
|
||||
"efriis",
|
||||
"eyurtsev",
|
||||
"rlancemartin",
|
||||
"ccurme",
|
||||
"vbarda",
|
||||
}
|
||||
hidden_logins = {
|
||||
"dev2049",
|
||||
"vowelparrot",
|
||||
"obi1kenobi",
|
||||
"langchain-infra",
|
||||
"jacoblee93",
|
||||
"dqbd",
|
||||
"bracesproul",
|
||||
"akira",
|
||||
}
|
||||
bot_names = {"dosubot", "github-actions", "CodiumAI-Agent"}
|
||||
maintainers = []
|
||||
for login in maintainers_logins:
|
||||
user = authors[login]
|
||||
maintainers.append(
|
||||
{
|
||||
"login": login,
|
||||
"count": contributors[login], #+ question_commentors[login],
|
||||
"avatarUrl": user.avatarUrl,
|
||||
"twitterUsername": user.twitterUsername,
|
||||
"url": user.url,
|
||||
}
|
||||
)
|
||||
|
||||
# min_count_expert = 10
|
||||
# min_count_last_month = 3
|
||||
min_score_contributor = 1
|
||||
min_count_reviewer = 5
|
||||
skip_users = maintainers_logins | bot_names | hidden_logins
|
||||
# experts = get_top_users(
|
||||
# counter=question_commentors,
|
||||
# min_count=min_count_expert,
|
||||
# authors=authors,
|
||||
# skip_users=skip_users,
|
||||
# )
|
||||
# last_month_active = get_top_users(
|
||||
# counter=question_last_month_commentors,
|
||||
# min_count=min_count_last_month,
|
||||
# authors=authors,
|
||||
# skip_users=skip_users,
|
||||
# )
|
||||
top_recent_contributors = get_top_users(
|
||||
counter=recent_contributor_scores,
|
||||
min_count=min_score_contributor,
|
||||
authors=authors,
|
||||
skip_users=skip_users,
|
||||
)
|
||||
top_contributors = get_top_users(
|
||||
counter=contributor_scores,
|
||||
min_count=min_score_contributor,
|
||||
authors=authors,
|
||||
skip_users=skip_users,
|
||||
)
|
||||
top_reviewers = get_top_users(
|
||||
counter=reviewers,
|
||||
min_count=min_count_reviewer,
|
||||
authors=authors,
|
||||
skip_users=skip_users,
|
||||
)
|
||||
|
||||
people = {
|
||||
"maintainers": maintainers,
|
||||
# "experts": experts,
|
||||
# "last_month_active": last_month_active,
|
||||
"top_recent_contributors": top_recent_contributors,
|
||||
"top_contributors": top_contributors,
|
||||
"top_reviewers": top_reviewers,
|
||||
}
|
||||
people_path = Path("./docs/data/people.yml")
|
||||
people_old_content = people_path.read_text(encoding="utf-8")
|
||||
new_people_content = yaml.dump(
|
||||
people, sort_keys=False, width=200, allow_unicode=True
|
||||
)
|
||||
if (
|
||||
people_old_content == new_people_content
|
||||
):
|
||||
logging.info("The LangChain People data hasn't changed, finishing.")
|
||||
sys.exit(0)
|
||||
people_path.write_text(new_people_content, encoding="utf-8")
|
||||
logging.info("Setting up GitHub Actions git user")
|
||||
subprocess.run(["git", "config", "user.name", "github-actions"], check=True)
|
||||
subprocess.run(
|
||||
["git", "config", "user.email", "github-actions@github.com"], check=True
|
||||
)
|
||||
branch_name = "langchain/langchain-people"
|
||||
logging.info(f"Creating a new branch {branch_name}")
|
||||
subprocess.run(["git", "checkout", "-B", branch_name], check=True)
|
||||
logging.info("Adding updated file")
|
||||
subprocess.run(
|
||||
["git", "add", str(people_path)], check=True
|
||||
)
|
||||
logging.info("Committing updated file")
|
||||
message = "👥 Update LangChain people data"
|
||||
result = subprocess.run(["git", "commit", "-m", message], check=True)
|
||||
logging.info("Pushing branch")
|
||||
subprocess.run(["git", "push", "origin", branch_name, "-f"], check=True)
|
||||
logging.info("Creating PR")
|
||||
pr = repo.create_pull(title=message, body=message, base="master", head=branch_name)
|
||||
logging.info(f"Created PR: {pr.number}")
|
||||
logging.info("Finished")
|
||||
10
.github/actions/poetry_setup/action.yml
vendored
10
.github/actions/poetry_setup/action.yml
vendored
@@ -26,13 +26,12 @@ inputs:
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
- uses: actions/setup-python@v5
|
||||
- uses: actions/setup-python@v4
|
||||
name: Setup python ${{ inputs.python-version }}
|
||||
id: setup-python
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
|
||||
- uses: actions/cache@v4
|
||||
- uses: actions/cache@v3
|
||||
id: cache-bin-poetry
|
||||
name: Cache Poetry binary - Python ${{ inputs.python-version }}
|
||||
env:
|
||||
@@ -75,11 +74,10 @@ runs:
|
||||
env:
|
||||
POETRY_VERSION: ${{ inputs.poetry-version }}
|
||||
PYTHON_VERSION: ${{ inputs.python-version }}
|
||||
# Install poetry using the python version installed by setup-python step.
|
||||
run: pipx install "poetry==$POETRY_VERSION" --python '${{ steps.setup-python.outputs.python-path }}' --verbose
|
||||
run: pipx install "poetry==$POETRY_VERSION" --python "python$PYTHON_VERSION" --verbose
|
||||
|
||||
- name: Restore pip and poetry cached dependencies
|
||||
uses: actions/cache@v4
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "4"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
98
.github/scripts/check_diff.py
vendored
98
.github/scripts/check_diff.py
vendored
@@ -1,29 +1,16 @@
|
||||
import json
|
||||
import sys
|
||||
import os
|
||||
from typing import Dict
|
||||
|
||||
LANGCHAIN_DIRS = [
|
||||
ALL_DIRS = {
|
||||
"libs/core",
|
||||
"libs/text-splitters",
|
||||
"libs/langchain",
|
||||
"libs/community",
|
||||
"libs/experimental",
|
||||
]
|
||||
"libs/community",
|
||||
}
|
||||
|
||||
if __name__ == "__main__":
|
||||
files = sys.argv[1:]
|
||||
|
||||
dirs_to_run: Dict[str, set] = {
|
||||
"lint": set(),
|
||||
"test": set(),
|
||||
"extended-test": set(),
|
||||
}
|
||||
docs_edited = False
|
||||
|
||||
if len(files) == 300:
|
||||
# max diff length is 300 files - there are likely files missing
|
||||
raise ValueError("Max diff reached. Please manually run CI on changed libs.")
|
||||
dirs_to_run = set()
|
||||
|
||||
for file in files:
|
||||
if any(
|
||||
@@ -32,63 +19,28 @@ if __name__ == "__main__":
|
||||
".github/workflows",
|
||||
".github/tools",
|
||||
".github/actions",
|
||||
"libs/core",
|
||||
".github/scripts/check_diff.py",
|
||||
)
|
||||
):
|
||||
# add all LANGCHAIN_DIRS for infra changes
|
||||
dirs_to_run["extended-test"].update(LANGCHAIN_DIRS)
|
||||
dirs_to_run["lint"].add(".")
|
||||
|
||||
if any(file.startswith(dir_) for dir_ in LANGCHAIN_DIRS):
|
||||
# add that dir and all dirs after in LANGCHAIN_DIRS
|
||||
# for extended testing
|
||||
found = False
|
||||
for dir_ in LANGCHAIN_DIRS:
|
||||
if file.startswith(dir_):
|
||||
found = True
|
||||
if found:
|
||||
dirs_to_run["extended-test"].add(dir_)
|
||||
elif file.startswith("libs/standard-tests"):
|
||||
# TODO: update to include all packages that rely on standard-tests (all partner packages)
|
||||
# note: won't run on external repo partners
|
||||
dirs_to_run["lint"].add("libs/standard-tests")
|
||||
dirs_to_run["test"].add("libs/partners/mistralai")
|
||||
dirs_to_run["test"].add("libs/partners/openai")
|
||||
dirs_to_run["test"].add("libs/partners/anthropic")
|
||||
dirs_to_run["test"].add("libs/partners/ai21")
|
||||
dirs_to_run["test"].add("libs/partners/fireworks")
|
||||
dirs_to_run["test"].add("libs/partners/groq")
|
||||
|
||||
elif file.startswith("libs/cli"):
|
||||
# todo: add cli makefile
|
||||
pass
|
||||
elif file.startswith("libs/partners"):
|
||||
partner_dir = file.split("/")[2]
|
||||
if os.path.isdir(f"libs/partners/{partner_dir}") and [
|
||||
filename
|
||||
for filename in os.listdir(f"libs/partners/{partner_dir}")
|
||||
if not filename.startswith(".")
|
||||
] != ["README.md"]:
|
||||
dirs_to_run["test"].add(f"libs/partners/{partner_dir}")
|
||||
# Skip if the directory was deleted or is just a tombstone readme
|
||||
elif file.startswith("libs/"):
|
||||
raise ValueError(
|
||||
f"Unknown lib: {file}. check_diff.py likely needs "
|
||||
"an update for this new library!"
|
||||
dirs_to_run = ALL_DIRS
|
||||
break
|
||||
elif "libs/community" in file:
|
||||
dirs_to_run.update(
|
||||
("libs/community", "libs/langchain", "libs/experimental")
|
||||
)
|
||||
elif any(file.startswith(p) for p in ["docs/", "templates/", "cookbook/"]):
|
||||
if file.startswith("docs/"):
|
||||
docs_edited = True
|
||||
dirs_to_run["lint"].add(".")
|
||||
|
||||
outputs = {
|
||||
"dirs-to-lint": list(
|
||||
dirs_to_run["lint"] | dirs_to_run["test"] | dirs_to_run["extended-test"]
|
||||
),
|
||||
"dirs-to-test": list(dirs_to_run["test"] | dirs_to_run["extended-test"]),
|
||||
"dirs-to-extended-test": list(dirs_to_run["extended-test"]),
|
||||
"docs-edited": "true" if docs_edited else "",
|
||||
}
|
||||
for key, value in outputs.items():
|
||||
json_output = json.dumps(value)
|
||||
print(f"{key}={json_output}")
|
||||
elif "libs/partners" in file:
|
||||
partner_dir = file.split("/")[2]
|
||||
dirs_to_run.update(
|
||||
(f"libs/partners/{partner_dir}", "libs/langchain", "libs/experimental")
|
||||
)
|
||||
elif "libs/langchain" in file:
|
||||
dirs_to_run.update(("libs/langchain", "libs/experimental"))
|
||||
elif "libs/experimental" in file:
|
||||
dirs_to_run.add("libs/experimental")
|
||||
elif file.startswith("libs/"):
|
||||
dirs_to_run = ALL_DIRS
|
||||
break
|
||||
else:
|
||||
pass
|
||||
print(json.dumps(list(dirs_to_run)))
|
||||
|
||||
79
.github/scripts/get_min_versions.py
vendored
79
.github/scripts/get_min_versions.py
vendored
@@ -1,79 +0,0 @@
|
||||
import sys
|
||||
|
||||
import tomllib
|
||||
from packaging.version import parse as parse_version
|
||||
import re
|
||||
|
||||
MIN_VERSION_LIBS = [
|
||||
"langchain-core",
|
||||
"langchain-community",
|
||||
"langchain",
|
||||
"langchain-text-splitters",
|
||||
]
|
||||
|
||||
|
||||
def get_min_version(version: str) -> str:
|
||||
# base regex for x.x.x with cases for rc/post/etc
|
||||
# valid strings: https://peps.python.org/pep-0440/#public-version-identifiers
|
||||
vstring = r"\d+(?:\.\d+){0,2}(?:(?:a|b|rc|\.post|\.dev)\d+)?"
|
||||
# case ^x.x.x
|
||||
_match = re.match(f"^\\^({vstring})$", version)
|
||||
if _match:
|
||||
return _match.group(1)
|
||||
|
||||
# case >=x.x.x,<y.y.y
|
||||
_match = re.match(f"^>=({vstring}),<({vstring})$", version)
|
||||
if _match:
|
||||
_min = _match.group(1)
|
||||
_max = _match.group(2)
|
||||
assert parse_version(_min) < parse_version(_max)
|
||||
return _min
|
||||
|
||||
# case x.x.x
|
||||
_match = re.match(f"^({vstring})$", version)
|
||||
if _match:
|
||||
return _match.group(1)
|
||||
|
||||
raise ValueError(f"Unrecognized version format: {version}")
|
||||
|
||||
|
||||
def get_min_version_from_toml(toml_path: str):
|
||||
# Parse the TOML file
|
||||
with open(toml_path, "rb") as file:
|
||||
toml_data = tomllib.load(file)
|
||||
|
||||
# Get the dependencies from tool.poetry.dependencies
|
||||
dependencies = toml_data["tool"]["poetry"]["dependencies"]
|
||||
|
||||
# Initialize a dictionary to store the minimum versions
|
||||
min_versions = {}
|
||||
|
||||
# Iterate over the libs in MIN_VERSION_LIBS
|
||||
for lib in MIN_VERSION_LIBS:
|
||||
# Check if the lib is present in the dependencies
|
||||
if lib in dependencies:
|
||||
# Get the version string
|
||||
version_string = dependencies[lib]
|
||||
|
||||
if isinstance(version_string, dict):
|
||||
version_string = version_string["version"]
|
||||
|
||||
# Use parse_version to get the minimum supported version from version_string
|
||||
min_version = get_min_version(version_string)
|
||||
|
||||
# Store the minimum version in the min_versions dictionary
|
||||
min_versions[lib] = min_version
|
||||
|
||||
return min_versions
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Get the TOML file path from the command line argument
|
||||
toml_file = sys.argv[1]
|
||||
|
||||
# Call the function to get the minimum versions
|
||||
min_versions = get_min_version_from_toml(toml_file)
|
||||
|
||||
print(
|
||||
" ".join([f"{lib}=={version}" for lib, version in min_versions.items()])
|
||||
)
|
||||
7
.github/workflows/.codespell-exclude
vendored
7
.github/workflows/.codespell-exclude
vendored
@@ -1,7 +0,0 @@
|
||||
libs/community/langchain_community/llms/yuan2.py
|
||||
"NotIn": "not in",
|
||||
- `/checkin`: Check-in
|
||||
docs/docs/integrations/providers/trulens.mdx
|
||||
self.assertIn(
|
||||
from trulens_eval import Tru
|
||||
tru = Tru()
|
||||
105
.github/workflows/_all_ci.yml
vendored
Normal file
105
.github/workflows/_all_ci.yml
vendored
Normal file
@@ -0,0 +1,105 @@
|
||||
---
|
||||
name: langchain CI
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: choice
|
||||
default: 'libs/langchain'
|
||||
options:
|
||||
- libs/langchain
|
||||
- libs/core
|
||||
- libs/experimental
|
||||
- libs/community
|
||||
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.working-directory }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses: ./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses: ./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
uses: ./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
dependencies:
|
||||
uses: ./.github/workflows/_dependencies.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
extended-tests:
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }} extended tests
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: extended
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing --with test
|
||||
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
@@ -9,7 +9,7 @@ on:
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -24,7 +24,7 @@ jobs:
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: "poetry run pytest -m compile tests/integration_tests #${{ matrix.python-version }}"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
|
||||
8
.github/workflows/_dependencies.yml
vendored
8
.github/workflows/_dependencies.yml
vendored
@@ -13,7 +13,7 @@ on:
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -28,7 +28,7 @@ jobs:
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: dependency checks ${{ matrix.python-version }}
|
||||
name: dependencies - Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
@@ -63,8 +63,6 @@ jobs:
|
||||
- name: Install the opposite major version of pydantic
|
||||
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
|
||||
shell: bash
|
||||
# airbyte currently doesn't support pydantic v2
|
||||
if: ${{ !startsWith(inputs.working-directory, 'libs/partners/airbyte') }}
|
||||
run: |
|
||||
# Determine the major part of pydantic version
|
||||
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
@@ -99,8 +97,6 @@ jobs:
|
||||
fi
|
||||
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
|
||||
- name: Run pydantic compatibility tests
|
||||
# airbyte currently doesn't support pydantic v2
|
||||
if: ${{ !startsWith(inputs.working-directory, 'libs/partners/airbyte') }}
|
||||
shell: bash
|
||||
run: make test
|
||||
|
||||
|
||||
95
.github/workflows/_integration_test.yml
vendored
95
.github/workflows/_integration_test.yml
vendored
@@ -1,95 +0,0 @@
|
||||
name: Integration tests
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
environment: Scheduled testing
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test,test_integration
|
||||
|
||||
- name: Install deps outside pyproject
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/community/') }}
|
||||
shell: bash
|
||||
run: poetry run pip install "boto3<2" "google-cloud-aiplatform<2"
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
shell: bash
|
||||
env:
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
ES_URL: ${{ secrets.ES_URL }}
|
||||
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
|
||||
ES_API_KEY: ${{ secrets.ES_API_KEY }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
|
||||
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
|
||||
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
|
||||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
|
||||
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
|
||||
run: |
|
||||
make integration_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
19
.github/workflows/_lint.yml
vendored
19
.github/workflows/_lint.yml
vendored
@@ -13,7 +13,7 @@ on:
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
# This env var allows us to get inline annotations when ruff has complaints.
|
||||
@@ -21,7 +21,6 @@ env:
|
||||
|
||||
jobs:
|
||||
build:
|
||||
name: "make lint #${{ matrix.python-version }}"
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
@@ -80,13 +79,13 @@ jobs:
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Get .mypy_cache to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache
|
||||
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
|
||||
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
@@ -94,7 +93,7 @@ jobs:
|
||||
run: |
|
||||
make lint_package
|
||||
|
||||
- name: Install unit test dependencies
|
||||
- name: Install test dependencies
|
||||
# Also installs dev/lint/test/typing dependencies, to ensure we have
|
||||
# type hints for as many of our libraries as possible.
|
||||
# This helps catch errors that require dependencies to be spotted, for example:
|
||||
@@ -103,24 +102,18 @@ jobs:
|
||||
# If you change this configuration, make sure to change the `cache-key`
|
||||
# in the `poetry_setup` action above to stop using the old cache.
|
||||
# It doesn't matter how you change it, any change will cause a cache-bust.
|
||||
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install --with test
|
||||
- name: Install unit+integration test dependencies
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install --with test,test_integration
|
||||
|
||||
- name: Get .mypy_cache_test to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache_test
|
||||
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
214
.github/workflows/_release.yml
vendored
214
.github/workflows/_release.yml
vendored
@@ -1,5 +1,5 @@
|
||||
name: release
|
||||
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
@@ -11,22 +11,21 @@ on:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
type: choice
|
||||
default: 'libs/langchain'
|
||||
dangerous-nonmaster-release:
|
||||
required: false
|
||||
type: boolean
|
||||
default: false
|
||||
description: "Release from a non-master branch (danger!)"
|
||||
options:
|
||||
- libs/langchain
|
||||
- libs/core
|
||||
- libs/experimental
|
||||
- libs/community
|
||||
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
PYTHON_VERSION: "3.10"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release
|
||||
environment: Scheduled testing
|
||||
if: github.ref == 'refs/heads/master'
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
outputs:
|
||||
@@ -60,7 +59,7 @@ jobs:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Upload build
|
||||
uses: actions/upload-artifact@v4
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
@@ -72,83 +71,22 @@ jobs:
|
||||
run: |
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
release-notes:
|
||||
needs:
|
||||
- build
|
||||
runs-on: ubuntu-latest
|
||||
outputs:
|
||||
release-body: ${{ steps.generate-release-body.outputs.release-body }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain
|
||||
path: langchain
|
||||
sparse-checkout: | # this only grabs files for relevant dir
|
||||
${{ inputs.working-directory }}
|
||||
ref: master # this scopes to just master branch
|
||||
fetch-depth: 0 # this fetches entire commit history
|
||||
- name: Check Tags
|
||||
id: check-tags
|
||||
shell: bash
|
||||
working-directory: langchain/${{ inputs.working-directory }}
|
||||
env:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
echo $REGEX
|
||||
PREV_TAG=$(git tag --sort=-creatordate | grep -P $REGEX || true | head -1)
|
||||
TAG="${PKG_NAME}==${VERSION}"
|
||||
if [ "$TAG" == "$PREV_TAG" ]; then
|
||||
echo "No new version to release"
|
||||
exit 1
|
||||
fi
|
||||
echo tag="$TAG" >> $GITHUB_OUTPUT
|
||||
echo prev-tag="$PREV_TAG" >> $GITHUB_OUTPUT
|
||||
- name: Generate release body
|
||||
id: generate-release-body
|
||||
working-directory: langchain
|
||||
env:
|
||||
WORKING_DIR: ${{ inputs.working-directory }}
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
TAG: ${{ steps.check-tags.outputs.tag }}
|
||||
PREV_TAG: ${{ steps.check-tags.outputs.prev-tag }}
|
||||
run: |
|
||||
PREAMBLE="Changes since $PREV_TAG"
|
||||
# if PREV_TAG is empty, then we are releasing the first version
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
PREAMBLE="Initial release"
|
||||
PREV_TAG=$(git rev-list --max-parents=0 HEAD)
|
||||
fi
|
||||
{
|
||||
echo 'release-body<<EOF'
|
||||
echo "# Release $TAG"
|
||||
echo $PREAMBLE
|
||||
echo
|
||||
git log --format="%s" "$PREV_TAG"..HEAD -- $WORKING_DIR
|
||||
echo EOF
|
||||
} >> "$GITHUB_OUTPUT"
|
||||
|
||||
test-pypi-publish:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
dangerous-nonmaster-release: ${{ inputs.dangerous-nonmaster-release }}
|
||||
secrets: inherit
|
||||
|
||||
pre-release-checks:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
# We explicitly *don't* set up caching here. This ensures our tests are
|
||||
# maximally sensitive to catching breakage.
|
||||
#
|
||||
@@ -161,133 +99,38 @@ jobs:
|
||||
# - Tests pass, because the dependency is present even though it wasn't specified.
|
||||
# - The package is published, and it breaks on the missing dependency when
|
||||
# used in the real world.
|
||||
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
- uses: actions/setup-python@v4
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Import published package
|
||||
- name: Test published package
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
env:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
# Here we use:
|
||||
# - The default regular PyPI index as the *primary* index, meaning
|
||||
# - The default regular PyPI index as the *primary* index, meaning
|
||||
# that it takes priority (https://pypi.org/simple)
|
||||
# - The test PyPI index as an extra index, so that any dependencies that
|
||||
# are not found on test PyPI can be resolved and installed anyway.
|
||||
# (https://test.pypi.org/simple). This will include the PKG_NAME==VERSION
|
||||
# package because VERSION will not have been uploaded to regular PyPI yet.
|
||||
# - attempt install again after 5 seconds if it fails because there is
|
||||
# sometimes a delay in availability on test pypi
|
||||
#
|
||||
# TODO: add more in-depth pre-publish tests after testing that importing works
|
||||
run: |
|
||||
poetry run pip install \
|
||||
pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" || \
|
||||
( \
|
||||
sleep 5 && \
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" \
|
||||
)
|
||||
"$PKG_NAME==$VERSION"
|
||||
|
||||
# Replace all dashes in the package name with underscores,
|
||||
# since that's how Python imports packages with dashes in the name.
|
||||
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
|
||||
|
||||
poetry run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
|
||||
- name: Import test dependencies
|
||||
run: poetry install --with test,test_integration
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
# Overwrite the local version of the package with the test PyPI version.
|
||||
- name: Import published package (again)
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
shell: bash
|
||||
env:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION"
|
||||
|
||||
- name: Run unit tests
|
||||
run: make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Get minimum versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
id: min-version
|
||||
run: |
|
||||
poetry run pip install packaging
|
||||
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml)"
|
||||
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
|
||||
echo "min-versions=$min_versions"
|
||||
|
||||
- name: Run unit tests with minimum dependency versions
|
||||
if: ${{ steps.min-version.outputs.min-versions != '' }}
|
||||
env:
|
||||
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
|
||||
run: |
|
||||
poetry run pip install --force-reinstall $MIN_VERSIONS --editable .
|
||||
make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
env:
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
ES_URL: ${{ secrets.ES_URL }}
|
||||
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
|
||||
ES_API_KEY: ${{ secrets.ES_API_KEY }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
|
||||
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
|
||||
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
|
||||
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
run: make integration_tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
|
||||
publish:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
runs-on: ubuntu-latest
|
||||
@@ -314,7 +157,7 @@ jobs:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
- uses: actions/download-artifact@v3
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
@@ -329,7 +172,6 @@ jobs:
|
||||
mark-release:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
- publish
|
||||
@@ -354,18 +196,18 @@ jobs:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
- uses: actions/download-artifact@v3
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Create Tag
|
||||
|
||||
- name: Create Release
|
||||
uses: ncipollo/release-action@v1
|
||||
if: ${{ inputs.working-directory == 'libs/langchain' }}
|
||||
with:
|
||||
artifacts: "dist/*"
|
||||
token: ${{ secrets.GITHUB_TOKEN }}
|
||||
generateReleaseNotes: false
|
||||
tag: ${{needs.build.outputs.pkg-name}}==${{ needs.build.outputs.version }}
|
||||
body: ${{ needs.release-notes.outputs.release-body }}
|
||||
commit: ${{ github.sha }}
|
||||
makeLatest: ${{ needs.build.outputs.pkg-name == 'langchain-core'}}
|
||||
draft: false
|
||||
generateReleaseNotes: true
|
||||
tag: v${{ needs.build.outputs.version }}
|
||||
commit: master
|
||||
|
||||
4
.github/workflows/_test.yml
vendored
4
.github/workflows/_test.yml
vendored
@@ -13,7 +13,7 @@ on:
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -28,7 +28,7 @@ jobs:
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: "make test #${{ matrix.python-version }}"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
|
||||
50
.github/workflows/_test_doc_imports.yml
vendored
50
.github/workflows/_test_doc_imports.yml
vendored
@@ -1,50 +0,0 @@
|
||||
name: test_doc_imports
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.11"
|
||||
name: "check doc imports #${{ matrix.python-version }}"
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
cache-key: core
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test
|
||||
|
||||
- name: Install langchain editable
|
||||
run: |
|
||||
poetry run pip install -e libs/core libs/langchain libs/community libs/experimental
|
||||
|
||||
- name: Check doc imports
|
||||
shell: bash
|
||||
run: |
|
||||
poetry run python docs/scripts/check_imports.py
|
||||
|
||||
- name: Ensure the test did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
13
.github/workflows/_test_release.yml
vendored
13
.github/workflows/_test_release.yml
vendored
@@ -7,19 +7,14 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
dangerous-nonmaster-release:
|
||||
required: false
|
||||
type: boolean
|
||||
default: false
|
||||
description: "Release from a non-master branch (danger!)"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
PYTHON_VERSION: "3.10"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release
|
||||
if: github.ref == 'refs/heads/master'
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
outputs:
|
||||
@@ -53,7 +48,7 @@ jobs:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Upload build
|
||||
uses: actions/upload-artifact@v4
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: test-dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
@@ -81,7 +76,7 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
- uses: actions/download-artifact@v3
|
||||
with:
|
||||
name: test-dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
24
.github/workflows/check-broken-links.yml
vendored
24
.github/workflows/check-broken-links.yml
vendored
@@ -1,24 +0,0 @@
|
||||
name: Check Broken Links
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
jobs:
|
||||
check-links:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: Use Node.js 18.x
|
||||
uses: actions/setup-node@v3
|
||||
with:
|
||||
node-version: 18.x
|
||||
cache: "yarn"
|
||||
cache-dependency-path: ./docs/yarn.lock
|
||||
- name: Install dependencies
|
||||
run: yarn install --immutable --mode=skip-build
|
||||
working-directory: ./docs
|
||||
- name: Check broken links
|
||||
run: yarn check-broken-links
|
||||
working-directory: ./docs
|
||||
135
.github/workflows/check_diffs.yml
vendored
135
.github/workflows/check_diffs.yml
vendored
@@ -1,10 +1,15 @@
|
||||
---
|
||||
name: CI
|
||||
name: Check library diffs
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
pull_request:
|
||||
paths:
|
||||
- ".github/actions/**"
|
||||
- ".github/tools/**"
|
||||
- ".github/workflows/**"
|
||||
- "libs/**"
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
@@ -16,143 +21,27 @@ concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-python@v5
|
||||
- uses: actions/setup-python@v4
|
||||
with:
|
||||
python-version: '3.10'
|
||||
- id: files
|
||||
uses: Ana06/get-changed-files@v2.2.0
|
||||
- id: set-matrix
|
||||
run: |
|
||||
python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
|
||||
run: echo "dirs-to-run=$(python .github/scripts/check_diff.py ${{ steps.files.outputs.all }})" >> $GITHUB_OUTPUT
|
||||
outputs:
|
||||
dirs-to-lint: ${{ steps.set-matrix.outputs.dirs-to-lint }}
|
||||
dirs-to-test: ${{ steps.set-matrix.outputs.dirs-to-test }}
|
||||
dirs-to-extended-test: ${{ steps.set-matrix.outputs.dirs-to-extended-test }}
|
||||
docs-edited: ${{ steps.set-matrix.outputs.docs-edited }}
|
||||
lint:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
dirs-to-run: ${{ steps.set-matrix.outputs.dirs-to-run }}
|
||||
ci:
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-lint != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-lint) }}
|
||||
uses: ./.github/workflows/_lint.yml
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
|
||||
uses: ./.github/workflows/_all_ci.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
|
||||
uses: ./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
test-doc-imports:
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' || needs.build.outputs.docs-edited }}
|
||||
uses: ./.github/workflows/_test_doc_imports.yml
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
|
||||
uses: ./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
dependencies:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
|
||||
uses: ./.github/workflows/_dependencies.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
extended-tests:
|
||||
name: "cd ${{ matrix.working-directory }} / make extended_tests #${{ matrix.python-version }}"
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-extended-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
# note different variable for extended test dirs
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-extended-test) }}
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
cache-key: extended
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing --with test
|
||||
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
ci_success:
|
||||
name: "CI Success"
|
||||
needs: [build, lint, test, compile-integration-tests, dependencies, extended-tests, test-doc-imports]
|
||||
if: |
|
||||
always()
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
JOBS_JSON: ${{ toJSON(needs) }}
|
||||
RESULTS_JSON: ${{ toJSON(needs.*.result) }}
|
||||
EXIT_CODE: ${{!contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') && '0' || '1'}}
|
||||
steps:
|
||||
- name: "CI Success"
|
||||
run: |
|
||||
echo $JOBS_JSON
|
||||
echo $RESULTS_JSON
|
||||
echo "Exiting with $EXIT_CODE"
|
||||
exit $EXIT_CODE
|
||||
|
||||
19
.github/workflows/codespell.yml
vendored
19
.github/workflows/codespell.yml
vendored
@@ -1,18 +1,18 @@
|
||||
---
|
||||
name: CI / cd . / make spell_check
|
||||
name: Codespell
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master, v0.1]
|
||||
branches: [master]
|
||||
pull_request:
|
||||
branches: [master, v0.1]
|
||||
branches: [master]
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
codespell:
|
||||
name: (Check for spelling errors)
|
||||
name: Check for spelling errors
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
@@ -29,9 +29,8 @@ jobs:
|
||||
python .github/workflows/extract_ignored_words_list.py
|
||||
id: extract_ignore_words
|
||||
|
||||
# - name: Codespell
|
||||
# uses: codespell-project/actions-codespell@v2
|
||||
# with:
|
||||
# skip: guide_imports.json,*.ambr,./cookbook/data/imdb_top_1000.csv,*.lock
|
||||
# ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
|
||||
# exclude_file: ./.github/workflows/codespell-exclude
|
||||
- name: Codespell
|
||||
uses: codespell-project/actions-codespell@v2
|
||||
with:
|
||||
skip: guide_imports.json
|
||||
ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
|
||||
|
||||
35
.github/workflows/doc_lint.yml
vendored
Normal file
35
.github/workflows/doc_lint.yml
vendored
Normal file
@@ -0,0 +1,35 @@
|
||||
---
|
||||
name: Docs, templates, cookbook lint
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- 'docs/**'
|
||||
- 'templates/**'
|
||||
- 'cookbook/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/doc_lint.yml'
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
check:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Run import check
|
||||
run: |
|
||||
# We should not encourage imports directly from main init file
|
||||
# Expect for hub
|
||||
git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
|
||||
|
||||
lint:
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: "."
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_cli_release.yml
vendored
Normal file
13
.github/workflows/langchain_cli_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: libs/cli Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/cli
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_community_release.yml
vendored
Normal file
13
.github/workflows/langchain_community_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: libs/community Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/community
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_core_release.yml
vendored
Normal file
13
.github/workflows/langchain_core_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: libs/core Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/core
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_experimental_release.yml
vendored
Normal file
13
.github/workflows/langchain_experimental_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: libs/experimental Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_experimental_test_release.yml
vendored
Normal file
13
.github/workflows/langchain_experimental_test_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: Experimental Test Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_openai_release.yml
vendored
Normal file
13
.github/workflows/langchain_openai_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: libs/core Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/core
|
||||
secrets: inherit
|
||||
27
.github/workflows/langchain_release.yml
vendored
Normal file
27
.github/workflows/langchain_release.yml
vendored
Normal file
@@ -0,0 +1,27 @@
|
||||
---
|
||||
name: libs/langchain Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
# N.B.: It's possible that PyPI doesn't make the new release visible / available
|
||||
# immediately after publishing. If that happens, the docker build might not
|
||||
# create a new docker image for the new release, since it won't see it.
|
||||
#
|
||||
# If this ends up being a problem, add a check to the end of the `_release.yml`
|
||||
# workflow that prevents the workflow from finishing until the new release
|
||||
# is visible and installable on PyPI.
|
||||
release-docker:
|
||||
needs:
|
||||
- release
|
||||
uses:
|
||||
./.github/workflows/langchain_release_docker.yml
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_test_release.yml
vendored
Normal file
13
.github/workflows/langchain_test_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: Test Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
36
.github/workflows/people.yml
vendored
36
.github/workflows/people.yml
vendored
@@ -1,36 +0,0 @@
|
||||
name: LangChain People
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 14 1 * *"
|
||||
push:
|
||||
branches: [jacob/people]
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
debug_enabled:
|
||||
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
|
||||
required: false
|
||||
default: 'false'
|
||||
|
||||
jobs:
|
||||
langchain-people:
|
||||
if: github.repository_owner == 'langchain-ai'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Dump GitHub context
|
||||
env:
|
||||
GITHUB_CONTEXT: ${{ toJson(github) }}
|
||||
run: echo "$GITHUB_CONTEXT"
|
||||
- uses: actions/checkout@v4
|
||||
# Ref: https://github.com/actions/runner/issues/2033
|
||||
- name: Fix git safe.directory in container
|
||||
run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig
|
||||
# Allow debugging with tmate
|
||||
- name: Setup tmate session
|
||||
uses: mxschmitt/action-tmate@v3
|
||||
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled == 'true' }}
|
||||
with:
|
||||
limit-access-to-actor: true
|
||||
- uses: ./.github/actions/people
|
||||
with:
|
||||
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
|
||||
93
.github/workflows/scheduled_test.yml
vendored
93
.github/workflows/scheduled_test.yml
vendored
@@ -6,77 +6,37 @@ on:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
name: Python ${{ matrix.python-version }} - ${{ matrix.working-directory }}
|
||||
defaults:
|
||||
run:
|
||||
working-directory: libs/langchain
|
||||
runs-on: ubuntu-latest
|
||||
environment: Scheduled testing
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
working-directory:
|
||||
- "libs/partners/openai"
|
||||
- "libs/partners/anthropic"
|
||||
- "libs/partners/ai21"
|
||||
- "libs/partners/fireworks"
|
||||
- "libs/partners/groq"
|
||||
- "libs/partners/mistralai"
|
||||
- "libs/partners/together"
|
||||
- "libs/partners/cohere"
|
||||
- "libs/partners/google-vertexai"
|
||||
- "libs/partners/google-genai"
|
||||
- "libs/partners/aws"
|
||||
- "libs/partners/nvidia-ai-endpoints"
|
||||
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
path: langchain
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-google
|
||||
path: langchain-google
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-nvidia
|
||||
path: langchain-nvidia
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-cohere
|
||||
path: langchain-cohere
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-aws
|
||||
path: langchain-aws
|
||||
|
||||
- name: Move libs
|
||||
run: |
|
||||
rm -rf \
|
||||
langchain/libs/partners/google-genai \
|
||||
langchain/libs/partners/google-vertexai \
|
||||
langchain/libs/partners/nvidia-ai-endpoints \
|
||||
langchain/libs/partners/cohere
|
||||
mv langchain-google/libs/genai langchain/libs/partners/google-genai
|
||||
mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai
|
||||
mv langchain-nvidia/libs/ai-endpoints langchain/libs/partners/nvidia-ai-endpoints
|
||||
mv langchain-cohere/libs/cohere langchain/libs/partners/cohere
|
||||
mv langchain-aws/libs/aws langchain/libs/partners/aws
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: "./langchain/.github/actions/poetry_setup"
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: langchain/${{ matrix.working-directory }}
|
||||
working-directory: libs/langchain
|
||||
cache-key: scheduled
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
uses: 'google-github-actions/auth@v1'
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
@@ -85,15 +45,17 @@ jobs:
|
||||
with:
|
||||
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
aws-region: ${{ secrets.AWS_REGION }}
|
||||
aws-region: ${{ vars.AWS_REGION }}
|
||||
|
||||
- name: Install dependencies
|
||||
working-directory: libs/langchain
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
poetry install --with=test_integration,test
|
||||
|
||||
- name: Run integration tests
|
||||
- name: Run tests
|
||||
shell: bash
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
@@ -103,31 +65,12 @@ jobs:
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
run: |
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
make integration_tests
|
||||
|
||||
- name: Remove external libraries
|
||||
run: |
|
||||
rm -rf \
|
||||
langchain/libs/partners/google-genai \
|
||||
langchain/libs/partners/google-vertexai \
|
||||
langchain/libs/partners/nvidia-ai-endpoints \
|
||||
langchain/libs/partners/cohere \
|
||||
langchain/libs/partners/aws
|
||||
make scheduled_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
working-directory: langchain
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
|
||||
36
.github/workflows/templates_ci.yml
vendored
Normal file
36
.github/workflows/templates_ci.yml
vendored
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
name: templates CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- '.github/actions/poetry_setup/action.yml'
|
||||
- '.github/tools/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/templates_ci.yml'
|
||||
- 'templates/**'
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
WORKDIR: "templates"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: templates
|
||||
secrets: inherit
|
||||
11
.gitignore
vendored
11
.gitignore
vendored
@@ -115,11 +115,13 @@ celerybeat.pid
|
||||
# Environments
|
||||
.env
|
||||
.envrc
|
||||
.venv*
|
||||
venv*
|
||||
.venv
|
||||
.venvs
|
||||
env/
|
||||
venv/
|
||||
ENV/
|
||||
env.bak/
|
||||
venv.bak/
|
||||
|
||||
# Spyder project settings
|
||||
.spyderproject
|
||||
@@ -175,7 +177,4 @@ docs/docs/build
|
||||
docs/docs/node_modules
|
||||
docs/docs/yarn.lock
|
||||
_dist
|
||||
docs/docs/templates
|
||||
|
||||
prof
|
||||
virtualenv/
|
||||
docs/docs/templates
|
||||
@@ -4,17 +4,20 @@
|
||||
# Required
|
||||
version: 2
|
||||
|
||||
formats:
|
||||
- pdf
|
||||
|
||||
# Set the version of Python and other tools you might need
|
||||
build:
|
||||
os: ubuntu-22.04
|
||||
tools:
|
||||
python: "3.11"
|
||||
commands:
|
||||
- mkdir -p $READTHEDOCS_OUTPUT
|
||||
- cp -r api_reference_build/* $READTHEDOCS_OUTPUT
|
||||
- python -mvirtualenv $READTHEDOCS_VIRTUALENV_PATH
|
||||
- python -m pip install --upgrade --no-cache-dir pip setuptools
|
||||
- python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
|
||||
- python -m pip install --exists-action=w --no-cache-dir -r docs/api_reference/requirements.txt
|
||||
- python docs/api_reference/create_api_rst.py
|
||||
- cat docs/api_reference/conf.py
|
||||
- python -m sphinx -T -E -b html -d _build/doctrees -c docs/api_reference docs/api_reference $READTHEDOCS_OUTPUT/html -j auto
|
||||
|
||||
# Build documentation in the docs/ directory with Sphinx
|
||||
sphinx:
|
||||
configuration: docs/api_reference/conf.py
|
||||
|
||||
@@ -0,0 +1,9 @@
|
||||
"""Main entrypoint into package."""
|
||||
from importlib import metadata
|
||||
|
||||
try:
|
||||
__version__ = metadata.version(__package__)
|
||||
except metadata.PackageNotFoundError:
|
||||
# Case where package metadata is not available.
|
||||
__version__ = ""
|
||||
del metadata # optional, avoids polluting the results of dir(__package__)
|
||||
@@ -0,0 +1,79 @@
|
||||
"""Agent toolkits contain integrations with various resources and services.
|
||||
|
||||
LangChain has a large ecosystem of integrations with various external resources
|
||||
like local and remote file systems, APIs and databases.
|
||||
|
||||
These integrations allow developers to create versatile applications that combine the
|
||||
power of LLMs with the ability to access, interact with and manipulate external
|
||||
resources.
|
||||
|
||||
When developing an application, developers should inspect the capabilities and
|
||||
permissions of the tools that underlie the given agent toolkit, and determine
|
||||
whether permissions of the given toolkit are appropriate for the application.
|
||||
|
||||
See [Security](https://python.langchain.com/docs/security) for more information.
|
||||
"""
|
||||
from langchain_community.agent_toolkits.ainetwork.toolkit import AINetworkToolkit
|
||||
from langchain_community.agent_toolkits.amadeus.toolkit import AmadeusToolkit
|
||||
from langchain_community.agent_toolkits.azure_cognitive_services import (
|
||||
AzureCognitiveServicesToolkit,
|
||||
)
|
||||
from langchain_community.agent_toolkits.conversational_retrieval.openai_functions import ( # noqa: E501
|
||||
create_conversational_retrieval_agent,
|
||||
)
|
||||
from langchain_community.agent_toolkits.file_management.toolkit import (
|
||||
FileManagementToolkit,
|
||||
)
|
||||
from langchain_community.agent_toolkits.gmail.toolkit import GmailToolkit
|
||||
from langchain_community.agent_toolkits.jira.toolkit import JiraToolkit
|
||||
from langchain_community.agent_toolkits.json.base import create_json_agent
|
||||
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
|
||||
from langchain_community.agent_toolkits.multion.toolkit import MultionToolkit
|
||||
from langchain_community.agent_toolkits.nasa.toolkit import NasaToolkit
|
||||
from langchain_community.agent_toolkits.nla.toolkit import NLAToolkit
|
||||
from langchain_community.agent_toolkits.office365.toolkit import O365Toolkit
|
||||
from langchain_community.agent_toolkits.openapi.base import create_openapi_agent
|
||||
from langchain_community.agent_toolkits.openapi.toolkit import OpenAPIToolkit
|
||||
from langchain_community.agent_toolkits.playwright.toolkit import (
|
||||
PlayWrightBrowserToolkit,
|
||||
)
|
||||
from langchain_community.agent_toolkits.powerbi.base import create_pbi_agent
|
||||
from langchain_community.agent_toolkits.powerbi.chat_base import create_pbi_chat_agent
|
||||
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
|
||||
from langchain_community.agent_toolkits.slack.toolkit import SlackToolkit
|
||||
from langchain_community.agent_toolkits.spark_sql.base import create_spark_sql_agent
|
||||
from langchain_community.agent_toolkits.spark_sql.toolkit import SparkSQLToolkit
|
||||
from langchain_community.agent_toolkits.sql.base import create_sql_agent
|
||||
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
|
||||
from langchain_community.agent_toolkits.steam.toolkit import SteamToolkit
|
||||
from langchain_community.agent_toolkits.zapier.toolkit import ZapierToolkit
|
||||
|
||||
|
||||
__all__ = [
|
||||
"AINetworkToolkit",
|
||||
"AmadeusToolkit",
|
||||
"AzureCognitiveServicesToolkit",
|
||||
"FileManagementToolkit",
|
||||
"GmailToolkit",
|
||||
"JiraToolkit",
|
||||
"JsonToolkit",
|
||||
"MultionToolkit",
|
||||
"NasaToolkit",
|
||||
"NLAToolkit",
|
||||
"O365Toolkit",
|
||||
"OpenAPIToolkit",
|
||||
"PlayWrightBrowserToolkit",
|
||||
"PowerBIToolkit",
|
||||
"SlackToolkit",
|
||||
"SteamToolkit",
|
||||
"SQLDatabaseToolkit",
|
||||
"SparkSQLToolkit",
|
||||
"ZapierToolkit",
|
||||
"create_json_agent",
|
||||
"create_openapi_agent",
|
||||
"create_pbi_agent",
|
||||
"create_pbi_chat_agent",
|
||||
"create_spark_sql_agent",
|
||||
"create_sql_agent",
|
||||
"create_conversational_retrieval_agent",
|
||||
]
|
||||
@@ -0,0 +1,53 @@
|
||||
"""Json agent."""
|
||||
from __future__ import annotations
|
||||
from typing import Any, Dict, List, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
|
||||
from langchain_community.agent_toolkits.json.prompt import JSON_PREFIX, JSON_SUFFIX
|
||||
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
|
||||
|
||||
def create_json_agent(
|
||||
llm: BaseLanguageModel,
|
||||
toolkit: JsonToolkit,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
prefix: str = JSON_PREFIX,
|
||||
suffix: str = JSON_SUFFIX,
|
||||
format_instructions: Optional[str] = None,
|
||||
input_variables: Optional[List[str]] = None,
|
||||
verbose: bool = False,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Construct a json agent from an LLM and tools."""
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.chains.llm import LLMChain
|
||||
tools = toolkit.get_tools()
|
||||
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
|
||||
prompt = ZeroShotAgent.create_prompt(
|
||||
tools,
|
||||
prefix=prefix,
|
||||
suffix=suffix,
|
||||
input_variables=input_variables,
|
||||
**prompt_params,
|
||||
)
|
||||
llm_chain = LLMChain(
|
||||
llm=llm,
|
||||
prompt=prompt,
|
||||
callback_manager=callback_manager,
|
||||
)
|
||||
tool_names = [tool.name for tool in tools]
|
||||
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
verbose=verbose,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,57 @@
|
||||
"""Tool for interacting with a single API with natural language definition."""
|
||||
|
||||
from __future__ import annotations
|
||||
from typing import Any, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.tools import Tool
|
||||
|
||||
from langchain_community.tools.openapi.utils.api_models import APIOperation
|
||||
from langchain_community.tools.openapi.utils.openapi_utils import OpenAPISpec
|
||||
from langchain_community.utilities.requests import Requests
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.chains.api.openapi.chain import OpenAPIEndpointChain
|
||||
|
||||
|
||||
class NLATool(Tool):
|
||||
"""Natural Language API Tool."""
|
||||
|
||||
@classmethod
|
||||
def from_open_api_endpoint_chain(
|
||||
cls, chain: OpenAPIEndpointChain, api_title: str
|
||||
) -> "NLATool":
|
||||
"""Convert an endpoint chain to an API endpoint tool."""
|
||||
expanded_name = (
|
||||
f'{api_title.replace(" ", "_")}.{chain.api_operation.operation_id}'
|
||||
)
|
||||
description = (
|
||||
f"I'm an AI from {api_title}. Instruct what you want,"
|
||||
" and I'll assist via an API with description:"
|
||||
f" {chain.api_operation.description}"
|
||||
)
|
||||
return cls(name=expanded_name, func=chain.run, description=description)
|
||||
|
||||
@classmethod
|
||||
def from_llm_and_method(
|
||||
cls,
|
||||
llm: BaseLanguageModel,
|
||||
path: str,
|
||||
method: str,
|
||||
spec: OpenAPISpec,
|
||||
requests: Optional[Requests] = None,
|
||||
verbose: bool = False,
|
||||
return_intermediate_steps: bool = False,
|
||||
**kwargs: Any,
|
||||
) -> "NLATool":
|
||||
"""Instantiate the tool from the specified path and method."""
|
||||
api_operation = APIOperation.from_openapi_spec(spec, path, method)
|
||||
chain = OpenAPIEndpointChain.from_api_operation(
|
||||
api_operation,
|
||||
llm,
|
||||
requests=requests,
|
||||
verbose=verbose,
|
||||
return_intermediate_steps=return_intermediate_steps,
|
||||
**kwargs,
|
||||
)
|
||||
return cls.from_open_api_endpoint_chain(chain, spec.info.title)
|
||||
@@ -0,0 +1,77 @@
|
||||
"""OpenAPI spec agent."""
|
||||
from __future__ import annotations
|
||||
from typing import Any, Dict, List, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
|
||||
from langchain_community.agent_toolkits.openapi.prompt import (
|
||||
OPENAPI_PREFIX,
|
||||
OPENAPI_SUFFIX,
|
||||
)
|
||||
from langchain_community.agent_toolkits.openapi.toolkit import OpenAPIToolkit
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
|
||||
|
||||
def create_openapi_agent(
|
||||
llm: BaseLanguageModel,
|
||||
toolkit: OpenAPIToolkit,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
prefix: str = OPENAPI_PREFIX,
|
||||
suffix: str = OPENAPI_SUFFIX,
|
||||
format_instructions: Optional[str] = None,
|
||||
input_variables: Optional[List[str]] = None,
|
||||
max_iterations: Optional[int] = 15,
|
||||
max_execution_time: Optional[float] = None,
|
||||
early_stopping_method: str = "force",
|
||||
verbose: bool = False,
|
||||
return_intermediate_steps: bool = False,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Construct an OpenAPI agent from an LLM and tools.
|
||||
|
||||
*Security Note*: When creating an OpenAPI agent, check the permissions
|
||||
and capabilities of the underlying toolkit.
|
||||
|
||||
For example, if the default implementation of OpenAPIToolkit
|
||||
uses the RequestsToolkit which contains tools to make arbitrary
|
||||
network requests against any URL (e.g., GET, POST, PATCH, PUT, DELETE),
|
||||
|
||||
Control access to who can submit issue requests using this toolkit and
|
||||
what network access it has.
|
||||
|
||||
See https://python.langchain.com/docs/security for more information.
|
||||
"""
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.chains.llm import LLMChain
|
||||
tools = toolkit.get_tools()
|
||||
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
|
||||
prompt = ZeroShotAgent.create_prompt(
|
||||
tools,
|
||||
prefix=prefix,
|
||||
suffix=suffix,
|
||||
input_variables=input_variables,
|
||||
**prompt_params
|
||||
)
|
||||
llm_chain = LLMChain(
|
||||
llm=llm,
|
||||
prompt=prompt,
|
||||
callback_manager=callback_manager,
|
||||
)
|
||||
tool_names = [tool.name for tool in tools]
|
||||
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
verbose=verbose,
|
||||
return_intermediate_steps=return_intermediate_steps,
|
||||
max_iterations=max_iterations,
|
||||
max_execution_time=max_execution_time,
|
||||
early_stopping_method=early_stopping_method,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,370 @@
|
||||
"""Agent that interacts with OpenAPI APIs via a hierarchical planning approach."""
|
||||
import json
|
||||
import re
|
||||
from functools import partial
|
||||
from typing import Any, Callable, Dict, List, Optional, TYPE_CHECKING
|
||||
|
||||
import yaml
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.prompts import BasePromptTemplate, PromptTemplate
|
||||
from langchain_core.pydantic_v1 import Field
|
||||
from langchain_core.tools import BaseTool, Tool
|
||||
from langchain_community.llms import OpenAI
|
||||
|
||||
from langchain_community.agent_toolkits.openapi.planner_prompt import (
|
||||
API_CONTROLLER_PROMPT,
|
||||
API_CONTROLLER_TOOL_DESCRIPTION,
|
||||
API_CONTROLLER_TOOL_NAME,
|
||||
API_ORCHESTRATOR_PROMPT,
|
||||
API_PLANNER_PROMPT,
|
||||
API_PLANNER_TOOL_DESCRIPTION,
|
||||
API_PLANNER_TOOL_NAME,
|
||||
PARSING_DELETE_PROMPT,
|
||||
PARSING_GET_PROMPT,
|
||||
PARSING_PATCH_PROMPT,
|
||||
PARSING_POST_PROMPT,
|
||||
PARSING_PUT_PROMPT,
|
||||
REQUESTS_DELETE_TOOL_DESCRIPTION,
|
||||
REQUESTS_GET_TOOL_DESCRIPTION,
|
||||
REQUESTS_PATCH_TOOL_DESCRIPTION,
|
||||
REQUESTS_POST_TOOL_DESCRIPTION,
|
||||
REQUESTS_PUT_TOOL_DESCRIPTION,
|
||||
)
|
||||
from langchain_community.agent_toolkits.openapi.spec import ReducedOpenAPISpec
|
||||
from langchain_community.tools.requests.tool import BaseRequestsTool
|
||||
from langchain_community.utilities.requests import RequestsWrapper
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.memory import ReadOnlySharedMemory
|
||||
|
||||
#
|
||||
# Requests tools with LLM-instructed extraction of truncated responses.
|
||||
#
|
||||
# Of course, truncating so bluntly may lose a lot of valuable
|
||||
# information in the response.
|
||||
# However, the goal for now is to have only a single inference step.
|
||||
MAX_RESPONSE_LENGTH = 5000
|
||||
"""Maximum length of the response to be returned."""
|
||||
|
||||
|
||||
def _get_default_llm_chain(prompt: BasePromptTemplate) -> LLMChain:
|
||||
from langchain.chains.llm import LLMChain
|
||||
return LLMChain(
|
||||
llm=OpenAI(),
|
||||
prompt=prompt,
|
||||
)
|
||||
|
||||
|
||||
def _get_default_llm_chain_factory(
|
||||
prompt: BasePromptTemplate,
|
||||
) -> Callable[[], LLMChain]:
|
||||
"""Returns a default LLMChain factory."""
|
||||
return partial(_get_default_llm_chain, prompt)
|
||||
|
||||
|
||||
class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
|
||||
"""Requests GET tool with LLM-instructed extraction of truncated responses."""
|
||||
|
||||
name: str = "requests_get"
|
||||
"""Tool name."""
|
||||
description = REQUESTS_GET_TOOL_DESCRIPTION
|
||||
"""Tool description."""
|
||||
response_length: Optional[int] = MAX_RESPONSE_LENGTH
|
||||
"""Maximum length of the response to be returned."""
|
||||
llm_chain: Any = Field(
|
||||
default_factory=_get_default_llm_chain_factory(PARSING_GET_PROMPT)
|
||||
)
|
||||
"""LLMChain used to extract the response."""
|
||||
|
||||
def _run(self, text: str) -> str:
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
try:
|
||||
data = parse_json_markdown(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise e
|
||||
data_params = data.get("params")
|
||||
response = self.requests_wrapper.get(data["url"], params=data_params)
|
||||
response = response[: self.response_length]
|
||||
return self.llm_chain.predict(
|
||||
response=response, instructions=data["output_instructions"]
|
||||
).strip()
|
||||
|
||||
async def _arun(self, text: str) -> str:
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
class RequestsPostToolWithParsing(BaseRequestsTool, BaseTool):
|
||||
"""Requests POST tool with LLM-instructed extraction of truncated responses."""
|
||||
|
||||
name: str = "requests_post"
|
||||
"""Tool name."""
|
||||
description = REQUESTS_POST_TOOL_DESCRIPTION
|
||||
"""Tool description."""
|
||||
response_length: Optional[int] = MAX_RESPONSE_LENGTH
|
||||
"""Maximum length of the response to be returned."""
|
||||
llm_chain: Any = Field(
|
||||
default_factory=_get_default_llm_chain_factory(PARSING_POST_PROMPT)
|
||||
)
|
||||
"""LLMChain used to extract the response."""
|
||||
|
||||
def _run(self, text: str) -> str:
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
try:
|
||||
data = parse_json_markdown(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise e
|
||||
response = self.requests_wrapper.post(data["url"], data["data"])
|
||||
response = response[: self.response_length]
|
||||
return self.llm_chain.predict(
|
||||
response=response, instructions=data["output_instructions"]
|
||||
).strip()
|
||||
|
||||
async def _arun(self, text: str) -> str:
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
class RequestsPatchToolWithParsing(BaseRequestsTool, BaseTool):
|
||||
"""Requests PATCH tool with LLM-instructed extraction of truncated responses."""
|
||||
|
||||
name: str = "requests_patch"
|
||||
"""Tool name."""
|
||||
description = REQUESTS_PATCH_TOOL_DESCRIPTION
|
||||
"""Tool description."""
|
||||
response_length: Optional[int] = MAX_RESPONSE_LENGTH
|
||||
"""Maximum length of the response to be returned."""
|
||||
llm_chain: Any = Field(
|
||||
default_factory=_get_default_llm_chain_factory(PARSING_PATCH_PROMPT)
|
||||
)
|
||||
"""LLMChain used to extract the response."""
|
||||
|
||||
def _run(self, text: str) -> str:
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
try:
|
||||
data = parse_json_markdown(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise e
|
||||
response = self.requests_wrapper.patch(data["url"], data["data"])
|
||||
response = response[: self.response_length]
|
||||
return self.llm_chain.predict(
|
||||
response=response, instructions=data["output_instructions"]
|
||||
).strip()
|
||||
|
||||
async def _arun(self, text: str) -> str:
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
class RequestsPutToolWithParsing(BaseRequestsTool, BaseTool):
|
||||
"""Requests PUT tool with LLM-instructed extraction of truncated responses."""
|
||||
|
||||
name: str = "requests_put"
|
||||
"""Tool name."""
|
||||
description = REQUESTS_PUT_TOOL_DESCRIPTION
|
||||
"""Tool description."""
|
||||
response_length: Optional[int] = MAX_RESPONSE_LENGTH
|
||||
"""Maximum length of the response to be returned."""
|
||||
llm_chain: Any = Field(
|
||||
default_factory=_get_default_llm_chain_factory(PARSING_PUT_PROMPT)
|
||||
)
|
||||
"""LLMChain used to extract the response."""
|
||||
|
||||
def _run(self, text: str) -> str:
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
try:
|
||||
data = parse_json_markdown(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise e
|
||||
response = self.requests_wrapper.put(data["url"], data["data"])
|
||||
response = response[: self.response_length]
|
||||
return self.llm_chain.predict(
|
||||
response=response, instructions=data["output_instructions"]
|
||||
).strip()
|
||||
|
||||
async def _arun(self, text: str) -> str:
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
class RequestsDeleteToolWithParsing(BaseRequestsTool, BaseTool):
|
||||
"""A tool that sends a DELETE request and parses the response."""
|
||||
|
||||
name: str = "requests_delete"
|
||||
"""The name of the tool."""
|
||||
description = REQUESTS_DELETE_TOOL_DESCRIPTION
|
||||
"""The description of the tool."""
|
||||
|
||||
response_length: Optional[int] = MAX_RESPONSE_LENGTH
|
||||
"""The maximum length of the response."""
|
||||
llm_chain: Any = Field(
|
||||
default_factory=_get_default_llm_chain_factory(PARSING_DELETE_PROMPT)
|
||||
)
|
||||
"""The LLM chain used to parse the response."""
|
||||
|
||||
def _run(self, text: str) -> str:
|
||||
from langchain.output_parsers.json import parse_json_markdown
|
||||
try:
|
||||
data = parse_json_markdown(text)
|
||||
except json.JSONDecodeError as e:
|
||||
raise e
|
||||
response = self.requests_wrapper.delete(data["url"])
|
||||
response = response[: self.response_length]
|
||||
return self.llm_chain.predict(
|
||||
response=response, instructions=data["output_instructions"]
|
||||
).strip()
|
||||
|
||||
async def _arun(self, text: str) -> str:
|
||||
raise NotImplementedError()
|
||||
|
||||
|
||||
#
|
||||
# Orchestrator, planner, controller.
|
||||
#
|
||||
def _create_api_planner_tool(
|
||||
api_spec: ReducedOpenAPISpec, llm: BaseLanguageModel
|
||||
) -> Tool:
|
||||
from langchain.chains.llm import LLMChain
|
||||
endpoint_descriptions = [
|
||||
f"{name} {description}" for name, description, _ in api_spec.endpoints
|
||||
]
|
||||
prompt = PromptTemplate(
|
||||
template=API_PLANNER_PROMPT,
|
||||
input_variables=["query"],
|
||||
partial_variables={"endpoints": "- " + "- ".join(endpoint_descriptions)},
|
||||
)
|
||||
chain = LLMChain(llm=llm, prompt=prompt)
|
||||
tool = Tool(
|
||||
name=API_PLANNER_TOOL_NAME,
|
||||
description=API_PLANNER_TOOL_DESCRIPTION,
|
||||
func=chain.run,
|
||||
)
|
||||
return tool
|
||||
|
||||
|
||||
def _create_api_controller_agent(
|
||||
api_url: str,
|
||||
api_docs: str,
|
||||
requests_wrapper: RequestsWrapper,
|
||||
llm: BaseLanguageModel,
|
||||
) -> AgentExecutor:
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.chains.llm import LLMChain
|
||||
get_llm_chain = LLMChain(llm=llm, prompt=PARSING_GET_PROMPT)
|
||||
post_llm_chain = LLMChain(llm=llm, prompt=PARSING_POST_PROMPT)
|
||||
tools: List[BaseTool] = [
|
||||
RequestsGetToolWithParsing(
|
||||
requests_wrapper=requests_wrapper, llm_chain=get_llm_chain
|
||||
),
|
||||
RequestsPostToolWithParsing(
|
||||
requests_wrapper=requests_wrapper, llm_chain=post_llm_chain
|
||||
),
|
||||
]
|
||||
prompt = PromptTemplate(
|
||||
template=API_CONTROLLER_PROMPT,
|
||||
input_variables=["input", "agent_scratchpad"],
|
||||
partial_variables={
|
||||
"api_url": api_url,
|
||||
"api_docs": api_docs,
|
||||
"tool_names": ", ".join([tool.name for tool in tools]),
|
||||
"tool_descriptions": "\n".join(
|
||||
[f"{tool.name}: {tool.description}" for tool in tools]
|
||||
),
|
||||
},
|
||||
)
|
||||
agent = ZeroShotAgent(
|
||||
llm_chain=LLMChain(llm=llm, prompt=prompt),
|
||||
allowed_tools=[tool.name for tool in tools],
|
||||
)
|
||||
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
|
||||
|
||||
|
||||
def _create_api_controller_tool(
|
||||
api_spec: ReducedOpenAPISpec,
|
||||
requests_wrapper: RequestsWrapper,
|
||||
llm: BaseLanguageModel,
|
||||
) -> Tool:
|
||||
"""Expose controller as a tool.
|
||||
|
||||
The tool is invoked with a plan from the planner, and dynamically
|
||||
creates a controller agent with relevant documentation only to
|
||||
constrain the context.
|
||||
"""
|
||||
|
||||
base_url = api_spec.servers[0]["url"] # TODO: do better.
|
||||
|
||||
def _create_and_run_api_controller_agent(plan_str: str) -> str:
|
||||
pattern = r"\b(GET|POST|PATCH|DELETE)\s+(/\S+)*"
|
||||
matches = re.findall(pattern, plan_str)
|
||||
endpoint_names = [
|
||||
"{method} {route}".format(method=method, route=route.split("?")[0])
|
||||
for method, route in matches
|
||||
]
|
||||
docs_str = ""
|
||||
for endpoint_name in endpoint_names:
|
||||
found_match = False
|
||||
for name, _, docs in api_spec.endpoints:
|
||||
regex_name = re.compile(re.sub("\{.*?\}", ".*", name))
|
||||
if regex_name.match(endpoint_name):
|
||||
found_match = True
|
||||
docs_str += f"== Docs for {endpoint_name} == \n{yaml.dump(docs)}\n"
|
||||
if not found_match:
|
||||
raise ValueError(f"{endpoint_name} endpoint does not exist.")
|
||||
|
||||
agent = _create_api_controller_agent(base_url, docs_str, requests_wrapper, llm)
|
||||
return agent.run(plan_str)
|
||||
|
||||
return Tool(
|
||||
name=API_CONTROLLER_TOOL_NAME,
|
||||
func=_create_and_run_api_controller_agent,
|
||||
description=API_CONTROLLER_TOOL_DESCRIPTION,
|
||||
)
|
||||
|
||||
|
||||
def create_openapi_agent(
|
||||
api_spec: ReducedOpenAPISpec,
|
||||
requests_wrapper: RequestsWrapper,
|
||||
llm: BaseLanguageModel,
|
||||
shared_memory: Optional[ReadOnlySharedMemory] = None,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
verbose: bool = True,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Instantiate OpenAI API planner and controller for a given spec.
|
||||
|
||||
Inject credentials via requests_wrapper.
|
||||
|
||||
We use a top-level "orchestrator" agent to invoke the planner and controller,
|
||||
rather than a top-level planner
|
||||
that invokes a controller with its plan. This is to keep the planner simple.
|
||||
"""
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.chains.llm import LLMChain
|
||||
tools = [
|
||||
_create_api_planner_tool(api_spec, llm),
|
||||
_create_api_controller_tool(api_spec, requests_wrapper, llm),
|
||||
]
|
||||
prompt = PromptTemplate(
|
||||
template=API_ORCHESTRATOR_PROMPT,
|
||||
input_variables=["input", "agent_scratchpad"],
|
||||
partial_variables={
|
||||
"tool_names": ", ".join([tool.name for tool in tools]),
|
||||
"tool_descriptions": "\n".join(
|
||||
[f"{tool.name}: {tool.description}" for tool in tools]
|
||||
),
|
||||
},
|
||||
)
|
||||
agent = ZeroShotAgent(
|
||||
llm_chain=LLMChain(llm=llm, prompt=prompt, memory=shared_memory),
|
||||
allowed_tools=[tool.name for tool in tools],
|
||||
**kwargs,
|
||||
)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
verbose=verbose,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,90 @@
|
||||
"""Requests toolkit."""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, List
|
||||
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.tools import Tool
|
||||
|
||||
from langchain_community.agent_toolkits.base import BaseToolkit
|
||||
from langchain_community.agent_toolkits.json.base import create_json_agent
|
||||
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
|
||||
from langchain_community.agent_toolkits.openapi.prompt import DESCRIPTION
|
||||
from langchain_community.tools import BaseTool
|
||||
from langchain_community.tools.json.tool import JsonSpec
|
||||
from langchain_community.tools.requests.tool import (
|
||||
RequestsDeleteTool,
|
||||
RequestsGetTool,
|
||||
RequestsPatchTool,
|
||||
RequestsPostTool,
|
||||
RequestsPutTool,
|
||||
)
|
||||
from langchain_community.utilities.requests import TextRequestsWrapper
|
||||
|
||||
|
||||
class RequestsToolkit(BaseToolkit):
|
||||
"""Toolkit for making REST requests.
|
||||
|
||||
*Security Note*: This toolkit contains tools to make GET, POST, PATCH, PUT,
|
||||
and DELETE requests to an API.
|
||||
|
||||
Exercise care in who is allowed to use this toolkit. If exposing
|
||||
to end users, consider that users will be able to make arbitrary
|
||||
requests on behalf of the server hosting the code. For example,
|
||||
users could ask the server to make a request to a private API
|
||||
that is only accessible from the server.
|
||||
|
||||
Control access to who can submit issue requests using this toolkit and
|
||||
what network access it has.
|
||||
|
||||
See https://python.langchain.com/docs/security for more information.
|
||||
"""
|
||||
|
||||
requests_wrapper: TextRequestsWrapper
|
||||
|
||||
def get_tools(self) -> List[BaseTool]:
|
||||
"""Return a list of tools."""
|
||||
return [
|
||||
RequestsGetTool(requests_wrapper=self.requests_wrapper),
|
||||
RequestsPostTool(requests_wrapper=self.requests_wrapper),
|
||||
RequestsPatchTool(requests_wrapper=self.requests_wrapper),
|
||||
RequestsPutTool(requests_wrapper=self.requests_wrapper),
|
||||
RequestsDeleteTool(requests_wrapper=self.requests_wrapper),
|
||||
]
|
||||
|
||||
|
||||
class OpenAPIToolkit(BaseToolkit):
|
||||
"""Toolkit for interacting with an OpenAPI API.
|
||||
|
||||
*Security Note*: This toolkit contains tools that can read and modify
|
||||
the state of a service; e.g., by creating, deleting, or updating,
|
||||
reading underlying data.
|
||||
|
||||
For example, this toolkit can be used to delete data exposed via
|
||||
an OpenAPI compliant API.
|
||||
"""
|
||||
|
||||
json_agent: Any
|
||||
requests_wrapper: TextRequestsWrapper
|
||||
|
||||
def get_tools(self) -> List[BaseTool]:
|
||||
"""Get the tools in the toolkit."""
|
||||
json_agent_tool = Tool(
|
||||
name="json_explorer",
|
||||
func=self.json_agent.run,
|
||||
description=DESCRIPTION,
|
||||
)
|
||||
request_toolkit = RequestsToolkit(requests_wrapper=self.requests_wrapper)
|
||||
return [*request_toolkit.get_tools(), json_agent_tool]
|
||||
|
||||
@classmethod
|
||||
def from_llm(
|
||||
cls,
|
||||
llm: BaseLanguageModel,
|
||||
json_spec: JsonSpec,
|
||||
requests_wrapper: TextRequestsWrapper,
|
||||
**kwargs: Any,
|
||||
) -> OpenAPIToolkit:
|
||||
"""Create json agent from llm, then initialize."""
|
||||
json_agent = create_json_agent(llm, JsonToolkit(spec=json_spec), **kwargs)
|
||||
return cls(json_agent=json_agent, requests_wrapper=requests_wrapper)
|
||||
@@ -0,0 +1,68 @@
|
||||
"""Power BI agent."""
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Dict, List, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
|
||||
from langchain_community.agent_toolkits.powerbi.prompt import (
|
||||
POWERBI_PREFIX,
|
||||
POWERBI_SUFFIX,
|
||||
)
|
||||
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
|
||||
from langchain_community.utilities.powerbi import PowerBIDataset
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents import AgentExecutor
|
||||
|
||||
|
||||
def create_pbi_agent(
|
||||
llm: BaseLanguageModel,
|
||||
toolkit: Optional[PowerBIToolkit] = None,
|
||||
powerbi: Optional[PowerBIDataset] = None,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
prefix: str = POWERBI_PREFIX,
|
||||
suffix: str = POWERBI_SUFFIX,
|
||||
format_instructions: Optional[str] = None,
|
||||
examples: Optional[str] = None,
|
||||
input_variables: Optional[List[str]] = None,
|
||||
top_k: int = 10,
|
||||
verbose: bool = False,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Construct a Power BI agent from an LLM and tools."""
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.agents import AgentExecutor
|
||||
from langchain.chains.llm import LLMChain
|
||||
if toolkit is None:
|
||||
if powerbi is None:
|
||||
raise ValueError("Must provide either a toolkit or powerbi dataset")
|
||||
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
|
||||
tools = toolkit.get_tools()
|
||||
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
|
||||
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
|
||||
agent = ZeroShotAgent(
|
||||
llm_chain=LLMChain(
|
||||
llm=llm,
|
||||
prompt=ZeroShotAgent.create_prompt(
|
||||
tools,
|
||||
prefix=prefix.format(top_k=top_k).format(tables=tables),
|
||||
suffix=suffix,
|
||||
input_variables=input_variables,
|
||||
**prompt_params,
|
||||
),
|
||||
callback_manager=callback_manager, # type: ignore
|
||||
verbose=verbose,
|
||||
),
|
||||
allowed_tools=[tool.name for tool in tools],
|
||||
**kwargs,
|
||||
)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
verbose=verbose,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,69 @@
|
||||
"""Power BI agent."""
|
||||
from __future__ import annotations
|
||||
from typing import Any, Dict, List, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models.chat_models import BaseChatModel
|
||||
|
||||
from langchain_community.agent_toolkits.powerbi.prompt import (
|
||||
POWERBI_CHAT_PREFIX,
|
||||
POWERBI_CHAT_SUFFIX,
|
||||
)
|
||||
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
|
||||
from langchain_community.utilities.powerbi import PowerBIDataset
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents import AgentExecutor
|
||||
from langchain.agents.agent import AgentOutputParser
|
||||
from langchain.memory.chat_memory import BaseChatMemory
|
||||
|
||||
|
||||
def create_pbi_chat_agent(
|
||||
llm: BaseChatModel,
|
||||
toolkit: Optional[PowerBIToolkit] = None,
|
||||
powerbi: Optional[PowerBIDataset] = None,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
output_parser: Optional[AgentOutputParser] = None,
|
||||
prefix: str = POWERBI_CHAT_PREFIX,
|
||||
suffix: str = POWERBI_CHAT_SUFFIX,
|
||||
examples: Optional[str] = None,
|
||||
input_variables: Optional[List[str]] = None,
|
||||
memory: Optional[BaseChatMemory] = None,
|
||||
top_k: int = 10,
|
||||
verbose: bool = False,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Construct a Power BI agent from a Chat LLM and tools.
|
||||
|
||||
If you supply only a toolkit and no Power BI dataset, the same LLM is used for both.
|
||||
"""
|
||||
from langchain.agents import AgentExecutor
|
||||
from langchain.agents.conversational_chat.base import ConversationalChatAgent
|
||||
from langchain.memory import ConversationBufferMemory
|
||||
if toolkit is None:
|
||||
if powerbi is None:
|
||||
raise ValueError("Must provide either a toolkit or powerbi dataset")
|
||||
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
|
||||
tools = toolkit.get_tools()
|
||||
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
|
||||
agent = ConversationalChatAgent.from_llm_and_tools(
|
||||
llm=llm,
|
||||
tools=tools,
|
||||
system_message=prefix.format(top_k=top_k).format(tables=tables),
|
||||
human_message=suffix,
|
||||
input_variables=input_variables,
|
||||
callback_manager=callback_manager,
|
||||
output_parser=output_parser,
|
||||
verbose=verbose,
|
||||
**kwargs,
|
||||
)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
memory=memory
|
||||
or ConversationBufferMemory(memory_key="chat_history", return_messages=True),
|
||||
verbose=verbose,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,106 @@
|
||||
"""Toolkit for interacting with a Power BI dataset."""
|
||||
from __future__ import annotations
|
||||
from typing import List, Optional, Union, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.language_models.chat_models import BaseChatModel
|
||||
from langchain_core.prompts import PromptTemplate
|
||||
from langchain_core.prompts.chat import (
|
||||
ChatPromptTemplate,
|
||||
HumanMessagePromptTemplate,
|
||||
SystemMessagePromptTemplate,
|
||||
)
|
||||
from langchain_core.pydantic_v1 import Field
|
||||
|
||||
from langchain_community.agent_toolkits.base import BaseToolkit
|
||||
from langchain_community.tools import BaseTool
|
||||
from langchain_community.tools.powerbi.prompt import (
|
||||
QUESTION_TO_QUERY_BASE,
|
||||
SINGLE_QUESTION_TO_QUERY,
|
||||
USER_INPUT,
|
||||
)
|
||||
from langchain_community.tools.powerbi.tool import (
|
||||
InfoPowerBITool,
|
||||
ListPowerBITool,
|
||||
QueryPowerBITool,
|
||||
)
|
||||
from langchain_community.utilities.powerbi import PowerBIDataset
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.chains.llm import LLMChain
|
||||
|
||||
|
||||
class PowerBIToolkit(BaseToolkit):
|
||||
"""Toolkit for interacting with Power BI dataset.
|
||||
|
||||
*Security Note*: This toolkit interacts with an external service.
|
||||
|
||||
Control access to who can use this toolkit.
|
||||
|
||||
Make sure that the capabilities given by this toolkit to the calling
|
||||
code are appropriately scoped to the application.
|
||||
|
||||
See https://python.langchain.com/docs/security for more information.
|
||||
"""
|
||||
|
||||
powerbi: PowerBIDataset = Field(exclude=True)
|
||||
llm: Union[BaseLanguageModel, BaseChatModel] = Field(exclude=True)
|
||||
examples: Optional[str] = None
|
||||
max_iterations: int = 5
|
||||
callback_manager: Optional[BaseCallbackManager] = None
|
||||
output_token_limit: Optional[int] = None
|
||||
tiktoken_model_name: Optional[str] = None
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
arbitrary_types_allowed = True
|
||||
|
||||
def get_tools(self) -> List[BaseTool]:
|
||||
"""Get the tools in the toolkit."""
|
||||
return [
|
||||
QueryPowerBITool(
|
||||
llm_chain=self._get_chain(),
|
||||
powerbi=self.powerbi,
|
||||
examples=self.examples,
|
||||
max_iterations=self.max_iterations,
|
||||
output_token_limit=self.output_token_limit,
|
||||
tiktoken_model_name=self.tiktoken_model_name,
|
||||
),
|
||||
InfoPowerBITool(powerbi=self.powerbi),
|
||||
ListPowerBITool(powerbi=self.powerbi),
|
||||
]
|
||||
|
||||
def _get_chain(self) -> LLMChain:
|
||||
"""Construct the chain based on the callback manager and model type."""
|
||||
from langchain.chains.llm import LLMChain
|
||||
if isinstance(self.llm, BaseLanguageModel):
|
||||
return LLMChain(
|
||||
llm=self.llm,
|
||||
callback_manager=self.callback_manager
|
||||
if self.callback_manager
|
||||
else None,
|
||||
prompt=PromptTemplate(
|
||||
template=SINGLE_QUESTION_TO_QUERY,
|
||||
input_variables=["tool_input", "tables", "schemas", "examples"],
|
||||
),
|
||||
)
|
||||
|
||||
system_prompt = SystemMessagePromptTemplate(
|
||||
prompt=PromptTemplate(
|
||||
template=QUESTION_TO_QUERY_BASE,
|
||||
input_variables=["tables", "schemas", "examples"],
|
||||
)
|
||||
)
|
||||
human_prompt = HumanMessagePromptTemplate(
|
||||
prompt=PromptTemplate(
|
||||
template=USER_INPUT,
|
||||
input_variables=["tool_input"],
|
||||
)
|
||||
)
|
||||
return LLMChain(
|
||||
llm=self.llm,
|
||||
callback_manager=self.callback_manager if self.callback_manager else None,
|
||||
prompt=ChatPromptTemplate.from_messages([system_prompt, human_prompt]),
|
||||
)
|
||||
@@ -0,0 +1,64 @@
|
||||
"""Spark SQL agent."""
|
||||
from __future__ import annotations
|
||||
from typing import Any, Dict, List, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager, Callbacks
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
|
||||
from langchain_community.agent_toolkits.spark_sql.prompt import SQL_PREFIX, SQL_SUFFIX
|
||||
from langchain_community.agent_toolkits.spark_sql.toolkit import SparkSQLToolkit
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
|
||||
|
||||
def create_spark_sql_agent(
|
||||
llm: BaseLanguageModel,
|
||||
toolkit: SparkSQLToolkit,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
callbacks: Callbacks = None,
|
||||
prefix: str = SQL_PREFIX,
|
||||
suffix: str = SQL_SUFFIX,
|
||||
format_instructions: Optional[str] = None,
|
||||
input_variables: Optional[List[str]] = None,
|
||||
top_k: int = 10,
|
||||
max_iterations: Optional[int] = 15,
|
||||
max_execution_time: Optional[float] = None,
|
||||
early_stopping_method: str = "force",
|
||||
verbose: bool = False,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Construct a Spark SQL agent from an LLM and tools."""
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.chains.llm import LLMChain
|
||||
tools = toolkit.get_tools()
|
||||
prefix = prefix.format(top_k=top_k)
|
||||
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
|
||||
prompt = ZeroShotAgent.create_prompt(
|
||||
tools,
|
||||
prefix=prefix,
|
||||
suffix=suffix,
|
||||
input_variables=input_variables,
|
||||
**prompt_params,
|
||||
)
|
||||
llm_chain = LLMChain(
|
||||
llm=llm,
|
||||
prompt=prompt,
|
||||
callback_manager=callback_manager,
|
||||
callbacks=callbacks,
|
||||
)
|
||||
tool_names = [tool.name for tool in tools]
|
||||
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
callbacks=callbacks,
|
||||
verbose=verbose,
|
||||
max_iterations=max_iterations,
|
||||
max_execution_time=max_execution_time,
|
||||
early_stopping_method=early_stopping_method,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,102 @@
|
||||
"""SQL agent."""
|
||||
from __future__ import annotations
|
||||
from typing import Any, Dict, List, Optional, Sequence, TYPE_CHECKING
|
||||
|
||||
from langchain_core.callbacks import BaseCallbackManager
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.messages import AIMessage, SystemMessage
|
||||
from langchain_core.prompts.chat import (
|
||||
ChatPromptTemplate,
|
||||
HumanMessagePromptTemplate,
|
||||
MessagesPlaceholder,
|
||||
)
|
||||
|
||||
from langchain_community.agent_toolkits.sql.prompt import (
|
||||
SQL_FUNCTIONS_SUFFIX,
|
||||
SQL_PREFIX,
|
||||
SQL_SUFFIX,
|
||||
)
|
||||
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
|
||||
from langchain_community.tools import BaseTool
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.agents.agent import AgentExecutor
|
||||
from langchain.agents.agent_types import AgentType
|
||||
|
||||
|
||||
def create_sql_agent(
|
||||
llm: BaseLanguageModel,
|
||||
toolkit: SQLDatabaseToolkit,
|
||||
agent_type: Optional[AgentType] = None,
|
||||
callback_manager: Optional[BaseCallbackManager] = None,
|
||||
prefix: str = SQL_PREFIX,
|
||||
suffix: Optional[str] = None,
|
||||
format_instructions: Optional[str] = None,
|
||||
input_variables: Optional[List[str]] = None,
|
||||
top_k: int = 10,
|
||||
max_iterations: Optional[int] = 15,
|
||||
max_execution_time: Optional[float] = None,
|
||||
early_stopping_method: str = "force",
|
||||
verbose: bool = False,
|
||||
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
|
||||
extra_tools: Sequence[BaseTool] = (),
|
||||
**kwargs: Any,
|
||||
) -> AgentExecutor:
|
||||
"""Construct an SQL agent from an LLM and tools."""
|
||||
from langchain.agents.agent import AgentExecutor, BaseSingleActionAgent
|
||||
from langchain.agents.agent_types import AgentType
|
||||
from langchain.agents.mrkl.base import ZeroShotAgent
|
||||
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
|
||||
from langchain.chains.llm import LLMChain
|
||||
agent_type = agent_type or AgentType.ZERO_SHOT_REACT_DESCRIPTION
|
||||
tools = toolkit.get_tools() + list(extra_tools)
|
||||
prefix = prefix.format(dialect=toolkit.dialect, top_k=top_k)
|
||||
agent: BaseSingleActionAgent
|
||||
|
||||
if agent_type == AgentType.ZERO_SHOT_REACT_DESCRIPTION:
|
||||
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
|
||||
prompt = ZeroShotAgent.create_prompt(
|
||||
tools,
|
||||
prefix=prefix,
|
||||
suffix=suffix or SQL_SUFFIX,
|
||||
input_variables=input_variables,
|
||||
**prompt_params,
|
||||
)
|
||||
llm_chain = LLMChain(
|
||||
llm=llm,
|
||||
prompt=prompt,
|
||||
callback_manager=callback_manager,
|
||||
)
|
||||
tool_names = [tool.name for tool in tools]
|
||||
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
|
||||
|
||||
elif agent_type == AgentType.OPENAI_FUNCTIONS:
|
||||
messages = [
|
||||
SystemMessage(content=prefix),
|
||||
HumanMessagePromptTemplate.from_template("{input}"),
|
||||
AIMessage(content=suffix or SQL_FUNCTIONS_SUFFIX),
|
||||
MessagesPlaceholder(variable_name="agent_scratchpad"),
|
||||
]
|
||||
input_variables = ["input", "agent_scratchpad"]
|
||||
_prompt = ChatPromptTemplate(input_variables=input_variables, messages=messages)
|
||||
|
||||
agent = OpenAIFunctionsAgent(
|
||||
llm=llm,
|
||||
prompt=_prompt,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
**kwargs,
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Agent type {agent_type} not supported at the moment.")
|
||||
|
||||
return AgentExecutor.from_agent_and_tools(
|
||||
agent=agent,
|
||||
tools=tools,
|
||||
callback_manager=callback_manager,
|
||||
verbose=verbose,
|
||||
max_iterations=max_iterations,
|
||||
max_execution_time=max_execution_time,
|
||||
early_stopping_method=early_stopping_method,
|
||||
**(agent_executor_kwargs or {}),
|
||||
)
|
||||
@@ -0,0 +1,66 @@
|
||||
"""**Callback handlers** allow listening to events in LangChain.
|
||||
|
||||
**Class hierarchy:**
|
||||
|
||||
.. code-block::
|
||||
|
||||
BaseCallbackHandler --> <name>CallbackHandler # Example: AimCallbackHandler
|
||||
"""
|
||||
|
||||
from langchain_community.callbacks.aim_callback import AimCallbackHandler
|
||||
from langchain_community.callbacks.argilla_callback import ArgillaCallbackHandler
|
||||
from langchain_community.callbacks.arize_callback import ArizeCallbackHandler
|
||||
from langchain_community.callbacks.arthur_callback import ArthurCallbackHandler
|
||||
from langchain_community.callbacks.clearml_callback import ClearMLCallbackHandler
|
||||
from langchain_community.callbacks.comet_ml_callback import CometCallbackHandler
|
||||
from langchain_community.callbacks.context_callback import ContextCallbackHandler
|
||||
from langchain_community.callbacks.flyte_callback import FlyteCallbackHandler
|
||||
from langchain_community.callbacks.human import HumanApprovalCallbackHandler
|
||||
from langchain_community.callbacks.infino_callback import InfinoCallbackHandler
|
||||
from langchain_community.callbacks.labelstudio_callback import (
|
||||
LabelStudioCallbackHandler,
|
||||
)
|
||||
from langchain_community.callbacks.llmonitor_callback import LLMonitorCallbackHandler
|
||||
from langchain_community.callbacks.manager import (
|
||||
get_openai_callback,
|
||||
wandb_tracing_enabled,
|
||||
)
|
||||
from langchain_community.callbacks.mlflow_callback import MlflowCallbackHandler
|
||||
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
|
||||
from langchain_community.callbacks.promptlayer_callback import (
|
||||
PromptLayerCallbackHandler,
|
||||
)
|
||||
from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
|
||||
from langchain_community.callbacks.streamlit import (
|
||||
LLMThoughtLabeler,
|
||||
StreamlitCallbackHandler,
|
||||
)
|
||||
from langchain_community.callbacks.trubrics_callback import TrubricsCallbackHandler
|
||||
from langchain_community.callbacks.wandb_callback import WandbCallbackHandler
|
||||
from langchain_community.callbacks.whylabs_callback import WhyLabsCallbackHandler
|
||||
|
||||
__all__ = [
|
||||
"AimCallbackHandler",
|
||||
"ArgillaCallbackHandler",
|
||||
"ArizeCallbackHandler",
|
||||
"PromptLayerCallbackHandler",
|
||||
"ArthurCallbackHandler",
|
||||
"ClearMLCallbackHandler",
|
||||
"CometCallbackHandler",
|
||||
"ContextCallbackHandler",
|
||||
"HumanApprovalCallbackHandler",
|
||||
"InfinoCallbackHandler",
|
||||
"MlflowCallbackHandler",
|
||||
"LLMonitorCallbackHandler",
|
||||
"OpenAICallbackHandler",
|
||||
"LLMThoughtLabeler",
|
||||
"StreamlitCallbackHandler",
|
||||
"WandbCallbackHandler",
|
||||
"WhyLabsCallbackHandler",
|
||||
"get_openai_callback",
|
||||
"wandb_tracing_enabled",
|
||||
"FlyteCallbackHandler",
|
||||
"SageMakerCallbackHandler",
|
||||
"LabelStudioCallbackHandler",
|
||||
"TrubricsCallbackHandler",
|
||||
]
|
||||
@@ -0,0 +1,69 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from contextlib import contextmanager
|
||||
from contextvars import ContextVar
|
||||
from typing import (
|
||||
Generator,
|
||||
Optional,
|
||||
)
|
||||
|
||||
from langchain_core.tracers.context import register_configure_hook
|
||||
|
||||
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
|
||||
from langchain_community.callbacks.tracers.wandb import WandbTracer
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
openai_callback_var: ContextVar[Optional[OpenAICallbackHandler]] = ContextVar(
|
||||
"openai_callback", default=None
|
||||
)
|
||||
wandb_tracing_callback_var: ContextVar[Optional[WandbTracer]] = ContextVar( # noqa: E501
|
||||
"tracing_wandb_callback", default=None
|
||||
)
|
||||
|
||||
register_configure_hook(openai_callback_var, True)
|
||||
register_configure_hook(
|
||||
wandb_tracing_callback_var, True, WandbTracer, "LANGCHAIN_WANDB_TRACING"
|
||||
)
|
||||
|
||||
|
||||
@contextmanager
|
||||
def get_openai_callback() -> Generator[OpenAICallbackHandler, None, None]:
|
||||
"""Get the OpenAI callback handler in a context manager.
|
||||
which conveniently exposes token and cost information.
|
||||
|
||||
Returns:
|
||||
OpenAICallbackHandler: The OpenAI callback handler.
|
||||
|
||||
Example:
|
||||
>>> with get_openai_callback() as cb:
|
||||
... # Use the OpenAI callback handler
|
||||
"""
|
||||
cb = OpenAICallbackHandler()
|
||||
openai_callback_var.set(cb)
|
||||
yield cb
|
||||
openai_callback_var.set(None)
|
||||
|
||||
|
||||
@contextmanager
|
||||
def wandb_tracing_enabled(
|
||||
session_name: str = "default",
|
||||
) -> Generator[None, None, None]:
|
||||
"""Get the WandbTracer in a context manager.
|
||||
|
||||
Args:
|
||||
session_name (str, optional): The name of the session.
|
||||
Defaults to "default".
|
||||
|
||||
Returns:
|
||||
None
|
||||
|
||||
Example:
|
||||
>>> with wandb_tracing_enabled() as session:
|
||||
... # Use the WandbTracer session
|
||||
"""
|
||||
cb = WandbTracer()
|
||||
wandb_tracing_callback_var.set(cb)
|
||||
yield None
|
||||
wandb_tracing_callback_var.set(None)
|
||||
@@ -0,0 +1,18 @@
|
||||
"""Tracers that record execution of LangChain runs."""
|
||||
|
||||
from langchain_core.tracers.langchain import LangChainTracer
|
||||
from langchain_core.tracers.langchain_v1 import LangChainTracerV1
|
||||
from langchain_core.tracers.stdout import (
|
||||
ConsoleCallbackHandler,
|
||||
FunctionCallbackHandler,
|
||||
)
|
||||
|
||||
from langchain_community.callbacks.tracers.wandb import WandbTracer
|
||||
|
||||
__all__ = [
|
||||
"ConsoleCallbackHandler",
|
||||
"FunctionCallbackHandler",
|
||||
"LangChainTracer",
|
||||
"LangChainTracerV1",
|
||||
"WandbTracer",
|
||||
]
|
||||
@@ -0,0 +1,101 @@
|
||||
"""Abstract interface for document loader implementations."""
|
||||
from __future__ import annotations
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Iterator, List, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
from langchain_community.document_loaders.blob_loaders import Blob
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.text_splitter import TextSplitter
|
||||
|
||||
|
||||
class BaseLoader(ABC):
|
||||
"""Interface for Document Loader.
|
||||
|
||||
Implementations should implement the lazy-loading method using generators
|
||||
to avoid loading all Documents into memory at once.
|
||||
|
||||
The `load` method will remain as is for backwards compatibility, but its
|
||||
implementation should be just `list(self.lazy_load())`.
|
||||
"""
|
||||
|
||||
# Sub-classes should implement this method
|
||||
# as return list(self.lazy_load()).
|
||||
# This method returns a List which is materialized in memory.
|
||||
@abstractmethod
|
||||
def load(self) -> List[Document]:
|
||||
"""Load data into Document objects."""
|
||||
|
||||
def load_and_split(
|
||||
self, text_splitter: Optional[TextSplitter] = None
|
||||
) -> List[Document]:
|
||||
"""Load Documents and split into chunks. Chunks are returned as Documents.
|
||||
|
||||
Args:
|
||||
text_splitter: TextSplitter instance to use for splitting documents.
|
||||
Defaults to RecursiveCharacterTextSplitter.
|
||||
|
||||
Returns:
|
||||
List of Documents.
|
||||
"""
|
||||
from langchain.text_splitter import RecursiveCharacterTextSplitter
|
||||
|
||||
if text_splitter is None:
|
||||
_text_splitter: TextSplitter = RecursiveCharacterTextSplitter()
|
||||
else:
|
||||
_text_splitter = text_splitter
|
||||
docs = self.load()
|
||||
return _text_splitter.split_documents(docs)
|
||||
|
||||
# Attention: This method will be upgraded into an abstractmethod once it's
|
||||
# implemented in all the existing subclasses.
|
||||
def lazy_load(
|
||||
self,
|
||||
) -> Iterator[Document]:
|
||||
"""A lazy loader for Documents."""
|
||||
raise NotImplementedError(
|
||||
f"{self.__class__.__name__} does not implement lazy_load()"
|
||||
)
|
||||
|
||||
|
||||
class BaseBlobParser(ABC):
|
||||
"""Abstract interface for blob parsers.
|
||||
|
||||
A blob parser provides a way to parse raw data stored in a blob into one
|
||||
or more documents.
|
||||
|
||||
The parser can be composed with blob loaders, making it easy to reuse
|
||||
a parser independent of how the blob was originally loaded.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
|
||||
"""Lazy parsing interface.
|
||||
|
||||
Subclasses are required to implement this method.
|
||||
|
||||
Args:
|
||||
blob: Blob instance
|
||||
|
||||
Returns:
|
||||
Generator of documents
|
||||
"""
|
||||
|
||||
def parse(self, blob: Blob) -> List[Document]:
|
||||
"""Eagerly parse the blob into a document or documents.
|
||||
|
||||
This is a convenience method for interactive development environment.
|
||||
|
||||
Production applications should favor the lazy_parse method instead.
|
||||
|
||||
Subclasses should generally not over-ride this parse method.
|
||||
|
||||
Args:
|
||||
blob: Blob instance
|
||||
|
||||
Returns:
|
||||
List of documents
|
||||
"""
|
||||
return list(self.lazy_parse(blob))
|
||||
@@ -0,0 +1,147 @@
|
||||
"""Use to load blobs from the local file system."""
|
||||
from pathlib import Path
|
||||
from typing import Callable, Iterable, Iterator, Optional, Sequence, TypeVar, Union
|
||||
|
||||
from langchain_community.document_loaders.blob_loaders.schema import Blob, BlobLoader
|
||||
|
||||
T = TypeVar("T")
|
||||
|
||||
|
||||
def _make_iterator(
|
||||
length_func: Callable[[], int], show_progress: bool = False
|
||||
) -> Callable[[Iterable[T]], Iterator[T]]:
|
||||
"""Create a function that optionally wraps an iterable in tqdm."""
|
||||
if show_progress:
|
||||
try:
|
||||
from tqdm.auto import tqdm
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"You must install tqdm to use show_progress=True."
|
||||
"You can install tqdm with `pip install tqdm`."
|
||||
)
|
||||
|
||||
# Make sure to provide `total` here so that tqdm can show
|
||||
# a progress bar that takes into account the total number of files.
|
||||
def _with_tqdm(iterable: Iterable[T]) -> Iterator[T]:
|
||||
"""Wrap an iterable in a tqdm progress bar."""
|
||||
return tqdm(iterable, total=length_func())
|
||||
|
||||
iterator = _with_tqdm
|
||||
else:
|
||||
iterator = iter # type: ignore
|
||||
|
||||
return iterator
|
||||
|
||||
|
||||
# PUBLIC API
|
||||
|
||||
|
||||
class FileSystemBlobLoader(BlobLoader):
|
||||
"""Load blobs in the local file system.
|
||||
|
||||
Example:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
|
||||
loader = FileSystemBlobLoader("/path/to/directory")
|
||||
for blob in loader.yield_blobs():
|
||||
print(blob)
|
||||
""" # noqa: E501
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
path: Union[str, Path],
|
||||
*,
|
||||
glob: str = "**/[!.]*",
|
||||
exclude: Sequence[str] = (),
|
||||
suffixes: Optional[Sequence[str]] = None,
|
||||
show_progress: bool = False,
|
||||
) -> None:
|
||||
"""Initialize with a path to directory and how to glob over it.
|
||||
|
||||
Args:
|
||||
path: Path to directory to load from or path to file to load.
|
||||
If a path to a file is provided, glob/exclude/suffixes are ignored.
|
||||
glob: Glob pattern relative to the specified path
|
||||
by default set to pick up all non-hidden files
|
||||
exclude: patterns to exclude from results, use glob syntax
|
||||
suffixes: Provide to keep only files with these suffixes
|
||||
Useful when wanting to keep files with different suffixes
|
||||
Suffixes must include the dot, e.g. ".txt"
|
||||
show_progress: If true, will show a progress bar as the files are loaded.
|
||||
This forces an iteration through all matching files
|
||||
to count them prior to loading them.
|
||||
|
||||
Examples:
|
||||
|
||||
.. code-block:: python
|
||||
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
|
||||
|
||||
# Load a single file.
|
||||
loader = FileSystemBlobLoader("/path/to/file.txt")
|
||||
|
||||
# Recursively load all text files in a directory.
|
||||
loader = FileSystemBlobLoader("/path/to/directory", glob="**/*.txt")
|
||||
|
||||
# Recursively load all non-hidden files in a directory.
|
||||
loader = FileSystemBlobLoader("/path/to/directory", glob="**/[!.]*")
|
||||
|
||||
# Load all files in a directory without recursion.
|
||||
loader = FileSystemBlobLoader("/path/to/directory", glob="*")
|
||||
|
||||
# Recursively load all files in a directory, except for py or pyc files.
|
||||
loader = FileSystemBlobLoader(
|
||||
"/path/to/directory",
|
||||
glob="**/*.txt",
|
||||
exclude=["**/*.py", "**/*.pyc"]
|
||||
)
|
||||
""" # noqa: E501
|
||||
if isinstance(path, Path):
|
||||
_path = path
|
||||
elif isinstance(path, str):
|
||||
_path = Path(path)
|
||||
else:
|
||||
raise TypeError(f"Expected str or Path, got {type(path)}")
|
||||
|
||||
self.path = _path.expanduser() # Expand user to handle ~
|
||||
self.glob = glob
|
||||
self.suffixes = set(suffixes or [])
|
||||
self.show_progress = show_progress
|
||||
self.exclude = exclude
|
||||
|
||||
def yield_blobs(
|
||||
self,
|
||||
) -> Iterable[Blob]:
|
||||
"""Yield blobs that match the requested pattern."""
|
||||
iterator = _make_iterator(
|
||||
length_func=self.count_matching_files, show_progress=self.show_progress
|
||||
)
|
||||
|
||||
for path in iterator(self._yield_paths()):
|
||||
yield Blob.from_path(path)
|
||||
|
||||
def _yield_paths(self) -> Iterable[Path]:
|
||||
"""Yield paths that match the requested pattern."""
|
||||
if self.path.is_file():
|
||||
yield self.path
|
||||
return
|
||||
|
||||
paths = self.path.glob(self.glob)
|
||||
for path in paths:
|
||||
if self.exclude:
|
||||
if any(path.match(glob) for glob in self.exclude):
|
||||
continue
|
||||
if path.is_file():
|
||||
if self.suffixes and path.suffix not in self.suffixes:
|
||||
continue
|
||||
yield path
|
||||
|
||||
def count_matching_files(self) -> int:
|
||||
"""Count files that match the pattern without loading them."""
|
||||
# Carry out a full iteration to count the files without
|
||||
# materializing anything expensive in memory.
|
||||
num = 0
|
||||
for _ in self._yield_paths():
|
||||
num += 1
|
||||
return num
|
||||
@@ -0,0 +1,190 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Iterator,
|
||||
List,
|
||||
Literal,
|
||||
Optional,
|
||||
Sequence,
|
||||
Union,
|
||||
)
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
from langchain_community.document_loaders.base import BaseBlobParser, BaseLoader
|
||||
from langchain_community.document_loaders.blob_loaders import (
|
||||
BlobLoader,
|
||||
FileSystemBlobLoader,
|
||||
)
|
||||
from langchain_community.document_loaders.parsers.registry import get_parser
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.text_splitter import TextSplitter
|
||||
|
||||
_PathLike = Union[str, Path]
|
||||
|
||||
DEFAULT = Literal["default"]
|
||||
|
||||
|
||||
class GenericLoader(BaseLoader):
|
||||
"""Generic Document Loader.
|
||||
|
||||
A generic document loader that allows combining an arbitrary blob loader with
|
||||
a blob parser.
|
||||
|
||||
Examples:
|
||||
|
||||
Parse a specific PDF file:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.document_loaders import GenericLoader
|
||||
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
|
||||
|
||||
# Recursively load all text files in a directory.
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"my_lovely_pdf.pdf",
|
||||
parser=PyPDFParser()
|
||||
)
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.document_loaders import GenericLoader
|
||||
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
|
||||
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
path="path/to/directory",
|
||||
glob="**/[!.]*",
|
||||
suffixes=[".pdf"],
|
||||
show_progress=True,
|
||||
)
|
||||
|
||||
docs = loader.lazy_load()
|
||||
next(docs)
|
||||
|
||||
Example instantiations to change which files are loaded:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
# Recursively load all text files in a directory.
|
||||
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")
|
||||
|
||||
# Recursively load all non-hidden files in a directory.
|
||||
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")
|
||||
|
||||
# Load all files in a directory without recursion.
|
||||
loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")
|
||||
|
||||
Example instantiations to change which parser is used:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
|
||||
|
||||
# Recursively load all text files in a directory.
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"/path/to/dir",
|
||||
glob="**/*.pdf",
|
||||
parser=PyPDFParser()
|
||||
)
|
||||
|
||||
""" # noqa: E501
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
blob_loader: BlobLoader,
|
||||
blob_parser: BaseBlobParser,
|
||||
) -> None:
|
||||
"""A generic document loader.
|
||||
|
||||
Args:
|
||||
blob_loader: A blob loader which knows how to yield blobs
|
||||
blob_parser: A blob parser which knows how to parse blobs into documents
|
||||
"""
|
||||
self.blob_loader = blob_loader
|
||||
self.blob_parser = blob_parser
|
||||
|
||||
def lazy_load(
|
||||
self,
|
||||
) -> Iterator[Document]:
|
||||
"""Load documents lazily. Use this when working at a large scale."""
|
||||
for blob in self.blob_loader.yield_blobs():
|
||||
yield from self.blob_parser.lazy_parse(blob)
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load all documents."""
|
||||
return list(self.lazy_load())
|
||||
|
||||
def load_and_split(
|
||||
self, text_splitter: Optional[TextSplitter] = None
|
||||
) -> List[Document]:
|
||||
"""Load all documents and split them into sentences."""
|
||||
raise NotImplementedError(
|
||||
"Loading and splitting is not yet implemented for generic loaders. "
|
||||
"When they will be implemented they will be added via the initializer. "
|
||||
"This method should not be used going forward."
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def from_filesystem(
|
||||
cls,
|
||||
path: _PathLike,
|
||||
*,
|
||||
glob: str = "**/[!.]*",
|
||||
exclude: Sequence[str] = (),
|
||||
suffixes: Optional[Sequence[str]] = None,
|
||||
show_progress: bool = False,
|
||||
parser: Union[DEFAULT, BaseBlobParser] = "default",
|
||||
parser_kwargs: Optional[dict] = None,
|
||||
) -> GenericLoader:
|
||||
"""Create a generic document loader using a filesystem blob loader.
|
||||
|
||||
Args:
|
||||
path: The path to the directory to load documents from OR the path to a
|
||||
single file to load. If this is a file, glob, exclude, suffixes
|
||||
will be ignored.
|
||||
glob: The glob pattern to use to find documents.
|
||||
suffixes: The suffixes to use to filter documents. If None, all files
|
||||
matching the glob will be loaded.
|
||||
exclude: A list of patterns to exclude from the loader.
|
||||
show_progress: Whether to show a progress bar or not (requires tqdm).
|
||||
Proxies to the file system loader.
|
||||
parser: A blob parser which knows how to parse blobs into documents,
|
||||
will instantiate a default parser if not provided.
|
||||
The default can be overridden by either passing a parser or
|
||||
setting the class attribute `blob_parser` (the latter
|
||||
should be used with inheritance).
|
||||
parser_kwargs: Keyword arguments to pass to the parser.
|
||||
|
||||
Returns:
|
||||
A generic document loader.
|
||||
"""
|
||||
blob_loader = FileSystemBlobLoader(
|
||||
path,
|
||||
glob=glob,
|
||||
exclude=exclude,
|
||||
suffixes=suffixes,
|
||||
show_progress=show_progress,
|
||||
)
|
||||
if isinstance(parser, str):
|
||||
if parser == "default":
|
||||
try:
|
||||
# If there is an implementation of get_parser on the class, use it.
|
||||
blob_parser = cls.get_parser(**(parser_kwargs or {}))
|
||||
except NotImplementedError:
|
||||
# if not then use the global registry.
|
||||
blob_parser = get_parser(parser)
|
||||
else:
|
||||
blob_parser = get_parser(parser)
|
||||
else:
|
||||
blob_parser = parser
|
||||
return cls(blob_loader, blob_parser)
|
||||
|
||||
@staticmethod
|
||||
def get_parser(**kwargs: Any) -> BaseBlobParser:
|
||||
"""Override this method to associate a default parser with the class."""
|
||||
raise NotImplementedError()
|
||||
@@ -0,0 +1,70 @@
|
||||
"""Code for generic / auxiliary parsers.
|
||||
|
||||
This module contains some logic to help assemble more sophisticated parsers.
|
||||
"""
|
||||
from typing import Iterator, Mapping, Optional
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
from langchain_community.document_loaders.base import BaseBlobParser
|
||||
from langchain_community.document_loaders.blob_loaders.schema import Blob
|
||||
|
||||
|
||||
class MimeTypeBasedParser(BaseBlobParser):
|
||||
"""Parser that uses `mime`-types to parse a blob.
|
||||
|
||||
This parser is useful for simple pipelines where the mime-type is sufficient
|
||||
to determine how to parse a blob.
|
||||
|
||||
To use, configure handlers based on mime-types and pass them to the initializer.
|
||||
|
||||
Example:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser
|
||||
|
||||
parser = MimeTypeBasedParser(
|
||||
handlers={
|
||||
"application/pdf": ...,
|
||||
},
|
||||
fallback_parser=...,
|
||||
)
|
||||
""" # noqa: E501
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
handlers: Mapping[str, BaseBlobParser],
|
||||
*,
|
||||
fallback_parser: Optional[BaseBlobParser] = None,
|
||||
) -> None:
|
||||
"""Define a parser that uses mime-types to determine how to parse a blob.
|
||||
|
||||
Args:
|
||||
handlers: A mapping from mime-types to functions that take a blob, parse it
|
||||
and return a document.
|
||||
fallback_parser: A fallback_parser parser to use if the mime-type is not
|
||||
found in the handlers. If provided, this parser will be
|
||||
used to parse blobs with all mime-types not found in
|
||||
the handlers.
|
||||
If not provided, a ValueError will be raised if the
|
||||
mime-type is not found in the handlers.
|
||||
"""
|
||||
self.handlers = handlers
|
||||
self.fallback_parser = fallback_parser
|
||||
|
||||
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
|
||||
"""Load documents from a blob."""
|
||||
mimetype = blob.mimetype
|
||||
|
||||
if mimetype is None:
|
||||
raise ValueError(f"{blob} does not have a mimetype.")
|
||||
|
||||
if mimetype in self.handlers:
|
||||
handler = self.handlers[mimetype]
|
||||
yield from handler.lazy_parse(blob)
|
||||
else:
|
||||
if self.fallback_parser is not None:
|
||||
yield from self.fallback_parser.lazy_parse(blob)
|
||||
else:
|
||||
raise ValueError(f"Unsupported mime type: {mimetype}")
|
||||
@@ -0,0 +1,157 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Any, Dict, Iterator, Optional, TYPE_CHECKING
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
from langchain_community.document_loaders.base import BaseBlobParser
|
||||
from langchain_community.document_loaders.blob_loaders import Blob
|
||||
from langchain_community.document_loaders.parsers.language.cobol import CobolSegmenter
|
||||
from langchain_community.document_loaders.parsers.language.javascript import (
|
||||
JavaScriptSegmenter,
|
||||
)
|
||||
from langchain_community.document_loaders.parsers.language.python import PythonSegmenter
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from langchain.text_splitter import Language
|
||||
|
||||
try:
|
||||
from langchain.text_splitter import Language
|
||||
LANGUAGE_EXTENSIONS: Dict[str, str] = {
|
||||
"py": Language.PYTHON,
|
||||
"js": Language.JS,
|
||||
"cobol": Language.COBOL,
|
||||
}
|
||||
|
||||
LANGUAGE_SEGMENTERS: Dict[str, Any] = {
|
||||
Language.PYTHON: PythonSegmenter,
|
||||
Language.JS: JavaScriptSegmenter,
|
||||
Language.COBOL: CobolSegmenter,
|
||||
}
|
||||
except ImportError:
|
||||
LANGUAGE_EXTENSIONS = {}
|
||||
LANGUAGE_SEGMENTERS = {}
|
||||
|
||||
|
||||
class LanguageParser(BaseBlobParser):
|
||||
"""Parse using the respective programming language syntax.
|
||||
|
||||
Each top-level function and class in the code is loaded into separate documents.
|
||||
Furthermore, an extra document is generated, containing the remaining top-level code
|
||||
that excludes the already segmented functions and classes.
|
||||
|
||||
This approach can potentially improve the accuracy of QA models over source code.
|
||||
|
||||
Currently, the supported languages for code parsing are Python and JavaScript.
|
||||
|
||||
The language used for parsing can be configured, along with the minimum number of
|
||||
lines required to activate the splitting based on syntax.
|
||||
|
||||
Examples:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.text_splitter.Language
|
||||
from langchain_community.document_loaders.generic import GenericLoader
|
||||
from langchain_community.document_loaders.parsers import LanguageParser
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"./code",
|
||||
glob="**/*",
|
||||
suffixes=[".py", ".js"],
|
||||
parser=LanguageParser()
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
Example instantiations to manually select the language:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from langchain.text_splitter import Language
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"./code",
|
||||
glob="**/*",
|
||||
suffixes=[".py"],
|
||||
parser=LanguageParser(language=Language.PYTHON)
|
||||
)
|
||||
|
||||
Example instantiations to set number of lines threshold:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
loader = GenericLoader.from_filesystem(
|
||||
"./code",
|
||||
glob="**/*",
|
||||
suffixes=[".py"],
|
||||
parser=LanguageParser(parser_threshold=200)
|
||||
)
|
||||
"""
|
||||
|
||||
def __init__(self, language: Optional[Language] = None, parser_threshold: int = 0):
|
||||
"""
|
||||
Language parser that split code using the respective language syntax.
|
||||
|
||||
Args:
|
||||
language: If None (default), it will try to infer language from source.
|
||||
parser_threshold: Minimum lines needed to activate parsing (0 by default).
|
||||
"""
|
||||
self.language = language
|
||||
self.parser_threshold = parser_threshold
|
||||
|
||||
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
|
||||
code = blob.as_string()
|
||||
|
||||
language = self.language or (
|
||||
LANGUAGE_EXTENSIONS.get(blob.source.rsplit(".", 1)[-1])
|
||||
if isinstance(blob.source, str)
|
||||
else None
|
||||
)
|
||||
|
||||
if language is None:
|
||||
yield Document(
|
||||
page_content=code,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
},
|
||||
)
|
||||
return
|
||||
|
||||
if self.parser_threshold >= len(code.splitlines()):
|
||||
yield Document(
|
||||
page_content=code,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
"language": language,
|
||||
},
|
||||
)
|
||||
return
|
||||
|
||||
self.Segmenter = LANGUAGE_SEGMENTERS[language]
|
||||
segmenter = self.Segmenter(blob.as_string())
|
||||
if not segmenter.is_valid():
|
||||
yield Document(
|
||||
page_content=code,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
},
|
||||
)
|
||||
return
|
||||
|
||||
for functions_classes in segmenter.extract_functions_classes():
|
||||
yield Document(
|
||||
page_content=functions_classes,
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
"content_type": "functions_classes",
|
||||
"language": language,
|
||||
},
|
||||
)
|
||||
yield Document(
|
||||
page_content=segmenter.simplify_code(),
|
||||
metadata={
|
||||
"source": blob.source,
|
||||
"content_type": "simplified_code",
|
||||
"language": language,
|
||||
},
|
||||
)
|
||||
@@ -0,0 +1,262 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING, Dict, List, Optional, Union
|
||||
|
||||
from langchain_core.documents import Document
|
||||
|
||||
from langchain_community.document_loaders.base import BaseLoader
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import pandas as pd
|
||||
from telethon.hints import EntityLike
|
||||
|
||||
|
||||
def concatenate_rows(row: dict) -> str:
|
||||
"""Combine message information in a readable format ready to be used."""
|
||||
date = row["date"]
|
||||
sender = row["from"]
|
||||
text = row["text"]
|
||||
return f"{sender} on {date}: {text}\n\n"
|
||||
|
||||
|
||||
class TelegramChatFileLoader(BaseLoader):
|
||||
"""Load from `Telegram chat` dump."""
|
||||
|
||||
def __init__(self, path: str):
|
||||
"""Initialize with a path."""
|
||||
self.file_path = path
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load documents."""
|
||||
p = Path(self.file_path)
|
||||
|
||||
with open(p, encoding="utf8") as f:
|
||||
d = json.load(f)
|
||||
|
||||
text = "".join(
|
||||
concatenate_rows(message)
|
||||
for message in d["messages"]
|
||||
if message["type"] == "message" and isinstance(message["text"], str)
|
||||
)
|
||||
metadata = {"source": str(p)}
|
||||
|
||||
return [Document(page_content=text, metadata=metadata)]
|
||||
|
||||
|
||||
def text_to_docs(text: Union[str, List[str]]) -> List[Document]:
|
||||
"""Convert a string or list of strings to a list of Documents with metadata."""
|
||||
from langchain.text_splitter import RecursiveCharacterTextSplitter
|
||||
if isinstance(text, str):
|
||||
# Take a single string as one page
|
||||
text = [text]
|
||||
page_docs = [Document(page_content=page) for page in text]
|
||||
|
||||
# Add page numbers as metadata
|
||||
for i, doc in enumerate(page_docs):
|
||||
doc.metadata["page"] = i + 1
|
||||
|
||||
# Split pages into chunks
|
||||
doc_chunks = []
|
||||
|
||||
for doc in page_docs:
|
||||
text_splitter = RecursiveCharacterTextSplitter(
|
||||
chunk_size=800,
|
||||
separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
|
||||
chunk_overlap=20,
|
||||
)
|
||||
chunks = text_splitter.split_text(doc.page_content)
|
||||
for i, chunk in enumerate(chunks):
|
||||
doc = Document(
|
||||
page_content=chunk, metadata={"page": doc.metadata["page"], "chunk": i}
|
||||
)
|
||||
# Add sources a metadata
|
||||
doc.metadata["source"] = f"{doc.metadata['page']}-{doc.metadata['chunk']}"
|
||||
doc_chunks.append(doc)
|
||||
return doc_chunks
|
||||
|
||||
|
||||
class TelegramChatApiLoader(BaseLoader):
|
||||
"""Load `Telegram` chat json directory dump."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
chat_entity: Optional[EntityLike] = None,
|
||||
api_id: Optional[int] = None,
|
||||
api_hash: Optional[str] = None,
|
||||
username: Optional[str] = None,
|
||||
file_path: str = "telegram_data.json",
|
||||
):
|
||||
"""Initialize with API parameters.
|
||||
|
||||
Args:
|
||||
chat_entity: The chat entity to fetch data from.
|
||||
api_id: The API ID.
|
||||
api_hash: The API hash.
|
||||
username: The username.
|
||||
file_path: The file path to save the data to. Defaults to
|
||||
"telegram_data.json".
|
||||
"""
|
||||
self.chat_entity = chat_entity
|
||||
self.api_id = api_id
|
||||
self.api_hash = api_hash
|
||||
self.username = username
|
||||
self.file_path = file_path
|
||||
|
||||
async def fetch_data_from_telegram(self) -> None:
|
||||
"""Fetch data from Telegram API and save it as a JSON file."""
|
||||
from telethon.sync import TelegramClient
|
||||
|
||||
data = []
|
||||
async with TelegramClient(self.username, self.api_id, self.api_hash) as client:
|
||||
async for message in client.iter_messages(self.chat_entity):
|
||||
is_reply = message.reply_to is not None
|
||||
reply_to_id = message.reply_to.reply_to_msg_id if is_reply else None
|
||||
data.append(
|
||||
{
|
||||
"sender_id": message.sender_id,
|
||||
"text": message.text,
|
||||
"date": message.date.isoformat(),
|
||||
"message.id": message.id,
|
||||
"is_reply": is_reply,
|
||||
"reply_to_id": reply_to_id,
|
||||
}
|
||||
)
|
||||
|
||||
with open(self.file_path, "w", encoding="utf-8") as f:
|
||||
json.dump(data, f, ensure_ascii=False, indent=4)
|
||||
|
||||
def _get_message_threads(self, data: pd.DataFrame) -> dict:
|
||||
"""Create a dictionary of message threads from the given data.
|
||||
|
||||
Args:
|
||||
data (pd.DataFrame): A DataFrame containing the conversation \
|
||||
data with columns:
|
||||
- message.sender_id
|
||||
- text
|
||||
- date
|
||||
- message.id
|
||||
- is_reply
|
||||
- reply_to_id
|
||||
|
||||
Returns:
|
||||
dict: A dictionary where the key is the parent message ID and \
|
||||
the value is a list of message IDs in ascending order.
|
||||
"""
|
||||
|
||||
def find_replies(parent_id: int, reply_data: pd.DataFrame) -> List[int]:
|
||||
"""
|
||||
Recursively find all replies to a given parent message ID.
|
||||
|
||||
Args:
|
||||
parent_id (int): The parent message ID.
|
||||
reply_data (pd.DataFrame): A DataFrame containing reply messages.
|
||||
|
||||
Returns:
|
||||
list: A list of message IDs that are replies to the parent message ID.
|
||||
"""
|
||||
# Find direct replies to the parent message ID
|
||||
direct_replies = reply_data[reply_data["reply_to_id"] == parent_id][
|
||||
"message.id"
|
||||
].tolist()
|
||||
|
||||
# Recursively find replies to the direct replies
|
||||
all_replies = []
|
||||
for reply_id in direct_replies:
|
||||
all_replies += [reply_id] + find_replies(reply_id, reply_data)
|
||||
|
||||
return all_replies
|
||||
|
||||
# Filter out parent messages
|
||||
parent_messages = data[~data["is_reply"]]
|
||||
|
||||
# Filter out reply messages and drop rows with NaN in 'reply_to_id'
|
||||
reply_messages = data[data["is_reply"]].dropna(subset=["reply_to_id"])
|
||||
|
||||
# Convert 'reply_to_id' to integer
|
||||
reply_messages["reply_to_id"] = reply_messages["reply_to_id"].astype(int)
|
||||
|
||||
# Create a dictionary of message threads with parent message IDs as keys and \
|
||||
# lists of reply message IDs as values
|
||||
message_threads = {
|
||||
parent_id: [parent_id] + find_replies(parent_id, reply_messages)
|
||||
for parent_id in parent_messages["message.id"]
|
||||
}
|
||||
|
||||
return message_threads
|
||||
|
||||
def _combine_message_texts(
|
||||
self, message_threads: Dict[int, List[int]], data: pd.DataFrame
|
||||
) -> str:
|
||||
"""
|
||||
Combine the message texts for each parent message ID based \
|
||||
on the list of message threads.
|
||||
|
||||
Args:
|
||||
message_threads (dict): A dictionary where the key is the parent message \
|
||||
ID and the value is a list of message IDs in ascending order.
|
||||
data (pd.DataFrame): A DataFrame containing the conversation data:
|
||||
- message.sender_id
|
||||
- text
|
||||
- date
|
||||
- message.id
|
||||
- is_reply
|
||||
- reply_to_id
|
||||
|
||||
Returns:
|
||||
str: A combined string of message texts sorted by date.
|
||||
"""
|
||||
combined_text = ""
|
||||
|
||||
# Iterate through sorted parent message IDs
|
||||
for parent_id, message_ids in message_threads.items():
|
||||
# Get the message texts for the message IDs and sort them by date
|
||||
message_texts = (
|
||||
data[data["message.id"].isin(message_ids)]
|
||||
.sort_values(by="date")["text"]
|
||||
.tolist()
|
||||
)
|
||||
message_texts = [str(elem) for elem in message_texts]
|
||||
|
||||
# Combine the message texts
|
||||
combined_text += " ".join(message_texts) + ".\n"
|
||||
|
||||
return combined_text.strip()
|
||||
|
||||
def load(self) -> List[Document]:
|
||||
"""Load documents."""
|
||||
|
||||
if self.chat_entity is not None:
|
||||
try:
|
||||
import nest_asyncio
|
||||
|
||||
nest_asyncio.apply()
|
||||
asyncio.run(self.fetch_data_from_telegram())
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"""`nest_asyncio` package not found.
|
||||
please install with `pip install nest_asyncio`
|
||||
"""
|
||||
)
|
||||
|
||||
p = Path(self.file_path)
|
||||
|
||||
with open(p, encoding="utf8") as f:
|
||||
d = json.load(f)
|
||||
try:
|
||||
import pandas as pd
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"""`pandas` package not found.
|
||||
please install with `pip install pandas`
|
||||
"""
|
||||
)
|
||||
normalized_messages = pd.json_normalize(d)
|
||||
df = pd.DataFrame(normalized_messages)
|
||||
|
||||
message_threads = self._get_message_threads(df)
|
||||
combined_texts = self._combine_message_texts(message_threads, df)
|
||||
|
||||
return text_to_docs(combined_texts)
|
||||
@@ -0,0 +1,149 @@
|
||||
from typing import Any, Iterator, List, Sequence, cast
|
||||
|
||||
from langchain_core.documents import BaseDocumentTransformer, Document
|
||||
|
||||
|
||||
class BeautifulSoupTransformer(BaseDocumentTransformer):
|
||||
"""Transform HTML content by extracting specific tags and removing unwanted ones.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.document_transformers import BeautifulSoupTransformer
|
||||
|
||||
bs4_transformer = BeautifulSoupTransformer()
|
||||
docs_transformed = bs4_transformer.transform_documents(docs)
|
||||
""" # noqa: E501
|
||||
|
||||
def __init__(self) -> None:
|
||||
"""
|
||||
Initialize the transformer.
|
||||
|
||||
This checks if the BeautifulSoup4 package is installed.
|
||||
If not, it raises an ImportError.
|
||||
"""
|
||||
try:
|
||||
import bs4 # noqa:F401
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"BeautifulSoup4 is required for BeautifulSoupTransformer. "
|
||||
"Please install it with `pip install beautifulsoup4`."
|
||||
)
|
||||
|
||||
def transform_documents(
|
||||
self,
|
||||
documents: Sequence[Document],
|
||||
unwanted_tags: List[str] = ["script", "style"],
|
||||
tags_to_extract: List[str] = ["p", "li", "div", "a"],
|
||||
remove_lines: bool = True,
|
||||
**kwargs: Any,
|
||||
) -> Sequence[Document]:
|
||||
"""
|
||||
Transform a list of Document objects by cleaning their HTML content.
|
||||
|
||||
Args:
|
||||
documents: A sequence of Document objects containing HTML content.
|
||||
unwanted_tags: A list of tags to be removed from the HTML.
|
||||
tags_to_extract: A list of tags whose content will be extracted.
|
||||
remove_lines: If set to True, unnecessary lines will be
|
||||
removed from the HTML content.
|
||||
|
||||
Returns:
|
||||
A sequence of Document objects with transformed content.
|
||||
"""
|
||||
for doc in documents:
|
||||
cleaned_content = doc.page_content
|
||||
|
||||
cleaned_content = self.remove_unwanted_tags(cleaned_content, unwanted_tags)
|
||||
|
||||
cleaned_content = self.extract_tags(cleaned_content, tags_to_extract)
|
||||
|
||||
if remove_lines:
|
||||
cleaned_content = self.remove_unnecessary_lines(cleaned_content)
|
||||
|
||||
doc.page_content = cleaned_content
|
||||
|
||||
return documents
|
||||
|
||||
@staticmethod
|
||||
def remove_unwanted_tags(html_content: str, unwanted_tags: List[str]) -> str:
|
||||
"""
|
||||
Remove unwanted tags from a given HTML content.
|
||||
|
||||
Args:
|
||||
html_content: The original HTML content string.
|
||||
unwanted_tags: A list of tags to be removed from the HTML.
|
||||
|
||||
Returns:
|
||||
A cleaned HTML string with unwanted tags removed.
|
||||
"""
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
soup = BeautifulSoup(html_content, "html.parser")
|
||||
for tag in unwanted_tags:
|
||||
for element in soup.find_all(tag):
|
||||
element.decompose()
|
||||
return str(soup)
|
||||
|
||||
@staticmethod
|
||||
def extract_tags(html_content: str, tags: List[str]) -> str:
|
||||
"""
|
||||
Extract specific tags from a given HTML content.
|
||||
|
||||
Args:
|
||||
html_content: The original HTML content string.
|
||||
tags: A list of tags to be extracted from the HTML.
|
||||
|
||||
Returns:
|
||||
A string combining the content of the extracted tags.
|
||||
"""
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
soup = BeautifulSoup(html_content, "html.parser")
|
||||
text_parts: List[str] = []
|
||||
for element in soup.find_all():
|
||||
if element.name in tags:
|
||||
# Extract all navigable strings recursively from this element.
|
||||
text_parts += get_navigable_strings(element)
|
||||
|
||||
# To avoid duplicate text, remove all descendants from the soup.
|
||||
element.decompose()
|
||||
|
||||
return " ".join(text_parts)
|
||||
|
||||
@staticmethod
|
||||
def remove_unnecessary_lines(content: str) -> str:
|
||||
"""
|
||||
Clean up the content by removing unnecessary lines.
|
||||
|
||||
Args:
|
||||
content: A string, which may contain unnecessary lines or spaces.
|
||||
|
||||
Returns:
|
||||
A cleaned string with unnecessary lines removed.
|
||||
"""
|
||||
lines = content.split("\n")
|
||||
stripped_lines = [line.strip() for line in lines]
|
||||
non_empty_lines = [line for line in stripped_lines if line]
|
||||
cleaned_content = " ".join(non_empty_lines)
|
||||
return cleaned_content
|
||||
|
||||
async def atransform_documents(
|
||||
self,
|
||||
documents: Sequence[Document],
|
||||
**kwargs: Any,
|
||||
) -> Sequence[Document]:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
def get_navigable_strings(element: Any) -> Iterator[str]:
|
||||
from bs4 import NavigableString, Tag
|
||||
|
||||
for child in cast(Tag, element).children:
|
||||
if isinstance(child, Tag):
|
||||
yield from get_navigable_strings(child)
|
||||
elif isinstance(child, NavigableString):
|
||||
if (element.name == "a") and (href := element.get("href")):
|
||||
yield f"{child.strip()} ({href})"
|
||||
else:
|
||||
yield child.strip()
|
||||
@@ -0,0 +1,140 @@
|
||||
"""Document transformers that use OpenAI Functions models"""
|
||||
from typing import Any, Dict, Optional, Sequence, Type, Union
|
||||
|
||||
from langchain_core.documents import BaseDocumentTransformer, Document
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.prompts import ChatPromptTemplate
|
||||
from langchain_core.pydantic_v1 import BaseModel
|
||||
|
||||
|
||||
class OpenAIMetadataTagger(BaseDocumentTransformer, BaseModel):
|
||||
"""Extract metadata tags from document contents using OpenAI functions.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.chat_models import ChatOpenAI
|
||||
from langchain_community.document_transformers import OpenAIMetadataTagger
|
||||
from langchain_core.documents import Document
|
||||
|
||||
schema = {
|
||||
"properties": {
|
||||
"movie_title": { "type": "string" },
|
||||
"critic": { "type": "string" },
|
||||
"tone": {
|
||||
"type": "string",
|
||||
"enum": ["positive", "negative"]
|
||||
},
|
||||
"rating": {
|
||||
"type": "integer",
|
||||
"description": "The number of stars the critic rated the movie"
|
||||
}
|
||||
},
|
||||
"required": ["movie_title", "critic", "tone"]
|
||||
}
|
||||
|
||||
# Must be an OpenAI model that supports functions
|
||||
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
|
||||
tagging_chain = create_tagging_chain(schema, llm)
|
||||
document_transformer = OpenAIMetadataTagger(tagging_chain=tagging_chain)
|
||||
original_documents = [
|
||||
Document(page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."),
|
||||
Document(page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.", metadata={"reliable": False}),
|
||||
]
|
||||
|
||||
enhanced_documents = document_transformer.transform_documents(original_documents)
|
||||
""" # noqa: E501
|
||||
|
||||
tagging_chain: Any
|
||||
"""The chain used to extract metadata from each document."""
|
||||
|
||||
def transform_documents(
|
||||
self, documents: Sequence[Document], **kwargs: Any
|
||||
) -> Sequence[Document]:
|
||||
"""Automatically extract and populate metadata
|
||||
for each document according to the provided schema."""
|
||||
|
||||
new_documents = []
|
||||
|
||||
for document in documents:
|
||||
extracted_metadata: Dict = self.tagging_chain.run(document.page_content) # type: ignore[assignment] # noqa: E501
|
||||
new_document = Document(
|
||||
page_content=document.page_content,
|
||||
metadata={**extracted_metadata, **document.metadata},
|
||||
)
|
||||
new_documents.append(new_document)
|
||||
return new_documents
|
||||
|
||||
async def atransform_documents(
|
||||
self, documents: Sequence[Document], **kwargs: Any
|
||||
) -> Sequence[Document]:
|
||||
raise NotImplementedError
|
||||
|
||||
|
||||
def create_metadata_tagger(
|
||||
metadata_schema: Union[Dict[str, Any], Type[BaseModel]],
|
||||
llm: BaseLanguageModel,
|
||||
prompt: Optional[ChatPromptTemplate] = None,
|
||||
*,
|
||||
tagging_chain_kwargs: Optional[Dict] = None,
|
||||
) -> OpenAIMetadataTagger:
|
||||
"""Create a DocumentTransformer that uses an OpenAI function chain to automatically
|
||||
tag documents with metadata based on their content and an input schema.
|
||||
|
||||
Args:
|
||||
metadata_schema: Either a dictionary or pydantic.BaseModel class. If a dictionary
|
||||
is passed in, it's assumed to already be a valid JsonSchema.
|
||||
For best results, pydantic.BaseModels should have docstrings describing what
|
||||
the schema represents and descriptions for the parameters.
|
||||
llm: Language model to use, assumed to support the OpenAI function-calling API.
|
||||
Defaults to use "gpt-3.5-turbo-0613"
|
||||
prompt: BasePromptTemplate to pass to the model.
|
||||
|
||||
Returns:
|
||||
An LLMChain that will pass the given function to the model.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.chat_models import ChatOpenAI
|
||||
from langchain_community.document_transformers import create_metadata_tagger
|
||||
from langchain_core.documents import Document
|
||||
|
||||
schema = {
|
||||
"properties": {
|
||||
"movie_title": { "type": "string" },
|
||||
"critic": { "type": "string" },
|
||||
"tone": {
|
||||
"type": "string",
|
||||
"enum": ["positive", "negative"]
|
||||
},
|
||||
"rating": {
|
||||
"type": "integer",
|
||||
"description": "The number of stars the critic rated the movie"
|
||||
}
|
||||
},
|
||||
"required": ["movie_title", "critic", "tone"]
|
||||
}
|
||||
|
||||
# Must be an OpenAI model that supports functions
|
||||
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
|
||||
|
||||
document_transformer = create_metadata_tagger(schema, llm)
|
||||
original_documents = [
|
||||
Document(page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."),
|
||||
Document(page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.", metadata={"reliable": False}),
|
||||
]
|
||||
|
||||
enhanced_documents = document_transformer.transform_documents(original_documents)
|
||||
""" # noqa: E501
|
||||
from langchain.chains.openai_functions import create_tagging_chain
|
||||
metadata_schema = (
|
||||
metadata_schema
|
||||
if isinstance(metadata_schema, dict)
|
||||
else metadata_schema.schema()
|
||||
)
|
||||
_tagging_chain_kwargs = tagging_chain_kwargs or {}
|
||||
tagging_chain = create_tagging_chain(
|
||||
metadata_schema, llm, prompt=prompt, **_tagging_chain_kwargs
|
||||
)
|
||||
return OpenAIMetadataTagger(tagging_chain=tagging_chain)
|
||||
@@ -0,0 +1,161 @@
|
||||
"""**Embedding models** are wrappers around embedding models
|
||||
from different APIs and services.
|
||||
|
||||
**Embedding models** can be LLMs or not.
|
||||
|
||||
**Class hierarchy:**
|
||||
|
||||
.. code-block::
|
||||
|
||||
Embeddings --> <name>Embeddings # Examples: OpenAIEmbeddings, HuggingFaceEmbeddings
|
||||
"""
|
||||
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from langchain_community.embeddings.aleph_alpha import (
|
||||
AlephAlphaAsymmetricSemanticEmbedding,
|
||||
AlephAlphaSymmetricSemanticEmbedding,
|
||||
)
|
||||
from langchain_community.embeddings.awa import AwaEmbeddings
|
||||
from langchain_community.embeddings.azure_openai import AzureOpenAIEmbeddings
|
||||
from langchain_community.embeddings.baidu_qianfan_endpoint import (
|
||||
QianfanEmbeddingsEndpoint,
|
||||
)
|
||||
from langchain_community.embeddings.bedrock import BedrockEmbeddings
|
||||
from langchain_community.embeddings.bookend import BookendEmbeddings
|
||||
from langchain_community.embeddings.clarifai import ClarifaiEmbeddings
|
||||
from langchain_community.embeddings.cohere import CohereEmbeddings
|
||||
from langchain_community.embeddings.dashscope import DashScopeEmbeddings
|
||||
from langchain_community.embeddings.databricks import DatabricksEmbeddings
|
||||
from langchain_community.embeddings.deepinfra import DeepInfraEmbeddings
|
||||
from langchain_community.embeddings.edenai import EdenAiEmbeddings
|
||||
from langchain_community.embeddings.elasticsearch import ElasticsearchEmbeddings
|
||||
from langchain_community.embeddings.embaas import EmbaasEmbeddings
|
||||
from langchain_community.embeddings.ernie import ErnieEmbeddings
|
||||
from langchain_community.embeddings.fake import (
|
||||
DeterministicFakeEmbedding,
|
||||
FakeEmbeddings,
|
||||
)
|
||||
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
|
||||
from langchain_community.embeddings.google_palm import GooglePalmEmbeddings
|
||||
from langchain_community.embeddings.gpt4all import GPT4AllEmbeddings
|
||||
from langchain_community.embeddings.gradient_ai import GradientEmbeddings
|
||||
from langchain_community.embeddings.huggingface import (
|
||||
HuggingFaceBgeEmbeddings,
|
||||
HuggingFaceEmbeddings,
|
||||
HuggingFaceInferenceAPIEmbeddings,
|
||||
HuggingFaceInstructEmbeddings,
|
||||
)
|
||||
from langchain_community.embeddings.huggingface_hub import HuggingFaceHubEmbeddings
|
||||
from langchain_community.embeddings.infinity import InfinityEmbeddings
|
||||
from langchain_community.embeddings.javelin_ai_gateway import JavelinAIGatewayEmbeddings
|
||||
from langchain_community.embeddings.jina import JinaEmbeddings
|
||||
from langchain_community.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
|
||||
from langchain_community.embeddings.llamacpp import LlamaCppEmbeddings
|
||||
from langchain_community.embeddings.localai import LocalAIEmbeddings
|
||||
from langchain_community.embeddings.minimax import MiniMaxEmbeddings
|
||||
from langchain_community.embeddings.mlflow import MlflowEmbeddings
|
||||
from langchain_community.embeddings.mlflow_gateway import MlflowAIGatewayEmbeddings
|
||||
from langchain_community.embeddings.modelscope_hub import ModelScopeEmbeddings
|
||||
from langchain_community.embeddings.mosaicml import MosaicMLInstructorEmbeddings
|
||||
from langchain_community.embeddings.nlpcloud import NLPCloudEmbeddings
|
||||
from langchain_community.embeddings.octoai_embeddings import OctoAIEmbeddings
|
||||
from langchain_community.embeddings.ollama import OllamaEmbeddings
|
||||
from langchain_community.embeddings.openai import OpenAIEmbeddings
|
||||
from langchain_community.embeddings.sagemaker_endpoint import (
|
||||
SagemakerEndpointEmbeddings,
|
||||
)
|
||||
from langchain_community.embeddings.self_hosted import SelfHostedEmbeddings
|
||||
from langchain_community.embeddings.self_hosted_hugging_face import (
|
||||
SelfHostedHuggingFaceEmbeddings,
|
||||
SelfHostedHuggingFaceInstructEmbeddings,
|
||||
)
|
||||
from langchain_community.embeddings.sentence_transformer import (
|
||||
SentenceTransformerEmbeddings,
|
||||
)
|
||||
from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
|
||||
from langchain_community.embeddings.tensorflow_hub import TensorflowHubEmbeddings
|
||||
from langchain_community.embeddings.vertexai import VertexAIEmbeddings
|
||||
from langchain_community.embeddings.voyageai import VoyageEmbeddings
|
||||
from langchain_community.embeddings.xinference import XinferenceEmbeddings
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
__all__ = [
|
||||
"OpenAIEmbeddings",
|
||||
"AzureOpenAIEmbeddings",
|
||||
"ClarifaiEmbeddings",
|
||||
"CohereEmbeddings",
|
||||
"DatabricksEmbeddings",
|
||||
"ElasticsearchEmbeddings",
|
||||
"FastEmbedEmbeddings",
|
||||
"HuggingFaceEmbeddings",
|
||||
"HuggingFaceInferenceAPIEmbeddings",
|
||||
"InfinityEmbeddings",
|
||||
"GradientEmbeddings",
|
||||
"JinaEmbeddings",
|
||||
"LlamaCppEmbeddings",
|
||||
"HuggingFaceHubEmbeddings",
|
||||
"MlflowEmbeddings",
|
||||
"MlflowAIGatewayEmbeddings",
|
||||
"ModelScopeEmbeddings",
|
||||
"TensorflowHubEmbeddings",
|
||||
"SagemakerEndpointEmbeddings",
|
||||
"HuggingFaceInstructEmbeddings",
|
||||
"MosaicMLInstructorEmbeddings",
|
||||
"SelfHostedEmbeddings",
|
||||
"SelfHostedHuggingFaceEmbeddings",
|
||||
"SelfHostedHuggingFaceInstructEmbeddings",
|
||||
"FakeEmbeddings",
|
||||
"DeterministicFakeEmbedding",
|
||||
"AlephAlphaAsymmetricSemanticEmbedding",
|
||||
"AlephAlphaSymmetricSemanticEmbedding",
|
||||
"SentenceTransformerEmbeddings",
|
||||
"GooglePalmEmbeddings",
|
||||
"MiniMaxEmbeddings",
|
||||
"VertexAIEmbeddings",
|
||||
"BedrockEmbeddings",
|
||||
"DeepInfraEmbeddings",
|
||||
"EdenAiEmbeddings",
|
||||
"DashScopeEmbeddings",
|
||||
"EmbaasEmbeddings",
|
||||
"OctoAIEmbeddings",
|
||||
"SpacyEmbeddings",
|
||||
"NLPCloudEmbeddings",
|
||||
"GPT4AllEmbeddings",
|
||||
"XinferenceEmbeddings",
|
||||
"LocalAIEmbeddings",
|
||||
"AwaEmbeddings",
|
||||
"HuggingFaceBgeEmbeddings",
|
||||
"ErnieEmbeddings",
|
||||
"JavelinAIGatewayEmbeddings",
|
||||
"OllamaEmbeddings",
|
||||
"QianfanEmbeddingsEndpoint",
|
||||
"JohnSnowLabsEmbeddings",
|
||||
"VoyageEmbeddings",
|
||||
"BookendEmbeddings",
|
||||
]
|
||||
|
||||
|
||||
# TODO: this is in here to maintain backwards compatibility
|
||||
class HypotheticalDocumentEmbedder:
|
||||
def __init__(self, *args: Any, **kwargs: Any):
|
||||
logger.warning(
|
||||
"Using a deprecated class. Please use "
|
||||
"`from langchain.chains import HypotheticalDocumentEmbedder` instead"
|
||||
)
|
||||
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder as H
|
||||
|
||||
return H(*args, **kwargs) # type: ignore
|
||||
|
||||
@classmethod
|
||||
def from_llm(cls, *args: Any, **kwargs: Any) -> Any:
|
||||
logger.warning(
|
||||
"Using a deprecated class. Please use "
|
||||
"`from langchain.chains import HypotheticalDocumentEmbedder` instead"
|
||||
)
|
||||
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder as H
|
||||
|
||||
return H.from_llm(*args, **kwargs)
|
||||
@@ -0,0 +1,343 @@
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import requests
|
||||
from langchain_core.embeddings import Embeddings
|
||||
from langchain_core.pydantic_v1 import BaseModel, Extra, Field
|
||||
|
||||
DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
|
||||
DEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"
|
||||
DEFAULT_BGE_MODEL = "BAAI/bge-large-en"
|
||||
DEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "
|
||||
DEFAULT_QUERY_INSTRUCTION = (
|
||||
"Represent the question for retrieving supporting documents: "
|
||||
)
|
||||
DEFAULT_QUERY_BGE_INSTRUCTION_EN = (
|
||||
"Represent this question for searching relevant passages: "
|
||||
)
|
||||
DEFAULT_QUERY_BGE_INSTRUCTION_ZH = "为这个句子生成表示以用于检索相关文章:"
|
||||
|
||||
|
||||
class HuggingFaceEmbeddings(BaseModel, Embeddings):
|
||||
"""HuggingFace sentence_transformers embedding models.
|
||||
|
||||
To use, you should have the ``sentence_transformers`` python package installed.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings import HuggingFaceEmbeddings
|
||||
|
||||
model_name = "sentence-transformers/all-mpnet-base-v2"
|
||||
model_kwargs = {'device': 'cpu'}
|
||||
encode_kwargs = {'normalize_embeddings': False}
|
||||
hf = HuggingFaceEmbeddings(
|
||||
model_name=model_name,
|
||||
model_kwargs=model_kwargs,
|
||||
encode_kwargs=encode_kwargs
|
||||
)
|
||||
"""
|
||||
|
||||
client: Any #: :meta private:
|
||||
model_name: str = DEFAULT_MODEL_NAME
|
||||
"""Model name to use."""
|
||||
cache_folder: Optional[str] = None
|
||||
"""Path to store models.
|
||||
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
|
||||
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Keyword arguments to pass to the model."""
|
||||
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Keyword arguments to pass when calling the `encode` method of the model."""
|
||||
multi_process: bool = False
|
||||
"""Run encode() on multiple GPUs."""
|
||||
|
||||
def __init__(self, **kwargs: Any):
|
||||
"""Initialize the sentence_transformer."""
|
||||
super().__init__(**kwargs)
|
||||
try:
|
||||
import sentence_transformers
|
||||
|
||||
except ImportError as exc:
|
||||
raise ImportError(
|
||||
"Could not import sentence_transformers python package. "
|
||||
"Please install it with `pip install sentence-transformers`."
|
||||
) from exc
|
||||
|
||||
self.client = sentence_transformers.SentenceTransformer(
|
||||
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
|
||||
)
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
extra = Extra.forbid
|
||||
|
||||
def embed_documents(self, texts: List[str]) -> List[List[float]]:
|
||||
"""Compute doc embeddings using a HuggingFace transformer model.
|
||||
|
||||
Args:
|
||||
texts: The list of texts to embed.
|
||||
|
||||
Returns:
|
||||
List of embeddings, one for each text.
|
||||
"""
|
||||
import sentence_transformers
|
||||
|
||||
texts = list(map(lambda x: x.replace("\n", " "), texts))
|
||||
if self.multi_process:
|
||||
pool = self.client.start_multi_process_pool()
|
||||
embeddings = self.client.encode_multi_process(texts, pool)
|
||||
sentence_transformers.SentenceTransformer.stop_multi_process_pool(pool)
|
||||
else:
|
||||
embeddings = self.client.encode(texts, **self.encode_kwargs)
|
||||
|
||||
return embeddings.tolist()
|
||||
|
||||
def embed_query(self, text: str) -> List[float]:
|
||||
"""Compute query embeddings using a HuggingFace transformer model.
|
||||
|
||||
Args:
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embeddings for the text.
|
||||
"""
|
||||
return self.embed_documents([text])[0]
|
||||
|
||||
|
||||
class HuggingFaceInstructEmbeddings(BaseModel, Embeddings):
|
||||
"""Wrapper around sentence_transformers embedding models.
|
||||
|
||||
To use, you should have the ``sentence_transformers``
|
||||
and ``InstructorEmbedding`` python packages installed.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
|
||||
|
||||
model_name = "hkunlp/instructor-large"
|
||||
model_kwargs = {'device': 'cpu'}
|
||||
encode_kwargs = {'normalize_embeddings': True}
|
||||
hf = HuggingFaceInstructEmbeddings(
|
||||
model_name=model_name,
|
||||
model_kwargs=model_kwargs,
|
||||
encode_kwargs=encode_kwargs
|
||||
)
|
||||
"""
|
||||
|
||||
client: Any #: :meta private:
|
||||
model_name: str = DEFAULT_INSTRUCT_MODEL
|
||||
"""Model name to use."""
|
||||
cache_folder: Optional[str] = None
|
||||
"""Path to store models.
|
||||
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
|
||||
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Keyword arguments to pass to the model."""
|
||||
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Keyword arguments to pass when calling the `encode` method of the model."""
|
||||
embed_instruction: str = DEFAULT_EMBED_INSTRUCTION
|
||||
"""Instruction to use for embedding documents."""
|
||||
query_instruction: str = DEFAULT_QUERY_INSTRUCTION
|
||||
"""Instruction to use for embedding query."""
|
||||
|
||||
def __init__(self, **kwargs: Any):
|
||||
"""Initialize the sentence_transformer."""
|
||||
super().__init__(**kwargs)
|
||||
try:
|
||||
from InstructorEmbedding import INSTRUCTOR
|
||||
|
||||
self.client = INSTRUCTOR(
|
||||
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
|
||||
)
|
||||
except ImportError as e:
|
||||
raise ImportError("Dependencies for InstructorEmbedding not found.") from e
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
extra = Extra.forbid
|
||||
|
||||
def embed_documents(self, texts: List[str]) -> List[List[float]]:
|
||||
"""Compute doc embeddings using a HuggingFace instruct model.
|
||||
|
||||
Args:
|
||||
texts: The list of texts to embed.
|
||||
|
||||
Returns:
|
||||
List of embeddings, one for each text.
|
||||
"""
|
||||
instruction_pairs = [[self.embed_instruction, text] for text in texts]
|
||||
embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
|
||||
return embeddings.tolist()
|
||||
|
||||
def embed_query(self, text: str) -> List[float]:
|
||||
"""Compute query embeddings using a HuggingFace instruct model.
|
||||
|
||||
Args:
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embeddings for the text.
|
||||
"""
|
||||
instruction_pair = [self.query_instruction, text]
|
||||
embedding = self.client.encode([instruction_pair], **self.encode_kwargs)[0]
|
||||
return embedding.tolist()
|
||||
|
||||
|
||||
class HuggingFaceBgeEmbeddings(BaseModel, Embeddings):
|
||||
"""HuggingFace BGE sentence_transformers embedding models.
|
||||
|
||||
To use, you should have the ``sentence_transformers`` python package installed.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
|
||||
|
||||
model_name = "BAAI/bge-large-en"
|
||||
model_kwargs = {'device': 'cpu'}
|
||||
encode_kwargs = {'normalize_embeddings': True}
|
||||
hf = HuggingFaceBgeEmbeddings(
|
||||
model_name=model_name,
|
||||
model_kwargs=model_kwargs,
|
||||
encode_kwargs=encode_kwargs
|
||||
)
|
||||
"""
|
||||
|
||||
client: Any #: :meta private:
|
||||
model_name: str = DEFAULT_BGE_MODEL
|
||||
"""Model name to use."""
|
||||
cache_folder: Optional[str] = None
|
||||
"""Path to store models.
|
||||
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
|
||||
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Keyword arguments to pass to the model."""
|
||||
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
"""Keyword arguments to pass when calling the `encode` method of the model."""
|
||||
query_instruction: str = DEFAULT_QUERY_BGE_INSTRUCTION_EN
|
||||
"""Instruction to use for embedding query."""
|
||||
|
||||
def __init__(self, **kwargs: Any):
|
||||
"""Initialize the sentence_transformer."""
|
||||
super().__init__(**kwargs)
|
||||
try:
|
||||
import sentence_transformers
|
||||
|
||||
except ImportError as exc:
|
||||
raise ImportError(
|
||||
"Could not import sentence_transformers python package. "
|
||||
"Please install it with `pip install sentence_transformers`."
|
||||
) from exc
|
||||
|
||||
self.client = sentence_transformers.SentenceTransformer(
|
||||
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
|
||||
)
|
||||
if "-zh" in self.model_name:
|
||||
self.query_instruction = DEFAULT_QUERY_BGE_INSTRUCTION_ZH
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
extra = Extra.forbid
|
||||
|
||||
def embed_documents(self, texts: List[str]) -> List[List[float]]:
|
||||
"""Compute doc embeddings using a HuggingFace transformer model.
|
||||
|
||||
Args:
|
||||
texts: The list of texts to embed.
|
||||
|
||||
Returns:
|
||||
List of embeddings, one for each text.
|
||||
"""
|
||||
texts = [t.replace("\n", " ") for t in texts]
|
||||
embeddings = self.client.encode(texts, **self.encode_kwargs)
|
||||
return embeddings.tolist()
|
||||
|
||||
def embed_query(self, text: str) -> List[float]:
|
||||
"""Compute query embeddings using a HuggingFace transformer model.
|
||||
|
||||
Args:
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embeddings for the text.
|
||||
"""
|
||||
text = text.replace("\n", " ")
|
||||
embedding = self.client.encode(
|
||||
self.query_instruction + text, **self.encode_kwargs
|
||||
)
|
||||
return embedding.tolist()
|
||||
|
||||
|
||||
class HuggingFaceInferenceAPIEmbeddings(BaseModel, Embeddings):
|
||||
"""Embed texts using the HuggingFace API.
|
||||
|
||||
Requires a HuggingFace Inference API key and a model name.
|
||||
"""
|
||||
|
||||
api_key: str
|
||||
"""Your API key for the HuggingFace Inference API."""
|
||||
model_name: str = "sentence-transformers/all-MiniLM-L6-v2"
|
||||
"""The name of the model to use for text embeddings."""
|
||||
api_url: Optional[str] = None
|
||||
"""Custom inference endpoint url. None for using default public url."""
|
||||
|
||||
@property
|
||||
def _api_url(self) -> str:
|
||||
return self.api_url or self._default_api_url
|
||||
|
||||
@property
|
||||
def _default_api_url(self) -> str:
|
||||
return (
|
||||
"https://api-inference.huggingface.co"
|
||||
"/pipeline"
|
||||
"/feature-extraction"
|
||||
f"/{self.model_name}"
|
||||
)
|
||||
|
||||
@property
|
||||
def _headers(self) -> dict:
|
||||
return {"Authorization": f"Bearer {self.api_key}"}
|
||||
|
||||
def embed_documents(self, texts: List[str]) -> List[List[float]]:
|
||||
"""Get the embeddings for a list of texts.
|
||||
|
||||
Args:
|
||||
texts (Documents): A list of texts to get embeddings for.
|
||||
|
||||
Returns:
|
||||
Embedded texts as List[List[float]], where each inner List[float]
|
||||
corresponds to a single input text.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
|
||||
|
||||
hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
|
||||
api_key="your_api_key",
|
||||
model_name="sentence-transformers/all-MiniLM-l6-v2"
|
||||
)
|
||||
texts = ["Hello, world!", "How are you?"]
|
||||
hf_embeddings.embed_documents(texts)
|
||||
""" # noqa: E501
|
||||
response = requests.post(
|
||||
self._api_url,
|
||||
headers=self._headers,
|
||||
json={
|
||||
"inputs": texts,
|
||||
"options": {"wait_for_model": True, "use_cache": True},
|
||||
},
|
||||
)
|
||||
return response.json()
|
||||
|
||||
def embed_query(self, text: str) -> List[float]:
|
||||
"""Compute query embeddings using a HuggingFace transformer model.
|
||||
|
||||
Args:
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embeddings for the text.
|
||||
"""
|
||||
return self.embed_documents([text])[0]
|
||||
@@ -0,0 +1,92 @@
|
||||
import os
|
||||
import sys
|
||||
from typing import Any, List
|
||||
|
||||
from langchain_core.embeddings import Embeddings
|
||||
from langchain_core.pydantic_v1 import BaseModel, Extra
|
||||
|
||||
|
||||
class JohnSnowLabsEmbeddings(BaseModel, Embeddings):
|
||||
"""JohnSnowLabs embedding models
|
||||
|
||||
To use, you should have the ``johnsnowlabs`` python package installed.
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
|
||||
|
||||
embedding = JohnSnowLabsEmbeddings(model='embed_sentence.bert')
|
||||
output = embedding.embed_query("foo bar")
|
||||
""" # noqa: E501
|
||||
|
||||
model: Any = "embed_sentence.bert"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model: Any = "embed_sentence.bert",
|
||||
hardware_target: str = "cpu",
|
||||
**kwargs: Any,
|
||||
):
|
||||
"""Initialize the johnsnowlabs model."""
|
||||
super().__init__(**kwargs)
|
||||
# 1) Check imports
|
||||
try:
|
||||
from johnsnowlabs import nlp
|
||||
from nlu.pipe.pipeline import NLUPipeline
|
||||
except ImportError as exc:
|
||||
raise ImportError(
|
||||
"Could not import johnsnowlabs python package. "
|
||||
"Please install it with `pip install johnsnowlabs`."
|
||||
) from exc
|
||||
|
||||
# 2) Start a Spark Session
|
||||
try:
|
||||
os.environ["PYSPARK_PYTHON"] = sys.executable
|
||||
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
|
||||
nlp.start(hardware_target=hardware_target)
|
||||
except Exception as exc:
|
||||
raise Exception("Failure starting Spark Session") from exc
|
||||
|
||||
# 3) Load the model
|
||||
try:
|
||||
if isinstance(model, str):
|
||||
self.model = nlp.load(model)
|
||||
elif isinstance(model, NLUPipeline):
|
||||
self.model = model
|
||||
else:
|
||||
self.model = nlp.to_nlu_pipe(model)
|
||||
except Exception as exc:
|
||||
raise Exception("Failure loading model") from exc
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
extra = Extra.forbid
|
||||
|
||||
def embed_documents(self, texts: List[str]) -> List[List[float]]:
|
||||
"""Compute doc embeddings using a JohnSnowLabs transformer model.
|
||||
|
||||
Args:
|
||||
texts: The list of texts to embed.
|
||||
|
||||
Returns:
|
||||
List of embeddings, one for each text.
|
||||
"""
|
||||
|
||||
df = self.model.predict(texts, output_level="document")
|
||||
emb_col = None
|
||||
for c in df.columns:
|
||||
if "embedding" in c:
|
||||
emb_col = c
|
||||
return [vec.tolist() for vec in df[emb_col].tolist()]
|
||||
|
||||
def embed_query(self, text: str) -> List[float]:
|
||||
"""Compute query embeddings using a JohnSnowLabs transformer model.
|
||||
|
||||
Args:
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embeddings for the text.
|
||||
"""
|
||||
return self.embed_documents([text])[0]
|
||||
@@ -0,0 +1,168 @@
|
||||
import importlib
|
||||
import logging
|
||||
from typing import Any, Callable, List, Optional
|
||||
|
||||
from langchain_community.embeddings.self_hosted import SelfHostedEmbeddings
|
||||
|
||||
DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
|
||||
DEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"
|
||||
DEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "
|
||||
DEFAULT_QUERY_INSTRUCTION = (
|
||||
"Represent the question for retrieving supporting documents: "
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _embed_documents(client: Any, *args: Any, **kwargs: Any) -> List[List[float]]:
|
||||
"""Inference function to send to the remote hardware.
|
||||
|
||||
Accepts a sentence_transformer model_id and
|
||||
returns a list of embeddings for each document in the batch.
|
||||
"""
|
||||
return client.encode(*args, **kwargs)
|
||||
|
||||
|
||||
def load_embedding_model(model_id: str, instruct: bool = False, device: int = 0) -> Any:
|
||||
"""Load the embedding model."""
|
||||
if not instruct:
|
||||
import sentence_transformers
|
||||
|
||||
client = sentence_transformers.SentenceTransformer(model_id)
|
||||
else:
|
||||
from InstructorEmbedding import INSTRUCTOR
|
||||
|
||||
client = INSTRUCTOR(model_id)
|
||||
|
||||
if importlib.util.find_spec("torch") is not None:
|
||||
import torch
|
||||
|
||||
cuda_device_count = torch.cuda.device_count()
|
||||
if device < -1 or (device >= cuda_device_count):
|
||||
raise ValueError(
|
||||
f"Got device=={device}, "
|
||||
f"device is required to be within [-1, {cuda_device_count})"
|
||||
)
|
||||
if device < 0 and cuda_device_count > 0:
|
||||
logger.warning(
|
||||
"Device has %d GPUs available. "
|
||||
"Provide device={deviceId} to `from_model_id` to use available"
|
||||
"GPUs for execution. deviceId is -1 for CPU and "
|
||||
"can be a positive integer associated with CUDA device id.",
|
||||
cuda_device_count,
|
||||
)
|
||||
|
||||
client = client.to(device)
|
||||
return client
|
||||
|
||||
|
||||
class SelfHostedHuggingFaceEmbeddings(SelfHostedEmbeddings):
|
||||
"""HuggingFace embedding models on self-hosted remote hardware.
|
||||
|
||||
Supported hardware includes auto-launched instances on AWS, GCP, Azure,
|
||||
and Lambda, as well as servers specified
|
||||
by IP address and SSH credentials (such as on-prem, or another cloud
|
||||
like Paperspace, Coreweave, etc.).
|
||||
|
||||
To use, you should have the ``runhouse`` python package installed.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings import SelfHostedHuggingFaceEmbeddings
|
||||
import runhouse as rh
|
||||
model_name = "sentence-transformers/all-mpnet-base-v2"
|
||||
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
|
||||
hf = SelfHostedHuggingFaceEmbeddings(model_name=model_name, hardware=gpu)
|
||||
"""
|
||||
|
||||
client: Any #: :meta private:
|
||||
model_id: str = DEFAULT_MODEL_NAME
|
||||
"""Model name to use."""
|
||||
model_reqs: List[str] = ["./", "sentence_transformers", "torch"]
|
||||
"""Requirements to install on hardware to inference the model."""
|
||||
hardware: Any
|
||||
"""Remote hardware to send the inference function to."""
|
||||
model_load_fn: Callable = load_embedding_model
|
||||
"""Function to load the model remotely on the server."""
|
||||
load_fn_kwargs: Optional[dict] = None
|
||||
"""Keyword arguments to pass to the model load function."""
|
||||
inference_fn: Callable = _embed_documents
|
||||
"""Inference function to extract the embeddings."""
|
||||
|
||||
def __init__(self, **kwargs: Any):
|
||||
"""Initialize the remote inference function."""
|
||||
load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
|
||||
load_fn_kwargs["model_id"] = load_fn_kwargs.get("model_id", DEFAULT_MODEL_NAME)
|
||||
load_fn_kwargs["instruct"] = load_fn_kwargs.get("instruct", False)
|
||||
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
|
||||
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
|
||||
|
||||
|
||||
class SelfHostedHuggingFaceInstructEmbeddings(SelfHostedHuggingFaceEmbeddings):
|
||||
"""HuggingFace InstructEmbedding models on self-hosted remote hardware.
|
||||
|
||||
Supported hardware includes auto-launched instances on AWS, GCP, Azure,
|
||||
and Lambda, as well as servers specified
|
||||
by IP address and SSH credentials (such as on-prem, or another
|
||||
cloud like Paperspace, Coreweave, etc.).
|
||||
|
||||
To use, you should have the ``runhouse`` python package installed.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.embeddings import SelfHostedHuggingFaceInstructEmbeddings
|
||||
import runhouse as rh
|
||||
model_name = "hkunlp/instructor-large"
|
||||
gpu = rh.cluster(name='rh-a10x', instance_type='A100:1')
|
||||
hf = SelfHostedHuggingFaceInstructEmbeddings(
|
||||
model_name=model_name, hardware=gpu)
|
||||
""" # noqa: E501
|
||||
|
||||
model_id: str = DEFAULT_INSTRUCT_MODEL
|
||||
"""Model name to use."""
|
||||
embed_instruction: str = DEFAULT_EMBED_INSTRUCTION
|
||||
"""Instruction to use for embedding documents."""
|
||||
query_instruction: str = DEFAULT_QUERY_INSTRUCTION
|
||||
"""Instruction to use for embedding query."""
|
||||
model_reqs: List[str] = ["./", "InstructorEmbedding", "torch"]
|
||||
"""Requirements to install on hardware to inference the model."""
|
||||
|
||||
def __init__(self, **kwargs: Any):
|
||||
"""Initialize the remote inference function."""
|
||||
load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
|
||||
load_fn_kwargs["model_id"] = load_fn_kwargs.get(
|
||||
"model_id", DEFAULT_INSTRUCT_MODEL
|
||||
)
|
||||
load_fn_kwargs["instruct"] = load_fn_kwargs.get("instruct", True)
|
||||
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
|
||||
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
|
||||
|
||||
def embed_documents(self, texts: List[str]) -> List[List[float]]:
|
||||
"""Compute doc embeddings using a HuggingFace instruct model.
|
||||
|
||||
Args:
|
||||
texts: The list of texts to embed.
|
||||
|
||||
Returns:
|
||||
List of embeddings, one for each text.
|
||||
"""
|
||||
instruction_pairs = []
|
||||
for text in texts:
|
||||
instruction_pairs.append([self.embed_instruction, text])
|
||||
embeddings = self.client(self.pipeline_ref, instruction_pairs)
|
||||
return embeddings.tolist()
|
||||
|
||||
def embed_query(self, text: str) -> List[float]:
|
||||
"""Compute query embeddings using a HuggingFace instruct model.
|
||||
|
||||
Args:
|
||||
text: The text to embed.
|
||||
|
||||
Returns:
|
||||
Embeddings for the text.
|
||||
"""
|
||||
instruction_pair = [self.query_instruction, text]
|
||||
embedding = self.client(self.pipeline_ref, [instruction_pair])[0]
|
||||
return embedding.tolist()
|
||||
@@ -0,0 +1,351 @@
|
||||
import re
|
||||
import warnings
|
||||
from typing import (
|
||||
Any,
|
||||
AsyncIterator,
|
||||
Callable,
|
||||
Dict,
|
||||
Iterator,
|
||||
List,
|
||||
Mapping,
|
||||
Optional,
|
||||
)
|
||||
|
||||
from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForLLMRun,
|
||||
CallbackManagerForLLMRun,
|
||||
)
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.language_models.llms import LLM
|
||||
from langchain_core.outputs import GenerationChunk
|
||||
from langchain_core.prompt_values import PromptValue
|
||||
from langchain_core.pydantic_v1 import Field, SecretStr, root_validator
|
||||
from langchain_core.utils import (
|
||||
check_package_version,
|
||||
get_from_dict_or_env,
|
||||
get_pydantic_field_names,
|
||||
)
|
||||
from langchain_core.utils.utils import build_extra_kwargs, convert_to_secret_str
|
||||
|
||||
|
||||
class _AnthropicCommon(BaseLanguageModel):
|
||||
client: Any = None #: :meta private:
|
||||
async_client: Any = None #: :meta private:
|
||||
model: str = Field(default="claude-2", alias="model_name")
|
||||
"""Model name to use."""
|
||||
|
||||
max_tokens_to_sample: int = Field(default=256, alias="max_tokens")
|
||||
"""Denotes the number of tokens to predict per generation."""
|
||||
|
||||
temperature: Optional[float] = None
|
||||
"""A non-negative float that tunes the degree of randomness in generation."""
|
||||
|
||||
top_k: Optional[int] = None
|
||||
"""Number of most likely tokens to consider at each step."""
|
||||
|
||||
top_p: Optional[float] = None
|
||||
"""Total probability mass of tokens to consider at each step."""
|
||||
|
||||
streaming: bool = False
|
||||
"""Whether to stream the results."""
|
||||
|
||||
default_request_timeout: Optional[float] = None
|
||||
"""Timeout for requests to Anthropic Completion API. Default is 600 seconds."""
|
||||
|
||||
anthropic_api_url: Optional[str] = None
|
||||
|
||||
anthropic_api_key: Optional[SecretStr] = None
|
||||
|
||||
HUMAN_PROMPT: Optional[str] = None
|
||||
AI_PROMPT: Optional[str] = None
|
||||
count_tokens: Optional[Callable[[str], int]] = None
|
||||
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
|
||||
|
||||
@root_validator(pre=True)
|
||||
def build_extra(cls, values: Dict) -> Dict:
|
||||
extra = values.get("model_kwargs", {})
|
||||
all_required_field_names = get_pydantic_field_names(cls)
|
||||
values["model_kwargs"] = build_extra_kwargs(
|
||||
extra, values, all_required_field_names
|
||||
)
|
||||
return values
|
||||
|
||||
@root_validator()
|
||||
def validate_environment(cls, values: Dict) -> Dict:
|
||||
"""Validate that api key and python package exists in environment."""
|
||||
values["anthropic_api_key"] = convert_to_secret_str(
|
||||
get_from_dict_or_env(values, "anthropic_api_key", "ANTHROPIC_API_KEY")
|
||||
)
|
||||
# Get custom api url from environment.
|
||||
values["anthropic_api_url"] = get_from_dict_or_env(
|
||||
values,
|
||||
"anthropic_api_url",
|
||||
"ANTHROPIC_API_URL",
|
||||
default="https://api.anthropic.com",
|
||||
)
|
||||
|
||||
try:
|
||||
import anthropic
|
||||
|
||||
check_package_version("anthropic", gte_version="0.3")
|
||||
values["client"] = anthropic.Anthropic(
|
||||
base_url=values["anthropic_api_url"],
|
||||
api_key=values["anthropic_api_key"].get_secret_value(),
|
||||
timeout=values["default_request_timeout"],
|
||||
)
|
||||
values["async_client"] = anthropic.AsyncAnthropic(
|
||||
base_url=values["anthropic_api_url"],
|
||||
api_key=values["anthropic_api_key"].get_secret_value(),
|
||||
timeout=values["default_request_timeout"],
|
||||
)
|
||||
values["HUMAN_PROMPT"] = anthropic.HUMAN_PROMPT
|
||||
values["AI_PROMPT"] = anthropic.AI_PROMPT
|
||||
values["count_tokens"] = values["client"].count_tokens
|
||||
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"Could not import anthropic python package. "
|
||||
"Please it install it with `pip install anthropic`."
|
||||
)
|
||||
return values
|
||||
|
||||
@property
|
||||
def _default_params(self) -> Mapping[str, Any]:
|
||||
"""Get the default parameters for calling Anthropic API."""
|
||||
d = {
|
||||
"max_tokens_to_sample": self.max_tokens_to_sample,
|
||||
"model": self.model,
|
||||
}
|
||||
if self.temperature is not None:
|
||||
d["temperature"] = self.temperature
|
||||
if self.top_k is not None:
|
||||
d["top_k"] = self.top_k
|
||||
if self.top_p is not None:
|
||||
d["top_p"] = self.top_p
|
||||
return {**d, **self.model_kwargs}
|
||||
|
||||
@property
|
||||
def _identifying_params(self) -> Mapping[str, Any]:
|
||||
"""Get the identifying parameters."""
|
||||
return {**{}, **self._default_params}
|
||||
|
||||
def _get_anthropic_stop(self, stop: Optional[List[str]] = None) -> List[str]:
|
||||
if not self.HUMAN_PROMPT or not self.AI_PROMPT:
|
||||
raise NameError("Please ensure the anthropic package is loaded")
|
||||
|
||||
if stop is None:
|
||||
stop = []
|
||||
|
||||
# Never want model to invent new turns of Human / Assistant dialog.
|
||||
stop.extend([self.HUMAN_PROMPT])
|
||||
|
||||
return stop
|
||||
|
||||
|
||||
class Anthropic(LLM, _AnthropicCommon):
|
||||
"""Anthropic large language models.
|
||||
|
||||
To use, you should have the ``anthropic`` python package installed, and the
|
||||
environment variable ``ANTHROPIC_API_KEY`` set with your API key, or pass
|
||||
it as a named parameter to the constructor.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
import anthropic
|
||||
from langchain_community.llms import Anthropic
|
||||
|
||||
model = Anthropic(model="<model_name>", anthropic_api_key="my-api-key")
|
||||
|
||||
# Simplest invocation, automatically wrapped with HUMAN_PROMPT
|
||||
# and AI_PROMPT.
|
||||
response = model("What are the biggest risks facing humanity?")
|
||||
|
||||
# Or if you want to use the chat mode, build a few-shot-prompt, or
|
||||
# put words in the Assistant's mouth, use HUMAN_PROMPT and AI_PROMPT:
|
||||
raw_prompt = "What are the biggest risks facing humanity?"
|
||||
prompt = f"{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}"
|
||||
response = model(prompt)
|
||||
"""
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
allow_population_by_field_name = True
|
||||
arbitrary_types_allowed = True
|
||||
|
||||
@root_validator()
|
||||
def raise_warning(cls, values: Dict) -> Dict:
|
||||
"""Raise warning that this class is deprecated."""
|
||||
warnings.warn(
|
||||
"This Anthropic LLM is deprecated. "
|
||||
"Please use `from langchain_community.chat_models import ChatAnthropic` "
|
||||
"instead"
|
||||
)
|
||||
return values
|
||||
|
||||
@property
|
||||
def _llm_type(self) -> str:
|
||||
"""Return type of llm."""
|
||||
return "anthropic-llm"
|
||||
|
||||
def _wrap_prompt(self, prompt: str) -> str:
|
||||
if not self.HUMAN_PROMPT or not self.AI_PROMPT:
|
||||
raise NameError("Please ensure the anthropic package is loaded")
|
||||
|
||||
if prompt.startswith(self.HUMAN_PROMPT):
|
||||
return prompt # Already wrapped.
|
||||
|
||||
# Guard against common errors in specifying wrong number of newlines.
|
||||
corrected_prompt, n_subs = re.subn(r"^\n*Human:", self.HUMAN_PROMPT, prompt)
|
||||
if n_subs == 1:
|
||||
return corrected_prompt
|
||||
|
||||
# As a last resort, wrap the prompt ourselves to emulate instruct-style.
|
||||
return f"{self.HUMAN_PROMPT} {prompt}{self.AI_PROMPT} Sure, here you go:\n"
|
||||
|
||||
def _call(
|
||||
self,
|
||||
prompt: str,
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> str:
|
||||
r"""Call out to Anthropic's completion endpoint.
|
||||
|
||||
Args:
|
||||
prompt: The prompt to pass into the model.
|
||||
stop: Optional list of stop words to use when generating.
|
||||
|
||||
Returns:
|
||||
The string generated by the model.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
prompt = "What are the biggest risks facing humanity?"
|
||||
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
|
||||
response = model(prompt)
|
||||
|
||||
"""
|
||||
if self.streaming:
|
||||
completion = ""
|
||||
for chunk in self._stream(
|
||||
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
|
||||
):
|
||||
completion += chunk.text
|
||||
return completion
|
||||
|
||||
stop = self._get_anthropic_stop(stop)
|
||||
params = {**self._default_params, **kwargs}
|
||||
response = self.client.completions.create(
|
||||
prompt=self._wrap_prompt(prompt),
|
||||
stop_sequences=stop,
|
||||
**params,
|
||||
)
|
||||
return response.completion
|
||||
|
||||
def convert_prompt(self, prompt: PromptValue) -> str:
|
||||
return self._wrap_prompt(prompt.to_string())
|
||||
|
||||
async def _acall(
|
||||
self,
|
||||
prompt: str,
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> str:
|
||||
"""Call out to Anthropic's completion endpoint asynchronously."""
|
||||
if self.streaming:
|
||||
completion = ""
|
||||
async for chunk in self._astream(
|
||||
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
|
||||
):
|
||||
completion += chunk.text
|
||||
return completion
|
||||
|
||||
stop = self._get_anthropic_stop(stop)
|
||||
params = {**self._default_params, **kwargs}
|
||||
|
||||
response = await self.async_client.completions.create(
|
||||
prompt=self._wrap_prompt(prompt),
|
||||
stop_sequences=stop,
|
||||
**params,
|
||||
)
|
||||
return response.completion
|
||||
|
||||
def _stream(
|
||||
self,
|
||||
prompt: str,
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> Iterator[GenerationChunk]:
|
||||
r"""Call Anthropic completion_stream and return the resulting generator.
|
||||
|
||||
Args:
|
||||
prompt: The prompt to pass into the model.
|
||||
stop: Optional list of stop words to use when generating.
|
||||
Returns:
|
||||
A generator representing the stream of tokens from Anthropic.
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
prompt = "Write a poem about a stream."
|
||||
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
|
||||
generator = anthropic.stream(prompt)
|
||||
for token in generator:
|
||||
yield token
|
||||
"""
|
||||
stop = self._get_anthropic_stop(stop)
|
||||
params = {**self._default_params, **kwargs}
|
||||
|
||||
for token in self.client.completions.create(
|
||||
prompt=self._wrap_prompt(prompt), stop_sequences=stop, stream=True, **params
|
||||
):
|
||||
chunk = GenerationChunk(text=token.completion)
|
||||
yield chunk
|
||||
if run_manager:
|
||||
run_manager.on_llm_new_token(chunk.text, chunk=chunk)
|
||||
|
||||
async def _astream(
|
||||
self,
|
||||
prompt: str,
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> AsyncIterator[GenerationChunk]:
|
||||
r"""Call Anthropic completion_stream and return the resulting generator.
|
||||
|
||||
Args:
|
||||
prompt: The prompt to pass into the model.
|
||||
stop: Optional list of stop words to use when generating.
|
||||
Returns:
|
||||
A generator representing the stream of tokens from Anthropic.
|
||||
Example:
|
||||
.. code-block:: python
|
||||
prompt = "Write a poem about a stream."
|
||||
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
|
||||
generator = anthropic.stream(prompt)
|
||||
for token in generator:
|
||||
yield token
|
||||
"""
|
||||
stop = self._get_anthropic_stop(stop)
|
||||
params = {**self._default_params, **kwargs}
|
||||
|
||||
async for token in await self.async_client.completions.create(
|
||||
prompt=self._wrap_prompt(prompt),
|
||||
stop_sequences=stop,
|
||||
stream=True,
|
||||
**params,
|
||||
):
|
||||
chunk = GenerationChunk(text=token.completion)
|
||||
yield chunk
|
||||
if run_manager:
|
||||
await run_manager.on_llm_new_token(chunk.text, chunk=chunk)
|
||||
|
||||
def get_num_tokens(self, text: str) -> int:
|
||||
"""Calculate number of tokens."""
|
||||
if not self.count_tokens:
|
||||
raise NameError("Please ensure the anthropic package is loaded")
|
||||
return self.count_tokens(text)
|
||||
@@ -0,0 +1,126 @@
|
||||
import json
|
||||
import logging
|
||||
from typing import Any, Dict, Iterator, List, Optional
|
||||
|
||||
import requests
|
||||
from langchain_core.callbacks import CallbackManagerForLLMRun
|
||||
from langchain_core.language_models.llms import LLM
|
||||
from langchain_core.outputs import GenerationChunk
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class CloudflareWorkersAI(LLM):
|
||||
"""Langchain LLM class to help to access Cloudflare Workers AI service.
|
||||
|
||||
To use, you must provide an API token and
|
||||
account ID to access Cloudflare Workers AI, and
|
||||
pass it as a named parameter to the constructor.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
|
||||
from langchain_community.llms.cloudflare_workersai import CloudflareWorkersAI
|
||||
|
||||
my_account_id = "my_account_id"
|
||||
my_api_token = "my_secret_api_token"
|
||||
llm_model = "@cf/meta/llama-2-7b-chat-int8"
|
||||
|
||||
cf_ai = CloudflareWorkersAI(
|
||||
account_id=my_account_id,
|
||||
api_token=my_api_token,
|
||||
model=llm_model
|
||||
)
|
||||
""" # noqa: E501
|
||||
|
||||
account_id: str
|
||||
api_token: str
|
||||
model: str = "@cf/meta/llama-2-7b-chat-int8"
|
||||
base_url: str = "https://api.cloudflare.com/client/v4/accounts"
|
||||
streaming: bool = False
|
||||
endpoint_url: str = ""
|
||||
|
||||
def __init__(self, **kwargs: Any) -> None:
|
||||
"""Initialize the Cloudflare Workers AI class."""
|
||||
super().__init__(**kwargs)
|
||||
|
||||
self.endpoint_url = f"{self.base_url}/{self.account_id}/ai/run/{self.model}"
|
||||
|
||||
@property
|
||||
def _llm_type(self) -> str:
|
||||
"""Return type of LLM."""
|
||||
return "cloudflare"
|
||||
|
||||
@property
|
||||
def _default_params(self) -> Dict[str, Any]:
|
||||
"""Default parameters"""
|
||||
return {}
|
||||
|
||||
@property
|
||||
def _identifying_params(self) -> Dict[str, Any]:
|
||||
"""Identifying parameters"""
|
||||
return {
|
||||
"account_id": self.account_id,
|
||||
"api_token": self.api_token,
|
||||
"model": self.model,
|
||||
"base_url": self.base_url,
|
||||
}
|
||||
|
||||
def _call_api(self, prompt: str, params: Dict[str, Any]) -> requests.Response:
|
||||
"""Call Cloudflare Workers API"""
|
||||
headers = {"Authorization": f"Bearer {self.api_token}"}
|
||||
data = {"prompt": prompt, "stream": self.streaming, **params}
|
||||
response = requests.post(self.endpoint_url, headers=headers, json=data)
|
||||
return response
|
||||
|
||||
def _process_response(self, response: requests.Response) -> str:
|
||||
"""Process API response"""
|
||||
if response.ok:
|
||||
data = response.json()
|
||||
return data["result"]["response"]
|
||||
else:
|
||||
raise ValueError(f"Request failed with status {response.status_code}")
|
||||
|
||||
def _stream(
|
||||
self,
|
||||
prompt: str,
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> Iterator[GenerationChunk]:
|
||||
"""Streaming prediction"""
|
||||
original_steaming: bool = self.streaming
|
||||
self.streaming = True
|
||||
_response_prefix_count = len("data: ")
|
||||
_response_stream_end = b"data: [DONE]"
|
||||
for chunk in self._call_api(prompt, kwargs).iter_lines():
|
||||
if chunk == _response_stream_end:
|
||||
break
|
||||
if len(chunk) > _response_prefix_count:
|
||||
try:
|
||||
data = json.loads(chunk[_response_prefix_count:])
|
||||
except Exception as e:
|
||||
logger.debug(chunk)
|
||||
raise e
|
||||
if data is not None and "response" in data:
|
||||
yield GenerationChunk(text=data["response"])
|
||||
if run_manager:
|
||||
run_manager.on_llm_new_token(data["response"])
|
||||
logger.debug("stream end")
|
||||
self.streaming = original_steaming
|
||||
|
||||
def _call(
|
||||
self,
|
||||
prompt: str,
|
||||
stop: Optional[List[str]] = None,
|
||||
run_manager: Optional[CallbackManagerForLLMRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> str:
|
||||
"""Regular prediction"""
|
||||
if self.streaming:
|
||||
return "".join(
|
||||
[c.text for c in self._stream(prompt, stop, run_manager, **kwargs)]
|
||||
)
|
||||
else:
|
||||
response = self._call_api(prompt, kwargs)
|
||||
return self._process_response(response)
|
||||
@@ -0,0 +1,106 @@
|
||||
"""**Retriever** class returns Documents given a text **query**.
|
||||
|
||||
It is more general than a vector store. A retriever does not need to be able to
|
||||
store documents, only to return (or retrieve) it. Vector stores can be used as
|
||||
the backbone of a retriever, but there are other types of retrievers as well.
|
||||
|
||||
**Class hierarchy:**
|
||||
|
||||
.. code-block::
|
||||
|
||||
BaseRetriever --> <name>Retriever # Examples: ArxivRetriever, MergerRetriever
|
||||
|
||||
**Main helpers:**
|
||||
|
||||
.. code-block::
|
||||
|
||||
Document, Serializable, Callbacks,
|
||||
CallbackManagerForRetrieverRun, AsyncCallbackManagerForRetrieverRun
|
||||
"""
|
||||
|
||||
from langchain_community.retrievers.arcee import ArceeRetriever
|
||||
from langchain_community.retrievers.arxiv import ArxivRetriever
|
||||
from langchain_community.retrievers.azure_cognitive_search import (
|
||||
AzureCognitiveSearchRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.bedrock import AmazonKnowledgeBasesRetriever
|
||||
from langchain_community.retrievers.bm25 import BM25Retriever
|
||||
from langchain_community.retrievers.chaindesk import ChaindeskRetriever
|
||||
from langchain_community.retrievers.chatgpt_plugin_retriever import (
|
||||
ChatGPTPluginRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.cohere_rag_retriever import CohereRagRetriever
|
||||
from langchain_community.retrievers.docarray import DocArrayRetriever
|
||||
from langchain_community.retrievers.elastic_search_bm25 import (
|
||||
ElasticSearchBM25Retriever,
|
||||
)
|
||||
from langchain_community.retrievers.embedchain import EmbedchainRetriever
|
||||
from langchain_community.retrievers.google_cloud_documentai_warehouse import (
|
||||
GoogleDocumentAIWarehouseRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.google_vertex_ai_search import (
|
||||
GoogleCloudEnterpriseSearchRetriever,
|
||||
GoogleVertexAIMultiTurnSearchRetriever,
|
||||
GoogleVertexAISearchRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.kay import KayAiRetriever
|
||||
from langchain_community.retrievers.kendra import AmazonKendraRetriever
|
||||
from langchain_community.retrievers.knn import KNNRetriever
|
||||
from langchain_community.retrievers.llama_index import (
|
||||
LlamaIndexGraphRetriever,
|
||||
LlamaIndexRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.metal import MetalRetriever
|
||||
from langchain_community.retrievers.milvus import MilvusRetriever
|
||||
from langchain_community.retrievers.outline import OutlineRetriever
|
||||
from langchain_community.retrievers.pinecone_hybrid_search import (
|
||||
PineconeHybridSearchRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.pubmed import PubMedRetriever
|
||||
from langchain_community.retrievers.remote_retriever import RemoteLangChainRetriever
|
||||
from langchain_community.retrievers.svm import SVMRetriever
|
||||
from langchain_community.retrievers.tavily_search_api import TavilySearchAPIRetriever
|
||||
from langchain_community.retrievers.tfidf import TFIDFRetriever
|
||||
from langchain_community.retrievers.weaviate_hybrid_search import (
|
||||
WeaviateHybridSearchRetriever,
|
||||
)
|
||||
from langchain_community.retrievers.wikipedia import WikipediaRetriever
|
||||
from langchain_community.retrievers.zep import ZepRetriever
|
||||
from langchain_community.retrievers.zilliz import ZillizRetriever
|
||||
|
||||
__all__ = [
|
||||
"AmazonKendraRetriever",
|
||||
"AmazonKnowledgeBasesRetriever",
|
||||
"ArceeRetriever",
|
||||
"ArxivRetriever",
|
||||
"AzureCognitiveSearchRetriever",
|
||||
"ChatGPTPluginRetriever",
|
||||
"ChaindeskRetriever",
|
||||
"CohereRagRetriever",
|
||||
"ElasticSearchBM25Retriever",
|
||||
"EmbedchainRetriever",
|
||||
"GoogleDocumentAIWarehouseRetriever",
|
||||
"GoogleCloudEnterpriseSearchRetriever",
|
||||
"GoogleVertexAIMultiTurnSearchRetriever",
|
||||
"GoogleVertexAISearchRetriever",
|
||||
"KayAiRetriever",
|
||||
"KNNRetriever",
|
||||
"LlamaIndexGraphRetriever",
|
||||
"LlamaIndexRetriever",
|
||||
"MetalRetriever",
|
||||
"MilvusRetriever",
|
||||
"OutlineRetriever",
|
||||
"PineconeHybridSearchRetriever",
|
||||
"PubMedRetriever",
|
||||
"RemoteLangChainRetriever",
|
||||
"SVMRetriever",
|
||||
"TavilySearchAPIRetriever",
|
||||
"TFIDFRetriever",
|
||||
"BM25Retriever",
|
||||
"VespaRetriever",
|
||||
"WeaviateHybridSearchRetriever",
|
||||
"WikipediaRetriever",
|
||||
"ZepRetriever",
|
||||
"ZillizRetriever",
|
||||
"DocArrayRetriever",
|
||||
]
|
||||
@@ -0,0 +1,19 @@
|
||||
"""Implementations of key-value stores and storage helpers.
|
||||
|
||||
Module provides implementations of various key-value stores that conform
|
||||
to a simple key-value interface.
|
||||
|
||||
The primary goal of these storages is to support implementation of caching.
|
||||
"""
|
||||
|
||||
from langchain_community.storage.redis import RedisStore
|
||||
from langchain_community.storage.upstash_redis import (
|
||||
UpstashRedisByteStore,
|
||||
UpstashRedisStore,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"RedisStore",
|
||||
"UpstashRedisByteStore",
|
||||
"UpstashRedisStore",
|
||||
]
|
||||
@@ -0,0 +1,50 @@
|
||||
from typing import Optional, Type
|
||||
|
||||
from langchain_core.callbacks import CallbackManagerForToolRun
|
||||
from langchain_core.pydantic_v1 import BaseModel, Field
|
||||
|
||||
from langchain_community.chat_models import ChatOpenAI
|
||||
from langchain_community.tools.amadeus.base import AmadeusBaseTool
|
||||
|
||||
|
||||
class ClosestAirportSchema(BaseModel):
|
||||
"""Schema for the AmadeusClosestAirport tool."""
|
||||
|
||||
location: str = Field(
|
||||
description=(
|
||||
" The location for which you would like to find the nearest airport "
|
||||
" along with optional details such as country, state, region, or "
|
||||
" province, allowing for easy processing and identification of "
|
||||
" the closest airport. Examples of the format are the following:\n"
|
||||
" Cali, Colombia\n "
|
||||
" Lincoln, Nebraska, United States\n"
|
||||
" New York, United States\n"
|
||||
" Sydney, New South Wales, Australia\n"
|
||||
" Rome, Lazio, Italy\n"
|
||||
" Toronto, Ontario, Canada\n"
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
class AmadeusClosestAirport(AmadeusBaseTool):
|
||||
"""Tool for finding the closest airport to a particular location."""
|
||||
|
||||
name: str = "closest_airport"
|
||||
description: str = (
|
||||
"Use this tool to find the closest airport to a particular location."
|
||||
)
|
||||
args_schema: Type[ClosestAirportSchema] = ClosestAirportSchema
|
||||
|
||||
def _run(
|
||||
self,
|
||||
location: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
content = (
|
||||
f" What is the nearest airport to {location}? Please respond with the "
|
||||
" airport's International Air Transport Association (IATA) Location "
|
||||
' Identifier in the following JSON format. JSON: "iataCode": "IATA '
|
||||
' Location Identifier" '
|
||||
)
|
||||
|
||||
return ChatOpenAI(temperature=0).predict(content)
|
||||
@@ -0,0 +1,42 @@
|
||||
"""
|
||||
This tool allows agents to interact with the clickup library
|
||||
and operate on a Clickup instance.
|
||||
To use this tool, you must first set as environment variables:
|
||||
client_secret
|
||||
client_id
|
||||
code
|
||||
|
||||
Below is a sample script that uses the Clickup tool:
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits.clickup.toolkit import ClickupToolkit
|
||||
from langchain_community.utilities.clickup import ClickupAPIWrapper
|
||||
|
||||
clickup = ClickupAPIWrapper()
|
||||
toolkit = ClickupToolkit.from_clickup_api_wrapper(clickup)
|
||||
```
|
||||
"""
|
||||
from typing import Optional
|
||||
|
||||
from langchain_core.callbacks import CallbackManagerForToolRun
|
||||
from langchain_core.pydantic_v1 import Field
|
||||
from langchain_core.tools import BaseTool
|
||||
|
||||
from langchain_community.utilities.clickup import ClickupAPIWrapper
|
||||
|
||||
|
||||
class ClickupAction(BaseTool):
|
||||
"""Tool that queries the Clickup API."""
|
||||
|
||||
api_wrapper: ClickupAPIWrapper = Field(default_factory=ClickupAPIWrapper)
|
||||
mode: str
|
||||
name: str = ""
|
||||
description: str = ""
|
||||
|
||||
def _run(
|
||||
self,
|
||||
instructions: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Clickup API to run an operation."""
|
||||
return self.api_wrapper.run(self.mode, instructions)
|
||||
@@ -0,0 +1,44 @@
|
||||
"""
|
||||
This tool allows agents to interact with the atlassian-python-api library
|
||||
and operate on a Jira instance. For more information on the
|
||||
atlassian-python-api library, see https://atlassian-python-api.readthedocs.io/jira.html
|
||||
|
||||
To use this tool, you must first set as environment variables:
|
||||
JIRA_API_TOKEN
|
||||
JIRA_USERNAME
|
||||
JIRA_INSTANCE_URL
|
||||
|
||||
Below is a sample script that uses the Jira tool:
|
||||
|
||||
```python
|
||||
from langchain_community.agent_toolkits.jira.toolkit import JiraToolkit
|
||||
from langchain_community.utilities.jira import JiraAPIWrapper
|
||||
|
||||
jira = JiraAPIWrapper()
|
||||
toolkit = JiraToolkit.from_jira_api_wrapper(jira)
|
||||
```
|
||||
"""
|
||||
from typing import Optional
|
||||
|
||||
from langchain_core.callbacks import CallbackManagerForToolRun
|
||||
from langchain_core.pydantic_v1 import Field
|
||||
from langchain_core.tools import BaseTool
|
||||
|
||||
from langchain_community.utilities.jira import JiraAPIWrapper
|
||||
|
||||
|
||||
class JiraAction(BaseTool):
|
||||
"""Tool that queries the Atlassian Jira API."""
|
||||
|
||||
api_wrapper: JiraAPIWrapper = Field(default_factory=JiraAPIWrapper)
|
||||
mode: str
|
||||
name: str = ""
|
||||
description: str = ""
|
||||
|
||||
def _run(
|
||||
self,
|
||||
instructions: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Atlassian Jira API to run an operation."""
|
||||
return self.api_wrapper.run(self.mode, instructions)
|
||||
@@ -0,0 +1,276 @@
|
||||
"""Tools for interacting with a Power BI dataset."""
|
||||
import logging
|
||||
from time import perf_counter
|
||||
from typing import Any, Dict, Optional, Tuple
|
||||
|
||||
from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.pydantic_v1 import Field, validator
|
||||
from langchain_core.tools import BaseTool
|
||||
from langchain_community.chat_models.openai import _import_tiktoken
|
||||
|
||||
from langchain_community.tools.powerbi.prompt import (
|
||||
BAD_REQUEST_RESPONSE,
|
||||
DEFAULT_FEWSHOT_EXAMPLES,
|
||||
RETRY_RESPONSE,
|
||||
)
|
||||
from langchain_community.utilities.powerbi import PowerBIDataset, json_to_md
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class QueryPowerBITool(BaseTool):
|
||||
"""Tool for querying a Power BI Dataset."""
|
||||
|
||||
name: str = "query_powerbi"
|
||||
description: str = """
|
||||
Input to this tool is a detailed question about the dataset, output is a result from the dataset. It will try to answer the question using the dataset, and if it cannot, it will ask for clarification.
|
||||
|
||||
Example Input: "How many rows are in table1?"
|
||||
""" # noqa: E501
|
||||
llm_chain: Any
|
||||
powerbi: PowerBIDataset = Field(exclude=True)
|
||||
examples: Optional[str] = DEFAULT_FEWSHOT_EXAMPLES
|
||||
session_cache: Dict[str, Any] = Field(default_factory=dict, exclude=True)
|
||||
max_iterations: int = 5
|
||||
output_token_limit: int = 4000
|
||||
tiktoken_model_name: Optional[str] = None # "cl100k_base"
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
arbitrary_types_allowed = True
|
||||
|
||||
@validator("llm_chain")
|
||||
def validate_llm_chain_input_variables( # pylint: disable=E0213
|
||||
cls, llm_chain: Any
|
||||
) -> Any:
|
||||
"""Make sure the LLM chain has the correct input variables."""
|
||||
for var in llm_chain.prompt.input_variables:
|
||||
if var not in ["tool_input", "tables", "schemas", "examples"]:
|
||||
raise ValueError(
|
||||
"LLM chain for QueryPowerBITool must have input variables ['tool_input', 'tables', 'schemas', 'examples'], found %s", # noqa: C0301 E501 # pylint: disable=C0301
|
||||
llm_chain.prompt.input_variables,
|
||||
)
|
||||
return llm_chain
|
||||
|
||||
def _check_cache(self, tool_input: str) -> Optional[str]:
|
||||
"""Check if the input is present in the cache.
|
||||
|
||||
If the value is a bad request, overwrite with the escalated version,
|
||||
if not present return None."""
|
||||
if tool_input not in self.session_cache:
|
||||
return None
|
||||
return self.session_cache[tool_input]
|
||||
|
||||
def _run(
|
||||
self,
|
||||
tool_input: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> str:
|
||||
"""Execute the query, return the results or an error message."""
|
||||
if cache := self._check_cache(tool_input):
|
||||
logger.debug("Found cached result for %s: %s", tool_input, cache)
|
||||
return cache
|
||||
|
||||
try:
|
||||
logger.info("Running PBI Query Tool with input: %s", tool_input)
|
||||
query = self.llm_chain.predict(
|
||||
tool_input=tool_input,
|
||||
tables=self.powerbi.get_table_names(),
|
||||
schemas=self.powerbi.get_schemas(),
|
||||
examples=self.examples,
|
||||
callbacks=run_manager.get_child() if run_manager else None,
|
||||
)
|
||||
except Exception as exc: # pylint: disable=broad-except
|
||||
self.session_cache[tool_input] = f"Error on call to LLM: {exc}"
|
||||
return self.session_cache[tool_input]
|
||||
if query == "I cannot answer this":
|
||||
self.session_cache[tool_input] = query
|
||||
return self.session_cache[tool_input]
|
||||
logger.info("PBI Query:\n%s", query)
|
||||
start_time = perf_counter()
|
||||
pbi_result = self.powerbi.run(command=query)
|
||||
end_time = perf_counter()
|
||||
logger.debug("PBI Result: %s", pbi_result)
|
||||
logger.debug(f"PBI Query duration: {end_time - start_time:0.6f}")
|
||||
result, error = self._parse_output(pbi_result)
|
||||
if error is not None and "TokenExpired" in error:
|
||||
self.session_cache[
|
||||
tool_input
|
||||
] = "Authentication token expired or invalid, please try reauthenticate."
|
||||
return self.session_cache[tool_input]
|
||||
|
||||
iterations = kwargs.get("iterations", 0)
|
||||
if error and iterations < self.max_iterations:
|
||||
return self._run(
|
||||
tool_input=RETRY_RESPONSE.format(
|
||||
tool_input=tool_input, query=query, error=error
|
||||
),
|
||||
run_manager=run_manager,
|
||||
iterations=iterations + 1,
|
||||
)
|
||||
|
||||
self.session_cache[tool_input] = (
|
||||
result if result else BAD_REQUEST_RESPONSE.format(error=error)
|
||||
)
|
||||
return self.session_cache[tool_input]
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
tool_input: str,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
**kwargs: Any,
|
||||
) -> str:
|
||||
"""Execute the query, return the results or an error message."""
|
||||
if cache := self._check_cache(tool_input):
|
||||
logger.debug("Found cached result for %s: %s", tool_input, cache)
|
||||
return f"{cache}, from cache, you have already asked this question."
|
||||
try:
|
||||
logger.info("Running PBI Query Tool with input: %s", tool_input)
|
||||
query = await self.llm_chain.apredict(
|
||||
tool_input=tool_input,
|
||||
tables=self.powerbi.get_table_names(),
|
||||
schemas=self.powerbi.get_schemas(),
|
||||
examples=self.examples,
|
||||
callbacks=run_manager.get_child() if run_manager else None,
|
||||
)
|
||||
except Exception as exc: # pylint: disable=broad-except
|
||||
self.session_cache[tool_input] = f"Error on call to LLM: {exc}"
|
||||
return self.session_cache[tool_input]
|
||||
|
||||
if query == "I cannot answer this":
|
||||
self.session_cache[tool_input] = query
|
||||
return self.session_cache[tool_input]
|
||||
logger.info("PBI Query: %s", query)
|
||||
start_time = perf_counter()
|
||||
pbi_result = await self.powerbi.arun(command=query)
|
||||
end_time = perf_counter()
|
||||
logger.debug("PBI Result: %s", pbi_result)
|
||||
logger.debug(f"PBI Query duration: {end_time - start_time:0.6f}")
|
||||
result, error = self._parse_output(pbi_result)
|
||||
if error is not None and ("TokenExpired" in error or "TokenError" in error):
|
||||
self.session_cache[
|
||||
tool_input
|
||||
] = "Authentication token expired or invalid, please try to reauthenticate or check the scope of the credential." # noqa: E501
|
||||
return self.session_cache[tool_input]
|
||||
|
||||
iterations = kwargs.get("iterations", 0)
|
||||
if error and iterations < self.max_iterations:
|
||||
return await self._arun(
|
||||
tool_input=RETRY_RESPONSE.format(
|
||||
tool_input=tool_input, query=query, error=error
|
||||
),
|
||||
run_manager=run_manager,
|
||||
iterations=iterations + 1,
|
||||
)
|
||||
|
||||
self.session_cache[tool_input] = (
|
||||
result if result else BAD_REQUEST_RESPONSE.format(error=error)
|
||||
)
|
||||
return self.session_cache[tool_input]
|
||||
|
||||
def _parse_output(
|
||||
self, pbi_result: Dict[str, Any]
|
||||
) -> Tuple[Optional[str], Optional[Any]]:
|
||||
"""Parse the output of the query to a markdown table."""
|
||||
if "results" in pbi_result:
|
||||
rows = pbi_result["results"][0]["tables"][0]["rows"]
|
||||
if len(rows) == 0:
|
||||
logger.info("0 records in result, query was valid.")
|
||||
return (
|
||||
None,
|
||||
"0 rows returned, this might be correct, but please validate if all filter values were correct?", # noqa: E501
|
||||
)
|
||||
result = json_to_md(rows)
|
||||
too_long, length = self._result_too_large(result)
|
||||
if too_long:
|
||||
return (
|
||||
f"Result too large, please try to be more specific or use the `TOPN` function. The result is {length} tokens long, the limit is {self.output_token_limit} tokens.", # noqa: E501
|
||||
None,
|
||||
)
|
||||
return result, None
|
||||
|
||||
if "error" in pbi_result:
|
||||
if (
|
||||
"pbi.error" in pbi_result["error"]
|
||||
and "details" in pbi_result["error"]["pbi.error"]
|
||||
):
|
||||
return None, pbi_result["error"]["pbi.error"]["details"][0]["detail"]
|
||||
return None, pbi_result["error"]
|
||||
return None, pbi_result
|
||||
|
||||
def _result_too_large(self, result: str) -> Tuple[bool, int]:
|
||||
"""Tokenize the output of the query."""
|
||||
if self.tiktoken_model_name:
|
||||
tiktoken_ = _import_tiktoken()
|
||||
encoding = tiktoken_.encoding_for_model(self.tiktoken_model_name)
|
||||
length = len(encoding.encode(result))
|
||||
logger.info("Result length: %s", length)
|
||||
return length > self.output_token_limit, length
|
||||
return False, 0
|
||||
|
||||
|
||||
class InfoPowerBITool(BaseTool):
|
||||
"""Tool for getting metadata about a PowerBI Dataset."""
|
||||
|
||||
name: str = "schema_powerbi"
|
||||
description: str = """
|
||||
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
|
||||
Be sure that the tables actually exist by calling list_tables_powerbi first!
|
||||
|
||||
Example Input: "table1, table2, table3"
|
||||
""" # noqa: E501
|
||||
powerbi: PowerBIDataset = Field(exclude=True)
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
arbitrary_types_allowed = True
|
||||
|
||||
def _run(
|
||||
self,
|
||||
tool_input: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the schema for tables in a comma-separated list."""
|
||||
return self.powerbi.get_table_info(tool_input.split(", "))
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
tool_input: str,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
return await self.powerbi.aget_table_info(tool_input.split(", "))
|
||||
|
||||
|
||||
class ListPowerBITool(BaseTool):
|
||||
"""Tool for getting tables names."""
|
||||
|
||||
name: str = "list_tables_powerbi"
|
||||
description: str = "Input is an empty string, output is a comma separated list of tables in the database." # noqa: E501 # pylint: disable=C0301
|
||||
powerbi: PowerBIDataset = Field(exclude=True)
|
||||
|
||||
class Config:
|
||||
"""Configuration for this pydantic object."""
|
||||
|
||||
arbitrary_types_allowed = True
|
||||
|
||||
def _run(
|
||||
self,
|
||||
tool_input: Optional[str] = None,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the names of the tables."""
|
||||
return ", ".join(self.powerbi.get_table_names())
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
tool_input: Optional[str] = None,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the names of the tables."""
|
||||
return ", ".join(self.powerbi.get_table_names())
|
||||
@@ -0,0 +1,130 @@
|
||||
# flake8: noqa
|
||||
"""Tools for interacting with Spark SQL."""
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from langchain_core.pydantic_v1 import BaseModel, Field, root_validator
|
||||
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.prompts import PromptTemplate
|
||||
from langchain_community.utilities.spark_sql import SparkSQL
|
||||
from langchain_core.tools import BaseTool
|
||||
from langchain_community.tools.spark_sql.prompt import QUERY_CHECKER
|
||||
|
||||
|
||||
class BaseSparkSQLTool(BaseModel):
|
||||
"""Base tool for interacting with Spark SQL."""
|
||||
|
||||
db: SparkSQL = Field(exclude=True)
|
||||
|
||||
class Config(BaseTool.Config):
|
||||
pass
|
||||
|
||||
|
||||
class QuerySparkSQLTool(BaseSparkSQLTool, BaseTool):
|
||||
"""Tool for querying a Spark SQL."""
|
||||
|
||||
name: str = "query_sql_db"
|
||||
description: str = """
|
||||
Input to this tool is a detailed and correct SQL query, output is a result from the Spark SQL.
|
||||
If the query is not correct, an error message will be returned.
|
||||
If an error is returned, rewrite the query, check the query, and try again.
|
||||
"""
|
||||
|
||||
def _run(
|
||||
self,
|
||||
query: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Execute the query, return the results or an error message."""
|
||||
return self.db.run_no_throw(query)
|
||||
|
||||
|
||||
class InfoSparkSQLTool(BaseSparkSQLTool, BaseTool):
|
||||
"""Tool for getting metadata about a Spark SQL."""
|
||||
|
||||
name: str = "schema_sql_db"
|
||||
description: str = """
|
||||
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
|
||||
Be sure that the tables actually exist by calling list_tables_sql_db first!
|
||||
|
||||
Example Input: "table1, table2, table3"
|
||||
"""
|
||||
|
||||
def _run(
|
||||
self,
|
||||
table_names: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the schema for tables in a comma-separated list."""
|
||||
return self.db.get_table_info_no_throw(table_names.split(", "))
|
||||
|
||||
|
||||
class ListSparkSQLTool(BaseSparkSQLTool, BaseTool):
|
||||
"""Tool for getting tables names."""
|
||||
|
||||
name: str = "list_tables_sql_db"
|
||||
description: str = "Input is an empty string, output is a comma separated list of tables in the Spark SQL."
|
||||
|
||||
def _run(
|
||||
self,
|
||||
tool_input: str = "",
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the schema for a specific table."""
|
||||
return ", ".join(self.db.get_usable_table_names())
|
||||
|
||||
|
||||
class QueryCheckerTool(BaseSparkSQLTool, BaseTool):
|
||||
"""Use an LLM to check if a query is correct.
|
||||
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
|
||||
|
||||
template: str = QUERY_CHECKER
|
||||
llm: BaseLanguageModel
|
||||
llm_chain: Any = Field(init=False)
|
||||
name: str = "query_checker_sql_db"
|
||||
description: str = """
|
||||
Use this tool to double check if your query is correct before executing it.
|
||||
Always use this tool before executing a query with query_sql_db!
|
||||
"""
|
||||
|
||||
@root_validator(pre=True)
|
||||
def initialize_llm_chain(cls, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
if "llm_chain" not in values:
|
||||
from langchain.chains.llm import LLMChain
|
||||
values["llm_chain"] = LLMChain(
|
||||
llm=values.get("llm"),
|
||||
prompt=PromptTemplate(
|
||||
template=QUERY_CHECKER, input_variables=["query"]
|
||||
),
|
||||
)
|
||||
|
||||
if values["llm_chain"].prompt.input_variables != ["query"]:
|
||||
raise ValueError(
|
||||
"LLM chain for QueryCheckerTool need to use ['query'] as input_variables "
|
||||
"for the embedded prompt"
|
||||
)
|
||||
|
||||
return values
|
||||
|
||||
def _run(
|
||||
self,
|
||||
query: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the LLM to check the query."""
|
||||
return self.llm_chain.predict(
|
||||
query=query, callbacks=run_manager.get_child() if run_manager else None
|
||||
)
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
query: str,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
return await self.llm_chain.apredict(
|
||||
query=query, callbacks=run_manager.get_child() if run_manager else None
|
||||
)
|
||||
@@ -0,0 +1,134 @@
|
||||
# flake8: noqa
|
||||
"""Tools for interacting with a SQL database."""
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from langchain_core.pydantic_v1 import BaseModel, Extra, Field, root_validator
|
||||
|
||||
from langchain_core.language_models import BaseLanguageModel
|
||||
from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.prompts import PromptTemplate
|
||||
from langchain_community.utilities.sql_database import SQLDatabase
|
||||
from langchain_core.tools import BaseTool
|
||||
from langchain_community.tools.sql_database.prompt import QUERY_CHECKER
|
||||
|
||||
|
||||
class BaseSQLDatabaseTool(BaseModel):
|
||||
"""Base tool for interacting with a SQL database."""
|
||||
|
||||
db: SQLDatabase = Field(exclude=True)
|
||||
|
||||
class Config(BaseTool.Config):
|
||||
pass
|
||||
|
||||
|
||||
class QuerySQLDataBaseTool(BaseSQLDatabaseTool, BaseTool):
|
||||
"""Tool for querying a SQL database."""
|
||||
|
||||
name: str = "sql_db_query"
|
||||
description: str = """
|
||||
Input to this tool is a detailed and correct SQL query, output is a result from the database.
|
||||
If the query is not correct, an error message will be returned.
|
||||
If an error is returned, rewrite the query, check the query, and try again.
|
||||
"""
|
||||
|
||||
def _run(
|
||||
self,
|
||||
query: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Execute the query, return the results or an error message."""
|
||||
return self.db.run_no_throw(query)
|
||||
|
||||
|
||||
class InfoSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
|
||||
"""Tool for getting metadata about a SQL database."""
|
||||
|
||||
name: str = "sql_db_schema"
|
||||
description: str = """
|
||||
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
|
||||
|
||||
Example Input: "table1, table2, table3"
|
||||
"""
|
||||
|
||||
def _run(
|
||||
self,
|
||||
table_names: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the schema for tables in a comma-separated list."""
|
||||
return self.db.get_table_info_no_throw(
|
||||
[t.strip() for t in table_names.split(",")]
|
||||
)
|
||||
|
||||
|
||||
class ListSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
|
||||
"""Tool for getting tables names."""
|
||||
|
||||
name: str = "sql_db_list_tables"
|
||||
description: str = "Input is an empty string, output is a comma separated list of tables in the database."
|
||||
|
||||
def _run(
|
||||
self,
|
||||
tool_input: str = "",
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Get the schema for a specific table."""
|
||||
return ", ".join(self.db.get_usable_table_names())
|
||||
|
||||
|
||||
class QuerySQLCheckerTool(BaseSQLDatabaseTool, BaseTool):
|
||||
"""Use an LLM to check if a query is correct.
|
||||
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
|
||||
|
||||
template: str = QUERY_CHECKER
|
||||
llm: BaseLanguageModel
|
||||
llm_chain: Any = Field(init=False)
|
||||
name: str = "sql_db_query_checker"
|
||||
description: str = """
|
||||
Use this tool to double check if your query is correct before executing it.
|
||||
Always use this tool before executing a query with sql_db_query!
|
||||
"""
|
||||
|
||||
@root_validator(pre=True)
|
||||
def initialize_llm_chain(cls, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
if "llm_chain" not in values:
|
||||
from langchain.chains.llm import LLMChain
|
||||
values["llm_chain"] = LLMChain(
|
||||
llm=values.get("llm"),
|
||||
prompt=PromptTemplate(
|
||||
template=QUERY_CHECKER, input_variables=["dialect", "query"]
|
||||
),
|
||||
)
|
||||
|
||||
if values["llm_chain"].prompt.input_variables != ["dialect", "query"]:
|
||||
raise ValueError(
|
||||
"LLM chain for QueryCheckerTool must have input variables ['query', 'dialect']"
|
||||
)
|
||||
|
||||
return values
|
||||
|
||||
def _run(
|
||||
self,
|
||||
query: str,
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the LLM to check the query."""
|
||||
return self.llm_chain.predict(
|
||||
query=query,
|
||||
dialect=self.db.dialect,
|
||||
callbacks=run_manager.get_child() if run_manager else None,
|
||||
)
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
query: str,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
return await self.llm_chain.apredict(
|
||||
query=query,
|
||||
dialect=self.db.dialect,
|
||||
callbacks=run_manager.get_child() if run_manager else None,
|
||||
)
|
||||
@@ -0,0 +1,215 @@
|
||||
"""[DEPRECATED]
|
||||
|
||||
## Zapier Natural Language Actions API
|
||||
\
|
||||
Full docs here: https://nla.zapier.com/start/
|
||||
|
||||
**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions
|
||||
on Zapier's platform through a natural language API interface.
|
||||
|
||||
NLA supports apps like Gmail, Salesforce, Trello, Slack, Asana, HubSpot, Google Sheets,
|
||||
Microsoft Teams, and thousands more apps: https://zapier.com/apps
|
||||
|
||||
Zapier NLA handles ALL the underlying API auth and translation from
|
||||
natural language --> underlying API call --> return simplified output for LLMs
|
||||
The key idea is you, or your users, expose a set of actions via an oauth-like setup
|
||||
window, which you can then query and execute via a REST API.
|
||||
|
||||
NLA offers both API Key and OAuth for signing NLA API requests.
|
||||
|
||||
1. Server-side (API Key): for quickly getting started, testing, and production scenarios
|
||||
where LangChain will only use actions exposed in the developer's Zapier account
|
||||
(and will use the developer's connected accounts on Zapier.com)
|
||||
|
||||
2. User-facing (Oauth): for production scenarios where you are deploying an end-user
|
||||
facing application and LangChain needs access to end-user's exposed actions and
|
||||
connected accounts on Zapier.com
|
||||
|
||||
This quick start will focus on the server-side use case for brevity.
|
||||
Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer
|
||||
support.
|
||||
|
||||
Typically, you'd use SequentialChain, here's a basic example:
|
||||
|
||||
1. Use NLA to find an email in Gmail
|
||||
2. Use LLMChain to generate a draft reply to (1)
|
||||
3. Use NLA to send the draft reply (2) to someone in Slack via direct message
|
||||
|
||||
In code, below:
|
||||
|
||||
```python
|
||||
|
||||
import os
|
||||
|
||||
# get from https://platform.openai.com/
|
||||
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
|
||||
|
||||
# get from https://nla.zapier.com/docs/authentication/
|
||||
os.environ["ZAPIER_NLA_API_KEY"] = os.environ.get("ZAPIER_NLA_API_KEY", "")
|
||||
|
||||
from langchain_community.agent_toolkits import ZapierToolkit
|
||||
from langchain_community.utilities.zapier import ZapierNLAWrapper
|
||||
|
||||
## step 0. expose gmail 'find email' and slack 'send channel message' actions
|
||||
|
||||
# first go here, log in, expose (enable) the two actions:
|
||||
# https://nla.zapier.com/demo/start
|
||||
# -- for this example, can leave all fields "Have AI guess"
|
||||
# in an oauth scenario, you'd get your own <provider> id (instead of 'demo')
|
||||
# which you route your users through first
|
||||
|
||||
zapier = ZapierNLAWrapper()
|
||||
## To leverage OAuth you may pass the value `nla_oauth_access_token` to
|
||||
## the ZapierNLAWrapper. If you do this there is no need to initialize
|
||||
## the ZAPIER_NLA_API_KEY env variable
|
||||
# zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token="TOKEN_HERE")
|
||||
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
|
||||
```
|
||||
|
||||
"""
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
from langchain_core._api import warn_deprecated
|
||||
from langchain_core.callbacks import (
|
||||
AsyncCallbackManagerForToolRun,
|
||||
CallbackManagerForToolRun,
|
||||
)
|
||||
from langchain_core.pydantic_v1 import Field, root_validator
|
||||
from langchain_core.tools import BaseTool
|
||||
|
||||
from langchain_community.tools.zapier.prompt import BASE_ZAPIER_TOOL_PROMPT
|
||||
from langchain_community.utilities.zapier import ZapierNLAWrapper
|
||||
|
||||
|
||||
class ZapierNLARunAction(BaseTool):
|
||||
"""
|
||||
Args:
|
||||
action_id: a specific action ID (from list actions) of the action to execute
|
||||
(the set api_key must be associated with the action owner)
|
||||
instructions: a natural language instruction string for using the action
|
||||
(eg. "get the latest email from Mike Knoop" for "Gmail: find email" action)
|
||||
params: a dict, optional. Any params provided will *override* AI guesses
|
||||
from `instructions` (see "understanding the AI guessing flow" here:
|
||||
https://nla.zapier.com/docs/using-the-api#ai-guessing)
|
||||
|
||||
"""
|
||||
|
||||
api_wrapper: ZapierNLAWrapper = Field(default_factory=ZapierNLAWrapper)
|
||||
action_id: str
|
||||
params: Optional[dict] = None
|
||||
base_prompt: str = BASE_ZAPIER_TOOL_PROMPT
|
||||
zapier_description: str
|
||||
params_schema: Dict[str, str] = Field(default_factory=dict)
|
||||
name: str = ""
|
||||
description: str = ""
|
||||
|
||||
@root_validator
|
||||
def set_name_description(cls, values: Dict[str, Any]) -> Dict[str, Any]:
|
||||
zapier_description = values["zapier_description"]
|
||||
params_schema = values["params_schema"]
|
||||
if "instructions" in params_schema:
|
||||
del params_schema["instructions"]
|
||||
|
||||
# Ensure base prompt (if overridden) contains necessary input fields
|
||||
necessary_fields = {"{zapier_description}", "{params}"}
|
||||
if not all(field in values["base_prompt"] for field in necessary_fields):
|
||||
raise ValueError(
|
||||
"Your custom base Zapier prompt must contain input fields for "
|
||||
"{zapier_description} and {params}."
|
||||
)
|
||||
|
||||
values["name"] = zapier_description
|
||||
values["description"] = values["base_prompt"].format(
|
||||
zapier_description=zapier_description,
|
||||
params=str(list(params_schema.keys())),
|
||||
)
|
||||
return values
|
||||
|
||||
def _run(
|
||||
self, instructions: str, run_manager: Optional[CallbackManagerForToolRun] = None
|
||||
) -> str:
|
||||
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
|
||||
warn_deprecated(
|
||||
since="0.0.319",
|
||||
message=(
|
||||
"This tool will be deprecated on 2023-11-17. See "
|
||||
"https://nla.zapier.com/sunset/ for details"
|
||||
),
|
||||
)
|
||||
return self.api_wrapper.run_as_str(self.action_id, instructions, self.params)
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
instructions: str,
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
|
||||
warn_deprecated(
|
||||
since="0.0.319",
|
||||
message=(
|
||||
"This tool will be deprecated on 2023-11-17. See "
|
||||
"https://nla.zapier.com/sunset/ for details"
|
||||
),
|
||||
)
|
||||
return await self.api_wrapper.arun_as_str(
|
||||
self.action_id,
|
||||
instructions,
|
||||
self.params,
|
||||
)
|
||||
|
||||
|
||||
ZapierNLARunAction.__doc__ = (
|
||||
ZapierNLAWrapper.run.__doc__ + ZapierNLARunAction.__doc__ # type: ignore
|
||||
)
|
||||
|
||||
|
||||
# other useful actions
|
||||
|
||||
|
||||
class ZapierNLAListActions(BaseTool):
|
||||
"""
|
||||
Args:
|
||||
None
|
||||
|
||||
"""
|
||||
|
||||
name: str = "ZapierNLA_list_actions"
|
||||
description: str = BASE_ZAPIER_TOOL_PROMPT + (
|
||||
"This tool returns a list of the user's exposed actions."
|
||||
)
|
||||
api_wrapper: ZapierNLAWrapper = Field(default_factory=ZapierNLAWrapper)
|
||||
|
||||
def _run(
|
||||
self,
|
||||
_: str = "",
|
||||
run_manager: Optional[CallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
|
||||
warn_deprecated(
|
||||
since="0.0.319",
|
||||
message=(
|
||||
"This tool will be deprecated on 2023-11-17. See "
|
||||
"https://nla.zapier.com/sunset/ for details"
|
||||
),
|
||||
)
|
||||
return self.api_wrapper.list_as_str()
|
||||
|
||||
async def _arun(
|
||||
self,
|
||||
_: str = "",
|
||||
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
|
||||
) -> str:
|
||||
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
|
||||
warn_deprecated(
|
||||
since="0.0.319",
|
||||
message=(
|
||||
"This tool will be deprecated on 2023-11-17. See "
|
||||
"https://nla.zapier.com/sunset/ for details"
|
||||
),
|
||||
)
|
||||
return await self.api_wrapper.alist_as_str()
|
||||
|
||||
|
||||
ZapierNLAListActions.__doc__ = (
|
||||
ZapierNLAWrapper.list.__doc__ + ZapierNLAListActions.__doc__ # type: ignore
|
||||
)
|
||||
@@ -0,0 +1,283 @@
|
||||
"""Integration tests for the langchain tracer module."""
|
||||
import asyncio
|
||||
import os
|
||||
|
||||
from aiohttp import ClientSession
|
||||
from langchain_core.callbacks.manager import atrace_as_chain_group, trace_as_chain_group
|
||||
from langchain_core.tracers.context import tracing_v2_enabled, tracing_enabled
|
||||
from langchain_core.prompts import PromptTemplate
|
||||
|
||||
from langchain_community.chat_models import ChatOpenAI
|
||||
from langchain_community.llms import OpenAI
|
||||
|
||||
questions = [
|
||||
(
|
||||
"Who won the US Open men's final in 2019? "
|
||||
"What is his age raised to the 0.334 power?"
|
||||
),
|
||||
(
|
||||
"Who is Olivia Wilde's boyfriend? "
|
||||
"What is his current age raised to the 0.23 power?"
|
||||
),
|
||||
(
|
||||
"Who won the most recent formula 1 grand prix? "
|
||||
"What is their age raised to the 0.23 power?"
|
||||
),
|
||||
(
|
||||
"Who won the US Open women's final in 2019? "
|
||||
"What is her age raised to the 0.34 power?"
|
||||
),
|
||||
("Who is Beyonce's husband? " "What is his age raised to the 0.19 power?"),
|
||||
]
|
||||
|
||||
|
||||
def test_tracing_sequential() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_TRACING"] = "true"
|
||||
|
||||
for q in questions[:3]:
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
agent.run(q)
|
||||
|
||||
|
||||
def test_tracing_session_env_var() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_TRACING"] = "true"
|
||||
os.environ["LANGCHAIN_SESSION"] = "my_session"
|
||||
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
agent.run(questions[0])
|
||||
if "LANGCHAIN_SESSION" in os.environ:
|
||||
del os.environ["LANGCHAIN_SESSION"]
|
||||
|
||||
|
||||
async def test_tracing_concurrent() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_TRACING"] = "true"
|
||||
aiosession = ClientSession()
|
||||
llm = OpenAI(temperature=0)
|
||||
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
tasks = [agent.arun(q) for q in questions[:3]]
|
||||
await asyncio.gather(*tasks)
|
||||
await aiosession.close()
|
||||
|
||||
|
||||
async def test_tracing_concurrent_bw_compat_environ() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_HANDLER"] = "langchain"
|
||||
if "LANGCHAIN_TRACING" in os.environ:
|
||||
del os.environ["LANGCHAIN_TRACING"]
|
||||
aiosession = ClientSession()
|
||||
llm = OpenAI(temperature=0)
|
||||
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
tasks = [agent.arun(q) for q in questions[:3]]
|
||||
await asyncio.gather(*tasks)
|
||||
await aiosession.close()
|
||||
if "LANGCHAIN_HANDLER" in os.environ:
|
||||
del os.environ["LANGCHAIN_HANDLER"]
|
||||
|
||||
|
||||
def test_tracing_context_manager() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
if "LANGCHAIN_TRACING" in os.environ:
|
||||
del os.environ["LANGCHAIN_TRACING"]
|
||||
with tracing_enabled() as session:
|
||||
assert session
|
||||
agent.run(questions[0]) # this should be traced
|
||||
|
||||
agent.run(questions[0]) # this should not be traced
|
||||
|
||||
|
||||
async def test_tracing_context_manager_async() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
llm = OpenAI(temperature=0)
|
||||
async_tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
if "LANGCHAIN_TRACING" in os.environ:
|
||||
del os.environ["LANGCHAIN_TRACING"]
|
||||
|
||||
# start a background task
|
||||
task = asyncio.create_task(agent.arun(questions[0])) # this should not be traced
|
||||
with tracing_enabled() as session:
|
||||
assert session
|
||||
tasks = [agent.arun(q) for q in questions[1:4]] # these should be traced
|
||||
await asyncio.gather(*tasks)
|
||||
|
||||
await task
|
||||
|
||||
|
||||
async def test_tracing_v2_environment_variable() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_TRACING_V2"] = "true"
|
||||
|
||||
aiosession = ClientSession()
|
||||
llm = OpenAI(temperature=0)
|
||||
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
tasks = [agent.arun(q) for q in questions[:3]]
|
||||
await asyncio.gather(*tasks)
|
||||
await aiosession.close()
|
||||
|
||||
|
||||
def test_tracing_v2_context_manager() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
llm = ChatOpenAI(temperature=0)
|
||||
tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
if "LANGCHAIN_TRACING_V2" in os.environ:
|
||||
del os.environ["LANGCHAIN_TRACING_V2"]
|
||||
with tracing_v2_enabled():
|
||||
agent.run(questions[0]) # this should be traced
|
||||
|
||||
agent.run(questions[0]) # this should not be traced
|
||||
|
||||
|
||||
def test_tracing_v2_chain_with_tags() -> None:
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.chains.constitutional_ai.base import ConstitutionalChain
|
||||
from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple
|
||||
llm = OpenAI(temperature=0)
|
||||
chain = ConstitutionalChain.from_llm(
|
||||
llm,
|
||||
chain=LLMChain.from_string(llm, "Q: {question} A:"),
|
||||
tags=["only-root"],
|
||||
constitutional_principles=[
|
||||
ConstitutionalPrinciple(
|
||||
critique_request="Tell if this answer is good.",
|
||||
revision_request="Give a better answer.",
|
||||
)
|
||||
],
|
||||
)
|
||||
if "LANGCHAIN_TRACING_V2" in os.environ:
|
||||
del os.environ["LANGCHAIN_TRACING_V2"]
|
||||
with tracing_v2_enabled():
|
||||
chain.run("what is the meaning of life", tags=["a-tag"])
|
||||
|
||||
|
||||
def test_tracing_v2_agent_with_metadata() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_TRACING_V2"] = "true"
|
||||
llm = OpenAI(temperature=0)
|
||||
chat = ChatOpenAI(temperature=0)
|
||||
tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
chat_agent = initialize_agent(
|
||||
tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
agent.run(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
|
||||
chat_agent.run(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
|
||||
|
||||
|
||||
async def test_tracing_v2_async_agent_with_metadata() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_TRACING_V2"] = "true"
|
||||
llm = OpenAI(temperature=0, metadata={"f": "g", "h": "i"})
|
||||
chat = ChatOpenAI(temperature=0, metadata={"f": "g", "h": "i"})
|
||||
async_tools = load_tools(["llm-math", "serpapi"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
chat_agent = initialize_agent(
|
||||
async_tools,
|
||||
chat,
|
||||
agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
|
||||
verbose=True,
|
||||
)
|
||||
await agent.arun(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
|
||||
await chat_agent.arun(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
|
||||
|
||||
|
||||
def test_trace_as_group() -> None:
|
||||
from langchain.chains.llm import LLMChain
|
||||
llm = OpenAI(temperature=0.9)
|
||||
prompt = PromptTemplate(
|
||||
input_variables=["product"],
|
||||
template="What is a good name for a company that makes {product}?",
|
||||
)
|
||||
chain = LLMChain(llm=llm, prompt=prompt)
|
||||
with trace_as_chain_group("my_group", inputs={"input": "cars"}) as group_manager:
|
||||
chain.run(product="cars", callbacks=group_manager)
|
||||
chain.run(product="computers", callbacks=group_manager)
|
||||
final_res = chain.run(product="toys", callbacks=group_manager)
|
||||
group_manager.on_chain_end({"output": final_res})
|
||||
|
||||
with trace_as_chain_group("my_group_2", inputs={"input": "toys"}) as group_manager:
|
||||
final_res = chain.run(product="toys", callbacks=group_manager)
|
||||
group_manager.on_chain_end({"output": final_res})
|
||||
|
||||
|
||||
def test_trace_as_group_with_env_set() -> None:
|
||||
from langchain.chains.llm import LLMChain
|
||||
os.environ["LANGCHAIN_TRACING_V2"] = "true"
|
||||
llm = OpenAI(temperature=0.9)
|
||||
prompt = PromptTemplate(
|
||||
input_variables=["product"],
|
||||
template="What is a good name for a company that makes {product}?",
|
||||
)
|
||||
chain = LLMChain(llm=llm, prompt=prompt)
|
||||
with trace_as_chain_group(
|
||||
"my_group_env_set", inputs={"input": "cars"}
|
||||
) as group_manager:
|
||||
chain.run(product="cars", callbacks=group_manager)
|
||||
chain.run(product="computers", callbacks=group_manager)
|
||||
final_res = chain.run(product="toys", callbacks=group_manager)
|
||||
group_manager.on_chain_end({"output": final_res})
|
||||
|
||||
with trace_as_chain_group(
|
||||
"my_group_2_env_set", inputs={"input": "toys"}
|
||||
) as group_manager:
|
||||
final_res = chain.run(product="toys", callbacks=group_manager)
|
||||
group_manager.on_chain_end({"output": final_res})
|
||||
|
||||
|
||||
async def test_trace_as_group_async() -> None:
|
||||
from langchain.chains.llm import LLMChain
|
||||
llm = OpenAI(temperature=0.9)
|
||||
prompt = PromptTemplate(
|
||||
input_variables=["product"],
|
||||
template="What is a good name for a company that makes {product}?",
|
||||
)
|
||||
chain = LLMChain(llm=llm, prompt=prompt)
|
||||
async with atrace_as_chain_group("my_async_group") as group_manager:
|
||||
await chain.arun(product="cars", callbacks=group_manager)
|
||||
await chain.arun(product="computers", callbacks=group_manager)
|
||||
await chain.arun(product="toys", callbacks=group_manager)
|
||||
|
||||
async with atrace_as_chain_group(
|
||||
"my_async_group_2", inputs={"input": "toys"}
|
||||
) as group_manager:
|
||||
res = await asyncio.gather(
|
||||
*[
|
||||
chain.arun(product="toys", callbacks=group_manager),
|
||||
chain.arun(product="computers", callbacks=group_manager),
|
||||
chain.arun(product="cars", callbacks=group_manager),
|
||||
]
|
||||
)
|
||||
await group_manager.on_chain_end({"output": res})
|
||||
@@ -0,0 +1,68 @@
|
||||
"""Integration tests for the langchain tracer module."""
|
||||
import asyncio
|
||||
|
||||
|
||||
from langchain_community.callbacks import get_openai_callback
|
||||
from langchain_community.llms import OpenAI
|
||||
|
||||
|
||||
async def test_openai_callback() -> None:
|
||||
llm = OpenAI(temperature=0)
|
||||
with get_openai_callback() as cb:
|
||||
llm("What is the square root of 4?")
|
||||
|
||||
total_tokens = cb.total_tokens
|
||||
assert total_tokens > 0
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
llm("What is the square root of 4?")
|
||||
llm("What is the square root of 4?")
|
||||
|
||||
assert cb.total_tokens == total_tokens * 2
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
await asyncio.gather(
|
||||
*[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
|
||||
)
|
||||
|
||||
assert cb.total_tokens == total_tokens * 3
|
||||
|
||||
task = asyncio.create_task(llm.agenerate(["What is the square root of 4?"]))
|
||||
with get_openai_callback() as cb:
|
||||
await llm.agenerate(["What is the square root of 4?"])
|
||||
|
||||
await task
|
||||
assert cb.total_tokens == total_tokens
|
||||
|
||||
|
||||
def test_openai_callback_batch_llm() -> None:
|
||||
llm = OpenAI(temperature=0)
|
||||
with get_openai_callback() as cb:
|
||||
llm.generate(["What is the square root of 4?", "What is the square root of 4?"])
|
||||
|
||||
assert cb.total_tokens > 0
|
||||
total_tokens = cb.total_tokens
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
llm("What is the square root of 4?")
|
||||
llm("What is the square root of 4?")
|
||||
|
||||
assert cb.total_tokens == total_tokens
|
||||
|
||||
|
||||
def test_openai_callback_agent() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(["serpapi", "llm-math"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
with get_openai_callback() as cb:
|
||||
agent.run(
|
||||
"Who is Olivia Wilde's boyfriend? "
|
||||
"What is his current age raised to the 0.23 power?"
|
||||
)
|
||||
print(f"Total Tokens: {cb.total_tokens}")
|
||||
print(f"Prompt Tokens: {cb.prompt_tokens}")
|
||||
print(f"Completion Tokens: {cb.completion_tokens}")
|
||||
print(f"Total Cost (USD): ${cb.total_cost}")
|
||||
@@ -0,0 +1,30 @@
|
||||
"""Integration tests for the StreamlitCallbackHandler module."""
|
||||
|
||||
import pytest
|
||||
|
||||
# Import the internal StreamlitCallbackHandler from its module - and not from
|
||||
# the `langchain_community.callbacks.streamlit` package - so that we don't end up using
|
||||
# Streamlit's externally-provided callback handler.
|
||||
from langchain_community.callbacks.streamlit.streamlit_callback_handler import (
|
||||
StreamlitCallbackHandler,
|
||||
)
|
||||
from langchain_community.llms import OpenAI
|
||||
|
||||
|
||||
@pytest.mark.requires("streamlit")
|
||||
def test_streamlit_callback_agent() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
import streamlit as st
|
||||
|
||||
streamlit_callback = StreamlitCallbackHandler(st.container())
|
||||
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(["serpapi", "llm-math"], llm=llm)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
agent.run(
|
||||
"Who is Olivia Wilde's boyfriend? "
|
||||
"What is his current age raised to the 0.23 power?",
|
||||
callbacks=[streamlit_callback],
|
||||
)
|
||||
@@ -0,0 +1,118 @@
|
||||
"""Integration tests for the langchain tracer module."""
|
||||
import asyncio
|
||||
import os
|
||||
|
||||
from aiohttp import ClientSession
|
||||
from langchain_community.callbacks import wandb_tracing_enabled
|
||||
|
||||
from langchain_community.llms import OpenAI
|
||||
|
||||
questions = [
|
||||
(
|
||||
"Who won the US Open men's final in 2019? "
|
||||
"What is his age raised to the 0.334 power?"
|
||||
),
|
||||
(
|
||||
"Who is Olivia Wilde's boyfriend? "
|
||||
"What is his current age raised to the 0.23 power?"
|
||||
),
|
||||
(
|
||||
"Who won the most recent formula 1 grand prix? "
|
||||
"What is their age raised to the 0.23 power?"
|
||||
),
|
||||
(
|
||||
"Who won the US Open women's final in 2019? "
|
||||
"What is her age raised to the 0.34 power?"
|
||||
),
|
||||
("Who is Beyonce's husband? " "What is his age raised to the 0.19 power?"),
|
||||
]
|
||||
|
||||
|
||||
def test_tracing_sequential() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
|
||||
os.environ["WANDB_PROJECT"] = "langchain-tracing"
|
||||
|
||||
for q in questions[:3]:
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(
|
||||
["llm-math", "serpapi"],
|
||||
llm=llm,
|
||||
)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
agent.run(q)
|
||||
|
||||
|
||||
def test_tracing_session_env_var() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
|
||||
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(
|
||||
["llm-math", "serpapi"],
|
||||
llm=llm,
|
||||
)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
agent.run(questions[0])
|
||||
|
||||
|
||||
async def test_tracing_concurrent() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
|
||||
aiosession = ClientSession()
|
||||
llm = OpenAI(temperature=0)
|
||||
async_tools = load_tools(
|
||||
["llm-math", "serpapi"],
|
||||
llm=llm,
|
||||
aiosession=aiosession,
|
||||
)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
tasks = [agent.arun(q) for q in questions[:3]]
|
||||
await asyncio.gather(*tasks)
|
||||
await aiosession.close()
|
||||
|
||||
|
||||
def test_tracing_context_manager() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
llm = OpenAI(temperature=0)
|
||||
tools = load_tools(
|
||||
["llm-math", "serpapi"],
|
||||
llm=llm,
|
||||
)
|
||||
agent = initialize_agent(
|
||||
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
if "LANGCHAIN_WANDB_TRACING" in os.environ:
|
||||
del os.environ["LANGCHAIN_WANDB_TRACING"]
|
||||
with wandb_tracing_enabled():
|
||||
agent.run(questions[0]) # this should be traced
|
||||
|
||||
agent.run(questions[0]) # this should not be traced
|
||||
|
||||
|
||||
async def test_tracing_context_manager_async() -> None:
|
||||
from langchain.agents import AgentType, initialize_agent, load_tools
|
||||
llm = OpenAI(temperature=0)
|
||||
async_tools = load_tools(
|
||||
["llm-math", "serpapi"],
|
||||
llm=llm,
|
||||
)
|
||||
agent = initialize_agent(
|
||||
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
|
||||
)
|
||||
if "LANGCHAIN_WANDB_TRACING" in os.environ:
|
||||
del os.environ["LANGCHAIN_TRACING"]
|
||||
|
||||
# start a background task
|
||||
task = asyncio.create_task(agent.arun(questions[0])) # this should not be traced
|
||||
with wandb_tracing_enabled():
|
||||
tasks = [agent.arun(q) for q in questions[1:4]] # these should be traced
|
||||
await asyncio.gather(*tasks)
|
||||
|
||||
await task
|
||||
@@ -0,0 +1,333 @@
|
||||
"""Test ChatOpenAI wrapper."""
|
||||
from typing import Any, Optional
|
||||
|
||||
import pytest
|
||||
from langchain_core.callbacks import CallbackManager
|
||||
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
|
||||
from langchain_core.outputs import (
|
||||
ChatGeneration,
|
||||
ChatResult,
|
||||
LLMResult,
|
||||
)
|
||||
from langchain_core.prompts import ChatPromptTemplate
|
||||
from langchain_core.pydantic_v1 import BaseModel, Field
|
||||
|
||||
from langchain_community.chat_models.openai import ChatOpenAI
|
||||
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_chat_openai() -> None:
|
||||
"""Test ChatOpenAI wrapper."""
|
||||
chat = ChatOpenAI(
|
||||
temperature=0.7,
|
||||
base_url=None,
|
||||
organization=None,
|
||||
openai_proxy=None,
|
||||
timeout=10.0,
|
||||
max_retries=3,
|
||||
http_client=None,
|
||||
n=1,
|
||||
max_tokens=10,
|
||||
default_headers=None,
|
||||
default_query=None,
|
||||
)
|
||||
message = HumanMessage(content="Hello")
|
||||
response = chat([message])
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_chat_openai_model() -> None:
|
||||
"""Test ChatOpenAI wrapper handles model_name."""
|
||||
chat = ChatOpenAI(model="foo")
|
||||
assert chat.model_name == "foo"
|
||||
chat = ChatOpenAI(model_name="bar")
|
||||
assert chat.model_name == "bar"
|
||||
|
||||
|
||||
def test_chat_openai_system_message() -> None:
|
||||
"""Test ChatOpenAI wrapper with system message."""
|
||||
chat = ChatOpenAI(max_tokens=10)
|
||||
system_message = SystemMessage(content="You are to chat with the user.")
|
||||
human_message = HumanMessage(content="Hello")
|
||||
response = chat([system_message, human_message])
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_chat_openai_generate() -> None:
|
||||
"""Test ChatOpenAI wrapper with generate."""
|
||||
chat = ChatOpenAI(max_tokens=10, n=2)
|
||||
message = HumanMessage(content="Hello")
|
||||
response = chat.generate([[message], [message]])
|
||||
assert isinstance(response, LLMResult)
|
||||
assert len(response.generations) == 2
|
||||
assert response.llm_output
|
||||
for generations in response.generations:
|
||||
assert len(generations) == 2
|
||||
for generation in generations:
|
||||
assert isinstance(generation, ChatGeneration)
|
||||
assert isinstance(generation.text, str)
|
||||
assert generation.text == generation.message.content
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_chat_openai_multiple_completions() -> None:
|
||||
"""Test ChatOpenAI wrapper with multiple completions."""
|
||||
chat = ChatOpenAI(max_tokens=10, n=5)
|
||||
message = HumanMessage(content="Hello")
|
||||
response = chat._generate([message])
|
||||
assert isinstance(response, ChatResult)
|
||||
assert len(response.generations) == 5
|
||||
for generation in response.generations:
|
||||
assert isinstance(generation.message, BaseMessage)
|
||||
assert isinstance(generation.message.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_chat_openai_streaming() -> None:
|
||||
"""Test that streaming correctly invokes on_llm_new_token callback."""
|
||||
callback_handler = FakeCallbackHandler()
|
||||
callback_manager = CallbackManager([callback_handler])
|
||||
chat = ChatOpenAI(
|
||||
max_tokens=10,
|
||||
streaming=True,
|
||||
temperature=0,
|
||||
callback_manager=callback_manager,
|
||||
verbose=True,
|
||||
)
|
||||
message = HumanMessage(content="Hello")
|
||||
response = chat([message])
|
||||
assert callback_handler.llm_streams > 0
|
||||
assert isinstance(response, BaseMessage)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_chat_openai_streaming_generation_info() -> None:
|
||||
"""Test that generation info is preserved when streaming."""
|
||||
|
||||
class _FakeCallback(FakeCallbackHandler):
|
||||
saved_things: dict = {}
|
||||
|
||||
def on_llm_end(
|
||||
self,
|
||||
*args: Any,
|
||||
**kwargs: Any,
|
||||
) -> Any:
|
||||
# Save the generation
|
||||
self.saved_things["generation"] = args[0]
|
||||
|
||||
callback = _FakeCallback()
|
||||
callback_manager = CallbackManager([callback])
|
||||
chat = ChatOpenAI(
|
||||
max_tokens=2,
|
||||
temperature=0,
|
||||
callback_manager=callback_manager,
|
||||
)
|
||||
list(chat.stream("hi"))
|
||||
generation = callback.saved_things["generation"]
|
||||
# `Hello!` is two tokens, assert that that is what is returned
|
||||
assert generation.generations[0][0].text == "Hello!"
|
||||
|
||||
|
||||
def test_chat_openai_llm_output_contains_model_name() -> None:
|
||||
"""Test llm_output contains model_name."""
|
||||
chat = ChatOpenAI(max_tokens=10)
|
||||
message = HumanMessage(content="Hello")
|
||||
llm_result = chat.generate([[message]])
|
||||
assert llm_result.llm_output is not None
|
||||
assert llm_result.llm_output["model_name"] == chat.model_name
|
||||
|
||||
|
||||
def test_chat_openai_streaming_llm_output_contains_model_name() -> None:
|
||||
"""Test llm_output contains model_name."""
|
||||
chat = ChatOpenAI(max_tokens=10, streaming=True)
|
||||
message = HumanMessage(content="Hello")
|
||||
llm_result = chat.generate([[message]])
|
||||
assert llm_result.llm_output is not None
|
||||
assert llm_result.llm_output["model_name"] == chat.model_name
|
||||
|
||||
|
||||
def test_chat_openai_invalid_streaming_params() -> None:
|
||||
"""Test that streaming correctly invokes on_llm_new_token callback."""
|
||||
with pytest.raises(ValueError):
|
||||
ChatOpenAI(
|
||||
max_tokens=10,
|
||||
streaming=True,
|
||||
temperature=0,
|
||||
n=5,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_async_chat_openai() -> None:
|
||||
"""Test async generation."""
|
||||
chat = ChatOpenAI(max_tokens=10, n=2)
|
||||
message = HumanMessage(content="Hello")
|
||||
response = await chat.agenerate([[message], [message]])
|
||||
assert isinstance(response, LLMResult)
|
||||
assert len(response.generations) == 2
|
||||
assert response.llm_output
|
||||
for generations in response.generations:
|
||||
assert len(generations) == 2
|
||||
for generation in generations:
|
||||
assert isinstance(generation, ChatGeneration)
|
||||
assert isinstance(generation.text, str)
|
||||
assert generation.text == generation.message.content
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_async_chat_openai_streaming() -> None:
|
||||
"""Test that streaming correctly invokes on_llm_new_token callback."""
|
||||
callback_handler = FakeCallbackHandler()
|
||||
callback_manager = CallbackManager([callback_handler])
|
||||
chat = ChatOpenAI(
|
||||
max_tokens=10,
|
||||
streaming=True,
|
||||
temperature=0,
|
||||
callback_manager=callback_manager,
|
||||
verbose=True,
|
||||
)
|
||||
message = HumanMessage(content="Hello")
|
||||
response = await chat.agenerate([[message], [message]])
|
||||
assert callback_handler.llm_streams > 0
|
||||
assert isinstance(response, LLMResult)
|
||||
assert len(response.generations) == 2
|
||||
for generations in response.generations:
|
||||
assert len(generations) == 1
|
||||
for generation in generations:
|
||||
assert isinstance(generation, ChatGeneration)
|
||||
assert isinstance(generation.text, str)
|
||||
assert generation.text == generation.message.content
|
||||
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_async_chat_openai_bind_functions() -> None:
|
||||
"""Test ChatOpenAI wrapper with multiple completions."""
|
||||
|
||||
class Person(BaseModel):
|
||||
"""Identifying information about a person."""
|
||||
|
||||
name: str = Field(..., title="Name", description="The person's name")
|
||||
age: int = Field(..., title="Age", description="The person's age")
|
||||
fav_food: Optional[str] = Field(
|
||||
default=None, title="Fav Food", description="The person's favorite food"
|
||||
)
|
||||
|
||||
chat = ChatOpenAI(
|
||||
max_tokens=30,
|
||||
n=1,
|
||||
streaming=True,
|
||||
).bind_functions(functions=[Person], function_call="Person")
|
||||
|
||||
prompt = ChatPromptTemplate.from_messages(
|
||||
[
|
||||
("system", "Use the provided Person function"),
|
||||
("user", "{input}"),
|
||||
]
|
||||
)
|
||||
|
||||
chain = prompt | chat
|
||||
|
||||
message = HumanMessage(content="Sally is 13 years old")
|
||||
response = await chain.abatch([{"input": message}])
|
||||
|
||||
assert isinstance(response, list)
|
||||
assert len(response) == 1
|
||||
for generation in response:
|
||||
assert isinstance(generation, AIMessage)
|
||||
|
||||
|
||||
def test_chat_openai_extra_kwargs() -> None:
|
||||
"""Test extra kwargs to chat openai."""
|
||||
# Check that foo is saved in extra_kwargs.
|
||||
llm = ChatOpenAI(foo=3, max_tokens=10)
|
||||
assert llm.max_tokens == 10
|
||||
assert llm.model_kwargs == {"foo": 3}
|
||||
|
||||
# Test that if extra_kwargs are provided, they are added to it.
|
||||
llm = ChatOpenAI(foo=3, model_kwargs={"bar": 2})
|
||||
assert llm.model_kwargs == {"foo": 3, "bar": 2}
|
||||
|
||||
# Test that if provided twice it errors
|
||||
with pytest.raises(ValueError):
|
||||
ChatOpenAI(foo=3, model_kwargs={"foo": 2})
|
||||
|
||||
# Test that if explicit param is specified in kwargs it errors
|
||||
with pytest.raises(ValueError):
|
||||
ChatOpenAI(model_kwargs={"temperature": 0.2})
|
||||
|
||||
# Test that "model" cannot be specified in kwargs
|
||||
with pytest.raises(ValueError):
|
||||
ChatOpenAI(model_kwargs={"model": "text-davinci-003"})
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_openai_streaming() -> None:
|
||||
"""Test streaming tokens from OpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
for token in llm.stream("I'm Pickle Rick"):
|
||||
assert isinstance(token.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_openai_astream() -> None:
|
||||
"""Test streaming tokens from OpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
async for token in llm.astream("I'm Pickle Rick"):
|
||||
assert isinstance(token.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_openai_abatch() -> None:
|
||||
"""Test streaming tokens from ChatOpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
result = await llm.abatch(["I'm Pickle Rick", "I'm not Pickle Rick"])
|
||||
for token in result:
|
||||
assert isinstance(token.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_openai_abatch_tags() -> None:
|
||||
"""Test batch tokens from ChatOpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
result = await llm.abatch(
|
||||
["I'm Pickle Rick", "I'm not Pickle Rick"], config={"tags": ["foo"]}
|
||||
)
|
||||
for token in result:
|
||||
assert isinstance(token.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_openai_batch() -> None:
|
||||
"""Test batch tokens from ChatOpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
result = llm.batch(["I'm Pickle Rick", "I'm not Pickle Rick"])
|
||||
for token in result:
|
||||
assert isinstance(token.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_openai_ainvoke() -> None:
|
||||
"""Test invoke tokens from ChatOpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
result = await llm.ainvoke("I'm Pickle Rick", config={"tags": ["foo"]})
|
||||
assert isinstance(result.content, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_openai_invoke() -> None:
|
||||
"""Test invoke tokens from ChatOpenAI."""
|
||||
llm = ChatOpenAI(max_tokens=10)
|
||||
|
||||
result = llm.invoke("I'm Pickle Rick", config=dict(tags=["foo"]))
|
||||
assert isinstance(result.content, str)
|
||||
@@ -0,0 +1,219 @@
|
||||
"""Test Baidu Qianfan Chat Endpoint."""
|
||||
|
||||
from typing import Any
|
||||
|
||||
from langchain_core.callbacks import CallbackManager
|
||||
from langchain_core.messages import (
|
||||
AIMessage,
|
||||
BaseMessage,
|
||||
FunctionMessage,
|
||||
HumanMessage,
|
||||
)
|
||||
from langchain_core.outputs import ChatGeneration, LLMResult
|
||||
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
|
||||
|
||||
from langchain_community.chat_models.baidu_qianfan_endpoint import QianfanChatEndpoint
|
||||
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
|
||||
|
||||
_FUNCTIONS: Any = [
|
||||
{
|
||||
"name": "format_person_info",
|
||||
"description": (
|
||||
"Output formatter. Should always be used to format your response to the"
|
||||
" user."
|
||||
),
|
||||
"parameters": {
|
||||
"title": "Person",
|
||||
"description": "Identifying information about a person.",
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {
|
||||
"title": "Name",
|
||||
"description": "The person's name",
|
||||
"type": "string",
|
||||
},
|
||||
"age": {
|
||||
"title": "Age",
|
||||
"description": "The person's age",
|
||||
"type": "integer",
|
||||
},
|
||||
"fav_food": {
|
||||
"title": "Fav Food",
|
||||
"description": "The person's favorite food",
|
||||
"type": "string",
|
||||
},
|
||||
},
|
||||
"required": ["name", "age"],
|
||||
},
|
||||
},
|
||||
{
|
||||
"name": "get_current_temperature",
|
||||
"description": ("Used to get the location's temperature."),
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "city name",
|
||||
},
|
||||
"unit": {
|
||||
"type": "string",
|
||||
"enum": ["centigrade", "Fahrenheit"],
|
||||
},
|
||||
},
|
||||
"required": ["location", "unit"],
|
||||
},
|
||||
"responses": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"temperature": {
|
||||
"type": "integer",
|
||||
"description": "city temperature",
|
||||
},
|
||||
"unit": {
|
||||
"type": "string",
|
||||
"enum": ["centigrade", "Fahrenheit"],
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
def test_default_call() -> None:
|
||||
"""Test default model(`ERNIE-Bot`) call."""
|
||||
chat = QianfanChatEndpoint()
|
||||
response = chat(messages=[HumanMessage(content="Hello")])
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_model() -> None:
|
||||
"""Test model kwarg works."""
|
||||
chat = QianfanChatEndpoint(model="BLOOMZ-7B")
|
||||
response = chat(messages=[HumanMessage(content="Hello")])
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_model_param() -> None:
|
||||
"""Test model params works."""
|
||||
chat = QianfanChatEndpoint()
|
||||
response = chat(model="BLOOMZ-7B", messages=[HumanMessage(content="Hello")])
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_endpoint() -> None:
|
||||
"""Test user custom model deployments like some open source models."""
|
||||
chat = QianfanChatEndpoint(endpoint="qianfan_bloomz_7b_compressed")
|
||||
response = chat(messages=[HumanMessage(content="Hello")])
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_endpoint_param() -> None:
|
||||
"""Test user custom model deployments like some open source models."""
|
||||
chat = QianfanChatEndpoint()
|
||||
response = chat(
|
||||
messages=[
|
||||
HumanMessage(endpoint="qianfan_bloomz_7b_compressed", content="Hello")
|
||||
]
|
||||
)
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_multiple_history() -> None:
|
||||
"""Tests multiple history works."""
|
||||
chat = QianfanChatEndpoint()
|
||||
|
||||
response = chat(
|
||||
messages=[
|
||||
HumanMessage(content="Hello."),
|
||||
AIMessage(content="Hello!"),
|
||||
HumanMessage(content="How are you doing?"),
|
||||
]
|
||||
)
|
||||
assert isinstance(response, BaseMessage)
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_stream() -> None:
|
||||
"""Test that stream works."""
|
||||
chat = QianfanChatEndpoint(streaming=True)
|
||||
callback_handler = FakeCallbackHandler()
|
||||
callback_manager = CallbackManager([callback_handler])
|
||||
response = chat(
|
||||
messages=[
|
||||
HumanMessage(content="Hello."),
|
||||
AIMessage(content="Hello!"),
|
||||
HumanMessage(content="Who are you?"),
|
||||
],
|
||||
stream=True,
|
||||
callbacks=callback_manager,
|
||||
)
|
||||
assert callback_handler.llm_streams > 0
|
||||
assert isinstance(response.content, str)
|
||||
|
||||
|
||||
def test_multiple_messages() -> None:
|
||||
"""Tests multiple messages works."""
|
||||
chat = QianfanChatEndpoint()
|
||||
message = HumanMessage(content="Hi, how are you.")
|
||||
response = chat.generate([[message], [message]])
|
||||
|
||||
assert isinstance(response, LLMResult)
|
||||
assert len(response.generations) == 2
|
||||
for generations in response.generations:
|
||||
assert len(generations) == 1
|
||||
for generation in generations:
|
||||
assert isinstance(generation, ChatGeneration)
|
||||
assert isinstance(generation.text, str)
|
||||
assert generation.text == generation.message.content
|
||||
|
||||
|
||||
def test_functions_call_thoughts() -> None:
|
||||
chat = QianfanChatEndpoint(model="ERNIE-Bot")
|
||||
|
||||
prompt_tmpl = "Use the given functions to answer following question: {input}"
|
||||
prompt_msgs = [
|
||||
HumanMessagePromptTemplate.from_template(prompt_tmpl),
|
||||
]
|
||||
prompt = ChatPromptTemplate(messages=prompt_msgs)
|
||||
|
||||
chain = prompt | chat.bind(functions=_FUNCTIONS)
|
||||
|
||||
message = HumanMessage(content="What's the temperature in Shanghai today?")
|
||||
response = chain.batch([{"input": message}])
|
||||
assert isinstance(response[0], AIMessage)
|
||||
assert "function_call" in response[0].additional_kwargs
|
||||
|
||||
|
||||
def test_functions_call() -> None:
|
||||
chat = QianfanChatEndpoint(model="ERNIE-Bot")
|
||||
|
||||
prompt = ChatPromptTemplate(
|
||||
messages=[
|
||||
HumanMessage(content="What's the temperature in Shanghai today?"),
|
||||
AIMessage(
|
||||
content="",
|
||||
additional_kwargs={
|
||||
"function_call": {
|
||||
"name": "get_current_temperature",
|
||||
"thoughts": "i will use get_current_temperature "
|
||||
"to resolve the questions",
|
||||
"arguments": '{"location":"Shanghai","unit":"centigrade"}',
|
||||
}
|
||||
},
|
||||
),
|
||||
FunctionMessage(
|
||||
name="get_current_weather",
|
||||
content='{"temperature": "25", \
|
||||
"unit": "摄氏度", "description": "晴朗"}',
|
||||
),
|
||||
]
|
||||
)
|
||||
chain = prompt | chat.bind(functions=_FUNCTIONS)
|
||||
resp = chain.invoke({})
|
||||
assert isinstance(resp, AIMessage)
|
||||
@@ -0,0 +1,182 @@
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from langchain_community.document_loaders.concurrent import ConcurrentLoader
|
||||
from langchain_community.document_loaders.generic import GenericLoader
|
||||
from langchain_community.document_loaders.parsers import LanguageParser
|
||||
|
||||
|
||||
def test_language_loader_for_python() -> None:
|
||||
"""Test Python loader with parser enabled."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 2
|
||||
|
||||
metadata = docs[0].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.py")
|
||||
assert metadata["content_type"] == "functions_classes"
|
||||
assert metadata["language"] == "python"
|
||||
metadata = docs[1].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.py")
|
||||
assert metadata["content_type"] == "simplified_code"
|
||||
assert metadata["language"] == "python"
|
||||
|
||||
assert (
|
||||
docs[0].page_content
|
||||
== """def main():
|
||||
print("Hello World!")
|
||||
|
||||
return 0"""
|
||||
)
|
||||
assert (
|
||||
docs[1].page_content
|
||||
== """#!/usr/bin/env python3
|
||||
|
||||
import sys
|
||||
|
||||
|
||||
# Code for: def main():
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())"""
|
||||
)
|
||||
|
||||
|
||||
def test_language_loader_for_python_with_parser_threshold() -> None:
|
||||
"""Test Python loader with parser enabled and below threshold."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path,
|
||||
glob="hello_world.py",
|
||||
parser=LanguageParser(language="python", parser_threshold=1000),
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
def esprima_installed() -> bool:
|
||||
try:
|
||||
import esprima # noqa: F401
|
||||
|
||||
return True
|
||||
except Exception as e:
|
||||
print(f"esprima not installed, skipping test {e}")
|
||||
return False
|
||||
|
||||
|
||||
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
|
||||
def test_language_loader_for_javascript() -> None:
|
||||
"""Test JavaScript loader with parser enabled."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 3
|
||||
|
||||
metadata = docs[0].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.js")
|
||||
assert metadata["content_type"] == "functions_classes"
|
||||
assert metadata["language"] == "js"
|
||||
metadata = docs[1].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.js")
|
||||
assert metadata["content_type"] == "functions_classes"
|
||||
assert metadata["language"] == "js"
|
||||
metadata = docs[2].metadata
|
||||
assert metadata["source"] == str(file_path / "hello_world.js")
|
||||
assert metadata["content_type"] == "simplified_code"
|
||||
assert metadata["language"] == "js"
|
||||
|
||||
assert (
|
||||
docs[0].page_content
|
||||
== """class HelloWorld {
|
||||
sayHello() {
|
||||
console.log("Hello World!");
|
||||
}
|
||||
}"""
|
||||
)
|
||||
assert (
|
||||
docs[1].page_content
|
||||
== """function main() {
|
||||
const hello = new HelloWorld();
|
||||
hello.sayHello();
|
||||
}"""
|
||||
)
|
||||
assert (
|
||||
docs[2].page_content
|
||||
== """// Code for: class HelloWorld {
|
||||
|
||||
// Code for: function main() {
|
||||
|
||||
main();"""
|
||||
)
|
||||
|
||||
|
||||
def test_language_loader_for_javascript_with_parser_threshold() -> None:
|
||||
"""Test JavaScript loader with parser enabled and below threshold."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = GenericLoader.from_filesystem(
|
||||
file_path,
|
||||
glob="hello_world.js",
|
||||
parser=LanguageParser(language="js", parser_threshold=1000),
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
def test_concurrent_language_loader_for_javascript_with_parser_threshold() -> None:
|
||||
"""Test JavaScript ConcurrentLoader with parser enabled and below threshold."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = ConcurrentLoader.from_filesystem(
|
||||
file_path,
|
||||
glob="hello_world.js",
|
||||
parser=LanguageParser(language="js", parser_threshold=1000),
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
def test_concurrent_language_loader_for_python_with_parser_threshold() -> None:
|
||||
"""Test Python ConcurrentLoader with parser enabled and below threshold."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = ConcurrentLoader.from_filesystem(
|
||||
file_path,
|
||||
glob="hello_world.py",
|
||||
parser=LanguageParser(language="python", parser_threshold=1000),
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 1
|
||||
|
||||
|
||||
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
|
||||
def test_concurrent_language_loader_for_javascript() -> None:
|
||||
"""Test JavaScript ConcurrentLoader with parser enabled."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = ConcurrentLoader.from_filesystem(
|
||||
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 3
|
||||
|
||||
|
||||
def test_concurrent_language_loader_for_python() -> None:
|
||||
"""Test Python ConcurrentLoader with parser enabled."""
|
||||
file_path = Path(__file__).parent.parent.parent / "examples"
|
||||
loader = ConcurrentLoader.from_filesystem(
|
||||
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
|
||||
)
|
||||
docs = loader.load()
|
||||
|
||||
assert len(docs) == 2
|
||||
@@ -0,0 +1,136 @@
|
||||
"""Test Fireworks AI API Wrapper."""
|
||||
from typing import Generator
|
||||
|
||||
import pytest
|
||||
from langchain_core.outputs import LLMResult
|
||||
|
||||
from langchain_community.llms.fireworks import Fireworks
|
||||
|
||||
@pytest.fixture
|
||||
def llm() -> Fireworks:
|
||||
return Fireworks(model_kwargs={"temperature": 0, "max_tokens": 512})
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_call(llm: Fireworks) -> None:
|
||||
"""Test valid call to fireworks."""
|
||||
output = llm("How is the weather in New York today?")
|
||||
assert isinstance(output, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_model_param() -> None:
|
||||
"""Tests model parameters for Fireworks"""
|
||||
llm = Fireworks(model="foo")
|
||||
assert llm.model == "foo"
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_invoke(llm: Fireworks) -> None:
|
||||
"""Tests completion with invoke"""
|
||||
output = llm.invoke("How is the weather in New York today?", stop=[","])
|
||||
assert isinstance(output, str)
|
||||
assert output[-1] == ","
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_fireworks_ainvoke(llm: Fireworks) -> None:
|
||||
"""Tests completion with invoke"""
|
||||
output = await llm.ainvoke("How is the weather in New York today?", stop=[","])
|
||||
assert isinstance(output, str)
|
||||
assert output[-1] == ","
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_batch(llm: Fireworks) -> None:
|
||||
"""Tests completion with invoke"""
|
||||
llm = Fireworks()
|
||||
output = llm.batch(
|
||||
[
|
||||
"How is the weather in New York today?",
|
||||
"How is the weather in New York today?",
|
||||
],
|
||||
stop=[","],
|
||||
)
|
||||
for token in output:
|
||||
assert isinstance(token, str)
|
||||
assert token[-1] == ","
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_fireworks_abatch(llm: Fireworks) -> None:
|
||||
"""Tests completion with invoke"""
|
||||
output = await llm.abatch(
|
||||
[
|
||||
"How is the weather in New York today?",
|
||||
"How is the weather in New York today?",
|
||||
],
|
||||
stop=[","],
|
||||
)
|
||||
for token in output:
|
||||
assert isinstance(token, str)
|
||||
assert token[-1] == ","
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_multiple_prompts(
|
||||
llm: Fireworks,
|
||||
) -> None:
|
||||
"""Test completion with multiple prompts."""
|
||||
output = llm.generate(["How is the weather in New York today?", "I'm pickle rick"])
|
||||
assert isinstance(output, LLMResult)
|
||||
assert isinstance(output.generations, list)
|
||||
assert len(output.generations) == 2
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_streaming(llm: Fireworks) -> None:
|
||||
"""Test stream completion."""
|
||||
generator = llm.stream("Who's the best quarterback in the NFL?")
|
||||
assert isinstance(generator, Generator)
|
||||
|
||||
for token in generator:
|
||||
assert isinstance(token, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_fireworks_streaming_stop_words(llm: Fireworks) -> None:
|
||||
"""Test stream completion with stop words."""
|
||||
generator = llm.stream("Who's the best quarterback in the NFL?", stop=[","])
|
||||
assert isinstance(generator, Generator)
|
||||
|
||||
last_token = ""
|
||||
for token in generator:
|
||||
last_token = token
|
||||
assert isinstance(token, str)
|
||||
assert last_token[-1] == ","
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_fireworks_streaming_async(llm: Fireworks) -> None:
|
||||
"""Test stream completion."""
|
||||
|
||||
last_token = ""
|
||||
async for token in llm.astream(
|
||||
"Who's the best quarterback in the NFL?", stop=[","]
|
||||
):
|
||||
last_token = token
|
||||
assert isinstance(token, str)
|
||||
assert last_token[-1] == ","
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_fireworks_async_agenerate(llm: Fireworks) -> None:
|
||||
"""Test async."""
|
||||
output = await llm.agenerate(["What is the best city to live in California?"])
|
||||
assert isinstance(output, LLMResult)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_fireworks_multiple_prompts_async_agenerate(llm: Fireworks) -> None:
|
||||
output = await llm.agenerate(
|
||||
["How is the weather in New York today?", "I'm pickle rick"]
|
||||
)
|
||||
assert isinstance(output, LLMResult)
|
||||
assert isinstance(output.generations, list)
|
||||
assert len(output.generations) == 2
|
||||
@@ -0,0 +1,77 @@
|
||||
import langchain_community.utilities.opaqueprompts as op
|
||||
from langchain_core.output_parsers import StrOutputParser
|
||||
from langchain_core.prompts import PromptTemplate
|
||||
from langchain_core.runnables import RunnableParallel
|
||||
|
||||
from langchain_community.llms import OpenAI
|
||||
from langchain_community.llms.opaqueprompts import OpaquePrompts
|
||||
|
||||
prompt_template = """
|
||||
As an AI assistant, you will answer questions according to given context.
|
||||
|
||||
Sensitive personal information in the question is masked for privacy.
|
||||
For instance, if the original text says "Giana is good," it will be changed
|
||||
to "PERSON_998 is good."
|
||||
|
||||
Here's how to handle these changes:
|
||||
* Consider these masked phrases just as placeholders, but still refer to
|
||||
them in a relevant way when answering.
|
||||
* It's possible that different masked terms might mean the same thing.
|
||||
Stick with the given term and don't modify it.
|
||||
* All masked terms follow the "TYPE_ID" pattern.
|
||||
* Please don't invent new masked terms. For instance, if you see "PERSON_998,"
|
||||
don't come up with "PERSON_997" or "PERSON_999" unless they're already in the question.
|
||||
|
||||
Conversation History: ```{history}```
|
||||
Context : ```During our recent meeting on February 23, 2023, at 10:30 AM,
|
||||
John Doe provided me with his personal details. His email is johndoe@example.com
|
||||
and his contact number is 650-456-7890. He lives in New York City, USA, and
|
||||
belongs to the American nationality with Christian beliefs and a leaning towards
|
||||
the Democratic party. He mentioned that he recently made a transaction using his
|
||||
credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address
|
||||
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he
|
||||
noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided
|
||||
his website as https://johndoeportfolio.com. John also discussed
|
||||
some of his US-specific details. He said his bank account number is
|
||||
1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321,
|
||||
and he recently renewed his passport,
|
||||
the number for which is 123456789. He emphasized not to share his SSN, which is
|
||||
669-45-6789. Furthermore, he mentioned that he accesses his work files remotely
|
||||
through the IP 192.168.1.1 and has a medical license number MED-123456. ```
|
||||
Question: ```{question}```
|
||||
"""
|
||||
|
||||
|
||||
def test_opaqueprompts() -> None:
|
||||
chain = PromptTemplate.from_template(prompt_template) | OpaquePrompts(llm=OpenAI())
|
||||
output = chain.invoke(
|
||||
{
|
||||
"question": "Write a text message to remind John to do password reset \
|
||||
for his website through his email to stay secure."
|
||||
}
|
||||
)
|
||||
assert isinstance(output, str)
|
||||
|
||||
|
||||
def test_opaqueprompts_functions() -> None:
|
||||
prompt = (PromptTemplate.from_template(prompt_template),)
|
||||
llm = OpenAI()
|
||||
pg_chain = (
|
||||
op.sanitize
|
||||
| RunnableParallel(
|
||||
secure_context=lambda x: x["secure_context"], # type: ignore
|
||||
response=(lambda x: x["sanitized_input"]) # type: ignore
|
||||
| prompt
|
||||
| llm
|
||||
| StrOutputParser(),
|
||||
)
|
||||
| (lambda x: op.desanitize(x["response"], x["secure_context"]))
|
||||
)
|
||||
|
||||
pg_chain.invoke(
|
||||
{
|
||||
"question": "Write a text message to remind John to do password reset\
|
||||
for his website through his email to stay secure.",
|
||||
"history": "",
|
||||
}
|
||||
)
|
||||
@@ -0,0 +1,42 @@
|
||||
"""Test Nebula API wrapper."""
|
||||
from langchain_community.llms.symblai_nebula import Nebula
|
||||
|
||||
|
||||
def test_symblai_nebula_call() -> None:
|
||||
"""Test valid call to Nebula."""
|
||||
conversation = """Sam: Good morning, team! Let's keep this standup concise.
|
||||
We'll go in the usual order: what you did yesterday,
|
||||
what you plan to do today, and any blockers. Alex, kick us off.
|
||||
Alex: Morning! Yesterday, I wrapped up the UI for the user dashboard.
|
||||
The new charts and widgets are now responsive.
|
||||
I also had a sync with the design team to ensure the final touchups are in
|
||||
line with the brand guidelines. Today, I'll start integrating the frontend with
|
||||
the new API endpoints Rhea was working on.
|
||||
The only blocker is waiting for some final API documentation,
|
||||
but I guess Rhea can update on that.
|
||||
Rhea: Hey, all! Yep, about the API documentation - I completed the majority of
|
||||
the backend work for user data retrieval yesterday.
|
||||
The endpoints are mostly set up, but I need to do a bit more testing today.
|
||||
I'll finalize the API documentation by noon, so that should unblock Alex.
|
||||
After that, I’ll be working on optimizing the database queries
|
||||
for faster data fetching. No other blockers on my end.
|
||||
Sam: Great, thanks Rhea. Do reach out if you need any testing assistance
|
||||
or if there are any hitches with the database.
|
||||
Now, my update: Yesterday, I coordinated with the client to get clarity
|
||||
on some feature requirements. Today, I'll be updating our project roadmap
|
||||
and timelines based on their feedback. Additionally, I'll be sitting with
|
||||
the QA team in the afternoon for preliminary testing.
|
||||
Blocker: I might need both of you to be available for a quick call
|
||||
in case the client wants to discuss the changes live.
|
||||
Alex: Sounds good, Sam. Just let us know a little in advance for the call.
|
||||
Rhea: Agreed. We can make time for that.
|
||||
Sam: Perfect! Let's keep the momentum going. Reach out if there are any
|
||||
sudden issues or support needed. Have a productive day!
|
||||
Alex: You too.
|
||||
Rhea: Thanks, bye!"""
|
||||
llm = Nebula(nebula_api_key="<your_api_key>")
|
||||
|
||||
instruction = """Identify the main objectives mentioned in this
|
||||
conversation."""
|
||||
output = llm.invoke(f"{instruction}\n{conversation}")
|
||||
assert isinstance(output, str)
|
||||
@@ -0,0 +1,151 @@
|
||||
"""Test Vertex AI API wrapper.
|
||||
In order to run this test, you need to install VertexAI SDK:
|
||||
pip install google-cloud-aiplatform>=1.36.0
|
||||
|
||||
Your end-user credentials would be used to make the calls (make sure you've run
|
||||
`gcloud auth login` first).
|
||||
"""
|
||||
import os
|
||||
from typing import Optional
|
||||
|
||||
import pytest
|
||||
from langchain_core.outputs import LLMResult
|
||||
|
||||
from langchain_community.llms import VertexAI, VertexAIModelGarden
|
||||
|
||||
|
||||
def test_vertex_initialization() -> None:
|
||||
llm = VertexAI()
|
||||
assert llm._llm_type == "vertexai"
|
||||
assert llm.model_name == llm.client._model_id
|
||||
|
||||
|
||||
def test_vertex_call() -> None:
|
||||
llm = VertexAI(temperature=0)
|
||||
output = llm("Say foo:")
|
||||
assert isinstance(output, str)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_vertex_generate() -> None:
|
||||
llm = VertexAI(temperature=0.3, n=2, model_name="text-bison@001")
|
||||
output = llm.generate(["Say foo:"])
|
||||
assert isinstance(output, LLMResult)
|
||||
assert len(output.generations) == 1
|
||||
assert len(output.generations[0]) == 2
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_vertex_generate_code() -> None:
|
||||
llm = VertexAI(temperature=0.3, n=2, model_name="code-bison@001")
|
||||
output = llm.generate(["generate a python method that says foo:"])
|
||||
assert isinstance(output, LLMResult)
|
||||
assert len(output.generations) == 1
|
||||
assert len(output.generations[0]) == 2
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
async def test_vertex_agenerate() -> None:
|
||||
llm = VertexAI(temperature=0)
|
||||
output = await llm.agenerate(["Please say foo:"])
|
||||
assert isinstance(output, LLMResult)
|
||||
|
||||
|
||||
@pytest.mark.scheduled
|
||||
def test_vertex_stream() -> None:
|
||||
llm = VertexAI(temperature=0)
|
||||
outputs = list(llm.stream("Please say foo:"))
|
||||
assert isinstance(outputs[0], str)
|
||||
|
||||
|
||||
async def test_vertex_consistency() -> None:
|
||||
llm = VertexAI(temperature=0)
|
||||
output = llm.generate(["Please say foo:"])
|
||||
streaming_output = llm.generate(["Please say foo:"], stream=True)
|
||||
async_output = await llm.agenerate(["Please say foo:"])
|
||||
assert output.generations[0][0].text == streaming_output.generations[0][0].text
|
||||
assert output.generations[0][0].text == async_output.generations[0][0].text
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"endpoint_os_variable_name,result_arg",
|
||||
[("FALCON_ENDPOINT_ID", "generated_text"), ("LLAMA_ENDPOINT_ID", None)],
|
||||
)
|
||||
def test_model_garden(
|
||||
endpoint_os_variable_name: str, result_arg: Optional[str]
|
||||
) -> None:
|
||||
"""In order to run this test, you should provide endpoint names.
|
||||
|
||||
Example:
|
||||
export FALCON_ENDPOINT_ID=...
|
||||
export LLAMA_ENDPOINT_ID=...
|
||||
export PROJECT=...
|
||||
"""
|
||||
endpoint_id = os.environ[endpoint_os_variable_name]
|
||||
project = os.environ["PROJECT"]
|
||||
location = "europe-west4"
|
||||
llm = VertexAIModelGarden(
|
||||
endpoint_id=endpoint_id,
|
||||
project=project,
|
||||
result_arg=result_arg,
|
||||
location=location,
|
||||
)
|
||||
output = llm("What is the meaning of life?")
|
||||
assert isinstance(output, str)
|
||||
assert llm._llm_type == "vertexai_model_garden"
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"endpoint_os_variable_name,result_arg",
|
||||
[("FALCON_ENDPOINT_ID", "generated_text"), ("LLAMA_ENDPOINT_ID", None)],
|
||||
)
|
||||
def test_model_garden_generate(
|
||||
endpoint_os_variable_name: str, result_arg: Optional[str]
|
||||
) -> None:
|
||||
"""In order to run this test, you should provide endpoint names.
|
||||
|
||||
Example:
|
||||
export FALCON_ENDPOINT_ID=...
|
||||
export LLAMA_ENDPOINT_ID=...
|
||||
export PROJECT=...
|
||||
"""
|
||||
endpoint_id = os.environ[endpoint_os_variable_name]
|
||||
project = os.environ["PROJECT"]
|
||||
location = "europe-west4"
|
||||
llm = VertexAIModelGarden(
|
||||
endpoint_id=endpoint_id,
|
||||
project=project,
|
||||
result_arg=result_arg,
|
||||
location=location,
|
||||
)
|
||||
output = llm.generate(["What is the meaning of life?", "How much is 2+2"])
|
||||
assert isinstance(output, LLMResult)
|
||||
assert len(output.generations) == 2
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
@pytest.mark.parametrize(
|
||||
"endpoint_os_variable_name,result_arg",
|
||||
[("FALCON_ENDPOINT_ID", "generated_text"), ("LLAMA_ENDPOINT_ID", None)],
|
||||
)
|
||||
async def test_model_garden_agenerate(
|
||||
endpoint_os_variable_name: str, result_arg: Optional[str]
|
||||
) -> None:
|
||||
endpoint_id = os.environ[endpoint_os_variable_name]
|
||||
project = os.environ["PROJECT"]
|
||||
location = "europe-west4"
|
||||
llm = VertexAIModelGarden(
|
||||
endpoint_id=endpoint_id,
|
||||
project=project,
|
||||
result_arg=result_arg,
|
||||
location=location,
|
||||
)
|
||||
output = await llm.agenerate(["What is the meaning of life?", "How much is 2+2"])
|
||||
assert isinstance(output, LLMResult)
|
||||
assert len(output.generations) == 2
|
||||
|
||||
|
||||
def test_vertex_call_count_tokens() -> None:
|
||||
llm = VertexAI()
|
||||
output = llm.get_num_tokens("How are you?")
|
||||
assert output == 4
|
||||
@@ -0,0 +1,171 @@
|
||||
"""Integration test for Arxiv API Wrapper."""
|
||||
from typing import Any, List
|
||||
|
||||
import pytest
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.tools import BaseTool
|
||||
|
||||
from langchain_community.tools import ArxivQueryRun
|
||||
from langchain_community.utilities import ArxivAPIWrapper
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def api_client() -> ArxivAPIWrapper:
|
||||
return ArxivAPIWrapper()
|
||||
|
||||
|
||||
def test_run_success_paper_name(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test a query of paper name that returns the correct answer"""
|
||||
|
||||
output = api_client.run("Heat-bath random walks with Markov bases")
|
||||
assert "Probability distributions for Markov chains based quantum walks" in output
|
||||
assert (
|
||||
"Transformations of random walks on groups via Markov stopping times" in output
|
||||
)
|
||||
assert (
|
||||
"Recurrence of Multidimensional Persistent Random Walks. Fourier and Series "
|
||||
"Criteria" in output
|
||||
)
|
||||
|
||||
|
||||
def test_run_success_arxiv_identifier(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test a query of an arxiv identifier returns the correct answer"""
|
||||
|
||||
output = api_client.run("1605.08386v1")
|
||||
assert "Heat-bath random walks with Markov bases" in output
|
||||
|
||||
|
||||
def test_run_success_multiple_arxiv_identifiers(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test a query of multiple arxiv identifiers that returns the correct answer"""
|
||||
|
||||
output = api_client.run("1605.08386v1 2212.00794v2 2308.07912")
|
||||
assert "Heat-bath random walks with Markov bases" in output
|
||||
assert "Scaling Language-Image Pre-training via Masking" in output
|
||||
assert (
|
||||
"Ultra-low mass PBHs in the early universe can explain the PTA signal" in output
|
||||
)
|
||||
|
||||
|
||||
def test_run_returns_several_docs(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test that returns several docs"""
|
||||
|
||||
output = api_client.run("Caprice Stanley")
|
||||
assert "On Mixing Behavior of a Family of Random Walks" in output
|
||||
|
||||
|
||||
def test_run_returns_no_result(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test that gives no result."""
|
||||
|
||||
output = api_client.run("1605.08386WWW")
|
||||
assert "No good Arxiv Result was found" == output
|
||||
|
||||
|
||||
def assert_docs(docs: List[Document]) -> None:
|
||||
for doc in docs:
|
||||
assert doc.page_content
|
||||
assert doc.metadata
|
||||
assert set(doc.metadata) == {"Published", "Title", "Authors", "Summary"}
|
||||
|
||||
|
||||
def test_load_success_paper_name(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test a query of paper name that returns one document"""
|
||||
|
||||
docs = api_client.load("Heat-bath random walks with Markov bases")
|
||||
assert len(docs) == 3
|
||||
assert_docs(docs)
|
||||
|
||||
|
||||
def test_load_success_arxiv_identifier(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test a query of an arxiv identifier that returns one document"""
|
||||
|
||||
docs = api_client.load("1605.08386v1")
|
||||
assert len(docs) == 1
|
||||
assert_docs(docs)
|
||||
|
||||
|
||||
def test_load_success_multiple_arxiv_identifiers(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test a query of arxiv identifiers that returns the correct answer"""
|
||||
|
||||
docs = api_client.load("1605.08386v1 2212.00794v2 2308.07912")
|
||||
assert len(docs) == 3
|
||||
assert_docs(docs)
|
||||
|
||||
|
||||
def test_load_returns_no_result(api_client: ArxivAPIWrapper) -> None:
|
||||
"""Test that returns no docs"""
|
||||
|
||||
docs = api_client.load("1605.08386WWW")
|
||||
assert len(docs) == 0
|
||||
|
||||
|
||||
def test_load_returns_limited_docs() -> None:
|
||||
"""Test that returns several docs"""
|
||||
expected_docs = 2
|
||||
api_client = ArxivAPIWrapper(load_max_docs=expected_docs)
|
||||
docs = api_client.load("ChatGPT")
|
||||
assert len(docs) == expected_docs
|
||||
assert_docs(docs)
|
||||
|
||||
|
||||
def test_load_returns_limited_doc_content_chars() -> None:
|
||||
"""Test that returns limited doc_content_chars_max"""
|
||||
|
||||
doc_content_chars_max = 100
|
||||
api_client = ArxivAPIWrapper(doc_content_chars_max=doc_content_chars_max)
|
||||
docs = api_client.load("1605.08386")
|
||||
assert len(docs[0].page_content) == doc_content_chars_max
|
||||
|
||||
|
||||
def test_load_returns_unlimited_doc_content_chars() -> None:
|
||||
"""Test that returns unlimited doc_content_chars_max"""
|
||||
|
||||
doc_content_chars_max = None
|
||||
api_client = ArxivAPIWrapper(doc_content_chars_max=doc_content_chars_max)
|
||||
docs = api_client.load("1605.08386")
|
||||
assert len(docs[0].page_content) == pytest.approx(54338, rel=1e-2)
|
||||
|
||||
|
||||
def test_load_returns_full_set_of_metadata() -> None:
|
||||
"""Test that returns several docs"""
|
||||
api_client = ArxivAPIWrapper(load_max_docs=1, load_all_available_meta=True)
|
||||
docs = api_client.load("ChatGPT")
|
||||
assert len(docs) == 1
|
||||
for doc in docs:
|
||||
assert doc.page_content
|
||||
assert doc.metadata
|
||||
assert set(doc.metadata).issuperset(
|
||||
{"Published", "Title", "Authors", "Summary"}
|
||||
)
|
||||
print(doc.metadata)
|
||||
assert len(set(doc.metadata)) > 4
|
||||
|
||||
|
||||
def _load_arxiv_from_universal_entry(**kwargs: Any) -> BaseTool:
|
||||
from langchain.agents.load_tools import load_tools
|
||||
tools = load_tools(["arxiv"], **kwargs)
|
||||
assert len(tools) == 1, "loaded more than 1 tool"
|
||||
return tools[0]
|
||||
|
||||
|
||||
def test_load_arxiv_from_universal_entry() -> None:
|
||||
arxiv_tool = _load_arxiv_from_universal_entry()
|
||||
output = arxiv_tool("Caprice Stanley")
|
||||
assert (
|
||||
"On Mixing Behavior of a Family of Random Walks" in output
|
||||
), "failed to fetch a valid result"
|
||||
|
||||
|
||||
def test_load_arxiv_from_universal_entry_with_params() -> None:
|
||||
params = {
|
||||
"top_k_results": 1,
|
||||
"load_max_docs": 10,
|
||||
"load_all_available_meta": True,
|
||||
}
|
||||
arxiv_tool = _load_arxiv_from_universal_entry(**params)
|
||||
assert isinstance(arxiv_tool, ArxivQueryRun)
|
||||
wp = arxiv_tool.api_wrapper
|
||||
assert wp.top_k_results == 1, "failed to assert top_k_results"
|
||||
assert wp.load_max_docs == 10, "failed to assert load_max_docs"
|
||||
assert (
|
||||
wp.load_all_available_meta is True
|
||||
), "failed to assert load_all_available_meta"
|
||||
@@ -0,0 +1,164 @@
|
||||
"""Integration test for PubMed API Wrapper."""
|
||||
from typing import Any, List
|
||||
|
||||
import pytest
|
||||
from langchain_core.documents import Document
|
||||
from langchain_core.tools import BaseTool
|
||||
|
||||
from langchain_community.tools import PubmedQueryRun
|
||||
from langchain_community.utilities import PubMedAPIWrapper
|
||||
|
||||
xmltodict = pytest.importorskip("xmltodict")
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def api_client() -> PubMedAPIWrapper:
|
||||
return PubMedAPIWrapper()
|
||||
|
||||
|
||||
def test_run_success(api_client: PubMedAPIWrapper) -> None:
|
||||
"""Test that returns the correct answer"""
|
||||
|
||||
search_string = (
|
||||
"Examining the Validity of ChatGPT in Identifying "
|
||||
"Relevant Nephrology Literature"
|
||||
)
|
||||
output = api_client.run(search_string)
|
||||
test_string = (
|
||||
"Examining the Validity of ChatGPT in Identifying "
|
||||
"Relevant Nephrology Literature: Findings and Implications"
|
||||
)
|
||||
assert test_string in output
|
||||
assert len(output) == api_client.doc_content_chars_max
|
||||
|
||||
|
||||
def test_run_returns_no_result(api_client: PubMedAPIWrapper) -> None:
|
||||
"""Test that gives no result."""
|
||||
|
||||
output = api_client.run("1605.08386WWW")
|
||||
assert "No good PubMed Result was found" == output
|
||||
|
||||
|
||||
def test_retrieve_article_returns_book_abstract(api_client: PubMedAPIWrapper) -> None:
|
||||
"""Test that returns the excerpt of a book."""
|
||||
|
||||
output_nolabel = api_client.retrieve_article("25905357", "")
|
||||
output_withlabel = api_client.retrieve_article("29262144", "")
|
||||
test_string_nolabel = (
|
||||
"Osteoporosis is a multifactorial disorder associated with low bone mass and "
|
||||
"enhanced skeletal fragility. Although"
|
||||
)
|
||||
assert test_string_nolabel in output_nolabel["Summary"]
|
||||
assert (
|
||||
"Wallenberg syndrome was first described in 1808 by Gaspard Vieusseux. However,"
|
||||
in output_withlabel["Summary"]
|
||||
)
|
||||
|
||||
|
||||
def test_retrieve_article_returns_article_abstract(
|
||||
api_client: PubMedAPIWrapper,
|
||||
) -> None:
|
||||
"""Test that returns the abstract of an article."""
|
||||
|
||||
output_nolabel = api_client.retrieve_article("37666905", "")
|
||||
output_withlabel = api_client.retrieve_article("37666551", "")
|
||||
test_string_nolabel = (
|
||||
"This work aims to: (1) Provide maximal hand force data on six different "
|
||||
"grasp types for healthy subjects; (2) detect grasp types with maximal "
|
||||
"force significantly affected by hand osteoarthritis (HOA) in women; (3) "
|
||||
"look for predictors to detect HOA from the maximal forces using discriminant "
|
||||
"analyses."
|
||||
)
|
||||
assert test_string_nolabel in output_nolabel["Summary"]
|
||||
test_string_withlabel = (
|
||||
"OBJECTIVES: To assess across seven hospitals from six different countries "
|
||||
"the extent to which the COVID-19 pandemic affected the volumes of orthopaedic "
|
||||
"hospital admissions and patient outcomes for non-COVID-19 patients admitted "
|
||||
"for orthopaedic care."
|
||||
)
|
||||
assert test_string_withlabel in output_withlabel["Summary"]
|
||||
|
||||
|
||||
def test_retrieve_article_no_abstract_available(api_client: PubMedAPIWrapper) -> None:
|
||||
"""Test that returns 'No abstract available'."""
|
||||
|
||||
output = api_client.retrieve_article("10766884", "")
|
||||
assert "No abstract available" == output["Summary"]
|
||||
|
||||
|
||||
def assert_docs(docs: List[Document]) -> None:
|
||||
for doc in docs:
|
||||
assert doc.metadata
|
||||
assert set(doc.metadata) == {
|
||||
"Copyright Information",
|
||||
"uid",
|
||||
"Title",
|
||||
"Published",
|
||||
}
|
||||
|
||||
|
||||
def test_load_success(api_client: PubMedAPIWrapper) -> None:
|
||||
"""Test that returns one document"""
|
||||
|
||||
docs = api_client.load_docs("chatgpt")
|
||||
assert len(docs) == api_client.top_k_results == 3
|
||||
assert_docs(docs)
|
||||
|
||||
|
||||
def test_load_returns_no_result(api_client: PubMedAPIWrapper) -> None:
|
||||
"""Test that returns no docs"""
|
||||
|
||||
docs = api_client.load_docs("1605.08386WWW")
|
||||
assert len(docs) == 0
|
||||
|
||||
|
||||
def test_load_returns_limited_docs() -> None:
|
||||
"""Test that returns several docs"""
|
||||
expected_docs = 2
|
||||
api_client = PubMedAPIWrapper(top_k_results=expected_docs)
|
||||
docs = api_client.load_docs("ChatGPT")
|
||||
assert len(docs) == expected_docs
|
||||
assert_docs(docs)
|
||||
|
||||
|
||||
def test_load_returns_full_set_of_metadata() -> None:
|
||||
"""Test that returns several docs"""
|
||||
api_client = PubMedAPIWrapper(load_max_docs=1, load_all_available_meta=True)
|
||||
docs = api_client.load_docs("ChatGPT")
|
||||
assert len(docs) == 3
|
||||
for doc in docs:
|
||||
assert doc.metadata
|
||||
assert set(doc.metadata).issuperset(
|
||||
{"Copyright Information", "Published", "Title", "uid"}
|
||||
)
|
||||
|
||||
|
||||
def _load_pubmed_from_universal_entry(**kwargs: Any) -> BaseTool:
|
||||
from langchain.agents.load_tools import load_tools
|
||||
tools = load_tools(["pubmed"], **kwargs)
|
||||
assert len(tools) == 1, "loaded more than 1 tool"
|
||||
return tools[0]
|
||||
|
||||
|
||||
def test_load_pupmed_from_universal_entry() -> None:
|
||||
pubmed_tool = _load_pubmed_from_universal_entry()
|
||||
search_string = (
|
||||
"Examining the Validity of ChatGPT in Identifying "
|
||||
"Relevant Nephrology Literature"
|
||||
)
|
||||
output = pubmed_tool(search_string)
|
||||
test_string = (
|
||||
"Examining the Validity of ChatGPT in Identifying "
|
||||
"Relevant Nephrology Literature: Findings and Implications"
|
||||
)
|
||||
assert test_string in output
|
||||
|
||||
|
||||
def test_load_pupmed_from_universal_entry_with_params() -> None:
|
||||
params = {
|
||||
"top_k_results": 1,
|
||||
}
|
||||
pubmed_tool = _load_pubmed_from_universal_entry(**params)
|
||||
assert isinstance(pubmed_tool, PubmedQueryRun)
|
||||
wp = pubmed_tool.api_wrapper
|
||||
assert wp.top_k_results == 1, "failed to assert top_k_results"
|
||||
@@ -0,0 +1,44 @@
|
||||
import os
|
||||
from typing import Union
|
||||
|
||||
import pytest
|
||||
from vcr.request import Request
|
||||
|
||||
# Those environment variables turn on Deep Lake pytest mode.
|
||||
# It significantly makes tests run much faster.
|
||||
# Need to run before `import deeplake`
|
||||
os.environ["BUGGER_OFF"] = "true"
|
||||
os.environ["DEEPLAKE_DOWNLOAD_PATH"] = "./testing/local_storage"
|
||||
os.environ["DEEPLAKE_PYTEST_ENABLED"] = "true"
|
||||
|
||||
|
||||
# This fixture returns a dictionary containing filter_headers options
|
||||
# for replacing certain headers with dummy values during cassette playback
|
||||
# Specifically, it replaces the authorization header with a dummy value to
|
||||
# prevent sensitive data from being recorded in the cassette.
|
||||
# It also filters request to certain hosts (specified in the `ignored_hosts` list)
|
||||
# to prevent data from being recorded in the cassette.
|
||||
@pytest.fixture(scope="module")
|
||||
def vcr_config() -> dict:
|
||||
skipped_host = ["pinecone.io"]
|
||||
|
||||
def before_record_response(response: dict) -> Union[dict, None]:
|
||||
return response
|
||||
|
||||
def before_record_request(request: Request) -> Union[Request, None]:
|
||||
for host in skipped_host:
|
||||
if request.host.startswith(host) or request.host.endswith(host):
|
||||
return None
|
||||
return request
|
||||
|
||||
return {
|
||||
"before_record_request": before_record_request,
|
||||
"before_record_response": before_record_response,
|
||||
"filter_headers": [
|
||||
("authorization", "authorization-DUMMY"),
|
||||
("X-OpenAI-Client-User-Agent", "X-OpenAI-Client-User-Agent-DUMMY"),
|
||||
("Api-Key", "Api-Key-DUMMY"),
|
||||
("User-Agent", "User-Agent-DUMMY"),
|
||||
],
|
||||
"ignore_localhost": True,
|
||||
}
|
||||
@@ -0,0 +1,85 @@
|
||||
"""Test CallbackManager."""
|
||||
from unittest.mock import patch
|
||||
|
||||
import pytest
|
||||
from langchain_community.callbacks import get_openai_callback
|
||||
from langchain_core.callbacks.manager import trace_as_chain_group, CallbackManager
|
||||
from langchain_core.outputs import LLMResult
|
||||
from langchain_core.tracers.langchain import LangChainTracer, wait_for_all_tracers
|
||||
from langchain_community.llms.openai import BaseOpenAI
|
||||
|
||||
|
||||
def test_callback_manager_configure_context_vars(
|
||||
monkeypatch: pytest.MonkeyPatch,
|
||||
) -> None:
|
||||
"""Test callback manager configuration."""
|
||||
monkeypatch.setenv("LANGCHAIN_TRACING_V2", "true")
|
||||
monkeypatch.setenv("LANGCHAIN_TRACING", "false")
|
||||
with patch.object(LangChainTracer, "_update_run_single"):
|
||||
with patch.object(LangChainTracer, "_persist_run_single"):
|
||||
with trace_as_chain_group("test") as group_manager:
|
||||
assert len(group_manager.handlers) == 1
|
||||
tracer = group_manager.handlers[0]
|
||||
assert isinstance(tracer, LangChainTracer)
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
# This is a new empty callback handler
|
||||
assert cb.successful_requests == 0
|
||||
assert cb.total_tokens == 0
|
||||
|
||||
# configure adds this openai cb but doesn't modify the group manager
|
||||
mngr = CallbackManager.configure(group_manager)
|
||||
assert mngr.handlers == [tracer, cb]
|
||||
assert group_manager.handlers == [tracer]
|
||||
|
||||
response = LLMResult(
|
||||
generations=[],
|
||||
llm_output={
|
||||
"token_usage": {
|
||||
"prompt_tokens": 2,
|
||||
"completion_tokens": 1,
|
||||
"total_tokens": 3,
|
||||
},
|
||||
"model_name": BaseOpenAI.__fields__["model_name"].default,
|
||||
},
|
||||
)
|
||||
mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
|
||||
|
||||
# The callback handler has been updated
|
||||
assert cb.successful_requests == 1
|
||||
assert cb.total_tokens == 3
|
||||
assert cb.prompt_tokens == 2
|
||||
assert cb.completion_tokens == 1
|
||||
assert cb.total_cost > 0
|
||||
|
||||
with get_openai_callback() as cb:
|
||||
# This is a new empty callback handler
|
||||
assert cb.successful_requests == 0
|
||||
assert cb.total_tokens == 0
|
||||
|
||||
# configure adds this openai cb but doesn't modify the group manager
|
||||
mngr = CallbackManager.configure(group_manager)
|
||||
assert mngr.handlers == [tracer, cb]
|
||||
assert group_manager.handlers == [tracer]
|
||||
|
||||
response = LLMResult(
|
||||
generations=[],
|
||||
llm_output={
|
||||
"token_usage": {
|
||||
"prompt_tokens": 2,
|
||||
"completion_tokens": 1,
|
||||
"total_tokens": 3,
|
||||
},
|
||||
"model_name": BaseOpenAI.__fields__["model_name"].default,
|
||||
},
|
||||
)
|
||||
mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
|
||||
|
||||
# The callback handler has been updated
|
||||
assert cb.successful_requests == 1
|
||||
assert cb.total_tokens == 3
|
||||
assert cb.prompt_tokens == 2
|
||||
assert cb.completion_tokens == 1
|
||||
assert cb.total_cost > 0
|
||||
wait_for_all_tracers()
|
||||
assert LangChainTracer._persist_run_single.call_count == 1 # type: ignore
|
||||
@@ -0,0 +1,31 @@
|
||||
from langchain_community.callbacks import __all__
|
||||
|
||||
EXPECTED_ALL = [
|
||||
"AimCallbackHandler",
|
||||
"ArgillaCallbackHandler",
|
||||
"ArizeCallbackHandler",
|
||||
"PromptLayerCallbackHandler",
|
||||
"ArthurCallbackHandler",
|
||||
"ClearMLCallbackHandler",
|
||||
"CometCallbackHandler",
|
||||
"ContextCallbackHandler",
|
||||
"HumanApprovalCallbackHandler",
|
||||
"InfinoCallbackHandler",
|
||||
"MlflowCallbackHandler",
|
||||
"LLMonitorCallbackHandler",
|
||||
"OpenAICallbackHandler",
|
||||
"LLMThoughtLabeler",
|
||||
"StreamlitCallbackHandler",
|
||||
"WandbCallbackHandler",
|
||||
"WhyLabsCallbackHandler",
|
||||
"get_openai_callback",
|
||||
"wandb_tracing_enabled",
|
||||
"FlyteCallbackHandler",
|
||||
"SageMakerCallbackHandler",
|
||||
"LabelStudioCallbackHandler",
|
||||
"TrubricsCallbackHandler",
|
||||
]
|
||||
|
||||
|
||||
def test_all_imports() -> None:
|
||||
assert set(__all__) == set(EXPECTED_ALL)
|
||||
@@ -0,0 +1,23 @@
|
||||
import pathlib
|
||||
|
||||
from langchain_community.chat_loaders import slack, utils
|
||||
|
||||
|
||||
def test_slack_chat_loader() -> None:
|
||||
chat_path = (
|
||||
pathlib.Path(__file__).parents[2]
|
||||
/ "examples"
|
||||
/ "slack_export.zip"
|
||||
)
|
||||
loader = slack.SlackChatLoader(str(chat_path))
|
||||
|
||||
chat_sessions = list(
|
||||
utils.map_ai_messages(loader.lazy_load(), sender="U0500003428")
|
||||
)
|
||||
assert chat_sessions, "Chat sessions should not be empty"
|
||||
|
||||
assert chat_sessions[1]["messages"], "Chat messages should not be empty"
|
||||
|
||||
assert (
|
||||
"Example message" in chat_sessions[1]["messages"][0].content
|
||||
), "Chat content mismatch"
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user