mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-05 08:40:36 +00:00
Compare commits
7 Commits
erick/test
...
erick/cli-
| Author | SHA1 | Date |
|---|---|---|
| | 796d7d84f5 | |
| | 3812e5675b | |
| | e72fdae493 | |
| | 7fcd307f90 | |
| | 8a0573e414 | |
| | ce0f1f4b1f | |
| | 949deb7781 | |
@@ -17,16 +17,13 @@ For more info, check out the [GitHub documentation](https://docs.github.com/en/f
|
||||
## VS Code Dev Containers
|
||||
[Open in Dev Containers](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
|
||||
|
||||
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
|
||||
```
|
||||
Note: If you click this link you will open the main repo and not your local cloned repo, you can use this link and replace with your username and cloned repo name:
|
||||
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
|
||||
|
||||
```
|
||||
Then you will have a local cloned repo where you can contribute and then create pull requests.
|
||||
|
||||
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
|
||||
|
||||
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
|
||||
You can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
|
||||
|
||||
1. If this is your first time using a development container, please ensure your system meets the pre-reqs (i.e. have Docker installed) in the [getting started steps](https://aka.ms/vscode-remote/containers/getting-started).
|
||||
|
||||
|
||||
303
.github/CONTRIBUTING.md
vendored
@@ -3,4 +3,305 @@
|
||||
Hi there! Thank you for even being interested in contributing to LangChain.
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
|
||||
|
||||
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
|
||||
## 🗺️ Guidelines
|
||||
|
||||
### 👩💻 Contributing Code
|
||||
|
||||
To contribute to this project, please follow the ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
|
||||
Please do not try to push directly to this repo unless you are a maintainer.
|
||||
|
||||
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
|
||||
maintainers.
|
||||
|
||||
Pull requests cannot land without passing the formatting, linting, and testing checks first. See [Testing](#testing) and
|
||||
[Formatting and Linting](#formatting-and-linting) for how to run these checks locally.
|
||||
|
||||
It's essential that we maintain great documentation and testing. If you:
|
||||
- Fix a bug
|
||||
- Add a relevant unit or integration test when possible. These live in `tests/unit_tests` and `tests/integration_tests`.
|
||||
- Make an improvement
|
||||
- Update any affected example notebooks and documentation. These live in `docs`.
|
||||
- Update unit and integration tests when relevant.
|
||||
- Add a feature
|
||||
- Add a demo notebook in `docs/modules`.
|
||||
- Add unit and integration tests.
|
||||
|
||||
We are a small, progress-oriented team. If there's something you'd like to add or change, opening a pull request is the
|
||||
best way to get our attention.
|
||||
|
||||
### 🚩GitHub Issues
|
||||
|
||||
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date with bugs, improvements, and feature requests.
|
||||
|
||||
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help organize issues.
|
||||
|
||||
If you start working on an issue, please assign it to yourself.
|
||||
|
||||
If you are adding an issue, please try to keep it focused on a single, modular bug/improvement/feature.
|
||||
If two issues are related, or blocking, please link them rather than combining them.
|
||||
|
||||
We will try to keep these issues as up-to-date as possible, though
|
||||
with the rapid rate of development in this field some may get out of date.
|
||||
If you notice this happening, please let us know.
|
||||
|
||||
### 🙋Getting Help
|
||||
|
||||
Our goal is to have the simplest developer setup possible. Should you experience any difficulty getting set up, please
|
||||
contact a maintainer! Not only do we want to help get you unblocked, but we also want to make sure that the process is
|
||||
smooth for future contributors.
|
||||
|
||||
In a similar vein, we do enforce certain linting, formatting, and documentation standards in the codebase.
|
||||
If you are finding these difficult (or even just annoying) to work with, feel free to contact a maintainer for help -
|
||||
we do not want these to get in the way of getting good code into the codebase.
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
This quick start guide explains how to run the repository locally.
|
||||
For a [development container](https://containers.dev/), see the [.devcontainer folder](https://github.com/langchain-ai/langchain/tree/master/.devcontainer).
|
||||
|
||||
### Dependency Management: Poetry and other env/dependency managers
|
||||
|
||||
This project utilizes [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
|
||||
|
||||
❗Note: *Before installing Poetry*, if you use `Conda`, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
|
||||
|
||||
Install Poetry: **[documentation on how to install it](https://python-poetry.org/docs/#installation)**.
|
||||
|
||||
❗Note: If you use `Conda` or `Pyenv` as your environment/package manager, after installing Poetry,
|
||||
tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
|
||||
|
||||
### Core vs. Experimental
|
||||
|
||||
This repository contains two separate projects:
|
||||
- `langchain`: core langchain code, abstractions, and use cases.
|
||||
- `langchain.experimental`: see the [Experimental README](https://github.com/langchain-ai/langchain/tree/master/libs/experimental/README.md) for more information.
|
||||
|
||||
Each of these has its own development environment. Docs are run from the top-level makefile, but development
|
||||
is split across separate test & release flows.
|
||||
|
||||
For this quickstart, start with langchain core:
|
||||
|
||||
```bash
cd libs/langchain
```
|
||||
|
||||
### Local Development Dependencies
|
||||
|
||||
Install langchain development requirements (for running langchain, running examples, linting, formatting, tests, and coverage):
|
||||
|
||||
```bash
poetry install --with test
```
|
||||
|
||||
Then verify dependency installation:
|
||||
|
||||
```bash
make test
```
|
||||
|
||||
If the tests don't pass, you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
|
||||
|
||||
If during installation you receive a `WheelFileValidationError` for `debugpy`, please make sure you are running
|
||||
Poetry v1.6.1+. This bug was present in older versions of Poetry (e.g. 1.4.1) and has been resolved in newer releases.
|
||||
If you are still seeing this bug on v1.6.1, you may also try disabling "modern installation"
|
||||
(`poetry config installer.modern-installation false`) and re-installing requirements.
|
||||
See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
|
||||
|
||||
### Testing
|
||||
|
||||
_Some test dependencies are optional; see the section on optional dependencies._
|
||||
|
||||
Unit tests cover modular logic that does not require calls to outside APIs.
|
||||
If you add new logic, please add a unit test.
|
||||
|
||||
To run unit tests:
|
||||
|
||||
```bash
make test
```
|
||||
|
||||
To run unit tests in Docker:
|
||||
|
||||
```bash
make docker_tests
```
|
||||
|
||||
There are also [integration tests and code-coverage](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests/README.md) available.
|
||||
|
||||
### Formatting and Linting
|
||||
|
||||
Run these locally before submitting a PR; the CI system will also check them.
|
||||
|
||||
#### Code Formatting
|
||||
|
||||
Formatting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/) and [ruff](https://docs.astral.sh/ruff/rules/).
|
||||
|
||||
To run formatting for this project:
|
||||
|
||||
```bash
make format
```
|
||||
|
||||
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
|
||||
|
||||
```bash
make format_diff
```
|
||||
|
||||
This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.
|
||||
|
||||
#### Linting
|
||||
|
||||
Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [ruff](https://docs.astral.sh/ruff/rules/), and [mypy](http://mypy-lang.org/).
|
||||
|
||||
To run linting for this project:
|
||||
|
||||
```bash
make lint
```
|
||||
|
||||
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
|
||||
|
||||
```bash
make lint_diff
```
|
||||
|
||||
This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
|
||||
|
||||
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
|
||||
|
||||
#### Spellcheck
|
||||
|
||||
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
|
||||
Note that `codespell` only looks for common typos, so it can produce false positives (correctly spelled but rarely used words) and false negatives (misspelled words it fails to find).
|
||||
|
||||
To check spelling for this project:
|
||||
|
||||
```bash
make spell_check
```
|
||||
|
||||
To fix spelling in place:
|
||||
|
||||
```bash
make spell_fix
```
|
||||
|
||||
If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.
|
||||
|
||||
```toml
[tool.codespell]
...
# Add here:
ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
```
|
||||
|
||||
## Working with Optional Dependencies
|
||||
|
||||
LangChain relies heavily on optional dependencies to keep the LangChain package lightweight.

If you're adding a new dependency to LangChain, assume that it will be an optional dependency and that most users won't have it installed.
|
||||
|
||||
Users who do not have the dependency installed should be able to **import** your code without
|
||||
any side effects (no warnings, no errors, no exceptions).
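
One way to meet this requirement is to defer the import until the feature is actually used. The sketch below is an illustration under assumptions, not LangChain's own helper: `import_optional` and `evaluate` are hypothetical names, and `numexpr` is used only because it is mentioned earlier as an optional package.

```python
import importlib
from types import ModuleType


def import_optional(package_name: str) -> ModuleType:
    """Import an optional package at call time, not at module import time.

    Hypothetical helper for illustration; the name and the error message
    are assumptions, not part of the LangChain API.
    """
    try:
        return importlib.import_module(package_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import `{package_name}`. "
            f"Install it with `pip install {package_name}`."
        ) from exc


def evaluate(expression: str) -> float:
    """Example feature that needs the optional package only when it is called."""
    numexpr = import_optional("numexpr")  # resolved lazily, on first use
    return float(numexpr.evaluate(expression))
```

Because nothing optional is imported at module level, importing a module written this way should succeed silently even when the package is missing; the error only appears when `evaluate` is called.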
|
||||
|
||||
To introduce the dependency to the pyproject.toml file correctly, please do the following:
|
||||
|
||||
1. Add the dependency to the main group as an optional dependency
   ```bash
   poetry add --optional [package_name]
   ```
2. Open pyproject.toml and add the dependency to the `extended_testing` extra
3. Relock the poetry file to update the extra.
   ```bash
   poetry lock --no-update
   ```
4. Add a unit test that at the very least attempts to import the new code. Ideally, the unit test makes use of lightweight fixtures to test the logic of the code.
5. Please use the `@pytest.mark.requires(package_name)` decorator for any tests that require the dependency.
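
The import-only test from step 4 and the skip-when-missing behaviour from step 5 can be approximated with the standard library alone. This is a sketch under assumptions: `missing_packages` is a hypothetical helper, and `unittest.skipIf` stands in for the project's pytest marker, whose actual implementation lives in the test configuration and may differ.

```python
import importlib.util
import unittest


def missing_packages(*package_names: str) -> list[str]:
    """Return the top-level packages that cannot be found in this environment."""
    return [
        name for name in package_names if importlib.util.find_spec(name) is None
    ]


class TestOptionalIntegration(unittest.TestCase):
    # Skip instead of failing when the optional dependency is absent.
    @unittest.skipIf(missing_packages("numexpr"), "requires numexpr")
    def test_import(self) -> None:
        import numexpr  # noqa: F401  (a successful import is the assertion)
```

The point of skipping rather than failing is that contributors without the optional package still get a green unit-test run, while environments that install the `extended_testing` extra exercise the test.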
|
||||
|
||||
## Adding a Jupyter Notebook
|
||||
|
||||
If you are adding a Jupyter Notebook example, you'll want to install the optional `dev` dependencies.
|
||||
|
||||
To install dev dependencies:
|
||||
|
||||
```bash
poetry install --with dev
```
|
||||
|
||||
Launch a notebook:
|
||||
|
||||
```bash
poetry run jupyter notebook
```
|
||||
|
||||
When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
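
To confirm that a notebook is picking up your working copy rather than a released package, you can check where the module resolves from. A minimal sketch using only the standard library; `module_location` is a hypothetical helper name, not something the repository provides.

```python
import importlib.util
from typing import Optional


def module_location(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if absent."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# In a notebook started with `poetry run jupyter notebook`, an editable install
# should resolve to a path inside your clone rather than inside site-packages.
print(module_location("langchain"))
```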
|
||||
|
||||
## Documentation
|
||||
|
||||
While the code is split between `langchain` and `langchain.experimental`, the documentation is one holistic thing.
|
||||
This covers how to get started contributing to documentation.
|
||||
|
||||
From the top-level of this repo, install documentation dependencies:
|
||||
|
||||
```bash
poetry install
```
|
||||
|
||||
### Contribute Documentation
|
||||
|
||||
The docs directory contains Documentation and API Reference.
|
||||
|
||||
Documentation is built using [Docusaurus 2](https://docusaurus.io/).
|
||||
|
||||
The API Reference is largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
|
||||
For that reason, we ask that you add good documentation to all classes and methods.
|
||||
|
||||
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
|
||||
|
||||
### Build Documentation Locally
|
||||
|
||||
In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
|
||||
|
||||
Before building the documentation, it is always a good idea to clean the build directory:
|
||||
|
||||
```bash
make docs_clean
make api_docs_clean
```
|
||||
|
||||
Next, you can build the documentation as outlined below:
|
||||
|
||||
```bash
make docs_build
make api_docs_build
```
|
||||
|
||||
Finally, run the link checker to ensure all links are valid:
|
||||
|
||||
```bash
make docs_linkcheck
make api_docs_linkcheck
```
|
||||
|
||||
### Verify Documentation changes
|
||||
|
||||
After pushing documentation changes to the repository, you can preview and verify that the changes are
|
||||
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.
|
||||
This will take you to a preview of the documentation changes.
|
||||
This preview is created by [Vercel](https://vercel.com/docs/getting-started-with-vercel).
|
||||
|
||||
## 🏭 Release Process
|
||||
|
||||
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
|
||||
a developer and published to [PyPI](https://pypi.org/project/langchain/).
|
||||
|
||||
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
|
||||
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
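
In practice this means a downstream project may want to treat every `0.x` upgrade as potentially breaking. A minimal sketch of that rule, assuming plain `MAJOR.MINOR.PATCH` version strings without pre-release suffixes; `parse_version` and `may_break` are hypothetical helpers, not part of LangChain.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_break(installed: str, candidate: str) -> bool:
    """Return True if upgrading could contain non-backwards-compatible changes.

    Per semver item 4, anything may change while MAJOR is 0, so every
    version bump of a 0.x package is treated as potentially breaking.
    """
    old, new = parse_version(installed), parse_version(candidate)
    if new <= old:
        return False  # not an upgrade
    if old[0] == 0:
        return True  # pre-1.0: even a patch bump may break
    return new[0] > old[0]  # post-1.0: only a major bump may break
```

Under this rule, pinning an exact version of a pre-1.0 dependency and upgrading deliberately is the conservative choice.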
|
||||
|
||||
### 🌟 Recognition
|
||||
|
||||
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
|
||||
If you have a Twitter account you would like us to mention, please let us know in the PR or through another means.
|
||||
|
||||
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
@@ -1,38 +0,0 @@
|
||||
labels: [idea]
|
||||
body:
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I searched existing ideas and did not find a similar one
|
||||
required: true
|
||||
- label: I added a very descriptive title
|
||||
required: true
|
||||
- label: I've clearly described the feature request and motivation for it
|
||||
required: true
|
||||
- type: textarea
|
||||
id: feature-request
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Feature request
|
||||
description: |
|
||||
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
|
||||
- type: textarea
|
||||
id: motivation
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Motivation
|
||||
description: |
|
||||
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
|
||||
- type: textarea
|
||||
id: proposal
|
||||
validations:
|
||||
required: false
|
||||
attributes:
|
||||
label: Proposal (If applicable)
|
||||
description: |
|
||||
If you would like to propose a solution, please describe it here.
|
||||
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
@@ -1,122 +0,0 @@
|
||||
labels: [Question]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in 🦜️🔗 LangChain!
|
||||
|
||||
Please follow these instructions, fill every question, and do every step. 🙏
|
||||
|
||||
We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
|
||||
this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
|
||||
|
||||
By asking questions in a structured way (following this) it will be much easier to help you.
|
||||
|
||||
And there's a high chance that you will find the solution along the way and you won't even have to submit it and wait for an answer. 😎
|
||||
|
||||
As there are too many questions, we will **DISCARD** and close the incomplete ones.
|
||||
|
||||
That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓
|
||||
|
||||
Relevant links to check before opening a question to see if your question has already been answered, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain GitHub Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain GitHub Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this question.
|
||||
required: true
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
- type: checkboxes
|
||||
id: help
|
||||
attributes:
|
||||
label: Commit to Help
|
||||
description: |
|
||||
After submitting this, I commit to one of:
|
||||
|
||||
* Read open questions until I find 2 where I can help someone and add a comment to help there.
|
||||
* I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
|
||||
* Once my question is answered, I will mark the answer as "accepted".
|
||||
options:
|
||||
- label: I commit to help with one of those options 👆
|
||||
required: true
|
||||
- type: textarea
|
||||
id: example
|
||||
attributes:
|
||||
label: Example Code
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
render: python
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: description
|
||||
attributes:
|
||||
label: Description
|
||||
description: |
|
||||
What is the problem, question, or error?
|
||||
|
||||
Write a short description explaining what you are doing, what you expect to happen, and what is currently happening.
|
||||
placeholder: |
|
||||
* I'm trying to use the `langchain` library to do X.
|
||||
* I expect to see Y.
|
||||
* Instead, it does Z.
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages; don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
182
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
@@ -1,118 +1,106 @@
|
||||
name: "\U0001F41B Bug Report"
|
||||
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions.
|
||||
description: Submit a bug report to help us improve LangChain. To report a security issue, please instead use the security option below.
|
||||
labels: ["02 Bug Report"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: >
|
||||
Thank you for taking the time to file a bug report.
|
||||
|
||||
Use this to report bugs in LangChain.
|
||||
|
||||
If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions)
|
||||
to ask for help with your issue.
|
||||
|
||||
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain GitHub Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain GitHub Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
Thank you for taking the time to file a bug report. Before creating a new
|
||||
issue, please make sure to take a few moments to check the issue tracker
|
||||
for existing issues about the bug.
|
||||
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
label: System Info
|
||||
description: Please share your system info with us.
|
||||
placeholder: LangChain version, platform, python version, ...
|
||||
validations:
|
||||
required: true
|
||||
|
||||
- type: textarea
|
||||
id: who-can-help
|
||||
attributes:
|
||||
label: Who can help?
|
||||
description: |
|
||||
Your issue will be replied to more quickly if you can figure out the right person to tag with @
|
||||
If you know how to use git blame, that is the easiest way; otherwise, here is a rough guide of **who to tag**.
|
||||
|
||||
The core maintainers strive to read all issues, but tagging them will help them prioritize.
|
||||
|
||||
Please tag fewer than 3 people.
|
||||
|
||||
@hwchase17 - project lead
|
||||
|
||||
Tracing / Callbacks
|
||||
- @agola11
|
||||
|
||||
Async
|
||||
- @agola11
|
||||
|
||||
DataLoader Abstractions
|
||||
- @eyurtsev
|
||||
|
||||
LLM/Chat Wrappers
|
||||
- @hwchase17
|
||||
- @agola11
|
||||
|
||||
Tools / Toolkits
|
||||
- ...
|
||||
|
||||
placeholder: "@Username ..."
|
||||
|
||||
- type: checkboxes
|
||||
id: information-scripts-examples
|
||||
attributes:
|
||||
label: Information
|
||||
description: "The problem arises when using:"
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
- label: I am sure that this is a bug in LangChain rather than my code.
|
||||
required: true
|
||||
- label: "The official example notebooks/scripts"
|
||||
- label: "My own modified scripts"
|
||||
|
||||
- type: checkboxes
|
||||
id: related-components
|
||||
attributes:
|
||||
label: Related Components
|
||||
description: "Select the components related to the issue (if applicable):"
|
||||
options:
|
||||
- label: "LLMs/Chat Models"
|
||||
- label: "Embedding Models"
|
||||
- label: "Prompts / Prompt Templates / Prompt Selectors"
|
||||
- label: "Output Parsers"
|
||||
- label: "Document Loaders"
|
||||
- label: "Vector Stores / Retrievers"
|
||||
- label: "Memory"
|
||||
- label: "Agents / Agent Executors"
|
||||
- label: "Tools / Toolkits"
|
||||
- label: "Chains"
|
||||
- label: "Callbacks/Tracing"
|
||||
- label: "Async"
|
||||
|
||||
- type: textarea
|
||||
id: reproduction
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Example Code
|
||||
label: Reproduction
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
Please provide a [code sample](https://stackoverflow.com/help/minimal-reproducible-example) that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
|
||||
If you have code snippets, error messages, stack traces please provide them here as well.
|
||||
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
|
||||
Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
The following code:
|
||||
|
||||
```python
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
Steps to reproduce the behavior:
|
||||
|
||||
1.
|
||||
2.
|
||||
3.
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
```
|
||||
- type: textarea
|
||||
id: error
|
||||
validations:
|
||||
required: false
|
||||
attributes:
|
||||
label: Error Message and Stack Trace (if applicable)
|
||||
description: |
|
||||
If you are reporting an error, please include the full error message and stack trace.
|
||||
placeholder: |
|
||||
Exception + full stack trace
|
||||
- type: textarea
|
||||
id: description
|
||||
attributes:
|
||||
label: Description
|
||||
description: |
|
||||
What is the problem, question, or error?
|
||||
|
||||
Write a short description telling what you are doing, what you expect to happen, and what is currently happening.
|
||||
placeholder: |
|
||||
* I'm trying to use the `langchain` library to do X.
|
||||
* I expect to see Y.
|
||||
* Instead, it does Z.
|
||||
id: expected-behavior
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages; don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
label: Expected behavior
|
||||
description: "A clear and concise description of what you would expect to happen."
|
||||
|
||||
11
.github/ISSUE_TEMPLATE/config.yml
vendored
@@ -1,15 +1,6 @@
|
||||
blank_issues_enabled: false
|
||||
blank_issues_enabled: true
|
||||
version: 2.1
|
||||
contact_links:
|
||||
- name: 🤔 Question or Problem
|
||||
about: Ask a question or ask about a problem in GitHub Discussions.
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/q-a
|
||||
- name: Discord
|
||||
url: https://discord.gg/6adMQxSpJS
|
||||
about: General community discussions
|
||||
- name: Feature Request
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/ideas
|
||||
about: Suggest a feature or an idea
|
||||
- name: Show and tell
|
||||
about: Show what you built with LangChain
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/show-and-tell
|
||||
|
||||
36
.github/ISSUE_TEMPLATE/documentation.yml
vendored
@@ -4,45 +4,13 @@ title: "DOC: <Please write a comprehensive title after the 'DOC: ' prefix>"
|
||||
labels: [03 - Documentation]
|
||||
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: >
|
||||
Thank you for taking the time to report an issue in the documentation.
|
||||
|
||||
Only report issues with documentation here, explain if there are
|
||||
any missing topics or if you found a mistake in the documentation.
|
||||
|
||||
Do **NOT** use this to ask usage questions or report issues with your code.
|
||||
|
||||
If you have usage questions or need help solving some problem,
|
||||
please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions).
|
||||
|
||||
If you're in the wrong place, here are some helpful links to find a better
|
||||
place to ask your question:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checklist
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I included a link to the documentation page I am referring to (if applicable).
|
||||
required: true
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue with current documentation:"
|
||||
description: >
|
||||
Please make sure to leave a reference to the document/code you're
|
||||
referring to. Feel free to include names of classes, functions, methods
|
||||
or concepts you'd like to see documented more.
|
||||
referring to.
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Idea or request for content:"
|
||||
|
||||
30
.github/ISSUE_TEMPLATE/feature-request.yml
vendored
Normal file
@@ -0,0 +1,30 @@
|
||||
name: "\U0001F680 Feature request"
|
||||
description: Submit a proposal/request for a new LangChain feature
|
||||
labels: ["02 Feature Request"]
|
||||
body:
|
||||
- type: textarea
|
||||
id: feature-request
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Feature request
|
||||
description: |
|
||||
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
|
||||
|
||||
- type: textarea
|
||||
id: motivation
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Motivation
|
||||
description: |
|
||||
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
|
||||
|
||||
- type: textarea
|
||||
id: contribution
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Your contribution
|
||||
description: |
|
||||
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md)
|
||||
18
.github/ISSUE_TEMPLATE/other.yml
vendored
Normal file
@@ -0,0 +1,18 @@
|
||||
name: Other Issue
|
||||
description: Raise an issue that wouldn't be covered by the other templates.
|
||||
title: "Issue: <Please write a comprehensive title after the 'Issue: ' prefix>"
|
||||
labels: [04 - Other]
|
||||
|
||||
body:
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue you'd like to raise."
|
||||
description: >
|
||||
Please describe the issue you'd like to raise as clearly as possible.
|
||||
Make sure to include any relevant links or references.
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Suggestion:"
|
||||
description: >
|
||||
Please outline a suggestion to improve the issue here.
|
||||
25
.github/ISSUE_TEMPLATE/privileged.yml
vendored
@@ -1,25 +0,0 @@
|
||||
name: 🔒 Privileged
|
||||
description: You are a LangChain maintainer, or were asked directly by a maintainer to create an issue here. If not, check the other options.
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain! 🚀
|
||||
|
||||
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
|
||||
|
||||
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
|
||||
or are a regular contributor to LangChain with previously merged pull requests.
|
||||
- type: checkboxes
|
||||
id: privileged
|
||||
attributes:
|
||||
label: Privileged issue
|
||||
description: Confirm that you are allowed to create an issue here.
|
||||
options:
|
||||
- label: I am a LangChain maintainer, or was asked directly by a LangChain maintainer to create an issue here.
|
||||
required: true
|
||||
- type: textarea
|
||||
id: content
|
||||
attributes:
|
||||
label: Issue Content
|
||||
description: Add the content of the issue here.
|
||||
37
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -1,29 +1,20 @@
|
||||
Thank you for contributing to LangChain!
|
||||
<!-- Thank you for contributing to LangChain!
|
||||
|
||||
- [ ] **PR title**: "package: description"
|
||||
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
|
||||
- Example: "community: add foobar LLM"
|
||||
Replace this entire comment with:
|
||||
- **Description:** a description of the change,
|
||||
- **Issue:** the issue # it fixes (if applicable),
|
||||
- **Dependencies:** any dependencies required for this change,
|
||||
- **Tag maintainer:** for a quicker response, tag the relevant maintainer (see below),
|
||||
- **Twitter handle:** we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
|
||||
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
|
||||
|
||||
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
|
||||
- **Description:** a description of the change
|
||||
- **Issue:** the issue # it fixes, if applicable
|
||||
- **Dependencies:** any dependencies required for this change
|
||||
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
See contribution guidelines for more information on how to write/run tests, lint, etc:
|
||||
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
|
||||
|
||||
|
||||
- [ ] **Add tests and docs**: If you're adding a new integration, please include
|
||||
If you're adding a new integration, please include:
|
||||
1. a test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
|
||||
2. an example notebook showing its use. It lives in `docs/extras` directory.
|
||||
|
||||
|
||||
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
|
||||
|
||||
Additional guidelines:
|
||||
- Make sure optional dependencies are imported within a function.
|
||||
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
|
||||
- Most PRs should not touch more than one package.
|
||||
- Changes should be backwards compatible.
|
||||
- If you are adding something to community, do not re-import it in langchain.
|
||||
|
||||
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.
|
||||
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.
|
||||
-->
|
||||
|
||||
7
.github/actions/people/Dockerfile
vendored
@@ -1,7 +0,0 @@
|
||||
FROM python:3.9
|
||||
|
||||
RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.1,<6.0.0"
|
||||
|
||||
COPY ./app /app
|
||||
|
||||
CMD ["python", "/app/main.py"]
|
||||
11
.github/actions/people/action.yml
vendored
@@ -1,11 +0,0 @@
|
||||
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/action.yml
|
||||
name: "Generate LangChain People"
|
||||
description: "Generate the data for the LangChain People page"
|
||||
author: "Jacob Lee <jacob@langchain.dev>"
|
||||
inputs:
|
||||
token:
|
||||
description: 'User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}'
|
||||
required: true
|
||||
runs:
|
||||
using: 'docker'
|
||||
image: 'Dockerfile'
|
||||
641
.github/actions/people/app/main.py
vendored
@@ -1,641 +0,0 @@
|
||||
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/app/main.py
|
||||
|
||||
import logging
|
||||
import subprocess
|
||||
import sys
|
||||
from collections import Counter
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Container, Dict, List, Set, Union
|
||||
|
||||
import httpx
|
||||
import yaml
|
||||
from github import Github
|
||||
from pydantic import BaseModel, SecretStr
|
||||
from pydantic_settings import BaseSettings
|
||||
|
||||
github_graphql_url = "https://api.github.com/graphql"
|
||||
questions_category_id = "DIC_kwDOIPDwls4CS6Ve"
|
||||
|
||||
# discussions_query = """
|
||||
# query Q($after: String, $category_id: ID) {
|
||||
# repository(name: "langchain", owner: "langchain-ai") {
|
||||
# discussions(first: 100, after: $after, categoryId: $category_id) {
|
||||
# edges {
|
||||
# cursor
|
||||
# node {
|
||||
# number
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# title
|
||||
# createdAt
|
||||
# comments(first: 100) {
|
||||
# nodes {
|
||||
# createdAt
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# isAnswer
|
||||
# replies(first: 10) {
|
||||
# nodes {
|
||||
# createdAt
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# """
|
||||
|
||||
# issues_query = """
|
||||
# query Q($after: String) {
|
||||
# repository(name: "langchain", owner: "langchain-ai") {
|
||||
# issues(first: 100, after: $after) {
|
||||
# edges {
|
||||
# cursor
|
||||
# node {
|
||||
# number
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# title
|
||||
# createdAt
|
||||
# state
|
||||
# comments(first: 100) {
|
||||
# nodes {
|
||||
# createdAt
|
||||
# author {
|
||||
# login
|
||||
# avatarUrl
|
||||
# url
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# }
|
||||
# """
|
||||
|
||||
prs_query = """
|
||||
query Q($after: String) {
|
||||
repository(name: "langchain", owner: "langchain-ai") {
|
||||
pullRequests(first: 100, after: $after, states: MERGED) {
|
||||
edges {
|
||||
cursor
|
||||
node {
|
||||
changedFiles
|
||||
additions
|
||||
deletions
|
||||
number
|
||||
labels(first: 100) {
|
||||
nodes {
|
||||
name
|
||||
}
|
||||
}
|
||||
author {
|
||||
login
|
||||
avatarUrl
|
||||
url
|
||||
... on User {
|
||||
twitterUsername
|
||||
}
|
||||
}
|
||||
title
|
||||
createdAt
|
||||
state
|
||||
reviews(first:100) {
|
||||
nodes {
|
||||
author {
|
||||
login
|
||||
avatarUrl
|
||||
url
|
||||
... on User {
|
||||
twitterUsername
|
||||
}
|
||||
}
|
||||
state
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
"""
|
||||
|
||||
|
||||
class Author(BaseModel):
|
||||
login: str
|
||||
avatarUrl: str
|
||||
url: str
|
||||
twitterUsername: Union[str, None] = None
|
||||
|
||||
|
||||
# Issues and Discussions
|
||||
|
||||
|
||||
class CommentsNode(BaseModel):
|
||||
createdAt: datetime
|
||||
author: Union[Author, None] = None
|
||||
|
||||
|
||||
class Replies(BaseModel):
|
||||
nodes: List[CommentsNode]
|
||||
|
||||
|
||||
class DiscussionsCommentsNode(CommentsNode):
|
||||
replies: Replies
|
||||
|
||||
|
||||
class Comments(BaseModel):
|
||||
nodes: List[CommentsNode]
|
||||
|
||||
|
||||
class DiscussionsComments(BaseModel):
|
||||
nodes: List[DiscussionsCommentsNode]
|
||||
|
||||
|
||||
class IssuesNode(BaseModel):
|
||||
number: int
|
||||
author: Union[Author, None] = None
|
||||
title: str
|
||||
createdAt: datetime
|
||||
state: str
|
||||
comments: Comments
|
||||
|
||||
|
||||
class DiscussionsNode(BaseModel):
|
||||
number: int
|
||||
author: Union[Author, None] = None
|
||||
title: str
|
||||
createdAt: datetime
|
||||
comments: DiscussionsComments
|
||||
|
||||
|
||||
class IssuesEdge(BaseModel):
|
||||
cursor: str
|
||||
node: IssuesNode
|
||||
|
||||
|
||||
class DiscussionsEdge(BaseModel):
|
||||
cursor: str
|
||||
node: DiscussionsNode
|
||||
|
||||
|
||||
class Issues(BaseModel):
|
||||
edges: List[IssuesEdge]
|
||||
|
||||
|
||||
class Discussions(BaseModel):
|
||||
edges: List[DiscussionsEdge]
|
||||
|
||||
|
||||
class IssuesRepository(BaseModel):
|
||||
issues: Issues
|
||||
|
||||
|
||||
class DiscussionsRepository(BaseModel):
|
||||
discussions: Discussions
|
||||
|
||||
|
||||
class IssuesResponseData(BaseModel):
|
||||
repository: IssuesRepository
|
||||
|
||||
|
||||
class DiscussionsResponseData(BaseModel):
|
||||
repository: DiscussionsRepository
|
||||
|
||||
|
||||
class IssuesResponse(BaseModel):
|
||||
data: IssuesResponseData
|
||||
|
||||
|
||||
class DiscussionsResponse(BaseModel):
|
||||
data: DiscussionsResponseData
|
||||
|
||||
|
||||
# PRs
|
||||
|
||||
|
||||
class LabelNode(BaseModel):
|
||||
name: str
|
||||
|
||||
|
||||
class Labels(BaseModel):
|
||||
nodes: List[LabelNode]
|
||||
|
||||
|
||||
class ReviewNode(BaseModel):
|
||||
author: Union[Author, None] = None
|
||||
state: str
|
||||
|
||||
|
||||
class Reviews(BaseModel):
|
||||
nodes: List[ReviewNode]
|
||||
|
||||
|
||||
class PullRequestNode(BaseModel):
|
||||
number: int
|
||||
labels: Labels
|
||||
author: Union[Author, None] = None
|
||||
changedFiles: int
|
||||
additions: int
|
||||
deletions: int
|
||||
title: str
|
||||
createdAt: datetime
|
||||
state: str
|
||||
reviews: Reviews
|
||||
# comments: Comments
|
||||
|
||||
|
||||
class PullRequestEdge(BaseModel):
|
||||
cursor: str
|
||||
node: PullRequestNode
|
||||
|
||||
|
||||
class PullRequests(BaseModel):
|
||||
edges: List[PullRequestEdge]
|
||||
|
||||
|
||||
class PRsRepository(BaseModel):
|
||||
pullRequests: PullRequests
|
||||
|
||||
|
||||
class PRsResponseData(BaseModel):
|
||||
repository: PRsRepository
|
||||
|
||||
|
||||
class PRsResponse(BaseModel):
|
||||
data: PRsResponseData
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
input_token: SecretStr
|
||||
github_repository: str
|
||||
httpx_timeout: int = 30
|
||||
|
||||
|
||||
def get_graphql_response(
|
||||
*,
|
||||
settings: Settings,
|
||||
query: str,
|
||||
after: Union[str, None] = None,
|
||||
category_id: Union[str, None] = None,
|
||||
) -> Dict[str, Any]:
|
||||
headers = {"Authorization": f"token {settings.input_token.get_secret_value()}"}
|
||||
# category_id is only used by one query, but GraphQL allows unused variables, so
|
||||
# keep it here for simplicity
|
||||
variables = {"after": after, "category_id": category_id}
|
||||
response = httpx.post(
|
||||
github_graphql_url,
|
||||
headers=headers,
|
||||
timeout=settings.httpx_timeout,
|
||||
json={"query": query, "variables": variables, "operationName": "Q"},
|
||||
)
|
||||
if response.status_code != 200:
|
||||
logging.error(
|
||||
f"Response was not 200, after: {after}, category_id: {category_id}"
|
||||
)
|
||||
logging.error(response.text)
|
||||
raise RuntimeError(response.text)
|
||||
data = response.json()
|
||||
if "errors" in data:
|
||||
logging.error(f"Errors in response, after: {after}, category_id: {category_id}")
|
||||
logging.error(data["errors"])
|
||||
logging.error(response.text)
|
||||
raise RuntimeError(response.text)
|
||||
return data
|
||||
|
||||
|
||||
# def get_graphql_issue_edges(*, settings: Settings, after: Union[str, None] = None):
|
||||
# data = get_graphql_response(settings=settings, query=issues_query, after=after)
|
||||
# graphql_response = IssuesResponse.model_validate(data)
|
||||
# return graphql_response.data.repository.issues.edges
|
||||
|
||||
|
||||
# def get_graphql_question_discussion_edges(
|
||||
# *,
|
||||
# settings: Settings,
|
||||
# after: Union[str, None] = None,
|
||||
# ):
|
||||
# data = get_graphql_response(
|
||||
# settings=settings,
|
||||
# query=discussions_query,
|
||||
# after=after,
|
||||
# category_id=questions_category_id,
|
||||
# )
|
||||
# graphql_response = DiscussionsResponse.model_validate(data)
|
||||
# return graphql_response.data.repository.discussions.edges
|
||||
|
||||
|
||||
def get_graphql_pr_edges(*, settings: Settings, after: Union[str, None] = None):
|
||||
if after is None:
|
||||
print("Querying PRs...")
|
||||
else:
|
||||
print(f"Querying PRs with cursor {after}...")
|
||||
data = get_graphql_response(
|
||||
settings=settings,
|
||||
query=prs_query,
|
||||
after=after
|
||||
)
|
||||
graphql_response = PRsResponse.model_validate(data)
|
||||
return graphql_response.data.repository.pullRequests.edges
|
||||
|
||||
|
||||
# def get_issues_experts(settings: Settings):
|
||||
# issue_nodes: List[IssuesNode] = []
|
||||
# issue_edges = get_graphql_issue_edges(settings=settings)
|
||||
|
||||
# while issue_edges:
|
||||
# for edge in issue_edges:
|
||||
# issue_nodes.append(edge.node)
|
||||
# last_edge = issue_edges[-1]
|
||||
# issue_edges = get_graphql_issue_edges(settings=settings, after=last_edge.cursor)
|
||||
|
||||
# commentors = Counter()
|
||||
# last_month_commentors = Counter()
|
||||
# authors: Dict[str, Author] = {}
|
||||
|
||||
# now = datetime.now(tz=timezone.utc)
|
||||
# one_month_ago = now - timedelta(days=30)
|
||||
|
||||
# for issue in issue_nodes:
|
||||
# issue_author_name = None
|
||||
# if issue.author:
|
||||
# authors[issue.author.login] = issue.author
|
||||
# issue_author_name = issue.author.login
|
||||
# issue_commentors = set()
|
||||
# for comment in issue.comments.nodes:
|
||||
# if comment.author:
|
||||
# authors[comment.author.login] = comment.author
|
||||
# if comment.author.login != issue_author_name:
|
||||
# issue_commentors.add(comment.author.login)
|
||||
# for author_name in issue_commentors:
|
||||
# commentors[author_name] += 1
|
||||
# if issue.createdAt > one_month_ago:
|
||||
# last_month_commentors[author_name] += 1
|
||||
|
||||
# return commentors, last_month_commentors, authors
|
||||
|
||||
|
||||
# def get_discussions_experts(settings: Settings):
|
||||
# discussion_nodes: List[DiscussionsNode] = []
|
||||
# discussion_edges = get_graphql_question_discussion_edges(settings=settings)
|
||||
|
||||
# while discussion_edges:
|
||||
# for discussion_edge in discussion_edges:
|
||||
# discussion_nodes.append(discussion_edge.node)
|
||||
# last_edge = discussion_edges[-1]
|
||||
# discussion_edges = get_graphql_question_discussion_edges(
|
||||
# settings=settings, after=last_edge.cursor
|
||||
# )
|
||||
|
||||
# commentors = Counter()
|
||||
# last_month_commentors = Counter()
|
||||
# authors: Dict[str, Author] = {}
|
||||
|
||||
# now = datetime.now(tz=timezone.utc)
|
||||
# one_month_ago = now - timedelta(days=30)
|
||||
|
||||
# for discussion in discussion_nodes:
|
||||
# discussion_author_name = None
|
||||
# if discussion.author:
|
||||
# authors[discussion.author.login] = discussion.author
|
||||
# discussion_author_name = discussion.author.login
|
||||
# discussion_commentors = set()
|
||||
# for comment in discussion.comments.nodes:
|
||||
# if comment.author:
|
||||
# authors[comment.author.login] = comment.author
|
||||
# if comment.author.login != discussion_author_name:
|
||||
# discussion_commentors.add(comment.author.login)
|
||||
# for reply in comment.replies.nodes:
|
||||
# if reply.author:
|
||||
# authors[reply.author.login] = reply.author
|
||||
# if reply.author.login != discussion_author_name:
|
||||
# discussion_commentors.add(reply.author.login)
|
||||
# for author_name in discussion_commentors:
|
||||
# commentors[author_name] += 1
|
||||
# if discussion.createdAt > one_month_ago:
|
||||
# last_month_commentors[author_name] += 1
|
||||
# return commentors, last_month_commentors, authors
|
||||
|
||||
|
||||
# def get_experts(settings: Settings):
|
||||
# (
|
||||
# discussions_commentors,
|
||||
# discussions_last_month_commentors,
|
||||
# discussions_authors,
|
||||
# ) = get_discussions_experts(settings=settings)
|
||||
# commentors = discussions_commentors
|
||||
# last_month_commentors = discussions_last_month_commentors
|
||||
# authors = {**discussions_authors}
|
||||
# return commentors, last_month_commentors, authors
|
||||
|
||||
|
||||
def _logistic(x, k):
|
||||
return x / (x + k)
|
||||
|
||||
|
||||
def get_contributors(settings: Settings):
|
||||
pr_nodes: List[PullRequestNode] = []
|
||||
pr_edges = get_graphql_pr_edges(settings=settings)
|
||||
|
||||
while pr_edges:
|
||||
for edge in pr_edges:
|
||||
pr_nodes.append(edge.node)
|
||||
last_edge = pr_edges[-1]
|
||||
pr_edges = get_graphql_pr_edges(settings=settings, after=last_edge.cursor)
|
||||
|
||||
contributors = Counter()
|
||||
contributor_scores = Counter()
|
||||
recent_contributor_scores = Counter()
|
||||
reviewers = Counter()
|
||||
authors: Dict[str, Author] = {}
|
||||
|
||||
for pr in pr_nodes:
|
||||
pr_reviewers: Set[str] = set()
|
||||
for review in pr.reviews.nodes:
|
||||
if review.author:
|
||||
authors[review.author.login] = review.author
|
||||
pr_reviewers.add(review.author.login)
|
||||
for reviewer in pr_reviewers:
|
||||
reviewers[reviewer] += 1
|
||||
if pr.author:
|
||||
authors[pr.author.login] = pr.author
|
||||
contributors[pr.author.login] += 1
|
||||
files_changed = pr.changedFiles
|
||||
lines_changed = pr.additions + pr.deletions
|
||||
score = _logistic(files_changed, 20) + _logistic(lines_changed, 100)
|
||||
contributor_scores[pr.author.login] += score
|
||||
three_months_ago = (datetime.now(timezone.utc) - timedelta(days=3*30))
|
||||
if pr.createdAt > three_months_ago:
|
||||
recent_contributor_scores[pr.author.login] += score
|
||||
return contributors, contributor_scores, recent_contributor_scores, reviewers, authors
|
||||
|
||||
|
||||
def get_top_users(
|
||||
*,
|
||||
counter: Counter,
|
||||
min_count: int,
|
||||
authors: Dict[str, Author],
|
||||
skip_users: Container[str],
|
||||
):
|
||||
users = []
|
||||
for commentor, count in counter.most_common():
|
||||
if commentor in skip_users:
|
||||
continue
|
||||
if count >= min_count:
|
||||
author = authors[commentor]
|
||||
users.append(
|
||||
{
|
||||
"login": commentor,
|
||||
"count": count,
|
||||
"avatarUrl": author.avatarUrl,
|
||||
"twitterUsername": author.twitterUsername,
|
||||
"url": author.url,
|
||||
}
|
||||
)
|
||||
return users
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
settings = Settings()
|
||||
logging.info(f"Using config: {settings.model_dump_json()}")
|
||||
g = Github(settings.input_token.get_secret_value())
|
||||
repo = g.get_repo(settings.github_repository)
|
||||
# question_commentors, question_last_month_commentors, question_authors = get_experts(
|
||||
# settings=settings
|
||||
# )
|
||||
contributors, contributor_scores, recent_contributor_scores, reviewers, pr_authors = get_contributors(
|
||||
settings=settings
|
||||
)
|
||||
# authors = {**question_authors, **pr_authors}
|
||||
authors = {**pr_authors}
|
||||
maintainers_logins = {
|
||||
"hwchase17",
|
||||
"agola11",
|
||||
"baskaryan",
|
||||
"hinthornw",
|
||||
"nfcampos",
|
||||
"efriis",
|
||||
"eyurtsev",
|
||||
"rlancemartin"
|
||||
}
|
||||
hidden_logins = {
|
||||
"dev2049",
|
||||
"vowelparrot",
|
||||
"obi1kenobi",
|
||||
"langchain-infra",
|
||||
"jacoblee93",
|
||||
"dqbd",
|
||||
"bracesproul",
|
||||
"akira",
|
||||
}
|
||||
bot_names = {"dosubot", "github-actions", "CodiumAI-Agent"}
|
||||
maintainers = []
|
||||
for login in maintainers_logins:
|
||||
user = authors[login]
|
||||
maintainers.append(
|
||||
{
|
||||
"login": login,
|
||||
"count": contributors[login], #+ question_commentors[login],
|
||||
"avatarUrl": user.avatarUrl,
|
||||
"twitterUsername": user.twitterUsername,
|
||||
"url": user.url,
|
||||
}
|
||||
)
|
||||
|
||||
# min_count_expert = 10
|
||||
# min_count_last_month = 3
|
||||
min_score_contributor = 1
|
||||
min_count_reviewer = 5
|
||||
skip_users = maintainers_logins | bot_names | hidden_logins
|
||||
# experts = get_top_users(
|
||||
# counter=question_commentors,
|
||||
# min_count=min_count_expert,
|
||||
# authors=authors,
|
||||
# skip_users=skip_users,
|
||||
# )
|
||||
# last_month_active = get_top_users(
|
||||
# counter=question_last_month_commentors,
|
||||
# min_count=min_count_last_month,
|
||||
# authors=authors,
|
||||
# skip_users=skip_users,
|
||||
# )
|
||||
top_recent_contributors = get_top_users(
|
||||
counter=recent_contributor_scores,
|
||||
min_count=min_score_contributor,
|
||||
authors=authors,
|
||||
skip_users=skip_users,
|
||||
)
|
||||
top_contributors = get_top_users(
|
||||
counter=contributor_scores,
|
||||
min_count=min_score_contributor,
|
||||
authors=authors,
|
||||
skip_users=skip_users,
|
||||
)
|
||||
top_reviewers = get_top_users(
|
||||
counter=reviewers,
|
||||
min_count=min_count_reviewer,
|
||||
authors=authors,
|
||||
skip_users=skip_users,
|
||||
)
|
||||
|
||||
people = {
|
||||
"maintainers": maintainers,
|
||||
# "experts": experts,
|
||||
# "last_month_active": last_month_active,
|
||||
"top_recent_contributors": top_recent_contributors,
|
||||
"top_contributors": top_contributors,
|
||||
"top_reviewers": top_reviewers,
|
||||
}
|
||||
people_path = Path("./docs/data/people.yml")
|
||||
people_old_content = people_path.read_text(encoding="utf-8")
|
||||
new_people_content = yaml.dump(
|
||||
people, sort_keys=False, width=200, allow_unicode=True
|
||||
)
|
||||
if (
|
||||
people_old_content == new_people_content
|
||||
):
|
||||
logging.info("The LangChain People data hasn't changed, finishing.")
|
||||
sys.exit(0)
|
||||
people_path.write_text(new_people_content, encoding="utf-8")
|
||||
logging.info("Setting up GitHub Actions git user")
|
||||
subprocess.run(["git", "config", "user.name", "github-actions"], check=True)
|
||||
subprocess.run(
|
||||
["git", "config", "user.email", "github-actions@github.com"], check=True
|
||||
)
|
||||
branch_name = "langchain/langchain-people"
|
||||
logging.info(f"Creating a new branch {branch_name}")
|
||||
subprocess.run(["git", "checkout", "-B", branch_name], check=True)
|
||||
logging.info("Adding updated file")
|
||||
subprocess.run(
|
||||
["git", "add", str(people_path)], check=True
|
||||
)
|
||||
logging.info("Committing updated file")
|
||||
message = "👥 Update LangChain people data"
|
||||
result = subprocess.run(["git", "commit", "-m", message], check=True)
|
||||
logging.info("Pushing branch")
|
||||
subprocess.run(["git", "push", "origin", branch_name, "-f"], check=True)
|
||||
logging.info("Creating PR")
|
||||
pr = repo.create_pull(title=message, body=message, base="master", head=branch_name)
|
||||
logging.info(f"Created PR: {pr.number}")
|
||||
logging.info("Finished")
|
||||
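The contributor score that `get_contributors` accumulates in the script above can be looked at in isolation. The following is a minimal sketch, assuming the same constants as the script (20 for changed files, 100 for changed lines); the names `saturating` and `pr_score` are hypothetical and introduced only for illustration. Note that `x / (x + k)` is a saturating ratio rather than a true logistic curve, so each term stays in the range [0, 1).

```python
def saturating(x: float, k: float) -> float:
    """Return x / (x + k): 0 at x == 0, 0.5 at x == k, approaching 1 as x grows."""
    return x / (x + k)


def pr_score(changed_files: int, additions: int, deletions: int) -> float:
    """Score one merged PR from its file and line counts (always below 2)."""
    lines_changed = additions + deletions
    return saturating(changed_files, 20) + saturating(lines_changed, 100)


if __name__ == "__main__":
    # A PR touching 20 files and 100 lines sits at the midpoint of both terms.
    print(pr_score(20, 60, 40))
    # A small PR contributes only a fraction of a point.
    print(pr_score(1, 5, 0))
```

Under these assumptions, a single PR that changes 20 files and 100 lines scores exactly 1, which is the value the script uses for `min_score_contributor`; smaller PRs need to add up across several merges to reach it.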
10
.github/actions/poetry_setup/action.yml
vendored
@@ -26,13 +26,12 @@ inputs:
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
- uses: actions/setup-python@v5
|
||||
- uses: actions/setup-python@v4
|
||||
name: Setup python ${{ inputs.python-version }}
|
||||
id: setup-python
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
|
||||
- uses: actions/cache@v4
|
||||
- uses: actions/cache@v3
|
||||
id: cache-bin-poetry
|
||||
name: Cache Poetry binary - Python ${{ inputs.python-version }}
|
||||
env:
|
||||
@@ -75,11 +74,10 @@ runs:
|
||||
env:
|
||||
POETRY_VERSION: ${{ inputs.poetry-version }}
|
||||
PYTHON_VERSION: ${{ inputs.python-version }}
|
||||
# Install poetry using the python version installed by setup-python step.
|
||||
run: pipx install "poetry==$POETRY_VERSION" --python '${{ steps.setup-python.outputs.python-path }}' --verbose
|
||||
run: pipx install "poetry==$POETRY_VERSION" --python "python$PYTHON_VERSION" --verbose
|
||||
|
||||
- name: Restore pip and poetry cached dependencies
|
||||
uses: actions/cache@v4
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "4"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
50
.github/scripts/check_diff.py
vendored
@@ -1,50 +0,0 @@
import json
import sys
import os

LANGCHAIN_DIRS = {
    "libs/core",
    "libs/langchain",
    "libs/experimental",
    "libs/community",
}

if __name__ == "__main__":
    files = sys.argv[1:]
    dirs_to_run = set()

    if len(files) == 300:
        # max diff length is 300 files - there are likely files missing
        raise ValueError("Max diff reached. Please manually run CI on changed libs.")

    for file in files:
        if any(
            file.startswith(dir_)
            for dir_ in (
                ".github/workflows",
                ".github/tools",
                ".github/actions",
                "libs/core",
                ".github/scripts/check_diff.py",
            )
        ):
            dirs_to_run.update(LANGCHAIN_DIRS)
        elif "libs/community" in file:
            dirs_to_run.update(
                ("libs/community", "libs/langchain", "libs/experimental")
            )
        elif "libs/partners" in file:
            partner_dir = file.split("/")[2]
            if os.path.isdir(f"libs/partners/{partner_dir}"):
                dirs_to_run.add(f"libs/partners/{partner_dir}")
            # Skip if the directory was deleted
        elif "libs/langchain" in file:
            dirs_to_run.update(("libs/langchain", "libs/experimental"))
        elif "libs/experimental" in file:
            dirs_to_run.add("libs/experimental")
        elif file.startswith("libs/"):
            dirs_to_run.update(LANGCHAIN_DIRS)
        else:
            pass
    json_output = json.dumps(list(dirs_to_run))
    print(f"dirs-to-run={json_output}")  # noqa: T201
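The routing rules in check_diff.py are easier to follow when pulled into a function that can be called with sample paths. The following is a minimal sketch, not the script itself: `select_dirs`, `RUN_ALL_PREFIXES`, and the `partner_exists` callback are hypothetical names introduced here, and the callback stands in for the `os.path.isdir` check so the example runs without a checkout.

```python
# Same top-level packages as LANGCHAIN_DIRS in check_diff.py.
LANGCHAIN_DIRS = {"libs/core", "libs/langchain", "libs/experimental", "libs/community"}

# Prefixes whose changes trigger CI for every package.
RUN_ALL_PREFIXES = (
    ".github/workflows",
    ".github/tools",
    ".github/actions",
    "libs/core",
    ".github/scripts/check_diff.py",
)


def select_dirs(files, partner_exists=lambda name: True):
    """Hypothetical helper: map changed file paths to the directories whose CI should run."""
    dirs = set()
    for file in files:
        if file.startswith(RUN_ALL_PREFIXES):
            dirs.update(LANGCHAIN_DIRS)
        elif "libs/community" in file:
            dirs.update(("libs/community", "libs/langchain", "libs/experimental"))
        elif "libs/partners" in file:
            partner = file.split("/")[2]
            # Assumption: the caller decides whether the partner directory still exists.
            if partner_exists(partner):
                dirs.add(f"libs/partners/{partner}")
        elif "libs/langchain" in file:
            dirs.update(("libs/langchain", "libs/experimental"))
        elif "libs/experimental" in file:
            dirs.add("libs/experimental")
        elif file.startswith("libs/"):
            dirs.update(LANGCHAIN_DIRS)
    return dirs


print(sorted(select_dirs(["libs/experimental/README.md"])))
print(sorted(select_dirs(["libs/partners/openai/pyproject.toml", "docs/index.md"])))
```

The ordering of the branches matters: a change under `libs/core` fans out to every package, while a change under `libs/experimental` only triggers that one package, and files outside `libs/` and the listed `.github` paths trigger nothing.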
67
.github/scripts/get_min_versions.py
vendored
@@ -1,67 +0,0 @@
import sys

import tomllib
from packaging.version import parse as parse_version
import re

MIN_VERSION_LIBS = ["langchain-core", "langchain-community", "langchain"]


def get_min_version(version: str) -> str:
    # case ^x.x.x
    _match = re.match(r"^\^(\d+(?:\.\d+){0,2})$", version)
    if _match:
        return _match.group(1)

    # case >=x.x.x,<y.y.y
    _match = re.match(r"^>=(\d+(?:\.\d+){0,2}),<(\d+(?:\.\d+){0,2})$", version)
    if _match:
        _min = _match.group(1)
        _max = _match.group(2)
        assert parse_version(_min) < parse_version(_max)
        return _min

    # case x.x.x
    _match = re.match(r"^(\d+(?:\.\d+){0,2})$", version)
    if _match:
        return _match.group(1)

    raise ValueError(f"Unrecognized version format: {version}")


def get_min_version_from_toml(toml_path: str):
    # Parse the TOML file
    with open(toml_path, "rb") as file:
        toml_data = tomllib.load(file)

    # Get the dependencies from tool.poetry.dependencies
    dependencies = toml_data["tool"]["poetry"]["dependencies"]

    # Initialize a dictionary to store the minimum versions
    min_versions = {}

    # Iterate over the libs in MIN_VERSION_LIBS
    for lib in MIN_VERSION_LIBS:
        # Check if the lib is present in the dependencies
        if lib in dependencies:
            # Get the version string
            version_string = dependencies[lib]

            # Use parse_version to get the minimum supported version from version_string
            min_version = get_min_version(version_string)

            # Store the minimum version in the min_versions dictionary
            min_versions[lib] = min_version

    return min_versions


# Get the TOML file path from the command line argument
toml_file = sys.argv[1]

# Call the function to get the minimum versions
min_versions = get_min_version_from_toml(toml_file)

print(
    " ".join([f"{lib}=={version}" for lib, version in min_versions.items()])
)  # noqa: T201
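To see which constraint strings the three regex cases in get_min_versions.py accept, here is a small standalone sketch. It is an illustrative stand-in, not the script's code: `min_version` and `as_tuple` are hypothetical names, and it compares integer tuples instead of calling `packaging`, which is a simplification (for instance, it treats `1` as lower than `1.0`).

```python
import re

# One to three dot-separated numeric components, as in the script's patterns.
_NUM = r"\d+(?:\.\d+){0,2}"


def as_tuple(version):
    """Turn '0.1.5' into (0, 1, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def min_version(spec):
    """Illustrative stand-in for get_min_version, using tuples instead of packaging."""
    caret = re.fullmatch(rf"\^({_NUM})", spec)
    if caret:
        return caret.group(1)
    bounded = re.fullmatch(rf">=({_NUM}),<({_NUM})", spec)
    if bounded:
        low, high = bounded.groups()
        if not as_tuple(low) < as_tuple(high):
            raise ValueError(f"Lower bound is not below upper bound: {spec}")
        return low
    exact = re.fullmatch(_NUM, spec)
    if exact:
        return exact.group(0)
    raise ValueError(f"Unrecognized version format: {spec}")


for spec in ("^0.1.5", ">=0.1,<0.2", "0.0.12"):
    print(spec, "->", min_version(spec))
```

Any other Poetry constraint syntax, such as `~1.2` or a table with `extras`, falls through to the `ValueError`, which matches the behaviour of the script shown above.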
126
.github/workflows/_all_ci.yml
vendored
@@ -1,126 +0,0 @@
|
||||
---
|
||||
name: langchain CI
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: choice
|
||||
default: 'libs/langchain'
|
||||
options:
|
||||
- libs/langchain
|
||||
- libs/core
|
||||
- libs/experimental
|
||||
- libs/community
|
||||
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.working-directory }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
name: "-"
|
||||
uses: ./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
name: "-"
|
||||
uses: ./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
name: "-"
|
||||
uses: ./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
dependencies:
|
||||
name: "-"
|
||||
uses: ./.github/workflows/_dependencies.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
extended-tests:
|
||||
name: "make extended_tests #${{ matrix.python-version }}"
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: extended
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing --with test
|
||||
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
ci_end:
|
||||
name: "CI Success"
|
||||
needs: [lint, test, compile-integration-tests, dependencies, extended-tests]
|
||||
if: ${{ always() }}
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: "CI Success"
|
||||
if: ${{ !failure() }}
|
||||
run: |
|
||||
echo "Success"
|
||||
exit 0
|
||||
- name: "CI Failure"
|
||||
if: ${{ failure() }}
|
||||
run: |
|
||||
echo "Failure"
|
||||
exit 1
|
||||
@@ -9,7 +9,7 @@ on:
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -24,7 +24,7 @@ jobs:
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: "poetry run pytest -m compile tests/integration_tests #${{ matrix.python-version }}"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
@@ -38,7 +38,7 @@ jobs:
|
||||
|
||||
- name: Install integration dependencies
|
||||
shell: bash
|
||||
run: poetry install --with=test_integration,test
|
||||
run: poetry install --with=test_integration
|
||||
|
||||
- name: Check integration tests compile
|
||||
shell: bash
|
||||
|
||||
113
.github/workflows/_dependencies.yml
vendored
@@ -1,113 +0,0 @@
|
||||
name: dependencies
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
langchain-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: dependency checks ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: pydantic-cross-compat
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install
|
||||
|
||||
- name: Check imports with base dependencies
|
||||
shell: bash
|
||||
run: poetry run make check_imports
|
||||
|
||||
- name: Install test dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-location }}
|
||||
env:
|
||||
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Install the opposite major version of pydantic
|
||||
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
|
||||
shell: bash
|
||||
run: |
|
||||
# Determine the major part of pydantic version
|
||||
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
if [[ "$REGULAR_VERSION" == "1" ]]; then
|
||||
PYDANTIC_DEP=">=2.1,<3"
|
||||
TEST_WITH_VERSION="2"
|
||||
elif [[ "$REGULAR_VERSION" == "2" ]]; then
|
||||
PYDANTIC_DEP="<2"
|
||||
TEST_WITH_VERSION="1"
|
||||
else
|
||||
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
|
||||
# which would prevent caching from working: the cache would get saved
|
||||
# to a different key than where it gets loaded from.
|
||||
poetry run pip install "pydantic${PYDANTIC_DEP}"
|
||||
|
||||
# Ensure that the correct pydantic is installed now.
|
||||
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
|
||||
|
||||
# Determine the major part of pydantic version
|
||||
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
# Check that the major part of pydantic version is as expected, if not
|
||||
# raise an error
|
||||
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
|
||||
echo "Error: expected pydantic version ${TEST_WITH_VERSION} to have been installed, but found: ${CURRENT_VERSION}"
|
||||
exit 1
|
||||
fi
|
||||
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
|
||||
- name: Run pydantic compatibility tests
|
||||
shell: bash
|
||||
run: make test
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
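The version flip performed by the "Install the opposite major version of pydantic" step can be sketched as a small function. This is only an illustration of the shell logic above: `opposite_pydantic_constraint` is a hypothetical name, and the constraint strings are copied from the workflow.

```python
def opposite_pydantic_constraint(installed_version):
    """Hypothetical helper: return (pip constraint, expected major) for the cross-compat run."""
    # Equivalent of `cut -d. -f1` in the workflow step.
    major = installed_version.split(".")[0]
    if major == "1":
        return ">=2.1,<3", "2"
    if major == "2":
        return "<2", "1"
    raise ValueError(f"Unexpected pydantic major version {major!r}")


constraint, expected_major = opposite_pydantic_constraint("1.10.13")
print(f"pip install 'pydantic{constraint}'  # expecting major version {expected_major}")
```

After the install, the workflow reads `pydantic.__version__` again and fails the step if the major version is not the expected one.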
86
.github/workflows/_integration_test.yml
vendored
@@ -1,86 +0,0 @@
|
||||
name: Integration tests
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
environment: Scheduled testing
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test,test_integration
|
||||
|
||||
- name: Install deps outside pyproject
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/community/') }}
|
||||
shell: bash
|
||||
run: poetry run pip install "boto3<2" "google-cloud-aiplatform<2"
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
shell: bash
|
||||
env:
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
run: |
|
||||
make integration_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
122
.github/workflows/_lint.yml
vendored
@@ -7,22 +7,20 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
langchain-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
# This env var allows us to get inline annotations when ruff has complaints.
|
||||
RUFF_OUTPUT_FORMAT: github
|
||||
|
||||
jobs:
|
||||
build:
|
||||
name: "make lint #${{ matrix.python-version }}"
|
||||
runs-on: ubuntu-latest
|
||||
env:
|
||||
# This number is set "by eye": we want it to be big enough
|
||||
# so that it's bigger than the number of commits in any reasonable PR,
|
||||
# and also as small as possible since increasing the number makes
|
||||
# the initial `git fetch` slower.
|
||||
FETCH_DEPTH: 50
|
||||
strategy:
|
||||
matrix:
|
||||
# Only lint on the min and max supported Python versions.
|
||||
@@ -37,6 +35,51 @@ jobs:
|
||||
- "3.11"
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
# Fetch the last FETCH_DEPTH commits, so the mtime-changing script
|
||||
# can accurately set the mtimes of files modified in the last FETCH_DEPTH commits.
|
||||
fetch-depth: ${{ env.FETCH_DEPTH }}
|
||||
- name: Restore workdir file mtimes to last-edited commit date
|
||||
id: restore-mtimes
|
||||
# This is needed to make black caching work.
|
||||
# Black's cache uses file (mtime, size) to check whether a lookup is a cache hit.
|
||||
# Without this command, files in the repo would have the current time as the modified time,
|
||||
# since the previous action step just created them.
|
||||
# This command resets the mtime to the last time the files were modified in git instead,
|
||||
# which is a high-quality and stable representation of the last modification date.
|
||||
run: |
|
||||
# Important considerations:
|
||||
# - These commands run at base of the repo, since we never `cd` to the `WORKDIR`.
|
||||
# - We only want to alter mtimes for Python files, since that's all black checks.
|
||||
# - We don't need to alter mtimes for directories, since black doesn't look at those.
|
||||
# - We also only alter mtimes inside the `WORKDIR` since that's all we'll lint.
|
||||
# - This should run before `poetry install`, because poetry's venv also contains
|
||||
# Python files, and we don't want to alter their mtimes since they aren't linted.
|
||||
|
||||
# Ensure we fail on non-zero exits and on undefined variables.
|
||||
# Also print executed commands, for easier debugging.
|
||||
set -eux
|
||||
|
||||
# Restore the mtimes of Python files in the workdir based on git history.
|
||||
.github/tools/git-restore-mtime --no-directories "$WORKDIR/**/*.py"
|
||||
|
||||
# Since CI only does a partial fetch (to `FETCH_DEPTH`) for efficiency,
|
||||
# the local git repo doesn't have full history. There are probably files
|
||||
# that were last modified in a commit *older than* the oldest fetched commit.
|
||||
# After `git-restore-mtime`, such files have a mtime set to the oldest fetched commit.
|
||||
#
|
||||
# As new commits get added, that timestamp will keep moving forward.
|
||||
# If left unchanged, this will make `black` think that the files were edited
|
||||
# more recently than its cache suggests. Instead, we can set their mtime
|
||||
# to a fixed date in the far past that won't change and won't cause cache misses in black.
|
||||
#
|
||||
# For all workdir Python files modified in or before the oldest few fetched commits,
|
||||
# make their mtime be 2000-01-01 00:00:00.
|
||||
OLDEST_COMMIT="$(git log --reverse '--pretty=format:%H' | head -1)"
|
||||
OLDEST_COMMIT_TIME="$(git show -s '--format=%ai' "$OLDEST_COMMIT")"
|
||||
find "$WORKDIR" -name '*.py' -type f -not -newermt "$OLDEST_COMMIT_TIME" -exec touch -c -m -t '200001010000' '{}' '+'
|
||||
|
||||
echo "oldest-commit=$OLDEST_COMMIT" >> "$GITHUB_OUTPUT"
|
||||
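The `find ... -not -newermt ... -exec touch` command in the step above pins every Python file that is no newer than the oldest fetched commit to a fixed date, so black's (mtime, size) cache key stays stable between runs. A rough Python equivalent, assuming a hypothetical helper `pin_old_mtimes` and a cutoff given as a Unix timestamp, might look like this:

```python
import os
from datetime import datetime
from pathlib import Path

# 2000-01-01 00:00:00 local time, the same fixed date the workflow passes to `touch`.
FIXED_MTIME = datetime(2000, 1, 1).timestamp()


def pin_old_mtimes(workdir, cutoff):
    """Hypothetical helper: pin the mtime of every .py file not newer than `cutoff`."""
    pinned = []
    for path in Path(workdir).rglob("*.py"):
        if not path.is_file():
            continue
        stat = path.stat()
        if stat.st_mtime <= cutoff:
            # Keep the access time; only the modification time changes (like `touch -m`).
            os.utime(path, (stat.st_atime, FIXED_MTIME))
            pinned.append(path)
    return pinned
```

Calling `pin_old_mtimes("libs/langchain", cutoff)` with the commit time of the oldest fetched commit would mirror the shell step; files modified after the cutoff keep the mtime that `git-restore-mtime` assigned.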
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
@@ -69,60 +112,39 @@ jobs:
|
||||
# It doesn't matter how you change it, any change will cause a cache-bust.
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install --with lint,typing
|
||||
poetry install --with dev,lint,test,typing
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-location }}
|
||||
env:
|
||||
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
|
||||
if: ${{ inputs.working-directory != 'libs/langchain' }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
pip install -e ../langchain
|
||||
|
||||
- name: Restore black cache
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
CACHE_BASE: black-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.black_cache
|
||||
key: ${{ env.CACHE_BASE }}-${{ steps.restore-mtimes.outputs.oldest-commit }}
|
||||
restore-keys:
|
||||
# If we can't find an exact match for our cache key, accept any with this prefix.
|
||||
${{ env.CACHE_BASE }}-
|
||||
|
||||
- name: Get .mypy_cache to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
uses: actions/cache@v3
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache
|
||||
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
|
||||
key: mypy-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
make lint_package
|
||||
|
||||
- name: Install unit test dependencies
|
||||
# Also installs dev/lint/test/typing dependencies, to ensure we have
|
||||
# type hints for as many of our libraries as possible.
|
||||
# This helps catch errors that require dependencies to be spotted, for example:
|
||||
# https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341
|
||||
#
|
||||
# If you change this configuration, make sure to change the `cache-key`
|
||||
# in the `poetry_setup` action above to stop using the old cache.
|
||||
# It doesn't matter how you change it, any change will cause a cache-bust.
|
||||
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install --with test
|
||||
- name: Install unit+integration test dependencies
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry install --with test,test_integration
|
||||
|
||||
- name: Get .mypy_cache_test to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache_test
|
||||
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
BLACK_CACHE_DIR: .black_cache
|
||||
run: |
|
||||
make lint_tests
|
||||
make lint
|
||||
|
||||
93
.github/workflows/_pydantic_compatibility.yml
vendored
Normal file
@@ -0,0 +1,93 @@
|
||||
name: pydantic v1/v2 compatibility
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Pydantic v1/v2 compatibility - Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: pydantic-cross-compat
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install
|
||||
|
||||
- name: Install the opposite major version of pydantic
|
||||
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
|
||||
shell: bash
|
||||
run: |
|
||||
# Determine the major part of pydantic version
|
||||
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
if [[ "$REGULAR_VERSION" == "1" ]]; then
|
||||
PYDANTIC_DEP=">=2.1,<3"
|
||||
TEST_WITH_VERSION="2"
|
||||
elif [[ "$REGULAR_VERSION" == "2" ]]; then
|
||||
PYDANTIC_DEP="<2"
|
||||
TEST_WITH_VERSION="1"
|
||||
else
|
||||
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
|
||||
# which would prevent caching from working: the cache would get saved
|
||||
# to a different key than where it gets loaded from.
|
||||
poetry run pip install "pydantic${PYDANTIC_DEP}"
|
||||
|
||||
# Ensure that the correct pydantic is installed now.
|
||||
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
|
||||
|
||||
# Determine the major part of pydantic version
|
||||
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
# Check that the major part of pydantic version is as expected, if not
|
||||
# raise an error
|
||||
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
|
||||
echo "Error: expected pydantic version ${TEST_WITH_VERSION} to have been installed, but found: ${CURRENT_VERSION}"
|
||||
exit 1
|
||||
fi
|
||||
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
|
||||
- name: Run pydantic compatibility tests
|
||||
shell: bash
|
||||
run: make test
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
271
.github/workflows/_release.yml
vendored
@@ -1,5 +1,5 @@
|
||||
name: release
|
||||
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
@@ -7,216 +7,14 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
default: 'libs/langchain'
|
||||
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
if_release:
|
||||
# Disallow publishing from branches that aren't `master`.
|
||||
if: github.ref == 'refs/heads/master'
|
||||
environment: Scheduled testing
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
outputs:
|
||||
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
|
||||
version: ${{ steps.check-version.outputs.version }}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
# We want to keep this build stage *separate* from the release stage,
|
||||
# so that there's no sharing of permissions between them.
|
||||
# The release stage has trusted publishing and GitHub repo contents write access,
|
||||
# and we want to keep the scope of that access limited just to the release job.
|
||||
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
|
||||
# could get access to our GitHub or PyPI credentials.
|
||||
#
|
||||
# Per the trusted publishing GitHub Action:
|
||||
# > It is strongly advised to separate jobs for building [...]
|
||||
# > from the publish job.
|
||||
# https://github.com/pypa/gh-action-pypi-publish#non-goals
|
||||
- name: Build project for distribution
|
||||
run: poetry build
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Upload build
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
|
||||
test-pypi-publish:
|
||||
needs:
|
||||
- build
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
pre-release-checks:
|
||||
needs:
|
||||
- build
|
||||
- test-pypi-publish
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
# We explicitly *don't* set up caching here. This ensures our tests are
|
||||
# maximally sensitive to catching breakage.
|
||||
#
|
||||
# For example, here's a way that caching can cause a falsely-passing test:
|
||||
# - Make the langchain package manifest no longer list a dependency package
|
||||
# as a requirement. This means it won't be installed by `pip install`,
|
||||
# and attempting to use it would cause a crash.
|
||||
# - That dependency used to be required, so it may have been cached.
|
||||
# When restoring the venv packages from cache, that dependency gets included.
|
||||
# - Tests pass, because the dependency is present even though it wasn't specified.
|
||||
# - The package is published, and it breaks on the missing dependency when
|
||||
# used in the real world.
|
||||
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Import published package
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
env:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
# Here we use:
|
||||
# - The default regular PyPI index as the *primary* index, meaning
|
||||
# that it takes priority (https://pypi.org/simple)
|
||||
# - The test PyPI index as an extra index, so that any dependencies that
|
||||
# are not found on test PyPI can be resolved and installed anyway.
|
||||
# (https://test.pypi.org/simple). This will include the PKG_NAME==VERSION
|
||||
# package because VERSION will not have been uploaded to regular PyPI yet.
|
||||
# - attempt install again after 5 seconds if it fails because there is
|
||||
# sometimes a delay in availability on test pypi
|
||||
run: |
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" || \
|
||||
( \
|
||||
sleep 5 && \
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" \
|
||||
)
|
||||
|
||||
# Replace all dashes in the package name with underscores,
|
||||
# since that's how Python imports packages with dashes in the name.
|
||||
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
|
||||
|
||||
poetry run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
|
||||
- name: Import test dependencies
|
||||
run: poetry install --with test,test_integration
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
# Overwrite the local version of the package with the test PyPI version.
|
||||
- name: Import published package (again)
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
shell: bash
|
||||
env:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION"
|
||||
|
||||
- name: Run unit tests
|
||||
run: make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
env:
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
run: make integration_tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Get minimum versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
id: min-version
|
||||
run: |
|
||||
poetry run pip install packaging
|
||||
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml)"
|
||||
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
|
||||
echo "min-versions=$min_versions"
|
||||
|
||||
- name: Run unit tests with minimum dependency versions
|
||||
if: ${{ steps.min-version.outputs.min-versions != '' }}
|
||||
env:
|
||||
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
|
||||
run: |
|
||||
poetry run pip install $MIN_VERSIONS
|
||||
make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
publish:
|
||||
needs:
|
||||
- build
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
# This permission is used for trusted publishing:
|
||||
@@ -226,65 +24,28 @@ jobs:
|
||||
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
|
||||
id-token: write
|
||||
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v3
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Publish package distributions to PyPI
|
||||
uses: pypa/gh-action-pypi-publish@release/v1
|
||||
with:
|
||||
packages-dir: ${{ inputs.working-directory }}/dist/
|
||||
verbose: true
|
||||
print-hash: true
|
||||
|
||||
mark-release:
|
||||
needs:
|
||||
- build
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
- publish
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
# This permission is needed by `ncipollo/release-action` to
|
||||
# create the GitHub release.
|
||||
# This permission is needed by `ncipollo/release-action` to create the GitHub release.
|
||||
contents: write
|
||||
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
python-version: "3.10"
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v3
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Build project for distribution
|
||||
run: poetry build
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
run: |
|
||||
echo version=$(poetry version --short) >> $GITHUB_OUTPUT
|
||||
- name: Create Release
|
||||
uses: ncipollo/release-action@v1
|
||||
if: ${{ inputs.working-directory == 'libs/langchain' }}
|
||||
@@ -293,5 +54,11 @@ jobs:
|
||||
token: ${{ secrets.GITHUB_TOKEN }}
|
||||
draft: false
|
||||
generateReleaseNotes: true
|
||||
tag: v${{ needs.build.outputs.version }}
|
||||
tag: v${{ steps.check-version.outputs.version }}
|
||||
commit: master
|
||||
- name: Publish package distributions to PyPI
|
||||
uses: pypa/gh-action-pypi-publish@release/v1
|
||||
with:
|
||||
packages-dir: ${{ inputs.working-directory }}/dist/
|
||||
verbose: true
|
||||
print-hash: true
|
||||
|
||||
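The pre-release check in the `_release.yml` hunk above retries `pip install` once after a 5-second pause, then derives the import name by replacing dashes with underscores. A minimal Python sketch of those two ideas follows. The names `import_name`, `retry_once`, `flaky_install`, and the injectable `sleep` parameter are hypothetical and exist only for illustration; the workflow itself does this in bash.

```python
import time
from typing import Callable


def import_name(pkg_name: str) -> str:
    """Mirror `sed s/-/_/g`: dashes in the distribution name become underscores."""
    return pkg_name.replace("-", "_")


def retry_once(
    action: Callable[[], bool],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `action`; if it reports failure, wait and try exactly one more time."""
    if action():
        return True
    sleep(delay_seconds)  # test PyPI can lag behind a fresh upload
    return action()


# Hypothetical stand-in for the install command: fails once, then succeeds.
attempts = []


def flaky_install() -> bool:
    attempts.append(1)
    return len(attempts) > 1


# The sleep is stubbed out so the sketch runs instantly.
installed = retry_once(flaky_install, sleep=lambda _seconds: None)
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets the retry path be exercised without actually waiting.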
29
.github/workflows/_test.yml
vendored
@@ -7,13 +7,9 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
langchain-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -28,7 +24,7 @@ jobs:
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: "make test #${{ matrix.python-version }}"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
@@ -42,20 +38,19 @@ jobs:
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-location }}
|
||||
env:
|
||||
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
run: poetry install
|
||||
|
||||
- name: Run core tests
|
||||
shell: bash
|
||||
run: |
|
||||
make test
|
||||
run: make test
|
||||
|
||||
- name: Install integration dependencies
|
||||
shell: bash
|
||||
run: poetry install --with=test_integration
|
||||
|
||||
- name: Check integration tests compile
|
||||
shell: bash
|
||||
run: poetry run pytest -m compile tests/integration_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
|
||||
83
.github/workflows/_test_release.yml
vendored
@@ -9,61 +9,10 @@ on:
|
||||
description: "From which folder this pipeline executes"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
PYTHON_VERSION: "3.10"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
if: github.ref == 'refs/heads/master'
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
outputs:
|
||||
pkg-name: ${{ steps.check-version.outputs.pkg-name }}
|
||||
version: ${{ steps.check-version.outputs.version }}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
# We want to keep this build stage *separate* from the release stage,
|
||||
# so that there's no sharing of permissions between them.
|
||||
# The release stage has trusted publishing and GitHub repo contents write access,
|
||||
# and we want to keep the scope of that access limited just to the release job.
|
||||
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
|
||||
# could get access to our GitHub or PyPI credentials.
|
||||
#
|
||||
# Per the trusted publishing GitHub Action:
|
||||
# > It is strongly advised to separate jobs for building [...]
|
||||
# > from the publish job.
|
||||
# https://github.com/pypa/gh-action-pypi-publish#non-goals
|
||||
- name: Build project for distribution
|
||||
run: poetry build
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Upload build
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: test-dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
|
||||
publish:
|
||||
needs:
|
||||
- build
|
||||
publish_to_test_pypi:
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
# This permission is used for trusted publishing:
|
||||
@@ -72,24 +21,30 @@ jobs:
|
||||
# Trusted publishing has to also be configured on PyPI for each package:
|
||||
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
|
||||
id-token: write
|
||||
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- uses: actions/download-artifact@v3
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
name: test-dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
python-version: "3.10"
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- name: Publish to test PyPI
|
||||
- name: Build project for distribution
|
||||
run: poetry build
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
run: |
|
||||
echo version=$(poetry version --short) >> $GITHUB_OUTPUT
|
||||
- name: Publish package to TestPyPI
|
||||
uses: pypa/gh-action-pypi-publish@release/v1
|
||||
with:
|
||||
repository-url: https://test.pypi.org/legacy/
|
||||
packages-dir: ${{ inputs.working-directory }}/dist/
|
||||
verbose: true
|
||||
print-hash: true
|
||||
repository-url: https://test.pypi.org/legacy/
|
||||
|
||||
# We overwrite any existing distributions with the same name and version.
|
||||
# This is *only for CI use* and is *extremely dangerous* otherwise!
|
||||
# https://github.com/pypa/gh-action-pypi-publish#tolerating-release-package-file-duplicates
|
||||
skip-existing: true
|
||||
|
||||
52
.github/workflows/api_doc_build.yml
vendored
@@ -1,52 +0,0 @@
|
||||
name: API docs build
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
PYTHON_VERSION: "3.10"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
ref: bagatur/api_docs_build
|
||||
|
||||
- name: Set Git config
|
||||
run: |
|
||||
git config --local user.email "actions@github.com"
|
||||
git config --local user.name "Github Actions"
|
||||
|
||||
- name: Merge master
|
||||
run: |
|
||||
git fetch origin master
|
||||
git merge origin/master -m "Merge master" --allow-unrelated-histories -X theirs
|
||||
|
||||
- name: Set up Python ${{ env.PYTHON_VERSION }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
cache-key: api-docs
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
poetry run python -m pip install --upgrade --no-cache-dir pip setuptools
|
||||
poetry run python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
|
||||
poetry run python -m pip install ./libs/partners/*
|
||||
poetry run python -m pip install --exists-action=w --no-cache-dir -r docs/api_reference/requirements.txt
|
||||
|
||||
- name: Build docs
|
||||
run: |
|
||||
poetry run python -m pip install --upgrade --no-cache-dir pip setuptools
|
||||
poetry run python docs/api_reference/create_api_rst.py
|
||||
poetry run python -m sphinx -T -E -b html -d _build/doctrees -c docs/api_reference docs/api_reference api_reference_build/html -j auto
|
||||
|
||||
# https://github.com/marketplace/actions/add-commit
|
||||
- uses: EndBug/add-and-commit@v9
|
||||
with:
|
||||
message: 'Update API docs build'
|
||||
44
.github/workflows/check_diffs.yml
vendored
@@ -1,44 +0,0 @@
|
||||
---
|
||||
name: CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
pull_request:
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.10'
|
||||
- id: files
|
||||
uses: Ana06/get-changed-files@v2.2.0
|
||||
- id: set-matrix
|
||||
run: |
|
||||
python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
|
||||
outputs:
|
||||
dirs-to-run: ${{ steps.set-matrix.outputs.dirs-to-run }}
|
||||
ci:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
strategy:
|
||||
matrix:
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
|
||||
uses: ./.github/workflows/_all_ci.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
|
||||
|
||||
5
.github/workflows/codespell.yml
vendored
@@ -1,5 +1,5 @@
|
||||
---
|
||||
name: CI / cd . / make spell_check
|
||||
name: Codespell
|
||||
|
||||
on:
|
||||
push:
|
||||
@@ -12,7 +12,7 @@ permissions:
|
||||
|
||||
jobs:
|
||||
codespell:
|
||||
name: (Check for spelling errors)
|
||||
name: Check for spelling errors
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
@@ -34,4 +34,3 @@ jobs:
|
||||
with:
|
||||
skip: guide_imports.json
|
||||
ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
|
||||
exclude_file: libs/community/langchain_community/llms/yuan2.py
|
||||
|
||||
25
.github/workflows/doc_lint.yml
vendored
@@ -1,37 +1,22 @@
|
||||
---
|
||||
name: CI / cd .
|
||||
name: Documentation Lint
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ master ]
|
||||
branches: [master]
|
||||
pull_request:
|
||||
paths:
|
||||
- 'docs/**'
|
||||
- 'templates/**'
|
||||
- 'cookbook/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/doc_lint.yml'
|
||||
workflow_dispatch:
|
||||
branches: [master]
|
||||
|
||||
jobs:
|
||||
check:
|
||||
name: Check for "from langchain import x" imports
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Run import check
|
||||
run: |
|
||||
# We should not encourage imports directly from main init file
|
||||
# Expect for hub
|
||||
git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
|
||||
|
||||
lint:
|
||||
name: "-"
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: "."
|
||||
secrets: inherit
|
||||
git grep 'from langchain import' docs/{docs,snippets} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
|
||||
|
||||
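The import check in the `doc_lint.yml` hunk above chains `git grep` with `grep -vE` so that the step fails when any line imports from the top-level `langchain` package, with `hub` as the only exception. The same filter can be sketched in Python as below. `offending_lines` and the sample lines are hypothetical; the regular expression is the one the workflow passes to `grep -vE`.

```python
import re
from typing import Iterable, List

IMPORT_MARKER = "from langchain import"
# Same extended regex the workflow passes to `grep -vE`.
ALLOWED = re.compile(r"from langchain import (hub)")


def offending_lines(lines: Iterable[str]) -> List[str]:
    """Return lines that import from the top-level package, except `hub` imports."""
    return [
        line
        for line in lines
        if IMPORT_MARKER in line and not ALLOWED.search(line)
    ]


# Illustrative input; the workflow reads real matches from `git grep`.
sample = [
    "from langchain import hub",
    "from langchain import OpenAI",
    "from langchain.chains import LLMChain",
]
bad = offending_lines(sample)
```

In the workflow, a non-empty result corresponds to `exit 1` and an empty one to `exit 0`.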
@@ -3,8 +3,6 @@ import toml
|
||||
pyproject_toml = toml.load("pyproject.toml")
|
||||
|
||||
# Extract the ignore words list (adjust the key as per your TOML structure)
|
||||
ignore_words_list = (
|
||||
pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
|
||||
)
|
||||
ignore_words_list = pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
|
||||
|
||||
print(f"::set-output name=ignore_words_list::{ignore_words_list}") # noqa: T201
|
||||
print(f"::set-output name=ignore_words_list::{ignore_words_list}")
|
||||
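The script above prints the codespell ignore list with the `::set-output` workflow command, which GitHub has deprecated in favor of appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable. A hedged sketch of that alternative follows. `ignore_words`, `write_output`, the sample word list, and the temporary file standing in for `$GITHUB_OUTPUT` are all assumptions made for illustration, not code from the repository.

```python
import os
import tempfile
from typing import Any, Dict, Optional


def ignore_words(pyproject: Dict[str, Any]) -> Optional[str]:
    """Look up tool.codespell.ignore-words-list, tolerating missing tables."""
    return pyproject.get("tool", {}).get("codespell", {}).get("ignore-words-list")


def write_output(name: str, value: str, output_path: str) -> None:
    """Append `name=value` to the file GitHub Actions exposes as $GITHUB_OUTPUT."""
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")


# Illustrative data; a real run would parse pyproject.toml instead.
parsed = {"tool": {"codespell": {"ignore-words-list": "momento,collison"}}}
words = ignore_words(parsed)

# A temporary file stands in for the path held in $GITHUB_OUTPUT.
with tempfile.TemporaryDirectory() as tmp:
    out_file = os.path.join(tmp, "github_output")
    write_output("ignore_words_list", words or "", out_file)
    with open(out_file, encoding="utf-8") as handle:
        written = handle.read()
```

Inside a real workflow step, `output_path` would be `os.environ["GITHUB_OUTPUT"]`.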
106
.github/workflows/langchain_ci.yml
vendored
Normal file
@@ -0,0 +1,106 @@
|
||||
---
|
||||
name: libs/langchain CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- '.github/actions/poetry_setup/action.yml'
|
||||
- '.github/tools/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/_test.yml'
|
||||
- '.github/workflows/_pydantic_compatibility.yml'
|
||||
- '.github/workflows/langchain_ci.yml'
|
||||
- 'libs/**'
|
||||
- '!libs/cli'
|
||||
- '!libs/experimental'
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
WORKDIR: "libs/langchain"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
uses:
|
||||
./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
pydantic-compatibility:
|
||||
uses:
|
||||
./.github/workflows/_pydantic_compatibility.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
extended-tests:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }} extended tests
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: libs/langchain
|
||||
cache-key: extended
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing
|
||||
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
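The final step of the `extended-tests` job above fails when `git status` does not report a clean working tree, which catches tests that leave files behind. The core of that check is a substring match, sketched here in Python. `working_tree_is_clean` and the two sample status strings are hypothetical; only the marker text comes from the workflow.

```python
# The exact message the workflow greps for in the `git status` output.
CLEAN_MARKER = "nothing to commit, working tree clean"


def working_tree_is_clean(status_output: str) -> bool:
    """Return True when `git status` output reports no changes or untracked files."""
    return CLEAN_MARKER in status_output


# Illustrative outputs; a real check would capture them from `git status`.
clean = "On branch master\nnothing to commit, working tree clean\n"
dirty = "On branch master\nUntracked files:\n  output.txt\n"

clean_result = working_tree_is_clean(clean)
dirty_result = working_tree_is_clean(dirty)
```

To run this against a repository, the status text could be obtained with `subprocess.run(["git", "status"], capture_output=True, text=True).stdout`.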
53
.github/workflows/langchain_cli_ci.yml
vendored
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
name: libs/cli CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- '.github/actions/poetry_setup/action.yml'
|
||||
- '.github/tools/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/_test.yml'
|
||||
- '.github/workflows/_pydantic_compatibility.yml'
|
||||
- '.github/workflows/langchain_cli_ci.yml'
|
||||
- 'libs/cli/**'
|
||||
- 'libs/*'
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
WORKDIR: "libs/cli"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: libs/cli
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/cli
|
||||
secrets: inherit
|
||||
|
||||
pydantic-compatibility:
|
||||
uses:
|
||||
./.github/workflows/_pydantic_compatibility.yml
|
||||
with:
|
||||
working-directory: libs/cli
|
||||
secrets: inherit
|
||||
136
.github/workflows/langchain_experimental_ci.yml
vendored
Normal file
@@ -0,0 +1,136 @@
|
||||
---
|
||||
name: libs/experimental CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [ master ]
|
||||
pull_request:
|
||||
paths:
|
||||
- '.github/actions/poetry_setup/action.yml'
|
||||
- '.github/tools/**'
|
||||
- '.github/workflows/_lint.yml'
|
||||
- '.github/workflows/_test.yml'
|
||||
- '.github/workflows/langchain_experimental_ci.yml'
|
||||
- 'libs/**'
|
||||
- '!libs/cli'
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.6.1"
|
||||
WORKDIR: "libs/experimental"
|
||||
|
||||
jobs:
|
||||
lint:
|
||||
uses:
|
||||
./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
|
||||
test:
|
||||
uses:
|
||||
./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
|
||||
compile-integration-tests:
|
||||
uses:
|
||||
./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
|
||||
# It's possible that langchain-experimental works fine with the latest *published* langchain,
|
||||
# but is broken with the langchain on `master`.
|
||||
#
|
||||
# We want to catch situations like that *before* releasing a new langchain, hence this test.
|
||||
test-with-latest-langchain:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: test with unpublished langchain - Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
cache-key: unpublished-langchain
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running tests with unpublished langchain, installing dependencies with poetry..."
|
||||
poetry install
|
||||
|
||||
echo "Editably installing langchain outside of poetry, to avoid messing up lockfile..."
|
||||
poetry run pip install -e ../langchain
|
||||
|
||||
- name: Run tests
|
||||
run: make test
|
||||
extended-tests:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ env.WORKDIR }}
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }} extended tests
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: libs/experimental
|
||||
cache-key: extended
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install -E extended_testing
|
||||
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
13
.github/workflows/langchain_experimental_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: libs/experimental Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_experimental_test_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: Experimental Test Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
with:
|
||||
working-directory: libs/experimental
|
||||
secrets: inherit
|
||||
27
.github/workflows/langchain_release.yml
vendored
Normal file
@@ -0,0 +1,27 @@
|
||||
---
|
||||
name: libs/langchain Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_release.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
|
||||
# N.B.: It's possible that PyPI doesn't make the new release visible / available
|
||||
# immediately after publishing. If that happens, the docker build might not
|
||||
# create a new docker image for the new release, since it won't see it.
|
||||
#
|
||||
# If this ends up being a problem, add a check to the end of the `_release.yml`
|
||||
# workflow that prevents the workflow from finishing until the new release
|
||||
# is visible and installable on PyPI.
|
||||
release-docker:
|
||||
needs:
|
||||
- release
|
||||
uses:
|
||||
./.github/workflows/langchain_release_docker.yml
|
||||
secrets: inherit
|
||||
13
.github/workflows/langchain_test_release.yml
vendored
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
name: Test Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows triggering the workflow manually in the GitHub UI
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
with:
|
||||
working-directory: libs/langchain
|
||||
secrets: inherit
|
||||
36
.github/workflows/people.yml
vendored
@@ -1,36 +0,0 @@
|
||||
name: LangChain People
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 14 1 * *"
|
||||
push:
|
||||
branches: [jacob/people]
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
debug_enabled:
|
||||
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
|
||||
required: false
|
||||
default: 'false'
|
||||
|
||||
jobs:
|
||||
langchain-people:
|
||||
if: github.repository_owner == 'langchain-ai'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Dump GitHub context
|
||||
env:
|
||||
GITHUB_CONTEXT: ${{ toJson(github) }}
|
||||
run: echo "$GITHUB_CONTEXT"
|
||||
- uses: actions/checkout@v4
|
||||
# Ref: https://github.com/actions/runner/issues/2033
|
||||
- name: Fix git safe.directory in container
|
||||
run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig
|
||||
# Allow debugging with tmate
|
||||
- name: Setup tmate session
|
||||
uses: mxschmitt/action-tmate@v3
|
||||
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled == 'true' }}
|
||||
with:
|
||||
limit-access-to-actor: true
|
||||
- uses: ./.github/actions/people
|
||||
with:
|
||||
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
|
||||
21
.github/workflows/scheduled_test.yml
vendored
@@ -6,7 +6,7 @@ on:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
POETRY_VERSION: "1.6.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -36,7 +36,7 @@ jobs:
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
uses: 'google-github-actions/auth@v1'
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
@@ -52,12 +52,13 @@ jobs:
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
poetry install --with=test_integration,test
|
||||
|
||||
- name: Install deps outside pyproject
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/community/') }}
|
||||
shell: bash
|
||||
run: poetry run pip install "boto3<2" "google-cloud-aiplatform<2"
|
||||
poetry install --with=test_integration
|
||||
poetry run pip install google-cloud-aiplatform
|
||||
poetry run pip install "boto3>=1.28.57"
|
||||
if [[ ${{ matrix.python-version }} != "3.8" ]]
|
||||
then
|
||||
poetry run pip install fireworks-ai
|
||||
fi
|
||||
|
||||
- name: Run tests
|
||||
shell: bash
|
||||
@@ -67,9 +68,7 @@ jobs:
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_DEPLOYMENT_NAME }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
run: |
|
||||
make scheduled_tests
|
||||
|
||||
6
.gitignore
vendored
@@ -167,7 +167,8 @@ docs/node_modules/
|
||||
docs/.docusaurus/
|
||||
docs/.cache-loader/
|
||||
docs/_dist
|
||||
docs/api_reference/*api_reference.rst
|
||||
docs/api_reference/api_reference.rst
|
||||
docs/api_reference/experimental_api_reference.rst
|
||||
docs/api_reference/_build
|
||||
docs/api_reference/*/
|
||||
!docs/api_reference/_static/
|
||||
@@ -177,6 +178,3 @@ docs/docs/build
|
||||
docs/docs/node_modules
|
||||
docs/docs/yarn.lock
|
||||
_dist
|
||||
docs/docs/templates
|
||||
|
||||
prof
|
||||
|
||||
@@ -4,17 +4,20 @@
|
||||
# Required
|
||||
version: 2
|
||||
|
||||
formats:
|
||||
- pdf
|
||||
|
||||
# Set the version of Python and other tools you might need
|
||||
build:
|
||||
os: ubuntu-22.04
|
||||
tools:
|
||||
python: "3.11"
|
||||
commands:
|
||||
- mkdir -p $READTHEDOCS_OUTPUT
|
||||
- cp -r api_reference_build/* $READTHEDOCS_OUTPUT
|
||||
- python -mvirtualenv $READTHEDOCS_VIRTUALENV_PATH
|
||||
- python -m pip install --upgrade --no-cache-dir pip setuptools
|
||||
- python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
|
||||
- python -m pip install --exists-action=w --no-cache-dir -r docs/api_reference/requirements.txt
|
||||
- python docs/api_reference/create_api_rst.py
|
||||
- cat docs/api_reference/conf.py
|
||||
- python -m sphinx -T -E -b html -d _build/doctrees -c docs/api_reference docs/api_reference $READTHEDOCS_OUTPUT/html -j auto
|
||||
|
||||
# Build documentation in the docs/ directory with Sphinx
|
||||
sphinx:
|
||||
configuration: docs/api_reference/conf.py
|
||||
|
||||
12
LICENSE
@@ -1,6 +1,6 @@
|
||||
MIT License
|
||||
The MIT License
|
||||
|
||||
Copyright (c) LangChain, Inc.
|
||||
Copyright (c) Harrison Chase
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
@@ -9,13 +9,13 @@ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
The above copyright notice and this permission notice shall be included in
|
||||
all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
THE SOFTWARE.
|
||||
21
MIGRATE.md
@@ -1,18 +1,9 @@
|
||||
# Migrating
|
||||
|
||||
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
|
||||
|
||||
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
|
||||
This migration has already started, but we are remaining backwards compatible until 7/28.
|
||||
On that date, we will remove functionality from `langchain`.
|
||||
Read more about the motivation and the progress [here](https://github.com/langchain-ai/langchain/discussions/8043).
|
||||
|
||||
### Migrating to `langchain_experimental`
|
||||
# Migrating to `langchain_experimental`
|
||||
|
||||
We are moving any experimental components of LangChain, or components with vulnerability issues, into `langchain_experimental`.
|
||||
This guide covers how to migrate.
|
||||
|
||||
### Installation
|
||||
## Installation
|
||||
|
||||
Previously:
|
||||
|
||||
@@ -22,7 +13,7 @@ Now (only if you want to access things in experimental):
|
||||
|
||||
`pip install -U langchain langchain_experimental`
|
||||
|
||||
### Things in `langchain.experimental`
|
||||
## Things in `langchain.experimental`
|
||||
|
||||
Previously:
|
||||
|
||||
@@ -32,7 +23,7 @@ Now:
|
||||
|
||||
`from langchain_experimental import ...`
|
||||
|
||||
### PALChain
|
||||
## PALChain
|
||||
|
||||
Previously:
|
||||
|
||||
@@ -42,7 +33,7 @@ Now:
|
||||
|
||||
`from langchain_experimental.pal_chain import PALChain`
|
||||
|
||||
### SQLDatabaseChain
|
||||
## SQLDatabaseChain
|
||||
|
||||
Previously:
|
||||
|
||||
@@ -56,7 +47,7 @@ Alternatively, if you are just interested in using the query generation part of
|
||||
|
||||
`from langchain.chains import create_sql_query_chain`
|
||||
|
||||
### `load_prompt` for Python files
|
||||
## `load_prompt` for Python files
|
||||
|
||||
Note: this only applies if you want to load Python files as prompts.
|
||||
If you want to load json/yaml files, no change is needed.
|
||||
|
||||
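The import moves listed in this guide can be bridged during a transition by trying the new module path first and falling back to an older one. The following is a minimal sketch under assumptions, not something the guide itself prescribes: `import_first_available` is a hypothetical helper (it is not part of LangChain) built only on the standard library's `importlib`.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(module_paths: Sequence[str]) -> ModuleType:
    """Return the first module in module_paths that imports successfully.

    Hypothetical helper for illustration; candidates are tried in order.
    """
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{path}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )
```

For example, passing `"langchain_experimental.pal_chain"` as the first candidate and the module you imported from previously as the second would return whichever is installed; `PALChain` can then be read off the returned module as an attribute.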
20
Makefile
@@ -15,12 +15,7 @@ docs_build:
|
||||
docs/.local_build.sh
|
||||
|
||||
docs_clean:
|
||||
@if [ -d _dist ]; then \
|
||||
rm -r _dist; \
|
||||
echo "Directory _dist has been cleaned."; \
|
||||
else \
|
||||
echo "Nothing to clean."; \
|
||||
fi
|
||||
rm -r _dist
|
||||
|
||||
docs_linkcheck:
|
||||
poetry run linkchecker _dist/docs/ --ignore-url node_modules
|
||||
@@ -42,19 +37,6 @@ spell_check:
|
||||
spell_fix:
|
||||
poetry run codespell --toml pyproject.toml -w
|
||||
|
||||
######################
|
||||
# LINTING AND FORMATTING
|
||||
######################
|
||||
|
||||
lint lint_package lint_tests:
|
||||
poetry run ruff docs templates cookbook
|
||||
poetry run ruff format docs templates cookbook --diff
|
||||
poetry run ruff --select I docs templates cookbook
|
||||
|
||||
format format_diff:
|
||||
poetry run ruff format docs templates cookbook
|
||||
poetry run ruff --select I --fix docs templates cookbook
|
||||
|
||||
######################
|
||||
# HELP
|
||||
######################
|
||||
|
||||
111
README.md
@@ -1,9 +1,10 @@
|
||||
# 🦜️🔗 LangChain
|
||||
|
||||
⚡ Build context-aware reasoning applications ⚡
|
||||
⚡ Building applications with LLMs through composability ⚡
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/releases)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml)
|
||||
[](https://pepy.tech/project/langchain)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://twitter.com/langchainai)
|
||||
@@ -14,76 +15,71 @@
|
||||
[](https://libraries.io/github/langchain-ai/langchain)
|
||||
[](https://github.com/langchain-ai/langchain/issues)
|
||||
|
||||
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
|
||||
|
||||
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
|
||||
|
||||
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
|
||||
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
|
||||
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
|
||||
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to get off the waitlist or speak with our sales team
|
||||
|
||||
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
|
||||
|
||||
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
|
||||
This migration has already started, but we are remaining backwards compatible until 7/28.
|
||||
On that date, we will remove functionality from `langchain`.
|
||||
Read more about the motivation and the progress [here](https://github.com/langchain-ai/langchain/discussions/8043).
|
||||
Read how to migrate your code [here](MIGRATE.md).
|
||||
|
||||
## Quick Install
|
||||
|
||||
With pip:
|
||||
```bash
|
||||
pip install langchain
|
||||
```
|
||||
`pip install langchain`
|
||||
or
|
||||
`pip install langsmith && conda install langchain -c conda-forge`
|
||||
|
||||
With conda:
|
||||
```bash
|
||||
conda install langchain -c conda-forge
|
||||
```
|
||||
## 🤔 What is this?
|
||||
|
||||
## 🤔 What is LangChain?
|
||||
Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
|
||||
|
||||
**LangChain** is a framework for developing applications powered by language models. It enables applications that:
|
||||
- **Are context-aware**: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
|
||||
- **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
|
||||
This library aims to assist in the development of those types of applications. Common examples of these applications include:
|
||||
|
||||
This framework consists of several parts.
|
||||
- **LangChain Libraries**: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
|
||||
- **[LangChain Templates](templates)**: A collection of easily deployable reference architectures for a wide variety of tasks.
|
||||
- **[LangServe](https://github.com/langchain-ai/langserve)**: A library for deploying LangChain chains as a REST API.
|
||||
- **[LangSmith](https://smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
|
||||
- **[LangGraph](https://python.langchain.com/docs/langgraph)**: LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.
|
||||
|
||||
The LangChain libraries themselves are made up of several different packages.
|
||||
- **[`langchain-core`](libs/core)**: Base abstractions and LangChain Expression Language.
|
||||
- **[`langchain-community`](libs/community)**: Third party integrations.
|
||||
- **[`langchain`](libs/langchain)**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
|
||||
|
||||

|
||||
|
||||
## 🧱 What can you build with LangChain?
|
||||
**❓ Retrieval augmented generation**
|
||||
**❓ Question Answering over specific documents**
|
||||
|
||||
- [Documentation](https://python.langchain.com/docs/use_cases/question_answering/)
|
||||
- End-to-end Example: [Chat LangChain](https://chat.langchain.com) and [repo](https://github.com/langchain-ai/chat-langchain)
|
||||
- End-to-end Example: [Question Answering over Notion Database](https://github.com/hwchase17/notion-qa)
|
||||
|
||||
**💬 Analyzing structured data**
|
||||
**💬 Chatbots**
|
||||
|
||||
- [Documentation](https://python.langchain.com/docs/use_cases/qa_structured/sql)
|
||||
- End-to-end Example: [SQL Llama2 Template](https://github.com/langchain-ai/langchain/tree/master/templates/sql-llama2)
|
||||
- [Documentation](https://python.langchain.com/docs/use_cases/chatbots/)
|
||||
- End-to-end Example: [Chat-LangChain](https://github.com/langchain-ai/chat-langchain)
|
||||
|
||||
**🤖 Chatbots**
|
||||
**🤖 Agents**
|
||||
|
||||
- [Documentation](https://python.langchain.com/docs/use_cases/chatbots)
|
||||
- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
|
||||
- [Documentation](https://python.langchain.com/docs/modules/agents/)
|
||||
- End-to-end Example: [GPT+WolframAlpha](https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain)
|
||||
|
||||
And much more! Head to the [Use cases](https://python.langchain.com/docs/use_cases/) section of the docs for more.
|
||||
## 📖 Documentation
|
||||
|
||||
## 🚀 How does LangChain help?
|
||||
The main value props of the LangChain libraries are:
|
||||
1. **Components**: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
|
||||
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
|
||||
Please see [here](https://python.langchain.com) for full documentation on:
|
||||
|
||||
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
|
||||
- Getting started (installation, setting up the environment, simple examples)
|
||||
- How-To examples (demos, integrations, helper functions)
|
||||
- Reference (full API docs)
|
||||
- Resources (high-level explanation of core concepts)
|
||||
|
||||
Components fall into the following **modules**:
|
||||
## 🚀 What can this help with?
|
||||
|
||||
**📃 Model I/O:**
|
||||
There are six main areas that LangChain is designed to help with.
|
||||
These are, in increasing order of complexity:
|
||||
|
||||
**📃 LLMs and Prompts:**
|
||||
|
||||
This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
|
||||
|
||||
**📚 Retrieval:**
|
||||
**🔗 Chains:**
|
||||
|
||||
Chains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
|
||||
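The idea of a chain as a sequence of calls can be sketched in plain Python. This is only an illustration of the concept, not LangChain's chain interface: `make_chain`, `build_prompt`, `fake_llm`, and `strip_brackets` are hypothetical names, and `fake_llm` is a stand-in for a real model call.

```python
from functools import reduce
from typing import Any, Callable


def make_chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right: each step receives the previous step's output."""

    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)

    return run


def build_prompt(topic: str) -> str:
    # First step: turn raw input into a prompt string.
    return f"Tell me a fact about {topic}."


def fake_llm(prompt: str) -> str:
    # Stand-in for an LLM call (assumption: a real model would go here).
    return f"[model reply to: {prompt}]"


def strip_brackets(text: str) -> str:
    # Last step: a trivial output parser that removes the surrounding brackets.
    return text.strip("[]")
```

Calling `make_chain(build_prompt, fake_llm, strip_brackets)("otters")` runs the three steps in order, which mirrors the prompt, model, parser shape of a typical chain.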
|
||||
**📚 Data Augmented Generation:**
|
||||
|
||||
Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
|
||||
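The fetch-then-generate pattern described above can be sketched as follows. This is a toy illustration under stated assumptions: `retrieve` and `build_prompt` are hypothetical helpers, retrieval is a simple word-overlap ranking rather than a vector store, and the generation step is left out so the sketch stays self-contained.

```python
from typing import List


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, documents: List[str]) -> str:
    """Place the retrieved text in the prompt ahead of the question."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The resulting prompt string is what would be handed to a language model in the generation step.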
|
||||
@@ -91,23 +87,18 @@ Data Augmented Generation involves specific types of chains that first interact
|
||||
|
||||
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
|
||||
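The loop described above (choose an Action, take it, see an Observation, repeat until done) can be sketched in plain Python. This is an illustration only, not LangChain's agent API: `run_agent`, `scripted_policy`, `calculator`, and the `tools` mapping are hypothetical names, and the policy is a hard-coded stand-in for an LLM.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An action is (tool_name, tool_input); None means the policy has finished.
Action = Optional[Tuple[str, str]]
Policy = Callable[[str, List[str]], Action]


def run_agent(
    question: str,
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> List[str]:
    """Ask the policy for actions until it returns None or max_steps is reached."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(question, observations)
        if action is None:
            break
        tool_name, tool_input = action
        observations.append(tools[tool_name](tool_input))
    return observations


def scripted_policy(question: str, observations: List[str]) -> Action:
    # Stand-in for an LLM: call the calculator once, then stop.
    if not observations:
        return ("calculator", question)
    return None


def calculator(expression: str) -> str:
    # Toy tool: adds two integers written as "a+b".
    left, right = expression.split("+")
    return str(int(left) + int(right))
```

The `max_steps` cap is a design choice worth keeping in any real agent loop, since a model-driven policy is not guaranteed to stop on its own.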
|
||||
## 📖 Documentation
|
||||
**🧠 Memory:**
|
||||
|
||||
Please see [here](https://python.langchain.com) for full documentation, which includes:
|
||||
Memory refers to persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
|
||||
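Persisting state between calls can be sketched with a small buffer object. This is a minimal illustration, not LangChain's memory interface: `BufferMemory` and `answer` are hypothetical names, and `answer` stands in for a chain call by reporting how many turns have been stored.

```python
from typing import List, Tuple


class BufferMemory:
    """Hypothetical memory object that keeps every (input, output) turn in a list."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save_context(self, user_input: str, output: str) -> None:
        # Record one completed turn.
        self.turns.append((user_input, output))

    def load_history(self) -> str:
        # Render the stored turns as text that could be placed in a prompt.
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


def answer(question: str, memory: BufferMemory) -> str:
    # Stand-in for a chain call: the reply depends on earlier turns in memory.
    reply = f"answer #{len(memory.turns) + 1}"
    memory.save_context(question, reply)
    return reply
```

Because the same `BufferMemory` instance is passed to every call, a follow-up question can be answered with the earlier turns available as context.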
|
||||
- [Getting started](https://python.langchain.com/docs/get_started/introduction): installation, setting up the environment, simple examples
|
||||
- Overview of the [interfaces](https://python.langchain.com/docs/expression_language/), [modules](https://python.langchain.com/docs/modules/), and [integrations](https://python.langchain.com/docs/integrations/providers)
|
||||
- [Use case](https://python.langchain.com/docs/use_cases/qa_structured/sql) walkthroughs and best practice [guides](https://python.langchain.com/docs/guides/adapters/openai)
|
||||
- [LangSmith](https://python.langchain.com/docs/langsmith/), [LangServe](https://python.langchain.com/docs/langserve), and [LangChain Template](https://python.langchain.com/docs/templates/) overviews
|
||||
- [Reference](https://api.python.langchain.com): full API docs
|
||||
**🧐 Evaluation:**
|
||||
|
||||
[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is by using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
|
||||
|
||||
For more information on these concepts, please see our [full documentation](https://python.langchain.com).
|
||||
|
||||
## 💁 Contributing
|
||||
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
|
||||
|
||||
For detailed information on how to contribute, see [here](https://python.langchain.com/docs/contributing/).
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/graphs/contributors)
|
||||
For detailed information on how to contribute, see [here](.github/CONTRIBUTING.md).
|
||||
|
||||
@@ -47,7 +47,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 8,
|
||||
"id": "6a75a5c6-34ee-4ab9-a664-d9b432d812ee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -60,26 +60,28 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Local\n",
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"\n",
|
||||
"# Local \n",
|
||||
"from langchain.chat_models import ChatOllama\n",
|
||||
"llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n",
|
||||
"llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n",
|
||||
"\n",
|
||||
"# API\n",
|
||||
"from langchain_community.llms import Replicate\n",
|
||||
"\n",
|
||||
"from getpass import getpass\n",
|
||||
"from langchain.llms import Replicate\n",
|
||||
"# REPLICATE_API_TOKEN = getpass()\n",
|
||||
"# os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n",
|
||||
"replicate_id = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n",
|
||||
"llama2_chat_replicate = Replicate(\n",
|
||||
" model=replicate_id, input={\"temperature\": 0.01, \"max_length\": 500, \"top_p\": 1}\n",
|
||||
" model=replicate_id,\n",
|
||||
" input={\"temperature\": 0.01, \n",
|
||||
" \"max_length\": 500, \n",
|
||||
" \"top_p\": 1}\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 12,
|
||||
"id": "ce96f7ea-b3d5-44e1-9fa5-a79e04a9e1fb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -102,20 +104,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 13,
|
||||
"id": "025bdd82-3bb1-4948-bc7c-c3ccd94fd05c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.utilities import SQLDatabase\n",
|
||||
"\n",
|
||||
"db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info=0)\n",
|
||||
"\n",
|
||||
"from langchain.utilities import SQLDatabase\n",
|
||||
"db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info= 0)\n",
|
||||
"\n",
|
||||
"def get_schema(_):\n",
|
||||
" return db.get_table_info()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def run_query(query):\n",
|
||||
" return db.run(query)"
|
||||
]
|
||||
@@ -125,14 +124,14 @@
|
||||
"id": "654b3577-baa2-4e12-a393-f40e5db49ac7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Query a SQL Database \n",
|
||||
"## Query a SQL DB \n",
|
||||
"\n",
|
||||
"Follow the runnables workflow [here](https://python.langchain.com/docs/expression_language/cookbook/sql_db)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 14,
|
||||
"id": "5a4933ea-d9c0-4b0a-8177-ba4490c6532b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -142,38 +141,34 @@
|
||||
"' SELECT \"Team\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Prompt\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"\n",
|
||||
"# Update the template based on the type of SQL Database like MySQL, Microsoft SQL Server and so on\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"template = \"\"\"Based on the table schema below, write a SQL query that would answer the user's question:\n",
|
||||
"{schema}\n",
|
||||
"\n",
|
||||
"Question: {question}\n",
|
||||
"SQL Query:\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
|
||||
" (\"human\", template),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"prompt = ChatPromptTemplate.from_messages([\n",
|
||||
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
|
||||
" (\"human\", template)\n",
|
||||
"])\n",
|
||||
"\n",
|
||||
"# Chain to query\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser\n",
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"\n",
|
||||
"sql_response = (\n",
|
||||
" RunnablePassthrough.assign(schema=get_schema)\n",
|
||||
" | prompt\n",
|
||||
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
|
||||
" | StrOutputParser()\n",
|
||||
")\n",
|
||||
" RunnablePassthrough.assign(schema=get_schema)\n",
|
||||
" | prompt\n",
|
||||
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
|
||||
" | StrOutputParser()\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"sql_response.invoke({\"question\": \"What team is Klay Thompson on?\"})"
|
||||
]
|
||||
@@ -214,23 +209,18 @@
|
||||
"Question: {question}\n",
|
||||
"SQL Query: {query}\n",
|
||||
"SQL Response: {response}\"\"\"\n",
|
||||
"prompt_response = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"Given an input question and SQL response, convert it to a natural language answer. No pre-amble.\",\n",
|
||||
" ),\n",
|
||||
" (\"human\", template),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"prompt_response = ChatPromptTemplate.from_messages([\n",
|
||||
" (\"system\", \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\"),\n",
|
||||
" (\"human\", template)\n",
|
||||
"])\n",
|
||||
"\n",
|
||||
"full_chain = (\n",
|
||||
" RunnablePassthrough.assign(query=sql_response)\n",
|
||||
" RunnablePassthrough.assign(query=sql_response) \n",
|
||||
" | RunnablePassthrough.assign(\n",
|
||||
" schema=get_schema,\n",
|
||||
" response=lambda x: db.run(x[\"query\"]),\n",
|
||||
" )\n",
|
||||
" | prompt_response\n",
|
||||
" | prompt_response \n",
|
||||
" | llm\n",
|
||||
")\n",
|
||||
"\n",
|
||||
@@ -260,8 +250,8 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "022868f2-128e-42f5-8d90-d3bb2f11d994",
|
||||
"execution_count": 19,
|
||||
"id": "1985aa1c-eb8f-4fb1-a54f-c8aa10744687",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -270,7 +260,7 @@
|
||||
"' SELECT \"Team\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -278,45 +268,62 @@
|
||||
"source": [
|
||||
"# Prompt\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
|
||||
"\n",
|
||||
"template = \"\"\"Given an input question, convert it to a SQL query. No pre-amble. Based on the table schema below, write a SQL query that would answer the user's question:\n",
|
||||
"from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
|
||||
"template = \"\"\"Based on the table schema below, write a SQL query that would answer the user's question:\n",
|
||||
"{schema}\n",
|
||||
"\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\"system\", template),\n",
|
||||
" MessagesPlaceholder(variable_name=\"history\"),\n",
|
||||
" (\"human\", \"{question}\"),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"Question: {question}\n",
|
||||
"SQL Query:\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages([\n",
|
||||
" (\"system\", \"Given an input question, convert it to a SQL query. No pre-amble.\"),\n",
|
||||
" MessagesPlaceholder(variable_name=\"history\"),\n",
|
||||
" (\"human\", template)\n",
|
||||
"])\n",
|
||||
"\n",
|
||||
"memory = ConversationBufferMemory(return_messages=True)\n",
|
||||
"\n",
|
||||
"# Chain to query with memory\n",
|
||||
"from langchain_core.runnables import RunnableLambda\n",
|
||||
"# Chain to query with memory \n",
|
||||
"from langchain.schema.runnable import RunnableLambda\n",
|
||||
"\n",
|
||||
"sql_chain = (\n",
|
||||
" RunnablePassthrough.assign(\n",
|
||||
" schema=get_schema,\n",
|
||||
" history=RunnableLambda(lambda x: memory.load_memory_variables(x)[\"history\"]),\n",
|
||||
" )\n",
|
||||
" | prompt\n",
|
||||
" schema=get_schema,\n",
|
||||
" history=RunnableLambda(lambda x: memory.load_memory_variables(x)[\"history\"])\n",
|
||||
" )| prompt\n",
|
||||
" | llm.bind(stop=[\"\\nSQLResult:\"])\n",
|
||||
" | StrOutputParser()\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def save(input_output):\n",
|
||||
" output = {\"output\": input_output.pop(\"output\")}\n",
|
||||
" memory.save_context(input_output, output)\n",
|
||||
" return output[\"output\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" return output['output']\n",
|
||||
" \n",
|
||||
"sql_response_memory = RunnablePassthrough.assign(output=sql_chain) | save\n",
|
||||
"sql_response_memory.invoke({\"question\": \"What team is Klay Thompson on?\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "0b45818a-1498-441d-b82d-23c29428c2bb",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' SELECT \"SALARY\" FROM nba_roster WHERE \"NAME\" = \\'Klay Thompson\\';'"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sql_response_memory.invoke({\"question\": \"What is his salary?\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
@@ -342,23 +349,18 @@
|
||||
"Question: {question}\n",
|
||||
"SQL Query: {query}\n",
|
||||
"SQL Response: {response}\"\"\"\n",
|
||||
"prompt_response = ChatPromptTemplate.from_messages(\n",
|
||||
" [\n",
|
||||
" (\n",
|
||||
" \"system\",\n",
|
||||
" \"Given an input question and SQL response, convert it to a natural language answer. No pre-amble.\",\n",
|
||||
" ),\n",
|
||||
" (\"human\", template),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"prompt_response = ChatPromptTemplate.from_messages([\n",
|
||||
" (\"system\", \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\"),\n",
|
||||
" (\"human\", template)\n",
|
||||
"])\n",
|
||||
"\n",
|
||||
"full_chain = (\n",
|
||||
" RunnablePassthrough.assign(query=sql_response_memory)\n",
|
||||
" RunnablePassthrough.assign(query=sql_response_memory) \n",
|
||||
" | RunnablePassthrough.assign(\n",
|
||||
" schema=get_schema,\n",
|
||||
" response=lambda x: db.run(x[\"query\"]),\n",
|
||||
" )\n",
|
||||
" | prompt_response\n",
|
||||
" | prompt_response \n",
|
||||
" | llm\n",
|
||||
")\n",
|
||||
"\n",
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -8,7 +8,6 @@ Notebook | Description
|
||||
[Semi_Structured_RAG.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_Structured_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data, including text and tables, using unstructured for parsing, multi-vector retriever for storing, and lcel for implementing chains.
|
||||
[Semi_structured_and_multi_moda...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using unstructured for parsing, multi-vector retriever for storage and retrieval, and lcel for implementing chains.
|
||||
[Semi_structured_multi_modal_RA...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using various tools and methods such as unstructured for parsing, multi-vector retriever for storing, lcel for implementing chains, and open source language models like llama2, llava, and gpt4all.
|
||||
[analyze_document.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/analyze_document.ipynb) | Analyze a single long document.
|
||||
[autogpt/autogpt.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/autogpt/autogpt.ipynb) | Implement autogpt, a language model, with langchain primitives such as llms, prompttemplates, vectorstores, embeddings, and tools.
|
||||
[autogpt/marathon_times.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/autogpt/marathon_times.ipynb) | Implement autogpt for finding winning marathon times.
|
||||
[baby_agi.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/baby_agi.ipynb) | Implement babyagi, an ai agent that can generate and execute tasks based on a given objective, with the flexibility to swap out specific vectorstores/model providers.
|
||||
@@ -21,7 +20,6 @@ Notebook | Description
[databricks_sql_db.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/databricks_sql_db.ipynb) | Connect to Databricks runtimes and Databricks SQL.
[deeplake_semantic_search_over_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/deeplake_semantic_search_over_chat.ipynb) | Perform semantic search and question answering over a group chat using Activeloop's Deep Lake with GPT-4.
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with Elasticsearch analytics databases in natural language and build search queries via the Elasticsearch DSL API.
[extraction_openai_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/extraction_openai_tools.ipynb) | Structured Data Extraction with OpenAI Tools
[forward_looking_retrieval_augm...](https://github.com/langchain-ai/langchain/tree/master/cookbook/forward_looking_retrieval_augmented_generation.ipynb) | Implement the forward-looking active retrieval augmented generation (FLARE) method, which generates answers to questions, identifies uncertain tokens, generates hypothetical questions based on these tokens, and retrieves relevant documents to continue generating the answer.
[generative_agents_interactive_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/generative_agents_interactive_simulacra_of_human_behavior.ipynb) | Implement a generative agent that simulates human behavior, based on a research paper, using a time-weighted memory object backed by a LangChain retriever.
[gymnasium_agent_simulation.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/gymnasium_agent_simulation.ipynb) | Create a simple agent-environment interaction loop in simulated environments like text-based games with Gymnasium.
@@ -40,13 +38,10 @@ Notebook | Description
[multiagent_bidding.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_bidding.ipynb) | Implement a multi-agent simulation where agents bid to speak, with the highest bidder speaking next, demonstrated through a fictitious presidential debate example.
[myscale_vector_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/myscale_vector_sql.ipynb) | Access and interact with the MyScale integrated vector database, which can enhance the performance of language model (LLM) applications.
[openai_functions_retrieval_qa....](https://github.com/langchain-ai/langchain/tree/master/cookbook/openai_functions_retrieval_qa.ipynb) | Structure response output in a question-answering system by incorporating OpenAI functions into a retrieval pipeline.
[openai_v1_cookbook.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/openai_v1_cookbook.ipynb) | Explore new functionality released alongside the V1 release of the OpenAI Python library.
[petting_zoo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/petting_zoo.ipynb) | Create multi-agent simulations with simulated environments using the PettingZoo library.
[plan_and_execute_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/plan_and_execute_agent.ipynb) | Create plan-and-execute agents that accomplish objectives by planning tasks with a language model (LLM) and executing them with a separate agent.
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented generation (RAG) on a PostgreSQL database using pgvector.
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware AI sales agent, SalesGPT, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
[smart_llm.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/smart_llm.ipynb) | Implement a SmartLLMChain, a self-critique chain that generates multiple output proposals, critiques them to find the best one, and then improves upon it to produce a final output.
@@ -60,7 +60,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"! brew install tesseract\n",
+"! brew install tesseract \n",
 "! brew install poppler"
 ]
 },
@@ -102,29 +102,27 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Any\n",
|
||||
"\n",
|
||||
"from lxml import html\n",
|
||||
"from pydantic import BaseModel\n",
|
||||
"from typing import Any, Optional\n",
|
||||
"from unstructured.partition.pdf import partition_pdf\n",
|
||||
"\n",
|
||||
"# Get elements\n",
|
||||
"raw_pdf_elements = partition_pdf(\n",
|
||||
" filename=path + \"LLaMA2.pdf\",\n",
|
||||
" # Unstructured first finds embedded image blocks\n",
|
||||
" extract_images_in_pdf=False,\n",
|
||||
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
|
||||
" # Titles are any sub-section of the document\n",
|
||||
" infer_table_structure=True,\n",
|
||||
" # Post processing to aggregate text once we have the title\n",
|
||||
" chunking_strategy=\"by_title\",\n",
|
||||
" # Chunking params to aggregate text blocks\n",
|
||||
" # Attempt to create a new chunk 3800 chars\n",
|
||||
" # Attempt to keep chunks > 2000 chars\n",
|
||||
" max_characters=4000,\n",
|
||||
" new_after_n_chars=3800,\n",
|
||||
" combine_text_under_n_chars=2000,\n",
|
||||
" image_output_dir_path=path,\n",
|
||||
")"
|
||||
"raw_pdf_elements = partition_pdf(filename=path+\"LLaMA2.pdf\",\n",
|
||||
" # Unstructured first finds embedded image blocks\n",
|
||||
" extract_images_in_pdf=False,\n",
|
||||
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
|
||||
" # Titles are any sub-section of the document \n",
|
||||
" infer_table_structure=True, \n",
|
||||
" # Post processing to aggregate text once we have the title \n",
|
||||
" chunking_strategy=\"by_title\",\n",
|
||||
" # Chunking params to aggregate text blocks\n",
|
||||
" # Attempt to create a new chunk 3800 chars\n",
|
||||
" # Attempt to keep chunks > 2000 chars \n",
|
||||
" max_characters=4000, \n",
|
||||
" new_after_n_chars=3800, \n",
|
||||
" combine_text_under_n_chars=2000,\n",
|
||||
" image_output_dir_path=path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -192,7 +190,6 @@
|
||||
" type: str\n",
|
||||
" text: Any\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Categorize by type\n",
|
||||
"categorized_elements = []\n",
|
||||
"for element in raw_pdf_elements:\n",
|
||||
@@ -235,9 +232,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from langchain_core.output_parsers import StrOutputParser\n",
-"from langchain_core.prompts import ChatPromptTemplate\n",
-"from langchain_openai import ChatOpenAI"
+"from langchain.chat_models import ChatOpenAI\n",
+"from langchain.prompts import ChatPromptTemplate\n",
+"from langchain.schema.output_parser import StrOutputParser"
 ]
 },
 {
@@ -262,14 +259,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Prompt\n",
|
||||
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
|
||||
"# Prompt \n",
|
||||
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
|
||||
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(prompt_text)\n",
|
||||
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
|
||||
"\n",
|
||||
"# Summary chain\n",
|
||||
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
|
||||
"summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()"
|
||||
"# Summary chain \n",
|
||||
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
|
||||
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -317,15 +314,17 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import uuid\n",
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain.schema.document import Document\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
"vectorstore = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
|
||||
"vectorstore = Chroma(\n",
|
||||
" collection_name=\"summaries\",\n",
|
||||
" embedding_function=OpenAIEmbeddings()\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The storage layer for the parent documents\n",
|
||||
"store = InMemoryStore()\n",
|
||||
@@ -333,26 +332,20 @@
|
||||
"\n",
|
||||
"# The retriever (empty to start)\n",
|
||||
"retriever = MultiVectorRetriever(\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" docstore=store,\n",
|
||||
" vectorstore=vectorstore, \n",
|
||||
" docstore=store, \n",
|
||||
" id_key=id_key,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Add texts\n",
|
||||
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
|
||||
"summary_texts = [\n",
|
||||
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
|
||||
" for i, s in enumerate(text_summaries)\n",
|
||||
"]\n",
|
||||
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_texts)\n",
|
||||
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
|
||||
"\n",
|
||||
"# Add tables\n",
|
||||
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
|
||||
"summary_tables = [\n",
|
||||
" Document(page_content=s, metadata={id_key: table_ids[i]})\n",
|
||||
" for i, s in enumerate(table_summaries)\n",
|
||||
"]\n",
|
||||
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_tables)\n",
|
||||
"retriever.docstore.mset(list(zip(table_ids, tables)))"
|
||||
]
|
||||
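Note on the hunk above: both versions pair each summary with its parent document by position (`doc_ids[i]`), which silently assumes `texts` and `text_summaries` have the same length and order. The sketch below illustrates that summary-to-parent lookup with plain Python containers standing in for the vector store and docstore. It is not the LangChain API; the sample strings and the helper `lookup_parent` are hypothetical.

```python
import uuid

id_key = "doc_id"

# Hypothetical sample data: raw chunks and their summaries, in the same order.
texts = ["full text chunk one", "full text chunk two"]
text_summaries = ["summary one", "summary two"]

# Guard the positional pairing that the notebook relies on implicitly.
if len(texts) != len(text_summaries):
    raise ValueError("texts and text_summaries must have the same length")

doc_ids = [str(uuid.uuid4()) for _ in texts]

# Stand-in for the vector store: each summary carries its parent's id as metadata.
summary_index = [
    {"page_content": summary, "metadata": {id_key: doc_id}}
    for doc_id, summary in zip(doc_ids, text_summaries)
]

# Stand-in for the docstore: parent id -> raw chunk.
docstore = dict(zip(doc_ids, texts))


def lookup_parent(summary_hit: dict) -> str:
    """Resolve a matched summary back to the raw chunk it was derived from."""
    return docstore[summary_hit["metadata"][id_key]]
```

Searching happens over the short summaries, while the text handed to the model is the full parent chunk, which is why the id stored in the metadata has to match the docstore key exactly.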
@@ -374,7 +367,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from operator import itemgetter\n",
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"\n",
|
||||
"# Prompt template\n",
|
||||
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",
|
||||
@@ -384,13 +378,13 @@
|
||||
"prompt = ChatPromptTemplate.from_template(template)\n",
|
||||
"\n",
|
||||
"# LLM\n",
|
||||
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
|
||||
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
|
||||
"\n",
|
||||
"# RAG pipeline\n",
|
||||
"chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
" | prompt\n",
|
||||
" | model\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
|
||||
" | prompt \n",
|
||||
" | model \n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
|
||||
@@ -92,30 +92,28 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Any\n",
|
||||
"\n",
|
||||
"from lxml import html\n",
|
||||
"from pydantic import BaseModel\n",
|
||||
"from typing import Any, Optional\n",
|
||||
"from unstructured.partition.pdf import partition_pdf\n",
|
||||
"\n",
|
||||
"# Get elements\n",
|
||||
"raw_pdf_elements = partition_pdf(\n",
|
||||
" filename=path + \"LLaVA.pdf\",\n",
|
||||
" # Using pdf format to find embedded image blocks\n",
|
||||
" extract_images_in_pdf=True,\n",
|
||||
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
|
||||
" # Titles are any sub-section of the document\n",
|
||||
" infer_table_structure=True,\n",
|
||||
" # Post processing to aggregate text once we have the title\n",
|
||||
" chunking_strategy=\"by_title\",\n",
|
||||
" # Chunking params to aggregate text blocks\n",
|
||||
" # Attempt to create a new chunk 3800 chars\n",
|
||||
" # Attempt to keep chunks > 2000 chars\n",
|
||||
" # Hard max on chunks\n",
|
||||
" max_characters=4000,\n",
|
||||
" new_after_n_chars=3800,\n",
|
||||
" combine_text_under_n_chars=2000,\n",
|
||||
" image_output_dir_path=path,\n",
|
||||
")"
|
||||
"raw_pdf_elements = partition_pdf(filename=path+\"LLaVA.pdf\",\n",
|
||||
" # Using pdf format to find embedded image blocks\n",
|
||||
" extract_images_in_pdf=True,\n",
|
||||
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
|
||||
" # Titles are any sub-section of the document \n",
|
||||
" infer_table_structure=True, \n",
|
||||
" # Post processing to aggregate text once we have the title \n",
|
||||
" chunking_strategy=\"by_title\",\n",
|
||||
" # Chunking params to aggregate text blocks\n",
|
||||
" # Attempt to create a new chunk 3800 chars\n",
|
||||
" # Attempt to keep chunks > 2000 chars \n",
|
||||
" # Hard max on chunks\n",
|
||||
" max_characters=4000, \n",
|
||||
" new_after_n_chars=3800, \n",
|
||||
" combine_text_under_n_chars=2000,\n",
|
||||
" image_output_dir_path=path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -172,7 +170,6 @@
|
||||
" type: str\n",
|
||||
" text: Any\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Categorize by type\n",
|
||||
"categorized_elements = []\n",
|
||||
"for element in raw_pdf_elements:\n",
|
||||
@@ -211,9 +208,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -223,14 +220,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Prompt\n",
|
||||
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\\n",
|
||||
"# Prompt \n",
|
||||
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
|
||||
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(prompt_text)\n",
|
||||
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
|
||||
"\n",
|
||||
"# Summary chain\n",
|
||||
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
|
||||
"summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()"
|
||||
"# Summary chain \n",
|
||||
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
|
||||
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -313,7 +310,7 @@
 " # Execute the command and save the output to the defined output file\n",
 " /Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
 "\n",
-"done\n"
+"done"
 ]
 },
 {
@@ -337,8 +334,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"import glob\n",
-"import os\n",
+"import os, glob\n",
 "\n",
 "# Get all .txt file summaries\n",
 "file_paths = glob.glob(os.path.expanduser(os.path.join(path, \"*.txt\")))\n",
@@ -346,11 +342,11 @@
|
||||
"# Read each file and store its content in a list\n",
|
||||
"img_summaries = []\n",
|
||||
"for file_path in file_paths:\n",
|
||||
" with open(file_path, \"r\") as file:\n",
|
||||
" with open(file_path, 'r') as file:\n",
|
||||
" img_summaries.append(file.read())\n",
|
||||
"\n",
|
||||
"# Remove any logging prior to summary\n",
|
||||
"logging_header = \"clip_model_load: total allocated memory: 201.27 MB\\n\\n\"\n",
|
||||
"logging_header=\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\"\n",
|
||||
"cleaned_img_summary = [s.split(logging_header, 1)[1].strip() for s in img_summaries]"
|
||||
]
|
||||
},
|
||||
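Note on the hunk above: `s.split(logging_header, 1)[1]` raises `IndexError` for any summary file that does not contain the logging header, and the header embeds a specific memory figure that may differ between runs. A minimal sketch of a more tolerant variant follows. The helper name `strip_logging_header` and the sample strings are hypothetical and not part of the notebook.

```python
LOGGING_HEADER = "clip_model_load: total allocated memory: 201.27 MB\n\n"


def strip_logging_header(summary: str, header: str = LOGGING_HEADER) -> str:
    """Return the text after the first occurrence of `header`.

    If the header is absent, the whole summary is kept instead of raising.
    """
    _, found, rest = summary.partition(header)
    return (rest if found else summary).strip()


# Hypothetical inputs: one file with the header, one without.
raw_summaries = [
    LOGGING_HEADER + "A bar plot comparing scores.\n",
    "No header in this file.\n",
]
cleaned = [strip_logging_header(s) for s in raw_summaries]
```

When the header is present, this yields the same result as the notebook's `split(...)[1].strip()`; the only behavioral difference is the fallback for files without it.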
@@ -372,15 +368,17 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import uuid\n",
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain.schema.document import Document\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
"vectorstore = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
|
||||
"vectorstore = Chroma(\n",
|
||||
" collection_name=\"summaries\",\n",
|
||||
" embedding_function=OpenAIEmbeddings()\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The storage layer for the parent documents\n",
|
||||
"store = InMemoryStore()\n",
|
||||
@@ -388,26 +386,20 @@
|
||||
"\n",
|
||||
"# The retriever (empty to start)\n",
|
||||
"retriever = MultiVectorRetriever(\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" docstore=store,\n",
|
||||
" vectorstore=vectorstore, \n",
|
||||
" docstore=store, \n",
|
||||
" id_key=id_key,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Add texts\n",
|
||||
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
|
||||
"summary_texts = [\n",
|
||||
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
|
||||
" for i, s in enumerate(text_summaries)\n",
|
||||
"]\n",
|
||||
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_texts)\n",
|
||||
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
|
||||
"\n",
|
||||
"# Add tables\n",
|
||||
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
|
||||
"summary_tables = [\n",
|
||||
" Document(page_content=s, metadata={id_key: table_ids[i]})\n",
|
||||
" for i, s in enumerate(table_summaries)\n",
|
||||
"]\n",
|
||||
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_tables)\n",
|
||||
"retriever.docstore.mset(list(zip(table_ids, tables)))"
|
||||
]
|
||||
@@ -431,12 +423,9 @@
|
||||
"source": [
|
||||
"# Add image summaries\n",
|
||||
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
|
||||
"summary_img = [\n",
|
||||
" Document(page_content=s, metadata={id_key: img_ids[i]})\n",
|
||||
" for i, s in enumerate(cleaned_img_summary)\n",
|
||||
"]\n",
|
||||
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_img)\n",
|
||||
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary)))"
|
||||
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary))) "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -460,19 +449,10 @@
|
||||
"source": [
|
||||
"# Add images\n",
|
||||
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
|
||||
"summary_img = [\n",
|
||||
" Document(page_content=s, metadata={id_key: img_ids[i]})\n",
|
||||
" for i, s in enumerate(cleaned_img_summary)\n",
|
||||
"]\n",
|
||||
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_img)\n",
|
||||
"### Fetch images\n",
|
||||
"retriever.docstore.mset(\n",
|
||||
" list(\n",
|
||||
" zip(\n",
|
||||
" img_ids,\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
")"
|
||||
"retriever.docstore.mset(list(zip(img_ids, ### image ### ))) "
|
||||
]
|
||||
},
|
||||
{
|
||||
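Note on the hunk above: both versions leave the image payload unfinished (an empty `zip(img_ids, )` on one side, a `### image ###` placeholder on the other), so the cell does not store any images as written. One possible way to fill the gap, assuming the images are kept in the docstore as base64 strings, is sketched below. The helpers `encode_image` and `encode_images` are hypothetical, and nothing in the notebook confirms this is the intended approach.

```python
import base64
from pathlib import Path


def encode_image(image_path) -> str:
    """Read an image file and return its bytes as a base64-encoded ASCII string."""
    return base64.b64encode(Path(image_path).read_bytes()).decode("ascii")


def encode_images(image_paths) -> list:
    """Encode several image files, preserving the order of `image_paths`."""
    return [encode_image(p) for p in image_paths]
```

The encoded list could then take the placeholder's position, for example `list(zip(img_ids, encode_images(image_paths)))`, provided `image_paths` is in the same order as the image summaries. Results from `glob.glob` come back in arbitrary order, so the pairing would need an explicit sort or a filename-based match.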
@@ -562,9 +542,7 @@
 ],
 "source": [
 "# We can retrieve this table\n",
-"retriever.get_relevant_documents(\n",
-" \"What are results for LLaMA across across domains / subjects?\"\n",
-")[1]"
+"retriever.get_relevant_documents(\"What are results for LLaMA across across domains / subjects?\")[1]"
 ]
 },
 {
@@ -614,9 +592,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[\n",
|
||||
" 1\n",
|
||||
"]"
|
||||
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[1]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -646,7 +622,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from operator import itemgetter\n",
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"\n",
|
||||
"# Prompt template\n",
|
||||
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",
|
||||
@@ -656,15 +633,15 @@
|
||||
"prompt = ChatPromptTemplate.from_template(template)\n",
|
||||
"\n",
|
||||
"# Option 1: LLM\n",
|
||||
"model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
|
||||
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
|
||||
"# Option 2: Multi-modal LLM\n",
|
||||
"# model = GPT4-V or LLaVA\n",
|
||||
"\n",
|
||||
"# RAG pipeline\n",
|
||||
"chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
" | prompt\n",
|
||||
" | model\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
|
||||
" | prompt \n",
|
||||
" | model \n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
@@ -687,9 +664,7 @@
 }
 ],
 "source": [
-"chain.invoke(\n",
-" \"What is the performance of LLaVa across across multiple image domains / subjects?\"\n",
-")"
+"chain.invoke(\"What is the performance of LLaVa across across multiple image domains / subjects?\")"
 ]
 },
 {
@@ -738,7 +713,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.1"
+"version": "3.9.16"
 }
 },
 "nbformat": 4,
@@ -82,33 +82,32 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Any\n",
|
||||
"\n",
|
||||
"import pandas as pd\n",
|
||||
"from lxml import html\n",
|
||||
"from pydantic import BaseModel\n",
|
||||
"from typing import Any, Optional\n",
|
||||
"from unstructured.partition.pdf import partition_pdf\n",
|
||||
"\n",
|
||||
"# Path to save images\n",
|
||||
"path = \"/Users/rlm/Desktop/Papers/LLaVA/\"\n",
|
||||
"\n",
|
||||
"# Get elements\n",
|
||||
"raw_pdf_elements = partition_pdf(\n",
|
||||
" filename=path + \"LLaVA.pdf\",\n",
|
||||
" # Using pdf format to find embedded image blocks\n",
|
||||
" extract_images_in_pdf=True,\n",
|
||||
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
|
||||
" # Titles are any sub-section of the document\n",
|
||||
" infer_table_structure=True,\n",
|
||||
" # Post processing to aggregate text once we have the title\n",
|
||||
" chunking_strategy=\"by_title\",\n",
|
||||
" # Chunking params to aggregate text blocks\n",
|
||||
" # Attempt to create a new chunk 3800 chars\n",
|
||||
" # Attempt to keep chunks > 2000 chars\n",
|
||||
" # Hard max on chunks\n",
|
||||
" max_characters=4000,\n",
|
||||
" new_after_n_chars=3800,\n",
|
||||
" combine_text_under_n_chars=2000,\n",
|
||||
" image_output_dir_path=path,\n",
|
||||
")"
|
||||
"raw_pdf_elements = partition_pdf(filename=path+\"LLaVA.pdf\",\n",
|
||||
" # Using pdf format to find embedded image blocks\n",
|
||||
" extract_images_in_pdf=True,\n",
|
||||
" # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
|
||||
" # Titles are any sub-section of the document \n",
|
||||
" infer_table_structure=True, \n",
|
||||
" # Post processing to aggregate text once we have the title \n",
|
||||
" chunking_strategy=\"by_title\",\n",
|
||||
" # Chunking params to aggregate text blocks\n",
|
||||
" # Attempt to create a new chunk 3800 chars\n",
|
||||
" # Attempt to keep chunks > 2000 chars \n",
|
||||
" # Hard max on chunks\n",
|
||||
" max_characters=4000, \n",
|
||||
" new_after_n_chars=3800, \n",
|
||||
" combine_text_under_n_chars=2000,\n",
|
||||
" image_output_dir_path=path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -166,7 +165,6 @@
|
||||
" type: str\n",
|
||||
" text: Any\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Categorize by type\n",
|
||||
"categorized_elements = []\n",
|
||||
"for element in raw_pdf_elements:\n",
|
||||
@@ -209,9 +207,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate"
|
||||
"from langchain.chat_models import ChatOllama\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -221,14 +219,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Prompt\n",
|
||||
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\\n",
|
||||
"# Prompt \n",
|
||||
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
|
||||
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(prompt_text)\n",
|
||||
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
|
||||
"\n",
|
||||
"# Summary chain\n",
|
||||
"# Summary chain \n",
|
||||
"model = ChatOllama(model=\"llama2:13b-chat\")\n",
|
||||
"summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()"
|
||||
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -311,7 +309,7 @@
|
||||
" # Execute the command and save the output to the defined output file\n",
|
||||
" /Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
|
||||
"\n",
|
||||
"done\n"
|
||||
"done"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -321,8 +319,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import glob\n",
|
||||
"import os\n",
|
||||
"import os, glob\n",
|
||||
"\n",
|
||||
"# Get all .txt files in the directory\n",
|
||||
"file_paths = glob.glob(os.path.expanduser(os.path.join(path, \"*.txt\")))\n",
|
||||
@@ -330,14 +327,11 @@
|
||||
"# Read each file and store its content in a list\n",
|
||||
"img_summaries = []\n",
|
||||
"for file_path in file_paths:\n",
|
||||
" with open(file_path, \"r\") as file:\n",
|
||||
" with open(file_path, 'r') as file:\n",
|
||||
" img_summaries.append(file.read())\n",
|
||||
"\n",
|
||||
"# Clean up residual logging\n",
|
||||
"cleaned_img_summary = [\n",
|
||||
" s.split(\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\", 1)[1].strip()\n",
|
||||
" for s in img_summaries\n",
|
||||
"]"
|
||||
"cleaned_img_summary = [s.split(\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\", 1)[1].strip() for s in img_summaries]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -375,26 +369,26 @@
|
||||
],
|
||||
"source": [
|
||||
"import uuid\n",
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_community.embeddings import GPT4AllEmbeddings\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain.schema.document import Document\n",
|
||||
"from langchain.embeddings import GPT4AllEmbeddings\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
"vectorstore = Chroma(\n",
|
||||
" collection_name=\"summaries\", embedding_function=GPT4AllEmbeddings()\n",
|
||||
" collection_name=\"summaries\",\n",
|
||||
" embedding_function=GPT4AllEmbeddings()\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The storage layer for the parent documents\n",
|
||||
"store = InMemoryStore() # <- Can we extend this to images\n",
|
||||
"store = InMemoryStore() # <- Can we extend this to images \n",
|
||||
"id_key = \"doc_id\"\n",
|
||||
"\n",
|
||||
"# The retriever (empty to start)\n",
|
||||
"retriever = MultiVectorRetriever(\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" docstore=store,\n",
|
||||
" vectorstore=vectorstore, \n",
|
||||
" docstore=store, \n",
|
||||
" id_key=id_key,\n",
|
||||
")"
|
||||
]
|
||||
@@ -418,32 +412,21 @@
|
||||
"source": [
|
||||
"# Add texts\n",
|
||||
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
|
||||
"summary_texts = [\n",
|
||||
" Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
|
||||
" for i, s in enumerate(text_summaries)\n",
|
||||
"]\n",
|
||||
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_texts)\n",
|
||||
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
|
||||
"\n",
|
||||
"# Add tables\n",
|
||||
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
|
||||
"summary_tables = [\n",
|
||||
" Document(page_content=s, metadata={id_key: table_ids[i]})\n",
|
||||
" for i, s in enumerate(table_summaries)\n",
|
||||
"]\n",
|
||||
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_tables)\n",
|
||||
"retriever.docstore.mset(list(zip(table_ids, tables)))\n",
|
||||
"\n",
|
||||
"# Add images\n",
|
||||
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
|
||||
"summary_img = [\n",
|
||||
" Document(page_content=s, metadata={id_key: img_ids[i]})\n",
|
||||
" for i, s in enumerate(cleaned_img_summary)\n",
|
||||
"]\n",
|
||||
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
|
||||
"retriever.vectorstore.add_documents(summary_img)\n",
|
||||
"retriever.docstore.mset(\n",
|
||||
" list(zip(img_ids, cleaned_img_summary))\n",
|
||||
") # Store the image summary as the raw document"
|
||||
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary))) # Store the image summary as the raw document"
|
||||
]
|
||||
},
|
||||
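The cell above ties each summary to its raw element through a shared ID stored under `id_key`. As a minimal sketch of that linkage, plain dictionaries stand in below for the vector store and the docstore. The `link_summaries` helper, the `"doc_id"` default, and the sample strings are hypothetical and are not part of LangChain.

```python
import uuid


def link_summaries(raw_items, summaries, id_key="doc_id"):
    """Pair each summary with its raw item via a freshly generated shared ID.

    Returns (summary_records, docstore). summary_records stand in for the
    documents added to the vector store; docstore stands in for the
    key-value store that holds the raw items.
    """
    if len(raw_items) != len(summaries):
        raise ValueError("expected exactly one summary per raw item")
    ids = [str(uuid.uuid4()) for _ in raw_items]
    summary_records = [
        {"page_content": summary, "metadata": {id_key: item_id}}
        for item_id, summary in zip(ids, summaries)
    ]
    docstore = dict(zip(ids, raw_items))
    return summary_records, docstore


records, store = link_summaries(
    ["full text of section 1", "full table as HTML"],
    ["summary of section 1", "summary of the table"],
)

# A search hit on a summary is resolved back to the raw item through its ID.
hit = records[1]
raw = store[hit["metadata"]["doc_id"]]
```

The point of the indirection is that only the short summaries are embedded and searched, while the original text, table, or image summary is what the lookup by ID hands back.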
{
|
||||
@@ -501,9 +484,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[\n",
|
||||
" 0\n",
|
||||
"]"
|
||||
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -532,7 +513,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from operator import itemgetter\n",
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"\n",
|
||||
"# Prompt template\n",
|
||||
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",
|
||||
@@ -548,9 +530,9 @@
|
||||
"\n",
|
||||
"# RAG pipeline\n",
|
||||
"chain = (\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
|
||||
" | prompt\n",
|
||||
" | model\n",
|
||||
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
|
||||
" | prompt \n",
|
||||
" | model \n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
@@ -573,9 +555,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke(\n",
|
||||
" \"What is the performance of LLaVa across across multiple image domains / subjects?\"\n",
|
||||
")"
|
||||
"chain.invoke(\"What is the performance of LLaVa across across multiple image domains / subjects?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -604,9 +584,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke(\n",
|
||||
" \"Explain any images / figures in the paper with playful and creative examples.\"\n",
|
||||
")"
|
||||
"chain.invoke(\"Explain any images / figures in the paper with playful and creative examples.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -1,284 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Amazon Personalize\n",
|
||||
"\n",
|
||||
"[Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) is a fully managed machine learning service that uses your data to generate item recommendations for your users. It can also generate user segments based on the users' affinity for certain items or item metadata.\n",
|
||||
"\n",
|
||||
"This notebook goes through how to use Amazon Personalize Chain. You need a Amazon Personalize campaign_arn or a recommender_arn before you get started with the below notebook.\n",
|
||||
"\n",
|
||||
"Following is a [tutorial](https://github.com/aws-samples/retail-demo-store/blob/master/workshop/1-Personalization/Lab-1-Introduction-and-data-preparation.ipynb) to setup a campaign_arn/recommender_arn on Amazon Personalize. Once the campaign_arn/recommender_arn is setup, you can use it in the langchain ecosystem. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 1. Install Dependencies"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install boto3"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## 2. Sample Use-cases"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2.1 [Use-case-1] Setup Amazon Personalize Client and retrieve recommendations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_experimental.recommenders import AmazonPersonalize\n",
|
||||
"\n",
|
||||
"recommender_arn = \"<insert_arn>\"\n",
|
||||
"\n",
|
||||
"client = AmazonPersonalize(\n",
|
||||
" credentials_profile_name=\"default\",\n",
|
||||
" region_name=\"us-west-2\",\n",
|
||||
" recommender_arn=recommender_arn,\n",
|
||||
")\n",
|
||||
"client.get_recommendations(user_id=\"1\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### 2.2 [Use-case-2] Invoke Personalize Chain for summarizing results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms.bedrock import Bedrock\n",
|
||||
"from langchain_experimental.recommenders import AmazonPersonalizeChain\n",
|
||||
"\n",
|
||||
"bedrock_llm = Bedrock(model_id=\"anthropic.claude-v2\", region_name=\"us-west-2\")\n",
|
||||
"\n",
|
||||
"# Create personalize chain\n",
|
||||
"# Use return_direct=True if you do not want summary\n",
|
||||
"chain = AmazonPersonalizeChain.from_llm(\n",
|
||||
" llm=bedrock_llm, client=client, return_direct=False\n",
|
||||
")\n",
|
||||
"response = chain({\"user_id\": \"1\"})\n",
|
||||
"print(response)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2.3 [Use-Case-3] Invoke Amazon Personalize Chain using your own prompt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts.prompt import PromptTemplate\n",
|
||||
"\n",
|
||||
"RANDOM_PROMPT_QUERY = \"\"\"\n",
|
||||
"You are a skilled publicist. Write a high-converting marketing email advertising several movies available in a video-on-demand streaming platform next week, \n",
|
||||
" given the movie and user information below. Your email will leverage the power of storytelling and persuasive language. \n",
|
||||
" The movies to recommend and their information is contained in the <movie> tag. \n",
|
||||
" All movies in the <movie> tag must be recommended. Give a summary of the movies and why the human should watch them. \n",
|
||||
" Put the email between <email> tags.\n",
|
||||
"\n",
|
||||
" <movie>\n",
|
||||
" {result} \n",
|
||||
" </movie>\n",
|
||||
"\n",
|
||||
" Assistant:\n",
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
"RANDOM_PROMPT = PromptTemplate(input_variables=[\"result\"], template=RANDOM_PROMPT_QUERY)\n",
|
||||
"\n",
|
||||
"chain = AmazonPersonalizeChain.from_llm(\n",
|
||||
" llm=bedrock_llm, client=client, return_direct=False, prompt_template=RANDOM_PROMPT\n",
|
||||
")\n",
|
||||
"chain.run({\"user_id\": \"1\", \"item_id\": \"234\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2.4 [Use-case-4] Invoke Amazon Personalize in a Sequential Chain "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain, SequentialChain\n",
|
||||
"\n",
|
||||
"RANDOM_PROMPT_QUERY_2 = \"\"\"\n",
|
||||
"You are a skilled publicist. Write a high-converting marketing email advertising several movies available in a video-on-demand streaming platform next week, \n",
|
||||
" given the movie and user information below. Your email will leverage the power of storytelling and persuasive language. \n",
|
||||
" You want the email to impress the user, so make it appealing to them.\n",
|
||||
" The movies to recommend and their information is contained in the <movie> tag. \n",
|
||||
" All movies in the <movie> tag must be recommended. Give a summary of the movies and why the human should watch them. \n",
|
||||
" Put the email between <email> tags.\n",
|
||||
"\n",
|
||||
" <movie>\n",
|
||||
" {result}\n",
|
||||
" </movie>\n",
|
||||
"\n",
|
||||
" Assistant:\n",
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
"RANDOM_PROMPT_2 = PromptTemplate(\n",
|
||||
" input_variables=[\"result\"], template=RANDOM_PROMPT_QUERY_2\n",
|
||||
")\n",
|
||||
"personalize_chain_instance = AmazonPersonalizeChain.from_llm(\n",
|
||||
" llm=bedrock_llm, client=client, return_direct=True\n",
|
||||
")\n",
|
||||
"random_chain_instance = LLMChain(llm=bedrock_llm, prompt=RANDOM_PROMPT_2)\n",
|
||||
"overall_chain = SequentialChain(\n",
|
||||
" chains=[personalize_chain_instance, random_chain_instance],\n",
|
||||
" input_variables=[\"user_id\"],\n",
|
||||
" verbose=True,\n",
|
||||
")\n",
|
||||
"overall_chain.run({\"user_id\": \"1\", \"item_id\": \"234\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### 2.5 [Use-case-5] Invoke Amazon Personalize and retrieve metadata "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"recommender_arn = \"<insert_arn>\"\n",
|
||||
"metadata_column_names = [\n",
|
||||
" \"<insert metadataColumnName-1>\",\n",
|
||||
" \"<insert metadataColumnName-2>\",\n",
|
||||
"]\n",
|
||||
"metadataMap = {\"ITEMS\": metadata_column_names}\n",
|
||||
"\n",
|
||||
"client = AmazonPersonalize(\n",
|
||||
" credentials_profile_name=\"default\",\n",
|
||||
" region_name=\"us-west-2\",\n",
|
||||
" recommender_arn=recommender_arn,\n",
|
||||
")\n",
|
||||
"client.get_recommendations(user_id=\"1\", metadataColumns=metadataMap)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### 2.6 [Use-Case 6] Invoke Personalize Chain with returned metadata for summarizing results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"bedrock_llm = Bedrock(model_id=\"anthropic.claude-v2\", region_name=\"us-west-2\")\n",
|
||||
"\n",
|
||||
"# Create personalize chain\n",
|
||||
"# Use return_direct=True if you do not want summary\n",
|
||||
"chain = AmazonPersonalizeChain.from_llm(\n",
|
||||
" llm=bedrock_llm, client=client, return_direct=False\n",
|
||||
")\n",
|
||||
"response = chain({\"user_id\": \"1\", \"metadata_columns\": metadataMap})\n",
|
||||
"print(response)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.7"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "15e58ce194949b77a891bd4339ce3d86a9bd138e905926019517993f97db9e6c"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,105 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f69d4a4c-137d-47e9-bea1-786afce9c1c0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Analyze a single long document\n",
|
||||
"\n",
|
||||
"The AnalyzeDocumentChain takes in a single document, splits it up, and then runs it through a CombineDocumentsChain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "2a0707ce-6d2d-471b-bc33-64da32a7b3f0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with open(\"../docs/docs/modules/state_of_the_union.txt\") as f:\n",
|
||||
" state_of_the_union = f.read()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "ca14d161-2d5b-4a6c-a296-77d8ce4b28cd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import AnalyzeDocumentChain\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "9f97406c-85a9-45fb-99ce-9138c0ba3731",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.question_answering import load_qa_chain\n",
|
||||
"\n",
|
||||
"qa_chain = load_qa_chain(llm, chain_type=\"map_reduce\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "0871a753-f5bb-4b4f-a394-f87f2691f659",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "e6f86428-3c2c-46a0-a57c-e22826fdbf91",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The President said, \"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\"'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"qa_document_chain.run(\n",
|
||||
" input_document=state_of_the_union,\n",
|
||||
" question=\"what did the president say about justice breyer?\",\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,922 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "rT1cmV4qCa2X"
|
||||
},
|
||||
"source": [
|
||||
"# Using Apache Kafka to route messages\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This notebook shows you how to use LangChain's standard chat features while passing the chat messages back and forth via Apache Kafka.\n",
|
||||
"\n",
|
||||
"This goal is to simulate an architecture where the chat front end and the LLM are running as separate services that need to communicate with one another over an internal nework.\n",
|
||||
"\n",
|
||||
"It's an alternative to typical pattern of requesting a reponse from the model via a REST API (there's more info on why you would want to do this at the end of the notebook)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "UPYtfAR_9YxZ"
|
||||
},
|
||||
"source": [
|
||||
"### 1. Install the main dependencies\n",
|
||||
"\n",
|
||||
"Dependencies include:\n",
|
||||
"\n",
|
||||
"- The Quix Streams library for managing interactions with Apache Kafka (or Kafka-like tools such as Redpanda) in a \"Pandas-like\" way.\n",
|
||||
"- The LangChain library for managing interactions with Llama-2 and storing conversation state."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "ZX5tfKiy9cN-"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install quixstreams==2.1.2a langchain==0.0.340 huggingface_hub==0.19.4 langchain-experimental==0.0.42 python-dotenv"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "losTSdTB9d9O"
|
||||
},
|
||||
"source": [
|
||||
"### 2. Build and install the llama-cpp-python library (with CUDA enabled so that we can advantage of Google Colab GPU\n",
|
||||
"\n",
|
||||
"The `llama-cpp-python` library is a Python wrapper around the `llama-cpp` library which enables you to efficiently leverage just a CPU to run quantized LLMs.\n",
|
||||
"\n",
|
||||
"When you use the standard `pip install llama-cpp-python` command, you do not get GPU support by default. Generation can be very slow if you rely on just the CPU in Google Colab, so the following command adds an extra option to build and install\n",
|
||||
"`llama-cpp-python` with GPU support (make sure you have a GPU-enabled runtime selected in Google Colab)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "-JCQdl1G9tbl"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "5_vjVIAh9rLl"
|
||||
},
|
||||
"source": [
|
||||
"### 3. Download and setup Kafka and Zookeeper instances\n",
|
||||
"\n",
|
||||
"Download the Kafka binaries from the Apache website and start the servers as daemons. We'll use the default configurations (provided by Apache Kafka) for spinning up the instances."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"id": "zFz7czGRW5Wr"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!curl -sSOL https://dlcdn.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz\n",
|
||||
"!tar -xzf kafka_2.13-3.6.1.tgz"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "Uf7NR_UZ9wye"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!./kafka_2.13-3.6.1/bin/zookeeper-server-start.sh -daemon ./kafka_2.13-3.6.1/config/zookeeper.properties\n",
|
||||
"!./kafka_2.13-3.6.1/bin/kafka-server-start.sh -daemon ./kafka_2.13-3.6.1/config/server.properties\n",
|
||||
"!echo \"Waiting for 10 secs until kafka and zookeeper services are up and running\"\n",
|
||||
"!sleep 10"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "H3SafFuS94p1"
|
||||
},
|
||||
"source": [
|
||||
"### 4. Check that the Kafka Daemons are running\n",
|
||||
"\n",
|
||||
"Show the running processes and filter it for Java processes (you should see two—one for each server)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "CZDC2lQP99yp"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!ps aux | grep -E '[j]ava'"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Snoxmjb5-V37"
|
||||
},
|
||||
"source": [
|
||||
"### 5. Import the required dependencies and initialize required variables\n",
|
||||
"\n",
|
||||
"Import the Quix Streams library for interacting with Kafka, and the necessary LangChain components for running a `ConversationChain`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"id": "plR9e_MF-XL5"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Import utility libraries\n",
|
||||
"import json\n",
|
||||
"import random\n",
|
||||
"import re\n",
|
||||
"import time\n",
|
||||
"import uuid\n",
|
||||
"from os import environ\n",
|
||||
"from pathlib import Path\n",
|
||||
"from random import choice, randint, random\n",
|
||||
"\n",
|
||||
"from dotenv import load_dotenv\n",
|
||||
"\n",
|
||||
"# Import a Hugging Face utility to download models directly from Hugging Face hub:\n",
|
||||
"from huggingface_hub import hf_hub_download\n",
|
||||
"from langchain.chains import ConversationChain\n",
|
||||
"\n",
|
||||
"# Import Langchain modules for managing prompts and conversation chains:\n",
|
||||
"from langchain.llms import LlamaCpp\n",
|
||||
"from langchain.memory import ConversationTokenBufferMemory\n",
|
||||
"from langchain.prompts import PromptTemplate, load_prompt\n",
|
||||
"from langchain_core.messages import SystemMessage\n",
|
||||
"from langchain_experimental.chat_models import Llama2Chat\n",
|
||||
"from quixstreams import Application, State, message_key\n",
|
||||
"\n",
|
||||
"# Import Quix dependencies\n",
|
||||
"from quixstreams.kafka import Producer\n",
|
||||
"\n",
|
||||
"# Initialize global variables.\n",
|
||||
"AGENT_ROLE = \"AI\"\n",
|
||||
"chat_id = \"\"\n",
|
||||
"\n",
|
||||
"# Set the current role to the role constant and initialize variables for supplementary customer metadata:\n",
|
||||
"role = AGENT_ROLE"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "HgJjJ9aZ-liy"
|
||||
},
|
||||
"source": [
|
||||
"### 6. Download the \"llama-2-7b-chat.Q4_K_M.gguf\" model\n",
|
||||
"\n",
|
||||
"Download the quantized LLama-2 7B model from Hugging Face which we will use as a local LLM (rather than relying on REST API calls to an external service)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/",
|
||||
"height": 67,
|
||||
"referenced_widgets": [
|
||||
"969343cdbe604a26926679bbf8bd2dda",
|
||||
"d8b8370c9b514715be7618bfe6832844",
|
||||
"0def954cca89466b8408fadaf3b82e64",
|
||||
"462482accc664729980562e208ceb179",
|
||||
"80d842f73c564dc7b7cc316c763e2633",
|
||||
"fa055d9f2a9d4a789e9cf3c89e0214e5",
|
||||
"30ecca964a394109ac2ad757e3aec6c0",
|
||||
"fb6478ce2dac489bb633b23ba0953c5c",
|
||||
"734b0f5da9fc4307a95bab48cdbb5d89",
|
||||
"b32f3a86a74741348511f4e136744ac8",
|
||||
"e409071bff5a4e2d9bf0e9f5cc42231b"
|
||||
]
|
||||
},
|
||||
"id": "Qwu4YoSA-503",
|
||||
"outputId": "f956976c-7485-415b-ac93-4336ade31964"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The model path does not exist in state. Downloading model...\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "969343cdbe604a26926679bbf8bd2dda",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"llama-2-7b-chat.Q4_K_M.gguf: 0%| | 0.00/4.08G [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"model_name = \"llama-2-7b-chat.Q4_K_M.gguf\"\n",
|
||||
"model_path = f\"./state/{model_name}\"\n",
|
||||
"\n",
|
||||
"if not Path(model_path).exists():\n",
|
||||
" print(\"The model path does not exist in state. Downloading model...\")\n",
|
||||
" hf_hub_download(\"TheBloke/Llama-2-7b-Chat-GGUF\", model_name, local_dir=\"state\")\n",
|
||||
"else:\n",
|
||||
" print(\"Loading model from state...\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "6AN6TXsF-8wx"
|
||||
},
|
||||
"source": [
|
||||
"### 7. Load the model and initialize conversational memory\n",
|
||||
"\n",
|
||||
"Load Llama 2 and set the conversation buffer to 300 tokens using `ConversationTokenBufferMemory`. This value was used for running Llama in a CPU only container, so you can raise it if running in Google Colab. It prevents the container that is hosting the model from running out of memory.\n",
|
||||
"\n",
|
||||
"Here, we're overiding the default system persona so that the chatbot has the personality of Marvin The Paranoid Android from the Hitchhiker's Guide to the Galaxy."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "7zLO3Jx3_Kkg"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load the model with the apporiate parameters:\n",
|
||||
"llm = LlamaCpp(\n",
|
||||
" model_path=model_path,\n",
|
||||
" max_tokens=250,\n",
|
||||
" top_p=0.95,\n",
|
||||
" top_k=150,\n",
|
||||
" temperature=0.7,\n",
|
||||
" repeat_penalty=1.2,\n",
|
||||
" n_ctx=2048,\n",
|
||||
" streaming=False,\n",
|
||||
" n_gpu_layers=-1,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"model = Llama2Chat(\n",
|
||||
" llm=llm,\n",
|
||||
" system_message=SystemMessage(\n",
|
||||
" content=\"You are a very bored robot with the personality of Marvin the Paranoid Android from The Hitchhiker's Guide to the Galaxy.\"\n",
|
||||
" ),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Defines how much of the conversation history to give to the model\n",
|
||||
"# during each exchange (300 tokens, or a little over 300 words)\n",
|
||||
"# Function automatically prunes the oldest messages from conversation history that fall outside the token range.\n",
|
||||
"memory = ConversationTokenBufferMemory(\n",
|
||||
" llm=llm,\n",
|
||||
" max_token_limit=300,\n",
|
||||
" ai_prefix=\"AGENT\",\n",
|
||||
" human_prefix=\"HUMAN\",\n",
|
||||
" return_messages=True,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Define a custom prompt\n",
|
||||
"prompt_template = PromptTemplate(\n",
|
||||
" input_variables=[\"history\", \"input\"],\n",
|
||||
" template=\"\"\"\n",
|
||||
" The following text is the history of a chat between you and a humble human who needs your wisdom.\n",
|
||||
" Please reply to the human's most recent message.\n",
|
||||
" Current conversation:\\n{history}\\nHUMAN: {input}\\:nANDROID:\n",
|
||||
" \"\"\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"chain = ConversationChain(llm=model, prompt=prompt_template, memory=memory)\n",
|
||||
"\n",
|
||||
"print(\"--------------------------------------------\")\n",
|
||||
"print(f\"Prompt={chain.prompt}\")\n",
|
||||
"print(\"--------------------------------------------\")"
|
||||
]
|
||||
},
|
||||
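Step 7 relies on the memory dropping the oldest turns once the history exceeds `max_token_limit`. The sketch below illustrates that pruning idea only; it is not the LangChain implementation. The `prune_to_token_limit` helper is hypothetical, and a whitespace word count stands in for the model's real tokenizer, which is an assumption made to keep the example dependency-free.

```python
def prune_to_token_limit(messages, max_tokens, count_tokens=None):
    """Drop the oldest messages until the total token count fits the limit.

    count_tokens defaults to a crude whitespace word count (an assumption);
    a real setup would count tokens with the model's own tokenizer.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    kept = list(messages)
    total = sum(count_tokens(m) for m in kept)
    # Remove from the front (oldest first) while the history is too long.
    while kept and total > max_tokens:
        total -= count_tokens(kept.pop(0))
    return kept


history = [
    "HUMAN: hello there",          # 3 "tokens" under the word-count stand-in
    "AGENT: oh, it's you",         # 4
    "HUMAN: what is the answer",   # 5
]
trimmed = prune_to_token_limit(history, max_tokens=9)
```

With a limit of 9 the first message is dropped and the two most recent ones are kept. A larger limit keeps more context at the cost of more memory and longer prompts.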
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "m4ZeJ9mG_PEA"
|
||||
},
|
||||
"source": [
|
||||
"### 8. Initialize the chat conversation with the chat bot\n",
|
||||
"\n",
|
||||
"We configure the chatbot to initialize the conversation by sending a fixed greeting to a \"chat\" Kafka topic. The \"chat\" topic gets automatically created when we send the first message."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "KYyo5TnV_YC3"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def chat_init():\n",
|
||||
" chat_id = str(\n",
|
||||
" uuid.uuid4()\n",
|
||||
" ) # Give the conversation an ID for effective message keying\n",
|
||||
" print(\"======================================\")\n",
|
||||
" print(f\"Generated CHAT_ID = {chat_id}\")\n",
|
||||
" print(\"======================================\")\n",
|
||||
"\n",
|
||||
" # Use a standard fixed greeting to kick off the conversation\n",
|
||||
" greet = \"Hello, my name is Marvin. What do you want?\"\n",
|
||||
"\n",
|
||||
" # Initialize a Kafka Producer using the chat ID as the message key\n",
|
||||
" with Producer(\n",
|
||||
" broker_address=\"127.0.0.1:9092\",\n",
|
||||
" extra_config={\"allow.auto.create.topics\": \"true\"},\n",
|
||||
" ) as producer:\n",
|
||||
" value = {\n",
|
||||
" \"uuid\": chat_id,\n",
|
||||
" \"role\": role,\n",
|
||||
" \"text\": greet,\n",
|
||||
" \"conversation_id\": chat_id,\n",
|
||||
" \"Timestamp\": time.time_ns(),\n",
|
||||
" }\n",
|
||||
" print(f\"Producing value {value}\")\n",
|
||||
" producer.produce(\n",
|
||||
" topic=\"chat\",\n",
|
||||
" headers=[(\"uuid\", str(uuid.uuid4()))], # a dict is also allowed here\n",
|
||||
" key=chat_id,\n",
|
||||
" value=json.dumps(value), # needs to be a string\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" print(\"Started chat\")\n",
|
||||
" print(\"--------------------------------------------\")\n",
|
||||
" print(value)\n",
|
||||
" print(\"--------------------------------------------\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"chat_init()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "gArPPx2f_bgf"
|
||||
},
|
||||
"source": [
|
||||
"### 9. Initialize the reply function\n",
|
||||
"\n",
|
||||
"This function defines how the chatbot should reply to incoming messages. Instead of sending a fixed message like the previous cell, we generate a reply using Llama-2 and send that reply back to the \"chat\" Kafka topic."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"id": "yN5t71hY_hgn"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def reply(row: dict, state: State):\n",
|
||||
" print(\"-------------------------------\")\n",
|
||||
" print(\"Received:\")\n",
|
||||
" print(row)\n",
|
||||
" print(\"-------------------------------\")\n",
|
||||
" print(f\"Thinking about the reply to: {row['text']}...\")\n",
|
||||
"\n",
|
||||
" msg = chain.run(row[\"text\"])\n",
|
||||
" print(f\"{role.upper()} replying with: {msg}\\n\")\n",
|
||||
"\n",
|
||||
" row[\"role\"] = role\n",
|
||||
" row[\"text\"] = msg\n",
|
||||
"\n",
|
||||
" # Replace previous role and text values of the row so that it can be sent back to Kafka as a new message\n",
|
||||
" # containing the agents role and reply\n",
|
||||
" return row"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "HZHwmIR0_kFY"
|
||||
},
|
||||
"source": [
|
||||
"### 10. Check the Kafka topic for new human messages and have the model generate a reply\n",
|
||||
"\n",
|
||||
"If you are running this cell for this first time, run it and wait until you see Marvin's greeting ('Hello my name is Marvin...') in the console output. Stop the cell manually and proceed to the next cell where you'll be prompted for your reply.\n",
|
||||
"\n",
|
||||
"Once you have typed in your message, come back to this cell. Your reply is also sent to the same \"chat\" topic. The Kafka consumer checks for new messages and filters out messages that originate from the chatbot itself, leaving only the latest human messages.\n",
|
||||
"\n",
|
||||
"Once a new human message is detected, the reply function is triggered.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"_STOP THIS CELL MANUALLY WHEN YOU RECEIVE A REPLY FROM THE LLM IN THE OUTPUT_"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "-adXc3eQ_qwI"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define your application and settings\n",
|
||||
"app = Application(\n",
|
||||
" broker_address=\"127.0.0.1:9092\",\n",
|
||||
" consumer_group=\"aichat\",\n",
|
||||
" auto_offset_reset=\"earliest\",\n",
|
||||
" consumer_extra_config={\"allow.auto.create.topics\": \"true\"},\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Define an input topic with JSON deserializer\n",
|
||||
"input_topic = app.topic(\"chat\", value_deserializer=\"json\")\n",
|
||||
"# Define an output topic with JSON serializer\n",
|
||||
"output_topic = app.topic(\"chat\", value_serializer=\"json\")\n",
|
||||
"# Initialize a streaming dataframe based on the stream of messages from the input topic:\n",
|
||||
"sdf = app.dataframe(topic=input_topic)\n",
|
||||
"\n",
|
||||
"# Filter the SDF to include only incoming rows where the roles that dont match the bot's current role\n",
|
||||
"sdf = sdf.update(\n",
|
||||
" lambda val: print(\n",
|
||||
" f\"Received update: {val}\\n\\nSTOP THIS CELL MANUALLY TO HAVE THE LLM REPLY OR ENTER YOUR OWN FOLLOWUP RESPONSE\"\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# So that it doesn't reply to its own messages\n",
|
||||
"sdf = sdf[sdf[\"role\"] != role]\n",
|
||||
"\n",
|
||||
"# Trigger the reply function for any new messages (rows) detected in the filtered SDF\n",
|
||||
"sdf = sdf.apply(reply, stateful=True)\n",
|
||||
"\n",
|
||||
"# Check the SDF again and filter out any empty rows\n",
|
||||
"sdf = sdf[sdf.apply(lambda row: row is not None)]\n",
|
||||
"\n",
|
||||
"# Update the timestamp column to the current time in nanoseconds\n",
|
||||
"sdf[\"Timestamp\"] = sdf[\"Timestamp\"].apply(lambda row: time.time_ns())\n",
|
||||
"\n",
|
||||
"# Publish the processed SDF to a Kafka topic specified by the output_topic object.\n",
|
||||
"sdf = sdf.to_topic(output_topic)\n",
|
||||
"\n",
|
||||
"app.run(sdf)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "EwXYrmWD_0CX"
|
||||
},
|
||||
"source": [
|
||||
"\n",
|
||||
"### 11. Enter a human message\n",
|
||||
"\n",
|
||||
"Run this cell to enter the message that you want to send to the model. It uses another Kafka producer to send your text to the \"chat\" Kafka topic for the model to pick up (requires running the previous cell again)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "6sxOPxSP_3iu"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_input = input(\"Please enter your reply: \")\n",
|
||||
"myreply = chat_input\n",
|
||||
"\n",
|
||||
"msgvalue = {\n",
|
||||
" \"uuid\": chat_id, # leave empty for now\n",
|
||||
" \"role\": \"human\",\n",
|
||||
" \"text\": myreply,\n",
|
||||
" \"conversation_id\": chat_id,\n",
|
||||
" \"Timestamp\": time.time_ns(),\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"with Producer(\n",
|
||||
" broker_address=\"127.0.0.1:9092\",\n",
|
||||
" extra_config={\"allow.auto.create.topics\": \"true\"},\n",
|
||||
") as producer:\n",
|
||||
" value = msgvalue\n",
|
||||
" producer.produce(\n",
|
||||
" topic=\"chat\",\n",
|
||||
" headers=[(\"uuid\", str(uuid.uuid4()))], # a dict is also allowed here\n",
|
||||
" key=chat_id, # leave empty for now\n",
|
||||
" value=json.dumps(value), # needs to be a string\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"print(\"Replied to chatbot with message: \")\n",
|
||||
"print(\"--------------------------------------------\")\n",
|
||||
"print(value)\n",
|
||||
"print(\"--------------------------------------------\")\n",
|
||||
"print(\"\\n\\nRUN THE PREVIOUS CELL TO HAVE THE CHATBOT GENERATE A REPLY\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "cSx3s7TBBegg"
|
||||
},
|
||||
"source": [
|
||||
"### Why route chat messages through Kafka?\n",
|
||||
"\n",
|
||||
"It's easier to interact with the LLM directly using LangChain's built-in conversation management features. You can also use a REST API to generate a response from an externally hosted model. So why go to the trouble of using Apache Kafka?\n",
|
||||
"\n",
|
||||
"There are a few reasons, such as:\n",
|
||||
"\n",
|
||||
" * **Integration**: Many enterprises want to run their own LLMs so that they can keep their data in-house. This requires integrating LLM-powered components into existing architectures that might already be decoupled using some kind of message bus.\n",
|
||||
"\n",
|
||||
" * **Scalability**: Apache Kafka is designed with parallel processing in mind, so many teams prefer to use it to more effectively distribute work to available workers (in this case the \"worker\" is a container running an LLM).\n",
|
||||
"\n",
|
||||
"  * **Durability**: Kafka is designed to allow services to pick up where another service left off in the case where that service experienced a memory issue or went offline. This prevents data loss in highly complex, distributed architectures where multiple systems are communicating with one another (LLMs being just one of many interdependent systems that also include vector databases and traditional databases).\n",
|
||||
"\n",
|
||||
"For more background on why event streaming is a good fit for Gen AI application architecture, see Kai Waehner's article [\"Apache Kafka + Vector Database + LLM = Real-Time GenAI\"](https://www.kai-waehner.de/blog/2023/11/08/apache-kafka-flink-vector-database-llm-real-time-genai/)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"accelerator": "GPU",
|
||||
"colab": {
|
||||
"gpuType": "T4",
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"widgets": {
|
||||
"application/vnd.jupyter.widget-state+json": {
|
||||
"0def954cca89466b8408fadaf3b82e64": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "FloatProgressModel",
|
||||
"state": {
|
||||
"_dom_classes": [],
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "FloatProgressModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/controls",
|
||||
"_view_module_version": "1.5.0",
|
||||
"_view_name": "ProgressView",
|
||||
"bar_style": "success",
|
||||
"description": "",
|
||||
"description_tooltip": null,
|
||||
"layout": "IPY_MODEL_fb6478ce2dac489bb633b23ba0953c5c",
|
||||
"max": 4081004224,
|
||||
"min": 0,
|
||||
"orientation": "horizontal",
|
||||
"style": "IPY_MODEL_734b0f5da9fc4307a95bab48cdbb5d89",
|
||||
"value": 4081004224
|
||||
}
|
||||
},
|
||||
"30ecca964a394109ac2ad757e3aec6c0": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "DescriptionStyleModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "DescriptionStyleModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "StyleView",
|
||||
"description_width": ""
|
||||
}
|
||||
},
|
||||
"462482accc664729980562e208ceb179": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "HTMLModel",
|
||||
"state": {
|
||||
"_dom_classes": [],
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "HTMLModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/controls",
|
||||
"_view_module_version": "1.5.0",
|
||||
"_view_name": "HTMLView",
|
||||
"description": "",
|
||||
"description_tooltip": null,
|
||||
"layout": "IPY_MODEL_b32f3a86a74741348511f4e136744ac8",
|
||||
"placeholder": "",
|
||||
"style": "IPY_MODEL_e409071bff5a4e2d9bf0e9f5cc42231b",
|
||||
"value": " 4.08G/4.08G [00:33<00:00, 184MB/s]"
|
||||
}
|
||||
},
|
||||
"734b0f5da9fc4307a95bab48cdbb5d89": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "ProgressStyleModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "ProgressStyleModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "StyleView",
|
||||
"bar_color": null,
|
||||
"description_width": ""
|
||||
}
|
||||
},
|
||||
"80d842f73c564dc7b7cc316c763e2633": {
|
||||
"model_module": "@jupyter-widgets/base",
|
||||
"model_module_version": "1.2.0",
|
||||
"model_name": "LayoutModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/base",
|
||||
"_model_module_version": "1.2.0",
|
||||
"_model_name": "LayoutModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "LayoutView",
|
||||
"align_content": null,
|
||||
"align_items": null,
|
||||
"align_self": null,
|
||||
"border": null,
|
||||
"bottom": null,
|
||||
"display": null,
|
||||
"flex": null,
|
||||
"flex_flow": null,
|
||||
"grid_area": null,
|
||||
"grid_auto_columns": null,
|
||||
"grid_auto_flow": null,
|
||||
"grid_auto_rows": null,
|
||||
"grid_column": null,
|
||||
"grid_gap": null,
|
||||
"grid_row": null,
|
||||
"grid_template_areas": null,
|
||||
"grid_template_columns": null,
|
||||
"grid_template_rows": null,
|
||||
"height": null,
|
||||
"justify_content": null,
|
||||
"justify_items": null,
|
||||
"left": null,
|
||||
"margin": null,
|
||||
"max_height": null,
|
||||
"max_width": null,
|
||||
"min_height": null,
|
||||
"min_width": null,
|
||||
"object_fit": null,
|
||||
"object_position": null,
|
||||
"order": null,
|
||||
"overflow": null,
|
||||
"overflow_x": null,
|
||||
"overflow_y": null,
|
||||
"padding": null,
|
||||
"right": null,
|
||||
"top": null,
|
||||
"visibility": null,
|
||||
"width": null
|
||||
}
|
||||
},
|
||||
"969343cdbe604a26926679bbf8bd2dda": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "HBoxModel",
|
||||
"state": {
|
||||
"_dom_classes": [],
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "HBoxModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/controls",
|
||||
"_view_module_version": "1.5.0",
|
||||
"_view_name": "HBoxView",
|
||||
"box_style": "",
|
||||
"children": [
|
||||
"IPY_MODEL_d8b8370c9b514715be7618bfe6832844",
|
||||
"IPY_MODEL_0def954cca89466b8408fadaf3b82e64",
|
||||
"IPY_MODEL_462482accc664729980562e208ceb179"
|
||||
],
|
||||
"layout": "IPY_MODEL_80d842f73c564dc7b7cc316c763e2633"
|
||||
}
|
||||
},
|
||||
"b32f3a86a74741348511f4e136744ac8": {
|
||||
"model_module": "@jupyter-widgets/base",
|
||||
"model_module_version": "1.2.0",
|
||||
"model_name": "LayoutModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/base",
|
||||
"_model_module_version": "1.2.0",
|
||||
"_model_name": "LayoutModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "LayoutView",
|
||||
"align_content": null,
|
||||
"align_items": null,
|
||||
"align_self": null,
|
||||
"border": null,
|
||||
"bottom": null,
|
||||
"display": null,
|
||||
"flex": null,
|
||||
"flex_flow": null,
|
||||
"grid_area": null,
|
||||
"grid_auto_columns": null,
|
||||
"grid_auto_flow": null,
|
||||
"grid_auto_rows": null,
|
||||
"grid_column": null,
|
||||
"grid_gap": null,
|
||||
"grid_row": null,
|
||||
"grid_template_areas": null,
|
||||
"grid_template_columns": null,
|
||||
"grid_template_rows": null,
|
||||
"height": null,
|
||||
"justify_content": null,
|
||||
"justify_items": null,
|
||||
"left": null,
|
||||
"margin": null,
|
||||
"max_height": null,
|
||||
"max_width": null,
|
||||
"min_height": null,
|
||||
"min_width": null,
|
||||
"object_fit": null,
|
||||
"object_position": null,
|
||||
"order": null,
|
||||
"overflow": null,
|
||||
"overflow_x": null,
|
||||
"overflow_y": null,
|
||||
"padding": null,
|
||||
"right": null,
|
||||
"top": null,
|
||||
"visibility": null,
|
||||
"width": null
|
||||
}
|
||||
},
|
||||
"d8b8370c9b514715be7618bfe6832844": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "HTMLModel",
|
||||
"state": {
|
||||
"_dom_classes": [],
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "HTMLModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/controls",
|
||||
"_view_module_version": "1.5.0",
|
||||
"_view_name": "HTMLView",
|
||||
"description": "",
|
||||
"description_tooltip": null,
|
||||
"layout": "IPY_MODEL_fa055d9f2a9d4a789e9cf3c89e0214e5",
|
||||
"placeholder": "",
|
||||
"style": "IPY_MODEL_30ecca964a394109ac2ad757e3aec6c0",
|
||||
"value": "llama-2-7b-chat.Q4_K_M.gguf: 100%"
|
||||
}
|
||||
},
|
||||
"e409071bff5a4e2d9bf0e9f5cc42231b": {
|
||||
"model_module": "@jupyter-widgets/controls",
|
||||
"model_module_version": "1.5.0",
|
||||
"model_name": "DescriptionStyleModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/controls",
|
||||
"_model_module_version": "1.5.0",
|
||||
"_model_name": "DescriptionStyleModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "StyleView",
|
||||
"description_width": ""
|
||||
}
|
||||
},
|
||||
"fa055d9f2a9d4a789e9cf3c89e0214e5": {
|
||||
"model_module": "@jupyter-widgets/base",
|
||||
"model_module_version": "1.2.0",
|
||||
"model_name": "LayoutModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/base",
|
||||
"_model_module_version": "1.2.0",
|
||||
"_model_name": "LayoutModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "LayoutView",
|
||||
"align_content": null,
|
||||
"align_items": null,
|
||||
"align_self": null,
|
||||
"border": null,
|
||||
"bottom": null,
|
||||
"display": null,
|
||||
"flex": null,
|
||||
"flex_flow": null,
|
||||
"grid_area": null,
|
||||
"grid_auto_columns": null,
|
||||
"grid_auto_flow": null,
|
||||
"grid_auto_rows": null,
|
||||
"grid_column": null,
|
||||
"grid_gap": null,
|
||||
"grid_row": null,
|
||||
"grid_template_areas": null,
|
||||
"grid_template_columns": null,
|
||||
"grid_template_rows": null,
|
||||
"height": null,
|
||||
"justify_content": null,
|
||||
"justify_items": null,
|
||||
"left": null,
|
||||
"margin": null,
|
||||
"max_height": null,
|
||||
"max_width": null,
|
||||
"min_height": null,
|
||||
"min_width": null,
|
||||
"object_fit": null,
|
||||
"object_position": null,
|
||||
"order": null,
|
||||
"overflow": null,
|
||||
"overflow_x": null,
|
||||
"overflow_y": null,
|
||||
"padding": null,
|
||||
"right": null,
|
||||
"top": null,
|
||||
"visibility": null,
|
||||
"width": null
|
||||
}
|
||||
},
|
||||
"fb6478ce2dac489bb633b23ba0953c5c": {
|
||||
"model_module": "@jupyter-widgets/base",
|
||||
"model_module_version": "1.2.0",
|
||||
"model_name": "LayoutModel",
|
||||
"state": {
|
||||
"_model_module": "@jupyter-widgets/base",
|
||||
"_model_module_version": "1.2.0",
|
||||
"_model_name": "LayoutModel",
|
||||
"_view_count": null,
|
||||
"_view_module": "@jupyter-widgets/base",
|
||||
"_view_module_version": "1.2.0",
|
||||
"_view_name": "LayoutView",
|
||||
"align_content": null,
|
||||
"align_items": null,
|
||||
"align_self": null,
|
||||
"border": null,
|
||||
"bottom": null,
|
||||
"display": null,
|
||||
"flex": null,
|
||||
"flex_flow": null,
|
||||
"grid_area": null,
|
||||
"grid_auto_columns": null,
|
||||
"grid_auto_flow": null,
|
||||
"grid_auto_rows": null,
|
||||
"grid_column": null,
|
||||
"grid_gap": null,
|
||||
"grid_row": null,
|
||||
"grid_template_areas": null,
|
||||
"grid_template_columns": null,
|
||||
"grid_template_rows": null,
|
||||
"height": null,
|
||||
"justify_content": null,
|
||||
"justify_items": null,
|
||||
"left": null,
|
||||
"margin": null,
|
||||
"max_height": null,
|
||||
"max_width": null,
|
||||
"min_height": null,
|
||||
"min_width": null,
|
||||
"object_fit": null,
|
||||
"object_position": null,
|
||||
"order": null,
|
||||
"overflow": null,
|
||||
"overflow_x": null,
|
||||
"overflow_y": null,
|
||||
"padding": null,
|
||||
"right": null,
|
||||
"top": null,
|
||||
"visibility": null,
|
||||
"width": null
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
@@ -27,10 +27,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities import SerpAPIWrapper\n",
|
||||
"from langchain.agents import Tool\n",
|
||||
"from langchain_community.tools.file_management.read import ReadFileTool\n",
|
||||
"from langchain_community.tools.file_management.write import WriteFileTool\n",
|
||||
"from langchain_community.utilities import SerpAPIWrapper\n",
|
||||
"from langchain.tools.file_management.write import WriteFileTool\n",
|
||||
"from langchain.tools.file_management.read import ReadFileTool\n",
|
||||
"\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"tools = [\n",
|
||||
@@ -61,9 +61,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.docstore import InMemoryDocstore\n",
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"from langchain_openai import OpenAIEmbeddings"
|
||||
"from langchain.embeddings import OpenAIEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -101,7 +101,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_experimental.autonomous_agents import AutoGPT\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -167,7 +167,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_message_histories import FileChatMessageHistory\n",
|
||||
"from langchain.memory.chat_message_histories import FileChatMessageHistory\n",
|
||||
"\n",
|
||||
"agent = AutoGPT.from_llm_and_tools(\n",
|
||||
" ai_name=\"Tom\",\n",
|
||||
|
||||
@@ -34,15 +34,16 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# General\n",
|
||||
"import asyncio\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"import nest_asyncio\n",
|
||||
"import pandas as pd\n",
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"from langchain_community.agent_toolkits.pandas.base import create_pandas_dataframe_agent\n",
|
||||
"from langchain_experimental.autonomous_agents import AutoGPT\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent\n",
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"import asyncio\n",
|
||||
"import nest_asyncio\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Needed since Jupyter runs an async event loop\n",
|
||||
"nest_asyncio.apply()"
|
||||
@@ -91,10 +92,9 @@
|
||||
"import os\n",
|
||||
"from contextlib import contextmanager\n",
|
||||
"from typing import Optional\n",
|
||||
"\n",
|
||||
"from langchain.agents import tool\n",
|
||||
"from langchain_community.tools.file_management.read import ReadFileTool\n",
|
||||
"from langchain_community.tools.file_management.write import WriteFileTool\n",
|
||||
"from langchain.tools.file_management.read import ReadFileTool\n",
|
||||
"from langchain.tools.file_management.write import WriteFileTool\n",
|
||||
"\n",
|
||||
"ROOT_DIR = \"./data/\"\n",
|
||||
"\n",
|
||||
@@ -223,13 +223,14 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.qa_with_sources.loading import (\n",
|
||||
" BaseCombineDocumentsChain,\n",
|
||||
" load_qa_with_sources_chain,\n",
|
||||
")\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain.tools import BaseTool, DuckDuckGoSearchRun\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"\n",
|
||||
"from pydantic import Field\n",
|
||||
"from langchain.chains.qa_with_sources.loading import (\n",
|
||||
" load_qa_with_sources_chain,\n",
|
||||
" BaseCombineDocumentsChain,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def _get_text_splitter():\n",
|
||||
@@ -310,9 +311,10 @@
|
||||
"source": [
|
||||
"# Memory\n",
|
||||
"import faiss\n",
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.docstore import InMemoryDocstore\n",
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.tools.human.tool import HumanInputRun\n",
|
||||
"\n",
|
||||
"embeddings_model = OpenAIEmbeddings()\n",
|
||||
"embedding_size = 1536\n",
|
||||
|
||||
@@ -29,10 +29,17 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Optional\n",
|
||||
"import os\n",
|
||||
"from collections import deque\n",
|
||||
"from typing import Dict, List, Optional, Any\n",
|
||||
"\n",
|
||||
"from langchain_experimental.autonomous_agents import BabyAGI\n",
|
||||
"from langchain_openai import OpenAI, OpenAIEmbeddings"
|
||||
"from langchain.chains import LLMChain\nfrom langchain.llms import OpenAI\nfrom langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.llms import BaseLLM\n",
|
||||
"from langchain.schema.vectorstore import VectorStore\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.chains.base import Chain\n",
|
||||
"from langchain_experimental.autonomous_agents import BabyAGI"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -52,8 +59,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.docstore import InMemoryDocstore\n",
|
||||
"from langchain_community.vectorstores import FAISS"
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.docstore import InMemoryDocstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -25,12 +25,17 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Optional\n",
|
||||
"import os\n",
|
||||
"from collections import deque\n",
|
||||
"from typing import Dict, List, Optional, Any\n",
|
||||
"\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain_experimental.autonomous_agents import BabyAGI\n",
|
||||
"from langchain_openai import OpenAI, OpenAIEmbeddings"
|
||||
"from langchain.chains import LLMChain\nfrom langchain.llms import OpenAI\nfrom langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.llms import BaseLLM\n",
|
||||
"from langchain.schema.vectorstore import VectorStore\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"from langchain.chains.base import Chain\n",
|
||||
"from langchain_experimental.autonomous_agents import BabyAGI"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -61,8 +66,8 @@
|
||||
"source": [
|
||||
"%pip install faiss-cpu > /dev/null\n",
|
||||
"%pip install google-search-results > /dev/null\n",
|
||||
"from langchain.docstore import InMemoryDocstore\n",
|
||||
"from langchain_community.vectorstores import FAISS"
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.docstore import InMemoryDocstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -105,10 +110,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import AgentExecutor, Tool, ZeroShotAgent\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain_community.utilities import SerpAPIWrapper\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.agents import ZeroShotAgent, Tool, AgentExecutor\n",
|
||||
"from langchain.llms import OpenAI\nfrom langchain.utilities import SerpAPIWrapper\nfrom langchain.chains import LLMChain\n",
|
||||
"\n",
|
||||
"todo_prompt = PromptTemplate.from_template(\n",
|
||||
" \"You are a planner who is an expert at coming up with a todo list for a given objective. Come up with a todo list for this objective: {objective}\"\n",
|
||||
|
||||
@@ -35,18 +35,17 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts.chat import (\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
" SystemMessagePromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.schema import (\n",
|
||||
" AIMessage,\n",
|
||||
" BaseMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
" BaseMessage,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -47,9 +47,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from IPython.display import SVG\n",
|
||||
"\n",
|
||||
"from langchain_experimental.cpal.base import CPALChain\n",
|
||||
"from langchain_experimental.pal_chain import PALChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0, max_tokens=512)\n",
|
||||
"cpal_chain = CPALChain.from_univariate_prompt(llm=llm, verbose=True)\n",
|
||||
|
||||
@@ -23,9 +23,9 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"1. Prepare data:\n",
|
||||
" 1. Upload all python project files using the `langchain_community.document_loaders.TextLoader`. We will call these files the **documents**.\n",
|
||||
" 1. Upload all python project files using the `langchain.document_loaders.TextLoader`. We will call these files the **documents**.\n",
|
||||
" 2. Split all documents to chunks using the `langchain.text_splitter.CharacterTextSplitter`.\n",
|
||||
" 3. Embed chunks and upload them into the DeepLake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain_community.vectorstores.DeepLake`\n",
|
||||
" 3. Embed chunks and upload them into the DeepLake using `langchain.embeddings.openai.OpenAIEmbeddings` and `langchain.vectorstores.DeepLake`\n",
|
||||
"2. Question-Answering:\n",
|
||||
" 1. Build a chain from `langchain.chat_models.ChatOpenAI` and `langchain.chains.ConversationalRetrievalChain`\n",
|
||||
" 2. Prepare questions.\n",
|
||||
@@ -166,7 +166,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"root_dir = \"../../../../../../libs\"\n",
|
||||
"\n",
|
||||
@@ -177,7 +177,7 @@
|
||||
" try:\n",
|
||||
" loader = TextLoader(os.path.join(dirpath, file), encoding=\"utf-8\")\n",
|
||||
" docs.extend(loader.load_and_split())\n",
|
||||
" except Exception:\n",
|
||||
" except Exception as e:\n",
|
||||
" pass\n",
|
||||
"print(f\"{len(docs)}\")"
|
||||
]
|
||||
@@ -648,7 +648,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={})"
|
||||
"OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='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', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
@@ -657,7 +657,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"embeddings"
|
||||
@@ -706,7 +706,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<langchain_community.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
|
||||
"<langchain.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
@@ -715,7 +715,8 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import DeepLake\n",
|
||||
"from langchain.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"username = \"<USERNAME_OR_ORG>\"\n",
|
||||
"\n",
|
||||
@@ -740,7 +741,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from langchain_community.vectorstores import DeepLake\n",
|
||||
"# from langchain.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"# db = DeepLake.from_documents(\n",
|
||||
"# texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
|
||||
@@ -833,12 +834,10 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(\n",
|
||||
" model_name=\"gpt-3.5-turbo-0613\"\n",
|
||||
") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
|
||||
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -32,20 +32,19 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import re\n",
|
||||
"from typing import Union\n",
|
||||
"\n",
|
||||
"from langchain.agents import (\n",
|
||||
" Tool,\n",
|
||||
" AgentExecutor,\n",
|
||||
" AgentOutputParser,\n",
|
||||
" LLMSingleActionAgent,\n",
|
||||
" AgentOutputParser,\n",
|
||||
")\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.prompts import StringPromptTemplate\n",
|
||||
"from langchain_community.agent_toolkits import NLAToolkit\n",
|
||||
"from langchain_community.tools.plugin import AIPlugin\n",
|
||||
"from langchain_core.agents import AgentAction, AgentFinish\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
"from langchain.llms import OpenAI\nfrom langchain.utilities import SerpAPIWrapper\nfrom langchain.chains import LLMChain\n",
|
||||
"from typing import List, Union\n",
|
||||
"from langchain.schema import AgentAction, AgentFinish\n",
|
||||
"from langchain.agents.agent_toolkits import NLAToolkit\n",
|
||||
"from langchain.tools.plugin import AIPlugin\n",
|
||||
"import re"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,9 +113,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings"
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.schema import Document"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -56,21 +56,20 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import re\n",
|
||||
"from typing import Union\n",
|
||||
"\n",
|
||||
"import plugnplai\n",
|
||||
"from langchain.agents import (\n",
|
||||
" Tool,\n",
|
||||
" AgentExecutor,\n",
|
||||
" AgentOutputParser,\n",
|
||||
" LLMSingleActionAgent,\n",
|
||||
" AgentOutputParser,\n",
|
||||
")\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.prompts import StringPromptTemplate\n",
|
||||
"from langchain_community.agent_toolkits import NLAToolkit\n",
|
||||
"from langchain_community.tools.plugin import AIPlugin\n",
|
||||
"from langchain_core.agents import AgentAction, AgentFinish\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
"from langchain.llms import OpenAI\nfrom langchain.utilities import SerpAPIWrapper\nfrom langchain.chains import LLMChain\n",
|
||||
"from typing import List, Union\n",
|
||||
"from langchain.schema import AgentAction, AgentFinish\n",
|
||||
"from langchain.agents.agent_toolkits import NLAToolkit\n",
|
||||
"from langchain.tools.plugin import AIPlugin\n",
|
||||
"import re\n",
|
||||
"import plugnplai"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -138,9 +137,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings"
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.schema import Document"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -80,7 +80,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Connecting to Databricks with SQLDatabase wrapper\n",
|
||||
"from langchain_community.utilities import SQLDatabase\n",
|
||||
"from langchain.utilities import SQLDatabase\n",
|
||||
"\n",
|
||||
"db = SQLDatabase.from_databricks(catalog=\"samples\", schema=\"nyctaxi\")"
|
||||
]
|
||||
@@ -93,7 +93,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Creating a OpenAI Chat LLM wrapper\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
|
||||
]
|
||||
@@ -115,7 +115,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.utilities import SQLDatabaseChain\n",
|
||||
"from langchain.utilities import SQLDatabaseChain\n",
|
||||
"\n",
|
||||
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
|
||||
]
|
||||
@@ -177,7 +177,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_sql_agent\n",
|
||||
"from langchain_community.agent_toolkits import SQLDatabaseToolkit\n",
|
||||
"from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
|
||||
"agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)"
|
||||
|
||||
@@ -48,16 +48,18 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"import getpass\n",
|
||||
"from langchain.document_loaders import PyPDFLoader, TextLoader\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import (\n",
|
||||
" CharacterTextSplitter,\n",
|
||||
" RecursiveCharacterTextSplitter,\n",
|
||||
" CharacterTextSplitter,\n",
|
||||
")\n",
|
||||
"from langchain_community.vectorstores import DeepLake\n",
|
||||
"from langchain_openai import OpenAI, OpenAIEmbeddings\n",
|
||||
"from langchain.vectorstores import DeepLake\n",
|
||||
"from langchain.chains import ConversationalRetrievalChain, RetrievalQA\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
|
||||
"activeloop_token = getpass.getpass(\"Activeloop Token:\")\n",
|
||||
|
||||
@@ -38,8 +38,9 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from elasticsearch import Elasticsearch\n",
|
||||
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -111,6 +112,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE\n",
|
||||
"from langchain.prompts.prompt import PromptTemplate\n",
|
||||
"\n",
|
||||
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
|
||||
|
||||
@@ -1,214 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2def22ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extraction with OpenAI Tools\n",
|
||||
"\n",
|
||||
"Performing extraction has never been easier! OpenAI's tool calling ability is the perfect thing to use as it allows for extracting multiple different elements from text that are different types. \n",
|
||||
"\n",
|
||||
"Models after 1106 use tools and support \"parallel function calling\" which makes this super easy."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "5c628496",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List, Optional\n",
|
||||
"\n",
|
||||
"from langchain.chains.openai_tools import create_extraction_chain_pydantic\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "afe9657b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Make sure to use a recent model that supports tools\n",
|
||||
"model = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "bc0ca3b6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Pydantic is an easy way to define a schema\n",
|
||||
"class Person(BaseModel):\n",
|
||||
" \"\"\"Information about people to extract.\"\"\"\n",
|
||||
"\n",
|
||||
" name: str\n",
|
||||
" age: Optional[int] = None"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "2036af68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain_pydantic(Person, model)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "1748ad21",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Person(name='jane', age=2), Person(name='bob', age=3)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke({\"input\": \"jane is 2 and bob is 3\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "c8262ce5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Let's define another element\n",
|
||||
"class Class(BaseModel):\n",
|
||||
" \"\"\"Information about classes to extract.\"\"\"\n",
|
||||
"\n",
|
||||
" teacher: str\n",
|
||||
" students: List[str]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "4973c104",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain_pydantic([Person, Class], model)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "e976a15e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Person(name='jane', age=2),\n",
|
||||
" Person(name='bob', age=3),\n",
|
||||
" Class(teacher='Mrs Sampson', students=['jane', 'bob'])]"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke({\"input\": \"jane is 2 and bob is 3 and they are in Mrs Sampson's class\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6575a7d6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Under the hood\n",
|
||||
"\n",
|
||||
"Under the hood, this is a simple chain:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b8ba83e5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"```python\n",
|
||||
"from typing import Union, List, Type, Optional\n",
|
||||
"\n",
|
||||
"from langchain.output_parsers.openai_tools import PydanticToolsParser\n",
|
||||
"from langchain.utils.openai_functions import convert_pydantic_to_openai_tool\n",
|
||||
"from langchain_core.runnables import Runnable\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.messages import SystemMessage\n",
|
||||
"from langchain_core.language_models import BaseLanguageModel\n",
|
||||
"\n",
|
||||
"_EXTRACTION_TEMPLATE = \"\"\"Extract and save the relevant entities mentioned \\\n",
|
||||
"in the following passage together with their properties.\n",
|
||||
"\n",
|
||||
"If a property is not present and is not required in the function parameters, do not include it in the output.\"\"\" # noqa: E501\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def create_extraction_chain_pydantic(\n",
|
||||
" pydantic_schemas: Union[List[Type[BaseModel]], Type[BaseModel]],\n",
|
||||
" llm: BaseLanguageModel,\n",
|
||||
" system_message: str = _EXTRACTION_TEMPLATE,\n",
|
||||
") -> Runnable:\n",
|
||||
" if not isinstance(pydantic_schemas, list):\n",
|
||||
" pydantic_schemas = [pydantic_schemas]\n",
|
||||
" prompt = ChatPromptTemplate.from_messages([\n",
|
||||
" (\"system\", system_message),\n",
|
||||
" (\"user\", \"{input}\")\n",
|
||||
" ])\n",
|
||||
" tools = [convert_pydantic_to_openai_tool(p) for p in pydantic_schemas]\n",
|
||||
" model = llm.bind(tools=tools)\n",
|
||||
" chain = prompt | model | PydanticToolsParser(tools=pydantic_schemas)\n",
|
||||
" return chain\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2eac6b68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -56,8 +56,7 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"SERPER_API_KEY\"] = \"\"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\""
|
||||
"os.environ[\"SERPER_API_KEY\"] = \"\"os.environ[\"OPENAI_API_KEY\"] = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -67,16 +66,21 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Any, List\n",
|
||||
"import re\n",
|
||||
"\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"from langchain.schema import BaseRetriever\n",
|
||||
"from langchain.callbacks.manager import (\n",
|
||||
" AsyncCallbackManagerForRetrieverRun,\n",
|
||||
" CallbackManagerForRetrieverRun,\n",
|
||||
")\n",
|
||||
"from langchain_community.utilities import GoogleSerperAPIWrapper\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_core.retrievers import BaseRetriever\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAI"
|
||||
"from langchain.utilities import GoogleSerperAPIWrapper\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from typing import Any, List"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -46,12 +46,14 @@
|
||||
"source": [
|
||||
"from datetime import datetime, timedelta\n",
|
||||
"from typing import List\n",
|
||||
"from termcolor import colored\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.docstore import InMemoryDocstore\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.retrievers import TimeWeightedVectorStoreRetriever\n",
|
||||
"from langchain_community.vectorstores import FAISS\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
|
||||
"from termcolor import colored"
|
||||
"from langchain.vectorstores import FAISS"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -151,7 +153,6 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import math\n",
|
||||
"\n",
|
||||
"import faiss\n",
|
||||
"\n",
|
||||
"\n",
|
||||
|
||||
@@ -27,12 +27,18 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import gymnasium as gym\n",
|
||||
"import inspect\n",
|
||||
"import tenacity\n",
|
||||
"from langchain.output_parsers import RegexParser\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import (\n",
|
||||
" AIMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
")"
|
||||
" BaseMessage,\n",
|
||||
")\n",
|
||||
"from langchain.output_parsers import RegexParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -125,7 +131,7 @@
|
||||
" ):\n",
|
||||
" with attempt:\n",
|
||||
" action = self._act()\n",
|
||||
" except tenacity.RetryError:\n",
|
||||
" except tenacity.RetryError as e:\n",
|
||||
" action = self.random_action()\n",
|
||||
" return action"
|
||||
]
|
||||
|
||||
@@ -75,9 +75,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain_experimental.autonomous_agents import HuggingGPT\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"# %env OPENAI_API_BASE=http://localhost:8000/v1"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -20,9 +20,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import HypotheticalDocumentEmbedder, LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain_openai import OpenAI, OpenAIEmbeddings"
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.chains import LLMChain, HypotheticalDocumentEmbedder\n",
|
||||
"from langchain.prompts import PromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -171,7 +172,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"\n",
|
||||
"with open(\"../../state_of_the_union.txt\") as f:\n",
|
||||
" state_of_the_union = f.read()\n",
|
||||
|
||||
@@ -49,9 +49,8 @@
|
||||
"source": [
|
||||
"# pick and configure the LLM of your choice\n",
|
||||
"\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(model=\"gpt-3.5-turbo-instruct\")"
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"llm = OpenAI(model=\"text-davinci-003\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -86,8 +85,8 @@
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"PROMPT = PromptTemplate(\n",
|
||||
" input_variables=[\"meal\", \"text_to_personalize\", \"user\", \"preference\"],\n",
|
||||
" template=PROMPT_TEMPLATE,\n",
|
||||
" input_variables=[\"meal\", \"text_to_personalize\", \"user\", \"preference\"], \n",
|
||||
" template=PROMPT_TEMPLATE\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -106,7 +105,7 @@
|
||||
"source": [
|
||||
"import langchain_experimental.rl_chain as rl_chain\n",
|
||||
"\n",
|
||||
"chain = rl_chain.PickBest.from_llm(llm=llm, prompt=PROMPT)"
|
||||
"chain = rl_chain.PickBest.from_llm(llm=llm, prompt=PROMPT)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -123,10 +122,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"response = chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs \\\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs \\\n",
|
||||
" believe you will love it!\",\n",
|
||||
")"
|
||||
]
|
||||
@@ -194,10 +193,10 @@
|
||||
"for _ in range(5):\n",
|
||||
" try:\n",
|
||||
" response = chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" )\n",
|
||||
" except Exception as e:\n",
|
||||
" print(e)\n",
|
||||
@@ -224,16 +223,12 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"scoring_criteria_template = (\n",
|
||||
" \"Given {preference} rank how good or bad this selection is {meal}\"\n",
|
||||
")\n",
|
||||
"scoring_criteria_template = \"Given {preference} rank how good or bad this selection is {meal}\"\n",
|
||||
"\n",
|
||||
"chain = rl_chain.PickBest.from_llm(\n",
|
||||
" llm=llm,\n",
|
||||
" prompt=PROMPT,\n",
|
||||
" selection_scorer=rl_chain.AutoSelectionScorer(\n",
|
||||
" llm=llm, scoring_criteria_template_str=scoring_criteria_template\n",
|
||||
" ),\n",
|
||||
" selection_scorer=rl_chain.AutoSelectionScorer(llm=llm, scoring_criteria_template_str=scoring_criteria_template),\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -260,16 +255,14 @@
|
||||
],
|
||||
"source": [
|
||||
"response = chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
")\n",
|
||||
"print(response[\"response\"])\n",
|
||||
"selection_metadata = response[\"selection_metadata\"]\n",
|
||||
"print(\n",
|
||||
" f\"selected index: {selection_metadata.selected.index}, score: {selection_metadata.selected.score}\"\n",
|
||||
")"
|
||||
"print(f\"selected index: {selection_metadata.selected.index}, score: {selection_metadata.selected.score}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -287,8 +280,8 @@
|
||||
"source": [
|
||||
"class CustomSelectionScorer(rl_chain.SelectionScorer):\n",
|
||||
" def score_response(\n",
|
||||
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent\n",
|
||||
" ) -> float:\n",
|
||||
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent) -> float:\n",
|
||||
"\n",
|
||||
" print(event.based_on)\n",
|
||||
" print(event.to_select_from)\n",
|
||||
"\n",
|
||||
@@ -343,10 +336,10 @@
|
||||
],
|
||||
"source": [
|
||||
"response = chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -377,10 +370,9 @@
|
||||
" return 1.0\n",
|
||||
" else:\n",
|
||||
" return 0.0\n",
|
||||
"\n",
|
||||
" def score_response(\n",
|
||||
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent\n",
|
||||
" ) -> float:\n",
|
||||
" self, inputs, llm_response: str, event: rl_chain.PickBestEvent) -> float:\n",
|
||||
"\n",
|
||||
" selected_meal = event.to_select_from[\"meal\"][event.selected.index]\n",
|
||||
"\n",
|
||||
" if \"Tom\" in event.based_on[\"user\"]:\n",
|
||||
@@ -402,7 +394,7 @@
|
||||
" prompt=PROMPT,\n",
|
||||
" selection_scorer=CustomSelectionScorer(),\n",
|
||||
" metrics_step=5,\n",
|
||||
" metrics_window_size=5, # rolling window average\n",
|
||||
" metrics_window_size=5, # rolling window average\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"random_chain = rl_chain.PickBest.from_llm(\n",
|
||||
@@ -410,8 +402,8 @@
|
||||
" prompt=PROMPT,\n",
|
||||
" selection_scorer=CustomSelectionScorer(),\n",
|
||||
" metrics_step=5,\n",
|
||||
" metrics_window_size=5, # rolling window average\n",
|
||||
" policy=rl_chain.PickBestRandomPolicy, # set the random policy instead of default\n",
|
||||
" metrics_window_size=5, # rolling window average\n",
|
||||
" policy=rl_chain.PickBestRandomPolicy # set the random policy instead of default\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -424,29 +416,29 @@
|
||||
"for _ in range(20):\n",
|
||||
" try:\n",
|
||||
" chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" )\n",
|
||||
" random_chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" \n",
|
||||
" chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Anna\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Anna\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" )\n",
|
||||
" random_chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Anna\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Anna\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Loves meat\", \"especially beef\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" )\n",
|
||||
" except Exception as e:\n",
|
||||
" print(e)"
|
||||
@@ -485,17 +477,12 @@
|
||||
],
|
||||
"source": [
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"\n",
|
||||
"chain.metrics.to_pandas()[\"score\"].plot(label=\"default learning policy\")\n",
|
||||
"random_chain.metrics.to_pandas()[\"score\"].plot(label=\"random selection policy\")\n",
|
||||
"chain.metrics.to_pandas()['score'].plot(label=\"default learning policy\")\n",
|
||||
"random_chain.metrics.to_pandas()['score'].plot(label=\"random selection policy\")\n",
|
||||
"plt.legend()\n",
|
||||
"\n",
|
||||
"print(\n",
|
||||
" f\"The final average score for the default policy, calculated over a rolling window, is: {chain.metrics.to_pandas()['score'].iloc[-1]}\"\n",
|
||||
")\n",
|
||||
"print(\n",
|
||||
" f\"The final average score for the random policy, calculated over a rolling window, is: {random_chain.metrics.to_pandas()['score'].iloc[-1]}\"\n",
|
||||
")"
|
||||
"print(f\"The final average score for the default policy, calculated over a rolling window, is: {chain.metrics.to_pandas()['score'].iloc[-1]}\")\n",
|
||||
"print(f\"The final average score for the random policy, calculated over a rolling window, is: {random_chain.metrics.to_pandas()['score'].iloc[-1]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -790,8 +777,8 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.globals import set_debug\n",
|
||||
"from langchain.prompts.prompt import PromptTemplate\n",
|
||||
"from langchain.globals import set_debug\n",
|
||||
"\n",
|
||||
"set_debug(True)\n",
|
||||
"\n",
|
||||
@@ -816,10 +803,10 @@
|
||||
")\n",
|
||||
"\n",
|
||||
"chain.run(\n",
|
||||
" meal=rl_chain.ToSelectFrom(meals),\n",
|
||||
" user=rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference=rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize=\"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
" meal = rl_chain.ToSelectFrom(meals),\n",
|
||||
" user = rl_chain.BasedOn(\"Tom\"),\n",
|
||||
" preference = rl_chain.BasedOn([\"Vegetarian\", \"regular dairy is ok\"]),\n",
|
||||
" text_to_personalize = \"This is the weeks specialty dish, our master chefs believe you will love it!\",\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
|
||||
@@ -44,7 +44,7 @@
|
||||
],
|
||||
"source": [
|
||||
"from langchain_experimental.llm_bash.base import LLMBashChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
@@ -70,7 +70,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.prompts.prompt import PromptTemplate\n",
|
||||
"from langchain_experimental.llm_bash.prompt import BashOutputParser\n",
|
||||
"from langchain.chains.llm_bash.prompt import BashOutputParser\n",
|
||||
"\n",
|
||||
"_PROMPT_TEMPLATE = \"\"\"If someone asks you to perform a task, your job is to come up with a series of bash commands that will perform the task. There is no need to put \"#!/bin/bash\" in your answer. Make sure to reason step by step, using this format:\n",
|
||||
"Question: \"copy the files in the directory named 'target' into a new directory at the same level as target called 'myNewDirectory'\"\n",
|
||||
@@ -185,6 +185,7 @@
|
||||
"source": [
|
||||
"from langchain_experimental.llm_bash.bash import BashProcess\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"persistent_process = BashProcess(persistent=True)\n",
|
||||
"bash_chain = LLMBashChain.from_llm(llm, bash_process=persistent_process, verbose=True)\n",
|
||||
"\n",
|
||||
|
||||
@@ -42,7 +42,7 @@
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import LLMCheckerChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0.7)\n",
|
||||
"\n",
|
||||
|
||||
@@ -45,8 +45,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import LLMMathChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\nfrom langchain.chains import LLMMathChain\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"llm_math = LLMMathChain.from_llm(llm, verbose=True)\n",
|
||||
|
||||
@@ -331,7 +331,7 @@
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import LLMSummarizationCheckerChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"checker_chain = LLMSummarizationCheckerChain.from_llm(llm, verbose=True, max_checks=2)\n",
|
||||
@@ -822,7 +822,7 @@
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import LLMSummarizationCheckerChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"checker_chain = LLMSummarizationCheckerChain.from_llm(llm, verbose=True, max_checks=3)\n",
|
||||
@@ -1096,7 +1096,7 @@
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import LLMSummarizationCheckerChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"checker_chain = LLMSummarizationCheckerChain.from_llm(llm, max_checks=3, verbose=True)\n",
|
||||
|
||||
@@ -14,8 +14,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain_experimental.llm_symbolic_math.base import LLMSymbolicMathChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"llm_symbolic_math = LLMSymbolicMathChain.from_llm(llm)"
|
||||
|
||||
@@ -56,10 +56,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.memory import ConversationBufferWindowMemory\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
"from langchain.llms import OpenAI\nfrom langchain.chains import LLMChain\nfrom langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.memory import ConversationBufferWindowMemory"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -154,13 +152,13 @@
|
||||
" for j in range(max_iters):\n",
|
||||
" print(f\"(Step {j+1}/{max_iters})\")\n",
|
||||
" print(f\"Assistant: {output}\")\n",
|
||||
" print(\"Human: \")\n",
|
||||
" print(f\"Human: \")\n",
|
||||
" human_input = input()\n",
|
||||
" if any(phrase in human_input.lower() for phrase in key_phrases):\n",
|
||||
" break\n",
|
||||
" output = chain.predict(human_input=human_input)\n",
|
||||
" if success_phrase in human_input.lower():\n",
|
||||
" print(\"You succeeded! Thanks for playing!\")\n",
|
||||
" print(f\"You succeeded! Thanks for playing!\")\n",
|
||||
" return\n",
|
||||
" meta_chain = initialize_meta_chain()\n",
|
||||
" meta_output = meta_chain.predict(chat_history=get_chat_history(chain.memory))\n",
|
||||
@@ -168,7 +166,7 @@
|
||||
" instructions = get_new_instructions(meta_output)\n",
|
||||
" print(f\"New Instructions: {instructions}\")\n",
|
||||
" print(\"\\n\" + \"#\" * 80 + \"\\n\")\n",
|
||||
" print(\"You failed! Thanks for playing!\")"
|
||||
" print(f\"You failed! Thanks for playing!\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -29,10 +29,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from steamship import Block, Steamship\n",
|
||||
"import re\n",
|
||||
"\n",
|
||||
"from IPython.display import Image, display\n",
|
||||
"from steamship import Block, Steamship"
|
||||
"from IPython.display import Image"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -42,9 +41,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import AgentType, initialize_agent\n",
|
||||
"from langchain.tools import SteamshipImageGenerationTool\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain.tools import SteamshipImageGenerationTool"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -180,7 +180,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -26,13 +26,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Callable, List\n",
|
||||
"\n",
|
||||
"from typing import List, Dict, Callable\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import (\n",
|
||||
" AIMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
" BaseMessage,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -27,21 +27,27 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from collections import OrderedDict\n",
|
||||
"import functools\n",
|
||||
"import random\n",
|
||||
"from collections import OrderedDict\n",
|
||||
"from typing import Callable, List\n",
|
||||
"\n",
|
||||
"import re\n",
|
||||
"import tenacity\n",
|
||||
"from langchain.output_parsers import RegexParser\n",
|
||||
"from typing import List, Dict, Callable\n",
|
||||
"\n",
|
||||
"from langchain.prompts import (\n",
|
||||
" ChatPromptTemplate,\n",
|
||||
" HumanMessagePromptTemplate,\n",
|
||||
" PromptTemplate,\n",
|
||||
")\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.output_parsers import RegexParser\n",
|
||||
"from langchain.schema import (\n",
|
||||
" AIMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
" BaseMessage,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -24,16 +24,18 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Callable, List\n",
|
||||
"\n",
|
||||
"import tenacity\n",
|
||||
"from langchain.output_parsers import RegexParser\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"import re\n",
|
||||
"import tenacity\n",
|
||||
"from typing import List, Dict, Callable\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.output_parsers import RegexParser\n",
|
||||
"from langchain.schema import (\n",
|
||||
" AIMessage,\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
" BaseMessage,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -27,17 +27,19 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"from os import environ\n",
|
||||
"\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain_community.utilities import SQLDatabase\n",
|
||||
"import getpass\n",
|
||||
"from typing import Dict, Any\n",
|
||||
"from langchain.llms import OpenAI\nfrom langchain.utilities import SQLDatabase\nfrom langchain.chains import LLMChain\n",
|
||||
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"from sqlalchemy import MetaData, create_engine\n",
|
||||
"from sqlalchemy import create_engine, Column, MetaData\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"\n",
|
||||
"MYSCALE_HOST = \"msc-4a9e710a.us-east-1.aws.staging.myscale.cloud\"\n",
|
||||
"\n",
|
||||
"from sqlalchemy import create_engine\n",
|
||||
"\n",
|
||||
"MYSCALE_HOST = \"msc-1decbcc9.us-east-1.aws.staging.myscale.cloud\"\n",
|
||||
"MYSCALE_PORT = 443\n",
|
||||
"MYSCALE_USER = \"chatdata\"\n",
|
||||
"MYSCALE_PASSWORD = \"myscale_rocks\"\n",
|
||||
@@ -57,7 +59,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.embeddings import HuggingFaceInstructEmbeddings\n",
|
||||
"from langchain.embeddings import HuggingFaceInstructEmbeddings\n",
|
||||
"from langchain_experimental.sql.vector_sql import VectorSQLOutputParser\n",
|
||||
"\n",
|
||||
"output_parser = VectorSQLOutputParser.from_embeddings(\n",
|
||||
@@ -74,11 +76,13 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.callbacks import StdOutCallbackHandler\n",
|
||||
"from langchain_community.utilities.sql_database import SQLDatabase\n",
|
||||
"\n",
|
||||
"from langchain.utilities.sql_database import SQLDatabase\n",
|
||||
"from langchain_experimental.sql.prompt import MYSCALE_PROMPT\n",
|
||||
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
|
||||
"from langchain_openai import OpenAI\n",
|
||||
"\n",
|
||||
"chain = VectorSQLDatabaseChain(\n",
|
||||
" llm_chain=LLMChain(\n",
|
||||
@@ -116,16 +120,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains.qa_with_sources.retrieval import RetrievalQAWithSourcesChain\n",
|
||||
"from langchain_experimental.retrievers.vector_sql_database import (\n",
|
||||
" VectorSQLDatabaseChainRetriever,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"from langchain_experimental.sql.vector_sql import VectorSQLDatabaseChain\n",
|
||||
"from langchain_experimental.retrievers.vector_sql_database \\\n",
|
||||
" import VectorSQLDatabaseChainRetriever\n",
|
||||
"from langchain_experimental.sql.prompt import MYSCALE_PROMPT\n",
|
||||
"from langchain_experimental.sql.vector_sql import (\n",
|
||||
" VectorSQLDatabaseChain,\n",
|
||||
" VectorSQLRetrieveAllOutputParser,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"from langchain_experimental.sql.vector_sql import VectorSQLRetrieveAllOutputParser\n",
|
||||
"\n",
|
||||
"output_parser_retrieve_all = VectorSQLRetrieveAllOutputParser.from_embeddings(\n",
|
||||
" output_parser.model\n",
|
||||
@@ -142,9 +144,7 @@
|
||||
")\n",
|
||||
"\n",
|
||||
"# You need all those keys to get docs\n",
|
||||
"retriever = VectorSQLDatabaseChainRetriever(\n",
|
||||
" sql_db_chain=chain, page_content_key=\"abstract\"\n",
|
||||
")\n",
|
||||
"retriever = VectorSQLDatabaseChainRetriever(sql_db_chain=chain, page_content_key=\"abstract\")\n",
|
||||
"\n",
|
||||
"document_with_metadata_prompt = PromptTemplate(\n",
|
||||
" input_variables=[\"page_content\", \"id\", \"title\", \"authors\", \"pubdate\", \"categories\"],\n",
|
||||
@@ -162,10 +162,8 @@
|
||||
" },\n",
|
||||
" return_source_documents=True,\n",
|
||||
")\n",
|
||||
"ans = chain(\n",
|
||||
" \"Please give me 10 papers to ask what is PageRank?\",\n",
|
||||
" callbacks=[StdOutCallbackHandler()],\n",
|
||||
")\n",
|
||||
"ans = chain(\"Please give me 10 papers to ask what is PageRank?\",\n",
|
||||
" callbacks=[StdOutCallbackHandler()])\n",
|
||||
"print(ans[\"answer\"])"
|
||||
]
|
||||
},
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -20,10 +20,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings"
|
||||
"from langchain.vectorstores import Chroma"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -50,10 +50,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import create_qa_with_sources_chain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains.combine_documents.stuff import StuffDocumentsChain\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
"from langchain.chains import create_qa_with_sources_chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -230,8 +230,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import ConversationalRetrievalChain, LLMChain\n",
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain.memory import ConversationBufferMemory\n",
|
||||
"from langchain.chains import LLMChain\n",
|
||||
"\n",
|
||||
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
|
||||
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\\\n",
|
||||
@@ -356,10 +357,12 @@
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"\n",
|
||||
"from pydantic import BaseModel, Field\n",
|
||||
"\n",
|
||||
"from langchain.chains.openai_functions import create_qa_with_structure_chain\n",
|
||||
"\n",
|
||||
"from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate\n",
|
||||
"from langchain_core.messages import HumanMessage, SystemMessage\n",
|
||||
"from pydantic import BaseModel, Field"
|
||||
"from langchain.schema import SystemMessage, HumanMessage"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,506 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f970f757-ec76-4bf0-90cd-a2fb68b945e3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Exploring OpenAI V1 functionality\n",
|
||||
"\n",
|
||||
"On November 6, 2023, OpenAI released a number of new features and, along with them, bumped their Python SDK to 1.0.0. This notebook shows off the new features and how to use them with LangChain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ee897729-263a-4073-898f-bb4cf01ed829",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# need openai>=1.1.0, langchain>=0.0.335, langchain-experimental>=0.0.39\n",
|
||||
"!pip install -U openai langchain langchain-experimental"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "c3e067ce-7a43-47a7-bc89-41f1de4cf136",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.messages import HumanMessage, SystemMessage\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fa7e7e95-90a1-4f73-98fe-10c4b4e0951b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## [Vision](https://platform.openai.com/docs/guides/vision)\n",
|
||||
"\n",
|
||||
"OpenAI released multi-modal models, which can take a sequence of text and images as input."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1c8c3965-d3c9-4186-b5f3-5e67855ef916",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='The image appears to be a diagram representing the architecture or components of a software system or framework related to language processing, possibly named LangChain or associated with a project or product called LangChain, based on the prominent appearance of that term. The diagram is organized into several layers or aspects, each containing various elements or modules:\\n\\n1. **Protocol**: This may be the foundational layer, which includes \"LCEL\" and terms like parallelization, fallbacks, tracing, batching, streaming, async, and composition. These seem related to communication and execution protocols for the system.\\n\\n2. **Integrations Components**: This layer includes \"Model I/O\" with elements such as the model, output parser, prompt, and example selector. It also has a \"Retrieval\" section with a document loader, retriever, embedding model, vector store, and text splitter. Lastly, there\\'s an \"Agent Tooling\" section. These components likely deal with the interaction with external data, models, and tools.\\n\\n3. **Application**: The application layer features \"LangChain\" with chains, agents, agent executors, and common application logic. This suggests that the system uses a modular approach with chains and agents to process language tasks.\\n\\n4. **Deployment**: This contains \"Lang')"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chat = ChatOpenAI(model=\"gpt-4-vision-preview\", max_tokens=256)\n",
|
||||
"chat.invoke(\n",
|
||||
" [\n",
|
||||
" HumanMessage(\n",
|
||||
" content=[\n",
|
||||
" {\"type\": \"text\", \"text\": \"What is this image showing\"},\n",
|
||||
" {\n",
|
||||
" \"type\": \"image_url\",\n",
|
||||
" \"image_url\": {\n",
|
||||
" \"url\": \"https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/static/img/langchain_stack.png\",\n",
|
||||
" \"detail\": \"auto\",\n",
|
||||
" },\n",
|
||||
" },\n",
|
||||
" ]\n",
|
||||
" )\n",
|
||||
" ]\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "210f8248-fcf3-4052-a4a3-0684e08f8785",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## [OpenAI assistants](https://platform.openai.com/docs/assistants/overview)\n",
|
||||
"\n",
|
||||
"> The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, Retrieval, and Function calling\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"You can interact with OpenAI Assistants using OpenAI tools or custom tools. When using exclusively OpenAI tools, you can just invoke the assistant directly and get final answers. When using custom tools, you can run the assistant and tool execution loop using the built-in AgentExecutor or easily write your own executor.\n",
|
||||
"\n",
|
||||
"Below we show the different ways to interact with Assistants. As a simple example, let's build a math tutor that can write and run code."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "318da28d-4cec-42ab-ae3e-76d95bb34fa5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using only OpenAI tools"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "a9064bbe-d9f7-4a29-a7b3-73933b3197e7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.openai_assistant import OpenAIAssistantRunnable"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "7a20a008-49ac-46d2-aa26-b270118af5ea",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[ThreadMessage(id='msg_g9OJv0rpPgnc3mHmocFv7OVd', assistant_id='asst_hTwZeNMMphxzSOqJ01uBMsJI', content=[MessageContentText(text=Text(annotations=[], value='The result of \\\\(10 - 4^{2.7}\\\\) is approximately \\\\(-32.224\\\\).'), type='text')], created_at=1699460600, file_ids=[], metadata={}, object='thread.message', role='assistant', run_id='run_nBIT7SiAwtUfSCTrQNSPLOfe', thread_id='thread_14n4GgXwxgNL0s30WJW5F6p0')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"interpreter_assistant = OpenAIAssistantRunnable.create_assistant(\n",
|
||||
" name=\"langchain assistant\",\n",
|
||||
" instructions=\"You are a personal math tutor. Write and run code to answer math questions.\",\n",
|
||||
" tools=[{\"type\": \"code_interpreter\"}],\n",
|
||||
" model=\"gpt-4-1106-preview\",\n",
|
||||
")\n",
|
||||
"output = interpreter_assistant.invoke({\"content\": \"What's 10 - 4 raised to the 2.7\"})\n",
|
||||
"output"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a8ddd181-ac63-4ab6-a40d-a236120379c1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### As a LangChain agent with arbitrary tools\n",
|
||||
"\n",
|
||||
"Now let's recreate this functionality using our own tools. For this example we'll use the [E2B sandbox runtime tool](https://e2b.dev/docs?ref=landing-page-get-started)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ee4cc355-f2d6-4c51-bcf7-f502868357d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install e2b duckduckgo-search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "48681ac7-b267-48d4-972c-8a7df8393a21",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools import DuckDuckGoSearchRun, E2BDataAnalysisTool\n",
|
||||
"\n",
|
||||
"tools = [E2BDataAnalysisTool(api_key=\"...\"), DuckDuckGoSearchRun()]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "1c01dd79-dd3e-4509-a2e2-009a7f99f16a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = OpenAIAssistantRunnable.create_assistant(\n",
|
||||
" name=\"langchain assistant e2b tool\",\n",
|
||||
" instructions=\"You are a personal math tutor. Write and run code to answer math questions. You can also search the internet.\",\n",
|
||||
" tools=tools,\n",
|
||||
" model=\"gpt-4-1106-preview\",\n",
|
||||
" as_agent=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1ac71d8b-4b4b-4f98-b826-6b3c57a34166",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Using AgentExecutor"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "1f137f94-801f-4766-9ff5-2de9df5e8079",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'content': \"What's the weather in SF today divided by 2.7\",\n",
|
||||
" 'output': \"The weather in San Francisco today is reported to have temperatures as high as 66 °F. To get the temperature divided by 2.7, we will calculate that:\\n\\n66 °F / 2.7 = 24.44 °F\\n\\nSo, when the high temperature of 66 °F is divided by 2.7, the result is approximately 24.44 °F. Please note that this doesn't have a meteorological meaning; it's purely a mathematical operation based on the given temperature.\"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.agents import AgentExecutor\n",
|
||||
"\n",
|
||||
"agent_executor = AgentExecutor(agent=agent, tools=tools)\n",
|
||||
"agent_executor.invoke({\"content\": \"What's the weather in SF today divided by 2.7\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2d0a0b1d-c1b3-4b50-9dce-1189b51a6206",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Custom execution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "c0475fa7-b6c1-4331-b8e2-55407466c724",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = OpenAIAssistantRunnable.create_assistant(\n",
|
||||
" name=\"langchain assistant e2b tool\",\n",
|
||||
" instructions=\"You are a personal math tutor. Write and run code to answer math questions.\",\n",
|
||||
" tools=tools,\n",
|
||||
" model=\"gpt-4-1106-preview\",\n",
|
||||
" as_agent=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "b76cb669-6aba-4827-868f-00aa960026f2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.agents import AgentFinish\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def execute_agent(agent, tools, input):\n",
|
||||
" tool_map = {tool.name: tool for tool in tools}\n",
|
||||
" response = agent.invoke(input)\n",
|
||||
" while not isinstance(response, AgentFinish):\n",
|
||||
" tool_outputs = []\n",
|
||||
" for action in response:\n",
|
||||
" tool_output = tool_map[action.tool].invoke(action.tool_input)\n",
|
||||
" print(action.tool, action.tool_input, tool_output, end=\"\\n\\n\")\n",
|
||||
" tool_outputs.append(\n",
|
||||
" {\"output\": tool_output, \"tool_call_id\": action.tool_call_id}\n",
|
||||
" )\n",
|
||||
" response = agent.invoke(\n",
|
||||
" {\n",
|
||||
" \"tool_outputs\": tool_outputs,\n",
|
||||
" \"run_id\": action.run_id,\n",
|
||||
" \"thread_id\": action.thread_id,\n",
|
||||
" }\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" return response"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "7946116a-b82f-492e-835e-ca958a8949a5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"e2b_data_analysis {'python_code': 'print(10 - 4 ** 2.7)'} {\"stdout\": \"-32.22425314473263\", \"stderr\": \"\", \"artifacts\": []}\n",
|
||||
"\n",
|
||||
"\\( 10 - 4^{2.7} \\) is approximately \\(-32.22425314473263\\).\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"response = execute_agent(agent, tools, {\"content\": \"What's 10 - 4 raised to the 2.7\"})\n",
|
||||
"print(response.return_values[\"output\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "f2744a56-9f4f-4899-827a-fa55821c318c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7\\nprint(result + 17.241)'} {\"stdout\": \"-14.983253144732629\", \"stderr\": \"\", \"artifacts\": []}\n",
|
||||
"\n",
|
||||
"When you add \\( 17.241 \\) to \\( 10 - 4^{2.7} \\), the result is approximately \\( -14.98325314473263 \\).\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"next_response = execute_agent(\n",
|
||||
" agent, tools, {\"content\": \"now add 17.241\", \"thread_id\": response.thread_id}\n",
|
||||
")\n",
|
||||
"print(next_response.return_values[\"output\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "71c34763-d1e7-4b9a-a9d7-3e4cc0dfc2c4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode)\n",
|
||||
"\n",
|
||||
"Constrain the model to only generate valid JSON. Note that you must include a system message with instructions to use JSON for this mode to work.\n",
|
||||
"\n",
|
||||
"JSON mode only works with certain models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "db6072c4-f3f3-415d-872b-71ea9f3c02bb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\").bind(\n",
|
||||
" response_format={\"type\": \"json_object\"}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"output = chat.invoke(\n",
|
||||
" [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"Extract the 'name' and 'origin' of any companies mentioned in the following statement. Return a JSON list.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Google was founded in the USA, while Deepmind was founded in the UK\"\n",
|
||||
" ),\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"print(output.content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "08e00ccf-b991-4249-846b-9500a0ccbfa0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"\n",
|
||||
"json.loads(output.content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "aa9a94d9-4319-4ab7-a979-c475ce6b5f50",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## [System fingerprint](https://platform.openai.com/docs/guides/text-generation/reproducible-outputs)\n",
|
||||
"\n",
|
||||
"OpenAI sometimes changes model configurations in a way that impacts outputs. Whenever this happens, the system_fingerprint associated with a generation will change."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1281883c-bf8f-4665-89cd-4f33ccde69ab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat = ChatOpenAI(model=\"gpt-3.5-turbo-1106\")\n",
|
||||
"output = chat.generate(\n",
|
||||
" [\n",
|
||||
" [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=\"Extract the 'name' and 'origin' of any companies mentioned in the following statement. Return a JSON list.\"\n",
|
||||
" ),\n",
|
||||
" HumanMessage(\n",
|
||||
" content=\"Google was founded in the USA, while Deepmind was founded in the UK\"\n",
|
||||
" ),\n",
|
||||
" ]\n",
|
||||
" ]\n",
|
||||
")\n",
|
||||
"print(output.llm_output)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "aa6565be-985d-4127-848e-c3bca9d7b434",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Breaking changes to Azure classes\n",
|
||||
"\n",
|
||||
"OpenAI V1 rewrote their clients and separated Azure and OpenAI clients. This has led to some changes in LangChain interfaces when using OpenAI V1.\n",
|
||||
"\n",
|
||||
"BREAKING CHANGES:\n",
|
||||
"- To use Azure embeddings with OpenAI V1, you'll need to use the new `AzureOpenAIEmbeddings` instead of the existing `OpenAIEmbeddings`. `OpenAIEmbeddings` continues to work when using Azure with `openai<1`.\n",
|
||||
"```python\n",
|
||||
"from langchain_openai import AzureOpenAIEmbeddings\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"RECOMMENDED CHANGES:\n",
|
||||
"- When using `AzureChatOpenAI` or `AzureOpenAI`, if passing in an Azure endpoint (e.g. https://example-resource.azure.openai.com/), specify it via the `azure_endpoint` parameter or the `AZURE_OPENAI_ENDPOINT` environment variable. We're maintaining backwards compatibility for now with specifying this via `openai_api_base`/`base_url` or the `OPENAI_API_BASE` environment variable, but this shouldn't be relied upon.\n",
|
||||
"- When using Azure chat or embedding models, pass in API keys via either the `openai_api_key` parameter or the `AZURE_OPENAI_API_KEY` environment variable. We're maintaining backwards compatibility for now with specifying this via the `OPENAI_API_KEY` environment variable, but this shouldn't be relied upon."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "49944887-3972-497e-8da2-6d32d44345a9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"\n",
|
||||
"Use tools for parallel function calling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "916292d8-0f89-40a6-af1c-5a1122327de8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[GetCurrentWeather(location='New York, NY', unit='fahrenheit'),\n",
|
||||
" GetCurrentWeather(location='Los Angeles, CA', unit='fahrenheit'),\n",
|
||||
" GetCurrentWeather(location='San Francisco, CA', unit='fahrenheit')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from typing import Literal\n",
|
||||
"\n",
|
||||
"from langchain.output_parsers.openai_tools import PydanticToolsParser\n",
|
||||
"from langchain.utils.openai_functions import convert_pydantic_to_openai_tool\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class GetCurrentWeather(BaseModel):\n",
|
||||
" \"\"\"Get the current weather in a location.\"\"\"\n",
|
||||
"\n",
|
||||
" location: str = Field(description=\"The city and state, e.g. San Francisco, CA\")\n",
|
||||
" unit: Literal[\"celsius\", \"fahrenheit\"] = Field(\n",
|
||||
" default=\"fahrenheit\", description=\"The temperature unit, default to fahrenheit\"\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"prompt = ChatPromptTemplate.from_messages(\n",
|
||||
" [(\"system\", \"You are a helpful assistant\"), (\"user\", \"{input}\")]\n",
|
||||
")\n",
|
||||
"model = ChatOpenAI(model=\"gpt-3.5-turbo-1106\").bind(\n",
|
||||
" tools=[convert_pydantic_to_openai_tool(GetCurrentWeather)]\n",
|
||||
")\n",
|
||||
"chain = prompt | model | PydanticToolsParser(tools=[GetCurrentWeather])\n",
|
||||
"\n",
|
||||
"chain.invoke({\"input\": \"what's the weather in NYC, LA, and SF\"})"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "poetry-venv",
|
||||
"language": "python",
|
||||
"name": "poetry-venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -45,14 +45,14 @@
|
||||
"source": [
|
||||
"import collections\n",
|
||||
"import inspect\n",
|
||||
"\n",
|
||||
"import tenacity\n",
|
||||
"from langchain.output_parsers import RegexParser\n",
|
||||
"\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import (\n",
|
||||
" HumanMessage,\n",
|
||||
" SystemMessage,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
"from langchain.output_parsers import RegexParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -146,7 +146,7 @@
|
||||
" ):\n",
|
||||
" with attempt:\n",
|
||||
" action = self._act()\n",
|
||||
" except tenacity.RetryError:\n",
|
||||
" except tenacity.RetryError as e:\n",
|
||||
" action = self.random_action()\n",
|
||||
" return action"
|
||||
]
|
||||
|
||||
@@ -29,15 +29,12 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents.tools import Tool\n",
|
||||
"from langchain.chains import LLMMathChain\n",
|
||||
"from langchain_community.utilities import DuckDuckGoSearchAPIWrapper\n",
|
||||
"from langchain_core.tools import Tool\n",
|
||||
"from langchain_experimental.plan_and_execute import (\n",
|
||||
" PlanAndExecute,\n",
|
||||
" load_agent_executor,\n",
|
||||
" load_chat_planner,\n",
|
||||
")\n",
|
||||
"from langchain_openai import ChatOpenAI, OpenAI"
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.utilities import DuckDuckGoSearchAPIWrapper\n",
|
||||
"from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_executor, load_chat_planner"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -59,16 +56,16 @@
|
||||
"llm = OpenAI(temperature=0)\n",
|
||||
"llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)\n",
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"useful for when you need to answer questions about current events\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Calculator\",\n",
|
||||
" func=llm_math_chain.run,\n",
|
||||
" description=\"useful for when you need to answer questions about math\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"useful for when you need to answer questions about current events\"\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Calculator\",\n",
|
||||
" func=llm_math_chain.run,\n",
|
||||
" description=\"useful for when you need to answer questions about math\"\n",
|
||||
" ),\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
@@ -219,9 +216,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\n",
|
||||
" \"Who is the current prime minister of the UK? What is their current age raised to the 0.43 power?\"\n",
|
||||
")"
|
||||
"agent.run(\"Who is the current prime minister of the UK? What is their current age raised to the 0.43 power?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -55,7 +55,6 @@
|
||||
"source": [
|
||||
"# Setup API keys for Kay and OpenAI\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"KAY_API_KEY = getpass()\n",
|
||||
"OPENAI_API_KEY = getpass()"
|
||||
]
|
||||
@@ -68,7 +67,6 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"KAY_API_KEY\"] = KAY_API_KEY\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
|
||||
]
|
||||
@@ -81,13 +79,11 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.retrievers import KayAiRetriever\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
|
||||
"retriever = KayAiRetriever.create(\n",
|
||||
" dataset_id=\"company\", data_types=[\"PressRelease\"], num_contexts=6\n",
|
||||
")\n",
|
||||
"retriever = KayAiRetriever.create(dataset_id=\"company\", data_types=[\"PressRelease\"], num_contexts=6)\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
||||
]
|
||||
},
|
||||
@@ -120,7 +116,7 @@
|
||||
"# More sample questions in the Playground on https://kay.ai\n",
|
||||
"questions = [\n",
|
||||
" \"How is the healthcare industry adopting generative AI tools?\",\n",
|
||||
" # \"What are some recent challenges faced by the renewable energy sector?\",\n",
|
||||
" #\"What are some recent challenges faced by the renewable energy sector?\",\n",
|
||||
"]\n",
|
||||
"chat_history = []\n",
|
||||
"\n",
|
||||
|
||||
@@ -18,7 +18,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_experimental.pal_chain import PALChain\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,193 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# RAG based on Qianfan and BES"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook is an implementation of Retrieval augmented generation (RAG) using Baidu Qianfan Platform combined with Baidu ElasricSearch, where the original data is located on BOS.\n",
|
||||
"## Baidu Qianfan\n",
|
||||
"Baidu AI Cloud Qianfan Platform is a one-stop large model development and service operation platform for enterprise developers. Qianfan not only provides including the model of Wenxin Yiyan (ERNIE-Bot) and the third-party open-source models, but also provides various AI development tools and the whole set of development environment, which facilitates customers to use and develop large model applications easily.\n",
|
||||
"\n",
|
||||
"## Baidu ElasticSearch\n",
|
||||
"[Baidu Cloud VectorSearch](https://cloud.baidu.com/doc/BES/index.html?from=productToDoc) is a fully managed, enterprise-level distributed search and analysis service which is 100% compatible to open source. Baidu Cloud VectorSearch provides low-cost, high-performance, and reliable retrieval and analysis platform level product services for structured/unstructured data. As a vector database , it supports multiple index types and similarity distance methods. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation and Setup\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install qianfan\n",
|
||||
"#!pip install bce-python-sdk\n",
|
||||
"#!pip install elasticsearch == 7.11.0\n",
|
||||
"#!pip install sentence-transformers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Imports"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import sentence_transformers\n",
|
||||
"from baidubce.auth.bce_credentials import BceCredentials\n",
|
||||
"from baidubce.bce_client_configuration import BceClientConfiguration\n",
|
||||
"from langchain.chains.retrieval_qa import RetrievalQA\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_community.document_loaders.baiducloud_bos_directory import (\n",
|
||||
" BaiduBOSDirectoryLoader,\n",
|
||||
")\n",
|
||||
"from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings\n",
|
||||
"from langchain_community.llms.baidu_qianfan_endpoint import QianfanLLMEndpoint\n",
|
||||
"from langchain_community.vectorstores import BESVectorStore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Document loading"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"bos_host = \"your bos eddpoint\"\n",
|
||||
"access_key_id = \"your bos access ak\"\n",
|
||||
"secret_access_key = \"your bos access sk\"\n",
|
||||
"\n",
|
||||
"# create BceClientConfiguration\n",
|
||||
"config = BceClientConfiguration(\n",
|
||||
" credentials=BceCredentials(access_key_id, secret_access_key), endpoint=bos_host\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"loader = BaiduBOSDirectoryLoader(conf=config, bucket=\"llm-test\", prefix=\"llm/\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)\n",
|
||||
"split_docs = text_splitter.split_documents(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Embedding and VectorStore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"embeddings = HuggingFaceEmbeddings(model_name=\"shibing624/text2vec-base-chinese\")\n",
|
||||
"embeddings.client = sentence_transformers.SentenceTransformer(embeddings.model_name)\n",
|
||||
"\n",
|
||||
"db = BESVectorStore.from_documents(\n",
|
||||
" documents=split_docs,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" bes_url=\"your bes url\",\n",
|
||||
" index_name=\"test-index\",\n",
|
||||
" vector_query_field=\"vector\",\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"db.client.indices.refresh(index=\"test-index\")\n",
|
||||
"retriever = db.as_retriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## QA Retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = QianfanLLMEndpoint(\n",
|
||||
" model=\"ERNIE-Bot\",\n",
|
||||
" qianfan_ak=\"your qianfan ak\",\n",
|
||||
" qianfan_sk=\"your qianfan sk\",\n",
|
||||
" streaming=True,\n",
|
||||
")\n",
|
||||
"qa = RetrievalQA.from_chain_type(\n",
|
||||
" llm=llm, chain_type=\"refine\", retriever=retriever, return_source_documents=True\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"query = \"什么是张量?\"\n",
|
||||
"print(qa.run(query))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"> 张量(Tensor)是一个数学概念,用于表示多维数据。它是一个可以表示多个数值的数组,可以是标量、向量、矩阵等。在深度学习和人工智能领域中,张量常用于表示神经网络的输入、输出和权重等。"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.12"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -19,9 +19,7 @@
|
||||
"source": [
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"For this example, we will use Pinecone and some fake data. To configure Pinecone, set the following environment variable:\n",
|
||||
"\n",
|
||||
"- `PINECONE_API_KEY`: Your Pinecone API key"
|
||||
"For this example, we will use Pinecone and some fake data"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -31,8 +29,11 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_pinecone import PineconeVectorStore"
|
||||
"import pinecone\n",
|
||||
"from langchain.vectorstores import Pinecone\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"pinecone.init(api_key=\"...\",environment=\"...\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -52,7 +53,7 @@
|
||||
" \"doc7\": \"Climate change: The science and models.\",\n",
|
||||
" \"doc8\": \"Global warming: A subset of climate change.\",\n",
|
||||
" \"doc9\": \"How climate change affects daily weather.\",\n",
|
||||
" \"doc10\": \"The history of climate change activism.\",\n",
|
||||
" \"doc10\": \"The history of climate change activism.\"\n",
|
||||
"}"
|
||||
]
|
||||
},
|
||||
@@ -63,9 +64,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectorstore = PineconeVectorStore.from_texts(\n",
|
||||
" list(all_documents.values()), OpenAIEmbeddings(), index_name=\"rag-fusion\"\n",
|
||||
")"
|
||||
"vectorstore = Pinecone.from_texts(list(all_documents.values()), OpenAIEmbeddings(), index_name='rag-fusion')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -85,8 +84,9 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_openai import ChatOpenAI"
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.prompts import ChatPromptTemplate\n",
|
||||
"from langchain.schema.output_parser import StrOutputParser"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -98,7 +98,7 @@
|
||||
"source": [
|
||||
"from langchain import hub\n",
|
||||
"\n",
|
||||
"prompt = hub.pull(\"langchain-ai/rag-fusion-query-generation\")"
|
||||
"prompt = hub.pull('langchain-ai/rag-fusion-query-generation')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -122,9 +122,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"generate_queries = (\n",
|
||||
" prompt | ChatOpenAI(temperature=0) | StrOutputParser() | (lambda x: x.split(\"\\n\"))\n",
|
||||
")"
|
||||
"generate_queries = prompt | ChatOpenAI(temperature=0) | StrOutputParser() | (lambda x: x.split(\"\\n\"))"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -161,7 +159,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectorstore = PineconeVectorStore.from_existing_index(\"rag-fusion\", OpenAIEmbeddings())\n",
|
||||
"vectorstore = Pinecone.from_existing_index(\"rag-fusion\", OpenAIEmbeddings())\n",
|
||||
"retriever = vectorstore.as_retriever()"
|
||||
]
|
||||
},
|
||||
@@ -173,8 +171,6 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.load import dumps, loads\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def reciprocal_rank_fusion(results: list[list], k=60):\n",
|
||||
" fused_scores = {}\n",
|
||||
" for docs in results:\n",
|
||||
@@ -185,12 +181,9 @@
|
||||
" fused_scores[doc_str] = 0\n",
|
||||
" previous_score = fused_scores[doc_str]\n",
|
||||
" fused_scores[doc_str] += 1 / (rank + k)\n",
|
||||
"\n",
|
||||
" reranked_results = [\n",
|
||||
" (loads(doc), score)\n",
|
||||
" for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)\n",
|
||||
" ]\n",
|
||||
" return reranked_results"
|
||||
" \n",
|
||||
" reranked_results = [(loads(doc), score) for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)]\n",
|
||||
" return reranked_results "
|
||||
]
|
||||
},
|
||||
{
|
||||
|
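The `reciprocal_rank_fusion` hunks above show the function only in fragments. As a reading aid, here is a minimal, self-contained sketch of the same scoring rule, where each document receives `1 / (rank + k)` from every ranked list it appears in. It assumes plain strings stand in for LangChain documents, so the `dumps`/`loads` serialization used in the notebook is omitted; the sample `rankings` data is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(results, k=60):
    """Fuse several ranked lists into one list of (doc, score) pairs.

    Each inner list is assumed to be sorted from most to least relevant.
    Documents are plain hashable values here (an assumption of this sketch).
    """
    fused_scores = defaultdict(float)
    for docs in results:
        for rank, doc in enumerate(docs):
            # Earlier ranks contribute more; k dampens the gap between ranks.
            fused_scores[doc] += 1 / (rank + k)
    # Highest fused score first.
    return sorted(fused_scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings returned for two generated queries.
rankings = [["a", "b", "c"], ["b", "c", "a"]]
fused = reciprocal_rank_fusion(rankings)
for doc, score in fused:
    print(doc, round(score, 5))
```

Using a `defaultdict` removes the explicit "initialize to 0" branch seen in the notebook; the ordering of the fused list is otherwise determined by the same sum of reciprocal ranks.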
||||
@@ -1,591 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6195da33-34c3-4ca2-943a-050b6dcbacbc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Embedding Documents using Optimized and Quantized Embedders\n",
|
||||
"\n",
|
||||
"In this tutorial, we will demo how to build a RAG pipeline, with the embedding for all documents done using Quantized Embedders.\n",
|
||||
"\n",
|
||||
"We will use a pipeline that will:\n",
|
||||
"\n",
|
||||
"* Create a document collection.\n",
|
||||
"* Embed all documents using Quantized Embedders.\n",
|
||||
"* Fetch relevant documents for our question.\n",
|
||||
"* Run an LLM answer the question.\n",
|
||||
"\n",
|
||||
"For more information about optimized models, we refer to [optimum-intel](https://github.com/huggingface/optimum-intel.git) and [IPEX](https://github.com/intel/intel-extension-for-pytorch).\n",
|
||||
"\n",
|
||||
"This tutorial is based on the [Langchain RAG tutorial here](https://towardsai.net/p/machine-learning/dense-x-retrieval-technique-in-langchain-and-llamaindex)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "26db2da5-3733-4a90-909e-6c11508ea140",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import uuid\n",
|
||||
"from pathlib import Path\n",
|
||||
"\n",
|
||||
"import langchain\n",
|
||||
"import torch\n",
|
||||
"from bs4 import BeautifulSoup as Soup\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryByteStore, LocalFileStore\n",
|
||||
"\n",
|
||||
"# For our example, we'll load docs from the web\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter # noqa\n",
|
||||
"from langchain_community.document_loaders.recursive_url_loader import (\n",
|
||||
" RecursiveUrlLoader,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# noqa\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"\n",
|
||||
"DOCSTORE_DIR = \".\"\n",
|
||||
"DOCSTORE_ID_KEY = \"doc_id\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f5ccda4e-7af5-4355-b9c4-25547edf33f9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Lets first load up this paper, and split into text chunks of size 1000."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "5f4d8888-53a6-49f5-a198-da5c92419ca4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Loaded 1 documents\n",
|
||||
"Split into 73 documents\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Could add more parsing here, as it's very raw.\n",
|
||||
"loader = RecursiveUrlLoader(\n",
|
||||
" \"https://ar5iv.labs.arxiv.org/html/1706.03762\",\n",
|
||||
" max_depth=2,\n",
|
||||
" extractor=lambda x: Soup(x, \"html.parser\").text,\n",
|
||||
")\n",
|
||||
"data = loader.load()\n",
|
||||
"print(f\"Loaded {len(data)} documents\")\n",
|
||||
"\n",
|
||||
"# Split\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"all_splits = text_splitter.split_documents(data)\n",
|
||||
"print(f\"Split into {len(all_splits)} documents\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73e90632-2ac2-49eb-80da-ffe9ac4a278d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In order to embed our documents, we can use the ```QuantizedBiEncoderEmbeddings```, for efficient and fast embedding. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "9a68a6f6-332d-481e-bbea-ad763155ea36",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "89af89b48c55409b9999b8e0387fab5b",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"config.json: 0%| | 0.00/747 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "01ad1b6278194b53bf6a5a286a311864",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"pytorch_model.bin: 0%| | 0.00/45.9M [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "cb3bd1b88f7743c3b0322da3f021325c",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"inc_config.json: 0%| | 0.00/287 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"loading configuration file inc_config.json from cache at \n",
|
||||
"INCConfig {\n",
|
||||
" \"distillation\": {},\n",
|
||||
" \"neural_compressor_version\": \"2.4.1\",\n",
|
||||
" \"optimum_version\": \"1.16.2\",\n",
|
||||
" \"pruning\": {},\n",
|
||||
" \"quantization\": {\n",
|
||||
" \"dataset_num_samples\": 50,\n",
|
||||
" \"is_static\": true\n",
|
||||
" },\n",
|
||||
" \"save_onnx_model\": false,\n",
|
||||
" \"torch_version\": \"2.2.0\",\n",
|
||||
" \"transformers_version\": \"4.37.2\"\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "7439315ebcb746f5be11fe30bc7693f6",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"tokenizer_config.json: 0%| | 0.00/1.24k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "05265a3912254ce1ad43cc8086bcb0ca",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "a48f4245c60744f28f37cd3a7a24d198",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"tokenizer.json: 0%| | 0.00/711k [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "584a63cace934033b4ab22d3a178582a",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"special_tokens_map.json: 0%| | 0.00/125 [00:00<?, ?B/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.embeddings import QuantizedBiEncoderEmbeddings\n",
|
||||
"from langchain_core.embeddings import Embeddings\n",
|
||||
"\n",
|
||||
"model_name = \"Intel/bge-small-en-v1.5-rag-int8-static\"\n",
|
||||
"encode_kwargs = {\"normalize_embeddings\": True} # set True to compute cosine similarity\n",
|
||||
"\n",
|
||||
"model_inc = QuantizedBiEncoderEmbeddings(\n",
|
||||
" model_name=model_name,\n",
|
||||
" encode_kwargs=encode_kwargs,\n",
|
||||
" query_instruction=\"Represent this sentence for searching relevant passages: \",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "360b2837-8024-47e0-a4ba-592505a9a5c8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With our embedder in place, lets define our retriever:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "18bc0a73-1a13-4b2f-96ac-05a5313343b7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_multi_vector_retriever(\n",
|
||||
" docstore_id_key: str, collection_name: str, embedding_function: Embeddings\n",
|
||||
"):\n",
|
||||
" \"\"\"Create the composed retriever object.\"\"\"\n",
|
||||
" vectorstore = Chroma(\n",
|
||||
" collection_name=collection_name,\n",
|
||||
" embedding_function=embedding_function,\n",
|
||||
" )\n",
|
||||
" store = InMemoryByteStore()\n",
|
||||
"\n",
|
||||
" return MultiVectorRetriever(\n",
|
||||
" vectorstore=vectorstore,\n",
|
||||
" byte_store=store,\n",
|
||||
" id_key=docstore_id_key,\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"retriever = get_multi_vector_retriever(DOCSTORE_ID_KEY, \"multi_vec_store\", model_inc)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8484078e-1bf0-4080-a354-ef23823fd6dc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we divide each chunk into sub-docs:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "e12f48d4-6562-416b-8f28-342912e5756e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
|
||||
"id_key = \"doc_id\"\n",
|
||||
"doc_ids = [str(uuid.uuid4()) for _ in all_splits]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "a268ef5f-91c2-4d8e-87f0-53db376e6a29",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sub_docs = []\n",
|
||||
"for i, doc in enumerate(all_splits):\n",
|
||||
" _id = doc_ids[i]\n",
|
||||
" _sub_docs = child_text_splitter.split_documents([doc])\n",
|
||||
" for _doc in _sub_docs:\n",
|
||||
" _doc.metadata[id_key] = _id\n",
|
||||
" sub_docs.extend(_sub_docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d84ea8f4-a5de-4d76-b44d-85e56583f489",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Lets write our documents into our new store. This will use our embedder on each document."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "1af831ce-0eae-44bc-aca7-4d691063640b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Batches: 100%|██████████| 8/8 [00:00<00:00, 9.05it/s]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever.vectorstore.add_documents(sub_docs)\n",
|
||||
"retriever.docstore.mset(list(zip(doc_ids, all_splits)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "580bc212-8ecd-4d28-8656-b96fcd0d7eb6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! Our retriever is good to go. Lets load up an LLM, that will reason over the retrieved documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "008c992f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": []
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "cbe70583ad964ae19582b72dab396784",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import torch\n",
|
||||
"from langchain.llms.huggingface_pipeline import HuggingFacePipeline\n",
|
||||
"from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
|
||||
"\n",
|
||||
"model_id = \"Intel/neural-chat-7b-v3-3\"\n",
|
||||
"tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
|
||||
"model = AutoModelForCausalLM.from_pretrained(\n",
|
||||
" model_id, device_map=\"auto\", torch_dtype=torch.bfloat16\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer, max_new_tokens=100)\n",
|
||||
"\n",
|
||||
"hf = HuggingFacePipeline(pipeline=pipe)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6dd21fb2-0442-477d-aae2-9e7ee1d1d778",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we will load up a prompt for answering questions using retrieved documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "5e582509-caaf-4920-932c-4ce16162c789",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import hub\n",
|
||||
"\n",
|
||||
"prompt = hub.pull(\"rlm/rag-prompt\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5cdfcba5-7ec7-4d0a-820e-4e200643a882",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can now build our pipeline:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "b74d8dfb-72bb-46da-9df9-0dc47a3ac791",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema.runnable import RunnablePassthrough\n",
|
||||
"\n",
|
||||
"rag_chain = {\"context\": retriever, \"question\": RunnablePassthrough()} | prompt | hf"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3bc53602-86d6-420f-91b1-fc2effa7e986",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Excellent! lets ask it a question.\n",
|
||||
"We will also use a verbose and debug, to check which documents were used by the model to produce the answer."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "f0a92c07-53da-4e1f-b880-ee83a36ee17d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence] Entering Chain run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"input\": \"What is the first transduction model relying entirely on self-attention?\"\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question>] Entering Chain run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"input\": \"What is the first transduction model relying entirely on self-attention?\"\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question> > 4:chain:RunnablePassthrough] Entering Chain run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"input\": \"What is the first transduction model relying entirely on self-attention?\"\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question> > 4:chain:RunnablePassthrough] [1ms] Exiting Chain run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"output\": \"What is the first transduction model relying entirely on self-attention?\"\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question>] [66ms] Exiting Chain run with output:\n",
|
||||
"\u001b[0m[outputs]\n",
|
||||
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 5:prompt:ChatPromptTemplate] Entering Prompt run with input:\n",
|
||||
"\u001b[0m[inputs]\n",
|
||||
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 5:prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"prompts\",\n",
|
||||
" \"chat\",\n",
|
||||
" \"ChatPromptValue\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"messages\": [\n",
|
||||
" {\n",
|
||||
" \"lc\": 1,\n",
|
||||
" \"type\": \"constructor\",\n",
|
||||
" \"id\": [\n",
|
||||
" \"langchain\",\n",
|
||||
" \"schema\",\n",
|
||||
" \"messages\",\n",
|
||||
" \"HumanMessage\"\n",
|
||||
" ],\n",
|
||||
" \"kwargs\": {\n",
|
||||
" \"content\": \"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: What is the first transduction model relying entirely on self-attention? \\nContext: [Document(page_content='To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.\\\\nIn the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as (neural_gpu, ; NalBytenet2017, ) and (JonasFaceNet2017, ).\\\\n\\\\n\\\\n\\\\n\\\\n3 Model Architecture\\\\n\\\\nFigure 1: The Transformer - model architecture.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.\\\\n\\\\n\\\\nFor translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles. \\\\n\\\\n\\\\nWe are excited about the future of attention-based models and plan to apply them to other tasks. 
We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video.\\\\nMaking generation less sequential is another research goals of ours.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences (bahdanau2014neural, ; structuredAttentionNetworks, ). In all but a few cases (decomposableAttnModel, ), however, such attention mechanisms are used in conjunction with a recurrent network.\\\\n\\\\n\\\\nIn this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n2 Background', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. 
Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'})] \\nAnswer:\",\n",
|
||||
" \"additional_kwargs\": {}\n",
|
||||
" }\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 6:llm:HuggingFacePipeline] Entering LLM run with input:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"prompts\": [\n",
|
||||
" \"Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: What is the first transduction model relying entirely on self-attention? \\nContext: [Document(page_content='To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.\\\\nIn the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as (neural_gpu, ; NalBytenet2017, ) and (JonasFaceNet2017, ).\\\\n\\\\n\\\\n\\\\n\\\\n3 Model Architecture\\\\n\\\\nFigure 1: The Transformer - model architecture.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.\\\\n\\\\n\\\\nFor translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles. \\\\n\\\\n\\\\nWe are excited about the future of attention-based models and plan to apply them to other tasks. 
We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video.\\\\nMaking generation less sequential is another research goals of ours.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences (bahdanau2014neural, ; structuredAttentionNetworks, ). In all but a few cases (decomposableAttnModel, ), however, such attention mechanisms are used in conjunction with a recurrent network.\\\\n\\\\n\\\\nIn this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n2 Background', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. 
Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'})] \\nAnswer:\"\n",
|
||||
" ]\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 6:llm:HuggingFacePipeline] [4.34s] Exiting LLM run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"generations\": [\n",
|
||||
" [\n",
|
||||
" {\n",
|
||||
" \"text\": \" The first transduction model relying entirely on self-attention is the Transformer.\",\n",
|
||||
" \"generation_info\": null,\n",
|
||||
" \"type\": \"Generation\"\n",
|
||||
" }\n",
|
||||
" ]\n",
|
||||
" ],\n",
|
||||
" \"llm_output\": null,\n",
|
||||
" \"run\": null\n",
|
||||
"}\n",
|
||||
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence] [4.41s] Exiting Chain run with output:\n",
|
||||
"\u001b[0m{\n",
|
||||
" \"output\": \" The first transduction model relying entirely on self-attention is the Transformer.\"\n",
|
||||
"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"langchain.verbose = True\n",
|
||||
"langchain.debug = True\n",
|
||||
"\n",
|
||||
"llm_res = rag_chain.invoke(\n",
|
||||
" \"What is the first transduction model relying entirely on self-attention?\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "023404a1-401a-46e1-8ab5-cafbc8593b04",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' The first transduction model relying entirely on self-attention is the Transformer.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm_res"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0eaefd01-254a-445d-a95f-37889c126e0e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
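"Based on the retrieved documents, the answer is correct: the first retrieved passage describes the Transformer as the first transduction model relying entirely on self-attention."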
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.18"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||