mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-05 08:40:36 +00:00
Compare commits
8 Commits
cc/bench
...
eugene/why
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
14347acbce | ||
|
|
3ece5497ac | ||
|
|
1b053e961f | ||
|
|
54d5b74b00 | ||
|
|
15d49d3df2 | ||
|
|
2b38a4ee55 | ||
|
|
e8ce5cde99 | ||
|
|
a7aad27cba |
@@ -5,31 +5,26 @@ This project includes a [dev container](https://containers.dev/), which lets you
|
||||
You can use the dev container configuration in this folder to build and run the app without needing to install any of its tools locally! You can use it in [GitHub Codespaces](https://github.com/features/codespaces) or the [VS Code Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).
|
||||
|
||||
## GitHub Codespaces
|
||||
|
||||
[](https://codespaces.new/langchain-ai/langchain)
|
||||
|
||||
You may use the button above, or follow these steps to open this repo in a Codespace:
|
||||
|
||||
1. Click the **Code** drop-down menu at the top of <https://github.com/langchain-ai/langchain>.
|
||||
1. Click the **Code** drop-down menu at the top of https://github.com/langchain-ai/langchain.
|
||||
1. Click on the **Codespaces** tab.
|
||||
1. Click **Create codespace on master**.
|
||||
|
||||
For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).
|
||||
|
||||
|
||||
## VS Code Dev Containers
|
||||
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
|
||||
|
||||
> [!NOTE]
|
||||
> If you click the link above you will open the main repo (`langchain-ai/langchain`) and *not* your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
|
||||
|
||||
```txt
|
||||
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<YOUR_USERNAME>/<YOUR_CLONED_REPO_NAME>
|
||||
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
|
||||
```
|
||||
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
|
||||
|
||||
```
|
||||
Then you will have a local cloned repo where you can contribute and then create pull requests.
|
||||
|
||||
If you already have VS Code and Docker installed, you can use the button above to get started. This will use VSCode to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
|
||||
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
|
||||
|
||||
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
|
||||
|
||||
@@ -45,5 +40,5 @@ You can learn more in the [Dev Containers documentation](https://code.visualstud
|
||||
|
||||
## Tips and tricks
|
||||
|
||||
- If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
|
||||
- If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
|
||||
* If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
|
||||
* If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
|
||||
|
||||
@@ -1,58 +1,36 @@
|
||||
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
|
||||
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-docker-compose
|
||||
{
|
||||
// Name for the dev container
|
||||
"name": "langchain",
|
||||
// Point to a Docker Compose file
|
||||
"dockerComposeFile": "./docker-compose.yaml",
|
||||
// Required when using Docker Compose. The name of the service to connect to once running
|
||||
"service": "langchain",
|
||||
// The optional 'workspaceFolder' property is the path VS Code should open by default when
|
||||
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
|
||||
"workspaceFolder": "/workspaces/langchain",
|
||||
"mounts": [
|
||||
"source=langchain-workspaces,target=/workspaces/langchain,type=volume"
|
||||
],
|
||||
// Prevent the container from shutting down
|
||||
"overrideCommand": true,
|
||||
// Features to add to the dev container. More info: https://containers.dev/features
|
||||
"features": {
|
||||
"ghcr.io/devcontainers/features/git:1": {},
|
||||
"ghcr.io/devcontainers/features/github-cli:1": {}
|
||||
},
|
||||
"containerEnv": {
|
||||
"UV_LINK_MODE": "copy"
|
||||
},
|
||||
// Use 'forwardPorts' to make a list of ports inside the container available locally.
|
||||
// "forwardPorts": [],
|
||||
// Run commands after the container is created
|
||||
"postCreateCommand": "uv sync && echo 'LangChain (Python) dev environment ready!'",
|
||||
// Configure tool-specific properties.
|
||||
"customizations": {
|
||||
"vscode": {
|
||||
"extensions": [
|
||||
"ms-python.python",
|
||||
"ms-python.debugpy",
|
||||
"ms-python.mypy-type-checker",
|
||||
"ms-python.isort",
|
||||
"unifiedjs.vscode-mdx",
|
||||
"davidanson.vscode-markdownlint",
|
||||
"ms-toolsai.jupyter",
|
||||
"GitHub.copilot",
|
||||
"GitHub.copilot-chat"
|
||||
],
|
||||
"settings": {
|
||||
"python.defaultInterpreterPath": ".venv/bin/python",
|
||||
"python.formatting.provider": "none",
|
||||
"[python]": {
|
||||
"editor.formatOnSave": true,
|
||||
"editor.codeActionsOnSave": {
|
||||
"source.organizeImports": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
|
||||
// "remoteUser": "root"
|
||||
// Name for the dev container
|
||||
"name": "langchain",
|
||||
|
||||
// Point to a Docker Compose file
|
||||
"dockerComposeFile": "./docker-compose.yaml",
|
||||
|
||||
// Required when using Docker Compose. The name of the service to connect to once running
|
||||
"service": "langchain",
|
||||
|
||||
// The optional 'workspaceFolder' property is the path VS Code should open by default when
|
||||
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
|
||||
"workspaceFolder": "/workspaces/langchain",
|
||||
|
||||
// Prevent the container from shutting down
|
||||
"overrideCommand": true
|
||||
|
||||
// Features to add to the dev container. More info: https://containers.dev/features
|
||||
// "features": {
|
||||
// "ghcr.io/devcontainers-contrib/features/poetry:2": {}
|
||||
// }
|
||||
|
||||
// Use 'forwardPorts' to make a list of ports inside the container available locally.
|
||||
// "forwardPorts": [],
|
||||
|
||||
// Uncomment the next line to run commands after the container is created.
|
||||
// "postCreateCommand": "cat /etc/os-release",
|
||||
|
||||
// Configure tool-specific properties.
|
||||
// "customizations": {},
|
||||
|
||||
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
|
||||
// "remoteUser": "root"
|
||||
}
|
||||
|
||||
@@ -4,9 +4,26 @@ services:
|
||||
build:
|
||||
dockerfile: libs/langchain/dev.Dockerfile
|
||||
context: ..
|
||||
|
||||
volumes:
|
||||
# Update this to wherever you want VS Code to mount the folder of your project
|
||||
- ..:/workspaces/langchain:cached
|
||||
networks:
|
||||
- langchain-network
|
||||
# environment:
|
||||
# MONGO_ROOT_USERNAME: root
|
||||
# MONGO_ROOT_PASSWORD: example123
|
||||
# depends_on:
|
||||
# - mongo
|
||||
# mongo:
|
||||
# image: mongo
|
||||
# restart: unless-stopped
|
||||
# environment:
|
||||
# MONGO_INITDB_ROOT_USERNAME: root
|
||||
# MONGO_INITDB_ROOT_PASSWORD: example123
|
||||
# ports:
|
||||
# - "27017:27017"
|
||||
# networks:
|
||||
# - langchain-network
|
||||
|
||||
networks:
|
||||
langchain-network:
|
||||
|
||||
@@ -1,52 +0,0 @@
|
||||
# top-most EditorConfig file
|
||||
root = true
|
||||
|
||||
# All files
|
||||
[*]
|
||||
charset = utf-8
|
||||
end_of_line = lf
|
||||
insert_final_newline = true
|
||||
trim_trailing_whitespace = true
|
||||
|
||||
# Python files
|
||||
[*.py]
|
||||
indent_style = space
|
||||
indent_size = 4
|
||||
max_line_length = 88
|
||||
|
||||
# JSON files
|
||||
[*.json]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
|
||||
# YAML files
|
||||
[*.{yml,yaml}]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
|
||||
# Markdown files
|
||||
[*.md]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
trim_trailing_whitespace = false
|
||||
|
||||
# Configuration files
|
||||
[*.{toml,ini,cfg}]
|
||||
indent_style = space
|
||||
indent_size = 4
|
||||
|
||||
# Shell scripts
|
||||
[*.sh]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
|
||||
# Makefile
|
||||
[Makefile]
|
||||
indent_style = tab
|
||||
indent_size = 4
|
||||
|
||||
# Jupyter notebooks
|
||||
[*.ipynb]
|
||||
# Jupyter may include trailing whitespace in cell
|
||||
# outputs that's semantically meaningful
|
||||
trim_trailing_whitespace = false
|
||||
3
.github/CODEOWNERS
vendored
3
.github/CODEOWNERS
vendored
@@ -1,3 +0,0 @@
|
||||
/.github/ @baskaryan @ccurme @eyurtsev
|
||||
/libs/core/ @eyurtsev
|
||||
/libs/packages.yml @ccurme
|
||||
2
.github/CODE_OF_CONDUCT.md
vendored
2
.github/CODE_OF_CONDUCT.md
vendored
@@ -129,4 +129,4 @@ For answers to common questions about this code of conduct, see the FAQ at
|
||||
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
|
||||
[Mozilla CoC]: https://github.com/mozilla/diversity
|
||||
[FAQ]: https://www.contributor-covenant.org/faq
|
||||
[translations]: https://www.contributor-covenant.org/translations
|
||||
[translations]: https://www.contributor-covenant.org/translations
|
||||
6
.github/CONTRIBUTING.md
vendored
6
.github/CONTRIBUTING.md
vendored
@@ -3,8 +3,4 @@
|
||||
Hi there! Thank you for even being interested in contributing to LangChain.
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
|
||||
|
||||
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
|
||||
|
||||
## New features
|
||||
|
||||
For new features, please start a new [discussion](https://forum.langchain.com/), where the maintainers will help with scoping out the necessary changes.
|
||||
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
|
||||
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
Normal file
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
Normal file
@@ -0,0 +1,38 @@
|
||||
labels: [idea]
|
||||
body:
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I searched existing ideas and did not find a similar one
|
||||
required: true
|
||||
- label: I added a very descriptive title
|
||||
required: true
|
||||
- label: I've clearly described the feature request and motivation for it
|
||||
required: true
|
||||
- type: textarea
|
||||
id: feature-request
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Feature request
|
||||
description: |
|
||||
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
|
||||
- type: textarea
|
||||
id: motivation
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Motivation
|
||||
description: |
|
||||
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
|
||||
- type: textarea
|
||||
id: proposal
|
||||
validations:
|
||||
required: false
|
||||
attributes:
|
||||
label: Proposal (If applicable)
|
||||
description: |
|
||||
If you would like to propose a solution, please describe it here.
|
||||
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
Normal file
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
Normal file
@@ -0,0 +1,122 @@
|
||||
labels: [Question]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain 🦜️🔗!
|
||||
|
||||
Please follow these instructions, fill every question, and do every step. 🙏
|
||||
|
||||
We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
|
||||
this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
|
||||
|
||||
By asking questions in a structured way (following this) it will be much easier for us to help you.
|
||||
|
||||
There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎
|
||||
|
||||
As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones.
|
||||
|
||||
That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓
|
||||
|
||||
Relevant links to check before opening a question to see if your question has already been answered, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this question.
|
||||
required: true
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
- type: checkboxes
|
||||
id: help
|
||||
attributes:
|
||||
label: Commit to Help
|
||||
description: |
|
||||
After submitting this, I commit to one of:
|
||||
|
||||
* Read open questions until I find 2 where I can help someone and add a comment to help there.
|
||||
* I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
|
||||
* Once my question is answered, I will mark the answer as "accepted".
|
||||
options:
|
||||
- label: I commit to help with one of those options 👆
|
||||
required: true
|
||||
- type: textarea
|
||||
id: example
|
||||
attributes:
|
||||
label: Example Code
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
render: python
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: description
|
||||
attributes:
|
||||
label: Description
|
||||
description: |
|
||||
What is the problem, question, or error?
|
||||
|
||||
Write a short description explaining what you are doing, what you expect to happen, and what is currently happening.
|
||||
placeholder: |
|
||||
* I'm trying to use the `langchain` library to do X.
|
||||
* I expect to see Y.
|
||||
* Instead, it does Z.
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages, don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
72
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
72
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
@@ -1,33 +1,35 @@
|
||||
name: "\U0001F41B Bug Report"
|
||||
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the LangChain forum.
|
||||
labels: ["bug"]
|
||||
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions.
|
||||
labels: ["02 Bug Report"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thank you for taking the time to file a bug report.
|
||||
|
||||
Use this to report BUGS in LangChain. For usage questions, feature requests and general design questions, please use the [LangChain Forum](https://forum.langchain.com/).
|
||||
|
||||
value: >
|
||||
Thank you for taking the time to file a bug report.
|
||||
|
||||
Use this to report bugs in LangChain.
|
||||
|
||||
If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions)
|
||||
to ask for help with your issue.
|
||||
|
||||
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
* [LangChain Forum](https://forum.langchain.com/),
|
||||
* [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
* [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
* [LangChain how-to guides](https://python.langchain.com/docs/how_to/),
|
||||
* [API Reference](https://python.langchain.com/api_reference/),
|
||||
* [LangChain ChatBot](https://chat.langchain.com/)
|
||||
* [GitHub search](https://github.com/langchain-ai/langchain),
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I added a clear and descriptive title that summarizes this issue.
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
@@ -35,10 +37,6 @@ body:
|
||||
required: true
|
||||
- label: The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
|
||||
required: true
|
||||
- label: I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
|
||||
required: true
|
||||
- label: I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
|
||||
required: true
|
||||
- type: textarea
|
||||
id: reproduction
|
||||
validations:
|
||||
@@ -47,25 +45,25 @@ body:
|
||||
label: Example Code
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
The following code:
|
||||
|
||||
The following code:
|
||||
|
||||
```python
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
```
|
||||
@@ -101,18 +99,16 @@ body:
|
||||
Please share your system info with us. Do NOT skip this step and please don't trim
|
||||
the output. Most users don't include enough information here and it makes it harder
|
||||
for us to help you.
|
||||
|
||||
|
||||
Run the following command in your terminal and paste the output here:
|
||||
|
||||
`python -m langchain_core.sys_info`
|
||||
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
or if you have an existing python interpreter running:
|
||||
|
||||
```python
|
||||
|
||||
from langchain_core import sys_info
|
||||
sys_info.print_sys_info()
|
||||
```
|
||||
|
||||
|
||||
alternatively, put the entire output of `pip freeze` here.
|
||||
placeholder: |
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
12
.github/ISSUE_TEMPLATE/config.yml
vendored
12
.github/ISSUE_TEMPLATE/config.yml
vendored
@@ -1,6 +1,12 @@
|
||||
blank_issues_enabled: false
|
||||
version: 2.1
|
||||
contact_links:
|
||||
- name: LangChain Forum
|
||||
url: https://forum.langchain.com/
|
||||
about: General community discussions, support, and feature requests
|
||||
- name: 🤔 Question or Problem
|
||||
about: Ask a question or ask about a problem in GitHub Discussions.
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/q-a
|
||||
- name: Feature Request
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/ideas
|
||||
about: Suggest a feature or an idea
|
||||
- name: Show and tell
|
||||
about: Show what you built with LangChain
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/show-and-tell
|
||||
|
||||
109
.github/ISSUE_TEMPLATE/documentation.yml
vendored
109
.github/ISSUE_TEMPLATE/documentation.yml
vendored
@@ -1,59 +1,58 @@
|
||||
name: Documentation
|
||||
description: Report an issue related to the LangChain documentation.
|
||||
title: "docs: <Please write a comprehensive title after the 'docs: ' prefix>"
|
||||
labels: [documentation]
|
||||
title: "DOC: <Please write a comprehensive title after the 'DOC: ' prefix>"
|
||||
labels: [03 - Documentation]
|
||||
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thank you for taking the time to report an issue in the documentation.
|
||||
|
||||
Only report issues with documentation here, explain if there are
|
||||
any missing topics or if you found a mistake in the documentation.
|
||||
|
||||
Do **NOT** use this to ask usage questions or reporting issues with your code.
|
||||
|
||||
If you have usage questions or need help solving some problem,
|
||||
please use the [LangChain Forum](https://forum.langchain.com/).
|
||||
|
||||
If you're in the wrong place, here are some helpful links to find a better
|
||||
place to ask your question:
|
||||
|
||||
* [LangChain Forum](https://forum.langchain.com/),
|
||||
* [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
* [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
* [LangChain how-to guides](https://python.langchain.com/docs/how_to/),
|
||||
* [API Reference](https://python.langchain.com/api_reference/),
|
||||
* [LangChain ChatBot](https://chat.langchain.com/)
|
||||
* [GitHub search](https://github.com/langchain-ai/langchain),
|
||||
- type: input
|
||||
id: url
|
||||
attributes:
|
||||
label: URL
|
||||
description: URL to documentation
|
||||
validations:
|
||||
required: false
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checklist
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I included a link to the documentation page I am referring to (if applicable).
|
||||
required: true
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue with current documentation:"
|
||||
description: >
|
||||
Please make sure to leave a reference to the document/code you're
|
||||
referring to. Feel free to include names of classes, functions, methods
|
||||
or concepts you'd like to see documented more.
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Idea or request for content:"
|
||||
description: >
|
||||
Please describe as clearly as possible what topics you think are missing
|
||||
from the current documentation.
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: >
|
||||
Thank you for taking the time to report an issue in the documentation.
|
||||
|
||||
Only report issues with documentation here, explain if there are
|
||||
any missing topics or if you found a mistake in the documentation.
|
||||
|
||||
Do **NOT** use this to ask usage questions or reporting issues with your code.
|
||||
|
||||
If you have usage questions or need help solving some problem,
|
||||
please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions).
|
||||
|
||||
If you're in the wrong place, here are some helpful links to find a better
|
||||
place to ask your question:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: input
|
||||
id: url
|
||||
attributes:
|
||||
label: URL
|
||||
description: URL to documentation
|
||||
validations:
|
||||
required: false
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checklist
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I included a link to the documentation page I am referring to (if applicable).
|
||||
required: true
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue with current documentation:"
|
||||
description: >
|
||||
Please make sure to leave a reference to the document/code you're
|
||||
referring to. Feel free to include names of classes, functions, methods
|
||||
or concepts you'd like to see documented more.
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Idea or request for content:"
|
||||
description: >
|
||||
Please describe as clearly as possible what topics you think are missing
|
||||
from the current documentation.
|
||||
|
||||
8
.github/ISSUE_TEMPLATE/privileged.yml
vendored
8
.github/ISSUE_TEMPLATE/privileged.yml
vendored
@@ -5,10 +5,10 @@ body:
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain! 🚀
|
||||
|
||||
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation on the [LangChain Forum](https://forum.langchain.com/) instead.
|
||||
|
||||
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
|
||||
|
||||
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
|
||||
|
||||
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
|
||||
or are a regular contributor to LangChain with previous merged pull requests.
|
||||
- type: checkboxes
|
||||
id: privileged
|
||||
|
||||
41
.github/PULL_REQUEST_TEMPLATE.md
vendored
41
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -1,32 +1,29 @@
|
||||
Thank you for contributing to LangChain! Follow these steps to mark your pull request as ready for review. **If any of these steps are not completed, your PR will not be considered for review.**
|
||||
Thank you for contributing to LangChain!
|
||||
|
||||
- [ ] **PR title**: "package: description"
|
||||
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
|
||||
- Example: "community: add foobar LLM"
|
||||
|
||||
- [ ] **PR title**: Follows the format: {TYPE}({SCOPE}): {DESCRIPTION}
|
||||
- Examples:
|
||||
- feat(core): add multi-tenant support
|
||||
- fix(cli): resolve flag parsing error
|
||||
- docs(openai): update API usage examples
|
||||
- Allowed `{TYPE}` values:
|
||||
- feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, release
|
||||
- Allowed `{SCOPE}` values (optional):
|
||||
- core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek, exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai, perplexity, prompty, qdrant, xai
|
||||
- Note: the `{DESCRIPTION}` must not start with an uppercase letter.
|
||||
- Once you've written the title, please delete this checklist item; do not include it in the PR.
|
||||
|
||||
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
|
||||
- **Description:** a description of the change. Include a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) if applicable to a relevant issue.
|
||||
- **Issue:** the issue # it fixes, if applicable (e.g. Fixes #123)
|
||||
- **Dependencies:** any dependencies required for this change
|
||||
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
- **Description:** a description of the change
|
||||
- **Issue:** the issue # it fixes, if applicable
|
||||
- **Dependencies:** any dependencies required for this change
|
||||
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
|
||||
- [ ] **Add tests and docs**: If you're adding a new integration, you must include:
|
||||
1. A test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. An example notebook showing its use. It lives in `docs/docs/integrations` directory.
|
||||
|
||||
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. **We will not consider a PR unless these three are passing in CI.** See [contribution guidelines](https://python.langchain.com/docs/contributing/) for more.
|
||||
- [ ] **Add tests and docs**: If you're adding a new integration, please include
|
||||
1. a test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
|
||||
|
||||
|
||||
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
|
||||
|
||||
Additional guidelines:
|
||||
|
||||
- Make sure optional dependencies are imported within a function.
|
||||
- Please do not add dependencies to `pyproject.toml` files (even optional ones) unless they are **required** for unit tests.
|
||||
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
|
||||
- Most PRs should not touch more than one package.
|
||||
- Changes should be backwards compatible.
|
||||
- If you are adding something to community, do not re-import it in langchain.
|
||||
|
||||
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
|
||||
|
||||
2
.github/actions/people/Dockerfile
vendored
2
.github/actions/people/Dockerfile
vendored
@@ -4,4 +4,4 @@ RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.
|
||||
|
||||
COPY ./app /app
|
||||
|
||||
CMD ["python", "/app/main.py"]
|
||||
CMD ["python", "/app/main.py"]
|
||||
6
.github/actions/people/action.yml
vendored
6
.github/actions/people/action.yml
vendored
@@ -4,8 +4,8 @@ description: "Generate the data for the LangChain People page"
|
||||
author: "Jacob Lee <jacob@langchain.dev>"
|
||||
inputs:
|
||||
token:
|
||||
description: "User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}"
|
||||
description: 'User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}'
|
||||
required: true
|
||||
runs:
|
||||
using: "docker"
|
||||
image: "Dockerfile"
|
||||
using: 'docker'
|
||||
image: 'Dockerfile'
|
||||
21
.github/actions/uv_setup/action.yml
vendored
21
.github/actions/uv_setup/action.yml
vendored
@@ -1,21 +0,0 @@
|
||||
# TODO: https://docs.astral.sh/uv/guides/integration/github/#caching
|
||||
|
||||
name: uv-install
|
||||
description: Set up Python and uv
|
||||
|
||||
inputs:
|
||||
python-version:
|
||||
description: Python version, supporting MAJOR.MINOR only
|
||||
required: true
|
||||
|
||||
env:
|
||||
UV_VERSION: "0.5.25"
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
- name: Install uv and set the python version
|
||||
uses: astral-sh/setup-uv@v5
|
||||
with:
|
||||
version: ${{ env.UV_VERSION }}
|
||||
python-version: ${{ inputs.python-version }}
|
||||
325
.github/copilot-instructions.md
vendored
325
.github/copilot-instructions.md
vendored
@@ -1,325 +0,0 @@
|
||||
# Global Development Guidelines for LangChain Projects
|
||||
|
||||
## Core Development Principles
|
||||
|
||||
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
|
||||
|
||||
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
|
||||
|
||||
❌ **Bad - Breaking Change:**
|
||||
|
||||
```python
|
||||
def get_user(id, verbose=False): # Changed from `user_id`
|
||||
pass
|
||||
```
|
||||
|
||||
✅ **Good - Stable Interface:**
|
||||
|
||||
```python
|
||||
def get_user(user_id: str, verbose: bool = False) -> User:
|
||||
"""Retrieve user by ID with optional verbose output."""
|
||||
pass
|
||||
```
|
||||
|
||||
**Before making ANY changes to public APIs:**
|
||||
|
||||
- Check if the function/class is exported in `__init__.py`
|
||||
- Look for existing usage patterns in tests and examples
|
||||
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
|
||||
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
|
||||
|
||||
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
|
||||
|
||||
### 2. Code Quality Standards
|
||||
|
||||
**All Python code MUST include type hints and return types.**
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def p(u, d):
|
||||
return [x for x in u if x not in d]
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
|
||||
"""Filter out users that are not in the known users set.
|
||||
|
||||
Args:
|
||||
users: List of user identifiers to filter.
|
||||
known_users: Set of known/valid user identifiers.
|
||||
|
||||
Returns:
|
||||
List of users that are not in the known_users set.
|
||||
"""
|
||||
return [user for user in users if user not in known_users]
|
||||
```
|
||||
|
||||
**Style Requirements:**
|
||||
|
||||
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
|
||||
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
|
||||
- Avoid unnecessary abstraction or premature optimization
|
||||
- Follow existing patterns in the codebase you're modifying
|
||||
|
||||
### 3. Testing Requirements
|
||||
|
||||
**Every new feature or bugfix MUST be covered by unit tests.**
|
||||
|
||||
**Test Organization:**
|
||||
|
||||
- Unit tests: `tests/unit_tests/` (no network calls allowed)
|
||||
- Integration tests: `tests/integration_tests/` (network calls permitted)
|
||||
- Use `pytest` as the testing framework
|
||||
|
||||
**Test Quality Checklist:**
|
||||
|
||||
- [ ] Tests fail when your new logic is broken
|
||||
- [ ] Happy path is covered
|
||||
- [ ] Edge cases and error conditions are tested
|
||||
- [ ] Use fixtures/mocks for external dependencies
|
||||
- [ ] Tests are deterministic (no flaky tests)
|
||||
|
||||
Checklist questions:
|
||||
|
||||
- [ ] Does the test suite fail if your new logic is broken?
|
||||
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
|
||||
- [ ] Do tests use fixtures or mocks where needed?
|
||||
|
||||
```python
|
||||
def test_filter_unknown_users():
|
||||
"""Test filtering unknown users from a list."""
|
||||
users = ["alice", "bob", "charlie"]
|
||||
known_users = {"alice", "bob"}
|
||||
|
||||
result = filter_unknown_users(users, known_users)
|
||||
|
||||
assert result == ["charlie"]
|
||||
assert len(result) == 1
|
||||
```
|
||||
|
||||
### 4. Security and Risk Assessment
|
||||
|
||||
**Security Checklist:**
|
||||
|
||||
- No `eval()`, `exec()`, or `pickle` on user-controlled input
|
||||
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
|
||||
- Remove unreachable/commented code before committing
|
||||
- Race conditions or resource leaks (file handles, sockets, threads).
|
||||
- Ensure proper resource cleanup (file handles, connections)
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def load_config(path):
|
||||
with open(path) as f:
|
||||
return eval(f.read()) # ⚠️ Never eval config
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
import json
|
||||
|
||||
def load_config(path: str) -> dict:
|
||||
with open(path) as f:
|
||||
return json.load(f)
|
||||
```
|
||||
|
||||
### 5. Documentation Standards
|
||||
|
||||
**Use Google-style docstrings with Args section for all public functions.**
|
||||
|
||||
❌ **Insufficient Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to, msg):
|
||||
"""Send an email to a recipient."""
|
||||
```
|
||||
|
||||
✅ **Complete Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
|
||||
"""
|
||||
Send an email to a recipient with specified priority.
|
||||
|
||||
Args:
|
||||
to: The email address of the recipient.
|
||||
msg: The message body to send.
|
||||
priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).
|
||||
|
||||
Returns:
|
||||
True if email was sent successfully, False otherwise.
|
||||
|
||||
Raises:
|
||||
InvalidEmailError: If the email address format is invalid.
|
||||
SMTPConnectionError: If unable to connect to email server.
|
||||
"""
|
||||
```
|
||||
|
||||
**Documentation Guidelines:**
|
||||
|
||||
- Types go in function signatures, NOT in docstrings
|
||||
- Focus on "why" rather than "what" in descriptions
|
||||
- Document all parameters, return values, and exceptions
|
||||
- Keep descriptions concise but clear
|
||||
- Use reStructuredText for docstrings to enable rich formatting
|
||||
|
||||
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
|
||||
|
||||
### 6. Architectural Improvements
|
||||
|
||||
**When you encounter code that could be improved, suggest better designs:**
|
||||
|
||||
❌ **Poor Design:**
|
||||
|
||||
```python
|
||||
def process_data(data, db_conn, email_client, logger):
|
||||
# Function doing too many things
|
||||
validated = validate_data(data)
|
||||
result = db_conn.save(validated)
|
||||
email_client.send_notification(result)
|
||||
logger.log(f"Processed {len(data)} items")
|
||||
return result
|
||||
```
|
||||
|
||||
✅ **Better Design:**
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ProcessingResult:
|
||||
"""Result of data processing operation."""
|
||||
items_processed: int
|
||||
success: bool
|
||||
errors: List[str] = field(default_factory=list)
|
||||
|
||||
class DataProcessor:
|
||||
"""Handles data validation, storage, and notification."""
|
||||
|
||||
def __init__(self, db_conn: Database, email_client: EmailClient):
|
||||
self.db = db_conn
|
||||
self.email = email_client
|
||||
|
||||
def process(self, data: List[dict]) -> ProcessingResult:
|
||||
"""Process and store data with notifications."""
|
||||
validated = self._validate_data(data)
|
||||
result = self.db.save(validated)
|
||||
self._notify_completion(result)
|
||||
return result
|
||||
```
|
||||
|
||||
**Design Improvement Areas:**
|
||||
|
||||
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
|
||||
|
||||
- Reduce code duplication through shared utilities
|
||||
- Make unit testing easier
|
||||
- Improve separation of concerns (single responsibility)
|
||||
- Make unit testing easier through dependency injection
|
||||
- Add clarity without adding complexity
|
||||
- Prefer dataclasses for structured data
|
||||
|
||||
## Development Tools & Commands
|
||||
|
||||
### Package Management
|
||||
|
||||
```bash
|
||||
# Add package
|
||||
uv add package-name
|
||||
|
||||
# Sync project dependencies
|
||||
uv sync
|
||||
uv lock
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
```bash
|
||||
# Run unit tests (no network)
|
||||
make test
|
||||
|
||||
# Don't run integration tests, as API keys must be set
|
||||
|
||||
# Run specific test file
|
||||
uv run --group test pytest tests/unit_tests/test_specific.py
|
||||
```
|
||||
|
||||
### Code Quality
|
||||
|
||||
```bash
|
||||
# Lint code
|
||||
make lint
|
||||
|
||||
# Format code
|
||||
make format
|
||||
|
||||
# Type checking
|
||||
uv run --group lint mypy .
|
||||
```
|
||||
|
||||
### Dependency Management Patterns
|
||||
|
||||
**Local Development Dependencies:**
|
||||
|
||||
```toml
|
||||
[tool.uv.sources]
|
||||
langchain-core = { path = "../core", editable = true }
|
||||
langchain-tests = { path = "../standard-tests", editable = true }
|
||||
```
|
||||
|
||||
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search_database(query: str) -> str:
|
||||
"""Search the database for relevant information.
|
||||
|
||||
Args:
|
||||
query: The search query string.
|
||||
"""
|
||||
# Implementation here
|
||||
return results
|
||||
```
|
||||
|
||||
## Commit Standards
|
||||
|
||||
**Use Conventional Commits format for PR titles:**
|
||||
|
||||
- `feat(core): add multi-tenant support`
|
||||
- `fix(cli): resolve flag parsing error`
|
||||
- `docs: update API usage examples`
|
||||
- `docs(openai): update API usage examples`
|
||||
|
||||
## Framework-Specific Guidelines
|
||||
|
||||
- Follow the existing patterns in `langchain-core` for base abstractions
|
||||
- Use `langchain_core.callbacks` for execution tracking
|
||||
- Implement proper streaming support where applicable
|
||||
- Avoid deprecated components like legacy `LLMChain`
|
||||
|
||||
### Partner Integrations
|
||||
|
||||
- Follow the established patterns in existing partner libraries
|
||||
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
|
||||
- Include comprehensive integration tests
|
||||
- Document API key requirements and authentication
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Checklist
|
||||
|
||||
Before submitting code changes:
|
||||
|
||||
- [ ] **Breaking Changes**: Verified no public API changes
|
||||
- [ ] **Type Hints**: All functions have complete type annotations
|
||||
- [ ] **Tests**: New functionality is fully tested
|
||||
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
|
||||
- [ ] **Documentation**: Google-style docstrings for public functions
|
||||
- [ ] **Code Quality**: `make lint` and `make format` pass
|
||||
- [ ] **Architecture**: Suggested improvements where applicable
|
||||
- [ ] **Commit Message**: Follows Conventional Commits format
|
||||
11
.github/dependabot.yml
vendored
11
.github/dependabot.yml
vendored
@@ -1,11 +0,0 @@
|
||||
# Please see the documentation for all configuration options:
|
||||
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
|
||||
# and
|
||||
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
|
||||
|
||||
version: 2
|
||||
updates:
|
||||
- package-ecosystem: "github-actions"
|
||||
directory: "/"
|
||||
schedule:
|
||||
interval: "weekly"
|
||||
122
.github/scripts/check_diff.py
vendored
122
.github/scripts/check_diff.py
vendored
@@ -3,18 +3,19 @@ import json
|
||||
import os
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Set
|
||||
|
||||
from pathlib import Path
|
||||
import tomllib
|
||||
|
||||
from get_min_versions import get_min_version_from_toml
|
||||
from packaging.requirements import Requirement
|
||||
|
||||
|
||||
LANGCHAIN_DIRS = [
|
||||
"libs/core",
|
||||
"libs/text-splitters",
|
||||
"libs/langchain",
|
||||
"libs/langchain_v1",
|
||||
"libs/community",
|
||||
"libs/experimental",
|
||||
]
|
||||
|
||||
# when set to True, we are ignoring core dependents
|
||||
@@ -30,13 +31,6 @@ IGNORED_PARTNERS = [
|
||||
# specifically in huggingface jobs
|
||||
# https://github.com/langchain-ai/langchain/issues/25558
|
||||
"huggingface",
|
||||
# prompty exhibiting issues with numpy for Python 3.13
|
||||
# https://github.com/langchain-ai/langchain/actions/runs/12651104685/job/35251034969?pr=29065
|
||||
"prompty",
|
||||
]
|
||||
|
||||
PY_312_MAX_PACKAGES = [
|
||||
"libs/partners/chroma", # https://github.com/chroma-core/chroma/issues/4382
|
||||
]
|
||||
|
||||
|
||||
@@ -61,17 +55,15 @@ def dependents_graph() -> dict:
|
||||
|
||||
# load regular and test deps from pyproject.toml
|
||||
with open(path, "rb") as f:
|
||||
pyproject = tomllib.load(f)
|
||||
pyproject = tomllib.load(f)["tool"]["poetry"]
|
||||
|
||||
pkg_dir = "libs" + "/".join(path.split("libs")[1].split("/")[:-1])
|
||||
for dep in [
|
||||
*pyproject["project"]["dependencies"],
|
||||
*pyproject["dependency-groups"]["test"],
|
||||
*pyproject["dependencies"].keys(),
|
||||
*pyproject["group"]["test"]["dependencies"].keys(),
|
||||
]:
|
||||
requirement = Requirement(dep)
|
||||
package_name = requirement.name
|
||||
if "langchain" in dep:
|
||||
dependents[package_name].add(pkg_dir)
|
||||
dependents[dep].add(pkg_dir)
|
||||
continue
|
||||
|
||||
# load extended deps from extended_testing_deps.txt
|
||||
@@ -83,9 +75,9 @@ def dependents_graph() -> dict:
|
||||
for depline in extended_deps:
|
||||
if depline.startswith("-e "):
|
||||
# editable dependency
|
||||
assert depline.startswith("-e ../partners/"), (
|
||||
"Extended test deps should only editable install partner packages"
|
||||
)
|
||||
assert depline.startswith(
|
||||
"-e ../partners/"
|
||||
), "Extended test deps should only editable install partner packages"
|
||||
partner = depline.split("partners/")[1]
|
||||
dep = f"langchain-{partner}"
|
||||
else:
|
||||
@@ -118,26 +110,24 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
|
||||
if job == "test-pydantic":
|
||||
return _get_pydantic_test_configs(dir_)
|
||||
|
||||
if job == "codspeed":
|
||||
py_versions = ["3.12"] # 3.13 is not yet supported
|
||||
elif dir_ == "libs/core":
|
||||
py_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]
|
||||
if dir_ == "libs/core":
|
||||
py_versions = ["3.9", "3.10", "3.11", "3.12"]
|
||||
# custom logic for specific directories
|
||||
elif dir_ == "libs/partners/milvus":
|
||||
# milvus doesn't allow 3.12 because they declare deps in funny way
|
||||
# milvus poetry doesn't allow 3.12 because they
|
||||
# declare deps in funny way
|
||||
py_versions = ["3.9", "3.11"]
|
||||
|
||||
elif dir_ in PY_312_MAX_PACKAGES:
|
||||
py_versions = ["3.9", "3.12"]
|
||||
elif dir_ in ["libs/community", "libs/langchain"] and job == "extended-tests":
|
||||
# community extended test resolution in 3.12 is slow
|
||||
# even in uv
|
||||
py_versions = ["3.9", "3.11"]
|
||||
|
||||
elif dir_ == "libs/langchain" and job == "extended-tests":
|
||||
py_versions = ["3.9", "3.13"]
|
||||
|
||||
elif dir_ == ".":
|
||||
# unable to install with 3.13 because tokenizers doesn't support 3.13 yet
|
||||
py_versions = ["3.9", "3.12"]
|
||||
elif dir_ == "libs/community" and job == "compile-integration-tests":
|
||||
# community integration deps are slow in 3.12
|
||||
py_versions = ["3.9", "3.11"]
|
||||
else:
|
||||
py_versions = ["3.9", "3.13"]
|
||||
py_versions = ["3.9", "3.12"]
|
||||
|
||||
return [{"working-directory": dir_, "python-version": py_v} for py_v in py_versions]
|
||||
|
||||
@@ -145,17 +135,17 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
|
||||
def _get_pydantic_test_configs(
|
||||
dir_: str, *, python_version: str = "3.11"
|
||||
) -> List[Dict[str, str]]:
|
||||
with open("./libs/core/uv.lock", "rb") as f:
|
||||
core_uv_lock_data = tomllib.load(f)
|
||||
for package in core_uv_lock_data["package"]:
|
||||
with open("./libs/core/poetry.lock", "rb") as f:
|
||||
core_poetry_lock_data = tomllib.load(f)
|
||||
for package in core_poetry_lock_data["package"]:
|
||||
if package["name"] == "pydantic":
|
||||
core_max_pydantic_minor = package["version"].split(".")[1]
|
||||
break
|
||||
|
||||
with open(f"./{dir_}/uv.lock", "rb") as f:
|
||||
dir_uv_lock_data = tomllib.load(f)
|
||||
with open(f"./{dir_}/poetry.lock", "rb") as f:
|
||||
dir_poetry_lock_data = tomllib.load(f)
|
||||
|
||||
for package in dir_uv_lock_data["package"]:
|
||||
for package in dir_poetry_lock_data["package"]:
|
||||
if package["name"] == "pydantic":
|
||||
dir_max_pydantic_minor = package["version"].split(".")[1]
|
||||
break
|
||||
@@ -163,19 +153,19 @@ def _get_pydantic_test_configs(
|
||||
core_min_pydantic_version = get_min_version_from_toml(
|
||||
"./libs/core/pyproject.toml", "release", python_version, include=["pydantic"]
|
||||
)["pydantic"]
|
||||
core_min_pydantic_minor = (
|
||||
core_min_pydantic_version.split(".")[1]
|
||||
if "." in core_min_pydantic_version
|
||||
else "0"
|
||||
)
|
||||
dir_min_pydantic_version = get_min_version_from_toml(
|
||||
f"./{dir_}/pyproject.toml", "release", python_version, include=["pydantic"]
|
||||
).get("pydantic", "0.0.0")
|
||||
dir_min_pydantic_minor = (
|
||||
dir_min_pydantic_version.split(".")[1]
|
||||
if "." in dir_min_pydantic_version
|
||||
else "0"
|
||||
core_min_pydantic_minor = core_min_pydantic_version.split(".")[1] if "." in core_min_pydantic_version else "0"
|
||||
dir_min_pydantic_version = (
|
||||
get_min_version_from_toml(
|
||||
f"./{dir_}/pyproject.toml", "release", python_version, include=["pydantic"]
|
||||
)
|
||||
.get("pydantic", "0.0.0")
|
||||
)
|
||||
dir_min_pydantic_minor = dir_min_pydantic_version.split(".")[1] if "." in dir_min_pydantic_version else "0"
|
||||
|
||||
custom_mins = {
|
||||
# depends on pydantic-settings 2.4 which requires pydantic 2.7
|
||||
"libs/community": 7,
|
||||
}
|
||||
|
||||
max_pydantic_minor = min(
|
||||
int(dir_max_pydantic_minor),
|
||||
@@ -184,6 +174,7 @@ def _get_pydantic_test_configs(
|
||||
min_pydantic_minor = max(
|
||||
int(dir_min_pydantic_minor),
|
||||
int(core_min_pydantic_minor),
|
||||
custom_mins.get(dir_, 0),
|
||||
)
|
||||
|
||||
configs = [
|
||||
@@ -211,8 +202,6 @@ def _get_configs_for_multi_dirs(
|
||||
)
|
||||
elif job == "extended-tests":
|
||||
dirs = list(dirs_to_run["extended-test"])
|
||||
elif job == "codspeed":
|
||||
dirs = list(dirs_to_run["codspeed"])
|
||||
else:
|
||||
raise ValueError(f"Unknown job: {job}")
|
||||
|
||||
@@ -228,7 +217,6 @@ if __name__ == "__main__":
|
||||
"lint": set(),
|
||||
"test": set(),
|
||||
"extended-test": set(),
|
||||
"codspeed": set(),
|
||||
}
|
||||
docs_edited = False
|
||||
|
||||
@@ -252,8 +240,6 @@ if __name__ == "__main__":
|
||||
dirs_to_run["extended-test"].update(LANGCHAIN_DIRS)
|
||||
dirs_to_run["lint"].add(".")
|
||||
|
||||
if file.startswith("libs/core"):
|
||||
dirs_to_run["codspeed"].add(f"libs/core")
|
||||
if any(file.startswith(dir_) for dir_ in LANGCHAIN_DIRS):
|
||||
# add that dir and all dirs after in LANGCHAIN_DIRS
|
||||
# for extended testing
|
||||
@@ -269,11 +255,8 @@ if __name__ == "__main__":
|
||||
dirs_to_run["extended-test"].add(dir_)
|
||||
elif file.startswith("libs/standard-tests"):
|
||||
# TODO: update to include all packages that rely on standard-tests (all partner packages)
|
||||
# Note: won't run on external repo partners
|
||||
# note: won't run on external repo partners
|
||||
dirs_to_run["lint"].add("libs/standard-tests")
|
||||
dirs_to_run["test"].add("libs/standard-tests")
|
||||
dirs_to_run["lint"].add("libs/cli")
|
||||
dirs_to_run["test"].add("libs/cli")
|
||||
dirs_to_run["test"].add("libs/partners/mistralai")
|
||||
dirs_to_run["test"].add("libs/partners/openai")
|
||||
dirs_to_run["test"].add("libs/partners/anthropic")
|
||||
@@ -281,9 +264,8 @@ if __name__ == "__main__":
|
||||
dirs_to_run["test"].add("libs/partners/groq")
|
||||
|
||||
elif file.startswith("libs/cli"):
|
||||
dirs_to_run["lint"].add("libs/cli")
|
||||
dirs_to_run["test"].add("libs/cli")
|
||||
|
||||
# todo: add cli makefile
|
||||
pass
|
||||
elif file.startswith("libs/partners"):
|
||||
partner_dir = file.split("/")[2]
|
||||
if os.path.isdir(f"libs/partners/{partner_dir}") and [
|
||||
@@ -292,20 +274,15 @@ if __name__ == "__main__":
|
||||
if not filename.startswith(".")
|
||||
] != ["README.md"]:
|
||||
dirs_to_run["test"].add(f"libs/partners/{partner_dir}")
|
||||
dirs_to_run["codspeed"].add(f"libs/partners/{partner_dir}")
|
||||
# Skip if the directory was deleted or is just a tombstone readme
|
||||
elif file == "libs/packages.yml":
|
||||
continue
|
||||
elif file.startswith("libs/"):
|
||||
raise ValueError(
|
||||
f"Unknown lib: {file}. check_diff.py likely needs "
|
||||
"an update for this new library!"
|
||||
)
|
||||
elif file.startswith("docs/") or file in [
|
||||
"pyproject.toml",
|
||||
"uv.lock",
|
||||
]: # docs or root uv files
|
||||
docs_edited = True
|
||||
elif any(file.startswith(p) for p in ["docs/", "templates/", "cookbook/"]):
|
||||
if file.startswith("docs/"):
|
||||
docs_edited = True
|
||||
dirs_to_run["lint"].add(".")
|
||||
|
||||
dependents = dependents_graph()
|
||||
@@ -321,7 +298,6 @@ if __name__ == "__main__":
|
||||
"compile-integration-tests",
|
||||
"dependencies",
|
||||
"test-pydantic",
|
||||
"codspeed",
|
||||
]
|
||||
}
|
||||
map_job_to_configs["test-doc-imports"] = (
|
||||
|
||||
12
.github/scripts/check_prerelease_dependencies.py
vendored
12
.github/scripts/check_prerelease_dependencies.py
vendored
@@ -1,5 +1,4 @@
|
||||
import sys
|
||||
|
||||
import tomllib
|
||||
|
||||
if __name__ == "__main__":
|
||||
@@ -11,25 +10,26 @@ if __name__ == "__main__":
|
||||
toml_data = tomllib.load(file)
|
||||
|
||||
# see if we're releasing an rc
|
||||
version = toml_data["project"]["version"]
|
||||
version = toml_data["tool"]["poetry"]["version"]
|
||||
releasing_rc = "rc" in version or "dev" in version
|
||||
|
||||
# if not, iterate through dependencies and make sure none allow prereleases
|
||||
if not releasing_rc:
|
||||
dependencies = toml_data["project"]["dependencies"]
|
||||
for dep_version in dependencies:
|
||||
dependencies = toml_data["tool"]["poetry"]["dependencies"]
|
||||
for lib in dependencies:
|
||||
dep_version = dependencies[lib]
|
||||
dep_version_string = (
|
||||
dep_version["version"] if isinstance(dep_version, dict) else dep_version
|
||||
)
|
||||
|
||||
if "rc" in dep_version_string:
|
||||
raise ValueError(
|
||||
f"Dependency {dep_version} has a prerelease version. Please remove this."
|
||||
f"Dependency {lib} has a prerelease version. Please remove this."
|
||||
)
|
||||
|
||||
if isinstance(dep_version, dict) and dep_version.get(
|
||||
"allow-prereleases", False
|
||||
):
|
||||
raise ValueError(
|
||||
f"Dependency {dep_version} has allow-prereleases set to true. Please remove this."
|
||||
f"Dependency {lib} has allow-prereleases set to true. Please remove this."
|
||||
)
|
||||
|
||||
143
.github/scripts/get_min_versions.py
vendored
143
.github/scripts/get_min_versions.py
vendored
@@ -1,5 +1,4 @@
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from typing import Optional
|
||||
|
||||
if sys.version_info >= (3, 11):
|
||||
@@ -8,19 +7,17 @@ else:
|
||||
# for python 3.10 and below, which doesnt have stdlib tomllib
|
||||
import tomli as tomllib
|
||||
|
||||
import re
|
||||
from typing import List
|
||||
|
||||
import requests
|
||||
from packaging.requirements import Requirement
|
||||
from packaging.version import parse as parse_version
|
||||
from packaging.specifiers import SpecifierSet
|
||||
from packaging.version import Version, parse
|
||||
from packaging.version import Version
|
||||
|
||||
import re
|
||||
|
||||
MIN_VERSION_LIBS = [
|
||||
"langchain-core",
|
||||
"langchain-community",
|
||||
"langchain",
|
||||
"langchain-text-splitters",
|
||||
"numpy",
|
||||
"SQLAlchemy",
|
||||
]
|
||||
|
||||
@@ -30,83 +27,33 @@ SKIP_IF_PULL_REQUEST = [
|
||||
"langchain-core",
|
||||
"langchain-text-splitters",
|
||||
"langchain",
|
||||
"langchain-community",
|
||||
]
|
||||
|
||||
|
||||
def get_pypi_versions(package_name: str) -> List[str]:
|
||||
"""
|
||||
Fetch all available versions for a package from PyPI.
|
||||
def get_min_version(version: str) -> str:
|
||||
# base regex for x.x.x with cases for rc/post/etc
|
||||
# valid strings: https://peps.python.org/pep-0440/#public-version-identifiers
|
||||
vstring = r"\d+(?:\.\d+){0,2}(?:(?:a|b|rc|\.post|\.dev)\d+)?"
|
||||
# case ^x.x.x
|
||||
_match = re.match(f"^\\^({vstring})$", version)
|
||||
if _match:
|
||||
return _match.group(1)
|
||||
|
||||
Args:
|
||||
package_name (str): Name of the package
|
||||
# case >=x.x.x,<y.y.y
|
||||
_match = re.match(f"^>=({vstring}),<({vstring})$", version)
|
||||
if _match:
|
||||
_min = _match.group(1)
|
||||
_max = _match.group(2)
|
||||
assert parse_version(_min) < parse_version(_max)
|
||||
return _min
|
||||
|
||||
Returns:
|
||||
List[str]: List of all available versions
|
||||
# case x.x.x
|
||||
_match = re.match(f"^({vstring})$", version)
|
||||
if _match:
|
||||
return _match.group(1)
|
||||
|
||||
Raises:
|
||||
requests.exceptions.RequestException: If PyPI API request fails
|
||||
KeyError: If package not found or response format unexpected
|
||||
"""
|
||||
pypi_url = f"https://pypi.org/pypi/{package_name}/json"
|
||||
response = requests.get(pypi_url)
|
||||
response.raise_for_status()
|
||||
return list(response.json()["releases"].keys())
|
||||
|
||||
|
||||
def get_minimum_version(package_name: str, spec_string: str) -> Optional[str]:
|
||||
"""
|
||||
Find the minimum published version that satisfies the given constraints.
|
||||
|
||||
Args:
|
||||
package_name (str): Name of the package
|
||||
spec_string (str): Version specification string (e.g., ">=0.2.43,<0.4.0,!=0.3.0")
|
||||
|
||||
Returns:
|
||||
Optional[str]: Minimum compatible version or None if no compatible version found
|
||||
"""
|
||||
# rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
|
||||
spec_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", spec_string)
|
||||
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1 (can be anywhere in constraint string)
|
||||
for y in range(1, 10):
|
||||
spec_string = re.sub(
|
||||
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}", spec_string
|
||||
)
|
||||
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
|
||||
for x in range(1, 10):
|
||||
spec_string = re.sub(
|
||||
rf"\^{x}\.(\d+)\.(\d+)", rf">={x}.\1.\2,<{x + 1}", spec_string
|
||||
)
|
||||
|
||||
spec_set = SpecifierSet(spec_string)
|
||||
all_versions = get_pypi_versions(package_name)
|
||||
|
||||
valid_versions = []
|
||||
for version_str in all_versions:
|
||||
try:
|
||||
version = parse(version_str)
|
||||
if spec_set.contains(version):
|
||||
valid_versions.append(version)
|
||||
except ValueError:
|
||||
continue
|
||||
|
||||
return str(min(valid_versions)) if valid_versions else None
|
||||
|
||||
|
||||
def _check_python_version_from_requirement(
|
||||
requirement: Requirement, python_version: str
|
||||
) -> bool:
|
||||
if not requirement.marker:
|
||||
return True
|
||||
else:
|
||||
marker_str = str(requirement.marker)
|
||||
if "python_version" or "python_full_version" in marker_str:
|
||||
python_version_str = "".join(
|
||||
char
|
||||
for char in marker_str
|
||||
if char.isdigit() or char in (".", "<", ">", "=", ",")
|
||||
)
|
||||
return check_python_version(python_version, python_version_str)
|
||||
return True
|
||||
raise ValueError(f"Unrecognized version format: {version}")
|
||||
|
||||
|
||||
def get_min_version_from_toml(
|
||||
@@ -120,10 +67,8 @@ def get_min_version_from_toml(
|
||||
with open(toml_path, "rb") as file:
|
||||
toml_data = tomllib.load(file)
|
||||
|
||||
dependencies = defaultdict(list)
|
||||
for dep in toml_data["project"]["dependencies"]:
|
||||
requirement = Requirement(dep)
|
||||
dependencies[requirement.name].append(requirement)
|
||||
# Get the dependencies from tool.poetry.dependencies
|
||||
dependencies = toml_data["tool"]["poetry"]["dependencies"]
|
||||
|
||||
# Initialize a dictionary to store the minimum versions
|
||||
min_versions = {}
|
||||
@@ -138,14 +83,20 @@ def get_min_version_from_toml(
|
||||
if lib in dependencies:
|
||||
if include and lib not in include:
|
||||
continue
|
||||
requirements = dependencies[lib]
|
||||
for requirement in requirements:
|
||||
if _check_python_version_from_requirement(requirement, python_version):
|
||||
version_string = str(requirement.specifier)
|
||||
break
|
||||
# Get the version string
|
||||
version_string = dependencies[lib]
|
||||
|
||||
if isinstance(version_string, dict):
|
||||
version_string = version_string["version"]
|
||||
if isinstance(version_string, list):
|
||||
version_string = [
|
||||
vs
|
||||
for vs in version_string
|
||||
if check_python_version(python_version, vs["python"])
|
||||
][0]["version"]
|
||||
|
||||
# Use parse_version to get the minimum supported version from version_string
|
||||
min_version = get_minimum_version(lib, version_string)
|
||||
min_version = get_min_version(version_string)
|
||||
|
||||
# Store the minimum version in the min_versions dictionary
|
||||
min_versions[lib] = min_version
|
||||
@@ -161,20 +112,6 @@ def check_python_version(version_string, constraint_string):
|
||||
:param constraint_string: A string representing the package's Python version constraints (e.g. ">=3.6, <4.0").
|
||||
:return: True if the version matches the constraints, False otherwise.
|
||||
"""
|
||||
|
||||
# rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
|
||||
constraint_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", constraint_string)
|
||||
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1.0 (can be anywhere in constraint string)
|
||||
for y in range(1, 10):
|
||||
constraint_string = re.sub(
|
||||
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}.0", constraint_string
|
||||
)
|
||||
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
|
||||
for x in range(1, 10):
|
||||
constraint_string = re.sub(
|
||||
rf"\^{x}\.0\.(\d+)", rf">={x}.0.\1,<{x + 1}.0.0", constraint_string
|
||||
)
|
||||
|
||||
try:
|
||||
version = Version(version_string)
|
||||
constraints = SpecifierSet(constraint_string)
|
||||
|
||||
112
.github/scripts/prep_api_docs_build.py
vendored
112
.github/scripts/prep_api_docs_build.py
vendored
@@ -1,112 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
"""Script to sync libraries from various repositories into the main langchain repository."""
|
||||
|
||||
import os
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
def load_packages_yaml() -> Dict[str, Any]:
|
||||
"""Load and parse the packages.yml file."""
|
||||
with open("langchain/libs/packages.yml", "r") as f:
|
||||
return yaml.safe_load(f)
|
||||
|
||||
|
||||
def get_target_dir(package_name: str) -> Path:
|
||||
"""Get the target directory for a given package."""
|
||||
package_name_short = package_name.replace("langchain-", "")
|
||||
base_path = Path("langchain/libs")
|
||||
if package_name_short == "experimental":
|
||||
return base_path / "experimental"
|
||||
if package_name_short == "community":
|
||||
return base_path / "community"
|
||||
return base_path / "partners" / package_name_short
|
||||
|
||||
|
||||
def clean_target_directories(packages: list) -> None:
|
||||
"""Remove old directories that will be replaced."""
|
||||
for package in packages:
|
||||
target_dir = get_target_dir(package["name"])
|
||||
if target_dir.exists():
|
||||
print(f"Removing {target_dir}")
|
||||
shutil.rmtree(target_dir)
|
||||
|
||||
|
||||
def move_libraries(packages: list) -> None:
|
||||
"""Move libraries from their source locations to the target directories."""
|
||||
for package in packages:
|
||||
repo_name = package["repo"].split("/")[1]
|
||||
source_path = package["path"]
|
||||
target_dir = get_target_dir(package["name"])
|
||||
|
||||
# Handle root path case
|
||||
if source_path == ".":
|
||||
source_dir = repo_name
|
||||
else:
|
||||
source_dir = f"{repo_name}/{source_path}"
|
||||
|
||||
print(f"Moving {source_dir} to {target_dir}")
|
||||
|
||||
# Ensure target directory exists
|
||||
os.makedirs(os.path.dirname(target_dir), exist_ok=True)
|
||||
|
||||
try:
|
||||
# Move the directory
|
||||
shutil.move(source_dir, target_dir)
|
||||
except Exception as e:
|
||||
print(f"Error moving {source_dir} to {target_dir}: {e}")
|
||||
|
||||
|
||||
def main():
|
||||
"""Main function to orchestrate the library sync process."""
|
||||
try:
|
||||
# Load packages configuration
|
||||
package_yaml = load_packages_yaml()
|
||||
|
||||
# Clean target directories
|
||||
clean_target_directories(
|
||||
[
|
||||
p
|
||||
for p in package_yaml["packages"]
|
||||
if (
|
||||
p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")
|
||||
)
|
||||
and p["repo"] != "langchain-ai/langchain"
|
||||
and p["name"]
|
||||
!= "langchain-ai21" # Skip AI21 due to dependency conflicts
|
||||
]
|
||||
)
|
||||
|
||||
# Move libraries to their new locations
|
||||
move_libraries(
|
||||
[
|
||||
p
|
||||
for p in package_yaml["packages"]
|
||||
if not p.get("disabled", False)
|
||||
and (
|
||||
p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")
|
||||
)
|
||||
and p["repo"] != "langchain-ai/langchain"
|
||||
and p["name"]
|
||||
!= "langchain-ai21" # Skip AI21 due to dependency conflicts
|
||||
]
|
||||
)
|
||||
|
||||
# Delete ones without a pyproject.toml
|
||||
for partner in Path("langchain/libs/partners").iterdir():
|
||||
if partner.is_dir() and not (partner / "pyproject.toml").exists():
|
||||
print(f"Removing {partner} as it does not have a pyproject.toml")
|
||||
shutil.rmtree(partner)
|
||||
|
||||
print("Library sync completed successfully!")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error during library sync: {e}")
|
||||
raise
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
416
.github/tools/git-restore-mtime
vendored
416
.github/tools/git-restore-mtime
vendored
@@ -81,93 +81,56 @@ import time
|
||||
__version__ = "2022.12+dev"
|
||||
|
||||
# Update symlinks only if the platform supports not following them
|
||||
UPDATE_SYMLINKS = bool(os.utime in getattr(os, "supports_follow_symlinks", []))
|
||||
UPDATE_SYMLINKS = bool(os.utime in getattr(os, 'supports_follow_symlinks', []))
|
||||
|
||||
# Call os.path.normpath() only if not in a POSIX platform (Windows)
|
||||
NORMALIZE_PATHS = os.path.sep != "/"
|
||||
NORMALIZE_PATHS = (os.path.sep != '/')
|
||||
|
||||
# How many files to process in each batch when re-trying merge commits
|
||||
STEPMISSING = 100
|
||||
|
||||
# (Extra) keywords for the os.utime() call performed by touch()
|
||||
UTIME_KWS = {} if not UPDATE_SYMLINKS else {"follow_symlinks": False}
|
||||
UTIME_KWS = {} if not UPDATE_SYMLINKS else {'follow_symlinks': False}
|
||||
|
||||
|
||||
# Command-line interface ######################################################
|
||||
|
||||
|
||||
def parse_args():
|
||||
parser = argparse.ArgumentParser(description=__doc__.split("\n---")[0])
|
||||
parser = argparse.ArgumentParser(
|
||||
description=__doc__.split('\n---')[0])
|
||||
|
||||
group = parser.add_mutually_exclusive_group()
|
||||
group.add_argument(
|
||||
"--quiet",
|
||||
"-q",
|
||||
dest="loglevel",
|
||||
action="store_const",
|
||||
const=logging.WARNING,
|
||||
default=logging.INFO,
|
||||
help="Suppress informative messages and summary statistics.",
|
||||
)
|
||||
group.add_argument(
|
||||
"--verbose",
|
||||
"-v",
|
||||
action="count",
|
||||
help="""
|
||||
group.add_argument('--quiet', '-q', dest='loglevel',
|
||||
action="store_const", const=logging.WARNING, default=logging.INFO,
|
||||
help="Suppress informative messages and summary statistics.")
|
||||
group.add_argument('--verbose', '-v', action="count", help="""
|
||||
Print additional information for each processed file.
|
||||
Specify twice to further increase verbosity.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--cwd",
|
||||
"-C",
|
||||
metavar="DIRECTORY",
|
||||
help="""
|
||||
parser.add_argument('--cwd', '-C', metavar="DIRECTORY", help="""
|
||||
Run as if %(prog)s was started in directory %(metavar)s.
|
||||
This affects how --work-tree, --git-dir and PATHSPEC arguments are handled.
|
||||
See 'man 1 git' or 'git --help' for more information.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--git-dir",
|
||||
dest="gitdir",
|
||||
metavar="GITDIR",
|
||||
help="""
|
||||
parser.add_argument('--git-dir', dest='gitdir', metavar="GITDIR", help="""
|
||||
Path to the git repository, by default auto-discovered by searching
|
||||
the current directory and its parents for a .git/ subdirectory.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--work-tree",
|
||||
dest="workdir",
|
||||
metavar="WORKTREE",
|
||||
help="""
|
||||
parser.add_argument('--work-tree', dest='workdir', metavar="WORKTREE", help="""
|
||||
Path to the work tree root, by default the parent of GITDIR if it's
|
||||
automatically discovered, or the current directory if GITDIR is set.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--force",
|
||||
"-f",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--force', '-f', default=False, action="store_true", help="""
|
||||
Force updating files with uncommitted modifications.
|
||||
Untracked files and uncommitted deletions, renames and additions are
|
||||
always ignored.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--merge",
|
||||
"-m",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--merge', '-m', default=False, action="store_true", help="""
|
||||
Include merge commits.
|
||||
Leads to more recent times and more files per commit, thus with the same
|
||||
time, which may or may not be what you want.
|
||||
@@ -175,130 +138,71 @@ def parse_args():
|
||||
are found sooner, which can improve performance, sometimes substantially.
|
||||
But as merge commits are usually huge, processing them may also take longer.
|
||||
By default, merge commits are only used for files missing from regular commits.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--first-parent",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--first-parent', default=False, action="store_true", help="""
|
||||
Consider only the first parent, the "main branch", when evaluating merge commits.
|
||||
Only effective when merge commits are processed, either when --merge is
|
||||
used or when finding missing files after the first regular log search.
|
||||
See --skip-missing.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--skip-missing",
|
||||
"-s",
|
||||
dest="missing",
|
||||
default=True,
|
||||
action="store_false",
|
||||
help="""
|
||||
parser.add_argument('--skip-missing', '-s', dest="missing", default=True,
|
||||
action="store_false", help="""
|
||||
Do not try to find missing files.
|
||||
If merge commits were not evaluated with --merge and some files were
|
||||
not found in regular commits, by default %(prog)s searches for these
|
||||
files again in the merge commits.
|
||||
This option disables this retry, so files found only in merge commits
|
||||
will not have their timestamp updated.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--no-directories",
|
||||
"-D",
|
||||
dest="dirs",
|
||||
default=True,
|
||||
action="store_false",
|
||||
help="""
|
||||
parser.add_argument('--no-directories', '-D', dest='dirs', default=True,
|
||||
action="store_false", help="""
|
||||
Do not update directory timestamps.
|
||||
By default, use the time of its most recently created, renamed or deleted file.
|
||||
Note that just modifying a file will NOT update its directory time.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--test",
|
||||
"-t",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="Test run: do not actually update any file timestamp.",
|
||||
)
|
||||
parser.add_argument('--test', '-t', default=False, action="store_true",
|
||||
help="Test run: do not actually update any file timestamp.")
|
||||
|
||||
parser.add_argument(
|
||||
"--commit-time",
|
||||
"-c",
|
||||
dest="commit_time",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="Use commit time instead of author time.",
|
||||
)
|
||||
parser.add_argument('--commit-time', '-c', dest='commit_time', default=False,
|
||||
action='store_true', help="Use commit time instead of author time.")
|
||||
|
||||
parser.add_argument(
|
||||
"--oldest-time",
|
||||
"-o",
|
||||
dest="reverse_order",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--oldest-time', '-o', dest='reverse_order', default=False,
|
||||
action='store_true', help="""
|
||||
Update times based on the oldest, instead of the most recent commit of a file.
|
||||
This reverses the order in which the git log is processed to emulate a
|
||||
file "creation" date. Note this will be inaccurate for files deleted and
|
||||
re-created at later dates.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--skip-older-than",
|
||||
metavar="SECONDS",
|
||||
type=int,
|
||||
help="""
|
||||
parser.add_argument('--skip-older-than', metavar='SECONDS', type=int, help="""
|
||||
Ignore files that are currently older than %(metavar)s.
|
||||
Useful in workflows that assume such files already have a correct timestamp,
|
||||
as it may improve performance by processing fewer files.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--skip-older-than-commit",
|
||||
"-N",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--skip-older-than-commit', '-N', default=False,
|
||||
action='store_true', help="""
|
||||
Ignore files older than the timestamp it would be updated to.
|
||||
Such files may be considered "original", likely in the author's repository.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--unique-times",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--unique-times', default=False, action="store_true", help="""
|
||||
Set the microseconds to a unique value per commit.
|
||||
Allows telling apart changes that would otherwise have identical timestamps,
|
||||
as git's time accuracy is in seconds.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"pathspec",
|
||||
nargs="*",
|
||||
metavar="PATHSPEC",
|
||||
help="""
|
||||
parser.add_argument('pathspec', nargs='*', metavar='PATHSPEC', help="""
|
||||
Only modify paths matching %(metavar)s, relative to current directory.
|
||||
By default, update all but untracked files and submodules.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--version",
|
||||
"-V",
|
||||
action="version",
|
||||
version="%(prog)s version {version}".format(version=get_version()),
|
||||
)
|
||||
parser.add_argument('--version', '-V', action='version',
|
||||
version='%(prog)s version {version}'.format(version=get_version()))
|
||||
|
||||
args_ = parser.parse_args()
|
||||
if args_.verbose:
|
||||
@@ -308,18 +212,17 @@ def parse_args():
|
||||
|
||||
|
||||
def get_version(version=__version__):
|
||||
if not version.endswith("+dev"):
|
||||
if not version.endswith('+dev'):
|
||||
return version
|
||||
try:
|
||||
cwd = os.path.dirname(os.path.realpath(__file__))
|
||||
return Git(cwd=cwd, errors=False).describe().lstrip("v")
|
||||
return Git(cwd=cwd, errors=False).describe().lstrip('v')
|
||||
except Git.Error:
|
||||
return "-".join((version, "unknown"))
|
||||
return '-'.join((version, "unknown"))
|
||||
|
||||
|
||||
# Helper functions ############################################################
|
||||
|
||||
|
||||
def setup_logging():
|
||||
"""Add TRACE logging level and corresponding method, return the root logger"""
|
||||
logging.TRACE = TRACE = logging.DEBUG // 2
|
||||
@@ -352,13 +255,11 @@ def normalize(path):
|
||||
if path and path[0] == '"':
|
||||
# Python 2: path = path[1:-1].decode("string-escape")
|
||||
# Python 3: https://stackoverflow.com/a/46650050/624066
|
||||
path = (
|
||||
path[1:-1] # Remove enclosing double quotes
|
||||
.encode("latin1") # Convert to bytes, required by 'unicode-escape'
|
||||
.decode("unicode-escape") # Perform the actual octal-escaping decode
|
||||
.encode("latin1") # 1:1 mapping to bytes, UTF-8 encoded
|
||||
.decode("utf8", "surrogateescape")
|
||||
) # Decode from UTF-8
|
||||
path = (path[1:-1] # Remove enclosing double quotes
|
||||
.encode('latin1') # Convert to bytes, required by 'unicode-escape'
|
||||
.decode('unicode-escape') # Perform the actual octal-escaping decode
|
||||
.encode('latin1') # 1:1 mapping to bytes, UTF-8 encoded
|
||||
.decode('utf8', 'surrogateescape')) # Decode from UTF-8
|
||||
if NORMALIZE_PATHS:
|
||||
# Make sure the slash matches the OS; for Windows we need a backslash
|
||||
path = os.path.normpath(path)
|
||||
@@ -381,12 +282,12 @@ def touch_ns(path, mtime_ns):
|
||||
|
||||
def isodate(secs: int):
|
||||
# time.localtime() accepts floats, but discards fractional part
|
||||
return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(secs))
|
||||
return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(secs))
|
||||
|
||||
|
||||
def isodate_ns(ns: int):
|
||||
# for integers fromtimestamp() is equivalent and ~16% slower than isodate()
|
||||
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=" ")
|
||||
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=' ')
|
||||
|
||||
|
||||
def get_mtime_ns(secs: int, idx: int):
|
||||
@@ -404,49 +305,35 @@ def get_mtime_path(path):
|
||||
|
||||
# Git class and parse_log(), the heart of the script ##########################
|
||||
|
||||
|
||||
class Git:
|
||||
def __init__(self, workdir=None, gitdir=None, cwd=None, errors=True):
|
||||
self.gitcmd = ["git"]
|
||||
self.gitcmd = ['git']
|
||||
self.errors = errors
|
||||
self._proc = None
|
||||
if workdir:
|
||||
self.gitcmd.extend(("--work-tree", workdir))
|
||||
if gitdir:
|
||||
self.gitcmd.extend(("--git-dir", gitdir))
|
||||
if cwd:
|
||||
self.gitcmd.extend(("-C", cwd))
|
||||
if workdir: self.gitcmd.extend(('--work-tree', workdir))
|
||||
if gitdir: self.gitcmd.extend(('--git-dir', gitdir))
|
||||
if cwd: self.gitcmd.extend(('-C', cwd))
|
||||
self.workdir, self.gitdir = self._get_repo_dirs()
|
||||
|
||||
def ls_files(self, paths: list = None):
|
||||
return (normalize(_) for _ in self._run("ls-files --full-name", paths))
|
||||
return (normalize(_) for _ in self._run('ls-files --full-name', paths))
|
||||
|
||||
def ls_dirty(self, force=False):
|
||||
return (
|
||||
normalize(_[3:].split(" -> ", 1)[-1])
|
||||
for _ in self._run("status --porcelain")
|
||||
if _[:2] != "??" and (not force or (_[0] in ("R", "A") or _[1] == "D"))
|
||||
)
|
||||
return (normalize(_[3:].split(' -> ', 1)[-1])
|
||||
for _ in self._run('status --porcelain')
|
||||
if _[:2] != '??' and (not force or (_[0] in ('R', 'A')
|
||||
or _[1] == 'D')))
|
||||
|
||||
def log(
|
||||
self,
|
||||
merge=False,
|
||||
first_parent=False,
|
||||
commit_time=False,
|
||||
reverse_order=False,
|
||||
paths: list = None,
|
||||
):
|
||||
cmd = "whatchanged --pretty={}".format("%ct" if commit_time else "%at")
|
||||
if merge:
|
||||
cmd += " -m"
|
||||
if first_parent:
|
||||
cmd += " --first-parent"
|
||||
if reverse_order:
|
||||
cmd += " --reverse"
|
||||
def log(self, merge=False, first_parent=False, commit_time=False,
|
||||
reverse_order=False, paths: list = None):
|
||||
cmd = 'whatchanged --pretty={}'.format('%ct' if commit_time else '%at')
|
||||
if merge: cmd += ' -m'
|
||||
if first_parent: cmd += ' --first-parent'
|
||||
if reverse_order: cmd += ' --reverse'
|
||||
return self._run(cmd, paths)
|
||||
|
||||
def describe(self):
|
||||
return self._run("describe --tags", check=True)[0]
|
||||
return self._run('describe --tags', check=True)[0]
|
||||
|
||||
def terminate(self):
|
||||
if self._proc is None:
|
||||
@@ -458,22 +345,18 @@ class Git:
|
||||
pass
|
||||
|
||||
def _get_repo_dirs(self):
|
||||
return (
|
||||
os.path.normpath(_)
|
||||
for _ in self._run(
|
||||
"rev-parse --show-toplevel --absolute-git-dir", check=True
|
||||
)
|
||||
)
|
||||
return (os.path.normpath(_) for _ in
|
||||
self._run('rev-parse --show-toplevel --absolute-git-dir', check=True))
|
||||
|
||||
def _run(self, cmdstr: str, paths: list = None, output=True, check=False):
|
||||
cmdlist = self.gitcmd + shlex.split(cmdstr)
|
||||
if paths:
|
||||
cmdlist.append("--")
|
||||
cmdlist.append('--')
|
||||
cmdlist.extend(paths)
|
||||
popen_args = dict(universal_newlines=True, encoding="utf8")
|
||||
popen_args = dict(universal_newlines=True, encoding='utf8')
|
||||
if not self.errors:
|
||||
popen_args["stderr"] = subprocess.DEVNULL
|
||||
log.trace("Executing: %s", " ".join(cmdlist))
|
||||
popen_args['stderr'] = subprocess.DEVNULL
|
||||
log.trace("Executing: %s", ' '.join(cmdlist))
|
||||
if not output:
|
||||
return subprocess.call(cmdlist, **popen_args)
|
||||
if check:
|
||||
@@ -496,26 +379,30 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
mtime = 0
|
||||
datestr = isodate(0)
|
||||
for line in git.log(
|
||||
merge, args.first_parent, args.commit_time, args.reverse_order, filterlist
|
||||
merge,
|
||||
args.first_parent,
|
||||
args.commit_time,
|
||||
args.reverse_order,
|
||||
filterlist
|
||||
):
|
||||
stats["loglines"] += 1
|
||||
stats['loglines'] += 1
|
||||
|
||||
# Blank line between Date and list of files
|
||||
if not line:
|
||||
continue
|
||||
|
||||
# Date line
|
||||
if line[0] != ":": # Faster than `not line.startswith(':')`
|
||||
stats["commits"] += 1
|
||||
if line[0] != ':': # Faster than `not line.startswith(':')`
|
||||
stats['commits'] += 1
|
||||
mtime = int(line)
|
||||
if args.unique_times:
|
||||
mtime = get_mtime_ns(mtime, stats["commits"])
|
||||
mtime = get_mtime_ns(mtime, stats['commits'])
|
||||
if args.debug:
|
||||
datestr = isodate(mtime)
|
||||
continue
|
||||
|
||||
# File line: three tokens if it describes a renaming, otherwise two
|
||||
tokens = line.split("\t")
|
||||
tokens = line.split('\t')
|
||||
|
||||
# Possible statuses:
|
||||
# M: Modified (content changed)
|
||||
@@ -524,7 +411,7 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
# T: Type changed: to/from regular file, symlinks, submodules
|
||||
# R099: Renamed (moved), with % of unchanged content. 100 = pure rename
|
||||
# Not possible in log: C=Copied, U=Unmerged, X=Unknown, B=pairing Broken
|
||||
status = tokens[0].split(" ")[-1]
|
||||
status = tokens[0].split(' ')[-1]
|
||||
file = tokens[-1]
|
||||
|
||||
# Handles non-ASCII chars and OS path separator
|
||||
@@ -532,76 +419,56 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
|
||||
def do_file():
|
||||
if args.skip_older_than_commit and get_mtime_path(file) <= mtime:
|
||||
stats["skip"] += 1
|
||||
stats['skip'] += 1
|
||||
return
|
||||
if args.debug:
|
||||
log.debug(
|
||||
"%d\t%d\t%d\t%s\t%s",
|
||||
stats["loglines"],
|
||||
stats["commits"],
|
||||
stats["files"],
|
||||
datestr,
|
||||
file,
|
||||
)
|
||||
log.debug("%d\t%d\t%d\t%s\t%s",
|
||||
stats['loglines'], stats['commits'], stats['files'],
|
||||
datestr, file)
|
||||
try:
|
||||
touch(os.path.join(git.workdir, file), mtime)
|
||||
stats["touches"] += 1
|
||||
stats['touches'] += 1
|
||||
except Exception as e:
|
||||
log.error("ERROR: %s: %s", e, file)
|
||||
stats["errors"] += 1
|
||||
stats['errors'] += 1
|
||||
|
||||
def do_dir():
|
||||
if args.debug:
|
||||
log.debug(
|
||||
"%d\t%d\t-\t%s\t%s",
|
||||
stats["loglines"],
|
||||
stats["commits"],
|
||||
datestr,
|
||||
"{}/".format(dirname or "."),
|
||||
)
|
||||
log.debug("%d\t%d\t-\t%s\t%s",
|
||||
stats['loglines'], stats['commits'],
|
||||
datestr, "{}/".format(dirname or '.'))
|
||||
try:
|
||||
touch(os.path.join(git.workdir, dirname), mtime)
|
||||
stats["dirtouches"] += 1
|
||||
stats['dirtouches'] += 1
|
||||
except Exception as e:
|
||||
log.error("ERROR: %s: %s", e, dirname)
|
||||
stats["direrrors"] += 1
|
||||
stats['direrrors'] += 1
|
||||
|
||||
if file in filelist:
|
||||
stats["files"] -= 1
|
||||
stats['files'] -= 1
|
||||
filelist.remove(file)
|
||||
do_file()
|
||||
|
||||
if args.dirs and status in ("A", "D"):
|
||||
if args.dirs and status in ('A', 'D'):
|
||||
dirname = os.path.dirname(file)
|
||||
if dirname in dirlist:
|
||||
dirlist.remove(dirname)
|
||||
do_dir()
|
||||
|
||||
# All files done?
|
||||
if not stats["files"]:
|
||||
if not stats['files']:
|
||||
git.terminate()
|
||||
return
|
||||
|
||||
|
||||
# Main Logic ##################################################################
|
||||
|
||||
|
||||
def main():
|
||||
start = time.time() # yes, Wall time. CPU time is not realistic for users.
|
||||
stats = {
|
||||
_: 0
|
||||
for _ in (
|
||||
"loglines",
|
||||
"commits",
|
||||
"touches",
|
||||
"skip",
|
||||
"errors",
|
||||
"dirtouches",
|
||||
"direrrors",
|
||||
)
|
||||
}
|
||||
stats = {_: 0 for _ in ('loglines', 'commits', 'touches', 'skip', 'errors',
|
||||
'dirtouches', 'direrrors')}
|
||||
|
||||
logging.basicConfig(level=args.loglevel, format="%(message)s")
|
||||
logging.basicConfig(level=args.loglevel, format='%(message)s')
|
||||
log.trace("Arguments: %s", args)
|
||||
|
||||
# First things first: Where and Who are we?
|
||||
@@ -632,16 +499,13 @@ def main():
|
||||
|
||||
# Symlink (to file, to dir or broken - git handles the same way)
|
||||
if not UPDATE_SYMLINKS and os.path.islink(fullpath):
|
||||
log.warning(
|
||||
"WARNING: Skipping symlink, no OS support for updates: %s", path
|
||||
)
|
||||
log.warning("WARNING: Skipping symlink, no OS support for updates: %s",
|
||||
path)
|
||||
continue
|
||||
|
||||
# skip files which are older than given threshold
|
||||
if (
|
||||
args.skip_older_than
|
||||
and start - get_mtime_path(fullpath) > args.skip_older_than
|
||||
):
|
||||
if (args.skip_older_than
|
||||
and start - get_mtime_path(fullpath) > args.skip_older_than):
|
||||
continue
|
||||
|
||||
# Always add files relative to worktree root
|
||||
@@ -655,17 +519,15 @@ def main():
|
||||
else:
|
||||
dirty = set(git.ls_dirty())
|
||||
if dirty:
|
||||
log.warning(
|
||||
"WARNING: Modified files in the working directory were ignored."
|
||||
"\nTo include such files, commit your changes or use --force."
|
||||
)
|
||||
log.warning("WARNING: Modified files in the working directory were ignored."
|
||||
"\nTo include such files, commit your changes or use --force.")
|
||||
filelist -= dirty
|
||||
|
||||
# Build dir list to be processed
|
||||
dirlist = set(os.path.dirname(_) for _ in filelist) if args.dirs else set()
|
||||
|
||||
stats["totalfiles"] = stats["files"] = len(filelist)
|
||||
log.info("{0:,} files to be processed in work dir".format(stats["totalfiles"]))
|
||||
stats['totalfiles'] = stats['files'] = len(filelist)
|
||||
log.info("{0:,} files to be processed in work dir".format(stats['totalfiles']))
|
||||
|
||||
if not filelist:
|
||||
# Nothing to do. Exit silently and without errors, just like git does
|
||||
@@ -682,18 +544,10 @@ def main():
|
||||
if args.missing and not args.merge:
|
||||
filterlist = list(filelist)
|
||||
missing = len(filterlist)
|
||||
log.info(
|
||||
"{0:,} files not found in log, trying merge commits".format(missing)
|
||||
)
|
||||
log.info("{0:,} files not found in log, trying merge commits".format(missing))
|
||||
for i in range(0, missing, STEPMISSING):
|
||||
parse_log(
|
||||
filelist,
|
||||
dirlist,
|
||||
stats,
|
||||
git,
|
||||
merge=True,
|
||||
filterlist=filterlist[i : i + STEPMISSING],
|
||||
)
|
||||
parse_log(filelist, dirlist, stats, git,
|
||||
merge=True, filterlist=filterlist[i:i + STEPMISSING])
|
||||
|
||||
# Still missing some?
|
||||
for file in filelist:
|
||||
@@ -702,33 +556,29 @@ def main():
|
||||
# Final statistics
|
||||
# Suggestion: use git-log --before=mtime to brag about skipped log entries
|
||||
def log_info(msg, *a, width=13):
|
||||
ifmt = "{:%d,}" % (width,) # not using 'n' for consistency with ffmt
|
||||
ffmt = "{:%d,.2f}" % (width,)
|
||||
ifmt = '{:%d,}' % (width,) # not using 'n' for consistency with ffmt
|
||||
ffmt = '{:%d,.2f}' % (width,)
|
||||
# %-formatting lacks a thousand separator, must pre-render with .format()
|
||||
log.info(msg.replace("%d", ifmt).replace("%f", ffmt).format(*a))
|
||||
log.info(msg.replace('%d', ifmt).replace('%f', ffmt).format(*a))
|
||||
|
||||
log_info(
|
||||
"Statistics:\n%f seconds\n%d log lines processed\n%d commits evaluated",
|
||||
time.time() - start,
|
||||
stats["loglines"],
|
||||
stats["commits"],
|
||||
)
|
||||
"Statistics:\n"
|
||||
"%f seconds\n"
|
||||
"%d log lines processed\n"
|
||||
"%d commits evaluated",
|
||||
time.time() - start, stats['loglines'], stats['commits'])
|
||||
|
||||
if args.dirs:
|
||||
if stats["direrrors"]:
|
||||
log_info("%d directory update errors", stats["direrrors"])
|
||||
log_info("%d directories updated", stats["dirtouches"])
|
||||
if stats['direrrors']: log_info("%d directory update errors", stats['direrrors'])
|
||||
log_info("%d directories updated", stats['dirtouches'])
|
||||
|
||||
if stats["touches"] != stats["totalfiles"]:
|
||||
log_info("%d files", stats["totalfiles"])
|
||||
if stats["skip"]:
|
||||
log_info("%d files skipped", stats["skip"])
|
||||
if stats["files"]:
|
||||
log_info("%d files missing", stats["files"])
|
||||
if stats["errors"]:
|
||||
log_info("%d file update errors", stats["errors"])
|
||||
if stats['touches'] != stats['totalfiles']:
|
||||
log_info("%d files", stats['totalfiles'])
|
||||
if stats['skip']: log_info("%d files skipped", stats['skip'])
|
||||
if stats['files']: log_info("%d files missing", stats['files'])
|
||||
if stats['errors']: log_info("%d file update errors", stats['errors'])
|
||||
|
||||
log_info("%d files updated", stats["touches"])
|
||||
log_info("%d files updated", stats['touches'])
|
||||
|
||||
if args.test:
|
||||
log.info("TEST RUN - No files modified!")
|
||||
|
||||
7
.github/workflows/.codespell-exclude
vendored
Normal file
7
.github/workflows/.codespell-exclude
vendored
Normal file
@@ -0,0 +1,7 @@
|
||||
libs/community/langchain_community/llms/yuan2.py
|
||||
"NotIn": "not in",
|
||||
- `/checkin`: Check-in
|
||||
docs/docs/integrations/providers/trulens.mdx
|
||||
self.assertIn(
|
||||
from trulens_eval import Tru
|
||||
tru = Tru()
|
||||
27
.github/workflows/_compile_integration_test.yml
vendored
27
.github/workflows/_compile_integration_test.yml
vendored
@@ -1,4 +1,4 @@
|
||||
name: '🔗 Compile Integration Tests'
|
||||
name: compile-integration-test
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -12,11 +12,8 @@ on:
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -24,25 +21,27 @@ jobs:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
name: "poetry run pytest -m compile tests/integration_tests #${{ inputs.python-version }}"
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: compile-integration
|
||||
|
||||
- name: '📦 Install Integration Dependencies'
|
||||
- name: Install integration dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test --group test_integration
|
||||
run: poetry install --with=test_integration,test
|
||||
|
||||
- name: '🔗 Check Integration Tests Compile'
|
||||
- name: Check integration tests compile
|
||||
shell: bash
|
||||
run: uv run pytest -m compile tests/integration_tests
|
||||
run: poetry run pytest -m compile tests/integration_tests
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
45
.github/workflows/_integration_test.yml
vendored
45
.github/workflows/_integration_test.yml
vendored
@@ -1,5 +1,4 @@
|
||||
name: '🚀 Integration Tests'
|
||||
run-name: 'Test ${{ inputs.working-directory }} on Python ${{ inputs.python-version }}'
|
||||
name: Integration tests
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
@@ -7,18 +6,13 @@ on:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
python-version:
|
||||
required: true
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
default: "3.11"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -26,33 +20,44 @@ jobs:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
name: Python ${{ inputs.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
|
||||
- name: '📦 Install Integration Dependencies'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test --group test_integration
|
||||
run: poetry install --with test,test_integration
|
||||
|
||||
- name: '🚀 Run Integration Tests'
|
||||
- name: Install deps outside pyproject
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/community/') }}
|
||||
shell: bash
|
||||
run: poetry run pip install "boto3<2" "google-cloud-aiplatform<2"
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
shell: bash
|
||||
env:
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
|
||||
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
@@ -67,17 +72,19 @@ jobs:
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
ES_URL: ${{ secrets.ES_URL }}
|
||||
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
|
||||
ES_API_KEY: ${{ secrets.ES_API_KEY }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
|
||||
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
|
||||
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
|
||||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
|
||||
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
|
||||
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
|
||||
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
|
||||
run: |
|
||||
make integration_tests
|
||||
|
||||
|
||||
71
.github/workflows/_lint.yml
vendored
71
.github/workflows/_lint.yml
vendored
@@ -1,6 +1,4 @@
|
||||
name: '🧹 Code Linting'
|
||||
# Runs code quality checks using ruff, mypy, and other linting tools
|
||||
# Checks both package code and test code for consistency
|
||||
name: lint
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -14,33 +12,41 @@ on:
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
# This env var allows us to get inline annotations when ruff has complaints.
|
||||
RUFF_OUTPUT_FORMAT: github
|
||||
|
||||
UV_FROZEN: "true"
|
||||
|
||||
jobs:
|
||||
# Linting job - runs quality checks on package and test code
|
||||
build:
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
name: "make lint #${{ inputs.python-version }}"
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: lint-with-extras
|
||||
|
||||
- name: '📦 Install Lint & Typing Dependencies'
|
||||
- name: Check Poetry File
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry check
|
||||
|
||||
- name: Check lock file
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry lock --check
|
||||
|
||||
- name: Install dependencies
|
||||
# Also installs dev/lint/test/typing dependencies, to ensure we have
|
||||
# type hints for as many of our libraries as possible.
|
||||
# This helps catch errors that require dependencies to be spotted, for example:
|
||||
@@ -51,14 +57,24 @@ jobs:
|
||||
# It doesn't matter how you change it, any change will cause a cache-bust.
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv sync --group lint --group typing
|
||||
poetry install --with lint,typing
|
||||
|
||||
- name: '🔍 Analyze Package Code with Linters'
|
||||
- name: Get .mypy_cache to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache
|
||||
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ inputs.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
make lint_package
|
||||
|
||||
- name: '📦 Install Unit Test Dependencies'
|
||||
- name: Install unit test dependencies
|
||||
# Also installs dev/lint/test/typing dependencies, to ensure we have
|
||||
# type hints for as many of our libraries as possible.
|
||||
# This helps catch errors that require dependencies to be spotted, for example:
|
||||
@@ -70,14 +86,23 @@ jobs:
|
||||
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv sync --inexact --group test
|
||||
- name: '📦 Install Unit + Integration Test Dependencies'
|
||||
poetry install --with test
|
||||
- name: Install unit+integration test dependencies
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv sync --inexact --group test --group test_integration
|
||||
poetry install --with test,test_integration
|
||||
|
||||
- name: '🔍 Analyze Test Code with Linters'
|
||||
- name: Get .mypy_cache_test to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache_test
|
||||
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ inputs.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
make lint_tests
|
||||
|
||||
253
.github/workflows/_release.yml
vendored
253
.github/workflows/_release.yml
vendored
@@ -1,5 +1,5 @@
|
||||
name: '🚀 Package Release'
|
||||
run-name: 'Release ${{ inputs.working-directory }} ${{ inputs.release-version }}'
|
||||
name: release
|
||||
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
@@ -12,27 +12,18 @@ on:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
default: 'libs/langchain'
|
||||
release-version:
|
||||
required: true
|
||||
type: string
|
||||
default: '0.1.0'
|
||||
description: "New version of package being released"
|
||||
dangerous-nonmaster-release:
|
||||
required: false
|
||||
type: boolean
|
||||
default: false
|
||||
description: "Release from a non-master branch (danger!) - Only use for hotfixes"
|
||||
description: "Release from a non-master branch (danger!)"
|
||||
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# Build the distribution package and extract version info
|
||||
# Runs in isolated environment with minimal permissions for security
|
||||
build:
|
||||
if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release
|
||||
environment: Scheduled testing
|
||||
@@ -45,10 +36,13 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
# We want to keep this build stage *separate* from the release stage,
|
||||
# so that there's no sharing of permissions between them.
|
||||
@@ -62,7 +56,7 @@ jobs:
|
||||
# > from the publish job.
|
||||
# https://github.com/pypa/gh-action-pypi-publish#non-goals
|
||||
- name: Build project for distribution
|
||||
run: uv build
|
||||
run: poetry build
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Upload build
|
||||
@@ -71,20 +65,13 @@ jobs:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Check version
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
shell: python
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
import os
|
||||
import tomllib
|
||||
with open("pyproject.toml", "rb") as f:
|
||||
data = tomllib.load(f)
|
||||
pkg_name = data["project"]["name"]
|
||||
version = data["project"]["version"]
|
||||
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
|
||||
f.write(f"pkg-name={pkg_name}\n")
|
||||
f.write(f"version={version}\n")
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
release-notes:
|
||||
needs:
|
||||
- build
|
||||
@@ -100,7 +87,7 @@ jobs:
|
||||
${{ inputs.working-directory }}
|
||||
ref: ${{ github.ref }} # this scopes to just ref'd branch
|
||||
fetch-depth: 0 # this fetches entire commit history
|
||||
- name: Check tags
|
||||
- name: Check Tags
|
||||
id: check-tags
|
||||
shell: bash
|
||||
working-directory: langchain/${{ inputs.working-directory }}
|
||||
@@ -108,47 +95,9 @@ jobs:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
# Handle regular versions and pre-release versions differently
|
||||
if [[ "$VERSION" == *"-"* ]]; then
|
||||
# This is a pre-release version (contains a hyphen)
|
||||
# Extract the base version without the pre-release suffix
|
||||
BASE_VERSION=${VERSION%%-*}
|
||||
# Look for the latest release of the same base version
|
||||
REGEX="^$PKG_NAME==$BASE_VERSION\$"
|
||||
PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1)
|
||||
|
||||
# If no exact base version match, look for the latest release of any kind
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1)
|
||||
fi
|
||||
else
|
||||
# Regular version handling
|
||||
PREV_TAG="$PKG_NAME==${VERSION%.*}.$(( ${VERSION##*.} - 1 ))"; [[ "${VERSION##*.}" -eq 0 ]] && PREV_TAG=""
|
||||
|
||||
# backup case if releasing e.g. 0.3.0, looks up last release
|
||||
# note if last release (chronologically) was e.g. 0.1.47 it will get
|
||||
# that instead of the last 0.2 release
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
echo $REGEX
|
||||
PREV_TAG=$(git tag --sort=-creatordate | (grep -P $REGEX || true) | head -1)
|
||||
fi
|
||||
fi
|
||||
|
||||
# if PREV_TAG is empty, let it be empty
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
echo "No previous tag found - first release"
|
||||
else
|
||||
# confirm prev-tag actually exists in git repo with git tag
|
||||
GIT_TAG_RESULT=$(git tag -l "$PREV_TAG")
|
||||
if [ -z "$GIT_TAG_RESULT" ]; then
|
||||
echo "Previous tag $PREV_TAG not found in git repo"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
echo $REGEX
|
||||
PREV_TAG=$(git tag --sort=-creatordate | grep -P $REGEX || true | head -1)
|
||||
TAG="${PKG_NAME}==${VERSION}"
|
||||
if [ "$TAG" == "$PREV_TAG" ]; then
|
||||
echo "No new version to release"
|
||||
@@ -197,7 +146,6 @@ jobs:
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
@@ -214,18 +162,15 @@ jobs:
|
||||
# - The package is published, and it breaks on the missing dependency when
|
||||
# used in the real world.
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
id: setup-python
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Import dist package
|
||||
- name: Import published package
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
env:
|
||||
@@ -241,21 +186,27 @@ jobs:
|
||||
# - attempt install again after 5 seconds if it fails because there is
|
||||
# sometimes a delay in availability on test pypi
|
||||
run: |
|
||||
uv venv
|
||||
VIRTUAL_ENV=.venv uv pip install dist/*.whl
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" || \
|
||||
( \
|
||||
sleep 15 && \
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" \
|
||||
)
|
||||
|
||||
# Replace all dashes in the package name with underscores,
|
||||
# since that's how Python imports packages with dashes in the name.
|
||||
# also remove _official suffix
|
||||
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g | sed s/_official//g)"
|
||||
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
|
||||
|
||||
uv run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
poetry run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
|
||||
- name: Import test dependencies
|
||||
run: uv sync --group test
|
||||
run: poetry install --with test
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
# Overwrite the local version of the package with the built version
|
||||
# Overwrite the local version of the package with the test PyPI version.
|
||||
- name: Import published package (again)
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
shell: bash
|
||||
@@ -263,7 +214,9 @@ jobs:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install dist/*.whl
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION"
|
||||
|
||||
- name: Run unit tests
|
||||
run: make tests
|
||||
@@ -272,15 +225,15 @@ jobs:
|
||||
- name: Check for prerelease versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml
|
||||
poetry run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml
|
||||
|
||||
- name: Get minimum versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
id: min-version
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install packaging requests
|
||||
python_version="$(uv run python --version | awk '{print $2}')"
|
||||
min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml release $python_version)"
|
||||
poetry run pip install packaging
|
||||
python_version="$(poetry run python --version | awk '{print $2}')"
|
||||
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml release $python_version)"
|
||||
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
|
||||
echo "min-versions=$min_versions"
|
||||
|
||||
@@ -289,12 +242,18 @@ jobs:
|
||||
env:
|
||||
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install --force-reinstall $MIN_VERSIONS --editable .
|
||||
poetry run pip install --force-reinstall $MIN_VERSIONS --editable .
|
||||
make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Import integration test dependencies
|
||||
run: uv sync --group test --group test_integration
|
||||
run: poetry install --with test,test_integration
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Run integration tests
|
||||
@@ -310,7 +269,6 @@ jobs:
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
@@ -322,110 +280,29 @@ jobs:
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
ES_URL: ${{ secrets.ES_URL }}
|
||||
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
|
||||
ES_API_KEY: ${{ secrets.ES_API_KEY }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
|
||||
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
|
||||
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
|
||||
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
|
||||
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
|
||||
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
|
||||
UNSTRUCTURED_API_KEY: ${{ secrets.UNSTRUCTURED_API_KEY }}
|
||||
run: make integration_tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
# Test select published packages against new core
|
||||
test-prior-published-packages-against-new-core:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
partner: [openai, anthropic]
|
||||
fail-fast: false # Continue testing other partners if one fails
|
||||
env:
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
|
||||
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
# We implement this conditional as Github Actions does not have good support
|
||||
# for conditionally needing steps. https://github.com/actions/runner/issues/491
|
||||
- name: Check if libs/core
|
||||
run: |
|
||||
if [ "${{ startsWith(inputs.working-directory, 'libs/core') }}" != "true" ]; then
|
||||
echo "Not in libs/core. Exiting successfully."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
- name: Set up Python + uv
|
||||
if: startsWith(inputs.working-directory, 'libs/core')
|
||||
uses: "./.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
if: startsWith(inputs.working-directory, 'libs/core')
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Test against ${{ matrix.partner }}
|
||||
if: startsWith(inputs.working-directory, 'libs/core')
|
||||
run: |
|
||||
# Identify latest tag, excluding pre-releases
|
||||
LATEST_PACKAGE_TAG="$(
|
||||
git ls-remote --tags origin "langchain-${{ matrix.partner }}*" \
|
||||
| awk '{print $2}' \
|
||||
| sed 's|refs/tags/||' \
|
||||
| grep -Ev '==[^=]*(\.?dev[0-9]*|\.?rc[0-9]*)$' \
|
||||
| sort -Vr \
|
||||
| head -n 1
|
||||
)"
|
||||
echo "Latest package tag: $LATEST_PACKAGE_TAG"
|
||||
|
||||
# Shallow-fetch just that single tag
|
||||
git fetch --depth=1 origin tag "$LATEST_PACKAGE_TAG"
|
||||
|
||||
# Checkout the latest package files
|
||||
rm -rf $GITHUB_WORKSPACE/libs/partners/${{ matrix.partner }}/*
|
||||
rm -rf $GITHUB_WORKSPACE/libs/standard-tests/*
|
||||
cd $GITHUB_WORKSPACE/libs/
|
||||
git checkout "$LATEST_PACKAGE_TAG" -- standard-tests/
|
||||
git checkout "$LATEST_PACKAGE_TAG" -- partners/${{ matrix.partner }}/
|
||||
cd partners/${{ matrix.partner }}
|
||||
|
||||
# Print as a sanity check
|
||||
echo "Version number from pyproject.toml: "
|
||||
cat pyproject.toml | grep "version = "
|
||||
|
||||
# Run tests
|
||||
uv sync --group test --group test_integration
|
||||
uv pip install ../../core/dist/*.whl
|
||||
make integration_tests
|
||||
|
||||
publish:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
- test-prior-published-packages-against-new-core
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
# This permission is used for trusted publishing:
|
||||
@@ -442,10 +319,13 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
@@ -458,8 +338,6 @@ jobs:
|
||||
packages-dir: ${{ inputs.working-directory }}/dist/
|
||||
verbose: true
|
||||
print-hash: true
|
||||
# Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0
|
||||
attestations: false
|
||||
|
||||
mark-release:
|
||||
needs:
|
||||
@@ -481,16 +359,19 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
|
||||
- name: Create Tag
|
||||
uses: ncipollo/release-action@v1
|
||||
with:
|
||||
|
||||
62
.github/workflows/_release_docker.yml
vendored
Normal file
62
.github/workflows/_release_docker.yml
vendored
Normal file
@@ -0,0 +1,62 @@
|
||||
name: release_docker
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
dockerfile:
|
||||
required: true
|
||||
type: string
|
||||
description: "Path to the Dockerfile to build"
|
||||
image:
|
||||
required: true
|
||||
type: string
|
||||
description: "Name of the image to build"
|
||||
|
||||
env:
|
||||
TEST_TAG: ${{ inputs.image }}:test
|
||||
LATEST_TAG: ${{ inputs.image }}:latest
|
||||
|
||||
jobs:
|
||||
docker:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
- name: Get git tag
|
||||
uses: actions-ecosystem/action-get-latest-tag@v1
|
||||
id: get-latest-tag
|
||||
- name: Set docker tag
|
||||
env:
|
||||
VERSION: ${{ steps.get-latest-tag.outputs.tag }}
|
||||
run: |
|
||||
echo "VERSION_TAG=${{ inputs.image }}:${VERSION#v}" >> $GITHUB_ENV
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@v3
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
- name: Login to Docker Hub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
- name: Build for Test
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
file: ${{ inputs.dockerfile }}
|
||||
load: true
|
||||
tags: ${{ env.TEST_TAG }}
|
||||
- name: Test
|
||||
run: |
|
||||
docker run --rm ${{ env.TEST_TAG }} python -c "import langchain"
|
||||
- name: Build and Push to Docker Hub
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
file: ${{ inputs.dockerfile }}
|
||||
# We can only build for the intersection of platforms supported by
|
||||
# QEMU and base python image, for now build only for
|
||||
# linux/amd64 and linux/arm64
|
||||
platforms: linux/amd64,linux/arm64
|
||||
tags: ${{ env.LATEST_TAG }},${{ env.VERSION_TAG }}
|
||||
push: true
|
||||
46
.github/workflows/_test.yml
vendored
46
.github/workflows/_test.yml
vendored
@@ -1,6 +1,4 @@
|
||||
name: '🧪 Unit Testing'
|
||||
# Runs unit tests with both current and minimum supported dependency versions
|
||||
# to ensure compatibility across the supported range
|
||||
name: test
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -14,61 +12,57 @@ on:
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# Main test job - runs unit tests with current deps, then retests with minimum versions
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
name: "make test #${{ inputs.python-version }}"
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
id: setup-python
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
- name: '📦 Install Test Dependencies'
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test --dev
|
||||
run: poetry install --with test
|
||||
|
||||
- name: '🧪 Run Core Unit Tests'
|
||||
- name: Run core tests
|
||||
shell: bash
|
||||
run: |
|
||||
make test
|
||||
|
||||
- name: '🔍 Calculate Minimum Dependency Versions'
|
||||
- name: Get minimum versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
id: min-version
|
||||
shell: bash
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install packaging tomli requests
|
||||
python_version="$(uv run python --version | awk '{print $2}')"
|
||||
min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml pull_request $python_version)"
|
||||
poetry run pip install packaging tomli
|
||||
python_version="$(poetry run python --version | awk '{print $2}')"
|
||||
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml pull_request $python_version)"
|
||||
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
|
||||
echo "min-versions=$min_versions"
|
||||
|
||||
- name: '🧪 Run Tests with Minimum Dependencies'
|
||||
- name: Run unit tests with minimum dependency versions
|
||||
if: ${{ steps.min-version.outputs.min-versions != '' }}
|
||||
env:
|
||||
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install $MIN_VERSIONS
|
||||
poetry run pip install $MIN_VERSIONS
|
||||
make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
@@ -79,4 +73,4 @@ jobs:
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
|
||||
|
||||
|
||||
33
.github/workflows/_test_doc_imports.yml
vendored
33
.github/workflows/_test_doc_imports.yml
vendored
@@ -1,4 +1,4 @@
|
||||
name: '📑 Documentation Import Testing'
|
||||
name: test_doc_imports
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -8,40 +8,37 @@ on:
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: '🔍 Check Doc Imports (Python ${{ inputs.python-version }})'
|
||||
name: "check doc imports #${{ inputs.python-version }}"
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
cache-key: core
|
||||
|
||||
- name: '📦 Install Test Dependencies'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test
|
||||
run: poetry install --with test
|
||||
|
||||
- name: '📦 Install LangChain in Editable Mode'
|
||||
- name: Install langchain editable
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install langchain-experimental langchain-community -e libs/core libs/langchain
|
||||
poetry run pip install -e libs/core libs/langchain libs/community libs/experimental
|
||||
|
||||
- name: '🔍 Validate Documentation Import Statements'
|
||||
- name: Check doc imports
|
||||
shell: bash
|
||||
run: |
|
||||
uv run python docs/scripts/check_imports.py
|
||||
poetry run python docs/scripts/check_imports.py
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the test did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
35
.github/workflows/_test_pydantic.yml
vendored
35
.github/workflows/_test_pydantic.yml
vendored
@@ -1,4 +1,4 @@
|
||||
name: '🐍 Pydantic Version Testing'
|
||||
name: test pydantic intermediate versions
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -17,12 +17,8 @@ on:
|
||||
type: string
|
||||
description: "Pydantic version to test."
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -30,31 +26,32 @@ jobs:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: 'Pydantic ~=${{ inputs.pydantic-version }}'
|
||||
name: "make test # pydantic: ~=${{ inputs.pydantic-version }}, python: ${{ inputs.python-version }}, "
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
|
||||
- name: '📦 Install Test Dependencies'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test
|
||||
run: poetry install --with test
|
||||
|
||||
- name: '🔄 Install Specific Pydantic Version'
|
||||
- name: Overwrite pydantic version
|
||||
shell: bash
|
||||
run: VIRTUAL_ENV=.venv uv pip install pydantic~=${{ inputs.pydantic-version }}
|
||||
run: poetry run pip install pydantic~=${{ inputs.pydantic-version }}
|
||||
|
||||
- name: '🧪 Run Core Tests'
|
||||
- name: Run core tests
|
||||
shell: bash
|
||||
run: |
|
||||
make test
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
@@ -64,4 +61,4 @@ jobs:
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
36
.github/workflows/_test_release.yml
vendored
36
.github/workflows/_test_release.yml
vendored
@@ -1,4 +1,4 @@
|
||||
name: '🧪 Test Release Package'
|
||||
name: test-release
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -14,8 +14,8 @@ on:
|
||||
description: "Release from a non-master branch (danger!)"
|
||||
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
PYTHON_VERSION: "3.10"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -29,10 +29,13 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
# We want to keep this build stage *separate* from the release stage,
|
||||
# so that there's no sharing of permissions between them.
|
||||
@@ -45,30 +48,23 @@ jobs:
|
||||
# > It is strongly advised to separate jobs for building [...]
|
||||
# > from the publish job.
|
||||
# https://github.com/pypa/gh-action-pypi-publish#non-goals
|
||||
- name: '📦 Build Project for Distribution'
|
||||
run: uv build
|
||||
- name: Build project for distribution
|
||||
run: poetry build
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: '⬆️ Upload Build Artifacts'
|
||||
- name: Upload build
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: '🔍 Extract Version Information'
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
shell: python
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
import os
|
||||
import tomllib
|
||||
with open("pyproject.toml", "rb") as f:
|
||||
data = tomllib.load(f)
|
||||
pkg_name = data["project"]["name"]
|
||||
version = data["project"]["version"]
|
||||
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
|
||||
f.write(f"pkg-name={pkg_name}\n")
|
||||
f.write(f"version={version}\n")
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
|
||||
publish:
|
||||
needs:
|
||||
@@ -102,5 +98,3 @@ jobs:
|
||||
# This is *only for CI use* and is *extremely dangerous* otherwise!
|
||||
# https://github.com/pypa/gh-action-pypi-publish#tolerating-release-package-file-duplicates
|
||||
skip-existing: true
|
||||
# Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0
|
||||
attestations: false
|
||||
|
||||
120
.github/workflows/api_doc_build.yml
vendored
120
.github/workflows/api_doc_build.yml
vendored
@@ -1,120 +0,0 @@
|
||||
name: '📚 API Docs'
|
||||
run-name: 'Build & Deploy API Reference'
|
||||
# Runs daily or can be triggered manually for immediate updates
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
schedule:
|
||||
- cron: '0 13 * * *' # Daily at 1PM UTC
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
|
||||
jobs:
|
||||
# Only runs on main repository to prevent unnecessary builds on forks
|
||||
build:
|
||||
if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule'
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: read
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
path: langchain
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-api-docs-html
|
||||
path: langchain-api-docs-html
|
||||
token: ${{ secrets.TOKEN_GITHUB_API_DOCS_HTML }}
|
||||
|
||||
- name: '📋 Extract Repository List with yq'
|
||||
id: get-unsorted-repos
|
||||
uses: mikefarah/yq@master
|
||||
with:
|
||||
cmd: |
|
||||
yq '
|
||||
.packages[]
|
||||
| select(
|
||||
(
|
||||
(.repo | test("^langchain-ai/"))
|
||||
and
|
||||
(.repo != "langchain-ai/langchain")
|
||||
)
|
||||
or
|
||||
(.include_in_api_ref // false)
|
||||
)
|
||||
| .repo
|
||||
' langchain/libs/packages.yml
|
||||
|
||||
- name: '📋 Parse YAML & Checkout Repositories'
|
||||
env:
|
||||
REPOS_UNSORTED: ${{ steps.get-unsorted-repos.outputs.result }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
run: |
|
||||
# Get unique repositories
|
||||
REPOS=$(echo "$REPOS_UNSORTED" | sort -u)
|
||||
# Checkout each unique repository
|
||||
for repo in $REPOS; do
|
||||
# Validate repository format (allow any org with proper format)
|
||||
if [[ ! "$repo" =~ ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$ ]]; then
|
||||
echo "Error: Invalid repository format: $repo"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
REPO_NAME=$(echo $repo | cut -d'/' -f2)
|
||||
|
||||
# Additional validation for repo name
|
||||
if [[ ! "$REPO_NAME" =~ ^[a-zA-Z0-9_.-]+$ ]]; then
|
||||
echo "Error: Invalid repository name: $REPO_NAME"
|
||||
exit 1
|
||||
fi
|
||||
echo "Checking out $repo to $REPO_NAME"
|
||||
git clone --depth 1 https://github.com/$repo.git $REPO_NAME
|
||||
done
|
||||
|
||||
- name: '🐍 Setup Python ${{ env.PYTHON_VERSION }}'
|
||||
uses: actions/setup-python@v5
|
||||
id: setup-python
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
|
||||
- name: '📦 Install Initial Python Dependencies'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
python -m pip install -U uv
|
||||
python -m uv pip install --upgrade --no-cache-dir pip setuptools pyyaml
|
||||
|
||||
- name: '📦 Organize Library Directories'
|
||||
run: python langchain/.github/scripts/prep_api_docs_build.py
|
||||
|
||||
- name: '🧹 Remove Old HTML Files'
|
||||
run:
|
||||
rm -rf langchain-api-docs-html/api_reference_build/html
|
||||
|
||||
- name: '📦 Install Documentation Dependencies'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt
|
||||
python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests
|
||||
python -m uv pip install -r docs/api_reference/requirements.txt
|
||||
|
||||
- name: '🔧 Configure Git Settings'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
git config --local user.email "actions@github.com"
|
||||
git config --local user.name "Github Actions"
|
||||
|
||||
- name: '📚 Build API Documentation'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
python docs/api_reference/create_api_rst.py
|
||||
python -m sphinx -T -E -b html -d ../langchain-api-docs-html/_build/doctrees -c docs/api_reference docs/api_reference ../langchain-api-docs-html/api_reference_build/html -j auto
|
||||
python docs/api_reference/scripts/custom_formatter.py ../langchain-api-docs-html/api_reference_build/html
|
||||
# Default index page is blank so we copy in the actual home page.
|
||||
cp ../langchain-api-docs-html/api_reference_build/html/{reference,index}.html
|
||||
rm -rf ../langchain-api-docs-html/_build/
|
||||
|
||||
# https://github.com/marketplace/actions/add-commit
|
||||
- uses: EndBug/add-and-commit@v9
|
||||
with:
|
||||
cwd: langchain-api-docs-html
|
||||
message: 'Update API docs build'
|
||||
15
.github/workflows/check-broken-links.yml
vendored
15
.github/workflows/check-broken-links.yml
vendored
@@ -1,28 +1,25 @@
|
||||
name: '🔗 Check Broken Links'
|
||||
name: Check Broken Links
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
check-links:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
if: github.repository_owner == 'langchain-ai'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: '🟢 Setup Node.js 18.x'
|
||||
uses: actions/setup-node@v4
|
||||
- name: Use Node.js 18.x
|
||||
uses: actions/setup-node@v3
|
||||
with:
|
||||
node-version: 18.x
|
||||
cache: "yarn"
|
||||
cache-dependency-path: ./docs/yarn.lock
|
||||
- name: '📦 Install Node Dependencies'
|
||||
- name: Install dependencies
|
||||
run: yarn install --immutable --mode=skip-build
|
||||
working-directory: ./docs
|
||||
- name: '🔍 Scan Documentation for Broken Links'
|
||||
- name: Check broken links
|
||||
run: yarn check-broken-links
|
||||
working-directory: ./docs
|
||||
|
||||
34
.github/workflows/check_core_versions.yml
vendored
34
.github/workflows/check_core_versions.yml
vendored
@@ -1,34 +0,0 @@
|
||||
name: '🔍 Check `core` Version Equality'
|
||||
# Ensures version numbers in pyproject.toml and version.py stay in sync
|
||||
# Prevents releases with mismatched version numbers
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths:
|
||||
- 'libs/core/pyproject.toml'
|
||||
- 'libs/core/langchain_core/version.py'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
check_version_equality:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '✅ Verify pyproject.toml & version.py Match'
|
||||
run: |
|
||||
PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' libs/core/pyproject.toml)
|
||||
VERSION_PY_VERSION=$(grep -Po '(?<=^VERSION = ")[^"]*' libs/core/langchain_core/version.py)
|
||||
|
||||
# Compare the two versions
|
||||
if [ "$PYPROJECT_VERSION" != "$VERSION_PY_VERSION" ]; then
|
||||
echo "langchain-core versions in pyproject.toml and version.py do not match!"
|
||||
echo "pyproject.toml version: $PYPROJECT_VERSION"
|
||||
echo "version.py version: $VERSION_PY_VERSION"
|
||||
exit 1
|
||||
else
|
||||
echo "Versions match: $PYPROJECT_VERSION"
|
||||
fi
|
||||
74
.github/workflows/check_diffs.yml
vendored
74
.github/workflows/check_diffs.yml
vendored
@@ -1,12 +1,11 @@
|
||||
name: '🔧 CI'
|
||||
---
|
||||
name: CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
pull_request:
|
||||
merge_group:
|
||||
|
||||
# Optimizes CI performance by canceling redundant workflow runs
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
@@ -17,34 +16,22 @@ concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# This job analyzes which files changed and creates a dynamic test matrix
|
||||
# to only run tests/lints for the affected packages, improving CI efficiency
|
||||
build:
|
||||
name: 'Detect Changes & Set Matrix'
|
||||
runs-on: ubuntu-latest
|
||||
if: ${{ !contains(github.event.pull_request.labels.*.name, 'ci-ignore') }}
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- name: '🐍 Setup Python 3.11'
|
||||
uses: actions/setup-python@v5
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.11'
|
||||
- name: '📂 Get Changed Files'
|
||||
id: files
|
||||
uses: Ana06/get-changed-files@v2.3.0
|
||||
- name: '🔍 Analyze Changed Files & Generate Build Matrix'
|
||||
id: set-matrix
|
||||
- id: files
|
||||
uses: Ana06/get-changed-files@v2.2.0
|
||||
- id: set-matrix
|
||||
run: |
|
||||
python -m pip install packaging requests
|
||||
python -m pip install packaging
|
||||
python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
|
||||
outputs:
|
||||
lint: ${{ steps.set-matrix.outputs.lint }}
|
||||
@@ -54,8 +41,8 @@ jobs:
|
||||
dependencies: ${{ steps.set-matrix.outputs.dependencies }}
|
||||
test-doc-imports: ${{ steps.set-matrix.outputs.test-doc-imports }}
|
||||
test-pydantic: ${{ steps.set-matrix.outputs.test-pydantic }}
|
||||
# Run linting only on packages that have changed files
|
||||
lint:
|
||||
name: cd ${{ matrix.job-configs.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.lint != '[]' }}
|
||||
strategy:
|
||||
@@ -68,8 +55,8 @@ jobs:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
secrets: inherit
|
||||
|
||||
# Run unit tests only on packages that have changed files
|
||||
test:
|
||||
name: cd ${{ matrix.job-configs.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.test != '[]' }}
|
||||
strategy:
|
||||
@@ -82,8 +69,8 @@ jobs:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
secrets: inherit
|
||||
|
||||
# Test compatibility with different Pydantic versions for affected packages
|
||||
test-pydantic:
|
||||
name: cd ${{ matrix.job-configs.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.test-pydantic != '[]' }}
|
||||
strategy:
|
||||
@@ -104,12 +91,12 @@ jobs:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.test-doc-imports) }}
|
||||
fail-fast: false
|
||||
uses: ./.github/workflows/_test_doc_imports.yml
|
||||
secrets: inherit
|
||||
with:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
secrets: inherit
|
||||
|
||||
# Verify integration tests compile without actually running them (faster feedback)
|
||||
compile-integration-tests:
|
||||
name: cd ${{ matrix.job-configs.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.compile-integration-tests != '[]' }}
|
||||
strategy:
|
||||
@@ -122,9 +109,8 @@ jobs:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
secrets: inherit
|
||||
|
||||
# Run extended test suites that require additional dependencies
|
||||
extended-tests:
|
||||
name: 'Extended Tests'
|
||||
name: "cd ${{ matrix.job-configs.working-directory }} / make extended_tests #${{ matrix.job-configs.python-version }}"
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.extended-tests != '[]' }}
|
||||
strategy:
|
||||
@@ -133,28 +119,32 @@ jobs:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.extended-tests) }}
|
||||
fail-fast: false
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ matrix.job-configs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ matrix.job-configs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
cache-key: extended
|
||||
|
||||
- name: '📦 Install Dependencies & Run Extended Tests'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with uv..."
|
||||
uv venv
|
||||
uv sync --group test
|
||||
VIRTUAL_ENV=.venv uv pip install -r extended_testing_deps.txt
|
||||
VIRTUAL_ENV=.venv make extended_tests
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install --with test
|
||||
poetry run pip install uv
|
||||
poetry run uv pip install -r extended_testing_deps.txt
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
@@ -165,10 +155,8 @@ jobs:
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
|
||||
# Final status check - ensures all required jobs passed before allowing merge
|
||||
ci_success:
|
||||
name: '✅ CI Success'
|
||||
name: "CI Success"
|
||||
needs: [build, lint, test, compile-integration-tests, extended-tests, test-doc-imports, test-pydantic]
|
||||
if: |
|
||||
always()
|
||||
@@ -178,7 +166,7 @@ jobs:
|
||||
RESULTS_JSON: ${{ toJSON(needs.*.result) }}
|
||||
EXIT_CODE: ${{!contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') && '0' || '1'}}
|
||||
steps:
|
||||
- name: '🎉 All Checks Passed'
|
||||
- name: "CI Success"
|
||||
run: |
|
||||
echo $JOBS_JSON
|
||||
echo $RESULTS_JSON
|
||||
|
||||
10
.github/workflows/check_new_docs.yml
vendored
10
.github/workflows/check_new_docs.yml
vendored
@@ -1,4 +1,5 @@
|
||||
name: '📑 Integration Docs Lint'
|
||||
---
|
||||
name: Integration docs lint
|
||||
|
||||
on:
|
||||
push:
|
||||
@@ -15,9 +16,6 @@ concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
@@ -27,12 +25,12 @@ jobs:
|
||||
with:
|
||||
python-version: '3.10'
|
||||
- id: files
|
||||
uses: Ana06/get-changed-files@v2.3.0
|
||||
uses: Ana06/get-changed-files@v2.2.0
|
||||
with:
|
||||
filter: |
|
||||
*.ipynb
|
||||
*.md
|
||||
*.mdx
|
||||
- name: '🔍 Check New Documentation Templates'
|
||||
- name: Check new docs
|
||||
run: |
|
||||
python docs/scripts/check_templates.py ${{ steps.files.outputs.added }}
|
||||
|
||||
36
.github/workflows/codespell.yml
vendored
Normal file
36
.github/workflows/codespell.yml
vendored
Normal file
@@ -0,0 +1,36 @@
|
||||
---
|
||||
name: CI / cd . / make spell_check
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master, v0.1, v0.2]
|
||||
pull_request:
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
codespell:
|
||||
name: (Check for spelling errors)
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Install Dependencies
|
||||
run: |
|
||||
pip install toml
|
||||
|
||||
- name: Extract Ignore Words List
|
||||
run: |
|
||||
# Use a Python script to extract the ignore words list from pyproject.toml
|
||||
python .github/workflows/extract_ignored_words_list.py
|
||||
id: extract_ignore_words
|
||||
|
||||
# - name: Codespell
|
||||
# uses: codespell-project/actions-codespell@v2
|
||||
# with:
|
||||
# skip: guide_imports.json,*.ambr,./cookbook/data/imdb_top_1000.csv,*.lock
|
||||
# ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
|
||||
# exclude_file: ./.github/workflows/codespell-exclude
|
||||
66
.github/workflows/codspeed.yml
vendored
66
.github/workflows/codspeed.yml
vendored
@@ -1,66 +0,0 @@
|
||||
name: '⚡ CodSpeed'
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
pull_request:
|
||||
workflow_dispatch:
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: foo
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: foo
|
||||
DEEPSEEK_API_KEY: foo
|
||||
FIREWORKS_API_KEY: foo
|
||||
|
||||
jobs:
|
||||
codspeed:
|
||||
name: 'Benchmark'
|
||||
runs-on: ubuntu-latest
|
||||
if: ${{ !contains(github.event.pull_request.labels.*.name, 'codspeed-ignore') }}
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- working-directory: libs/core
|
||||
mode: walltime
|
||||
- working-directory: libs/partners/openai
|
||||
- working-directory: libs/partners/anthropic
|
||||
- working-directory: libs/partners/deepseek
|
||||
- working-directory: libs/partners/fireworks
|
||||
- working-directory: libs/partners/xai
|
||||
- working-directory: libs/partners/mistralai
|
||||
- working-directory: libs/partners/groq
|
||||
fail-fast: false
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
# We have to use 3.12 as 3.13 is not yet supported
|
||||
- name: '📦 Install UV Package Manager'
|
||||
uses: astral-sh/setup-uv@v6
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- name: '📦 Install Test Dependencies'
|
||||
run: uv sync --group test
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
|
||||
- name: '⚡ Run Benchmarks: ${{ matrix.working-directory }}'
|
||||
uses: CodSpeedHQ/action@v3
|
||||
with:
|
||||
token: ${{ secrets.CODSPEED_TOKEN }}
|
||||
run: |
|
||||
cd ${{ matrix.working-directory }}
|
||||
if [ "${{ matrix.working-directory }}" = "libs/core" ]; then
|
||||
uv run --no-sync pytest ./tests/benchmarks --codspeed
|
||||
else
|
||||
uv run --no-sync pytest ./tests/ --codspeed
|
||||
fi
|
||||
mode: ${{ matrix.mode || 'instrumentation' }}
|
||||
14
.github/workflows/langchain_release_docker.yml
vendored
Normal file
14
.github/workflows/langchain_release_docker.yml
vendored
Normal file
@@ -0,0 +1,14 @@
|
||||
---
|
||||
name: docker/langchain/langchain Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
workflow_call: # Allows triggering from another workflow
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses: ./.github/workflows/_release_docker.yml
|
||||
with:
|
||||
dockerfile: docker/Dockerfile.base
|
||||
image: langchain/langchain
|
||||
secrets: inherit
|
||||
27
.github/workflows/people.yml
vendored
27
.github/workflows/people.yml
vendored
@@ -1,28 +1,37 @@
|
||||
name: '👥 LangChain People'
|
||||
run-name: 'Update People Data'
|
||||
# This workflow updates the LangChain People data by fetching the latest information from the LangChain Git
|
||||
name: LangChain People
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 14 1 * *"
|
||||
push:
|
||||
branches: [jacob/people]
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
debug_enabled:
|
||||
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
|
||||
required: false
|
||||
default: 'false'
|
||||
|
||||
jobs:
|
||||
langchain-people:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
if: github.repository_owner == 'langchain-ai'
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: write
|
||||
permissions: write-all
|
||||
steps:
|
||||
- name: '📋 Dump GitHub Context'
|
||||
- name: Dump GitHub context
|
||||
env:
|
||||
GITHUB_CONTEXT: ${{ toJson(github) }}
|
||||
run: echo "$GITHUB_CONTEXT"
|
||||
- uses: actions/checkout@v4
|
||||
# Ref: https://github.com/actions/runner/issues/2033
|
||||
- name: '🔧 Fix Git Safe Directory in Container'
|
||||
- name: Fix git safe.directory in container
|
||||
run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig
|
||||
# Allow debugging with tmate
|
||||
- name: Setup tmate session
|
||||
uses: mxschmitt/action-tmate@v3
|
||||
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled == 'true' }}
|
||||
with:
|
||||
limit-access-to-actor: true
|
||||
- uses: ./.github/actions/people
|
||||
with:
|
||||
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
|
||||
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
|
||||
111
.github/workflows/pr_lint.yml
vendored
111
.github/workflows/pr_lint.yml
vendored
@@ -1,111 +0,0 @@
|
||||
# -----------------------------------------------------------------------------
|
||||
# PR Title Lint Workflow
|
||||
#
|
||||
# Purpose:
|
||||
# Enforces Conventional Commits format for pull request titles to maintain a
|
||||
# clear, consistent, and machine-readable change history across our repository.
|
||||
# This helps with automated changelog generation and semantic versioning.
|
||||
#
|
||||
# Enforced Commit Message Format (Conventional Commits 1.0.0):
|
||||
# <type>[optional scope]: <description>
|
||||
# [optional body]
|
||||
# [optional footer(s)]
|
||||
#
|
||||
# Allowed Types:
|
||||
# • feat — a new feature (MINOR bump)
|
||||
# • fix — a bug fix (PATCH bump)
|
||||
# • docs — documentation only changes
|
||||
# • style — formatting, missing semi-colons, etc.; no code change
|
||||
# • refactor — code change that neither fixes a bug nor adds a feature
|
||||
# • perf — code change that improves performance
|
||||
# • test — adding missing tests or correcting existing tests
|
||||
# • build — changes that affect the build system or external dependencies
|
||||
# • ci — continuous integration/configuration changes
|
||||
# • chore — other changes that don't modify src or test files
|
||||
# • revert — reverts a previous commit
|
||||
# • release — prepare a new release
|
||||
#
|
||||
# Allowed Scopes (optional):
|
||||
# core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek,
|
||||
# exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai,
|
||||
# perplexity, prompty, qdrant, xai
|
||||
#
|
||||
# Rules & Tips for New Committers:
|
||||
# 1. Subject (type) must start with a lowercase letter and, if possible, be
|
||||
# followed by a scope wrapped in parenthesis `(scope)`
|
||||
# 2. Breaking changes:
|
||||
# – Append "!" after type/scope (e.g., feat!: drop Node 12 support)
|
||||
# – Or include a footer "BREAKING CHANGE: <details>"
|
||||
# 3. Example PR titles:
|
||||
# feat(core): add multi‐tenant support
|
||||
# fix(cli): resolve flag parsing error
|
||||
# docs: update API usage examples
|
||||
# docs(openai): update API usage examples
|
||||
#
|
||||
# Resources:
|
||||
# • Conventional Commits spec: https://www.conventionalcommits.org/en/v1.0.0/
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
name: '🏷️ PR Title Lint'
|
||||
|
||||
permissions:
|
||||
pull-requests: read
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
types: [opened, edited, synchronize]
|
||||
|
||||
jobs:
|
||||
# Validates that PR title follows Conventional Commits specification
|
||||
lint-pr-title:
|
||||
name: 'Validate PR Title Format'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: '✅ Validate Conventional Commits Format'
|
||||
uses: amannn/action-semantic-pull-request@v5
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
with:
|
||||
types: |
|
||||
feat
|
||||
fix
|
||||
docs
|
||||
style
|
||||
refactor
|
||||
perf
|
||||
test
|
||||
build
|
||||
ci
|
||||
chore
|
||||
revert
|
||||
release
|
||||
scopes: |
|
||||
core
|
||||
cli
|
||||
langchain
|
||||
langchain_v1
|
||||
standard-tests
|
||||
text-splitters
|
||||
docs
|
||||
anthropic
|
||||
chroma
|
||||
deepseek
|
||||
exa
|
||||
fireworks
|
||||
groq
|
||||
huggingface
|
||||
mistralai
|
||||
nomic
|
||||
ollama
|
||||
openai
|
||||
perplexity
|
||||
prompty
|
||||
qdrant
|
||||
xai
|
||||
infra
|
||||
requireScope: false
|
||||
disallowScopes: |
|
||||
release
|
||||
[A-Z]+
|
||||
ignoreLabels: |
|
||||
ignore-lint-pr-title
|
||||
75
.github/workflows/run_notebooks.yml
vendored
75
.github/workflows/run_notebooks.yml
vendored
@@ -1,75 +0,0 @@
|
||||
name: '📓 Validate Documentation Notebooks'
|
||||
run-name: 'Test notebooks in ${{ inputs.working-directory }}'
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
python_version:
|
||||
description: 'Python version'
|
||||
required: false
|
||||
default: '3.11'
|
||||
working-directory:
|
||||
description: 'Working directory or subset (e.g., docs/docs/tutorials/llm_chain.ipynb or docs/docs/how_to)'
|
||||
required: false
|
||||
default: 'all'
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule'
|
||||
name: '📑 Test Documentation Notebooks'
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ github.event.inputs.python_version || '3.11' }}
|
||||
|
||||
- name: '🔐 Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: '🔐 Configure AWS Credentials'
|
||||
uses: aws-actions/configure-aws-credentials@v4
|
||||
with:
|
||||
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
aws-region: ${{ secrets.AWS_REGION }}
|
||||
|
||||
- name: '📦 Install Dependencies'
|
||||
run: |
|
||||
uv sync --group dev --group test
|
||||
|
||||
- name: '📦 Pre-download Test Files'
|
||||
run: |
|
||||
uv run python docs/scripts/cache_data.py
|
||||
curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 docs/docs/how_to/Chinook.db
|
||||
cp docs/docs/how_to/Chinook.db docs/docs/tutorials/Chinook.db
|
||||
|
||||
- name: '🔧 Prepare Notebooks for CI'
|
||||
run: |
|
||||
uv run python docs/scripts/prepare_notebooks_for_ci.py --comment-install-cells --working-directory ${{ github.event.inputs.working-directory || 'all' }}
|
||||
|
||||
- name: '🚀 Execute Notebooks'
|
||||
env:
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
TAVILY_API_KEY: ${{ secrets.TAVILY_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
WORKING_DIRECTORY: ${{ github.event.inputs.working-directory || 'all' }}
|
||||
run: |
|
||||
./docs/scripts/execute_notebooks.sh $WORKING_DIRECTORY
|
||||
114
.github/workflows/scheduled_test.yml
vendored
114
.github/workflows/scheduled_test.yml
vendored
@@ -1,71 +1,33 @@
|
||||
name: '⏰ Scheduled Integration Tests'
|
||||
run-name: "Run Integration Tests - ${{ inputs.working-directory-force || 'all libs' }} (Python ${{ inputs.python-version-force || '3.9, 3.11' }})"
|
||||
name: Scheduled tests
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows maintainers to trigger the workflow manually in GitHub UI
|
||||
inputs:
|
||||
working-directory-force:
|
||||
type: string
|
||||
description: "From which folder this pipeline executes - defaults to all in matrix - example value: libs/partners/anthropic"
|
||||
python-version-force:
|
||||
type: string
|
||||
description: "Python version to use - defaults to 3.9 and 3.11 in matrix - example value: 3.9"
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
schedule:
|
||||
- cron: '0 13 * * *' # Runs daily at 1PM UTC (9AM EDT/6AM PDT)
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.8.4"
|
||||
UV_FROZEN: "true"
|
||||
DEFAULT_LIBS: '["libs/partners/openai", "libs/partners/anthropic", "libs/partners/fireworks", "libs/partners/groq", "libs/partners/mistralai", "libs/partners/xai", "libs/partners/google-vertexai", "libs/partners/google-genai", "libs/partners/aws"]'
|
||||
POETRY_LIBS: ("libs/partners/google-vertexai" "libs/partners/google-genai" "libs/partners/aws")
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# Generate dynamic test matrix based on input parameters or defaults
|
||||
# Only runs on the main repo (for scheduled runs) or when manually triggered
|
||||
compute-matrix:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
runs-on: ubuntu-latest
|
||||
name: '📋 Compute Test Matrix'
|
||||
outputs:
|
||||
matrix: ${{ steps.set-matrix.outputs.matrix }}
|
||||
steps:
|
||||
- name: '🔢 Generate Python & Library Matrix'
|
||||
id: set-matrix
|
||||
env:
|
||||
DEFAULT_LIBS: ${{ env.DEFAULT_LIBS }}
|
||||
WORKING_DIRECTORY_FORCE: ${{ github.event.inputs.working-directory-force || '' }}
|
||||
PYTHON_VERSION_FORCE: ${{ github.event.inputs.python-version-force || '' }}
|
||||
run: |
|
||||
# echo "matrix=..." where matrix is a json formatted str with keys python-version and working-directory
|
||||
# python-version should default to 3.9 and 3.11, but is overridden to [PYTHON_VERSION_FORCE] if set
|
||||
# working-directory should default to DEFAULT_LIBS, but is overridden to [WORKING_DIRECTORY_FORCE] if set
|
||||
python_version='["3.9", "3.11"]'
|
||||
working_directory="$DEFAULT_LIBS"
|
||||
if [ -n "$PYTHON_VERSION_FORCE" ]; then
|
||||
python_version="[\"$PYTHON_VERSION_FORCE\"]"
|
||||
fi
|
||||
if [ -n "$WORKING_DIRECTORY_FORCE" ]; then
|
||||
working_directory="[\"$WORKING_DIRECTORY_FORCE\"]"
|
||||
fi
|
||||
matrix="{\"python-version\": $python_version, \"working-directory\": $working_directory}"
|
||||
echo $matrix
|
||||
echo "matrix=$matrix" >> $GITHUB_OUTPUT
|
||||
# Run integration tests against partner libraries with live API credentials
|
||||
# Tests are run with both Poetry and UV depending on the library's setup
|
||||
build:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
name: '🐍 Python ${{ matrix.python-version }}: ${{ matrix.working-directory }}'
|
||||
if: github.repository_owner == 'langchain-ai'
|
||||
name: Python ${{ matrix.python-version }} - ${{ matrix.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
needs: [compute-matrix]
|
||||
timeout-minutes: 20
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(needs.compute-matrix.outputs.matrix).python-version }}
|
||||
working-directory: ${{ fromJSON(needs.compute-matrix.outputs.matrix).working-directory }}
|
||||
python-version:
|
||||
- "3.9"
|
||||
- "3.11"
|
||||
working-directory:
|
||||
- "libs/partners/openai"
|
||||
- "libs/partners/anthropic"
|
||||
- "libs/partners/fireworks"
|
||||
- "libs/partners/groq"
|
||||
- "libs/partners/mistralai"
|
||||
- "libs/partners/google-vertexai"
|
||||
- "libs/partners/google-genai"
|
||||
- "libs/partners/aws"
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
@@ -80,7 +42,7 @@ jobs:
|
||||
repository: langchain-ai/langchain-aws
|
||||
path: langchain-aws
|
||||
|
||||
- name: '📦 Organize External Libraries'
|
||||
- name: Move libs
|
||||
run: |
|
||||
rm -rf \
|
||||
langchain/libs/partners/google-genai \
|
||||
@@ -89,8 +51,7 @@ jobs:
|
||||
mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai
|
||||
mv langchain-aws/libs/aws langchain/libs/partners/aws
|
||||
|
||||
- name: '🐍 Set up Python ${{ matrix.python-version }} + Poetry'
|
||||
if: contains(env.POETRY_LIBS, matrix.working-directory)
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: "./langchain/.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
@@ -98,77 +59,56 @@ jobs:
|
||||
working-directory: langchain/${{ matrix.working-directory }}
|
||||
cache-key: scheduled
|
||||
|
||||
- name: '🐍 Set up Python ${{ matrix.python-version }} + UV'
|
||||
if: "!contains(env.POETRY_LIBS, matrix.working-directory)"
|
||||
uses: "./langchain/.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
|
||||
- name: '🔐 Authenticate to Google Cloud'
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: '🔐 Configure AWS Credentials'
|
||||
- name: Configure AWS Credentials
|
||||
uses: aws-actions/configure-aws-credentials@v4
|
||||
with:
|
||||
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
aws-region: ${{ secrets.AWS_REGION }}
|
||||
|
||||
- name: '📦 Install Dependencies (Poetry)'
|
||||
if: contains(env.POETRY_LIBS, matrix.working-directory)
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
poetry install --with=test_integration,test
|
||||
|
||||
- name: '📦 Install Dependencies (UV)'
|
||||
if: "!contains(env.POETRY_LIBS, matrix.working-directory)"
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with uv..."
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
uv sync --group test --group test_integration
|
||||
|
||||
- name: '🚀 Run Integration Tests'
|
||||
- name: Run integration tests
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
|
||||
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
|
||||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
|
||||
run: |
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
make integration_tests
|
||||
|
||||
- name: '🧹 Clean up External Libraries'
|
||||
# Clean up external libraries to avoid affecting git status check
|
||||
run: |
|
||||
- name: Remove external libraries
|
||||
run: |
|
||||
rm -rf \
|
||||
langchain/libs/partners/google-genai \
|
||||
langchain/libs/partners/google-vertexai \
|
||||
langchain/libs/partners/aws
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
working-directory: langchain
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
2
.gitignore
vendored
2
.gitignore
vendored
@@ -1,4 +1,5 @@
|
||||
.vs/
|
||||
.vscode/
|
||||
.idea/
|
||||
# Byte-compiled / optimized / DLL files
|
||||
__pycache__/
|
||||
@@ -58,7 +59,6 @@ coverage.xml
|
||||
*.py,cover
|
||||
.hypothesis/
|
||||
.pytest_cache/
|
||||
.codspeed/
|
||||
|
||||
# Translations
|
||||
*.mo
|
||||
|
||||
@@ -1,14 +0,0 @@
|
||||
{
|
||||
"MD013": false,
|
||||
"MD024": {
|
||||
"siblings_only": true
|
||||
},
|
||||
"MD025": false,
|
||||
"MD033": false,
|
||||
"MD034": false,
|
||||
"MD036": false,
|
||||
"MD041": false,
|
||||
"MD046": {
|
||||
"style": "fenced"
|
||||
}
|
||||
}
|
||||
@@ -1,111 +0,0 @@
|
||||
repos:
|
||||
- repo: local
|
||||
hooks:
|
||||
- id: core
|
||||
name: format core
|
||||
language: system
|
||||
entry: make -C libs/core format
|
||||
files: ^libs/core/
|
||||
pass_filenames: false
|
||||
- id: langchain
|
||||
name: format langchain
|
||||
language: system
|
||||
entry: make -C libs/langchain format
|
||||
files: ^libs/langchain/
|
||||
pass_filenames: false
|
||||
- id: standard-tests
|
||||
name: format standard-tests
|
||||
language: system
|
||||
entry: make -C libs/standard-tests format
|
||||
files: ^libs/standard-tests/
|
||||
pass_filenames: false
|
||||
- id: text-splitters
|
||||
name: format text-splitters
|
||||
language: system
|
||||
entry: make -C libs/text-splitters format
|
||||
files: ^libs/text-splitters/
|
||||
pass_filenames: false
|
||||
- id: anthropic
|
||||
name: format partners/anthropic
|
||||
language: system
|
||||
entry: make -C libs/partners/anthropic format
|
||||
files: ^libs/partners/anthropic/
|
||||
pass_filenames: false
|
||||
- id: chroma
|
||||
name: format partners/chroma
|
||||
language: system
|
||||
entry: make -C libs/partners/chroma format
|
||||
files: ^libs/partners/chroma/
|
||||
pass_filenames: false
|
||||
- id: couchbase
|
||||
name: format partners/couchbase
|
||||
language: system
|
||||
entry: make -C libs/partners/couchbase format
|
||||
files: ^libs/partners/couchbase/
|
||||
pass_filenames: false
|
||||
- id: exa
|
||||
name: format partners/exa
|
||||
language: system
|
||||
entry: make -C libs/partners/exa format
|
||||
files: ^libs/partners/exa/
|
||||
pass_filenames: false
|
||||
- id: fireworks
|
||||
name: format partners/fireworks
|
||||
language: system
|
||||
entry: make -C libs/partners/fireworks format
|
||||
files: ^libs/partners/fireworks/
|
||||
pass_filenames: false
|
||||
- id: groq
|
||||
name: format partners/groq
|
||||
language: system
|
||||
entry: make -C libs/partners/groq format
|
||||
files: ^libs/partners/groq/
|
||||
pass_filenames: false
|
||||
- id: huggingface
|
||||
name: format partners/huggingface
|
||||
language: system
|
||||
entry: make -C libs/partners/huggingface format
|
||||
files: ^libs/partners/huggingface/
|
||||
pass_filenames: false
|
||||
- id: mistralai
|
||||
name: format partners/mistralai
|
||||
language: system
|
||||
entry: make -C libs/partners/mistralai format
|
||||
files: ^libs/partners/mistralai/
|
||||
pass_filenames: false
|
||||
- id: nomic
|
||||
name: format partners/nomic
|
||||
language: system
|
||||
entry: make -C libs/partners/nomic format
|
||||
files: ^libs/partners/nomic/
|
||||
pass_filenames: false
|
||||
- id: ollama
|
||||
name: format partners/ollama
|
||||
language: system
|
||||
entry: make -C libs/partners/ollama format
|
||||
files: ^libs/partners/ollama/
|
||||
pass_filenames: false
|
||||
- id: openai
|
||||
name: format partners/openai
|
||||
language: system
|
||||
entry: make -C libs/partners/openai format
|
||||
files: ^libs/partners/openai/
|
||||
pass_filenames: false
|
||||
- id: prompty
|
||||
name: format partners/prompty
|
||||
language: system
|
||||
entry: make -C libs/partners/prompty format
|
||||
files: ^libs/partners/prompty/
|
||||
pass_filenames: false
|
||||
- id: qdrant
|
||||
name: format partners/qdrant
|
||||
language: system
|
||||
entry: make -C libs/partners/qdrant format
|
||||
files: ^libs/partners/qdrant/
|
||||
pass_filenames: false
|
||||
- id: root
|
||||
name: format docs, cookbook
|
||||
language: system
|
||||
entry: make format
|
||||
files: ^(docs|cookbook)/
|
||||
pass_filenames: false
|
||||
@@ -1,7 +1,12 @@
|
||||
# Read the Docs configuration file
|
||||
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
|
||||
|
||||
# Required
|
||||
version: 2
|
||||
|
||||
formats:
|
||||
- pdf
|
||||
|
||||
# Set the version of Python and other tools you might need
|
||||
build:
|
||||
os: ubuntu-22.04
|
||||
@@ -10,16 +15,15 @@ build:
|
||||
commands:
|
||||
- mkdir -p $READTHEDOCS_OUTPUT
|
||||
- cp -r api_reference_build/* $READTHEDOCS_OUTPUT
|
||||
|
||||
# Build documentation in the docs/ directory with Sphinx
|
||||
sphinx:
|
||||
configuration: docs/api_reference/conf.py
|
||||
configuration: docs/api_reference/conf.py
|
||||
|
||||
# If using Sphinx, optionally build your docs in additional formats such as PDF
|
||||
formats:
|
||||
- pdf
|
||||
# formats:
|
||||
# - pdf
|
||||
|
||||
# Optionally declare the Python requirements required to build your docs
|
||||
python:
|
||||
install:
|
||||
- requirements: docs/api_reference/requirements.txt
|
||||
install:
|
||||
- requirements: docs/api_reference/requirements.txt
|
||||
|
||||
21
.vscode/extensions.json
vendored
21
.vscode/extensions.json
vendored
@@ -1,21 +0,0 @@
|
||||
{
|
||||
"recommendations": [
|
||||
"ms-python.python",
|
||||
"charliermarsh.ruff",
|
||||
"ms-python.mypy-type-checker",
|
||||
"ms-toolsai.jupyter",
|
||||
"ms-toolsai.jupyter-keymap",
|
||||
"ms-toolsai.jupyter-renderers",
|
||||
"ms-toolsai.vscode-jupyter-cell-tags",
|
||||
"ms-toolsai.vscode-jupyter-slideshow",
|
||||
"yzhang.markdown-all-in-one",
|
||||
"davidanson.vscode-markdownlint",
|
||||
"bierner.markdown-mermaid",
|
||||
"bierner.markdown-preview-github-styles",
|
||||
"eamodio.gitlens",
|
||||
"github.vscode-pull-request-github",
|
||||
"github.vscode-github-actions",
|
||||
"redhat.vscode-yaml",
|
||||
"editorconfig.editorconfig",
|
||||
],
|
||||
}
|
||||
82
.vscode/settings.json
vendored
82
.vscode/settings.json
vendored
@@ -1,82 +0,0 @@
|
||||
{
|
||||
"python.analysis.include": [
|
||||
"libs/**",
|
||||
"docs/**",
|
||||
"cookbook/**"
|
||||
],
|
||||
"python.analysis.exclude": [
|
||||
"**/node_modules",
|
||||
"**/__pycache__",
|
||||
"**/.pytest_cache",
|
||||
"**/.*",
|
||||
"_dist/**",
|
||||
"docs/_build/**",
|
||||
"docs/api_reference/_build/**"
|
||||
],
|
||||
"python.analysis.autoImportCompletions": true,
|
||||
"python.analysis.typeCheckingMode": "basic",
|
||||
"python.testing.cwd": "${workspaceFolder}",
|
||||
"python.linting.enabled": true,
|
||||
"python.linting.ruffEnabled": true,
|
||||
"[python]": {
|
||||
"editor.formatOnSave": true,
|
||||
"editor.codeActionsOnSave": {
|
||||
"source.organizeImports.ruff": "explicit",
|
||||
"source.fixAll": "explicit"
|
||||
},
|
||||
"editor.defaultFormatter": "charliermarsh.ruff"
|
||||
},
|
||||
"editor.rulers": [
|
||||
88
|
||||
],
|
||||
"editor.tabSize": 4,
|
||||
"editor.insertSpaces": true,
|
||||
"editor.trimAutoWhitespace": true,
|
||||
"files.trimTrailingWhitespace": true,
|
||||
"files.insertFinalNewline": true,
|
||||
"files.exclude": {
|
||||
"**/__pycache__": true,
|
||||
"**/.pytest_cache": true,
|
||||
"**/*.pyc": true,
|
||||
"**/.mypy_cache": true,
|
||||
"**/.ruff_cache": true,
|
||||
"_dist/**": true,
|
||||
"docs/_build/**": true,
|
||||
"docs/api_reference/_build/**": true,
|
||||
"**/node_modules": true,
|
||||
"**/.git": false
|
||||
},
|
||||
"search.exclude": {
|
||||
"**/__pycache__": true,
|
||||
"**/*.pyc": true,
|
||||
"_dist/**": true,
|
||||
"docs/_build/**": true,
|
||||
"docs/api_reference/_build/**": true,
|
||||
"**/node_modules": true,
|
||||
"**/.git": true,
|
||||
"uv.lock": true,
|
||||
"yarn.lock": true
|
||||
},
|
||||
"git.autofetch": true,
|
||||
"git.enableSmartCommit": true,
|
||||
"jupyter.askForKernelRestart": false,
|
||||
"jupyter.interactiveWindow.textEditor.executeSelection": true,
|
||||
"[markdown]": {
|
||||
"editor.wordWrap": "on",
|
||||
"editor.quickSuggestions": {
|
||||
"comments": "off",
|
||||
"strings": "off",
|
||||
"other": "off"
|
||||
}
|
||||
},
|
||||
"[yaml]": {
|
||||
"editor.tabSize": 2,
|
||||
"editor.insertSpaces": true
|
||||
},
|
||||
"[json]": {
|
||||
"editor.tabSize": 2,
|
||||
"editor.insertSpaces": true
|
||||
},
|
||||
"python.terminal.activateEnvironment": false,
|
||||
"python.defaultInterpreterPath": "./.venv/bin/python"
|
||||
}
|
||||
325
CLAUDE.md
325
CLAUDE.md
@@ -1,325 +0,0 @@
|
||||
# Global Development Guidelines for LangChain Projects
|
||||
|
||||
## Core Development Principles
|
||||
|
||||
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
|
||||
|
||||
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
|
||||
|
||||
❌ **Bad - Breaking Change:**
|
||||
|
||||
```python
|
||||
def get_user(id, verbose=False): # Changed from `user_id`
|
||||
pass
|
||||
```
|
||||
|
||||
✅ **Good - Stable Interface:**
|
||||
|
||||
```python
|
||||
def get_user(user_id: str, verbose: bool = False) -> User:
|
||||
"""Retrieve user by ID with optional verbose output."""
|
||||
pass
|
||||
```
|
||||
|
||||
**Before making ANY changes to public APIs:**
|
||||
|
||||
- Check if the function/class is exported in `__init__.py`
|
||||
- Look for existing usage patterns in tests and examples
|
||||
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
|
||||
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
|
||||
|
||||
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
|
||||
|
||||
### 2. Code Quality Standards
|
||||
|
||||
**All Python code MUST include type hints and return types.**
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def p(u, d):
|
||||
return [x for x in u if x not in d]
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
|
||||
"""Filter out users that are not in the known users set.
|
||||
|
||||
Args:
|
||||
users: List of user identifiers to filter.
|
||||
known_users: Set of known/valid user identifiers.
|
||||
|
||||
Returns:
|
||||
List of users that are not in the known_users set.
|
||||
"""
|
||||
return [user for user in users if user not in known_users]
|
||||
```
|
||||
|
||||
**Style Requirements:**
|
||||
|
||||
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
|
||||
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
|
||||
- Avoid unnecessary abstraction or premature optimization
|
||||
- Follow existing patterns in the codebase you're modifying
|
||||
|
||||
### 3. Testing Requirements
|
||||
|
||||
**Every new feature or bugfix MUST be covered by unit tests.**
|
||||
|
||||
**Test Organization:**
|
||||
|
||||
- Unit tests: `tests/unit_tests/` (no network calls allowed)
|
||||
- Integration tests: `tests/integration_tests/` (network calls permitted)
|
||||
- Use `pytest` as the testing framework
|
||||
|
||||
**Test Quality Checklist:**
|
||||
|
||||
- [ ] Tests fail when your new logic is broken
|
||||
- [ ] Happy path is covered
|
||||
- [ ] Edge cases and error conditions are tested
|
||||
- [ ] Use fixtures/mocks for external dependencies
|
||||
- [ ] Tests are deterministic (no flaky tests)
|
||||
|
||||
Checklist questions:
|
||||
|
||||
- [ ] Does the test suite fail if your new logic is broken?
|
||||
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
|
||||
- [ ] Do tests use fixtures or mocks where needed?
|
||||
|
||||
```python
|
||||
def test_filter_unknown_users():
|
||||
"""Test filtering unknown users from a list."""
|
||||
users = ["alice", "bob", "charlie"]
|
||||
known_users = {"alice", "bob"}
|
||||
|
||||
result = filter_unknown_users(users, known_users)
|
||||
|
||||
assert result == ["charlie"]
|
||||
assert len(result) == 1
|
||||
```
|
||||
|
||||
### 4. Security and Risk Assessment
|
||||
|
||||
**Security Checklist:**
|
||||
|
||||
- No `eval()`, `exec()`, or `pickle` on user-controlled input
|
||||
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
|
||||
- Remove unreachable/commented code before committing
|
||||
- Race conditions or resource leaks (file handles, sockets, threads).
|
||||
- Ensure proper resource cleanup (file handles, connections)
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def load_config(path):
|
||||
with open(path) as f:
|
||||
return eval(f.read()) # ⚠️ Never eval config
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
import json
|
||||
|
||||
def load_config(path: str) -> dict:
|
||||
with open(path) as f:
|
||||
return json.load(f)
|
||||
```
|
||||
|
||||
### 5. Documentation Standards
|
||||
|
||||
**Use Google-style docstrings with Args section for all public functions.**
|
||||
|
||||
❌ **Insufficient Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to, msg):
|
||||
"""Send an email to a recipient."""
|
||||
```
|
||||
|
||||
✅ **Complete Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
|
||||
"""
|
||||
Send an email to a recipient with specified priority.
|
||||
|
||||
Args:
|
||||
to: The email address of the recipient.
|
||||
msg: The message body to send.
|
||||
priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).
|
||||
|
||||
Returns:
|
||||
True if email was sent successfully, False otherwise.
|
||||
|
||||
Raises:
|
||||
InvalidEmailError: If the email address format is invalid.
|
||||
SMTPConnectionError: If unable to connect to email server.
|
||||
"""
|
||||
```
|
||||
|
||||
**Documentation Guidelines:**
|
||||
|
||||
- Types go in function signatures, NOT in docstrings
|
||||
- Focus on "why" rather than "what" in descriptions
|
||||
- Document all parameters, return values, and exceptions
|
||||
- Keep descriptions concise but clear
|
||||
- Use reStructuredText for docstrings to enable rich formatting
|
||||
|
||||
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
|
||||
|
||||
### 6. Architectural Improvements
|
||||
|
||||
**When you encounter code that could be improved, suggest better designs:**
|
||||
|
||||
❌ **Poor Design:**
|
||||
|
||||
```python
|
||||
def process_data(data, db_conn, email_client, logger):
|
||||
# Function doing too many things
|
||||
validated = validate_data(data)
|
||||
result = db_conn.save(validated)
|
||||
email_client.send_notification(result)
|
||||
logger.log(f"Processed {len(data)} items")
|
||||
return result
|
||||
```
|
||||
|
||||
✅ **Better Design:**
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ProcessingResult:
|
||||
"""Result of data processing operation."""
|
||||
items_processed: int
|
||||
success: bool
|
||||
errors: List[str] = field(default_factory=list)
|
||||
|
||||
class DataProcessor:
|
||||
"""Handles data validation, storage, and notification."""
|
||||
|
||||
def __init__(self, db_conn: Database, email_client: EmailClient):
|
||||
self.db = db_conn
|
||||
self.email = email_client
|
||||
|
||||
def process(self, data: List[dict]) -> ProcessingResult:
|
||||
"""Process and store data with notifications."""
|
||||
validated = self._validate_data(data)
|
||||
result = self.db.save(validated)
|
||||
self._notify_completion(result)
|
||||
return result
|
||||
```
|
||||
|
||||
**Design Improvement Areas:**
|
||||
|
||||
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
|
||||
|
||||
- Reduce code duplication through shared utilities
|
||||
- Make unit testing easier
|
||||
- Improve separation of concerns (single responsibility)
|
||||
- Make unit testing easier through dependency injection
|
||||
- Add clarity without adding complexity
|
||||
- Prefer dataclasses for structured data
|
||||
|
||||
## Development Tools & Commands
|
||||
|
||||
### Package Management
|
||||
|
||||
```bash
|
||||
# Add package
|
||||
uv add package-name
|
||||
|
||||
# Sync project dependencies
|
||||
uv sync
|
||||
uv lock
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
```bash
|
||||
# Run unit tests (no network)
|
||||
make test
|
||||
|
||||
# Don't run integration tests, as API keys must be set
|
||||
|
||||
# Run specific test file
|
||||
uv run --group test pytest tests/unit_tests/test_specific.py
|
||||
```
|
||||
|
||||
### Code Quality
|
||||
|
||||
```bash
|
||||
# Lint code
|
||||
make lint
|
||||
|
||||
# Format code
|
||||
make format
|
||||
|
||||
# Type checking
|
||||
uv run --group lint mypy .
|
||||
```
|
||||
|
||||
### Dependency Management Patterns
|
||||
|
||||
**Local Development Dependencies:**
|
||||
|
||||
```toml
|
||||
[tool.uv.sources]
|
||||
langchain-core = { path = "../core", editable = true }
|
||||
langchain-tests = { path = "../standard-tests", editable = true }
|
||||
```
|
||||
|
||||
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search_database(query: str) -> str:
|
||||
"""Search the database for relevant information.
|
||||
|
||||
Args:
|
||||
query: The search query string.
|
||||
"""
|
||||
# Implementation here
|
||||
return results
|
||||
```
|
||||
|
||||
## Commit Standards
|
||||
|
||||
**Use Conventional Commits format for PR titles:**
|
||||
|
||||
- `feat(core): add multi-tenant support`
|
||||
- `fix(cli): resolve flag parsing error`
|
||||
- `docs: update API usage examples`
|
||||
- `docs(openai): update API usage examples`
|
||||
|
||||
## Framework-Specific Guidelines
|
||||
|
||||
- Follow the existing patterns in `langchain-core` for base abstractions
|
||||
- Use `langchain_core.callbacks` for execution tracking
|
||||
- Implement proper streaming support where applicable
|
||||
- Avoid deprecated components like legacy `LLMChain`
|
||||
|
||||
### Partner Integrations
|
||||
|
||||
- Follow the established patterns in existing partner libraries
|
||||
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
|
||||
- Include comprehensive integration tests
|
||||
- Document API key requirements and authentication
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Checklist
|
||||
|
||||
Before submitting code changes:
|
||||
|
||||
- [ ] **Breaking Changes**: Verified no public API changes
|
||||
- [ ] **Type Hints**: All functions have complete type annotations
|
||||
- [ ] **Tests**: New functionality is fully tested
|
||||
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
|
||||
- [ ] **Documentation**: Google-style docstrings for public functions
|
||||
- [ ] **Code Quality**: `make lint` and `make format` pass
|
||||
- [ ] **Architecture**: Suggested improvements where applicable
|
||||
- [ ] **Commit Message**: Follows Conventional Commits format
|
||||
73
MIGRATE.md
73
MIGRATE.md
@@ -1,11 +1,70 @@
|
||||
# Migrating
|
||||
|
||||
Please see the following guides for migrating LangChain code:
|
||||
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
|
||||
|
||||
* Migrate to [LangChain v0.3](https://python.langchain.com/docs/versions/v0_3/)
|
||||
* Migrate to [LangChain v0.2](https://python.langchain.com/docs/versions/v0_2/)
|
||||
* Migrating from [LangChain 0.0.x Chains](https://python.langchain.com/docs/versions/migrating_chains/)
|
||||
* Upgrade to [LangGraph Memory](https://python.langchain.com/docs/versions/migrating_memory/)
|
||||
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
|
||||
This migration has already started, but we are remaining backwards compatible until 7/28.
|
||||
On that date, we will remove functionality from `langchain`.
|
||||
Read more about the motivation and the progress [here](https://github.com/langchain-ai/langchain/discussions/8043).
|
||||
|
||||
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
|
||||
This will be especially helpful if you're still on either version 0.0.x or 0.1.x of LangChain.
|
||||
### Migrating to `langchain_experimental`
|
||||
|
||||
We are moving any experimental components of LangChain, or components with vulnerability issues, into `langchain_experimental`.
|
||||
This guide covers how to migrate.
|
||||
|
||||
### Installation
|
||||
|
||||
Previously:
|
||||
|
||||
`pip install -U langchain`
|
||||
|
||||
Now (only if you want to access things in experimental):
|
||||
|
||||
`pip install -U langchain langchain_experimental`
|
||||
|
||||
### Things in `langchain.experimental`
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.experimental import ...`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental import ...`
|
||||
|
||||
### PALChain
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.chains import PALChain`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental.pal_chain import PALChain`
|
||||
|
||||
### SQLDatabaseChain
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.chains import SQLDatabaseChain`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental.sql import SQLDatabaseChain`
|
||||
|
||||
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out this [`SQL question-answering tutorial`](https://python.langchain.com/v0.2/docs/tutorials/sql_qa/#convert-question-to-sql-query)
|
||||
|
||||
`from langchain.chains import create_sql_query_chain`
|
||||
|
||||
### `load_prompt` for Python files
|
||||
|
||||
Note: this only applies if you want to load Python files as prompts.
|
||||
If you want to load json/yaml files, no change is needed.
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.prompts import load_prompt`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental.prompts import load_prompt`
|
||||
|
||||
90
Makefile
90
Makefile
@@ -1,13 +1,13 @@
|
||||
.PHONY: all clean help docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck spell_check spell_fix lint lint_package lint_tests format format_diff
|
||||
|
||||
.EXPORT_ALL_VARIABLES:
|
||||
UV_FROZEN = true
|
||||
|
||||
## help: Show this help info.
|
||||
help: Makefile
|
||||
@printf "\n\033[1mUsage: make <TARGETS> ...\033[0m\n\n\033[1mTargets:\033[0m\n\n"
|
||||
@sed -n 's/^## //p' $< | awk -F':' '{printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' | sort | sed -e 's/^/ /'
|
||||
|
||||
## all: Default target, shows help.
|
||||
all: help
|
||||
|
||||
## clean: Clean documentation and API documentation artifacts.
|
||||
clean: docs_clean api_docs_clean
|
||||
|
||||
@@ -16,79 +16,49 @@ clean: docs_clean api_docs_clean
|
||||
######################
|
||||
|
||||
## docs_build: Build the documentation.
|
||||
docs_build: docs_clean
|
||||
@echo "📚 Building LangChain documentation..."
|
||||
docs_build:
|
||||
cd docs && make build
|
||||
@echo "✅ Documentation build complete!"
|
||||
|
||||
## docs_clean: Clean the documentation build artifacts.
|
||||
docs_clean:
|
||||
@echo "🧹 Cleaning documentation artifacts..."
|
||||
cd docs && make clean
|
||||
@echo "✅ LangChain documentation cleaned"
|
||||
|
||||
## docs_linkcheck: Run linkchecker on the documentation.
|
||||
docs_linkcheck:
|
||||
@echo "🔗 Checking documentation links..."
|
||||
@if [ -d _dist/docs ]; then \
|
||||
uv run --group test linkchecker _dist/docs/ --ignore-url node_modules; \
|
||||
else \
|
||||
echo "⚠️ Documentation not built. Run 'make docs_build' first."; \
|
||||
exit 1; \
|
||||
fi
|
||||
@echo "✅ Link check complete"
|
||||
poetry run linkchecker _dist/docs/ --ignore-url node_modules
|
||||
|
||||
## api_docs_build: Build the API Reference documentation.
|
||||
api_docs_build: clean
|
||||
@echo "📖 Building API Reference documentation..."
|
||||
uv pip install -e libs/cli
|
||||
uv run --group docs python docs/api_reference/create_api_rst.py
|
||||
cd docs/api_reference && uv run --group docs make html
|
||||
uv run --group docs python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
|
||||
@echo "✅ API documentation built"
|
||||
@echo "🌐 Opening documentation in browser..."
|
||||
open docs/api_reference/_build/html/reference.html
|
||||
api_docs_build:
|
||||
poetry run python docs/api_reference/create_api_rst.py
|
||||
cd docs/api_reference && poetry run make html
|
||||
poetry run python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
|
||||
|
||||
API_PKG ?= text-splitters
|
||||
|
||||
api_docs_quick_preview: clean
|
||||
@echo "⚡ Building quick API preview for $(API_PKG)..."
|
||||
uv run --group docs python docs/api_reference/create_api_rst.py $(API_PKG)
|
||||
cd docs/api_reference && uv run --group docs make html
|
||||
uv run --group docs python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
|
||||
@echo "🌐 Opening preview in browser..."
|
||||
api_docs_quick_preview:
|
||||
poetry run python docs/api_reference/create_api_rst.py $(API_PKG)
|
||||
cd docs/api_reference && poetry run make html
|
||||
poetry run python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
|
||||
open docs/api_reference/_build/html/reference.html
|
||||
|
||||
## api_docs_clean: Clean the API Reference documentation build artifacts.
|
||||
api_docs_clean:
|
||||
@echo "🧹 Cleaning API documentation artifacts..."
|
||||
find ./docs/api_reference -name '*_api_reference.rst' -delete
|
||||
git clean -fdX ./docs/api_reference
|
||||
rm -f docs/api_reference/index.md
|
||||
@echo "✅ API documentation cleaned"
|
||||
rm docs/api_reference/index.md
|
||||
|
||||
|
||||
## api_docs_linkcheck: Run linkchecker on the API Reference documentation.
|
||||
api_docs_linkcheck:
|
||||
@echo "🔗 Checking API documentation links..."
|
||||
@if [ -f docs/api_reference/_build/html/index.html ]; then \
|
||||
uv run --group test linkchecker docs/api_reference/_build/html/index.html; \
|
||||
else \
|
||||
echo "⚠️ API documentation not built. Run 'make api_docs_build' first."; \
|
||||
exit 1; \
|
||||
fi
|
||||
@echo "✅ API link check complete"
|
||||
poetry run linkchecker docs/api_reference/_build/html/index.html
|
||||
|
||||
## spell_check: Run codespell on the project.
|
||||
spell_check:
|
||||
@echo "✏️ Checking spelling across project..."
|
||||
uv run --group codespell codespell --toml pyproject.toml
|
||||
@echo "✅ Spell check complete"
|
||||
poetry run codespell --toml pyproject.toml
|
||||
|
||||
## spell_fix: Run codespell on the project and fix the errors.
|
||||
spell_fix:
|
||||
@echo "✏️ Fixing spelling errors across project..."
|
||||
uv run --group codespell codespell --toml pyproject.toml -w
|
||||
@echo "✅ Spelling errors fixed"
|
||||
poetry run codespell --toml pyproject.toml -w
|
||||
|
||||
######################
|
||||
# LINTING AND FORMATTING
|
||||
@@ -96,24 +66,12 @@ spell_fix:
|
||||
|
||||
## lint: Run linting on the project.
|
||||
lint lint_package lint_tests:
|
||||
@echo "🔍 Running code linting and checks..."
|
||||
uv run --group lint ruff check docs cookbook
|
||||
uv run --group lint ruff format docs cookbook cookbook --diff
|
||||
git --no-pager grep 'from langchain import' docs cookbook | grep -vE 'from langchain import (hub)' && echo "Error: no importing langchain from root in docs, except for hub" && exit 1 || exit 0
|
||||
|
||||
git --no-pager grep 'api.python.langchain.com' -- docs/docs ':!docs/docs/additional_resources/arxiv_references.mdx' ':!docs/docs/integrations/document_loaders/sitemap.ipynb' || exit 0 && \
|
||||
echo "Error: you should link python.langchain.com/api_reference, not api.python.langchain.com in the docs" && \
|
||||
exit 1
|
||||
@echo "✅ Linting complete"
|
||||
poetry run ruff check docs templates cookbook
|
||||
poetry run ruff format docs templates cookbook --diff
|
||||
poetry run ruff check --select I docs templates cookbook
|
||||
git grep 'from langchain import' docs/docs templates cookbook | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
|
||||
|
||||
## format: Format the project files.
|
||||
format format_diff:
|
||||
@echo "🎨 Formatting project files..."
|
||||
uv run --group lint ruff format docs cookbook
|
||||
uv run --group lint ruff check --fix docs cookbook
|
||||
@echo "✅ Formatting complete"
|
||||
|
||||
update-package-downloads:
|
||||
@echo "📊 Updating package download statistics..."
|
||||
uv run python docs/scripts/packages_yml_get_downloads.py
|
||||
@echo "✅ Package downloads updated"
|
||||
poetry run ruff format docs templates cookbook
|
||||
poetry run ruff check --select I --fix docs templates cookbook
|
||||
|
||||
176
README.md
176
README.md
@@ -1,12 +1,6 @@
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: light)" srcset="docs/static/img/logo-dark.svg">
|
||||
<source media="(prefers-color-scheme: dark)" srcset="docs/static/img/logo-light.svg">
|
||||
<img alt="LangChain Logo" src="docs/static/img/logo-dark.svg" width="80%">
|
||||
</picture>
|
||||
# 🦜️🔗 LangChain
|
||||
|
||||
<div>
|
||||
<br>
|
||||
</div>
|
||||
⚡ Build context-aware reasoning applications ⚡
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/releases)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml)
|
||||
@@ -15,73 +9,133 @@
|
||||
[](https://star-history.com/#langchain-ai/langchain)
|
||||
[](https://github.com/langchain-ai/langchain/issues)
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
|
||||
[<img src="https://github.com/codespaces/badge.svg" alt="Open in Github Codespace" title="Open in Github Codespace" width="150" height="20">](https://codespaces.new/langchain-ai/langchain)
|
||||
[](https://codespaces.new/langchain-ai/langchain)
|
||||
[](https://twitter.com/langchainai)
|
||||
[](https://codspeed.io/langchain-ai/langchain)
|
||||
|
||||
> [!NOTE]
|
||||
> Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
|
||||
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
|
||||
|
||||
LangChain is a framework for building LLM-powered applications. It helps you chain
|
||||
together interoperable components and third-party integrations to simplify AI
|
||||
application development — all while future-proofing decisions as the underlying
|
||||
technology evolves.
|
||||
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
|
||||
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
|
||||
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
|
||||
|
||||
## Quick Install
|
||||
|
||||
With pip:
|
||||
|
||||
```bash
|
||||
pip install -U langchain
|
||||
pip install langchain
|
||||
```
|
||||
|
||||
To learn more about LangChain, check out
|
||||
[the docs](https://python.langchain.com/docs/introduction/). If you’re looking for more
|
||||
advanced customization or agent orchestration, check out
|
||||
[LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building
|
||||
controllable agent workflows.
|
||||
With conda:
|
||||
|
||||
## Why use LangChain?
|
||||
```bash
|
||||
conda install langchain -c conda-forge
|
||||
```
|
||||
|
||||
LangChain helps developers build applications powered by LLMs through a standard
|
||||
interface for models, embeddings, vector stores, and more.
|
||||
## 🤔 What is LangChain?
|
||||
|
||||
Use LangChain for:
|
||||
**LangChain** is a framework for developing applications powered by large language models (LLMs).
|
||||
|
||||
- **Real-time data augmentation**. Easily connect LLMs to diverse data sources and
|
||||
external / internal systems, drawing from LangChain’s vast library of integrations with
|
||||
model providers, tools, vector stores, retrievers, and more.
|
||||
- **Model interoperability**. Swap models in and out as your engineering team
|
||||
experiments to find the best choice for your application’s needs. As the industry
|
||||
frontier evolves, adapt quickly — LangChain’s abstractions keep you moving without
|
||||
losing momentum.
|
||||
For these applications, LangChain simplifies the entire application lifecycle:
|
||||
|
||||
## LangChain’s ecosystem
|
||||
- **Open-source libraries**: Build your applications using LangChain's open-source [building blocks](https://python.langchain.com/v0.2/docs/concepts#langchain-expression-language-lcel), [components](https://python.langchain.com/v0.2/docs/concepts), and [third-party integrations](https://python.langchain.com/v0.2/docs/integrations/platforms/).
|
||||
Use [LangGraph](https://langchain-ai.github.io/langgraph/) to build stateful agents with first-class streaming and human-in-the-loop support.
|
||||
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://docs.smith.langchain.com/) so that you can constantly optimize and deploy with confidence.
|
||||
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/).
|
||||
|
||||
While the LangChain framework can be used standalone, it also integrates seamlessly
|
||||
with any LangChain product, giving developers a full suite of tools when building LLM
|
||||
applications.
|
||||
### Open-source libraries
|
||||
|
||||
To improve your LLM application development, pair LangChain with:
|
||||
- **`langchain-core`**: Base abstractions and LangChain Expression Language.
|
||||
- **`langchain-community`**: Third party integrations.
|
||||
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
|
||||
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
|
||||
- **[`LangGraph`](https://langchain-ai.github.io/langgraph/)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).
|
||||
|
||||
- [LangSmith](http://www.langchain.com/langsmith) - Helpful for agent evals and
|
||||
observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain
|
||||
visibility in production, and improve performance over time.
|
||||
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can
|
||||
reliably handle complex tasks with LangGraph, our low-level agent orchestration
|
||||
framework. LangGraph offers customizable architecture, long-term memory, and
|
||||
human-in-the-loop workflows — and is trusted in production by companies like LinkedIn,
|
||||
Uber, Klarna, and GitLab.
|
||||
- [LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/langgraph_platform/) - Deploy
|
||||
and scale agents effortlessly with a purpose-built deployment platform for long
|
||||
running, stateful workflows. Discover, reuse, configure, and share agents across
|
||||
teams — and iterate quickly with visual prototyping in
|
||||
[LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/).
|
||||
### Productionization:
|
||||
|
||||
## Additional resources
|
||||
- **[LangSmith](https://docs.smith.langchain.com/)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
|
||||
|
||||
- [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with
|
||||
guided examples on getting started with LangChain.
|
||||
- [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code
|
||||
snippets for topics such as tool calling, RAG use cases, and more.
|
||||
- [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key
|
||||
concepts behind the LangChain framework.
|
||||
- [LangChain Forum](https://forum.langchain.com/): Connect with the community and share all of your technical questions, ideas, and feedback.
|
||||
- [API Reference](https://python.langchain.com/api_reference/): Detailed reference on
|
||||
navigating base packages and integrations for LangChain.
|
||||
### Deployment:
|
||||
|
||||
- **[LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/)**: Turn your LangGraph applications into production-ready APIs and Assistants.
|
||||
|
||||

|
||||
|
||||
## 🧱 What can you build with LangChain?
|
||||
|
||||
**❓ Question answering with RAG**
|
||||
|
||||
- [Documentation](https://python.langchain.com/v0.2/docs/tutorials/rag/)
|
||||
- End-to-end Example: [Chat LangChain](https://chat.langchain.com) and [repo](https://github.com/langchain-ai/chat-langchain)
|
||||
|
||||
**🧱 Extracting structured output**
|
||||
|
||||
- [Documentation](https://python.langchain.com/v0.2/docs/tutorials/extraction/)
|
||||
- End-to-end Example: [SQL Llama2 Template](https://github.com/langchain-ai/langchain-extract/)
|
||||
|
||||
**🤖 Chatbots**
|
||||
|
||||
- [Documentation](https://python.langchain.com/v0.2/docs/tutorials/chatbot/)
|
||||
- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
|
||||
|
||||
And much more! Head to the [Tutorials](https://python.langchain.com/v0.2/docs/tutorials/) section of the docs for more.
|
||||
|
||||
## 🚀 How does LangChain help?
|
||||
|
||||
The main value props of the LangChain libraries are:
|
||||
|
||||
1. **Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
|
||||
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
|
||||
|
||||
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
|
||||
|
||||
## LangChain Expression Language (LCEL)
|
||||
|
||||
LCEL is a key part of LangChain, allowing you to build and organize chains of processes in a straightforward, declarative manner. It was designed to support taking prototypes directly into production without needing to alter any code. This means you can use LCEL to set up everything from basic "prompt + LLM" setups to intricate, multi-step workflows.
|
||||
|
||||
- **[Overview](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel)**: LCEL and its benefits
|
||||
- **[Interface](https://python.langchain.com/v0.2/docs/concepts/#runnable-interface)**: The standard Runnable interface for LCEL objects
|
||||
- **[Primitives](https://python.langchain.com/v0.2/docs/how_to/#langchain-expression-language-lcel)**: More on the primitives LCEL includes
|
||||
- **[Cheatsheet](https://python.langchain.com/v0.2/docs/how_to/lcel_cheatsheet/)**: Quick overview of the most common usage patterns
|
||||
|
||||
## Components
|
||||
|
||||
Components fall into the following **modules**:
|
||||
|
||||
**📃 Model I/O**
|
||||
|
||||
This includes [prompt management](https://python.langchain.com/v0.2/docs/concepts/#prompt-templates), [prompt optimization](https://python.langchain.com/v0.2/docs/concepts/#example-selectors), a generic interface for [chat models](https://python.langchain.com/v0.2/docs/concepts/#chat-models) and [LLMs](https://python.langchain.com/v0.2/docs/concepts/#llms), and common utilities for working with [model outputs](https://python.langchain.com/v0.2/docs/concepts/#output-parsers).
|
||||
|
||||
**📚 Retrieval**
|
||||
|
||||
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/v0.2/docs/concepts/#document-loaders) from a variety of sources, [preparing it](https://python.langchain.com/v0.2/docs/concepts/#text-splitters), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/v0.2/docs/concepts/#retrievers) it for use in the generation step.
|
||||
|
||||
**🤖 Agents**
|
||||
|
||||
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. LangChain provides a [standard interface for agents](https://python.langchain.com/v0.2/docs/concepts/#agents), along with [LangGraph](https://github.com/langchain-ai/langgraph) for building custom agents.
|
||||
|
||||
## 📖 Documentation
|
||||
|
||||
Please see [here](https://python.langchain.com) for full documentation, which includes:
|
||||
|
||||
- [Introduction](https://python.langchain.com/v0.2/docs/introduction/): Overview of the framework and the structure of the docs.
|
||||
- [Tutorials](https://python.langchain.com/docs/use_cases/): If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
|
||||
- [How-to guides](https://python.langchain.com/v0.2/docs/how_to/): Answers to “How do I….?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
|
||||
- [Conceptual guide](https://python.langchain.com/v0.2/docs/concepts/): Conceptual explanations of the key parts of the framework.
|
||||
- [API Reference](https://api.python.langchain.com): Thorough documentation of every class and method.
|
||||
|
||||
## 🌐 Ecosystem
|
||||
|
||||
- [🦜🛠️ LangSmith](https://docs.smith.langchain.com/): Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
|
||||
- [🦜🕸️ LangGraph](https://langchain-ai.github.io/langgraph/): Create stateful, multi-actor applications with LLMs. Integrates smoothly with LangChain, but can be used without it.
|
||||
- [🦜🏓 LangServe](https://python.langchain.com/docs/langserve): Deploy LangChain runnables and chains as REST APIs.
|
||||
|
||||
## 💁 Contributing
|
||||
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
|
||||
|
||||
For detailed information on how to contribute, see [here](https://python.langchain.com/v0.2/docs/contributing/).
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/graphs/contributors)
|
||||
|
||||
68
SECURITY.md
68
SECURITY.md
@@ -1,44 +1,20 @@
|
||||
# Security Policy
|
||||
|
||||
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the ability to access, interact with and manipulate external resources.
|
||||
|
||||
## Best practices
|
||||
|
||||
When building such applications developers should remember to follow good security practices:
|
||||
|
||||
* [**Limit Permissions**](https://en.wikipedia.org/wiki/Principle_of_least_privilege): Scope permissions specifically to the application's need. Granting broad or excessive permissions can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials, disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), specifying proxy configurations to control external requests, etc. as appropriate for your application.
|
||||
* **Anticipate Potential Misuse**: Just as humans can err, so can Large Language Models (LLMs). Always assume that any system access or credentials may be used in any way allowed by the permissions they are assigned. For example, if a pair of database credentials allows deleting data, it's safest to assume that any LLM able to use those credentials may in fact delete data.
|
||||
* [**Defense in Depth**](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)): No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate, the odds that a Large Language Model (LLM) may make a mistake. It's best to combine multiple layered security approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.
|
||||
|
||||
Risks of not doing so include, but are not limited to:
|
||||
|
||||
* Data corruption or loss.
|
||||
* Unauthorized access to confidential information.
|
||||
* Compromised performance or availability of critical resources.
|
||||
|
||||
Example scenarios with mitigation strategies:
|
||||
|
||||
* A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container.
|
||||
* A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse.
|
||||
* A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
|
||||
|
||||
If you're building applications that access external resources like file systems, APIs
|
||||
or databases, consider speaking with your company's security team to determine how to best
|
||||
design and secure your applications.
|
||||
|
||||
## Reporting OSS Vulnerabilities
|
||||
|
||||
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
|
||||
a bounty program for our open source projects.
|
||||
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
|
||||
a bounty program for our open source projects.
|
||||
|
||||
Please report security vulnerabilities associated with the LangChain
|
||||
open source projects at [huntr](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true).
|
||||
Please report security vulnerabilities associated with the LangChain
|
||||
open source projects by visiting the following link:
|
||||
|
||||
[https://huntr.com/bounties/disclose/](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true)
|
||||
|
||||
Before reporting a vulnerability, please review:
|
||||
|
||||
1) In-Scope Targets and Out-of-Scope Targets below.
|
||||
2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure.
|
||||
3) The [Best Practices](#best-practices) above to
|
||||
3) LangChain [security guidelines](https://python.langchain.com/docs/security) to
|
||||
understand what we consider to be a security vulnerability vs. developer
|
||||
responsibility.
|
||||
|
||||
@@ -46,39 +22,39 @@ Before reporting a vulnerability, please review:
|
||||
|
||||
The following packages and repositories are eligible for bug bounties:
|
||||
|
||||
* langchain-core
|
||||
* langchain (see exceptions)
|
||||
* langchain-community (see exceptions)
|
||||
* langgraph
|
||||
* langserve
|
||||
- langchain-core
|
||||
- langchain (see exceptions)
|
||||
- langchain-community (see exceptions)
|
||||
- langgraph
|
||||
- langserve
|
||||
|
||||
### Out of Scope Targets
|
||||
|
||||
All out of scope targets defined by huntr as well as:
|
||||
|
||||
* **langchain-experimental**: This repository is for experimental code and is not
|
||||
eligible for bug bounties (see [package warning](https://pypi.org/project/langchain-experimental/)), bug reports to it will be marked as interesting or waste of
|
||||
- **langchain-experimental**: This repository is for experimental code and is not
|
||||
eligible for bug bounties, bug reports to it will be marked as interesting or waste of
|
||||
time and published with no bounty attached.
|
||||
* **tools**: Tools in either langchain or langchain-community are not eligible for bug
|
||||
- **tools**: Tools in either langchain or langchain-community are not eligible for bug
|
||||
bounties. This includes the following directories
|
||||
* libs/langchain/langchain/tools
|
||||
* libs/community/langchain_community/tools
|
||||
* Please review the [Best Practices](#best-practices)
|
||||
- langchain/tools
|
||||
- langchain-community/tools
|
||||
- Please review our [security guidelines](https://python.langchain.com/docs/security)
|
||||
for more details, but generally tools interact with the real world. Developers are
|
||||
expected to understand the security implications of their code and are responsible
|
||||
for the security of their tools.
|
||||
* Code documented with security notices. This will be decided on a case by
|
||||
- Code documented with security notices. This will be decided done on a case by
|
||||
case basis, but likely will not be eligible for a bounty as the code is already
|
||||
documented with guidelines for developers that should be followed for making their
|
||||
application secure.
|
||||
* Any LangSmith related repositories or APIs (see [Reporting LangSmith Vulnerabilities](#reporting-langsmith-vulnerabilities)).
|
||||
- Any LangSmith related repositories or APIs see below.
|
||||
|
||||
## Reporting LangSmith Vulnerabilities
|
||||
|
||||
Please report security vulnerabilities associated with LangSmith by email to `security@langchain.dev`.
|
||||
|
||||
* LangSmith site: [https://smith.langchain.com](https://smith.langchain.com)
|
||||
* SDK client: [https://github.com/langchain-ai/langsmith-sdk](https://github.com/langchain-ai/langsmith-sdk)
|
||||
- LangSmith site: https://smith.langchain.com
|
||||
- SDK client: https://github.com/langchain-ai/langsmith-sdk
|
||||
|
||||
### Other Security Concerns
|
||||
|
||||
|
||||
@@ -60,7 +60,7 @@
|
||||
"id": "CI8Elyc5gBQF"
|
||||
},
|
||||
"source": [
|
||||
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint is ready, you need to copy its number."
|
||||
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint it ready, you need to copy its number."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -47,7 +47,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"id": "6a75a5c6-34ee-4ab9-a664-d9b432d812ee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -61,7 +61,7 @@
|
||||
],
|
||||
"source": [
|
||||
"# Local\n",
|
||||
"from langchain_ollama import ChatOllama\n",
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"\n",
|
||||
"llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n",
|
||||
"llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n",
|
||||
|
||||
@@ -144,7 +144,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"id": "kWDWfSDBMPl8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -185,7 +185,7 @@
|
||||
" )\n",
|
||||
" # Text summary chain\n",
|
||||
" model = VertexAI(\n",
|
||||
" temperature=0, model_name=\"gemini-2.5-flash\", max_tokens=1024\n",
|
||||
" temperature=0, model_name=\"gemini-pro\", max_tokens=1024\n",
|
||||
" ).with_fallbacks([empty_response])\n",
|
||||
" summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()\n",
|
||||
"\n",
|
||||
@@ -235,7 +235,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 6,
|
||||
"id": "PeK9bzXv3olF",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -254,7 +254,7 @@
|
||||
"\n",
|
||||
"def image_summarize(img_base64, prompt):\n",
|
||||
" \"\"\"Make image summary\"\"\"\n",
|
||||
" model = ChatVertexAI(model=\"gemini-2.5-flash\", max_tokens=1024)\n",
|
||||
" model = ChatVertexAI(model=\"gemini-pro-vision\", max_tokens=1024)\n",
|
||||
"\n",
|
||||
" msg = model.invoke(\n",
|
||||
" [\n",
|
||||
@@ -394,7 +394,7 @@
|
||||
"# The vectorstore to use to index the summaries\n",
|
||||
"vectorstore = Chroma(\n",
|
||||
" collection_name=\"mm_rag_cj_blog\",\n",
|
||||
" embedding_function=VertexAIEmbeddings(model_name=\"text-embedding-005\"),\n",
|
||||
" embedding_function=VertexAIEmbeddings(model_name=\"textembedding-gecko@latest\"),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create retriever\n",
|
||||
@@ -431,7 +431,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 9,
|
||||
"id": "GlwCErBaCKQW",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -553,7 +553,7 @@
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
" # Multi-modal LLM\n",
|
||||
" model = ChatVertexAI(temperature=0, model_name=\"gemini-2.5-flash\", max_tokens=1024)\n",
|
||||
" model = ChatVertexAI(temperature=0, model_name=\"gemini-pro-vision\", max_tokens=1024)\n",
|
||||
"\n",
|
||||
" # RAG pipeline\n",
|
||||
" chain = (\n",
|
||||
|
||||
@@ -21,6 +21,7 @@ Notebook | Description
|
||||
[code-analysis-deeplake.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/code-analysis-deeplake.ipynb) | Analyze its own code base with the help of gpt and activeloop's deep lake.
|
||||
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval.ipynb) | Build a custom agent that can interact with ai plugins by retrieving tools and creating natural language wrappers around openapi endpoints.
|
||||
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval_using_plugnplai.ipynb) | Build a custom agent with plugin retrieval functionality, utilizing ai plugins from the `plugnplai` directory.
|
||||
[databricks_sql_db.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/databricks_sql_db.ipynb) | Connect to databricks runtimes and databricks sql.
|
||||
[deeplake_semantic_search_over_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/deeplake_semantic_search_over_chat.ipynb) | Perform semantic search and question-answering over a group chat using activeloop's deep lake with gpt4.
|
||||
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with elasticsearch analytics databases in natural language and build search queries via the elasticsearch dsl API.
|
||||
[extraction_openai_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/extraction_openai_tools.ipynb) | Structured Data Extraction with OpenAI Tools
|
||||
@@ -49,7 +50,7 @@ Notebook | Description
|
||||
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
|
||||
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
|
||||
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
|
||||
[rag_upstage_document_parse_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_document_parse_groundedness_check.ipynb) | End-to-end RAG example using Upstage Document Parse and Groundedness Check.
|
||||
[rag_upstage_layout_analysis_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_layout_analysis_groundedness_check.ipynb) | End-to-end RAG example using Upstage Layout Analysis and Groundedness Check.
|
||||
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented-generation (rag) on a PostgreSQL database using pgvector.
|
||||
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware ai sales agent, salesgpt, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
|
||||
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
|
||||
@@ -61,6 +62,4 @@ Notebook | Description
|
||||
[wikibase_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/wikibase_agent.ipynb) | Create a simple wikibase agent that utilizes sparql generation, with testing done on http://wikidata.org.
|
||||
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
|
||||
[rag-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag-locally-on-intel-cpu.ipynb) | Perform Retrieval-Augmented-Generation (RAG) on locally downloaded open-source models using langchain and open source tools and execute it on Intel Xeon CPU. We showed an example of how to apply RAG on Llama 2 model and enable it to answer the queries related to Intel Q1 2024 earnings release.
|
||||
[visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
|
||||
[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
|
||||
[rag-agents-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/local_rag_agents_intel_cpu.ipynb) | Build a RAG agent locally with open source models that routes questions through one of two paths to find answers. The agent generates answers based on documents retrieved from either the vector database or retrieved from web search. If the vector database lacks relevant information, the agent opts for web search. Open-source models for LLM and embeddings are used locally on an Intel Xeon CPU to execute this pipeline.
|
||||
[visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
|
||||
@@ -204,14 +204,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"id": "523e6ed2-2132-4748-bdb7-db765f20648d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_ollama import ChatOllama"
|
||||
"from langchain_core.prompts import ChatPromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -30,7 +30,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
|
||||
"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml matplotlib tiktoken open_clip_torch torch"
|
||||
"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml pillow matplotlib tiktoken open_clip_torch torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -409,7 +409,7 @@
|
||||
" table_summaries,\n",
|
||||
" tables,\n",
|
||||
" image_summaries,\n",
|
||||
" img_base64_list,\n",
|
||||
" image_summaries,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -22,19 +22,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e8d63d14-138d-4aa5-a741-7fd3537d00aa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 16,
|
||||
"id": "2e87c10a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -49,7 +37,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 17,
|
||||
"id": "0b7b772b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -66,10 +54,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 18,
|
||||
"id": "f2675861",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Running Chroma using direct local API.\n",
|
||||
"Using DuckDB in-memory for database. Data will be transient.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
@@ -84,7 +81,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"id": "bc5403d4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -96,25 +93,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "1431cded",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"USER_AGENT environment variable not set, consider setting it to identify your requests.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import WebBaseLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "915d3ff3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -124,20 +113,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 7,
|
||||
"id": "96a2edf8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Created a chunk of size 2122, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 3187, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 1017, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 1049, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 1256, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 2321, which is longer than the specified 1000\n"
|
||||
"Running Chroma using direct local API.\n",
|
||||
"Using DuckDB in-memory for database. Data will be transient.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -150,6 +135,14 @@
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "71ecef90",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c0a6c031",
|
||||
@@ -160,30 +153,31 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 43,
|
||||
"id": "eb142786",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Import things that are needed generically\n",
|
||||
"from langchain.agents import Tool"
|
||||
"from langchain.agents import AgentType, Tool, initialize_agent\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 44,
|
||||
"id": "850bc4e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"state_of_union_qa_system\",\n",
|
||||
" name=\"State of Union QA System\",\n",
|
||||
" func=state_of_union.run,\n",
|
||||
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"ruff_qa_system\",\n",
|
||||
" name=\"Ruff QA System\",\n",
|
||||
" func=ruff.run,\n",
|
||||
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
|
||||
" ),\n",
|
||||
@@ -192,116 +186,94 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "70c461d8-aaca-4f2a-9a93-bf35841cc615",
|
||||
"execution_count": 45,
|
||||
"id": "fc47f230",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
|
||||
"# Construct the agent. We will use the default agent type here.\n",
|
||||
"# See documentation for a full list of options.\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "a6d2b911-3044-4430-a35b-75832bb45334",
|
||||
"execution_count": 46,
|
||||
"id": "10ca2db8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"What did biden say about ketanji brown jackson in the state of the union address?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" state_of_union_qa_system (call_26QlRdsptjEJJZjFsAUjEbaH)\n",
|
||||
" Call ID: call_26QlRdsptjEJJZjFsAUjEbaH\n",
|
||||
" Args:\n",
|
||||
" __arg1: What did Biden say about Ketanji Brown Jackson in the state of the union address?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: state_of_union_qa_system\n",
|
||||
"\n",
|
||||
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
|
||||
"Action: State of Union QA System\n",
|
||||
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
|
||||
"\n",
|
||||
"In the State of the Union address, Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\n",
|
||||
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "e836b4cd-abf7-49eb-be0e-b9ad501213f3",
|
||||
"execution_count": 47,
|
||||
"id": "4e91b811",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Why use ruff over flake8?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" ruff_qa_system (call_KqDoWeO9bo9OAXdxOsCb6msC)\n",
|
||||
" Call ID: call_KqDoWeO9bo9OAXdxOsCb6msC\n",
|
||||
" Args:\n",
|
||||
" __arg1: Why use ruff over flake8?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: ruff_qa_system\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"There are a few reasons why someone might choose to use Ruff over Flake8:\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
|
||||
"Action: Ruff QA System\n",
|
||||
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
|
||||
"\n",
|
||||
"1. Larger rule set: Ruff implements over 800 rules, while Flake8 only implements around 200. This means that Ruff can catch more potential issues in your code.\n",
|
||||
"\n",
|
||||
"2. Better compatibility with other tools: Ruff is designed to work well with other tools like Black, isort, and type checkers like Mypy. This means that you can use Ruff alongside these tools to get more comprehensive feedback on your code.\n",
|
||||
"\n",
|
||||
"3. Automatic fixing of lint violations: Unlike Flake8, Ruff is capable of automatically fixing its own lint violations. This can save you time and effort when fixing issues in your code.\n",
|
||||
"\n",
|
||||
"4. Native implementation of popular Flake8 plugins: Ruff re-implements some of the most popular Flake8 plugins natively, which means you don't have to install and configure multiple plugins to get the same functionality.\n",
|
||||
"\n",
|
||||
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8, making it a popular choice for many developers.\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"You might choose to use Ruff over Flake8 for several reasons:\n",
|
||||
"\n",
|
||||
"1. Ruff has a much larger rule set, implementing over 800 rules compared to Flake8's roughly 200, so it can catch more potential issues.\n",
|
||||
"2. Ruff is designed to work better with other tools like Black, isort, and type checkers like Mypy, providing more comprehensive code feedback.\n",
|
||||
"3. Ruff can automatically fix its own lint violations, which Flake8 cannot, saving time and effort.\n",
|
||||
"4. Ruff natively implements some popular Flake8 plugins, so you don't need to install and configure multiple plugins separately.\n",
|
||||
"\n",
|
||||
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8.\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 47,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"Why use ruff over flake8?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\"Why use ruff over flake8?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -324,20 +296,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 48,
|
||||
"id": "f59b377e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"state_of_union_qa_system\",\n",
|
||||
" name=\"State of Union QA System\",\n",
|
||||
" func=state_of_union.run,\n",
|
||||
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
|
||||
" return_direct=True,\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"ruff_qa_system\",\n",
|
||||
" name=\"Ruff QA System\",\n",
|
||||
" func=ruff.run,\n",
|
||||
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
|
||||
" return_direct=True,\n",
|
||||
@@ -347,92 +319,90 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "06f69c0f-c83d-4b7f-a1c8-7614aced3bae",
|
||||
"execution_count": 49,
|
||||
"id": "8615707a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "a6b38c12-ac25-43c0-b9c2-2b1985ab4825",
|
||||
"execution_count": 50,
|
||||
"id": "36e718a9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"What did biden say about ketanji brown jackson in the state of the union address?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" state_of_union_qa_system (call_yjxh11OnZiauoyTAn9npWdxj)\n",
|
||||
" Call ID: call_yjxh11OnZiauoyTAn9npWdxj\n",
|
||||
" Args:\n",
|
||||
" __arg1: What did Biden say about Ketanji Brown Jackson in the state of the union address?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: state_of_union_qa_system\n",
|
||||
"\n",
|
||||
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
|
||||
"Action: State of Union QA System\n",
|
||||
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 50,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\n",
|
||||
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "88f08d86-7972-4148-8128-3ac8898ad68a",
|
||||
"execution_count": 51,
|
||||
"id": "edfd0a1a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Why use ruff over flake8?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" ruff_qa_system (call_GiWWfwF6wbbRFQrHlHbhRtGW)\n",
|
||||
" Call ID: call_GiWWfwF6wbbRFQrHlHbhRtGW\n",
|
||||
" Args:\n",
|
||||
" __arg1: What are the advantages of using ruff over flake8 for Python linting?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: ruff_qa_system\n",
|
||||
"\n",
|
||||
" Ruff has a larger rule set, supports automatic fixing of lint violations, and does not require the installation of additional plugins. It also has better compatibility with Black and can be used alongside a type checker for more comprehensive code analysis.\n"
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
|
||||
"Action: Ruff QA System\n",
|
||||
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 51,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"Why use ruff over flake8?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\"Why use ruff over flake8?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -447,19 +417,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 57,
|
||||
"id": "d397a233",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"state_of_union_qa_system\",\n",
|
||||
" name=\"State of Union QA System\",\n",
|
||||
" func=state_of_union.run,\n",
|
||||
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"ruff_qa_system\",\n",
|
||||
" name=\"Ruff QA System\",\n",
|
||||
" func=ruff.run,\n",
|
||||
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
|
||||
" ),\n",
|
||||
@@ -468,60 +438,60 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "41743f29-150d-40ba-aa8e-3a63c32216aa",
|
||||
"execution_count": 58,
|
||||
"id": "06157240",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
|
||||
"# Construct the agent. We will use the default agent type here.\n",
|
||||
"# See documentation for a full list of options.\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "e20e81dd-284a-4d07-9160-63a84b65cba8",
|
||||
"execution_count": 59,
|
||||
"id": "b492b520",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" ruff_qa_system (call_VOnxiOEehauQyVOTjDJkR5L2)\n",
|
||||
" Call ID: call_VOnxiOEehauQyVOTjDJkR5L2\n",
|
||||
" Args:\n",
|
||||
" __arg1: What tool does ruff use to run over Jupyter Notebooks?\n",
|
||||
" state_of_union_qa_system (call_AbSsXAxwe4JtCRhga926SxOZ)\n",
|
||||
" Call ID: call_AbSsXAxwe4JtCRhga926SxOZ\n",
|
||||
" Args:\n",
|
||||
" __arg1: Did the president mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: state_of_union_qa_system\n",
|
||||
"\n",
|
||||
" No, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses to run over Jupyter Notebooks, and if the president mentioned it in the state of the union.\n",
|
||||
"Action: Ruff QA System\n",
|
||||
"Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now need to find out if the president mentioned this tool in the state of the union.\n",
|
||||
"Action: State of Union QA System\n",
|
||||
"Action Input: Did the president mention nbQA in the state of the union?\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
|
||||
"Final Answer: No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
|
||||
"\n",
|
||||
"Ruff does not support source.organizeImports and source.fixAll code actions in Jupyter Notebooks. Additionally, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'No, the president did not mention nbQA in the state of the union.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 59,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\n",
|
||||
" \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -549,7 +519,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.4"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -31,8 +31,8 @@
|
||||
"source": [
|
||||
"# Optional\n",
|
||||
"import os\n",
|
||||
"# os.environ['LANGSMITH_TRACING'] = 'true' # enables tracing\n",
|
||||
"# os.environ['LANGSMITH_API_KEY'] = <your-api-key>"
|
||||
"# os.environ['LANGCHAIN_TRACING_V2'] = 'true' # enables tracing\n",
|
||||
"# os.environ['LANGCHAIN_API_KEY'] = <your-api-key>"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -66,7 +66,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!python3 -m pip install --upgrade langchain langchain-deeplake openai"
|
||||
"#!python3 -m pip install --upgrade langchain deeplake openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -666,26 +666,89 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Your Deep Lake dataset has been successfully created!\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" \r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Dataset(path='hub://adilkhan/langchain-code', tensors=['embedding', 'id', 'metadata', 'text'])\n",
|
||||
"\n",
|
||||
" tensor htype shape dtype compression\n",
|
||||
" ------- ------- ------- ------- ------- \n",
|
||||
" embedding embedding (8244, 1536) float32 None \n",
|
||||
" id text (8244, 1) str None \n",
|
||||
" metadata json (8244, 1) str None \n",
|
||||
" text text (8244, 1) str None \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": []
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<langchain_community.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_deeplake.vectorstores import DeeplakeVectorStore\n",
|
||||
"from langchain_community.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"username = \"<USERNAME_OR_ORG>\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"db = DeeplakeVectorStore.from_documents(\n",
|
||||
" documents=texts,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" dataset_path=f\"hub://{username}/langchain-code\",\n",
|
||||
" overwrite=True,\n",
|
||||
"db = DeepLake.from_documents(\n",
|
||||
" texts, embeddings, dataset_path=f\"hub://{username}/langchain-code\", overwrite=True\n",
|
||||
")\n",
|
||||
"db"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from langchain_community.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"# db = DeepLake.from_documents(\n",
|
||||
"# texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
|
||||
"# )\n",
|
||||
"# db"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
@@ -697,16 +760,24 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Deep Lake Dataset in hub://adilkhan/langchain-code already exists, loading from the storage\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db = DeeplakeVectorStore(\n",
|
||||
"db = DeepLake(\n",
|
||||
" dataset_path=f\"hub://{username}/langchain-code\",\n",
|
||||
" read_only=True,\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
" embedding=embeddings,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -725,6 +796,36 @@
|
||||
"retriever.search_kwargs[\"k\"] = 20"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def filter(x):\n",
|
||||
" # filter based on source code\n",
|
||||
" if \"something\" in x[\"text\"].data()[\"value\"]:\n",
|
||||
" return False\n",
|
||||
"\n",
|
||||
" # filter based on path e.g. extension\n",
|
||||
" metadata = x[\"metadata\"].data()[\"value\"]\n",
|
||||
" return \"only_this\" in metadata[\"source\"] or \"also_that\" in metadata[\"source\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"### turn on below for custom filtering\n",
|
||||
"# retriever.search_kwargs['filter'] = filter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
@@ -736,8 +837,10 @@
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
|
||||
"qa = RetrievalQA.from_llm(model, retriever=retriever)"
|
||||
"model = ChatOpenAI(\n",
|
||||
" model_name=\"gpt-3.5-turbo-0613\"\n",
|
||||
") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -229,9 +229,9 @@
|
||||
" \"smoke\",\n",
|
||||
" \"temp\",\n",
|
||||
"]\n",
|
||||
"assert all([column in df.columns for column in expected_columns]), (\n",
|
||||
" \"DataFrame does not have the expected columns\"\n",
|
||||
")"
|
||||
"assert all(\n",
|
||||
" [column in df.columns for column in expected_columns]\n",
|
||||
"), \"DataFrame does not have the expected columns\""
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
273
cookbook/databricks_sql_db.ipynb
Normal file
273
cookbook/databricks_sql_db.ipynb
Normal file
@@ -0,0 +1,273 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "707d13a7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Databricks\n",
|
||||
"\n",
|
||||
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
|
||||
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0076d072",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "739b489b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install databricks-sql-connector"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73113163",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connecting to Databricks\n",
|
||||
"\n",
|
||||
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
|
||||
"\n",
|
||||
"### Syntax\n",
|
||||
"```python\n",
|
||||
"SQLDatabase.from_databricks(\n",
|
||||
" catalog: str,\n",
|
||||
" schema: str,\n",
|
||||
" host: Optional[str] = None,\n",
|
||||
" api_token: Optional[str] = None,\n",
|
||||
" warehouse_id: Optional[str] = None,\n",
|
||||
" cluster_id: Optional[str] = None,\n",
|
||||
" engine_args: Optional[dict] = None,\n",
|
||||
" **kwargs: Any)\n",
|
||||
"```\n",
|
||||
"### Required Parameters\n",
|
||||
"* `catalog`: The catalog name in the Databricks database.\n",
|
||||
"* `schema`: The schema name in the catalog.\n",
|
||||
"\n",
|
||||
"### Optional Parameters\n",
|
||||
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
|
||||
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
|
||||
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
|
||||
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
|
||||
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
|
||||
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
|
||||
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b11c7e48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "8102bca0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Connecting to Databricks with SQLDatabase wrapper\n",
|
||||
"from langchain_community.utilities import SQLDatabase\n",
|
||||
"\n",
|
||||
"db = SQLDatabase.from_databricks(catalog=\"samples\", schema=\"nyctaxi\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "9dd36f58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Creating a OpenAI Chat LLM wrapper\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5b5c5f1a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### SQL Chain example\n",
|
||||
"\n",
|
||||
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "36f2270b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.utilities import SQLDatabaseChain\n",
|
||||
"\n",
|
||||
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "4e2b5f25",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
|
||||
"What is the average duration of taxi rides that start between midnight and 6am?\n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3mSELECT AVG(UNIX_TIMESTAMP(tpep_dropoff_datetime) - UNIX_TIMESTAMP(tpep_pickup_datetime)) as avg_duration\n",
|
||||
"FROM trips\n",
|
||||
"WHERE HOUR(tpep_pickup_datetime) >= 0 AND HOUR(tpep_pickup_datetime) < 6\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(987.8122786304605,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db_chain.run(\n",
|
||||
" \"What is the average duration of taxi rides that start between midnight and 6am?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e496d5e5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### SQL Database Agent example\n",
|
||||
"\n",
|
||||
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/tools/sql_database) for answering questions over a Databricks database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "9918e86a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_sql_agent\n",
|
||||
"from langchain_community.agent_toolkits import SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
|
||||
"agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "c484a76e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n",
|
||||
"Action Input: \u001b[0m\n",
|
||||
"Observation: \u001b[38;5;200m\u001b[1;3mtrips\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
|
||||
"Action: schema_sql_db\n",
|
||||
"Action Input: trips\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m\n",
|
||||
"CREATE TABLE trips (\n",
|
||||
"\ttpep_pickup_datetime TIMESTAMP, \n",
|
||||
"\ttpep_dropoff_datetime TIMESTAMP, \n",
|
||||
"\ttrip_distance FLOAT, \n",
|
||||
"\tfare_amount FLOAT, \n",
|
||||
"\tpickup_zip INT, \n",
|
||||
"\tdropoff_zip INT\n",
|
||||
") USING DELTA\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from trips table:\n",
|
||||
"tpep_pickup_datetime\ttpep_dropoff_datetime\ttrip_distance\tfare_amount\tpickup_zip\tdropoff_zip\n",
|
||||
"2016-02-14 16:52:13+00:00\t2016-02-14 17:16:04+00:00\t4.94\t19.0\t10282\t10171\n",
|
||||
"2016-02-04 18:44:19+00:00\t2016-02-04 18:46:00+00:00\t0.28\t3.5\t10110\t10110\n",
|
||||
"2016-02-17 17:13:57+00:00\t2016-02-17 17:17:55+00:00\t0.7\t5.0\t10103\t10023\n",
|
||||
"*/\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
|
||||
"Action: query_checker_sql_db\n",
|
||||
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
|
||||
"Observation: \u001b[31;1m\u001b[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
|
||||
"Action: query_sql_db\n",
|
||||
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[(30.6, '0 00:43:31.000000000')]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
|
||||
"Final Answer: The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the longest trip distance and how long did it take?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -115,7 +115,7 @@
|
||||
"\n",
|
||||
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
|
||||
"\n",
|
||||
"Unless told to do not query for all the columns from a specific index, only ask for a few relevant columns given the question.\n",
|
||||
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
|
||||
"\n",
|
||||
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
|
||||
"\n",
|
||||
|
||||
@@ -487,7 +487,7 @@
|
||||
" print(\"*\" * 40)\n",
|
||||
" print(\n",
|
||||
" colored(\n",
|
||||
" f\"After {i + 1} observations, Tommie's summary is:\\n{tommie.get_summary(force_refresh=True)}\",\n",
|
||||
" f\"After {i+1} observations, Tommie's summary is:\\n{tommie.get_summary(force_refresh=True)}\",\n",
|
||||
" \"blue\",\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
|
||||
@@ -20,7 +20,11 @@
|
||||
"cell_type": "markdown",
|
||||
"id": "5939a54c-3198-4ba4-8346-1cc088c473c0",
|
||||
"metadata": {},
|
||||
"source": "##### You can embed text in the same VectorDB space as images, and retrieve text and images as well based on input text or image.\n##### Following link demonstrates that.\n<a> https://python.langchain.com/v0.2/docs/integrations/text_embedding/open_clip/ </a>"
|
||||
"source": [
|
||||
"##### You can embed text in the same VectorDB space as images, and retreive text and images as well based on input text or image.\n",
|
||||
"##### Following link demonstrates that.\n",
|
||||
"<a> https://python.langchain.com/v0.2/docs/integrations/text_embedding/open_clip/ </a>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
@@ -354,7 +358,7 @@
|
||||
"id": "6e5cd014-db86-4d6b-8399-25cae3da5570",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Helper function to plot retrieved similar images"
|
||||
"## Helper function to plot retrived similar images"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -385,7 +389,7 @@
|
||||
" ax = axs[idx]\n",
|
||||
" ax.imshow(img)\n",
|
||||
" # Assuming similarity is not available in the new data, removed sim_score\n",
|
||||
" ax.title.set_text(f\"\\nProduct ID: {data['id']}\\n Score: {score}\")\n",
|
||||
" ax.title.set_text(f\"\\nProduct ID: {data[\"id\"]}\\n Score: {score}\")\n",
|
||||
" ax.axis(\"off\") # Turn off axis\n",
|
||||
"\n",
|
||||
" # Hide any remaining empty subplots\n",
|
||||
@@ -596,4 +600,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -148,11 +148,11 @@
|
||||
"\n",
|
||||
" instructions = \"None\"\n",
|
||||
" for i in range(max_meta_iters):\n",
|
||||
" print(f\"[Episode {i + 1}/{max_meta_iters}]\")\n",
|
||||
" print(f\"[Episode {i+1}/{max_meta_iters}]\")\n",
|
||||
" chain = initialize_chain(instructions, memory=None)\n",
|
||||
" output = chain.predict(human_input=task)\n",
|
||||
" for j in range(max_iters):\n",
|
||||
" print(f\"(Step {j + 1}/{max_iters})\")\n",
|
||||
" print(f\"(Step {j+1}/{max_iters})\")\n",
|
||||
" print(f\"Assistant: {output}\")\n",
|
||||
" print(\"Human: \")\n",
|
||||
" human_input = input()\n",
|
||||
|
||||
@@ -124,8 +124,8 @@
|
||||
"# Optional-- If you want to enable Langsmith -- good for debugging\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
|
||||
"os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass()"
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -156,7 +156,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Ensure you have an HF_TOKEN in your development environment:\n",
|
||||
"# Ensure you have an HF_TOKEN in your development enviornment:\n",
|
||||
"# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n",
|
||||
"\n",
|
||||
"# Load MongoDB's embedded_movies dataset from Hugging Face\n",
|
||||
|
||||
@@ -23,41 +23,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2498a0a1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Packages\n",
|
||||
"\n",
|
||||
"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "febbc459-ebba-4c1a-a52b-fed7731593f8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install --quiet -U langchain-vdms langchain-experimental langchain-ollama\n",
|
||||
"\n",
|
||||
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
|
||||
"! pip install --quiet pdf2image \"unstructured[all-docs]==0.10.19\" \"onnxruntime==1.17.0\" pillow pydantic lxml open_clip_torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "78ac6543",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from dotenv import load_dotenv, find_dotenv\n",
|
||||
"# load_dotenv(find_dotenv(), override=True);"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e5c8916e",
|
||||
"id": "6a6b6e73",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Start VDMS Server\n",
|
||||
@@ -68,15 +34,15 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "1e6e2c15",
|
||||
"execution_count": 1,
|
||||
"id": "5f483872",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"a701e5ac3523006e9540b5355e2d872d5d78383eab61562a675d5b9ac21fde65\n"
|
||||
"a1b9206b08ef626e15b356bf9e031171f7c7eb8f956a2733f196f0109246fe2b\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -84,11 +50,45 @@
|
||||
"! docker run --rm -d -p 55559:55555 --name vdms_rag_nb intellabs/vdms:latest\n",
|
||||
"\n",
|
||||
"# Connect to VDMS Vector Store\n",
|
||||
"from langchain_vdms.vectorstores import VDMS_Client\n",
|
||||
"from langchain_community.vectorstores.vdms import VDMS_Client\n",
|
||||
"\n",
|
||||
"vdms_client = VDMS_Client(port=55559)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2498a0a1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Packages\n",
|
||||
"\n",
|
||||
"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "febbc459-ebba-4c1a-a52b-fed7731593f8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install --quiet -U vdms langchain-experimental\n",
|
||||
"\n",
|
||||
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
|
||||
"! pip install --quiet pdf2image \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml open_clip_torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "78ac6543",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from dotenv import load_dotenv, find_dotenv\n",
|
||||
"# load_dotenv(find_dotenv(), override=True);"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1e94b3fb-8e3e-4736-be0a-ad881626c7bd",
|
||||
@@ -115,12 +115,11 @@
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"# Folder to store pdf and extracted images\n",
|
||||
"base_datapath = Path(\"./data/multimodal_files\").resolve()\n",
|
||||
"datapath = base_datapath / \"images\"\n",
|
||||
"datapath = Path(\"./data/multimodal_files\").resolve()\n",
|
||||
"datapath.mkdir(parents=True, exist_ok=True)\n",
|
||||
"\n",
|
||||
"pdf_url = \"https://www.loc.gov/lcm/pdf/LCM_2020_1112.pdf\"\n",
|
||||
"pdf_path = str(base_datapath / pdf_url.split(\"/\")[-1])\n",
|
||||
"pdf_path = str(datapath / pdf_url.split(\"/\")[-1])\n",
|
||||
"with open(pdf_path, \"wb\") as f:\n",
|
||||
" f.write(requests.get(pdf_url).content)"
|
||||
]
|
||||
@@ -186,8 +185,8 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain_community.vectorstores import VDMS\n",
|
||||
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
|
||||
"from langchain_vdms import VDMS\n",
|
||||
"\n",
|
||||
"# Create VDMS\n",
|
||||
"vectorstore = VDMS(\n",
|
||||
@@ -313,10 +312,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.messages import HumanMessage, SystemMessage\n",
|
||||
"from langchain_community.llms.ollama import Ollama\n",
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
|
||||
"from langchain_ollama.llms import OllamaLLM\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def prompt_func(data_dict):\n",
|
||||
@@ -341,8 +340,8 @@
|
||||
" \"As an expert art critic and historian, your task is to analyze and interpret images, \"\n",
|
||||
" \"considering their historical and cultural significance. Alongside the images, you will be \"\n",
|
||||
" \"provided with related text to offer context. Both will be retrieved from a vectorstore based \"\n",
|
||||
" \"on user-input keywords. Please use your extensive knowledge and analytical skills to provide a \"\n",
|
||||
" \"comprehensive summary that includes:\\n\"\n",
|
||||
" \"on user-input keywords. Please convert answers to english and use your extensive knowledge \"\n",
|
||||
" \"and analytical skills to provide a comprehensive summary that includes:\\n\"\n",
|
||||
" \"- A detailed description of the visual elements in the image.\\n\"\n",
|
||||
" \"- The historical and cultural context of the image.\\n\"\n",
|
||||
" \"- An interpretation of the image's symbolism and meaning.\\n\"\n",
|
||||
@@ -360,7 +359,7 @@
|
||||
" \"\"\"Multi-modal RAG chain\"\"\"\n",
|
||||
"\n",
|
||||
" # Multi-modal LLM\n",
|
||||
" llm_model = OllamaLLM(\n",
|
||||
" llm_model = Ollama(\n",
|
||||
" verbose=True, temperature=0.5, model=\"llava\", base_url=\"http://localhost:11434\"\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
@@ -420,121 +419,6 @@
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"© 2017 LARRY D. MOORE\n",
|
||||
"\n",
|
||||
"contemporary criticism of the less-than- thoughtful circumstances under which Lange photographed Thomson, the picture’s power to engage has not diminished. Artists in other countries have appropriated the image, changing the mother’s features into those of other ethnicities, but keeping her expression and the positions of her clinging children. Long after anyone could help the Thompson family, this picture has resonance in another time of national crisis, unemployment and food shortages.\n",
|
||||
"\n",
|
||||
"A striking, but very different picture is a 1900 portrait of the legendary Hin-mah-too-yah- lat-kekt (Chief Joseph) of the Nez Percé people. The Bureau of American Ethnology in Washington, D.C., regularly arranged for its photographer, De Lancey Gill, to photograph Native American delegations that came to the capital to confer with officials about tribal needs and concerns. Although Gill described Chief Joseph as having “an air of gentleness and quiet reserve,” the delegate skeptically appraises the photographer, which is not surprising given that the United States broke five treaties with Chief Joseph and his father between 1855 and 1885.\n",
|
||||
"\n",
|
||||
"More than a glance, second looks may reveal new knowledge into complex histories.\n",
|
||||
"\n",
|
||||
"Anne Wilkes Tucker is the photography curator emeritus of the Museum of Fine Arts, Houston and curator of the “Not an Ostrich” exhibition.\n",
|
||||
"\n",
|
||||
"28\n",
|
||||
"\n",
|
||||
"28 LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"THEYRE WILLING TO HAVE MEENTERTAIN THEM DURING THE DAY,BUT AS SOON AS IT STARTSGETTING DARK, THEY ALLGO OFF, AND LEAVE ME! \n",
|
||||
"ROSA PARKS: IN HER OWN WORDS\n",
|
||||
"\n",
|
||||
"COMIC ART: 120 YEARS OF PANELS AND PAGES\n",
|
||||
"\n",
|
||||
"SHALL NOT BE DENIED: WOMEN FIGHT FOR THE VOTE\n",
|
||||
"\n",
|
||||
"More information loc.gov/exhibits\n",
|
||||
"Nuestra Sefiora de las Iguanas\n",
|
||||
"\n",
|
||||
"Graciela Iturbide’s 1979 portrait of Zobeida Díaz in the town of Juchitán in southeastern Mexico conveys the strength of women and reflects their important contributions to the economy. Díaz, a merchant, was selling iguanas to cook and eat, carrying them on her head, as is customary.\n",
|
||||
"\n",
|
||||
"GRACIELA ITURBIDE. “NUESTRA SEÑORA DE LAS IGUANAS.” 1979. GELATIN SILVER PRINT. © GRACIELA ITURBIDE, USED BY PERMISSION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
|
||||
"\n",
|
||||
"Iturbide requested permission to take a photograph, but this proved challenging because the iguanas were constantly moving, causing Díaz to laugh. The result, however, was a brilliant portrait that the inhabitants of Juchitán claimed with pride. They have reproduced it on posters and erected a statue honoring Díaz and her iguanas. The photo now appears throughout the world, inspiring supporters of feminism, women’s rights and gender equality.\n",
|
||||
"\n",
|
||||
"—Adam Silvia is a curator in the Prints and Photographs Division.\n",
|
||||
"\n",
|
||||
"6\n",
|
||||
"\n",
|
||||
"6 LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"‘Migrant Mother’ is Florence Owens Thompson\n",
|
||||
"\n",
|
||||
"The iconic portrait that became the face of the Great Depression is also the most famous photograph in the collections of the Library of Congress.\n",
|
||||
"\n",
|
||||
"The Library holds the original source of the photo — a nitrate negative measuring 4 by 5 inches. Do you see a faint thumb in the bottom right? The photographer, Dorothea Lange, found the thumb distracting and after a few years had the negative altered to make the thumb almost invisible. Lange’s boss at the Farm Security Administration, Roy Stryker, criticized her action because altering a negative undermines the credibility of a documentary photo.\n",
|
||||
"Shrimp Picker\n",
|
||||
"\n",
|
||||
"The photos and evocative captions of Lewis Hine served as source material for National Child Labor Committee reports and exhibits exposing abusive child labor practices in the United States in the first decades of the 20th century.\n",
|
||||
"\n",
|
||||
"LEWIS WICKES HINE. “MANUEL, THE YOUNG SHRIMP-PICKER, FIVE YEARS OLD, AND A MOUNTAIN OF CHILD-LABOR OYSTER SHELLS BEHIND HIM. HE WORKED LAST YEAR. UNDERSTANDS NOT A WORD OF ENGLISH. DUNBAR, LOPEZ, DUKATE COMPANY. LOCATION: BILOXI, MISSISSIPPI.” FEBRUARY 1911. NATIONAL CHILD LABOR COMMITTEE COLLECTION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
|
||||
"\n",
|
||||
"For 15 years, Hine\n",
|
||||
"\n",
|
||||
"crisscrossed the country, documenting the practices of the worst offenders. His effective use of photography made him one of the committee's greatest publicists in the campaign for legislation to ban child labor.\n",
|
||||
"\n",
|
||||
"Hine was a master at taking photos that catch attention and convey a message and, in this photo, he framed Manuel in a setting that drove home the boy’s small size and unsafe environment.\n",
|
||||
"\n",
|
||||
"Captions on photos of other shrimp pickers emphasized their long working hours as well as one hazard of the job: The acid from the shrimp made pickers’ hands sore and “eats the shoes off your feet.”\n",
|
||||
"\n",
|
||||
"Such images alerted viewers to all that workers, their families and the nation sacrificed when children were part of the labor force. The Library holds paper records of the National Child Labor Committee as well as over 5,000 photographs.\n",
|
||||
"\n",
|
||||
"—Barbara Natanson is head of the Reference Section in the Prints and Photographs Division.\n",
|
||||
"\n",
|
||||
"8\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"Intergenerational Portrait\n",
|
||||
"\n",
|
||||
"Raised on the Apsáalooke (Crow) reservation in Montana, photographer Wendy Red Star created her “Apsáalooke Feminist” self-portrait series with her daughter Beatrice. With a dash of wry humor, mother and daughter are their own first-person narrators.\n",
|
||||
"\n",
|
||||
"Red Star explains the significance of their appearance: “The dress has power: You feel strong and regal wearing it. In my art, the elk tooth dress specifically symbolizes Crow womanhood and the matrilineal line connecting me to my ancestors. As a mother, I spend hours searching for the perfect elk tooth dress materials to make a prized dress for my daughter.”\n",
|
||||
"\n",
|
||||
"In a world that struggles with cultural identities, this photograph shows us the power and beauty of blending traditional and contemporary styles.\n",
|
||||
"‘American Gothic’ Product #216040262 Price: $24\n",
|
||||
"\n",
|
||||
"U.S. Capitol at Night Product #216040052 Price: $24\n",
|
||||
"\n",
|
||||
"Good Reading Ahead Product #21606142 Price: $24\n",
|
||||
"\n",
|
||||
"Gordon Parks created an iconic image with this 1942 photograph of cleaning woman Ella Watson.\n",
|
||||
"\n",
|
||||
"Snow blankets the U.S. Capitol in this classic image by Ernest L. Crandall.\n",
|
||||
"\n",
|
||||
"Start your new year out right with a poster promising good reading for months to come.\n",
|
||||
"\n",
|
||||
"▪ Order online: loc.gov/shop ▪ Order by phone: 888.682.3557\n",
|
||||
"\n",
|
||||
"26\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"SUPPORT\n",
|
||||
"\n",
|
||||
"A PICTURE OF PHILANTHROPY Annenberg Foundation Gives $1 Million and a Photographic Collection to the Library.\n",
|
||||
"\n",
|
||||
"A major gift by Wallis Annenberg and the Annenberg Foundation in Los Angeles will support the effort to reimagine the visitor experience at the Library of Congress. The foundation also is donating 1,000 photographic prints from its Annenberg Space for Photography exhibitions to the Library.\n",
|
||||
"\n",
|
||||
"The Library is pursuing a multiyear plan to transform the experience of its nearly 2 million annual visitors, share more of its treasures with the public and show how Library collections connect with visitors’ own creativity and research. The project is part of a strategic plan established by Librarian of Congress Carla Hayden to make the Library more user-centered for Congress, creators and learners of all ages.\n",
|
||||
"\n",
|
||||
"A 2018 exhibition at the Annenberg Space for Photography in Los Angeles featured over 400 photographs from the Library. The Library is planning a future photography exhibition, based on the Annenberg-curated show, along with a documentary film on the Library and its history, produced by the Annenberg Space for Photography.\n",
|
||||
"\n",
|
||||
"“The nation’s library is honored to have the strong support of Wallis Annenberg and the Annenberg Foundation as we enhance the experience for our visitors,” Hayden said. “We know that visitors will find new connections to the Library through the incredible photography collections and countless other treasures held here to document our nation’s history and creativity.”\n",
|
||||
"\n",
|
||||
"To enhance the Library’s holdings, the foundation is giving the Library photographic prints for long-term preservation from 10 other exhibitions hosted at the Annenberg Space for Photography. The Library holds one of the world’s largest photography collections, with about 14 million photos and over 1 million images digitized and available online.\n",
|
||||
"18 LIBRARY OF CONGRESS MAGAZINE\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
@@ -577,17 +461,10 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" The image is a black and white photograph by Dorothea Lange titled \"Destitute Pea Pickers in California. Mother of Seven Children. Age Thirty-Two. Nipomo, California.\" It was taken in March 1936 as part of the Farm Security Administration-Office of War Information Collection.\n",
|
||||
"\n",
|
||||
"The photograph features a woman with seven children, who appear to be in a state of poverty and hardship. The woman is seated, looking directly at the camera, while three of her children are standing behind her. They all seem to be dressed in ragged clothing, indicative of their impoverished condition.\n",
|
||||
"\n",
|
||||
"The historical context of this image is related to the Great Depression, which was a period of economic hardship in the United States that lasted from 1929 to 1939. During this time, many people struggled to make ends meet, and poverty was widespread. This photograph captures the plight of one such family during this difficult period.\n",
|
||||
"\n",
|
||||
"The symbolism of the image is multifaceted. The woman's direct gaze at the camera can be seen as a plea for help or an expression of desperation. The ragged clothing of the children serves as a stark reminder of the poverty and hardship experienced by many during this time.\n",
|
||||
"\n",
|
||||
"In terms of connections to the related text, it is mentioned that Florence Owens Thompson, the woman in the photograph, initially regretted having her picture taken. However, she later came to appreciate the importance of the image as a representation of the struggles faced by many during the Great Depression. The mention of Helena Zinkham suggests that she may have played a role in the creation or distribution of this photograph.\n",
|
||||
"\n",
|
||||
"Overall, this image is a powerful depiction of poverty and hardship during the Great Depression, capturing the resilience and struggles of one family amidst difficult times. \n"
|
||||
" The image depicts a woman with several children. The woman appears to be of Cherokee heritage, as suggested by the text provided. The image is described as having been initially regretted by the subject, Florence Owens Thompson, due to her feeling that it did not accurately represent her leadership qualities.\n",
|
||||
"The historical and cultural context of the image is tied to the Great Depression and the Dust Bowl, both of which affected the Cherokee people in Oklahoma. The photograph was taken during this period, and its subject, Florence Owens Thompson, was a leader within her community who worked tirelessly to help those affected by these crises.\n",
|
||||
"The image's symbolism and meaning can be interpreted as a representation of resilience and strength in the face of adversity. The woman is depicted with multiple children, which could signify her role as a caregiver and protector during difficult times.\n",
|
||||
"Connections between the image and the related text include Florence Owens Thompson's leadership qualities and her regretted feelings about the photograph. Additionally, the mention of Dorothea Lange, the photographer who took this photo, ties the image to its historical context and the broader narrative of the Great Depression and Dust Bowl in Oklahoma. \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -614,17 +491,11 @@
|
||||
"source": [
|
||||
"! docker kill vdms_rag_nb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fe4a98ee",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".test-venv",
|
||||
"display_name": ".langchain-venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -638,7 +509,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.10"
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -183,7 +183,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"game_description = f\"\"\"Here is the topic for a Dungeons & Dragons game: {quest}.\n",
|
||||
" The characters are: {(*character_names,)}.\n",
|
||||
" The characters are: {*character_names,}.\n",
|
||||
" The story is narrated by the storyteller, {storyteller_name}.\"\"\"\n",
|
||||
"\n",
|
||||
"player_descriptor_system_message = SystemMessage(\n",
|
||||
@@ -334,7 +334,7 @@
|
||||
" You are the storyteller, {storyteller_name}.\n",
|
||||
" Please make the quest more specific. Be creative and imaginative.\n",
|
||||
" Please reply with the specified quest in {word_limit} words or less. \n",
|
||||
" Speak directly to the characters: {(*character_names,)}.\n",
|
||||
" Speak directly to the characters: {*character_names,}.\n",
|
||||
" Do not add anything else.\"\"\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
|
||||
@@ -200,7 +200,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"game_description = f\"\"\"Here is the topic for the presidential debate: {topic}.\n",
|
||||
"The presidential candidates are: {\", \".join(character_names)}.\"\"\"\n",
|
||||
"The presidential candidates are: {', '.join(character_names)}.\"\"\"\n",
|
||||
"\n",
|
||||
"player_descriptor_system_message = SystemMessage(\n",
|
||||
" content=\"You can add detail to the description of each presidential candidate.\"\n",
|
||||
@@ -595,7 +595,7 @@
|
||||
" Frame the debate topic as a problem to be solved.\n",
|
||||
" Be creative and imaginative.\n",
|
||||
" Please reply with the specified topic in {word_limit} words or less. \n",
|
||||
" Speak directly to the presidential candidates: {(*character_names,)}.\n",
|
||||
" Speak directly to the presidential candidates: {*character_names,}.\n",
|
||||
" Do not add anything else.\"\"\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
|
||||
@@ -71,9 +71,9 @@
|
||||
"# Optional: LangSmith API keys\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
|
||||
"os.environ[\"LANGSMITH_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
|
||||
"os.environ[\"LANGSMITH_API_KEY\"] = \"api_key\""
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
|
||||
"os.environ[\"LANGCHAIN_API_KEY\"] = \"api_key\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -215,8 +215,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_ollama import ChatOllama\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"# Prompt\n",
|
||||
|
||||
@@ -395,7 +395,8 @@
|
||||
"prompt_messages = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=(\n",
|
||||
" \"You are a world class algorithm to answer questions in a specific format.\"\n",
|
||||
" \"You are a world class algorithm to answer \"\n",
|
||||
" \"questions in a specific format.\"\n",
|
||||
" )\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"Answer question using the following context\"),\n",
|
||||
|
||||
@@ -29,7 +29,7 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"LANGSMITH_PROJECT\"] = \"movie-qa\""
|
||||
"os.environ[\"LANGCHAIN_PROJECT\"] = \"movie-qa\""
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -25,7 +25,7 @@
|
||||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
|
||||
"\n",
|
||||
"This guide demonstrates how Oracle AI Vector Search can be used with LangChain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
|
||||
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
|
||||
"\n",
|
||||
" * Loading the documents from various sources using OracleDocLoader\n",
|
||||
" * Summarizing them within/outside the database using OracleSummary\n",
|
||||
@@ -47,19 +47,7 @@
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"Please install the Oracle Database [python-oracledb driver](https://pypi.org/project/oracledb/) to use LangChain with Oracle AI Vector Search:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"$ python -m pip install --upgrade oracledb\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Demo User\n",
|
||||
"First, connect as a privileged user to create a demo user with all the required privileges. Change the credentials for your environment. Also set the DEMO_PY_DIR path to a directory on the database host where your model file is located:"
|
||||
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -68,30 +56,65 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# pip install oracledb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Demo User\n",
|
||||
"First, create a demo user with all the required privileges. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Connection successful!\n",
|
||||
"User setup done!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# Please update with your SYSTEM (or privileged user) username, password, and database connection string\n",
|
||||
"username = \"SYSTEM\"\n",
|
||||
"# Update with your username, password, hostname, and service_name\n",
|
||||
"username = \"\"\n",
|
||||
"password = \"\"\n",
|
||||
"dsn = \"\"\n",
|
||||
"\n",
|
||||
"with oracledb.connect(user=username, password=password, dsn=dsn) as connection:\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"\n",
|
||||
" with connection.cursor() as cursor:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
" try:\n",
|
||||
" cursor.execute(\n",
|
||||
" \"\"\"\n",
|
||||
" begin\n",
|
||||
" -- Drop user\n",
|
||||
" execute immediate 'drop user if exists testuser cascade';\n",
|
||||
"\n",
|
||||
" begin\n",
|
||||
" execute immediate 'drop user testuser cascade';\n",
|
||||
" exception\n",
|
||||
" when others then\n",
|
||||
" dbms_output.put_line('Error dropping user: ' || SQLERRM);\n",
|
||||
" end;\n",
|
||||
" \n",
|
||||
" -- Create user and grant privileges\n",
|
||||
" execute immediate 'create user testuser identified by testuser';\n",
|
||||
" execute immediate 'grant connect, unlimited tablespace, create credential, create procedure, create any index to testuser';\n",
|
||||
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/home/yourname/demo/orachain''';\n",
|
||||
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/scratch/hroy/view_storage/hroy_devstorage/demo/orachain''';\n",
|
||||
" execute immediate 'grant read, write on directory DEMO_PY_DIR to public';\n",
|
||||
" execute immediate 'grant create mining model to testuser';\n",
|
||||
"\n",
|
||||
" \n",
|
||||
" -- Network access\n",
|
||||
" begin\n",
|
||||
" DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(\n",
|
||||
@@ -104,7 +127,15 @@
|
||||
" end;\n",
|
||||
" \"\"\"\n",
|
||||
" )\n",
|
||||
" print(\"User setup done!\")"
|
||||
" print(\"User setup done!\")\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"User setup failed with error: {e}\")\n",
|
||||
" finally:\n",
|
||||
" cursor.close()\n",
|
||||
" conn.close()\n",
|
||||
"except Exception as e:\n",
|
||||
" print(f\"Connection failed with error: {e}\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -112,13 +143,13 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Process Documents using Oracle AI\n",
|
||||
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by LangChain.\n",
|
||||
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by Langchain.\n",
|
||||
"\n",
|
||||
"To prepare the documents for analysis, a comprehensive preprocessing workflow is necessary. Initially, the documents must be retrieved, summarized (if required), and chunked as needed. Subsequent steps involve generating embeddings for these chunks and integrating them into the Oracle AI Vector Store. Users can then conduct semantic searches on this data.\n",
|
||||
"\n",
|
||||
"The Oracle AI Vector Search LangChain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
|
||||
"The Oracle AI Vector Search Langchain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
|
||||
"\n",
|
||||
"In the sections that follow, we will detail the utilization of Oracle AI LangChain APIs to effectively implement each of these processes."
|
||||
"In the sections that follow, we will detail the utilization of Oracle AI Langchain APIs to effectively implement each of these processes."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -126,24 +157,38 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Demo User\n",
|
||||
"The following sample code shows how to connect to Oracle Database using the python-oracledb driver. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You can switch to Thick mode if you are unable to use Thin mode."
|
||||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 45,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Connection successful!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# please update with your username, password, and database connection string\n",
|
||||
"username = \"testuser\"\n",
|
||||
"# please update with your username, password, hostname and service_name\n",
|
||||
"username = \"\"\n",
|
||||
"password = \"\"\n",
|
||||
"dsn = \"\"\n",
|
||||
"\n",
|
||||
"connection = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
"print(\"Connection successful!\")"
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -156,12 +201,22 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Table created and populated.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"with connection.cursor() as cursor:\n",
|
||||
" drop_table_sql = \"\"\"drop table if exists demo_tab\"\"\"\n",
|
||||
"try:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
"\n",
|
||||
" drop_table_sql = \"\"\"drop table demo_tab\"\"\"\n",
|
||||
" cursor.execute(drop_table_sql)\n",
|
||||
"\n",
|
||||
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
|
||||
@@ -184,9 +239,15 @@
|
||||
" ]\n",
|
||||
" cursor.executemany(insert_row_sql, rows_to_insert)\n",
|
||||
"\n",
|
||||
"connection.commit()\n",
|
||||
" conn.commit()\n",
|
||||
"\n",
|
||||
"print(\"Table created and populated.\")"
|
||||
" print(\"Table created and populated.\")\n",
|
||||
" cursor.close()\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Table creation failed.\")\n",
|
||||
" cursor.close()\n",
|
||||
" conn.close()\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -200,22 +261,30 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the ONNX Model\n",
|
||||
"### Load ONNX Model\n",
|
||||
"\n",
|
||||
"Oracle accommodates a variety of embedding providers, enabling you to choose between proprietary database solutions and third-party services such as Oracle Generative AI Service and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
|
||||
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
|
||||
"\n",
|
||||
"***Important*** : Should you opt for the database option, you must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
|
||||
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
|
||||
"\n",
|
||||
"A significant advantage of utilizing an ONNX model directly within Oracle Database is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
|
||||
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
|
||||
"\n",
|
||||
"Below is the example code to upload an ONNX model into Oracle Database:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 47,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ONNX model loaded.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"\n",
|
||||
@@ -225,8 +294,12 @@
|
||||
"onnx_file = \"tinybert.onnx\"\n",
|
||||
"model_name = \"demo_model\"\n",
|
||||
"\n",
|
||||
"OracleEmbeddings.load_onnx_model(connection, onnx_dir, onnx_file, model_name)\n",
|
||||
"print(\"ONNX model loaded.\")"
|
||||
"try:\n",
|
||||
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
|
||||
" print(\"ONNX model loaded.\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"ONNX model loading failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -248,7 +321,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with connection.cursor() as cursor:\n",
|
||||
"try:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
" cursor.execute(\n",
|
||||
" \"\"\"\n",
|
||||
" declare\n",
|
||||
@@ -275,7 +349,12 @@
|
||||
" params => json(jo.to_string));\n",
|
||||
" end;\n",
|
||||
" \"\"\"\n",
|
||||
" )"
|
||||
" )\n",
|
||||
" cursor.close()\n",
|
||||
" print(\"Credentials created.\")\n",
|
||||
"except Exception as ex:\n",
|
||||
" cursor.close()\n",
|
||||
" raise"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -283,24 +362,33 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Documents\n",
|
||||
"You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
|
||||
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
|
||||
"\n",
|
||||
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
|
||||
"\n",
|
||||
"Below is a sample code snippet that demonstrates how to use OracleDocLoader:"
|
||||
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 48,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of docs loaded: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"# loading from Oracle Database table\n",
|
||||
"# make sure you have the table with this specification\n",
|
||||
"loader_params = {}\n",
|
||||
"loader_params = {\n",
|
||||
" \"owner\": \"testuser\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
@@ -308,7 +396,7 @@
|
||||
"}\n",
|
||||
"\n",
|
||||
"\"\"\" load the docs \"\"\"\n",
|
||||
"loader = OracleDocLoader(conn=connection, params=loader_params)\n",
|
||||
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
@@ -321,23 +409,23 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Summary\n",
|
||||
"Now that you have loaded the documents, you may want to generate a summary for each document. The Oracle AI Vector Search LangChain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, Oracle Generative AI Service, HuggingFace, among others, allowing you to select the provider that best meets their needs. To utilize these capabilities, you must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
|
||||
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** You may need to set proxy if you want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
|
||||
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# proxy to be used when we instantiate summary and embedder objects\n",
|
||||
"# proxy to be used when we instantiate summary and embedder object\n",
|
||||
"proxy = \"\""
|
||||
]
|
||||
},
|
||||
@@ -345,14 +433,22 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following sample code shows how to generate a summary:"
|
||||
"The following sample code will show how to generate summary:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 49,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of Summaries: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.utilities.oracleai import OracleSummary\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
@@ -367,7 +463,7 @@
|
||||
"\n",
|
||||
"# get the summary instance\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"summ = OracleSummary(conn=connection, params=summary_params, proxy=proxy)\n",
|
||||
"summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)\n",
|
||||
"\n",
|
||||
"list_summary = []\n",
|
||||
"for doc in docs:\n",
|
||||
@@ -391,9 +487,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 50,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of Chunks: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
@@ -402,7 +506,7 @@
|
||||
"splitter_params = {\"normalize\": \"all\"}\n",
|
||||
"\n",
|
||||
"\"\"\" get the splitter instance \"\"\"\n",
|
||||
"splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n",
|
||||
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
|
||||
"\n",
|
||||
"list_chunks = []\n",
|
||||
"for doc in docs:\n",
|
||||
@@ -419,19 +523,19 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Embeddings\n",
|
||||
"Now that the documents are chunked as per requirements, you may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
|
||||
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** You may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
|
||||
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -443,14 +547,22 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following sample code shows how to generate embeddings:"
|
||||
"The following sample code will show how to generate embeddings:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 51,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of embeddings: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
@@ -460,7 +572,7 @@
|
||||
"\n",
|
||||
"# get the embedding instance\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"embedder = OracleEmbeddings(conn=connection, params=embedder_params, proxy=proxy)\n",
|
||||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
|
||||
"\n",
|
||||
"embeddings = []\n",
|
||||
"for doc in docs:\n",
|
||||
@@ -479,19 +591,19 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create Oracle AI Vector Store\n",
|
||||
"Now that you know how to use Oracle AI LangChain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
|
||||
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, let's import all the dependencies:"
|
||||
"First, let's import all the dependencies."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 52,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -514,80 +626,100 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, let's combine all document processing stages together. Here is the sample code:"
|
||||
"Next, let's combine all document processing stages together. Here is the sample code below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 53,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Connection successful!\n",
|
||||
"ONNX model loaded.\n",
|
||||
"Number of total chunks with metadata: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\"\"\"\n",
|
||||
"In this sample example, we will use 'database' provider for both summary and embeddings\n",
|
||||
"so, we don't need to do the following:\n",
|
||||
"In this sample example, we will use 'database' provider for both summary and embeddings.\n",
|
||||
"So, we don't need to do the followings:\n",
|
||||
" - set proxy for 3rd party providers\n",
|
||||
" - create credential for 3rd party providers\n",
|
||||
"\n",
|
||||
"If you choose to use 3rd party provider, please follow the necessary steps for proxy and credential.\n",
|
||||
"If you choose to use 3rd party provider, \n",
|
||||
"please follow the necessary steps for proxy and credential.\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# please update with your username, password, and database connection string\n",
|
||||
"# oracle connection\n",
|
||||
"# please update with your username, password, hostname, and service_name\n",
|
||||
"username = \"\"\n",
|
||||
"password = \"\"\n",
|
||||
"dsn = \"\"\n",
|
||||
"\n",
|
||||
"with oracledb.connect(user=username, password=password, dsn=dsn) as connection:\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)\n",
|
||||
"\n",
|
||||
" # load onnx model\n",
|
||||
" # please update with your related information\n",
|
||||
" onnx_dir = \"DEMO_PY_DIR\"\n",
|
||||
" onnx_file = \"tinybert.onnx\"\n",
|
||||
" model_name = \"demo_model\"\n",
|
||||
" OracleEmbeddings.load_onnx_model(connection, onnx_dir, onnx_file, model_name)\n",
|
||||
"\n",
|
||||
"# load onnx model\n",
|
||||
"# please update with your related information\n",
|
||||
"onnx_dir = \"DEMO_PY_DIR\"\n",
|
||||
"onnx_file = \"tinybert.onnx\"\n",
|
||||
"model_name = \"demo_model\"\n",
|
||||
"try:\n",
|
||||
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
|
||||
" print(\"ONNX model loaded.\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"ONNX model loading failed!\")\n",
|
||||
" sys.exit(1)\n",
|
||||
"\n",
|
||||
" # params\n",
|
||||
" # please update necessary fields with related information\n",
|
||||
" loader_params = {\n",
|
||||
" \"owner\": \"testuser\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
" \"colname\": \"data\",\n",
|
||||
" }\n",
|
||||
" summary_params = {\n",
|
||||
" \"provider\": \"database\",\n",
|
||||
" \"glevel\": \"S\",\n",
|
||||
" \"numParagraphs\": 1,\n",
|
||||
" \"language\": \"english\",\n",
|
||||
" }\n",
|
||||
" splitter_params = {\"normalize\": \"all\"}\n",
|
||||
" embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||||
"\n",
|
||||
" # instantiate loader, summary, splitter, and embedder\n",
|
||||
" loader = OracleDocLoader(conn=connection, params=loader_params)\n",
|
||||
" summary = OracleSummary(conn=connection, params=summary_params)\n",
|
||||
" splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n",
|
||||
" embedder = OracleEmbeddings(conn=connection, params=embedder_params)\n",
|
||||
"# params\n",
|
||||
"# please update necessary fields with related information\n",
|
||||
"loader_params = {\n",
|
||||
" \"owner\": \"testuser\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
" \"colname\": \"data\",\n",
|
||||
"}\n",
|
||||
"summary_params = {\n",
|
||||
" \"provider\": \"database\",\n",
|
||||
" \"glevel\": \"S\",\n",
|
||||
" \"numParagraphs\": 1,\n",
|
||||
" \"language\": \"english\",\n",
|
||||
"}\n",
|
||||
"splitter_params = {\"normalize\": \"all\"}\n",
|
||||
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||||
"\n",
|
||||
" # process the documents\n",
|
||||
" chunks_with_mdata = []\n",
|
||||
" for id, doc in enumerate(docs, start=1):\n",
|
||||
" summ = summary.get_summary(doc.page_content)\n",
|
||||
" chunks = splitter.split_text(doc.page_content)\n",
|
||||
" for ic, chunk in enumerate(chunks, start=1):\n",
|
||||
" chunk_metadata = doc.metadata.copy()\n",
|
||||
" chunk_metadata[\"id\"] = (\n",
|
||||
" chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
|
||||
" )\n",
|
||||
" chunk_metadata[\"document_id\"] = str(id)\n",
|
||||
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
|
||||
" chunks_with_mdata.append(\n",
|
||||
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
|
||||
" )\n",
|
||||
"# instantiate loader, summary, splitter, and embedder\n",
|
||||
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
|
||||
"summary = OracleSummary(conn=conn, params=summary_params)\n",
|
||||
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
|
||||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params)\n",
|
||||
"\n",
|
||||
" \"\"\" verify \"\"\"\n",
|
||||
" print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
|
||||
"# process the documents\n",
|
||||
"chunks_with_mdata = []\n",
|
||||
"for id, doc in enumerate(docs, start=1):\n",
|
||||
" summ = summary.get_summary(doc.page_content)\n",
|
||||
" chunks = splitter.split_text(doc.page_content)\n",
|
||||
" for ic, chunk in enumerate(chunks, start=1):\n",
|
||||
" chunk_metadata = doc.metadata.copy()\n",
|
||||
" chunk_metadata[\"id\"] = chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
|
||||
" chunk_metadata[\"document_id\"] = str(id)\n",
|
||||
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
|
||||
" chunks_with_mdata.append(\n",
|
||||
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
"print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -601,15 +733,23 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 55,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Vector Store Table: oravs\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# create Oracle AI Vector Store\n",
|
||||
"vectorstore = OracleVS.from_documents(\n",
|
||||
" chunks_with_mdata,\n",
|
||||
" embedder,\n",
|
||||
" client=connection,\n",
|
||||
" client=conn,\n",
|
||||
" table_name=\"oravs\",\n",
|
||||
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
|
||||
")\n",
|
||||
@@ -638,12 +778,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"oraclevs.create_index(\n",
|
||||
" connection, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
|
||||
" conn, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Index created.\")"
|
||||
@@ -653,7 +793,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. You may adjust various parameters according to your specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
|
||||
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. Users may adjust various parameters according to their specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
|
||||
"\n",
|
||||
"Additionally, various types of vector indices can be created to meet diverse requirements. More details can be found in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/).\n"
|
||||
]
|
||||
@@ -665,16 +805,29 @@
|
||||
"## Perform Semantic Search\n",
|
||||
"All set!\n",
|
||||
"\n",
|
||||
"You have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. You are now prepared to proceed with semantic searches.\n",
|
||||
"We have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. We are now prepared to proceed with semantic searches.\n",
|
||||
"\n",
|
||||
"Below is the sample code for this process:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 58,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
|
||||
"[]\n",
|
||||
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
|
||||
"[]\n",
|
||||
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
|
||||
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What is Oracle AI Vector Store?\"\n",
|
||||
"filter = {\"document_id\": [\"1\"]}\n",
|
||||
@@ -719,7 +872,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.13.3"
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -552,7 +552,9 @@
|
||||
"cell_type": "markdown",
|
||||
"id": "77deb6a0-0950-450a-916a-f2a029676c20",
|
||||
"metadata": {},
|
||||
"source": "**Appending all retrieved documents in a single document**"
|
||||
"source": [
|
||||
"**Appending all retreived documents in a single document**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
@@ -756,4 +758,4 @@
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,82 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# RAG using Upstage Document Parse and Groundedness Check\n",
|
||||
"This example illustrates RAG using [Upstage](https://python.langchain.com/docs/integrations/providers/upstage/) Document Parse and Groundedness Check."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"\n",
|
||||
"from langchain_community.vectorstores import DocArrayInMemorySearch\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from langchain_core.runnables.base import RunnableSerializable\n",
|
||||
"from langchain_upstage import (\n",
|
||||
" ChatUpstage,\n",
|
||||
" UpstageDocumentParseLoader,\n",
|
||||
" UpstageEmbeddings,\n",
|
||||
" UpstageGroundednessCheck,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"model = ChatUpstage()\n",
|
||||
"\n",
|
||||
"files = [\"/PATH/TO/YOUR/FILE.pdf\", \"/PATH/TO/YOUR/FILE2.pdf\"]\n",
|
||||
"\n",
|
||||
"loader = UpstageDocumentParseLoader(file_path=files, split=\"element\")\n",
|
||||
"\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"vectorstore = DocArrayInMemorySearch.from_documents(\n",
|
||||
" docs, embedding=UpstageEmbeddings(model=\"solar-embedding-1-large\")\n",
|
||||
")\n",
|
||||
"retriever = vectorstore.as_retriever()\n",
|
||||
"\n",
|
||||
"template = \"\"\"Answer the question based only on the following context:\n",
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"Question: {question}\n",
|
||||
"\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(template)\n",
|
||||
"output_parser = StrOutputParser()\n",
|
||||
"\n",
|
||||
"retrieved_docs = retriever.get_relevant_documents(\"How many parameters in SOLAR model?\")\n",
|
||||
"\n",
|
||||
"groundedness_check = UpstageGroundednessCheck()\n",
|
||||
"groundedness = \"\"\n",
|
||||
"while groundedness != \"grounded\":\n",
|
||||
" chain: RunnableSerializable = RunnablePassthrough() | prompt | model | output_parser\n",
|
||||
"\n",
|
||||
" result = chain.invoke(\n",
|
||||
" {\n",
|
||||
" \"context\": retrieved_docs,\n",
|
||||
" \"question\": \"How many parameters in SOLAR model?\",\n",
|
||||
" }\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" groundedness = groundedness_check.invoke(\n",
|
||||
" {\n",
|
||||
" \"context\": retrieved_docs,\n",
|
||||
" \"answer\": result,\n",
|
||||
" }\n",
|
||||
" )"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -0,0 +1,82 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# RAG using Upstage Layout Analysis and Groundedness Check\n",
|
||||
"This example illustrates RAG using [Upstage](https://python.langchain.com/docs/integrations/providers/upstage/) Layout Analysis and Groundedness Check."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import List\n",
|
||||
"\n",
|
||||
"from langchain_community.vectorstores import DocArrayInMemorySearch\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough\n",
|
||||
"from langchain_core.runnables.base import RunnableSerializable\n",
|
||||
"from langchain_upstage import (\n",
|
||||
" ChatUpstage,\n",
|
||||
" UpstageEmbeddings,\n",
|
||||
" UpstageGroundednessCheck,\n",
|
||||
" UpstageLayoutAnalysisLoader,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"model = ChatUpstage()\n",
|
||||
"\n",
|
||||
"files = [\"/PATH/TO/YOUR/FILE.pdf\", \"/PATH/TO/YOUR/FILE2.pdf\"]\n",
|
||||
"\n",
|
||||
"loader = UpstageLayoutAnalysisLoader(file_path=files, split=\"element\")\n",
|
||||
"\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"vectorstore = DocArrayInMemorySearch.from_documents(\n",
|
||||
" docs, embedding=UpstageEmbeddings(model=\"solar-embedding-1-large\")\n",
|
||||
")\n",
|
||||
"retriever = vectorstore.as_retriever()\n",
|
||||
"\n",
|
||||
"template = \"\"\"Answer the question based only on the following context:\n",
|
||||
"{context}\n",
|
||||
"\n",
|
||||
"Question: {question}\n",
|
||||
"\"\"\"\n",
|
||||
"prompt = ChatPromptTemplate.from_template(template)\n",
|
||||
"output_parser = StrOutputParser()\n",
|
||||
"\n",
|
||||
"retrieved_docs = retriever.get_relevant_documents(\"How many parameters in SOLAR model?\")\n",
|
||||
"\n",
|
||||
"groundedness_check = UpstageGroundednessCheck()\n",
|
||||
"groundedness = \"\"\n",
|
||||
"while groundedness != \"grounded\":\n",
|
||||
" chain: RunnableSerializable = RunnablePassthrough() | prompt | model | output_parser\n",
|
||||
"\n",
|
||||
" result = chain.invoke(\n",
|
||||
" {\n",
|
||||
" \"context\": retrieved_docs,\n",
|
||||
" \"question\": \"How many parameters in SOLAR model?\",\n",
|
||||
" }\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" groundedness = groundedness_check.invoke(\n",
|
||||
" {\n",
|
||||
" \"context\": retrieved_docs,\n",
|
||||
" \"answer\": result,\n",
|
||||
" }\n",
|
||||
" )"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -53,7 +53,7 @@
|
||||
"id": "f5ccda4e-7af5-4355-b9c4-25547edf33f9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first load up this paper, and split into text chunks of size 1000."
|
||||
"Lets first load up this paper, and split into text chunks of size 1000."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -241,7 +241,7 @@
|
||||
"id": "360b2837-8024-47e0-a4ba-592505a9a5c8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With our embedder in place, let's define our retriever:"
|
||||
"With our embedder in place, lets define our retriever:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -312,7 +312,7 @@
|
||||
"id": "d84ea8f4-a5de-4d76-b44d-85e56583f489",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's write our documents into our new store. This will use our embedder on each document."
|
||||
"Lets write our documents into our new store. This will use our embedder on each document."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -339,7 +339,7 @@
|
||||
"id": "580bc212-8ecd-4d28-8656-b96fcd0d7eb6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! Our retriever is good to go. Let's load up an LLM, that will reason over the retrieved documents:"
|
||||
"Great! Our retriever is good to go. Lets load up an LLM, that will reason over the retrieved documents:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -430,7 +430,7 @@
|
||||
"id": "3bc53602-86d6-420f-91b1-fc2effa7e986",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Excellent! Let's ask it a question.\n",
|
||||
"Excellent! lets ask it a question.\n",
|
||||
"We will also use a verbose and debug, to check which documents were used by the model to produce the answer."
|
||||
]
|
||||
},
|
||||
|
||||
@@ -233,7 +233,7 @@ Question: {input}"""
|
||||
|
||||
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
|
||||
|
||||
Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
|
||||
Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
|
||||
|
||||
Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
|
||||
|
||||
|
||||
@@ -34,7 +34,7 @@
|
||||
"tools = [multiply, exponentiate, add]\n",
|
||||
"\n",
|
||||
"gpt35 = ChatOpenAI(model=\"gpt-3.5-turbo-0125\", temperature=0).bind_tools(tools)\n",
|
||||
"claude3 = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\").bind_tools(tools)\n",
|
||||
"claude3 = ChatAnthropic(model=\"claude-3-sonnet-20240229\").bind_tools(tools)\n",
|
||||
"llm_with_tools = gpt35.configurable_alternatives(\n",
|
||||
" ConfigurableField(id=\"llm\"), default_key=\"gpt35\", claude3=claude3\n",
|
||||
")"
|
||||
@@ -113,15 +113,14 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\", additional_kwargs={}, response_metadata={}),\n",
|
||||
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_xuNXwm2P6U2Pp2pAbC1sdIBz', 'function': {'arguments': '{\"x\": 3, \"y\": 5}', 'name': 'add'}, 'type': 'function'}, {'id': 'call_0pImUJUDlYa5zfBcxxuvWyYS', 'function': {'arguments': '{\"x\": 8, \"y\": 2.743}', 'name': 'exponentiate'}, 'type': 'function'}, {'id': 'call_yaownQ9TZK0dkqD1KSFyax4H', 'function': {'arguments': '{\"x\": 17.24, \"y\": -918.1241}', 'name': 'add'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 75, 'prompt_tokens': 131, 'total_tokens': 206, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-ByJm2qxSWU3oTTSZQv64J4XQKZhA6', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--35fad027-47f7-44d3-aa8b-99f4fc24098c-0', tool_calls=[{'name': 'add', 'args': {'x': 3, 'y': 5}, 'id': 'call_xuNXwm2P6U2Pp2pAbC1sdIBz', 'type': 'tool_call'}, {'name': 'exponentiate', 'args': {'x': 8, 'y': 2.743}, 'id': 'call_0pImUJUDlYa5zfBcxxuvWyYS', 'type': 'tool_call'}, {'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'call_yaownQ9TZK0dkqD1KSFyax4H', 'type': 'tool_call'}], usage_metadata={'input_tokens': 131, 'output_tokens': 75, 'total_tokens': 206, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}),\n",
|
||||
" ToolMessage(content='8.0', tool_call_id='call_xuNXwm2P6U2Pp2pAbC1sdIBz'),\n",
|
||||
" ToolMessage(content='300.03770462067547', tool_call_id='call_0pImUJUDlYa5zfBcxxuvWyYS'),\n",
|
||||
" ToolMessage(content='-900.8841', tool_call_id='call_yaownQ9TZK0dkqD1KSFyax4H'),\n",
|
||||
" AIMessage(content='The results are:\\n1. 3 plus 5 is 8.\\n2. 5 raised to the power of 2.743 is approximately 300.04.\\n3. 17.24 minus 918.1241 is approximately -900.88.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 55, 'prompt_tokens': 236, 'total_tokens': 291, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-ByJm345MYnpowGS90iAZAlSs7haed', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--5fa66d47-d80e-45d0-9c32-31348c735d72-0', usage_metadata={'input_tokens': 236, 'output_tokens': 55, 'total_tokens': 291, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}"
|
||||
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"),\n",
|
||||
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_6yMU2WsS4Bqgi1WxFHxtfJRc', 'function': {'arguments': '{\"x\": 8, \"y\": 2.743}', 'name': 'exponentiate'}, 'type': 'function'}, {'id': 'call_GAL3dQiKFF9XEV0RrRLPTvVp', 'function': {'arguments': '{\"x\": 17.24, \"y\": -918.1241}', 'name': 'add'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 58, 'prompt_tokens': 168, 'total_tokens': 226}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-528302fc-7acf-4c11-82c4-119ccf40c573-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 8, 'y': 2.743}, 'id': 'call_6yMU2WsS4Bqgi1WxFHxtfJRc'}, {'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'call_GAL3dQiKFF9XEV0RrRLPTvVp'}]),\n",
|
||||
" ToolMessage(content='300.03770462067547', tool_call_id='call_6yMU2WsS4Bqgi1WxFHxtfJRc'),\n",
|
||||
" ToolMessage(content='-900.8841', tool_call_id='call_GAL3dQiKFF9XEV0RrRLPTvVp'),\n",
|
||||
" AIMessage(content='The result of \\\\(3 + 5^{2.743}\\\\) is approximately 300.04, and the result of \\\\(17.24 - 918.1241\\\\) is approximately -900.88.', response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 251, 'total_tokens': 295}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-d1161669-ed09-4b18-94bd-6d8530df5aa8-0')]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -147,17 +146,17 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\", additional_kwargs={}, response_metadata={}),\n",
|
||||
" AIMessage(content=[{'text': \"I'll solve these calculations for you.\\n\\nFor the first part, I need to calculate 3 plus 5 raised to the power of 2.743.\\n\\nLet me break this down:\\n1) First, I'll calculate 5 raised to the power of 2.743\\n2) Then add 3 to the result\", 'type': 'text'}, {'id': 'toolu_01L1mXysBQtpPLQ2AZTaCGmE', 'input': {'x': 5, 'y': 2.743}, 'name': 'exponentiate', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01HCbDmuzdg9ATMyKbnecbEE', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 563, 'output_tokens': 146, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--9f6469fb-bcbb-4c1c-9eec-79f6979c38e6-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 5, 'y': 2.743}, 'id': 'toolu_01L1mXysBQtpPLQ2AZTaCGmE', 'type': 'tool_call'}], usage_metadata={'input_tokens': 563, 'output_tokens': 146, 'total_tokens': 709, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
|
||||
" ToolMessage(content='82.65606421491815', tool_call_id='toolu_01L1mXysBQtpPLQ2AZTaCGmE'),\n",
|
||||
" AIMessage(content=[{'text': \"Now I'll add 3 to this result:\", 'type': 'text'}, {'id': 'toolu_01NARC83e9obV35mZ6jYzBiN', 'input': {'x': 3, 'y': 82.65606421491815}, 'name': 'add', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01ELwyCtVLeGC685PUFqmdz2', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 727, 'output_tokens': 87, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--d5af3d7c-e8b7-4cc2-997a-ad2dafd08751-0', tool_calls=[{'name': 'add', 'args': {'x': 3, 'y': 82.65606421491815}, 'id': 'toolu_01NARC83e9obV35mZ6jYzBiN', 'type': 'tool_call'}], usage_metadata={'input_tokens': 727, 'output_tokens': 87, 'total_tokens': 814, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
|
||||
" ToolMessage(content='85.65606421491815', tool_call_id='toolu_01NARC83e9obV35mZ6jYzBiN'),\n",
|
||||
" AIMessage(content=[{'text': \"For the second part, you asked for 17.24 - 918.1241. I don't have a subtraction function available, but I can rewrite this as adding a negative number: 17.24 + (-918.1241)\", 'type': 'text'}, {'id': 'toolu_01Q6fLcZkBWZpMPCZ55WXR3N', 'input': {'x': 17.24, 'y': -918.1241}, 'name': 'add', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01WkmDwUxWjjaKGnTtdLGJnN', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 832, 'output_tokens': 130, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--39a6fbda-4c81-47a6-b361-524bd4ee5823-0', tool_calls=[{'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'toolu_01Q6fLcZkBWZpMPCZ55WXR3N', 'type': 'tool_call'}], usage_metadata={'input_tokens': 832, 'output_tokens': 130, 'total_tokens': 962, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
|
||||
" ToolMessage(content='-900.8841', tool_call_id='toolu_01Q6fLcZkBWZpMPCZ55WXR3N'),\n",
|
||||
" AIMessage(content='So, the answers are:\\n1) 3 plus 5 raised to the 2.743 = 85.65606421491815\\n2) 17.24 - 918.1241 = -900.8841', additional_kwargs={}, response_metadata={'id': 'msg_015Yoc62CvdJbANGFouiQ6AQ', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 978, 'output_tokens': 58, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--174c0882-6180-47ea-8f63-d7b747302327-0', usage_metadata={'input_tokens': 978, 'output_tokens': 58, 'total_tokens': 1036, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}})]}"
|
||||
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"),\n",
|
||||
" AIMessage(content=[{'text': \"Okay, let's break this down into two parts:\", 'type': 'text'}, {'id': 'toolu_01DEhqcXkXTtzJAiZ7uMBeDC', 'input': {'x': 3, 'y': 5}, 'name': 'add', 'type': 'tool_use'}], response_metadata={'id': 'msg_01AkLGH8sxMHaH15yewmjwkF', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'input_tokens': 450, 'output_tokens': 81}}, id='run-f35bfae8-8ded-4f8a-831b-0940d6ad16b6-0', tool_calls=[{'name': 'add', 'args': {'x': 3, 'y': 5}, 'id': 'toolu_01DEhqcXkXTtzJAiZ7uMBeDC'}]),\n",
|
||||
" ToolMessage(content='8.0', tool_call_id='toolu_01DEhqcXkXTtzJAiZ7uMBeDC'),\n",
|
||||
" AIMessage(content=[{'id': 'toolu_013DyMLrvnrto33peAKMGMr1', 'input': {'x': 8.0, 'y': 2.743}, 'name': 'exponentiate', 'type': 'tool_use'}], response_metadata={'id': 'msg_015Fmp8aztwYcce2JDAFfce3', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'input_tokens': 545, 'output_tokens': 75}}, id='run-48aaeeeb-a1e5-48fd-a57a-6c3da2907b47-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 8.0, 'y': 2.743}, 'id': 'toolu_013DyMLrvnrto33peAKMGMr1'}]),\n",
|
||||
" ToolMessage(content='300.03770462067547', tool_call_id='toolu_013DyMLrvnrto33peAKMGMr1'),\n",
|
||||
" AIMessage(content=[{'text': 'So 3 plus 5 raised to the 2.743 power is 300.04.\\n\\nFor the second part:', 'type': 'text'}, {'id': 'toolu_01UTmMrGTmLpPrPCF1rShN46', 'input': {'x': 17.24, 'y': -918.1241}, 'name': 'add', 'type': 'tool_use'}], response_metadata={'id': 'msg_015TkhfRBENPib2RWAxkieH6', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'input_tokens': 638, 'output_tokens': 105}}, id='run-45fb62e3-d102-4159-881d-241c5dbadeed-0', tool_calls=[{'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'toolu_01UTmMrGTmLpPrPCF1rShN46'}]),\n",
|
||||
" ToolMessage(content='-900.8841', tool_call_id='toolu_01UTmMrGTmLpPrPCF1rShN46'),\n",
|
||||
" AIMessage(content='Therefore, 17.24 - 918.1241 = -900.8841', response_metadata={'id': 'msg_01LgKnRuUcSyADCpxv9tPoYD', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 759, 'output_tokens': 24}}, id='run-1008254e-ccd1-497c-8312-9550dd77bd08-0')]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -178,7 +177,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "langchain",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -192,7 +191,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.16"
|
||||
"version": "3.10.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -227,7 +227,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"conversation_description = f\"\"\"Here is the topic of conversation: {topic}\n",
|
||||
"The participants are: {\", \".join(names.keys())}\"\"\"\n",
|
||||
"The participants are: {', '.join(names.keys())}\"\"\"\n",
|
||||
"\n",
|
||||
"agent_descriptor_system_message = SystemMessage(\n",
|
||||
" content=\"You can add detail to the description of the conversation participant.\"\n",
|
||||
@@ -396,7 +396,7 @@
|
||||
" You are the moderator.\n",
|
||||
" Please make the topic more specific.\n",
|
||||
" Please reply with the specified quest in {word_limit} words or less. \n",
|
||||
" Speak directly to the participants: {(*names,)}.\n",
|
||||
" Speak directly to the participants: {*names,}.\n",
|
||||
" Do not add anything else.\"\"\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
|
||||
@@ -26,7 +26,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"76e78b89cee4d6d31154823f93592315df79c28410dfbfc87c9f70cbfdfa648b\n"
|
||||
"2e44b44201c8778b462342ac97f5ccf05a4e02aa8a04505ecde97bf20dcc4cbb\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -49,7 +49,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install --quiet -U langchain-vdms langchain-experimental sentence-transformers opencv-python open_clip_torch torch accelerate"
|
||||
"! pip install --quiet -U vdms langchain-experimental sentence-transformers opencv-python open_clip_torch torch accelerate"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -63,16 +63,7 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/data1/cwlacewe/apps/cwlacewe_langchain/.langchain-venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
||||
" from .autonotebook import tqdm as notebook_tqdm\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"import os\n",
|
||||
@@ -89,10 +80,10 @@
|
||||
"from langchain_community.embeddings.sentence_transformer import (\n",
|
||||
" SentenceTransformerEmbeddings,\n",
|
||||
")\n",
|
||||
"from langchain_community.vectorstores.vdms import VDMS, VDMS_Client\n",
|
||||
"from langchain_core.callbacks.manager import CallbackManagerForLLMRun\n",
|
||||
"from langchain_core.runnables import ConfigurableField\n",
|
||||
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
|
||||
"from langchain_vdms.vectorstores import VDMS, VDMS_Client\n",
|
||||
"from transformers import (\n",
|
||||
" AutoModelForCausalLM,\n",
|
||||
" AutoTokenizer,\n",
|
||||
@@ -372,7 +363,7 @@
|
||||
"\t\tThere are 2 shoppers in this video. Shopper 1 is wearing a plaid shirt and a spectacle. Shopper 2 who is not completely captured in the frame seems to wear a black shirt and is moving away with his back turned towards the camera. There is a shelf towards the right of the camera frame. Shopper 2 is hanging an item back to a hanger and then quickly walks away in a similar fashion as shopper 2. Contents of the nearer side of the shelf with respect to camera seems to be camping lanterns and cleansing agents, arranged at the top. In the middle part of the shelf, various tools including grommets, a pocket saw, candles, and other helpful camping items can be observed. Midway through the shelf contains items which appear to be steel containers and items made up of plastic with red, green, orange, and yellow colors, while those at the bottom are packed in cardboard boxes. Contents at the farther part of the shelf are well stocked and organized but are not glaringly visible.\n",
|
||||
"\n",
|
||||
"\tMetadata:\n",
|
||||
"\t\t{'fps': 24.0, 'total_frames': 120.0, 'video': 'clip16.mp4'}\n",
|
||||
"\t\t{'fps': 24.0, 'id': 'c6e5f894-b905-46f5-ac9e-4487a9235561', 'total_frames': 120.0, 'video': 'clip16.mp4'}\n",
|
||||
"Retrieved Top matching video!\n",
|
||||
"\n",
|
||||
"\n"
|
||||
@@ -401,12 +392,18 @@
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.01s/it]\n",
|
||||
"WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.\n"
|
||||
]
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "3edf8783e114487ca490d8dec5c46884",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
@@ -558,7 +555,7 @@
|
||||
"\t\tA single shopper is seen in this video standing facing the shelf and in the bottom part of the frame. He's wearing a light-colored shirt and a spectacle. The shopper is carrying a red colored basket in his left hand. The entire basket is not clearly visible, but it does seem to contain something in a blue colored package which the shopper has just placed in the basket given his right hand was seen inside the basket. Then the shopper leans towards the shelf and checks out an item in orange package. He picks this single item with his right hand and proceeds to place the item in the basket. The entire shelf looks well stocked except for the top part of the shelf which is empty. The shopper has not picked any item from this part of the shelf. The rest of the shelf looks well stocked and does not need any restocking. The contents on the farther part of the shelf consists of items, majority of which are packed in black, yellow, and green packages. No other details are visible of these items.\n",
|
||||
"\n",
|
||||
"\tMetadata:\n",
|
||||
"\t\t{'fps': 24.0, 'total_frames': 162.0, 'video': 'clip10.mp4'}\n",
|
||||
"\t\t{'fps': 24.0, 'id': '37ddc212-994e-4db0-877f-5ed09965ab90', 'total_frames': 162.0, 'video': 'clip10.mp4'}\n",
|
||||
"Retrieved Top matching video!\n",
|
||||
"\n",
|
||||
"\n"
|
||||
@@ -588,7 +585,7 @@
|
||||
"User : Find a man holding a red shopping basket\n",
|
||||
"Assistant : Most relevant retrieved video is **clip9.mp4** \n",
|
||||
"\n",
|
||||
"I see a person standing in front of a well-stocked shelf, they are wearing a light-colored shirt and glasses, and they have a red shopping basket in their left hand. They are leaning forward and picking up an item from the shelf with their right hand. The item is packaged in a blue-green box. Based on the available information, I cannot confirm whether the basket is empty or contains items. However, the rest of the\n"
|
||||
"I see a person standing in front of a well-stocked shelf, they are wearing a light-colored shirt and glasses, and they have a red shopping basket in their left hand. They are leaning forward and picking up an item from the shelf with their right hand. The item is packaged in a blue-green box. Based on the scene description, I can confirm that the person is indeed holding a red shopping basket.</s>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -658,7 +655,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".langchain-venv",
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -672,7 +669,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.10"
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -144,8 +144,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# import os\n",
|
||||
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
|
||||
"# os.environ[\"LANGSMITH_PROJECT\"] = \"default\" # Make sure this session actually exists."
|
||||
"# os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\"\n",
|
||||
"# os.environ[\"LANGCHAIN_SESSION\"] = \"default\" # Make sure this session actually exists."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
3
docker/Dockerfile.base
Normal file
3
docker/Dockerfile.base
Normal file
@@ -0,0 +1,3 @@
|
||||
FROM python:3.11
|
||||
|
||||
RUN pip install langchain
|
||||
12
docker/Makefile
Normal file
12
docker/Makefile
Normal file
@@ -0,0 +1,12 @@
|
||||
# Makefile
|
||||
|
||||
build_graphdb:
|
||||
docker build --tag graphdb ./graphdb
|
||||
|
||||
start_graphdb:
|
||||
docker-compose up -d graphdb
|
||||
|
||||
down:
|
||||
docker-compose down -v --remove-orphans
|
||||
|
||||
.PHONY: build_graphdb start_graphdb down
|
||||
84
docker/docker-compose.yml
Normal file
84
docker/docker-compose.yml
Normal file
@@ -0,0 +1,84 @@
|
||||
# docker-compose to make it easier to spin up integration tests.
|
||||
# Services should use NON standard ports to avoid collision with
|
||||
# any existing services that might be used for development.
|
||||
# ATTENTION: When adding a service below use a non-standard port
|
||||
# increment by one from the preceding port.
|
||||
# For credentials always use `langchain` and `langchain` for the
|
||||
# username and password.
|
||||
version: "3"
|
||||
name: langchain-tests
|
||||
|
||||
services:
|
||||
redis:
|
||||
image: redis/redis-stack-server:latest
|
||||
# We use non standard ports since
|
||||
# these instances are used for testing
|
||||
# and users may already have existing
|
||||
# redis instances set up locally
|
||||
# for other projects
|
||||
ports:
|
||||
- "6020:6379"
|
||||
volumes:
|
||||
- ./redis-volume:/data
|
||||
graphdb:
|
||||
image: graphdb
|
||||
ports:
|
||||
- "6021:7200"
|
||||
mongo:
|
||||
image: mongo:latest
|
||||
container_name: mongo_container
|
||||
ports:
|
||||
- "6022:27017"
|
||||
environment:
|
||||
MONGO_INITDB_ROOT_USERNAME: langchain
|
||||
MONGO_INITDB_ROOT_PASSWORD: langchain
|
||||
postgres:
|
||||
image: postgres:16
|
||||
environment:
|
||||
POSTGRES_DB: langchain
|
||||
POSTGRES_USER: langchain
|
||||
POSTGRES_PASSWORD: langchain
|
||||
ports:
|
||||
- "6023:5432"
|
||||
command: |
|
||||
postgres -c log_statement=all
|
||||
healthcheck:
|
||||
test:
|
||||
[
|
||||
"CMD-SHELL",
|
||||
"psql postgresql://langchain:langchain@localhost/langchain --command 'SELECT 1;' || exit 1",
|
||||
]
|
||||
interval: 5s
|
||||
retries: 60
|
||||
volumes:
|
||||
- postgres_data:/var/lib/postgresql/data
|
||||
pgvector:
|
||||
# postgres with the pgvector extension
|
||||
image: ankane/pgvector
|
||||
environment:
|
||||
POSTGRES_DB: langchain
|
||||
POSTGRES_USER: langchain
|
||||
POSTGRES_PASSWORD: langchain
|
||||
ports:
|
||||
- "6024:5432"
|
||||
command: |
|
||||
postgres -c log_statement=all
|
||||
healthcheck:
|
||||
test:
|
||||
[
|
||||
"CMD-SHELL",
|
||||
"psql postgresql://langchain:langchain@localhost/langchain --command 'SELECT 1;' || exit 1",
|
||||
]
|
||||
interval: 5s
|
||||
retries: 60
|
||||
volumes:
|
||||
- postgres_data_pgvector:/var/lib/postgresql/data
|
||||
vdms:
|
||||
image: intellabs/vdms:latest
|
||||
container_name: vdms_container
|
||||
ports:
|
||||
- "6025:55555"
|
||||
|
||||
volumes:
|
||||
postgres_data:
|
||||
postgres_data_pgvector:
|
||||
5
docker/graphdb/Dockerfile
Normal file
5
docker/graphdb/Dockerfile
Normal file
@@ -0,0 +1,5 @@
|
||||
FROM ontotext/graphdb:10.5.1
|
||||
RUN mkdir -p /opt/graphdb/dist/data/repositories/langchain
|
||||
COPY config.ttl /opt/graphdb/dist/data/repositories/langchain/
|
||||
COPY graphdb_create.sh /run.sh
|
||||
ENTRYPOINT bash /run.sh
|
||||
46
docker/graphdb/config.ttl
Normal file
46
docker/graphdb/config.ttl
Normal file
@@ -0,0 +1,46 @@
|
||||
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
|
||||
@prefix rep: <http://www.openrdf.org/config/repository#>.
|
||||
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
|
||||
@prefix sail: <http://www.openrdf.org/config/sail#>.
|
||||
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.
|
||||
|
||||
[] a rep:Repository ;
|
||||
rep:repositoryID "langchain" ;
|
||||
rdfs:label "" ;
|
||||
rep:repositoryImpl [
|
||||
rep:repositoryType "graphdb:SailRepository" ;
|
||||
sr:sailImpl [
|
||||
sail:sailType "graphdb:Sail" ;
|
||||
|
||||
graphdb:read-only "false" ;
|
||||
|
||||
# Inference and Validation
|
||||
graphdb:ruleset "empty" ;
|
||||
graphdb:disable-sameAs "true" ;
|
||||
graphdb:check-for-inconsistencies "false" ;
|
||||
|
||||
# Indexing
|
||||
graphdb:entity-id-size "32" ;
|
||||
graphdb:enable-context-index "false" ;
|
||||
graphdb:enablePredicateList "true" ;
|
||||
graphdb:enable-fts-index "false" ;
|
||||
graphdb:fts-indexes ("default" "iri") ;
|
||||
graphdb:fts-string-literals-index "default" ;
|
||||
graphdb:fts-iris-index "none" ;
|
||||
|
||||
# Queries and Updates
|
||||
graphdb:query-timeout "0" ;
|
||||
graphdb:throw-QueryEvaluationException-on-timeout "false" ;
|
||||
graphdb:query-limit-results "0" ;
|
||||
|
||||
# Settable in the file but otherwise hidden in the UI and in the RDF4J console
|
||||
graphdb:base-URL "http://example.org/owlim#" ;
|
||||
graphdb:defaultNS "" ;
|
||||
graphdb:imports "" ;
|
||||
graphdb:repository-type "file-repository" ;
|
||||
graphdb:storage-folder "storage" ;
|
||||
graphdb:entity-index-size "10000000" ;
|
||||
graphdb:in-memory-literal-properties "true" ;
|
||||
graphdb:enable-literal-index "true" ;
|
||||
]
|
||||
].
|
||||
28
docker/graphdb/graphdb_create.sh
Normal file
28
docker/graphdb/graphdb_create.sh
Normal file
@@ -0,0 +1,28 @@
|
||||
#! /bin/bash
|
||||
REPOSITORY_ID="langchain"
|
||||
GRAPHDB_URI="http://localhost:7200/"
|
||||
|
||||
echo -e "\nUsing GraphDB: ${GRAPHDB_URI}"
|
||||
|
||||
function startGraphDB {
|
||||
echo -e "\nStarting GraphDB..."
|
||||
exec /opt/graphdb/dist/bin/graphdb
|
||||
}
|
||||
|
||||
function waitGraphDBStart {
|
||||
echo -e "\nWaiting GraphDB to start..."
|
||||
for _ in $(seq 1 5); do
|
||||
CHECK_RES=$(curl --silent --write-out '%{http_code}' --output /dev/null ${GRAPHDB_URI}/rest/repositories)
|
||||
if [ "${CHECK_RES}" = '200' ]; then
|
||||
echo -e "\nUp and running"
|
||||
break
|
||||
fi
|
||||
sleep 30s
|
||||
echo "CHECK_RES: ${CHECK_RES}"
|
||||
done
|
||||
}
|
||||
|
||||
|
||||
startGraphDB &
|
||||
waitGraphDBStart
|
||||
wait
|
||||
@@ -1,9 +1,9 @@
|
||||
# We build the docs in these stages:
|
||||
# 1. Install vercel and python dependencies
|
||||
# 2. Copy files from "source dir" to "intermediate dir"
|
||||
# 2. Generate files like model feat table, etc in "intermediate dir"
|
||||
# 3. Copy files to their right spots (e.g. langserve readme) in "intermediate dir"
|
||||
# 4. Build the docs from "intermediate dir" to "output dir"
|
||||
# we build the docs in these stages:
|
||||
# 1. install vercel and python dependencies
|
||||
# 2. copy files from "source dir" to "intermediate dir"
|
||||
# 2. generate files like model feat table, etc in "intermediate dir"
|
||||
# 3. copy files to their right spots (e.g. langserve readme) in "intermediate dir"
|
||||
# 4. build the docs from "intermediate dir" to "output dir"
|
||||
|
||||
SOURCE_DIR = docs/
|
||||
INTERMEDIATE_DIR = build/intermediate/docs
|
||||
@@ -13,50 +13,43 @@ OUTPUT_NEW_DOCS_DIR = $(OUTPUT_NEW_DIR)/docs
|
||||
|
||||
PYTHON = .venv/bin/python
|
||||
|
||||
PARTNER_DEPS_LIST := $(shell find ../libs/partners -mindepth 1 -maxdepth 1 -type d -exec sh -c ' \
|
||||
for dir; do \
|
||||
if find "$$dir" -maxdepth 1 -type f \( -name "pyproject.toml" -o -name "setup.py" \) | grep -q .; then \
|
||||
echo "$$dir"; \
|
||||
fi \
|
||||
done' sh {} + | grep -vE "airbyte|ibm|couchbase|databricks" | tr '\n' ' ')
|
||||
|
||||
PORT ?= 3001
|
||||
|
||||
clean:
|
||||
rm -rf build
|
||||
|
||||
clean-cache:
|
||||
rm -rf build .venv/deps_installed
|
||||
|
||||
install-vercel-deps:
|
||||
yum -y -q update
|
||||
yum -y -q install gcc bzip2-devel libffi-devel zlib-devel wget tar gzip rsync -y
|
||||
yum -y update
|
||||
yum install gcc bzip2-devel libffi-devel zlib-devel wget tar gzip rsync -y
|
||||
|
||||
install-py-deps:
|
||||
@echo "📦 Installing Python dependencies..."
|
||||
@if [ ! -d .venv ]; then python3 -m venv .venv; fi
|
||||
@if [ ! -f .venv/deps_installed ]; then \
|
||||
$(PYTHON) -m pip install -q --upgrade pip --disable-pip-version-check; \
|
||||
$(PYTHON) -m pip install -q --upgrade uv; \
|
||||
$(PYTHON) -m uv pip install -q --pre -r vercel_requirements.txt; \
|
||||
$(PYTHON) -m uv pip install -q --pre $$($(PYTHON) scripts/partner_deps_list.py) --overrides vercel_overrides.txt; \
|
||||
touch .venv/deps_installed; \
|
||||
fi
|
||||
@echo "✅ Dependencies installed"
|
||||
python3 -m venv .venv
|
||||
$(PYTHON) -m pip install --upgrade pip
|
||||
$(PYTHON) -m pip install --upgrade uv
|
||||
$(PYTHON) -m uv pip install --pre -r vercel_requirements.txt
|
||||
$(PYTHON) -m uv pip install --pre --editable $(PARTNER_DEPS_LIST)
|
||||
|
||||
generate-files:
|
||||
@echo "📄 Generating documentation files..."
|
||||
mkdir -p $(INTERMEDIATE_DIR)
|
||||
cp -rp $(SOURCE_DIR)/* $(INTERMEDIATE_DIR)
|
||||
@if [ ! -f build/langserve_readme_cache.md ] || [ $$(find build/langserve_readme_cache.md -mtime +1 -print) ]; then \
|
||||
echo "🌐 Downloading LangServe README..."; \
|
||||
curl -s https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md | sed 's/<=/\<=/g' > build/langserve_readme_cache.md; \
|
||||
fi
|
||||
cp build/langserve_readme_cache.md $(INTERMEDIATE_DIR)/langserve.md
|
||||
cp ../SECURITY.md $(INTERMEDIATE_DIR)/security.md
|
||||
@echo "🔧 Generating feature tables and processing links..."
|
||||
$(PYTHON) scripts/tool_feat_table.py $(INTERMEDIATE_DIR) & \
|
||||
$(PYTHON) scripts/kv_store_feat_table.py $(INTERMEDIATE_DIR) & \
|
||||
$(PYTHON) scripts/partner_pkg_table.py $(INTERMEDIATE_DIR) & \
|
||||
$(PYTHON) scripts/resolve_local_links.py $(INTERMEDIATE_DIR)/langserve.md https://github.com/langchain-ai/langserve/tree/main/ & \
|
||||
wait
|
||||
@echo "✅ Files generated"
|
||||
cp -r $(SOURCE_DIR)/* $(INTERMEDIATE_DIR)
|
||||
|
||||
$(PYTHON) scripts/tool_feat_table.py $(INTERMEDIATE_DIR)
|
||||
|
||||
$(PYTHON) scripts/kv_store_feat_table.py $(INTERMEDIATE_DIR)
|
||||
|
||||
$(PYTHON) scripts/partner_pkg_table.py $(INTERMEDIATE_DIR)
|
||||
|
||||
wget -q https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O $(INTERMEDIATE_DIR)/langserve.md
|
||||
$(PYTHON) scripts/resolve_local_links.py $(INTERMEDIATE_DIR)/langserve.md https://github.com/langchain-ai/langserve/tree/main/
|
||||
|
||||
copy-infra:
|
||||
@echo "📂 Copying infrastructure files..."
|
||||
mkdir -p $(OUTPUT_NEW_DIR)
|
||||
cp -r src $(OUTPUT_NEW_DIR)
|
||||
cp vercel.json $(OUTPUT_NEW_DIR)
|
||||
@@ -66,24 +59,16 @@ copy-infra:
|
||||
cp package.json $(OUTPUT_NEW_DIR)
|
||||
cp sidebars.js $(OUTPUT_NEW_DIR)
|
||||
cp -r static $(OUTPUT_NEW_DIR)
|
||||
cp -r ../libs/cli/langchain_cli/integration_template $(OUTPUT_NEW_DIR)/src/theme
|
||||
cp yarn.lock $(OUTPUT_NEW_DIR)
|
||||
@echo "✅ Infrastructure files copied"
|
||||
|
||||
render:
|
||||
@echo "📓 Converting notebooks (this may take a while)..."
|
||||
$(PYTHON) scripts/notebook_convert.py $(INTERMEDIATE_DIR) $(OUTPUT_NEW_DOCS_DIR)
|
||||
@echo "✅ Notebooks converted"
|
||||
|
||||
md-sync:
|
||||
@echo "📝 Syncing markdown files..."
|
||||
rsync -avmq --include="*/" --include="*.mdx" --include="*.md" --include="*.png" --include="*/_category_.yml" --exclude="*" $(INTERMEDIATE_DIR)/ $(OUTPUT_NEW_DOCS_DIR)
|
||||
@echo "✅ Markdown files synced"
|
||||
rsync -avm --include="*/" --include="*.mdx" --include="*.md" --include="*.png" --include="*/_category_.yml" --exclude="*" $(INTERMEDIATE_DIR)/ $(OUTPUT_NEW_DOCS_DIR)
|
||||
|
||||
append-related:
|
||||
@echo "🔗 Appending related links..."
|
||||
$(PYTHON) scripts/append_related_links.py $(OUTPUT_NEW_DOCS_DIR)
|
||||
@echo "✅ Related links appended"
|
||||
|
||||
generate-references:
|
||||
$(PYTHON) scripts/generate_api_reference_links.py --docs_dir $(OUTPUT_NEW_DOCS_DIR)
|
||||
@@ -91,26 +76,16 @@ generate-references:
|
||||
update-md: generate-files md-sync
|
||||
|
||||
build: install-py-deps generate-files copy-infra render md-sync append-related
|
||||
@echo ""
|
||||
@echo "🎉 Documentation build complete!"
|
||||
@echo "📖 To view locally, run: cd docs && make start"
|
||||
@echo ""
|
||||
|
||||
vercel-build: install-vercel-deps build generate-references
|
||||
rm -rf docs
|
||||
mv $(OUTPUT_NEW_DOCS_DIR) docs
|
||||
cp -r ../libs/cli/langchain_cli/integration_template src/theme
|
||||
rm -rf build
|
||||
mkdir static/api_reference
|
||||
git clone --depth=1 https://github.com/langchain-ai/langchain-api-docs-html.git
|
||||
mv langchain-api-docs-html/api_reference_build/html/* static/api_reference/
|
||||
rm -rf langchain-api-docs-html
|
||||
git clone --depth=1 https://github.com/baskaryan/langchain-api-docs-build.git
|
||||
mv langchain-api-docs-build/api_reference_build/html/* static/api_reference/
|
||||
rm -rf langchain-api-docs-build
|
||||
NODE_OPTIONS="--max-old-space-size=5000" yarn run docusaurus build
|
||||
|
||||
start:
|
||||
@echo "🚀 Starting documentation server on port $(PORT)..."
|
||||
@echo "📖 Installing Node.js dependencies..."
|
||||
cd $(OUTPUT_NEW_DIR) && yarn install --silent
|
||||
@echo "🌐 Starting server at http://localhost:$(PORT)"
|
||||
@echo "Press Ctrl+C to stop the server"
|
||||
cd $(OUTPUT_NEW_DIR) && yarn start --port=$(PORT)
|
||||
cd $(OUTPUT_NEW_DIR) && yarn && yarn start --port=$(PORT)
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
# LangChain Documentation
|
||||
|
||||
For more information on contributing to our documentation, see the [Documentation Contributing Guide](https://python.langchain.com/docs/contributing/how_to/documentation)
|
||||
For more information on contributing to our documentation, see the [Documentation Contributing Guide](https://python.langchain.com/docs/contributing/documentation)
|
||||
|
||||
@@ -108,7 +108,7 @@ class GalleryGridDirective(SphinxDirective):
|
||||
|
||||
# Parse the template with Sphinx Design to create an output container
|
||||
# Prep the options for the template grid
|
||||
class_ = "gallery-directive" + f" {self.options.get('class-container', '')}"
|
||||
class_ = "gallery-directive" + f' {self.options.get("class-container", "")}'
|
||||
options = {"gutter": 2, "class-container": class_}
|
||||
options_str = "\n".join(f":{k}: {v}" for k, v in options.items())
|
||||
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user