mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-05 08:40:36 +00:00
Compare commits
6 Commits
cc/bench
...
rlm/test-l
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
3c6cb273d0 | ||
|
|
ad624ae3ed | ||
|
|
550cf6ac8f | ||
|
|
5ab3d5e54c | ||
|
|
d4b2876e50 | ||
|
|
4cfadef235 |
@@ -5,31 +5,26 @@ This project includes a [dev container](https://containers.dev/), which lets you
|
||||
You can use the dev container configuration in this folder to build and run the app without needing to install any of its tools locally! You can use it in [GitHub Codespaces](https://github.com/features/codespaces) or the [VS Code Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).
|
||||
|
||||
## GitHub Codespaces
|
||||
|
||||
[](https://codespaces.new/langchain-ai/langchain)
|
||||
|
||||
You may use the button above, or follow these steps to open this repo in a Codespace:
|
||||
|
||||
1. Click the **Code** drop-down menu at the top of <https://github.com/langchain-ai/langchain>.
|
||||
1. Click the **Code** drop-down menu at the top of https://github.com/langchain-ai/langchain.
|
||||
1. Click on the **Codespaces** tab.
|
||||
1. Click **Create codespace on master**.
|
||||
|
||||
For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).
|
||||
|
||||
|
||||
## VS Code Dev Containers
|
||||
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
|
||||
|
||||
> [!NOTE]
|
||||
> If you click the link above you will open the main repo (`langchain-ai/langchain`) and *not* your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
|
||||
|
||||
```txt
|
||||
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<YOUR_USERNAME>/<YOUR_CLONED_REPO_NAME>
|
||||
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
|
||||
```
|
||||
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
|
||||
|
||||
```
|
||||
Then you will have a local cloned repo where you can contribute and then create pull requests.
|
||||
|
||||
If you already have VS Code and Docker installed, you can use the button above to get started. This will use VSCode to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
|
||||
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
|
||||
|
||||
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
|
||||
|
||||
@@ -45,5 +40,5 @@ You can learn more in the [Dev Containers documentation](https://code.visualstud
|
||||
|
||||
## Tips and tricks
|
||||
|
||||
- If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
|
||||
- If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
|
||||
* If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
|
||||
* If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
|
||||
|
||||
@@ -1,58 +1,36 @@
|
||||
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
|
||||
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-docker-compose
|
||||
{
|
||||
// Name for the dev container
|
||||
"name": "langchain",
|
||||
// Point to a Docker Compose file
|
||||
"dockerComposeFile": "./docker-compose.yaml",
|
||||
// Required when using Docker Compose. The name of the service to connect to once running
|
||||
"service": "langchain",
|
||||
// The optional 'workspaceFolder' property is the path VS Code should open by default when
|
||||
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
|
||||
"workspaceFolder": "/workspaces/langchain",
|
||||
"mounts": [
|
||||
"source=langchain-workspaces,target=/workspaces/langchain,type=volume"
|
||||
],
|
||||
// Prevent the container from shutting down
|
||||
"overrideCommand": true,
|
||||
// Features to add to the dev container. More info: https://containers.dev/features
|
||||
"features": {
|
||||
"ghcr.io/devcontainers/features/git:1": {},
|
||||
"ghcr.io/devcontainers/features/github-cli:1": {}
|
||||
},
|
||||
"containerEnv": {
|
||||
"UV_LINK_MODE": "copy"
|
||||
},
|
||||
// Use 'forwardPorts' to make a list of ports inside the container available locally.
|
||||
// "forwardPorts": [],
|
||||
// Run commands after the container is created
|
||||
"postCreateCommand": "uv sync && echo 'LangChain (Python) dev environment ready!'",
|
||||
// Configure tool-specific properties.
|
||||
"customizations": {
|
||||
"vscode": {
|
||||
"extensions": [
|
||||
"ms-python.python",
|
||||
"ms-python.debugpy",
|
||||
"ms-python.mypy-type-checker",
|
||||
"ms-python.isort",
|
||||
"unifiedjs.vscode-mdx",
|
||||
"davidanson.vscode-markdownlint",
|
||||
"ms-toolsai.jupyter",
|
||||
"GitHub.copilot",
|
||||
"GitHub.copilot-chat"
|
||||
],
|
||||
"settings": {
|
||||
"python.defaultInterpreterPath": ".venv/bin/python",
|
||||
"python.formatting.provider": "none",
|
||||
"[python]": {
|
||||
"editor.formatOnSave": true,
|
||||
"editor.codeActionsOnSave": {
|
||||
"source.organizeImports": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
|
||||
// "remoteUser": "root"
|
||||
// Name for the dev container
|
||||
"name": "langchain",
|
||||
|
||||
// Point to a Docker Compose file
|
||||
"dockerComposeFile": "./docker-compose.yaml",
|
||||
|
||||
// Required when using Docker Compose. The name of the service to connect to once running
|
||||
"service": "langchain",
|
||||
|
||||
// The optional 'workspaceFolder' property is the path VS Code should open by default when
|
||||
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
|
||||
"workspaceFolder": "/workspaces/langchain",
|
||||
|
||||
// Prevent the container from shutting down
|
||||
"overrideCommand": true
|
||||
|
||||
// Features to add to the dev container. More info: https://containers.dev/features
|
||||
// "features": {
|
||||
// "ghcr.io/devcontainers-contrib/features/poetry:2": {}
|
||||
// }
|
||||
|
||||
// Use 'forwardPorts' to make a list of ports inside the container available locally.
|
||||
// "forwardPorts": [],
|
||||
|
||||
// Uncomment the next line to run commands after the container is created.
|
||||
// "postCreateCommand": "cat /etc/os-release",
|
||||
|
||||
// Configure tool-specific properties.
|
||||
// "customizations": {},
|
||||
|
||||
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
|
||||
// "remoteUser": "root"
|
||||
}
|
||||
|
||||
@@ -4,10 +4,29 @@ services:
|
||||
build:
|
||||
dockerfile: libs/langchain/dev.Dockerfile
|
||||
context: ..
|
||||
|
||||
volumes:
|
||||
# Update this to wherever you want VS Code to mount the folder of your project
|
||||
- ..:/workspaces/langchain:cached
|
||||
networks:
|
||||
- langchain-network
|
||||
- langchain-network
|
||||
# environment:
|
||||
# MONGO_ROOT_USERNAME: root
|
||||
# MONGO_ROOT_PASSWORD: example123
|
||||
# depends_on:
|
||||
# - mongo
|
||||
# mongo:
|
||||
# image: mongo
|
||||
# restart: unless-stopped
|
||||
# environment:
|
||||
# MONGO_INITDB_ROOT_USERNAME: root
|
||||
# MONGO_INITDB_ROOT_PASSWORD: example123
|
||||
# ports:
|
||||
# - "27017:27017"
|
||||
# networks:
|
||||
# - langchain-network
|
||||
|
||||
networks:
|
||||
langchain-network:
|
||||
driver: bridge
|
||||
|
||||
|
||||
|
||||
@@ -1,52 +0,0 @@
|
||||
# top-most EditorConfig file
|
||||
root = true
|
||||
|
||||
# All files
|
||||
[*]
|
||||
charset = utf-8
|
||||
end_of_line = lf
|
||||
insert_final_newline = true
|
||||
trim_trailing_whitespace = true
|
||||
|
||||
# Python files
|
||||
[*.py]
|
||||
indent_style = space
|
||||
indent_size = 4
|
||||
max_line_length = 88
|
||||
|
||||
# JSON files
|
||||
[*.json]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
|
||||
# YAML files
|
||||
[*.{yml,yaml}]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
|
||||
# Markdown files
|
||||
[*.md]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
trim_trailing_whitespace = false
|
||||
|
||||
# Configuration files
|
||||
[*.{toml,ini,cfg}]
|
||||
indent_style = space
|
||||
indent_size = 4
|
||||
|
||||
# Shell scripts
|
||||
[*.sh]
|
||||
indent_style = space
|
||||
indent_size = 2
|
||||
|
||||
# Makefile
|
||||
[Makefile]
|
||||
indent_style = tab
|
||||
indent_size = 4
|
||||
|
||||
# Jupyter notebooks
|
||||
[*.ipynb]
|
||||
# Jupyter may include trailing whitespace in cell
|
||||
# outputs that's semantically meaningful
|
||||
trim_trailing_whitespace = false
|
||||
3
.github/CODEOWNERS
vendored
3
.github/CODEOWNERS
vendored
@@ -1,3 +0,0 @@
|
||||
/.github/ @baskaryan @ccurme @eyurtsev
|
||||
/libs/core/ @eyurtsev
|
||||
/libs/packages.yml @ccurme
|
||||
2
.github/CODE_OF_CONDUCT.md
vendored
2
.github/CODE_OF_CONDUCT.md
vendored
@@ -129,4 +129,4 @@ For answers to common questions about this code of conduct, see the FAQ at
|
||||
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
|
||||
[Mozilla CoC]: https://github.com/mozilla/diversity
|
||||
[FAQ]: https://www.contributor-covenant.org/faq
|
||||
[translations]: https://www.contributor-covenant.org/translations
|
||||
[translations]: https://www.contributor-covenant.org/translations
|
||||
6
.github/CONTRIBUTING.md
vendored
6
.github/CONTRIBUTING.md
vendored
@@ -3,8 +3,4 @@
|
||||
Hi there! Thank you for even being interested in contributing to LangChain.
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
|
||||
|
||||
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
|
||||
|
||||
## New features
|
||||
|
||||
For new features, please start a new [discussion](https://forum.langchain.com/), where the maintainers will help with scoping out the necessary changes.
|
||||
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
|
||||
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
Normal file
38
.github/DISCUSSION_TEMPLATE/ideas.yml
vendored
Normal file
@@ -0,0 +1,38 @@
|
||||
labels: [idea]
|
||||
body:
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I searched existing ideas and did not find a similar one
|
||||
required: true
|
||||
- label: I added a very descriptive title
|
||||
required: true
|
||||
- label: I've clearly described the feature request and motivation for it
|
||||
required: true
|
||||
- type: textarea
|
||||
id: feature-request
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Feature request
|
||||
description: |
|
||||
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
|
||||
- type: textarea
|
||||
id: motivation
|
||||
validations:
|
||||
required: true
|
||||
attributes:
|
||||
label: Motivation
|
||||
description: |
|
||||
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
|
||||
- type: textarea
|
||||
id: proposal
|
||||
validations:
|
||||
required: false
|
||||
attributes:
|
||||
label: Proposal (If applicable)
|
||||
description: |
|
||||
If you would like to propose a solution, please describe it here.
|
||||
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
Normal file
122
.github/DISCUSSION_TEMPLATE/q-a.yml
vendored
Normal file
@@ -0,0 +1,122 @@
|
||||
labels: [Question]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain 🦜️🔗!
|
||||
|
||||
Please follow these instructions, fill every question, and do every step. 🙏
|
||||
|
||||
We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
|
||||
this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
|
||||
|
||||
By asking questions in a structured way (following this) it will be much easier for us to help you.
|
||||
|
||||
There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎
|
||||
|
||||
As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones.
|
||||
|
||||
That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓
|
||||
|
||||
Relevant links to check before opening a question to see if your question has already been answered, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this question.
|
||||
required: true
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
- type: checkboxes
|
||||
id: help
|
||||
attributes:
|
||||
label: Commit to Help
|
||||
description: |
|
||||
After submitting this, I commit to one of:
|
||||
|
||||
* Read open questions until I find 2 where I can help someone and add a comment to help there.
|
||||
* I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
|
||||
* Once my question is answered, I will mark the answer as "accepted".
|
||||
options:
|
||||
- label: I commit to help with one of those options 👆
|
||||
required: true
|
||||
- type: textarea
|
||||
id: example
|
||||
attributes:
|
||||
label: Example Code
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
render: python
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: description
|
||||
attributes:
|
||||
label: Description
|
||||
description: |
|
||||
What is the problem, question, or error?
|
||||
|
||||
Write a short description explaining what you are doing, what you expect to happen, and what is currently happening.
|
||||
placeholder: |
|
||||
* I'm trying to use the `langchain` library to do X.
|
||||
* I expect to see Y.
|
||||
* Instead, it does Z.
|
||||
validations:
|
||||
required: true
|
||||
- type: textarea
|
||||
id: system-info
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages, don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
94
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
94
.github/ISSUE_TEMPLATE/bug-report.yml
vendored
@@ -1,33 +1,35 @@
|
||||
name: "\U0001F41B Bug Report"
|
||||
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the LangChain forum.
|
||||
labels: ["bug"]
|
||||
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions.
|
||||
labels: ["02 Bug Report"]
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thank you for taking the time to file a bug report.
|
||||
|
||||
Use this to report BUGS in LangChain. For usage questions, feature requests and general design questions, please use the [LangChain Forum](https://forum.langchain.com/).
|
||||
|
||||
value: >
|
||||
Thank you for taking the time to file a bug report.
|
||||
|
||||
Use this to report bugs in LangChain.
|
||||
|
||||
If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions)
|
||||
to ask for help with your issue.
|
||||
|
||||
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or
|
||||
if there's another way to solve your problem:
|
||||
|
||||
* [LangChain Forum](https://forum.langchain.com/),
|
||||
* [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
* [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
* [LangChain how-to guides](https://python.langchain.com/docs/how_to/),
|
||||
* [API Reference](https://python.langchain.com/api_reference/),
|
||||
* [LangChain ChatBot](https://chat.langchain.com/)
|
||||
* [GitHub search](https://github.com/langchain-ai/langchain),
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checked other resources
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I added a clear and descriptive title that summarizes this issue.
|
||||
- label: I searched the LangChain documentation with the integrated search.
|
||||
required: true
|
||||
- label: I used the GitHub search to find a similar question and didn't find it.
|
||||
required: true
|
||||
@@ -35,10 +37,6 @@ body:
|
||||
required: true
|
||||
- label: The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
|
||||
required: true
|
||||
- label: I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
|
||||
required: true
|
||||
- label: I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
|
||||
required: true
|
||||
- type: textarea
|
||||
id: reproduction
|
||||
validations:
|
||||
@@ -47,25 +45,25 @@ body:
|
||||
label: Example Code
|
||||
description: |
|
||||
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
|
||||
|
||||
|
||||
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
|
||||
**Important!**
|
||||
|
||||
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
|
||||
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
|
||||
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
|
||||
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
|
||||
|
||||
placeholder: |
|
||||
The following code:
|
||||
|
||||
The following code:
|
||||
|
||||
```python
|
||||
from langchain_core.runnables import RunnableLambda
|
||||
|
||||
def bad_code(inputs) -> int:
|
||||
raise NotImplementedError('For demo purpose')
|
||||
|
||||
|
||||
chain = RunnableLambda(bad_code)
|
||||
chain.invoke('Hello!')
|
||||
```
|
||||
@@ -98,23 +96,25 @@ body:
|
||||
attributes:
|
||||
label: System Info
|
||||
description: |
|
||||
Please share your system info with us. Do NOT skip this step and please don't trim
|
||||
the output. Most users don't include enough information here and it makes it harder
|
||||
for us to help you.
|
||||
|
||||
Run the following command in your terminal and paste the output here:
|
||||
|
||||
`python -m langchain_core.sys_info`
|
||||
|
||||
or if you have an existing python interpreter running:
|
||||
|
||||
```python
|
||||
from langchain_core import sys_info
|
||||
sys_info.print_sys_info()
|
||||
```
|
||||
|
||||
alternatively, put the entire output of `pip freeze` here.
|
||||
placeholder: |
|
||||
Please share your system info with us.
|
||||
|
||||
"pip freeze | grep langchain"
|
||||
platform (windows / linux / mac)
|
||||
python version
|
||||
|
||||
OR if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
placeholder: |
|
||||
"pip freeze | grep langchain"
|
||||
platform
|
||||
python version
|
||||
|
||||
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
|
||||
|
||||
python -m langchain_core.sys_info
|
||||
|
||||
These will only surface LangChain packages, don't forget to include any other relevant
|
||||
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
|
||||
validations:
|
||||
required: true
|
||||
|
||||
15
.github/ISSUE_TEMPLATE/config.yml
vendored
15
.github/ISSUE_TEMPLATE/config.yml
vendored
@@ -1,6 +1,15 @@
|
||||
blank_issues_enabled: false
|
||||
version: 2.1
|
||||
contact_links:
|
||||
- name: LangChain Forum
|
||||
url: https://forum.langchain.com/
|
||||
about: General community discussions, support, and feature requests
|
||||
- name: 🤔 Question or Problem
|
||||
about: Ask a question or ask about a problem in GitHub Discussions.
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/q-a
|
||||
- name: Discord
|
||||
url: https://discord.gg/6adMQxSpJS
|
||||
about: General community discussions
|
||||
- name: Feature Request
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/ideas
|
||||
about: Suggest a feature or an idea
|
||||
- name: Show and tell
|
||||
about: Show what you built with LangChain
|
||||
url: https://www.github.com/langchain-ai/langchain/discussions/categories/show-and-tell
|
||||
|
||||
109
.github/ISSUE_TEMPLATE/documentation.yml
vendored
109
.github/ISSUE_TEMPLATE/documentation.yml
vendored
@@ -1,59 +1,58 @@
|
||||
name: Documentation
|
||||
description: Report an issue related to the LangChain documentation.
|
||||
title: "docs: <Please write a comprehensive title after the 'docs: ' prefix>"
|
||||
labels: [documentation]
|
||||
title: "DOC: <Please write a comprehensive title after the 'DOC: ' prefix>"
|
||||
labels: [03 - Documentation]
|
||||
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thank you for taking the time to report an issue in the documentation.
|
||||
|
||||
Only report issues with documentation here, explain if there are
|
||||
any missing topics or if you found a mistake in the documentation.
|
||||
|
||||
Do **NOT** use this to ask usage questions or reporting issues with your code.
|
||||
|
||||
If you have usage questions or need help solving some problem,
|
||||
please use the [LangChain Forum](https://forum.langchain.com/).
|
||||
|
||||
If you're in the wrong place, here are some helpful links to find a better
|
||||
place to ask your question:
|
||||
|
||||
* [LangChain Forum](https://forum.langchain.com/),
|
||||
* [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
* [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
* [LangChain how-to guides](https://python.langchain.com/docs/how_to/),
|
||||
* [API Reference](https://python.langchain.com/api_reference/),
|
||||
* [LangChain ChatBot](https://chat.langchain.com/)
|
||||
* [GitHub search](https://github.com/langchain-ai/langchain),
|
||||
- type: input
|
||||
id: url
|
||||
attributes:
|
||||
label: URL
|
||||
description: URL to documentation
|
||||
validations:
|
||||
required: false
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checklist
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I included a link to the documentation page I am referring to (if applicable).
|
||||
required: true
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue with current documentation:"
|
||||
description: >
|
||||
Please make sure to leave a reference to the document/code you're
|
||||
referring to. Feel free to include names of classes, functions, methods
|
||||
or concepts you'd like to see documented more.
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Idea or request for content:"
|
||||
description: >
|
||||
Please describe as clearly as possible what topics you think are missing
|
||||
from the current documentation.
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: >
|
||||
Thank you for taking the time to report an issue in the documentation.
|
||||
|
||||
Only report issues with documentation here, explain if there are
|
||||
any missing topics or if you found a mistake in the documentation.
|
||||
|
||||
Do **NOT** use this to ask usage questions or reporting issues with your code.
|
||||
|
||||
If you have usage questions or need help solving some problem,
|
||||
please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions).
|
||||
|
||||
If you're in the wrong place, here are some helpful links to find a better
|
||||
place to ask your question:
|
||||
|
||||
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
|
||||
[API Reference](https://api.python.langchain.com/en/stable/),
|
||||
[GitHub search](https://github.com/langchain-ai/langchain),
|
||||
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
|
||||
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
|
||||
[LangChain ChatBot](https://chat.langchain.com/)
|
||||
- type: input
|
||||
id: url
|
||||
attributes:
|
||||
label: URL
|
||||
description: URL to documentation
|
||||
validations:
|
||||
required: false
|
||||
- type: checkboxes
|
||||
id: checks
|
||||
attributes:
|
||||
label: Checklist
|
||||
description: Please confirm and check all the following options.
|
||||
options:
|
||||
- label: I added a very descriptive title to this issue.
|
||||
required: true
|
||||
- label: I included a link to the documentation page I am referring to (if applicable).
|
||||
required: true
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Issue with current documentation:"
|
||||
description: >
|
||||
Please make sure to leave a reference to the document/code you're
|
||||
referring to. Feel free to include names of classes, functions, methods
|
||||
or concepts you'd like to see documented more.
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: "Idea or request for content:"
|
||||
description: >
|
||||
Please describe as clearly as possible what topics you think are missing
|
||||
from the current documentation.
|
||||
|
||||
8
.github/ISSUE_TEMPLATE/privileged.yml
vendored
8
.github/ISSUE_TEMPLATE/privileged.yml
vendored
@@ -5,10 +5,10 @@ body:
|
||||
attributes:
|
||||
value: |
|
||||
Thanks for your interest in LangChain! 🚀
|
||||
|
||||
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation on the [LangChain Forum](https://forum.langchain.com/) instead.
|
||||
|
||||
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
|
||||
|
||||
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
|
||||
|
||||
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
|
||||
or are a regular contributor to LangChain with previous merged pull requests.
|
||||
- type: checkboxes
|
||||
id: privileged
|
||||
|
||||
41
.github/PULL_REQUEST_TEMPLATE.md
vendored
41
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -1,32 +1,29 @@
|
||||
Thank you for contributing to LangChain! Follow these steps to mark your pull request as ready for review. **If any of these steps are not completed, your PR will not be considered for review.**
|
||||
Thank you for contributing to LangChain!
|
||||
|
||||
- [ ] **PR title**: "package: description"
|
||||
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
|
||||
- Example: "community: add foobar LLM"
|
||||
|
||||
- [ ] **PR title**: Follows the format: {TYPE}({SCOPE}): {DESCRIPTION}
|
||||
- Examples:
|
||||
- feat(core): add multi-tenant support
|
||||
- fix(cli): resolve flag parsing error
|
||||
- docs(openai): update API usage examples
|
||||
- Allowed `{TYPE}` values:
|
||||
- feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, release
|
||||
- Allowed `{SCOPE}` values (optional):
|
||||
- core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek, exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai, perplexity, prompty, qdrant, xai
|
||||
- Note: the `{DESCRIPTION}` must not start with an uppercase letter.
|
||||
- Once you've written the title, please delete this checklist item; do not include it in the PR.
|
||||
|
||||
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
|
||||
- **Description:** a description of the change. Include a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) if applicable to a relevant issue.
|
||||
- **Issue:** the issue # it fixes, if applicable (e.g. Fixes #123)
|
||||
- **Dependencies:** any dependencies required for this change
|
||||
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
- **Description:** a description of the change
|
||||
- **Issue:** the issue # it fixes, if applicable
|
||||
- **Dependencies:** any dependencies required for this change
|
||||
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
|
||||
|
||||
- [ ] **Add tests and docs**: If you're adding a new integration, you must include:
|
||||
1. A test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. An example notebook showing its use. It lives in `docs/docs/integrations` directory.
|
||||
|
||||
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. **We will not consider a PR unless these three are passing in CI.** See [contribution guidelines](https://python.langchain.com/docs/contributing/) for more.
|
||||
- [ ] **Add tests and docs**: If you're adding a new integration, please include
|
||||
1. a test for the integration, preferably unit tests that do not rely on network access,
|
||||
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
|
||||
|
||||
|
||||
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
|
||||
|
||||
Additional guidelines:
|
||||
|
||||
- Make sure optional dependencies are imported within a function.
|
||||
- Please do not add dependencies to `pyproject.toml` files (even optional ones) unless they are **required** for unit tests.
|
||||
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
|
||||
- Most PRs should not touch more than one package.
|
||||
- Changes should be backwards compatible.
|
||||
- If you are adding something to community, do not re-import it in langchain.
|
||||
|
||||
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
|
||||
|
||||
2
.github/actions/people/Dockerfile
vendored
2
.github/actions/people/Dockerfile
vendored
@@ -4,4 +4,4 @@ RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.
|
||||
|
||||
COPY ./app /app
|
||||
|
||||
CMD ["python", "/app/main.py"]
|
||||
CMD ["python", "/app/main.py"]
|
||||
6
.github/actions/people/action.yml
vendored
6
.github/actions/people/action.yml
vendored
@@ -4,8 +4,8 @@ description: "Generate the data for the LangChain People page"
|
||||
author: "Jacob Lee <jacob@langchain.dev>"
|
||||
inputs:
|
||||
token:
|
||||
description: "User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}"
|
||||
description: 'User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}'
|
||||
required: true
|
||||
runs:
|
||||
using: "docker"
|
||||
image: "Dockerfile"
|
||||
using: 'docker'
|
||||
image: 'Dockerfile'
|
||||
39
.github/actions/people/app/main.py
vendored
39
.github/actions/people/app/main.py
vendored
@@ -350,7 +350,11 @@ def get_graphql_pr_edges(*, settings: Settings, after: Union[str, None] = None):
|
||||
print("Querying PRs...")
|
||||
else:
|
||||
print(f"Querying PRs with cursor {after}...")
|
||||
data = get_graphql_response(settings=settings, query=prs_query, after=after)
|
||||
data = get_graphql_response(
|
||||
settings=settings,
|
||||
query=prs_query,
|
||||
after=after
|
||||
)
|
||||
graphql_response = PRsResponse.model_validate(data)
|
||||
return graphql_response.data.repository.pullRequests.edges
|
||||
|
||||
@@ -480,16 +484,10 @@ def get_contributors(settings: Settings):
|
||||
lines_changed = pr.additions + pr.deletions
|
||||
score = _logistic(files_changed, 20) + _logistic(lines_changed, 100)
|
||||
contributor_scores[pr.author.login] += score
|
||||
three_months_ago = datetime.now(timezone.utc) - timedelta(days=3 * 30)
|
||||
three_months_ago = (datetime.now(timezone.utc) - timedelta(days=3*30))
|
||||
if pr.createdAt > three_months_ago:
|
||||
recent_contributor_scores[pr.author.login] += score
|
||||
return (
|
||||
contributors,
|
||||
contributor_scores,
|
||||
recent_contributor_scores,
|
||||
reviewers,
|
||||
authors,
|
||||
)
|
||||
return contributors, contributor_scores, recent_contributor_scores, reviewers, authors
|
||||
|
||||
|
||||
def get_top_users(
|
||||
@@ -526,13 +524,9 @@ if __name__ == "__main__":
|
||||
# question_commentors, question_last_month_commentors, question_authors = get_experts(
|
||||
# settings=settings
|
||||
# )
|
||||
(
|
||||
contributors,
|
||||
contributor_scores,
|
||||
recent_contributor_scores,
|
||||
reviewers,
|
||||
pr_authors,
|
||||
) = get_contributors(settings=settings)
|
||||
contributors, contributor_scores, recent_contributor_scores, reviewers, pr_authors = get_contributors(
|
||||
settings=settings
|
||||
)
|
||||
# authors = {**question_authors, **pr_authors}
|
||||
authors = {**pr_authors}
|
||||
maintainers_logins = {
|
||||
@@ -553,7 +547,6 @@ if __name__ == "__main__":
|
||||
"obi1kenobi",
|
||||
"langchain-infra",
|
||||
"jacoblee93",
|
||||
"isahers1",
|
||||
"dqbd",
|
||||
"bracesproul",
|
||||
"akira",
|
||||
@@ -565,7 +558,7 @@ if __name__ == "__main__":
|
||||
maintainers.append(
|
||||
{
|
||||
"login": login,
|
||||
"count": contributors[login], # + question_commentors[login],
|
||||
"count": contributors[login], #+ question_commentors[login],
|
||||
"avatarUrl": user.avatarUrl,
|
||||
"twitterUsername": user.twitterUsername,
|
||||
"url": user.url,
|
||||
@@ -621,7 +614,9 @@ if __name__ == "__main__":
|
||||
new_people_content = yaml.dump(
|
||||
people, sort_keys=False, width=200, allow_unicode=True
|
||||
)
|
||||
if people_old_content == new_people_content:
|
||||
if (
|
||||
people_old_content == new_people_content
|
||||
):
|
||||
logging.info("The LangChain People data hasn't changed, finishing.")
|
||||
sys.exit(0)
|
||||
people_path.write_text(new_people_content, encoding="utf-8")
|
||||
@@ -634,7 +629,9 @@ if __name__ == "__main__":
|
||||
logging.info(f"Creating a new branch {branch_name}")
|
||||
subprocess.run(["git", "checkout", "-B", branch_name], check=True)
|
||||
logging.info("Adding updated file")
|
||||
subprocess.run(["git", "add", str(people_path)], check=True)
|
||||
subprocess.run(
|
||||
["git", "add", str(people_path)], check=True
|
||||
)
|
||||
logging.info("Committing updated file")
|
||||
message = "👥 Update LangChain people data"
|
||||
result = subprocess.run(["git", "commit", "-m", message], check=True)
|
||||
@@ -643,4 +640,4 @@ if __name__ == "__main__":
|
||||
logging.info("Creating PR")
|
||||
pr = repo.create_pull(title=message, body=message, base="master", head=branch_name)
|
||||
logging.info(f"Created PR: {pr.number}")
|
||||
logging.info("Finished")
|
||||
logging.info("Finished")
|
||||
21
.github/actions/uv_setup/action.yml
vendored
21
.github/actions/uv_setup/action.yml
vendored
@@ -1,21 +0,0 @@
|
||||
# TODO: https://docs.astral.sh/uv/guides/integration/github/#caching
|
||||
|
||||
name: uv-install
|
||||
description: Set up Python and uv
|
||||
|
||||
inputs:
|
||||
python-version:
|
||||
description: Python version, supporting MAJOR.MINOR only
|
||||
required: true
|
||||
|
||||
env:
|
||||
UV_VERSION: "0.5.25"
|
||||
|
||||
runs:
|
||||
using: composite
|
||||
steps:
|
||||
- name: Install uv and set the python version
|
||||
uses: astral-sh/setup-uv@v5
|
||||
with:
|
||||
version: ${{ env.UV_VERSION }}
|
||||
python-version: ${{ inputs.python-version }}
|
||||
325
.github/copilot-instructions.md
vendored
325
.github/copilot-instructions.md
vendored
@@ -1,325 +0,0 @@
|
||||
# Global Development Guidelines for LangChain Projects
|
||||
|
||||
## Core Development Principles
|
||||
|
||||
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
|
||||
|
||||
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
|
||||
|
||||
❌ **Bad - Breaking Change:**
|
||||
|
||||
```python
|
||||
def get_user(id, verbose=False): # Changed from `user_id`
|
||||
pass
|
||||
```
|
||||
|
||||
✅ **Good - Stable Interface:**
|
||||
|
||||
```python
|
||||
def get_user(user_id: str, verbose: bool = False) -> User:
|
||||
"""Retrieve user by ID with optional verbose output."""
|
||||
pass
|
||||
```
|
||||
|
||||
**Before making ANY changes to public APIs:**
|
||||
|
||||
- Check if the function/class is exported in `__init__.py`
|
||||
- Look for existing usage patterns in tests and examples
|
||||
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
|
||||
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
|
||||
|
||||
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
|
||||
|
||||
### 2. Code Quality Standards
|
||||
|
||||
**All Python code MUST include type hints and return types.**
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def p(u, d):
|
||||
return [x for x in u if x not in d]
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
|
||||
"""Filter out users that are not in the known users set.
|
||||
|
||||
Args:
|
||||
users: List of user identifiers to filter.
|
||||
known_users: Set of known/valid user identifiers.
|
||||
|
||||
Returns:
|
||||
List of users that are not in the known_users set.
|
||||
"""
|
||||
return [user for user in users if user not in known_users]
|
||||
```
|
||||
|
||||
**Style Requirements:**
|
||||
|
||||
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
|
||||
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
|
||||
- Avoid unnecessary abstraction or premature optimization
|
||||
- Follow existing patterns in the codebase you're modifying
|
||||
|
||||
### 3. Testing Requirements
|
||||
|
||||
**Every new feature or bugfix MUST be covered by unit tests.**
|
||||
|
||||
**Test Organization:**
|
||||
|
||||
- Unit tests: `tests/unit_tests/` (no network calls allowed)
|
||||
- Integration tests: `tests/integration_tests/` (network calls permitted)
|
||||
- Use `pytest` as the testing framework
|
||||
|
||||
**Test Quality Checklist:**
|
||||
|
||||
- [ ] Tests fail when your new logic is broken
|
||||
- [ ] Happy path is covered
|
||||
- [ ] Edge cases and error conditions are tested
|
||||
- [ ] Use fixtures/mocks for external dependencies
|
||||
- [ ] Tests are deterministic (no flaky tests)
|
||||
|
||||
Checklist questions:
|
||||
|
||||
- [ ] Does the test suite fail if your new logic is broken?
|
||||
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
|
||||
- [ ] Do tests use fixtures or mocks where needed?
|
||||
|
||||
```python
|
||||
def test_filter_unknown_users():
|
||||
"""Test filtering unknown users from a list."""
|
||||
users = ["alice", "bob", "charlie"]
|
||||
known_users = {"alice", "bob"}
|
||||
|
||||
result = filter_unknown_users(users, known_users)
|
||||
|
||||
assert result == ["charlie"]
|
||||
assert len(result) == 1
|
||||
```
|
||||
|
||||
### 4. Security and Risk Assessment
|
||||
|
||||
**Security Checklist:**
|
||||
|
||||
- No `eval()`, `exec()`, or `pickle` on user-controlled input
|
||||
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
|
||||
- Remove unreachable/commented code before committing
|
||||
- Race conditions or resource leaks (file handles, sockets, threads).
|
||||
- Ensure proper resource cleanup (file handles, connections)
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def load_config(path):
|
||||
with open(path) as f:
|
||||
return eval(f.read()) # ⚠️ Never eval config
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
import json
|
||||
|
||||
def load_config(path: str) -> dict:
|
||||
with open(path) as f:
|
||||
return json.load(f)
|
||||
```
|
||||
|
||||
### 5. Documentation Standards
|
||||
|
||||
**Use Google-style docstrings with Args section for all public functions.**
|
||||
|
||||
❌ **Insufficient Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to, msg):
|
||||
"""Send an email to a recipient."""
|
||||
```
|
||||
|
||||
✅ **Complete Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
|
||||
"""
|
||||
Send an email to a recipient with specified priority.
|
||||
|
||||
Args:
|
||||
to: The email address of the recipient.
|
||||
msg: The message body to send.
|
||||
priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).
|
||||
|
||||
Returns:
|
||||
True if email was sent successfully, False otherwise.
|
||||
|
||||
Raises:
|
||||
InvalidEmailError: If the email address format is invalid.
|
||||
SMTPConnectionError: If unable to connect to email server.
|
||||
"""
|
||||
```
|
||||
|
||||
**Documentation Guidelines:**
|
||||
|
||||
- Types go in function signatures, NOT in docstrings
|
||||
- Focus on "why" rather than "what" in descriptions
|
||||
- Document all parameters, return values, and exceptions
|
||||
- Keep descriptions concise but clear
|
||||
- Use reStructuredText for docstrings to enable rich formatting
|
||||
|
||||
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
|
||||
|
||||
### 6. Architectural Improvements
|
||||
|
||||
**When you encounter code that could be improved, suggest better designs:**
|
||||
|
||||
❌ **Poor Design:**
|
||||
|
||||
```python
|
||||
def process_data(data, db_conn, email_client, logger):
|
||||
# Function doing too many things
|
||||
validated = validate_data(data)
|
||||
result = db_conn.save(validated)
|
||||
email_client.send_notification(result)
|
||||
logger.log(f"Processed {len(data)} items")
|
||||
return result
|
||||
```
|
||||
|
||||
✅ **Better Design:**
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ProcessingResult:
|
||||
"""Result of data processing operation."""
|
||||
items_processed: int
|
||||
success: bool
|
||||
errors: List[str] = field(default_factory=list)
|
||||
|
||||
class DataProcessor:
|
||||
"""Handles data validation, storage, and notification."""
|
||||
|
||||
def __init__(self, db_conn: Database, email_client: EmailClient):
|
||||
self.db = db_conn
|
||||
self.email = email_client
|
||||
|
||||
def process(self, data: List[dict]) -> ProcessingResult:
|
||||
"""Process and store data with notifications."""
|
||||
validated = self._validate_data(data)
|
||||
result = self.db.save(validated)
|
||||
self._notify_completion(result)
|
||||
return result
|
||||
```
|
||||
|
||||
**Design Improvement Areas:**
|
||||
|
||||
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
|
||||
|
||||
- Reduce code duplication through shared utilities
|
||||
- Make unit testing easier
|
||||
- Improve separation of concerns (single responsibility)
|
||||
- Make unit testing easier through dependency injection
|
||||
- Add clarity without adding complexity
|
||||
- Prefer dataclasses for structured data
|
||||
|
||||
## Development Tools & Commands
|
||||
|
||||
### Package Management
|
||||
|
||||
```bash
|
||||
# Add package
|
||||
uv add package-name
|
||||
|
||||
# Sync project dependencies
|
||||
uv sync
|
||||
uv lock
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
```bash
|
||||
# Run unit tests (no network)
|
||||
make test
|
||||
|
||||
# Don't run integration tests, as API keys must be set
|
||||
|
||||
# Run specific test file
|
||||
uv run --group test pytest tests/unit_tests/test_specific.py
|
||||
```
|
||||
|
||||
### Code Quality
|
||||
|
||||
```bash
|
||||
# Lint code
|
||||
make lint
|
||||
|
||||
# Format code
|
||||
make format
|
||||
|
||||
# Type checking
|
||||
uv run --group lint mypy .
|
||||
```
|
||||
|
||||
### Dependency Management Patterns
|
||||
|
||||
**Local Development Dependencies:**
|
||||
|
||||
```toml
|
||||
[tool.uv.sources]
|
||||
langchain-core = { path = "../core", editable = true }
|
||||
langchain-tests = { path = "../standard-tests", editable = true }
|
||||
```
|
||||
|
||||
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search_database(query: str) -> str:
|
||||
"""Search the database for relevant information.
|
||||
|
||||
Args:
|
||||
query: The search query string.
|
||||
"""
|
||||
# Implementation here
|
||||
return results
|
||||
```
|
||||
|
||||
## Commit Standards
|
||||
|
||||
**Use Conventional Commits format for PR titles:**
|
||||
|
||||
- `feat(core): add multi-tenant support`
|
||||
- `fix(cli): resolve flag parsing error`
|
||||
- `docs: update API usage examples`
|
||||
- `docs(openai): update API usage examples`
|
||||
|
||||
## Framework-Specific Guidelines
|
||||
|
||||
- Follow the existing patterns in `langchain-core` for base abstractions
|
||||
- Use `langchain_core.callbacks` for execution tracking
|
||||
- Implement proper streaming support where applicable
|
||||
- Avoid deprecated components like legacy `LLMChain`
|
||||
|
||||
### Partner Integrations
|
||||
|
||||
- Follow the established patterns in existing partner libraries
|
||||
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
|
||||
- Include comprehensive integration tests
|
||||
- Document API key requirements and authentication
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Checklist
|
||||
|
||||
Before submitting code changes:
|
||||
|
||||
- [ ] **Breaking Changes**: Verified no public API changes
|
||||
- [ ] **Type Hints**: All functions have complete type annotations
|
||||
- [ ] **Tests**: New functionality is fully tested
|
||||
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
|
||||
- [ ] **Documentation**: Google-style docstrings for public functions
|
||||
- [ ] **Code Quality**: `make lint` and `make format` pass
|
||||
- [ ] **Architecture**: Suggested improvements where applicable
|
||||
- [ ] **Commit Message**: Follows Conventional Commits format
|
||||
11
.github/dependabot.yml
vendored
11
.github/dependabot.yml
vendored
@@ -1,11 +0,0 @@
|
||||
# Please see the documentation for all configuration options:
|
||||
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
|
||||
# and
|
||||
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
|
||||
|
||||
version: 2
|
||||
updates:
|
||||
- package-ecosystem: "github-actions"
|
||||
directory: "/"
|
||||
schedule:
|
||||
interval: "weekly"
|
||||
281
.github/scripts/check_diff.py
vendored
281
.github/scripts/check_diff.py
vendored
@@ -1,226 +1,16 @@
|
||||
import glob
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Set
|
||||
|
||||
import tomllib
|
||||
from get_min_versions import get_min_version_from_toml
|
||||
from packaging.requirements import Requirement
|
||||
import os
|
||||
from typing import Dict
|
||||
|
||||
LANGCHAIN_DIRS = [
|
||||
"libs/core",
|
||||
"libs/text-splitters",
|
||||
"libs/langchain",
|
||||
"libs/langchain_v1",
|
||||
"libs/community",
|
||||
"libs/experimental",
|
||||
]
|
||||
|
||||
# when set to True, we are ignoring core dependents
|
||||
# in order to be able to get CI to pass for each individual
|
||||
# package that depends on core
|
||||
# e.g. if you touch core, we don't then add textsplitters/etc to CI
|
||||
IGNORE_CORE_DEPENDENTS = False
|
||||
|
||||
# ignored partners are removed from dependents
|
||||
# but still run if directly edited
|
||||
IGNORED_PARTNERS = [
|
||||
# remove huggingface from dependents because of CI instability
|
||||
# specifically in huggingface jobs
|
||||
# https://github.com/langchain-ai/langchain/issues/25558
|
||||
"huggingface",
|
||||
# prompty exhibiting issues with numpy for Python 3.13
|
||||
# https://github.com/langchain-ai/langchain/actions/runs/12651104685/job/35251034969?pr=29065
|
||||
"prompty",
|
||||
]
|
||||
|
||||
PY_312_MAX_PACKAGES = [
|
||||
"libs/partners/chroma", # https://github.com/chroma-core/chroma/issues/4382
|
||||
]
|
||||
|
||||
|
||||
def all_package_dirs() -> Set[str]:
|
||||
return {
|
||||
"/".join(path.split("/")[:-1]).lstrip("./")
|
||||
for path in glob.glob("./libs/**/pyproject.toml", recursive=True)
|
||||
if "libs/cli" not in path and "libs/standard-tests" not in path
|
||||
}
|
||||
|
||||
|
||||
def dependents_graph() -> dict:
|
||||
"""
|
||||
Construct a mapping of package -> dependents, such that we can
|
||||
run tests on all dependents of a package when a change is made.
|
||||
"""
|
||||
dependents = defaultdict(set)
|
||||
|
||||
for path in glob.glob("./libs/**/pyproject.toml", recursive=True):
|
||||
if "template" in path:
|
||||
continue
|
||||
|
||||
# load regular and test deps from pyproject.toml
|
||||
with open(path, "rb") as f:
|
||||
pyproject = tomllib.load(f)
|
||||
|
||||
pkg_dir = "libs" + "/".join(path.split("libs")[1].split("/")[:-1])
|
||||
for dep in [
|
||||
*pyproject["project"]["dependencies"],
|
||||
*pyproject["dependency-groups"]["test"],
|
||||
]:
|
||||
requirement = Requirement(dep)
|
||||
package_name = requirement.name
|
||||
if "langchain" in dep:
|
||||
dependents[package_name].add(pkg_dir)
|
||||
continue
|
||||
|
||||
# load extended deps from extended_testing_deps.txt
|
||||
package_path = Path(path).parent
|
||||
extended_requirement_path = package_path / "extended_testing_deps.txt"
|
||||
if extended_requirement_path.exists():
|
||||
with open(extended_requirement_path, "r") as f:
|
||||
extended_deps = f.read().splitlines()
|
||||
for depline in extended_deps:
|
||||
if depline.startswith("-e "):
|
||||
# editable dependency
|
||||
assert depline.startswith("-e ../partners/"), (
|
||||
"Extended test deps should only editable install partner packages"
|
||||
)
|
||||
partner = depline.split("partners/")[1]
|
||||
dep = f"langchain-{partner}"
|
||||
else:
|
||||
dep = depline.split("==")[0]
|
||||
|
||||
if "langchain" in dep:
|
||||
dependents[dep].add(pkg_dir)
|
||||
|
||||
for k in dependents:
|
||||
for partner in IGNORED_PARTNERS:
|
||||
if f"libs/partners/{partner}" in dependents[k]:
|
||||
dependents[k].remove(f"libs/partners/{partner}")
|
||||
return dependents
|
||||
|
||||
|
||||
def add_dependents(dirs_to_eval: Set[str], dependents: dict) -> List[str]:
|
||||
updated = set()
|
||||
for dir_ in dirs_to_eval:
|
||||
# handle core manually because it has so many dependents
|
||||
if "core" in dir_:
|
||||
updated.add(dir_)
|
||||
continue
|
||||
pkg = "langchain-" + dir_.split("/")[-1]
|
||||
updated.update(dependents[pkg])
|
||||
updated.add(dir_)
|
||||
return list(updated)
|
||||
|
||||
|
||||
def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
|
||||
if job == "test-pydantic":
|
||||
return _get_pydantic_test_configs(dir_)
|
||||
|
||||
if job == "codspeed":
|
||||
py_versions = ["3.12"] # 3.13 is not yet supported
|
||||
elif dir_ == "libs/core":
|
||||
py_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]
|
||||
# custom logic for specific directories
|
||||
elif dir_ == "libs/partners/milvus":
|
||||
# milvus doesn't allow 3.12 because they declare deps in funny way
|
||||
py_versions = ["3.9", "3.11"]
|
||||
|
||||
elif dir_ in PY_312_MAX_PACKAGES:
|
||||
py_versions = ["3.9", "3.12"]
|
||||
|
||||
elif dir_ == "libs/langchain" and job == "extended-tests":
|
||||
py_versions = ["3.9", "3.13"]
|
||||
|
||||
elif dir_ == ".":
|
||||
# unable to install with 3.13 because tokenizers doesn't support 3.13 yet
|
||||
py_versions = ["3.9", "3.12"]
|
||||
else:
|
||||
py_versions = ["3.9", "3.13"]
|
||||
|
||||
return [{"working-directory": dir_, "python-version": py_v} for py_v in py_versions]
|
||||
|
||||
|
||||
def _get_pydantic_test_configs(
|
||||
dir_: str, *, python_version: str = "3.11"
|
||||
) -> List[Dict[str, str]]:
|
||||
with open("./libs/core/uv.lock", "rb") as f:
|
||||
core_uv_lock_data = tomllib.load(f)
|
||||
for package in core_uv_lock_data["package"]:
|
||||
if package["name"] == "pydantic":
|
||||
core_max_pydantic_minor = package["version"].split(".")[1]
|
||||
break
|
||||
|
||||
with open(f"./{dir_}/uv.lock", "rb") as f:
|
||||
dir_uv_lock_data = tomllib.load(f)
|
||||
|
||||
for package in dir_uv_lock_data["package"]:
|
||||
if package["name"] == "pydantic":
|
||||
dir_max_pydantic_minor = package["version"].split(".")[1]
|
||||
break
|
||||
|
||||
core_min_pydantic_version = get_min_version_from_toml(
|
||||
"./libs/core/pyproject.toml", "release", python_version, include=["pydantic"]
|
||||
)["pydantic"]
|
||||
core_min_pydantic_minor = (
|
||||
core_min_pydantic_version.split(".")[1]
|
||||
if "." in core_min_pydantic_version
|
||||
else "0"
|
||||
)
|
||||
dir_min_pydantic_version = get_min_version_from_toml(
|
||||
f"./{dir_}/pyproject.toml", "release", python_version, include=["pydantic"]
|
||||
).get("pydantic", "0.0.0")
|
||||
dir_min_pydantic_minor = (
|
||||
dir_min_pydantic_version.split(".")[1]
|
||||
if "." in dir_min_pydantic_version
|
||||
else "0"
|
||||
)
|
||||
|
||||
max_pydantic_minor = min(
|
||||
int(dir_max_pydantic_minor),
|
||||
int(core_max_pydantic_minor),
|
||||
)
|
||||
min_pydantic_minor = max(
|
||||
int(dir_min_pydantic_minor),
|
||||
int(core_min_pydantic_minor),
|
||||
)
|
||||
|
||||
configs = [
|
||||
{
|
||||
"working-directory": dir_,
|
||||
"pydantic-version": f"2.{v}.0",
|
||||
"python-version": python_version,
|
||||
}
|
||||
for v in range(min_pydantic_minor, max_pydantic_minor + 1)
|
||||
]
|
||||
return configs
|
||||
|
||||
|
||||
def _get_configs_for_multi_dirs(
|
||||
job: str, dirs_to_run: Dict[str, Set[str]], dependents: dict
|
||||
) -> List[Dict[str, str]]:
|
||||
if job == "lint":
|
||||
dirs = add_dependents(
|
||||
dirs_to_run["lint"] | dirs_to_run["test"] | dirs_to_run["extended-test"],
|
||||
dependents,
|
||||
)
|
||||
elif job in ["test", "compile-integration-tests", "dependencies", "test-pydantic"]:
|
||||
dirs = add_dependents(
|
||||
dirs_to_run["test"] | dirs_to_run["extended-test"], dependents
|
||||
)
|
||||
elif job == "extended-tests":
|
||||
dirs = list(dirs_to_run["extended-test"])
|
||||
elif job == "codspeed":
|
||||
dirs = list(dirs_to_run["codspeed"])
|
||||
else:
|
||||
raise ValueError(f"Unknown job: {job}")
|
||||
|
||||
return [
|
||||
config for dir_ in dirs for config in _get_configs_for_single_dir(job, dir_)
|
||||
]
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
files = sys.argv[1:]
|
||||
|
||||
@@ -228,15 +18,12 @@ if __name__ == "__main__":
|
||||
"lint": set(),
|
||||
"test": set(),
|
||||
"extended-test": set(),
|
||||
"codspeed": set(),
|
||||
}
|
||||
docs_edited = False
|
||||
|
||||
if len(files) >= 300:
|
||||
if len(files) == 300:
|
||||
# max diff length is 300 files - there are likely files missing
|
||||
dirs_to_run["lint"] = all_package_dirs()
|
||||
dirs_to_run["test"] = all_package_dirs()
|
||||
dirs_to_run["extended-test"] = set(LANGCHAIN_DIRS)
|
||||
raise ValueError("Max diff reached. Please manually run CI on changed libs.")
|
||||
|
||||
for file in files:
|
||||
if any(
|
||||
@@ -252,38 +39,29 @@ if __name__ == "__main__":
|
||||
dirs_to_run["extended-test"].update(LANGCHAIN_DIRS)
|
||||
dirs_to_run["lint"].add(".")
|
||||
|
||||
if file.startswith("libs/core"):
|
||||
dirs_to_run["codspeed"].add(f"libs/core")
|
||||
if any(file.startswith(dir_) for dir_ in LANGCHAIN_DIRS):
|
||||
# add that dir and all dirs after in LANGCHAIN_DIRS
|
||||
# for extended testing
|
||||
|
||||
found = False
|
||||
for dir_ in LANGCHAIN_DIRS:
|
||||
if dir_ == "libs/core" and IGNORE_CORE_DEPENDENTS:
|
||||
dirs_to_run["extended-test"].add(dir_)
|
||||
continue
|
||||
if file.startswith(dir_):
|
||||
found = True
|
||||
if found:
|
||||
dirs_to_run["extended-test"].add(dir_)
|
||||
elif file.startswith("libs/standard-tests"):
|
||||
# TODO: update to include all packages that rely on standard-tests (all partner packages)
|
||||
# Note: won't run on external repo partners
|
||||
# note: won't run on external repo partners
|
||||
dirs_to_run["lint"].add("libs/standard-tests")
|
||||
dirs_to_run["test"].add("libs/standard-tests")
|
||||
dirs_to_run["lint"].add("libs/cli")
|
||||
dirs_to_run["test"].add("libs/cli")
|
||||
dirs_to_run["test"].add("libs/partners/mistralai")
|
||||
dirs_to_run["test"].add("libs/partners/openai")
|
||||
dirs_to_run["test"].add("libs/partners/anthropic")
|
||||
dirs_to_run["test"].add("libs/partners/ai21")
|
||||
dirs_to_run["test"].add("libs/partners/fireworks")
|
||||
dirs_to_run["test"].add("libs/partners/groq")
|
||||
|
||||
elif file.startswith("libs/cli"):
|
||||
dirs_to_run["lint"].add("libs/cli")
|
||||
dirs_to_run["test"].add("libs/cli")
|
||||
|
||||
# todo: add cli makefile
|
||||
pass
|
||||
elif file.startswith("libs/partners"):
|
||||
partner_dir = file.split("/")[2]
|
||||
if os.path.isdir(f"libs/partners/{partner_dir}") and [
|
||||
@@ -292,42 +70,25 @@ if __name__ == "__main__":
|
||||
if not filename.startswith(".")
|
||||
] != ["README.md"]:
|
||||
dirs_to_run["test"].add(f"libs/partners/{partner_dir}")
|
||||
dirs_to_run["codspeed"].add(f"libs/partners/{partner_dir}")
|
||||
# Skip if the directory was deleted or is just a tombstone readme
|
||||
elif file == "libs/packages.yml":
|
||||
continue
|
||||
elif file.startswith("libs/"):
|
||||
raise ValueError(
|
||||
f"Unknown lib: {file}. check_diff.py likely needs "
|
||||
"an update for this new library!"
|
||||
)
|
||||
elif file.startswith("docs/") or file in [
|
||||
"pyproject.toml",
|
||||
"uv.lock",
|
||||
]: # docs or root uv files
|
||||
docs_edited = True
|
||||
elif any(file.startswith(p) for p in ["docs/", "templates/", "cookbook/"]):
|
||||
if file.startswith("docs/"):
|
||||
docs_edited = True
|
||||
dirs_to_run["lint"].add(".")
|
||||
|
||||
dependents = dependents_graph()
|
||||
|
||||
# we now have dirs_by_job
|
||||
# todo: clean this up
|
||||
map_job_to_configs = {
|
||||
job: _get_configs_for_multi_dirs(job, dirs_to_run, dependents)
|
||||
for job in [
|
||||
"lint",
|
||||
"test",
|
||||
"extended-tests",
|
||||
"compile-integration-tests",
|
||||
"dependencies",
|
||||
"test-pydantic",
|
||||
"codspeed",
|
||||
]
|
||||
outputs = {
|
||||
"dirs-to-lint": list(
|
||||
dirs_to_run["lint"] | dirs_to_run["test"] | dirs_to_run["extended-test"]
|
||||
),
|
||||
"dirs-to-test": list(dirs_to_run["test"] | dirs_to_run["extended-test"]),
|
||||
"dirs-to-extended-test": list(dirs_to_run["extended-test"]),
|
||||
"docs-edited": "true" if docs_edited else "",
|
||||
}
|
||||
map_job_to_configs["test-doc-imports"] = (
|
||||
[{"python-version": "3.12"}] if docs_edited else []
|
||||
)
|
||||
|
||||
for key, value in map_job_to_configs.items():
|
||||
for key, value in outputs.items():
|
||||
json_output = json.dumps(value)
|
||||
print(f"{key}={json_output}")
|
||||
|
||||
35
.github/scripts/check_prerelease_dependencies.py
vendored
35
.github/scripts/check_prerelease_dependencies.py
vendored
@@ -1,35 +0,0 @@
|
||||
import sys
|
||||
|
||||
import tomllib
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Get the TOML file path from the command line argument
|
||||
toml_file = sys.argv[1]
|
||||
|
||||
# read toml file
|
||||
with open(toml_file, "rb") as file:
|
||||
toml_data = tomllib.load(file)
|
||||
|
||||
# see if we're releasing an rc
|
||||
version = toml_data["project"]["version"]
|
||||
releasing_rc = "rc" in version or "dev" in version
|
||||
|
||||
# if not, iterate through dependencies and make sure none allow prereleases
|
||||
if not releasing_rc:
|
||||
dependencies = toml_data["project"]["dependencies"]
|
||||
for dep_version in dependencies:
|
||||
dep_version_string = (
|
||||
dep_version["version"] if isinstance(dep_version, dict) else dep_version
|
||||
)
|
||||
|
||||
if "rc" in dep_version_string:
|
||||
raise ValueError(
|
||||
f"Dependency {dep_version} has a prerelease version. Please remove this."
|
||||
)
|
||||
|
||||
if isinstance(dep_version, dict) and dep_version.get(
|
||||
"allow-prereleases", False
|
||||
):
|
||||
raise ValueError(
|
||||
f"Dependency {dep_version} has allow-prereleases set to true. Please remove this."
|
||||
)
|
||||
192
.github/scripts/get_min_versions.py
vendored
192
.github/scripts/get_min_versions.py
vendored
@@ -1,151 +1,65 @@
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from typing import Optional
|
||||
|
||||
if sys.version_info >= (3, 11):
|
||||
import tomllib
|
||||
else:
|
||||
# for python 3.10 and below, which doesnt have stdlib tomllib
|
||||
import tomli as tomllib
|
||||
|
||||
import tomllib
|
||||
from packaging.version import parse as parse_version
|
||||
import re
|
||||
from typing import List
|
||||
|
||||
import requests
|
||||
from packaging.requirements import Requirement
|
||||
from packaging.specifiers import SpecifierSet
|
||||
from packaging.version import Version, parse
|
||||
|
||||
MIN_VERSION_LIBS = [
|
||||
"langchain-core",
|
||||
"langchain-community",
|
||||
"langchain",
|
||||
"langchain-text-splitters",
|
||||
"numpy",
|
||||
"SQLAlchemy",
|
||||
]
|
||||
|
||||
# some libs only get checked on release because of simultaneous changes in
|
||||
# multiple libs
|
||||
SKIP_IF_PULL_REQUEST = [
|
||||
"langchain-core",
|
||||
"langchain-text-splitters",
|
||||
"langchain",
|
||||
]
|
||||
|
||||
|
||||
def get_pypi_versions(package_name: str) -> List[str]:
|
||||
"""
|
||||
Fetch all available versions for a package from PyPI.
|
||||
def get_min_version(version: str) -> str:
|
||||
# base regex for x.x.x with cases for rc/post/etc
|
||||
# valid strings: https://peps.python.org/pep-0440/#public-version-identifiers
|
||||
vstring = r"\d+(?:\.\d+){0,2}(?:(?:a|b|rc|\.post|\.dev)\d+)?"
|
||||
# case ^x.x.x
|
||||
_match = re.match(f"^\\^({vstring})$", version)
|
||||
if _match:
|
||||
return _match.group(1)
|
||||
|
||||
Args:
|
||||
package_name (str): Name of the package
|
||||
# case >=x.x.x,<y.y.y
|
||||
_match = re.match(f"^>=({vstring}),<({vstring})$", version)
|
||||
if _match:
|
||||
_min = _match.group(1)
|
||||
_max = _match.group(2)
|
||||
assert parse_version(_min) < parse_version(_max)
|
||||
return _min
|
||||
|
||||
Returns:
|
||||
List[str]: List of all available versions
|
||||
# case x.x.x
|
||||
_match = re.match(f"^({vstring})$", version)
|
||||
if _match:
|
||||
return _match.group(1)
|
||||
|
||||
Raises:
|
||||
requests.exceptions.RequestException: If PyPI API request fails
|
||||
KeyError: If package not found or response format unexpected
|
||||
"""
|
||||
pypi_url = f"https://pypi.org/pypi/{package_name}/json"
|
||||
response = requests.get(pypi_url)
|
||||
response.raise_for_status()
|
||||
return list(response.json()["releases"].keys())
|
||||
raise ValueError(f"Unrecognized version format: {version}")
|
||||
|
||||
|
||||
def get_minimum_version(package_name: str, spec_string: str) -> Optional[str]:
|
||||
"""
|
||||
Find the minimum published version that satisfies the given constraints.
|
||||
|
||||
Args:
|
||||
package_name (str): Name of the package
|
||||
spec_string (str): Version specification string (e.g., ">=0.2.43,<0.4.0,!=0.3.0")
|
||||
|
||||
Returns:
|
||||
Optional[str]: Minimum compatible version or None if no compatible version found
|
||||
"""
|
||||
# rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
|
||||
spec_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", spec_string)
|
||||
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1 (can be anywhere in constraint string)
|
||||
for y in range(1, 10):
|
||||
spec_string = re.sub(
|
||||
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}", spec_string
|
||||
)
|
||||
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
|
||||
for x in range(1, 10):
|
||||
spec_string = re.sub(
|
||||
rf"\^{x}\.(\d+)\.(\d+)", rf">={x}.\1.\2,<{x + 1}", spec_string
|
||||
)
|
||||
|
||||
spec_set = SpecifierSet(spec_string)
|
||||
all_versions = get_pypi_versions(package_name)
|
||||
|
||||
valid_versions = []
|
||||
for version_str in all_versions:
|
||||
try:
|
||||
version = parse(version_str)
|
||||
if spec_set.contains(version):
|
||||
valid_versions.append(version)
|
||||
except ValueError:
|
||||
continue
|
||||
|
||||
return str(min(valid_versions)) if valid_versions else None
|
||||
|
||||
|
||||
def _check_python_version_from_requirement(
|
||||
requirement: Requirement, python_version: str
|
||||
) -> bool:
|
||||
if not requirement.marker:
|
||||
return True
|
||||
else:
|
||||
marker_str = str(requirement.marker)
|
||||
if "python_version" or "python_full_version" in marker_str:
|
||||
python_version_str = "".join(
|
||||
char
|
||||
for char in marker_str
|
||||
if char.isdigit() or char in (".", "<", ">", "=", ",")
|
||||
)
|
||||
return check_python_version(python_version, python_version_str)
|
||||
return True
|
||||
|
||||
|
||||
def get_min_version_from_toml(
|
||||
toml_path: str,
|
||||
versions_for: str,
|
||||
python_version: str,
|
||||
*,
|
||||
include: Optional[list] = None,
|
||||
):
|
||||
def get_min_version_from_toml(toml_path: str):
|
||||
# Parse the TOML file
|
||||
with open(toml_path, "rb") as file:
|
||||
toml_data = tomllib.load(file)
|
||||
|
||||
dependencies = defaultdict(list)
|
||||
for dep in toml_data["project"]["dependencies"]:
|
||||
requirement = Requirement(dep)
|
||||
dependencies[requirement.name].append(requirement)
|
||||
# Get the dependencies from tool.poetry.dependencies
|
||||
dependencies = toml_data["tool"]["poetry"]["dependencies"]
|
||||
|
||||
# Initialize a dictionary to store the minimum versions
|
||||
min_versions = {}
|
||||
|
||||
# Iterate over the libs in MIN_VERSION_LIBS
|
||||
for lib in set(MIN_VERSION_LIBS + (include or [])):
|
||||
if versions_for == "pull_request" and lib in SKIP_IF_PULL_REQUEST:
|
||||
# some libs only get checked on release because of simultaneous
|
||||
# changes in multiple libs
|
||||
continue
|
||||
for lib in MIN_VERSION_LIBS:
|
||||
# Check if the lib is present in the dependencies
|
||||
if lib in dependencies:
|
||||
if include and lib not in include:
|
||||
continue
|
||||
requirements = dependencies[lib]
|
||||
for requirement in requirements:
|
||||
if _check_python_version_from_requirement(requirement, python_version):
|
||||
version_string = str(requirement.specifier)
|
||||
break
|
||||
# Get the version string
|
||||
version_string = dependencies[lib]
|
||||
|
||||
if isinstance(version_string, dict):
|
||||
version_string = version_string["version"]
|
||||
|
||||
# Use parse_version to get the minimum supported version from version_string
|
||||
min_version = get_minimum_version(lib, version_string)
|
||||
min_version = get_min_version(version_string)
|
||||
|
||||
# Store the minimum version in the min_versions dictionary
|
||||
min_versions[lib] = min_version
|
||||
@@ -153,45 +67,13 @@ def get_min_version_from_toml(
|
||||
return min_versions
|
||||
|
||||
|
||||
def check_python_version(version_string, constraint_string):
|
||||
"""
|
||||
Check if the given Python version matches the given constraints.
|
||||
|
||||
:param version_string: A string representing the Python version (e.g. "3.8.5").
|
||||
:param constraint_string: A string representing the package's Python version constraints (e.g. ">=3.6, <4.0").
|
||||
:return: True if the version matches the constraints, False otherwise.
|
||||
"""
|
||||
|
||||
# rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
|
||||
constraint_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", constraint_string)
|
||||
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1.0 (can be anywhere in constraint string)
|
||||
for y in range(1, 10):
|
||||
constraint_string = re.sub(
|
||||
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}.0", constraint_string
|
||||
)
|
||||
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
|
||||
for x in range(1, 10):
|
||||
constraint_string = re.sub(
|
||||
rf"\^{x}\.0\.(\d+)", rf">={x}.0.\1,<{x + 1}.0.0", constraint_string
|
||||
)
|
||||
|
||||
try:
|
||||
version = Version(version_string)
|
||||
constraints = SpecifierSet(constraint_string)
|
||||
return version in constraints
|
||||
except Exception as e:
|
||||
print(f"Error: {e}")
|
||||
return False
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Get the TOML file path from the command line argument
|
||||
toml_file = sys.argv[1]
|
||||
versions_for = sys.argv[2]
|
||||
python_version = sys.argv[3]
|
||||
assert versions_for in ["release", "pull_request"]
|
||||
|
||||
# Call the function to get the minimum versions
|
||||
min_versions = get_min_version_from_toml(toml_file, versions_for, python_version)
|
||||
min_versions = get_min_version_from_toml(toml_file)
|
||||
|
||||
print(" ".join([f"{lib}=={version}" for lib, version in min_versions.items()]))
|
||||
print(
|
||||
" ".join([f"{lib}=={version}" for lib, version in min_versions.items()])
|
||||
)
|
||||
|
||||
112
.github/scripts/prep_api_docs_build.py
vendored
112
.github/scripts/prep_api_docs_build.py
vendored
@@ -1,112 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
"""Script to sync libraries from various repositories into the main langchain repository."""
|
||||
|
||||
import os
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
def load_packages_yaml() -> Dict[str, Any]:
|
||||
"""Load and parse the packages.yml file."""
|
||||
with open("langchain/libs/packages.yml", "r") as f:
|
||||
return yaml.safe_load(f)
|
||||
|
||||
|
||||
def get_target_dir(package_name: str) -> Path:
|
||||
"""Get the target directory for a given package."""
|
||||
package_name_short = package_name.replace("langchain-", "")
|
||||
base_path = Path("langchain/libs")
|
||||
if package_name_short == "experimental":
|
||||
return base_path / "experimental"
|
||||
if package_name_short == "community":
|
||||
return base_path / "community"
|
||||
return base_path / "partners" / package_name_short
|
||||
|
||||
|
||||
def clean_target_directories(packages: list) -> None:
|
||||
"""Remove old directories that will be replaced."""
|
||||
for package in packages:
|
||||
target_dir = get_target_dir(package["name"])
|
||||
if target_dir.exists():
|
||||
print(f"Removing {target_dir}")
|
||||
shutil.rmtree(target_dir)
|
||||
|
||||
|
||||
def move_libraries(packages: list) -> None:
|
||||
"""Move libraries from their source locations to the target directories."""
|
||||
for package in packages:
|
||||
repo_name = package["repo"].split("/")[1]
|
||||
source_path = package["path"]
|
||||
target_dir = get_target_dir(package["name"])
|
||||
|
||||
# Handle root path case
|
||||
if source_path == ".":
|
||||
source_dir = repo_name
|
||||
else:
|
||||
source_dir = f"{repo_name}/{source_path}"
|
||||
|
||||
print(f"Moving {source_dir} to {target_dir}")
|
||||
|
||||
# Ensure target directory exists
|
||||
os.makedirs(os.path.dirname(target_dir), exist_ok=True)
|
||||
|
||||
try:
|
||||
# Move the directory
|
||||
shutil.move(source_dir, target_dir)
|
||||
except Exception as e:
|
||||
print(f"Error moving {source_dir} to {target_dir}: {e}")
|
||||
|
||||
|
||||
def main():
|
||||
"""Main function to orchestrate the library sync process."""
|
||||
try:
|
||||
# Load packages configuration
|
||||
package_yaml = load_packages_yaml()
|
||||
|
||||
# Clean target directories
|
||||
clean_target_directories(
|
||||
[
|
||||
p
|
||||
for p in package_yaml["packages"]
|
||||
if (
|
||||
p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")
|
||||
)
|
||||
and p["repo"] != "langchain-ai/langchain"
|
||||
and p["name"]
|
||||
!= "langchain-ai21" # Skip AI21 due to dependency conflicts
|
||||
]
|
||||
)
|
||||
|
||||
# Move libraries to their new locations
|
||||
move_libraries(
|
||||
[
|
||||
p
|
||||
for p in package_yaml["packages"]
|
||||
if not p.get("disabled", False)
|
||||
and (
|
||||
p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")
|
||||
)
|
||||
and p["repo"] != "langchain-ai/langchain"
|
||||
and p["name"]
|
||||
!= "langchain-ai21" # Skip AI21 due to dependency conflicts
|
||||
]
|
||||
)
|
||||
|
||||
# Delete ones without a pyproject.toml
|
||||
for partner in Path("langchain/libs/partners").iterdir():
|
||||
if partner.is_dir() and not (partner / "pyproject.toml").exists():
|
||||
print(f"Removing {partner} as it does not have a pyproject.toml")
|
||||
shutil.rmtree(partner)
|
||||
|
||||
print("Library sync completed successfully!")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error during library sync: {e}")
|
||||
raise
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
416
.github/tools/git-restore-mtime
vendored
416
.github/tools/git-restore-mtime
vendored
@@ -81,93 +81,56 @@ import time
|
||||
__version__ = "2022.12+dev"
|
||||
|
||||
# Update symlinks only if the platform supports not following them
|
||||
UPDATE_SYMLINKS = bool(os.utime in getattr(os, "supports_follow_symlinks", []))
|
||||
UPDATE_SYMLINKS = bool(os.utime in getattr(os, 'supports_follow_symlinks', []))
|
||||
|
||||
# Call os.path.normpath() only if not in a POSIX platform (Windows)
|
||||
NORMALIZE_PATHS = os.path.sep != "/"
|
||||
NORMALIZE_PATHS = (os.path.sep != '/')
|
||||
|
||||
# How many files to process in each batch when re-trying merge commits
|
||||
STEPMISSING = 100
|
||||
|
||||
# (Extra) keywords for the os.utime() call performed by touch()
|
||||
UTIME_KWS = {} if not UPDATE_SYMLINKS else {"follow_symlinks": False}
|
||||
UTIME_KWS = {} if not UPDATE_SYMLINKS else {'follow_symlinks': False}
|
||||
|
||||
|
||||
# Command-line interface ######################################################
|
||||
|
||||
|
||||
def parse_args():
|
||||
parser = argparse.ArgumentParser(description=__doc__.split("\n---")[0])
|
||||
parser = argparse.ArgumentParser(
|
||||
description=__doc__.split('\n---')[0])
|
||||
|
||||
group = parser.add_mutually_exclusive_group()
|
||||
group.add_argument(
|
||||
"--quiet",
|
||||
"-q",
|
||||
dest="loglevel",
|
||||
action="store_const",
|
||||
const=logging.WARNING,
|
||||
default=logging.INFO,
|
||||
help="Suppress informative messages and summary statistics.",
|
||||
)
|
||||
group.add_argument(
|
||||
"--verbose",
|
||||
"-v",
|
||||
action="count",
|
||||
help="""
|
||||
group.add_argument('--quiet', '-q', dest='loglevel',
|
||||
action="store_const", const=logging.WARNING, default=logging.INFO,
|
||||
help="Suppress informative messages and summary statistics.")
|
||||
group.add_argument('--verbose', '-v', action="count", help="""
|
||||
Print additional information for each processed file.
|
||||
Specify twice to further increase verbosity.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--cwd",
|
||||
"-C",
|
||||
metavar="DIRECTORY",
|
||||
help="""
|
||||
parser.add_argument('--cwd', '-C', metavar="DIRECTORY", help="""
|
||||
Run as if %(prog)s was started in directory %(metavar)s.
|
||||
This affects how --work-tree, --git-dir and PATHSPEC arguments are handled.
|
||||
See 'man 1 git' or 'git --help' for more information.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--git-dir",
|
||||
dest="gitdir",
|
||||
metavar="GITDIR",
|
||||
help="""
|
||||
parser.add_argument('--git-dir', dest='gitdir', metavar="GITDIR", help="""
|
||||
Path to the git repository, by default auto-discovered by searching
|
||||
the current directory and its parents for a .git/ subdirectory.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--work-tree",
|
||||
dest="workdir",
|
||||
metavar="WORKTREE",
|
||||
help="""
|
||||
parser.add_argument('--work-tree', dest='workdir', metavar="WORKTREE", help="""
|
||||
Path to the work tree root, by default the parent of GITDIR if it's
|
||||
automatically discovered, or the current directory if GITDIR is set.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--force",
|
||||
"-f",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--force', '-f', default=False, action="store_true", help="""
|
||||
Force updating files with uncommitted modifications.
|
||||
Untracked files and uncommitted deletions, renames and additions are
|
||||
always ignored.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--merge",
|
||||
"-m",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--merge', '-m', default=False, action="store_true", help="""
|
||||
Include merge commits.
|
||||
Leads to more recent times and more files per commit, thus with the same
|
||||
time, which may or may not be what you want.
|
||||
@@ -175,130 +138,71 @@ def parse_args():
|
||||
are found sooner, which can improve performance, sometimes substantially.
|
||||
But as merge commits are usually huge, processing them may also take longer.
|
||||
By default, merge commits are only used for files missing from regular commits.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--first-parent",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--first-parent', default=False, action="store_true", help="""
|
||||
Consider only the first parent, the "main branch", when evaluating merge commits.
|
||||
Only effective when merge commits are processed, either when --merge is
|
||||
used or when finding missing files after the first regular log search.
|
||||
See --skip-missing.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--skip-missing",
|
||||
"-s",
|
||||
dest="missing",
|
||||
default=True,
|
||||
action="store_false",
|
||||
help="""
|
||||
parser.add_argument('--skip-missing', '-s', dest="missing", default=True,
|
||||
action="store_false", help="""
|
||||
Do not try to find missing files.
|
||||
If merge commits were not evaluated with --merge and some files were
|
||||
not found in regular commits, by default %(prog)s searches for these
|
||||
files again in the merge commits.
|
||||
This option disables this retry, so files found only in merge commits
|
||||
will not have their timestamp updated.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--no-directories",
|
||||
"-D",
|
||||
dest="dirs",
|
||||
default=True,
|
||||
action="store_false",
|
||||
help="""
|
||||
parser.add_argument('--no-directories', '-D', dest='dirs', default=True,
|
||||
action="store_false", help="""
|
||||
Do not update directory timestamps.
|
||||
By default, use the time of its most recently created, renamed or deleted file.
|
||||
Note that just modifying a file will NOT update its directory time.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--test",
|
||||
"-t",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="Test run: do not actually update any file timestamp.",
|
||||
)
|
||||
parser.add_argument('--test', '-t', default=False, action="store_true",
|
||||
help="Test run: do not actually update any file timestamp.")
|
||||
|
||||
parser.add_argument(
|
||||
"--commit-time",
|
||||
"-c",
|
||||
dest="commit_time",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="Use commit time instead of author time.",
|
||||
)
|
||||
parser.add_argument('--commit-time', '-c', dest='commit_time', default=False,
|
||||
action='store_true', help="Use commit time instead of author time.")
|
||||
|
||||
parser.add_argument(
|
||||
"--oldest-time",
|
||||
"-o",
|
||||
dest="reverse_order",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--oldest-time', '-o', dest='reverse_order', default=False,
|
||||
action='store_true', help="""
|
||||
Update times based on the oldest, instead of the most recent commit of a file.
|
||||
This reverses the order in which the git log is processed to emulate a
|
||||
file "creation" date. Note this will be inaccurate for files deleted and
|
||||
re-created at later dates.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--skip-older-than",
|
||||
metavar="SECONDS",
|
||||
type=int,
|
||||
help="""
|
||||
parser.add_argument('--skip-older-than', metavar='SECONDS', type=int, help="""
|
||||
Ignore files that are currently older than %(metavar)s.
|
||||
Useful in workflows that assume such files already have a correct timestamp,
|
||||
as it may improve performance by processing fewer files.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--skip-older-than-commit",
|
||||
"-N",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--skip-older-than-commit', '-N', default=False,
|
||||
action='store_true', help="""
|
||||
Ignore files older than the timestamp it would be updated to.
|
||||
Such files may be considered "original", likely in the author's repository.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--unique-times",
|
||||
default=False,
|
||||
action="store_true",
|
||||
help="""
|
||||
parser.add_argument('--unique-times', default=False, action="store_true", help="""
|
||||
Set the microseconds to a unique value per commit.
|
||||
Allows telling apart changes that would otherwise have identical timestamps,
|
||||
as git's time accuracy is in seconds.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"pathspec",
|
||||
nargs="*",
|
||||
metavar="PATHSPEC",
|
||||
help="""
|
||||
parser.add_argument('pathspec', nargs='*', metavar='PATHSPEC', help="""
|
||||
Only modify paths matching %(metavar)s, relative to current directory.
|
||||
By default, update all but untracked files and submodules.
|
||||
""",
|
||||
)
|
||||
""")
|
||||
|
||||
parser.add_argument(
|
||||
"--version",
|
||||
"-V",
|
||||
action="version",
|
||||
version="%(prog)s version {version}".format(version=get_version()),
|
||||
)
|
||||
parser.add_argument('--version', '-V', action='version',
|
||||
version='%(prog)s version {version}'.format(version=get_version()))
|
||||
|
||||
args_ = parser.parse_args()
|
||||
if args_.verbose:
|
||||
@@ -308,18 +212,17 @@ def parse_args():
|
||||
|
||||
|
||||
def get_version(version=__version__):
|
||||
if not version.endswith("+dev"):
|
||||
if not version.endswith('+dev'):
|
||||
return version
|
||||
try:
|
||||
cwd = os.path.dirname(os.path.realpath(__file__))
|
||||
return Git(cwd=cwd, errors=False).describe().lstrip("v")
|
||||
return Git(cwd=cwd, errors=False).describe().lstrip('v')
|
||||
except Git.Error:
|
||||
return "-".join((version, "unknown"))
|
||||
return '-'.join((version, "unknown"))
|
||||
|
||||
|
||||
# Helper functions ############################################################
|
||||
|
||||
|
||||
def setup_logging():
|
||||
"""Add TRACE logging level and corresponding method, return the root logger"""
|
||||
logging.TRACE = TRACE = logging.DEBUG // 2
|
||||
@@ -352,13 +255,11 @@ def normalize(path):
|
||||
if path and path[0] == '"':
|
||||
# Python 2: path = path[1:-1].decode("string-escape")
|
||||
# Python 3: https://stackoverflow.com/a/46650050/624066
|
||||
path = (
|
||||
path[1:-1] # Remove enclosing double quotes
|
||||
.encode("latin1") # Convert to bytes, required by 'unicode-escape'
|
||||
.decode("unicode-escape") # Perform the actual octal-escaping decode
|
||||
.encode("latin1") # 1:1 mapping to bytes, UTF-8 encoded
|
||||
.decode("utf8", "surrogateescape")
|
||||
) # Decode from UTF-8
|
||||
path = (path[1:-1] # Remove enclosing double quotes
|
||||
.encode('latin1') # Convert to bytes, required by 'unicode-escape'
|
||||
.decode('unicode-escape') # Perform the actual octal-escaping decode
|
||||
.encode('latin1') # 1:1 mapping to bytes, UTF-8 encoded
|
||||
.decode('utf8', 'surrogateescape')) # Decode from UTF-8
|
||||
if NORMALIZE_PATHS:
|
||||
# Make sure the slash matches the OS; for Windows we need a backslash
|
||||
path = os.path.normpath(path)
|
||||
@@ -381,12 +282,12 @@ def touch_ns(path, mtime_ns):
|
||||
|
||||
def isodate(secs: int):
|
||||
# time.localtime() accepts floats, but discards fractional part
|
||||
return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(secs))
|
||||
return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(secs))
|
||||
|
||||
|
||||
def isodate_ns(ns: int):
|
||||
# for integers fromtimestamp() is equivalent and ~16% slower than isodate()
|
||||
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=" ")
|
||||
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=' ')
|
||||
|
||||
|
||||
def get_mtime_ns(secs: int, idx: int):
|
||||
@@ -404,49 +305,35 @@ def get_mtime_path(path):
|
||||
|
||||
# Git class and parse_log(), the heart of the script ##########################
|
||||
|
||||
|
||||
class Git:
|
||||
def __init__(self, workdir=None, gitdir=None, cwd=None, errors=True):
|
||||
self.gitcmd = ["git"]
|
||||
self.gitcmd = ['git']
|
||||
self.errors = errors
|
||||
self._proc = None
|
||||
if workdir:
|
||||
self.gitcmd.extend(("--work-tree", workdir))
|
||||
if gitdir:
|
||||
self.gitcmd.extend(("--git-dir", gitdir))
|
||||
if cwd:
|
||||
self.gitcmd.extend(("-C", cwd))
|
||||
if workdir: self.gitcmd.extend(('--work-tree', workdir))
|
||||
if gitdir: self.gitcmd.extend(('--git-dir', gitdir))
|
||||
if cwd: self.gitcmd.extend(('-C', cwd))
|
||||
self.workdir, self.gitdir = self._get_repo_dirs()
|
||||
|
||||
def ls_files(self, paths: list = None):
|
||||
return (normalize(_) for _ in self._run("ls-files --full-name", paths))
|
||||
return (normalize(_) for _ in self._run('ls-files --full-name', paths))
|
||||
|
||||
def ls_dirty(self, force=False):
|
||||
return (
|
||||
normalize(_[3:].split(" -> ", 1)[-1])
|
||||
for _ in self._run("status --porcelain")
|
||||
if _[:2] != "??" and (not force or (_[0] in ("R", "A") or _[1] == "D"))
|
||||
)
|
||||
return (normalize(_[3:].split(' -> ', 1)[-1])
|
||||
for _ in self._run('status --porcelain')
|
||||
if _[:2] != '??' and (not force or (_[0] in ('R', 'A')
|
||||
or _[1] == 'D')))
|
||||
|
||||
def log(
|
||||
self,
|
||||
merge=False,
|
||||
first_parent=False,
|
||||
commit_time=False,
|
||||
reverse_order=False,
|
||||
paths: list = None,
|
||||
):
|
||||
cmd = "whatchanged --pretty={}".format("%ct" if commit_time else "%at")
|
||||
if merge:
|
||||
cmd += " -m"
|
||||
if first_parent:
|
||||
cmd += " --first-parent"
|
||||
if reverse_order:
|
||||
cmd += " --reverse"
|
||||
def log(self, merge=False, first_parent=False, commit_time=False,
|
||||
reverse_order=False, paths: list = None):
|
||||
cmd = 'whatchanged --pretty={}'.format('%ct' if commit_time else '%at')
|
||||
if merge: cmd += ' -m'
|
||||
if first_parent: cmd += ' --first-parent'
|
||||
if reverse_order: cmd += ' --reverse'
|
||||
return self._run(cmd, paths)
|
||||
|
||||
def describe(self):
|
||||
return self._run("describe --tags", check=True)[0]
|
||||
return self._run('describe --tags', check=True)[0]
|
||||
|
||||
def terminate(self):
|
||||
if self._proc is None:
|
||||
@@ -458,22 +345,18 @@ class Git:
|
||||
pass
|
||||
|
||||
def _get_repo_dirs(self):
|
||||
return (
|
||||
os.path.normpath(_)
|
||||
for _ in self._run(
|
||||
"rev-parse --show-toplevel --absolute-git-dir", check=True
|
||||
)
|
||||
)
|
||||
return (os.path.normpath(_) for _ in
|
||||
self._run('rev-parse --show-toplevel --absolute-git-dir', check=True))
|
||||
|
||||
def _run(self, cmdstr: str, paths: list = None, output=True, check=False):
|
||||
cmdlist = self.gitcmd + shlex.split(cmdstr)
|
||||
if paths:
|
||||
cmdlist.append("--")
|
||||
cmdlist.append('--')
|
||||
cmdlist.extend(paths)
|
||||
popen_args = dict(universal_newlines=True, encoding="utf8")
|
||||
popen_args = dict(universal_newlines=True, encoding='utf8')
|
||||
if not self.errors:
|
||||
popen_args["stderr"] = subprocess.DEVNULL
|
||||
log.trace("Executing: %s", " ".join(cmdlist))
|
||||
popen_args['stderr'] = subprocess.DEVNULL
|
||||
log.trace("Executing: %s", ' '.join(cmdlist))
|
||||
if not output:
|
||||
return subprocess.call(cmdlist, **popen_args)
|
||||
if check:
|
||||
@@ -496,26 +379,30 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
mtime = 0
|
||||
datestr = isodate(0)
|
||||
for line in git.log(
|
||||
merge, args.first_parent, args.commit_time, args.reverse_order, filterlist
|
||||
merge,
|
||||
args.first_parent,
|
||||
args.commit_time,
|
||||
args.reverse_order,
|
||||
filterlist
|
||||
):
|
||||
stats["loglines"] += 1
|
||||
stats['loglines'] += 1
|
||||
|
||||
# Blank line between Date and list of files
|
||||
if not line:
|
||||
continue
|
||||
|
||||
# Date line
|
||||
if line[0] != ":": # Faster than `not line.startswith(':')`
|
||||
stats["commits"] += 1
|
||||
if line[0] != ':': # Faster than `not line.startswith(':')`
|
||||
stats['commits'] += 1
|
||||
mtime = int(line)
|
||||
if args.unique_times:
|
||||
mtime = get_mtime_ns(mtime, stats["commits"])
|
||||
mtime = get_mtime_ns(mtime, stats['commits'])
|
||||
if args.debug:
|
||||
datestr = isodate(mtime)
|
||||
continue
|
||||
|
||||
# File line: three tokens if it describes a renaming, otherwise two
|
||||
tokens = line.split("\t")
|
||||
tokens = line.split('\t')
|
||||
|
||||
# Possible statuses:
|
||||
# M: Modified (content changed)
|
||||
@@ -524,7 +411,7 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
# T: Type changed: to/from regular file, symlinks, submodules
|
||||
# R099: Renamed (moved), with % of unchanged content. 100 = pure rename
|
||||
# Not possible in log: C=Copied, U=Unmerged, X=Unknown, B=pairing Broken
|
||||
status = tokens[0].split(" ")[-1]
|
||||
status = tokens[0].split(' ')[-1]
|
||||
file = tokens[-1]
|
||||
|
||||
# Handles non-ASCII chars and OS path separator
|
||||
@@ -532,76 +419,56 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
|
||||
|
||||
def do_file():
|
||||
if args.skip_older_than_commit and get_mtime_path(file) <= mtime:
|
||||
stats["skip"] += 1
|
||||
stats['skip'] += 1
|
||||
return
|
||||
if args.debug:
|
||||
log.debug(
|
||||
"%d\t%d\t%d\t%s\t%s",
|
||||
stats["loglines"],
|
||||
stats["commits"],
|
||||
stats["files"],
|
||||
datestr,
|
||||
file,
|
||||
)
|
||||
log.debug("%d\t%d\t%d\t%s\t%s",
|
||||
stats['loglines'], stats['commits'], stats['files'],
|
||||
datestr, file)
|
||||
try:
|
||||
touch(os.path.join(git.workdir, file), mtime)
|
||||
stats["touches"] += 1
|
||||
stats['touches'] += 1
|
||||
except Exception as e:
|
||||
log.error("ERROR: %s: %s", e, file)
|
||||
stats["errors"] += 1
|
||||
stats['errors'] += 1
|
||||
|
||||
def do_dir():
|
||||
if args.debug:
|
||||
log.debug(
|
||||
"%d\t%d\t-\t%s\t%s",
|
||||
stats["loglines"],
|
||||
stats["commits"],
|
||||
datestr,
|
||||
"{}/".format(dirname or "."),
|
||||
)
|
||||
log.debug("%d\t%d\t-\t%s\t%s",
|
||||
stats['loglines'], stats['commits'],
|
||||
datestr, "{}/".format(dirname or '.'))
|
||||
try:
|
||||
touch(os.path.join(git.workdir, dirname), mtime)
|
||||
stats["dirtouches"] += 1
|
||||
stats['dirtouches'] += 1
|
||||
except Exception as e:
|
||||
log.error("ERROR: %s: %s", e, dirname)
|
||||
stats["direrrors"] += 1
|
||||
stats['direrrors'] += 1
|
||||
|
||||
if file in filelist:
|
||||
stats["files"] -= 1
|
||||
stats['files'] -= 1
|
||||
filelist.remove(file)
|
||||
do_file()
|
||||
|
||||
if args.dirs and status in ("A", "D"):
|
||||
if args.dirs and status in ('A', 'D'):
|
||||
dirname = os.path.dirname(file)
|
||||
if dirname in dirlist:
|
||||
dirlist.remove(dirname)
|
||||
do_dir()
|
||||
|
||||
# All files done?
|
||||
if not stats["files"]:
|
||||
if not stats['files']:
|
||||
git.terminate()
|
||||
return
|
||||
|
||||
|
||||
# Main Logic ##################################################################
|
||||
|
||||
|
||||
def main():
|
||||
start = time.time() # yes, Wall time. CPU time is not realistic for users.
|
||||
stats = {
|
||||
_: 0
|
||||
for _ in (
|
||||
"loglines",
|
||||
"commits",
|
||||
"touches",
|
||||
"skip",
|
||||
"errors",
|
||||
"dirtouches",
|
||||
"direrrors",
|
||||
)
|
||||
}
|
||||
stats = {_: 0 for _ in ('loglines', 'commits', 'touches', 'skip', 'errors',
|
||||
'dirtouches', 'direrrors')}
|
||||
|
||||
logging.basicConfig(level=args.loglevel, format="%(message)s")
|
||||
logging.basicConfig(level=args.loglevel, format='%(message)s')
|
||||
log.trace("Arguments: %s", args)
|
||||
|
||||
# First things first: Where and Who are we?
|
||||
@@ -632,16 +499,13 @@ def main():
|
||||
|
||||
# Symlink (to file, to dir or broken - git handles the same way)
|
||||
if not UPDATE_SYMLINKS and os.path.islink(fullpath):
|
||||
log.warning(
|
||||
"WARNING: Skipping symlink, no OS support for updates: %s", path
|
||||
)
|
||||
log.warning("WARNING: Skipping symlink, no OS support for updates: %s",
|
||||
path)
|
||||
continue
|
||||
|
||||
# skip files which are older than given threshold
|
||||
if (
|
||||
args.skip_older_than
|
||||
and start - get_mtime_path(fullpath) > args.skip_older_than
|
||||
):
|
||||
if (args.skip_older_than
|
||||
and start - get_mtime_path(fullpath) > args.skip_older_than):
|
||||
continue
|
||||
|
||||
# Always add files relative to worktree root
|
||||
@@ -655,17 +519,15 @@ def main():
|
||||
else:
|
||||
dirty = set(git.ls_dirty())
|
||||
if dirty:
|
||||
log.warning(
|
||||
"WARNING: Modified files in the working directory were ignored."
|
||||
"\nTo include such files, commit your changes or use --force."
|
||||
)
|
||||
log.warning("WARNING: Modified files in the working directory were ignored."
|
||||
"\nTo include such files, commit your changes or use --force.")
|
||||
filelist -= dirty
|
||||
|
||||
# Build dir list to be processed
|
||||
dirlist = set(os.path.dirname(_) for _ in filelist) if args.dirs else set()
|
||||
|
||||
stats["totalfiles"] = stats["files"] = len(filelist)
|
||||
log.info("{0:,} files to be processed in work dir".format(stats["totalfiles"]))
|
||||
stats['totalfiles'] = stats['files'] = len(filelist)
|
||||
log.info("{0:,} files to be processed in work dir".format(stats['totalfiles']))
|
||||
|
||||
if not filelist:
|
||||
# Nothing to do. Exit silently and without errors, just like git does
|
||||
@@ -682,18 +544,10 @@ def main():
|
||||
if args.missing and not args.merge:
|
||||
filterlist = list(filelist)
|
||||
missing = len(filterlist)
|
||||
log.info(
|
||||
"{0:,} files not found in log, trying merge commits".format(missing)
|
||||
)
|
||||
log.info("{0:,} files not found in log, trying merge commits".format(missing))
|
||||
for i in range(0, missing, STEPMISSING):
|
||||
parse_log(
|
||||
filelist,
|
||||
dirlist,
|
||||
stats,
|
||||
git,
|
||||
merge=True,
|
||||
filterlist=filterlist[i : i + STEPMISSING],
|
||||
)
|
||||
parse_log(filelist, dirlist, stats, git,
|
||||
merge=True, filterlist=filterlist[i:i + STEPMISSING])
|
||||
|
||||
# Still missing some?
|
||||
for file in filelist:
|
||||
@@ -702,33 +556,29 @@ def main():
|
||||
# Final statistics
|
||||
# Suggestion: use git-log --before=mtime to brag about skipped log entries
|
||||
def log_info(msg, *a, width=13):
|
||||
ifmt = "{:%d,}" % (width,) # not using 'n' for consistency with ffmt
|
||||
ffmt = "{:%d,.2f}" % (width,)
|
||||
ifmt = '{:%d,}' % (width,) # not using 'n' for consistency with ffmt
|
||||
ffmt = '{:%d,.2f}' % (width,)
|
||||
# %-formatting lacks a thousand separator, must pre-render with .format()
|
||||
log.info(msg.replace("%d", ifmt).replace("%f", ffmt).format(*a))
|
||||
log.info(msg.replace('%d', ifmt).replace('%f', ffmt).format(*a))
|
||||
|
||||
log_info(
|
||||
"Statistics:\n%f seconds\n%d log lines processed\n%d commits evaluated",
|
||||
time.time() - start,
|
||||
stats["loglines"],
|
||||
stats["commits"],
|
||||
)
|
||||
"Statistics:\n"
|
||||
"%f seconds\n"
|
||||
"%d log lines processed\n"
|
||||
"%d commits evaluated",
|
||||
time.time() - start, stats['loglines'], stats['commits'])
|
||||
|
||||
if args.dirs:
|
||||
if stats["direrrors"]:
|
||||
log_info("%d directory update errors", stats["direrrors"])
|
||||
log_info("%d directories updated", stats["dirtouches"])
|
||||
if stats['direrrors']: log_info("%d directory update errors", stats['direrrors'])
|
||||
log_info("%d directories updated", stats['dirtouches'])
|
||||
|
||||
if stats["touches"] != stats["totalfiles"]:
|
||||
log_info("%d files", stats["totalfiles"])
|
||||
if stats["skip"]:
|
||||
log_info("%d files skipped", stats["skip"])
|
||||
if stats["files"]:
|
||||
log_info("%d files missing", stats["files"])
|
||||
if stats["errors"]:
|
||||
log_info("%d file update errors", stats["errors"])
|
||||
if stats['touches'] != stats['totalfiles']:
|
||||
log_info("%d files", stats['totalfiles'])
|
||||
if stats['skip']: log_info("%d files skipped", stats['skip'])
|
||||
if stats['files']: log_info("%d files missing", stats['files'])
|
||||
if stats['errors']: log_info("%d file update errors", stats['errors'])
|
||||
|
||||
log_info("%d files updated", stats["touches"])
|
||||
log_info("%d files updated", stats['touches'])
|
||||
|
||||
if args.test:
|
||||
log.info("TEST RUN - No files modified!")
|
||||
|
||||
7
.github/workflows/.codespell-exclude
vendored
Normal file
7
.github/workflows/.codespell-exclude
vendored
Normal file
@@ -0,0 +1,7 @@
|
||||
libs/community/langchain_community/llms/yuan2.py
|
||||
"NotIn": "not in",
|
||||
- `/checkin`: Check-in
|
||||
docs/docs/integrations/providers/trulens.mdx
|
||||
self.assertIn(
|
||||
from trulens_eval import Tru
|
||||
tru = Tru()
|
||||
40
.github/workflows/_compile_integration_test.yml
vendored
40
.github/workflows/_compile_integration_test.yml
vendored
@@ -1,4 +1,4 @@
|
||||
name: '🔗 Compile Integration Tests'
|
||||
name: compile-integration-test
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -7,16 +7,9 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
python-version:
|
||||
required: true
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -24,25 +17,34 @@ jobs:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: "poetry run pytest -m compile tests/integration_tests #${{ matrix.python-version }}"
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: compile-integration
|
||||
|
||||
- name: '📦 Install Integration Dependencies'
|
||||
- name: Install integration dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test --group test_integration
|
||||
run: poetry install --with=test_integration,test
|
||||
|
||||
- name: '🔗 Check Integration Tests Compile'
|
||||
- name: Check integration tests compile
|
||||
shell: bash
|
||||
run: uv run pytest -m compile tests/integration_tests
|
||||
run: poetry run pytest -m compile tests/integration_tests
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
117
.github/workflows/_dependencies.yml
vendored
Normal file
117
.github/workflows/_dependencies.yml
vendored
Normal file
@@ -0,0 +1,117 @@
|
||||
name: dependencies
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
langchain-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: dependency checks ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: pydantic-cross-compat
|
||||
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install
|
||||
|
||||
- name: Check imports with base dependencies
|
||||
shell: bash
|
||||
run: poetry run make check_imports
|
||||
|
||||
- name: Install test dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-location }}
|
||||
env:
|
||||
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Install the opposite major version of pydantic
|
||||
# If normal tests use pydantic v1, here we'll use v2, and vice versa.
|
||||
shell: bash
|
||||
# airbyte currently doesn't support pydantic v2
|
||||
if: ${{ !startsWith(inputs.working-directory, 'libs/partners/airbyte') }}
|
||||
run: |
|
||||
# Determine the major part of pydantic version
|
||||
REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
if [[ "$REGULAR_VERSION" == "1" ]]; then
|
||||
PYDANTIC_DEP=">=2.1,<3"
|
||||
TEST_WITH_VERSION="2"
|
||||
elif [[ "$REGULAR_VERSION" == "2" ]]; then
|
||||
PYDANTIC_DEP="<2"
|
||||
TEST_WITH_VERSION="1"
|
||||
else
|
||||
echo "Unexpected pydantic major version '$REGULAR_VERSION', cannot determine which version to use for cross-compatibility test."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Install via `pip` instead of `poetry add` to avoid changing lockfile,
|
||||
# which would prevent caching from working: the cache would get saved
|
||||
# to a different key than where it gets loaded from.
|
||||
poetry run pip install "pydantic${PYDANTIC_DEP}"
|
||||
|
||||
# Ensure that the correct pydantic is installed now.
|
||||
echo "Checking pydantic version... Expecting ${TEST_WITH_VERSION}"
|
||||
|
||||
# Determine the major part of pydantic version
|
||||
CURRENT_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
|
||||
|
||||
# Check that the major part of pydantic version is as expected, if not
|
||||
# raise an error
|
||||
if [[ "$CURRENT_VERSION" != "$TEST_WITH_VERSION" ]]; then
|
||||
echo "Error: expected pydantic version ${CURRENT_VERSION} to have been installed, but found: ${TEST_WITH_VERSION}"
|
||||
exit 1
|
||||
fi
|
||||
echo "Found pydantic version ${CURRENT_VERSION}, as expected"
|
||||
- name: Run pydantic compatibility tests
|
||||
# airbyte currently doesn't support pydantic v2
|
||||
if: ${{ !startsWith(inputs.working-directory, 'libs/partners/airbyte') }}
|
||||
shell: bash
|
||||
run: make test
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
65
.github/workflows/_integration_test.yml
vendored
65
.github/workflows/_integration_test.yml
vendored
@@ -1,5 +1,4 @@
|
||||
name: '🚀 Integration Tests'
|
||||
run-name: 'Test ${{ inputs.working-directory }} on Python ${{ inputs.python-version }}'
|
||||
name: Integration tests
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
@@ -7,54 +6,55 @@ on:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
python-version:
|
||||
required: true
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
default: "3.11"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
environment: Scheduled testing
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.11"
|
||||
name: Python ${{ matrix.python-version }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
|
||||
- name: '📦 Install Integration Dependencies'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test --group test_integration
|
||||
run: poetry install --with test,test_integration
|
||||
|
||||
- name: '🚀 Run Integration Tests'
|
||||
- name: Install deps outside pyproject
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/community/') }}
|
||||
shell: bash
|
||||
run: poetry run pip install "boto3<2" "google-cloud-aiplatform<2"
|
||||
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
shell: bash
|
||||
env:
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
|
||||
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
@@ -62,22 +62,23 @@ jobs:
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }}
|
||||
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
ES_URL: ${{ secrets.ES_URL }}
|
||||
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
|
||||
ES_API_KEY: ${{ secrets.ES_API_KEY }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
|
||||
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
|
||||
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
|
||||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
|
||||
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
|
||||
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
|
||||
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
|
||||
run: |
|
||||
make integration_tests
|
||||
|
||||
|
||||
99
.github/workflows/_lint.yml
vendored
99
.github/workflows/_lint.yml
vendored
@@ -1,6 +1,4 @@
|
||||
name: '🧹 Code Linting'
|
||||
# Runs code quality checks using ruff, mypy, and other linting tools
|
||||
# Checks both package code and test code for consistency
|
||||
name: lint
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -9,38 +7,58 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
python-version:
|
||||
required: true
|
||||
langchain-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.7.1"
|
||||
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
|
||||
|
||||
# This env var allows us to get inline annotations when ruff has complaints.
|
||||
RUFF_OUTPUT_FORMAT: github
|
||||
|
||||
UV_FROZEN: "true"
|
||||
|
||||
jobs:
|
||||
# Linting job - runs quality checks on package and test code
|
||||
build:
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
name: "make lint #${{ matrix.python-version }}"
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
strategy:
|
||||
matrix:
|
||||
# Only lint on the min and max supported Python versions.
|
||||
# It's extremely unlikely that there's a lint issue on any version in between
|
||||
# that doesn't show up on the min or max versions.
|
||||
#
|
||||
# GitHub rate-limits how many jobs can be running at any one time.
|
||||
# Starting new jobs is also relatively slow,
|
||||
# so linting on fewer versions makes CI faster.
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.11"
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: lint-with-extras
|
||||
|
||||
- name: '📦 Install Lint & Typing Dependencies'
|
||||
- name: Check Poetry File
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry check
|
||||
|
||||
- name: Check lock file
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
poetry lock --check
|
||||
|
||||
- name: Install dependencies
|
||||
# Also installs dev/lint/test/typing dependencies, to ensure we have
|
||||
# type hints for as many of our libraries as possible.
|
||||
# This helps catch errors that require dependencies to be spotted, for example:
|
||||
@@ -51,14 +69,32 @@ jobs:
|
||||
# It doesn't matter how you change it, any change will cause a cache-bust.
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv sync --group lint --group typing
|
||||
poetry install --with lint,typing
|
||||
|
||||
- name: '🔍 Analyze Package Code with Linters'
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-location }}
|
||||
env:
|
||||
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Get .mypy_cache to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache
|
||||
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
make lint_package
|
||||
|
||||
- name: '📦 Install Unit Test Dependencies'
|
||||
- name: Install unit test dependencies
|
||||
# Also installs dev/lint/test/typing dependencies, to ensure we have
|
||||
# type hints for as many of our libraries as possible.
|
||||
# This helps catch errors that require dependencies to be spotted, for example:
|
||||
@@ -70,14 +106,23 @@ jobs:
|
||||
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv sync --inexact --group test
|
||||
- name: '📦 Install Unit + Integration Test Dependencies'
|
||||
poetry install --with test
|
||||
- name: Install unit+integration test dependencies
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv sync --inexact --group test --group test_integration
|
||||
poetry install --with test,test_integration
|
||||
|
||||
- name: '🔍 Analyze Test Code with Linters'
|
||||
- name: Get .mypy_cache_test to speed up mypy
|
||||
uses: actions/cache@v4
|
||||
env:
|
||||
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
|
||||
with:
|
||||
path: |
|
||||
${{ env.WORKDIR }}/.mypy_cache_test
|
||||
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
|
||||
|
||||
- name: Analysing the code with our lint
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
make lint_tests
|
||||
|
||||
260
.github/workflows/_release.yml
vendored
260
.github/workflows/_release.yml
vendored
@@ -1,5 +1,5 @@
|
||||
name: '🚀 Package Release'
|
||||
run-name: 'Release ${{ inputs.working-directory }} ${{ inputs.release-version }}'
|
||||
name: release
|
||||
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
@@ -12,27 +12,18 @@ on:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
default: 'libs/langchain'
|
||||
release-version:
|
||||
required: true
|
||||
type: string
|
||||
default: '0.1.0'
|
||||
description: "New version of package being released"
|
||||
dangerous-nonmaster-release:
|
||||
required: false
|
||||
type: boolean
|
||||
default: false
|
||||
description: "Release from a non-master branch (danger!) - Only use for hotfixes"
|
||||
description: "Release from a non-master branch (danger!)"
|
||||
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# Build the distribution package and extract version info
|
||||
# Runs in isolated environment with minimal permissions for security
|
||||
build:
|
||||
if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release
|
||||
environment: Scheduled testing
|
||||
@@ -45,10 +36,13 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
# We want to keep this build stage *separate* from the release stage,
|
||||
# so that there's no sharing of permissions between them.
|
||||
@@ -62,7 +56,7 @@ jobs:
|
||||
# > from the publish job.
|
||||
# https://github.com/pypa/gh-action-pypi-publish#non-goals
|
||||
- name: Build project for distribution
|
||||
run: uv build
|
||||
run: poetry build
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Upload build
|
||||
@@ -71,20 +65,13 @@ jobs:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Check version
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
shell: python
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
import os
|
||||
import tomllib
|
||||
with open("pyproject.toml", "rb") as f:
|
||||
data = tomllib.load(f)
|
||||
pkg_name = data["project"]["name"]
|
||||
version = data["project"]["version"]
|
||||
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
|
||||
f.write(f"pkg-name={pkg_name}\n")
|
||||
f.write(f"version={version}\n")
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
release-notes:
|
||||
needs:
|
||||
- build
|
||||
@@ -98,9 +85,9 @@ jobs:
|
||||
path: langchain
|
||||
sparse-checkout: | # this only grabs files for relevant dir
|
||||
${{ inputs.working-directory }}
|
||||
ref: ${{ github.ref }} # this scopes to just ref'd branch
|
||||
ref: master # this scopes to just master branch
|
||||
fetch-depth: 0 # this fetches entire commit history
|
||||
- name: Check tags
|
||||
- name: Check Tags
|
||||
id: check-tags
|
||||
shell: bash
|
||||
working-directory: langchain/${{ inputs.working-directory }}
|
||||
@@ -108,47 +95,9 @@ jobs:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
# Handle regular versions and pre-release versions differently
|
||||
if [[ "$VERSION" == *"-"* ]]; then
|
||||
# This is a pre-release version (contains a hyphen)
|
||||
# Extract the base version without the pre-release suffix
|
||||
BASE_VERSION=${VERSION%%-*}
|
||||
# Look for the latest release of the same base version
|
||||
REGEX="^$PKG_NAME==$BASE_VERSION\$"
|
||||
PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1)
|
||||
|
||||
# If no exact base version match, look for the latest release of any kind
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1)
|
||||
fi
|
||||
else
|
||||
# Regular version handling
|
||||
PREV_TAG="$PKG_NAME==${VERSION%.*}.$(( ${VERSION##*.} - 1 ))"; [[ "${VERSION##*.}" -eq 0 ]] && PREV_TAG=""
|
||||
|
||||
# backup case if releasing e.g. 0.3.0, looks up last release
|
||||
# note if last release (chronologically) was e.g. 0.1.47 it will get
|
||||
# that instead of the last 0.2 release
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
echo $REGEX
|
||||
PREV_TAG=$(git tag --sort=-creatordate | (grep -P $REGEX || true) | head -1)
|
||||
fi
|
||||
fi
|
||||
|
||||
# if PREV_TAG is empty, let it be empty
|
||||
if [ -z "$PREV_TAG" ]; then
|
||||
echo "No previous tag found - first release"
|
||||
else
|
||||
# confirm prev-tag actually exists in git repo with git tag
|
||||
GIT_TAG_RESULT=$(git tag -l "$PREV_TAG")
|
||||
if [ -z "$GIT_TAG_RESULT" ]; then
|
||||
echo "Previous tag $PREV_TAG not found in git repo"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
|
||||
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
|
||||
echo $REGEX
|
||||
PREV_TAG=$(git tag --sort=-creatordate | grep -P $REGEX || true | head -1)
|
||||
TAG="${PKG_NAME}==${VERSION}"
|
||||
if [ "$TAG" == "$PREV_TAG" ]; then
|
||||
echo "No new version to release"
|
||||
@@ -173,6 +122,7 @@ jobs:
|
||||
fi
|
||||
{
|
||||
echo 'release-body<<EOF'
|
||||
echo "# Release $TAG"
|
||||
echo $PREAMBLE
|
||||
echo
|
||||
git log --format="%s" "$PREV_TAG"..HEAD -- $WORKING_DIR
|
||||
@@ -185,7 +135,6 @@ jobs:
|
||||
- release-notes
|
||||
uses:
|
||||
./.github/workflows/_test_release.yml
|
||||
permissions: write-all
|
||||
with:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
dangerous-nonmaster-release: ${{ inputs.dangerous-nonmaster-release }}
|
||||
@@ -197,7 +146,6 @@ jobs:
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
@@ -214,18 +162,14 @@ jobs:
|
||||
# - The package is published, and it breaks on the missing dependency when
|
||||
# used in the real world.
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
id: setup-python
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Import dist package
|
||||
- name: Import published package
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
env:
|
||||
@@ -241,21 +185,27 @@ jobs:
|
||||
# - attempt install again after 5 seconds if it fails because there is
|
||||
# sometimes a delay in availability on test pypi
|
||||
run: |
|
||||
uv venv
|
||||
VIRTUAL_ENV=.venv uv pip install dist/*.whl
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" || \
|
||||
( \
|
||||
sleep 5 && \
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION" \
|
||||
)
|
||||
|
||||
# Replace all dashes in the package name with underscores,
|
||||
# since that's how Python imports packages with dashes in the name.
|
||||
# also remove _official suffix
|
||||
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g | sed s/_official//g)"
|
||||
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
|
||||
|
||||
uv run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
poetry run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
|
||||
|
||||
- name: Import test dependencies
|
||||
run: uv sync --group test
|
||||
run: poetry install --with test,test_integration
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
# Overwrite the local version of the package with the built version
|
||||
# Overwrite the local version of the package with the test PyPI version.
|
||||
- name: Import published package (again)
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
shell: bash
|
||||
@@ -263,24 +213,20 @@ jobs:
|
||||
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
|
||||
VERSION: ${{ needs.build.outputs.version }}
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install dist/*.whl
|
||||
poetry run pip install \
|
||||
--extra-index-url https://test.pypi.org/simple/ \
|
||||
"$PKG_NAME==$VERSION"
|
||||
|
||||
- name: Run unit tests
|
||||
run: make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Check for prerelease versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
uv run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml
|
||||
|
||||
- name: Get minimum versions
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
id: min-version
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install packaging requests
|
||||
python_version="$(uv run python --version | awk '{print $2}')"
|
||||
min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml release $python_version)"
|
||||
poetry run pip install packaging
|
||||
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml)"
|
||||
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
|
||||
echo "min-versions=$min_versions"
|
||||
|
||||
@@ -289,13 +235,15 @@ jobs:
|
||||
env:
|
||||
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install --force-reinstall $MIN_VERSIONS --editable .
|
||||
poetry run pip install --force-reinstall $MIN_VERSIONS --editable .
|
||||
make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: Import integration test dependencies
|
||||
run: uv sync --group test --group test_integration
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: Run integration tests
|
||||
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
|
||||
@@ -310,122 +258,38 @@ jobs:
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }}
|
||||
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
|
||||
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
|
||||
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
|
||||
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
|
||||
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
|
||||
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
|
||||
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
|
||||
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
|
||||
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
|
||||
ES_URL: ${{ secrets.ES_URL }}
|
||||
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
|
||||
ES_API_KEY: ${{ secrets.ES_API_KEY }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
|
||||
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
|
||||
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
|
||||
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
|
||||
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
|
||||
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
|
||||
run: make integration_tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
# Test select published packages against new core
|
||||
test-prior-published-packages-against-new-core:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
partner: [openai, anthropic]
|
||||
fail-fast: false # Continue testing other partners if one fails
|
||||
env:
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
|
||||
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
# We implement this conditional as Github Actions does not have good support
|
||||
# for conditionally needing steps. https://github.com/actions/runner/issues/491
|
||||
- name: Check if libs/core
|
||||
run: |
|
||||
if [ "${{ startsWith(inputs.working-directory, 'libs/core') }}" != "true" ]; then
|
||||
echo "Not in libs/core. Exiting successfully."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
- name: Set up Python + uv
|
||||
if: startsWith(inputs.working-directory, 'libs/core')
|
||||
uses: "./.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
if: startsWith(inputs.working-directory, 'libs/core')
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: Test against ${{ matrix.partner }}
|
||||
if: startsWith(inputs.working-directory, 'libs/core')
|
||||
run: |
|
||||
# Identify latest tag, excluding pre-releases
|
||||
LATEST_PACKAGE_TAG="$(
|
||||
git ls-remote --tags origin "langchain-${{ matrix.partner }}*" \
|
||||
| awk '{print $2}' \
|
||||
| sed 's|refs/tags/||' \
|
||||
| grep -Ev '==[^=]*(\.?dev[0-9]*|\.?rc[0-9]*)$' \
|
||||
| sort -Vr \
|
||||
| head -n 1
|
||||
)"
|
||||
echo "Latest package tag: $LATEST_PACKAGE_TAG"
|
||||
|
||||
# Shallow-fetch just that single tag
|
||||
git fetch --depth=1 origin tag "$LATEST_PACKAGE_TAG"
|
||||
|
||||
# Checkout the latest package files
|
||||
rm -rf $GITHUB_WORKSPACE/libs/partners/${{ matrix.partner }}/*
|
||||
rm -rf $GITHUB_WORKSPACE/libs/standard-tests/*
|
||||
cd $GITHUB_WORKSPACE/libs/
|
||||
git checkout "$LATEST_PACKAGE_TAG" -- standard-tests/
|
||||
git checkout "$LATEST_PACKAGE_TAG" -- partners/${{ matrix.partner }}/
|
||||
cd partners/${{ matrix.partner }}
|
||||
|
||||
# Print as a sanity check
|
||||
echo "Version number from pyproject.toml: "
|
||||
cat pyproject.toml | grep "version = "
|
||||
|
||||
# Run tests
|
||||
uv sync --group test --group test_integration
|
||||
uv pip install ../../core/dist/*.whl
|
||||
make integration_tests
|
||||
|
||||
publish:
|
||||
needs:
|
||||
- build
|
||||
- release-notes
|
||||
- test-pypi-publish
|
||||
- pre-release-checks
|
||||
- test-prior-published-packages-against-new-core
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
# This permission is used for trusted publishing:
|
||||
@@ -442,10 +306,13 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
@@ -458,8 +325,6 @@ jobs:
|
||||
packages-dir: ${{ inputs.working-directory }}/dist/
|
||||
verbose: true
|
||||
print-hash: true
|
||||
# Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0
|
||||
attestations: false
|
||||
|
||||
mark-release:
|
||||
needs:
|
||||
@@ -481,16 +346,19 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Python + uv
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
- uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
|
||||
- name: Create Tag
|
||||
uses: ncipollo/release-action@v1
|
||||
with:
|
||||
|
||||
62
.github/workflows/_release_docker.yml
vendored
Normal file
62
.github/workflows/_release_docker.yml
vendored
Normal file
@@ -0,0 +1,62 @@
|
||||
name: release_docker
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
dockerfile:
|
||||
required: true
|
||||
type: string
|
||||
description: "Path to the Dockerfile to build"
|
||||
image:
|
||||
required: true
|
||||
type: string
|
||||
description: "Name of the image to build"
|
||||
|
||||
env:
|
||||
TEST_TAG: ${{ inputs.image }}:test
|
||||
LATEST_TAG: ${{ inputs.image }}:latest
|
||||
|
||||
jobs:
|
||||
docker:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
- name: Get git tag
|
||||
uses: actions-ecosystem/action-get-latest-tag@v1
|
||||
id: get-latest-tag
|
||||
- name: Set docker tag
|
||||
env:
|
||||
VERSION: ${{ steps.get-latest-tag.outputs.tag }}
|
||||
run: |
|
||||
echo "VERSION_TAG=${{ inputs.image }}:${VERSION#v}" >> $GITHUB_ENV
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@v3
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
- name: Login to Docker Hub
|
||||
uses: docker/login-action@v3
|
||||
with:
|
||||
username: ${{ secrets.DOCKERHUB_USERNAME }}
|
||||
password: ${{ secrets.DOCKERHUB_TOKEN }}
|
||||
- name: Build for Test
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
file: ${{ inputs.dockerfile }}
|
||||
load: true
|
||||
tags: ${{ env.TEST_TAG }}
|
||||
- name: Test
|
||||
run: |
|
||||
docker run --rm ${{ env.TEST_TAG }} python -c "import langchain"
|
||||
- name: Build and Push to Docker Hub
|
||||
uses: docker/build-push-action@v5
|
||||
with:
|
||||
context: .
|
||||
file: ${{ inputs.dockerfile }}
|
||||
# We can only build for the intersection of platforms supported by
|
||||
# QEMU and base python image, for now build only for
|
||||
# linux/amd64 and linux/arm64
|
||||
platforms: linux/amd64,linux/arm64
|
||||
tags: ${{ env.LATEST_TAG }},${{ env.VERSION_TAG }}
|
||||
push: true
|
||||
80
.github/workflows/_test.yml
vendored
80
.github/workflows/_test.yml
vendored
@@ -1,6 +1,4 @@
|
||||
name: '🧪 Unit Testing'
|
||||
# Runs unit tests with both current and minimum supported dependency versions
|
||||
# to ensure compatibility across the supported range
|
||||
name: test
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -9,66 +7,57 @@ on:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
python-version:
|
||||
required: true
|
||||
langchain-location:
|
||||
required: false
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
description: "Relative path to the langchain library folder"
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# Main test job - runs unit tests with current deps, then retests with minimum versions
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: 'Python ${{ inputs.python-version }}'
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
name: "make test #${{ matrix.python-version }}"
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
id: setup-python
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
- name: '📦 Install Test Dependencies'
|
||||
shell: bash
|
||||
run: uv sync --group test --dev
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: core
|
||||
|
||||
- name: '🧪 Run Core Unit Tests'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: poetry install --with test
|
||||
|
||||
- name: Install langchain editable
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
if: ${{ inputs.langchain-location }}
|
||||
env:
|
||||
LANGCHAIN_LOCATION: ${{ inputs.langchain-location }}
|
||||
run: |
|
||||
poetry run pip install -e "$LANGCHAIN_LOCATION"
|
||||
|
||||
- name: Run core tests
|
||||
shell: bash
|
||||
run: |
|
||||
make test
|
||||
|
||||
- name: '🔍 Calculate Minimum Dependency Versions'
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
id: min-version
|
||||
shell: bash
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install packaging tomli requests
|
||||
python_version="$(uv run python --version | awk '{print $2}')"
|
||||
min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml pull_request $python_version)"
|
||||
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
|
||||
echo "min-versions=$min_versions"
|
||||
|
||||
- name: '🧪 Run Tests with Minimum Dependencies'
|
||||
if: ${{ steps.min-version.outputs.min-versions != '' }}
|
||||
env:
|
||||
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install $MIN_VERSIONS
|
||||
make tests
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
@@ -79,4 +68,3 @@ jobs:
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
|
||||
|
||||
44
.github/workflows/_test_doc_imports.yml
vendored
44
.github/workflows/_test_doc_imports.yml
vendored
@@ -1,47 +1,43 @@
|
||||
name: '📑 Documentation Import Testing'
|
||||
name: test_doc_imports
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
python-version:
|
||||
required: true
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: '🔍 Check Doc Imports (Python ${{ inputs.python-version }})'
|
||||
strategy:
|
||||
matrix:
|
||||
python-version:
|
||||
- "3.11"
|
||||
name: "check doc imports #${{ matrix.python-version }}"
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
cache-key: core
|
||||
|
||||
- name: '📦 Install Test Dependencies'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: uv sync --group test
|
||||
run: poetry install --with test
|
||||
|
||||
- name: '📦 Install LangChain in Editable Mode'
|
||||
- name: Install langchain editable
|
||||
run: |
|
||||
VIRTUAL_ENV=.venv uv pip install langchain-experimental langchain-community -e libs/core libs/langchain
|
||||
poetry run pip install -e libs/core libs/langchain libs/community libs/experimental
|
||||
|
||||
- name: '🔍 Validate Documentation Import Statements'
|
||||
- name: Check doc imports
|
||||
shell: bash
|
||||
run: |
|
||||
uv run python docs/scripts/check_imports.py
|
||||
poetry run python docs/scripts/check_imports.py
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the test did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
67
.github/workflows/_test_pydantic.yml
vendored
67
.github/workflows/_test_pydantic.yml
vendored
@@ -1,67 +0,0 @@
|
||||
name: '🐍 Pydantic Version Testing'
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
inputs:
|
||||
working-directory:
|
||||
required: true
|
||||
type: string
|
||||
description: "From which folder this pipeline executes"
|
||||
python-version:
|
||||
required: false
|
||||
type: string
|
||||
description: "Python version to use"
|
||||
default: "3.11"
|
||||
pydantic-version:
|
||||
required: true
|
||||
type: string
|
||||
description: "Pydantic version to test."
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
name: 'Pydantic ~=${{ inputs.pydantic-version }}'
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ inputs.python-version }}
|
||||
|
||||
- name: '📦 Install Test Dependencies'
|
||||
shell: bash
|
||||
run: uv sync --group test
|
||||
|
||||
- name: '🔄 Install Specific Pydantic Version'
|
||||
shell: bash
|
||||
run: VIRTUAL_ENV=.venv uv pip install pydantic~=${{ inputs.pydantic-version }}
|
||||
|
||||
- name: '🧪 Run Core Tests'
|
||||
shell: bash
|
||||
run: |
|
||||
make test
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
STATUS="$(git status)"
|
||||
echo "$STATUS"
|
||||
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
36
.github/workflows/_test_release.yml
vendored
36
.github/workflows/_test_release.yml
vendored
@@ -1,4 +1,4 @@
|
||||
name: '🧪 Test Release Package'
|
||||
name: test-release
|
||||
|
||||
on:
|
||||
workflow_call:
|
||||
@@ -14,8 +14,8 @@ on:
|
||||
description: "Release from a non-master branch (danger!)"
|
||||
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
UV_FROZEN: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
PYTHON_VERSION: "3.10"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
@@ -29,10 +29,13 @@ jobs:
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
cache-key: release
|
||||
|
||||
# We want to keep this build stage *separate* from the release stage,
|
||||
# so that there's no sharing of permissions between them.
|
||||
@@ -45,30 +48,23 @@ jobs:
|
||||
# > It is strongly advised to separate jobs for building [...]
|
||||
# > from the publish job.
|
||||
# https://github.com/pypa/gh-action-pypi-publish#non-goals
|
||||
- name: '📦 Build Project for Distribution'
|
||||
run: uv build
|
||||
- name: Build project for distribution
|
||||
run: poetry build
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
|
||||
- name: '⬆️ Upload Build Artifacts'
|
||||
- name: Upload build
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: test-dist
|
||||
path: ${{ inputs.working-directory }}/dist/
|
||||
|
||||
- name: '🔍 Extract Version Information'
|
||||
- name: Check Version
|
||||
id: check-version
|
||||
shell: python
|
||||
shell: bash
|
||||
working-directory: ${{ inputs.working-directory }}
|
||||
run: |
|
||||
import os
|
||||
import tomllib
|
||||
with open("pyproject.toml", "rb") as f:
|
||||
data = tomllib.load(f)
|
||||
pkg_name = data["project"]["name"]
|
||||
version = data["project"]["version"]
|
||||
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
|
||||
f.write(f"pkg-name={pkg_name}\n")
|
||||
f.write(f"version={version}\n")
|
||||
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
|
||||
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
|
||||
|
||||
publish:
|
||||
needs:
|
||||
@@ -102,5 +98,3 @@ jobs:
|
||||
# This is *only for CI use* and is *extremely dangerous* otherwise!
|
||||
# https://github.com/pypa/gh-action-pypi-publish#tolerating-release-package-file-duplicates
|
||||
skip-existing: true
|
||||
# Temp workaround since attestations are on by default as of gh-action-pypi-publish v1.11.0
|
||||
attestations: false
|
||||
|
||||
120
.github/workflows/api_doc_build.yml
vendored
120
.github/workflows/api_doc_build.yml
vendored
@@ -1,120 +0,0 @@
|
||||
name: '📚 API Docs'
|
||||
run-name: 'Build & Deploy API Reference'
|
||||
# Runs daily or can be triggered manually for immediate updates
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
schedule:
|
||||
- cron: '0 13 * * *' # Daily at 1PM UTC
|
||||
env:
|
||||
PYTHON_VERSION: "3.11"
|
||||
|
||||
jobs:
|
||||
# Only runs on main repository to prevent unnecessary builds on forks
|
||||
build:
|
||||
if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule'
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: read
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
path: langchain
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-api-docs-html
|
||||
path: langchain-api-docs-html
|
||||
token: ${{ secrets.TOKEN_GITHUB_API_DOCS_HTML }}
|
||||
|
||||
- name: '📋 Extract Repository List with yq'
|
||||
id: get-unsorted-repos
|
||||
uses: mikefarah/yq@master
|
||||
with:
|
||||
cmd: |
|
||||
yq '
|
||||
.packages[]
|
||||
| select(
|
||||
(
|
||||
(.repo | test("^langchain-ai/"))
|
||||
and
|
||||
(.repo != "langchain-ai/langchain")
|
||||
)
|
||||
or
|
||||
(.include_in_api_ref // false)
|
||||
)
|
||||
| .repo
|
||||
' langchain/libs/packages.yml
|
||||
|
||||
- name: '📋 Parse YAML & Checkout Repositories'
|
||||
env:
|
||||
REPOS_UNSORTED: ${{ steps.get-unsorted-repos.outputs.result }}
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
run: |
|
||||
# Get unique repositories
|
||||
REPOS=$(echo "$REPOS_UNSORTED" | sort -u)
|
||||
# Checkout each unique repository
|
||||
for repo in $REPOS; do
|
||||
# Validate repository format (allow any org with proper format)
|
||||
if [[ ! "$repo" =~ ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$ ]]; then
|
||||
echo "Error: Invalid repository format: $repo"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
REPO_NAME=$(echo $repo | cut -d'/' -f2)
|
||||
|
||||
# Additional validation for repo name
|
||||
if [[ ! "$REPO_NAME" =~ ^[a-zA-Z0-9_.-]+$ ]]; then
|
||||
echo "Error: Invalid repository name: $REPO_NAME"
|
||||
exit 1
|
||||
fi
|
||||
echo "Checking out $repo to $REPO_NAME"
|
||||
git clone --depth 1 https://github.com/$repo.git $REPO_NAME
|
||||
done
|
||||
|
||||
- name: '🐍 Setup Python ${{ env.PYTHON_VERSION }}'
|
||||
uses: actions/setup-python@v5
|
||||
id: setup-python
|
||||
with:
|
||||
python-version: ${{ env.PYTHON_VERSION }}
|
||||
|
||||
- name: '📦 Install Initial Python Dependencies'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
python -m pip install -U uv
|
||||
python -m uv pip install --upgrade --no-cache-dir pip setuptools pyyaml
|
||||
|
||||
- name: '📦 Organize Library Directories'
|
||||
run: python langchain/.github/scripts/prep_api_docs_build.py
|
||||
|
||||
- name: '🧹 Remove Old HTML Files'
|
||||
run:
|
||||
rm -rf langchain-api-docs-html/api_reference_build/html
|
||||
|
||||
- name: '📦 Install Documentation Dependencies'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt
|
||||
python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests
|
||||
python -m uv pip install -r docs/api_reference/requirements.txt
|
||||
|
||||
- name: '🔧 Configure Git Settings'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
git config --local user.email "actions@github.com"
|
||||
git config --local user.name "Github Actions"
|
||||
|
||||
- name: '📚 Build API Documentation'
|
||||
working-directory: langchain
|
||||
run: |
|
||||
python docs/api_reference/create_api_rst.py
|
||||
python -m sphinx -T -E -b html -d ../langchain-api-docs-html/_build/doctrees -c docs/api_reference docs/api_reference ../langchain-api-docs-html/api_reference_build/html -j auto
|
||||
python docs/api_reference/scripts/custom_formatter.py ../langchain-api-docs-html/api_reference_build/html
|
||||
# Default index page is blank so we copy in the actual home page.
|
||||
cp ../langchain-api-docs-html/api_reference_build/html/{reference,index}.html
|
||||
rm -rf ../langchain-api-docs-html/_build/
|
||||
|
||||
# https://github.com/marketplace/actions/add-commit
|
||||
- uses: EndBug/add-and-commit@v9
|
||||
with:
|
||||
cwd: langchain-api-docs-html
|
||||
message: 'Update API docs build'
|
||||
14
.github/workflows/check-broken-links.yml
vendored
14
.github/workflows/check-broken-links.yml
vendored
@@ -1,28 +1,24 @@
|
||||
name: '🔗 Check Broken Links'
|
||||
name: Check Broken Links
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
check-links:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- name: '🟢 Setup Node.js 18.x'
|
||||
uses: actions/setup-node@v4
|
||||
- name: Use Node.js 18.x
|
||||
uses: actions/setup-node@v3
|
||||
with:
|
||||
node-version: 18.x
|
||||
cache: "yarn"
|
||||
cache-dependency-path: ./docs/yarn.lock
|
||||
- name: '📦 Install Node Dependencies'
|
||||
- name: Install dependencies
|
||||
run: yarn install --immutable --mode=skip-build
|
||||
working-directory: ./docs
|
||||
- name: '🔍 Scan Documentation for Broken Links'
|
||||
- name: Check broken links
|
||||
run: yarn check-broken-links
|
||||
working-directory: ./docs
|
||||
|
||||
34
.github/workflows/check_core_versions.yml
vendored
34
.github/workflows/check_core_versions.yml
vendored
@@ -1,34 +0,0 @@
|
||||
name: '🔍 Check `core` Version Equality'
|
||||
# Ensures version numbers in pyproject.toml and version.py stay in sync
|
||||
# Prevents releases with mismatched version numbers
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths:
|
||||
- 'libs/core/pyproject.toml'
|
||||
- 'libs/core/langchain_core/version.py'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
check_version_equality:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '✅ Verify pyproject.toml & version.py Match'
|
||||
run: |
|
||||
PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' libs/core/pyproject.toml)
|
||||
VERSION_PY_VERSION=$(grep -Po '(?<=^VERSION = ")[^"]*' libs/core/langchain_core/version.py)
|
||||
|
||||
# Compare the two versions
|
||||
if [ "$PYPROJECT_VERSION" != "$VERSION_PY_VERSION" ]; then
|
||||
echo "langchain-core versions in pyproject.toml and version.py do not match!"
|
||||
echo "pyproject.toml version: $PYPROJECT_VERSION"
|
||||
echo "version.py version: $VERSION_PY_VERSION"
|
||||
exit 1
|
||||
else
|
||||
echo "Versions match: $PYPROJECT_VERSION"
|
||||
fi
|
||||
158
.github/workflows/check_diffs.yml
vendored
158
.github/workflows/check_diffs.yml
vendored
@@ -1,12 +1,11 @@
|
||||
name: '🔧 CI'
|
||||
---
|
||||
name: CI
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
pull_request:
|
||||
merge_group:
|
||||
|
||||
# Optimizes CI performance by canceling redundant workflow runs
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
@@ -17,144 +16,121 @@ concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
UV_NO_SYNC: "true"
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# This job analyzes which files changed and creates a dynamic test matrix
|
||||
# to only run tests/lints for the affected packages, improving CI efficiency
|
||||
build:
|
||||
name: 'Detect Changes & Set Matrix'
|
||||
runs-on: ubuntu-latest
|
||||
if: ${{ !contains(github.event.pull_request.labels.*.name, 'ci-ignore') }}
|
||||
steps:
|
||||
- name: '📋 Checkout Code'
|
||||
uses: actions/checkout@v4
|
||||
- name: '🐍 Setup Python 3.11'
|
||||
uses: actions/setup-python@v5
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.11'
|
||||
- name: '📂 Get Changed Files'
|
||||
id: files
|
||||
uses: Ana06/get-changed-files@v2.3.0
|
||||
- name: '🔍 Analyze Changed Files & Generate Build Matrix'
|
||||
id: set-matrix
|
||||
python-version: '3.10'
|
||||
- id: files
|
||||
uses: Ana06/get-changed-files@v2.2.0
|
||||
- id: set-matrix
|
||||
run: |
|
||||
python -m pip install packaging requests
|
||||
python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
|
||||
outputs:
|
||||
lint: ${{ steps.set-matrix.outputs.lint }}
|
||||
test: ${{ steps.set-matrix.outputs.test }}
|
||||
extended-tests: ${{ steps.set-matrix.outputs.extended-tests }}
|
||||
compile-integration-tests: ${{ steps.set-matrix.outputs.compile-integration-tests }}
|
||||
dependencies: ${{ steps.set-matrix.outputs.dependencies }}
|
||||
test-doc-imports: ${{ steps.set-matrix.outputs.test-doc-imports }}
|
||||
test-pydantic: ${{ steps.set-matrix.outputs.test-pydantic }}
|
||||
# Run linting only on packages that have changed files
|
||||
dirs-to-lint: ${{ steps.set-matrix.outputs.dirs-to-lint }}
|
||||
dirs-to-test: ${{ steps.set-matrix.outputs.dirs-to-test }}
|
||||
dirs-to-extended-test: ${{ steps.set-matrix.outputs.dirs-to-extended-test }}
|
||||
docs-edited: ${{ steps.set-matrix.outputs.docs-edited }}
|
||||
lint:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.lint != '[]' }}
|
||||
if: ${{ needs.build.outputs.dirs-to-lint != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.lint) }}
|
||||
fail-fast: false
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-lint) }}
|
||||
uses: ./.github/workflows/_lint.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
# Run unit tests only on packages that have changed files
|
||||
test:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.test != '[]' }}
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.test) }}
|
||||
fail-fast: false
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
|
||||
uses: ./.github/workflows/_test.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
secrets: inherit
|
||||
|
||||
# Test compatibility with different Pydantic versions for affected packages
|
||||
test-pydantic:
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.test-pydantic != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.test-pydantic) }}
|
||||
fail-fast: false
|
||||
uses: ./.github/workflows/_test_pydantic.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
pydantic-version: ${{ matrix.job-configs.pydantic-version }}
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
test-doc-imports:
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.test-doc-imports != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.test-doc-imports) }}
|
||||
fail-fast: false
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' || needs.build.outputs.docs-edited }}
|
||||
uses: ./.github/workflows/_test_doc_imports.yml
|
||||
with:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
secrets: inherit
|
||||
|
||||
# Verify integration tests compile without actually running them (faster feedback)
|
||||
compile-integration-tests:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.compile-integration-tests != '[]' }}
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
job-configs: ${{ fromJson(needs.build.outputs.compile-integration-tests) }}
|
||||
fail-fast: false
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
|
||||
uses: ./.github/workflows/_compile_integration_test.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
# Run extended test suites that require additional dependencies
|
||||
extended-tests:
|
||||
name: 'Extended Tests'
|
||||
dependencies:
|
||||
name: cd ${{ matrix.working-directory }}
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.extended-tests != '[]' }}
|
||||
if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
|
||||
uses: ./.github/workflows/_dependencies.yml
|
||||
with:
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
secrets: inherit
|
||||
|
||||
extended-tests:
|
||||
name: "cd ${{ matrix.working-directory }} / make extended_tests #${{ matrix.python-version }}"
|
||||
needs: [ build ]
|
||||
if: ${{ needs.build.outputs.dirs-to-extended-test != '[]' }}
|
||||
strategy:
|
||||
matrix:
|
||||
# note different variable for extended test dirs
|
||||
job-configs: ${{ fromJson(needs.build.outputs.extended-tests) }}
|
||||
fail-fast: false
|
||||
working-directory: ${{ fromJson(needs.build.outputs.dirs-to-extended-test) }}
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.9"
|
||||
- "3.10"
|
||||
- "3.11"
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 20
|
||||
defaults:
|
||||
run:
|
||||
working-directory: ${{ matrix.job-configs.working-directory }}
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python ${{ matrix.job-configs.python-version }} + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
|
||||
uses: "./.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.job-configs.python-version }}
|
||||
python-version: ${{ matrix.python-version }}
|
||||
poetry-version: ${{ env.POETRY_VERSION }}
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
cache-key: extended
|
||||
|
||||
- name: '📦 Install Dependencies & Run Extended Tests'
|
||||
- name: Install dependencies
|
||||
shell: bash
|
||||
run: |
|
||||
echo "Running extended tests, installing dependencies with uv..."
|
||||
uv venv
|
||||
uv sync --group test
|
||||
VIRTUAL_ENV=.venv uv pip install -r extended_testing_deps.txt
|
||||
VIRTUAL_ENV=.venv make extended_tests
|
||||
echo "Running extended tests, installing dependencies with poetry..."
|
||||
poetry install --with test
|
||||
poetry run pip install uv
|
||||
poetry run uv pip install -r extended_testing_deps.txt
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Run extended tests
|
||||
run: make extended_tests
|
||||
|
||||
- name: Ensure the tests did not create any additional files
|
||||
shell: bash
|
||||
run: |
|
||||
set -eu
|
||||
@@ -165,11 +141,9 @@ jobs:
|
||||
# grep will exit non-zero if the target message isn't found,
|
||||
# and `set -e` above will cause the step to fail.
|
||||
echo "$STATUS" | grep 'nothing to commit, working tree clean'
|
||||
|
||||
# Final status check - ensures all required jobs passed before allowing merge
|
||||
ci_success:
|
||||
name: '✅ CI Success'
|
||||
needs: [build, lint, test, compile-integration-tests, extended-tests, test-doc-imports, test-pydantic]
|
||||
name: "CI Success"
|
||||
needs: [build, lint, test, compile-integration-tests, dependencies, extended-tests, test-doc-imports]
|
||||
if: |
|
||||
always()
|
||||
runs-on: ubuntu-latest
|
||||
@@ -178,7 +152,7 @@ jobs:
|
||||
RESULTS_JSON: ${{ toJSON(needs.*.result) }}
|
||||
EXIT_CODE: ${{!contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') && '0' || '1'}}
|
||||
steps:
|
||||
- name: '🎉 All Checks Passed'
|
||||
- name: "CI Success"
|
||||
run: |
|
||||
echo $JOBS_JSON
|
||||
echo $RESULTS_JSON
|
||||
|
||||
38
.github/workflows/check_new_docs.yml
vendored
38
.github/workflows/check_new_docs.yml
vendored
@@ -1,38 +0,0 @@
|
||||
name: '📑 Integration Docs Lint'
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master]
|
||||
pull_request:
|
||||
|
||||
# If another push to the same PR or branch happens while this workflow is still running,
|
||||
# cancel the earlier run in favor of the next run.
|
||||
#
|
||||
# There's no point in testing an outdated version of the code. GitHub only allows
|
||||
# a limited number of job runners to be active at the same time, so it's better to cancel
|
||||
# pointless jobs early so that more useful jobs can run sooner.
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: '3.10'
|
||||
- id: files
|
||||
uses: Ana06/get-changed-files@v2.3.0
|
||||
with:
|
||||
filter: |
|
||||
*.ipynb
|
||||
*.md
|
||||
*.mdx
|
||||
- name: '🔍 Check New Documentation Templates'
|
||||
run: |
|
||||
python docs/scripts/check_templates.py ${{ steps.files.outputs.added }}
|
||||
37
.github/workflows/codespell.yml
vendored
Normal file
37
.github/workflows/codespell.yml
vendored
Normal file
@@ -0,0 +1,37 @@
|
||||
---
|
||||
name: CI / cd . / make spell_check
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [master, v0.1]
|
||||
pull_request:
|
||||
branches: [master, v0.1]
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
jobs:
|
||||
codespell:
|
||||
name: (Check for spelling errors)
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Install Dependencies
|
||||
run: |
|
||||
pip install toml
|
||||
|
||||
- name: Extract Ignore Words List
|
||||
run: |
|
||||
# Use a Python script to extract the ignore words list from pyproject.toml
|
||||
python .github/workflows/extract_ignored_words_list.py
|
||||
id: extract_ignore_words
|
||||
|
||||
# - name: Codespell
|
||||
# uses: codespell-project/actions-codespell@v2
|
||||
# with:
|
||||
# skip: guide_imports.json,*.ambr,./cookbook/data/imdb_top_1000.csv,*.lock
|
||||
# ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
|
||||
# exclude_file: ./.github/workflows/codespell-exclude
|
||||
66
.github/workflows/codspeed.yml
vendored
66
.github/workflows/codspeed.yml
vendored
@@ -1,66 +0,0 @@
|
||||
name: '⚡ CodSpeed'
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
pull_request:
|
||||
workflow_dispatch:
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: foo
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: foo
|
||||
DEEPSEEK_API_KEY: foo
|
||||
FIREWORKS_API_KEY: foo
|
||||
|
||||
jobs:
|
||||
codspeed:
|
||||
name: 'Benchmark'
|
||||
runs-on: ubuntu-latest
|
||||
if: ${{ !contains(github.event.pull_request.labels.*.name, 'codspeed-ignore') }}
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- working-directory: libs/core
|
||||
mode: walltime
|
||||
- working-directory: libs/partners/openai
|
||||
- working-directory: libs/partners/anthropic
|
||||
- working-directory: libs/partners/deepseek
|
||||
- working-directory: libs/partners/fireworks
|
||||
- working-directory: libs/partners/xai
|
||||
- working-directory: libs/partners/mistralai
|
||||
- working-directory: libs/partners/groq
|
||||
fail-fast: false
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
# We have to use 3.12 as 3.13 is not yet supported
|
||||
- name: '📦 Install UV Package Manager'
|
||||
uses: astral-sh/setup-uv@v6
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.12"
|
||||
|
||||
- name: '📦 Install Test Dependencies'
|
||||
run: uv sync --group test
|
||||
working-directory: ${{ matrix.working-directory }}
|
||||
|
||||
- name: '⚡ Run Benchmarks: ${{ matrix.working-directory }}'
|
||||
uses: CodSpeedHQ/action@v3
|
||||
with:
|
||||
token: ${{ secrets.CODSPEED_TOKEN }}
|
||||
run: |
|
||||
cd ${{ matrix.working-directory }}
|
||||
if [ "${{ matrix.working-directory }}" = "libs/core" ]; then
|
||||
uv run --no-sync pytest ./tests/benchmarks --codspeed
|
||||
else
|
||||
uv run --no-sync pytest ./tests/ --codspeed
|
||||
fi
|
||||
mode: ${{ matrix.mode || 'instrumentation' }}
|
||||
14
.github/workflows/langchain_release_docker.yml
vendored
Normal file
14
.github/workflows/langchain_release_docker.yml
vendored
Normal file
@@ -0,0 +1,14 @@
|
||||
---
|
||||
name: docker/langchain/langchain Release
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
workflow_call: # Allows triggering from another workflow
|
||||
|
||||
jobs:
|
||||
release:
|
||||
uses: ./.github/workflows/_release_docker.yml
|
||||
with:
|
||||
dockerfile: docker/Dockerfile.base
|
||||
image: langchain/langchain
|
||||
secrets: inherit
|
||||
26
.github/workflows/people.yml
vendored
26
.github/workflows/people.yml
vendored
@@ -1,28 +1,36 @@
|
||||
name: '👥 LangChain People'
|
||||
run-name: 'Update People Data'
|
||||
# This workflow updates the LangChain People data by fetching the latest information from the LangChain Git
|
||||
name: LangChain People
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: "0 14 1 * *"
|
||||
push:
|
||||
branches: [jacob/people]
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
debug_enabled:
|
||||
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
|
||||
required: false
|
||||
default: 'false'
|
||||
|
||||
jobs:
|
||||
langchain-people:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
if: github.repository_owner == 'langchain-ai'
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: write
|
||||
steps:
|
||||
- name: '📋 Dump GitHub Context'
|
||||
- name: Dump GitHub context
|
||||
env:
|
||||
GITHUB_CONTEXT: ${{ toJson(github) }}
|
||||
run: echo "$GITHUB_CONTEXT"
|
||||
- uses: actions/checkout@v4
|
||||
# Ref: https://github.com/actions/runner/issues/2033
|
||||
- name: '🔧 Fix Git Safe Directory in Container'
|
||||
- name: Fix git safe.directory in container
|
||||
run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig
|
||||
# Allow debugging with tmate
|
||||
- name: Setup tmate session
|
||||
uses: mxschmitt/action-tmate@v3
|
||||
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled == 'true' }}
|
||||
with:
|
||||
limit-access-to-actor: true
|
||||
- uses: ./.github/actions/people
|
||||
with:
|
||||
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
|
||||
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
|
||||
111
.github/workflows/pr_lint.yml
vendored
111
.github/workflows/pr_lint.yml
vendored
@@ -1,111 +0,0 @@
|
||||
# -----------------------------------------------------------------------------
|
||||
# PR Title Lint Workflow
|
||||
#
|
||||
# Purpose:
|
||||
# Enforces Conventional Commits format for pull request titles to maintain a
|
||||
# clear, consistent, and machine-readable change history across our repository.
|
||||
# This helps with automated changelog generation and semantic versioning.
|
||||
#
|
||||
# Enforced Commit Message Format (Conventional Commits 1.0.0):
|
||||
# <type>[optional scope]: <description>
|
||||
# [optional body]
|
||||
# [optional footer(s)]
|
||||
#
|
||||
# Allowed Types:
|
||||
# • feat — a new feature (MINOR bump)
|
||||
# • fix — a bug fix (PATCH bump)
|
||||
# • docs — documentation only changes
|
||||
# • style — formatting, missing semi-colons, etc.; no code change
|
||||
# • refactor — code change that neither fixes a bug nor adds a feature
|
||||
# • perf — code change that improves performance
|
||||
# • test — adding missing tests or correcting existing tests
|
||||
# • build — changes that affect the build system or external dependencies
|
||||
# • ci — continuous integration/configuration changes
|
||||
# • chore — other changes that don't modify src or test files
|
||||
# • revert — reverts a previous commit
|
||||
# • release — prepare a new release
|
||||
#
|
||||
# Allowed Scopes (optional):
|
||||
# core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek,
|
||||
# exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai,
|
||||
# perplexity, prompty, qdrant, xai
|
||||
#
|
||||
# Rules & Tips for New Committers:
|
||||
# 1. Subject (type) must start with a lowercase letter and, if possible, be
|
||||
# followed by a scope wrapped in parenthesis `(scope)`
|
||||
# 2. Breaking changes:
|
||||
# – Append "!" after type/scope (e.g., feat!: drop Node 12 support)
|
||||
# – Or include a footer "BREAKING CHANGE: <details>"
|
||||
# 3. Example PR titles:
|
||||
# feat(core): add multi‐tenant support
|
||||
# fix(cli): resolve flag parsing error
|
||||
# docs: update API usage examples
|
||||
# docs(openai): update API usage examples
|
||||
#
|
||||
# Resources:
|
||||
# • Conventional Commits spec: https://www.conventionalcommits.org/en/v1.0.0/
|
||||
# -----------------------------------------------------------------------------
|
||||
|
||||
name: '🏷️ PR Title Lint'
|
||||
|
||||
permissions:
|
||||
pull-requests: read
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
types: [opened, edited, synchronize]
|
||||
|
||||
jobs:
|
||||
# Validates that PR title follows Conventional Commits specification
|
||||
lint-pr-title:
|
||||
name: 'Validate PR Title Format'
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: '✅ Validate Conventional Commits Format'
|
||||
uses: amannn/action-semantic-pull-request@v5
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
with:
|
||||
types: |
|
||||
feat
|
||||
fix
|
||||
docs
|
||||
style
|
||||
refactor
|
||||
perf
|
||||
test
|
||||
build
|
||||
ci
|
||||
chore
|
||||
revert
|
||||
release
|
||||
scopes: |
|
||||
core
|
||||
cli
|
||||
langchain
|
||||
langchain_v1
|
||||
standard-tests
|
||||
text-splitters
|
||||
docs
|
||||
anthropic
|
||||
chroma
|
||||
deepseek
|
||||
exa
|
||||
fireworks
|
||||
groq
|
||||
huggingface
|
||||
mistralai
|
||||
nomic
|
||||
ollama
|
||||
openai
|
||||
perplexity
|
||||
prompty
|
||||
qdrant
|
||||
xai
|
||||
infra
|
||||
requireScope: false
|
||||
disallowScopes: |
|
||||
release
|
||||
[A-Z]+
|
||||
ignoreLabels: |
|
||||
ignore-lint-pr-title
|
||||
75
.github/workflows/run_notebooks.yml
vendored
75
.github/workflows/run_notebooks.yml
vendored
@@ -1,75 +0,0 @@
|
||||
name: '📓 Validate Documentation Notebooks'
|
||||
run-name: 'Test notebooks in ${{ inputs.working-directory }}'
|
||||
on:
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
python_version:
|
||||
description: 'Python version'
|
||||
required: false
|
||||
default: '3.11'
|
||||
working-directory:
|
||||
description: 'Working directory or subset (e.g., docs/docs/tutorials/llm_chain.ipynb or docs/docs/how_to)'
|
||||
required: false
|
||||
default: 'all'
|
||||
schedule:
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
|
||||
env:
|
||||
UV_FROZEN: "true"
|
||||
|
||||
jobs:
|
||||
build:
|
||||
runs-on: ubuntu-latest
|
||||
if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule'
|
||||
name: '📑 Test Documentation Notebooks'
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: '🐍 Set up Python + UV'
|
||||
uses: "./.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ github.event.inputs.python_version || '3.11' }}
|
||||
|
||||
- name: '🔐 Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: '🔐 Configure AWS Credentials'
|
||||
uses: aws-actions/configure-aws-credentials@v4
|
||||
with:
|
||||
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
aws-region: ${{ secrets.AWS_REGION }}
|
||||
|
||||
- name: '📦 Install Dependencies'
|
||||
run: |
|
||||
uv sync --group dev --group test
|
||||
|
||||
- name: '📦 Pre-download Test Files'
|
||||
run: |
|
||||
uv run python docs/scripts/cache_data.py
|
||||
curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 docs/docs/how_to/Chinook.db
|
||||
cp docs/docs/how_to/Chinook.db docs/docs/tutorials/Chinook.db
|
||||
|
||||
- name: '🔧 Prepare Notebooks for CI'
|
||||
run: |
|
||||
uv run python docs/scripts/prepare_notebooks_for_ci.py --comment-install-cells --working-directory ${{ github.event.inputs.working-directory || 'all' }}
|
||||
|
||||
- name: '🚀 Execute Notebooks'
|
||||
env:
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
TAVILY_API_KEY: ${{ secrets.TAVILY_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
WORKING_DIRECTORY: ${{ github.event.inputs.working-directory || 'all' }}
|
||||
run: |
|
||||
./docs/scripts/execute_notebooks.sh $WORKING_DIRECTORY
|
||||
136
.github/workflows/scheduled_test.yml
vendored
136
.github/workflows/scheduled_test.yml
vendored
@@ -1,71 +1,36 @@
|
||||
name: '⏰ Scheduled Integration Tests'
|
||||
run-name: "Run Integration Tests - ${{ inputs.working-directory-force || 'all libs' }} (Python ${{ inputs.python-version-force || '3.9, 3.11' }})"
|
||||
name: Scheduled tests
|
||||
|
||||
on:
|
||||
workflow_dispatch: # Allows maintainers to trigger the workflow manually in GitHub UI
|
||||
inputs:
|
||||
working-directory-force:
|
||||
type: string
|
||||
description: "From which folder this pipeline executes - defaults to all in matrix - example value: libs/partners/anthropic"
|
||||
python-version-force:
|
||||
type: string
|
||||
description: "Python version to use - defaults to 3.9 and 3.11 in matrix - example value: 3.9"
|
||||
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
|
||||
schedule:
|
||||
- cron: '0 13 * * *' # Runs daily at 1PM UTC (9AM EDT/6AM PDT)
|
||||
|
||||
permissions:
|
||||
contents: read
|
||||
- cron: '0 13 * * *'
|
||||
|
||||
env:
|
||||
POETRY_VERSION: "1.8.4"
|
||||
UV_FROZEN: "true"
|
||||
DEFAULT_LIBS: '["libs/partners/openai", "libs/partners/anthropic", "libs/partners/fireworks", "libs/partners/groq", "libs/partners/mistralai", "libs/partners/xai", "libs/partners/google-vertexai", "libs/partners/google-genai", "libs/partners/aws"]'
|
||||
POETRY_LIBS: ("libs/partners/google-vertexai" "libs/partners/google-genai" "libs/partners/aws")
|
||||
POETRY_VERSION: "1.7.1"
|
||||
|
||||
jobs:
|
||||
# Generate dynamic test matrix based on input parameters or defaults
|
||||
# Only runs on the main repo (for scheduled runs) or when manually triggered
|
||||
compute-matrix:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
runs-on: ubuntu-latest
|
||||
name: '📋 Compute Test Matrix'
|
||||
outputs:
|
||||
matrix: ${{ steps.set-matrix.outputs.matrix }}
|
||||
steps:
|
||||
- name: '🔢 Generate Python & Library Matrix'
|
||||
id: set-matrix
|
||||
env:
|
||||
DEFAULT_LIBS: ${{ env.DEFAULT_LIBS }}
|
||||
WORKING_DIRECTORY_FORCE: ${{ github.event.inputs.working-directory-force || '' }}
|
||||
PYTHON_VERSION_FORCE: ${{ github.event.inputs.python-version-force || '' }}
|
||||
run: |
|
||||
# echo "matrix=..." where matrix is a json formatted str with keys python-version and working-directory
|
||||
# python-version should default to 3.9 and 3.11, but is overridden to [PYTHON_VERSION_FORCE] if set
|
||||
# working-directory should default to DEFAULT_LIBS, but is overridden to [WORKING_DIRECTORY_FORCE] if set
|
||||
python_version='["3.9", "3.11"]'
|
||||
working_directory="$DEFAULT_LIBS"
|
||||
if [ -n "$PYTHON_VERSION_FORCE" ]; then
|
||||
python_version="[\"$PYTHON_VERSION_FORCE\"]"
|
||||
fi
|
||||
if [ -n "$WORKING_DIRECTORY_FORCE" ]; then
|
||||
working_directory="[\"$WORKING_DIRECTORY_FORCE\"]"
|
||||
fi
|
||||
matrix="{\"python-version\": $python_version, \"working-directory\": $working_directory}"
|
||||
echo $matrix
|
||||
echo "matrix=$matrix" >> $GITHUB_OUTPUT
|
||||
# Run integration tests against partner libraries with live API credentials
|
||||
# Tests are run with both Poetry and UV depending on the library's setup
|
||||
build:
|
||||
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
|
||||
name: '🐍 Python ${{ matrix.python-version }}: ${{ matrix.working-directory }}'
|
||||
name: Python ${{ matrix.python-version }} - ${{ matrix.working-directory }}
|
||||
runs-on: ubuntu-latest
|
||||
needs: [compute-matrix]
|
||||
timeout-minutes: 20
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
python-version: ${{ fromJSON(needs.compute-matrix.outputs.matrix).python-version }}
|
||||
working-directory: ${{ fromJSON(needs.compute-matrix.outputs.matrix).working-directory }}
|
||||
python-version:
|
||||
- "3.8"
|
||||
- "3.11"
|
||||
working-directory:
|
||||
- "libs/partners/openai"
|
||||
- "libs/partners/anthropic"
|
||||
- "libs/partners/ai21"
|
||||
- "libs/partners/fireworks"
|
||||
- "libs/partners/groq"
|
||||
- "libs/partners/mistralai"
|
||||
- "libs/partners/together"
|
||||
- "libs/partners/cohere"
|
||||
- "libs/partners/google-vertexai"
|
||||
- "libs/partners/google-genai"
|
||||
- "libs/partners/aws"
|
||||
- "libs/partners/nvidia-ai-endpoints"
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
@@ -75,22 +40,33 @@ jobs:
|
||||
with:
|
||||
repository: langchain-ai/langchain-google
|
||||
path: langchain-google
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-nvidia
|
||||
path: langchain-nvidia
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-cohere
|
||||
path: langchain-cohere
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
repository: langchain-ai/langchain-aws
|
||||
path: langchain-aws
|
||||
|
||||
- name: '📦 Organize External Libraries'
|
||||
- name: Move libs
|
||||
run: |
|
||||
rm -rf \
|
||||
langchain/libs/partners/google-genai \
|
||||
langchain/libs/partners/google-vertexai
|
||||
langchain/libs/partners/google-vertexai \
|
||||
langchain/libs/partners/nvidia-ai-endpoints \
|
||||
langchain/libs/partners/cohere
|
||||
mv langchain-google/libs/genai langchain/libs/partners/google-genai
|
||||
mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai
|
||||
mv langchain-nvidia/libs/ai-endpoints langchain/libs/partners/nvidia-ai-endpoints
|
||||
mv langchain-cohere/libs/cohere langchain/libs/partners/cohere
|
||||
mv langchain-aws/libs/aws langchain/libs/partners/aws
|
||||
|
||||
- name: '🐍 Set up Python ${{ matrix.python-version }} + Poetry'
|
||||
if: contains(env.POETRY_LIBS, matrix.working-directory)
|
||||
- name: Set up Python ${{ matrix.python-version }}
|
||||
uses: "./langchain/.github/actions/poetry_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
@@ -98,77 +74,59 @@ jobs:
|
||||
working-directory: langchain/${{ matrix.working-directory }}
|
||||
cache-key: scheduled
|
||||
|
||||
- name: '🐍 Set up Python ${{ matrix.python-version }} + UV'
|
||||
if: "!contains(env.POETRY_LIBS, matrix.working-directory)"
|
||||
uses: "./langchain/.github/actions/uv_setup"
|
||||
with:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
|
||||
- name: '🔐 Authenticate to Google Cloud'
|
||||
- name: 'Authenticate to Google Cloud'
|
||||
id: 'auth'
|
||||
uses: google-github-actions/auth@v2
|
||||
with:
|
||||
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
|
||||
|
||||
- name: '🔐 Configure AWS Credentials'
|
||||
- name: Configure AWS Credentials
|
||||
uses: aws-actions/configure-aws-credentials@v4
|
||||
with:
|
||||
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
|
||||
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
|
||||
aws-region: ${{ secrets.AWS_REGION }}
|
||||
|
||||
- name: '📦 Install Dependencies (Poetry)'
|
||||
if: contains(env.POETRY_LIBS, matrix.working-directory)
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with poetry..."
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
poetry install --with=test_integration,test
|
||||
|
||||
- name: '📦 Install Dependencies (UV)'
|
||||
if: "!contains(env.POETRY_LIBS, matrix.working-directory)"
|
||||
run: |
|
||||
echo "Running scheduled tests, installing dependencies with uv..."
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
uv sync --group test --group test_integration
|
||||
|
||||
- name: '🚀 Run Integration Tests'
|
||||
- name: Run integration tests
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
|
||||
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
|
||||
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
|
||||
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
|
||||
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
|
||||
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
|
||||
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
|
||||
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
|
||||
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
|
||||
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
|
||||
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
|
||||
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
|
||||
HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }}
|
||||
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
|
||||
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
|
||||
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
|
||||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
|
||||
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
|
||||
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
|
||||
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
|
||||
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
|
||||
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
|
||||
run: |
|
||||
cd langchain/${{ matrix.working-directory }}
|
||||
make integration_tests
|
||||
|
||||
- name: '🧹 Clean up External Libraries'
|
||||
# Clean up external libraries to avoid affecting git status check
|
||||
run: |
|
||||
- name: Remove external libraries
|
||||
run: |
|
||||
rm -rf \
|
||||
langchain/libs/partners/google-genai \
|
||||
langchain/libs/partners/google-vertexai \
|
||||
langchain/libs/partners/nvidia-ai-endpoints \
|
||||
langchain/libs/partners/cohere \
|
||||
langchain/libs/partners/aws
|
||||
|
||||
- name: '🧹 Verify Clean Working Directory'
|
||||
- name: Ensure the tests did not create any additional files
|
||||
working-directory: langchain
|
||||
run: |
|
||||
set -eu
|
||||
|
||||
5
.gitignore
vendored
5
.gitignore
vendored
@@ -1,4 +1,5 @@
|
||||
.vs/
|
||||
.vscode/
|
||||
.idea/
|
||||
# Byte-compiled / optimized / DLL files
|
||||
__pycache__/
|
||||
@@ -58,7 +59,6 @@ coverage.xml
|
||||
*.py,cover
|
||||
.hypothesis/
|
||||
.pytest_cache/
|
||||
.codspeed/
|
||||
|
||||
# Translations
|
||||
*.mo
|
||||
@@ -167,14 +167,11 @@ docs/.docusaurus/
|
||||
docs/.cache-loader/
|
||||
docs/_dist
|
||||
docs/api_reference/*api_reference.rst
|
||||
docs/api_reference/*.md
|
||||
docs/api_reference/_build
|
||||
docs/api_reference/*/
|
||||
!docs/api_reference/_static/
|
||||
!docs/api_reference/templates/
|
||||
!docs/api_reference/themes/
|
||||
!docs/api_reference/_extensions/
|
||||
!docs/api_reference/scripts/
|
||||
docs/docs/build
|
||||
docs/docs/node_modules
|
||||
docs/docs/yarn.lock
|
||||
|
||||
@@ -1,14 +0,0 @@
|
||||
{
|
||||
"MD013": false,
|
||||
"MD024": {
|
||||
"siblings_only": true
|
||||
},
|
||||
"MD025": false,
|
||||
"MD033": false,
|
||||
"MD034": false,
|
||||
"MD036": false,
|
||||
"MD041": false,
|
||||
"MD046": {
|
||||
"style": "fenced"
|
||||
}
|
||||
}
|
||||
@@ -1,111 +0,0 @@
|
||||
repos:
|
||||
- repo: local
|
||||
hooks:
|
||||
- id: core
|
||||
name: format core
|
||||
language: system
|
||||
entry: make -C libs/core format
|
||||
files: ^libs/core/
|
||||
pass_filenames: false
|
||||
- id: langchain
|
||||
name: format langchain
|
||||
language: system
|
||||
entry: make -C libs/langchain format
|
||||
files: ^libs/langchain/
|
||||
pass_filenames: false
|
||||
- id: standard-tests
|
||||
name: format standard-tests
|
||||
language: system
|
||||
entry: make -C libs/standard-tests format
|
||||
files: ^libs/standard-tests/
|
||||
pass_filenames: false
|
||||
- id: text-splitters
|
||||
name: format text-splitters
|
||||
language: system
|
||||
entry: make -C libs/text-splitters format
|
||||
files: ^libs/text-splitters/
|
||||
pass_filenames: false
|
||||
- id: anthropic
|
||||
name: format partners/anthropic
|
||||
language: system
|
||||
entry: make -C libs/partners/anthropic format
|
||||
files: ^libs/partners/anthropic/
|
||||
pass_filenames: false
|
||||
- id: chroma
|
||||
name: format partners/chroma
|
||||
language: system
|
||||
entry: make -C libs/partners/chroma format
|
||||
files: ^libs/partners/chroma/
|
||||
pass_filenames: false
|
||||
- id: couchbase
|
||||
name: format partners/couchbase
|
||||
language: system
|
||||
entry: make -C libs/partners/couchbase format
|
||||
files: ^libs/partners/couchbase/
|
||||
pass_filenames: false
|
||||
- id: exa
|
||||
name: format partners/exa
|
||||
language: system
|
||||
entry: make -C libs/partners/exa format
|
||||
files: ^libs/partners/exa/
|
||||
pass_filenames: false
|
||||
- id: fireworks
|
||||
name: format partners/fireworks
|
||||
language: system
|
||||
entry: make -C libs/partners/fireworks format
|
||||
files: ^libs/partners/fireworks/
|
||||
pass_filenames: false
|
||||
- id: groq
|
||||
name: format partners/groq
|
||||
language: system
|
||||
entry: make -C libs/partners/groq format
|
||||
files: ^libs/partners/groq/
|
||||
pass_filenames: false
|
||||
- id: huggingface
|
||||
name: format partners/huggingface
|
||||
language: system
|
||||
entry: make -C libs/partners/huggingface format
|
||||
files: ^libs/partners/huggingface/
|
||||
pass_filenames: false
|
||||
- id: mistralai
|
||||
name: format partners/mistralai
|
||||
language: system
|
||||
entry: make -C libs/partners/mistralai format
|
||||
files: ^libs/partners/mistralai/
|
||||
pass_filenames: false
|
||||
- id: nomic
|
||||
name: format partners/nomic
|
||||
language: system
|
||||
entry: make -C libs/partners/nomic format
|
||||
files: ^libs/partners/nomic/
|
||||
pass_filenames: false
|
||||
- id: ollama
|
||||
name: format partners/ollama
|
||||
language: system
|
||||
entry: make -C libs/partners/ollama format
|
||||
files: ^libs/partners/ollama/
|
||||
pass_filenames: false
|
||||
- id: openai
|
||||
name: format partners/openai
|
||||
language: system
|
||||
entry: make -C libs/partners/openai format
|
||||
files: ^libs/partners/openai/
|
||||
pass_filenames: false
|
||||
- id: prompty
|
||||
name: format partners/prompty
|
||||
language: system
|
||||
entry: make -C libs/partners/prompty format
|
||||
files: ^libs/partners/prompty/
|
||||
pass_filenames: false
|
||||
- id: qdrant
|
||||
name: format partners/qdrant
|
||||
language: system
|
||||
entry: make -C libs/partners/qdrant format
|
||||
files: ^libs/partners/qdrant/
|
||||
pass_filenames: false
|
||||
- id: root
|
||||
name: format docs, cookbook
|
||||
language: system
|
||||
entry: make format
|
||||
files: ^(docs|cookbook)/
|
||||
pass_filenames: false
|
||||
@@ -1,7 +1,12 @@
|
||||
# Read the Docs configuration file
|
||||
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
|
||||
|
||||
# Required
|
||||
version: 2
|
||||
|
||||
formats:
|
||||
- pdf
|
||||
|
||||
# Set the version of Python and other tools you might need
|
||||
build:
|
||||
os: ubuntu-22.04
|
||||
@@ -10,16 +15,15 @@ build:
|
||||
commands:
|
||||
- mkdir -p $READTHEDOCS_OUTPUT
|
||||
- cp -r api_reference_build/* $READTHEDOCS_OUTPUT
|
||||
|
||||
# Build documentation in the docs/ directory with Sphinx
|
||||
sphinx:
|
||||
configuration: docs/api_reference/conf.py
|
||||
configuration: docs/api_reference/conf.py
|
||||
|
||||
# If using Sphinx, optionally build your docs in additional formats such as PDF
|
||||
formats:
|
||||
- pdf
|
||||
# formats:
|
||||
# - pdf
|
||||
|
||||
# Optionally declare the Python requirements required to build your docs
|
||||
python:
|
||||
install:
|
||||
- requirements: docs/api_reference/requirements.txt
|
||||
install:
|
||||
- requirements: docs/api_reference/requirements.txt
|
||||
|
||||
21
.vscode/extensions.json
vendored
21
.vscode/extensions.json
vendored
@@ -1,21 +0,0 @@
|
||||
{
|
||||
"recommendations": [
|
||||
"ms-python.python",
|
||||
"charliermarsh.ruff",
|
||||
"ms-python.mypy-type-checker",
|
||||
"ms-toolsai.jupyter",
|
||||
"ms-toolsai.jupyter-keymap",
|
||||
"ms-toolsai.jupyter-renderers",
|
||||
"ms-toolsai.vscode-jupyter-cell-tags",
|
||||
"ms-toolsai.vscode-jupyter-slideshow",
|
||||
"yzhang.markdown-all-in-one",
|
||||
"davidanson.vscode-markdownlint",
|
||||
"bierner.markdown-mermaid",
|
||||
"bierner.markdown-preview-github-styles",
|
||||
"eamodio.gitlens",
|
||||
"github.vscode-pull-request-github",
|
||||
"github.vscode-github-actions",
|
||||
"redhat.vscode-yaml",
|
||||
"editorconfig.editorconfig",
|
||||
],
|
||||
}
|
||||
82
.vscode/settings.json
vendored
82
.vscode/settings.json
vendored
@@ -1,82 +0,0 @@
|
||||
{
|
||||
"python.analysis.include": [
|
||||
"libs/**",
|
||||
"docs/**",
|
||||
"cookbook/**"
|
||||
],
|
||||
"python.analysis.exclude": [
|
||||
"**/node_modules",
|
||||
"**/__pycache__",
|
||||
"**/.pytest_cache",
|
||||
"**/.*",
|
||||
"_dist/**",
|
||||
"docs/_build/**",
|
||||
"docs/api_reference/_build/**"
|
||||
],
|
||||
"python.analysis.autoImportCompletions": true,
|
||||
"python.analysis.typeCheckingMode": "basic",
|
||||
"python.testing.cwd": "${workspaceFolder}",
|
||||
"python.linting.enabled": true,
|
||||
"python.linting.ruffEnabled": true,
|
||||
"[python]": {
|
||||
"editor.formatOnSave": true,
|
||||
"editor.codeActionsOnSave": {
|
||||
"source.organizeImports.ruff": "explicit",
|
||||
"source.fixAll": "explicit"
|
||||
},
|
||||
"editor.defaultFormatter": "charliermarsh.ruff"
|
||||
},
|
||||
"editor.rulers": [
|
||||
88
|
||||
],
|
||||
"editor.tabSize": 4,
|
||||
"editor.insertSpaces": true,
|
||||
"editor.trimAutoWhitespace": true,
|
||||
"files.trimTrailingWhitespace": true,
|
||||
"files.insertFinalNewline": true,
|
||||
"files.exclude": {
|
||||
"**/__pycache__": true,
|
||||
"**/.pytest_cache": true,
|
||||
"**/*.pyc": true,
|
||||
"**/.mypy_cache": true,
|
||||
"**/.ruff_cache": true,
|
||||
"_dist/**": true,
|
||||
"docs/_build/**": true,
|
||||
"docs/api_reference/_build/**": true,
|
||||
"**/node_modules": true,
|
||||
"**/.git": false
|
||||
},
|
||||
"search.exclude": {
|
||||
"**/__pycache__": true,
|
||||
"**/*.pyc": true,
|
||||
"_dist/**": true,
|
||||
"docs/_build/**": true,
|
||||
"docs/api_reference/_build/**": true,
|
||||
"**/node_modules": true,
|
||||
"**/.git": true,
|
||||
"uv.lock": true,
|
||||
"yarn.lock": true
|
||||
},
|
||||
"git.autofetch": true,
|
||||
"git.enableSmartCommit": true,
|
||||
"jupyter.askForKernelRestart": false,
|
||||
"jupyter.interactiveWindow.textEditor.executeSelection": true,
|
||||
"[markdown]": {
|
||||
"editor.wordWrap": "on",
|
||||
"editor.quickSuggestions": {
|
||||
"comments": "off",
|
||||
"strings": "off",
|
||||
"other": "off"
|
||||
}
|
||||
},
|
||||
"[yaml]": {
|
||||
"editor.tabSize": 2,
|
||||
"editor.insertSpaces": true
|
||||
},
|
||||
"[json]": {
|
||||
"editor.tabSize": 2,
|
||||
"editor.insertSpaces": true
|
||||
},
|
||||
"python.terminal.activateEnvironment": false,
|
||||
"python.defaultInterpreterPath": "./.venv/bin/python"
|
||||
}
|
||||
325
CLAUDE.md
325
CLAUDE.md
@@ -1,325 +0,0 @@
|
||||
# Global Development Guidelines for LangChain Projects
|
||||
|
||||
## Core Development Principles
|
||||
|
||||
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
|
||||
|
||||
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
|
||||
|
||||
❌ **Bad - Breaking Change:**
|
||||
|
||||
```python
|
||||
def get_user(id, verbose=False): # Changed from `user_id`
|
||||
pass
|
||||
```
|
||||
|
||||
✅ **Good - Stable Interface:**
|
||||
|
||||
```python
|
||||
def get_user(user_id: str, verbose: bool = False) -> User:
|
||||
"""Retrieve user by ID with optional verbose output."""
|
||||
pass
|
||||
```
|
||||
|
||||
**Before making ANY changes to public APIs:**
|
||||
|
||||
- Check if the function/class is exported in `__init__.py`
|
||||
- Look for existing usage patterns in tests and examples
|
||||
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
|
||||
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
|
||||
|
||||
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
|
||||
|
||||
### 2. Code Quality Standards
|
||||
|
||||
**All Python code MUST include type hints and return types.**
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def p(u, d):
|
||||
return [x for x in u if x not in d]
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
|
||||
"""Filter out users that are not in the known users set.
|
||||
|
||||
Args:
|
||||
users: List of user identifiers to filter.
|
||||
known_users: Set of known/valid user identifiers.
|
||||
|
||||
Returns:
|
||||
List of users that are not in the known_users set.
|
||||
"""
|
||||
return [user for user in users if user not in known_users]
|
||||
```
|
||||
|
||||
**Style Requirements:**
|
||||
|
||||
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
|
||||
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
|
||||
- Avoid unnecessary abstraction or premature optimization
|
||||
- Follow existing patterns in the codebase you're modifying
|
||||
|
||||
### 3. Testing Requirements
|
||||
|
||||
**Every new feature or bugfix MUST be covered by unit tests.**
|
||||
|
||||
**Test Organization:**
|
||||
|
||||
- Unit tests: `tests/unit_tests/` (no network calls allowed)
|
||||
- Integration tests: `tests/integration_tests/` (network calls permitted)
|
||||
- Use `pytest` as the testing framework
|
||||
|
||||
**Test Quality Checklist:**
|
||||
|
||||
- [ ] Tests fail when your new logic is broken
|
||||
- [ ] Happy path is covered
|
||||
- [ ] Edge cases and error conditions are tested
|
||||
- [ ] Use fixtures/mocks for external dependencies
|
||||
- [ ] Tests are deterministic (no flaky tests)
|
||||
|
||||
Checklist questions:
|
||||
|
||||
- [ ] Does the test suite fail if your new logic is broken?
|
||||
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
|
||||
- [ ] Do tests use fixtures or mocks where needed?
|
||||
|
||||
```python
|
||||
def test_filter_unknown_users():
|
||||
"""Test filtering unknown users from a list."""
|
||||
users = ["alice", "bob", "charlie"]
|
||||
known_users = {"alice", "bob"}
|
||||
|
||||
result = filter_unknown_users(users, known_users)
|
||||
|
||||
assert result == ["charlie"]
|
||||
assert len(result) == 1
|
||||
```
|
||||
|
||||
### 4. Security and Risk Assessment
|
||||
|
||||
**Security Checklist:**
|
||||
|
||||
- No `eval()`, `exec()`, or `pickle` on user-controlled input
|
||||
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
|
||||
- Remove unreachable/commented code before committing
|
||||
- Race conditions or resource leaks (file handles, sockets, threads).
|
||||
- Ensure proper resource cleanup (file handles, connections)
|
||||
|
||||
❌ **Bad:**
|
||||
|
||||
```python
|
||||
def load_config(path):
|
||||
with open(path) as f:
|
||||
return eval(f.read()) # ⚠️ Never eval config
|
||||
```
|
||||
|
||||
✅ **Good:**
|
||||
|
||||
```python
|
||||
import json
|
||||
|
||||
def load_config(path: str) -> dict:
|
||||
with open(path) as f:
|
||||
return json.load(f)
|
||||
```
|
||||
|
||||
### 5. Documentation Standards
|
||||
|
||||
**Use Google-style docstrings with Args section for all public functions.**
|
||||
|
||||
❌ **Insufficient Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to, msg):
|
||||
"""Send an email to a recipient."""
|
||||
```
|
||||
|
||||
✅ **Complete Documentation:**
|
||||
|
||||
```python
|
||||
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
|
||||
"""
|
||||
Send an email to a recipient with specified priority.
|
||||
|
||||
Args:
|
||||
to: The email address of the recipient.
|
||||
msg: The message body to send.
|
||||
priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).
|
||||
|
||||
Returns:
|
||||
True if email was sent successfully, False otherwise.
|
||||
|
||||
Raises:
|
||||
InvalidEmailError: If the email address format is invalid.
|
||||
SMTPConnectionError: If unable to connect to email server.
|
||||
"""
|
||||
```
|
||||
|
||||
**Documentation Guidelines:**
|
||||
|
||||
- Types go in function signatures, NOT in docstrings
|
||||
- Focus on "why" rather than "what" in descriptions
|
||||
- Document all parameters, return values, and exceptions
|
||||
- Keep descriptions concise but clear
|
||||
- Use reStructuredText for docstrings to enable rich formatting
|
||||
|
||||
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
|
||||
|
||||
### 6. Architectural Improvements
|
||||
|
||||
**When you encounter code that could be improved, suggest better designs:**
|
||||
|
||||
❌ **Poor Design:**
|
||||
|
||||
```python
|
||||
def process_data(data, db_conn, email_client, logger):
|
||||
# Function doing too many things
|
||||
validated = validate_data(data)
|
||||
result = db_conn.save(validated)
|
||||
email_client.send_notification(result)
|
||||
logger.log(f"Processed {len(data)} items")
|
||||
return result
|
||||
```
|
||||
|
||||
✅ **Better Design:**
|
||||
|
||||
```python
|
||||
@dataclass
|
||||
class ProcessingResult:
|
||||
"""Result of data processing operation."""
|
||||
items_processed: int
|
||||
success: bool
|
||||
errors: List[str] = field(default_factory=list)
|
||||
|
||||
class DataProcessor:
|
||||
"""Handles data validation, storage, and notification."""
|
||||
|
||||
def __init__(self, db_conn: Database, email_client: EmailClient):
|
||||
self.db = db_conn
|
||||
self.email = email_client
|
||||
|
||||
def process(self, data: List[dict]) -> ProcessingResult:
|
||||
"""Process and store data with notifications."""
|
||||
validated = self._validate_data(data)
|
||||
result = self.db.save(validated)
|
||||
self._notify_completion(result)
|
||||
return result
|
||||
```
|
||||
|
||||
**Design Improvement Areas:**
|
||||
|
||||
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
|
||||
|
||||
- Reduce code duplication through shared utilities
|
||||
- Make unit testing easier
|
||||
- Improve separation of concerns (single responsibility)
|
||||
- Make unit testing easier through dependency injection
|
||||
- Add clarity without adding complexity
|
||||
- Prefer dataclasses for structured data
|
||||
|
||||
## Development Tools & Commands
|
||||
|
||||
### Package Management
|
||||
|
||||
```bash
|
||||
# Add package
|
||||
uv add package-name
|
||||
|
||||
# Sync project dependencies
|
||||
uv sync
|
||||
uv lock
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
```bash
|
||||
# Run unit tests (no network)
|
||||
make test
|
||||
|
||||
# Don't run integration tests, as API keys must be set
|
||||
|
||||
# Run specific test file
|
||||
uv run --group test pytest tests/unit_tests/test_specific.py
|
||||
```
|
||||
|
||||
### Code Quality
|
||||
|
||||
```bash
|
||||
# Lint code
|
||||
make lint
|
||||
|
||||
# Format code
|
||||
make format
|
||||
|
||||
# Type checking
|
||||
uv run --group lint mypy .
|
||||
```
|
||||
|
||||
### Dependency Management Patterns
|
||||
|
||||
**Local Development Dependencies:**
|
||||
|
||||
```toml
|
||||
[tool.uv.sources]
|
||||
langchain-core = { path = "../core", editable = true }
|
||||
langchain-tests = { path = "../standard-tests", editable = true }
|
||||
```
|
||||
|
||||
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
|
||||
|
||||
```python
|
||||
from langchain_core.tools import tool
|
||||
|
||||
@tool
|
||||
def search_database(query: str) -> str:
|
||||
"""Search the database for relevant information.
|
||||
|
||||
Args:
|
||||
query: The search query string.
|
||||
"""
|
||||
# Implementation here
|
||||
return results
|
||||
```
|
||||
|
||||
## Commit Standards
|
||||
|
||||
**Use Conventional Commits format for PR titles:**
|
||||
|
||||
- `feat(core): add multi-tenant support`
|
||||
- `fix(cli): resolve flag parsing error`
|
||||
- `docs: update API usage examples`
|
||||
- `docs(openai): update API usage examples`
|
||||
|
||||
## Framework-Specific Guidelines
|
||||
|
||||
- Follow the existing patterns in `langchain-core` for base abstractions
|
||||
- Use `langchain_core.callbacks` for execution tracking
|
||||
- Implement proper streaming support where applicable
|
||||
- Avoid deprecated components like legacy `LLMChain`
|
||||
|
||||
### Partner Integrations
|
||||
|
||||
- Follow the established patterns in existing partner libraries
|
||||
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
|
||||
- Include comprehensive integration tests
|
||||
- Document API key requirements and authentication
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Checklist
|
||||
|
||||
Before submitting code changes:
|
||||
|
||||
- [ ] **Breaking Changes**: Verified no public API changes
|
||||
- [ ] **Type Hints**: All functions have complete type annotations
|
||||
- [ ] **Tests**: New functionality is fully tested
|
||||
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
|
||||
- [ ] **Documentation**: Google-style docstrings for public functions
|
||||
- [ ] **Code Quality**: `make lint` and `make format` pass
|
||||
- [ ] **Architecture**: Suggested improvements where applicable
|
||||
- [ ] **Commit Message**: Follows Conventional Commits format
|
||||
73
MIGRATE.md
73
MIGRATE.md
@@ -1,11 +1,70 @@
|
||||
# Migrating
|
||||
|
||||
Please see the following guides for migrating LangChain code:
|
||||
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
|
||||
|
||||
* Migrate to [LangChain v0.3](https://python.langchain.com/docs/versions/v0_3/)
|
||||
* Migrate to [LangChain v0.2](https://python.langchain.com/docs/versions/v0_2/)
|
||||
* Migrating from [LangChain 0.0.x Chains](https://python.langchain.com/docs/versions/migrating_chains/)
|
||||
* Upgrade to [LangGraph Memory](https://python.langchain.com/docs/versions/migrating_memory/)
|
||||
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
|
||||
This migration has already started, but we are remaining backwards compatible until 7/28.
|
||||
On that date, we will remove functionality from `langchain`.
|
||||
Read more about the motivation and the progress [here](https://github.com/langchain-ai/langchain/discussions/8043).
|
||||
|
||||
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
|
||||
This will be especially helpful if you're still on either version 0.0.x or 0.1.x of LangChain.
|
||||
### Migrating to `langchain_experimental`
|
||||
|
||||
We are moving any experimental components of LangChain, or components with vulnerability issues, into `langchain_experimental`.
|
||||
This guide covers how to migrate.
|
||||
|
||||
### Installation
|
||||
|
||||
Previously:
|
||||
|
||||
`pip install -U langchain`
|
||||
|
||||
Now (only if you want to access things in experimental):
|
||||
|
||||
`pip install -U langchain langchain_experimental`
|
||||
|
||||
### Things in `langchain.experimental`
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.experimental import ...`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental import ...`
|
||||
|
||||
### PALChain
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.chains import PALChain`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental.pal_chain import PALChain`
|
||||
|
||||
### SQLDatabaseChain
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.chains import SQLDatabaseChain`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental.sql import SQLDatabaseChain`
|
||||
|
||||
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out [`create_sql_query_chain`](https://github.com/langchain-ai/langchain/blob/master/docs/extras/use_cases/tabular/sql_query.ipynb)
|
||||
|
||||
`from langchain.chains import create_sql_query_chain`
|
||||
|
||||
### `load_prompt` for Python files
|
||||
|
||||
Note: this only applies if you want to load Python files as prompts.
|
||||
If you want to load json/yaml files, no change is needed.
|
||||
|
||||
Previously:
|
||||
|
||||
`from langchain.prompts import load_prompt`
|
||||
|
||||
Now:
|
||||
|
||||
`from langchain_experimental.prompts import load_prompt`
|
||||
|
||||
90
Makefile
90
Makefile
@@ -1,13 +1,13 @@
|
||||
.PHONY: all clean help docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck spell_check spell_fix lint lint_package lint_tests format format_diff
|
||||
|
||||
.EXPORT_ALL_VARIABLES:
|
||||
UV_FROZEN = true
|
||||
|
||||
## help: Show this help info.
|
||||
help: Makefile
|
||||
@printf "\n\033[1mUsage: make <TARGETS> ...\033[0m\n\n\033[1mTargets:\033[0m\n\n"
|
||||
@sed -n 's/^## //p' $< | awk -F':' '{printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' | sort | sed -e 's/^/ /'
|
||||
|
||||
## all: Default target, shows help.
|
||||
all: help
|
||||
|
||||
## clean: Clean documentation and API documentation artifacts.
|
||||
clean: docs_clean api_docs_clean
|
||||
|
||||
@@ -16,79 +16,47 @@ clean: docs_clean api_docs_clean
|
||||
######################
|
||||
|
||||
## docs_build: Build the documentation.
|
||||
docs_build: docs_clean
|
||||
@echo "📚 Building LangChain documentation..."
|
||||
docs_build:
|
||||
cd docs && make build
|
||||
@echo "✅ Documentation build complete!"
|
||||
|
||||
## docs_clean: Clean the documentation build artifacts.
|
||||
docs_clean:
|
||||
@echo "🧹 Cleaning documentation artifacts..."
|
||||
cd docs && make clean
|
||||
@echo "✅ LangChain documentation cleaned"
|
||||
|
||||
## docs_linkcheck: Run linkchecker on the documentation.
|
||||
docs_linkcheck:
|
||||
@echo "🔗 Checking documentation links..."
|
||||
@if [ -d _dist/docs ]; then \
|
||||
uv run --group test linkchecker _dist/docs/ --ignore-url node_modules; \
|
||||
else \
|
||||
echo "⚠️ Documentation not built. Run 'make docs_build' first."; \
|
||||
exit 1; \
|
||||
fi
|
||||
@echo "✅ Link check complete"
|
||||
poetry run linkchecker _dist/docs/ --ignore-url node_modules
|
||||
|
||||
## api_docs_build: Build the API Reference documentation.
|
||||
api_docs_build: clean
|
||||
@echo "📖 Building API Reference documentation..."
|
||||
uv pip install -e libs/cli
|
||||
uv run --group docs python docs/api_reference/create_api_rst.py
|
||||
cd docs/api_reference && uv run --group docs make html
|
||||
uv run --group docs python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
|
||||
@echo "✅ API documentation built"
|
||||
@echo "🌐 Opening documentation in browser..."
|
||||
open docs/api_reference/_build/html/reference.html
|
||||
api_docs_build:
|
||||
poetry run python docs/api_reference/create_api_rst.py
|
||||
cd docs/api_reference && poetry run make html
|
||||
|
||||
API_PKG ?= text-splitters
|
||||
|
||||
api_docs_quick_preview: clean
|
||||
@echo "⚡ Building quick API preview for $(API_PKG)..."
|
||||
uv run --group docs python docs/api_reference/create_api_rst.py $(API_PKG)
|
||||
cd docs/api_reference && uv run --group docs make html
|
||||
uv run --group docs python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
|
||||
@echo "🌐 Opening preview in browser..."
|
||||
open docs/api_reference/_build/html/reference.html
|
||||
api_docs_quick_preview:
|
||||
poetry run pip install "pydantic<2"
|
||||
poetry run python docs/api_reference/create_api_rst.py $(API_PKG)
|
||||
cd docs/api_reference && poetry run make html
|
||||
open docs/api_reference/_build/html/$(shell echo $(API_PKG) | sed 's/-/_/g')_api_reference.html
|
||||
|
||||
## api_docs_clean: Clean the API Reference documentation build artifacts.
|
||||
api_docs_clean:
|
||||
@echo "🧹 Cleaning API documentation artifacts..."
|
||||
find ./docs/api_reference -name '*_api_reference.rst' -delete
|
||||
git clean -fdX ./docs/api_reference
|
||||
rm -f docs/api_reference/index.md
|
||||
@echo "✅ API documentation cleaned"
|
||||
|
||||
|
||||
## api_docs_linkcheck: Run linkchecker on the API Reference documentation.
|
||||
api_docs_linkcheck:
|
||||
@echo "🔗 Checking API documentation links..."
|
||||
@if [ -f docs/api_reference/_build/html/index.html ]; then \
|
||||
uv run --group test linkchecker docs/api_reference/_build/html/index.html; \
|
||||
else \
|
||||
echo "⚠️ API documentation not built. Run 'make api_docs_build' first."; \
|
||||
exit 1; \
|
||||
fi
|
||||
@echo "✅ API link check complete"
|
||||
poetry run linkchecker docs/api_reference/_build/html/index.html
|
||||
|
||||
## spell_check: Run codespell on the project.
|
||||
spell_check:
|
||||
@echo "✏️ Checking spelling across project..."
|
||||
uv run --group codespell codespell --toml pyproject.toml
|
||||
@echo "✅ Spell check complete"
|
||||
poetry run codespell --toml pyproject.toml
|
||||
|
||||
## spell_fix: Run codespell on the project and fix the errors.
|
||||
spell_fix:
|
||||
@echo "✏️ Fixing spelling errors across project..."
|
||||
uv run --group codespell codespell --toml pyproject.toml -w
|
||||
@echo "✅ Spelling errors fixed"
|
||||
poetry run codespell --toml pyproject.toml -w
|
||||
|
||||
######################
|
||||
# LINTING AND FORMATTING
|
||||
@@ -96,24 +64,12 @@ spell_fix:
|
||||
|
||||
## lint: Run linting on the project.
|
||||
lint lint_package lint_tests:
|
||||
@echo "🔍 Running code linting and checks..."
|
||||
uv run --group lint ruff check docs cookbook
|
||||
uv run --group lint ruff format docs cookbook cookbook --diff
|
||||
git --no-pager grep 'from langchain import' docs cookbook | grep -vE 'from langchain import (hub)' && echo "Error: no importing langchain from root in docs, except for hub" && exit 1 || exit 0
|
||||
|
||||
git --no-pager grep 'api.python.langchain.com' -- docs/docs ':!docs/docs/additional_resources/arxiv_references.mdx' ':!docs/docs/integrations/document_loaders/sitemap.ipynb' || exit 0 && \
|
||||
echo "Error: you should link python.langchain.com/api_reference, not api.python.langchain.com in the docs" && \
|
||||
exit 1
|
||||
@echo "✅ Linting complete"
|
||||
poetry run ruff check docs templates cookbook
|
||||
poetry run ruff format docs templates cookbook --diff
|
||||
poetry run ruff check --select I docs templates cookbook
|
||||
git grep 'from langchain import' docs/docs templates cookbook | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
|
||||
|
||||
## format: Format the project files.
|
||||
format format_diff:
|
||||
@echo "🎨 Formatting project files..."
|
||||
uv run --group lint ruff format docs cookbook
|
||||
uv run --group lint ruff check --fix docs cookbook
|
||||
@echo "✅ Formatting complete"
|
||||
|
||||
update-package-downloads:
|
||||
@echo "📊 Updating package download statistics..."
|
||||
uv run python docs/scripts/packages_yml_get_downloads.py
|
||||
@echo "✅ Package downloads updated"
|
||||
poetry run ruff format docs templates cookbook
|
||||
poetry run ruff check --select I --fix docs templates cookbook
|
||||
|
||||
172
README.md
172
README.md
@@ -1,87 +1,137 @@
|
||||
<picture>
|
||||
<source media="(prefers-color-scheme: light)" srcset="docs/static/img/logo-dark.svg">
|
||||
<source media="(prefers-color-scheme: dark)" srcset="docs/static/img/logo-light.svg">
|
||||
<img alt="LangChain Logo" src="docs/static/img/logo-dark.svg" width="80%">
|
||||
</picture>
|
||||
# 🦜️🔗 LangChain
|
||||
|
||||
<div>
|
||||
<br>
|
||||
</div>
|
||||
⚡ Build context-aware reasoning applications ⚡
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/releases)
|
||||
[](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml)
|
||||
[](https://opensource.org/licenses/MIT)
|
||||
[](https://pypistats.org/packages/langchain-core)
|
||||
[](https://star-history.com/#langchain-ai/langchain)
|
||||
[](https://libraries.io/github/langchain-ai/langchain)
|
||||
[](https://github.com/langchain-ai/langchain/issues)
|
||||
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
|
||||
[<img src="https://github.com/codespaces/badge.svg" alt="Open in Github Codespace" title="Open in Github Codespace" width="150" height="20">](https://codespaces.new/langchain-ai/langchain)
|
||||
[](https://codespaces.new/langchain-ai/langchain)
|
||||
[](https://discord.gg/6adMQxSpJS)
|
||||
[](https://twitter.com/langchainai)
|
||||
[](https://codspeed.io/langchain-ai/langchain)
|
||||
|
||||
> [!NOTE]
|
||||
> Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
|
||||
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
|
||||
|
||||
LangChain is a framework for building LLM-powered applications. It helps you chain
|
||||
together interoperable components and third-party integrations to simplify AI
|
||||
application development — all while future-proofing decisions as the underlying
|
||||
technology evolves.
|
||||
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
|
||||
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
|
||||
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
|
||||
|
||||
## Quick Install
|
||||
|
||||
With pip:
|
||||
```bash
|
||||
pip install -U langchain
|
||||
pip install langchain
|
||||
```
|
||||
|
||||
To learn more about LangChain, check out
|
||||
[the docs](https://python.langchain.com/docs/introduction/). If you’re looking for more
|
||||
advanced customization or agent orchestration, check out
|
||||
[LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building
|
||||
controllable agent workflows.
|
||||
With conda:
|
||||
```bash
|
||||
conda install langchain -c conda-forge
|
||||
```
|
||||
|
||||
## Why use LangChain?
|
||||
## 🤔 What is LangChain?
|
||||
|
||||
LangChain helps developers build applications powered by LLMs through a standard
|
||||
interface for models, embeddings, vector stores, and more.
|
||||
**LangChain** is a framework for developing applications powered by large language models (LLMs).
|
||||
|
||||
Use LangChain for:
|
||||
For these applications, LangChain simplifies the entire application lifecycle:
|
||||
|
||||
- **Real-time data augmentation**. Easily connect LLMs to diverse data sources and
|
||||
external / internal systems, drawing from LangChain’s vast library of integrations with
|
||||
model providers, tools, vector stores, retrievers, and more.
|
||||
- **Model interoperability**. Swap models in and out as your engineering team
|
||||
experiments to find the best choice for your application’s needs. As the industry
|
||||
frontier evolves, adapt quickly — LangChain’s abstractions keep you moving without
|
||||
losing momentum.
|
||||
- **Open-source libraries**: Build your applications using LangChain's [modular building blocks](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel) and [components](https://python.langchain.com/v0.2/docs/concepts/#components). Integrate with hundreds of [third-party providers](https://python.langchain.com/v0.2/docs/integrations/platforms/).
|
||||
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://docs.smith.langchain.com/) so that you can constantly optimize and deploy with confidence.
|
||||
- **Deployment**: Turn any chain into a REST API with [LangServe](https://python.langchain.com/v0.2/docs/langserve/).
|
||||
|
||||
## LangChain’s ecosystem
|
||||
### Open-source libraries
|
||||
- **`langchain-core`**: Base abstractions and LangChain Expression Language.
|
||||
- **`langchain-community`**: Third party integrations.
|
||||
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
|
||||
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
|
||||
- **[`LangGraph`](https://langchain-ai.github.io/langgraph/)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
|
||||
|
||||
While the LangChain framework can be used standalone, it also integrates seamlessly
|
||||
with any LangChain product, giving developers a full suite of tools when building LLM
|
||||
applications.
|
||||
### Productionization:
|
||||
- **[LangSmith](https://docs.smith.langchain.com/)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
|
||||
|
||||
To improve your LLM application development, pair LangChain with:
|
||||
### Deployment:
|
||||
- **[LangServe](https://python.langchain.com/v0.2/docs/langserve/)**: A library for deploying LangChain chains as REST APIs.
|
||||
|
||||
- [LangSmith](http://www.langchain.com/langsmith) - Helpful for agent evals and
|
||||
observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain
|
||||
visibility in production, and improve performance over time.
|
||||
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can
|
||||
reliably handle complex tasks with LangGraph, our low-level agent orchestration
|
||||
framework. LangGraph offers customizable architecture, long-term memory, and
|
||||
human-in-the-loop workflows — and is trusted in production by companies like LinkedIn,
|
||||
Uber, Klarna, and GitLab.
|
||||
- [LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/langgraph_platform/) - Deploy
|
||||
and scale agents effortlessly with a purpose-built deployment platform for long
|
||||
running, stateful workflows. Discover, reuse, configure, and share agents across
|
||||
teams — and iterate quickly with visual prototyping in
|
||||
[LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/).
|
||||

|
||||
|
||||
## Additional resources
|
||||
## 🧱 What can you build with LangChain?
|
||||
|
||||
- [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with
|
||||
guided examples on getting started with LangChain.
|
||||
- [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code
|
||||
snippets for topics such as tool calling, RAG use cases, and more.
|
||||
- [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key
|
||||
concepts behind the LangChain framework.
|
||||
- [LangChain Forum](https://forum.langchain.com/): Connect with the community and share all of your technical questions, ideas, and feedback.
|
||||
- [API Reference](https://python.langchain.com/api_reference/): Detailed reference on
|
||||
navigating base packages and integrations for LangChain.
|
||||
**❓ Question answering with RAG**
|
||||
|
||||
- [Documentation](https://python.langchain.com/v0.2/docs/tutorials/rag/)
|
||||
- End-to-end Example: [Chat LangChain](https://chat.langchain.com) and [repo](https://github.com/langchain-ai/chat-langchain)
|
||||
|
||||
**🧱 Extracting structured output**
|
||||
|
||||
- [Documentation](https://python.langchain.com/v0.2/docs/tutorials/extraction/)
|
||||
- End-to-end Example: [SQL Llama2 Template](https://github.com/langchain-ai/langchain-extract/)
|
||||
|
||||
**🤖 Chatbots**
|
||||
|
||||
- [Documentation](https://python.langchain.com/v0.2/docs/tutorials/chatbot/)
|
||||
- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
|
||||
|
||||
And much more! Head to the [Tutorials](https://python.langchain.com/v0.2/docs/tutorials/) section of the docs for more.
|
||||
|
||||
## 🚀 How does LangChain help?
|
||||
The main value props of the LangChain libraries are:
|
||||
1. **Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
|
||||
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
|
||||
|
||||
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
|
||||
|
||||
## LangChain Expression Language (LCEL)
|
||||
|
||||
LCEL is the foundation of many of LangChain's components, and is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.
|
||||
|
||||
- **[Overview](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel)**: LCEL and its benefits
|
||||
- **[Interface](https://python.langchain.com/v0.2/docs/concepts/#runnable-interface)**: The standard Runnable interface for LCEL objects
|
||||
- **[Primitives](https://python.langchain.com/v0.2/docs/how_to/#langchain-expression-language-lcel)**: More on the primitives LCEL includes
|
||||
- **[Cheatsheet](https://python.langchain.com/v0.2/docs/how_to/lcel_cheatsheet/)**: Quick overview of the most common usage patterns
|
||||
|
||||
## Components
|
||||
|
||||
Components fall into the following **modules**:
|
||||
|
||||
**📃 Model I/O**
|
||||
|
||||
This includes [prompt management](https://python.langchain.com/v0.2/docs/concepts/#prompt-templates), [prompt optimization](https://python.langchain.com/v0.2/docs/concepts/#example-selectors), a generic interface for [chat models](https://python.langchain.com/v0.2/docs/concepts/#chat-models) and [LLMs](https://python.langchain.com/v0.2/docs/concepts/#llms), and common utilities for working with [model outputs](https://python.langchain.com/v0.2/docs/concepts/#output-parsers).
|
||||
|
||||
**📚 Retrieval**
|
||||
|
||||
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/v0.2/docs/concepts/#document-loaders) from a variety of sources, [preparing it](https://python.langchain.com/v0.2/docs/concepts/#text-splitters), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/v0.2/docs/concepts/#retrievers) it for use in the generation step.
|
||||
|
||||
**🤖 Agents**
|
||||
|
||||
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. LangChain provides a [standard interface for agents](https://python.langchain.com/v0.2/docs/concepts/#agents) along with the [LangGraph](https://github.com/langchain-ai/langgraph) extension for building custom agents.
|
||||
|
||||
## 📖 Documentation
|
||||
|
||||
Please see [here](https://python.langchain.com) for full documentation, which includes:
|
||||
|
||||
- [Introduction](https://python.langchain.com/v0.2/docs/introduction/): Overview of the framework and the structure of the docs.
|
||||
- [Tutorials](https://python.langchain.com/docs/use_cases/): If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
|
||||
- [How-to guides](https://python.langchain.com/v0.2/docs/how_to/): Answers to “How do I….?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
|
||||
- [Conceptual guide](https://python.langchain.com/v0.2/docs/concepts/): Conceptual explanations of the key parts of the framework.
|
||||
- [API Reference](https://api.python.langchain.com): Thorough documentation of every class and method.
|
||||
|
||||
## 🌐 Ecosystem
|
||||
|
||||
- [🦜🛠️ LangSmith](https://docs.smith.langchain.com/): Tracing and evaluating your language model applications and intelligent agents to help you move from prototype to production.
|
||||
- [🦜🕸️ LangGraph](https://langchain-ai.github.io/langgraph/): Creating stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain primitives.
|
||||
- [🦜🏓 LangServe](https://python.langchain.com/docs/langserve): Deploying LangChain runnables and chains as REST APIs.
|
||||
- [LangChain Templates](https://python.langchain.com/v0.2/docs/templates/): Example applications hosted with LangServe.
|
||||
|
||||
|
||||
## 💁 Contributing
|
||||
|
||||
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
|
||||
|
||||
For detailed information on how to contribute, see [here](https://python.langchain.com/v0.2/docs/contributing/).
|
||||
|
||||
## 🌟 Contributors
|
||||
|
||||
[](https://github.com/langchain-ai/langchain/graphs/contributors)
|
||||
|
||||
68
SECURITY.md
68
SECURITY.md
@@ -1,44 +1,20 @@
|
||||
# Security Policy
|
||||
|
||||
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the ability to access, interact with and manipulate external resources.
|
||||
|
||||
## Best practices
|
||||
|
||||
When building such applications developers should remember to follow good security practices:
|
||||
|
||||
* [**Limit Permissions**](https://en.wikipedia.org/wiki/Principle_of_least_privilege): Scope permissions specifically to the application's need. Granting broad or excessive permissions can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials, disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), specifying proxy configurations to control external requests, etc. as appropriate for your application.
|
||||
* **Anticipate Potential Misuse**: Just as humans can err, so can Large Language Models (LLMs). Always assume that any system access or credentials may be used in any way allowed by the permissions they are assigned. For example, if a pair of database credentials allows deleting data, it's safest to assume that any LLM able to use those credentials may in fact delete data.
|
||||
* [**Defense in Depth**](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)): No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate, the odds that a Large Language Model (LLM) may make a mistake. It's best to combine multiple layered security approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.
|
||||
|
||||
Risks of not doing so include, but are not limited to:
|
||||
|
||||
* Data corruption or loss.
|
||||
* Unauthorized access to confidential information.
|
||||
* Compromised performance or availability of critical resources.
|
||||
|
||||
Example scenarios with mitigation strategies:
|
||||
|
||||
* A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container.
|
||||
* A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse.
|
||||
* A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
|
||||
|
||||
If you're building applications that access external resources like file systems, APIs
|
||||
or databases, consider speaking with your company's security team to determine how to best
|
||||
design and secure your applications.
|
||||
|
||||
## Reporting OSS Vulnerabilities
|
||||
|
||||
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
|
||||
a bounty program for our open source projects.
|
||||
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
|
||||
a bounty program for our open source projects.
|
||||
|
||||
Please report security vulnerabilities associated with the LangChain
|
||||
open source projects at [huntr](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true).
|
||||
Please report security vulnerabilities associated with the LangChain
|
||||
open source projects by visiting the following link:
|
||||
|
||||
[https://huntr.com/bounties/disclose/](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true)
|
||||
|
||||
Before reporting a vulnerability, please review:
|
||||
|
||||
1) In-Scope Targets and Out-of-Scope Targets below.
|
||||
2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure.
|
||||
3) The [Best Practices](#best-practices) above to
|
||||
3) LangChain [security guidelines](https://python.langchain.com/docs/security) to
|
||||
understand what we consider to be a security vulnerability vs. developer
|
||||
responsibility.
|
||||
|
||||
@@ -46,39 +22,39 @@ Before reporting a vulnerability, please review:
|
||||
|
||||
The following packages and repositories are eligible for bug bounties:
|
||||
|
||||
* langchain-core
|
||||
* langchain (see exceptions)
|
||||
* langchain-community (see exceptions)
|
||||
* langgraph
|
||||
* langserve
|
||||
- langchain-core
|
||||
- langchain (see exceptions)
|
||||
- langchain-community (see exceptions)
|
||||
- langgraph
|
||||
- langserve
|
||||
|
||||
### Out of Scope Targets
|
||||
|
||||
All out of scope targets defined by huntr as well as:
|
||||
|
||||
* **langchain-experimental**: This repository is for experimental code and is not
|
||||
eligible for bug bounties (see [package warning](https://pypi.org/project/langchain-experimental/)), bug reports to it will be marked as interesting or waste of
|
||||
- **langchain-experimental**: This repository is for experimental code and is not
|
||||
eligible for bug bounties, bug reports to it will be marked as interesting or waste of
|
||||
time and published with no bounty attached.
|
||||
* **tools**: Tools in either langchain or langchain-community are not eligible for bug
|
||||
- **tools**: Tools in either langchain or langchain-community are not eligible for bug
|
||||
bounties. This includes the following directories
|
||||
* libs/langchain/langchain/tools
|
||||
* libs/community/langchain_community/tools
|
||||
* Please review the [Best Practices](#best-practices)
|
||||
- langchain/tools
|
||||
- langchain-community/tools
|
||||
- Please review our [security guidelines](https://python.langchain.com/docs/security)
|
||||
for more details, but generally tools interact with the real world. Developers are
|
||||
expected to understand the security implications of their code and are responsible
|
||||
for the security of their tools.
|
||||
* Code documented with security notices. This will be decided on a case by
|
||||
- Code documented with security notices. This will be decided done on a case by
|
||||
case basis, but likely will not be eligible for a bounty as the code is already
|
||||
documented with guidelines for developers that should be followed for making their
|
||||
application secure.
|
||||
* Any LangSmith related repositories or APIs (see [Reporting LangSmith Vulnerabilities](#reporting-langsmith-vulnerabilities)).
|
||||
- Any LangSmith related repositories or APIs see below.
|
||||
|
||||
## Reporting LangSmith Vulnerabilities
|
||||
|
||||
Please report security vulnerabilities associated with LangSmith by email to `security@langchain.dev`.
|
||||
|
||||
* LangSmith site: [https://smith.langchain.com](https://smith.langchain.com)
|
||||
* SDK client: [https://github.com/langchain-ai/langsmith-sdk](https://github.com/langchain-ai/langsmith-sdk)
|
||||
- LangSmith site: https://smith.langchain.com
|
||||
- SDK client: https://github.com/langchain-ai/langsmith-sdk
|
||||
|
||||
### Other Security Concerns
|
||||
|
||||
|
||||
@@ -60,7 +60,7 @@
|
||||
"id": "CI8Elyc5gBQF"
|
||||
},
|
||||
"source": [
|
||||
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint is ready, you need to copy its number."
|
||||
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint it ready, you need to copy its number."
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -47,7 +47,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 1,
|
||||
"id": "6a75a5c6-34ee-4ab9-a664-d9b432d812ee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -61,7 +61,7 @@
|
||||
],
|
||||
"source": [
|
||||
"# Local\n",
|
||||
"from langchain_ollama import ChatOllama\n",
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"\n",
|
||||
"llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n",
|
||||
"llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n",
|
||||
|
||||
@@ -64,7 +64,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -U langchain openai langchain-chroma langchain-experimental # (newest versions required for multi-modal)"
|
||||
"! pip install -U langchain openai chromadb langchain-experimental # (newest versions required for multi-modal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -355,7 +355,7 @@
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
|
||||
@@ -37,7 +37,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%pip install -U --quiet langchain langchain-chroma langchain-community openai langchain-experimental\n",
|
||||
"%pip install -U --quiet langchain langchain_community openai chromadb langchain-experimental\n",
|
||||
"%pip install --quiet \"unstructured[all-docs]\" pypdf pillow pydantic lxml pillow matplotlib chromadb tiktoken"
|
||||
]
|
||||
},
|
||||
@@ -144,7 +144,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"id": "kWDWfSDBMPl8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -185,7 +185,7 @@
|
||||
" )\n",
|
||||
" # Text summary chain\n",
|
||||
" model = VertexAI(\n",
|
||||
" temperature=0, model_name=\"gemini-2.5-flash\", max_tokens=1024\n",
|
||||
" temperature=0, model_name=\"gemini-pro\", max_tokens=1024\n",
|
||||
" ).with_fallbacks([empty_response])\n",
|
||||
" summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()\n",
|
||||
"\n",
|
||||
@@ -235,7 +235,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 6,
|
||||
"id": "PeK9bzXv3olF",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -254,7 +254,7 @@
|
||||
"\n",
|
||||
"def image_summarize(img_base64, prompt):\n",
|
||||
" \"\"\"Make image summary\"\"\"\n",
|
||||
" model = ChatVertexAI(model=\"gemini-2.5-flash\", max_tokens=1024)\n",
|
||||
" model = ChatVertexAI(model=\"gemini-pro-vision\", max_tokens=1024)\n",
|
||||
"\n",
|
||||
" msg = model.invoke(\n",
|
||||
" [\n",
|
||||
@@ -344,8 +344,8 @@
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.embeddings import VertexAIEmbeddings\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"\n",
|
||||
@@ -394,7 +394,7 @@
|
||||
"# The vectorstore to use to index the summaries\n",
|
||||
"vectorstore = Chroma(\n",
|
||||
" collection_name=\"mm_rag_cj_blog\",\n",
|
||||
" embedding_function=VertexAIEmbeddings(model_name=\"text-embedding-005\"),\n",
|
||||
" embedding_function=VertexAIEmbeddings(model_name=\"textembedding-gecko@latest\"),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create retriever\n",
|
||||
@@ -431,7 +431,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 9,
|
||||
"id": "GlwCErBaCKQW",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -445,7 +445,7 @@
|
||||
"\n",
|
||||
"\n",
|
||||
"def plt_img_base64(img_base64):\n",
|
||||
" \"\"\"Display base64 encoded string as image\"\"\"\n",
|
||||
" \"\"\"Disply base64 encoded string as image\"\"\"\n",
|
||||
" # Create an HTML img tag with the base64 string as the source\n",
|
||||
" image_html = f'<img src=\"data:image/jpeg;base64,{img_base64}\" />'\n",
|
||||
" # Display the image by rendering the HTML\n",
|
||||
@@ -553,7 +553,7 @@
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
" # Multi-modal LLM\n",
|
||||
" model = ChatVertexAI(temperature=0, model_name=\"gemini-2.5-flash\", max_tokens=1024)\n",
|
||||
" model = ChatVertexAI(temperature=0, model_name=\"gemini-pro-vision\", max_tokens=1024)\n",
|
||||
"\n",
|
||||
" # RAG pipeline\n",
|
||||
" chain = (\n",
|
||||
|
||||
@@ -7,7 +7,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pip install -U langchain umap-learn scikit-learn langchain_community tiktoken langchain-openai langchainhub langchain-chroma langchain-anthropic"
|
||||
"pip install -U langchain umap-learn scikit-learn langchain_community tiktoken langchain-openai langchainhub chromadb langchain-anthropic"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -645,7 +645,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"\n",
|
||||
"# Initialize all_texts with leaf_texts\n",
|
||||
"all_texts = leaf_texts.copy()\n",
|
||||
|
||||
@@ -4,8 +4,6 @@ Example code for building applications with LangChain, with an emphasis on more
|
||||
|
||||
Notebook | Description
|
||||
:- | :-
|
||||
[agent_fireworks_ai_langchain_mongodb.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/agent_fireworks_ai_langchain_mongodb.ipynb) | Build an AI Agent With Memory Using MongoDB, LangChain and FireWorksAI.
|
||||
[mongodb-langchain-cache-memory.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/mongodb-langchain-cache-memory.ipynb) | Build a RAG Application with Semantic Cache Using MongoDB and LangChain.
|
||||
[LLaMA2_sql_chat.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/LLaMA2_sql_chat.ipynb) | Build a chat application that interacts with a SQL database using an open source llm (llama2), specifically demonstrated on an SQLite database containing rosters.
|
||||
[Semi_Structured_RAG.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_Structured_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data, including text and tables, using unstructured for parsing, multi-vector retriever for storing, and lcel for implementing chains.
|
||||
[Semi_structured_and_multi_moda...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using unstructured for parsing, multi-vector retriever for storage and retrieval, and lcel for implementing chains.
|
||||
@@ -21,6 +19,7 @@ Notebook | Description
|
||||
[code-analysis-deeplake.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/code-analysis-deeplake.ipynb) | Analyze its own code base with the help of gpt and activeloop's deep lake.
|
||||
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval.ipynb) | Build a custom agent that can interact with ai plugins by retrieving tools and creating natural language wrappers around openapi endpoints.
|
||||
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval_using_plugnplai.ipynb) | Build a custom agent with plugin retrieval functionality, utilizing ai plugins from the `plugnplai` directory.
|
||||
[databricks_sql_db.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/databricks_sql_db.ipynb) | Connect to databricks runtimes and databricks sql.
|
||||
[deeplake_semantic_search_over_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/deeplake_semantic_search_over_chat.ipynb) | Perform semantic search and question-answering over a group chat using activeloop's deep lake with gpt4.
|
||||
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with elasticsearch analytics databases in natural language and build search queries via the elasticsearch dsl API.
|
||||
[extraction_openai_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/extraction_openai_tools.ipynb) | Structured Data Extraction with OpenAI Tools
|
||||
@@ -37,7 +36,6 @@ Notebook | Description
|
||||
[llm_symbolic_math.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/llm_symbolic_math.ipynb) | Solve algebraic equations with the help of llms (language learning models) and sympy, a python library for symbolic mathematics.
|
||||
[meta_prompt.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/meta_prompt.ipynb) | Implement the meta-prompt concept, which is a method for building self-improving agents that reflect on their own performance and modify their instructions accordingly.
|
||||
[multi_modal_output_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multi_modal_output_agent.ipynb) | Generate multi-modal outputs, specifically images and text.
|
||||
[multi_modal_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multi_modal_RAG_vdms.ipynb) | Perform retrieval-augmented generation (rag) on documents including text and images, using unstructured for parsing, Intel's Visual Data Management System (VDMS) as the vectorstore, and chains.
|
||||
[multi_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multi_player_dnd.ipynb) | Simulate multi-player dungeons & dragons games, with a custom function determining the speaking schedule of the agents.
|
||||
[multiagent_authoritarian.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_authoritarian.ipynb) | Implement a multi-agent simulation where a privileged agent controls the conversation, including deciding who speaks and when the conversation ends, in the context of a simulated news network.
|
||||
[multiagent_bidding.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_bidding.ipynb) | Implement a multi-agent simulation where agents bid to speak, with the highest bidder speaking next, demonstrated through a fictitious presidential debate example.
|
||||
@@ -49,7 +47,7 @@ Notebook | Description
|
||||
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
|
||||
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
|
||||
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
|
||||
[rag_upstage_document_parse_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_document_parse_groundedness_check.ipynb) | End-to-end RAG example using Upstage Document Parse and Groundedness Check.
|
||||
[rag_upstage_layout_analysis_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_layout_analysis_groundedness_check.ipynb) | End-to-end RAG example using Upstage Layout Analysis and Groundedness Check.
|
||||
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented-generation (rag) on a PostgreSQL database using pgvector.
|
||||
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware ai sales agent, salesgpt, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
|
||||
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
|
||||
@@ -59,8 +57,4 @@ Notebook | Description
|
||||
[two_agent_debate_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_agent_debate_tools.ipynb) | Simulate multi-agent dialogues where the agents can utilize various tools.
|
||||
[two_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_player_dnd.ipynb) | Simulate a two-player dungeons & dragons game, where a dialogue simulator class is used to coordinate the dialogue between the protagonist and the dungeon master.
|
||||
[wikibase_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/wikibase_agent.ipynb) | Create a simple wikibase agent that utilizes sparql generation, with testing done on http://wikidata.org.
|
||||
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
|
||||
[rag-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag-locally-on-intel-cpu.ipynb) | Perform Retrieval-Augmented-Generation (RAG) on locally downloaded open-source models using langchain and open source tools and execute it on Intel Xeon CPU. We showed an example of how to apply RAG on Llama 2 model and enable it to answer the queries related to Intel Q1 2024 earnings release.
|
||||
[visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
|
||||
[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
|
||||
[rag-agents-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/local_rag_agents_intel_cpu.ipynb) | Build a RAG agent locally with open source models that routes questions through one of two paths to find answers. The agent generates answers based on documents retrieved from either the vector database or retrieved from web search. If the vector database lacks relevant information, the agent opts for web search. Open-source models for LLM and embeddings are used locally on an Intel Xeon CPU to execute this pipeline.
|
||||
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
|
||||
@@ -39,7 +39,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml langchainhub"
|
||||
"! pip install langchain unstructured[all-docs] pydantic lxml langchainhub"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -320,7 +320,7 @@
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
|
||||
@@ -59,7 +59,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml"
|
||||
"! pip install langchain unstructured[all-docs] pydantic lxml"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -375,7 +375,7 @@
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
|
||||
@@ -59,7 +59,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml"
|
||||
"! pip install langchain unstructured[all-docs] pydantic lxml"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -204,14 +204,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 4,
|
||||
"id": "523e6ed2-2132-4748-bdb7-db765f20648d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_ollama import ChatOllama"
|
||||
"from langchain_core.prompts import ChatPromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -378,8 +378,8 @@
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.embeddings import GPT4AllEmbeddings\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"# The vectorstore to use to index the child chunks\n",
|
||||
|
||||
@@ -19,7 +19,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -U langchain openai langchain_chroma langchain-experimental # (newest versions required for multi-modal)"
|
||||
"! pip install -U langchain openai chromadb langchain-experimental # (newest versions required for multi-modal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -30,7 +30,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
|
||||
"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml matplotlib tiktoken open_clip_torch torch"
|
||||
"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml pillow matplotlib tiktoken open_clip_torch torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -132,7 +132,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"baseline = Chroma.from_texts(\n",
|
||||
@@ -409,7 +409,7 @@
|
||||
" table_summaries,\n",
|
||||
" tables,\n",
|
||||
" image_summaries,\n",
|
||||
" img_base64_list,\n",
|
||||
" image_summaries,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -22,25 +22,13 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e8d63d14-138d-4aa5-a741-7fd3537d00aa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 16,
|
||||
"id": "2e87c10a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAI, OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
@@ -49,7 +37,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 17,
|
||||
"id": "0b7b772b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -66,10 +54,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 18,
|
||||
"id": "f2675861",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Running Chroma using direct local API.\n",
|
||||
"Using DuckDB in-memory for database. Data will be transient.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
@@ -84,7 +81,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"id": "bc5403d4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -96,25 +93,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 5,
|
||||
"id": "1431cded",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"USER_AGENT environment variable not set, consider setting it to identify your requests.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders import WebBaseLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "915d3ff3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -124,20 +113,16 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 7,
|
||||
"id": "96a2edf8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Created a chunk of size 2122, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 3187, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 1017, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 1049, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 1256, which is longer than the specified 1000\n",
|
||||
"Created a chunk of size 2321, which is longer than the specified 1000\n"
|
||||
"Running Chroma using direct local API.\n",
|
||||
"Using DuckDB in-memory for database. Data will be transient.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -150,6 +135,14 @@
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "71ecef90",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c0a6c031",
|
||||
@@ -160,30 +153,31 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 43,
|
||||
"id": "eb142786",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Import things that are needed generically\n",
|
||||
"from langchain.agents import Tool"
|
||||
"from langchain.agents import AgentType, Tool, initialize_agent\n",
|
||||
"from langchain_openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 44,
|
||||
"id": "850bc4e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"state_of_union_qa_system\",\n",
|
||||
" name=\"State of Union QA System\",\n",
|
||||
" func=state_of_union.run,\n",
|
||||
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"ruff_qa_system\",\n",
|
||||
" name=\"Ruff QA System\",\n",
|
||||
" func=ruff.run,\n",
|
||||
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
|
||||
" ),\n",
|
||||
@@ -192,116 +186,94 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "70c461d8-aaca-4f2a-9a93-bf35841cc615",
|
||||
"execution_count": 45,
|
||||
"id": "fc47f230",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
|
||||
"# Construct the agent. We will use the default agent type here.\n",
|
||||
"# See documentation for a full list of options.\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "a6d2b911-3044-4430-a35b-75832bb45334",
|
||||
"execution_count": 46,
|
||||
"id": "10ca2db8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"What did biden say about ketanji brown jackson in the state of the union address?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" state_of_union_qa_system (call_26QlRdsptjEJJZjFsAUjEbaH)\n",
|
||||
" Call ID: call_26QlRdsptjEJJZjFsAUjEbaH\n",
|
||||
" Args:\n",
|
||||
" __arg1: What did Biden say about Ketanji Brown Jackson in the state of the union address?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: state_of_union_qa_system\n",
|
||||
"\n",
|
||||
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
|
||||
"Action: State of Union QA System\n",
|
||||
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
|
||||
"\n",
|
||||
"In the State of the Union address, Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\n",
|
||||
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "e836b4cd-abf7-49eb-be0e-b9ad501213f3",
|
||||
"execution_count": 47,
|
||||
"id": "4e91b811",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Why use ruff over flake8?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" ruff_qa_system (call_KqDoWeO9bo9OAXdxOsCb6msC)\n",
|
||||
" Call ID: call_KqDoWeO9bo9OAXdxOsCb6msC\n",
|
||||
" Args:\n",
|
||||
" __arg1: Why use ruff over flake8?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: ruff_qa_system\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"There are a few reasons why someone might choose to use Ruff over Flake8:\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
|
||||
"Action: Ruff QA System\n",
|
||||
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
|
||||
"\n",
|
||||
"1. Larger rule set: Ruff implements over 800 rules, while Flake8 only implements around 200. This means that Ruff can catch more potential issues in your code.\n",
|
||||
"\n",
|
||||
"2. Better compatibility with other tools: Ruff is designed to work well with other tools like Black, isort, and type checkers like Mypy. This means that you can use Ruff alongside these tools to get more comprehensive feedback on your code.\n",
|
||||
"\n",
|
||||
"3. Automatic fixing of lint violations: Unlike Flake8, Ruff is capable of automatically fixing its own lint violations. This can save you time and effort when fixing issues in your code.\n",
|
||||
"\n",
|
||||
"4. Native implementation of popular Flake8 plugins: Ruff re-implements some of the most popular Flake8 plugins natively, which means you don't have to install and configure multiple plugins to get the same functionality.\n",
|
||||
"\n",
|
||||
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8, making it a popular choice for many developers.\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\n",
|
||||
"You might choose to use Ruff over Flake8 for several reasons:\n",
|
||||
"\n",
|
||||
"1. Ruff has a much larger rule set, implementing over 800 rules compared to Flake8's roughly 200, so it can catch more potential issues.\n",
|
||||
"2. Ruff is designed to work better with other tools like Black, isort, and type checkers like Mypy, providing more comprehensive code feedback.\n",
|
||||
"3. Ruff can automatically fix its own lint violations, which Flake8 cannot, saving time and effort.\n",
|
||||
"4. Ruff natively implements some popular Flake8 plugins, so you don't need to install and configure multiple plugins separately.\n",
|
||||
"\n",
|
||||
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8.\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 47,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"Why use ruff over flake8?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\"Why use ruff over flake8?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -324,20 +296,20 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 48,
|
||||
"id": "f59b377e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"state_of_union_qa_system\",\n",
|
||||
" name=\"State of Union QA System\",\n",
|
||||
" func=state_of_union.run,\n",
|
||||
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
|
||||
" return_direct=True,\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"ruff_qa_system\",\n",
|
||||
" name=\"Ruff QA System\",\n",
|
||||
" func=ruff.run,\n",
|
||||
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
|
||||
" return_direct=True,\n",
|
||||
@@ -347,92 +319,90 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "06f69c0f-c83d-4b7f-a1c8-7614aced3bae",
|
||||
"execution_count": 49,
|
||||
"id": "8615707a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "a6b38c12-ac25-43c0-b9c2-2b1985ab4825",
|
||||
"execution_count": 50,
|
||||
"id": "36e718a9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"What did biden say about ketanji brown jackson in the state of the union address?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" state_of_union_qa_system (call_yjxh11OnZiauoyTAn9npWdxj)\n",
|
||||
" Call ID: call_yjxh11OnZiauoyTAn9npWdxj\n",
|
||||
" Args:\n",
|
||||
" __arg1: What did Biden say about Ketanji Brown Jackson in the state of the union address?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: state_of_union_qa_system\n",
|
||||
"\n",
|
||||
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
|
||||
"Action: State of Union QA System\n",
|
||||
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\" Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 50,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\n",
|
||||
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "88f08d86-7972-4148-8128-3ac8898ad68a",
|
||||
"execution_count": 51,
|
||||
"id": "edfd0a1a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"Why use ruff over flake8?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" ruff_qa_system (call_GiWWfwF6wbbRFQrHlHbhRtGW)\n",
|
||||
" Call ID: call_GiWWfwF6wbbRFQrHlHbhRtGW\n",
|
||||
" Args:\n",
|
||||
" __arg1: What are the advantages of using ruff over flake8 for Python linting?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: ruff_qa_system\n",
|
||||
"\n",
|
||||
" Ruff has a larger rule set, supports automatic fixing of lint violations, and does not require the installation of additional plugins. It also has better compatibility with Black and can be used alongside a type checker for more comprehensive code analysis.\n"
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
|
||||
"Action: Ruff QA System\n",
|
||||
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 51,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"Why use ruff over flake8?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\"Why use ruff over flake8?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -447,19 +417,19 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 57,
|
||||
"id": "d397a233",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"state_of_union_qa_system\",\n",
|
||||
" name=\"State of Union QA System\",\n",
|
||||
" func=state_of_union.run,\n",
|
||||
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"ruff_qa_system\",\n",
|
||||
" name=\"Ruff QA System\",\n",
|
||||
" func=ruff.run,\n",
|
||||
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
|
||||
" ),\n",
|
||||
@@ -468,60 +438,60 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "41743f29-150d-40ba-aa8e-3a63c32216aa",
|
||||
"execution_count": 58,
|
||||
"id": "06157240",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langgraph.prebuilt import create_react_agent\n",
|
||||
"\n",
|
||||
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
|
||||
"# Construct the agent. We will use the default agent type here.\n",
|
||||
"# See documentation for a full list of options.\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "e20e81dd-284a-4d07-9160-63a84b65cba8",
|
||||
"execution_count": 59,
|
||||
"id": "b492b520",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"================================\u001b[1m Human Message \u001b[0m=================================\n",
|
||||
"\n",
|
||||
"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"Tool Calls:\n",
|
||||
" ruff_qa_system (call_VOnxiOEehauQyVOTjDJkR5L2)\n",
|
||||
" Call ID: call_VOnxiOEehauQyVOTjDJkR5L2\n",
|
||||
" Args:\n",
|
||||
" __arg1: What tool does ruff use to run over Jupyter Notebooks?\n",
|
||||
" state_of_union_qa_system (call_AbSsXAxwe4JtCRhga926SxOZ)\n",
|
||||
" Call ID: call_AbSsXAxwe4JtCRhga926SxOZ\n",
|
||||
" Args:\n",
|
||||
" __arg1: Did the president mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union?\n",
|
||||
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
|
||||
"Name: state_of_union_qa_system\n",
|
||||
"\n",
|
||||
" No, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n",
|
||||
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses to run over Jupyter Notebooks, and if the president mentioned it in the state of the union.\n",
|
||||
"Action: Ruff QA System\n",
|
||||
"Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now need to find out if the president mentioned this tool in the state of the union.\n",
|
||||
"Action: State of Union QA System\n",
|
||||
"Action Input: Did the president mention nbQA in the state of the union?\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
|
||||
"Final Answer: No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
|
||||
"\n",
|
||||
"Ruff does not support source.organizeImports and source.fixAll code actions in Jupyter Notebooks. Additionally, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n"
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'No, the president did not mention nbQA in the state of the union.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 59,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"input_message = {\n",
|
||||
" \"role\": \"user\",\n",
|
||||
" \"content\": \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\",\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"for step in agent.stream(\n",
|
||||
" {\"messages\": [input_message]},\n",
|
||||
" stream_mode=\"values\",\n",
|
||||
"):\n",
|
||||
" step[\"messages\"][-1].pretty_print()"
|
||||
"agent.run(\n",
|
||||
" \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -549,7 +519,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.12.4"
|
||||
"version": "3.10.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -14,7 +14,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install -qU langchain-airbyte langchain_chroma"
|
||||
"%pip install -qU langchain-airbyte"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -123,7 +123,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import tiktoken\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"enc = tiktoken.get_encoding(\"cl100k_base\")\n",
|
||||
|
||||
@@ -31,8 +31,8 @@
|
||||
"source": [
|
||||
"# Optional\n",
|
||||
"import os\n",
|
||||
"# os.environ['LANGSMITH_TRACING'] = 'true' # enables tracing\n",
|
||||
"# os.environ['LANGSMITH_API_KEY'] = <your-api-key>"
|
||||
"# os.environ['LANGCHAIN_TRACING_V2'] = 'true' # enables tracing\n",
|
||||
"# os.environ['LANGCHAIN_API_KEY'] = <your-api-key>"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -66,7 +66,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!python3 -m pip install --upgrade langchain langchain-deeplake openai"
|
||||
"#!python3 -m pip install --upgrade langchain deeplake openai"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -90,8 +90,7 @@
|
||||
"import os\n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"if \"OPENAI_API_KEY\" not in os.environ:\n",
|
||||
" os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
|
||||
"# Please manually enter OpenAI Key"
|
||||
]
|
||||
},
|
||||
@@ -666,26 +665,89 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Your Deep Lake dataset has been successfully created!\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" \r"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Dataset(path='hub://adilkhan/langchain-code', tensors=['embedding', 'id', 'metadata', 'text'])\n",
|
||||
"\n",
|
||||
" tensor htype shape dtype compression\n",
|
||||
" ------- ------- ------- ------- ------- \n",
|
||||
" embedding embedding (8244, 1536) float32 None \n",
|
||||
" id text (8244, 1) str None \n",
|
||||
" metadata json (8244, 1) str None \n",
|
||||
" text text (8244, 1) str None \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": []
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<langchain_community.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_deeplake.vectorstores import DeeplakeVectorStore\n",
|
||||
"from langchain_community.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"username = \"<USERNAME_OR_ORG>\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"db = DeeplakeVectorStore.from_documents(\n",
|
||||
" documents=texts,\n",
|
||||
" embedding=embeddings,\n",
|
||||
" dataset_path=f\"hub://{username}/langchain-code\",\n",
|
||||
" overwrite=True,\n",
|
||||
"db = DeepLake.from_documents(\n",
|
||||
" texts, embeddings, dataset_path=f\"hub://{username}/langchain-code\", overwrite=True\n",
|
||||
")\n",
|
||||
"db"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from langchain_community.vectorstores import DeepLake\n",
|
||||
"\n",
|
||||
"# db = DeepLake.from_documents(\n",
|
||||
"# texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
|
||||
"# )\n",
|
||||
"# db"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
@@ -697,16 +759,24 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Deep Lake Dataset in hub://adilkhan/langchain-code already exists, loading from the storage\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db = DeeplakeVectorStore(\n",
|
||||
"db = DeepLake(\n",
|
||||
" dataset_path=f\"hub://{username}/langchain-code\",\n",
|
||||
" read_only=True,\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
" embedding=embeddings,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -725,6 +795,36 @@
|
||||
"retriever.search_kwargs[\"k\"] = 20"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def filter(x):\n",
|
||||
" # filter based on source code\n",
|
||||
" if \"something\" in x[\"text\"].data()[\"value\"]:\n",
|
||||
" return False\n",
|
||||
"\n",
|
||||
" # filter based on path e.g. extension\n",
|
||||
" metadata = x[\"metadata\"].data()[\"value\"]\n",
|
||||
" return \"only_this\" in metadata[\"source\"] or \"also_that\" in metadata[\"source\"]\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"### turn on below for custom filtering\n",
|
||||
"# retriever.search_kwargs['filter'] = filter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
@@ -736,8 +836,10 @@
|
||||
"from langchain.chains import ConversationalRetrievalChain\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
|
||||
"qa = RetrievalQA.from_llm(model, retriever=retriever)"
|
||||
"model = ChatOpenAI(\n",
|
||||
" model_name=\"gpt-3.5-turbo-0613\"\n",
|
||||
") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
|
||||
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -38,7 +38,7 @@
|
||||
"source": [
|
||||
"Connection is via `cassio` using `auto=True` parameter, and the notebook uses OpenAI. You should create a `.env` file accordingly.\n",
|
||||
"\n",
|
||||
"For Cassandra, set:\n",
|
||||
"For Casssandra, set:\n",
|
||||
"```bash\n",
|
||||
"CASSANDRA_CONTACT_POINTS\n",
|
||||
"CASSANDRA_USERNAME\n",
|
||||
@@ -229,9 +229,9 @@
|
||||
" \"smoke\",\n",
|
||||
" \"temp\",\n",
|
||||
"]\n",
|
||||
"assert all([column in df.columns for column in expected_columns]), (\n",
|
||||
" \"DataFrame does not have the expected columns\"\n",
|
||||
")"
|
||||
"assert all(\n",
|
||||
" [column in df.columns for column in expected_columns]\n",
|
||||
"), \"DataFrame does not have the expected columns\""
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
273
cookbook/databricks_sql_db.ipynb
Normal file
273
cookbook/databricks_sql_db.ipynb
Normal file
@@ -0,0 +1,273 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "707d13a7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Databricks\n",
|
||||
"\n",
|
||||
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
|
||||
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0076d072",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "739b489b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install databricks-sql-connector"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73113163",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connecting to Databricks\n",
|
||||
"\n",
|
||||
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
|
||||
"\n",
|
||||
"### Syntax\n",
|
||||
"```python\n",
|
||||
"SQLDatabase.from_databricks(\n",
|
||||
" catalog: str,\n",
|
||||
" schema: str,\n",
|
||||
" host: Optional[str] = None,\n",
|
||||
" api_token: Optional[str] = None,\n",
|
||||
" warehouse_id: Optional[str] = None,\n",
|
||||
" cluster_id: Optional[str] = None,\n",
|
||||
" engine_args: Optional[dict] = None,\n",
|
||||
" **kwargs: Any)\n",
|
||||
"```\n",
|
||||
"### Required Parameters\n",
|
||||
"* `catalog`: The catalog name in the Databricks database.\n",
|
||||
"* `schema`: The schema name in the catalog.\n",
|
||||
"\n",
|
||||
"### Optional Parameters\n",
|
||||
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
|
||||
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
|
||||
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
|
||||
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
|
||||
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
|
||||
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
|
||||
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b11c7e48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Examples"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "8102bca0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Connecting to Databricks with SQLDatabase wrapper\n",
|
||||
"from langchain_community.utilities import SQLDatabase\n",
|
||||
"\n",
|
||||
"db = SQLDatabase.from_databricks(catalog=\"samples\", schema=\"nyctaxi\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "9dd36f58",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Creating a OpenAI Chat LLM wrapper\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5b5c5f1a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### SQL Chain example\n",
|
||||
"\n",
|
||||
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "36f2270b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.utilities import SQLDatabaseChain\n",
|
||||
"\n",
|
||||
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "4e2b5f25",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
|
||||
"What is the average duration of taxi rides that start between midnight and 6am?\n",
|
||||
"SQLQuery:\u001b[32;1m\u001b[1;3mSELECT AVG(UNIX_TIMESTAMP(tpep_dropoff_datetime) - UNIX_TIMESTAMP(tpep_pickup_datetime)) as avg_duration\n",
|
||||
"FROM trips\n",
|
||||
"WHERE HOUR(tpep_pickup_datetime) >= 0 AND HOUR(tpep_pickup_datetime) < 6\u001b[0m\n",
|
||||
"SQLResult: \u001b[33;1m\u001b[1;3m[(987.8122786304605,)]\u001b[0m\n",
|
||||
"Answer:\u001b[32;1m\u001b[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"db_chain.run(\n",
|
||||
" \"What is the average duration of taxi rides that start between midnight and 6am?\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e496d5e5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### SQL Database Agent example\n",
|
||||
"\n",
|
||||
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/toolkits/sql_database.html) for answering questions over a Databricks database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "9918e86a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_sql_agent\n",
|
||||
"from langchain_community.agent_toolkits import SQLDatabaseToolkit\n",
|
||||
"\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
|
||||
"agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "c484a76e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n",
|
||||
"Action Input: \u001b[0m\n",
|
||||
"Observation: \u001b[38;5;200m\u001b[1;3mtrips\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
|
||||
"Action: schema_sql_db\n",
|
||||
"Action Input: trips\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m\n",
|
||||
"CREATE TABLE trips (\n",
|
||||
"\ttpep_pickup_datetime TIMESTAMP, \n",
|
||||
"\ttpep_dropoff_datetime TIMESTAMP, \n",
|
||||
"\ttrip_distance FLOAT, \n",
|
||||
"\tfare_amount FLOAT, \n",
|
||||
"\tpickup_zip INT, \n",
|
||||
"\tdropoff_zip INT\n",
|
||||
") USING DELTA\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from trips table:\n",
|
||||
"tpep_pickup_datetime\ttpep_dropoff_datetime\ttrip_distance\tfare_amount\tpickup_zip\tdropoff_zip\n",
|
||||
"2016-02-14 16:52:13+00:00\t2016-02-14 17:16:04+00:00\t4.94\t19.0\t10282\t10171\n",
|
||||
"2016-02-04 18:44:19+00:00\t2016-02-04 18:46:00+00:00\t0.28\t3.5\t10110\t10110\n",
|
||||
"2016-02-17 17:13:57+00:00\t2016-02-17 17:17:55+00:00\t0.7\t5.0\t10103\t10023\n",
|
||||
"*/\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
|
||||
"Action: query_checker_sql_db\n",
|
||||
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
|
||||
"Observation: \u001b[31;1m\u001b[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
|
||||
"Action: query_sql_db\n",
|
||||
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m[(30.6, '0 00:43:31.000000000')]\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
|
||||
"Final Answer: The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What is the longest trip distance and how long did it take?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -39,7 +39,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain docugami==0.0.8 dgml-utils==0.3.0 pydantic langchainhub langchain-chroma hnswlib --upgrade --quiet"
|
||||
"! pip install langchain docugami==0.0.8 dgml-utils==0.3.0 pydantic langchainhub chromadb hnswlib --upgrade --quiet"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -547,7 +547,7 @@
|
||||
"\n",
|
||||
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
|
||||
"from langchain.storage import InMemoryStore\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores.chroma import Chroma\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
|
||||
@@ -115,7 +115,7 @@
|
||||
"\n",
|
||||
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
|
||||
"\n",
|
||||
"Unless told to do not query for all the columns from a specific index, only ask for a few relevant columns given the question.\n",
|
||||
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
|
||||
"\n",
|
||||
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
|
||||
"\n",
|
||||
|
||||
@@ -84,7 +84,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install --quiet pypdf langchain-chroma tiktoken openai \n",
|
||||
"%pip install --quiet pypdf chromadb tiktoken openai \n",
|
||||
"%pip uninstall -y langchain-fireworks\n",
|
||||
"%pip install --editable /mnt/disks/data/langchain/libs/partners/fireworks"
|
||||
]
|
||||
@@ -138,7 +138,7 @@
|
||||
"all_splits = text_splitter.split_documents(data)\n",
|
||||
"\n",
|
||||
"# Add to vectorDB\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_fireworks.embeddings import FireworksEmbeddings\n",
|
||||
"\n",
|
||||
"vectorstore = Chroma.from_documents(\n",
|
||||
|
||||
@@ -487,7 +487,7 @@
|
||||
" print(\"*\" * 40)\n",
|
||||
" print(\n",
|
||||
" colored(\n",
|
||||
" f\"After {i + 1} observations, Tommie's summary is:\\n{tommie.get_summary(force_refresh=True)}\",\n",
|
||||
" f\"After {i+1} observations, Tommie's summary is:\\n{tommie.get_summary(force_refresh=True)}\",\n",
|
||||
" \"blue\",\n",
|
||||
" )\n",
|
||||
" )\n",
|
||||
|
||||
@@ -170,7 +170,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter\n",
|
||||
"\n",
|
||||
"with open(\"../../state_of_the_union.txt\") as f:\n",
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -7,7 +7,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain-chroma langchain_community tiktoken langchain-openai langchainhub langchain langgraph"
|
||||
"! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain langgraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -30,8 +30,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.document_loaders import WebBaseLoader\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"urls = [\n",
|
||||
|
||||
@@ -7,7 +7,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain-chroma langchain_community tiktoken langchain-openai langchainhub langchain langgraph tavily-python"
|
||||
"! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain langgraph tavily-python"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -77,8 +77,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.document_loaders import WebBaseLoader\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"urls = [\n",
|
||||
@@ -180,8 +180,8 @@
|
||||
"from langchain.output_parsers.openai_tools import PydanticToolsParser\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.tools.tavily_search import TavilySearchResults\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.messages import BaseMessage, FunctionMessage\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
|
||||
|
||||
@@ -7,7 +7,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install langchain-chroma langchain_community tiktoken langchain-openai langchainhub langchain langgraph"
|
||||
"! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain langgraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -86,8 +86,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.document_loaders import WebBaseLoader\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"urls = [\n",
|
||||
@@ -188,7 +188,7 @@
|
||||
"from langchain.output_parsers import PydanticOutputParser\n",
|
||||
"from langchain.output_parsers.openai_tools import PydanticToolsParser\n",
|
||||
"from langchain.prompts import PromptTemplate\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.messages import BaseMessage, FunctionMessage\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
|
||||
@@ -336,7 +336,7 @@
|
||||
" # Create a prompt template with format instructions and the query\n",
|
||||
" prompt = PromptTemplate(\n",
|
||||
" template=\"\"\"You are generating questions that is well optimized for retrieval. \\n \n",
|
||||
" Look at the input and try to reason about the underlying semantic intent / meaning. \\n \n",
|
||||
" Look at the input and try to reason about the underlying sematic intent / meaning. \\n \n",
|
||||
" Here is the initial question:\n",
|
||||
" \\n ------- \\n\n",
|
||||
" {question} \n",
|
||||
@@ -643,7 +643,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.11.1 64-bit",
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -657,12 +657,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "1a1af0ee75eeea9e2e1ee996c87e7a2b11a0bebd85af04bb136d915cefc0abce"
|
||||
}
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -148,11 +148,11 @@
|
||||
"\n",
|
||||
" instructions = \"None\"\n",
|
||||
" for i in range(max_meta_iters):\n",
|
||||
" print(f\"[Episode {i + 1}/{max_meta_iters}]\")\n",
|
||||
" print(f\"[Episode {i+1}/{max_meta_iters}]\")\n",
|
||||
" chain = initialize_chain(instructions, memory=None)\n",
|
||||
" output = chain.predict(human_input=task)\n",
|
||||
" for j in range(max_iters):\n",
|
||||
" print(f\"(Step {j + 1}/{max_iters})\")\n",
|
||||
" print(f\"(Step {j+1}/{max_iters})\")\n",
|
||||
" print(f\"Assistant: {output}\")\n",
|
||||
" print(\"Human: \")\n",
|
||||
" human_input = input()\n",
|
||||
|
||||
@@ -124,8 +124,8 @@
|
||||
"# Optional-- If you want to enable Langsmith -- good for debugging\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
|
||||
"os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass()"
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -156,7 +156,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Ensure you have an HF_TOKEN in your development environment:\n",
|
||||
"# Ensure you have an HF_TOKEN in your development enviornment:\n",
|
||||
"# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n",
|
||||
"\n",
|
||||
"# Load MongoDB's embedded_movies dataset from Hugging Face\n",
|
||||
|
||||
@@ -58,7 +58,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -U langchain openai langchain-chroma langchain-experimental # (newest versions required for multi-modal)"
|
||||
"! pip install -U langchain openai chromadb langchain-experimental # (newest versions required for multi-modal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -187,7 +187,7 @@
|
||||
"\n",
|
||||
"import chromadb\n",
|
||||
"import numpy as np\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
|
||||
"from PIL import Image as _PILImage\n",
|
||||
"\n",
|
||||
|
||||
@@ -18,14 +18,9 @@
|
||||
"* Use of multimodal embeddings (such as [CLIP](https://openai.com/research/clip)) to embed images and text\n",
|
||||
"* Use of [VDMS](https://github.com/IntelLabs/vdms/blob/master/README.md) as a vector store with support for multi-modal\n",
|
||||
"* Retrieval of both images and text using similarity search\n",
|
||||
"* Passing raw images and text chunks to a multimodal LLM for answer synthesis "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2498a0a1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"* Passing raw images and text chunks to a multimodal LLM for answer synthesis \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Packages\n",
|
||||
"\n",
|
||||
"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
|
||||
@@ -38,26 +33,16 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install --quiet -U langchain-vdms langchain-experimental langchain-ollama\n",
|
||||
"# (newest versions required for multi-modal)\n",
|
||||
"! pip install --quiet -U vdms langchain-experimental\n",
|
||||
"\n",
|
||||
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
|
||||
"! pip install --quiet pdf2image \"unstructured[all-docs]==0.10.19\" \"onnxruntime==1.17.0\" pillow pydantic lxml open_clip_torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "78ac6543",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from dotenv import load_dotenv, find_dotenv\n",
|
||||
"# load_dotenv(find_dotenv(), override=True);"
|
||||
"! pip install --quiet pdf2image \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml open_clip_torch"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e5c8916e",
|
||||
"id": "6a6b6e73",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Start VDMS Server\n",
|
||||
@@ -69,14 +54,15 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "1e6e2c15",
|
||||
"id": "5f483872",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"a701e5ac3523006e9540b5355e2d872d5d78383eab61562a675d5b9ac21fde65\n"
|
||||
"docker: Error response from daemon: Conflict. The container name \"/vdms_rag_nb\" is already in use by container \"0c19ed281463ac10d7efe07eb815643e3e534ddf24844357039453ad2b0c27e8\". You have to remove (or rename) that container to be able to reuse that name.\n",
|
||||
"See 'docker run --help'.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -84,11 +70,22 @@
|
||||
"! docker run --rm -d -p 55559:55555 --name vdms_rag_nb intellabs/vdms:latest\n",
|
||||
"\n",
|
||||
"# Connect to VDMS Vector Store\n",
|
||||
"from langchain_vdms.vectorstores import VDMS_Client\n",
|
||||
"from langchain_community.vectorstores.vdms import VDMS_Client\n",
|
||||
"\n",
|
||||
"vdms_client = VDMS_Client(port=55559)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "78ac6543",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# from dotenv import load_dotenv, find_dotenv\n",
|
||||
"# load_dotenv(find_dotenv(), override=True);"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1e94b3fb-8e3e-4736-be0a-ad881626c7bd",
|
||||
@@ -98,9 +95,14 @@
|
||||
"\n",
|
||||
"### Partition PDF text and images\n",
|
||||
" \n",
|
||||
"Let's use famous photographs from the PDF version of Library of Congress Magazine in this example.\n",
|
||||
"Let's look at an example pdf containing interesting images.\n",
|
||||
"\n",
|
||||
"We can use `partition_pdf` from [Unstructured](https://unstructured-io.github.io/unstructured/introduction.html#key-concepts) to extract text and images."
|
||||
"Famous photographs from library of congress:\n",
|
||||
"\n",
|
||||
"* https://www.loc.gov/lcm/pdf/LCM_2020_1112.pdf\n",
|
||||
"* We'll use this as an example below\n",
|
||||
"\n",
|
||||
"We can use `partition_pdf` below from [Unstructured](https://unstructured-io.github.io/unstructured/introduction.html#key-concepts) to extract text and images."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,13 +116,12 @@
|
||||
"\n",
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"# Folder to store pdf and extracted images\n",
|
||||
"base_datapath = Path(\"./data/multimodal_files\").resolve()\n",
|
||||
"datapath = base_datapath / \"images\"\n",
|
||||
"# Folder with pdf and extracted images\n",
|
||||
"datapath = Path(\"./multimodal_files\").resolve()\n",
|
||||
"datapath.mkdir(parents=True, exist_ok=True)\n",
|
||||
"\n",
|
||||
"pdf_url = \"https://www.loc.gov/lcm/pdf/LCM_2020_1112.pdf\"\n",
|
||||
"pdf_path = str(base_datapath / pdf_url.split(\"/\")[-1])\n",
|
||||
"pdf_path = str(datapath / pdf_url.split(\"/\")[-1])\n",
|
||||
"with open(pdf_path, \"wb\") as f:\n",
|
||||
" f.write(requests.get(pdf_url).content)"
|
||||
]
|
||||
@@ -173,8 +174,14 @@
|
||||
"source": [
|
||||
"## Multi-modal embeddings with our document\n",
|
||||
"\n",
|
||||
"In this section, we initialize the VDMS vector store for both text and images. For better performance, we use model `ViT-g-14` from [OpenClip multimodal embeddings](https://python.langchain.com/docs/integrations/text_embedding/open_clip).\n",
|
||||
"The images are stored as base64 encoded strings with `vectorstore.add_images`.\n"
|
||||
"We will use [OpenClip multimodal embeddings](https://python.langchain.com/docs/integrations/text_embedding/open_clip).\n",
|
||||
"\n",
|
||||
"We use a larger model for better performance (set in `langchain_experimental.open_clip.py`).\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"model_name = \"ViT-g-14\"\n",
|
||||
"checkpoint = \"laion2b_s34b_b88k\"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -186,14 +193,16 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain_community.vectorstores import VDMS\n",
|
||||
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
|
||||
"from langchain_vdms import VDMS\n",
|
||||
"\n",
|
||||
"# Create VDMS\n",
|
||||
"vectorstore = VDMS(\n",
|
||||
" client=vdms_client,\n",
|
||||
" collection_name=\"mm_rag_clip_photos\",\n",
|
||||
" embedding=OpenCLIPEmbeddings(model_name=\"ViT-g-14\", checkpoint=\"laion2b_s34b_b88k\"),\n",
|
||||
" embedding_function=OpenCLIPEmbeddings(\n",
|
||||
" model_name=\"ViT-g-14\", checkpoint=\"laion2b_s34b_b88k\"\n",
|
||||
" ),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Get image URIs with .jpg extension only\n",
|
||||
@@ -224,7 +233,7 @@
|
||||
"source": [
|
||||
"## RAG\n",
|
||||
"\n",
|
||||
"Here we define helper functions for image results."
|
||||
"`vectorstore.add_images` will store / retrieve images as base64 encoded strings."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -313,10 +322,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.messages import HumanMessage, SystemMessage\n",
|
||||
"from langchain_community.llms.ollama import Ollama\n",
|
||||
"from langchain_core.messages import HumanMessage\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
|
||||
"from langchain_ollama.llms import OllamaLLM\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def prompt_func(data_dict):\n",
|
||||
@@ -341,8 +350,8 @@
|
||||
" \"As an expert art critic and historian, your task is to analyze and interpret images, \"\n",
|
||||
" \"considering their historical and cultural significance. Alongside the images, you will be \"\n",
|
||||
" \"provided with related text to offer context. Both will be retrieved from a vectorstore based \"\n",
|
||||
" \"on user-input keywords. Please use your extensive knowledge and analytical skills to provide a \"\n",
|
||||
" \"comprehensive summary that includes:\\n\"\n",
|
||||
" \"on user-input keywords. Please convert answers to english and use your extensive knowledge \"\n",
|
||||
" \"and analytical skills to provide a comprehensive summary that includes:\\n\"\n",
|
||||
" \"- A detailed description of the visual elements in the image.\\n\"\n",
|
||||
" \"- The historical and cultural context of the image.\\n\"\n",
|
||||
" \"- An interpretation of the image's symbolism and meaning.\\n\"\n",
|
||||
@@ -360,7 +369,7 @@
|
||||
" \"\"\"Multi-modal RAG chain\"\"\"\n",
|
||||
"\n",
|
||||
" # Multi-modal LLM\n",
|
||||
" llm_model = OllamaLLM(\n",
|
||||
" llm_model = Ollama(\n",
|
||||
" verbose=True, temperature=0.5, model=\"llava\", base_url=\"http://localhost:11434\"\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
@@ -383,8 +392,7 @@
|
||||
"id": "1566096d-97c2-4ddc-ba4a-6ef88c525e4e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test retrieval and run RAG\n",
|
||||
"Now let's query for a `woman with children` and retrieve the top results."
|
||||
"## Test retrieval and run RAG"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -420,121 +428,6 @@
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"© 2017 LARRY D. MOORE\n",
|
||||
"\n",
|
||||
"contemporary criticism of the less-than- thoughtful circumstances under which Lange photographed Thomson, the picture’s power to engage has not diminished. Artists in other countries have appropriated the image, changing the mother’s features into those of other ethnicities, but keeping her expression and the positions of her clinging children. Long after anyone could help the Thompson family, this picture has resonance in another time of national crisis, unemployment and food shortages.\n",
|
||||
"\n",
|
||||
"A striking, but very different picture is a 1900 portrait of the legendary Hin-mah-too-yah- lat-kekt (Chief Joseph) of the Nez Percé people. The Bureau of American Ethnology in Washington, D.C., regularly arranged for its photographer, De Lancey Gill, to photograph Native American delegations that came to the capital to confer with officials about tribal needs and concerns. Although Gill described Chief Joseph as having “an air of gentleness and quiet reserve,” the delegate skeptically appraises the photographer, which is not surprising given that the United States broke five treaties with Chief Joseph and his father between 1855 and 1885.\n",
|
||||
"\n",
|
||||
"More than a glance, second looks may reveal new knowledge into complex histories.\n",
|
||||
"\n",
|
||||
"Anne Wilkes Tucker is the photography curator emeritus of the Museum of Fine Arts, Houston and curator of the “Not an Ostrich” exhibition.\n",
|
||||
"\n",
|
||||
"28\n",
|
||||
"\n",
|
||||
"28 LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"THEYRE WILLING TO HAVE MEENTERTAIN THEM DURING THE DAY,BUT AS SOON AS IT STARTSGETTING DARK, THEY ALLGO OFF, AND LEAVE ME! \n",
|
||||
"ROSA PARKS: IN HER OWN WORDS\n",
|
||||
"\n",
|
||||
"COMIC ART: 120 YEARS OF PANELS AND PAGES\n",
|
||||
"\n",
|
||||
"SHALL NOT BE DENIED: WOMEN FIGHT FOR THE VOTE\n",
|
||||
"\n",
|
||||
"More information loc.gov/exhibits\n",
|
||||
"Nuestra Sefiora de las Iguanas\n",
|
||||
"\n",
|
||||
"Graciela Iturbide’s 1979 portrait of Zobeida Díaz in the town of Juchitán in southeastern Mexico conveys the strength of women and reflects their important contributions to the economy. Díaz, a merchant, was selling iguanas to cook and eat, carrying them on her head, as is customary.\n",
|
||||
"\n",
|
||||
"GRACIELA ITURBIDE. “NUESTRA SEÑORA DE LAS IGUANAS.” 1979. GELATIN SILVER PRINT. © GRACIELA ITURBIDE, USED BY PERMISSION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
|
||||
"\n",
|
||||
"Iturbide requested permission to take a photograph, but this proved challenging because the iguanas were constantly moving, causing Díaz to laugh. The result, however, was a brilliant portrait that the inhabitants of Juchitán claimed with pride. They have reproduced it on posters and erected a statue honoring Díaz and her iguanas. The photo now appears throughout the world, inspiring supporters of feminism, women’s rights and gender equality.\n",
|
||||
"\n",
|
||||
"—Adam Silvia is a curator in the Prints and Photographs Division.\n",
|
||||
"\n",
|
||||
"6\n",
|
||||
"\n",
|
||||
"6 LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"‘Migrant Mother’ is Florence Owens Thompson\n",
|
||||
"\n",
|
||||
"The iconic portrait that became the face of the Great Depression is also the most famous photograph in the collections of the Library of Congress.\n",
|
||||
"\n",
|
||||
"The Library holds the original source of the photo — a nitrate negative measuring 4 by 5 inches. Do you see a faint thumb in the bottom right? The photographer, Dorothea Lange, found the thumb distracting and after a few years had the negative altered to make the thumb almost invisible. Lange’s boss at the Farm Security Administration, Roy Stryker, criticized her action because altering a negative undermines the credibility of a documentary photo.\n",
|
||||
"Shrimp Picker\n",
|
||||
"\n",
|
||||
"The photos and evocative captions of Lewis Hine served as source material for National Child Labor Committee reports and exhibits exposing abusive child labor practices in the United States in the first decades of the 20th century.\n",
|
||||
"\n",
|
||||
"LEWIS WICKES HINE. “MANUEL, THE YOUNG SHRIMP-PICKER, FIVE YEARS OLD, AND A MOUNTAIN OF CHILD-LABOR OYSTER SHELLS BEHIND HIM. HE WORKED LAST YEAR. UNDERSTANDS NOT A WORD OF ENGLISH. DUNBAR, LOPEZ, DUKATE COMPANY. LOCATION: BILOXI, MISSISSIPPI.” FEBRUARY 1911. NATIONAL CHILD LABOR COMMITTEE COLLECTION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
|
||||
"\n",
|
||||
"For 15 years, Hine\n",
|
||||
"\n",
|
||||
"crisscrossed the country, documenting the practices of the worst offenders. His effective use of photography made him one of the committee's greatest publicists in the campaign for legislation to ban child labor.\n",
|
||||
"\n",
|
||||
"Hine was a master at taking photos that catch attention and convey a message and, in this photo, he framed Manuel in a setting that drove home the boy’s small size and unsafe environment.\n",
|
||||
"\n",
|
||||
"Captions on photos of other shrimp pickers emphasized their long working hours as well as one hazard of the job: The acid from the shrimp made pickers’ hands sore and “eats the shoes off your feet.”\n",
|
||||
"\n",
|
||||
"Such images alerted viewers to all that workers, their families and the nation sacrificed when children were part of the labor force. The Library holds paper records of the National Child Labor Committee as well as over 5,000 photographs.\n",
|
||||
"\n",
|
||||
"—Barbara Natanson is head of the Reference Section in the Prints and Photographs Division.\n",
|
||||
"\n",
|
||||
"8\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"Intergenerational Portrait\n",
|
||||
"\n",
|
||||
"Raised on the Apsáalooke (Crow) reservation in Montana, photographer Wendy Red Star created her “Apsáalooke Feminist” self-portrait series with her daughter Beatrice. With a dash of wry humor, mother and daughter are their own first-person narrators.\n",
|
||||
"\n",
|
||||
"Red Star explains the significance of their appearance: “The dress has power: You feel strong and regal wearing it. In my art, the elk tooth dress specifically symbolizes Crow womanhood and the matrilineal line connecting me to my ancestors. As a mother, I spend hours searching for the perfect elk tooth dress materials to make a prized dress for my daughter.”\n",
|
||||
"\n",
|
||||
"In a world that struggles with cultural identities, this photograph shows us the power and beauty of blending traditional and contemporary styles.\n",
|
||||
"‘American Gothic’ Product #216040262 Price: $24\n",
|
||||
"\n",
|
||||
"U.S. Capitol at Night Product #216040052 Price: $24\n",
|
||||
"\n",
|
||||
"Good Reading Ahead Product #21606142 Price: $24\n",
|
||||
"\n",
|
||||
"Gordon Parks created an iconic image with this 1942 photograph of cleaning woman Ella Watson.\n",
|
||||
"\n",
|
||||
"Snow blankets the U.S. Capitol in this classic image by Ernest L. Crandall.\n",
|
||||
"\n",
|
||||
"Start your new year out right with a poster promising good reading for months to come.\n",
|
||||
"\n",
|
||||
"▪ Order online: loc.gov/shop ▪ Order by phone: 888.682.3557\n",
|
||||
"\n",
|
||||
"26\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"LIBRARY OF CONGRESS MAGAZINE\n",
|
||||
"\n",
|
||||
"SUPPORT\n",
|
||||
"\n",
|
||||
"A PICTURE OF PHILANTHROPY Annenberg Foundation Gives $1 Million and a Photographic Collection to the Library.\n",
|
||||
"\n",
|
||||
"A major gift by Wallis Annenberg and the Annenberg Foundation in Los Angeles will support the effort to reimagine the visitor experience at the Library of Congress. The foundation also is donating 1,000 photographic prints from its Annenberg Space for Photography exhibitions to the Library.\n",
|
||||
"\n",
|
||||
"The Library is pursuing a multiyear plan to transform the experience of its nearly 2 million annual visitors, share more of its treasures with the public and show how Library collections connect with visitors’ own creativity and research. The project is part of a strategic plan established by Librarian of Congress Carla Hayden to make the Library more user-centered for Congress, creators and learners of all ages.\n",
|
||||
"\n",
|
||||
"A 2018 exhibition at the Annenberg Space for Photography in Los Angeles featured over 400 photographs from the Library. The Library is planning a future photography exhibition, based on the Annenberg-curated show, along with a documentary film on the Library and its history, produced by the Annenberg Space for Photography.\n",
|
||||
"\n",
|
||||
"“The nation’s library is honored to have the strong support of Wallis Annenberg and the Annenberg Foundation as we enhance the experience for our visitors,” Hayden said. “We know that visitors will find new connections to the Library through the incredible photography collections and countless other treasures held here to document our nation’s history and creativity.”\n",
|
||||
"\n",
|
||||
"To enhance the Library’s holdings, the foundation is giving the Library photographic prints for long-term preservation from 10 other exhibitions hosted at the Annenberg Space for Photography. The Library holds one of the world’s largest photography collections, with about 14 million photos and over 1 million images digitized and available online.\n",
|
||||
"18 LIBRARY OF CONGRESS MAGAZINE\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
@@ -559,14 +452,6 @@
|
||||
" print(doc.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "15e9b54d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's use the `multi_modal_rag_chain` to process the same query and display the response."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
@@ -577,17 +462,10 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" The image is a black and white photograph by Dorothea Lange titled \"Destitute Pea Pickers in California. Mother of Seven Children. Age Thirty-Two. Nipomo, California.\" It was taken in March 1936 as part of the Farm Security Administration-Office of War Information Collection.\n",
|
||||
"\n",
|
||||
"The photograph features a woman with seven children, who appear to be in a state of poverty and hardship. The woman is seated, looking directly at the camera, while three of her children are standing behind her. They all seem to be dressed in ragged clothing, indicative of their impoverished condition.\n",
|
||||
"\n",
|
||||
"The historical context of this image is related to the Great Depression, which was a period of economic hardship in the United States that lasted from 1929 to 1939. During this time, many people struggled to make ends meet, and poverty was widespread. This photograph captures the plight of one such family during this difficult period.\n",
|
||||
"\n",
|
||||
"The symbolism of the image is multifaceted. The woman's direct gaze at the camera can be seen as a plea for help or an expression of desperation. The ragged clothing of the children serves as a stark reminder of the poverty and hardship experienced by many during this time.\n",
|
||||
"\n",
|
||||
"In terms of connections to the related text, it is mentioned that Florence Owens Thompson, the woman in the photograph, initially regretted having her picture taken. However, she later came to appreciate the importance of the image as a representation of the struggles faced by many during the Great Depression. The mention of Helena Zinkham suggests that she may have played a role in the creation or distribution of this photograph.\n",
|
||||
"\n",
|
||||
"Overall, this image is a powerful depiction of poverty and hardship during the Great Depression, capturing the resilience and struggles of one family amidst difficult times. \n"
|
||||
"1. Detailed description of the visual elements in the image: The image features a woman with children, likely a mother and her family, standing together outside. They appear to be poor or struggling financially, as indicated by their attire and surroundings.\n",
|
||||
"2. Historical and cultural context of the image: The photo was taken in 1936 during the Great Depression, when many families struggled to make ends meet. Dorothea Lange, a renowned American photographer, took this iconic photograph that became an emblem of poverty and hardship experienced by many Americans at that time.\n",
|
||||
"3. Interpretation of the image's symbolism and meaning: The image conveys a sense of unity and resilience despite adversity. The woman and her children are standing together, displaying their strength as a family unit in the face of economic challenges. The photograph also serves as a reminder of the importance of empathy and support for those who are struggling.\n",
|
||||
"4. Connections between the image and the related text: The text provided offers additional context about the woman in the photo, her background, and her feelings towards the photograph. It highlights the historical backdrop of the Great Depression and emphasizes the significance of this particular image as a representation of that time period.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -616,15 +494,17 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fe4a98ee",
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "8ba652da",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".test-venv",
|
||||
"display_name": ".langchain-venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -638,7 +518,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.10"
|
||||
"version": "3.10.13"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -183,7 +183,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"game_description = f\"\"\"Here is the topic for a Dungeons & Dragons game: {quest}.\n",
|
||||
" The characters are: {(*character_names,)}.\n",
|
||||
" The characters are: {*character_names,}.\n",
|
||||
" The story is narrated by the storyteller, {storyteller_name}.\"\"\"\n",
|
||||
"\n",
|
||||
"player_descriptor_system_message = SystemMessage(\n",
|
||||
@@ -334,7 +334,7 @@
|
||||
" You are the storyteller, {storyteller_name}.\n",
|
||||
" Please make the quest more specific. Be creative and imaginative.\n",
|
||||
" Please reply with the specified quest in {word_limit} words or less. \n",
|
||||
" Speak directly to the characters: {(*character_names,)}.\n",
|
||||
" Speak directly to the characters: {*character_names,}.\n",
|
||||
" Do not add anything else.\"\"\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
|
||||
@@ -200,7 +200,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"game_description = f\"\"\"Here is the topic for the presidential debate: {topic}.\n",
|
||||
"The presidential candidates are: {\", \".join(character_names)}.\"\"\"\n",
|
||||
"The presidential candidates are: {', '.join(character_names)}.\"\"\"\n",
|
||||
"\n",
|
||||
"player_descriptor_system_message = SystemMessage(\n",
|
||||
" content=\"You can add detail to the description of each presidential candidate.\"\n",
|
||||
@@ -595,7 +595,7 @@
|
||||
" Frame the debate topic as a problem to be solved.\n",
|
||||
" Be creative and imaginative.\n",
|
||||
" Please reply with the specified topic in {word_limit} words or less. \n",
|
||||
" Speak directly to the presidential candidates: {(*character_names,)}.\n",
|
||||
" Speak directly to the presidential candidates: {*character_names,}.\n",
|
||||
" Do not add anything else.\"\"\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
|
||||
@@ -58,7 +58,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -U langchain-nomic langchain-chroma langchain-community tiktoken langchain-openai langchain"
|
||||
"! pip install -U langchain-nomic langchain_community tiktoken langchain-openai chromadb langchain"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -71,9 +71,9 @@
|
||||
"# Optional: LangSmith API keys\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
|
||||
"os.environ[\"LANGSMITH_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
|
||||
"os.environ[\"LANGSMITH_API_KEY\"] = \"api_key\""
|
||||
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
|
||||
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
|
||||
"os.environ[\"LANGCHAIN_API_KEY\"] = \"api_key\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -167,7 +167,7 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
|
||||
"from langchain_nomic import NomicEmbeddings\n",
|
||||
@@ -215,8 +215,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_community.chat_models import ChatOllama\n",
|
||||
"from langchain_core.prompts import ChatPromptTemplate\n",
|
||||
"from langchain_ollama import ChatOllama\n",
|
||||
"from langchain_openai import ChatOpenAI\n",
|
||||
"\n",
|
||||
"# Prompt\n",
|
||||
|
||||
@@ -56,7 +56,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install -U langchain-nomic langchain-chroma langchain-community tiktoken langchain-openai langchain # (newest versions required for multi-modal)"
|
||||
"! pip install -U langchain-nomic langchain_community tiktoken langchain-openai chromadb langchain # (newest versions required for multi-modal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -194,7 +194,7 @@
|
||||
"\n",
|
||||
"import chromadb\n",
|
||||
"import numpy as np\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_nomic import NomicEmbeddings\n",
|
||||
"from PIL import Image as _PILImage\n",
|
||||
"\n",
|
||||
|
||||
@@ -20,8 +20,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.document_loaders import TextLoader\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"from langchain_text_splitters import CharacterTextSplitter"
|
||||
]
|
||||
@@ -395,7 +395,8 @@
|
||||
"prompt_messages = [\n",
|
||||
" SystemMessage(\n",
|
||||
" content=(\n",
|
||||
" \"You are a world class algorithm to answer questions in a specific format.\"\n",
|
||||
" \"You are a world class algorithm to answer \"\n",
|
||||
" \"questions in a specific format.\"\n",
|
||||
" )\n",
|
||||
" ),\n",
|
||||
" HumanMessage(content=\"Answer question using the following context\"),\n",
|
||||
|
||||
@@ -29,7 +29,7 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"LANGSMITH_PROJECT\"] = \"movie-qa\""
|
||||
"os.environ[\"LANGCHAIN_PROJECT\"] = \"movie-qa\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -80,7 +80,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.schema import Document\n",
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.vectorstores import Chroma\n",
|
||||
"from langchain_openai import OpenAIEmbeddings\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
|
||||
@@ -25,7 +25,7 @@
|
||||
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
|
||||
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
|
||||
"\n",
|
||||
"This guide demonstrates how Oracle AI Vector Search can be used with LangChain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
|
||||
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
|
||||
"\n",
|
||||
" * Loading the documents from various sources using OracleDocLoader\n",
|
||||
" * Summarizing them within/outside the database using OracleSummary\n",
|
||||
@@ -47,19 +47,7 @@
|
||||
"source": [
|
||||
"### Prerequisites\n",
|
||||
"\n",
|
||||
"Please install the Oracle Database [python-oracledb driver](https://pypi.org/project/oracledb/) to use LangChain with Oracle AI Vector Search:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"$ python -m pip install --upgrade oracledb\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Demo User\n",
|
||||
"First, connect as a privileged user to create a demo user with all the required privileges. Change the credentials for your environment. Also set the DEMO_PY_DIR path to a directory on the database host where your model file is located:"
|
||||
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -68,30 +56,65 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# pip install oracledb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create Demo User\n",
|
||||
"First, create a demo user with all the required privileges. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Connection successful!\n",
|
||||
"User setup done!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# Please update with your SYSTEM (or privileged user) username, password, and database connection string\n",
|
||||
"username = \"SYSTEM\"\n",
|
||||
"# Update with your username, password, hostname, and service_name\n",
|
||||
"username = \"\"\n",
|
||||
"password = \"\"\n",
|
||||
"dsn = \"\"\n",
|
||||
"\n",
|
||||
"with oracledb.connect(user=username, password=password, dsn=dsn) as connection:\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"\n",
|
||||
" with connection.cursor() as cursor:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
" try:\n",
|
||||
" cursor.execute(\n",
|
||||
" \"\"\"\n",
|
||||
" begin\n",
|
||||
" -- Drop user\n",
|
||||
" execute immediate 'drop user if exists testuser cascade';\n",
|
||||
"\n",
|
||||
" begin\n",
|
||||
" execute immediate 'drop user testuser cascade';\n",
|
||||
" exception\n",
|
||||
" when others then\n",
|
||||
" dbms_output.put_line('Error dropping user: ' || SQLERRM);\n",
|
||||
" end;\n",
|
||||
" \n",
|
||||
" -- Create user and grant privileges\n",
|
||||
" execute immediate 'create user testuser identified by testuser';\n",
|
||||
" execute immediate 'grant connect, unlimited tablespace, create credential, create procedure, create any index to testuser';\n",
|
||||
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/home/yourname/demo/orachain''';\n",
|
||||
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/scratch/hroy/view_storage/hroy_devstorage/demo/orachain''';\n",
|
||||
" execute immediate 'grant read, write on directory DEMO_PY_DIR to public';\n",
|
||||
" execute immediate 'grant create mining model to testuser';\n",
|
||||
"\n",
|
||||
" \n",
|
||||
" -- Network access\n",
|
||||
" begin\n",
|
||||
" DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(\n",
|
||||
@@ -104,7 +127,15 @@
|
||||
" end;\n",
|
||||
" \"\"\"\n",
|
||||
" )\n",
|
||||
" print(\"User setup done!\")"
|
||||
" print(\"User setup done!\")\n",
|
||||
" except Exception as e:\n",
|
||||
" print(f\"User setup failed with error: {e}\")\n",
|
||||
" finally:\n",
|
||||
" cursor.close()\n",
|
||||
" conn.close()\n",
|
||||
"except Exception as e:\n",
|
||||
" print(f\"Connection failed with error: {e}\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -112,13 +143,13 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Process Documents using Oracle AI\n",
|
||||
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by LangChain.\n",
|
||||
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by Langchain.\n",
|
||||
"\n",
|
||||
"To prepare the documents for analysis, a comprehensive preprocessing workflow is necessary. Initially, the documents must be retrieved, summarized (if required), and chunked as needed. Subsequent steps involve generating embeddings for these chunks and integrating them into the Oracle AI Vector Store. Users can then conduct semantic searches on this data.\n",
|
||||
"\n",
|
||||
"The Oracle AI Vector Search LangChain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
|
||||
"The Oracle AI Vector Search Langchain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
|
||||
"\n",
|
||||
"In the sections that follow, we will detail the utilization of Oracle AI LangChain APIs to effectively implement each of these processes."
|
||||
"In the sections that follow, we will detail the utilization of Oracle AI Langchain APIs to effectively implement each of these processes."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -126,24 +157,38 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Connect to Demo User\n",
|
||||
"The following sample code shows how to connect to Oracle Database using the python-oracledb driver. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You can switch to Thick mode if you are unable to use Thin mode."
|
||||
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 45,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Connection successful!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"import oracledb\n",
|
||||
"\n",
|
||||
"# please update with your username, password, and database connection string\n",
|
||||
"username = \"testuser\"\n",
|
||||
"# please update with your username, password, hostname and service_name\n",
|
||||
"username = \"\"\n",
|
||||
"password = \"\"\n",
|
||||
"dsn = \"\"\n",
|
||||
"\n",
|
||||
"connection = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
"print(\"Connection successful!\")"
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -156,12 +201,22 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Table created and populated.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"with connection.cursor() as cursor:\n",
|
||||
" drop_table_sql = \"\"\"drop table if exists demo_tab\"\"\"\n",
|
||||
"try:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
"\n",
|
||||
" drop_table_sql = \"\"\"drop table demo_tab\"\"\"\n",
|
||||
" cursor.execute(drop_table_sql)\n",
|
||||
"\n",
|
||||
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
|
||||
@@ -184,9 +239,15 @@
|
||||
" ]\n",
|
||||
" cursor.executemany(insert_row_sql, rows_to_insert)\n",
|
||||
"\n",
|
||||
"connection.commit()\n",
|
||||
" conn.commit()\n",
|
||||
"\n",
|
||||
"print(\"Table created and populated.\")"
|
||||
" print(\"Table created and populated.\")\n",
|
||||
" cursor.close()\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Table creation failed.\")\n",
|
||||
" cursor.close()\n",
|
||||
" conn.close()\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -200,22 +261,30 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the ONNX Model\n",
|
||||
"### Load ONNX Model\n",
|
||||
"\n",
|
||||
"Oracle accommodates a variety of embedding providers, enabling you to choose between proprietary database solutions and third-party services such as Oracle Generative AI Service and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
|
||||
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
|
||||
"\n",
|
||||
"***Important*** : Should you opt for the database option, you must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
|
||||
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
|
||||
"\n",
|
||||
"A significant advantage of utilizing an ONNX model directly within Oracle Database is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
|
||||
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
|
||||
"\n",
|
||||
"Below is the example code to upload an ONNX model into Oracle Database:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 47,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"ONNX model loaded.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"\n",
|
||||
@@ -225,8 +294,12 @@
|
||||
"onnx_file = \"tinybert.onnx\"\n",
|
||||
"model_name = \"demo_model\"\n",
|
||||
"\n",
|
||||
"OracleEmbeddings.load_onnx_model(connection, onnx_dir, onnx_file, model_name)\n",
|
||||
"print(\"ONNX model loaded.\")"
|
||||
"try:\n",
|
||||
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
|
||||
" print(\"ONNX model loaded.\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"ONNX model loading failed!\")\n",
|
||||
" sys.exit(1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -248,7 +321,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with connection.cursor() as cursor:\n",
|
||||
"try:\n",
|
||||
" cursor = conn.cursor()\n",
|
||||
" cursor.execute(\n",
|
||||
" \"\"\"\n",
|
||||
" declare\n",
|
||||
@@ -275,7 +349,12 @@
|
||||
" params => json(jo.to_string));\n",
|
||||
" end;\n",
|
||||
" \"\"\"\n",
|
||||
" )"
|
||||
" )\n",
|
||||
" cursor.close()\n",
|
||||
" print(\"Credentials created.\")\n",
|
||||
"except Exception as ex:\n",
|
||||
" cursor.close()\n",
|
||||
" raise"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -283,24 +362,33 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load Documents\n",
|
||||
"You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
|
||||
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
|
||||
"\n",
|
||||
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
|
||||
"\n",
|
||||
"Below is a sample code snippet that demonstrates how to use OracleDocLoader:"
|
||||
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 48,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of docs loaded: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"# loading from Oracle Database table\n",
|
||||
"# make sure you have the table with this specification\n",
|
||||
"loader_params = {}\n",
|
||||
"loader_params = {\n",
|
||||
" \"owner\": \"testuser\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
@@ -308,7 +396,7 @@
|
||||
"}\n",
|
||||
"\n",
|
||||
"\"\"\" load the docs \"\"\"\n",
|
||||
"loader = OracleDocLoader(conn=connection, params=loader_params)\n",
|
||||
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
|
||||
"docs = loader.load()\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
@@ -321,23 +409,23 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Summary\n",
|
||||
"Now that you have loaded the documents, you may want to generate a summary for each document. The Oracle AI Vector Search LangChain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, Oracle Generative AI Service, HuggingFace, among others, allowing you to select the provider that best meets their needs. To utilize these capabilities, you must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
|
||||
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** You may need to set proxy if you want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
|
||||
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# proxy to be used when we instantiate summary and embedder objects\n",
|
||||
"# proxy to be used when we instantiate summary and embedder object\n",
|
||||
"proxy = \"\""
|
||||
]
|
||||
},
|
||||
@@ -345,14 +433,22 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following sample code shows how to generate a summary:"
|
||||
"The following sample code will show how to generate summary:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 49,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of Summaries: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.utilities.oracleai import OracleSummary\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
@@ -367,7 +463,7 @@
|
||||
"\n",
|
||||
"# get the summary instance\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"summ = OracleSummary(conn=connection, params=summary_params, proxy=proxy)\n",
|
||||
"summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)\n",
|
||||
"\n",
|
||||
"list_summary = []\n",
|
||||
"for doc in docs:\n",
|
||||
@@ -391,9 +487,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 50,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of Chunks: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
@@ -402,7 +506,7 @@
|
||||
"splitter_params = {\"normalize\": \"all\"}\n",
|
||||
"\n",
|
||||
"\"\"\" get the splitter instance \"\"\"\n",
|
||||
"splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n",
|
||||
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
|
||||
"\n",
|
||||
"list_chunks = []\n",
|
||||
"for doc in docs:\n",
|
||||
@@ -419,19 +523,19 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Generate Embeddings\n",
|
||||
"Now that the documents are chunked as per requirements, you may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
|
||||
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"***Note:*** You may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
|
||||
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -443,14 +547,22 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The following sample code shows how to generate embeddings:"
|
||||
"The following sample code will show how to generate embeddings:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 51,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Number of embeddings: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
@@ -460,7 +572,7 @@
|
||||
"\n",
|
||||
"# get the embedding instance\n",
|
||||
"# Remove proxy if not required\n",
|
||||
"embedder = OracleEmbeddings(conn=connection, params=embedder_params, proxy=proxy)\n",
|
||||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
|
||||
"\n",
|
||||
"embeddings = []\n",
|
||||
"for doc in docs:\n",
|
||||
@@ -479,19 +591,19 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create Oracle AI Vector Store\n",
|
||||
"Now that you know how to use Oracle AI LangChain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
|
||||
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, let's import all the dependencies:"
|
||||
"First, let's import all the dependencies."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 52,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -514,80 +626,100 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, let's combine all document processing stages together. Here is the sample code:"
|
||||
"Next, let's combine all document processing stages together. Here is the sample code below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 53,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Connection successful!\n",
|
||||
"ONNX model loaded.\n",
|
||||
"Number of total chunks with metadata: 3\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\"\"\"\n",
|
||||
"In this sample example, we will use 'database' provider for both summary and embeddings\n",
|
||||
"so, we don't need to do the following:\n",
|
||||
"In this sample example, we will use 'database' provider for both summary and embeddings.\n",
|
||||
"So, we don't need to do the followings:\n",
|
||||
" - set proxy for 3rd party providers\n",
|
||||
" - create credential for 3rd party providers\n",
|
||||
"\n",
|
||||
"If you choose to use 3rd party provider, please follow the necessary steps for proxy and credential.\n",
|
||||
"If you choose to use 3rd party provider, \n",
|
||||
"please follow the necessary steps for proxy and credential.\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"# please update with your username, password, and database connection string\n",
|
||||
"# oracle connection\n",
|
||||
"# please update with your username, password, hostname, and service_name\n",
|
||||
"username = \"\"\n",
|
||||
"password = \"\"\n",
|
||||
"dsn = \"\"\n",
|
||||
"\n",
|
||||
"with oracledb.connect(user=username, password=password, dsn=dsn) as connection:\n",
|
||||
"try:\n",
|
||||
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
|
||||
" print(\"Connection successful!\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"Connection failed!\")\n",
|
||||
" sys.exit(1)\n",
|
||||
"\n",
|
||||
" # load onnx model\n",
|
||||
" # please update with your related information\n",
|
||||
" onnx_dir = \"DEMO_PY_DIR\"\n",
|
||||
" onnx_file = \"tinybert.onnx\"\n",
|
||||
" model_name = \"demo_model\"\n",
|
||||
" OracleEmbeddings.load_onnx_model(connection, onnx_dir, onnx_file, model_name)\n",
|
||||
"\n",
|
||||
"# load onnx model\n",
|
||||
"# please update with your related information\n",
|
||||
"onnx_dir = \"DEMO_PY_DIR\"\n",
|
||||
"onnx_file = \"tinybert.onnx\"\n",
|
||||
"model_name = \"demo_model\"\n",
|
||||
"try:\n",
|
||||
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
|
||||
" print(\"ONNX model loaded.\")\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"ONNX model loading failed!\")\n",
|
||||
" sys.exit(1)\n",
|
||||
"\n",
|
||||
" # params\n",
|
||||
" # please update necessary fields with related information\n",
|
||||
" loader_params = {\n",
|
||||
" \"owner\": \"testuser\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
" \"colname\": \"data\",\n",
|
||||
" }\n",
|
||||
" summary_params = {\n",
|
||||
" \"provider\": \"database\",\n",
|
||||
" \"glevel\": \"S\",\n",
|
||||
" \"numParagraphs\": 1,\n",
|
||||
" \"language\": \"english\",\n",
|
||||
" }\n",
|
||||
" splitter_params = {\"normalize\": \"all\"}\n",
|
||||
" embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||||
"\n",
|
||||
" # instantiate loader, summary, splitter, and embedder\n",
|
||||
" loader = OracleDocLoader(conn=connection, params=loader_params)\n",
|
||||
" summary = OracleSummary(conn=connection, params=summary_params)\n",
|
||||
" splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n",
|
||||
" embedder = OracleEmbeddings(conn=connection, params=embedder_params)\n",
|
||||
"# params\n",
|
||||
"# please update necessary fields with related information\n",
|
||||
"loader_params = {\n",
|
||||
" \"owner\": \"testuser\",\n",
|
||||
" \"tablename\": \"demo_tab\",\n",
|
||||
" \"colname\": \"data\",\n",
|
||||
"}\n",
|
||||
"summary_params = {\n",
|
||||
" \"provider\": \"database\",\n",
|
||||
" \"glevel\": \"S\",\n",
|
||||
" \"numParagraphs\": 1,\n",
|
||||
" \"language\": \"english\",\n",
|
||||
"}\n",
|
||||
"splitter_params = {\"normalize\": \"all\"}\n",
|
||||
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
|
||||
"\n",
|
||||
" # process the documents\n",
|
||||
" chunks_with_mdata = []\n",
|
||||
" for id, doc in enumerate(docs, start=1):\n",
|
||||
" summ = summary.get_summary(doc.page_content)\n",
|
||||
" chunks = splitter.split_text(doc.page_content)\n",
|
||||
" for ic, chunk in enumerate(chunks, start=1):\n",
|
||||
" chunk_metadata = doc.metadata.copy()\n",
|
||||
" chunk_metadata[\"id\"] = (\n",
|
||||
" chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
|
||||
" )\n",
|
||||
" chunk_metadata[\"document_id\"] = str(id)\n",
|
||||
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
|
||||
" chunks_with_mdata.append(\n",
|
||||
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
|
||||
" )\n",
|
||||
"# instantiate loader, summary, splitter, and embedder\n",
|
||||
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
|
||||
"summary = OracleSummary(conn=conn, params=summary_params)\n",
|
||||
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
|
||||
"embedder = OracleEmbeddings(conn=conn, params=embedder_params)\n",
|
||||
"\n",
|
||||
" \"\"\" verify \"\"\"\n",
|
||||
" print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
|
||||
"# process the documents\n",
|
||||
"chunks_with_mdata = []\n",
|
||||
"for id, doc in enumerate(docs, start=1):\n",
|
||||
" summ = summary.get_summary(doc.page_content)\n",
|
||||
" chunks = splitter.split_text(doc.page_content)\n",
|
||||
" for ic, chunk in enumerate(chunks, start=1):\n",
|
||||
" chunk_metadata = doc.metadata.copy()\n",
|
||||
" chunk_metadata[\"id\"] = chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
|
||||
" chunk_metadata[\"document_id\"] = str(id)\n",
|
||||
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
|
||||
" chunks_with_mdata.append(\n",
|
||||
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"\"\"\" verify \"\"\"\n",
|
||||
"print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -601,15 +733,23 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 55,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Vector Store Table: oravs\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# create Oracle AI Vector Store\n",
|
||||
"vectorstore = OracleVS.from_documents(\n",
|
||||
" chunks_with_mdata,\n",
|
||||
" embedder,\n",
|
||||
" client=connection,\n",
|
||||
" client=conn,\n",
|
||||
" table_name=\"oravs\",\n",
|
||||
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
|
||||
")\n",
|
||||
@@ -638,12 +778,12 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 56,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"oraclevs.create_index(\n",
|
||||
" connection, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
|
||||
" conn, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Index created.\")"
|
||||
@@ -653,7 +793,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. You may adjust various parameters according to your specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
|
||||
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. Users may adjust various parameters according to their specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
|
||||
"\n",
|
||||
"Additionally, various types of vector indices can be created to meet diverse requirements. More details can be found in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/).\n"
|
||||
]
|
||||
@@ -665,16 +805,29 @@
|
||||
"## Perform Semantic Search\n",
|
||||
"All set!\n",
|
||||
"\n",
|
||||
"You have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. You are now prepared to proceed with semantic searches.\n",
|
||||
"We have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. We are now prepared to proceed with semantic searches.\n",
|
||||
"\n",
|
||||
"Below is the sample code for this process:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 58,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
|
||||
"[]\n",
|
||||
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
|
||||
"[]\n",
|
||||
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
|
||||
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What is Oracle AI Vector Store?\"\n",
|
||||
"filter = {\"document_id\": [\"1\"]}\n",
|
||||
@@ -719,7 +872,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.13.3"
|
||||
"version": "3.11.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,759 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "10f50955-be55-422f-8c62-3a32f8cf02ed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# RAG application running locally on Intel Xeon CPU using langchain and open-source models"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "48113be6-44bb-4aac-aed3-76a1365b9561",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Author - Pratool Bharti (pratool.bharti@intel.com)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8b10b54b-1572-4ea1-9c1e-1d29fcc3dcd9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this cookbook, we use langchain tools and open source models to execute locally on CPU. This notebook has been validated to run on Intel Xeon 8480+ CPU. Here we implement a RAG pipeline for Llama2 model to answer questions about Intel Q1 2024 earnings release."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acadbcec-3468-4926-8ce5-03b678041c0a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Create a conda or virtualenv environment with python >=3.10 and install following libraries**\n",
|
||||
"<br>\n",
|
||||
"\n",
|
||||
"`pip install --upgrade langchain langchain-community langchainhub langchain-chroma bs4 gpt4all pypdf pysqlite3-binary` <br>\n",
|
||||
"`pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "84c392c8-700a-42ec-8e94-806597f22e43",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Load pysqlite3 in sys modules since ChromaDB requires sqlite3.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "145cd491-b388-4ea7-bdc8-2f4995cac6fd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"__import__(\"pysqlite3\")\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"sys.modules[\"sqlite3\"] = sys.modules.pop(\"pysqlite3\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "14dde7e2-b236-49b9-b3a0-08c06410418c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Import essential components from langchain to load and split data**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "887643ba-249e-48d6-9aa7-d25087e8dfbf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
|
||||
"from langchain_community.document_loaders import PyPDFLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "922c0eba-8736-4de5-bd2f-3d0f00b16e43",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Download Intel Q1 2024 earnings release**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "2d6a2419-5338-4188-8615-a40a65ff8019",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"--2024-07-15 15:04:43-- https://d1io3yog0oux5.cloudfront.net/_11d435a500963f99155ee058df09f574/intel/db/887/9014/earnings_release/Q1+24_EarningsRelease_FINAL.pdf\n",
|
||||
"Resolving proxy-dmz.intel.com (proxy-dmz.intel.com)... 10.7.211.16\n",
|
||||
"Connecting to proxy-dmz.intel.com (proxy-dmz.intel.com)|10.7.211.16|:912... connected.\n",
|
||||
"Proxy request sent, awaiting response... 200 OK\n",
|
||||
"Length: 133510 (130K) [application/pdf]\n",
|
||||
"Saving to: ‘intel_q1_2024_earnings.pdf’\n",
|
||||
"\n",
|
||||
"intel_q1_2024_earni 100%[===================>] 130.38K --.-KB/s in 0.005s \n",
|
||||
"\n",
|
||||
"2024-07-15 15:04:44 (24.6 MB/s) - ‘intel_q1_2024_earnings.pdf’ saved [133510/133510]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"!wget 'https://d1io3yog0oux5.cloudfront.net/_11d435a500963f99155ee058df09f574/intel/db/887/9014/earnings_release/Q1+24_EarningsRelease_FINAL.pdf' -O intel_q1_2024_earnings.pdf"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e3612627-e105-453d-8a50-bbd6e39dedb5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Loading earning release pdf document through PyPDFLoader**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "cac6278e-ebad-4224-a062-bf6daca24cb0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = PyPDFLoader(\"intel_q1_2024_earnings.pdf\")\n",
|
||||
"data = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a7dca43b-1c62-41df-90c7-6ed2904f823d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Splitting entire document in several chunks with each chunk size is 500 tokens**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "4486adbe-0d0e-4685-8c08-c1774ed6e993",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
|
||||
"all_splits = text_splitter.split_documents(data)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "af142346-e793-4a52-9a56-63e3be416b3d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Looking at the first split of the document**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "e4240fd1-898e-4bfc-a377-02c9bc25b56e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'source': 'intel_q1_2024_earnings.pdf', 'page': 0}, page_content='Intel Corporation\\n2200 Mission College Blvd.\\nSanta Clara, CA 95054-1549\\n \\nNews Release\\n Intel Reports First -Quarter 2024 Financial Results\\nNEWS SUMMARY\\n▪First-quarter revenue of $12.7 billion , up 9% year over year (YoY).\\n▪First-quarter GAAP earnings (loss) per share (EPS) attributable to Intel was $(0.09) ; non-GAAP EPS \\nattributable to Intel was $0.18 .')"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"all_splits[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b88d2632-7c1b-49ef-a691-c0eb67d23e6a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**One of the major step in RAG is to convert each split of document into embeddings and store in a vector database such that searching relevant documents are efficient.** <br>\n",
|
||||
"**For that, importing Chroma vector database from langchain. Also, importing open source GPT4All for embedding models**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "9ff99dd7-9d47-4239-ba0a-d775792334ba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_chroma import Chroma\n",
|
||||
"from langchain_community.embeddings import GPT4AllEmbeddings"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b5d1f4dd-dd8d-4a20-95d1-2dbdd204375a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**In next step, we will download one of the most popular embedding model \"all-MiniLM-L6-v2\". Find more details of the model at this link https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "05db3494-5d8e-4a13-9941-26330a86f5e5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_name = \"all-MiniLM-L6-v2.gguf2.f16.gguf\"\n",
|
||||
"gpt4all_kwargs = {\"allow_download\": \"True\"}\n",
|
||||
"embeddings = GPT4AllEmbeddings(model_name=model_name, gpt4all_kwargs=gpt4all_kwargs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4e53999e-1983-46ac-8039-2783e194c3ae",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Store all the embeddings in the Chroma database**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "0922951a-9ddf-4761-973d-8e9a86f61284",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "29f94fa0-6c75-4a65-a1a3-debc75422479",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Now, let's find relevant splits from the documents related to the question**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "88c8152d-ec7a-4f0b-9d86-877789407537",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"4\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"question = \"What is Intel CCG revenue in Q1 2024\"\n",
|
||||
"docs = vectorstore.similarity_search(question)\n",
|
||||
"print(len(docs))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "53330c6b-cb0f-43f9-b379-2e57ac1e5335",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Look at the first retrieved document from the vector database**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "43a6d94f-b5c4-47b0-a353-2db4c3d24d9c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(metadata={'page': 1, 'source': 'intel_q1_2024_earnings.pdf'}, page_content='Client Computing Group (CCG) $7.5 billion up31%\\nData Center and AI (DCAI) $3.0 billion up5%\\nNetwork and Edge (NEX) $1.4 billion down 8%\\nTotal Intel Products revenue $11.9 billion up17%\\nIntel Foundry $4.4 billion down 10%\\nAll other:\\nAltera $342 million down 58%\\nMobileye $239 million down 48%\\nOther $194 million up17%\\nTotal all other revenue $775 million down 46%\\nIntersegment eliminations $(4.4) billion\\nTotal net revenue $12.7 billion up9%\\nIntel Products Highlights')"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "64ba074f-4b36-442e-b7e2-b26d6e2815c3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Download Lllama-2 model from Huggingface and store locally** <br>\n",
|
||||
"**You can download different quantization variant of Lllama-2 model from the link below. We are using Q8 version here (7.16GB).** <br>\n",
|
||||
"https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c8dd0811-6f43-4bc6-b854-2ab377639c9a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!huggingface-cli download TheBloke/Llama-2-7b-Chat-GGUF llama-2-7b-chat.Q8_0.gguf --local-dir . --local-dir-use-symlinks False"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3895b1f5-f51d-4539-abf0-af33d7ca48ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Import langchain components required to load downloaded LLMs model**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "fb087088-aa62-44c0-8356-061e9b9f1186",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.callbacks.manager import CallbackManager\n",
|
||||
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
|
||||
"from langchain_community.llms import LlamaCpp"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5a8a111e-2614-4b70-b034-85cd3e7304cb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Loading the local Lllama-2 model using Llama-cpp library**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "fb917da2-c0d7-4995-b56d-26254276e0da",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q8_0.gguf (version GGUF V2)\n",
|
||||
"llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
|
||||
"llama_model_loader: - kv 0: general.architecture str = llama\n",
|
||||
"llama_model_loader: - kv 1: general.name str = LLaMA v2\n",
|
||||
"llama_model_loader: - kv 2: llama.context_length u32 = 4096\n",
|
||||
"llama_model_loader: - kv 3: llama.embedding_length u32 = 4096\n",
|
||||
"llama_model_loader: - kv 4: llama.block_count u32 = 32\n",
|
||||
"llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008\n",
|
||||
"llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128\n",
|
||||
"llama_model_loader: - kv 7: llama.attention.head_count u32 = 32\n",
|
||||
"llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32\n",
|
||||
"llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001\n",
|
||||
"llama_model_loader: - kv 10: general.file_type u32 = 7\n",
|
||||
"llama_model_loader: - kv 11: tokenizer.ggml.model str = llama\n",
|
||||
"llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = [\"<unk>\", \"<s>\", \"</s>\", \"<0x00>\", \"<...\n",
|
||||
"llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...\n",
|
||||
"llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n",
|
||||
"llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1\n",
|
||||
"llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2\n",
|
||||
"llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0\n",
|
||||
"llama_model_loader: - kv 18: general.quantization_version u32 = 2\n",
|
||||
"llama_model_loader: - type f32: 65 tensors\n",
|
||||
"llama_model_loader: - type q8_0: 226 tensors\n",
|
||||
"llm_load_vocab: special tokens cache size = 259\n",
|
||||
"llm_load_vocab: token to piece cache size = 0.1684 MB\n",
|
||||
"llm_load_print_meta: format = GGUF V2\n",
|
||||
"llm_load_print_meta: arch = llama\n",
|
||||
"llm_load_print_meta: vocab type = SPM\n",
|
||||
"llm_load_print_meta: n_vocab = 32000\n",
|
||||
"llm_load_print_meta: n_merges = 0\n",
|
||||
"llm_load_print_meta: vocab_only = 0\n",
|
||||
"llm_load_print_meta: n_ctx_train = 4096\n",
|
||||
"llm_load_print_meta: n_embd = 4096\n",
|
||||
"llm_load_print_meta: n_layer = 32\n",
|
||||
"llm_load_print_meta: n_head = 32\n",
|
||||
"llm_load_print_meta: n_head_kv = 32\n",
|
||||
"llm_load_print_meta: n_rot = 128\n",
|
||||
"llm_load_print_meta: n_swa = 0\n",
|
||||
"llm_load_print_meta: n_embd_head_k = 128\n",
|
||||
"llm_load_print_meta: n_embd_head_v = 128\n",
|
||||
"llm_load_print_meta: n_gqa = 1\n",
|
||||
"llm_load_print_meta: n_embd_k_gqa = 4096\n",
|
||||
"llm_load_print_meta: n_embd_v_gqa = 4096\n",
|
||||
"llm_load_print_meta: f_norm_eps = 0.0e+00\n",
|
||||
"llm_load_print_meta: f_norm_rms_eps = 1.0e-06\n",
|
||||
"llm_load_print_meta: f_clamp_kqv = 0.0e+00\n",
|
||||
"llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n",
|
||||
"llm_load_print_meta: f_logit_scale = 0.0e+00\n",
|
||||
"llm_load_print_meta: n_ff = 11008\n",
|
||||
"llm_load_print_meta: n_expert = 0\n",
|
||||
"llm_load_print_meta: n_expert_used = 0\n",
|
||||
"llm_load_print_meta: causal attn = 1\n",
|
||||
"llm_load_print_meta: pooling type = 0\n",
|
||||
"llm_load_print_meta: rope type = 0\n",
|
||||
"llm_load_print_meta: rope scaling = linear\n",
|
||||
"llm_load_print_meta: freq_base_train = 10000.0\n",
|
||||
"llm_load_print_meta: freq_scale_train = 1\n",
|
||||
"llm_load_print_meta: n_ctx_orig_yarn = 4096\n",
|
||||
"llm_load_print_meta: rope_finetuned = unknown\n",
|
||||
"llm_load_print_meta: ssm_d_conv = 0\n",
|
||||
"llm_load_print_meta: ssm_d_inner = 0\n",
|
||||
"llm_load_print_meta: ssm_d_state = 0\n",
|
||||
"llm_load_print_meta: ssm_dt_rank = 0\n",
|
||||
"llm_load_print_meta: model type = 7B\n",
|
||||
"llm_load_print_meta: model ftype = Q8_0\n",
|
||||
"llm_load_print_meta: model params = 6.74 B\n",
|
||||
"llm_load_print_meta: model size = 6.67 GiB (8.50 BPW) \n",
|
||||
"llm_load_print_meta: general.name = LLaMA v2\n",
|
||||
"llm_load_print_meta: BOS token = 1 '<s>'\n",
|
||||
"llm_load_print_meta: EOS token = 2 '</s>'\n",
|
||||
"llm_load_print_meta: UNK token = 0 '<unk>'\n",
|
||||
"llm_load_print_meta: LF token = 13 '<0x0A>'\n",
|
||||
"llm_load_print_meta: max token length = 48\n",
|
||||
"llm_load_tensors: ggml ctx size = 0.14 MiB\n",
|
||||
"llm_load_tensors: CPU buffer size = 6828.64 MiB\n",
|
||||
"...................................................................................................\n",
|
||||
"llama_new_context_with_model: n_ctx = 2048\n",
|
||||
"llama_new_context_with_model: n_batch = 512\n",
|
||||
"llama_new_context_with_model: n_ubatch = 512\n",
|
||||
"llama_new_context_with_model: flash_attn = 0\n",
|
||||
"llama_new_context_with_model: freq_base = 10000.0\n",
|
||||
"llama_new_context_with_model: freq_scale = 1\n",
|
||||
"llama_kv_cache_init: CPU KV buffer size = 1024.00 MiB\n",
|
||||
"llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB\n",
|
||||
"llama_new_context_with_model: CPU output buffer size = 0.12 MiB\n",
|
||||
"llama_new_context_with_model: CPU compute buffer size = 164.01 MiB\n",
|
||||
"llama_new_context_with_model: graph nodes = 1030\n",
|
||||
"llama_new_context_with_model: graph splits = 1\n",
|
||||
"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 | \n",
|
||||
"Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '11008', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'llama.rope.dimension_count': '128', 'llama.attention.head_count': '32', 'tokenizer.ggml.bos_token_id': '1', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '7'}\n",
|
||||
"Using fallback chat format: llama-2\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm = LlamaCpp(\n",
|
||||
" model_path=\"llama-2-7b-chat.Q8_0.gguf\",\n",
|
||||
" n_gpu_layers=-1,\n",
|
||||
" n_batch=512,\n",
|
||||
" n_ctx=2048,\n",
|
||||
" f16_kv=True, # MUST set to True, otherwise you will run into problem after a couple of calls\n",
|
||||
" callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),\n",
|
||||
" verbose=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "43e06f56-ef97-451b-87d9-8465ea442aed",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Now let's ask the same question to Llama model without showing them the earnings release.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "1033dd82-5532-437d-a548-27695e109589",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"?\n",
|
||||
"(NASDAQ:INTC)\n",
|
||||
"Intel's CCG (Client Computing Group) revenue for Q1 2024 was $9.6 billion, a decrease of 35% from the previous quarter and a decrease of 42% from the same period last year."
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"llama_print_timings: load time = 131.20 ms\n",
|
||||
"llama_print_timings: sample time = 16.05 ms / 68 runs ( 0.24 ms per token, 4236.76 tokens per second)\n",
|
||||
"llama_print_timings: prompt eval time = 131.14 ms / 16 tokens ( 8.20 ms per token, 122.01 tokens per second)\n",
|
||||
"llama_print_timings: eval time = 3225.00 ms / 67 runs ( 48.13 ms per token, 20.78 tokens per second)\n",
|
||||
"llama_print_timings: total time = 3466.40 ms / 83 tokens\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"?\\n(NASDAQ:INTC)\\nIntel's CCG (Client Computing Group) revenue for Q1 2024 was $9.6 billion, a decrease of 35% from the previous quarter and a decrease of 42% from the same period last year.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"llm.invoke(question)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "75f5cb10-746f-4e37-9386-b85a4d2b84ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**As you can see, model is giving wrong information. Correct asnwer is CCG revenue in Q1 2024 is $7.5B. Now let's apply RAG using the earning release document**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0f4150ec-5692-4756-b11a-22feb7ab88ff",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**in RAG, we modify the input prompt by adding relevent documents with the question. Here, we use one of the popular RAG prompt**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "226c14b0-f43e-4a1f-a1e4-04731d467ec4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: {question} \\nContext: {context} \\nAnswer:\"))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain import hub\n",
|
||||
"\n",
|
||||
"rag_prompt = hub.pull(\"rlm/rag-prompt\")\n",
|
||||
"rag_prompt.messages"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "77deb6a0-0950-450a-916a-f2a029676c20",
|
||||
"metadata": {},
|
||||
"source": "**Appending all retrieved documents in a single document**"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "2dbc3327-6ef3-4c1f-8797-0c71964b0921",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def format_docs(docs):\n",
|
||||
" return \"\\n\\n\".join(doc.page_content for doc in docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2e2d9f18-49d0-43a3-bea8-78746ffa86b7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**The last step is to create a chain using langchain tool that will create an e2e pipeline. It will take question and context as an input.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "427379c2-51ff-4e0f-8278-a45221363299",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain_core.output_parsers import StrOutputParser\n",
|
||||
"from langchain_core.runnables import RunnablePassthrough, RunnablePick\n",
|
||||
"\n",
|
||||
"# Chain\n",
|
||||
"chain = (\n",
|
||||
" RunnablePassthrough.assign(context=RunnablePick(\"context\") | format_docs)\n",
|
||||
" | rag_prompt\n",
|
||||
" | llm\n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "095d6280-c949-4d00-8e32-8895a82d245f",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Llama.generate: prefix-match hit\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Based on the provided context, Intel CCG revenue in Q1 2024 was $7.5 billion up 31%."
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"llama_print_timings: load time = 131.20 ms\n",
|
||||
"llama_print_timings: sample time = 7.74 ms / 31 runs ( 0.25 ms per token, 4004.13 tokens per second)\n",
|
||||
"llama_print_timings: prompt eval time = 2529.41 ms / 674 tokens ( 3.75 ms per token, 266.46 tokens per second)\n",
|
||||
"llama_print_timings: eval time = 1542.94 ms / 30 runs ( 51.43 ms per token, 19.44 tokens per second)\n",
|
||||
"llama_print_timings: total time = 4123.68 ms / 704 tokens\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' Based on the provided context, Intel CCG revenue in Q1 2024 was $7.5 billion up 31%.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.invoke({\"context\": docs, \"question\": question})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "638364b2-6bd2-4471-9961-d3a1d1b9d4ee",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Now we see the results are correct as it is mentioned in earnings release.** <br>\n",
|
||||
"**To further automate, we will create a chain that will take input as question and retriever so that we don't need to retrieve documents separately**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"id": "4654e5b7-635f-4767-8b31-4c430164cdd5",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = vectorstore.as_retriever()\n",
|
||||
"qa_chain = (\n",
|
||||
" {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
|
||||
" | rag_prompt\n",
|
||||
" | llm\n",
|
||||
" | StrOutputParser()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0979f393-fd0a-4e82-b844-68371c6ad68f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Now we only need to pass the question to the chain and it will fetch the contexts directly from the vector database to generate the answer**\n",
|
||||
"<br>\n",
|
||||
"**Let's try with another question**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "3ea07b82-e6ec-4084-85f4-191373530172",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Llama.generate: prefix-match hit\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" According to the provided context, Intel DCAI revenue in Q1 2024 was $3.0 billion up 5%."
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"llama_print_timings: load time = 131.20 ms\n",
|
||||
"llama_print_timings: sample time = 6.28 ms / 31 runs ( 0.20 ms per token, 4937.88 tokens per second)\n",
|
||||
"llama_print_timings: prompt eval time = 2681.93 ms / 730 tokens ( 3.67 ms per token, 272.19 tokens per second)\n",
|
||||
"llama_print_timings: eval time = 1471.07 ms / 30 runs ( 49.04 ms per token, 20.39 tokens per second)\n",
|
||||
"llama_print_timings: total time = 4206.77 ms / 760 tokens\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"' According to the provided context, Intel DCAI revenue in Q1 2024 was $3.0 billion up 5%.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"qa_chain.invoke(\"what is Intel DCAI revenue in Q1 2024?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9407f2a0-4a35-4315-8e96-02fcb80f210c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.11.1 64-bit",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "1a1af0ee75eeea9e2e1ee996c87e7a2b11a0bebd85af04bb136d915cefc0abce"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user