Compare commits

..

10 Commits

Author SHA1 Message Date
Erick Friis
ad87d24edc x 2024-11-21 20:02:19 -08:00
Erick Friis
254e59c2ce x 2024-11-21 19:54:14 -08:00
Erick Friis
01bf59679c x 2024-11-21 19:40:08 -08:00
Erick Friis
bffca0d5c2 x 2024-11-21 19:37:17 -08:00
Erick Friis
46ea6722f4 x 2024-11-21 19:00:38 -08:00
Erick Friis
d83000b5b8 x 2024-11-21 18:59:10 -08:00
Erick Friis
3dd6c05ce7 x 2024-11-21 18:48:59 -08:00
Erick Friis
1798d6e92e x 2024-11-21 18:41:56 -08:00
Erick Friis
9ac46cc264 x 2024-11-21 18:17:04 -08:00
Erick Friis
ec12b492f1 docs: poetry publish 2024-11-21 15:56:57 -08:00
5202 changed files with 484997 additions and 244274 deletions

View File

@@ -5,31 +5,26 @@ This project includes a [dev container](https://containers.dev/), which lets you
You can use the dev container configuration in this folder to build and run the app without needing to install any of its tools locally! You can use it in [GitHub Codespaces](https://github.com/features/codespaces) or the [VS Code Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).
## GitHub Codespaces
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain)
You may use the button above, or follow these steps to open this repo in a Codespace:
1. Click the **Code** drop-down menu at the top of <https://github.com/langchain-ai/langchain>.
1. Click the **Code** drop-down menu at the top of https://github.com/langchain-ai/langchain.
1. Click on the **Codespaces** tab.
1. Click **Create codespace on master**.
For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).
## VS Code Dev Containers
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
> [!NOTE]
> If you click the link above you will open the main repo (`langchain-ai/langchain`) and *not* your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
```txt
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/&lt;YOUR_USERNAME&gt;/&lt;YOUR_CLONED_REPO_NAME&gt;
Note: If you click the link above you will open the main repo (langchain-ai/langchain) and not your local cloned repo. This is fine if you only want to run and test the library, but if you want to contribute you can use the link below and replace with your username and cloned repo name:
```
https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/<yourusername>/<yourclonedreponame>
```
Then you will have a local cloned repo where you can contribute and then create pull requests.
If you already have VS Code and Docker installed, you can use the button above to get started. This will use VSCode to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
If you already have VS Code and Docker installed, you can use the button above to get started. This will cause VS Code to automatically install the Dev Containers extension if needed, clone the source code into a container volume, and spin up a dev container for use.
Alternatively you can also follow these steps to open this repo in a container using the VS Code Dev Containers extension:
@@ -45,5 +40,5 @@ You can learn more in the [Dev Containers documentation](https://code.visualstud
## Tips and tricks
- If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
- If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.
* If you are working with the same repository folder in a container and Windows, you'll want consistent line endings (otherwise you may see hundreds of changes in the SCM view). The `.gitattributes` file in the root of this repo will disable line ending conversion and should prevent this. See [tips and tricks](https://code.visualstudio.com/docs/devcontainers/tips-and-tricks#_resolving-git-line-ending-issues-in-containers-resulting-in-many-modified-files) for more info.
* If you'd like to review the contents of the image used in this dev container, you can check it out in the [devcontainers/images](https://github.com/devcontainers/images/tree/main/src/python) repo.

View File

@@ -1,58 +1,36 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-docker-compose
{
// Name for the dev container
"name": "langchain",
// Point to a Docker Compose file
"dockerComposeFile": "./docker-compose.yaml",
// Required when using Docker Compose. The name of the service to connect to once running
"service": "langchain",
// The optional 'workspaceFolder' property is the path VS Code should open by default when
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
"workspaceFolder": "/workspaces/langchain",
"mounts": [
"source=langchain-workspaces,target=/workspaces/langchain,type=volume"
],
// Prevent the container from shutting down
"overrideCommand": true,
// Features to add to the dev container. More info: https://containers.dev/features
"features": {
"ghcr.io/devcontainers/features/git:1": {},
"ghcr.io/devcontainers/features/github-cli:1": {}
},
"containerEnv": {
"UV_LINK_MODE": "copy"
},
// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
// Run commands after the container is created
"postCreateCommand": "uv sync && echo 'LangChain (Python) dev environment ready!'",
// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": [
"ms-python.python",
"ms-python.debugpy",
"ms-python.mypy-type-checker",
"ms-python.isort",
"unifiedjs.vscode-mdx",
"davidanson.vscode-markdownlint",
"ms-toolsai.jupyter",
"GitHub.copilot",
"GitHub.copilot-chat"
],
"settings": {
"python.defaultInterpreterPath": ".venv/bin/python",
"python.formatting.provider": "none",
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.organizeImports": true
}
}
}
}
}
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
// Name for the dev container
"name": "langchain",
// Point to a Docker Compose file
"dockerComposeFile": "./docker-compose.yaml",
// Required when using Docker Compose. The name of the service to connect to once running
"service": "langchain",
// The optional 'workspaceFolder' property is the path VS Code should open by default when
// connected. This is typically a file mount in .devcontainer/docker-compose.yml
"workspaceFolder": "/workspaces/langchain",
// Prevent the container from shutting down
"overrideCommand": true
// Features to add to the dev container. More info: https://containers.dev/features
// "features": {
// "ghcr.io/devcontainers-contrib/features/poetry:2": {}
// }
// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
// Uncomment the next line to run commands after the container is created.
// "postCreateCommand": "cat /etc/os-release",
// Configure tool-specific properties.
// "customizations": {},
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
}

View File

@@ -4,9 +4,26 @@ services:
build:
dockerfile: libs/langchain/dev.Dockerfile
context: ..
volumes:
# Update this to wherever you want VS Code to mount the folder of your project
- ..:/workspaces/langchain:cached
networks:
- langchain-network
# environment:
# MONGO_ROOT_USERNAME: root
# MONGO_ROOT_PASSWORD: example123
# depends_on:
# - mongo
# mongo:
# image: mongo
# restart: unless-stopped
# environment:
# MONGO_INITDB_ROOT_USERNAME: root
# MONGO_INITDB_ROOT_PASSWORD: example123
# ports:
# - "27017:27017"
# networks:
# - langchain-network
networks:
langchain-network:

View File

@@ -1,52 +0,0 @@
# top-most EditorConfig file
root = true
# All files
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
# Python files
[*.py]
indent_style = space
indent_size = 4
max_line_length = 88
# JSON files
[*.json]
indent_style = space
indent_size = 2
# YAML files
[*.{yml,yaml}]
indent_style = space
indent_size = 2
# Markdown files
[*.md]
indent_style = space
indent_size = 2
trim_trailing_whitespace = false
# Configuration files
[*.{toml,ini,cfg}]
indent_style = space
indent_size = 4
# Shell scripts
[*.sh]
indent_style = space
indent_size = 2
# Makefile
[Makefile]
indent_style = tab
indent_size = 4
# Jupyter notebooks
[*.ipynb]
# Jupyter may include trailing whitespace in cell
# outputs that's semantically meaningful
trim_trailing_whitespace = false

5
.github/CODEOWNERS vendored
View File

@@ -1,3 +1,2 @@
/.github/ @baskaryan @ccurme @eyurtsev
/libs/core/ @eyurtsev
/libs/packages.yml @ccurme
/.github/ @efriis @baskaryan @ccurme
/libs/packages.yml @efriis

View File

@@ -129,4 +129,4 @@ For answers to common questions about this code of conduct, see the FAQ at
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
[translations]: https://www.contributor-covenant.org/translations

View File

@@ -3,8 +3,4 @@
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
## New features
For new features, please start a new [discussion on our forum](https://forum.langchain.com/), where the maintainers will help with scoping out the necessary changes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).

38
.github/DISCUSSION_TEMPLATE/ideas.yml vendored Normal file
View File

@@ -0,0 +1,38 @@
labels: [idea]
body:
- type: checkboxes
id: checks
attributes:
label: Checked
description: Please confirm and check all the following options.
options:
- label: I searched existing ideas and did not find a similar one
required: true
- label: I added a very descriptive title
required: true
- label: I've clearly described the feature request and motivation for it
required: true
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal. Please provide links to any relevant GitHub repos, papers, or other resources if relevant.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
- type: textarea
id: proposal
validations:
required: false
attributes:
label: Proposal (If applicable)
description: |
If you would like to propose a solution, please describe it here.

122
.github/DISCUSSION_TEMPLATE/q-a.yml vendored Normal file
View File

@@ -0,0 +1,122 @@
labels: [Question]
body:
- type: markdown
attributes:
value: |
Thanks for your interest in LangChain 🦜️🔗!
Please follow these instructions, fill every question, and do every step. 🙏
We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
By asking questions in a structured way (following this) it will be much easier for us to help you.
There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎
As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones.
That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓
Relevant links to check before opening a question to see if your question has already been answered, fixed or
if there's another way to solve your problem:
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
[API Reference](https://api.python.langchain.com/en/stable/),
[GitHub search](https://github.com/langchain-ai/langchain),
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
[LangChain ChatBot](https://chat.langchain.com/)
- type: checkboxes
id: checks
attributes:
label: Checked other resources
description: Please confirm and check all the following options.
options:
- label: I added a very descriptive title to this question.
required: true
- label: I searched the LangChain documentation with the integrated search.
required: true
- label: I used the GitHub search to find a similar question and didn't find it.
required: true
- type: checkboxes
id: help
attributes:
label: Commit to Help
description: |
After submitting this, I commit to one of:
* Read open questions until I find 2 where I can help someone and add a comment to help there.
* I already hit the "watch" button in this repository to receive notifications and I commit to help at least 2 people that ask questions in the future.
* Once my question is answered, I will mark the answer as "accepted".
options:
- label: I commit to help with one of those options 👆
required: true
- type: textarea
id: example
attributes:
label: Example Code
description: |
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
**Important!**
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder: |
from langchain_core.runnables import RunnableLambda
def bad_code(inputs) -> int:
raise NotImplementedError('For demo purpose')
chain = RunnableLambda(bad_code)
chain.invoke('Hello!')
render: python
validations:
required: true
- type: textarea
id: description
attributes:
label: Description
description: |
What is the problem, question, or error?
Write a short description explaining what you are doing, what you expect to happen, and what is currently happening.
placeholder: |
* I'm trying to use the `langchain` library to do X.
* I expect to see Y.
* Instead, it does Z.
validations:
required: true
- type: textarea
id: system-info
attributes:
label: System Info
description: |
Please share your system info with us.
"pip freeze | grep langchain"
platform (windows / linux / mac)
python version
OR if you're on a recent version of langchain-core you can paste the output of:
python -m langchain_core.sys_info
placeholder: |
"pip freeze | grep langchain"
platform
python version
Alternatively, if you're on a recent version of langchain-core you can paste the output of:
python -m langchain_core.sys_info
These will only surface LangChain packages, don't forget to include any other relevant
packages you're using (if you're not sure what's relevant, you can paste the entire output of `pip freeze`).
validations:
required: true

View File

@@ -1,33 +1,35 @@
name: "\U0001F41B Bug Report"
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the LangChain forum.
labels: ["bug"]
description: Report a bug in LangChain. To report a security issue, please instead use the security option below. For questions, please use the GitHub Discussions.
labels: ["02 Bug Report"]
body:
- type: markdown
attributes:
value: |
Thank you for taking the time to file a bug report.
Use this to report BUGS in LangChain. For usage questions, feature requests and general design questions, please use the [LangChain Forum](https://forum.langchain.com/).
value: >
Thank you for taking the time to file a bug report.
Use this to report bugs in LangChain.
If you're not certain that your issue is due to a bug in LangChain, please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions)
to ask for help with your issue.
Relevant links to check before filing a bug report to see if your issue has already been reported, fixed or
if there's another way to solve your problem:
* [LangChain Forum](https://forum.langchain.com/),
* [LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
* [LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
* [LangChain how-to guides](https://python.langchain.com/docs/how_to/),
* [API Reference](https://python.langchain.com/api_reference/),
* [LangChain ChatBot](https://chat.langchain.com/)
* [GitHub search](https://github.com/langchain-ai/langchain),
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
[API Reference](https://api.python.langchain.com/en/stable/),
[GitHub search](https://github.com/langchain-ai/langchain),
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
[LangChain ChatBot](https://chat.langchain.com/)
- type: checkboxes
id: checks
attributes:
label: Checked other resources
description: Please confirm and check all the following options.
options:
- label: This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- label: I added a very descriptive title to this issue.
required: true
- label: I added a clear and descriptive title that summarizes this issue.
- label: I searched the LangChain documentation with the integrated search.
required: true
- label: I used the GitHub search to find a similar question and didn't find it.
required: true
@@ -35,10 +37,6 @@ body:
required: true
- label: The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
required: true
- label: I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
required: true
- label: I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
required: true
- type: textarea
id: reproduction
validations:
@@ -47,25 +45,25 @@ body:
label: Example Code
description: |
Please add a self-contained, [minimal, reproducible, example](https://stackoverflow.com/help/minimal-reproducible-example) with your use case.
If a maintainer can copy it, run it, and see it right away, there's a much higher chance that you'll be able to get help.
**Important!**
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
**Important!**
* Use code tags (e.g., ```python ... ```) to correctly [format your code](https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting).
* INCLUDE the language label (e.g. `python`) after the first three backticks to enable syntax highlighting. (e.g., ```python rather than ```).
* Reduce your code to the minimum required to reproduce the issue if possible. This makes it much easier for others to help you.
* Avoid screenshots when possible, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder: |
The following code:
The following code:
```python
from langchain_core.runnables import RunnableLambda
def bad_code(inputs) -> int:
raise NotImplementedError('For demo purpose')
chain = RunnableLambda(bad_code)
chain.invoke('Hello!')
```
@@ -101,18 +99,16 @@ body:
Please share your system info with us. Do NOT skip this step and please don't trim
the output. Most users don't include enough information here and it makes it harder
for us to help you.
Run the following command in your terminal and paste the output here:
`python -m langchain_core.sys_info`
python -m langchain_core.sys_info
or if you have an existing python interpreter running:
```python
from langchain_core import sys_info
sys_info.print_sys_info()
```
alternatively, put the entire output of `pip freeze` here.
placeholder: |
python -m langchain_core.sys_info

View File

@@ -1,9 +1,12 @@
blank_issues_enabled: false
version: 2.1
contact_links:
- name: Documentation
url: https://github.com/langchain-ai/docs/issues/new?template=langchain.yml
about: Report an issue related to the LangChain documentation
- name: LangChain Forum
url: https://forum.langchain.com/
about: General community discussions and support
- name: 🤔 Question or Problem
about: Ask a question or ask about a problem in GitHub Discussions.
url: https://www.github.com/langchain-ai/langchain/discussions/categories/q-a
- name: Feature Request
url: https://www.github.com/langchain-ai/langchain/discussions/categories/ideas
about: Suggest a feature or an idea
- name: Show and tell
about: Show what you built with LangChain
url: https://www.github.com/langchain-ai/langchain/discussions/categories/show-and-tell

View File

@@ -0,0 +1,58 @@
name: Documentation
description: Report an issue related to the LangChain documentation.
title: "DOC: <Please write a comprehensive title after the 'DOC: ' prefix>"
labels: [03 - Documentation]
body:
- type: markdown
attributes:
value: >
Thank you for taking the time to report an issue in the documentation.
Only report issues with documentation here, explain if there are
any missing topics or if you found a mistake in the documentation.
Do **NOT** use this to ask usage questions or reporting issues with your code.
If you have usage questions or need help solving some problem,
please use [GitHub Discussions](https://github.com/langchain-ai/langchain/discussions).
If you're in the wrong place, here are some helpful links to find a better
place to ask your question:
[LangChain documentation with the integrated search](https://python.langchain.com/docs/get_started/introduction),
[API Reference](https://api.python.langchain.com/en/stable/),
[GitHub search](https://github.com/langchain-ai/langchain),
[LangChain Github Discussions](https://github.com/langchain-ai/langchain/discussions),
[LangChain Github Issues](https://github.com/langchain-ai/langchain/issues?q=is%3Aissue),
[LangChain ChatBot](https://chat.langchain.com/)
- type: input
id: url
attributes:
label: URL
description: URL to documentation
validations:
required: false
- type: checkboxes
id: checks
attributes:
label: Checklist
description: Please confirm and check all the following options.
options:
- label: I added a very descriptive title to this issue.
required: true
- label: I included a link to the documentation page I am referring to (if applicable).
required: true
- type: textarea
attributes:
label: "Issue with current documentation:"
description: >
Please make sure to leave a reference to the document/code you're
referring to. Feel free to include names of classes, functions, methods
or concepts you'd like to see documented more.
- type: textarea
attributes:
label: "Idea or request for content:"
description: >
Please describe as clearly as possible what topics you think are missing
from the current documentation.

View File

@@ -5,10 +5,10 @@ body:
attributes:
value: |
Thanks for your interest in LangChain! 🚀
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation on the [LangChain Forum](https://forum.langchain.com/) instead.
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
or are a regular contributor to LangChain with previous merged pull requests.
- type: checkboxes
id: privileged

View File

@@ -1,33 +1,29 @@
(Replace this entire block of text)
Thank you for contributing to LangChain!
Thank you for contributing to LangChain! Follow these steps to mark your pull request as ready for review. **If any of these steps are not completed, your PR will not be considered for review.**
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR title**: Follows the format: {TYPE}({SCOPE}): {DESCRIPTION}
- Examples:
- feat(core): add multi-tenant support
- fix(cli): resolve flag parsing error
- docs(openai): update API usage examples
- Allowed `{TYPE}` values:
- feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, release
- Allowed `{SCOPE}` values (optional):
- core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek, exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai, perplexity, prompty, qdrant, xai
- *Note:* the `{DESCRIPTION}` must not start with an uppercase letter.
- Once you've written the title, please delete this checklist item; do not include it in the PR.
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
- **Description:** a description of the change. Include a [closing keyword](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) if applicable to a relevant issue.
- **Issue:** the issue # it fixes, if applicable (e.g. Fixes #123)
- **Dependencies:** any dependencies required for this change
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, you must include:
1. A test for the integration, preferably unit tests that do not rely on network access,
2. An example notebook showing its use. It lives in `docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. **We will not consider a PR unless these three are passing in CI.** See [contribution guidelines](https://python.langchain.com/docs/contributing/) for more.
- [ ] **Add tests and docs**: If you're adding a new integration, please include
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Most PRs should not touch more than one package.
- Please do not add dependencies to `pyproject.toml` files (even optional ones) unless they are **required** for unit tests.
- Changes should be backwards compatible.
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

View File

@@ -4,4 +4,4 @@ RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.
COPY ./app /app
CMD ["python", "/app/main.py"]
CMD ["python", "/app/main.py"]

View File

@@ -4,8 +4,8 @@ description: "Generate the data for the LangChain People page"
author: "Jacob Lee <jacob@langchain.dev>"
inputs:
token:
description: "User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}"
description: 'User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}'
required: true
runs:
using: "docker"
image: "Dockerfile"
using: 'docker'
image: 'Dockerfile'

View File

@@ -1,21 +0,0 @@
# TODO: https://docs.astral.sh/uv/guides/integration/github/#caching
name: uv-install
description: Set up Python and uv
inputs:
python-version:
description: Python version, supporting MAJOR.MINOR only
required: true
env:
UV_VERSION: "0.5.25"
runs:
using: composite
steps:
- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
with:
version: ${{ env.UV_VERSION }}
python-version: ${{ inputs.python-version }}

View File

@@ -1,325 +0,0 @@
# Global Development Guidelines for LangChain Projects
## Core Development Principles
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
**Bad - Breaking Change:**
```python
def get_user(id, verbose=False): # Changed from `user_id`
pass
```
**Good - Stable Interface:**
```python
def get_user(user_id: str, verbose: bool = False) -> User:
"""Retrieve user by ID with optional verbose output."""
pass
```
**Before making ANY changes to public APIs:**
- Check if the function/class is exported in `__init__.py`
- Look for existing usage patterns in tests and examples
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
### 2. Code Quality Standards
**All Python code MUST include type hints and return types.**
**Bad:**
```python
def p(u, d):
return [x for x in u if x not in d]
```
**Good:**
```python
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
"""Filter out users that are not in the known users set.
Args:
users: List of user identifiers to filter.
known_users: Set of known/valid user identifiers.
Returns:
List of users that are not in the known_users set.
"""
return [user for user in users if user not in known_users]
```
**Style Requirements:**
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
- Avoid unnecessary abstraction or premature optimization
- Follow existing patterns in the codebase you're modifying
### 3. Testing Requirements
**Every new feature or bugfix MUST be covered by unit tests.**
**Test Organization:**
- Unit tests: `tests/unit_tests/` (no network calls allowed)
- Integration tests: `tests/integration_tests/` (network calls permitted)
- Use `pytest` as the testing framework
**Test Quality Checklist:**
- [ ] Tests fail when your new logic is broken
- [ ] Happy path is covered
- [ ] Edge cases and error conditions are tested
- [ ] Use fixtures/mocks for external dependencies
- [ ] Tests are deterministic (no flaky tests)
Checklist questions:
- [ ] Does the test suite fail if your new logic is broken?
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
- [ ] Do tests use fixtures or mocks where needed?
```python
def test_filter_unknown_users():
"""Test filtering unknown users from a list."""
users = ["alice", "bob", "charlie"]
known_users = {"alice", "bob"}
result = filter_unknown_users(users, known_users)
assert result == ["charlie"]
assert len(result) == 1
```
### 4. Security and Risk Assessment
**Security Checklist:**
- No `eval()`, `exec()`, or `pickle` on user-controlled input
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
- Remove unreachable/commented code before committing
- Race conditions or resource leaks (file handles, sockets, threads).
- Ensure proper resource cleanup (file handles, connections)
**Bad:**
```python
def load_config(path):
with open(path) as f:
return eval(f.read()) # ⚠️ Never eval config
```
**Good:**
```python
import json
def load_config(path: str) -> dict:
with open(path) as f:
return json.load(f)
```
### 5. Documentation Standards
**Use Google-style docstrings with Args section for all public functions.**
**Insufficient Documentation:**
```python
def send_email(to, msg):
"""Send an email to a recipient."""
```
**Complete Documentation:**
```python
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
"""
Send an email to a recipient with specified priority.
Args:
to: The email address of the recipient.
msg: The message body to send.
priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).
Returns:
True if email was sent successfully, False otherwise.
Raises:
InvalidEmailError: If the email address format is invalid.
SMTPConnectionError: If unable to connect to email server.
"""
```
**Documentation Guidelines:**
- Types go in function signatures, NOT in docstrings
- Focus on "why" rather than "what" in descriptions
- Document all parameters, return values, and exceptions
- Keep descriptions concise but clear
- Use reStructuredText for docstrings to enable rich formatting
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
### 6. Architectural Improvements
**When you encounter code that could be improved, suggest better designs:**
**Poor Design:**
```python
def process_data(data, db_conn, email_client, logger):
# Function doing too many things
validated = validate_data(data)
result = db_conn.save(validated)
email_client.send_notification(result)
logger.log(f"Processed {len(data)} items")
return result
```
**Better Design:**
```python
@dataclass
class ProcessingResult:
"""Result of data processing operation."""
items_processed: int
success: bool
errors: List[str] = field(default_factory=list)
class DataProcessor:
"""Handles data validation, storage, and notification."""
def __init__(self, db_conn: Database, email_client: EmailClient):
self.db = db_conn
self.email = email_client
def process(self, data: List[dict]) -> ProcessingResult:
"""Process and store data with notifications."""
validated = self._validate_data(data)
result = self.db.save(validated)
self._notify_completion(result)
return result
```
**Design Improvement Areas:**
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
- Reduce code duplication through shared utilities
- Make unit testing easier
- Improve separation of concerns (single responsibility)
- Make unit testing easier through dependency injection
- Add clarity without adding complexity
- Prefer dataclasses for structured data
## Development Tools & Commands
### Package Management
```bash
# Add package
uv add package-name
# Sync project dependencies
uv sync
uv lock
```
### Testing
```bash
# Run unit tests (no network)
make test
# Don't run integration tests, as API keys must be set
# Run specific test file
uv run --group test pytest tests/unit_tests/test_specific.py
```
### Code Quality
```bash
# Lint code
make lint
# Format code
make format
# Type checking
uv run --group lint mypy .
```
### Dependency Management Patterns
**Local Development Dependencies:**
```toml
[tool.uv.sources]
langchain-core = { path = "../core", editable = true }
langchain-tests = { path = "../standard-tests", editable = true }
```
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
```python
from langchain_core.tools import tool
@tool
def search_database(query: str) -> str:
"""Search the database for relevant information.
Args:
query: The search query string.
"""
# Implementation here
return results
```
## Commit Standards
**Use Conventional Commits format for PR titles:**
- `feat(core): add multi-tenant support`
- `fix(cli): resolve flag parsing error`
- `docs: update API usage examples`
- `docs(openai): update API usage examples`
## Framework-Specific Guidelines
- Follow the existing patterns in `langchain-core` for base abstractions
- Use `langchain_core.callbacks` for execution tracking
- Implement proper streaming support where applicable
- Avoid deprecated components like legacy `LLMChain`
### Partner Integrations
- Follow the established patterns in existing partner libraries
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
- Include comprehensive integration tests
- Document API key requirements and authentication
---
## Quick Reference Checklist
Before submitting code changes:
- [ ] **Breaking Changes**: Verified no public API changes
- [ ] **Type Hints**: All functions have complete type annotations
- [ ] **Tests**: New functionality is fully tested
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
- [ ] **Documentation**: Google-style docstrings for public functions
- [ ] **Code Quality**: `make lint` and `make format` pass
- [ ] **Architecture**: Suggested improvements where applicable
- [ ] **Commit Message**: Follows Conventional Commits format

View File

@@ -1,11 +0,0 @@
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
# and
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"

View File

@@ -3,18 +3,18 @@ import json
import os
import sys
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Set
from pathlib import Path
import tomllib
from get_min_versions import get_min_version_from_toml
from packaging.requirements import Requirement
LANGCHAIN_DIRS = [
"libs/core",
"libs/text-splitters",
"libs/langchain",
"libs/langchain_v1",
"libs/community",
]
# when set to True, we are ignoring core dependents
@@ -30,13 +30,20 @@ IGNORED_PARTNERS = [
# specifically in huggingface jobs
# https://github.com/langchain-ai/langchain/issues/25558
"huggingface",
# prompty exhibiting issues with numpy for Python 3.13
# https://github.com/langchain-ai/langchain/actions/runs/12651104685/job/35251034969?pr=29065
"prompty",
]
# Cap python version at 3.12 for some packages with dependencies that are not yet
# compatible with python 3.13 (mostly hf tokenizers).
PY_312_MAX_PACKAGES = [
"libs/partners/chroma", # https://github.com/chroma-core/chroma/issues/4382
f"libs/partners/{integration}"
for integration in [
"chroma",
"couchbase",
"huggingface",
"mistralai",
"nomic",
"qdrant",
]
]
@@ -61,17 +68,15 @@ def dependents_graph() -> dict:
# load regular and test deps from pyproject.toml
with open(path, "rb") as f:
pyproject = tomllib.load(f)
pyproject = tomllib.load(f)["tool"]["poetry"]
pkg_dir = "libs" + "/".join(path.split("libs")[1].split("/")[:-1])
for dep in [
*pyproject["project"]["dependencies"],
*pyproject["dependency-groups"]["test"],
*pyproject["dependencies"].keys(),
*pyproject["group"]["test"]["dependencies"].keys(),
]:
requirement = Requirement(dep)
package_name = requirement.name
if "langchain" in dep:
dependents[package_name].add(pkg_dir)
dependents[dep].add(pkg_dir)
continue
# load extended deps from extended_testing_deps.txt
@@ -83,9 +88,9 @@ def dependents_graph() -> dict:
for depline in extended_deps:
if depline.startswith("-e "):
# editable dependency
assert depline.startswith("-e ../partners/"), (
"Extended test deps should only editable install partner packages"
)
assert depline.startswith(
"-e ../partners/"
), "Extended test deps should only editable install partner packages"
partner = depline.split("partners/")[1]
dep = f"langchain-{partner}"
else:
@@ -118,23 +123,25 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
if job == "test-pydantic":
return _get_pydantic_test_configs(dir_)
if job == "codspeed":
py_versions = ["3.12"] # 3.13 is not yet supported
elif dir_ == "libs/core":
if dir_ == "libs/core":
py_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]
# custom logic for specific directories
elif dir_ == "libs/partners/milvus":
# milvus doesn't allow 3.12 because they declare deps in funny way
# milvus poetry doesn't allow 3.12 because they
# declare deps in funny way
py_versions = ["3.9", "3.11"]
elif dir_ in PY_312_MAX_PACKAGES:
py_versions = ["3.9", "3.12"]
elif dir_ == "libs/langchain" and job == "extended-tests":
py_versions = ["3.9", "3.13"]
elif dir_ == "libs/langchain_v1":
py_versions = ["3.10", "3.13"]
elif dir_ in ["libs/community", "libs/langchain"] and job == "extended-tests":
# community extended test resolution in 3.12 is slow
# even in uv
py_versions = ["3.9", "3.11"]
elif dir_ == "libs/community" and job == "compile-integration-tests":
# community integration deps are slow in 3.12
py_versions = ["3.9", "3.11"]
elif dir_ == ".":
# unable to install with 3.13 because tokenizers doesn't support 3.13 yet
py_versions = ["3.9", "3.12"]
@@ -147,17 +154,17 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
def _get_pydantic_test_configs(
dir_: str, *, python_version: str = "3.11"
) -> List[Dict[str, str]]:
with open("./libs/core/uv.lock", "rb") as f:
core_uv_lock_data = tomllib.load(f)
for package in core_uv_lock_data["package"]:
with open("./libs/core/poetry.lock", "rb") as f:
core_poetry_lock_data = tomllib.load(f)
for package in core_poetry_lock_data["package"]:
if package["name"] == "pydantic":
core_max_pydantic_minor = package["version"].split(".")[1]
break
with open(f"./{dir_}/uv.lock", "rb") as f:
dir_uv_lock_data = tomllib.load(f)
with open(f"./{dir_}/poetry.lock", "rb") as f:
dir_poetry_lock_data = tomllib.load(f)
for package in dir_uv_lock_data["package"]:
for package in dir_poetry_lock_data["package"]:
if package["name"] == "pydantic":
dir_max_pydantic_minor = package["version"].split(".")[1]
break
@@ -179,6 +186,11 @@ def _get_pydantic_test_configs(
else "0"
)
custom_mins = {
# depends on pydantic-settings 2.4 which requires pydantic 2.7
"libs/community": 7,
}
max_pydantic_minor = min(
int(dir_max_pydantic_minor),
int(core_max_pydantic_minor),
@@ -186,6 +198,7 @@ def _get_pydantic_test_configs(
min_pydantic_minor = max(
int(dir_min_pydantic_minor),
int(core_min_pydantic_minor),
custom_mins.get(dir_, 0),
)
configs = [
@@ -213,8 +226,6 @@ def _get_configs_for_multi_dirs(
)
elif job == "extended-tests":
dirs = list(dirs_to_run["extended-test"])
elif job == "codspeed":
dirs = list(dirs_to_run["codspeed"])
else:
raise ValueError(f"Unknown job: {job}")
@@ -230,7 +241,6 @@ if __name__ == "__main__":
"lint": set(),
"test": set(),
"extended-test": set(),
"codspeed": set(),
}
docs_edited = False
@@ -254,8 +264,6 @@ if __name__ == "__main__":
dirs_to_run["extended-test"].update(LANGCHAIN_DIRS)
dirs_to_run["lint"].add(".")
if file.startswith("libs/core"):
dirs_to_run["codspeed"].add(f"libs/core")
if any(file.startswith(dir_) for dir_ in LANGCHAIN_DIRS):
# add that dir and all dirs after in LANGCHAIN_DIRS
# for extended testing
@@ -271,11 +279,8 @@ if __name__ == "__main__":
dirs_to_run["extended-test"].add(dir_)
elif file.startswith("libs/standard-tests"):
# TODO: update to include all packages that rely on standard-tests (all partner packages)
# Note: won't run on external repo partners
# note: won't run on external repo partners
dirs_to_run["lint"].add("libs/standard-tests")
dirs_to_run["test"].add("libs/standard-tests")
dirs_to_run["lint"].add("libs/cli")
dirs_to_run["test"].add("libs/cli")
dirs_to_run["test"].add("libs/partners/mistralai")
dirs_to_run["test"].add("libs/partners/openai")
dirs_to_run["test"].add("libs/partners/anthropic")
@@ -283,9 +288,8 @@ if __name__ == "__main__":
dirs_to_run["test"].add("libs/partners/groq")
elif file.startswith("libs/cli"):
dirs_to_run["lint"].add("libs/cli")
dirs_to_run["test"].add("libs/cli")
# todo: add cli makefile
pass
elif file.startswith("libs/partners"):
partner_dir = file.split("/")[2]
if os.path.isdir(f"libs/partners/{partner_dir}") and [
@@ -294,7 +298,6 @@ if __name__ == "__main__":
if not filename.startswith(".")
] != ["README.md"]:
dirs_to_run["test"].add(f"libs/partners/{partner_dir}")
dirs_to_run["codspeed"].add(f"libs/partners/{partner_dir}")
# Skip if the directory was deleted or is just a tombstone readme
elif file == "libs/packages.yml":
continue
@@ -303,11 +306,9 @@ if __name__ == "__main__":
f"Unknown lib: {file}. check_diff.py likely needs "
"an update for this new library!"
)
elif file.startswith("docs/") or file in [
"pyproject.toml",
"uv.lock",
]: # docs or root uv files
docs_edited = True
elif any(file.startswith(p) for p in ["docs/", "cookbook/"]):
if file.startswith("docs/"):
docs_edited = True
dirs_to_run["lint"].add(".")
dependents = dependents_graph()
@@ -323,7 +324,6 @@ if __name__ == "__main__":
"compile-integration-tests",
"dependencies",
"test-pydantic",
"codspeed",
]
}
map_job_to_configs["test-doc-imports"] = (

View File

@@ -1,5 +1,4 @@
import sys
import tomllib
if __name__ == "__main__":
@@ -11,25 +10,26 @@ if __name__ == "__main__":
toml_data = tomllib.load(file)
# see if we're releasing an rc
version = toml_data["project"]["version"]
version = toml_data["tool"]["poetry"]["version"]
releasing_rc = "rc" in version or "dev" in version
# if not, iterate through dependencies and make sure none allow prereleases
if not releasing_rc:
dependencies = toml_data["project"]["dependencies"]
for dep_version in dependencies:
dependencies = toml_data["tool"]["poetry"]["dependencies"]
for lib in dependencies:
dep_version = dependencies[lib]
dep_version_string = (
dep_version["version"] if isinstance(dep_version, dict) else dep_version
)
if "rc" in dep_version_string:
raise ValueError(
f"Dependency {dep_version} has a prerelease version. Please remove this."
f"Dependency {lib} has a prerelease version. Please remove this."
)
if isinstance(dep_version, dict) and dep_version.get(
"allow-prereleases", False
):
raise ValueError(
f"Dependency {dep_version} has allow-prereleases set to true. Please remove this."
f"Dependency {lib} has allow-prereleases set to true. Please remove this."
)

View File

@@ -1,5 +1,4 @@
import sys
from collections import defaultdict
from typing import Optional
if sys.version_info >= (3, 11):
@@ -8,16 +7,20 @@ else:
# for python 3.10 and below, which doesnt have stdlib tomllib
import tomli as tomllib
import re
from typing import List
from packaging.specifiers import SpecifierSet
from packaging.version import Version
import requests
from packaging.requirements import Requirement
from packaging.specifiers import SpecifierSet
from packaging.version import Version, parse
from packaging.version import parse
from typing import List
import re
MIN_VERSION_LIBS = [
"langchain-core",
"langchain-community",
"langchain",
"langchain-text-splitters",
"numpy",
@@ -30,6 +33,7 @@ SKIP_IF_PULL_REQUEST = [
"langchain-core",
"langchain-text-splitters",
"langchain",
"langchain-community",
]
@@ -68,13 +72,11 @@ def get_minimum_version(package_name: str, spec_string: str) -> Optional[str]:
spec_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", spec_string)
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1 (can be anywhere in constraint string)
for y in range(1, 10):
spec_string = re.sub(
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}", spec_string
)
spec_string = re.sub(rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y+1}", spec_string)
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
for x in range(1, 10):
spec_string = re.sub(
rf"\^{x}\.(\d+)\.(\d+)", rf">={x}.\1.\2,<{x + 1}", spec_string
rf"\^{x}\.(\d+)\.(\d+)", rf">={x}.\1.\2,<{x+1}", spec_string
)
spec_set = SpecifierSet(spec_string)
@@ -92,23 +94,6 @@ def get_minimum_version(package_name: str, spec_string: str) -> Optional[str]:
return str(min(valid_versions)) if valid_versions else None
def _check_python_version_from_requirement(
requirement: Requirement, python_version: str
) -> bool:
if not requirement.marker:
return True
else:
marker_str = str(requirement.marker)
if "python_version" or "python_full_version" in marker_str:
python_version_str = "".join(
char
for char in marker_str
if char.isdigit() or char in (".", "<", ">", "=", ",")
)
return check_python_version(python_version, python_version_str)
return True
def get_min_version_from_toml(
toml_path: str,
versions_for: str,
@@ -120,10 +105,8 @@ def get_min_version_from_toml(
with open(toml_path, "rb") as file:
toml_data = tomllib.load(file)
dependencies = defaultdict(list)
for dep in toml_data["project"]["dependencies"]:
requirement = Requirement(dep)
dependencies[requirement.name].append(requirement)
# Get the dependencies from tool.poetry.dependencies
dependencies = toml_data["tool"]["poetry"]["dependencies"]
# Initialize a dictionary to store the minimum versions
min_versions = {}
@@ -138,11 +121,17 @@ def get_min_version_from_toml(
if lib in dependencies:
if include and lib not in include:
continue
requirements = dependencies[lib]
for requirement in requirements:
if _check_python_version_from_requirement(requirement, python_version):
version_string = str(requirement.specifier)
break
# Get the version string
version_string = dependencies[lib]
if isinstance(version_string, dict):
version_string = version_string["version"]
if isinstance(version_string, list):
version_string = [
vs
for vs in version_string
if check_python_version(python_version, vs["python"])
][0]["version"]
# Use parse_version to get the minimum supported version from version_string
min_version = get_minimum_version(lib, version_string)
@@ -167,12 +156,12 @@ def check_python_version(version_string, constraint_string):
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1.0 (can be anywhere in constraint string)
for y in range(1, 10):
constraint_string = re.sub(
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}.0", constraint_string
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y+1}.0", constraint_string
)
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
for x in range(1, 10):
constraint_string = re.sub(
rf"\^{x}\.0\.(\d+)", rf">={x}.0.\1,<{x + 1}.0.0", constraint_string
rf"\^{x}\.0\.(\d+)", rf">={x}.0.\1,<{x+1}.0.0", constraint_string
)
try:

View File

@@ -3,10 +3,9 @@
import os
import shutil
from pathlib import Path
from typing import Any, Dict
import yaml
from pathlib import Path
from typing import Dict, Any
def load_packages_yaml() -> Dict[str, Any]:
@@ -21,14 +20,13 @@ def get_target_dir(package_name: str) -> Path:
base_path = Path("langchain/libs")
if package_name_short == "experimental":
return base_path / "experimental"
if package_name_short == "community":
return base_path / "community"
return base_path / "partners" / package_name_short
def clean_target_directories(packages: list) -> None:
"""Remove old directories that will be replaced."""
for package in packages:
target_dir = get_target_dir(package["name"])
if target_dir.exists():
print(f"Removing {target_dir}")
@@ -38,6 +36,7 @@ def clean_target_directories(packages: list) -> None:
def move_libraries(packages: list) -> None:
"""Move libraries from their source locations to the target directories."""
for package in packages:
repo_name = package["repo"].split("/")[1]
source_path = package["path"]
target_dir = get_target_dir(package["name"])
@@ -65,41 +64,19 @@ def main():
try:
# Load packages configuration
package_yaml = load_packages_yaml()
packages = [
p
for p in package_yaml["packages"]
if not p.get("disabled", False)
and p["repo"].startswith("langchain-ai/")
and p["repo"] != "langchain-ai/langchain"
]
# Clean target directories
clean_target_directories(
[
p
for p in package_yaml["packages"]
if (
p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")
)
and p["repo"] != "langchain-ai/langchain"
and p["name"]
!= "langchain-ai21" # Skip AI21 due to dependency conflicts
]
)
clean_target_directories(packages)
# Move libraries to their new locations
move_libraries(
[
p
for p in package_yaml["packages"]
if not p.get("disabled", False)
and (
p["repo"].startswith("langchain-ai/") or p.get("include_in_api_ref")
)
and p["repo"] != "langchain-ai/langchain"
and p["name"]
!= "langchain-ai21" # Skip AI21 due to dependency conflicts
]
)
# Delete ones without a pyproject.toml
for partner in Path("langchain/libs/partners").iterdir():
if partner.is_dir() and not (partner / "pyproject.toml").exists():
print(f"Removing {partner} as it does not have a pyproject.toml")
shutil.rmtree(partner)
move_libraries(packages)
print("Library sync completed successfully!")

View File

@@ -81,93 +81,56 @@ import time
__version__ = "2022.12+dev"
# Update symlinks only if the platform supports not following them
UPDATE_SYMLINKS = bool(os.utime in getattr(os, "supports_follow_symlinks", []))
UPDATE_SYMLINKS = bool(os.utime in getattr(os, 'supports_follow_symlinks', []))
# Call os.path.normpath() only if not in a POSIX platform (Windows)
NORMALIZE_PATHS = os.path.sep != "/"
NORMALIZE_PATHS = (os.path.sep != '/')
# How many files to process in each batch when re-trying merge commits
STEPMISSING = 100
# (Extra) keywords for the os.utime() call performed by touch()
UTIME_KWS = {} if not UPDATE_SYMLINKS else {"follow_symlinks": False}
UTIME_KWS = {} if not UPDATE_SYMLINKS else {'follow_symlinks': False}
# Command-line interface ######################################################
def parse_args():
parser = argparse.ArgumentParser(description=__doc__.split("\n---")[0])
parser = argparse.ArgumentParser(
description=__doc__.split('\n---')[0])
group = parser.add_mutually_exclusive_group()
group.add_argument(
"--quiet",
"-q",
dest="loglevel",
action="store_const",
const=logging.WARNING,
default=logging.INFO,
help="Suppress informative messages and summary statistics.",
)
group.add_argument(
"--verbose",
"-v",
action="count",
help="""
group.add_argument('--quiet', '-q', dest='loglevel',
action="store_const", const=logging.WARNING, default=logging.INFO,
help="Suppress informative messages and summary statistics.")
group.add_argument('--verbose', '-v', action="count", help="""
Print additional information for each processed file.
Specify twice to further increase verbosity.
""",
)
""")
parser.add_argument(
"--cwd",
"-C",
metavar="DIRECTORY",
help="""
parser.add_argument('--cwd', '-C', metavar="DIRECTORY", help="""
Run as if %(prog)s was started in directory %(metavar)s.
This affects how --work-tree, --git-dir and PATHSPEC arguments are handled.
See 'man 1 git' or 'git --help' for more information.
""",
)
""")
parser.add_argument(
"--git-dir",
dest="gitdir",
metavar="GITDIR",
help="""
parser.add_argument('--git-dir', dest='gitdir', metavar="GITDIR", help="""
Path to the git repository, by default auto-discovered by searching
the current directory and its parents for a .git/ subdirectory.
""",
)
""")
parser.add_argument(
"--work-tree",
dest="workdir",
metavar="WORKTREE",
help="""
parser.add_argument('--work-tree', dest='workdir', metavar="WORKTREE", help="""
Path to the work tree root, by default the parent of GITDIR if it's
automatically discovered, or the current directory if GITDIR is set.
""",
)
""")
parser.add_argument(
"--force",
"-f",
default=False,
action="store_true",
help="""
parser.add_argument('--force', '-f', default=False, action="store_true", help="""
Force updating files with uncommitted modifications.
Untracked files and uncommitted deletions, renames and additions are
always ignored.
""",
)
""")
parser.add_argument(
"--merge",
"-m",
default=False,
action="store_true",
help="""
parser.add_argument('--merge', '-m', default=False, action="store_true", help="""
Include merge commits.
Leads to more recent times and more files per commit, thus with the same
time, which may or may not be what you want.
@@ -175,130 +138,71 @@ def parse_args():
are found sooner, which can improve performance, sometimes substantially.
But as merge commits are usually huge, processing them may also take longer.
By default, merge commits are only used for files missing from regular commits.
""",
)
""")
parser.add_argument(
"--first-parent",
default=False,
action="store_true",
help="""
parser.add_argument('--first-parent', default=False, action="store_true", help="""
Consider only the first parent, the "main branch", when evaluating merge commits.
Only effective when merge commits are processed, either when --merge is
used or when finding missing files after the first regular log search.
See --skip-missing.
""",
)
""")
parser.add_argument(
"--skip-missing",
"-s",
dest="missing",
default=True,
action="store_false",
help="""
parser.add_argument('--skip-missing', '-s', dest="missing", default=True,
action="store_false", help="""
Do not try to find missing files.
If merge commits were not evaluated with --merge and some files were
not found in regular commits, by default %(prog)s searches for these
files again in the merge commits.
This option disables this retry, so files found only in merge commits
will not have their timestamp updated.
""",
)
""")
parser.add_argument(
"--no-directories",
"-D",
dest="dirs",
default=True,
action="store_false",
help="""
parser.add_argument('--no-directories', '-D', dest='dirs', default=True,
action="store_false", help="""
Do not update directory timestamps.
By default, use the time of its most recently created, renamed or deleted file.
Note that just modifying a file will NOT update its directory time.
""",
)
""")
parser.add_argument(
"--test",
"-t",
default=False,
action="store_true",
help="Test run: do not actually update any file timestamp.",
)
parser.add_argument('--test', '-t', default=False, action="store_true",
help="Test run: do not actually update any file timestamp.")
parser.add_argument(
"--commit-time",
"-c",
dest="commit_time",
default=False,
action="store_true",
help="Use commit time instead of author time.",
)
parser.add_argument('--commit-time', '-c', dest='commit_time', default=False,
action='store_true', help="Use commit time instead of author time.")
parser.add_argument(
"--oldest-time",
"-o",
dest="reverse_order",
default=False,
action="store_true",
help="""
parser.add_argument('--oldest-time', '-o', dest='reverse_order', default=False,
action='store_true', help="""
Update times based on the oldest, instead of the most recent commit of a file.
This reverses the order in which the git log is processed to emulate a
file "creation" date. Note this will be inaccurate for files deleted and
re-created at later dates.
""",
)
""")
parser.add_argument(
"--skip-older-than",
metavar="SECONDS",
type=int,
help="""
parser.add_argument('--skip-older-than', metavar='SECONDS', type=int, help="""
Ignore files that are currently older than %(metavar)s.
Useful in workflows that assume such files already have a correct timestamp,
as it may improve performance by processing fewer files.
""",
)
""")
parser.add_argument(
"--skip-older-than-commit",
"-N",
default=False,
action="store_true",
help="""
parser.add_argument('--skip-older-than-commit', '-N', default=False,
action='store_true', help="""
Ignore files older than the timestamp it would be updated to.
Such files may be considered "original", likely in the author's repository.
""",
)
""")
parser.add_argument(
"--unique-times",
default=False,
action="store_true",
help="""
parser.add_argument('--unique-times', default=False, action="store_true", help="""
Set the microseconds to a unique value per commit.
Allows telling apart changes that would otherwise have identical timestamps,
as git's time accuracy is in seconds.
""",
)
""")
parser.add_argument(
"pathspec",
nargs="*",
metavar="PATHSPEC",
help="""
parser.add_argument('pathspec', nargs='*', metavar='PATHSPEC', help="""
Only modify paths matching %(metavar)s, relative to current directory.
By default, update all but untracked files and submodules.
""",
)
""")
parser.add_argument(
"--version",
"-V",
action="version",
version="%(prog)s version {version}".format(version=get_version()),
)
parser.add_argument('--version', '-V', action='version',
version='%(prog)s version {version}'.format(version=get_version()))
args_ = parser.parse_args()
if args_.verbose:
@@ -308,18 +212,17 @@ def parse_args():
def get_version(version=__version__):
if not version.endswith("+dev"):
if not version.endswith('+dev'):
return version
try:
cwd = os.path.dirname(os.path.realpath(__file__))
return Git(cwd=cwd, errors=False).describe().lstrip("v")
return Git(cwd=cwd, errors=False).describe().lstrip('v')
except Git.Error:
return "-".join((version, "unknown"))
return '-'.join((version, "unknown"))
# Helper functions ############################################################
def setup_logging():
"""Add TRACE logging level and corresponding method, return the root logger"""
logging.TRACE = TRACE = logging.DEBUG // 2
@@ -352,13 +255,11 @@ def normalize(path):
if path and path[0] == '"':
# Python 2: path = path[1:-1].decode("string-escape")
# Python 3: https://stackoverflow.com/a/46650050/624066
path = (
path[1:-1] # Remove enclosing double quotes
.encode("latin1") # Convert to bytes, required by 'unicode-escape'
.decode("unicode-escape") # Perform the actual octal-escaping decode
.encode("latin1") # 1:1 mapping to bytes, UTF-8 encoded
.decode("utf8", "surrogateescape")
) # Decode from UTF-8
path = (path[1:-1] # Remove enclosing double quotes
.encode('latin1') # Convert to bytes, required by 'unicode-escape'
.decode('unicode-escape') # Perform the actual octal-escaping decode
.encode('latin1') # 1:1 mapping to bytes, UTF-8 encoded
.decode('utf8', 'surrogateescape')) # Decode from UTF-8
if NORMALIZE_PATHS:
# Make sure the slash matches the OS; for Windows we need a backslash
path = os.path.normpath(path)
@@ -381,12 +282,12 @@ def touch_ns(path, mtime_ns):
def isodate(secs: int):
# time.localtime() accepts floats, but discards fractional part
return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(secs))
return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(secs))
def isodate_ns(ns: int):
# for integers fromtimestamp() is equivalent and ~16% slower than isodate()
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=" ")
return datetime.datetime.fromtimestamp(ns / 1000000000).isoformat(sep=' ')
def get_mtime_ns(secs: int, idx: int):
@@ -404,49 +305,35 @@ def get_mtime_path(path):
# Git class and parse_log(), the heart of the script ##########################
class Git:
def __init__(self, workdir=None, gitdir=None, cwd=None, errors=True):
self.gitcmd = ["git"]
self.gitcmd = ['git']
self.errors = errors
self._proc = None
if workdir:
self.gitcmd.extend(("--work-tree", workdir))
if gitdir:
self.gitcmd.extend(("--git-dir", gitdir))
if cwd:
self.gitcmd.extend(("-C", cwd))
if workdir: self.gitcmd.extend(('--work-tree', workdir))
if gitdir: self.gitcmd.extend(('--git-dir', gitdir))
if cwd: self.gitcmd.extend(('-C', cwd))
self.workdir, self.gitdir = self._get_repo_dirs()
def ls_files(self, paths: list = None):
return (normalize(_) for _ in self._run("ls-files --full-name", paths))
return (normalize(_) for _ in self._run('ls-files --full-name', paths))
def ls_dirty(self, force=False):
return (
normalize(_[3:].split(" -> ", 1)[-1])
for _ in self._run("status --porcelain")
if _[:2] != "??" and (not force or (_[0] in ("R", "A") or _[1] == "D"))
)
return (normalize(_[3:].split(' -> ', 1)[-1])
for _ in self._run('status --porcelain')
if _[:2] != '??' and (not force or (_[0] in ('R', 'A')
or _[1] == 'D')))
def log(
self,
merge=False,
first_parent=False,
commit_time=False,
reverse_order=False,
paths: list = None,
):
cmd = "whatchanged --pretty={}".format("%ct" if commit_time else "%at")
if merge:
cmd += " -m"
if first_parent:
cmd += " --first-parent"
if reverse_order:
cmd += " --reverse"
def log(self, merge=False, first_parent=False, commit_time=False,
reverse_order=False, paths: list = None):
cmd = 'whatchanged --pretty={}'.format('%ct' if commit_time else '%at')
if merge: cmd += ' -m'
if first_parent: cmd += ' --first-parent'
if reverse_order: cmd += ' --reverse'
return self._run(cmd, paths)
def describe(self):
return self._run("describe --tags", check=True)[0]
return self._run('describe --tags', check=True)[0]
def terminate(self):
if self._proc is None:
@@ -458,22 +345,18 @@ class Git:
pass
def _get_repo_dirs(self):
return (
os.path.normpath(_)
for _ in self._run(
"rev-parse --show-toplevel --absolute-git-dir", check=True
)
)
return (os.path.normpath(_) for _ in
self._run('rev-parse --show-toplevel --absolute-git-dir', check=True))
def _run(self, cmdstr: str, paths: list = None, output=True, check=False):
cmdlist = self.gitcmd + shlex.split(cmdstr)
if paths:
cmdlist.append("--")
cmdlist.append('--')
cmdlist.extend(paths)
popen_args = dict(universal_newlines=True, encoding="utf8")
popen_args = dict(universal_newlines=True, encoding='utf8')
if not self.errors:
popen_args["stderr"] = subprocess.DEVNULL
log.trace("Executing: %s", " ".join(cmdlist))
popen_args['stderr'] = subprocess.DEVNULL
log.trace("Executing: %s", ' '.join(cmdlist))
if not output:
return subprocess.call(cmdlist, **popen_args)
if check:
@@ -496,26 +379,30 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
mtime = 0
datestr = isodate(0)
for line in git.log(
merge, args.first_parent, args.commit_time, args.reverse_order, filterlist
merge,
args.first_parent,
args.commit_time,
args.reverse_order,
filterlist
):
stats["loglines"] += 1
stats['loglines'] += 1
# Blank line between Date and list of files
if not line:
continue
# Date line
if line[0] != ":": # Faster than `not line.startswith(':')`
stats["commits"] += 1
if line[0] != ':': # Faster than `not line.startswith(':')`
stats['commits'] += 1
mtime = int(line)
if args.unique_times:
mtime = get_mtime_ns(mtime, stats["commits"])
mtime = get_mtime_ns(mtime, stats['commits'])
if args.debug:
datestr = isodate(mtime)
continue
# File line: three tokens if it describes a renaming, otherwise two
tokens = line.split("\t")
tokens = line.split('\t')
# Possible statuses:
# M: Modified (content changed)
@@ -524,7 +411,7 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
# T: Type changed: to/from regular file, symlinks, submodules
# R099: Renamed (moved), with % of unchanged content. 100 = pure rename
# Not possible in log: C=Copied, U=Unmerged, X=Unknown, B=pairing Broken
status = tokens[0].split(" ")[-1]
status = tokens[0].split(' ')[-1]
file = tokens[-1]
# Handles non-ASCII chars and OS path separator
@@ -532,76 +419,56 @@ def parse_log(filelist, dirlist, stats, git, merge=False, filterlist=None):
def do_file():
if args.skip_older_than_commit and get_mtime_path(file) <= mtime:
stats["skip"] += 1
stats['skip'] += 1
return
if args.debug:
log.debug(
"%d\t%d\t%d\t%s\t%s",
stats["loglines"],
stats["commits"],
stats["files"],
datestr,
file,
)
log.debug("%d\t%d\t%d\t%s\t%s",
stats['loglines'], stats['commits'], stats['files'],
datestr, file)
try:
touch(os.path.join(git.workdir, file), mtime)
stats["touches"] += 1
stats['touches'] += 1
except Exception as e:
log.error("ERROR: %s: %s", e, file)
stats["errors"] += 1
stats['errors'] += 1
def do_dir():
if args.debug:
log.debug(
"%d\t%d\t-\t%s\t%s",
stats["loglines"],
stats["commits"],
datestr,
"{}/".format(dirname or "."),
)
log.debug("%d\t%d\t-\t%s\t%s",
stats['loglines'], stats['commits'],
datestr, "{}/".format(dirname or '.'))
try:
touch(os.path.join(git.workdir, dirname), mtime)
stats["dirtouches"] += 1
stats['dirtouches'] += 1
except Exception as e:
log.error("ERROR: %s: %s", e, dirname)
stats["direrrors"] += 1
stats['direrrors'] += 1
if file in filelist:
stats["files"] -= 1
stats['files'] -= 1
filelist.remove(file)
do_file()
if args.dirs and status in ("A", "D"):
if args.dirs and status in ('A', 'D'):
dirname = os.path.dirname(file)
if dirname in dirlist:
dirlist.remove(dirname)
do_dir()
# All files done?
if not stats["files"]:
if not stats['files']:
git.terminate()
return
# Main Logic ##################################################################
def main():
start = time.time() # yes, Wall time. CPU time is not realistic for users.
stats = {
_: 0
for _ in (
"loglines",
"commits",
"touches",
"skip",
"errors",
"dirtouches",
"direrrors",
)
}
stats = {_: 0 for _ in ('loglines', 'commits', 'touches', 'skip', 'errors',
'dirtouches', 'direrrors')}
logging.basicConfig(level=args.loglevel, format="%(message)s")
logging.basicConfig(level=args.loglevel, format='%(message)s')
log.trace("Arguments: %s", args)
# First things first: Where and Who are we?
@@ -632,16 +499,13 @@ def main():
# Symlink (to file, to dir or broken - git handles the same way)
if not UPDATE_SYMLINKS and os.path.islink(fullpath):
log.warning(
"WARNING: Skipping symlink, no OS support for updates: %s", path
)
log.warning("WARNING: Skipping symlink, no OS support for updates: %s",
path)
continue
# skip files which are older than given threshold
if (
args.skip_older_than
and start - get_mtime_path(fullpath) > args.skip_older_than
):
if (args.skip_older_than
and start - get_mtime_path(fullpath) > args.skip_older_than):
continue
# Always add files relative to worktree root
@@ -655,17 +519,15 @@ def main():
else:
dirty = set(git.ls_dirty())
if dirty:
log.warning(
"WARNING: Modified files in the working directory were ignored."
"\nTo include such files, commit your changes or use --force."
)
log.warning("WARNING: Modified files in the working directory were ignored."
"\nTo include such files, commit your changes or use --force.")
filelist -= dirty
# Build dir list to be processed
dirlist = set(os.path.dirname(_) for _ in filelist) if args.dirs else set()
stats["totalfiles"] = stats["files"] = len(filelist)
log.info("{0:,} files to be processed in work dir".format(stats["totalfiles"]))
stats['totalfiles'] = stats['files'] = len(filelist)
log.info("{0:,} files to be processed in work dir".format(stats['totalfiles']))
if not filelist:
# Nothing to do. Exit silently and without errors, just like git does
@@ -682,18 +544,10 @@ def main():
if args.missing and not args.merge:
filterlist = list(filelist)
missing = len(filterlist)
log.info(
"{0:,} files not found in log, trying merge commits".format(missing)
)
log.info("{0:,} files not found in log, trying merge commits".format(missing))
for i in range(0, missing, STEPMISSING):
parse_log(
filelist,
dirlist,
stats,
git,
merge=True,
filterlist=filterlist[i : i + STEPMISSING],
)
parse_log(filelist, dirlist, stats, git,
merge=True, filterlist=filterlist[i:i + STEPMISSING])
# Still missing some?
for file in filelist:
@@ -702,33 +556,29 @@ def main():
# Final statistics
# Suggestion: use git-log --before=mtime to brag about skipped log entries
def log_info(msg, *a, width=13):
ifmt = "{:%d,}" % (width,) # not using 'n' for consistency with ffmt
ffmt = "{:%d,.2f}" % (width,)
ifmt = '{:%d,}' % (width,) # not using 'n' for consistency with ffmt
ffmt = '{:%d,.2f}' % (width,)
# %-formatting lacks a thousand separator, must pre-render with .format()
log.info(msg.replace("%d", ifmt).replace("%f", ffmt).format(*a))
log.info(msg.replace('%d', ifmt).replace('%f', ffmt).format(*a))
log_info(
"Statistics:\n%f seconds\n%d log lines processed\n%d commits evaluated",
time.time() - start,
stats["loglines"],
stats["commits"],
)
"Statistics:\n"
"%f seconds\n"
"%d log lines processed\n"
"%d commits evaluated",
time.time() - start, stats['loglines'], stats['commits'])
if args.dirs:
if stats["direrrors"]:
log_info("%d directory update errors", stats["direrrors"])
log_info("%d directories updated", stats["dirtouches"])
if stats['direrrors']: log_info("%d directory update errors", stats['direrrors'])
log_info("%d directories updated", stats['dirtouches'])
if stats["touches"] != stats["totalfiles"]:
log_info("%d files", stats["totalfiles"])
if stats["skip"]:
log_info("%d files skipped", stats["skip"])
if stats["files"]:
log_info("%d files missing", stats["files"])
if stats["errors"]:
log_info("%d file update errors", stats["errors"])
if stats['touches'] != stats['totalfiles']:
log_info("%d files", stats['totalfiles'])
if stats['skip']: log_info("%d files skipped", stats['skip'])
if stats['files']: log_info("%d files missing", stats['files'])
if stats['errors']: log_info("%d file update errors", stats['errors'])
log_info("%d files updated", stats["touches"])
log_info("%d files updated", stats['touches'])
if args.test:
log.info("TEST RUN - No files modified!")

7
.github/workflows/.codespell-exclude vendored Normal file
View File

@@ -0,0 +1,7 @@
libs/community/langchain_community/llms/yuan2.py
"NotIn": "not in",
- `/checkin`: Check-in
docs/docs/integrations/providers/trulens.mdx
self.assertIn(
from trulens_eval import Tru
tru = Tru()

View File

@@ -1,4 +1,4 @@
name: '🔗 Compile Integration Tests'
name: compile-integration-test
on:
workflow_call:
@@ -12,11 +12,8 @@ on:
type: string
description: "Python version to use"
permissions:
contents: read
env:
UV_FROZEN: "true"
POETRY_VERSION: "1.7.1"
jobs:
build:
@@ -24,25 +21,27 @@ jobs:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
timeout-minutes: 20
name: 'Python ${{ inputs.python-version }}'
name: "poetry run pytest -m compile tests/integration_tests #${{ inputs.python-version }}"
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ inputs.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: compile-integration
- name: '📦 Install Integration Dependencies'
- name: Install integration dependencies
shell: bash
run: uv sync --group test --group test_integration
run: poetry install --with=test_integration,test
- name: '🔗 Check Integration Tests Compile'
- name: Check integration tests compile
shell: bash
run: uv run pytest -m compile tests/integration_tests
run: poetry run pytest -m compile tests/integration_tests
- name: '🧹 Verify Clean Working Directory'
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu

View File

@@ -1,5 +1,4 @@
name: '🚀 Integration Tests'
run-name: 'Test ${{ inputs.working-directory }} on Python ${{ inputs.python-version }}'
name: Integration tests
on:
workflow_dispatch:
@@ -7,18 +6,13 @@ on:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
python-version:
required: true
type: string
description: "Python version to use"
default: "3.11"
permissions:
contents: read
env:
UV_FROZEN: "true"
POETRY_VERSION: "1.7.1"
jobs:
build:
@@ -26,28 +20,34 @@ jobs:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
name: 'Python ${{ inputs.python-version }}'
name: Python ${{ inputs.python-version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ inputs.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: core
- name: '📦 Install Integration Dependencies'
- name: Install dependencies
shell: bash
run: uv sync --group test --group test_integration
run: poetry install --with test,test_integration
- name: '🚀 Run Integration Tests'
- name: Install deps outside pyproject
if: ${{ startsWith(inputs.working-directory, 'libs/community/') }}
shell: bash
run: poetry run pip install "boto3<2" "google-cloud-aiplatform<2"
- name: Run integration tests
shell: bash
env:
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
@@ -67,6 +67,8 @@ jobs:
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
@@ -74,10 +76,10 @@ jobs:
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
ES_API_KEY: ${{ secrets.ES_API_KEY }}
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
run: |
make integration_tests

View File

@@ -1,6 +1,4 @@
name: '🧹 Code Linting'
# Runs code quality checks using ruff, mypy, and other linting tools
# Checks both package code and test code for consistency
name: lint
on:
workflow_call:
@@ -14,33 +12,41 @@ on:
type: string
description: "Python version to use"
permissions:
contents: read
env:
POETRY_VERSION: "1.7.1"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
# This env var allows us to get inline annotations when ruff has complaints.
RUFF_OUTPUT_FORMAT: github
UV_FROZEN: "true"
jobs:
# Linting job - runs quality checks on package and test code
build:
name: 'Python ${{ inputs.python-version }}'
name: "make lint #${{ inputs.python-version }}"
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- name: '📋 Checkout Code'
uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ inputs.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: lint-with-extras
- name: '📦 Install Lint & Typing Dependencies'
- name: Check Poetry File
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
poetry check
- name: Check lock file
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
poetry lock --check
- name: Install dependencies
# Also installs dev/lint/test/typing dependencies, to ensure we have
# type hints for as many of our libraries as possible.
# This helps catch errors that require dependencies to be spotted, for example:
@@ -51,14 +57,24 @@ jobs:
# It doesn't matter how you change it, any change will cause a cache-bust.
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --group lint --group typing
poetry install --with lint,typing
- name: '🔍 Analyze Package Code with Linters'
- name: Get .mypy_cache to speed up mypy
uses: actions/cache@v4
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
with:
path: |
${{ env.WORKDIR }}/.mypy_cache
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ inputs.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
run: |
make lint_package
- name: '📦 Install Unit Test Dependencies'
- name: Install unit test dependencies
# Also installs dev/lint/test/typing dependencies, to ensure we have
# type hints for as many of our libraries as possible.
# This helps catch errors that require dependencies to be spotted, for example:
@@ -70,14 +86,23 @@ jobs:
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --inexact --group test
- name: '📦 Install Unit + Integration Test Dependencies'
poetry install --with test
- name: Install unit+integration test dependencies
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --inexact --group test --group test_integration
poetry install --with test,test_integration
- name: '🔍 Analyze Test Code with Linters'
- name: Get .mypy_cache_test to speed up mypy
uses: actions/cache@v4
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
with:
path: |
${{ env.WORKDIR }}/.mypy_cache_test
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ inputs.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}
run: |
make lint_tests

View File

@@ -1,5 +1,5 @@
name: '🚀 Package Release'
run-name: 'Release ${{ inputs.working-directory }} ${{ inputs.release-version }}'
name: release
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
on:
workflow_call:
inputs:
@@ -12,27 +12,18 @@ on:
working-directory:
required: true
type: string
description: "From which folder this pipeline executes"
default: 'libs/langchain'
release-version:
required: true
type: string
default: '0.1.0'
description: "New version of package being released"
dangerous-nonmaster-release:
required: false
type: boolean
default: false
description: "Release from a non-master branch (danger!) - Only use for hotfixes"
description: "Release from a non-master branch (danger!)"
env:
PYTHON_VERSION: "3.11"
UV_FROZEN: "true"
UV_NO_SYNC: "true"
POETRY_VERSION: "1.7.1"
jobs:
# Build the distribution package and extract version info
# Runs in isolated environment with minimal permissions for security
build:
if: github.ref == 'refs/heads/master' || inputs.dangerous-nonmaster-release
environment: Scheduled testing
@@ -43,12 +34,15 @@ jobs:
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
@@ -62,7 +56,7 @@ jobs:
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: Build project for distribution
run: uv build
run: poetry build
working-directory: ${{ inputs.working-directory }}
- name: Upload build
@@ -71,20 +65,13 @@ jobs:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Check version
- name: Check Version
id: check-version
shell: python
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
import os
import tomllib
with open("pyproject.toml", "rb") as f:
data = tomllib.load(f)
pkg_name = data["project"]["name"]
version = data["project"]["version"]
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
f.write(f"pkg-name={pkg_name}\n")
f.write(f"version={version}\n")
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
release-notes:
needs:
- build
@@ -92,7 +79,7 @@ jobs:
outputs:
release-body: ${{ steps.generate-release-body.outputs.release-body }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: langchain-ai/langchain
path: langchain
@@ -100,7 +87,7 @@ jobs:
${{ inputs.working-directory }}
ref: ${{ github.ref }} # this scopes to just ref'd branch
fetch-depth: 0 # this fetches entire commit history
- name: Check tags
- name: Check Tags
id: check-tags
shell: bash
working-directory: langchain/${{ inputs.working-directory }}
@@ -108,32 +95,15 @@ jobs:
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
VERSION: ${{ needs.build.outputs.version }}
run: |
# Handle regular versions and pre-release versions differently
if [[ "$VERSION" == *"-"* ]]; then
# This is a pre-release version (contains a hyphen)
# Extract the base version without the pre-release suffix
BASE_VERSION=${VERSION%%-*}
# Look for the latest release of the same base version
REGEX="^$PKG_NAME==$BASE_VERSION\$"
PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1)
PREV_TAG="$PKG_NAME==${VERSION%.*}.$(( ${VERSION##*.} - 1 ))"; [[ "${VERSION##*.}" -eq 0 ]] && PREV_TAG=""
# If no exact base version match, look for the latest release of any kind
if [ -z "$PREV_TAG" ]; then
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
PREV_TAG=$(git tag --sort=-creatordate | (grep -P "$REGEX" || true) | head -1)
fi
else
# Regular version handling
PREV_TAG="$PKG_NAME==${VERSION%.*}.$(( ${VERSION##*.} - 1 ))"; [[ "${VERSION##*.}" -eq 0 ]] && PREV_TAG=""
# backup case if releasing e.g. 0.3.0, looks up last release
# note if last release (chronologically) was e.g. 0.1.47 it will get
# that instead of the last 0.2 release
if [ -z "$PREV_TAG" ]; then
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
echo $REGEX
PREV_TAG=$(git tag --sort=-creatordate | (grep -P $REGEX || true) | head -1)
fi
# backup case if releasing e.g. 0.3.0, looks up last release
# note if last release (chronologically) was e.g. 0.1.47 it will get
# that instead of the last 0.2 release
if [ -z "$PREV_TAG" ]; then
REGEX="^$PKG_NAME==\\d+\\.\\d+\\.\\d+\$"
echo $REGEX
PREV_TAG=$(git tag --sort=-creatordate | (grep -P $REGEX || true) | head -1)
fi
# if PREV_TAG is empty, let it be empty
@@ -197,9 +167,8 @@ jobs:
- release-notes
- test-pypi-publish
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
# We explicitly *don't* set up caching here. This ensures our tests are
# maximally sensitive to catching breakage.
@@ -214,18 +183,15 @@ jobs:
# - The package is published, and it breaks on the missing dependency when
# used in the real world.
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
id: setup-python
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
- uses: actions/download-artifact@v5
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Import dist package
- name: Import published package
shell: bash
working-directory: ${{ inputs.working-directory }}
env:
@@ -241,21 +207,27 @@ jobs:
# - attempt install again after 5 seconds if it fails because there is
# sometimes a delay in availability on test pypi
run: |
uv venv
VIRTUAL_ENV=.venv uv pip install dist/*.whl
poetry run pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION" || \
( \
sleep 15 && \
poetry run pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION" \
)
# Replace all dashes in the package name with underscores,
# since that's how Python imports packages with dashes in the name.
# also remove _official suffix
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g | sed s/_official//g)"
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
uv run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
poetry run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
- name: Import test dependencies
run: uv sync --group test
run: poetry install --with test
working-directory: ${{ inputs.working-directory }}
# Overwrite the local version of the package with the built version
# Overwrite the local version of the package with the test PyPI version.
- name: Import published package (again)
working-directory: ${{ inputs.working-directory }}
shell: bash
@@ -263,7 +235,9 @@ jobs:
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
VERSION: ${{ needs.build.outputs.version }}
run: |
VIRTUAL_ENV=.venv uv pip install dist/*.whl
poetry run pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION"
- name: Run unit tests
run: make tests
@@ -272,15 +246,15 @@ jobs:
- name: Check for prerelease versions
working-directory: ${{ inputs.working-directory }}
run: |
uv run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml
poetry run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml
- name: Get minimum versions
working-directory: ${{ inputs.working-directory }}
id: min-version
run: |
VIRTUAL_ENV=.venv uv pip install packaging requests
python_version="$(uv run python --version | awk '{print $2}')"
min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml release $python_version)"
poetry run pip install packaging requests
python_version="$(poetry run python --version | awk '{print $2}')"
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml release $python_version)"
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
echo "min-versions=$min_versions"
@@ -289,13 +263,12 @@ jobs:
env:
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
run: |
VIRTUAL_ENV=.venv uv pip install --force-reinstall --editable .
VIRTUAL_ENV=.venv uv pip install --force-reinstall $MIN_VERSIONS
poetry run pip install --force-reinstall $MIN_VERSIONS --editable .
make tests
working-directory: ${{ inputs.working-directory }}
- name: Import integration test dependencies
run: uv sync --group test --group test_integration
run: poetry install --with test,test_integration
working-directory: ${{ inputs.working-directory }}
- name: Run integration tests
@@ -323,6 +296,8 @@ jobs:
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
WATSONX_APIKEY: ${{ secrets.WATSONX_APIKEY }}
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
@@ -330,103 +305,19 @@ jobs:
ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
ES_API_KEY: ${{ secrets.ES_API_KEY }}
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
UPSTAGE_API_KEY: ${{ secrets.UPSTAGE_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
run: make integration_tests
working-directory: ${{ inputs.working-directory }}
# Test select published packages against new core
test-prior-published-packages-against-new-core:
needs:
- build
- release-notes
- test-pypi-publish
- pre-release-checks
runs-on: ubuntu-latest
strategy:
matrix:
partner: [openai, anthropic]
fail-fast: false # Continue testing other partners if one fails
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
steps:
- uses: actions/checkout@v5
# We implement this conditional as Github Actions does not have good support
# for conditionally needing steps. https://github.com/actions/runner/issues/491
- name: Check if libs/core
run: |
if [ "${{ startsWith(inputs.working-directory, 'libs/core') }}" != "true" ]; then
echo "Not in libs/core. Exiting successfully."
exit 0
fi
- name: Set up Python + uv
if: startsWith(inputs.working-directory, 'libs/core')
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
- uses: actions/download-artifact@v5
if: startsWith(inputs.working-directory, 'libs/core')
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Test against ${{ matrix.partner }}
if: startsWith(inputs.working-directory, 'libs/core')
run: |
# Identify latest tag, excluding pre-releases
LATEST_PACKAGE_TAG="$(
git ls-remote --tags origin "langchain-${{ matrix.partner }}*" \
| awk '{print $2}' \
| sed 's|refs/tags/||' \
| grep -E '[0-9]+\.[0-9]+\.[0-9]+$' \
| sort -Vr \
| head -n 1
)"
echo "Latest package tag: $LATEST_PACKAGE_TAG"
# Shallow-fetch just that single tag
git fetch --depth=1 origin tag "$LATEST_PACKAGE_TAG"
# Checkout the latest package files
rm -rf $GITHUB_WORKSPACE/libs/partners/${{ matrix.partner }}/*
rm -rf $GITHUB_WORKSPACE/libs/standard-tests/*
cd $GITHUB_WORKSPACE/libs/
git checkout "$LATEST_PACKAGE_TAG" -- standard-tests/
git checkout "$LATEST_PACKAGE_TAG" -- partners/${{ matrix.partner }}/
cd partners/${{ matrix.partner }}
# Print as a sanity check
echo "Version number from pyproject.toml: "
cat pyproject.toml | grep "version = "
# Run tests
uv sync --group test --group test_integration
uv pip install ../../core/dist/*.whl
make integration_tests
publish:
needs:
- build
- release-notes
- test-pypi-publish
- pre-release-checks
- test-prior-published-packages-against-new-core
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
@@ -441,14 +332,17 @@ jobs:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
- uses: actions/download-artifact@v5
- uses: actions/download-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
@@ -480,18 +374,21 @@ jobs:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: Set up Python + uv
uses: "./.github/actions/uv_setup"
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
- uses: actions/download-artifact@v5
- uses: actions/download-artifact@v4
with:
name: dist
path: ${{ inputs.working-directory }}/dist/
- name: Create Tag
uses: ncipollo/release-action@v1
with:

View File

@@ -1,6 +1,4 @@
name: '🧪 Unit Testing'
# Runs unit tests with both current and minimum supported dependency versions
# to ensure compatibility across the supported range
name: test
on:
workflow_call:
@@ -14,61 +12,57 @@ on:
type: string
description: "Python version to use"
permissions:
contents: read
env:
UV_FROZEN: "true"
UV_NO_SYNC: "true"
POETRY_VERSION: "1.7.1"
jobs:
# Main test job - runs unit tests with current deps, then retests with minimum versions
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
timeout-minutes: 20
name: 'Python ${{ inputs.python-version }}'
name: "make test #${{ inputs.python-version }}"
steps:
- name: '📋 Checkout Code'
uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
id: setup-python
with:
python-version: ${{ inputs.python-version }}
- name: '📦 Install Test Dependencies'
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: core
- name: Install dependencies
shell: bash
run: uv sync --group test --dev
run: poetry install --with test
- name: '🧪 Run Core Unit Tests'
- name: Run core tests
shell: bash
run: |
make test
- name: '🔍 Calculate Minimum Dependency Versions'
- name: Get minimum versions
working-directory: ${{ inputs.working-directory }}
id: min-version
shell: bash
run: |
VIRTUAL_ENV=.venv uv pip install packaging tomli requests
python_version="$(uv run python --version | awk '{print $2}')"
min_versions="$(uv run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml pull_request $python_version)"
poetry run pip install packaging tomli requests
python_version="$(poetry run python --version | awk '{print $2}')"
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml pull_request $python_version)"
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
echo "min-versions=$min_versions"
- name: '🧪 Run Tests with Minimum Dependencies'
- name: Run unit tests with minimum dependency versions
if: ${{ steps.min-version.outputs.min-versions != '' }}
env:
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
run: |
VIRTUAL_ENV=.venv uv pip install $MIN_VERSIONS
poetry run pip install $MIN_VERSIONS
make tests
working-directory: ${{ inputs.working-directory }}
- name: '🧹 Verify Clean Working Directory'
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
@@ -79,4 +73,4 @@ jobs:
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -1,4 +1,4 @@
name: '📑 Documentation Import Testing'
name: test_doc_imports
on:
workflow_call:
@@ -8,40 +8,37 @@ on:
type: string
description: "Python version to use"
permissions:
contents: read
env:
UV_FROZEN: "true"
POETRY_VERSION: "1.7.1"
jobs:
build:
runs-on: ubuntu-latest
timeout-minutes: 20
name: '🔍 Check Doc Imports (Python ${{ inputs.python-version }})'
name: "check doc imports #${{ inputs.python-version }}"
steps:
- name: '📋 Checkout Code'
uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ inputs.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
cache-key: core
- name: '📦 Install Test Dependencies'
- name: Install dependencies
shell: bash
run: uv sync --group test
run: poetry install --with test
- name: '📦 Install LangChain in Editable Mode'
- name: Install langchain editable
run: |
VIRTUAL_ENV=.venv uv pip install langchain-experimental langchain-community -e libs/core libs/langchain
poetry run pip install langchain-experimental -e libs/core libs/langchain libs/community
- name: '🔍 Validate Documentation Import Statements'
- name: Check doc imports
shell: bash
run: |
uv run python docs/scripts/check_imports.py
poetry run python docs/scripts/check_imports.py
- name: '🧹 Verify Clean Working Directory'
- name: Ensure the test did not create any additional files
shell: bash
run: |
set -eu

View File

@@ -1,4 +1,4 @@
name: '🐍 Pydantic Version Testing'
name: test pydantic intermediate versions
on:
workflow_call:
@@ -17,12 +17,8 @@ on:
type: string
description: "Pydantic version to test."
permissions:
contents: read
env:
UV_FROZEN: "true"
UV_NO_SYNC: "true"
POETRY_VERSION: "1.7.1"
jobs:
build:
@@ -30,31 +26,32 @@ jobs:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
timeout-minutes: 20
name: 'Pydantic ~=${{ inputs.pydantic-version }}'
name: "make test # pydantic: ~=${{ inputs.pydantic-version }}, python: ${{ inputs.python-version }}, "
steps:
- name: '📋 Checkout Code'
uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ inputs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ inputs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ inputs.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: core
- name: '📦 Install Test Dependencies'
- name: Install dependencies
shell: bash
run: uv sync --group test
run: poetry install --with test
- name: '🔄 Install Specific Pydantic Version'
- name: Overwrite pydantic version
shell: bash
run: VIRTUAL_ENV=.venv uv pip install pydantic~=${{ inputs.pydantic-version }}
run: poetry run pip install pydantic~=${{ inputs.pydantic-version }}
- name: '🧪 Run Core Tests'
- name: Run core tests
shell: bash
run: |
make test
- name: '🧹 Verify Clean Working Directory'
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
@@ -64,4 +61,4 @@ jobs:
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -1,4 +1,4 @@
name: '🧪 Test Release Package'
name: test-release
on:
workflow_call:
@@ -14,8 +14,8 @@ on:
description: "Release from a non-master branch (danger!)"
env:
PYTHON_VERSION: "3.11"
UV_FROZEN: "true"
POETRY_VERSION: "1.7.1"
PYTHON_VERSION: "3.10"
jobs:
build:
@@ -27,12 +27,15 @@ jobs:
version: ${{ steps.check-version.outputs.version }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: release
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
@@ -45,30 +48,23 @@ jobs:
# > It is strongly advised to separate jobs for building [...]
# > from the publish job.
# https://github.com/pypa/gh-action-pypi-publish#non-goals
- name: '📦 Build Project for Distribution'
run: uv build
- name: Build project for distribution
run: poetry build
working-directory: ${{ inputs.working-directory }}
- name: '⬆️ Upload Build Artifacts'
- name: Upload build
uses: actions/upload-artifact@v4
with:
name: test-dist
path: ${{ inputs.working-directory }}/dist/
- name: '🔍 Extract Version Information'
- name: Check Version
id: check-version
shell: python
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
import os
import tomllib
with open("pyproject.toml", "rb") as f:
data = tomllib.load(f)
pkg_name = data["project"]["name"]
version = data["project"]["version"]
with open(os.environ["GITHUB_OUTPUT"], "a") as f:
f.write(f"pkg-name={pkg_name}\n")
f.write(f"version={version}\n")
echo pkg-name="$(poetry version | cut -d ' ' -f 1)" >> $GITHUB_OUTPUT
echo version="$(poetry version --short)" >> $GITHUB_OUTPUT
publish:
needs:
@@ -83,9 +79,9 @@ jobs:
id-token: write
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- uses: actions/download-artifact@v5
- uses: actions/download-artifact@v4
with:
name: test-dist
path: ${{ inputs.working-directory }}/dist/

View File

@@ -1,109 +1,88 @@
name: '📚 API Docs'
run-name: 'Build & Deploy API Reference'
# Runs daily or can be triggered manually for immediate updates
name: API docs build
on:
workflow_dispatch:
schedule:
- cron: '0 13 * * *' # Daily at 1PM UTC
- cron: '0 13 * * *'
env:
POETRY_VERSION: "1.8.1"
PYTHON_VERSION: "3.11"
jobs:
# Only runs on main repository to prevent unnecessary builds on forks
build:
if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule'
runs-on: ubuntu-latest
permissions:
contents: read
permissions: write-all
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
path: langchain
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: langchain-ai/langchain-api-docs-html
path: langchain-api-docs-html
token: ${{ secrets.TOKEN_GITHUB_API_DOCS_HTML }}
- name: '📋 Extract Repository List with yq'
- name: Get repos with yq
id: get-unsorted-repos
uses: mikefarah/yq@master
with:
cmd: |
yq '
.packages[]
| select(
(
(.repo | test("^langchain-ai/"))
and
(.repo != "langchain-ai/langchain")
)
or
(.include_in_api_ref // false)
)
| .repo
' langchain/libs/packages.yml
cmd: yq '.packages[].repo' langchain/libs/packages.yml
- name: '📋 Parse YAML & Checkout Repositories'
- name: Parse YAML and checkout repos
env:
REPOS_UNSORTED: ${{ steps.get-unsorted-repos.outputs.result }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Get unique repositories
REPOS=$(echo "$REPOS_UNSORTED" | sort -u)
# Checkout each unique repository
# Checkout each unique repository that is in langchain-ai org
for repo in $REPOS; do
# Validate repository format (allow any org with proper format)
if [[ ! "$repo" =~ ^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$ ]]; then
echo "Error: Invalid repository format: $repo"
exit 1
if [[ "$repo" != "langchain-ai/langchain" && "$repo" == langchain-ai/* ]]; then
REPO_NAME=$(echo $repo | cut -d'/' -f2)
echo "Checking out $repo to $REPO_NAME"
git clone --depth 1 https://github.com/$repo.git $REPO_NAME
fi
REPO_NAME=$(echo $repo | cut -d'/' -f2)
# Additional validation for repo name
if [[ ! "$REPO_NAME" =~ ^[a-zA-Z0-9_.-]+$ ]]; then
echo "Error: Invalid repository name: $REPO_NAME"
exit 1
fi
echo "Checking out $repo to $REPO_NAME"
git clone --depth 1 https://github.com/$repo.git $REPO_NAME
done
- name: '🐍 Setup Python ${{ env.PYTHON_VERSION }}'
uses: actions/setup-python@v6
id: setup-python
- name: Set up Python ${{ env.PYTHON_VERSION }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./langchain/.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
cache-key: api-docs
working-directory: langchain
- name: '📦 Install Initial Python Dependencies'
- name: Install initial py deps
working-directory: langchain
run: |
python -m pip install -U uv
python -m uv pip install --upgrade --no-cache-dir pip setuptools pyyaml
- name: '📦 Organize Library Directories'
- name: Move libs with script
run: python langchain/.github/scripts/prep_api_docs_build.py
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: '🧹 Remove Old HTML Files'
- name: Rm old html
run:
rm -rf langchain-api-docs-html/api_reference_build/html
- name: '📦 Install Documentation Dependencies'
- name: Install dependencies
working-directory: langchain
run: |
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt
python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}")
python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental
python -m uv pip install -r docs/api_reference/requirements.txt
- name: '🔧 Configure Git Settings'
- name: Set Git config
working-directory: langchain
run: |
git config --local user.email "actions@github.com"
git config --local user.name "Github Actions"
- name: '📚 Build API Documentation'
- name: Build docs
working-directory: langchain
run: |
python docs/api_reference/create_api_rst.py

View File

@@ -1,28 +1,25 @@
name: '🔗 Check Broken Links'
name: Check Broken Links
on:
workflow_dispatch:
schedule:
- cron: '0 13 * * *'
permissions:
contents: read
jobs:
check-links:
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: '🟢 Setup Node.js 18.x'
uses: actions/setup-node@v4
- uses: actions/checkout@v4
- name: Use Node.js 18.x
uses: actions/setup-node@v3
with:
node-version: 18.x
cache: "yarn"
cache-dependency-path: ./docs/yarn.lock
- name: '📦 Install Node Dependencies'
- name: Install dependencies
run: yarn install --immutable --mode=skip-build
working-directory: ./docs
- name: '🔍 Scan Documentation for Broken Links'
- name: Check broken links
run: yarn check-broken-links
working-directory: ./docs

View File

@@ -1,49 +0,0 @@
name: '🔍 Check `core` Version Equality'
# Ensures version numbers in pyproject.toml and version.py stay in sync
# Prevents releases with mismatched version numbers
on:
pull_request:
paths:
- 'libs/core/pyproject.toml'
- 'libs/core/langchain_core/version.py'
permissions:
contents: read
jobs:
check_version_equality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: '✅ Verify pyproject.toml & version.py Match'
run: |
# Check core versions
CORE_PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' libs/core/pyproject.toml)
CORE_VERSION_PY_VERSION=$(grep -Po '(?<=^VERSION = ")[^"]*' libs/core/langchain_core/version.py)
# Compare core versions
if [ "$CORE_PYPROJECT_VERSION" != "$CORE_VERSION_PY_VERSION" ]; then
echo "langchain-core versions in pyproject.toml and version.py do not match!"
echo "pyproject.toml version: $CORE_PYPROJECT_VERSION"
echo "version.py version: $CORE_VERSION_PY_VERSION"
exit 1
else
echo "Core versions match: $CORE_PYPROJECT_VERSION"
fi
# Check langchain_v1 versions
LANGCHAIN_PYPROJECT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' libs/langchain_v1/pyproject.toml)
LANGCHAIN_INIT_PY_VERSION=$(grep -Po '(?<=^__version__ = ")[^"]*' libs/langchain_v1/langchain/__init__.py)
# Compare langchain_v1 versions
if [ "$LANGCHAIN_PYPROJECT_VERSION" != "$LANGCHAIN_INIT_PY_VERSION" ]; then
echo "langchain_v1 versions in pyproject.toml and __init__.py do not match!"
echo "pyproject.toml version: $LANGCHAIN_PYPROJECT_VERSION"
echo "version.py version: $LANGCHAIN_INIT_PY_VERSION"
exit 1
else
echo "Langchain v1 versions match: $LANGCHAIN_PYPROJECT_VERSION"
fi

View File

@@ -1,12 +1,11 @@
name: '🔧 CI'
---
name: CI
on:
push:
branches: [master]
pull_request:
merge_group:
# Optimizes CI performance by canceling redundant workflow runs
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
@@ -17,32 +16,20 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions:
contents: read
env:
UV_FROZEN: "true"
UV_NO_SYNC: "true"
POETRY_VERSION: "1.7.1"
jobs:
# This job analyzes which files changed and creates a dynamic test matrix
# to only run tests/lints for the affected packages, improving CI efficiency
build:
name: 'Detect Changes & Set Matrix'
runs-on: ubuntu-latest
if: ${{ !contains(github.event.pull_request.labels.*.name, 'ci-ignore') }}
steps:
- name: '📋 Checkout Code'
uses: actions/checkout@v5
- name: '🐍 Setup Python 3.11'
uses: actions/setup-python@v6
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: '📂 Get Changed Files'
id: files
uses: Ana06/get-changed-files@v2.3.0
- name: '🔍 Analyze Changed Files & Generate Build Matrix'
id: set-matrix
- id: files
uses: Ana06/get-changed-files@v2.2.0
- id: set-matrix
run: |
python -m pip install packaging requests
python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
@@ -54,8 +41,8 @@ jobs:
dependencies: ${{ steps.set-matrix.outputs.dependencies }}
test-doc-imports: ${{ steps.set-matrix.outputs.test-doc-imports }}
test-pydantic: ${{ steps.set-matrix.outputs.test-pydantic }}
# Run linting only on packages that have changed files
lint:
name: cd ${{ matrix.job-configs.working-directory }}
needs: [ build ]
if: ${{ needs.build.outputs.lint != '[]' }}
strategy:
@@ -68,8 +55,8 @@ jobs:
python-version: ${{ matrix.job-configs.python-version }}
secrets: inherit
# Run unit tests only on packages that have changed files
test:
name: cd ${{ matrix.job-configs.working-directory }}
needs: [ build ]
if: ${{ needs.build.outputs.test != '[]' }}
strategy:
@@ -82,8 +69,8 @@ jobs:
python-version: ${{ matrix.job-configs.python-version }}
secrets: inherit
# Test compatibility with different Pydantic versions for affected packages
test-pydantic:
name: cd ${{ matrix.job-configs.working-directory }}
needs: [ build ]
if: ${{ needs.build.outputs.test-pydantic != '[]' }}
strategy:
@@ -104,12 +91,12 @@ jobs:
job-configs: ${{ fromJson(needs.build.outputs.test-doc-imports) }}
fail-fast: false
uses: ./.github/workflows/_test_doc_imports.yml
secrets: inherit
with:
python-version: ${{ matrix.job-configs.python-version }}
secrets: inherit
# Verify integration tests compile without actually running them (faster feedback)
compile-integration-tests:
name: cd ${{ matrix.job-configs.working-directory }}
needs: [ build ]
if: ${{ needs.build.outputs.compile-integration-tests != '[]' }}
strategy:
@@ -122,9 +109,8 @@ jobs:
python-version: ${{ matrix.job-configs.python-version }}
secrets: inherit
# Run extended test suites that require additional dependencies
extended-tests:
name: 'Extended Tests'
name: "cd ${{ matrix.job-configs.working-directory }} / make extended_tests #${{ matrix.job-configs.python-version }}"
needs: [ build ]
if: ${{ needs.build.outputs.extended-tests != '[]' }}
strategy:
@@ -133,28 +119,32 @@ jobs:
job-configs: ${{ fromJson(needs.build.outputs.extended-tests) }}
fail-fast: false
runs-on: ubuntu-latest
timeout-minutes: 20
defaults:
run:
working-directory: ${{ matrix.job-configs.working-directory }}
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python ${{ matrix.job-configs.python-version }} + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python ${{ matrix.job-configs.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.job-configs.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ matrix.job-configs.working-directory }}
cache-key: extended
- name: '📦 Install Dependencies & Run Extended Tests'
- name: Install dependencies
shell: bash
run: |
echo "Running extended tests, installing dependencies with uv..."
uv venv
uv sync --group test
VIRTUAL_ENV=.venv uv pip install -r extended_testing_deps.txt
VIRTUAL_ENV=.venv make extended_tests
echo "Running extended tests, installing dependencies with poetry..."
poetry install --with test
poetry run pip install uv
poetry run uv pip install -r extended_testing_deps.txt
- name: '🧹 Verify Clean Working Directory'
- name: Run extended tests
run: make extended_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
@@ -165,10 +155,8 @@ jobs:
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'
# Final status check - ensures all required jobs passed before allowing merge
ci_success:
name: '✅ CI Success'
name: "CI Success"
needs: [build, lint, test, compile-integration-tests, extended-tests, test-doc-imports, test-pydantic]
if: |
always()
@@ -178,7 +166,7 @@ jobs:
RESULTS_JSON: ${{ toJSON(needs.*.result) }}
EXIT_CODE: ${{!contains(needs.*.result, 'failure') && !contains(needs.*.result, 'cancelled') && '0' || '1'}}
steps:
- name: '🎉 All Checks Passed'
- name: "CI Success"
run: |
echo $JOBS_JSON
echo $RESULTS_JSON

View File

@@ -1,4 +1,5 @@
name: '📑 Integration Docs Lint'
---
name: Integration docs lint
on:
push:
@@ -15,24 +16,21 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- uses: actions/setup-python@v6
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- id: files
uses: Ana06/get-changed-files@v2.3.0
uses: Ana06/get-changed-files@v2.2.0
with:
filter: |
*.ipynb
*.md
*.mdx
- name: '🔍 Check New Documentation Templates'
- name: Check new docs
run: |
python docs/scripts/check_templates.py ${{ steps.files.outputs.added }}

36
.github/workflows/codespell.yml vendored Normal file
View File

@@ -0,0 +1,36 @@
---
name: CI / cd . / make spell_check
on:
push:
branches: [master, v0.1, v0.2]
pull_request:
permissions:
contents: read
jobs:
codespell:
name: (Check for spelling errors)
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Dependencies
run: |
pip install toml
- name: Extract Ignore Words List
run: |
# Use a Python script to extract the ignore words list from pyproject.toml
python .github/workflows/extract_ignored_words_list.py
id: extract_ignore_words
# - name: Codespell
# uses: codespell-project/actions-codespell@v2
# with:
# skip: guide_imports.json,*.ambr,./cookbook/data/imdb_top_1000.csv,*.lock
# ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
# exclude_file: ./.github/workflows/codespell-exclude

View File

@@ -1,66 +0,0 @@
name: '⚡ CodSpeed'
on:
push:
branches:
- master
pull_request:
workflow_dispatch:
permissions:
contents: read
env:
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: foo
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: foo
DEEPSEEK_API_KEY: foo
FIREWORKS_API_KEY: foo
jobs:
codspeed:
name: 'Benchmark'
runs-on: ubuntu-latest
if: ${{ !contains(github.event.pull_request.labels.*.name, 'codspeed-ignore') }}
strategy:
matrix:
include:
- working-directory: libs/core
mode: walltime
- working-directory: libs/partners/openai
- working-directory: libs/partners/anthropic
- working-directory: libs/partners/deepseek
- working-directory: libs/partners/fireworks
- working-directory: libs/partners/xai
- working-directory: libs/partners/mistralai
- working-directory: libs/partners/groq
fail-fast: false
steps:
- uses: actions/checkout@v5
# We have to use 3.12 as 3.13 is not yet supported
- name: '📦 Install UV Package Manager'
uses: astral-sh/setup-uv@v6
with:
python-version: "3.12"
- uses: actions/setup-python@v6
with:
python-version: "3.12"
- name: '📦 Install Test Dependencies'
run: uv sync --group test
working-directory: ${{ matrix.working-directory }}
- name: '⚡ Run Benchmarks: ${{ matrix.working-directory }}'
uses: CodSpeedHQ/action@v3
with:
token: ${{ secrets.CODSPEED_TOKEN }}
run: |
cd ${{ matrix.working-directory }}
if [ "${{ matrix.working-directory }}" = "libs/core" ]; then
uv run --no-sync pytest ./tests/benchmarks --codspeed
else
uv run --no-sync pytest ./tests/ --codspeed
fi
mode: ${{ matrix.mode || 'instrumentation' }}

View File

@@ -1,28 +1,37 @@
name: '👥 LangChain People'
run-name: 'Update People Data'
# This workflow updates the LangChain People data by fetching the latest information from the LangChain Git
name: LangChain People
on:
schedule:
- cron: "0 14 1 * *"
push:
branches: [jacob/people]
workflow_dispatch:
inputs:
debug_enabled:
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
required: false
default: 'false'
jobs:
langchain-people:
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
runs-on: ubuntu-latest
permissions:
contents: write
permissions: write-all
steps:
- name: '📋 Dump GitHub Context'
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- uses: actions/checkout@v5
- uses: actions/checkout@v4
# Ref: https://github.com/actions/runner/issues/2033
- name: '🔧 Fix Git Safe Directory in Container'
- name: Fix git safe.directory in container
run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig
# Allow debugging with tmate
- name: Setup tmate session
uses: mxschmitt/action-tmate@v3
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled == 'true' }}
with:
limit-access-to-actor: true
- uses: ./.github/actions/people
with:
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}

View File

@@ -1,111 +0,0 @@
# -----------------------------------------------------------------------------
# PR Title Lint Workflow
#
# Purpose:
# Enforces Conventional Commits format for pull request titles to maintain a
# clear, consistent, and machine-readable change history across our repository.
# This helps with automated changelog generation and semantic versioning.
#
# Enforced Commit Message Format (Conventional Commits 1.0.0):
# <type>[optional scope]: <description>
# [optional body]
# [optional footer(s)]
#
# Allowed Types:
# • feat — a new feature (MINOR bump)
# • fix — a bug fix (PATCH bump)
# • docs — documentation only changes
# • style — formatting, missing semi-colons, etc.; no code change
# • refactor — code change that neither fixes a bug nor adds a feature
# • perf — code change that improves performance
# • test — adding missing tests or correcting existing tests
# • build — changes that affect the build system or external dependencies
# • ci — continuous integration/configuration changes
# • chore — other changes that don't modify src or test files
# • revert — reverts a previous commit
# • release — prepare a new release
#
# Allowed Scopes (optional):
# core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek,
# exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai,
# perplexity, prompty, qdrant, xai
#
# Rules & Tips for New Committers:
# 1. Subject (type) must start with a lowercase letter and, if possible, be
# followed by a scope wrapped in parenthesis `(scope)`
# 2. Breaking changes:
# Append "!" after type/scope (e.g., feat!: drop Node 12 support)
# Or include a footer "BREAKING CHANGE: <details>"
# 3. Example PR titles:
# feat(core): add multitenant support
# fix(cli): resolve flag parsing error
# docs: update API usage examples
# docs(openai): update API usage examples
#
# Resources:
# • Conventional Commits spec: https://www.conventionalcommits.org/en/v1.0.0/
# -----------------------------------------------------------------------------
name: '🏷️ PR Title Lint'
permissions:
pull-requests: read
on:
pull_request:
types: [opened, edited, synchronize]
jobs:
# Validates that PR title follows Conventional Commits specification
lint-pr-title:
name: 'Validate PR Title Format'
runs-on: ubuntu-latest
steps:
- name: '✅ Validate Conventional Commits Format'
uses: amannn/action-semantic-pull-request@v6
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
types: |
feat
fix
docs
style
refactor
perf
test
build
ci
chore
revert
release
scopes: |
core
cli
langchain
langchain_v1
standard-tests
text-splitters
docs
anthropic
chroma
deepseek
exa
fireworks
groq
huggingface
mistralai
nomic
ollama
openai
perplexity
prompty
qdrant
xai
infra
requireScope: false
disallowScopes: |
release
[A-Z]+
ignoreLabels: |
ignore-lint-pr-title

View File

@@ -1,5 +1,5 @@
name: '📓 Validate Documentation Notebooks'
run-name: 'Test notebooks in ${{ inputs.working-directory }}'
name: Run notebooks
on:
workflow_dispatch:
inputs:
@@ -14,57 +14,56 @@ on:
schedule:
- cron: '0 13 * * *'
permissions:
contents: read
env:
UV_FROZEN: "true"
POETRY_VERSION: "1.7.1"
jobs:
build:
runs-on: ubuntu-latest
if: github.repository == 'langchain-ai/langchain' || github.event_name != 'schedule'
name: '📑 Test Documentation Notebooks'
name: "Test docs"
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
- name: '🐍 Set up Python + UV'
uses: "./.github/actions/uv_setup"
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ github.event.inputs.python_version || '3.11' }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: run-notebooks
- name: '🔐 Authenticate to Google Cloud'
- name: 'Authenticate to Google Cloud'
id: 'auth'
uses: google-github-actions/auth@v3
uses: google-github-actions/auth@v2
with:
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
- name: '🔐 Configure AWS Credentials'
uses: aws-actions/configure-aws-credentials@v5
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_REGION }}
- name: '📦 Install Dependencies'
- name: Install dependencies
run: |
uv sync --group dev --group test
poetry install --with dev,test
- name: '📦 Pre-download Test Files'
- name: Pre-download files
run: |
uv run python docs/scripts/cache_data.py
poetry run python docs/scripts/cache_data.py
curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 docs/docs/how_to/Chinook.db
cp docs/docs/how_to/Chinook.db docs/docs/tutorials/Chinook.db
- name: '🔧 Prepare Notebooks for CI'
- name: Prepare notebooks
run: |
uv run python docs/scripts/prepare_notebooks_for_ci.py --comment-install-cells --working-directory ${{ github.event.inputs.working-directory || 'all' }}
poetry run python docs/scripts/prepare_notebooks_for_ci.py --comment-install-cells --working-directory ${{ github.event.inputs.working-directory || 'all' }}
- name: '🚀 Execute Notebooks'
- name: Run notebooks
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

View File

@@ -1,86 +1,48 @@
name: '⏰ Scheduled Integration Tests'
run-name: "Run Integration Tests - ${{ inputs.working-directory-force || 'all libs' }} (Python ${{ inputs.python-version-force || '3.9, 3.11' }})"
name: Scheduled tests
on:
workflow_dispatch: # Allows maintainers to trigger the workflow manually in GitHub UI
inputs:
working-directory-force:
type: string
description: "From which folder this pipeline executes - defaults to all in matrix - example value: libs/partners/anthropic"
python-version-force:
type: string
description: "Python version to use - defaults to 3.9 and 3.11 in matrix - example value: 3.9"
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
schedule:
- cron: '0 13 * * *' # Runs daily at 1PM UTC (9AM EDT/6AM PDT)
permissions:
contents: read
- cron: '0 13 * * *'
env:
POETRY_VERSION: "1.8.4"
UV_FROZEN: "true"
DEFAULT_LIBS: '["libs/partners/openai", "libs/partners/anthropic", "libs/partners/fireworks", "libs/partners/groq", "libs/partners/mistralai", "libs/partners/xai", "libs/partners/google-vertexai", "libs/partners/google-genai", "libs/partners/aws"]'
POETRY_LIBS: ("libs/partners/aws")
POETRY_VERSION: "1.7.1"
jobs:
# Generate dynamic test matrix based on input parameters or defaults
# Only runs on the main repo (for scheduled runs) or when manually triggered
compute-matrix:
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
runs-on: ubuntu-latest
name: '📋 Compute Test Matrix'
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: '🔢 Generate Python & Library Matrix'
id: set-matrix
env:
DEFAULT_LIBS: ${{ env.DEFAULT_LIBS }}
WORKING_DIRECTORY_FORCE: ${{ github.event.inputs.working-directory-force || '' }}
PYTHON_VERSION_FORCE: ${{ github.event.inputs.python-version-force || '' }}
run: |
# echo "matrix=..." where matrix is a json formatted str with keys python-version and working-directory
# python-version should default to 3.9 and 3.11, but is overridden to [PYTHON_VERSION_FORCE] if set
# working-directory should default to DEFAULT_LIBS, but is overridden to [WORKING_DIRECTORY_FORCE] if set
python_version='["3.9", "3.11"]'
working_directory="$DEFAULT_LIBS"
if [ -n "$PYTHON_VERSION_FORCE" ]; then
python_version="[\"$PYTHON_VERSION_FORCE\"]"
fi
if [ -n "$WORKING_DIRECTORY_FORCE" ]; then
working_directory="[\"$WORKING_DIRECTORY_FORCE\"]"
fi
matrix="{\"python-version\": $python_version, \"working-directory\": $working_directory}"
echo $matrix
echo "matrix=$matrix" >> $GITHUB_OUTPUT
# Run integration tests against partner libraries with live API credentials
# Tests are run with both Poetry and UV depending on the library's setup
build:
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
name: '🐍 Python ${{ matrix.python-version }}: ${{ matrix.working-directory }}'
name: Python ${{ matrix.python-version }} - ${{ matrix.working-directory }}
runs-on: ubuntu-latest
needs: [compute-matrix]
timeout-minutes: 20
strategy:
fail-fast: false
matrix:
python-version: ${{ fromJSON(needs.compute-matrix.outputs.matrix).python-version }}
working-directory: ${{ fromJSON(needs.compute-matrix.outputs.matrix).working-directory }}
python-version:
- "3.9"
- "3.11"
working-directory:
- "libs/partners/openai"
- "libs/partners/anthropic"
- "libs/partners/fireworks"
- "libs/partners/groq"
- "libs/partners/mistralai"
- "libs/partners/google-vertexai"
- "libs/partners/google-genai"
- "libs/partners/aws"
steps:
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
path: langchain
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: langchain-ai/langchain-google
path: langchain-google
- uses: actions/checkout@v5
- uses: actions/checkout@v4
with:
repository: langchain-ai/langchain-aws
path: langchain-aws
- name: '📦 Organize External Libraries'
- name: Move libs
run: |
rm -rf \
langchain/libs/partners/google-genai \
@@ -89,8 +51,7 @@ jobs:
mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai
mv langchain-aws/libs/aws langchain/libs/partners/aws
- name: '🐍 Set up Python ${{ matrix.python-version }} + Poetry'
if: contains(env.POETRY_LIBS, matrix.working-directory)
- name: Set up Python ${{ matrix.python-version }}
uses: "./langchain/.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
@@ -98,45 +59,29 @@ jobs:
working-directory: langchain/${{ matrix.working-directory }}
cache-key: scheduled
- name: '🐍 Set up Python ${{ matrix.python-version }} + UV'
if: "!contains(env.POETRY_LIBS, matrix.working-directory)"
uses: "./langchain/.github/actions/uv_setup"
with:
python-version: ${{ matrix.python-version }}
- name: '🔐 Authenticate to Google Cloud'
- name: 'Authenticate to Google Cloud'
id: 'auth'
uses: google-github-actions/auth@v3
uses: google-github-actions/auth@v2
with:
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
- name: '🔐 Configure AWS Credentials'
uses: aws-actions/configure-aws-credentials@v5
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_REGION }}
- name: '📦 Install Dependencies (Poetry)'
if: contains(env.POETRY_LIBS, matrix.working-directory)
- name: Install dependencies
run: |
echo "Running scheduled tests, installing dependencies with poetry..."
cd langchain/${{ matrix.working-directory }}
poetry install --with=test_integration,test
- name: '📦 Install Dependencies (UV)'
if: "!contains(env.POETRY_LIBS, matrix.working-directory)"
run: |
echo "Running scheduled tests, installing dependencies with uv..."
cd langchain/${{ matrix.working-directory }}
uv sync --group test --group test_integration
- name: '🚀 Run Integration Tests'
- name: Run integration tests
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
@@ -144,31 +89,27 @@ jobs:
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
GOOGLE_SEARCH_API_KEY: ${{ secrets.GOOGLE_SEARCH_API_KEY }}
GOOGLE_CSE_ID: ${{ secrets.GOOGLE_CSE_ID }}
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
run: |
cd langchain/${{ matrix.working-directory }}
make integration_tests
- name: '🧹 Clean up External Libraries'
# Clean up external libraries to avoid affecting git status check
run: |
- name: Remove external libraries
run: |
rm -rf \
langchain/libs/partners/google-genai \
langchain/libs/partners/google-vertexai \
langchain/libs/partners/aws
- name: '🧹 Verify Clean Working Directory'
- name: Ensure the tests did not create any additional files
working-directory: langchain
run: |
set -eu

2
.gitignore vendored
View File

@@ -1,4 +1,5 @@
.vs/
.vscode/
.idea/
# Byte-compiled / optimized / DLL files
__pycache__/
@@ -58,7 +59,6 @@ coverage.xml
*.py,cover
.hypothesis/
.pytest_cache/
.codspeed/
# Translations
*.mo

View File

@@ -1,14 +0,0 @@
{
"MD013": false,
"MD024": {
"siblings_only": true
},
"MD025": false,
"MD033": false,
"MD034": false,
"MD036": false,
"MD041": false,
"MD046": {
"style": "fenced"
}
}

View File

@@ -1,111 +0,0 @@
repos:
- repo: local
hooks:
- id: core
name: format core
language: system
entry: make -C libs/core format
files: ^libs/core/
pass_filenames: false
- id: langchain
name: format langchain
language: system
entry: make -C libs/langchain format
files: ^libs/langchain/
pass_filenames: false
- id: standard-tests
name: format standard-tests
language: system
entry: make -C libs/standard-tests format
files: ^libs/standard-tests/
pass_filenames: false
- id: text-splitters
name: format text-splitters
language: system
entry: make -C libs/text-splitters format
files: ^libs/text-splitters/
pass_filenames: false
- id: anthropic
name: format partners/anthropic
language: system
entry: make -C libs/partners/anthropic format
files: ^libs/partners/anthropic/
pass_filenames: false
- id: chroma
name: format partners/chroma
language: system
entry: make -C libs/partners/chroma format
files: ^libs/partners/chroma/
pass_filenames: false
- id: couchbase
name: format partners/couchbase
language: system
entry: make -C libs/partners/couchbase format
files: ^libs/partners/couchbase/
pass_filenames: false
- id: exa
name: format partners/exa
language: system
entry: make -C libs/partners/exa format
files: ^libs/partners/exa/
pass_filenames: false
- id: fireworks
name: format partners/fireworks
language: system
entry: make -C libs/partners/fireworks format
files: ^libs/partners/fireworks/
pass_filenames: false
- id: groq
name: format partners/groq
language: system
entry: make -C libs/partners/groq format
files: ^libs/partners/groq/
pass_filenames: false
- id: huggingface
name: format partners/huggingface
language: system
entry: make -C libs/partners/huggingface format
files: ^libs/partners/huggingface/
pass_filenames: false
- id: mistralai
name: format partners/mistralai
language: system
entry: make -C libs/partners/mistralai format
files: ^libs/partners/mistralai/
pass_filenames: false
- id: nomic
name: format partners/nomic
language: system
entry: make -C libs/partners/nomic format
files: ^libs/partners/nomic/
pass_filenames: false
- id: ollama
name: format partners/ollama
language: system
entry: make -C libs/partners/ollama format
files: ^libs/partners/ollama/
pass_filenames: false
- id: openai
name: format partners/openai
language: system
entry: make -C libs/partners/openai format
files: ^libs/partners/openai/
pass_filenames: false
- id: prompty
name: format partners/prompty
language: system
entry: make -C libs/partners/prompty format
files: ^libs/partners/prompty/
pass_filenames: false
- id: qdrant
name: format partners/qdrant
language: system
entry: make -C libs/partners/qdrant format
files: ^libs/partners/qdrant/
pass_filenames: false
- id: root
name: format docs, cookbook
language: system
entry: make format
files: ^(docs|cookbook)/
pass_filenames: false

View File

@@ -1,7 +1,12 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
formats:
- pdf
# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
@@ -10,16 +15,15 @@ build:
commands:
- mkdir -p $READTHEDOCS_OUTPUT
- cp -r api_reference_build/* $READTHEDOCS_OUTPUT
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/api_reference/conf.py
configuration: docs/api_reference/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats:
- pdf
# formats:
# - pdf
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/api_reference/requirements.txt
install:
- requirements: docs/api_reference/requirements.txt

View File

@@ -1,21 +0,0 @@
{
"recommendations": [
"ms-python.python",
"charliermarsh.ruff",
"ms-python.mypy-type-checker",
"ms-toolsai.jupyter",
"ms-toolsai.jupyter-keymap",
"ms-toolsai.jupyter-renderers",
"ms-toolsai.vscode-jupyter-cell-tags",
"ms-toolsai.vscode-jupyter-slideshow",
"yzhang.markdown-all-in-one",
"davidanson.vscode-markdownlint",
"bierner.markdown-mermaid",
"bierner.markdown-preview-github-styles",
"eamodio.gitlens",
"github.vscode-pull-request-github",
"github.vscode-github-actions",
"redhat.vscode-yaml",
"editorconfig.editorconfig",
],
}

82
.vscode/settings.json vendored
View File

@@ -1,82 +0,0 @@
{
"python.analysis.include": [
"libs/**",
"docs/**",
"cookbook/**"
],
"python.analysis.exclude": [
"**/node_modules",
"**/__pycache__",
"**/.pytest_cache",
"**/.*",
"_dist/**",
"docs/_build/**",
"docs/api_reference/_build/**"
],
"python.analysis.autoImportCompletions": true,
"python.analysis.typeCheckingMode": "basic",
"python.testing.cwd": "${workspaceFolder}",
"python.linting.enabled": true,
"python.linting.ruffEnabled": true,
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.organizeImports.ruff": "explicit",
"source.fixAll": "explicit"
},
"editor.defaultFormatter": "charliermarsh.ruff"
},
"editor.rulers": [
88
],
"editor.tabSize": 4,
"editor.insertSpaces": true,
"editor.trimAutoWhitespace": true,
"files.trimTrailingWhitespace": true,
"files.insertFinalNewline": true,
"files.exclude": {
"**/__pycache__": true,
"**/.pytest_cache": true,
"**/*.pyc": true,
"**/.mypy_cache": true,
"**/.ruff_cache": true,
"_dist/**": true,
"docs/_build/**": true,
"docs/api_reference/_build/**": true,
"**/node_modules": true,
"**/.git": false
},
"search.exclude": {
"**/__pycache__": true,
"**/*.pyc": true,
"_dist/**": true,
"docs/_build/**": true,
"docs/api_reference/_build/**": true,
"**/node_modules": true,
"**/.git": true,
"uv.lock": true,
"yarn.lock": true
},
"git.autofetch": true,
"git.enableSmartCommit": true,
"jupyter.askForKernelRestart": false,
"jupyter.interactiveWindow.textEditor.executeSelection": true,
"[markdown]": {
"editor.wordWrap": "on",
"editor.quickSuggestions": {
"comments": "off",
"strings": "off",
"other": "off"
}
},
"[yaml]": {
"editor.tabSize": 2,
"editor.insertSpaces": true
},
"[json]": {
"editor.tabSize": 2,
"editor.insertSpaces": true
},
"python.terminal.activateEnvironment": false,
"python.defaultInterpreterPath": "./.venv/bin/python"
}

325
CLAUDE.md
View File

@@ -1,325 +0,0 @@
# Global Development Guidelines for LangChain Projects
## Core Development Principles
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
**Bad - Breaking Change:**
```python
def get_user(id, verbose=False): # Changed from `user_id`
pass
```
**Good - Stable Interface:**
```python
def get_user(user_id: str, verbose: bool = False) -> User:
"""Retrieve user by ID with optional verbose output."""
pass
```
**Before making ANY changes to public APIs:**
- Check if the function/class is exported in `__init__.py`
- Look for existing usage patterns in tests and examples
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
### 2. Code Quality Standards
**All Python code MUST include type hints and return types.**
**Bad:**
```python
def p(u, d):
return [x for x in u if x not in d]
```
**Good:**
```python
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
"""Filter out users that are not in the known users set.
Args:
users: List of user identifiers to filter.
known_users: Set of known/valid user identifiers.
Returns:
List of users that are not in the known_users set.
"""
return [user for user in users if user not in known_users]
```
**Style Requirements:**
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
- Avoid unnecessary abstraction or premature optimization
- Follow existing patterns in the codebase you're modifying
### 3. Testing Requirements
**Every new feature or bugfix MUST be covered by unit tests.**
**Test Organization:**
- Unit tests: `tests/unit_tests/` (no network calls allowed)
- Integration tests: `tests/integration_tests/` (network calls permitted)
- Use `pytest` as the testing framework
**Test Quality Checklist:**
- [ ] Tests fail when your new logic is broken
- [ ] Happy path is covered
- [ ] Edge cases and error conditions are tested
- [ ] Use fixtures/mocks for external dependencies
- [ ] Tests are deterministic (no flaky tests)
Checklist questions:
- [ ] Does the test suite fail if your new logic is broken?
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
- [ ] Do tests use fixtures or mocks where needed?
```python
def test_filter_unknown_users():
"""Test filtering unknown users from a list."""
users = ["alice", "bob", "charlie"]
known_users = {"alice", "bob"}
result = filter_unknown_users(users, known_users)
assert result == ["charlie"]
assert len(result) == 1
```
### 4. Security and Risk Assessment
**Security Checklist:**
- No `eval()`, `exec()`, or `pickle` on user-controlled input
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
- Remove unreachable/commented code before committing
- Race conditions or resource leaks (file handles, sockets, threads).
- Ensure proper resource cleanup (file handles, connections)
**Bad:**
```python
def load_config(path):
with open(path) as f:
return eval(f.read()) # ⚠️ Never eval config
```
**Good:**
```python
import json
def load_config(path: str) -> dict:
with open(path) as f:
return json.load(f)
```
### 5. Documentation Standards
**Use Google-style docstrings with Args section for all public functions.**
**Insufficient Documentation:**
```python
def send_email(to, msg):
"""Send an email to a recipient."""
```
**Complete Documentation:**
```python
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
"""
Send an email to a recipient with specified priority.
Args:
to: The email address of the recipient.
msg: The message body to send.
priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).
Returns:
True if email was sent successfully, False otherwise.
Raises:
InvalidEmailError: If the email address format is invalid.
SMTPConnectionError: If unable to connect to email server.
"""
```
**Documentation Guidelines:**
- Types go in function signatures, NOT in docstrings
- Focus on "why" rather than "what" in descriptions
- Document all parameters, return values, and exceptions
- Keep descriptions concise but clear
- Use reStructuredText for docstrings to enable rich formatting
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
### 6. Architectural Improvements
**When you encounter code that could be improved, suggest better designs:**
**Poor Design:**
```python
def process_data(data, db_conn, email_client, logger):
# Function doing too many things
validated = validate_data(data)
result = db_conn.save(validated)
email_client.send_notification(result)
logger.log(f"Processed {len(data)} items")
return result
```
**Better Design:**
```python
@dataclass
class ProcessingResult:
"""Result of data processing operation."""
items_processed: int
success: bool
errors: List[str] = field(default_factory=list)
class DataProcessor:
"""Handles data validation, storage, and notification."""
def __init__(self, db_conn: Database, email_client: EmailClient):
self.db = db_conn
self.email = email_client
def process(self, data: List[dict]) -> ProcessingResult:
"""Process and store data with notifications."""
validated = self._validate_data(data)
result = self.db.save(validated)
self._notify_completion(result)
return result
```
**Design Improvement Areas:**
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
- Reduce code duplication through shared utilities
- Make unit testing easier
- Improve separation of concerns (single responsibility)
- Make unit testing easier through dependency injection
- Add clarity without adding complexity
- Prefer dataclasses for structured data
## Development Tools & Commands
### Package Management
```bash
# Add package
uv add package-name
# Sync project dependencies
uv sync
uv lock
```
### Testing
```bash
# Run unit tests (no network)
make test
# Don't run integration tests, as API keys must be set
# Run specific test file
uv run --group test pytest tests/unit_tests/test_specific.py
```
### Code Quality
```bash
# Lint code
make lint
# Format code
make format
# Type checking
uv run --group lint mypy .
```
### Dependency Management Patterns
**Local Development Dependencies:**
```toml
[tool.uv.sources]
langchain-core = { path = "../core", editable = true }
langchain-tests = { path = "../standard-tests", editable = true }
```
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
```python
from langchain_core.tools import tool
@tool
def search_database(query: str) -> str:
"""Search the database for relevant information.
Args:
query: The search query string.
"""
# Implementation here
return results
```
## Commit Standards
**Use Conventional Commits format for PR titles:**
- `feat(core): add multi-tenant support`
- `fix(cli): resolve flag parsing error`
- `docs: update API usage examples`
- `docs(openai): update API usage examples`
## Framework-Specific Guidelines
- Follow the existing patterns in `langchain-core` for base abstractions
- Use `langchain_core.callbacks` for execution tracking
- Implement proper streaming support where applicable
- Avoid deprecated components like legacy `LLMChain`
### Partner Integrations
- Follow the established patterns in existing partner libraries
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
- Include comprehensive integration tests
- Document API key requirements and authentication
---
## Quick Reference Checklist
Before submitting code changes:
- [ ] **Breaking Changes**: Verified no public API changes
- [ ] **Type Hints**: All functions have complete type annotations
- [ ] **Tests**: New functionality is fully tested
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
- [ ] **Documentation**: Google-style docstrings for public functions
- [ ] **Code Quality**: `make lint` and `make format` pass
- [ ] **Architecture**: Suggested improvements where applicable
- [ ] **Commit Message**: Follows Conventional Commits format

View File

@@ -7,5 +7,5 @@ Please see the following guides for migrating LangChain code:
* Migrating from [LangChain 0.0.x Chains](https://python.langchain.com/docs/versions/migrating_chains/)
* Upgrade to [LangGraph Memory](https://python.langchain.com/docs/versions/migrating_memory/)
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
This will be especially helpful if you're still on either version 0.0.x or 0.1.x of LangChain.

View File

@@ -1,13 +1,13 @@
.PHONY: all clean help docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck spell_check spell_fix lint lint_package lint_tests format format_diff
.EXPORT_ALL_VARIABLES:
UV_FROZEN = true
## help: Show this help info.
help: Makefile
@printf "\n\033[1mUsage: make <TARGETS> ...\033[0m\n\n\033[1mTargets:\033[0m\n\n"
@sed -n 's/^## //p' $< | awk -F':' '{printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' | sort | sed -e 's/^/ /'
## all: Default target, shows help.
all: help
## clean: Clean documentation and API documentation artifacts.
clean: docs_clean api_docs_clean
@@ -16,79 +16,49 @@ clean: docs_clean api_docs_clean
######################
## docs_build: Build the documentation.
docs_build: docs_clean
@echo "📚 Building LangChain documentation..."
docs_build:
cd docs && make build
@echo "✅ Documentation build complete!"
## docs_clean: Clean the documentation build artifacts.
docs_clean:
@echo "🧹 Cleaning documentation artifacts..."
cd docs && make clean
@echo "✅ LangChain documentation cleaned"
## docs_linkcheck: Run linkchecker on the documentation.
docs_linkcheck:
@echo "🔗 Checking documentation links..."
@if [ -d _dist/docs ]; then \
uv run --group test linkchecker _dist/docs/ --ignore-url node_modules; \
else \
echo "⚠️ Documentation not built. Run 'make docs_build' first."; \
exit 1; \
fi
@echo "✅ Link check complete"
poetry run linkchecker _dist/docs/ --ignore-url node_modules
## api_docs_build: Build the API Reference documentation.
api_docs_build: clean
@echo "📖 Building API Reference documentation..."
uv pip install -e libs/cli
uv run --group docs python docs/api_reference/create_api_rst.py
cd docs/api_reference && uv run --group docs make html
uv run --group docs python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
@echo "✅ API documentation built"
@echo "🌐 Opening documentation in browser..."
open docs/api_reference/_build/html/reference.html
api_docs_build:
poetry run python docs/api_reference/create_api_rst.py
cd docs/api_reference && poetry run make html
poetry run python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
API_PKG ?= text-splitters
api_docs_quick_preview: clean
@echo "⚡ Building quick API preview for $(API_PKG)..."
uv run --group docs python docs/api_reference/create_api_rst.py $(API_PKG)
cd docs/api_reference && uv run --group docs make html
uv run --group docs python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
@echo "🌐 Opening preview in browser..."
api_docs_quick_preview:
poetry run python docs/api_reference/create_api_rst.py $(API_PKG)
cd docs/api_reference && poetry run make html
poetry run python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
open docs/api_reference/_build/html/reference.html
## api_docs_clean: Clean the API Reference documentation build artifacts.
api_docs_clean:
@echo "🧹 Cleaning API documentation artifacts..."
find ./docs/api_reference -name '*_api_reference.rst' -delete
git clean -fdX ./docs/api_reference
rm -f docs/api_reference/index.md
@echo "✅ API documentation cleaned"
rm docs/api_reference/index.md
## api_docs_linkcheck: Run linkchecker on the API Reference documentation.
api_docs_linkcheck:
@echo "🔗 Checking API documentation links..."
@if [ -f docs/api_reference/_build/html/index.html ]; then \
uv run --group test linkchecker docs/api_reference/_build/html/index.html; \
else \
echo "⚠️ API documentation not built. Run 'make api_docs_build' first."; \
exit 1; \
fi
@echo "✅ API link check complete"
poetry run linkchecker docs/api_reference/_build/html/index.html
## spell_check: Run codespell on the project.
spell_check:
@echo "✏️ Checking spelling across project..."
uv run --group codespell codespell --toml pyproject.toml
@echo "✅ Spell check complete"
poetry run codespell --toml pyproject.toml
## spell_fix: Run codespell on the project and fix the errors.
spell_fix:
@echo "✏️ Fixing spelling errors across project..."
uv run --group codespell codespell --toml pyproject.toml -w
@echo "✅ Spelling errors fixed"
poetry run codespell --toml pyproject.toml -w
######################
# LINTING AND FORMATTING
@@ -96,24 +66,12 @@ spell_fix:
## lint: Run linting on the project.
lint lint_package lint_tests:
@echo "🔍 Running code linting and checks..."
uv run --group lint ruff check docs cookbook
uv run --group lint ruff format docs cookbook cookbook --diff
git --no-pager grep 'from langchain import' docs cookbook | grep -vE 'from langchain import (hub)' && echo "Error: no importing langchain from root in docs, except for hub" && exit 1 || exit 0
git --no-pager grep 'api.python.langchain.com' -- docs/docs ':!docs/docs/additional_resources/arxiv_references.mdx' ':!docs/docs/integrations/document_loaders/sitemap.ipynb' || exit 0 && \
echo "Error: you should link python.langchain.com/api_reference, not api.python.langchain.com in the docs" && \
exit 1
@echo "✅ Linting complete"
poetry run ruff check docs cookbook
poetry run ruff format docs cookbook cookbook --diff
poetry run ruff check --select I docs cookbook
git grep 'from langchain import' docs/docs cookbook | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
## format: Format the project files.
format format_diff:
@echo "🎨 Formatting project files..."
uv run --group lint ruff format docs cookbook
uv run --group lint ruff check --fix docs cookbook
@echo "✅ Formatting complete"
update-package-downloads:
@echo "📊 Updating package download statistics..."
uv run python docs/scripts/packages_yml_get_downloads.py
@echo "✅ Package downloads updated"
poetry run ruff format docs cookbook
poetry run ruff check --select I --fix docs cookbook

183
README.md
View File

@@ -1,83 +1,142 @@
<picture>
<source media="(prefers-color-scheme: light)" srcset="docs/static/img/logo-dark.svg">
<source media="(prefers-color-scheme: dark)" srcset="docs/static/img/logo-light.svg">
<img alt="LangChain Logo" src="docs/static/img/logo-dark.svg" width="80%">
</picture>
# 🦜️🔗 LangChain
<div>
<br>
</div>
⚡ Build context-aware reasoning applications ⚡
[![Release Notes](https://img.shields.io/github/release/langchain-ai/langchain?style=flat-square)](https://github.com/langchain-ai/langchain/releases)
[![CI](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml)
[![PyPI - License](https://img.shields.io/pypi/l/langchain-core?style=flat-square)](https://opensource.org/licenses/MIT)
[![PyPI - Downloads](https://img.shields.io/pepy/dt/langchain)](https://pypistats.org/packages/langchain-core)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-core?style=flat-square)](https://pypistats.org/packages/langchain-core)
[![GitHub star chart](https://img.shields.io/github/stars/langchain-ai/langchain?style=flat-square)](https://star-history.com/#langchain-ai/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain?style=flat-square)](https://github.com/langchain-ai/langchain/issues)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode&style=flat-square)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[<img src="https://github.com/codespaces/badge.svg" alt="Open in Github Codespace" title="Open in Github Codespace" width="150" height="20">](https://codespaces.new/langchain-ai/langchain)
[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/langchain-ai/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
> [!NOTE]
> Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
LangChain is a framework for building LLM-powered applications. It helps you chain
together interoperable components and third-party integrations to simplify AI
application development — all while future-proofing decisions as the underlying
technology evolves.
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
## Quick Install
With pip:
```bash
pip install -U langchain
pip install langchain
```
To learn more about LangChain, check out
[the docs](https://python.langchain.com/docs/introduction/). If youre looking for more
advanced customization or agent orchestration, check out
[LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building
controllable agent workflows.
With conda:
## Why use LangChain?
```bash
conda install langchain -c conda-forge
```
LangChain helps developers build applications powered by LLMs through a standard
interface for models, embeddings, vector stores, and more.
## 🤔 What is LangChain?
Use LangChain for:
**LangChain** is a framework for developing applications powered by large language models (LLMs).
- **Real-time data augmentation**. Easily connect LLMs to diverse data sources and
external/internal systems, drawing from LangChains vast library of integrations with
model providers, tools, vector stores, retrievers, and more.
- **Model interoperability**. Swap models in and out as your engineering team
experiments to find the best choice for your applications needs. As the industry
frontier evolves, adapt quickly — LangChains abstractions keep you moving without
losing momentum.
For these applications, LangChain simplifies the entire application lifecycle:
## LangChains ecosystem
- **Open-source libraries**: Build your applications using LangChain's open-source [building blocks](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel), [components](https://python.langchain.com/docs/concepts/), and [third-party integrations](https://python.langchain.com/docs/integrations/providers/).
Use [LangGraph](https://langchain-ai.github.io/langgraph/) to build stateful agents with first-class streaming and human-in-the-loop support.
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://docs.smith.langchain.com/) so that you can constantly optimize and deploy with confidence.
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/).
While the LangChain framework can be used standalone, it also integrates seamlessly
with any LangChain product, giving developers a full suite of tools when building LLM
applications.
### Open-source libraries
To improve your LLM application development, pair LangChain with:
- **`langchain-core`**: Base abstractions and LangChain Expression Language.
- **`langchain-community`**: Third party integrations.
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[`LangGraph`](https://langchain-ai.github.io/langgraph/)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).
- [LangSmith](https://www.langchain.com/langsmith) - Helpful for agent evals and
observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain
visibility in production, and improve performance over time.
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can
reliably handle complex tasks with LangGraph, our low-level agent orchestration
framework. LangGraph offers customizable architecture, long-term memory, and
human-in-the-loop workflows — and is trusted in production by companies like LinkedIn,
Uber, Klarna, and GitLab.
- [LangGraph Platform](https://docs.langchain.com/langgraph-platform) - Deploy
and scale agents effortlessly with a purpose-built deployment platform for long-running, stateful workflows. Discover, reuse, configure, and share agents across
teams — and iterate quickly with visual prototyping in
[LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/).
### Productionization:
## Additional resources
- **[LangSmith](https://docs.smith.langchain.com/)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
- [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with
guided examples on getting started with LangChain.
- [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code
snippets for topics such as tool calling, RAG use cases, and more.
- [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key
concepts behind the LangChain framework.
- [LangChain Forum](https://forum.langchain.com/): Connect with the community and share all of your technical questions, ideas, and feedback.
- [API Reference](https://python.langchain.com/api_reference/): Detailed reference on
navigating base packages and integrations for LangChain.
- [Chat LangChain](https://chat.langchain.com/): Ask questions & chat with our documentation.
### Deployment:
- **[LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/)**: Turn your LangGraph applications into production-ready APIs and Assistants.
![Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers.](docs/static/svg/langchain_stack_112024.svg#gh-light-mode-only "LangChain Architecture Overview")
![Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers.](docs/static/svg/langchain_stack_112024_dark.svg#gh-dark-mode-only "LangChain Architecture Overview")
## 🧱 What can you build with LangChain?
**❓ Question answering with RAG**
- [Documentation](https://python.langchain.com/docs/tutorials/rag/)
- End-to-end Example: [Chat LangChain](https://chat.langchain.com) and [repo](https://github.com/langchain-ai/chat-langchain)
**🧱 Extracting structured output**
- [Documentation](https://python.langchain.com/docs/tutorials/extraction/)
- End-to-end Example: [SQL Llama2 Template](https://github.com/langchain-ai/langchain-extract/)
**🤖 Chatbots**
- [Documentation](https://python.langchain.com/docs/tutorials/chatbot/)
- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
And much more! Head to the [Tutorials](https://python.langchain.com/docs/tutorials/) section of the docs for more.
## 🚀 How does LangChain help?
The main value props of the LangChain libraries are:
1. **Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
## LangChain Expression Language (LCEL)
LCEL is a key part of LangChain, allowing you to build and organize chains of processes in a straightforward, declarative manner. It was designed to support taking prototypes directly into production without needing to alter any code. This means you can use LCEL to set up everything from basic "prompt + LLM" setups to intricate, multi-step workflows.
- **[Overview](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel)**: LCEL and its benefits
- **[Interface](https://python.langchain.com/docs/concepts/#runnable-interface)**: The standard Runnable interface for LCEL objects
- **[Primitives](https://python.langchain.com/docs/how_to/#langchain-expression-language-lcel)**: More on the primitives LCEL includes
- **[Cheatsheet](https://python.langchain.com/docs/how_to/lcel_cheatsheet/)**: Quick overview of the most common usage patterns
## Components
Components fall into the following **modules**:
**📃 Model I/O**
This includes [prompt management](https://python.langchain.com/docs/concepts/#prompt-templates), [prompt optimization](https://python.langchain.com/docs/concepts/#example-selectors), a generic interface for [chat models](https://python.langchain.com/docs/concepts/#chat-models) and [LLMs](https://python.langchain.com/docs/concepts/#llms), and common utilities for working with [model outputs](https://python.langchain.com/docs/concepts/#output-parsers).
**📚 Retrieval**
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/docs/concepts/#document-loaders) from a variety of sources, [preparing it](https://python.langchain.com/docs/concepts/#text-splitters), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/docs/concepts/#retrievers) it for use in the generation step.
**🤖 Agents**
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. LangChain provides a [standard interface for agents](https://python.langchain.com/docs/concepts/#agents), along with [LangGraph](https://github.com/langchain-ai/langgraph) for building custom agents.
## 📖 Documentation
Please see [here](https://python.langchain.com) for full documentation, which includes:
- [Introduction](https://python.langchain.com/docs/introduction/): Overview of the framework and the structure of the docs.
- [Tutorials](https://python.langchain.com/docs/tutorials/): If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
- [How-to guides](https://python.langchain.com/docs/how_to/): Answers to “How do I….?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
- [Conceptual guide](https://python.langchain.com/docs/concepts/): Conceptual explanations of the key parts of the framework.
- [API Reference](https://api.python.langchain.com): Thorough documentation of every class and method.
## 🌐 Ecosystem
- [🦜🛠️ LangSmith](https://docs.smith.langchain.com/): Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
- [🦜🕸️ LangGraph](https://langchain-ai.github.io/langgraph/): Create stateful, multi-actor applications with LLMs. Integrates smoothly with LangChain, but can be used without it.
- [🦜🕸️ LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#langgraph-platform): Deploy LLM applications built with LangGraph into production.
## 💁 Contributing
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see [here](https://python.langchain.com/docs/contributing/).
## 🌟 Contributors
[![langchain contributors](https://contrib.rocks/image?repo=langchain-ai/langchain&max=2000)](https://github.com/langchain-ai/langchain/graphs/contributors)

View File

@@ -1,44 +1,20 @@
# Security Policy
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the ability to access, interact with and manipulate external resources.
## Best practices
When building such applications, developers should remember to follow good security practices:
* [**Limit Permissions**](https://en.wikipedia.org/wiki/Principle_of_least_privilege): Scope permissions specifically to the application's need. Granting broad or excessive permissions can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials, disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), specifying proxy configurations to control external requests, etc., as appropriate for your application.
* **Anticipate Potential Misuse**: Just as humans can err, so can Large Language Models (LLMs). Always assume that any system access or credentials may be used in any way allowed by the permissions they are assigned. For example, if a pair of database credentials allows deleting data, it's safest to assume that any LLM able to use those credentials may in fact delete data.
* [**Defense in Depth**](https://en.wikipedia.org/wiki/Defense_in_depth_(computing)): No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate, the odds that a Large Language Model (LLM) may make a mistake. It's best to combine multiple layered security approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.
Risks of not doing so include, but are not limited to:
* Data corruption or loss.
* Unauthorized access to confidential information.
* Compromised performance or availability of critical resources.
Example scenarios with mitigation strategies:
* A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container.
* A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse.
* A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
If you're building applications that access external resources like file systems, APIs
or databases, consider speaking with your company's security team to determine how to best
design and secure your applications.
## Reporting OSS Vulnerabilities
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
a bounty program for our open source projects.
LangChain is partnered with [huntr by Protect AI](https://huntr.com/) to provide
a bounty program for our open source projects.
Please report security vulnerabilities associated with the LangChain
open source projects at [huntr](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true).
Please report security vulnerabilities associated with the LangChain
open source projects by visiting the following link:
[https://huntr.com/bounties/disclose/](https://huntr.com/bounties/disclose/?target=https%3A%2F%2Fgithub.com%2Flangchain-ai%2Flangchain&validSearch=true)
Before reporting a vulnerability, please review:
1) In-Scope Targets and Out-of-Scope Targets below.
2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure.
3) The [Best Practices](#best-practices) above to
3) LangChain [security guidelines](https://python.langchain.com/docs/security) to
understand what we consider to be a security vulnerability vs. developer
responsibility.
@@ -46,38 +22,39 @@ Before reporting a vulnerability, please review:
The following packages and repositories are eligible for bug bounties:
* langchain-core
* langchain (see exceptions)
* langchain-community (see exceptions)
* langgraph
* langserve
- langchain-core
- langchain (see exceptions)
- langchain-community (see exceptions)
- langgraph
- langserve
### Out of Scope Targets
All out of scope targets defined by huntr as well as:
* **langchain-experimental**: This repository is for experimental code and is not
eligible for bug bounties (see [package warning](https://pypi.org/project/langchain-experimental/)), bug reports to it will be marked as interesting or waste of
- **langchain-experimental**: This repository is for experimental code and is not
eligible for bug bounties, bug reports to it will be marked as interesting or waste of
time and published with no bounty attached.
* **tools**: Tools in either langchain or langchain-community are not eligible for bug
- **tools**: Tools in either langchain or langchain-community are not eligible for bug
bounties. This includes the following directories
* libs/langchain/langchain/tools
* libs/community/langchain_community/tools
* Please review the [Best Practices](#best-practices)
- langchain/tools
- langchain-community/tools
- Please review our [security guidelines](https://python.langchain.com/docs/security)
for more details, but generally tools interact with the real world. Developers are
expected to understand the security implications of their code and are responsible
for the security of their tools.
* Code documented with security notices. This will be decided on a case-by-case basis, but likely will not be eligible for a bounty as the code is already
- Code documented with security notices. This will be decided done on a case by
case basis, but likely will not be eligible for a bounty as the code is already
documented with guidelines for developers that should be followed for making their
application secure.
* Any LangSmith related repositories or APIs (see [Reporting LangSmith Vulnerabilities](#reporting-langsmith-vulnerabilities)).
- Any LangSmith related repositories or APIs see below.
## Reporting LangSmith Vulnerabilities
Please report security vulnerabilities associated with LangSmith by email to `security@langchain.dev`.
* LangSmith site: [https://smith.langchain.com](https://smith.langchain.com)
* SDK client: [https://github.com/langchain-ai/langsmith-sdk](https://github.com/langchain-ai/langsmith-sdk)
- LangSmith site: https://smith.langchain.com
- SDK client: https://github.com/langchain-ai/langsmith-sdk
### Other Security Concerns

View File

@@ -60,7 +60,7 @@
"id": "CI8Elyc5gBQF"
},
"source": [
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint is ready, you need to copy its number."
"Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint it ready, you need to copy its number."
]
},
{

View File

@@ -47,7 +47,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "6a75a5c6-34ee-4ab9-a664-d9b432d812ee",
"metadata": {},
"outputs": [
@@ -61,7 +61,7 @@
],
"source": [
"# Local\n",
"from langchain_ollama import ChatOllama\n",
"from langchain_community.chat_models import ChatOllama\n",
"\n",
"llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n",
"llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n",

View File

@@ -144,7 +144,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "kWDWfSDBMPl8",
"metadata": {},
"outputs": [
@@ -185,7 +185,7 @@
" )\n",
" # Text summary chain\n",
" model = VertexAI(\n",
" temperature=0, model_name=\"gemini-2.5-flash\", max_tokens=1024\n",
" temperature=0, model_name=\"gemini-pro\", max_tokens=1024\n",
" ).with_fallbacks([empty_response])\n",
" summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()\n",
"\n",
@@ -235,7 +235,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"id": "PeK9bzXv3olF",
"metadata": {},
"outputs": [],
@@ -254,7 +254,7 @@
"\n",
"def image_summarize(img_base64, prompt):\n",
" \"\"\"Make image summary\"\"\"\n",
" model = ChatVertexAI(model=\"gemini-2.5-flash\", max_tokens=1024)\n",
" model = ChatVertexAI(model=\"gemini-pro-vision\", max_tokens=1024)\n",
"\n",
" msg = model.invoke(\n",
" [\n",
@@ -394,7 +394,7 @@
"# The vectorstore to use to index the summaries\n",
"vectorstore = Chroma(\n",
" collection_name=\"mm_rag_cj_blog\",\n",
" embedding_function=VertexAIEmbeddings(model_name=\"text-embedding-005\"),\n",
" embedding_function=VertexAIEmbeddings(model_name=\"textembedding-gecko@latest\"),\n",
")\n",
"\n",
"# Create retriever\n",
@@ -431,7 +431,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"id": "GlwCErBaCKQW",
"metadata": {},
"outputs": [],
@@ -553,7 +553,7 @@
" \"\"\"\n",
"\n",
" # Multi-modal LLM\n",
" model = ChatVertexAI(temperature=0, model_name=\"gemini-2.5-flash\", max_tokens=1024)\n",
" model = ChatVertexAI(temperature=0, model_name=\"gemini-pro-vision\", max_tokens=1024)\n",
"\n",
" # RAG pipeline\n",
" chain = (\n",

View File

@@ -21,6 +21,7 @@ Notebook | Description
[code-analysis-deeplake.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/code-analysis-deeplake.ipynb) | Analyze its own code base with the help of gpt and activeloop's deep lake.
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval.ipynb) | Build a custom agent that can interact with ai plugins by retrieving tools and creating natural language wrappers around openapi endpoints.
[custom_agent_with_plugin_retri...](https://github.com/langchain-ai/langchain/tree/master/cookbook/custom_agent_with_plugin_retrieval_using_plugnplai.ipynb) | Build a custom agent with plugin retrieval functionality, utilizing ai plugins from the `plugnplai` directory.
[databricks_sql_db.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/databricks_sql_db.ipynb) | Connect to databricks runtimes and databricks sql.
[deeplake_semantic_search_over_...](https://github.com/langchain-ai/langchain/tree/master/cookbook/deeplake_semantic_search_over_chat.ipynb) | Perform semantic search and question-answering over a group chat using activeloop's deep lake with gpt4.
[elasticsearch_db_qa.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/elasticsearch_db_qa.ipynb) | Interact with elasticsearch analytics databases in natural language and build search queries via the elasticsearch dsl API.
[extraction_openai_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/extraction_openai_tools.ipynb) | Structured Data Extraction with OpenAI Tools
@@ -49,7 +50,7 @@ Notebook | Description
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
[rag_upstage_document_parse_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_document_parse_groundedness_check.ipynb) | End-to-end RAG example using Upstage Document Parse and Groundedness Check.
[rag_upstage_layout_analysis_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_layout_analysis_groundedness_check.ipynb) | End-to-end RAG example using Upstage Layout Analysis and Groundedness Check.
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented-generation (rag) on a PostgreSQL database using pgvector.
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware ai sales agent, salesgpt, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
@@ -62,6 +63,4 @@ Notebook | Description
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
[rag-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag-locally-on-intel-cpu.ipynb) | Perform Retrieval-Augmented-Generation (RAG) on locally downloaded open-source models using langchain and open source tools and execute it on Intel Xeon CPU. We showed an example of how to apply RAG on Llama 2 model and enable it to answer the queries related to Intel Q1 2024 earnings release.
[visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
[rag-agents-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/local_rag_agents_intel_cpu.ipynb) | Build a RAG agent locally with open source models that routes questions through one of two paths to find answers. The agent generates answers based on documents retrieved from either the vector database or retrieved from web search. If the vector database lacks relevant information, the agent opts for web search. Open-source models for LLM and embeddings are used locally on an Intel Xeon CPU to execute this pipeline.
[rag_mlflow_tracking_evaluation.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_mlflow_tracking_evaluation.ipynb) | Guide on how to create a RAG pipeline and track + evaluate it with MLflow.
[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.

View File

@@ -204,14 +204,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "523e6ed2-2132-4748-bdb7-db765f20648d",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_ollama import ChatOllama"
"from langchain_core.prompts import ChatPromptTemplate"
]
},
{

View File

@@ -30,7 +30,7 @@
"outputs": [],
"source": [
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml matplotlib tiktoken open_clip_torch torch"
"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml pillow matplotlib tiktoken open_clip_torch torch"
]
},
{
@@ -409,7 +409,7 @@
" table_summaries,\n",
" tables,\n",
" image_summaries,\n",
" img_base64_list,\n",
" image_summaries,\n",
")"
]
},

View File

@@ -22,19 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e8d63d14-138d-4aa5-a741-7fd3537d00aa",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 16,
"id": "2e87c10a",
"metadata": {},
"outputs": [],
@@ -49,7 +37,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 17,
"id": "0b7b772b",
"metadata": {},
"outputs": [],
@@ -66,10 +54,19 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 18,
"id": "f2675861",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"\n",
@@ -84,7 +81,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "bc5403d4",
"metadata": {},
"outputs": [],
@@ -96,25 +93,17 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "1431cded",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"USER_AGENT environment variable not set, consider setting it to identify your requests.\n"
]
}
],
"outputs": [],
"source": [
"from langchain_community.document_loaders import WebBaseLoader"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"id": "915d3ff3",
"metadata": {},
"outputs": [],
@@ -124,20 +113,16 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"id": "96a2edf8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"name": "stdout",
"output_type": "stream",
"text": [
"Created a chunk of size 2122, which is longer than the specified 1000\n",
"Created a chunk of size 3187, which is longer than the specified 1000\n",
"Created a chunk of size 1017, which is longer than the specified 1000\n",
"Created a chunk of size 1049, which is longer than the specified 1000\n",
"Created a chunk of size 1256, which is longer than the specified 1000\n",
"Created a chunk of size 2321, which is longer than the specified 1000\n"
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
@@ -150,6 +135,14 @@
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71ecef90",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "c0a6c031",
@@ -160,30 +153,31 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 43,
"id": "eb142786",
"metadata": {},
"outputs": [],
"source": [
"# Import things that are needed generically\n",
"from langchain.agents import Tool"
"from langchain.agents import AgentType, Tool, initialize_agent\n",
"from langchain_openai import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 44,
"id": "850bc4e9",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name=\"state_of_union_qa_system\",\n",
" name=\"State of Union QA System\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
" ),\n",
" Tool(\n",
" name=\"ruff_qa_system\",\n",
" name=\"Ruff QA System\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
" ),\n",
@@ -192,116 +186,94 @@
},
{
"cell_type": "code",
"execution_count": 11,
"id": "70c461d8-aaca-4f2a-9a93-bf35841cc615",
"execution_count": 45,
"id": "fc47f230",
"metadata": {},
"outputs": [],
"source": [
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
"# Construct the agent. We will use the default agent type here.\n",
"# See documentation for a full list of options.\n",
"agent = initialize_agent(\n",
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "a6d2b911-3044-4430-a35b-75832bb45334",
"execution_count": 46,
"id": "10ca2db8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"What did biden say about ketanji brown jackson in the state of the union address?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" state_of_union_qa_system (call_26QlRdsptjEJJZjFsAUjEbaH)\n",
" Call ID: call_26QlRdsptjEJJZjFsAUjEbaH\n",
" Args:\n",
" __arg1: What did Biden say about Ketanji Brown Jackson in the state of the union address?\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: state_of_union_qa_system\n",
"\n",
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
"Action: State of Union QA System\n",
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"\n",
"In the State of the Union address, Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
"agent.run(\n",
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "e836b4cd-abf7-49eb-be0e-b9ad501213f3",
"execution_count": 47,
"id": "4e91b811",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Why use ruff over flake8?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" ruff_qa_system (call_KqDoWeO9bo9OAXdxOsCb6msC)\n",
" Call ID: call_KqDoWeO9bo9OAXdxOsCb6msC\n",
" Args:\n",
" __arg1: Why use ruff over flake8?\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: ruff_qa_system\n",
"\n",
"\n",
"There are a few reasons why someone might choose to use Ruff over Flake8:\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
"Action: Ruff QA System\n",
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"\n",
"1. Larger rule set: Ruff implements over 800 rules, while Flake8 only implements around 200. This means that Ruff can catch more potential issues in your code.\n",
"\n",
"2. Better compatibility with other tools: Ruff is designed to work well with other tools like Black, isort, and type checkers like Mypy. This means that you can use Ruff alongside these tools to get more comprehensive feedback on your code.\n",
"\n",
"3. Automatic fixing of lint violations: Unlike Flake8, Ruff is capable of automatically fixing its own lint violations. This can save you time and effort when fixing issues in your code.\n",
"\n",
"4. Native implementation of popular Flake8 plugins: Ruff re-implements some of the most popular Flake8 plugins natively, which means you don't have to install and configure multiple plugins to get the same functionality.\n",
"\n",
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8, making it a popular choice for many developers.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\n",
"You might choose to use Ruff over Flake8 for several reasons:\n",
"\n",
"1. Ruff has a much larger rule set, implementing over 800 rules compared to Flake8's roughly 200, so it can catch more potential issues.\n",
"2. Ruff is designed to work better with other tools like Black, isort, and type checkers like Mypy, providing more comprehensive code feedback.\n",
"3. Ruff can automatically fix its own lint violations, which Flake8 cannot, saving time and effort.\n",
"4. Ruff natively implements some popular Flake8 plugins, so you don't need to install and configure multiple plugins separately.\n",
"\n",
"Overall, Ruff offers a more comprehensive and user-friendly experience compared to Flake8.\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"Why use ruff over flake8?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
"agent.run(\"Why use ruff over flake8?\")"
]
},
{
@@ -324,20 +296,20 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 48,
"id": "f59b377e",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name=\"state_of_union_qa_system\",\n",
" name=\"State of Union QA System\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question.\",\n",
" return_direct=True,\n",
" ),\n",
" Tool(\n",
" name=\"ruff_qa_system\",\n",
" name=\"Ruff QA System\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.\",\n",
" return_direct=True,\n",
@@ -347,92 +319,90 @@
},
{
"cell_type": "code",
"execution_count": 15,
"id": "06f69c0f-c83d-4b7f-a1c8-7614aced3bae",
"execution_count": 49,
"id": "8615707a",
"metadata": {},
"outputs": [],
"source": [
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
"agent = initialize_agent(\n",
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "a6b38c12-ac25-43c0-b9c2-2b1985ab4825",
"execution_count": 50,
"id": "36e718a9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"What did biden say about ketanji brown jackson in the state of the union address?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" state_of_union_qa_system (call_yjxh11OnZiauoyTAn9npWdxj)\n",
" Call ID: call_yjxh11OnZiauoyTAn9npWdxj\n",
" Args:\n",
" __arg1: What did Biden say about Ketanji Brown Jackson in the state of the union address?\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: state_of_union_qa_system\n",
"\n",
" Biden said that he nominated Ketanji Brown Jackson for the United States Supreme Court and praised her as one of the nation's top legal minds who will continue Justice Breyer's legacy of excellence.\n"
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what Biden said about Ketanji Brown Jackson in the State of the Union address.\n",
"Action: State of Union QA System\n",
"Action Input: What did Biden say about Ketanji Brown Jackson in the State of the Union address?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\" Biden said that Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"What did biden say about ketanji brown jackson in the state of the union address?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
"agent.run(\n",
" \"What did biden say about ketanji brown jackson in the state of the union address?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "88f08d86-7972-4148-8128-3ac8898ad68a",
"execution_count": 51,
"id": "edfd0a1a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"Why use ruff over flake8?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" ruff_qa_system (call_GiWWfwF6wbbRFQrHlHbhRtGW)\n",
" Call ID: call_GiWWfwF6wbbRFQrHlHbhRtGW\n",
" Args:\n",
" __arg1: What are the advantages of using ruff over flake8 for Python linting?\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: ruff_qa_system\n",
"\n",
" Ruff has a larger rule set, supports automatic fixing of lint violations, and does not require the installation of additional plugins. It also has better compatibility with Black and can be used alongside a type checker for more comprehensive code analysis.\n"
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out the advantages of using ruff over flake8\n",
"Action: Ruff QA System\n",
"Action Input: What are the advantages of using ruff over flake8?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"Why use ruff over flake8?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
"agent.run(\"Why use ruff over flake8?\")"
]
},
{
@@ -447,19 +417,19 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 57,
"id": "d397a233",
"metadata": {},
"outputs": [],
"source": [
"tools = [\n",
" Tool(\n",
" name=\"state_of_union_qa_system\",\n",
" name=\"State of Union QA System\",\n",
" func=state_of_union.run,\n",
" description=\"useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
" ),\n",
" Tool(\n",
" name=\"ruff_qa_system\",\n",
" name=\"Ruff QA System\",\n",
" func=ruff.run,\n",
" description=\"useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.\",\n",
" ),\n",
@@ -468,60 +438,60 @@
},
{
"cell_type": "code",
"execution_count": 19,
"id": "41743f29-150d-40ba-aa8e-3a63c32216aa",
"execution_count": 58,
"id": "06157240",
"metadata": {},
"outputs": [],
"source": [
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"agent = create_react_agent(\"openai:gpt-4.1-mini\", tools)"
"# Construct the agent. We will use the default agent type here.\n",
"# See documentation for a full list of options.\n",
"agent = initialize_agent(\n",
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "e20e81dd-284a-4d07-9160-63a84b65cba8",
"execution_count": 59,
"id": "b492b520",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
"\n",
"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"Tool Calls:\n",
" ruff_qa_system (call_VOnxiOEehauQyVOTjDJkR5L2)\n",
" Call ID: call_VOnxiOEehauQyVOTjDJkR5L2\n",
" Args:\n",
" __arg1: What tool does ruff use to run over Jupyter Notebooks?\n",
" state_of_union_qa_system (call_AbSsXAxwe4JtCRhga926SxOZ)\n",
" Call ID: call_AbSsXAxwe4JtCRhga926SxOZ\n",
" Args:\n",
" __arg1: Did the president mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union?\n",
"=================================\u001b[1m Tool Message \u001b[0m=================================\n",
"Name: state_of_union_qa_system\n",
"\n",
" No, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n",
"==================================\u001b[1m Ai Message \u001b[0m==================================\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out what tool ruff uses to run over Jupyter Notebooks, and if the president mentioned it in the state of the union.\n",
"Action: Ruff QA System\n",
"Action Input: What tool does ruff use to run over Jupyter Notebooks?\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now need to find out if the president mentioned this tool in the state of the union.\n",
"Action: State of Union QA System\n",
"Action Input: Did the president mention nbQA in the state of the union?\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: No, the president did not mention nbQA in the state of the union.\u001b[0m\n",
"\n",
"Ruff does not support source.organizeImports and source.fixAll code actions in Jupyter Notebooks. Additionally, the president did not mention the tool that ruff uses to run over Jupyter Notebooks in the state of the union.\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'No, the president did not mention nbQA in the state of the union.'"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_message = {\n",
" \"role\": \"user\",\n",
" \"content\": \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\",\n",
"}\n",
"\n",
"for step in agent.stream(\n",
" {\"messages\": [input_message]},\n",
" stream_mode=\"values\",\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
"agent.run(\n",
" \"What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?\"\n",
")"
]
},
{
@@ -549,7 +519,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.10.1"
}
},
"nbformat": 4,

View File

@@ -31,8 +31,8 @@
"source": [
"# Optional\n",
"import os\n",
"# os.environ['LANGSMITH_TRACING'] = 'true' # enables tracing\n",
"# os.environ['LANGSMITH_API_KEY'] = <your-api-key>"
"# os.environ['LANGCHAIN_TRACING_V2'] = 'true' # enables tracing\n",
"# os.environ['LANGCHAIN_API_KEY'] = <your-api-key>"
]
},
{

View File

@@ -66,7 +66,7 @@
},
"outputs": [],
"source": [
"#!python3 -m pip install --upgrade langchain langchain-deeplake openai"
"#!python3 -m pip install --upgrade langchain deeplake openai"
]
},
{
@@ -666,26 +666,89 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://adilkhan/langchain-code', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (8244, 1536) float32 None \n",
" id text (8244, 1) str None \n",
" metadata json (8244, 1) str None \n",
" text text (8244, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"data": {
"text/plain": [
"<langchain_community.vectorstores.deeplake.DeepLake at 0x7fe1b67d7a30>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_deeplake.vectorstores import DeeplakeVectorStore\n",
"from langchain_community.vectorstores import DeepLake\n",
"\n",
"username = \"<USERNAME_OR_ORG>\"\n",
"\n",
"\n",
"db = DeeplakeVectorStore.from_documents(\n",
" documents=texts,\n",
" embedding=embeddings,\n",
" dataset_path=f\"hub://{username}/langchain-code\",\n",
" overwrite=True,\n",
"db = DeepLake.from_documents(\n",
" texts, embeddings, dataset_path=f\"hub://{username}/langchain-code\", overwrite=True\n",
")\n",
"db"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`Optional`: You can also use Deep Lake's Managed Tensor Database as a hosting service and run queries there. In order to do so, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# from langchain_community.vectorstores import DeepLake\n",
"\n",
"# db = DeepLake.from_documents(\n",
"# texts, embeddings, dataset_path=f\"hub://{<org_id>}/langchain-code\", runtime={\"tensor_db\": True}\n",
"# )\n",
"# db"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -697,16 +760,24 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://adilkhan/langchain-code already exists, loading from the storage\n"
]
}
],
"source": [
"db = DeeplakeVectorStore(\n",
"db = DeepLake(\n",
" dataset_path=f\"hub://{username}/langchain-code\",\n",
" read_only=True,\n",
" embedding_function=embeddings,\n",
" embedding=embeddings,\n",
")"
]
},
@@ -725,6 +796,36 @@
"retriever.search_kwargs[\"k\"] = 20"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also specify user defined functions using [Deep Lake filters](https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def filter(x):\n",
" # filter based on source code\n",
" if \"something\" in x[\"text\"].data()[\"value\"]:\n",
" return False\n",
"\n",
" # filter based on path e.g. extension\n",
" metadata = x[\"metadata\"].data()[\"value\"]\n",
" return \"only_this\" in metadata[\"source\"] or \"also_that\" in metadata[\"source\"]\n",
"\n",
"\n",
"### turn on below for custom filtering\n",
"# retriever.search_kwargs['filter'] = filter"
]
},
{
"cell_type": "code",
"execution_count": 20,
@@ -736,8 +837,10 @@
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
"qa = RetrievalQA.from_llm(model, retriever=retriever)"
"model = ChatOpenAI(\n",
" model_name=\"gpt-3.5-turbo-0613\"\n",
") # 'ada' 'gpt-3.5-turbo-0613' 'gpt-4',\n",
"qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)"
]
},
{

View File

@@ -229,9 +229,9 @@
" \"smoke\",\n",
" \"temp\",\n",
"]\n",
"assert all([column in df.columns for column in expected_columns]), (\n",
" \"DataFrame does not have the expected columns\"\n",
")"
"assert all(\n",
" [column in df.columns for column in expected_columns]\n",
"), \"DataFrame does not have the expected columns\""
]
},
{

View File

@@ -0,0 +1,273 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "707d13a7",
"metadata": {},
"source": [
"# Databricks\n",
"\n",
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
]
},
{
"cell_type": "markdown",
"id": "0076d072",
"metadata": {},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "739b489b",
"metadata": {},
"outputs": [],
"source": [
"!pip install databricks-sql-connector"
]
},
{
"cell_type": "markdown",
"id": "73113163",
"metadata": {},
"source": [
"## Connecting to Databricks\n",
"\n",
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
"\n",
"### Syntax\n",
"```python\n",
"SQLDatabase.from_databricks(\n",
" catalog: str,\n",
" schema: str,\n",
" host: Optional[str] = None,\n",
" api_token: Optional[str] = None,\n",
" warehouse_id: Optional[str] = None,\n",
" cluster_id: Optional[str] = None,\n",
" engine_args: Optional[dict] = None,\n",
" **kwargs: Any)\n",
"```\n",
"### Required Parameters\n",
"* `catalog`: The catalog name in the Databricks database.\n",
"* `schema`: The schema name in the catalog.\n",
"\n",
"### Optional Parameters\n",
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
]
},
{
"cell_type": "markdown",
"id": "b11c7e48",
"metadata": {},
"source": [
"## Examples"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8102bca0",
"metadata": {},
"outputs": [],
"source": [
"# Connecting to Databricks with SQLDatabase wrapper\n",
"from langchain_community.utilities import SQLDatabase\n",
"\n",
"db = SQLDatabase.from_databricks(catalog=\"samples\", schema=\"nyctaxi\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9dd36f58",
"metadata": {},
"outputs": [],
"source": [
"# Creating a OpenAI Chat LLM wrapper\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
]
},
{
"cell_type": "markdown",
"id": "5b5c5f1a",
"metadata": {},
"source": [
"### SQL Chain example\n",
"\n",
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "36f2270b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.utilities import SQLDatabaseChain\n",
"\n",
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4e2b5f25",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"What is the average duration of taxi rides that start between midnight and 6am?\n",
"SQLQuery:\u001b[32;1m\u001b[1;3mSELECT AVG(UNIX_TIMESTAMP(tpep_dropoff_datetime) - UNIX_TIMESTAMP(tpep_pickup_datetime)) as avg_duration\n",
"FROM trips\n",
"WHERE HOUR(tpep_pickup_datetime) >= 0 AND HOUR(tpep_pickup_datetime) < 6\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(987.8122786304605,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db_chain.run(\n",
" \"What is the average duration of taxi rides that start between midnight and 6am?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e496d5e5",
"metadata": {},
"source": [
"### SQL Database Agent example\n",
"\n",
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/tools/sql_database) for answering questions over a Databricks database."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9918e86a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import create_sql_agent\n",
"from langchain_community.agent_toolkits import SQLDatabaseToolkit\n",
"\n",
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
"agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c484a76e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n",
"Action Input: \u001b[0m\n",
"Observation: \u001b[38;5;200m\u001b[1;3mtrips\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
"Action: schema_sql_db\n",
"Action Input: trips\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"CREATE TABLE trips (\n",
"\ttpep_pickup_datetime TIMESTAMP, \n",
"\ttpep_dropoff_datetime TIMESTAMP, \n",
"\ttrip_distance FLOAT, \n",
"\tfare_amount FLOAT, \n",
"\tpickup_zip INT, \n",
"\tdropoff_zip INT\n",
") USING DELTA\n",
"\n",
"/*\n",
"3 rows from trips table:\n",
"tpep_pickup_datetime\ttpep_dropoff_datetime\ttrip_distance\tfare_amount\tpickup_zip\tdropoff_zip\n",
"2016-02-14 16:52:13+00:00\t2016-02-14 17:16:04+00:00\t4.94\t19.0\t10282\t10171\n",
"2016-02-04 18:44:19+00:00\t2016-02-04 18:46:00+00:00\t0.28\t3.5\t10110\t10110\n",
"2016-02-17 17:13:57+00:00\t2016-02-17 17:17:55+00:00\t0.7\t5.0\t10103\t10023\n",
"*/\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
"Action: query_checker_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
"Action: query_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m[(30.6, '0 00:43:31.000000000')]\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
"Final Answer: The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What is the longest trip distance and how long did it take?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -115,7 +115,7 @@
"\n",
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
"\n",
"Unless told to do not query for all the columns from a specific index, only ask for a few relevant columns given the question.\n",
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
"\n",
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
"\n",

View File

@@ -487,7 +487,7 @@
" print(\"*\" * 40)\n",
" print(\n",
" colored(\n",
" f\"After {i + 1} observations, Tommie's summary is:\\n{tommie.get_summary(force_refresh=True)}\",\n",
" f\"After {i+1} observations, Tommie's summary is:\\n{tommie.get_summary(force_refresh=True)}\",\n",
" \"blue\",\n",
" )\n",
" )\n",

View File

@@ -20,7 +20,11 @@
"cell_type": "markdown",
"id": "5939a54c-3198-4ba4-8346-1cc088c473c0",
"metadata": {},
"source": "##### You can embed text in the same VectorDB space as images, and retrieve text and images as well based on input text or image.\n##### Following link demonstrates that.\n<a> https://python.langchain.com/v0.2/docs/integrations/text_embedding/open_clip/ </a>"
"source": [
"##### You can embed text in the same VectorDB space as images, and retreive text and images as well based on input text or image.\n",
"##### Following link demonstrates that.\n",
"<a> https://python.langchain.com/v0.2/docs/integrations/text_embedding/open_clip/ </a>"
]
},
{
"cell_type": "markdown",
@@ -354,7 +358,7 @@
"id": "6e5cd014-db86-4d6b-8399-25cae3da5570",
"metadata": {},
"source": [
"## Helper function to plot retrieved similar images"
"## Helper function to plot retrived similar images"
]
},
{
@@ -385,7 +389,7 @@
" ax = axs[idx]\n",
" ax.imshow(img)\n",
" # Assuming similarity is not available in the new data, removed sim_score\n",
" ax.title.set_text(f\"\\nProduct ID: {data['id']}\\n Score: {score}\")\n",
" ax.title.set_text(f\"\\nProduct ID: {data[\"id\"]}\\n Score: {score}\")\n",
" ax.axis(\"off\") # Turn off axis\n",
"\n",
" # Hide any remaining empty subplots\n",
@@ -596,4 +600,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -79,17 +79,6 @@
"tool_executor = ToolExecutor(tools)"
]
},
{
"cell_type": "markdown",
"id": "168152fc",
"metadata": {},
"source": [
"📘 **Note on `SystemMessage` usage with LangGraph-based agents**\n",
"\n",
"When constructing the `messages` list for an agent, you *must* manually include any `SystemMessage`s.\n",
"Unlike some agent executors in LangChain that set a default, LangGraph requires explicit inclusion."
]
},
{
"cell_type": "markdown",
"id": "fe6e8f78-1ef7-42ad-b2bf-835ed5850553",

File diff suppressed because one or more lines are too long

View File

@@ -148,11 +148,11 @@
"\n",
" instructions = \"None\"\n",
" for i in range(max_meta_iters):\n",
" print(f\"[Episode {i + 1}/{max_meta_iters}]\")\n",
" print(f\"[Episode {i+1}/{max_meta_iters}]\")\n",
" chain = initialize_chain(instructions, memory=None)\n",
" output = chain.predict(human_input=task)\n",
" for j in range(max_iters):\n",
" print(f\"(Step {j + 1}/{max_iters})\")\n",
" print(f\"(Step {j+1}/{max_iters})\")\n",
" print(f\"Assistant: {output}\")\n",
" print(\"Human: \")\n",
" human_input = input()\n",

View File

@@ -124,8 +124,8 @@
"# Optional-- If you want to enable Langsmith -- good for debugging\n",
"import os\n",
"\n",
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
"os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass()"
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
@@ -156,7 +156,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Ensure you have an HF_TOKEN in your development environment:\n",
"# Ensure you have an HF_TOKEN in your development enviornment:\n",
"# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)\n",
"\n",
"# Load MongoDB's embedded_movies dataset from Hugging Face\n",

View File

@@ -23,41 +23,7 @@
},
{
"cell_type": "markdown",
"id": "2498a0a1",
"metadata": {},
"source": [
"## Packages\n",
"\n",
"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "febbc459-ebba-4c1a-a52b-fed7731593f8",
"metadata": {},
"outputs": [],
"source": [
"! pip install --quiet -U langchain-vdms langchain-experimental langchain-ollama\n",
"\n",
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
"! pip install --quiet pdf2image \"unstructured[all-docs]==0.10.19\" \"onnxruntime==1.17.0\" pillow pydantic lxml open_clip_torch"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "78ac6543",
"metadata": {},
"outputs": [],
"source": [
"# from dotenv import load_dotenv, find_dotenv\n",
"# load_dotenv(find_dotenv(), override=True);"
]
},
{
"cell_type": "markdown",
"id": "e5c8916e",
"id": "6a6b6e73",
"metadata": {},
"source": [
"## Start VDMS Server\n",
@@ -68,15 +34,15 @@
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1e6e2c15",
"execution_count": 1,
"id": "5f483872",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a701e5ac3523006e9540b5355e2d872d5d78383eab61562a675d5b9ac21fde65\n"
"a1b9206b08ef626e15b356bf9e031171f7c7eb8f956a2733f196f0109246fe2b\n"
]
}
],
@@ -84,11 +50,45 @@
"! docker run --rm -d -p 55559:55555 --name vdms_rag_nb intellabs/vdms:latest\n",
"\n",
"# Connect to VDMS Vector Store\n",
"from langchain_vdms.vectorstores import VDMS_Client\n",
"from langchain_community.vectorstores.vdms import VDMS_Client\n",
"\n",
"vdms_client = VDMS_Client(port=55559)"
]
},
{
"cell_type": "markdown",
"id": "2498a0a1",
"metadata": {},
"source": [
"## Packages\n",
"\n",
"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "febbc459-ebba-4c1a-a52b-fed7731593f8",
"metadata": {},
"outputs": [],
"source": [
"! pip install --quiet -U vdms langchain-experimental\n",
"\n",
"# lock to 0.10.19 due to a persistent bug in more recent versions\n",
"! pip install --quiet pdf2image \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml open_clip_torch"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "78ac6543",
"metadata": {},
"outputs": [],
"source": [
"# from dotenv import load_dotenv, find_dotenv\n",
"# load_dotenv(find_dotenv(), override=True);"
]
},
{
"cell_type": "markdown",
"id": "1e94b3fb-8e3e-4736-be0a-ad881626c7bd",
@@ -115,12 +115,11 @@
"import requests\n",
"\n",
"# Folder to store pdf and extracted images\n",
"base_datapath = Path(\"./data/multimodal_files\").resolve()\n",
"datapath = base_datapath / \"images\"\n",
"datapath = Path(\"./data/multimodal_files\").resolve()\n",
"datapath.mkdir(parents=True, exist_ok=True)\n",
"\n",
"pdf_url = \"https://www.loc.gov/lcm/pdf/LCM_2020_1112.pdf\"\n",
"pdf_path = str(base_datapath / pdf_url.split(\"/\")[-1])\n",
"pdf_path = str(datapath / pdf_url.split(\"/\")[-1])\n",
"with open(pdf_path, \"wb\") as f:\n",
" f.write(requests.get(pdf_url).content)"
]
@@ -186,8 +185,8 @@
"source": [
"import os\n",
"\n",
"from langchain_community.vectorstores import VDMS\n",
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
"from langchain_vdms import VDMS\n",
"\n",
"# Create VDMS\n",
"vectorstore = VDMS(\n",
@@ -313,10 +312,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import HumanMessage, SystemMessage\n",
"from langchain_community.llms.ollama import Ollama\n",
"from langchain_core.messages import HumanMessage\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"from langchain_ollama.llms import OllamaLLM\n",
"\n",
"\n",
"def prompt_func(data_dict):\n",
@@ -341,8 +340,8 @@
" \"As an expert art critic and historian, your task is to analyze and interpret images, \"\n",
" \"considering their historical and cultural significance. Alongside the images, you will be \"\n",
" \"provided with related text to offer context. Both will be retrieved from a vectorstore based \"\n",
" \"on user-input keywords. Please use your extensive knowledge and analytical skills to provide a \"\n",
" \"comprehensive summary that includes:\\n\"\n",
" \"on user-input keywords. Please convert answers to english and use your extensive knowledge \"\n",
" \"and analytical skills to provide a comprehensive summary that includes:\\n\"\n",
" \"- A detailed description of the visual elements in the image.\\n\"\n",
" \"- The historical and cultural context of the image.\\n\"\n",
" \"- An interpretation of the image's symbolism and meaning.\\n\"\n",
@@ -360,7 +359,7 @@
" \"\"\"Multi-modal RAG chain\"\"\"\n",
"\n",
" # Multi-modal LLM\n",
" llm_model = OllamaLLM(\n",
" llm_model = Ollama(\n",
" verbose=True, temperature=0.5, model=\"llava\", base_url=\"http://localhost:11434\"\n",
" )\n",
"\n",
@@ -420,121 +419,6 @@
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"© 2017 LARRY D. MOORE\n",
"\n",
"contemporary criticism of the less-than- thoughtful circumstances under which Lange photographed Thomson, the pictures power to engage has not diminished. Artists in other countries have appropriated the image, changing the mothers features into those of other ethnicities, but keeping her expression and the positions of her clinging children. Long after anyone could help the Thompson family, this picture has resonance in another time of national crisis, unemployment and food shortages.\n",
"\n",
"A striking, but very different picture is a 1900 portrait of the legendary Hin-mah-too-yah- lat-kekt (Chief Joseph) of the Nez Percé people. The Bureau of American Ethnology in Washington, D.C., regularly arranged for its photographer, De Lancey Gill, to photograph Native American delegations that came to the capital to confer with officials about tribal needs and concerns. Although Gill described Chief Joseph as having “an air of gentleness and quiet reserve,” the delegate skeptically appraises the photographer, which is not surprising given that the United States broke five treaties with Chief Joseph and his father between 1855 and 1885.\n",
"\n",
"More than a glance, second looks may reveal new knowledge into complex histories.\n",
"\n",
"Anne Wilkes Tucker is the photography curator emeritus of the Museum of Fine Arts, Houston and curator of the “Not an Ostrich” exhibition.\n",
"\n",
"28\n",
"\n",
"28 LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"THEYRE WILLING TO HAVE MEENTERTAIN THEM DURING THE DAY,BUT AS SOON AS IT STARTSGETTING DARK, THEY ALLGO OFF, AND LEAVE ME! \n",
"ROSA PARKS: IN HER OWN WORDS\n",
"\n",
"COMIC ART: 120 YEARS OF PANELS AND PAGES\n",
"\n",
"SHALL NOT BE DENIED: WOMEN FIGHT FOR THE VOTE\n",
"\n",
"More information loc.gov/exhibits\n",
"Nuestra Sefiora de las Iguanas\n",
"\n",
"Graciela Iturbides 1979 portrait of Zobeida Díaz in the town of Juchitán in southeastern Mexico conveys the strength of women and reflects their important contributions to the economy. Díaz, a merchant, was selling iguanas to cook and eat, carrying them on her head, as is customary.\n",
"\n",
"GRACIELA ITURBIDE. “NUESTRA SEÑORA DE LAS IGUANAS.” 1979. GELATIN SILVER PRINT. © GRACIELA ITURBIDE, USED BY PERMISSION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
"\n",
"Iturbide requested permission to take a photograph, but this proved challenging because the iguanas were constantly moving, causing Díaz to laugh. The result, however, was a brilliant portrait that the inhabitants of Juchitán claimed with pride. They have reproduced it on posters and erected a statue honoring Díaz and her iguanas. The photo now appears throughout the world, inspiring supporters of feminism, womens rights and gender equality.\n",
"\n",
"—Adam Silvia is a curator in the Prints and Photographs Division.\n",
"\n",
"6\n",
"\n",
"6 LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"Migrant Mother is Florence Owens Thompson\n",
"\n",
"The iconic portrait that became the face of the Great Depression is also the most famous photograph in the collections of the Library of Congress.\n",
"\n",
"The Library holds the original source of the photo — a nitrate negative measuring 4 by 5 inches. Do you see a faint thumb in the bottom right? The photographer, Dorothea Lange, found the thumb distracting and after a few years had the negative altered to make the thumb almost invisible. Langes boss at the Farm Security Administration, Roy Stryker, criticized her action because altering a negative undermines the credibility of a documentary photo.\n",
"Shrimp Picker\n",
"\n",
"The photos and evocative captions of Lewis Hine served as source material for National Child Labor Committee reports and exhibits exposing abusive child labor practices in the United States in the first decades of the 20th century.\n",
"\n",
"LEWIS WICKES HINE. “MANUEL, THE YOUNG SHRIMP-PICKER, FIVE YEARS OLD, AND A MOUNTAIN OF CHILD-LABOR OYSTER SHELLS BEHIND HIM. HE WORKED LAST YEAR. UNDERSTANDS NOT A WORD OF ENGLISH. DUNBAR, LOPEZ, DUKATE COMPANY. LOCATION: BILOXI, MISSISSIPPI.” FEBRUARY 1911. NATIONAL CHILD LABOR COMMITTEE COLLECTION. PRINTS AND PHOTOGRAPHS DIVISION.\n",
"\n",
"For 15 years, Hine\n",
"\n",
"crisscrossed the country, documenting the practices of the worst offenders. His effective use of photography made him one of the committee's greatest publicists in the campaign for legislation to ban child labor.\n",
"\n",
"Hine was a master at taking photos that catch attention and convey a message and, in this photo, he framed Manuel in a setting that drove home the boys small size and unsafe environment.\n",
"\n",
"Captions on photos of other shrimp pickers emphasized their long working hours as well as one hazard of the job: The acid from the shrimp made pickers hands sore and “eats the shoes off your feet.”\n",
"\n",
"Such images alerted viewers to all that workers, their families and the nation sacrificed when children were part of the labor force. The Library holds paper records of the National Child Labor Committee as well as over 5,000 photographs.\n",
"\n",
"—Barbara Natanson is head of the Reference Section in the Prints and Photographs Division.\n",
"\n",
"8\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"Intergenerational Portrait\n",
"\n",
"Raised on the Apsáalooke (Crow) reservation in Montana, photographer Wendy Red Star created her “Apsáalooke Feminist” self-portrait series with her daughter Beatrice. With a dash of wry humor, mother and daughter are their own first-person narrators.\n",
"\n",
"Red Star explains the significance of their appearance: “The dress has power: You feel strong and regal wearing it. In my art, the elk tooth dress specifically symbolizes Crow womanhood and the matrilineal line connecting me to my ancestors. As a mother, I spend hours searching for the perfect elk tooth dress materials to make a prized dress for my daughter.”\n",
"\n",
"In a world that struggles with cultural identities, this photograph shows us the power and beauty of blending traditional and contemporary styles.\n",
"American Gothic Product #216040262 Price: $24\n",
"\n",
"U.S. Capitol at Night Product #216040052 Price: $24\n",
"\n",
"Good Reading Ahead Product #21606142 Price: $24\n",
"\n",
"Gordon Parks created an iconic image with this 1942 photograph of cleaning woman Ella Watson.\n",
"\n",
"Snow blankets the U.S. Capitol in this classic image by Ernest L. Crandall.\n",
"\n",
"Start your new year out right with a poster promising good reading for months to come.\n",
"\n",
"▪ Order online: loc.gov/shop ▪ Order by phone: 888.682.3557\n",
"\n",
"26\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"LIBRARY OF CONGRESS MAGAZINE\n",
"\n",
"SUPPORT\n",
"\n",
"A PICTURE OF PHILANTHROPY Annenberg Foundation Gives $1 Million and a Photographic Collection to the Library.\n",
"\n",
"A major gift by Wallis Annenberg and the Annenberg Foundation in Los Angeles will support the effort to reimagine the visitor experience at the Library of Congress. The foundation also is donating 1,000 photographic prints from its Annenberg Space for Photography exhibitions to the Library.\n",
"\n",
"The Library is pursuing a multiyear plan to transform the experience of its nearly 2 million annual visitors, share more of its treasures with the public and show how Library collections connect with visitors own creativity and research. The project is part of a strategic plan established by Librarian of Congress Carla Hayden to make the Library more user-centered for Congress, creators and learners of all ages.\n",
"\n",
"A 2018 exhibition at the Annenberg Space for Photography in Los Angeles featured over 400 photographs from the Library. The Library is planning a future photography exhibition, based on the Annenberg-curated show, along with a documentary film on the Library and its history, produced by the Annenberg Space for Photography.\n",
"\n",
"“The nations library is honored to have the strong support of Wallis Annenberg and the Annenberg Foundation as we enhance the experience for our visitors,” Hayden said. “We know that visitors will find new connections to the Library through the incredible photography collections and countless other treasures held here to document our nations history and creativity.”\n",
"\n",
"To enhance the Librarys holdings, the foundation is giving the Library photographic prints for long-term preservation from 10 other exhibitions hosted at the Annenberg Space for Photography. The Library holds one of the worlds largest photography collections, with about 14 million photos and over 1 million images digitized and available online.\n",
"18 LIBRARY OF CONGRESS MAGAZINE\n"
]
}
],
"source": [
@@ -577,17 +461,10 @@
"name": "stdout",
"output_type": "stream",
"text": [
" The image is a black and white photograph by Dorothea Lange titled \"Destitute Pea Pickers in California. Mother of Seven Children. Age Thirty-Two. Nipomo, California.\" It was taken in March 1936 as part of the Farm Security Administration-Office of War Information Collection.\n",
"\n",
"The photograph features a woman with seven children, who appear to be in a state of poverty and hardship. The woman is seated, looking directly at the camera, while three of her children are standing behind her. They all seem to be dressed in ragged clothing, indicative of their impoverished condition.\n",
"\n",
"The historical context of this image is related to the Great Depression, which was a period of economic hardship in the United States that lasted from 1929 to 1939. During this time, many people struggled to make ends meet, and poverty was widespread. This photograph captures the plight of one such family during this difficult period.\n",
"\n",
"The symbolism of the image is multifaceted. The woman's direct gaze at the camera can be seen as a plea for help or an expression of desperation. The ragged clothing of the children serves as a stark reminder of the poverty and hardship experienced by many during this time.\n",
"\n",
"In terms of connections to the related text, it is mentioned that Florence Owens Thompson, the woman in the photograph, initially regretted having her picture taken. However, she later came to appreciate the importance of the image as a representation of the struggles faced by many during the Great Depression. The mention of Helena Zinkham suggests that she may have played a role in the creation or distribution of this photograph.\n",
"\n",
"Overall, this image is a powerful depiction of poverty and hardship during the Great Depression, capturing the resilience and struggles of one family amidst difficult times. \n"
" The image depicts a woman with several children. The woman appears to be of Cherokee heritage, as suggested by the text provided. The image is described as having been initially regretted by the subject, Florence Owens Thompson, due to her feeling that it did not accurately represent her leadership qualities.\n",
"The historical and cultural context of the image is tied to the Great Depression and the Dust Bowl, both of which affected the Cherokee people in Oklahoma. The photograph was taken during this period, and its subject, Florence Owens Thompson, was a leader within her community who worked tirelessly to help those affected by these crises.\n",
"The image's symbolism and meaning can be interpreted as a representation of resilience and strength in the face of adversity. The woman is depicted with multiple children, which could signify her role as a caregiver and protector during difficult times.\n",
"Connections between the image and the related text include Florence Owens Thompson's leadership qualities and her regretted feelings about the photograph. Additionally, the mention of Dorothea Lange, the photographer who took this photo, ties the image to its historical context and the broader narrative of the Great Depression and Dust Bowl in Oklahoma. \n"
]
}
],
@@ -614,17 +491,11 @@
"source": [
"! docker kill vdms_rag_nb"
]
},
{
"cell_type": "markdown",
"id": "fe4a98ee",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".test-venv",
"display_name": ".langchain-venv",
"language": "python",
"name": "python3"
},
@@ -638,7 +509,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -183,7 +183,7 @@
"outputs": [],
"source": [
"game_description = f\"\"\"Here is the topic for a Dungeons & Dragons game: {quest}.\n",
" The characters are: {(*character_names,)}.\n",
" The characters are: {*character_names,}.\n",
" The story is narrated by the storyteller, {storyteller_name}.\"\"\"\n",
"\n",
"player_descriptor_system_message = SystemMessage(\n",
@@ -334,7 +334,7 @@
" You are the storyteller, {storyteller_name}.\n",
" Please make the quest more specific. Be creative and imaginative.\n",
" Please reply with the specified quest in {word_limit} words or less. \n",
" Speak directly to the characters: {(*character_names,)}.\n",
" Speak directly to the characters: {*character_names,}.\n",
" Do not add anything else.\"\"\"\n",
" ),\n",
"]\n",

View File

@@ -200,7 +200,7 @@
"outputs": [],
"source": [
"game_description = f\"\"\"Here is the topic for the presidential debate: {topic}.\n",
"The presidential candidates are: {\", \".join(character_names)}.\"\"\"\n",
"The presidential candidates are: {', '.join(character_names)}.\"\"\"\n",
"\n",
"player_descriptor_system_message = SystemMessage(\n",
" content=\"You can add detail to the description of each presidential candidate.\"\n",
@@ -595,7 +595,7 @@
" Frame the debate topic as a problem to be solved.\n",
" Be creative and imaginative.\n",
" Please reply with the specified topic in {word_limit} words or less. \n",
" Speak directly to the presidential candidates: {(*character_names,)}.\n",
" Speak directly to the presidential candidates: {*character_names,}.\n",
" Do not add anything else.\"\"\"\n",
" ),\n",
"]\n",

View File

@@ -71,9 +71,9 @@
"# Optional: LangSmith API keys\n",
"import os\n",
"\n",
"os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
"os.environ[\"LANGSMITH_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"os.environ[\"LANGSMITH_API_KEY\"] = \"api_key\""
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
"os.environ[\"LANGCHAIN_API_KEY\"] = \"api_key\""
]
},
{
@@ -215,8 +215,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_ollama import ChatOllama\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Prompt\n",

View File

@@ -395,7 +395,8 @@
"prompt_messages = [\n",
" SystemMessage(\n",
" content=(\n",
" \"You are a world class algorithm to answer questions in a specific format.\"\n",
" \"You are a world class algorithm to answer \"\n",
" \"questions in a specific format.\"\n",
" )\n",
" ),\n",
" HumanMessage(content=\"Answer question using the following context\"),\n",

View File

@@ -29,7 +29,7 @@
"source": [
"import os\n",
"\n",
"os.environ[\"LANGSMITH_PROJECT\"] = \"movie-qa\""
"os.environ[\"LANGCHAIN_PROJECT\"] = \"movie-qa\""
]
},
{

View File

@@ -25,7 +25,7 @@
" * [Oracle Blockchain](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_blockchain_table.html#GUID-B469E277-978E-4378-A8C1-26D3FF96C9A6)\n",
" * [JSON](https://docs.oracle.com/en/database/oracle/oracle-database/23/adjsn/json-in-oracle-database.html)\n",
"\n",
"This guide demonstrates how Oracle AI Vector Search can be used with LangChain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"\n",
" * Loading the documents from various sources using OracleDocLoader\n",
" * Summarizing them within/outside the database using OracleSummary\n",
@@ -47,19 +47,7 @@
"source": [
"### Prerequisites\n",
"\n",
"Please install the Oracle Database [python-oracledb driver](https://pypi.org/project/oracledb/) to use LangChain with Oracle AI Vector Search:\n",
"\n",
"```\n",
"$ python -m pip install --upgrade oracledb\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Demo User\n",
"First, connect as a privileged user to create a demo user with all the required privileges. Change the credentials for your environment. Also set the DEMO_PY_DIR path to a directory on the database host where your model file is located:"
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
@@ -68,30 +56,65 @@
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Demo User\n",
"First, create a demo user with all the required privileges. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"User setup done!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# Please update with your SYSTEM (or privileged user) username, password, and database connection string\n",
"username = \"SYSTEM\"\n",
"# Update with your username, password, hostname, and service_name\n",
"username = \"\"\n",
"password = \"\"\n",
"dsn = \"\"\n",
"\n",
"with oracledb.connect(user=username, password=password, dsn=dsn) as connection:\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"\n",
" with connection.cursor() as cursor:\n",
" cursor = conn.cursor()\n",
" try:\n",
" cursor.execute(\n",
" \"\"\"\n",
" begin\n",
" -- Drop user\n",
" execute immediate 'drop user if exists testuser cascade';\n",
"\n",
" begin\n",
" execute immediate 'drop user testuser cascade';\n",
" exception\n",
" when others then\n",
" dbms_output.put_line('Error dropping user: ' || SQLERRM);\n",
" end;\n",
" \n",
" -- Create user and grant privileges\n",
" execute immediate 'create user testuser identified by testuser';\n",
" execute immediate 'grant connect, unlimited tablespace, create credential, create procedure, create any index to testuser';\n",
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/home/yourname/demo/orachain''';\n",
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/scratch/hroy/view_storage/hroy_devstorage/demo/orachain''';\n",
" execute immediate 'grant read, write on directory DEMO_PY_DIR to public';\n",
" execute immediate 'grant create mining model to testuser';\n",
"\n",
" \n",
" -- Network access\n",
" begin\n",
" DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(\n",
@@ -104,7 +127,15 @@
" end;\n",
" \"\"\"\n",
" )\n",
" print(\"User setup done!\")"
" print(\"User setup done!\")\n",
" except Exception as e:\n",
" print(f\"User setup failed with error: {e}\")\n",
" finally:\n",
" cursor.close()\n",
" conn.close()\n",
"except Exception as e:\n",
" print(f\"Connection failed with error: {e}\")\n",
" sys.exit(1)"
]
},
{
@@ -112,13 +143,13 @@
"metadata": {},
"source": [
"## Process Documents using Oracle AI\n",
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by LangChain.\n",
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by Langchain.\n",
"\n",
"To prepare the documents for analysis, a comprehensive preprocessing workflow is necessary. Initially, the documents must be retrieved, summarized (if required), and chunked as needed. Subsequent steps involve generating embeddings for these chunks and integrating them into the Oracle AI Vector Store. Users can then conduct semantic searches on this data.\n",
"\n",
"The Oracle AI Vector Search LangChain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
"The Oracle AI Vector Search Langchain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
"\n",
"In the sections that follow, we will detail the utilization of Oracle AI LangChain APIs to effectively implement each of these processes."
"In the sections that follow, we will detail the utilization of Oracle AI Langchain APIs to effectively implement each of these processes."
]
},
{
@@ -126,24 +157,38 @@
"metadata": {},
"source": [
"### Connect to Demo User\n",
"The following sample code shows how to connect to Oracle Database using the python-oracledb driver. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You can switch to Thick mode if you are unable to use Thin mode."
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a Thin mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in Thick mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 45,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, and database connection string\n",
"username = \"testuser\"\n",
"# please update with your username, password, hostname and service_name\n",
"username = \"\"\n",
"password = \"\"\n",
"dsn = \"\"\n",
"\n",
"connection = oracledb.connect(user=username, password=password, dsn=dsn)\n",
"print(\"Connection successful!\")"
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)"
]
},
{
@@ -156,12 +201,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 46,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table created and populated.\n"
]
}
],
"source": [
"with connection.cursor() as cursor:\n",
" drop_table_sql = \"\"\"drop table if exists demo_tab\"\"\"\n",
"try:\n",
" cursor = conn.cursor()\n",
"\n",
" drop_table_sql = \"\"\"drop table demo_tab\"\"\"\n",
" cursor.execute(drop_table_sql)\n",
"\n",
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
@@ -184,9 +239,15 @@
" ]\n",
" cursor.executemany(insert_row_sql, rows_to_insert)\n",
"\n",
"connection.commit()\n",
" conn.commit()\n",
"\n",
"print(\"Table created and populated.\")"
" print(\"Table created and populated.\")\n",
" cursor.close()\n",
"except Exception as e:\n",
" print(\"Table creation failed.\")\n",
" cursor.close()\n",
" conn.close()\n",
" sys.exit(1)"
]
},
{
@@ -200,22 +261,30 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the ONNX Model\n",
"### Load ONNX Model\n",
"\n",
"Oracle accommodates a variety of embedding providers, enabling you to choose between proprietary database solutions and third-party services such as Oracle Generative AI Service and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"\n",
"***Important*** : Should you opt for the database option, you must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"\n",
"A significant advantage of utilizing an ONNX model directly within Oracle Database is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"\n",
"Below is the example code to upload an ONNX model into Oracle Database:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 47,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ONNX model loaded.\n"
]
}
],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"\n",
@@ -225,8 +294,12 @@
"onnx_file = \"tinybert.onnx\"\n",
"model_name = \"demo_model\"\n",
"\n",
"OracleEmbeddings.load_onnx_model(connection, onnx_dir, onnx_file, model_name)\n",
"print(\"ONNX model loaded.\")"
"try:\n",
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
" print(\"ONNX model loaded.\")\n",
"except Exception as e:\n",
" print(\"ONNX model loading failed!\")\n",
" sys.exit(1)"
]
},
{
@@ -248,7 +321,8 @@
"metadata": {},
"outputs": [],
"source": [
"with connection.cursor() as cursor:\n",
"try:\n",
" cursor = conn.cursor()\n",
" cursor.execute(\n",
" \"\"\"\n",
" declare\n",
@@ -275,7 +349,12 @@
" params => json(jo.to_string));\n",
" end;\n",
" \"\"\"\n",
" )"
" )\n",
" cursor.close()\n",
" print(\"Credentials created.\")\n",
"except Exception as ex:\n",
" cursor.close()\n",
" raise"
]
},
{
@@ -283,24 +362,33 @@
"metadata": {},
"source": [
"### Load Documents\n",
"You have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"\n",
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
"\n",
"Below is a sample code snippet that demonstrates how to use OracleDocLoader:"
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 48,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of docs loaded: 3\n"
]
}
],
"source": [
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_core.documents import Document\n",
"\n",
"# loading from Oracle Database table\n",
"# make sure you have the table with this specification\n",
"loader_params = {}\n",
"loader_params = {\n",
" \"owner\": \"testuser\",\n",
" \"tablename\": \"demo_tab\",\n",
@@ -308,7 +396,7 @@
"}\n",
"\n",
"\"\"\" load the docs \"\"\"\n",
"loader = OracleDocLoader(conn=connection, params=loader_params)\n",
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
"docs = loader.load()\n",
"\n",
"\"\"\" verify \"\"\"\n",
@@ -321,23 +409,23 @@
"metadata": {},
"source": [
"### Generate Summary\n",
"Now that you have loaded the documents, you may want to generate a summary for each document. The Oracle AI Vector Search LangChain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, Oracle Generative AI Service, HuggingFace, among others, allowing you to select the provider that best meets their needs. To utilize these capabilities, you must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** You may need to set proxy if you want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder objects\n",
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
@@ -345,14 +433,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code shows how to generate a summary:"
"The following sample code will show how to generate summary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 49,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of Summaries: 3\n"
]
}
],
"source": [
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_core.documents import Document\n",
@@ -367,7 +463,7 @@
"\n",
"# get the summary instance\n",
"# Remove proxy if not required\n",
"summ = OracleSummary(conn=connection, params=summary_params, proxy=proxy)\n",
"summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)\n",
"\n",
"list_summary = []\n",
"for doc in docs:\n",
@@ -391,9 +487,17 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 50,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of Chunks: 3\n"
]
}
],
"source": [
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_core.documents import Document\n",
@@ -402,7 +506,7 @@
"splitter_params = {\"normalize\": \"all\"}\n",
"\n",
"\"\"\" get the splitter instance \"\"\"\n",
"splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n",
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
"\n",
"list_chunks = []\n",
"for doc in docs:\n",
@@ -419,19 +523,19 @@
"metadata": {},
"source": [
"### Generate Embeddings\n",
"Now that the documents are chunked as per requirements, you may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** You may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
@@ -443,14 +547,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code shows how to generate embeddings:"
"The following sample code will show how to generate embeddings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 51,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of embeddings: 3\n"
]
}
],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_core.documents import Document\n",
@@ -460,7 +572,7 @@
"\n",
"# get the embedding instance\n",
"# Remove proxy if not required\n",
"embedder = OracleEmbeddings(conn=connection, params=embedder_params, proxy=proxy)\n",
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
"\n",
"embeddings = []\n",
"for doc in docs:\n",
@@ -479,19 +591,19 @@
"metadata": {},
"source": [
"## Create Oracle AI Vector Store\n",
"Now that you know how to use Oracle AI LangChain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import all the dependencies:"
"First, let's import all the dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
@@ -514,80 +626,100 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's combine all document processing stages together. Here is the sample code:"
"Next, let's combine all document processing stages together. Here is the sample code below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 53,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"ONNX model loaded.\n",
"Number of total chunks with metadata: 3\n"
]
}
],
"source": [
"\"\"\"\n",
"In this sample example, we will use 'database' provider for both summary and embeddings\n",
"so, we don't need to do the following:\n",
"In this sample example, we will use 'database' provider for both summary and embeddings.\n",
"So, we don't need to do the followings:\n",
" - set proxy for 3rd party providers\n",
" - create credential for 3rd party providers\n",
"\n",
"If you choose to use 3rd party provider, please follow the necessary steps for proxy and credential.\n",
"If you choose to use 3rd party provider, \n",
"please follow the necessary steps for proxy and credential.\n",
"\"\"\"\n",
"\n",
"# please update with your username, password, and database connection string\n",
"# oracle connection\n",
"# please update with your username, password, hostname, and service_name\n",
"username = \"\"\n",
"password = \"\"\n",
"dsn = \"\"\n",
"\n",
"with oracledb.connect(user=username, password=password, dsn=dsn) as connection:\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)\n",
"\n",
" # load onnx model\n",
" # please update with your related information\n",
" onnx_dir = \"DEMO_PY_DIR\"\n",
" onnx_file = \"tinybert.onnx\"\n",
" model_name = \"demo_model\"\n",
" OracleEmbeddings.load_onnx_model(connection, onnx_dir, onnx_file, model_name)\n",
"\n",
"# load onnx model\n",
"# please update with your related information\n",
"onnx_dir = \"DEMO_PY_DIR\"\n",
"onnx_file = \"tinybert.onnx\"\n",
"model_name = \"demo_model\"\n",
"try:\n",
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
" print(\"ONNX model loaded.\")\n",
"except Exception as e:\n",
" print(\"ONNX model loading failed!\")\n",
" sys.exit(1)\n",
"\n",
" # params\n",
" # please update necessary fields with related information\n",
" loader_params = {\n",
" \"owner\": \"testuser\",\n",
" \"tablename\": \"demo_tab\",\n",
" \"colname\": \"data\",\n",
" }\n",
" summary_params = {\n",
" \"provider\": \"database\",\n",
" \"glevel\": \"S\",\n",
" \"numParagraphs\": 1,\n",
" \"language\": \"english\",\n",
" }\n",
" splitter_params = {\"normalize\": \"all\"}\n",
" embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
"\n",
" # instantiate loader, summary, splitter, and embedder\n",
" loader = OracleDocLoader(conn=connection, params=loader_params)\n",
" summary = OracleSummary(conn=connection, params=summary_params)\n",
" splitter = OracleTextSplitter(conn=connection, params=splitter_params)\n",
" embedder = OracleEmbeddings(conn=connection, params=embedder_params)\n",
"# params\n",
"# please update necessary fields with related information\n",
"loader_params = {\n",
" \"owner\": \"testuser\",\n",
" \"tablename\": \"demo_tab\",\n",
" \"colname\": \"data\",\n",
"}\n",
"summary_params = {\n",
" \"provider\": \"database\",\n",
" \"glevel\": \"S\",\n",
" \"numParagraphs\": 1,\n",
" \"language\": \"english\",\n",
"}\n",
"splitter_params = {\"normalize\": \"all\"}\n",
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
"\n",
" # process the documents\n",
" chunks_with_mdata = []\n",
" for id, doc in enumerate(docs, start=1):\n",
" summ = summary.get_summary(doc.page_content)\n",
" chunks = splitter.split_text(doc.page_content)\n",
" for ic, chunk in enumerate(chunks, start=1):\n",
" chunk_metadata = doc.metadata.copy()\n",
" chunk_metadata[\"id\"] = (\n",
" chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
" )\n",
" chunk_metadata[\"document_id\"] = str(id)\n",
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
" chunks_with_mdata.append(\n",
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
" )\n",
"# instantiate loader, summary, splitter, and embedder\n",
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
"summary = OracleSummary(conn=conn, params=summary_params)\n",
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
"embedder = OracleEmbeddings(conn=conn, params=embedder_params)\n",
"\n",
" \"\"\" verify \"\"\"\n",
" print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
"# process the documents\n",
"chunks_with_mdata = []\n",
"for id, doc in enumerate(docs, start=1):\n",
" summ = summary.get_summary(doc.page_content)\n",
" chunks = splitter.split_text(doc.page_content)\n",
" for ic, chunk in enumerate(chunks, start=1):\n",
" chunk_metadata = doc.metadata.copy()\n",
" chunk_metadata[\"id\"] = chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
" chunk_metadata[\"document_id\"] = str(id)\n",
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
" chunks_with_mdata.append(\n",
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
" )\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
]
},
{
@@ -601,15 +733,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 55,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Vector Store Table: oravs\n"
]
}
],
"source": [
"# create Oracle AI Vector Store\n",
"vectorstore = OracleVS.from_documents(\n",
" chunks_with_mdata,\n",
" embedder,\n",
" client=connection,\n",
" client=conn,\n",
" table_name=\"oravs\",\n",
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
")\n",
@@ -638,12 +778,12 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"oraclevs.create_index(\n",
" connection, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
" conn, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
")\n",
"\n",
"print(\"Index created.\")"
@@ -653,7 +793,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. You may adjust various parameters according to your specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. Users may adjust various parameters according to their specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
"\n",
"Additionally, various types of vector indices can be created to meet diverse requirements. More details can be found in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/).\n"
]
@@ -665,16 +805,29 @@
"## Perform Semantic Search\n",
"All set!\n",
"\n",
"You have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. You are now prepared to proceed with semantic searches.\n",
"We have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. We are now prepared to proceed with semantic searches.\n",
"\n",
"Below is the sample code for this process:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 58,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
"[]\n",
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
"[]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
]
}
],
"source": [
"query = \"What is Oracle AI Vector Store?\"\n",
"filter = {\"document_id\": [\"1\"]}\n",
@@ -719,7 +872,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -552,7 +552,9 @@
"cell_type": "markdown",
"id": "77deb6a0-0950-450a-916a-f2a029676c20",
"metadata": {},
"source": "**Appending all retrieved documents in a single document**"
"source": [
"**Appending all retreived documents in a single document**"
]
},
{
"cell_type": "code",
@@ -756,4 +758,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -1,439 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "3716230e",
"metadata": {},
"source": [
"# RAG Pipeline with MLflow Tracking, Tracing & Evaluation\n",
"\n",
"This notebook demonstrates how to build a complete Retrieval-Augmented Generation (RAG) pipeline using LangChain and integrate it with MLflow for experiment tracking, tracing, and evaluation.\n",
"\n",
"\n",
"- **RAG Pipeline Construction**: Build a complete RAG system using LangChain components\n",
"- **MLflow Integration**: Track experiments, parameters, and artifacts\n",
"- **Tracing**: Monitor inputs, outputs, retrieved documents, scores, prompts, and timings\n",
"- **Evaluation**: Use MLflow's built-in scorers to assess RAG performance\n",
"- **Best Practices**: Implement proper configuration management and reproducible experiments\n",
"\n",
"We'll build a RAG system that can answer questions about academic papers by:\n",
"1. Loading and chunking documents from ArXiv\n",
"2. Creating embeddings and a vector store\n",
"3. Setting up a retrieval-augmented generation chain\n",
"4. Tracking all experiments with MLflow\n",
"5. Evaluating the system's performance\n",
"\n",
"![System Diagram](https://miro.medium.com/v2/resize:fit:720/format:webp/1*eiw86PP4hrBBxhjTjP0JUQ.png)"
]
},
{
"cell_type": "markdown",
"id": "2f7561c4",
"metadata": {},
"source": [
"#### Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0814ebe9",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain mlflow langchain-community arxiv pymupdf langchain-text-splitters langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "747399b6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import mlflow\n",
"from mlflow.genai.scorers import RelevanceToQuery, Correctness, ExpectationsGuidelines\n",
"from langchain_community.document_loaders import ArxivLoader\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"from langchain_core.output_parsers import StrOutputParser"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4141ee05",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR OPENAI API KEY>\"\n",
"\n",
"mlflow.set_experiment(\"LangChain-RAG-MLflow\")\n",
"mlflow.langchain.autolog()"
]
},
{
"cell_type": "markdown",
"id": "dd5eb41b",
"metadata": {},
"source": [
"Define all hyperparameters and configuration in a centralized dictionary. This makes it easy to:\n",
"- Track different experiment configurations\n",
"- Reproduce results\n",
"- Perform hyperparameter tuning\n",
"\n",
"**Key Parameters**:\n",
"- `chunk_size`: Size of text chunks for document splitting\n",
"- `chunk_overlap`: Overlap between consecutive chunks\n",
"- `retriever_k`: Number of documents to retrieve\n",
"- `embeddings_model`: OpenAI embedding model\n",
"- `llm`: Language model for generation\n",
"- `temperature`: Sampling temperature for the LLM"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6dcdc5d8",
"metadata": {},
"outputs": [],
"source": [
"CONFIG = {\n",
" \"chunk_size\": 400,\n",
" \"chunk_overlap\": 80,\n",
" \"retriever_k\": 3,\n",
" \"embeddings_model\": \"text-embedding-3-small\",\n",
" \"system_prompt\": \"You are a helpful assistant. Use the following context to answer the question. Use three sentences maximum and keep the answer concise.\",\n",
" \"llm\": \"gpt-5-nano\",\n",
" \"temperature\": 0,\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "8a2985f1",
"metadata": {},
"source": [
"#### ArXiv Dcoument Loading and Processing"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1f32aa36",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Published': '2023-08-02', 'Title': 'Attention Is All You Need', 'Authors': 'Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin', 'Summary': 'The dominant sequence transduction models are based on complex recurrent or\\nconvolutional neural networks in an encoder-decoder configuration. The best\\nperforming models also connect the encoder and decoder through an attention\\nmechanism. We propose a new simple network architecture, the Transformer, based\\nsolely on attention mechanisms, dispensing with recurrence and convolutions\\nentirely. Experiments on two machine translation tasks show these models to be\\nsuperior in quality while being more parallelizable and requiring significantly\\nless time to train. Our model achieves 28.4 BLEU on the WMT 2014\\nEnglish-to-German translation task, improving over the existing best results,\\nincluding ensembles by over 2 BLEU. On the WMT 2014 English-to-French\\ntranslation task, our model establishes a new single-model state-of-the-art\\nBLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction\\nof the training costs of the best models from the literature. We show that the\\nTransformer generalizes well to other tasks by applying it successfully to\\nEnglish constituency parsing both with large and limited training data.'}\n"
]
}
],
"source": [
"# Load documents from ArXiv\n",
"loader = ArxivLoader(\n",
" query=\"1706.03762\",\n",
" load_max_docs=1,\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].metadata)\n",
"\n",
"# Split documents into chunks\n",
"splitter = RecursiveCharacterTextSplitter(\n",
" chunk_size=CONFIG[\"chunk_size\"],\n",
" chunk_overlap=CONFIG[\"chunk_overlap\"],\n",
")\n",
"chunks = splitter.split_documents(docs)\n",
"\n",
"# Join chunks into a single string\n",
"def join_chunks(chunks):\n",
" return \"\\n\\n\".join([chunk.page_content for chunk in chunks])\n"
]
},
{
"cell_type": "markdown",
"id": "6e194ab4",
"metadata": {},
"source": [
"#### Vector Store and Retriever Setup"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "26dfbeaa",
"metadata": {},
"outputs": [],
"source": [
"# Create embeddings\n",
"embeddings = OpenAIEmbeddings(model=CONFIG[\"embeddings_model\"])\n",
"\n",
"# Create vector store from documents\n",
"vectorstore = InMemoryVectorStore.from_documents(\n",
" chunks,\n",
" embedding=embeddings,\n",
")\n",
"\n",
"# Create retriever\n",
"retriever = vectorstore.as_retriever(search_kwargs={\"k\": CONFIG[\"retriever_k\"]})"
]
},
{
"cell_type": "markdown",
"id": "bc1f181b",
"metadata": {},
"source": [
"#### RAG Chain Construction using [LCEL](https://python.langchain.com/docs/concepts/lcel/)\n",
"\n",
"Flow:\n",
"1. Query → Retriever (finds relevant chunks)\n",
"2. Chunks → join_chunks (creates context)\n",
"3. Context + Query → Prompt Template\n",
"4. Prompt → Language Model → Response\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6a810dc3",
"metadata": {},
"outputs": [],
"source": [
"# Initialize the language model\n",
"llm = ChatOpenAI(model=CONFIG[\"llm\"], temperature=CONFIG[\"temperature\"])\n",
"\n",
"# Create the prompt template\n",
"prompt = ChatPromptTemplate.from_messages([\n",
" (\"system\",CONFIG[\"system_prompt\"] + \"\\n\\nContext:\\n{context}\\n\\n\"),\n",
" (\"human\", \"\\n{question}\\n\"),\n",
"])\n",
"\n",
"# Construct the RAG chain\n",
"rag_chain = (\n",
" {\n",
" \"context\": retriever | RunnableLambda(join_chunks),\n",
" \"question\": RunnablePassthrough(),\n",
" }\n",
" |prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c04bd019",
"metadata": {},
"source": [
"#### Prediction Function with MLflow Tracing\n",
"\n",
"Create a prediction function decorated with `@mlflow.trace` to automatically log:\n",
"- Input queries\n",
"- Retrieved documents\n",
"- Generated responses\n",
"- Execution time\n",
"- Chain intermediate steps"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7b45fc04",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Question: What is the main idea of the paper?\n",
"Response: The main idea is to replace recurrent/convolutional sequence models with a pure attention-based architecture called the Transformer. It uses self-attention to model dependencies between all positions in the input and output, enabling full parallelization and better handling of long-range relations. This approach achieves strong results on translation and can extend to other modalities.\n"
]
}
],
"source": [
"@mlflow.trace\n",
"def predict_fn(question: str) -> str:\n",
" return rag_chain.invoke(question)\n",
"\n",
"# Test the prediction function\n",
"sample_question = \"What is the main idea of the paper?\"\n",
"response = predict_fn(sample_question)\n",
"print(f\"Question: {sample_question}\")\n",
"print(f\"Response: {response}\")"
]
},
{
"cell_type": "markdown",
"id": "421469de",
"metadata": {},
"source": [
"#### Evaluation Dataset and Scoring\n",
"\n",
"Define an evaluation dataset and run systematic evaluation using [MLflow's built-in scorers](https://mlflow.org/docs/latest/genai/eval-monitor/scorers/llm-judge/predefined/#available-scorers):\n",
"\n",
"<u>Evaluation Components:</u>\n",
"- **Dataset**: Questions with expected concepts and facts\n",
"- **Scorers**: \n",
" - `RelevanceToQuery`: Measures how relevant the response is to the question\n",
" - `Correctness`: Evaluates factual accuracy of the response\n",
" - `ExpectationsGuidelines`: Checks that output matches expectation guidelines\n",
"\n",
"<u>Best Practices:</u>\n",
"- Create diverse test cases covering different query types\n",
"- Include expected concepts to guide evaluation\n",
"- Use multiple scoring metrics for comprehensive assessment"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5c1dc4f2",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2025/08/23 20:14:39 INFO mlflow.models.evaluation.utils.trace: Auto tracing is temporarily enabled during the model evaluation for computing some metrics and debugging. To disable tracing, call `mlflow.autolog(disable=True)`.\n",
"2025/08/23 20:14:39 INFO mlflow.genai.utils.data_validation: Testing model prediction with the first sample in the dataset.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2b6c6687efa24796b39c7951d589d481",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Evaluating: 0%| | 0/3 [Elapsed: 00:00, Remaining: ?] "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"✨ Evaluation completed.\n",
"\n",
"Metrics and evaluation results are logged to the MLflow run:\n",
" Run name: \u001b[94mbaseline_eval\u001b[0m\n",
" Run ID: \u001b[94ma2218d9f24c9415f8040d3b77af103a9\u001b[0m\n",
"\n",
"To view the detailed evaluation results with sample-wise scores,\n",
"open the \u001b[93m\u001b[1mTraces\u001b[0m tab in the Run page in the MLflow UI.\n",
"\n"
]
}
],
"source": [
"# Define evaluation dataset\n",
"eval_dataset = [\n",
" {\n",
" \"inputs\": {\"question\": \"What is the main idea of the paper?\"},\n",
" \"expectations\": {\n",
" \"key_concepts\": [\"attention mechanism\", \"transformer\", \"neural network\"],\n",
" \"expected_facts\": [\"attention mechanism is a key component of the transformer model\"],\n",
" \"guidelines\": [\"The response must be factual and concise\"],\n",
" }\n",
" },\n",
" {\n",
" \"inputs\": {\"question\": \"What's the difference between a transformer and a recurrent neural network?\"},\n",
" \"expectations\": {\n",
" \"key_concepts\": [\"sequential\", \"attention mechanism\", \"hidden state\"],\n",
" \"expected_facts\": [\"transformer processes data in parallel while RNN processes data sequentially\"],\n",
" \"guidelines\": [\"The response must be factual and focus on the difference between the two models\"],\n",
" }\n",
" },\n",
" {\n",
" \"inputs\": {\"question\": \"What does the attention mechanism do?\"},\n",
" \"expectations\": {\n",
" \"key_concepts\": [\"query\", \"key\", \"value\", \"relationship\", \"similarity\"],\n",
" \"expected_facts\": [\"attention allows the model to weigh the importance of different parts of the input sequence when processing it\"],\n",
" \"guidelines\": [\"The response must be factual and explain the concept of attention\"],\n",
" }\n",
" }\n",
"]\n",
"\n",
"# Run evaluation with MLflow\n",
"with mlflow.start_run(run_name=\"baseline_eval\") as run:\n",
" # Log configuration parameters\n",
" mlflow.log_params(CONFIG)\n",
"\n",
" # Run evaluation\n",
" results = mlflow.genai.evaluate(\n",
" data=eval_dataset,\n",
" predict_fn=predict_fn,\n",
" scorers=[RelevanceToQuery(), Correctness(), ExpectationsGuidelines()],\n",
" )\n"
]
},
{
"cell_type": "markdown",
"id": "52b137c7",
"metadata": {},
"source": [
"#### Launch MLflow UI to check out the results\n",
"\n",
"<u>What you'll see in the UI:</u>\n",
"- **Experiments**: Compare different RAG configurations\n",
"- **Runs**: Individual experiment runs with metrics and parameters\n",
"- **Traces**: Detailed execution traces showing retrieval and generation steps\n",
"- **Evaluation Results**: Scoring metrics and detailed comparisons\n",
"- **Artifacts**: Saved models, datasets, and other files\n",
"\n",
"Navigate to `http://localhost:5000` after running the command below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "817c3799",
"metadata": {},
"outputs": [],
"source": [
"!mlflow ui"
]
},
{
"cell_type": "markdown",
"id": "c75861e3",
"metadata": {},
"source": [
"You should see something like this\n",
"\n",
"![MLflow UI image](https://miro.medium.com/v2/resize:fit:720/format:webp/1*Cx7MMy53pAP7150x_hvztA.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,82 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG using Upstage Document Parse and Groundedness Check\n",
"This example illustrates RAG using [Upstage](https://python.langchain.com/docs/integrations/providers/upstage/) Document Parse and Groundedness Check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_community.vectorstores import DocArrayInMemorySearch\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_core.runnables.base import RunnableSerializable\n",
"from langchain_upstage import (\n",
" ChatUpstage,\n",
" UpstageDocumentParseLoader,\n",
" UpstageEmbeddings,\n",
" UpstageGroundednessCheck,\n",
")\n",
"\n",
"model = ChatUpstage()\n",
"\n",
"files = [\"/PATH/TO/YOUR/FILE.pdf\", \"/PATH/TO/YOUR/FILE2.pdf\"]\n",
"\n",
"loader = UpstageDocumentParseLoader(file_path=files, split=\"element\")\n",
"\n",
"docs = loader.load()\n",
"\n",
"vectorstore = DocArrayInMemorySearch.from_documents(\n",
" docs, embedding=UpstageEmbeddings(model=\"solar-embedding-1-large\")\n",
")\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"template = \"\"\"Answer the question based only on the following context:\n",
"{context}\n",
"\n",
"Question: {question}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"output_parser = StrOutputParser()\n",
"\n",
"retrieved_docs = retriever.get_relevant_documents(\"How many parameters in SOLAR model?\")\n",
"\n",
"groundedness_check = UpstageGroundednessCheck()\n",
"groundedness = \"\"\n",
"while groundedness != \"grounded\":\n",
" chain: RunnableSerializable = RunnablePassthrough() | prompt | model | output_parser\n",
"\n",
" result = chain.invoke(\n",
" {\n",
" \"context\": retrieved_docs,\n",
" \"question\": \"How many parameters in SOLAR model?\",\n",
" }\n",
" )\n",
"\n",
" groundedness = groundedness_check.invoke(\n",
" {\n",
" \"context\": retrieved_docs,\n",
" \"answer\": result,\n",
" }\n",
" )"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,82 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RAG using Upstage Layout Analysis and Groundedness Check\n",
"This example illustrates RAG using [Upstage](https://python.langchain.com/docs/integrations/providers/upstage/) Layout Analysis and Groundedness Check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_community.vectorstores import DocArrayInMemorySearch\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_core.runnables.base import RunnableSerializable\n",
"from langchain_upstage import (\n",
" ChatUpstage,\n",
" UpstageEmbeddings,\n",
" UpstageGroundednessCheck,\n",
" UpstageLayoutAnalysisLoader,\n",
")\n",
"\n",
"model = ChatUpstage()\n",
"\n",
"files = [\"/PATH/TO/YOUR/FILE.pdf\", \"/PATH/TO/YOUR/FILE2.pdf\"]\n",
"\n",
"loader = UpstageLayoutAnalysisLoader(file_path=files, split=\"element\")\n",
"\n",
"docs = loader.load()\n",
"\n",
"vectorstore = DocArrayInMemorySearch.from_documents(\n",
" docs, embedding=UpstageEmbeddings(model=\"solar-embedding-1-large\")\n",
")\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"template = \"\"\"Answer the question based only on the following context:\n",
"{context}\n",
"\n",
"Question: {question}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"output_parser = StrOutputParser()\n",
"\n",
"retrieved_docs = retriever.get_relevant_documents(\"How many parameters in SOLAR model?\")\n",
"\n",
"groundedness_check = UpstageGroundednessCheck()\n",
"groundedness = \"\"\n",
"while groundedness != \"grounded\":\n",
" chain: RunnableSerializable = RunnablePassthrough() | prompt | model | output_parser\n",
"\n",
" result = chain.invoke(\n",
" {\n",
" \"context\": retrieved_docs,\n",
" \"question\": \"How many parameters in SOLAR model?\",\n",
" }\n",
" )\n",
"\n",
" groundedness = groundedness_check.invoke(\n",
" {\n",
" \"context\": retrieved_docs,\n",
" \"answer\": result,\n",
" }\n",
" )"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -53,7 +53,7 @@
"id": "f5ccda4e-7af5-4355-b9c4-25547edf33f9",
"metadata": {},
"source": [
"Let's first load up this paper, and split into text chunks of size 1000."
"Lets first load up this paper, and split into text chunks of size 1000."
]
},
{
@@ -241,7 +241,7 @@
"id": "360b2837-8024-47e0-a4ba-592505a9a5c8",
"metadata": {},
"source": [
"With our embedder in place, let's define our retriever:"
"With our embedder in place, lets define our retriever:"
]
},
{
@@ -312,7 +312,7 @@
"id": "d84ea8f4-a5de-4d76-b44d-85e56583f489",
"metadata": {},
"source": [
"Let's write our documents into our new store. This will use our embedder on each document."
"Lets write our documents into our new store. This will use our embedder on each document."
]
},
{
@@ -339,7 +339,7 @@
"id": "580bc212-8ecd-4d28-8656-b96fcd0d7eb6",
"metadata": {},
"source": [
"Great! Our retriever is good to go. Let's load up an LLM, that will reason over the retrieved documents:"
"Great! Our retriever is good to go. Lets load up an LLM, that will reason over the retrieved documents:"
]
},
{
@@ -430,7 +430,7 @@
"id": "3bc53602-86d6-420f-91b1-fc2effa7e986",
"metadata": {},
"source": [
"Excellent! Let's ask it a question.\n",
"Excellent! lets ask it a question.\n",
"We will also use a verbose and debug, to check which documents were used by the model to produce the answer."
]
},

View File

@@ -233,7 +233,7 @@ Question: {input}"""
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Unless the user specifies in his question a specific number of examples he wishes to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

View File

@@ -34,7 +34,7 @@
"tools = [multiply, exponentiate, add]\n",
"\n",
"gpt35 = ChatOpenAI(model=\"gpt-3.5-turbo-0125\", temperature=0).bind_tools(tools)\n",
"claude3 = ChatAnthropic(model=\"claude-3-7-sonnet-20250219\").bind_tools(tools)\n",
"claude3 = ChatAnthropic(model=\"claude-3-sonnet-20240229\").bind_tools(tools)\n",
"llm_with_tools = gpt35.configurable_alternatives(\n",
" ConfigurableField(id=\"llm\"), default_key=\"gpt35\", claude3=claude3\n",
")"
@@ -113,15 +113,14 @@
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\", additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_xuNXwm2P6U2Pp2pAbC1sdIBz', 'function': {'arguments': '{\"x\": 3, \"y\": 5}', 'name': 'add'}, 'type': 'function'}, {'id': 'call_0pImUJUDlYa5zfBcxxuvWyYS', 'function': {'arguments': '{\"x\": 8, \"y\": 2.743}', 'name': 'exponentiate'}, 'type': 'function'}, {'id': 'call_yaownQ9TZK0dkqD1KSFyax4H', 'function': {'arguments': '{\"x\": 17.24, \"y\": -918.1241}', 'name': 'add'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 75, 'prompt_tokens': 131, 'total_tokens': 206, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-ByJm2qxSWU3oTTSZQv64J4XQKZhA6', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--35fad027-47f7-44d3-aa8b-99f4fc24098c-0', tool_calls=[{'name': 'add', 'args': {'x': 3, 'y': 5}, 'id': 'call_xuNXwm2P6U2Pp2pAbC1sdIBz', 'type': 'tool_call'}, {'name': 'exponentiate', 'args': {'x': 8, 'y': 2.743}, 'id': 'call_0pImUJUDlYa5zfBcxxuvWyYS', 'type': 'tool_call'}, {'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'call_yaownQ9TZK0dkqD1KSFyax4H', 'type': 'tool_call'}], usage_metadata={'input_tokens': 131, 'output_tokens': 75, 'total_tokens': 206, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}),\n",
" ToolMessage(content='8.0', tool_call_id='call_xuNXwm2P6U2Pp2pAbC1sdIBz'),\n",
" ToolMessage(content='300.03770462067547', tool_call_id='call_0pImUJUDlYa5zfBcxxuvWyYS'),\n",
" ToolMessage(content='-900.8841', tool_call_id='call_yaownQ9TZK0dkqD1KSFyax4H'),\n",
" AIMessage(content='The results are:\\n1. 3 plus 5 is 8.\\n2. 5 raised to the power of 2.743 is approximately 300.04.\\n3. 17.24 minus 918.1241 is approximately -900.88.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 55, 'prompt_tokens': 236, 'total_tokens': 291, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-ByJm345MYnpowGS90iAZAlSs7haed', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--5fa66d47-d80e-45d0-9c32-31348c735d72-0', usage_metadata={'input_tokens': 236, 'output_tokens': 55, 'total_tokens': 291, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}"
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"),\n",
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_6yMU2WsS4Bqgi1WxFHxtfJRc', 'function': {'arguments': '{\"x\": 8, \"y\": 2.743}', 'name': 'exponentiate'}, 'type': 'function'}, {'id': 'call_GAL3dQiKFF9XEV0RrRLPTvVp', 'function': {'arguments': '{\"x\": 17.24, \"y\": -918.1241}', 'name': 'add'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 58, 'prompt_tokens': 168, 'total_tokens': 226}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-528302fc-7acf-4c11-82c4-119ccf40c573-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 8, 'y': 2.743}, 'id': 'call_6yMU2WsS4Bqgi1WxFHxtfJRc'}, {'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'call_GAL3dQiKFF9XEV0RrRLPTvVp'}]),\n",
" ToolMessage(content='300.03770462067547', tool_call_id='call_6yMU2WsS4Bqgi1WxFHxtfJRc'),\n",
" ToolMessage(content='-900.8841', tool_call_id='call_GAL3dQiKFF9XEV0RrRLPTvVp'),\n",
" AIMessage(content='The result of \\\\(3 + 5^{2.743}\\\\) is approximately 300.04, and the result of \\\\(17.24 - 918.1241\\\\) is approximately -900.88.', response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 251, 'total_tokens': 295}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-d1161669-ed09-4b18-94bd-6d8530df5aa8-0')]}"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -147,17 +146,17 @@
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\", additional_kwargs={}, response_metadata={}),\n",
" AIMessage(content=[{'text': \"I'll solve these calculations for you.\\n\\nFor the first part, I need to calculate 3 plus 5 raised to the power of 2.743.\\n\\nLet me break this down:\\n1) First, I'll calculate 5 raised to the power of 2.743\\n2) Then add 3 to the result\", 'type': 'text'}, {'id': 'toolu_01L1mXysBQtpPLQ2AZTaCGmE', 'input': {'x': 5, 'y': 2.743}, 'name': 'exponentiate', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01HCbDmuzdg9ATMyKbnecbEE', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 563, 'output_tokens': 146, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--9f6469fb-bcbb-4c1c-9eec-79f6979c38e6-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 5, 'y': 2.743}, 'id': 'toolu_01L1mXysBQtpPLQ2AZTaCGmE', 'type': 'tool_call'}], usage_metadata={'input_tokens': 563, 'output_tokens': 146, 'total_tokens': 709, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
" ToolMessage(content='82.65606421491815', tool_call_id='toolu_01L1mXysBQtpPLQ2AZTaCGmE'),\n",
" AIMessage(content=[{'text': \"Now I'll add 3 to this result:\", 'type': 'text'}, {'id': 'toolu_01NARC83e9obV35mZ6jYzBiN', 'input': {'x': 3, 'y': 82.65606421491815}, 'name': 'add', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01ELwyCtVLeGC685PUFqmdz2', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 727, 'output_tokens': 87, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--d5af3d7c-e8b7-4cc2-997a-ad2dafd08751-0', tool_calls=[{'name': 'add', 'args': {'x': 3, 'y': 82.65606421491815}, 'id': 'toolu_01NARC83e9obV35mZ6jYzBiN', 'type': 'tool_call'}], usage_metadata={'input_tokens': 727, 'output_tokens': 87, 'total_tokens': 814, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
" ToolMessage(content='85.65606421491815', tool_call_id='toolu_01NARC83e9obV35mZ6jYzBiN'),\n",
" AIMessage(content=[{'text': \"For the second part, you asked for 17.24 - 918.1241. I don't have a subtraction function available, but I can rewrite this as adding a negative number: 17.24 + (-918.1241)\", 'type': 'text'}, {'id': 'toolu_01Q6fLcZkBWZpMPCZ55WXR3N', 'input': {'x': 17.24, 'y': -918.1241}, 'name': 'add', 'type': 'tool_use'}], additional_kwargs={}, response_metadata={'id': 'msg_01WkmDwUxWjjaKGnTtdLGJnN', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 832, 'output_tokens': 130, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--39a6fbda-4c81-47a6-b361-524bd4ee5823-0', tool_calls=[{'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'toolu_01Q6fLcZkBWZpMPCZ55WXR3N', 'type': 'tool_call'}], usage_metadata={'input_tokens': 832, 'output_tokens': 130, 'total_tokens': 962, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}}),\n",
" ToolMessage(content='-900.8841', tool_call_id='toolu_01Q6fLcZkBWZpMPCZ55WXR3N'),\n",
" AIMessage(content='So, the answers are:\\n1) 3 plus 5 raised to the 2.743 = 85.65606421491815\\n2) 17.24 - 918.1241 = -900.8841', additional_kwargs={}, response_metadata={'id': 'msg_015Yoc62CvdJbANGFouiQ6AQ', 'model': 'claude-3-7-sonnet-20250219', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 978, 'output_tokens': 58, 'server_tool_use': None, 'service_tier': 'standard'}, 'model_name': 'claude-3-7-sonnet-20250219'}, id='run--174c0882-6180-47ea-8f63-d7b747302327-0', usage_metadata={'input_tokens': 978, 'output_tokens': 58, 'total_tokens': 1036, 'input_token_details': {'cache_read': 0, 'cache_creation': 0}})]}"
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"),\n",
" AIMessage(content=[{'text': \"Okay, let's break this down into two parts:\", 'type': 'text'}, {'id': 'toolu_01DEhqcXkXTtzJAiZ7uMBeDC', 'input': {'x': 3, 'y': 5}, 'name': 'add', 'type': 'tool_use'}], response_metadata={'id': 'msg_01AkLGH8sxMHaH15yewmjwkF', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'input_tokens': 450, 'output_tokens': 81}}, id='run-f35bfae8-8ded-4f8a-831b-0940d6ad16b6-0', tool_calls=[{'name': 'add', 'args': {'x': 3, 'y': 5}, 'id': 'toolu_01DEhqcXkXTtzJAiZ7uMBeDC'}]),\n",
" ToolMessage(content='8.0', tool_call_id='toolu_01DEhqcXkXTtzJAiZ7uMBeDC'),\n",
" AIMessage(content=[{'id': 'toolu_013DyMLrvnrto33peAKMGMr1', 'input': {'x': 8.0, 'y': 2.743}, 'name': 'exponentiate', 'type': 'tool_use'}], response_metadata={'id': 'msg_015Fmp8aztwYcce2JDAFfce3', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'input_tokens': 545, 'output_tokens': 75}}, id='run-48aaeeeb-a1e5-48fd-a57a-6c3da2907b47-0', tool_calls=[{'name': 'exponentiate', 'args': {'x': 8.0, 'y': 2.743}, 'id': 'toolu_013DyMLrvnrto33peAKMGMr1'}]),\n",
" ToolMessage(content='300.03770462067547', tool_call_id='toolu_013DyMLrvnrto33peAKMGMr1'),\n",
" AIMessage(content=[{'text': 'So 3 plus 5 raised to the 2.743 power is 300.04.\\n\\nFor the second part:', 'type': 'text'}, {'id': 'toolu_01UTmMrGTmLpPrPCF1rShN46', 'input': {'x': 17.24, 'y': -918.1241}, 'name': 'add', 'type': 'tool_use'}], response_metadata={'id': 'msg_015TkhfRBENPib2RWAxkieH6', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'tool_use', 'stop_sequence': None, 'usage': {'input_tokens': 638, 'output_tokens': 105}}, id='run-45fb62e3-d102-4159-881d-241c5dbadeed-0', tool_calls=[{'name': 'add', 'args': {'x': 17.24, 'y': -918.1241}, 'id': 'toolu_01UTmMrGTmLpPrPCF1rShN46'}]),\n",
" ToolMessage(content='-900.8841', tool_call_id='toolu_01UTmMrGTmLpPrPCF1rShN46'),\n",
" AIMessage(content='Therefore, 17.24 - 918.1241 = -900.8841', response_metadata={'id': 'msg_01LgKnRuUcSyADCpxv9tPoYD', 'model': 'claude-3-sonnet-20240229', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 759, 'output_tokens': 24}}, id='run-1008254e-ccd1-497c-8312-9550dd77bd08-0')]}"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -178,7 +177,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "langchain",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -192,7 +191,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -227,7 +227,7 @@
"outputs": [],
"source": [
"conversation_description = f\"\"\"Here is the topic of conversation: {topic}\n",
"The participants are: {\", \".join(names.keys())}\"\"\"\n",
"The participants are: {', '.join(names.keys())}\"\"\"\n",
"\n",
"agent_descriptor_system_message = SystemMessage(\n",
" content=\"You can add detail to the description of the conversation participant.\"\n",
@@ -396,7 +396,7 @@
" You are the moderator.\n",
" Please make the topic more specific.\n",
" Please reply with the specified quest in {word_limit} words or less. \n",
" Speak directly to the participants: {(*names,)}.\n",
" Speak directly to the participants: {*names,}.\n",
" Do not add anything else.\"\"\"\n",
" ),\n",
"]\n",

View File

@@ -26,7 +26,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"76e78b89cee4d6d31154823f93592315df79c28410dfbfc87c9f70cbfdfa648b\n"
"2e44b44201c8778b462342ac97f5ccf05a4e02aa8a04505ecde97bf20dcc4cbb\n"
]
}
],
@@ -49,7 +49,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install --quiet -U langchain-vdms langchain-experimental sentence-transformers opencv-python open_clip_torch torch accelerate"
"! pip install --quiet -U vdms langchain-experimental sentence-transformers opencv-python open_clip_torch torch accelerate"
]
},
{
@@ -63,16 +63,7 @@
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/data1/cwlacewe/apps/cwlacewe_langchain/.langchain-venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"outputs": [],
"source": [
"import json\n",
"import os\n",
@@ -89,10 +80,10 @@
"from langchain_community.embeddings.sentence_transformer import (\n",
" SentenceTransformerEmbeddings,\n",
")\n",
"from langchain_community.vectorstores.vdms import VDMS, VDMS_Client\n",
"from langchain_core.callbacks.manager import CallbackManagerForLLMRun\n",
"from langchain_core.runnables import ConfigurableField\n",
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
"from langchain_vdms.vectorstores import VDMS, VDMS_Client\n",
"from transformers import (\n",
" AutoModelForCausalLM,\n",
" AutoTokenizer,\n",
@@ -372,7 +363,7 @@
"\t\tThere are 2 shoppers in this video. Shopper 1 is wearing a plaid shirt and a spectacle. Shopper 2 who is not completely captured in the frame seems to wear a black shirt and is moving away with his back turned towards the camera. There is a shelf towards the right of the camera frame. Shopper 2 is hanging an item back to a hanger and then quickly walks away in a similar fashion as shopper 2. Contents of the nearer side of the shelf with respect to camera seems to be camping lanterns and cleansing agents, arranged at the top. In the middle part of the shelf, various tools including grommets, a pocket saw, candles, and other helpful camping items can be observed. Midway through the shelf contains items which appear to be steel containers and items made up of plastic with red, green, orange, and yellow colors, while those at the bottom are packed in cardboard boxes. Contents at the farther part of the shelf are well stocked and organized but are not glaringly visible.\n",
"\n",
"\tMetadata:\n",
"\t\t{'fps': 24.0, 'total_frames': 120.0, 'video': 'clip16.mp4'}\n",
"\t\t{'fps': 24.0, 'id': 'c6e5f894-b905-46f5-ac9e-4487a9235561', 'total_frames': 120.0, 'video': 'clip16.mp4'}\n",
"Retrieved Top matching video!\n",
"\n",
"\n"
@@ -401,12 +392,18 @@
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading checkpoint shards: 100%|██████████| 2/2 [00:18<00:00, 9.01s/it]\n",
"WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the cpu.\n"
]
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3edf8783e114487ca490d8dec5c46884",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
@@ -558,7 +555,7 @@
"\t\tA single shopper is seen in this video standing facing the shelf and in the bottom part of the frame. He's wearing a light-colored shirt and a spectacle. The shopper is carrying a red colored basket in his left hand. The entire basket is not clearly visible, but it does seem to contain something in a blue colored package which the shopper has just placed in the basket given his right hand was seen inside the basket. Then the shopper leans towards the shelf and checks out an item in orange package. He picks this single item with his right hand and proceeds to place the item in the basket. The entire shelf looks well stocked except for the top part of the shelf which is empty. The shopper has not picked any item from this part of the shelf. The rest of the shelf looks well stocked and does not need any restocking. The contents on the farther part of the shelf consists of items, majority of which are packed in black, yellow, and green packages. No other details are visible of these items.\n",
"\n",
"\tMetadata:\n",
"\t\t{'fps': 24.0, 'total_frames': 162.0, 'video': 'clip10.mp4'}\n",
"\t\t{'fps': 24.0, 'id': '37ddc212-994e-4db0-877f-5ed09965ab90', 'total_frames': 162.0, 'video': 'clip10.mp4'}\n",
"Retrieved Top matching video!\n",
"\n",
"\n"
@@ -588,7 +585,7 @@
"User : Find a man holding a red shopping basket\n",
"Assistant : Most relevant retrieved video is **clip9.mp4** \n",
"\n",
"I see a person standing in front of a well-stocked shelf, they are wearing a light-colored shirt and glasses, and they have a red shopping basket in their left hand. They are leaning forward and picking up an item from the shelf with their right hand. The item is packaged in a blue-green box. Based on the available information, I cannot confirm whether the basket is empty or contains items. However, the rest of the\n"
"I see a person standing in front of a well-stocked shelf, they are wearing a light-colored shirt and glasses, and they have a red shopping basket in their left hand. They are leaning forward and picking up an item from the shelf with their right hand. The item is packaged in a blue-green box. Based on the scene description, I can confirm that the person is indeed holding a red shopping basket.</s>\n"
]
}
],
@@ -658,7 +655,7 @@
],
"metadata": {
"kernelspec": {
"display_name": ".langchain-venv",
"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -672,7 +669,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -144,8 +144,8 @@
"outputs": [],
"source": [
"# import os\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
"# os.environ[\"LANGSMITH_PROJECT\"] = \"default\" # Make sure this session actually exists."
"# os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\"\n",
"# os.environ[\"LANGCHAIN_SESSION\"] = \"default\" # Make sure this session actually exists."
]
},
{

View File

@@ -1,9 +1,9 @@
# We build the docs in these stages:
# 1. Install vercel and python dependencies
# 2. Copy files from "source dir" to "intermediate dir"
# 2. Generate files like model feat table, etc in "intermediate dir"
# 3. Copy files to their right spots (e.g. langserve readme) in "intermediate dir"
# 4. Build the docs from "intermediate dir" to "output dir"
# we build the docs in these stages:
# 1. install vercel and python dependencies
# 2. copy files from "source dir" to "intermediate dir"
# 2. generate files like model feat table, etc in "intermediate dir"
# 3. copy files to their right spots (e.g. langserve readme) in "intermediate dir"
# 4. build the docs from "intermediate dir" to "output dir"
SOURCE_DIR = docs/
INTERMEDIATE_DIR = build/intermediate/docs
@@ -13,50 +13,43 @@ OUTPUT_NEW_DOCS_DIR = $(OUTPUT_NEW_DIR)/docs
PYTHON = .venv/bin/python
PARTNER_DEPS_LIST := $(shell find ../libs/partners -mindepth 1 -maxdepth 1 -type d -exec sh -c ' \
for dir; do \
if find "$$dir" -maxdepth 1 -type f \( -name "pyproject.toml" -o -name "setup.py" \) | grep -q .; then \
echo "$$dir"; \
fi \
done' sh {} + | grep -vE "airbyte|ibm|databricks" | tr '\n' ' ')
PORT ?= 3001
clean:
rm -rf build
clean-cache:
rm -rf build .venv/deps_installed
install-vercel-deps:
yum -y -q update
yum -y -q install gcc bzip2-devel libffi-devel zlib-devel wget tar gzip rsync -y
yum -y update
yum install gcc bzip2-devel libffi-devel zlib-devel wget tar gzip rsync -y
install-py-deps:
@echo "📦 Installing Python dependencies..."
@if [ ! -d .venv ]; then python3 -m venv .venv; fi
@if [ ! -f .venv/deps_installed ]; then \
$(PYTHON) -m pip install -q --upgrade pip --disable-pip-version-check; \
$(PYTHON) -m pip install -q --upgrade uv; \
$(PYTHON) -m uv pip install -q --pre -r vercel_requirements.txt; \
$(PYTHON) -m uv pip install -q --pre $$($(PYTHON) scripts/partner_deps_list.py) --overrides vercel_overrides.txt; \
touch .venv/deps_installed; \
fi
@echo "✅ Dependencies installed"
python3 -m venv .venv
$(PYTHON) -m pip install --upgrade pip
$(PYTHON) -m pip install --upgrade uv
$(PYTHON) -m uv pip install --pre -r vercel_requirements.txt
$(PYTHON) -m uv pip install --pre --editable $(PARTNER_DEPS_LIST)
generate-files:
@echo "📄 Generating documentation files..."
mkdir -p $(INTERMEDIATE_DIR)
cp -rp $(SOURCE_DIR)/* $(INTERMEDIATE_DIR)
@if [ ! -f build/langserve_readme_cache.md ] || [ $$(find build/langserve_readme_cache.md -mtime +1 -print) ]; then \
echo "🌐 Downloading LangServe README..."; \
curl -s https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md | sed 's/<=/\&lt;=/g' > build/langserve_readme_cache.md; \
fi
cp build/langserve_readme_cache.md $(INTERMEDIATE_DIR)/langserve.md
cp ../SECURITY.md $(INTERMEDIATE_DIR)/security.md
@echo "🔧 Generating feature tables and processing links..."
$(PYTHON) scripts/tool_feat_table.py $(INTERMEDIATE_DIR) & \
$(PYTHON) scripts/kv_store_feat_table.py $(INTERMEDIATE_DIR) & \
$(PYTHON) scripts/partner_pkg_table.py $(INTERMEDIATE_DIR) & \
$(PYTHON) scripts/resolve_local_links.py $(INTERMEDIATE_DIR)/langserve.md https://github.com/langchain-ai/langserve/tree/main/ & \
wait
@echo "✅ Files generated"
$(PYTHON) scripts/tool_feat_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/kv_store_feat_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/partner_pkg_table.py $(INTERMEDIATE_DIR)
curl https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md | sed 's/<=/\&lt;=/g' > $(INTERMEDIATE_DIR)/langserve.md
$(PYTHON) scripts/resolve_local_links.py $(INTERMEDIATE_DIR)/langserve.md https://github.com/langchain-ai/langserve/tree/main/
copy-infra:
@echo "📂 Copying infrastructure files..."
mkdir -p $(OUTPUT_NEW_DIR)
cp -r src $(OUTPUT_NEW_DIR)
cp vercel.json $(OUTPUT_NEW_DIR)
@@ -66,24 +59,16 @@ copy-infra:
cp package.json $(OUTPUT_NEW_DIR)
cp sidebars.js $(OUTPUT_NEW_DIR)
cp -r static $(OUTPUT_NEW_DIR)
cp -r ../libs/cli/langchain_cli/integration_template $(OUTPUT_NEW_DIR)/src/theme
cp yarn.lock $(OUTPUT_NEW_DIR)
@echo "✅ Infrastructure files copied"
render:
@echo "📓 Converting notebooks (this may take a while)..."
$(PYTHON) scripts/notebook_convert.py $(INTERMEDIATE_DIR) $(OUTPUT_NEW_DOCS_DIR)
@echo "✅ Notebooks converted"
md-sync:
@echo "📝 Syncing markdown files..."
rsync -avmq --include="*/" --include="*.mdx" --include="*.md" --include="*.png" --include="*/_category_.yml" --exclude="*" $(INTERMEDIATE_DIR)/ $(OUTPUT_NEW_DOCS_DIR)
@echo "✅ Markdown files synced"
append-related:
@echo "🔗 Appending related links..."
$(PYTHON) scripts/append_related_links.py $(OUTPUT_NEW_DOCS_DIR)
@echo "✅ Related links appended"
generate-references:
$(PYTHON) scripts/generate_api_reference_links.py --docs_dir $(OUTPUT_NEW_DOCS_DIR)
@@ -91,15 +76,10 @@ generate-references:
update-md: generate-files md-sync
build: install-py-deps generate-files copy-infra render md-sync append-related
@echo ""
@echo "🎉 Documentation build complete!"
@echo "📖 To view locally, run: cd docs && make start"
@echo ""
vercel-build: install-vercel-deps build generate-references
rm -rf docs
mv $(OUTPUT_NEW_DOCS_DIR) docs
cp -r ../libs/cli/langchain_cli/integration_template src/theme
rm -rf build
mkdir static/api_reference
git clone --depth=1 https://github.com/langchain-ai/langchain-api-docs-html.git
@@ -108,9 +88,4 @@ vercel-build: install-vercel-deps build generate-references
NODE_OPTIONS="--max-old-space-size=5000" yarn run docusaurus build
start:
@echo "🚀 Starting documentation server on port $(PORT)..."
@echo "📖 Installing Node.js dependencies..."
cd $(OUTPUT_NEW_DIR) && yarn install --silent
@echo "🌐 Starting server at http://localhost:$(PORT)"
@echo "Press Ctrl+C to stop the server"
cd $(OUTPUT_NEW_DIR) && yarn start --port=$(PORT)
cd $(OUTPUT_NEW_DIR) && yarn && yarn start --port=$(PORT)

View File

@@ -108,7 +108,7 @@ class GalleryGridDirective(SphinxDirective):
# Parse the template with Sphinx Design to create an output container
# Prep the options for the template grid
class_ = "gallery-directive" + f" {self.options.get('class-container', '')}"
class_ = "gallery-directive" + f' {self.options.get("class-container", "")}'
options = {"gutter": 2, "class-container": class_}
options_str = "\n".join(f":{k}: {v}" for k, v in options.items())

View File

@@ -80,8 +80,6 @@
html {
--pst-font-family-base: 'Inter';
--pst-font-family-heading: 'Inter Tight', sans-serif;
--pst-icon-versionmodified-deprecated: var(--pst-icon-exclamation-triangle);
}
/*******************************************************************************
@@ -94,7 +92,7 @@ html {
* https://sass-lang.com/documentation/interpolation
*/
/* Defaults to light mode if data-theme is not set */
html:not([data-theme]), html[data-theme=light] {
html:not([data-theme]) {
--pst-color-primary: #287977;
--pst-color-primary-bg: #80D6D3;
--pst-color-secondary: #6F3AED;
@@ -124,8 +122,58 @@ html:not([data-theme]), html[data-theme=light] {
--pst-color-on-background: #F4F9F8;
--pst-color-surface: #F4F9F8;
--pst-color-on-surface: #222832;
--pst-color-deprecated: #f47d2e;
--pst-color-deprecated-bg: #fff3e8;
}
html:not([data-theme]) {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html:not([data-theme]) .only-dark,
html:not([data-theme]) .only-dark ~ figcaption {
display: none !important;
}
/* NOTE: @each {...} is like a for-loop
* https://sass-lang.com/documentation/at-rules/control/each
*/
html[data-theme=light] {
--pst-color-primary: #287977;
--pst-color-primary-bg: #80D6D3;
--pst-color-secondary: #6F3AED;
--pst-color-secondary-bg: #DAD6FE;
--pst-color-accent: #c132af;
--pst-color-accent-bg: #f8dff5;
--pst-color-info: #276be9;
--pst-color-info-bg: #dce7fc;
--pst-color-warning: #f66a0a;
--pst-color-warning-bg: #f8e3d0;
--pst-color-success: #00843f;
--pst-color-success-bg: #d6ece1;
--pst-color-attention: var(--pst-color-warning);
--pst-color-attention-bg: var(--pst-color-warning-bg);
--pst-color-danger: #d72d47;
--pst-color-danger-bg: #f9e1e4;
--pst-color-text-base: #222832;
--pst-color-text-muted: #48566b;
--pst-color-heading-color: #ffffff;
--pst-color-shadow: rgba(0, 0, 0, 0.1);
--pst-color-border: #d1d5da;
--pst-color-border-muted: rgba(23, 23, 26, 0.2);
--pst-color-inline-code: #912583;
--pst-color-inline-code-links: #246161;
--pst-color-target: #f3cf95;
--pst-color-background: #ffffff;
--pst-color-on-background: #F4F9F8;
--pst-color-surface: #F4F9F8;
--pst-color-on-surface: #222832;
color-scheme: light;
}
html[data-theme=light] {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html[data-theme=light] .only-dark,
html[data-theme=light] .only-dark ~ figcaption {
display: none !important;
}
html[data-theme=dark] {
@@ -158,8 +206,6 @@ html[data-theme=dark] {
--pst-color-on-background: #222832;
--pst-color-surface: #29313d;
--pst-color-on-surface: #f3f4f5;
--pst-color-deprecated: #b46f3e;
--pst-color-deprecated-bg: #341906;
/* Adjust images in dark mode (unless they have class .only-dark or
* .dark-light, in which case assume they're already optimized for dark
* mode).
@@ -170,30 +216,6 @@ html[data-theme=dark] {
*/
color-scheme: dark;
}
html:not([data-theme]) {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html:not([data-theme]) .only-dark,
html:not([data-theme]) .only-dark ~ figcaption {
display: none !important;
}
/* NOTE: @each {...} is like a for-loop
* https://sass-lang.com/documentation/at-rules/control/each
*/
html[data-theme=light] {
color-scheme: light;
}
html[data-theme=light] {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html[data-theme=light] .only-dark,
html[data-theme=light] .only-dark ~ figcaption {
display: none !important;
}
html[data-theme=dark] {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
@@ -328,7 +350,7 @@ html[data-theme=dark] .MathJax_SVG * {
}
.bd-sidebar-primary {
width: max-content; /* Adjust this value to your preference */
width: 22%; /* Adjust this value to your preference */
line-height: 1.4;
}
@@ -367,13 +389,6 @@ html[data-theme=dark] .MathJax_SVG * {
div.deprecated {
margin-top: 0.5em;
margin-bottom: 2em;
background-color: var(--pst-color-deprecated-bg);
border-color: var(--pst-color-deprecated);
}
span.versionmodified.deprecated:before {
color: var(--pst-color-deprecated);
}
.admonition-beta.admonition, div.admonition-beta.admonition {
@@ -393,4 +408,4 @@ dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.
p {
font-size: 0.9rem;
margin-bottom: 0.5rem;
}
}

View File

@@ -11,7 +11,6 @@
import json
import os
import sys
from datetime import datetime
from pathlib import Path
import toml
@@ -88,24 +87,12 @@ class Beta(BaseAdmonition):
def setup(app):
app.add_directive("example_links", ExampleLinksDirective)
app.add_directive("beta", Beta)
app.connect("autodoc-skip-member", skip_private_members)
def skip_private_members(app, what, name, obj, skip, options):
if skip:
return True
if hasattr(obj, "__doc__") and obj.__doc__ and ":private:" in obj.__doc__:
return True
if name == "__init__" and obj.__objclass__ is object:
# don't document default init
return True
return None
# -- Project information -----------------------------------------------------
project = "🦜🔗 LangChain"
copyright = f"{datetime.now().year}, LangChain Inc"
copyright = "2023, LangChain Inc"
author = "LangChain, Inc"
html_favicon = "_static/img/brand/favicon.png"
@@ -129,7 +116,6 @@ extensions = [
"_extensions.gallery_directive",
"sphinx_design",
"sphinx_copybutton",
"sphinxcontrib.googleanalytics",
]
source_suffix = [".rst", ".md"]
@@ -262,8 +248,6 @@ myst_enable_extensions = ["colon_fence"]
# generate autosummary even if no references
autosummary_generate = True
# Don't fail on autosummary import warnings
autosummary_ignore_module_all = False
html_copy_source = False
html_show_sourcelink = False
@@ -271,14 +255,8 @@ html_show_sourcelink = False
# Set canonical URL from the Read the Docs Domain
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "")
googleanalytics_id = "G-9B66JQQH2F"
# Tell Jinja2 templates the build is running on Read the Docs
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
master_doc = "index"
# If a signatures length in characters exceeds 60,
# each parameter within the signature will be displayed on an individual logical line
maximum_signature_line_length = 60

View File

@@ -72,21 +72,14 @@ def _load_module_members(module_path: str, namespace: str) -> ModuleMembers:
Returns:
list: A list of loaded module objects.
"""
classes_: List[ClassInfo] = []
functions: List[FunctionInfo] = []
module = importlib.import_module(module_path)
if ":private:" in (module.__doc__ or ""):
return ModuleMembers(classes_=[], functions=[])
for name, type_ in inspect.getmembers(module):
if not hasattr(type_, "__module__"):
continue
if type_.__module__ != module_path:
continue
if ":private:" in (type_.__doc__ or ""):
continue
if inspect.isclass(type_):
# The type of the class is used to select a template
@@ -97,7 +90,7 @@ def _load_module_members(module_path: str, namespace: str) -> ModuleMembers:
if type(type_) is typing_extensions._TypedDictMeta: # type: ignore
kind: ClassKind = "TypedDict"
elif type(type_) is typing._TypedDictMeta: # type: ignore
kind = "TypedDict"
kind: ClassKind = "TypedDict"
elif (
issubclass(type_, Runnable)
and issubclass(type_, BaseModel)
@@ -189,7 +182,7 @@ def _load_package_modules(
if isinstance(package_directory, str)
else package_directory
)
modules_by_namespace: Dict[str, ModuleMembers] = {}
modules_by_namespace = {}
# Get the high level package name
package_name = package_path.name
@@ -202,12 +195,6 @@ def _load_package_modules(
if file_path.name.startswith("_"):
continue
if "integration_template" in file_path.parts:
continue
if "project_template" in file_path.parts:
continue
relative_module_name = file_path.relative_to(package_path)
# Skip if any module part starts with an underscore
@@ -273,7 +260,7 @@ def _construct_doc(
.. _{package_namespace}:
======================================
{package_namespace.replace("_", "-")}: {package_version}
{package_namespace.replace('_', '-')}: {package_version}
======================================
.. automodule:: {package_namespace}
@@ -283,7 +270,7 @@ def _construct_doc(
.. toctree::
:hidden:
:maxdepth: 2
"""
index_autosummary = """
"""
@@ -331,7 +318,7 @@ def _construct_doc(
index_autosummary += f"""
:ref:`{package_namespace}_{module}`
{"^" * (len(package_namespace) + len(module) + 8)}
{'^' * (len(package_namespace) + len(module) + 8)}
"""
if classes:
@@ -365,12 +352,12 @@ def _construct_doc(
module_doc += f"""\
:template: {template}
{class_["qualified_name"]}
"""
index_autosummary += f"""
{class_["qualified_name"]}
{class_['qualified_name']}
"""
if functions:
@@ -433,7 +420,7 @@ def _construct_doc(
"""
index_autosummary += f"""
{class_["qualified_name"]}
{class_['qualified_name']}
"""
if deprecated_functions:
@@ -492,16 +479,23 @@ def _package_namespace(package_name: str) -> str:
Returns:
modified package_name: Can be either "langchain" or "langchain_{package_name}"
"""
if package_name == "langchain":
return "langchain"
if package_name == "standard-tests":
return "langchain_tests"
return f"langchain_{package_name.replace('-', '_')}"
return (
package_name
if package_name == "langchain"
else f"langchain_{package_name.replace('-', '_')}"
)
def _package_dir(package_name: str = "langchain") -> Path:
"""Return the path to the directory containing the documentation."""
if (ROOT_DIR / "libs" / package_name).exists():
if package_name in (
"langchain",
"experimental",
"community",
"core",
"cli",
"text-splitters",
):
return ROOT_DIR / "libs" / package_name / _package_namespace(package_name)
else:
return (
@@ -526,12 +520,7 @@ def _get_package_version(package_dir: Path) -> str:
"Aborting the build."
)
exit(1)
try:
# uses uv
return pyproject["project"]["version"]
except KeyError:
# uses poetry
return pyproject["tool"]["poetry"]["version"]
return pyproject["tool"]["poetry"]["version"]
def _out_file_path(package_name: str) -> Path:
@@ -545,20 +534,13 @@ def _build_index(dirs: List[str]) -> None:
"ai21": "AI21",
"ibm": "IBM",
}
ordered = [
"core",
"langchain",
"text-splitters",
"community",
"experimental",
"standard-tests",
]
ordered = ["core", "langchain", "text-splitters", "community", "experimental"]
main_ = [dir_ for dir_ in ordered if dir_ in dirs]
integrations = sorted(dir_ for dir_ in dirs if dir_ not in main_)
doc = """# LangChain Python API Reference
Welcome to the LangChain Python API reference. This is a reference for all
`langchain-x` packages.
Welcome to the LangChain Python API reference. This is a reference for all
`langchain-x` packages.
For user guides see [https://python.langchain.com](https://python.langchain.com).
@@ -597,12 +579,7 @@ For the legacy API reference hosted on ReadTheDocs see [https://api.python.langc
if integrations:
integration_headers = [
" ".join(
custom_names.get(
x,
x.title().replace("db", "DB")
if dir_ == "langchain_v1"
else x.title().replace("ai", "AI").replace("db", "DB"),
)
custom_names.get(x, x.title().replace("ai", "AI").replace("db", "DB"))
for x in dir_.split("-")
)
for dir_ in integrations
@@ -670,12 +647,17 @@ def main(dirs: Optional[list] = None) -> None:
print("Starting to build API reference files.")
if not dirs:
dirs = [
p.parent.name
for p in (ROOT_DIR / "libs").rglob("pyproject.toml")
# Exclude packages that are not directly under libs/ or libs/partners/
if p.parent.parent.name in ("libs", "partners")
dir_
for dir_ in os.listdir(ROOT_DIR / "libs")
if dir_ not in ("cli", "partners", "packages.yml")
]
for dir_ in sorted(dirs):
dirs += [
dir_
for dir_ in os.listdir(ROOT_DIR / "libs" / "partners")
if os.path.isdir(ROOT_DIR / "libs" / "partners" / dir_)
and "pyproject.toml" in os.listdir(ROOT_DIR / "libs" / "partners" / dir_)
]
for dir_ in dirs:
# Skip any hidden directories
# Some of these could be present by mistake in the code base
# e.g., .pytest_cache from running tests from the wrong location.
@@ -686,7 +668,7 @@ def main(dirs: Optional[list] = None) -> None:
print("Building package:", dir_)
_build_rst_file(package_name=dir_)
_build_index(sorted(dirs))
_build_index(dirs)
print("API reference files built.")

File diff suppressed because one or more lines are too long

View File

@@ -9,4 +9,3 @@ pyyaml
sphinx-design
sphinx-copybutton
beautifulsoup4
sphinxcontrib-googleanalytics

View File

@@ -7,7 +7,7 @@
.. NOTE:: {{objname}} implements the standard :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>`. 🏃
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_config <langchain_core.runnables.base.Runnable.with_config>`, :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
{% block attributes %}
{% if attributes %}

View File

@@ -19,6 +19,6 @@
.. NOTE:: {{objname}} implements the standard :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>`. 🏃
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_config <langchain_core.runnables.base.Runnable.with_config>`, :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
.. example_links:: {{ objname }}

File diff suppressed because one or more lines are too long

Some files were not shown because too many files have changed in this diff Show More