mirror of https://github.com/hwchase17/langchain.git synced 2026-04-23 20:23:59 +00:00


Mason Daugherty f5f715985a chore: rework PR title and description guidance (#36917)

Rework the PR and commit guidance in the agent guidelines so new
contributors (human and AI) produce descriptions and titles that age
well.

2026-04-21 12:03:54 -04:00


Global development guidelines for the LangChain monorepo

This document provides context to understand the LangChain Python project and assist with development.

Project architecture and context

Monorepo structure

This is a Python monorepo with multiple independently versioned packages that use uv.

langchain/
├── libs/
│   ├── core/             # `langchain-core` primitives and base abstractions
│   ├── langchain/        # `langchain-classic` (legacy, no new features)
│   ├── langchain_v1/     # Actively maintained `langchain` package
│   ├── partners/         # Third-party integrations
│   │   ├── openai/       # OpenAI models and embeddings
│   │   ├── anthropic/    # Anthropic (Claude) integration
│   │   ├── ollama/       # Local model support
│   │   └── ... (other integrations maintained by the LangChain team)
│   ├── text-splitters/   # Document chunking utilities
│   ├── standard-tests/   # Shared test suite for integrations
│   └── model-profiles/   # Model configuration profiles
├── .github/              # CI/CD workflows and templates
├── .vscode/              # VSCode IDE standard settings and recommended extensions
└── README.md             # Information about LangChain

Core layer (langchain-core): Base abstractions, interfaces, and protocols. Users should not need to know about this layer directly.
Implementation layer (langchain): Concrete implementations and high-level public utilities
Integration layer (partners/): Third-party service integrations. Note that this monorepo is not exhaustive of all LangChain integrations; some are maintained in separate repos, such as langchain-ai/langchain-google and langchain-ai/langchain-aws. Usually these repos are cloned at the same level as this monorepo, so if needed, you can refer to their code directly by navigating to ../langchain-google/ from this monorepo.
Testing layer (standard-tests/): Standardized integration tests for partner integrations

Development tools & commands

uv – Fast Python package installer and resolver (replaces pip/poetry)
make – Task runner for common development commands. Feel free to look at the Makefile for available commands and usage patterns.
ruff – Fast Python linter and formatter
mypy – Static type checking
pytest – Testing framework

This monorepo uses uv for dependency management. Local development uses editable installs, declared under [tool.uv.sources] in each package's pyproject.toml.

Each package in libs/ has its own pyproject.toml and uv.lock.

Before running your tests, set up all packages by running:

# For all groups
uv sync --all-groups

# or, to install a specific group only:
uv sync --group test

# Run unit tests (no network)
make test

# Run specific test file
uv run --group test pytest tests/unit_tests/test_specific.py

# Lint code
make lint

# Format code
make format

# Type checking
uv run --group lint mypy .

Key config files

pyproject.toml: Main workspace configuration with dependency groups
uv.lock: Locked dependencies for reproducible builds
Makefile: Development tasks

PR and commit titles

Follow Conventional Commits. See .github/workflows/pr_lint.yml for allowed types and scopes. All titles must include a scope with no exceptions — even for the main langchain package.

Start the text after type(scope): with a lowercase letter, unless the first word is a proper noun (e.g. Azure, GitHub, OpenAI) or a named entity (class, function, method, parameter, or variable name).
Wrap named entities in backticks so they render as code. Proper nouns are left unadorned.
Keep titles short and descriptive — save detail for the body.

Examples:

feat(langchain): add new chat completion feature
fix(core): resolve type hinting issue in vector store
chore(anthropic): update infrastructure dependencies
feat(langchain): `ls_agent_type` tag on `create_agent` calls
fix(openai): infer Azure chat profiles from model name
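
The title rules above can be checked mechanically. The following is a minimal sketch and not the logic used by .github/workflows/pr_lint.yml: `ALLOWED_TYPES`, `TITLE_PATTERN`, and `check_title` are hypothetical names, and the type list is an assumed subset (the authoritative types and scopes live in the workflow file).

```python
import re

# Assumed subset of types; the authoritative list is in pr_lint.yml.
ALLOWED_TYPES = {"feat", "fix", "chore", "docs", "test", "refactor"}

# `type(scope): description`, where the scope is mandatory.
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)\((?P<scope>[a-z0-9_-]+)\): (?P<description>\S.*)$"
)


def check_title(title: str, proper_nouns: frozenset[str] = frozenset()) -> list[str]:
    """Return the problems found in a PR title.

    Args:
        title: The full PR or commit title.
        proper_nouns: Words that may legitimately start with a capital letter.

    Returns:
        Human-readable problems. An empty list means the title passed.
    """
    match = TITLE_PATTERN.match(title)
    if match is None:
        return ["title must look like `type(scope): description`"]

    problems: list[str] = []
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match['type']}")

    description = match["description"]
    first_word = description.split()[0]
    # Backtick-wrapped named entities and proper nouns may start the text.
    is_exempt = first_word.startswith("`") or first_word in proper_nouns
    if description[0].isupper() and not is_exempt:
        problems.append("description should start with a lowercase letter")
    return problems
```

For example, `check_title("fix: missing scope")` reports a problem because the scope is absent, while each of the example titles above returns an empty list.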

PR descriptions

The description is the summary — do not add a # Summary header.

When the PR closes an issue, lead with the closing keyword on its own line at the very top, followed by a horizontal rule and then the body:
```
Closes #123

---

<rest of description>
```
Only GitHub's closing keywords (Closes, Fixes, Resolves, and their variants such as Fix or Resolved) auto-close the referenced issue on merge. Related: or similar labels are informational and do not close anything.
Explain the why: the motivation and why this solution is the right one. Limit prose.
Write for readers who may be unfamiliar with this area of the codebase. Avoid insider shorthand and prefer language that is friendly to public viewers — this aids interpretability.
Do not cite line numbers; they go stale as soon as the file changes.
Rarely include full file paths or filenames. Reference the affected symbol, class, or subsystem by name instead.
Wrap class, function, method, parameter, and variable names in backticks.
Skip dedicated "Test plan" or "Testing" sections in most cases. Mention tests only when coverage is non-obvious, risky, or otherwise notable.
Call out areas of the change that require careful review.
Add a brief disclaimer noting AI-agent involvement in the contribution.
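
The structural rules in this list (no `# Summary` header, closing keyword first, then a horizontal rule) lend themselves to a quick self-check before opening a PR. This is only an illustrative sketch: `check_description` and `CLOSING_LINE` are hypothetical helpers, not part of the repository's tooling, and only the three keyword spellings used in this guide are recognized.

```python
import re

# Matches a line such as "Closes #123" (only the spellings used in this guide).
CLOSING_LINE = re.compile(r"^(Closes|Fixes|Resolves) #\d+$")


def check_description(description: str) -> list[str]:
    """Return structural problems found in a PR description.

    Args:
        description: The raw PR description text.

    Returns:
        Human-readable problems. An empty list means no problem was found.
    """
    lines = [line.rstrip() for line in description.strip().splitlines()]
    problems: list[str] = []

    if any(line.lower() == "# summary" for line in lines):
        problems.append("remove the `# Summary` header")

    closing = [index for index, line in enumerate(lines) if CLOSING_LINE.match(line)]
    if closing:
        if closing[0] != 0:
            problems.append("put the closing keyword on the first line")
        elif lines[1:3] != ["", "---"]:
            problems.append("follow the closing keyword with a blank line and `---`")
    return problems
```

The check is deliberately narrow: it cannot judge whether the description explains the why, so it complements a human read-through and does not replace it.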

Core development principles

Maintain stable public interfaces

CRITICAL: Always attempt to preserve function signatures, argument positions, and names for exported/public methods. Do not make breaking changes. Warn the developer about any function signature change, regardless of whether it looks breaking or not.

Before making ANY changes to public APIs:

Check if the function/class is exported in __init__.py
Look for existing usage patterns in tests and examples
Use keyword-only arguments for new parameters: *, new_param: str = "default"
Mark experimental features clearly with docstring warnings (using MkDocs Material admonitions, like !!! warning)

Ask: "Would this change break someone's code if they used it last week?"
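
As an illustration of the keyword-only rule, the sketch below extends a function with a new parameter without disturbing existing positional callers. `truncate` and its `suffix` parameter are hypothetical and do not exist in the codebase.

```python
def truncate(text: str, max_length: int = 20, *, suffix: str = "") -> str:
    """Shorten `text` to at most `max_length` characters.

    Args:
        text: The text to shorten.
        max_length: Maximum length of the returned string.
        suffix: Marker appended when the text is cut. Added after the first
            release, so it is keyword-only and existing positional calls
            keep working.

    Returns:
        The original text if it fits, otherwise a shortened copy.
    """
    if len(text) <= max_length:
        return text
    # Reserve room for the suffix so the result never exceeds `max_length`.
    keep = max(max_length - len(suffix), 0)
    return (text[:keep] + suffix)[:max_length]
```

Existing calls such as `truncate("hello world", 5)` behave exactly as before, and a caller that tries to pass `suffix` positionally gets a `TypeError` instead of silently binding the value to the wrong parameter.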

Code quality standards

All Python code MUST include type hints and return types.

def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
    """Single line description of the function.

    Any additional context about the function can go here.

    Args:
        users: List of user identifiers to filter.
        known_users: Set of known/valid user identifiers.

    Returns:
        List of users that are not in the `known_users` set.
    """
    return [user for user in users if user not in known_users]

Use descriptive, self-explanatory variable names.
Follow existing patterns in the codebase you're modifying
Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense

Testing requirements

Every new feature or bugfix MUST be covered by unit tests.

Unit tests: tests/unit_tests/ (no network calls allowed)
Integration tests: tests/integration_tests/ (network calls permitted)
We use pytest as the testing framework; if in doubt, check other existing tests for examples.
The testing file structure should mirror the source code structure.

Checklist:

Tests fail when your new logic is broken
Happy path is covered
Edge cases and error conditions are tested
Use fixtures/mocks for external dependencies
Tests are deterministic (no flaky tests)
Does the test suite fail if your new logic is broken?
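
A unit test that satisfies this checklist might look like the sketch below. Everything in it is hypothetical (`UserClient`, `fetch_display_name`, and the test names). The external dependency is replaced with `unittest.mock.Mock`, so no network call is made and the result is deterministic. The tests use plain `assert` statements, which pytest collects as-is.

```python
from typing import Optional, Protocol
from unittest.mock import Mock


class UserClient(Protocol):
    """Hypothetical external dependency that would normally hit the network."""

    def get_user(self, user_id: str) -> Optional[dict[str, str]]: ...


def fetch_display_name(client: UserClient, user_id: str) -> str:
    """Hypothetical function under test: look up a user and format the name."""
    response = client.get_user(user_id)
    if response is None:
        msg = f"Unknown user: {user_id}"
        raise LookupError(msg)
    return response["name"].strip().title()


def test_fetch_display_name_formats_name() -> None:
    # Happy path: the mock stands in for the real client.
    client = Mock()
    client.get_user.return_value = {"name": "  ada lovelace "}
    assert fetch_display_name(client, "u1") == "Ada Lovelace"
    client.get_user.assert_called_once_with("u1")


def test_fetch_display_name_raises_for_unknown_user() -> None:
    # Error condition: the client reports that the user does not exist.
    client = Mock()
    client.get_user.return_value = None
    try:
        fetch_display_name(client, "missing")
    except LookupError as error:
        assert "missing" in str(error)
    else:
        raise AssertionError("expected LookupError")
```

Removing the `.title()` call or the `None` check makes one of these tests fail, which is the property the first checklist item asks for.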

Security and risk assessment

No eval(), exec(), or pickle on user-controlled input
Handle exceptions properly (no bare except:) and assign error messages to a msg variable before raising
Remove unreachable/commented code before committing
Check for race conditions and resource leaks (file handles, sockets, threads)
Ensure proper resource cleanup (file handles, connections)
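
Several of these points can be seen together in one small function. The sketch below is illustrative only (`load_settings` is a hypothetical helper). It parses untrusted input with `json` instead of `pickle` or `eval()`, catches specific exceptions, builds each error message in a `msg` variable, and relies on a context manager for cleanup.

```python
import json
from pathlib import Path


def load_settings(path: Path) -> dict[str, object]:
    """Load a JSON settings file.

    Args:
        path: Location of the settings file.

    Returns:
        The parsed settings.

    Raises:
        ValueError: If the file is missing, is not valid JSON, or does not
            contain a JSON object.
    """
    try:
        # The context manager closes the file handle even if parsing fails.
        with path.open(encoding="utf-8") as file:
            settings = json.load(file)
    except FileNotFoundError as error:
        msg = f"Settings file not found: {path}"
        raise ValueError(msg) from error
    except json.JSONDecodeError as error:
        msg = f"Settings file is not valid JSON: {path}"
        raise ValueError(msg) from error

    if not isinstance(settings, dict):
        msg = f"Expected a JSON object in {path}"
        raise ValueError(msg)
    return settings
```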

Documentation standards

Use Google-style docstrings with Args section for all public functions.

def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
    """Send an email to a recipient with specified priority.

    Any additional context about the function can go here.

    Args:
        to: The email address of the recipient.
        msg: The message body to send.
        priority: Email priority level.

    Returns:
        `True` if email was sent successfully, `False` otherwise.

    Raises:
        InvalidEmailError: If the email address format is invalid.
        SMTPConnectionError: If unable to connect to email server.
    """

Types go in function signatures, NOT in docstrings
If a default is present, DO NOT repeat it in the docstring unless there is post-processing or it is set conditionally.
Focus on "why" rather than "what" in descriptions
Document all parameters, return values, and exceptions
Keep descriptions concise but clear
Ensure American English spelling (e.g., "behavior", not "behaviour")
Do NOT use Sphinx-style double backtick formatting (``code``). Use single backticks (`code`) for inline code references in docstrings and comments.
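
The rule about defaults is easiest to see side by side. In this hypothetical sketch (`build_user_agent` and the `APP_VERSION` environment variable are invented for illustration), `separator` has a plain default that the docstring does not repeat, while `version` is resolved conditionally, so its fallback behavior is documented.

```python
import os
from typing import Optional


def build_user_agent(
    app_name: str, *, version: Optional[str] = None, separator: str = "/"
) -> str:
    """Build the `User-Agent` value sent with outgoing requests.

    Args:
        app_name: Name of the calling application.
        version: Version to advertise. If not provided, falls back to the
            `APP_VERSION` environment variable and then to `"unknown"`.
        separator: String placed between the name and the version.

    Returns:
        The formatted `User-Agent` value.
    """
    resolved = version or os.environ.get("APP_VERSION", "unknown")
    return f"{app_name}{separator}{resolved}"
```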

Model references in docs and examples

Always use the latest generally available (GA) models when referencing LLMs in docstrings and illustrative code snippets. Avoid preview or beta identifiers unless the model has no GA equivalent. Outdated model names signal stale code and confuse users.

Before writing or updating model references, verify current model IDs against the provider's official docs. Do not rely on memorized or cached model names — they go stale quickly.

Changing shipped default parameter values in code (e.g., a model= kwarg default in a class constructor) may constitute a breaking change — see "Maintain stable public interfaces" above. This guidance applies to documentation and examples, not code defaults.

For model profile data (capability flags, context windows), use the langchain-profiles CLI described below.

Model profiles

Model profiles are generated using the langchain-profiles CLI in libs/model-profiles. The --data-dir must point to the directory containing profile_augmentations.toml, not the top-level package directory.

# Run from libs/model-profiles
cd libs/model-profiles

# Refresh profiles for a partner in this repo
uv run langchain-profiles refresh --provider openai --data-dir ../partners/openai/langchain_openai/data

# Refresh profiles for a partner in an external repo (requires echo y to confirm)
echo y | uv run langchain-profiles refresh --provider google --data-dir /path/to/langchain-google/libs/genai/langchain_google_genai/data

Example partners with profiles in this repo:

libs/partners/openai/langchain_openai/data/ (provider: openai)
libs/partners/anthropic/langchain_anthropic/data/ (provider: anthropic)
libs/partners/perplexity/langchain_perplexity/data/ (provider: perplexity)

The echo y | pipe is required when --data-dir is outside the libs/model-profiles working directory.

CI/CD infrastructure

Release process

Releases are triggered manually via .github/workflows/_release.yml with working-directory and release-version inputs.

PR labeling and linting

Title linting (.github/workflows/pr_lint.yml)

Auto-labeling:

.github/workflows/pr_labeler.yml – Unified PR labeler (size, file, title, external/internal, contributor tier)
.github/workflows/pr_labeler_backfill.yml – Manual backfill of PR labels on open PRs
.github/workflows/auto-label-by-package.yml – Issue labeling by package
.github/workflows/tag-external-issues.yml – Issue external/internal classification

Adding a new partner to CI

When adding a new partner package, update these files:

.github/ISSUE_TEMPLATE/*.yml – Add to package dropdown
.github/dependabot.yml – Add dependency update entry
.github/scripts/pr-labeler-config.json – Add file rule and scope-to-label mapping
.github/workflows/_release.yml – Add API key secrets if needed
.github/workflows/auto-label-by-package.yml – Add package label
.github/workflows/check_diffs.yml – Add to change detection
.github/workflows/integration_tests.yml – Add integration test config
.github/workflows/pr_lint.yml – Add to allowed scopes

GitHub Actions & Workflows

This repository requires actions to be pinned to a full-length commit SHA; referencing a tag will fail. Use the gh CLI to look up the SHA for a release tag, and verify that the tag is not an annotated tag object, which must be dereferenced to reach the underlying commit.
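
A quick local check for unpinned references could look like the sketch below. It is an assumption-laden illustration, not the repository's enforcement mechanism: `find_unpinned_actions` is a hypothetical helper, and it only checks that the ref is 40 lowercase hexadecimal characters. It does not confirm that the SHA exists or that it points at a commit and not an annotated tag object.

```python
import re

# Matches `uses: owner/repo[/path]@ref`, optionally preceded by a list dash.
USES_PATTERN = re.compile(r"^\s*(?:-\s*)?uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_actions(workflow_text: str) -> list[str]:
    """Return `action@ref` entries whose ref is not a full-length SHA.

    Args:
        workflow_text: Contents of a GitHub Actions workflow file.

    Returns:
        The offending references in the order they appear.
    """
    unpinned: list[str] = []
    for line in workflow_text.splitlines():
        match = USES_PATTERN.match(line)
        # Local actions (`uses: ./path`) have no `@ref`, so they never match.
        if match and not FULL_SHA.match(match["ref"]):
            unpinned.append(f"{match['action']}@{match['ref']}")
    return unpinned
```

Given a workflow that contains `uses: actions/checkout@v4`, the function reports that entry; a reference pinned to a 40-character SHA with a trailing `# v4` comment passes.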

Additional resources

Documentation: https://docs.langchain.com/oss/python/langchain/overview and source at https://github.com/langchain-ai/docs or ../docs/. Prefer the local install and use file search tools for best results. If needed, use the docs MCP server as defined in .mcp.json for programmatic access.
Contributing Guide: Contributing Guide
