langchain/.github/copilot-instructions.md
Mason Daugherty 5e9eb19a83 chore: update branch with changes from master (#32277)
Co-authored-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: jmaillefaud <jonathan.maillefaud@evooq.ch>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: tanwirahmad <tanwirahmad@users.noreply.github.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: niceg <79145285+growmuye@users.noreply.github.com>
Co-authored-by: Chaitanya varma <varmac301@gmail.com>
Co-authored-by: dishaprakash <57954147+dishaprakash@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Kanav Bansal <13186335+bansalkanav@users.noreply.github.com>
Co-authored-by: Aleksandr Filippov <71711753+alex-feel@users.noreply.github.com>
Co-authored-by: Alex Feel <afilippov@spotware.com>
2025-07-28 10:39:41 -04:00


Global Development Guidelines for LangChain Projects

Core Development Principles

1. Maintain Stable Public Interfaces ⚠️ CRITICAL

Always attempt to preserve function signatures, argument positions, and argument names for exported/public functions and methods.

Bad - Breaking Change:

def get_user(id, verbose=False):  # Changed from `user_id`
    pass

Good - Stable Interface:

def get_user(user_id: str, verbose: bool = False) -> User:
    """Retrieve user by ID with optional verbose output."""
    pass

Before making ANY changes to public APIs:

  • Check if the function/class is exported in __init__.py
  • Look for existing usage patterns in tests and examples
  • Use keyword-only arguments for new parameters: *, new_param: str = "default"
  • Mark experimental features clearly with docstring warnings (using reStructuredText, like .. warning::)

🧠 Ask yourself: "Would this change break someone's code if they used it last week?"
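Here is a minimal sketch of the keyword-only rule. It assumes a hypothetical get_user that originally accepted only user_id and verbose, plus a hypothetical User record. The new include_email parameter comes after a bare *, so callers can only pass it by name. Every existing positional call keeps its meaning.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user record used only for this illustration."""

    user_id: str
    include_email: bool = False


def get_user(user_id: str, verbose: bool = False, *, include_email: bool = False) -> User:
    """Retrieve a user by ID.

    Args:
        user_id: Identifier of the user to fetch.
        verbose: Whether to print progress information.
        include_email: Whether to include the email address. Added later as a
            keyword-only argument so existing positional calls are unaffected.
    """
    if verbose:
        print(f"Fetching user {user_id}")
    return User(user_id=user_id, include_email=include_email)


# Existing call sites keep working unchanged.
old_style = get_user("u1", False)
# The new behavior must be requested by name.
new_style = get_user("u1", include_email=True)
```

Because the parameter is keyword-only, you can add or reorder future parameters after the * without changing what any existing positional call means.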

2. Code Quality Standards

All Python code MUST include type hints and return types.

Bad:

def p(u, d):
    return [x for x in u if x not in d]

Good:

def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
    """Return only the users that are not in the known users set.

    Args:
        users: List of user identifiers to filter.
        known_users: Set of known/valid user identifiers.

    Returns:
        List of users that are not in the known_users set.
    """
    return [user for user in users if user not in known_users]

Style Requirements:

  • Use descriptive, self-explanatory variable names. Avoid overly short or cryptic identifiers.
  • Break up complex functions (>20 lines) into smaller, focused functions where it makes sense
  • Avoid unnecessary abstraction or premature optimization
  • Follow existing patterns in the codebase you're modifying

3. Testing Requirements

Every new feature or bugfix MUST be covered by unit tests.

Test Organization:

  • Unit tests: tests/unit_tests/ (no network calls allowed)
  • Integration tests: tests/integration_tests/ (network calls permitted)
  • Use pytest as the testing framework

Test Quality Checklist:

  • Tests fail when your new logic is broken
  • Happy path is covered
  • Edge cases and error conditions are tested
  • Use fixtures/mocks for external dependencies
  • Tests are deterministic (no flaky tests)

Checklist questions:

  • Does the test suite fail if your new logic is broken?
  • Are all expected behaviors exercised (happy path, invalid input, etc.)?
  • Do tests use fixtures or mocks where needed?

Example:

def test_filter_unknown_users() -> None:
    """Test filtering unknown users from a list."""
    users = ["alice", "bob", "charlie"]
    known_users = {"alice", "bob"}

    result = filter_unknown_users(users, known_users)

    assert result == ["charlie"]
    assert len(result) == 1
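To keep unit tests free of network calls, pass the external dependency in and replace it with a mock in the test. The sketch below uses the standard library's unittest.mock. fetch_display_name, the UserClient protocol, and its get_user method are hypothetical stand-ins for real code. pytest collects the plain test_* functions and assert statements as they are.

```python
from typing import Protocol
from unittest.mock import Mock


class UserClient(Protocol):
    """Hypothetical interface of a network client for a user service."""

    def get_user(self, user_id: str) -> dict: ...


def fetch_display_name(client: UserClient, user_id: str) -> str:
    """Return the user's display name, falling back to the ID."""
    if not user_id:
        msg = "user_id must be a non-empty string"
        raise ValueError(msg)
    record = client.get_user(user_id)
    return record.get("name") or user_id


def test_fetch_display_name_uses_client() -> None:
    # Happy path: the mock stands in for the real network client.
    client = Mock()
    client.get_user.return_value = {"name": "Alice"}

    assert fetch_display_name(client, "u1") == "Alice"
    client.get_user.assert_called_once_with("u1")


def test_fetch_display_name_falls_back_to_id() -> None:
    # Edge case: the service returns a record without a name.
    client = Mock()
    client.get_user.return_value = {}

    assert fetch_display_name(client, "u2") == "u2"


def test_fetch_display_name_rejects_empty_id() -> None:
    # Error condition: invalid input never reaches the client.
    client = Mock()
    try:
        fetch_display_name(client, "")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    client.get_user.assert_not_called()
```

In a real pytest suite, `with pytest.raises(ValueError):` is the more idiomatic way to write the error-condition test. The try/except here only keeps the sketch free of third-party imports.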

4. Security and Risk Assessment

Security Checklist:

  • No eval(), exec(), or pickle on user-controlled input
  • Proper exception handling: no bare except:, and assign error messages to a msg variable before raising
  • Remove unreachable or commented-out code before committing
  • No race conditions in concurrent code
  • No resource leaks: clean up file handles, sockets, connections, and threads (e.g. with context managers)

Bad:

def load_config(path):
    with open(path) as f:
        return eval(f.read())  # ⚠️ Never eval config

Good:

import json

def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
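The Good example above can be extended to cover the rest of the security checklist. This is a sketch, not a required implementation. It catches only the specific json.JSONDecodeError instead of using a bare except:. It assigns each error message to a msg variable before raising, which is the style Ruff's flake8-errmsg (EM) rules check for. It chains the original exception with `from e`. The with block closes the file handle even when parsing fails.

```python
import json


def load_config(path: str) -> dict:
    """Load a JSON config file, failing loudly with a clear message.

    Args:
        path: Path to a JSON file containing a top-level object.

    Raises:
        ValueError: If the file is not valid JSON.
        TypeError: If the top-level JSON value is not an object.
    """
    # The context manager closes the file handle even if parsing fails.
    with open(path, encoding="utf-8") as f:
        try:
            data = json.load(f)
        except json.JSONDecodeError as e:
            # Catch the specific exception, never a bare `except:`.
            msg = f"Config file {path!r} is not valid JSON: {e.msg}"
            raise ValueError(msg) from e
    if not isinstance(data, dict):
        msg = f"Config file {path!r} must contain a JSON object, got {type(data).__name__}"
        raise TypeError(msg)
    return data
```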

5. Documentation Standards

Use Google-style docstrings with an Args section for all public functions.

Insufficient Documentation:

def send_email(to, msg):
    """Send an email to a recipient."""

Complete Documentation:

def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
    """Send an email to a recipient with the specified priority.

    Args:
        to: The email address of the recipient.
        msg: The message body to send.
        priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).

    Returns:
        True if email was sent successfully, False otherwise.

    Raises:
        InvalidEmailError: If the email address format is invalid.
        SMTPConnectionError: If unable to connect to email server.
    """

Documentation Guidelines:

  • Types go in function signatures, NOT in docstrings
  • Focus on "why" rather than "what" in descriptions
  • Document all parameters and raised exceptions
  • Keep descriptions concise but clear
  • Use reStructuredText markup inside docstrings (e.g. ``inline code`` and directives such as .. warning::) for rich formatting

📌 Tip: Only document return values when they are not obvious from the function name and signature.
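Putting these guidelines together, here is a sketch of a docstring for a hypothetical truncate_text helper. It uses Google-style sections and reST inline literals (``...``), plus a .. warning:: directive that marks the function as experimental. It does not repeat types from the signature. It includes a Returns section because the return value is not obvious.

```python
def truncate_text(text: str, max_chars: int, *, suffix: str = "...") -> str:
    """Shorten text so it fits within a character budget.

    Counts characters rather than tokens, so it is cheap but only
    approximates model input limits.

    .. warning::
        Experimental: this function's behavior may change without notice.

    Args:
        text: The text to shorten.
        max_chars: Maximum length of the returned string, including ``suffix``.
        suffix: Marker appended when truncation happens.

    Returns:
        ``text`` unchanged if it already fits, otherwise a prefix of ``text``
        followed by ``suffix`` whose total length is exactly ``max_chars``.

    Raises:
        ValueError: If ``max_chars`` is smaller than ``len(suffix)``.
    """
    if max_chars < len(suffix):
        msg = f"max_chars ({max_chars}) must be at least len(suffix) ({len(suffix)})"
        raise ValueError(msg)
    if len(text) <= max_chars:
        return text
    return text[: max_chars - len(suffix)] + suffix
```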

6. Architectural Improvements

When you encounter code that could be improved, suggest better designs:

Poor Design:

def process_data(data, db_conn, email_client, logger):
    # Function doing too many things
    validated = validate_data(data)
    result = db_conn.save(validated)
    email_client.send_notification(result)
    logger.log(f"Processed {len(data)} items")
    return result

Better Design:

from dataclasses import dataclass, field


@dataclass
class ProcessingResult:
    """Result of data processing operation."""

    items_processed: int
    success: bool
    errors: list[str] = field(default_factory=list)


class DataProcessor:
    """Handles data validation, storage, and notification."""

    def __init__(self, db_conn: Database, email_client: EmailClient) -> None:
        self.db = db_conn
        self.email = email_client

    def process(self, data: list[dict]) -> ProcessingResult:
        """Process and store data with notifications."""
        # _validate_data and _notify_completion are small private helpers (not shown)
        validated = self._validate_data(data)
        self.db.save(validated)
        result = ProcessingResult(items_processed=len(validated), success=True)
        self._notify_completion(result)
        return result

Design Improvement Areas:

If there's a cleaner, more scalable, or simpler design, highlight it and suggest improvements that would:

  • Reduce code duplication through shared utilities
  • Improve separation of concerns (single responsibility)
  • Make unit testing easier through dependency injection
  • Add clarity without adding complexity
  • Use dataclasses for structured data
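This sketch shows how dependency injection makes the DataProcessor above easy to unit test. It assumes hypothetical Database and EmailClient interfaces, modeled here as typing.Protocol classes. It also uses a made-up validation rule: each item needs an "id" key. Because the collaborators are passed into __init__, a test can supply in-memory fakes instead of a real database or mail server.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Database(Protocol):
    """Hypothetical storage interface the processor depends on."""

    def save(self, items: list[dict]) -> int: ...


class EmailClient(Protocol):
    """Hypothetical notification interface the processor depends on."""

    def send_notification(self, message: str) -> None: ...


@dataclass
class ProcessingResult:
    items_processed: int
    success: bool
    errors: list[str] = field(default_factory=list)


class DataProcessor:
    """Collaborators are injected, so tests can pass in-memory fakes."""

    def __init__(self, db: Database, email: EmailClient) -> None:
        self.db = db
        self.email = email

    def process(self, data: list[dict]) -> ProcessingResult:
        # Hypothetical rule: keep items that carry an "id"; report the rest.
        valid = [item for item in data if "id" in item]
        errors = [f"missing id: {item!r}" for item in data if "id" not in item]
        saved = self.db.save(valid)
        self.email.send_notification(f"Processed {saved} items")
        return ProcessingResult(items_processed=saved, success=not errors, errors=errors)


class FakeDatabase:
    """In-memory stand-in that records saved rows."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def save(self, items: list[dict]) -> int:
        self.rows.extend(items)
        return len(items)


class FakeEmailClient:
    """In-memory stand-in that records sent notifications."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send_notification(self, message: str) -> None:
        self.sent.append(message)


db, email = FakeDatabase(), FakeEmailClient()
result = DataProcessor(db, email).process([{"id": 1}, {"name": "no id"}])
```

The fakes satisfy the protocols structurally, so the test needs no mocking library and can assert directly on what was saved and sent.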

Development Tools & Commands

Package Management

# Add package
uv add package-name

# Sync project dependencies
uv sync

# Update the lockfile
uv lock

Testing

# Run unit tests (no network)
make test

# Don't run integration tests: they require API keys to be set

# Run specific test file
uv run --group test pytest tests/unit_tests/test_specific.py

Code Quality

# Lint code
make lint

# Format code
make format

# Type checking
uv run --group lint mypy .

Dependency Management Patterns

Local Development Dependencies:

[tool.uv.sources]
langchain-core = { path = "../core", editable = true }
langchain-tests = { path = "../standard-tests", editable = true }

For tools, use the @tool decorator from langchain_core.tools:

from langchain_core.tools import tool

@tool
def search_database(query: str) -> str:
    """Search the database for relevant information.

    Args:
        query: The search query string.
    """
    # Placeholder: replace with a real database query.
    results = f"No results found for {query!r}"
    return results

Commit Standards

Use Conventional Commits format for PR titles:

  • feat(core): add multi-tenant support
  • fix(cli): resolve flag parsing error
  • docs: update API usage examples
  • docs(openai): update API usage examples
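This rough sketch shows the Conventional Commits title format as a local check. The regex and the ALLOWED_TYPES tuple are assumptions. The tuple mirrors the types in commitlint's config-conventional preset. The specification itself only requires feat and fix and permits other types, so this repository's CI may accept a different set.

```python
import re

# Assumed type list (commitlint config-conventional); adjust to your repository.
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

PR_TITLE_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(ALLOWED_TYPES)})"  # commit type
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"  # optional scope, e.g. (core)
    r"(?P<breaking>!)?"  # optional breaking-change marker
    r": (?P<description>\S.*)$"  # colon, space, non-empty description
)


def is_conventional_title(title: str) -> bool:
    """Return True if ``title`` looks like a Conventional Commits header."""
    return PR_TITLE_PATTERN.match(title) is not None
```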

Framework-Specific Guidelines

  • Follow the existing patterns in langchain-core for base abstractions
  • Use langchain_core.callbacks for execution tracking
  • Implement proper streaming support where applicable
  • Avoid deprecated components like legacy LLMChain

Partner Integrations

  • Follow the established patterns in existing partner libraries
  • Implement standard interfaces (BaseChatModel, Embeddings, etc.)
  • Include comprehensive integration tests
  • Document API key requirements and authentication

Quick Reference Checklist

Before submitting code changes:

  • Breaking Changes: Verified no breaking changes to public APIs
  • Type Hints: All functions have complete type annotations
  • Tests: New functionality is fully tested
  • Security: No dangerous patterns (eval, silent failures, etc.)
  • Documentation: Google-style docstrings for public functions
  • Code Quality: make lint and make format pass
  • Architecture: Suggested improvements where applicable
  • Commit Message: Follows Conventional Commits format