Compare commits

...

181 Commits

Author SHA1 Message Date
Eugene Yurtsev
7ab4a7841a chore(infra): properly disable testing for 0.3 against latest packages (#34041)
properly disable testing for 0.3 against latest packages (since latest
packages only work with langchain_core > 1.0)
2025-11-19 17:17:28 -05:00
Eugene Yurtsev
6e968fd23c chore(infra): disable integration tests temporarily when releasing v0.3 (#34040)
Temporarily disable running tests against latest published packages with
core 0.3 as the latest published packages only work with core>1.0
2025-11-19 17:09:41 -05:00
Eugene Yurtsev
640d85c60f release(core): 0.3.80 (#34039)
Release 0.3.80
2025-11-19 16:53:03 -05:00
Eugene Yurtsev
fa7789d6c2 fix(core): fix validation for input variables in f-string templates, restrict functionality supported by jinja2, mustache templates (#34038)
* Fix validation for input variables in f-string templates
* Restrict functionality of features supported by jinja2 and mustache
templates
2025-11-19 16:52:32 -05:00
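The f-string half of #34038 is about which replacement fields count as valid input variables. A minimal sketch of that kind of check, using only the standard library's `string.Formatter`: the helper name `get_fstring_variables` and the rejection rule (plain identifiers only) are assumptions for illustration, not the code shipped in `langchain_core`.

```python
from string import Formatter


def get_fstring_variables(template: str) -> set[str]:
    """Collect variable names from an f-string style template.

    Hypothetical helper: rejects any field that is not a plain identifier,
    such as attribute access ({a.b}), indexing ({a[0]}), positional
    fields ({0}) or empty fields ({}).
    """
    names: set[str] = set()
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if field_name is None:
            # Pure literal text (including escaped {{ and }}).
            continue
        if not field_name.isidentifier():
            raise ValueError(f"Invalid template variable: {field_name!r}")
        names.add(field_name)
    return names


print(get_fstring_variables("Hello {name}, today is {day}. {{literal}}"))
```

Rejecting anything beyond a bare name keeps a template from reaching into attributes or items of the objects passed to it. Nested fields inside a format spec are not inspected by this sketch.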
Lauren Hirata Singh
889e8b6de8 Revert 33805 fix last plz (#33806) 2025-11-03 16:16:18 -05:00
Lauren Hirata Singh
5cb0501c59 fix(docs): redirects (#33805) 2025-11-03 15:58:13 -05:00
Lauren Hirata Singh
5838e3e8e5 fix(docs): fine tune redirects (#33802) 2025-11-03 15:01:53 -05:00
Lauren Hirata Singh
fbd96c688a chore(docs): Add redirect for multi-modal (#33801) 2025-11-03 13:04:41 -05:00
Lauren Hirata Singh
2085f69d68 fix(docs): Make redirects more specific for integrations (#33799) 2025-11-03 11:38:56 -05:00
Lauren Hirata Singh
df2ec0ca38 fix(docs): Fix regex in redirects (#33795) 2025-11-03 10:52:37 -05:00
Lauren Hirata Singh
51e1447c9e chore(docs): add api reference redirects (#33765) 2025-10-31 13:50:00 -04:00
Lauren Hirata Singh
bac96fe33f fix(docs): get redirects to build (#33763) 2025-10-31 12:09:21 -04:00
Lauren Hirata Singh
d8b08a1ecd fix(docs): redirects (#33734) 2025-10-29 17:57:08 -04:00
Lauren Hirata Singh
9b5e00f578 fix(docs): Redirects fix (#33724) 2025-10-29 13:47:16 -04:00
Lauren Hirata Singh
8c22e69491 chore(docs): redirects to new docs (#33703)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-10-29 12:12:18 -04:00
Mason Daugherty
d62b4499ad fix: (v0.3) unsupported @vercel/edge import (#33620) 2025-10-21 00:37:40 -04:00
Mason Daugherty
f8bb3f0d19 docs: v0.3 deprecation banner (#33613) 2025-10-20 17:01:06 -04:00
Mason Daugherty
8284e278d6 Revert "chore(docs): v0.3 redirects" (#33612)
Reverts langchain-ai/langchain#33553
2025-10-20 11:27:03 -04:00
Lauren Hirata Singh
3a846eeb8d chore(docs): v0.3 redirects (#33553) 2025-10-17 00:00:21 -04:00
Lauren Hirata Singh
d273341249 chore(docs): add middleware to handle redirects (#33547)
still need to add v0.3 redirects
2025-10-16 21:12:08 -04:00
Lauren Hirata Singh
db49a14a34 chore(docs): Redirects v0.1/v0.2 (#33538) 2025-10-16 16:46:37 -04:00
Mason Daugherty
ab7eda236e fix: feature table for MongoDB (#33471) 2025-10-13 21:17:22 -04:00
Jib
d418cbdf44 docs: flag Multi Tenancy as a MongoDBAtlasVectorStore supported feature (#33469)
- **Description:** Change the docs flag on the v0.3 branch to list multi-tenancy as a
supported `MongoDBAtlasVectorStore` feature.
- **Issue:** N/A
- **Dependencies:** None

2025-10-13 16:57:40 -04:00
ccurme
b93d2f7f3a release(core): 0.3.79 (#33401) 2025-10-09 16:59:16 -04:00
ccurme
a763ebe86c release(anthropic): 0.3.22 (#33394) 2025-10-09 14:29:38 -04:00
ccurme
5fa1094451 fix(anthropic,standard-tests): carry over updates to v0.3 (#33393)
Cherry pick of https://github.com/langchain-ai/langchain/pull/33390 and
https://github.com/langchain-ai/langchain/pull/33391.
2025-10-09 14:25:34 -04:00
Anika
dd4de696b8 fix(core): handle parent/child mustache vars (#33346)
Description:

Currently, `mustache_schema("{{x.y}} {{x}}")` raises an error. This PR fixes that.

Issue: N/A
**Dependencies:** N/A

---------

Co-authored-by: Mason Daugherty <github@mdrxy.com>
2025-10-09 08:52:10 -07:00
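The parent/child case in #33346 arises when a template references both `x` and `x.y`. A minimal sketch of folding dotted names into a nested structure so that either order works: `nest_variables` is a hypothetical helper, not the `mustache_schema` implementation, and empty dicts stand in for leaf variables.

```python
def nest_variables(names: list[str]) -> dict:
    """Fold dotted variable names into a nested dict.

    Hypothetical helper: a name seen both as a leaf ("x") and as a
    parent ("x.y") ends up as a parent, regardless of the order in
    which the two names are encountered.
    """
    tree: dict = {}
    for name in names:
        node = tree
        for part in name.split("."):
            # setdefault never overwrites an existing child mapping,
            # so "x" after "x.y" leaves {"y": {}} intact.
            node = node.setdefault(part, {})
    return tree


print(nest_variables(["x.y", "x"]))
print(nest_variables(["x", "x.y"]))
```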
Mason Daugherty
809a0216a5 chore: update v0.3 ref homepage (#33338) 2025-10-07 12:56:07 -04:00
Mason Daugherty
44ec72fa0d fix(infra): allow prerelease installations for partner packages during api ref doc build (#33325) 2025-10-06 18:05:30 -04:00
Mason Daugherty
5459ff1ee3 chore: cap lib upper bounds, run sync (#33320) 2025-10-06 17:39:01 -04:00
ccurme
f95669aa0a release(openai): 0.3.35 (#33299) 2025-10-06 10:51:01 -04:00
ccurme
3a465d635b feat(openai): enable stream_usage when using default base URL and client (#33296) 2025-10-06 10:22:36 -04:00
ccurme
0b51de4cab release(core): 0.3.78 (#33253) 2025-10-03 12:40:15 -04:00
ccurme
5904cbea89 feat(core): add optional include_id param to convert_to_openai_messages function (#33248) 2025-10-03 11:37:16 -04:00
Mason Daugherty
c9590ef79d docs: fix infinite loop in vercel.json redirects (#33240) 2025-10-02 20:24:09 -04:00
Mason Daugherty
c972552c40 docs: work for freeze (#33239) 2025-10-02 20:01:26 -04:00
Mason Daugherty
e16feb93b9 release(ollama): 0.3.10 (#33210) 2025-10-02 11:35:24 -04:00
Mason Daugherty
2bb57d45d2 fix(ollama): exclude None parameters from options dictionary (#33208) (#33209)
fix #33206
2025-10-02 11:32:21 -04:00
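The fix in #33208 comes down to leaving unset parameters out of the options dictionary. A minimal sketch of that idea: `build_options` and the parameter names in the example call are illustrative assumptions, not the `langchain-ollama` code.

```python
from typing import Any


def build_options(**params: Any) -> dict[str, Any]:
    """Return only the parameters that were actually set.

    Hypothetical helper: values that are None are dropped, while falsy
    but meaningful values such as 0 or False are kept.
    """
    return {key: value for key, value in params.items() if value is not None}


print(build_options(temperature=0, top_p=None, num_ctx=4096))
```

Filtering on `is not None` rather than on truthiness matters here, since a temperature of `0` is a legitimate setting.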
ccurme
d7cce2f469 feat(langchain_v1): update messages namespace (#33207) 2025-10-02 10:35:00 -04:00
Mason Daugherty
48b77752d0 release(ollama): 0.3.9 (#33200) 2025-10-01 22:31:20 -04:00
Mason Daugherty
6f2d16e6be refactor(ollama): simplify options handling (#33199)
Fixes #32744

Don't restrict options; the client accepts any dict
2025-10-01 21:58:12 -04:00
Mason Daugherty
a9eda18e1e refactor(ollama): clean up tests (#33198) 2025-10-01 21:52:01 -04:00
Mason Daugherty
a89c549cb0 feat(ollama): add basic auth support (#32328)
support for URL authentication in the format
`https://user:password@host:port` for all LangChain Ollama clients.

Related to #32327 and #25055
2025-10-01 20:46:37 -04:00
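URLs of the form `https://user:password@host:port` carry credentials that have to become an `Authorization` header. A minimal sketch using only the standard library: `split_basic_auth` is a hypothetical helper and not the code added in #32328, and IPv6 hosts are not handled.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_basic_auth(url: str) -> tuple[str, dict[str, str]]:
    """Strip userinfo from a URL and return it as a Basic auth header.

    Hypothetical helper: returns the URL unchanged with no headers when
    it carries no credentials.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    # Credentials may be percent-encoded inside the URL.
    credentials = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return clean_url, {"Authorization": f"Basic {token}"}


print(split_basic_auth("https://user:password@host:11434"))
```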
Sydney Runkle
a336afaecd feat(langchain): use decorators for jumps instead (#33179)
The old `before_model_jump_to` classvar approach was quite clunky, this
is nicer imo and easier to document. Also moving from `jump_to` to
`can_jump_to` which is more idiomatic.

Before:

```py
class MyMiddleware(AgentMiddleware):
    before_model_jump_to: ClassVar[list[JumpTo]] = ["end"]

    def before_model(self, state, runtime) -> dict[str, Any]:
        return {"jump_to": "end"}
```

After

```py
class MyMiddleware(AgentMiddleware):

    @hook_config(can_jump_to=["end"])
    def before_model(self, state, runtime) -> dict[str, Any]:
        return {"jump_to": "end"}
```
2025-10-01 16:49:27 -07:00
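The decorator approach in #33179 attaches jump metadata to the hook itself instead of a separate class variable. A minimal sketch of how such a decorator can work: this `hook_config` is a stand-in written for illustration, and both the `__can_jump_to__` attribute and the `allowed_jumps` helper are hypothetical, not the langchain implementation.

```python
from typing import Any, Callable, Optional


def hook_config(*, can_jump_to: Optional[list[str]] = None) -> Callable:
    """Stand-in decorator: record allowed jump targets on the hook."""

    def decorator(func: Callable) -> Callable:
        # Hypothetical attribute name, chosen for this sketch only.
        func.__can_jump_to__ = list(can_jump_to or [])
        return func

    return decorator


class MyMiddleware:
    @hook_config(can_jump_to=["end"])
    def before_model(self, state, runtime) -> dict[str, Any]:
        return {"jump_to": "end"}


def allowed_jumps(middleware: object, hook_name: str) -> list[str]:
    """Read the jump targets a hook declared, or [] if none."""
    hook = getattr(middleware, hook_name, None)
    return getattr(hook, "__can_jump_to__", [])


print(allowed_jumps(MyMiddleware(), "before_model"))
```

Keeping the metadata on the function means a graph builder can decide per hook whether a conditional edge is needed.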
Lauren Hirata Singh
af07949d13 fix(docs): Redirects (#33190) 2025-10-01 16:28:47 -04:00
Sydney Runkle
a10e880c00 feat(langchain_v1): add async support for create_agent (#33175)
This makes branching **much** more simple internally and helps greatly
w/ type safety for users. It just allows for one signature on hooks
instead of multiple.

Opened after https://github.com/langchain-ai/langchain/pull/33164
ballooned more than expected, w/ branching for:
* sync vs async
* runtime vs no runtime (this is self imposed)

**This also removes support for nodes w/o `runtime` in the signature.**
We can always go back and add support for nodes w/o `runtime`.

I think @christian-bromann's idea to re-export `runtime` from
langchain's agents might make sense due to the abundance of imports
here.

Check out the value of the change based on this diff:
https://github.com/langchain-ai/langchain/pull/33176
2025-10-01 19:15:39 +00:00
Eugene Yurtsev
7b5e839be3 chore(langchain_v1): use list[str] for modifyModelRequest (#33166)
Update model request to return tools by name. This will decrease the
odds of misusing the API.

We'll need to extend the type for built-in tools later.
2025-10-01 14:46:19 -04:00
ccurme
740842485c fix(openai): bump min core version (#33188)
Required for new tests added in
https://github.com/langchain-ai/langchain/pull/32541 and
https://github.com/langchain-ai/langchain/pull/33183.
2025-10-01 11:01:15 -04:00
noeliecherrier
08bb74f148 fix(mistralai): handle HTTP errors in async embed documents (#33187)
The async embed function does not properly handle HTTP errors.

For instance with large batches, Mistral AI returns `Too many inputs in
request, split into more batches.` in a 400 error.

This leads to a KeyError in `response.json()["data"]` l.288

This PR fixes the issue by:
- calling `response.raise_for_status()` before returning
- adding a retry similarly to what is done in the synchronous
counterpart `embed_documents`

I also added an integration test, but willing to move it to unit tests
if more relevant.
2025-10-01 10:57:47 -04:00
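The two changes described in #33187 (checking the status before reading `["data"]`, and retrying) can be sketched without a network. `FakeResponse`, `HTTPStatusError`, and `aembed` are all hypothetical stand-ins for this illustration; the real code talks to the Mistral AI API through an HTTP client.

```python
import asyncio


class HTTPStatusError(Exception):
    """Stand-in for an HTTP client's status error."""


class FakeResponse:
    """Stand-in response object with the two methods the sketch needs."""

    def __init__(self, status_code: int, payload: dict) -> None:
        self.status_code = status_code
        self.payload = payload

    def raise_for_status(self) -> None:
        if self.status_code >= 400:
            raise HTTPStatusError(f"HTTP {self.status_code}: {self.payload}")

    def json(self) -> dict:
        return self.payload


async def aembed(send, max_attempts: int = 3, delay: float = 0.0) -> list:
    """Call `send` until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = await send()
        try:
            # Surface HTTP errors instead of hitting a KeyError on "data".
            response.raise_for_status()
        except HTTPStatusError:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(delay)
            continue
        return [item["embedding"] for item in response.json()["data"]]
    raise ValueError("max_attempts must be at least 1")
```

With this shape, a 400 response produces a descriptive HTTP error once retries are exhausted rather than an unrelated `KeyError`.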
ccurme
7d78ed9b53 release(standard-tests): 0.3.22 (#33186) 2025-10-01 10:39:17 -04:00
ccurme
7ccff656eb release(core): 0.3.77 (#33185) 2025-10-01 10:24:07 -04:00
ccurme
002d623f2d feat: (core, standard-tests) support PDF inputs in ToolMessages (#33183) 2025-10-01 10:16:16 -04:00
Mohammad Mohtashim
34f8031bd9 feat(langchain): Using Structured Response as Key in Output Schema for Middleware Agent (#33159)
- **Description:** Change the key from `response` to
`structured_response` for the middleware agent to keep it in sync with the agent
without middleware. This is a breaking change.
 - **Issue:** #33154
2025-10-01 03:24:59 +00:00
Mason Daugherty
a541b5bee1 chore(infra): rfc README.md for better presentation (#33172) 2025-09-30 17:44:42 -04:00
Mason Daugherty
3e970506ba chore(core): remove runnable section from README.md (#33171) 2025-09-30 17:15:31 -04:00
Mason Daugherty
d1b0196faa chore(infra): whitespace fix (#33170) 2025-09-30 17:14:55 -04:00
ccurme
aac69839a9 release(openai): 0.3.34 (#33169) 2025-09-30 16:48:39 -04:00
ccurme
64141072a3 feat(openai): support openai sdk 2.0 (#33168) 2025-09-30 16:34:00 -04:00
Mason Daugherty
0795be2a04 docs(core): remove non-existent param from as_tool docstring (#33165) 2025-09-30 19:43:34 +00:00
Eugene Yurtsev
9c97597175 chore(langchain_v1): expose middleware decorators and selected messages (#33163)
* Make it easy to improve the middleware shortcuts
* Export the messages that we're confident we'll expose
2025-09-30 14:14:57 -04:00
Sydney Runkle
eed0f6c289 feat(langchain): todo middleware (#33152)
Porting the [planning
middleware](39c0138d0f/src/deepagents/middleware.py (L21))
over from deepagents.

Also adding the ability to configure:
* System prompt
* Tool description

```py
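from langchain.agents.middleware.planning import PlanningMiddleware
from langchain.agents import create_agent
from langchain_core.messages import HumanMessage

agent = create_agent("openai:gpt-4o", middleware=[PlanningMiddleware()])

# `invoke` is synchronous, so it is not awaited; the async counterpart on runnables is `ainvoke`.
result = agent.invoke({"messages": [HumanMessage("Help me refactor my codebase")]})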

print(result["todos"])  # Array of todo items with status tracking
```
2025-09-30 02:23:26 +00:00
ccurme
729637a347 docs(anthropic): document support for memory tool and context management (#33149) 2025-09-29 16:38:01 -04:00
Mason Daugherty
3325196be1 fix(langchain): handle gpt-5 model name in init_chat_model (#33148)
expand to match any `gpt-*` model to openai
2025-09-29 16:16:17 -04:00
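Matching any `gpt-*` name to OpenAI is a prefix lookup. A minimal sketch of that rule: `infer_provider` and the lookup table are hypothetical, and only the `gpt-` entry comes from the commit message above; this is not the `init_chat_model` source.

```python
from typing import Optional

# Only the "gpt-" rule is taken from the commit message; the table
# structure itself is an assumption made for this sketch.
_PREFIX_TO_PROVIDER = {"gpt-": "openai"}


def infer_provider(model_name: str) -> Optional[str]:
    """Guess the provider from a model name prefix, or return None."""
    for prefix, provider in _PREFIX_TO_PROVIDER.items():
        if model_name.startswith(prefix):
            return provider
    return None


print(infer_provider("gpt-5"))
```

A prefix rule covers new model generations without listing each version explicitly.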
Mason Daugherty
f402fdcea3 fix(langchain): add context_management to Anthropic chat model init (#33150) 2025-09-29 16:13:47 -04:00
ccurme
ca9217c02d release(anthropic): 0.3.21 (#33147) 2025-09-29 19:56:28 +00:00
ccurme
f9bae40475 feat(anthropic): support memory and context management features (#33146)
https://docs.claude.com/en/docs/build-with-claude/context-editing

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-29 15:42:38 -04:00
ccurme
839a18e112 fix(openai): remove __future__.annotations import from test files (#33144)
Breaks schema conversion in places.
2025-09-29 16:23:32 +00:00
Mohammad Mohtashim
33a6def762 fix(core): Support of 'reasoning' type in 'convert_to_openai_messages' (#33050) 2025-09-29 09:17:05 -04:00
nhuang-lc
c456c8ae51 fix(langchain): fix response action for HITL (#33131)
Multiple improvements to HITL flow:

* On a `response` type resume, we should still append the tool call to
the last AIMessage (otherwise we have a ToolResult without a
corresponding ToolCall)
* When all interrupts have `response` types (so there's no pending tool
calls), we should jump back to the first node (instead of end) as we
enforced in the previous `post_model_hook_router`
* Added comments to `model_to_tools` router to clarify all of the
potential exit conditions

Additionally:
* Lockfile update to use latest LG alpha release
* Added test for `jump_to` behaving ephemerally, this was fixed in LG
but surfaced as a bug w/ `jump_to`.
* Bump version to v1.0.0a10 to prep for alpha release

---------

Co-authored-by: Sydney Runkle <sydneymarierunkle@gmail.com>
Co-authored-by: Sydney Runkle <54324534+sydney-runkle@users.noreply.github.com>
2025-09-29 13:08:18 +00:00
Eugene Yurtsev
54ea62050b chore(langchain_v1): move tool node to tools namespace (#33132)
* Move ToolNode to tools namespace
* Expose injected variable as well in tools namespace
* Update doc-strings throughout
2025-09-26 15:23:57 -04:00
Mason Daugherty
986302322f docs: more standardization (#33124) 2025-09-25 20:46:20 -04:00
Mason Daugherty
a5137b0a3e refactor(langchain): resolve pydantic deprecation warnings (#33125) 2025-09-25 17:33:18 -04:00
Mason Daugherty
5bea28393d docs: standardize .. code-block directive usage (#33122)
and fix typos
2025-09-25 16:49:56 -04:00
Mason Daugherty
c3fed20940 docs: correct ported over directives (#33121)
Match rest of repo
2025-09-25 15:54:54 -04:00
Mason Daugherty
6d418ba983 test(mistralai): add xfail for structured output test (#33119)
In rare cases (difficult to reproduce), Mistral's API fails to return
valid bodies, leading to failures from `PydanticToolsParser`
2025-09-25 13:05:31 -04:00
Mason Daugherty
12daba63ff test(openai): raise token limit for o1 test (#33118)
`test_o1[False-False]` was sometimes failing because the OpenAI o1 model
was hitting a token limit with only 100 tokens
2025-09-25 12:57:33 -04:00
Christophe Bornet
eaf8dce7c2 chore: bump ruff version to 0.13 (#33043)
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-25 12:27:39 -04:00
Mason Daugherty
f82de1a8d7 chore: bump locks (#33114) 2025-09-25 01:46:01 -04:00
Mason Daugherty
e3efd1e891 test(text-splitters): capture beta warnings (#33113) 2025-09-25 01:30:20 -04:00
Mason Daugherty
d6769cf032 test(text-splitters): resolve pytest marker warning (#33112)
#33111
2025-09-25 01:29:42 -04:00
Mason Daugherty
7ab2e0dd3b test(core): resolve pytest marker warning (#33111)
Remove redundant/outdated `@pytest.mark.requires("jinja2")` decorator

Pytest marks (like `@pytest.mark.requires(...)`) applied directly to
fixtures have no effect and are deprecated.
2025-09-25 01:08:54 -04:00
Mason Daugherty
81319ad3f0 test(core): resolve pydantic_v1 deprecation warning (#33110)
Excluded pydantic_v1 module from import testing

Acceptable since this pydantic_v1 is explicitly deprecated. Testing its
importability at this stage serves little purpose since users should
migrate away from it.
2025-09-25 01:08:03 -04:00
Mason Daugherty
e3f3c13b75 refactor(core): use aadd_documents in vectorstores unit tests (#33109)
Don't use the deprecated `upsert()` and `aupsert()`

Instead use the recommended alternatives
2025-09-25 00:57:08 -04:00
Mason Daugherty
c30844fce4 fix(core): use version agnostic get_fields (#33108)
Resolves a warning
2025-09-25 00:54:29 -04:00
Mason Daugherty
c9eb3bdb2d test(core): use secure hash algorithm in indexing test to eliminate SHA-1 warning (#33107)
Finish work from #33101
2025-09-25 00:49:11 -04:00
Mason Daugherty
e97baeb9a6 test(core): suppress pydantic_v1 deprecation warnings during import tests (#33106)
We intentionally import these. Hide warnings to reduce testing noise.
2025-09-25 00:37:40 -04:00
Mason Daugherty
3a6046b157 test(core): don't use deprecated input_variables param in from_file (#33105)
finish #33104
2025-09-25 04:29:31 +00:00
Mason Daugherty
8fdc619f75 refactor(core): don't use deprecated input_variables param in from_file (#33104)
Missed a while back; causes warnings during tests
2025-09-25 00:14:17 -04:00
Ali Ismail
729bfe8369 test(core): enhance stringify_value test coverage for nested structures (#33099)
## Summary
Adds test coverage for the `stringify_value` utility function to handle
complex nested data structures that weren't previously tested.

## Changes
- Added `test_stringify_value_nested_structures()` to `test_strings.py`
- Tests nested dictionaries within lists
- Tests mixed-type lists with various data types
- Verifies proper stringification of complex nested structures

## Why This Matters
- Fills a gap in test coverage for edge cases
- Ensures `stringify_value` handles complex data structures correctly  
- Improves confidence in string utility functions used throughout the
codebase
- Low risk addition that strengthens existing test suite

## Testing
```bash
uv run --group test pytest libs/core/tests/unit_tests/utils/test_strings.py::test_stringify_value_nested_structures -v
```

This test addition follows the project's testing patterns and adds
meaningful coverage without introducing any breaking changes.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-25 00:04:47 -04:00
Mason Daugherty
9b624a79b2 test(core): suppress deprecation warnings in PipelinePromptTemplate (#33102)
We're intentionally testing this still so as not to regress. Reduce
warning noise.
2025-09-25 04:03:27 +00:00
Mason Daugherty
c60c5a91cb fix(core): use secure hash algorithm in indexing test to eliminate SHA-1 warning (#33101)
Use SHA-256 (collision-resistant) instead of the default SHA-1. No
functional changes to test behavior.
2025-09-25 00:02:11 -04:00
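Switching the hash from SHA-1 to SHA-256 is a small change with the standard library. A minimal sketch: `document_key` is a hypothetical helper, not the indexing API touched by #33101.

```python
import hashlib


def document_key(content: str, algorithm: str = "sha256") -> str:
    """Hash document content into a hex key.

    Hypothetical helper: defaults to SHA-256 rather than SHA-1.
    """
    return hashlib.new(algorithm, content.encode("utf-8")).hexdigest()


key = document_key("hello world")
print(len(key), key)
```

A SHA-256 hex digest is 64 characters long versus 40 for SHA-1, so any storage that assumed the shorter key length would need checking.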
Mason Daugherty
d9e0c212e0 chore(infra): add tests to label mapping (#33103) 2025-09-25 00:01:53 -04:00
Sydney Runkle
f015526e42 release(langchain): v1.0.0a9 (#33098) 2025-09-24 21:02:53 +00:00
Sydney Runkle
57d931532f fix(langchain): extra arg for anthropic caching, __end__ -> end for jump_to (#33097)
Also updating `jump_to` to use `end` instead of `__end__`
2025-09-24 17:00:40 -04:00
Mason Daugherty
50012d95e2 chore: update pull_request_target types, harden (#33096)
Enhance the pull request workflows by updating the `pull_request_target`
types and ensuring safety by avoiding checkout of the PR's head. Update
the action to use a specific commit from the archived repository.
2025-09-24 16:37:16 -04:00
Mason Daugherty
33f06875cb fix(langchain_v1): version equality check (#33095) 2025-09-24 16:27:55 -04:00
dependabot[bot]
e5730307e7 chore: bump actions/setup-node from 4 to 5 (#32952)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4
to 5.
Release notes (sourced from [actions/setup-node's releases](https://github.com/actions/setup-node/releases)):

**v5.0.0**

Breaking changes:

- Enhance caching in setup-node with automatic package manager detection by @priya-kinthali in actions/setup-node#1348

This update, introduces automatic caching when a valid `packageManager` field is present in your `package.json`. This aims to improve workflow performance and make dependency management more seamless. To disable this automatic caching, set `package-manager-cache: false`

<pre lang="yaml"><code>steps:
- uses: actions/checkout@v5
- uses: actions/setup-node@v5
  with:
    package-manager-cache: false
</code></pre>

- Upgrade action to use node24 by @salmanmkc in actions/setup-node#1325

Make sure your runner is on version v2.327.1 or later to ensure compatibility with this release. [See Release Notes](https://github.com/actions/runner/releases/tag/v2.327.1)

Dependency upgrades:

- Upgrade `@octokit/request-error` and `@actions/github` by @dependabot[bot] in actions/setup-node#1227
- Upgrade uuid from 9.0.1 to 11.1.0 by @dependabot[bot] in actions/setup-node#1273
- Upgrade undici from 5.28.5 to 5.29.0 by @dependabot[bot] in actions/setup-node#1295
- Upgrade form-data to bring in fix for critical vulnerability by @gowridurgad in actions/setup-node#1332
- Upgrade actions/checkout from 4 to 5 by @dependabot[bot] in actions/setup-node#1345

New contributors:

- @priya-kinthali made their first contribution in actions/setup-node#1348
- @salmanmkc made their first contribution in actions/setup-node#1325

**Full Changelog**: https://github.com/actions/setup-node/compare/v4...v5.0.0

**v4.4.0**

Bug fixes:

- Make eslint-compact matcher compatible with Stylelint by @FloEdelmann in actions/setup-node#98
- Add support for indented eslint output by @fregante in actions/setup-node#1245

Enhancement:

- Support private mirrors by @marco-ippolito in actions/setup-node#1240

Dependency update:

- Upgrade `@action/cache` from 4.0.2 to 4.0.3 by @aparnajyothi-y in actions/setup-node#1262

New contributors:

- @FloEdelmann made their first contribution in actions/setup-node#98
- @fregante made their first contribution in actions/setup-node#1245
- @marco-ippolito made their first contribution in actions/setup-node#1240

**Full Changelog**: https://github.com/actions/setup-node/compare/v4...v4.4.0

... (truncated)
Commits:

- `a0853c2` Bump actions/checkout from 4 to 5 (actions/setup-node#1345)
- `b7234cc` Upgrade action to use node24 (actions/setup-node#1325)
- `d7a1131` Enhance caching in setup-node with automatic package manager detection (actions/setup-node#1348)
- `5e2628c` Bumps form-data (actions/setup-node#1332)
- `65becef` Bump undici from 5.28.5 to 5.29.0 (actions/setup-node#1295)
- `7e24a65` Bump uuid from 9.0.1 to 11.1.0 (actions/setup-node#1273)
- `08f58d1` Bump `@octokit/request-error` and `@actions/github` (actions/setup-node#1227)
- See full diff in [compare view](https://github.com/actions/setup-node/compare/v4...v5)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-24 16:26:05 -04:00
Mason Daugherty
4783a9c18e style: update workflow name for version equality check (#33094) 2025-09-24 20:11:30 +00:00
Mason Daugherty
ee4d84de7c style(core): typo/docs lint pass (#33093) 2025-09-24 16:11:21 -04:00
Mason Daugherty
092dd5e174 chore: update link for monorepo structure (#33091) 2025-09-24 19:55:00 +00:00
Sydney Runkle
dd81e1c3fb release(langchain): 1.0.0a8 (#33090) 2025-09-24 15:31:29 -04:00
Sydney Runkle
135a5b97e6 feat(langchain): improvements to anthropic prompt caching (#33058)
Adding an `unsupported_model_behavior` arg that can be `'ignore'`,
`'warn'`, or `'raise'`. Defaults to `'warn'`.
2025-09-24 15:28:49 -04:00
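An argument that can be `'ignore'`, `'warn'`, or `'raise'` typically dispatches as follows. A minimal sketch: `check_model_support` and its `supported` flag are hypothetical, and only the argument name, its three values, and the `'warn'` default come from the commit message above.

```python
import warnings
from typing import Literal


def check_model_support(
    model_name: str,
    supported: bool,
    unsupported_model_behavior: Literal["ignore", "warn", "raise"] = "warn",
) -> bool:
    """Return True if the feature should be applied for this model.

    Hypothetical helper: for unsupported models it raises, warns, or
    silently skips depending on `unsupported_model_behavior`.
    """
    if supported:
        return True
    message = f"Prompt caching is not supported for model {model_name!r}."
    if unsupported_model_behavior == "raise":
        raise ValueError(message)
    if unsupported_model_behavior == "warn":
        warnings.warn(message, stacklevel=2)
    return False
```

Defaulting to a warning keeps existing pipelines running on unsupported models while still making the skipped feature visible.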
Mason Daugherty
b92b394804 style: repo linting pass (#33089)
enable docstring-code-format
2025-09-24 15:25:55 -04:00
Sydney Runkle
083bb3cdd7 fix(langchain): need to inject all state for tools registered by middleware (#33087)
Type hints matter for conditional edges!
2025-09-24 15:25:51 -04:00
Mason Daugherty
2e9291cdd7 fix: lift openai version constraints across packages (#33088)
re: #33038 and https://github.com/openai/openai-python/issues/2644
2025-09-24 15:25:10 -04:00
Sydney Runkle
4f8a76b571 chore(langchain): renaming for HITL (#33067) 2025-09-24 07:19:44 -04:00
Mason Daugherty
05ba941230 style(cli): linting pass (#33078) 2025-09-24 01:24:52 -04:00
Mason Daugherty
ae4976896e chore: delete erroneous .readthedocs.yaml (#33079)
From the legacy docs/not needed here
2025-09-24 01:24:42 -04:00
Mason Daugherty
504ef96500 chore: add commit message generation instructions for VSCode (#33077) 2025-09-24 05:06:43 +00:00
Mason Daugherty
d99a02bb27 chore: add AGENTS.md (#33076)
it would be super cool if Anthropic supported this instead of
`CLAUDE.md` :/

https://agents.md/
2025-09-24 05:02:14 +00:00
Mason Daugherty
793de80429 chore: update label mapping in PR title labeler configuration (#33075) 2025-09-24 01:00:14 -04:00
Mason Daugherty
7d4e9d8cda revert(infra): put SECURITY.md at root (#33074) 2025-09-24 00:54:37 -04:00
Mason Daugherty
54dca494cf chore: delete erroneous poetry.toml configuration file (#33073)
- Not used by the current build system
- Potentially confusing for new contributors
- A leftover artifact from the Poetry to uv migration
2025-09-24 04:40:17 +00:00
Mason Daugherty
7b30e58386 chore: delete erroneous yarn.lock in root (#33072)
Appears to have had no purpose/was added by mistake and nobody
questioned it
2025-09-24 04:35:00 +00:00
Mason Daugherty
e62b541dfd chore(infra): move SECURITY.md to .github (#33071)
cleaning up top-level. `.github` folder placement will continue to show
on repo homepage:
https://docs.github.com/en/code-security/getting-started/adding-a-security-policy-to-your-repository#about-security-policies
2025-09-24 00:27:48 -04:00
Mason Daugherty
8699980d09 chore(scripts): remove obsolete release and mypy/ruff update scripts (#33070)
Outdated scripts related to release management and mypy/ruff updates

Cleaning up the root-level
2025-09-24 04:24:38 +00:00
Mason Daugherty
79e536b0d6 chore(infra): further docs build cleanup (#33057)
Reorganize the requirements for better clarity and consistency. Improve
documentation on scripts and workflows.
2025-09-23 17:29:58 -04:00
Sydney Runkle
b5720ff17a chore(langchain): simplifying HITL condition (#33065)
Simplifying condition
2025-09-23 21:24:14 +00:00
nhuang-lc
48b05224ad fix(langchain_v1): only interrupt if at least one ToolConfig value is True (#33064)
**Description:** Right now, we interrupt even if the provided ToolConfig
has all false values. We should ignore ToolConfigs which do not have at
least one value marked as true (just as we would if tool_name: False was
passed into the dict).
2025-09-23 17:20:34 -04:00
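The rule described in #33064 (skip a tool whose config has no true value, just as for `tool_name: False`) can be sketched as a filter. `tools_to_interrupt` and the config keys used in the example are hypothetical names for illustration, not the middleware's actual types.

```python
from typing import Union

# Hypothetical shape: per-tool flags, each either True or False.
ToolConfig = dict[str, bool]


def tools_to_interrupt(interrupt_on: dict[str, Union[bool, ToolConfig]]) -> set[str]:
    """Return the tools that should trigger an interrupt.

    A tool qualifies if it maps to True, or to a config dict with at
    least one True value.
    """
    selected: set[str] = set()
    for tool_name, config in interrupt_on.items():
        enabled = config if isinstance(config, bool) else any(config.values())
        if enabled:
            selected.add(tool_name)
    return selected


example = {
    "delete_file": True,
    "read_file": False,
    "send_email": {"allow_accept": True, "allow_edit": False},
    "search": {"allow_accept": False, "allow_edit": False},
}
print(sorted(tools_to_interrupt(example)))
```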
Sydney Runkle
89079ad411 feat(langchain): new decorator pattern for dynamically generated middleware (#33053)
# Main Changes

1. Adding decorator utilities for dynamically defining middleware with
single hook functions (see an example below for dynamic system prompt)
2. Adding better conditional edge drawing with jump configuration
attached to middleware. Can be registered w/ the new decorator!

## Decorator Utilities

```py
from langchain.agents.middleware_agent import create_agent, AgentState, ModelRequest
from langchain.agents.middleware.types import modify_model_request
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver


@modify_model_request
def modify_system_prompt(request: ModelRequest, state: AgentState) -> ModelRequest:
    request.system_prompt = (
        "You are a helpful assistant."
        f"Please record the number of previous messages in your response: {len(state['messages'])}"
    )
    return request

agent = create_agent(
    model="openai:gpt-4o-mini", 
    middleware=[modify_system_prompt]
).compile(checkpointer=InMemorySaver())
```

## Visualization and Routing improvements

We now require that middlewares define the valid jumps for each hook.

If using the new decorator syntax, this can be done with:

```py
@before_model(jump_to=["__end__"])
@after_model(jump_to=["tools", "__end__"])
```

If using the subclassing syntax, you can use these two class vars:

```py
class MyMiddleware(AgentMiddleware):
    before_model_jump_to = ["__end__"]
    after_model_jump_to = ["tools", "__end__"]
```

Open for debate if we want to bundle these in a single jump map / config
for a middleware. Easy to migrate later if we decide to add more hooks.

We will need to **really clearly document** that these must be
explicitly set in order to enable conditional edges.

Notice for the below case, `Middleware2` does actually enable jumps.

<table>
  <thead>
    <tr>
      <th>Before (broken), adding conditional edges unconditionally</th>
      <th>After (fixed), adding conditional edges sparingly</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>
<img width="619" height="508" alt="Screenshot 2025-09-23 at 10 23 23 AM"
src="https://github.com/user-attachments/assets/bba2d098-a839-4335-8e8c-b50dd8090959"
/>
      </td>
      <td>
<img width="469" height="490" alt="Screenshot 2025-09-23 at 10 23 13 AM"
src="https://github.com/user-attachments/assets/717abf0b-fc73-4d5f-9313-b81247d8fe26"
/>
      </td>
    </tr>
  </tbody>
</table>

<details>
<summary>Snippet for the above</summary>

```py
from typing import Any
from langchain.agents.tool_node import InjectedState
from langgraph.runtime import Runtime
from langchain.agents.middleware.types import AgentMiddleware, AgentState
from langchain.agents.middleware_agent import create_agent
from langchain_core.tools import tool
from typing import Annotated
from langchain_core.messages import HumanMessage
from typing_extensions import NotRequired

@tool
def simple_tool(input: str) -> str:
    """A simple tool."""
    return "successful tool call"


class Middleware1(AgentMiddleware):
    """Custom middleware that adds a simple tool."""

    tools = [simple_tool]

    def before_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

class Middleware2(AgentMiddleware):

    before_model_jump_to = ["tools", "__end__"]

    def before_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

class Middleware3(AgentMiddleware):

    def before_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> None:
        return None

builder = create_agent(
    model="openai:gpt-4o-mini",
    middleware=[Middleware1(), Middleware2(), Middleware3()],
    system_prompt="You are a helpful assistant.",
)
agent = builder.compile()
```

</details>

## More Examples

### Guardrails `after_model`

<img width="379" height="335" alt="Screenshot 2025-09-23 at 10 40 09 AM"
src="https://github.com/user-attachments/assets/45bac7dd-398e-45d1-ae58-6ecfa27dfc87"
/>

<details>
<summary>Code</summary>

```py
from langchain.agents.middleware_agent import create_agent, AgentState, ModelRequest
from langchain.agents.middleware.types import after_model
from langchain_core.messages import HumanMessage, AIMessage
from langgraph.checkpoint.memory import InMemorySaver
from typing import cast, Any

@after_model(jump_to=["model", "__end__"])
def after_model_hook(state: AgentState) -> dict[str, Any]:
    """Check the last AI message for safety violations."""
    last_message_content = cast(AIMessage, state["messages"][-1]).content.lower()
    print(last_message_content)

    unsafe_keywords = ["pineapple"]
    if any(keyword in last_message_content for keyword in unsafe_keywords):

        # Jump back to model to regenerate response
        return {"jump_to": "model", "messages": [HumanMessage("Please regenerate your response, and don't talk about pineapples. You can talk about apples instead.")]}

    return {"jump_to": "__end__"}

# Create agent with guardrails middleware
agent = create_agent(
    model="openai:gpt-4o-mini",
    middleware=[after_model_hook],
    system_prompt="Keep your responses to one sentence please!"
).compile()

# Test with potentially unsafe input
result = agent.invoke(
    {"messages": [HumanMessage("Tell me something about pineapples")]},
)

for msg in result["messages"]:
    # pretty_print() prints the message itself and returns None,
    # so it should not be wrapped in print().
    msg.pretty_print()

"""
================================ Human Message =================================

Tell me something about pineapples
================================== Ai Message ==================================

Pineapples are tropical fruits known for their sweet, tangy flavor and distinctive spiky exterior.
================================ Human Message =================================

Please regenerate your response, and don't talk about pineapples. You can talk about apples instead.
================================== Ai Message ==================================

Apples are popular fruits that come in various varieties, known for their crisp texture and sweetness, and are often used in cooking and baking.
"""
```

</details>
2025-09-23 13:25:55 -04:00
Mason Daugherty
2c95586f2a chore(infra): audit workflows, scripts (#33055)
Mostly adding a descriptive frontmatter to workflow files. Also address
some formatting and outdated artifacts

No functional changes outside of
[d5457c3](d5457c39ee),
[90708a0](90708a0d99),
and
[338c82d](338c82d21e)
2025-09-23 17:08:19 +00:00
Mason Daugherty
9c1285cf5b chore(infra): fix ping pong pr labeler config (#33054)
The title-based labeler was clearing all pre-existing labels (including
the file-based ones) before adding its semantic labels.
2025-09-22 21:19:53 -04:00
Sydney Runkle
c3be45bf14 fix(langchain): HITL bug causing dupe interrupt (#33052)
Need to find **last** AI msg (not first). Getting too creative w/
generators.
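
A minimal sketch of the "last, not first" lookup, assuming a stand-in `Message` dataclass instead of the real `langchain_core` message classes. The helper name `last_ai_message` is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Stand-in for a chat message; not the langchain_core class."""

    role: str
    content: str


def last_ai_message(messages: list[Message]) -> Optional[Message]:
    """Return the most recent AI message, or None if there is none."""
    # Iterating in reverse means the first match is the latest AI message.
    return next((m for m in reversed(messages) if m.role == "ai"), None)


history = [
    Message("human", "hi"),
    Message("ai", "first reply"),
    Message("human", "try again"),
    Message("ai", "second reply"),
]
latest = last_ai_message(history)
```

Scanning the list in forward order with the same generator would return `"first reply"`, which is the kind of mistake the fix addresses.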
2025-09-22 20:09:12 -04:00
Arman Tsaturian
8f488d62b2 docs: fix stripe toolkit import in the guide (#33044)
**Description:**
Stripe tools integration guide incorrectly referenced the `crewai`
toolkit. Updated the import to use the correct `langchain` toolkit.

Stripe docs reference:
https://docs.stripe.com/agents?framework=langchain&lang=python
2025-09-22 15:17:09 -04:00
Mason Daugherty
cdae9e4942 fix(infra): prevent labeler workflow from adding/removing same labels (#33039)
The file-based and title-based labeler workflows were conflicting,
causing the bot to add and remove identical labels in the same
operation. Hopefully this fixes it.
2025-09-21 04:37:59 +00:00
Mason Daugherty
7ddc798f95 fix(openai): pin upper bound to prevent Pydantic 2.7.0 issues (#33038)
https://github.com/openai/openai-python/issues/2644
2025-09-21 00:27:03 -04:00
Mason Daugherty
7dcf6a515e fix: update method calls from dict to model_dump in Chain (#33035) 2025-09-20 23:47:44 -04:00
Mason Daugherty
043a7560a5 test: use .get() for safe ls_params access (#33034) 2025-09-20 23:46:37 -04:00
Mason Daugherty
5b418d3f26 feat(infra): add PR labeler configurations and workflows (#33031) 2025-09-20 22:33:08 -04:00
Mason Daugherty
6b4054c795 chore(infra): update pre-commit hooks to include linting (#33029) 2025-09-21 02:26:19 +00:00
Mason Daugherty
30fde5af38 chore(infra): remove couchbase formatting hook from pre-commit (#33030)
Should've been done when it was removed from the monorepo
2025-09-20 22:09:57 -04:00
Mason Daugherty
781db9d892 chore: update pyproject.toml files, remove codespell (#33028)
- Removes Codespell from deps, docs, and `Makefile`s
- Python version requirements in all `pyproject.toml` files now use the
`~=` (compatible release) specifier
- All dependency groups and main dependencies now use explicit lower and
upper bounds, reducing potential for breaking changes
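
For readers unfamiliar with the `~=` specifier: under PEP 440, `~=3.10` is shorthand for `>=3.10, ==3.*`. The sketch below approximates that rule for purely numeric versions. The helper names are hypothetical, and real tooling should rely on a proper version parser instead:

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as "3.10" or "2.2.1"."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(candidate: str, base: str) -> bool:
    """Approximate `~=base` for simple numeric versions (no pre/post releases)."""
    base_parts = parse_version(base)
    if len(base_parts) < 2:
        raise ValueError("~= needs at least two version segments")
    cand_parts = parse_version(candidate)
    # Pad the candidate so "3" compares like "3.0" against a two-part base.
    cand_parts += (0,) * (len(base_parts) - len(cand_parts))
    # All segments except the last one of the base must match exactly.
    prefix = base_parts[:-1]
    return cand_parts >= base_parts and cand_parts[: len(prefix)] == prefix
```

With this rule, `is_compatible("3.12", "3.10")` holds while `is_compatible("4.0", "3.10")` does not, which is how the specifier supplies both a lower and an upper bound.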
2025-09-20 22:09:33 -04:00
Sydney Runkle
f2b0afd0b7 release(langchain): 1.0.0a6 (#33024)
w/ improvements to HITL, state schema merging, dynamic system prompt
2025-09-19 18:47:41 +00:00
Sydney Runkle
c3654202a3 fix(langchain): use state schema as input schema to middleware nodes (#33023)
We want state schema as the input schema to middleware nodes because the
conditional edges after these nodes need access to the full state.

Also, we just generally want all state passed to middleware nodes, so we
should be specifying this explicitly. If we don't, the state annotations
used by users in their node signatures are used (so they might be
missing fields).
2025-09-19 18:43:33 +00:00
Sydney Runkle
4d118777bc feat(langchain): dynamic system prompt middleware (#33006)
# Changes

## Adds support for `DynamicSystemPromptMiddleware`

```py
from langchain.agents.middleware import DynamicSystemPromptMiddleware
from langchain.agents.middleware.types import AgentState
from langgraph.runtime import Runtime
from typing_extensions import TypedDict

class Context(TypedDict):
    user_name: str

def system_prompt(state: AgentState, runtime: Runtime[Context]) -> str:
    user_name = runtime.context.get("user_name", "n/a")
    return f"You are a helpful assistant. Always address the user by their name: {user_name}"

middleware = DynamicSystemPromptMiddleware(system_prompt)
```

## Adds support for `runtime` in middleware hooks

```py
class AgentMiddleware(Generic[StateT, ContextT]):
    def modify_model_request(
        self,
        request: ModelRequest,
        state: StateT,
        runtime: Runtime[ContextT],  # Optional runtime parameter
    ) -> ModelRequest:
        # upgrade model if runtime.context.subscription is `top-tier` or whatever
        return request
```

## Adds support for omitting state attributes from input / output schemas

```py
from typing import Annotated, NotRequired
from langchain.agents.middleware.types import AgentState, PrivateStateAttr, OmitFromInput, OmitFromOutput

class CustomState(AgentState):
    # Private field - not in input or output schemas
    internal_counter: NotRequired[Annotated[int, PrivateStateAttr]]
    
    # Input-only field - not in output schema
    user_input: NotRequired[Annotated[str, OmitFromOutput]]
    
    # Output-only field - not in input schema  
    computed_result: NotRequired[Annotated[str, OmitFromInput]]
```

## Additionally
* Removes filtering of state before passing into middleware hooks

Typing is not foolproof here, still need to figure out some of the
generics stuff w/ state and context schema extensions for middleware.

TODO:
* More docs for middleware, should hold off on this until other prios
like MCP and deepagents are met

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-09-18 16:07:16 -04:00
Mason Daugherty
f158cea1e8 release(mistralai): 0.2.12 (#33008) 2025-09-18 11:42:11 -04:00
Sadiq Khan
90280d1f58 docs(core): fix bugs and improve example code in chat_history.py (#32994)
## Summary

This PR fixes several bugs and improves the example code in
`BaseChatMessageHistory` docstring that would prevent it from working
correctly.

### Bugs Fixed
- **Critical bug**: Fixed `json.dump(messages, f)` →
`json.dump(serialized, f)` - was using wrong variable
- **NameError**: Fixed bare variable references to use
`self.storage_path` and `self.session_id`
- **Missing imports**: Added required imports (`json`, `os`, message
converters) to make example runnable

### Improvements
- Added missing type hints following project standards (`messages() ->
list[BaseMessage]`, `clear() -> None`)
- Added robust error handling with `FileNotFoundError` exception
handling
- Added directory creation with `os.makedirs(exist_ok=True)` to prevent
path errors
- Improved performance: `json.load(f)` instead of `json.loads(f.read())`
- Added explicit UTF-8 encoding to all file operations
- Updated stores.py to use modern union syntax (`int | None` vs
`Optional[int]`)
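
The fixes listed above can be sketched as a small file-backed history. This is not the docstring example itself: it stores plain dictionaries rather than LangChain messages, and the class name `FileChatHistory` is hypothetical.

```python
import json
import os
from typing import Any


class FileChatHistory:
    """Illustrative file-backed history that stores plain dict messages."""

    def __init__(self, storage_path: str, session_id: str) -> None:
        self.storage_path = storage_path
        self.session_id = session_id

    def _file_path(self) -> str:
        return os.path.join(self.storage_path, f"{self.session_id}.json")

    def messages(self) -> list[dict[str, Any]]:
        try:
            with open(self._file_path(), encoding="utf-8") as f:
                # json.load reads from the file object directly.
                return json.load(f)
        except FileNotFoundError:
            # No history has been written for this session yet.
            return []

    def add_messages(self, new_messages: list[dict[str, Any]]) -> None:
        serialized = self.messages() + list(new_messages)
        # Create the directory first so that open() cannot fail on a missing path.
        os.makedirs(self.storage_path, exist_ok=True)
        with open(self._file_path(), "w", encoding="utf-8") as f:
            json.dump(serialized, f)

    def clear(self) -> None:
        os.makedirs(self.storage_path, exist_ok=True)
        with open(self._file_path(), "w", encoding="utf-8") as f:
            json.dump([], f)
```

Reading before anything has been written returns an empty list rather than raising, and writing to a directory that does not exist yet succeeds because of `os.makedirs(..., exist_ok=True)`.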

### Test Plan
- [x] Code passes linting (`ruff check`)
- [x] Example code now has all required imports and proper syntax
- [x] Fixed variable references prevent runtime errors
- [x] Follows project's type annotation standards

The example code in the docstring is now fully functional and follows
LangChain's coding standards.

---------

Co-authored-by: sadiqkhzn <sadiqkhzn@users.noreply.github.com>
2025-09-18 09:34:19 -04:00
Dushmanta
ee340e0a3b fix(docs): update dead link to docling github and docs (#33001)
- **Description:** Updated the dead/unreachable links to Docling from
the additional resources section of the langchain-docling docs
  - **Issue:** Fixes langchain-ai/docs/issues/574
  - **Dependencies:** None
2025-09-18 09:30:29 -04:00
Sydney Runkle
d5ba5d3511 feat(langchain): improved HITL patterns (#32996)
# Main changes / new features

## Better support for parallel tool calls

1. Support for multiple tool calls requiring human input
2. Support for combination of tool calls requiring human input + those
that are auto-approved
3. Support structured output w/ tool calls requiring human input
4. Support structured output w/ standard tool calls

## Shortcut for allowed actions

Adds a shortcut where tool config can be specified as a `bool`, meaning
"all actions allowed"

```py
HumanInTheLoopMiddleware(tool_configs={"expensive_tool": True})
```

## A few design decisions here
* We raise only one interrupt, containing all `HumanInterrupt`s. Currently,
tool execution can't complete until all of them are resolved. This isn't a
major blocker because we can't re-invoke the model until all tools have
finished execution anyway. That said, if you have a long-running
auto-approved tool, this could slow things down.
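
A rough sketch of the batching idea, assuming tool calls are plain dictionaries with a `name` key. The function `split_tool_calls` is hypothetical and does not reflect the middleware's actual internals:

```python
from typing import Any

ToolCall = dict[str, Any]


def split_tool_calls(
    tool_calls: list[ToolCall], requires_approval: set[str]
) -> tuple[list[ToolCall], list[ToolCall]]:
    """Split calls into those needing human input and those auto-approved."""
    pending = [call for call in tool_calls if call["name"] in requires_approval]
    auto_approved = [call for call in tool_calls if call["name"] not in requires_approval]
    return pending, auto_approved


calls = [
    {"name": "expensive_tool", "args": {"amount": 100}},
    {"name": "search", "args": {"query": "weather"}},
    {"name": "delete_file", "args": {"path": "notes.txt"}},
]
# All pending calls would be bundled into a single interrupt payload.
pending, auto_approved = split_tool_calls(calls, {"expensive_tool", "delete_file"})
```

Here `pending` holds both calls that need approval, so a single interrupt can carry them together while `search` is auto-approved.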

## TODOs

* Ideally, we would rename `accept` -> `approve`
* Ideally, we would rename `respond` -> `reject`
* Docs update (@sydney-runkle to own)
* In another PR I'd like to refactor testing to have one file for each
prebuilt middleware :)

Fast follow to https://github.com/langchain-ai/langchain/pull/32962
which was deemed as too breaking
2025-09-17 16:53:01 -04:00
Mason Daugherty
76d0758007 fix(docs): json_mode -> json_schema (#32993) 2025-09-17 18:21:34 +00:00
Mason Daugherty
8b3f74012c docs: update GenAI structured output section to include JSON mode details (#32992) 2025-09-17 17:40:34 +00:00
Mason Daugherty
54a9556f5c chore(cli): update lock (#32986) 2025-09-17 02:08:20 +00:00
Mason Daugherty
66041a2778 refactor(cli): target ruff 310 (#32985)
Use union types for optional parameters
2025-09-16 22:04:28 -04:00
Mason Daugherty
ab1b822523 chore: update PR title lint (#32983) 2025-09-16 22:04:19 -04:00
Chase Lean
543d90e108 docs: add langchain-scraperapi (#31973)
Adds documentation for the integration langchain-scraperapi, which
contains 3 tools using the ScraperAPI service.

The tools give AI agents the ability to:

- Scrape the web and return HTML/text/markdown
- Perform a Google search and return JSON output
- Perform an Amazon search and return JSON output

For reference, here is the official repo for langchain_scraperapi:
https://github.com/scraperapi/langchain-scraperapi
2025-09-16 21:46:20 -04:00
Adam Deedman
f8640630d8 docs: fix memory for agents (#32979)
Replaced `input_message` parameter with a directly called tuple, e.g.
`{"messages": [("user", "What is my name?")]}`

Before, the memory function wasn't working with the agent, using the
format of the input_message parameter.

Specifically, on page [Build an
Agent#adding-in-memory](https://python.langchain.com/docs/tutorials/agents/#adding-in-memory)

In the previous code, the query "What's my name?" wasn't working, as the
agent could not recall memory correctly.

<img width="860" height="679" alt="image"
src="https://github.com/user-attachments/assets/dfbca21e-ffe9-4645-a810-3be7a46d81d5"
/>
2025-09-16 15:46:15 -04:00
Mason Daugherty
f9605c7438 chore(infra): update contribution guide link in CONTRIBUTING.md (#32976) 2025-09-16 15:15:53 +00:00
Mason Daugherty
ebd6f7d8a3 chore(infra): update security guidelines formatting (#32975) 2025-09-16 15:12:10 +00:00
ccurme
e63c1d7171 chore(langchain): drop cap on python version (#32974) 2025-09-16 10:44:21 -04:00
Mason Daugherty
8180020b93 chore: restore commented out optional deps (#32971)
langchain & langchain_v1
2025-09-16 10:10:49 -04:00
Username46786
435194acf6 docs: add cross-links between summarization how-to pages (#32968)
This PR improves navigation in the summarization how-to section by
adding
cross-links from the single-call guide to the related map-reduce and
refine
guides. This mirrors the docs style guide’s emphasis on clear
cross-references
and should help readers discover the appropriate pattern for longer
texts.

- Source edited: docs/docs/how_to/summarize_stuff.ipynb
- Links added:
  - /docs/how_to/summarize_map_reduce/
  - /docs/how_to/summarize_refine/

Type: docs-only (no code changes)
2025-09-16 09:59:03 -04:00
Mason Daugherty
244c699551 refactor(cli): drop Python 3.9 (#32964) 2025-09-15 19:22:53 -04:00
Mason Daugherty
369858de19 chore(infra): fix codspeed (#32963)
Related to #32950

CodSpeed v4 runs pytest inside its own runner process, which does not
automatically inherit environment variables from the job.
2025-09-15 21:52:46 +00:00
Ali Ismail
4ebce80fbb docs(langchain): add docstring for _load_map_reduce_chain (#32961)
Description:
Add a docstring to _load_map_reduce_chain in chains/summarize/ to
explain the purpose of the prompt argument and document function
parameters. This addresses an existing TODO in the codebase.

Issue:
N/A (documentation improvement only)

Dependencies:
None
2025-09-15 17:19:20 -04:00
Mason Daugherty
8670b24c8e test(groq): xfail tool integration test (#32960)
Groq models have known issues with tool calling consistency,
[particularly with complex tools derived from
runnables](https://github.com/langchain-ai/langchain/discussions/19990).
[(more)](https://github.com/langchain-ai/langchain/discussions/24309)

xfail until we can dedicate time to wrangling their API/model handling
2025-09-15 14:23:22 -04:00
Ademílson Tonato
8d60ddba3a docs: update installation command for firecrawl-py package (#32958) 2025-09-15 14:10:08 -04:00
Mason Daugherty
9f6431924f feat(openai): add max_tokens to AzureChatOpenAI (#32959)
Fixes #32949

This pattern is [present in
`ChatOpenAI`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/openai/langchain_openai/chat_models/base.py#L2821)
but wasn't carried over to Azure.


[CI](https://github.com/langchain-ai/langchain/actions/runs/17741751797/job/50417180998)
2025-09-15 14:09:20 -04:00
Ali Ismail
569a3d9602 docs(langchain): add docstring for _load_stuff_chain (#32932)
**Description:**  
Add a docstring to `_load_stuff_chain` in `chains/summarize/` to explain
the purpose of the `prompt` argument and document function parameters.
This addresses an existing TODO in the codebase.

**Issue:**  
N/A (documentation improvement only)

**Dependencies:**  
None
2025-09-15 10:02:50 -04:00
dependabot[bot]
8ef4df903f chore(infra): bump CodSpeedHQ/action from 3 to 4 (#32950)
Bumps [CodSpeedHQ/action](https://github.com/codspeedhq/action) from 3
to 4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/codspeedhq/action/releases">CodSpeedHQ/action's
releases</a>.</em></p>
<blockquote>
<h2>v4.0.0</h2>
<h2>💥 BREAKING</h2>
<p>It's now required to explicitly set the runner mode to
<code>instrumentation</code> or <code>walltime</code> using either:</p>
<ul>
<li>the <code>mode</code> argument</li>
<li>or the <code>CODSPEED_RUNNER_MODE</code> environment variable</li>
</ul>
<blockquote>
<p>[!TIP]
Before, this variable was automatically set to
<code>instrumentation</code> on every runner except for <a
href="https://codspeed.io/docs/instruments/walltime">CodSpeed macro
runners</a> where it was set to <code>walltime</code> by default.</p>
</blockquote>
<p>Find more details in <a
href="https://codspeed.io/docs/instruments">the instruments
documentation</a>.</p>
<h2>Details</h2>
<h3><!-- raw HTML omitted -->🚀 Features</h3>
<ul>
<li>Make perf profiling enabled by default by <a
href="https://github.com/GuillaumeLagrange"><code>@​GuillaumeLagrange</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/110">#110</a></li>
<li>Make the runner mode argument required by <a
href="https://github.com/GuillaumeLagrange"><code>@​GuillaumeLagrange</code></a></li>
<li>Use introspected node in walltime mode by <a
href="https://github.com/GuillaumeLagrange"><code>@​GuillaumeLagrange</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/108">#108</a></li>
<li>Add instrumented go shell script by <a
href="https://github.com/not-matthias"><code>@​not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/102">#102</a></li>
</ul>
<h3><!-- raw HTML omitted -->🐛 Bug Fixes</h3>
<ul>
<li>Compute proper load bias by <a
href="https://github.com/not-matthias"><code>@​not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/107">#107</a></li>
<li>Increase timeout for first perf ping by <a
href="https://github.com/GuillaumeLagrange"><code>@​GuillaumeLagrange</code></a></li>
<li>Prevent running with valgrind by <a
href="https://github.com/not-matthias"><code>@​not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/106">#106</a></li>
</ul>
<h3><!-- raw HTML omitted -->🏗️ Refactor</h3>
<ul>
<li>Change go-runner binary name by <a
href="https://github.com/not-matthias"><code>@​not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/111">#111</a></li>
</ul>
<p><strong>Full Runner Changelog</strong>: <a
href="https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md">https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md</a></p>
<h2>v3.8.1</h2>
<h2>What's Changed</h2>
<h3><!-- raw HTML omitted -->🐛 Bug Fixes</h3>
<ul>
<li>Don't show error when libpython is not found by <a
href="https://github.com/not-matthias"><code>@​not-matthias</code></a></li>
</ul>
<h3><!-- raw HTML omitted -->🏗️ Refactor</h3>
<ul>
<li>Improve conditional compilation in
<code>get_pipe_open_options</code> by <a
href="https://github.com/art049"><code>@​art049</code></a> in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/100">#100</a></li>
</ul>
<h3><!-- raw HTML omitted -->⚙️ Internals</h3>
<ul>
<li>Change log level to warn for venv_compat error by <a
href="https://github.com/not-matthias"><code>@​not-matthias</code></a>
in <a
href="https://redirect.github.com/CodSpeedHQ/runner/pull/104">#104</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/CodSpeedHQ/action/compare/v3.8.0...v3.8.1">https://github.com/CodSpeedHQ/action/compare/v3.8.0...v3.8.1</a>
<strong>Full Runner Changelog</strong>: <a
href="https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md">https://github.com/CodSpeedHQ/runner/blob/main/CHANGELOG.md</a></p>
<h2>v3.8.0</h2>
<h2>What's Changed</h2>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="653fdc30e6"><code>653fdc3</code></a>
Release v4.0.1 🚀</li>
<li><a
href="4da7be1bda"><code>4da7be1</code></a>
chore: bump runner version to 4.0.1</li>
<li><a
href="172d6c5630"><code>172d6c5</code></a>
chore: make the comment about input validation more discrete</li>
<li><a
href="d15e1ce813"><code>d15e1ce</code></a>
chore: improve the release script</li>
<li><a
href="6eeb021fd0"><code>6eeb021</code></a>
Release v4.0.0 🚀</li>
<li><a
href="74312dabbe"><code>74312da</code></a>
chore: improve the release script</li>
<li><a
href="8a17a350a8"><code>8a17a35</code></a>
ci: add modes to the matrix</li>
<li><a
href="8e3f02a649"><code>8e3f02a</code></a>
feat: make the mode argument required</li>
<li><a
href="97c7a6f5fc"><code>97c7a6f</code></a>
chore: bump runner version to 4.0.0</li>
<li><a
href="8a4cadd026"><code>8a4cadd</code></a>
chore: point the changelog to the runner</li>
<li>See full diff in <a
href="https://github.com/codspeedhq/action/compare/v3...v4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=CodSpeedHQ/action&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-15 13:56:46 +00:00
doubleinfinity
b944bbc766 docs: add ZeusDB vector store integration (#32822)
## Description

This PR adds documentation for the new ZeusDB vector store integration
with LangChain.

## Motivation

ZeusDB is a high-performance vector database (Python/Rust backend)
designed for AI applications that need fast similarity search and
real-time vector ops. This integration brings ZeusDB's capabilities to
the LangChain ecosystem, giving developers another production-oriented
option for vector storage and retrieval.

**Key Features:**
- **User-Friendly Python API**: Intuitive interface that integrates
seamlessly with Python ML workflows
- **High Performance**: Powered by a robust Rust backend for
lightning-fast vector operations
- **Enterprise Logging**: Comprehensive logging capabilities for
monitoring and debugging production systems
- **Advanced Features**: Includes product quantization and persistence
capabilities
- **AI-Optimized**: Purpose-built for modern AI applications and RAG
pipelines

## Changes

- Added provider documentation:
`docs/docs/integrations/providers/zeusdb.mdx` (installation, setup).

- Added vector store documentation:
`docs/docs/integrations/vectorstores/zeusdb.ipynb` (quickstart for
creating/querying a ZeusDBVectorStore).

- Registered langchain-zeusdb in `libs/packages.yml` for discovery.

## Target users

- AI/ML engineers building RAG pipelines

- Data scientists working with large document collections

- Developers needing high-throughput vector search

- Teams requiring near real-time vector operations

## Testing

- Followed LangChain's "How to add standard tests to an integration"
guidance.
- Code passes format, lint, and test checks locally.
- Tested with LangChain Core 0.3.74
- Works with Python 3.10 to 3.13

## Package Information
**PyPI:** https://pypi.org/project/langchain-zeusdb
**Github:** https://github.com/ZeusDB/langchain-zeusdb
2025-09-15 09:55:14 -04:00
Filip Makraduli
0be7515abc docs: add superlinked retriever integration (#32433)
# feat(superlinked): add superlinked retriever integration

**Description:** 
Add Superlinked as a custom retriever with full LangChain compatibility.
This integration enables users to leverage Superlinked's multi-modal
vector search capabilities including text similarity, categorical
similarity, recency, and numerical spaces with flexible weighting
strategies. The implementation provides a `SuperlinkedRetriever` class
that extends LangChain's `BaseRetriever` with comprehensive error
handling, parameter validation, and support for various vector databases
(in-memory, Qdrant, Redis, MongoDB).

**Key Features:**
- Full LangChain `BaseRetriever` compatibility with `k` parameter
support
- Multi-modal search spaces (text, categorical, numerical, recency)
- Flexible weighting strategies for complex search scenarios
- Vector database agnostic implementation
- Comprehensive validation and error handling
- Complete test coverage (unit tests, integration tests)
- Detailed documentation with 6 practical usage examples

**Issue:** N/A (new integration)

**Dependencies:** 
- `superlinked==33.5.1` (peer dependency, imported within functions)
- `pandas^2.2.0` (required by superlinked)

**Linkedin handle:** https://www.linkedin.com/in/filipmakraduli/

## Implementation Details

### Files Added/Modified:
- `libs/partners/superlinked/` - Complete package structure
- `libs/partners/superlinked/langchain_superlinked/retrievers.py` - Main
retriever implementation
- `libs/partners/superlinked/tests/unit_tests/test_retrievers.py` - unit
tests
- `libs/partners/superlinked/tests/integration_tests/test_retrievers.py`
- Integration tests with mocking
- `docs/docs/integrations/retrievers/superlinked.ipynb` - Documentation
with a few usage examples

### Testing:
- `make format` - passing
- `make lint` - passing 
- `make test` - passing (16 unit tests, integration tests)
- Comprehensive test coverage including error handling, validation, and
edge cases

### Documentation:
- Example notebook with 6 practical scenarios:
  1. Simple text search
  2. Multi-space blog search (content + category + recency)
  3. E-commerce product search (price + brand + ratings)
  4. News article search (sentiment + topics + recency)
  5. LangChain RAG integration example
  6. Qdrant vector database integration

### Code Quality:
- Follows LangChain contribution guidelines
- Backwards compatible
- Optional dependencies imported within functions
- Comprehensive error handling and validation
- Type hints and docstrings throughout

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-15 13:54:04 +00:00
Sadiq Khan
cc9a97a477 docs(core): add type hints to BaseStore example code (#32946)
## Summary
- Add comprehensive type hints to the MyInMemoryStore example code in
BaseStore docstring
- Improve documentation quality and educational value for developers
- Align with LangChain's coding standards requiring type hints on all
Python code

## Changes Made
- Added return type annotations to all methods (__init__, mget, mset,
mdelete, yield_keys)
- Added parameter type annotations using proper generic types (Sequence,
Iterator)
- Added instance variable type annotation for the store attribute
- Used modern Python union syntax (str | None) for optional types

## Test Plan
- Verified Python syntax validity with ast.parse()
- No functional changes to actual code, only documentation improvements
- Example code now follows best practices and coding standards

This change improves the educational value of the example code and
ensures consistency with LangChain's requirement that "All Python code
MUST include type hints and return types" as specified in the
development guidelines.
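
As a rough illustration of the annotations listed above, a fully typed in-memory store might look like the sketch below. It is standalone and deliberately does not subclass `BaseStore`; the `str` value type and the exact signatures are assumptions, not the text of the actual docstring example.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


class MyInMemoryStore:
    """Hypothetical stand-in for the docstring example, not the LangChain code."""

    def __init__(self) -> None:
        # Instance variable annotation for the backing dict.
        self.store: dict[str, str] = {}

    def mget(self, keys: Sequence[str]) -> list[str | None]:
        # Missing keys come back as None, in the same order as the input.
        return [self.store.get(key) for key in keys]

    def mset(self, key_value_pairs: Sequence[tuple[str, str]]) -> None:
        for key, value in key_value_pairs:
            self.store[key] = value

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            # pop with a default so unknown keys are ignored.
            self.store.pop(key, None)

    def yield_keys(self, prefix: str | None = None) -> Iterator[str]:
        # With no prefix, every key is yielded.
        for key in list(self.store):
            if prefix is None or key.startswith(prefix):
                yield key
```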

---------

Co-authored-by: sadiqkhzn <sadiqkhzn@users.noreply.github.com>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-15 13:45:34 +00:00
Dmitry
ee17adb022 docs: add AI/ML API integration (#32430)
**Description:**
Introduces documentation notebooks for AI/ML API integration covering
the following use cases:
- Chat models (`ChatAimlapi`)
- Text completion models (`AimlapiLLM`)
- Provider usage examples
- Text embedding models (`AimlapiEmbeddings`)

Additionally, adds the `langchain-aimlapi` package entry to
`libs/packages.yml` for package management.

This PR aims to provide a comprehensive starting point for developers
integrating AI/ML API models with LangChain via the new
`langchain-aimlapi` package.

**Issue:** N/A

**Dependencies:** None

**Twitter handle:** @aimlapi

---

### **To-Do Before Submitting PR:**

* [x] Run `make format`
* [x] Run `make lint`
* [x] Confirm all documentation notebooks are in
`docs/docs/integrations/`
* [x] Double-check `libs/packages.yml` has the correct repo path
* [x] Confirm no `pyproject.toml` modifications were made unnecessarily

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-15 09:41:40 -04:00
Noraina
6a43f140bc docs: update SerpApi free searches amount in tool feature table (#32945)
**Description:** 
This PR updates the free searches per month from **100** to **250** and
renames SerpAPI to [SerpApi](https://serpapi.com/) to prevent confusion.
It also adds API key setup and improves the usage instructions in the
Jupyter notebook.

**Issue:** N/A

**Dependencies:** N/A

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. **We will not consider
a PR unless these three are passing in CI.** See [contribution
guidelines](https://python.langchain.com/docs/contributing/) for more.
2025-09-14 21:42:59 -04:00
Youngho Kim
4619a2727f docs(anthropic): update documentation links (#32938)
**Description:**
This PR updated links to the latest Anthropic documentation. Changes
include revised links for model overview, tool usage, web search tool,
text editor tool, and more.

**Issue:**
N/A

**Dependencies:**
None

**Twitter handle:**
N/A
2025-09-14 21:38:51 -04:00
湛露先生
6487a7e2e5 chore(langchain): remove duplicate .pdf listing (#32929) 2025-09-14 21:33:40 -04:00
湛露先生
406ebc9141 chore(langchain): Fix typos in core docstrings (#32928)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-09-14 21:33:06 -04:00
Nikhil Chandrappa
e6b5ff213a docs: add YugabyteDB Distributed SQL database (#32571)
- **Description:** The `langchain-yugabytedb` package provides
implementations of core LangChain abstractions using the `YugabyteDB`
distributed SQL database.
  
YugabyteDB is a cloud-native distributed PostgreSQL-compatible database
that combines strong consistency with ultra-resilience, seamless
scalability, geo-distribution, and highly flexible data locality to
deliver business-critical, transactional applications.

[YugabyteDB](https://www.yugabyte.com/ai/) combines the power of the
`pgvector` PostgreSQL extension with an inherently distributed
architecture. This future-proofed foundation helps you build GenAI
applications using RAG retrieval that demands high-performance vector
search.

- [ ] **tests and docs**: 
1. `langchain-yugabytedb`
[github](https://github.com/yugabyte/langchain-yugabytedb) repo.
2. YugabyteDB VectorStore example notebook showing its use. It lives in
`langchain/docs/docs/integrations/vectorstores/yugabytedb.ipynb`
directory.
  3. Running `langchain-yugabytedb` unit tests 
  
- Setting up a Development Environment

This document details how to set up a local development environment that
will allow you to contribute changes to the project.

Acquire sources and create virtualenv.
```shell
git clone https://github.com/yugabyte/langchain-yugabytedb
cd langchain-yugabytedb
uv venv --python=3.13
source .venv/bin/activate
```

Install package in editable mode.
```shell
uv pip install pipx  
pipx install poetry
poetry install
uv pip install pytest pytest_asyncio pytest-timeout langchain-core langchain_tests sqlalchemy psycopg psycopg-binary numpy pgvector
```

Start YugabyteDB RF-1 Universe.
```shell
docker run -d --name yugabyte_node01 --hostname yugabyte01 \
  -p 7000:7000 -p 9000:9000 -p 15433:15433 -p 5433:5433 -p 9042:9042 \
  yugabytedb/yugabyte:2.25.2.0-b359 bin/yugabyted start --background=false \
  --master_flags="allowed_preview_flags_csv=ysql_yb_enable_advisory_locks,ysql_yb_enable_advisory_locks=true" \
  --tserver_flags="allowed_preview_flags_csv=ysql_yb_enable_advisory_locks,ysql_yb_enable_advisory_locks=true"

docker exec -it yugabyte_node01 bin/ysqlsh -h yugabyte01 -c "CREATE extension vector;"
```

Invoke test cases.
```shell
pytest -vvv tests/unit_tests/yugabytedb_tests
```
2025-09-12 16:55:09 -04:00
Michael Yilma
03f0ebd93e docs: add Bigtable Key-value Store and Vector Store Docs (#32598)
Thank you for contributing to LangChain! Follow these steps to mark your
pull request as ready for review. **If any of these steps are not
completed, your PR will not be considered for review.**

- [x] **feat(docs)**: add Bigtable Key-value store doc
- [X] **feat(docs)**: add Bigtable Vector store doc 

This PR adds a doc for Bigtable and LangChain Key-value store
integration. It contains guides on how to add, delete, get, and yield
key-value pairs from Bigtable Key-value Store for LangChain.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. **We will not consider
a PR unless these three are passing in CI.** See [contribution
guidelines](https://python.langchain.com/docs/contributing/) for more.

Additional guidelines:

- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to `pyproject.toml` files (even
optional ones) unless they are **required** for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:53:59 -04:00
Bar Cohen
c9eed530ce docs: add Timbr tools integration (#32862)
# feat(integrations): Add Timbr tools integration

## DESCRIPTION

This PR adds comprehensive documentation and integration support for
Timbr's semantic layer tools in LangChain.

[Timbr](https://timbr.ai/) provides an ontology-driven semantic layer
that enables natural language querying of databases through
business-friendly concepts. It connects raw data to governed business
measures for consistent access across BI, APIs, and AI applications.

[`langchain-timbr`](https://pypi.org/project/langchain-timbr/) is a
Python SDK that extends
[LangChain](https://github.com/WPSemantix/Timbr-GenAI/tree/main/LangChain)
and
[LangGraph](https://github.com/WPSemantix/Timbr-GenAI/tree/main/LangGraph)
with custom agents, chains, and nodes for seamless integration with the
Timbr semantic layer. It enables converting natural language prompts
into optimized semantic-SQL queries and executing them directly against
your data.

**What's Added:**
- Complete integration documentation for `langchain-timbr` package
- Tool documentation page with usage examples and API reference

**Integration Components:**
- `IdentifyTimbrConceptChain` - Identify relevant concepts from user
prompts
- `GenerateTimbrSqlChain` - Generate SQL queries from natural language
- `ValidateTimbrSqlChain` - Validate queries against knowledge graph
schemas
- `ExecuteTimbrQueryChain` - Execute queries against semantic databases
- `GenerateAnswerChain` - Generate human-readable answers from results

## Documentation Added

- `/docs/integrations/providers/timbr.mdx` - Provider overview and
configuration
- `/docs/integrations/tools/timbr.ipynb` - Comprehensive tool usage
examples

## Links

- [PyPI Package](https://pypi.org/project/langchain-timbr/)
- [GitHub Repository](https://github.com/WPSemantix/langchain-timbr)
- [Official
Documentation](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/)

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:51:42 -04:00
tbice
e6c38a043f docs: add Qwen integration guide and update qwq documentation (#32817)
Thank you for contributing to LangChain! Follow these steps to mark your
pull request as ready for review. **If any of these steps are not
completed, your PR will not be considered for review.**

**Description:**  
Add documentation for Qwen integration in LangChain, including setup
instructions, usage examples, and configuration details. Update related
qwq documentation to reflect current best practices and improve clarity
for users.

This PR enhances the documentation ecosystem by:
- Adding a new guide for integrating Qwen models
- Updating outdated or incomplete qwq documentation
- Improving structure and readability of relevant sections

**Issue:** N/A  
**Dependencies:** None

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:49:20 -04:00
Elif Sema Balcioglu
dc47c2c598 docs: update langchain-oracledb documentation (#32805)
`Oracle AI Vector Search` integrations for LangChain have been moved to
a dedicated package, [langchain-oracledb
](https://pypi.org/project/langchain-oracledb/), and a new repository,
[langchain-oracle
](https://github.com/oracle/langchain-oracle/tree/main/libs/oracledb).
This PR updates the corresponding documentation, including installation
instructions and import statements, to reflect these changes.

This PR is complemented with:
https://github.com/langchain-ai/langchain-community/pull/283

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:47:10 -04:00
Yuvraj Chandra
3420ca1da2 docs: add ZenRows provider and tool integration docs (#31742)
**Description:** Adds documentation for ZenRows integration with
LangChain, including provider overview and detailed tool documentation.
ZenRows is an enterprise-grade web scraping solution that enables
LangChain agents to extract web content at scale with advanced features
like JavaScript rendering, anti-bot bypass, geo-targeting, and multiple
output formats.

This PR includes:
- Provider documentation
(`docs/docs/integrations/providers/zenrows.ipynb`)
- Tool documentation
(`docs/docs/integrations/tools/zenrows_universal_scraper.ipynb`)
- Complete usage examples and API reference links

**Issue:** N/A

**Dependencies:** 
- [langchain-zenrows](https://github.com/ZenRows-Hub/langchain-zenrows)
package (external, available on
[PyPI](https://pypi.org/project/langchain-zenrows/))
- No changes to core LangChain dependencies

**LinkedIn handle:** https://www.linkedin.com/company/zenrows/

---------

Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:37:49 -04:00
Vishal Karwande
f11dd177e9 docs: update oci documentation and examples. (#32749)
Adding Oracle Generative AI as one of the providers for langchain.
Updated the old examples in the documentation with the new working
examples.

---------

Co-authored-by: Vishal Karwande <vishalkarwande@Vishals-MacBook-Pro.local>
Co-authored-by: Mason Daugherty <mason@langchain.dev>
2025-09-12 16:28:03 -04:00
Ali Ismail
d5a4abf960 docs(core): remove duplicate 'the' in indexing/api.py (#32924)
**Description:** Fixes a small typo in `_get_document_with_hash` inside 
`libs/core/langchain_core/indexing/api.py`.

**Issue:** N/A (no related issue)

**Dependencies:** None
2025-09-12 15:49:54 -04:00
Eugene Yurtsev
b1497bcea1 chore(core): test that default values in tool calls are preserved in json schema representation (#32921)
Add unit test coverage for this issue:
https://github.com/langchain-ai/langchain/issues/32232
2025-09-12 12:50:54 -04:00
Sydney Runkle
84f9824cc9 chore: use uv caches (#32919)
Especially helpful for the text splitters tests where we're installing
pytorch (expensive and slow slow slow). Should speed up CI by 5-10 mins.

w/o caches, CI taking 20 minutes 😨 
w/ caches, CI taking 3 minutes
2025-09-12 10:29:35 -04:00
Sydney Runkle
0814bfe5ed ci: use partial runs w/ codspeed (#32920)
Taking advantage of [partial
runs](https://codspeed.io/docs/features/partial-runs)!

This should save us minutes on every CI job: we only run codspeed for
libs w/ changes, and this doesn't affect benchmarking drops.
2025-09-12 09:46:01 -04:00
Christophe Bornet
cbaf97ada4 chore: bump mypy version to 1.18 (#32914) 2025-09-12 09:19:23 -04:00
Sydney Runkle
dc2da95ac0 release(langchain): v1.0.0a5 (#32917) 2025-09-12 08:36:44 -04:00
Sydney Runkle
9e78ff19ab fix(langchain): use messages from model request (#32908)
Oversight when moving back to basic function call for
`modify_model_request` rather than implementation as its own node.

Basic test right now failing on main, passing on this branch

Revealed a gap in testing. Will write up a more robust test suite for
basic middleware features.
2025-09-12 08:18:02 -04:00
440 changed files with 37076 additions and 16963 deletions

@@ -3,8 +3,4 @@
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
## New features
For new features, please start a new [discussion on our forum](https://forum.langchain.com/), where the maintainers will help with scoping out the necessary changes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://docs.langchain.com/oss/python/contributing).

@@ -119,7 +119,3 @@ body:
python -m langchain_core.sys_info
validations:
required: true

@@ -42,11 +42,11 @@ body:
label: Feature Description
description: |
Please provide a clear and concise description of the feature you would like to see added to LangChain.
What specific functionality are you requesting? Be as detailed as possible.
placeholder: |
I would like LangChain to support...
This feature would allow users to...
- type: textarea
id: use-case
@@ -56,13 +56,13 @@ body:
label: Use Case
description: |
Describe the specific use case or problem this feature would solve.
Why do you need this feature? What problem does it solve for you or other users?
placeholder: |
I'm trying to build an application that...
Currently, I have to work around this by...
This feature would help me/users to...
- type: textarea
id: proposed-solution
@@ -72,13 +72,13 @@ body:
label: Proposed Solution
description: |
If you have ideas about how this feature could be implemented, please describe them here.
This is optional but can be helpful for maintainers to understand your vision.
placeholder: |
I think this could be implemented by...
The API could look like...
```python
# Example of how the feature might work
```
@@ -90,15 +90,15 @@ body:
label: Alternatives Considered
description: |
Have you considered any alternative solutions or workarounds?
What other approaches have you tried or considered?
placeholder: |
I've tried using...
Alternative approaches I considered:
1. ...
2. ...
But these don't work because...
- type: textarea
id: additional-context
@@ -110,9 +110,9 @@ body:
Add any other context, screenshots, examples, or references that would help explain your feature request.
placeholder: |
Related issues: #...
Similar features in other libraries:
- ...
Additional context or examples:
- ...

@@ -1,4 +1,6 @@
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/action.yml
# TODO: fix this, migrate to new docs repo?
name: "Generate LangChain People"
description: "Generate the data for the LangChain People page"
author: "Jacob Lee <jacob@langchain.dev>"

@@ -1,12 +1,24 @@
# TODO: https://docs.astral.sh/uv/guides/integration/github/#caching
# Helper to set up Python and uv with caching
name: uv-install
description: Set up Python and uv
description: Set up Python and uv with caching
inputs:
python-version:
description: Python version, supporting MAJOR.MINOR only
required: true
enable-cache:
description: Enable caching for uv dependencies
required: false
default: "true"
cache-suffix:
description: Custom cache key suffix for cache invalidation
required: false
default: ""
working-directory:
description: Working directory for cache glob scoping
required: false
default: "**"
env:
UV_VERSION: "0.5.25"
@@ -15,7 +27,13 @@ runs:
using: composite
steps:
- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v6
with:
version: ${{ env.UV_VERSION }}
python-version: ${{ inputs.python-version }}
enable-cache: ${{ inputs.enable-cache }}
cache-dependency-glob: |
${{ inputs.working-directory }}/pyproject.toml
${{ inputs.working-directory }}/uv.lock
${{ inputs.working-directory }}/requirements*.txt
cache-suffix: ${{ inputs.cache-suffix }}

.github/pr-file-labeler.yml (new file, 80 lines)

@@ -0,0 +1,80 @@
# Label PRs (config)
# Automatically applies labels based on changed files and branch patterns
# Core packages
core:
- changed-files:
- any-glob-to-any-file:
- "libs/core/**/*"
langchain:
- changed-files:
- any-glob-to-any-file:
- "libs/langchain/**/*"
- "libs/langchain_v1/**/*"
v1:
- changed-files:
- any-glob-to-any-file:
- "libs/langchain_v1/**/*"
cli:
- changed-files:
- any-glob-to-any-file:
- "libs/cli/**/*"
standard-tests:
- changed-files:
- any-glob-to-any-file:
- "libs/standard-tests/**/*"
# Partner integrations
integration:
- changed-files:
- any-glob-to-any-file:
- "libs/partners/**/*"
# Infrastructure and DevOps
infra:
- changed-files:
- any-glob-to-any-file:
- ".github/**/*"
- "Makefile"
- ".pre-commit-config.yaml"
- "scripts/**/*"
- "docker/**/*"
- "Dockerfile*"
github_actions:
- changed-files:
- any-glob-to-any-file:
- ".github/workflows/**/*"
- ".github/actions/**/*"
dependencies:
- changed-files:
- any-glob-to-any-file:
- "**/pyproject.toml"
- "uv.lock"
- "**/requirements*.txt"
- "**/poetry.lock"
# Documentation
documentation:
- changed-files:
- any-glob-to-any-file:
- "docs/**/*"
- "**/*.md"
- "**/*.rst"
- "**/README*"
# Security related changes
security:
- changed-files:
- any-glob-to-any-file:
- "**/*security*"
- "**/*auth*"
- "**/*credential*"
- "**/*secret*"
- "**/*token*"
- ".github/workflows/security*"

.github/pr-title-labeler.yml (new file, 41 lines)

@@ -0,0 +1,41 @@
# PR title labeler config
#
# Labels PRs based on conventional commit patterns in titles
#
# Format: type(scope): description or type!: description (breaking)
add-missing-labels: true
clear-prexisting: false
include-commits: false
include-title: true
label-for-breaking-changes: breaking
label-mapping:
documentation: ["docs"]
feature: ["feat"]
fix: ["fix"]
infra: ["build", "ci", "chore"]
integration:
[
"anthropic",
"chroma",
"deepseek",
"exa",
"fireworks",
"groq",
"huggingface",
"mistralai",
"nomic",
"ollama",
"openai",
"perplexity",
"prompty",
"qdrant",
"xai",
]
linting: ["style"]
performance: ["perf"]
refactor: ["refactor"]
release: ["release"]
revert: ["revert"]
tests: ["test"]
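
To make the `type(scope): description` convention concrete, here is a minimal Python sketch of how a title could be turned into labels. It is not the labeler action's implementation: the regex, the `labels_for_title` helper, and the idea that both the type and the scope are looked up in `label-mapping` are assumptions made for illustration, and only a subset of the mapping is reproduced.

```python
from __future__ import annotations

import re

# Subset of the label-mapping above: label -> title tokens that trigger it.
LABEL_MAPPING = {
    "documentation": ["docs"],
    "feature": ["feat"],
    "fix": ["fix"],
    "infra": ["build", "ci", "chore"],
    "integration": ["anthropic", "openai", "qdrant"],
    "release": ["release"],
}

# type(scope)!: description, where the scope and the "!" are optional.
TITLE_RE = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?:\s+\S")


def labels_for_title(title: str) -> set[str]:
    """Return the labels a PR title would receive under the assumed rules."""
    match = TITLE_RE.match(title)
    if match is None:
        return set()
    tokens = {match["type"]}
    if match["scope"]:
        tokens.add(match["scope"])
    labels = {label for label, keys in LABEL_MAPPING.items() if tokens & set(keys)}
    if match["breaking"]:
        # Mirrors label-for-breaking-changes: breaking.
        labels.add("breaking")
    return labels
```

Under these assumptions, a title such as `fix(openai): handle empty response` would receive both `fix` and `integration`, while a title that does not follow the convention receives no labels.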

@@ -1,3 +1,18 @@
"""Analyze git diffs to determine which directories need to be tested.
Intelligently determines which LangChain packages and directories need to be tested,
linted, or built based on the changes. Handles dependency relationships between
packages, maps file changes to appropriate CI job configurations, and outputs JSON
configurations for GitHub Actions.
- Maps changed files to affected package directories (libs/core, libs/partners/*, etc.)
- Builds dependency graph to include dependent packages when core components change
- Generates test matrix configurations with appropriate Python versions
- Handles special cases for Pydantic version testing and performance benchmarks
Used as part of the check_diffs workflow.
"""
import glob
import json
import os
@@ -17,7 +32,7 @@ LANGCHAIN_DIRS = [
"libs/langchain_v1",
]
# when set to True, we are ignoring core dependents
# When set to True, we are ignoring core dependents
# in order to be able to get CI to pass for each individual
# package that depends on core
# e.g. if you touch core, we don't then add textsplitters/etc to CI
@@ -49,9 +64,9 @@ def all_package_dirs() -> Set[str]:
def dependents_graph() -> dict:
"""
Construct a mapping of package -> dependents, such that we can
run tests on all dependents of a package when a change is made.
"""Construct a mapping of package -> dependents
Done such that we can run tests on all dependents of a package when a change is made.
"""
dependents = defaultdict(set)
@@ -123,9 +138,6 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
elif dir_ == "libs/core":
py_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]
# custom logic for specific directories
elif dir_ == "libs/partners/milvus":
# milvus doesn't allow 3.12 because they declare deps in funny way
py_versions = ["3.9", "3.11"]
elif dir_ in PY_312_MAX_PACKAGES:
py_versions = ["3.9", "3.12"]
@@ -134,6 +146,8 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
py_versions = ["3.9", "3.13"]
elif dir_ == "libs/langchain_v1":
py_versions = ["3.10", "3.13"]
elif dir_ in {"libs/cli"}:
py_versions = ["3.10", "3.13"]
elif dir_ == ".":
# unable to install with 3.13 because tokenizers doesn't support 3.13 yet
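
The package -> dependents mapping described in the `dependents_graph` docstring can be sketched as follows. The `DEPENDENCIES` table and the `dirs_to_test` helper are hypothetical; how the real script discovers dependencies is not shown in this diff.

```python
from collections import defaultdict

# Hypothetical package -> direct dependencies table, for illustration only.
DEPENDENCIES = {
    "libs/core": [],
    "libs/text-splitters": ["libs/core"],
    "libs/langchain": ["libs/core", "libs/text-splitters"],
    "libs/partners/openai": ["libs/core"],
}


def dependents_graph(dependencies):
    """Invert a package -> dependencies map into package -> dependents."""
    dependents = defaultdict(set)
    for package, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(package)
    return dependents


def dirs_to_test(changed_dir, dependencies):
    """Return the changed package plus every package that directly depends on it."""
    return {changed_dir} | dependents_graph(dependencies)[changed_dir]
```

With this shape, a change to a package near the bottom of the graph fans out to everything built on it, which is why the script has a switch for ignoring core dependents.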

@@ -1,3 +1,5 @@
"""Check that no dependencies allow prereleases unless we're releasing a prerelease."""
import sys
import tomllib
@@ -6,15 +8,14 @@ if __name__ == "__main__":
# Get the TOML file path from the command line argument
toml_file = sys.argv[1]
# read toml file
with open(toml_file, "rb") as file:
toml_data = tomllib.load(file)
# see if we're releasing an rc
# See if we're releasing an rc or dev version
version = toml_data["project"]["version"]
releasing_rc = "rc" in version or "dev" in version
# if not, iterate through dependencies and make sure none allow prereleases
# If not, iterate through dependencies and make sure none allow prereleases
if not releasing_rc:
dependencies = toml_data["project"]["dependencies"]
for dep_version in dependencies:
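
The body of the dependency loop is cut off in this diff, so the following is only a sketch of what such a check could look like. It reuses the `"rc"`/`"dev"` substring test that the script applies to the package's own version; applying the same test to each dependency's specifier, the `specifier_part` helper, and the error message are assumptions. It takes already-parsed TOML data so it runs without reading a file.

```python
import re


def is_prerelease(version: str) -> bool:
    # Same substring test the script applies to the package's own version.
    return "rc" in version or "dev" in version


def specifier_part(dependency: str) -> str:
    # Assumption: ignore the package name so that a name such as "orchestra"
    # does not trigger a false positive; keep everything from the first operator.
    match = re.search(r"[<>=!~]", dependency)
    return dependency[match.start():] if match else ""


def check_prerelease_dependencies(toml_data: dict) -> None:
    """Raise ValueError if a stable release depends on a prerelease specifier."""
    project = toml_data["project"]
    if is_prerelease(project["version"]):
        # Releasing a prerelease ourselves: prerelease dependencies are allowed.
        return
    for dependency in project["dependencies"]:
        if is_prerelease(specifier_part(dependency)):
            raise ValueError(f"Dependency {dependency!r} allows a prerelease version")
```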

@@ -1,3 +1,5 @@
"""Get minimum versions of dependencies from a pyproject.toml file."""
import sys
from collections import defaultdict
from typing import Optional
@@ -5,7 +7,7 @@ from typing import Optional
if sys.version_info >= (3, 11):
import tomllib
else:
# for python 3.10 and below, which doesnt have stdlib tomllib
# For Python 3.10 and below, which doesnt have stdlib tomllib
import tomli as tomllib
import re
@@ -34,14 +36,13 @@ SKIP_IF_PULL_REQUEST = [
def get_pypi_versions(package_name: str) -> List[str]:
"""
Fetch all available versions for a package from PyPI.
"""Fetch all available versions for a package from PyPI.
Args:
package_name (str): Name of the package
package_name: Name of the package
Returns:
List[str]: List of all available versions
List of all available versions
Raises:
requests.exceptions.RequestException: If PyPI API request fails
@@ -54,24 +55,23 @@ def get_pypi_versions(package_name: str) -> List[str]:
def get_minimum_version(package_name: str, spec_string: str) -> Optional[str]:
"""
Find the minimum published version that satisfies the given constraints.
"""Find the minimum published version that satisfies the given constraints.
Args:
package_name (str): Name of the package
spec_string (str): Version specification string (e.g., ">=0.2.43,<0.4.0,!=0.3.0")
package_name: Name of the package
spec_string: Version specification string (e.g., ">=0.2.43,<0.4.0,!=0.3.0")
Returns:
Optional[str]: Minimum compatible version or None if no compatible version found
Minimum compatible version or None if no compatible version found
"""
# rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
# Rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
spec_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", spec_string)
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1 (can be anywhere in constraint string)
# Rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1 (can be anywhere in constraint string)
for y in range(1, 10):
spec_string = re.sub(
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}", spec_string
)
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
# Rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
for x in range(1, 10):
spec_string = re.sub(
rf"\^{x}\.(\d+)\.(\d+)", rf">={x}.\1.\2,<{x + 1}", spec_string
@@ -154,22 +154,25 @@ def get_min_version_from_toml(
def check_python_version(version_string, constraint_string):
"""
Check if the given Python version matches the given constraints.
"""Check if the given Python version matches the given constraints.
:param version_string: A string representing the Python version (e.g. "3.8.5").
:param constraint_string: A string representing the package's Python version constraints (e.g. ">=3.6, <4.0").
:return: True if the version matches the constraints, False otherwise.
Args:
version_string: A string representing the Python version (e.g. "3.8.5").
constraint_string: A string representing the package's Python version
constraints (e.g. ">=3.6, <4.0").
Returns:
True if the version matches the constraints
"""
# rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
# Rewrite occurrences of ^0.0.z to 0.0.z (can be anywhere in constraint string)
constraint_string = re.sub(r"\^0\.0\.(\d+)", r"0.0.\1", constraint_string)
# rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1.0 (can be anywhere in constraint string)
# Rewrite occurrences of ^0.y.z to >=0.y.z,<0.y+1.0 (can be anywhere in constraint string)
for y in range(1, 10):
constraint_string = re.sub(
rf"\^0\.{y}\.(\d+)", rf">=0.{y}.\1,<0.{y + 1}.0", constraint_string
)
# rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
# Rewrite occurrences of ^x.y.z to >=x.y.z,<x+1.0.0 (can be anywhere in constraint string)
for x in range(1, 10):
constraint_string = re.sub(
rf"\^{x}\.0\.(\d+)", rf">={x}.0.\1,<{x + 1}.0.0", constraint_string
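
The caret rewrites described in the comments above can also be expressed with a single regex and a callback. This is an alternative sketch, not the script's code: the script loops over fixed ranges of `x` and `y`, while `rewrite_caret` below (a hypothetical name) handles any numeric component and always emits a three-part upper bound.

```python
from __future__ import annotations

import re

CARET_RE = re.compile(r"\^(\d+)\.(\d+)\.(\d+)")


def _expand(match: re.Match) -> str:
    major, minor, patch = (int(group) for group in match.groups())
    version = f"{major}.{minor}.{patch}"
    if major > 0:
        # ^x.y.z -> >=x.y.z,<x+1.0.0
        return f">={version},<{major + 1}.0.0"
    if minor > 0:
        # ^0.y.z -> >=0.y.z,<0.y+1.0
        return f">={version},<0.{minor + 1}.0"
    # ^0.0.z -> 0.0.z, mirroring the first rewrite in the script.
    return version


def rewrite_caret(spec_string: str) -> str:
    """Replace every caret constraint, wherever it appears in the string."""
    return CARET_RE.sub(_expand, spec_string)
```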

@@ -1,5 +1,8 @@
#!/usr/bin/env python
"""Script to sync libraries from various repositories into the main langchain repository."""
"""Sync libraries from various repositories into this monorepo.
Moves cloned partner packages into libs/partners structure.
"""
import os
import shutil
@@ -10,7 +13,7 @@ import yaml
def load_packages_yaml() -> Dict[str, Any]:
"""Load and parse the packages.yml file."""
"""Load and parse packages.yml."""
with open("langchain/libs/packages.yml", "r") as f:
return yaml.safe_load(f)
@@ -61,12 +64,15 @@ def move_libraries(packages: list) -> None:
def main():
"""Main function to orchestrate the library sync process."""
"""Orchestrate the library sync process."""
try:
# Load packages configuration
package_yaml = load_packages_yaml()
# Clean target directories
# Clean/empty target directories in preparation for moving new ones
#
# Only for packages in the langchain-ai org or explicitly included via
# include_in_api_ref, excluding 'langchain' itself and 'langchain-ai21'
clean_target_directories(
[
p
@@ -80,7 +86,9 @@ def main():
]
)
# Move libraries to their new locations
# Move cloned libraries to their new locations, only for packages in the
# langchain-ai org or explicitly included via include_in_api_ref,
# excluding 'langchain' itself and 'langchain-ai21'
move_libraries(
[
p
@@ -95,7 +103,7 @@ def main():
]
)
# Delete ones without a pyproject.toml
# Delete partner packages without a pyproject.toml
for partner in Path("langchain/libs/partners").iterdir():
if partner.is_dir() and not (partner / "pyproject.toml").exists():
print(f"Removing {partner} as it does not have a pyproject.toml")
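
The cleanup step can be sketched as a small standalone function. The diff only shows the check and the log line, so the use of `shutil.rmtree` for the actual removal, the function name, and the returned list are assumptions made for illustration.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def remove_packages_without_pyproject(partners_dir: Path) -> list[str]:
    """Delete partner directories lacking a pyproject.toml; return their names."""
    removed = []
    # sorted() materializes the listing first, so deleting while looping is safe.
    for partner in sorted(partners_dir.iterdir()):
        if partner.is_dir() and not (partner / "pyproject.toml").exists():
            print(f"Removing {partner} as it does not have a pyproject.toml")
            shutil.rmtree(partner)
            removed.append(partner.name)
    return removed
```

Returning the removed names makes the step easy to assert on in a test, which the in-place version in the script does not need.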

@@ -1,3 +1,11 @@
# Validates that a package's integration tests compile without syntax or import errors.
#
# (If an integration test fails to compile, it won't run.)
#
# Called as part of check_diffs.yml workflow
#
# Runs pytest with compile marker to check syntax/imports.
name: '🔗 Compile Integration Tests'
on:
@@ -33,6 +41,8 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
cache-suffix: compile-integration-tests-${{ inputs.working-directory }}
working-directory: ${{ inputs.working-directory }}
- name: '📦 Install Integration Dependencies'
shell: bash

@@ -1,3 +1,10 @@
# Runs `make integration_tests` on the specified package.
#
# Manually triggered via workflow_dispatch for testing with real APIs.
#
# Installs integration test dependencies and executes full test suite.
name: '🚀 Integration Tests'
run-name: 'Test ${{ inputs.working-directory }} on Python ${{ inputs.python-version }}'
@@ -34,6 +41,8 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
cache-suffix: integration-tests-${{ inputs.working-directory }}
working-directory: ${{ inputs.working-directory }}
- name: '📦 Install Integration Dependencies'
shell: bash
@@ -81,7 +90,7 @@ jobs:
run: |
make integration_tests
- name: Ensure the tests did not create any additional files
- name: 'Ensure testing did not create/modify files'
shell: bash
run: |
set -eu

@@ -1,6 +1,11 @@
name: '🧹 Code Linting'
# Runs code quality checks using ruff, mypy, and other linting tools
# Checks both package code and test code for consistency
# Runs linting.
#
# Uses the package's Makefile to run the checks, specifically the
# `lint_package` and `lint_tests` targets.
#
# Called as part of check_diffs.yml workflow.
name: '🧹 Linting'
on:
workflow_call:
@@ -39,16 +44,10 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
cache-suffix: lint-${{ inputs.working-directory }}
working-directory: ${{ inputs.working-directory }}
- name: '📦 Install Lint & Typing Dependencies'
# Also installs dev/lint/test/typing dependencies, to ensure we have
# type hints for as many of our libraries as possible.
# This helps catch errors that require dependencies to be spotted, for example:
# https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341
#
# If you change this configuration, make sure to change the `cache-key`
# in the `poetry_setup` action above to stop using the old cache.
# It doesn't matter how you change it, any change will cause a cache-bust.
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --group lint --group typing
@@ -58,20 +57,13 @@ jobs:
run: |
make lint_package
- name: '📦 Install Unit Test Dependencies'
# Also installs dev/lint/test/typing dependencies, to ensure we have
# type hints for as many of our libraries as possible.
# This helps catch errors that require dependencies to be spotted, for example:
# https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341
#
# If you change this configuration, make sure to change the `cache-key`
# in the `poetry_setup` action above to stop using the old cache.
# It doesn't matter how you change it, any change will cause a cache-bust.
- name: '📦 Install Test Dependencies (non-partners)'
# (For directories NOT starting with libs/partners/)
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
working-directory: ${{ inputs.working-directory }}
run: |
uv sync --inexact --group test
- name: '📦 Install Unit + Integration Test Dependencies'
- name: '📦 Install Test Dependencies'
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
working-directory: ${{ inputs.working-directory }}
run: |


@@ -1,3 +1,9 @@
# Builds and publishes LangChain packages to PyPI.
#
# Manually triggered, though can be used as a reusable workflow (workflow_call).
#
# Handles version bumping, building, and publishing to PyPI with authentication.
name: '🚀 Package Release'
run-name: 'Release ${{ inputs.working-directory }} ${{ inputs.release-version }}'
on:
@@ -52,8 +58,8 @@ jobs:
# We want to keep this build stage *separate* from the release stage,
# so that there's no sharing of permissions between them.
# The release stage has trusted publishing and GitHub repo contents write access,
# and we want to keep the scope of that access limited just to the release job.
# (Release stage has trusted publishing and GitHub repo contents write access,
#
# Otherwise, a malicious `build` step (e.g. via a compromised dependency)
# could get access to our GitHub or PyPI credentials.
#
@@ -288,16 +294,19 @@ jobs:
run: |
VIRTUAL_ENV=.venv uv pip install dist/*.whl
- name: Run unit tests
run: make tests
working-directory: ${{ inputs.working-directory }}
- name: Check for prerelease versions
# Block release if any dependencies allow prerelease versions
# (unless this is itself a prerelease version)
working-directory: ${{ inputs.working-directory }}
run: |
uv run python $GITHUB_WORKSPACE/.github/scripts/check_prerelease_dependencies.py pyproject.toml
- name: Run unit tests
run: make tests
working-directory: ${{ inputs.working-directory }}
- name: Get minimum versions
# Find the minimum published versions that satisfy the given constraints
working-directory: ${{ inputs.working-directory }}
id: min-version
run: |
@@ -322,6 +331,7 @@ jobs:
working-directory: ${{ inputs.working-directory }}
- name: Run integration tests
# Uses the Makefile's `integration_tests` target for the specified package
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
env:
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
@@ -362,7 +372,11 @@ jobs:
working-directory: ${{ inputs.working-directory }}
# Test select published packages against new core
# Done when code changes are made to langchain-core
test-prior-published-packages-against-new-core:
# Installs the new core with old partners: Installs the new unreleased core
# alongside the previously published partner packages and runs integration tests
if: github.ref != 'refs/heads/v0.3'
needs:
- build
- release-notes
@@ -390,6 +404,7 @@ jobs:
# We implement this conditional as Github Actions does not have good support
# for conditionally needing steps. https://github.com/actions/runner/issues/491
# TODO: this seems to be resolved upstream, so we can probably remove this workaround
- name: Check if libs/core
run: |
if [ "${{ startsWith(inputs.working-directory, 'libs/core') }}" != "true" ]; then
@@ -417,7 +432,7 @@ jobs:
git ls-remote --tags origin "langchain-${{ matrix.partner }}*" \
| awk '{print $2}' \
| sed 's|refs/tags/||' \
| grep -E '[0-9]+\.[0-9]+\.[0-9]+$' \
| grep -E '==0\.3\.[0-9]+$' \
| sort -Vr \
| head -n 1
)"
@@ -444,12 +459,12 @@ jobs:
make integration_tests
publish:
# Publishes the package to PyPI
needs:
- build
- release-notes
- test-pypi-publish
- pre-release-checks
- test-prior-published-packages-against-new-core
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
@@ -486,6 +501,7 @@ jobs:
attestations: false
mark-release:
# Marks the GitHub release with the new version tag
needs:
- build
- release-notes
@@ -495,7 +511,7 @@ jobs:
runs-on: ubuntu-latest
permissions:
# This permission is needed by `ncipollo/release-action` to
# create the GitHub release.
# create the GitHub release/tag
contents: write
defaults:


@@ -1,6 +1,7 @@
name: '🧪 Unit Testing'
# Runs unit tests with both current and minimum supported dependency versions
# to ensure compatibility across the supported range
# to ensure compatibility across the supported range.
name: '🧪 Unit Testing'
on:
workflow_call:
@@ -39,6 +40,9 @@ jobs:
id: setup-python
with:
python-version: ${{ inputs.python-version }}
cache-suffix: test-${{ inputs.working-directory }}
working-directory: ${{ inputs.working-directory }}
- name: '📦 Install Test Dependencies'
shell: bash
run: uv sync --group test --dev


@@ -1,3 +1,10 @@
# Validates that all import statements in `.ipynb` notebooks are correct and functional.
#
# Called as part of check_diffs.yml.
#
# Installs test dependencies and LangChain packages in editable mode and
# runs check_imports.py.
name: '📑 Documentation Import Testing'
on:
@@ -27,6 +34,8 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
cache-suffix: test-doc-imports-${{ inputs.working-directory }}
working-directory: ${{ inputs.working-directory }}
- name: '📦 Install Test Dependencies'
shell: bash


@@ -1,3 +1,5 @@
# Facilitate unit testing against different Pydantic versions for a provided package.
name: '🐍 Pydantic Version Testing'
on:
@@ -40,6 +42,8 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ inputs.python-version }}
cache-suffix: test-pydantic-${{ inputs.working-directory }}
working-directory: ${{ inputs.working-directory }}
- name: '📦 Install Test Dependencies'
shell: bash


@@ -1,11 +1,19 @@
# Build the API reference documentation.
#
# Runs daily. Can also be triggered manually for immediate updates.
#
# Built HTML pushed to langchain-ai/langchain-api-docs-html.
#
# Looks for langchain-ai org repos in packages.yml and checks them out.
# Calls prep_api_docs_build.py.
name: '📚 API Docs'
run-name: 'Build & Deploy API Reference'
# Runs daily or can be triggered manually for immediate updates
on:
workflow_dispatch:
schedule:
- cron: '0 13 * * *' # Daily at 1PM UTC
- cron: '0 13 * * *' # Runs daily at 1PM UTC (9AM EDT/6AM PDT)
env:
PYTHON_VERSION: "3.11"
@@ -31,6 +39,8 @@ jobs:
uses: mikefarah/yq@master
with:
cmd: |
# Extract repos from packages.yml that are in the langchain-ai org
# (excluding 'langchain' itself)
yq '
.packages[]
| select(
@@ -77,24 +87,31 @@ jobs:
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: '📦 Install Initial Python Dependencies'
- name: '📦 Install Initial Python Dependencies using uv'
working-directory: langchain
run: |
python -m pip install -U uv
python -m uv pip install --upgrade --no-cache-dir pip setuptools pyyaml
- name: '📦 Organize Library Directories'
# Places cloned partner packages into libs/partners structure
run: python langchain/.github/scripts/prep_api_docs_build.py
- name: '🧹 Remove Old HTML Files'
- name: '🧹 Clear Prior Build'
run:
# Remove artifacts from prior docs build
rm -rf langchain-api-docs-html/api_reference_build/html
- name: '📦 Install Documentation Dependencies'
- name: '📦 Install Documentation Dependencies using uv'
working-directory: langchain
run: |
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt
# Install all partner packages in editable mode with overrides
python -m uv pip install $(ls ./libs/partners | xargs -I {} echo "./libs/partners/{}") --overrides ./docs/vercel_overrides.txt --prerelease=allow
# Install core langchain and other main packages
python -m uv pip install libs/core libs/langchain libs/text-splitters libs/community libs/experimental libs/standard-tests
# Install Sphinx and related packages for building docs
python -m uv pip install -r docs/api_reference/requirements.txt
- name: '🔧 Configure Git Settings'
@@ -106,14 +123,29 @@ jobs:
- name: '📚 Build API Documentation'
working-directory: langchain
run: |
# Generate the API reference RST files
python docs/api_reference/create_api_rst.py
# Build the HTML documentation using Sphinx
# -T: show full traceback on exception
# -E: don't use cached environment (force rebuild, ignore cached doctrees)
# -b html: build HTML docs (vs PDF, etc.)
# -d: path for the cached environment (parsed document trees / doctrees)
# - Separate from output dir for faster incremental builds
# -c: path to conf.py
# -j auto: parallel build using all available CPU cores
python -m sphinx -T -E -b html -d ../langchain-api-docs-html/_build/doctrees -c docs/api_reference docs/api_reference ../langchain-api-docs-html/api_reference_build/html -j auto
# Post-process the generated HTML
python docs/api_reference/scripts/custom_formatter.py ../langchain-api-docs-html/api_reference_build/html
# Default index page is blank so we copy in the actual home page.
cp ../langchain-api-docs-html/api_reference_build/html/{reference,index}.html
# Removes Sphinx's intermediate build artifacts after the build is complete.
rm -rf ../langchain-api-docs-html/_build/
# https://github.com/marketplace/actions/add-commit
# Commit and push changes to langchain-api-docs-html repo
- uses: EndBug/add-and-commit@v9
with:
cwd: langchain-api-docs-html


@@ -1,9 +1,11 @@
# Runs broken link checker in /docs on a daily schedule.
name: '🔗 Check Broken Links'
on:
workflow_dispatch:
schedule:
- cron: '0 13 * * *'
- cron: '0 13 * * *' # Runs daily at 1PM UTC (9AM EDT/6AM PDT)
permissions:
contents: read
@@ -15,7 +17,7 @@ jobs:
steps:
- uses: actions/checkout@v5
- name: '🟢 Setup Node.js 18.x'
uses: actions/setup-node@v4
uses: actions/setup-node@v5
with:
node-version: 18.x
cache: "yarn"


@@ -1,6 +1,8 @@
name: '🔍 Check `core` Version Equality'
# Ensures version numbers in pyproject.toml and version.py stay in sync
# Prevents releases with mismatched version numbers
# Ensures version numbers in pyproject.toml and version.py stay in sync.
#
# (Prevents releases with mismatched version numbers)
name: '🔍 Check Version Equality'
on:
pull_request:


@@ -1,3 +1,18 @@
# Primary CI workflow.
#
# Only runs against packages that have changed files.
#
# Runs:
# - Linting (_lint.yml)
# - Unit Tests (_test.yml)
# - Pydantic compatibility tests (_test_pydantic.yml)
# - Documentation import tests (_test_doc_imports.yml)
# - Integration test compilation checks (_compile_integration_test.yml)
# - Extended test suites that require additional dependencies
# - Codspeed benchmarks (if not labeled 'codspeed-ignore')
#
# Reports status to GitHub checks and PR status.
name: '🔧 CI'
on:
@@ -11,8 +26,8 @@ on:
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
# a limited number of job runners to be active at the same time, so it's better to
# cancel pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
@@ -54,6 +69,7 @@ jobs:
dependencies: ${{ steps.set-matrix.outputs.dependencies }}
test-doc-imports: ${{ steps.set-matrix.outputs.test-doc-imports }}
test-pydantic: ${{ steps.set-matrix.outputs.test-pydantic }}
codspeed: ${{ steps.set-matrix.outputs.codspeed }}
# Run linting only on packages that have changed files
lint:
needs: [ build ]
@@ -110,6 +126,7 @@ jobs:
# Verify integration tests compile without actually running them (faster feedback)
compile-integration-tests:
name: 'Compile Integration Tests'
needs: [ build ]
if: ${{ needs.build.outputs.compile-integration-tests != '[]' }}
strategy:
@@ -144,6 +161,8 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ matrix.job-configs.python-version }}
cache-suffix: extended-tests-${{ matrix.job-configs.working-directory }}
working-directory: ${{ matrix.job-configs.working-directory }}
- name: '📦 Install Dependencies & Run Extended Tests'
shell: bash
@@ -166,10 +185,72 @@ jobs:
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'
# Run codspeed benchmarks only on packages that have changed files
codspeed:
name: '⚡ CodSpeed Benchmarks'
needs: [ build ]
if: ${{ needs.build.outputs.codspeed != '[]' && !contains(github.event.pull_request.labels.*.name, 'codspeed-ignore') }}
runs-on: ubuntu-latest
strategy:
matrix:
job-configs: ${{ fromJson(needs.build.outputs.codspeed) }}
fail-fast: false
steps:
- uses: actions/checkout@v5
# We have to use 3.12 as 3.13 is not yet supported
- name: '📦 Install UV Package Manager'
uses: astral-sh/setup-uv@v6
with:
python-version: "3.12"
- uses: actions/setup-python@v6
with:
python-version: "3.12"
- name: '📦 Install Test Dependencies'
run: uv sync --group test
working-directory: ${{ matrix.job-configs.working-directory }}
- name: '⚡ Run Benchmarks: ${{ matrix.job-configs.working-directory }}'
uses: CodSpeedHQ/action@v4
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
ANTHROPIC_FILES_API_IMAGE_ID: ${{ secrets.ANTHROPIC_FILES_API_IMAGE_ID }}
ANTHROPIC_FILES_API_PDF_ID: ${{ secrets.ANTHROPIC_FILES_API_PDF_ID }}
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }}
DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
HUGGINGFACEHUB_API_TOKEN: ${{ secrets.HUGGINGFACEHUB_API_TOKEN }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
NOMIC_API_KEY: ${{ secrets.NOMIC_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
PPLX_API_KEY: ${{ secrets.PPLX_API_KEY }}
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
with:
token: ${{ secrets.CODSPEED_TOKEN }}
run: |
cd ${{ matrix.job-configs.working-directory }}
if [ "${{ matrix.job-configs.working-directory }}" = "libs/core" ]; then
uv run --no-sync pytest ./tests/benchmarks --codspeed
else
uv run --no-sync pytest ./tests/ --codspeed
fi
mode: ${{ matrix.job-configs.working-directory == 'libs/core' && 'walltime' || 'instrumentation' }}
# Final status check - ensures all required jobs passed before allowing merge
ci_success:
name: '✅ CI Success'
needs: [build, lint, test, compile-integration-tests, extended-tests, test-doc-imports, test-pydantic]
needs: [build, lint, test, compile-integration-tests, extended-tests, test-doc-imports, test-pydantic, codspeed]
if: |
always()
runs-on: ubuntu-latest


@@ -1,3 +1,6 @@
# For integrations, we run check_templates.py to ensure that new docs use the correct
# templates based on their type. See the script for more details.
name: '📑 Integration Docs Lint'
on:


@@ -1,66 +0,0 @@
name: '⚡ CodSpeed'
on:
push:
branches:
- master
pull_request:
workflow_dispatch:
permissions:
contents: read
env:
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: foo
AZURE_OPENAI_LEGACY_CHAT_DEPLOYMENT_NAME: foo
DEEPSEEK_API_KEY: foo
FIREWORKS_API_KEY: foo
jobs:
codspeed:
name: 'Benchmark'
runs-on: ubuntu-latest
if: ${{ !contains(github.event.pull_request.labels.*.name, 'codspeed-ignore') }}
strategy:
matrix:
include:
- working-directory: libs/core
mode: walltime
- working-directory: libs/partners/openai
- working-directory: libs/partners/anthropic
- working-directory: libs/partners/deepseek
- working-directory: libs/partners/fireworks
- working-directory: libs/partners/xai
- working-directory: libs/partners/mistralai
- working-directory: libs/partners/groq
fail-fast: false
steps:
- uses: actions/checkout@v5
# We have to use 3.12 as 3.13 is not yet supported
- name: '📦 Install UV Package Manager'
uses: astral-sh/setup-uv@v6
with:
python-version: "3.12"
- uses: actions/setup-python@v6
with:
python-version: "3.12"
- name: '📦 Install Test Dependencies'
run: uv sync --group test
working-directory: ${{ matrix.working-directory }}
- name: '⚡ Run Benchmarks: ${{ matrix.working-directory }}'
uses: CodSpeedHQ/action@v3
with:
token: ${{ secrets.CODSPEED_TOKEN }}
run: |
cd ${{ matrix.working-directory }}
if [ "${{ matrix.working-directory }}" = "libs/core" ]; then
uv run --no-sync pytest ./tests/benchmarks --codspeed
else
uv run --no-sync pytest ./tests/ --codspeed
fi
mode: ${{ matrix.mode || 'instrumentation' }}


@@ -1,10 +0,0 @@
import toml
pyproject_toml = toml.load("pyproject.toml")
# Extract the ignore words list (adjust the key as per your TOML structure)
ignore_words_list = (
pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
)
print(f"::set-output name=ignore_words_list::{ignore_words_list}")


@@ -1,9 +1,11 @@
# Updates the LangChain People data by fetching the latest info from the LangChain Git.
# TODO: broken/not used
name: '👥 LangChain People'
run-name: 'Update People Data'
# This workflow updates the LangChain People data by fetching the latest information from the LangChain Git
on:
schedule:
- cron: "0 14 1 * *"
- cron: "0 14 1 * *" # Runs at 14:00 UTC on the 1st of every month (10AM EDT/7AM PDT)
push:
branches: [jacob/people]
workflow_dispatch:

28
.github/workflows/pr_labeler_file.yml vendored Normal file

@@ -0,0 +1,28 @@
# Label PRs based on changed files.
#
# See `.github/pr-file-labeler.yml` to see rules for each label/directory.
name: "🏷️ Pull Request Labeler"
on:
# Safe since we're not checking out or running the PR's code
# Never check out the PR's head in a pull_request_target job
pull_request_target:
types: [opened, synchronize, reopened, edited]
jobs:
labeler:
name: 'label'
permissions:
contents: read
pull-requests: write
issues: write
runs-on: ubuntu-latest
steps:
- name: Label Pull Request
uses: actions/labeler@v6
with:
repo-token: "${{ secrets.GITHUB_TOKEN }}"
configuration-path: .github/pr-file-labeler.yml
sync-labels: false

28
.github/workflows/pr_labeler_title.yml vendored Normal file

@@ -0,0 +1,28 @@
# Label PRs based on their titles.
#
# See `.github/pr-title-labeler.yml` to see rules for each label/title pattern.
name: "🏷️ PR Title Labeler"
on:
# Safe since we're not checking out or running the PR's code
# Never check out the PR's head in a pull_request_target job
pull_request_target:
types: [opened, synchronize, reopened, edited]
jobs:
pr-title-labeler:
name: 'label'
permissions:
contents: read
pull-requests: write
issues: write
runs-on: ubuntu-latest
steps:
- name: Label PR based on title
# Archived repo; latest commit (v0.1.0)
uses: grafana/pr-labeler-action@f19222d3ef883d2ca5f04420fdfe8148003763f0
with:
token: ${{ secrets.GITHUB_TOKEN }}
configuration-path: .github/pr-title-labeler.yml


@@ -1,50 +1,43 @@
# -----------------------------------------------------------------------------
# PR Title Lint Workflow
# PR title linting.
#
# Purpose:
# Enforces Conventional Commits format for pull request titles to maintain a
# clear, consistent, and machine-readable change history across our repository.
# This helps with automated changelog generation and semantic versioning.
# FORMAT (Conventional Commits 1.0.0):
#
# Enforced Commit Message Format (Conventional Commits 1.0.0):
# <type>[optional scope]: <description>
# [optional body]
# [optional footer(s)]
#
# Examples:
# feat(core): add multitenant support
# fix(cli): resolve flag parsing error
# docs: update API usage examples
# docs(openai): update API usage examples
#
# Allowed Types:
# feat — a new feature (MINOR bump)
# fix — a bug fix (PATCH bump)
# docs — documentation only changes
# style — formatting, missing semi-colons, etc.; no code change
# refactor — code change that neither fixes a bug nor adds a feature
# perf — code change that improves performance
# test — adding missing tests or correcting existing tests
# build — changes that affect the build system or external dependencies
# ci — continuous integration/configuration changes
# chore — other changes that don't modify src or test files
# revert — reverts a previous commit
# release — prepare a new release
# * feat — a new feature (MINOR)
# * fix — a bug fix (PATCH)
# * docs — documentation only changes (either in /docs or code comments)
# * style — formatting, linting, etc.; no code change or typing refactors
# * refactor — code change that neither fixes a bug nor adds a feature
# * perf — code change that improves performance
# * test — adding tests or correcting existing
# * build — changes that affect the build system/external dependencies
# * ci — continuous integration/configuration changes
# * chore — other changes that don't modify source or test files
# * revert — reverts a previous commit
# * release — prepare a new release
#
# Allowed Scopes (optional):
# core, cli, langchain, standard-tests, docs, anthropic, chroma, deepseek,
# exa, fireworks, groq, huggingface, mistralai, nomic, ollama, openai,
# perplexity, prompty, qdrant, xai
# core, cli, langchain, langchain_v1, langchain_legacy, standard-tests,
# text-splitters, docs, anthropic, chroma, deepseek, exa, fireworks, groq,
# huggingface, mistralai, nomic, ollama, openai, perplexity, prompty, qdrant,
# xai, infra
#
# Rules & Tips for New Committers:
# 1. Subject (type) must start with a lowercase letter and, if possible, be
# followed by a scope wrapped in parenthesis `(scope)`
# 2. Breaking changes:
# Append "!" after type/scope (e.g., feat!: drop Node 12 support)
# Or include a footer "BREAKING CHANGE: <details>"
# 3. Example PR titles:
# feat(core): add multitenant support
# fix(cli): resolve flag parsing error
# docs: update API usage examples
# docs(openai): update API usage examples
# Rules:
# 1. The 'Type' must start with a lowercase letter.
# 2. Breaking changes: append "!" after type/scope (e.g., feat!: drop x support)
#
# Resources:
# • Conventional Commits spec: https://www.conventionalcommits.org/en/v1.0.0/
# -----------------------------------------------------------------------------
# Enforces Conventional Commits format for pull request titles to maintain a clear and
# machine-readable change history.
name: '🏷️ PR Title Lint'
@@ -56,9 +49,9 @@ on:
types: [opened, edited, synchronize]
jobs:
# Validates that PR title follows Conventional Commits specification
# Validates that PR title follows Conventional Commits 1.0.0 specification
lint-pr-title:
name: 'Validate PR Title Format'
name: 'validate format'
runs-on: ubuntu-latest
steps:
- name: '✅ Validate Conventional Commits Format'
@@ -84,6 +77,7 @@ jobs:
cli
langchain
langchain_v1
langchain_legacy
standard-tests
text-splitters
docs


@@ -1,3 +1,5 @@
# Integration tests for documentation notebooks.
name: '📓 Validate Documentation Notebooks'
run-name: 'Test notebooks in ${{ inputs.working-directory }}'
on:
@@ -32,6 +34,8 @@ jobs:
uses: "./.github/actions/uv_setup"
with:
python-version: ${{ github.event.inputs.python_version || '3.11' }}
cache-suffix: run-notebooks-${{ github.event.inputs.working-directory || 'all' }}
working-directory: ${{ github.event.inputs.working-directory || '**' }}
- name: '🔐 Authenticate to Google Cloud'
id: 'auth'


@@ -1,8 +1,14 @@
# Routine integration tests against partner libraries with live API credentials.
#
# Uses `make integration_tests` for each library in the matrix.
#
# Runs daily. Can also be triggered manually for immediate updates.
name: '⏰ Scheduled Integration Tests'
run-name: "Run Integration Tests - ${{ inputs.working-directory-force || 'all libs' }} (Python ${{ inputs.python-version-force || '3.9, 3.11' }})"
on:
workflow_dispatch: # Allows maintainers to trigger the workflow manually in GitHub UI
workflow_dispatch:
inputs:
working-directory-force:
type: string
@@ -54,13 +60,13 @@ jobs:
echo $matrix
echo "matrix=$matrix" >> $GITHUB_OUTPUT
# Run integration tests against partner libraries with live API credentials
# Tests are run with both Poetry and UV depending on the library's setup
# Tests are run with Poetry or UV depending on the library's setup
build:
if: github.repository_owner == 'langchain-ai' || github.event_name != 'schedule'
name: '🐍 Python ${{ matrix.python-version }}: ${{ matrix.working-directory }}'
runs-on: ubuntu-latest
needs: [compute-matrix]
timeout-minutes: 20
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
@@ -161,7 +167,7 @@ jobs:
make integration_tests
- name: '🧹 Clean up External Libraries'
# Clean up external libraries to avoid affecting git status check
# Clean up external libraries to avoid affecting the following git status check
run: |
rm -rf \
langchain/libs/partners/google-genai \

9
.github/workflows/v1_changes.md vendored Normal file

@@ -0,0 +1,9 @@
With the deprecation of v0 docs, the following files will need to be migrated/supported
in the new docs repo:
- run_notebooks.yml: New repo should run Integration tests on code snippets?
- people.yml: Need to fix and somehow display on the new docs site
- Subsequently, `.github/actions/people/`
- _test_doc_imports.yml
- check_new_docs.yml
- check-broken-links.yml


@@ -2,110 +2,104 @@ repos:
- repo: local
hooks:
- id: core
name: format core
name: format and lint core
language: system
entry: make -C libs/core format
entry: make -C libs/core format lint
files: ^libs/core/
pass_filenames: false
- id: langchain
name: format langchain
name: format and lint langchain
language: system
entry: make -C libs/langchain format
entry: make -C libs/langchain format lint
files: ^libs/langchain/
pass_filenames: false
- id: standard-tests
name: format standard-tests
name: format and lint standard-tests
language: system
entry: make -C libs/standard-tests format
entry: make -C libs/standard-tests format lint
files: ^libs/standard-tests/
pass_filenames: false
- id: text-splitters
name: format text-splitters
name: format and lint text-splitters
language: system
entry: make -C libs/text-splitters format
entry: make -C libs/text-splitters format lint
files: ^libs/text-splitters/
pass_filenames: false
- id: anthropic
name: format partners/anthropic
name: format and lint partners/anthropic
language: system
entry: make -C libs/partners/anthropic format
entry: make -C libs/partners/anthropic format lint
files: ^libs/partners/anthropic/
pass_filenames: false
- id: chroma
name: format partners/chroma
name: format and lint partners/chroma
language: system
entry: make -C libs/partners/chroma format
entry: make -C libs/partners/chroma format lint
files: ^libs/partners/chroma/
pass_filenames: false
- id: couchbase
name: format partners/couchbase
language: system
entry: make -C libs/partners/couchbase format
files: ^libs/partners/couchbase/
pass_filenames: false
- id: exa
name: format partners/exa
name: format and lint partners/exa
language: system
entry: make -C libs/partners/exa format
entry: make -C libs/partners/exa format lint
files: ^libs/partners/exa/
pass_filenames: false
- id: fireworks
name: format partners/fireworks
name: format and lint partners/fireworks
language: system
entry: make -C libs/partners/fireworks format
entry: make -C libs/partners/fireworks format lint
files: ^libs/partners/fireworks/
pass_filenames: false
- id: groq
name: format partners/groq
name: format and lint partners/groq
language: system
entry: make -C libs/partners/groq format
entry: make -C libs/partners/groq format lint
files: ^libs/partners/groq/
pass_filenames: false
- id: huggingface
name: format partners/huggingface
name: format and lint partners/huggingface
language: system
entry: make -C libs/partners/huggingface format
entry: make -C libs/partners/huggingface format lint
files: ^libs/partners/huggingface/
pass_filenames: false
- id: mistralai
name: format partners/mistralai
name: format and lint partners/mistralai
language: system
entry: make -C libs/partners/mistralai format
entry: make -C libs/partners/mistralai format lint
files: ^libs/partners/mistralai/
pass_filenames: false
- id: nomic
name: format partners/nomic
name: format and lint partners/nomic
language: system
entry: make -C libs/partners/nomic format
entry: make -C libs/partners/nomic format lint
files: ^libs/partners/nomic/
pass_filenames: false
- id: ollama
name: format partners/ollama
name: format and lint partners/ollama
language: system
entry: make -C libs/partners/ollama format
entry: make -C libs/partners/ollama format lint
files: ^libs/partners/ollama/
pass_filenames: false
- id: openai
name: format partners/openai
name: format and lint partners/openai
language: system
entry: make -C libs/partners/openai format
entry: make -C libs/partners/openai format lint
files: ^libs/partners/openai/
pass_filenames: false
- id: prompty
name: format partners/prompty
name: format and lint partners/prompty
language: system
entry: make -C libs/partners/prompty format
entry: make -C libs/partners/prompty format lint
files: ^libs/partners/prompty/
pass_filenames: false
- id: qdrant
name: format partners/qdrant
name: format and lint partners/qdrant
language: system
entry: make -C libs/partners/qdrant format
entry: make -C libs/partners/qdrant format lint
files: ^libs/partners/qdrant/
pass_filenames: false
- id: root
name: format docs, cookbook
name: format and lint docs, cookbook
language: system
entry: make format
entry: make format lint
files: ^(docs|cookbook)/
pass_filenames: false


@@ -1,25 +0,0 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
version: 2
# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.11"
commands:
- mkdir -p $READTHEDOCS_OUTPUT
- cp -r api_reference_build/* $READTHEDOCS_OUTPUT
# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/api_reference/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats:
- pdf
# Optionally declare the Python requirements required to build your docs
python:
install:
- requirements: docs/api_reference/requirements.txt


@@ -78,5 +78,10 @@
"editor.insertSpaces": true
},
"python.terminal.activateEnvironment": false,
"python.defaultInterpreterPath": "./.venv/bin/python"
"python.defaultInterpreterPath": "./.venv/bin/python",
"github.copilot.chat.commitMessageGeneration.instructions": [
{
"file": ".github/workflows/pr_lint.yml"
}
]
}

325
AGENTS.md Normal file

@@ -0,0 +1,325 @@
# Global Development Guidelines for LangChain Projects
## Core Development Principles
### 1. Maintain Stable Public Interfaces ⚠️ CRITICAL
**Always attempt to preserve function signatures, argument positions, and names for exported/public methods.**
**Bad - Breaking Change:**
```python
def get_user(id, verbose=False):  # Changed from `user_id`
    pass
```
**Good - Stable Interface:**
```python
def get_user(user_id: str, verbose: bool = False) -> User:
    """Retrieve user by ID with optional verbose output."""
    pass
```
**Before making ANY changes to public APIs:**
- Check if the function/class is exported in `__init__.py`
- Look for existing usage patterns in tests and examples
- Use keyword-only arguments for new parameters: `*, new_param: str = "default"`
- Mark experimental features clearly with docstring warnings (using reStructuredText, like `.. warning::`)
🧠 *Ask yourself:* "Would this change break someone's code if they used it last week?"
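As an illustration of the keyword-only and docstring-warning advice above, here is a minimal sketch. The `User` dataclass, the `_USERS` store, and the `include_inactive` parameter are hypothetical and exist only to make the example self-contained; they are not part of any LangChain API.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user record, defined only for this sketch."""

    user_id: str
    name: str
    active: bool = True


# Hypothetical in-memory store standing in for a real lookup.
_USERS = {
    "u1": User("u1", "Alice", active=True),
    "u2": User("u2", "Bob", active=False),
}


def get_user(
    user_id: str,
    verbose: bool = False,
    *,
    include_inactive: bool = False,  # hypothetical new, keyword-only parameter
) -> User:
    """Retrieve user by ID with optional verbose output.

    .. warning::
        ``include_inactive`` is experimental and may change.

    Args:
        user_id: Identifier of the user to retrieve.
        verbose: Whether to print the user that was found.
        include_inactive: Whether inactive users may be returned.

    Returns:
        The matching user.

    Raises:
        KeyError: If no matching (active) user exists.
    """
    user = _USERS.get(user_id)
    if user is None or (not user.active and not include_inactive):
        msg = f"Unknown or inactive user: {user_id!r}"
        raise KeyError(msg)
    if verbose:
        print(f"Found user {user.name}")
    return user
```

Because the new parameter sits after the bare `*`, existing positional calls such as `get_user("u1", True)` keep working, while the new option has to be spelled out as `get_user("u2", include_inactive=True)`.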
### 2. Code Quality Standards
**All Python code MUST include type hints and return types.**
**Bad:**
```python
def p(u, d):
    return [x for x in u if x not in d]
```
**Good:**
```python
def filter_unknown_users(users: list[str], known_users: set[str]) -> list[str]:
    """Return the users that are not in the known users set.

    Args:
        users: List of user identifiers to filter.
        known_users: Set of known/valid user identifiers.

    Returns:
        List of users that are not in the known_users set.
    """
    return [user for user in users if user not in known_users]
```
**Style Requirements:**
- Use descriptive, **self-explanatory variable names**. Avoid overly short or cryptic identifiers.
- Attempt to break up complex functions (>20 lines) into smaller, focused functions where it makes sense
- Avoid unnecessary abstraction or premature optimization
- Follow existing patterns in the codebase you're modifying
### 3. Testing Requirements
**Every new feature or bugfix MUST be covered by unit tests.**
**Test Organization:**
- Unit tests: `tests/unit_tests/` (no network calls allowed)
- Integration tests: `tests/integration_tests/` (network calls permitted)
- Use `pytest` as the testing framework
**Test Quality Checklist:**
- [ ] Tests fail when your new logic is broken
- [ ] Happy path is covered
- [ ] Edge cases and error conditions are tested
- [ ] Use fixtures/mocks for external dependencies
- [ ] Tests are deterministic (no flaky tests)
Checklist questions:
- [ ] Does the test suite fail if your new logic is broken?
- [ ] Are all expected behaviors exercised (happy path, invalid input, etc)?
- [ ] Do tests use fixtures or mocks where needed?
```python
def test_filter_unknown_users():
    """Test filtering unknown users from a list."""
    users = ["alice", "bob", "charlie"]
    known_users = {"alice", "bob"}
    result = filter_unknown_users(users, known_users)
    assert result == ["charlie"]
    assert len(result) == 1
```
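For the "use fixtures/mocks for external dependencies" item, a minimal sketch using `unittest.mock` might look like this. The `fetch_display_name` helper and the `get_profile` client method are hypothetical names invented for the example; they are not part of any LangChain package.

```python
from typing import Any
from unittest.mock import Mock


def fetch_display_name(client: Any, user_id: str) -> str:
    """Hypothetical helper that depends on an external profile service."""
    profile = client.get_profile(user_id)
    return profile["name"].title()


def test_fetch_display_name_uses_client() -> None:
    """The unit test replaces the network client with a mock."""
    client = Mock()
    client.get_profile.return_value = {"name": "alice"}

    assert fetch_display_name(client, "u-1") == "Alice"
    # Verify the dependency was called exactly once with the expected argument.
    client.get_profile.assert_called_once_with("u-1")
```

The test stays deterministic and needs no network access, so it fits under `tests/unit_tests/`.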
### 4. Security and Risk Assessment
**Security Checklist:**
- No `eval()`, `exec()`, or `pickle` on user-controlled input
- Proper exception handling (no bare `except:`) and use a `msg` variable for error messages
- Remove unreachable/commented code before committing
- Check for race conditions and resource leaks (file handles, sockets, threads)
- Ensure proper resource cleanup (file handles, connections)
**Bad:**
```python
def load_config(path):
    with open(path) as f:
        return eval(f.read())  # ⚠️ Never eval config
```
**Good:**
```python
import json
def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```
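The exception-handling item in the checklist (no bare `except:`, error text assigned to a `msg` variable) could be applied to the same loader roughly as follows. `ConfigError` is a hypothetical exception type introduced only for this sketch.

```python
import json
from pathlib import Path


class ConfigError(Exception):
    """Hypothetical error type used only in this sketch."""


def load_config(path: str) -> dict:
    """Load a JSON config file, reporting failures with explicit messages."""
    try:
        with Path(path).open(encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError as e:
        # Catch the specific failure and build the message in `msg` first.
        msg = f"Config file not found: {path}"
        raise ConfigError(msg) from e
    except json.JSONDecodeError as e:
        msg = f"Config file is not valid JSON: {path}"
        raise ConfigError(msg) from e
```

The `with` block also covers the resource-cleanup item: the file handle is closed even when parsing fails.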
### 5. Documentation Standards
**Use Google-style docstrings with Args section for all public functions.**
**Insufficient Documentation:**
```python
def send_email(to, msg):
    """Send an email to a recipient."""
```
**Complete Documentation:**
```python
def send_email(to: str, msg: str, *, priority: str = "normal") -> bool:
    """
    Send an email to a recipient with specified priority.

    Args:
        to: The email address of the recipient.
        msg: The message body to send.
        priority: Email priority level (``'low'``, ``'normal'``, ``'high'``).

    Returns:
        True if email was sent successfully, False otherwise.

    Raises:
        InvalidEmailError: If the email address format is invalid.
        SMTPConnectionError: If unable to connect to email server.
    """
```
**Documentation Guidelines:**
- Types go in function signatures, NOT in docstrings
- Focus on "why" rather than "what" in descriptions
- Document all parameters, return values, and exceptions
- Keep descriptions concise but clear
- Use reStructuredText for docstrings to enable rich formatting
📌 *Tip:* Keep descriptions concise but clear. Only document return values if non-obvious.
### 6. Architectural Improvements
**When you encounter code that could be improved, suggest better designs:**
**Poor Design:**
```python
def process_data(data, db_conn, email_client, logger):
    # Function doing too many things
    validated = validate_data(data)
    result = db_conn.save(validated)
    email_client.send_notification(result)
    logger.log(f"Processed {len(data)} items")
    return result
```
**Better Design:**
```python
from dataclasses import dataclass, field


@dataclass
class ProcessingResult:
    """Result of data processing operation."""

    items_processed: int
    success: bool
    errors: list[str] = field(default_factory=list)


class DataProcessor:
    """Handles data validation, storage, and notification."""

    def __init__(self, db_conn: Database, email_client: EmailClient) -> None:
        self.db = db_conn
        self.email = email_client

    def process(self, data: list[dict]) -> ProcessingResult:
        """Process and store data with notifications."""
        validated = self._validate_data(data)
        result = self.db.save(validated)
        self._notify_completion(result)
        return result
```
**Design Improvement Areas:**
If there's a **cleaner**, **more scalable**, or **simpler** design, highlight it and suggest improvements that would:
- Reduce code duplication through shared utilities
- Improve separation of concerns (single responsibility)
- Make unit testing easier through dependency injection
- Add clarity without adding complexity
- Prefer dataclasses for structured data
## Development Tools & Commands
### Package Management
```bash
# Add package
uv add package-name
# Sync project dependencies
uv sync
uv lock
```
### Testing
```bash
# Run unit tests (no network)
make test
# Don't run integration tests, as API keys must be set
# Run specific test file
uv run --group test pytest tests/unit_tests/test_specific.py
```
### Code Quality
```bash
# Lint code
make lint
# Format code
make format
# Type checking
uv run --group lint mypy .
```
### Dependency Management Patterns
**Local Development Dependencies:**
```toml
[tool.uv.sources]
langchain-core = { path = "../core", editable = true }
langchain-tests = { path = "../standard-tests", editable = true }
```
**For tools, use the `@tool` decorator from `langchain_core.tools`:**
```python
from langchain_core.tools import tool


@tool
def search_database(query: str) -> str:
    """Search the database for relevant information.

    Args:
        query: The search query string.
    """
    # Implementation here; placeholder result so the example runs
    results = f"No results found for: {query}"
    return results
```
## Commit Standards
**Use Conventional Commits format for PR titles:**
- `feat(core): add multi-tenant support`
- `fix(cli): resolve flag parsing error`
- `docs: update API usage examples`
- `docs(openai): update API usage examples`
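As a rough illustration of the title format, the sketch below parses a PR title into type, optional scope, and description. The list of allowed types is an assumption drawn from the examples above and the commit history shown in this comparison; the repository's actual lint rules are not shown here and may differ.

```python
import re
from typing import Optional

# Assumed type list; the real lint configuration may allow other types.
ALLOWED_TYPES = ("feat", "fix", "docs", "chore", "release")

_TITLE_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9_-]+)\))?"  # Optional scope, e.g. "(core)"
    r": (?P<description>\S.*)$"
)


def parse_pr_title(title: str) -> Optional[dict[str, Optional[str]]]:
    """Return the parsed parts of a title, or None if it does not match."""
    match = _TITLE_PATTERN.match(title)
    if match is None:
        return None
    return match.groupdict()


scoped = parse_pr_title("feat(core): add multi-tenant support")
unscoped = parse_pr_title("docs: update API usage examples")
invalid = parse_pr_title("Update the docs")
```

Here `scoped["scope"]` is `"core"`, `unscoped["scope"]` is `None`, and `invalid` is `None` because the title has no `type:` prefix.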
## Framework-Specific Guidelines
- Follow the existing patterns in `langchain-core` for base abstractions
- Use `langchain_core.callbacks` for execution tracking
- Implement proper streaming support where applicable
- Avoid deprecated components like legacy `LLMChain`
### Partner Integrations
- Follow the established patterns in existing partner libraries
- Implement standard interfaces (`BaseChatModel`, `BaseEmbeddings`, etc.)
- Include comprehensive integration tests
- Document API key requirements and authentication
---
## Quick Reference Checklist
Before submitting code changes:
- [ ] **Breaking Changes**: Verified no public API changes
- [ ] **Type Hints**: All functions have complete type annotations
- [ ] **Tests**: New functionality is fully tested
- [ ] **Security**: No dangerous patterns (eval, silent failures, etc.)
- [ ] **Documentation**: Google-style docstrings for public functions
- [ ] **Code Quality**: `make lint` and `make format` pass
- [ ] **Architecture**: Suggested improvements where applicable
- [ ] **Commit Message**: Follows Conventional Commits format


@@ -1,4 +1,4 @@
.PHONY: all clean help docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck spell_check spell_fix lint lint_package lint_tests format format_diff
.PHONY: all clean help docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck lint lint_package lint_tests format format_diff
.EXPORT_ALL_VARIABLES:
UV_FROZEN = true
@@ -78,18 +78,6 @@ api_docs_linkcheck:
fi
@echo "✅ API link check complete"
## spell_check: Run codespell on the project.
spell_check:
@echo "✏️ Checking spelling across project..."
uv run --group codespell codespell --toml pyproject.toml
@echo "✅ Spell check complete"
## spell_fix: Run codespell on the project and fix the errors.
spell_fix:
@echo "✏️ Fixing spelling errors across project..."
uv run --group codespell codespell --toml pyproject.toml -w
@echo "✅ Spelling errors fixed"
######################
# LINTING AND FORMATTING
######################
@@ -100,7 +88,7 @@ lint lint_package lint_tests:
uv run --group lint ruff check docs cookbook
uv run --group lint ruff format docs cookbook cookbook --diff
git --no-pager grep 'from langchain import' docs cookbook | grep -vE 'from langchain import (hub)' && echo "Error: no importing langchain from root in docs, except for hub" && exit 1 || exit 0
git --no-pager grep 'api.python.langchain.com' -- docs/docs ':!docs/docs/additional_resources/arxiv_references.mdx' ':!docs/docs/integrations/document_loaders/sitemap.ipynb' || exit 0 && \
echo "Error: you should link python.langchain.com/api_reference, not api.python.langchain.com in the docs" && \
exit 1

README.md

@@ -1,83 +1,75 @@
<picture>
<source media="(prefers-color-scheme: light)" srcset="docs/static/img/logo-dark.svg">
<source media="(prefers-color-scheme: dark)" srcset="docs/static/img/logo-light.svg">
<img alt="LangChain Logo" src="docs/static/img/logo-dark.svg" width="80%">
</picture>
<p align="center">
<picture>
<source media="(prefers-color-scheme: light)" srcset="docs/static/img/logo-dark.svg">
<source media="(prefers-color-scheme: dark)" srcset="docs/static/img/logo-light.svg">
<img alt="LangChain Logo" src="docs/static/img/logo-dark.svg" width="80%">
</picture>
</p>
<div>
<br>
</div>
<p align="center">
The platform for reliable agents.
</p>
[![PyPI - License](https://img.shields.io/pypi/l/langchain-core?style=flat-square)](https://opensource.org/licenses/MIT)
[![PyPI - Downloads](https://img.shields.io/pepy/dt/langchain)](https://pypistats.org/packages/langchain-core)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode&style=flat-square)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[<img src="https://github.com/codespaces/badge.svg" alt="Open in Github Codespace" title="Open in Github Codespace" width="150" height="20">](https://codespaces.new/langchain-ai/langchain)
[![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/langchain-ai/langchain)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
<p align="center">
<a href="https://opensource.org/licenses/MIT" target="_blank">
<img src="https://img.shields.io/pypi/l/langchain-core?style=flat-square" alt="PyPI - License">
</a>
<a href="https://pypistats.org/packages/langchain-core" target="_blank">
<img src="https://img.shields.io/pepy/dt/langchain" alt="PyPI - Downloads">
</a>
<a href="https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain" target="_blank">
<img src="https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode&style=flat-square" alt="Open in Dev Containers">
</a>
<a href="https://codespaces.new/langchain-ai/langchain" target="_blank">
<img src="https://github.com/codespaces/badge.svg" alt="Open in Github Codespace" title="Open in Github Codespace" width="150" height="20">
</a>
<a href="https://codspeed.io/langchain-ai/langchain" target="_blank">
<img src="https://img.shields.io/endpoint?url=https://codspeed.io/badge.json" alt="CodSpeed Badge">
</a>
<a href="https://twitter.com/langchainai" target="_blank">
<img src="https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI" alt="Twitter / X">
</a>
</p>
> [!NOTE]
> Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
LangChain is a framework for building LLM-powered applications. It helps you chain
together interoperable components and third-party integrations to simplify AI
application development — all while future-proofing decisions as the underlying
technology evolves.
LangChain is a framework for building LLM-powered applications. It helps you chain together interoperable components and third-party integrations to simplify AI application development — all while future-proofing decisions as the underlying technology evolves.
```bash
pip install -U langchain
```
To learn more about LangChain, check out
[the docs](https://python.langchain.com/docs/introduction/). If you're looking for more
advanced customization or agent orchestration, check out
[LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building
controllable agent workflows.
---
**Documentation**: To learn more about LangChain, check out [the docs](https://python.langchain.com/docs/introduction/).
If you're looking for more advanced customization or agent orchestration, check out [LangGraph](https://langchain-ai.github.io/langgraph/), our framework for building controllable agent workflows.
> [!NOTE]
> Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
## Why use LangChain?
LangChain helps developers build applications powered by LLMs through a standard
interface for models, embeddings, vector stores, and more.
LangChain helps developers build applications powered by LLMs through a standard interface for models, embeddings, vector stores, and more.
Use LangChain for:
- **Real-time data augmentation**. Easily connect LLMs to diverse data sources and
external/internal systems, drawing from LangChain's vast library of integrations with
model providers, tools, vector stores, retrievers, and more.
- **Model interoperability**. Swap models in and out as your engineering team
experiments to find the best choice for your application's needs. As the industry
frontier evolves, adapt quickly — LangChain's abstractions keep you moving without
losing momentum.
- **Real-time data augmentation**. Easily connect LLMs to diverse data sources and external/internal systems, drawing from LangChain's vast library of integrations with model providers, tools, vector stores, retrievers, and more.
- **Model interoperability**. Swap models in and out as your engineering team experiments to find the best choice for your application's needs. As the industry frontier evolves, adapt quickly — LangChain's abstractions keep you moving without losing momentum.
## LangChain's ecosystem
While the LangChain framework can be used standalone, it also integrates seamlessly
with any LangChain product, giving developers a full suite of tools when building LLM
applications.
While the LangChain framework can be used standalone, it also integrates seamlessly with any LangChain product, giving developers a full suite of tools when building LLM applications.
To improve your LLM application development, pair LangChain with:
- [LangSmith](https://www.langchain.com/langsmith) - Helpful for agent evals and
observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain
visibility in production, and improve performance over time.
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can
reliably handle complex tasks with LangGraph, our low-level agent orchestration
framework. LangGraph offers customizable architecture, long-term memory, and
human-in-the-loop workflows — and is trusted in production by companies like LinkedIn,
Uber, Klarna, and GitLab.
- [LangGraph Platform](https://docs.langchain.com/langgraph-platform) - Deploy
and scale agents effortlessly with a purpose-built deployment platform for long-running, stateful workflows. Discover, reuse, configure, and share agents across
teams — and iterate quickly with visual prototyping in
[LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/).
- [LangSmith](https://www.langchain.com/langsmith) - Helpful for agent evals and observability. Debug poor-performing LLM app runs, evaluate agent trajectories, gain visibility in production, and improve performance over time.
- [LangGraph](https://langchain-ai.github.io/langgraph/) - Build agents that can reliably handle complex tasks with LangGraph, our low-level agent orchestration framework. LangGraph offers customizable architecture, long-term memory, and human-in-the-loop workflows — and is trusted in production by companies like LinkedIn, Uber, Klarna, and GitLab.
- [LangGraph Platform](https://docs.langchain.com/langgraph-platform) - Deploy and scale agents effortlessly with a purpose-built deployment platform for long-running, stateful workflows. Discover, reuse, configure, and share agents across teams — and iterate quickly with visual prototyping in [LangGraph Studio](https://langchain-ai.github.io/langgraph/concepts/langgraph_studio/).
## Additional resources
- [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with
guided examples on getting started with LangChain.
- [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code
snippets for topics such as tool calling, RAG use cases, and more.
- [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key
concepts behind the LangChain framework.
- [Tutorials](https://python.langchain.com/docs/tutorials/): Simple walkthroughs with guided examples on getting started with LangChain.
- [How-to Guides](https://python.langchain.com/docs/how_to/): Quick, actionable code snippets for topics such as tool calling, RAG use cases, and more.
- [Conceptual Guides](https://python.langchain.com/docs/concepts/): Explanations of key concepts behind the LangChain framework.
- [LangChain Forum](https://forum.langchain.com/): Connect with the community and share all of your technical questions, ideas, and feedback.
- [API Reference](https://python.langchain.com/api_reference/): Detailed reference on
navigating base packages and integrations for LangChain.
- [API Reference](https://python.langchain.com/api_reference/): Detailed reference on navigating base packages and integrations for LangChain.
- [Chat LangChain](https://chat.langchain.com/): Ask questions & chat with our documentation.


@@ -22,9 +22,7 @@ Example scenarios with mitigation strategies:
* A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such misuse.
* A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
If you're building applications that access external resources like file systems, APIs
or databases, consider speaking with your company's security team to determine how to best
design and secure your applications.
If you're building applications that access external resources like file systems, APIs or databases, consider speaking with your company's security team to determine how to best design and secure your applications.
## Reporting OSS Vulnerabilities
@@ -37,10 +35,8 @@ open source projects at [huntr](https://huntr.com/bounties/disclose/?target=http
Before reporting a vulnerability, please review:
1) In-Scope Targets and Out-of-Scope Targets below.
2) The [langchain-ai/langchain](https://python.langchain.com/docs/contributing/repo_structure) monorepo structure.
3) The [Best Practices](#best-practices) above to
understand what we consider to be a security vulnerability vs. developer
responsibility.
2) The [langchain-ai/langchain](https://docs.langchain.com/oss/python/contributing/code#supporting-packages) monorepo structure.
3) The [Best Practices](#best-practices) above to understand what we consider to be a security vulnerability vs. developer responsibility.
### In-Scope Targets


@@ -47,10 +47,12 @@
"source": [
"### Prerequisites\n",
"\n",
"Please install the Oracle Database [python-oracledb driver](https://pypi.org/project/oracledb/) to use LangChain with Oracle AI Vector Search:\n",
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb.\n",
"\n",
"```\n",
"$ python -m pip install --upgrade oracledb\n",
"$ python -m pip install -U langchain-oracledb\n",
"```"
]
},
@@ -217,7 +219,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_oracledb.embeddings.oracleai import OracleEmbeddings\n",
"\n",
"# please update with your related information\n",
"# make sure that you have onnx file in the system\n",
@@ -296,7 +298,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_oracledb.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_core.documents import Document\n",
"\n",
"# loading from Oracle Database table\n",
@@ -354,7 +356,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_oracledb.utilities.oracleai import OracleSummary\n",
"from langchain_core.documents import Document\n",
"\n",
"# using 'database' provider\n",
@@ -395,7 +397,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_oracledb.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_core.documents import Document\n",
"\n",
"# split by default parameters\n",
@@ -452,7 +454,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_oracledb.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_core.documents import Document\n",
"\n",
"# using ONNX model loaded to Oracle Database\n",
@@ -498,14 +500,14 @@
"import sys\n",
"\n",
"import oracledb\n",
"from langchain_community.document_loaders.oracleai import (\n",
"from langchain_oracledb.document_loaders.oracleai import (\n",
" OracleDocLoader,\n",
" OracleTextSplitter,\n",
")\n",
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_community.vectorstores import oraclevs\n",
"from langchain_community.vectorstores.oraclevs import OracleVS\n",
"from langchain_oracledb.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_oracledb.utilities.oracleai import OracleSummary\n",
"from langchain_oracledb.vectorstores import oraclevs\n",
"from langchain_oracledb.vectorstores.oraclevs import OracleVS\n",
"from langchain_community.vectorstores.utils import DistanceStrategy\n",
"from langchain_core.documents import Document"
]
@@ -677,19 +679,19 @@
"outputs": [],
"source": [
"query = \"What is Oracle AI Vector Store?\"\n",
"filter = {\"document_id\": [\"1\"]}\n",
"db_filter = {\"document_id\": \"1\"}\n",
"\n",
"# Similarity search without a filter\n",
"print(vectorstore.similarity_search(query, 1))\n",
"\n",
"# Similarity search with a filter\n",
"print(vectorstore.similarity_search(query, 1, filter=filter))\n",
"print(vectorstore.similarity_search(query, 1, filter=db_filter))\n",
"\n",
"# Similarity search with relevance score\n",
"print(vectorstore.similarity_search_with_score(query, 1))\n",
"\n",
"# Similarity search with relevance score with filter\n",
"print(vectorstore.similarity_search_with_score(query, 1, filter=filter))\n",
"print(vectorstore.similarity_search_with_score(query, 1, filter=db_filter))\n",
"\n",
"# Max marginal relevance search\n",
"print(vectorstore.max_marginal_relevance_search(query, 1, fetch_k=20, lambda_mult=0.5))\n",
@@ -697,7 +699,7 @@
"# Max marginal relevance search with filter\n",
"print(\n",
" vectorstore.max_marginal_relevance_search(\n",
" query, 1, fetch_k=20, lambda_mult=0.5, filter=filter\n",
" query, 1, fetch_k=20, lambda_mult=0.5, filter=db_filter\n",
" )\n",
")"
]


@@ -1,3 +1,154 @@
# LangChain Documentation
For more information on contributing to our documentation, see the [Documentation Contributing Guide](https://python.langchain.com/docs/contributing/how_to/documentation)
For more information on contributing to our documentation, see the [Documentation Contributing Guide](https://python.langchain.com/docs/contributing/how_to/documentation).
## Structure
The primary documentation is located in the `docs/` directory. This directory contains
both the source files for the main documentation as well as the API reference doc
build process.
### API Reference
API reference documentation is located in `docs/api_reference/` and is generated from
the codebase using Sphinx.
The API reference has additional build steps that differ from the main documentation.
#### Deployment Process
Currently, the build process roughly follows these steps:
1. Using the `api_doc_build.yml` GitHub workflow, the API reference docs are
[built](#build-technical-details) and copied to the `langchain-api-docs-html`
repository. This workflow runs either (1) on a cron schedule or (2) when
triggered manually.
In short, the workflow extracts all `langchain-ai`-org-owned repos defined in
`langchain/libs/packages.yml`, clones them locally (in the workflow runner's file
system), and then builds the API reference RST files (using `create_api_rst.py`).
Following post-processing, the HTML files are pushed to the
`langchain-api-docs-html` repository.
2. After the HTML files are in the `langchain-api-docs-html` repository, they are **not**
automatically published to the [live docs site](https://python.langchain.com/api_reference/).
The docs site is served by Vercel. The Vercel deployment process copies the HTML
files from the `langchain-api-docs-html` repository and deploys them to the live
site. Deployments are triggered on each new commit pushed to `v0.3`.
#### Build Technical Details
The build process creates a virtual monorepo by syncing multiple repositories, then generates comprehensive API documentation:
1. **Repository Sync Phase:**
- `.github/scripts/prep_api_docs_build.py` - Clones external partner repos and organizes them into the `libs/partners/` structure to create a virtual monorepo for documentation building
2. **RST Generation Phase:**
- `docs/api_reference/create_api_rst.py` - Main script that **generates RST files** from Python source code
- Scans `libs/` directories and extracts classes/functions from each module (using `inspect`)
- Creates `.rst` files using specialized templates for different object types
- Templates in `docs/api_reference/templates/` (`pydantic.rst`, `runnable_pydantic.rst`, etc.)
3. **HTML Build Phase:**
- Sphinx-based, uses `sphinx.ext.autodoc` (auto-extracts docstrings from the codebase)
- `docs/api_reference/conf.py` (sphinx config) configures `autodoc` and other extensions
- `sphinx-build` processes the generated `.rst` files into HTML using autodoc
- `docs/api_reference/scripts/custom_formatter.py` - Post-processes the generated HTML
- Copies `reference.html` to `index.html` to create the default landing page (artifact? might not need to do this - just put everything in index directly?)
4. **Deployment:**
- `.github/workflows/api_doc_build.yml` - Workflow responsible for orchestrating the entire build and deployment process
- Built HTML files are committed and pushed to the `langchain-api-docs-html` repository
#### Local Build
For local development and testing of API documentation, use the Makefile targets in the repository root:
```bash
# Full build
make api_docs_build
```
Like the CI process, this target:
- Installs the CLI package in editable mode
- Generates RST files for all packages using `create_api_rst.py`
- Builds HTML documentation with Sphinx
- Post-processes the HTML with `custom_formatter.py`
- Opens the built documentation (`reference.html`) in your browser
**Quick Preview:**
```bash
make api_docs_quick_preview API_PKG=openai
```
- Generates RST files for only the specified package (default: `text-splitters`)
- Builds and post-processes HTML documentation
- Opens the preview in your browser
Both targets automatically clean previous builds and handle the complete build pipeline locally, mirroring the CI process but for faster iteration during development.
#### Documentation Standards
**Docstring Format:**
The API reference uses **Google-style docstrings** with reStructuredText markup. Sphinx processes these through the `sphinx.ext.napoleon` extension to generate documentation.
**Required format:**
```python
def example_function(param1: str, param2: int = 5) -> bool:
    """Brief description of the function.

    Longer description can go here. Use reStructuredText syntax for
    rich formatting like **bold** and *italic*.

    TODO: code: figure out what works?

    Args:
        param1: Description of the first parameter.
        param2: Description of the second parameter with default value.

    Returns:
        Description of the return value.

    Raises:
        ValueError: When param1 is empty.
        TypeError: When param2 is not an integer.

    .. warning::
        This function is experimental and may change.
    """
```
**Special Markers:**
- `:private:` in docstrings excludes members from documentation
- `.. warning::` adds warning admonitions
#### Site Styling and Assets
**Theme and Styling:**
- Uses [**PyData Sphinx Theme**](https://pydata-sphinx-theme.readthedocs.io/en/stable/index.html) (`pydata_sphinx_theme`)
- Custom CSS in `docs/api_reference/_static/css/custom.css` with LangChain-specific:
- Color palette
- Inter font family
- Custom navbar height and sidebar formatting
- Deprecated/beta feature styling
**Static Assets:**
- Logos: `_static/wordmark-api.svg` (light) and `_static/wordmark-api-dark.svg` (dark mode)
- Favicon: `_static/img/brand/favicon.png`
- Custom CSS: `_static/css/custom.css`
**Post-Processing:**
- `scripts/custom_formatter.py` cleans up generated HTML:
- Shortens TOC entries from `ClassName.method()` to `method()`
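The TOC shortening step can be pictured with a small regular-expression sketch. This is not the code in `custom_formatter.py`, whose implementation is not shown here; it only illustrates the `ClassName.method()` to `method()` transformation on plain strings.

```python
import re

# Matches "Name.method()" and captures only the "method()" part.
_QUALIFIED_METHOD = re.compile(r"\b[A-Za-z_]\w*\.([A-Za-z_]\w*\(\))")


def shorten_toc_entry(text: str) -> str:
    """Drop the class prefix from entries such as ``ClassName.method()``."""
    return _QUALIFIED_METHOD.sub(r"\1", text)


shortened = shorten_toc_entry("ClassName.method()")
unchanged = shorten_toc_entry("ClassName")
```

Entries without a `Name.method()` shape, such as a bare class name, pass through untouched.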
**Analytics and Integration:**
- GitHub integration (source links, edit buttons)
- Example backlinking through custom `ExampleLinksDirective`


@@ -50,7 +50,7 @@ class GalleryGridDirective(SphinxDirective):
individual cards + ["image", "header", "content", "title"].
Danger:
This directive can only be used in the context of a Myst documentation page as
This directive can only be used in the context of a MyST documentation page as
the templates use Markdown flavored formatting.
"""


@@ -394,3 +394,21 @@ p {
font-size: 0.9rem;
margin-bottom: 0.5rem;
}
/* Deprecation announcement banner styling */
.bd-header-announcement {
background-color: #790000 !important;
color: white !important;
font-weight: 600;
padding: 0.75rem 1rem;
text-align: center;
}
.bd-header-announcement a {
color: white !important;
text-decoration: underline !important;
}
.bd-header-announcement a:hover {
color: #f0f0f0 !important;
}


@@ -1,7 +1,5 @@
"""Configuration file for the Sphinx documentation builder."""
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
@@ -20,16 +18,18 @@ from docutils.parsers.rst.directives.admonitions import BaseAdmonition
from docutils.statemachine import StringList
from sphinx.util.docutils import SphinxDirective
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# Add paths to Python import system so Sphinx can import LangChain modules
# This allows autodoc to introspect and document the actual code
_DIR = Path(__file__).parent.absolute()
sys.path.insert(0, os.path.abspath("."))
sys.path.insert(0, os.path.abspath("../../libs/langchain"))
sys.path.insert(0, os.path.abspath(".")) # Current directory
sys.path.insert(0, os.path.abspath("../../libs/langchain")) # LangChain main package
# Load package metadata from pyproject.toml (for version info, etc.)
with (_DIR.parents[1] / "libs" / "langchain" / "pyproject.toml").open("r") as f:
data = toml.load(f)
# Load mapping of classes to example notebooks for backlinking
# This file is generated by scripts that scan our tutorial/example notebooks
with (_DIR / "guide_imports.json").open("r") as f:
imported_classes = json.load(f)
@@ -86,6 +86,7 @@ class Beta(BaseAdmonition):
def setup(app):
"""Register custom directives and hooks with Sphinx."""
app.add_directive("example_links", ExampleLinksDirective)
app.add_directive("beta", Beta)
app.connect("autodoc-skip-member", skip_private_members)
@@ -125,7 +126,7 @@ extensions = [
"sphinx.ext.viewcode",
"sphinxcontrib.autodoc_pydantic",
"IPython.sphinxext.ipython_console_highlighting",
"myst_parser",
"myst_parser", # For generated index.md and reference.md
"_extensions.gallery_directive",
"sphinx_design",
"sphinx_copybutton",
@@ -217,7 +218,7 @@ html_theme_options = {
# # Use :html_theme.sidebar_secondary.remove: for file-wide removal
# "secondary_sidebar_items": {"**": ["page-toc", "sourcelink"]},
# "show_version_warning_banner": True,
# "announcement": None,
"announcement": "⚠️ THESE DOCS ARE OUTDATED. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank' style='color: white; text-decoration: underline;'>Visit the new v1.0 docs</a> and new <a href='https://reference.langchain.com/python' target='_blank' style='color: white; text-decoration: underline;'>reference docs</a>",
"icon_links": [
{
# Label for this link
@@ -258,6 +259,7 @@ html_static_path = ["_static"]
html_css_files = ["css/custom.css"]
html_use_index = False
# Only used on the generated index.md and reference.md files
myst_enable_extensions = ["colon_fence"]
# generate autosummary even if no references
@@ -268,11 +270,11 @@ autosummary_ignore_module_all = False
html_copy_source = False
html_show_sourcelink = False
googleanalytics_id = "G-9B66JQQH2F"
# Set canonical URL from the Read the Docs Domain
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "")
googleanalytics_id = "G-9B66JQQH2F"
# Tell Jinja2 templates the build is running on Read the Docs
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
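The Read the Docs check at the end of this hunk compares the environment variable against the exact string `"True"`. A minimal sketch of that behaviour, pulled out into a standalone function so it can be exercised without Sphinx; the function name `build_html_context` and the `environ` parameter are hypothetical and do not appear in `conf.py`:

```python
def build_html_context(environ):
    """Return the template context implied by Read the Docs environment variables.

    Hypothetical helper: conf.py performs this check inline at module level.
    """
    context = {}
    # Only the exact string "True" enables the flag; "true", "1", or a missing
    # variable leave the context untouched.
    if environ.get("READTHEDOCS", "") == "True":
        context["READTHEDOCS"] = True
    return context


on_rtd = build_html_context({"READTHEDOCS": "True"})
local_build = build_html_context({})
lowercase = build_html_context({"READTHEDOCS": "true"})
```

In a real build one would presumably pass `os.environ`; a plain dict is used here so the three cases are easy to compare.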


@@ -1,4 +1,41 @@
"""Script for auto-generating api_reference.rst."""
"""Auto-generate API reference documentation (RST files) for LangChain packages.
* Automatically discovers all packages in `libs/` and `libs/partners/`
* For each package, recursively walks the filesystem to:
* Load Python modules using importlib
* Extract classes and functions using Python's inspect module
* Classify objects by type (Pydantic models, Runnables, TypedDicts, etc.)
* Filter out private members (names starting with '_') and deprecated items
* Creates structured RST files with:
* Module-level documentation pages with autosummary tables
* Different Sphinx templates based on object type (see templates/ directory)
* Proper cross-references and navigation structure
* Separation of current vs deprecated APIs
* Generates a directory tree like:
```
docs/api_reference/
├── index.md # Main landing page with package gallery
├── reference.md # Package overview and navigation
├── core/ # langchain-core documentation
│ ├── index.rst
│ ├── callbacks.rst
│ └── ...
├── langchain/ # langchain documentation
│ ├── index.rst
│ └── ...
└── partners/ # Integration packages
├── openai/
├── anthropic/
└── ...
```
## Key Features
* Respects privacy markers:
* Modules with `:private:` in docstring are excluded entirely
* Objects with `:private:` in docstring are filtered out
* Names starting with '_' are treated as private
"""
import importlib
import inspect
@@ -177,12 +214,13 @@ def _load_package_modules(
Traversal based on the file system makes it easy to determine which
of the modules/packages are part of the package vs. 3rd party or built-in.
Parameters:
package_directory (Union[str, Path]): Path to the package directory.
submodule (Optional[str]): Optional name of submodule to load.
Args:
package_directory: Path to the package directory.
submodule: Optional name of submodule to load.
Returns:
Dict[str, ModuleMembers]: A dictionary where keys are module names and values are ModuleMembers objects.
A dictionary where keys are module names and values are `ModuleMembers`
objects.
"""
package_path = (
Path(package_directory)
@@ -199,12 +237,13 @@ def _load_package_modules(
package_path = package_path / submodule
for file_path in package_path.rglob("*.py"):
# Skip private modules
if file_path.name.startswith("_"):
continue
# Skip integration_template and project_template directories (for libs/cli)
if "integration_template" in file_path.parts:
continue
if "project_template" in file_path.parts:
continue
@@ -215,8 +254,13 @@ def _load_package_modules(
continue
# Get the full namespace of the module
# Example: langchain_core/schema/output_parsers.py ->
# langchain_core.schema.output_parsers
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
# Keep only the top level namespace
# Example: langchain_core.schema.output_parsers ->
# langchain_core
top_namespace = namespace.split(".")[0]
try:
@@ -253,16 +297,16 @@ def _construct_doc(
members_by_namespace: Dict[str, ModuleMembers],
package_version: str,
) -> List[typing.Tuple[str, str]]:
"""Construct the contents of the reference.rst file for the given package.
"""Construct the contents of the `reference.rst` for the given package.
Args:
package_namespace: The package top level namespace
members_by_namespace: The members of the package, dict organized by top level
module contains a list of classes and functions
inside of the top level namespace.
members_by_namespace: The members of the package, as a dict keyed by top-level
module. Each value contains the classes and functions inside that top-level
namespace.
Returns:
The contents of the reference.rst file.
The string contents of the reference.rst file.
"""
docs = []
index_doc = f"""\
@@ -465,10 +509,13 @@ def _construct_doc(
def _build_rst_file(package_name: str = "langchain") -> None:
"""Create a rst file for building of documentation.
"""Create a rst file for a given package.
Args:
package_name: Can be either "langchain" or "core"
package_name: Name of the package to create the rst file for.
Returns:
The rst file is created in the same directory as this script.
"""
package_dir = _package_dir(package_name)
package_members = _load_package_modules(package_dir)
@@ -500,7 +547,10 @@ def _package_namespace(package_name: str) -> str:
def _package_dir(package_name: str = "langchain") -> Path:
"""Return the path to the directory containing the documentation."""
"""Return the path to the directory containing the documentation.
Attempts to find the package in `libs/` first, then `libs/partners/`.
"""
if (ROOT_DIR / "libs" / package_name).exists():
return ROOT_DIR / "libs" / package_name / _package_namespace(package_name)
else:
@@ -514,7 +564,7 @@ def _package_dir(package_name: str = "langchain") -> Path:
def _get_package_version(package_dir: Path) -> str:
"""Return the version of the package."""
"""Return the version of the package by reading the `pyproject.toml`."""
try:
with open(package_dir.parent / "pyproject.toml", "r") as f:
pyproject = toml.load(f)
@@ -540,6 +590,15 @@ def _out_file_path(package_name: str) -> Path:
def _build_index(dirs: List[str]) -> None:
"""Build the index.md file for the API reference.
Args:
dirs: List of package directories to include in the index.
Returns:
The index.md file is created in the same directory as this script.
"""
custom_names = {
"aws": "AWS",
"ai21": "AI21",
@@ -556,12 +615,17 @@ def _build_index(dirs: List[str]) -> None:
integrations = sorted(dir_ for dir_ in dirs if dir_ not in main_)
doc = """# LangChain Python API Reference
Welcome to the LangChain Python API reference. This is a reference for all
Welcome to the LangChain v0.3 Python API reference. This is a reference for all
`langchain-x` packages.
For user guides see [https://python.langchain.com](https://python.langchain.com).
```{danger}
These pages refer to the v0.3 versions of LangChain packages and integrations. To
visit the documentation for the latest versions of LangChain, visit [https://docs.langchain.com](https://docs.langchain.com)
and [https://reference.langchain.com/python/](https://reference.langchain.com/python/) (for references.)
For the legacy API reference (<v0.3) hosted on ReadTheDocs see [https://api.python.langchain.com/](https://api.python.langchain.com/).
```
For the legacy API reference hosted on ReadTheDocs see [https://api.python.langchain.com/](https://api.python.langchain.com/).
"""
if main_:
@@ -647,9 +711,14 @@ See the full list of integrations in the Section Navigation.
{integration_tree}
```
"""
# Write the reference.md file
with open(HERE / "reference.md", "w") as f:
f.write(doc)
# Write a dummy index.md file that points to reference.md
# Sphinx requires an index file to exist in each doc directory
# TODO: investigate why we don't just put everything in index.md directly?
# if it works it works I guess
dummy_index = """\
# API reference
@@ -665,8 +734,11 @@ Reference<reference>
def main(dirs: Optional[list] = None) -> None:
"""Generate the api_reference.rst file for each package."""
print("Starting to build API reference files.")
"""Generate the `api_reference.rst` file for each package.
If dirs is None, generate for all packages in `libs/` and `libs/partners/`.
Otherwise generate only for the specified package(s).
"""
if not dirs:
dirs = [
p.parent.name
@@ -675,18 +747,17 @@ def main(dirs: Optional[list] = None) -> None:
if p.parent.parent.name in ("libs", "partners")
]
for dir_ in sorted(dirs):
# Skip any hidden directories
# Skip any hidden directories prefixed with a dot
# Some of these could be present by mistake in the code base
# e.g., .pytest_cache from running tests from the wrong location.
# (e.g., .pytest_cache from running tests from the wrong location)
if dir_.startswith("."):
print("Skipping dir:", dir_)
continue
else:
print("Building package:", dir_)
print("Building:", dir_)
_build_rst_file(package_name=dir_)
_build_index(sorted(dirs))
print("API reference files built.")
if __name__ == "__main__":
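The path handling in `_load_package_modules` can be tried in isolation. The sketch below repeats the string transformation and the private-module check shown in the hunks above; the helper names `module_namespace` and `is_private_module` are hypothetical, and the real function does more (importing modules, inspecting members) that is not reproduced here.

```python
from pathlib import PurePosixPath


def module_namespace(relative_path: str) -> tuple[str, str]:
    """Map a package-relative file path to (namespace, top-level namespace)."""
    # Same transformation as in the diff: drop ".py", then turn "/" into ".".
    namespace = str(PurePosixPath(relative_path)).replace(".py", "").replace("/", ".")
    return namespace, namespace.split(".")[0]


def is_private_module(relative_path: str) -> bool:
    """Mirror the `file_path.name.startswith("_")` check from the traversal loop."""
    return PurePosixPath(relative_path).name.startswith("_")


full, top = module_namespace("langchain_core/schema/output_parsers.py")
skipped = is_private_module("langchain_core/utils/_merge.py")
kept = is_private_module("langchain_core/schema/output_parsers.py")
```

Note that the check, as shown in the hunk, looks only at the file name, so a public file inside an underscore-prefixed directory would not be skipped by it alone.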


@@ -1,12 +1,12 @@
autodoc_pydantic>=2,<3
sphinx>=8,<9
myst-parser>=3
sphinx-autobuild>=2024
pydata-sphinx-theme>=0.15
toml>=0.10.2
myst-nb>=1.1.1
pyyaml
sphinx-design
sphinx-copybutton
beautifulsoup4
sphinxcontrib-googleanalytics
pydata-sphinx-theme>=0.15
myst-parser>=3
myst-nb>=1.1.1
toml>=0.10.2
pyyaml
beautifulsoup4


@@ -1,3 +1,10 @@
"""Post-process generated HTML files to clean up table-of-contents headers.
Runs after Sphinx generates the API reference HTML. It finds TOC entries like
"ClassName.method_name()" and shortens them to just "method_name()" for better
readability in the sidebar navigation.
"""
import sys
from glob import glob
from pathlib import Path
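The shortening described in the new module docstring can be illustrated on plain strings. This is only a sketch: the regular expression and the function name `shorten_toc_label` are assumptions, and the actual script operates on generated HTML files rather than on a list of labels.

```python
import re

# Hypothetical pattern: one or more dotted prefixes such as "ClassName."
# directly before a method name ending in "()". An assumption about the TOC
# text, not necessarily the pattern the real script uses.
_QUALIFIED_METHOD = re.compile(r"^(?:[A-Za-z_]\w*\.)+([A-Za-z_]\w*\(\))$")


def shorten_toc_label(label: str) -> str:
    """Return "method_name()" for "ClassName.method_name()"; otherwise the label unchanged."""
    match = _QUALIFIED_METHOD.match(label.strip())
    return match.group(1) if match else label


labels = ["ChatModel.invoke()", "Runnable.batch()", "Overview"]
shortened = [shorten_toc_label(label) for label in labels]
```

Labels that do not look like a qualified method call, such as section titles, pass through untouched.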


@@ -1,3 +1,7 @@
:::danger
⚠️ THESE DOCS ARE OUTDATED. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank'>Visit the new v1.0 docs</a>
:::
# Conceptual guide
This guide provides explanations of the key concepts behind the LangChain framework and AI applications more broadly.


@@ -189,40 +189,6 @@ This can be very helpful when you've made changes to only certain parts of the p
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
### Spellcheck
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
Note that `codespell` finds common typos, so it can produce false positives (correctly spelled but rarely used words) and false negatives (misspelled words it does not catch).
To check spelling for this project:
```bash
# If you have `make` installed:
make spell_check
# If you don't have `make` (Windows alternative):
uv run --all-groups codespell --toml pyproject.toml
```
To fix spelling in place:
```bash
# If you have `make` installed:
make spell_fix
# If you don't have `make` (Windows alternative):
uv run --all-groups codespell --toml pyproject.toml -w
```
If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.
```toml
[tool.codespell]
...
# Add here:
ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
```
### Pre-commit
We use [pre-commit](https://pre-commit.com/) to ensure commits are formatted/linted.


@@ -3,6 +3,10 @@ sidebar_position: 0
sidebar_class_name: hidden
---
:::danger
⚠️ THESE DOCS ARE OUTDATED. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank'>Visit the new v1.0 docs</a>
:::
# How-to guides
Here you'll find answers to "How do I…?" types of questions.
@@ -72,7 +76,7 @@ See [supported integrations](/docs/integrations/chat/) for details on getting st
### Example selectors
[Example Selectors](/docs/concepts/example_selectors) are responsible for selecting the correct few shot examples to pass to the prompt.
[Example Selectors](/docs/concepts/example_selectors) are responsible for selecting the correct few-shot examples to pass to the prompt.
- [How to: use example selectors](/docs/how_to/example_selectors)
- [How to: select examples by length](/docs/how_to/example_selectors_length_based)
@@ -168,7 +172,7 @@ See [supported integrations](/docs/integrations/vectorstores/) for details on ge
Indexing is the process of keeping your vectorstore in-sync with the underlying data source.
- [How to: reindex data to keep your vectorstore in sync with the underlying data source](/docs/how_to/indexing)
- [How to: reindex data to keep your vectorstore in-sync with the underlying data source](/docs/how_to/indexing)
### Tools


@@ -668,7 +668,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"id": "df0370e3",
"metadata": {},
"outputs": [
@@ -685,7 +685,7 @@
}
],
"source": [
"structured_llm = llm.with_structured_output(None, method=\"json_mode\")\n",
"structured_llm = llm.with_structured_output(None, method=\"json_schema\")\n",
"\n",
"structured_llm.invoke(\n",
" \"Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys\"\n",


@@ -39,6 +39,16 @@
"/>\n"
]
},
{
"cell_type": "markdown",
"id": "ecc06359",
"metadata": {},
"source": [
"See also: [How to summarize through parallelization](/docs/how_to/summarize_map_reduce/) and\n",
"[How to summarize through iterative refinement](/docs/how_to/summarize_refine/).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,


@@ -0,0 +1,288 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fbeb3f1eb129d115",
"metadata": {
"collapsed": false
},
"source": [
"---\n",
"sidebar_label: AI/ML API\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "6051ba9cfc65a60a",
"metadata": {
"collapsed": false
},
"source": [
"# ChatAimlapi\n",
"\n",
"This page will help you get started with AI/ML API [chat models](/docs/concepts/chat_models.mdx). For detailed documentation of all ChatAimlapi features and configurations, head to the [API reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration).\n",
"\n",
"AI/ML API provides access to **300+ models** (Deepseek, Gemini, ChatGPT, etc.) via a high-uptime, high-rate API."
]
},
{
"cell_type": "markdown",
"id": "512f94fa4bea2628",
"metadata": {
"collapsed": false
},
"source": [
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| ChatAimlapi | langchain-aimlapi | ✅ | beta | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-aimlapi?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-aimlapi?style=flat-square&label=%20) |"
]
},
{
"cell_type": "markdown",
"id": "7163684608502d37",
"metadata": {
"collapsed": false
},
"source": [
"### Model features\n",
"| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |\n",
"|:------------:|:-----------------:|:---------:|:-----------:|:-----------:|:-----------:|:---------------------:|:------------:|:-----------:|:--------:|\n",
"| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |\n"
]
},
{
"cell_type": "markdown",
"id": "bb9345d5b24a7741",
"metadata": {
"collapsed": false
},
"source": [
"## Setup\n",
"To access AI/ML API models, sign up at [aimlapi.com](https://aimlapi.com/app/?utm_source=langchain&utm_medium=github&utm_campaign=integration), generate an API key, and set the `AIMLAPI_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b26280519672f194",
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:16:58.837623Z",
"start_time": "2025-08-07T07:16:55.346214Z"
}
},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"AIMLAPI_API_KEY\" not in os.environ:\n",
" os.environ[\"AIMLAPI_API_KEY\"] = getpass.getpass(\"Enter your AI/ML API key: \")"
]
},
{
"cell_type": "markdown",
"id": "fa131229e62dfd47",
"metadata": {
"collapsed": false
},
"source": [
"### Installation\n",
"Install the `langchain-aimlapi` package:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3777dc00d768299e",
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:17:11.195741Z",
"start_time": "2025-08-07T07:17:02.288142Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-aimlapi"
]
},
{
"cell_type": "markdown",
"id": "d168108b0c4f9d7",
"metadata": {
"collapsed": false
},
"source": [
"## Instantiation\n",
"Now we can instantiate the `ChatAimlapi` model and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f29131e65e47bd16",
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:17:23.499746Z",
"start_time": "2025-08-07T07:17:11.196747Z"
}
},
"outputs": [],
"source": [
"from langchain_aimlapi import ChatAimlapi\n",
"\n",
"llm = ChatAimlapi(\n",
" model=\"meta-llama/Llama-3-70b-chat-hf\",\n",
" temperature=0.7,\n",
" max_tokens=512,\n",
" timeout=30,\n",
" max_retries=3,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "861b87289f8e146d",
"metadata": {
"collapsed": false
},
"source": [
"## Invocation\n",
"You can invoke the model with a list of messages:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "430b1cff2e6d77b4",
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:17:30.586261Z",
"start_time": "2025-08-07T07:17:29.074409Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"J'adore la programmation.\n"
]
}
],
"source": [
"messages = [\n",
" (\"system\", \"You are a helpful assistant that translates English to French.\"),\n",
" (\"human\", \"I love programming.\"),\n",
"]\n",
"\n",
"ai_msg = llm.invoke(messages)\n",
"print(ai_msg.content)"
]
},
{
"cell_type": "markdown",
"id": "5463797524a19b2e",
"metadata": {
"collapsed": false
},
"source": [
"## Chaining\n",
"We can chain the model with a prompt template as follows:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bf6defc12a0c5d78",
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:17:36.368436Z",
"start_time": "2025-08-07T07:17:34.770581Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ich liebe das Programmieren.\n"
]
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"response = chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")\n",
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "fcf0bf10a872355c",
"metadata": {
"collapsed": false
},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatAimlapi features and configurations, visit the [API Reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -19,7 +19,7 @@
"\n",
"This notebook provides a quick overview for getting started with Anthropic [chat models](/docs/concepts/chat_models). For detailed documentation of all ChatAnthropic features and configurations head to the [API reference](https://python.langchain.com/api_reference/anthropic/chat_models/langchain_anthropic.chat_models.ChatAnthropic.html).\n",
"\n",
"Anthropic has several chat models. You can find information about their latest models and their costs, context windows, and supported input types in the [Anthropic docs](https://docs.anthropic.com/en/docs/models-overview).\n",
"Anthropic has several chat models. You can find information about their latest models and their costs, context windows, and supported input types in the [Anthropic docs](https://docs.anthropic.com/en/docs/about-claude/models/overview).\n",
"\n",
"\n",
":::info AWS Bedrock and Google VertexAI\n",
@@ -840,7 +840,7 @@
"source": [
"## Token-efficient tool use\n",
"\n",
"Anthropic supports a (beta) [token-efficient tool use](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/token-efficient-tool-use) feature. To use it, specify the relevant beta-headers when instantiating the model."
"Anthropic supports a (beta) [token-efficient tool use](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/token-efficient-tool-use) feature. To use it, specify the relevant beta-headers when instantiating the model."
]
},
{
@@ -1191,6 +1191,40 @@
"response.content"
]
},
{
"cell_type": "markdown",
"id": "74247a07-b153-444f-9c56-77659aeefc88",
"metadata": {},
"source": [
"## Context management\n",
"\n",
"Anthropic supports a context editing feature that will automatically manage the model's context window (e.g., by clearing tool results).\n",
"\n",
"See [Anthropic documentation](https://docs.claude.com/en/docs/build-with-claude/context-editing) for details and configuration options.\n",
"\n",
":::info\n",
"Requires ``langchain-anthropic>=0.3.21``\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbb79c5d-37b5-4212-b36f-f27366192cf9",
"metadata": {},
"outputs": [],
"source": [
"from langchain_anthropic import ChatAnthropic\n",
"\n",
"llm = ChatAnthropic(\n",
" model=\"claude-sonnet-4-5-20250929\",\n",
" betas=[\"context-management-2025-06-27\"],\n",
" context_management={\"edits\": [{\"type\": \"clear_tool_uses_20250919\"}]},\n",
")\n",
"llm_with_tools = llm.bind_tools([{\"type\": \"web_search_20250305\", \"name\": \"web_search\"}])\n",
"response = llm_with_tools.invoke(\"Search for recent developments in AI\")"
]
},
{
"cell_type": "markdown",
"id": "cbfec7a9-d9df-4d12-844e-d922456dd9bf",
@@ -1198,7 +1232,7 @@
"source": [
"## Built-in tools\n",
"\n",
"Anthropic supports a variety of [built-in tools](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/text-editor-tool), which can be bound to the model in the [usual way](/docs/how_to/tool_calling/). Claude will generate tool calls adhering to its internal schema for the tool:"
"Anthropic supports a variety of [built-in tools](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/text-editor-tool), which can be bound to the model in the [usual way](/docs/how_to/tool_calling/). Claude will generate tool calls adhering to its internal schema for the tool:"
]
},
{
@@ -1208,7 +1242,7 @@
"source": [
"### Web search\n",
"\n",
"Claude can use a [web search tool](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/web-search-tool) to run searches and ground its responses with citations."
"Claude can use a [web search tool](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/web-search-tool) to run searches and ground its responses with citations."
]
},
{
@@ -1457,6 +1491,38 @@
"</details>"
]
},
{
"cell_type": "markdown",
"id": "29405da2-d2ef-415c-b674-6e29073cd05e",
"metadata": {},
"source": [
"### Memory tool\n",
"\n",
"Claude supports a memory tool for client-side storage and retrieval of context across conversational threads. See docs [here](https://docs.claude.com/en/docs/agents-and-tools/tool-use/memory-tool) for details.\n",
"\n",
":::info\n",
"Requires ``langchain-anthropic>=0.3.21``\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbd76eaa-041f-4fb8-8346-ca8fe0001c01",
"metadata": {},
"outputs": [],
"source": [
"from langchain_anthropic import ChatAnthropic\n",
"\n",
"llm = ChatAnthropic(\n",
" model=\"claude-sonnet-4-5-20250929\",\n",
" betas=[\"context-management-2025-06-27\"],\n",
")\n",
"llm_with_tools = llm.bind_tools([{\"type\": \"memory_20250818\", \"name\": \"memory\"}])\n",
"\n",
"response = llm_with_tools.invoke(\"What are my interests?\")"
]
},
{
"cell_type": "markdown",
"id": "040f381a-1768-479a-9a5e-aa2d7d77e0d5",
@@ -1522,7 +1588,7 @@
"source": [
"### Text editor\n",
"\n",
"The text editor tool can be used to view and modify text files. See docs [here](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/text-editor-tool) for details."
"The text editor tool can be used to view and modify text files. See docs [here](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/text-editor-tool) for details."
]
},
{


@@ -31,7 +31,7 @@
"\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |\n",
"| ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |\n",
"\n",
"### Setup\n",
"\n",
@@ -653,15 +653,35 @@
"\n",
"# Initialize the model\n",
"llm = ChatGoogleGenerativeAI(model=\"gemini-2.0-flash\", temperature=0)\n",
"structured_llm = llm.with_structured_output(Person)\n",
"\n",
"# Method 1: Default function calling approach\n",
"structured_llm_default = llm.with_structured_output(Person)\n",
"\n",
"# Method 2: Native JSON mode\n",
"structured_llm_json = llm.with_structured_output(Person, method=\"json_mode\")\n",
"\n",
"# Invoke the model with a query asking for structured information\n",
"result = structured_llm.invoke(\n",
"result = structured_llm_json.invoke(\n",
" \"Who was the 16th president of the USA, and how tall was he in meters?\"\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"id": "g9w06ld1ggq",
"metadata": {},
"source": [
"### Structured Output Methods\n",
"\n",
"Two methods are supported for structured output:\n",
"\n",
"- **`method=\"function_calling\"` (default)**: Uses tool calling to extract structured data. Compatible with all Gemini models.\n",
"- **`method=\"json_mode\"`**: Uses Gemini's native structured output with `responseSchema`. More reliable but requires Gemini 1.5+ models.\n",
"\n",
"The `json_mode` method is **recommended for better reliability** as it constrains the model's generation process directly rather than relying on post-processing tool calls."
]
},
{
"cell_type": "markdown",
"id": "90d4725e",


@@ -17,7 +17,7 @@
"source": [
"# ChatOCIGenAI\n",
"\n",
"This notebook provides a quick overview for getting started with OCIGenAI [chat models](/docs/concepts/chat_models). For detailed documentation of all ChatOCIGenAI features and configurations head to the [API reference](https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.oci_generative_ai.ChatOCIGenAI.html).\n",
"This notebook provides a quick overview for getting started with OCIGenAI [chat models](/docs/concepts/chat_models). For detailed documentation of all ChatOCIGenAI features and configurations head to the [API reference](https://pypi.org/project/langchain-oci/).\n",
"\n",
"Oracle Cloud Infrastructure (OCI) Generative AI is a fully managed service that provides a set of state-of-the-art, customizable large language models (LLMs) that cover a wide range of use cases, and which is available through a single API.\n",
"Using the OCI Generative AI service you can access ready-to-use pretrained models, or create and host your own fine-tuned custom models based on your own data on dedicated AI clusters. Detailed documentation of the service and API is available __[here](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)__ and __[here](https://docs.oracle.com/en-us/iaas/api/#/en/generative-ai/20231130/)__.\n",
@@ -26,9 +26,9 @@
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/chat/oci_generative_ai) |\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [ChatOCIGenAI](https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.oci_generative_ai.ChatOCIGenAI.html) | [langchain-community](https://python.langchain.com/api_reference/community/index.html) | ❌ | ❌ | ❌ |\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/docs/integrations/chat/oci_generative_ai) |\n",
"| :--- |:---------------------------------------------------------------------------------| :---: | :---: | :---: |\n",
"| [ChatOCIGenAI](https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.oci_generative_ai.ChatOCIGenAI.html) | [langchain-oci](https://github.com/oracle/langchain-oracle) | ❌ | ❌ | ❌ |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | [JSON mode](/docs/how_to/structured_output/#advanced-specifying-the-method-for-structuring-outputs) | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
@@ -37,7 +37,7 @@
"\n",
"## Setup\n",
"\n",
"To access OCIGenAI models you'll need to install the `oci` and `langchain-community` packages.\n",
"To access OCIGenAI models you'll need to install the `oci` and `langchain-oci` packages.\n",
"\n",
"### Credentials\n",
"\n",
@@ -84,13 +84,15 @@
"outputs": [],
"source": [
"from langchain_oci.chat_models import ChatOCIGenAI\n",
"from langchain_core.messages import AIMessage, HumanMessage, SystemMessage\n",
"\n",
"chat = ChatOCIGenAI(\n",
" model_id=\"cohere.command-r-16k\",\n",
" model_id=\"cohere.command-r-plus-08-2024\",\n",
" service_endpoint=\"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com\",\n",
" compartment_id=\"MY_OCID\",\n",
" model_kwargs={\"temperature\": 0.7, \"max_tokens\": 500},\n",
" compartment_id=\"compartment_id\",\n",
" model_kwargs={\"temperature\": 0, \"max_tokens\": 500},\n",
" auth_type=\"SECURITY_TOKEN\",\n",
" auth_profile=\"auth_profile_name\",\n",
" auth_file_location=\"auth_file_location\",\n",
")"
]
},
@@ -110,14 +112,7 @@
"tags": []
},
"outputs": [],
"source": [
"messages = [\n",
" SystemMessage(content=\"your are an AI assistant.\"),\n",
" AIMessage(content=\"Hi there human!\"),\n",
" HumanMessage(content=\"tell me a joke.\"),\n",
"]\n",
"response = chat.invoke(messages)"
]
"source": "response = chat.invoke(\"Tell me one fact about Earth\")"
},
{
"cell_type": "code",
@@ -146,13 +141,22 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.prompts import PromptTemplate\n",
"from langchain_oci.chat_models import ChatOCIGenAI\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\"Tell me a joke about {topic}\")\n",
"chain = prompt | chat\n",
"\n",
"response = chain.invoke({\"topic\": \"dogs\"})\n",
"print(response.content)"
"llm = ChatOCIGenAI(\n",
" model_id=\"cohere.command-r-plus-08-2024\",\n",
" service_endpoint=\"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com\",\n",
" compartment_id=\"compartment_id\",\n",
" model_kwargs={\"temperature\": 0, \"max_tokens\": 500},\n",
" auth_type=\"SECURITY_TOKEN\",\n",
" auth_profile=\"auth_profile_name\",\n",
" auth_file_location=\"auth_file_location\",\n",
")\n",
"prompt = PromptTemplate(input_variables=[\"query\"], template=\"{query}\")\n",
"llm_chain = prompt | llm\n",
"response = llm_chain.invoke(\"what is the capital of france?\")\n",
"print(response)"
]
},
{
@@ -162,7 +166,7 @@
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatOCIGenAI features and configurations head to the API reference: https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.oci_generative_ai.ChatOCIGenAI.html"
"For detailed documentation of all ChatOCIGenAI features and configurations head to the API reference: https://pypi.org/project/langchain-oci/"
]
}
],

View File

@@ -0,0 +1,408 @@
{
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"---\n",
"sidebar_label: Qwen\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# ChatQwen\n",
"\n",
"This will help you get started with Qwen [chat models](../../concepts/chat_models.mdx). For detailed documentation of all ChatQwen features and configurations, head to the [API reference](https://pypi.org/project/langchain-qwq/).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"\n",
"| Class | Package | Local | Serializable | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [ChatQwen](https://pypi.org/project/langchain-qwq/) | [langchain-qwq](https://pypi.org/project/langchain-qwq/) | ❌ | beta | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-qwq?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-qwq?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](../../how_to/tool_calling.ipynb) | [Structured output](../../how_to/structured_output.ipynb) | JSON mode | [Image input](../../how_to/multimodal_inputs.ipynb) | Audio input | Video input | [Token-level streaming](../../how_to/chat_streaming.ipynb) | Native async | [Token usage](../../how_to/chat_token_usage_tracking.ipynb) | [Logprobs](../../how_to/logprobs.ipynb) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |\n",
"\n",
"## Setup\n",
"\n",
"To access Qwen models you'll need to create an Alibaba Cloud account, get an API key, and install the `langchain-qwq` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [Alibaba's API Key page](https://account.alibabacloud.com/login/login.htm?oauth_callback=https%3A%2F%2Fbailian.console.alibabacloud.com%2F%3FapiKey%3D1&lang=en#/api-key) to sign up for Alibaba Cloud and generate an API key. Once you've done this, set the `DASHSCOPE_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "433e8d2b-9519-4b49-b2c4-7ab65b046c94",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"DASHSCOPE_API_KEY\"):\n",
" os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"Enter your Dashscope API key: \")"
]
},
{
"cell_type": "markdown",
"id": "0730d6a1-c893-4840-9817-5e5251676d5d",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"The LangChain QwQ integration lives in the `langchain-qwq` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-qwq"
]
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Hello! How can I assist you today? 😊', additional_kwargs={}, response_metadata={'finish_reason': 'stop', 'model_name': 'qwen-flash'}, id='run--62798a20-d425-48ab-91fc-8e62e37c6084-0', usage_metadata={'input_tokens': 9, 'output_tokens': 11, 'total_tokens': 20, 'input_token_details': {}, 'output_token_details': {}})"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_qwq import ChatQwen\n",
"\n",
"llm = ChatQwen(model=\"qwen-flash\")\n",
"response = llm.invoke(\"Hello\")\n",
"\n",
"response"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={}, response_metadata={'finish_reason': 'stop', 'model_name': 'qwen-flash'}, id='run--33f905e0-880a-4a67-ab83-313fd7a06369-0', usage_metadata={'input_tokens': 32, 'output_tokens': 8, 'total_tokens': 40, 'input_token_details': {}, 'output_token_details': {}})"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates English to French. \"\n",
" \"Translate the user sentence.\",\n",
" ),\n",
" (\"human\", \"I love programming.\"),\n",
"]\n",
"ai_msg = llm.invoke(messages)\n",
"ai_msg"
]
},
{
"cell_type": "markdown",
"id": "18e2bfc0-7e78-4528-a73f-499ac150dca8",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](../../how_to/sequence.ipynb) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe Programmierung.', additional_kwargs={}, response_metadata={'finish_reason': 'stop', 'model_name': 'qwen-flash'}, id='run--9d8bab6d-d6fe-4b9f-95f2-c30c3ff0a50e-0', usage_metadata={'input_tokens': 28, 'output_tokens': 5, 'total_tokens': 33, 'input_token_details': {}, 'output_token_details': {}})"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates \"\n",
" \"{input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8d1b3ef3",
"metadata": {},
"source": [
"## Tool Calling\n",
"ChatQwen supports a tool-calling API that lets you describe tools and their arguments, and have the model return a JSON object naming the tool to invoke and the inputs to pass to it."
]
},
{
"cell_type": "markdown",
"id": "6db1a355",
"metadata": {},
"source": [
"### Use with `bind_tools`"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "15fb6a6d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"content='' additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_f0c2cc49307f480db78a45', 'function': {'arguments': '{\"first_int\": 5, \"second_int\": 42}', 'name': 'multiply'}, 'type': 'function'}]} response_metadata={'finish_reason': 'tool_calls', 'model_name': 'qwen-flash'} id='run--27c5aafb-9710-42f5-ab78-5a2ad1d9050e-0' tool_calls=[{'name': 'multiply', 'args': {'first_int': 5, 'second_int': 42}, 'id': 'call_f0c2cc49307f480db78a45', 'type': 'tool_call'}] usage_metadata={'input_tokens': 166, 'output_tokens': 27, 'total_tokens': 193, 'input_token_details': {}, 'output_token_details': {}}\n"
]
}
],
"source": [
"from langchain_core.tools import tool\n",
"\n",
"from langchain_qwq import ChatQwen\n",
"\n",
"\n",
"@tool\n",
"def multiply(first_int: int, second_int: int) -> int:\n",
" \"\"\"Multiply two integers together.\"\"\"\n",
" return first_int * second_int\n",
"\n",
"\n",
"llm = ChatQwen(model=\"qwen-flash\")\n",
"\n",
"llm_with_tools = llm.bind_tools([multiply])\n",
"\n",
"msg = llm_with_tools.invoke(\"What's 5 times forty two\")\n",
"\n",
"print(msg)"
]
},
{
"cell_type": "markdown",
"id": "cc8ffd89-c474-45a7-a123-e0b1d362f54f",
"metadata": {},
"source": [
"### Vision support"
]
},
{
"cell_type": "markdown",
"id": "3e8a7d46-d1f6-4ae8-835a-266ca47e4daf",
"metadata": {},
"source": [
"#### Image"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "54f69db3-fa51-4b9a-885c-1353968066e3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This image depicts a cozy, rustic Christmas scene set against a wooden backdrop. The arrangement features a variety of festive decorations that evoke a warm, holiday atmosphere:\n",
"\n",
"- **Centerpiece**: A decorative reindeer figurine with large antlers stands prominently in the background.\n",
"- **Miniature Trees**: Two small, snow-dusted artificial Christmas trees flank the reindeer, adding to the wintry feel.\n",
"- **Candles**: Three log-shaped candle holders made from birch bark are lit, casting a soft, warm glow. Two are in the foreground, and one is slightly behind them.\n",
"- **\"Merry Christmas\" Sign**: A wooden cutout sign spelling \"MERRY CHRISTMAS\" is placed on the left, decorated with a tiny golden gift box and a small reindeer silhouette.\n",
"- **Holiday Elements**: Pinecones, red berries, greenery, and fairy lights are scattered throughout, enhancing the natural, festive theme.\n",
"- **Other Details**: A white sack with \"SANTA\" written on it is partially visible on the left, along with a large glass ornament and twinkling string lights.\n",
"\n",
"The overall aesthetic is warm, inviting, and traditional, emphasizing natural materials like wood, pine, and birch bark. It captures the essence of a rustic, homemade Christmas celebration.\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"\n",
"model = ChatQwen(model=\"qwen-vl-max-latest\")\n",
"\n",
"messages = [\n",
" HumanMessage(\n",
" content=[\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\"url\": \"https://example.com/image/image.png\"},\n",
" },\n",
" {\"type\": \"text\", \"text\": \"What do you see in this image?\"},\n",
" ]\n",
" )\n",
"]\n",
"\n",
"response = model.invoke(messages)\n",
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "b1faea19-932f-4dc8-b0af-60e3507eee08",
"metadata": {},
"source": [
"#### Video"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59355c38-d3e2-4051-811a-2b99286ea01b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This video features a young woman with a warm and cheerful expression, standing outdoors in a well-lit environment. She has short, neatly styled brown hair with bangs and is wearing a soft pink knitted cardigan over a white top. A delicate necklace adorns her neck, adding a subtle touch of elegance to her outfit.\n",
"\n",
"Throughout the video, she maintains eye contact with the camera, smiling gently and occasionally opening her mouth as if speaking or laughing. Her facial expressions are natural and engaging, suggesting a friendly and approachable demeanor. The background is softly blurred, indicating a shallow depth of field, which keeps the focus on her. It appears to be an urban setting with modern buildings, possibly a residential or commercial area.\n",
"\n",
"The lighting is bright and natural, likely from sunlight, casting a soft glow on her face and highlighting her features. The overall tone of the video is pleasant and inviting, evoking a sense of warmth and positivity.\n",
"\n",
"In the top right corner of the frames, there is a watermark that reads \"通义·AI合成,\" which indicates that this video was generated using AI technology by Tongyi Lab, a company known for its advancements in artificial intelligence and digital content creation. This suggests that the video may be a demonstration of AI-generated human-like avatars or synthetic media.\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"\n",
"model = ChatQwen(model=\"qwen-vl-max-latest\")\n",
"\n",
"messages = [\n",
" HumanMessage(\n",
" content=[\n",
" {\n",
" \"type\": \"video_url\",\n",
" \"video_url\": {\"url\": \"https://example.com/video/1.mp4\"},\n",
" },\n",
" {\"type\": \"text\", \"text\": \"Can you tell me about this video?\"},\n",
" ]\n",
" )\n",
"]\n",
"\n",
"response = model.invoke(messages)\n",
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatQwen features and configurations, head to the [API reference](https://pypi.org/project/langchain-qwq/)."
]
},
{
"cell_type": "markdown",
"id": "ce1026e3",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
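
The `bind_tools` cell in the notebook above stops at printing the message, whose `tool_calls` field carries the tool name and the parsed arguments. As a minimal sketch of the step that would usually follow, the snippet below dispatches such a call to a plain Python function. The `tool_calls` list is copied from the printed output rather than produced by a live model, and `TOOLS` and `run_tool_calls` are hypothetical names used for illustration only; they are not part of `langchain-qwq`.

```python
# Hypothetical helper: execute tool calls shaped like the `tool_calls`
# entries printed by the notebook (name / args / id / type).


def multiply(first_int: int, second_int: int) -> int:
    """Multiply two integers together."""
    return first_int * second_int


# Registry mapping tool names to callables (an assumption of this sketch).
TOOLS = {"multiply": multiply}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Run each requested tool and pair the result with its call id."""
    results = []
    for call in tool_calls:
        func = TOOLS[call["name"]]  # raises KeyError for an unknown tool
        results.append(
            {"tool_call_id": call["id"], "result": func(**call["args"])}
        )
    return results


# Copied from the notebook output instead of calling the model.
tool_calls = [
    {
        "name": "multiply",
        "args": {"first_int": 5, "second_int": 42},
        "id": "call_f0c2cc49307f480db78a45",
        "type": "tool_call",
    }
]

results = run_tool_calls(tool_calls)
print(results)
```

In a live session the list would presumably come from `msg.tool_calls`, and each result would be sent back to the model as a tool message keyed by the same call id.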

View File

@@ -34,7 +34,7 @@
"### Model features\n",
"| [Tool calling](../../how_to/tool_calling.ipynb) | [Structured output](../../how_to/structured_output.ipynb) | JSON mode | [Image input](../../how_to/multimodal_inputs.ipynb) | Audio input | Video input | [Token-level streaming](../../how_to/chat_streaming.ipynb) | Native async | [Token usage](../../how_to/chat_token_usage_tracking.ipynb) | [Logprobs](../../how_to/logprobs.ipynb) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ✅ | | ❌ | | ✅ | ✅ | ✅ | ❌ | \n",
"| ✅ | ✅ | ✅ | | ❌ | | ✅ | ✅ | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
@@ -47,7 +47,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "433e8d2b-9519-4b49-b2c4-7ab65b046c94",
"metadata": {},
"outputs": [],
@@ -91,7 +91,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
"metadata": {},
"outputs": [],
@@ -117,7 +117,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"id": "62e0dbc3",
"metadata": {
"tags": []
@@ -126,10 +126,10 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'aime la programmation.\", additional_kwargs={'reasoning_content': 'Okay, the user wants me to translate \"I love programming.\" into French. Let\\'s start by breaking down the sentence. The subject is \"I\", which in French is \"Je\". The verb is \"love\", which in this context is present tense, so \"aime\". The object is \"programming\". Now, \"programming\" in French can be \"la programmation\". \\n\\nWait, should it be \"programmation\" or \"programmation\"? Let me confirm the spelling. Yes, \"programmation\" is correct. Now, putting it all together: \"Je aime la programmation.\" Hmm, but in French, there\\'s a tendency to contract \"je\" and \"aime\". Wait, actually, \"je\" followed by a vowel sound usually takes \"j\\'\". So it should be \"J\\'aime la programmation.\" \\n\\nLet me double-check. \"J\\'aime\" is the correct contraction for \"I love\". The definite article \"la\" is needed because \"programmation\" is a feminine noun. Yes, \"programmation\" is a feminine noun, so \"la\" is correct. \\n\\nIs there any other way to say it? Maybe \"J\\'adore la programmation\" for \"I love\" in a stronger sense, but the user didn\\'t specify the intensity. Since the original is straightforward, \"J\\'aime la programmation.\" is the direct translation. \\n\\nI think that\\'s it. No mistakes there. So the final translation should be \"J\\'aime la programmation.\"'}, response_metadata={'model_name': 'qwq-plus'}, id='run-5045cd6a-edbd-4b2f-bf24-b7bdf3777fb9-0', usage_metadata={'input_tokens': 32, 'output_tokens': 326, 'total_tokens': 358, 'input_token_details': {}, 'output_token_details': {}})"
"AIMessage(content=\"J'aime la programmation.\", additional_kwargs={'reasoning_content': 'Okay, the user wants me to translate \"I love programming.\" into French. Let me start by recalling the basic translation. The verb \"love\" in French is \"aimer\", and \"programming\" is \"la programmation\". So the literal translation would be \"J\\'aime la programmation.\" But wait, I should check if there\\'s any context or nuances I need to consider. The user mentioned they\\'re a helpful assistant, so maybe they want a more natural or commonly used phrase. Sometimes in French, people might use \"adorer\" instead of \"aimer\" for stronger emphasis, but \"aimer\" is more standard here. Also, the structure \"J\\'aime\" is correct for \"I love\". No need for any articles if it\\'s a general statement, but \"la programmation\" is a feminine noun, so the article is necessary. Let me confirm the gender of \"programmation\"—yes, it\\'s feminine. So \"la\" is correct. I think that\\'s it. The translation should be \"J\\'aime la programmation.\"'}, response_metadata={'model_name': 'qwq-plus'}, id='run--396edf0f-ab92-4317-99be-cc9f5377c312-0', usage_metadata={'input_tokens': 32, 'output_tokens': 229, 'total_tokens': 261, 'input_token_details': {}, 'output_token_details': {}})"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -159,17 +159,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe das Programmieren.', additional_kwargs={'reasoning_content': 'Okay, the user wants me to translate \"I love programming.\" into German. Let me think. The verb \"love\" is \"lieben\" or \"mögen\" in German, but \"lieben\" is more like love, while \"mögen\" is prefer. Since it\\'s about programming, which is a strong affection, \"lieben\" is better. The subject is \"I\", which is \"ich\". Then \"programming\" is \"Programmierung\" or \"Coding\". But \"Programmierung\" is more formal. Alternatively, sometimes people say \"ich liebe es zu programmieren\" which is \"I love to program\". Hmm, maybe the direct translation would be \"Ich liebe die Programmierung.\" But maybe the more natural way is \"Ich liebe es zu programmieren.\" Let me check. Both are correct, but the second one might sound more natural in everyday speech. The user might prefer the concise version. Alternatively, maybe \"Ich liebe die Programmierung.\" is better. Wait, the original is \"programming\" as a noun. So using the noun form would be appropriate. So \"Ich liebe die Programmierung.\" But sometimes people also use \"Coding\" in German, like \"Ich liebe das Coding.\" But that\\'s more anglicism. Probably better to stick with \"Programmierung\". Alternatively, \"Programmieren\" as a noun. Oh right! \"Programmieren\" can be a noun when used in the accusative case. So \"Ich liebe das Programmieren.\" That\\'s correct and natural. Yes, that\\'s the best translation. So the answer is \"Ich liebe das Programmieren.\"'}, response_metadata={'model_name': 'qwq-plus'}, id='run-2c418451-51d8-4319-8269-2ce129363a1a-0', usage_metadata={'input_tokens': 28, 'output_tokens': 341, 'total_tokens': 369, 'input_token_details': {}, 'output_token_details': {}})"
"AIMessage(content='Ich liebe das Programmieren.', additional_kwargs={'reasoning_content': 'Okay, the user wants to translate \"I love programming.\" into German. Let\\'s start by breaking down the sentence. The subject is \"I,\" which translates to \"Ich\" in German. The verb \"love\" is \"liebe\" in present tense for the first person singular. Then \"programming\" is a noun. Now, in German, the word for programming, especially in the context of computer programming, is \"Programmierung.\" However, sometimes people might use \"Programmieren\" as well. Wait, but \"Programmierung\" is the noun form, so \"die Programmierung.\" The structure in German would be \"Ich liebe die Programmierung.\" Alternatively, could it be \"Programmieren\" as the verb in a nominalized form? Let me think. If you say \"Ich liebe das Programmieren,\" that\\'s also correct because \"das Programmieren\" is the gerundive form, which is commonly used for activities. So both are possible. Which one is more natural? Hmm. \"Das Programmieren\" might be more common in everyday language. Let me check some examples. For instance, \"I love cooking\" would be \"Ich liebe das Kochen.\" So following that pattern, \"Ich liebe das Programmieren\" would be the equivalent. Therefore, maybe \"Programmieren\" with the article \"das\" is better here. But the user might just want a direct translation without the article. Wait, the original sentence is \"I love programming,\" which is a noun, so in German, you need an article. So the correct translation would include \"das\" before the noun. So the correct sentence is \"Ich liebe das Programmieren.\" Alternatively, if they want to use the noun without an article, maybe in a more abstract sense, but I think \"das\" is necessary here. Let me confirm. Yes, in German, when using the noun form of a verb like this, you need the article. So the best translation is \"Ich liebe das Programmieren.\" I think that\\'s the most natural way to say it. 
Alternatively, \"Programmierung\" is more formal, but \"Programmieren\" is more commonly used in such contexts. So I\\'ll go with \"Ich liebe das Programmieren.\"'}, response_metadata={'model_name': 'qwq-plus'}, id='run--0ceaba8a-7842-48fb-8bec-eb96d2c83ed4-0', usage_metadata={'input_tokens': 28, 'output_tokens': 466, 'total_tokens': 494, 'input_token_details': {}, 'output_token_details': {}})"
]
},
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -217,7 +217,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "15fb6a6d",
"metadata": {},
"outputs": [
@@ -225,12 +225,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"content='' additional_kwargs={'reasoning_content': 'Okay, the user is asking \"What\\'s 5 times forty two\". Let me break this down. First, I need to identify the numbers involved. The first number is 5, which is straightforward. The second number is forty two, which is 42 in digits. The operation they want is multiplication.\\n\\nLooking at the tools provided, there\\'s a function called multiply that takes two integers. So I should use that. The parameters are first_int and second_int. \\n\\nI need to convert \"forty two\" to 42. Since the function requires integers, both numbers should be in integer form. So 5 and 42. \\n\\nNow, I\\'ll structure the tool call. The function name is multiply, and the arguments should be first_int: 5 and second_int: 42. I\\'ll make sure the JSON is correctly formatted without any syntax errors. Let me double-check the parameters to ensure they\\'re required and of the right type. Yep, both are required and integers. \\n\\nNo examples were provided, but the function\\'s purpose is clear. So the correct tool call should be to multiply those two numbers. I think that\\'s all. No other functions are needed here.'} response_metadata={'model_name': 'qwq-plus'} id='run-638895aa-fdde-4567-bcfa-7d8e5d4f24af-0' tool_calls=[{'name': 'multiply', 'args': {'first_int': 5, 'second_int': 42}, 'id': 'call_d088275851c140529ed2ad', 'type': 'tool_call'}] usage_metadata={'input_tokens': 176, 'output_tokens': 277, 'total_tokens': 453, 'input_token_details': {}, 'output_token_details': {}}\n"
"content='' additional_kwargs={'reasoning_content': 'Okay, the user is asking \"What\\'s 5 times forty two\". Let me break this down. They want the product of 5 and 42. The function provided is called multiply, which takes two integers. First, I need to parse the numbers from the question. The first integer is 5, straightforward. The second is forty two, which is 42 in numeric form. So I should call the multiply function with first_int=5 and second_int=42. Let me double-check the parameters: both are required and of type integer. Yep, that\\'s correct. No examples given, but the function should handle these numbers. Alright, time to format the tool call.'} response_metadata={'model_name': 'qwq-plus'} id='run--3c5ff46c-3fc8-4caf-a665-2405aeef2948-0' tool_calls=[{'name': 'multiply', 'args': {'first_int': 5, 'second_int': 42}, 'id': 'call_33fb94c6662d44928e56ec', 'type': 'tool_call'}] usage_metadata={'input_tokens': 176, 'output_tokens': 173, 'total_tokens': 349, 'input_token_details': {}, 'output_token_details': {}}\n"
]
}
],
"source": [
"from langchain_core.tools import tool\n",
"\n",
"from langchain_qwq import ChatQwQ\n",
"\n",
"\n",
@@ -249,6 +250,170 @@
"print(msg)"
]
},
{
"cell_type": "markdown",
"id": "88aa9980-1bd6-4cc9-aeac-4c9011e617fc",
"metadata": {},
"source": [
"### Vision support"
]
},
{
"cell_type": "markdown",
"id": "79e372e3-7050-4038-bf88-d1e8f5ddae09",
"metadata": {},
"source": [
"#### Image"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c2372365-7208-42f9-a147-deffdc390313",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The image depicts a charming Christmas-themed arrangement set against a rustic wooden backdrop, creating a warm and festive atmosphere. Here's a detailed breakdown:\n",
"\n",
"### **Background & Setting**\n",
"- **Wooden Wall**: A horizontally paneled wooden wall forms the backdrop, giving a cozy, cabin-like feel.\n",
"- **Foreground Surface**: The decorations rest on a smooth wooden surface (likely a table or desk), enhancing the natural, earthy tone of the scene.\n",
"\n",
"### **Key Elements**\n",
"1. **Snow-Covered Trees**:\n",
" - Two miniature evergreen trees dusted with artificial snow flank the sides of the arrangement, evoking a wintry landscape.\n",
"\n",
"2. **String Lights**:\n",
" - A strand of white bulb lights stretches across the back, weaving through the decor and adding a soft, glowing ambiance.\n",
"\n",
"3. **Ornamental Sphere**:\n",
" - A reflective gold sphere with striped patterns sits near the center-left, catching and dispersing light.\n",
"\n",
"4. **\"Merry Christmas\" Sign**:\n",
" - A wooden cutout spelling \"MERRY CHRISTMAS\" in capital letters serves as the focal point. The letters feature star-shaped cutouts, allowing light to shine through.\n",
"\n",
"5. **Reindeer Figurine**:\n",
" - A brown reindeer with white facial markings and large antlers stands prominently on the right, facing forward and adding a playful touch.\n",
"\n",
"6. **Candle Holders**:\n",
" - Three birch-bark candle holders are arranged in front of the reindeer. Two hold lit tealights, casting a warm glow, while the third remains unlit.\n",
"\n",
"7. **Natural Accents**:\n",
" - **Pinecones**: Scattered throughout, adding texture and a woodland feel.\n",
" - **Berry Branches**: Red-berried greenery (likely holly) weaves behind the sign, introducing vibrant color.\n",
" - **Pine Branches**: Fresh-looking branches enhance the seasonal authenticity.\n",
"\n",
"8. **Gift Box**:\n",
" - A small golden gift box with a bow sits near the left, symbolizing holiday gifting.\n",
"\n",
"9. **Textile Detail**:\n",
" - A fabric piece with \"Christmas\" embroidered on it peeks from the left, partially obscured but contributing to the thematic unity.\n",
"\n",
"### **Color Palette & Mood**\n",
"- **Warm Tones**: Browns (wood, reindeer), golds (ornament, gift box), and whites (snow, lights) dominate, creating a inviting glow.\n",
"- **Cool Accents**: Greens (trees, branches) and reds (berries) provide contrast, balancing the warmth.\n",
"- **Lighting**: The lit candles and string lights cast a soft, flickering illumination, enhancing the intimate, celebratory vibe.\n",
"\n",
"### **Composition**\n",
"- **Balance**: The arrangement is symmetrical, with trees and candles on either side framing the central sign and reindeer.\n",
"- **Depth**: Layered elements (trees, lights, branches) create visual interest, drawing the eye inward.\n",
"\n",
"This image beautifully captures the essence of a cozy, handmade Christmas display, blending traditional symbols with natural textures to evoke nostalgia and joy.\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"\n",
"model = ChatQwQ(model=\"qvq-max-latest\")\n",
"\n",
"messages = [\n",
" HumanMessage(\n",
" content=[\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\"url\": \"https://example.com/image/image.png\"},\n",
" },\n",
" {\"type\": \"text\", \"text\": \"What do you see in this image?\"},\n",
" ]\n",
" )\n",
"]\n",
"\n",
"response = model.invoke(messages)\n",
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "9242acf7-9a66-40b1-98b5-b113d28fc6ec",
"metadata": {},
"source": [
"#### Video"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f0a9e542-7a85-44d2-8576-14314a50d948",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The image provided is a still frame from a video featuring a young woman with short brown hair and bangs, smiling brightly at the camera. Here's a detailed breakdown:\n",
"\n",
"### **Description of the Image:**\n",
"- **Subject:** A youthful female with a cheerful expression, showcasing a wide smile with visible teeth.\n",
"- **Appearance:** \n",
" - Short, neatly styled brown hair with blunt bangs.\n",
" - Natural makeup emphasizing clear skin and subtle eye makeup.\n",
" - Wearing a white round-neck shirt layered under a light pink knitted cardigan.\n",
" - Accessories include a delicate necklace with a small pendant and small earrings.\n",
"- **Background:** An outdoor setting with blurred architectural elements (e.g., buildings with columns), suggesting a campus, park, or residential area.\n",
"- **Lighting:** Soft, natural daylight, enhancing the warm and inviting atmosphere.\n",
"\n",
"### **Key Details About the Video:**\n",
"1. **AI-Generated Content:** The watermark (\"通义·AI合成\" / \"Tongyi AI Synthesis\") indicates this image was created using Alibaba's Tongyi AI model, known for generating hyper-realistic visuals.\n",
"2. **Style & Purpose:** The high-quality, photorealistic style suggests the video may demonstrate AI imaging capabilities, potentially for advertising, entertainment, or educational purposes.\n",
"3. **Context Clues:** The subject's casual yet polished look and the pleasant outdoor setting imply a positive, approachable theme (e.g., lifestyle, technology promotion, or social media content).\n",
"\n",
"### **What We Can Infer About the Video:**\n",
"- Likely showcases dynamic AI-generated scenes featuring the same character in various poses or interactions.\n",
"- May highlight realism in digital avatars or synthetic media.\n",
"- Could be part of a demo reel, tutorial, or creative project emphasizing AI artistry.\n",
"\n",
"### **Limitations:**\n",
"- As only a single frame is provided, specifics about the video's length, narrative, or additional scenes cannot be determined.\n",
"\n",
"If you have more frames or context, feel free to share! 😊\n"
]
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"\n",
"model = ChatQwQ(model=\"qvq-max-latest\")\n",
"\n",
"messages = [\n",
" HumanMessage(\n",
" content=[\n",
" {\n",
" \"type\": \"video_url\",\n",
" \"video_url\": {\"url\": \"https://example.com/video/1.mp4\"},\n",
" },\n",
" {\"type\": \"text\", \"text\": \"Can you tell me about this video?\"},\n",
" ]\n",
" )\n",
"]\n",
"\n",
"response = model.invoke(messages)\n",
"print(response.content)"
]
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
@@ -258,11 +423,19 @@
"\n",
"For detailed documentation of all ChatQwQ features and configurations head to the [API reference](https://pypi.org/project/langchain-qwq/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "824f0c67-5f3b-4079-bc17-2cf92755bdd5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -276,7 +449,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.1"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -16,13 +16,7 @@
"This notebook covers how to load documents from Oracle Autonomous Database.\n",
"\n",
"## Prerequisites\n",
"1. Install python-oracledb:\n",
"\n",
" `pip install oracledb`\n",
" \n",
" See [Installing python-oracledb](https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html).\n",
"\n",
"2. A database that python-oracledb's default 'Thin' mode can connected to. This is true of Oracle Autonomous Database, see [python-oracledb Architecture](https://python-oracledb.readthedocs.io/en/latest/user_guide/introduction.html#architecture).\n"
"1. A database that python-oracledb's default 'Thin' mode can connected to. This is true of Oracle Autonomous Database, see [python-oracledb Architecture](https://python-oracledb.readthedocs.io/en/latest/user_guide/introduction.html#architecture).\n"
]
},
{
@@ -38,17 +32,12 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"cell_type": "markdown",
"metadata": {},
"source": [
"pip install oracledb"
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb."
]
},
{
@@ -62,7 +51,21 @@
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import OracleAutonomousDatabaseLoader\n",
"# python -m pip install -U langchain-oracledb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from langchain_oracledb.document_loaders import OracleAutonomousDatabaseLoader\n",
"from settings import s"
]
},
@@ -99,7 +102,7 @@
" config_dir=s.CONFIG_DIR,\n",
" wallet_location=s.WALLET_LOCATION,\n",
" wallet_password=s.PASSWORD,\n",
" tns_name=s.TNS_NAME,\n",
" dsn=s.DSN,\n",
")\n",
"doc_1 = doc_loader_1.load()\n",
"\n",
@@ -108,7 +111,7 @@
" user=s.USERNAME,\n",
" password=s.PASSWORD,\n",
" schema=s.SCHEMA,\n",
" connection_string=s.CONNECTION_STRING,\n",
" dsn=s.DSN,\n",
" wallet_location=s.WALLET_LOCATION,\n",
" wallet_password=s.PASSWORD,\n",
")\n",
@@ -147,7 +150,7 @@
" password=s.PASSWORD,\n",
" schema=s.SCHEMA,\n",
" config_dir=s.CONFIG_DIR,\n",
" tns_name=s.TNS_NAME,\n",
" dsn=s.DSN,\n",
" parameters=[\"Direct Sales\"],\n",
")\n",
"doc_3 = doc_loader_3.load()\n",
@@ -157,7 +160,7 @@
" user=s.USERNAME,\n",
" password=s.PASSWORD,\n",
" schema=s.SCHEMA,\n",
" connection_string=s.CONNECTION_STRING,\n",
" dsn=s.DSN,\n",
" parameters=[\"Direct Sales\"],\n",
")\n",
"doc_4 = doc_loader_4.load()"

View File

@@ -42,7 +42,9 @@
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb."
]
},
{
@@ -51,7 +53,7 @@
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
"# python -m pip install -U langchain-oracledb"
]
},
{
@@ -154,7 +156,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_oracledb.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",
@@ -199,7 +201,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_oracledb.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",

View File

@@ -0,0 +1,357 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"---\n",
"sidebar_label: AI/ML API\n",
"---"
],
"metadata": {
"collapsed": false
},
"id": "c74887ead73c5eb4"
},
{
"cell_type": "markdown",
"source": [
"# AimlapiLLM\n",
"\n",
"This page will help you get started with AI/ML API [text completion models](/docs/concepts/text_llms). For detailed documentation of all AimlapiLLM features and configurations, head to the [API reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration).\n",
"\n",
"AI/ML API provides access to **300+ models** (Deepseek, Gemini, ChatGPT, etc.) via high-uptime and high-rate API."
],
"metadata": {
"collapsed": false
},
"id": "c1895707cde83d90"
},
{
"cell_type": "markdown",
"source": [
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| AimlapiLLM | langchain-aimlapi | ✅ | beta | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-aimlapi?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-aimlapi?style=flat-square&label=%20) |"
],
"metadata": {
"collapsed": false
},
"id": "72b0a510b6eac641"
},
{
"cell_type": "markdown",
"source": [
"### Model features\n",
"| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |\n",
"|:------------:|:-----------------:|:---------:|:-----------:|:-----------:|:-----------:|:---------------------:|:------------:|:-----------:|:--------:|\n",
"| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |\n"
],
"metadata": {
"collapsed": false
},
"id": "4b87089494d8877d"
},
{
"cell_type": "markdown",
"source": [
"## Setup\n",
"To access AI/ML API models, sign up at [aimlapi.com](https://aimlapi.com/app/?utm_source=langchain&utm_medium=github&utm_campaign=integration), generate an API key, and set the `AIMLAPI_API_KEY` environment variable:"
],
"metadata": {
"collapsed": false
},
"id": "2c45017efcc36569"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"AIMLAPI_API_KEY\" not in os.environ:\n",
" os.environ[\"AIMLAPI_API_KEY\"] = getpass.getpass(\"Enter your AI/ML API key: \")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:24:48.681319Z",
"start_time": "2025-08-07T07:24:47.490206Z"
}
},
"id": "86b05af725c45941",
"execution_count": 1
},
{
"cell_type": "markdown",
"source": [
"### Installation\n",
"Install the `langchain-aimlapi` package:"
],
"metadata": {
"collapsed": false
},
"id": "51171ba92cb2b382"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-aimlapi"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:18:08.606708Z",
"start_time": "2025-08-07T07:17:59.901457Z"
}
},
"id": "2b15cbaf7d5e1560",
"execution_count": 2
},
{
"cell_type": "markdown",
"source": [
"## Instantiation\n",
"Now we can instantiate the `AimlapiLLM` model and generate text completions:"
],
"metadata": {
"collapsed": false
},
"id": "e94379f9d37fe6b3"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"from langchain_aimlapi import AimlapiLLM\n",
"\n",
"llm = AimlapiLLM(\n",
" model=\"gpt-3.5-turbo-instruct\",\n",
" temperature=0.5,\n",
" max_tokens=256,\n",
")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:46:52.875867Z",
"start_time": "2025-08-07T07:46:52.869961Z"
}
},
"id": "8a3af681997723b0",
"execution_count": 23
},
{
"cell_type": "markdown",
"source": [
"## Invocation\n",
"You can invoke the model with a prompt:"
],
"metadata": {
"collapsed": false
},
"id": "c983ab1d95887e8f"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Bubble sort is a simple sorting algorithm that repeatedly steps through the list to be sorted, compares each pair of adjacent items and swaps them if they are in the wrong order. This process is repeated until the entire list is sorted.\n",
"\n",
"The algorithm gets its name from the way smaller elements \"bubble\" to the top of the list. It is commonly used for educational purposes due to its simplicity, but it is not a very efficient sorting algorithm for large data sets.\n",
"\n",
"Here is an implementation of the bubble sort algorithm in Python:\n",
"\n",
"1. Start by defining a function that takes in a list as its argument.\n",
"2. Set a variable \"swapped\" to True, indicating that a swap has occurred.\n",
"3. Create a while loop that runs as long as the \"swapped\" variable is True.\n",
"4. Inside the loop, set the \"swapped\" variable to False.\n",
"5. Create a for loop that iterates through the list, starting from the first element and ending at the second to last element.\n",
"6. Inside the for loop, compare the current element with the next element. If the current element is larger than the next element, swap them and set the \"swapped\" variable to True.\n",
"7. After the for loop, if the \"swapped\" variable\n"
]
}
],
"source": [
"response = llm.invoke(\"Explain the bubble sort algorithm in Python.\")\n",
"print(response)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:46:57.209950Z",
"start_time": "2025-08-07T07:46:53.935975Z"
}
},
"id": "9a193081f431a42a",
"execution_count": 24
},
{
"cell_type": "markdown",
"source": [
"## Streaming Invocation\n",
"You can also stream responses token-by-token:"
],
"metadata": {
"collapsed": false
},
"id": "1afedb28f556c7bd"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" \n",
"\n",
"1. Python\n",
"Python has been consistently growing in popularity and has become one of the most widely used programming languages in recent years. It is used for a wide range of applications such as web development, data analysis, machine learning, and artificial intelligence. Its simple syntax and readability make it an attractive choice for beginners and experienced programmers alike. With the rise of data-driven technology and automation, Python is projected to be the most in-demand language in 2025.\n",
"\n",
"2. JavaScript\n",
"JavaScript continues to dominate the web development scene and is expected to maintain its position as a top programming language in 2025. With the increasing use of front-end frameworks like React and Angular, JavaScript is crucial for building dynamic and interactive user interfaces. Additionally, the rise of serverless architecture and the popularity of Node.js make JavaScript an essential language for both front-end and back-end development.\n",
"\n",
"3. Go\n",
"Go, also known as Golang, is a relatively new programming language developed by Google. It is designed for"
]
}
],
"source": [
"llm = AimlapiLLM(\n",
" model=\"gpt-3.5-turbo-instruct\",\n",
")\n",
"\n",
"for chunk in llm.stream(\"List top 5 programming languages in 2025 with reasons.\"):\n",
" print(chunk, end=\"\", flush=True)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:49:25.223233Z",
"start_time": "2025-08-07T07:49:22.101498Z"
}
},
"id": "a132c9183f648fb4",
"execution_count": 26
},
{
"cell_type": "markdown",
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all AimlapiLLM features and configurations, visit the [API Reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration).\n"
],
"metadata": {
"collapsed": false
},
"id": "7b4ab33058dc0974"
},
{
"cell_type": "markdown",
"source": [
"## Chaining\n",
"\n",
"You can also easily combine with a prompt template for easy structuring of user input. We can do this using [LCEL](/docs/concepts/lcel)"
],
"metadata": {
"collapsed": false
},
"id": "900f36a35477c8ae"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"from langchain_core.prompts import PromptTemplate\n",
"\n",
"prompt = PromptTemplate.from_template(\"Tell me a joke about {topic}\")\n",
"chain = prompt | llm"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:49:34.857042Z",
"start_time": "2025-08-07T07:49:34.853032Z"
}
},
"id": "d7f10052eb4ff249",
"execution_count": 27
},
{
"cell_type": "code",
"outputs": [
{
"data": {
"text/plain": "\"\\n\\nWhy do bears have fur coats?\\n\\nBecause they'd look silly in sweaters! \""
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"topic\": \"bears\"})"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:49:48.565804Z",
"start_time": "2025-08-07T07:49:35.558426Z"
}
},
"id": "184c333c60f94b05",
"execution_count": 28
},
{
"cell_type": "markdown",
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `AI/ML API` llm features and configurations head to the API reference: [API Reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration)"
],
"metadata": {
"collapsed": false
},
"id": "804f3a79a8046ec1"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -22,30 +22,28 @@
"metadata": {},
"source": [
"## Setup\n",
"Ensure that the oci sdk and the langchain-community package are installed"
"Ensure that the oci sdk and the langchain-community package are installed\n",
"\n",
":::caution You are currently on a page documenting the use of Oracle's text generation models. Which are deprecated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"cell_type": "code",
"outputs": [],
"source": [
"!pip install -U langchain-oci"
]
"execution_count": null,
"source": "!pip install -U langchain-oci"
},
{
"metadata": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage"
]
"source": "## Usage"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"from langchain_oci.llms import OCIGenAI\n",
"\n",

View File

@@ -0,0 +1,272 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# AI/ML API LLM\n",
"\n",
"[AI/ML API](https://aimlapi.com/app/?utm_source=langchain&utm_medium=github&utm_campaign=integration) provides an API to query **300+ leading AI models** (Deepseek, Gemini, ChatGPT, etc.) with enterprise-grade performance.\n",
"\n",
"This example demonstrates how to use LangChain to interact with AI/ML API models."
],
"metadata": {
"collapsed": false
},
"id": "bb9dcd1ba7b0f560"
},
{
"cell_type": "markdown",
"source": [
"## Installation"
],
"metadata": {
"collapsed": false
},
"id": "e4c35f60c565d369"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: langchain-aimlapi in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (0.1.0)\n",
"Requirement already satisfied: langchain-core<0.4.0,>=0.3.15 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-aimlapi) (0.3.67)\n",
"Requirement already satisfied: langsmith>=0.3.45 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (0.4.4)\n",
"Requirement already satisfied: tenacity!=8.4.0,<10.0.0,>=8.1.0 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (9.1.2)\n",
"Requirement already satisfied: jsonpatch<2.0,>=1.33 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (1.33)\n",
"Requirement already satisfied: PyYAML>=5.3 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (6.0.2)\n",
"Requirement already satisfied: packaging<25,>=23.2 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (24.2)\n",
"Requirement already satisfied: typing-extensions>=4.7 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (4.14.0)\n",
"Requirement already satisfied: pydantic>=2.7.4 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (2.11.7)\n",
"Requirement already satisfied: jsonpointer>=1.9 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (3.0.0)\n",
"Requirement already satisfied: httpx<1,>=0.23.0 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (0.28.1)\n",
"Requirement already satisfied: orjson<4.0.0,>=3.9.14 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (3.10.18)\n",
"Requirement already satisfied: requests<3,>=2 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (2.32.4)\n",
"Requirement already satisfied: requests-toolbelt<2.0.0,>=1.0.0 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (1.0.0)\n",
"Requirement already satisfied: zstandard<0.24.0,>=0.23.0 in c:\\users\\tuman\\appdata\\roaming\\python\\python312\\site-packages (from langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (0.23.0)\n",
"Requirement already satisfied: annotated-types>=0.6.0 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from pydantic>=2.7.4->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (0.7.0)\n",
"Requirement already satisfied: pydantic-core==2.33.2 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from pydantic>=2.7.4->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (2.33.2)\n",
"Requirement already satisfied: typing-inspection>=0.4.0 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from pydantic>=2.7.4->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (0.4.1)\n",
"Requirement already satisfied: anyio in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from httpx<1,>=0.23.0->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (4.9.0)\n",
"Requirement already satisfied: certifi in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from httpx<1,>=0.23.0->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (2025.6.15)\n",
"Requirement already satisfied: httpcore==1.* in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from httpx<1,>=0.23.0->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (1.0.9)\n",
"Requirement already satisfied: idna in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from httpx<1,>=0.23.0->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (3.10)\n",
"Requirement already satisfied: h11>=0.16 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from httpcore==1.*->httpx<1,>=0.23.0->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (0.16.0)\n",
"Requirement already satisfied: charset_normalizer<4,>=2 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from requests<3,>=2->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (3.4.2)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from requests<3,>=2->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (2.5.0)\n",
"Requirement already satisfied: sniffio>=1.1 in c:\\users\\tuman\\appdata\\local\\programs\\python\\python312\\lib\\site-packages (from anyio->httpx<1,>=0.23.0->langsmith>=0.3.45->langchain-core<0.4.0,>=0.3.15->langchain-aimlapi) (1.3.1)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"[notice] A new release of pip is available: 25.0.1 -> 25.2\n",
"[notice] To update, run: python.exe -m pip install --upgrade pip\n"
]
}
],
"source": [
"%pip install --upgrade langchain-aimlapi"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-06T15:22:02.570792Z",
"start_time": "2025-08-06T15:21:32.377131Z"
}
},
"id": "77d4a44909effc3c",
"execution_count": 4
},
{
"cell_type": "markdown",
"source": [
"## Environment\n",
"\n",
"To use AI/ML API, you'll need an API key which you can generate at:\n",
"[https://aimlapi.com/app/](https://aimlapi.com/app/?utm_source=langchain&utm_medium=github&utm_campaign=integration)\n",
"\n",
"You can pass it via `aimlapi_api_key` parameter or set as environment variable `AIMLAPI_API_KEY`."
],
"metadata": {
"collapsed": false
},
"id": "c41eaf364c0b414f"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"if \"AIMLAPI_API_KEY\" not in os.environ:\n",
" os.environ[\"AIMLAPI_API_KEY\"] = getpass.getpass(\"Enter your AI/ML API key: \")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:15:37.147559Z",
"start_time": "2025-08-07T07:15:30.919160Z"
}
},
"id": "421cd40d4e54de62",
"execution_count": 3
},
{
"cell_type": "markdown",
"source": [
"## Example: Chat Model"
],
"metadata": {
"collapsed": false
},
"id": "d9cbe98904f4c5e4"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The city that never sleeps! New York City is a treasure trove of excitement, entertainment, and adventure. Here are some fun things to do in NYC:\n",
"\n",
"**Iconic Attractions:**\n",
"\n",
"1. **Statue of Liberty and Ellis Island**: Take a ferry to Liberty Island to see the iconic statue up close and visit the Ellis Island Immigration Museum.\n",
"2. **Central Park**: A tranquil oasis in the middle of Manhattan, perfect for a stroll, picnic, or bike ride.\n",
"3. **Empire State Building**: For a panoramic view of the city, head to the observation deck of this iconic skyscraper.\n",
"4. **The Metropolitan Museum of Art**: One of the world's largest and most famous museums, with a collection that spans over 5,000 years of human history.\n",
"\n",
"**Neighborhood Explorations:**\n",
"\n",
"1. **SoHo**: Known for its trendy boutiques, art galleries, and cast-iron buildings.\n",
"2. **Greenwich Village**: A charming neighborhood with a rich history, known for its bohemian vibe, jazz clubs, and historic brownstones.\n",
"3. **Chinatown and Little Italy**: Experience the vibrant cultures of these two iconic neighborhoods, with delicious food, street festivals, and unique shops.\n",
"4. **Williamsburg, Brooklyn**: A hip neighborhood with a thriving arts scene, trendy bars, and some of the best restaurants in the city.\n",
"\n",
"**Food and Drink:**\n",
"\n",
"1. **Try a classic NYC slice of pizza**: Visit Lombardi's, Joe's Pizza, or Patsy's Pizzeria for a taste of the city's famous pizza.\n",
"2. **Bagels with lox and cream cheese**: A classic NYC breakfast at a Jewish deli like Russ & Daughters Cafe or Ess-a-Bagel.\n",
"3. **Food markets**: Visit Smorgasburg in Brooklyn or Chelsea Market for a variety of artisanal foods and drinks.\n",
"4. **Rooftop bars**: Enjoy a drink with a view at 230 Fifth, the Top of the Strand, or the Roof at The Viceroy Central Park.\n",
"**Performing Arts:**\n",
"\n",
"1. **Broadway shows**: Catch a musical or play on the Great White Way, like Hamilton, The Lion King, or Wicked.\n",
"2. **Jazz clubs**: Visit Blue Note Jazz Club, the Village Vanguard, or the Jazz Standard for live music performances.\n",
"3. **Lincoln Center**: Home to the New York City Ballet, the Metropolitan Opera, and the Juilliard School.\n",
"4. **"
]
}
],
"source": [
"from langchain_aimlapi import ChatAimlapi\n",
"\n",
"chat = ChatAimlapi(\n",
" model=\"meta-llama/Llama-3-70b-chat-hf\",\n",
")\n",
"\n",
"# Stream response\n",
"for chunk in chat.stream(\"Tell me fun things to do in NYC\"):\n",
" print(chunk.content, end=\"\", flush=True)\n",
"\n",
"# Or use invoke()\n",
"# response = chat.invoke(\"Tell me fun things to do in NYC\")\n",
"# print(response)"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:15:59.612289Z",
"start_time": "2025-08-07T07:15:47.864231Z"
}
},
"id": "3f73a8e113a58e9b",
"execution_count": 4
},
{
"cell_type": "markdown",
"source": [
"## Example: Text Completion Model"
],
"metadata": {
"collapsed": false
},
"id": "7aca59af5cadce80"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" # Funkcja ponownie zwraca nową listę, bez zmienienia listy przekazanej jako argument w funkcji\n",
" my_list = [16, 12, 16, 3, 2, 6]\n",
" new_list = my_list[:]\n",
" for x in range(len(new_list)):\n",
" for y in range(len(new_list) - 1):\n",
" if new_list[y] > new_list[y + 1]:\n",
" new_list[y], new_list[y + 1] = new_list[y + 1], new_list[y]\n",
" return new_list, my_list\n",
"\n",
"\n",
"def bubble_sort_lib3(list): # Sortowanie z wykorzystaniem zewnętrznej biblioteki poza pętlą\n",
" from itertools import permutations\n",
" y = len(list)\n",
" perms = []\n",
" for a in range(0, y + 1):\n",
" for subset in permutations(list, a):\n",
" \n"
]
}
],
"source": [
"from langchain_aimlapi import AimlapiLLM\n",
"\n",
"llm = AimlapiLLM(\n",
" model=\"gpt-3.5-turbo-instruct\",\n",
")\n",
"\n",
"print(llm.invoke(\"def bubble_sort(): \"))"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:16:22.595703Z",
"start_time": "2025-08-07T07:16:19.410881Z"
}
},
"id": "2af3be417769efc3",
"execution_count": 6
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -36,7 +36,7 @@ For end-to-end usage check out
## Additional Resources
- [LangChain Docling integration GitHub](https://github.com/DS4SD/docling-langchain)
- [LangChain Docling integration GitHub](https://github.com/docling-project/docling-langchain)
- [LangChain Docling integration PyPI package](https://pypi.org/project/langchain-docling/)
- [Docling GitHub](https://github.com/DS4SD/docling)
- [Docling docs](https://ds4sd.github.io/docling/)
- [Docling GitHub](https://github.com/docling-project/docling)
- [Docling docs](https://docling-project.github.io/docling/)

View File

@@ -10,7 +10,7 @@
Install the python SDK:
```bash
pip install firecrawl-py==0.0.20
pip install firecrawl-py
```
## Document loader

View File

@@ -0,0 +1,129 @@
# Bigtable
Bigtable is a scalable, fully managed key-value and wide-column store ideal for fast access to structured, semi-structured, or unstructured data. This page provides an overview of Bigtable's LangChain integrations.
**Client Library Documentation:** [cloud.google.com/python/docs/reference/langchain-google-bigtable/latest](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest)
**Product Documentation:** [cloud.google.com/bigtable](https://cloud.google.com/bigtable)
## Quick Start
To use this library, you first need to:
1. Select or create a Cloud Platform project.
2. Enable billing for your project.
3. Enable the Google Cloud Bigtable API.
4. Set up Authentication.
## Installation
The main package for this integration is `langchain-google-bigtable`.
```bash
pip install -U langchain-google-bigtable
```
## Integrations
The `langchain-google-bigtable` package provides the following integrations:
### Vector Store
With `BigtableVectorStore`, you can store documents and their vector embeddings to find the most similar or relevant information in your database.
* **Full `VectorStore` Implementation:** Supports all methods from the LangChain `VectorStore` abstract class.
* **Async/Sync Support:** All methods are available in both asynchronous and synchronous versions.
* **Metadata Filtering:** Powerful filtering on metadata fields, including logical AND/OR combinations.
* **Multiple Distance Strategies:** Supports both Cosine and Euclidean distance for similarity search.
* **Customizable Storage:** Full control over how content, embeddings, and metadata are stored in Bigtable columns.
```python
from langchain_google_bigtable import BigtableEngine, BigtableVectorStore

# Your embedding service and other configurations
# embedding_service = ...

# The engine manages the connection to Bigtable
engine = await BigtableEngine.async_initialize(project_id="your-project-id")

vector_store = await BigtableVectorStore.create(
    engine=engine,
    instance_id="your-instance-id",
    table_id="your-table-id",
    embedding_service=embedding_service,
    collection="your_collection_name",
)

# Add documents, then run a similarity search
await vector_store.aadd_documents([your_documents])
results = await vector_store.asimilarity_search("your query")
```
Learn more in the [Vector Store how-to guide](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/vector_store.ipynb).
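
The two distance strategies listed above can rank the same vectors differently. The following is a minimal, library-independent sketch of what each one measures; the helper names `cosine_distance` and `euclidean_distance` are illustrative assumptions and are not part of the `langchain-google-bigtable` API.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy embeddings, chosen only to illustrate the difference
query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction as the query, twice the length
orthogonal = [0.0, 1.0]      # unrelated direction, same length as the query

print(cosine_distance(query, same_direction), euclidean_distance(query, same_direction))
print(cosine_distance(query, orthogonal), euclidean_distance(query, orthogonal))
```

Cosine distance ignores vector length, so it tends to suit embeddings that are not normalized, while Euclidean distance also penalizes differences in magnitude.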
### Key-value Store
Use `BigtableByteStore` as a persistent, scalable key-value store for caching, session management, or other storage needs. It supports both synchronous and asynchronous operations.
```python
from langchain_google_bigtable import BigtableByteStore
# Initialize the store
store = await BigtableByteStore.create(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-table-id",
)
# Set and get values
await store.amset([("key1", b"value1")])
retrieved = await store.amget(["key1"])
```
Learn more in the [Key-value Store how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/key-value-store).
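
Because the store holds raw bytes, structured values have to be serialized before they are written and decoded after they are read. Here is a minimal sketch of one way to do that with JSON; the helpers `encode_value` and `decode_value` are hypothetical and not part of the library, and JSON is just one possible encoding.

```python
import json


def encode_value(obj):
    """Serialize a JSON-compatible object to UTF-8 bytes for a byte store."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def decode_value(raw):
    """Decode UTF-8 JSON bytes read from a byte store back into Python objects."""
    return json.loads(raw.decode("utf-8"))


# Build (key, bytes) pairs in the same shape as the `amset` call above
pairs = [("session:123", encode_value({"user": "alice", "turns": 2}))]

# Round-trip the value as if it had been returned by `amget`
restored = decode_value(pairs[0][1])
print(restored)
```

The resulting `pairs` list could then be passed to `store.amset(pairs)`, assuming a store created as in the example above.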
### Document Loader
Use the `BigtableLoader` to load data from a Bigtable table and represent it as LangChain `Document` objects.
```python
from langchain_google_bigtable import BigtableLoader
loader = BigtableLoader(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-table-name",
)
docs = loader.load()
```
Learn more in the [Document Loader how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/document-loader).
### Chat Message History
Use `BigtableChatMessageHistory` to store conversation histories, enabling stateful chains and agents.
```python
from langchain_google_bigtable import BigtableChatMessageHistory
history = BigtableChatMessageHistory(
    project_id="your-project-id",
    instance_id="your-instance-id",
    table_id="your-message-store",
    session_id="user-session-123",
)
history.add_user_message("Hello!")
history.add_ai_message("Hi there!")
```
Learn more in the [Chat Message History how-to guide](https://cloud.google.com/python/docs/reference/langchain-google-bigtable/latest/chat-message-history).
## Contributions
Contributions to this library are welcome. Please see the CONTRIBUTING guide in the [package repo](https://github.com/googleapis/langchain-google-bigtable-python/) for more details.
## License
This project is licensed under the Apache 2.0 License - see the LICENSE file in the [package repo](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/LICENSE) for details.
## Disclaimer
This is not an officially supported Google product.

View File

@@ -1,4 +1,4 @@
# Oracle Cloud Infrastructure (OCI)
The `LangChain` integrations related to [Oracle Cloud Infrastructure](https://www.oracle.com/artificial-intelligence/).
@@ -11,16 +11,14 @@ The `LangChain` integrations related to [Oracle Cloud Infrastructure](https://ww
To use, you should have the latest `oci` python SDK and the langchain_community package installed.
```bash
pip install -U langchain_oci
python -m pip install -U langchain-oci
```
See [chat](/docs/integrations/llms/oci_generative_ai), [complete](/docs/integrations/chat/oci_generative_ai), and [embedding](/docs/integrations/text_embedding/oci_generative_ai) usage examples.
See [chat](/docs/integrations/chat/oci_generative_ai), [complete](/docs/integrations/llms/oci_generative_ai), and [embedding](/docs/integrations/text_embedding/oci_generative_ai) usage examples.
```python
from langchain_oci.chat_models import ChatOCIGenAI
from langchain_oci.llms import OCIGenAI
from langchain_oci.embeddings import OCIGenAIEmbeddings
```

View File

@@ -0,0 +1,72 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ScraperAPI\n",
"\n",
"[ScraperAPI](https://www.scraperapi.com/) enables data collection from any public website with its web scraping API, without worrying about proxies, browsers, or CAPTCHA handling. [langchain-scraperapi](https://github.com/scraperapi/langchain-scraperapi) wraps this service, making it easy for AI agents to browse the web and scrape data from it.\n",
"\n",
"## Installation and Setup\n",
"\n",
"- Install the Python package with `pip install langchain-scraperapi`.\n",
"- Obtain an API key from [ScraperAPI](https://www.scraperapi.com/) and set the environment variable `SCRAPERAPI_API_KEY`.\n",
"\n",
"### Tools\n",
"\n",
"The package offers three tools: one scrapes any website, one returns structured Google search results, and one returns structured Amazon search results.\n",
"\n",
"To import them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install langchain_scraperapi\n",
"\n",
"from langchain_scraperapi.tools import (\n",
" ScraperAPIAmazonSearchTool,\n",
" ScraperAPIGoogleSearchTool,\n",
" ScraperAPITool,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tool = ScraperAPITool()\n",
"\n",
"result = tool.invoke({\"url\": \"https://example.com\", \"output_format\": \"markdown\"})\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a more detailed walkthrough of how to use these tools, visit the [official repository](https://github.com/scraperapi/langchain-scraperapi)."
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,140 @@
---
title: Superlinked
description: LangChain integration package for the Superlinked retrieval stack
---
import Link from '@docusaurus/Link';
### Overview
Superlinked enables context-aware retrieval using multiple space types (text similarity, categorical, numerical, recency, and more). The `langchain-superlinked` package provides a LangChain-native `SuperlinkedRetriever` that plugs directly into your RAG chains.
### Links
- <Link to="https://github.com/superlinked/langchain-superlinked">Integration repository</Link>
- <Link to="https://links.superlinked.com/langchain_repo_sl">Superlinked core repository</Link>
- <Link to="https://links.superlinked.com/langchain_article">Article: Build RAG using LangChain & Superlinked</Link>
### Install
```bash
pip install -U langchain-superlinked superlinked
```
### Quickstart
```python
import superlinked.framework as sl
from langchain_superlinked import SuperlinkedRetriever

# 1) Define schema
class DocumentSchema(sl.Schema):
    id: sl.IdField
    content: sl.String

doc_schema = DocumentSchema()

# 2) Define space and index
text_space = sl.TextSimilaritySpace(
    text=doc_schema.content, model="sentence-transformers/all-MiniLM-L6-v2"
)
doc_index = sl.Index([text_space])

# 3) Define query
query = (
    sl.Query(doc_index)
    .find(doc_schema)
    .similar(text_space.text, sl.Param("query_text"))
    .select([doc_schema.content])
    .limit(sl.Param("limit"))
)

# 4) Minimal app setup
source = sl.InMemorySource(schema=doc_schema)
executor = sl.InMemoryExecutor(sources=[source], indices=[doc_index])
app = executor.run()
source.put([
    {"id": "1", "content": "Machine learning algorithms process data efficiently."},
    {"id": "2", "content": "Natural language processing understands human language."},
])

# 5) LangChain retriever
retriever = SuperlinkedRetriever(
    sl_client=app, sl_query=query, page_content_field="content"
)

# Search
docs = retriever.invoke("artificial intelligence", limit=2)
for d in docs:
    print(d.page_content)
```
### What the retriever expects (App and Query)
The retriever takes two core inputs:
- `sl_client`: a Superlinked App created by running an executor (e.g., `InMemoryExecutor(...).run()`)
- `sl_query`: a `QueryDescriptor` returned by chaining `sl.Query(...).find(...).similar(...).select(...).limit(...)`
Minimal setup:
```python
import superlinked.framework as sl
from langchain_superlinked import SuperlinkedRetriever
class Doc(sl.Schema):
    id: sl.IdField
    content: sl.String

doc = Doc()
space = sl.TextSimilaritySpace(text=doc.content, model="sentence-transformers/all-MiniLM-L6-v2")
index = sl.Index([space])
query = (
    sl.Query(index)
    .find(doc)
    .similar(space.text, sl.Param("query_text"))
    .select([doc.content])
    .limit(sl.Param("limit"))
)
source = sl.InMemorySource(schema=doc)
app = sl.InMemoryExecutor(sources=[source], indices=[index]).run()
retriever = SuperlinkedRetriever(sl_client=app, sl_query=query, page_content_field="content")
```
Note: For a persistent vector DB, pass `vector_database=...` to the executor (e.g., Qdrant) before `.run()`.
### Use within a chain
```python
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    """
Answer based on context:\n\nContext: {context}\nQuestion: {question}
"""
)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI()
)
answer = chain.invoke("How does machine learning work?")
```
### Resources
- <Link to="https://pypi.org/project/langchain-superlinked/">PyPI: langchain-superlinked</Link>
- <Link to="https://pypi.org/project/superlinked/">PyPI: superlinked</Link>
- <Link to="https://github.com/superlinked/langchain-superlinked">Source repository</Link>
- <Link to="https://links.superlinked.com/langchain_repo_sl">Superlinked core repository</Link>
- <Link to="https://links.superlinked.com/langchain_article">Build RAG using LangChain & Superlinked (article)</Link>

View File

@@ -0,0 +1,170 @@
# Timbr
[Timbr](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/) integrates natural language inputs with Timbr's ontology-driven semantic layer. Leveraging Timbr's robust ontology capabilities, the SDK integrates with Timbr data models and leverages semantic relationships and annotations, enabling users to query data using business-friendly language.
Timbr provides a pre-built SQL agent, `TimbrSqlAgent`, which can be used for end-to-end purposes from user prompt, through semantic SQL query generation and validation, to query execution and result analysis.
For customizations and partial usage, you can use LangChain chains and LangGraph nodes with our 5 main tools:
- `IdentifyTimbrConceptChain` & `IdentifyConceptNode` - Identify relevant concepts from user prompts
- `GenerateTimbrSqlChain` & `GenerateTimbrSqlNode` - Generate SQL queries from natural language prompts
- `ValidateTimbrSqlChain` & `ValidateSemanticSqlNode` - Validate SQL queries against Timbr knowledge graph schemas
- `ExecuteTimbrQueryChain` & `ExecuteSemanticQueryNode` - Execute (semantic and regular) SQL queries against Timbr knowledge graph databases
- `GenerateAnswerChain` & `GenerateResponseNode` - Generate human-readable answers based on a given prompt and data rows
Additionally, `langchain-timbr` provides `TimbrLlmConnector` for manual integration with Timbr's semantic layer using LLM providers. This connector includes the following methods:
- `get_ontologies` - List Timbr's semantic knowledge graphs
- `get_concepts` - List selected knowledge graph ontology representation concepts
- `get_views` - List selected knowledge graph ontology representation views
- `determine_concept` - Identify relevant concepts from user prompts
- `generate_sql` - Generate SQL queries from natural language prompts
- `validate_sql` - Validate SQL queries against Timbr knowledge graph schemas
- `run_timbr_query` - Execute (semantic and regular) SQL queries against Timbr knowledge graph databases
- `run_llm_query` - Execute agent pipeline to determine concept, generate SQL, and run query from natural language prompt
## Quickstart
### Installation
#### Install the package
```bash
pip install langchain-timbr
```
#### Optional: Install with selected LLM provider
Choose one of: openai, anthropic, google, azure_openai, snowflake, databricks (or 'all')
```bash
pip install 'langchain-timbr[<your selected providers, separated by comma without spaces>]'
```
## Configuration
Starting from `langchain-timbr` v2.0.0, all chains, agents, and nodes support optional environment-based configuration. You can set the following environment variables to provide default values and simplify setup for the provided tools:
### Timbr Connection Parameters
- **TIMBR_URL**: Default Timbr server URL
- **TIMBR_TOKEN**: Default Timbr authentication token
- **TIMBR_ONTOLOGY**: Default ontology/knowledge graph name
When these environment variables are set, the corresponding parameters (`url`, `token`, `ontology`) become optional in all chain and agent constructors and will use the environment values as defaults.
### LLM Configuration Parameters
- **LLM_TYPE**: The type of LLM provider (one of langchain_timbr LlmTypes enum: 'openai-chat', 'anthropic-chat', 'chat-google-generative-ai', 'azure-openai-chat', 'snowflake-cortex', 'chat-databricks')
- **LLM_API_KEY**: The API key for authenticating with the LLM provider
- **LLM_MODEL**: The model name or deployment to use
- **LLM_TEMPERATURE**: Temperature setting for the LLM
- **LLM_ADDITIONAL_PARAMS**: Additional parameters as dict or JSON string
When LLM environment variables are set, the `llm` parameter becomes optional and will use the `LlmWrapper` with environment configuration.
Example environment setup:
```bash
# Timbr connection
export TIMBR_URL="https://your-timbr-app.com/"
export TIMBR_TOKEN="tk_XXXXXXXXXXXXXXXXXXXXXXXX"
export TIMBR_ONTOLOGY="timbr_knowledge_graph"
# LLM configuration
export LLM_TYPE="openai-chat"
export LLM_API_KEY="your-openai-api-key"
export LLM_MODEL="gpt-4o"
export LLM_TEMPERATURE="0.1"
export LLM_ADDITIONAL_PARAMS='{"max_tokens": 1000}'
```
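The fallback behaviour described above (an explicit constructor argument wins, otherwise the environment variable supplies the default) can be sketched in plain Python. This is an illustration of the pattern only, not the package's internal code: `resolve_setting` and `parse_additional_params` are hypothetical names, and the precedence order is an assumption based on the description of environment values as defaults.

```python
import json
import os
from typing import Mapping, Optional, Union


def resolve_setting(
    explicit: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit value if given, else fall back to the environment."""
    if explicit is not None:
        return explicit
    value = env.get(env_name)
    if value is None:
        raise ValueError(f"Pass the parameter explicitly or set {env_name}")
    return value


def parse_additional_params(raw: Union[str, dict, None]) -> dict:
    """Accept either a dict or a JSON string, as LLM_ADDITIONAL_PARAMS allows."""
    if raw is None or raw == "":
        return {}
    if isinstance(raw, dict):
        return dict(raw)
    return json.loads(raw)


# A stand-in environment so the sketch does not depend on real variables.
fake_env = {
    "TIMBR_URL": "https://your-timbr-app.com/",
    "LLM_ADDITIONAL_PARAMS": '{"max_tokens": 1000}',
}

url = resolve_setting(None, "TIMBR_URL", fake_env)
override = resolve_setting("https://other.example/", "TIMBR_URL", fake_env)
params = parse_additional_params(fake_env["LLM_ADDITIONAL_PARAMS"])
print(url, override, params)
```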
## Usage
Import and utilize your intended chain/node, or use TimbrLlmConnector to manually integrate with Timbr's semantic layer. For a complete agent working example, see the [Timbr tool page](/docs/integrations/tools/timbr).
### ExecuteTimbrQueryChain example
```python
from langchain_openai import ChatOpenAI
from langchain_timbr import ExecuteTimbrQueryChain
# You can use the standard LangChain ChatOpenAI/ChatAnthropic models
# or any other LLM model based on langchain_core.language_models.chat.BaseChatModel
llm = ChatOpenAI(model="gpt-4o", temperature=0, openai_api_key='open-ai-api-key')
# Optional alternative: Use Timbr's LlmWrapper, which provides generic connections to different LLM providers
from langchain_timbr import LlmWrapper, LlmTypes
llm = LlmWrapper(llm_type=LlmTypes.OpenAI, api_key="open-ai-api-key", model="gpt-4o")
execute_timbr_query_chain = ExecuteTimbrQueryChain(
    llm=llm,
    url="https://your-timbr-app.com/",
    token="tk_XXXXXXXXXXXXXXXXXXXXXXXX",
    ontology="timbr_knowledge_graph",
    schema="dtimbr",  # optional
    concept="Sales",  # optional
    concepts_list=["Sales", "Orders"],  # optional
    views_list=["sales_view"],  # optional
    note="We only need sums",  # optional
    retries=3,  # optional
    should_validate_sql=True,  # optional
)
result = execute_timbr_query_chain.invoke({"prompt": "What are the total sales for last month?"})
rows = result["rows"]
sql = result["sql"]
concept = result["concept"]
schema = result["schema"]
error = result.get("error", None)
usage_metadata = result.get("execute_timbr_usage_metadata", {})
determine_concept_usage = usage_metadata.get('determine_concept', {})
generate_sql_usage = usage_metadata.get('generate_sql', {})
# Each usage_metadata item contains:
# * 'approximate': Estimated token count calculated before invoking the LLM
# * 'input_tokens'/'output_tokens'/'total_tokens'/etc.: Actual token usage metrics returned by the LLM
```
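If you want a single figure for the whole run, the per-step entries can be summed. The sketch below assumes the usage metadata is a dict of per-step dicts carrying the keys named in the comments above; the numbers are made up for illustration and `sum_tokens` is a hypothetical helper.

```python
def sum_tokens(usage_metadata: dict, field: str = "total_tokens") -> int:
    """Add up one token counter across all pipeline steps, skipping missing keys."""
    return sum(step.get(field, 0) for step in usage_metadata.values())


# Illustrative values only; real numbers come from the chain result.
usage_metadata = {
    "determine_concept": {"approximate": 120, "input_tokens": 130, "output_tokens": 12, "total_tokens": 142},
    "generate_sql": {"approximate": 400, "input_tokens": 410, "output_tokens": 60, "total_tokens": 470},
}

total = sum_tokens(usage_metadata)
total_input = sum_tokens(usage_metadata, "input_tokens")
print(total, total_input)
```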
### Multiple chains using SequentialChain example
```python
from langchain.chains import SequentialChain
from langchain_timbr import ExecuteTimbrQueryChain, GenerateAnswerChain
from langchain_openai import ChatOpenAI
# You can use the standard LangChain ChatOpenAI/ChatAnthropic models
# or any other LLM model based on langchain_core.language_models.chat.BaseChatModel
llm = ChatOpenAI(model="gpt-4o", temperature=0, openai_api_key='open-ai-api-key')
# Optional alternative: Use Timbr's LlmWrapper, which provides generic connections to different LLM providers
from langchain_timbr import LlmWrapper, LlmTypes
llm = LlmWrapper(llm_type=LlmTypes.OpenAI, api_key="open-ai-api-key", model="gpt-4o")
execute_timbr_query_chain = ExecuteTimbrQueryChain(
    llm=llm,
    url='https://your-timbr-app.com/',
    token='tk_XXXXXXXXXXXXXXXXXXXXXXXX',
    ontology='timbr_knowledge_graph',
)
generate_answer_chain = GenerateAnswerChain(
    llm=llm,
    url='https://your-timbr-app.com/',
    token='tk_XXXXXXXXXXXXXXXXXXXXXXXX',
)
pipeline = SequentialChain(
    chains=[execute_timbr_query_chain, generate_answer_chain],
    input_variables=["prompt"],
    output_variables=["answer", "sql"],
)
result = pipeline.invoke({"prompt": "What are the total sales for last month?"})
```
## Additional Resources
- [PyPI](https://pypi.org/project/langchain-timbr)
- [GitHub](https://github.com/WPSemantix/langchain-timbr)
- [LangChain Timbr Docs](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/)
- [LangGraph Timbr Docs](https://docs.timbr.ai/doc/docs/integration/langgraph-sdk)

View File

@@ -0,0 +1,68 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "_MFfVhVCa15x"
},
"source": [
"# ZenRows\n",
"\n",
"[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. ZenRows specializes in scraping modern websites, bypassing anti-bot systems, extracting structured data from any website, rendering JavaScript-heavy content, accessing geo-restricted websites, and more.\n",
"\n",
"[langchain-zenrows](https://pypi.org/project/langchain-zenrows/) provides tools that allow LLMs to access web data using ZenRows' powerful scraping infrastructure.\n",
"\n",
"## Installation and Setup\n",
"\n",
"```bash\n",
"pip install langchain-zenrows\n",
"```\n",
"\n",
"You'll need to set up your ZenRows API key:\n",
"\n",
"```python\n",
"import os\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"your-api-key\"\n",
"```\n",
"\n",
"Or you can pass it directly when initializing tools:\n",
"\n",
"```python\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")\n",
"```\n",
"\n",
"## Tools\n",
"\n",
"### ZenRowsUniversalScraper\n",
"\n",
"The ZenRows integration provides comprehensive web scraping features:\n",
"\n",
"- **JavaScript Rendering**: Scrape modern SPAs and dynamic content\n",
"- **Anti-Bot Bypass**: Overcome sophisticated bot detection systems\n",
"- **Geo-Targeting**: Access region-specific content from 190+ countries\n",
"- **Multiple Output Formats**: HTML, Markdown, Plaintext, PDF, Screenshots\n",
"- **CSS Extraction**: Target specific data with CSS selectors\n",
"- **Structured Data Extraction**: Automatically extract emails, phone numbers, links, and more\n",
"- **Session Management**: Maintain consistent sessions across requests\n",
"- **Premium Proxies**: Residential IPs for maximum success rates\n",
"\n",
"See more in the [ZenRows tool documentation](/docs/integrations/tools/zenrows_universal_scraper)."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -0,0 +1,603 @@
<p align="center" width="100%">
<h1 align="center">LangChain ZeusDB Integration</h1>
</p>
A high-performance LangChain integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LangChain applications.
## Features
🚀 **High Performance**
- Rust-powered vector database backend
- Advanced HNSW indexing for sub-millisecond search
- Product Quantization for 4x-256x memory compression
- Concurrent search with automatic parallelization
🎯 **LangChain Native**
- Full VectorStore API compliance
- Async/await support for all operations
- Seamless integration with LangChain retrievers
- Maximal Marginal Relevance (MMR) search
🏢 **Enterprise Ready**
- Structured logging with performance monitoring
- Index persistence with complete state preservation
- Advanced metadata filtering
- Graceful error handling and fallback mechanisms
## Quick Start
### Installation
```bash
pip install -qU langchain-zeusdb
```
### Getting Started
This example uses *OpenAIEmbeddings*, which requires an OpenAI API key. [Get your OpenAI API key here](https://platform.openai.com/api-keys).
If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.).
```bash
pip install langchain-openai
```
```python
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
```
### Basic Usage
```python
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from zeusdb import VectorDatabase
# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create ZeusDB index
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine"
)
# Create vector store
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
# Add documents
from langchain_core.documents import Document
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
vector_store.add_documents(docs)
# Search
results = vector_store.similarity_search("fast database", k=2)
print(f"Found the following {len(results)} results:")
print(results)
```
**Expected results:**
```
Found the following 2 results:
[Document(id='ea2b4f13-b0b7-4cef-bb91-0fc4f4c41295', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), Document(id='33dc1e87-a18a-4827-a0df-6ee47eabc7b2', metadata={'source': 'docs'}, page_content='LangChain is powerful')]
```
<br />
### Factory Methods
For convenience, you can create and populate a vector store in a single step:
**Example 1: Create from texts (creates the index and adds the texts in one step)**
```python
vector_store_texts = ZeusDBVectorStore.from_texts(
    texts=["Hello world", "Goodbye world"],
    embedding=embeddings,
    metadatas=[{"source": "text1"}, {"source": "text2"}]
)
print("texts store count:", vector_store_texts.get_vector_count()) # -> 2
print("texts store peek:", vector_store_texts.zeusdb_index.list(2)) # [('id1', {...}), ('id2', {...})]
# Search the texts-based store
results = vector_store_texts.similarity_search("Hello", k=1)
print(f"Found in texts store: {results[0].page_content}") # -> "Hello world"
```
**Expected results:**
```
texts store count: 2
texts store peek: [('e9c39b44-b610-4e00-91f3-bf652e9989ac', {'source': 'text1', 'text': 'Hello world'}), ('d33f210c-ed53-4006-a64a-a9eee397fec9', {'source': 'text2', 'text': 'Goodbye world'})]
Found in texts store: Hello world
```
<br />
**Example 2: Create from documents (creates the index and adds the documents in one step)**
```python
new_docs = [
    Document(page_content="Python is great", metadata={"source": "python"}),
    Document(page_content="JavaScript is flexible", metadata={"source": "js"}),
]
vector_store_docs = ZeusDBVectorStore.from_documents(
    documents=new_docs,
    embedding=embeddings
)
print("docs store count:", vector_store_docs.get_vector_count()) # -> 2
print("docs store peek:", vector_store_docs.zeusdb_index.list(2)) # [('id3', {...}), ('id4', {...})]
# Search the documents-based store
results = vector_store_docs.similarity_search("Python", k=1)
print(f"Found in docs store: {results[0].page_content}") # -> "Python is great"
```
**Expected results:**
```
docs store count: 2
docs store peek: [('aab2d1c1-7e02-4817-8dd8-6fb03570bb6f', {'text': 'Python is great', 'source': 'python'}), ('9a8a82cb-0e70-456c-9db2-556e464de14e', {'text': 'JavaScript is flexible', 'source': 'js'})]
Found in docs store: Python is great
```
<br />
## Advanced Features
ZeusDB's enterprise-grade capabilities are fully integrated into the LangChain ecosystem, providing quantization, persistence, advanced search features and many other enterprise capabilities.
### Memory-Efficient Setup with Quantization
For large datasets, use Product Quantization to reduce memory usage:
```python
# Create quantized index for memory efficiency
quantization_config = {
    'type': 'pq',
    'subvectors': 8,
    'bits': 8,
    'training_size': 10000
}
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine",
    quantization_config=quantization_config
)
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
```
Please refer to our [documentation](https://docs.zeusdb.com/en/latest/vector_database/product_quantization.html) for helpful configuration guidelines and recommendations for setting up quantization.
<br />
### Persistence
ZeusDB persistence lets you save a fully populated index to disk and load it later with complete state restoration. This includes vectors, metadata, HNSW graph, and (if enabled) Product Quantization models.
What gets saved:
- Vectors & IDs
- Metadata
- HNSW graph structure
- Quantization config, centroids, and training state (if PQ is enabled)
**How to Save your vector store**
```python
# Save index
vector_store.save_index("my_index.zdb")
```
**How to Load your vector store**
```python
# Load index
loaded_store = ZeusDBVectorStore.load_index(
    path="my_index.zdb",
    embedding=embeddings
)
# Verify after load
print("vector count:", loaded_store.get_vector_count())
print("index info:", loaded_store.info())
print("store peek:", loaded_store.zeusdb_index.list(2))
```
**Notes**
- The path is a directory, not a single file. Ensure the target is writable.
- Saved indexes are cross-platform and include format/version info for compatibility checks.
- If you used PQ, both the compression model and state are preserved—no need to retrain after loading.
- You can continue to use all vector store APIs (similarity_search, retrievers, etc.) on the loaded_store.
For further details (including file structure, and further comprehensive examples), see the [documentation](https://docs.zeusdb.com/en/latest/vector_database/persistence.html).
<br />
### Advanced Search Options
Use these to control scoring, diversity, metadata filtering, and retriever integration for your searches.
#### Similarity search with scores
Returns `(Document, raw_distance)` pairs from ZeusDB — **lower distance = more similar**.
If you prefer normalized relevance in `[0, 1]`, use `similarity_search_with_relevance_scores`.
```python
# Similarity search with scores
results_with_scores = vector_store.similarity_search_with_score(
    query="machine learning",
    k=5
)
print(results_with_scores)
```
**Expected results:**
```
[
(Document(id='ac0eaf5b-9f02-4ce2-8957-c369a7262c61', metadata={'source': 'docs'}, page_content='LangChain is powerful'), 0.8218843340873718),
(Document(id='faae3adf-7cf3-463c-b282-3790b096fa23', metadata={'source': 'docs'}, page_content='ZeusDB is fast'), 0.9140053391456604)
]
```
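The difference between a raw distance and a normalized relevance score can be sketched as follows. This assumes a cosine-style distance and the common `1 - distance` convention, clamped to `[0, 1]`; the mapping that `similarity_search_with_relevance_scores` actually applies to ZeusDB distances may differ. `distance_to_relevance` is a hypothetical helper, not part of the package.

```python
def distance_to_relevance(distance: float) -> float:
    """Map a distance (lower = closer) to a relevance score in [0, 1] (higher = closer)."""
    return max(0.0, min(1.0, 1.0 - distance))


# Illustrative (text, distance) pairs; not real ZeusDB output.
scored = [("doc B", 0.75), ("doc A", 0.25), ("doc C", 1.5)]

# Sort by ascending distance, so the most similar document comes first.
ranked = sorted(scored, key=lambda pair: pair[1])
relevance = [(text, distance_to_relevance(d)) for text, d in ranked]
print(relevance)
```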
#### MMR search for diversity
MMR (Maximal Marginal Relevance) balances two forces: relevance to the query and diversity among selected results, reducing near-duplicate answers. Control the trade-off with `lambda_mult` (1.0 = all relevance, 0.0 = all diversity).
```python
# MMR search for diversity
mmr_results = vector_store.max_marginal_relevance_search(
    query="AI applications",
    k=5,
    fetch_k=20,
    lambda_mult=0.7  # Balance relevance vs diversity
)
print(mmr_results)
```
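To make the `lambda_mult` trade-off concrete, here is a minimal pure-Python sketch of the greedy MMR selection rule over toy 2-D vectors. It illustrates the general technique only and is not ZeusDB's implementation; `cosine` and `mmr_select` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.0],  # identical to the query
    [0.9, 0.1],  # near-duplicate of the first candidate
    [0.6, 0.8],  # less relevant, but different
]

print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # pure relevance
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # favors diversity
```

With `lambda_mult=1.0` the near-duplicate is picked second because only relevance counts; lowering it to `0.3` makes the redundancy penalty large enough that the third, more distinct candidate is chosen instead.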
#### Search with metadata filtering
Filter results using the metadata you stored when adding documents.
```python
# Search with metadata filtering
results = vector_store.similarity_search(
    query="database performance",
    k=3,
    filter={"source": "documentation"}
)
```
For supported metadata query types and operators, please refer to the [documentation](https://docs.zeusdb.com/en/latest/vector_database/metadata_filtering.html).
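As a rough mental model, a filter such as `{"source": "documentation"}` keeps only documents whose metadata contains every listed key with an equal value. The sketch below shows that equality-only case in plain Python; it is an assumption made for illustration, `matches` is a hypothetical helper, and ZeusDB's real filter language supports more operators than this.

```python
def matches(metadata: dict, flt: dict) -> bool:
    """True if every key in the filter is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


# Illustrative documents represented as plain dicts.
documents = [
    {"text": "Tuning HNSW parameters", "metadata": {"source": "documentation", "lang": "en"}},
    {"text": "Release announcement", "metadata": {"source": "blog", "lang": "en"}},
    {"text": "Persistence guide", "metadata": {"source": "documentation", "lang": "de"}},
]

hits = [d["text"] for d in documents if matches(d["metadata"], {"source": "documentation"})]
both = [d["text"] for d in documents if matches(d["metadata"], {"source": "documentation", "lang": "en"})]
print(hits, both)
```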
#### As a Retriever
Turning the vector store into a retriever gives you a standard LangChain interface that chains (e.g., RetrievalQA) can call to fetch context. Under the hood it uses your chosen search type (similarity or mmr) and search_kwargs.
```python
# Convert to retriever for use in chains
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.8}
)
# Use with LangChain Expression Language (LCEL) - requires only langchain-core
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI()
# Create a chain using LCEL
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Use the chain
answer = chain.invoke("What is ZeusDB?")
print(answer)
```
**Expected results:**
```
ZeusDB is a fast database management system.
```
<br />
## Async Support
ZeusDB supports asynchronous operations for non-blocking, concurrent vector operations.
**When to use async:** web servers (FastAPI/Starlette), agents/pipelines doing parallel searches, or notebooks where you want non-blocking/concurrent retrieval. If you're writing simple scripts, the sync methods are fine.
The **asynchronous operations** are the async/await counterparts of the regular synchronous methods, named with an `a` prefix. Here's what each one does:
1. `await vector_store.aadd_documents(documents)` - Asynchronously adds documents to the vector store (async version of `add_documents()`)
2. `await vector_store.asimilarity_search("query", k=5)` - Asynchronously performs similarity search (async version of `similarity_search()`)
3. `await vector_store.adelete(ids=["doc1", "doc2"])` - Asynchronously deletes documents by their IDs (async version of `delete()`)
The async versions are useful when:
- You're building async applications (using `asyncio`, FastAPI, etc.)
- You want non-blocking operations that can run concurrently
- You're handling multiple requests simultaneously
- You want better performance in I/O-bound applications
For example, instead of blocking while adding documents:
```python
# Synchronous (blocking)
vector_store.add_documents(docs) # Blocks until complete
# Asynchronous (non-blocking)
await vector_store.aadd_documents(docs) # Can do other work while this runs
```
All operations support async/await:
**Script version (`python my_script.py`):**
```python
import asyncio
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from zeusdb import VectorDatabase
# Setup
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
async def main():
    # Add documents asynchronously
    ids = await vector_store.aadd_documents(docs)
    print("Added IDs:", ids)
    # Run multiple searches concurrently
    results_fast, results_powerful = await asyncio.gather(
        vector_store.asimilarity_search("fast", k=2),
        vector_store.asimilarity_search("powerful", k=2),
    )
    print("Fast results:", [d.page_content for d in results_fast])
    print("Powerful results:", [d.page_content for d in results_powerful])
    # Delete documents asynchronously
    deleted = await vector_store.adelete(ids=ids[:1])
    print("Deleted first doc:", deleted)
if __name__ == "__main__":
    asyncio.run(main())
```
**Colab/Notebook/Jupyter version (top-level `await`):**
```python
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from zeusdb import VectorDatabase
import asyncio
# Setup
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vdb = VectorDatabase()
index = vdb.create(index_type="hnsw", dim=1536, space="cosine")
vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
# Add documents asynchronously
ids = await vector_store.aadd_documents(docs)
print("Added IDs:", ids)
# Run multiple searches concurrently
results_fast, results_powerful = await asyncio.gather(
    vector_store.asimilarity_search("fast", k=2),
    vector_store.asimilarity_search("powerful", k=2),
)
print("Fast results:", [d.page_content for d in results_fast])
print("Powerful results:", [d.page_content for d in results_powerful])
# Delete documents asynchronously
deleted = await vector_store.adelete(ids=ids[:1])
print("Deleted first doc:", deleted)
```
**Expected results:**
```
Added IDs: ['9c440918-715f-49ba-9b97-0d991d29e997', 'ad59c645-d3ba-4a4a-a016-49ed39514123']
Fast results: ['ZeusDB is fast', 'LangChain is powerful']
Powerful results: ['LangChain is powerful', 'ZeusDB is fast']
Deleted first doc: True
```
<br />
## Monitoring and Observability
### Performance Monitoring
```python
# Get index statistics
stats = vector_store.get_zeusdb_stats()
print(f"Index size: {stats.get('total_vectors', '0')} vectors")
print(f"Dimension: {stats.get('dimension')} | Space: {stats.get('space')} | Index type: {stats.get('index_type')}")
# Benchmark search performance
performance = vector_store.benchmark_search_performance(
query_count=100,
max_threads=4
)
print(f"Search QPS: {performance.get('parallel_qps', 0):.0f}")
# Check quantization status
if vector_store.is_quantized():
    progress = vector_store.get_training_progress()
    print(f"Quantization training: {progress:.1f}% complete")
else:
    print("Index is not quantized")
```
**Expected results:**
```
Index size: 2 vectors
Dimension: 1536 | Space: cosine | Index type: HNSW
Search QPS: 53807
Index is not quantized
```
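
To cross-check the benchmark figure against your own workload, you could time queries by hand. The sketch below is only an illustration: `measure_qps` and `fake_search` are hypothetical names introduced here, not part of the ZeusDB API. In practice you would pass something like `lambda q: vector_store.similarity_search(q, k=2)` as `search_fn`.

```python
import time
from typing import Callable, Sequence


def measure_qps(search_fn: Callable[[str], object], queries: Sequence[str]) -> float:
    """Run each query once, sequentially, and return queries per second."""
    if not queries:
        raise ValueError("queries must not be empty")
    start = time.perf_counter()
    for query in queries:
        search_fn(query)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval when the stand-in function is very fast.
    return len(queries) / max(elapsed, 1e-9)


# Stand-in search function so the sketch runs without a vector store.
def fake_search(query: str) -> list[str]:
    return [query.upper()]


qps = measure_qps(fake_search, ["fast", "powerful"] * 50)
print(f"Sequential QPS: {qps:.0f}")
```

A sequential measurement like this will usually come out lower than a multi-threaded `parallel_qps` figure, so treat the two numbers as complementary rather than directly comparable.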
### Enterprise Logging
ZeusDB includes enterprise-grade structured logging that works automatically with smart environment detection:
```python
import logging
# ZeusDB automatically detects your environment and applies appropriate logging:
# - Development: Human-readable logs, WARNING level
# - Production: JSON structured logs, ERROR level
# - Testing: Minimal output, CRITICAL level
# - Jupyter: Clean readable logs, INFO level
# Operations are automatically logged with performance metrics
vector_store.add_documents(docs)
# Logs: {"operation":"vector_addition","total_inserted":2,"duration_ms":45}
# Control logging with environment variables if needed
# ZEUSDB_LOG_LEVEL=debug ZEUSDB_LOG_FORMAT=json python your_app.py
```
To learn more about ZeusDB's enterprise logging capabilities, see the [logging documentation](https://docs.zeusdb.com/en/latest/vector_database/logging.html).
<br />
## Configuration Options
### Index Parameters
```python
vdb = VectorDatabase()
index = vdb.create(
index_type="hnsw", # Index algorithm
dim=1536, # Vector dimension
space="cosine", # Distance metric: cosine, l2, l1
m=16, # HNSW connectivity
ef_construction=200, # Build-time search width
expected_size=100000, # Expected number of vectors
quantization_config=None # Optional quantization
)
```
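
When choosing `dim` and `expected_size`, it can help to estimate the raw vector footprint first. The sketch below assumes 32-bit float components and no quantization, and it ignores HNSW graph links and metadata. Both the helper name `raw_vector_bytes` and the 4-byte assumption are illustrative and not taken from the ZeusDB documentation.

```python
def raw_vector_bytes(expected_size: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed to hold the raw vectors alone (assumes float32 by default)."""
    return expected_size * dim * bytes_per_component


# Values mirror the index parameters shown above.
total = raw_vector_bytes(expected_size=100_000, dim=1536)
print(f"{total / 2**20:.1f} MiB")  # 585.9 MiB
```

The real memory use will be higher once the graph structure and any stored metadata are included, so treat this as a lower bound.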
### Search Parameters
```python
results = vector_store.similarity_search(
query="search query",
k=5, # Number of results
ef_search=None, # Runtime search width (auto if None)
filter={"key": "value"} # Metadata filter
)
```
## Error Handling
The integration includes comprehensive error handling:
```python
try:
    results = vector_store.similarity_search("query")
    print(results)
except Exception as e:
    # Graceful degradation with logging
    print(f"Search failed: {e}")
    # Fallback logic here
```
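
One way to fill in the fallback logic is a small retry wrapper. The sketch below is a generic pattern, not a ZeusDB feature: `search_with_fallback` and `always_fails` are hypothetical names, and catching bare `Exception` is only for brevity.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def search_with_fallback(
    search: Callable[[], T],
    fallback: T,
    retries: int = 2,
    delay_s: float = 0.1,
) -> T:
    """Call `search`; retry on failure and return `fallback` if every attempt fails."""
    for attempt in range(1, retries + 2):  # one initial attempt plus `retries` retries
        try:
            return search()
        except Exception as exc:  # narrow this to the errors you expect in real code
            logger.warning("Search attempt %d failed: %s", attempt, exc)
            if attempt <= retries:
                time.sleep(delay_s)
    return fallback


# Stand-in for a failing search so the sketch runs without a vector store.
def always_fails() -> list[str]:
    raise RuntimeError("index unavailable")


results = search_with_fallback(always_fails, fallback=[], retries=1, delay_s=0.0)
print(results)  # []
```

With a real store you might call it as `search_with_fallback(lambda: vector_store.similarity_search("query"), fallback=[])`.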
## Requirements
- **Python**: 3.10 or higher
- **ZeusDB**: 0.0.8 or higher
- **LangChain Core**: 0.3.74 or higher
## Installation from Source
```bash
git clone https://github.com/zeusdb/langchain-zeusdb.git
cd langchain-zeusdb/libs/zeusdb
pip install -e .
```
## Use Cases
- **RAG Applications**: High-performance retrieval for question answering
- **Semantic Search**: Fast similarity search across large document collections
- **Recommendation Systems**: Vector-based content and collaborative filtering
- **Embeddings Analytics**: Analysis of high-dimensional embedding spaces
- **Real-time Applications**: Low-latency vector search for production systems
## Compatibility
### LangChain Versions
- **LangChain Core**: 0.3.74+
### Distance Metrics
- **Cosine**: Default, normalized similarity
- **Euclidean (L2)**: Geometric distance
- **Manhattan (L1)**: City-block distance
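
For intuition, the three metrics can be computed by hand for a pair of small vectors. This is a plain-Python sketch of the textbook definitions. Whether ZeusDB reports cosine as a distance (`1 - similarity`) or as a similarity score is an assumption here, and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 1.0]
print(f"cosine: {cosine_distance(a, b):.4f}")  # cosine: 1.0000
print(f"l2:     {l2_distance(a, b):.4f}")      # l2:     1.4142
print(f"l1:     {l1_distance(a, b):.4f}")      # l1:     2.0000
```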
### Embedding Models
Compatible with any embedding provider:
- OpenAI (`text-embedding-3-small`, `text-embedding-3-large`)
- Hugging Face Transformers
- Cohere Embeddings
- Custom embedding functions
## Support
- **Documentation**: [docs.zeusdb.com](https://docs.zeusdb.com)
- **Issues**: [GitHub Issues](https://github.com/zeusdb/langchain-zeusdb/issues)
- **Email**: contact@zeusdb.com
---
*Making vector search fast, scalable, and developer-friendly.*

File diff suppressed because it is too large

View File

@@ -0,0 +1,204 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SuperlinkedRetriever Examples\n",
"\n",
"This notebook demonstrates how to build a Superlinked App and Query Descriptor and use them with the LangChain `SuperlinkedRetriever`.\n",
"\n",
"Install the integration from PyPI:\n",
"\n",
"```bash\n",
"pip install -U langchain-superlinked superlinked\n",
"```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Install the integration and its peer dependency:\n",
"\n",
"```bash\n",
"pip install -U langchain-superlinked superlinked\n",
"```\n",
"\n",
"## Instantiation\n",
"\n",
"See below for creating a Superlinked App (`sl_client`) and a `QueryDescriptor` (`sl_query`), then wiring them into `SuperlinkedRetriever`.\n",
"\n",
"## Usage\n",
"\n",
"Call `retriever.invoke(query_text, **params)` to retrieve `Document` objects. Examples below show single-space and multi-space setups.\n",
"\n",
"## Use within a chain\n",
"\n",
"The retriever can be used in LangChain chains by piping it into your prompt and model. See the main Superlinked retriever page for a full RAG example.\n",
"\n",
"## API reference\n",
"\n",
"Refer to the API docs:\n",
"\n",
"- https://python.langchain.com/api_reference/superlinked/retrievers/langchain_superlinked.retrievers.SuperlinkedRetriever.html\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import superlinked.framework as sl\n",
"from langchain_superlinked import SuperlinkedRetriever\n",
"from datetime import timedelta\n",
"\n",
"\n",
"# Define schema\n",
"class DocumentSchema(sl.Schema):\n",
"    id: sl.IdField\n",
"    content: sl.String\n",
"\n",
"\n",
"doc_schema = DocumentSchema()\n",
"\n",
"# Space + index\n",
"text_space = sl.TextSimilaritySpace(\n",
" text=doc_schema.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
")\n",
"doc_index = sl.Index([text_space])\n",
"\n",
"# Query descriptor\n",
"query = (\n",
" sl.Query(doc_index)\n",
" .find(doc_schema)\n",
" .similar(text_space.text, sl.Param(\"query_text\"))\n",
" .select([doc_schema.content])\n",
" .limit(sl.Param(\"limit\"))\n",
")\n",
"\n",
"# Minimal app\n",
"source = sl.InMemorySource(schema=doc_schema)\n",
"executor = sl.InMemoryExecutor(sources=[source], indices=[doc_index])\n",
"app = executor.run()\n",
"\n",
"# Data\n",
"source.put(\n",
" [\n",
" {\"id\": \"1\", \"content\": \"Machine learning algorithms process data efficiently.\"},\n",
" {\n",
" \"id\": \"2\",\n",
" \"content\": \"Natural language processing understands human language.\",\n",
" },\n",
" {\"id\": \"3\", \"content\": \"Deep learning models require significant compute.\"},\n",
" ]\n",
")\n",
"\n",
"# Retriever\n",
"retriever = SuperlinkedRetriever(\n",
" sl_client=app, sl_query=query, page_content_field=\"content\"\n",
")\n",
"\n",
"retriever.invoke(\"artificial intelligence\", limit=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Multi-space example (blog posts)\n",
"class BlogPostSchema(sl.Schema):\n",
"    id: sl.IdField\n",
"    title: sl.String\n",
"    content: sl.String\n",
"    category: sl.String\n",
"    published_date: sl.Timestamp\n",
"\n",
"\n",
"blog = BlogPostSchema()\n",
"\n",
"content_space = sl.TextSimilaritySpace(\n",
" text=blog.content, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
")\n",
"title_space = sl.TextSimilaritySpace(\n",
" text=blog.title, model=\"sentence-transformers/all-MiniLM-L6-v2\"\n",
")\n",
"cat_space = sl.CategoricalSimilaritySpace(\n",
" category_input=blog.category, categories=[\"technology\", \"science\", \"business\"]\n",
")\n",
"recency_space = sl.RecencySpace(\n",
" timestamp=blog.published_date,\n",
" period_time_list=[\n",
" sl.PeriodTime(timedelta(days=30)),\n",
" sl.PeriodTime(timedelta(days=90)),\n",
" ],\n",
")\n",
"\n",
"blog_index = sl.Index([content_space, title_space, cat_space, recency_space])\n",
"\n",
"blog_query = (\n",
" sl.Query(\n",
" blog_index,\n",
" weights={\n",
" content_space: sl.Param(\"content_weight\"),\n",
" title_space: sl.Param(\"title_weight\"),\n",
" cat_space: sl.Param(\"category_weight\"),\n",
" recency_space: sl.Param(\"recency_weight\"),\n",
" },\n",
" )\n",
" .find(blog)\n",
" .similar(content_space.text, sl.Param(\"query_text\"))\n",
" .select([blog.title, blog.content, blog.category, blog.published_date])\n",
" .limit(sl.Param(\"limit\"))\n",
")\n",
"\n",
"source = sl.InMemorySource(schema=blog)\n",
"app = sl.InMemoryExecutor(sources=[source], indices=[blog_index]).run()\n",
"\n",
"from datetime import datetime\n",
"\n",
"source.put(\n",
" [\n",
" {\n",
" \"id\": \"p1\",\n",
" \"title\": \"Intro to ML\",\n",
" \"content\": \"Machine learning 101\",\n",
" \"category\": \"technology\",\n",
" \"published_date\": int((datetime.now() - timedelta(days=5)).timestamp()),\n",
" },\n",
" {\n",
" \"id\": \"p2\",\n",
" \"title\": \"AI in Healthcare\",\n",
" \"content\": \"Transforming diagnosis\",\n",
" \"category\": \"science\",\n",
" \"published_date\": int((datetime.now() - timedelta(days=15)).timestamp()),\n",
" },\n",
" ]\n",
")\n",
"\n",
"blog_retriever = SuperlinkedRetriever(\n",
" sl_client=app,\n",
" sl_query=blog_query,\n",
" page_content_field=\"content\",\n",
" metadata_fields=[\"title\", \"category\", \"published_date\"],\n",
")\n",
"\n",
"blog_retriever.invoke(\n",
" \"machine learning\", content_weight=1.0, recency_weight=0.5, limit=2\n",
")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,483 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Google Bigtable\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BigtableByteStore\n",
"\n",
"This guide covers how to use Google Cloud Bigtable as a key-value store.\n",
"\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. \n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/key_value_store.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"The `BigtableByteStore` uses Google Cloud Bigtable as a backend for a key-value store. It supports synchronous and asynchronous operations for setting, getting, and deleting key-value pairs.\n",
"\n",
"### Integration details\n",
"| Class | Package | Local | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [BigtableByteStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"### Prerequisites\n",
"\n",
"To get started, you will need a Google Cloud project with an active Bigtable instance and table. \n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
"* [Create a Bigtable instance and table](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"\n",
"### Installation\n",
"\n",
"The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` for the embedding cache example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-google-bigtable langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project to use its resources within this notebook.\n",
"\n",
"If you don't know your project ID, you can run `gcloud config list` or see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in your project, instance, and table details.\n",
"PROJECT_ID = \"your-gcp-project-id\" # @param {type:\"string\"}\n",
"INSTANCE_ID = \"your-instance-id\" # @param {type:\"string\"}\n",
"TABLE_ID = \"your-table-id\" # @param {type:\"string\"}\n",
"\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud to access your project resources.\n",
"- For **Colab**, use the cell below.\n",
"- For **Vertex AI Workbench**, see the [setup instructions](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"To use `BigtableByteStore`, we first ensure a table exists and then initialize a `BigtableEngine` to manage connections."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import (\n",
" BigtableByteStore,\n",
" BigtableEngine,\n",
" init_key_value_store_table,\n",
")\n",
"\n",
"# Ensure the table and column family exist.\n",
"init_key_value_store_table(\n",
" project_id=PROJECT_ID,\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableEngine\n",
"A `BigtableEngine` object handles the execution context for the store, especially for async operations. It's recommended to initialize a single engine and reuse it across multiple stores for better performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the engine to manage async operations.\n",
"engine = await BigtableEngine.async_initialize(\n",
" project_id=PROJECT_ID, instance_id=INSTANCE_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableByteStore\n",
"\n",
"This is the main class for interacting with the key-value store. It provides the methods for setting, getting, and deleting data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize the store.\n",
"store = await BigtableByteStore.create(engine=engine, table_id=TABLE_ID)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"The store supports both sync (`mset`, `mget`) and async (`amset`, `amget`) methods. This guide uses the async versions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set\n",
"Use `amset` to save key-value pairs to the store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"kv_pairs = [\n",
" (\"key1\", b\"value1\"),\n",
" (\"key2\", b\"value2\"),\n",
" (\"key3\", b\"value3\"),\n",
"]\n",
"\n",
"await store.amset(kv_pairs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Get\n",
"Use `amget` to retrieve values. If a key is not found, `None` is returned for that key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retrieved_vals = await store.amget([\"key1\", \"key2\", \"nonexistent_key\"])\n",
"print(retrieved_vals)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete\n",
"Use `amdelete` to remove keys from the store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.amdelete([\"key3\"])\n",
"\n",
"# Verifying the key was deleted\n",
"await store.amget([\"key1\", \"key3\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Iterate over keys\n",
"Use `ayield_keys` to iterate over all keys or keys with a specific prefix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_keys = [key async for key in store.ayield_keys()]\n",
"print(f\"All keys: {all_keys}\")\n",
"\n",
"prefixed_keys = [key async for key in store.ayield_keys(prefix=\"key1\")]\n",
"print(f\"Prefixed keys: {prefixed_keys}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage: Embedding Caching\n",
"\n",
"A common use case for a key-value store is to cache expensive operations like computing text embeddings, which saves time and cost."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import CacheBackedEmbeddings\n",
"from langchain_google_vertexai.embeddings import VertexAIEmbeddings\n",
"\n",
"underlying_embeddings = VertexAIEmbeddings(\n",
" project=PROJECT_ID, model_name=\"textembedding-gecko@003\"\n",
")\n",
"\n",
"# Use a namespace to avoid key collisions with other data.\n",
"cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
" underlying_embeddings, store, namespace=\"text-embeddings\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"First call (computes and caches embedding):\")\n",
"%time embedding_result_1 = await cached_embedder.aembed_query(\"Hello, world!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\nSecond call (retrieves from cache):\")\n",
"%time embedding_result_2 = await cached_embedder.aembed_query(\"Hello, world!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### As a Simple Document Retriever\n",
"\n",
"This section shows how to create a simple retriever using the Bigtable store. It acts as a document persistence layer, fetching documents that match a query prefix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.retrievers import BaseRetriever\n",
"from langchain_core.documents import Document\n",
"from langchain_core.callbacks import CallbackManagerForRetrieverRun\n",
"from typing import List, Optional, Any, Union\n",
"import json\n",
"\n",
"\n",
"class SimpleKVStoreRetriever(BaseRetriever):\n",
"    \"\"\"A simple retriever that retrieves documents based on a prefix match in the key-value store.\"\"\"\n",
"\n",
"    store: BigtableByteStore\n",
"    documents: List[Union[Document, str]]\n",
"    k: int\n",
"\n",
"    def set_up_store(self):\n",
"        kv_pairs_to_set = []\n",
"        for i, doc in enumerate(self.documents):\n",
"            if isinstance(doc, str):\n",
"                doc = Document(page_content=doc)\n",
"            if not doc.id:\n",
"                doc.id = str(i)\n",
"            value = (\n",
"                \"Page Content\\n\"\n",
"                + doc.page_content\n",
"                + \"\\nMetadata\"\n",
"                + json.dumps(doc.metadata)\n",
"            )\n",
"            kv_pairs_to_set.append((doc.id, value.encode(\"utf-8\")))\n",
"        self.store.mset(kv_pairs_to_set)\n",
"\n",
"    async def _aget_relevant_documents(\n",
"        self,\n",
"        query: str,\n",
"        *,\n",
"        run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
"    ) -> List[Document]:\n",
"        keys = [key async for key in self.store.ayield_keys(prefix=query)][: self.k]\n",
"        documents_retrieved = []\n",
"        # amget returns a list once awaited, so iterate with a plain `for`.\n",
"        for document in await self.store.amget(keys):\n",
"            if document:\n",
"                document_str = document.decode(\"utf-8\")\n",
"                page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
"                metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
"                documents_retrieved.append(\n",
"                    Document(page_content=page_content, metadata=metadata)\n",
"                )\n",
"        return documents_retrieved\n",
"\n",
"    def _get_relevant_documents(\n",
"        self,\n",
"        query: str,\n",
"        *,\n",
"        run_manager: Optional[CallbackManagerForRetrieverRun] = None,\n",
"    ) -> list[Document]:\n",
"        keys = [key for key in self.store.yield_keys(prefix=query)][: self.k]\n",
"        documents_retrieved = []\n",
"        for document in self.store.mget(keys):\n",
"            if document:\n",
"                document_str = document.decode(\"utf-8\")\n",
"                page_content = document_str.split(\"Content\\n\")[1].split(\"\\nMetadata\")[0]\n",
"                metadata = json.loads(document_str.split(\"\\nMetadata\")[1])\n",
"                documents_retrieved.append(\n",
"                    Document(page_content=page_content, metadata=metadata)\n",
"                )\n",
"        return documents_retrieved"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"documents = [\n",
" Document(\n",
" page_content=\"Goldfish are popular pets for beginners, requiring relatively simple care.\",\n",
" metadata={\"type\": \"fish\", \"trait\": \"low maintenance\"},\n",
" id=\"fish#Goldfish\",\n",
" ),\n",
" Document(\n",
" page_content=\"Cats are independent pets that often enjoy their own space.\",\n",
" metadata={\"type\": \"cat\", \"trait\": \"independence\"},\n",
" id=\"mammals#Cats\",\n",
" ),\n",
" Document(\n",
" page_content=\"Rabbits are social animals that need plenty of space to hop around.\",\n",
" metadata={\"type\": \"rabbit\", \"trait\": \"social\"},\n",
" id=\"mammals#Rabbits\",\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever_store = BigtableByteStore.create_sync(\n",
" engine=engine, instance_id=INSTANCE_ID, table_id=TABLE_ID\n",
")\n",
"\n",
"KVDocumentRetriever = SimpleKVStoreRetriever(\n",
" store=retriever_store, documents=documents, k=2\n",
")\n",
"\n",
"KVDocumentRetriever.set_up_store()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"KVDocumentRetriever.invoke(\"fish\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"KVDocumentRetriever.invoke(\"mammals\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For full details on the `BigtableByteStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/key_value_store.py)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,319 @@
{
"cells": [
{
"cell_type": "raw",
"source": [
"---\n",
"sidebar_label: AI/ML API Embeddings\n",
"---"
],
"metadata": {
"collapsed": false
},
"id": "24ae9a5bcf0c8c19"
},
{
"cell_type": "markdown",
"source": [
"# AimlapiEmbeddings\n",
"\n",
"This will help you get started with AI/ML API embedding models using LangChain. For detailed documentation on `AimlapiEmbeddings` features and configuration options, please refer to the [API reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"import { ItemTable } from \"@theme/FeatureTables\";\n",
"\n",
"<ItemTable category=\"text_embedding\" item=\"AI/ML API\" />\n",
"\n",
"## Setup\n",
"\n",
"To access AI/ML API embedding models you'll need to create an account, get an API key, and install the `langchain-aimlapi` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Head to [https://aimlapi.com/app/](https://aimlapi.com/app/?utm_source=langchain&utm_medium=github&utm_campaign=integration) to sign up and generate an API key. Once you've done this, set the `AIMLAPI_API_KEY` environment variable:"
],
"metadata": {
"collapsed": false
},
"id": "4af58f76e6ce897a"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"AIMLAPI_API_KEY\"):\n",
"    os.environ[\"AIMLAPI_API_KEY\"] = getpass.getpass(\"Enter your AI/ML API key: \")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:50:37.393789Z",
"start_time": "2025-08-07T07:50:27.679399Z"
}
},
"id": "3297a770bc0b2b88",
"execution_count": 1
},
{
"cell_type": "markdown",
"source": [
"To enable automated tracing of your model calls, set your [LangSmith](https://docs.smith.langchain.com/) API key:"
],
"metadata": {
"collapsed": false
},
"id": "da319ae795659a93"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\"\n",
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:50:40.840377Z",
"start_time": "2025-08-07T07:50:40.837144Z"
}
},
"id": "6869f433a2f9dc3e",
"execution_count": 2
},
{
"cell_type": "markdown",
"source": [
"### Installation\n",
"\n",
"The LangChain AI/ML API integration lives in the `langchain-aimlapi` package:"
],
"metadata": {
"collapsed": false
},
"id": "3f6de2cfc36a4dba"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-aimlapi"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:50:50.693835Z",
"start_time": "2025-08-07T07:50:41.453138Z"
}
},
"id": "23c22092f806aa31",
"execution_count": 3
},
{
"cell_type": "markdown",
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our embeddings model and perform embedding operations:"
],
"metadata": {
"collapsed": false
},
"id": "db718f4b551164f3"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"from langchain_aimlapi import AimlapiEmbeddings\n",
"\n",
"embeddings = AimlapiEmbeddings(\n",
" model=\"text-embedding-ada-002\",\n",
")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:51:03.046723Z",
"start_time": "2025-08-07T07:50:50.694842Z"
}
},
"id": "88b86f20598af88e",
"execution_count": 4
},
{
"cell_type": "markdown",
"source": [
"## Indexing and Retrieval\n",
"\n",
"Embedding models are often used in retrieval-augmented generation (RAG) flows. Below is how to index and retrieve data using the `embeddings` object we initialized above with `InMemoryVectorStore`."
],
"metadata": {
"collapsed": false
},
"id": "847447f4ff1fe82a"
},
{
"cell_type": "code",
"outputs": [
{
"data": {
"text/plain": "'LangChain is the framework for building context-aware reasoning applications'"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.vectorstores import InMemoryVectorStore\n",
"\n",
"text = \"LangChain is the framework for building context-aware reasoning applications\"\n",
"\n",
"vectorstore = InMemoryVectorStore.from_texts(\n",
" [text],\n",
" embedding=embeddings,\n",
")\n",
"\n",
"retriever = vectorstore.as_retriever()\n",
"\n",
"retrieved_documents = retriever.invoke(\"What is LangChain?\")\n",
"retrieved_documents[0].page_content"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:51:05.421030Z",
"start_time": "2025-08-07T07:51:03.047729Z"
}
},
"id": "595ccebd97dabeef",
"execution_count": 5
},
{
"cell_type": "markdown",
"source": [
"## Direct Usage\n",
"\n",
"You can directly call `embed_query` and `embed_documents` for custom embedding scenarios.\n",
"\n",
"### Embed single text:"
],
"metadata": {
"collapsed": false
},
"id": "aa922f78938d1ae1"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.0011368310078978539, 0.00714730704203248, -0.014703838154673576, -0.034064359962940216, 0.011239\n"
]
}
],
"source": [
"single_vector = embeddings.embed_query(text)\n",
"print(str(single_vector)[:100])"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:51:06.285037Z",
"start_time": "2025-08-07T07:51:05.422035Z"
}
},
"id": "c06952ac53aab22",
"execution_count": 6
},
{
"cell_type": "markdown",
"source": [
"### Embed multiple texts:"
],
"metadata": {
"collapsed": false
},
"id": "52c9b7de79992a7b"
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[-0.0011398226488381624, 0.007080476265400648, -0.014682820066809654, -0.03407655283808708, 0.011276\n",
"[-0.005510928109288216, 0.016650190576910973, -0.011078780516982079, -0.03116573952138424, -0.003735\n"
]
}
],
"source": [
"text2 = (\n",
"    \"LangGraph is a library for building stateful, multi-actor applications with LLMs\"\n",
")\n",
"two_vectors = embeddings.embed_documents([text, text2])\n",
"for vector in two_vectors:\n",
"    print(str(vector)[:100])"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2025-08-07T07:51:07.954778Z",
"start_time": "2025-08-07T07:51:06.285544Z"
}
},
"id": "f1dcf3c389e11cc1",
"execution_count": 7
},
{
"cell_type": "markdown",
"source": [
"## API Reference\n",
"\n",
"For detailed documentation on `AimlapiEmbeddings` features and configuration options, please refer to the [API reference](https://docs.aimlapi.com/?utm_source=langchain&utm_medium=github&utm_campaign=integration).\n"
],
"metadata": {
"collapsed": false
},
"id": "a45ff6faef63cab2"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -26,13 +26,11 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"cell_type": "code",
"outputs": [],
"source": [
"!pip install -U langchain_oci"
]
"execution_count": null,
"source": "!pip install -U langchain-oci"
},
{
"cell_type": "markdown",
@@ -75,9 +73,9 @@
"\n",
"# use default authN method API-key\n",
"embeddings = OCIGenAIEmbeddings(\n",
" model_id=\"MY_EMBEDDING_MODEL\",\n",
" model_id=\"cohere.embed-v4.0\",\n",
" service_endpoint=\"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com\",\n",
" compartment_id=\"MY_OCID\",\n",
" compartment_id=\"compartment_id\",\n",
")\n",
"\n",
"\n",

View File

@@ -42,7 +42,9 @@
"source": [
"### Prerequisites\n",
"\n",
"Ensure you have the Oracle Python Client driver installed to facilitate the integration of Langchain with Oracle AI Vector Search."
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb."
]
},
{
@@ -51,7 +53,7 @@
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
"# python -m pip install -U langchain-oracledb"
]
},
{
@@ -113,7 +115,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_oracledb.embeddings.oracleai import OracleEmbeddings\n",
"\n",
"# Update the directory and file names for your ONNX model\n",
"# make sure that you have onnx file in the system\n",
@@ -223,7 +225,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_oracledb.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",
@@ -237,10 +239,10 @@
"\n",
"# using huggingface\n",
"embedder_params = {\n",
" \"provider\": \"huggingface\", \n",
" \"credential_name\": \"HF_CRED\", \n",
" \"url\": \"https://api-inference.huggingface.co/pipeline/feature-extraction/\", \n",
" \"model\": \"sentence-transformers/all-MiniLM-L6-v2\", \n",
" \"provider\": \"huggingface\",\n",
" \"credential_name\": \"HF_CRED\",\n",
" \"url\": \"https://api-inference.huggingface.co/pipeline/feature-extraction/\",\n",
" \"model\": \"sentence-transformers/all-MiniLM-L6-v2\",\n",
" \"wait_for_model\": \"true\"\n",
"}\n",
"\"\"\"\n",


@@ -42,7 +42,9 @@
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb."
]
},
{
@@ -51,7 +53,7 @@
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
"# python -m pip install -U langchain-oracledb"
]
},
{
@@ -123,7 +125,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_oracledb.utilities.oracleai import OracleSummary\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",


@@ -0,0 +1,329 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d3a12ba8",
"metadata": {},
"source": [
"# LangChain ScraperAPI\n",
"\n",
"Give your AI agent the ability to browse websites, search Google and Amazon in just two lines of code.\n",
"\n",
"The `langchain-scraperapi` package adds three ready-to-use LangChain tools backed by the [ScraperAPI](https://www.scraperapi.com/) service:\n",
"\n",
"| Tool class | Use it to |\n",
"|------------|------------------|\n",
"| `ScraperAPITool` | Grab the HTML/text/markdown of any web page |\n",
"| `ScraperAPIGoogleSearchTool` | Get structured Google Search SERP data |\n",
"| `ScraperAPIAmazonSearchTool` | Get structured Amazon product-search data |\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Package | Serializable | [JS support](https://js.langchain.com/docs/integrations/tools/__module_name__) | Package latest |\n",
"| :--- | :---: | :---: | :---: |\n",
"| [langchain-scraperapi](https://pypi.org/project/langchain-scraperapi/) | ❌ | ❌ | v0.1.1 |"
]
},
{
"cell_type": "markdown",
"id": "d1f7c70f",
"metadata": {},
"source": [
"### Setup\n",
"\n",
"Install the `langchain-scraperapi` package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "494ecbc3",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-scraperapi"
]
},
{
"cell_type": "markdown",
"id": "c111d2fb",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"Create an account at https://www.scraperapi.com/ and get an API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d315465",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"SCRAPERAPI_API_KEY\"] = \"your-api-key\""
]
},
{
"cell_type": "markdown",
"id": "e06ffe48",
"metadata": {},
"source": [
"## Instantiation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27ae5612",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPITool\n",
"\n",
"tool = ScraperAPITool()"
]
},
{
"cell_type": "markdown",
"id": "9ff46136",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e1a4c7f",
"metadata": {},
"outputs": [],
"source": [
"output = tool.invoke(\n",
" {\n",
" \"url\": \"https://langchain.com\",\n",
" \"output_format\": \"markdown\",\n",
" \"render\": True,\n",
" }\n",
")\n",
"print(output)"
]
},
{
"cell_type": "markdown",
"id": "051ef7b1",
"metadata": {},
"source": [
"## Features\n",
"\n",
"### 1. `ScraperAPITool` — browse any website\n",
"\n",
"Invoke the *raw* ScraperAPI endpoint and get HTML, rendered DOM, text, or markdown.\n",
"\n",
"**Invocation arguments**\n",
"\n",
"* **`url`** **(required)** target page URL \n",
"* **Optional (mirror ScraperAPI query params)** \n",
" * `output_format`: `\"text\"` | `\"markdown\"` (default returns raw HTML) \n",
" * `country_code`: e.g. `\"us\"`, `\"de\"` \n",
" * `device_type`: `\"desktop\"` | `\"mobile\"` \n",
" * `premium`: `bool` use premium proxies \n",
" * `render`: `bool` run JS before returning HTML \n",
" * `keep_headers`: `bool` include response headers \n",
" \n",
"For the complete set of modifiers see the [ScraperAPI request-customisation docs](https://docs.scraperapi.com/python/making-requests/customizing-requests)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a0c7cc2",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPITool\n",
"\n",
"tool = ScraperAPITool()\n",
"\n",
"html_text = tool.invoke(\n",
" {\n",
" \"url\": \"https://langchain.com\",\n",
" \"output_format\": \"markdown\",\n",
" \"render\": True,\n",
" }\n",
")\n",
"print(html_text[:300], \"…\")"
]
},
{
"cell_type": "markdown",
"id": "9f2947dd",
"metadata": {},
"source": [
"### 2. `ScraperAPIGoogleSearchTool` — structured Google Search\n",
"\n",
"Structured SERP data via `/structured/google/search`.\n",
"\n",
"**Invocation arguments**\n",
"\n",
"* **`query`** **(required)** natural-language search string \n",
"* **Optional** — `country_code`, `tld`, `uule`, `hl`, `gl`, `ie`, `oe`, `start`, `num` \n",
"* `output_format`: `\"json\"` (default) or `\"csv\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aeac1195",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPIGoogleSearchTool\n",
"\n",
"google_search = ScraperAPIGoogleSearchTool()\n",
"\n",
"results = google_search.invoke(\n",
" {\n",
" \"query\": \"what is langchain\",\n",
" \"num\": 20,\n",
" \"output_format\": \"json\",\n",
" }\n",
")\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"id": "3dc2f845",
"metadata": {},
"source": [
"### 3. `ScraperAPIAmazonSearchTool` — structured Amazon Search\n",
"\n",
"Structured product results via `/structured/amazon/search`.\n",
"\n",
"**Invocation arguments**\n",
"\n",
"* **`query`** **(required)** product search terms \n",
"* **Optional** — `country_code`, `tld`, `page` \n",
"* `output_format`: `\"json\"` (default) or `\"csv\"`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05a4a6ed",
"metadata": {},
"outputs": [],
"source": [
"from langchain_scraperapi.tools import ScraperAPIAmazonSearchTool\n",
"\n",
"amazon_search = ScraperAPIAmazonSearchTool()\n",
"\n",
"products = amazon_search.invoke(\n",
" {\n",
" \"query\": \"noise cancelling headphones\",\n",
" \"tld\": \"co.uk\",\n",
" \"page\": 2,\n",
" }\n",
")\n",
"print(products)"
]
},
{
"cell_type": "markdown",
"id": "607eb8c8",
"metadata": {},
"source": [
"## Use within an agent\n",
"\n",
"Here is an example of using the tools in an AI agent. The `ScraperAPITool` gives the AI the ability to browse any website, summarize articles, and click on links to navigate between pages."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6541b286",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain-openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb62e921",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.agents import AgentExecutor, create_tool_calling_agent\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_openai import ChatOpenAI\n",
"from langchain_scraperapi.tools import ScraperAPITool\n",
"\n",
"os.environ[\"SCRAPERAPI_API_KEY\"] = \"your-api-key\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"your-api-key\"\n",
"\n",
"tools = [ScraperAPITool(output_format=\"markdown\")]\n",
"llm = ChatOpenAI(model_name=\"gpt-4o\", temperature=0)\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that can browse websites for users. When asked to browse a website or a link, do so with the ScraperAPITool, then provide information based on the website based on the user's needs.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
" ]\n",
")\n",
"\n",
"agent = create_tool_calling_agent(llm, tools, prompt)\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)\n",
"response = agent_executor.invoke(\n",
" {\"input\": \"can you browse hacker news and summarize the first website\"}\n",
")"
]
},
{
"cell_type": "markdown",
"id": "4e90c894",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"Below you can find more information on additional parameters to the tools to customize your requests.\n",
"\n",
"* [ScraperAPITool](https://docs.scraperapi.com/python/making-requests/customizing-requests)\n",
"* [ScraperAPIGoogleSearchTool](https://docs.scraperapi.com/python/make-requests-with-scraperapi-in-python/scraperapi-structured-data-collection-in-python/google-serp-api-structured-data-in-python)\n",
"* [ScraperAPIAmazonSearchTool](https://docs.scraperapi.com/python/make-requests-with-scraperapi-in-python/scraperapi-structured-data-collection-in-python/amazon-search-api-structured-data-in-python)\n",
"\n",
"The LangChain wrappers surface these parameters directly."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -118,7 +118,7 @@
"metadata": {},
"outputs": [],
"source": [
"from stripe_agent_toolkit.crewai.toolkit import StripeAgentToolkit\n",
"from stripe_agent_toolkit.langchain.toolkit import StripeAgentToolkit\n",
"\n",
"stripe_agent_toolkit = StripeAgentToolkit(\n",
" secret_key=os.getenv(\"STRIPE_SECRET_KEY\"),\n",


@@ -0,0 +1,350 @@
{
"cells": [
{
"cell_type": "raw",
"id": "2ce4bdbc",
"metadata": {
"vscode": {
"languageId": "raw"
}
},
"source": [
"---\n",
"sidebar_label: timbr\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "a6f91f20",
"metadata": {},
"source": [
"# Timbr\n",
"\n",
"[Timbr](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/) integrates natural language inputs with Timbr's ontology-driven semantic layer. Leveraging Timbr's robust ontology capabilities, the SDK integrates with Timbr data models and leverages semantic relationships and annotations, enabling users to query data using business-friendly language.\n",
"\n",
"This notebook provides a quick overview for getting started with Timbr tools and agents. For more information about Timbr visit [Timbr.ai](https://timbr.ai/) or the [Timbr Documentation](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/)\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"Timbr package for LangChain is [langchain-timbr](https://pypi.org/project/langchain-timbr), which provides seamless integration with Timbr's semantic layer for natural language to SQL conversion.\n",
"\n",
"### Tool features\n",
"\n",
"| Tool Name | Description |\n",
"| :--- | :--- |\n",
"| `IdentifyTimbrConceptChain` | Identify relevant concepts from user prompts |\n",
"| `GenerateTimbrSqlChain` | Generate SQL queries from natural language prompts |\n",
"| `ValidateTimbrSqlChain` | Validate SQL queries against Timbr knowledge graph schemas |\n",
"| `ExecuteTimbrQueryChain` | Execute SQL queries against Timbr knowledge graph databases |\n",
"| `GenerateAnswerChain` | Generate human-readable answers from query results |\n",
"| `TimbrSqlAgent` | End-to-end SQL agent for natural language queries |\n",
"\n",
"### TimbrSqlAgent Parameters\n",
"\n",
"The `TimbrSqlAgent` is a pre-built agent that combines all the above tools for end-to-end natural language to SQL processing.\n",
"\n",
"For the complete list of parameters and detailed documentation, see: [TimbrSqlAgent Documentation](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/#timbr-sql-agent)\n",
"\n",
"| Parameter | Type | Required | Description |\n",
"| :--- | :--- | :--- | :--- |\n",
"| `llm` | BaseChatModel | Yes | Language model instance (ChatOpenAI, ChatAnthropic, etc.) |\n",
"| `url` | str | Yes | Timbr application URL |\n",
"| `token` | str | Yes | Timbr API token |\n",
"| `ontology` | str | Yes | Knowledge graph ontology name |\n",
"| `schema` | str | No | Database schema name |\n",
"| `concept` | str | No | Specific concept to focus on |\n",
"| `concepts_list` | List[str] | No | List of relevant concepts |\n",
"| `views_list` | List[str] | No | List of available views |\n",
"| `note` | str | No | Additional context or instructions |\n",
"| `retries` | int | No | Number of retry attempts (default: 3) |\n",
"| `should_validate_sql` | bool | No | Whether to validate generated SQL (default: True) |\n",
"\n",
"## Setup\n",
"\n",
"The integration lives in the `langchain-timbr` package.\n",
"\n",
"In this example, we'll use OpenAI for the LLM provider."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f85b4089",
"metadata": {},
"outputs": [],
"source": [
"%pip install --quiet -U langchain-timbr[openai]"
]
},
{
"cell_type": "markdown",
"id": "b15e9266",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"You'll need Timbr credentials to use the tools. Get your API token from your Timbr application's API settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0b178a2-8816-40ca-b57c-ccdd86dde9c9",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"# Set up Timbr credentials\n",
"if not os.environ.get(\"TIMBR_URL\"):\n",
" os.environ[\"TIMBR_URL\"] = input(\"Timbr URL:\\n\")\n",
"\n",
"if not os.environ.get(\"TIMBR_TOKEN\"):\n",
" os.environ[\"TIMBR_TOKEN\"] = getpass.getpass(\"Timbr API Token:\\n\")\n",
"\n",
"if not os.environ.get(\"TIMBR_ONTOLOGY\"):\n",
" os.environ[\"TIMBR_ONTOLOGY\"] = input(\"Timbr Ontology:\\n\")\n",
"\n",
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\\n\")"
]
},
{
"cell_type": "markdown",
"id": "1c97218f-f366-479d-8bf7-fe9f2f6df73f",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Instantiate Timbr tools and agents. First, let's set up the LLM and basic Timbr chains:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b3ddfe9-ca79-494c-a7ab-1f56d9407a64",
"metadata": {},
"outputs": [],
"source": [
"from langchain_timbr import (\n",
" ExecuteTimbrQueryChain,\n",
" GenerateAnswerChain,\n",
" TimbrSqlAgent,\n",
" LlmWrapper,\n",
" LlmTypes,\n",
")\n",
"\n",
"# Set up the LLM\n",
"# from langchain_openai import ChatOpenAI\n",
"# llm = ChatOpenAI(model=\"gpt-4o\", temperature=0)\n",
"\n",
"# Alternative: Use Timbr's LlmWrapper for an easy LLM setup\n",
"llm = LlmWrapper(\n",
" llm_type=LlmTypes.OpenAI, api_key=os.environ[\"OPENAI_API_KEY\"], model=\"gpt-4o\"\n",
")\n",
"\n",
"# Instantiate Timbr chains\n",
"execute_timbr_query_chain = ExecuteTimbrQueryChain(\n",
" llm=llm,\n",
" url=os.environ[\"TIMBR_URL\"],\n",
" token=os.environ[\"TIMBR_TOKEN\"],\n",
" ontology=os.environ[\"TIMBR_ONTOLOGY\"],\n",
")\n",
"\n",
"generate_answer_chain = GenerateAnswerChain(\n",
" llm=llm, url=os.environ[\"TIMBR_URL\"], token=os.environ[\"TIMBR_TOKEN\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "74147a1a",
"metadata": {},
"source": [
"## Invocation\n",
"\n",
"### Execute SQL queries from natural language\n",
"\n",
"You can use the individual chains to perform specific operations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65310a8b-eb0c-4d9e-a618-4f4abe2414fc",
"metadata": {},
"outputs": [],
"source": [
"# Execute a natural language query\n",
"result = execute_timbr_query_chain.invoke(\n",
" {\"prompt\": \"What are the total sales for last month?\"}\n",
")\n",
"\n",
"print(\"SQL Query:\", result[\"sql\"])\n",
"print(\"Results:\", result[\"rows\"])\n",
"print(\"Concept:\", result[\"concept\"])\n",
"\n",
"# Generate a human-readable answer from the results\n",
"answer_result = generate_answer_chain.invoke(\n",
" {\"prompt\": \"What are the total sales for last month?\", \"rows\": result[\"rows\"]}\n",
")\n",
"\n",
"print(\"Human-readable answer:\", answer_result[\"answer\"])"
]
},
{
"cell_type": "markdown",
"id": "d6e73897",
"metadata": {},
"source": [
"## Use within an agent\n",
"\n",
"### Using TimbrSqlAgent\n",
"\n",
"The `TimbrSqlAgent` provides an end-to-end solution that combines concept identification, SQL generation, validation, execution, and answer generation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f90e33a7",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import AgentExecutor\n",
"\n",
"# Create a TimbrSqlAgent with all parameters\n",
"timbr_agent = TimbrSqlAgent(\n",
" llm=llm,\n",
" url=os.environ[\"TIMBR_URL\"],\n",
" token=os.environ[\"TIMBR_TOKEN\"],\n",
" ontology=os.environ[\"TIMBR_ONTOLOGY\"],\n",
" concepts_list=[\"Sales\", \"Orders\"], # optional\n",
" views_list=[\"sales_view\"], # optional\n",
" note=\"Focus on monthly aggregations\", # optional\n",
" retries=3, # optional\n",
" should_validate_sql=True, # optional\n",
")\n",
"\n",
"# Use the agent for end-to-end natural language to answer processing\n",
"agent_result = AgentExecutor.from_agent_and_tools(\n",
" agent=timbr_agent,\n",
" tools=[], # No tools needed as we're directly using the chain\n",
" verbose=True,\n",
").invoke(\"Show me the top 5 customers by total sales amount this year\")\n",
"\n",
"print(\"Final Answer:\", agent_result[\"answer\"])\n",
"print(\"Generated SQL:\", agent_result[\"sql\"])\n",
"print(\"Usage Metadata:\", agent_result.get(\"usage_metadata\", {}))"
]
},
{
"cell_type": "markdown",
"id": "659f9fbd-6fcf-445f-aa8c-72d8e60154bd",
"metadata": {},
"source": [
"### Sequential Chains\n",
"\n",
"You can combine multiple Timbr chains using LangChain's SequentialChain for custom workflows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af3123ad-7a02-40e5-b58e-7d56e23e5830",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import SequentialChain\n",
"\n",
"# Create a sequential pipeline\n",
"pipeline = SequentialChain(\n",
" chains=[execute_timbr_query_chain, generate_answer_chain],\n",
" input_variables=[\"prompt\"],\n",
" output_variables=[\"answer\", \"sql\", \"rows\"],\n",
")\n",
"\n",
"# Execute the pipeline\n",
"pipeline_result = pipeline.invoke(\n",
" {\"prompt\": \"What are the average order values by customer segment?\"}\n",
")\n",
"\n",
"print(\"Pipeline Result:\", pipeline_result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdbf35b5-3aaf-4947-9ec6-48c21533fb95",
"metadata": {},
"outputs": [],
"source": [
"# Example: Accessing usage metadata from Timbr operations\n",
"result_with_metadata = execute_timbr_query_chain.invoke(\n",
" {\"prompt\": \"How many orders were placed last quarter?\"}\n",
")\n",
"\n",
"# Extract usage metadata\n",
"usage_metadata = result_with_metadata.get(\"execute_timbr_usage_metadata\", {})\n",
"determine_concept_usage = usage_metadata.get(\"determine_concept\", {})\n",
"generate_sql_usage = usage_metadata.get(\"generate_sql\", {})\n",
"\n",
"print(determine_concept_usage)\n",
"\n",
"print(\n",
" \"Concept determination token estimate:\",\n",
" determine_concept_usage.get(\"approximate\", \"N/A\"),\n",
")\n",
"print(\n",
" \"Concept determination tokens:\",\n",
" determine_concept_usage.get(\"token_usage\", {}).get(\"total_tokens\", \"N/A\"),\n",
")\n",
"\n",
"print(\"SQL generation token estimate:\", generate_sql_usage.get(\"approximate\", \"N/A\"))\n",
"print(\n",
" \"SQL generation tokens:\",\n",
" generate_sql_usage.get(\"token_usage\", {}).get(\"total_tokens\", \"N/A\"),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "4ac8146c",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"- [PyPI](https://pypi.org/project/langchain-timbr)\n",
"- [GitHub](https://github.com/WPSemantix/langchain-timbr)\n",
"- [LangChain Timbr Documentation](https://docs.timbr.ai/doc/docs/integration/langchain-sdk/)\n",
"- [LangGraph Timbr Documentation](https://docs.timbr.ai/doc/docs/integration/langgraph-sdk)\n",
"- [Timbr Official Website](https://timbr.ai/)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,281 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "FVo_qZB6crBs"
},
"source": [
"# ZenRowsUniversalScraper\n",
"\n",
"[ZenRows](https://www.zenrows.com/) is an enterprise-grade web scraping tool that provides advanced web data extraction capabilities at scale. For more information about ZenRows and its Universal Scraper API, visit the [official documentation](https://docs.zenrows.com/universal-scraper-api/).\n",
"\n",
"This document provides a quick overview for getting started with ZenRowsUniversalScraper tool. For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [API reference](https://github.com/ZenRows-Hub/langchain-zenrows?tab=readme-ov-file#api-reference).\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | JS support | Package latest |\n",
"| :--- | :--- | :---: | :---: |\n",
"| [ZenRowsUniversalScraper](https://pypi.org/project/langchain-zenrows/) | [langchain-zenrows](https://pypi.org/project/langchain-zenrows/) | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zenrows?style=flat-square&label=%20) |\n",
"\n",
"### Tool features\n",
"\n",
"| Feature | Support |\n",
"| :--- | :---: |\n",
"| **JavaScript Rendering** | ✅ |\n",
"| **Anti-Bot Bypass** | ✅ |\n",
"| **Geo-Targeting** | ✅ |\n",
"| **Multiple Output Formats** | ✅ |\n",
"| **CSS Extraction** | ✅ |\n",
"| **Screenshot Capture** | ✅ |\n",
"| **Session Management** | ✅ |\n",
"| **Premium Proxies** | ✅ |\n",
"\n",
"## Setup\n",
"\n",
"Install the required dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "henNSgOlcww5"
},
"outputs": [],
"source": [
"pip install langchain-zenrows"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IS2yw_UaczgP"
},
"source": [
"### Credentials\n",
"\n",
"You'll need a ZenRows API key to use this tool. You can sign up for free at [ZenRows](https://app.zenrows.com/register?prod=universal_scraper)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "Z097qruic2iH"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hB7fHgmQc5eh"
},
"source": [
"## Instantiation\n",
"\n",
"Here's how to instantiate an instance of the ZenRowsUniversalScraper tool."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "ezdGcI3Hc8H3"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cUal-Ioic_0k"
},
"source": [
"You can also pass the ZenRows API key when initializing the ZenRowsUniversalScraper tool."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "sPd95HKzdCGr"
},
"outputs": [],
"source": [
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper(zenrows_api_key=\"your-api-key\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c8rEvAY4dFX2"
},
"source": [
"## Invocation\n",
"\n",
"### Basic Usage\n",
"\n",
"The tool accepts a URL and various optional parameters to customize the scraping behavior:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GKTDKhXEdGku"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"# Initialize the tool\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Scrape a simple webpage\n",
"result = zenrows_scraper_tool.invoke({\"url\": \"https://httpbin.io/html\"})\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7Kd1loN5dJbt"
},
"source": [
"### Advanced Usage with Parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NfJOQdBhdLrp"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"\n",
"# Set your ZenRows API key\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Scrape with JavaScript rendering and premium proxies\n",
"result = zenrows_scraper_tool.invoke(\n",
" {\n",
" \"url\": \"https://www.scrapingcourse.com/ecommerce/\",\n",
" \"js_render\": True,\n",
" \"premium_proxy\": True,\n",
" \"proxy_country\": \"us\",\n",
" \"response_type\": \"markdown\",\n",
" \"wait\": 2000,\n",
" }\n",
")\n",
"\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8eivshtqdNe0"
},
"source": [
"### Use within an agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JmbPF7xadPgK"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_openai import ChatOpenAI # or your preferred LLM\n",
"from langchain_zenrows import ZenRowsUniversalScraper\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
"# Set your ZenRows and OpenAI API keys\n",
"os.environ[\"ZENROWS_API_KEY\"] = \"<YOUR_ZENROWS_API_KEY>\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<YOUR_OPEN_AI_API_KEY>\"\n",
"\n",
"\n",
"# Initialize components\n",
"llm = ChatOpenAI(model=\"gpt-4o-mini\")\n",
"zenrows_scraper_tool = ZenRowsUniversalScraper()\n",
"\n",
"# Create agent\n",
"agent = create_react_agent(llm, [zenrows_scraper_tool])\n",
"\n",
"# Use the agent\n",
"result = agent.invoke(\n",
" {\n",
" \"messages\": \"Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time.\"\n",
" }\n",
")\n",
"\n",
"print(\"Agent Response:\")\n",
"for message in result[\"messages\"]:\n",
" print(f\"{message.content}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "k9lqlhoAdRSb"
},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ZenRowsUniversalScraper features and configurations head to the [**ZenRowsUniversalScraper API reference**](https://github.com/ZenRows-Hub/langchain-zenrows).\n",
"\n",
"For comprehensive information about the underlying API parameters and capabilities, see the [ZenRows Universal API documentation](https://docs.zenrows.com/universal-scraper-api/api-reference)."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -0,0 +1,839 @@
{
"cells": [
{
"cell_type": "raw",
"id": "7fb27b941602401d91542211134fc71a",
"metadata": {
"id": "7fb27b941602401d91542211134fc71a"
},
"source": [
"---\n",
"sidebar_label: Google Bigtable\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "acae54e37e7d407bbb7b55eff062a284",
"metadata": {
"id": "acae54e37e7d407bbb7b55eff062a284"
},
"source": [
"# BigtableVectorStore\n",
"\n",
"This guide covers the `BigtableVectorStore` integration for using Google Cloud Bigtable as a vector store.\n",
"\n",
"[Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data.\n"
]
},
{
"cell_type": "markdown",
"id": "9a63283cbaf04dbcab1f6479b197f3a8",
"metadata": {
"id": "9a63283cbaf04dbcab1f6479b197f3a8"
},
"source": [
"## Overview\n",
"\n",
"The `BigtableVectorStore` uses Google Cloud Bigtable to store documents and their vector embeddings for similarity search and retrieval. It supports powerful metadata filtering to refine search results.\n",
"\n",
"### Integration details\n",
"| Class | Package | Local | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: |\n",
"| [BigtableVectorStore](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py) | [langchain-google-bigtable](https://pypi.org/project/langchain-google-bigtable/) | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-google-bigtable?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-google-bigtable) |"
]
},
{
"cell_type": "markdown",
"id": "8dd0d8092fe74a7c96281538738b07e2",
"metadata": {
"id": "8dd0d8092fe74a7c96281538738b07e2"
},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "72eea5119410473aa328ad9291626812",
"metadata": {
"id": "72eea5119410473aa328ad9291626812"
},
"source": [
"### Prerequisites\n",
"\n",
"To get started, you will need a Google Cloud project with an active Bigtable instance.\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Enable the Bigtable API](https://console.cloud.google.com/flows/enableapi?apiid=bigtable.googleapis.com)\n",
"* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"\n",
"### Installation\n",
"\n",
"The integration is in the `langchain-google-bigtable` package. The command below also installs `langchain-google-vertexai` to use for an embedding service."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8edb47106e1a46a883d545849b8ab81b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8edb47106e1a46a883d545849b8ab81b",
"outputId": "b6c95f84-f271-4bd0-f024-81ea38ce7f80"
},
"outputs": [],
"source": [
"%pip install -qU langchain-google-bigtable langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"id": "WEparXIIO41L",
"metadata": {
"id": "WEparXIIO41L"
},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "OB8Mg8HxO9HV",
"metadata": {
"id": "OB8Mg8HxO9HV"
},
"outputs": [],
"source": [
"# Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"id": "10185d26023b46108eb7d9f57d49d2b3",
"metadata": {
"id": "10185d26023b46108eb7d9f57d49d2b3"
},
"source": [
"### Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8763a12b2bbd4a93a75aff182afb95dc",
"outputId": "865ca13d-47e1-4458-dfe3-96b0e7a57810"
},
"outputs": [],
"source": [
"# @markdown Please fill in your project, instance, and a new table name.\n",
"PROJECT_ID = \"google.com:cloud-bigtable-dev\" # @param {type:\"string\"}\n",
"INSTANCE_ID = \"anweshadas-test\" # @param {type:\"string\"}\n",
"TABLE_ID = \"your-vector-store-table-3\" # @param {type:\"string\"}\n",
"\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "xx0JMrbNOfnV",
"metadata": {
"id": "xx0JMrbNOfnV"
},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "T1pPsDCzOURd",
"metadata": {
"id": "T1pPsDCzOURd"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user(project_id=PROJECT_ID)"
]
},
{
"cell_type": "markdown",
"id": "7623eae2785240b9bd12b16a66d81610",
"metadata": {
"id": "7623eae2785240b9bd12b16a66d81610"
},
"source": [
"## Initialization\n",
"\n",
"Initializing the `BigtableVectorStore` involves four steps: setting up the embedding service, ensuring the Bigtable table exists, configuring the store's parameters, and creating the store instance."
]
},
{
"cell_type": "markdown",
"id": "7cdc8c89c7104fffa095e18ddfef8986",
"metadata": {
"id": "7cdc8c89c7104fffa095e18ddfef8986"
},
"source": [
"### 1. Set up Embedding Service\n",
"First, we need a model to create the vector embeddings for our documents. We'll use a Vertex AI model for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b118ea5561624da68c537baed56e602f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b118ea5561624da68c537baed56e602f",
"outputId": "99b55b9a-61c7-4dbe-bf1f-dd84ddc434da"
},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embeddings = VertexAIEmbeddings(project=PROJECT_ID, model_name=\"gemini-embedding-001\")"
]
},
{
"cell_type": "markdown",
"id": "938c804e27f84196a10c8828c723f798",
"metadata": {
"id": "938c804e27f84196a10c8828c723f798"
},
"source": [
"### 2. Initialize a Table\n",
"Before creating a `BigtableVectorStore`, a table with the correct column families must exist. The `init_vector_store_table` helper function is the recommended way to create and configure a table. If the table already exists, it will do nothing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "504fb2a444614c0babb325280ed9130a",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "504fb2a444614c0babb325280ed9130a",
"outputId": "2e6453bc-5eed-4a3e-a8d4-59e945485be6"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import init_vector_store_table\n",
"\n",
"DATA_COLUMN_FAMILY = \"doc_data\"\n",
"\n",
"try:\n",
"    init_vector_store_table(\n",
"        project_id=PROJECT_ID,\n",
"        instance_id=INSTANCE_ID,\n",
"        table_id=TABLE_ID,\n",
"        content_column_family=DATA_COLUMN_FAMILY,\n",
"        embedding_column_family=DATA_COLUMN_FAMILY,\n",
"    )\n",
"    print(f\"Table '{TABLE_ID}' is ready.\")\n",
"except ValueError as e:\n",
"    print(e)"
]
},
{
"cell_type": "markdown",
"id": "59bbdb311c014d738909a11f9e486628",
"metadata": {
"id": "59bbdb311c014d738909a11f9e486628"
},
"source": [
"### 3. Configure the Vector Store\n",
"Now we define the parameters that control how the vector store connects to Bigtable and how it handles data."
]
},
{
"cell_type": "markdown",
"id": "b43b363d81ae4b689946ece5c682cd59",
"metadata": {
"id": "b43b363d81ae4b689946ece5c682cd59"
},
"source": [
"#### The BigtableEngine\n",
"A `BigtableEngine` object manages clients and async operations. It is highly recommended to initialize a single engine and reuse it across multiple stores for better performance and resource management."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a65eabff63a45729fe45fb5ade58bdc",
"metadata": {
"id": "8a65eabff63a45729fe45fb5ade58bdc"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableEngine\n",
"\n",
"engine = await BigtableEngine.async_initialize(project_id=PROJECT_ID)"
]
},
{
"cell_type": "markdown",
"id": "c3933fab20d04ec698c2621248eb3be0",
"metadata": {
"id": "c3933fab20d04ec698c2621248eb3be0"
},
"source": [
"#### Collections\n",
"A `collection` provides a logical namespace for your documents within a single Bigtable table. It is used as a prefix for the row keys, allowing multiple vector stores to coexist in the same table without interfering with each other."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4dd4641cc4064e0191573fe9c69df29b",
"metadata": {
"id": "4dd4641cc4064e0191573fe9c69df29b"
},
"outputs": [],
"source": [
"collection_name = \"my_docs\""
]
},
{
"cell_type": "markdown",
"id": "8309879909854d7188b41380fd92a7c3",
"metadata": {
"id": "8309879909854d7188b41380fd92a7c3"
},
"source": [
"#### Metadata Configuration\n",
"When creating a `BigtableVectorStore`, you have two optional parameters for handling metadata:\n",
"\n",
"* `metadata_mappings`: This is a list of `VectorMetadataMapping` objects. You **must** define a mapping for any metadata key you wish to use for filtering in your search queries. Each mapping specifies the data type (`encoding`) for the metadata field, which is crucial for correct filtering.\n",
"* `metadata_as_json_column`: This is an optional `ColumnConfig` that tells the store to save the *entire* metadata dictionary as a single JSON string in a specific column. This is useful for efficiently retrieving all of a document's metadata at once, including fields not defined in `metadata_mappings`. **Note:** Fields stored only in this JSON column cannot be used for filtering."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ed186c9a28b402fb0bc4494df01f08d",
"metadata": {
"id": "3ed186c9a28b402fb0bc4494df01f08d"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import ColumnConfig, VectorMetadataMapping, Encoding\n",
"\n",
"# Define mappings for metadata fields you want to filter on.\n",
"metadata_mappings = [\n",
"    VectorMetadataMapping(metadata_key=\"author\", encoding=Encoding.UTF8),\n",
"    VectorMetadataMapping(metadata_key=\"year\", encoding=Encoding.INT_BIG_ENDIAN),\n",
"    VectorMetadataMapping(metadata_key=\"category\", encoding=Encoding.UTF8),\n",
"    VectorMetadataMapping(metadata_key=\"rating\", encoding=Encoding.FLOAT),\n",
"]\n",
"\n",
"# Define the optional column for storing all metadata as a single JSON string.\n",
"metadata_as_json_column = ColumnConfig(\n",
"    column_family=DATA_COLUMN_FAMILY, column_qualifier=\"metadata_json\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "cb1e1581032b452c9409d6c6813c49d1",
"metadata": {
"id": "cb1e1581032b452c9409d6c6813c49d1"
},
"source": [
"### 4. Create the BigtableVectorStore Instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "iKM4BktZR56p",
"metadata": {
"id": "iKM4BktZR56p"
},
"outputs": [],
"source": [
"# Configure the columns for your store.\n",
"content_column = ColumnConfig(\n",
"    column_family=DATA_COLUMN_FAMILY, column_qualifier=\"content\"\n",
")\n",
"embedding_column = ColumnConfig(\n",
"    column_family=DATA_COLUMN_FAMILY, column_qualifier=\"embedding\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "379cbbc1e968416e875cc15c1202d7eb",
"metadata": {
"id": "379cbbc1e968416e875cc15c1202d7eb"
},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableVectorStore\n",
"\n",
"vector_store = await BigtableVectorStore.create(\n",
"    project_id=PROJECT_ID,\n",
"    instance_id=INSTANCE_ID,\n",
"    table_id=TABLE_ID,\n",
"    engine=engine,\n",
"    embedding_service=embeddings,\n",
"    collection=collection_name,\n",
"    metadata_mappings=metadata_mappings,\n",
"    metadata_as_json_column=metadata_as_json_column,\n",
"    content_column=content_column,\n",
"    embedding_column=embedding_column,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "277c27b1587741f2af2001be3712ef0d",
"metadata": {
"id": "277c27b1587741f2af2001be3712ef0d"
},
"source": [
"## Manage vector store"
]
},
{
"cell_type": "markdown",
"id": "db7b79bc585a40fcaf58bf750017e135",
"metadata": {
"id": "db7b79bc585a40fcaf58bf750017e135"
},
"source": [
"### Add Documents\n",
"You can add documents with pre-defined IDs. If a `Document` is added without an `id` attribute, the vector store will automatically generate a **`uuid4` string** for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "916684f9a58a4a2aa5f864670399430d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "916684f9a58a4a2aa5f864670399430d",
"outputId": "eb343088-624a-41a1-94cd-53e0c3cfa207"
},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"docs_to_add = [\n",
"    Document(\n",
"        page_content=\"A young farm boy, Luke Skywalker, is thrust into a galactic conflict.\",\n",
"        id=\"doc_1\",\n",
"        metadata={\n",
"            \"author\": \"George Lucas\",\n",
"            \"year\": 1977,\n",
"            \"category\": \"sci-fi\",\n",
"            \"rating\": 4.8,\n",
"        },\n",
"    ),\n",
"    Document(\n",
"        page_content=\"A hobbit named Frodo Baggins must destroy a powerful ring.\",\n",
"        id=\"doc_2\",\n",
"        metadata={\n",
"            \"author\": \"J.R.R. Tolkien\",\n",
"            \"year\": 1954,\n",
"            \"category\": \"fantasy\",\n",
"            \"rating\": 4.9,\n",
"        },\n",
"    ),\n",
"    # Document without a pre-defined ID; one will be generated.\n",
"    Document(\n",
"        page_content=\"A group of children confront an evil entity emerging from the sewers.\",\n",
"        metadata={\"author\": \"Stephen King\", \"year\": 1986, \"category\": \"horror\"},\n",
"    ),\n",
"    Document(\n",
"        page_content=\"In a distant future, the noble House Atreides rules the desert planet Arrakis.\",\n",
"        id=\"doc_3\",\n",
"        metadata={\n",
"            \"author\": \"Frank Herbert\",\n",
"            \"year\": 1965,\n",
"            \"category\": \"sci-fi\",\n",
"            \"rating\": 4.9,\n",
"        },\n",
"    ),\n",
"]\n",
"\n",
"added_ids = await vector_store.aadd_documents(docs_to_add)\n",
"print(f\"Added documents with IDs: {added_ids}\")"
]
},
{
"cell_type": "markdown",
"id": "1671c31a24314836a5b85d7ef7fbf015",
"metadata": {
"id": "1671c31a24314836a5b85d7ef7fbf015"
},
"source": [
"### Update Documents\n",
"`BigtableVectorStore` handles updates by overwriting. To update a document, simply add it again with the same ID but with new content or metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33b0902fd34d4ace834912fa1002cf8e",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "33b0902fd34d4ace834912fa1002cf8e",
"outputId": "d80f2b01-44df-45d7-9ff5-f77527f04733"
},
"outputs": [],
"source": [
"doc_to_update = [\n",
"    Document(\n",
"        page_content=\"An old hobbit, Frodo Baggins, must take a powerful ring to be destroyed.\",  # Updated content\n",
"        id=\"doc_2\",  # Same ID\n",
"        metadata={\n",
"            \"author\": \"J.R.R. Tolkien\",\n",
"            \"year\": 1954,\n",
"            \"category\": \"epic-fantasy\",\n",
"            \"rating\": 4.9,\n",
"        },  # Updated metadata\n",
"    )\n",
"]\n",
"\n",
"await vector_store.aadd_documents(doc_to_update)\n",
"print(\"Document 'doc_2' has been updated.\")"
]
},
{
"cell_type": "markdown",
"id": "f6fa52606d8c4a75a9b52967216f8f3f",
"metadata": {
"id": "f6fa52606d8c4a75a9b52967216f8f3f"
},
"source": [
"### Delete Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a1fa73e5044315a093ec459c9be902",
"metadata": {
"id": "f5a1fa73e5044315a093ec459c9be902"
},
"outputs": [],
"source": [
"is_deleted = await vector_store.adelete(ids=[\"doc_2\"])"
]
},
{
"cell_type": "markdown",
"id": "cdf66aed5cc84ca1b48e60bad68798a8",
"metadata": {
"id": "cdf66aed5cc84ca1b48e60bad68798a8"
},
"source": [
"## Query vector store"
]
},
{
"cell_type": "markdown",
"id": "28d3efd5258a48a79c179ea5c6759f01",
"metadata": {
"id": "28d3efd5258a48a79c179ea5c6759f01"
},
"source": [
"### Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f9bc0b9dd2c44919cc8dcca39b469f8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3f9bc0b9dd2c44919cc8dcca39b469f8",
"outputId": "dbd5426c-139a-451b-d456-c241cf794aec"
},
"outputs": [],
"source": [
"results = await vector_store.asimilarity_search(\"a story about a powerful ring\", k=1)\n",
"print(results[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "0e382214b5f147d187d36a2058b9c724",
"metadata": {
"id": "0e382214b5f147d187d36a2058b9c724"
},
"source": [
"### Search with Filters\n",
"\n",
"Apply filters before the vector search runs."
]
},
{
"cell_type": "markdown",
"id": "e7f8g9h0-query-header-restored",
"metadata": {
"id": "e7f8g9h0-query-header-restored"
},
"source": [
"#### The kNN Search Algorithm and Filtering\n",
"\n",
"By default, `BigtableVectorStore` uses a **k-Nearest Neighbors (kNN)** search algorithm to find the `k` vectors in the database that are most similar to your query vector. The vector store offers filtering to reduce the search space *before* the kNN search is performed, which can make queries faster and more relevant.\n",
"\n",
"#### Configuring Queries with `QueryParameters`\n",
"\n",
"All search settings are controlled via the `QueryParameters` object. This object allows you to specify not only filters but also other important search aspects:\n",
"* `algorithm`: The search algorithm to use. Defaults to `\"kNN\"`.\n",
"* `distance_strategy`: The metric used for comparison, such as `COSINE` (default) or `EUCLIDEAN`.\n",
"* `vector_data_type`: The data type of the stored vectors, like `FLOAT32` or `DOUBLE64`. This should match the precision of your embeddings.\n",
"* `filters`: A dictionary defining the filtering logic to apply.\n",
"\n",
"#### Understanding Encodings\n",
"\n",
"To filter on metadata fields, you must define them in `metadata_mappings` with the correct `encoding` so Bigtable can properly interpret the data. Supported encodings include:\n",
"* **String**: `UTF8`, `UTF16`, `ASCII` for text-based metadata.\n",
"* **Numeric**: `INT_BIG_ENDIAN` or `INT_LITTLE_ENDIAN` for integers, and `FLOAT` or `DOUBLE` for decimal numbers.\n",
"* **Boolean**: `BOOL` for true/false values."
]
},
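{
"cell_type": "markdown",
"id": "9d2f6c1e8b7a4f03a5c4e1d2b3f60718",
"metadata": {
"id": "9d2f6c1e8b7a4f03a5c4e1d2b3f60718"
},
"source": [
"To illustrate why the declared `encoding` has to match the stored data, here is a minimal sketch using only Python's `struct` module. It is an assumption-laden illustration, not the library's implementation: `encode_int` and `decode_int` are hypothetical helpers, integers are assumed to be 8 bytes wide, and `FLOAT` is assumed to mean a 32-bit float.\n",
"\n",
"```python\n",
"import struct\n",
"\n",
"\n",
"def encode_int(value: int, byte_order: str = \">\") -> bytes:\n",
"    # Hypothetical helper: pack an integer as 8 bytes.\n",
"    # \">\" is big-endian, \"<\" is little-endian.\n",
"    return struct.pack(f\"{byte_order}q\", value)\n",
"\n",
"\n",
"def decode_int(raw: bytes, byte_order: str = \">\") -> int:\n",
"    # Hypothetical helper: unpack 8 bytes into an integer.\n",
"    return struct.unpack(f\"{byte_order}q\", raw)[0]\n",
"\n",
"\n",
"raw_year = encode_int(1977, \">\")\n",
"year_ok = decode_int(raw_year, \">\")  # same byte order on both sides\n",
"year_wrong = decode_int(raw_year, \"<\")  # mismatched byte order\n",
"\n",
"# A 32-bit float cannot hold 4.8 exactly; a 64-bit double round-trips it.\n",
"rating_as_float = struct.unpack(\">f\", struct.pack(\">f\", 4.8))[0]\n",
"rating_as_double = struct.unpack(\">d\", struct.pack(\">d\", 4.8))[0]\n",
"\n",
"print(year_ok == 1977, year_wrong == 1977)\n",
"print(rating_as_float == 4.8, rating_as_double == 4.8)\n",
"```\n",
"\n",
"Reading bytes back with the wrong byte order yields a different number, and a value stored at 32-bit precision may not compare equal to the Python float it came from. Both effects are worth keeping in mind when choosing encodings for fields used in `==` or range filters."
]
},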
{
"cell_type": "markdown",
"id": "5b09d5ef5b5e4bb6ab9b829b10b6a29f",
"metadata": {
"id": "5b09d5ef5b5e4bb6ab9b829b10b6a29f"
},
"source": [
"#### Filtering Support Table\n",
"\n",
"| Filter Category | Key / Operator | Meaning |\n",
"|---|---|---|\n",
"| **Row Key** | `RowKeyFilter` | Narrows search to document IDs with a specific prefix. |\n",
"| **Metadata Key** | `ColumnQualifiers` | Checks for the presence of one or more exact metadata keys. |\n",
"| | `ColumnQualifierPrefix` | Checks if a metadata key starts with a given prefix. |\n",
"| | `ColumnQualifierRegex` | Checks if a metadata key matches a regular expression. |\n",
"| **Metadata Value** | `ColumnValueFilter` | Container for all value-based conditions. |\n",
"| | `==` | Equality |\n",
"| | `!=` | Inequality |\n",
"| | `>` | Greater than |\n",
"| | `<` | Less than |\n",
"| | `>=` | Greater than or equal |\n",
"| | `<=` | Less than or equal |\n",
"| | `in` | Value is in a list. |\n",
"| | `nin` | Value is not in a list. |\n",
"| | `contains` | Checks for substring presence. |\n",
"| | `like` | Performs a regex match on a string. |\n",
"| **Logical**| `ColumnValueChainFilter` | Logical AND for combining value conditions. |\n",
"| | `ColumnValueUnionFilter` | Logical OR for combining value conditions. |"
]
},
{
"cell_type": "markdown",
"id": "a50416e276a0479cbe66534ed1713a40",
"metadata": {
"id": "a50416e276a0479cbe66534ed1713a40"
},
"source": [
"#### Complex Filter Example\n",
"\n",
"This example nests logical filters. It searches for documents where either (`category` is 'sci-fi' AND `year` is strictly between 1970 and 2000) OR (`author` is 'J.R.R. Tolkien')."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46a27a456b804aa2a380d5edf15a5daf",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "46a27a456b804aa2a380d5edf15a5daf",
"outputId": "7679570a-80f6-4342-8380-daecb62d7cf8"
},
"outputs": [],
"source": [
"from langchain_google_bigtable.vector_store import QueryParameters\n",
"\n",
"complex_filter = {\n",
"    \"ColumnValueFilter\": {\n",
"        \"ColumnValueUnionFilter\": {  # OR\n",
"            \"ColumnValueChainFilter\": {  # First AND condition\n",
"                \"category\": {\"==\": \"sci-fi\"},\n",
"                \"year\": {\">\": 1970, \"<\": 2000},\n",
"            },\n",
"            \"author\": {\"==\": \"J.R.R. Tolkien\"},\n",
"        }\n",
"    }\n",
"}\n",
"\n",
"query_params_complex = QueryParameters(filters=complex_filter)\n",
"\n",
"complex_results = await vector_store.asimilarity_search(\n",
"    \"a story about a hero's journey\", k=5, query_parameters=query_params_complex\n",
")\n",
"\n",
"print(f\"Found {len(complex_results)} documents matching the complex filter:\")\n",
"for doc in complex_results:\n",
"    print(f\"- ID: {doc.id}, Metadata: {doc.metadata}\")"
]
},
{
"cell_type": "markdown",
"id": "1944c39560714e6e80c856f20744a8e5",
"metadata": {
"id": "1944c39560714e6e80c856f20744a8e5"
},
"source": [
"### Search with score\n",
"You can also retrieve the distance score along with the documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6ca27006b894b04b6fc8b79396e2797",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "d6ca27006b894b04b6fc8b79396e2797",
"outputId": "32360bd3-7ccb-4ed6-b68a-52788c902049"
},
"outputs": [],
"source": [
"results_with_scores = await vector_store.asimilarity_search_with_score(\n",
"    query=\"an evil entity\", k=1\n",
")\n",
"for doc, score in results_with_scores:\n",
"    print(f\"* [SCORE={score:.4f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "f61877af4e7f4313ad8234302950b331",
"metadata": {
"id": "f61877af4e7f4313ad8234302950b331"
},
"source": [
"### Use as Retriever\n",
"The vector store can be easily used as a retriever in RAG applications. You can specify the search type (e.g., `similarity` or `mmr`) and pass search-time arguments like `k` and `query_parameters`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84d5ab97d17b4c38ab41a2b065bbd0c0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "84d5ab97d17b4c38ab41a2b065bbd0c0",
"outputId": "b33dc07f-08d4-4108-c50d-96dd4e8d719b"
},
"outputs": [],
"source": [
"# Define a filter to use with the retriever\n",
"retriever_filter = {\"ColumnValueFilter\": {\"category\": {\"==\": \"horror\"}}}\n",
"retriever_query_params = QueryParameters(filters=retriever_filter)\n",
"\n",
"retriever = vector_store.as_retriever(\n",
"    search_type=\"mmr\",  # Specify MMR for retrieval\n",
"    search_kwargs={\n",
"        \"k\": 1,\n",
"        \"lambda_mult\": 0.8,\n",
"        \"query_parameters\": retriever_query_params,  # Pass filter parameters\n",
"    },\n",
")\n",
"retrieved_docs = await retriever.ainvoke(\"a story about a hobbit\")\n",
"print(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "35ffc1ce1c7b4df9ace1bc936b8b1dc2",
"metadata": {
"id": "35ffc1ce1c7b4df9ace1bc936b8b1dc2"
},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [Tutorials](https://python.langchain.com/docs/tutorials/rag/)\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
]
},
{
"cell_type": "markdown",
"id": "76127f4a2f6a44fba749ea7800e59d51",
"metadata": {
"id": "76127f4a2f6a44fba749ea7800e59d51"
},
"source": [
"## API reference\n",
"\n",
"For full details on the `BigtableVectorStore` class, see the source code on [GitHub](https://github.com/googleapis/langchain-google-bigtable-python/blob/main/src/langchain_google_bigtable/vector_store.py)."
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -43,9 +43,9 @@
"source": [
"### Prerequisites for using Langchain with Oracle AI Vector Search\n",
"\n",
"You'll need to install `langchain-community` with `pip install -qU langchain-community` to use this integration\n",
"You'll need to install `langchain-oracledb` with `python -m pip install -U langchain-oracledb` to use this integration.\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
"The `python-oracledb` driver is installed automatically as a dependency of langchain-oracledb."
]
},
{
@@ -55,7 +55,7 @@
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
"# python -m pip install -U langchain-oracledb"
]
},
{
@@ -103,8 +103,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import oraclevs\n",
"from langchain_community.vectorstores.oraclevs import OracleVS\n",
"from langchain_oracledb.vectorstores import oraclevs\n",
"from langchain_oracledb.vectorstores.oraclevs import OracleVS\n",
"from langchain_community.vectorstores.utils import DistanceStrategy\n",
"from langchain_core.documents import Document\n",
"from langchain_huggingface import HuggingFaceEmbeddings"
@@ -400,7 +400,111 @@
"id": "7223d048-5c0b-4e91-a91b-a7daa9f86758",
"metadata": {},
"source": [
"### Demonstrate advanced searches on all six vector stores, with and without attribute filtering with filtering, we only select the document id 101 and nothing else"
"### Demonstrate advanced searches on all six vector stores, with and without attribute filtering. With filtering, we select only the document with id 101.\n",
"\n",
"Oracle Database 23ai supports pre-filtering, in-filtering, and post-filtering to enhance AI Vector Search capabilities. These filtering mechanisms allow users to apply constraints before, during, and after performing vector similarity searches, improving search performance and accuracy.\n",
"\n",
"Key points about filtering in Oracle Database 23ai:\n",
"\n",
"1. Pre-filtering\n",
"   - Applies traditional SQL filters to reduce the dataset before performing the vector similarity search.\n",
"   - Helps improve efficiency by limiting the amount of data processed by AI algorithms.\n",
"2. In-filtering\n",
"   - Utilizes AI Vector Search to perform similarity searches directly on vector embeddings, using optimized indexes and algorithms.\n",
"   - Efficiently filters results based on vector similarity without requiring full dataset scans.\n",
"3. Post-filtering\n",
"   - Applies additional SQL filtering to refine the results after the vector similarity search.\n",
"   - Allows further refinement based on business logic or additional metadata conditions.\n",
"\n",
"\n",
"**Why is this Important?**\n",
"- Performance Optimization: Pre-filtering significantly reduces query execution time, making searches on massive datasets more efficient.\n",
"- Accuracy Enhancement: In-filtering ensures that vector searches are semantically meaningful, improving the quality of search results.\n"
]
},
{
"cell_type": "markdown",
"id": "71406bf9",
"metadata": {},
"source": [
"#### Filter Details\n",
"\n",
"`OracleVS` supports a set of filters that can be applied to `metadata` fields using the `filter` parameter. These filters let you select and refine data based on various criteria.\n",
"\n",
"**Available Filter Operators:**\n",
"\n",
"| Operator | Description |\n",
"|--------------------------|--------------------------------------------------------------------------------------------------|\n",
"| \\$exists | Field exists. |\n",
"| \\$eq | Field value equals the operand value (`=`). |\n",
"| \\$ne | Field exists and value does not equal the operand value (`!=`). |\n",
"| \\$gt | Field value is greater than the operand value (`>`). |\n",
"| \\$lt | Field value is less than the operand value (`<`). |\n",
"| \\$gte | Field value is greater than or equal to the operand value (`>=`). |\n",
"| \\$lte | Field value is less than or equal to the operand value (`<=`). |\n",
"| \\$between | Field value is between (or equal to) two values in the operand array. |\n",
"| \\$startsWith | Field value starts with the operand value. |\n",
"| \\$hasSubstring | Field value contains the operand as a substring. |\n",
"| \\$instr | Field value contains the operand as a substring. |\n",
"| \\$regex | Field value matches the given regular expression pattern. |\n",
"| \\$like | Field value matches the operand pattern (using SQL-like syntax). |\n",
"| \\$in | Field value equals at least one value in the operand array. |\n",
"| \\$nin | Field exists, but its value is not equal to any in the operand array, or the field does not exist.|\n",
"| \\$all | Field value is an array containing all items from the operand array, or a scalar matching a single operand. |\n",
"\n",
"You can combine these filters using logical operators:\n",
"\n",
"| Logical Operator | Description |\n",
"|------------------|----------------------|\n",
"| \\$and | Logical AND |\n",
"| \\$or | Logical OR |\n",
"| \\$nor | Logical NOR |\n",
"\n",
"**Example Filter:**\n",
"```json\n",
"{\n",
" \"age\": 65,\n",
" \"name\": {\"$regex\": \".*rk\"},\n",
" \"$or\": [\n",
" {\n",
" \"$and\": [\n",
" {\"name\": \"Jason\"},\n",
" {\"drinks\": {\"$in\": [\"tea\", \"soda\"]}}\n",
" ]\n",
" },\n",
" {\n",
" \"$nor\": [\n",
" {\"age\": {\"$lt\": 65}},\n",
" {\"name\": \"Jason\"}\n",
" ]\n",
" }\n",
" ]\n",
"}\n",
"```\n",
"\n",
"**Additional Usage Tips:**\n",
"- You can omit `$and` when all filters in an object must be satisfied. These two are equivalent:\n",
"```json\n",
"{ \"$and\": [\n",
" { \"name\": { \"$startsWith\": \"Fred\" } },\n",
" { \"salary\": { \"$gt\": 10000, \"$lte\": 20000 } }\n",
"]}\n",
"```\n",
"```json\n",
"{\n",
" \"name\": { \"$startsWith\": \"Fred\" },\n",
" \"salary\": { \"$gt\": 10000, \"$lte\": 20000 }\n",
"}\n",
"```\n",
"- The `$not` clause can negate a comparison operator:\n",
"```json\n",
"{ \"address.zip\": { \"$not\": { \"$eq\": \"90001\" } } }\n",
"```\n",
"- Using `field: scalar` is equivalent to `field: { \"$eq\": scalar }`:\n",
"```json\n",
"{ \"animal\": \"cat\" }\n",
"```\n",
"\n",
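"To make the operator semantics above concrete, here is a minimal pure-Python sketch that evaluates a filter against a single metadata dictionary. It is an illustration only: `matches` and `COMPARISONS` are hypothetical names, only a handful of operators are covered, and it says nothing about how `OracleVS` evaluates filters internally.\n",
"\n",
"```python\n",
"import operator\n",
"\n",
"# Comparison operators covered by this sketch (a subset of the table above).\n",
"COMPARISONS = {\n",
"    \"$eq\": operator.eq,\n",
"    \"$ne\": operator.ne,\n",
"    \"$gt\": operator.gt,\n",
"    \"$gte\": operator.ge,\n",
"    \"$lt\": operator.lt,\n",
"    \"$lte\": operator.le,\n",
"    \"$in\": lambda value, operand: value in operand,\n",
"}\n",
"\n",
"\n",
"def matches(metadata: dict, flt: dict) -> bool:\n",
"    # Hypothetical helper: True if `metadata` satisfies every clause in `flt`.\n",
"    for key, cond in flt.items():\n",
"        if key == \"$and\":\n",
"            ok = all(matches(metadata, sub) for sub in cond)\n",
"        elif key == \"$or\":\n",
"            ok = any(matches(metadata, sub) for sub in cond)\n",
"        elif key == \"$nor\":\n",
"            ok = not any(matches(metadata, sub) for sub in cond)\n",
"        elif key not in metadata:\n",
"            ok = False  # simplification: a missing field never matches\n",
"        else:\n",
"            # `field: scalar` is shorthand for `field: {\"$eq\": scalar}`.\n",
"            ops = cond if isinstance(cond, dict) else {\"$eq\": cond}\n",
"            ok = all(COMPARISONS[op](metadata[key], arg) for op, arg in ops.items())\n",
"        if not ok:\n",
"            return False\n",
"    return True\n",
"\n",
"\n",
"doc = {\"name\": \"Fred Smith\", \"salary\": 15000, \"drinks\": \"tea\"}\n",
"print(matches(doc, {\"salary\": {\"$gt\": 10000, \"$lte\": 20000}}))\n",
"print(matches(doc, {\"$or\": [{\"name\": \"Jason\"}, {\"drinks\": {\"$in\": [\"tea\", \"soda\"]}}]}))\n",
"print(matches(doc, {\"$nor\": [{\"salary\": {\"$lt\": 10000}}, {\"name\": \"Jason\"}]}))\n",
"```\n",
"\n",
"Top-level keys are combined with an implicit AND, which is why the `$and` wrapper can be omitted when every clause must hold.\n",
"\n",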
"For more filter examples, refer to the [test specification](https://github.com/oracle/langchain-oracle/blob/main/libs/oracledb/tests/integration_tests/vectorstores/test_oraclevs.py)."
]
},
{
@@ -415,7 +519,23 @@
" query = \"How are LOBS stored in Oracle Database\"\n",
" # Constructing a filter for direct comparison against document metadata\n",
" # This filter aims to include documents whose metadata 'id' is exactly '101'\n",
" filter_criteria = {\"id\": [\"101\"]} # Direct comparison filter\n",
" db_filter = {\n",
" \"$and\": [\n",
" {\"id\": \"101\"}, # FilterCondition\n",
" {\n",
" \"$or\": [ # FilterGroup\n",
" {\"status\": \"approved\"},\n",
" {\"link\": \"Document Example Test 2\"},\n",
" {\n",
" \"$and\": [ # Nested FilterGroup\n",
" {\"status\": \"approved\"},\n",
" {\"link\": \"Document Example Test 2\"},\n",
" ]\n",
" },\n",
" ]\n",
" },\n",
" ]\n",
" }\n",
"\n",
" for i, vs in enumerate(vector_stores, start=1):\n",
" print(f\"\\n--- Vector Store {i} Advanced Searches ---\")\n",
@@ -425,7 +545,7 @@
"\n",
" # Similarity search with a filter\n",
" print(\"\\nSimilarity search results with filter:\")\n",
" print(vs.similarity_search(query, 2, filter=filter_criteria))\n",
" print(vs.similarity_search(query, 2, filter=db_filter))\n",
"\n",
" # Similarity search with relevance score\n",
" print(\"\\nSimilarity search with relevance score:\")\n",
@@ -433,7 +553,7 @@
"\n",
" # Similarity search with relevance score with filter\n",
" print(\"\\nSimilarity search with relevance score with filter:\")\n",
" print(vs.similarity_search_with_score(query, 2, filter=filter_criteria))\n",
" print(vs.similarity_search_with_score(query, 2, filter=db_filter))\n",
"\n",
" # Max marginal relevance search\n",
" print(\"\\nMax marginal relevance search results:\")\n",
@@ -443,7 +563,7 @@
" print(\"\\nMax marginal relevance search results with filter:\")\n",
" print(\n",
" vs.max_marginal_relevance_search(\n",
" query, 2, fetch_k=20, lambda_mult=0.5, filter=filter_criteria\n",
" query, 2, fetch_k=20, lambda_mult=0.5, filter=db_filter\n",
" )\n",
" )\n",
"\n",
@@ -477,7 +597,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.13.5"
}
},
"nbformat": 4,


@@ -0,0 +1,664 @@
{
"cells": [
{
"cell_type": "raw",
"id": "1957f5cb",
"metadata": {},
"source": [
"---\n",
"sidebar_label: YugabyteDB\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "ef1f0986",
"metadata": {},
"source": [
"# YugabyteDBVectorStore\n",
"\n",
"This notebook covers how to get started with the YugabyteDB vector store in LangChain, using the `langchain-yugabytedb` package.\n",
"\n",
"YugabyteDB is a cloud-native distributed PostgreSQL-compatible database that combines strong consistency with ultra-resilience, seamless scalability, geo-distribution, and highly flexible data locality to deliver business-critical, transactional applications.\n",
"\n",
"[YugabyteDB](https://www.yugabyte.com/ai/) combines the power of the `pgvector` PostgreSQL extension with an inherently distributed architecture. This future-proofed foundation helps you build RAG-based GenAI applications that demand high-performance vector search.\n",
"\n",
"YugabyteDB's approach to vector indexing addresses the limitations of single-node PostgreSQL systems when dealing with large-scale vector datasets.\n",
"\n",
"\n",
"## Setup\n",
"\n",
"### Minimum Version\n",
"The `langchain-yugabytedb` package requires YugabyteDB `v2025.1.0.0` or higher.\n",
"\n",
"### Connecting to YugabyteDB database\n",
"\n",
"To get started with `YugabyteDBVectorStore`, let's start a local YugabyteDB node for development purposes.\n",
"\n",
"### Start a YugabyteDB RF-1 universe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a8147d9",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"# Run the following commands in a terminal, not in the notebook's Python kernel.\n",
"docker run -d --name yugabyte_node01 --hostname yugabyte01 \\\n",
" -p 7000:7000 -p 9000:9000 -p 15433:15433 -p 5433:5433 -p 9042:9042 \\\n",
" yugabytedb/yugabyte:2.25.2.0-b359 bin/yugabyted start --background=false \\\n",
" --master_flags=\"allowed_preview_flags_csv=ysql_yb_enable_advisory_locks,ysql_yb_enable_advisory_locks=true\" \\\n",
" --tserver_flags=\"allowed_preview_flags_csv=ysql_yb_enable_advisory_locks,ysql_yb_enable_advisory_locks=true\"\n",
"\n",
"docker exec -it yugabyte_node01 bin/ysqlsh -h yugabyte01 -c \"CREATE extension vector;\""
]
},
{
"cell_type": "markdown",
"id": "541e4507",
"metadata": {},
"source": [
"For production deployment, performance benchmarking, or a true multi-node, multi-host setup, see [Deploy YugabyteDB](https://docs.yugabyte.com/stable/deploy/)."
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "432f461c",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain\n",
"%pip install --upgrade --quiet langchain-openai langchain-community tiktoken\n",
"%pip install --upgrade --quiet psycopg-binary"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64e28aa6",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU \"langchain-yugabytedb\""
]
},
{
"cell_type": "markdown",
"id": "3f9951f4",
"metadata": {},
"source": [
"### Set your YugabyteDB Values\n",
"\n",
"YugabyteDB clients connect to the cluster using a PostgreSQL-compatible connection string. The connection parameters are set below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4715d61",
"metadata": {},
"outputs": [],
"source": [
"YUGABYTEDB_USER = \"yugabyte\" # @param {type: \"string\"}\n",
"YUGABYTEDB_PASSWORD = \"\" # @param {type: \"string\"}\n",
"YUGABYTEDB_HOST = \"localhost\" # @param {type: \"string\"}\n",
"YUGABYTEDB_PORT = \"5433\" # @param {type: \"string\"}\n",
"YUGABYTEDB_DB = \"yugabyte\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "93df377e",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"### Environment Setup\n",
"\n",
"This notebook uses the OpenAI API through `OpenAIEmbeddings`. We suggest obtaining an OpenAI API key and exporting it as an environment variable named `OPENAI_API_KEY`.\n",
"\n",
"### Connecting to YugabyteDB Universe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_yugabytedb import YBEngine, YugabyteDBVectorStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"TABLE_NAME = \"my_doc_collection\"\n",
"VECTOR_SIZE = 1536\n",
"\n",
"CONNECTION_STRING = (\n",
" f\"postgresql+asyncpg://{YUGABYTEDB_USER}:{YUGABYTEDB_PASSWORD}@{YUGABYTEDB_HOST}\"\n",
" f\":{YUGABYTEDB_PORT}/{YUGABYTEDB_DB}\"\n",
")\n",
"engine = YBEngine.from_connection_string(url=CONNECTION_STRING)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"engine.init_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" vector_size=VECTOR_SIZE,\n",
")\n",
"\n",
"yugabyteDBVectorStore = YugabyteDBVectorStore.create_sync(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" embedding_service=embeddings,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ac6071d4",
"metadata": {},
"source": [
"## Manage vector store\n",
"\n",
"### Add items to vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17f5efc0",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"docs = [\n",
" Document(page_content=\"Apples and oranges\"),\n",
" Document(page_content=\"Cars and airplanes\"),\n",
" Document(page_content=\"Train\"),\n",
"]\n",
"\n",
"yugabyteDBVectorStore.add_documents(docs)"
]
},
{
"cell_type": "markdown",
"id": "7b92b5f6",
"metadata": {},
"source": [
"['b40e7f47-3a4e-4b88-b6e2-cb3465dde6bd', '275823d2-1a47-440d-904b-c07b132fd72b', 'f0c5a9bc-1456-40fe-906b-4e808d601470']"
]
},
{
"cell_type": "markdown",
"id": "dcf1b905",
"metadata": {},
"source": [
"### Delete items from vector store\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef61e188",
"metadata": {},
"outputs": [],
"source": [
"yugabyteDBVectorStore.delete(ids=[\"275823d2-1a47-440d-904b-c07b132fd72b\"])"
]
},
{
"cell_type": "markdown",
"id": "f8f751e1",
"metadata": {},
"source": [
"### Update items from vector store\n",
"\n",
"Note: Update operation is not supported by YugabyteDBVectorStore."
]
},
{
"cell_type": "markdown",
"id": "c3620501",
"metadata": {},
"source": [
"## Query vector store\n",
"\n",
"Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent. \n",
"\n",
"### Query directly\n",
"\n",
"Performing a simple similarity search can be done as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa0a16fa",
"metadata": {},
"outputs": [],
"source": [
"query = \"I'd like a fruit.\"\n",
"docs = yugabyteDBVectorStore.similarity_search(query)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"id": "3ed9d733",
"metadata": {},
"source": [
"To return only the single closest match, pass `k=1`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5efd2eaa",
"metadata": {},
"outputs": [],
"source": [
"query = \"I'd like a fruit.\"\n",
"docs = yugabyteDBVectorStore.similarity_search(query, k=1)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"id": "0c235cdc",
"metadata": {},
"source": [
"### Query by turning into retriever\n",
"\n",
"You can also transform the vector store into a retriever for easier usage in your chains. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3460093",
"metadata": {},
"outputs": [],
"source": [
"retriever = yugabyteDBVectorStore.as_retriever(search_kwargs={\"k\": 1})\n",
"retriever.invoke(\"I'd like a fruit.\")"
]
},
{
"cell_type": "markdown",
"id": "5e657ae6",
"metadata": {},
"source": [
"## ChatMessageHistory\n",
"\n",
"The chat message history abstraction persists chat messages in a YugabyteDB table.\n",
"\n",
"`YugabyteDBChatMessageHistory` is parameterized by a `table_name` and a `session_id`.\n",
"\n",
"The `table_name` is the name of the database table in which the chat messages are stored.\n",
"\n",
"The `session_id` is a unique identifier for the chat session. The caller can generate one with `uuid.uuid4()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0677c927",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain_core.messages import SystemMessage, AIMessage, HumanMessage\n",
"from langchain_yugabytedb import YugabyteDBChatMessageHistory\n",
"import psycopg\n",
"\n",
"# Establish a synchronous connection to the database\n",
"# (or use psycopg.AsyncConnection for async)\n",
"conn_info = \"dbname=yugabyte user=yugabyte host=localhost port=5433\"\n",
"sync_connection = psycopg.connect(conn_info)\n",
"\n",
"# Create the table schema (only needs to be done once)\n",
"table_name = \"chat_history\"\n",
"YugabyteDBChatMessageHistory.create_tables(sync_connection, table_name)\n",
"\n",
"session_id = str(uuid.uuid4())\n",
"\n",
"# Initialize the chat history manager\n",
"chat_history = YugabyteDBChatMessageHistory(\n",
" table_name, session_id, sync_connection=sync_connection\n",
")\n",
"\n",
"# Add messages to the chat history\n",
"chat_history.add_messages(\n",
" [\n",
" SystemMessage(content=\"Meow\"),\n",
" AIMessage(content=\"woof\"),\n",
" HumanMessage(content=\"bark\"),\n",
" ]\n",
")\n",
"\n",
"print(chat_history.messages)"
]
},
{
"cell_type": "markdown",
"id": "901c75dc",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"\n",
"One of the primary advantages of vector stores is that they provide contextual data to LLMs. LLMs are often trained on stale data and may lack the relevant domain-specific knowledge, which results in hallucinations in their responses. Take the following example:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0f51eb3",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"from langchain_openai import ChatOpenAI\n",
"from langchain_core.messages import HumanMessage, SystemMessage, AIMessage\n",
"\n",
"my_api_key = getpass.getpass(\"Enter your API Key: \")\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0.7, api_key=my_api_key)\n",
"# Start with a system message to set the persona/behavior of the AI\n",
"messages = [\n",
"    SystemMessage(\n",
"        content=\"You are a helpful and friendly assistant named 'YugaAI'. You love to answer questions about YugabyteDB and distributed sql.\"\n",
"    ),\n",
"    # First human turn\n",
"    HumanMessage(content=\"Hi YugaAI! Where's the headquarters of YugabyteDB?\"),\n",
"]\n",
"\n",
"print(\"--- First Interaction ---\")\n",
"print(f\"Human: {messages[1].content}\") # Print the human message\n",
"response1 = llm.invoke(messages)\n",
"print(f\"YugaAI: {response1.content}\")\n",
"\n",
"# Second human turn\n",
"messages.append(HumanMessage(content=\"And what are YugabyteDB's supported APIs?\"))\n",
"\n",
"print(\"\\n--- Second Interaction ---\")\n",
"print(f\"Human: {messages[2].content}\") # Print the new human message\n",
"response2 = llm.invoke(messages) # Send the *entire* message history\n",
"print(f\"YugaAI: {response2.content}\")\n",
"\n",
"# Add the second AI response to the history\n",
"messages.append(AIMessage(content=response2.content))\n",
"\n",
"# --- 5. Another Turn with a different topic ---\n",
"messages.append(\n",
" HumanMessage(\n",
" content=\"Can you tell me the current preview release version of YugabyteDB?\"\n",
" )\n",
")\n",
"\n",
"print(\"\\n--- Third Interaction ---\")\n",
"print(f\"Human: {messages[4].content}\") # Print the new human message\n",
"response3 = llm.invoke(messages) # Send the *entire* message history\n",
"print(f\"YugaAI: {response3.content}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8500e27b",
"metadata": {
"vscode": {
"languageId": "Log"
}
},
"outputs": [],
"source": [
"--- First Interaction ---\n",
"Human: Hi YugaAI! Where's the headquarters of YugabyteDB?\n",
"YugaAI: Hello! YugabyteDB's headquarters is located in Sunnyvale, California, USA.\n",
"\n",
"--- Second Interaction ---\n",
"Human: And what are YugabyteDB's supported APIs?\n",
"YugaAI: YugabyteDB's headquarters is located in Sunnyvale, California, USA.\n",
"\n",
"YugabyteDB supports several APIs, including:\n",
"1. YSQL (PostgreSQL-compatible SQL)\n",
"2. YCQL (Cassandra-compatible query language)\n",
"3. YEDIS (Redis-compatible key-value store)\n",
"\n",
"These APIs allow developers to interact with YugabyteDB using familiar interfaces and tools.\n",
"\n",
"--- Third Interaction ---\n",
"Human: Can you tell me the current preview release version of YugabyteDB?\n",
"YugaAI: The current preview release version of YugabyteDB is 2.11.0. This version includes new features, improvements, and bug fixes that are being tested by the community before the official stable release."
]
},
{
"cell_type": "markdown",
"id": "7fe4301c",
"metadata": {},
"source": [
"The current preview release of YugabyteDB is `v2.25.2.0`, but the LLM returns stale information that is 2-3 years old. This is where vector stores complement LLMs: they provide a way to store and retrieve relevant information."
]
},
{
"cell_type": "markdown",
"id": "d4e19220",
"metadata": {},
"source": [
"### Construct a RAG for providing contextual information\n",
"\n",
"We will provide the relevant information to the LLM by reading the YugabyteDB documentation. Let's first load the YugabyteDB docs from an HTML source, split them into chunks, and then store the vector embeddings generated by OpenAI embeddings in the YugabyteDB vector store.\n",
"\n",
"#### Generate Embeddings "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05ec4ebc",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_text_splitters import CharacterTextSplitter\n",
"from langchain_yugabytedb import YBEngine, YugabyteDBVectorStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"my_api_key = getpass.getpass(\"Enter your API Key: \")\n",
"url = \"https://docs.yugabyte.com/preview/releases/ybdb-releases/v2.25/\"\n",
"\n",
"loader = WebBaseLoader(url)\n",
"\n",
"documents = loader.load()\n",
"\n",
"print(f\"Number of documents loaded: {len(documents)}\")\n",
"\n",
"# For very large HTML files, you'll want to split the text into smaller\n",
"# chunks before sending them to an LLM, as LLMs have token limits.\n",
"text_splitter = CharacterTextSplitter(\n",
"    separator=\"\\n\\n\",  # Split by double newline (common paragraph separator)\n",
"    chunk_size=1000,  # Each chunk will aim for 1000 characters\n",
"    chunk_overlap=200,  # Allow 200 characters overlap between chunks\n",
"    length_function=len,\n",
"    is_separator_regex=False,\n",
")\n",
"\n",
"# Apply the splitter to all loaded documents at once\n",
"chunks = text_splitter.split_documents(documents)\n",
"\n",
"print(f\"\\n--- After Splitting ({len(chunks)} chunks) ---\")\n",
"\n",
"CONNECTION_STRING = \"postgresql+psycopg://yugabyte:@localhost:5433/yugabyte\"\n",
"TABLE_NAME = \"yb_relnotes_chunks\"\n",
"VECTOR_SIZE = 1536\n",
"engine = YBEngine.from_connection_string(url=CONNECTION_STRING)\n",
"engine.init_vectorstore_table(\n",
"    table_name=TABLE_NAME,\n",
"    vector_size=VECTOR_SIZE,\n",
")\n",
"embeddings = OpenAIEmbeddings(api_key=my_api_key)\n",
"\n",
"# YugabyteDBVectorStore.from_documents generates an embedding for each chunk\n",
"# using the provided embeddings model, then inserts the chunk text, metadata,\n",
"# and embedding into the table created above.\n",
"vectorstore = YugabyteDBVectorStore.from_documents(\n",
"    engine=engine, table_name=TABLE_NAME, documents=chunks, embedding=embeddings\n",
")\n",
"\n",
"print(f\"Successfully stored {len(chunks)} chunks in YugabyteDB table: {TABLE_NAME}\")"
]
},
{
"cell_type": "markdown",
"id": "e6483d89",
"metadata": {},
"source": [
"#### Configure the YugabyteDB retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18a84445",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"retriever = vectorstore.as_retriever(search_kwargs={\"k\": 3})\n",
"print(\n",
" f\"Retriever created, set to retrieve top {retriever.search_kwargs['k']} documents.\"\n",
")\n",
"\n",
"\n",
"# Initialize the Chat Model (e.g., OpenAI's GPT-3.5 Turbo)\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0, api_key=my_api_key)\n",
"\n",
"# Define the RAG prompt template\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful and friendly assistant named 'YugaAI'. You love to answer questions about YugabyteDB and distributed sql.\",\n",
" ),\n",
" (\"human\", \"Context: {context}\\nQuestion: {question}\"),\n",
" ]\n",
")\n",
"# Build the RAG chain\n",
"# 1. Take the input question.\n",
"# 2. Pass it to the retriever to get relevant documents.\n",
"# 3. Format the documents into a string for the context.\n",
"# 4. Pass the context and question to the prompt template.\n",
"# 5. Send the prompt to the LLM.\n",
"# 6. Parse the LLM's string output.\n",
"rag_chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "markdown",
"id": "04d12dc5",
"metadata": {},
"source": [
"Now, let's ask the same question, `Can you tell me the current preview release version of YugabyteDB?`, again, this time through the RAG chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "846e9963",
"metadata": {},
"outputs": [],
"source": [
"# Invoke the RAG chain with a question\n",
"rag_query = \"Can you tell me the current preview release version of YugabyteDB?\"\n",
"print(f\"\\nQuerying RAG chain: '{rag_query}'\")\n",
"rag_response = rag_chain.invoke(rag_query)\n",
"print(\"\\n--- RAG Chain Response ---\")\n",
"print(rag_response)"
]
},
{
"cell_type": "markdown",
"id": "c6fc3e7f",
"metadata": {},
"source": [
"Querying RAG chain: 'Can you tell me the current preview release version of YugabyteDB?'\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "efc2e0c7",
"metadata": {
"vscode": {
"languageId": "log"
}
},
"outputs": [],
"source": [
"--- RAG Chain Response ---\n",
"The current preview release version of YugabyteDB is v2.25.2.0."
]
},
{
"cell_type": "markdown",
"id": "af388f24",
"metadata": {},
"source": [
"## API reference\n",
" \n",
"For detailed information on all YugabyteDBVectorStore features and configurations, head to the langchain-yugabytedb GitHub repo: https://github.com/yugabyte/langchain-yugabytedb"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
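
The YugabyteDB notebook above assembles its connection string with an f-string. That works for the empty default password, but a credential containing characters such as `@` or `/` may be misparsed unless it is percent-encoded. Below is a minimal sketch of one way to guard against that, assuming the SQLAlchemy-style URL format used in the notebook. The helper `build_connection_string` and the sample password are hypothetical and are not part of `langchain-yugabytedb`.

```python
from urllib.parse import quote_plus


def build_connection_string(
    user: str,
    password: str,
    host: str,
    port: str,
    database: str,
    driver: str = "postgresql+asyncpg",
) -> str:
    """Assemble a SQLAlchemy-style URL, percent-encoding the credentials.

    Hypothetical helper for illustration only.
    """
    credentials = quote_plus(user)
    if password:
        # Encode reserved characters such as '@' and '/' in the password.
        credentials += ":" + quote_plus(password)
    return f"{driver}://{credentials}@{host}:{port}/{database}"


# Sample values; the password is made up to show the escaping.
url = build_connection_string("yugabyte", "p@ss/word", "localhost", "5433", "yugabyte")
print(url)
```

The resulting string could then be passed as `url=` to `YBEngine.from_connection_string`, as in the notebook.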

View File

@@ -0,0 +1,617 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ef1f0986",
"metadata": {},
"source": [
"# ⚡ ZeusDB Vector Store\n",
"\n",
"ZeusDB is a high-performance, Rust-powered vector database with enterprise features like quantization, persistence and logging.\n",
"\n",
"This notebook covers how to get started with the ZeusDB Vector Store to efficiently use ZeusDB with LangChain."
]
},
{
"cell_type": "markdown",
"id": "107c485d-13a3-4309-9fda-5a0440862d3c",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "36fdc060",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "d978e3fd-d130-436f-841d-d133c0fae8fb",
"metadata": {},
"source": [
"Install the ZeusDB LangChain integration package from PyPI:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42ca8320-b866-4f37-944e-96eda54231d2",
"metadata": {},
"outputs": [],
"source": [
"pip install -qU langchain-zeusdb"
]
},
{
"cell_type": "markdown",
"id": "2a0e518a-ae8a-464b-8b47-9deb9d4ab063",
"metadata": {},
"source": [
"*Setup in Jupyter Notebooks*"
]
},
{
"cell_type": "markdown",
"id": "1d092ea6-8553-4686-9563-b8318225a04a",
"metadata": {},
"source": [
"> 💡 Tip: If you're working inside Jupyter or Google Colab, use the `%pip` magic command so the package is installed into the active kernel:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64e28aa6",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-zeusdb"
]
},
{
"cell_type": "markdown",
"id": "c12fe175-a299-47d3-869f-9367b6aa572d",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "31554e69-40b2-4201-9f92-57e73ac66d33",
"metadata": {},
"source": [
"## Getting Started"
]
},
{
"cell_type": "markdown",
"id": "b696b3dd-0fed-4ed2-a79a-5b32598508c0",
"metadata": {},
"source": [
"This example uses `OpenAIEmbeddings`, which requires an OpenAI API key. [Get your OpenAI API key here](https://platform.openai.com/api-keys)."
]
},
{
"cell_type": "markdown",
"id": "2b79766e-7725-4be0-a183-4947b56892c5",
"metadata": {},
"source": [
"If you prefer, you can also use this package with any other embedding provider (Hugging Face, Cohere, custom functions, etc.)."
]
},
{
"cell_type": "markdown",
"id": "b5266cc7-28da-459e-a28d-128382ed5a20",
"metadata": {},
"source": [
"Install the LangChain OpenAI integration package from PyPI:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ed941cd-5e06-4c61-9235-90bd0b0b0452",
"metadata": {},
"outputs": [],
"source": [
"pip install -qU langchain-openai\n",
"\n",
"# Use this command if inside Jupyter Notebooks\n",
"#%pip install -qU langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "0f49b2ec-d047-455d-8c05-da041112dd8a",
"metadata": {},
"source": [
"#### Please choose an option below for your OpenAI key integration"
]
},
{
"cell_type": "markdown",
"id": "ed2d9bf6-be53-4fc1-9611-158f03fd71b7",
"metadata": {},
"source": [
"*Option 1: 🔑 Enter your API key each time* "
]
},
{
"cell_type": "markdown",
"id": "eff5b6a5-4c57-4531-896e-54bcb2b1dec2",
"metadata": {},
"source": [
"Use getpass in Jupyter to securely input your key for the current session:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08a50da9-5ed1-40dc-a390-07b031369761",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "markdown",
"id": "7321917e-8586-42e4-9822-b68cfd74f233",
"metadata": {},
"source": [
"*Option 2: 🗂️ Use a .env file*"
]
},
{
"cell_type": "markdown",
"id": "b9297b6b-bd7e-457f-95af-5b41c7ab9b41",
"metadata": {},
"source": [
"Keep your key in a local `.env` file and load it automatically with `python-dotenv`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85a139dc-f439-4e4e-bc46-76d9478c304d",
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv() # reads .env and sets OPENAI_API_KEY"
]
},
{
"cell_type": "markdown",
"id": "1af364e3-df59-4963-aaaa-0e83f6ec5e32",
"metadata": {},
"source": [
"🎉🎉 That's it! You are good to go."
]
},
{
"cell_type": "markdown",
"id": "3146180e-026e-4421-a490-ffd14ceabac3",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "93df377e",
"metadata": {},
"source": [
"## Initialization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb55dfe8-2c98-45b6-ba90-7a3667ceee0c",
"metadata": {},
"outputs": [],
"source": [
"# Import required Packages and Classes\n",
"from langchain_zeusdb import ZeusDBVectorStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from zeusdb import VectorDatabase"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Initialize embeddings\n",
"embeddings = OpenAIEmbeddings(model=\"text-embedding-3-small\")\n",
"\n",
"# Create ZeusDB index\n",
"vdb = VectorDatabase()\n",
"index = vdb.create(index_type=\"hnsw\", dim=1536, space=\"cosine\")\n",
"\n",
"# Create vector store\n",
"vector_store = ZeusDBVectorStore(zeusdb_index=index, embedding=embeddings)"
]
},
{
"cell_type": "markdown",
"id": "f45fa43c-8b54-4a75-b7b0-92ac0ac506c6",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "ac6071d4",
"metadata": {},
"source": [
"## Manage vector store"
]
},
{
"cell_type": "markdown",
"id": "edf53787-ebda-4306-afc3-f7d440dcb1ff",
"metadata": {},
"source": [
"### 2.1 Add items to vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17f5efc0",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"\n",
"document_1 = Document(\n",
" page_content=\"ZeusDB is a high-performance vector database\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"document_2 = Document(\n",
" page_content=\"Product Quantization reduces memory usage significantly\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"document_3 = Document(\n",
" page_content=\"ZeusDB integrates seamlessly with LangChain\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\"},\n",
")\n",
"\n",
"documents = [document_1, document_2, document_3]\n",
"\n",
"vector_store.add_documents(documents=documents, ids=[\"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "markdown",
"id": "c738c3e0",
"metadata": {},
"source": [
"### 2.2 Update items in vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0aa8b71",
"metadata": {},
"outputs": [],
"source": [
"updated_document = Document(\n",
" page_content=\"ZeusDB now supports advanced Product Quantization with 4x-256x compression\",\n",
" metadata={\"source\": \"https://docs.zeusdb.com\", \"updated\": True},\n",
")\n",
"\n",
"vector_store.add_documents([updated_document], ids=[\"1\"])"
]
},
{
"cell_type": "markdown",
"id": "dcf1b905",
"metadata": {},
"source": [
"### 2.3 Delete items from vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef61e188",
"metadata": {},
"outputs": [],
"source": [
"vector_store.delete(ids=[\"3\"])"
]
},
{
"cell_type": "markdown",
"id": "1a0091af-777d-4651-888a-3b346d7990f5",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "c3620501",
"metadata": {},
"source": [
"## Query vector store"
]
},
{
"cell_type": "markdown",
"id": "4ba3fdb2-b7d6-4f0f-b8c9-91f63596018b",
"metadata": {},
"source": [
"### 3.1 Query directly"
]
},
{
"cell_type": "markdown",
"id": "400a9b25-9587-4116-ab59-6888602ec2b1",
"metadata": {},
"source": [
"Performing a simple similarity search:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa0a16fa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search(query=\"high performance database\", k=2)\n",
"\n",
"for doc in results:\n",
" print(f\"* {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "3ed9d733",
"metadata": {},
"source": [
"If you want to execute a similarity search and receive the corresponding scores:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5efd2eaa",
"metadata": {},
"outputs": [],
"source": [
"results = vector_store.similarity_search_with_score(query=\"memory optimization\", k=2)\n",
"\n",
"for doc, score in results:\n",
" print(f\"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "0c235cdc",
"metadata": {},
"source": [
"### 3.2 Query by turning into retriever"
]
},
{
"cell_type": "markdown",
"id": "59292cb5-5dc8-4158-9137-89d0f6ca711d",
"metadata": {},
"source": [
"You can also transform the vector store into a retriever for easier usage in your chains:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3460093",
"metadata": {},
"outputs": [],
"source": [
"retriever = vector_store.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n",
"\n",
"retriever.invoke(\"vector database features\")"
]
},
{
"cell_type": "markdown",
"id": "cc2d2b63-99d8-45c4-85e6-6a9409551ada",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "persistence_section",
"metadata": {},
"source": [
"## ZeusDB-Specific Features"
]
},
{
"cell_type": "markdown",
"id": "memory_section",
"metadata": {},
"source": [
"### 4.1 Memory-Efficient Setup with Product Quantization"
]
},
{
"cell_type": "markdown",
"id": "12832d02-d9ea-4c35-a20f-05c85d1d7723",
"metadata": {},
"source": [
"For large datasets, use Product Quantization to reduce memory usage:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "quantization_example",
"metadata": {},
"outputs": [],
"source": [
"# Create memory-optimized vector store\n",
"quantization_config = {\"type\": \"pq\", \"subvectors\": 8, \"bits\": 8, \"training_size\": 10000}\n",
"\n",
"vdb_quantized = VectorDatabase()\n",
"quantized_index = vdb_quantized.create(\n",
" index_type=\"hnsw\", dim=1536, quantization_config=quantization_config\n",
")\n",
"\n",
"quantized_vector_store = ZeusDBVectorStore(\n",
" zeusdb_index=quantized_index, embedding=embeddings\n",
")\n",
"\n",
"print(f\"Created quantized store: {quantized_index.info()}\")"
]
},
{
"cell_type": "markdown",
"id": "6ffe0613-b2a7-484e-9219-1166b65c49c5",
"metadata": {},
"source": [
"### 4.2 Persistence"
]
},
{
"cell_type": "markdown",
"id": "fbc323ee-4c6c-43fc-beba-675d820ca078",
"metadata": {},
"source": [
"Save and load your vector store to disk:"
]
},
{
"cell_type": "markdown",
"id": "834354d1-55ad-48fe-84e1-a5eacff3f6bb",
"metadata": {},
"source": [
"How to Save your vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9d1332b-a7ac-4a4b-a060-f2061599d3f1",
"metadata": {},
"outputs": [],
"source": [
"# Save the vector store\n",
"vector_store.save_index(\"my_zeusdb_index.zdb\")"
]
},
{
"cell_type": "markdown",
"id": "23370621-5b51-4313-800f-3a2fb9de52d2",
"metadata": {},
"source": [
"How to Load your vector store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9ed5778-58e4-4724-b69d-3c7b48cda429",
"metadata": {},
"outputs": [],
"source": [
"# Load the vector store\n",
"loaded_store = ZeusDBVectorStore.load_index(\n",
" path=\"my_zeusdb_index.zdb\", embedding=embeddings\n",
")\n",
"\n",
"print(f\"Loaded store with {loaded_store.get_vector_count()} vectors\")"
]
},
{
"cell_type": "markdown",
"id": "610cfe63-d4a8-4ef0-88a8-cf9cc3cbbfce",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "901c75dc",
"metadata": {},
"source": [
"## Usage for retrieval-augmented generation\n",
"\n",
"For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
"\n",
"- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n",
"- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval/)"
]
},
{
"cell_type": "markdown",
"id": "1d9d9d51-3798-410f-b1b3-f9736ea8c238",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "25b08eb0-99ab-4919-a201-5243fdfa39e9",
"metadata": {},
"source": [
"## API reference"
]
},
{
"cell_type": "markdown",
"id": "77fdca8b-f75e-4100-9f1d-7a017567dc59",
"metadata": {},
"source": [
"For detailed documentation of all ZeusDBVectorStore features and configurations head to the Doc reference: https://docs.zeusdb.com/en/latest/vector_database/integrations/langchain.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
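
The ZeusDB notebook above builds its retriever with `search_type="mmr"`, which requests maximal marginal relevance (MMR): results are chosen to be relevant to the query and also different from one another. The following is a minimal sketch of the general technique in plain Python. It is not ZeusDB's or LangChain's implementation; the function names `cosine` and `mmr` and the toy 2-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: list[float],
    candidates: list[list[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Return indices of k candidates chosen by maximal marginal relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected item (0.0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # 0: very close to the query
    [1.0, 0.12],  # 1: near-duplicate of candidate 0
    [1.0, -0.3],  # 2: slightly less relevant, but points a different way
]
print(mmr(query, candidates, k=2))                   # [0, 2]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With the default weighting the near-duplicate is skipped in favor of the more distinct candidate, while `lambda_mult=1.0` reduces to a plain similarity ranking.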

View File

@@ -3,6 +3,10 @@ sidebar_position: 0
sidebar_class_name: hidden
---
:::danger
⚠️ THESE DOCS ARE OUTDATED. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank'>Visit the new v1.0 docs</a>
:::
# Introduction
**LangChain** is a framework for developing applications powered by large language models (LLMs).

View File

@@ -691,7 +691,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"id": "a13462d0-2d02-4474-921e-15a1ba1fa274",
"metadata": {},
"outputs": [
@@ -709,16 +709,15 @@
}
],
"source": [
"input_message = {\"role\": \"user\", \"content\": \"Hi, I'm Bob!\"}\n",
"for step in agent_executor.stream(\n",
" {\"messages\": [input_message]}, config, stream_mode=\"values\"\n",
" {\"messages\": [(\"user\", \"Hi, I'm Bob!\")]}, config, stream_mode=\"values\"\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"id": "56d8028b-5dbc-40b2-86f5-ed60631d86a3",
"metadata": {},
"outputs": [
@@ -736,9 +735,8 @@
}
],
"source": [
"input_message = {\"role\": \"user\", \"content\": \"What's my name?\"}\n",
"for step in agent_executor.stream(\n",
" {\"messages\": [input_message]}, config, stream_mode=\"values\"\n",
" {\"messages\": [(\"user\", \"What is my name?\")]}, config, stream_mode=\"values\"\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]
@@ -761,7 +759,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"id": "24460239",
"metadata": {},
"outputs": [
@@ -782,9 +780,8 @@
"# highlight-next-line\n",
"config = {\"configurable\": {\"thread_id\": \"xyz123\"}}\n",
"\n",
"input_message = {\"role\": \"user\", \"content\": \"What's my name?\"}\n",
"for step in agent_executor.stream(\n",
" {\"messages\": [input_message]}, config, stream_mode=\"values\"\n",
" {\"messages\": [(\"user\", \"What is my name?\")]}, config, stream_mode=\"values\"\n",
"):\n",
" step[\"messages\"][-1].pretty_print()"
]

View File

@@ -2,6 +2,11 @@
sidebar_position: 0
sidebar_class_name: hidden
---
:::danger
⚠️ THESE DOCS ARE OUTDATED. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank'>Visit the new v1.0 docs</a>
:::
# Tutorials
New to LangChain or LLM app development in general? Read this material to quickly get up and running building your first applications.

View File

@@ -1,6 +1,6 @@
# LangChain v0.3
*Last updated: 09.16.24*
*Last updated: 09.16.2024*
## What's changed

View File

@@ -87,7 +87,7 @@ const config = {
({
docs: {
editUrl:
"https://github.com/langchain-ai/langchain/edit/master/docs/",
"https://github.com/langchain-ai/langchain/edit/v0.3/docs/",
sidebarPath: require.resolve("./sidebars.js"),
remarkPlugins: [
[require("@docusaurus/remark-plugin-npm2yarn"), { sync: true }],
@@ -142,8 +142,8 @@ const config = {
respectPrefersColorScheme: true,
},
announcementBar: {
content: "These docs will be deprecated and no longer maintained with the release of LangChain v1.0 in October 2025. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank'>Visit the v1.0 alpha docs</a>",
backgroundColor: "#FFAE42",
content: "⚠️ THESE DOCS ARE OUTDATED. <a href='https://docs.langchain.com/oss/python/langchain/overview' target='_blank'>Visit the new v1.0 docs</a>",
backgroundColor: "#790000ff",
},
prism: {
theme: {

View File

@@ -16,14 +16,13 @@ fi
if { \
[ "$VERCEL_ENV" == "production" ] || \
[ "$VERCEL_GIT_COMMIT_REF" == "master" ] || \
[ "$VERCEL_GIT_COMMIT_REF" == "v0.1" ] || \
[ "$VERCEL_GIT_COMMIT_REF" == "v0.2" ] || \
[ "$VERCEL_GIT_COMMIT_REF" == "v0.3rc" ]; \
} && [ "$VERCEL_GIT_REPO_OWNER" == "langchain-ai" ]
then
echo "✅ Production build - proceeding with build"
exit 1
echo "✅ Production build - proceeding with build"
exit 1
fi

View File

@@ -1,4 +1,20 @@
"""This script checks documentation for broken import statements."""
"""Check documentation for broken import statements.
Validates that all import statements in Jupyter notebooks within the documentation
directory are functional and can be successfully imported.
- Scans all `.ipynb` files in `docs/`
- Extracts import statements from code cells
- Tests each import to ensure it works
- Reports any broken imports that would fail for users
Usage:
python docs/scripts/check_imports.py
Exit codes:
0: All imports are valid
1: Found broken imports (ImportError raised)
"""
import importlib
import json
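
The new docstring describes three steps: scan notebooks, extract import statements from code cells, and test each import. A minimal sketch of the last two steps follows, assuming notebooks are standard nbformat JSON. The helper names `extract_imports` and `find_broken_imports` are hypothetical and are not taken from `check_imports.py`; the sketch checks module names only, not individual imported attributes.

```python
import ast
import importlib
import json


def extract_imports(notebook_json: str) -> list[str]:
    """Return module names imported in a notebook's code cells (hypothetical helper)."""
    notebook = json.loads(notebook_json)
    modules = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells with IPython magics (e.g. %pip) are not valid Python; skip them.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.extend(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.append(node.module)
    return modules


def find_broken_imports(modules: list[str]) -> list[str]:
    """Try importing each module and collect the ones that raise ImportError."""
    broken = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            broken.append(name)
    return broken
```

A caller could walk `docs/` for `.ipynb` files, pass each file's text to `extract_imports`, and exit with status 1 when `find_broken_imports` returns a non-empty list, which matches the exit codes listed in the docstring.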

@@ -57,8 +57,8 @@ SEARCH_TOOL_FEAT_TABLE = {
"available_data": "URL, Snippet, Title, Search Rank, Site Links, Authors",
"link": "/docs/integrations/tools/searchapi",
},
"SerpAPI": {
"pricing": "100 Free Searches/Month",
"SerpApi": {
"pricing": "250 Free Searches/Month",
"available_data": "Answer",
"link": "/docs/integrations/tools/serpapi",
},

@@ -204,7 +204,7 @@ def get_vectorstore_table():
"similarity_search_with_score": True,
"asearch": True,
"Passes Standard Tests": True,
"Multi Tenancy": False,
"Multi Tenancy": True,
"Local/Cloud": "Local",
"IDs in add Documents": True,
},

@@ -119,7 +119,7 @@ export default function ChatModelTabs(props) {
value: "anthropic",
label: "Anthropic",
model: "claude-3-7-sonnet-20250219",
comment: "# Note: Model versions may become outdated. Check https://docs.anthropic.com/en/docs/models-overview for latest versions",
comment: "# Note: Model versions may become outdated. Check https://docs.anthropic.com/en/docs/about-claude/models/overview for latest versions",
apiKeyName: "ANTHROPIC_API_KEY",
packageName: "langchain[anthropic]",
},
@@ -239,6 +239,13 @@ ${llmVarName} = ChatWatsonx(
model: "deepseek-chat",
apiKeyName: "DEEPSEEK_API_KEY",
packageName: "langchain-deepseek",
},
{
value: "chatocigenai",
label: "ChatOCIGenAI",
model: "cohere.command-r-plus-08-2024",
apiKeyName: "OCI_API_KEY",
packageName: "langchain-oci",
}
].map((item) => ({
...item,

@@ -34,6 +34,7 @@ export default function EmbeddingTabs(props) {
fakeEmbeddingParams,
hideFakeEmbedding,
customVarName,
hideOCIGenAIEmbeddings
} = props;
const openAIParamsOrDefault = openaiParams ?? `model="text-embedding-3-large"`;
@@ -183,6 +184,15 @@ export default function EmbeddingTabs(props) {
default: false,
shouldHide: hideFakeEmbedding,
},
{
value: "OCIGenAIEmbeddings",
label: "OCIGenAIEmbeddings",
text: `from langchain_oci.embeddings import OCIGenAIEmbeddings`,
apiKeyName: "OCI_API_KEY",
packageName: "langchain-oci",
default: false,
shouldHide: hideOCIGenAIEmbeddings,
},
];
const modelOptions = tabItems

@@ -39,6 +39,17 @@ const FEATURE_TABLES = {
"local": false,
"apiLink": "https://python.langchain.com/api_reference/mistralai/chat_models/langchain_mistralai.chat_models.ChatMistralAI.html"
},
{
"name": "ChatAIMLAPI",
"package": "langchain-aimlapi",
"link": "aimlapi/",
"structured_output": true,
"tool_calling": true,
"json_mode": true,
"multimodal": true,
"local": false,
"apiLink": "https://python.langchain.com/api_reference/aimlapi/chat_models/langchain_aimlapi.chat_models.ChatAIMLAPI.html"
},
{
"name": "ChatFireworks",
"package": "langchain-fireworks",
@@ -247,6 +258,17 @@ const FEATURE_TABLES = {
"multimodal": true,
"local": false,
"apiLink": "https://python.langchain.com/api_reference/perplexity/chat_models/langchain_perplexity.chat_models.ChatPerplexity.html"
},
{
"name": "ChatOCIGenAI",
"package": "langchain-oci",
"link": "oci_generative_ai",
"structured_output": true,
"tool_calling": true,
"json_mode": true,
"multimodal": true,
"local": false,
"apiLink": "https://github.com/oracle/langchain-oracle"
}
],
},
@@ -301,6 +323,12 @@ const FEATURE_TABLES = {
package: "langchain-fireworks",
apiLink: "https://python.langchain.com/api_reference/fireworks/llms/langchain_fireworks.llms.Fireworks.html"
},
{
name: "AimlapiLLM",
link: "aimlapi",
package: "langchain-aimlapi",
apiLink: "https://python.langchain.com/api_reference/aimlapi/llms/langchain_aimlapi.llms.AimlapiLLM.html"
},
{
name: "OllamaLLM",
link: "ollama",
@@ -382,6 +410,12 @@ const FEATURE_TABLES = {
package: "langchain-fireworks",
apiLink: "https://python.langchain.com/api_reference/fireworks/embeddings/langchain_fireworks.embeddings.FireworksEmbeddings.html"
},
{
name: "AI/ML API",
link: "/docs/integrations/text_embedding/aimlapi",
package: "langchain-aimlapi",
apiLink: "https://python.langchain.com/api_reference/aimlapi/embeddings/langchain_aimlapi.embeddings.AimlapiEmbeddings.html"
},
{
name: "MistralAI",
link: "/docs/integrations/text_embedding/mistralai",
@@ -418,6 +452,13 @@ const FEATURE_TABLES = {
package: "langchain-nvidia",
apiLink: "https://python.langchain.com/api_reference/nvidia_ai_endpoints/embeddings/langchain_nvidia_ai_endpoints.embeddings.NVIDIAEmbeddings.html"
},
{
name: "OCIGenAIEmbeddings",
link: "oci_generative_ai",
package: "langchain-oci",
apiLink: "https://github.com/oracle/langchain-oracle"
}
]
},
document_retrievers: {
@@ -1158,7 +1199,7 @@ const FEATURE_TABLES = {
searchWithScore: true,
async: true,
passesStandardTests: false,
multiTenancy: false,
multiTenancy: true,
local: true,
idsInAddDocuments: true,
},
@@ -1189,17 +1230,17 @@ const FEATURE_TABLES = {
idsInAddDocuments: true,
},
{
name: "PGVectorStore",
link: "pgvectorstore",
deleteById: true,
filtering: true,
searchByVector: true,
searchWithScore: true,
async: true,
passesStandardTests: true,
multiTenancy: false,
local: true,
idsInAddDocuments: true,
name: "PGVectorStore",
link: "pgvectorstore",
deleteById: true,
filtering: true,
searchByVector: true,
searchWithScore: true,
async: true,
passesStandardTests: true,
multiTenancy: false,
local: true,
idsInAddDocuments: true,
},
{
name: "PineconeVectorStore",

Some files were not shown because too many files have changed in this diff.