Compare commits

...

354 Commits

Author SHA1 Message Date
isaac hershenson
3392ab24ed changes 2024-08-16 16:38:13 -07:00
Chengzu Ou
c1bd4e05bc docs: fix Databricks Vector Search demo notebook (#25504)
**Description:** This PR fixes an issue in the demo notebook of
Databricks Vector Search in "Work with Delta Sync Index" section.

**Issue:** N/A

**Dependencies:** N/A

---------

Co-authored-by: Chengzu Ou <chengzu.ou@databrick.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-16 20:24:30 +00:00
Isaac Francisco
a2e90a5a43 add embeddings integration tests (#25508) 2024-08-16 13:20:37 -07:00
Bagatur
a06818a654 openai[patch]: update core dep (#25502) 2024-08-16 18:30:17 +00:00
Bagatur
df98552b6f core[patch]: Release 0.2.33 (#25498) 2024-08-16 11:18:54 -07:00
ccurme
b83f1eb0d5 core, partners: implement standard tracing params for LLMs (#25410) 2024-08-16 13:18:09 -04:00
Bagatur
9f0c76bf89 openai[patch]: Release 0.1.22 (#25496) 2024-08-16 16:53:04 +00:00
ccurme
01ecd0acba openai[patch]: fix json mode for Azure (#25488)
https://github.com/langchain-ai/langchain/issues/25479
https://github.com/langchain-ai/langchain/issues/25485

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-16 09:50:50 -07:00
Eugene Yurtsev
1fd1c1dca5 docs: use .invoke rather than __call__ in openai integration notebook (#25494)
Documentation should be using .invoke()
2024-08-16 15:59:18 +00:00
Bagatur
253ceca76a docs: fix mimetype parser docstring (#25463) 2024-08-15 16:16:52 -07:00
Eugene Yurtsev
e18511bb22 core[minor], anthropic[patch]: Upgrade @root_validator usage to be consistent with pydantic 2 (#25457)
anthropic: Upgrade `@root_validator` usage to be consistent with
pydantic 2
core: support looking up multiple keys from env in from_env factory
2024-08-15 20:09:34 +00:00
Eugene Yurtsev
34da8be60b pinecone[patch]: Upgrade @root_validators to be consistent with pydantic 2 (#25453)
Upgrade root validators for pydantic 2 migration
2024-08-15 19:45:14 +00:00
Eugene Yurtsev
b297af5482 voyageai[patch]: Upgrade root validators for pydantic 2 (#25455)
Update @root_validators to be consistent with pydantic 2 semantics
2024-08-15 15:30:41 -04:00
Eugene Yurtsev
4cdaca67dc ai21[patch]: Upgrade @root_validators for pydantic 2 migration (#25454)
Upgrade @root_validators usage to match pydantic 2 semantics
2024-08-15 14:54:08 -04:00
Eugene Yurtsev
d72a08a60d groq[patch]: Update root validators for pydantic 2 migration (#25402) 2024-08-15 18:46:52 +00:00
Leonid Ganeline
8eb63a609e docs: arxiv page update (#25450)
Added `arxive` papers that use `LangGraph` or `LangSmith`. Improved the
page formatting.
2024-08-15 14:30:35 -04:00
Isaac Francisco
5150ec3a04 [experimental]: minor fix to open assistants code (#24682) 2024-08-15 17:50:57 +00:00
Bagatur
2b4fbcb4b4 docs: format oai embeddings docstring (#25448) 2024-08-15 16:57:54 +00:00
Eugene Yurtsev
eb3870e9d8 fireworks[patch]: Upgrade @root_validators to be pydantic 2 compliant (#25443)
Update @root_validators to be pydantic 2 compliant
2024-08-15 16:56:48 +00:00
William FH
75ae585deb Merge support for group manager (#25360) 2024-08-15 09:56:31 -07:00
Eugene Yurtsev
b7c070d437 docs[patch]: Update code that checks API keys (#25444)
Check whether the API key is already in the environment

Update:

```python
import getpass
import os

os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = getpass.getpass("Enter your Databricks access token: ")
```

To:

```python
import getpass
import os

os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
if "DATABRICKS_TOKEN" not in os.environ:
    os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
        "Enter your Databricks access token: "
    )
```

grit migration:

```
engine marzano(0.1)
language python

`os.environ[$Q] = getpass.getpass("$X")` as $CHECK where {
    $CHECK <: ! within if_statement(),
    $CHECK => `if $Q not in os.environ:\n    $CHECK`
}
```
2024-08-15 12:52:37 -04:00
Bagatur
60b65528c5 docs: fix api ref mod links in pkg page (#25447) 2024-08-15 16:52:12 +00:00
Eugene Yurtsev
2ef9d12372 mistralai[patch]: Update more @root_validators for pydantic 2 compatibility (#25446)
Update @root_validators in mistralai integration for pydantic 2 compatibility
2024-08-15 12:44:42 -04:00
Eugene Yurtsev
6910b0b3aa docs[patch]: Fix integration notebook for Fireworks llm (#25442)
Fix integration notebook
2024-08-15 12:42:33 -04:00
Eugene Yurtsev
831708beb7 together[patch]: Update @root_validator for pydantic 2 compatibility (#25423)
This PR updates usage of @root_validator to be compatible with pydantic 2.
2024-08-15 11:27:42 -04:00
Eugene Yurtsev
a114255b82 ai21[patch]: Update @root_validators for pydantic2 migration (#25401)
Update @root_validators for pydantic 2 migration.
2024-08-15 11:26:44 -04:00
Eugene Yurtsev
6f68c8d6ab mistralai[patch]: Update root validator for compatibility with pydantic 2 (#25403) 2024-08-15 11:26:24 -04:00
ccurme
8afbab4cf6 langchain[patch]: deprecate various chains (#25310)
- [x] NatbotChain: move to community, deprecate langchain version.
Update to use `prompt | llm | output_parser` instead of LLMChain.
- [x] LLMMathChain: deprecate + add langgraph replacement example to API
ref
- [x] HypotheticalDocumentEmbedder (retriever): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] FlareChain: update to use `prompt | llm | output_parser` instead
of LLMChain
- [x] ConstitutionalChain: deprecate + add langgraph replacement example
to API ref
- [x] LLMChainExtractor (document compressor): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] LLMChainFilter (document compressor): update to use `prompt | llm
| output_parser` instead of LLMChain
- [x] RePhraseQueryRetriever (retriever): update to use `prompt | llm |
output_parser` instead of LLMChain
2024-08-15 10:49:26 -04:00
Luke
66e30efa61 experimental: Fix divide by 0 error (#25439)
Within the semantic chunker, when calling `_threshold_from_clusters`
there is the possibility for a divide by 0 error if the
`number_of_chunks` is equal to the length of `distances`.

Fix simply implements a check if these values match to prevent the error
and enable chunking to continue.
2024-08-15 14:46:30 +00:00
ccurme
ba167dc158 community[patch]: update connection string in azure cosmos integration test (#25438) 2024-08-15 14:07:54 +00:00
Eugene Yurtsev
44f69063b1 docs[patch]: Fix a few typos in the chat integration docs for TogetherAI (#25424)
Fix a few minor typos
2024-08-15 09:48:36 -04:00
Isaac Francisco
f18b77fd59 [docs]: pdf loaders (#25425) 2024-08-14 21:44:57 -07:00
Isaac Francisco
966b408634 [docs]: doc loader changes (#25417) 2024-08-14 19:46:33 -07:00
ccurme
bd261456f6 langchain: bump core to 0.2.32 (#25421) 2024-08-15 00:00:42 +00:00
Bagatur
ec8ffc8f40 core[patch]: Release 0.2.32 (#25420) 2024-08-14 15:56:56 -07:00
Bagatur
2494cecabf core[patch]: tool import fix (#25419) 2024-08-14 22:54:13 +00:00
ccurme
df632b8cde langchain: bump min core version (#25418) 2024-08-14 22:51:35 +00:00
ccurme
1050e890c6 langchain: release 0.2.14 (#25416)
Fixes https://github.com/langchain-ai/langchain/issues/25413
2024-08-14 22:29:39 +00:00
Isaac Francisco
c4779f5b9c [docs]: sitemaploader update (#25363) 2024-08-14 15:27:40 -07:00
gbaian10
0a99935794 docs: remove the extra period in docstring (#25414)
Remove the period after the hyperlink in the docstring of
BaseChatOpenAI.with_structured_output.

I have repeatedly copied the extra period at the end of the hyperlink,
which results in a "Page not found" page when pasted into the browser.
2024-08-14 18:07:15 -04:00
Isaac Francisco
63aba3fe5b [docs]: link fix directory loader (#25411) 2024-08-14 20:58:54 +00:00
Bagatur
dc80be5efe docs: fix deprecated functions table (#25409) 2024-08-14 12:25:39 -07:00
Erick Friis
ab29ee79a3 docs: fix tool index (#25404) 2024-08-14 18:36:41 +00:00
Werner van der Merwe
1d3f7231b8 fix: typo where github should be gitlab (#25397)
**PR title**: "GitLabToolkit: fix typo"
    - **Description:** fix typo where GitHub should have been GitLab
    - **Dependencies:** None
2024-08-14 18:36:25 +00:00
Bagatur
a58d4ba340 core[patch]: Release 0.2.31 (#25388) 2024-08-14 11:26:49 -07:00
Bagatur
d178fb9dc3 docs: fix api ref package tables (#25400) 2024-08-14 10:40:16 -07:00
Bagatur
414154fa59 experimental[patch]: refactor rl chain structure (#25398)
can't have a class and function with same name but different
capitalization in same file for api reference building
2024-08-14 17:09:43 +00:00
Flávio Knob
94c9cb7321 Update document_loader_custom.ipynb (#25393)
Fix typo

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-14 12:33:21 -04:00
Jacob Lee
012929551c docs[patch]: Hide deprecated integration pages (#25389) 2024-08-14 09:17:39 -07:00
Bagatur
63c483ea01 standard-tests: import fix (#25395) 2024-08-14 09:13:56 -07:00
Bagatur
eec7bb4f51 anthropic[patch]: Release 0.1.23 (#25394) 2024-08-14 09:03:39 -07:00
Flávio Knob
f0f125dac7 Update document_loader_custom.ipynb (#25391)
Fix typo and some `callout` tags

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-14 15:07:42 +00:00
Eugene Yurtsev
f4196f1fb8 ollama[patch]: Update extra in ollama package (#25383)
Backwards compatible change that converts pydantic extras to literals
which is consistent with pydantic 2 usage.
2024-08-14 10:30:01 -04:00
Chengyu Yan
d0ad713937 core: fix issue#24660, slove error messages about ValueError when use model with history (#25183)
- **Description:**
This PR will slove error messages about `ValueError` when use model with
history.
Detail in #24660.
#22933 causes that
`langchain_core.runnables.history.RunnableWithMessageHistory._get_output_messages`
miss type check of `output_val` if `output_val` is `False`. After
running `RunnableWithMessageHistory._is_not_async`, `output` is `False`.

249945a572/libs/core/langchain_core/runnables/history.py (L323-L334)

15a36dd0a2/libs/core/langchain_core/runnables/history.py (L461-L471)
~~I suggest that `_get_output_messages` return empty list when
`output_val == False`.~~

- **Issue**:
  - #24660

- **Dependencies:**: No Change.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-08-14 14:26:22 +00:00
Jacob Lee
ddd7919f6a docs[patch]: Add conceptual guide links to integration index pages (#25387) 2024-08-14 07:14:24 -07:00
Bagatur
493e474063 docs: udpated api reference (#25172)
- Move the API reference into the vercel build
- Update api reference organization and styling
2024-08-14 07:00:17 -07:00
Leonid Ganeline
4a812e3193 docs: integrations references update (#25217)
Added missed provider pages. Fixed formats and added descriptions and
links.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-14 13:58:38 +00:00
Eugene Yurtsev
5f5e8c9a60 huggingface[patch], pinecone[patch], fireworks[patch], mistralai[patch], voyageai[patch], togetherai[path]: convert Pydantic extras to literals (#25384)
Backwards compatible change that converts pydantic extras to literals
which is consistent with pydantic 2 usage.

- fireworks
- voyage ai
- mistralai
- mistral ai
- together ai
- huggigng face
- pinecone
2024-08-14 09:55:30 -04:00
Eugene Yurtsev
d00176e523 openai[patch]: Update extra to match pydantic 2 (#25382)
Backwards compatible change that converts pydantic extras to literals
which is consistent with pydantic 2 usage.
2024-08-14 09:55:18 -04:00
Eugene Yurtsev
dc51cc5690 core[minor]: Prevent PydanticOutputParser from encoding schema as ASCII (#25386)
This allows users to provide parameter descriptions in the pydantic
models in other languages.

Continuing this PR: https://github.com/langchain-ai/langchain/pull/24809
2024-08-14 13:54:31 +00:00
ccurme
27690506d0 multiple: update removal targets (#25361) 2024-08-14 09:50:39 -04:00
Ikko Eltociear Ashimine
4029f5650c docs: update clarifai.ipynb (#25373)
Intialize -> Initialize
2024-08-14 09:20:17 -04:00
Erick Friis
10e6725a7e docs: tools index table (#25370) 2024-08-14 02:38:03 +00:00
Harrison Chase
967b6f21f6 docs: improve document loaders index (#25365)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-14 01:48:48 +00:00
Erick Friis
4a78be7861 docs: remove sidebar comment (#25369) 2024-08-14 01:47:12 +00:00
Eugene Yurtsev
d6c180996f docs[patch]: Fix typo in CohereEmbeddings integration docs (#25367)
Fix typo
2024-08-14 01:18:54 +00:00
Eugene Yurtsev
93dcc47463 docs: Partial integration update for cohere embeddings (#25250)
This can be finished after the following issue is resolved:

https://github.com/langchain-ai/langchain-cohere/issues/81

Related to: https://github.com/langchain-ai/langchain/issues/24856

```json
[
   {
      "provider": "cohere",
      "js":  true,
      "local": false,
     "serializable": false,
   }
]
```

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-08-14 00:53:13 +00:00
Eugene Yurtsev
27def6bddb docs[patch]: Update integration docs for AzureOpenAIEmbeddings (#25311)
https://github.com/langchain-ai/langchain/issues/24856

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:33:13 +00:00
Eugene Yurtsev
b4e3bdb714 docs: Update nomic AI embeddings integration docs (#25308)
Issue: https://github.com/langchain-ai/langchain/issues/24856

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:32:07 +00:00
Eugene Yurtsev
f82c3f622a docs: Update AI21Embeddings Integration docs (#25298)
Update AI21 Integration docs

Issue: https://github.com/langchain-ai/langchain/issues/24856

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:30:16 +00:00
Eugene Yurtsev
d55d99222b docs: update integration docs for mistral ai embedding model (#25253)
Related issue: https://github.com/langchain-ai/langchain/issues/24856

```json
[
   {
      "provider": "mistralai",
      "js":  true,
      "local": false,
     "serializable": false,
    "native_async": true
   }
]
```

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:25:36 +00:00
Eugene Yurtsev
0f6217f507 docs: together ai embeddings integration docs (#25252)
Update together AI embedding integration docs

Related issue: https://github.com/langchain-ai/langchain/issues/24856

```json
[
   {
      "provider": "together",
      "js":  true,
      "local": false,
     "serializable": false,
   }
]
```

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:24:02 +00:00
Eugene Yurtsev
8645a49f31 docs: Update integration docs for OllamaEmbeddingsModel (#25314)
Issue: https://github.com/langchain-ai/langchain/issues/24856

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:23:05 +00:00
Eugene Yurtsev
a4ef830480 docs: update integration docs for openai embeddings (#25249)
Related issue: https://github.com/langchain-ai/langchain/issues/24856

```json
   {
      "provider": "openai",
      "js":  true,
      "local": false,
     "serializable": false,
"async_native": true
  }
```

---------

Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-08-14 00:21:36 +00:00
Eugene Yurtsev
b1aed44540 docs: Updating integration docs for Fireworks Embeddings (#25247)
Providers:
* fireworks

See related issue:
* https://github.com/langchain-ai/langchain/issues/24856

Features:

```json
[
   {
      "provider": "fireworks",
      "js":  true,
      "local": false,
     "serializable": false,
   }



]


```

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2024-08-13 17:04:18 -07:00
Isaac Francisco
f4ffd692a3 [docs]: standardize doc loader doc strings (#25325) 2024-08-13 23:18:56 +00:00
Isaac Francisco
e0bbb81d04 [docs]: standardize tool docstrings (#25351) 2024-08-13 16:10:00 -07:00
Erick Friis
d5b548b4ce docs: index pages, sidebars (#25316) 2024-08-13 15:52:51 -07:00
Isaac Francisco
0478f7f5e4 [docs]: LLM integration pages (#25005) 2024-08-13 14:50:45 -07:00
thedavgar
9d08369442 community: fix AzureSearch vectorstore asyncronous methods (#24921)
**Description**
Fix the asyncronous methods to retrieve documents from AzureSearch
VectorStore. The previous changes from [this
commit](ffe6ca986e)
create a similar code for the syncronous methods and the asyncronous
ones but the asyncronous client return an asyncronous iterator
"AsyncSearchItemPaged" as said in the issue #24740.
To solve this issue, the syncronous iterators in asyncronous methods
where changed to asyncronous iterators.

@chrislrobert said in [this
comment](https://github.com/langchain-ai/langchain/issues/24740#issuecomment-2254168302)
that there was a still a flaw due to `with` blocks that close the client
after each call. I removed this `with` blocks in the `async_client`
following the same pattern as the sync `client`.

In order to close up the connections, a __del__ method is included to
gently close up clients once the vectorstore object is destroyed.

**Issue:** #24740 and #24064
**Dependencies:** No new dependencies for this change

**Example notebook:** I created a notebook just to test the changes work
and gives the same results as the syncronous methods for vector and
hybrid search. With these changes, the asyncronous methods in the
retriever work as well.

![image](https://github.com/user-attachments/assets/697e431b-9d7f-4d0d-b205-59d051ac2b67)


**Lint and test**: Passes the tests and the linter
2024-08-13 14:20:51 -07:00
Isaac Francisco
6bc451b942 [docs]: merge tool/toolkit duplicates (#25197) 2024-08-13 12:19:17 -07:00
Fedor Nikolaev
2b15518c5f community: add args_schema to SearxSearchResults tool (#25350)
This adds `args_schema` member to `SearxSearchResults` tool. This member
is already present in the `SearxSearchRun` tool in the same file.

I was having `TypeError: Type is not JSON serializable:
AsyncCallbackManagerForToolRun` being thrown in langserve playground
when I was using `SearxSearchResults` tool as a part of chain there.
This fixes the issue, so the error is not raised anymore.

This is a example langserve app that was giving me the error, but it
works properly after the proposed fix:
```python
#!/usr/bin/env python

from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SearxSearchWrapper
from langchain_community.tools.searx_search.tool import SearxSearchResults
from langserve import add_routes

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

s = SearxSearchWrapper(searx_host="http://localhost:8080")

search = SearxSearchResults(wrapper=s)

search_chain = (
    {"context": search, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

app = FastAPI()

add_routes(
    app,
    search_chain,
    path="/chain",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
```
2024-08-13 18:26:09 +00:00
Matt Kandler
b6df3405fb docs: Fix broken link to Runhouse documentation (#25349)
- **Description:** Runhouse recently migrated from Read the Docs to a
self-hosted solution. This PR updates a broken link from the old docs to
www.run.house/docs. Also changed "The Runhouse" to "Runhouse" (it's
cleaner).
- **Issue:** None
- **Dependencies:** None
2024-08-13 18:18:19 +00:00
maang-h
089f5e6cad Standardize SparkLLM (#25239)
- **Description:** Standardize SparkLLM, include:
  - docs, the issue #24803 
  - to support stream
  - update api url
  - model init arg names, the issue #20085
2024-08-13 09:50:12 -04:00
Leonid Ganeline
35e2230f56 docs: integrationsreferences update (#25322)
Added missed provider pages. Fixed formats and added descriptions and
links.
2024-08-13 09:29:51 -04:00
Chen Xiabin
24155aa1ac qianfan generate/agenerate with usage_metadata (#25332) 2024-08-13 09:24:41 -04:00
Christophe Bornet
ebbe609193 Add README for astradb package (#25345)
Similar to
https://github.com/langchain-ai/langchain/blob/master/libs/partners/ibm/README.md
2024-08-13 09:17:23 -04:00
Eugene Yurtsev
f679ed72ca ollama[patch]: Update API Reference for ollama embeddings (#25315)
Update API reference for OllamaEmbeddings
Issue: https://github.com/langchain-ai/langchain/issues/24856
2024-08-12 21:31:48 -04:00
Erick Friis
2907ab2297 community: release 0.2.12 (#25324) 2024-08-12 23:30:27 +00:00
Erick Friis
06f8bd9946 langchain: release 0.2.13 (#25323) 2024-08-12 22:24:06 +00:00
Erick Friis
252f0877d1 core: release 0.2.30 (#25321) 2024-08-12 22:01:24 +00:00
Eugene Yurtsev
217a915b29 openai: Update API Reference docs for AzureOpenAI Embeddings (#25312)
Update AzureOpenAI Embeddings docs
2024-08-12 19:41:18 +00:00
Eugene Yurtsev
056c7c2983 core[patch]: Update API reference for fake embeddings (#25313)
Issue: https://github.com/langchain-ai/langchain/issues/24856

Using the same template for the fake embeddings in langchain_core as
used in the integrations.
2024-08-12 19:40:05 +00:00
Ben Chambers
1adc161642 community: kwargs for CassandraGraphVectorStore (#25300)
- **Description:** pass kwargs from CassandraGraphVectorStore to
underlying store

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-12 18:01:29 +00:00
Hassan-Memon
deb27d8970 docs: remove unused imports in Conversational RAG tutorial (#25297)
Cleaned up the "Tying it Together" section of the Conversational RAG
tutorial by removing unnecessary imports that were not used. This
reduces confusion and makes the code more concise.

Thank you for contributing to LangChain!

PR title: docs: remove unused imports in Conversational RAG tutorial

PR message:

Description: Removed unnecessary imports from the "Tying it Together"
section of the Conversational RAG tutorial. These imports were not used
in the code and created confusion. The updated code is now more concise
and easier to understand.
Issue: N/A
Dependencies: None
LinkedIn handle: [Hassan
Memon](https://www.linkedin.com/in/hassan-memon-a109b3257/)
Add tests and docs:

Hi [LangChain Team Member’s Name],

I hope you're doing well! I’m thrilled to share that I recently made my
second contribution to the LangChain project. If possible, could you
give me a shoutout on LinkedIn? It would mean a lot to me and could help
inspire others to contribute to the community as well.

Here’s my LinkedIn profile: [Hassan
Memon](https://www.linkedin.com/in/hassan-memon-a109b3257/).

Thank you so much for your support and for creating such a great
platform for learning and collaboration. I'm looking forward to
contributing more in the future!

Best regards,
Hassan Memon
2024-08-12 13:49:55 -04:00
gbaian10
5efd0fe9ae docs: Change SqliteSaver to MemorySaver (#25306)
fix: #25137

`SqliteSaver.from_conn_string()` has been changed to a `contextmanager`
method in `langgraph >= 0.2.0`, the original usage is no longer
applicable.

Refer to
<https://github.com/langchain-ai/langgraph/pull/1271#issue-2454736415>
modification method to replace `SqliteSaver` with `MemorySaver`.
2024-08-12 13:45:32 -04:00
Eugene Yurtsev
1c9917dfa2 fireworks[patch]: Fix doc-string for API Referenmce (#25304) 2024-08-12 17:16:13 +00:00
Eugene Yurtsev
ccff1ba8b8 ai21[patch]: Update API reference documentation (#25302)
Issue: https://github.com/langchain-ai/langchain/issues/24856
2024-08-12 13:15:27 -04:00
Eugene Yurtsev
53ee5770d3 fireworks: Add APIReference for the FireworksEmbeddings model (#25292)
Add API Reference documentation for the FireworksEmbedding model.

Issue: https://github.com/langchain-ai/langchain/issues/24856
2024-08-12 13:13:43 -04:00
Eugene Yurtsev
8626abf8b5 togetherai[patch]: Update API Reference for together AI embeddings model (#25295)
Issue: https://github.com/langchain-ai/langchain/issues/24856
2024-08-12 17:12:28 +00:00
Eugene Yurtsev
1af8456a2c mistralai[patch]: Docs Update APIReference for MistralAIEmbeddings (#25294)
Update API Reference for MistralAI embeddings

Issue: https://github.com/langchain-ai/langchain/issues/24856
2024-08-12 15:25:37 +00:00
Eugene Yurtsev
0a3500808d openai[patch]: Docs fix RST formatting in OpenAIEmbeddings (#25293) 2024-08-12 11:24:35 -04:00
Eugene Yurtsev
ee8a585791 openai[patch]: Add API Reference docs to OpenAIEmbeddings (#25290)
Issue: [24856](https://github.com/langchain-ai/langchain/issues/24856)
2024-08-12 14:53:51 +00:00
ccurme
e77eeee6ee core[patch]: add standard tracing params for retrievers (#25240) 2024-08-12 14:51:59 +00:00
Mohammad Mohtashim
9927a4866d [Community] - Added bind_tools and with_structured_output for ChatZhipuAI (#23887)
- **Description:** This PR implements the `bind_tool` functionality for
ChatZhipuAI as requested by the user. ChatZhipuAI models support tool
calling according to the `OpenAI` tool format, as outlined in their
official documentation [here](https://open.bigmodel.cn/dev/api#glm-4).
- **Issue:**  ##23868

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-12 14:11:43 +00:00
Hassan-Memon
420534c8ca docs: Replaced SqliteSaver with MemorySaver and updated installation instru… (#25285)
…ctions to match LangGraph v2 documentation. Corrected code snippet to
prevent validation errors.

Here's how you can fill out the provided template for your pull request:

---

**Thank you for contributing to LangChain!**

- [ ] **PR title**: `docs: update checkpointer example in Conversational
RAG tutorial`

- [ ] **PR message**:
- **Description:** Updated the Conversational RAG tutorial to correct
the checkpointer example by replacing `SqliteSaver` with `MemorySaver`.
Added installation instructions for `langgraph-checkpoint-memory` to
match LangGraph v2 documentation and prevent validation errors.
    - **Issue:** N/A
    - **Dependencies:** `langgraph-checkpoint-memory`
    - **Twitter handle:** N/A

- [ ] **Add tests and docs**: 
  1. No new integration tests are required.
  2. Updated documentation in the Conversational RAG tutorial.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: [LangChain Contribution
Guidelines](https://python.langchain.com/docs/contributing/)

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-12 09:24:51 -04:00
Yunus Emre Özdemir
794f28d4e2 docs: document upstash vector namespaces (#25289)
**Description:** This PR rearranges the examples in Upstash Vector
integration documentation to describe how to use namespaces and improve
the description of metadata filtering.
2024-08-12 09:17:11 -04:00
JasonJ
f28ae20b81 docs: pip install bug fixed (#25287)
Thank you for contributing to LangChain!
- **Description:** Fixing package install bug in cookbook
- **Issue:** zsh:1: no matches found: unstructured[all-docs]
- **Dependencies:** N/A
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!



If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-12 05:12:44 +00:00
Soichi Sumi
9f0eda6a18 docs: Fix link for API reference of Gmail Toolkit (#25286)
- **Description:** Fix link for API reference of Gmail Toolkit
- **Issue:** I've just found this issue while I'm reading the doc
- **Dependencies:** N/A
- **Twitter handle:** [@soichisumi](https://x.com/soichisumi)

TODO: If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-12 05:12:31 +00:00
Anush
472527166f qdrant: Update API reference link and install command (#25245)
## Description

As the title goes. The current API reference links to the deprecated
class.
2024-08-11 16:54:14 -04:00
Aryan Singh
074fa0db73 docs: Fixed grammer error in functions.ipynb (#25255)
**Description**: Grammer Error in functions.ipynb
**Issue**: #25222
2024-08-11 20:53:27 +00:00
gbaian10
4fd1efc48f docs: update "Build an Agent" Installation Hint in agents.ipynb (#25263)
fix #25257
2024-08-11 16:51:34 -04:00
gbaian10
aa2722cbe2 docs: update numbering of items in docstring (#25267)
A problem similar to #25093 .

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-11 20:50:24 +00:00
Maddy Adams
a82c0533f2 langchain: default to langsmith sdk for pulling prompts, fallback to langchainhub (#24156)
**Description:** Deprecating langchainhub, replacing with langsmith sdk
2024-08-11 13:30:52 -07:00
maang-h
bc60cddc1b docs: Fix ChatBaichuan, QianfanChatEndpoint, ChatSparkLLM, ChatZhipuAI docs (#25265)
- **Description:** Fix some chat models docs, include:
  - ChatBaichuan
  - QianfanChatEndpoint
  - ChatSparkLLM
  - ChatZhipuAI
2024-08-11 16:23:55 -04:00
ZhangShenao
43deed2a95 Improvement[Embeddings] Add dimension support to ZhipuAIEmbeddings (#25274)
- In the in ` embedding-3 ` and later models of Zhipu AI, it is
supported to specify the dimensions parameter of Embedding. Ref:
https://bigmodel.cn/dev/api#text_embedding-3 .
- Add test case for `embedding-3` model by assigning dimensions.
2024-08-11 16:20:37 -04:00
maang-h
9cd608efb3 docs: Standardize OpenAI Docs (#25280)
- **Description:** Standardize OpenAI Docs
- **Issue:** the issue #24803

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-11 20:20:16 +00:00
Bagatur
fd546196ef openai[patch]: Release 0.1.21 (#25269) 2024-08-10 16:37:31 -07:00
Eugene Yurtsev
6dd9f053e3 core[patch]: Deprecating beta upsert APIs in vectorstore (#25069)
This PR deprecates the beta upsert APIs in vectorstore.

We'll introduce them in a V2 abstraction instead to keep the existing
vectorstore implementations lighter weight.

The main problem with the existing APIs is that it's a bit more
challenging to
implement the correct behavior w/ respect to IDs since ID can be present
in
both the function signature and as an optional attribute on the document
object.

But VectorStores that pass the standard tests should have implemented
the semantics properly!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-09 17:17:36 -04:00
Bagatur
ca9dcee940 standard-tests[patch]: test ToolMessage.status="error" (#25210) 2024-08-09 13:00:14 -07:00
Eugene Yurtsev
dadb6f1445 cli[patch]: Update integration template for embedding models (#25248)
Update integration template for embedding models
2024-08-09 14:28:57 -04:00
Eugene Yurtsev
b6f0174bb9 community[patch],core[patch]: Update EdenaiTool root_validator and add unit test in core (#25233)
This PR gets rid `root_validators(allow_reuse=True)` logic used in
EdenAI Tool in preparation for pydantic 2 upgrade.
- add another test to secret_from_env_factory
2024-08-09 15:59:27 +00:00
blueoom
c3ced4c6ce core[patch]: use time.monotonic() instead time.time() in InMemoryRateLimiter
**Description:**

The get time point method in the _consume() method of
core.rate_limiters.InMemoryRateLimiter uses time.time(), which can be
affected by system time backwards. Therefore, it is recommended to use
the monotonically increasing monotonic() to obtain the time

```python
        with self._consume_lock:
            now = time.time()  # time.time() -> time.monotonic()

            # initialize on first call to avoid a burst
            if self.last is None:
                self.last = now

            elapsed = now - self.last  # when use time.time(), elapsed may be negative when system time backwards

```
2024-08-09 11:31:20 -04:00
Eugene Yurtsev
bd6c31617e community[patch]: Remove more @allow_reuse=True validators (#25236)
Remove some additional allow_reuse=True usage in @root_validators.
2024-08-09 11:10:27 -04:00
Eugene Yurtsev
6e57aa7c36 community[patch]: Remove usage of @root_validator(allow_reuse=True) (#25235)
Remove usage of @root_validator(allow_reuse=True)
2024-08-09 10:57:42 -04:00
thiswillbeyourgithub
a2b4c33bd6 community[patch]: FAISS: ValueError mentions normalize_score_fn isntead of relevance_score_fn (#25225)
Thank you for contributing to LangChain!

- [X] **PR title**: "community: fix valueerror mentions wrong argument
missing"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [X] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** when faiss.py has a None relevance_score_fn it raises
a ValueError that says a normalize_fn_score argument is needed.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-09 14:40:29 +00:00
ccurme
4825dc0d76 langchain[patch]: add deprecations (#24792) 2024-08-09 10:34:43 -04:00
ccurme
02300471be langchain[patch]: extended-tests: drop logprobs from OAI expected config (#25234)
Following https://github.com/langchain-ai/langchain/pull/25229
2024-08-09 14:23:11 +00:00
Shivendra Soni
66b7206ab6 community: Add llm-extraction option to FireCrawl Document Loader (#25231)
**Description:** This minor PR aims to add `llm_extraction` to Firecrawl
loader. This feature is supported on API and PythonSDK, but the
langchain loader omits adding this to the response.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-09 13:59:10 +00:00
blaufink
c81c77b465 partners: fix of issue #24880 (#25229)
- Description: As described in the related issue: There is an error
occuring when using langchain-openai>=0.1.17 which can be attributed to
the following PR: #23691
Here, the parameter logprobs is added to requests per default.
However, AzureOpenAI takes issue with this parameter as stated here:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python-new&pivots=programming-language-chat-completions
-> "If you set any of these parameters, you get an error."
Therefore, this PR changes the default value of logprobs parameter to
None instead of False. This results in it being filtered before the
request is sent.
- Issue: #24880
- Dependencies: /

Co-authored-by: blaufink <sebastian.brueckner@outlook.de>
2024-08-09 13:21:37 +00:00
ccurme
3b7437d184 docs: update integration api refs (#25195)
- [x] toolkits
- [x] retrievers (in this repo)
2024-08-09 12:27:32 +00:00
Bagatur
91ea4b7449 infra: avoid orjson 3.10.7 in vercel build (#25212) 2024-08-09 02:23:18 +00:00
Isaac Francisco
652b3fa4a4 [docs]: playwright fix (#25163) 2024-08-08 17:13:42 -07:00
Bagatur
7040013140 core[patch]: fix deprecation pydantic bug (#25204)
#25004 is incompatible with pydantic < 1.10.17. Introduces fix for this.
2024-08-08 16:39:38 -07:00
Isaac Francisco
dc7423e88f [docs]: standardizing document loader integration pages (#25002) 2024-08-08 16:33:09 -07:00
Casey Clements
25f2e25be1 partners[patch]: Mongodb Retrievers - CI final touches. (#25202)
## Description

Contains 2 updates to for integration tests to run on langchain's CI.
Addendum to #25057 to get release github action to succeed.
2024-08-08 15:38:31 -07:00
Bagatur
786ef021a3 docs: redirect toolkits (#25190) 2024-08-08 14:54:11 -07:00
Eugene Yurtsev
429a0ee7fd core[minor]: Add factory for looking up secrets from the env (#25198)
Add factory method for looking secrets from the env.
2024-08-08 16:41:58 -04:00
Erick Friis
da9281feb2 cli: release 0.0.29 (#25196) 2024-08-08 12:52:49 -07:00
Erick Friis
c6ece6a96d core: autodetect more ls params (#25044)
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-08-08 12:44:21 -07:00
Eugene Yurtsev
86355640c3 experimental[patch]: Use get_fields adapter (#25193)
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.

Code mod generated using the following grit pattern:

```
engine marzano(0.1)
language python


`$X.__fields__` => `get_fields($X)` where {
    add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
2024-08-08 15:10:11 -04:00
Eugene Yurtsev
b9f65e5038 experimental[patch]: Migrate pydantic extra to literals (#25194)
Migrate pydantic extra to literals

Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.

This works correctly also in pydantic v1.

```python
from pydantic.v1 import BaseModel

class Foo(BaseModel, extra="forbid"):
    x: int

Foo(x=5, y=1)
```

And 


```python
from pydantic.v1 import BaseModel

class Foo(BaseModel):
    x: int

    class Config:
      extra = "forbid"

Foo(x=5, y=1)
```


## Enum -> literal using grit pattern:

```
engine marzano(0.1)
language python
or {
    `extra=Extra.allow` => `extra="allow"`,
    `extra=Extra.forbid` => `extra="forbid"`,
    `extra=Extra.ignore` => `extra="ignore"`
}
```

Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)


## Sort attributes in Config:

```
engine marzano(0.1)
language python


function sort($values) js {
    return $values.text.split(',').sort().join("\n");
}


class_definition($name, $body) as $C where {
    $name <: `Config`,
    $body <: block($statements),
    $values = [],
    $statements <: some bubble($values) assignment() as $A where {
        $values += $A
    },
    $body => sort($values),
}

```
2024-08-08 19:05:54 +00:00
Eugene Yurtsev
30fb345342 core[minor]: Add from_env utility (#25189)
Add a utility that can be used as a default factory

The goal will be to start migrating from of the pydantic models to use
`from_env` as a default factory if possible.

```python

from pydantic import Field, BaseModel
from langchain_core.utils import from_env

class Foo(BaseModel):
   name: str = Field(default_factory=from_env('HELLO'))
```
2024-08-08 14:52:35 -04:00
Eugene Yurtsev
98779797fe community[patch]: Use get_fields adapter for pydantic (#25191)
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.

Code mod generated using the following grit pattern:

```
engine marzano(0.1)
language python


`$X.__fields__` => `get_fields($X)` where {
    add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
2024-08-08 14:43:09 -04:00
Rajendra Kadam
663638d6a8 community[minor]: [SharePointLoader] Load extended metadata for the root folder (#24872)
- **Title:** [SharePointLoader] Load extended metadata for the root
folder
- **Description:** 
    - Ensure extended metadata loads correctly for the root folder.
- Cleanup: Refactor SharePointLoader to remove unused fields(`file_id` &
`site_id`).
- **Dependencies:** NA
- **Add tests and docs:** NA
2024-08-08 14:39:16 -04:00
Eugene Yurtsev
2f209d84fa core[patch]: Add pydantic get_fields adapter (#25187)
Add adapter to get fields
2024-08-08 17:47:42 +00:00
Eugene Yurtsev
c72e522e96 langchain[patch]: Upgrade pydantic extra (#25186)
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.

This works correctly also in pydantic v1.

```python
from pydantic.v1 import BaseModel

class Foo(BaseModel, extra="forbid"):
    x: int

Foo(x=5, y=1)
```

And 


```python
from pydantic.v1 import BaseModel

class Foo(BaseModel):
    x: int

    class Config:
      extra = "forbid"

Foo(x=5, y=1)
```


## Enum -> literal using grit pattern:

```
engine marzano(0.1)
language python
or {
    `extra=Extra.allow` => `extra="allow"`,
    `extra=Extra.forbid` => `extra="forbid"`,
    `extra=Extra.ignore` => `extra="ignore"`
}
```

Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)


## Sort attributes in Config:

```
engine marzano(0.1)
language python


function sort($values) js {
    return $values.text.split(',').sort().join("\n");
}


class_definition($name, $body) as $C where {
    $name <: `Config`,
    $body <: block($statements),
    $values = [],
    $statements <: some bubble($values) assignment() as $A where {
        $values += $A
    },
    $body => sort($values),
}

```
2024-08-08 17:27:27 +00:00
Eugene Yurtsev
bf5193bb99 community[patch]: Upgrade pydantic extra (#25185)
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.

This works correctly also in pydantic v1.

```python
from pydantic.v1 import BaseModel

class Foo(BaseModel, extra="forbid"):
    x: int

Foo(x=5, y=1)
```

And 


```python
from pydantic.v1 import BaseModel

class Foo(BaseModel):
    x: int

    class Config:
      extra = "forbid"

Foo(x=5, y=1)
```


## Enum -> literal using grit pattern:

```
engine marzano(0.1)
language python
or {
    `extra=Extra.allow` => `extra="allow"`,
    `extra=Extra.forbid` => `extra="forbid"`,
    `extra=Extra.ignore` => `extra="ignore"`
}
```

Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)


## Sort attributes in Config:

```
engine marzano(0.1)
language python


function sort($values) js {
    return $values.text.split(',').sort().join("\n");
}


class_definition($name, $body) as $C where {
    $name <: `Config`,
    $body <: block($statements),
    $values = [],
    $statements <: some bubble($values) assignment() as $A where {
        $values += $A
    },
    $body => sort($values),
}

```
2024-08-08 17:20:39 +00:00
Isaac Francisco
11adc09e02 [docs]: change rag reference in vector store pages (#25125) 2024-08-08 10:08:14 -07:00
Anush
6b32810b68 qdrant: Update doc with usage snippets (#25179)
## Description

This PR adds back snippets demonstrating sparse and hybrid retrieval in
the Qdrant notebook.

Without the snippets, it's hard to grok the usage.
2024-08-08 12:58:26 -04:00
Eugene Yurtsev
3da2713172 docs: Update pydantic compatibility (#25145)
Update pydantic compatibility
2024-08-08 12:10:44 -04:00
Eugene Yurtsev
425f6ffa5b core[patch]: Fix aindex API (#25155)
A previous PR accidentally broke the aindex API by renaming a positional
argument vectorstore into vector_store. This PR reverts this change.
2024-08-08 12:08:18 -04:00
Isaac Francisco
15a36dd0a2 [docs]: combine tools and toolkits (#25158) 2024-08-08 08:59:02 -07:00
ololand
249945a572 Update polygon.py for business subscription (#25085)
For business subscription the status is STOCKSBUSINESS not OK

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-08 15:28:41 +00:00
ccurme
59b8850909 groq[patch]: update rate limit in integration tests (#25177)
Divide by ~2 to account for testing python 3.8 and 3.12 in parallel.
2024-08-08 13:33:25 +00:00
Chad Juliano
4828c441a7 docs: Update notebook name for Kinetica (#25149)
**Description:** Change notebook description in documentation.
**Issue:** N/A
**Dependencies:** N/A
2024-08-08 09:27:29 -04:00
Francisco Kurucz
725e4912ae docs: Fix reference to SQL QA migration (#25157)
**Description:** I found that the link to the notebook in the Migration
notes is broken, i found that it was linked to this file
https://github.com/langchain-ai/langchain/blob/v0.0.250/docs/extras/use_cases/tabular/sql_query.ipynb
and i think now this tutorial
https://github.com/JuanFKurucz/langchain/blob/master/docs/docs/tutorials/sql_qa.ipynb
is the best fit for this reference

**Twitter handle:** @juanfkurucz

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-08 09:26:13 -04:00
ogawa
d895db11d6 community[patch]: gpt-4o-2024-08-06 costs (#25164)
- **Description:** updated OpenAI cost definitions according to the
following:
  - https://openai.com/api/pricing/
- **Twitter handle:** `@ogawa65a`

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-08 13:22:11 +00:00
Brace Sproul
d77c7c4236 docs: Fix misspelling of instantiate in docs (#25107) 2024-08-07 15:05:06 -07:00
Eugene Yurtsev
7b1a132aff core[patch]: Add unit tests for Serializable (#25152)
Add a few test cases for serializable (many other test cases already
covered
throguh runnable tests).
2024-08-07 21:01:36 +00:00
Bagatur
df99b832a7 core[patch]: support Field deprecation (#25004)
![Screenshot 2024-08-02 at 4 23 17
PM](https://github.com/user-attachments/assets/c757e093-877e-4af6-9dcd-984195454158)
2024-08-07 13:57:55 -07:00
ccurme
803eba3163 core[patch]: check for model_fields attribute (#25108)
`__fields__` raises a warning in pydantic v2
2024-08-07 13:32:56 -07:00
Casey Clements
6e9a8b188f mongodb: Add Hybrid and Full-Text Search Retrievers, release 0.2.0 (#25057)
## Description

This pull-request extends the existing vector search strategies of
MongoDBAtlasVectorSearch to include Hybrid (Reciprocal Rank Fusion) and
Full-text via new Retrievers.

There is a small breaking change in the form of the `prefilter` kwarg to
search. For this, and because we have now added a great deal of
features, including programmatic Index creation/deletion since 0.1.0, we
plan to bump the version to 0.2.0.

### Checklist
* Unit tests have been extended
* formatting has been applied
* One mypy error remains which will either go away in CI or be
simplified.

---------

Signed-off-by: Casey Clements <casey.clements@mongodb.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-07 20:10:29 +00:00
Isaac Francisco
f337408b0f [docs]: add sidebar for different tool categories (#25065) 2024-08-07 12:57:58 -07:00
Bagatur
0b4608f71e infra: temp skip oai embeddings test (#25148) 2024-08-07 17:51:39 +00:00
Bagatur
a4086119f8 openai[patch]: Release 0.1.21rc2 (#25146) 2024-08-07 16:59:15 +00:00
Bagatur
b4c12346cc core[patch]: Release 0.2.29 (#25126) 2024-08-07 09:50:20 -07:00
Erick Friis
dff83cce66 core[patch]: base language model disable_streaming (#25070)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-08-07 09:26:21 -07:00
eric-langenberg
130e80b60f docs: rag.ipynb - fixing typo (#25142)
Just changing gpt-3.5 to gpt-4o-mini . That's what's used in the code
examples now. It just didn't get updated in the main text.
2024-08-07 16:02:22 +00:00
Bagatur
09fbce13c5 openai[patch]: ChatOpenAI.with_structured_output json_schema support (#25123) 2024-08-07 08:09:07 -07:00
maang-h
0ba125c3cd docs: Standardize QianfanLLMEndpoint LLM (#25139)
- **Description:** Standardize QianfanLLMEndpoint LLM,include:
  - docs, the issue #24803 
  - model init arg names, the issue #20085
2024-08-07 10:57:27 -04:00
Eugene Yurtsev
28e0958ff4 core[patch]: Relax rate limit unit tests in terms of timing (#25140)
Relax rate limit unit tests
2024-08-07 14:04:58 +00:00
Eray Eroğlu
a2e9910268 Documentation Update for Upstash Semantic Caching (#25114)
Thank you for contributing to LangChain!

- [ ] **PR title**: "Documentation Update : Semantic Caching Update for
Upstash"
 - Docs, llm caching integrations update

- **Description:** Upstash supports semantic caching, and we would like
to inform you about this
- **Twitter handle:** You can mention eray_eroglu_ if you want to post a
tweet about the PR

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-07 14:02:07 +00:00
Pat Patterson
7e7fcf5b1f community: Fix ValidationError on creating GPT4AllEmbeddings with no gpt4all_kwargs (#25124)
- **Description:** Instantiating `GPT4AllEmbeddings` with no
`gpt4all_kwargs` argument raised a `ValidationError`. Root cause: #21238
added the capability to pass `gpt4all_kwargs` through to the `GPT4All`
instance via `Embed4All`, but broke code that did not specify a
`gpt4all_kwargs` argument.
- **Issue:** #25119 
- **Dependencies:** None
- **Twitter handle:** [`@metadaddy`](https://twitter.com/metadaddy)
2024-08-07 13:34:01 +00:00
Atanu Dasgupta
04dd8d3b0a Update google_search.ipynb (#25135)
updated with langchain_google_community instead as the latest revision

Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-07 13:30:59 +00:00
ZhangShenao
63d84e93b9 patch[doc] Fix word spelling error (#25128)
Fix word spelling error
2024-08-07 09:16:17 -04:00
Eugene Yurtsev
4d28c70000 core[patch]: Sort Config attributes (#25127)
This PR does an aesthetic sort of the config object attributes. This
will make it a bit easier to go back and forth between pydantic v1 and
pydantic v2 on the 0.3.x branch
2024-08-07 02:53:50 +00:00
Erick Friis
46a47710b0 partners/milvus: release 0.1.4 (#25058) 2024-08-06 16:29:29 -07:00
Erick Friis
35ebd2620c infra,cli: template matching registration (#25110) 2024-08-06 15:29:55 -07:00
ccurme
23c9aba575 groq[patch]: allow warnings during tests (#25105)
Among integration packages in libs/partners, Groq is an exception in
that it errors on warnings.

Following https://github.com/langchain-ai/langchain/pull/25084, Groq
fails with

> pydantic.warnings.PydanticDeprecatedSince20: The `__fields__`
attribute is deprecated, use `model_fields` instead. Deprecated in
Pydantic V2.0 to be removed in V3.0.

Here we update the behavior to no longer fail on warning, which is
consistent with the rest of the packages in libs/partners.
2024-08-06 18:02:20 -04:00
Bagatur
1331e8589c docs: oai chat nit (#25117) 2024-08-06 22:00:42 +00:00
Bagatur
7882d5c978 openai[patch]: Release 0.1.21rc1 (#25116) 2024-08-06 21:50:36 +00:00
Bagatur
70677202c7 core[patch]: Release 0.2.29rc1 (#25115) 2024-08-06 21:36:56 +00:00
Bagatur
78403a3746 core[patch], openai[patch]: enable strict tool calling (#25111)
Introduced
https://openai.com/index/introducing-structured-outputs-in-the-api/
2024-08-06 21:21:06 +00:00
ccurme
5d10139fc7 docs[patch]: add to qa with sources guide (#25112) 2024-08-06 17:08:35 -04:00
Eugene Yurtsev
d283f452cc core[minor]: Add support for DocumentIndex in the index api (#25100)
Support document index in the index api.
2024-08-06 12:30:49 -07:00
Virat Singh
264ab96980 community: Add stock market tools from financialdatasets.ai (#25025)
**Description:**
In this PR, I am adding three stock market tools from
financialdatasets.ai (my API!):
- get balance sheets
- get cash flow statements
- get income statements

Twitter handle: [@virattt](https://twitter.com/virattt)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-06 18:28:12 +00:00
William FH
267855b3c1 Set Context in RunnableSequence & RunnableParallel (#25073) 2024-08-06 11:10:37 -07:00
Naval Chand
71c0698ee4 Added bedrock 3-5 sonnet cost detials for BedrockAnthropicTokenUsageCallbackHandler (#25104)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Example: "community: Added bedrock 3-5 sonnet cost detials for
BedrockAnthropicTokenUsageCallbackHandler"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: Naval Chand <navalchand@192.168.1.36>
2024-08-06 17:28:47 +00:00
Isaac Francisco
a72fddbf8d [docs]: vector store integration pages (#24858)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-06 17:20:27 +00:00
Bagatur
2c798622cd docs: runnable docstring space (#25106) 2024-08-06 16:46:50 +00:00
Bagatur
3abf1b6905 docs: versions sidebar (#25061) 2024-08-06 09:23:43 -07:00
maang-h
1028af17e7 docs: Standardize Tongyi (#25103)
- **Description:** Standardize Tongyi LLM,include:
  - docs, the issue #24803
  - model init arg names, the issue #20085
2024-08-06 11:44:12 -04:00
Dobiichi-Origami
061ed250f6 delete the default model value from langchain and discard the need fo… (#24915)
- description: I remove the limitation of mandatory existence of
`QIANFAN_AK` and default model name which langchain uses cause there is
already a default model nama underlying `qianfan` SDK powering langchain
component.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-06 14:11:05 +00:00
Eugene Yurtsev
293a4a78de core[patch]: Include dependencies in sys_info (#25076)
`python -m langchain_core.sys_info`

```bash
System Information
------------------
> OS:  Linux
> OS Version:  #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
> Python Version:  3.11.4 (main, Sep 25 2023, 10:06:23) [GCC 11.4.0]

Package Information
-------------------
> langchain_core: 0.2.28
> langchain: 0.2.8
> langsmith: 0.1.85
> langchain_anthropic: 0.1.20
> langchain_openai: 0.1.20
> langchain_standard_tests: 0.1.1
> langchain_text_splitters: 0.2.2
> langgraph: 0.1.19

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp: 3.9.5
> anthropic: 0.31.1
> async-timeout: Installed. No version info available.
> defusedxml: 0.7.1
> httpx: 0.27.0
> jsonpatch: 1.33
> numpy: 1.26.4
> openai: 1.39.0
> orjson: 3.10.6
> packaging: 24.1
> pydantic: 2.8.2
> pytest: 7.4.4
> PyYAML: 6.0.1
> requests: 2.32.3
> SQLAlchemy: 2.0.31
> tenacity: 8.5.0
> tiktoken: 0.7.0
> typing-extensions: 4.12.2
```
2024-08-06 09:57:39 -04:00
Dominik Fladung
ffa0c838d8 Allow ConfluenceLoader authorization via Personal Access Tokens (#25096)
- community: Allow authorization to Confluence with bearer token

- **Description:** Allow authorization to Confluence with [Personal
Access
Token](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html)
by checking for the keys `['client_id', token: ['access_token',
'token_type']]`

- **Issue:** 

Currently the following error occurs when using an personal access token
for authorization.

```python
loader = ConfluenceLoader(
    url=os.getenv('CONFLUENCE_URL'),
    oauth2={
        'token': {"access_token": os.getenv("CONFLUENCE_ACCESS_TOKEN"), "token_type": "bearer"},
        'client_id': 'client_id',
    },
    page_ids=['12345678'], 
)
```

```
ValueError: Error(s) while validating input: ["You have either omitted require keys or added extra keys to the oauth2 dictionary. key values should be `['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"]
```

With this PR the loader runs as expected.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-06 13:42:47 +00:00
orkhank
111c7df117 docs: update numbering of items in method docs (#25093)
Some methods' doc strings have a wrong numbering of items. The numbers
were adjusted accordingly
2024-08-06 09:21:52 -04:00
Bagatur
6eb42c657e core[patch]: Remove default BaseModel init docstring (#25009)
Currently a default init docstring gets appended to the class docstring
of every BaseModel inherited object. This removes the default init
docstring.

![Screenshot 2024-08-02 at 5 09 55
PM](https://github.com/user-attachments/assets/757fe4ae-a793-4e7d-8354-512de2c06818)
2024-08-06 01:04:04 +00:00
Gram Liu
88a9a6a758 core[patch]: Add pydantic metadata to subset model (#25032)
- **Description:** This includes Pydantic field metadata in
`_create_subset_model_v2` so that it gets included in the final
serialized form that get sent out.
- **Issue:** #25031 
- **Dependencies:** n/a
- **Twitter handle:** @gramliu

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-08-05 17:57:39 -07:00
BhujayKumarBhatta
8f33fce871 docs: change for optional variables in chatprompt (#25017)
Fixes #24884
2024-08-05 23:57:44 +00:00
Erick Friis
423d286546 infra: check doc script skip index page (#25088) 2024-08-05 16:38:30 -07:00
Bagatur
e572521f2a core[patch]: exclude special pydantic init params (#25084) 2024-08-05 23:32:51 +00:00
Isaac Francisco
63ddf0afb4 ollama: allow base_url, headers, and auth to be passed (#25078) 2024-08-05 15:39:36 -07:00
Eugene Yurtsev
4bcd2aad6c core[patch]: Relax time constraints on rate limit test (#25071)
Try to keep the unit test fast, but also have it repeat more robustly
2024-08-05 17:04:22 -04:00
jigsawlabs-student
427a04151c community: fix neo4j from_existing_graph (#24912)
Fixes Neo4JVector.from_existing_graph integration with huggingface

Previously threw an error with existing databases, because
from_existing_graph query returns empty list of new nodes, which are
then passed to embedding function, and huggingface errors with empty
list.

Fixes [24401](https://github.com/langchain-ai/langchain/issues/24401)

---------

Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com>
2024-08-05 21:01:46 +00:00
Tomaz Bratanic
d166967003 experimental: Add gliner graph transformer (#25066)
You can use this with:

```
from langchain_experimental.graph_transformers import GlinerGraphTransformer
gliner = GlinerGraphTransformer(allowed_nodes=["Person", "Organization", "Nobel"], allowed_relationships=["EMPLOYEE", "WON"])

from langchain_core.documents import Document

text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]

gliner.convert_to_graph_documents(documents)
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-05 21:01:27 +00:00
Bagatur
a74e466507 docs: aws pydantic v2 compat (#25075) 2024-08-05 20:47:11 +00:00
Bagatur
a02a09c973 docs: remove redundant deprecation warning (#25067) 2024-08-05 18:44:47 +00:00
Eugene Yurtsev
41dfad5104 core[minor]: Introduce DocumentIndex abstraction (#25062)
This PR adds a minimal document indexer abstraction.

The goal of this abstraction is to allow developers to create custom
retrievers that also have a standard indexing API and allow updating the
document content in them.

The abstraction comes with a test suite that can verify that the indexer
implements the correct semantics.

This is an iteration over a previous PRs
(https://github.com/langchain-ai/langchain/pull/24364). The main
difference is that we're sub-classing from BaseRetriever in this
iteration and as so have consolidated the sync and async interfaces.

The main problem with the current design is that runt time search
configuration has to be specified at init rather than provided at run
time.

We will likely resolve this issue in one of the two ways:

(1) Define a method (`get_retriever`) that will allow creating a
retriever at run time with a specific configuration.. If we do this, we
will likely break the subclass on BaseRetriever
(2) Generalize base retriever so it can support structured queries

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-05 18:06:33 +00:00
Vkzem
e7b95e0802 docs: update exa search (#24861)
- [x] **PR title**: "docs: changed example for Exa search retriever
usage"

- [x] **PR message**:
- **Description:** Changed Exa integration doc at
`docs/docs/integrations/tools/exa_search.ipynb` to better reflect simple
Exa use case
- **Issue:** move toward more canonical use of Exa method
(`search_and_contents` rather than just `search`)
    - **Dependencies:** no dependencies; docs only change
    - **Twitter handle:** n/a - small change

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. - will do

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-05 17:41:33 +00:00
Stuart Marsh
16bd0697dc milvus: fixed bug when using partition key and dynamic fields together (#25028)
**Description:**

This PR fixes a bug where if `enable_dynamic_field` and
`partition_key_field` are enabled at the same time, a pymilvus error
occurs.

Milvus requires the partition key field to be a full schema defined
field, and not a dynamic one, so it will throw the error "the specified
partition key field {field} not exist" when creating the collection.

When `enabled_dynamic_field` is set to `True`, all schema field creation
based on `metadatas` is skipped. This code now checks if
`partition_key_field` is set, and creates the field.

Integration test added.

**Twitter handle:** StuartMarshUK

---------

Co-authored-by: Stuart Marsh <stuart.marsh@qumata.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-05 16:01:55 +00:00
Jim Baldwin
6890daa90c community: make AthenaLoader profile_name optional and fix type hint (#24958)
- **Description:** This PR makes the AthenaLoader profile_name optional
and fixes the type hint which says the type is `str` but it should be
`str` or `None` as None is handled in the loader init. This is a minor
problem but it just confused me when I was using the Athena Loader to
why we had to use a Profile, as I want that for local but not
production.
- **Issue:** #24957 
- **Dependencies:** None.
2024-08-05 14:28:58 +00:00
Alexey Lapin
335894893b langchain: Make RetryWithErrorOutputParser.from_llm() create a correct retry chain (#25053)
Description: RetryWithErrorOutputParser.from_llm() creates a retry chain
that returns a Generation instance, when it should actually just return
a string.
This class was forgotten when fixing the issue in PR #24687
2024-08-05 14:21:27 +00:00
Dobiichi-Origami
c5cb52a3c6 community: fix issue of the existence of numeric object in additional_kwargs a… (#24863)
- **Description:** A previous PR breaks the code from
`baidu_qianfan_endpoint.py` which causes the malfunction of streaming
2024-08-05 10:15:55 -04:00
ZhangShenao
cda79dbb6c community[patch]: Optimize test case for MoonshotChat (#25050)
Optimize test case for `MoonshotChat`. Use standard
ChatModelIntegrationTests.
2024-08-05 10:11:25 -04:00
orkhank
cea3f72485 docs: fix comment lines in code blocks (#25054)
The comments inside some code blocks seems to be misplaced. The comment
lines containing explanation about `default_key` behavior when operating
with prompts are updated.
2024-08-05 14:11:09 +00:00
ZhangShenao
02c35da445 doc[Retriever] Enhance api docs for MultiQueryRetriever (#25035)
Enhance api docs for `MultiQueryRetriever`:

- Complete missing parameters
- Unify parameter name
2024-08-04 13:56:38 -04:00
Alex Sherstinsky
208042e0f2 community: Fix Predibase Integration for HuggingFace-hosted fine-tuned adapters (#25015)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-08-03 14:05:43 -07:00
maang-h
f5da0d6d87 docs: Standardize MiniMaxEmbeddings (#24983)
- **Description:** Standardize MiniMaxEmbeddings
  - docs, the issue #24856 
  - model init arg names, the issue #20085
2024-08-03 14:01:23 -04:00
ZhangShenao
2c3e3dc6b1 patch[Partners] Unified fix of incorrect variable declarations in all check_imports (#25014)
There are some incorrect declarations of variable `has_failure` in
check_imports. The purpose of this PR is to uniformly fix these errors.
2024-08-03 13:49:41 -04:00
maang-h
7de62abc91 docs: Standardize SparkLLMTextEmbeddings docstrings (#25021)
- **Description:** Standardize SparkLLMTextEmbeddings docstrings
- **Issue:** the issue #24856
2024-08-03 13:44:09 -04:00
Tomaz Bratanic
f9a11a9197 Add relik transformer config (#25019) 2024-08-03 08:41:45 -04:00
Bagatur
1dcee68cb8 docs: show beta directive (#25013)
![Screenshot 2024-08-02 at 7 15 34
PM](https://github.com/user-attachments/assets/086831c7-36f3-4962-98dc-d707b6289747)
2024-08-03 03:07:45 +00:00
Bagatur
e81ddb32a6 docs: fix kwargs docstring (#25010)
Fix:
![Screenshot 2024-08-02 at 5 33 37
PM](https://github.com/user-attachments/assets/7c56cdeb-ee81-454c-b3eb-86aa8a9bdc8d)
2024-08-02 19:54:54 -07:00
Bagatur
57747892ce docs: show deprecation warning first in api ref (#25001)
OLD
![Screenshot 2024-08-02 at 3 29 39
PM](https://github.com/user-attachments/assets/7f169121-1202-4770-a006-d72ac7a1aa33)


NEW
![Screenshot 2024-08-02 at 3 29 45
PM](https://github.com/user-attachments/assets/9cc07cbd-2ae9-4077-95c5-03cb051e6cd7)
2024-08-02 17:35:25 -07:00
Bagatur
679843abb0 docs: separate deprecated classes (#25007)
![Screenshot 2024-08-02 at 4 58 54
PM](https://github.com/user-attachments/assets/29424dd5-0593-4818-9eed-901ff47246b9)
2024-08-02 17:12:47 -07:00
Isaac Francisco
73570873ab docs: standardizing tavily tool docs (#24736)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-02 22:25:27 +00:00
Isaac Francisco
2ae76cecde [docs]: updating mistral and hugging face chat model pages (#24731) 2024-08-02 15:21:25 -07:00
Bagatur
4305f78e40 core[patch]: Release 0.2.28 (#25000) 2024-08-02 21:07:06 +00:00
Bagatur
64ccddf3cb docs: fmt concepts (#24999) 2024-08-02 20:35:45 +00:00
Bagatur
dd8e4cd020 text-splitters[patch]: Release 0.2.3 (#24998) 2024-08-02 20:27:22 +00:00
Bagatur
0de0cd2d31 core[patch]: merge message runs nit (#24997)
Only add separator if both chunks are non-empty
2024-08-02 20:25:43 +00:00
Bagatur
8e2316b8c2 community[patch]: Release 0.2.11 (#24989) 2024-08-02 20:08:44 +00:00
ccurme
c2538e7834 experimental[patch]: bump min versions of core and community (#24996)
Ollama functions unit test broken with min version of community.
2024-08-02 19:58:55 +00:00
ccurme
acba38a18e docs: update toolkit guides (#24992) 2024-08-02 15:51:05 -04:00
ccurme
22c1a4041b community[patch]: support named arguments in github toolkit (#24986)
Parameters may be passed in by name if generated from tool calls.
2024-08-02 18:27:32 +00:00
ccurme
4797b806c2 experimental[patch]: release 0.0.64 (#24990) 2024-08-02 18:00:57 +00:00
Tomaz Bratanic
7061869aec Add relik graph transformer (#24982)
Relik is a new library for graph extraction that offers smaller and
cheaper models for graph construction
2024-08-02 13:55:41 -04:00
Erick Friis
98c22e9082 docs: feature table component (#24985) 2024-08-02 17:41:47 +00:00
ccurme
c04d95b962 standard-tests: set integration test parameters independent of unit test (#24979)
This ends up getting set in integration tests.
2024-08-02 10:40:11 -07:00
gbaian10
54e9ea433a fix: Modify the order of init_chat_model import ollama package. (#24977) 2024-08-02 08:32:56 -07:00
David Gao
fe1820cdaf docs: add wikipedia integration docs (#24932)
Dear langchain maintainers, 

I add the wikipedia integration docs according to the [web
docs](https://python.langchain.com/v0.2/docs/integrations/retrievers/wikipedia/),
and follow the format of [tavily
example](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/retrievers/tavily.ipynb)
and [retriever
template](https://github.com/langchain-ai/langchain/blob/master/libs/cli/langchain_cli/integration_template/docs/retrievers.ipynb),
this is my first time contributing large repo. please let me know if I'm
doing anything wrong, thank you!

Topic related: #24908

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-02 10:12:04 -04:00
ZhangShenao
71c0564c9f community[patch]: Add test case for MoonshotChat (#24960)
Add test case for `MoonshotChat`.
2024-08-02 09:37:31 -04:00
ZhangShenao
c65e48996c patch[partners] Fix check_imports bugs in pinecone and milvus (#24971)
Fix wrong declared variables of `check_imports` in pinecone and milvus
2024-08-02 09:27:11 -04:00
Isaac Francisco
d7688a4328 community[patch]: adding artifact to Tavily search (#24376)
This allows you to get raw content as well as the answer, instead of
just getting the results.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-01 21:12:11 -07:00
Bagatur
7b08de8909 langchain[patch]: Release 0.2.12 (#24954) 2024-08-02 04:04:49 +00:00
Bagatur
245cb5a252 core[patch]: Release 0.2.27 (#24952) 2024-08-02 01:43:24 +00:00
Bagatur
199e9c5ae0 core[patch]: Fix tool args schema inherited field parsing (#24936)
Fix #24925
2024-08-01 18:36:33 -07:00
Bagatur
fba65ba04f infra: test core on py 3.9, 10, 11 (#24951) 2024-08-01 18:23:37 -07:00
Leonid Ganeline
4092876863 core: docstrings `BaseCallbackHandler update (#24948)
Added missed docstrings
2024-08-01 20:46:53 -04:00
ccurme
6e45dba471 docs: fix redirect (#24950) 2024-08-01 20:45:54 -04:00
WU LIFU
ad16eed119 core[patch]: runnable config ensure_config deep copy from var_child_runnable… (#24862)
**issue**: #24660 
RunnableWithMessageHistory.stream result in error because the
[evaluation](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/runnables/branch.py#L220)
of the branch
[condition](99eb31ec41/libs/core/langchain_core/runnables/history.py (L328C1-L329C1))
unexpectedly trigger the
"[on_end](99eb31ec41/libs/core/langchain_core/runnables/history.py (L332))"
(exit_history) callback of the default branch


**descriptions**
After a lot of investigation I'm convinced that the root cause is that
1. during the execution of the runnable, the
[var_child_runnable_config](99eb31ec41/libs/core/langchain_core/runnables/config.py (L122))
is shared between the branch
[condition](99eb31ec41/libs/core/langchain_core/runnables/history.py (L328C1-L329C1))
runnable and the [default branch
runnable](99eb31ec41/libs/core/langchain_core/runnables/history.py (L332))
within the same context
2. when the default branch runnable runs, it gets the
[var_child_runnable_config](99eb31ec41/libs/core/langchain_core/runnables/config.py (L163))
and may unintentionally [add more handlers
](99eb31ec41/libs/core/langchain_core/runnables/config.py (L325))to
the callback manager of this config
3. when it is again the turn for the
[condition](99eb31ec41/libs/core/langchain_core/runnables/history.py (L328C1-L329C1))
to run, it gets the `var_child_runnable_config` whose callback manager
has the handlers added by the default branch. When it runs that handler
(`exit_history`) it leads to the error
   
with the assumption that, the `ensure_config` function actually does
want to create a immutable copy from `var_child_runnable_config` because
it starts with an [`empty` variable
](99eb31ec41/libs/core/langchain_core/runnables/config.py (L156)),
i go ahead to do a deepcopy to ensure that future modification to the
returned value won't affect the `var_child_runnable_config` variable
   
   Having said that I actually 
1. don't know if this is a proper fix
2. don't know whether it will lead to other unintended consequence 
3. don't know why only "stream" runs into this issue while "invoke" runs
without problem

so @nfcampos @hwchase17 please help review, thanks!

---------

Co-authored-by: Lifu Wu <lifu@nextbillion.ai>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-08-01 17:30:32 -07:00
Jacob Lee
3ab09d87d6 docs[patch]: Adds components for prereqs, compatibility, fix chat model tab issue (#24585)
Added to `docs/how_to/tools_runtime` as a proof of concept, will apply
everywhere if we like.

A bit more compact than the default callouts, will help standardize the
layout of our pages since we frequently use these boxes.

<img width="1088" alt="Screenshot 2024-07-23 at 4 49 02 PM"
src="https://github.com/user-attachments/assets/7380801c-e092-4d31-bcd8-3652ee05f29e">
2024-08-01 15:04:13 -07:00
ccurme
9cb69a8746 docs: update retriever template, add arxiv retriever (#24947) 2024-08-01 16:53:18 -04:00
Casey Clements
db3ceb4d0a partners/mongodb: Improved search index commands (#24745)
Hardens index commands with try/except for free clusters and optional
waits for syncing and tests.

[efriis](https://github.com/efriis) These are the upgrades to the search
index commands (CRUD) that I mentioned.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-01 20:16:32 +00:00
ccurme
db42576b09 docs: delete old migration guide (#24881)
Redirects to
https://python.langchain.com/v0.2/docs/versions/migrating_chains/
2024-08-01 16:11:47 -04:00
Ikko Eltociear Ashimine
be5294e35d docs: update agents.ipynb (#24945)
initalize -> initialize
2024-08-01 14:37:37 -04:00
ccurme
41ed23a050 docs: update retriever integration pages (#24931) 2024-08-01 14:37:07 -04:00
maang-h
ea505985c4 docs: Standardize ZhipuAIEmbeddings docstrings (#24933)
- **Description:** Standardize ZhipuAIEmbeddings rich docstrings.
- **Issue:** the issue #24856
2024-08-01 14:06:53 -04:00
ccurme
02db66d764 docs: fix kv store column headers (#24941)
![Screenshot 2024-08-01 at 12 32 19
PM](https://github.com/user-attachments/assets/888056b7-3065-4be0-a6b8-bcab5b729c2c)
2024-08-01 09:49:36 -07:00
Anneli Samuel
2204d8cb7d community[patch]: Invoke on_llm_new_token callback before yielding chunk (#24938)
**Description**: Invoke on_llm_new_token callback before yielding chunk
in streaming mode
**Issue**:
[#16913](https://github.com/langchain-ai/langchain/issues/16913)
2024-08-01 16:39:04 +00:00
John
ff6274d32d docs: update langchain-unstructured docs (#24935)
- **Description:** The UnstructuredClient will have a breaking change in
the near future. Add a note in the docs that the examples here may not
use the latest version and users should refer to the SDK docs for the
latest info.
2024-08-01 16:27:40 +00:00
ccurme
c72f0d2f20 docs: update toolkit integration pages (#24887)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-01 12:13:08 -04:00
Eugene Yurtsev
75776e4a54 core[patch]: In unit tests, use _schema() instead of BaseModel.schema() (#24930)
This PR introduces a module with some helper utilities for the pydantic
1 -> 2 migration.

They're meant to be used in the following way:

1) Use the utility code to get unit tests pass without requiring
modification to the unit tests
2) (If desired) upgrade the unit tests to match pydantic 2 output
3) (If desired) stop using the utility code

Currently, this module contains a way to map `schema()` generated by
pydantic 2 to (mostly) match the output from pydantic v1.
2024-08-01 11:59:04 -04:00
Serena Ruan
1827bb4042 community[patch]: support bind_tools for ChatMlflow (#24547)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- **Description:** 
Support ChatMlflow.bind_tools method
Tested in Databricks:
<img width="836" alt="image"
src="https://github.com/user-attachments/assets/fa28ef50-0110-4698-8eda-4faf6f0b9ef8">



- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
2024-08-01 08:43:07 -07:00
Michal Gregor
769c3bb838 huggingface: Added a missing argument to a ChatHuggingFace doc notebook. (#24929)
- **Description:** When adding docs for constructing ChatHuggingFace
using a HuggingFacePipeline, I forgot to add `return_full_text=False` as
an argument. In this setup, the chat response would incorrectly contain
all the input text. I am fixing that here by adding that line to the
offending notebook.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-01 15:42:35 +00:00
BottlePumpkin
bfc59c1d26 community: Fix KeyError in NotionDB loader when 'name' is missing (#24224)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.



**Description:** This PR fixes a KeyError in NotionDBLoader when the
"name" key is missing in the "people" property.

**Issue:** Fixes #24223 

**Dependencies:** None

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-08-01 13:55:40 +00:00
alexqiao
8eb0bdead3 community[patch]: Invoke callback prior to yielding token (#24917)
**Description: Invoke callback prior to yielding token in stream method
for chat_models .**
**Issue**: https://github.com/langchain-ai/langchain/issues/16913
#16913
2024-08-01 13:19:55 +00:00
ZhangShenao
b2dd9ffaaf patch[cli] Fix bug in check_imports.py (#24918)
The variable `has_failure` in check_imports.py is wrong-declared. It's
actually an another variable.
2024-08-01 09:08:12 -04:00
Jacob Lee
f14121faaf docs[patch]: Update local RAG tutorial (#24909) 2024-07-31 19:19:23 -07:00
Bagatur
b7abac9f92 infra: poetry lock root (#24913) 2024-08-01 01:19:34 +00:00
Jacob Lee
42c686bc28 docs[patch]: Update local model how-to guide (#24911)
Updates to use `langchain_ollama`, new models, chat model example
2024-07-31 18:01:55 -07:00
Erick Friis
600fc233ef partners/ollama: release 0.1.1 (#24910) 2024-07-31 17:31:29 -07:00
Bagatur
25b93cc4c0 core[patch]: stringify tool non-content blocks (#24626)
Slightly breaking bugfix. Shouldn't cause too many issues since no
models would be able to handle non-content block ToolMessage.content
anyways.
2024-07-31 16:42:38 -07:00
Bagatur
492df75937 docs: chat model table nit (#24907) 2024-07-31 15:14:27 -07:00
Bagatur
a24c445e02 docs: cleanup readme (#24905) 2024-07-31 15:03:28 -07:00
Jacob Lee
5098f9dc79 infra: related section in docs (#24829)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-31 14:25:58 -07:00
Nikita Pakunov
c776471ac6 community: fix AttributeError: 'YandexGPT' object has no attribute '_grpc_metadata' (#24432)
Fixes #24049

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-31 21:18:33 +00:00
Bagatur
752a71b688 integrations[patch]: release model packages (#24900) 2024-07-31 20:48:20 +00:00
Jacob Lee
1213a59f87 docs[patch]: Update kv store docs pages (#24848) 2024-07-31 13:23:24 -07:00
Erick Friis
17a06cb7a6 infra: check templates based on integration (#24857)
instead of hardcoding a linter for each, iterate through the lines of
the template notebook and find lines that start with `##` (includes
lower headings), and enforce that those headings are found in new docs
that are contributed
2024-07-31 13:19:50 -07:00
Erick Friis
a7380dd531 cli: release 0.0.28 (#24852) 2024-07-31 13:03:24 -07:00
Erick Friis
e98e4be0f7 cli: register new integration doc templates (#24854)
- wait to merge for retriever.ipynb merge #24836
2024-07-31 13:03:05 -07:00
Eugene Yurtsev
210623b409 core[minor]: Add support for pydantic 2 to utility to get fields (#24899)
Add compatibility for pydantic 2 for a utility function.

This will help push some small changes to master, so they don't have to
be kept track of on a separate branch.
2024-07-31 19:11:07 +00:00
Bagatur
7d1694040d core[patch]: Release 0.2.26 (#24898) 2024-07-31 19:00:50 +00:00
Eugene Yurtsev
add16111b9 community[patch]: Make the pydantic linter stricter (#24897)
Stricter linting of deprecated pydantic features.
2024-07-31 18:57:37 +00:00
Eugene Yurtsev
a4a444f73d community[patch]: Fix arcee llm usage of root_validator(pre=False) (#24896)
Should be pre=True
2024-07-31 18:49:20 +00:00
Eugene Yurtsev
69c656aa5f langchain[minor]: Upgrade ambiguous root_validator to @pre_init (#24895)
The @pre_init validator is a temporary solution for base models. It has
similar (but not identical) semantics to @root_validator(), but it works
strictly as a pre-init validator.

It'll work as expected as long as the pydantic model type hints were
correct.
2024-07-31 18:46:47 +00:00
Eugene Yurtsev
5099a9c9b4 core[patch]: Update unit tests with a workaround for using AnyID in pydantic 2 (#24892)
Pydantic 2 ignores __eq__ overload for subclasses of strings.
2024-07-31 14:42:12 -04:00
Bagatur
8461934c2b core[patch], integrations[patch]: convert TypedDict to tool schema support (#24641)
supports following UX

```python
    class SubTool(TypedDict):
        """Subtool docstring"""

        args: Annotated[Dict[str, Any], {}, "this does bar"]

    class Tool(TypedDict):
        """Docstring
        Args:
            arg1: foo
        """

        arg1: str
        arg2: Union[int, str]
        arg3: Optional[List[SubTool]]
        arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
        arg5: Annotated[Optional[float], None]
```

- can parse google style docstring
- can use Annotated to specify default value (second arg)
- can use Annotated to specify arg description (third arg)
- can have nested complex types
2024-07-31 18:27:24 +00:00
Eugene Yurtsev
d24b82357f community[patch]: Add missing annotations (#24890)
This PR adds annotations in comunity package.

Annotations are only strictly needed in subclasses of BaseModel for
pydantic 2 compatibility.

This PR adds some unnecessary annotations, but they're not bad to have
regardless for documentation pages.
2024-07-31 18:13:44 +00:00
Eugene Yurtsev
7720483432 langchain[patch]: Update unit tests to workaround a pydantic 2 issue (#24886)
This will allow our unit tests to pass when using AnyID() with our pydantic models.
2024-07-31 14:09:40 -04:00
Eugene Yurtsev
2019e31bc5 langchain[patch]: Add missing type annotations (#24889)
Adds missing type annotations in preparation for pydantic 2 upgrade.
2024-07-31 14:09:22 -04:00
ccurme
30f18c7b02 docs: add retriever integrations template (#24836) 2024-07-31 13:50:44 -04:00
Anirudh31415926535
4da3d4b18e docs: Minor corrections and updates to Cohere docs (#22726)
- **Description:** Update the Cohere's provider and RagRetriever
documentations with latest updates.
    - **Twitter handle:** Anirudh1810
2024-07-31 10:16:26 -07:00
ccurme
40b4a3de6e docs: update chat model integration pages (#24882)
to conform with template
2024-07-31 11:26:52 -04:00
Nishan Jain
b00c0fc558 [Community][minor]: Added prompt governance in pebblo_retrieval (#24874)
Title: [pebblo_retrieval] Identifying entities in prompts given in
PebbloRetrievalQA leading to prompt governance
Description: Implemented identification of entities in the prompt using
Pebblo prompt governance API.
Issue: NA
Dependencies: NA
Add tests and docs: NA
2024-07-31 13:14:51 +00:00
Rajendra Kadam
a6add89bd4 community[minor]: [PebbloSafeLoader] Implement content-size-based batching (#24871)
- **Title:** [PebbloSafeLoader] Implement content-size-based batching in
the classification flow(loader/doc API)
- **Description:** 
- Implemented content-size-based batching in the loader/doc API, set to
100KB with no external configuration option, intentionally hard-coded to
prevent timeouts.
    - Remove unused field(pb_id) from doc_metadata
- **Issue:** NA
- **Dependencies:** NA
- **Add tests and docs:** Updated
2024-07-31 09:10:28 -04:00
TrumanYan
096b66db4a community: replace it with Tencent Cloud SDK (#24172)
Description: The old method will be discontinued; use the official SDK
for more model options.
Issue: None
Dependencies: None
Twitter handle: None

Co-authored-by: trumanyan <trumanyan@tencent.com>
2024-07-31 09:05:38 -04:00
Erick Friis
99eb31ec41 cli: embed docstring template (#24855) 2024-07-31 02:16:40 +00:00
Noah Peterson
4b2a8ce6c7 docs: Shorten unreasonably long OllamaEmbeddings page (#24850)
This change removes excessive embeddings output in the Jupyter Notebook
on the [Ollama text embedding
page](https://python.langchain.com/v0.2/docs/integrations/text_embedding/ollama/)
2024-07-31 01:57:04 +00:00
Erick Friis
3999e9035c cli/docs: embedding template standardization (#24849) 2024-07-30 18:54:03 -07:00
Bagatur
1181c10c65 docs: reorder integrations sidebar (#24847) 2024-07-30 16:58:26 -07:00
Bagatur
943126c5fd docs: chat model pkg links (#24845) 2024-07-30 16:26:06 -07:00
Erick Friis
1f5444817a community: deprecate BedrockEmbeddings in favor of langchain-aws (#24846) 2024-07-30 23:13:17 +00:00
Jacob Lee
21eb4c9e5d docs[patch]: Adds first kv store doc matching new template (#24844) 2024-07-30 15:58:51 -07:00
Bagatur
a4e940550a docs: integrations custom callout (#24843) 2024-07-30 22:48:18 +00:00
Bagatur
61ecb10a77 docs: partner pkg table (#24840) 2024-07-30 15:28:10 -07:00
Erick Friis
b099cc3507 cli: release 0.0.27 (#24842) 2024-07-30 22:07:50 +00:00
Bagatur
419f2c2585 cli[patch]: tool integration templates (#24837)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-30 14:59:33 -07:00
mschoenb97IL
19b127f640 langchain: Update Langchain -> Langgraph migration docs for the deprecation of the messages_modifier parameter. (#24839)
**Description:** Updated the Langgraph migration docs to use
`state_modifier` rather than `messages_modifier`
**Issue:** N/A
**Dependencies:** N/A

- [ X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-30 21:28:32 +00:00
ccurme
c123cb2b30 docs: update migration guide (#24835)
Move to its own section in the sidebar.
2024-07-30 20:17:12 +00:00
Erick Friis
957b05b8d5 infra: py3.11 for community integration test compiling (#24834)
e.g.
https://github.com/langchain-ai/langchain/actions/runs/10167754785/job/28120861343?pr=24833
2024-07-30 18:43:10 +00:00
Erick Friis
88418af3f5 core: release 0.2.25 (#24833) 2024-07-30 18:41:09 +00:00
Bagatur
37b060112a langchain[patch]: fix ollama in init_chat_model (#24832) 2024-07-30 18:38:53 +00:00
Jerron Lim
d8f3ea82db langchain[patch]: init_chat_model() to import ChatOllama from langchain-ollama and fallback on langchain-community (#24821)
Description: init_chat_model() should import ChatOllama from
`langchain-ollama`. If that fails, fallback to `langchain-community`
2024-07-30 11:16:10 -07:00
Eugene Yurtsev
3a7f3d46c3 docs: Add pydantic compatibility to side bar (#24826)
Add pydantic compatibility to side bar
2024-07-30 14:10:48 -04:00
Isaac Francisco
511242280b [docs]: standardize vectorstores (#24797) 2024-07-30 10:38:04 -07:00
Jacob Lee
ac649800df docs[patch]: Adds kv store integration docs template (#24804) 2024-07-30 10:07:57 -07:00
cffranco94
b01d938997 experimental: Add config to convert_to_graph_documents (#24012)
PR title: Experimental: Add config to convert_to_graph_documents

Description: In order to use langfuse, i need to pass the langfuse
configuration when invoking the chain. langchain_experimental does not
allow to add any parameters (beside the documents) to the
convert_to_graph_documents method. This way, I cannot monitor the chain
in langfuse.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Catarina Franco <catarina.franco@criticalsoftware.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-30 17:01:06 +00:00
Shailendra Mishra
f2d810b3c0 clob_bugfix... (#24813)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-30 12:44:04 -04:00
Anush
51b15448cc community: Fix FastEmbedEmbeddings (#24462)
## Description

This PR:
- Fixes the validation error in `FastEmbedEmbeddings`.
- Adds support for `batch_size`, `parallel` params.
- Removes support for very old FastEmbed versions.
- Updates the FastEmbed doc with the new params.

Associated Issues:
- Resolves #24039
- Resolves #https://github.com/qdrant/fastembed/issues/296
2024-07-30 12:42:46 -04:00
ccurme
73ec24fc56 docs[patch]: add toolkit template (#24791) 2024-07-30 12:36:09 -04:00
Tamir Zitman
b3e1378f2b langchain : text_splitters Added PowerShell (#24582)
- **Description:** Added PowerShell support for text splitters language
include docs relevant update
  - **Issue:** None
  - **Dependencies:** None

---------

Co-authored-by: tzitman <tamir.zitman@intel.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-30 16:13:52 +00:00
ccurme
187ee96f7a docs: update chat model feature table (#24822) 2024-07-30 09:06:42 -07:00
Nuno Campos
68ecebf1ec core: Fix implementation of trim_first_node/trim_last_node to use exact same definition of first/last node as in the getter methods (#24802) 2024-07-30 08:44:27 -07:00
Igor Drozdov
c2706cfb9e feat(community): add tools support for litellm (#23906)
I used the following example to validate the behavior

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import ConfigurableField
from langchain_anthropic import ChatAnthropic
from langchain_community.chat_models import ChatLiteLLM
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' times 'y'."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the 'y'."""
    return x**y

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y'."""
    return x + y

prompt = ChatPromptTemplate.from_messages([
    ("system", "you're a helpful assistant"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [multiply, exponentiate, add]

llm = ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0)
# llm = ChatLiteLLM(model="claude-3-sonnet-20240229", temperature=0)

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke({"input": "what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241", })
```

`ChatAnthropic` version works:

```
> Entering new AgentExecutor chain...

Invoking: `exponentiate` with `{'x': 5, 'y': 2.743}`
responded: [{'text': 'To calculate 3 + 5^2.743, we can use the "exponentiate" and "add" tools:', 'type': 'text', 'index': 0}, {'id': 'toolu_01Gf54DFTkfLMJQX3TXffmxe', 'input': {}, 'name': 'exponentiate', 'type': 'tool_use', 'index': 1, 'partial_json': '{"x": 5, "y": 2.743}'}]

82.65606421491815
Invoking: `add` with `{'x': 3, 'y': 82.65606421491815}`
responded: [{'id': 'toolu_01XUq9S56GT3Yv2N1KmNmmWp', 'input': {}, 'name': 'add', 'type': 'tool_use', 'index': 0, 'partial_json': '{"x": 3, "y": 82.65606421491815}'}]

85.65606421491815
Invoking: `add` with `{'x': 17.24, 'y': -918.1241}`
responded: [{'text': '\n\nSo 3 + 5^2.743 = 85.66\n\nTo calculate 17.24 - 918.1241, we can use:', 'type': 'text', 'index': 0}, {'id': 'toolu_01BkXTwP7ec9JKYtZPy5JKjm', 'input': {}, 'name': 'add', 'type': 'tool_use', 'index': 1, 'partial_json': '{"x": 17.24, "y": -918.1241}'}]

-900.8841[{'text': '\n\nTherefore, 17.24 - 918.1241 = -900.88', 'type': 'text', 'index': 0}]

> Finished chain.
```

While `ChatLiteLLM` version doesn't.

But with the changes in this PR, along with:

- https://github.com/langchain-ai/langchain/pull/23823
- https://github.com/BerriAI/litellm/pull/4554

The result is _almost_ the same:

```
> Entering new AgentExecutor chain...

Invoking: `exponentiate` with `{'x': 5, 'y': 2.743}`
responded: To calculate 3 + 5^2.743, we can use the "exponentiate" and "add" tools:

82.65606421491815
Invoking: `add` with `{'x': 3, 'y': 82.65606421491815}`


85.65606421491815
Invoking: `add` with `{'x': 17.24, 'y': -918.1241}`
responded:

So 3 + 5^2.743 = 85.66

To calculate 17.24 - 918.1241, we can use:

-900.8841

Therefore, 17.24 - 918.1241 = -900.88

> Finished chain.
```

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-30 15:39:34 +00:00
David Robertson
bfb7f8d40a Brave Search: Enhance search result details with extra snippets (#19209)
**Description:** 

This update significantly improves the Brave Search Tool's utility
within the LangChain library by enriching the search results it returns.
The tool previously returned title, link, and snippet, with the snippet
being a truncated 140-character description from the search engine. To
make the search results more informative, this update enables
extra_snippets by default and introduces additional result fields:
title, link, description (enhancing and renaming the former snippet
field), age, and snippets. The snippets field provides a list of strings
summarizing the webpage, utilizing Brave's capability for more detailed
search insights. This enhancement aims to make the search tool far more
informative and beneficial for users.

**Issue:** N/A

**Dependencies:** No additional dependencies introduced.

**Twitter handle:** @davidalexr987

**Code Changes Summary:**

- Changed the default setting to include extra_snippets in search
results.
- Renamed the snippet field to description to accurately reflect its
content and included an age field for search results.
- Introduced a snippets field that lists webpage summaries, providing
users with comprehensive search result insights.

**Backward Compatibility Note:**

The renaming of snippet to description improves the accuracy of the
returned data field but may impact existing users who have developed
integration's or analyses based on the snippet field. I believe this
change is essential for clarity and utility, and it aligns better with
the data provided by Brave Search.

**Additional Notes:**

This proposal focuses exclusively on the Brave Search package, without
affecting other LangChain packages or introducing new dependencies.
2024-07-30 15:29:38 +00:00
Eugene Yurtsev
873f64751e docs: Remove danger on how to migrate to astream events v2 (#24825)
Users should migrate to v2 now
2024-07-30 15:28:07 +00:00
Ben Chambers
435771fe74 [community]: Fix package name mismatch (#24824)
- **Description:** fix a mismatch in pypi package names
2024-07-30 11:21:39 -04:00
ccurme
b7bbfc7c67 langchain: revert "init_chat_model() to support ChatOllama from langchain-ollama" (#24819)
Reverts langchain-ai/langchain#24818

Overlooked discussion in
https://github.com/langchain-ai/langchain/pull/24801.
2024-07-30 14:23:36 +00:00
Jerron Lim
5abfc85fec langchain: init_chat_model() to support ChatOllama from langchain-ollama (#24818)
Description: Since moving away from `langchain-community` is
recommended, `init_chat_models()` should import ChatOllama from
`langchain-ollama` instead.
2024-07-30 10:17:38 -04:00
Eugene Yurtsev
4fab8996cf docs: Update pydantic compatibility (#24625)
Update pydantic compatibility. This will only be true after we release
the partner packages.
2024-07-29 22:19:00 -04:00
Jacob Lee
d6ca1474e0 docs[patch]: Adds key-value store to conceptual guide (#24798) 2024-07-29 18:45:16 -07:00
Erick Friis
cdaea17b3e cli/docs: llm integration template standardization (#24795) 2024-07-29 17:47:13 -07:00
Bagatur
a6d1fb4275 core[patch]: introduce ToolMessage.status (#24628)
Anthropic models (including via Bedrock and other cloud platforms)
accept a status/is_error attribute on tool messages/results
(specifically in `tool_result` content blocks for Anthropic API). Adding
a ToolMessage.status attribute so that users can set this attribute when
using those models
2024-07-29 14:01:53 -07:00
Isaac Francisco
78d97b49d9 [partner]: ollama llm fix (#24790) 2024-07-29 13:00:02 -07:00
maang-h
4bb1a11e02 community: Add MiniMaxChat bind_tools and structured output (#24310)
- **Description:** 
  - Add `bind_tools` method to support tool calling 
  - Add `with_structured_output` method to support structured output
2024-07-29 15:51:52 -04:00
John
0a2ff40fcc partners/unstructured: fix client api_url (#24680)
**Description:** Add empty string default for api_key and change
`server_url` to `url` to match existing loaders.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-29 11:16:41 -07:00
maang-h
bf685c242f docs: Standardize QianfanEmbeddingsEndpoint (#24786)
- **Description:** Standardize QianfanEmbeddingsEndpoint, include:
  - docstrings, the issue #21983 
  - model init arg names, the issue #20085
2024-07-29 13:19:24 -04:00
ccurme
9998e55936 core[patch]: support tool calls with non-pickleable args in tools (#24741)
Deepcopy raises with non-pickleable args.
2024-07-29 13:18:39 -04:00
Erick Friis
df78608741 mongodb: bson optional import (#24685) 2024-07-29 09:54:01 -07:00
M. Ali
c086410677 fix docs typos (#23668)
Thank you for contributing to LangChain!

- [x] **PR title**: "docs: fix multiple typos"

Co-authored-by: mohblnk <mohamed.ali@blnk.ai>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-29 16:10:55 +00:00
Pere Pasamonte
98175860ad community: Fix AWS DocumentDB similarity_search when filter is None (#24777)
**Description**

Fixes DocumentDBVectorSearch similarity_search when no filter is used;
it defaults to None but $match does not accept None, so changed default
to empty {} before pipeline is created.

**Issue**

AWS DocumentDB similarity search does not work when no filter is used.
Error msg: "the match filter must be an expression in an object" #24775

**Dependencies**

No dependencies

**Twitter handle**

https://x.com/perepasamonte

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-29 15:32:05 +00:00
Lennart J. Kurzweg
7da0597ecb partners[ollama]: Support seed parameter for ChatOllama (#24782)
## Description

Adds seed parameter to ChatOllama

## Resolves Issues
- #24703

## Dependency Changes
None

Co-authored-by: Lennart J. Kurzweg (Nx2) <git@nx2.site>
2024-07-29 15:15:20 +00:00
ccurme
e264ccf484 standard-tests[patch]: update groq and structured output test (#24781)
- Mixtral with Groq has started consistently failing tool calling tests.
Here we restrict testing to llama 3.1.
- `.schema` is deprecated in pydantic proper in favor of
`.model_json_schema`.
2024-07-29 11:10:01 -04:00
ZhangShenao
4a05679fdb patch[experimental] Fix prompt in GenerativeAgentMemory (#24771)
There is an issue with the prompt format in `GenerativeAgentMemory` ,
try to fix it.
The prompt is same as the one in method `_score_memory_importance`.
2024-07-29 07:02:31 -04:00
WU LIFU
2ba8393182 graph_transformers: bug fix for create_simple_model not passing in ll… (#24643)
issue: #24615 

descriptions: The _Graph pydantic model generated from
create_simple_model (which LLMGraphTransformer uses when allowed nodes
and relationships are provided) does not constrain the relationships
(source and target types, relationship type), and the node and
relationship properties with enums when using ChatOpenAI.
The issue is that when calling optional_enum_field throughout
create_simple_model the llm_type parameter is not passed in except for
when creating node type. Passing it into each call fixes the issue.

Co-authored-by: Lifu Wu <lifu@nextbillion.ai>
2024-07-29 07:00:56 -04:00
William FH
01ab2918a2 core[patch]: Respect injected in bound fns (#24733)
Since right now you cant use the nice injected arg syntas directly with
model.bind_tools()
2024-07-28 15:45:19 -07:00
Pavel
7fcfe7c1f4 openai[patch]: openai proxy added to base embeddings (#24539)
- [ ] **PR title**: "langchain-openai: openai proxy added to base
embeddings"

- [ ] **PR message**: 
    - **Description:** 
    Dear langchain developers,
You've already supported proxy for ChatOpenAI implementation in your
package. At the same time, if somebody needed to use proxy for chat, it
also could be necessary to be able to use it for OpenAIEmbeddings.
That's why I think it's important to add proxy support for OpenAI
embeddings. That's what I've done in this PR.

@baskaryan

---------

Co-authored-by: karpov <karpov@dohod.ru>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-07-28 20:54:13 +00:00
Lakshmi Peri
821196c4ee langchain-aws InMemoryVectorStore documentation updates (#24347)
Thank you for contributing to LangChain!

- [x] **PR title**: "Add documentaiton on InMemoryVectorStore driver for
MemoryDB to langchain-aws"
  - Langchain-aws repo :Add MemoryDB documentation 
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Added documentation on InMemoryVectorStore driver to
aws.mdx and usage example on MemoryDB clusuter
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
Add memorydb notebook to docs/docs/integrations/ folde


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-07-28 15:09:51 -04:00
Chuck Wooters
56c2a7f6d4 partners: add missing key name to Field() for ChatFireworks model (#24721)
**Description:** 

In the `ChatFireworks` class definition, the Field() call for the "stop"
("stop_sequences") parameter is missing the "default" keyword.

**Issue:**

Type checker reports "stop_sequences" as a missing arg (not recognizing
the default value is None)

**Dependencies:**

None

**Twitter handle:**

None
2024-07-28 18:40:21 +00:00
AmosDinh
c113682328 community:Add support for specifying document_loaders.firecrawl api url. (#24747)
community:Add support for specifying document_loaders.firecrawl api url.


Add support for specifying document_loaders.firecrawl api url. 
This is mainly to support the
[self-hosting](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md)
option firecrawl provides. Eg. now I can specify localhost:....

The corresponding firecrawl class already provides functionality to pass
the argument. See here:
4c9d62f6d3/apps/python-sdk/firecrawl/firecrawl.py (L29)

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-28 14:30:36 -04:00
Jerron Lim
df37c0d086 partners[ollama]: Support base_url for ChatOllama (#24719)
Add a class attribute `base_url` for ChatOllama to allow users to choose
a different URL to connect to.

Fixes #24555
2024-07-28 14:25:58 -04:00
1157 changed files with 55682 additions and 41706 deletions

View File

@@ -1,7 +1,6 @@
import glob
import json
import os
import re
import sys
import tomllib
from collections import defaultdict
@@ -86,6 +85,11 @@ def add_dependents(dirs_to_eval: Set[str], dependents: dict) -> List[str]:
def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
if dir_ == "libs/core":
return [
{"working-directory": dir_, "python-version": f"3.{v}"}
for v in range(8, 13)
]
min_python = "3.8"
max_python = "3.12"
@@ -100,6 +104,10 @@ def _get_configs_for_single_dir(job: str, dir_: str) -> List[Dict[str, str]]:
# even in uv
max_python = "3.11"
if dir_ == "libs/community" and job == "compile-integration-tests":
# community integration deps are slow in 3.12
max_python = "3.11"
return [
{"working-directory": dir_, "python-version": min_python},
{"working-directory": dir_, "python-version": max_python},

2
.gitignore vendored
View File

@@ -172,6 +172,8 @@ docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
!docs/api_reference/_extensions/
!docs/api_reference/scripts/
docs/docs/build
docs/docs/node_modules
docs/docs/yarn.lock

View File

@@ -52,7 +52,7 @@ Now:
`from langchain_experimental.sql import SQLDatabaseChain`
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out [`create_sql_query_chain`](https://github.com/langchain-ai/langchain/blob/master/docs/extras/use_cases/tabular/sql_query.ipynb)
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out this [`SQL question-answering tutorial`](https://python.langchain.com/v0.2/docs/tutorials/sql_qa/#convert-question-to-sql-query)
`from langchain.chains import create_sql_query_chain`

View File

@@ -31,6 +31,7 @@ docs_linkcheck:
api_docs_build:
poetry run python docs/api_reference/create_api_rst.py
cd docs/api_reference && poetry run make html
poetry run python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
API_PKG ?= text-splitters
@@ -38,12 +39,14 @@ api_docs_quick_preview:
poetry run pip install "pydantic<2"
poetry run python docs/api_reference/create_api_rst.py $(API_PKG)
cd docs/api_reference && poetry run make html
open docs/api_reference/_build/html/$(shell echo $(API_PKG) | sed 's/-/_/g')_api_reference.html
poetry run python docs/api_reference/scripts/custom_formatter.py docs/api_reference/_build/html/
open docs/api_reference/_build/html/reference.html
## api_docs_clean: Clean the API Reference documentation build artifacts.
api_docs_clean:
find ./docs/api_reference -name '*_api_reference.rst' -delete
git clean -fdX ./docs/api_reference
rm docs/api_reference/index.md
## api_docs_linkcheck: Run linkchecker on the API Reference documentation.

View File

@@ -7,7 +7,6 @@
[![PyPI - License](https://img.shields.io/pypi/l/langchain-core?style=flat-square)](https://opensource.org/licenses/MIT)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-core?style=flat-square)](https://pypistats.org/packages/langchain-core)
[![GitHub star chart](https://img.shields.io/github/stars/langchain-ai/langchain?style=flat-square)](https://star-history.com/#langchain-ai/langchain)
[![Dependency Status](https://img.shields.io/librariesio/github/langchain-ai/langchain?style=flat-square)](https://libraries.io/github/langchain-ai/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain?style=flat-square)](https://github.com/langchain-ai/langchain/issues)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode&style=flat-square)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain)

View File

@@ -39,7 +39,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-chroma unstructured[all-docs] pydantic lxml langchainhub"
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml langchainhub"
]
},
{

View File

@@ -59,7 +59,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-chroma unstructured[all-docs] pydantic lxml"
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml"
]
},
{

View File

@@ -59,7 +59,7 @@
"metadata": {},
"outputs": [],
"source": [
"! pip install langchain langchain-chroma unstructured[all-docs] pydantic lxml"
"! pip install langchain langchain-chroma \"unstructured[all-docs]\" pydantic lxml"
]
},
{

View File

@@ -166,7 +166,7 @@
"source": [
"### SQL Database Agent example\n",
"\n",
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/toolkits/sql_database.html) for answering questions over a Databricks database."
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/tools/sql_database) for answering questions over a Databricks database."
]
},
{

View File

@@ -13,7 +13,12 @@ OUTPUT_NEW_DOCS_DIR = $(OUTPUT_NEW_DIR)/docs
PYTHON = .venv/bin/python
PARTNER_DEPS_LIST := $(shell find ../libs/partners -mindepth 1 -maxdepth 1 -type d -exec test -e "{}/pyproject.toml" \; -print | grep -vE "airbyte|ibm|couchbase" | tr '\n' ' ')
PARTNER_DEPS_LIST := $(shell find ../libs/partners -mindepth 1 -maxdepth 1 -type d -exec sh -c ' \
for dir; do \
if find "$$dir" -maxdepth 1 -type f \( -name "pyproject.toml" -o -name "setup.py" \) | grep -q .; then \
echo "$$dir"; \
fi \
done' sh {} + | grep -vE "airbyte|ibm|couchbase" | tr '\n' ' ')
PORT ?= 3001
@@ -36,11 +41,11 @@ generate-files:
cp -r $(SOURCE_DIR)/* $(INTERMEDIATE_DIR)
mkdir -p $(INTERMEDIATE_DIR)/templates
$(PYTHON) scripts/model_feat_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/tool_feat_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/document_loader_feat_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/kv_store_feat_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/partner_pkg_table.py $(INTERMEDIATE_DIR)
$(PYTHON) scripts/copy_templates.py $(INTERMEDIATE_DIR)
@@ -65,16 +70,23 @@ render:
md-sync:
rsync -avm --include="*/" --include="*.mdx" --include="*.md" --include="*.png" --include="*/_category_.yml" --exclude="*" $(INTERMEDIATE_DIR)/ $(OUTPUT_NEW_DOCS_DIR)
append-related:
$(PYTHON) scripts/append_related_links.py $(OUTPUT_NEW_DOCS_DIR)
generate-references:
$(PYTHON) scripts/generate_api_reference_links.py --docs_dir $(OUTPUT_NEW_DOCS_DIR)
build: install-py-deps generate-files copy-infra render md-sync
build: install-py-deps generate-files copy-infra render md-sync append-related
vercel-build: install-vercel-deps build generate-references
rm -rf docs
mv $(OUTPUT_NEW_DOCS_DIR) docs
rm -rf build
yarn run docusaurus build
mkdir static/api_reference
git clone --depth=1 https://github.com/baskaryan/langchain-api-docs-build.git
mv langchain-api-docs-build/api_reference_build/html/* static/api_reference/
rm -rf langchain-api-docs-build
NODE_OPTIONS="--max-old-space-size=5000" yarn run docusaurus build
mv build v0.2
mkdir build
mv v0.2 build

View File

@@ -0,0 +1,144 @@
"""A directive to generate a gallery of images from structured data.
Generating a gallery of images that are all the same size is a common
pattern in documentation, and this can be cumbersome if the gallery is
generated programmatically. This directive wraps this particular use-case
in a helper-directive to generate it with a single YAML configuration file.
It currently exists for maintainers of the pydata-sphinx-theme,
but might be abstracted into a standalone package if it proves useful.
"""
from pathlib import Path
from typing import Any, ClassVar, Dict, List
from docutils import nodes
from docutils.parsers.rst import directives
from sphinx.application import Sphinx
from sphinx.util import logging
from sphinx.util.docutils import SphinxDirective
from yaml import safe_load
logger = logging.getLogger(__name__)
TEMPLATE_GRID = """
`````{{grid}} {columns}
{options}
{content}
`````
"""
GRID_CARD = """
````{{grid-item-card}} {title}
{options}
{content}
````
"""
class GalleryGridDirective(SphinxDirective):
"""A directive to show a gallery of images and links in a Bootstrap grid.
The grid can be generated from a YAML file that contains a list of items, or
from the content of the directive (also formatted in YAML). Use the parameter
"class-card" to add an additional CSS class to all cards. When specifying the grid
items, you can use all parameters from "grid-item-card" directive to customize
individual cards + ["image", "header", "content", "title"].
Danger:
This directive can only be used in the context of a Myst documentation page as
the templates use Markdown flavored formatting.
"""
name = "gallery-grid"
has_content = True
required_arguments = 0
optional_arguments = 1
final_argument_whitespace = True
option_spec: ClassVar[dict[str, Any]] = {
# A class to be added to the resulting container
"grid-columns": directives.unchanged,
"class-container": directives.unchanged,
"class-card": directives.unchanged,
}
def run(self) -> List[nodes.Node]:
"""Create the gallery grid."""
if self.arguments:
# If an argument is given, assume it's a path to a YAML file
# Parse it and load it into the directive content
path_data_rel = Path(self.arguments[0])
path_doc, _ = self.get_source_info()
path_doc = Path(path_doc).parent
path_data = (path_doc / path_data_rel).resolve()
if not path_data.exists():
logger.info(f"Could not find grid data at {path_data}.")
nodes.text("No grid data found at {path_data}.")
return
yaml_string = path_data.read_text()
else:
yaml_string = "\n".join(self.content)
# Use all the element with an img-bottom key as sites to show
# and generate a card item for each of them
grid_items = []
for item in safe_load(yaml_string):
# remove parameters that are not needed for the card options
title = item.pop("title", "")
# build the content of the card using some extra parameters
header = f"{item.pop('header')} \n^^^ \n" if "header" in item else ""
image = f"![image]({item.pop('image')}) \n" if "image" in item else ""
content = f"{item.pop('content')} \n" if "content" in item else ""
# optional parameter that influence all cards
if "class-card" in self.options:
item["class-card"] = self.options["class-card"]
loc_options_str = "\n".join(f":{k}: {v}" for k, v in item.items()) + " \n"
card = GRID_CARD.format(
options=loc_options_str, content=header + image + content, title=title
)
grid_items.append(card)
# Parse the template with Sphinx Design to create an output container
# Prep the options for the template grid
class_ = "gallery-directive" + f' {self.options.get("class-container", "")}'
options = {"gutter": 2, "class-container": class_}
options_str = "\n".join(f":{k}: {v}" for k, v in options.items())
# Create the directive string for the grid
grid_directive = TEMPLATE_GRID.format(
columns=self.options.get("grid-columns", "1 2 3 4"),
options=options_str,
content="\n".join(grid_items),
)
# Parse content as a directive so Sphinx Design processes it
container = nodes.container()
self.state.nested_parse([grid_directive], 0, container)
# Sphinx Design outputs a container too, so just use that
return [container.children[0]]
def setup(app: Sphinx) -> Dict[str, Any]:
"""Add custom configuration to sphinx app.
Args:
app: the Sphinx application
Returns:
the 2 parallel parameters set to ``True``.
"""
app.add_directive("gallery-grid", GalleryGridDirective)
return {
"parallel_read_safe": True,
"parallel_write_safe": True,
}

View File

@@ -1,26 +1,411 @@
pre {
white-space: break-spaces;
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap');
/*******************************************************************************
* master color map. Only the colors that actually differ between light and dark
* themes are specified separately.
*
* To see the full list of colors see https://www.figma.com/file/rUrrHGhUBBIAAjQ82x6pz9/PyData-Design-system---proposal-for-implementation-(2)?node-id=1234%3A765&t=ifcFT1JtnrSshGfi-1
*/
/**
* Function to get items from nested maps
*/
/* Assign base colors for the PyData theme */
:root {
--pst-teal-50: #f4fbfc;
--pst-teal-100: #e9f6f8;
--pst-teal-200: #d0ecf1;
--pst-teal-300: #abdde6;
--pst-teal-400: #3fb1c5;
--pst-teal-500: #0a7d91;
--pst-teal-600: #085d6c;
--pst-teal-700: #064752;
--pst-teal-800: #042c33;
--pst-teal-900: #021b1f;
--pst-violet-50: #f4eefb;
--pst-violet-100: #e0c7ff;
--pst-violet-200: #d5b4fd;
--pst-violet-300: #b780ff;
--pst-violet-400: #9c5ffd;
--pst-violet-500: #8045e5;
--pst-violet-600: #6432bd;
--pst-violet-700: #4b258f;
--pst-violet-800: #341a61;
--pst-violet-900: #1e0e39;
--pst-gray-50: #f9f9fa;
--pst-gray-100: #f3f4f5;
--pst-gray-200: #e5e7ea;
--pst-gray-300: #d1d5da;
--pst-gray-400: #9ca4af;
--pst-gray-500: #677384;
--pst-gray-600: #48566b;
--pst-gray-700: #29313d;
--pst-gray-800: #222832;
--pst-gray-900: #14181e;
--pst-pink-50: #fcf8fd;
--pst-pink-100: #fcf0fa;
--pst-pink-200: #f8dff5;
--pst-pink-300: #f3c7ee;
--pst-pink-400: #e47fd7;
--pst-pink-500: #c132af;
--pst-pink-600: #912583;
--pst-pink-700: #6e1c64;
--pst-pink-800: #46123f;
--pst-pink-900: #2b0b27;
--pst-foundation-white: #ffffff;
--pst-foundation-black: #14181e;
--pst-green-10: #f1fdfd;
--pst-green-50: #E0F7F6;
--pst-green-100: #B3E8E6;
--pst-green-200: #80D6D3;
--pst-green-300: #4DC4C0;
--pst-green-400: #4FB2AD;
--pst-green-500: #287977;
--pst-green-600: #246161;
--pst-green-700: #204F4F;
--pst-green-800: #1C3C3C;
--pst-green-900: #0D2427;
--pst-lilac-50: #f4eefb;
--pst-lilac-100: #DAD6FE;
--pst-lilac-200: #BCB2FD;
--pst-lilac-300: #9F8BFA;
--pst-lilac-400: #7F5CF6;
--pst-lilac-500: #6F3AED;
--pst-lilac-600: #6028D9;
--pst-lilac-700: #5021B6;
--pst-lilac-800: #431D95;
--pst-lilac-900: #1e0e39;
--pst-header-height: 2.5rem;
}
@media (min-width: 1200px) {
.container,
.container-lg,
.container-md,
.container-sm,
.container-xl {
max-width: 2560px !important;
}
html {
--pst-font-family-base: 'Inter';
--pst-font-family-heading: 'Inter Tight', sans-serif;
}
#my-component-root *,
#headlessui-portal-root * {
z-index: 10000;
/*******************************************************************************
* write the color rules for each theme (light/dark)
*/
/* NOTE:
* Mixins enable us to reuse the same definitions for the different modes
* https://sass-lang.com/documentation/at-rules/mixin
* something inserts a variable into a CSS selector or property name
* https://sass-lang.com/documentation/interpolation
*/
/* Defaults to light mode if data-theme is not set */
html:not([data-theme]) {
--pst-color-primary: #287977;
--pst-color-primary-bg: #80D6D3;
--pst-color-secondary: #6F3AED;
--pst-color-secondary-bg: #DAD6FE;
--pst-color-accent: #c132af;
--pst-color-accent-bg: #f8dff5;
--pst-color-info: #276be9;
--pst-color-info-bg: #dce7fc;
--pst-color-warning: #f66a0a;
--pst-color-warning-bg: #f8e3d0;
--pst-color-success: #00843f;
--pst-color-success-bg: #d6ece1;
--pst-color-attention: var(--pst-color-warning);
--pst-color-attention-bg: var(--pst-color-warning-bg);
--pst-color-danger: #d72d47;
--pst-color-danger-bg: #f9e1e4;
--pst-color-text-base: #222832;
--pst-color-text-muted: #48566b;
--pst-color-heading-color: #ffffff;
--pst-color-shadow: rgba(0, 0, 0, 0.1);
--pst-color-border: #d1d5da;
--pst-color-border-muted: rgba(23, 23, 26, 0.2);
--pst-color-inline-code: #912583;
--pst-color-inline-code-links: #246161;
--pst-color-target: #f3cf95;
--pst-color-background: #ffffff;
--pst-color-on-background: #F4F9F8;
--pst-color-surface: #F4F9F8;
--pst-color-on-surface: #222832;
}
html:not([data-theme]) {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html:not([data-theme]) .only-dark,
html:not([data-theme]) .only-dark ~ figcaption {
display: none !important;
}
table.longtable code {
white-space: normal;
/* NOTE: @each {...} is like a for-loop
* https://sass-lang.com/documentation/at-rules/control/each
*/
html[data-theme=light] {
--pst-color-primary: #287977;
--pst-color-primary-bg: #80D6D3;
--pst-color-secondary: #6F3AED;
--pst-color-secondary-bg: #DAD6FE;
--pst-color-accent: #c132af;
--pst-color-accent-bg: #f8dff5;
--pst-color-info: #276be9;
--pst-color-info-bg: #dce7fc;
--pst-color-warning: #f66a0a;
--pst-color-warning-bg: #f8e3d0;
--pst-color-success: #00843f;
--pst-color-success-bg: #d6ece1;
--pst-color-attention: var(--pst-color-warning);
--pst-color-attention-bg: var(--pst-color-warning-bg);
--pst-color-danger: #d72d47;
--pst-color-danger-bg: #f9e1e4;
--pst-color-text-base: #222832;
--pst-color-text-muted: #48566b;
--pst-color-heading-color: #ffffff;
--pst-color-shadow: rgba(0, 0, 0, 0.1);
--pst-color-border: #d1d5da;
--pst-color-border-muted: rgba(23, 23, 26, 0.2);
--pst-color-inline-code: #912583;
--pst-color-inline-code-links: #246161;
--pst-color-target: #f3cf95;
--pst-color-background: #ffffff;
--pst-color-on-background: #F4F9F8;
--pst-color-surface: #F4F9F8;
--pst-color-on-surface: #222832;
color-scheme: light;
}
html[data-theme=light] {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html[data-theme=light] .only-dark,
html[data-theme=light] .only-dark ~ figcaption {
display: none !important;
}
table.longtable td {
max-width: 600px;
html[data-theme=dark] {
--pst-color-primary: #4FB2AD;
--pst-color-primary-bg: #1C3C3C;
--pst-color-secondary: #7F5CF6;
--pst-color-secondary-bg: #431D95;
--pst-color-accent: #e47fd7;
--pst-color-accent-bg: #46123f;
--pst-color-info: #79a3f2;
--pst-color-info-bg: #06245d;
--pst-color-warning: #ff9245;
--pst-color-warning-bg: #652a02;
--pst-color-success: #5fb488;
--pst-color-success-bg: #002f17;
--pst-color-attention: var(--pst-color-warning);
--pst-color-attention-bg: var(--pst-color-warning-bg);
--pst-color-danger: #e78894;
--pst-color-danger-bg: #4e111b;
--pst-color-text-base: #ced6dd;
--pst-color-text-muted: #9ca4af;
--pst-color-heading-color: #14181e;
--pst-color-shadow: rgba(0, 0, 0, 0.2);
--pst-color-border: #48566b;
--pst-color-border-muted: #29313d;
--pst-color-inline-code: #f3c7ee;
--pst-color-inline-code-links: #4FB2AD;
--pst-color-target: #675c04;
--pst-color-background: #14181e;
--pst-color-on-background: #222832;
--pst-color-surface: #29313d;
--pst-color-on-surface: #f3f4f5;
/* Adjust images in dark mode (unless they have class .only-dark or
* .dark-light, in which case assume they're already optimized for dark
* mode).
*/
/* Give images a light background in dark mode in case they have
* transparency and black text (unless they have class .only-dark or .dark-light, in
* which case assume they're already optimized for dark mode).
*/
color-scheme: dark;
}
html[data-theme=dark] {
--pst-color-link: var(--pst-color-primary);
--pst-color-link-hover: var(--pst-color-secondary);
}
html[data-theme=dark] .only-light,
html[data-theme=dark] .only-light ~ figcaption {
display: none !important;
}
html[data-theme=dark] img:not(.only-dark):not(.dark-light) {
filter: brightness(0.8) contrast(1.2);
}
html[data-theme=dark] .bd-content img:not(.only-dark):not(.dark-light) {
background: rgb(255, 255, 255);
border-radius: 0.25rem;
}
html[data-theme=dark] .MathJax_SVG * {
fill: var(--pst-color-text-base);
}
.pst-color-primary {
color: var(--pst-color-primary);
}
.pst-color-secondary {
color: var(--pst-color-secondary);
}
.pst-color-accent {
color: var(--pst-color-accent);
}
.pst-color-info {
color: var(--pst-color-info);
}
.pst-color-warning {
color: var(--pst-color-warning);
}
.pst-color-success {
color: var(--pst-color-success);
}
.pst-color-attention {
color: var(--pst-color-attention);
}
.pst-color-danger {
color: var(--pst-color-danger);
}
.pst-color-text-base {
color: var(--pst-color-text-base);
}
.pst-color-text-muted {
color: var(--pst-color-text-muted);
}
.pst-color-heading-color {
color: var(--pst-color-heading-color);
}
.pst-color-shadow {
color: var(--pst-color-shadow);
}
.pst-color-border {
color: var(--pst-color-border);
}
.pst-color-border-muted {
color: var(--pst-color-border-muted);
}
.pst-color-inline-code {
color: var(--pst-color-inline-code);
}
.pst-color-inline-code-links {
color: var(--pst-color-inline-code-links);
}
.pst-color-target {
color: var(--pst-color-target);
}
.pst-color-background {
color: var(--pst-color-background);
}
.pst-color-on-background {
color: var(--pst-color-on-background);
}
.pst-color-surface {
color: var(--pst-color-surface);
}
.pst-color-on-surface {
color: var(--pst-color-on-surface);
}
/* Adjust the height of the navbar */
.bd-header .bd-header__inner{
height: 52px; /* Adjust this value as needed */
}
.navbar-nav > li > a {
line-height: 52px; /* Vertically center the navbar links */
}
/* Make sure the navbar items align properly */
.navbar-nav {
display: flex;
}
.bd-header .navbar-header-items__start{
margin-left: 0rem
}
.bd-header button.primary-toggle {
margin-right: 0rem;
}
.bd-header ul.navbar-nav .dropdown .dropdown-menu {
overflow-y: auto; /* Enable vertical scrolling */
max-height: 80vh
}
.bd-sidebar-primary {
width: 22%; /* Adjust this value to your preference */
line-height: 1.4;
}
.bd-sidebar-secondary {
line-height: 1.4;
}
.toc-entry a.nav-link, .toc-entry a>code {
background-color: transparent;
border-color: transparent;
}
.bd-sidebar-primary code{
background-color: transparent;
border-color: transparent;
}
.toctree-wrapper li[class^=toctree-l1]>a {
font-size: 1.3em
}
.toctree-wrapper li[class^=toctree-l1] {
margin-bottom: 2em;
}
.toctree-wrapper li[class^=toctree-l]>ul {
margin-top: 0.5em;
font-size: 0.9em;
}
*, :after, :before {
font-style: normal;
}
div.deprecated {
margin-top: 0.5em;
margin-bottom: 2em;
}
.admonition-beta.admonition, div.admonition-beta.admonition {
border-color: var(--pst-color-warning);
margin-top:0.5em;
margin-bottom: 2em;
}
.admonition-beta>.admonition-title, div.admonition-beta>.admonition-title {
background-color: var(--pst-color-warning-bg);
}
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd {
margin-left: 1rem;
}
p {
font-size: 0.9rem;
margin-bottom: 0.5rem;
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 777 B

View File

@@ -0,0 +1,11 @@
<svg width="72" height="19" viewBox="0 0 72 19" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_4019_2020)">
<path d="M29.4038 5.84477C30.1256 6.56657 30.1256 7.74117 29.4038 8.46296L27.7869 10.0538L27.7704 9.96259C27.6524 9.30879 27.3415 8.71552 26.8723 8.24627C26.5189 7.8936 26.1012 7.63282 25.6305 7.47143C25.3383 7.76508 25.1777 8.14989 25.1777 8.55487C25.1777 8.63706 25.1851 8.72224 25.2001 8.80742C25.4593 8.90082 25.6887 9.04503 25.8815 9.23781C26.6033 9.9596 26.6033 11.1342 25.8815 11.856L24.4738 13.2637C24.1129 13.6246 23.6392 13.8047 23.1647 13.8047C22.6902 13.8047 22.2165 13.6246 21.8556 13.2637C21.1338 12.5419 21.1338 11.3673 21.8556 10.6455L23.4725 9.05549L23.489 9.14665C23.6063 9.79896 23.9171 10.3922 24.3879 10.8622C24.742 11.2164 25.1343 11.4518 25.6043 11.6124L25.691 11.5257C25.954 11.2627 26.0982 10.913 26.0982 10.5402C26.0982 10.4572 26.0907 10.3743 26.0765 10.2929C25.8053 10.2032 25.5819 10.0754 25.3786 9.87218C25.0857 9.57928 24.9034 9.20493 24.8526 8.79024C24.8489 8.76035 24.8466 8.73121 24.8437 8.70132C24.8033 8.16109 24.9983 7.63357 25.3786 7.25399L26.7864 5.84627C27.1353 5.49733 27.6001 5.30455 28.0955 5.30455C28.5909 5.30455 29.0556 5.49658 29.4046 5.84627L29.4038 5.84477ZM36.7548 9.56583C36.7548 14.7163 32.5645 18.9058 27.4148 18.9058H9.34C4.1903 18.9058 0 14.7163 0 9.56583C0 4.41538 4.1903 0.22583 9.34 0.22583H27.4148C32.5652 0.22583 36.7548 4.41613 36.7548 9.56583ZM18 14.25C18.1472 14.0714 17.4673 13.5686 17.3283 13.384C17.0459 13.0777 17.0444 12.6368 16.8538 12.2789C16.3876 11.1985 15.8518 10.1262 15.1024 9.21166C14.3104 8.21116 13.333 7.38326 12.4745 6.44403C11.8371 5.78873 11.6668 4.85548 11.1041 4.15087C10.3285 3.00541 7.87624 2.69308 7.51683 4.31077C7.51833 4.36158 7.50264 4.39371 7.45855 4.42584C7.2598 4.57005 7.08271 4.73518 6.93402 4.93468C6.57013 5.44129 6.51409 6.30057 6.96839 6.75561C6.98333 6.51576 6.99155 6.28936 7.18134 6.1175C7.53252 6.41862 8.06304 6.52547 8.47026 6.30057C9.36989 7.585 9.14573 9.36184 9.86005 10.7457C10.0573 11.0729 10.2561 11.4069 10.5094 11.6939C10.7148 12.0137 11.4247 12.391 11.4665 12.6869C11.474 13.195 11.4142 13.7502 11.7475 14.1753C11.9044 14.4936 11.5188 14.8134 11.208 14.7738C10.8045 14.8291 10.3121 14.5026 9.95868 14.7036C9.8339 14.8388 9.58957 14.6894 9.48197 14.8769C9.44461 14.9741 9.24286 15.1108 9.36316 15.2042C9.49691 15.1026 9.62095 14.9965 9.80102 15.057C9.77412 15.2035 9.88994 15.2244 9.98184 15.267C9.97886 15.3663 9.92057 15.468 9.99679 15.5524C10.0857 15.4627 10.1388 15.3357 10.28 15.2983C10.7492 15.9238 11.2267 14.6655 12.2421 15.2318C12.0359 15.2214 11.8528 15.2475 11.7139 15.4172C11.6795 15.4553 11.6503 15.5001 11.7109 15.5494C12.2586 15.196 12.2556 15.6705 12.6112 15.5248C12.8847 15.382 13.1567 15.2035 13.4817 15.2543C13.1657 15.3454 13.153 15.5995 12.9677 15.8139C12.9363 15.8468 12.9213 15.8842 12.9579 15.9387C13.614 15.8834 13.6678 15.6652 14.1975 15.3977C14.5928 15.1564 14.9866 15.7414 15.3288 15.4082C15.4043 15.3357 15.5074 15.3604 15.6008 15.3507C15.4812 14.7133 14.1669 15.4672 14.1878 14.6124C14.6107 14.3247 14.5136 13.7741 14.542 13.3295C15.0284 13.5992 15.5694 13.7561 16.0461 14.0139C16.2867 14.4025 16.6641 14.9158 17.1669 14.8822C17.1804 14.8433 17.1923 14.8089 17.2065 14.7693C17.359 14.7955 17.5547 14.8964 17.6384 14.7036C17.8663 14.9419 18.201 14.93 18.4992 14.8687C18.7196 14.6894 18.0845 14.4338 17.9993 14.2493L18 14.25ZM31.3458 7.15387C31.3458 6.28413 31.0081 5.46744 30.3946 4.85399C29.7812 4.24054 28.9645 3.9028 28.094 3.9028C27.2235 3.9028 26.4068 4.24054 25.7933 4.85399L24.3856 6.26171C24.0569 6.59048 23.8073 6.97678 23.6436 7.40941L23.6339 7.43407L23.6085 7.44154C23.0974 7.5992 22.6469 7.86969 22.2696 8.24702L20.8618 9.65475C19.5938 10.9235 19.5938 12.9873 20.8618 14.2553C21.4753 14.8687 22.292 15.2064 23.1617 15.2064C24.0314 15.2064 24.8489 14.8687 25.4623 14.2553L26.8701 12.8475C27.1973 12.5203 27.4454 12.1355 27.609 11.7036L27.6188 11.6789L27.6442 11.6707C28.1463 11.5168 28.6095 11.2373 28.9854 10.8622L30.3931 9.4545C31.0066 8.84105 31.3443 8.02436 31.3443 7.15387H31.3458ZM12.8802 13.1972C12.7592 13.6695 12.7196 14.4742 12.1054 14.4974C12.0546 14.7701 12.2944 14.8724 12.5119 14.785C12.7278 14.6856 12.8302 14.8635 12.9026 15.0406C13.2359 15.0891 13.7291 14.9292 13.7477 14.5347C13.2501 14.2478 13.0962 13.7023 12.8795 13.1972H12.8802Z" fill="#F4F3FF"/>
<path d="M43.5142 15.2258L47.1462 3.70583H49.9702L53.6022 15.2258H51.6182L48.3222 4.88983H48.7542L45.4982 15.2258H43.5142ZM45.5382 12.7298V10.9298H51.5862V12.7298H45.5382ZM55.0486 15.2258V3.70583H59.8086C59.9206 3.70583 60.0646 3.71116 60.2406 3.72183C60.4166 3.72716 60.5792 3.74316 60.7286 3.76983C61.3952 3.87116 61.9446 4.0925 62.3766 4.43383C62.8139 4.77516 63.1366 5.20716 63.3446 5.72983C63.5579 6.24716 63.6646 6.82316 63.6646 7.45783C63.6646 8.08716 63.5579 8.66316 63.3446 9.18583C63.1312 9.70316 62.8059 10.1325 62.3686 10.4738C61.9366 10.8152 61.3899 11.0365 60.7286 11.1378C60.5792 11.1592 60.4139 11.1752 60.2326 11.1858C60.0566 11.1965 59.9152 11.2018 59.8086 11.2018H56.9766V15.2258H55.0486ZM56.9766 9.40183H59.7286C59.8352 9.40183 59.9552 9.3965 60.0886 9.38583C60.2219 9.37516 60.3446 9.35383 60.4566 9.32183C60.7766 9.24183 61.0272 9.1005 61.2086 8.89783C61.3952 8.69516 61.5259 8.46583 61.6006 8.20983C61.6806 7.95383 61.7206 7.70316 61.7206 7.45783C61.7206 7.2125 61.6806 6.96183 61.6006 6.70583C61.5259 6.4445 61.3952 6.2125 61.2086 6.00983C61.0272 5.80716 60.7766 5.66583 60.4566 5.58583C60.3446 5.55383 60.2219 5.53516 60.0886 5.52983C59.9552 5.51916 59.8352 5.51383 59.7286 5.51383H56.9766V9.40183ZM65.4273 15.2258V3.70583H67.3553V15.2258H65.4273Z" fill="#F4F3FF"/>
</g>
<defs>
<clipPath id="clip0_4019_2020">
<rect width="71.0711" height="18.68" fill="white" transform="translate(0 0.22583)"/>
</clipPath>
</defs>
</svg>

After

Width:  |  Height:  |  Size: 5.7 KiB

View File

@@ -0,0 +1,11 @@
<svg width="72" height="20" viewBox="0 0 72 20" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_4019_689)">
<path d="M29.4038 5.97905C30.1256 6.70085 30.1256 7.87545 29.4038 8.59724L27.7869 10.188L27.7704 10.0969C27.6524 9.44307 27.3415 8.84979 26.8723 8.38055C26.5189 8.02787 26.1012 7.7671 25.6305 7.60571C25.3383 7.89936 25.1777 8.28416 25.1777 8.68915C25.1777 8.77134 25.1851 8.85652 25.2001 8.9417C25.4593 9.0351 25.6887 9.17931 25.8815 9.37209C26.6033 10.0939 26.6033 11.2685 25.8815 11.9903L24.4738 13.398C24.1129 13.7589 23.6392 13.939 23.1647 13.939C22.6902 13.939 22.2165 13.7589 21.8556 13.398C21.1338 12.6762 21.1338 11.5016 21.8556 10.7798L23.4725 9.18977L23.489 9.28093C23.6063 9.93323 23.9171 10.5265 24.3879 10.9965C24.742 11.3507 25.1343 11.586 25.6043 11.7467L25.691 11.66C25.954 11.397 26.0982 11.0473 26.0982 10.6745C26.0982 10.5915 26.0907 10.5086 26.0765 10.4271C25.8053 10.3375 25.5819 10.2097 25.3786 10.0065C25.0857 9.71356 24.9034 9.33921 24.8526 8.92451C24.8489 8.89463 24.8466 8.86549 24.8437 8.8356C24.8033 8.29537 24.9983 7.76785 25.3786 7.38827L26.7864 5.98055C27.1353 5.6316 27.6001 5.43883 28.0955 5.43883C28.5909 5.43883 29.0556 5.63086 29.4046 5.98055L29.4038 5.97905ZM36.7548 9.70011C36.7548 14.8506 32.5645 19.0401 27.4148 19.0401H9.34C4.1903 19.0401 0 14.8506 0 9.70011C0 4.54966 4.1903 0.360107 9.34 0.360107H27.4148C32.5652 0.360107 36.7548 4.55041 36.7548 9.70011ZM18 14.3843C18.1472 14.2057 17.4673 13.7029 17.3283 13.5183C17.0459 13.2119 17.0444 12.7711 16.8538 12.4132C16.3876 11.3327 15.8518 10.2605 15.1024 9.34594C14.3104 8.34543 13.333 7.51754 12.4745 6.57831C11.8371 5.92301 11.6668 4.98976 11.1041 4.28515C10.3285 3.13969 7.87624 2.82736 7.51683 4.44505C7.51833 4.49586 7.50264 4.52799 7.45855 4.56012C7.2598 4.70433 7.08271 4.86946 6.93402 5.06896C6.57013 5.57556 6.51409 6.43484 6.96839 6.88989C6.98333 6.65004 6.99155 6.42364 7.18134 6.25178C7.53252 6.5529 8.06304 6.65975 8.47026 6.43484C9.36989 7.71928 9.14573 9.49612 9.86005 10.8799C10.0573 11.2072 10.2561 11.5412 10.5094 11.8281C10.7148 12.1479 11.4247 12.5253 11.4665 12.8212C11.474 13.3293 11.4142 13.8844 11.7475 14.3096C11.9044 14.6279 11.5188 14.9477 11.208 14.9081C10.8045 14.9634 10.3121 14.6369 9.95868 14.8379C9.8339 14.9731 9.58957 14.8237 9.48197 15.0112C9.44461 15.1083 9.24286 15.2451 9.36316 15.3385C9.49691 15.2369 9.62095 15.1308 9.80102 15.1913C9.77412 15.3377 9.88994 15.3587 9.98184 15.4012C9.97886 15.5006 9.92057 15.6022 9.99679 15.6867C10.0857 15.597 10.1388 15.47 10.28 15.4326C10.7492 16.058 11.2267 14.7997 12.2421 15.3661C12.0359 15.3557 11.8528 15.3818 11.7139 15.5514C11.6795 15.5895 11.6503 15.6344 11.7109 15.6837C12.2586 15.3303 12.2556 15.8047 12.6112 15.659C12.8847 15.5163 13.1567 15.3377 13.4817 15.3885C13.1657 15.4797 13.153 15.7337 12.9677 15.9482C12.9363 15.9811 12.9213 16.0184 12.9579 16.073C13.614 16.0177 13.6678 15.7995 14.1975 15.532C14.5928 15.2907 14.9866 15.8757 15.3288 15.5425C15.4043 15.47 15.5074 15.4946 15.6008 15.4849C15.4812 14.8476 14.1669 15.6015 14.1878 14.7467C14.6107 14.459 14.5136 13.9083 14.542 13.4638C15.0284 13.7335 15.5694 13.8904 16.0461 14.1482C16.2867 14.5367 16.6641 15.0501 17.1669 15.0164C17.1804 14.9776 17.1923 14.9432 17.2065 14.9036C17.359 14.9298 17.5547 15.0306 17.6384 14.8379C17.8663 15.0762 18.201 15.0643 18.4992 15.003C18.7196 14.8237 18.0845 14.5681 17.9993 14.3836L18 14.3843ZM31.3458 7.28815C31.3458 6.41841 31.0081 5.60172 30.3946 4.98826C29.7812 4.37481 28.9645 4.03708 28.094 4.03708C27.2235 4.03708 26.4068 4.37481 25.7933 4.98826L24.3856 6.39599C24.0569 6.72476 23.8073 7.11106 23.6436 7.54369L23.6339 7.56835L23.6085 7.57582C23.0974 7.73348 22.6469 8.00396 22.2696 8.3813L20.8618 9.78902C19.5938 11.0578 19.5938 13.1215 20.8618 14.3895C21.4753 15.003 22.292 15.3407 23.1617 15.3407C24.0314 15.3407 24.8489 15.003 25.4623 14.3895L26.8701 12.9818C27.1973 12.6545 27.4454 12.2697 27.609 11.8378L27.6188 11.8132L27.6442 11.805C28.1463 11.651 28.6095 11.3716 28.9854 10.9965L30.3931 9.58878C31.0066 8.97532 31.3443 8.15863 31.3443 7.28815H31.3458ZM12.8802 13.3315C12.7592 13.8037 12.7196 14.6085 12.1054 14.6316C12.0546 14.9044 12.2944 15.0067 12.5119 14.9193C12.7278 14.8199 12.8302 14.9978 12.9026 15.1748C13.2359 15.2234 13.7291 15.0635 13.7477 14.669C13.2501 14.3821 13.0962 13.8366 12.8795 13.3315H12.8802Z" fill="#246161"/>
<path d="M43.5142 15.3601L47.1462 3.84011H49.9702L53.6022 15.3601H51.6182L48.3222 5.02411H48.7542L45.4982 15.3601H43.5142ZM45.5382 12.8641V11.0641H51.5862V12.8641H45.5382ZM55.0486 15.3601V3.84011H59.8086C59.9206 3.84011 60.0646 3.84544 60.2406 3.85611C60.4166 3.86144 60.5792 3.87744 60.7286 3.90411C61.3952 4.00544 61.9446 4.22677 62.3766 4.56811C62.8139 4.90944 63.1366 5.34144 63.3446 5.86411C63.5579 6.38144 63.6646 6.95744 63.6646 7.59211C63.6646 8.22144 63.5579 8.79744 63.3446 9.32011C63.1312 9.83744 62.8059 10.2668 62.3686 10.6081C61.9366 10.9494 61.3899 11.1708 60.7286 11.2721C60.5792 11.2934 60.4139 11.3094 60.2326 11.3201C60.0566 11.3308 59.9152 11.3361 59.8086 11.3361H56.9766V15.3601H55.0486ZM56.9766 9.53611H59.7286C59.8352 9.53611 59.9552 9.53077 60.0886 9.52011C60.2219 9.50944 60.3446 9.48811 60.4566 9.45611C60.7766 9.37611 61.0272 9.23477 61.2086 9.03211C61.3952 8.82944 61.5259 8.60011 61.6006 8.34411C61.6806 8.08811 61.7206 7.83744 61.7206 7.59211C61.7206 7.34677 61.6806 7.09611 61.6006 6.84011C61.5259 6.57877 61.3952 6.34677 61.2086 6.14411C61.0272 5.94144 60.7766 5.80011 60.4566 5.72011C60.3446 5.68811 60.2219 5.66944 60.0886 5.66411C59.9552 5.65344 59.8352 5.64811 59.7286 5.64811H56.9766V9.53611ZM65.4273 15.3601V3.84011H67.3553V15.3601H65.4273Z" fill="#246161"/>
</g>
<defs>
<clipPath id="clip0_4019_689">
<rect width="71.0711" height="18.68" fill="white" transform="translate(0 0.360107)"/>
</clipPath>
</defs>
</svg>

After

Width:  |  Height:  |  Size: 5.7 KiB

View File

@@ -15,6 +15,8 @@ from pathlib import Path
import toml
from docutils import nodes
from docutils.parsers.rst.directives.admonitions import BaseAdmonition
from docutils.statemachine import StringList
from sphinx.util.docutils import SphinxDirective
# If extensions (or modules to document with autodoc) are in another directory,
@@ -60,26 +62,41 @@ class ExampleLinksDirective(SphinxDirective):
item_node.append(para_node)
list_node.append(item_node)
if list_node.children:
title_node = nodes.title()
title_node = nodes.rubric()
title_node.append(nodes.Text(f"Examples using {class_or_func_name}"))
return [title_node, list_node]
return [list_node]
class Beta(BaseAdmonition):
required_arguments = 0
node_class = nodes.admonition
def run(self):
self.content = self.content or StringList(
[
(
"This feature is in beta. It is actively being worked on, so the "
"API may change."
)
]
)
self.arguments = self.arguments or ["Beta"]
return super().run()
def setup(app):
app.add_directive("example_links", ExampleLinksDirective)
app.add_directive("beta", Beta)
# -- Project information -----------------------------------------------------
project = "🦜🔗 LangChain"
copyright = "2023, LangChain, Inc."
author = "LangChain, Inc."
copyright = "2023, LangChain Inc"
author = "LangChain, Inc"
version = data["tool"]["poetry"]["version"]
release = version
html_title = project + " " + version
html_favicon = "_static/img/brand/favicon.png"
html_last_updated_fmt = "%b %d, %Y"
@@ -95,11 +112,13 @@ extensions = [
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinxcontrib.autodoc_pydantic",
"sphinx_copybutton",
"sphinx_panels",
"IPython.sphinxext.ipython_console_highlighting",
"myst_parser",
"_extensions.gallery_directive",
"sphinx_design",
"sphinx_copybutton",
]
source_suffix = [".rst"]
source_suffix = [".rst", ".md"]
# some autodoc pydantic options are repeated in the actual template.
# potentially user error, but there may be bugs in the sphinx extension
@@ -131,23 +150,84 @@ exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "scikit-learn-modern"
html_theme_path = ["themes"]
# The theme to use for HTML and HTML Help pages.
html_theme = "pydata_sphinx_theme"
# redirects dictionary maps from old links to new links
html_additional_pages = {}
redirects = {
"index": "langchain_api_reference",
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
# # -- General configuration ------------------------------------------------
"sidebar_includehidden": True,
"use_edit_page_button": False,
# # "analytics": {
# # "plausible_analytics_domain": "scikit-learn.org",
# # "plausible_analytics_url": "https://views.scientific-python.org/js/script.js",
# # },
# # If "prev-next" is included in article_footer_items, then setting show_prev_next
# # to True would repeat prev and next links. See
# # https://github.com/pydata/pydata-sphinx-theme/blob/b731dc230bc26a3d1d1bb039c56c977a9b3d25d8/src/pydata_sphinx_theme/theme/pydata_sphinx_theme/layout.html#L118-L129
"show_prev_next": False,
"search_bar_text": "Search",
"navigation_with_keys": True,
"collapse_navigation": True,
"navigation_depth": 3,
"show_nav_level": 1,
"show_toc_level": 3,
"navbar_align": "left",
"header_links_before_dropdown": 5,
"header_dropdown_text": "Integrations",
"logo": {
"image_light": "_static/wordmark-api.svg",
"image_dark": "_static/wordmark-api-dark.svg",
},
"surface_warnings": True,
# # -- Template placement in theme layouts ----------------------------------
"navbar_start": ["navbar-logo"],
# # Note that the alignment of navbar_center is controlled by navbar_align
"navbar_center": ["navbar-nav"],
"navbar_end": ["langchain_docs", "theme-switcher", "navbar-icon-links"],
# # navbar_persistent is persistent right (even when on mobiles)
"navbar_persistent": ["search-field"],
"article_header_start": ["breadcrumbs"],
"article_header_end": [],
"article_footer_items": [],
"content_footer_items": [],
# # Use html_sidebars that map page patterns to list of sidebar templates
# "primary_sidebar_end": [],
"footer_start": ["copyright"],
"footer_center": [],
"footer_end": [],
# # When specified as a dictionary, the keys should follow glob-style patterns, as in
# # https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-exclude_patterns
# # In particular, "**" specifies the default for all pages
# # Use :html_theme.sidebar_secondary.remove: for file-wide removal
# "secondary_sidebar_items": {"**": ["page-toc", "sourcelink"]},
# "show_version_warning_banner": True,
# "announcement": None,
"icon_links": [
{
# Label for this link
"name": "GitHub",
# URL where the link will redirect
"url": "https://github.com/langchain-ai/langchain", # required
# Icon class (if "type": "fontawesome"), or path to local image (if "type": "local")
"icon": "fa-brands fa-square-github",
# The type of image to be used (see below for details)
"type": "fontawesome",
},
{
"name": "X / Twitter",
"url": "https://twitter.com/langchainai",
"icon": "fab fa-twitter-square",
},
],
"icon_links_label": "Quick Links",
"external_links": [
{"name": "Legacy reference", "url": "https://api.python.langchain.com/"},
],
}
for old_link in redirects:
html_additional_pages[old_link] = "redirects.html"
partners_dir = Path(__file__).parent.parent.parent / "libs/partners"
partners = [
(p.name, p.name.replace("-", "_") + "_api_reference")
for p in partners_dir.iterdir()
]
partners = sorted(partners)
html_context = {
"display_github": True, # Integrate GitHub
@@ -155,8 +235,6 @@ html_context = {
"github_repo": "langchain", # Repo name
"github_version": "master", # Version
"conf_py_path": "/docs/api_reference", # Path in the checkout to the docs root
"redirects": redirects,
"partners": partners,
}
# Add any paths that contain custom static files (such as style sheets) here,
@@ -166,9 +244,7 @@ html_static_path = ["_static"]
# These paths are either relative to html_static_path
# or fully qualified paths (e.g. https://...)
html_css_files = [
"css/custom.css",
]
html_css_files = ["css/custom.css"]
html_use_index = False
myst_enable_extensions = ["colon_fence"]
@@ -185,3 +261,5 @@ html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "")
# Tell Jinja2 templates the build is running on Read the Docs
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
master_doc = "index"

View File

@@ -38,6 +38,8 @@ class ClassInfo(TypedDict):
"""The kind of the class."""
is_public: bool
"""Whether the class is public or not."""
is_deprecated: bool
"""Whether the class is deprecated."""
class FunctionInfo(TypedDict):
@@ -49,6 +51,8 @@ class FunctionInfo(TypedDict):
"""The fully qualified name of the function."""
is_public: bool
"""Whether the function is public or not."""
is_deprecated: bool
"""Whether the function is deprecated."""
class ModuleMembers(TypedDict):
@@ -121,6 +125,7 @@ def _load_module_members(module_path: str, namespace: str) -> ModuleMembers:
qualified_name=f"{namespace}.{name}",
kind=kind,
is_public=not name.startswith("_"),
is_deprecated=".. deprecated::" in (type_.__doc__ or ""),
)
)
elif inspect.isfunction(type_):
@@ -129,6 +134,7 @@ def _load_module_members(module_path: str, namespace: str) -> ModuleMembers:
name=name,
qualified_name=f"{namespace}.{name}",
is_public=not name.startswith("_"),
is_deprecated=".. deprecated::" in (type_.__doc__ or ""),
)
)
else:
@@ -233,7 +239,7 @@ def _construct_doc(
package_namespace: str,
members_by_namespace: Dict[str, ModuleMembers],
package_version: str,
) -> str:
) -> List[typing.Tuple[str, str]]:
"""Construct the contents of the reference.rst file for the given package.
Args:
@@ -245,23 +251,62 @@ def _construct_doc(
Returns:
The contents of the reference.rst file.
"""
full_doc = f"""\
=======================
``{package_namespace}`` {package_version}
=======================
docs = []
index_doc = f"""\
:html_theme.sidebar_secondary.remove:
.. currentmodule:: {package_namespace}
.. _{package_namespace}:
======================================
{package_namespace.replace('_', '-')}: {package_version}
======================================
.. automodule:: {package_namespace}
:no-members:
:no-inherited-members:
.. toctree::
:hidden:
:maxdepth: 2
"""
index_autosummary = """
"""
namespaces = sorted(members_by_namespace)
for module in namespaces:
index_doc += f" {module}\n"
module_doc = f"""\
.. currentmodule:: {package_namespace}
.. _{package_namespace}_{module}:
"""
_members = members_by_namespace[module]
classes = [el for el in _members["classes_"] if el["is_public"]]
functions = [el for el in _members["functions"] if el["is_public"]]
classes = [
el
for el in _members["classes_"]
if el["is_public"] and not el["is_deprecated"]
]
functions = [
el
for el in _members["functions"]
if el["is_public"] and not el["is_deprecated"]
]
deprecated_classes = [
el for el in _members["classes_"] if el["is_public"] and el["is_deprecated"]
]
deprecated_functions = [
el
for el in _members["functions"]
if el["is_public"] and el["is_deprecated"]
]
if not (classes or functions):
continue
section = f":mod:`{package_namespace}.{module}`"
section = f":mod:`{module}`"
underline = "=" * (len(section) + 1)
full_doc += f"""\
module_doc += f"""
{section}
{underline}
@@ -269,16 +314,26 @@ def _construct_doc(
:no-members:
:no-inherited-members:
"""
index_autosummary += f"""
:ref:`{package_namespace}_{module}`
{'^' * (len(module) + 5)}
"""
if classes:
full_doc += f"""\
Classes
--------------
module_doc += f"""\
**Classes**
.. currentmodule:: {package_namespace}
.. autosummary::
:toctree: {module}
"""
index_autosummary += """
**Classes**
.. autosummary::
"""
for class_ in sorted(classes, key=lambda c: c["qualified_name"]):
@@ -295,19 +350,22 @@ Classes
else:
template = "class.rst"
full_doc += f"""\
module_doc += f"""\
:template: {template}
{class_["qualified_name"]}
"""
index_autosummary += f"""
{class_['qualified_name']}
"""
if functions:
_functions = [f["qualified_name"] for f in functions]
fstring = "\n ".join(sorted(_functions))
full_doc += f"""\
Functions
--------------
module_doc += f"""\
**Functions**
.. currentmodule:: {package_namespace}
.. autosummary::
@@ -317,7 +375,81 @@ Functions
{fstring}
"""
return full_doc
index_autosummary += f"""
**Functions**
.. autosummary::
{fstring}
"""
if deprecated_classes:
module_doc += f"""\
**Deprecated classes**
.. currentmodule:: {package_namespace}
.. autosummary::
:toctree: {module}
"""
index_autosummary += """
**Deprecated classes**
.. autosummary::
"""
for class_ in sorted(deprecated_classes, key=lambda c: c["qualified_name"]):
if class_["kind"] == "TypedDict":
template = "typeddict.rst"
elif class_["kind"] == "enum":
template = "enum.rst"
elif class_["kind"] == "Pydantic":
template = "pydantic.rst"
elif class_["kind"] == "RunnablePydantic":
template = "runnable_pydantic.rst"
elif class_["kind"] == "RunnableNonPydantic":
template = "runnable_non_pydantic.rst"
else:
template = "class.rst"
module_doc += f"""\
:template: {template}
{class_["qualified_name"]}
"""
index_autosummary += f"""
{class_['qualified_name']}
"""
if deprecated_functions:
_functions = [f["qualified_name"] for f in deprecated_functions]
fstring = "\n ".join(sorted(_functions))
module_doc += f"""\
**Deprecated functions**
.. currentmodule:: {package_namespace}
.. autosummary::
:toctree: {module}
:template: function.rst
{fstring}
"""
index_autosummary += f"""
**Deprecated functions**
.. autosummary::
{fstring}
"""
docs.append((f"{module}.rst", module_doc))
docs.append(("index.rst", index_doc + index_autosummary))
return docs
def _build_rst_file(package_name: str = "langchain") -> None:
@@ -329,13 +461,14 @@ def _build_rst_file(package_name: str = "langchain") -> None:
package_dir = _package_dir(package_name)
package_members = _load_package_modules(package_dir)
package_version = _get_package_version(package_dir)
with open(_out_file_path(package_name), "w") as f:
f.write(
_doc_first_line(package_name)
+ _construct_doc(
_package_namespace(package_name), package_members, package_version
)
)
output_dir = _out_file_path(package_name)
os.mkdir(output_dir)
rsts = _construct_doc(
_package_namespace(package_name), package_members, package_version
)
for name, rst in rsts:
with open(output_dir / name, "w") as f:
f.write(rst)
def _package_namespace(package_name: str) -> str:
@@ -385,12 +518,117 @@ def _get_package_version(package_dir: Path) -> str:
def _out_file_path(package_name: str) -> Path:
"""Return the path to the file containing the documentation."""
return HERE / f"{package_name.replace('-', '_')}_api_reference.rst"
return HERE / f"{package_name.replace('-', '_')}"
def _doc_first_line(package_name: str) -> str:
"""Return the path to the file containing the documentation."""
return f".. {package_name.replace('-', '_')}_api_reference:\n\n"
def _build_index(dirs: List[str]) -> None:
custom_names = {
"airbyte": "Airbyte",
"aws": "AWS",
"ai21": "AI21",
}
ordered = ["core", "langchain", "text-splitters", "community", "experimental"]
main_ = [dir_ for dir_ in ordered if dir_ in dirs]
integrations = sorted(dir_ for dir_ in dirs if dir_ not in main_)
main_headers = [
" ".join(custom_names.get(x, x.title()) for x in dir_.split("-"))
for dir_ in main_
]
integration_headers = [
" ".join(
custom_names.get(x, x.title().replace("ai", "AI").replace("db", "DB"))
for x in dir_.split("-")
)
for dir_ in integrations
]
main_tree = "\n".join(
f"{header_name}<{dir_.replace('-', '_')}/index>"
for header_name, dir_ in zip(main_headers, main_)
)
main_grid = "\n".join(
f'- header: "**{header_name}**"\n content: "{_package_namespace(dir_).replace("_", "-")}: {_get_package_version(_package_dir(dir_))}"\n link: {dir_.replace("-", "_")}/index.html'
for header_name, dir_ in zip(main_headers, main_)
)
integration_tree = "\n".join(
f"{header_name}<{dir_.replace('-', '_')}/index>"
for header_name, dir_ in zip(integration_headers, integrations)
)
integration_grid = ""
integrations_to_show = [
"openai",
"anthropic",
"google-vertexai",
"aws",
"huggingface",
"mistralai",
]
for header_name, dir_ in sorted(
zip(integration_headers, integrations),
key=lambda h_d: integrations_to_show.index(h_d[1])
if h_d[1] in integrations_to_show
else len(integrations_to_show),
)[: len(integrations_to_show)]:
integration_grid += f'\n- header: "**{header_name}**"\n content: {_package_namespace(dir_).replace("_", "-")} {_get_package_version(_package_dir(dir_))}\n link: {dir_.replace("-", "_")}/index.html'
doc = f"""# LangChain Python API Reference
Welcome to the LangChain Python API reference. This is a reference for all
`langchain-x` packages.
For user guides see [https://python.langchain.com](https://python.langchain.com).
For the legacy API reference hosted on ReadTheDocs see [https://api.python.langchain.com/](https://api.python.langchain.com/).
## Base packages
```{{gallery-grid}}
:grid-columns: "1 2 2 3"
{main_grid}
```
```{{toctree}}
:maxdepth: 2
:hidden:
:caption: Base packages
{main_tree}
```
## Integrations
```{{gallery-grid}}
:grid-columns: "1 2 2 3"
{integration_grid}
```
See the full list of integrations in the Section Navigation.
```{{toctree}}
:maxdepth: 2
:hidden:
:caption: Integrations
{integration_tree}
```
"""
with open(HERE / "reference.md", "w") as f:
f.write(doc)
dummy_index = """\
# API reference
```{toctree}
:maxdepth: 3
:hidden:
Reference<reference>
```
"""
with open(HERE / "index.md", "w") as f:
f.write(dummy_index)
def main(dirs: Optional[list] = None) -> None:
@@ -418,6 +656,8 @@ def main(dirs: Optional[list] = None) -> None:
else:
print("Building package:", dir_)
_build_rst_file(package_name=dir_)
_build_index(dirs)
print("API reference files built.")

View File

@@ -1,8 +0,0 @@
=============
LangChain API
=============
.. toctree::
:maxdepth: 2
api_reference.rst

View File

@@ -1,17 +1,11 @@
-e libs/experimental
-e libs/langchain
-e libs/core
-e libs/community
pydantic<2
autodoc_pydantic==1.8.0
myst_parser
nbsphinx==0.8.9
sphinx>=5
sphinx-autobuild==2021.3.14
sphinx_rtd_theme==1.0.0
sphinx-typlog-theme==0.8.0
sphinx-panels
toml
myst_nb
sphinx_copybutton
pydata-sphinx-theme==0.13.1
autodoc_pydantic>=1,<2
sphinx<=7
myst-parser>=3
sphinx-autobuild>=2024
pydata-sphinx-theme>=0.15
toml>=0.10.2
myst-nb>=1.1.1
pyyaml
sphinx-design
sphinx-copybutton
beautifulsoup4

View File

@@ -0,0 +1,41 @@
import sys
from glob import glob
from pathlib import Path
from bs4 import BeautifulSoup
CUR_DIR = Path(__file__).parents[1]
def process_toc_h3_elements(html_content: str) -> str:
"""Update Class.method() TOC headers to just method()."""
# Create a BeautifulSoup object
soup = BeautifulSoup(html_content, "html.parser")
# Find all <li> elements with class "toc-h3"
toc_h3_elements = soup.find_all("li", class_="toc-h3")
# Process each element
for element in toc_h3_elements:
element = element.a.code.span
# Get the text content of the element
content = element.get_text()
# Apply the regex substitution
modified_content = content.split(".")[-1]
# Update the element's content
element.string = modified_content
# Return the modified HTML
return str(soup)
if __name__ == "__main__":
dir = sys.argv[1]
for fn in glob(str(f"{dir.rstrip('/')}/**/*.html"), recursive=True):
with open(fn, "r") as f:
html = f.read()
processed_html = process_toc_h3_elements(html)
with open(fn, "w") as f:
f.write(processed_html)

View File

@@ -1,4 +1,4 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. currentmodule:: {{ module }}
@@ -11,7 +11,7 @@
.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
~{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
@@ -22,11 +22,11 @@
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
~{{ item }}
{%- endfor %}
{% for item in methods %}
.. automethod:: {{ name }}.{{ item }}
.. automethod:: {{ item }}
{%- endfor %}
{% endif %}

View File

@@ -1,4 +1,4 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. currentmodule:: {{ module }}

View File

@@ -1,4 +1,4 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. currentmodule:: {{ module }}

View File

@@ -0,0 +1,12 @@
<!-- This will display a link to LangChain docs -->
<head>
<style>
.text-link {
text-decoration: none; /* Remove underline */
color: inherit; /* Inherit color from parent element */
}
</style>
</head>
<body>
<a href="https://python.langchain.com/" class='text-link'>Docs</a>
</body>

View File

@@ -1,4 +1,4 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. currentmodule:: {{ module }}

View File

@@ -1,21 +1,21 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. NOTE:: {{objname}} implements the standard :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>`. 🏃
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
.. currentmodule:: {{ module }}
.. autoclass:: {{ objname }}
.. NOTE:: {{objname}} implements the standard :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>`. 🏃
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Attributes') }}
.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
~{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
@@ -26,11 +26,11 @@
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
~{{ item }}
{%- endfor %}
{% for item in methods %}
.. automethod:: {{ name }}.{{ item }}
.. automethod:: {{ item }}
{%- endfor %}
{% endif %}

View File

@@ -1,10 +1,6 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. NOTE:: {{objname}} implements the standard :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>`. 🏃
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
.. currentmodule:: {{ module }}
.. autopydantic_model:: {{ objname }}
@@ -19,6 +15,10 @@
:member-order: groupwise
:show-inheritance: True
:special-members: __call__
:exclude-members: construct, copy, dict, from_orm, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate, json, is_lc_serializable, to_json_not_implemented, lc_secrets, lc_attributes, lc_id, get_lc_namespace, astream_log, transform, atransform, get_output_schema, get_prompts, config_schema, map, pick, pipe, with_listeners, with_alisteners, with_config, with_fallbacks, with_types, with_retry, InputType, OutputType, config_specs, output_schema, get_input_schema, get_graph, get_name, input_schema, name, bind, assign
:exclude-members: construct, copy, dict, from_orm, parse_file, parse_obj, parse_raw, schema, schema_json, update_forward_refs, validate, json, is_lc_serializable, to_json_not_implemented, lc_secrets, lc_attributes, lc_id, get_lc_namespace, astream_log, transform, atransform, get_output_schema, get_prompts, config_schema, map, pick, pipe, with_listeners, with_alisteners, with_config, with_fallbacks, with_types, with_retry, InputType, OutputType, config_specs, output_schema, get_input_schema, get_graph, get_name, input_schema, name, bind, assign, as_tool
.. NOTE:: {{objname}} implements the standard :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>`. 🏃
The :py:class:`Runnable Interface <langchain_core.runnables.base.Runnable>` has additional methods that are available on runnables, such as :py:meth:`with_types <langchain_core.runnables.base.Runnable.with_types>`, :py:meth:`with_retry <langchain_core.runnables.base.Runnable.with_retry>`, :py:meth:`assign <langchain_core.runnables.base.Runnable.assign>`, :py:meth:`bind <langchain_core.runnables.base.Runnable.bind>`, :py:meth:`get_graph <langchain_core.runnables.base.Runnable.get_graph>`, and more.
.. example_links:: {{ objname }}

View File

@@ -1,4 +1,4 @@
:mod:`{{module}}`.{{objname}}
{{ objname }}
{{ underline }}==============
.. currentmodule:: {{ module }}

View File

@@ -1,27 +0,0 @@
Copyright (c) 2007-2023 The scikit-learn developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -1,67 +0,0 @@
<script>
$(document).ready(function() {
/* Add a [>>>] button on the top-right corner of code samples to hide
* the >>> and ... prompts and the output and thus make the code
* copyable. */
var div = $('.highlight-python .highlight,' +
'.highlight-python3 .highlight,' +
'.highlight-pycon .highlight,' +
'.highlight-default .highlight')
var pre = div.find('pre');
// get the styles from the current theme
pre.parent().parent().css('position', 'relative');
var hide_text = 'Hide prompts and outputs';
var show_text = 'Show prompts and outputs';
// create and add the button to all the code blocks that contain >>>
div.each(function(index) {
var jthis = $(this);
if (jthis.find('.gp').length > 0) {
var button = $('<span class="copybutton">&gt;&gt;&gt;</span>');
button.attr('title', hide_text);
button.data('hidden', 'false');
jthis.prepend(button);
}
// tracebacks (.gt) contain bare text elements that need to be
// wrapped in a span to work with .nextUntil() (see later)
jthis.find('pre:has(.gt)').contents().filter(function() {
return ((this.nodeType == 3) && (this.data.trim().length > 0));
}).wrap('<span>');
});
// define the behavior of the button when it's clicked
$('.copybutton').click(function(e){
e.preventDefault();
var button = $(this);
if (button.data('hidden') === 'false') {
// hide the code output
button.parent().find('.go, .gp, .gt').hide();
button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'hidden');
button.css('text-decoration', 'line-through');
button.attr('title', show_text);
button.data('hidden', 'true');
} else {
// show the code output
button.parent().find('.go, .gp, .gt').show();
button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'visible');
button.css('text-decoration', 'none');
button.attr('title', hide_text);
button.data('hidden', 'false');
}
});
/*** Add permalink buttons next to glossary terms ***/
$('dl.glossary > dt[id]').append(function() {
return ('<a class="headerlink" href="#' +
this.getAttribute('id') +
'" title="Permalink to this term">¶</a>');
});
});
</script>
{%- if pagename != 'index' and pagename != 'documentation' %}
{% if theme_mathjax_path %}
<script id="MathJax-script" async src="{{ theme_mathjax_path }}"></script>
{% endif %}
{%- endif %}

View File

@@ -1,132 +0,0 @@
{# TEMPLATE VAR SETTINGS #}
{%- set url_root = pathto('', 1) %}
{%- if url_root == '#' %}{% set url_root = '' %}{% endif %}
{%- if not embedded and docstitle %}
{%- set titlesuffix = " &mdash; "|safe + docstitle|e %}
{%- else %}
{%- set titlesuffix = "" %}
{%- endif %}
{%- set lang_attr = 'en' %}
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="{{ lang_attr }}" > <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="{{ lang_attr }}"> <!--<![endif]-->
<head>
<meta charset="utf-8">
{{ metatags }}
<meta name="viewport" content="width=device-width, initial-scale=1.0">
{% block htmltitle %}
<title>{{ title|striptags|e }}{{ titlesuffix }}</title>
{% endblock %}
<link rel="canonical"
href="https://api.python.langchain.com/en/latest/{{ pagename }}.html"/>
{% if favicon_url %}
<link rel="shortcut icon" href="{{ favicon_url|e }}"/>
{% endif %}
<link rel="stylesheet"
href="{{ pathto('_static/css/vendor/bootstrap.min.css', 1) }}"
type="text/css"/>
{%- for css in css_files %}
{%- if css|attr("rel") %}
<link rel="{{ css.rel }}" href="{{ pathto(css.filename, 1) }}"
type="text/css"{% if css.title is not none %}
title="{{ css.title }}"{% endif %} />
{%- else %}
<link rel="stylesheet" href="{{ pathto(css, 1) }}" type="text/css"/>
{%- endif %}
{%- endfor %}
<link rel="stylesheet" href="{{ pathto('_static/' + style, 1) }}" type="text/css"/>
<script id="documentation_options" data-url_root="{{ pathto('', 1) }}"
src="{{ pathto('_static/documentation_options.js', 1) }}"></script>
<script src="{{ pathto('_static/jquery.js', 1) }}"></script>
{%- block extrahead %} {% endblock %}
</head>
<body>
{% include "nav.html" %}
{%- block content %}
<div class="d-flex" id="sk-doc-wrapper">
<input type="checkbox" name="sk-toggle-checkbox" id="sk-toggle-checkbox">
<label id="sk-sidemenu-toggle" class="sk-btn-toggle-toc btn sk-btn-primary"
for="sk-toggle-checkbox">Toggle Menu</label>
<div id="sk-sidebar-wrapper" class="border-right">
<div class="sk-sidebar-toc-wrapper">
{%- if meta and meta['parenttoc']|tobool %}
<div class="sk-sidebar-toc">
{% set nav = get_nav_object(maxdepth=3, collapse=True, numbered=True) %}
<ul>
{% for main_nav_item in nav %}
{% if main_nav_item.active %}
<li>
<a href="{{ main_nav_item.url }}"
class="sk-toc-active">{{ main_nav_item.title }}</a>
</li>
<ul>
{% for nav_item in main_nav_item.children %}
<li>
<a href="{{ nav_item.url }}"
class="{% if nav_item.active %}sk-toc-active{% endif %}">{{ nav_item.title }}</a>
{% if nav_item.children %}
<ul>
{% for inner_child in nav_item.children %}
<li class="sk-toctree-l3">
<a href="{{ inner_child.url }}">{{ inner_child.title }}</a>
</li>
{% endfor %}
</ul>
{% endif %}
</li>
{% endfor %}
</ul>
{% endif %}
{% endfor %}
</ul>
</div>
{%- elif meta and meta['globalsidebartoc']|tobool %}
<div class="sk-sidebar-toc sk-sidebar-global-toc">
{{ toctree(maxdepth=2, titles_only=True) }}
</div>
{%- else %}
<div class="sk-sidebar-toc">
{{ toc }}
</div>
{%- endif %}
</div>
</div>
<div id="sk-page-content-wrapper">
<div class="sk-page-content container-fluid body px-md-3" role="main">
{% block body %}{% endblock %}
</div>
<div class="container">
<footer class="sk-content-footer">
{%- if pagename != 'index' %}
{%- if show_copyright %}
{%- if hasdoc('copyright') %}
{% trans path=pathto('copyright'), copyright=copyright|e %}
&copy; {{ copyright }}.{% endtrans %}
{%- else %}
{% trans copyright=copyright|e %}&copy; {{ copyright }}
.{% endtrans %}
{%- endif %}
{%- endif %}
{%- if last_updated %}
{% trans last_updated=last_updated|e %}Last updated
on {{ last_updated }}.{% endtrans %}
{%- endif %}
{%- if show_source and has_source and sourcename %}
<a href="{{ pathto('_sources/' + sourcename, true)|e }}"
rel="nofollow">{{ _('Show this page source') }}</a>
{%- endif %}
{%- endif %}
</footer>
</div>
</div>
</div>
{%- endblock %}
<script src="{{ pathto('_static/js/vendor/bootstrap.min.js', 1) }}"></script>
{% include "javascript.html" %}
</body>
</html>

View File

@@ -1,78 +0,0 @@
{%- if pagename != 'index' and pagename != 'documentation' %}
{%- set nav_bar_class = "sk-docs-navbar" %}
{%- set top_container_cls = "sk-docs-container" %}
{%- else %}
{%- set nav_bar_class = "sk-landing-navbar" %}
{%- set top_container_cls = "sk-landing-container" %}
{%- endif %}
<nav id="navbar" class="{{ nav_bar_class }} navbar navbar-expand-md navbar-light bg-light py-0">
<div class="container-fluid {{ top_container_cls }} px-0">
{%- if logo_url %}
<a class="navbar-brand py-0" href="{{ pathto('index') }}">
<img
class="sk-brand-img"
src="{{ logo_url|e }}"
alt="logo"/>
</a>
{%- endif %}
<button
id="sk-navbar-toggler"
class="navbar-toggler"
type="button"
data-toggle="collapse"
data-target="#navbarSupportedContent"
aria-controls="navbarSupportedContent"
aria-expanded="false"
aria-label="Toggle navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="sk-navbar-collapse collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('langchain_api_reference') }}">LangChain</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('core_api_reference') }}">Core</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('community_api_reference') }}">Community</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('experimental_api_reference') }}">Experimental</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('text_splitters_api_reference') }}">Text splitters</a>
</li>
{%- for title, pathname in partners %}
<li class="nav-item">
<a class="sk-nav-link nav-link nav-more-item-mobile-items" href="{{ pathto(pathname) }}">{{ title }}</a>
</li>
{%- endfor %}
<li class="nav-item dropdown nav-more-item-dropdown">
<a class="sk-nav-link nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Partner libs</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
{%- for title, pathname in partners %}
<a class="sk-nav-dropdown-item dropdown-item" href="{{ pathto(pathname) }}">{{ title }}</a>
{%- endfor %}
</div>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" target="_blank" rel="noopener noreferrer" href="https://python.langchain.com/">Docs</a>
</li>
</ul>
{%- if pagename != "search"%}
<div id="searchbox" role="search">
<div class="searchformwrapper">
<form class="search" action="{{ pathto('search') }}" method="get">
<input class="sk-search-text-input" type="text" name="q" aria-labelledby="searchlabel" />
<input class="sk-search-text-btn" type="submit" value="{{ _('Go') }}" />
</form>
</div>
</div>
{%- endif %}
</div>
</div>
</nav>

View File

@@ -1,16 +0,0 @@
{%- extends "basic/search.html" %}
{% block extrahead %}
<script type="text/javascript" src="{{ pathto('_static/underscore.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('searchindex.js', 1) }}" defer></script>
<script type="text/javascript" src="{{ pathto('_static/doctools.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/language_data.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/searchtools.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/sphinx_highlight.js', 1) }}"></script>
<script type="text/javascript">
$(document).ready(function() {
if (!Search.out) {
Search.init();
}
});
</script>
{% endblock %}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,8 +0,0 @@
[theme]
inherit = basic
pygments_style = default
stylesheet = css/theme.css
[options]
link_to_live_contributing_page = false
mathjax_path =

View File

@@ -4,8 +4,11 @@ LangChain implements the latest research in the field of Natural Language Proces
This page contains `arXiv` papers referenced in the LangChain Documentation, API Reference,
Templates, and Cookbooks.
From the opposite direction, scientists use LangChain in research and reference LangChain in the research papers.
Here you find [such papers](https://arxiv.org/search/?query=langchain&searchtype=all&source=header).
From the opposite direction, scientists use `LangChain` in research and reference it in the research papers.
Here you find papers that reference:
- [LangChain](https://arxiv.org/search/?query=langchain&searchtype=all&source=header)
- [LangGraph](https://arxiv.org/search/?query=langgraph&searchtype=all&source=header)
- [LangSmith](https://arxiv.org/search/?query=langsmith&searchtype=all&source=header)
## Summary
@@ -23,32 +26,30 @@ Here you find [such papers](https://arxiv.org/search/?query=langchain&searchtype
| `2305.14283v3` [Query Rewriting for Retrieval-Augmented Large Language Models](http://arxiv.org/abs/2305.14283v3) | Xinbei Ma, Yeyun Gong, Pengcheng He, et al. | 2023-05-23 | `Template:` [rewrite-retrieve-read](https://python.langchain.com/docs/templates/rewrite-retrieve-read), `Cookbook:` [rewrite](https://github.com/langchain-ai/langchain/blob/master/cookbook/rewrite.ipynb)
| `2305.08291v1` [Large Language Model Guided Tree-of-Thought](http://arxiv.org/abs/2305.08291v1) | Jieyi Long | 2023-05-15 | `API:` [langchain_experimental.tot](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.tot), `Cookbook:` [tree_of_thought](https://github.com/langchain-ai/langchain/blob/master/cookbook/tree_of_thought.ipynb)
| `2305.04091v3` [Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models](http://arxiv.org/abs/2305.04091v3) | Lei Wang, Wanyu Xu, Yihuai Lan, et al. | 2023-05-06 | `Cookbook:` [plan_and_execute_agent](https://github.com/langchain-ai/langchain/blob/master/cookbook/plan_and_execute_agent.ipynb)
| `2305.02156v1` [Zero-Shot Listwise Document Reranking with a Large Language Model](http://arxiv.org/abs/2305.02156v1) | Xueguang Ma, Xinyu Zhang, Ronak Pradeep, et al. | 2023-05-03 | `API:` [langchain...LLMListwiseRerank](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.listwise_rerank.LLMListwiseRerank.html#langchain.retrievers.document_compressors.listwise_rerank.LLMListwiseRerank)
| `2304.08485v2` [Visual Instruction Tuning](http://arxiv.org/abs/2304.08485v2) | Haotian Liu, Chunyuan Li, Qingyang Wu, et al. | 2023-04-17 | `Cookbook:` [Semi_structured_and_multi_modal_RAG](https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb), [Semi_structured_multi_modal_RAG_LLaMA2](https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb)
| `2304.03442v2` [Generative Agents: Interactive Simulacra of Human Behavior](http://arxiv.org/abs/2304.03442v2) | Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, et al. | 2023-04-07 | `Cookbook:` [multiagent_bidding](https://github.com/langchain-ai/langchain/blob/master/cookbook/multiagent_bidding.ipynb), [generative_agents_interactive_simulacra_of_human_behavior](https://github.com/langchain-ai/langchain/blob/master/cookbook/generative_agents_interactive_simulacra_of_human_behavior.ipynb)
| `2303.17760v2` [CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society](http://arxiv.org/abs/2303.17760v2) | Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, et al. | 2023-03-31 | `Cookbook:` [camel_role_playing](https://github.com/langchain-ai/langchain/blob/master/cookbook/camel_role_playing.ipynb)
| `2303.17580v4` [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](http://arxiv.org/abs/2303.17580v4) | Yongliang Shen, Kaitao Song, Xu Tan, et al. | 2023-03-30 | `API:` [langchain_experimental.autonomous_agents](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.autonomous_agents), `Cookbook:` [hugginggpt](https://github.com/langchain-ai/langchain/blob/master/cookbook/hugginggpt.ipynb)
| `2303.08774v6` [GPT-4 Technical Report](http://arxiv.org/abs/2303.08774v6) | OpenAI, Josh Achiam, Steven Adler, et al. | 2023-03-15 | `Docs:` [docs/integrations/vectorstores/mongodb_atlas](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas)
| `2301.10226v4` [A Watermark for Large Language Models](http://arxiv.org/abs/2301.10226v4) | John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al. | 2023-01-24 | `API:` [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
| `2301.10226v4` [A Watermark for Large Language Models](http://arxiv.org/abs/2301.10226v4) | John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al. | 2023-01-24 | `API:` [langchain_community...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
| `2212.10496v1` [Precise Zero-Shot Dense Retrieval without Relevance Labels](http://arxiv.org/abs/2212.10496v1) | Luyu Gao, Xueguang Ma, Jimmy Lin, et al. | 2022-12-20 | `API:` [langchain...HypotheticalDocumentEmbedder](https://api.python.langchain.com/en/latest/chains/langchain.chains.hyde.base.HypotheticalDocumentEmbedder.html#langchain.chains.hyde.base.HypotheticalDocumentEmbedder), `Template:` [hyde](https://python.langchain.com/docs/templates/hyde), `Cookbook:` [hypothetical_document_embeddings](https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb)
| `2212.07425v3` [Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments](http://arxiv.org/abs/2212.07425v3) | Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, et al. | 2022-12-12 | `API:` [langchain_experimental.fallacy_removal](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.fallacy_removal)
| `2211.13892v2` [Complementary Explanations for Effective In-Context Learning](http://arxiv.org/abs/2211.13892v2) | Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, et al. | 2022-11-25 | `API:` [langchain_core...MaxMarginalRelevanceExampleSelector](https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector.html#langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector)
| `2211.10435v2` [PAL: Program-aided Language Models](http://arxiv.org/abs/2211.10435v2) | Luyu Gao, Aman Madaan, Shuyan Zhou, et al. | 2022-11-18 | `API:` [langchain_experimental...PALChain](https://api.python.langchain.com/en/latest/pal_chain/langchain_experimental.pal_chain.base.PALChain.html#langchain_experimental.pal_chain.base.PALChain), [langchain_experimental.pal_chain](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.pal_chain), `Cookbook:` [program_aided_language_model](https://github.com/langchain-ai/langchain/blob/master/cookbook/program_aided_language_model.ipynb)
| `2210.03629v3` [ReAct: Synergizing Reasoning and Acting in Language Models](http://arxiv.org/abs/2210.03629v3) | Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. | 2022-10-06 | `Docs:` [docs/integrations/providers/cohere](https://python.langchain.com/docs/integrations/providers/cohere), [docs/integrations/chat/huggingface](https://python.langchain.com/docs/integrations/chat/huggingface), [docs/integrations/tools/ionic_shopping](https://python.langchain.com/docs/integrations/tools/ionic_shopping), `API:` [langchain...create_react_agent](https://api.python.langchain.com/en/latest/agents/langchain.agents.react.agent.create_react_agent.html#langchain.agents.react.agent.create_react_agent), [langchain...TrajectoryEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain.html#langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain)
| `2211.10435v2` [PAL: Program-aided Language Models](http://arxiv.org/abs/2211.10435v2) | Luyu Gao, Aman Madaan, Shuyan Zhou, et al. | 2022-11-18 | `API:` [langchain_experimental.pal_chain](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.pal_chain), [langchain_experimental...PALChain](https://api.python.langchain.com/en/latest/pal_chain/langchain_experimental.pal_chain.base.PALChain.html#langchain_experimental.pal_chain.base.PALChain), `Cookbook:` [program_aided_language_model](https://github.com/langchain-ai/langchain/blob/master/cookbook/program_aided_language_model.ipynb)
| `2210.03629v3` [ReAct: Synergizing Reasoning and Acting in Language Models](http://arxiv.org/abs/2210.03629v3) | Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. | 2022-10-06 | `Docs:` [docs/integrations/providers/cohere](https://python.langchain.com/docs/integrations/providers/cohere), [docs/integrations/tools/ionic_shopping](https://python.langchain.com/docs/integrations/tools/ionic_shopping), `API:` [langchain...TrajectoryEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain.html#langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain), [langchain...create_react_agent](https://api.python.langchain.com/en/latest/agents/langchain.agents.react.agent.create_react_agent.html#langchain.agents.react.agent.create_react_agent)
| `2209.10785v2` [Deep Lake: a Lakehouse for Deep Learning](http://arxiv.org/abs/2209.10785v2) | Sasun Hambardzumyan, Abhinav Tuli, Levon Ghukasyan, et al. | 2022-09-22 | `Docs:` [docs/integrations/providers/activeloop_deeplake](https://python.langchain.com/docs/integrations/providers/activeloop_deeplake)
| `2205.13147v4` [Matryoshka Representation Learning](http://arxiv.org/abs/2205.13147v4) | Aditya Kusupati, Gantavya Bhatt, Aniket Rege, et al. | 2022-05-26 | `Docs:` [docs/integrations/providers/snowflake](https://python.langchain.com/docs/integrations/providers/snowflake)
| `2205.12654v1` [Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages](http://arxiv.org/abs/2205.12654v1) | Kevin Heffernan, Onur Çelebi, Holger Schwenk | 2022-05-25 | `API:` [langchain_community...LaserEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.laser.LaserEmbeddings.html#langchain_community.embeddings.laser.LaserEmbeddings)
| `2204.00498v1` [Evaluating the Text-to-SQL Capabilities of Large Language Models](http://arxiv.org/abs/2204.00498v1) | Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau | 2022-03-15 | `API:` [langchain_community...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL), [langchain_community...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase)
| `2202.00666v5` [Locally Typical Sampling](http://arxiv.org/abs/2202.00666v5) | Clara Meister, Tiago Pimentel, Gian Wiher, et al. | 2022-02-01 | `API:` [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
| `2204.00498v1` [Evaluating the Text-to-SQL Capabilities of Large Language Models](http://arxiv.org/abs/2204.00498v1) | Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau | 2022-03-15 | `API:` [langchain_community...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), [langchain_community...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL)
| `2202.00666v5` [Locally Typical Sampling](http://arxiv.org/abs/2202.00666v5) | Clara Meister, Tiago Pimentel, Gian Wiher, et al. | 2022-02-01 | `API:` [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
| `2103.00020v1` [Learning Transferable Visual Models From Natural Language Supervision](http://arxiv.org/abs/2103.00020v1) | Alec Radford, Jong Wook Kim, Chris Hallacy, et al. | 2021-02-26 | `API:` [langchain_experimental.open_clip](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.open_clip)
| `1909.05858v2` [CTRL: A Conditional Transformer Language Model for Controllable Generation](http://arxiv.org/abs/1909.05858v2) | Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al. | 2019-09-11 | `API:` [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
| `1908.10084v1` [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](http://arxiv.org/abs/1908.10084v1) | Nils Reimers, Iryna Gurevych | 2019-08-27 | `Docs:` [docs/integrations/text_embedding/sentence_transformers](https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers)
| `1909.05858v2` [CTRL: A Conditional Transformer Language Model for Controllable Generation](http://arxiv.org/abs/1909.05858v2) | Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al. | 2019-09-11 | `API:` [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
## Self-Discover: Large Language Models Self-Compose Reasoning Structures
- **arXiv id:** 2402.03620v1
- **arXiv id:** [2402.03620v1](http://arxiv.org/abs/2402.03620v1) **Published Date:** 2024-02-06
- **Title:** Self-Discover: Large Language Models Self-Compose Reasoning Structures
- **Authors:** Pei Zhou, Jay Pujara, Xiang Ren, et al.
- **Published Date:** 2024-02-06
- **URL:** http://arxiv.org/abs/2402.03620v1
- **LangChain:**
- **Cookbook:** [self-discover](https://github.com/langchain-ai/langchain/blob/master/cookbook/self-discover.ipynb)
@@ -70,11 +71,9 @@ commonalities with human reasoning patterns.
## RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
- **arXiv id:** 2401.18059v1
- **arXiv id:** [2401.18059v1](http://arxiv.org/abs/2401.18059v1) **Published Date:** 2024-01-31
- **Title:** RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
- **Authors:** Parth Sarthi, Salman Abdullah, Aditi Tuli, et al.
- **Published Date:** 2024-01-31
- **URL:** http://arxiv.org/abs/2401.18059v1
- **LangChain:**
- **Cookbook:** [RAPTOR](https://github.com/langchain-ai/langchain/blob/master/cookbook/RAPTOR.ipynb)
@@ -96,11 +95,9 @@ benchmark by 20% in absolute accuracy.
## Corrective Retrieval Augmented Generation
- **arXiv id:** 2401.15884v2
- **arXiv id:** [2401.15884v2](http://arxiv.org/abs/2401.15884v2) **Published Date:** 2024-01-29
- **Title:** Corrective Retrieval Augmented Generation
- **Authors:** Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, et al.
- **Published Date:** 2024-01-29
- **URL:** http://arxiv.org/abs/2401.15884v2
- **LangChain:**
- **Cookbook:** [langgraph_crag](https://github.com/langchain-ai/langchain/blob/master/cookbook/langgraph_crag.ipynb)
@@ -126,11 +123,9 @@ performance of RAG-based approaches.
## Mixtral of Experts
- **arXiv id:** 2401.04088v1
- **arXiv id:** [2401.04088v1](http://arxiv.org/abs/2401.04088v1) **Published Date:** 2024-01-08
- **Title:** Mixtral of Experts
- **Authors:** Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, et al.
- **Published Date:** 2024-01-08
- **URL:** http://arxiv.org/abs/2401.04088v1
- **LangChain:**
- **Cookbook:** [together_ai](https://github.com/langchain-ai/langchain/blob/master/cookbook/together_ai.ipynb)
@@ -152,11 +147,9 @@ the base and instruct models are released under the Apache 2.0 license.
## Dense X Retrieval: What Retrieval Granularity Should We Use?
- **arXiv id:** 2312.06648v2
- **arXiv id:** [2312.06648v2](http://arxiv.org/abs/2312.06648v2) **Published Date:** 2023-12-11
- **Title:** Dense X Retrieval: What Retrieval Granularity Should We Use?
- **Authors:** Tong Chen, Hongwei Wang, Sihao Chen, et al.
- **Published Date:** 2023-12-11
- **URL:** http://arxiv.org/abs/2312.06648v2
- **LangChain:**
- **Template:** [propositional-retrieval](https://python.langchain.com/docs/templates/propositional-retrieval)
@@ -181,11 +174,9 @@ information.
## Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- **arXiv id:** 2311.09210v1
- **arXiv id:** [2311.09210v1](http://arxiv.org/abs/2311.09210v1) **Published Date:** 2023-11-15
- **Title:** Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- **Authors:** Wenhao Yu, Hongming Zhang, Xiaoman Pan, et al.
- **Published Date:** 2023-11-15
- **URL:** http://arxiv.org/abs/2311.09210v1
- **LangChain:**
- **Template:** [chain-of-note-wiki](https://python.langchain.com/docs/templates/chain-of-note-wiki)
@@ -215,11 +206,9 @@ outside the pre-training knowledge scope.
## Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- **arXiv id:** 2310.11511v1
- **arXiv id:** [2310.11511v1](http://arxiv.org/abs/2310.11511v1) **Published Date:** 2023-10-17
- **Title:** Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- **Authors:** Akari Asai, Zeqiu Wu, Yizhong Wang, et al.
- **Published Date:** 2023-10-17
- **URL:** http://arxiv.org/abs/2310.11511v1
- **LangChain:**
- **Cookbook:** [langgraph_self_rag](https://github.com/langchain-ai/langchain/blob/master/cookbook/langgraph_self_rag.ipynb)
@@ -248,11 +237,9 @@ to these models.
## Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
- **arXiv id:** 2310.06117v2
- **arXiv id:** [2310.06117v2](http://arxiv.org/abs/2310.06117v2) **Published Date:** 2023-10-09
- **Title:** Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
- **Authors:** Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, et al.
- **Published Date:** 2023-10-09
- **URL:** http://arxiv.org/abs/2310.06117v2
- **LangChain:**
- **Template:** [stepback-qa-prompting](https://python.langchain.com/docs/templates/stepback-qa-prompting)
@@ -271,11 +258,9 @@ and 11% respectively, TimeQA by 27%, and MuSiQue by 7%.
## Llama 2: Open Foundation and Fine-Tuned Chat Models
- **arXiv id:** 2307.09288v2
- **arXiv id:** [2307.09288v2](http://arxiv.org/abs/2307.09288v2) **Published Date:** 2023-07-18
- **Title:** Llama 2: Open Foundation and Fine-Tuned Chat Models
- **Authors:** Hugo Touvron, Louis Martin, Kevin Stone, et al.
- **Published Date:** 2023-07-18
- **URL:** http://arxiv.org/abs/2307.09288v2
- **LangChain:**
- **Cookbook:** [Semi_Structured_RAG](https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_Structured_RAG.ipynb)
@@ -292,11 +277,9 @@ contribute to the responsible development of LLMs.
## Query Rewriting for Retrieval-Augmented Large Language Models
- **arXiv id:** 2305.14283v3
- **arXiv id:** [2305.14283v3](http://arxiv.org/abs/2305.14283v3) **Published Date:** 2023-05-23
- **Title:** Query Rewriting for Retrieval-Augmented Large Language Models
- **Authors:** Xinbei Ma, Yeyun Gong, Pengcheng He, et al.
- **Published Date:** 2023-05-23
- **URL:** http://arxiv.org/abs/2305.14283v3
- **LangChain:**
- **Template:** [rewrite-retrieve-read](https://python.langchain.com/docs/templates/rewrite-retrieve-read)
@@ -322,11 +305,9 @@ for retrieval-augmented LLM.
## Large Language Model Guided Tree-of-Thought
- **arXiv id:** 2305.08291v1
- **arXiv id:** [2305.08291v1](http://arxiv.org/abs/2305.08291v1) **Published Date:** 2023-05-15
- **Title:** Large Language Model Guided Tree-of-Thought
- **Authors:** Jieyi Long
- **Published Date:** 2023-05-15
- **URL:** http://arxiv.org/abs/2305.08291v1
- **LangChain:**
- **API Reference:** [langchain_experimental.tot](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.tot)
@@ -352,11 +333,9 @@ implementation of the ToT-based Sudoku solver is available on GitHub:
## Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- **arXiv id:** 2305.04091v3
- **arXiv id:** [2305.04091v3](http://arxiv.org/abs/2305.04091v3) **Published Date:** 2023-05-06
- **Title:** Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- **Authors:** Lei Wang, Wanyu Xu, Yihuai Lan, et al.
- **Published Date:** 2023-05-06
- **URL:** http://arxiv.org/abs/2305.04091v3
- **LangChain:**
- **Cookbook:** [plan_and_execute_agent](https://github.com/langchain-ai/langchain/blob/master/cookbook/plan_and_execute_agent.ipynb)
@@ -383,13 +362,35 @@ Prompting, and has comparable performance with 8-shot CoT prompting on the math
reasoning problem. The code can be found at
https://github.com/AGI-Edgerunners/Plan-and-Solve-Prompting.
## Zero-Shot Listwise Document Reranking with a Large Language Model
- **arXiv id:** [2305.02156v1](http://arxiv.org/abs/2305.02156v1) **Published Date:** 2023-05-03
- **Title:** Zero-Shot Listwise Document Reranking with a Large Language Model
- **Authors:** Xueguang Ma, Xinyu Zhang, Ronak Pradeep, et al.
- **LangChain:**
- **API Reference:** [langchain...LLMListwiseRerank](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.listwise_rerank.LLMListwiseRerank.html#langchain.retrievers.document_compressors.listwise_rerank.LLMListwiseRerank)
**Abstract:** Supervised ranking methods based on bi-encoder or cross-encoder architectures
have shown success in multi-stage text ranking tasks, but they require large
amounts of relevance judgments as training data. In this work, we propose
Listwise Reranker with a Large Language Model (LRL), which achieves strong
reranking effectiveness without using any task-specific training data.
Different from the existing pointwise ranking methods, where documents are
scored independently and ranked according to the scores, LRL directly generates
a reordered list of document identifiers given the candidate documents.
Experiments on three TREC web search datasets demonstrate that LRL not only
outperforms zero-shot pointwise methods when reranking first-stage retrieval
results, but can also act as a final-stage reranker to improve the top-ranked
results of a pointwise method for improved efficiency. Additionally, we apply
our approach to subsets of MIRACL, a recent multilingual retrieval dataset,
with results showing its potential to generalize across different languages.
## Visual Instruction Tuning
- **arXiv id:** 2304.08485v2
- **arXiv id:** [2304.08485v2](http://arxiv.org/abs/2304.08485v2) **Published Date:** 2023-04-17
- **Title:** Visual Instruction Tuning
- **Authors:** Haotian Liu, Chunyuan Li, Qingyang Wu, et al.
- **Published Date:** 2023-04-17
- **URL:** http://arxiv.org/abs/2304.08485v2
- **LangChain:**
- **Cookbook:** [Semi_structured_and_multi_modal_RAG](https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb), [Semi_structured_multi_modal_RAG_LLaMA2](https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb)
@@ -412,11 +413,9 @@ publicly available.
## Generative Agents: Interactive Simulacra of Human Behavior
- **arXiv id:** 2304.03442v2
- **arXiv id:** [2304.03442v2](http://arxiv.org/abs/2304.03442v2) **Published Date:** 2023-04-07
- **Title:** Generative Agents: Interactive Simulacra of Human Behavior
- **Authors:** Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, et al.
- **Published Date:** 2023-04-07
- **URL:** http://arxiv.org/abs/2304.03442v2
- **LangChain:**
- **Cookbook:** [multiagent_bidding](https://github.com/langchain-ai/langchain/blob/master/cookbook/multiagent_bidding.ipynb), [generative_agents_interactive_simulacra_of_human_behavior](https://github.com/langchain-ai/langchain/blob/master/cookbook/generative_agents_interactive_simulacra_of_human_behavior.ipynb)
@@ -448,11 +447,9 @@ interaction patterns for enabling believable simulations of human behavior.
## CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- **arXiv id:** 2303.17760v2
- **arXiv id:** [2303.17760v2](http://arxiv.org/abs/2303.17760v2) **Published Date:** 2023-03-31
- **Title:** CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- **Authors:** Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, et al.
- **Published Date:** 2023-03-31
- **URL:** http://arxiv.org/abs/2303.17760v2
- **LangChain:**
- **Cookbook:** [camel_role_playing](https://github.com/langchain-ai/langchain/blob/master/cookbook/camel_role_playing.ipynb)
@@ -478,11 +475,9 @@ agents and beyond: https://github.com/camel-ai/camel.
## HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- **arXiv id:** 2303.17580v4
- **arXiv id:** [2303.17580v4](http://arxiv.org/abs/2303.17580v4) **Published Date:** 2023-03-30
- **Title:** HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- **Authors:** Yongliang Shen, Kaitao Song, Xu Tan, et al.
- **Published Date:** 2023-03-30
- **URL:** http://arxiv.org/abs/2303.17580v4
- **LangChain:**
- **API Reference:** [langchain_experimental.autonomous_agents](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.autonomous_agents)
@@ -508,40 +503,14 @@ modalities and domains and achieve impressive results in language, vision,
speech, and other challenging tasks, which paves a new way towards the
realization of artificial general intelligence.
## GPT-4 Technical Report
- **arXiv id:** 2303.08774v6
- **Title:** GPT-4 Technical Report
- **Authors:** OpenAI, Josh Achiam, Steven Adler, et al.
- **Published Date:** 2023-03-15
- **URL:** http://arxiv.org/abs/2303.08774v6
- **LangChain:**
- **Documentation:** [docs/integrations/vectorstores/mongodb_atlas](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas)
**Abstract:** We report the development of GPT-4, a large-scale, multimodal model which can
accept image and text inputs and produce text outputs. While less capable than
humans in many real-world scenarios, GPT-4 exhibits human-level performance on
various professional and academic benchmarks, including passing a simulated bar
exam with a score around the top 10% of test takers. GPT-4 is a
Transformer-based model pre-trained to predict the next token in a document.
The post-training alignment process results in improved performance on measures
of factuality and adherence to desired behavior. A core component of this
project was developing infrastructure and optimization methods that behave
predictably across a wide range of scales. This allowed us to accurately
predict some aspects of GPT-4's performance based on models trained with no
more than 1/1,000th the compute of GPT-4.
## A Watermark for Large Language Models
- **arXiv id:** 2301.10226v4
- **arXiv id:** [2301.10226v4](http://arxiv.org/abs/2301.10226v4) **Published Date:** 2023-01-24
- **Title:** A Watermark for Large Language Models
- **Authors:** John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al.
- **Published Date:** 2023-01-24
- **URL:** http://arxiv.org/abs/2301.10226v4
- **LangChain:**
- **API Reference:** [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
- **API Reference:** [langchain_community...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
**Abstract:** Potential harms of large language models can be mitigated by watermarking
model output, i.e., embedding signals into generated text that are invisible to
@@ -559,11 +528,9 @@ family, and discuss robustness and security.
## Precise Zero-Shot Dense Retrieval without Relevance Labels
- **arXiv id:** 2212.10496v1
- **arXiv id:** [2212.10496v1](http://arxiv.org/abs/2212.10496v1) **Published Date:** 2022-12-20
- **Title:** Precise Zero-Shot Dense Retrieval without Relevance Labels
- **Authors:** Luyu Gao, Xueguang Ma, Jimmy Lin, et al.
- **Published Date:** 2022-12-20
- **URL:** http://arxiv.org/abs/2212.10496v1
- **LangChain:**
- **API Reference:** [langchain...HypotheticalDocumentEmbedder](https://api.python.langchain.com/en/latest/chains/langchain.chains.hyde.base.HypotheticalDocumentEmbedder.html#langchain.chains.hyde.base.HypotheticalDocumentEmbedder)
@@ -590,11 +557,9 @@ search, QA, fact verification) and languages~(e.g. sw, ko, ja).
## Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments
- **arXiv id:** 2212.07425v3
- **arXiv id:** [2212.07425v3](http://arxiv.org/abs/2212.07425v3) **Published Date:** 2022-12-12
- **Title:** Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments
- **Authors:** Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, et al.
- **Published Date:** 2022-12-12
- **URL:** http://arxiv.org/abs/2212.07425v3
- **LangChain:**
- **API Reference:** [langchain_experimental.fallacy_removal](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.fallacy_removal)
@@ -623,11 +588,9 @@ further work on logical fallacy identification.
## Complementary Explanations for Effective In-Context Learning
- **arXiv id:** 2211.13892v2
- **arXiv id:** [2211.13892v2](http://arxiv.org/abs/2211.13892v2) **Published Date:** 2022-11-25
- **Title:** Complementary Explanations for Effective In-Context Learning
- **Authors:** Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, et al.
- **Published Date:** 2022-11-25
- **URL:** http://arxiv.org/abs/2211.13892v2
- **LangChain:**
- **API Reference:** [langchain_core...MaxMarginalRelevanceExampleSelector](https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector.html#langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector)
@@ -651,14 +614,12 @@ performance across three real-world tasks on multiple LLMs.
## PAL: Program-aided Language Models
- **arXiv id:** 2211.10435v2
- **arXiv id:** [2211.10435v2](http://arxiv.org/abs/2211.10435v2) **Published Date:** 2022-11-18
- **Title:** PAL: Program-aided Language Models
- **Authors:** Luyu Gao, Aman Madaan, Shuyan Zhou, et al.
- **Published Date:** 2022-11-18
- **URL:** http://arxiv.org/abs/2211.10435v2
- **LangChain:**
- **API Reference:** [langchain_experimental...PALChain](https://api.python.langchain.com/en/latest/pal_chain/langchain_experimental.pal_chain.base.PALChain.html#langchain_experimental.pal_chain.base.PALChain), [langchain_experimental.pal_chain](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.pal_chain)
- **API Reference:** [langchain_experimental.pal_chain](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.pal_chain), [langchain_experimental...PALChain](https://api.python.langchain.com/en/latest/pal_chain/langchain_experimental.pal_chain.base.PALChain.html#langchain_experimental.pal_chain.base.PALChain)
- **Cookbook:** [program_aided_language_model](https://github.com/langchain-ai/langchain/blob/master/cookbook/program_aided_language_model.ipynb)
**Abstract:** Large language models (LLMs) have recently demonstrated an impressive ability
@@ -686,15 +647,13 @@ publicly available at http://reasonwithpal.com/ .
## ReAct: Synergizing Reasoning and Acting in Language Models
- **arXiv id:** 2210.03629v3
- **arXiv id:** [2210.03629v3](http://arxiv.org/abs/2210.03629v3) **Published Date:** 2022-10-06
- **Title:** ReAct: Synergizing Reasoning and Acting in Language Models
- **Authors:** Shunyu Yao, Jeffrey Zhao, Dian Yu, et al.
- **Published Date:** 2022-10-06
- **URL:** http://arxiv.org/abs/2210.03629v3
- **LangChain:**
- **Documentation:** [docs/integrations/providers/cohere](https://python.langchain.com/docs/integrations/providers/cohere), [docs/integrations/chat/huggingface](https://python.langchain.com/docs/integrations/chat/huggingface), [docs/integrations/tools/ionic_shopping](https://python.langchain.com/docs/integrations/tools/ionic_shopping)
- **API Reference:** [langchain...create_react_agent](https://api.python.langchain.com/en/latest/agents/langchain.agents.react.agent.create_react_agent.html#langchain.agents.react.agent.create_react_agent), [langchain...TrajectoryEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain.html#langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain)
- **Documentation:** [docs/integrations/providers/cohere](https://python.langchain.com/docs/integrations/providers/cohere), [docs/integrations/tools/ionic_shopping](https://python.langchain.com/docs/integrations/tools/ionic_shopping)
- **API Reference:** [langchain...TrajectoryEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain.html#langchain.evaluation.agents.trajectory_eval_chain.TrajectoryEvalChain), [langchain...create_react_agent](https://api.python.langchain.com/en/latest/agents/langchain.agents.react.agent.create_react_agent.html#langchain.agents.react.agent.create_react_agent)
**Abstract:** While large language models (LLMs) have demonstrated impressive capabilities
across tasks in language understanding and interactive decision making, their
@@ -721,11 +680,9 @@ Project site with code: https://react-lm.github.io
## Deep Lake: a Lakehouse for Deep Learning
- **arXiv id:** 2209.10785v2
- **arXiv id:** [2209.10785v2](http://arxiv.org/abs/2209.10785v2) **Published Date:** 2022-09-22
- **Title:** Deep Lake: a Lakehouse for Deep Learning
- **Authors:** Sasun Hambardzumyan, Abhinav Tuli, Levon Ghukasyan, et al.
- **Published Date:** 2022-09-22
- **URL:** http://arxiv.org/abs/2209.10785v2
- **LangChain:**
- **Documentation:** [docs/integrations/providers/activeloop_deeplake](https://python.langchain.com/docs/integrations/providers/activeloop_deeplake)
@@ -747,13 +704,43 @@ visualization engine, or (c) deep learning frameworks without sacrificing GPU
utilization. Datasets stored in Deep Lake can be accessed from PyTorch,
TensorFlow, JAX, and integrate with numerous MLOps tools.
## Matryoshka Representation Learning
- **arXiv id:** [2205.13147v4](http://arxiv.org/abs/2205.13147v4) **Published Date:** 2022-05-26
- **Title:** Matryoshka Representation Learning
- **Authors:** Aditya Kusupati, Gantavya Bhatt, Aniket Rege, et al.
- **LangChain:**
- **Documentation:** [docs/integrations/providers/snowflake](https://python.langchain.com/docs/integrations/providers/snowflake)
**Abstract:** Learned representations are a central component in modern ML systems, serving
a multitude of downstream tasks. When training such representations, it is
often the case that computational and statistical constraints for each
downstream task are unknown. In this context rigid, fixed capacity
representations can be either over or under-accommodating to the task at hand.
This leads us to ask: can we design a flexible representation that can adapt to
multiple downstream tasks with varying computational resources? Our main
contribution is Matryoshka Representation Learning (MRL) which encodes
information at different granularities and allows a single embedding to adapt
to the computational constraints of downstream tasks. MRL minimally modifies
existing representation learning pipelines and imposes no additional cost
during inference and deployment. MRL learns coarse-to-fine representations that
are at least as accurate and rich as independently trained low-dimensional
representations. The flexibility within the learned Matryoshka Representations
offer: (a) up to 14x smaller embedding size for ImageNet-1K classification at
the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale
retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for
long-tail few-shot classification, all while being as robust as the original
representations. Finally, we show that MRL extends seamlessly to web-scale
datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet),
vision + language (ALIGN) and language (BERT). MRL code and pretrained models
are open-sourced at https://github.com/RAIVNLab/MRL.
## Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
- **arXiv id:** 2205.12654v1
- **arXiv id:** [2205.12654v1](http://arxiv.org/abs/2205.12654v1) **Published Date:** 2022-05-25
- **Title:** Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages
- **Authors:** Kevin Heffernan, Onur Çelebi, Holger Schwenk
- **Published Date:** 2022-05-25
- **URL:** http://arxiv.org/abs/2205.12654v1
- **LangChain:**
- **API Reference:** [langchain_community...LaserEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.laser.LaserEmbeddings.html#langchain_community.embeddings.laser.LaserEmbeddings)
@@ -778,14 +765,12 @@ encoders, mine bitexts, and validate the bitexts by training NMT systems.
## Evaluating the Text-to-SQL Capabilities of Large Language Models
- **arXiv id:** 2204.00498v1
- **arXiv id:** [2204.00498v1](http://arxiv.org/abs/2204.00498v1) **Published Date:** 2022-03-15
- **Title:** Evaluating the Text-to-SQL Capabilities of Large Language Models
- **Authors:** Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau
- **Published Date:** 2022-03-15
- **URL:** http://arxiv.org/abs/2204.00498v1
- **LangChain:**
- **API Reference:** [langchain_community...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL), [langchain_community...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase)
- **API Reference:** [langchain_community...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), [langchain_community...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL)
**Abstract:** We perform an empirical evaluation of Text-to-SQL capabilities of the Codex
language model. We find that, without any finetuning, Codex is a strong
@@ -797,14 +782,12 @@ few-shot examples.
## Locally Typical Sampling
- **arXiv id:** 2202.00666v5
- **arXiv id:** [2202.00666v5](http://arxiv.org/abs/2202.00666v5) **Published Date:** 2022-02-01
- **Title:** Locally Typical Sampling
- **Authors:** Clara Meister, Tiago Pimentel, Gian Wiher, et al.
- **Published Date:** 2022-02-01
- **URL:** http://arxiv.org/abs/2202.00666v5
- **LangChain:**
- **API Reference:** [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
- **API Reference:** [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
**Abstract:** Today's probabilistic language generators fall short when it comes to
producing coherent and fluent text despite the fact that the underlying models
@@ -829,11 +812,9 @@ reducing degenerate repetitions.
## Learning Transferable Visual Models From Natural Language Supervision
- **arXiv id:** 2103.00020v1
- **arXiv id:** [2103.00020v1](http://arxiv.org/abs/2103.00020v1) **Published Date:** 2021-02-26
- **Title:** Learning Transferable Visual Models From Natural Language Supervision
- **Authors:** Alec Radford, Jong Wook Kim, Chris Hallacy, et al.
- **Published Date:** 2021-02-26
- **URL:** http://arxiv.org/abs/2103.00020v1
- **LangChain:**
- **API Reference:** [langchain_experimental.open_clip](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.open_clip)
@@ -861,14 +842,12 @@ https://github.com/OpenAI/CLIP.
## CTRL: A Conditional Transformer Language Model for Controllable Generation
- **arXiv id:** 1909.05858v2
- **arXiv id:** [1909.05858v2](http://arxiv.org/abs/1909.05858v2) **Published Date:** 2019-09-11
- **Title:** CTRL: A Conditional Transformer Language Model for Controllable Generation
- **Authors:** Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al.
- **Published Date:** 2019-09-11
- **URL:** http://arxiv.org/abs/1909.05858v2
- **LangChain:**
- **API Reference:** [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
- **API Reference:** [langchain_huggingface...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference)
**Abstract:** Large-scale language models show promising text generation capabilities, but
users cannot easily control particular aspects of the generated text. We
@@ -881,32 +860,4 @@ codes also allow CTRL to predict which parts of the training data are most
likely given a sequence. This provides a potential method for analyzing large
amounts of data via model-based source attribution. We have released multiple
full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl.
## Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- **arXiv id:** 1908.10084v1
- **Title:** Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- **Authors:** Nils Reimers, Iryna Gurevych
- **Published Date:** 2019-08-27
- **URL:** http://arxiv.org/abs/1908.10084v1
- **LangChain:**
- **Documentation:** [docs/integrations/text_embedding/sentence_transformers](https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers)
**Abstract:** BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) has set a new
state-of-the-art performance on sentence-pair regression tasks like semantic
textual similarity (STS). However, it requires that both sentences are fed into
the network, which causes a massive computational overhead: Finding the most
similar pair in a collection of 10,000 sentences requires about 50 million
inference computations (~65 hours) with BERT. The construction of BERT makes it
unsuitable for semantic similarity search as well as for unsupervised tasks
like clustering.
In this publication, we present Sentence-BERT (SBERT), a modification of the
pretrained BERT network that use siamese and triplet network structures to
derive semantically meaningful sentence embeddings that can be compared using
cosine-similarity. This reduces the effort for finding the most similar pair
from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while
maintaining the accuracy from BERT.
We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning
tasks, where it outperforms other state-of-the-art sentence embeddings methods.

View File

@@ -90,7 +90,7 @@ LCEL aims to provide consistency around behavior and customization over legacy s
`ConversationalRetrievalChain`. Many of these legacy chains hide important details like prompts, and as a wider variety
of viable models emerge, customization has become more and more important.
If you are currently using one of these legacy chains, please see [this guide for guidance on how to migrate](/docs/how_to/migrate_chains/).
If you are currently using one of these legacy chains, please see [this guide for guidance on how to migrate](/docs/versions/migrating_chains).
For guides on how to do specific tasks with LCEL, check out [the relevant how-to guides](/docs/how_to/#langchain-expression-language-lcel).
@@ -498,6 +498,30 @@ Retrievers accept a string query as input and return a list of Document's as out
For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers).
### Key-value stores
For some techniques, such as [indexing and retrieval with multiple vectors per document](/docs/how_to/multi_vector/) or
[caching embeddings](/docs/how_to/caching_embeddings/), having a form of key-value (KV) storage is helpful.
LangChain includes a [`BaseStore`](https://api.python.langchain.com/en/latest/stores/langchain_core.stores.BaseStore.html) interface,
which allows for storage of arbitrary data. However, LangChain components that require KV-storage accept a
more specific `BaseStore[str, bytes]` instance that stores binary data (referred to as a `ByteStore`), and internally take care of
encoding and decoding data for their specific needs.
This means that as a user, you only need to think about one type of store rather than different ones for different types of data.
#### Interface
All [`BaseStores`](https://api.python.langchain.com/en/latest/stores/langchain_core.stores.BaseStore.html) support the following interface. Note that the interface allows
for modifying **multiple** key-value pairs at once:
- `mget(key: Sequence[str]) -> List[Optional[bytes]]`: get the contents of multiple keys, returning `None` if the key does not exist
- `mset(key_value_pairs: Sequence[Tuple[str, bytes]]) -> None`: set the contents of multiple keys
- `mdelete(key: Sequence[str]) -> None`: delete multiple keys
- `yield_keys(prefix: Optional[str] = None) -> Iterator[str]`: yield all keys in the store, optionally filtering by a prefix
For key-value store implementations, see [this section](/docs/integrations/stores/).
### Tools
<span data-heading-keywords="tool,tools"></span>
@@ -518,7 +542,8 @@ Typical usage may look like the following:
```python
tools = [...] # Define a list of tools
llm_with_tools = llm.bind_tools(tools)
ai_msg = llm_with_tools.invoke("do xyz...") # AIMessage(tool_calls=[ToolCall(...), ...], ...)
ai_msg = llm_with_tools.invoke("do xyz...")
# -> AIMessage(tool_calls=[ToolCall(...), ...], ...)
```
The `AIMessage` returned from the model MAY have `tool_calls` associated with it.
@@ -535,9 +560,14 @@ This generally looks like:
```python
# You will want to previously check that the LLM returned tool calls
tool_call = ai_msg.tool_calls[0] # ToolCall(args={...}, id=..., ...)
tool_call = ai_msg.tool_calls[0]
# ToolCall(args={...}, id=..., ...)
tool_output = tool.invoke(tool_call["args"])
tool_message = ToolMessage(content=tool_output, tool_call_id=tool_call["id"], name=tool_call["name"])
tool_message = ToolMessage(
content=tool_output,
tool_call_id=tool_call["id"],
name=tool_call["name"]
)
```
Note that the `content` field will generally be passed back to the model.
@@ -547,7 +577,12 @@ you can transform the tool output but also pass it as an artifact (read more abo
```python
... # Same code as above
response_for_llm = transform(response)
tool_message = ToolMessage(content=response_for_llm, tool_call_id=tool_call["id"], name=tool_call["name"], artifact=tool_output)
tool_message = ToolMessage(
content=response_for_llm,
tool_call_id=tool_call["id"],
name=tool_call["name"],
artifact=tool_output
)
```
#### Invoke with `ToolCall`
@@ -558,9 +593,14 @@ The benefits of this are that you don't have to write the logic yourself to tran
This generally looks like:
```python
tool_call = ai_msg.tool_calls[0] # ToolCall(args={...}, id=..., ...)
tool_call = ai_msg.tool_calls[0]
# -> ToolCall(args={...}, id=..., ...)
tool_message = tool.invoke(tool_call)
# -> ToolMessage(content="tool result foobar...", tool_call_id=..., name="tool_name")
# -> ToolMessage(
content="tool result foobar...",
tool_call_id=...,
name="tool_name"
)
```
If you are invoking the tool this way and want to include an [artifact](/docs/concepts/#toolmessage) for the ToolMessage, you will need to have the tool return two things.

View File

@@ -63,7 +63,7 @@
"outputs": [],
"source": [
"# <!-- ruff: noqa: F821 -->\n",
"from langchain.globals import set_llm_cache"
"from langchain_core.globals import set_llm_cache"
]
},
{
@@ -103,7 +103,7 @@
],
"source": [
"%%time\n",
"from langchain.cache import InMemoryCache\n",
"from langchain_core.caches import InMemoryCache\n",
"\n",
"set_llm_cache(InMemoryCache())\n",
"\n",

View File

@@ -54,7 +54,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "9e4144de-d925-4d4c-91c3-685ef8baa57c",
"id": "2bb9c73f-9d00-4a19-a81f-cab2f0fd921a",
"metadata": {},
"outputs": [],
"source": [
@@ -63,7 +63,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 4,
"id": "a9e37aa1",
"metadata": {},
"outputs": [],
@@ -718,8 +718,44 @@
"php_splitter = RecursiveCharacterTextSplitter.from_language(\n",
" language=Language.PHP, chunk_size=50, chunk_overlap=0\n",
")\n",
"haskell_docs = php_splitter.create_documents([PHP_CODE])\n",
"haskell_docs"
"php_docs = php_splitter.create_documents([PHP_CODE])\n",
"php_docs"
]
},
{
"cell_type": "markdown",
"id": "e9fa62c1",
"metadata": {},
"source": [
"## PowerShell\n",
"Here's an example using the PowerShell text splitter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e6893ad",
"metadata": {},
"outputs": [],
"source": [
"POWERSHELL_CODE = \"\"\"\n",
"$directoryPath = Get-Location\n",
"\n",
"$items = Get-ChildItem -Path $directoryPath\n",
"\n",
"$files = $items | Where-Object { -not $_.PSIsContainer }\n",
"\n",
"$sortedFiles = $files | Sort-Object LastWriteTime\n",
"\n",
"foreach ($file in $sortedFiles) {\n",
" Write-Output (\"Name: \" + $file.Name + \" | Last Write Time: \" + $file.LastWriteTime)\n",
"}\n",
"\"\"\"\n",
"powershell_splitter = RecursiveCharacterTextSplitter.from_language(\n",
" language=Language.POWERSHELL, chunk_size=100, chunk_overlap=0\n",
")\n",
"powershell_docs = powershell_splitter.create_documents([POWERSHELL_CODE])\n",
"powershell_docs"
]
}
],
@@ -739,7 +775,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -409,7 +409,7 @@
" # When configuring the end runnable, we can then use this id to configure this field\n",
" ConfigurableField(id=\"prompt\"),\n",
" # This sets a default_key.\n",
" # If we specify this key, the default LLM (ChatAnthropic initialized above) will be used\n",
" # If we specify this key, the default prompt (asking for a joke, as initialized above) will be used\n",
" default_key=\"joke\",\n",
" # This adds a new option, with name `poem`\n",
" poem=PromptTemplate.from_template(\"Write a short poem about {topic}\"),\n",
@@ -494,7 +494,7 @@
" # When configuring the end runnable, we can then use this id to configure this field\n",
" ConfigurableField(id=\"prompt\"),\n",
" # This sets a default_key.\n",
" # If we specify this key, the default LLM (ChatAnthropic initialized above) will be used\n",
" # If we specify this key, the default prompt (asking for a joke, as initialized above) will be used\n",
" default_key=\"joke\",\n",
" # This adds a new option, with name `poem`\n",
" poem=PromptTemplate.from_template(\"Write a short poem about {topic}\"),\n",

View File

@@ -63,7 +63,7 @@
"* The `load` methods is a convenience method meant solely for prototyping work -- it just invokes `list(self.lazy_load())`.\n",
"* The `alazy_load` has a default implementation that will delegate to `lazy_load`. If you're using async, we recommend overriding the default implementation and providing a native async implementation.\n",
"\n",
"::: {.callout-important}\n",
":::{.callout-important}\n",
"When implementing a document loader do **NOT** provide parameters via the `lazy_load` or `alazy_load` methods.\n",
"\n",
"All configuration is expected to be passed through the initializer (__init__). This was a design choice made by LangChain to make sure that once a document loader has been instantiated it has all the information needed to load documents.\n",
@@ -235,7 +235,7 @@
"id": "56cb443e-f987-4386-b4ec-975ee129adb2",
"metadata": {},
"source": [
"::: {.callout-tip}\n",
":::{.callout-tip}\n",
"\n",
"`load()` can be helpful in an interactive environment such as a jupyter notebook.\n",
"\n",
@@ -276,7 +276,7 @@
"source": [
"## Working with Files\n",
"\n",
"Many document loaders invovle parsing files. The difference between such loaders usually stems from how the file is parsed rather than how the file is loaded. For example, you can use `open` to read the binary content of either a PDF or a markdown file, but you need different parsing logic to convert that binary data into text.\n",
"Many document loaders involve parsing files. The difference between such loaders usually stems from how the file is parsed, rather than how the file is loaded. For example, you can use `open` to read the binary content of either a PDF or a markdown file, but you need different parsing logic to convert that binary data into text.\n",
"\n",
"As a result, it can be helpful to decouple the parsing logic from the loading logic, which makes it easier to re-use a given parser regardless of how the data was loaded.\n",
"\n",
@@ -355,7 +355,7 @@
"id": "433bfb7c-7767-43bc-b71e-42413d7494a8",
"metadata": {},
"source": [
"Using the **blob** API also allows one to load content direclty from memory without having to read it from a file!"
"Using the **blob** API also allows one to load content directly from memory without having to read it from a file!"
]
},
{

View File

@@ -182,7 +182,7 @@ pprint(data)
</CodeOutputBlock>
Another option is set `jq_schema='.'` and provide `content_key`:
Another option is to set `jq_schema='.'` and provide `content_key`:
```python
loader = JSONLoader(

File diff suppressed because one or more lines are too long

View File

@@ -28,7 +28,7 @@
"\n",
"You can use arbitrary functions as [Runnables](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable). This is useful for formatting or when you need functionality not provided by other LangChain components, and custom functions used as Runnables are called [`RunnableLambdas`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.RunnableLambda.html).\n",
"\n",
"Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single dict input and unpacks it into multiple argument.\n",
"Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single dict input and unpacks it into multiple arguments.\n",
"\n",
"This guide will cover:\n",
"\n",

View File

@@ -364,7 +364,10 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema\n",
"from langchain_community.chains.graph_qa.cypher_utils import (\n",
" CypherQueryCorrector,\n",
" Schema,\n",
")\n",
"\n",
"# Cypher validation tool for relationship directions\n",
"corrector_schema = [\n",

View File

@@ -31,6 +31,8 @@ This highlights functionality that is core to using LangChain.
[**LCEL cheatsheet**](/docs/how_to/lcel_cheatsheet/): For a quick overview of how to use the main LCEL primitives.
[**Migration guide**](/docs/versions/migrating_chains): For migrating legacy chain abstractions to LCEL.
- [How to: chain runnables](/docs/how_to/sequence)
- [How to: stream runnables](/docs/how_to/streaming)
- [How to: invoke runnables in parallel](/docs/how_to/parallel/)
@@ -43,7 +45,6 @@ This highlights functionality that is core to using LangChain.
- [How to: create a dynamic (self-constructing) chain](/docs/how_to/dynamic_chain/)
- [How to: inspect runnables](/docs/how_to/inspect)
- [How to: add fallbacks to a runnable](/docs/how_to/fallbacks)
- [How to: migrate chains to LCEL](/docs/how_to/migrate_chains)
- [How to: pass runtime secrets to a runnable](/docs/how_to/runnable_runtime_secrets)
## Components
@@ -87,6 +88,7 @@ These are the core building blocks you can use when building applications.
- [How to: few shot prompt tool behavior](/docs/how_to/tools_few_shot)
- [How to: bind model-specific formatted tools](/docs/how_to/tools_model_specific)
- [How to: force a specific tool call](/docs/how_to/tool_choice)
- [How to: work with local models](/docs/how_to/local_llms)
- [How to: init any model in one line](/docs/how_to/chat_models_universal_init/)
### Messages
@@ -105,7 +107,7 @@ What LangChain calls [LLMs](/docs/concepts/#llms) are older forms of language mo
- [How to: create a custom LLM class](/docs/how_to/custom_llm)
- [How to: stream a response back](/docs/how_to/streaming_llm)
- [How to: track token usage](/docs/how_to/llm_token_usage_tracking)
- [How to: work with local LLMs](/docs/how_to/local_llms)
- [How to: work with local models](/docs/how_to/local_llms)
### Output parsers

View File

@@ -36,7 +36,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.globals import set_llm_cache\n",
"from langchain_core.globals import set_llm_cache\n",
"from langchain_openai import OpenAI\n",
"\n",
"# To make the caching really obvious, lets use a slower and older model.\n",
@@ -71,7 +71,7 @@
],
"source": [
"%%time\n",
"from langchain.cache import InMemoryCache\n",
"from langchain_core.caches import InMemoryCache\n",
"\n",
"set_llm_cache(InMemoryCache())\n",
"\n",

View File

@@ -5,11 +5,11 @@
"id": "b8982428",
"metadata": {},
"source": [
"# Run LLMs locally\n",
"# Run models locally\n",
"\n",
"## Use case\n",
"\n",
"The popularity of projects like [PrivateGPT](https://github.com/imartinez/privateGPT), [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://github.com/ollama/ollama), [GPT4All](https://github.com/nomic-ai/gpt4all), [llamafile](https://github.com/Mozilla-Ocho/llamafile), and others underscore the demand to run LLMs locally (on your own device).\n",
"The popularity of projects like [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://github.com/ollama/ollama), [GPT4All](https://github.com/nomic-ai/gpt4all), [llamafile](https://github.com/Mozilla-Ocho/llamafile), and others underscore the demand to run LLMs locally (on your own device).\n",
"\n",
"This has at least two important benefits:\n",
"\n",
@@ -66,6 +66,12 @@
"\n",
"![Image description](../../static/img/llama_t_put.png)\n",
"\n",
"### Formatting prompts\n",
"\n",
"Some providers have [chat model](/docs/concepts/#chat-models) wrappers that takes care of formatting your input prompt for the specific local model you're using. However, if you are prompting local models with a [text-in/text-out LLM](/docs/concepts/#llms) wrapper, you may need to use a prompt tailed for your specific model.\n",
"\n",
"This can [require the inclusion of special tokens](https://huggingface.co/blog/llama2#how-to-prompt-llama-2). [Here's an example for LLaMA 2](https://smith.langchain.com/hub/rlm/rag-prompt-llama).\n",
"\n",
"## Quickstart\n",
"\n",
"[`Ollama`](https://ollama.ai/) is one way to easily run inference on macOS.\n",
@@ -73,10 +79,20 @@
"The instructions [here](https://github.com/jmorganca/ollama?tab=readme-ov-file#ollama) provide details, which we summarize:\n",
" \n",
"* [Download and run](https://ollama.ai/download) the app\n",
"* From command line, fetch a model from this [list of options](https://github.com/jmorganca/ollama): e.g., `ollama pull llama2`\n",
"* From command line, fetch a model from this [list of options](https://github.com/jmorganca/ollama): e.g., `ollama pull llama3.1:8b`\n",
"* When the app is running, all models are automatically served on `localhost:11434`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29450fc9",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_ollama"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -86,7 +102,7 @@
{
"data": {
"text/plain": [
"' The first man on the moon was Neil Armstrong, who landed on the moon on July 20, 1969 as part of the Apollo 11 mission. obviously.'"
"'...Neil Armstrong!\\n\\nOn July 20, 1969, Neil Armstrong became the first person to set foot on the lunar surface, famously declaring \"That\\'s one small step for man, one giant leap for mankind\" as he stepped off the lunar module Eagle onto the Moon\\'s surface.\\n\\nWould you like to know more about the Apollo 11 mission or Neil Armstrong\\'s achievements?'"
]
},
"execution_count": 2,
@@ -95,51 +111,78 @@
}
],
"source": [
"from langchain_community.llms import Ollama\n",
"from langchain_ollama import OllamaLLM\n",
"\n",
"llm = OllamaLLM(model=\"llama3.1:8b\")\n",
"\n",
"llm = Ollama(model=\"llama2\")\n",
"llm.invoke(\"The first man on the moon was ...\")"
]
},
{
"cell_type": "markdown",
"id": "343ab645",
"id": "674cc672",
"metadata": {},
"source": [
"Stream tokens as they are being generated."
"Stream tokens as they are being generated:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "9cd83603",
"execution_count": 3,
"id": "1386a852",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" The first man to walk on the moon was Neil Armstrong, an American astronaut who was part of the Apollo 11 mission in 1969. февруари 20, 1969, Armstrong stepped out of the lunar module Eagle and onto the moon's surface, famously declaring \"That's one small step for man, one giant leap for mankind\" as he took his first steps. He was followed by fellow astronaut Edwin \"Buzz\" Aldrin, who also walked on the moon during the mission."
"...|"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Neil| Armstrong|,| an| American| astronaut|.| He| stepped| out| of| the| lunar| module| Eagle| and| onto| the| surface| of| the| Moon| on| July| |20|,| |196|9|,| famously| declaring|:| \"|That|'s| one| small| step| for| man|,| one| giant| leap| for| mankind|.\"||"
]
}
],
"source": [
"for chunk in llm.stream(\"The first man on the moon was ...\"):\n",
" print(chunk, end=\"|\", flush=True)"
]
},
{
"cell_type": "markdown",
"id": "e5731060",
"metadata": {},
"source": [
"Ollama also includes a chat model wrapper that handles formatting conversation turns:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f14a778a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' The first man to walk on the moon was Neil Armstrong, an American astronaut who was part of the Apollo 11 mission in 1969. февруари 20, 1969, Armstrong stepped out of the lunar module Eagle and onto the moon\\'s surface, famously declaring \"That\\'s one small step for man, one giant leap for mankind\" as he took his first steps. He was followed by fellow astronaut Edwin \"Buzz\" Aldrin, who also walked on the moon during the mission.'"
"AIMessage(content='The answer is a historic one!\\n\\nThe first man to walk on the Moon was Neil Armstrong, an American astronaut and commander of the Apollo 11 mission. On July 20, 1969, Armstrong stepped out of the lunar module Eagle onto the surface of the Moon, famously declaring:\\n\\n\"That\\'s one small step for man, one giant leap for mankind.\"\\n\\nArmstrong was followed by fellow astronaut Edwin \"Buzz\" Aldrin, who also walked on the Moon during the mission. Michael Collins remained in orbit around the Moon in the command module Columbia.\\n\\nNeil Armstrong passed away on August 25, 2012, but his legacy as a pioneering astronaut and engineer continues to inspire people around the world!', response_metadata={'model': 'llama3.1:8b', 'created_at': '2024-08-01T00:38:29.176717Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 10681861417, 'load_duration': 34270292, 'prompt_eval_count': 19, 'prompt_eval_duration': 6209448000, 'eval_count': 141, 'eval_duration': 4432022000}, id='run-7bed57c5-7f54-4092-912c-ae49073dcd48-0', usage_metadata={'input_tokens': 19, 'output_tokens': 141, 'total_tokens': 160})"
]
},
"execution_count": 40,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler\n",
"from langchain_ollama import ChatOllama\n",
"\n",
"llm = Ollama(\n",
" model=\"llama2\", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])\n",
")\n",
"llm.invoke(\"The first man on the moon was ...\")"
"chat_model = ChatOllama(model=\"llama3.1:8b\")\n",
"\n",
"chat_model.invoke(\"Who was the first man on the moon?\")"
]
},
{
@@ -199,7 +242,7 @@
"\n",
"With [Ollama](https://github.com/jmorganca/ollama), fetch a model via `ollama pull <model family>:<tag>`:\n",
"\n",
"* E.g., for Llama-7b: `ollama pull llama2` will download the most basic version of the model (e.g., smallest # parameters and 4 bit quantization)\n",
"* E.g., for Llama 2 7b: `ollama pull llama2` will download the most basic version of the model (e.g., smallest # parameters and 4 bit quantization)\n",
"* We can also specify a particular version from the [model list](https://github.com/jmorganca/ollama?tab=readme-ov-file#model-library), e.g., `ollama pull llama2:13b`\n",
"* See the full set of parameters on the [API reference page](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html)"
]
@@ -222,9 +265,7 @@
}
],
"source": [
"from langchain_community.llms import Ollama\n",
"\n",
"llm = Ollama(model=\"llama2:13b\")\n",
"llm = OllamaLLM(model=\"llama2:13b\")\n",
"llm.invoke(\"The first man on the moon was ... think step by step\")"
]
},
@@ -268,11 +309,7 @@
"cell_type": "code",
"execution_count": null,
"id": "5eba38dc",
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"metadata": {},
"outputs": [],
"source": [
"%env CMAKE_ARGS=\"-DLLAMA_METAL=on\"\n",
@@ -542,7 +579,6 @@
}
],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.chains.prompt_selector import ConditionalPromptSelector\n",
"from langchain_core.prompts import PromptTemplate\n",
"\n",
@@ -613,9 +649,9 @@
],
"source": [
"# Chain\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"chain = prompt | llm\n",
"question = \"What NFL team won the Super Bowl in the year that Justin Bieber was born?\"\n",
"llm_chain.run({\"question\": question})"
"chain.invoke({\"question\": question})"
]
},
{
@@ -666,7 +702,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
"version": "3.10.5"
}
},
"nbformat": 4,

View File

@@ -41,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "662fac50",
"metadata": {},
"outputs": [],
@@ -50,6 +50,26 @@
"%pip install -U langgraph langchain langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "6f8ec38f",
"metadata": {},
"source": [
"Then, set your OpenAI API key."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5fca87ef",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
]
},
{
"cell_type": "markdown",
"id": "8e50635c-1671-46e6-be65-ce95f8167c2f",
@@ -62,7 +82,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "1e425fea-2796-4b99-bee6-9a6ffe73f756",
"metadata": {},
"outputs": [],
@@ -95,7 +115,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "03ea357c-9c36-4464-b2cc-27bd150e1554",
"metadata": {},
"outputs": [
@@ -106,7 +126,7 @@
" 'output': 'The value of `magic_function(3)` is 5.'}"
]
},
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -142,7 +162,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "53a3737a-d167-4255-89bf-20ac37f89a3e",
"metadata": {},
"outputs": [
@@ -153,7 +173,7 @@
" 'output': 'The value of `magic_function(3)` is 5.'}"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -173,7 +193,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "74ecebe3-512e-409c-a661-bdd5b0a2b782",
"metadata": {},
"outputs": [
@@ -181,10 +201,10 @@
"data": {
"text/plain": [
"{'input': 'Pardon?',\n",
" 'output': 'The result of applying `magic_function` to the input 3 is 5.'}"
" 'output': 'The value you get when you apply `magic_function` to the input 3 is 5.'}"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -223,7 +243,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "a9a11ccd-75e2-4c11-844d-a34870b0ff91",
"metadata": {},
"outputs": [
@@ -234,7 +254,7 @@
" 'output': 'El valor de `magic_function(3)` es 5.'}"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -263,19 +283,19 @@
"source": [
"Now, let's pass a custom system message to [react agent executor](https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent).\n",
"\n",
"LangGraph's prebuilt `create_react_agent` does not take a prompt template directly as a parameter, but instead takes a [`messages_modifier`](https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent) parameter. This modifies messages before they are passed into the model, and can be one of four values:\n",
"LangGraph's prebuilt `create_react_agent` does not take a prompt template directly as a parameter, but instead takes a [`state_modifier`](https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent) parameter. This modifies the graph state before the llm is called, and can be one of four values:\n",
"\n",
"- A `SystemMessage`, which is added to the beginning of the list of messages.\n",
"- A `string`, which is converted to a `SystemMessage` and added to the beginning of the list of messages.\n",
"- A `Callable`, which should take in a list of messages. The output is then passed to the language model.\n",
"- Or a [`Runnable`](/docs/concepts/#langchain-expression-language-lcel), which should should take in a list of messages. The output is then passed to the language model.\n",
"- A `Callable`, which should take in full graph state. The output is then passed to the language model.\n",
"- Or a [`Runnable`](/docs/concepts/#langchain-expression-language-lcel), which should take in full graph state. The output is then passed to the language model.\n",
"\n",
"Here's how it looks in action:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "a9486805-676a-4d19-a5c4-08b41b172989",
"metadata": {},
"outputs": [],
@@ -287,7 +307,7 @@
"# This could also be a SystemMessage object\n",
"# system_message = SystemMessage(content=\"You are a helpful assistant. Respond only in Spanish.\")\n",
"\n",
"app = create_react_agent(model, tools, messages_modifier=system_message)\n",
"app = create_react_agent(model, tools, state_modifier=system_message)\n",
"\n",
"\n",
"messages = app.invoke({\"messages\": [(\"user\", query)]})"
@@ -304,7 +324,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"id": "d369ab45-0c82-45f4-9d3e-8efb8dd47e2c",
"metadata": {},
"outputs": [
@@ -317,8 +337,8 @@
}
],
"source": [
"from langchain_core.messages import AnyMessage\n",
"from langgraph.prebuilt import create_react_agent\n",
"from langgraph.prebuilt.chat_agent_executor import AgentState\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
@@ -328,13 +348,13 @@
")\n",
"\n",
"\n",
"def _modify_messages(messages: list[AnyMessage]):\n",
" return prompt.invoke({\"messages\": messages}).to_messages() + [\n",
"def _modify_state_messages(state: AgentState):\n",
" return prompt.invoke({\"messages\": state[\"messages\"]}).to_messages() + [\n",
" (\"user\", \"Also say 'Pandamonium!' after the answer.\")\n",
" ]\n",
"\n",
"\n",
"app = create_react_agent(model, tools, messages_modifier=_modify_messages)\n",
"app = create_react_agent(model, tools, state_modifier=_modify_state_messages)\n",
"\n",
"\n",
"messages = app.invoke({\"messages\": [(\"human\", query)]})\n",
@@ -366,8 +386,8 @@
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1fb52a2c",
"execution_count": 9,
"id": "b97beba5-8f74-430c-9399-91b77c8fa15c",
"metadata": {},
"outputs": [
{
@@ -376,7 +396,7 @@
"text": [
"Hi Polly! The output of the magic function for the input 3 is 5.\n",
"---\n",
"Yes, I remember your name, Polly! How can I assist you further?\n",
"Yes, your name is Polly!\n",
"---\n",
"The output of the magic function for the input 3 is 5.\n"
]
@@ -384,14 +404,14 @@
],
"source": [
"from langchain.agents import AgentExecutor, create_tool_calling_agent\n",
"from langchain_community.chat_message_histories import ChatMessageHistory\n",
"from langchain_core.chat_history import InMemoryChatMessageHistory\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_core.tools import tool\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"model = ChatOpenAI(model=\"gpt-4o\")\n",
"memory = ChatMessageHistory(session_id=\"test-session\")\n",
"memory = InMemoryChatMessageHistory(session_id=\"test-session\")\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant.\"),\n",
@@ -456,24 +476,23 @@
},
{
"cell_type": "code",
"execution_count": 9,
"id": "035e1253",
"execution_count": 10,
"id": "baca3dc6-678b-4509-9275-2fd653102898",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hi Polly! The output of the magic_function for the input 3 is 5.\n",
"Hi Polly! The output of the magic_function for the input of 3 is 5.\n",
"---\n",
"Yes, your name is Polly!\n",
"---\n",
"The output of the magic_function for the input 3 was 5.\n"
"The output of the magic_function for the input of 3 was 5.\n"
]
}
],
"source": [
"from langchain_core.messages import SystemMessage\n",
"from langgraph.checkpoint import MemorySaver # an in-memory checkpointer\n",
"from langgraph.prebuilt import create_react_agent\n",
"\n",
@@ -483,7 +502,7 @@
"\n",
"memory = MemorySaver()\n",
"app = create_react_agent(\n",
" model, tools, messages_modifier=system_message, checkpointer=memory\n",
" model, tools, state_modifier=system_message, checkpointer=memory\n",
")\n",
"\n",
"config = {\"configurable\": {\"thread_id\": \"test-thread\"}}\n",
@@ -525,16 +544,16 @@
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d640feb3",
"execution_count": 11,
"id": "e62843c4-1107-41f0-a50b-aea256e28053",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'actions': [ToolAgentAction(tool='magic_function', tool_input={'input': 3}, log=\"\\nInvoking: `magic_function` with `{'input': 3}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-c68fd76f-a3c3-4c3c-bfd7-748c171ed4b8', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt', 'index': 0}])], tool_call_id='call_q9MgGFjqJbV2xSUX93WqxmOt')], 'messages': [AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-c68fd76f-a3c3-4c3c-bfd7-748c171ed4b8', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt', 'index': 0}])]}\n",
"{'steps': [AgentStep(action=ToolAgentAction(tool='magic_function', tool_input={'input': 3}, log=\"\\nInvoking: `magic_function` with `{'input': 3}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-c68fd76f-a3c3-4c3c-bfd7-748c171ed4b8', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_q9MgGFjqJbV2xSUX93WqxmOt', 'index': 0}])], tool_call_id='call_q9MgGFjqJbV2xSUX93WqxmOt'), observation=5)], 'messages': [FunctionMessage(content='5', name='magic_function')]}\n",
"{'actions': [ToolAgentAction(tool='magic_function', tool_input={'input': 3}, log=\"\\nInvoking: `magic_function` with `{'input': 3}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518'}, id='run-5664e138-7085-4da7-a49e-5656a87b8d78', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'type': 'tool_call'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'index': 0, 'type': 'tool_call_chunk'}])], tool_call_id='call_1exy0rScfPmo4fy27FbQ5qJ2')], 'messages': [AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518'}, id='run-5664e138-7085-4da7-a49e-5656a87b8d78', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'type': 'tool_call'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'index': 0, 'type': 'tool_call_chunk'}])]}\n",
"{'steps': [AgentStep(action=ToolAgentAction(tool='magic_function', tool_input={'input': 3}, log=\"\\nInvoking: `magic_function` with `{'input': 3}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518'}, id='run-5664e138-7085-4da7-a49e-5656a87b8d78', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'type': 'tool_call'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_1exy0rScfPmo4fy27FbQ5qJ2', 'index': 0, 'type': 'tool_call_chunk'}])], tool_call_id='call_1exy0rScfPmo4fy27FbQ5qJ2'), observation=5)], 'messages': [FunctionMessage(content='5', name='magic_function')]}\n",
"{'output': 'The value of `magic_function(3)` is 5.', 'messages': [AIMessage(content='The value of `magic_function(3)` is 5.')]}\n"
]
}
@@ -585,23 +604,23 @@
},
{
"cell_type": "code",
"execution_count": 11,
"id": "86abbe07",
"execution_count": 12,
"id": "076ebc85-f804-4093-a25a-a16334c9898e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_yTjXXibj76tyFyPRa1soLo0S', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 70, 'total_tokens': 84}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-b275f314-c42e-4e77-9dec-5c23f7dbd53b-0', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_yTjXXibj76tyFyPRa1soLo0S'}])]}}\n",
"{'tools': {'messages': [ToolMessage(content='5', name='magic_function', id='41c5f227-528d-4483-a313-b03b23b1d327', tool_call_id='call_yTjXXibj76tyFyPRa1soLo0S')]}}\n",
"{'agent': {'messages': [AIMessage(content='The value of `magic_function(3)` is 5.', response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 93, 'total_tokens': 107}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'stop', 'logprobs': None}, id='run-0ef12b6e-415d-4758-9b62-5e5e1b350072-0')]}}\n"
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_my9rzFSKR4T1yYKwCsfbZB8A', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 61, 'total_tokens': 75}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_bc2a86f5f5', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-dd705555-8fae-4fb1-a033-5d99a23e3c22-0', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_my9rzFSKR4T1yYKwCsfbZB8A', 'type': 'tool_call'}], usage_metadata={'input_tokens': 61, 'output_tokens': 14, 'total_tokens': 75})]}}\n",
"{'tools': {'messages': [ToolMessage(content='5', name='magic_function', tool_call_id='call_my9rzFSKR4T1yYKwCsfbZB8A')]}}\n",
"{'agent': {'messages': [AIMessage(content='The value of `magic_function(3)` is 5.', response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 84, 'total_tokens': 98}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'stop', 'logprobs': None}, id='run-698cad05-8cb2-4d08-8c2a-881e354f6cc7-0', usage_metadata={'input_tokens': 84, 'output_tokens': 14, 'total_tokens': 98})]}}\n"
]
}
],
"source": [
"from langchain_core.messages import AnyMessage\n",
"from langgraph.prebuilt import create_react_agent\n",
"from langgraph.prebuilt.chat_agent_executor import AgentState\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
@@ -611,12 +630,11 @@
")\n",
"\n",
"\n",
"def _modify_messages(messages: list[AnyMessage]):\n",
" return prompt.invoke({\"messages\": messages}).to_messages()\n",
"def _modify_state_messages(state: AgentState):\n",
" return prompt.invoke({\"messages\": state[\"messages\"]}).to_messages()\n",
"\n",
"\n",
"app = create_react_agent(model, tools, messages_modifier=_modify_messages)\n",
"\n",
"app = create_react_agent(model, tools, state_modifier=_modify_state_messages)\n",
"\n",
"for step in app.stream({\"messages\": [(\"human\", query)]}, stream_mode=\"updates\"):\n",
" print(step)"
@@ -637,14 +655,14 @@
{
"cell_type": "code",
"execution_count": 12,
"id": "4eff44bc-a620-4c8a-97b1-268692a842bb",
"id": "a2f720f3-c121-4be2-b498-92c16bb44b0a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(ToolAgentAction(tool='magic_function', tool_input={'input': 3}, log=\"\\nInvoking: `magic_function` with `{'input': 3}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_ABI4hftfEdnVgKyfF6OzZbca', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-837e794f-cfd8-40e0-8abc-4d98ced11b75', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_ABI4hftfEdnVgKyfF6OzZbca'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_ABI4hftfEdnVgKyfF6OzZbca', 'index': 0}])], tool_call_id='call_ABI4hftfEdnVgKyfF6OzZbca'), 5)]\n"
"[(ToolAgentAction(tool='magic_function', tool_input={'input': 3}, log=\"\\nInvoking: `magic_function` with `{'input': 3}`\\n\\n\\n\", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_uPZ2D1Bo5mdED3gwgaeWURrf', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls', 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518'}, id='run-a792db4a-278d-4090-82ae-904a30eada93', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_uPZ2D1Bo5mdED3gwgaeWURrf', 'type': 'tool_call'}], tool_call_chunks=[{'name': 'magic_function', 'args': '{\"input\":3}', 'id': 'call_uPZ2D1Bo5mdED3gwgaeWURrf', 'index': 0, 'type': 'tool_call_chunk'}])], tool_call_id='call_uPZ2D1Bo5mdED3gwgaeWURrf'), 5)]\n"
]
}
],
@@ -667,16 +685,16 @@
{
"cell_type": "code",
"execution_count": 13,
"id": "4f4364ea-dffe-4d25-bdce-ef7d0020b880",
"id": "ef23117a-5ccb-42ce-80c3-ea49a9d3a942",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content='what is the value of magic_function(3)?', id='0f63e437-c4d8-4da9-b6f5-b293ebfe4a64'),\n",
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_S96v28LlI6hNkQrNnIio0JPh', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 64, 'total_tokens': 78}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-ffef7898-14b1-4537-ad90-7c000a8a5d25-0', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_S96v28LlI6hNkQrNnIio0JPh'}]),\n",
" ToolMessage(content='5', name='magic_function', id='fbd9df4e-1dda-4d3e-9044-b001f7875476', tool_call_id='call_S96v28LlI6hNkQrNnIio0JPh'),\n",
" AIMessage(content='The value of `magic_function(3)` is 5.', response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 87, 'total_tokens': 101}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'stop', 'logprobs': None}, id='run-e5d94c54-d9f4-45cd-be8e-a9101a8d88d6-0')]}"
"{'messages': [HumanMessage(content='what is the value of magic_function(3)?', id='cd7d0f49-a0e0-425a-b2b0-603a716058ed'),\n",
" AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_VfZ9287DuybOSrBsQH5X12xf', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 55, 'total_tokens': 69}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-a1e965cd-bf61-44f9-aec1-8aaecb80955f-0', tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_VfZ9287DuybOSrBsQH5X12xf', 'type': 'tool_call'}], usage_metadata={'input_tokens': 55, 'output_tokens': 14, 'total_tokens': 69}),\n",
" ToolMessage(content='5', name='magic_function', id='20d5c2fe-a5d8-47fa-9e04-5282642e2039', tool_call_id='call_VfZ9287DuybOSrBsQH5X12xf'),\n",
" AIMessage(content='The value of `magic_function(3)` is 5.', response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 78, 'total_tokens': 92}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'stop', 'logprobs': None}, id='run-abf9341c-ef41-4157-935d-a3be5dfa2f41-0', usage_metadata={'input_tokens': 78, 'output_tokens': 14, 'total_tokens': 92})]}"
]
},
"execution_count": 13,
@@ -708,7 +726,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 16,
"id": "16f189a7-fc78-4cb5-aa16-a94ca06401a6",
"metadata": {},
"outputs": [],
@@ -724,7 +742,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 17,
"id": "c96aefd7-6f6e-4670-aca6-1ac3d4e7871f",
"metadata": {},
"outputs": [
@@ -739,11 +757,7 @@
"Invoking: `magic_function` with `{'input': '3'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mSorry, there was an error. Please try again.\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `magic_function` with `{'input': '3'}`\n",
"responded: Parece que hubo un error al intentar obtener el valor de `magic_function(3)`. Permíteme intentarlo de nuevo.\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mSorry, there was an error. Please try again.\u001b[0m\u001b[32;1m\u001b[1;3mAún no puedo obtener el valor de `magic_function(3)`. ¿Hay algo más en lo que pueda ayudarte?\u001b[0m\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mSorry, there was an error. Please try again.\u001b[0m\u001b[32;1m\u001b[1;3mParece que hubo un error al intentar calcular el valor de la función mágica. ¿Te gustaría que lo intente de nuevo?\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -752,10 +766,10 @@
"data": {
"text/plain": [
"{'input': 'what is the value of magic_function(3)?',\n",
" 'output': 'Aún no puedo obtener el valor de `magic_function(3)`. ¿Hay algo más en lo que pueda ayudarte?'}"
" 'output': 'Parece que hubo un error al intentar calcular el valor de la función mágica. ¿Te gustaría que lo intente de nuevo?'}"
]
},
"execution_count": 15,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@@ -797,7 +811,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 18,
"id": "b974a91f-6ae8-4644-83d9-73666258a6db",
"metadata": {},
"outputs": [
@@ -805,12 +819,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"('human', 'what is the value of magic_function(3)?')\n",
"content='' additional_kwargs={'tool_calls': [{'id': 'call_pFdKcCu5taDTtOOfX14vEDRp', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 64, 'total_tokens': 78}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-25836468-ba7e-43be-a7cf-76bba06a2a08-0' tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_pFdKcCu5taDTtOOfX14vEDRp'}]\n",
"content='Sorry, there was an error. Please try again.' name='magic_function' id='1a08b883-9c7b-4969-9e9b-67ce64cdcb5f' tool_call_id='call_pFdKcCu5taDTtOOfX14vEDRp'\n",
"content='It seems there was an error when trying to apply the magic function. Let me try again.' additional_kwargs={'tool_calls': [{'id': 'call_DA0lpDIkBFg2GHy4WsEcZG4K', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 97, 'total_tokens': 131}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-d571b774-0ea3-4e35-8b7d-f32932c3f3cc-0' tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_DA0lpDIkBFg2GHy4WsEcZG4K'}]\n",
"content='Sorry, there was an error. Please try again.' name='magic_function' id='0b45787b-c82a-487f-9a5a-de129c30460f' tool_call_id='call_DA0lpDIkBFg2GHy4WsEcZG4K'\n",
"content='It appears that there is a consistent issue when trying to apply the magic function to the input \"3.\" This could be due to various reasons, such as the input not being in the correct format or an internal error.\\n\\nIf you have any other questions or if there\\'s something else you\\'d like to try, please let me know!' response_metadata={'token_usage': {'completion_tokens': 66, 'prompt_tokens': 153, 'total_tokens': 219}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'stop', 'logprobs': None} id='run-50a962e6-21b7-4327-8dea-8e2304062627-0'\n"
"content='what is the value of magic_function(3)?' id='74e2d5e8-2b59-4820-979c-8d11ecfc14c2'\n",
"content='' additional_kwargs={'tool_calls': [{'id': 'call_ihtrH6IG95pDXpKluIwAgi3J', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 55, 'total_tokens': 69}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-5a35e465-8a08-43dd-ac8b-4a76dcace305-0' tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_ihtrH6IG95pDXpKluIwAgi3J', 'type': 'tool_call'}] usage_metadata={'input_tokens': 55, 'output_tokens': 14, 'total_tokens': 69}\n",
"content='Sorry, there was an error. Please try again.' name='magic_function' id='8c37c19b-3586-46b1-aab9-a045786801a2' tool_call_id='call_ihtrH6IG95pDXpKluIwAgi3J'\n",
"content='It seems there was an error in processing the request. Let me try again.' additional_kwargs={'tool_calls': [{'id': 'call_iF0vYWAd6rfely0cXSqdMOnF', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 31, 'prompt_tokens': 88, 'total_tokens': 119}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-eb88ec77-d492-43a5-a5dd-4cefef9a6920-0' tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_iF0vYWAd6rfely0cXSqdMOnF', 'type': 'tool_call'}] usage_metadata={'input_tokens': 88, 'output_tokens': 31, 'total_tokens': 119}\n",
"content='Sorry, there was an error. Please try again.' name='magic_function' id='c9ff261f-a0f1-4c92-a9f2-cd749f62d911' tool_call_id='call_iF0vYWAd6rfely0cXSqdMOnF'\n",
"content='I am currently unable to process the request with the input \"3\" for the `magic_function`. If you have any other questions or need assistance with something else, please let me know!' response_metadata={'token_usage': {'completion_tokens': 39, 'prompt_tokens': 141, 'total_tokens': 180}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'stop', 'logprobs': None} id='run-d42508aa-f286-4b57-80fb-f8a76736d470-0' usage_metadata={'input_tokens': 141, 'output_tokens': 39, 'total_tokens': 180}\n"
]
}
],
@@ -847,7 +861,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 19,
"id": "4b8498fc-a7af-4164-a401-d8714f082306",
"metadata": {},
"outputs": [
@@ -874,7 +888,7 @@
" 'output': 'Agent stopped due to max iterations.'}"
]
},
"execution_count": 17,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -917,7 +931,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 20,
"id": "a2b29113-e6be-4f91-aa4c-5c63dea3e423",
"metadata": {},
"outputs": [
@@ -925,7 +939,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_HaQkeCwD5QskzJzFixCBacZ4', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 64, 'total_tokens': 78}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-596c9200-771f-436d-8576-72fcb81620f1-0', tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_HaQkeCwD5QskzJzFixCBacZ4'}])]}}\n",
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_FKiTkTd0Ffd4rkYSzERprf1M', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 55, 'total_tokens': 69}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-b842f7b6-ec10-40f8-8c0e-baa220b77e91-0', tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_FKiTkTd0Ffd4rkYSzERprf1M', 'type': 'tool_call'}], usage_metadata={'input_tokens': 55, 'output_tokens': 14, 'total_tokens': 69})]}}\n",
"------\n",
"{'input': 'what is the value of magic_function(3)?', 'output': 'Agent stopped due to max iterations.'}\n"
]
@@ -956,7 +970,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 21,
"id": "e9eb55f4-a321-4bac-b52d-9e43b411cf92",
"metadata": {},
"outputs": [
@@ -964,7 +978,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_4agJXUHtmHrOOMogjF6ZuzAv', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 64, 'total_tokens': 78}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-a1c77db7-405f-43d9-8d57-751f2ca1a58c-0', tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_4agJXUHtmHrOOMogjF6ZuzAv'}])]}}\n",
"{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_WoOB8juagB08xrP38twYlYKR', 'function': {'arguments': '{\"input\":\"3\"}', 'name': 'magic_function'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 55, 'total_tokens': 69}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-73dee47e-30ab-42c9-bb0c-6f227cac96cd-0', tool_calls=[{'name': 'magic_function', 'args': {'input': '3'}, 'id': 'call_WoOB8juagB08xrP38twYlYKR', 'type': 'tool_call'}], usage_metadata={'input_tokens': 55, 'output_tokens': 14, 'total_tokens': 69})]}}\n",
"------\n",
"Task Cancelled.\n"
]
@@ -1005,7 +1019,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 22,
"id": "3f6e2cf2",
"metadata": {},
"outputs": [
@@ -1067,7 +1081,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 23,
"id": "73cabbc4",
"metadata": {},
"outputs": [
@@ -1075,10 +1089,10 @@
"name": "stdout",
"output_type": "stream",
"text": [
"('human', 'what is the value of magic_function(3)?')\n",
"content='' additional_kwargs={'tool_calls': [{'id': 'call_bTURmOn9C8zslmn0kMFeykIn', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 64, 'total_tokens': 78}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-0844a504-7e6b-4ea6-a069-7017e38121ee-0' tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_bTURmOn9C8zslmn0kMFeykIn'}]\n",
"content='Sorry there was an error, please try again.' name='magic_function' id='00d5386f-eb23-4628-9a29-d9ce6a7098cc' tool_call_id='call_bTURmOn9C8zslmn0kMFeykIn'\n",
"content='' additional_kwargs={'tool_calls': [{'id': 'call_JYqvvvWmXow2u012DuPoDHFV', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 96, 'total_tokens': 110}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_729ea513f7', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-b73b1b1c-c829-4348-98cd-60b315c85448-0' tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_JYqvvvWmXow2u012DuPoDHFV'}]\n",
"content='what is the value of magic_function(3)?' id='4fa7fbe5-758c-47a3-9268-717665d10680'\n",
"content='' additional_kwargs={'tool_calls': [{'id': 'call_ujE0IQBbIQnxcF9gsZXQfdhF', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 55, 'total_tokens': 69}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-65d689aa-baee-4342-a5d2-048feefab418-0' tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_ujE0IQBbIQnxcF9gsZXQfdhF', 'type': 'tool_call'}] usage_metadata={'input_tokens': 55, 'output_tokens': 14, 'total_tokens': 69}\n",
"content='Sorry there was an error, please try again.' name='magic_function' id='ef8ddf1d-9ad7-4ac0-b784-b673c4d94bbd' tool_call_id='call_ujE0IQBbIQnxcF9gsZXQfdhF'\n",
"content='It seems there was an issue with the previous attempt. Let me try that again.' additional_kwargs={'tool_calls': [{'id': 'call_GcsAfCFUHJ50BN2IOWnwTbQ7', 'function': {'arguments': '{\"input\":3}', 'name': 'magic_function'}, 'type': 'function'}]} response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 87, 'total_tokens': 119}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'tool_calls', 'logprobs': None} id='run-54527c4b-8ff0-4ee8-8abf-224886bd222e-0' tool_calls=[{'name': 'magic_function', 'args': {'input': 3}, 'id': 'call_GcsAfCFUHJ50BN2IOWnwTbQ7', 'type': 'tool_call'}] usage_metadata={'input_tokens': 87, 'output_tokens': 32, 'total_tokens': 119}\n",
"{'input': 'what is the value of magic_function(3)?', 'output': 'Agent stopped due to max iterations.'}\n"
]
}
@@ -1118,7 +1132,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 24,
"id": "b94bb169",
"metadata": {},
"outputs": [
@@ -1216,12 +1230,12 @@
"source": [
"### In LangGraph\n",
"\n",
"We can use the [`messages_modifier`](https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent) just as before when passing in [prompt templates](#prompt-templates)."
"We can use the [`state_modifier`](https://langchain-ai.github.io/langgraph/reference/prebuilt/#create_react_agent) just as before when passing in [prompt templates](#prompt-templates)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 25,
"id": "b309ba9a",
"metadata": {},
"outputs": [
@@ -1246,9 +1260,9 @@
}
],
"source": [
"from langchain_core.messages import AnyMessage\n",
"from langgraph.errors import GraphRecursionError\n",
"from langgraph.prebuilt import create_react_agent\n",
"from langgraph.prebuilt.chat_agent_executor import AgentState\n",
"\n",
"magic_step_num = 1\n",
"\n",
@@ -1265,12 +1279,12 @@
"tools = [magic_function]\n",
"\n",
"\n",
"def _modify_messages(messages: list[AnyMessage]):\n",
"def _modify_state_messages(state: AgentState):\n",
" # Give the agent amnesia, only keeping the original user query\n",
" return [(\"system\", \"You are a helpful assistant\"), messages[0]]\n",
" return [(\"system\", \"You are a helpful assistant\"), state[\"messages\"][0]]\n",
"\n",
"\n",
"app = create_react_agent(model, tools, messages_modifier=_modify_messages)\n",
"app = create_react_agent(model, tools, state_modifier=_modify_state_messages)\n",
"\n",
"try:\n",
" for step in app.stream({\"messages\": [(\"human\", query)]}, stream_mode=\"updates\"):\n",
@@ -1308,7 +1322,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -1,811 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f331037f-be3f-4782-856f-d55dab952488",
"metadata": {},
"source": [
"# How to migrate chains to LCEL\n",
"\n",
":::info Prerequisites\n",
"\n",
"This guide assumes familiarity with the following concepts:\n",
"- [LangChain Expression Language](/docs/concepts#langchain-expression-language-lcel)\n",
"\n",
":::\n",
"\n",
"LCEL is designed to streamline the process of building useful apps with LLMs and combining related components. It does this by providing:\n",
"\n",
"1. **A unified interface**: Every LCEL object implements the `Runnable` interface, which defines a common set of invocation methods (`invoke`, `batch`, `stream`, `ainvoke`, ...). This makes it possible to also automatically and consistently support useful operations like streaming of intermediate steps and batching, since every chain composed of LCEL objects is itself an LCEL object.\n",
"2. **Composition primitives**: LCEL provides a number of primitives that make it easy to compose chains, parallelize components, add fallbacks, dynamically configure chain internals, and more.\n",
"\n",
"LangChain maintains a number of legacy abstractions. Many of these can be reimplemented via short combinations of LCEL primitives. Doing so confers some general advantages:\n",
"\n",
"- The resulting chains typically implement the full `Runnable` interface, including streaming and asynchronous support where appropriate;\n",
"- The chains may be more easily extended or modified;\n",
"- The parameters of the chain are typically surfaced for easier customization (e.g., prompts) over previous versions, which tended to be subclasses and had opaque parameters and internals.\n",
"\n",
"The LCEL implementations can be slightly more verbose, but there are significant benefits in transparency and customizability.\n",
"\n",
"In this guide we review LCEL implementations of common legacy abstractions. Where appropriate, we link out to separate guides with more detail."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b99b47ec",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-community langchain langchain-openai faiss-cpu"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "717c8673",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass()"
]
},
{
"cell_type": "markdown",
"id": "e3621b62-a037-42b8-8faa-59575608bb8b",
"metadata": {},
"source": [
"## `LLMChain`\n",
"<span data-heading-keywords=\"llmchain\"></span>\n",
"\n",
"[`LLMChain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.llm.LLMChain.html) combined a prompt template, LLM, and output parser into a class.\n",
"\n",
"Some advantages of switching to the LCEL implementation are:\n",
"\n",
"- Clarity around contents and parameters. The legacy `LLMChain` contains a default output parser and other options.\n",
"- Easier streaming. `LLMChain` only supports streaming via callbacks.\n",
"- Easier access to raw message outputs if desired. `LLMChain` only exposes these via a parameter or via callback.\n",
"\n",
"import { ColumnContainer, Column } from \"@theme/Columns\";\n",
"\n",
"<ColumnContainer>\n",
"\n",
"<Column>\n",
"\n",
"#### Legacy\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "e628905c-430e-4e4a-9d7c-c91d2f42052e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'adjective': 'funny',\n",
" 'text': \"Why couldn't the bicycle find its way home?\\n\\nBecause it lost its bearings!\"}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"user\", \"Tell me a {adjective} joke\")],\n",
")\n",
"\n",
"chain = LLMChain(llm=ChatOpenAI(), prompt=prompt)\n",
"\n",
"chain({\"adjective\": \"funny\"})"
]
},
{
"cell_type": "markdown",
"id": "cdc3b527-c09e-4c77-9711-c3cc4506cd95",
"metadata": {},
"source": [
"\n",
"</Column>\n",
"\n",
"<Column>\n",
"\n",
"#### LCEL\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0d2a7cf8-1bc7-405c-bb0d-f2ab2ba3b6ab",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Why couldn't the bicycle stand up by itself?\\n\\nBecause it was two tired!\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"user\", \"Tell me a {adjective} joke\")],\n",
")\n",
"\n",
"chain = prompt | ChatOpenAI() | StrOutputParser()\n",
"\n",
"chain.invoke({\"adjective\": \"funny\"})"
]
},
{
"cell_type": "markdown",
"id": "3c0b0513-77b8-4371-a20e-3e487cec7e7f",
"metadata": {},
"source": [
"\n",
"</Column>\n",
"</ColumnContainer>\n",
"\n",
"Note that `LLMChain` by default returns a `dict` containing both the input and the output. If this behavior is desired, we can replicate it using another LCEL primitive, [`RunnablePassthrough`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.passthrough.RunnablePassthrough.html):"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "529206c5-abbe-4213-9e6c-3b8586c8000d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'adjective': 'funny',\n",
" 'text': \"Why couldn't the bicycle stand up by itself?\\n\\nBecause it was two tired!\"}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"outer_chain = RunnablePassthrough().assign(text=chain)\n",
"\n",
"outer_chain.invoke({\"adjective\": \"funny\"})"
]
},
{
"cell_type": "markdown",
"id": "29d2e26c-2854-4971-9c2b-613450993921",
"metadata": {},
"source": [
"See [this tutorial](/docs/tutorials/llm_chain) for more detail on building with prompt templates, LLMs, and output parsers."
]
},
{
"cell_type": "markdown",
"id": "00df631d-5121-4918-94aa-b88acce9b769",
"metadata": {},
"source": [
"## `ConversationChain`\n",
"<span data-heading-keywords=\"conversationchain\"></span>\n",
"\n",
"[`ConversationChain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.conversation.base.ConversationChain.html) incorporates a memory of previous messages to sustain a stateful conversation.\n",
"\n",
"Some advantages of switching to the LCEL implementation are:\n",
"\n",
"- Innate support for threads/separate sessions. To make this work with `ConversationChain`, you'd need to instantiate a separate memory class outside the chain.\n",
"- More explicit parameters. `ConversationChain` contains a hidden default prompt, which can cause confusion.\n",
"- Streaming support. `ConversationChain` only supports streaming via callbacks.\n",
"\n",
"`RunnableWithMessageHistory` implements sessions via configuration parameters. It should be instantiated with a callable that returns a [chat message history](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html). By default, it expects this function to take a single argument `session_id`.\n",
"\n",
"<ColumnContainer>\n",
"<Column>\n",
"\n",
"#### Legacy\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4f2cc6dc-d70a-4c13-9258-452f14290da6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'how are you?',\n",
" 'history': '',\n",
" 'response': \"Arrr, I be doin' well, me matey! Just sailin' the high seas in search of treasure and adventure. How can I assist ye today?\"}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import ConversationChain\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"template = \"\"\"\n",
"You are a pirate. Answer the following questions as best you can.\n",
"Chat history: {history}\n",
"Question: {input}\n",
"\"\"\"\n",
"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"memory = ConversationBufferMemory()\n",
"\n",
"chain = ConversationChain(\n",
" llm=ChatOpenAI(),\n",
" memory=memory,\n",
" prompt=prompt,\n",
")\n",
"\n",
"chain({\"input\": \"how are you?\"})"
]
},
{
"cell_type": "markdown",
"id": "f8e36b0e-c7dc-4130-a51b-189d4b756c7f",
"metadata": {},
"source": [
"</Column>\n",
"\n",
"<Column>\n",
"\n",
"#### LCEL\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "173e1a9c-2a18-4669-b0de-136f39197786",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Arrr, I be doin' well, me heartie! Just sailin' the high seas in search of treasure and adventure. How be ye?\""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.chat_history import InMemoryChatMessageHistory\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a pirate. Answer the following questions as best you can.\"),\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"history = InMemoryChatMessageHistory()\n",
"\n",
"\n",
"def get_history():\n",
" return history\n",
"\n",
"\n",
"chain = prompt | ChatOpenAI() | StrOutputParser()\n",
"\n",
"wrapped_chain = RunnableWithMessageHistory(\n",
" chain,\n",
" get_history,\n",
" history_messages_key=\"chat_history\",\n",
")\n",
"\n",
"wrapped_chain.invoke({\"input\": \"how are you?\"})"
]
},
{
"cell_type": "markdown",
"id": "6b386ce6-895e-442c-88f3-7bec0ab9f401",
"metadata": {},
"source": [
"\n",
"</Column>\n",
"</ColumnContainer>\n",
"\n",
"The above example uses the same `history` for all sessions. The example below shows how to use a different chat history for each session."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4e05994f-1fbc-4699-bf2e-62cb0e4deeb8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Ahoy matey! What can this old pirate do for ye today?'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.chat_history import BaseChatMessageHistory\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"\n",
"store = {}\n",
"\n",
"\n",
"def get_session_history(session_id: str) -> BaseChatMessageHistory:\n",
" if session_id not in store:\n",
" store[session_id] = InMemoryChatMessageHistory()\n",
" return store[session_id]\n",
"\n",
"\n",
"chain = prompt | ChatOpenAI() | StrOutputParser()\n",
"\n",
"wrapped_chain = RunnableWithMessageHistory(\n",
" chain,\n",
" get_session_history,\n",
" history_messages_key=\"chat_history\",\n",
")\n",
"\n",
"wrapped_chain.invoke(\n",
" {\"input\": \"Hello!\"},\n",
" config={\"configurable\": {\"session_id\": \"abc123\"}},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c36ebecb",
"metadata": {},
"source": [
"See [this tutorial](/docs/tutorials/chatbot) for a more end-to-end guide on building with [`RunnableWithMessageHistory`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html).\n",
"\n",
"## `RetrievalQA`\n",
"<span data-heading-keywords=\"retrievalqa\"></span>\n",
"\n",
"The [`RetrievalQA`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval_qa.base.RetrievalQA.html) chain performed natural-language question answering over a data source using retrieval-augmented generation.\n",
"\n",
"Some advantages of switching to the LCEL implementation are:\n",
"\n",
"- Easier customizability. Details such as the prompt and how documents are formatted are only configurable via specific parameters in the `RetrievalQA` chain.\n",
"- More easily return source documents.\n",
"- Support for runnable methods like streaming and async operations.\n",
"\n",
"Now let's look at them side-by-side. We'll use the same ingestion code to load a [blog post by Lilian Weng](https://lilianweng.github.io/posts/2023-06-23-agent/) on autonomous agents into a local vector store:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1efbe16e",
"metadata": {},
"outputs": [],
"source": [
"# Load docs\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_openai.chat_models import ChatOpenAI\n",
"from langchain_openai.embeddings import OpenAIEmbeddings\n",
"\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()\n",
"\n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"all_splits = text_splitter.split_documents(data)\n",
"\n",
"# Store splits\n",
"vectorstore = FAISS.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())\n",
"\n",
"# LLM\n",
"llm = ChatOpenAI()"
]
},
{
"cell_type": "markdown",
"id": "c7e16438",
"metadata": {},
"source": [
"<ColumnContainer>\n",
"\n",
"<Column>\n",
"\n",
"#### Legacy"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "43bf55a0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are autonomous agents?',\n",
" 'result': 'Autonomous agents are LLM-empowered agents that handle autonomous design, planning, and performance of complex tasks, such as scientific experiments. These agents can browse the Internet, read documentation, execute code, call robotics experimentation APIs, and leverage other LLMs. They are capable of reasoning and planning ahead for complicated tasks by breaking them down into smaller steps.'}"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import hub\n",
"from langchain.chains import RetrievalQA\n",
"\n",
"# See full prompt at https://smith.langchain.com/hub/rlm/rag-prompt\n",
"prompt = hub.pull(\"rlm/rag-prompt\")\n",
"\n",
"qa_chain = RetrievalQA.from_llm(\n",
" llm, retriever=vectorstore.as_retriever(), prompt=prompt\n",
")\n",
"\n",
"qa_chain(\"What are autonomous agents?\")"
]
},
{
"cell_type": "markdown",
"id": "081948e5",
"metadata": {},
"source": [
"</Column>\n",
"\n",
"<Column>\n",
"\n",
"#### LCEL\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "9efcc931",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Autonomous agents are agents that can handle autonomous design, planning, and performance of complex tasks, such as scientific experiments. They can browse the Internet, read documentation, execute code, call robotics experimentation APIs, and leverage other language model models. These agents use reasoning steps to develop solutions to specific tasks, like creating a novel anticancer drug.'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import hub\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# See full prompt at https://smith.langchain.com/hub/rlm/rag-prompt\n",
"prompt = hub.pull(\"rlm/rag-prompt\")\n",
"\n",
"\n",
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"qa_chain = (\n",
" {\n",
" \"context\": vectorstore.as_retriever() | format_docs,\n",
" \"question\": RunnablePassthrough(),\n",
" }\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")\n",
"\n",
"qa_chain.invoke(\"What are autonomous agents?\")"
]
},
{
"cell_type": "markdown",
"id": "d6f44fe8",
"metadata": {},
"source": [
"</Column>\n",
"</ColumnContainer>\n",
"\n",
"The LCEL implementation exposes the internals of what's happening around retrieving, formatting documents, and passing them through a prompt to the LLM, but it is more verbose. You can customize and wrap this composition logic in a helper function, or use the higher-level [`create_retrieval_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html) and [`create_stuff_documents_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) helper method:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "5fe42761",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'What are autonomous agents?',\n",
" 'context': [Document(page_content='Boiko et al. (2023) also looked into LLM-empowered agents for scientific discovery, to handle autonomous design, planning, and performance of complex scientific experiments. This agent can use tools to browse the Internet, read documentation, execute code, call robotics experimentation APIs and leverage other LLMs.\\nFor example, when requested to \"develop a novel anticancer drug\", the model came up with the following reasoning steps:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'}),\n",
" Document(page_content='Weng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. LilLog. https://lilianweng.github.io/posts/2023-06-23-agent/.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'}),\n",
" Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'}),\n",
" Document(page_content=\"LLM Powered Autonomous Agents | Lil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLil'Log\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nPosts\\n\\n\\n\\n\\nArchive\\n\\n\\n\\n\\nSearch\\n\\n\\n\\n\\nTags\\n\\n\\n\\n\\nFAQ\\n\\n\\n\\n\\nemojisearch.app\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n LLM Powered Autonomous Agents\\n \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng\\n\\n\\n \\n\\n\\nTable of Contents\\n\\n\\n\\nAgent System Overview\\n\\nComponent One: Planning\\n\\nTask Decomposition\\n\\nSelf-Reflection\\n\\n\\nComponent Two: Memory\\n\\nTypes of Memory\\n\\nMaximum Inner Product Search (MIPS)\", metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'})],\n",
" 'answer': 'Autonomous agents are entities that can operate independently, making decisions and taking actions without direct human intervention. These agents can perform tasks such as planning, executing complex experiments, and leveraging various tools and resources to achieve objectives. In the context provided, LLM-powered autonomous agents are specifically designed for scientific discovery, capable of handling tasks like designing novel anticancer drugs through reasoning steps.'}"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import hub\n",
"from langchain.chains import create_retrieval_chain\n",
"from langchain.chains.combine_documents import create_stuff_documents_chain\n",
"\n",
"# See full prompt at https://smith.langchain.com/hub/langchain-ai/retrieval-qa-chat\n",
"retrieval_qa_chat_prompt = hub.pull(\"langchain-ai/retrieval-qa-chat\")\n",
"\n",
"combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)\n",
"rag_chain = create_retrieval_chain(vectorstore.as_retriever(), combine_docs_chain)\n",
"\n",
"rag_chain.invoke({\"input\": \"What are autonomous agents?\"})"
]
},
{
"cell_type": "markdown",
"id": "2772f4e9",
"metadata": {},
"source": [
"## `ConversationalRetrievalChain`\n",
"<span data-heading-keywords=\"conversationalretrievalchain\"></span>\n",
"\n",
"The [`ConversationalRetrievalChain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html) was an all-in one way that combined retrieval-augmented generation with chat history, allowing you to \"chat with\" your documents.\n",
"\n",
"Advantages of switching to the LCEL implementation are similar to the `RetrievalQA` section above:\n",
"\n",
"- Clearer internals. The `ConversationalRetrievalChain` chain hides an entire question rephrasing step which dereferences the initial query against the chat history.\n",
" - This means the class contains two sets of configurable prompts, LLMs, etc.\n",
"- More easily return source documents.\n",
"- Support for runnable methods like streaming and async operations.\n",
"\n",
"Here are side-by-side implementations with custom prompts. We'll reuse the loaded documents and vector store from the previous section:"
]
},
{
"cell_type": "markdown",
"id": "8bc06416",
"metadata": {},
"source": [
"<ColumnContainer>\n",
"\n",
"<Column>\n",
"\n",
"#### Legacy"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "54eb9576",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What are autonomous agents?',\n",
" 'chat_history': '',\n",
" 'answer': 'Autonomous agents are powered by Large Language Models (LLMs) to handle tasks like scientific discovery and complex experiments autonomously. These agents can browse the internet, read documentation, execute code, and leverage other LLMs to perform tasks. They can reason and plan ahead to decompose complicated tasks into manageable steps.'}"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"condense_question_template = \"\"\"\n",
"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.\n",
"\n",
"Chat History:\n",
"{chat_history}\n",
"Follow Up Input: {question}\n",
"Standalone question:\"\"\"\n",
"\n",
"condense_question_prompt = ChatPromptTemplate.from_template(condense_question_template)\n",
"\n",
"qa_template = \"\"\"\n",
"You are an assistant for question-answering tasks.\n",
"Use the following pieces of retrieved context to answer\n",
"the question. If you don't know the answer, say that you\n",
"don't know. Use three sentences maximum and keep the\n",
"answer concise.\n",
"\n",
"Chat History:\n",
"{chat_history}\n",
"\n",
"Other context:\n",
"{context}\n",
"\n",
"Question: {question}\n",
"\"\"\"\n",
"\n",
"qa_prompt = ChatPromptTemplate.from_template(qa_template)\n",
"\n",
"convo_qa_chain = ConversationalRetrievalChain.from_llm(\n",
" llm,\n",
" vectorstore.as_retriever(),\n",
" condense_question_prompt=condense_question_prompt,\n",
" combine_docs_chain_kwargs={\n",
" \"prompt\": qa_prompt,\n",
" },\n",
")\n",
"\n",
"convo_qa_chain(\n",
" {\n",
" \"question\": \"What are autonomous agents?\",\n",
" \"chat_history\": \"\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "43a8a23c",
"metadata": {},
"source": [
"</Column>\n",
"\n",
"<Column>\n",
"\n",
"#### LCEL\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "c884b138",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'What are autonomous agents?',\n",
" 'chat_history': [],\n",
" 'context': [Document(page_content='Boiko et al. (2023) also looked into LLM-empowered agents for scientific discovery, to handle autonomous design, planning, and performance of complex scientific experiments. This agent can use tools to browse the Internet, read documentation, execute code, call robotics experimentation APIs and leverage other LLMs.\\nFor example, when requested to \"develop a novel anticancer drug\", the model came up with the following reasoning steps:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'}),\n",
" Document(page_content='Weng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. LilLog. https://lilianweng.github.io/posts/2023-06-23-agent/.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'}),\n",
" Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'}),\n",
" Document(page_content='Or\\n@article{weng2023agent,\\n title = \"LLM-powered Autonomous Agents\",\\n author = \"Weng, Lilian\",\\n journal = \"lilianweng.github.io\",\\n year = \"2023\",\\n month = \"Jun\",\\n url = \"https://lilianweng.github.io/posts/2023-06-23-agent/\"\\n}\\nReferences#\\n[1] Wei et al. “Chain of thought prompting elicits reasoning in large language models.” NeurIPS 2022\\n[2] Yao et al. “Tree of Thoughts: Dliberate Problem Solving with Large Language Models.” arXiv preprint arXiv:2305.10601 (2023).', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agents brain, complemented by several key components:', 'language': 'en'})],\n",
" 'answer': 'Autonomous agents are entities capable of acting independently, making decisions, and performing tasks without direct human intervention. These agents can interact with their environment, perceive information, and take actions based on their goals or objectives. They often use artificial intelligence techniques to navigate and accomplish tasks in complex or dynamic environments.'}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import create_history_aware_retriever, create_retrieval_chain\n",
"\n",
"condense_question_system_template = (\n",
" \"Given a chat history and the latest user question \"\n",
" \"which might reference context in the chat history, \"\n",
" \"formulate a standalone question which can be understood \"\n",
" \"without the chat history. Do NOT answer the question, \"\n",
" \"just reformulate it if needed and otherwise return it as is.\"\n",
")\n",
"\n",
"condense_question_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", condense_question_system_template),\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"history_aware_retriever = create_history_aware_retriever(\n",
" llm, vectorstore.as_retriever(), condense_question_prompt\n",
")\n",
"\n",
"system_prompt = (\n",
" \"You are an assistant for question-answering tasks. \"\n",
" \"Use the following pieces of retrieved context to answer \"\n",
" \"the question. If you don't know the answer, say that you \"\n",
" \"don't know. Use three sentences maximum and keep the \"\n",
" \"answer concise.\"\n",
" \"\\n\\n\"\n",
" \"{context}\"\n",
")\n",
"\n",
"qa_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", system_prompt),\n",
" (\"placeholder\", \"{chat_history}\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"qa_chain = create_stuff_documents_chain(llm, qa_prompt)\n",
"\n",
"convo_qa_chain = create_retrieval_chain(history_aware_retriever, qa_chain)\n",
"\n",
"convo_qa_chain.invoke(\n",
" {\n",
" \"input\": \"What are autonomous agents?\",\n",
" \"chat_history\": [],\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b2717810",
"metadata": {},
"source": [
"</Column>\n",
"\n",
"</ColumnContainer>\n",
"\n",
"## Next steps\n",
"\n",
"You've now seen how to migrate existing usage of some legacy chains to LCEL.\n",
"\n",
"Next, check out the [LCEL conceptual docs](/docs/concepts/#langchain-expression-language-lcel) for more background information."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,27 +1,103 @@
# How to use LangChain with different Pydantic versions
- Pydantic v2 was released in June, 2023 (https://docs.pydantic.dev/2.0/blog/pydantic-v2-final/)
- v2 contains has a number of breaking changes (https://docs.pydantic.dev/2.0/migration/)
- Pydantic v2 and v1 are under the same package name, so both versions cannot be installed at the same time
- Pydantic v2 was released in June, 2023 (https://docs.pydantic.dev/2.0/blog/pydantic-v2-final/).
- v2 contains has a number of breaking changes (https://docs.pydantic.dev/2.0/migration/).
- Pydantic 1 End of Life was in June 2024. LangChain will be dropping support for Pydantic 1 in the near future,
and likely migrating internally to Pydantic 2. The timeline is tentatively September. This change will be accompanied by a minor version bump in the main langchain packages to version 0.3.x.
## LangChain Pydantic migration plan
As of `langchain>=0.0.267`, LangChain allows users to install either Pydantic V1 or V2.
As of `langchain>=0.0.267`, LangChain will allow users to install either Pydantic V1 or V2.
* Internally LangChain will continue to [use V1](https://docs.pydantic.dev/latest/migration/#continue-using-pydantic-v1-features).
* During this time, users can pin their pydantic version to v1 to avoid breaking changes, or start a partial
migration using pydantic v2 throughout their code, but avoiding mixing v1 and v2 code for LangChain (see below).
Internally, LangChain continues to use the [Pydantic V1](https://docs.pydantic.dev/latest/migration/#continue-using-pydantic-v1-features) via
the v1 namespace of Pydantic 2.
User can either pin to pydantic v1, and upgrade their code in one go once LangChain has migrated to v2 internally, or they can start a partial migration to v2, but must avoid mixing v1 and v2 code for LangChain.
Because Pydantic does not support mixing .v1 and .v2 objects, users should be aware of a number of issues
when using LangChain with Pydantic.
:::caution
While LangChain supports Pydantic V2 objects in some APIs (listed below), it's suggested that users keep using Pydantic V1 objects until LangChain 0.3 is released.
:::
## 1. Passing Pydantic objects to LangChain APIs
Most LangChain APIs for *tool usage* (see list below) have been updated to accept either Pydantic v1 or v2 objects.
* Pydantic v1 objects correspond to subclasses of `pydantic.BaseModel` if `pydantic 1` is installed or subclasses of `pydantic.v1.BaseModel` if `pydantic 2` is installed.
* Pydantic v2 objects correspond to subclasses of `pydantic.BaseModel` if `pydantic 2` is installed.
| API | Pydantic 1 | Pydantic 2 |
|----------------------------------------|------------|----------------------------------------------------------------|
| `BaseChatModel.bind_tools` | Yes | langchain-core>=0.2.23, appropriate version of partner package |
| `BaseChatModel.with_structured_output` | Yes | langchain-core>=0.2.23, appropriate version of partner package |
| `Tool.from_function` | Yes | langchain-core>=0.2.23 |
| `StructuredTool.from_function` | Yes | langchain-core>=0.2.23 |
Partner packages that accept pydantic v2 objects via `bind_tools` or `with_structured_output` APIs:
| Package Name | pydantic v1 | pydantic v2 |
|---------------------|-------------|-------------|
| langchain-mistralai | Yes | >=0.1.11 |
| langchain-anthropic | Yes | >=0.1.21 |
| langchain-robocorp | Yes | >=0.0.10 |
| langchain-openai | Yes | >=0.1.19 |
| langchain-fireworks | Yes | >=0.1.5 |
| langchain-aws | Yes | >=0.1.15 |
Additional partner packages will be updated to accept Pydantic v2 objects in the future.
If you are still seeing issues with these APIs or other APIs that accept Pydantic objects, please open an issue, and we'll
address it.
Example:
Prior to `langchain-core<0.2.23`, use Pydantic v1 objects when passing to LangChain APIs.
```python
from langchain_openai import ChatOpenAI
from pydantic.v1 import BaseModel # <-- Note v1 namespace
class Person(BaseModel):
"""Personal information"""
name: str
model = ChatOpenAI()
model = model.with_structured_output(Person)
model.invoke('Bob is a person.')
```
After `langchain-core>=0.2.23`, use either Pydantic v1 or v2 objects when passing to LangChain APIs.
```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
class Person(BaseModel):
"""Personal information"""
name: str
model = ChatOpenAI()
model = model.with_structured_output(Person)
model.invoke('Bob is a person.')
```
## 2. Sub-classing LangChain models
Because LangChain internally uses Pydantic v1, if you are sub-classing LangChain models, you should use Pydantic v1
primitives.
Below are two examples of showing how to avoid mixing pydantic v1 and v2 code in
the case of inheritance and in the case of passing objects to LangChain.
**Example 1: Extending via inheritance**
**YES**
```python
from pydantic.v1 import root_validator, validator
from pydantic.v1 import validator
from langchain_core.tools import BaseTool
class CustomTool(BaseTool): # BaseTool is v1 code
@@ -70,38 +146,33 @@ CustomTool(
)
```
**Example 2: Passing objects to LangChain**
**YES**
## 3. Disable run-time validation for LangChain objects used inside Pydantic v2 models
e.g.,
```python
from langchain_core.tools import Tool
from pydantic.v1 import BaseModel, Field # <-- Uses v1 namespace
from typing import Annotated
class CalculatorInput(BaseModel):
question: str = Field()
from langchain_openai import ChatOpenAI # <-- ChatOpenAI uses pydantic v1
from pydantic import BaseModel, SkipValidation
Tool.from_function( # <-- tool uses v1 namespace
func=lambda question: 'hello',
name="Calculator",
description="useful for when you need to answer questions about math",
args_schema=CalculatorInput
)
class Foo(BaseModel): # <-- BaseModel is from Pydantic v2
model: Annotated[ChatOpenAI, SkipValidation()]
Foo(model=ChatOpenAI(api_key="hello"))
```
**NO**
## 4: LangServe cannot generate OpenAPI docs if running Pydantic 2
```python
from langchain_core.tools import Tool
from pydantic import BaseModel, Field # <-- Uses v2 namespace
If you are using Pydantic 2, you will not be able to generate OpenAPI docs using LangServe.
class CalculatorInput(BaseModel):
question: str = Field()
If you need OpenAPI docs, your options are to either install Pydantic 1:
Tool.from_function( # <-- tool uses v1 namespace
func=lambda question: 'hello',
name="Calculator",
description="useful for when you need to answer questions about math",
args_schema=CalculatorInput
)
```
`pip install pydantic==1.10.17`
or else to use the `APIHandler` object in LangChain to manually create the
routes for your API.
See: https://python.langchain.com/v0.2/docs/langserve/#pydantic

View File

@@ -721,9 +721,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"\n",
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
"memory = MemorySaver()\n",
"\n",
"agent_executor = create_react_agent(llm, tools, checkpointer=memory)"
]
@@ -890,9 +890,9 @@
"from langchain_community.document_loaders import WebBaseLoader\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langgraph.checkpoint.sqlite import SqliteSaver\n",
"from langgraph.checkpoint.memory import MemorySaver\n",
"\n",
"memory = SqliteSaver.from_conn_string(\":memory:\")\n",
"memory = MemorySaver()\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
"\n",
"\n",

View File

@@ -14,7 +14,9 @@
"We will cover two approaches:\n",
"\n",
"1. Using the built-in [create_retrieval_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html), which returns sources by default;\n",
"2. Using a simple [LCEL](/docs/concepts#langchain-expression-language-lcel) implementation, to show the operating principle."
"2. Using a simple [LCEL](/docs/concepts#langchain-expression-language-lcel) implementation, to show the operating principle.\n",
"\n",
"We will also show how to structure sources into the model response, such that a model can report what specific sources it used in generating its answer."
]
},
{
@@ -130,8 +132,8 @@
},
{
"cell_type": "code",
"execution_count": 3,
"id": "820244ae-74b4-4593-b392-822979dd91b8",
"execution_count": null,
"id": "24a69b8c-024e-4e34-b827-9c9de46512a3",
"metadata": {},
"outputs": [],
"source": [
@@ -211,11 +213,11 @@
"data": {
"text/plain": [
"{'input': 'What is Task Decomposition?',\n",
" 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
" Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
" Document(page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
" Document(page_content=\"(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\", metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})],\n",
" 'answer': 'Task decomposition involves breaking down a complex task into smaller and simpler steps. This process helps agents or models handle challenging tasks by dividing them into more manageable subtasks. Techniques like Chain of Thought and Tree of Thoughts are used to decompose tasks into multiple steps for better problem-solving.'}"
" 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.'),\n",
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.'),\n",
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Resources:\\n1. Internet access for searches and information gathering.\\n2. Long Term memory management.\\n3. GPT-3.5 powered Agents for delegation of simple tasks.\\n4. File output.\\n\\nPerformance Evaluation:\\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\\n2. Constructively self-criticize your big-picture behavior constantly.\\n3. Reflect on past decisions and strategies to refine your approach.\\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.'),\n",
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content=\"(3) Task execution: Expert models execute on the specific tasks and log results.\\nInstruction:\\n\\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.\")],\n",
" 'answer': 'Task decomposition involves breaking down a complex task into smaller and more manageable steps. This process helps agents or models tackle difficult tasks by dividing them into simpler subtasks or components. Task decomposition can be achieved through techniques like Chain of Thought or Tree of Thoughts, which guide the agent in breaking down tasks into sequential or branching steps.'}"
]
},
"execution_count": 5,
@@ -251,18 +253,18 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "22ea137c-1a7a-44dd-ac73-281213979957",
"id": "1950953a-e6f1-439d-b7b9-c3bd456e388d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input': 'What is Task Decomposition',\n",
" 'context': [Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
" Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
" Document(page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}),\n",
" Document(page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})],\n",
" 'answer': 'Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for autonomous agents or models. This process can be achieved by techniques like Chain of Thought (CoT) or Tree of Thoughts, which guide the model to think step by step or explore multiple reasoning possibilities at each step. Task decomposition can be done through simple prompting with language models, task-specific instructions, or human inputs.'}"
" 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\\nComponent One: Planning#\\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\\nTask Decomposition#\\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the models thinking process.'),\n",
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\\nTask decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.'),\n",
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='The AI assistant can parse user input to several tasks: [{\"task\": task, \"id\", task_id, \"dep\": dependency_task_ids, \"args\": {\"text\": text, \"image\": URL, \"audio\": URL, \"video\": URL}}]. The \"dep\" field denotes the id of the previous task which generates a new resource that the current task relies on. A special tag \"-task_id\" refers to the generated text image, audio and video in the dependency task with id as task_id. The task MUST be selected from the following options: {{ Available Task List }}. There is a logical relationship between tasks, please note their order. If the user input can\\'t be parsed, you need to reply empty JSON. Here are several cases for your reference: {{ Demonstrations }}. The chat history is recorded as {{ Chat History }}. From this chat history, you can find the path of the user-mentioned resources for your task planning.'),\n",
" Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 11. Illustration of how HuggingGPT works. (Image source: Shen et al. 2023)\\nThe system comprises of 4 stages:\\n(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.\\nInstruction:')],\n",
" 'answer': 'Task decomposition is a technique used in artificial intelligence to break down complex tasks into smaller and more manageable subtasks. This approach helps agents or models to tackle difficult problems by dividing them into simpler steps, improving performance and interpretability. Different methods like Chain of Thought and Tree of Thoughts have been developed to enhance task decomposition in AI systems.'}"
]
},
"execution_count": 6,
@@ -279,15 +281,25 @@
" return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"# This Runnable takes a dict with keys 'input' and 'context',\n",
"# formats them into a prompt, and generates a response.\n",
"rag_chain_from_docs = (\n",
" RunnablePassthrough.assign(context=(lambda x: format_docs(x[\"context\"])))\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
" {\n",
" \"input\": lambda x: x[\"input\"], # input query\n",
" \"context\": lambda x: format_docs(x[\"context\"]), # context\n",
" }\n",
" | prompt # format query and context into prompt\n",
" | llm # generate response\n",
" | StrOutputParser() # coerce to string\n",
")\n",
"\n",
"# Pass input query to retriever\n",
"retrieve_docs = (lambda x: x[\"input\"]) | retriever\n",
"\n",
"# Below, we chain `.assign` calls. This takes a dict and successively\n",
"# adds keys-- \"context\" and \"answer\"-- where the value for each key\n",
"# is determined by a Runnable. The Runnable operates on all existing\n",
"# keys in the dict.\n",
"chain = RunnablePassthrough.assign(context=retrieve_docs).assign(\n",
" answer=rag_chain_from_docs\n",
")\n",
@@ -302,7 +314,105 @@
"source": [
":::{.callout-tip}\n",
"\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/0cb42685-e29e-4280-a503-bef2014d7ba2/r)\n",
"Check out the [LangSmith trace](https://smith.langchain.com/public/1c055a3b-0236-4670-a3fb-023d418ba796/r)\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "c1c17797-d965-4fd2-b8d4-d386f25dd352",
"metadata": {},
"source": [
"## Structure sources in model response\n",
"\n",
"Up to this point, we've simply propagated the documents returned from the retrieval step through to the final response. But this may not illustrate what subset of information the model relied on when generating its answer. Below, we show how to structure sources into the model response, allowing the model to report what specific context it relied on for its answer.\n",
"\n",
"Because the above LCEL implementation is composed of [Runnable](/docs/concepts/#runnable-interface) primitives, it is straightforward to extend. Below, we make a simple change:\n",
"\n",
"- We use the model's tool-calling features to generate [structured output](/docs/how_to/structured_output/), consisting of an answer and list of sources. The schema for the response is represented in the `AnswerWithSources` TypedDict, below.\n",
"- We remove the `StrOutputParser()`, as we expect `dict` output in this scenario."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "8f916b14-1b0a-4975-a62f-52f1353bde15",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"\n",
"# Desired schema for response\n",
"class AnswerWithSources(TypedDict):\n",
" \"\"\"An answer to the question, with sources.\"\"\"\n",
"\n",
" answer: str\n",
" sources: Annotated[\n",
" List[str],\n",
" ...,\n",
" \"List of sources (author + year) used to answer the question\",\n",
" ]\n",
"\n",
"\n",
"# Our rag_chain_from_docs has the following changes:\n",
"# - add `.with_structured_output` to the LLM;\n",
"# - remove the output parser\n",
"rag_chain_from_docs = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"context\": lambda x: format_docs(x[\"context\"]),\n",
" }\n",
" | prompt\n",
" | llm.with_structured_output(AnswerWithSources)\n",
")\n",
"\n",
"retrieve_docs = (lambda x: x[\"input\"]) | retriever\n",
"\n",
"chain = RunnablePassthrough.assign(context=retrieve_docs).assign(\n",
" answer=rag_chain_from_docs\n",
")\n",
"\n",
"response = chain.invoke({\"input\": \"What is Chain of Thought?\"})"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "7a8fc0c5-afb3-4012-a467-3951996a6850",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"answer\": \"Chain of Thought (CoT) is a prompting technique that enhances model performance on complex tasks by instructing the model to \\\"think step by step\\\" to decompose hard tasks into smaller and simpler steps. It transforms big tasks into multiple manageable tasks and sheds light on the interpretation of the model's thinking process.\",\n",
" \"sources\": [\n",
" \"Wei et al. 2022\"\n",
" ]\n",
"}\n"
]
}
],
"source": [
"import json\n",
"\n",
"print(json.dumps(response[\"answer\"], indent=2))"
]
},
{
"cell_type": "markdown",
"id": "7440f785-29c5-4c6b-9656-0d9d5efbac05",
"metadata": {},
"source": [
":::{.callout-tip}\n",
"\n",
"View [LangSmith trace](https://smith.langchain.com/public/0eeddf06-3a7b-4f27-974c-310ca8160f60/r)\n",
"\n",
":::"
]

View File

@@ -38,8 +38,8 @@
" Operator,\n",
" StructuredQuery,\n",
")\n",
"from langchain.retrievers.self_query.chroma import ChromaTranslator\n",
"from langchain.retrievers.self_query.elasticsearch import ElasticsearchTranslator\n",
"from langchain_community.query_constructors.chroma import ChromaTranslator\n",
"from langchain_community.query_constructors.elasticsearch import ElasticsearchTranslator\n",
"from langchain_core.pydantic_v1 import BaseModel"
]
},

View File

@@ -512,7 +512,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers.self_query.chroma import ChromaTranslator\n",
"from langchain_community.query_constructors.chroma import ChromaTranslator\n",
"\n",
"retriever = SelfQueryRetriever(\n",
" query_constructor=query_constructor,\n",

View File

@@ -299,16 +299,16 @@
},
{
"cell_type": "markdown",
"id": "423c6e099e94ca69",
"metadata": {
"collapsed": false
},
"source": [
"### Gradient\n",
"\n",
"In this method, the gradient of distance is used to split chunks along with the percentile method.\n",
"This method is useful when chunks are highly correlated with each other or specific to a domain e.g. legal or medical. The idea is to apply anomaly detection on gradient array so that the distribution become wider and easy to identify boundaries in highly semantic data."
],
"metadata": {
"collapsed": false
},
"id": "423c6e099e94ca69"
]
},
{
"cell_type": "code",
@@ -325,6 +325,8 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "e9f393d316ce1f6c",
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -337,13 +339,13 @@
"source": [
"docs = text_splitter.create_documents([state_of_the_union])\n",
"print(docs[0].page_content)"
],
"metadata": {},
"id": "e9f393d316ce1f6c"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a407cd57f02a0db4",
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -355,9 +357,7 @@
],
"source": [
"print(len(docs))"
],
"metadata": {},
"id": "a407cd57f02a0db4"
]
}
],
"metadata": {

View File

@@ -761,7 +761,7 @@
"* [SQL tutorial](/docs/tutorials/sql_qa): Many of the challenges of working with SQL db's and CSV's are generic to any structured data type, so it's useful to read the SQL techniques even if you're using Pandas for CSV data analysis.\n",
"* [Tool use](/docs/how_to/tool_calling): Guides on general best practices when working with chains and agents that invoke tools\n",
"* [Agents](/docs/tutorials/agents): Understand the fundamentals of building LLM agents.\n",
"* Integrations: Sandboxed envs like [E2B](/docs/integrations/tools/e2b_data_analysis) and [Bearly](/docs/integrations/tools/bearly), utilities like [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), related agents like [Spark DataFrame agent](/docs/integrations/toolkits/spark)."
"* Integrations: Sandboxed envs like [E2B](/docs/integrations/tools/e2b_data_analysis) and [Bearly](/docs/integrations/tools/bearly), utilities like [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), related agents like [Spark DataFrame agent](/docs/integrations/tools/spark_sql)."
]
}
],

View File

@@ -43,7 +43,7 @@
"\n",
"This is the easiest and most reliable way to get structured outputs. `with_structured_output()` is implemented for models that provide native APIs for structuring outputs, like tool/function calling or JSON mode, and makes use of these capabilities under the hood.\n",
"\n",
"This method takes a schema as input which specifies the names, types, and descriptions of the desired output attributes. The method returns a model-like Runnable, except that instead of outputting strings or Messages it outputs objects corresponding to the given schema. The schema can be specified as a [JSON Schema](https://json-schema.org/) or a Pydantic class. If JSON Schema is used then a dictionary will be returned by the Runnable, and if a Pydantic class is used then Pydantic objects will be returned.\n",
"This method takes a schema as input which specifies the names, types, and descriptions of the desired output attributes. The method returns a model-like Runnable, except that instead of outputting strings or Messages it outputs objects corresponding to the given schema. The schema can be specified as a TypedDict class, [JSON Schema](https://json-schema.org/) or a Pydantic class. If TypedDict or JSON Schema are used then a dictionary will be returned by the Runnable, and if a Pydantic class is used then a Pydantic object will be returned.\n",
"\n",
"As an example, let's get a model to generate a joke and separate the setup from the punchline:\n",
"\n",
@@ -58,7 +58,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "6d55008f",
"metadata": {},
"outputs": [],
@@ -68,7 +68,7 @@
"\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4-0125-preview\", temperature=0)"
"llm = ChatOpenAI(model=\"gpt-4o\", temperature=0)"
]
},
{
@@ -76,22 +76,24 @@
"id": "a808a401-be1f-49f9-ad13-58dd68f7db5f",
"metadata": {},
"source": [
"If we want the model to return a Pydantic object, we just need to pass in the desired Pydantic class:"
"### Pydantic class\n",
"\n",
"If we want the model to return a Pydantic object, we just need to pass in the desired Pydantic class. The key advantage of using Pydantic is that the model-generated output will be validated. Pydantic will raise an error if any required fields are missing or if any fields are of the wrong type."
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "070bf702",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=8)"
"Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=7)"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -102,12 +104,15 @@
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"\n",
"# Pydantic\n",
"class Joke(BaseModel):\n",
" \"\"\"Joke to tell user.\"\"\"\n",
"\n",
" setup: str = Field(description=\"The setup of the joke\")\n",
" punchline: str = Field(description=\"The punchline to the joke\")\n",
" rating: Optional[int] = Field(description=\"How funny the joke is, from 1 to 10\")\n",
" rating: Optional[int] = Field(\n",
" default=None, description=\"How funny the joke is, from 1 to 10\"\n",
" )\n",
"\n",
"\n",
"structured_llm = llm.with_structured_output(Joke)\n",
@@ -130,12 +135,73 @@
"id": "deddb6d3",
"metadata": {},
"source": [
"We can also pass in a [JSON Schema](https://json-schema.org/) dict if you prefer not to use Pydantic. In this case, the response is also a dict:"
"### TypedDict or JSON Schema\n",
"\n",
"If you don't want to use Pydantic, explicitly don't want validation of the arguments, or want to be able to stream the model outputs, you can define your schema using a TypedDict class. We can optionally use a special `Annotated` syntax supported by LangChain that allows you to specify the default value and description of a field. Note, the default value is *not* filled in automatically if the model doesn't generate it, it is only used in defining the schema that is passed to the model.\n",
"\n",
":::info Requirements\n",
"\n",
"- Core: `langchain-core>=0.2.26`\n",
"- Typing extensions: It is highly recommended to import `Annotated` and `TypedDict` from `typing_extensions` instead of `typing` to ensure consistent behavior across Python versions.\n",
"\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "70d82891-42e8-424a-919e-07d83bcfec61",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'setup': 'Why was the cat sitting on the computer?',\n",
" 'punchline': 'Because it wanted to keep an eye on the mouse!',\n",
" 'rating': 7}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"\n",
"# TypedDict\n",
"class Joke(TypedDict):\n",
" \"\"\"Joke to tell user.\"\"\"\n",
"\n",
" setup: Annotated[str, ..., \"The setup of the joke\"]\n",
"\n",
" # Alternatively, we could have specified setup as:\n",
"\n",
" # setup: str # no default, no description\n",
" # setup: Annotated[str, ...] # no default, no description\n",
" # setup: Annotated[str, \"foo\"] # default, no description\n",
"\n",
" punchline: Annotated[str, ..., \"The punchline of the joke\"]\n",
" rating: Annotated[Optional[int], None, \"How funny the joke is, from 1 to 10\"]\n",
"\n",
"\n",
"structured_llm = llm.with_structured_output(Joke)\n",
"\n",
"structured_llm.invoke(\"Tell me a joke about cats\")"
]
},
{
"cell_type": "markdown",
"id": "e4d7b4dc-f617-4ea8-aa58-847c228791b4",
"metadata": {},
"source": [
"Equivalently, we can pass in a [JSON Schema](https://json-schema.org/) dict. This requires no imports or classes and makes it very clear exactly how each parameter is documented, at the cost of being a bit more verbose."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6700994a",
"metadata": {},
"outputs": [
@@ -144,10 +210,10 @@
"text/plain": [
"{'setup': 'Why was the cat sitting on the computer?',\n",
" 'punchline': 'Because it wanted to keep an eye on the mouse!',\n",
" 'rating': 8}"
" 'rating': 7}"
]
},
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -169,6 +235,7 @@
" \"rating\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"How funny the joke is, from 1 to 10\",\n",
" \"default\": None,\n",
" },\n",
" },\n",
" \"required\": [\"setup\", \"punchline\"],\n",
@@ -185,7 +252,7 @@
"source": [
"### Choosing between multiple schemas\n",
"\n",
"The simplest way to let the model choose from multiple schemas is to create a parent Pydantic class that has a Union-typed attribute:"
"The simplest way to let the model choose from multiple schemas is to create a parent schema that has a Union-typed attribute:"
]
},
{
@@ -209,6 +276,17 @@
"from typing import Union\n",
"\n",
"\n",
"# Pydantic\n",
"class Joke(BaseModel):\n",
" \"\"\"Joke to tell user.\"\"\"\n",
"\n",
" setup: str = Field(description=\"The setup of the joke\")\n",
" punchline: str = Field(description=\"The punchline to the joke\")\n",
" rating: Optional[int] = Field(\n",
" default=None, description=\"How funny the joke is, from 1 to 10\"\n",
" )\n",
"\n",
"\n",
"class ConversationalResponse(BaseModel):\n",
" \"\"\"Respond in a conversational manner. Be kind and helpful.\"\"\"\n",
"\n",
@@ -260,7 +338,7 @@
"source": [
"### Streaming\n",
"\n",
"We can stream outputs from our structured model when the output type is a dict (i.e., when the schema is specified as a JSON Schema dict). \n",
"We can stream outputs from our structured model when the output type is a dict (i.e., when the schema is specified as a TypedDict class or JSON Schema dict). \n",
"\n",
":::info\n",
"\n",
@@ -271,7 +349,7 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": 9,
"id": "aff89877-28a3-472f-a1aa-eff893fe7736",
"metadata": {},
"outputs": [
@@ -302,12 +380,24 @@
"{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the'}\n",
"{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse'}\n",
"{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!'}\n",
"{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 8}\n"
"{'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 7}\n"
]
}
],
"source": [
"structured_llm = llm.with_structured_output(json_schema)\n",
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"\n",
"# TypedDict\n",
"class Joke(TypedDict):\n",
" \"\"\"Joke to tell user.\"\"\"\n",
"\n",
" setup: Annotated[str, ..., \"The setup of the joke\"]\n",
" punchline: Annotated[str, ..., \"The punchline of the joke\"]\n",
" rating: Annotated[Optional[int], None, \"How funny the joke is, from 1 to 10\"]\n",
"\n",
"\n",
"structured_llm = llm.with_structured_output(Joke)\n",
"\n",
"for chunk in structured_llm.stream(\"Tell me a joke about cats\"):\n",
" print(chunk)"
@@ -327,7 +417,7 @@
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": 11,
"id": "283ba784-2072-47ee-9b2c-1119e3c69e8e",
"metadata": {},
"outputs": [
@@ -335,11 +425,11 @@
"data": {
"text/plain": [
"{'setup': 'Woodpecker',\n",
" 'punchline': \"Woodpecker goes 'knock knock', but don't worry, they never expect you to answer the door!\",\n",
" 'rating': 8}"
" 'punchline': \"Woodpecker who? Woodpecker who can't find a tree is just a bird with a headache!\",\n",
" 'rating': 7}"
]
},
"execution_count": 47,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -377,7 +467,7 @@
},
{
"cell_type": "code",
"execution_count": 46,
"execution_count": 12,
"id": "d7381cb0-b2c3-4302-a319-ed72d0b9e43f",
"metadata": {},
"outputs": [
@@ -385,11 +475,11 @@
"data": {
"text/plain": [
"{'setup': 'Crocodile',\n",
" 'punchline': \"Crocodile 'see you later', but in a while, it becomes an alligator!\",\n",
" 'punchline': 'Crocodile be seeing you later, alligator!',\n",
" 'rating': 7}"
]
},
"execution_count": 46,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -491,23 +581,24 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 15,
"id": "df0370e3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=None)"
"{'setup': 'Why was the cat sitting on the computer?',\n",
" 'punchline': 'Because it wanted to keep an eye on the mouse!'}"
]
},
"execution_count": 6,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"structured_llm = llm.with_structured_output(Joke, method=\"json_mode\")\n",
"structured_llm = llm.with_structured_output(None, method=\"json_mode\")\n",
"\n",
"structured_llm.invoke(\n",
" \"Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys\"\n",
@@ -526,19 +617,21 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 17,
"id": "10ed2842",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'raw': AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_ASK4EmZeZ69Fi3p554Mb4rWy', 'function': {'arguments': '{\"setup\":\"Why was the cat sitting on the computer?\",\"punchline\":\"Because it wanted to keep an eye on the mouse!\"}', 'name': 'Joke'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 36, 'prompt_tokens': 107, 'total_tokens': 143}, 'model_name': 'gpt-4-0125-preview', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-6491d35b-9164-4656-b75c-d7882cfb76cb-0', tool_calls=[{'name': 'Joke', 'args': {'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!'}, 'id': 'call_ASK4EmZeZ69Fi3p554Mb4rWy'}], usage_metadata={'input_tokens': 107, 'output_tokens': 36, 'total_tokens': 143}),\n",
" 'parsed': Joke(setup='Why was the cat sitting on the computer?', punchline='Because it wanted to keep an eye on the mouse!', rating=None),\n",
"{'raw': AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_f25ZRmh8u5vHlOWfTUw8sJFZ', 'function': {'arguments': '{\"setup\":\"Why was the cat sitting on the computer?\",\"punchline\":\"Because it wanted to keep an eye on the mouse!\",\"rating\":7}', 'name': 'Joke'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 33, 'prompt_tokens': 93, 'total_tokens': 126}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_4e2b2da518', 'finish_reason': 'stop', 'logprobs': None}, id='run-d880d7e2-df08-4e9e-ad92-dfc29f2fd52f-0', tool_calls=[{'name': 'Joke', 'args': {'setup': 'Why was the cat sitting on the computer?', 'punchline': 'Because it wanted to keep an eye on the mouse!', 'rating': 7}, 'id': 'call_f25ZRmh8u5vHlOWfTUw8sJFZ', 'type': 'tool_call'}], usage_metadata={'input_tokens': 93, 'output_tokens': 33, 'total_tokens': 126}),\n",
" 'parsed': {'setup': 'Why was the cat sitting on the computer?',\n",
" 'punchline': 'Because it wanted to keep an eye on the mouse!',\n",
" 'rating': 7},\n",
" 'parsing_error': None}"
]
},
"execution_count": 5,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@@ -546,9 +639,7 @@
"source": [
"structured_llm = llm.with_structured_output(Joke, include_raw=True)\n",
"\n",
"structured_llm.invoke(\n",
" \"Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys\"\n",
")"
"structured_llm.invoke(\"Tell me a joke about cats\")"
]
},
{
@@ -824,7 +915,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -838,7 +929,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -24,10 +24,9 @@
"This guide assumes familiarity with the following concepts:\n",
"\n",
"- [Chat models](/docs/concepts/#chat-models)\n",
"- [LangChain Tools](/docs/concepts/#tools)\n",
"- [Tool calling](/docs/concepts/#functiontool-calling)\n",
"- [Tools](/docs/concepts/#tools)\n",
"- [Output parsers](/docs/concepts/#output-parsers)\n",
"\n",
":::\n",
"\n",
"[Tool calling](/docs/concepts/#functiontool-calling) allows a chat model to respond to a given prompt by \"calling a tool\".\n",
@@ -38,15 +37,11 @@
"\n",
"![Diagram of calling a tool](/img/tool_call.png)\n",
"\n",
"If you want to see how to use the model-generated tool call to actually run a tool function [check out this guide](/docs/how_to/tool_results_pass_to_model/).\n",
"If you want to see how to use the model-generated tool call to actually run a tool [check out this guide](/docs/how_to/tool_results_pass_to_model/).\n",
"\n",
":::note Supported models\n",
"\n",
"Tool calling is not universal, but is supported by many popular LLM providers, including [Anthropic](/docs/integrations/chat/anthropic/), \n",
"[Cohere](/docs/integrations/chat/cohere/), [Google](/docs/integrations/chat/google_vertex_ai_palm/), \n",
"[Mistral](/docs/integrations/chat/mistralai/), [OpenAI](/docs/integrations/chat/openai/), and even for locally-running models via [Ollama](/docs/integrations/chat/ollama/).\n",
"\n",
"You can find a [list of all models that support tool calling here](/docs/integrations/chat/).\n",
"Tool calling is not universal, but is supported by many popular LLM providers. You can find a [list of all models that support tool calling here](/docs/integrations/chat/).\n",
"\n",
":::\n",
"\n",
@@ -58,14 +53,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Passing tools to chat models\n",
"## Defining tool schemas\n",
"\n",
"Chat models that support tool calling features implement a `.bind_tools` method, which \n",
"receives a list of functions, Pydantic models, or LangChain [tool objects](https://api.python.langchain.com/en/latest/tools/langchain_core.tools.BaseTool.html#langchain_core.tools.BaseTool) \n",
"and binds them to the chat model in its expected format. Subsequent invocations of the \n",
"chat model will include tool schemas in its calls to the LLM.\n",
"For a model to be able to call tools, we need to pass in tool schemas that describe what the tool does and what it's arguments are. Chat models that support tool calling features implement a `.bind_tools()` method for passing tool schemas to the model. Tool schemas can be passed in as Python functions (with typehints and docstrings), Pydantic models, TypedDict classes, or LangChain [Tool objects](https://api.python.langchain.com/en/latest/tools/langchain_core.tools.BaseTool.html#langchain_core.tools.BaseTool). Subsequent invocations of the model will pass in these tool schemas along with the prompt.\n",
"\n",
"For example, below we implement simple tools for arithmetic:"
"### Python functions\n",
"Our tool schemas can be Python functions:"
]
},
{
@@ -74,26 +67,41 @@
"metadata": {},
"outputs": [],
"source": [
"# The function name, type hints, and docstring are all part of the tool\n",
"# schema that's passed to the model. Defining good, descriptive schemas\n",
"# is an extension of prompt engineering and is an important part of\n",
"# getting models to perform well.\n",
"def add(a: int, b: int) -> int:\n",
" \"\"\"Adds a and b.\"\"\"\n",
" \"\"\"Add two integers.\n",
"\n",
" Args:\n",
" a: First integer\n",
" b: Second integer\n",
" \"\"\"\n",
" return a + b\n",
"\n",
"\n",
"def multiply(a: int, b: int) -> int:\n",
" \"\"\"Multiplies a and b.\"\"\"\n",
" return a * b\n",
" \"\"\"Multiply two integers.\n",
"\n",
"\n",
"tools = [add, multiply]"
" Args:\n",
" a: First integer\n",
" b: Second integer\n",
" \"\"\"\n",
" return a * b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### LangChain Tool\n",
"\n",
"LangChain also implements a `@tool` decorator that allows for further control of the tool schema, such as tool names and argument descriptions. See the how-to guide [here](/docs/how_to/custom_tools/#creating-tools-from-functions) for details.\n",
"\n",
"We can also define the schemas without the accompanying functions using [Pydantic](https://docs.pydantic.dev):"
"### Pydantic class\n",
"\n",
"You can equivalently define the schemas without the accompanying functions using [Pydantic](https://docs.pydantic.dev):"
]
},
{
@@ -105,23 +113,57 @@
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"\n",
"# Note that the docstrings here are crucial, as they will be passed along\n",
"# to the model along with the class name.\n",
"class Add(BaseModel):\n",
" \"\"\"Add two integers together.\"\"\"\n",
"class add(BaseModel):\n",
" \"\"\"Add two integers.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")\n",
"\n",
"\n",
"class Multiply(BaseModel):\n",
" \"\"\"Multiply two integers together.\"\"\"\n",
"class multiply(BaseModel):\n",
" \"\"\"Multiply two integers.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")\n",
" b: int = Field(..., description=\"Second integer\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TypedDict class\n",
"\n",
":::info Requires `langchain-core>=0.2.25`\n",
":::\n",
"\n",
"Or using TypedDicts and annotations:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from typing_extensions import Annotated, TypedDict\n",
"\n",
"\n",
"tools = [Add, Multiply]"
"class add(TypedDict):\n",
" \"\"\"Add two integers.\"\"\"\n",
"\n",
" # Annotations must have the type and can optionally include a default value and description (in that order).\n",
" a: Annotated[int, ..., \"First integer\"]\n",
" b: Annotated[int, ..., \"Second integer\"]\n",
"\n",
"\n",
"class multiply(BaseModel):\n",
" \"\"\"Multiply two integers.\"\"\"\n",
"\n",
" a: Annotated[int, ..., \"First integer\"]\n",
" b: Annotated[int, ..., \"Second integer\"]\n",
"\n",
"\n",
"tools = [add, multiply]"
]
},
{
@@ -129,7 +171,7 @@
"metadata": {},
"source": [
"To actually bind those schemas to a chat model, we'll use the `.bind_tools()` method. This handles converting\n",
"the `Add` and `Multiply` schemas to the proper format for the model. The tool schema will then be passed it in each time the model is invoked.\n",
"the `add` and `multiply` schemas to the proper format for the model. The tool schema will then be passed it in each time the model is invoked.\n",
"\n",
"```{=mdx}\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
@@ -164,16 +206,16 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_wLTBasMppAwpdiA5CD92l9x7', 'function': {'arguments': '{\"a\":3,\"b\":12}', 'name': 'Multiply'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 89, 'total_tokens': 107}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0f03d4f0ee', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d3f36cca-f225-416f-ac16-0217046f0b38-0', tool_calls=[{'name': 'Multiply', 'args': {'a': 3, 'b': 12}, 'id': 'call_wLTBasMppAwpdiA5CD92l9x7', 'type': 'tool_call'}], usage_metadata={'input_tokens': 89, 'output_tokens': 18, 'total_tokens': 107})"
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_BwYJ4UgU5pRVCBOUmiu7NhF9', 'function': {'arguments': '{\"a\":3,\"b\":12}', 'name': 'multiply'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 80, 'total_tokens': 97}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_ba606877f9', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-7f05e19e-4561-40e2-a2d0-8f4e28e9a00f-0', tool_calls=[{'name': 'multiply', 'args': {'a': 3, 'b': 12}, 'id': 'call_BwYJ4UgU5pRVCBOUmiu7NhF9', 'type': 'tool_call'}], usage_metadata={'input_tokens': 80, 'output_tokens': 17, 'total_tokens': 97})"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -214,23 +256,23 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Multiply',\n",
"[{'name': 'multiply',\n",
" 'args': {'a': 3, 'b': 12},\n",
" 'id': 'call_uqJsNrDJ8ZZnFa1BHHYAllEv',\n",
" 'id': 'call_rcdMie7E89Xx06lEKKxJyB5N',\n",
" 'type': 'tool_call'},\n",
" {'name': 'Add',\n",
" {'name': 'add',\n",
" 'args': {'a': 11, 'b': 49},\n",
" 'id': 'call_ud1uHAaYsdpWuxugwoJ63BDs',\n",
" 'id': 'call_nheGN8yfvSJsnIuGZaXihou3',\n",
" 'type': 'tool_call'}]"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -252,31 +294,49 @@
"are populated in the `.invalid_tool_calls` attribute. An `InvalidToolCall` can have \n",
"a name, string arguments, identifier, and error message.\n",
"\n",
"If desired, [output parsers](/docs/how_to#output-parsers) can further \n",
"process the output. For example, we can convert existing values populated on the `.tool_calls` attribute back to the original Pydantic class using the\n",
"\n",
"## Parsing\n",
"\n",
"If desired, [output parsers](/docs/how_to#output-parsers) can further process the output. For example, we can convert existing values populated on the `.tool_calls` to Pydantic objects using the\n",
"[PydanticToolsParser](https://api.python.langchain.com/en/latest/output_parsers/langchain_core.output_parsers.openai_tools.PydanticToolsParser.html):"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Multiply(a=3, b=12), Add(a=11, b=49)]"
"[multiply(a=3, b=12), add(a=11, b=49)]"
]
},
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.output_parsers import PydanticToolsParser\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"chain = llm_with_tools | PydanticToolsParser(tools=[Multiply, Add])\n",
"\n",
"class add(BaseModel):\n",
" \"\"\"Add two integers.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")\n",
"\n",
"\n",
"class multiply(BaseModel):\n",
" \"\"\"Multiply two integers.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")\n",
"\n",
"\n",
"chain = llm_with_tools | PydanticToolsParser(tools=[add, multiply])\n",
"chain.invoke(query)"
]
},
@@ -294,18 +354,18 @@
"\n",
"You can also check out some more specific uses of tool calling:\n",
"\n",
"- Getting [structured outputs](/docs/how_to/structured_output/) from models\n",
"- Few shot prompting [with tools](/docs/how_to/tools_few_shot/)\n",
"- Stream [tool calls](/docs/how_to/tool_streaming/)\n",
"- Pass [runtime values to tools](/docs/how_to/tool_runtime)\n",
"- Getting [structured outputs](/docs/how_to/structured_output/) from models"
"- Pass [runtime values to tools](/docs/how_to/tool_runtime)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "poetry-venv-311",
"language": "python",
"name": "python3"
"name": "poetry-venv-311"
},
"language_info": {
"codemirror_mode": {
@@ -317,7 +377,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -6,26 +6,20 @@
"source": [
"# How to pass run time values to tools\n",
"\n",
":::info Prerequisites\n",
"import Prerequisites from \"@theme/Prerequisites\";\n",
"import Compatibility from \"@theme/Compatibility\";\n",
"\n",
"This guide assumes familiarity with the following concepts:\n",
"- [Chat models](/docs/concepts/#chat-models)\n",
"- [LangChain Tools](/docs/concepts/#tools)\n",
"- [How to create tools](/docs/how_to/custom_tools)\n",
"- [How to use a model to call tools](/docs/how_to/tool_calling)\n",
":::\n",
"<Prerequisites titlesAndLinks={[\n",
" [\"Chat models\", \"/docs/concepts/#chat-models\"],\n",
" [\"LangChain Tools\", \"/docs/concepts/#tools\"],\n",
" [\"How to create tools\", \"/docs/how_to/custom_tools\"],\n",
" [\"How to use a model to call tools\", \"/docs/how_to/tool_calling\"],\n",
"]} />\n",
"\n",
":::info Using with LangGraph\n",
"\n",
"If you're using LangGraph, please refer to [this how-to guide](https://langchain-ai.github.io/langgraph/how-tos/pass-run-time-values-to-tools/)\n",
"which shows how to create an agent that keeps track of a given user's favorite pets.\n",
":::\n",
"\n",
":::caution Added in `langchain-core==0.2.21`\n",
"\n",
"Must have `langchain-core>=0.2.21` to use this functionality.\n",
"\n",
":::\n",
"<Compatibility packagesAndVersions={[\n",
" [\"langchain-core\", \"0.2.21\"],\n",
"]} />\n",
"\n",
"You may need to bind values to a tool that are only known at runtime. For example, the tool logic may require using the ID of the user who made the request.\n",
"\n",
@@ -33,7 +27,13 @@
"\n",
"Instead, the LLM should only control the parameters of the tool that are meant to be controlled by the LLM, while other parameters (such as user ID) should be fixed by the application logic.\n",
"\n",
"This how-to guide shows you how to prevent the model from generating certain tool arguments and injecting them in directly at runtime."
"This how-to guide shows you how to prevent the model from generating certain tool arguments and injecting them in directly at runtime.\n",
"\n",
":::info Using with LangGraph\n",
"\n",
"If you're using LangGraph, please refer to [this how-to guide](https://langchain-ai.github.io/langgraph/how-tos/pass-run-time-values-to-tools/)\n",
"which shows how to create an agent that keeps track of a given user's favorite pets.\n",
":::"
]
},
{
@@ -597,9 +597,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv-311",
"display_name": "Python 3",
"language": "python",
"name": "poetry-venv-311"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -611,7 +611,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.10.5"
}
},
"nbformat": 4,

View File

@@ -5,7 +5,6 @@ sidebar_position: 3
Toolkits are collections of tools that are designed to be used together for specific tasks. They have convenient loading methods.
For a complete list of available ready-made toolkits, visit [Integrations](/docs/integrations/toolkits/).
All Toolkits expose a `get_tools` method which returns a list of tools.
You can therefore do:

View File

@@ -196,8 +196,6 @@
"\n",
"Toolkits are collections of tools that are designed to be used together for specific tasks. They have convenient loading methods.\n",
"\n",
"For a complete list of available ready-made toolkits, visit [Integrations](/docs/integrations/toolkits/).\n",
"\n",
"All Toolkits expose a `get_tools` method which returns a list of tools.\n",
"\n",
"You're usually meant to use them this way:\n",

View File

@@ -17,26 +17,25 @@
"source": [
"# ChatAI21\n",
"\n",
"## Overview\n",
"\n",
"This notebook covers how to get started with AI21 chat models.\n",
"Note that different chat models support different parameters. See the ",
"[AI21 documentation](https://docs.ai21.com/reference) to learn more about the parameters in your chosen model.\n",
"Note that different chat models support different parameters. See the [AI21 documentation](https://docs.ai21.com/reference) to learn more about the parameters in your chosen model.\n",
"[See all AI21's LangChain components.](https://pypi.org/project/langchain-ai21/) \n",
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c3bef91",
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-15T06:50:44.929635Z",
"start_time": "2024-02-15T06:50:41.209704Z"
}
},
"outputs": [],
"source": [
"!pip install -qU langchain-ai21"
"\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/__package_name_short_snake__) | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatAI21](https://api.python.langchain.com/en/latest/chat_models/langchain_ai21.chat_models.ChatAI21.html#langchain_ai21.chat_models.ChatAI21) | [langchain-ai21](https://api.python.langchain.com/en/latest/ai21_api_reference.html) | ❌ | beta | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-ai21?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-ai21?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | \n",
"\n",
"\n",
"## Setup"
]
},
{
@@ -44,10 +43,9 @@
"id": "2b4f3e15",
"metadata": {},
"source": [
"## Environment Setup\n",
"### Credentials\n",
"\n",
"We'll need to get an [AI21 API key](https://docs.ai21.com/) and set the ",
"`AI21_API_KEY` environment variable:\n"
"We'll need to get an [AI21 API key](https://docs.ai21.com/) and set the `AI21_API_KEY` environment variable:\n"
]
},
{
@@ -67,48 +65,166 @@
},
{
"cell_type": "markdown",
"id": "4828829d3da430ce",
"metadata": {
"collapsed": false
},
"id": "f6844fff-3702-4489-ab74-732f69f3b9d7",
"metadata": {},
"source": [
"## Usage"
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "39353473fce5dd2e",
"execution_count": null,
"id": "7c2e19d3-7c58-4470-9e1a-718b27a32056",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "98e22f31-8acc-42d6-916d-415d1263c56e",
"metadata": {},
"source": [
"### Installation"
]
},
{
"cell_type": "markdown",
"id": "f9699cd9-58f2-450e-aa64-799e66906c0f",
"metadata": {},
"source": [
"!pip install -qU langchain-ai21"
]
},
{
"cell_type": "markdown",
"id": "4828829d3da430ce",
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c40756fb-cbf8-4d44-a293-3989d707237e",
"metadata": {},
"outputs": [],
"source": [
"from langchain_ai21 import ChatAI21\n",
"\n",
"llm = ChatAI21(model=\"jamba-instruct\", temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "2bdc5d68-2a19-495e-8c04-d11adc86d3ae",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "46b982dc-5d8a-46da-a711-81c03ccd6adc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Bonjour, comment vas-tu?')"
"AIMessage(content=\"J'adore programmer.\", id='run-2e8d16d6-a06e-45cb-8d0c-1c8208645033-0')"
]
},
"execution_count": 1,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates English to French. Translate the user sentence.\",\n",
" ),\n",
" (\"human\", \"I love programming.\"),\n",
"]\n",
"ai_msg = llm.invoke(messages)\n",
"ai_msg"
]
},
{
"cell_type": "markdown",
"id": "10a30f84-b531-4fd5-8b5b-91512fbdc75b",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "39353473fce5dd2e",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe das Programmieren.', id='run-e1bd82dc-1a7e-4b2e-bde9-ac995929ac0f-0')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_ai21 import ChatAI21\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"chat = ChatAI21(model=\"jamba-instruct\")\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
"prompt = ChatPromptTemplate(\n",
" [\n",
" (\"system\", \"You are a helpful assistant that translates English to French.\"),\n",
" (\"human\", \"Translate this sentence from English to French. {english_text}.\"),\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | chat\n",
"chain.invoke({\"english_text\": \"Hello, how are you?\"})"
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e79de691-9dd6-4697-b57e-59a4a3cc073a",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatAI21 features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_ai21.chat_models.ChatAI21.html"
]
}
],
@@ -128,7 +244,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -115,7 +115,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
"metadata": {},
"outputs": [],
@@ -123,8 +123,8 @@
"from langchain_openai import AzureChatOpenAI\n",
"\n",
"llm = AzureChatOpenAI(\n",
" azure_deployment=\"YOUR-DEPLOYMENT\",\n",
" api_version=\"2024-05-01-preview\",\n",
" azure_deployment=\"gpt-35-turbo\", # or your deployment\n",
" api_version=\"2023-06-01-preview\", # or your api version\n",
" temperature=0,\n",
" max_tokens=None,\n",
" timeout=None,\n",
@@ -143,7 +143,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"id": "62e0dbc3",
"metadata": {
"tags": []
@@ -152,10 +152,10 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\", response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 31, 'total_tokens': 39}, 'model_name': 'gpt-35-turbo', 'system_fingerprint': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-a6a732c2-cb02-4e50-9a9c-ab30eab034fc-0', usage_metadata={'input_tokens': 31, 'output_tokens': 8, 'total_tokens': 39})"
"AIMessage(content=\"J'adore la programmation.\", response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 31, 'total_tokens': 39}, 'model_name': 'gpt-35-turbo', 'system_fingerprint': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-bea4b46c-e3e1-4495-9d3a-698370ad963d-0', usage_metadata={'input_tokens': 31, 'output_tokens': 8, 'total_tokens': 39})"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -174,7 +174,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 4,
"id": "d86145b3-bfef-46e8-b227-4dda5c9c2705",
"metadata": {},
"outputs": [
@@ -202,17 +202,17 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 5,
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe das Programmieren.', response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 26, 'total_tokens': 32}, 'model_name': 'gpt-35-turbo', 'system_fingerprint': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-084967d7-06f2-441f-b5c1-477e2a9e9d03-0', usage_metadata={'input_tokens': 26, 'output_tokens': 6, 'total_tokens': 32})"
"AIMessage(content='Ich liebe das Programmieren.', response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 26, 'total_tokens': 32}, 'model_name': 'gpt-35-turbo', 'system_fingerprint': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-cbc44038-09d3-40d4-9da2-c5910ee636ca-0', usage_metadata={'input_tokens': 26, 'output_tokens': 6, 'total_tokens': 32})"
]
},
"execution_count": 12,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -264,8 +264,8 @@
},
{
"cell_type": "code",
"execution_count": 5,
"id": "84c411b0-1790-4798-8bb7-47d8ece4c2dc",
"execution_count": 6,
"id": "2ca02d23-60d0-43eb-8d04-070f61f8fefd",
"metadata": {},
"outputs": [
{
@@ -288,22 +288,22 @@
},
{
"cell_type": "code",
"execution_count": 6,
"id": "21234693-d92b-4d69-8a7f-55aa062084bf",
"execution_count": 7,
"id": "e1b07ae2-3de7-44bd-bfdc-b76f4ba45a35",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Cost (USD): $0.000078\n"
"Total Cost (USD): $0.000074\n"
]
}
],
"source": [
"llm_0301 = AzureChatOpenAI(\n",
" azure_deployment=\"YOUR-DEPLOYMENT\",\n",
" api_version=\"2024-05-01-preview\",\n",
" azure_deployment=\"gpt-35-turbo\", # or your deployment\n",
" api_version=\"2023-06-01-preview\", # or your api version\n",
" model_version=\"0301\",\n",
")\n",
"with get_openai_callback() as cb:\n",
@@ -338,7 +338,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "raw",
"id": "afaf8039",
"id": "53fbf15f",
"metadata": {},
"source": [
"---\n",
@@ -12,129 +12,103 @@
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"id": "bf733a38-db84-4363-89e2-de6735c37230",
"metadata": {},
"source": [
"# ChatCohere\n",
"# Cohere\n",
"\n",
"This doc will help you get started with Cohere [chat models](/docs/concepts/#chat-models). For detailed documentation of all ChatCohere features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_cohere.chat_models.ChatCohere.html).\n",
"\n",
"For an overview of all Cohere models head to the [Cohere docs](https://docs.cohere.com/docs/models).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/cohere) | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatCohere](https://api.python.langchain.com/en/latest/chat_models/langchain_cohere.chat_models.ChatCohere.html) | [langchain-cohere](https://api.python.langchain.com/en/latest/cohere_api_reference.html) | ❌ | beta | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-cohere?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-cohere?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | \n",
"This notebook covers how to get started with [Cohere chat models](https://cohere.com/chat).\n",
"\n",
"Head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_community.chat_models.cohere.ChatCohere.html) for detailed documentation of all attributes and methods."
]
},
{
"cell_type": "markdown",
"id": "3607d67e-e56c-4102-bbba-df2edc0e109e",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"To access Cohere models you'll need to create a Cohere account, get an API key, and install the `langchain-cohere` integration package.\n",
"The integration lives in the `langchain-cohere` package. We can install these with:\n",
"\n",
"### Credentials\n",
"```bash\n",
"pip install -U langchain-cohere\n",
"```\n",
"\n",
"Head to https://dashboard.cohere.com/welcome/login to sign up to Cohere and generate an API key. Once you've done this set the COHERE_API_KEY environment variable:"
"We'll also need to get a [Cohere API key](https://cohere.com/) and set the `COHERE_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "433e8d2b-9519-4b49-b2c4-7ab65b046c94",
"execution_count": 11,
"id": "2108b517-1e8d-473d-92fa-4f930e8072a7",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"COHERE_API_KEY\"] = getpass.getpass(\"Enter your Cohere API key: \")"
"os.environ[\"COHERE_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "72ee0c4b-9764-423a-9dbf-95129e185210",
"id": "cf690fbb",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
"It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
"execution_count": 12,
"id": "7f11de02",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "0730d6a1-c893-4840-9817-5e5251676d5d",
"id": "4c26754b-b3c9-4d93-8f36-43049bd943bf",
"metadata": {},
"source": [
"### Installation\n",
"## Usage\n",
"\n",
"The LangChain Cohere integration lives in the `langchain-cohere` package:"
"ChatCohere supports all [ChatModel](/docs/how_to#chat-models) functionality:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-cohere"
]
},
{
"cell_type": "markdown",
"id": "a38cde65-254d-4219-a441-068766c0d4b5",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
"metadata": {},
"execution_count": 5,
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_cohere import ChatCohere\n",
"\n",
"llm = ChatCohere(\n",
" model=\"command-r-plus\",\n",
" temperature=0,\n",
" max_tokens=None,\n",
" timeout=None,\n",
" max_retries=2,\n",
" # other params...\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"metadata": {},
"source": [
"## Invocation"
"from langchain_core.messages import HumanMessage"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "62e0dbc3",
"execution_count": 13,
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat = ChatCohere()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8199ef8f-eb8b-4253-9ea0-6c24a013ca4c",
"metadata": {
"tags": []
},
@@ -142,110 +116,223 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore programmer.\", additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'd84f80f3-4611-46e6-aed0-9d8665a20a11', 'token_count': {'input_tokens': 89, 'output_tokens': 5}}, response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'd84f80f3-4611-46e6-aed0-9d8665a20a11', 'token_count': {'input_tokens': 89, 'output_tokens': 5}}, id='run-514ab516-ed7e-48ac-b132-2598fb80ebef-0')"
"AIMessage(content='4 && 5 \\n6 || 7 \\n\\nWould you like to play a game of odds and evens?', additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '2076b614-52b3-4082-a259-cc92cd3d9fea', 'token_count': {'prompt_tokens': 68, 'response_tokens': 23, 'total_tokens': 91, 'billed_tokens': 77}}, response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '2076b614-52b3-4082-a259-cc92cd3d9fea', 'token_count': {'prompt_tokens': 68, 'response_tokens': 23, 'total_tokens': 91, 'billed_tokens': 77}}, id='run-3475e0c8-c89b-4937-9300-e07d652455e1-0')"
]
},
"execution_count": 2,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates English to French. Translate the user sentence.\",\n",
" ),\n",
" (\"human\", \"I love programming.\"),\n",
"]\n",
"ai_msg = llm.invoke(messages)\n",
"ai_msg"
"messages = [HumanMessage(content=\"1\"), HumanMessage(content=\"2 3\")]\n",
"chat.invoke(messages)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d86145b3-bfef-46e8-b227-4dda5c9c2705",
"metadata": {},
"execution_count": 16,
"id": "c5fac0e9-05a4-4fc1-a3b3-e5bbb24b971b",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='4 && 5', additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'f0708a92-f874-46ee-9b93-334d616ad92e', 'token_count': {'prompt_tokens': 68, 'response_tokens': 3, 'total_tokens': 71, 'billed_tokens': 57}}, response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': 'f0708a92-f874-46ee-9b93-334d616ad92e', 'token_count': {'prompt_tokens': 68, 'response_tokens': 3, 'total_tokens': 71, 'billed_tokens': 57}}, id='run-1635e63e-2994-4e7f-986e-152ddfc95777-0')"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"await chat.ainvoke(messages)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "025be980-e50d-4a68-93dc-c9c7b500ce34",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"J'adore programmer.\n"
"4 && 5"
]
}
],
"source": [
"print(ai_msg.content)"
]
},
{
"cell_type": "markdown",
"id": "18e2bfc0-7e78-4528-a73f-499ac150dca8",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
"for chunk in chat.stream(messages):\n",
" print(chunk.content, end=\"\", flush=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
"execution_count": 18,
"id": "064288e4-f184-4496-9427-bcf148fa055e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe Programmierung.', additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '053bebde-4e1d-4d06-8ee6-3446e7afa25e', 'token_count': {'input_tokens': 84, 'output_tokens': 6}}, response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '053bebde-4e1d-4d06-8ee6-3446e7afa25e', 'token_count': {'input_tokens': 84, 'output_tokens': 6}}, id='run-53700708-b7fb-417b-af36-1a6fcde38e7d-0')"
"[AIMessage(content='4 && 5', additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '6770ca86-f6c3-4ba3-a285-c4772160612f', 'token_count': {'prompt_tokens': 68, 'response_tokens': 3, 'total_tokens': 71, 'billed_tokens': 57}}, response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '6770ca86-f6c3-4ba3-a285-c4772160612f', 'token_count': {'prompt_tokens': 68, 'response_tokens': 3, 'total_tokens': 71, 'billed_tokens': 57}}, id='run-8d6fade2-1b39-4e31-ab23-4be622dd0027-0')]"
]
},
"execution_count": 4,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
"chat.batch([messages])"
]
},
{
"cell_type": "markdown",
"id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3",
"id": "f1c56460",
"metadata": {},
"source": [
"## API reference\n",
"## Chaining\n",
"\n",
"For detailed documentation of all ChatCohere features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_cohere.chat_models.ChatCohere.html"
"You can also easily combine with a prompt template for easy structuring of user input. We can do this using [LCEL](/docs/concepts#langchain-expression-language-lcel)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "0851b103",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\"Tell me a joke about {topic}\")\n",
"chain = prompt | chat"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ae950c0f-1691-47f1-b609-273033cae707",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='What color socks do bears wear?\\n\\nThey dont wear socks, they have bear feet. \\n\\nHope you laughed! If not, maybe this will help: laughter is the best medicine, and a good sense of humor is infectious!', additional_kwargs={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '6edccf44-9bc8-4139-b30e-13b368f3563c', 'token_count': {'prompt_tokens': 68, 'response_tokens': 51, 'total_tokens': 119, 'billed_tokens': 108}}, response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '6edccf44-9bc8-4139-b30e-13b368f3563c', 'token_count': {'prompt_tokens': 68, 'response_tokens': 51, 'total_tokens': 119, 'billed_tokens': 108}}, id='run-ef7f9789-0d4d-43bf-a4f7-f2a0e27a5320-0')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"topic\": \"bears\"})"
]
},
{
"cell_type": "markdown",
"id": "12db8d69",
"metadata": {},
"source": [
"## Tool calling\n",
"\n",
"Cohere supports tool calling functionalities!"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "337e24af",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import (\n",
" HumanMessage,\n",
" ToolMessage,\n",
")\n",
"from langchain_core.tools import tool"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "74d292e7",
"metadata": {},
"outputs": [],
"source": [
"@tool\n",
"def magic_function(number: int) -> int:\n",
" \"\"\"Applies a magic operation to an integer\n",
" Args:\n",
" number: Number to have magic operation performed on\n",
" \"\"\"\n",
" return number + 10\n",
"\n",
"\n",
"def invoke_tools(tool_calls, messages):\n",
" for tool_call in tool_calls:\n",
" selected_tool = {\"magic_function\": magic_function}[tool_call[\"name\"].lower()]\n",
" tool_output = selected_tool.invoke(tool_call[\"args\"])\n",
" messages.append(ToolMessage(tool_output, tool_call_id=tool_call[\"id\"]))\n",
" return messages\n",
"\n",
"\n",
"tools = [magic_function]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ecafcbc6",
"metadata": {},
"outputs": [],
"source": [
"llm_with_tools = chat.bind_tools(tools=tools)\n",
"messages = [HumanMessage(content=\"What is the value of magic_function(2)?\")]"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "aa34fc39",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='The value of magic_function(2) is 12.', additional_kwargs={'documents': [{'id': 'magic_function:0:2:0', 'output': '12', 'tool_name': 'magic_function'}], 'citations': [ChatCitation(start=34, end=36, text='12', document_ids=['magic_function:0:2:0'])], 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '96a55791-0c58-4e2e-bc2a-8550e137c46d', 'token_count': {'input_tokens': 998, 'output_tokens': 59}}, response_metadata={'documents': [{'id': 'magic_function:0:2:0', 'output': '12', 'tool_name': 'magic_function'}], 'citations': [ChatCitation(start=34, end=36, text='12', document_ids=['magic_function:0:2:0'])], 'search_results': None, 'search_queries': None, 'is_search_required': None, 'generation_id': '96a55791-0c58-4e2e-bc2a-8550e137c46d', 'token_count': {'input_tokens': 998, 'output_tokens': 59}}, id='run-f318a9cf-55c8-44f4-91d1-27cf46c6a465-0')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = llm_with_tools.invoke(messages)\n",
"while res.tool_calls:\n",
" messages.append(res)\n",
" messages = invoke_tools(res.tool_calls, messages)\n",
" res = llm_with_tools.invoke(messages)\n",
"\n",
"res"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv-2",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "poetry-venv-2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -257,7 +344,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.9.6"
}
},
"nbformat": 4,

View File

@@ -36,7 +36,7 @@
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling/) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | \n",
"| | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | \n",
"\n",
"### Supported Methods\n",
"\n",
@@ -395,6 +395,66 @@
"chat_model_external.invoke(\"How to use Databricks?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Function calling on Databricks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs.\n",
"\n",
"See [Databricks function calling introduction](https://docs.databricks.com/en/machine-learning/model-serving/function-calling.html#supported-models) for supported models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models.databricks import ChatDatabricks\n",
"\n",
"llm = ChatDatabricks(endpoint=\"databricks-meta-llama-3-70b-instruct\")\n",
"tools = [\n",
" {\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"get_current_weather\",\n",
" \"description\": \"Get the current weather in a given location\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"location\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"The city and state, e.g. San Francisco, CA\",\n",
" },\n",
" \"unit\": {\"type\": \"string\", \"enum\": [\"celsius\", \"fahrenheit\"]},\n",
" },\n",
" },\n",
" },\n",
" }\n",
"]\n",
"\n",
"# supported tool_choice values: \"auto\", \"required\", \"none\", function name in string format,\n",
"# or a dictionary as {\"type\": \"function\", \"function\": {\"name\": <<tool_name>>}}\n",
"model = llm.bind_tools(tools, tool_choice=\"auto\")\n",
"\n",
"messages = [{\"role\": \"user\", \"content\": \"What is the current temperature of Chicago?\"}]\n",
"print(model.invoke(messages))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See [Databricks Unity Catalog](docs/integrations/tools/databricks.ipynb) about how to use UC functions in chains."
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -4,18 +4,68 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hugging Face\n",
"# ChatHuggingFace\n",
"\n",
"This notebook shows how to get started using `Hugging Face` LLM's as chat models.\n",
"This will help you getting started with `langchain_huggingface` [chat models](/docs/concepts/#chat-models). For detailed documentation of all `ChatHuggingFace` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html). For a list of models supported by Hugging Face check out [this page](https://huggingface.co/models).\n",
"\n",
"In particular, we will:\n",
"1. Utilize the [HuggingFaceEndpoint](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/llms/huggingface_endpoint.py) integrations to instantiate an `LLM`.\n",
"2. Utilize the `ChatHuggingFace` class to enable any of these LLMs to interface with LangChain's [Chat Messages](/docs/concepts/#message-types) abstraction.\n",
"3. Explore tool calling with the `ChatHuggingFace`.\n",
"4. Demonstrate how to use an open-source LLM to power an `ChatAgent` pipeline\n",
"## Overview\n",
"### Integration details\n",
"\n",
"### Integration details\n",
"\n",
"> Note: To get started, you'll need to have a [Hugging Face Access Token](https://huggingface.co/docs/hub/security-tokens) saved as an environment variable: `HUGGINGFACEHUB_API_TOKEN`."
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatHuggingFace](https://api.python.langchain.com/en/latest/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html) | [langchain-huggingface](https://api.python.langchain.com/en/latest/huggingface_api_reference.html) | ✅ | beta | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain_huggingface?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain_huggingface?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"To access Hugging Face models you'll need to create a Hugging Face account, get an API key, and install the `langchain-huggingface` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Generate a [Hugging Face Access Token](https://huggingface.co/docs/hub/security-tokens) and store it as an environment variable: `HUGGINGFACEHUB_API_TOKEN`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"HUGGINGFACEHUB_API_TOKEN\"):\n",
" os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = getpass.getpass(\"Enter your token: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatHuggingFace](https://api.python.langchain.com/en/latest/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html) | [langchain_huggingface](https://api.python.langchain.com/en/latest/huggingface_api_reference.html) | ✅ | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain_huggingface?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain_huggingface?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"To access `langchain_huggingface` models you'll need to create a/an `Hugging Face` account, get an API key, and install the `langchain_huggingface` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"You'll need to have a [Hugging Face Access Token](https://huggingface.co/docs/hub/security-tokens) saved as an environment variable: `HUGGINGFACEHUB_API_TOKEN`."
]
},
{
@@ -24,14 +74,41 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2"
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = getpass.getpass(\n",
" \"Enter your Hugging Face API key: \"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Instantiate an LLM"
"## Instantiation\n",
"\n",
"You can instantiate a `ChatHuggingFace` model in two different ways, either from a `HuggingFaceEndpoint` or from a `HuggingFacePipeline`."
]
},
{
@@ -43,19 +120,32 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.\n",
"Token is valid (permission: fineGrained).\n",
"Your token has been saved to /Users/isaachershenson/.cache/huggingface/token\n",
"Login successful\n"
]
}
],
"source": [
"from langchain_huggingface import HuggingFaceEndpoint\n",
"from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" repo_id=\"meta-llama/Meta-Llama-3-70B-Instruct\",\n",
" repo_id=\"HuggingFaceH4/zephyr-7b-beta\",\n",
" task=\"text-generation\",\n",
" max_new_tokens=512,\n",
" do_sample=False,\n",
" repetition_penalty=1.03,\n",
")"
")\n",
"\n",
"chat_model = ChatHuggingFace(llm=llm)"
]
},
{
@@ -67,11 +157,194 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "da32ae8ec8864ccfb480044fe2eec065",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"config.json: 0%| | 0.00/638 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ee1891b7e5f64fba88ba35f444e598fb",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model.safetensors.index.json: 0%| | 0.00/23.9k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9ff1ec7f575b42adb608c15955de7888",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading shards: 0%| | 0/8 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5214696698814b919f561647a684d1e4",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00001-of-00008.safetensors: 0%| | 0.00/1.89G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9ac334c69a2048a0a77340cca44d8c80",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00002-of-00008.safetensors: 0%| | 0.00/1.95G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "465ad1a51d414e0daf1cd9308455be94",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00003-of-00008.safetensors: 0%| | 0.00/1.98G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a329c43c3d574df0afd38c7457cc639c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00004-of-00008.safetensors: 0%| | 0.00/1.95G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a736a6c4023542af8c6ecc232b823d18",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00005-of-00008.safetensors: 0%| | 0.00/1.98G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8bdee70b843d433e8236fff83ecda022",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00006-of-00008.safetensors: 0%| | 0.00/1.95G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5ecb6103e0304ae188a14d598119a361",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00007-of-00008.safetensors: 0%| | 0.00/1.98G [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "174e3cb487bd453c9c70d7614254a35e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"model-00008-of-00008.safetensors: 0%| | 0.00/816M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "28f8c233b04b45d7800e12c785a8c4bc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "449dfa023dc8430fbcde94544ba01c4f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"generation_config.json: 0%| | 0.00/111 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain_huggingface import HuggingFacePipeline\n",
"from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline\n",
"\n",
"llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"HuggingFaceH4/zephyr-7b-beta\",\n",
@@ -81,81 +354,7 @@
" do_sample=False,\n",
" repetition_penalty=1.03,\n",
" ),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run a quantized version, you might specify a `bitsandbytes` quantization config as follows:\n",
"\n",
"```python\n",
"from transformers import BitsAndBytesConfig\n",
"\n",
"quantization_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=\"float16\",\n",
" bnb_4bit_use_double_quant=True\n",
")\n",
"```\n",
"\n",
"and pass it to the `HuggingFacePipeline` as a part of its `model_kwargs`:\n",
"\n",
"```python\n",
"pipeline = HuggingFacePipeline(\n",
" ...\n",
"\n",
" model_kwargs={\"quantization_config\": quantization_config},\n",
" \n",
" ...\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Instantiate the `ChatHuggingFace` to apply chat templates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instantiate the chat model and some messages to pass. \n",
"\n",
"**Note**: you need to pass the `model_id` explicitly if you are using self-hosted `text-generation-inference`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
]
}
],
"source": [
"from langchain_core.messages import (\n",
" HumanMessage,\n",
" SystemMessage,\n",
")\n",
"from langchain_huggingface import ChatHuggingFace\n",
"\n",
"messages = [\n",
" SystemMessage(content=\"You're a helpful assistant\"),\n",
" HumanMessage(\n",
" content=\"What happens when an unstoppable force meets an immovable object?\"\n",
" ),\n",
"]\n",
"\n",
"chat_model = ChatHuggingFace(llm=llm)"
]
@@ -164,284 +363,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the `model_id`"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'meta-llama/Meta-Llama-3-70B-Instruct'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat_model.model_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inspect how the chat messages are formatted for the LLM call."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\nYou're a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\nWhat happens when an unstoppable force meets an immovable object?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat_model._to_chat_prompt(messages)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the model."
"### Instatiating with Quantization\n",
"\n",
"To run a quantized version of your model, you can specify a `bitsandbytes` quantization config as follows:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"One of the classic thought experiments in physics!\n",
"\n",
"The concept of an unstoppable force meeting an immovable object is a paradox that has puzzled philosophers and physicists for centuries. It's a mind-bending scenario that challenges our understanding of the fundamental laws of physics.\n",
"\n",
"In essence, an unstoppable force is something that cannot be halted or slowed down, while an immovable object is something that cannot be moved or displaced. If we assume that both entities exist in the same universe, we run into a logical contradiction.\n",
"\n",
"Here\n"
]
}
],
"source": [
"res = chat_model.invoke(messages)\n",
"print(res.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Explore the tool calling with `ChatHuggingFace`\n",
"\n",
"`text-generation-inference` supports tool with open source LLMs starting from v2.0.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a basic tool (`Calculator`):"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"from transformers import BitsAndBytesConfig\n",
"\n",
"\n",
"class Calculator(BaseModel):\n",
" \"\"\"Multiply two integers together.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bind the tool to the `chat_model` and give it a try:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Calculator(a=3, b=12)]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.output_parsers.openai_tools import PydanticToolsParser\n",
"\n",
"llm_with_multiply = chat_model.bind_tools([Calculator], tool_choice=\"auto\")\n",
"parser = PydanticToolsParser(tools=[Calculator])\n",
"tool_chain = llm_with_multiply | parser\n",
"tool_chain.invoke(\"How much is 3 multiplied by 12?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Take it for a spin as an agent!\n",
"\n",
"Here we'll test out `Zephyr-7B-beta` as a zero-shot `ReAct` Agent. \n",
"\n",
"The agent is based on the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)\n",
"\n",
"The example below is taken from [here](https://python.langchain.com/v0.1/docs/modules/agents/agent_types/react/#using-chat-models).\n",
"\n",
"> Note: To run this section, you'll need to have a [SerpAPI Token](https://serpapi.com/) saved as an environment variable: `SERPAPI_API_KEY`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain import hub\n",
"from langchain.agents import AgentExecutor, load_tools\n",
"from langchain.agents.format_scratchpad import format_log_to_str\n",
"from langchain.agents.output_parsers import (\n",
" ReActJsonSingleInputOutputParser,\n",
")\n",
"from langchain.tools.render import render_text_description\n",
"from langchain_community.utilities import SerpAPIWrapper"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Configure the agent with a `react-json` style prompt and access to a search engine and calculator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# setup tools\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
"\n",
"# setup ReAct style prompt\n",
"prompt = hub.pull(\"hwchase17/react-json\")\n",
"prompt = prompt.partial(\n",
" tools=render_text_description(tools),\n",
" tool_names=\", \".join([t.name for t in tools]),\n",
")\n",
"\n",
"# define the agent\n",
"chat_model_with_stop = chat_model.bind(stop=[\"\\nObservation\"])\n",
"agent = (\n",
" {\n",
" \"input\": lambda x: x[\"input\"],\n",
" \"agent_scratchpad\": lambda x: format_log_to_str(x[\"intermediate_steps\"]),\n",
" }\n",
" | prompt\n",
" | chat_model_with_stop\n",
" | ReActJsonSingleInputOutputParser()\n",
")\n",
"\n",
"# instantiate AgentExecutor\n",
"agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mQuestion: Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\n",
"\n",
"Thought: I need to use the Search tool to find out who Leo DiCaprio's current girlfriend is. Then, I can use the Calculator tool to raise her current age to the power of 0.43.\n",
"\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Search\",\n",
" \"action_input\": \"leo dicaprio girlfriend\"\n",
"}\n",
"```\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mLeonardo DiCaprio may have found The One in Vittoria Ceretti. “They are in love,” a source exclusively reveals in the latest issue of Us Weekly. “Leo was clearly very proud to be showing Vittoria off and letting everyone see how happy they are together.”\u001b[0m\u001b[32;1m\u001b[1;3mNow that we know Leo DiCaprio's current girlfriend is Vittoria Ceretti, let's find out her current age.\n",
"\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Search\",\n",
" \"action_input\": \"vittoria ceretti age\"\n",
"}\n",
"```\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m25 years\u001b[0m\u001b[32;1m\u001b[1;3mNow that we know Vittoria Ceretti's current age is 25, let's use the Calculator tool to raise it to the power of 0.43.\n",
"\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Calculator\",\n",
" \"action_input\": \"25^0.43\"\n",
"}\n",
"```\n",
"\u001b[0m\u001b[33;1m\u001b[1;3mAnswer: 3.991298452658078\u001b[0m\u001b[32;1m\u001b[1;3mFinal Answer: Vittoria Ceretti, Leo DiCaprio's current girlfriend, when raised to the power of 0.43 is approximately 4.0 rounded to two decimal places. Her current age is 25 years old.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\",\n",
" 'output': \"Vittoria Ceretti, Leo DiCaprio's current girlfriend, when raised to the power of 0.43 is approximately 4.0 rounded to two decimal places. Her current age is 25 years old.\"}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.invoke(\n",
" {\n",
" \"input\": \"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\"\n",
" }\n",
"quantization_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=\"float16\",\n",
" bnb_4bit_use_double_quant=True,\n",
")"
]
},
@@ -449,14 +388,92 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Wahoo! Our open-source 7b parameter Zephyr model was able to:\n",
"and pass it to the `HuggingFacePipeline` as a part of its `model_kwargs`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"HuggingFaceH4/zephyr-7b-beta\",\n",
" task=\"text-generation\",\n",
" pipeline_kwargs=dict(\n",
" max_new_tokens=512,\n",
" do_sample=False,\n",
" repetition_penalty=1.03,\n",
" ),\n",
" model_kwargs={\"quantization_config\": quantization_config},\n",
")\n",
"\n",
"1. Plan out a series of actions: `I need to use the Search tool to find out who Leo DiCaprio's current girlfriend is. Then, I can use the Calculator tool to raise her current age to the power of 0.43.`\n",
"2. Then execute a search using the SerpAPI tool to find who Leo DiCaprio's current girlfriend is\n",
"3. Execute another search to find her age\n",
"4. And finally use a calculator tool to calculate her age raised to the power of 0.43\n",
"chat_model = ChatHuggingFace(llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import (\n",
" HumanMessage,\n",
" SystemMessage,\n",
")\n",
"\n",
"It's exciting to see how far open-source LLM's can go as general purpose reasoning agents. Give it a try yourself!"
"messages = [\n",
" SystemMessage(content=\"You're a helpful assistant\"),\n",
" HumanMessage(\n",
" content=\"What happens when an unstoppable force meets an immovable object?\"\n",
" ),\n",
"]\n",
"\n",
"ai_msg = chat_model.invoke(messages)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"According to the popular phrase and hypothetical scenario, when an unstoppable force meets an immovable object, a paradoxical situation arises as both forces are seemingly contradictory. On one hand, an unstoppable force is an entity that cannot be stopped or prevented from moving forward, while on the other hand, an immovable object is something that cannot be moved or displaced from its position. \n",
"\n",
"In this scenario, it is un\n"
]
}
],
"source": [
"print(ai_msg.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `ChatHuggingFace` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all ChatHuggingFace features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html"
]
}
],
@@ -476,7 +493,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,32 @@
---
sidebar_position: 0
sidebar_class_name: hidden
keywords: [compatibility]
---
# Chat models
[Chat models](/docs/concepts/#chat-models) are language models that use a sequence of [messages](/docs/concepts/#messages) as inputs and return messages as outputs (as opposed to using plain text). These are generally newer models.
:::info
If you'd like to write your own chat model, see [this how-to](/docs/how_to/custom_chat_model/).
If you'd like to contribute an integration, see [Contributing integrations](/docs/contributing/integrations/).
:::
## Featured Providers
:::info
While all these LangChain classes support the indicated advanced feature, you may have
to open the provider-specific documentation to learn which hosted models or backends support
the feature.
:::
import { CategoryTable, IndexTable } from "@theme/FeatureTables";
<CategoryTable category="chat" />
## All chat models
<IndexTable />

View File

@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kinetica SqlAssist LLM Demo\n",
"# Kinetica Language To SQL Chat Model\n",
"\n",
"This notebook demonstrates how to use Kinetica to transform natural language into SQL\n",
"and simplify the process of data retrieval. This demo is intended to show the mechanics\n",

View File

@@ -4,9 +4,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# ChatLlamaCpp\n",
"# Llama.cpp\n",
"\n",
"This notebook provides a quick overview for getting started with chat model intergrated with [llama cpp python](https://github.com/abetlen/llama-cpp-python)."
">[llama.cpp python](https://github.com/abetlen/llama-cpp-python) library is a simple Python bindings for `@ggerganov`\n",
">[llama.cpp](https://github.com/ggerganov/llama.cpp).\n",
">\n",
">This package provides:\n",
">\n",
"> - Low-level access to C API via ctypes interface.\n",
"> - High-level Python API for text completion\n",
"> - `OpenAI`-like API\n",
"> - `LangChain` compatibility\n",
"> - `LlamaIndex` compatibility\n",
"> - OpenAI compatible web server\n",
"> - Local Copilot replacement\n",
"> - Function Calling support\n",
"> - Vision API support\n",
"> - Multiple Models\n"
]
},
{
@@ -212,8 +226,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools import tool\n",
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"from langchain_core.tools import tool\n",
"\n",
"\n",
"class WeatherInput(BaseModel):\n",
@@ -410,7 +424,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -12,254 +12,228 @@
},
{
"cell_type": "markdown",
"id": "bf733a38-db84-4363-89e2-de6735c37230",
"id": "d295c2a2",
"metadata": {},
"source": [
"# MistralAI\n",
"# ChatMistralAI\n",
"\n",
"This notebook covers how to get started with MistralAI chat models, via their [API](https://docs.mistral.ai/api/).\n",
"This will help you getting started with Mistral [chat models](/docs/concepts/#chat-models). For detailed documentation of all `ChatMistralAI` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_mistralai.chat_models.ChatMistralAI.html). The `ChatMistralAI` class is built on top of the [Mistral API](https://docs.mistral.ai/api/). For a list of all the models supported by Mistral, check out [this page](https://docs.mistral.ai/getting-started/models/).\n",
"\n",
"A valid [API key](https://console.mistral.ai/users/api-keys/) is needed to communicate with the API.\n",
"## Overview\n",
"### Integration details\n",
"\n",
"Head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_mistralai.chat_models.ChatMistralAI.html) for detailed documentation of all attributes and methods."
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/mistral) | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatMistralAI](https://api.python.langchain.com/en/latest/chat_models/langchain_mistralai.chat_models.ChatMistralAI.html) | [langchain_mistralai](https://api.python.langchain.com/en/latest/mistralai_api_reference.html) | ❌ | beta | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain_mistralai?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain_mistralai?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"\n",
"To access `ChatMistralAI` models you'll need to create a Mistral account, get an API key, and install the `langchain_mistralai` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"\n",
"A valid [API key](https://console.mistral.ai/users/api-keys/) is needed to communicate with the API. Once you've done this set the MISTRAL_API_KEY environment variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2461605e",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"MISTRAL_API_KEY\"] = getpass.getpass(\"Enter your Mistral API key: \")"
]
},
{
"cell_type": "markdown",
"id": "cc686b8f",
"id": "788f37ac",
"metadata": {},
"source": [
"## Setup\n",
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "007209d5",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"id": "0f5c74f9",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"You will need the `langchain-core` and `langchain-mistralai` package to use the API. You can install these with:\n",
"The LangChain Mistral integration lives in the `langchain_mistralai` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ab11a65",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_mistralai"
]
},
{
"cell_type": "markdown",
"id": "fb1a335e",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"```bash\n",
"pip install -U langchain-core langchain-mistralai\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e6c38580",
"metadata": {},
"outputs": [],
"source": [
"from langchain_mistralai import ChatMistralAI\n",
"\n",
"We'll also need to get a [Mistral API key](https://console.mistral.ai/users/api-keys/)"
"llm = ChatMistralAI(\n",
" model=\"mistral-large-latest\",\n",
" temperature=0,\n",
" max_retries=2,\n",
" # other params...\n",
")"
]
},
{
"cell_type": "markdown",
"id": "aec79099",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8838c3cc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Sure, I\\'d be happy to help you translate that sentence into French! The English sentence \"I love programming\" translates to \"J\\'aime programmer\" in French. Let me know if you have any other questions or need further assistance!', response_metadata={'token_usage': {'prompt_tokens': 32, 'total_tokens': 84, 'completion_tokens': 52}, 'model': 'mistral-small', 'finish_reason': 'stop'}, id='run-64bac156-7160-4b68-b67e-4161f63e021f-0', usage_metadata={'input_tokens': 32, 'output_tokens': 52, 'total_tokens': 84})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates English to French. Translate the user sentence.\",\n",
" ),\n",
" (\"human\", \"I love programming.\"),\n",
"]\n",
"ai_msg = llm.invoke(messages)\n",
"ai_msg"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c3fd4184",
"id": "bbf6a048",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"\n",
"api_key = getpass.getpass()"
]
},
{
"cell_type": "markdown",
"id": "502127fd",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_core.messages import HumanMessage\n",
"from langchain_mistralai.chat_models import ChatMistralAI"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# If api_key is not passed, default behavior is to use the `MISTRAL_API_KEY` environment variable.\n",
"chat = ChatMistralAI(api_key=api_key)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8199ef8f-eb8b-4253-9ea0-6c24a013ca4c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Who's there? I was just about to ask the same thing! How can I assist you today?\")"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [HumanMessage(content=\"knock knock\")]\n",
"chat.invoke(messages)"
]
},
{
"cell_type": "markdown",
"id": "c361ab1e-8c0c-4206-9e3c-9d1424a12b9c",
"metadata": {},
"source": [
"### Async"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c5fac0e9-05a4-4fc1-a3b3-e5bbb24b971b",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Who\\'s there?\\n\\n(You can then continue the \"knock knock\" joke by saying the name of the person or character who should be responding. For example, if I say \"Banana,\" you could respond with \"Banana who?\" and I would say \"Banana bunch! Get it? Because a group of bananas is called a \\'bunch\\'!\" and then we would both laugh and have a great time. But really, you can put anything you want in the spot where I put \"Banana\" and it will still technically be a \"knock knock\" joke. The possibilities are endless!)')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"await chat.ainvoke(messages)"
]
},
{
"cell_type": "markdown",
"id": "86ccef97",
"metadata": {},
"source": [
"### Streaming\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "025be980-e50d-4a68-93dc-c9c7b500ce34",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Who's there?\n",
"\n",
"(After this, the conversation can continue as a call and response \"who's there\" joke. Here is an example of how it could go:\n",
"\n",
"You say: Orange.\n",
"I say: Orange who?\n",
"You say: Orange you glad I didn't say banana!?)\n",
"\n",
"But since you asked for a knock knock joke specifically, here's one for you:\n",
"\n",
"Knock knock.\n",
"\n",
"Me: Who's there?\n",
"\n",
"You: Lettuce.\n",
"\n",
"Me: Lettuce who?\n",
"\n",
"You: Lettuce in, it's too cold out here!\n",
"\n",
"I hope this brings a smile to your face! Do you have a favorite knock knock joke you'd like to share? I'd love to hear it."
"Sure, I'd be happy to help you translate that sentence into French! The English sentence \"I love programming\" translates to \"J'aime programmer\" in French. Let me know if you have any other questions or need further assistance!\n"
]
}
],
"source": [
"for chunk in chat.stream(messages):\n",
" print(chunk.content, end=\"\")"
"print(ai_msg.content)"
]
},
{
"cell_type": "markdown",
"id": "f6189577",
"metadata": {},
"source": [
"### Batch"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e63aebcb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content=\"Who's there? I was just about to ask the same thing! Go ahead and tell me who's there. I love a good knock-knock joke.\")]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat.batch([messages])"
]
},
{
"cell_type": "markdown",
"id": "38e39e71",
"id": "32b87f87",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"You can also easily combine with a prompt template for easy structuring of user input. We can do this using [LCEL](/docs/concepts#langchain-expression-language-lcel)"
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ee43a1ae",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\"Tell me a joke about {topic}\")\n",
"chain = prompt | chat"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0dc49212",
"execution_count": 8,
"id": "24e2c51c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Why do bears hate shoes so much? They like to run around in their bear feet.')"
"AIMessage(content='Ich liebe Programmierung. (German translation)', response_metadata={'token_usage': {'prompt_tokens': 26, 'total_tokens': 38, 'completion_tokens': 12}, 'model': 'mistral-small', 'finish_reason': 'stop'}, id='run-dfd4094f-e347-47b0-9056-8ebd7ea35fe7-0', usage_metadata={'input_tokens': 26, 'output_tokens': 12, 'total_tokens': 38})"
]
},
"execution_count": 14,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"topic\": \"bears\"})"
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "cb9b5834",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"Head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_mistralai.chat_models.ChatMistralAI.html) for detailed documentation of all attributes and methods."
]
}
],
@@ -279,7 +253,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -2,13 +2,24 @@
"cells": [
{
"cell_type": "markdown",
"id": "cc6caafa",
"metadata": {
"id": "cc6caafa"
},
"id": "1f666798-8635-4bc0-a515-04d318588d67",
"metadata": {},
"source": [
"# NVIDIA NIMs\n",
"---\n",
"sidebar_label: NVIDIA AI Endpoints\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "fa8eb20e-4db8-45e3-9e79-c595f4f274da",
"metadata": {},
"source": [
"# ChatNVIDIA\n",
"\n",
"This will help you getting started with NVIDIA [chat models](/docs/concepts/#chat-models). For detailed documentation of all `ChatNVIDIA` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_nvidia_ai_endpoints.chat_models.ChatNVIDIA.html).\n",
"\n",
"## Overview\n",
"The `langchain-nvidia-ai-endpoints` package contains LangChain integrations building applications with models on \n",
"NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
"from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
@@ -24,7 +35,66 @@
"\n",
"This example goes over how to use LangChain to interact with NVIDIA supported via the `ChatNVIDIA` class.\n",
"\n",
"For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation."
"For more information on accessing the chat models through this api, check out the [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/) documentation.\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatNVIDIA](https://api.python.langchain.com/en/latest/chat_models/langchain_nvidia_ai_endpoints.chat_models.ChatNVIDIA.html) | [langchain_nvidia_ai_endpoints](https://api.python.langchain.com/en/latest/nvidia_ai_endpoints_api_reference.html) | ✅ | beta | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain_nvidia_ai_endpoints?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain_nvidia_ai_endpoints?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"| [Tool calling](/docs/how_to/tool_calling) | [Structured output](/docs/how_to/structured_output/) | JSON mode | [Image input](/docs/how_to/multimodal_inputs/) | Audio input | Video input | [Token-level streaming](/docs/how_to/chat_streaming/) | Native async | [Token usage](/docs/how_to/chat_token_usage_tracking/) | [Logprobs](/docs/how_to/logprobs/) |\n",
"| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n",
"| ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"**To get started:**\n",
"\n",
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n",
"\n",
"2. Click on your model of choice.\n",
"\n",
"3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n",
"\n",
"4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.\n",
"\n",
"### Credentials\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "208b72da-1535-4249-bbd3-2500028e25e9",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"NVIDIA_API_KEY\"):\n",
" # Note: the API key should start with \"nvapi-\"\n",
" os.environ[\"NVIDIA_API_KEY\"] = getpass.getpass(\"Enter your NVIDIA API key: \")"
]
},
{
"cell_type": "markdown",
"id": "52dc8dcb-0a48-4a4e-9947-764116d2ffd4",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cd9cb12-6ca5-432a-9e42-8a57da073c7e",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
@@ -32,7 +102,9 @@
"id": "f2be90a9",
"metadata": {},
"source": [
"## Installation"
"### Installation\n",
"\n",
"The LangChain NVIDIA AI Endpoints integration lives in the `langchain_nvidia_ai_endpoints` package:"
]
},
{
@@ -45,51 +117,14 @@
"%pip install --upgrade --quiet langchain-nvidia-ai-endpoints"
]
},
{
"cell_type": "markdown",
"id": "ccff689e",
"metadata": {
"id": "ccff689e"
},
"source": [
"## Setup\n",
"\n",
"**To get started:**\n",
"\n",
"1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n",
"\n",
"2. Click on your model of choice.\n",
"\n",
"3. Under `Input` select the `Python` tab, and click `Get API Key`. Then click `Generate Key`.\n",
"\n",
"4. Copy and save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "686c4d2f",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"# del os.environ['NVIDIA_API_KEY'] ## delete key and reset\n",
"if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
" print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n",
"else:\n",
" nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n",
" assert nvapi_key.startswith(\"nvapi-\"), f\"{nvapi_key[:5]}... is not a valid key\"\n",
" os.environ[\"NVIDIA_API_KEY\"] = nvapi_key"
]
},
{
"cell_type": "markdown",
"id": "af0ce26b",
"metadata": {},
"source": [
"## Working with NVIDIA API Catalog"
"## Instantiation\n",
"\n",
"Now we can access models in the NVIDIA API Catalog:"
]
},
{
@@ -108,7 +143,24 @@
"## Core LC Chat Interface\n",
"from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
"\n",
"llm = ChatNVIDIA(model=\"mistralai/mixtral-8x7b-instruct-v0.1\")\n",
"llm = ChatNVIDIA(model=\"mistralai/mixtral-8x7b-instruct-v0.1\")"
]
},
{
"cell_type": "markdown",
"id": "469c8c7f-de62-457f-a30f-674763a8b717",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9512c81b-1f3a-4eca-9470-f52cedff5c74",
"metadata": {},
"outputs": [],
"source": [
"result = llm.invoke(\"Write a ballad about LangChain.\")\n",
"print(result.content)"
]
@@ -630,6 +682,55 @@
"source": [
"See [How to use chat models to call tools](https://python.langchain.com/v0.2/docs/how_to/tool_calling/) for additional examples."
]
},
{
"cell_type": "markdown",
"id": "a9a3c438-121d-46eb-8fb5-b8d5a13cd4a4",
"metadata": {},
"source": [
"## Chaining\n",
"\n",
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af585c6b-fe0a-4833-9860-a4209a71b3c6",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f2f25dd3-0b4a-465f-a53e-95521cdc253c",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `ChatNVIDIA` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_nvidia_ai_endpoints.chat_models.ChatNVIDIA.html"
]
}
],
"metadata": {
@@ -651,7 +752,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -99,7 +99,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
"version": "3.10.12"
},
"vscode": {
"interpreter": {

View File

@@ -56,23 +56,16 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "e817fe2e-4f1d-4533-b19e-2400b1cf6ce8",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"Enter your OpenAI API key: ········\n"
]
}
],
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
"if not os.environ.get(\"OPENAI_API_KEY\"):\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
]
},
{
@@ -126,7 +119,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "522686de",
"metadata": {
"tags": []
@@ -158,7 +151,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 3,
"id": "ce16ad78-8e6f-48cd-954e-98be75eb5836",
"metadata": {
"tags": []
@@ -167,10 +160,10 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\", response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 31, 'total_tokens': 36}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_43dfabdef1', 'finish_reason': 'stop', 'logprobs': None}, id='run-012cffe2-5d3d-424d-83b5-51c6d4a593d1-0', usage_metadata={'input_tokens': 31, 'output_tokens': 5, 'total_tokens': 36})"
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 31, 'total_tokens': 36}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'stop', 'logprobs': None}, id='run-63219b22-03e3-4561-8cc4-78b7c7c3a3ca-0', usage_metadata={'input_tokens': 31, 'output_tokens': 5, 'total_tokens': 36})"
]
},
"execution_count": 6,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -189,7 +182,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 4,
"id": "2cd224b8-4499-41fb-a604-d53a7ff17b2e",
"metadata": {},
"outputs": [
@@ -217,7 +210,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 5,
"id": "fbb043e6",
"metadata": {
"tags": []
@@ -226,10 +219,10 @@
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe Programmieren.', response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 26, 'total_tokens': 31}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-94fa6741-c99b-4513-afce-c3f562631c79-0')"
"AIMessage(content='Ich liebe das Programmieren.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 26, 'total_tokens': 32}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'stop', 'logprobs': None}, id='run-350585e1-16ca-4dad-9460-3d9e7e49aaf1-0', usage_metadata={'input_tokens': 26, 'output_tokens': 6, 'total_tokens': 32})"
]
},
"execution_count": 8,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -281,12 +274,12 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"id": "b7ea7690-ec7a-4337-b392-e87d1f39a6ec",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class GetWeather(BaseModel):\n",
@@ -300,17 +293,17 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 7,
"id": "1d1ab955-6a68-42f8-bb5d-86eb1111478a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_H7fABDuzEau48T10Qn0Lsh0D', 'function': {'arguments': '{\"location\":\"San Francisco\"}', 'name': 'GetWeather'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 70, 'total_tokens': 85}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-b469135e-2718-446a-8164-eef37e672ba2-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco'}, 'id': 'call_H7fABDuzEau48T10Qn0Lsh0D'}])"
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_o9udf3EVOWiV4Iupktpbpofk', 'function': {'arguments': '{\"location\":\"San Francisco, CA\"}', 'name': 'GetWeather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-1617c9b2-dda5-4120-996b-0333ed5992e2-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_o9udf3EVOWiV4Iupktpbpofk', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})"
]
},
"execution_count": 10,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -322,6 +315,47 @@
"ai_msg"
]
},
{
"cell_type": "markdown",
"id": "67b0f63d-15e6-45e0-9e86-2852ddcff54f",
"metadata": {},
"source": [
"### ``strict=True``\n",
"\n",
":::info Requires ``langchain-openai>=0.1.21rc1``\n",
"\n",
":::\n",
"\n",
"As of Aug 6, 2024, OpenAI supports a `strict` argument when calling tools that will enforce that the tool argument schema is respected by the model. See more here: https://platform.openai.com/docs/guides/function-calling\n",
"\n",
"**Note**: If ``strict=True`` the tool definition will also be validated, and a subset of JSON schema are accepted. Crucially, schema cannot have optional args (those with default values). Read the full docs on what types of schema are supported here: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas. "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dc8ac4f1-4039-4392-90c1-2d8331cd6910",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_jUqhd8wzAIzInTJl72Rla8ht', 'function': {'arguments': '{\"location\":\"San Francisco, CA\"}', 'name': 'GetWeather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-5e3356a9-132d-4623-8e73-dd5a898cf4a6-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_jUqhd8wzAIzInTJl72Rla8ht', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_tools = llm.bind_tools([GetWeather], strict=True)\n",
"ai_msg = llm_with_tools.invoke(\n",
" \"what is the weather like in San Francisco\",\n",
")\n",
"ai_msg"
]
},
{
"cell_type": "markdown",
"id": "768d1ae4-4b1a-48eb-a329-c8d5051067a3",
@@ -333,7 +367,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 9,
"id": "166cb7ce-831d-4a7c-9721-abc107f11084",
"metadata": {},
"outputs": [
@@ -341,11 +375,12 @@
"data": {
"text/plain": [
"[{'name': 'GetWeather',\n",
" 'args': {'location': 'San Francisco'},\n",
" 'id': 'call_H7fABDuzEau48T10Qn0Lsh0D'}]"
" 'args': {'location': 'San Francisco, CA'},\n",
" 'id': 'call_jUqhd8wzAIzInTJl72Rla8ht',\n",
" 'type': 'tool_call'}]"
]
},
"execution_count": 11,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -376,17 +411,17 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 11,
"id": "33c4a8b0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={}, example=False)"
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 31, 'total_tokens': 39}, 'model_name': 'ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-0f39b30e-c56e-4f3b-af99-5c948c984146-0', usage_metadata={'input_tokens': 31, 'output_tokens': 8, 'total_tokens': 39})"
]
},
"execution_count": 6,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -396,7 +431,7 @@
" temperature=0, model_name=\"ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR\"\n",
")\n",
"\n",
"fine_tuned_model(messages)"
"fine_tuned_model.invoke(messages)"
]
},
{
@@ -412,9 +447,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv-2",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "poetry-venv-2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -426,7 +461,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -17,7 +17,7 @@
"source": [
"# ChatPerplexity\n",
"\n",
"This notebook covers how to get started with Perplexity chat models."
"This notebook covers how to get started with `Perplexity` chat models."
]
},
{
@@ -37,17 +37,31 @@
"from langchain_core.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "markdown",
"id": "b26e2035-2f81-4451-ba44-fa2e2d5aeb62",
"metadata": {},
"source": [
"The code provided assumes that your PPLX_API_KEY is set in your environment variables. If you would like to manually specify your API key and also choose a different model, you can use the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d986aac6-1bae-4608-8514-d3ba5b35b10e",
"metadata": {},
"outputs": [],
"source": [
"chat = ChatPerplexity(\n",
" temperature=0, pplx_api_key=\"YOUR_API_KEY\", model=\"llama-3-sonar-small-32k-online\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "97a8ce3a",
"metadata": {},
"source": [
"The code provided assumes that your PPLX_API_KEY is set in your environment variables. If you would like to manually specify your API key and also choose a different model, you can use the following code:\n",
"\n",
"```python\n",
"chat = ChatPerplexity(temperature=0, pplx_api_key=\"YOUR_API_KEY\", model=\"llama-3-sonar-small-32k-online\")\n",
"```\n",
"\n",
"You can check a list of available models [here](https://docs.perplexity.ai/docs/model-cards). For reproducibility, we can set the API key dynamically by taking it as an input in this notebook."
]
},
@@ -221,7 +235,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -53,7 +53,8 @@
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"TOGETHER_API_KEY\"] = getpass.getpass(\"Enter your Together API key: \")"
"if \"TOGETHER_API_KEY\" not in os.environ:\n",
" os.environ[\"TOGETHER_API_KEY\"] = getpass.getpass(\"Enter your Together API key: \")"
]
},
{
@@ -87,21 +88,10 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"outputs": [],
"source": [
"%pip install -qU langchain-together"
]
@@ -113,14 +103,12 @@
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:\n",
"\n",
"- TODO: Update model instantiation with relevant params."
"Now we can instantiate our model object and generate chat completions:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae",
"metadata": {},
"outputs": [],
@@ -147,7 +135,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"id": "62e0dbc3",
"metadata": {
"tags": []
@@ -156,10 +144,10 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\", response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 35, 'total_tokens': 44}, 'model_name': 'meta-llama/Llama-3-70b-chat-hf', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-79efa49b-dbaf-4ef8-9dce-958533823ef6-0', usage_metadata={'input_tokens': 35, 'output_tokens': 9, 'total_tokens': 44})"
"AIMessage(content=\"J'adore la programmation.\", response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 35, 'total_tokens': 44}, 'model_name': 'meta-llama/Llama-3-70b-chat-hf', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-eabcbe33-cdd8-45b8-ab0b-f90b6e7dfad8-0', usage_metadata={'input_tokens': 35, 'output_tokens': 9, 'total_tokens': 44})"
]
},
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -178,7 +166,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"id": "d86145b3-bfef-46e8-b227-4dda5c9c2705",
"metadata": {},
"outputs": [
@@ -206,17 +194,17 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Ich liebe das Programmieren.', response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 30, 'total_tokens': 37}, 'model_name': 'meta-llama/Llama-3-70b-chat-hf', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-80bba5fa-1723-4242-8d5a-c09b76b8350b-0', usage_metadata={'input_tokens': 30, 'output_tokens': 7, 'total_tokens': 37})"
"AIMessage(content='Ich liebe das Programmieren.', response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 30, 'total_tokens': 37}, 'model_name': 'meta-llama/Llama-3-70b-chat-hf', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-a249aa24-ee31-46ba-9bf9-f4eb135b0a95-0', usage_metadata={'input_tokens': 30, 'output_tokens': 7, 'total_tokens': 37})"
]
},
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -271,7 +259,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -12,14 +12,83 @@
},
{
"cell_type": "markdown",
"id": "eb7e5679-aa06-47e4-a1a3-b6b70e604017",
"id": "8f82e243-f4ee-44e2-b417-099b6401ae3e",
"metadata": {},
"source": [
"# vLLM Chat\n",
"\n",
"vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. This server can be queried in the same format as OpenAI API.\n",
"\n",
"This notebook covers how to get started with vLLM chat models using langchain's `ChatOpenAI` **as it is**."
"## Overview\n",
"This will help you getting started with vLLM [chat models](/docs/concepts/#chat-models), which leverage the `langchain-openai` package. For detailed documentation of all `ChatOpenAI` features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).\n",
"\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n",
"| [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html) | [langchain_openai](https://api.python.langchain.com/en/latest/langchain_openai.html) | ✅ | beta | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain_openai?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain_openai?style=flat-square&label=%20) |\n",
"\n",
"### Model features\n",
"Specific model features-- such as tool calling, support for multi-modal inputs, support for token-level streaming, etc.-- will depend on the hosted model.\n",
"\n",
"## Setup\n",
"\n",
"See the vLLM docs [here](https://docs.vllm.ai/en/latest/).\n",
"\n",
"To access vLLM models through LangChain, you'll need to install the `langchain-openai` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"Authentication will depend on specifics of the inference server."
]
},
{
"cell_type": "markdown",
"id": "c3b1707a-cf2c-4367-94e3-436c43402503",
"metadata": {},
"source": [
"If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e40bd5e-cbaa-41ef-aaf9-0858eb207184",
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")"
]
},
{
"cell_type": "markdown",
"id": "0739b647-609b-46d3-bdd3-e86fe4463288",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"The LangChain vLLM integration can be accessed via the `langchain-openai` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7afcfbdc-56aa-4529-825a-8acbe7aa5241",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "2cf576d6-7b67-4937-bf99-39071e85720c",
"metadata": {},
"source": [
"## Instantiation\n",
"\n",
"Now we can instantiate our model object and generate chat completions:"
]
},
{
@@ -51,7 +120,7 @@
"source": [
"inference_server_url = \"http://localhost:8000/v1\"\n",
"\n",
"chat = ChatOpenAI(\n",
"llm = ChatOpenAI(\n",
" model=\"mosaicml/mpt-7b\",\n",
" openai_api_key=\"EMPTY\",\n",
" openai_api_base=inference_server_url,\n",
@@ -60,6 +129,14 @@
")"
]
},
{
"cell_type": "markdown",
"id": "34b18328-5e8b-4ff2-9b89-6fbb76b5c7f0",
"metadata": {},
"source": [
"## Invocation"
]
},
{
"cell_type": "code",
"execution_count": 15,
@@ -88,82 +165,66 @@
" content=\"Translate the following sentence from English to Italian: I love programming.\"\n",
" ),\n",
"]\n",
"chat(messages)"
"llm.invoke(messages)"
]
},
{
"cell_type": "markdown",
"id": "55fc7046-a6dc-4720-8c0c-24a6db76a4f4",
"id": "a580a1e4-11a3-4277-bfba-bfb414ac7201",
"metadata": {},
"source": [
"You can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplates`. You can use ChatPromptTemplate's format_prompt -- this returns a `PromptValue`, which you can convert to a string or `Message` object, depending on whether you want to use the formatted value as input to an llm or chat model.\n",
"## Chaining\n",
"\n",
"For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "123980e9-0dee-4ce5-bde6-d964dd90129c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"template = (\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
")\n",
"system_message_prompt = SystemMessagePromptTemplate.from_template(template)\n",
"human_template = \"{text}\"\n",
"human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "b2fb8c59-8892-4270-85a2-4f8ab276b75d",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' I love programming too.', additional_kwargs={}, example=False)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat_prompt = ChatPromptTemplate.from_messages(\n",
" [system_message_prompt, human_message_prompt]\n",
")\n",
"\n",
"# get a chat completion from the formatted messages\n",
"chat(\n",
" chat_prompt.format_prompt(\n",
" input_language=\"English\", output_language=\"Italian\", text=\"I love programming.\"\n",
" ).to_messages()\n",
")"
"We can [chain](/docs/how_to/sequence/) our model with a prompt template like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bbd9861-2b94-4920-8708-b690004f4c4d",
"id": "dd0f4043-48bd-4245-8bdb-e7669666a277",
"metadata": {},
"outputs": [],
"source": []
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n",
" ),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | llm\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"German\",\n",
" \"input\": \"I love programming.\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "265f5d51-0a76-4808-8d13-ef598ee6e366",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all features and configurations exposed via `langchain-openai`, head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html\n",
"\n",
"Refer to the vLLM [documentation](https://docs.vllm.ai/en/latest/) as well."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p310",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "conda_pytorch_p310"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -175,7 +236,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@@ -13,7 +13,7 @@
"\n",
"## Prerequisites\n",
"\n",
"You need to have an existing dataset on the Apify platform. If you don't have one, please first check out [this notebook](/docs/integrations/tools/apify) on how to use Apify to extract content from documentation, knowledge bases, help centers, or blogs. This example shows how to load a dataset produced by the [Website Content Crawler](https://apify.com/apify/website-content-crawler)."
"You need to have an existing dataset on the Apify platform. This example shows how to load a dataset produced by the [Website Content Crawler](https://apify.com/apify/website-content-crawler)."
]
},
{

View File

@@ -0,0 +1,243 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BSHTMLLoader\n",
"\n",
"\n",
"This notebook provides a quick overview for getting started with BeautifulSoup4 [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). For detailed documentation of all __ModuleName__Loader features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.html_bs.BSHTMLLoader.html).\n",
"\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [BSHTMLLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.html_bs.BSHTMLLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| BSHTMLLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"To access BSHTMLLoader document loader you'll need to install the `langchain-community` integration package and the `bs4` python package.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed to use the `BSHTMLLoader` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community** and **bs4**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community bs4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:\n",
"\n",
"- TODO: Update model instantiation with relevant params."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import BSHTMLLoader\n",
"\n",
"loader = BSHTMLLoader(\n",
" file_path=\"./example_data/fake-content.html\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/fake-content.html', 'title': 'Test Title'}, page_content='\\nTest Title\\n\\n\\nMy First Heading\\nMy first paragraph.\\n\\n\\n')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': './example_data/fake-content.html', 'title': 'Test Title'}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/fake-content.html', 'title': 'Test Title'}, page_content='\\nTest Title\\n\\n\\nMy First Heading\\nMy first paragraph.\\n\\n\\n')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []\n",
"page[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding separator to BS4\n",
"\n",
"We can also pass a separator to use when calling get_text on the soup"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='\n",
", Test Title, \n",
", \n",
", \n",
", My First Heading, \n",
", My first paragraph., \n",
", \n",
", \n",
"' metadata={'source': './example_data/fake-content.html', 'title': 'Test Title'}\n"
]
}
],
"source": [
"loader = BSHTMLLoader(\n",
" file_path=\"./example_data/fake-content.html\", get_text_separator=\", \"\n",
")\n",
"\n",
"docs = loader.load()\n",
"print(docs[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all BSHTMLLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.html_bs.BSHTMLLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,55 @@
# Sample Markdown Document
## Introduction
Welcome to this sample Markdown document. Markdown is a lightweight markup language used for formatting text. It's widely used for documentation, readme files, and more.
## Features
### Headers
Markdown supports multiple levels of headers:
- **Header 1**: `# Header 1`
- **Header 2**: `## Header 2`
- **Header 3**: `### Header 3`
### Lists
#### Unordered List
- Item 1
- Item 2
- Subitem 2.1
- Subitem 2.2
#### Ordered List
1. First item
2. Second item
3. Third item
### Links
[OpenAI](https://www.openai.com) is an AI research organization.
### Images
Here's an example image:
![Sample Image](https://via.placeholder.com/150)
### Code
#### Inline Code
Use `code` for inline code snippets.
#### Code Block
```python
def greet(name):
return f"Hello, {name}!"
print(greet("World"))
```

View File

@@ -30,6 +30,7 @@
{
"sender_name": "User 2",
"timestamp_ms": 1675595060730,
"content": "",
"photos": [
{"uri": "url_of_some_picture.jpg", "creation_timestamp": 1675595059}
]

File diff suppressed because one or more lines are too long

View File

@@ -164,7 +164,7 @@
},
"outputs": [],
"source": [
"from langchain.document_loaders import GithubFileLoader"
"from langchain_community.document_loaders import GithubFileLoader"
]
},
{

View File

@@ -0,0 +1,45 @@
---
sidebar_position: 0
sidebar_class_name: hidden
---
# Document loaders
import { CategoryTable, IndexTable } from "@theme/FeatureTables";
DocumentLoaders load data into the standard LangChain Document format.
Each DocumentLoader has its own specific parameters, but they can all be invoked in the same way with the .load method.
An example use case is as follows:
```python
from langchain_community.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(
... # <-- Integration specific parameters here
)
data = loader.load()
```
## Webpages
The below document loaders allow you to load webpages.
<CategoryTable category="webpage_loaders" />
## PDFs
The below document loaders allow you to load PDF documents.
<CategoryTable category="pdf_loaders" />
## Common File Types
The below document loaders allow you to load data from common data formats.
<CategoryTable category="common_loaders" />
## All document loaders
<IndexTable />

View File

@@ -0,0 +1,348 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# JSONLoader\n",
"\n",
"This notebook provides a quick overview for getting started with JSON [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). For detailed documentation of all JSONLoader features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html).\n",
"\n",
"- TODO: Add any other relevant links, like information about underlying API, etc.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/file_loaders/json/)|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [JSONLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ✅ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| JSONLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"To access JSON document loader you'll need to install the `langchain-community` integration package as well as the ``jq`` python package.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are required to use the `JSONLoader` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community** and **jq**:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community jq "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:\n",
"\n",
"- TODO: Update model instantiation with relevant params."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import JSONLoader\n",
"\n",
"loader = JSONLoader(\n",
" file_path=\"./example_data/facebook_chat.json\",\n",
" jq_schema=\".messages[].content\",\n",
" text_content=False,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}, page_content='Bye!')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
" if len(pages) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(pages)\n",
"\n",
" pages = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read from JSON Lines file\n",
"\n",
"If you want to load documents from a JSON Lines file, you pass `json_lines=True`\n",
"and specify `jq_schema` to extract `page_content` from a single JSON object."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}\n"
]
}
],
"source": [
"loader = JSONLoader(\n",
" file_path=\"./example_data/facebook_chat_messages.jsonl\",\n",
" jq_schema=\".content\",\n",
" text_content=False,\n",
" json_lines=True,\n",
")\n",
"\n",
"docs = loader.load()\n",
"print(docs[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read specific content keys\n",
"\n",
"Another option is to set `jq_schema='.'` and provide a `content_key` in order to only load specific content:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='User 2' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}\n"
]
}
],
"source": [
"loader = JSONLoader(\n",
" file_path=\"./example_data/facebook_chat_messages.jsonl\",\n",
" jq_schema=\".\",\n",
" content_key=\"sender_name\",\n",
" json_lines=True,\n",
")\n",
"\n",
"docs = loader.load()\n",
"print(docs[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## JSON file with jq schema `content_key`\n",
"\n",
"To load documents from a JSON file using the `content_key` within the jq schema, set `is_content_key_jq_parsable=True`. Ensure that `content_key` is compatible and can be parsed using the jq schema."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Bye!' metadata={'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1}\n"
]
}
],
"source": [
"loader = JSONLoader(\n",
" file_path=\"./example_data/facebook_chat.json\",\n",
" jq_schema=\".messages[]\",\n",
" content_key=\".content\",\n",
" is_content_key_jq_parsable=True,\n",
")\n",
"\n",
"docs = loader.load()\n",
"print(docs[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extracting metadata\n",
"\n",
"Generally, we want to include metadata available in the JSON file into the documents that we create from the content.\n",
"\n",
"The following demonstrates how metadata can be extracted using the `JSONLoader`.\n",
"\n",
"There are some key changes to be noted. In the previous example where we didn't collect the metadata, we managed to directly specify in the schema where the value for the `page_content` can be extracted from.\n",
"\n",
"In this example, we have to tell the loader to iterate over the records in the `messages` field. The jq_schema then has to be `.messages[]`\n",
"\n",
"This allows us to pass the records (dict) into the `metadata_func` that has to be implemented. The `metadata_func` is responsible for identifying which pieces of information in the record should be included in the metadata stored in the final `Document` object.\n",
"\n",
"Additionally, we now have to explicitly specify in the loader, via the `content_key` argument, the key from the record where the value for the `page_content` needs to be extracted from."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': '/Users/isaachershenson/Documents/langchain/docs/docs/integrations/document_loaders/example_data/facebook_chat.json', 'seq_num': 1, 'sender_name': 'User 2', 'timestamp_ms': 1675597571851}\n"
]
}
],
"source": [
"# Define the metadata extraction function.\n",
"def metadata_func(record: dict, metadata: dict) -> dict:\n",
" metadata[\"sender_name\"] = record.get(\"sender_name\")\n",
" metadata[\"timestamp_ms\"] = record.get(\"timestamp_ms\")\n",
"\n",
" return metadata\n",
"\n",
"\n",
"loader = JSONLoader(\n",
" file_path=\"./example_data/facebook_chat.json\",\n",
" jq_schema=\".messages[]\",\n",
" content_key=\"content\",\n",
" metadata_func=metadata_func,\n",
")\n",
"\n",
"docs = loader.load()\n",
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all JSONLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.json_loader.JSONLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MathPixPDFLoader\n",
"\n",
"Inspired by Daniel Gross's snippet here: [https://gist.github.com/danielgross/3ab4104e14faccc12b49200843adab21](https://gist.github.com/danielgross/3ab4104e14faccc12b49200843adab21)\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [MathPixPDFLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.MathpixPDFLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| MathPixPDFLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"Sign up for Mathpix and [create an API key](https://mathpix.com/docs/ocr/creating-an-api-key) to set the `MATHPIX_API_KEY` variables in your environment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"if \"MATHPIX_API_KEY\" not in os.environ:\n",
" os.environ[\"MATHPIX_API_KEY\"] = getpass.getpass(\"Enter your Mathpix API key: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we are ready to initialize our loader:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import MathpixPDFLoader\n",
"\n",
"file_path = \"./example_data/layout-parser-paper.pdf\"\n",
"loader = MathpixPDFLoader(file_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all MathpixPDFLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.MathpixPDFLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,183 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PDFPlumber\n",
"\n",
"Like PyMuPDF, the output Documents contain detailed metadata about the PDF and its pages, and returns one document per page.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [PDFPlumberLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PDFPlumberLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| PDFPlumberLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed to use this loader."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PDFPlumberLoader\n",
"\n",
"loader = PDFPlumberLoader(\"./example_data/layout-parser-paper.pdf\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'page': 0, 'total_pages': 16, 'Author': '', 'CreationDate': 'D:20210622012710Z', 'Creator': 'LaTeX with hyperref', 'Keywords': '', 'ModDate': 'D:20210622012710Z', 'PTEX.Fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'Producer': 'pdfTeX-1.40.21', 'Subject': '', 'Title': '', 'Trapped': 'False'}, page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 ((cid:0)), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recentadvancesindocumentimageanalysis(DIA)havebeen\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomescouldbeeasilydeployedinproductionandextendedforfurther\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportantinnovationsbyawideaudience.Thoughtherehavebeenon-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopmentindisciplineslikenaturallanguageprocessingandcomputer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademicresearchacross awiderangeof disciplinesinthesocialsciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitiveinterfacesforapplyingandcustomizingDLmodelsforlayoutde-\\ntection,characterrecognition,andmanyotherdocumentprocessingtasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: DocumentImageAnalysis·DeepLearning·LayoutAnalysis\\n· Character Recognition · Open Source library · Toolkit.\\n1 Introduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocumentimageanalysis(DIA)tasksincludingdocumentimageclassification[11,\\n1202\\nnuJ\\n12\\n]VC.sc[\\n2v84351.3012:viXra\\n')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'page': 0, 'total_pages': 16, 'Author': '', 'CreationDate': 'D:20210622012710Z', 'Creator': 'LaTeX with hyperref', 'Keywords': '', 'ModDate': 'D:20210622012710Z', 'PTEX.Fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2', 'Producer': 'pdfTeX-1.40.21', 'Subject': '', 'Title': '', 'Trapped': 'False'}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all PDFPlumberLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PDFPlumberLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,185 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyMuPDF\n",
"\n",
"`PyMuPDF` is optimized for speed, and contains detailed metadata about the PDF and its pages. It returns one document per page.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [PyMuPDFLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| PyMuPDFLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed to use the `PyMuPDFLoader`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community** and **pymupdf**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-community pymupdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can initialize our loader and start loading documents. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyMuPDFLoader\n",
"\n",
"loader = PyMuPDFLoader(\"./example_data/layout-parser-paper.pdf\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load\n",
"\n",
"You can pass along any of the options from the [PyMuPDF documentation](https://pymupdf.readthedocs.io/en/latest/app1.html#plain-text/) as keyword arguments in the `load` call, and it will be pass along to the `get_text()` call."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'page': 0, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': ''}, page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (\\x00), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': './example_data/layout-parser-paper.pdf', 'file_path': './example_data/layout-parser-paper.pdf', 'page': 0, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': ''}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all PyMuPDFLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,187 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyPDFDirectoryLoader\n",
"\n",
"This loader loads all PDF files from a specific directory.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [PyPDFDirectoryLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFDirectoryLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| PyPDFDirectoryLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed for this loader."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFDirectoryLoader\n",
"\n",
"directory_path = (\n",
" \"../../docs/integrations/document_loaders/example_data/layout-parser-paper.pdf\"\n",
")\n",
"loader = PyPDFDirectoryLoader(\"example_data/\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': 'example_data/layout-parser-paper.pdf', 'page': 0}, page_content='LayoutParser : A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1( \\x00), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1Allen Institute for AI\\nshannons@allenai.org\\n2Brown University\\nruochen zhang@brown.edu\\n3Harvard University\\n{melissadell,jacob carlson }@fas.harvard.edu\\n4University of Washington\\nbcgl@cs.washington.edu\\n5University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser , an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io .\\nKeywords: Document Image Analysis ·Deep Learning ·Layout Analysis\\n·Character Recognition ·Open Source library ·Toolkit.\\n1 Introduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [ 11,arXiv:2103.15348v2 [cs.CV] 21 Jun 2021')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': 'example_data/layout-parser-paper.pdf', 'page': 0}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all PyPDFDirectoryLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFDirectoryLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyPDFium2Loader\n",
"\n",
"\n",
"This notebook provides a quick overview for getting started with PyPDFium2 [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). For detailed documentation of all __ModuleName__Loader features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFium2Loader.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [PyPDFium2Loader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFium2Loader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| PyPDFium2Loader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"\n",
"To access PyPDFium2 document loader you'll need to install the `langchain-community` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community**."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFium2Loader\n",
"\n",
"file_path = \"./example_data/layout-parser-paper.pdf\"\n",
"loader = PyPDFium2Loader(file_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'page': 0}, page_content='LayoutParser: A Unified Toolkit for Deep\\r\\nLearning Based Document Image Analysis\\r\\nZejiang Shen\\r\\n1\\r\\n(), Ruochen Zhang\\r\\n2\\r\\n, Melissa Dell\\r\\n3\\r\\n, Benjamin Charles Germain\\r\\nLee\\r\\n4\\r\\n, Jacob Carlson\\r\\n3\\r\\n, and Weining Li\\r\\n5\\r\\n1 Allen Institute for AI\\r\\nshannons@allenai.org 2 Brown University\\r\\nruochen zhang@brown.edu 3 Harvard University\\r\\n{melissadell,jacob carlson}@fas.harvard.edu\\r\\n4 University of Washington\\r\\nbcgl@cs.washington.edu 5 University of Waterloo\\r\\nw422li@uwaterloo.ca\\r\\nAbstract. Recent advances in document image analysis (DIA) have been\\r\\nprimarily driven by the application of neural networks. Ideally, research\\r\\noutcomes could be easily deployed in production and extended for further\\r\\ninvestigation. However, various factors like loosely organized codebases\\r\\nand sophisticated model configurations complicate the easy reuse of im\\x02portant innovations by a wide audience. Though there have been on-going\\r\\nefforts to improve reusability and simplify deep learning (DL) model\\r\\ndevelopment in disciplines like natural language processing and computer\\r\\nvision, none of them are optimized for challenges in the domain of DIA.\\r\\nThis represents a major gap in the existing toolkit, as DIA is central to\\r\\nacademic research across a wide range of disciplines in the social sciences\\r\\nand humanities. This paper introduces LayoutParser, an open-source\\r\\nlibrary for streamlining the usage of DL in DIA research and applica\\x02tions. The core LayoutParser library comes with a set of simple and\\r\\nintuitive interfaces for applying and customizing DL models for layout de\\x02tection, character recognition, and many other document processing tasks.\\r\\nTo promote extensibility, LayoutParser also incorporates a community\\r\\nplatform for sharing both pre-trained models and full document digiti\\x02zation pipelines. We demonstrate that LayoutParser is helpful for both\\r\\nlightweight and large-scale digitization pipelines in real-word use cases.\\r\\nThe library is publicly available at https://layout-parser.github.io.\\r\\nKeywords: Document Image Analysis· Deep Learning· Layout Analysis\\r\\n· Character Recognition· Open Source library· Toolkit.\\r\\n1 Introduction\\r\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\r\\ndocument image analysis (DIA) tasks including document image classification [11,\\r\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': './example_data/layout-parser-paper.pdf', 'page': 0}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all PyPDFium2Loader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFium2Loader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,202 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PyPDFLoader\n",
"\n",
"This notebook provides a quick overview for getting started with `PyPDF` [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). For detailed documentation of all DocumentLoader features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFLoader.html).\n",
"\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"\n",
"| Class | Package | Local | Serializable | JS support|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [PyPDFLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ❌ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| PyPDFLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are required to use `PyPDFLoader`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"To use `PyPDFLoader` you need to have the `langchain-community` python package downloaded:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community pypdf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFLoader\n",
"\n",
"loader = PyPDFLoader(\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'page': 0}, page_content='LayoutParser : A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1( \\x00), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1Allen Institute for AI\\nshannons@allenai.org\\n2Brown University\\nruochen zhang@brown.edu\\n3Harvard University\\n{melissadell,jacob carlson }@fas.harvard.edu\\n4University of Washington\\nbcgl@cs.washington.edu\\n5University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser , an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io .\\nKeywords: Document Image Analysis ·Deep Learning ·Layout Analysis\\n·Character Recognition ·Open Source library ·Toolkit.\\n1 Introduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [ 11,arXiv:2103.15348v2 [cs.CV] 21 Jun 2021')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': './example_data/layout-parser-paper.pdf', 'page': 0}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
" if len(pages) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" pages = []\n",
"len(pages)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"LayoutParser : A Unified Toolkit for DL-Based DIA 11\n",
"focuses on precision, efficiency, and robustness. \n",
"{'source': './example_data/layout-parser-paper.pdf', 'page': 10}\n"
]
}
],
"source": [
"print(pages[0].page_content[:100])\n",
"print(pages[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `PyPDFLoader` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.pdf.PyPDFLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -7,7 +7,18 @@
"source": [
"# Recursive URL\n",
"\n",
"The `RecursiveUrlLoader` lets you recursively scrape all child links from a root URL and parse them into Documents."
"The `RecursiveUrlLoader` lets you recursively scrape all child links from a root URL and parse them into Documents.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/web_loaders/recursive_url_loader/)|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [RecursiveUrlLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.recursive_url_loader.RecursiveUrlLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ✅ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| RecursiveUrlLoader | ✅ | ❌ | \n"
]
},
{
@@ -17,6 +28,12 @@
"source": [
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are required to use the `RecursiveUrlLoader`.\n",
"\n",
"### Installation\n",
"\n",
"The `RecursiveUrlLoader` lives in the `langchain-community` package. There's no other required packages, though you will get richer default Document metadata if you have ``beautifulsoup4` installed as well."
]
},
@@ -186,6 +203,50 @@
"That certainly looks like HTML that comes from the url https://docs.python.org/3.9/, which is what we expected. Let's now look at some variations we can make to our basic example that can be helpful in different situations. "
]
},
{
"cell_type": "markdown",
"id": "b17b7202",
"metadata": {},
"source": [
"## Lazy loading\n",
"\n",
"If we're loading a large number of Documents and our downstream operations can be done over subsets of all loaded Documents, we can lazily load our Documents one at a time to minimize our memory footprint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b13e4d1",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/4j/2rz3865x6qg07tx43146py8h0000gn/T/ipykernel_73962/2110507528.py:6: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features=\"xml\"` into the BeautifulSoup constructor.\n",
" soup = BeautifulSoup(html, \"lxml\")\n"
]
}
],
"source": [
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
" if len(pages) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" pages = []"
]
},
{
"cell_type": "markdown",
"id": "fb039682",
"metadata": {},
"source": [
"In this example we never have more than 10 Documents loaded into memory at a time."
]
},
{
"cell_type": "markdown",
"id": "8f41cc89",
@@ -256,50 +317,6 @@
"You can similarly pass in a `metadata_extractor` to customize how Document metadata is extracted from the HTTP response. See the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.recursive_url_loader.RecursiveUrlLoader.html) for more on this."
]
},
{
"cell_type": "markdown",
"id": "1dddbc94",
"metadata": {},
"source": [
"## Lazy loading\n",
"\n",
"If we're loading a large number of Documents and our downstream operations can be done over subsets of all loaded Documents, we can lazily load our Documents one at a time to minimize our memory footprint:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "7d0114fc",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/4j/2rz3865x6qg07tx43146py8h0000gn/T/ipykernel_73962/2110507528.py:6: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features=\"xml\"` into the BeautifulSoup constructor.\n",
" soup = BeautifulSoup(html, \"lxml\")\n"
]
}
],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
"cell_type": "markdown",
"id": "f88a7c2f-35df-4c3a-b238-f91be2674b96",
"metadata": {},
"source": [
"In this example we never have more than 10 Documents loaded into memory at a time."
]
},
{
"cell_type": "markdown",
"id": "3e4d1c8f",

File diff suppressed because one or more lines are too long

View File

@@ -6,9 +6,35 @@
"source": [
"# Sitemap\n",
"\n",
"Extends from the `WebBaseLoader`, `SitemapLoader` loads a sitemap from a given URL, and then scrape and load all pages in the sitemap, returning each page as a Document.\n",
"Extends from the `WebBaseLoader`, `SitemapLoader` loads a sitemap from a given URL, and then scrapes and loads all pages in the sitemap, returning each page as a Document.\n",
"\n",
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scrapped server, or don't care about load. Note, while this will speed up the scraping process, but it may cause the server to block you. Be careful!"
"The scraping is done concurrently. There are reasonable limits to concurrent requests, defaulting to 2 per second. If you aren't concerned about being a good citizen, or you control the scrapped server, or don't care about load you can increase this limit. Note, while this will speed up the scraping process, it may cause the server to block you. Be careful!\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/web_loaders/sitemap/)|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [SiteMapLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.sitemap.SitemapLoader.html#langchain_community.document_loaders.sitemap.SitemapLoader) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ✅ | ❌ | ✅ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| SiteMapLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"To access SiteMap document loader you'll need to install the `langchain-community` integration package.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed to run this."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
@@ -17,21 +43,55 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet nest_asyncio"
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community**."
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fix notebook asyncio bug"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# fixes a bug with asyncio and jupyter\n",
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents:"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -43,13 +103,63 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"sitemap_loader = SitemapLoader(web_path=\"https://api.python.langchain.com/sitemap.xml\")\n",
"\n",
"docs = sitemap_loader.load()"
"sitemap_loader = SitemapLoader(web_path=\"https://api.python.langchain.com/sitemap.xml\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Fetching pages: 100%|##########| 28/28 [00:04<00:00, 6.83it/s]\n"
]
},
{
"data": {
"text/plain": [
"Document(metadata={'source': 'https://api.python.langchain.com/en/stable/', 'loc': 'https://api.python.langchain.com/en/stable/', 'lastmod': '2024-05-15T00:29:42.163001+00:00', 'changefreq': 'weekly', 'priority': '1'}, page_content='\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLangChain Python API Reference Documentation.\\n\\n\\nYou will be automatically redirected to the new location of this page.\\n\\n')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = sitemap_loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': 'https://api.python.langchain.com/en/stable/', 'loc': 'https://api.python.langchain.com/en/stable/', 'lastmod': '2024-05-15T00:29:42.163001+00:00', 'changefreq': 'weekly', 'priority': '1'}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
@@ -71,24 +181,37 @@
"sitemap_loader.requests_kwargs = {\"verify\": False}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load\n",
"\n",
"You can also load the pages lazily in order to minimize the memory load."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nLangChain Python API Reference Documentation.\\n\\n\\nYou will be automatically redirected to the new location of this page.\\n\\n', metadata={'source': 'https://api.python.langchain.com/en/stable/', 'loc': 'https://api.python.langchain.com/en/stable/', 'lastmod': '2024-02-09T01:10:49.422114+00:00', 'changefreq': 'weekly', 'priority': '1'})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
"name": "stderr",
"output_type": "stream",
"text": [
"Fetching pages: 100%|##########| 28/28 [00:01<00:00, 19.06it/s]\n"
]
}
],
"source": [
"docs[0]"
"page = []\n",
"for doc in sitemap_loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []"
]
},
{
@@ -224,11 +347,13 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all SiteMapLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.sitemap.SitemapLoader.html#langchain_community.document_loaders.sitemap.SitemapLoader"
]
}
],
"metadata": {
@@ -247,7 +372,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -7,20 +7,41 @@
"source": [
"# Unstructured\n",
"\n",
"This notebook covers how to use `Unstructured` package to load files of many types. `Unstructured` currently supports loading of text files, powerpoints, html, pdfs, images, and more.\n",
"This notebook covers how to use `Unstructured` [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders) to load files of many types. `Unstructured` currently supports loading of text files, powerpoints, html, pdfs, images, and more.\n",
"\n",
"Please see [this guide](/docs/integrations/providers/unstructured/) for more instructions on setting up Unstructured locally, including setting up required system dependencies."
"Please see [this guide](../../integrations/providers/unstructured.mdx) for more instructions on setting up Unstructured locally, including setting up required system dependencies.\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/file_loaders/unstructured/)|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [UnstructuredLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_unstructured.document_loaders.UnstructuredLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/unstructured_api_reference.html) | ✅ | ❌ | ✅ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| UnstructuredLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"### Credentials\n",
"\n",
"By default, `langchain-unstructured` installs a smaller footprint that requires offloading of the partitioning logic to the Unstructured API, which requires an API key. If you use the local installation, you do not need an API key. To get your API key, head over to [this site](https://unstructured.io) and get an API key, and then set it in the cell below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "2886982e",
"metadata": {},
"outputs": [],
"source": [
"# Install package, compatible with API partitioning\n",
"%pip install --upgrade --quiet \"langchain-unstructured\""
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"UNSTRUCTURED_API_KEY\"] = getpass.getpass(\n",
" \"Enter your Unstructured API key: \"\n",
")"
]
},
{
@@ -28,14 +49,32 @@
"id": "e75e2a6d",
"metadata": {},
"source": [
"### Local Partitioning (Optional)\n",
"### Installation\n",
"\n",
"By default, `langchain-unstructured` installs a smaller footprint that requires\n",
"offloading of the partitioning logic to the Unstructured API.\n",
"#### Normal Installation\n",
"\n",
"If you would like to run the partitioning logic locally, you will need to install\n",
"a combination of system dependencies, as outlined in the \n",
"[Unstructured documentation here](https://docs.unstructured.io/open-source/installation/full-installation).\n",
"The following packages are required to run the rest of this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9de83b3",
"metadata": {},
"outputs": [],
"source": [
"# Install package, compatible with API partitioning\n",
"%pip install --upgrade --quiet langchain-unstructured unstructured-client unstructured \"unstructured[pdf]\" python-magic"
]
},
{
"cell_type": "markdown",
"id": "637eda35",
"metadata": {},
"source": [
"#### Installation for Local\n",
"\n",
"If you would like to run the partitioning logic locally, you will need to install a combination of system dependencies, as outlined in the [Unstructured documentation here](https://docs.unstructured.io/open-source/installation/full-installation).\n",
"\n",
"For example, on Macs you can install the required dependencies with:\n",
"\n",
@@ -47,7 +86,7 @@
"brew install libxml2 libxslt\n",
"```\n",
"\n",
"You can install the required `pip` dependencies with:\n",
"You can install the required `pip` dependencies needed for local with:\n",
"\n",
"```bash\n",
"pip install \"langchain-unstructured[local]\"\n",
@@ -59,120 +98,117 @@
"id": "a9c1c775",
"metadata": {},
"source": [
"### Quickstart\n",
"## Initialization\n",
"\n",
"To simply load a file as a document, you can use the LangChain `DocumentLoader.load` \n",
"interface:"
"The `UnstructuredLoader` allows loading from a variety of different file types. To read all about the `unstructured` package please refer to their [documentation](https://docs.unstructured.io/open-source/introduction/overview)/. In this example, we show loading from both a text file and a PDF file."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "79d3e549",
"metadata": {},
"outputs": [],
"source": [
"from langchain_unstructured import UnstructuredLoader\n",
"\n",
"loader = UnstructuredLoader(\"./example_data/state_of_the_union.txt\")\n",
"file_paths = [\n",
" \"./example_data/layout-parser-paper.pdf\",\n",
" \"./example_data/state_of_the_union.txt\",\n",
"]\n",
"\n",
"docs = loader.load()"
"\n",
"loader = UnstructuredLoader(file_paths)"
]
},
{
"cell_type": "markdown",
"id": "b4ab0a79",
"id": "8b68dcab",
"metadata": {},
"source": [
"### Load list of files"
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "092d9a0b",
"execution_count": 2,
"id": "8da59ef8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO: NumExpr defaulting to 12 threads.\n",
"INFO: pikepdf C++ to Python logger bridge initialized\n"
]
},
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}, page_content='1 2 0 2')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "97f7aa1f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"whatsapp_chat.txt : 1/22/23, 6:30 PM - User 1: Hi! Im interested in your bag. Im offering $50. Let me know if you are in\n",
"state_of_the_union.txt : May God bless you all. May God protect our troops.\n"
"{'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}\n"
]
}
],
"source": [
"file_paths = [\n",
" \"./example_data/whatsapp_chat.txt\",\n",
" \"./example_data/state_of_the_union.txt\",\n",
"]\n",
"\n",
"loader = UnstructuredLoader(file_paths)\n",
"\n",
"docs = loader.load()\n",
"\n",
"print(docs[0].metadata.get(\"filename\"), \": \", docs[0].page_content[:100])\n",
"print(docs[-1].metadata.get(\"filename\"), \": \", docs[-1].page_content[:100])"
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"id": "8de9ef16",
"id": "0d7f991b",
"metadata": {},
"source": [
"## PDF Example\n",
"\n",
"Processing PDF documents works exactly the same way. Unstructured detects the file type and extracts the same types of elements."
]
},
{
"cell_type": "markdown",
"id": "672733fd",
"metadata": {},
"source": [
"### Define a Partitioning Strategy\n",
"\n",
"Unstructured document loader allow users to pass in a `strategy` parameter that lets Unstructured\n",
"know how to partition pdf and other OCR'd documents. Currently supported strategies are `\"auto\"`,\n",
"`\"hi_res\"`, `\"ocr_only\"`, and `\"fast\"`. Learn more about the different strategies\n",
"[here](https://docs.unstructured.io/open-source/core-functionality/partitioning#partition-pdf). \n",
"\n",
"Not all document types have separate hi res and fast partitioning strategies. For those document types, the `strategy` kwarg is\n",
"ignored. In some cases, the high res strategy will fallback to fast if there is a dependency missing\n",
"(i.e. a model for document partitioning). You can see how to apply a strategy to an\n",
"`UnstructuredLoader` below."
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "60685353",
"execution_count": 4,
"id": "b05604d2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 393.9), (16.34, 560.0), (36.34, 560.0), (36.34, 393.9)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'parent_id': '89565df026a24279aaea20dc08cedbec', 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'e9fa370aef7ee5c05744eb7bb7d9981b'}, page_content='2 v 8 4 3 5 1 . 3 0 1 2 : v i X r a'),\n",
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((157.62199999999999, 114.23496279999995), (157.62199999999999, 146.5141628), (457.7358962799999, 146.5141628), (457.7358962799999, 114.23496279999995)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'Title', 'element_id': 'bde0b230a1aa488e3ce837d33015181b'}, page_content='LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis'),\n",
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((134.809, 168.64029940800003), (134.809, 192.2517444), (480.5464199080001, 192.2517444), (480.5464199080001, 168.64029940800003)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'parent_id': 'bde0b230a1aa488e3ce837d33015181b', 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': '54700f902899f0c8c90488fa8d825bce'}, page_content='Zejiang Shen1 ((cid:0)), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain Lee4, Jacob Carlson3, and Weining Li5'),\n",
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((207.23000000000002, 202.57205439999996), (207.23000000000002, 311.8195408), (408.12676, 311.8195408), (408.12676, 202.57205439999996)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'page_number': 1, 'parent_id': 'bde0b230a1aa488e3ce837d33015181b', 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'b650f5867bad9bb4e30384282c79bcfe'}, page_content='1 Allen Institute for AI shannons@allenai.org 2 Brown University ruochen zhang@brown.edu 3 Harvard University {melissadell,jacob carlson}@fas.harvard.edu 4 University of Washington bcgl@cs.washington.edu 5 University of Waterloo w422li@uwaterloo.ca'),\n",
" Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((162.779, 338.45008160000003), (162.779, 566.8455408), (454.0372021523199, 566.8455408), (454.0372021523199, 338.45008160000003)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-02-27T15:49:27', 'links': [{'text': ':// layout - parser . github . io', 'url': 'https://layout-parser.github.io', 'start_index': 1477}], 'page_number': 1, 'parent_id': 'bde0b230a1aa488e3ce837d33015181b', 'filetype': 'application/pdf', 'category': 'NarrativeText', 'element_id': 'cfc957c94fe63c8fd7c7f4bcb56e75a7'}, page_content='Abstract. Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of im- portant innovations by a wide audience. Though there have been on-going efforts to improve reusability and simplify deep learning (DL) model development in disciplines like natural language processing and computer vision, none of them are optimized for challenges in the domain of DIA. This represents a major gap in the existing toolkit, as DIA is central to academic research across a wide range of disciplines in the social sciences and humanities. This paper introduces LayoutParser, an open-source library for streamlining the usage of DL in DIA research and applica- tions. The core LayoutParser library comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout de- tection, character recognition, and many other document processing tasks. To promote extensibility, LayoutParser also incorporates a community platform for sharing both pre-trained models and full document digiti- zation pipelines. We demonstrate that LayoutParser is helpful for both lightweight and large-scale digitization pipelines in real-word use cases. The library is publicly available at https://layout-parser.github.io.')]"
"Document(metadata={'source': './example_data/layout-parser-paper.pdf', 'coordinates': {'points': ((16.34, 213.36), (16.34, 253.36), (36.34, 253.36), (36.34, 213.36)), 'system': 'PixelSpace', 'layout_width': 612, 'layout_height': 792}, 'file_directory': './example_data', 'filename': 'layout-parser-paper.pdf', 'languages': ['eng'], 'last_modified': '2024-07-25T21:28:58', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'UncategorizedText', 'element_id': 'd3ce55f220dfb75891b4394a18bcb973'}, page_content='1 2 0 2')"
]
},
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_unstructured import UnstructuredLoader\n",
"pages = []\n",
"for doc in loader.lazy_load():\n",
" pages.append(doc)\n",
"\n",
"loader = UnstructuredLoader(\"./example_data/layout-parser-paper.pdf\", strategy=\"fast\")\n",
"\n",
"docs = loader.load()\n",
"\n",
"docs[5:10]"
"pages[0]"
]
},
{
@@ -241,23 +277,6 @@
"if youd like to self-host the Unstructured API or run it locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e5fde16",
"metadata": {},
"outputs": [],
"source": [
"# Install package\n",
"%pip install \"langchain-unstructured\"\n",
"%pip install \"unstructured-client\"\n",
"\n",
"# Set API key\n",
"import os\n",
"\n",
"os.environ[\"UNSTRUCTURED_API_KEY\"] = \"FAKE_API_KEY\""
]
},
{
"cell_type": "code",
"execution_count": 9,
@@ -358,8 +377,9 @@
"Partitioning with the Unstructured API relies on the [Unstructured SDK\n",
"Client](https://docs.unstructured.io/api-reference/api-services/sdk).\n",
"\n",
"Below is an example showing how you can customize some features of the client and use your own\n",
"`requests.Session()`, pass in an alternative `server_url`, or customize the `RetryConfig` object for more control over how failed requests are handled."
"Below is an example showing how you can customize some features of the client and use your own `requests.Session()`, pass in an alternative `server_url`, or customize the `RetryConfig` object for more control over how failed requests are handled.\n",
"\n",
"Note that the example below may not use the latest version of the UnstructuredClient and there could be breaking changes in future releases. For the latest examples, refer to the [Unstructured Python SDK](https://docs.unstructured.io/api-reference/api-services/sdk-python) docs."
]
},
{
@@ -494,6 +514,16 @@
"print(\"Number of LangChain documents:\", len(docs))\n",
"print(\"Length of text in the document:\", len(docs[0].page_content))"
]
},
{
"cell_type": "markdown",
"id": "ce01aa40",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all `UnstructuredLoader` features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_unstructured.document_loaders.UnstructuredLoader.html"
]
}
],
"metadata": {
@@ -512,7 +542,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.11.9"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,269 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# UnstructuredMarkdownLoader\n",
"\n",
"This notebook provides a quick overview for getting started with UnstructuredMarkdown [document loader](https://python.langchain.com/v0.2/docs/concepts/#document-loaders). For detailed documentation of all __ModuleName__Loader features and configurations head to the [API reference](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html).\n",
"\n",
"## Overview\n",
"### Integration details\n",
"\n",
"\n",
"| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/document_loaders/file_loaders/unstructured/)|\n",
"| :--- | :--- | :---: | :---: | :---: |\n",
"| [UnstructuredMarkdownLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html) | [langchain_community](https://api.python.langchain.com/en/latest/community_api_reference.html) | ❌ | ❌ | ✅ | \n",
"### Loader features\n",
"| Source | Document Lazy Loading | Native Async Support\n",
"| :---: | :---: | :---: | \n",
"| UnstructuredMarkdownLoader | ✅ | ❌ | \n",
"\n",
"## Setup\n",
"\n",
"To access UnstructuredMarkdownLoader document loader you'll need to install the `langchain-community` integration package and the `unstructured` python package.\n",
"\n",
"### Credentials\n",
"\n",
"No credentials are needed to use this loader."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to get automated best in-class tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
"# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"Install **langchain_community** and **unstructured**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain_community unstructured"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialization\n",
"\n",
"Now we can instantiate our model object and load documents. \n",
"\n",
"You can run the loader in one of two modes: \"single\" and \"elements\". If you use \"single\" mode, the document will be returned as a single `Document` object. If you use \"elements\" mode, the unstructured library will split the document into elements such as `Title` and `NarrativeText`. You can pass in additional `unstructured` kwargs after mode to apply different `unstructured` settings."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import UnstructuredMarkdownLoader\n",
"\n",
"loader = UnstructuredMarkdownLoader(\n",
" \"./example_data/example.md\",\n",
" mode=\"single\",\n",
" strategy=\"fast\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/example.md'}, page_content='Sample Markdown Document\\n\\nIntroduction\\n\\nWelcome to this sample Markdown document. Markdown is a lightweight markup language used for formatting text. It\\'s widely used for documentation, readme files, and more.\\n\\nFeatures\\n\\nHeaders\\n\\nMarkdown supports multiple levels of headers:\\n\\nHeader 1: # Header 1\\n\\nHeader 2: ## Header 2\\n\\nHeader 3: ### Header 3\\n\\nLists\\n\\nUnordered List\\n\\nItem 1\\n\\nItem 2\\n\\nSubitem 2.1\\n\\nSubitem 2.2\\n\\nOrdered List\\n\\nFirst item\\n\\nSecond item\\n\\nThird item\\n\\nLinks\\n\\nOpenAI is an AI research organization.\\n\\nImages\\n\\nHere\\'s an example image:\\n\\nCode\\n\\nInline Code\\n\\nUse code for inline code snippets.\\n\\nCode Block\\n\\n```python def greet(name): return f\"Hello, {name}!\"\\n\\nprint(greet(\"World\")) ```')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'source': './example_data/example.md'}\n"
]
}
],
"source": [
"print(docs[0].metadata)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lazy Load"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'source': './example_data/example.md', 'link_texts': ['OpenAI'], 'link_urls': ['https://www.openai.com'], 'last_modified': '2024-08-14T15:04:18', 'languages': ['eng'], 'parent_id': 'de1f74bf226224377ab4d8b54f215bb9', 'filetype': 'text/markdown', 'file_directory': './example_data', 'filename': 'example.md', 'category': 'NarrativeText', 'element_id': '898a542a261f7dc65e0072d1e847d535'}, page_content='OpenAI is an AI research organization.')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"page = []\n",
"for doc in loader.lazy_load():\n",
" page.append(doc)\n",
" if len(page) >= 10:\n",
" # do some paged operation, e.g.\n",
" # index.upsert(page)\n",
"\n",
" page = []\n",
"page[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Elements\n",
"\n",
"In this example we will load in the `elements` mode, which will return a list of the different elements in the markdown document:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"29"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_community.document_loaders import UnstructuredMarkdownLoader\n",
"\n",
"loader = UnstructuredMarkdownLoader(\n",
" \"./example_data/example.md\",\n",
" mode=\"elements\",\n",
" strategy=\"fast\",\n",
")\n",
"\n",
"docs = loader.load()\n",
"len(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see there are 29 elements that were pulled from the `example.md` file. The first element is the title of the document as expected:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Sample Markdown Document'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0].page_content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API reference\n",
"\n",
"For detailed documentation of all UnstructuredMarkdownLoader features and configurations head to the API reference: https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

Some files were not shown because too many files have changed in this diff Show More