Compare commits

..

495 Commits

Author SHA1 Message Date
leo-gan
8a17b34193 second version 2024-01-29 21:48:44 -08:00
leo-gan
44322081aa try to fix 2024-01-08 14:24:02 -08:00
Eugene Yurtsev
3a8ad90509 langchain(patch): Fix output type for pydantic output parser (#15714)
This PR fixes the output type for the pydantic output parser.

Fix for: https://github.com/langchain-ai/langserve/issues/301
2024-01-08 16:53:10 -05:00
Erick Friis
95a2c92e26 experimental[patch]: minimum version bump (#15724)
- experimental: minimum version bump
- actually 0.1.5
- actually 0.1.7
2024-01-08 13:04:57 -08:00
Erick Friis
6c9b7c2cec experimental: minimum version bump (#15722)
experimental relies on `from langchain_core.runnables.config import
run_in_executor` which was introduced in core 0.1.5.

Updated pyproject dependency as well as minimum version test.
2024-01-08 12:58:24 -08:00
Kane Sweet
167a0ac5f5 docs: update aws_dynamodb integration doc (#15666)
- **Description:** 
- Updated the docs for the memory integration module
`aws_dynamodb.ipynb`
  - **Issue:** 
    - #15664 
  - **Dependencies:** 
    - N/A
2024-01-08 12:27:29 -08:00
Ian
32ec56194b community: fix myscale delete function bug (#15675)
Now the SQL used to delete vector doc from myscale is as follow:
```sql
DELETE FROM collection WHERE id = '1' AND id = '2' AND id = '3'
```

But the expected one should be 

```sql
DELETE FROM collection WHERE id IN ('1', '2', '3')
```
2024-01-08 12:26:29 -08:00
Hamza Kyamanywa
fc3cb64dc3 langchain-docs: Correct the word "iteratively" in use-cases documentation (#15697)
Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
**Description:** Fixes the word "iteratively" in the use-cases
documentation
**Twitter handle:** @untilhamza
2024-01-08 12:24:00 -08:00
Christophe Bornet
a466f79ac9 Fix AstraDB logical operator filtering (#15699)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
This change fixes the AstraDB logical operator filtering (`$and,`
`$or`).
The `metadata` prefix must not be added if the key is `$and` or `$or`.
2024-01-08 12:23:46 -08:00
Christophe Bornet
1f5f6381ec Add doc for AstraDB document loader (#15703)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
See preview :
https://langchain-git-fork-cbornet-astra-loader-doc-langchain.vercel.app/docs/integrations/document_loaders/astradb
2024-01-08 12:21:46 -08:00
Eugene Yurtsev
b508fcce65 core(minor): Add a way to print out system information for debugging purposes. (#15718)
To use: 

```bash
python -m langchain_core.sys_info
```
2024-01-08 12:20:18 -08:00
MING KANG
c3624b416d docs: fix llm/chat_model tables (#15716)
- **Description:** This PR aims to fix the documentation for
langchain-commnuity.

- **Issue:** The table In this page :
[https://python.langchain.com/docs/integrations/llms/](https://urldefense.com/v3/__https://python.langchain.com/docs/integrations/llms/__;!!ACWV5N9M2RV99hQ!Jqw8gWnQrL1H6blPiGN10jrh1TDAzqGcKAaTAZv7TBy1X_m-03E7T-alOrWY5_71R8QUdONvF2wMRK54D50$)
is built based on old module. The proposed fix is to modify the import
from langchain-commnuity.
  
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2024-01-08 11:40:35 -08:00
Erick Friis
94911ae503 community[patch]: Support different Pinecone initializations depending on the version (#15717)
Co-authored-by: DosticJelena <jelenadostic2@gmail.com>
2024-01-08 11:33:36 -08:00
Bagatur
c0eb2482c3 docs: add LangGraph (#15682) 2024-01-08 08:38:14 -08:00
Harrison Chase
3e7a590a43 dont use docarray (#15710)
theres some issue with the integration currently so dont use it
2024-01-08 07:54:10 -08:00
Bagatur
4c47f39fcb community[patch]: Release 0.0.10 (#15678) 2024-01-08 00:24:45 -05:00
Bagatur
60f925d678 core[patch]: Release 0.1.8 (#15677) 2024-01-08 00:05:12 -05:00
Nuno Campos
7ce4cd0709 Do not issue beta or deprecation warnings on internal calls (#15641) 2024-01-07 20:54:45 -08:00
Nuno Campos
ef22559f1f Populate streamed_output for all runs handled by atransform_stream_with_config (#15599)
This means that users of astream_log() now get streamed output of
virtually all requested runs, whereas before the only streamed output
would be for the root run and raw llm runs

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-07 19:35:43 -08:00
abzachshan
7025fa23aa Docs: Add missing import of 'ConfigurableField' in 'Full code comparison' example in LCEL (#15661)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
- **Description:** Add missing import of 'ConfigurableField' in 'Full
code comparison' example in LCEL
  - **Issue:** Example code not running
  - **Dependencies:** None
  - **Twitter handle:** @heyyoshan
2024-01-07 13:45:32 -08:00
Harrison Chase
38ae4df3a1 update ragatouille integration (#15658) 2024-01-07 10:51:34 -08:00
Earlee
98c6c9603e community: fix: should flush after inserting data on milvus (#15568)
The inserted data cannot take effect immediately. We should flush after
inserting data on milvus.
2024-01-07 09:33:47 -08:00
chyroc
a17a3638b5 Docs: fix excel document loader typo (#15470) 2024-01-07 09:33:35 -08:00
Shaurya Rohatgi
1bfb1725a1 fix: Ollama import statements (#15493)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-07 09:31:24 -08:00
chyroc
9ae901c5e6 Feat: add CHM file loader (#15519)
fix https://github.com/langchain-ai/langchain/issues/15469
2024-01-07 09:28:52 -08:00
Nan LI
0b393315ce community: Correct Input API Key Name in Notebook and Enhance Readability of Comments for ZhipuAI Chat Model (#15529)
- **Description:** This update rectifies an error in the notebook by
changing the input variable from `zhipu_api_key` to `api_key`. It also
includes revisions to comments to improve program readability.
- **Issue:** The input variable in the notebook example should be
`api_key` instead of `zhipu_api_key`.
- **Dependencies:** No additional dependencies are required for this
change.

To ensure quality and standards, we have performed extensive linting and
testing. Commands such as make format, make lint, and make test have
been run from the root of the modified package to ensure compliance with
LangChain's coding standards.
2024-01-07 09:27:47 -08:00
kursathalat
9ea28ee464 fix: Fix DEFAULT_API_KEY for ArgillaCallbackHandler (#15534)
- ArgillaCallbackHandler does not properly set the default values while
initializing. This PR corrects the line.
- Issue: #15531 
- Dependencies: Argilla

- Also corrected some dead links.
2024-01-07 09:26:51 -08:00
V.Prasanna kumar
378d40f3ea changed broken link for wandb tracing with agent (#15578)
fix of #14905 

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-07 08:48:15 -08:00
Ammar Azman
a37389ac59 Adding reading source for Curie model (#15569)
Improving documentation

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** Adding resource for Curie model
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
  - **Twitter handle:** @mmarccode

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-07 08:42:17 -08:00
Bagatur
4759d10cf6 docs: add changelog (#15606)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-07 08:34:34 -08:00
Chad Norvell
d1bfb70bc4 community: Allow deleting by ID and collection in pgvector (#15627)
- **Description:** The `delete_collection` method deletes an entire
collection regardless of custom ID. The `delete` method deletes
everything with the provided custom IDs regardless of collection. It can
be useful to restrict deletion to both the collection and a set of
custom IDs. This change adds support for that by allowing you to
optionally specify that `delete` should be restricted to the collection
defined on the `PGVector` instance.
2024-01-07 08:33:21 -08:00
Chad Norvell
f6226d464e community: Include PDF ID in MathPix metadata (#15629)
- **Description:** Includes the PDF ID in the MathPix document metadata.
This is useful in case you need to re-request a processed PDF from the
MathPix API later.
2024-01-07 08:31:53 -08:00
Chad Norvell
d2a686b165 community: Provide more actionable errors in the MathPix PDF loader (#15630)
- **Description:** The `error_info['id']` can be cross-referenced with
the MathPix API documentation to get very specific information about why
an error occurred.
2024-01-07 08:31:09 -08:00
Usama Shahid
f0128dbcde Update openai_tools.ipynb (#15649)
Description: Update openai_tools.ipynb
Issue: The distinction between OpenAI function agents and OpenAI tools
was not adequately emphasised.
2024-01-07 08:30:30 -08:00
Kai
5d05df4bce community: Fixed bug of "system message check" in chat_models/tongyi. (#15631)
- **Description:** This PR is to fix a bug of "system message check" in
langchain_community/ chat_models/tongyi.py
- **Issue:** In term of current logic, if there's no system message in
the chat messages, an error of "System message can only be the first
message." will be wrongly raised.
  - **Dependencies:** No.
  - **Twitter handle:** I don't have a Twitter account.
2024-01-07 08:30:18 -08:00
Erick Friis
08be477c24 templates: 0.1 bump (#15648) 2024-01-06 18:31:46 -08:00
Raunak
64f5968a81 community: Replaced hardcoded "metadata" with FIELDS_METADATA variable in semantic_hybrid_search_with_score_and_rerank (#15642)
- **Description:** This PR is to fix a bug in
semantic_hybrid_search_with_score_and_rerank() function in
langchain_community/vectorstores/azuresearch.py. The hardcoded
"metadata" name is replaced with FIELDS_METADATA variable with an if
block to check if the metadata column exists or not.
- **Issue:** Fixed #15581
- **Dependencies:** No
- **Twitter handle:** None

Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>
2024-01-06 17:04:59 -08:00
Harrison Chase
472f70c54b fix docs build (#15645) 2024-01-06 16:26:34 -08:00
Erick Friis
b1fa726377 docs: langchain-openai (#15513)
Updates docs and cookbooks to import ChatOpenAI, OpenAI, and OpenAI
Embeddings from `langchain_openai`

There are likely more

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-06 15:54:48 -08:00
Harrison Chase
be612f408e move output parser table (#15637) 2024-01-06 15:40:13 -08:00
Bagatur
14c5c15958 experimental[patch]: Release 0.0.48 (#15483) 2024-01-06 12:46:00 -05:00
Erick Friis
d136925c49 community[patch]: fix deprecation warnings on openai subclasses (#15621) 2024-01-05 18:02:17 -08:00
Bagatur
4ac61670b2 infra: fix langchain openai test dep (#15620) 2024-01-05 20:14:22 -05:00
Bagatur
81810cec2e langchain[minor]: Release 0.1.0 (#15619) 2024-01-05 19:33:35 -05:00
Bagatur
c5226d7a18 docs: update cohere chat integration (#15562)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-05 16:33:29 -08:00
Erick Friis
1bc6b19ea7 openai[patch]: v0.0.2 (#15618) 2024-01-05 16:33:10 -08:00
Bagatur
46446a100d core[patch]: deprecate v1 tracer (#15608) 2024-01-05 19:25:19 -05:00
Bagatur
dbb582d227 infra: community bump min core version (#15617) 2024-01-05 19:17:48 -05:00
Bagatur
1e4b8f0453 community[patch]: Release 0.0.9 (#15615) 2024-01-05 19:11:18 -05:00
Erick Friis
7f8baa030b openai: core version, rc1 (#15614) 2024-01-05 15:57:23 -08:00
Erick Friis
98be1e5ed0 infra: title release action runs (#15612)
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#run-name
2024-01-05 15:24:57 -08:00
Erick Friis
5ac3a06378 google-vertexai: release 0.0.1 (#15613) 2024-01-05 15:24:23 -08:00
Bagatur
96b47e18e0 core[patch]: Release 0.1.7 (#15610) 2024-01-05 18:24:11 -05:00
Erick Friis
b257c7d0ea google-vertexai, openai: release candidate version (#15611) 2024-01-05 15:05:27 -08:00
Erick Friis
1a42ad353a infra: vertex integration test creds (#15609) 2024-01-05 15:03:39 -08:00
Erick Friis
ebc75c5ca7 openai[minor]: implement langchain-openai package (#15503)
Todo

- [x] copy over integration tests
- [x] update docs with new instructions in #15513 
- [x] add linear ticket to bump core -> community, community->langchain,
and core->openai deps
- [ ] (optional): add `pip install langchain-openai` command to each
notebook using it
- [x] Update docstrings to not need `openai` install
- [x] Add serialization
- [x] deprecate old models

Contributor steps:

- [x] Add secret names to manual integrations workflow in
.github/workflows/_integration_test.yml
- [x] Add secrets to release workflow (for pre-release testing) in
.github/workflows/_release.yml

Maintainer steps (Contributors should not do these):

- [x] set up pypi and test pypi projects
- [x] add credential secrets to Github Actions
- [ ] add package to conda-forge


Functional changes to existing classes:

- now relies on openai client v1 (1.6.1) via concrete dep in
langchain-openai package

Codebase organization

- some function calling stuff moved to
`langchain_core.utils.function_calling` in order to be used in both
community and langchain-openai
2024-01-05 15:03:28 -08:00
Bagatur
a7d023aaf0 core[patch], community[patch]: mark runnable context, lc load as beta (#15603) 2024-01-05 17:54:26 -05:00
Bagatur
75281af822 docs: Fix chain redirects (#15600) 2024-01-05 15:07:30 -05:00
Leonid Kuligin
f73bf4ee54 google-vertexai: added langchain_google_vertexai package (#15218)
added langchain_google_vertexai package

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-05 10:44:10 -08:00
Bagatur
e1fc4d5b95 core[patch]: add beta decorator (#15589) 2024-01-05 13:16:27 -05:00
Harrison Chase
b484d941ae update memory (#15507) 2024-01-05 09:49:26 -08:00
Bagatur
68eb3053e7 langchain[patch]: deprecate old agent classes and methods (#15558) 2024-01-05 12:42:54 -05:00
Harrison Chase
9b9449750c update chain docs (#15495)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-01-05 09:15:00 -08:00
Bagatur
00dfbd2a99 core[minor], langchain[minor]: deprecate old Chain and LLM methods (#15499) 2024-01-05 11:58:35 -05:00
Harrison Chase
fd5fbb507d fix links (#15566)
there are still a few broken ones:

- some in the chains docs, which I will delete soon :)
- some pointing to a sqlite tool, which we should add
2024-01-04 21:57:30 -08:00
Matthew Kwiatkowski
7c4fe58f55 Docs: Fix typos in question_answering (#15565)
**Description**: Fixed a minor typo in the RAG Docs:
- ~~This usually happen offline~~ -> This usually happen**s** offline
2024-01-04 21:57:21 -08:00
chyroc
f12b5c1222 Feat: support Milvus more params (#15447)
fix https://github.com/langchain-ai/langchain/issues/15442
2024-01-04 20:07:23 -08:00
V.Prasanna kumar
aa1c7a56a9 docs: removed deprecated openai model (#15533)
removed the deprecated model from text embedding page of openai notebook
and added the suggested model from openai page


<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-04 17:53:42 -08:00
Bagatur
f5e4f0b30b langchain[minor]: add warnings when importing integrations (#15505)
Should be imported from community directly
2024-01-04 17:41:45 -05:00
Harrison Chase
14966581df add ragatouille (#15561) 2024-01-04 13:45:20 -08:00
Eugene Yurtsev
bf0b3cc0b5 core[patch]: Further restrict recursive URL loader (#15559)
Includes code from this PR:  https://github.com/langchain-ai/langchain/compare/HEAD...m0kr4n3:security/fix_ssrf 
with additional fixes 

Unit tests cover new test cases
2024-01-04 16:33:57 -05:00
Bagatur
817b84de9e core[patch]: Release 0.1.6 (#15547) 2024-01-04 11:02:04 -05:00
Bagatur
b2f15738dd core[patch], langchain[patch], community[patch]: Revert #15326 (#15546) 2024-01-04 10:39:37 -05:00
Harrison Chase
7a93356cbc add new chain howtos (#15430) 2024-01-03 21:19:58 -08:00
Erick Friis
81886ad345 docs: fix broken link (#15509) 2024-01-03 16:00:18 -08:00
Erick Friis
02f9c76791 docs: broken link in contributor docs (#15436) 2024-01-03 13:45:33 -08:00
Erick Friis
1437872df9 infra: fail check_diffs if too many files changed (#15423)
Jobs like
https://github.com/langchain-ai/langchain/actions/runs/7389187843/job/20101494206
only receive the first 300 changed files. Because of the opportunity to
miss packages, better to auto-fail and manually run.

Checking that it does what I expect in #15424
2024-01-03 13:30:16 -08:00
Erick Friis
69a8a26683 templates: fix deps (#15439) 2024-01-03 13:28:05 -08:00
Erick Friis
70beb2e40d docs: contributor faq (#15502) 2024-01-03 13:19:39 -08:00
Bagatur
6e90b7a91b langchain[patch]: bump community >=0.0.8,<0.1 (#15492) 2024-01-03 13:31:48 -05:00
Bagatur
8b7d6531a5 langchain[patch]: Release 0.0.354 (#15482) 2024-01-03 12:51:55 -05:00
Bagatur
0b579dc623 infra: update community test min reqs (#15490) 2024-01-03 12:13:29 -05:00
Bagatur
266db0efc8 community[patch]: bump core version >=0.1.5,<0.2 (#15488) 2024-01-03 12:03:31 -05:00
Bagatur
63e0cae2b1 infra: fix min deps test (#15486) 2024-01-03 11:34:46 -05:00
Bagatur
a2324ee533 community[patch]: Release 0.0.8 (#15481) 2024-01-03 11:28:50 -05:00
Bagatur
54b58c03db infra: add minimum deps pre release check (#15485) 2024-01-03 11:28:35 -05:00
Bagatur
b317ad2472 core[patch]: Release 0.1.5 (#15480) 2024-01-03 10:26:27 -05:00
Bagatur
baeac236b6 langchain[patch], experimental[patch]: update utilities imports (#15438) 2024-01-03 02:18:15 -05:00
Harutaka Kawamura
73da8f863c Remove unused Params (#14385)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Removes unused `Params` in `libs/langchain/langchain/llms/mlflow.py`.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 22:45:18 -08:00
chyroc
b65e57971e Patch: improve type hint (#15451) 2024-01-02 22:39:27 -08:00
Harutaka Kawamura
8ebf55ebbf Fix llms.Mlflow example (#14386)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

The example code for `llms.Mlflow` is outdated.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 22:35:13 -08:00
Nolan
6c4b5a4eff Add option to preserve headers in MarkdownHeaderTextSplitter (#14433)
- **Description:** `MarkdownHeaderTextSplitter` currently strips header
lines from chunked content. Many applications require these header lines
are preserved. This adds an optional parameter to preserve those headers
in the chunked content.
  - **Issue:** #2836 (relevant)
  - **Dependencies:** -
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @finnless

Unit tests and new examples in notebook included.

cc @rlancemartin

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 22:34:52 -08:00
Xin Liu
0a7d360ba4 feat: new integration wasm_chat (#14787)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Adds `WasmChat` integration. `WasmChat` runs GGUF models locally or via
chat service in lightweight and secure WebAssembly containers. In this
PR, `WasmChatService` is introduced as the first step of the
integration. `WasmChatService` is driven by
[llama-api-server](https://github.com/second-state/llama-utils) and
[WasmEdge Runtime](https://wasmedge.org/).

---------

Signed-off-by: Xin Liu <sam@secondstate.io>
2024-01-02 22:33:14 -08:00
Harrison Chase
51dcb89a72 cleanup getting started (#15450) 2024-01-02 22:26:35 -08:00
Leonid Ganeline
2bbee894bb fixed a dependency duplicate (#15444)
BaseModel is derived twice. Left only one.
2024-01-02 21:40:04 -08:00
William FH
65afc13b8b [Improvement] Evals: Add git info (#15446) 2024-01-02 20:08:50 -08:00
Anush
58cc7878e9 refactor: Qdrant async improvements (#14492)
Follow up on https://github.com/langchain-ai/langchain/pull/13048.
This PR intends to simplify the Qdrant async implementation by replacing
the internal GRPC methods with the `QdrantAsyncClient` methods.
This is a backward compatible change with no additional steps required
after merge.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 20:07:48 -08:00
Li-Lun Lin
cda68d717c core[patch]: update LanguageModelInput from List to Sequence (#14405)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-01-02 18:49:01 -08:00
JuR-0
4dab37741a Fix Bedrock broad error catching (#14398)
Fixes #14347 

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** Added the traceback of the previous error to keep the
initial error type,
  - **Issue:** #14347 ,
  - **Dependencies:** None,
  - **Tag maintainer:** 

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Julien Raffy <julien.raffy@emeria.eu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 17:25:48 -08:00
amaleki2
413a56b8f1 adding vectorstore_kwarg attribute to search_similarity function (#14604)
- **Description:** the ability to add all extra parameter of vectorstore
and using them SemanticSimilarityExampleSelector.
  - **Issue:** #14583
  - **Dependencies:** no dependensies
  - **Tag maintainer:** 
  - **Twitter handle:** @AmirMalekiz

---------

Co-authored-by: Amir Maleki <amaleki@fb.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 17:18:33 -08:00
Bob Lin
e93be14c11 Improvement: Allow passing parameters to the underlying es_client. Closes: #14403 (#14435)
### Description

In https://github.com/langchain-ai/langchain/issues/14403, the user
mentioned that he hopes not to verify ssl and needs to pass more
parameters

I found that the `Elasticsearch` class [has very many
parameters](98f2af2134/elasticsearch/_sync/client/__init__.py (L131-L191)
):

<img width="1097" alt="Screenshot 2023-12-08 at 4 24 39 PM"
src="https://github.com/langchain-ai/langchain/assets/10000925/f2201554-b41a-4388-a8e8-c14a2d0466d4">

In order to adapt to more situations, I want to add the kwargs parameter
so that users can enter more `Elasticsearch` parameters. Like
[redis](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/vectorstores/redis/base.py#L253),
[tair](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/vectorstores/tair.py#L32),
[myscale](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/vectorstores/myscale.py#L112)
and so on.
2024-01-02 16:48:17 -08:00
codehound42
8aa921d3a4 Support score_threshold in SupabaseVectorStore similarity search (#14439)
Description: Add support for setting the `score_threshold` for
similarity search in SupabaseVectoreStore.

This pull request addresses issue #14438

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 16:47:05 -08:00
Antonio Pisani
d4a98e4e04 core: update json output parser (#15079)
- **Description:** changed json.py to handle additional cases of partial
json string to be parsed, basically by dropping the last character in
the string until a valid json string is found or the string is empty.
Also added additional test cases.
  
- **Issue:** function parse_partial_json could not parse cases where the
key is present but the value is not.

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
2024-01-02 16:34:43 -08:00
YISH
eecfa81918 Add the collection_description parameter to Milvus (#14524)
Because Milvus' collection_name doesn't support UFT8 characters in other
languages, I want the `collection_descriotion`.


<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-02 16:28:01 -08:00
Evgenii Molov
b4ec340fb3 Fix failing serpapi response processing for Google Maps API (#14817)
**Description:** Fix for processing for serpapi response for Google Maps
API
**Issue:** Due to the fact corresponding
[api](https://serpapi.com/google-maps-api) returns 'local_results' as
list, and old version requested `res["local_results"].keys()` of the
list. As the result we got exception: ```AttributeError: 'list' object
has no attribute 'keys'```.

Way to reproduce wrong behaviour:
```
    params = {
        "engine": "google_maps",
        "type": "search",
        "google_domain": "google.de",
        "ll": "@51.1917,10.525,14z",
        "hl": "de",
        "gl": "de",
    }
    search = SerpAPIWrapper(params=params)
    results = search.run("cafe")
```

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Ran <rccalman@gmail.com>
2024-01-02 16:17:21 -08:00
YISH
da0f750a0b Milvus allows to store metadata as json field (#14636)
Because Milvus doesn't support nullable fields, but document metadata is
very rich, so it makes more sense to store it as json.


https://github.com/milvus-io/pymilvus/issues/1705#issuecomment-1731112372

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 16:12:00 -08:00
Erick Friis
620168e459 docs: together ai updates (#15435) 2024-01-02 16:05:53 -08:00
Bagatur
93e924ec96 langchain[patch], docs: update agent toolkit imports (#15434) 2024-01-02 18:58:50 -05:00
Ashley Xu
0ce7858529 feat: add Google BigQueryVectorSearch in vectorstore (#14829)
BigQuery vector search lets you use GoogleSQL to do semantic search,
using vector indexes for fast but approximate results, or using brute
force for exact results.

This PR integrates LangChain vectorstore with BigQuery Vector Search.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Vlad Kolesnikov <vladkol@google.com>
2024-01-02 15:57:14 -08:00
JaguarDB
02f59c2035 Use args option in jaguar so it takes more options in similarity search (#15080)
- **Description:** replace score_threshold with args
  - **Issue:** needs a way to pass more options to similarity search
  - **Dependencies:** None
  - **Twitter handle:** @workbot

---------

Co-authored-by: JY <jyjy@jaguardb>
2024-01-02 15:53:06 -08:00
chyroc
37ad6ec248 Refactor: use SecretStr for tongyi chat-model (#15102) 2024-01-02 15:45:23 -08:00
Shaurya Rohatgi
e1c2cd7a28 community: Semanticscholar tool to search 200M+ scientific articles (#15151)
- **Description:** Tool now supports querying over 200 million
scientific articles, vastly expanding its reach beyond the 2 million
articles accessible through Arxiv. This update significantly broadens
access to the entire scope of scientific literature.
- **Dependencies:** semantischolar
https://github.com/danielnsilva/semanticscholar
  - **Twitter handle:** @shauryr

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-02 15:36:03 -08:00
aqibamir
073e4107cd Fixed minor type in self_query.ipynb (#15196)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-02 15:34:09 -08:00
dudub12
7e6b0056b8 SQLDatabase drop the column names in the result. (#15361)
Fix for the following bug:
https://github.com/langchain-ai/langchain/issues/15360

---------

Co-authored-by: dudu butbul <100126964+dudu-upstream@users.noreply.github.com>
2024-01-02 15:29:25 -08:00
chyroc
07d294b5ec Fix: fix Bing Search empty result exception, fix #15384 (#15387)
fix https://github.com/langchain-ai/langchain/issues/15384
2024-01-02 15:25:00 -08:00
Bagatur
1678d6ca17 langchain[patch], experimental[patch], docs: update tools imports (#15433) 2024-01-02 18:23:34 -05:00
Bob Lin
e57e50b213 Remove unused _get_python_repl (#15389)
This part of the code can also be safely cleaned up.
2024-01-02 15:21:00 -08:00
Dariusz Kajtoch
15b6c049d4 core:adds tests for partial_variables (#15427)
**Description:** Added small tests to test partial_variables in
PromptTemplate. It was missing.
2024-01-02 15:00:06 -08:00
suhas-kotaki
73a628de9a added fix for key error: doc_id (#15428)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-02 14:59:53 -08:00
Leonid Ganeline
1e6519edc2 docs Microsoft platform page update (#15420)
Added two new document_loader references. Improved the format
consistency of the example pages
2024-01-02 14:59:40 -08:00
Leonid Ganeline
b8c6ebf647 refactor utils (#15432)
The `langchain` [still holds several
artifacts](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.utils)
that belongs to `community`. If they moved then `langchain.utils`
namespace would be removed completely.
- moved `ernie_functions` artifacts to `community`
2024-01-02 14:56:38 -08:00
Bagatur
fa5d49f2c1 docs, experimental[patch], langchain[patch], community[patch]: update storage imports (#15429)
ran 
```bash
g grep -l "langchain.vectorstores" | xargs -L 1 sed -i '' "s/langchain\.vectorstores/langchain_community.vectorstores/g"
g grep -l "langchain.document_loaders" | xargs -L 1 sed -i '' "s/langchain\.document_loaders/langchain_community.document_loaders/g"
g grep -l "langchain.chat_loaders" | xargs -L 1 sed -i '' "s/langchain\.chat_loaders/langchain_community.chat_loaders/g"
g grep -l "langchain.document_transformers" | xargs -L 1 sed -i '' "s/langchain\.document_transformers/langchain_community.document_transformers/g"
g grep -l "langchain\.graphs" | xargs -L 1 sed -i '' "s/langchain\.graphs/langchain_community.graphs/g"
g grep -l "langchain\.memory\.chat_message_histories" | xargs -L 1 sed -i '' "s/langchain\.memory\.chat_message_histories/langchain_community.chat_message_histories/g"
gco master libs/langchain/tests/unit_tests/*/test_imports.py
gco master libs/langchain/tests/unit_tests/**/test_public_api.py
```
2024-01-02 16:47:11 -05:00
Harrison Chase
a33d92306c add get prompts method (#15425) 2024-01-02 12:44:14 -08:00
Nuno Campos
6810b4b0bc Use tz-aware utc datetimes in tracer (#15187)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-02 12:36:40 -08:00
Bagatur
480626dc99 docs, community[patch], experimental[patch], langchain[patch], cli[pa… (#15412)
…tch]: import models from community

ran
```bash
git grep -l 'from langchain\.chat_models' | xargs -L 1 sed -i '' "s/from\ langchain\.chat_models/from\ langchain_community.chat_models/g"
git grep -l 'from langchain\.llms' | xargs -L 1 sed -i '' "s/from\ langchain\.llms/from\ langchain_community.llms/g"
git grep -l 'from langchain\.embeddings' | xargs -L 1 sed -i '' "s/from\ langchain\.embeddings/from\ langchain_community.embeddings/g"
git checkout master libs/langchain/tests/unit_tests/llms
git checkout master libs/langchain/tests/unit_tests/chat_models
git checkout master libs/langchain/tests/unit_tests/embeddings/test_imports.py
make format
cd libs/langchain; make format
cd ../experimental; make format
cd ../core; make format
```
2024-01-02 15:32:16 -05:00
Nuno Campos
9cbf14dec2 Fetch runnable config from context var inside runnable lambda and runnable generator (#15334)
- easier to write custom logic/loops with automatic tracing
- if you don't want to streaming support write a regular function and
pass to RunnableLambda
- if you do want streaming write a generator and pass it to
RunnableGenerator

```py
import json
from typing import AsyncIterator

from langchain_core.messages import BaseMessage, FunctionMessage, HumanMessage
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import Runnable, RunnableGenerator, RunnablePassthrough
from langchain_core.tools import BaseTool

from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.chat_models import ChatOpenAI
from langchain.tools.render import format_tool_to_openai_function


def _get_tavily():
    from langchain.tools.tavily_search import TavilySearchResults
    from langchain.utilities.tavily_search import TavilySearchAPIWrapper

    tavily_search = TavilySearchAPIWrapper()
    return TavilySearchResults(api_wrapper=tavily_search)


async def _agent_executor_generator(
    input: AsyncIterator[list[BaseMessage]],
    *,
    max_iterations: int = 10,
    tools: dict[str, BaseTool],
    agent: Runnable[list[BaseMessage], BaseMessage],
    parser: Runnable[BaseMessage, AgentAction | AgentFinish],
) -> AsyncIterator[BaseMessage]:
    messages = [m async for mm in input for m in mm]
    for _ in range(max_iterations):
        next_message = await agent.ainvoke(messages)
        yield next_message
        messages.append(next_message)

        parsed = await parser.ainvoke(next_message)
        if isinstance(parsed, AgentAction):
            result = await tools[parsed.tool].ainvoke(parsed.tool_input)
            next_message = FunctionMessage(name=parsed.tool, content=json.dumps(result))
            yield next_message
            messages.append(next_message)
        elif isinstance(parsed, AgentFinish):
            return


def get_agent_executor(tools: list[BaseTool], system_message: str):
    llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0, streaming=True)
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_message),
            MessagesPlaceholder(variable_name="messages"),
        ]
    )
    llm_with_tools = llm.bind(
        functions=[format_tool_to_openai_function(t) for t in tools]
    )

    agent = {"messages": RunnablePassthrough()} | prompt | llm_with_tools
    parser = OpenAIFunctionsAgentOutputParser()
    executor = RunnableGenerator(_agent_executor_generator)
    return executor.bind(
        tools={tool.name for tool in tools}, agent=agent, parser=parser
    )


agent = get_agent_executor([_get_tavily()], "You are a very nice agent!")


async def main():
    async for message in agent.astream(
        [HumanMessage(content="whats the weather in sf tomorrow?")]
    ):
        print(message)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```

results in this trace
https://smith.langchain.com/public/fa17f05d-9724-4d08-8fa1-750f8fcd051b/r
2024-01-02 12:16:39 -08:00
Bagatur
8e0d5813c2 langchain[patch], experimental[patch]: replace langchain.schema imports (#15410)
Import from core instead.

Ran:
```bash
git grep -l 'from langchain.schema\.output_parser' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.output_parser/from\ langchain_core.output_parsers/g"
git grep -l 'from langchain.schema\.messages' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.messages/from\ langchain_core.messages/g"
git grep -l 'from langchain.schema\.document' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.document/from\ langchain_core.documents/g"
git grep -l 'from langchain.schema\.runnable' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.runnable/from\ langchain_core.runnables/g"
git grep -l 'from langchain.schema\.vectorstore' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.vectorstore/from\ langchain_core.vectorstores/g"
git grep -l 'from langchain.schema\.language_model' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.language_model/from\ langchain_core.language_models/g"
git grep -l 'from langchain.schema\.embeddings' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.embeddings/from\ langchain_core.embeddings/g"
git grep -l 'from langchain.schema\.storage' | xargs -L 1 sed -i '' "s/from\ langchain\.schema\.storage/from\ langchain_core.stores/g"
git checkout master libs/langchain/tests/unit_tests/schema/
make format
cd libs/experimental
make format
cd ../langchain
make format
```
2024-01-02 15:09:45 -05:00
Bagatur
a3d47b4f19 docs: fix model i/o index links (#15421) 2024-01-02 13:38:05 -05:00
Bagatur
5a43e0e885 docs: fix agents index links (#15419) 2024-01-02 13:15:01 -05:00
Ankush Gola
f50dba12ff Calculate trace_id and dotted_order client side (#15351)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-02 10:13:34 -08:00
Erick Friis
a8f6f33cd9 infra: remove path filter on check_diffs (#15418)
CI should run on https://github.com/langchain-ai/langchain/pull/15412

But github only checks first 300 files:
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#git-diff-comparisons

> Diffs are limited to 300 files. If there are files changed that aren't
matched in the first 300 files returned by the filter, the workflow will
not run. You may need to create more specific filters so that the
workflow will run automatically.
2024-01-02 13:10:48 -05:00
Bob Lin
4488234d64 Update gpt4all.mdx doc (#15392)
The [pyllamacpp](https://github.com/nomic-ai/pyllamacpp) repository has
been archived and the model name and usage need to be changed.
2024-01-02 08:57:57 -08:00
Mohammad Mohtashim
b6c57d38fa Langchain_community: Small Fix when loading facebook messages (#15358)
- **Description:** SingleFileFacebookMessengerChatLoader did not handle
the case for when messages had stickers and/or photos so fixed that.
  - **Issue:** #15356

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 18:52:23 -08:00
Mateusz Szewczyk
cbfaccc424 WatsonxLLM updates/enhancements (#14598)
- **Description:** updates/enhancements to IBM
[watsonx.ai](https://www.ibm.com/products/watsonx-ai) LLM provider
(prompt tuned models and prompt templates deployments support)
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),
  - **Tag maintainer:** : @hwchase17 , @eyurtsev , @baskaryan 
  - **Twitter handle:** details in comment below.

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally. 

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 18:50:05 -08:00
Manjunath Janardhan
7a0feba9f7 GITLAB_URL should take default https://gitlab.com instead of error (#14638)
The fix #14221 has broken default gitlab url which is forcing the users
to specify GITLAB_URL for default one. With this fix if GITLAB_URL is
not set, the default gitlab url will be taken.

  - **Description:** Add the GITHUB URL instead of None
  - **Issue:** the issue #14221 has broken the default github URL 
  - **Dependencies:** None
  - **Tag maintainer:** @hwchase17 
  - **Twitter handle:** manjunath_shiva
2024-01-01 16:55:52 -08:00
David
dcf047c48f add api_base to _client_params (community version of #14393) (#14644)
- **Description:** This PR adds `api_base` to `_client_params` in the
`chat_model` of LiteLLM to ensure it's included in API calls.
Previously, `api_base` was set on the client but was not included in the
parameters passed to the completion function. This change ensures that
`api_base` is correctly passed to all API calls.
  - **Issue:** #14338
  - **Tag maintainer:** @hwchase17 @agola11
  - **Twitter handle:** @LMS_David_RS
2024-01-01 16:53:16 -08:00
Lucca Zenóbio
3bd0a15506 Fix for openai multi tools input format. (#14653)
Sometimes, the tool_schema is like:
  
` {'action_name': 'search_items', 'action': {'term': 'pizza'}}`

sometimes, specially with gpt3.5 it comes like:

`{'action_name': 'search_items', 'term': 'pizza'}`

and it fails.

This PR is a way to make it work in both scenarios.

issues releated: #6624 

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Co-authored-by: Lucca Zenobio <lucca.zenobio@ifood.com.br>
2024-01-01 16:50:31 -08:00
xuxiang
dd1d818a82 Fixing the Issue with DashScopeEmbeddings Handling More than 25 Rows of Data (#14662)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
 
This change addresses the issue where DashScopeEmbeddingAPI limits
requests to 25 lines of data, and DashScopeEmbeddings did not handle
cases with more than 25 lines, leading to errors. I have implemented a
fix to manage data exceeding this limit efficiently.

---------

Co-authored-by: xuxiang <xuxiang@aliyun.com>
2024-01-01 16:50:13 -08:00
Thomas B
9d8468a576 Enhancement on feature/yaml output parser (#14674)
Adding to my previously, already merged PR I made some further
improvements:

* Added documentation to the existing Pydantic Parser notebook, with an
example using LCEL and `with_retry()` on `OutputParserException`.
* Added an additional output example to the prompt
* More lenient parser in terms of LLM output format
* Amended unit test

FYI @hwchase17

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 16:49:58 -08:00
Zeeland
ff10f30149 fix: syntax error in function docs (#14641)
fix syntax error in function docs
2024-01-01 16:44:15 -08:00
Paresh Chiramel
9be08a1956 Update _retrieve_ref inside json_schema.py to include an isdigit() check (#14745)
- **Description:** Update _retrieve_ref inside json_schema.py to include
an isdigit() check
- **Issue:** This library is used inside dereference_refs inside
langchain_community.agent_toolkits.openapi.spec. When I read in a yaml
file which has references for "400", "401" etc; the line "out =
out[component]" causes a KeyError. The isdigit() check ensures that if
it is an integer like "400" or "401"; it converts it into integer before
using it as a key to prevent the error.
  - **Dependencies:** No dependencies
  - **Tag maintainer:** @baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 16:25:42 -08:00
Joshua Sundance Bailey
cfd27b1786 python-lint (#14689)
# Description: _python-lint_

This agent writes Python code that is formatted and linted using
`black`, `ruff`, and `mypy`, but does not execute the code. It writes
the code to a temporary file and then runs the linters. Once these
checks pass, the code is returned.

# Dependencies

- black
- ruff
- mypy

# Demo

The functionality can be seen here:
https://huggingface.co/spaces/joshuasundance/langchain-streamlit-demo
2024-01-01 16:25:03 -08:00
Muntaqa Mahmood
cf2dd2fa25 Added: docs Headers to Steam Tool notebook steps (#14749)
Added some Headers in steam tool notebook to match consistency with the
other toolkit notebooks
  - Dependencies: no new dependencies
  - Tag maintainer: @hwchase17, @baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 16:21:40 -08:00
Christophe Bornet
e2a8962ba6 Add AstraDB document loader (#14747)
- **Description:** this adds the AstraDB document loader and an
integration test
  - **Twitter handle:** cbornet_
2024-01-01 16:13:28 -08:00
Leonid Ganeline
de682761c5 docs microsoft pages sort order fix (#14771)
`integrations/document_loaders/` `Excel` and `OneNote` pages in the
navbar were in the wrong sort order. It is because the file names are
not equal to the page titles.
- renamed `excel` and `onenote` file names
2024-01-01 16:10:59 -08:00
Igor Dvorkin
76923e5743 Restore self message sent before OSX 12 Monterey (#14818)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-01 16:04:14 -08:00
savoiepe
d006be60ec Added more filtering options to pgvector vectorstore (#14852)
- **Description:** Using PGVector vector store, it was only possible to
filter for values equals, in or not in metadata. Extended this feature
to work with the following keywords : IN, NIN, BETWEEN, GT, LT, NE, EQ,
LIKE, CONTAINS, OR, AND

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 16:01:22 -08:00
Rajesh Sharma
dfd7b9edda Update regex in output parser (#15082)
The regex used to match "Action" and "Action Input" in the output parser
has been updated. Previously, the regex did not correctly handle
multi-line inputs for "Action Input". The updated code uses the
're.DOTALL' flag to ensure multi-line inputs are correctly captured.

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 15:59:22 -08:00
chyroc
32e96a471c Refactor: use SecretStr for llm_rails embeddings (#15090) 2024-01-01 15:24:50 -08:00
chyroc
b440f92d81 Refactor: use SecretStr for embaas embeddings (#15091) 2024-01-01 15:24:00 -08:00
chyroc
ea6cf0f1b1 Refactor: use SecretStr for edenai embeddings (#15092) 2024-01-01 15:22:51 -08:00
chyroc
32e6e9de13 Refactor: use SecretStr for palm chat-model (#15100) 2024-01-01 15:21:41 -08:00
chyroc
b6952d41e5 Refactor: use SecretStr for GPTRouter chat-model (#15101) 2024-01-01 15:20:26 -08:00
Nan LI
f506b4cfd2 community: Integration of New Chat Model Based on ChatGLM3 via ZhipuAI API (#15105)
- **Description:** 
- This PR introduces a significant enhancement to the LangChain project
by integrating a new chat model powered by the third-generation base
large model, ChatGLM3, via the zhipuai API.
- This advanced model supports functionalities like function calls, code
interpretation, and intelligent Agent capabilities.
- The additions include the chat model itself, comprehensive
documentation in the form of Python notebook docs, and thorough testing
with both unit and integrated tests.
- **Dependencies:** This update relies on the ZhipuAI package as a key
dependency.
- **Twitter handle:** If this PR receives spotlight attention, we would
be honored to receive a mention for our integration of the advanced
ChatGLM3 model via the ZhipuAI API. Kindly tag us at @kaiwu.

To ensure quality and standards, we have performed extensive linting and
testing. Commands such as make format, make lint, and make test have
been run from the root of the modified package to ensure compliance with
LangChain's coding standards.

TO DO: Continue refining and enhancing both the unit tests and
integrated tests.

---------

Co-authored-by: jing <jingguo92@gmail.com>
Co-authored-by: hyy1987 <779003812@qq.com>
Co-authored-by: jianchuanqi <qijianchuan@hotmail.com>
Co-authored-by: lirq <whuclarence@gmail.com>
Co-authored-by: whucalrence <81530213+whucalrence@users.noreply.github.com>
Co-authored-by: Jing Guo <48378126+JaneCrystall@users.noreply.github.com>
2024-01-01 15:17:03 -08:00
Hin
2cf1e73d12 Feat add volcano embedding (#14693)
Description: Volcano Ark is an enterprise-grade large-model service
platform for developers, providing a full range of functions and
services such as model training, inference, evaluation, fine-tuning. You
can visit its homepage at https://www.volcengine.com/docs/82379/1099455
for details. This change could help developers use the platform for
embedding.
Issue: None
Dependencies: volcengine
Tag maintainer: @baskaryan
Twitter handle: @hinnnnnnnnnnnns

---------

Co-authored-by: lujingxuansc <lujingxuansc@bytedance.com>
2024-01-01 14:37:35 -08:00
Harrison Chase
81a7a83b21 [docs] update toolkit docs (#15294)
its probably too much work to do all toolkits before 0.1

Also, some agent toolkits (like pandas and python) will require more
work
2024-01-01 14:21:55 -08:00
Naveen RS
fbe4209ce1 Update LLaMA2_sql_chat.ipynb (#15379)
Updated prompt input suggestions

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 14:17:37 -08:00
GauravWaghmare
f79bec12eb Langchain: Fix typo in documentation (#15124)
**Description:**
Fix typo in documentation. 

**Twitter handle:**
@HydrogenHydride
2024-01-01 14:04:51 -08:00
David Křístek
a010f29013 fix: call correct stream method in ollama (#15104)
Co-authored-by: David Kristek <david@David--MacBook-Pro.local>
2024-01-01 14:03:53 -08:00
Vardhaman
c2c2f252c2 docs: updated document for 'Return Source Documents' Functionality (#15106)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

- **Description:** updated the outdated code in the document that was
generating the error,
  - **Issue:** #15086 ,
  - **Dependencies:** N/A,
- **Twitter handle:** [@vardhaman722](https://twitter.com/vardhaman722)
2024-01-01 14:03:16 -08:00
Christian Janiake
be578f32be community:Lazy load wikipedia dump file (#15111)
**Description:** the MWDumpLoader implementation currently does not
support the lazy_load method, and the files are usually very large. We
are proposing refactoring the load function, extracting two private
functions with the functionality of loading the dump file and parsing a
single page, to reuse the code in the lazy_load implementation.
2024-01-01 14:02:56 -08:00
purificant
619cd3ce54 ci: upgrade actions (#15114)
This PR upgrades CI actions
[actions/setup-python](https://github.com/actions/setup-python/releases/tag/v5.0.0)
and
[google-github-actions/auth](https://github.com/google-github-actions/auth/releases/tag/v2.0.0)
2024-01-01 14:02:43 -08:00
Samuel Path
138f97af23 Add missing comment char "#" before Load in chain.py for the rag-pinecone-rerank template (#15209)
Without this additional `#`, one needs to add it manually after
uncommenting the section.
2024-01-01 14:01:06 -08:00
chyroc
a4ae4bc361 feat: mask api_key for konko (#14010)
for https://github.com/langchain-ai/langchain/issues/12165
2024-01-01 13:42:49 -08:00
joel-teratis
62d32bd214 fix(minor): added missing **kwargs parameter to chroma query function (#14919)
**Description:**

This PR adds the `**kwargs` parameter to six calls in the `chroma.py`
package. All functions already were able to receive `kwargs` but they
were discarded before.

**Issue:**

When passing `kwargs` to functions in the `chroma.py` package they are
being ignored.

For example:

```
chroma_instance.similarity_search_with_score(
    query,
    k=100,
    include=["metadatas", "documents", "distances", "embeddings"],  # this parameter gets ignored
)
```
The `include` parameter does not get passed on to the next function and
does not have any effect.

**Dependencies:**

None
2024-01-01 13:40:29 -08:00
Abhishek Mishra
f466b6aa4a Documentation: Update playwright documentation for langchain version >= 0.0.351 (#15260)
- A documentation change in the example listed under:
https://python.langchain.com/docs/integrations/toolkits/playwright
- `create_async_playwright_browser` does not exist under the module:
`langchain.tools.playwright.utils` post >= 0.0.351 version
  - No dependencies to be changed

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 13:38:56 -08:00
chyroc
0665a7da19 Docs: add param comment for tracing_v2_enabled (#15308)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-01 13:38:44 -08:00
Ahmed Hathout
d0a1c71eda Langchain: Fix quickstart doc code not working (#15352)
The quickstart doc is missing a few but very simple things that without
them, the code does not work. This PR fixes that by
- Adding commands to install `tiktoken` and `langchainhub`
- Adds a comma between 2 parameters for one of the methods
2024-01-01 13:38:33 -08:00
Donovan Muller
a7f0b65d26 Docs: Fix spelling and grammar on Concepts page (#15364)
- **Description:**  Fix a few spelling and grammar issues
  - **Issue:** NA
  - **Dependencies:** NA
  - **Twitter handle:** @donovancmuller
 
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-01 13:38:14 -08:00
Harrison Chase
768334b6a4 [docs] update agent cookbook lcel (#15349) 2024-01-01 13:31:41 -08:00
Yinghao Zhu
870b4033ed docs(ollama): Fix Documentation in CallbackManager, missing ]) (#15380)
- **Description:** This PR corrects a documentation error in the
`ollama` usage tutorial. Specifically, it fixes a missing `])` in the
`CallbackManager()` example, ensuring that the code snippet is
syntactically correct and can be successfully executed.
  - **Issue:** N/A
- **Dependencies:** No additional dependencies are required for this
change.
  - **Twitter handle:** My twitter is @yhzhu99
2024-01-01 13:17:32 -08:00
Naveen RS
fc8dc6bb39 Update Multi_modal_RAG.ipynb (#15378)
Updated comment for better understanding

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-01-01 13:17:23 -08:00
NuODaniel
7773943a51 community:qianfan endpoint support init params & remove useless params definietion (#15381)
- **Description:**
- support custom kwargs in object initialization. For instantance, QPS
differs from multiple object(chat/completion/embedding with diverse
models), for which global env is not a good choice for configuration.
  - **Issue:** no
  - **Dependencies:** no
  - **Twitter handle:** no

@baskaryan PTAL
2024-01-01 13:12:31 -08:00
Bagatur
26f84b74d0 docs: revamp redirects (#15366) 2023-12-31 16:26:49 -05:00
Bagatur
27dca2d92f docs: cleanup rag use case (#15284) 2023-12-30 19:39:22 -05:00
Ofer Mendelevitch
11accf8366 Community: Newlines before bullets in IPYNB files (Vectara) (#15330)
- **Description:** updated all Vectara IPYNB files so that bullets look
okay in docs (added newline)
  - **Twitter handle:** @ofermend
2023-12-30 14:04:04 -08:00
Nuno Campos
b9636e5c98 Catch type errors in dumps/dumpd (#15336)
These can happen for edge cases not covered by `default` handler (eg.
"strange" keys in dicts)

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-29 17:37:12 -08:00
Nuno Campos
99000c612e Propagate context vars in all classes/methods (#15329)
- Any direct usage of ThreadPoolExecutor or asyncio.run_in_executor
needs manual handling of context vars

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-29 15:59:00 -08:00
Ankush Gola
7eec8f2487 Delete V1 tracer and refactor tracer tests to core (#15326) 2023-12-29 15:55:56 -08:00
Nuno Campos
4e4b119614 Fix executor 2023-12-29 15:50:45 -08:00
Harrison Chase
f20c56db41 [documentation] documentation revamp (#15281)
needs new versions of langchain-core and langchain

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-12-29 14:51:06 -08:00
chyroc
7ce338201c Patch: improve check openai version (#15301) 2023-12-29 13:44:19 -08:00
Jon Nolen
27ee61645d core: Update messages/__init__.py to account for AIMessageChunk which breaks message history runnable. (#15327)
- **Description:** fix parse issue for AIMessageChunk when using 
  - **Issue:** https://github.com/langchain-ai/langchain/issues/14511
  - **Dependencies:** none
  - **Twitter handle:** none

Taken from this fix:
https://github.com/gpt-engineer-org/gpt-engineer/issues/804#issuecomment-1769853850

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-29 13:41:47 -08:00
Piyush Ranjan
76ec3b0ab4 community: corrected typo in .readthedocs.yaml (#15309)
corrected a possible typing mistake in .readthedocs.yaml
2023-12-29 13:40:33 -08:00
Nuno Campos
9bb1fbcadf Lint 2023-12-29 12:43:55 -08:00
Nuno Campos
f7313adf2a old py compat 2023-12-29 12:38:58 -08:00
Nuno Campos
eb5e250188 Propagate context vars in all classes/methods
- Any direct usage of ThreadPoolExecutor or asyncio.run_in_executor needs manual handling of context vars
2023-12-29 12:34:03 -08:00
Kelly Elton
70e5d05952 Update vectorstore_retriever_memory.mdx (#15275)
removed bad comments

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-29 12:06:30 -08:00
Shuai Liu
4b53440e70 Upgrades the Tongyi LLM and ChatTongyi Model (#14793)
- **Description:** fixes and upgrades for the Tongyi LLM and ChatTongyi
Model
      - Fixed typos; it should be `Tongyi`, not `OpenAI`.
- Fixed a bug in `stream_generate_with_retry`; it's a real stream
generator now.
- Fixed a bug in `validate_environment`; the `dashscope_api_key` should
be properly handled when set by environment variables or initialization
parameters.
- Changed the `dashscope` response to incremental output by setting the
parameter `incremental_output`, which eliminates the need for the
prefix-removal trick.
      - Removed some unused parameters, like `n`, `prefix_messages`.
      - Added `_stream` method.
- Added async methods support, such as `_astream`, `_agenerate`,
`_abatch`.
  - **Dependencies:** No new dependencies.
  - **Tag maintainer:** @hwchase17 

> PS: Some may be confused about the terms `dashscope`, `tongyi`, and
`Qwen`:
> - `dashscope`: A platform to deploy LLMs and provide APIs to invoke
the LLM.
> - `tongyi`: A brand name or overall term about Alibaba Cloud's LLM/AI.
> - `Qwen`: An LLM that is open-sourced and deployed in `dashscope`.
> 
> We use the `dashscope` SDK to interact with the `tongyi`-`Qwen` LLM.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-29 12:06:12 -08:00
Romain Fouilland
6f15cc64b8 langchain: minor changes to StuffDocumentsChain._get_inputs (#15321)
Correcting a small typo ('the' instead of 'then') and changing another
'the' (instead of 'then' too, it was a hard day for the 'n' key :D) to
'also' to match better with what is done in the code

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-29 11:53:30 -08:00
Matthew Teschke
621493228b langchain: Exclude non-utf8 file from loader since it causes an error in the code_understanding example (#15324)
- **Description:** in the code_understanding.ipynb example, the loader
errors out on the
langchain/libs/community/tests/examples/non-utf8-encoding.py file, so I
updated the loader to exclude that file. Excluding that file allows the
example to run.
  - **Issue:** not applicable
  - **Dependencies:** none
2023-12-29 11:50:05 -08:00
Bagatur
8bfac1a319 community[patch]: Release 0.0.7 (#15320) 2023-12-29 13:10:23 -05:00
Harrison Chase
c3b3b77a11 [core] add test for json parser (#15297)
this should fail, but isnt

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-12-29 09:59:39 -08:00
Nuno Campos
ec090745a6 Improve markdown list parser (#15295)
- do not match text after - in the middle of a sentence

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-29 09:59:21 -08:00
Bagatur
50e99ec601 langchain[patch]: Release 0.0.353 (#15322) 2023-12-29 12:02:51 -05:00
Bagatur
8e06472c91 docs: add use cases index (#15279) 2023-12-29 12:02:31 -05:00
Bagatur
80ceed6da5 core[patch]: Release 0.1.4 (#15319) 2023-12-29 11:33:06 -05:00
Nuno Campos
36ceffd2cd Strip code block fences and extra test from xml when doing streaming … (#15293)
…parse

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-28 16:37:15 -08:00
Diego Rani Mazine
ec72225265 refactor: enable connection pool usage in PGVector (#11514)
- **Description:** `PGVector` refactored to use connection pool.
  - **Issue:** #11433,
  - **Tag maintainer:** @hwchase17 @eyurtsev,

---------

Co-authored-by: Diego Rani Mazine <diego.mazine@mercadolivre.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-12-28 15:07:16 -08:00
chyroc
507c195a4b Patch: improve openai functions call parser compatibility (#15197)
```shell
Python 3.11.6 (main, Nov  2 2023, 04:39:43) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = {'name': 'gc', 'arguments': '{"prompt":"hi\nbob."}'}
>>> import json
>>> json.loads(s['arguments'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 14 (char 13)
>>> json.loads(s['arguments'].replace('\n', '\\n'))
{'prompt': 'hi\nbob.'}
>>>
```

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-12-28 15:06:27 -08:00
joshy-deshaw
bf5385592e core, community: propagate context between threads (#15171)
While using `chain.batch`, the default implementation uses a
`ThreadPoolExecutor` and run the chains in separate threads. An issue
with this approach is that that [the token counting
callback](https://python.langchain.com/docs/modules/callbacks/token_counting)
fails to work as a consequence of the context not being propagated
between threads. This PR adds context propagation to the new threads and
adds some thread synchronization in the OpenAI callback. With this
change, the token counting callback works as intended.

Having the context propagation change would be highly beneficial for
those implementing custom callbacks for similar functionalities as well.

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-12-28 14:51:22 -08:00
Nuno Campos
f74151b4e4 Make all json parsing less strict by default (#15287)
- Enables strict=False by default
- Uses partial json recovery logic by default

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-28 14:48:53 -08:00
Harrison Chase
bc5a0ef6ca remove chat-history (#15286) 2023-12-28 14:22:16 -08:00
Harrison Chase
90aa26a90e [langchain] agents code changes (#15278)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
2023-12-28 13:39:08 -08:00
Harrison Chase
b86803153e [core, langchain] modelio code improvements (#15277) 2023-12-28 12:56:20 -08:00
shroominic
694bbb14cd community: fix typo in async ollama chat (#15276)
Made a stupid typo in the last PR which got already merged😅
2023-12-28 09:56:55 -08:00
triThirty
fea4888e72 community: Enhance Github error prompt (#15248)
- **Description:** The Github error prompt is confused because of JWT
enctrypt to somebody not familiar with Github connection method. This PR
is to add some useful error prompt to help users troubleshooting.
- **Issue:**
https://github.com/langchain-ai/langchain/issues/14550#issuecomment-1867445049
  - **Dependencies:** None,
  - **Twitter handle:** None
2023-12-28 08:25:19 -08:00
Christopher Queen
d5e1725ace langchain: Fix for issue #14631 - .devcontainer doesnt build (#15251)
- **Description:** Fix for issue #14631
- **Issue:** This fixes [Issue
#14631](https://github.com/langchain-ai/langchain/issues/14631)
- **Twitter handle:** [@consultchrisq
](https://twitter.com/consultchrisq?lang=en)
2023-12-28 08:25:03 -08:00
Samuel Path
5e3c3cd425 Fix typo (#15202)
Small typo fix in the templates docs: `languge` -> `language`
2023-12-28 08:24:41 -08:00
Shorthills AI
1343c746c5 Fixed small gramm mistakes (#15246)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com>
Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com>
Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com>
Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com>
Co-authored-by: AashiGuptaShorthillsAI <144897730+AashiGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: ShamshadAhmedShorthillsAI <144897733+ShamshadAhmedShorthillsAI@users.noreply.github.com>
Co-authored-by: ManpreetShorthillsAI <142380984+ManpreetShorthillsAI@users.noreply.github.com>
2023-12-28 08:11:21 -08:00
Bob Lin
a464eb4394 community: Make doctran synchronous (#15264)
### Description

I found that the methods in [the doctran
library](https://github.com/psychic-api/doctran) have been restructured
into [synchronized
versions](14944a59f7),

And [the example
ipynb](https://github.com/psychic-api/doctran/blob/main/examples.ipynb)
also shows that the code is synchronized, but the README has not been
updated yet.

so we need to modify the code and update the documentation.

### Issue

https://github.com/langchain-ai/langchain/issues/14645
2023-12-28 08:05:24 -08:00
Brendan Smith
9a16590aa9 langchain: Fix class name in RetryOutputParser docstring (#15268)
`OutputFixingParser` -> `RetryOutputParser`



![i'm-helping](https://github.com/langchain-ai/langchain/assets/5986636/68f1b8ce-8a6e-4e75-9cf8-e3c93ac562c2)
2023-12-28 08:03:46 -08:00
Nuno Campos
22b3a233b8 Update passthrough.py (#15252)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-27 22:12:32 -08:00
chyroc
6fb3cc6f27 Fix: Use Union instead of | to improve compatibility, fix #15244 (#15245) 2023-12-27 22:06:42 -08:00
Nuno Campos
6a5a2fb9c8 Add .pick and .assign methods to Runnable (#15229)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-27 13:35:34 -08:00
Nuno Campos
0252a24471 Implement nicer runnable seq constructor, Propagate name through Runn… (#15226)
…ableBinding

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-27 11:24:32 -08:00
Nuno Campos
f36ef0739d Add create_conv_retrieval_chain func (#15084)
```
                                                     +----------+
                                                     | MapInput |
                                                   **+----------+****
                                               ****                  ****
                                           ****                          ***
                                         **                                 ****
                  +------------------------------------+                        **
                  | Lambda(itemgetter('chat_history')) |                         *
                  +------------------------------------+                         *
                                     *                                           *
                                     *                                           *
                                     *                                           *
                       +---------------------------+            +--------------------------------+
                       | Lambda(_get_chat_history) |            | Lambda(itemgetter('question')) |
                       +---------------------------+            +--------------------------------+
                                     *                                           *
                                     *                                           *
                                     *                                           *
                      +----------------------------+                +------------------------+
                      | ContextSet('chat_history') |                | ContextSet('question') |
                      +----------------------------+                +------------------------+
                                               ****                  ****
                                                   ****          ****
                                                       **      **
                                                     +-----------+
                                                     | MapOutput |
                                                     +-----------+
                                                           *
                                                           *
                                                           *
                                                  +----------------+
                                                  | PromptTemplate |
                                                  +----------------+
                                                           *
                                                           *
                                                           *
                                                    +-------------+
                                                    | FakeListLLM |
                                                    +-------------+
                                                           *
                                                           *
                                                           *
                                                  +-----------------+
                                                  | StrOutputParser |
                                                  +-----------------+
                                                           *
                                                           *
                                                           *
                                            +----------------------------+
                                            | ContextSet('new_question') |
                                            +----------------------------+
                                                           *
                                                           *
                                                           *
                                                +---------------------+
                                                | SequentialRetriever |
                                                +---------------------+
                                                           *
                                                           *
                                                           *
                                        +------------------------------------+
                                        | Lambda(_reduce_tokens_below_limit) |
                                        +------------------------------------+
                                                           *
                                                           *
                                                           *
                                           +-------------------------------+
                                           | ContextSet('input_documents') |
                                           +-------------------------------+
                                                           *
                                                           *
                                                           *
                                                     +----------+
                                                  ***| MapInput |****
                                           *******   +----------+    ********
                                   ********                *                 *******
                            *******                         *                       ********
                        ****                                *                               ****
+-------------------------------+            +----------------------------+            +----------------------------+
| ContextGet('input_documents') |            | ContextGet('chat_history') |            | ContextGet('new_question') |
+-------------------------------+****        +----------------------------+            +----------------------------+
                                     *********                *                 *******
                                              ********         *          ******
                                                      *****    *      ****
                                                         +-----------+
                                                         | MapOutput |
                                                         +-----------+
                                                                *
                                                                *
                                                                *
                                                        +-------------+
                                                        | FakeListLLM |
                                                        +-------------+
                                                                *
                                                                *
                                                                *
                                                          +----------+
                                                       ***| MapInput |***
                                               ********   +----------+   ******
                                        *******                 *              *****
                                ********                        *                   ******
                            ****                                *                         ***
    +-------------------------------+            +----------------------------+            +-------------+
    | ContextGet('input_documents') |            | ContextGet('new_question') |          **| Passthrough |
    +-------------------------------+            +----------------------------+   *******  +-------------+
                                     *******                 *              ******
                                            ******           *       *******
                                                  ****      *    ****
                                                     +-----------+
                                                     | MapOutput |
                                                     +-----------+
```

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-26 17:28:10 -08:00
Harrison Chase
4ad77f777e [core] prompt changes (#15186)
change it to pass all variables through all the way in invoke
2023-12-26 15:52:17 -08:00
Nuno Campos
ccf9c8e0be Better input and output schemas for chains that start or end with a R… (#15185)
…unnableAssign or RunnablePick

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-26 15:21:13 -08:00
Nuno Campos
8cdc633465 Implement RunnablePassthrough.pick() (#15184)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-26 14:01:20 -08:00
Vardhaman
15e53a99b2 docs: updated wrong output in Upstash Redis Cache section of LLM Ca… (#15140)
…ching documentation

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->


- **Description:** Fixed the wrong output and code block comment in
`Upstash Redis` Cache section of LLM Caching documentation,
  - **Issue:** #15139 ,
  - **Dependencies:** N/A,
- **Twitter handle:** [@vardhaman722](https://twitter.com/vardhaman722)
2023-12-26 13:08:21 -08:00
chyroc
1abcf441ae Refactor: use SecretStr for Predibase llms (#15119) 2023-12-26 13:01:42 -08:00
chyroc
0a9a73a9c9 Refactor: use SecretStr for PipelineAI llms (#15120) 2023-12-26 13:00:58 -08:00
chyroc
d63ceb65b3 Refactor: use SecretStr for StochasticAI llms (#15118) 2023-12-26 12:59:51 -08:00
chyroc
674fde87d2 Refactor: use SecretStr for VolcEngineMaas llms (#15117) 2023-12-26 12:59:08 -08:00
chyroc
3cc1da2b38 Refactor: use SecretStr for Petals llms (#15121) 2023-12-26 12:57:37 -08:00
Quy Tang
7ef25a3c1b Implement stream and astream for RunnableLambda (#14794)
**Description:** Implement stream and astream methods for RunnableLambda
to make streaming work for functions returning Runnable
  - **Issue:** https://github.com/langchain-ai/langchain/issues/11998
  - **Dependencies:** No new dependencies
  - **Twitter handle:** https://twitter.com/qtangs

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
2023-12-26 12:49:02 -08:00
Nuno Campos
7e26559256 Fix runnable vistitor for funcs without pos args (#15182) 2023-12-26 12:42:24 -08:00
Harrison Chase
b4a0d206d9 [core: minor] fix getters (#15181) 2023-12-26 12:32:55 -08:00
Bagatur
56fad2e8ff langchain[minor]: Add stuff docs runnable (#15178)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-26 12:20:00 -08:00
Harrison Chase
63916cfe35 [core] langauge model like (#15180) 2023-12-26 12:19:50 -08:00
shroominic
e6f0cee896 community: Async Ollama + ChatOllama (#15169)
**Description:**
Adding async methods to booth OllamaLLM and ChatOllama to enable async
streaming and async .on_llm_new_token callbacks.

**Issue:**
ChatOllama is not working in combination with an AsyncCallbackManager
because the .on_llm_new_token method is not awaited.
2023-12-26 12:08:04 -08:00
KallieLev
3154c9bc9f docs: Update dependencies installation cell in steam toolkit (#15148)
**Description:** `decouple` is not the correct package, it's
`python-decouple`, and the notebook cell doesn't compile.

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-26 12:07:03 -08:00
Harrison Chase
33e024ad10 [core] print ascii (#15179) 2023-12-26 11:43:14 -08:00
Phill Zarfos
35896faab7 community: correct spelling mistakes of "Suffle" and "reporoducibility" (#15172)
- **Description:** Correct spelling mistakes of "Suffle" and
"reporoducibility" in `DirectoryLoader` class
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2023-12-26 11:22:59 -08:00
chyroc
3a3f880e5a Patch: improve ollama 404 api error message, fix #15147 (#15156)
Make this issue more clearly exposed to developers
2023-12-26 11:07:39 -08:00
Bastiaan Quast
e52a734818 Oxford comma, consistent with format elsewhere (#15167)
This document uses Oxford comma (A, B, and C), in this list the comma
was missing before "and".

This PR corrects that.

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-26 11:07:09 -08:00
Shorthills AI
f59d0d3b20 Corrected an grammatical mistake (#15163)
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com>
Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com>
Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com>
Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com>
Co-authored-by: AashiGuptaShorthillsAI <144897730+AashiGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: ShamshadAhmedShorthillsAI <144897733+ShamshadAhmedShorthillsAI@users.noreply.github.com>
2023-12-26 11:06:53 -08:00
Harrison Chase
83232d7e94 add multitenancy (#15176) 2023-12-26 09:08:32 -08:00
Nuno Campos
a2d3042823 Improve graph repr for runnable passthrough and itemgetter (#15083)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-22 16:05:48 -08:00
Nuno Campos
0d0901ea18 Nc/dec22/runnable graph lambda (#15078)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-22 14:36:46 -08:00
Ivan
59d4b80a92 [community]: Elasticsearch chat history encoding (#15055)
- Added ensure_ascii property to ElasticsearchChatMessageHistory

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Ivan Chetverikov <ivan.chetverikov@raftds.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-22 13:21:34 -08:00
Corey Brown
9e492620d4 Don't reassign chunk_type (#14923)
**Description**: The parameter chunk_type was being hard coded to
"extractive_answers", so that when "snippet" was being passed, it was
being ignored. This change simply doesn't do that.
2023-12-22 13:20:53 -08:00
Takuya Igei
6da2246215 Add support Vertex AI Gemini uses a public image URL (#14949)
## What

Since `langchain_google_genai.ChatGoogleGenerativeAI` supported A public
image URL, we add to support it in `langchain.chat_models.ChatVertexAI`
as well.

### Example

```py
from langchain.chat_models.vertexai import ChatVertexAI
from langchain_core.messages import HumanMessage

llm = ChatVertexAI(model_name="gemini-pro-vision")
image_message = {
    "type": "image_url",
    "image_url": {
        "url": "https://python.langchain.com/assets/images/cell-18-output-1-0c7fb8b94ff032d51bfe1880d8370104.png",
    },
}
text_message = {
    "type": "text",
    "text": "What is shown in this image?",
}
message = HumanMessage(content=[text_message, image_message])

output = llm([message])
print(output.content)
```

## Refs

-
https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm
-
https://python.langchain.com/docs/integrations/chat/google_generative_ai
2023-12-22 13:19:09 -08:00
Archan Ghosh
affa3e755a Update arxiv.py with get_summaries_as_docs inside of Arxivloader (#14953)
Added the call function get_summaries_as_docs inside of Arxivloader

- **Description:** Added a function that returns the documents from
get_summaries_as_docs, as the call signature is present in the parent
file but never used from Arxivloader, this can be used from Arxivloader
itself just like .load() as both the signatures are same.
- **Issue:** Reduces time to load papers as no pdf is processed only
metadata is pulled from Arxiv allowing users for faster load times on
bulk loads. Users can then choose one or more paper and use ID directly
with .load() to load pdf thereby loading all the contents of the paper.
2023-12-22 13:14:22 -08:00
Sypherd
d4f45b1421 core(minor): Allow explicit types for ChatMessageHistory adds (#14967)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
## Description
Changes the behavior of `add_user_message` and `add_ai_message` to allow
for messages of those types to be passed in. Currently, if you want to
use the `add_user_message` or `add_ai_message` methods, you have to pass
in a string. For `add_message` on `ChatMessageHistory`, however, you
have to pass a `BaseMessage`. This behavior seems a bit inconsistent.
Personally, I'd love to be able to be explicit that I want to
`add_user_message` and pass in a `HumanMessage` without having to grab
the `content` attribute. This PR allows `add_user_message` to accept
`HumanMessage`s or `str`s and `add_ai_message` to accept `AIMessage`s or
`str`s to add that functionality and ensure backwards compatibility.

## Issue
* None

## Dependencies
* None

## Tag maintainer
@hinthornw
@baskaryan 

## Note
`make test` results in `make: *** No rule to make target 'test'.  Stop.`
2023-12-22 13:12:01 -08:00
ccurme
f2782f4c86 community: add args_schema to GmailSendMessage (#14973)
- **Description:** `tools.gmail.send_message` implements a
`SendMessageSchema` that is not used anywhere. `GmailSendMessage` also
does not have an `args_schema` attribute (this led to issues when
invoking the tool with an OpenAI functions agent, at least for me). Here
we add the missing attribute and a minimal test for the tool.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** N/A

---------

Co-authored-by: Chester Curme <chestercurme@microsoft.com>
2023-12-22 13:07:44 -08:00
Satin Wuker
e7ad834a21 docs/docs/get_started: fixing typos in quickstart.mdx (#15025)
Fixing typos: it's -> its
Fixing grammatical mistakes:
* having to worry -> worrying
* convert -> converts
* few main types -> a few main types

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-22 12:55:44 -08:00
Sid Sarasvati
0e3da6d8d2 Update youtube_transcript.ipynb (#15015)
add_video_info should be false in the first example

<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-22 12:47:05 -08:00
Philip Kiely - Baseten
6342da333a community: refactor Baseten integration with new API endpoints & docs (#15017)
- **Description:** In response to user feedback, this PR refactors the
Baseten integration with updated model endpoints, as well as updates
relevant documentation. This PR has been tested by end users in
production and works as expected.
  - **Issue:** N/A
- **Dependencies:** This PR actually removes the dependency on the
`baseten` package!
  - **Twitter handle:** https://twitter.com/basetenco
2023-12-22 12:46:24 -08:00
Blane Honeycutt
3fc1b3553b Community: Adds ability to pass a Config to the boto3 client used by Bedrock (#15029)
# Description  
This PR adds the ability to pass a `botocore.config.Config` instance to
the boto3 client instantiated by the Bedrock LLM.

Currently, the Bedrock LLM doesn't support a way to pass a Config, which
means that some settings (e.g., timeouts and retry configuration)
require instantiating a new boto3 client with a Config and then
replacing the LLM's client:

```python
llm = Bedrock(
        region_name='us-west-2',
        model_id="anthropic.claude-v2",
        model_kwargs={'max_tokens_to_sample': 4096, 'temperature': 0},
)

llm.client = boto_client('bedrock-runtime', region_name='us-west-2', config=Config({'read_timeout': 300}))
```

# Issue
N/A

# Dependencies
N/A
2023-12-22 12:42:56 -08:00
Grzegorz Sajko
dc71fcfabf corrected outdated link (#15053)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-22 12:39:38 -08:00
chyroc
0e149bbb4c Improve: remove extra spaces in get_from_env error (#15064) 2023-12-22 11:50:03 -08:00
Ran
c3f8733aef fix: correct spelling mistakes of "seperate, intialise, pre-defined" (#14647)
fix spellings

**seperate -> separate**: found more occurrences, see
https://github.com/langchain-ai/langchain/pull/14602
**initialise -> intialize**: the latter is more common in the repo
**pre-defined > predefined**: adding a comma after a prefix is a
delicate matter, but this is a generally accepted word

also, another word that appears in the repo is "fs" (stands for
filesystem), e.g., in `libs/core/langchain_core/prompts/loading.py`
` """Unified method for loading a prompt from LangChainHub or local
fs."""`
Isn't "filesystem" better?
2023-12-22 11:49:35 -08:00
chyroc
86d27fd684 Fix: fix partners name typo in tests (#15066)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Ran <rccalman@gmail.com>
2023-12-22 11:48:39 -08:00
Harrison Chase
2e159931ac add defaults for tavily (#15075) 2023-12-22 11:48:26 -08:00
chyroc
4440ec5ab3 Refactor: use SecretStr for minimax embeddings (#15067) 2023-12-22 11:43:23 -08:00
chyroc
aa19ca9723 Refactor: use SecretStr for jina embeddings (#15068)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-22 11:42:29 -08:00
Leonid Ganeline
f9230e005b book reference (#15072)
Added a  reference to to a new book about `LangChain`.
2023-12-22 11:41:23 -08:00
Nuno Campos
7d5800ee51 Add Runnable.get_graph() to get a graph representation of a Runnable (#15040)
It can be drawn in ascii with Runnable.get_graph().draw()
2023-12-22 11:40:45 -08:00
Eugene Yurtsev
aad3d8bd47 langchain(patch): Restrict paths in LocalFileStore cache (#15065)
This PR restricts the paths that can be resolve using the local file system cache so that all paths must be contained within the root path.
2023-12-22 11:20:17 -05:00
Michael Goin
501cc8311d community[patch]: Fix generation_config not setting properly for DeepSparse (#15036)
- **Description:** Tiny but important bugfix to use a more stable
interface for specifying generation_config parameters for DeepSparse LLM
2023-12-22 01:39:22 -05:00
QIAN Zifei
2460f977c5 community[minor]: Azure DocumentIntelligenceLoader/Parser support update with latest SDK (#14389)
- **Description:**
Add DocumentIntelligenceLoader & DocumentIntelligenceParser
implementation using the latest Azure Document Intelligence SDK with
markdown support.
The core logic resides in DocumentIntelligenceParser and
DocumentIntelligenceLoader is a mere wrapper of the parser.
The parser will takes api_endpoint and api_key and creates
DocumentIntelligenceClient for the user. 4 parsing modes are supported:
1. Markdown (default)
2. Single
3. Page 
4. Object

UT and notebook are also updated accordingly.

- **Dependencies:** Azure Document Intelligence SDK:
azure-ai-documentintelligence
[azure-sdk-for-python/sdk/documentintelligence/azure-ai-documentintelligence
at 7c42462ac662522a6fd21b17d2a20f4cd40d0356 · Azure/azure-sdk-for-python
(github.com)](https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAzure%2Fazure-sdk-for-python%2Ftree%2F7c42462ac662522a6fd21b17d2a20f4cd40d0356%2Fsdk%2Fdocumentintelligence%2Fazure-ai-documentintelligence&data=05%7C01%7CZifei.Qian%40microsoft.com%7C298225aa3e31468a863108dbf07374ff%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638368150928704292%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=oE0Sl4HERnMKdbkV9KgBV46Z2xytcQAShdTWf7ZNl%2Bs%3D&reserved=0).

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-21 16:40:27 -08:00
Ran
129a929d69 infra: Fix test filesystem paths incompatible with windows (#14388)
- **Description:** This PR fixes test failures on Windows caused by path
handling differences and unescaped special characters in regex. The
failing tests are:
```
FAILED tests/unit_tests/storage/test_filesystem.py::test_yield_keys - AssertionError: assert ['key1', 'subdir\\key2'] == ['key1', 'subdir/key2']
FAILED tests/unit_tests/test_imports.py::test_importable_all - ModuleNotFoundError: No module named 'langchain_community.langchain_community\\adapters'
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_on_absolute - re.error: incomplete escape \U at position 53
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_on_parent_dir - re.error: incomplete escape \U at position 69
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_for_symlink_outside_root - re.error: incomplete escape \U at position 64
```

- **Issue:** fixes
https://github.com/langchain-ai/langchain/issues/11775 (partially)
- **Dependencies:** none
2023-12-21 13:45:42 -08:00
Nuno Campos
71076cceaf Move json and xml parsers to core (#15026)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-21 12:36:56 -08:00
Nuno Campos
d5533b7081 Add option to make messages placeholder optional (#15031)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-21 12:36:37 -08:00
Bagatur
40f42b8947 community[patch]: Release 0.0.6 (#15023) 2023-12-21 14:37:44 -05:00
Bagatur
7eb1100925 core[patch]: Release 0.1.3 (#15022) 2023-12-21 14:35:15 -05:00
Nuno Campos
63e512b680 Implement streaming for all list output parsers (#14981)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-21 11:30:35 -08:00
Nuno Campos
b471166df7 Implement streaming for xml output parser (#14984)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-21 11:30:18 -08:00
Erick Friis
94bc3967a1 infra: api docs build order (#15018) 2023-12-21 11:05:02 -08:00
Jacob Lee
1b01ee0e3c community[minor]: add hf chat wrapper (#14736)
Builds on #14040 with community refactor merged and notebook updated.

Note that with this refactor, models will be imported from
`langchain_community.chat_models.huggingface` rather than the main
`langchain` repo.

---------

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
Signed-off-by: Yuchen Liang <yuchenl3@andrew.cmu.edu>
Co-authored-by: Andrew Reed <andrew.reed.r@gmail.com>
Co-authored-by: Andrew Reed <areed1242@gmail.com>
Co-authored-by: A-Roucher <aymeric.roucher@gmail.com>
Co-authored-by: Aymeric Roucher <69208727+A-Roucher@users.noreply.github.com>
2023-12-21 12:28:30 -05:00
Leonid Kuligin
b99274c9d8 community[patch]: changed default for VertexAIEmbeddings (#14614)
Replace this entire comment with:
- **Description:** @kurtisvg has raised a point that it's a good idea to
have a fixed version for embeddings (since otherwise a user might run a
query with one version vs a vectorstore where another version was used).
In order to avoid breaking changes, I'd suggest to give users a warning,
and make a `model_name` a required argument in 1.5 months.
2023-12-21 12:15:19 -05:00
Yannick Müller
138bc49759 docs: fixed wrong link in documentation (#14999)
See #14998
2023-12-21 12:06:43 -05:00
Karim Lalani
228ddabc3b community: fix for surrealdb client 0.3.2 update + store and retrieve metadata (#14997)
Surrealdb client changes from 0.3.1 to 0.3.2 broke the surrealdb vectore
integration.
This PR updates the code to work with the updated client. The change is
backwards compatible with previous versions of surrealdb client.
Also expanded the vector store implementation to store and retrieve
metadata that's included with the document object.
2023-12-21 12:04:57 -05:00
Ikko Eltociear Ashimine
c7be59c122 docs: Update templates README.md (#15013)
Mulitple -> Multiple
2023-12-21 12:04:05 -05:00
Lance Martin
535db72607 Update Ollama multi-modal multi-vector template README.md (#14995) 2023-12-20 20:07:38 -08:00
Lance Martin
94586ec242 Update Ollama multi-modal template README.md (#14994) 2023-12-20 20:07:27 -08:00
Lance Martin
1db7450bc2 Update Gemini template README.md (#14993) 2023-12-20 20:07:20 -08:00
Lance Martin
8996d1a65d Update multi-modal multi-vector template README.md (#14992) 2023-12-20 20:07:12 -08:00
Lance Martin
448b4d3522 Update multi-modal template README.md (#14991)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-20 20:06:52 -08:00
JaguarDB
ca0a75e1fc community[patch]: JaguarHttpClient conditional import (#14985)
- **Description:** Fixed jaguar.py to import JaguarHttpClient with try
and catch
- **Issue:** the issue # Unable to use the JaguarHttpClient at run time
  - **Dependencies:** It requires "pip install -U jaguardb-http-client" 
  - **Twitter handle:** workbot

---------

Co-authored-by: JY <jyjy@jaguardb>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-20 19:11:57 -08:00
Michael Landis
1c934fff0e community[patch]: support momento vector index filter expressions (#14978)
**Description**

For the Momento Vector Index (MVI) vector store implementation, pass
through `filter_expression` kwarg to the MVI client, if specified. This
change will enable the MVI self query implementation in a future PR.

Also fixes some integration tests.
2023-12-20 19:11:43 -08:00
Yacine
300c1cbf92 community[patch]: Fix typo in class Docstring (#14982)
- **Description:** Fix typo in class Docstring to replace
AZURE_OPENAI_API_ENDPOINT by AZURE_OPENAI_ENDPOINT
  - **Issue:** the issue #14901 
  - **Dependencies:** NA
  - **Twitter handle:**

Co-authored-by: Yacine Bouakkaz <Yacine.Bouakkaz@evokegroup.com>
2023-12-20 19:03:45 -08:00
Lance Martin
320c3ae4c8 templates: Add Ollama multi-modal templates (#14868)
Templates for [local multi-modal
LLMs](https://llava-vl.github.io/llava-interactive/) using -
* Image summaries
* Multi-modal embeddings

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-20 15:28:53 -08:00
chyroc
57d1eb733f core[patch]: update langchain-core runtime library name (#14884)
Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-20 14:35:48 -08:00
Quy Tang
42822484ef core(minor): Implement stream and astream for RunnableBranch (#14805)
* This PR adds `stream` implementations to Runnable Branch.
* Runnable Branch still does not support `transform` so it'll break streaming if it happens in middle or end of sequence, but will work if happens at beginning of sequence.
* Fixes use the async callback manager for async methods
* Handle BaseException rather than Exception, so more errors could be logged as errors when they are encountered


---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-12-20 15:37:56 -05:00
Leonid Ganeline
65a9193db2 docs: alibaba cloud (#14772)
The [provider
page](https://python.langchain.com/docs/integrations/providers/alibabacloud_opensearch)
holds the vector store information. The [Chat
example](https://python.langchain.com/docs/integrations/chat/pai_eas_chat_endpoint)
was incorrectly sorted in the navbar because of the wrong file name.
- Recreated a provide page
- Added missed links and descriptions
- Compound information about vector store from two pages into one
- Fixed file name
2023-12-20 12:32:33 -08:00
Bagatur
99f839d6f3 infra: pr template update (#14963) 2023-12-20 11:53:38 -08:00
MING KANG
ed5e0cfe57 community: add OCI Endpoint (#14250)
- **Description:** 
- [OCI Data
Science](https://docs.oracle.com/en-us/iaas/data-science/using/home.htm)
is a fully managed and serverless platform for data science teams to
build, train, and manage machine learning models in the Oracle Cloud
Infrastructure. This PR add integration for using LangChain with an LLM
hosted on a [OCI Data Science Model
Deployment](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm).
To authenticate,
[oracle-ads](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html)
has been used to automatically load credentials for invoking endpoint.
- **Issue:** None
- **Dependencies:** `oracle-ads`
- **Tag maintainer:** @baskaryan
- **Twitter handle:** None

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-20 11:52:20 -08:00
Erick Friis
75ba22793f community: Vectara summarization (#14970)
Description: Adding Summarization to Vectara, to reflect it provides not
only vector-store type functionality but also can return a summary.
Also added:
MMR capability (in the Vectara platform side)

Updated templates

Updated documentation and IPYNB examples

Tag maintainer: @baskaryan
Twitter handle: @ofermend

---------

Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com>
2023-12-20 11:51:33 -08:00
Erick Friis
cf6951a0c9 docs: links (#14940) 2023-12-20 11:51:18 -08:00
Liang Zhang
6479aab74f community[patch]: Add param "task" to Databricks LLM to work around serialization of transform_output_fn (#14933)
**What is the reproduce code?**

```python
from langchain.chains import LLMChain, load_chain
from langchain.llms import Databricks
from langchain.prompts import PromptTemplate

def transform_output(response):
    # Extract the answer from the responses.
    return str(response["candidates"][0]["text"])

def transform_input(**request):
    full_prompt = f"""{request["prompt"]}
    Be Concise.
    """
    request["prompt"] = full_prompt
    return request

chat_model = Databricks(
    endpoint_name="llama2-13B-chat-Brambles",
    transform_input_fn=transform_input,
    transform_output_fn=transform_output,
    verbose=True,
)
print(f"Test chat model: {chat_model('What is Apache Spark')}") # This works

llm_chain = LLMChain(llm=chat_model, prompt=PromptTemplate.from_template("{chat_input}"))
llm_chain("colorful socks") # this works
llm_chain.save("databricks_llm_chain.yaml") # transform_input_fn and transform_output_fn are not serialized into the model yaml file
loaded_chain = load_chain("databricks_llm_chain.yaml") # The Databricks LLM is recreated with transform_input_fn=None, transform_output_fn=None.
loaded_chain("colorful socks") # Thus this errors. The transform_output_fn is needed to produce the correct output
```


Error:
```
 File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-6c34afab-3473-421d-877f-1ef18930ef4d/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation
text
  str type expected (type=type_error.str)
 request payload: {'query': 'What is a databricks notebook?'}'}
```

**What does the error mean?**

When the LLM generates an answer, represented by a Generation data
object. The Generation data object takes a str field called text, e.g.
Generation(text=”blah”). However, the Databricks LLM tried to put a
non-str to text, e.g. Generation(text={“candidates”:[{“text”: “blah”}]})
Thus, pydantic errors.

**Why the output format becomes incorrect after saving and loading the
Databricks LLM?**

Databrick LLM does not support serializing transform_input_fn and
transform_output_fn, so they are not serialized into the model yaml
file. When the Databricks LLM is loaded, it is recreated with
transform_input_fn=None, transform_output_fn=None. Without
transform_output_fn, the output text is not unwrapped, thus errors.

Missing transform_output_fn causes this error.
Missing transform_input_fn causes the additional prompt “Be Concise.” to
be lost after saving and loading.
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-20 12:50:23 -05:00
Bagatur
1ea6d83188 langchain[patch]: Release 0.0.352 (#14961) 2023-12-20 10:27:03 -05:00
Bagatur
b03845e069 community[patch]: Release 0.0.5 (#14960) 2023-12-20 10:25:15 -05:00
Bagatur
a841f62791 core[patch]: 0.1.2 (#14959) 2023-12-20 10:13:54 -05:00
Anush
60c70effe9 community[minor]: Qdrant sparse vector retriever (#14814)
## Description

This PR intends to add support for Qdrant's new [sparse vector
retrieval](https://qdrant.tech/articles/sparse-vectors/) by introducing
a new retriever class, `QdrantSparseVectorRetriever`.

Necessary usage docs and integration tests have been added for the
retriever.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-20 02:22:19 -05:00
mogith-pn
c53fab63a3 community[patch]: Fixed duplicate input id issue in clarifai vectorstore (#14914)
- **Description:** 
This PR fixes the issue faces with duplicate input id in Clarifai
vectorstore class when ingesting documents into the vectorstore more
than the batch size.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-20 02:21:36 -05:00
Sypherd
5642132c0c community[patch]: Add safe lookup to OpenAI response adapter (#14765)
## Description
Similar to https://github.com/langchain-ai/langchain/issues/5861, I've
experienced `KeyError`s resulting from unsafe lookups in the
`convert_dict_to_message` function in [this
file](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/adapters/openai.py).
While that issue focused on `KeyError 'content'`, I've opened another
issue (#14764) about how the problem still exists in the same function
but with `KeyError 'role'`. The fix for #5861 only added a safe lookup
to the specific line that was giving them trouble.. This PR fixes the
unsafe lookup in the rest of the function but the problem still exists
across the repo.

## Issues
* #14764
* #5861 

## Dependencies
* None

## Checklist
[x] make format
[x] make lint
[ ] make test - Results in `make: *** No rule to make target 'test'.
Stop.`

## Maintainers
* @hinthornw

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-20 01:17:23 -05:00
AlpinDale
b0588774f1 community[minor]: Add Aphrodite Engine support (#14759)
This PR adds support for PygmalionAI's [Aphrodite
Engine](https://github.com/PygmalionAI/aphrodite-engine), based on
vLLM's attention mechanism. At the moment, this PR does not include
support for the API servers, but they will be added in a later PR.

The only dependency as of now is `aphrodite-engine==0.4.2`. We pin the
version to prevent breakage due to changes in the aphrodite-engine
library.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-20 01:16:57 -05:00
Dmitry Tyumentsev
d21f44b484 community[minor]: Add YandexGPT embeddings (#14767)
- **Description:** Introducing an ability to work with the
[YandexGPT](https://cloud.yandex.com/en/services/yandexgpt) embeddings
models.
---------

Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
2023-12-20 01:11:07 -05:00
Nicolas Suzor
529144649e community[patch]: add png support for vertexai._parse_chat_history_gemini() (#14788)
- **Description:** Modify community chat model vertexai to handle png
and other image types encoded in base64
  - **Dependencies:** added `import re` but no new dependencies.

This addresses a problem where the vertexai method
_parse_chat_history_gemini() was only recognizing image uris in jpeg
format. I made a simple change to cover other extension types.
2023-12-20 00:58:39 -05:00
Dr. Christoph Mittendorf
f348ad4ba8 docs: typo LLaMA2_sql_chat.ipynb (#14798)
"language" (right) vs "langugae" (wrong)
2023-12-20 00:54:06 -05:00
Liu Jun
b0c48dc983 community[patch]: make ak and sk optional in qianfan endpoint (#14835)
- **Description:** The Qianfan SDK offers multiple authentication
methods, but in the `QianfanEndpoint` of Langchain, it currently only
supports authentication through AK and SK. In order to accommodate users
who wish to use alternative authentication methods, this pull request
makes AK and SK optional. This change should not impact existing users,
while allowing users to configure other authentication methods as per
the Qianfan SDK documentation.
  - **Issue:** /
  - **Dependencies:** No
  - **Tag maintainer:** No
  - **Twitter handle:**
2023-12-20 00:49:33 -05:00
Archan Ghosh
65678b3816 community[patch]: Update arxiv.py with Entry ID as a return value (#14915)
Added Entry ID as a return value inside get_summaries_as_docs

- **Description:** Added the Entry ID as a return, so it's easier to
track the IDs of the papers that are being returned.


With the addition return of the entry ID in functions like
ArxivRetriever, it will be easier to reference the ID of the paper
itself.
2023-12-20 00:30:24 -05:00
thehunmonkgroup
dc20766513 docs: readme for langchain-mistralai (#14917)
- **Description:** Add README doc for MistralAI partner package.
  - **Tag maintainer:** @baskaryan
2023-12-20 00:22:43 -05:00
Elena Mata Yandiola
b66659fc28 docs: Clarification google_cloud_storage_directory.ipynb (#14922)
- Description: Just a minor add to the documentation to clarify how to
load all files from a folder. I assumed and try to do it specifying it
in the bucket (BUCKET/FOLDER), instead of using the prefix.
2023-12-20 00:21:42 -05:00
Ari Roffe
8bcadfd446 docs: nit embedding_distance.ipynb (#14929)
**Description:** Fix the docs about embedding distance evaluations
guide.
2023-12-20 00:13:17 -05:00
Yacine
20eacd4b5e docs: update notebook documentation for custom tool (#14942)
- **Description:** Documentation update. The custom tool notebook
documentation is updated to revome the warning caused by directly
instantiating of the LLMMathChain with an llm which is is deprecated.
The from_llm class method is used instead. LLM output results gets
updated as well.
  - **Issue:** no applicable
  - **Dependencies:** No dependencies
  - **Tag maintainer:** @baskaryan
  - **Twitter handle:** @ybouakkaz

Co-authored-by: Yacine Bouakkaz <Yacine.Bouakkaz@evokegroup.com>
2023-12-20 00:08:58 -05:00
Bagatur
345acb26ac community[patch]: Matching engine, return doc id (#14930) 2023-12-20 00:03:11 -05:00
Erick Friis
8a3360edf6 anthropic: beta messages integration (#14928) 2023-12-19 18:55:19 -08:00
Erick Friis
795cf2ddda together: package and embedding model (#14936) 2023-12-19 18:48:32 -08:00
Erick Friis
c21379438c docs: remove unused contributor steps (#14938) 2023-12-19 18:41:50 -08:00
William FH
758bcd4671 Add langsmith and benchmark repo links (#14931)
Think we could link to these in more places
2023-12-19 17:44:31 -08:00
João Galego
d306d89a9b template: Add Bedrock JCVD template (#14480)
This PR adds a simple LangChain template that uses [Anthropic's Claude
on Amazon Bedrock ⛰️](https://aws.amazon.com/bedrock/claude/) to behave
like JCVD.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-19 15:55:58 -08:00
Erick Friis
8b29b31554 cli: test_integration group (#14924) 2023-12-19 12:09:04 -08:00
Erick Friis
4d48aedea3 cli: 0.0.20 (#14920) 2023-12-19 11:56:21 -08:00
Erick Friis
bbb20804bd templates: fix sql-research-assistant (#14921) 2023-12-19 11:55:59 -08:00
Erick Friis
9ef2feb674 cli[patch]: add embedding to integration template (#14881) 2023-12-19 09:58:21 -08:00
Michael Feil
7b96de3d5d community[patch]: update Gradient embeddings (#14846)
- **Description:** Going forward, we have a own API `pip install
gradientai`. Therefore gradually removing the self-build packages in
llamaindex, haystack and langchain.
  - **Issue:** None.
  - **Dependencies:** `pip install gradientai`
  - **Tag maintainer:** @michaelfeil
2023-12-19 11:46:33 -05:00
Igor Dvorkin
6cc3c2452c community[patch]: Enhance iMessage chat loader with timestamp parsing and message ownership (#14804)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-19 11:09:01 -05:00
Mohammad Mohtashim
e3abe12243 community[patch]: helpful error message for GitHubAPIWrapper (#14803)
Very simple change in relation to the issue
https://github.com/langchain-ai/langchain/issues/14550

@baskaryan, @eyurtsev, @hwchase17.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-19 11:08:06 -05:00
Leonid Ganeline
922693caba docs: chunkviz reference (#14802)
Added a reference to the `Chunkviz` utility.
2023-12-19 10:58:16 -05:00
Dmitry Tyumentsev
50381abc42 community[patch]: Add retry logic to Yandex GPT API Calls (#14907)
**Description:** Added logic for re-calling the YandexGPT API in case of
an error

---------

Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
2023-12-19 10:51:42 -05:00
Sirjanpreet Singh Banga
425e5e1791 community[minor]: rename ChatGPTRouter to GPTRouter (#14913)
**Description:**: Rename integration to GPTRouter 
**Tag maintainer:** @Gupta-Anubhav12 @samanyougarg @sirjan-ws-ext  
**Twitter handle:** [@SamanyouGarg](https://twitter.com/SamanyouGarg)
2023-12-19 10:48:52 -05:00
JaguarDB
992b04e475 community[minor]: added jaguar vector store (#14838)
Description: A new vector store Jaguar is being added. Class, test
scripts, and documentation is added.
Issue: None -- This is the first PR contributing to LangChain
Dependencies: This depends on "pip install -U jaguardb-http-client"
client http package
Tag maintainer: @baskaryan, @eyurtsev, @hwchase1
Twitter handle: @workbot

---------

Co-authored-by: JY <jyjy@jaguardb>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-19 10:40:18 -05:00
Bagatur
a5be9f9475 mistralai: Add langchain-mistralai partner package (#14783)
Co-authored-by: Chad Phillips <chad@apartmentlines.com>
2023-12-19 10:34:19 -05:00
Sirjanpreet Singh Banga
44cb899a93 community[minor]: Integrating GPTRouter (#14900)
**Description:** Adding a langchain integration for
[GPTRouter](https://gpt-router.writesonic.com/) 🚀 ,
 **Tag maintainer:** @Gupta-Anubhav12 @samanyougarg @sirjan-ws-ext  
 **Twitter handle:** [@SamanyouGarg](https://twitter.com/SamanyouGarg)
 
Integration Tests Passing:
<img width="1137" alt="Screenshot 2023-12-19 at 5 45 31 PM"
src="https://github.com/Writesonic/langchain/assets/151817113/4a59df9a-ee30-47aa-9df9-b8c4eeb9dc76">
2023-12-19 10:08:36 -05:00
Bagatur
1069a93d18 langchain[patch]: export sagemaker LLMContentHandler (#14906)
Resolves #14904
2023-12-19 10:00:32 -05:00
Kostas Botsas
4f4b078bf3 docs: add reference for XataVectorStore constructor (#14903)
Adds doc reference to the XataVectorStore constructor for use with
existing Xata table contents.

@tsg @philkra
2023-12-19 09:04:46 -05:00
Leonid Ganeline
b2fd41331e docs: docstrings langchain_community update (#14889)
Addded missed docstrings. Fixed inconsistency in docstrings.

**Note** CC @efriis 
There were PR errors on
`langchain_experimental/prompt_injection_identifier/hugging_face_identifier.py`
But, I didn't touch this file in this PR! Can it be some cache problems?
I fixed this error.
2023-12-19 08:58:24 -05:00
William FH
583696732c [Partner] NVIDIA TRT Package (#14733)
Simplify #13976 and add as a separate package.

- [] Add README
- [X] Add doc notebook
- [X] Add simple LLM integration

---------

Co-authored-by: Jeremy Dyer <jdye64@gmail.com>
2023-12-18 19:08:25 -08:00
William FH
0d4cbbcc85 [Partner] Update google integration test (#14883)
Gemini has decided that pickle rick is unsafe:
https://github.com/langchain-ai/langchain/actions/runs/7256642294/job/19769249444#step:8:189


![image](https://github.com/langchain-ai/langchain/assets/13333726/cfbf4312-53b6-4290-84ee-6ce0742e739e)
2023-12-18 18:46:24 -08:00
William FH
f88af1f1cd [Partner] Google GenAi new release (#14882)
to support the system message merging

Also fix integration tests that weren't passing
2023-12-18 18:35:57 -08:00
Leonid Kuligin
2d0f1cae8c added history and support for system_message as param (#14824)
- **Description:** added support for chat_history for Google
GenerativeAI (to actually use the `chat` API) plus since Gemini
currently doesn't have a support for SystemMessage, added support for it
only if a user provides additional `convert_system_message_to_human`
flag during model initialization (in this case, SystemMessage would be
prepanded to the first HumanMessage)
  - **Issue:** #14710 
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
  - **Twitter handle:** lkuligin

---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
2023-12-18 18:23:14 -08:00
Leonid Ganeline
2861766d0d Docs tencent pages update (#14879)
- updated `Tencent` provider page: added a chat model and document
loader references; company description
- updated Chat model and Document loader pages with descriptions, links
- renamed files to consistent formats; redirected file names
Note:
I was getting this linting error on code that **was not changed in my
PR**!

> Error:
docs/docs/guides/safety/hugging_face_prompt_injection.ipynb:1:1: I001
Import block is un-sorted or un-formatted
> make: *** [Makefile:47: lint_package] Error 1

I've fixed this error in the notebook
2023-12-18 18:21:39 -08:00
Timothy Ji
c5a685b10b OPENAI_PROXY not working (#14833)
Replace this entire comment with:
- **Description:** OPENAI_PROXY is not working for openai==1.3.9, The
`proxies` argument is deprecated. The `http_client` argument should be
passed instead,
  - **Issue:** OPENAI_PROXY is not working,
  - **Dependencies:** None,
  - **Tag maintainer:** @hwchase17 ,
  - **Twitter handle:** timothy66666
2023-12-18 18:06:14 -08:00
Oleksandr Yaremchuk
d82a3828f2 Improve prompt injection detection (#14842)
- **Description:** This is addition to [my previous
PR](https://github.com/langchain-ai/langchain/pull/13930) with
improvements to flexibility allowing different models and notebook to
use ONNX runtime for faster speed. Since the last PR, [our
model](https://huggingface.co/laiyer/deberta-v3-base-prompt-injection)
got more than 660k downloads, and with the [public
benchmark](https://huggingface.co/spaces/laiyer/prompt-injection-benchmark)
showed much fewer false-positives than the previous one from deepset.
Additionally, on the ONNX runtime, it can be running 3x faster on the
CPU, which might be handy for builders using Langchain.
 **Issue:** N/A
 - **Dependencies:** N/A
 - **Tag maintainer:** N/A 
- **Twitter handle:** `@laiyer_ai`
2023-12-18 17:50:21 -08:00
Harrison Chase
f8dccaa027 Harrison/agent docs custom (#14877) 2023-12-18 17:49:32 -08:00
abhjaw
6fbd068b3f Update kendra.py to avoid Kendra query ValidationException (#14866)
Fixing issue - https://github.com/langchain-ai/langchain/issues/14494 to
avoid Kendra query ValidationException

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** Update kendra.py to avoid Kendra query
ValidationException,
- **Issue:** the issue
#https://github.com/langchain-ai/langchain/issues/14494,
  - **Dependencies:** None,
  - **Tag maintainer:** ,
  - **Twitter handle:** 

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-18 17:46:18 -08:00
Michael Landis
7b2a68ac72 docs: fix typo in contributing re installing integration test deps (#14861)
**Description**

The contributing docs lists a poetry command to install community for
dev work that includes a poetry group called `integration_tests`. This
is a mistake: the poetry group for integration tests is called
`test_integration`, not `integration_tests`. See here:

https://github.com/langchain-ai/langchain/blob/master/libs/community/pyproject.toml#L119
2023-12-18 17:43:56 -08:00
Bin
07ba030a4e docs: fixed tiktoken link error (#14840)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** fixed tiktoken link error, 
  - **Issue:** no,
  - **Dependencies:** no,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
  - **Twitter handle:** no!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** fixed tiktoken link error, 
- **Issue:** no,
- **Dependencies:** no,
- **Tag maintainer:** @baskaryan,
- **Twitter handle:** SignetCode!
2023-12-18 17:16:22 -08:00
Leonid Ganeline
6577b0d987 docstrings langchain update (#14870)
Added missed docstrings
2023-12-18 17:16:08 -08:00
Kane Sweet
ea331f3136 Fix token text splitter duplicates (#14848)
- **Description:** 
- Add a break case to `text_splitter.py::split_text_on_tokens()` to
avoid unwanted item at the end of result.
    - Add a testcase to enforce the behavior.
  - **Issue:** 
    - #14649 
    - #5897
  - **Dependencies:** n/a,
 
---

**Quick illustration of change:**

```
text = "foo bar baz 123"

tokenizer = Tokenizer(
        chunk_overlap=3,
        tokens_per_chunk=7
)

output = split_text_on_tokens(text=text, tokenizer=tokenizer)
```
output before change: `["foo bar", "bar baz", "baz 123", "123"]`
output after change: `["foo bar", "bar baz", "baz 123"]`
2023-12-18 17:15:57 -08:00
Leonid Ganeline
14d04180eb docstrings core update (#14871)
Added missed docstrings
2023-12-18 17:13:35 -08:00
Harrison Chase
d2cce54bf1 WIP: sql research assistant (#14240) 2023-12-18 14:00:18 -08:00
Erick Friis
5f839beab9 community: replace deprecated davinci models (#14860)
This is technically a breaking change because it'll switch out default
models from `text-davinci-003` to `gpt-3.5-turbo-instruct`, but OpenAI
is shutting off those endpoints on 1/4 anyways.

Feels less disruptive to switch out the default instead.
2023-12-18 13:49:46 -08:00
Harrison Chase
193f107cb5 add methods to deserialize prompts that were old (#14857) 2023-12-18 13:45:08 -08:00
Bagatur
714bef0cb6 langchain[patch]: Release 0.0.351 (#14867) 2023-12-18 16:41:48 -05:00
Bagatur
61ad0e8be9 community[patch]: Release 0.0.4 (#14864) 2023-12-18 16:08:08 -05:00
Erick Friis
92957e6cdf docs[patch]: more keywords (#14858) 2023-12-18 10:58:53 -08:00
Erick Friis
9f851d8951 docs[patch]: gemini keywords (#14856) 2023-12-18 10:52:24 -08:00
Vadim Kudlay
23eb480c38 docs: update NVIDIA integration (#14780)
- **Description:** Modification of descriptions for marketing purposes
and transitioning towards `platforms` directory if possible.
- **Issue:** Some marketing opportunities, lodging PR and awaiting later
discussions.
  - 

This PR is intended to be merged when decisions settle/hopefully after
further considerations. Submitting as Draft for now. Nobody @'d yet.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-18 12:13:42 -05:00
Bob Lin
5de1dc72b9 community[patch]: Update Tongyi default model_name (#14844)
<img width="1305" alt="Screenshot 2023-12-18 at 9 54 01 PM"
src="https://github.com/langchain-ai/langchain/assets/10000925/c943fd81-cd48-46eb-8dff-4680424d9ba9">

The current model is no longer available.
2023-12-18 11:35:53 -05:00
William FH
5fc2c578cf [Bugfix] Ensure tool output is a str, for OAI Assistant (#14830)
Tool outputs have to be strings apparently. Ensure they are formatted
correctly before passing as intermediate steps.
 

```
BadRequestError: Error code: 400 - {'error': {'message': '1 validation error for Request\nbody -> tool_outputs -> 0 -> output\n  str type expected (type=type_error.str)', 'type': 'invalid_request_error', 'param': None, 'code': None}}
```
2023-12-17 20:02:18 -08:00
William FH
bbc98a234d Update parser (#14831)
Gpt-3.5 sometimes calls with empty string arguments instead of `{}`

I'd assume it's because the typescript representation on their backend
makes it a bit ambiguous.
2023-12-17 20:02:07 -08:00
Vlad Kolesnikov
11fda490ca community[minor]: New model parameters and dynamic batching for VertexAIEmbeddings (#13999)
- **Description:** VertexAIEmbeddings performance improvements
  - **Twitter handle:** @vladkol

## Improvements

- Dynamic batch size, starting from 250, lowering down to 5. Batch size
varies across regions.
Some regions support larger batches, and it significantly improves
performance.
When running large batches of texts in `us-central1`, performance gain
can be up to 3.5x.
The dynamic batching also makes sure every batch is below 20K token
limit.
- New model parameter `embeddings_type` that translates to `task_type`
parameter of the API. Newer model versions support [different embeddings
task
types](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#api_changes_to_models_released_on_or_after_august_2023).
2023-12-17 22:24:22 -05:00
Peter Jausovec
2e6a9e6381 docs: Fix the broken link to Extraction page (#14806)
**Description:** fixing a broken link to the extraction doc page
2023-12-17 21:22:42 -05:00
Filippo Alimonda
462321f479 docs: typo in rag use case (#14800)
Description: Fixes minor typo to documentation
2023-12-17 21:22:25 -05:00
Erik Welch
6376fab957 docs: Fix link typo to /docs/integrations/text_embedding/nvidia_ai_endpoints (#14827)
This page doesn't exist:
-
https://python.langchain.com/docs/integrations/text_embeddings/nvidia_ai_endpoints

but this one does:
-
https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints
2023-12-17 21:16:59 -05:00
William FH
2d91d2b978 community: Add logprobs in gen output (#14826)
Now that it's supported again for OAI chat models .

Shame this wouldn't include it in the `.invoke()` output though (it's
not included in the message itself). Would need to do a follow-up for
that to be the case
2023-12-17 20:59:27 -05:00
Max
c316731d0f docs: Typo in Templates README.md (#14812)
Corrected path reference from package/pirate-speak to
packages/pirate-speak
2023-12-17 20:56:56 -05:00
Leonid Ganeline
59c3c344df docs redundant pages (#14774)
[ScaNN](https://python.langchain.com/docs/integrations/providers/scann)
and
[DynamoDB](https://python.langchain.com/docs/integrations/platforms/aws#aws-dynamodb)
pages in `providers` are redundant because we have those references in
the Google and AWS platform pages. It is confusing.
- I removed unnecessary pages, redirected files to new nams;
2023-12-17 14:54:48 -08:00
Yacine
2929509edd docs: ensure consistency in declaring LANGCHAIN_API_KEY... (#14823)
... variable, accompanied by a quote

Co-authored-by: Yacine Bouakkaz <Yacine.Bouakkaz@evokegroup.com>
2023-12-17 16:41:44 -05:00
Dmitry Tyumentsev
78ae276df7 community[patch]: fix agenerate return value (#14815)
Fixed:
  -  `_agenerate` return value in the YandexGPT Chat Model
  - duplicate line in the documentation

Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
2023-12-17 16:40:59 -05:00
sujeet
f1d3f29bc4 community[patch]: support for Sybase SQL anywhere added. (#14821)
- **Description:** support for Sybase SQL anywhere added in
sql_database.py file at path
langchain\libs\community\langchain_community\utilities
- **Issue:** It will resolve default schema setting for Sybase SQL
anywhere
  - **Dependencies:** No,
  - **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17,
  - **Twitter handle:** NA

---------

Co-authored-by: learn360sujeet <121271779+learn360sujeet@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-17 16:39:44 -05:00
Erick Friis
1acc7ffa3f infra: cut down on integration steps (#14785)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-17 12:55:59 -08:00
Erick Friis
8a07c56313 docs: developer docs (#14776)
Builds out a developer documentation section in the docs

- Links it from contributing.md
- Adds an initial guide on how to contribute an integration

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-17 12:55:49 -08:00
William FH
01693b291e Permit updates in indexing (#14482) 2023-12-16 13:34:33 -08:00
Erick Friis
133971053a docs[patch]: fix zoom (#14786)
not sure why quarto is removing divs
2023-12-15 17:46:12 -08:00
Noah Stapp
34e6f3ff72 community[patch]: Implement similarity_score_threshold for MongoDB Vector Store (#14740)
Adds the option for `similarity_score_threshold` when using
`MongoDBAtlasVectorSearch` as a vector store retriever.

Example use:

```
vector_search = MongoDBAtlasVectorSearch.from_documents(...)

qa_retriever = vector_search.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "score_threshold": 0.5,
    }
)

qa = RetrievalQA.from_chain_type(
	llm=OpenAI(), 
	chain_type="stuff", 
	retriever=qa_retriever,
)

docs = qa({"query": "..."})
```

I've tested this feature locally, using a MongoDB Atlas Cluster with a
vector search index.
2023-12-15 16:49:21 -08:00
Dmitry Tyumentsev
dcead816df community[patch]: Update YandexGPT API (#14773)
Update LLMand Chat model to use new api version

---------

Co-authored-by: Dmitry Tyumentsev <dmitry.tyumentsev@raftds.com>
2023-12-15 16:25:09 -08:00
Leonid Ganeline
eca89f87d8 docs: google drive update (#14781)
The [Google Drive
toolkit](https://python.langchain.com/docs/integrations/toolkits/google_drive)
page is a duplicate of the [Google Drive
tool](https://python.langchain.com/docs/integrations/tools/google_drive)
page.
- Removed the `Google Drive toolkit` page (it shouldn't be a toolkit but
tool)
- Removed the correspondent reference in the Google platform page
- Redirected the removed page to the tool page.
2023-12-15 16:03:59 -08:00
Lance Martin
42421860bc Add image support for Ollama (#14713)
Support [LLaVA](https://ollama.ai/library/llava):
* Upgrade Ollama
* `ollama pull llava`

Ensure compatibility with [image prompt
template](https://github.com/langchain-ai/langchain/pull/14263)

---------

Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
2023-12-15 16:00:55 -08:00
Leonid Ganeline
1075e7d6e8 docs: cloudflare update (#14779)
Added provider page.
Added links, descriptions
2023-12-15 14:39:41 -08:00
Leonid Ganeline
132be82d7e docs: Steam update (#14778)
Updated the page title. It was inconsistent.
Updated page with links; description and setting details.
2023-12-15 14:18:53 -08:00
Harrison Chase
16399fd61d langchain[patch]: remove unused imports (#14680)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-15 14:12:02 -08:00
Karim Lalani
a0064330b1 community[minor]: Add SurrealDB vectorstore (#13331)
**Description:** Vectorstore implementation around
[SurrealDB](https://www.surrealdb.com)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-15 13:34:51 -08:00
William FH
c5296fd42c [Documentation] Updates to NVIDIA Playground/Foundation Model naming.… (#14770)
…  (#14723)

- **Description:** Minor updates per marketing requests. Namely, name
decisions (AI Foundation Models / AI Playground)
  - **Tag maintainer:** @hinthornw 

Do want to pass around the PR for a bit and ask a few more marketing
questions before merge, but just want to make sure I'm not working in a
vacuum. No major changes to code functionality intended; the PR should
be for documentation and only minor tweaks.

Note: QA model is a bit borked across staging/prod right now. Relevant
teams have been informed and are looking into it, and I'm placeholdered
the response to that of a working version in the notebook.

Co-authored-by: Vadim Kudlay <32310964+VKudlay@users.noreply.github.com>
2023-12-15 12:21:59 -08:00
William FH
65091ebe50 Update propositional-retrieval template (#14766)
More descriptive name. Add parser in ingest. Update image link
2023-12-15 07:57:45 -08:00
William FH
4855964332 Fix OAI Tool Message (#14746)
See format here:
https://platform.openai.com/docs/guides/function-calling/parallel-function-calling


It expects a "name" argument, which we aren't providing by default.


![image](https://github.com/langchain-ai/langchain/assets/13333726/7cd82978-337c-40a1-b099-3bb25cd57eb4)


Alternative is to add the 'name' field directly to the message if people
prefer.
2023-12-15 06:45:09 -08:00
William FH
e3132a7efc [Evals] End project (#14324)
Also does some cleanup.

Now that we support updating/ending projects, do this automatically.
Then you can edit the name of the project in the app.
2023-12-15 00:05:34 -08:00
William FH
93c7eb4e6b [Tracing] String Stacktrace (#14131)
Add full stacktrace
2023-12-14 22:15:07 -08:00
Leonid Kuligin
7f42811e14 google-genai[patch], community[patch]: Added support for new Google GenerativeAI models (#14530)
Replace this entire comment with:
  - **Description:** added support for new Google GenerativeAI models
  - **Twitter handle:** lkuligin

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-14 20:56:46 -08:00
William C Grisaitis
6bbf0797f7 docs: Remove trailing "`" in pip install command (#14730)
hi! just a simple typo fix in the local LLM python docs

- **Description:** removing a trailing "\`" character in a `!pip install
...` command
  - **Issue:** n/a
  - **Dependencies:** n/a
  - **Tag maintainer:** n/a
  - **Twitter handle:** n/a
2023-12-14 17:04:19 -08:00
Bagatur
c7b5dbe8ec infra: fix pre-release integration test and add unit test (#14742) 2023-12-14 16:57:41 -08:00
Erick Friis
480821da59 infra: docs build install community editable (#14739) 2023-12-14 16:13:09 -08:00
Bagatur
b802dd96f2 core[patch]: Release 0.1.1 (#14738) 2023-12-14 16:02:19 -08:00
William FH
9d4100f915 Revert "[Hub|tracing] Tag hub prompts" (#14735)
Reverts langchain-ai/langchain#14720
2023-12-14 14:39:58 -08:00
Bagatur
b9975fac89 infra: add action checkout to pre-release-checks (#14732) 2023-12-14 13:28:13 -08:00
Erick Friis
9fb26a2a71 community[patch]: fix pgvector sqlalchemy (#14726)
Fixes #14699
2023-12-14 13:27:30 -08:00
Bagatur
1cec0afc62 google-genai[patch]: add google-genai integration deps and extras (#14731) 2023-12-14 13:20:10 -08:00
Bagatur
ba897fc04c infra: Pre-release integration tests for partner pkgs (#14687) 2023-12-14 13:11:19 -08:00
Bagatur
74211aa02e infra: add integration test workflow (#14688) 2023-12-14 12:46:45 -08:00
Leonid Kuligin
c5c64aa863 docs: updated branding for Google AI (#14728)
Replace this entire comment with:
  - **Description:** a small fix in branding
2023-12-14 12:31:19 -08:00
Erick Friis
a86065c536 docs[patch]: fix databricks metadata (#14727) 2023-12-14 11:47:34 -08:00
Bob Lin
ff206ae30d Update google_generative_ai.ipynb (#14704) 2023-12-14 10:58:25 -08:00
William FH
852b9ca494 [Hub|tracing] Tag hub prompts (#14720)
If you're using the hub, you'll likely be interested in tracking the
commit/object when tracing. This PR adds it to the config
2023-12-14 10:04:18 -08:00
William FH
79ae6c2a9e Add dense proposals (#14719)
Indexing strategy based on decomposing candidate propositions while
indexing.
2023-12-14 09:21:45 -08:00
William FH
bc3ec78a38 [Workflows] Add nvidia-aiplay to _release.yml (#14722)
As the title says.
In the future will want to have a script to automate this
2023-12-14 09:16:40 -08:00
William FH
451c5d1d8c [Integration] NVIDIA AI Playground (#14648)
Description: Added NVIDIA AI Playground Initial support for a selection of models (Llama models, Mistral, etc.)

Dependencies: These models do depend on the AI Playground services in NVIDIA NGC. API keys with a significant amount of trial compute are available (10K queries as of the time of writing).

H/t to @VKudlay
2023-12-13 19:46:37 -08:00
William FH
1e21a3f7ed [Partner] Gemini Embeddings (#14690)
Add support for Gemini embeddings in the langchain-google-genai package
2023-12-13 17:05:31 -08:00
Lance Martin
3449fce273 Gemini multi-modal RAG template (#14678)
![Screenshot 2023-12-13 at 12 53 39
PM](https://github.com/langchain-ai/langchain/assets/122662504/a6bc3b0b-f177-4367-b9c8-b8862c847026)
2023-12-13 16:43:47 -08:00
Lance Martin
7234335a9a Template for multi-modal w/ multi-vector (#14618)
Results - 

![image](https://github.com/langchain-ai/langchain/assets/122662504/16bac14d-74d7-47b1-aed0-72ae25a81f39)
2023-12-13 16:43:14 -08:00
Bagatur
97a91d9d0d docs: api ref nav Python Docs -> Docs (#14686) 2023-12-13 15:11:09 -08:00
Leonid Ganeline
0d6471c16d docs: platform pages update (#14637)
Updated examples and platform pages.
- added missed tools
- added links and descriptions
2023-12-13 15:08:27 -08:00
Funkeke
ea99612caa community[patch]: fix dashvector endpoint params error (#14484)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Co-authored-by: fangkeke <3339698829@qq.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-13 14:38:27 -08:00
Bob Lin
dce3c74905 community[patch]: Correct type annotation for azure_ad_token_provider Closed: #14402 (#14432)
Description
Fix https://github.com/langchain-ai/langchain/issues/14402, Similar
changes: https://github.com/langchain-ai/langchain/pull/14166

Twitter handle
[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-13 14:37:39 -08:00
Fran Cirka
8a4162d15e community[patch]: Fixed issue with importing Row from sqlalchemy (#14488)
- **Description:** Fixed import of Row in cache.py, 
- **Issue:** the issue # #13464
https://creditone.us.to/langchain-ai/langchain/issues/13464,
  - **Dependencies:** None,
  - **Twitter handle:** @frankybridman

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-13 14:36:08 -08:00
Erick Friis
ab94119a53 docs[patch]: fix bullet points (#14684)
- docs fixes
- escape
- bullets
2023-12-13 14:35:19 -08:00
billytrend-cohere
7e4dbb26a8 templates[patch]: Add cohere librarian template (#14601)
Adding the example I build for the Cohere hackathon.

It can:

use a vector database to reccommend books

<img width="840" alt="image"
src="https://github.com/langchain-ai/langchain/assets/144115527/96543a18-217b-4445-ab4b-950c7cced915">

Use a prompt template to provide information about the library

<img width="834" alt="image"
src="https://github.com/langchain-ai/langchain/assets/144115527/996c8e0f-cab0-4213-bcc9-9baf84f1494b">

Use Cohere RAG to provide grounded results

<img width="822" alt="image"
src="https://github.com/langchain-ai/langchain/assets/144115527/7bb4a883-5316-41a9-9d2e-19fd49a43dcb">

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-13 14:34:44 -08:00
Bagatur
47451951a1 core[patch]: Fix runnable with message history (#14629)
Fix bug shown in #14458. Namely, that saving inputs to history fails
when the input to base runnable is a list of messages
2023-12-13 14:25:35 -08:00
Bagatur
99743539ae docs: per-package version in api docs (#14683) 2023-12-13 14:24:50 -08:00
Bagatur
d4312e2424 docs: fix api ref link (#14679)
Don't point to stable, let api docs choose default version
2023-12-13 13:42:01 -08:00
Bagatur
effd000b91 docs: build partner api refs (#14675) 2023-12-13 13:37:27 -08:00
William FH
6c031e0ebf Wfh/google docs update (#14676)
- Add gemini references
- Fix the notebook (ultra isn't generally available; also gemini will
randomly filter out responses, so added a fallback)

---------

Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
2023-12-13 13:26:53 -08:00
Bagatur
73382a579f google-genai[patch]: Release 0.0.2 (#14677) 2023-12-13 12:59:19 -08:00
Nuno Campos
a16f4a318f \Fix tool_calls message merge (#14613)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-13 12:37:40 -08:00
William FH
405d111da6 [Partner] Add langchain-google-genai package (gemini) (#14621)
Add a new ChatGoogleGenerativeAI class in a `langchain-google-genai`
package.
Still todo: add a deprecation warning in PALM

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-13 11:57:59 -08:00
Bagatur
4574749147 communty[patch]: Release 0.0.3 (#14673) 2023-12-13 11:21:00 -08:00
Erick Friis
c5250f12c2 cli[patch]: unicode issue (#14672)
Some operating systems compile template, resulting in unicode decode
errors
2023-12-13 11:14:51 -08:00
William FH
75b8891399 Update Vertex AI to include Gemini (#14670)
h/t to @lkuligin 
-  **Description:** added new models on VertexAI
  - **Twitter handle:** @lkuligin

---------

Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-13 10:45:02 -08:00
Erick Friis
858f4cbce4 cli[patch]: rc (#14667) 2023-12-13 10:00:04 -08:00
Erick Friis
231891706b infra: skip extended testing for partner packages (#14630)
Tested by merging into #14627
2023-12-13 09:58:48 -08:00
William FH
2bef45074d [Nit] Add newline in notebook (#14665)
For bullet list formatting
2023-12-13 09:46:13 -08:00
Tomaz Bratanic
ea2616ae23 Fix RRF and lucene escape characters for neo4j vector store (#14646)
* Remove Lucene special characters (fixes
https://github.com/langchain-ai/langchain/issues/14232)
* Fixes RRF normalization for hybrid search
2023-12-13 09:09:50 -08:00
Erick Friis
7e6ca3c2b9 cli[patch]: integration template (#14571) 2023-12-13 08:55:30 -08:00
William FH
db04580dfa Add Gemini Notebook (#14661) 2023-12-13 08:47:55 -08:00
James Braza
b9ef92f2f4 Fixed DeprecationWarning for PromptTemplate.from_file module-level calls (#14468)
Resolves https://github.com/langchain-ai/langchain/issues/14467
2023-12-12 17:43:27 -08:00
Chengzu Ou
df95abb7e7 docs: Add Databricks Vector Search example notebook (#14158)
This PR adds an example notebook for the Databricks Vector Search vector
store. It also adds an introduction to the Databricks Vector Search
product on the Databricks's provider page.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-12 17:40:29 -08:00
ggeutzzang
414bddd5f0 DOC: model update in 'Using OpenAI Functions' docs (#14486)
- **Description:** : 
I just update the openai functions docs to use the latest model (ex.
gpt-3.5-turbo-1106)
https://python.langchain.com/docs/modules/chains/how_to/openai_functions

The reason is as follow: 

After reviewing the OpenAI Function Calling official guide at
https://platform.openai.com/docs/guides/function-calling, the following
information was noted:

> "The latest models (gpt-3.5-turbo-1106 and gpt-4-1106-preview) have
been trained to both detect when a function should be called (depending
on the input) and to respond with JSON that adheres to the function
signature more closely than previous models. With this capability also
comes potential risks. We strongly recommend building in user
confirmation flows before taking actions that impact the world on behalf
of users (sending an email, posting something online, making a purchase,
etc)."

CC: @efriis
2023-12-12 17:31:08 -08:00
葛尧
e780433f6b Fix token_usage None issue in ChatOpenAI with local Chatglm2-6B (#14493)
When using local Chatglm2-6B by changing OPENAI_BASE_URL to localhost,
the token_usage in ChatOpenAI becomes None. This leads to an
AttributeError when trying to access token_usage.items().

This commit adds a check to ensure token_usage is not None before
accessing its items. This change prevents the AttributeError and allows
ChatOpenAI to work seamlessly with a local Chatglm2-6B model, aligning
with the way it operates with the OpenAI API.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-12 17:30:37 -08:00
Massimiliano Pronesti
6080c98108 fix(embeddings): huggingface hub embeddings and TEI (#14489)
**Description:** This PR fixes `HuggingFaceHubEmbeddings` by making the
API token optional (as in the client beneath). Most models don't require
one. I also updated the notebook for TEI (text-embeddings-inference)
accordingly as requested here #14288. In addition, I fixed a mistake in
the POST call parameters.

**Tag maintainers:** @baskaryan
2023-12-12 17:21:52 -08:00
Peter Jausovec
5da79e150b [docs]: add missing tiktoken dependency (#14497)
Description: I was following the docs and got an error about missing
tiktoken dependency. Adding it to the comment where the langchain and
docarray libs are.
2023-12-12 17:04:48 -08:00
Thomas B
b4e3e47c92 feat: Yaml output parser (#14496)
## Description
New YAML output parser as a drop-in replacement for the Pydantic output
parser. Yaml is a much more token-efficient format than JSON, proving to
be **~35% faster and using the same percentage fewer completion
tokens**.

☑️ Formatted
☑️ Linted
☑️ Tested (analogous to the existing`test_pydantic_parser.py`)

The YAML parser excels in situations where a list of objects is
required, where the root object needs no key:
```python
class Products(BaseModel):
   __root__: list[Product]
```

I ran the prompt `Generate 10 healthy, organic products` 10 times on one
chain using the `PydanticOutputParser`, the other one using
the`YamlOutputParser` with `Products` (see below) being the targeted
model to be created.

LLMs used were Fireworks' `lama-v2-34b-code-instruct` and OpenAI
`gpt-3.5-turbo`. All runs succeeded without validation errors.

```python
class Nutrition(BaseModel):
    sugar: int = Field(description="Sugar in grams")
    fat: float = Field(description="% of daily fat intake")

class Product(BaseModel):
    name: str = Field(description="Product name")
    stats: Nutrition

class Products(BaseModel):
    """A list of products"""

    products: list[Product] # Used `__root__` for the yaml chain
```
Stats after 10 runs reach were as follows:
### JSON
ø time: 7.75s
ø tokens: 380.8

### YAML
ø time: 5.12s
ø tokens: 242.2


Looking forward to feedback, tips and contributions!
2023-12-12 17:04:31 -08:00
standby24x7
d31ff30df6 docs[patch] Fix some typos in merger_retriever.ipynb (#14502)
This patch fixes some typos.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2023-12-12 17:02:45 -08:00
Mark Cusack
158dda440b Added notebook tutorial on using Yellowbrick as a vector store with LangChain (#14509)
- **Description:** a notebook documenting Yellowbrick as a vector store
usage

---------

Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
2023-12-12 16:59:05 -08:00
billytrend-cohere
0dc432aa95 Update cohere provider docs (#14528)
Preview since github wont preview .mdx : 

<img width="1401" alt="image"
src="https://github.com/langchain-ai/langchain/assets/144115527/9e8ba3d9-24ff-4584-9da3-2c9b60e7e624">
2023-12-12 16:48:46 -08:00
Bagatur
b092bfbb3c docs: update langchain diagram (#14619) 2023-12-12 16:36:15 -08:00
Bagatur
e84a350791 infra: rm community split scripts (#14633) 2023-12-12 15:46:15 -08:00
Leonid Ganeline
1bf84c3056 docs ollama pages (#14561)
added provider page; fixed broken links.
2023-12-12 15:45:06 -08:00
Shaurya Rohatgi
a4992ffada fix: to rag-semi-structured template (#14568)
**Description:** 

Fixes to rag-semi-structured template.

- Added required libraries
- pdfminer was causing issues when installing with pip. pdfminer.six
works best
- Changed the pdf name for demo from llama2 to llava


<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-12 15:44:35 -08:00
Bob Lin
a019183a01 create mypy cache dir if it doesn't exist (#14579)
### Description

When running `make lint` multiple times, i can see the error `mkdir:
.mypy_cache: File exists`. Use `mkdir -p` to solve this problem.
<img width="1512" alt="Screenshot 2023-12-12 at 11 22 01 AM"
src="https://github.com/langchain-ai/langchain/assets/10000925/1429383d-3283-4e22-8882-5693bc50b502">
2023-12-12 15:34:50 -08:00
dandanwei
e5bd88383f fix a bug in RedisNum filter againt value 0 (#14587)
- **Description:** There is a bug in RedisNum filter that filter towards
value 0 will be parsed as "*". This is a fix to it.
  - **Issue:** NA
  - **Dependencies:** NA
  - **Tag maintainer:** NA
  - **Twitter handle:** NA
2023-12-12 15:34:45 -08:00
Erick Friis
b885880344 templates[patch]: fix pydantic imports (#14632) 2023-12-12 15:31:14 -08:00
Ikko Eltociear Ashimine
945f6eb5d6 docs: update multi_modal_RAG_chroma.ipynb (#14602)
seperate -> separate

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-12 15:24:37 -08:00
Kenzie Mihardja
159b5cab16 Update Docugami Cookbook (#14626)
**Description:** Update the information in the Docugami cookbook. Fix
broken links and add information on our kg-rag template.

Co-authored-by: Kenzie Mihardja <kenzie@docugami.com>
2023-12-12 15:21:22 -08:00
Lance Martin
282362382c Minor update to ensemble retriever to handle a mix of Documents or str (#14552) 2023-12-12 15:16:49 -08:00
Bagatur
ca7da8f7ef docs: fix links in readme (#14624) 2023-12-12 12:59:09 -08:00
Bagatur
2a10cabf66 docs: core and community readme (#14623) 2023-12-12 12:52:32 -08:00
Bagatur
b72b19b593 experimental[patch]: Release 0.0.47 (#14617) 2023-12-12 11:09:39 -08:00
William FH
c32554a3e0 Add image (#14611) 2023-12-12 10:55:42 -08:00
Bagatur
57337b4862 langchain[patch]: Release 0.0.350 (#14612) 2023-12-12 10:10:34 -08:00
Bagatur
d388863a3b community[patch]: Release 0.0.2 (#14610) 2023-12-12 09:58:04 -08:00
Bagatur
5d1deddbfb core[minor]: Release 0.1.0 (#14607) 2023-12-12 09:33:11 -08:00
Harrison Chase
ad8d8f71aa allow other namespaces (#14606) 2023-12-12 09:09:59 -08:00
William FH
ce61a8ca98 Add Gmail Agent Example (#14567)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-12 08:34:28 -08:00
Eugene Yurtsev
76905aa043 Update RunnableWithMessageHistory (#14351)
This PR updates RunnableWithMessage history to support user specific
configuration for the factory.

It extends support to passing multiple named arguments into the factory
if the factory takes more than a single argument.
2023-12-11 21:34:49 -05:00
Bagatur
8a126c5d04 docs[patch]: update installation with core and community (#14577) 2023-12-11 18:31:25 -08:00
Harutaka Kawamura
b54a1a3ef1 docs[patch]: Fix embeddings example for Databricks (#14576)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

Fix `from langchain.llms import DatabricksEmbeddings` to `from
langchain.embeddings import DatabricksEmbeddings`.

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
2023-12-11 16:55:23 -08:00
Bagatur
9ffca3b92a docs[patch], templates[patch]: Import from core (#14575)
Update imports to use core for the low-hanging fruit changes. Ran
following

```bash
git grep -l 'langchain.schema.runnable' {docs,templates,cookbook}  | xargs sed -i '' 's/langchain\.schema\.runnable/langchain_core.runnables/g'
git grep -l 'langchain.schema.output_parser' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.output_parser/langchain_core.output_parsers/g'
git grep -l 'langchain.schema.messages' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.messages/langchain_core.messages/g'
git grep -l 'langchain.schema.chat_histry' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.chat_history/langchain_core.chat_history/g'
git grep -l 'langchain.schema.prompt_template' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.prompt_template/langchain_core.prompts/g'
git grep -l 'from langchain.pydantic_v1' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.pydantic_v1/from langchain_core.pydantic_v1/g'
git grep -l 'from langchain.tools.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.tools\.base/from langchain_core.tools/g'
git grep -l 'from langchain.chat_models.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.chat_models.base/from langchain_core.language_models.chat_models/g'
git grep -l 'from langchain.llms.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.llms\.base\ /from langchain_core.language_models.llms\ /g'
git grep -l 'from langchain.embeddings.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.embeddings\.base/from langchain_core.embeddings/g'
git grep -l 'from langchain.vectorstores.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.vectorstores\.base/from langchain_core.vectorstores/g'
git grep -l 'from langchain.agents.tools' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.agents\.tools/from langchain_core.tools/g'
git grep -l 'from langchain.schema.output' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.output\ /from langchain_core.outputs\ /g'
git grep -l 'from langchain.schema.embeddings' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.embeddings/from langchain_core.embeddings/g'
git grep -l 'from langchain.schema.document' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.document/from langchain_core.documents/g'
git grep -l 'from langchain.schema.agent' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.agent/from langchain_core.agents/g'
git grep -l 'from langchain.schema.prompt ' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.prompt\ /from langchain_core.prompt_values /g'
git grep -l 'from langchain.schema.language_model' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.language_model/from langchain_core.language_models/g'


```
2023-12-11 16:49:10 -08:00
Erick Friis
0a9d933bb2 infra: import checking bugfix (#14569) 2023-12-11 15:53:51 -08:00
Bagatur
8bdaf55e92 experimental[patch]: Release 0.0.46 (#14572) 2023-12-11 15:46:14 -08:00
Bagatur
14bfc5f9f4 langchain[patch]: Release 0.0.349 (#14570) 2023-12-11 15:30:14 -08:00
Erick Friis
482e2b94fa infra: import CI speed (#14566)
Was taking 10 mins. Now a few seconds.
2023-12-11 15:19:21 -08:00
Bagatur
6a828e60ee community[patch]: Release 0.0.1 (#14565) 2023-12-11 15:18:55 -08:00
Erick Friis
5418d8bfd6 infra: import CI fix (#14562)
TIL `**` globstar doesn't work in make

Makefile changes fix that.

`__getattr__` changes allow import of all files, but raise error when
accessing anything from the module.

file deletions were corresponding libs change from #14559
2023-12-11 14:59:10 -08:00
Bagatur
48b7a0584d infra: Turn release branch check back on (#14563) 2023-12-11 14:40:24 -08:00
Bagatur
9cb128e6e2 core[patch]: Release 0.0.13 (#14558) 2023-12-11 14:36:28 -08:00
Bagatur
a844b495c4 community[patch]: Fix agenttoolkits imports (#14559) 2023-12-11 14:19:25 -08:00
Nuno Campos
3b5b0f16c6 Move runnable context to beta (#14507)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-11 13:58:30 -08:00
Bagatur
ed58eeb9c5 community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)
Moved the following modules to new package langchain-community in a backwards compatible fashion:

```
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
```

Moved the following to core
```
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
```

See .scripts/community_split/script_integrations.sh for all changes
2023-12-11 13:53:30 -08:00
Eugene Yurtsev
c0f4b95aa9 RunnableWithMessageHistory: Fix input schema (#14516)
Input schema should not have history key
2023-12-10 23:33:02 -05:00
Leonid Ganeline
d9bfdc95ea docs[patch]: google platform page update (#14475)
Added missed tools

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-12-08 17:40:44 -08:00
Leonid Ganeline
2fa81739b6 docs[patch]: microsoft platform page update (#14476)
Added `presidio` and `OneNote` references to `microsoft.mdx`; added link
and description to the `presidio` notebook

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-12-08 17:40:30 -08:00
Yelin Zhang
84a57f5350 docs[patch]: add missing imports for local_llms (#14453)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
Keeping it consistent with everywhere else in the docs and adding the
missing imports to be able to copy paste and run the code example.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-08 17:23:29 -08:00
Harrison Chase
f5befe3b89 manual mapping (#14422) 2023-12-08 16:29:33 -08:00
Erick Friis
c24f277b7c langchain[patch], docs[patch]: use byte store in multivectorretriever (#14474) 2023-12-08 16:26:11 -08:00
Leonid Ganeline
1ef13661b9 docs[patch]: link and description cleanup (#14471)
Fixed inconsistencies; added links and descriptions

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2023-12-08 15:24:38 -08:00
Lance Martin
6fbfc375b9 Update README and vectorstore path for multi-modal template (#14473) 2023-12-08 15:24:05 -08:00
Anish Nag
6da0cfea0e experimental[patch]: SmartLLMChain Output Key Customization (#14466)
**Description**
The `SmartLLMChain` was was fixed to output key "resolution".
Unfortunately, this prevents the ability to use multiple `SmartLLMChain`
in a `SequentialChain` because of colliding output keys. This change
simply gives the option the customize the output key to allow for
sequential chaining. The default behavior is the same as the current
behavior.

Now, it's possible to do the following:
```
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain_experimental.smart_llm import SmartLLMChain
from langchain.chains import SequentialChain

joke_prompt = PromptTemplate(
    input_variables=["content"],
    template="Tell me a joke about {content}.",
)
review_prompt = PromptTemplate(
    input_variables=["scale", "joke"],
    template="Rate the following joke from 1 to {scale}: {joke}"
)

llm = ChatOpenAI(temperature=0.9, model_name="gpt-4-32k")
joke_chain = SmartLLMChain(llm=llm, prompt=joke_prompt, output_key="joke")
review_chain = SmartLLMChain(llm=llm, prompt=review_prompt, output_key="review")

chain = SequentialChain(
    chains=[joke_chain, review_chain],
    input_variables=["content", "scale"],
    output_variables=["review"],
    verbose=True
)
response = chain.run({"content": "chickens", "scale": "10"})
print(response)
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-08 13:55:51 -08:00
Leonid Ganeline
0797358c1b docs networkxupdate (#14426)
Added setting up instruction, package description and link
2023-12-08 13:39:50 -08:00
Bagatur
300305e5e5 infra: add langchain-community release workflow (#14469) 2023-12-08 13:31:15 -08:00
Ben Flast
b32fcb550d Update mongodb_atlas docs for GA (#14425)
Updated the MongoDB Atlas Vector Search docs to indicate the service is
Generally Available, updated the example to use the new index
definition, and added an example that uses metadata pre-filtering for
semantic search

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-08 13:23:01 -08:00
Erick Friis
b3f226e8f8 core[patch], langchain[patch], experimental[patch]: import CI (#14414) 2023-12-08 11:28:55 -08:00
Leonid Ganeline
ba083887e5 docs Dependents updated statistics (#14461)
Updated statistics for the dependents (packages dependent on `langchain`
package). Only packages with 100+ starts
2023-12-08 14:14:41 -05:00
Eugene Yurtsev
37bee92b8a Use deepcopy in RunLogPatch (#14244)
This PR adds deepcopy usage in RunLogPatch.

I included a unit-test that shows an issue that was caused in LangServe
in the RemoteClient.

```python
import jsonpatch

s1 = {}
s2 = {'value': []}
s3 = {'value': ['a']}

ops0 = list(jsonpatch.JsonPatch.from_diff(None, s1))
ops1 = list(jsonpatch.JsonPatch.from_diff(s1, s2))
ops2 = list(jsonpatch.JsonPatch.from_diff(s2, s3))
ops = ops0 + ops1 + ops2

jsonpatch.apply_patch(None, ops)
{'value': ['a']}

jsonpatch.apply_patch(None, ops)
{'value': ['a', 'a']}

jsonpatch.apply_patch(None, ops)
{'value': ['a', 'a', 'a']}
```
2023-12-08 14:09:36 -05:00
Erick Friis
1d7e5c51aa langchain[patch]: xfail unstable vertex test (#14462) 2023-12-08 11:00:37 -08:00
Erick Friis
477b274a62 langchain[patch]: fix scheduled testing ci dep install (#14460) 2023-12-08 10:37:44 -08:00
Harrison Chase
02ee0073cf revoke serialization (#14456) 2023-12-08 10:31:05 -08:00
Erick Friis
ff0d5514c1 langchain[patch]: fix scheduled testing ci variables (#14459) 2023-12-08 10:27:21 -08:00
Erick Friis
1d725327eb langchain[patch]: Fix scheduled testing (#14428)
- integration tests in pyproject
- integration test fixes
2023-12-08 10:23:02 -08:00
Harrison Chase
7be3eb6fbd fix imports from core (#14430) 2023-12-08 09:33:35 -08:00
Leonid Ganeline
a05230a4ba docs[patch]: promptlayer pages update (#14416)
Updated provider page by adding LLM and ChatLLM references; removed a
content that is duplicate text from the LLM referenced page.
Updated the collback page
2023-12-07 15:48:10 -08:00
Leonid Ganeline
18aba7fdef docs: notebook linting (#14366)
Many jupyter notebooks didn't pass linting. List of these files are
presented in the [tool.ruff.lint.per-file-ignores] section of the
pyproject.toml . Addressed these bugs:
- fixed bugs; added missed imports; updated pyproject.toml
 Only the `document_loaders/tensorflow_datasets.ipyn`,
`cookbook/gymnasium_agent_simulation.ipynb` are not completely fixed.
I'm not sure about imports.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-07 15:47:48 -08:00
2974 changed files with 171863 additions and 103352 deletions

View File

@@ -3,31 +3,17 @@
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
To learn about how to contribute, please follow the [guides here](https://python.langchain.com/docs/contributing/)
## 🗺️ Guidelines
### 👩‍💻 Contributing Code
### 👩‍💻 Ways to contribute
To contribute to this project, please follow the ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are a maintainer.
There are many ways to contribute to LangChain. Here are some common ways people contribute:
Please follow the checked-in pull request template when opening pull requests. Note related issues and tag relevant
maintainers.
Pull requests cannot land without passing the formatting, linting, and testing checks first. See [Testing](#testing) and
[Formatting and Linting](#formatting-and-linting) for how to run these checks locally.
It's essential that we maintain great documentation and testing. If you:
- Fix a bug
- Add a relevant unit or integration test when possible. These live in `tests/unit_tests` and `tests/integration_tests`.
- Make an improvement
- Update any affected example notebooks and documentation. These live in `docs`.
- Update unit and integration tests when relevant.
- Add a feature
- Add a demo notebook in `docs/docs/`.
- Add unit and integration tests.
We are a small, progress-oriented team. If there's something you'd like to add or change, opening a pull request is the
best way to get our attention.
- [**Documentation**](https://python.langchain.com/docs/contributing/documentation): Help improve our docs, including this one!
- [**Code**](https://python.langchain.com/docs/contributing/code): Help us write code, fix bugs, or improve our infrastructure.
- [**Integrations**](https://python.langchain.com/docs/contributing/integration): Help us integrate with your favorite vendors and tools.
### 🚩GitHub Issues
@@ -54,291 +40,6 @@ In a similar vein, we do enforce certain linting, formatting, and documentation
If you are finding these difficult (or even just annoying) to work with, feel free to contact a maintainer for help -
we do not want these to get in the way of getting good code into the codebase.
## 🚀 Quick Start
### Contributor Documentation
This quick start guide explains how to run the repository locally.
For a [development container](https://containers.dev/), see the [.devcontainer folder](https://github.com/langchain-ai/langchain/tree/master/.devcontainer).
### Dependency Management: Poetry and other env/dependency managers
This project utilizes [Poetry](https://python-poetry.org/) v1.6.1+ as a dependency manager.
❗Note: *Before installing Poetry*, if you use `Conda`, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
Install Poetry: **[documentation on how to install it](https://python-poetry.org/docs/#installation)**.
❗Note: If you use `Conda` or `Pyenv` as your environment/package manager, after installing Poetry,
tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
### Core vs. Experimental
This repository contains three separate projects:
- `langchain`: core langchain code, abstractions, and use cases.
- `langchain_core`: contain interfaces for key abstractions as well as logic for combining them in chains (LCEL).
- `langchain_experimental`: see the [Experimental README](https://github.com/langchain-ai/langchain/tree/master/libs/experimental/README.md) for more information.
Each of these has its own development environment. Docs are run from the top-level makefile, but development
is split across separate test & release flows.
For this quickstart, start with langchain core:
```bash
cd libs/langchain
```
### Local Development Dependencies
Install langchain development requirements (for running langchain, running examples, linting, formatting, tests, and coverage):
```bash
poetry install --with test
```
Then verify dependency installation:
```bash
make test
```
If the tests don't pass, you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
If during installation you receive a `WheelFileValidationError` for `debugpy`, please make sure you are running
Poetry v1.6.1+. This bug was present in older versions of Poetry (e.g. 1.4.1) and has been resolved in newer releases.
If you are still seeing this bug on v1.6.1, you may also try disabling "modern installation"
(`poetry config installer.modern-installation false`) and re-installing requirements.
See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
### Testing
_some test dependencies are optional; see section about optional dependencies_.
Unit tests cover modular logic that does not require calls to outside APIs.
If you add new logic, please add a unit test.
To run unit tests:
```bash
make test
```
To run unit tests in Docker:
```bash
make docker_tests
```
There are also [integration tests and code-coverage](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests/README.md) available.
### Only develop langchain_core or langchain_experimental
If you are only developing `langchain_core` or `langchain_experimental`, you can simply install the dependencies for the respective projects and run tests:
```bash
cd libs/core
poetry install --with test
make test
```
Or:
```bash
cd libs/experimental
poetry install --with test
make test
```
### Formatting and Linting
Run these locally before submitting a PR; the CI system will check also.
#### Code Formatting
Formatting for this project is done via [ruff](https://docs.astral.sh/ruff/rules/).
To run formatting for docs, cookbook and templates:
```bash
make format
```
To run formatting for a library, run the same command from the relevant library directory:
```bash
cd libs/{LIBRARY}
make format
```
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
```bash
make format_diff
```
This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.
#### Linting
Linting for this project is done via a combination of [ruff](https://docs.astral.sh/ruff/rules/) and [mypy](http://mypy-lang.org/).
To run linting for docs, cookbook and templates:
```bash
make lint
```
To run linting for a library, run the same command from the relevant library directory:
```bash
cd libs/{LIBRARY}
make lint
```
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
```bash
make lint_diff
```
This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
#### Spellcheck
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
Note that `codespell` finds common typos, so it could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
To check spelling for this project:
```bash
make spell_check
```
To fix spelling in place:
```bash
make spell_fix
```
If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.
```python
[tool.codespell]
...
# Add here:
ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
```
## Working with Optional Dependencies
Langchain relies heavily on optional dependencies to keep the Langchain package lightweight.
You only need to add a new dependency if a **unit test** relies on the package.
If your package is only required for **integration tests**, then you can skip these
steps and leave all pyproject.toml and poetry.lock files alone.
If you're adding a new dependency to Langchain, assume that it will be an optional dependency, and
that most users won't have it installed.
Users who do not have the dependency installed should be able to **import** your code without
any side effects (no warnings, no errors, no exceptions).
To introduce the dependency to the pyproject.toml file correctly, please do the following:
1. Add the dependency to the main group as an optional dependency
```bash
poetry add --optional [package_name]
```
2. Open pyproject.toml and add the dependency to the `extended_testing` extra
3. Relock the poetry file to update the extra.
```bash
poetry lock --no-update
```
4. Add a unit test that the very least attempts to import the new code. Ideally, the unit
test makes use of lightweight fixtures to test the logic of the code.
5. Please use the `@pytest.mark.requires(package_name)` decorator for any tests that require the dependency.
## Adding a Jupyter Notebook
If you are adding a Jupyter Notebook example, you'll want to install the optional `dev` dependencies.
To install dev dependencies:
```bash
poetry install --with dev
```
Launch a notebook:
```bash
poetry run jupyter notebook
```
When you run `poetry install`, the `langchain` package is installed as editable in the virtualenv, so your new logic can be imported into the notebook.
## Documentation
While the code is split between `langchain` and `langchain.experimental`, the documentation is one holistic thing.
This covers how to get started contributing to documentation.
From the top-level of this repo, install documentation dependencies:
```bash
poetry install
```
### Contribute Documentation
The docs directory contains Documentation and API Reference.
Documentation is built using [Docusaurus 2](https://docusaurus.io/).
API Reference are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
For that reason, we ask that you add good documentation to all classes and methods.
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
### Build Documentation Locally
In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
Before building the documentation, it is always a good idea to clean the build directory:
```bash
make docs_clean
make api_docs_clean
```
Next, you can build the documentation as outlined below:
```bash
make docs_build
make api_docs_build
```
Finally, run the link checker to ensure all links are valid:
```bash
make docs_linkcheck
make api_docs_linkcheck
```
### Verify Documentation changes
After pushing documentation changes to the repository, you can preview and verify that the changes are
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.
This will take you to a preview of the documentation changes.
This preview is created by [Vercel](https://vercel.com/docs/getting-started-with-vercel).
## 🏭 Release Process
As of now, LangChain has an ad hoc release process: releases are cut with high frequency by
a developer and published to [PyPI](https://pypi.org/project/langchain/).
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
### 🌟 Recognition
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or through another means.
To learn about how to contribute, please follow the [guides here](https://python.langchain.com/docs/contributing/)

View File

@@ -27,4 +27,4 @@ body:
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md)
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the [Contributing Guide](https://python.langchain.com/docs/contributing/)

View File

@@ -1,20 +1,20 @@
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is whichever of langchain, community, core, experimental, etc. is being modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` from the root of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use. It lives in `docs/extras` directory.
2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17.
-->

View File

@@ -26,7 +26,7 @@ inputs:
runs:
using: composite
steps:
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
name: Setup python ${{ inputs.python-version }}
with:
python-version: ${{ inputs.python-version }}

View File

@@ -1,18 +1,22 @@
import json
import sys
import os
ALL_DIRS = {
LANGCHAIN_DIRS = {
"libs/core",
"libs/langchain",
"libs/experimental",
"libs/community",
"libs/partners/openai",
}
if __name__ == "__main__":
files = sys.argv[1:]
dirs_to_run = set()
if len(files) == 300:
# max diff length is 300 files - there are likely files missing
raise ValueError("Max diff reached. Please manually run CI on changed libs.")
for file in files:
if any(
file.startswith(dir_)
@@ -24,24 +28,29 @@ if __name__ == "__main__":
".github/scripts/check_diff.py",
)
):
dirs_to_run = ALL_DIRS
break
dirs_to_run.update(LANGCHAIN_DIRS)
elif "libs/community" in file:
dirs_to_run.update(
("libs/community", "libs/langchain", "libs/experimental")
)
elif "libs/partners" in file:
partner_dir = file.split("/")[2]
dirs_to_run.update(
(f"libs/partners/{partner_dir}", "libs/langchain", "libs/experimental")
)
if os.path.isdir(f"libs/partners/{partner_dir}"):
dirs_to_run.update(
(
f"libs/partners/{partner_dir}",
"libs/langchain",
"libs/experimental",
)
)
# Skip if the directory was deleted
elif "libs/langchain" in file:
dirs_to_run.update(("libs/langchain", "libs/experimental"))
elif "libs/experimental" in file:
dirs_to_run.add("libs/experimental")
elif file.startswith("libs/"):
dirs_to_run = ALL_DIRS
break
dirs_to_run.update(LANGCHAIN_DIRS)
else:
pass
print(json.dumps(list(dirs_to_run)))
json_output = json.dumps(list(dirs_to_run))
print(f"dirs-to-run={json_output}")

View File

@@ -18,6 +18,7 @@ on:
- libs/langchain
- libs/core
- libs/experimental
- libs/community
# If another push to the same PR or branch happens while this workflow is still running,
@@ -52,8 +53,8 @@ jobs:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
pydantic-compatibility:
uses: ./.github/workflows/_pydantic_compatibility.yml
dependencies:
uses: ./.github/workflows/_dependencies.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
@@ -71,6 +72,7 @@ jobs:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
if: ${{ ! startsWith(inputs.working-directory, 'libs/partners/') }}
steps:
- uses: actions/checkout@v4

View File

@@ -1,4 +1,4 @@
name: pydantic v1/v2 compatibility
name: dependencies
on:
workflow_call:
@@ -28,7 +28,7 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
name: Pydantic v1/v2 compatibility - Python ${{ matrix.python-version }}
name: dependencies - Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v4
@@ -42,7 +42,15 @@ jobs:
- name: Install dependencies
shell: bash
run: poetry install --with test
run: poetry install
- name: Check imports with base dependencies
shell: bash
run: poetry run make check_imports
- name: Install test dependencies
shell: bash
run: poetry install --with test
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}

67
.github/workflows/_integration_test.yml vendored Normal file
View File

@@ -0,0 +1,67 @@
name: Integration tests
on:
workflow_dispatch:
inputs:
working-directory:
required: true
type: string
env:
POETRY_VERSION: "1.6.1"
jobs:
build:
defaults:
run:
working-directory: ${{ inputs.working-directory }}
runs-on: ubuntu-latest
strategy:
matrix:
python-version:
- "3.8"
- "3.11"
name: Python ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
cache-key: core
- name: Install dependencies
shell: bash
run: poetry install --with test,test_integration
- name: 'Authenticate to Google Cloud'
id: 'auth'
uses: google-github-actions/auth@v2
with:
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
- name: Run integration tests
shell: bash
env:
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
make integration_tests
- name: Ensure the tests did not create any additional files
shell: bash
run: |
set -eu
STATUS="$(git status)"
echo "$STATUS"
# grep will exit non-zero if the target message isn't found,
# and `set -e` above will cause the step to fail.
echo "$STATUS" | grep 'nothing to commit, working tree clean'

View File

@@ -1,5 +1,5 @@
name: release
run-name: Release ${{ inputs.working-directory }} by @${{ github.actor }}
on:
workflow_call:
inputs:
@@ -7,6 +7,12 @@ on:
required: true
type: string
description: "From which folder this pipeline executes"
workflow_dispatch:
inputs:
working-directory:
required: true
type: string
default: 'libs/langchain'
env:
PYTHON_VERSION: "3.10"
@@ -76,6 +82,8 @@ jobs:
- test-pypi-publish
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
# We explicitly *don't* set up caching here. This ensures our tests are
# maximally sensitive to catching breakage.
#
@@ -88,12 +96,17 @@ jobs:
# - Tests pass, because the dependency is present even though it wasn't specified.
# - The package is published, and it breaks on the missing dependency when
# used in the real world.
- uses: actions/setup-python@v4
- name: Set up Python + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: ${{ inputs.working-directory }}
- name: Test published package
- name: Import published package
shell: bash
working-directory: ${{ inputs.working-directory }}
env:
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
VERSION: ${{ needs.build.outputs.version }}
@@ -105,9 +118,8 @@ jobs:
# (https://test.pypi.org/simple). This will include the PKG_NAME==VERSION
# package because VERSION will not have been uploaded to regular PyPI yet.
#
# TODO: add more in-depth pre-publish tests after testing that importing works
run: |
pip install \
poetry run pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION"
@@ -115,7 +127,51 @@ jobs:
# since that's how Python imports packages with dashes in the name.
IMPORT_NAME="$(echo "$PKG_NAME" | sed s/-/_/g)"
python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
poetry run python -c "import $IMPORT_NAME; print(dir($IMPORT_NAME))"
- name: Import test dependencies
run: poetry install --with test,test_integration
working-directory: ${{ inputs.working-directory }}
# Overwrite the local version of the package with the test PyPI version.
- name: Import published package (again)
working-directory: ${{ inputs.working-directory }}
shell: bash
env:
PKG_NAME: ${{ needs.build.outputs.pkg-name }}
VERSION: ${{ needs.build.outputs.version }}
run: |
poetry run pip install \
--extra-index-url https://test.pypi.org/simple/ \
"$PKG_NAME==$VERSION"
- name: Run unit tests
run: make tests
working-directory: ${{ inputs.working-directory }}
- name: 'Authenticate to Google Cloud'
id: 'auth'
uses: google-github-actions/auth@v2
with:
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
- name: Run integration tests
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
env:
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: make integration_tests
working-directory: ${{ inputs.working-directory }}
- name: Run unit tests with minimum dependency versions
if: ${{ (inputs.working-directory == 'libs/langchain') || (inputs.working-directory == 'libs/community') || (inputs.working-directory == 'libs/experimental') }}
run: |
poetry run pip install -r _test_minimum_requirements.txt
make tests
working-directory: ${{ inputs.working-directory }}
publish:
needs:

View File

@@ -5,11 +5,6 @@ on:
push:
branches: [master]
pull_request:
paths:
- ".github/actions/**"
- ".github/tools/**"
- ".github/workflows/**"
- "libs/**"
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
@@ -26,13 +21,14 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- id: files
uses: Ana06/get-changed-files@v2.2.0
- id: set-matrix
run: echo "dirs-to-run=$(python .github/scripts/check_diff.py ${{ steps.files.outputs.all }})" >> $GITHUB_OUTPUT
run: |
python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
outputs:
dirs-to-run: ${{ steps.set-matrix.outputs.dirs-to-run }}
ci:

View File

@@ -36,7 +36,7 @@ jobs:
- name: 'Authenticate to Google Cloud'
id: 'auth'
uses: 'google-github-actions/auth@v1'
uses: google-github-actions/auth@v2
with:
credentials_json: '${{ secrets.GOOGLE_CREDENTIALS }}'
@@ -52,13 +52,7 @@ jobs:
shell: bash
run: |
echo "Running scheduled tests, installing dependencies with poetry..."
poetry install --with=test_integration
poetry run pip install google-cloud-aiplatform
poetry run pip install "boto3>=1.28.57"
if [[ ${{ matrix.python-version }} != "3.8" ]]
then
poetry run pip install fireworks-ai
fi
poetry install --with=test_integration,test
- name: Run tests
shell: bash
@@ -68,7 +62,9 @@ jobs:
AZURE_OPENAI_API_VERSION: ${{ secrets.AZURE_OPENAI_API_VERSION }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_DEPLOYMENT_NAME }}
AZURE_OPENAI_CHAT_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_CHAT_DEPLOYMENT_NAME }}
AZURE_OPENAI_LLM_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_LLM_DEPLOYMENT_NAME }}
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME: ${{ secrets.AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
run: |
make scheduled_tests

View File

@@ -10,9 +10,10 @@ build:
tools:
python: "3.11"
commands:
- python -mvirtualenv $READTHEDOCS_VIRTUALENV_PATH
- python -m virtualenv $READTHEDOCS_VIRTUALENV_PATH
- python -m pip install --upgrade --no-cache-dir pip setuptools
- python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
- python -m pip install ./libs/partners/*
- python -m pip install --exists-action=w --no-cache-dir -r docs/api_reference/requirements.txt
- python docs/api_reference/create_api_rst.py
- cat docs/api_reference/conf.py

View File

@@ -1,9 +0,0 @@
"""Main entrypoint into package."""
from importlib import metadata
try:
__version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
# Case where package metadata is not available.
__version__ = ""
del metadata # optional, avoids polluting the results of dir(__package__)

View File

@@ -1,123 +0,0 @@
"""Agent toolkits contain integrations with various resources and services.
LangChain has a large ecosystem of integrations with various external resources
like local and remote file systems, APIs and databases.
These integrations allow developers to create versatile applications that combine the
power of LLMs with the ability to access, interact with and manipulate external
resources.
When developing an application, developers should inspect the capabilities and
permissions of the tools that underlie the given agent toolkit, and determine
whether permissions of the given toolkit are appropriate for the application.
See [Security](https://python.langchain.com/docs/security) for more information.
"""
from pathlib import Path
from typing import Any
from langchain_core._api.path import as_import_path
from langchain_community.agent_toolkits.ainetwork.toolkit import AINetworkToolkit
from langchain_community.agent_toolkits.amadeus.toolkit import AmadeusToolkit
from langchain_community.agent_toolkits.azure_cognitive_services import (
AzureCognitiveServicesToolkit,
)
from langchain_community.agent_toolkits.conversational_retrieval.openai_functions import ( # noqa: E501
create_conversational_retrieval_agent,
)
from langchain_community.agent_toolkits.file_management.toolkit import (
FileManagementToolkit,
)
from langchain_community.agent_toolkits.gmail.toolkit import GmailToolkit
from langchain_community.agent_toolkits.jira.toolkit import JiraToolkit
from langchain_community.agent_toolkits.json.base import create_json_agent
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
from langchain_community.agent_toolkits.multion.toolkit import MultionToolkit
from langchain_community.agent_toolkits.nasa.toolkit import NasaToolkit
from langchain_community.agent_toolkits.nla.toolkit import NLAToolkit
from langchain_community.agent_toolkits.office365.toolkit import O365Toolkit
from langchain_community.agent_toolkits.openapi.base import create_openapi_agent
from langchain_community.agent_toolkits.openapi.toolkit import OpenAPIToolkit
from langchain_community.agent_toolkits.playwright.toolkit import (
PlayWrightBrowserToolkit,
)
from langchain_community.agent_toolkits.powerbi.base import create_pbi_agent
from langchain_community.agent_toolkits.powerbi.chat_base import create_pbi_chat_agent
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
from langchain_community.agent_toolkits.slack.toolkit import SlackToolkit
from langchain_community.agent_toolkits.spark_sql.base import create_spark_sql_agent
from langchain_community.agent_toolkits.spark_sql.toolkit import SparkSQLToolkit
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.agent_toolkits.steam.toolkit import SteamToolkit
from langchain_community.agent_toolkits.vectorstore.base import (
create_vectorstore_agent,
create_vectorstore_router_agent,
)
from langchain_community.agent_toolkits.vectorstore.toolkit import (
VectorStoreInfo,
VectorStoreRouterToolkit,
VectorStoreToolkit,
)
from langchain_community.agent_toolkits.zapier.toolkit import ZapierToolkit
from langchain_community.tools.retriever import create_retriever_tool
DEPRECATED_AGENTS = [
"create_csv_agent",
"create_pandas_dataframe_agent",
"create_xorbits_agent",
"create_python_agent",
"create_spark_dataframe_agent",
]
def __getattr__(name: str) -> Any:
"""Get attr name."""
if name in DEPRECATED_AGENTS:
relative_path = as_import_path(Path(__file__).parent, suffix=name)
old_path = "langchain." + relative_path
new_path = "langchain_experimental." + relative_path
raise ImportError(
f"{name} has been moved to langchain experimental. "
"See https://github.com/langchain-ai/langchain/discussions/11680"
"for more information.\n"
f"Please update your import statement from: `{old_path}` to `{new_path}`."
)
raise AttributeError(f"{name} does not exist")
__all__ = [
"AINetworkToolkit",
"AmadeusToolkit",
"AzureCognitiveServicesToolkit",
"FileManagementToolkit",
"GmailToolkit",
"JiraToolkit",
"JsonToolkit",
"MultionToolkit",
"NasaToolkit",
"NLAToolkit",
"O365Toolkit",
"OpenAPIToolkit",
"PlayWrightBrowserToolkit",
"PowerBIToolkit",
"SlackToolkit",
"SteamToolkit",
"SQLDatabaseToolkit",
"SparkSQLToolkit",
"VectorStoreInfo",
"VectorStoreRouterToolkit",
"VectorStoreToolkit",
"ZapierToolkit",
"create_json_agent",
"create_openapi_agent",
"create_pbi_agent",
"create_pbi_chat_agent",
"create_spark_sql_agent",
"create_sql_agent",
"create_vectorstore_agent",
"create_vectorstore_router_agent",
"create_conversational_retrieval_agent",
"create_retriever_tool",
]

View File

@@ -1,88 +0,0 @@
from __future__ import annotations
from typing import Any, List, Optional, TYPE_CHECKING
from langchain_core.language_models import BaseLanguageModel
from langchain_core.memory import BaseMemory
from langchain_core.messages import SystemMessage
from langchain_core.prompts.chat import MessagesPlaceholder
from langchain_core.tools import BaseTool
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def _get_default_system_message() -> SystemMessage:
return SystemMessage(
content=(
"Do your best to answer the questions. "
"Feel free to use any tools available to look up "
"relevant information, only if necessary"
)
)
def create_conversational_retrieval_agent(
llm: BaseLanguageModel,
tools: List[BaseTool],
remember_intermediate_steps: bool = True,
memory_key: str = "chat_history",
system_message: Optional[SystemMessage] = None,
verbose: bool = False,
max_token_limit: int = 2000,
**kwargs: Any,
) -> AgentExecutor:
"""A convenience method for creating a conversational retrieval agent.
Args:
llm: The language model to use, should be ChatOpenAI
tools: A list of tools the agent has access to
remember_intermediate_steps: Whether the agent should remember intermediate
steps or not. Intermediate steps refer to prior action/observation
pairs from previous questions. The benefit of remembering these is if
there is relevant information in there, the agent can use it to answer
follow up questions. The downside is it will take up more tokens.
memory_key: The name of the memory key in the prompt.
system_message: The system message to use. By default, a basic one will
be used.
verbose: Whether or not the final AgentExecutor should be verbose or not,
defaults to False.
max_token_limit: The max number of tokens to keep around in memory.
Defaults to 2000.
Returns:
An agent executor initialized appropriately
"""
from langchain.agents.agent import AgentExecutor
from langchain.agents.openai_functions_agent.agent_token_buffer_memory import (
AgentTokenBufferMemory,
)
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
from langchain.memory.token_buffer import ConversationTokenBufferMemory
if remember_intermediate_steps:
memory: BaseMemory = AgentTokenBufferMemory(
memory_key=memory_key, llm=llm, max_token_limit=max_token_limit
)
else:
memory = ConversationTokenBufferMemory(
memory_key=memory_key,
return_messages=True,
output_key="output",
llm=llm,
max_token_limit=max_token_limit,
)
_system_message = system_message or _get_default_system_message()
prompt = OpenAIFunctionsAgent.create_prompt(
system_message=_system_message,
extra_prompt_messages=[MessagesPlaceholder(variable_name=memory_key)],
)
agent = OpenAIFunctionsAgent(llm=llm, tools=tools, prompt=prompt)
return AgentExecutor(
agent=agent,
tools=tools,
memory=memory,
verbose=verbose,
return_intermediate_steps=remember_intermediate_steps,
**kwargs,
)

View File

@@ -1,53 +0,0 @@
"""Json agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.json.prompt import JSON_PREFIX, JSON_SUFFIX
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_json_agent(
llm: BaseLanguageModel,
toolkit: JsonToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = JSON_PREFIX,
suffix: str = JSON_SUFFIX,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a json agent from an LLM and tools."""
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=input_variables,
**prompt_params,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,57 +0,0 @@
"""Tool for interacting with a single API with natural language definition."""
from __future__ import annotations
from typing import Any, Optional, TYPE_CHECKING
from langchain_core.language_models import BaseLanguageModel
from langchain_core.tools import Tool
from langchain_community.tools.openapi.utils.api_models import APIOperation
from langchain_community.tools.openapi.utils.openapi_utils import OpenAPISpec
from langchain_community.utilities.requests import Requests
if TYPE_CHECKING:
from langchain.chains.api.openapi.chain import OpenAPIEndpointChain
class NLATool(Tool):
"""Natural Language API Tool."""
@classmethod
def from_open_api_endpoint_chain(
cls, chain: OpenAPIEndpointChain, api_title: str
) -> "NLATool":
"""Convert an endpoint chain to an API endpoint tool."""
expanded_name = (
f'{api_title.replace(" ", "_")}.{chain.api_operation.operation_id}'
)
description = (
f"I'm an AI from {api_title}. Instruct what you want,"
" and I'll assist via an API with description:"
f" {chain.api_operation.description}"
)
return cls(name=expanded_name, func=chain.run, description=description)
@classmethod
def from_llm_and_method(
cls,
llm: BaseLanguageModel,
path: str,
method: str,
spec: OpenAPISpec,
requests: Optional[Requests] = None,
verbose: bool = False,
return_intermediate_steps: bool = False,
**kwargs: Any,
) -> "NLATool":
"""Instantiate the tool from the specified path and method."""
api_operation = APIOperation.from_openapi_spec(spec, path, method)
chain = OpenAPIEndpointChain.from_api_operation(
api_operation,
llm,
requests=requests,
verbose=verbose,
return_intermediate_steps=return_intermediate_steps,
**kwargs,
)
return cls.from_open_api_endpoint_chain(chain, spec.info.title)

View File

@@ -1,77 +0,0 @@
"""OpenAPI spec agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.openapi.prompt import (
OPENAPI_PREFIX,
OPENAPI_SUFFIX,
)
from langchain_community.agent_toolkits.openapi.toolkit import OpenAPIToolkit
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_openapi_agent(
llm: BaseLanguageModel,
toolkit: OpenAPIToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = OPENAPI_PREFIX,
suffix: str = OPENAPI_SUFFIX,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
return_intermediate_steps: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct an OpenAPI agent from an LLM and tools.
*Security Note*: When creating an OpenAPI agent, check the permissions
and capabilities of the underlying toolkit.
For example, if the default implementation of OpenAPIToolkit
uses the RequestsToolkit which contains tools to make arbitrary
network requests against any URL (e.g., GET, POST, PATCH, PUT, DELETE),
Control access to who can submit issue requests using this toolkit and
what network access it has.
See https://python.langchain.com/docs/security for more information.
"""
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=input_variables,
**prompt_params
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
return_intermediate_steps=return_intermediate_steps,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,366 +0,0 @@
"""Agent that interacts with OpenAPI APIs via a hierarchical planning approach."""
import json
import re
from functools import partial
from typing import Any, Callable, Dict, List, Optional, TYPE_CHECKING
import yaml
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts import BasePromptTemplate, PromptTemplate
from langchain_core.pydantic_v1 import Field
from langchain_core.tools import BaseTool, Tool
from langchain_openai.llms import OpenAI
from langchain_community.agent_toolkits.openapi.planner_prompt import (
API_CONTROLLER_PROMPT,
API_CONTROLLER_TOOL_DESCRIPTION,
API_CONTROLLER_TOOL_NAME,
API_ORCHESTRATOR_PROMPT,
API_PLANNER_PROMPT,
API_PLANNER_TOOL_DESCRIPTION,
API_PLANNER_TOOL_NAME,
PARSING_DELETE_PROMPT,
PARSING_GET_PROMPT,
PARSING_PATCH_PROMPT,
PARSING_POST_PROMPT,
PARSING_PUT_PROMPT,
REQUESTS_DELETE_TOOL_DESCRIPTION,
REQUESTS_GET_TOOL_DESCRIPTION,
REQUESTS_PATCH_TOOL_DESCRIPTION,
REQUESTS_POST_TOOL_DESCRIPTION,
REQUESTS_PUT_TOOL_DESCRIPTION,
)
from langchain_community.agent_toolkits.openapi.spec import ReducedOpenAPISpec
from langchain_community.output_parsers.json import parse_json_markdown
from langchain_community.tools.requests.tool import BaseRequestsTool
from langchain_community.utilities.requests import RequestsWrapper
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
from langchain.chains.llm import LLMChain
from langchain.memory import ReadOnlySharedMemory
#
# Requests tools with LLM-instructed extraction of truncated responses.
#
# Of course, truncating so bluntly may lose a lot of valuable
# information in the response.
# However, the goal for now is to have only a single inference step.
MAX_RESPONSE_LENGTH = 5000
"""Maximum length of the response to be returned."""
def _get_default_llm_chain(prompt: BasePromptTemplate) -> LLMChain:
from langchain.chains.llm import LLMChain
return LLMChain(
llm=OpenAI(),
prompt=prompt,
)
def _get_default_llm_chain_factory(
prompt: BasePromptTemplate,
) -> Callable[[], LLMChain]:
"""Returns a default LLMChain factory."""
return partial(_get_default_llm_chain, prompt)
class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests GET tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_get"
"""Tool name."""
description = REQUESTS_GET_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_GET_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
data_params = data.get("params")
response = self.requests_wrapper.get(data["url"], params=data_params)
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsPostToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests POST tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_post"
"""Tool name."""
description = REQUESTS_POST_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_POST_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.post(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsPatchToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests PATCH tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_patch"
"""Tool name."""
description = REQUESTS_PATCH_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_PATCH_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.patch(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsPutToolWithParsing(BaseRequestsTool, BaseTool):
"""Requests PUT tool with LLM-instructed extraction of truncated responses."""
name: str = "requests_put"
"""Tool name."""
description = REQUESTS_PUT_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_PUT_PROMPT)
)
"""LLMChain used to extract the response."""
def _run(self, text: str) -> str:
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.put(data["url"], data["data"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
class RequestsDeleteToolWithParsing(BaseRequestsTool, BaseTool):
"""A tool that sends a DELETE request and parses the response."""
name: str = "requests_delete"
"""The name of the tool."""
description = REQUESTS_DELETE_TOOL_DESCRIPTION
"""The description of the tool."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
"""The maximum length of the response."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_DELETE_PROMPT)
)
"""The LLM chain used to parse the response."""
def _run(self, text: str) -> str:
try:
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.delete(data["url"])
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
).strip()
async def _arun(self, text: str) -> str:
raise NotImplementedError()
#
# Orchestrator, planner, controller.
#
def _create_api_planner_tool(
api_spec: ReducedOpenAPISpec, llm: BaseLanguageModel
) -> Tool:
from langchain.chains.llm import LLMChain
endpoint_descriptions = [
f"{name} {description}" for name, description, _ in api_spec.endpoints
]
prompt = PromptTemplate(
template=API_PLANNER_PROMPT,
input_variables=["query"],
partial_variables={"endpoints": "- " + "- ".join(endpoint_descriptions)},
)
chain = LLMChain(llm=llm, prompt=prompt)
tool = Tool(
name=API_PLANNER_TOOL_NAME,
description=API_PLANNER_TOOL_DESCRIPTION,
func=chain.run,
)
return tool
def _create_api_controller_agent(
api_url: str,
api_docs: str,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
) -> AgentExecutor:
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.agent import AgentExecutor
from langchain.chains.llm import LLMChain
get_llm_chain = LLMChain(llm=llm, prompt=PARSING_GET_PROMPT)
post_llm_chain = LLMChain(llm=llm, prompt=PARSING_POST_PROMPT)
tools: List[BaseTool] = [
RequestsGetToolWithParsing(
requests_wrapper=requests_wrapper, llm_chain=get_llm_chain
),
RequestsPostToolWithParsing(
requests_wrapper=requests_wrapper, llm_chain=post_llm_chain
),
]
prompt = PromptTemplate(
template=API_CONTROLLER_PROMPT,
input_variables=["input", "agent_scratchpad"],
partial_variables={
"api_url": api_url,
"api_docs": api_docs,
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
agent = ZeroShotAgent(
llm_chain=LLMChain(llm=llm, prompt=prompt),
allowed_tools=[tool.name for tool in tools],
)
return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
def _create_api_controller_tool(
api_spec: ReducedOpenAPISpec,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
) -> Tool:
"""Expose controller as a tool.
The tool is invoked with a plan from the planner, and dynamically
creates a controller agent with relevant documentation only to
constrain the context.
"""
base_url = api_spec.servers[0]["url"] # TODO: do better.
def _create_and_run_api_controller_agent(plan_str: str) -> str:
pattern = r"\b(GET|POST|PATCH|DELETE)\s+(/\S+)*"
matches = re.findall(pattern, plan_str)
endpoint_names = [
"{method} {route}".format(method=method, route=route.split("?")[0])
for method, route in matches
]
docs_str = ""
for endpoint_name in endpoint_names:
found_match = False
for name, _, docs in api_spec.endpoints:
regex_name = re.compile(re.sub("\{.*?\}", ".*", name))
if regex_name.match(endpoint_name):
found_match = True
docs_str += f"== Docs for {endpoint_name} == \n{yaml.dump(docs)}\n"
if not found_match:
raise ValueError(f"{endpoint_name} endpoint does not exist.")
agent = _create_api_controller_agent(base_url, docs_str, requests_wrapper, llm)
return agent.run(plan_str)
return Tool(
name=API_CONTROLLER_TOOL_NAME,
func=_create_and_run_api_controller_agent,
description=API_CONTROLLER_TOOL_DESCRIPTION,
)
def create_openapi_agent(
api_spec: ReducedOpenAPISpec,
requests_wrapper: RequestsWrapper,
llm: BaseLanguageModel,
shared_memory: Optional[ReadOnlySharedMemory] = None,
callback_manager: Optional[BaseCallbackManager] = None,
verbose: bool = True,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Instantiate OpenAI API planner and controller for a given spec.
Inject credentials via requests_wrapper.
We use a top-level "orchestrator" agent to invoke the planner and controller,
rather than a top-level planner
that invokes a controller with its plan. This is to keep the planner simple.
"""
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.agent import AgentExecutor
from langchain.chains.llm import LLMChain
tools = [
_create_api_planner_tool(api_spec, llm),
_create_api_controller_tool(api_spec, requests_wrapper, llm),
]
prompt = PromptTemplate(
template=API_ORCHESTRATOR_PROMPT,
input_variables=["input", "agent_scratchpad"],
partial_variables={
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
agent = ZeroShotAgent(
llm_chain=LLMChain(llm=llm, prompt=prompt, memory=shared_memory),
allowed_tools=[tool.name for tool in tools],
**kwargs,
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,90 +0,0 @@
"""Requests toolkit."""
from __future__ import annotations
from typing import Any, List
from langchain_core.language_models import BaseLanguageModel
from langchain_core.tools import Tool
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.agent_toolkits.json.base import create_json_agent
from langchain_community.agent_toolkits.json.toolkit import JsonToolkit
from langchain_community.agent_toolkits.openapi.prompt import DESCRIPTION
from langchain_community.tools import BaseTool
from langchain_community.tools.json.tool import JsonSpec
from langchain_community.tools.requests.tool import (
RequestsDeleteTool,
RequestsGetTool,
RequestsPatchTool,
RequestsPostTool,
RequestsPutTool,
)
from langchain_community.utilities.requests import TextRequestsWrapper
class RequestsToolkit(BaseToolkit):
"""Toolkit for making REST requests.
*Security Note*: This toolkit contains tools to make GET, POST, PATCH, PUT,
and DELETE requests to an API.
Exercise care in who is allowed to use this toolkit. If exposing
to end users, consider that users will be able to make arbitrary
requests on behalf of the server hosting the code. For example,
users could ask the server to make a request to a private API
that is only accessible from the server.
Control access to who can submit issue requests using this toolkit and
what network access it has.
See https://python.langchain.com/docs/security for more information.
"""
requests_wrapper: TextRequestsWrapper
def get_tools(self) -> List[BaseTool]:
"""Return a list of tools."""
return [
RequestsGetTool(requests_wrapper=self.requests_wrapper),
RequestsPostTool(requests_wrapper=self.requests_wrapper),
RequestsPatchTool(requests_wrapper=self.requests_wrapper),
RequestsPutTool(requests_wrapper=self.requests_wrapper),
RequestsDeleteTool(requests_wrapper=self.requests_wrapper),
]
class OpenAPIToolkit(BaseToolkit):
"""Toolkit for interacting with an OpenAPI API.
*Security Note*: This toolkit contains tools that can read and modify
the state of a service; e.g., by creating, deleting, or updating,
reading underlying data.
For example, this toolkit can be used to delete data exposed via
an OpenAPI compliant API.
"""
json_agent: Any
requests_wrapper: TextRequestsWrapper
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
json_agent_tool = Tool(
name="json_explorer",
func=self.json_agent.run,
description=DESCRIPTION,
)
request_toolkit = RequestsToolkit(requests_wrapper=self.requests_wrapper)
return [*request_toolkit.get_tools(), json_agent_tool]
@classmethod
def from_llm(
cls,
llm: BaseLanguageModel,
json_spec: JsonSpec,
requests_wrapper: TextRequestsWrapper,
**kwargs: Any,
) -> OpenAPIToolkit:
"""Create json agent from llm, then initialize."""
json_agent = create_json_agent(llm, JsonToolkit(spec=json_spec), **kwargs)
return cls(json_agent=json_agent, requests_wrapper=requests_wrapper)

View File

@@ -1,68 +0,0 @@
"""Power BI agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.powerbi.prompt import (
POWERBI_PREFIX,
POWERBI_SUFFIX,
)
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
from langchain_community.utilities.powerbi import PowerBIDataset
if TYPE_CHECKING:
from langchain.agents import AgentExecutor
def create_pbi_agent(
llm: BaseLanguageModel,
toolkit: Optional[PowerBIToolkit] = None,
powerbi: Optional[PowerBIDataset] = None,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = POWERBI_PREFIX,
suffix: str = POWERBI_SUFFIX,
format_instructions: Optional[str] = None,
examples: Optional[str] = None,
input_variables: Optional[List[str]] = None,
top_k: int = 10,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a Power BI agent from an LLM and tools."""
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents import AgentExecutor
from langchain.chains.llm import LLMChain
if toolkit is None:
if powerbi is None:
raise ValueError("Must provide either a toolkit or powerbi dataset")
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
tools = toolkit.get_tools()
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
agent = ZeroShotAgent(
llm_chain=LLMChain(
llm=llm,
prompt=ZeroShotAgent.create_prompt(
tools,
prefix=prefix.format(top_k=top_k).format(tables=tables),
suffix=suffix,
input_variables=input_variables,
**prompt_params,
),
callback_manager=callback_manager, # type: ignore
verbose=verbose,
),
allowed_tools=[tool.name for tool in tools],
**kwargs,
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,69 +0,0 @@
"""Power BI agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_community.agent_toolkits.powerbi.prompt import (
POWERBI_CHAT_PREFIX,
POWERBI_CHAT_SUFFIX,
)
from langchain_community.agent_toolkits.powerbi.toolkit import PowerBIToolkit
from langchain_community.utilities.powerbi import PowerBIDataset
if TYPE_CHECKING:
from langchain.agents import AgentExecutor
from langchain.agents.agent import AgentOutputParser
from langchain.memory.chat_memory import BaseChatMemory
def create_pbi_chat_agent(
llm: BaseChatModel,
toolkit: Optional[PowerBIToolkit] = None,
powerbi: Optional[PowerBIDataset] = None,
callback_manager: Optional[BaseCallbackManager] = None,
output_parser: Optional[AgentOutputParser] = None,
prefix: str = POWERBI_CHAT_PREFIX,
suffix: str = POWERBI_CHAT_SUFFIX,
examples: Optional[str] = None,
input_variables: Optional[List[str]] = None,
memory: Optional[BaseChatMemory] = None,
top_k: int = 10,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a Power BI agent from a Chat LLM and tools.
If you supply only a toolkit and no Power BI dataset, the same LLM is used for both.
"""
from langchain.agents import AgentExecutor
from langchain.agents.conversational_chat.base import ConversationalChatAgent
from langchain.memory import ConversationBufferMemory
if toolkit is None:
if powerbi is None:
raise ValueError("Must provide either a toolkit or powerbi dataset")
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
tools = toolkit.get_tools()
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
agent = ConversationalChatAgent.from_llm_and_tools(
llm=llm,
tools=tools,
system_message=prefix.format(top_k=top_k).format(tables=tables),
human_message=suffix,
input_variables=input_variables,
callback_manager=callback_manager,
output_parser=output_parser,
verbose=verbose,
**kwargs,
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
memory=memory
or ConversationBufferMemory(memory_key="chat_history", return_messages=True),
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,106 +0,0 @@
"""Toolkit for interacting with a Power BI dataset."""
from __future__ import annotations
from typing import List, Optional, Union, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.prompts import PromptTemplate
from langchain_core.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain_core.pydantic_v1 import Field
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.tools import BaseTool
from langchain_community.tools.powerbi.prompt import (
QUESTION_TO_QUERY_BASE,
SINGLE_QUESTION_TO_QUERY,
USER_INPUT,
)
from langchain_community.tools.powerbi.tool import (
InfoPowerBITool,
ListPowerBITool,
QueryPowerBITool,
)
from langchain_community.utilities.powerbi import PowerBIDataset
if TYPE_CHECKING:
from langchain.chains.llm import LLMChain
class PowerBIToolkit(BaseToolkit):
"""Toolkit for interacting with Power BI dataset.
*Security Note*: This toolkit interacts with an external service.
Control access to who can use this toolkit.
Make sure that the capabilities given by this toolkit to the calling
code are appropriately scoped to the application.
See https://python.langchain.com/docs/security for more information.
"""
powerbi: PowerBIDataset = Field(exclude=True)
llm: Union[BaseLanguageModel, BaseChatModel] = Field(exclude=True)
examples: Optional[str] = None
max_iterations: int = 5
callback_manager: Optional[BaseCallbackManager] = None
output_token_limit: Optional[int] = None
tiktoken_model_name: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
return [
QueryPowerBITool(
llm_chain=self._get_chain(),
powerbi=self.powerbi,
examples=self.examples,
max_iterations=self.max_iterations,
output_token_limit=self.output_token_limit,
tiktoken_model_name=self.tiktoken_model_name,
),
InfoPowerBITool(powerbi=self.powerbi),
ListPowerBITool(powerbi=self.powerbi),
]
def _get_chain(self) -> LLMChain:
"""Construct the chain based on the callback manager and model type."""
from langchain.chains.llm import LLMChain
if isinstance(self.llm, BaseLanguageModel):
return LLMChain(
llm=self.llm,
callback_manager=self.callback_manager
if self.callback_manager
else None,
prompt=PromptTemplate(
template=SINGLE_QUESTION_TO_QUERY,
input_variables=["tool_input", "tables", "schemas", "examples"],
),
)
system_prompt = SystemMessagePromptTemplate(
prompt=PromptTemplate(
template=QUESTION_TO_QUERY_BASE,
input_variables=["tables", "schemas", "examples"],
)
)
human_prompt = HumanMessagePromptTemplate(
prompt=PromptTemplate(
template=USER_INPUT,
input_variables=["tool_input"],
)
)
return LLMChain(
llm=self.llm,
callback_manager=self.callback_manager if self.callback_manager else None,
prompt=ChatPromptTemplate.from_messages([system_prompt, human_prompt]),
)

View File

@@ -1,64 +0,0 @@
"""Spark SQL agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager, Callbacks
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.spark_sql.prompt import SQL_PREFIX, SQL_SUFFIX
from langchain_community.agent_toolkits.spark_sql.toolkit import SparkSQLToolkit
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_spark_sql_agent(
llm: BaseLanguageModel,
toolkit: SparkSQLToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
callbacks: Callbacks = None,
prefix: str = SQL_PREFIX,
suffix: str = SQL_SUFFIX,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
top_k: int = 10,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a Spark SQL agent from an LLM and tools."""
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prefix = prefix.format(top_k=top_k)
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=input_variables,
**prompt_params,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
callbacks=callbacks,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
callbacks=callbacks,
verbose=verbose,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,102 +0,0 @@
"""SQL agent."""
from __future__ import annotations
from typing import Any, Dict, List, Optional, Sequence, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_core.messages import AIMessage, SystemMessage
from langchain_core.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
)
from langchain_community.agent_toolkits.sql.prompt import (
SQL_FUNCTIONS_SUFFIX,
SQL_PREFIX,
SQL_SUFFIX,
)
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.tools import BaseTool
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
from langchain.agents.agent_types import AgentType
def create_sql_agent(
llm: BaseLanguageModel,
toolkit: SQLDatabaseToolkit,
agent_type: Optional[AgentType] = None,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = SQL_PREFIX,
suffix: Optional[str] = None,
format_instructions: Optional[str] = None,
input_variables: Optional[List[str]] = None,
top_k: int = 10,
max_iterations: Optional[int] = 15,
max_execution_time: Optional[float] = None,
early_stopping_method: str = "force",
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
extra_tools: Sequence[BaseTool] = (),
**kwargs: Any,
) -> AgentExecutor:
"""Construct an SQL agent from an LLM and tools."""
from langchain.agents.agent import AgentExecutor, BaseSingleActionAgent
from langchain.agents.agent_types import AgentType
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
from langchain.chains.llm import LLMChain
agent_type = agent_type or AgentType.ZERO_SHOT_REACT_DESCRIPTION
tools = toolkit.get_tools() + list(extra_tools)
prefix = prefix.format(dialect=toolkit.dialect, top_k=top_k)
agent: BaseSingleActionAgent
if agent_type == AgentType.ZERO_SHOT_REACT_DESCRIPTION:
prompt_params = {"format_instructions": format_instructions} if format_instructions is not None else {}
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix or SQL_SUFFIX,
input_variables=input_variables,
**prompt_params,
)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
elif agent_type == AgentType.OPENAI_FUNCTIONS:
messages = [
SystemMessage(content=prefix),
HumanMessagePromptTemplate.from_template("{input}"),
AIMessage(content=suffix or SQL_FUNCTIONS_SUFFIX),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
input_variables = ["input", "agent_scratchpad"]
_prompt = ChatPromptTemplate(input_variables=input_variables, messages=messages)
agent = OpenAIFunctionsAgent(
llm=llm,
prompt=_prompt,
tools=tools,
callback_manager=callback_manager,
**kwargs,
)
else:
raise ValueError(f"Agent type {agent_type} not supported at the moment.")
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
max_iterations=max_iterations,
max_execution_time=max_execution_time,
early_stopping_method=early_stopping_method,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,103 +0,0 @@
"""VectorStore agent."""
from __future__ import annotations
from typing import Any, Dict, Optional, TYPE_CHECKING
from langchain_core.callbacks import BaseCallbackManager
from langchain_core.language_models import BaseLanguageModel
from langchain_community.agent_toolkits.vectorstore.prompt import PREFIX, ROUTER_PREFIX
from langchain_community.agent_toolkits.vectorstore.toolkit import (
VectorStoreRouterToolkit,
VectorStoreToolkit,
)
if TYPE_CHECKING:
from langchain.agents.agent import AgentExecutor
def create_vectorstore_agent(
llm: BaseLanguageModel,
toolkit: VectorStoreToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = PREFIX,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a VectorStore agent from an LLM and tools.
Args:
llm (BaseLanguageModel): LLM that will be used by the agent
toolkit (VectorStoreToolkit): Set of tools for the agent
callback_manager (Optional[BaseCallbackManager], optional): Object to handle the callback [ Defaults to None. ]
prefix (str, optional): The prefix prompt for the agent. If not provided uses default PREFIX.
verbose (bool, optional): If you want to see the content of the scratchpad. [ Defaults to False ]
agent_executor_kwargs (Optional[Dict[str, Any]], optional): If there is any other parameter you want to send to the agent. [ Defaults to None ]
**kwargs: Additional named parameters to pass to the ZeroShotAgent.
Returns:
AgentExecutor: Returns a callable AgentExecutor object. Either you can call it or use run method with the query to get the response
""" # noqa: E501
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)
def create_vectorstore_router_agent(
llm: BaseLanguageModel,
toolkit: VectorStoreRouterToolkit,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = ROUTER_PREFIX,
verbose: bool = False,
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Construct a VectorStore router agent from an LLM and tools.
Args:
llm (BaseLanguageModel): LLM that will be used by the agent
toolkit (VectorStoreRouterToolkit): Set of tools for the agent which have routing capability with multiple vector stores
callback_manager (Optional[BaseCallbackManager], optional): Object to handle the callback [ Defaults to None. ]
prefix (str, optional): The prefix prompt for the router agent. If not provided uses default ROUTER_PREFIX.
verbose (bool, optional): If you want to see the content of the scratchpad. [ Defaults to False ]
agent_executor_kwargs (Optional[Dict[str, Any]], optional): If there is any other parameter you want to send to the agent. [ Defaults to None ]
**kwargs: Additional named parameters to pass to the ZeroShotAgent.
Returns:
AgentExecutor: Returns a callable AgentExecutor object. Either you can call it or use run method with the query to get the response.
""" # noqa: E501
from langchain.agents.agent import AgentExecutor
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.chains.llm import LLMChain
tools = toolkit.get_tools()
prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
callback_manager=callback_manager,
)
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
callback_manager=callback_manager,
verbose=verbose,
**(agent_executor_kwargs or {}),
)

View File

@@ -1,83 +0,0 @@
"""**Callback handlers** allow listening to events in LangChain.
**Class hierarchy:**
.. code-block::
BaseCallbackHandler --> <name>CallbackHandler # Example: AimCallbackHandler
"""
from langchain_core.callbacks import (
StdOutCallbackHandler,
StreamingStdOutCallbackHandler,
)
from langchain_core.tracers.langchain import LangChainTracer
from langchain_community.callbacks.aim_callback import AimCallbackHandler
from langchain_community.callbacks.argilla_callback import ArgillaCallbackHandler
from langchain_community.callbacks.arize_callback import ArizeCallbackHandler
from langchain_community.callbacks.arthur_callback import ArthurCallbackHandler
from langchain_community.callbacks.clearml_callback import ClearMLCallbackHandler
from langchain_community.callbacks.comet_ml_callback import CometCallbackHandler
from langchain_community.callbacks.context_callback import ContextCallbackHandler
from langchain_community.callbacks.file import FileCallbackHandler
from langchain_community.callbacks.flyte_callback import FlyteCallbackHandler
from langchain_community.callbacks.human import HumanApprovalCallbackHandler
from langchain_community.callbacks.infino_callback import InfinoCallbackHandler
from langchain_community.callbacks.labelstudio_callback import (
LabelStudioCallbackHandler,
)
from langchain_community.callbacks.llmonitor_callback import LLMonitorCallbackHandler
from langchain_community.callbacks.manager import (
get_openai_callback,
wandb_tracing_enabled,
)
from langchain_community.callbacks.mlflow_callback import MlflowCallbackHandler
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
from langchain_community.callbacks.promptlayer_callback import (
PromptLayerCallbackHandler,
)
from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
from langchain_community.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain_community.callbacks.streaming_stdout_final_only import (
FinalStreamingStdOutCallbackHandler,
)
from langchain_community.callbacks.streamlit import (
LLMThoughtLabeler,
StreamlitCallbackHandler,
)
from langchain_community.callbacks.trubrics_callback import TrubricsCallbackHandler
from langchain_community.callbacks.wandb_callback import WandbCallbackHandler
from langchain_community.callbacks.whylabs_callback import WhyLabsCallbackHandler
__all__ = [
"AimCallbackHandler",
"ArgillaCallbackHandler",
"ArizeCallbackHandler",
"PromptLayerCallbackHandler",
"ArthurCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"ContextCallbackHandler",
"FileCallbackHandler",
"HumanApprovalCallbackHandler",
"InfinoCallbackHandler",
"MlflowCallbackHandler",
"LLMonitorCallbackHandler",
"OpenAICallbackHandler",
"StdOutCallbackHandler",
"AsyncIteratorCallbackHandler",
"StreamingStdOutCallbackHandler",
"FinalStreamingStdOutCallbackHandler",
"LLMThoughtLabeler",
"LangChainTracer",
"StreamlitCallbackHandler",
"WandbCallbackHandler",
"WhyLabsCallbackHandler",
"get_openai_callback",
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
"TrubricsCallbackHandler",
]

View File

@@ -1,69 +0,0 @@
from __future__ import annotations
import logging
from contextlib import contextmanager
from contextvars import ContextVar
from typing import (
Generator,
Optional,
)
from langchain_core.tracers.context import register_configure_hook
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
from langchain_community.callbacks.tracers.wandb import WandbTracer
logger = logging.getLogger(__name__)
openai_callback_var: ContextVar[Optional[OpenAICallbackHandler]] = ContextVar(
"openai_callback", default=None
)
wandb_tracing_callback_var: ContextVar[Optional[WandbTracer]] = ContextVar( # noqa: E501
"tracing_wandb_callback", default=None
)
register_configure_hook(openai_callback_var, True)
register_configure_hook(
wandb_tracing_callback_var, True, WandbTracer, "LANGCHAIN_WANDB_TRACING"
)
@contextmanager
def get_openai_callback() -> Generator[OpenAICallbackHandler, None, None]:
"""Get the OpenAI callback handler in a context manager.
which conveniently exposes token and cost information.
Returns:
OpenAICallbackHandler: The OpenAI callback handler.
Example:
>>> with get_openai_callback() as cb:
... # Use the OpenAI callback handler
"""
cb = OpenAICallbackHandler()
openai_callback_var.set(cb)
yield cb
openai_callback_var.set(None)
@contextmanager
def wandb_tracing_enabled(
session_name: str = "default",
) -> Generator[None, None, None]:
"""Get the WandbTracer in a context manager.
Args:
session_name (str, optional): The name of the session.
Defaults to "default".
Returns:
None
Example:
>>> with wandb_tracing_enabled() as session:
... # Use the WandbTracer session
"""
cb = WandbTracer()
wandb_tracing_callback_var.set(cb)
yield None
wandb_tracing_callback_var.set(None)

View File

@@ -1,20 +0,0 @@
"""Tracers that record execution of LangChain runs."""
from langchain_core.tracers.langchain import LangChainTracer
from langchain_core.tracers.langchain_v1 import LangChainTracerV1
from langchain_core.tracers.stdout import (
ConsoleCallbackHandler,
FunctionCallbackHandler,
)
from langchain_community.callbacks.tracers.logging import LoggingCallbackHandler
from langchain_community.callbacks.tracers.wandb import WandbTracer
__all__ = [
"ConsoleCallbackHandler",
"FunctionCallbackHandler",
"LoggingCallbackHandler",
"LangChainTracer",
"LangChainTracerV1",
"WandbTracer",
]

View File

@@ -1,78 +0,0 @@
"""**Chat Models** are a variation on language models.
While Chat Models use language models under the hood, the interface they expose
is a bit different. Rather than expose a "text in, text out" API, they expose
an interface where "chat messages" are the inputs and outputs.
**Class hierarchy:**
.. code-block::
BaseLanguageModel --> BaseChatModel --> <name> # Examples: ChatOpenAI, ChatGooglePalm
**Main helpers:**
.. code-block::
AIMessage, BaseMessage, HumanMessage
""" # noqa: E501
from langchain_community.chat_models.anthropic import ChatAnthropic
from langchain_community.chat_models.anyscale import ChatAnyscale
from langchain_community.chat_models.baichuan import ChatBaichuan
from langchain_community.chat_models.baidu_qianfan_endpoint import QianfanChatEndpoint
from langchain_community.chat_models.bedrock import BedrockChat
from langchain_community.chat_models.cohere import ChatCohere
from langchain_community.chat_models.databricks import ChatDatabricks
from langchain_community.chat_models.ernie import ErnieBotChat
from langchain_community.chat_models.everlyai import ChatEverlyAI
from langchain_community.chat_models.fake import FakeListChatModel
from langchain_community.chat_models.fireworks import ChatFireworks
from langchain_community.chat_models.gigachat import GigaChat
from langchain_community.chat_models.google_palm import ChatGooglePalm
from langchain_community.chat_models.human import HumanInputChatModel
from langchain_community.chat_models.hunyuan import ChatHunyuan
from langchain_community.chat_models.javelin_ai_gateway import ChatJavelinAIGateway
from langchain_community.chat_models.jinachat import JinaChat
from langchain_community.chat_models.konko import ChatKonko
from langchain_community.chat_models.litellm import ChatLiteLLM
from langchain_community.chat_models.minimax import MiniMaxChat
from langchain_community.chat_models.mlflow import ChatMlflow
from langchain_community.chat_models.mlflow_ai_gateway import ChatMLflowAIGateway
from langchain_community.chat_models.ollama import ChatOllama
from langchain_community.chat_models.pai_eas_endpoint import PaiEasChatEndpoint
from langchain_community.chat_models.promptlayer_openai import PromptLayerChatOpenAI
from langchain_community.chat_models.vertexai import ChatVertexAI
from langchain_community.chat_models.volcengine_maas import VolcEngineMaasChat
from langchain_community.chat_models.yandex import ChatYandexGPT
__all__ = [
"BedrockChat",
"FakeListChatModel",
"PromptLayerChatOpenAI",
"ChatDatabricks",
"ChatEverlyAI",
"ChatAnthropic",
"ChatCohere",
"ChatGooglePalm",
"ChatMlflow",
"ChatMLflowAIGateway",
"ChatOllama",
"ChatVertexAI",
"JinaChat",
"HumanInputChatModel",
"MiniMaxChat",
"ChatAnyscale",
"ChatLiteLLM",
"ErnieBotChat",
"ChatJavelinAIGateway",
"ChatKonko",
"PaiEasChatEndpoint",
"QianfanChatEndpoint",
"ChatFireworks",
"ChatYandexGPT",
"ChatBaichuan",
"ChatHunyuan",
"GigaChat",
"VolcEngineMaasChat",
]

View File

@@ -1,101 +0,0 @@
"""Abstract interface for document loader implementations."""
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional, TYPE_CHECKING
from langchain_core.documents import Document
from langchain_community.document_loaders.blob_loaders import Blob
if TYPE_CHECKING:
from langchain.text_splitter import TextSplitter
class BaseLoader(ABC):
"""Interface for Document Loader.
Implementations should implement the lazy-loading method using generators
to avoid loading all Documents into memory at once.
The `load` method will remain as is for backwards compatibility, but its
implementation should be just `list(self.lazy_load())`.
"""
# Sub-classes should implement this method
# as return list(self.lazy_load()).
# This method returns a List which is materialized in memory.
@abstractmethod
def load(self) -> List[Document]:
"""Load data into Document objects."""
def load_and_split(
self, text_splitter: Optional[TextSplitter] = None
) -> List[Document]:
"""Load Documents and split into chunks. Chunks are returned as Documents.
Args:
text_splitter: TextSplitter instance to use for splitting documents.
Defaults to RecursiveCharacterTextSplitter.
Returns:
List of Documents.
"""
from langchain.text_splitter import RecursiveCharacterTextSplitter
if text_splitter is None:
_text_splitter: TextSplitter = RecursiveCharacterTextSplitter()
else:
_text_splitter = text_splitter
docs = self.load()
return _text_splitter.split_documents(docs)
# Attention: This method will be upgraded into an abstractmethod once it's
# implemented in all the existing subclasses.
def lazy_load(
self,
) -> Iterator[Document]:
"""A lazy loader for Documents."""
raise NotImplementedError(
f"{self.__class__.__name__} does not implement lazy_load()"
)
class BaseBlobParser(ABC):
"""Abstract interface for blob parsers.
A blob parser provides a way to parse raw data stored in a blob into one
or more documents.
The parser can be composed with blob loaders, making it easy to reuse
a parser independent of how the blob was originally loaded.
"""
@abstractmethod
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
"""Lazy parsing interface.
Subclasses are required to implement this method.
Args:
blob: Blob instance
Returns:
Generator of documents
"""
def parse(self, blob: Blob) -> List[Document]:
"""Eagerly parse the blob into a document or documents.
This is a convenience method for interactive development environment.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not over-ride this parse method.
Args:
blob: Blob instance
Returns:
List of documents
"""
return list(self.lazy_parse(blob))

View File

@@ -1,147 +0,0 @@
"""Use to load blobs from the local file system."""
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional, Sequence, TypeVar, Union
from langchain_community.document_loaders.blob_loaders.schema import Blob, BlobLoader
T = TypeVar("T")
def _make_iterator(
length_func: Callable[[], int], show_progress: bool = False
) -> Callable[[Iterable[T]], Iterator[T]]:
"""Create a function that optionally wraps an iterable in tqdm."""
if show_progress:
try:
from tqdm.auto import tqdm
except ImportError:
raise ImportError(
"You must install tqdm to use show_progress=True."
"You can install tqdm with `pip install tqdm`."
)
# Make sure to provide `total` here so that tqdm can show
# a progress bar that takes into account the total number of files.
def _with_tqdm(iterable: Iterable[T]) -> Iterator[T]:
"""Wrap an iterable in a tqdm progress bar."""
return tqdm(iterable, total=length_func())
iterator = _with_tqdm
else:
iterator = iter # type: ignore
return iterator
# PUBLIC API
class FileSystemBlobLoader(BlobLoader):
"""Load blobs in the local file system.
Example:
.. code-block:: python
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
loader = FileSystemBlobLoader("/path/to/directory")
for blob in loader.yield_blobs():
print(blob)
""" # noqa: E501
def __init__(
self,
path: Union[str, Path],
*,
glob: str = "**/[!.]*",
exclude: Sequence[str] = (),
suffixes: Optional[Sequence[str]] = None,
show_progress: bool = False,
) -> None:
"""Initialize with a path to directory and how to glob over it.
Args:
path: Path to directory to load from or path to file to load.
If a path to a file is provided, glob/exclude/suffixes are ignored.
glob: Glob pattern relative to the specified path
by default set to pick up all non-hidden files
exclude: patterns to exclude from results, use glob syntax
suffixes: Provide to keep only files with these suffixes
Useful when wanting to keep files with different suffixes
Suffixes must include the dot, e.g. ".txt"
show_progress: If true, will show a progress bar as the files are loaded.
This forces an iteration through all matching files
to count them prior to loading them.
Examples:
.. code-block:: python
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
# Load a single file.
loader = FileSystemBlobLoader("/path/to/file.txt")
# Recursively load all text files in a directory.
loader = FileSystemBlobLoader("/path/to/directory", glob="**/*.txt")
# Recursively load all non-hidden files in a directory.
loader = FileSystemBlobLoader("/path/to/directory", glob="**/[!.]*")
# Load all files in a directory without recursion.
loader = FileSystemBlobLoader("/path/to/directory", glob="*")
# Recursively load all files in a directory, except for py or pyc files.
loader = FileSystemBlobLoader(
"/path/to/directory",
glob="**/*.txt",
exclude=["**/*.py", "**/*.pyc"]
)
""" # noqa: E501
if isinstance(path, Path):
_path = path
elif isinstance(path, str):
_path = Path(path)
else:
raise TypeError(f"Expected str or Path, got {type(path)}")
self.path = _path.expanduser() # Expand user to handle ~
self.glob = glob
self.suffixes = set(suffixes or [])
self.show_progress = show_progress
self.exclude = exclude
def yield_blobs(
self,
) -> Iterable[Blob]:
"""Yield blobs that match the requested pattern."""
iterator = _make_iterator(
length_func=self.count_matching_files, show_progress=self.show_progress
)
for path in iterator(self._yield_paths()):
yield Blob.from_path(path)
def _yield_paths(self) -> Iterable[Path]:
"""Yield paths that match the requested pattern."""
if self.path.is_file():
yield self.path
return
paths = self.path.glob(self.glob)
for path in paths:
if self.exclude:
if any(path.match(glob) for glob in self.exclude):
continue
if path.is_file():
if self.suffixes and path.suffix not in self.suffixes:
continue
yield path
def count_matching_files(self) -> int:
"""Count files that match the pattern without loading them."""
# Carry out a full iteration to count the files without
# materializing anything expensive in memory.
num = 0
for _ in self._yield_paths():
num += 1
return num

View File

@@ -1,190 +0,0 @@
from __future__ import annotations
from pathlib import Path
from typing import (
TYPE_CHECKING,
Any,
Iterator,
List,
Literal,
Optional,
Sequence,
Union,
)
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser, BaseLoader
from langchain_community.document_loaders.blob_loaders import (
BlobLoader,
FileSystemBlobLoader,
)
from langchain_community.document_loaders.parsers.registry import get_parser
if TYPE_CHECKING:
from langchain.text_splitter import TextSplitter
_PathLike = Union[str, Path]
DEFAULT = Literal["default"]
class GenericLoader(BaseLoader):
"""Generic Document Loader.
A generic document loader that allows combining an arbitrary blob loader with
a blob parser.
Examples:
Parse a specific PDF file:
.. code-block:: python
from langchain_community.document_loaders import GenericLoader
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem(
"my_lovely_pdf.pdf",
parser=PyPDFParser()
)
.. code-block:: python
from langchain_community.document_loaders import GenericLoader
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
loader = GenericLoader.from_filesystem(
path="path/to/directory",
glob="**/[!.]*",
suffixes=[".pdf"],
show_progress=True,
)
docs = loader.lazy_load()
next(docs)
Example instantiations to change which files are loaded:
.. code-block:: python
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")
# Recursively load all non-hidden files in a directory.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")
# Load all files in a directory without recursion.
loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")
Example instantiations to change which parser is used:
.. code-block:: python
from langchain_community.document_loaders.parsers.pdf import PyPDFParser
# Recursively load all text files in a directory.
loader = GenericLoader.from_filesystem(
"/path/to/dir",
glob="**/*.pdf",
parser=PyPDFParser()
)
""" # noqa: E501
def __init__(
self,
blob_loader: BlobLoader,
blob_parser: BaseBlobParser,
) -> None:
"""A generic document loader.
Args:
blob_loader: A blob loader which knows how to yield blobs
blob_parser: A blob parser which knows how to parse blobs into documents
"""
self.blob_loader = blob_loader
self.blob_parser = blob_parser
def lazy_load(
self,
) -> Iterator[Document]:
"""Load documents lazily. Use this when working at a large scale."""
for blob in self.blob_loader.yield_blobs():
yield from self.blob_parser.lazy_parse(blob)
def load(self) -> List[Document]:
"""Load all documents."""
return list(self.lazy_load())
def load_and_split(
self, text_splitter: Optional[TextSplitter] = None
) -> List[Document]:
"""Load all documents and split them into sentences."""
raise NotImplementedError(
"Loading and splitting is not yet implemented for generic loaders. "
"When they will be implemented they will be added via the initializer. "
"This method should not be used going forward."
)
@classmethod
def from_filesystem(
cls,
path: _PathLike,
*,
glob: str = "**/[!.]*",
exclude: Sequence[str] = (),
suffixes: Optional[Sequence[str]] = None,
show_progress: bool = False,
parser: Union[DEFAULT, BaseBlobParser] = "default",
parser_kwargs: Optional[dict] = None,
) -> GenericLoader:
"""Create a generic document loader using a filesystem blob loader.
Args:
path: The path to the directory to load documents from OR the path to a
single file to load. If this is a file, glob, exclude, suffixes
will be ignored.
glob: The glob pattern to use to find documents.
suffixes: The suffixes to use to filter documents. If None, all files
matching the glob will be loaded.
exclude: A list of patterns to exclude from the loader.
show_progress: Whether to show a progress bar or not (requires tqdm).
Proxies to the file system loader.
parser: A blob parser which knows how to parse blobs into documents,
will instantiate a default parser if not provided.
The default can be overridden by either passing a parser or
setting the class attribute `blob_parser` (the latter
should be used with inheritance).
parser_kwargs: Keyword arguments to pass to the parser.
Returns:
A generic document loader.
"""
blob_loader = FileSystemBlobLoader(
path,
glob=glob,
exclude=exclude,
suffixes=suffixes,
show_progress=show_progress,
)
if isinstance(parser, str):
if parser == "default":
try:
# If there is an implementation of get_parser on the class, use it.
blob_parser = cls.get_parser(**(parser_kwargs or {}))
except NotImplementedError:
# if not then use the global registry.
blob_parser = get_parser(parser)
else:
blob_parser = get_parser(parser)
else:
blob_parser = parser
return cls(blob_loader, blob_parser)
@staticmethod
def get_parser(**kwargs: Any) -> BaseBlobParser:
"""Override this method to associate a default parser with the class."""
raise NotImplementedError()

View File

@@ -1,70 +0,0 @@
"""Code for generic / auxiliary parsers.
This module contains some logic to help assemble more sophisticated parsers.
"""
from typing import Iterator, Mapping, Optional
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders.schema import Blob
class MimeTypeBasedParser(BaseBlobParser):
"""Parser that uses `mime`-types to parse a blob.
This parser is useful for simple pipelines where the mime-type is sufficient
to determine how to parse a blob.
To use, configure handlers based on mime-types and pass them to the initializer.
Example:
.. code-block:: python
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser
parser = MimeTypeBasedParser(
handlers={
"application/pdf": ...,
},
fallback_parser=...,
)
""" # noqa: E501
def __init__(
self,
handlers: Mapping[str, BaseBlobParser],
*,
fallback_parser: Optional[BaseBlobParser] = None,
) -> None:
"""Define a parser that uses mime-types to determine how to parse a blob.
Args:
handlers: A mapping from mime-types to functions that take a blob, parse it
and return a document.
fallback_parser: A fallback_parser parser to use if the mime-type is not
found in the handlers. If provided, this parser will be
used to parse blobs with all mime-types not found in
the handlers.
If not provided, a ValueError will be raised if the
mime-type is not found in the handlers.
"""
self.handlers = handlers
self.fallback_parser = fallback_parser
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
"""Load documents from a blob."""
mimetype = blob.mimetype
if mimetype is None:
raise ValueError(f"{blob} does not have a mimetype.")
if mimetype in self.handlers:
handler = self.handlers[mimetype]
yield from handler.lazy_parse(blob)
else:
if self.fallback_parser is not None:
yield from self.fallback_parser.lazy_parse(blob)
else:
raise ValueError(f"Unsupported mime type: {mimetype}")

View File

@@ -1,157 +0,0 @@
from __future__ import annotations
from typing import Any, Dict, Iterator, Optional, TYPE_CHECKING
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.language.cobol import CobolSegmenter
from langchain_community.document_loaders.parsers.language.javascript import (
JavaScriptSegmenter,
)
from langchain_community.document_loaders.parsers.language.python import PythonSegmenter
if TYPE_CHECKING:
from langchain.text_splitter import Language
try:
from langchain.text_splitter import Language
LANGUAGE_EXTENSIONS: Dict[str, str] = {
"py": Language.PYTHON,
"js": Language.JS,
"cobol": Language.COBOL,
}
LANGUAGE_SEGMENTERS: Dict[str, Any] = {
Language.PYTHON: PythonSegmenter,
Language.JS: JavaScriptSegmenter,
Language.COBOL: CobolSegmenter,
}
except ImportError:
LANGUAGE_EXTENSIONS = {}
LANGUAGE_SEGMENTERS = {}
class LanguageParser(BaseBlobParser):
"""Parse using the respective programming language syntax.
Each top-level function and class in the code is loaded into separate documents.
Furthermore, an extra document is generated, containing the remaining top-level code
that excludes the already segmented functions and classes.
This approach can potentially improve the accuracy of QA models over source code.
Currently, the supported languages for code parsing are Python and JavaScript.
The language used for parsing can be configured, along with the minimum number of
lines required to activate the splitting based on syntax.
Examples:
.. code-block:: python
from langchain.text_splitter.Language
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py", ".js"],
parser=LanguageParser()
)
docs = loader.load()
Example instantiations to manually select the language:
.. code-block:: python
from langchain.text_splitter import Language
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py"],
parser=LanguageParser(language=Language.PYTHON)
)
Example instantiations to set number of lines threshold:
.. code-block:: python
loader = GenericLoader.from_filesystem(
"./code",
glob="**/*",
suffixes=[".py"],
parser=LanguageParser(parser_threshold=200)
)
"""
def __init__(self, language: Optional[Language] = None, parser_threshold: int = 0):
"""
Language parser that split code using the respective language syntax.
Args:
language: If None (default), it will try to infer language from source.
parser_threshold: Minimum lines needed to activate parsing (0 by default).
"""
self.language = language
self.parser_threshold = parser_threshold
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
code = blob.as_string()
language = self.language or (
LANGUAGE_EXTENSIONS.get(blob.source.rsplit(".", 1)[-1])
if isinstance(blob.source, str)
else None
)
if language is None:
yield Document(
page_content=code,
metadata={
"source": blob.source,
},
)
return
if self.parser_threshold >= len(code.splitlines()):
yield Document(
page_content=code,
metadata={
"source": blob.source,
"language": language,
},
)
return
self.Segmenter = LANGUAGE_SEGMENTERS[language]
segmenter = self.Segmenter(blob.as_string())
if not segmenter.is_valid():
yield Document(
page_content=code,
metadata={
"source": blob.source,
},
)
return
for functions_classes in segmenter.extract_functions_classes():
yield Document(
page_content=functions_classes,
metadata={
"source": blob.source,
"content_type": "functions_classes",
"language": language,
},
)
yield Document(
page_content=segmenter.simplify_code(),
metadata={
"source": blob.source,
"content_type": "simplified_code",
"language": language,
},
)

View File

@@ -1,262 +0,0 @@
from __future__ import annotations
import asyncio
import json
from pathlib import Path
from typing import TYPE_CHECKING, Dict, List, Optional, Union
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
if TYPE_CHECKING:
import pandas as pd
from telethon.hints import EntityLike
def concatenate_rows(row: dict) -> str:
"""Combine message information in a readable format ready to be used."""
date = row["date"]
sender = row["from"]
text = row["text"]
return f"{sender} on {date}: {text}\n\n"
class TelegramChatFileLoader(BaseLoader):
"""Load from `Telegram chat` dump."""
def __init__(self, path: str):
"""Initialize with a path."""
self.file_path = path
def load(self) -> List[Document]:
"""Load documents."""
p = Path(self.file_path)
with open(p, encoding="utf8") as f:
d = json.load(f)
text = "".join(
concatenate_rows(message)
for message in d["messages"]
if message["type"] == "message" and isinstance(message["text"], str)
)
metadata = {"source": str(p)}
return [Document(page_content=text, metadata=metadata)]
def text_to_docs(text: Union[str, List[str]]) -> List[Document]:
"""Convert a string or list of strings to a list of Documents with metadata."""
from langchain.text_splitter import RecursiveCharacterTextSplitter
if isinstance(text, str):
# Take a single string as one page
text = [text]
page_docs = [Document(page_content=page) for page in text]
# Add page numbers as metadata
for i, doc in enumerate(page_docs):
doc.metadata["page"] = i + 1
# Split pages into chunks
doc_chunks = []
for doc in page_docs:
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=800,
separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
chunk_overlap=20,
)
chunks = text_splitter.split_text(doc.page_content)
for i, chunk in enumerate(chunks):
doc = Document(
page_content=chunk, metadata={"page": doc.metadata["page"], "chunk": i}
)
# Add sources a metadata
doc.metadata["source"] = f"{doc.metadata['page']}-{doc.metadata['chunk']}"
doc_chunks.append(doc)
return doc_chunks
class TelegramChatApiLoader(BaseLoader):
"""Load `Telegram` chat json directory dump."""
def __init__(
self,
chat_entity: Optional[EntityLike] = None,
api_id: Optional[int] = None,
api_hash: Optional[str] = None,
username: Optional[str] = None,
file_path: str = "telegram_data.json",
):
"""Initialize with API parameters.
Args:
chat_entity: The chat entity to fetch data from.
api_id: The API ID.
api_hash: The API hash.
username: The username.
file_path: The file path to save the data to. Defaults to
"telegram_data.json".
"""
self.chat_entity = chat_entity
self.api_id = api_id
self.api_hash = api_hash
self.username = username
self.file_path = file_path
async def fetch_data_from_telegram(self) -> None:
"""Fetch data from Telegram API and save it as a JSON file."""
from telethon.sync import TelegramClient
data = []
async with TelegramClient(self.username, self.api_id, self.api_hash) as client:
async for message in client.iter_messages(self.chat_entity):
is_reply = message.reply_to is not None
reply_to_id = message.reply_to.reply_to_msg_id if is_reply else None
data.append(
{
"sender_id": message.sender_id,
"text": message.text,
"date": message.date.isoformat(),
"message.id": message.id,
"is_reply": is_reply,
"reply_to_id": reply_to_id,
}
)
with open(self.file_path, "w", encoding="utf-8") as f:
json.dump(data, f, ensure_ascii=False, indent=4)
def _get_message_threads(self, data: pd.DataFrame) -> dict:
"""Create a dictionary of message threads from the given data.
Args:
data (pd.DataFrame): A DataFrame containing the conversation \
data with columns:
- message.sender_id
- text
- date
- message.id
- is_reply
- reply_to_id
Returns:
dict: A dictionary where the key is the parent message ID and \
the value is a list of message IDs in ascending order.
"""
def find_replies(parent_id: int, reply_data: pd.DataFrame) -> List[int]:
"""
Recursively find all replies to a given parent message ID.
Args:
parent_id (int): The parent message ID.
reply_data (pd.DataFrame): A DataFrame containing reply messages.
Returns:
list: A list of message IDs that are replies to the parent message ID.
"""
# Find direct replies to the parent message ID
direct_replies = reply_data[reply_data["reply_to_id"] == parent_id][
"message.id"
].tolist()
# Recursively find replies to the direct replies
all_replies = []
for reply_id in direct_replies:
all_replies += [reply_id] + find_replies(reply_id, reply_data)
return all_replies
# Filter out parent messages
parent_messages = data[~data["is_reply"]]
# Filter out reply messages and drop rows with NaN in 'reply_to_id'
reply_messages = data[data["is_reply"]].dropna(subset=["reply_to_id"])
# Convert 'reply_to_id' to integer
reply_messages["reply_to_id"] = reply_messages["reply_to_id"].astype(int)
# Create a dictionary of message threads with parent message IDs as keys and \
# lists of reply message IDs as values
message_threads = {
parent_id: [parent_id] + find_replies(parent_id, reply_messages)
for parent_id in parent_messages["message.id"]
}
return message_threads
def _combine_message_texts(
self, message_threads: Dict[int, List[int]], data: pd.DataFrame
) -> str:
"""
Combine the message texts for each parent message ID based \
on the list of message threads.
Args:
message_threads (dict): A dictionary where the key is the parent message \
ID and the value is a list of message IDs in ascending order.
data (pd.DataFrame): A DataFrame containing the conversation data:
- message.sender_id
- text
- date
- message.id
- is_reply
- reply_to_id
Returns:
str: A combined string of message texts sorted by date.
"""
combined_text = ""
# Iterate through sorted parent message IDs
for parent_id, message_ids in message_threads.items():
# Get the message texts for the message IDs and sort them by date
message_texts = (
data[data["message.id"].isin(message_ids)]
.sort_values(by="date")["text"]
.tolist()
)
message_texts = [str(elem) for elem in message_texts]
# Combine the message texts
combined_text += " ".join(message_texts) + ".\n"
return combined_text.strip()
def load(self) -> List[Document]:
"""Load documents."""
if self.chat_entity is not None:
try:
import nest_asyncio
nest_asyncio.apply()
asyncio.run(self.fetch_data_from_telegram())
except ImportError:
raise ImportError(
"""`nest_asyncio` package not found.
please install with `pip install nest_asyncio`
"""
)
p = Path(self.file_path)
with open(p, encoding="utf8") as f:
d = json.load(f)
try:
import pandas as pd
except ImportError:
raise ImportError(
"""`pandas` package not found.
please install with `pip install pandas`
"""
)
normalized_messages = pd.json_normalize(d)
df = pd.DataFrame(normalized_messages)
message_threads = self._get_message_threads(df)
combined_texts = self._combine_message_texts(message_threads, df)
return text_to_docs(combined_texts)

View File

@@ -1,149 +0,0 @@
from typing import Any, Iterator, List, Sequence, cast
from langchain_core.documents import BaseDocumentTransformer, Document
class BeautifulSoupTransformer(BaseDocumentTransformer):
"""Transform HTML content by extracting specific tags and removing unwanted ones.
Example:
.. code-block:: python
from langchain_community.document_transformers import BeautifulSoupTransformer
bs4_transformer = BeautifulSoupTransformer()
docs_transformed = bs4_transformer.transform_documents(docs)
""" # noqa: E501
def __init__(self) -> None:
"""
Initialize the transformer.
This checks if the BeautifulSoup4 package is installed.
If not, it raises an ImportError.
"""
try:
import bs4 # noqa:F401
except ImportError:
raise ImportError(
"BeautifulSoup4 is required for BeautifulSoupTransformer. "
"Please install it with `pip install beautifulsoup4`."
)
def transform_documents(
self,
documents: Sequence[Document],
unwanted_tags: List[str] = ["script", "style"],
tags_to_extract: List[str] = ["p", "li", "div", "a"],
remove_lines: bool = True,
**kwargs: Any,
) -> Sequence[Document]:
"""
Transform a list of Document objects by cleaning their HTML content.
Args:
documents: A sequence of Document objects containing HTML content.
unwanted_tags: A list of tags to be removed from the HTML.
tags_to_extract: A list of tags whose content will be extracted.
remove_lines: If set to True, unnecessary lines will be
removed from the HTML content.
Returns:
A sequence of Document objects with transformed content.
"""
for doc in documents:
cleaned_content = doc.page_content
cleaned_content = self.remove_unwanted_tags(cleaned_content, unwanted_tags)
cleaned_content = self.extract_tags(cleaned_content, tags_to_extract)
if remove_lines:
cleaned_content = self.remove_unnecessary_lines(cleaned_content)
doc.page_content = cleaned_content
return documents
@staticmethod
def remove_unwanted_tags(html_content: str, unwanted_tags: List[str]) -> str:
"""
Remove unwanted tags from a given HTML content.
Args:
html_content: The original HTML content string.
unwanted_tags: A list of tags to be removed from the HTML.
Returns:
A cleaned HTML string with unwanted tags removed.
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")
for tag in unwanted_tags:
for element in soup.find_all(tag):
element.decompose()
return str(soup)
@staticmethod
def extract_tags(html_content: str, tags: List[str]) -> str:
"""
Extract specific tags from a given HTML content.
Args:
html_content: The original HTML content string.
tags: A list of tags to be extracted from the HTML.
Returns:
A string combining the content of the extracted tags.
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")
text_parts: List[str] = []
for element in soup.find_all():
if element.name in tags:
# Extract all navigable strings recursively from this element.
text_parts += get_navigable_strings(element)
# To avoid duplicate text, remove all descendants from the soup.
element.decompose()
return " ".join(text_parts)
@staticmethod
def remove_unnecessary_lines(content: str) -> str:
"""
Clean up the content by removing unnecessary lines.
Args:
content: A string, which may contain unnecessary lines or spaces.
Returns:
A cleaned string with unnecessary lines removed.
"""
lines = content.split("\n")
stripped_lines = [line.strip() for line in lines]
non_empty_lines = [line for line in stripped_lines if line]
cleaned_content = " ".join(non_empty_lines)
return cleaned_content
async def atransform_documents(
self,
documents: Sequence[Document],
**kwargs: Any,
) -> Sequence[Document]:
raise NotImplementedError
def get_navigable_strings(element: Any) -> Iterator[str]:
from bs4 import NavigableString, Tag
for child in cast(Tag, element).children:
if isinstance(child, Tag):
yield from get_navigable_strings(child)
elif isinstance(child, NavigableString):
if (element.name == "a") and (href := element.get("href")):
yield f"{child.strip()} ({href})"
else:
yield child.strip()

View File

@@ -1,140 +0,0 @@
"""Document transformers that use OpenAI Functions models"""
from typing import Any, Dict, Optional, Sequence, Type, Union
from langchain_core.documents import BaseDocumentTransformer, Document
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel
class OpenAIMetadataTagger(BaseDocumentTransformer, BaseModel):
"""Extract metadata tags from document contents using OpenAI functions.
Example:
.. code-block:: python
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_transformers import OpenAIMetadataTagger
from langchain_core.documents import Document
schema = {
"properties": {
"movie_title": { "type": "string" },
"critic": { "type": "string" },
"tone": {
"type": "string",
"enum": ["positive", "negative"]
},
"rating": {
"type": "integer",
"description": "The number of stars the critic rated the movie"
}
},
"required": ["movie_title", "critic", "tone"]
}
# Must be an OpenAI model that supports functions
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
tagging_chain = create_tagging_chain(schema, llm)
document_transformer = OpenAIMetadataTagger(tagging_chain=tagging_chain)
original_documents = [
Document(page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."),
Document(page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.", metadata={"reliable": False}),
]
enhanced_documents = document_transformer.transform_documents(original_documents)
""" # noqa: E501
tagging_chain: Any
"""The chain used to extract metadata from each document."""
def transform_documents(
self, documents: Sequence[Document], **kwargs: Any
) -> Sequence[Document]:
"""Automatically extract and populate metadata
for each document according to the provided schema."""
new_documents = []
for document in documents:
extracted_metadata: Dict = self.tagging_chain.run(document.page_content) # type: ignore[assignment] # noqa: E501
new_document = Document(
page_content=document.page_content,
metadata={**extracted_metadata, **document.metadata},
)
new_documents.append(new_document)
return new_documents
async def atransform_documents(
self, documents: Sequence[Document], **kwargs: Any
) -> Sequence[Document]:
raise NotImplementedError
def create_metadata_tagger(
metadata_schema: Union[Dict[str, Any], Type[BaseModel]],
llm: BaseLanguageModel,
prompt: Optional[ChatPromptTemplate] = None,
*,
tagging_chain_kwargs: Optional[Dict] = None,
) -> OpenAIMetadataTagger:
"""Create a DocumentTransformer that uses an OpenAI function chain to automatically
tag documents with metadata based on their content and an input schema.
Args:
metadata_schema: Either a dictionary or pydantic.BaseModel class. If a dictionary
is passed in, it's assumed to already be a valid JsonSchema.
For best results, pydantic.BaseModels should have docstrings describing what
the schema represents and descriptions for the parameters.
llm: Language model to use, assumed to support the OpenAI function-calling API.
Defaults to use "gpt-3.5-turbo-0613"
prompt: BasePromptTemplate to pass to the model.
Returns:
An LLMChain that will pass the given function to the model.
Example:
.. code-block:: python
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_transformers import create_metadata_tagger
from langchain_core.documents import Document
schema = {
"properties": {
"movie_title": { "type": "string" },
"critic": { "type": "string" },
"tone": {
"type": "string",
"enum": ["positive", "negative"]
},
"rating": {
"type": "integer",
"description": "The number of stars the critic rated the movie"
}
},
"required": ["movie_title", "critic", "tone"]
}
# Must be an OpenAI model that supports functions
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
document_transformer = create_metadata_tagger(schema, llm)
original_documents = [
Document(page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."),
Document(page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.", metadata={"reliable": False}),
]
enhanced_documents = document_transformer.transform_documents(original_documents)
""" # noqa: E501
from langchain.chains.openai_functions import create_tagging_chain
metadata_schema = (
metadata_schema
if isinstance(metadata_schema, dict)
else metadata_schema.schema()
)
_tagging_chain_kwargs = tagging_chain_kwargs or {}
tagging_chain = create_tagging_chain(
metadata_schema, llm, prompt=prompt, **_tagging_chain_kwargs
)
return OpenAIMetadataTagger(tagging_chain=tagging_chain)

View File

@@ -1,160 +0,0 @@
"""**Embedding models** are wrappers around embedding models
from different APIs and services.
**Embedding models** can be LLMs or not.
**Class hierarchy:**
.. code-block::
Embeddings --> <name>Embeddings # Examples: CohereEmbeddings, HuggingFaceEmbeddings
"""
import logging
from typing import Any
from langchain_community.embeddings.aleph_alpha import (
AlephAlphaAsymmetricSemanticEmbedding,
AlephAlphaSymmetricSemanticEmbedding,
)
from langchain_community.embeddings.awa import AwaEmbeddings
from langchain_community.embeddings.baidu_qianfan_endpoint import (
QianfanEmbeddingsEndpoint,
)
from langchain_community.embeddings.bedrock import BedrockEmbeddings
from langchain_community.embeddings.bookend import BookendEmbeddings
from langchain_community.embeddings.cache import CacheBackedEmbeddings
from langchain_community.embeddings.clarifai import ClarifaiEmbeddings
from langchain_community.embeddings.cohere import CohereEmbeddings
from langchain_community.embeddings.dashscope import DashScopeEmbeddings
from langchain_community.embeddings.databricks import DatabricksEmbeddings
from langchain_community.embeddings.deepinfra import DeepInfraEmbeddings
from langchain_community.embeddings.edenai import EdenAiEmbeddings
from langchain_community.embeddings.elasticsearch import ElasticsearchEmbeddings
from langchain_community.embeddings.embaas import EmbaasEmbeddings
from langchain_community.embeddings.ernie import ErnieEmbeddings
from langchain_community.embeddings.fake import (
DeterministicFakeEmbedding,
FakeEmbeddings,
)
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.embeddings.google_palm import GooglePalmEmbeddings
from langchain_community.embeddings.gpt4all import GPT4AllEmbeddings
from langchain_community.embeddings.gradient_ai import GradientEmbeddings
from langchain_community.embeddings.huggingface import (
HuggingFaceBgeEmbeddings,
HuggingFaceEmbeddings,
HuggingFaceInferenceAPIEmbeddings,
HuggingFaceInstructEmbeddings,
)
from langchain_community.embeddings.huggingface_hub import HuggingFaceHubEmbeddings
from langchain_community.embeddings.infinity import InfinityEmbeddings
from langchain_community.embeddings.javelin_ai_gateway import JavelinAIGatewayEmbeddings
from langchain_community.embeddings.jina import JinaEmbeddings
from langchain_community.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
from langchain_community.embeddings.llamacpp import LlamaCppEmbeddings
from langchain_community.embeddings.localai import LocalAIEmbeddings
from langchain_community.embeddings.minimax import MiniMaxEmbeddings
from langchain_community.embeddings.mlflow import MlflowEmbeddings
from langchain_community.embeddings.mlflow_gateway import MlflowAIGatewayEmbeddings
from langchain_community.embeddings.modelscope_hub import ModelScopeEmbeddings
from langchain_community.embeddings.mosaicml import MosaicMLInstructorEmbeddings
from langchain_community.embeddings.nlpcloud import NLPCloudEmbeddings
from langchain_community.embeddings.octoai_embeddings import OctoAIEmbeddings
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.embeddings.sagemaker_endpoint import (
SagemakerEndpointEmbeddings,
)
from langchain_community.embeddings.self_hosted import SelfHostedEmbeddings
from langchain_community.embeddings.self_hosted_hugging_face import (
SelfHostedHuggingFaceEmbeddings,
SelfHostedHuggingFaceInstructEmbeddings,
)
from langchain_community.embeddings.sentence_transformer import (
SentenceTransformerEmbeddings,
)
from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.embeddings.tensorflow_hub import TensorflowHubEmbeddings
from langchain_community.embeddings.vertexai import VertexAIEmbeddings
from langchain_community.embeddings.voyageai import VoyageEmbeddings
from langchain_community.embeddings.xinference import XinferenceEmbeddings
logger = logging.getLogger(__name__)
__all__ = [
"CacheBackedEmbeddings",
"ClarifaiEmbeddings",
"CohereEmbeddings",
"DatabricksEmbeddings",
"ElasticsearchEmbeddings",
"FastEmbedEmbeddings",
"HuggingFaceEmbeddings",
"HuggingFaceInferenceAPIEmbeddings",
"InfinityEmbeddings",
"GradientEmbeddings",
"JinaEmbeddings",
"LlamaCppEmbeddings",
"HuggingFaceHubEmbeddings",
"MlflowEmbeddings",
"MlflowAIGatewayEmbeddings",
"ModelScopeEmbeddings",
"TensorflowHubEmbeddings",
"SagemakerEndpointEmbeddings",
"HuggingFaceInstructEmbeddings",
"MosaicMLInstructorEmbeddings",
"SelfHostedEmbeddings",
"SelfHostedHuggingFaceEmbeddings",
"SelfHostedHuggingFaceInstructEmbeddings",
"FakeEmbeddings",
"DeterministicFakeEmbedding",
"AlephAlphaAsymmetricSemanticEmbedding",
"AlephAlphaSymmetricSemanticEmbedding",
"SentenceTransformerEmbeddings",
"GooglePalmEmbeddings",
"MiniMaxEmbeddings",
"VertexAIEmbeddings",
"BedrockEmbeddings",
"DeepInfraEmbeddings",
"EdenAiEmbeddings",
"DashScopeEmbeddings",
"EmbaasEmbeddings",
"OctoAIEmbeddings",
"SpacyEmbeddings",
"NLPCloudEmbeddings",
"GPT4AllEmbeddings",
"XinferenceEmbeddings",
"LocalAIEmbeddings",
"AwaEmbeddings",
"HuggingFaceBgeEmbeddings",
"ErnieEmbeddings",
"JavelinAIGatewayEmbeddings",
"OllamaEmbeddings",
"QianfanEmbeddingsEndpoint",
"JohnSnowLabsEmbeddings",
"VoyageEmbeddings",
"BookendEmbeddings",
]
# TODO: this is in here to maintain backwards compatibility
class HypotheticalDocumentEmbedder:
def __init__(self, *args: Any, **kwargs: Any):
logger.warning(
"Using a deprecated class. Please use "
"`from langchain.chains import HypotheticalDocumentEmbedder` instead"
)
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder as H
return H(*args, **kwargs) # type: ignore
@classmethod
def from_llm(cls, *args: Any, **kwargs: Any) -> Any:
logger.warning(
"Using a deprecated class. Please use "
"`from langchain.chains import HypotheticalDocumentEmbedder` instead"
)
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder as H
return H.from_llm(*args, **kwargs)

View File

@@ -1,176 +0,0 @@
"""Module contains code for a cache backed embedder.
The cache backed embedder is a wrapper around an embedder that caches
embeddings in a key-value store. The cache is used to avoid recomputing
embeddings for the same text.
The text is hashed and the hash is used as the key in the cache.
"""
from __future__ import annotations
import hashlib
import json
import uuid
from functools import partial
from typing import Callable, List, Sequence, Union, cast
from langchain_core.embeddings import Embeddings
from langchain_core.stores import BaseStore, ByteStore
from langchain_community.storage.encoder_backed import EncoderBackedStore
NAMESPACE_UUID = uuid.UUID(int=1985)
def _hash_string_to_uuid(input_string: str) -> uuid.UUID:
"""Hash a string and returns the corresponding UUID."""
hash_value = hashlib.sha1(input_string.encode("utf-8")).hexdigest()
return uuid.uuid5(NAMESPACE_UUID, hash_value)
def _key_encoder(key: str, namespace: str) -> str:
"""Encode a key."""
return namespace + str(_hash_string_to_uuid(key))
def _create_key_encoder(namespace: str) -> Callable[[str], str]:
"""Create an encoder for a key."""
return partial(_key_encoder, namespace=namespace)
def _value_serializer(value: Sequence[float]) -> bytes:
"""Serialize a value."""
return json.dumps(value).encode()
def _value_deserializer(serialized_value: bytes) -> List[float]:
"""Deserialize a value."""
return cast(List[float], json.loads(serialized_value.decode()))
class CacheBackedEmbeddings(Embeddings):
"""Interface for caching results from embedding models.
The interface allows works with any store that implements
the abstract store interface accepting keys of type str and values of list of
floats.
If need be, the interface can be extended to accept other implementations
of the value serializer and deserializer, as well as the key encoder.
Examples:
.. code-block: python
from langchain_community.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings
from langchain_community.storage import LocalFileStore
store = LocalFileStore('./my_cache')
underlying_embedder = OpenAIEmbeddings()
embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embedder, store, namespace=underlying_embedder.model
)
# Embedding is computed and cached
embeddings = embedder.embed_documents(["hello", "goodbye"])
# Embeddings are retrieved from the cache, no computation is done
embeddings = embedder.embed_documents(["hello", "goodbye"])
""" # noqa: E501
def __init__(
self,
underlying_embeddings: Embeddings,
document_embedding_store: BaseStore[str, List[float]],
) -> None:
"""Initialize the embedder.
Args:
underlying_embeddings: the embedder to use for computing embeddings.
document_embedding_store: The store to use for caching document embeddings.
"""
super().__init__()
self.document_embedding_store = document_embedding_store
self.underlying_embeddings = underlying_embeddings
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed a list of texts.
The method first checks the cache for the embeddings.
If the embeddings are not found, the method uses the underlying embedder
to embed the documents and stores the results in the cache.
Args:
texts: A list of texts to embed.
Returns:
A list of embeddings for the given texts.
"""
vectors: List[Union[List[float], None]] = self.document_embedding_store.mget(
texts
)
missing_indices: List[int] = [
i for i, vector in enumerate(vectors) if vector is None
]
missing_texts = [texts[i] for i in missing_indices]
if missing_texts:
missing_vectors = self.underlying_embeddings.embed_documents(missing_texts)
self.document_embedding_store.mset(
list(zip(missing_texts, missing_vectors))
)
for index, updated_vector in zip(missing_indices, missing_vectors):
vectors[index] = updated_vector
return cast(
List[List[float]], vectors
) # Nones should have been resolved by now
def embed_query(self, text: str) -> List[float]:
"""Embed query text.
This method does not support caching at the moment.
Support for caching queries is easily to implement, but might make
sense to hold off to see the most common patterns.
If the cache has an eviction policy, we may need to be a bit more careful
about sharing the cache between documents and queries. Generally,
one is OK evicting query caches, but document caches should be kept.
Args:
text: The text to embed.
Returns:
The embedding for the given text.
"""
return self.underlying_embeddings.embed_query(text)
@classmethod
def from_bytes_store(
cls,
underlying_embeddings: Embeddings,
document_embedding_cache: ByteStore,
*,
namespace: str = "",
) -> CacheBackedEmbeddings:
"""On-ramp that adds the necessary serialization and encoding to the store.
Args:
underlying_embeddings: The embedder to use for embedding.
document_embedding_cache: The cache to use for storing document embeddings.
*,
namespace: The namespace to use for document cache.
This namespace is used to avoid collisions with other caches.
For example, set it to the name of the embedding model used.
"""
namespace = namespace
key_encoder = _create_key_encoder(namespace)
encoder_backed_store = EncoderBackedStore[str, List[float]](
document_embedding_cache,
key_encoder,
_value_serializer,
_value_deserializer,
)
return cls(underlying_embeddings, encoder_backed_store)

View File

@@ -1,343 +0,0 @@
from typing import Any, Dict, List, Optional
import requests
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Extra, Field
DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
DEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"
DEFAULT_BGE_MODEL = "BAAI/bge-large-en"
DEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "
DEFAULT_QUERY_INSTRUCTION = (
"Represent the question for retrieving supporting documents: "
)
DEFAULT_QUERY_BGE_INSTRUCTION_EN = (
"Represent this question for searching relevant passages: "
)
DEFAULT_QUERY_BGE_INSTRUCTION_ZH = "为这个句子生成表示以用于检索相关文章:"
class HuggingFaceEmbeddings(BaseModel, Embeddings):
"""HuggingFace sentence_transformers embedding models.
To use, you should have the ``sentence_transformers`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceEmbeddings
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
hf = HuggingFaceEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
"""
client: Any #: :meta private:
model_name: str = DEFAULT_MODEL_NAME
"""Model name to use."""
cache_folder: Optional[str] = None
"""Path to store models.
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass to the model."""
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass when calling the `encode` method of the model."""
multi_process: bool = False
"""Run encode() on multiple GPUs."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
super().__init__(**kwargs)
try:
import sentence_transformers
except ImportError as exc:
raise ImportError(
"Could not import sentence_transformers python package. "
"Please install it with `pip install sentence-transformers`."
) from exc
self.client = sentence_transformers.SentenceTransformer(
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace transformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
import sentence_transformers
texts = list(map(lambda x: x.replace("\n", " "), texts))
if self.multi_process:
pool = self.client.start_multi_process_pool()
embeddings = self.client.encode_multi_process(texts, pool)
sentence_transformers.SentenceTransformer.stop_multi_process_pool(pool)
else:
embeddings = self.client.encode(texts, **self.encode_kwargs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]
class HuggingFaceInstructEmbeddings(BaseModel, Embeddings):
"""Wrapper around sentence_transformers embedding models.
To use, you should have the ``sentence_transformers``
and ``InstructorEmbedding`` python packages installed.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
model_name = "hkunlp/instructor-large"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceInstructEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
"""
client: Any #: :meta private:
model_name: str = DEFAULT_INSTRUCT_MODEL
"""Model name to use."""
cache_folder: Optional[str] = None
"""Path to store models.
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass to the model."""
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass when calling the `encode` method of the model."""
embed_instruction: str = DEFAULT_EMBED_INSTRUCTION
"""Instruction to use for embedding documents."""
query_instruction: str = DEFAULT_QUERY_INSTRUCTION
"""Instruction to use for embedding query."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
super().__init__(**kwargs)
try:
from InstructorEmbedding import INSTRUCTOR
self.client = INSTRUCTOR(
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)
except ImportError as e:
raise ImportError("Dependencies for InstructorEmbedding not found.") from e
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace instruct model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
instruction_pairs = [[self.embed_instruction, text] for text in texts]
embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace instruct model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
instruction_pair = [self.query_instruction, text]
embedding = self.client.encode([instruction_pair], **self.encode_kwargs)[0]
return embedding.tolist()
class HuggingFaceBgeEmbeddings(BaseModel, Embeddings):
"""HuggingFace BGE sentence_transformers embedding models.
To use, you should have the ``sentence_transformers`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
model_name = "BAAI/bge-large-en"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceBgeEmbeddings(
model_name=model_name,
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs
)
"""
client: Any #: :meta private:
model_name: str = DEFAULT_BGE_MODEL
"""Model name to use."""
cache_folder: Optional[str] = None
"""Path to store models.
Can be also set by SENTENCE_TRANSFORMERS_HOME environment variable."""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass to the model."""
encode_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Keyword arguments to pass when calling the `encode` method of the model."""
query_instruction: str = DEFAULT_QUERY_BGE_INSTRUCTION_EN
"""Instruction to use for embedding query."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
super().__init__(**kwargs)
try:
import sentence_transformers
except ImportError as exc:
raise ImportError(
"Could not import sentence_transformers python package. "
"Please install it with `pip install sentence_transformers`."
) from exc
self.client = sentence_transformers.SentenceTransformer(
self.model_name, cache_folder=self.cache_folder, **self.model_kwargs
)
if "-zh" in self.model_name:
self.query_instruction = DEFAULT_QUERY_BGE_INSTRUCTION_ZH
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace transformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
texts = [t.replace("\n", " ") for t in texts]
embeddings = self.client.encode(texts, **self.encode_kwargs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
text = text.replace("\n", " ")
embedding = self.client.encode(
self.query_instruction + text, **self.encode_kwargs
)
return embedding.tolist()
class HuggingFaceInferenceAPIEmbeddings(BaseModel, Embeddings):
"""Embed texts using the HuggingFace API.
Requires a HuggingFace Inference API key and a model name.
"""
api_key: str
"""Your API key for the HuggingFace Inference API."""
model_name: str = "sentence-transformers/all-MiniLM-L6-v2"
"""The name of the model to use for text embeddings."""
api_url: Optional[str] = None
"""Custom inference endpoint url. None for using default public url."""
@property
def _api_url(self) -> str:
return self.api_url or self._default_api_url
@property
def _default_api_url(self) -> str:
return (
"https://api-inference.huggingface.co"
"/pipeline"
"/feature-extraction"
f"/{self.model_name}"
)
@property
def _headers(self) -> dict:
return {"Authorization": f"Bearer {self.api_key}"}
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Get the embeddings for a list of texts.
Args:
texts (Documents): A list of texts to get embeddings for.
Returns:
Embedded texts as List[List[float]], where each inner List[float]
corresponds to a single input text.
Example:
.. code-block:: python
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
hf_embeddings = HuggingFaceInferenceAPIEmbeddings(
api_key="your_api_key",
model_name="sentence-transformers/all-MiniLM-l6-v2"
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
""" # noqa: E501
response = requests.post(
self._api_url,
headers=self._headers,
json={
"inputs": texts,
"options": {"wait_for_model": True, "use_cache": True},
},
)
return response.json()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]

View File

@@ -1,92 +0,0 @@
import os
import sys
from typing import Any, List
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Extra
class JohnSnowLabsEmbeddings(BaseModel, Embeddings):
"""JohnSnowLabs embedding models
To use, you should have the ``johnsnowlabs`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
embedding = JohnSnowLabsEmbeddings(model='embed_sentence.bert')
output = embedding.embed_query("foo bar")
""" # noqa: E501
model: Any = "embed_sentence.bert"
def __init__(
self,
model: Any = "embed_sentence.bert",
hardware_target: str = "cpu",
**kwargs: Any,
):
"""Initialize the johnsnowlabs model."""
super().__init__(**kwargs)
# 1) Check imports
try:
from johnsnowlabs import nlp
from nlu.pipe.pipeline import NLUPipeline
except ImportError as exc:
raise ImportError(
"Could not import johnsnowlabs python package. "
"Please install it with `pip install johnsnowlabs`."
) from exc
# 2) Start a Spark Session
try:
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
nlp.start(hardware_target=hardware_target)
except Exception as exc:
raise Exception("Failure starting Spark Session") from exc
# 3) Load the model
try:
if isinstance(model, str):
self.model = nlp.load(model)
elif isinstance(model, NLUPipeline):
self.model = model
else:
self.model = nlp.to_nlu_pipe(model)
except Exception as exc:
raise Exception("Failure loading model") from exc
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a JohnSnowLabs transformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
df = self.model.predict(texts, output_level="document")
emb_col = None
for c in df.columns:
if "embedding" in c:
emb_col = c
return [vec.tolist() for vec in df[emb_col].tolist()]
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a JohnSnowLabs transformer model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
return self.embed_documents([text])[0]

View File

@@ -1,168 +0,0 @@
import importlib
import logging
from typing import Any, Callable, List, Optional
from langchain_community.embeddings.self_hosted import SelfHostedEmbeddings
DEFAULT_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
DEFAULT_INSTRUCT_MODEL = "hkunlp/instructor-large"
DEFAULT_EMBED_INSTRUCTION = "Represent the document for retrieval: "
DEFAULT_QUERY_INSTRUCTION = (
"Represent the question for retrieving supporting documents: "
)
logger = logging.getLogger(__name__)
def _embed_documents(client: Any, *args: Any, **kwargs: Any) -> List[List[float]]:
"""Inference function to send to the remote hardware.
Accepts a sentence_transformer model_id and
returns a list of embeddings for each document in the batch.
"""
return client.encode(*args, **kwargs)
def load_embedding_model(model_id: str, instruct: bool = False, device: int = 0) -> Any:
"""Load the embedding model."""
if not instruct:
import sentence_transformers
client = sentence_transformers.SentenceTransformer(model_id)
else:
from InstructorEmbedding import INSTRUCTOR
client = INSTRUCTOR(model_id)
if importlib.util.find_spec("torch") is not None:
import torch
cuda_device_count = torch.cuda.device_count()
if device < -1 or (device >= cuda_device_count):
raise ValueError(
f"Got device=={device}, "
f"device is required to be within [-1, {cuda_device_count})"
)
if device < 0 and cuda_device_count > 0:
logger.warning(
"Device has %d GPUs available. "
"Provide device={deviceId} to `from_model_id` to use available"
"GPUs for execution. deviceId is -1 for CPU and "
"can be a positive integer associated with CUDA device id.",
cuda_device_count,
)
client = client.to(device)
return client
class SelfHostedHuggingFaceEmbeddings(SelfHostedEmbeddings):
"""HuggingFace embedding models on self-hosted remote hardware.
Supported hardware includes auto-launched instances on AWS, GCP, Azure,
and Lambda, as well as servers specified
by IP address and SSH credentials (such as on-prem, or another cloud
like Paperspace, Coreweave, etc.).
To use, you should have the ``runhouse`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import SelfHostedHuggingFaceEmbeddings
import runhouse as rh
model_name = "sentence-transformers/all-mpnet-base-v2"
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1")
hf = SelfHostedHuggingFaceEmbeddings(model_name=model_name, hardware=gpu)
"""
client: Any #: :meta private:
model_id: str = DEFAULT_MODEL_NAME
"""Model name to use."""
model_reqs: List[str] = ["./", "sentence_transformers", "torch"]
"""Requirements to install on hardware to inference the model."""
hardware: Any
"""Remote hardware to send the inference function to."""
model_load_fn: Callable = load_embedding_model
"""Function to load the model remotely on the server."""
load_fn_kwargs: Optional[dict] = None
"""Keyword arguments to pass to the model load function."""
inference_fn: Callable = _embed_documents
"""Inference function to extract the embeddings."""
def __init__(self, **kwargs: Any):
"""Initialize the remote inference function."""
load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
load_fn_kwargs["model_id"] = load_fn_kwargs.get("model_id", DEFAULT_MODEL_NAME)
load_fn_kwargs["instruct"] = load_fn_kwargs.get("instruct", False)
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
class SelfHostedHuggingFaceInstructEmbeddings(SelfHostedHuggingFaceEmbeddings):
"""HuggingFace InstructEmbedding models on self-hosted remote hardware.
Supported hardware includes auto-launched instances on AWS, GCP, Azure,
and Lambda, as well as servers specified
by IP address and SSH credentials (such as on-prem, or another
cloud like Paperspace, Coreweave, etc.).
To use, you should have the ``runhouse`` python package installed.
Example:
.. code-block:: python
from langchain_community.embeddings import SelfHostedHuggingFaceInstructEmbeddings
import runhouse as rh
model_name = "hkunlp/instructor-large"
gpu = rh.cluster(name='rh-a10x', instance_type='A100:1')
hf = SelfHostedHuggingFaceInstructEmbeddings(
model_name=model_name, hardware=gpu)
""" # noqa: E501
model_id: str = DEFAULT_INSTRUCT_MODEL
"""Model name to use."""
embed_instruction: str = DEFAULT_EMBED_INSTRUCTION
"""Instruction to use for embedding documents."""
query_instruction: str = DEFAULT_QUERY_INSTRUCTION
"""Instruction to use for embedding query."""
model_reqs: List[str] = ["./", "InstructorEmbedding", "torch"]
"""Requirements to install on hardware to inference the model."""
def __init__(self, **kwargs: Any):
"""Initialize the remote inference function."""
load_fn_kwargs = kwargs.pop("load_fn_kwargs", {})
load_fn_kwargs["model_id"] = load_fn_kwargs.get(
"model_id", DEFAULT_INSTRUCT_MODEL
)
load_fn_kwargs["instruct"] = load_fn_kwargs.get("instruct", True)
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace instruct model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
instruction_pairs = []
for text in texts:
instruction_pairs.append([self.embed_instruction, text])
embeddings = self.client(self.pipeline_ref, instruction_pairs)
return embeddings.tolist()
def embed_query(self, text: str) -> List[float]:
"""Compute query embeddings using a HuggingFace instruct model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
instruction_pair = [self.query_instruction, text]
embedding = self.client(self.pipeline_ref, [instruction_pair])[0]
return embedding.tolist()

View File

@@ -1,853 +0,0 @@
"""
**LLM** classes provide
access to the large language model (**LLM**) APIs and services.
**Class hierarchy:**
.. code-block::
BaseLanguageModel --> BaseLLM --> LLM --> <name> # Examples: AI21, HuggingFaceHub
**Main helpers:**
.. code-block::
LLMResult, PromptValue,
CallbackManagerForLLMRun, AsyncCallbackManagerForLLMRun,
CallbackManager, AsyncCallbackManager,
AIMessage, BaseMessage
""" # noqa: E501
from typing import Any, Callable, Dict, Type
from langchain_core.language_models.llms import BaseLLM
def _import_ai21() -> Any:
from langchain_community.llms.ai21 import AI21
return AI21
def _import_aleph_alpha() -> Any:
from langchain_community.llms.aleph_alpha import AlephAlpha
return AlephAlpha
def _import_amazon_api_gateway() -> Any:
from langchain_community.llms.amazon_api_gateway import AmazonAPIGateway
return AmazonAPIGateway
def _import_anthropic() -> Any:
from langchain_community.llms.anthropic import Anthropic
return Anthropic
def _import_anyscale() -> Any:
from langchain_community.llms.anyscale import Anyscale
return Anyscale
def _import_arcee() -> Any:
from langchain_community.llms.arcee import Arcee
return Arcee
def _import_aviary() -> Any:
from langchain_community.llms.aviary import Aviary
return Aviary
def _import_azureml_endpoint() -> Any:
from langchain_community.llms.azureml_endpoint import AzureMLOnlineEndpoint
return AzureMLOnlineEndpoint
def _import_baidu_qianfan_endpoint() -> Any:
from langchain_community.llms.baidu_qianfan_endpoint import QianfanLLMEndpoint
return QianfanLLMEndpoint
def _import_bananadev() -> Any:
from langchain_community.llms.bananadev import Banana
return Banana
def _import_baseten() -> Any:
from langchain_community.llms.baseten import Baseten
return Baseten
def _import_beam() -> Any:
from langchain_community.llms.beam import Beam
return Beam
def _import_bedrock() -> Any:
from langchain_community.llms.bedrock import Bedrock
return Bedrock
def _import_bittensor() -> Any:
from langchain_community.llms.bittensor import NIBittensorLLM
return NIBittensorLLM
def _import_cerebriumai() -> Any:
from langchain_community.llms.cerebriumai import CerebriumAI
return CerebriumAI
def _import_chatglm() -> Any:
from langchain_community.llms.chatglm import ChatGLM
return ChatGLM
def _import_clarifai() -> Any:
from langchain_community.llms.clarifai import Clarifai
return Clarifai
def _import_cohere() -> Any:
from langchain_community.llms.cohere import Cohere
return Cohere
def _import_ctransformers() -> Any:
from langchain_community.llms.ctransformers import CTransformers
return CTransformers
def _import_ctranslate2() -> Any:
from langchain_community.llms.ctranslate2 import CTranslate2
return CTranslate2
def _import_databricks() -> Any:
from langchain_community.llms.databricks import Databricks
return Databricks
def _import_databricks_chat() -> Any:
from langchain_community.chat_models.databricks import ChatDatabricks
return ChatDatabricks
def _import_deepinfra() -> Any:
from langchain_community.llms.deepinfra import DeepInfra
return DeepInfra
def _import_deepsparse() -> Any:
from langchain_community.llms.deepsparse import DeepSparse
return DeepSparse
def _import_edenai() -> Any:
from langchain_community.llms.edenai import EdenAI
return EdenAI
def _import_fake() -> Any:
from langchain_community.llms.fake import FakeListLLM
return FakeListLLM
def _import_fireworks() -> Any:
from langchain_community.llms.fireworks import Fireworks
return Fireworks
def _import_forefrontai() -> Any:
from langchain_community.llms.forefrontai import ForefrontAI
return ForefrontAI
def _import_gigachat() -> Any:
from langchain_community.llms.gigachat import GigaChat
return GigaChat
def _import_google_palm() -> Any:
from langchain_community.llms.google_palm import GooglePalm
return GooglePalm
def _import_gooseai() -> Any:
from langchain_community.llms.gooseai import GooseAI
return GooseAI
def _import_gpt4all() -> Any:
from langchain_community.llms.gpt4all import GPT4All
return GPT4All
def _import_gradient_ai() -> Any:
from langchain_community.llms.gradient_ai import GradientLLM
return GradientLLM
def _import_huggingface_endpoint() -> Any:
from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
return HuggingFaceEndpoint
def _import_huggingface_hub() -> Any:
from langchain_community.llms.huggingface_hub import HuggingFaceHub
return HuggingFaceHub
def _import_huggingface_pipeline() -> Any:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
return HuggingFacePipeline
def _import_huggingface_text_gen_inference() -> Any:
from langchain_community.llms.huggingface_text_gen_inference import (
HuggingFaceTextGenInference,
)
return HuggingFaceTextGenInference
def _import_human() -> Any:
from langchain_community.llms.human import HumanInputLLM
return HumanInputLLM
def _import_javelin_ai_gateway() -> Any:
from langchain_community.llms.javelin_ai_gateway import JavelinAIGateway
return JavelinAIGateway
def _import_koboldai() -> Any:
from langchain_community.llms.koboldai import KoboldApiLLM
return KoboldApiLLM
def _import_llamacpp() -> Any:
from langchain_community.llms.llamacpp import LlamaCpp
return LlamaCpp
def _import_manifest() -> Any:
from langchain_community.llms.manifest import ManifestWrapper
return ManifestWrapper
def _import_minimax() -> Any:
from langchain_community.llms.minimax import Minimax
return Minimax
def _import_mlflow() -> Any:
from langchain_community.llms.mlflow import Mlflow
return Mlflow
def _import_mlflow_chat() -> Any:
from langchain_community.chat_models.mlflow import ChatMlflow
return ChatMlflow
def _import_mlflow_ai_gateway() -> Any:
from langchain_community.llms.mlflow_ai_gateway import MlflowAIGateway
return MlflowAIGateway
def _import_modal() -> Any:
from langchain_community.llms.modal import Modal
return Modal
def _import_mosaicml() -> Any:
from langchain_community.llms.mosaicml import MosaicML
return MosaicML
def _import_nlpcloud() -> Any:
from langchain_community.llms.nlpcloud import NLPCloud
return NLPCloud
def _import_octoai_endpoint() -> Any:
from langchain_community.llms.octoai_endpoint import OctoAIEndpoint
return OctoAIEndpoint
def _import_ollama() -> Any:
from langchain_community.llms.ollama import Ollama
return Ollama
def _import_opaqueprompts() -> Any:
from langchain_community.llms.opaqueprompts import OpaquePrompts
return OpaquePrompts
def _import_openllm() -> Any:
from langchain_community.llms.openllm import OpenLLM
return OpenLLM
def _import_openlm() -> Any:
from langchain_community.llms.openlm import OpenLM
return OpenLM
def _import_pai_eas_endpoint() -> Any:
from langchain_community.llms.pai_eas_endpoint import PaiEasEndpoint
return PaiEasEndpoint
def _import_petals() -> Any:
from langchain_community.llms.petals import Petals
return Petals
def _import_pipelineai() -> Any:
from langchain_community.llms.pipelineai import PipelineAI
return PipelineAI
def _import_predibase() -> Any:
from langchain_community.llms.predibase import Predibase
return Predibase
def _import_predictionguard() -> Any:
from langchain_community.llms.predictionguard import PredictionGuard
return PredictionGuard
def _import_promptlayer() -> Any:
from langchain_community.llms.promptlayer_openai import PromptLayerOpenAI
return PromptLayerOpenAI
def _import_promptlayer_chat() -> Any:
from langchain_community.llms.promptlayer_openai import PromptLayerOpenAIChat
return PromptLayerOpenAIChat
def _import_replicate() -> Any:
from langchain_community.llms.replicate import Replicate
return Replicate
def _import_rwkv() -> Any:
from langchain_community.llms.rwkv import RWKV
return RWKV
def _import_sagemaker_endpoint() -> Any:
from langchain_community.llms.sagemaker_endpoint import SagemakerEndpoint
return SagemakerEndpoint
def _import_self_hosted() -> Any:
from langchain_community.llms.self_hosted import SelfHostedPipeline
return SelfHostedPipeline
def _import_self_hosted_hugging_face() -> Any:
from langchain_community.llms.self_hosted_hugging_face import (
SelfHostedHuggingFaceLLM,
)
return SelfHostedHuggingFaceLLM
def _import_stochasticai() -> Any:
from langchain_community.llms.stochasticai import StochasticAI
return StochasticAI
def _import_symblai_nebula() -> Any:
from langchain_community.llms.symblai_nebula import Nebula
return Nebula
def _import_textgen() -> Any:
from langchain_community.llms.textgen import TextGen
return TextGen
def _import_titan_takeoff() -> Any:
from langchain_community.llms.titan_takeoff import TitanTakeoff
return TitanTakeoff
def _import_titan_takeoff_pro() -> Any:
from langchain_community.llms.titan_takeoff_pro import TitanTakeoffPro
return TitanTakeoffPro
def _import_together() -> Any:
from langchain_community.llms.together import Together
return Together
def _import_tongyi() -> Any:
from langchain_community.llms.tongyi import Tongyi
return Tongyi
def _import_vertex() -> Any:
from langchain_community.llms.vertexai import VertexAI
return VertexAI
def _import_vertex_model_garden() -> Any:
from langchain_community.llms.vertexai import VertexAIModelGarden
return VertexAIModelGarden
def _import_vllm() -> Any:
from langchain_community.llms.vllm import VLLM
return VLLM
def _import_vllm_openai() -> Any:
from langchain_community.llms.vllm import VLLMOpenAI
return VLLMOpenAI
def _import_watsonxllm() -> Any:
from langchain_community.llms.watsonxllm import WatsonxLLM
return WatsonxLLM
def _import_writer() -> Any:
from langchain_community.llms.writer import Writer
return Writer
def _import_xinference() -> Any:
from langchain_community.llms.xinference import Xinference
return Xinference
def _import_yandex_gpt() -> Any:
from langchain_community.llms.yandex import YandexGPT
return YandexGPT
def _import_volcengine_maas() -> Any:
from langchain_community.llms.volcengine_maas import VolcEngineMaasLLM
return VolcEngineMaasLLM
def __getattr__(name: str) -> Any:
if name == "AI21":
return _import_ai21()
elif name == "AlephAlpha":
return _import_aleph_alpha()
elif name == "AmazonAPIGateway":
return _import_amazon_api_gateway()
elif name == "Anthropic":
return _import_anthropic()
elif name == "Anyscale":
return _import_anyscale()
elif name == "Arcee":
return _import_arcee()
elif name == "Aviary":
return _import_aviary()
elif name == "AzureMLOnlineEndpoint":
return _import_azureml_endpoint()
elif name == "QianfanLLMEndpoint":
return _import_baidu_qianfan_endpoint()
elif name == "Banana":
return _import_bananadev()
elif name == "Baseten":
return _import_baseten()
elif name == "Beam":
return _import_beam()
elif name == "Bedrock":
return _import_bedrock()
elif name == "NIBittensorLLM":
return _import_bittensor()
elif name == "CerebriumAI":
return _import_cerebriumai()
elif name == "ChatGLM":
return _import_chatglm()
elif name == "Clarifai":
return _import_clarifai()
elif name == "Cohere":
return _import_cohere()
elif name == "CTransformers":
return _import_ctransformers()
elif name == "CTranslate2":
return _import_ctranslate2()
elif name == "Databricks":
return _import_databricks()
elif name == "DeepInfra":
return _import_deepinfra()
elif name == "DeepSparse":
return _import_deepsparse()
elif name == "EdenAI":
return _import_edenai()
elif name == "FakeListLLM":
return _import_fake()
elif name == "Fireworks":
return _import_fireworks()
elif name == "ForefrontAI":
return _import_forefrontai()
elif name == "GigaChat":
return _import_gigachat()
elif name == "GooglePalm":
return _import_google_palm()
elif name == "GooseAI":
return _import_gooseai()
elif name == "GPT4All":
return _import_gpt4all()
elif name == "GradientLLM":
return _import_gradient_ai()
elif name == "HuggingFaceEndpoint":
return _import_huggingface_endpoint()
elif name == "HuggingFaceHub":
return _import_huggingface_hub()
elif name == "HuggingFacePipeline":
return _import_huggingface_pipeline()
elif name == "HuggingFaceTextGenInference":
return _import_huggingface_text_gen_inference()
elif name == "HumanInputLLM":
return _import_human()
elif name == "JavelinAIGateway":
return _import_javelin_ai_gateway()
elif name == "KoboldApiLLM":
return _import_koboldai()
elif name == "LlamaCpp":
return _import_llamacpp()
elif name == "ManifestWrapper":
return _import_manifest()
elif name == "Minimax":
return _import_minimax()
elif name == "Mlflow":
return _import_mlflow()
elif name == "MlflowAIGateway":
return _import_mlflow_ai_gateway()
elif name == "Modal":
return _import_modal()
elif name == "MosaicML":
return _import_mosaicml()
elif name == "NLPCloud":
return _import_nlpcloud()
elif name == "OctoAIEndpoint":
return _import_octoai_endpoint()
elif name == "Ollama":
return _import_ollama()
elif name == "OpaquePrompts":
return _import_opaqueprompts()
elif name == "OpenLLM":
return _import_openllm()
elif name == "OpenLM":
return _import_openlm()
elif name == "PaiEasEndpoint":
return _import_pai_eas_endpoint()
elif name == "Petals":
return _import_petals()
elif name == "PipelineAI":
return _import_pipelineai()
elif name == "Predibase":
return _import_predibase()
elif name == "PredictionGuard":
return _import_predictionguard()
elif name == "PromptLayerOpenAI":
return _import_promptlayer()
elif name == "PromptLayerOpenAIChat":
return _import_promptlayer_chat()
elif name == "Replicate":
return _import_replicate()
elif name == "RWKV":
return _import_rwkv()
elif name == "SagemakerEndpoint":
return _import_sagemaker_endpoint()
elif name == "SelfHostedPipeline":
return _import_self_hosted()
elif name == "SelfHostedHuggingFaceLLM":
return _import_self_hosted_hugging_face()
elif name == "StochasticAI":
return _import_stochasticai()
elif name == "Nebula":
return _import_symblai_nebula()
elif name == "TextGen":
return _import_textgen()
elif name == "TitanTakeoff":
return _import_titan_takeoff()
elif name == "TitanTakeoffPro":
return _import_titan_takeoff_pro()
elif name == "Together":
return _import_together()
elif name == "Tongyi":
return _import_tongyi()
elif name == "VertexAI":
return _import_vertex()
elif name == "VertexAIModelGarden":
return _import_vertex_model_garden()
elif name == "VLLM":
return _import_vllm()
elif name == "VLLMOpenAI":
return _import_vllm_openai()
elif name == "WatsonxLLM":
return _import_watsonxllm()
elif name == "Writer":
return _import_writer()
elif name == "Xinference":
return _import_xinference()
elif name == "YandexGPT":
return _import_yandex_gpt()
elif name == "VolcEngineMaasLLM":
return _import_volcengine_maas()
elif name == "type_to_cls_dict":
# for backwards compatibility
type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
k: v() for k, v in get_type_to_cls_dict().items()
}
return type_to_cls_dict
else:
raise AttributeError(f"Could not find: {name}")
__all__ = [
"AI21",
"AlephAlpha",
"AmazonAPIGateway",
"Anthropic",
"Anyscale",
"Arcee",
"Aviary",
"AzureMLOnlineEndpoint",
"Banana",
"Baseten",
"Beam",
"Bedrock",
"CTransformers",
"CTranslate2",
"CerebriumAI",
"ChatGLM",
"Clarifai",
"Cohere",
"Databricks",
"DeepInfra",
"DeepSparse",
"EdenAI",
"FakeListLLM",
"Fireworks",
"ForefrontAI",
"GigaChat",
"GPT4All",
"GooglePalm",
"GooseAI",
"GradientLLM",
"HuggingFaceEndpoint",
"HuggingFaceHub",
"HuggingFacePipeline",
"HuggingFaceTextGenInference",
"HumanInputLLM",
"KoboldApiLLM",
"LlamaCpp",
"TextGen",
"ManifestWrapper",
"Minimax",
"MlflowAIGateway",
"Modal",
"MosaicML",
"Nebula",
"NIBittensorLLM",
"NLPCloud",
"Ollama",
"OpenLLM",
"OpenLM",
"PaiEasEndpoint",
"Petals",
"PipelineAI",
"Predibase",
"PredictionGuard",
"PromptLayerOpenAI",
"PromptLayerOpenAIChat",
"OpaquePrompts",
"RWKV",
"Replicate",
"SagemakerEndpoint",
"SelfHostedHuggingFaceLLM",
"SelfHostedPipeline",
"StochasticAI",
"TitanTakeoff",
"TitanTakeoffPro",
"Tongyi",
"VertexAI",
"VertexAIModelGarden",
"VLLM",
"VLLMOpenAI",
"WatsonxLLM",
"Writer",
"OctoAIEndpoint",
"Xinference",
"JavelinAIGateway",
"QianfanLLMEndpoint",
"YandexGPT",
"VolcEngineMaasLLM",
]
def get_type_to_cls_dict() -> Dict[str, Callable[[], Type[BaseLLM]]]:
return {
"ai21": _import_ai21,
"aleph_alpha": _import_aleph_alpha,
"amazon_api_gateway": _import_amazon_api_gateway,
"amazon_bedrock": _import_bedrock,
"anthropic": _import_anthropic,
"anyscale": _import_anyscale,
"arcee": _import_arcee,
"aviary": _import_aviary,
"azureml_endpoint": _import_azureml_endpoint,
"bananadev": _import_bananadev,
"baseten": _import_baseten,
"beam": _import_beam,
"cerebriumai": _import_cerebriumai,
"chat_glm": _import_chatglm,
"clarifai": _import_clarifai,
"cohere": _import_cohere,
"ctransformers": _import_ctransformers,
"ctranslate2": _import_ctranslate2,
"databricks": _import_databricks,
"databricks-chat": _import_databricks_chat,
"deepinfra": _import_deepinfra,
"deepsparse": _import_deepsparse,
"edenai": _import_edenai,
"fake-list": _import_fake,
"forefrontai": _import_forefrontai,
"giga-chat-model": _import_gigachat,
"google_palm": _import_google_palm,
"gooseai": _import_gooseai,
"gradient": _import_gradient_ai,
"gpt4all": _import_gpt4all,
"huggingface_endpoint": _import_huggingface_endpoint,
"huggingface_hub": _import_huggingface_hub,
"huggingface_pipeline": _import_huggingface_pipeline,
"huggingface_textgen_inference": _import_huggingface_text_gen_inference,
"human-input": _import_human,
"koboldai": _import_koboldai,
"llamacpp": _import_llamacpp,
"textgen": _import_textgen,
"minimax": _import_minimax,
"mlflow": _import_mlflow,
"mlflow-chat": _import_mlflow_chat,
"mlflow-ai-gateway": _import_mlflow_ai_gateway,
"modal": _import_modal,
"mosaic": _import_mosaicml,
"nebula": _import_symblai_nebula,
"nibittensor": _import_bittensor,
"nlpcloud": _import_nlpcloud,
"ollama": _import_ollama,
"openlm": _import_openlm,
"pai_eas_endpoint": _import_pai_eas_endpoint,
"petals": _import_petals,
"pipelineai": _import_pipelineai,
"predibase": _import_predibase,
"opaqueprompts": _import_opaqueprompts,
"replicate": _import_replicate,
"rwkv": _import_rwkv,
"sagemaker_endpoint": _import_sagemaker_endpoint,
"self_hosted": _import_self_hosted,
"self_hosted_hugging_face": _import_self_hosted_hugging_face,
"stochasticai": _import_stochasticai,
"together": _import_together,
"tongyi": _import_tongyi,
"titan_takeoff": _import_titan_takeoff,
"titan_takeoff_pro": _import_titan_takeoff_pro,
"vertexai": _import_vertex,
"vertexai_model_garden": _import_vertex_model_garden,
"openllm": _import_openllm,
"openllm_client": _import_openllm,
"vllm": _import_vllm,
"vllm_openai": _import_vllm_openai,
"watsonxllm": _import_watsonxllm,
"writer": _import_writer,
"xinference": _import_xinference,
"javelin-ai-gateway": _import_javelin_ai_gateway,
"qianfan_endpoint": _import_baidu_qianfan_endpoint,
"yandex_gpt": _import_yandex_gpt,
"VolcEngineMaasLLM": _import_volcengine_maas,
}

View File

@@ -1,351 +0,0 @@
import re
import warnings
from typing import (
Any,
AsyncIterator,
Callable,
Dict,
Iterator,
List,
Mapping,
Optional,
)
from langchain_core.callbacks import (
AsyncCallbackManagerForLLMRun,
CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseLanguageModel
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
from langchain_core.prompt_values import PromptValue
from langchain_core.pydantic_v1 import Field, SecretStr, root_validator
from langchain_core.utils import (
check_package_version,
get_from_dict_or_env,
get_pydantic_field_names,
)
from langchain_core.utils.utils import build_extra_kwargs, convert_to_secret_str
class _AnthropicCommon(BaseLanguageModel):
client: Any = None #: :meta private:
async_client: Any = None #: :meta private:
model: str = Field(default="claude-2", alias="model_name")
"""Model name to use."""
max_tokens_to_sample: int = Field(default=256, alias="max_tokens")
"""Denotes the number of tokens to predict per generation."""
temperature: Optional[float] = None
"""A non-negative float that tunes the degree of randomness in generation."""
top_k: Optional[int] = None
"""Number of most likely tokens to consider at each step."""
top_p: Optional[float] = None
"""Total probability mass of tokens to consider at each step."""
streaming: bool = False
"""Whether to stream the results."""
default_request_timeout: Optional[float] = None
"""Timeout for requests to Anthropic Completion API. Default is 600 seconds."""
anthropic_api_url: Optional[str] = None
anthropic_api_key: Optional[SecretStr] = None
HUMAN_PROMPT: Optional[str] = None
AI_PROMPT: Optional[str] = None
count_tokens: Optional[Callable[[str], int]] = None
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
@root_validator(pre=True)
def build_extra(cls, values: Dict) -> Dict:
extra = values.get("model_kwargs", {})
all_required_field_names = get_pydantic_field_names(cls)
values["model_kwargs"] = build_extra_kwargs(
extra, values, all_required_field_names
)
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
values["anthropic_api_key"] = convert_to_secret_str(
get_from_dict_or_env(values, "anthropic_api_key", "ANTHROPIC_API_KEY")
)
# Get custom api url from environment.
values["anthropic_api_url"] = get_from_dict_or_env(
values,
"anthropic_api_url",
"ANTHROPIC_API_URL",
default="https://api.anthropic.com",
)
try:
import anthropic
check_package_version("anthropic", gte_version="0.3")
values["client"] = anthropic.Anthropic(
base_url=values["anthropic_api_url"],
api_key=values["anthropic_api_key"].get_secret_value(),
timeout=values["default_request_timeout"],
)
values["async_client"] = anthropic.AsyncAnthropic(
base_url=values["anthropic_api_url"],
api_key=values["anthropic_api_key"].get_secret_value(),
timeout=values["default_request_timeout"],
)
values["HUMAN_PROMPT"] = anthropic.HUMAN_PROMPT
values["AI_PROMPT"] = anthropic.AI_PROMPT
values["count_tokens"] = values["client"].count_tokens
except ImportError:
raise ImportError(
"Could not import anthropic python package. "
"Please it install it with `pip install anthropic`."
)
return values
@property
def _default_params(self) -> Mapping[str, Any]:
"""Get the default parameters for calling Anthropic API."""
d = {
"max_tokens_to_sample": self.max_tokens_to_sample,
"model": self.model,
}
if self.temperature is not None:
d["temperature"] = self.temperature
if self.top_k is not None:
d["top_k"] = self.top_k
if self.top_p is not None:
d["top_p"] = self.top_p
return {**d, **self.model_kwargs}
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {**{}, **self._default_params}
def _get_anthropic_stop(self, stop: Optional[List[str]] = None) -> List[str]:
if not self.HUMAN_PROMPT or not self.AI_PROMPT:
raise NameError("Please ensure the anthropic package is loaded")
if stop is None:
stop = []
# Never want model to invent new turns of Human / Assistant dialog.
stop.extend([self.HUMAN_PROMPT])
return stop
class Anthropic(LLM, _AnthropicCommon):
"""Anthropic large language models.
To use, you should have the ``anthropic`` python package installed, and the
environment variable ``ANTHROPIC_API_KEY`` set with your API key, or pass
it as a named parameter to the constructor.
Example:
.. code-block:: python
import anthropic
from langchain_community.llms import Anthropic
model = Anthropic(model="<model_name>", anthropic_api_key="my-api-key")
# Simplest invocation, automatically wrapped with HUMAN_PROMPT
# and AI_PROMPT.
response = model("What are the biggest risks facing humanity?")
# Or if you want to use the chat mode, build a few-shot-prompt, or
# put words in the Assistant's mouth, use HUMAN_PROMPT and AI_PROMPT:
raw_prompt = "What are the biggest risks facing humanity?"
prompt = f"{anthropic.HUMAN_PROMPT} {prompt}{anthropic.AI_PROMPT}"
response = model(prompt)
"""
class Config:
"""Configuration for this pydantic object."""
allow_population_by_field_name = True
arbitrary_types_allowed = True
@root_validator()
def raise_warning(cls, values: Dict) -> Dict:
"""Raise warning that this class is deprecated."""
warnings.warn(
"This Anthropic LLM is deprecated. "
"Please use `from langchain_community.chat_models import ChatAnthropic` "
"instead"
)
return values
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "anthropic-llm"
def _wrap_prompt(self, prompt: str) -> str:
if not self.HUMAN_PROMPT or not self.AI_PROMPT:
raise NameError("Please ensure the anthropic package is loaded")
if prompt.startswith(self.HUMAN_PROMPT):
return prompt # Already wrapped.
# Guard against common errors in specifying wrong number of newlines.
corrected_prompt, n_subs = re.subn(r"^\n*Human:", self.HUMAN_PROMPT, prompt)
if n_subs == 1:
return corrected_prompt
# As a last resort, wrap the prompt ourselves to emulate instruct-style.
return f"{self.HUMAN_PROMPT} {prompt}{self.AI_PROMPT} Sure, here you go:\n"
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
r"""Call out to Anthropic's completion endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
prompt = "What are the biggest risks facing humanity?"
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
response = model(prompt)
"""
if self.streaming:
completion = ""
for chunk in self._stream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
completion += chunk.text
return completion
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
response = self.client.completions.create(
prompt=self._wrap_prompt(prompt),
stop_sequences=stop,
**params,
)
return response.completion
def convert_prompt(self, prompt: PromptValue) -> str:
return self._wrap_prompt(prompt.to_string())
async def _acall(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Call out to Anthropic's completion endpoint asynchronously."""
if self.streaming:
completion = ""
async for chunk in self._astream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
completion += chunk.text
return completion
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
response = await self.async_client.completions.create(
prompt=self._wrap_prompt(prompt),
stop_sequences=stop,
**params,
)
return response.completion
def _stream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
r"""Call Anthropic completion_stream and return the resulting generator.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
A generator representing the stream of tokens from Anthropic.
Example:
.. code-block:: python
prompt = "Write a poem about a stream."
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
generator = anthropic.stream(prompt)
for token in generator:
yield token
"""
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
for token in self.client.completions.create(
prompt=self._wrap_prompt(prompt), stop_sequences=stop, stream=True, **params
):
chunk = GenerationChunk(text=token.completion)
yield chunk
if run_manager:
run_manager.on_llm_new_token(chunk.text, chunk=chunk)
async def _astream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> AsyncIterator[GenerationChunk]:
r"""Call Anthropic completion_stream and return the resulting generator.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
A generator representing the stream of tokens from Anthropic.
Example:
.. code-block:: python
prompt = "Write a poem about a stream."
prompt = f"\n\nHuman: {prompt}\n\nAssistant:"
generator = anthropic.stream(prompt)
for token in generator:
yield token
"""
stop = self._get_anthropic_stop(stop)
params = {**self._default_params, **kwargs}
async for token in await self.async_client.completions.create(
prompt=self._wrap_prompt(prompt),
stop_sequences=stop,
stream=True,
**params,
):
chunk = GenerationChunk(text=token.completion)
yield chunk
if run_manager:
await run_manager.on_llm_new_token(chunk.text, chunk=chunk)
def get_num_tokens(self, text: str) -> int:
"""Calculate number of tokens."""
if not self.count_tokens:
raise NameError("Please ensure the anthropic package is loaded")
return self.count_tokens(text)

View File

@@ -1,126 +0,0 @@
import json
import logging
from typing import Any, Dict, Iterator, List, Optional
import requests
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
logger = logging.getLogger(__name__)
class CloudflareWorkersAI(LLM):
"""Langchain LLM class to help to access Cloudflare Workers AI service.
To use, you must provide an API token and
account ID to access Cloudflare Workers AI, and
pass it as a named parameter to the constructor.
Example:
.. code-block:: python
from langchain_community.llms.cloudflare_workersai import CloudflareWorkersAI
my_account_id = "my_account_id"
my_api_token = "my_secret_api_token"
llm_model = "@cf/meta/llama-2-7b-chat-int8"
cf_ai = CloudflareWorkersAI(
account_id=my_account_id,
api_token=my_api_token,
model=llm_model
)
""" # noqa: E501
account_id: str
api_token: str
model: str = "@cf/meta/llama-2-7b-chat-int8"
base_url: str = "https://api.cloudflare.com/client/v4/accounts"
streaming: bool = False
endpoint_url: str = ""
def __init__(self, **kwargs: Any) -> None:
"""Initialize the Cloudflare Workers AI class."""
super().__init__(**kwargs)
self.endpoint_url = f"{self.base_url}/{self.account_id}/ai/run/{self.model}"
@property
def _llm_type(self) -> str:
"""Return type of LLM."""
return "cloudflare"
@property
def _default_params(self) -> Dict[str, Any]:
"""Default parameters"""
return {}
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Identifying parameters"""
return {
"account_id": self.account_id,
"api_token": self.api_token,
"model": self.model,
"base_url": self.base_url,
}
def _call_api(self, prompt: str, params: Dict[str, Any]) -> requests.Response:
"""Call Cloudflare Workers API"""
headers = {"Authorization": f"Bearer {self.api_token}"}
data = {"prompt": prompt, "stream": self.streaming, **params}
response = requests.post(self.endpoint_url, headers=headers, json=data)
return response
def _process_response(self, response: requests.Response) -> str:
"""Process API response"""
if response.ok:
data = response.json()
return data["result"]["response"]
else:
raise ValueError(f"Request failed with status {response.status_code}")
def _stream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
"""Streaming prediction"""
original_steaming: bool = self.streaming
self.streaming = True
_response_prefix_count = len("data: ")
_response_stream_end = b"data: [DONE]"
for chunk in self._call_api(prompt, kwargs).iter_lines():
if chunk == _response_stream_end:
break
if len(chunk) > _response_prefix_count:
try:
data = json.loads(chunk[_response_prefix_count:])
except Exception as e:
logger.debug(chunk)
raise e
if data is not None and "response" in data:
yield GenerationChunk(text=data["response"])
if run_manager:
run_manager.on_llm_new_token(data["response"])
logger.debug("stream end")
self.streaming = original_steaming
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Regular prediction"""
if self.streaming:
return "".join(
[c.text for c in self._stream(prompt, stop, run_manager, **kwargs)]
)
else:
response = self._call_api(prompt, kwargs)
return self._process_response(response)

View File

@@ -1,50 +0,0 @@
from typing import Optional, Type
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai.chat_models import ChatOpenAI
from langchain_community.tools.amadeus.base import AmadeusBaseTool
class ClosestAirportSchema(BaseModel):
"""Schema for the AmadeusClosestAirport tool."""
location: str = Field(
description=(
" The location for which you would like to find the nearest airport "
" along with optional details such as country, state, region, or "
" province, allowing for easy processing and identification of "
" the closest airport. Examples of the format are the following:\n"
" Cali, Colombia\n "
" Lincoln, Nebraska, United States\n"
" New York, United States\n"
" Sydney, New South Wales, Australia\n"
" Rome, Lazio, Italy\n"
" Toronto, Ontario, Canada\n"
)
)
class AmadeusClosestAirport(AmadeusBaseTool):
"""Tool for finding the closest airport to a particular location."""
name: str = "closest_airport"
description: str = (
"Use this tool to find the closest airport to a particular location."
)
args_schema: Type[ClosestAirportSchema] = ClosestAirportSchema
def _run(
self,
location: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
content = (
f" What is the nearest airport to {location}? Please respond with the "
" airport's International Air Transport Association (IATA) Location "
' Identifier in the following JSON format. JSON: "iataCode": "IATA '
' Location Identifier" '
)
return ChatOpenAI(temperature=0).invoke(content)

View File

@@ -1,42 +0,0 @@
"""
This tool allows agents to interact with the clickup library
and operate on a Clickup instance.
To use this tool, you must first set as environment variables:
client_secret
client_id
code
Below is a sample script that uses the Clickup tool:
```python
from langchain_community.agent_toolkits.clickup.toolkit import ClickupToolkit
from langchain_community.utilities.clickup import ClickupAPIWrapper
clickup = ClickupAPIWrapper()
toolkit = ClickupToolkit.from_clickup_api_wrapper(clickup)
```
"""
from typing import Optional
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.pydantic_v1 import Field
from langchain_core.tools import BaseTool
from langchain_community.utilities.clickup import ClickupAPIWrapper
class ClickupAction(BaseTool):
"""Tool that queries the Clickup API."""
api_wrapper: ClickupAPIWrapper = Field(default_factory=ClickupAPIWrapper)
mode: str
name: str = ""
description: str = ""
def _run(
self,
instructions: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the Clickup API to run an operation."""
return self.api_wrapper.run(self.mode, instructions)

View File

@@ -1,44 +0,0 @@
"""
This tool allows agents to interact with the atlassian-python-api library
and operate on a Jira instance. For more information on the
atlassian-python-api library, see https://atlassian-python-api.readthedocs.io/jira.html
To use this tool, you must first set as environment variables:
JIRA_API_TOKEN
JIRA_USERNAME
JIRA_INSTANCE_URL
Below is a sample script that uses the Jira tool:
```python
from langchain_community.agent_toolkits.jira.toolkit import JiraToolkit
from langchain_community.utilities.jira import JiraAPIWrapper
jira = JiraAPIWrapper()
toolkit = JiraToolkit.from_jira_api_wrapper(jira)
```
"""
from typing import Optional
from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.pydantic_v1 import Field
from langchain_core.tools import BaseTool
from langchain_community.utilities.jira import JiraAPIWrapper
class JiraAction(BaseTool):
"""Tool that queries the Atlassian Jira API."""
api_wrapper: JiraAPIWrapper = Field(default_factory=JiraAPIWrapper)
mode: str
name: str = ""
description: str = ""
def _run(
self,
instructions: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the Atlassian Jira API to run an operation."""
return self.api_wrapper.run(self.mode, instructions)

View File

@@ -1,276 +0,0 @@
"""Tools for interacting with a Power BI dataset."""
import logging
from time import perf_counter
from typing import Any, Dict, Optional, Tuple
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.pydantic_v1 import Field, validator
from langchain_core.tools import BaseTool
from langchain_openai.chat_models import _import_tiktoken
from langchain_community.tools.powerbi.prompt import (
BAD_REQUEST_RESPONSE,
DEFAULT_FEWSHOT_EXAMPLES,
RETRY_RESPONSE,
)
from langchain_community.utilities.powerbi import PowerBIDataset, json_to_md
logger = logging.getLogger(__name__)
class QueryPowerBITool(BaseTool):
"""Tool for querying a Power BI Dataset."""
name: str = "query_powerbi"
description: str = """
Input to this tool is a detailed question about the dataset, output is a result from the dataset. It will try to answer the question using the dataset, and if it cannot, it will ask for clarification.
Example Input: "How many rows are in table1?"
""" # noqa: E501
llm_chain: Any
powerbi: PowerBIDataset = Field(exclude=True)
examples: Optional[str] = DEFAULT_FEWSHOT_EXAMPLES
session_cache: Dict[str, Any] = Field(default_factory=dict, exclude=True)
max_iterations: int = 5
output_token_limit: int = 4000
tiktoken_model_name: Optional[str] = None # "cl100k_base"
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
@validator("llm_chain")
def validate_llm_chain_input_variables( # pylint: disable=E0213
cls, llm_chain: Any
) -> Any:
"""Make sure the LLM chain has the correct input variables."""
for var in llm_chain.prompt.input_variables:
if var not in ["tool_input", "tables", "schemas", "examples"]:
raise ValueError(
"LLM chain for QueryPowerBITool must have input variables ['tool_input', 'tables', 'schemas', 'examples'], found %s", # noqa: C0301 E501 # pylint: disable=C0301
llm_chain.prompt.input_variables,
)
return llm_chain
def _check_cache(self, tool_input: str) -> Optional[str]:
"""Check if the input is present in the cache.
If the value is a bad request, overwrite with the escalated version,
if not present return None."""
if tool_input not in self.session_cache:
return None
return self.session_cache[tool_input]
def _run(
self,
tool_input: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
**kwargs: Any,
) -> str:
"""Execute the query, return the results or an error message."""
if cache := self._check_cache(tool_input):
logger.debug("Found cached result for %s: %s", tool_input, cache)
return cache
try:
logger.info("Running PBI Query Tool with input: %s", tool_input)
query = self.llm_chain.predict(
tool_input=tool_input,
tables=self.powerbi.get_table_names(),
schemas=self.powerbi.get_schemas(),
examples=self.examples,
callbacks=run_manager.get_child() if run_manager else None,
)
except Exception as exc: # pylint: disable=broad-except
self.session_cache[tool_input] = f"Error on call to LLM: {exc}"
return self.session_cache[tool_input]
if query == "I cannot answer this":
self.session_cache[tool_input] = query
return self.session_cache[tool_input]
logger.info("PBI Query:\n%s", query)
start_time = perf_counter()
pbi_result = self.powerbi.run(command=query)
end_time = perf_counter()
logger.debug("PBI Result: %s", pbi_result)
logger.debug(f"PBI Query duration: {end_time - start_time:0.6f}")
result, error = self._parse_output(pbi_result)
if error is not None and "TokenExpired" in error:
self.session_cache[
tool_input
] = "Authentication token expired or invalid, please try reauthenticate."
return self.session_cache[tool_input]
iterations = kwargs.get("iterations", 0)
if error and iterations < self.max_iterations:
return self._run(
tool_input=RETRY_RESPONSE.format(
tool_input=tool_input, query=query, error=error
),
run_manager=run_manager,
iterations=iterations + 1,
)
self.session_cache[tool_input] = (
result if result else BAD_REQUEST_RESPONSE.format(error=error)
)
return self.session_cache[tool_input]
async def _arun(
self,
tool_input: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
**kwargs: Any,
) -> str:
"""Execute the query, return the results or an error message."""
if cache := self._check_cache(tool_input):
logger.debug("Found cached result for %s: %s", tool_input, cache)
return f"{cache}, from cache, you have already asked this question."
try:
logger.info("Running PBI Query Tool with input: %s", tool_input)
query = await self.llm_chain.apredict(
tool_input=tool_input,
tables=self.powerbi.get_table_names(),
schemas=self.powerbi.get_schemas(),
examples=self.examples,
callbacks=run_manager.get_child() if run_manager else None,
)
except Exception as exc: # pylint: disable=broad-except
self.session_cache[tool_input] = f"Error on call to LLM: {exc}"
return self.session_cache[tool_input]
if query == "I cannot answer this":
self.session_cache[tool_input] = query
return self.session_cache[tool_input]
logger.info("PBI Query: %s", query)
start_time = perf_counter()
pbi_result = await self.powerbi.arun(command=query)
end_time = perf_counter()
logger.debug("PBI Result: %s", pbi_result)
logger.debug(f"PBI Query duration: {end_time - start_time:0.6f}")
result, error = self._parse_output(pbi_result)
if error is not None and ("TokenExpired" in error or "TokenError" in error):
self.session_cache[
tool_input
] = "Authentication token expired or invalid, please try to reauthenticate or check the scope of the credential." # noqa: E501
return self.session_cache[tool_input]
iterations = kwargs.get("iterations", 0)
if error and iterations < self.max_iterations:
return await self._arun(
tool_input=RETRY_RESPONSE.format(
tool_input=tool_input, query=query, error=error
),
run_manager=run_manager,
iterations=iterations + 1,
)
self.session_cache[tool_input] = (
result if result else BAD_REQUEST_RESPONSE.format(error=error)
)
return self.session_cache[tool_input]
def _parse_output(
self, pbi_result: Dict[str, Any]
) -> Tuple[Optional[str], Optional[Any]]:
"""Parse the output of the query to a markdown table."""
if "results" in pbi_result:
rows = pbi_result["results"][0]["tables"][0]["rows"]
if len(rows) == 0:
logger.info("0 records in result, query was valid.")
return (
None,
"0 rows returned, this might be correct, but please validate if all filter values were correct?", # noqa: E501
)
result = json_to_md(rows)
too_long, length = self._result_too_large(result)
if too_long:
return (
f"Result too large, please try to be more specific or use the `TOPN` function. The result is {length} tokens long, the limit is {self.output_token_limit} tokens.", # noqa: E501
None,
)
return result, None
if "error" in pbi_result:
if (
"pbi.error" in pbi_result["error"]
and "details" in pbi_result["error"]["pbi.error"]
):
return None, pbi_result["error"]["pbi.error"]["details"][0]["detail"]
return None, pbi_result["error"]
return None, pbi_result
def _result_too_large(self, result: str) -> Tuple[bool, int]:
"""Tokenize the output of the query."""
if self.tiktoken_model_name:
tiktoken_ = _import_tiktoken()
encoding = tiktoken_.encoding_for_model(self.tiktoken_model_name)
length = len(encoding.encode(result))
logger.info("Result length: %s", length)
return length > self.output_token_limit, length
return False, 0
class InfoPowerBITool(BaseTool):
"""Tool for getting metadata about a PowerBI Dataset."""
name: str = "schema_powerbi"
description: str = """
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
Be sure that the tables actually exist by calling list_tables_powerbi first!
Example Input: "table1, table2, table3"
""" # noqa: E501
powerbi: PowerBIDataset = Field(exclude=True)
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def _run(
self,
tool_input: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for tables in a comma-separated list."""
return self.powerbi.get_table_info(tool_input.split(", "))
async def _arun(
self,
tool_input: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return await self.powerbi.aget_table_info(tool_input.split(", "))
class ListPowerBITool(BaseTool):
"""Tool for getting tables names."""
name: str = "list_tables_powerbi"
description: str = "Input is an empty string, output is a comma separated list of tables in the database." # noqa: E501 # pylint: disable=C0301
powerbi: PowerBIDataset = Field(exclude=True)
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
def _run(
self,
tool_input: Optional[str] = None,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the names of the tables."""
return ", ".join(self.powerbi.get_table_names())
async def _arun(
self,
tool_input: Optional[str] = None,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Get the names of the tables."""
return ", ".join(self.powerbi.get_table_names())

View File

@@ -1,130 +0,0 @@
# flake8: noqa
"""Tools for interacting with Spark SQL."""
from typing import Any, Dict, Optional
from langchain_core.pydantic_v1 import BaseModel, Field, root_validator
from langchain_core.language_models import BaseLanguageModel
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.prompts import PromptTemplate
from langchain_community.utilities.spark_sql import SparkSQL
from langchain_core.tools import BaseTool
from langchain_community.tools.spark_sql.prompt import QUERY_CHECKER
class BaseSparkSQLTool(BaseModel):
"""Base tool for interacting with Spark SQL."""
db: SparkSQL = Field(exclude=True)
class Config(BaseTool.Config):
pass
class QuerySparkSQLTool(BaseSparkSQLTool, BaseTool):
"""Tool for querying a Spark SQL."""
name: str = "query_sql_db"
description: str = """
Input to this tool is a detailed and correct SQL query, output is a result from the Spark SQL.
If the query is not correct, an error message will be returned.
If an error is returned, rewrite the query, check the query, and try again.
"""
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Execute the query, return the results or an error message."""
return self.db.run_no_throw(query)
class InfoSparkSQLTool(BaseSparkSQLTool, BaseTool):
"""Tool for getting metadata about a Spark SQL."""
name: str = "schema_sql_db"
description: str = """
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
Be sure that the tables actually exist by calling list_tables_sql_db first!
Example Input: "table1, table2, table3"
"""
def _run(
self,
table_names: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for tables in a comma-separated list."""
return self.db.get_table_info_no_throw(table_names.split(", "))
class ListSparkSQLTool(BaseSparkSQLTool, BaseTool):
"""Tool for getting tables names."""
name: str = "list_tables_sql_db"
description: str = "Input is an empty string, output is a comma separated list of tables in the Spark SQL."
def _run(
self,
tool_input: str = "",
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for a specific table."""
return ", ".join(self.db.get_usable_table_names())
class QueryCheckerTool(BaseSparkSQLTool, BaseTool):
"""Use an LLM to check if a query is correct.
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
template: str = QUERY_CHECKER
llm: BaseLanguageModel
llm_chain: Any = Field(init=False)
name: str = "query_checker_sql_db"
description: str = """
Use this tool to double check if your query is correct before executing it.
Always use this tool before executing a query with query_sql_db!
"""
@root_validator(pre=True)
def initialize_llm_chain(cls, values: Dict[str, Any]) -> Dict[str, Any]:
if "llm_chain" not in values:
from langchain.chains.llm import LLMChain
values["llm_chain"] = LLMChain(
llm=values.get("llm"),
prompt=PromptTemplate(
template=QUERY_CHECKER, input_variables=["query"]
),
)
if values["llm_chain"].prompt.input_variables != ["query"]:
raise ValueError(
"LLM chain for QueryCheckerTool need to use ['query'] as input_variables "
"for the embedded prompt"
)
return values
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the LLM to check the query."""
return self.llm_chain.predict(
query=query, callbacks=run_manager.get_child() if run_manager else None
)
async def _arun(
self,
query: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return await self.llm_chain.apredict(
query=query, callbacks=run_manager.get_child() if run_manager else None
)

View File

@@ -1,134 +0,0 @@
# flake8: noqa
"""Tools for interacting with a SQL database."""
from typing import Any, Dict, Optional
from langchain_core.pydantic_v1 import BaseModel, Extra, Field, root_validator
from langchain_core.language_models import BaseLanguageModel
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.prompts import PromptTemplate
from langchain_community.utilities.sql_database import SQLDatabase
from langchain_core.tools import BaseTool
from langchain_community.tools.sql_database.prompt import QUERY_CHECKER
class BaseSQLDatabaseTool(BaseModel):
"""Base tool for interacting with a SQL database."""
db: SQLDatabase = Field(exclude=True)
class Config(BaseTool.Config):
pass
class QuerySQLDataBaseTool(BaseSQLDatabaseTool, BaseTool):
"""Tool for querying a SQL database."""
name: str = "sql_db_query"
description: str = """
Input to this tool is a detailed and correct SQL query, output is a result from the database.
If the query is not correct, an error message will be returned.
If an error is returned, rewrite the query, check the query, and try again.
"""
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Execute the query, return the results or an error message."""
return self.db.run_no_throw(query)
class InfoSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
"""Tool for getting metadata about a SQL database."""
name: str = "sql_db_schema"
description: str = """
Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.
Example Input: "table1, table2, table3"
"""
def _run(
self,
table_names: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for tables in a comma-separated list."""
return self.db.get_table_info_no_throw(
[t.strip() for t in table_names.split(",")]
)
class ListSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
"""Tool for getting tables names."""
name: str = "sql_db_list_tables"
description: str = "Input is an empty string, output is a comma separated list of tables in the database."
def _run(
self,
tool_input: str = "",
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Get the schema for a specific table."""
return ", ".join(self.db.get_usable_table_names())
class QuerySQLCheckerTool(BaseSQLDatabaseTool, BaseTool):
"""Use an LLM to check if a query is correct.
Adapted from https://www.patterns.app/blog/2023/01/18/crunchbot-sql-analyst-gpt/"""
template: str = QUERY_CHECKER
llm: BaseLanguageModel
llm_chain: Any = Field(init=False)
name: str = "sql_db_query_checker"
description: str = """
Use this tool to double check if your query is correct before executing it.
Always use this tool before executing a query with sql_db_query!
"""
@root_validator(pre=True)
def initialize_llm_chain(cls, values: Dict[str, Any]) -> Dict[str, Any]:
if "llm_chain" not in values:
from langchain.chains.llm import LLMChain
values["llm_chain"] = LLMChain(
llm=values.get("llm"),
prompt=PromptTemplate(
template=QUERY_CHECKER, input_variables=["dialect", "query"]
),
)
if values["llm_chain"].prompt.input_variables != ["dialect", "query"]:
raise ValueError(
"LLM chain for QueryCheckerTool must have input variables ['query', 'dialect']"
)
return values
def _run(
self,
query: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the LLM to check the query."""
return self.llm_chain.predict(
query=query,
dialect=self.db.dialect,
callbacks=run_manager.get_child() if run_manager else None,
)
async def _arun(
self,
query: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return await self.llm_chain.apredict(
query=query,
dialect=self.db.dialect,
callbacks=run_manager.get_child() if run_manager else None,
)

View File

@@ -1,215 +0,0 @@
"""[DEPRECATED]
## Zapier Natural Language Actions API
\
Full docs here: https://nla.zapier.com/start/
**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions
on Zapier's platform through a natural language API interface.
NLA supports apps like Gmail, Salesforce, Trello, Slack, Asana, HubSpot, Google Sheets,
Microsoft Teams, and thousands more apps: https://zapier.com/apps
Zapier NLA handles ALL the underlying API auth and translation from
natural language --> underlying API call --> return simplified output for LLMs
The key idea is you, or your users, expose a set of actions via an oauth-like setup
window, which you can then query and execute via a REST API.
NLA offers both API Key and OAuth for signing NLA API requests.
1. Server-side (API Key): for quickly getting started, testing, and production scenarios
where LangChain will only use actions exposed in the developer's Zapier account
(and will use the developer's connected accounts on Zapier.com)
2. User-facing (Oauth): for production scenarios where you are deploying an end-user
facing application and LangChain needs access to end-user's exposed actions and
connected accounts on Zapier.com
This quick start will focus on the server-side use case for brevity.
Review [full docs](https://nla.zapier.com/start/) for user-facing oauth developer
support.
Typically, you'd use SequentialChain, here's a basic example:
1. Use NLA to find an email in Gmail
2. Use LLMChain to generate a draft reply to (1)
3. Use NLA to send the draft reply (2) to someone in Slack via direct message
In code, below:
```python
import os
# get from https://platform.openai.com/
os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")
# get from https://nla.zapier.com/docs/authentication/
os.environ["ZAPIER_NLA_API_KEY"] = os.environ.get("ZAPIER_NLA_API_KEY", "")
from langchain_community.agent_toolkits import ZapierToolkit
from langchain_community.utilities.zapier import ZapierNLAWrapper
## step 0. expose gmail 'find email' and slack 'send channel message' actions
# first go here, log in, expose (enable) the two actions:
# https://nla.zapier.com/demo/start
# -- for this example, can leave all fields "Have AI guess"
# in an oauth scenario, you'd get your own <provider> id (instead of 'demo')
# which you route your users through first
zapier = ZapierNLAWrapper()
## To leverage OAuth you may pass the value `nla_oauth_access_token` to
## the ZapierNLAWrapper. If you do this there is no need to initialize
## the ZAPIER_NLA_API_KEY env variable
# zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token="TOKEN_HERE")
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
```
"""
from typing import Any, Dict, Optional
from langchain_core._api import warn_deprecated
from langchain_core.callbacks import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain_core.pydantic_v1 import Field, root_validator
from langchain_core.tools import BaseTool
from langchain_community.tools.zapier.prompt import BASE_ZAPIER_TOOL_PROMPT
from langchain_community.utilities.zapier import ZapierNLAWrapper
class ZapierNLARunAction(BaseTool):
"""
Args:
action_id: a specific action ID (from list actions) of the action to execute
(the set api_key must be associated with the action owner)
instructions: a natural language instruction string for using the action
(eg. "get the latest email from Mike Knoop" for "Gmail: find email" action)
params: a dict, optional. Any params provided will *override* AI guesses
from `instructions` (see "understanding the AI guessing flow" here:
https://nla.zapier.com/docs/using-the-api#ai-guessing)
"""
api_wrapper: ZapierNLAWrapper = Field(default_factory=ZapierNLAWrapper)
action_id: str
params: Optional[dict] = None
base_prompt: str = BASE_ZAPIER_TOOL_PROMPT
zapier_description: str
params_schema: Dict[str, str] = Field(default_factory=dict)
name: str = ""
description: str = ""
@root_validator
def set_name_description(cls, values: Dict[str, Any]) -> Dict[str, Any]:
zapier_description = values["zapier_description"]
params_schema = values["params_schema"]
if "instructions" in params_schema:
del params_schema["instructions"]
# Ensure base prompt (if overridden) contains necessary input fields
necessary_fields = {"{zapier_description}", "{params}"}
if not all(field in values["base_prompt"] for field in necessary_fields):
raise ValueError(
"Your custom base Zapier prompt must contain input fields for "
"{zapier_description} and {params}."
)
values["name"] = zapier_description
values["description"] = values["base_prompt"].format(
zapier_description=zapier_description,
params=str(list(params_schema.keys())),
)
return values
def _run(
self, instructions: str, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return self.api_wrapper.run_as_str(self.action_id, instructions, self.params)
async def _arun(
self,
instructions: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return await self.api_wrapper.arun_as_str(
self.action_id,
instructions,
self.params,
)
ZapierNLARunAction.__doc__ = (
ZapierNLAWrapper.run.__doc__ + ZapierNLARunAction.__doc__ # type: ignore
)
# other useful actions
class ZapierNLAListActions(BaseTool):
"""
Args:
None
"""
name: str = "ZapierNLA_list_actions"
description: str = BASE_ZAPIER_TOOL_PROMPT + (
"This tool returns a list of the user's exposed actions."
)
api_wrapper: ZapierNLAWrapper = Field(default_factory=ZapierNLAWrapper)
def _run(
self,
_: str = "",
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return self.api_wrapper.list_as_str()
async def _arun(
self,
_: str = "",
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the Zapier NLA tool to return a list of all exposed user actions."""
warn_deprecated(
since="0.0.319",
message=(
"This tool will be deprecated on 2023-11-17. See "
"https://nla.zapier.com/sunset/ for details"
),
)
return await self.api_wrapper.alist_as_str()
ZapierNLAListActions.__doc__ = (
ZapierNLAWrapper.list.__doc__ + ZapierNLAListActions.__doc__ # type: ignore
)

View File

@@ -1,283 +0,0 @@
"""Integration tests for the langchain tracer module."""
import asyncio
import os
from aiohttp import ClientSession
from langchain_core.callbacks.manager import atrace_as_chain_group, trace_as_chain_group
from langchain_core.tracers.context import tracing_v2_enabled, tracing_enabled
from langchain_core.prompts import PromptTemplate
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.llms import OpenAI
questions = [
(
"Who won the US Open men's final in 2019? "
"What is his age raised to the 0.334 power?"
),
(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?"
),
(
"Who won the most recent formula 1 grand prix? "
"What is their age raised to the 0.23 power?"
),
(
"Who won the US Open women's final in 2019? "
"What is her age raised to the 0.34 power?"
),
("Who is Beyonce's husband? " "What is his age raised to the 0.19 power?"),
]
def test_tracing_sequential() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING"] = "true"
for q in questions[:3]:
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(q)
def test_tracing_session_env_var() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING"] = "true"
os.environ["LANGCHAIN_SESSION"] = "my_session"
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(questions[0])
if "LANGCHAIN_SESSION" in os.environ:
del os.environ["LANGCHAIN_SESSION"]
async def test_tracing_concurrent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING"] = "true"
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
async def test_tracing_concurrent_bw_compat_environ() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_HANDLER"] = "langchain"
if "LANGCHAIN_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
if "LANGCHAIN_HANDLER" in os.environ:
del os.environ["LANGCHAIN_HANDLER"]
def test_tracing_context_manager() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
with tracing_enabled() as session:
assert session
agent.run(questions[0]) # this should be traced
agent.run(questions[0]) # this should not be traced
async def test_tracing_context_manager_async() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
# start a background task
task = asyncio.create_task(agent.arun(questions[0])) # this should not be traced
with tracing_enabled() as session:
assert session
tasks = [agent.arun(q) for q in questions[1:4]] # these should be traced
await asyncio.gather(*tasks)
await task
async def test_tracing_v2_environment_variable() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING_V2"] = "true"
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(["llm-math", "serpapi"], llm=llm, aiosession=aiosession)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
def test_tracing_v2_context_manager() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = ChatOpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_TRACING_V2" in os.environ:
del os.environ["LANGCHAIN_TRACING_V2"]
with tracing_v2_enabled():
agent.run(questions[0]) # this should be traced
agent.run(questions[0]) # this should not be traced
def test_tracing_v2_chain_with_tags() -> None:
from langchain.chains.llm import LLMChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple
llm = OpenAI(temperature=0)
chain = ConstitutionalChain.from_llm(
llm,
chain=LLMChain.from_string(llm, "Q: {question} A:"),
tags=["only-root"],
constitutional_principles=[
ConstitutionalPrinciple(
critique_request="Tell if this answer is good.",
revision_request="Give a better answer.",
)
],
)
if "LANGCHAIN_TRACING_V2" in os.environ:
del os.environ["LANGCHAIN_TRACING_V2"]
with tracing_v2_enabled():
chain.run("what is the meaning of life", tags=["a-tag"])
def test_tracing_v2_agent_with_metadata() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING_V2"] = "true"
llm = OpenAI(temperature=0)
chat = ChatOpenAI(temperature=0)
tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
chat_agent = initialize_agent(
tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
chat_agent.run(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
async def test_tracing_v2_async_agent_with_metadata() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_TRACING_V2"] = "true"
llm = OpenAI(temperature=0, metadata={"f": "g", "h": "i"})
chat = ChatOpenAI(temperature=0, metadata={"f": "g", "h": "i"})
async_tools = load_tools(["llm-math", "serpapi"], llm=llm)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
chat_agent = initialize_agent(
async_tools,
chat,
agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
verbose=True,
)
await agent.arun(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
await chat_agent.arun(questions[0], tags=["a-tag"], metadata={"a": "b", "c": "d"})
def test_trace_as_group() -> None:
from langchain.chains.llm import LLMChain
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
with trace_as_chain_group("my_group", inputs={"input": "cars"}) as group_manager:
chain.run(product="cars", callbacks=group_manager)
chain.run(product="computers", callbacks=group_manager)
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
with trace_as_chain_group("my_group_2", inputs={"input": "toys"}) as group_manager:
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
def test_trace_as_group_with_env_set() -> None:
from langchain.chains.llm import LLMChain
os.environ["LANGCHAIN_TRACING_V2"] = "true"
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
with trace_as_chain_group(
"my_group_env_set", inputs={"input": "cars"}
) as group_manager:
chain.run(product="cars", callbacks=group_manager)
chain.run(product="computers", callbacks=group_manager)
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
with trace_as_chain_group(
"my_group_2_env_set", inputs={"input": "toys"}
) as group_manager:
final_res = chain.run(product="toys", callbacks=group_manager)
group_manager.on_chain_end({"output": final_res})
async def test_trace_as_group_async() -> None:
from langchain.chains.llm import LLMChain
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
async with atrace_as_chain_group("my_async_group") as group_manager:
await chain.arun(product="cars", callbacks=group_manager)
await chain.arun(product="computers", callbacks=group_manager)
await chain.arun(product="toys", callbacks=group_manager)
async with atrace_as_chain_group(
"my_async_group_2", inputs={"input": "toys"}
) as group_manager:
res = await asyncio.gather(
*[
chain.arun(product="toys", callbacks=group_manager),
chain.arun(product="computers", callbacks=group_manager),
chain.arun(product="cars", callbacks=group_manager),
]
)
await group_manager.on_chain_end({"output": res})

View File

@@ -1,68 +0,0 @@
"""Integration tests for the langchain tracer module."""
import asyncio
from langchain_community.callbacks import get_openai_callback
from langchain_openai.llms import OpenAI
async def test_openai_callback() -> None:
llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm("What is the square root of 4?")
total_tokens = cb.total_tokens
assert total_tokens > 0
with get_openai_callback() as cb:
llm("What is the square root of 4?")
llm("What is the square root of 4?")
assert cb.total_tokens == total_tokens * 2
with get_openai_callback() as cb:
await asyncio.gather(
*[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
)
assert cb.total_tokens == total_tokens * 3
task = asyncio.create_task(llm.agenerate(["What is the square root of 4?"]))
with get_openai_callback() as cb:
await llm.agenerate(["What is the square root of 4?"])
await task
assert cb.total_tokens == total_tokens
def test_openai_callback_batch_llm() -> None:
llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm.generate(["What is the square root of 4?", "What is the square root of 4?"])
assert cb.total_tokens > 0
total_tokens = cb.total_tokens
with get_openai_callback() as cb:
llm("What is the square root of 4?")
llm("What is the square root of 4?")
assert cb.total_tokens == total_tokens
def test_openai_callback_agent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
with get_openai_callback() as cb:
agent.run(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?"
)
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")

View File

@@ -1,30 +0,0 @@
"""Integration tests for the StreamlitCallbackHandler module."""
import pytest
# Import the internal StreamlitCallbackHandler from its module - and not from
# the `langchain_community.callbacks.streamlit` package - so that we don't end up using
# Streamlit's externally-provided callback handler.
from langchain_community.callbacks.streamlit.streamlit_callback_handler import (
StreamlitCallbackHandler,
)
from langchain_openai.llms import OpenAI
@pytest.mark.requires("streamlit")
def test_streamlit_callback_agent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
import streamlit as st
streamlit_callback = StreamlitCallbackHandler(st.container())
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?",
callbacks=[streamlit_callback],
)

View File

@@ -1,118 +0,0 @@
"""Integration tests for the langchain tracer module."""
import asyncio
import os
from aiohttp import ClientSession
from langchain_community.callbacks import wandb_tracing_enabled
from langchain_openai.llms import OpenAI
questions = [
(
"Who won the US Open men's final in 2019? "
"What is his age raised to the 0.334 power?"
),
(
"Who is Olivia Wilde's boyfriend? "
"What is his current age raised to the 0.23 power?"
),
(
"Who won the most recent formula 1 grand prix? "
"What is their age raised to the 0.23 power?"
),
(
"Who won the US Open women's final in 2019? "
"What is her age raised to the 0.34 power?"
),
("Who is Beyonce's husband? " "What is his age raised to the 0.19 power?"),
]
def test_tracing_sequential() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "langchain-tracing"
for q in questions[:3]:
llm = OpenAI(temperature=0)
tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(q)
def test_tracing_session_env_var() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
llm = OpenAI(temperature=0)
tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run(questions[0])
async def test_tracing_concurrent() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
aiosession = ClientSession()
llm = OpenAI(temperature=0)
async_tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
aiosession=aiosession,
)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
tasks = [agent.arun(q) for q in questions[:3]]
await asyncio.gather(*tasks)
await aiosession.close()
def test_tracing_context_manager() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_WANDB_TRACING" in os.environ:
del os.environ["LANGCHAIN_WANDB_TRACING"]
with wandb_tracing_enabled():
agent.run(questions[0]) # this should be traced
agent.run(questions[0]) # this should not be traced
async def test_tracing_context_manager_async() -> None:
from langchain.agents import AgentType, initialize_agent, load_tools
llm = OpenAI(temperature=0)
async_tools = load_tools(
["llm-math", "serpapi"],
llm=llm,
)
agent = initialize_agent(
async_tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if "LANGCHAIN_WANDB_TRACING" in os.environ:
del os.environ["LANGCHAIN_TRACING"]
# start a background task
task = asyncio.create_task(agent.arun(questions[0])) # this should not be traced
with wandb_tracing_enabled():
tasks = [agent.arun(q) for q in questions[1:4]] # these should be traced
await asyncio.gather(*tasks)
await task

View File

@@ -1,219 +0,0 @@
"""Test Baidu Qianfan Chat Endpoint."""
from typing import Any
from langchain_core.callbacks import CallbackManager
from langchain_core.messages import (
AIMessage,
BaseMessage,
FunctionMessage,
HumanMessage,
)
from langchain_core.outputs import ChatGeneration, LLMResult
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_community.chat_models.baidu_qianfan_endpoint import QianfanChatEndpoint
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
_FUNCTIONS: Any = [
{
"name": "format_person_info",
"description": (
"Output formatter. Should always be used to format your response to the"
" user."
),
"parameters": {
"title": "Person",
"description": "Identifying information about a person.",
"type": "object",
"properties": {
"name": {
"title": "Name",
"description": "The person's name",
"type": "string",
},
"age": {
"title": "Age",
"description": "The person's age",
"type": "integer",
},
"fav_food": {
"title": "Fav Food",
"description": "The person's favorite food",
"type": "string",
},
},
"required": ["name", "age"],
},
},
{
"name": "get_current_temperature",
"description": ("Used to get the location's temperature."),
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "city name",
},
"unit": {
"type": "string",
"enum": ["centigrade", "Fahrenheit"],
},
},
"required": ["location", "unit"],
},
"responses": {
"type": "object",
"properties": {
"temperature": {
"type": "integer",
"description": "city temperature",
},
"unit": {
"type": "string",
"enum": ["centigrade", "Fahrenheit"],
},
},
},
},
]
def test_default_call() -> None:
"""Test default model(`ERNIE-Bot`) call."""
chat = QianfanChatEndpoint()
response = chat(messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_model() -> None:
"""Test model kwarg works."""
chat = QianfanChatEndpoint(model="BLOOMZ-7B")
response = chat(messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_model_param() -> None:
"""Test model params works."""
chat = QianfanChatEndpoint()
response = chat(model="BLOOMZ-7B", messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_endpoint() -> None:
"""Test user custom model deployments like some open source models."""
chat = QianfanChatEndpoint(endpoint="qianfan_bloomz_7b_compressed")
response = chat(messages=[HumanMessage(content="Hello")])
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_endpoint_param() -> None:
"""Test user custom model deployments like some open source models."""
chat = QianfanChatEndpoint()
response = chat(
messages=[
HumanMessage(endpoint="qianfan_bloomz_7b_compressed", content="Hello")
]
)
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_multiple_history() -> None:
"""Tests multiple history works."""
chat = QianfanChatEndpoint()
response = chat(
messages=[
HumanMessage(content="Hello."),
AIMessage(content="Hello!"),
HumanMessage(content="How are you doing?"),
]
)
assert isinstance(response, BaseMessage)
assert isinstance(response.content, str)
def test_stream() -> None:
"""Test that stream works."""
chat = QianfanChatEndpoint(streaming=True)
callback_handler = FakeCallbackHandler()
callback_manager = CallbackManager([callback_handler])
response = chat(
messages=[
HumanMessage(content="Hello."),
AIMessage(content="Hello!"),
HumanMessage(content="Who are you?"),
],
stream=True,
callbacks=callback_manager,
)
assert callback_handler.llm_streams > 0
assert isinstance(response.content, str)
def test_multiple_messages() -> None:
"""Tests multiple messages works."""
chat = QianfanChatEndpoint()
message = HumanMessage(content="Hi, how are you.")
response = chat.generate([[message], [message]])
assert isinstance(response, LLMResult)
assert len(response.generations) == 2
for generations in response.generations:
assert len(generations) == 1
for generation in generations:
assert isinstance(generation, ChatGeneration)
assert isinstance(generation.text, str)
assert generation.text == generation.message.content
def test_functions_call_thoughts() -> None:
chat = QianfanChatEndpoint(model="ERNIE-Bot")
prompt_tmpl = "Use the given functions to answer following question: {input}"
prompt_msgs = [
HumanMessagePromptTemplate.from_template(prompt_tmpl),
]
prompt = ChatPromptTemplate(messages=prompt_msgs)
chain = prompt | chat.bind(functions=_FUNCTIONS)
message = HumanMessage(content="What's the temperature in Shanghai today?")
response = chain.batch([{"input": message}])
assert isinstance(response[0], AIMessage)
assert "function_call" in response[0].additional_kwargs
def test_functions_call() -> None:
chat = QianfanChatEndpoint(model="ERNIE-Bot")
prompt = ChatPromptTemplate(
messages=[
HumanMessage(content="What's the temperature in Shanghai today?"),
AIMessage(
content="",
additional_kwargs={
"function_call": {
"name": "get_current_temperature",
"thoughts": "i will use get_current_temperature "
"to resolve the questions",
"arguments": '{"location":"Shanghai","unit":"centigrade"}',
}
},
),
FunctionMessage(
name="get_current_weather",
content='{"temperature": "25", \
"unit": "摄氏度", "description": "晴朗"}',
),
]
)
chain = prompt | chat.bind(functions=_FUNCTIONS)
resp = chain.invoke({})
assert isinstance(resp, AIMessage)

View File

@@ -1,182 +0,0 @@
from pathlib import Path
import pytest
from langchain_community.document_loaders.concurrent import ConcurrentLoader
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
def test_language_loader_for_python() -> None:
"""Test Python loader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 2
metadata = docs[0].metadata
assert metadata["source"] == str(file_path / "hello_world.py")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "python"
metadata = docs[1].metadata
assert metadata["source"] == str(file_path / "hello_world.py")
assert metadata["content_type"] == "simplified_code"
assert metadata["language"] == "python"
assert (
docs[0].page_content
== """def main():
print("Hello World!")
return 0"""
)
assert (
docs[1].page_content
== """#!/usr/bin/env python3
import sys
# Code for: def main():
if __name__ == "__main__":
sys.exit(main())"""
)
def test_language_loader_for_python_with_parser_threshold() -> None:
"""Test Python loader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path,
glob="hello_world.py",
parser=LanguageParser(language="python", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def esprima_installed() -> bool:
try:
import esprima # noqa: F401
return True
except Exception as e:
print(f"esprima not installed, skipping test {e}")
return False
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
def test_language_loader_for_javascript() -> None:
"""Test JavaScript loader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 3
metadata = docs[0].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "js"
metadata = docs[1].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "functions_classes"
assert metadata["language"] == "js"
metadata = docs[2].metadata
assert metadata["source"] == str(file_path / "hello_world.js")
assert metadata["content_type"] == "simplified_code"
assert metadata["language"] == "js"
assert (
docs[0].page_content
== """class HelloWorld {
sayHello() {
console.log("Hello World!");
}
}"""
)
assert (
docs[1].page_content
== """function main() {
const hello = new HelloWorld();
hello.sayHello();
}"""
)
assert (
docs[2].page_content
== """// Code for: class HelloWorld {
// Code for: function main() {
main();"""
)
def test_language_loader_for_javascript_with_parser_threshold() -> None:
"""Test JavaScript loader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = GenericLoader.from_filesystem(
file_path,
glob="hello_world.js",
parser=LanguageParser(language="js", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def test_concurrent_language_loader_for_javascript_with_parser_threshold() -> None:
"""Test JavaScript ConcurrentLoader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path,
glob="hello_world.js",
parser=LanguageParser(language="js", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
def test_concurrent_language_loader_for_python_with_parser_threshold() -> None:
"""Test Python ConcurrentLoader with parser enabled and below threshold."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path,
glob="hello_world.py",
parser=LanguageParser(language="python", parser_threshold=1000),
)
docs = loader.load()
assert len(docs) == 1
@pytest.mark.skipif(not esprima_installed(), reason="requires esprima package")
def test_concurrent_language_loader_for_javascript() -> None:
"""Test JavaScript ConcurrentLoader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path, glob="hello_world.js", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 3
def test_concurrent_language_loader_for_python() -> None:
"""Test Python ConcurrentLoader with parser enabled."""
file_path = Path(__file__).parent.parent.parent / "examples"
loader = ConcurrentLoader.from_filesystem(
file_path, glob="hello_world.py", parser=LanguageParser(parser_threshold=5)
)
docs = loader.load()
assert len(docs) == 2

View File

@@ -1,147 +0,0 @@
"""Test Fireworks AI API Wrapper."""
import sys
from typing import Generator
import pytest
from langchain_core.outputs import LLMResult
from langchain_community.llms.fireworks import Fireworks
if sys.version_info < (3, 9):
pytest.skip("fireworks-ai requires Python > 3.8", allow_module_level=True)
@pytest.fixture
def llm() -> Fireworks:
return Fireworks(model_kwargs={"temperature": 0, "max_tokens": 512})
@pytest.mark.scheduled
def test_fireworks_call(llm: Fireworks) -> None:
"""Test valid call to fireworks."""
output = llm("How is the weather in New York today?")
assert isinstance(output, str)
@pytest.mark.scheduled
def test_fireworks_model_param() -> None:
"""Tests model parameters for Fireworks"""
llm = Fireworks(model="foo")
assert llm.model == "foo"
@pytest.mark.scheduled
def test_fireworks_invoke(llm: Fireworks) -> None:
"""Tests completion with invoke"""
output = llm.invoke("How is the weather in New York today?", stop=[","])
assert isinstance(output, str)
assert output[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_ainvoke(llm: Fireworks) -> None:
"""Tests completion with invoke"""
output = await llm.ainvoke("How is the weather in New York today?", stop=[","])
assert isinstance(output, str)
assert output[-1] == ","
@pytest.mark.scheduled
def test_fireworks_batch(llm: Fireworks) -> None:
"""Tests completion with invoke"""
llm = Fireworks()
output = llm.batch(
[
"How is the weather in New York today?",
"How is the weather in New York today?",
"How is the weather in New York today?",
"How is the weather in New York today?",
"How is the weather in New York today?",
],
stop=[","],
)
for token in output:
assert isinstance(token, str)
assert token[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_abatch(llm: Fireworks) -> None:
"""Tests completion with invoke"""
output = await llm.abatch(
[
"How is the weather in New York today?",
"How is the weather in New York today?",
"How is the weather in New York today?",
"How is the weather in New York today?",
"How is the weather in New York today?",
],
stop=[","],
)
for token in output:
assert isinstance(token, str)
assert token[-1] == ","
@pytest.mark.scheduled
def test_fireworks_multiple_prompts(
llm: Fireworks,
) -> None:
"""Test completion with multiple prompts."""
output = llm.generate(["How is the weather in New York today?", "I'm pickle rick"])
assert isinstance(output, LLMResult)
assert isinstance(output.generations, list)
assert len(output.generations) == 2
@pytest.mark.scheduled
def test_fireworks_streaming(llm: Fireworks) -> None:
"""Test stream completion."""
generator = llm.stream("Who's the best quarterback in the NFL?")
assert isinstance(generator, Generator)
for token in generator:
assert isinstance(token, str)
@pytest.mark.scheduled
def test_fireworks_streaming_stop_words(llm: Fireworks) -> None:
"""Test stream completion with stop words."""
generator = llm.stream("Who's the best quarterback in the NFL?", stop=[","])
assert isinstance(generator, Generator)
last_token = ""
for token in generator:
last_token = token
assert isinstance(token, str)
assert last_token[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_streaming_async(llm: Fireworks) -> None:
"""Test stream completion."""
last_token = ""
async for token in llm.astream(
"Who's the best quarterback in the NFL?", stop=[","]
):
last_token = token
assert isinstance(token, str)
assert last_token[-1] == ","
@pytest.mark.scheduled
async def test_fireworks_async_agenerate(llm: Fireworks) -> None:
"""Test async."""
output = await llm.agenerate(["What is the best city to live in California?"])
assert isinstance(output, LLMResult)
@pytest.mark.scheduled
async def test_fireworks_multiple_prompts_async_agenerate(llm: Fireworks) -> None:
output = await llm.agenerate(
["How is the weather in New York today?", "I'm pickle rick"]
)
assert isinstance(output, LLMResult)
assert isinstance(output.generations, list)
assert len(output.generations) == 2

View File

@@ -1,77 +0,0 @@
import langchain_community.utilities.opaqueprompts as op
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_openai.llms import OpenAI
from langchain_community.llms.opaqueprompts import OpaquePrompts
prompt_template = """
As an AI assistant, you will answer questions according to given context.
Sensitive personal information in the question is masked for privacy.
For instance, if the original text says "Giana is good," it will be changed
to "PERSON_998 is good."
Here's how to handle these changes:
* Consider these masked phrases just as placeholders, but still refer to
them in a relevant way when answering.
* It's possible that different masked terms might mean the same thing.
Stick with the given term and don't modify it.
* All masked terms follow the "TYPE_ID" pattern.
* Please don't invent new masked terms. For instance, if you see "PERSON_998,"
don't come up with "PERSON_997" or "PERSON_999" unless they're already in the question.
Conversation History: ```{history}```
Context : ```During our recent meeting on February 23, 2023, at 10:30 AM,
John Doe provided me with his personal details. His email is johndoe@example.com
and his contact number is 650-456-7890. He lives in New York City, USA, and
belongs to the American nationality with Christian beliefs and a leaning towards
the Democratic party. He mentioned that he recently made a transaction using his
credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he
noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided
his website as https://johndoeportfolio.com. John also discussed
some of his US-specific details. He said his bank account number is
1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321,
and he recently renewed his passport,
the number for which is 123456789. He emphasized not to share his SSN, which is
669-45-6789. Furthermore, he mentioned that he accesses his work files remotely
through the IP 192.168.1.1 and has a medical license number MED-123456. ```
Question: ```{question}```
"""
def test_opaqueprompts() -> None:
chain = PromptTemplate.from_template(prompt_template) | OpaquePrompts(llm=OpenAI())
output = chain.invoke(
{
"question": "Write a text message to remind John to do password reset \
for his website through his email to stay secure."
}
)
assert isinstance(output, str)
def test_opaqueprompts_functions() -> None:
prompt = (PromptTemplate.from_template(prompt_template),)
llm = OpenAI()
pg_chain = (
op.sanitize
| RunnableParallel(
secure_context=lambda x: x["secure_context"], # type: ignore
response=(lambda x: x["sanitized_input"]) # type: ignore
| prompt
| llm
| StrOutputParser(),
)
| (lambda x: op.desanitize(x["response"], x["secure_context"]))
)
pg_chain.invoke(
{
"question": "Write a text message to remind John to do password reset\
for his website through his email to stay secure.",
"history": "",
}
)

View File

@@ -1,42 +0,0 @@
"""Test Nebula API wrapper."""
from langchain_community.llms.symblai_nebula import Nebula
def test_symblai_nebula_call() -> None:
"""Test valid call to Nebula."""
conversation = """Sam: Good morning, team! Let's keep this standup concise.
We'll go in the usual order: what you did yesterday,
what you plan to do today, and any blockers. Alex, kick us off.
Alex: Morning! Yesterday, I wrapped up the UI for the user dashboard.
The new charts and widgets are now responsive.
I also had a sync with the design team to ensure the final touchups are in
line with the brand guidelines. Today, I'll start integrating the frontend with
the new API endpoints Rhea was working on.
The only blocker is waiting for some final API documentation,
but I guess Rhea can update on that.
Rhea: Hey, all! Yep, about the API documentation - I completed the majority of
the backend work for user data retrieval yesterday.
The endpoints are mostly set up, but I need to do a bit more testing today.
I'll finalize the API documentation by noon, so that should unblock Alex.
After that, Ill be working on optimizing the database queries
for faster data fetching. No other blockers on my end.
Sam: Great, thanks Rhea. Do reach out if you need any testing assistance
or if there are any hitches with the database.
Now, my update: Yesterday, I coordinated with the client to get clarity
on some feature requirements. Today, I'll be updating our project roadmap
and timelines based on their feedback. Additionally, I'll be sitting with
the QA team in the afternoon for preliminary testing.
Blocker: I might need both of you to be available for a quick call
in case the client wants to discuss the changes live.
Alex: Sounds good, Sam. Just let us know a little in advance for the call.
Rhea: Agreed. We can make time for that.
Sam: Perfect! Let's keep the momentum going. Reach out if there are any
sudden issues or support needed. Have a productive day!
Alex: You too.
Rhea: Thanks, bye!"""
llm = Nebula(nebula_api_key="<your_api_key>")
instruction = """Identify the main objectives mentioned in this
conversation."""
output = llm.invoke(f"{instruction}\n{conversation}")
assert isinstance(output, str)

View File

@@ -1,171 +0,0 @@
"""Integration test for Arxiv API Wrapper."""
from typing import Any, List
import pytest
from langchain_core.documents import Document
from langchain_core.tools import BaseTool
from langchain_community.tools import ArxivQueryRun
from langchain_community.utilities import ArxivAPIWrapper
@pytest.fixture
def api_client() -> ArxivAPIWrapper:
return ArxivAPIWrapper()
def test_run_success_paper_name(api_client: ArxivAPIWrapper) -> None:
"""Test a query of paper name that returns the correct answer"""
output = api_client.run("Heat-bath random walks with Markov bases")
assert "Probability distributions for Markov chains based quantum walks" in output
assert (
"Transformations of random walks on groups via Markov stopping times" in output
)
assert (
"Recurrence of Multidimensional Persistent Random Walks. Fourier and Series "
"Criteria" in output
)
def test_run_success_arxiv_identifier(api_client: ArxivAPIWrapper) -> None:
"""Test a query of an arxiv identifier returns the correct answer"""
output = api_client.run("1605.08386v1")
assert "Heat-bath random walks with Markov bases" in output
def test_run_success_multiple_arxiv_identifiers(api_client: ArxivAPIWrapper) -> None:
"""Test a query of multiple arxiv identifiers that returns the correct answer"""
output = api_client.run("1605.08386v1 2212.00794v2 2308.07912")
assert "Heat-bath random walks with Markov bases" in output
assert "Scaling Language-Image Pre-training via Masking" in output
assert (
"Ultra-low mass PBHs in the early universe can explain the PTA signal" in output
)
def test_run_returns_several_docs(api_client: ArxivAPIWrapper) -> None:
"""Test that returns several docs"""
output = api_client.run("Caprice Stanley")
assert "On Mixing Behavior of a Family of Random Walks" in output
def test_run_returns_no_result(api_client: ArxivAPIWrapper) -> None:
"""Test that gives no result."""
output = api_client.run("1605.08386WWW")
assert "No good Arxiv Result was found" == output
def assert_docs(docs: List[Document]) -> None:
for doc in docs:
assert doc.page_content
assert doc.metadata
assert set(doc.metadata) == {"Published", "Title", "Authors", "Summary"}
def test_load_success_paper_name(api_client: ArxivAPIWrapper) -> None:
"""Test a query of paper name that returns one document"""
docs = api_client.load("Heat-bath random walks with Markov bases")
assert len(docs) == 3
assert_docs(docs)
def test_load_success_arxiv_identifier(api_client: ArxivAPIWrapper) -> None:
"""Test a query of an arxiv identifier that returns one document"""
docs = api_client.load("1605.08386v1")
assert len(docs) == 1
assert_docs(docs)
def test_load_success_multiple_arxiv_identifiers(api_client: ArxivAPIWrapper) -> None:
"""Test a query of arxiv identifiers that returns the correct answer"""
docs = api_client.load("1605.08386v1 2212.00794v2 2308.07912")
assert len(docs) == 3
assert_docs(docs)
def test_load_returns_no_result(api_client: ArxivAPIWrapper) -> None:
"""Test that returns no docs"""
docs = api_client.load("1605.08386WWW")
assert len(docs) == 0
def test_load_returns_limited_docs() -> None:
"""Test that returns several docs"""
expected_docs = 2
api_client = ArxivAPIWrapper(load_max_docs=expected_docs)
docs = api_client.load("ChatGPT")
assert len(docs) == expected_docs
assert_docs(docs)
def test_load_returns_limited_doc_content_chars() -> None:
"""Test that returns limited doc_content_chars_max"""
doc_content_chars_max = 100
api_client = ArxivAPIWrapper(doc_content_chars_max=doc_content_chars_max)
docs = api_client.load("1605.08386")
assert len(docs[0].page_content) == doc_content_chars_max
def test_load_returns_unlimited_doc_content_chars() -> None:
"""Test that returns unlimited doc_content_chars_max"""
doc_content_chars_max = None
api_client = ArxivAPIWrapper(doc_content_chars_max=doc_content_chars_max)
docs = api_client.load("1605.08386")
assert len(docs[0].page_content) == pytest.approx(54338, rel=1e-2)
def test_load_returns_full_set_of_metadata() -> None:
"""Test that returns several docs"""
api_client = ArxivAPIWrapper(load_max_docs=1, load_all_available_meta=True)
docs = api_client.load("ChatGPT")
assert len(docs) == 1
for doc in docs:
assert doc.page_content
assert doc.metadata
assert set(doc.metadata).issuperset(
{"Published", "Title", "Authors", "Summary"}
)
print(doc.metadata)
assert len(set(doc.metadata)) > 4
def _load_arxiv_from_universal_entry(**kwargs: Any) -> BaseTool:
from langchain.agents.load_tools import load_tools
tools = load_tools(["arxiv"], **kwargs)
assert len(tools) == 1, "loaded more than 1 tool"
return tools[0]
def test_load_arxiv_from_universal_entry() -> None:
arxiv_tool = _load_arxiv_from_universal_entry()
output = arxiv_tool("Caprice Stanley")
assert (
"On Mixing Behavior of a Family of Random Walks" in output
), "failed to fetch a valid result"
def test_load_arxiv_from_universal_entry_with_params() -> None:
params = {
"top_k_results": 1,
"load_max_docs": 10,
"load_all_available_meta": True,
}
arxiv_tool = _load_arxiv_from_universal_entry(**params)
assert isinstance(arxiv_tool, ArxivQueryRun)
wp = arxiv_tool.api_wrapper
assert wp.top_k_results == 1, "failed to assert top_k_results"
assert wp.load_max_docs == 10, "failed to assert load_max_docs"
assert (
wp.load_all_available_meta is True
), "failed to assert load_all_available_meta"

View File

@@ -1,164 +0,0 @@
"""Integration test for PubMed API Wrapper."""
from typing import Any, List
import pytest
from langchain_core.documents import Document
from langchain_core.tools import BaseTool
from langchain_community.tools import PubmedQueryRun
from langchain_community.utilities import PubMedAPIWrapper
xmltodict = pytest.importorskip("xmltodict")
@pytest.fixture
def api_client() -> PubMedAPIWrapper:
return PubMedAPIWrapper()
def test_run_success(api_client: PubMedAPIWrapper) -> None:
"""Test that returns the correct answer"""
search_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature"
)
output = api_client.run(search_string)
test_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature: Findings and Implications"
)
assert test_string in output
assert len(output) == api_client.doc_content_chars_max
def test_run_returns_no_result(api_client: PubMedAPIWrapper) -> None:
"""Test that gives no result."""
output = api_client.run("1605.08386WWW")
assert "No good PubMed Result was found" == output
def test_retrieve_article_returns_book_abstract(api_client: PubMedAPIWrapper) -> None:
"""Test that returns the excerpt of a book."""
output_nolabel = api_client.retrieve_article("25905357", "")
output_withlabel = api_client.retrieve_article("29262144", "")
test_string_nolabel = (
"Osteoporosis is a multifactorial disorder associated with low bone mass and "
"enhanced skeletal fragility. Although"
)
assert test_string_nolabel in output_nolabel["Summary"]
assert (
"Wallenberg syndrome was first described in 1808 by Gaspard Vieusseux. However,"
in output_withlabel["Summary"]
)
def test_retrieve_article_returns_article_abstract(
api_client: PubMedAPIWrapper,
) -> None:
"""Test that returns the abstract of an article."""
output_nolabel = api_client.retrieve_article("37666905", "")
output_withlabel = api_client.retrieve_article("37666551", "")
test_string_nolabel = (
"This work aims to: (1) Provide maximal hand force data on six different "
"grasp types for healthy subjects; (2) detect grasp types with maximal "
"force significantly affected by hand osteoarthritis (HOA) in women; (3) "
"look for predictors to detect HOA from the maximal forces using discriminant "
"analyses."
)
assert test_string_nolabel in output_nolabel["Summary"]
test_string_withlabel = (
"OBJECTIVES: To assess across seven hospitals from six different countries "
"the extent to which the COVID-19 pandemic affected the volumes of orthopaedic "
"hospital admissions and patient outcomes for non-COVID-19 patients admitted "
"for orthopaedic care."
)
assert test_string_withlabel in output_withlabel["Summary"]
def test_retrieve_article_no_abstract_available(api_client: PubMedAPIWrapper) -> None:
"""Test that returns 'No abstract available'."""
output = api_client.retrieve_article("10766884", "")
assert "No abstract available" == output["Summary"]
def assert_docs(docs: List[Document]) -> None:
for doc in docs:
assert doc.metadata
assert set(doc.metadata) == {
"Copyright Information",
"uid",
"Title",
"Published",
}
def test_load_success(api_client: PubMedAPIWrapper) -> None:
"""Test that returns one document"""
docs = api_client.load_docs("chatgpt")
assert len(docs) == api_client.top_k_results == 3
assert_docs(docs)
def test_load_returns_no_result(api_client: PubMedAPIWrapper) -> None:
"""Test that returns no docs"""
docs = api_client.load_docs("1605.08386WWW")
assert len(docs) == 0
def test_load_returns_limited_docs() -> None:
"""Test that returns several docs"""
expected_docs = 2
api_client = PubMedAPIWrapper(top_k_results=expected_docs)
docs = api_client.load_docs("ChatGPT")
assert len(docs) == expected_docs
assert_docs(docs)
def test_load_returns_full_set_of_metadata() -> None:
"""Test that returns several docs"""
api_client = PubMedAPIWrapper(load_max_docs=1, load_all_available_meta=True)
docs = api_client.load_docs("ChatGPT")
assert len(docs) == 3
for doc in docs:
assert doc.metadata
assert set(doc.metadata).issuperset(
{"Copyright Information", "Published", "Title", "uid"}
)
def _load_pubmed_from_universal_entry(**kwargs: Any) -> BaseTool:
from langchain.agents.load_tools import load_tools
tools = load_tools(["pubmed"], **kwargs)
assert len(tools) == 1, "loaded more than 1 tool"
return tools[0]
def test_load_pupmed_from_universal_entry() -> None:
pubmed_tool = _load_pubmed_from_universal_entry()
search_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature"
)
output = pubmed_tool(search_string)
test_string = (
"Examining the Validity of ChatGPT in Identifying "
"Relevant Nephrology Literature: Findings and Implications"
)
assert test_string in output
def test_load_pupmed_from_universal_entry_with_params() -> None:
params = {
"top_k_results": 1,
}
pubmed_tool = _load_pubmed_from_universal_entry(**params)
assert isinstance(pubmed_tool, PubmedQueryRun)
wp = pubmed_tool.api_wrapper
assert wp.top_k_results == 1, "failed to assert top_k_results"

View File

@@ -1,44 +0,0 @@
import os
from typing import Union
import pytest
from vcr.request import Request
# Those environment variables turn on Deep Lake pytest mode.
# It significantly makes tests run much faster.
# Need to run before `import deeplake`
os.environ["BUGGER_OFF"] = "true"
os.environ["DEEPLAKE_DOWNLOAD_PATH"] = "./testing/local_storage"
os.environ["DEEPLAKE_PYTEST_ENABLED"] = "true"
# This fixture returns a dictionary containing filter_headers options
# for replacing certain headers with dummy values during cassette playback
# Specifically, it replaces the authorization header with a dummy value to
# prevent sensitive data from being recorded in the cassette.
# It also filters request to certain hosts (specified in the `ignored_hosts` list)
# to prevent data from being recorded in the cassette.
@pytest.fixture(scope="module")
def vcr_config() -> dict:
skipped_host = ["pinecone.io"]
def before_record_response(response: dict) -> Union[dict, None]:
return response
def before_record_request(request: Request) -> Union[Request, None]:
for host in skipped_host:
if request.host.startswith(host) or request.host.endswith(host):
return None
return request
return {
"before_record_request": before_record_request,
"before_record_response": before_record_response,
"filter_headers": [
("authorization", "authorization-DUMMY"),
("X-OpenAI-Client-User-Agent", "X-OpenAI-Client-User-Agent-DUMMY"),
("Api-Key", "Api-Key-DUMMY"),
("User-Agent", "User-Agent-DUMMY"),
],
"ignore_localhost": True,
}

View File

@@ -1,85 +0,0 @@
"""Test CallbackManager."""
from unittest.mock import patch
import pytest
from langchain_community.callbacks import get_openai_callback
from langchain_core.callbacks.manager import trace_as_chain_group, CallbackManager
from langchain_core.outputs import LLMResult
from langchain_core.tracers.langchain import LangChainTracer, wait_for_all_tracers
from langchain_openai.llms import BaseOpenAI
def test_callback_manager_configure_context_vars(
monkeypatch: pytest.MonkeyPatch,
) -> None:
"""Test callback manager configuration."""
monkeypatch.setenv("LANGCHAIN_TRACING_V2", "true")
monkeypatch.setenv("LANGCHAIN_TRACING", "false")
with patch.object(LangChainTracer, "_update_run_single"):
with patch.object(LangChainTracer, "_persist_run_single"):
with trace_as_chain_group("test") as group_manager:
assert len(group_manager.handlers) == 1
tracer = group_manager.handlers[0]
assert isinstance(tracer, LangChainTracer)
with get_openai_callback() as cb:
# This is a new empty callback handler
assert cb.successful_requests == 0
assert cb.total_tokens == 0
# configure adds this openai cb but doesn't modify the group manager
mngr = CallbackManager.configure(group_manager)
assert mngr.handlers == [tracer, cb]
assert group_manager.handlers == [tracer]
response = LLMResult(
generations=[],
llm_output={
"token_usage": {
"prompt_tokens": 2,
"completion_tokens": 1,
"total_tokens": 3,
},
"model_name": BaseOpenAI.__fields__["model_name"].default,
},
)
mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
# The callback handler has been updated
assert cb.successful_requests == 1
assert cb.total_tokens == 3
assert cb.prompt_tokens == 2
assert cb.completion_tokens == 1
assert cb.total_cost > 0
with get_openai_callback() as cb:
# This is a new empty callback handler
assert cb.successful_requests == 0
assert cb.total_tokens == 0
# configure adds this openai cb but doesn't modify the group manager
mngr = CallbackManager.configure(group_manager)
assert mngr.handlers == [tracer, cb]
assert group_manager.handlers == [tracer]
response = LLMResult(
generations=[],
llm_output={
"token_usage": {
"prompt_tokens": 2,
"completion_tokens": 1,
"total_tokens": 3,
},
"model_name": BaseOpenAI.__fields__["model_name"].default,
},
)
mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
# The callback handler has been updated
assert cb.successful_requests == 1
assert cb.total_tokens == 3
assert cb.prompt_tokens == 2
assert cb.completion_tokens == 1
assert cb.total_cost > 0
wait_for_all_tracers()
assert LangChainTracer._persist_run_single.call_count == 1 # type: ignore

View File

@@ -1,37 +0,0 @@
from langchain_community.callbacks import __all__
EXPECTED_ALL = [
"AimCallbackHandler",
"ArgillaCallbackHandler",
"ArizeCallbackHandler",
"PromptLayerCallbackHandler",
"ArthurCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"ContextCallbackHandler",
"FileCallbackHandler",
"HumanApprovalCallbackHandler",
"InfinoCallbackHandler",
"MlflowCallbackHandler",
"LLMonitorCallbackHandler",
"OpenAICallbackHandler",
"StdOutCallbackHandler",
"AsyncIteratorCallbackHandler",
"StreamingStdOutCallbackHandler",
"FinalStreamingStdOutCallbackHandler",
"LLMThoughtLabeler",
"LangChainTracer",
"StreamlitCallbackHandler",
"WandbCallbackHandler",
"WhyLabsCallbackHandler",
"get_openai_callback",
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
"TrubricsCallbackHandler",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -1,23 +0,0 @@
import pathlib
from langchain_community.chat_loaders import slack, utils
def test_slack_chat_loader() -> None:
chat_path = (
pathlib.Path(__file__).parents[2]
/ "examples"
/ "slack_export.zip"
)
loader = slack.SlackChatLoader(str(chat_path))
chat_sessions = list(
utils.map_ai_messages(loader.lazy_load(), sender="U0500003428")
)
assert chat_sessions, "Chat sessions should not be empty"
assert chat_sessions[1]["messages"], "Chat messages should not be empty"
assert (
"Example message" in chat_sessions[1]["messages"][0].content
), "Chat content mismatch"

View File

@@ -1,54 +0,0 @@
"""Test Anthropic Chat API wrapper."""
from typing import List
from unittest.mock import MagicMock
import pytest
from langchain_core.messages import (
AIMessage,
BaseMessage,
HumanMessage,
SystemMessage,
)
from langchain_community.chat_models import BedrockChat
from langchain_community.chat_models.meta import convert_messages_to_prompt_llama
@pytest.mark.parametrize(
("messages", "expected"),
[
([HumanMessage(content="Hello")], "[INST] Hello [/INST]"),
(
[HumanMessage(content="Hello"), AIMessage(content="Answer:")],
"[INST] Hello [/INST]\nAnswer:",
),
(
[
SystemMessage(content="You're an assistant"),
HumanMessage(content="Hello"),
AIMessage(content="Answer:"),
],
"<<SYS>> You're an assistant <</SYS>>\n[INST] Hello [/INST]\nAnswer:",
),
],
)
def test_formatting(messages: List[BaseMessage], expected: str) -> None:
result = convert_messages_to_prompt_llama(messages)
assert result == expected
def test_anthropic_bedrock() -> None:
client = MagicMock()
respbody = MagicMock(
read=MagicMock(
return_value=MagicMock(
decode=MagicMock(return_value=b'{"completion":"Hi back"}')
)
)
)
client.invoke_model.return_value = {"body": respbody}
model = BedrockChat(model_id="anthropic.claude-v2", client=client)
# should not throw an error
model.invoke("hello there")

View File

@@ -1,36 +0,0 @@
from langchain_community.chat_models import __all__
EXPECTED_ALL = [
"BedrockChat",
"FakeListChatModel",
"PromptLayerChatOpenAI",
"ChatEverlyAI",
"ChatAnthropic",
"ChatCohere",
"ChatDatabricks",
"ChatGooglePalm",
"ChatMlflow",
"ChatMLflowAIGateway",
"ChatOllama",
"ChatVertexAI",
"JinaChat",
"HumanInputChatModel",
"MiniMaxChat",
"ChatAnyscale",
"ChatLiteLLM",
"ErnieBotChat",
"ChatJavelinAIGateway",
"ChatKonko",
"PaiEasChatEndpoint",
"QianfanChatEndpoint",
"ChatFireworks",
"ChatYandexGPT",
"ChatBaichuan",
"ChatHunyuan",
"GigaChat",
"VolcEngineMaasChat",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -1,96 +0,0 @@
"""Tests for the various PDF parsers."""
from pathlib import Path
from typing import Iterator
import pytest
from langchain_community.document_loaders.base import BaseBlobParser
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.pdf import (
PDFMinerParser,
PyMuPDFParser,
PyPDFium2Parser,
PyPDFParser,
)
_THIS_DIR = Path(__file__).parents[3]
_EXAMPLES_DIR = _THIS_DIR / "examples"
# Paths to test PDF files
HELLO_PDF = _EXAMPLES_DIR / "hello.pdf"
LAYOUT_PARSER_PAPER_PDF = _EXAMPLES_DIR / "layout-parser-paper.pdf"
def _assert_with_parser(parser: BaseBlobParser, splits_by_page: bool = True) -> None:
"""Standard tests to verify that the given parser works.
Args:
parser (BaseBlobParser): The parser to test.
splits_by_page (bool): Whether the parser splits by page or not by default.
"""
blob = Blob.from_path(HELLO_PDF)
doc_generator = parser.lazy_parse(blob)
assert isinstance(doc_generator, Iterator)
docs = list(doc_generator)
assert len(docs) == 1
page_content = docs[0].page_content
assert isinstance(page_content, str)
# The different parsers return different amount of whitespace, so using
# startswith instead of equals.
assert docs[0].page_content.startswith("Hello world!")
blob = Blob.from_path(LAYOUT_PARSER_PAPER_PDF)
doc_generator = parser.lazy_parse(blob)
assert isinstance(doc_generator, Iterator)
docs = list(doc_generator)
if splits_by_page:
assert len(docs) == 16
else:
assert len(docs) == 1
# Test is imprecise since the parsers yield different parse information depending
# on configuration. Each parser seems to yield a slightly different result
# for this page!
assert "LayoutParser" in docs[0].page_content
metadata = docs[0].metadata
assert metadata["source"] == str(LAYOUT_PARSER_PAPER_PDF)
if splits_by_page:
assert int(metadata["page"]) == 0
@pytest.mark.requires("pypdf")
def test_pypdf_parser() -> None:
"""Test PyPDF parser."""
_assert_with_parser(PyPDFParser())
@pytest.mark.requires("pdfminer")
def test_pdfminer_parser() -> None:
"""Test PDFMiner parser."""
# Does not follow defaults to split by page.
_assert_with_parser(PDFMinerParser(), splits_by_page=False)
@pytest.mark.requires("fitz") # package is PyMuPDF
def test_pymupdf_loader() -> None:
"""Test PyMuPDF loader."""
_assert_with_parser(PyMuPDFParser())
@pytest.mark.requires("pypdfium2")
def test_pypdfium2_parser() -> None:
"""Test PyPDFium2 parser."""
# Does not follow defaults to split by page.
_assert_with_parser(PyPDFium2Parser())
@pytest.mark.requires("rapidocr_onnxruntime")
def test_extract_images_text_from_pdf() -> None:
"""Test extract image from pdf and recognize text with rapid ocr"""
_assert_with_parser(PyPDFParser(extract_images=True))
_assert_with_parser(PDFMinerParser(extract_images=True))
_assert_with_parser(PyMuPDFParser(extract_images=True))
_assert_with_parser(PyPDFium2Parser(extract_images=True))

View File

@@ -1,59 +0,0 @@
from langchain_community.embeddings import __all__
EXPECTED_ALL = [
"CacheBackedEmbeddings",
"ClarifaiEmbeddings",
"CohereEmbeddings",
"DatabricksEmbeddings",
"ElasticsearchEmbeddings",
"FastEmbedEmbeddings",
"HuggingFaceEmbeddings",
"HuggingFaceInferenceAPIEmbeddings",
"InfinityEmbeddings",
"GradientEmbeddings",
"JinaEmbeddings",
"LlamaCppEmbeddings",
"HuggingFaceHubEmbeddings",
"MlflowAIGatewayEmbeddings",
"MlflowEmbeddings",
"ModelScopeEmbeddings",
"TensorflowHubEmbeddings",
"SagemakerEndpointEmbeddings",
"HuggingFaceInstructEmbeddings",
"MosaicMLInstructorEmbeddings",
"SelfHostedEmbeddings",
"SelfHostedHuggingFaceEmbeddings",
"SelfHostedHuggingFaceInstructEmbeddings",
"FakeEmbeddings",
"DeterministicFakeEmbedding",
"AlephAlphaAsymmetricSemanticEmbedding",
"AlephAlphaSymmetricSemanticEmbedding",
"SentenceTransformerEmbeddings",
"GooglePalmEmbeddings",
"MiniMaxEmbeddings",
"VertexAIEmbeddings",
"BedrockEmbeddings",
"DeepInfraEmbeddings",
"EdenAiEmbeddings",
"DashScopeEmbeddings",
"EmbaasEmbeddings",
"OctoAIEmbeddings",
"SpacyEmbeddings",
"NLPCloudEmbeddings",
"GPT4AllEmbeddings",
"XinferenceEmbeddings",
"LocalAIEmbeddings",
"AwaEmbeddings",
"HuggingFaceBgeEmbeddings",
"ErnieEmbeddings",
"JavelinAIGatewayEmbeddings",
"OllamaEmbeddings",
"QianfanEmbeddingsEndpoint",
"JohnSnowLabsEmbeddings",
"VoyageEmbeddings",
"BookendEmbeddings",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -1,91 +0,0 @@
from langchain_core.language_models.llms import BaseLLM
from langchain_community import llms
EXPECT_ALL = [
"AI21",
"AlephAlpha",
"AmazonAPIGateway",
"Anthropic",
"Anyscale",
"Arcee",
"Aviary",
"AzureMLOnlineEndpoint",
"Banana",
"Baseten",
"Beam",
"Bedrock",
"CTransformers",
"CTranslate2",
"CerebriumAI",
"ChatGLM",
"Clarifai",
"Cohere",
"Databricks",
"DeepInfra",
"DeepSparse",
"EdenAI",
"FakeListLLM",
"Fireworks",
"ForefrontAI",
"GigaChat",
"GPT4All",
"GooglePalm",
"GooseAI",
"GradientLLM",
"HuggingFaceEndpoint",
"HuggingFaceHub",
"HuggingFacePipeline",
"HuggingFaceTextGenInference",
"HumanInputLLM",
"KoboldApiLLM",
"LlamaCpp",
"TextGen",
"ManifestWrapper",
"Minimax",
"MlflowAIGateway",
"Modal",
"MosaicML",
"Nebula",
"NIBittensorLLM",
"NLPCloud",
"Ollama",
"OpenLLM",
"OpenLM",
"PaiEasEndpoint",
"Petals",
"PipelineAI",
"Predibase",
"PredictionGuard",
"PromptLayerOpenAI",
"PromptLayerOpenAIChat",
"OpaquePrompts",
"RWKV",
"Replicate",
"SagemakerEndpoint",
"SelfHostedHuggingFaceLLM",
"SelfHostedPipeline",
"StochasticAI",
"TitanTakeoff",
"TitanTakeoffPro",
"Tongyi",
"VertexAI",
"VertexAIModelGarden",
"VLLM",
"VLLMOpenAI",
"Writer",
"OctoAIEndpoint",
"Xinference",
"JavelinAIGateway",
"QianfanLLMEndpoint",
"YandexGPT",
"VolcEngineMaasLLM",
"WatsonxLLM",
]
def test_all_imports() -> None:
"""Simple test to make sure all things can be imported."""
for cls in llms.__all__:
assert issubclass(getattr(llms, cls), BaseLLM)
assert set(llms.__all__) == set(EXPECT_ALL)

View File

@@ -1,40 +0,0 @@
from typing import List, Type
from langchain_core.tools import BaseTool, StructuredTool
import langchain_community.tools
from langchain_community.tools import _DEPRECATED_TOOLS
from langchain_community.tools import __all__ as tools_all
_EXCLUDE = {
BaseTool,
StructuredTool,
}
def _get_tool_classes(skip_tools_without_default_names: bool) -> List[Type[BaseTool]]:
results = []
for tool_class_name in tools_all:
if tool_class_name in _DEPRECATED_TOOLS:
continue
# Resolve the str to the class
tool_class = getattr(langchain_community.tools, tool_class_name)
if isinstance(tool_class, type) and issubclass(tool_class, BaseTool):
if tool_class in _EXCLUDE:
continue
if (
skip_tools_without_default_names
and tool_class.__fields__["name"].default # type: ignore
in [None, ""]
):
continue
results.append(tool_class)
return results
def test_tool_names_unique() -> None:
"""Test that the default names for our core tools are unique."""
tool_classes = _get_tool_classes(skip_tools_without_default_names=True)
names = sorted([tool_cls.__fields__["name"].default for tool_cls in tool_classes])
duplicated_names = [name for name in names if names.count(name) > 1]
assert not duplicated_names

View File

@@ -1,728 +0,0 @@
"""Test FAISS functionality."""
import datetime
import math
import tempfile
import pytest
from typing import Union
from langchain_core.documents import Document
from langchain_community.docstore.base import Docstore
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores.faiss import FAISS
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
_PAGE_CONTENT = """This is a page about LangChain.
It is a really cool framework.
What isn't there to love about langchain?
Made in 2022."""
class FakeDocstore(Docstore):
"""Fake docstore for testing purposes."""
def search(self, search: str) -> Union[str, Document]:
"""Return the fake document."""
document = Document(page_content=_PAGE_CONTENT)
return document
@pytest.mark.requires("faiss")
def test_faiss() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foo", k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_afrom_texts() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foo", k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_faiss_vector_sim() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_by_vector(query_vec, k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_async_vector_sim() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_by_vector(query_vec, k=1)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_faiss_vector_sim_with_score_threshold() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_by_vector(query_vec, k=2, score_threshold=0.2)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_vector_async_sim_with_score_threshold() -> None:
"""Test vector similarity."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_by_vector(
query_vec, k=2, score_threshold=0.2
)
assert output == [Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_similarity_search_with_score_by_vector() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_with_score_by_vector(query_vec, k=1)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
@pytest.mark.requires("faiss")
async def test_similarity_async_search_with_score_by_vector() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_with_score_by_vector(query_vec, k=1)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
@pytest.mark.requires("faiss")
def test_similarity_search_with_score_by_vector_with_score_threshold() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.similarity_search_with_score_by_vector(
query_vec,
k=2,
score_threshold=0.2,
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
assert output[0][1] < 0.2
@pytest.mark.requires("faiss")
async def test_sim_asearch_with_score_by_vector_with_score_threshold() -> None:
"""Test vector similarity with score by vector."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
index_to_id = docsearch.index_to_docstore_id
expected_docstore = InMemoryDocstore(
{
index_to_id[0]: Document(page_content="foo"),
index_to_id[1]: Document(page_content="bar"),
index_to_id[2]: Document(page_content="baz"),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.asimilarity_search_with_score_by_vector(
query_vec,
k=2,
score_threshold=0.2,
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo")
assert output[0][1] < 0.2
@pytest.mark.requires("faiss")
def test_faiss_mmr() -> None:
texts = ["foo", "foo", "fou", "foy"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
query_vec = FakeEmbeddings().embed_query(text="foo")
# make sure we can have k > docstore size
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo")
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo")
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr() -> None:
texts = ["foo", "foo", "fou", "foy"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
query_vec = await FakeEmbeddings().aembed_query(text="foo")
# make sure we can have k > docstore size
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo")
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo")
@pytest.mark.requires("faiss")
def test_faiss_mmr_with_metadatas() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr_with_metadatas() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1
)
assert len(output) == len(texts)
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
def test_faiss_mmr_with_metadatas_and_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": 1}
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo", metadata={"page": 1})
assert output[0][1] == 0.0
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr_with_metadatas_and_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": 1}
)
assert len(output) == 1
assert output[0][0] == Document(page_content="foo", metadata={"page": 1})
assert output[0][1] == 0.0
@pytest.mark.requires("faiss")
def test_faiss_mmr_with_metadatas_and_list_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = FakeEmbeddings().embed_query(text="foo")
output = docsearch.max_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": [0, 1, 2]}
)
assert len(output) == 3
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
async def test_faiss_async_mmr_with_metadatas_and_list_filter() -> None:
texts = ["foo", "foo", "fou", "foy"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
query_vec = await FakeEmbeddings().aembed_query(text="foo")
output = await docsearch.amax_marginal_relevance_search_with_score_by_vector(
query_vec, k=10, lambda_mult=0.1, filter={"page": [0, 1, 2]}
)
assert len(output) == 3
assert output[0][0] == Document(page_content="foo", metadata={"page": 0})
assert output[0][1] == 0.0
assert output[1][0] != Document(page_content="foo", metadata={"page": 0})
@pytest.mark.requires("faiss")
def test_faiss_with_metadatas() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foo", k=1)
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
async def test_faiss_async_with_metadatas() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foo", k=1)
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
def test_faiss_with_metadatas_and_filter() -> None:
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foo", k=1, filter={"page": 1})
assert output == [Document(page_content="bar", metadata={"page": 1})]
@pytest.mark.requires("faiss")
async def test_faiss_async_with_metadatas_and_filter() -> None:
texts = ["foo", "bar", "baz"]
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foo", k=1, filter={"page": 1})
assert output == [Document(page_content="bar", metadata={"page": 1})]
@pytest.mark.requires("faiss")
def test_faiss_with_metadatas_and_list_filter() -> None:
texts = ["foo", "bar", "baz", "foo", "qux"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = FAISS.from_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
docsearch.index_to_docstore_id[3]: Document(
page_content="foo", metadata={"page": 3}
),
docsearch.index_to_docstore_id[4]: Document(
page_content="qux", metadata={"page": 3}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = docsearch.similarity_search("foor", k=1, filter={"page": [0, 1, 2]})
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
async def test_faiss_async_with_metadatas_and_list_filter() -> None:
texts = ["foo", "bar", "baz", "foo", "qux"]
metadatas = [{"page": i} if i <= 3 else {"page": 3} for i in range(len(texts))]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings(), metadatas=metadatas)
expected_docstore = InMemoryDocstore(
{
docsearch.index_to_docstore_id[0]: Document(
page_content="foo", metadata={"page": 0}
),
docsearch.index_to_docstore_id[1]: Document(
page_content="bar", metadata={"page": 1}
),
docsearch.index_to_docstore_id[2]: Document(
page_content="baz", metadata={"page": 2}
),
docsearch.index_to_docstore_id[3]: Document(
page_content="foo", metadata={"page": 3}
),
docsearch.index_to_docstore_id[4]: Document(
page_content="qux", metadata={"page": 3}
),
}
)
assert docsearch.docstore.__dict__ == expected_docstore.__dict__
output = await docsearch.asimilarity_search("foor", k=1, filter={"page": [0, 1, 2]})
assert output == [Document(page_content="foo", metadata={"page": 0})]
@pytest.mark.requires("faiss")
def test_faiss_search_not_found() -> None:
"""Test what happens when document is not found."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
# Get rid of the docstore to purposefully induce errors.
docsearch.docstore = InMemoryDocstore({})
with pytest.raises(ValueError):
docsearch.similarity_search("foo")
@pytest.mark.requires("faiss")
async def test_faiss_async_search_not_found() -> None:
"""Test what happens when document is not found."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
# Get rid of the docstore to purposefully induce errors.
docsearch.docstore = InMemoryDocstore({})
with pytest.raises(ValueError):
await docsearch.asimilarity_search("foo")
@pytest.mark.requires("faiss")
def test_faiss_add_texts() -> None:
"""Test end to end adding of texts."""
# Create initial doc store.
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
# Test adding a similar document as before.
docsearch.add_texts(["foo"])
output = docsearch.similarity_search("foo", k=2)
assert output == [Document(page_content="foo"), Document(page_content="foo")]
@pytest.mark.requires("faiss")
async def test_faiss_async_add_texts() -> None:
"""Test end to end adding of texts."""
# Create initial doc store.
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
# Test adding a similar document as before.
await docsearch.aadd_texts(["foo"])
output = await docsearch.asimilarity_search("foo", k=2)
assert output == [Document(page_content="foo"), Document(page_content="foo")]
@pytest.mark.requires("faiss")
def test_faiss_add_texts_not_supported() -> None:
"""Test adding of texts to a docstore that doesn't support it."""
docsearch = FAISS(FakeEmbeddings(), None, FakeDocstore(), {})
with pytest.raises(ValueError):
docsearch.add_texts(["foo"])
@pytest.mark.requires("faiss")
async def test_faiss_async_add_texts_not_supported() -> None:
"""Test adding of texts to a docstore that doesn't support it."""
docsearch = FAISS(FakeEmbeddings(), None, FakeDocstore(), {})
with pytest.raises(ValueError):
await docsearch.aadd_texts(["foo"])
@pytest.mark.requires("faiss")
def test_faiss_local_save_load() -> None:
"""Test end to end serialization."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(texts, FakeEmbeddings())
temp_timestamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
with tempfile.TemporaryDirectory(suffix="_" + temp_timestamp + "/") as temp_folder:
docsearch.save_local(temp_folder)
new_docsearch = FAISS.load_local(temp_folder, FakeEmbeddings())
assert new_docsearch.index is not None
@pytest.mark.requires("faiss")
async def test_faiss_async_local_save_load() -> None:
"""Test end to end serialization."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(texts, FakeEmbeddings())
temp_timestamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
with tempfile.TemporaryDirectory(suffix="_" + temp_timestamp + "/") as temp_folder:
docsearch.save_local(temp_folder)
new_docsearch = FAISS.load_local(temp_folder, FakeEmbeddings())
assert new_docsearch.index is not None
@pytest.mark.requires("faiss")
def test_faiss_similarity_search_with_relevance_scores() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = docsearch.similarity_search_with_relevance_scores("foo", k=1)
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
async def test_faiss_async_similarity_search_with_relevance_scores() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = await docsearch.asimilarity_search_with_relevance_scores("foo", k=1)
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
def test_faiss_similarity_search_with_relevance_scores_with_threshold() -> None:
"""Test the similarity search with normalized similarities with score threshold."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = docsearch.similarity_search_with_relevance_scores(
"foo", k=2, score_threshold=0.5
)
assert len(outputs) == 1
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
async def test_faiss_asimilarity_search_with_relevance_scores_with_threshold() -> None:
"""Test the similarity search with normalized similarities with score threshold."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(
texts,
FakeEmbeddings(),
relevance_score_fn=lambda score: 1.0 - score / math.sqrt(2),
)
outputs = await docsearch.asimilarity_search_with_relevance_scores(
"foo", k=2, score_threshold=0.5
)
assert len(outputs) == 1
output, score = outputs[0]
assert output == Document(page_content="foo")
assert score == 1.0
@pytest.mark.requires("faiss")
def test_faiss_invalid_normalize_fn() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = FAISS.from_texts(
texts, FakeEmbeddings(), relevance_score_fn=lambda _: 2.0
)
with pytest.warns(Warning, match="scores must be between"):
docsearch.similarity_search_with_relevance_scores("foo", k=1)
@pytest.mark.requires("faiss")
async def test_faiss_async_invalid_normalize_fn() -> None:
"""Test the similarity search with normalized similarities."""
texts = ["foo", "bar", "baz"]
docsearch = await FAISS.afrom_texts(
texts, FakeEmbeddings(), relevance_score_fn=lambda _: 2.0
)
with pytest.warns(Warning, match="scores must be between"):
await docsearch.asimilarity_search_with_relevance_scores("foo", k=1)
@pytest.mark.requires("faiss")
def test_missing_normalize_score_fn() -> None:
"""Test doesn't perform similarity search without a valid distance strategy."""
texts = ["foo", "bar", "baz"]
faiss_instance = FAISS.from_texts(texts, FakeEmbeddings(), distance_strategy="fake")
with pytest.raises(ValueError):
faiss_instance.similarity_search_with_relevance_scores("foo", k=2)
@pytest.mark.requires("faiss")
async def test_async_missing_normalize_score_fn() -> None:
"""Test doesn't perform similarity search without a valid distance strategy."""
texts = ["foo", "bar", "baz"]
faiss_instance = await FAISS.afrom_texts(
texts, FakeEmbeddings(), distance_strategy="fake"
)
with pytest.raises(ValueError):
await faiss_instance.asimilarity_search_with_relevance_scores("foo", k=2)
@pytest.mark.requires("faiss")
def test_delete() -> None:
"""Test the similarity search with normalized similarities."""
ids = ["a", "b", "c"]
docsearch = FAISS.from_texts(["foo", "bar", "baz"], FakeEmbeddings(), ids=ids)
docsearch.delete(ids[1:2])
result = docsearch.similarity_search("bar", k=2)
assert sorted([d.page_content for d in result]) == ["baz", "foo"]
assert docsearch.index_to_docstore_id == {0: ids[0], 1: ids[2]}
@pytest.mark.requires("faiss")
async def test_async_delete() -> None:
"""Test the similarity search with normalized similarities."""
ids = ["a", "b", "c"]
docsearch = await FAISS.afrom_texts(
["foo", "bar", "baz"], FakeEmbeddings(), ids=ids
)
docsearch.delete(ids[1:2])
result = await docsearch.asimilarity_search("bar", k=2)
assert sorted([d.page_content for d in result]) == ["baz", "foo"]
assert docsearch.index_to_docstore_id == {0: ids[0], 1: ids[2]}

View File

@@ -1,13 +0,0 @@
from langchain_community import vectorstores
from langchain_core.vectorstores import VectorStore
def test_all_imports() -> None:
"""Simple test to make sure all things can be imported."""
for cls in vectorstores.__all__:
if cls not in [
"AlibabaCloudOpenSearchSettings",
"ClickhouseSettings",
"MyScaleSettings",
]:
assert issubclass(getattr(vectorstores, cls), VectorStore)

View File

@@ -1,49 +0,0 @@
"""
**Utility functions** for LangChain.
These functions do not depend on any other LangChain module.
"""
from langchain_core.utils.env import get_from_dict_or_env, get_from_env
from langchain_core.utils.formatting import StrictFormatter, formatter
from langchain_core.utils.input import (
get_bolded_text,
get_color_mapping,
get_colored_text,
print_text,
)
from langchain_core.utils.loading import try_load_from_hub
from langchain_core.utils.strings import comma_list, stringify_dict, stringify_value
from langchain_core.utils.utils import (
build_extra_kwargs,
check_package_version,
convert_to_secret_str,
get_pydantic_field_names,
guard_import,
mock_now,
raise_for_status_with_text,
xor_args,
)
__all__ = [
"StrictFormatter",
"check_package_version",
"convert_to_secret_str",
"formatter",
"get_bolded_text",
"get_color_mapping",
"get_colored_text",
"get_pydantic_field_names",
"guard_import",
"mock_now",
"print_text",
"raise_for_status_with_text",
"xor_args",
"try_load_from_hub",
"build_extra_kwargs",
"get_from_env",
"get_from_dict_or_env",
"stringify_dict",
"comma_list",
"stringify_value",
]

View File

@@ -1,45 +0,0 @@
from __future__ import annotations
import os
from typing import Any, Dict, Optional
def env_var_is_set(env_var: str) -> bool:
"""Check if an environment variable is set.
Args:
env_var (str): The name of the environment variable.
Returns:
bool: True if the environment variable is set, False otherwise.
"""
return env_var in os.environ and os.environ[env_var] not in (
"",
"0",
"false",
"False",
)
def get_from_dict_or_env(
data: Dict[str, Any], key: str, env_key: str, default: Optional[str] = None
) -> str:
"""Get a value from a dictionary or an environment variable."""
if key in data and data[key]:
return data[key]
else:
return get_from_env(key, env_key, default=default)
def get_from_env(key: str, env_key: str, default: Optional[str] = None) -> str:
"""Get a value from a dictionary or an environment variable."""
if env_key in os.environ and os.environ[env_key]:
return os.environ[env_key]
elif default is not None:
return default
else:
raise ValueError(
f"Did not find {key}, please add an environment variable"
f" `{env_key}` which contains it, or pass"
f" `{key}` as a named parameter."
)

View File

@@ -1,28 +0,0 @@
from langchain_core.utils import __all__
EXPECTED_ALL = [
"StrictFormatter",
"check_package_version",
"convert_to_secret_str",
"formatter",
"get_bolded_text",
"get_color_mapping",
"get_colored_text",
"get_pydantic_field_names",
"guard_import",
"mock_now",
"print_text",
"raise_for_status_with_text",
"xor_args",
"try_load_from_hub",
"build_extra_kwargs",
"get_from_dict_or_env",
"get_from_env",
"stringify_dict",
"comma_list",
"stringify_value"
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -1,83 +0,0 @@
"""**Callback handlers** allow listening to events in LangChain.
**Class hierarchy:**
.. code-block::
BaseCallbackHandler --> <name>CallbackHandler # Example: AimCallbackHandler
"""
from langchain_core.callbacks import StdOutCallbackHandler, StreamingStdOutCallbackHandler
from langchain_core.tracers.langchain import LangChainTracer
from langchain_core.tracers.context import (
collect_runs,
tracing_enabled,
tracing_v2_enabled,
)
from langchain_community.callbacks.aim_callback import AimCallbackHandler
from langchain_community.callbacks.argilla_callback import ArgillaCallbackHandler
from langchain_community.callbacks.arize_callback import ArizeCallbackHandler
from langchain_community.callbacks.arthur_callback import ArthurCallbackHandler
from langchain_community.callbacks.clearml_callback import ClearMLCallbackHandler
from langchain_community.callbacks.comet_ml_callback import CometCallbackHandler
from langchain_community.callbacks.context_callback import ContextCallbackHandler
from langchain_community.callbacks.file import FileCallbackHandler
from langchain_community.callbacks.flyte_callback import FlyteCallbackHandler
from langchain_community.callbacks.human import HumanApprovalCallbackHandler
from langchain_community.callbacks.infino_callback import InfinoCallbackHandler
from langchain_community.callbacks.labelstudio_callback import LabelStudioCallbackHandler
from langchain_community.callbacks.llmonitor_callback import LLMonitorCallbackHandler
from langchain_community.callbacks.mlflow_callback import MlflowCallbackHandler
from langchain_community.callbacks.openai_info import OpenAICallbackHandler
from langchain_community.callbacks.promptlayer_callback import PromptLayerCallbackHandler
from langchain_community.callbacks.sagemaker_callback import SageMakerCallbackHandler
from langchain_community.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain_community.callbacks.streaming_stdout_final_only import (
FinalStreamingStdOutCallbackHandler,
)
from langchain_community.callbacks.streamlit import LLMThoughtLabeler, StreamlitCallbackHandler
from langchain_community.callbacks.trubrics_callback import TrubricsCallbackHandler
from langchain_community.callbacks.wandb_callback import WandbCallbackHandler
from langchain_community.callbacks.whylabs_callback import WhyLabsCallbackHandler
from langchain_community.callbacks.manager import (
get_openai_callback,
wandb_tracing_enabled,
)
__all__ = [
"AimCallbackHandler",
"ArgillaCallbackHandler",
"ArizeCallbackHandler",
"PromptLayerCallbackHandler",
"ArthurCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"ContextCallbackHandler",
"FileCallbackHandler",
"HumanApprovalCallbackHandler",
"InfinoCallbackHandler",
"MlflowCallbackHandler",
"LLMonitorCallbackHandler",
"OpenAICallbackHandler",
"StdOutCallbackHandler",
"AsyncIteratorCallbackHandler",
"StreamingStdOutCallbackHandler",
"FinalStreamingStdOutCallbackHandler",
"LLMThoughtLabeler",
"LangChainTracer",
"StreamlitCallbackHandler",
"WandbCallbackHandler",
"WhyLabsCallbackHandler",
"get_openai_callback",
"tracing_enabled",
"tracing_v2_enabled",
"collect_runs",
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
"TrubricsCallbackHandler",
]

View File

@@ -1,68 +0,0 @@
from __future__ import annotations
from langchain_core.callbacks.manager import (
AsyncCallbackManager,
AsyncCallbackManagerForChainGroup,
AsyncCallbackManagerForChainRun,
AsyncCallbackManagerForLLMRun,
AsyncCallbackManagerForRetrieverRun,
AsyncCallbackManagerForToolRun,
AsyncParentRunManager,
AsyncRunManager,
BaseRunManager,
CallbackManager,
CallbackManagerForChainGroup,
CallbackManagerForChainRun,
CallbackManagerForLLMRun,
CallbackManagerForRetrieverRun,
CallbackManagerForToolRun,
Callbacks,
ParentRunManager,
RunManager,
ahandle_event,
atrace_as_chain_group,
handle_event,
trace_as_chain_group,
)
from langchain_core.tracers.context import (
collect_runs,
tracing_enabled,
tracing_v2_enabled,
)
from langchain_core.utils.env import env_var_is_set
from langchain_community.callbacks.manager import (
get_openai_callback,
wandb_tracing_enabled,
)
__all__ = [
"BaseRunManager",
"RunManager",
"ParentRunManager",
"AsyncRunManager",
"AsyncParentRunManager",
"CallbackManagerForLLMRun",
"AsyncCallbackManagerForLLMRun",
"CallbackManagerForChainRun",
"AsyncCallbackManagerForChainRun",
"CallbackManagerForToolRun",
"AsyncCallbackManagerForToolRun",
"CallbackManagerForRetrieverRun",
"AsyncCallbackManagerForRetrieverRun",
"CallbackManager",
"CallbackManagerForChainGroup",
"AsyncCallbackManager",
"AsyncCallbackManagerForChainGroup",
"tracing_enabled",
"tracing_v2_enabled",
"collect_runs",
"atrace_as_chain_group",
"trace_as_chain_group",
"handle_event",
"ahandle_event",
"Callbacks",
"env_var_is_set",
"get_openai_callback",
"wandb_tracing_enabled",
]

View File

@@ -1,36 +0,0 @@
from langchain.callbacks.manager import __all__
EXPECTED_ALL = [
"BaseRunManager",
"RunManager",
"ParentRunManager",
"AsyncRunManager",
"AsyncParentRunManager",
"CallbackManagerForLLMRun",
"AsyncCallbackManagerForLLMRun",
"CallbackManagerForChainRun",
"AsyncCallbackManagerForChainRun",
"CallbackManagerForToolRun",
"AsyncCallbackManagerForToolRun",
"CallbackManagerForRetrieverRun",
"AsyncCallbackManagerForRetrieverRun",
"CallbackManager",
"CallbackManagerForChainGroup",
"AsyncCallbackManager",
"AsyncCallbackManagerForChainGroup",
"tracing_enabled",
"tracing_v2_enabled",
"collect_runs",
"atrace_as_chain_group",
"trace_as_chain_group",
"handle_event",
"ahandle_event",
"env_var_is_set",
"Callbacks",
"get_openai_callback",
"wandb_tracing_enabled",
]
def test_all_imports() -> None:
assert set(__all__) == set(EXPECTED_ALL)

View File

@@ -1,75 +0,0 @@
"""Test LLM chain."""
from tempfile import TemporaryDirectory
from typing import Dict, List, Union
from unittest.mock import patch
import pytest
from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from tests.unit_tests.llms.fake_llm import FakeLLM
class FakeOutputParser(BaseOutputParser):
"""Fake output parser class for testing."""
def parse(self, text: str) -> Union[str, List[str], Dict[str, str]]:
"""Parse by splitting."""
return text.split()
@pytest.fixture
def fake_llm_chain() -> LLMChain:
"""Fake LLM chain for testing purposes."""
prompt = PromptTemplate(input_variables=["bar"], template="This is a {bar}:")
return LLMChain(prompt=prompt, llm=FakeLLM(), output_key="text1")
@patch(
"langchain_community.llms.loading.get_type_to_cls_dict",
lambda: {"fake": lambda: FakeLLM},
)
def test_serialization(fake_llm_chain: LLMChain) -> None:
"""Test serialization."""
from langchain.chains.loading import load_chain
with TemporaryDirectory() as temp_dir:
file = temp_dir + "/llm.json"
fake_llm_chain.save(file)
loaded_chain = load_chain(file)
assert loaded_chain == fake_llm_chain
def test_missing_inputs(fake_llm_chain: LLMChain) -> None:
"""Test error is raised if inputs are missing."""
with pytest.raises(ValueError):
fake_llm_chain({"foo": "bar"})
def test_valid_call(fake_llm_chain: LLMChain) -> None:
"""Test valid call of LLM chain."""
output = fake_llm_chain({"bar": "baz"})
assert output == {"bar": "baz", "text1": "foo"}
# Test with stop words.
output = fake_llm_chain({"bar": "baz", "stop": ["foo"]})
# Response should be `bar` now.
assert output == {"bar": "baz", "stop": ["foo"], "text1": "bar"}
def test_predict_method(fake_llm_chain: LLMChain) -> None:
"""Test predict method works."""
output = fake_llm_chain.predict(bar="baz")
assert output == "foo"
def test_predict_and_parse() -> None:
"""Test parsing ability."""
prompt = PromptTemplate(
input_variables=["foo"], template="{foo}", output_parser=FakeOutputParser()
)
llm = FakeLLM(queries={"foo": "foo bar"})
chain = LLMChain(prompt=prompt, llm=llm)
output = chain.predict_and_parse(foo="foo")
assert output == ["foo", "bar"]

View File

@@ -1,20 +0,0 @@
from langchain_openai.chat_models import AzureChatOpenAI, ChatOpenAI, _import_tiktoken
from langchain_openai.embeddings import AzureOpenAIEmbeddings, OpenAIEmbeddings
from langchain_openai.functions import (
convert_pydantic_to_openai_function,
convert_pydantic_to_openai_tool,
)
from langchain_openai.llms import AzureOpenAI, BaseOpenAI, OpenAI
__all__ = [
"_import_tiktoken",
"OpenAI",
"AzureOpenAI",
"ChatOpenAI",
"AzureChatOpenAI",
"OpenAIEmbeddings",
"AzureOpenAIEmbeddings",
"convert_pydantic_to_openai_function",
"convert_pydantic_to_openai_tool",
"BaseOpenAI",
]

View File

@@ -1,17 +0,0 @@
from langchain_openai.chat_models.azure import AzureChatOpenAI
from langchain_openai.chat_models.base import (
ChatOpenAI,
_convert_delta_to_message_chunk,
_create_retry_decorator,
_import_tiktoken,
acompletion_with_retry,
)
__all__ = [
"_create_retry_decorator",
"acompletion_with_retry",
"_convert_delta_to_message_chunk",
"_import_tiktoken",
"ChatOpenAI",
"AzureChatOpenAI",
]

View File

@@ -1,21 +0,0 @@
from langchain_openai.embeddings.azure import AzureOpenAIEmbeddings
from langchain_openai.embeddings.base import (
OpenAIEmbeddings,
_async_retry_decorator,
_check_response,
_create_retry_decorator,
_is_openai_v1,
async_embed_with_retry,
embed_with_retry,
)
__all__ = [
"_create_retry_decorator",
"_async_retry_decorator",
"_check_response",
"embed_with_retry",
"async_embed_with_retry",
"_is_openai_v1",
"OpenAIEmbeddings",
"AzureOpenAIEmbeddings",
]

View File

@@ -1,27 +0,0 @@
from langchain_openai.llms.base import (
AzureOpenAI,
BaseOpenAI,
OpenAI,
OpenAIChat,
_create_retry_decorator,
_stream_response_to_generation_chunk,
_streaming_response_template,
_update_response,
acompletion_with_retry,
completion_with_retry,
update_token_usage,
)
__all__ = [
"update_token_usage",
"_stream_response_to_generation_chunk",
"_update_response",
"_streaming_response_template",
"_create_retry_decorator",
"completion_with_retry",
"acompletion_with_retry",
"OpenAIChat",
"OpenAI",
"AzureOpenAI",
"BaseOpenAI",
]

View File

@@ -1,133 +0,0 @@
import asyncio
import os
from typing import Any
from unittest.mock import MagicMock, patch
import pytest
from langchain_core.language_models import llms as base
from tenacity import wait_none
from langchain_openai.llms import OpenAI
from langchain_openai.utils import is_openai_v1
os.environ["OPENAI_API_KEY"] = "foo"
def _openai_v1_installed() -> bool:
try:
return is_openai_v1()
except Exception as _:
return False
def test_openai_model_param() -> None:
llm = OpenAI(model="foo")
assert llm.model_name == "foo"
llm = OpenAI(model_name="foo")
assert llm.model_name == "foo"
def test_openai_model_kwargs() -> None:
llm = OpenAI(model_kwargs={"foo": "bar"})
assert llm.model_kwargs == {"foo": "bar"}
def test_openai_invalid_model_kwargs() -> None:
with pytest.raises(ValueError):
OpenAI(model_kwargs={"model_name": "foo"})
def test_openai_incorrect_field() -> None:
with pytest.warns(match="not default parameter"):
llm = OpenAI(foo="bar")
assert llm.model_kwargs == {"foo": "bar"}
@pytest.fixture
def mock_completion() -> dict:
return {
"id": "cmpl-3evkmQda5Hu7fcZavknQda3SQ",
"object": "text_completion",
"created": 1689989000,
"model": "text-davinci-003",
"choices": [
{"text": "Bar Baz", "index": 0, "logprobs": None, "finish_reason": "length"}
],
"usage": {"prompt_tokens": 1, "completion_tokens": 2, "total_tokens": 3},
}
def _patched_retry(*args: Any, **kwargs: Any) -> Any:
"""Patched retry for unit tests that does not wait."""
from tenacity import retry
assert "wait" in kwargs
kwargs["wait"] = wait_none()
r = retry(*args, **kwargs)
return r
@pytest.mark.skipif(
_openai_v1_installed(), reason="Retries only handled by LangChain for openai<1"
)
def test_openai_retries(mock_completion: dict) -> None:
llm = OpenAI()
mock_client = MagicMock()
completed = False
raised = False
import openai
def raise_once(*args: Any, **kwargs: Any) -> Any:
nonlocal completed, raised
if not raised:
raised = True
raise openai.error.APIError
completed = True
return mock_completion
mock_client.create = raise_once
# Patch the retry to avoid waiting during a unit test
with patch.object(base, "retry", _patched_retry):
with patch.object(
llm,
"client",
mock_client,
):
res = llm.predict("bar")
assert res == "Bar Baz"
assert completed
assert raised
@pytest.mark.skipif(
_openai_v1_installed(), reason="Retries only handled by LangChain for openai<1"
)
async def test_openai_async_retries(mock_completion: dict) -> None:
llm = OpenAI()
mock_client = MagicMock()
completed = False
raised = False
import openai
async def araise_once(*args: Any, **kwargs: Any) -> Any:
nonlocal completed, raised
if not raised:
raised = True
raise openai.error.APIError
await asyncio.sleep(0)
completed = True
return mock_completion
mock_client.acreate = araise_once
# Patch the retry to avoid waiting during a unit test
with patch.object(base, "retry", _patched_retry):
with patch.object(
llm,
"client",
mock_client,
):
res = await llm.apredict("bar")
assert res == "Bar Baz"
assert completed
assert raised

View File

@@ -1,309 +0,0 @@
#!/bin/bash
cd libs
git checkout master -- langchain/{langchain,tests}
git checkout master -- core/{langchain_core,tests}
git checkout master -- experimental/{langchain_experimental,tests}
rm -rf community/{langchain_community,tests}
rm -rf partners/openai/{langchain_openai,tests}
mkdir -p community/langchain_community
touch community/langchain_community/__init__.py
touch community/README.md
mkdir -p community/tests
touch community/tests/__init__.py
mkdir community/tests/unit_tests
touch community/tests/unit_tests/__init__.py
mkdir community/tests/integration_tests/
touch community/tests/integration_tests/__init__.py
mkdir -p community/langchain_community/utils
touch community/langchain_community/utils/__init__.py
mkdir -p community/tests/unit_tests/utils
touch community/tests/unit_tests/utils/__init__.py
mkdir -p community/langchain_community/indexes
touch community/langchain_community/indexes/__init__.py
mkdir community/tests/unit_tests/indexes
touch community/tests/unit_tests/indexes/__init__.py
mkdir -p partners/openai/langchain_openai/{chat_models,llms,embeddings}
mkdir -p partners/openai/tests/{unit_tests,integration_tests}/llms
mkdir -p partners/openai/tests/{unit_tests,integration_tests}/chat_models
mkdir -p partners/openai/tests/{unit_tests,integration_tests}/embeddings
mkdir -p partners/openai/tests/integration_tests/adapters
touch partners/openai/langchain_openai/__init__.py
touch partners/openai/README.md
touch partners/openai/langchain_openai/{llms,chat_models,embeddings}/__init__.py
touch partners/openai/tests/{unit_tests,integration_tests}/__init__.py
touch partners/openai/tests/{unit_tests,integration_tests}/chat_models/__init__.py
touch partners/openai/tests/{unit_tests,integration_tests}/llms/__init__.py
touch partners/openai/tests/{unit_tests,integration_tests}/embeddings/__init__.py
cd langchain
git grep -l 'from langchain.pydantic_v1' | xargs sed -i '' 's/from langchain.pydantic_v1/from langchain_core.pydantic_v1/g'
git grep -l 'from langchain.tools.base' | xargs sed -i '' 's/from langchain.tools.base/from langchain_core.tools/g'
git grep -l 'from langchain.chat_models.base' | xargs sed -i '' 's/from langchain.chat_models.base/from langchain_core.language_models.chat_models/g'
git grep -l 'from langchain.llms.base' | xargs sed -i '' 's/from langchain.llms.base/from langchain_core.language_models.llms/g'
git grep -l 'from langchain.embeddings.base' | xargs sed -i '' 's/from langchain.embeddings.base/from langchain_core.embeddings/g'
git grep -l 'from langchain.vectorstores.base' | xargs sed -i '' 's/from langchain.vectorstores.base/from langchain_core.vectorstores/g'
git grep -l 'from langchain.agents.tools' | xargs sed -i '' 's/from langchain.agents.tools/from langchain_core.tools/g'
git grep -l 'from langchain.schema.output' | xargs sed -i '' 's/from langchain.schema.output/from langchain_core.outputs/g'
git grep -l 'from langchain.schema.messages' | xargs sed -i '' 's/from langchain.schema.messages/from langchain_core.messages/g'
git grep -l 'from langchain.schema.embeddings' | xargs sed -i '' 's/from langchain.schema.embeddings/from langchain_core.embeddings/g'
cd ..
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/indexes/base.py community/langchain_community/indexes
mv langchain/langchain/indexes/_sql_record_manager.py community/langchain_community/indexes
mv langchain/langchain/utils/math.py community/langchain_community/utils
mv langchain/langchain/utils/openai.py partners/openai/langchain_openai/utils.py
mv langchain/langchain/utils/openai_functions.py partners/openai/langchain_openai/functions.py
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
mv langchain/tests/unit_tests/chat_loaders community/tests/unit_tests
mv langchain/tests/unit_tests/document_loaders community/tests/unit_tests
mv langchain/tests/unit_tests/docstore community/tests/unit_tests
mv langchain/tests/unit_tests/document_transformers community/tests/unit_tests
mv langchain/tests/unit_tests/embeddings community/tests/unit_tests
mv langchain/tests/unit_tests/graphs community/tests/unit_tests
mv langchain/tests/unit_tests/llms community/tests/unit_tests
mv langchain/tests/unit_tests/chat_models community/tests/unit_tests
mv langchain/tests/unit_tests/memory/chat_message_histories community/tests/unit_tests
mv langchain/tests/unit_tests/storage community/tests/unit_tests
mv langchain/tests/unit_tests/tools community/tests/unit_tests
mv langchain/tests/unit_tests/utilities community/tests/unit_tests
mv langchain/tests/unit_tests/vectorstores community/tests/unit_tests
mv langchain/tests/unit_tests/callbacks community/tests/unit_tests
mv langchain/tests/unit_tests/indexes/test_sql_record_manager.py community/tests/unit_tests/indexes
mv langchain/tests/unit_tests/utils/test_math.py community/tests/unit_tests/utils
mkdir -p langchain/tests/unit_tests/llms
cp {community,langchain}/tests/unit_tests/llms/fake_llm.py
cp {community,langchain}/tests/unit_tests/llms/fake_chat_model.py
mkdir -p langchain/tests/unit_tests/callbacks
cp {community,langchain}/tests/unit_tests/callbacks/fake_callback_handler.py
mv langchain/tests/unit_tests/utils/test_json_schema.py core/tests/unit_tests/utils
mv langchain/tests/unit_tests/utils/test_html.py core/tests/unit_tests/utils
mv langchain/tests/integration_tests/document_loaders community/tests/integration_tests
mv langchain/tests/integration_tests/embeddings community/tests/integration_tests
mv langchain/tests/integration_tests/graphs community/tests/integration_tests
mv langchain/tests/integration_tests/llms community/tests/integration_tests
mv langchain/tests/integration_tests/chat_models community/tests/integration_tests
mv langchain/tests/integration_tests/memory/chat_message_histories community/tests/integration_tests
mv langchain/tests/integration_tests/storage community/tests/integration_tests
mv langchain/tests/integration_tests/tools community/tests/integration_tests
mv langchain/tests/integration_tests/utilities community/tests/integration_tests
mv langchain/tests/integration_tests/vectorstores community/tests/integration_tests
mv langchain/tests/integration_tests/adapters community/tests/integration_tests
mv langchain/tests/integration_tests/callbacks community/tests/integration_tests
mv langchain/tests/integration_tests/{test_kuzu,test_nebulagraph}.py community/tests/integration_tests/graphs
touch community/tests/integration_tests/{chat_message_histories,tools}/__init__.py
git grep -l 'from langchain.utils.json_schema' | xargs sed -i '' 's/from langchain.utils.json_schema/from langchain_core.utils.json_schema/g'
git grep -l 'from langchain.utils.html' | xargs sed -i '' 's/from langchain.utils.html/from langchain_core.utils.html/g'
git grep -l 'from langchain.utils.strings' | xargs sed -i '' 's/from langchain.utils.strings/from langchain_core.utils.strings/g'
git grep -l 'from langchain.utils.env' | xargs sed -i '' 's/from langchain.utils.env/from langchain_core.utils.env/g'
git add community
cd community
git grep -l 'from langchain.pydantic_v1' | xargs sed -i '' 's/from langchain.pydantic_v1/from langchain_core.pydantic_v1/g'
git grep -l 'from langchain.callbacks.base' | xargs sed -i '' 's/from langchain.callbacks.base/from langchain_core.callbacks/g'
git grep -l 'from langchain.callbacks.stdout' | xargs sed -i '' 's/from langchain.callbacks.stdout/from langchain_core.callbacks/g'
git grep -l 'from langchain.callbacks.streaming_stdout' | xargs sed -i '' 's/from langchain.callbacks.streaming_stdout/from langchain_core.callbacks/g'
git grep -l 'from langchain.callbacks.manager' | xargs sed -i '' 's/from langchain.callbacks.manager/from langchain_core.callbacks/g'
git grep -l 'from langchain.callbacks.tracers.base' | xargs sed -i '' 's/from langchain.callbacks.tracers.base/from langchain_core.tracers/g'
git grep -l 'from langchain.tools.base' | xargs sed -i '' 's/from langchain.tools.base/from langchain_core.tools/g'
git grep -l 'from langchain.agents.tools' | xargs sed -i '' 's/from langchain.agents.tools/from langchain_core.tools/g'
git grep -l 'from langchain.schema.output' | xargs sed -i '' 's/from langchain.schema.output/from langchain_core.outputs/g'
git grep -l 'from langchain.schema.messages' | xargs sed -i '' 's/from langchain.schema.messages/from langchain_core.messages/g'
git grep -l 'from langchain.utils.math' | xargs sed -i '' 's/from langchain.utils.math/from langchain_community.utils.math/g'
git grep -l 'from langchain.utils.openai_functions' | xargs sed -i '' 's/from langchain.utils.openai_functions/from langchain_openai.functions/g'
git grep -l 'from langchain.utils.openai' | xargs sed -i '' 's/from langchain.utils.openai/from langchain_openai.utils/g'
git grep -l 'from langchain.chat_models.openai' | xargs sed -i '' 's/from\ langchain.chat_models.openai/from langchain_openai.chat_models/g'
git grep -l 'from langchain.chat_models import ChatOpenAI' | xargs sed -i '' 's/from\ langchain.chat_models\ import\ ChatOpenAI/from langchain_openai.chat_models import ChatOpenAI/g'
git grep -l 'from langchain.chat_models import AzureChatOpenAI' | xargs sed -i '' 's/from\ langchain.chat_models\ import\ AzureChatOpenAI/from langchain_openai.chat_models import AzureChatOpenAI/g'
git grep -l 'from langchain.llms import OpenAI' | xargs sed -i '' 's/from\ langchain.llms\ import\ OpenAI/from langchain_openai.llms import OpenAI/g'
git grep -l 'from langchain.llms import AzureOpenAI' | xargs sed -i '' 's/from\ langchain.llms\ import\ AzureOpenAI/from langchain_openai.llms import AzureOpenAI/g'
git grep -l 'from langchain.embeddings import OpenAIEmbeddings' | xargs sed -i '' 's/from\ langchain.embeddings\ import\ OpenAIEmbeddings/from langchain_openai.embeddings import OpenAIEmbeddings/g'
git grep -l 'from langchain.embeddings import AzureOpenAIEmbeddings' | xargs sed -i '' 's/from\ langchain.embeddings\ import\ AzureOpenAIEmbeddings/from langchain_openai.embeddings import AzureOpenAIEmbeddings/g'
git grep -l 'from langchain.chat_models.azure_openai' | xargs sed -i '' 's/from\ langchain.chat_models.azure_openai/from langchain_openai.chat_models/g'
git grep -l 'from langchain.embeddings.openai' | xargs sed -i '' 's/from langchain.embeddings.openai/from langchain_openai.embeddings/g'
git grep -l 'from langchain.embeddings.azure_openai' | xargs sed -i '' 's/from langchain.embeddings.azure_openai/from langchain_openai.embeddings/g'
git grep -l 'from langchain.adapters.openai' | xargs sed -i '' 's/from langchain.adapters.openai/from langchain_openai.adapters/g'
git grep -l 'from langchain.adapters import openai' | xargs sed -i '' 's/from\ langchain.adapters\ import\ openai/from\ langchain_openai\ import\ adapters/g'
git grep -l 'from langchain.llms.openai' | xargs sed -i '' 's/from langchain.llms.openai/from langchain_openai.llms/g'
git grep -l 'from langchain.utils' | xargs sed -i '' 's/from langchain.utils/from langchain_core.utils/g'
git grep -l 'from langchain\.' | xargs sed -i '' 's/from langchain\./from langchain_community./g'
git grep -l 'from langchain_community.memory.chat_message_histories' | xargs sed -i '' 's/from langchain_community.memory.chat_message_histories/from langchain_community.chat_message_histories/g'
git grep -l 'from langchain_community.agents.agent_toolkits' | xargs sed -i '' 's/from langchain_community.agents.agent_toolkits/from langchain_community.agent_toolkits/g'
git grep -l 'from langchain_community\.text_splitter' | xargs sed -i '' 's/from langchain_community\.text_splitter/from langchain.text_splitter/g'
git grep -l 'from langchain_community\.chains' | xargs sed -i '' 's/from langchain_community\.chains/from langchain.chains/g'
git grep -l 'from langchain_community\.agents' | xargs sed -i '' 's/from langchain_community\.agents/from langchain.agents/g'
git grep -l 'from langchain_community\.memory' | xargs sed -i '' 's/from langchain_community\.memory/from langchain.memory/g'
git grep -l 'langchain\.__version__' | xargs sed -i '' 's/langchain\.__version__/langchain_community.__version__/g'
git grep -l 'langchain\.document_loaders' | xargs sed -i '' 's/langchain\.document_loaders/langchain_community.document_loaders/g'
git grep -l 'langchain\.callbacks' | xargs sed -i '' 's/langchain\.callbacks/langchain_community.callbacks/g'
git grep -l 'langchain\.tools' | xargs sed -i '' 's/langchain\.tools/langchain_community.tools/g'
git grep -l 'langchain\.llms' | xargs sed -i '' 's/langchain\.llms/langchain_community.llms/g'
git grep -l 'import langchain$' | xargs sed -i '' 's/import\ langchain$/import\ langchain_community/g'
git grep -l 'from\ langchain\ ' | xargs sed -i '' 's/from\ langchain\ /from\ langchain_community\ /g'
git grep -l 'langchain_core.language_models.llmsten' | xargs sed -i '' 's/langchain_core.language_models.llmsten/langchain_community.llms.baseten/g'
cd ..
git checkout master -- langchain/langchain
cd langchain
python ../../.scripts/community_split/update_imports.py langchain/chat_loaders langchain_community.chat_loaders
python ../../.scripts/community_split/update_imports.py langchain/callbacks langchain_community.callbacks
python ../../.scripts/community_split/update_imports.py langchain/document_loaders langchain_community.document_loaders
python ../../.scripts/community_split/update_imports.py langchain/docstore langchain_community.docstore
python ../../.scripts/community_split/update_imports.py langchain/document_transformers langchain_community.document_transformers
python ../../.scripts/community_split/update_imports.py langchain/embeddings langchain_community.embeddings
python ../../.scripts/community_split/update_imports.py langchain/graphs langchain_community.graphs
python ../../.scripts/community_split/update_imports.py langchain/llms langchain_community.llms
python ../../.scripts/community_split/update_imports.py langchain/chat_models langchain_community.chat_models
python ../../.scripts/community_split/update_imports.py langchain/memory/chat_message_histories langchain_community.chat_message_histories
python ../../.scripts/community_split/update_imports.py langchain/storage langchain_community.storage
python ../../.scripts/community_split/update_imports.py langchain/tools langchain_community.tools
python ../../.scripts/community_split/update_imports.py langchain/utilities langchain_community.utilities
python ../../.scripts/community_split/update_imports.py langchain/vectorstores langchain_community.vectorstores
python ../../.scripts/community_split/update_imports.py langchain/adapters langchain_community.adapters
python ../../.scripts/community_split/update_imports.py langchain/agents/agent_toolkits langchain_community.agent_toolkits
python ../../.scripts/community_split/update_imports.py langchain/cache.py langchain_community.cache
python ../../.scripts/community_split/update_imports.py langchain/utils/math.py langchain_community.utils.math
python ../../.scripts/community_split/update_imports.py langchain/utils/json_schema.py langchain_core.utils.json_schema
python ../../.scripts/community_split/update_imports.py langchain/utils/html.py langchain_core.utils.html
python ../../.scripts/community_split/update_imports.py langchain/utils/env.py langchain_core.utils.env
python ../../.scripts/community_split/update_imports.py langchain/utils/strings.py langchain_core.utils.strings
python ../../.scripts/community_split/update_imports.py langchain/utils/openai.py langchain_openai.utils
python ../../.scripts/community_split/update_imports.py langchain/utils/openai_functions.py langchain_openai.functions
git grep -l 'from langchain.llms.base ' | xargs sed -i '' 's/from langchain.llms.base /from langchain_core.language_models.llms /g'
git grep -l 'from langchain.chat_models.base ' | xargs sed -i '' 's/from langchain.chat_models.base /from langchain_core.language_models.chat_models /g'
git grep -l 'from langchain.tools.base' | xargs sed -i '' 's/from langchain.tools.base/from langchain_core.tools/g'
git grep -l 'from langchain_community.llms.openai' | xargs sed -i '' 's/from langchain_community.llms.openai/from langchain_openai.llms/g'
git grep -l 'from langchain_community.chat_models.openai' | xargs sed -i '' 's/from langchain_community.chat_models.openai/from langchain_openai.chat_models/g'
git grep -l 'from langchain_community.chat_models.azure_openai' | xargs sed -i '' 's/from langchain_community.chat_models.azure_openai/from langchain_openai.chat_models/g'
git grep -l 'from langchain_community.embeddings.openai' | xargs sed -i '' 's/from langchain_community.embeddings.openai/from langchain_openai.embeddings/g'
git grep -l 'from langchain_community.embeddings.azure_openai' | xargs sed -i '' 's/from langchain_community.embeddings.azure_openai/from langchain_openai.embeddings/g'
git grep -l 'langchain_core.language_models.llmsten' | xargs sed -i '' 's/langchain_core.language_models.llmsten/langchain_community.llms.baseten/g'
cd ..
mv community/langchain_community/utilities/loading.py langchain/langchain/utilities
mv community/langchain_community/utilities/asyncio.py langchain/langchain/utilities
mv community/langchain_community/chat_models/openai.py partners/openai/langchain_openai/chat_models/base.py
mv community/langchain_community/chat_models/azure_openai.py partners/openai/langchain_openai/chat_models/azure.py
mv community/langchain_community/llms/openai.py partners/openai/langchain_openai/llms/base.py
mv community/langchain_community/embeddings/openai.py partners/openai/langchain_openai/embeddings/base.py
mv community/langchain_community/embeddings/azure_openai.py partners/openai/langchain_openai/embeddings/azure.py
mv community/langchain_community/adapters/openai.py partners/openai/langchain_openai/adapters.py
mv community/tests/unit_tests/chat_models/test_openai.py partners/openai/tests/unit_tests/chat_models/test_base.py
mv community/tests/integration_tests/chat_models/test_openai.py partners/openai/tests/integration_tests/chat_models/test_base.py
mv community/tests/unit_tests/chat_models/test_azureopenai.py partners/openai/tests/unit_tests/chat_models/test_azure.py
mv community/tests/integration_tests/chat_models/test_azure_openai.py partners/openai/tests/integration_tests/chat_models/test_azure.py
mv community/tests/unit_tests/llms/test_openai.py partners/openai/tests/unit_tests/llms/test_base.py
mv community/tests/integration_tests/llms/test_openai.py partners/openai/tests/integration_tests/llms/test_base.py
mv community/tests/integration_tests/llms/test_azure_openai.py partners/openai/tests/integration_tests/llms/test_azure.py
mv community/tests/unit_tests/embeddings/test_openai.py partners/openai/tests/unit_tests/embeddings/test_base.py
mv community/tests/integration_tests/embeddings/test_openai.py partners/openai/tests/integration_tests/embeddings/test_base.py
mv community/tests/integration_tests/embeddings/test_azure_openai.py partners/openai/tests/integration_tests/embeddings/test_azure.py
mkdir -p partners/openai/tests/integration_tests/adapters
mv community/tests/integration_tests/adapters/{__init__,test_openai}.py partners/openai/tests/integration_tests/adapters
git add partners core
rm community/langchain_community/{chat_models,llms,tools,embeddings,vectorstores,callbacks}/base.py
rm community/tests/unit_tests/{chat_models,llms,tools,callbacks}/test_base.py
rm community/tests/unit_tests/callbacks/test_manager.py
rm community/langchain_community/callbacks/{stdout,streaming_stdout}.py
rm community/langchain_community/callbacks/tracers/{base,evaluation,langchain,langchain_v1,log_stream,root_listeners,run_collector,schemas,stdout}.py
git checkout master -- langchain/tests/unit_tests/{chat_models,llms,tools,callbacks,document_loaders}/test_base.py
git checkout master -- langchain/tests/unit_tests/{callbacks,docstore,document_loaders,document_transformers,embeddings,graphs,llms,chat_models,storage,tools,utilities,vectorstores}/test_imports.py
git checkout master -- langchain/tests/unit_tests/callbacks/test_manager.py
git checkout master -- langchain/tests/unit_tests/document_loaders/blob_loaders/test_public_api.py
git checkout master -- langchain/tests/unit_tests/document_loaders/parsers/test_public_api.py
git checkout master -- langchain/tests/unit_tests/vectorstores/test_public_api.py
git checkout master -- langchain/tests/unit_tests/schema
touch langchain/tests/unit_tests/{llms,chat_models,tools,callbacks,runnables,document_loaders,docstore,document_transformers,embeddings,graphs,storage,utilities,vectorstores}/__init__.py
touch langchain/tests/unit_tests/document_loaders/{blob_loaders,parsers}/__init__.py
cp -r core/scripts community
cp -r core/scripts partners/openai
sed -i '' 's/from\ langchain_openai.chat_models\ /from\ langchain_openai.chat_models.base\ /g' partners/openai/langchain_openai/chat_models/azure.py
sed -i '' 's/from\ langchain_openai.embeddings\ /from\ langchain_openai.embeddings.base\ /g' partners/openai/langchain_openai/embeddings/azure.py
git grep -l '@pytest\.mark\.requires' partners/openai | xargs sed -i '' 's/@pytest\.mark\.requires("openai")//g'
cp -r langchain/tests/integration_tests/examples community/tests
cp -r langchain/tests/integration_tests/examples community/tests/integration_tests
cp -r langchain/tests/unit_tests/examples community/tests/unit_tests
cp langchain/tests/unit_tests/conftest.py community/tests/unit_tests
cp langchain/tests/integration_tests/test_compile.py community/tests/integration_tests
cp langchain/tests/integration_tests/test_compile.py partners/openai/tests/integration_tests
cp -r ../.scripts/community_split/libs/* .
cp community/tests/integration_tests/vectorstores/fake_embeddings.py langchain/tests/integration_tests/cache/
mv community/tests/{unit_tests,integration_tests}/document_loaders/test_telegram.py
mv community/tests/{unit_tests,integration_tests}/document_loaders/parsers/test_docai.py
mv community/tests/{unit_tests,integration_tests}/chat_message_histories/test_streamlit.py
g grep -l 'integration_tests\.vectorstores\.fake_embeddings' community/tests | xargs sed -i '' 's/integration_tests\.vectorstores\.fake_embeddings/integration_tests.cache.fake_embeddings/g'
cd core
make format
cd ../langchain
make format
cd ../experimental
make format
cd ../community
make format
cd ../partners/openai
make format
cd ../..
sed -E -i '' '1 s/(.*)/\1\ \ \#\ noqa\:\ E501/g' langchain/langchain/agents/agent_toolkits/conversational_retrieval/openai_functions.py
sed -i '' 's/llms\.loading\.get_type_to_cls_dict/llms.get_type_to_cls_dict/g' langchain/tests/unit_tests/chains/test_llm.py
touch community/langchain_community/agent_toolkits/amadeus/__init__.py

View File

@@ -1,85 +0,0 @@
import ast
import os
import sys
from pathlib import Path
class ImportTransformer(ast.NodeTransformer):
def __init__(self, public_items, module_name):
self.public_items = public_items
self.module_name = module_name
def visit_Module(self, node):
imports = [
ast.ImportFrom(
module=self.module_name,
names=[ast.alias(name=item, asname=None)],
level=0,
)
for item in self.public_items
]
all_assignment = ast.Assign(
targets=[ast.Name(id="__all__", ctx=ast.Store())],
value=ast.List(
elts=[ast.Str(s=item) for item in self.public_items], ctx=ast.Load()
),
)
node.body = imports + [all_assignment]
return node
def find_public_classes_and_methods(file_path):
with open(file_path, "r") as file:
node = ast.parse(file.read(), filename=file_path)
public_items = []
for item in node.body:
if isinstance(item, ast.ClassDef) or isinstance(item, ast.FunctionDef):
public_items.append(item.name)
if (
isinstance(item, ast.Assign)
and hasattr(item.targets[0], "id")
and item.targets[0].id not in ("__all__", "logger")
):
public_items.append(item.targets[0].id)
return public_items or None
def process_file(file_path, module_name):
public_items = find_public_classes_and_methods(file_path)
if public_items is None:
return
with open(file_path, "r") as file:
contents = file.read()
tree = ast.parse(contents, filename=file_path)
tree = ImportTransformer(public_items, module_name).visit(tree)
tree = ast.fix_missing_locations(tree)
with open(file_path, "w") as file:
file.write(ast.unparse(tree))
def process_directory(directory_path, base_module_name):
if Path(directory_path).is_file():
process_file(directory_path, base_module_name)
else:
for root, dirs, files in os.walk(directory_path):
for filename in files:
if filename.endswith(".py") and not filename.startswith("_"):
file_path = os.path.join(root, filename)
relative_path = os.path.relpath(file_path, directory_path)
module_name = f"{base_module_name}.{os.path.splitext(relative_path)[0].replace(os.sep, '.')}"
process_file(file_path, module_name)
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: python script_name.py <directory_path> <base_module_name>")
sys.exit(1)
directory_path = sys.argv[1]
base_module_name = sys.argv[2]
process_directory(directory_path, base_module_name)

View File

@@ -3,8 +3,7 @@
⚡ Building applications with LLMs through composability ⚡
[![Release Notes](https://img.shields.io/github/release/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/releases)
[![CI](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml)
[![Experimental CI](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml)
[![CI](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml)
[![Downloads](https://static.pepy.tech/badge/langchain/month)](https://pepy.tech/project/langchain)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
@@ -45,7 +44,10 @@ This framework consists of several parts.
- **[LangServe](https://github.com/langchain-ai/langserve)**: A library for deploying LangChain chains as a REST API.
- **[LangSmith](https://smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
**This repo contains the `langchain` ([here](libs/langchain)), `langchain-experimental` ([here](libs/experimental)), and `langchain-cli` ([here](libs/cli)) Python packages, as well as [LangChain Templates](templates).**
The LangChain libraries themselves are made up of several different packages.
- **[`langchain-core`](libs/core)**: Base abstractions and LangChain Expression Language.
- **[`langchain-community`](libs/community)**: Third party integrations.
- **[`langchain`](libs/langchain)**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
![LangChain Stack](docs/static/img/langchain_stack.png)
@@ -93,7 +95,7 @@ Agents involve an LLM making decisions about which Actions to take, taking that
Please see [here](https://python.langchain.com) for full documentation, which includes:
- [Getting started](https://python.langchain.com/docs/get_started/introduction): installation, setting up the environment, simple examples
- Overview of the [interfaces](https://python.langchain.com/docs/expression_language/), [modules](https://python.langchain.com/docs/modules/) and [integrations](https://python.langchain.com/docs/integrations/providers)
- Overview of the [interfaces](https://python.langchain.com/docs/expression_language/), [modules](https://python.langchain.com/docs/modules/), and [integrations](https://python.langchain.com/docs/integrations/providers)
- [Use case](https://python.langchain.com/docs/use_cases/qa_structured/sql) walkthroughs and best practice [guides](https://python.langchain.com/docs/guides/adapters/openai)
- [LangSmith](https://python.langchain.com/docs/langsmith/), [LangServe](https://python.langchain.com/docs/langserve), and [LangChain Template](https://python.langchain.com/docs/templates/) overviews
- [Reference](https://api.python.langchain.com): full API docs
@@ -103,7 +105,7 @@ Please see [here](https://python.langchain.com) for full documentation, which in
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see [here](.github/CONTRIBUTING.md).
For detailed information on how to contribute, see [here](https://python.langchain.com/docs/contributing/).
## 🌟 Contributors

View File

@@ -61,13 +61,13 @@
],
"source": [
"# Local\n",
"from langchain.chat_models import ChatOllama\n",
"from langchain_community.chat_models import ChatOllama\n",
"\n",
"llama2_chat = ChatOllama(model=\"llama2:13b-chat\")\n",
"llama2_code = ChatOllama(model=\"codellama:7b-instruct\")\n",
"\n",
"# API\n",
"from langchain.llms import Replicate\n",
"from langchain_community.llms import Replicate\n",
"\n",
"# REPLICATE_API_TOKEN = getpass()\n",
"# os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n",
@@ -107,7 +107,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.utilities import SQLDatabase\n",
"from langchain_community.utilities import SQLDatabase\n",
"\n",
"db = SQLDatabase.from_uri(\"sqlite:///nba_roster.db\", sample_rows_in_table_info=0)\n",
"\n",
@@ -125,7 +125,7 @@
"id": "654b3577-baa2-4e12-a393-f40e5db49ac7",
"metadata": {},
"source": [
"## Query a SQL DB \n",
"## Query a SQL Database \n",
"\n",
"Follow the runnables workflow [here](https://python.langchain.com/docs/expression_language/cookbook/sql_db)."
]
@@ -149,8 +149,9 @@
],
"source": [
"# Prompt\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"# Update the template based on the type of SQL Database like MySQL, Microsoft SQL Server and so on\n",
"template = \"\"\"Based on the table schema below, write a SQL query that would answer the user's question:\n",
"{schema}\n",
"\n",
@@ -164,8 +165,8 @@
")\n",
"\n",
"# Chain to query\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"sql_response = (\n",
" RunnablePassthrough.assign(schema=get_schema)\n",
@@ -217,7 +218,7 @@
" [\n",
" (\n",
" \"system\",\n",
" \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\",\n",
" \"Given an input question and SQL response, convert it to a natural language answer. No pre-amble.\",\n",
" ),\n",
" (\"human\", template),\n",
" ]\n",
@@ -277,7 +278,7 @@
"source": [
"# Prompt\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"\n",
"template = \"\"\"Given an input question, convert it to a SQL query. No pre-amble. Based on the table schema below, write a SQL query that would answer the user's question:\n",
"{schema}\n",
@@ -293,7 +294,7 @@
"memory = ConversationBufferMemory(return_messages=True)\n",
"\n",
"# Chain to query with memory\n",
"from langchain.schema.runnable import RunnableLambda\n",
"from langchain_core.runnables import RunnableLambda\n",
"\n",
"sql_chain = (\n",
" RunnablePassthrough.assign(\n",
@@ -345,7 +346,7 @@
" [\n",
" (\n",
" \"system\",\n",
" \"Given an input question and SQL response, convert it to a natural langugae answer. No pre-amble.\",\n",
" \"Given an input question and SQL response, convert it to a natural language answer. No pre-amble.\",\n",
" ),\n",
" (\"human\", template),\n",
" ]\n",

View File

@@ -46,7 +46,7 @@
"\n",
"---\n",
"\n",
"A seperate cookbook highlights `Option 1` [here](https://github.com/langchain-ai/langchain/blob/master/cookbook/multi_modal_RAG_chroma.ipynb).\n",
"A separate cookbook highlights `Option 1` [here](https://github.com/langchain-ai/langchain/blob/master/cookbook/multi_modal_RAG_chroma.ipynb).\n",
"\n",
"And option `Option 2` is appropriate for cases when a multi-modal LLM cannot be used for answer synthesis (e.g., cost, etc).\n",
"\n",
@@ -101,7 +101,7 @@
"If you want to use the provided folder, then simply opt for a [pdf loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf) for the document:\n",
"\n",
"```\n",
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"loader = PyPDFLoader(path + fname)\n",
"docs = loader.load()\n",
"tables = [] # Ignore w/ basic pdf loader\n",
@@ -198,9 +198,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"# Generate summaries of text elements\n",
@@ -270,7 +270,7 @@
"import base64\n",
"import os\n",
"\n",
"from langchain.schema.messages import HumanMessage\n",
"from langchain_core.messages import HumanMessage\n",
"\n",
"\n",
"def encode_image(image_path):\n",
@@ -341,7 +341,7 @@
"Add raw docs and doc summaries to [Multi Vector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary): \n",
"\n",
"* Store the raw texts, tables, and images in the `docstore`.\n",
"* Store the texts, table summaries, and image summaries in the `vectorstore` for semantic retrieval."
"* Store the texts, table summaries, and image summaries in the `vectorstore` for efficient semantic retrieval."
]
},
{
@@ -353,11 +353,11 @@
"source": [
"import uuid\n",
"\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.documents import Document\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"\n",
"def create_multi_vector_retriever(\n",
@@ -442,7 +442,7 @@
"import re\n",
"\n",
"from IPython.display import HTML, display\n",
"from langchain.schema.runnable import RunnableLambda, RunnablePassthrough\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"from PIL import Image\n",
"\n",
"\n",

File diff suppressed because one or more lines are too long

View File

@@ -235,9 +235,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser"
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI"
]
},
{
@@ -318,11 +318,11 @@
"source": [
"import uuid\n",
"\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.documents import Document\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
@@ -374,7 +374,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",

View File

@@ -211,9 +211,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser"
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI"
]
},
{
@@ -373,11 +373,11 @@
"source": [
"import uuid\n",
"\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.documents import Document\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(collection_name=\"summaries\", embedding_function=OpenAIEmbeddings())\n",
@@ -646,7 +646,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",

View File

@@ -209,9 +209,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOllama\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser"
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate"
]
},
{
@@ -376,11 +376,11 @@
"source": [
"import uuid\n",
"\n",
"from langchain.embeddings import GPT4AllEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.vectorstores import Chroma\n",
"from langchain_community.embeddings import GPT4AllEmbeddings\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.documents import Document\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(\n",
@@ -532,7 +532,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",

View File

@@ -62,7 +62,7 @@
"path = \"/Users/rlm/Desktop/cpi/\"\n",
"\n",
"# Load\n",
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"\n",
"loader = PyPDFLoader(path + \"cpi.pdf\")\n",
"pdf_pages = loader.load()\n",
@@ -132,8 +132,8 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"baseline = Chroma.from_texts(\n",
" texts=all_splits_pypdf_texts,\n",
@@ -160,9 +160,9 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Prompt\n",
"prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text for retrieval. \\\n",
@@ -202,7 +202,7 @@
"import os\n",
"from io import BytesIO\n",
"\n",
"from langchain.schema.messages import HumanMessage\n",
"from langchain_core.messages import HumanMessage\n",
"from PIL import Image\n",
"\n",
"\n",
@@ -273,8 +273,8 @@
"from base64 import b64decode\n",
"\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.schema.document import Document\n",
"from langchain.storage import InMemoryStore\n",
"from langchain_core.documents import Document\n",
"\n",
"\n",
"def create_multi_vector_retriever(\n",
@@ -475,7 +475,7 @@
"source": [
"from operator import itemgetter\n",
"\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"\n",
"# Prompt\n",
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",
@@ -521,7 +521,7 @@
"import re\n",
"\n",
"from langchain.schema import Document\n",
"from langchain.schema.runnable import RunnableLambda\n",
"from langchain_core.runnables import RunnableLambda\n",
"\n",
"\n",
"def looks_like_base64(sb):\n",

View File

@@ -13,7 +13,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9b22020a",
"metadata": {},
@@ -29,10 +28,9 @@
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.llms import OpenAI\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Chroma\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_openai import OpenAI, OpenAIEmbeddings\n",
"\n",
"llm = OpenAI(temperature=0)"
]
@@ -70,7 +68,7 @@
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(doc_path)\n",
"documents = loader.load()\n",
@@ -100,7 +98,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader"
"from langchain_community.document_loaders import WebBaseLoader"
]
},
{
@@ -146,7 +144,6 @@
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c0a6c031",
"metadata": {},
@@ -163,7 +160,7 @@
"source": [
"# Import things that are needed generically\n",
"from langchain.agents import AgentType, Tool, initialize_agent\n",
"from langchain.llms import OpenAI"
"from langchain_openai import OpenAI"
]
},
{
@@ -280,7 +277,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "787a9b5e",
"metadata": {},
@@ -289,7 +285,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9161ba91",
"metadata": {},
@@ -411,7 +406,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "49a0cbbe",
"metadata": {},
@@ -525,7 +519,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.10.1"
}
},
"nbformat": 4,

View File

@@ -29,7 +29,7 @@
"outputs": [],
"source": [
"from langchain.chains import AnalyzeDocumentChain\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)"
]

Some files were not shown because too many files have changed in this diff Show More