Compare commits


56 Commits

Author SHA1 Message Date
Erick Friis
bfabf99aa3 infra: api docs autoapi experimenting 2024-02-05 17:37:15 -08:00
Tyler Titsworth
304f3f5fc1 community[patch]: Add Progress bar to HuggingFaceEmbeddings (#16758)
- **Description:** Adds a `show_progress` parameter to `HuggingFaceEmbeddings`
that enables a `tqdm` progress bar while embedding (see the sketch after this
list). It has no effect when `multi_process = True`.
  - **Issue:** n/a
  - **Dependencies:** n/a
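
A minimal usage sketch (the parameter name comes from this PR; the model name is illustrative):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # illustrative model
    show_progress=True,  # renders a tqdm progress bar while embedding
)
vectors = embeddings.embed_documents(["first document", "second document"])
```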
2024-02-05 14:33:34 -08:00
Supreet Takkar
ae33979813 community[patch]: Allow adding ARNs as model_id to support Amazon Bedrock custom models (#16800)
- **Description:** Adds an additional class variable to `BedrockBase`
called `provider` that allows sending a model provider such as amazon,
cohere, ai21, etc.
Up until now, the model provider is extracted from the `model_id` using
the first part before the `.`, such as `amazon` for
`amazon.titan-text-express-v1` (see [supported list of Bedrock model IDs
here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)).
But for custom Bedrock models where the ARN of the provisioned
throughput must be supplied, the `model_id` is like
`arn:aws:bedrock:...` so the `model_id` cannot be extracted from this. A
model `provider` is required by the LangChain Bedrock class to perform
model-based processing. To allow the same processing for custom models of a
specific base-model type, passing this `provider` argument explicitly solves
the issue (see the sketch after this list).
The alternative considered here was the use of
`provider.arn:aws:bedrock:...` which then requires ARN to be extracted
and passed separately when invoking the model. The proposed solution
here is simpler and also does not cause issues for current models
already using the Bedrock class.
  - **Issue:** N/A
  - **Dependencies:** N/A
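
A usage sketch mirroring the docs cell added in this PR (profile name, provider, and ARN are placeholders):

```python
from langchain_community.llms import Bedrock

custom_llm = Bedrock(
    credentials_profile_name="bedrock-admin",  # placeholder AWS profile
    provider="cohere",                         # base-model provider of the custom model
    model_id="<Custom model ARN>",             # e.g. arn:aws:bedrock:... from provisioned throughput
    model_kwargs={"temperature": 1},
)
custom_llm.invoke("What is the recipe of mayonnaise?")
```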

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-02-05 14:28:03 -08:00
T Cramer
e022bfaa7d langchain: add partial parsing support to JsonOutputToolsParser (#17035)
- **Description:** Add partial parsing support to JsonOutputToolsParser
- **Issue:**
[16736](https://github.com/langchain-ai/langchain/issues/16736)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-05 14:18:30 -08:00
calvinweb
dcf973c22c Langchain: json_chat don't need stop sequenes (#16335)
This is a PR about #16334.
Stop sequences aren't meaningful in `json_chat` because it relies on JSON
output rather than completions.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-05 14:18:16 -08:00
Bagatur
66e45e8ab7 community[patch]: chat model mypy fixes (#17061)
Related to #17048
2024-02-05 13:42:59 -08:00
Bagatur
d93de71d08 community[patch]: chat message history mypy fixes (#17059)
Related to #17048
2024-02-05 13:13:25 -08:00
Bagatur
af5ae24af2 community[patch]: callbacks mypy fixes (#17058)
Related to #17048
2024-02-05 12:37:27 -08:00
Vadim Kudlay
75b6fa1134 nvidia-ai-endpoints[patch]: Support User-Agent metadata and minor fixes. (#16942)
- **Description:** Several meta/usability updates, including User-Agent.
  - **Issue:** 
- User-Agent metadata for tracking connector engagement. @milesial
please check and advise.
- Better error messages. Tries harder to find a request ID. @milesial
requested.
- Client-side image resizing for multimodal models. Hope to upgrade to
Assets API solution in around a month.
- `client.payload_fn` lets you modify the payload before the network request;
a use case is shown in the kosmos_2 doc notebook (see the sketch after this
list).
- `client.last_inputs` is restored to allow for advanced support/debugging.
  - **Dependencies:** 
- Attempts to pull in PIL for image resizing. If not installed, prints
out "please install" message, warns it might fail, and then tries
without resizing. We are waiting on a more permanent solution.
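
A rough sketch of the two debugging hooks (attribute names come from this PR; the import path and model name are assumptions):

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA  # assumed import path

llm = ChatNVIDIA(model="playground_kosmos_2")  # hypothetical model name

# Modify the payload right before the network request is made.
llm.client.payload_fn = lambda payload: {**payload, "labels": {"source": "demo"}}

llm.invoke("Describe this scene.")
# Inspect the inputs of the most recent request for support/debugging.
print(llm.client.last_inputs)
```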

For LC viz: @hinthornw 
For NV viz: @fciannella @milesial @vinaybagade

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-05 12:24:53 -08:00
Nuno Campos
ae56fd020a Fix condition on custom root type in runnable history (#17017)
2024-02-05 12:15:11 -08:00
Nuno Campos
f0ffebb944 Shield callback methods from cancellation: Fix interrupted runs marked as pending forever (#17010)
2024-02-05 12:09:47 -08:00
Bagatur
e7b3290d30 community[patch]: fix agent_toolkits mypy (#17050)
Related to #17048
2024-02-05 11:56:24 -08:00
Erick Friis
6ffd5b15bc pinecone: init pkg (#16556)
2024-02-05 11:55:01 -08:00
Erick Friis
1183769cf7 template: tool-retrieval-fireworks (#17052)
- Initial commit oss-tool-retrieval-agent
- README update
- lint
- lock
- format imports
- Rename to retrieval-agent-fireworks
- cr

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
2024-02-05 11:50:17 -08:00
Harrison Chase
4eda647fdd infra: add -p to mkdir in lint steps (#17013)
Previously, if this did not find a mypy cache it wouldn't run.

This makes it always run.

Adds mypy ignore comments for existing uncaught issues to unblock other PRs.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-02-05 11:22:06 -08:00
Erick Friis
db6af21395 docs: exa contents (#16555) 2024-02-05 11:15:06 -08:00
Eugene Yurtsev
fb245451d2 core[patch]: Add langsmith to printed sys information (#16899) 2024-02-05 11:13:30 -08:00
Mikhail Khludnev
2145636f1d Nvidia trt model name for stop_stream() (#16997)
Just removing some legacy leftovers.
2024-02-05 10:45:06 -08:00
Christophe Bornet
2ef69fe11b Add async methods to BaseChatMessageHistory and BaseMemory (#16728)
Adds:
   * async methods to BaseChatMessageHistory
   * async methods to ChatMessageHistory
   * async methods to BaseMemory
   * async methods to BaseChatMemory
   * async methods to ConversationBufferMemory
   * tests of ConversationBufferMemory's async methods

  **Twitter handle:** cbornet_
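
A minimal sketch of the async counterparts (method names follow LangChain's usual `a`-prefix convention; treat the exact signatures as assumptions):

```python
import asyncio

from langchain.memory import ChatMessageHistory, ConversationBufferMemory
from langchain_core.messages import HumanMessage

async def main() -> None:
    history = ChatMessageHistory()
    await history.aadd_messages([HumanMessage(content="hi")])  # async write
    print(await history.aget_messages())                       # async read

    memory = ConversationBufferMemory()
    await memory.asave_context({"input": "hi"}, {"output": "hello!"})
    print(await memory.aload_memory_variables({}))

asyncio.run(main())
```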
2024-02-05 13:20:28 -05:00
Ryan Kraus
b3c3b58f2c core[patch]: Fixed bug in dict to message conversion. (#17023)
- **Description**: We discovered a bug converting dictionaries to
messages where the ChatMessageChunk message type isn't handled. This PR
adds support for that message type.
- **Issue**: #17022 
- **Dependencies**: None
- **Twitter handle**: None
2024-02-05 10:13:25 -08:00
Nicolas Grenié
54fcd476bb docs: Update ollama examples with new community libraries (#17007)
- **Description:** Updating one line code sample for Ollama with new
**langchain_community** package
  - **Issue:**
  - **Dependencies:** none
  - **Twitter handle:**  @picsoung
2024-02-04 15:13:29 -08:00
Killinsun - Ryota Takeuchi
bcfce146d8 community[patch]: Correct the calling to collection_name in qdrant (#16920)
## Description

In #16608, the call to `collection_name` was wrong; this fixes it.
Sorry for the inconvenience!

## Issue

https://github.com/langchain-ai/langchain/issues/16962

## Dependencies

N/A




---------

Co-authored-by: Kumar Shivendu <kshivendu1@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-04 10:45:35 -08:00
Erick Friis
849051102a google-genai[patch]: fix new core typing (#16988) 2024-02-03 17:45:44 -08:00
Bagatur
35446c814e openai[patch]: rm tiktoken model warning (#16964) 2024-02-03 16:36:57 -08:00
ccurme
0826d87ecd langchain_mistralai[patch]: Invoke callback prior to yielding token (#16986)
- **Description:** Invoke callback prior to yielding token in stream and
astream methods for ChatMistralAI.
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
2024-02-03 16:30:50 -08:00
Bagatur
267e71606e docs: Update README.md (#16966) 2024-02-02 16:50:58 -08:00
Erick Friis
2b7e47a668 infra: install integration deps for test linting (#16963)
2024-02-02 15:59:10 -08:00
Erick Friis
afdd636999 docs: partner packages (#16960) 2024-02-02 15:12:21 -08:00
Erick Friis
06660bc78c core[patch]: handle some optional cases in tools (#16954)
The primary problem in pydantic still exists: `Optional[str]` gets turned
into `string` in the jsonschema produced by `.schema()` (illustrated below).

Also fixes the `SchemaSchema` naming issue
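
To illustrate the remaining pydantic quirk (pydantic v1 behaviour; output shown as a comment):

```python
from typing import Optional

from pydantic import BaseModel  # pydantic v1

class Args(BaseModel):
    query: Optional[str] = None

print(Args.schema()["properties"]["query"])
# -> {'title': 'Query', 'type': 'string'}   the Optional/null aspect is dropped
```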

---------

Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
2024-02-02 15:05:54 -08:00
Mohammad Mohtashim
f8943e8739 core[patch]: Add doc-string to RunnableEach (#16892)
Add doc-string to Runnable Each
---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-02-02 14:11:09 -08:00
Ashley Xu
66adb95284 docs: BigQuery Vector Search went public review and updated docs (#16896)
Update the docs for BigQuery Vector Search
2024-02-02 10:26:44 -08:00
Massimiliano Pronesti
71f9ea33b6 docs: add quantization to vllm and update API (#16950)
- **Description:** Update vLLM docs to include instructions on how to
use quantized models, as well as to replace the deprecated methods.
2024-02-02 10:24:49 -08:00
Bagatur
2a510c71a0 core[patch]: doc init positional args (#16854) 2024-02-02 10:24:16 -08:00
Bagatur
d80c612c92 core[patch]: Message content as positional arg (#16921) 2024-02-02 10:24:02 -08:00
Bagatur
c29e9b6412 core[patch]: fix chat prompt partial messages placeholder var (#16918) 2024-02-02 10:23:37 -08:00
Radhakrishnan
3b0fa9079d docs: Updated integration doc for aleph alpha (#16844)
Description: Updated the llm/aleph_alpha doc to use the new `invoke` method
and restructured the document to match the required format.
Issue: https://github.com/langchain-ai/langchain/issues/15664
Dependencies: None
Twitter handle: None

---------

Co-authored-by: Radhakrishnan Iyer <radhakrishnan.iyer@ibm.com>
2024-02-02 09:28:06 -08:00
hmasdev
cc17334473 core[minor]: add validation error handler to BaseTool (#14007)
- **Description:** add a ValidationError handler as a field of
[`BaseTool`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools.py#L101)
and add unit tests for the code change.
- **Issue:** #12721 #13662
- **Dependencies:** None
- **Tag maintainer:** 
- **Twitter handle:** @hmdev3
- **NOTE:**
  - I'm wondering whether a documentation update is required.
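
A rough usage sketch; the field name is an assumption, mirroring the existing `handle_tool_error` option:

```python
from langchain_core.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# Assumed field name, by analogy with handle_tool_error.
add.handle_validation_error = "Please pass two integers, e.g. {'a': 1, 'b': 2}."

# Invalid input now returns the fallback string instead of raising ValidationError.
print(add.run({"a": "not a number", "b": 2}))
```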

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-01 20:09:19 -08:00
William FH
bdacfafa05 core[patch]: Remove deep copying of run prior to submitting it to LangChain Tracing (#16904) 2024-02-01 18:46:05 -08:00
William FH
e02efd513f core[patch]: Hide aliases when serializing (#16888)
Currently, if you dump an object initialized with an alias, we'll still
dump the secret values since they're retained in the kwargs
2024-02-01 17:55:37 -08:00
William FH
131c043864 Fix loading of ImagePromptTemplate (#16868)
We didn't override the namespace of the ImagePromptTemplate, so it is
listed as being in langchain.schema

This updates the mapping to let the loader deserialize.

Alternatively, we could make a slight breaking change and update the
namespace of the ImagePromptTemplate since we haven't broadly
publicized/documented it yet..
2024-02-01 17:54:04 -08:00
Erick Friis
6fc2835255 docs: fix broken links (#16855) 2024-02-01 17:29:38 -08:00
Eugene Yurtsev
a265878d71 langchain_openai[patch]: Invoke callback prior to yielding token (#16909)
All models should be calling the callback for new token prior to
yielding the token.

Not doing this can cause callbacks for downstream steps to be called
prior to the callback for the new token, causing issues in the
astream_events API and other things that depend on callback ordering
being correct.

We need to make this change for all chat models.
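
A sketch of the required ordering inside a chat model's `_stream` implementation (the client call is hypothetical; the callback and chunk classes are real):

```python
from langchain_core.messages import AIMessageChunk
from langchain_core.outputs import ChatGenerationChunk

def _stream(self, messages, stop=None, run_manager=None, **kwargs):
    for delta in self._client_stream(messages, stop=stop, **kwargs):  # hypothetical client call
        chunk = ChatGenerationChunk(message=AIMessageChunk(content=delta))
        if run_manager:
            # Fire the new-token callback BEFORE yielding, so downstream
            # callbacks (e.g. astream_events) observe tokens in order.
            run_manager.on_llm_new_token(delta, chunk=chunk)
        yield chunk
```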
2024-02-01 16:43:10 -08:00
Erick Friis
b1a847366c community: revert SQL Stores (#16912)
This reverts commit cfc225ecb3.


https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097

These will have existed in langchain-community 0.0.16 and 0.0.17.
2024-02-01 16:37:40 -08:00
akira wu
f7c709b40e doc: fix typo in message_history.ipynb (#16877)
- **Description:** just fixed a small typo in the documentation in the
`expression_language/how_to/message_history` section
[here](https://python.langchain.com/docs/expression_language/how_to/message_history)
2024-02-01 13:30:29 -08:00
Leonid Ganeline
c2ca6612fe refactor langchain.prompts.example_selector (#15369)
The `langchain.prompts.example_selector` module [still holds several
artifacts](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.prompts)
that belong to `community`. If they move to
`langchain_community.example_selectors`, the `langchain.prompts`
namespace is effectively removed, which is great.
- moved a class and a function to `langchain_community`

Note:
- Previously, the `langchain.prompts.example_selector` artifacts were
moved into `langchain_core.example_selectors`. Note the flattened
namespace (`.prompts` was removed); similar flattening was implemented
in `langchain_core` as `langchain_core.example_selectors`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-01 12:05:57 -08:00
Erick Friis
13a6756067 infra: ci naming 2 (#16893) 2024-02-01 11:39:00 -08:00
Lance Martin
b1e7130d8a Minor update to Nomic cookbook (#16886)
2024-02-01 11:28:58 -08:00
Shorthills AI
0bca0f4c24 Docs: Fixed grammatical mistake (#16858)
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: Sanskar Tanwar <142409040+SanskarTanwarShorthillsAI@users.noreply.github.com>
Co-authored-by: UpneetShorthillsAI <144228282+UpneetShorthillsAI@users.noreply.github.com>
Co-authored-by: HarshGuptaShorthillsAI <144897987+HarshGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: AdityaKalraShorthillsAI <143726711+AdityaKalraShorthillsAI@users.noreply.github.com>
Co-authored-by: SakshiShorthillsAI <144228183+SakshiShorthillsAI@users.noreply.github.com>
Co-authored-by: AashiGuptaShorthillsAI <144897730+AashiGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: ShamshadAhmedShorthillsAI <144897733+ShamshadAhmedShorthillsAI@users.noreply.github.com>
Co-authored-by: ManpreetShorthillsAI <142380984+ManpreetShorthillsAI@users.noreply.github.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
Co-authored-by: BajrangBishnoiShorthillsAi <148060486+BajrangBishnoiShorthillsAi@users.noreply.github.com>
2024-02-01 11:28:15 -08:00
Erick Friis
5b3fc86cfd infra: ci naming (#16890)
Make it clearer how to run equivalent commands locally

Not a perfect 1:1, but will help people get started

![Screenshot 2024-02-01 at 10 53
34 AM](https://github.com/langchain-ai/langchain/assets/9557659/da271aaf-d5db-41e3-9379-cb1d8a0232c5)
2024-02-01 11:09:37 -08:00
Qihui Xie
c5b01ac621 community[patch]: support LIKE comparator (full text match) in Qdrant (#12769)
**Description:** 
Support [Qdrant full text match
filtering](https://qdrant.tech/documentation/concepts/filtering/#full-text-match)
by adding Comparator.LIKE to QdrantTranslator.
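
A rough sketch of what the new comparator enables in the self-query IR (the `metadata_key` argument and the translated output are assumptions based on the existing Qdrant translator; requires `qdrant-client`):

```python
from langchain.chains.query_constructor.ir import Comparator, Comparison
from langchain.retrievers.self_query.qdrant import QdrantTranslator

translator = QdrantTranslator(metadata_key="metadata")
# LIKE is expected to map to Qdrant's full-text match on the given payload field.
condition = translator.visit_comparison(
    Comparison(comparator=Comparator.LIKE, attribute="description", value="budget hotel")
)
print(condition)
```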
2024-02-01 11:03:25 -08:00
Christophe Bornet
9d458d089a community: Factorize AstraDB components constructors (#16779)
* Adds `AstraDBEnvironment` class and use it in `AstraDBLoader`,
`AstraDBCache`, `AstraDBSemanticCache`, `AstraDBBaseStore` and
`AstraDBChatMessageHistory`
* Create an `AsyncAstraDB` if we only have an `AstraDB` and vice-versa
so:
  * we always have an instance of `AstraDB`
* we always have an instance of `AsyncAstraDB` for recent versions of
astrapy
* Create collection if not exists in `AstraDBBaseStore`
* Some typing improvements

Note: the `AstraDB` `VectorStore` is not using `AstraDBEnvironment` at the
moment. This will be done after the `langchain-astradb` package is out.
2024-02-01 10:51:07 -08:00
Harel Gal
93366861c7 docs: Indicated Guardrails for Amazon Bedrock preview status (#16769)
Added a note about the limited-preview status of the Guardrails for Amazon
Bedrock feature to the code example.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
2024-02-01 10:41:48 -08:00
Christophe Bornet
78a1af4848 langchain[patch]: Add async methods to MultiVectorRetriever (#16878)
Adds async support to the multi-vector retriever; a usage sketch follows below.
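
A minimal async usage sketch (the vector store, embeddings, and docstore choices are illustrative):

```python
import asyncio

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),
    id_key="doc_id",
)

# The retriever can now be called from async code.
docs = asyncio.run(retriever.aget_relevant_documents("what is task decomposition?"))
```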
2024-02-01 10:33:06 -08:00
Bagatur
7d03d8f586 docs: fix docstring examples (#16889) 2024-02-01 10:17:26 -08:00
Bagatur
c2d09fb151 infra: bump exp min test reqs (#16884) 2024-02-01 08:35:21 -08:00
Bagatur
65ba5c220b experimental[patch]: Release 0.0.50 (#16883) 2024-02-01 08:27:39 -08:00
273 changed files with 9088 additions and 4165 deletions

View File

@@ -36,30 +36,35 @@ env:
jobs:
lint:
name: "-"
uses: ./.github/workflows/_lint.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
test:
name: "-"
uses: ./.github/workflows/_test.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
compile-integration-tests:
name: "-"
uses: ./.github/workflows/_compile_integration_test.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
dependencies:
name: "-"
uses: ./.github/workflows/_dependencies.yml
with:
working-directory: ${{ inputs.working-directory }}
secrets: inherit
extended-tests:
name: "make extended_tests #${{ matrix.python-version }}"
runs-on: ubuntu-latest
strategy:
matrix:
@@ -68,7 +73,6 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }} extended tests
defaults:
run:
working-directory: ${{ inputs.working-directory }}

View File

@@ -24,7 +24,7 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }}
name: "poetry run pytest -m compile tests/integration_tests #${{ matrix.python-version }}"
steps:
- uses: actions/checkout@v4

View File

@@ -28,7 +28,7 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
name: dependencies - Python ${{ matrix.python-version }}
name: dependency checks ${{ matrix.python-version }}
steps:
- uses: actions/checkout@v4

View File

@@ -21,6 +21,7 @@ env:
jobs:
build:
name: "make lint #${{ matrix.python-version }}"
runs-on: ubuntu-latest
strategy:
matrix:
@@ -85,7 +86,7 @@ jobs:
with:
path: |
${{ env.WORKDIR }}/.mypy_cache
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
- name: Analysing the code with our lint
@@ -113,7 +114,7 @@ jobs:
with:
path: |
${{ env.WORKDIR }}/.mypy_cache_test
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}
- name: Analysing the code with our lint
working-directory: ${{ inputs.working-directory }}

View File

@@ -28,7 +28,7 @@ jobs:
- "3.9"
- "3.10"
- "3.11"
name: Python ${{ matrix.python-version }}
name: "make test #${{ matrix.python-version }}"
steps:
- uses: actions/checkout@v4

View File

@@ -1,5 +1,5 @@
---
name: Check library diffs
name: CI
on:
push:
@@ -32,6 +32,7 @@ jobs:
outputs:
dirs-to-run: ${{ steps.set-matrix.outputs.dirs-to-run }}
ci:
name: cd ${{ matrix.working-directory }}
needs: [ build ]
strategy:
matrix:

View File

@@ -1,5 +1,5 @@
---
name: Codespell
name: CI / cd . / make spell_check
on:
push:
@@ -12,7 +12,7 @@ permissions:
jobs:
codespell:
name: Check for spelling errors
name: (Check for spelling errors)
runs-on: ubuntu-latest
steps:

View File

@@ -1,5 +1,5 @@
---
name: Docs, templates, cookbook lint
name: CI / cd .
on:
push:
@@ -15,6 +15,7 @@ on:
jobs:
check:
name: Check for "from langchain import x" imports
runs-on: ubuntu-latest
steps:
@@ -28,6 +29,7 @@ jobs:
git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
lint:
name: "-"
uses:
./.github/workflows/_lint.yml
with:

View File

@@ -1,36 +0,0 @@
---
name: templates CI
on:
push:
branches: [ master ]
pull_request:
paths:
- '.github/actions/poetry_setup/action.yml'
- '.github/tools/**'
- '.github/workflows/_lint.yml'
- '.github/workflows/templates_ci.yml'
- 'templates/**'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
POETRY_VERSION: "1.7.1"
WORKDIR: "templates"
jobs:
lint:
uses:
./.github/workflows/_lint.yml
with:
working-directory: templates
secrets: inherit

View File

@@ -1,6 +1,6 @@
# 🦜️🔗 LangChain
⚡ Building applications with LLMs through composability
⚡ Build context-aware reasoning applications
[![Release Notes](https://img.shields.io/github/release/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/releases)
[![CI](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/check_diffs.yml)

File diff suppressed because one or more lines are too long

View File

@@ -1,301 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d8da6094-30c7-43f3-a608-c91717b673db",
"metadata": {},
"source": [
"# Nomic Embeddings\n",
"\n",
"Nomic has released a new embedding model with strong performance for long context retrieval (8k context window).\n",
"\n",
"## Signup\n",
"\n",
"Get your API token, then run:\n",
"```\n",
"! nomic login\n",
"```\n",
"\n",
"Then run with your generated API token \n",
"```\n",
"! nomic login < token > \n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f737ec15-e9ab-4629-b54c-24be69e8b60b",
"metadata": {},
"outputs": [],
"source": [
"! nomic login"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8ab7434a-2930-42b5-9164-dc2c03abe232",
"metadata": {},
"outputs": [],
"source": [
"! nomic login token"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3501e2a-4686-4b95-8a1c-f19e035ea354",
"metadata": {},
"outputs": [],
"source": [
"! pip install -U langchain-nomic"
]
},
{
"cell_type": "markdown",
"id": "134475f2-f256-4c13-9712-c55783e6a4e2",
"metadata": {},
"source": [
"## Document Loading\n",
"\n",
"Let's test 3 interesting blog posts."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "01c4d270-171e-45c2-a1b6-e350faa74117",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders import WebBaseLoader\n",
"\n",
"urls = [\n",
" \"https://lilianweng.github.io/posts/2023-06-23-agent/\",\n",
" \"https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/\",\n",
" \"https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/\",\n",
"]\n",
"\n",
"docs = [WebBaseLoader(url).load() for url in urls]\n",
"docs_list = [item for sublist in docs for item in sublist]"
]
},
{
"cell_type": "markdown",
"id": "75ab7f74-873c-4d84-af5a-5cf19c61239d",
"metadata": {},
"source": [
"## Splitting \n",
"\n",
"Long context retrieval "
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f512e128-629e-4304-926f-94fe5c999527",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"text_splitter = CharacterTextSplitter.from_tiktoken_encoder(\n",
" chunk_size=7500, chunk_overlap=100\n",
")\n",
"doc_splits = text_splitter.split_documents(docs_list)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d2a69cf0-e3ab-4c92-a1d0-10da45c08b3b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The document is 6562 tokens\n",
"The document is 3037 tokens\n",
"The document is 6092 tokens\n",
"The document is 1050 tokens\n",
"The document is 6933 tokens\n",
"The document is 5560 tokens\n"
]
}
],
"source": [
"import tiktoken\n",
"\n",
"encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
"encoding = tiktoken.encoding_for_model(\"gpt-3.5-turbo\")\n",
"for d in doc_splits:\n",
" print(\"The document is %s tokens\" % len(encoding.encode(d.page_content)))"
]
},
{
"cell_type": "markdown",
"id": "c58d1e9b-e98e-4bd9-b52f-4dfc2a4e69f4",
"metadata": {},
"source": [
"## Index \n",
"\n",
"Nomic embeddings [here](https://docs.nomic.ai/reference/endpoints/nomic-embed-text). "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "76447866-bf8b-412b-93bc-d6ea8ec35952",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"from langchain_nomic import NomicEmbeddings\n",
"from langchain_nomic.embeddings import NomicEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "15b3eab2-2689-49d4-8cb0-67ef2adcbc49",
"metadata": {},
"outputs": [],
"source": [
"# Add to vectorDB\n",
"vectorstore = Chroma.from_documents(\n",
" documents=doc_splits,\n",
" collection_name=\"rag-chroma\",\n",
" embedding=NomicEmbeddings(model=\"nomic-embed-text-v1\"),\n",
")\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "markdown",
"id": "41131122-3591-4566-aac1-ed19d496820a",
"metadata": {},
"source": [
"## RAG Chain\n",
"\n",
"We can use the Mistral `v0.2`, which is [fine-tuned for 32k context](https://x.com/dchaplot/status/1734198245067243629?s=20).\n",
"\n",
"We can [use Ollama](https://ollama.ai/library/mistral) -\n",
"```\n",
"ollama pull mistral:instruct\n",
"```\n",
"\n",
"We can also run [GPT-4 128k](https://openai.com/blog/new-models-and-developer-products-announced-at-devday). "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1397de64-5b4a-4001-adc5-570ff8d31ff6",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models import ChatOllama\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"# Prompt\n",
"template = \"\"\"Answer the question based only on the following context:\n",
"{context}\n",
"\n",
"Question: {question}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"# LLM API\n",
"model = ChatOpenAI(temperature=0, model=\"gpt-4-1106-preview\")\n",
"\n",
"# Local LLM\n",
"ollama_llm = \"mistral:instruct\"\n",
"model_local = ChatOllama(model=ollama_llm)\n",
"\n",
"# Chain\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt\n",
" | model_local\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1548e00c-1ff6-4e88-aa13-69badf2088fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' Agents, especially those used in artificial intelligence and natural language processing, can have different types of memory. Here are some common types:\\n\\n1. **Short-term memory** or working memory: This is a small capacity, high-turnover memory that holds information temporarily while the agent processes it. Short-term memory is essential for tasks requiring attention and quick response, such as parsing sentences or following instructions.\\n\\n2. **Long-term memory**: This is a large capacity, low-turnover memory where agents store information for extended periods. Long-term memory enables learning from experiences, accessing past knowledge, and improving performance over time.\\n\\n3. **Explicit memory** or declarative memory: Agents use explicit memory to store and recall facts, concepts, and rules that can be expressed in natural language. This type of memory is crucial for problem solving and reasoning.\\n\\n4. **Implicit memory** or procedural memory: Implicit memory refers to the acquisition and retention of skills and habits. The agent learns through repeated experiences without necessarily being aware of it.\\n\\n5. **Connectionist memory**: Connectionist memory, also known as neural networks, is inspired by the structure and function of biological brains. Connectionist models learn and store information in interconnected nodes or artificial neurons. This type of memory enables the model to recognize patterns and generalize knowledge.\\n\\n6. **Hybrid memory systems**: Many advanced agents employ a combination of different memory types to maximize their learning potential and performance. These hybrid systems can integrate short-term, long-term, explicit, implicit, and connectionist memories.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Question\n",
"chain.invoke(\"What are the types of agent memory?\")"
]
},
{
"cell_type": "markdown",
"id": "5ec5b4c3-757d-44df-92ea-dd5f08017dd6",
"metadata": {},
"source": [
"**Mistral**\n",
"\n",
"Trace: 24k prompt tokens.\n",
"\n",
"* https://smith.langchain.com/public/3e04d475-ea08-4ee3-ae66-6416a93d8b08/r\n",
"\n",
"--- \n",
"\n",
"Some considerations are noted in the [needle in a haystack analysis](https://twitter.com/GregKamradt/status/1722386725635580292?lang=en):\n",
"\n",
"* LLMs may suffer with retrieval from large context depending on where the information is placed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ffb6b63-17ee-42d8-b1fb-d6a866e98458",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -16,7 +16,8 @@ cp ../cookbook/README.md src/pages/cookbook.mdx
mkdir -p docs/templates
cp ../templates/docs/INDEX.md docs/templates/index.md
poetry run python scripts/copy_templates.py
wget https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O docs/langserve.md
wget -q https://raw.githubusercontent.com/langchain-ai/langserve/main/README.md -O docs/langserve.md
wget -q https://raw.githubusercontent.com/langchain-ai/langgraph/main/README.md -O docs/langgraph.md
yarn

View File

@@ -88,16 +88,20 @@ html_last_updated_fmt = "%b %d, %Y"
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.autodoc.typehints",
"sphinx.ext.autosummary",
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinxcontrib.autodoc_pydantic",
"sphinx_copybutton",
"sphinx_panels",
"IPython.sphinxext.ipython_console_highlighting",
"autoapi.extension"
# "sphinx.ext.autodoc",
# "sphinx.ext.autodoc.typehints",
# "sphinx.ext.autosummary",
# "sphinx.ext.napoleon",
# "sphinx.ext.viewcode",
# "sphinxcontrib.autodoc_pydantic",
# "sphinx_copybutton",
# "sphinx_panels",
# "IPython.sphinxext.ipython_console_highlighting",
]
autoapi_dirs = ["/Users/erickfriis/langchain/oss-py/libs/core/langchain_core"]
autoapi_python_use_implicit_namespaces = True
source_suffix = [".rst"]
# some autodoc pydantic options are repeated in the actual template.

View File

@@ -6,9 +6,9 @@ pydantic<2
autodoc_pydantic==1.8.0
myst_parser
nbsphinx==0.8.9
sphinx>=5
sphinx>=7
sphinx-autobuild==2021.3.14
sphinx_rtd_theme==1.0.0
sphinx_rtd_theme==2.0.0
sphinx-typlog-theme==0.8.0
sphinx-panels
toml

View File

@@ -37,7 +37,7 @@ from langchain_community.llms import integration_class_REPLACE_ME
## Text Embedding Models
See a [usage example](/docs/integrations/text_embedding/INCLUDE_REAL_NAME)
See a [usage example](/docs/integrations/text_embedding/INCLUDE_REAL_NAME).
```python
from langchain_community.embeddings import integration_class_REPLACE_ME
@@ -45,7 +45,7 @@ from langchain_community.embeddings import integration_class_REPLACE_ME
## Chat models
See a [usage example](/docs/integrations/chat/INCLUDE_REAL_NAME)
See a [usage example](/docs/integrations/chat/INCLUDE_REAL_NAME).
```python
from langchain_community.chat_models import integration_class_REPLACE_ME

View File

@@ -7,7 +7,7 @@
"source": [
"# Add message history (memory)\n",
"\n",
"The `RunnableWithMessageHistory` let's us add message history to certain types of chains.\n",
"The `RunnableWithMessageHistory` let us add message history to certain types of chains.\n",
"\n",
"Specifically, it can be used for any Runnable that takes as input one of\n",
"\n",

View File

@@ -93,6 +93,3 @@ Head to the reference section for full documentation of all classes and methods
### [Developer's guide](/docs/contributing)
Check out the developer's guide for guidelines on contributing and help getting your dev environment set up.
### [Community](/docs/community)
Head to the [Community navigator](/docs/community) to find places to ask questions, share feedback, meet other developers, and dream about the future of LLMs.

View File

@@ -98,7 +98,7 @@ The LLM landscape is evolving at an unprecedented pace, with new libraries and m
### Model composition
Deploying systems like LangChain demands the ability to piece together different models and connect them via logic. Take the example of building a natural language input SQL query engine. Querying an LLM and obtaining the SQL command is only part of the system. You need to extract metadata from the connected database, construct a prompt for the LLM, run the SQL query on an engine, collect and feed back the response to the LLM as the query runs, and present the results to the user. This demonstrates the need to seamlessly integrate various complex components built in Python into a dynamic chain of logical blocks that can be served together.
Deploying systems like LangChain demands the ability to piece together different models and connect them via logic. Take the example of building a natural language input SQL query engine. Querying an LLM and obtaining the SQL command is only part of the system. You need to extract metadata from the connected database, construct a prompt for the LLM, run the SQL query on an engine, collect and feedback the response to the LLM as the query runs, and present the results to the user. This demonstrates the need to seamlessly integrate various complex components built in Python into a dynamic chain of logical blocks that can be served together.
## Cloud providers

View File

@@ -19,7 +19,19 @@
"\n",
"This notebook covers how to get started with MistralAI chat models, via their [API](https://docs.mistral.ai/api/).\n",
"\n",
"A valid [API key](https://console.mistral.ai/users/api-keys/) is needed to communicate with the API."
"A valid [API key](https://console.mistral.ai/users/api-keys/) is needed to communicate with the API.\n",
"\n",
"You will need the `langchain-mistralai` package to use the API. You can install it via pip:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb978a7e",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-core langchain-mistralai"
]
},
{

File diff suppressed because one or more lines are too long

View File

@@ -27,17 +27,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 1,
"id": "0cb0f937-b610-42a2-b765-336eed037031",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"name": "stdout",
"output_type": "stream",
"text": [
" ········\n"
"········\n"
]
}
],
@@ -51,21 +51,20 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 2,
"id": "6fb585dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain_community.llms import AlephAlpha"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 3,
"id": "f81a230d",
"metadata": {
"tags": []
@@ -81,7 +80,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"id": "f0d26e48",
"metadata": {
"tags": []
@@ -98,19 +97,19 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 5,
"id": "6811d621",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
"llm_chain = prompt | llm"
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"id": "3058e63f",
"metadata": {
"tags": []
@@ -119,10 +118,10 @@
{
"data": {
"text/plain": [
"' Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems.\\n'"
"' Artificial Intelligence is the simulation of human intelligence processes by machines.\\n\\n'"
]
},
"execution_count": 10,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -130,8 +129,16 @@
"source": [
"question = \"What is AI?\"\n",
"\n",
"llm_chain.run(question)"
"llm_chain.invoke({\"question\": question})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3544eff",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -150,7 +157,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.12"
},
"vscode": {
"interpreter": {

View File

@@ -107,12 +107,43 @@
"conversation.predict(input=\"Hi there!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_llm = Bedrock(\n",
" credentials_profile_name=\"bedrock-admin\",\n",
" provider=\"cohere\",\n",
" model_id=\"<Custom model ARN>\", # ARN like 'arn:aws:bedrock:...' obtained via provisioning the custom model\n",
" model_kwargs={\"temperature\": 1},\n",
" streaming=True,\n",
" callbacks=[StreamingStdOutCallbackHandler()],\n",
")\n",
"\n",
"conversation = ConversationChain(\n",
" llm=custom_llm, verbose=True, memory=ConversationBufferMemory()\n",
")\n",
"conversation.predict(input=\"What is the recipe of mayonnaise?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Guardrails for Amazon Bedrock example \n",
"\n",
"## Guardrails for Amazon Bedrock (Preview) \n",
"[Guardrails for Amazon Bedrock](https://aws.amazon.com/bedrock/guardrails/) evaluates user inputs and model responses based on use case specific policies, and provides an additional layer of safeguards regardless of the underlying model. Guardrails can be applied across models, including Anthropic Claude, Meta Llama 2, Cohere Command, AI21 Labs Jurassic, and Amazon Titan Text, as well as fine-tuned models.\n",
"**Note**: Guardrails for Amazon Bedrock is currently in preview and not generally available. Reach out through your usual AWS Support contacts if youd like access to this feature.\n",
"In this section, we are going to set up a Bedrock language model with specific guardrails that include tracing capabilities. "
]
},
@@ -136,7 +167,7 @@
" print(f\"Guardrails: {kwargs}\")\n",
"\n",
"\n",
"# guardrails for Amazon Bedrock with trace\n",
"# Guardrails for Amazon Bedrock with trace\n",
"llm = Bedrock(\n",
" credentials_profile_name=\"bedrock-admin\",\n",
" model_id=\"<Model_ID>\",\n",
@@ -163,7 +194,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.7"
}
},
"nbformat": 4,

View File

@@ -82,7 +82,7 @@
" temperature=0.8,\n",
")\n",
"\n",
"print(llm(\"What is the capital of France ?\"))"
"print(llm.invoke(\"What is the capital of France ?\"))"
]
},
{
@@ -117,8 +117,7 @@
"1. The first Pokemon game was released in 1996.\n",
"2. The president was Bill Clinton.\n",
"3. Clinton was president from 1993 to 2001.\n",
"4. The answer is Clinton.\n",
"\n"
"4. The answer is Clinton.\n"
]
},
{
@@ -142,7 +141,7 @@
"\n",
"question = \"Who was the US president in the year the first Pokemon game was released?\"\n",
"\n",
"print(llm_chain.run(question))"
"print(llm_chain.invoke(question))"
]
},
{
@@ -172,7 +171,36 @@
" trust_remote_code=True, # mandatory for hf models\n",
")\n",
"\n",
"llm(\"What is the future of AI?\")"
"llm.invoke(\"What is the future of AI?\")"
]
},
{
"cell_type": "markdown",
"id": "d6ca8fd911d25faa",
"metadata": {
"collapsed": false
},
"source": [
"## Quantization\n",
"\n",
"vLLM supports `awq` quantization. To enable it, pass `quantization` to `vllm_kwargs`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cada3174c46a0ea",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"llm_q = VLLM(\n",
" model=\"TheBloke/Llama-2-7b-Chat-AWQ\",\n",
" trust_remote_code=True,\n",
" max_new_tokens=512,\n",
" vllm_kwargs={\"quantization\": \"awq\"},\n",
")"
]
},
{
@@ -216,7 +244,7 @@
" model_name=\"tiiuae/falcon-7b\",\n",
" model_kwargs={\"stop\": [\".\"]},\n",
")\n",
"print(llm(\"Rome is\"))"
"print(llm.invoke(\"Rome is\"))"
]
}
],

View File

@@ -207,15 +207,11 @@ from langchain_community.vectorstores import MatchingEngine
> [Google BigQuery](https://cloud.google.com/bigquery),
> BigQuery is a serverless and cost-effective enterprise data warehouse in Google Cloud.
>
> Google BigQuery Vector Search
> [Google BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro)
> BigQuery vector search lets you use GoogleSQL to do semantic search, using vector indexes for fast but approximate results, or using brute force for exact results.
> It can calculate Euclidean or Cosine distance. With LangChain, we default to use Euclidean distance.
> This is a private preview (experimental) feature. Please submit this
> [enrollment form](https://docs.google.com/forms/d/18yndSb4dTf2H0orqA9N7NAchQEDQekwWiD5jYfEkGWk/viewform?edit_requested=true)
> if you want to enroll BigQuery Vector Search Experimental.
We need to install several python packages.
```bash

View File

@@ -0,0 +1,22 @@
# Providers
LangChain integrates with many providers
## Partner Packages
- [OpenAI](/docs/integrations/platforms/openai)
- [Anthropic](/docs/integrations/platforms/anthropic)
- [Google](/docs/integrations/platforms/google)
- [MistralAI](/docs/integrations/providers/mistralai)
- [NVIDIA AI](/docs/integrations/providers/nvidia)
- [Together AI](/docs/integrations/providers/together)
- [Robocorp](/docs/integrations/providers/robocorp)
- [Exa Search](/docs/integrations/providers/exa_search)
- [Nomic](/docs/integrations/providers/nomic)
## Featured Community Providers
- [AWS](/docs/integrations/platforms/aws)
- [Hugging Face](/docs/integrations/platforms/huggingface)
- [Microsoft](/docs/integrations/platforms/microsoft)

View File

@@ -0,0 +1,77 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exa Search\n",
"\n",
"Exa's search integration exists in its own [partner package](https://pypi.org/project/langchain-exa/). You can install it with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-exa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to use the package, you will also need to set the `EXA_API_KEY` environment variable to your Exa API key.\n",
"\n",
"## Retriever\n",
"\n",
"You can use the [`ExaSearchRetriever`](/docs/integrations/tools/exa_search#using-exasearchretriever) in a standard retrieval pipeline. You can import it as follows"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"from langchain_exa import ExaSearchRetriever"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tools\n",
"\n",
"You can use Exa as an agent tool as described in the [Exa tool calling docs](/docs/integrations/tools/exa_search#using-the-exa-sdk-as-langchain-agent-tools).\n"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MistralAI\n",
"\n",
"Mistral AI is a platform that offers hosting for their powerful open source models.\n",
"\n",
"You can access them via their [API](https://docs.mistral.ai/api/).\n",
"\n",
"A valid [API key](https://console.mistral.ai/users/api-keys/) is needed to communicate with the API.\n",
"\n",
"You will also need the `langchain-mistralai` package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-core langchain-mistralai"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"from langchain_mistralai import ChatMistralAI, MistralAIEmbeddings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the docs for their\n",
"\n",
"- [Chat Model](/docs/integrations/chat/mistralai)\n",
"- [Embeddings Model](/docs/integrations/text_embedding/mistralai)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -11,6 +11,22 @@
"- Atlas: their Visual Data Engine\n",
"- GPT4All: their Open Source Edge Language Model Ecosystem\n",
"\n",
"The Nomic integration exists in its own [partner package](https://pypi.org/project/langchain-nomic/). You can install it with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-nomic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Currently, you can import their hosted [embedding model](/docs/integrations/text_embedding/nomic) as follows:"
]
},

View File

@@ -21,7 +21,7 @@ To use, you should set up the environment variables `ANYSCALE_API_BASE` and
## LLM
```python
from langchain.llms import Ollama
from langchain_community.llms import Ollama
```
See the notebook example [here](/docs/integrations/llms/ollama).
@@ -31,7 +31,7 @@ See the notebook example [here](/docs/integrations/llms/ollama).
### Chat Ollama
```python
from langchain.chat_models import ChatOllama
from langchain_community.chat_models import ChatOllama
```
See the notebook example [here](/docs/integrations/chat/ollama).
@@ -47,7 +47,7 @@ See the notebook example [here](/docs/integrations/chat/ollama_functions).
## Embedding models
```python
from langchain.embeddings import OllamaEmbeddings
from langchain_community.embeddings import OllamaEmbeddings
```
See the notebook example [here](/docs/integrations/text_embedding/ollama).

View File

@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Together AI\n",
"\n",
"> The Together API makes it easy to fine-tune or run leading open-source models with a couple lines of code. We have integrated the worlds leading open-source models, including Llama-2, RedPajama, Falcon, Alpaca, Stable Diffusion XL, and more. Read more: https://together.ai\n",
"\n",
"To use, you'll need an API key which you can find here:\n",
"https://api.together.xyz/settings/api-keys. This can be passed in as init param\n",
"``together_api_key`` or set as environment variable ``TOGETHER_API_KEY``.\n",
"\n",
"Together API reference: https://docs.together.ai/reference/inference\n",
"\n",
"You will also need to install the `langchain-together` integration package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-together"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"from __module_name__ import (\n",
" Together, # LLM\n",
" TogetherEmbeddings,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the docs for their\n",
"\n",
"- [LLM](/docs/integrations/llms/together)\n",
"- [Embeddings Model](/docs/integrations/text_embedding/together)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
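A hedged usage sketch for the Together AI notebook above; it assumes `TOGETHER_API_KEY` is set and that the model identifiers shown are available, since the diff itself does not pin them.

```python
# Minimal sketch of the Together AI integration (model names are assumptions).
from langchain_together import Together, TogetherEmbeddings

llm = Together(model="togethercomputer/RedPajama-INCITE-7B-Base")
print(llm.invoke("The capital of France is"))

embeddings = TogetherEmbeddings(model="togethercomputer/m2-bert-80M-8k-retrieval")
print(len(embeddings.embed_query("hello world")))
```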

View File

@@ -1,186 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: SQL\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SQLStore\n",
"\n",
"The `SQLStrStore` and `SQLDocStore` implement remote data access and persistence to store strings or LangChain documents in your SQL instance."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['value1', 'value2']\n",
"['key2']\n",
"['key2']\n"
]
}
],
"source": [
"from langchain_community.storage import SQLStrStore\n",
"\n",
"# simple example using an SQLStrStore to store strings\n",
"# same as you would use in \"InMemoryStore\" but using SQL persistence\n",
"CONNECTION_STRING = \"postgresql+psycopg2://user:pass@localhost:5432/db\"\n",
"COLLECTION_NAME = \"test_collection\"\n",
"\n",
"store = SQLStrStore(\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
")\n",
"store.mset([(\"key1\", \"value1\"), (\"key2\", \"value2\")])\n",
"print(store.mget([\"key1\", \"key2\"]))\n",
"# ['value1', 'value2']\n",
"store.mdelete([\"key1\"])\n",
"print(list(store.yield_keys()))\n",
"# ['key2']\n",
"print(list(store.yield_keys(prefix=\"k\")))\n",
"# ['key2']\n",
"# delete the COLLECTION_NAME collection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Integration with ParentRetriever and PGVector\n",
"\n",
"When using PGVector, you already have a SQL instance running. Here is a convenient way of using this instance to store documents associated to vectors. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Prepare the PGVector vectorestore with something like this:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import PGVector\n",
"from langchain_openai import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings()\n",
"vector_db = PGVector.from_existing_index(\n",
" embedding=embeddings,\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then create the parent retiever using `SQLDocStore` to persist the documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.retrievers import ParentDocumentRetriever\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.storage import SQLDocStore\n",
"\n",
"CONNECTION_STRING = \"postgresql+psycopg2://user:pass@localhost:5432/db\"\n",
"COLLECTION_NAME = \"state_of_the_union_test\"\n",
"docstore = SQLDocStore(\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
")\n",
"\n",
"loader = TextLoader(\"./state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"\n",
"parent_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
"child_splitter = RecursiveCharacterTextSplitter(chunk_size=50)\n",
"\n",
"retriever = ParentDocumentRetriever(\n",
" vectorstore=vector_db,\n",
" docstore=docstore,\n",
" child_splitter=child_splitter,\n",
" parent_splitter=parent_splitter,\n",
")\n",
"retriever.add_documents(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete a collection"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.storage import SQLStrStore\n",
"\n",
"# delete the COLLECTION_NAME collection\n",
"CONNECTION_STRING = \"postgresql+psycopg2://user:pass@localhost:5432/db\"\n",
"COLLECTION_NAME = \"test_collection\"\n",
"store = SQLStrStore(\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
")\n",
"store.delete_collection()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -60,7 +60,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using ExaSearchRetriever\n",
"## Using ExaSearchRetriever\n",
"\n",
"ExaSearchRetriever is a retriever that uses Exa Search to retrieve relevant documents."
]
@@ -345,7 +345,7 @@
" Set the optional include_domains (list[str]) parameter to restrict the search to a list of domains.\n",
" Set the optional start_published_date (str) parameter to restrict the search to documents published after the date (YYYY-MM-DD).\n",
" \"\"\"\n",
" return exa.search(\n",
" return exa.search_and_contents(\n",
" f\"{query}\",\n",
" use_autoprompt=True,\n",
" num_results=5,\n",
@@ -359,7 +359,7 @@
" \"\"\"Search for webpages similar to a given URL.\n",
" The url passed in should be a URL returned from `search`.\n",
" \"\"\"\n",
" return exa.find_similar(url, num_results=5)\n",
" return exa.find_similar_and_contents(url, num_results=5)\n",
"\n",
"\n",
"@tool\n",

View File

@@ -7,22 +7,12 @@
},
"source": [
"# BigQuery Vector Search\n",
"> **BigQueryVectorSearch**:\n",
"BigQuery vector search lets you use GoogleSQL to do semantic search, using vector indexes for fast approximate results, or using brute force for exact results.\n",
"> [**BigQuery Vector Search**](https://cloud.google.com/bigquery/docs/vector-search-intro) lets you use GoogleSQL to do semantic search, using vector indexes for fast approximate results, or using brute force for exact results.\n",
"\n",
"\n",
"This tutorial illustrates how to work with an end-to-end data and embedding management system in LangChain, and provide scalable semantic search in BigQuery."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a **private preview (experimental)** feature. Please submit this\n",
"[enrollment form](https://docs.google.com/forms/d/18yndSb4dTf2H0orqA9N7NAchQEDQekwWiD5jYfEkGWk/viewform?edit_requested=true)\n",
"if you want to enroll BigQuery Vector Search Experimental."
]
},
{
"cell_type": "markdown",
"metadata": {

View File

@@ -13,7 +13,16 @@
"This notebook shows how to use functionality related to the `Pinecone` vector database.\n",
"\n",
"To use Pinecone, you must have an API key. \n",
"Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart)."
"Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"\n",
"Set the following environment variables to make using the `Pinecone` integration easier:\n",
"\n",
"- `PINECONE_API_KEY`: Your Pinecone API key.\n",
"- `PINECONE_INDEX_NAME`: The name of the index you want to use.\n",
"\n",
"And to follow along in this doc, you should also set\n",
"\n",
"- `OPENAI_API_KEY`: Your OpenAI API key, for using `OpenAIEmbeddings`"
]
},
{
@@ -25,74 +34,27 @@
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet pinecone-client langchain-openai tiktoken langchain"
"%pip install --upgrade --quiet langchain-pinecone langchain-openai langchain"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"PINECONE_API_KEY\"] = getpass.getpass(\"Pinecone API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02a536e0-d603-4d79-b18b-1ed562977b40",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"PINECONE_ENV\"] = getpass.getpass(\"Pinecone Environment:\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "320af802-9271-46ee-948f-d2453933d44b",
"id": "42f2ea67",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
"First, let's split our state of the union document into chunked `docs`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffea66e4-bc23-46a9-9580-b348dfe7b7a7",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aac9563e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores import Pinecone\n",
"from langchain_openai import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
@@ -103,43 +65,52 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e104aee",
"cell_type": "markdown",
"id": "3a4d377f",
"metadata": {},
"outputs": [],
"source": [
"import pinecone\n",
"Now let's assume you have your Pinecone index set up with `dimension=1536`.\n",
"\n",
"# initialize pinecone\n",
"pinecone.init(\n",
" api_key=os.getenv(\"PINECONE_API_KEY\"), # find at app.pinecone.io\n",
" environment=os.getenv(\"PINECONE_ENV\"), # next to api key in console\n",
")\n",
"\n",
"index_name = \"langchain-demo\"\n",
"\n",
"# First, check if our index already exists. If it doesn't, we create it\n",
"if index_name not in pinecone.list_indexes():\n",
" # we create a new index\n",
" pinecone.create_index(name=index_name, metric=\"cosine\", dimension=1536)\n",
"# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n",
"docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)\n",
"\n",
"# if you already have an index, you can load it like this\n",
"# docsearch = Pinecone.from_existing_index(index_name, embeddings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
"We can connect to our Pinecone index and insert those chunked docs as contents with `Pinecone.from_documents`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c608226",
"execution_count": 6,
"id": "6e104aee",
"metadata": {},
"outputs": [],
"source": [
"from langchain_pinecone import Pinecone\n",
"\n",
"index_name = \"langchain-test-index\"\n",
"\n",
"docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ffbcb3fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
@@ -156,15 +127,25 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "38a7a60e",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['24631802-4bad-44a7-a4ba-fd71f00cc160']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index = pinecone.Index(\"langchain-demo\")\n",
"vectorstore = Pinecone(index, embeddings.embed_query, \"text\")\n",
"vectorstore = Pinecone(index_name=index_name, embedding=embeddings)\n",
"\n",
"vectorstore.add_texts(\"More text!\")"
"vectorstore.add_texts([\"More text!\"])"
]
},
{
@@ -180,10 +161,91 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"id": "a359ed74",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"## Document 0\n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"\n",
"## Document 1\n",
"\n",
"And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n",
"\n",
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n",
"\n",
"America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n",
"\n",
"These steps will help blunt gas prices here at home. And I know the news about whats happening can seem alarming. \n",
"\n",
"But I want you to know that we are going to be okay. \n",
"\n",
"When the history of this era is written Putins war on Ukraine will have left Russia weaker and the rest of the world stronger. \n",
"\n",
"While it shouldnt have taken something so terrible for people around the world to see whats at stake now everyone sees it clearly.\n",
"\n",
"## Document 2\n",
"\n",
"We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
"\n",
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
"\n",
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
"\n",
"Officer Mora was 27 years old. \n",
"\n",
"Officer Rivera was 22. \n",
"\n",
"Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.\n",
"\n",
"## Document 3\n",
"\n",
"One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n",
"\n",
"When they came home, many of the worlds fittest and best trained warriors were never the same. \n",
"\n",
"Headaches. Numbness. Dizziness. \n",
"\n",
"A cancer that would put them in a flag-draped coffin. \n",
"\n",
"I know. \n",
"\n",
"One of those soldiers was my son Major Beau Biden. \n",
"\n",
"We dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n",
"\n",
"But Im committed to finding out everything we can. \n",
"\n",
"Committed to military families like Danielle Robinson from Ohio. \n",
"\n",
"The widow of Sergeant First Class Heath Robinson. \n",
"\n",
"He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \n",
"\n",
"Stationed near Baghdad, just yards from burn pits the size of football fields. \n",
"\n",
"Heaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.\n"
]
}
],
"source": [
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
"matched_docs = retriever.get_relevant_documents(query)\n",
@@ -203,15 +265,56 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"id": "9ca82740",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. \n",
"\n",
"2. We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
"\n",
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
"\n",
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
"\n",
"Officer Mora was 27 years old. \n",
"\n",
"Officer Rivera was 22. \n",
"\n",
"Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety. \n",
"\n"
]
}
],
"source": [
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
"for i, doc in enumerate(found_docs):\n",
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b0fd750b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -230,7 +333,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -101,7 +101,7 @@
"metadata": {},
"source": [
"You can create custom prompt templates that format the prompt in any way you want.\n",
"For more information, see [Custom Prompt Templates](./custom_prompt_template).\n",
"For more information, see [Prompt Template Composition](./composition).\n",
"\n",
"## `ChatPromptTemplate`\n",
"\n",

View File

@@ -752,7 +752,7 @@
"\n",
"* [SQL use case](/docs/use_cases/sql/): Many of the challenges of working with SQL db's and CSV's are generic to any structured data type, so it's useful to read the SQL techniques even if you're using Pandas for CSV data analysis.\n",
"* [Tool use](/docs/use_cases/tool_use/): Guides on general best practices when working with chains and agents that invoke tools\n",
"* [Agents](/docs/use_cases/agents/): Understand the fundamentals of building LLM agents.\n",
"* [Agents](/docs/modules/agents/): Understand the fundamentals of building LLM agents.\n",
"* Integrations: Sandboxed envs like [E2B](/docs/integrations/tools/e2b_data_analysis) and [Bearly](/docs/integrations/tools/bearly), utilities like [SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), related agents like [Spark DataFrame agent](/docs/integrations/toolkits/spark)."
]
}

View File

@@ -586,11 +586,12 @@
"Vector stores are commonly used for retrieval, but there are other ways to do retrieval, too.\n",
"\n",
"`Retriever`: An object that returns `Document`s given a text query\n",
"\n",
"- [Docs](/docs/modules/data_connection/retrievers/): Further documentation on the interface and built-in retrieval techniques. Some of which include:\n",
" - `MultiQueryRetriever` [generates variants of the input question](/docs/modules/data_connection/retrievers/MultiQueryRetriever) to improve retrieval hit rate.\n",
" - `MultiVectorRetriever` (diagram below) instead generates [variants of the embeddings](/docs/modules/data_connection/retrievers/multi_vector), also in order to improve retrieval hit rate.\n",
" - `Max marginal relevance` selects for [relevance and diversity](https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf) among the retrieved documents to avoid passing in duplicate context.\n",
" - Documents can be filtered during vector store retrieval using [`metadata` filters](/docs/use_cases/question_answering/document-context-aware-QA).\n",
" - Documents can be filtered during vector store retrieval using metadata filters, such as with a [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query).\n",
"- [Integrations](/docs/integrations/retrievers/): Integrations with retrieval services.\n",
"- [Interface](https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html): API reference for the base interface."
]
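Since the section above describes `MultiQueryRetriever` and metadata-filtered retrieval only in prose, here is a hedged sketch of the multi-query idea; the FAISS store, the example text, and the OpenAI model are assumptions chosen for brevity (FAISS requires `faiss-cpu`).

```python
# Sketch of MultiQueryRetriever: generate question variants to improve hit rate.
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
    ["LangChain helps developers build applications powered by LLMs."],
    OpenAIEmbeddings(),
)
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI(temperature=0),
)
docs = retriever.get_relevant_documents("What is LangChain used for?")
print(len(docs))
```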

View File

@@ -45,9 +45,9 @@
"\n",
"A central question for building a summarizer is how to pass your documents into the LLM's context window. Two common approaches for this are:\n",
"\n",
"1. `Stuff`: Simply \"stuff\" all your documents into a single prompt. This is the simplest approach (see [here](/docs/modules/chains/document/stuff) for more on the `StuffDocumentsChains`, which is used for this method).\n",
"1. `Stuff`: Simply \"stuff\" all your documents into a single prompt. This is the simplest approach (see [here](/docs/modules/chains#lcel-chains) for more on the `create_stuff_documents_chain` constructor, which is used for this method).\n",
"\n",
"2. `Map-reduce`: Summarize each document on it's own in a \"map\" step and then \"reduce\" the summaries into a final summary (see [here](/docs/modules/chains/document/map_reduce) for more on the `MapReduceDocumentsChain`, which is used for this method)."
"2. `Map-reduce`: Summarize each document on it's own in a \"map\" step and then \"reduce\" the summaries into a final summary (see [here](/docs/modules/chains#legacy-chains) for more on the `MapReduceDocumentsChain`, which is used for this method)."
]
},
{
@@ -523,7 +523,7 @@
"source": [
"## Option 3. Refine\n",
" \n",
"[Refine](/docs/modules/chains/document/refine) is similar to map-reduce:\n",
"[RefineDocumentsChain](/docs/modules/chains#legacy-chains) is similar to map-reduce:\n",
"\n",
"> The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer. For each document, it passes all non-document inputs, the current document, and the latest intermediate answer to an LLM chain to get a new answer.\n",
"\n",
@@ -647,24 +647,10 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"id": "0ddd522e-30dc-4f6a-b993-c4f97e656c4f",
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "`run` not supported when there is not exactly one output key. Got ['output_text', 'intermediate_steps'].",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[17], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mchains\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m AnalyzeDocumentChain\n\u001b[1;32m 3\u001b[0m summarize_document_chain \u001b[38;5;241m=\u001b[39m AnalyzeDocumentChain(combine_docs_chain\u001b[38;5;241m=\u001b[39mchain, text_splitter\u001b[38;5;241m=\u001b[39mtext_splitter)\n\u001b[0;32m----> 4\u001b[0m \u001b[43msummarize_document_chain\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdocs\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m0\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[0;32m~/langchain/libs/langchain/langchain/chains/base.py:496\u001b[0m, in \u001b[0;36mChain.run\u001b[0;34m(self, callbacks, tags, metadata, *args, **kwargs)\u001b[0m\n\u001b[1;32m 459\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Convenience method for executing chain.\u001b[39;00m\n\u001b[1;32m 460\u001b[0m \n\u001b[1;32m 461\u001b[0m \u001b[38;5;124;03mThe main difference between this method and `Chain.__call__` is that this\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 493\u001b[0m \u001b[38;5;124;03m # -> \"The temperature in Boise is...\"\u001b[39;00m\n\u001b[1;32m 494\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 495\u001b[0m \u001b[38;5;66;03m# Run at start to make sure this is possible/defined\u001b[39;00m\n\u001b[0;32m--> 496\u001b[0m _output_key \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_run_output_key\u001b[49m\n\u001b[1;32m 498\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m args \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m kwargs:\n\u001b[1;32m 499\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(args) \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m1\u001b[39m:\n",
"File \u001b[0;32m~/langchain/libs/langchain/langchain/chains/base.py:445\u001b[0m, in \u001b[0;36mChain._run_output_key\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 442\u001b[0m \u001b[38;5;129m@property\u001b[39m\n\u001b[1;32m 443\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_run_output_key\u001b[39m(\u001b[38;5;28mself\u001b[39m) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mstr\u001b[39m:\n\u001b[1;32m 444\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys) \u001b[38;5;241m!=\u001b[39m \u001b[38;5;241m1\u001b[39m:\n\u001b[0;32m--> 445\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 446\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m`run` not supported when there is not exactly \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 447\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mone output key. Got \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 448\u001b[0m )\n\u001b[1;32m 449\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39moutput_keys[\u001b[38;5;241m0\u001b[39m]\n",
"\u001b[0;31mValueError\u001b[0m: `run` not supported when there is not exactly one output key. Got ['output_text', 'intermediate_steps']."
]
}
],
"outputs": [],
"source": [
"from langchain.chains import AnalyzeDocumentChain\n",
"\n",

View File

@@ -87,11 +87,11 @@ module.exports = {
collapsible: false,
items: [
{ type: "autogenerated", dirName: "integrations/platforms" },
{ type: "category", label: "More", collapsed: true, items: [{type:"autogenerated", dirName: "integrations/providers" }]},
{ type: "category", label: "More", collapsed: true, items: [{type:"autogenerated", dirName: "integrations/providers" }], link: { type: 'generated-index', slug: "integrations/providers", }},
],
link: {
type: 'generated-index',
slug: "integrations/providers",
type: 'doc',
id: 'integrations/platforms/index'
},
},
{

View File

@@ -17,9 +17,9 @@
},
"outputs": [],
"source": [
"from __module_name__.chat_models import __ModuleName__Chat\n",
"from __module_name__.llms import __ModuleName__LLM\n",
"from __module_name__.vectorstores import __ModuleName__VectorStore"
"from __module_name__ import Chat__ModuleName__\n",
"from __module_name__ import __ModuleName__LLM\n",
"from __module_name__ import __ModuleName__VectorStore"
]
}
],

View File

@@ -41,7 +41,7 @@ lint lint_diff lint_package lint_tests:
poetry run ruff .
[ "$(PYTHON_FILES)" = "" ] || poetry run ruff format $(PYTHON_FILES) --diff
[ "$(PYTHON_FILES)" = "" ] || poetry run ruff --select I $(PYTHON_FILES)
[ "$(PYTHON_FILES)" = "" ] || mkdir -p $(MYPY_CACHE) || poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)
[ "$(PYTHON_FILES)" = "" ] || mkdir -p $(MYPY_CACHE) && poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)
format format_diff:
poetry run ruff format $(PYTHON_FILES)

View File

@@ -1,6 +1,6 @@
from __future__ import annotations
from typing import List, Optional
from typing import Dict, List, Optional, Type
from langchain_core.pydantic_v1 import root_validator
@@ -14,18 +14,17 @@ from langchain_community.tools.file_management.move import MoveFileTool
from langchain_community.tools.file_management.read import ReadFileTool
from langchain_community.tools.file_management.write import WriteFileTool
_FILE_TOOLS = {
# "Type[Runnable[Any, Any]]" has no attribute "__fields__" [attr-defined]
tool_cls.__fields__["name"].default: tool_cls # type: ignore[attr-defined]
for tool_cls in [
CopyFileTool,
DeleteFileTool,
FileSearchTool,
MoveFileTool,
ReadFileTool,
WriteFileTool,
ListDirectoryTool,
]
_FILE_TOOLS: List[Type[BaseTool]] = [
CopyFileTool,
DeleteFileTool,
FileSearchTool,
MoveFileTool,
ReadFileTool,
WriteFileTool,
ListDirectoryTool,
]
_FILE_TOOLS_MAP: Dict[str, Type[BaseTool]] = {
tool_cls.__fields__["name"].default: tool_cls for tool_cls in _FILE_TOOLS
}
@@ -61,20 +60,20 @@ class FileManagementToolkit(BaseToolkit):
def validate_tools(cls, values: dict) -> dict:
selected_tools = values.get("selected_tools") or []
for tool_name in selected_tools:
if tool_name not in _FILE_TOOLS:
if tool_name not in _FILE_TOOLS_MAP:
raise ValueError(
f"File Tool of name {tool_name} not supported."
f" Permitted tools: {list(_FILE_TOOLS)}"
f" Permitted tools: {list(_FILE_TOOLS_MAP)}"
)
return values
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
allowed_tools = self.selected_tools or _FILE_TOOLS.keys()
allowed_tools = self.selected_tools or _FILE_TOOLS_MAP
tools: List[BaseTool] = []
for tool in allowed_tools:
tool_cls = _FILE_TOOLS[tool]
tools.append(tool_cls(root_dir=self.root_dir)) # type: ignore
tool_cls = _FILE_TOOLS_MAP[tool]
tools.append(tool_cls(root_dir=self.root_dir))
return tools
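To show how the refactored `_FILE_TOOLS_MAP` is exercised from user code, a small sketch; the temporary directory and the selected tool names are illustrative, though the names match the defaults of the listed tool classes.

```python
# Sketch: build a FileManagementToolkit restricted to a few tools.
import tempfile

from langchain_community.agent_toolkits import FileManagementToolkit

with tempfile.TemporaryDirectory() as root_dir:
    toolkit = FileManagementToolkit(
        root_dir=root_dir,
        selected_tools=["read_file", "write_file", "list_directory"],
    )
    for tool in toolkit.get_tools():
        print(tool.name)
```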

View File

@@ -2,7 +2,7 @@
import json
import re
from functools import partial
from typing import Any, Callable, Dict, List, Optional
from typing import Any, Callable, Dict, List, Optional, cast
import yaml
from langchain_core.callbacks import BaseCallbackManager
@@ -68,7 +68,7 @@ class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
"""Tool name."""
description = REQUESTS_GET_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
response_length: int = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_GET_PROMPT)
@@ -83,7 +83,9 @@ class RequestsGetToolWithParsing(BaseRequestsTool, BaseTool):
except json.JSONDecodeError as e:
raise e
data_params = data.get("params")
response = self.requests_wrapper.get(data["url"], params=data_params)
response: str = cast(
str, self.requests_wrapper.get(data["url"], params=data_params)
)
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
@@ -100,7 +102,7 @@ class RequestsPostToolWithParsing(BaseRequestsTool, BaseTool):
"""Tool name."""
description = REQUESTS_POST_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
response_length: int = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_POST_PROMPT)
@@ -114,7 +116,7 @@ class RequestsPostToolWithParsing(BaseRequestsTool, BaseTool):
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.post(data["url"], data["data"])
response: str = cast(str, self.requests_wrapper.post(data["url"], data["data"]))
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
@@ -131,7 +133,7 @@ class RequestsPatchToolWithParsing(BaseRequestsTool, BaseTool):
"""Tool name."""
description = REQUESTS_PATCH_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
response_length: int = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_PATCH_PROMPT)
@@ -145,7 +147,9 @@ class RequestsPatchToolWithParsing(BaseRequestsTool, BaseTool):
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.patch(data["url"], data["data"])
response: str = cast(
str, self.requests_wrapper.patch(data["url"], data["data"])
)
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
@@ -162,7 +166,7 @@ class RequestsPutToolWithParsing(BaseRequestsTool, BaseTool):
"""Tool name."""
description = REQUESTS_PUT_TOOL_DESCRIPTION
"""Tool description."""
response_length: Optional[int] = MAX_RESPONSE_LENGTH
response_length: int = MAX_RESPONSE_LENGTH
"""Maximum length of the response to be returned."""
llm_chain: Any = Field(
default_factory=_get_default_llm_chain_factory(PARSING_PUT_PROMPT)
@@ -176,7 +180,7 @@ class RequestsPutToolWithParsing(BaseRequestsTool, BaseTool):
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.put(data["url"], data["data"])
response: str = cast(str, self.requests_wrapper.put(data["url"], data["data"]))
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]
@@ -208,7 +212,7 @@ class RequestsDeleteToolWithParsing(BaseRequestsTool, BaseTool):
data = parse_json_markdown(text)
except json.JSONDecodeError as e:
raise e
response = self.requests_wrapper.delete(data["url"])
response: str = cast(str, self.requests_wrapper.delete(data["url"]))
response = response[: self.response_length]
return self.llm_chain.predict(
response=response, instructions=data["output_instructions"]

View File

@@ -58,7 +58,7 @@ def create_pbi_agent(
input_variables=input_variables,
**prompt_params,
),
callback_manager=callback_manager, # type: ignore
callback_manager=callback_manager,
verbose=verbose,
),
allowed_tools=[tool.name for tool in tools],

View File

@@ -2,7 +2,17 @@
from __future__ import annotations
import warnings
from typing import TYPE_CHECKING, Any, Dict, List, Literal, Optional, Sequence, Union
from typing import (
TYPE_CHECKING,
Any,
Dict,
List,
Literal,
Optional,
Sequence,
Union,
cast,
)
from langchain_core.messages import AIMessage, SystemMessage
from langchain_core.prompts import BasePromptTemplate, PromptTemplate
@@ -176,8 +186,8 @@ def create_sql_agent(
elif agent_type == AgentType.OPENAI_FUNCTIONS:
if prompt is None:
messages = [
SystemMessage(content=prefix),
messages: List = [
SystemMessage(content=cast(str, prefix)),
HumanMessagePromptTemplate.from_template("{input}"),
AIMessage(content=suffix or SQL_FUNCTIONS_SUFFIX),
MessagesPlaceholder(variable_name="agent_scratchpad"),
@@ -191,7 +201,7 @@ def create_sql_agent(
elif agent_type == "openai-tools":
if prompt is None:
messages = [
SystemMessage(content=prefix),
SystemMessage(content=cast(str, prefix)),
HumanMessagePromptTemplate.from_template("{input}"),
AIMessage(content=suffix or SQL_FUNCTIONS_SUFFIX),
MessagesPlaceholder(variable_name="agent_scratchpad"),

View File

@@ -60,12 +60,14 @@ from langchain_core.load.load import loads
from langchain_core.outputs import ChatGeneration, Generation
from langchain_core.utils import get_from_env
from langchain_community.utilities.astradb import AstraDBEnvironment
from langchain_community.vectorstores.redis import Redis as RedisVectorstore
logger = logging.getLogger(__file__)
if TYPE_CHECKING:
import momento
from astrapy.db import AstraDB
from cassandra.cluster import Session as CassandraSession
@@ -1262,7 +1264,7 @@ class AstraDBCache(BaseCache):
collection_name: str = ASTRA_DB_CACHE_DEFAULT_COLLECTION_NAME,
token: Optional[str] = None,
api_endpoint: Optional[str] = None,
astra_db_client: Optional[Any] = None, # 'astrapy.db.AstraDB' if passed
astra_db_client: Optional[AstraDB] = None,
namespace: Optional[str] = None,
):
"""
@@ -1278,39 +1280,17 @@ class AstraDBCache(BaseCache):
namespace (Optional[str]): namespace (aka keyspace) where the
collection is created. Defaults to the database's "default namespace".
"""
try:
from astrapy.db import (
AstraDB as LibAstraDB,
)
except (ImportError, ModuleNotFoundError):
raise ImportError(
"Could not import a recent astrapy python package. "
"Please install it with `pip install --upgrade astrapy`."
)
# Conflicting-arg checks:
if astra_db_client is not None:
if token is not None or api_endpoint is not None:
raise ValueError(
"You cannot pass 'astra_db_client' to AstraDB if passing "
"'token' and 'api_endpoint'."
)
self.collection_name = collection_name
self.token = token
self.api_endpoint = api_endpoint
self.namespace = namespace
if astra_db_client is not None:
self.astra_db = astra_db_client
else:
self.astra_db = LibAstraDB(
token=self.token,
api_endpoint=self.api_endpoint,
namespace=self.namespace,
)
self.collection = self.astra_db.create_collection(
collection_name=self.collection_name,
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
namespace=namespace,
)
self.astra_db = astra_env.astra_db
self.collection = self.astra_db.create_collection(
collection_name=collection_name,
)
self.collection_name = collection_name
@staticmethod
def _make_id(prompt: str, llm_string: str) -> str:
@@ -1364,7 +1344,7 @@ class AstraDBCache(BaseCache):
def delete(self, prompt: str, llm_string: str) -> None:
"""Evict from cache if there's an entry."""
doc_id = self._make_id(prompt, llm_string)
return self.collection.delete_one(doc_id)
self.collection.delete_one(doc_id)
def clear(self, **kwargs: Any) -> None:
"""Clear cache. This is for all LLMs at once."""
@@ -1395,7 +1375,7 @@ class AstraDBSemanticCache(BaseCache):
collection_name: str = ASTRA_DB_CACHE_DEFAULT_COLLECTION_NAME,
token: Optional[str] = None,
api_endpoint: Optional[str] = None,
astra_db_client: Optional[Any] = None, # 'astrapy.db.AstraDB' if passed
astra_db_client: Optional[AstraDB] = None,
namespace: Optional[str] = None,
embedding: Embeddings,
metric: Optional[str] = None,
@@ -1423,22 +1403,13 @@ class AstraDBSemanticCache(BaseCache):
The default score threshold is tuned to the default metric.
Tune it carefully yourself if switching to another distance metric.
"""
try:
from astrapy.db import (
AstraDB as LibAstraDB,
)
except (ImportError, ModuleNotFoundError):
raise ImportError(
"Could not import a recent astrapy python package. "
"Please install it with `pip install --upgrade astrapy`."
)
# Conflicting-arg checks:
if astra_db_client is not None:
if token is not None or api_endpoint is not None:
raise ValueError(
"You cannot pass 'astra_db_client' to AstraDB if passing "
"'token' and 'api_endpoint'."
)
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
namespace=namespace,
)
self.astra_db = astra_env.astra_db
self.embedding = embedding
self.metric = metric
@@ -1457,18 +1428,7 @@ class AstraDBSemanticCache(BaseCache):
self.embedding_dimension = self._get_embedding_dimension()
self.collection_name = collection_name
self.token = token
self.api_endpoint = api_endpoint
self.namespace = namespace
if astra_db_client is not None:
self.astra_db = astra_db_client
else:
self.astra_db = LibAstraDB(
token=self.token,
api_endpoint=self.api_endpoint,
namespace=self.namespace,
)
self.collection = self.astra_db.create_collection(
collection_name=self.collection_name,
dimension=self.embedding_dimension,

View File

@@ -416,15 +416,25 @@ class AimCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
self._run.close()
self.reset_callback_meta()
if reset:
self.__init__( # type: ignore
repo=repo if repo else self.repo,
experiment_name=experiment_name
if experiment_name
else self.experiment_name,
system_tracking_interval=system_tracking_interval
if system_tracking_interval
else self.system_tracking_interval,
log_system_params=log_system_params
if log_system_params
else self.log_system_params,
aim = import_aim()
self.repo = repo if repo else self.repo
self.experiment_name = (
experiment_name if experiment_name else self.experiment_name
)
self.system_tracking_interval = (
system_tracking_interval
if system_tracking_interval
else self.system_tracking_interval
)
self.log_system_params = (
log_system_params if log_system_params else self.log_system_params
)
self._run = aim.Run(
repo=self.repo,
experiment=self.experiment_name,
system_tracking_interval=self.system_tracking_interval,
log_system_params=self.log_system_params,
)
self._run_hash = self._run.hash
self.action_records = []

View File

@@ -1,6 +1,6 @@
import os
import warnings
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List, Optional, cast
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.callbacks import BaseCallbackHandler
@@ -269,8 +269,8 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
for key in [str(kwargs["parent_run_id"]), str(kwargs["run_id"])]
):
return
prompts = self.prompts.get(str(kwargs["parent_run_id"])) or self.prompts.get(
str(kwargs["run_id"])
prompts: List = self.prompts.get(str(kwargs["parent_run_id"])) or cast(
List, self.prompts.get(str(kwargs["run_id"]), [])
)
for chain_output_key, chain_output_val in outputs.items():
if isinstance(chain_output_val, list):
@@ -283,10 +283,7 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
"response": output["text"].strip(),
},
}
for prompt, output in zip(
prompts, # type: ignore
chain_output_val,
)
for prompt, output in zip(prompts, chain_output_val)
]
)
else:
@@ -295,7 +292,7 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
records=[
{
"fields": {
"prompt": " ".join(prompts), # type: ignore
"prompt": " ".join(prompts),
"response": chain_output_val.strip(),
},
}

View File

@@ -162,7 +162,7 @@ class ArthurCallbackHandler(BaseCallbackHandler):
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""On LLM end, send data to Arthur."""
try:
import pytz # type: ignore[import]
import pytz
except ImportError as e:
raise ImportError(
"Could not import pytz. Please install it with 'pip install pytz'."

View File

@@ -83,7 +83,7 @@ class ClearMLCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
if clearml.Task.current_task():
self.task = clearml.Task.current_task()
else:
self.task = clearml.Task.init( # type: ignore
self.task = clearml.Task.init(
task_type=self.task_type,
project_name=self.project_name,
tags=self.tags,
@@ -361,17 +361,13 @@ class ClearMLCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
if self.visualize and self.nlp and self.temp_dir.name is not None:
doc = self.nlp(text)
dep_out = spacy.displacy.render( # type: ignore
doc, style="dep", jupyter=False, page=True
)
dep_out = spacy.displacy.render(doc, style="dep", jupyter=False, page=True)
dep_output_path = Path(
self.temp_dir.name, hash_string(f"dep-{text}") + ".html"
)
dep_output_path.open("w", encoding="utf-8").write(dep_out)
ent_out = spacy.displacy.render( # type: ignore
doc, style="ent", jupyter=False, page=True
)
ent_out = spacy.displacy.render(doc, style="ent", jupyter=False, page=True)
ent_output_path = Path(
self.temp_dir.name, hash_string(f"ent-{text}") + ".html"
)

View File

@@ -37,7 +37,7 @@ def _get_experiment(
) -> Any:
comet_ml = import_comet_ml()
experiment = comet_ml.Experiment( # type: ignore
experiment = comet_ml.Experiment(
workspace=workspace,
project_name=project_name,
)

View File

@@ -79,12 +79,8 @@ def analyze_text(
if nlp is not None:
spacy = import_spacy()
doc = nlp(text)
dep_out = spacy.displacy.render( # type: ignore
doc, style="dep", jupyter=False, page=True
)
ent_out = spacy.displacy.render( # type: ignore
doc, style="ent", jupyter=False, page=True
)
dep_out = spacy.displacy.render(doc, style="dep", jupyter=False, page=True)
ent_out = spacy.displacy.render(doc, style="ent", jupyter=False, page=True)
text_visualizations = {
"dependency_tree": dep_out,
"entities": ent_out,
@@ -199,7 +195,7 @@ class FlyteCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
complexity_metrics: Dict[str, float] = generation_resp.pop(
"text_complexity_metrics"
) # type: ignore # noqa: E501
)
self.deck.append(
self.markdown_renderer().to_html("#### Text Complexity Metrics")
)

View File

@@ -4,7 +4,7 @@ from typing import Any, Dict, List, Optional, cast
from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import BaseMessage
from langchain_core.outputs import LLMResult
from langchain_core.outputs import ChatGeneration, LLMResult
def import_infino() -> Any:
@@ -146,7 +146,7 @@ class InfinoCallbackHandler(BaseCallbackHandler):
# Track completion token usage (for openai chat models).
if self.is_chat_openai_model:
messages = " ".join(
generation.message.content # type: ignore[attr-defined]
cast(str, cast(ChatGeneration, generation).message.content)
for generation in generations
)
completion_tokens = get_num_tokens(

View File

@@ -109,13 +109,9 @@ def analyze_text(
spacy = import_spacy()
doc = nlp(text)
dep_out = spacy.displacy.render( # type: ignore
doc, style="dep", jupyter=False, page=True
)
dep_out = spacy.displacy.render(doc, style="dep", jupyter=False, page=True)
ent_out = spacy.displacy.render( # type: ignore
doc, style="ent", jupyter=False, page=True
)
ent_out = spacy.displacy.render(doc, style="ent", jupyter=False, page=True)
text_visualizations = {
"dependency_tree": dep_out,
@@ -233,7 +229,7 @@ class MlflowLogger:
data, os.path.join(self.dir, f"{filename}.json"), run_id=self.run_id
)
def table(self, name: str, dataframe) -> None: # type: ignore
def table(self, name: str, dataframe: Any) -> None:
"""To log the input pandas dataframe as a html table"""
self.html(dataframe.to_html(), f"table_{name}")
@@ -411,7 +407,7 @@ class MlflowCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
)
complexity_metrics: Dict[str, float] = generation_resp.pop(
"text_complexity_metrics"
) # type: ignore # noqa: E501
)
self.mlflg.metrics(
complexity_metrics,
step=self.metrics["step"],
@@ -723,7 +719,7 @@ class MlflowCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
)
return session_analysis_df
def _contain_llm_records(self):
def _contain_llm_records(self) -> bool:
return bool(self.records["on_llm_start_records"])
def flush_tracker(self, langchain_asset: Any = None, finish: bool = False) -> None:

View File

@@ -62,7 +62,7 @@ def StreamlitCallbackHandler(
# guaranteed to support the same set of kwargs.
try:
from streamlit.external.langchain import (
StreamlitCallbackHandler as OfficialStreamlitCallbackHandler, # type: ignore # noqa: 501
StreamlitCallbackHandler as OfficialStreamlitCallbackHandler,
)
return OfficialStreamlitCallbackHandler(

View File

@@ -108,7 +108,7 @@ class MutableExpander:
) -> int:
"""Add a Markdown element to the container and return its index."""
kwargs = {"body": body, "unsafe_allow_html": unsafe_allow_html, "help": help}
new_dg = self._get_dg(index).markdown(**kwargs) # type: ignore[arg-type]
new_dg = self._get_dg(index).markdown(**kwargs)
record = ChildRecord(ChildType.MARKDOWN, kwargs, new_dg)
return self._add_record(record, index)

View File

@@ -489,11 +489,10 @@ class WandbTracer(BaseTracer):
If not, will start a new run with the provided run_args.
"""
if self._wandb.run is None:
run_args = self._run_args or {} # type: ignore
run_args: dict = {**run_args} # type: ignore
run_args: Dict = {**(self._run_args or {})}
if "settings" not in run_args: # type: ignore
run_args["settings"] = {"silent": True} # type: ignore
if "settings" not in run_args:
run_args["settings"] = {"silent": True}
self._wandb.init(**run_args)
if self._wandb.run is not None:

View File

@@ -92,15 +92,11 @@ def analyze_text(
if visualize and nlp and output_dir is not None:
doc = nlp(text)
dep_out = spacy.displacy.render( # type: ignore
doc, style="dep", jupyter=False, page=True
)
dep_out = spacy.displacy.render(doc, style="dep", jupyter=False, page=True)
dep_output_path = Path(output_dir, hash_string(f"dep-{text}") + ".html")
dep_output_path.open("w", encoding="utf-8").write(dep_out)
ent_out = spacy.displacy.render( # type: ignore
doc, style="ent", jupyter=False, page=True
)
ent_out = spacy.displacy.render(doc, style="ent", jupyter=False, page=True)
ent_output_path = Path(output_dir, hash_string(f"ent-{text}") + ".html")
ent_output_path.open("w", encoding="utf-8").write(ent_out)
@@ -193,7 +189,7 @@ class WandbCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
self.stream_logs = stream_logs
self.temp_dir = tempfile.TemporaryDirectory()
self.run: wandb.sdk.wandb_run.Run = wandb.init( # type: ignore
self.run = wandb.init(
job_type=self.job_type,
project=self.project,
entity=self.entity,

View File

@@ -3,11 +3,12 @@ from __future__ import annotations
import json
import time
import typing
from typing import List, Optional
from typing import TYPE_CHECKING, List, Optional
if typing.TYPE_CHECKING:
from astrapy.db import AstraDB as LibAstraDB
from langchain_community.utilities.astradb import AstraDBEnvironment
if TYPE_CHECKING:
from astrapy.db import AstraDB
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import (
@@ -42,40 +43,22 @@ class AstraDBChatMessageHistory(BaseChatMessageHistory):
collection_name: str = DEFAULT_COLLECTION_NAME,
token: Optional[str] = None,
api_endpoint: Optional[str] = None,
astra_db_client: Optional[LibAstraDB] = None, # type 'astrapy.db.AstraDB'
astra_db_client: Optional[AstraDB] = None,
namespace: Optional[str] = None,
) -> None:
"""Create an Astra DB chat message history."""
try:
from astrapy.db import AstraDB as LibAstraDB
except (ImportError, ModuleNotFoundError):
raise ImportError(
"Could not import a recent astrapy python package. "
"Please install it with `pip install --upgrade astrapy`."
)
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
namespace=namespace,
)
self.astra_db = astra_env.astra_db
# Conflicting-arg checks:
if astra_db_client is not None:
if token is not None or api_endpoint is not None:
raise ValueError(
"You cannot pass 'astra_db_client' to AstraDB if passing "
"'token' and 'api_endpoint'."
)
self.collection = self.astra_db.create_collection(collection_name)
self.session_id = session_id
self.collection_name = collection_name
self.token = token
self.api_endpoint = api_endpoint
self.namespace = namespace
if astra_db_client is not None:
self.astra_db = astra_db_client
else:
self.astra_db = LibAstraDB(
token=self.token,
api_endpoint=self.api_endpoint,
namespace=self.namespace,
)
self.collection = self.astra_db.create_collection(self.collection_name)
@property
def messages(self) -> List[BaseMessage]: # type: ignore

View File

@@ -47,7 +47,7 @@ class ElasticsearchChatMessageHistory(BaseChatMessageHistory):
):
self.index: str = index
self.session_id: str = session_id
self.ensure_ascii: bool = esnsure_ascii
self.ensure_ascii = esnsure_ascii
# Initialize Elasticsearch client from passed client arg or connection info
if es_connection is not None:
@@ -177,7 +177,7 @@ class ElasticsearchChatMessageHistory(BaseChatMessageHistory):
"created_at": round(time() * 1000),
"history": json.dumps(
message_to_dict(message),
ensure_ascii=self.ensure_ascii,
ensure_ascii=bool(self.ensure_ascii),
),
},
refresh=True,

View File

@@ -1,4 +1,4 @@
from typing import List
from typing import List, Sequence
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
@@ -13,9 +13,19 @@ class ChatMessageHistory(BaseChatMessageHistory, BaseModel):
messages: List[BaseMessage] = Field(default_factory=list)
async def aget_messages(self) -> List[BaseMessage]:
return self.messages
def add_message(self, message: BaseMessage) -> None:
"""Add a self-created message to the store"""
self.messages.append(message)
async def aadd_messages(self, messages: Sequence[BaseMessage]) -> None:
"""Add messages to the store"""
self.add_messages(messages)
def clear(self) -> None:
self.messages = []
async def aclear(self) -> None:
self.clear()
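A tiny usage sketch for the in-memory `ChatMessageHistory` extended above with async counterparts; the import path assumes the `langchain_community.chat_message_histories` re-export.

```python
# Sketch: basic in-memory chat history usage.
from langchain_community.chat_message_histories import ChatMessageHistory

history = ChatMessageHistory()
history.add_user_message("hi!")
history.add_ai_message("Hello! How can I help?")
print([message.content for message in history.messages])
history.clear()
```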

View File

@@ -39,7 +39,7 @@ class BaseMessageConverter(ABC):
raise NotImplementedError
def create_message_model(table_name, DynamicBase): # type: ignore
def create_message_model(table_name: str, DynamicBase: Any) -> Any:
"""
Create a message model for a given table name.
@@ -52,8 +52,8 @@ def create_message_model(table_name, DynamicBase): # type: ignore
"""
# Model decleared inside a function to have a dynamic table name
class Message(DynamicBase):
# Model declared inside a function to have a dynamic table name.
class Message(DynamicBase): # type: ignore[valid-type, misc]
__tablename__ = table_name
id = Column(Integer, primary_key=True)
session_id = Column(Text)

View File

@@ -40,7 +40,7 @@ class TiDBChatMessageHistory(BaseChatMessageHistory):
self.session_id = session_id
self.table_name = table_name
self.earliest_time = earliest_time
self.cache = []
self.cache: List = []
# Set up SQLAlchemy engine and session
self.engine = create_engine(connection_string)
@@ -102,7 +102,7 @@ class TiDBChatMessageHistory(BaseChatMessageHistory):
logger.error(f"Error loading messages to cache: {e}")
@property
def messages(self) -> List[BaseMessage]:
def messages(self) -> List[BaseMessage]: # type: ignore[override]
"""returns all messages"""
if len(self.cache) == 0:
self.reload_cache()

View File

@@ -149,7 +149,7 @@ class ZepChatMessageHistory(BaseChatMessageHistory):
return None
return zep_memory
def add_user_message(
def add_user_message( # type: ignore[override]
self, message: str, metadata: Optional[Dict[str, Any]] = None
) -> None:
"""Convenience method for adding a human message string to the store.
@@ -160,7 +160,7 @@ class ZepChatMessageHistory(BaseChatMessageHistory):
"""
self.add_message(HumanMessage(content=message), metadata=metadata)
def add_ai_message(
def add_ai_message( # type: ignore[override]
self, message: str, metadata: Optional[Dict[str, Any]] = None
) -> None:
"""Convenience method for adding an AI message string to the store.

View File

@@ -20,7 +20,7 @@ from langchain_community.llms.azureml_endpoint import (
class LlamaContentFormatter(ContentFormatterBase):
def __init__(self):
def __init__(self) -> None:
raise TypeError(
"`LlamaContentFormatter` is deprecated for chat models. Use "
"`LlamaChatContentFormatter` instead."
@@ -72,12 +72,12 @@ class LlamaChatContentFormatter(ContentFormatterBase):
def supported_api_types(self) -> List[AzureMLEndpointApiType]:
return [AzureMLEndpointApiType.realtime, AzureMLEndpointApiType.serverless]
def format_request_payload(
def format_messages_request_payload(
self,
messages: List[BaseMessage],
model_kwargs: Dict,
api_type: AzureMLEndpointApiType,
) -> str:
) -> bytes:
"""Formats the request according to the chosen api"""
chat_messages = [
LlamaChatContentFormatter._convert_message_to_dict(message)
@@ -101,7 +101,9 @@ class LlamaChatContentFormatter(ContentFormatterBase):
return str.encode(request_payload)
def format_response_payload(
self, output: bytes, api_type: AzureMLEndpointApiType
self,
output: bytes,
api_type: AzureMLEndpointApiType = AzureMLEndpointApiType.realtime,
) -> ChatGeneration:
"""Formats response"""
if api_type == AzureMLEndpointApiType.realtime:
@@ -187,7 +189,7 @@ class AzureMLChatOnlineEndpoint(BaseChatModel, AzureMLBaseEndpoint):
if stop:
_model_kwargs["stop"] = stop
request_payload = self.content_formatter.format_request_payload(
request_payload = self.content_formatter.format_messages_request_payload(
messages, _model_kwargs, self.endpoint_api_type
)
response_payload = self.http_client.call(

View File

@@ -327,7 +327,7 @@ class ChatDeepInfra(BaseChatModel):
if chunk:
yield ChatGenerationChunk(message=chunk, generation_info=None)
if run_manager:
run_manager.on_llm_new_token(chunk.content) # type: ignore[arg-type]
run_manager.on_llm_new_token(str(chunk.content))
async def _astream(
self,
@@ -349,7 +349,7 @@ class ChatDeepInfra(BaseChatModel):
if chunk:
yield ChatGenerationChunk(message=chunk, generation_info=None)
if run_manager:
await run_manager.on_llm_new_token(chunk.content) # type: ignore[arg-type]
await run_manager.on_llm_new_token(str(chunk.content))
async def _agenerate(
self,

View File

@@ -165,6 +165,12 @@ class ChatEdenAI(BaseChatModel):
"""Return type of chat model."""
return "edenai-chat"
@property
def _api_key(self) -> str:
if self.edenai_api_key:
return self.edenai_api_key.get_secret_value()
return ""
def _stream(
self,
messages: List[BaseMessage],
@@ -175,7 +181,7 @@ class ChatEdenAI(BaseChatModel):
"""Call out to EdenAI's chat endpoint."""
url = f"{self.edenai_api_url}/text/chat/stream"
headers = {
"Authorization": f"Bearer {self.edenai_api_key.get_secret_value()}",
"Authorization": f"Bearer {self._api_key}",
"User-Agent": self.get_user_agent(),
}
formatted_data = _format_edenai_messages(messages=messages)
@@ -216,7 +222,7 @@ class ChatEdenAI(BaseChatModel):
) -> AsyncIterator[ChatGenerationChunk]:
url = f"{self.edenai_api_url}/text/chat/stream"
headers = {
"Authorization": f"Bearer {self.edenai_api_key.get_secret_value()}",
"Authorization": f"Bearer {self._api_key}",
"User-Agent": self.get_user_agent(),
}
formatted_data = _format_edenai_messages(messages=messages)
@@ -265,7 +271,7 @@ class ChatEdenAI(BaseChatModel):
url = f"{self.edenai_api_url}/text/chat"
headers = {
"Authorization": f"Bearer {self.edenai_api_key.get_secret_value()}",
"Authorization": f"Bearer {self._api_key}",
"User-Agent": self.get_user_agent(),
}
formatted_data = _format_edenai_messages(messages=messages)
@@ -323,7 +329,7 @@ class ChatEdenAI(BaseChatModel):
url = f"{self.edenai_api_url}/text/chat"
headers = {
"Authorization": f"Bearer {self.edenai_api_key.get_secret_value()}",
"Authorization": f"Bearer {self._api_key}",
"User-Agent": self.get_user_agent(),
}
formatted_data = _format_edenai_messages(messages=messages)

View File
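
The ChatEdenAI hunks above replace direct edenai_api_key.get_secret_value() calls with a small _api_key property that tolerates a missing key. A rough sketch of that guard pattern; FakeSecret and ChatClientSketch are stand-ins invented for this example, not the real pydantic or LangChain classes:

from typing import Dict, Optional

class FakeSecret:
    """Stand-in for pydantic's SecretStr, only for this sketch."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

class ChatClientSketch:
    def __init__(self, api_key: Optional[FakeSecret] = None) -> None:
        self.api_key = api_key

    @property
    def _api_key(self) -> str:
        # Fall back to an empty string instead of raising when no key was configured.
        return self.api_key.get_secret_value() if self.api_key else ""

    def auth_headers(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self._api_key}"}

print(ChatClientSketch(FakeSecret("sk-test")).auth_headers())
print(ChatClientSketch().auth_headers())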

@@ -214,7 +214,7 @@ class ErnieBotChat(BaseChatModel):
generations = [
ChatGeneration(
message=AIMessage(
content=response.get("result"),
content=response.get("result", ""),
additional_kwargs={**additional_kwargs},
)
)

View File

@@ -14,6 +14,7 @@ from typing import (
Mapping,
Optional,
Tuple,
Type,
Union,
)
@@ -27,7 +28,7 @@ from langchain_core.language_models.chat_models import (
generate_from_stream,
)
from langchain_core.language_models.llms import create_base_retry_decorator
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.messages import AIMessageChunk, BaseMessage, BaseMessageChunk
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from langchain_core.pydantic_v1 import BaseModel, Field, SecretStr, root_validator
from langchain_core.utils import convert_to_secret_str, get_from_dict_or_env
@@ -57,8 +58,8 @@ class GPTRouterModel(BaseModel):
def get_ordered_generation_requests(
models_priority_list: List[GPTRouterModel], **kwargs
):
models_priority_list: List[GPTRouterModel], **kwargs: Any
) -> List:
"""
Return the body for the model router input.
"""
@@ -100,7 +101,7 @@ def completion_with_retry(
models_priority_list: List[GPTRouterModel],
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Union[GenerationResponse, Generator[ChunkedGenerationResponse]]:
) -> Union[GenerationResponse, Generator[ChunkedGenerationResponse, None, None]]:
"""Use tenacity to retry the completion call."""
retry_decorator = _create_retry_decorator(llm, run_manager=run_manager)
@@ -122,7 +123,7 @@ async def acompletion_with_retry(
models_priority_list: List[GPTRouterModel],
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Union[GenerationResponse, AsyncGenerator[ChunkedGenerationResponse]]:
) -> Union[GenerationResponse, AsyncGenerator[ChunkedGenerationResponse, None]]:
"""Use tenacity to retry the async completion call."""
retry_decorator = _create_retry_decorator(llm, run_manager=run_manager)
@@ -283,8 +284,8 @@ class GPTRouter(BaseChatModel):
return self._create_chat_result(response)
def _create_chat_generation_chunk(
self, data: Mapping[str, Any], default_chunk_class
):
self, data: Mapping[str, Any], default_chunk_class: Type[BaseMessageChunk]
) -> Tuple[ChatGenerationChunk, Type[BaseMessageChunk]]:
chunk = _convert_delta_to_message_chunk(
{"content": data.get("text", "")}, default_chunk_class
)
@@ -293,8 +294,8 @@ class GPTRouter(BaseChatModel):
dict(finish_reason=finish_reason) if finish_reason is not None else None
)
default_chunk_class = chunk.__class__
chunk = ChatGenerationChunk(message=chunk, generation_info=generation_info)
return chunk, default_chunk_class
gen_chunk = ChatGenerationChunk(message=chunk, generation_info=generation_info)
return gen_chunk, default_chunk_class
def _stream(
self,
@@ -306,7 +307,7 @@ class GPTRouter(BaseChatModel):
message_dicts, params = self._create_message_dicts(messages, stop)
params = {**params, **kwargs, "stream": True}
default_chunk_class = AIMessageChunk
default_chunk_class: Type[BaseMessageChunk] = AIMessageChunk
generator_response = completion_with_retry(
self,
messages=message_dicts,
@@ -339,7 +340,7 @@ class GPTRouter(BaseChatModel):
message_dicts, params = self._create_message_dicts(messages, stop)
params = {**params, **kwargs, "stream": True}
default_chunk_class = AIMessageChunk
default_chunk_class: Type[BaseMessageChunk] = AIMessageChunk
generator_response = acompletion_with_retry(
self,
messages=message_dicts,

View File

@@ -44,7 +44,7 @@ class ChatHuggingFace(BaseChatModel):
llm: Union[HuggingFaceTextGenInference, HuggingFaceEndpoint, HuggingFaceHub]
system_message: SystemMessage = SystemMessage(content=DEFAULT_SYSTEM_PROMPT)
tokenizer: Any = None
model_id: str = None # type: ignore
model_id: Optional[str] = None
def __init__(self, **kwargs: Any):
super().__init__(**kwargs)

View File

@@ -25,7 +25,7 @@ logger = logging.getLogger(__name__)
# Ignoring type because below is valid pydantic code
# Unexpected keyword argument "extra" for "__init_subclass__" of "object" [call-arg]
class ChatParams(BaseModel, extra=Extra.allow): # type: ignore[call-arg]
class ChatParams(BaseModel, extra=Extra.allow):
"""Parameters for the `Javelin AI Gateway` LLM."""
temperature: float = 0.0

View File

@@ -13,6 +13,7 @@ from typing import (
Set,
Tuple,
Union,
cast,
)
import requests
@@ -169,7 +170,9 @@ class ChatKonko(ChatOpenAI):
}
if openai_api_key:
headers["X-OpenAI-Api-Key"] = openai_api_key.get_secret_value()
headers["X-OpenAI-Api-Key"] = cast(
SecretStr, openai_api_key
).get_secret_value()
models_response = requests.get(models_url, headers=headers)

View File

@@ -25,7 +25,7 @@ logger = logging.getLogger(__name__)
# Ignoring type because below is valid pydantic code
# Unexpected keyword argument "extra" for "__init_subclass__" of "object" [call-arg]
class ChatParams(BaseModel, extra=Extra.allow): # type: ignore[call-arg]
class ChatParams(BaseModel, extra=Extra.allow):
"""Parameters for the `MLflow AI Gateway` LLM."""
temperature: float = 0.0

View File

@@ -1,5 +1,5 @@
import json
from typing import Any, AsyncIterator, Dict, Iterator, List, Optional, Union
from typing import Any, AsyncIterator, Dict, Iterator, List, Optional, Union, cast
from langchain_core._api import deprecated
from langchain_core.callbacks import (
@@ -74,10 +74,15 @@ class ChatOllama(BaseChatModel, _OllamaCommon):
if isinstance(message, ChatMessage):
message_text = f"\n\n{message.role.capitalize()}: {message.content}"
elif isinstance(message, HumanMessage):
if message.content[0].get("type") == "text":
message_text = f"[INST] {message.content[0]['text']} [/INST]"
elif message.content[0].get("type") == "image_url":
message_text = message.content[0]["image_url"]["url"]
if isinstance(message.content, List):
first_content = cast(List[Dict], message.content)[0]
content_type = first_content.get("type")
if content_type == "text":
message_text = f"[INST] {first_content['text']} [/INST]"
elif content_type == "image_url":
message_text = first_content["image_url"]["url"]
else:
message_text = f"[INST] {message.content} [/INST]"
elif isinstance(message, AIMessage):
message_text = f"{message.content}"
elif isinstance(message, SystemMessage):
@@ -94,7 +99,7 @@ class ChatOllama(BaseChatModel, _OllamaCommon):
def _convert_messages_to_ollama_messages(
self, messages: List[BaseMessage]
) -> List[Dict[str, Union[str, List[str]]]]:
ollama_messages = []
ollama_messages: List = []
for message in messages:
role = ""
if isinstance(message, HumanMessage):
@@ -111,7 +116,7 @@ class ChatOllama(BaseChatModel, _OllamaCommon):
if isinstance(message.content, str):
content = message.content
else:
for content_part in message.content:
for content_part in cast(List[Dict], message.content):
if content_part.get("type") == "text":
content += f"\n{content_part['text']}"
elif content_part.get("type") == "image_url":
@@ -324,21 +329,15 @@ class ChatOllama(BaseChatModel, _OllamaCommon):
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> AsyncIterator[ChatGenerationChunk]:
try:
async for stream_resp in self._acreate_chat_stream(
messages, stop, **kwargs
):
if stream_resp:
chunk = _chat_stream_response_to_chat_generation_chunk(stream_resp)
yield chunk
if run_manager:
await run_manager.on_llm_new_token(
chunk.text,
verbose=self.verbose,
)
except OllamaEndpointNotFoundError:
async for chunk in self._legacy_astream(messages, stop, **kwargs):
async for stream_resp in self._acreate_chat_stream(messages, stop, **kwargs):
if stream_resp:
chunk = _chat_stream_response_to_chat_generation_chunk(stream_resp)
yield chunk
if run_manager:
await run_manager.on_llm_new_token(
chunk.text,
verbose=self.verbose,
)
@deprecated("0.0.3", alternative="_stream")
def _legacy_stream(

View File
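
The ChatOllama hunk above narrows message.content with cast(List[Dict], ...) before inspecting the first part's "type". A stripped-down sketch of dispatching on content that may be a plain string or a list of typed parts; the function name is invented, and the part shapes mirror the ones in the hunk:

from typing import Dict, List, Union

def format_human_content(content: Union[str, List[Dict]]) -> str:
    if isinstance(content, list):
        first_content = content[0]
        content_type = first_content.get("type")
        if content_type == "text":
            return f"[INST] {first_content['text']} [/INST]"
        if content_type == "image_url":
            return first_content["image_url"]["url"]
        raise ValueError(f"Unsupported content part type: {content_type}")
    return f"[INST] {content} [/INST]"

print(format_human_content("hello"))
print(format_human_content([{"type": "text", "text": "hello"}]))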

@@ -554,7 +554,7 @@ class ChatOpenAI(BaseChatModel):
if self.openai_proxy:
import openai
openai.proxy = {"http": self.openai_proxy, "https": self.openai_proxy} # type: ignore[assignment] # noqa: E501
openai.proxy = {"http": self.openai_proxy, "https": self.openai_proxy}
return {**self._default_params, **openai_creds}
def _get_invocation_params(

View File

@@ -13,6 +13,7 @@ from typing import (
Mapping,
Optional,
Union,
cast,
)
from langchain_core.callbacks import (
@@ -197,7 +198,7 @@ class ChatTongyi(BaseChatModel):
return {
"model": self.model_name,
"top_p": self.top_p,
"api_key": self.dashscope_api_key.get_secret_value(),
"api_key": cast(SecretStr, self.dashscope_api_key).get_secret_value(),
"result_format": "message",
**self.model_kwargs,
}

View File

@@ -120,11 +120,10 @@ def _parse_chat_history_gemini(
image = load_image_from_gcs(path=path, project=project)
elif path.startswith("data:image/"):
# extract base64 component from image uri
try:
encoded = re.search(r"data:image/\w{2,4};base64,(.*)", path).group(
1
)
except AttributeError:
encoded: Any = re.search(r"data:image/\w{2,4};base64,(.*)", path)
if encoded:
encoded = encoded.group(1)
else:
raise ValueError(
"Invalid image uri. It should be in the format "
"data:image/<image_type>;base64,<base64_encoded_image>."

View File
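
The Vertex AI hunk above swaps a try/except around re.search(...).group(1) for an explicit None check when pulling the base64 payload out of a data:image/... URI. A small standalone illustration of that check; the function name and sample URI are made up, and the error message is the one from the hunk:

import re

def extract_base64_payload(path: str) -> str:
    # Check the match object explicitly instead of catching AttributeError.
    match = re.search(r"data:image/\w{2,4};base64,(.*)", path)
    if match:
        return match.group(1)
    raise ValueError(
        "Invalid image uri. It should be in the format "
        "data:image/<image_type>;base64,<base64_encoded_image>."
    )

print(extract_base64_payload("data:image/png;base64,aGVsbG8="))  # -> aGVsbG8=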

@@ -210,7 +210,8 @@ async def _amake_request(self: ChatYandexGPT, messages: List[BaseMessage]) -> st
await asyncio.sleep(1)
operation_request = GetOperationRequest(operation_id=operation.id)
operation = await operation_stub.Get(
operation_request, metadata=self._grpc_metadata
operation_request,
metadata=self._grpc_metadata,
)
completion_response = CompletionResponse()

View File

@@ -5,7 +5,7 @@ import asyncio
import json
import logging
from functools import partial
from typing import Any, Dict, Iterator, List, Optional
from typing import Any, Dict, Iterator, List, Optional, cast
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.chat_models import (
@@ -161,7 +161,7 @@ class ChatZhipuAI(BaseChatModel):
return attributes
def __init__(self, *args, **kwargs):
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
try:
import zhipuai
@@ -174,7 +174,7 @@ class ChatZhipuAI(BaseChatModel):
"Please install it via 'pip install zhipuai'"
)
def invoke(self, prompt):
def invoke(self, prompt: Any) -> Any: # type: ignore[override]
if self.model == "chatglm_turbo":
return self.zhipuai.model_api.invoke(
model=self.model,
@@ -185,17 +185,17 @@ class ChatZhipuAI(BaseChatModel):
return_type=self.return_type,
)
elif self.model == "characterglm":
meta = self.meta.dict()
_meta = cast(meta, self.meta).dict()
return self.zhipuai.model_api.invoke(
model=self.model,
meta=meta,
meta=_meta,
prompt=prompt,
request_id=self.request_id,
return_type=self.return_type,
)
return None
def sse_invoke(self, prompt):
def sse_invoke(self, prompt: Any) -> Any:
if self.model == "chatglm_turbo":
return self.zhipuai.model_api.sse_invoke(
model=self.model,
@@ -207,18 +207,18 @@ class ChatZhipuAI(BaseChatModel):
incremental=self.incremental,
)
elif self.model == "characterglm":
meta = self.meta.dict()
_meta = cast(meta, self.meta).dict()
return self.zhipuai.model_api.sse_invoke(
model=self.model,
prompt=prompt,
meta=meta,
meta=_meta,
request_id=self.request_id,
return_type=self.return_type,
incremental=self.incremental,
)
return None
async def async_invoke(self, prompt):
async def async_invoke(self, prompt: Any) -> Any:
loop = asyncio.get_running_loop()
partial_func = partial(
self.zhipuai.model_api.async_invoke, model=self.model, prompt=prompt
@@ -229,7 +229,7 @@ class ChatZhipuAI(BaseChatModel):
)
return response
async def async_invoke_result(self, task_id):
async def async_invoke_result(self, task_id: Any) -> Any:
loop = asyncio.get_running_loop()
response = await loop.run_in_executor(
None,
@@ -247,7 +247,7 @@ class ChatZhipuAI(BaseChatModel):
**kwargs: Any,
) -> ChatResult:
"""Generate a chat response."""
prompt = []
prompt: List = []
for message in messages:
if isinstance(message, AIMessage):
role = "assistant"
@@ -270,11 +270,14 @@ class ChatZhipuAI(BaseChatModel):
else:
stream_iter = self._stream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
prompt=prompt,
stop=stop,
run_manager=run_manager,
**kwargs,
)
return generate_from_stream(stream_iter)
async def _agenerate(
async def _agenerate( # type: ignore[override]
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
@@ -307,7 +310,7 @@ class ChatZhipuAI(BaseChatModel):
generations=[ChatGeneration(message=AIMessage(content=content))]
)
def _stream(
def _stream( # type: ignore[override]
self,
prompt: List[Dict[str, str]],
stop: Optional[List[str]] = None,

View File

@@ -123,7 +123,7 @@ class AssemblyAIAudioLoaderById(BaseLoader):
"""
def __init__(self, transcript_id, api_key, transcript_format):
def __init__(self, transcript_id, api_key, transcript_format): # type: ignore[no-untyped-def]
"""
Initializes the AssemblyAI AssemblyAIAudioLoaderById.

View File

@@ -16,8 +16,10 @@ from typing import (
)
from langchain_core.documents import Document
from langchain_core.runnables import run_in_executor
from langchain_community.document_loaders.base import BaseLoader
from langchain_community.utilities.astradb import AstraDBEnvironment
if TYPE_CHECKING:
from astrapy.db import AstraDB, AsyncAstraDB
@@ -42,21 +44,15 @@ class AstraDBLoader(BaseLoader):
nb_prefetched: int = 1000,
extraction_function: Callable[[Dict], str] = json.dumps,
) -> None:
try:
from astrapy.db import AstraDB
except (ImportError, ModuleNotFoundError):
raise ImportError(
"Could not import a recent astrapy python package. "
"Please install it with `pip install --upgrade astrapy`."
)
# Conflicting-arg checks:
if astra_db_client is not None or async_astra_db_client is not None:
if token is not None or api_endpoint is not None:
raise ValueError(
"You cannot pass 'astra_db_client' or 'async_astra_db_client' to "
"AstraDB if passing 'token' and 'api_endpoint'."
)
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
async_astra_db_client=async_astra_db_client,
namespace=namespace,
)
self.astra_env = astra_env
self.collection = astra_env.astra_db.collection(collection_name)
self.collection_name = collection_name
self.filter = filter_criteria
self.projection = projection
@@ -64,48 +60,12 @@ class AstraDBLoader(BaseLoader):
self.nb_prefetched = nb_prefetched
self.extraction_function = extraction_function
astra_db = astra_db_client
async_astra_db = async_astra_db_client
if token and api_endpoint:
astra_db = AstraDB(
token=token,
api_endpoint=api_endpoint,
namespace=namespace,
)
try:
from astrapy.db import AsyncAstraDB
async_astra_db = AsyncAstraDB(
token=token,
api_endpoint=api_endpoint,
namespace=namespace,
)
except (ImportError, ModuleNotFoundError):
pass
if not astra_db and not async_astra_db:
raise ValueError(
"Must provide 'astra_db_client' or 'async_astra_db_client' or 'token' "
"and 'api_endpoint'"
)
self.collection = astra_db.collection(collection_name) if astra_db else None
if async_astra_db:
from astrapy.db import AsyncAstraDBCollection
self.async_collection = AsyncAstraDBCollection(
astra_db=async_astra_db, collection_name=collection_name
)
else:
self.async_collection = None
def load(self) -> List[Document]:
"""Eagerly load the content."""
return list(self.lazy_load())
def lazy_load(self) -> Iterator[Document]:
if not self.collection:
raise ValueError("Missing AstraDB client")
queue = Queue(self.nb_prefetched)
queue = Queue(self.nb_prefetched) # type: ignore[var-annotated]
t = threading.Thread(target=self.fetch_results, args=(queue,))
t.start()
while True:
@@ -120,9 +80,27 @@ class AstraDBLoader(BaseLoader):
return [doc async for doc in self.alazy_load()]
async def alazy_load(self) -> AsyncIterator[Document]:
if not self.async_collection:
raise ValueError("Missing AsyncAstraDB client")
async for doc in self.async_collection.paginated_find(
if not self.astra_env.async_astra_db:
iterator = run_in_executor(
None,
self.collection.paginated_find,
filter=self.filter,
options=self.find_options,
projection=self.projection,
sort=None,
prefetched=True,
)
done = object()
while True:
item = await run_in_executor(None, lambda it: next(it, done), iterator)
if item is done:
break
yield item # type: ignore[misc]
return
async_collection = await self.astra_env.async_astra_db.collection(
self.collection_name
)
async for doc in async_collection.paginated_find(
filter=self.filter,
options=self.find_options,
projection=self.projection,
@@ -132,19 +110,19 @@ class AstraDBLoader(BaseLoader):
yield Document(
page_content=self.extraction_function(doc),
metadata={
"namespace": self.async_collection.astra_db.namespace,
"api_endpoint": self.async_collection.astra_db.base_url,
"namespace": async_collection.astra_db.namespace,
"api_endpoint": async_collection.astra_db.base_url,
"collection": self.collection_name,
},
)
def fetch_results(self, queue: Queue):
def fetch_results(self, queue: Queue): # type: ignore[no-untyped-def]
self.fetch_page_result(queue)
while self.find_options.get("pageState"):
self.fetch_page_result(queue)
queue.put(None)
def fetch_page_result(self, queue: Queue):
def fetch_page_result(self, queue: Queue): # type: ignore[no-untyped-def]
res = self.collection.find(
filter=self.filter,
options=self.find_options,

View File

@@ -64,10 +64,10 @@ class BaseLoader(ABC):
iterator = await run_in_executor(None, self.lazy_load)
done = object()
while True:
doc = await run_in_executor(None, next, iterator, done)
doc = await run_in_executor(None, next, iterator, done) # type: ignore[call-arg, arg-type]
if doc is done:
break
yield doc
yield doc # type: ignore[misc]
class BaseBlobParser(ABC):

View File
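
The BaseLoader hunk above (like the AstraDBLoader hunk before it) drives a synchronous iterator from async code by running next() in an executor and using a unique sentinel object to detect exhaustion. A generic sketch of that pattern with plain asyncio rather than the LangChain run_in_executor helper; aiterate is an invented name:

import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")

async def aiterate(iterator: Iterator[T]) -> AsyncIterator[T]:
    loop = asyncio.get_running_loop()
    done = object()  # unique sentinel that can never be a real item
    while True:
        # next(iterator, done) returns the sentinel instead of raising StopIteration.
        item = await loop.run_in_executor(None, next, iterator, done)
        if item is done:
            break
        yield item  # type: ignore[misc]

async def _demo() -> None:
    async for value in aiterate(iter(range(3))):
        print(value)

asyncio.run(_demo())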

@@ -33,14 +33,14 @@ class CassandraLoader(BaseLoader):
page_content_mapper: Callable[[Any], str] = str,
metadata_mapper: Callable[[Any], dict] = lambda _: {},
*,
query_parameters: Union[dict, Sequence] = None,
query_timeout: Optional[float] = _NOT_SET,
query_parameters: Union[dict, Sequence] = None, # type: ignore[assignment]
query_timeout: Optional[float] = _NOT_SET, # type: ignore[assignment]
query_trace: bool = False,
query_custom_payload: dict = None,
query_custom_payload: dict = None, # type: ignore[assignment]
query_execution_profile: Any = _NOT_SET,
query_paging_state: Any = None,
query_host: Host = None,
query_execute_as: str = None,
query_execute_as: str = None, # type: ignore[assignment]
) -> None:
"""
Document Loader for Apache Cassandra.
@@ -85,7 +85,7 @@ class CassandraLoader(BaseLoader):
self.query = f"SELECT * FROM {_keyspace}.{table};"
self.metadata = {"table": table, "keyspace": _keyspace}
else:
self.query = query
self.query = query # type: ignore[assignment]
self.metadata = {}
self.session = session or check_resolve_session(session)

View File

@@ -27,7 +27,7 @@ class UnstructuredCHMLoader(UnstructuredFileLoader):
def _get_elements(self) -> List:
from unstructured.partition.html import partition_html
with CHMParser(self.file_path) as f:
with CHMParser(self.file_path) as f: # type: ignore[arg-type]
return [
partition_html(text=item["content"], **self.unstructured_kwargs)
for item in f.load_all()
@@ -45,10 +45,10 @@ class CHMParser(object):
self.file = chm.CHMFile()
self.file.LoadCHM(path)
def __enter__(self):
def __enter__(self): # type: ignore[no-untyped-def]
return self
def __exit__(self, exc_type, exc_value, traceback):
def __exit__(self, exc_type, exc_value, traceback): # type: ignore[no-untyped-def]
if self.file:
self.file.CloseCHM()

View File

@@ -89,4 +89,4 @@ class AzureAIDocumentIntelligenceLoader(BaseLoader):
blob = Blob.from_path(self.file_path)
yield from self.parser.parse(blob)
else:
yield from self.parser.parse_url(self.url_path)
yield from self.parser.parse_url(self.url_path) # type: ignore[arg-type]

View File

@@ -60,7 +60,7 @@ class MWDumpLoader(BaseLoader):
self.skip_redirects = skip_redirects
self.stop_on_error = stop_on_error
def _load_dump_file(self):
def _load_dump_file(self): # type: ignore[no-untyped-def]
try:
import mwxml
except ImportError as e:
@@ -70,7 +70,7 @@ class MWDumpLoader(BaseLoader):
return mwxml.Dump.from_file(open(self.file_path, encoding=self.encoding))
def _load_single_page_from_dump(self, page) -> Document:
def _load_single_page_from_dump(self, page) -> Document: # type: ignore[no-untyped-def, return]
"""Parse a single page."""
try:
import mwparserfromhell

View File

@@ -11,7 +11,7 @@ from langchain_community.document_loaders.blob_loaders import Blob
class VsdxParser(BaseBlobParser, ABC):
def parse(self, blob: Blob) -> Iterator[Document]:
def parse(self, blob: Blob) -> Iterator[Document]: # type: ignore[override]
"""Parse a vsdx file."""
return self.lazy_parse(blob)
@@ -21,7 +21,7 @@ class VsdxParser(BaseBlobParser, ABC):
with blob.as_bytes_io() as pdf_file_obj:
with zipfile.ZipFile(pdf_file_obj, "r") as zfile:
pages = self.get_pages_content(zfile, blob.source)
pages = self.get_pages_content(zfile, blob.source) # type: ignore[arg-type]
yield from [
Document(
@@ -60,13 +60,13 @@ class VsdxParser(BaseBlobParser, ABC):
if "visio/pages/pages.xml" not in zfile.namelist():
print("WARNING - No pages.xml file found in {}".format(source))
return
return # type: ignore[return-value]
if "visio/pages/_rels/pages.xml.rels" not in zfile.namelist():
print("WARNING - No pages.xml.rels file found in {}".format(source))
return
return # type: ignore[return-value]
if "docProps/app.xml" not in zfile.namelist():
print("WARNING - No app.xml file found in {}".format(source))
return
return # type: ignore[return-value]
pagesxml_content: dict = xmltodict.parse(zfile.read("visio/pages/pages.xml"))
appxml_content: dict = xmltodict.parse(zfile.read("docProps/app.xml"))
@@ -79,7 +79,7 @@ class VsdxParser(BaseBlobParser, ABC):
rel["@Name"].strip() for rel in pagesxml_content["Pages"]["Page"]
]
else:
disordered_names: List[str] = [
disordered_names: List[str] = [ # type: ignore[no-redef]
pagesxml_content["Pages"]["Page"]["@Name"].strip()
]
if isinstance(pagesxmlrels_content["Relationships"]["Relationship"], list):
@@ -88,7 +88,7 @@ class VsdxParser(BaseBlobParser, ABC):
for rel in pagesxmlrels_content["Relationships"]["Relationship"]
]
else:
disordered_paths: List[str] = [
disordered_paths: List[str] = [ # type: ignore[no-redef]
"visio/pages/"
+ pagesxmlrels_content["Relationships"]["Relationship"]["@Target"]
]

View File

@@ -89,7 +89,7 @@ class BaichuanTextEmbeddings(BaseModel, Embeddings):
print(f"Exception occurred while trying to get embeddings: {str(e)}")
return None
def embed_documents(self, texts: List[str]) -> Optional[List[List[float]]]:
def embed_documents(self, texts: List[str]) -> Optional[List[List[float]]]: # type: ignore[override]
"""Public method to get embeddings for a list of documents.
Args:
@@ -100,7 +100,7 @@ class BaichuanTextEmbeddings(BaseModel, Embeddings):
"""
return self._embed(texts)
def embed_query(self, text: str) -> Optional[List[float]]:
def embed_query(self, text: str) -> Optional[List[float]]: # type: ignore[override]
"""Public method to get embedding for a single query text.
Args:

View File

@@ -56,7 +56,7 @@ class EdenAiEmbeddings(BaseModel, Embeddings):
headers = {
"accept": "application/json",
"content-type": "application/json",
"authorization": f"Bearer {self.edenai_api_key.get_secret_value()}",
"authorization": f"Bearer {self.edenai_api_key.get_secret_value()}", # type: ignore[union-attr]
"User-Agent": self.get_user_agent(),
}

View File

@@ -85,7 +85,7 @@ class EmbaasEmbeddings(BaseModel, Embeddings):
def _handle_request(self, payload: EmbaasEmbeddingsPayload) -> List[List[float]]:
"""Sends a request to the Embaas API and handles the response."""
headers = {
"Authorization": f"Bearer {self.embaas_api_key.get_secret_value()}",
"Authorization": f"Bearer {self.embaas_api_key.get_secret_value()}", # type: ignore[union-attr]
"Content-Type": "application/json",
}

View File

@@ -162,5 +162,5 @@ class TinyAsyncGradientEmbeddingClient: #: :meta private:
It might be entirely removed in the future.
"""
def __init__(self, *args, **kwargs) -> None:
def __init__(self, *args, **kwargs) -> None: # type: ignore[no-untyped-def]
raise ValueError("Deprecated,TinyAsyncGradientEmbeddingClient was removed.")

View File

@@ -49,6 +49,8 @@ class HuggingFaceEmbeddings(BaseModel, Embeddings):
"""Keyword arguments to pass when calling the `encode` method of the model."""
multi_process: bool = False
"""Run encode() on multiple GPUs."""
show_progress: bool = False
"""Whether to show a progress bar."""
def __init__(self, **kwargs: Any):
"""Initialize the sentence_transformer."""
@@ -88,7 +90,9 @@ class HuggingFaceEmbeddings(BaseModel, Embeddings):
embeddings = self.client.encode_multi_process(texts, pool)
sentence_transformers.SentenceTransformer.stop_multi_process_pool(pool)
else:
embeddings = self.client.encode(texts, **self.encode_kwargs)
embeddings = self.client.encode(
texts, show_progress_bar=self.show_progress, **self.encode_kwargs
)
return embeddings.tolist()

View File
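
The HuggingFaceEmbeddings hunk above introduces a show_progress field that is forwarded to encode(show_progress_bar=...); only the single-process branch passes the flag. Assuming the field lands as shown, usage would presumably look like the sketch below (requires sentence-transformers; the model name is an arbitrary example):

from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # arbitrary example model
    show_progress=True,  # new flag; not used by the multi_process branch
)
vectors = embeddings.embed_documents(["first document", "second document"])
print(len(vectors), len(vectors[0]))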

@@ -56,7 +56,7 @@ class LLMRailsEmbeddings(BaseModel, Embeddings):
"""
response = requests.post(
"https://api.llmrails.com/v1/embeddings",
headers={"X-API-KEY": self.api_key.get_secret_value()},
headers={"X-API-KEY": self.api_key.get_secret_value()}, # type: ignore[union-attr]
json={"input": texts, "model": self.model},
timeout=60,
)

View File

@@ -110,7 +110,7 @@ class MiniMaxEmbeddings(BaseModel, Embeddings):
# HTTP headers for authorization
headers = {
"Authorization": f"Bearer {self.minimax_api_key.get_secret_value()}",
"Authorization": f"Bearer {self.minimax_api_key.get_secret_value()}", # type: ignore[union-attr]
"Content-Type": "application/json",
}

View File

@@ -71,7 +71,8 @@ class MlflowEmbeddings(Embeddings, BaseModel):
embeddings: List[List[float]] = []
for txt in _chunk(texts, 20):
resp = self._client.predict(
endpoint=self.endpoint, inputs={"input": txt, **params}
endpoint=self.endpoint,
inputs={"input": txt, **params}, # type: ignore[arg-type]
)
embeddings.extend(r["embedding"] for r in resp["data"])
return embeddings

View File

@@ -63,16 +63,16 @@ class OCIGenAIEmbeddings(BaseModel, Embeddings):
If not specified , DEFAULT will be used
"""
model_id: str = None
model_id: str = None # type: ignore[assignment]
"""Id of the model to call, e.g., cohere.embed-english-light-v2.0"""
model_kwargs: Optional[Dict] = None
"""Keyword arguments to pass to the model"""
service_endpoint: str = None
service_endpoint: str = None # type: ignore[assignment]
"""service endpoint url"""
compartment_id: str = None
compartment_id: str = None # type: ignore[assignment]
"""OCID of compartment"""
truncate: Optional[str] = "END"
@@ -109,7 +109,7 @@ class OCIGenAIEmbeddings(BaseModel, Embeddings):
client_kwargs.pop("signer", None)
elif values["auth_type"] == OCIAuthType(2).name:
def make_security_token_signer(oci_config):
def make_security_token_signer(oci_config): # type: ignore[no-untyped-def]
pk = oci.signer.load_private_key_from_file(
oci_config.get("key_file"), None
)

View File

@@ -78,7 +78,7 @@ class SpacyEmbeddings(BaseModel, Embeddings):
Returns:
A list of embeddings, one for each document.
"""
return [self.nlp(text).vector.tolist() for text in texts]
return [self.nlp(text).vector.tolist() for text in texts] # type: ignore[misc]
def embed_query(self, text: str) -> List[float]:
"""
@@ -90,7 +90,7 @@ class SpacyEmbeddings(BaseModel, Embeddings):
Returns:
The embedding for the text.
"""
return self.nlp(text).vector.tolist()
return self.nlp(text).vector.tolist() # type: ignore[misc]
async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
"""

View File

@@ -42,10 +42,10 @@ class YandexGPTEmbeddings(BaseModel, Embeddings):
embeddings = YandexGPTEmbeddings(iam_token="t1.9eu...", model_uri="emb://<folder-id>/text-search-query/latest")
"""
iam_token: SecretStr = ""
iam_token: SecretStr = "" # type: ignore[assignment]
"""Yandex Cloud IAM token for service account
with the `ai.languageModels.user` role"""
api_key: SecretStr = ""
api_key: SecretStr = "" # type: ignore[assignment]
"""Yandex Cloud Api Key for service account
with the `ai.languageModels.user` role"""
model_uri: str = ""
@@ -146,7 +146,7 @@ def _embed_with_retry(llm: YandexGPTEmbeddings, **kwargs: Any) -> Any:
return _completion_with_retry(**kwargs)
def _make_request(self: YandexGPTEmbeddings, texts: List[str]):
def _make_request(self: YandexGPTEmbeddings, texts: List[str]): # type: ignore[no-untyped-def]
try:
import grpc
from yandex.cloud.ai.foundation_models.v1.foundation_models_service_pb2 import ( # noqa: E501
@@ -167,7 +167,7 @@ def _make_request(self: YandexGPTEmbeddings, texts: List[str]):
for text in texts:
request = TextEmbeddingRequest(model_uri=self.model_uri, text=text)
stub = EmbeddingsServiceStub(channel)
res = stub.TextEmbedding(request, metadata=self._grpc_metadata)
res = stub.TextEmbedding(request, metadata=self._grpc_metadata) # type: ignore[attr-defined]
result.append(list(res.embedding))
time.sleep(self.sleep_interval)

View File

@@ -0,0 +1,10 @@
"""Logic for selecting examples to include in prompts."""
from langchain_community.example_selectors.ngram_overlap import (
NGramOverlapExampleSelector,
ngram_overlap_score,
)
__all__ = [
"NGramOverlapExampleSelector",
"ngram_overlap_score",
]

Some files were not shown because too many files have changed in this diff.