## Description
In `langchain_prompty`, messages are templated by Prompty. However, a
call to `ChatPromptTemplate` was initiating a second templating. We now
convert parsed messages to `Message` objects before calling
`ChatPromptTemplate`, signifying clearly that they are already
templated.
We also revert #25739, which targeted this second templating step (now
avoided) and did not fix the original issue.
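A minimal sketch of why the second templating pass was harmful. It uses Python's `str.format` as a stand-in for a templating engine (an assumption; neither Prompty nor `ChatPromptTemplate` is used here): text that has already been rendered can contain braces from user input, and rendering it again misreads them as variables.

```python
# Hypothetical sketch: Python's str.format stands in for one templating pass.
def render(template: str, **variables: str) -> str:
    """One templating pass: substitute {name} placeholders."""
    return template.format(**variables)


def renders_again_without_error(text: str) -> bool:
    """Return False if a second pass over already-rendered text fails."""
    try:
        render(text)
    except KeyError:
        # A literal "{x}" that came from user input is mistaken for a variable.
        return False
    return True


# First (and only intended) pass, done by Prompty in the real integration.
rendered = render("Answer this: {question}", question="What does {x} mean?")

# Keeping the result as a finished message, rather than templating it again,
# preserves the user's braces verbatim.
final_message = {"role": "user", "content": rendered}
```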
## Issue
Closes #25703
Add array data type support when creating a Milvus vector store collection.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
Adds the 'score' returned by Pinecone to the
`PineconeHybridSearchRetriever` list of returned Documents.
There is currently no way to return the score when using Pinecone hybrid
search, so in this PR I include it by default.
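A minimal sketch of carrying the score through to the returned Documents. It assumes each query match is a dict with `score` and `metadata` keys, that the text is stored under a `context` metadata key, and that the score lands in `metadata["score"]`; the `Document` dataclass and `matches_to_documents` helper are hypothetical stand-ins, not the retriever's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for langchain_core's Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def matches_to_documents(
    matches: List[Dict[str, Any]], text_key: str = "context"
) -> List[Document]:
    """Turn query matches into Documents, keeping the returned score."""
    documents = []
    for match in matches:
        metadata = dict(match.get("metadata", {}))  # copy; don't mutate the response
        text = metadata.pop(text_key, "")
        metadata["score"] = match["score"]  # assumed key name for the score
        documents.append(Document(page_content=text, metadata=metadata))
    return documents
```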
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
### Description
Adds an init method to `ChatDeepInfra` that sets the `model_name`
attribute according to the argument.
### Issue
Currently, the `model_name` specified by the user when initializing the
`ChatDeepInfra` class is never set, so the default model is always used
(meta-llama/Llama-2-70b-chat-hf; since that model is probably deprecated,
meta-llama/Llama-3-70b-Instruct appears to be used instead). We
stumbled across this issue and fixed it as proposed in this pull
request. Feel free to change the fix according to your coding guidelines
and style; this is just a proposal, and we want to draw attention to this
problem.
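A minimal sketch of the idea, using a hypothetical `ChatModelSketch` class rather than the real `ChatDeepInfra` (whose fields and validation differ): the constructor honours a user-supplied `model` (or `model_name`) instead of silently keeping the class default.

```python
from typing import Any, Optional


class ChatModelSketch:
    """Hypothetical stand-in for ChatDeepInfra (not the real class)."""

    model_name: str = "meta-llama/Llama-2-70b-chat-hf"  # class-level default

    def __init__(self, model: Optional[str] = None, **kwargs: Any) -> None:
        # Accept either `model` or `model_name`; before the fix the user's
        # choice was dropped and the class default above was always used.
        chosen = model if model is not None else kwargs.pop("model_name", None)
        if chosen is not None:
            self.model_name = chosen
        self.extra_kwargs = kwargs
```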
### Dependencies
no additional dependencies required
Feel free to contact me or @timo282 and @finitearth if you have any
questions.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Make each hyperlink appear only once in the
`extract_hyperlinks` tool output. (For some websites, the output contains
meaningless `#` hyperlinks many times, which consumes context-window
tokens without any benefit.)
**Issue:** None
**Dependencies:** None
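A minimal sketch of order-preserving deduplication with a hypothetical `unique_links` helper; the tool itself may deduplicate differently (for example with a `set`, which does not keep order).

```python
from typing import List


def unique_links(links: List[str]) -> List[str]:
    """Drop repeated hyperlinks while keeping first-seen order."""
    # dict keys are unique and (since Python 3.7) keep insertion order.
    return list(dict.fromkeys(links))


links = ["#", "/docs", "#", "/blog", "/docs", "#"]
print(unique_links(links))  # ['#', '/docs', '/blog']
```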
- [x] **PR title**: "langchain: Chains: query_constructor: add date time
parser"
- [x] **PR message**:
- **Description:** add date time parser to langchain Chains
query_constructor
- **Issue:** https://github.com/langchain-ai/langchain/issues/25526
Added Azure Search Access Token Authentication instead of API KEY auth.
Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263
Dependencies: None
Twitter: @levalencia
@baskaryan
Could you please review? This is the first time I have created a PR that fixes code.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This pull request introduces support for the AI21 tool-calling feature,
available in the Jamba-1.5 models. Tools are passed to the model through
the `tools` parameter, using definitions of the following form:
```python
from typing import Any, Dict, List, Literal

from typing_extensions import Required, TypedDict


# Defined innermost-first so every name exists before it is referenced.
class ToolParameters(TypedDict, total=False):
    type: Literal["object"]
    properties: Required[Dict[str, Any]]
    required: List[str]


class FunctionToolDefinition(TypedDict, total=False):
    name: Required[str]
    description: str
    parameters: ToolParameters


class ToolDefinition(TypedDict, total=False):
    type: Required[Literal["function"]]
    function: Required[FunctionToolDefinition]
```
When Jamba-1.5 determines that one of the provided tools needs to be
invoked, it responds with a list of tool calls structured as follows:
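```python
from typing import Literal

# AI21BaseModel is the AI21 SDK's base model class (import omitted here).


class ToolFunction(AI21BaseModel):
    name: str
    arguments: str  # JSON-encoded arguments


class ToolCall(AI21BaseModel):
    id: str
    function: ToolFunction
    type: Literal["function"] = "function"
```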
This pull request incorporates the necessary modifications to integrate
this functionality into the ai21-langchain library.
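A minimal sketch of mapping such tool calls onto LangChain-style tool-call dicts (`name`, `args`, `id`, `type`). It works on plain dicts shaped like the `ToolCall`/`ToolFunction` models rather than AI21 SDK objects, and `to_langchain_tool_calls` is a hypothetical helper, not the integration's actual code.

```python
import json
from typing import Any, Dict, List


def to_langchain_tool_calls(ai21_tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert AI21-style tool calls (plain dicts) into LangChain-style dicts."""
    converted = []
    for call in ai21_tool_calls:
        function = call["function"]
        converted.append(
            {
                "name": function["name"],
                # `arguments` is a JSON-encoded string in the ToolFunction model.
                "args": json.loads(function["arguments"]) if function["arguments"] else {},
                "id": call["id"],
                "type": "tool_call",
            }
        )
    return converted
```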
---------
Co-authored-by: asafg <asafg@ai21.com>
Co-authored-by: pazshalev <111360591+pazshalev@users.noreply.github.com>
Co-authored-by: Paz Shalev <pazs@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [x] **PR title**: "libs: langchain_milvus: add db name to milvus connection check"
- **Description:** add db name to milvus connection check
- **Issue:** https://github.com/langchain-ai/langchain/issues/25277
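A minimal sketch of why the database name belongs in the connection check, assuming the integration looks for an existing connection to reuse by comparing connection details. The `(alias, info)` pairs, the `"default"` fallback, and `find_reusable_alias` are hypothetical; the real code works with the client library's connection registry.

```python
from typing import Dict, List, Optional, Tuple

# (alias, connection info) pairs, shaped loosely like what a client library
# might report for its open connections (an assumption).
Connection = Tuple[str, Dict[str, str]]


def find_reusable_alias(
    connections: List[Connection], address: str, user: str, db_name: str
) -> Optional[str]:
    """Return an existing alias only if address, user AND database all match."""
    for alias, info in connections:
        if (
            info.get("address") == address
            and info.get("user") == user
            # Without this check, a connection to another database could be reused.
            and info.get("db_name", "default") == db_name
        ):
            return alias
    return None
```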
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This addresses the issue mentioned in #25702
I have updated the endpoint used in validating the endpoint API type in
the AzureMLBaseEndpoint class from `/v1/completions` to `/completions`
and `/v1/chat/completions` to `/chat/completions`.
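Matching on the path suffix, rather than on the exact `/v1/...` strings, accepts both URL forms. A minimal sketch with a hypothetical `infer_api_type` helper (the real validation in `AzureMLBaseEndpoint` may be structured differently); note that the chat suffix must be checked first, because `/chat/completions` also ends with `/completions`.

```python
def infer_api_type(endpoint_url: str) -> str:
    """Classify an endpoint URL by its path suffix (hypothetical helper)."""
    path = endpoint_url.rstrip("/")
    # Check the more specific suffix first: "/chat/completions" also ends
    # with "/completions".
    if path.endswith("/chat/completions"):
        return "chat"
    if path.endswith("/completions"):
        return "completions"
    return "unknown"
```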
- **Description:** Added a `template_format` parameter to
`create_chat_prompt` to allow `.prompty` files to handle variables in
different template formats.
- **Issue:** #25703
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Added the LangChain version to discover API calls
during both ingestion and retrieval
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs:** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **Description:** Update the source path and file path in the Pebblo
safe loader for SharePoint apps during loading
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs:** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **PR message**: **Fix URL construction in newer Python versions**
- **Description:**
- Update the URL construction logic to use the .value attribute for
Routes enum members.
- This adjustment resolves an issue where the code worked correctly in
Python 3.9 but failed in Python 3.11.
- Clean up unused routes.
- **Issue:** NA
- **Dependencies:** NA
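Our understanding (per the Python 3.11 changelog for `enum`) is that `Enum.__format__` now matches `Enum.__str__`, so for a mixed-in enum such as `class Routes(str, Enum)` an f-string renders the member name (`Routes.discover`) instead of its value. A minimal sketch with a hypothetical route and `build_url` helper, showing that `.value` gives the raw path on every version:

```python
from enum import Enum


class Routes(str, Enum):
    """Hypothetical routes; the member name and path are illustrative."""

    discover = "/v1/app/discover"


def build_url(base_url: str, route: Routes) -> str:
    # `.value` is always the raw path string. Formatting the member itself
    # (f"{base_url}{route}") depends on Enum.__format__, which changed in 3.11.
    return f"{base_url}{route.value}"
```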
* Removed `ruff check --select I` as `I` is already selected and checked
in the main `ruff check` command
* Added checks for non-empty `PYTHON_FILES`
* Run `ruff check` only on `PYTHON_FILES`
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Fix the validation error for `endpoint_url` in
`HuggingFaceEndpoint`. A detailed description of the problem is in the
issue I created.
- **Issue:** #24742
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
### Summary
Add `DatabricksVectorSearch` and `DatabricksEmbeddings` classes to the
`langchain-databricks` partner package. Core functionality is
unchanged, but the vector search class is largely refactored for
readability and maintainability.
This PR does not add integration tests yet. This will be added once the
Databricks test workspace is ready.
Tagging @efriis as POC
### Tracker
[✅] Create a package and migrate ChatDatabricks
[✍️] Migrate DatabricksVectorSearch, DatabricksEmbeddings, and their
docs
~[ ] Migrate UCFunctionToolkit and its doc~
[ ] Add provider document and update README.md
[ ] Add integration tests and set up secrets (after moving to an external
package)
[ ] Add deprecation note to the community implementations.
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [ ] **PR message**:
- **Description:** Make usage metadata handling compatible with other
LLMs (e.g. deepseek-chat, glm-4)
- **Issue:** N/A
- **Dependencies:** no new dependencies added
- [ ] **Add tests and docs**:
libs/partners/openai/tests/unit_tests/chat_models/test_base.py
```shell
cd libs/partners/openai
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_openai_astream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_openai_stream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_deepseek_astream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_deepseek_stream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_glm4_astream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_glm4_stream
```
---------
Co-authored-by: hyman <hyman@xiaozancloud.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [ ] **PR title**: "langchain-core: Fix type"
- The file to modify is located in
/libs/core/langchain_core/prompts/base.py
- [ ] **PR message**:
- **Description:** This changes the type of the inner input variable
from `dict` to `Any`. The change is required because the
`_validate_input` method expects a type that is not only a dictionary.
- **Dependencies:** There are no dependencies for this change
- [ ] **Add tests and docs**:
1. A test is not needed. This error occurs because I overrode a portion
of the `_validate_input` method, which causes `beartype` to raise an
error.
Be more explicit in the docs about creating an instance of
`UnstructuredClient` if you want to customize it, versus passing SDK
parameters to `UnstructuredLoader`.
Bump the unstructured-client dependency as discussed
[here](https://github.com/langchain-ai/langchain/discussions/25328#discussioncomment-10350949)
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR introduces adjustments to ensure compatibility with the recently
released preview version of [TiDB Serverless Vector
Search](https://tidb.cloud/ai), aiming to prevent user confusion.
- TiDB Vector now supports vector indexing with cosine and l2 distance
strategies, although inner_product remains unsupported.
- Changing the distance strategy is currently not supported, so the test
cases needed to be adjusted.
Updated `stop` and `request_timeout` so that they are aliased to
`stop_sequences` and `timeout`, respectively. Added a test checking that
both names continue to set the same underlying attributes.
Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)
Co-authored-by: ccurme <chester.curme@gmail.com>
Issue: the `service` optional parameter was mentioned but not used.
Fix: added this parameter.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
There is a bug in the concatenation of embeddings obtained from MLflow:
the result does not conform to the function's return type hint.
``` python
def _query(self, texts: List[str]) -> List[List[float]]:
```
It is logical to expect a **List[List[float]]** for a **List[str]**.
However, the `append` method wraps the response in an extra outer list.
To avoid this, the `extend` method should be used, which adds the
embeddings of all strings at the same list level.
## Testing
I have tried using OpenAI-ADA to obtain the embeddings, and the result
of executing this snippet is as follows:
``` python
embeds = await MlflowAIGatewayEmbeddings().aembed_documents(texts=["hi", "how are you?"])
print(embeds)
```
``` python
[[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]]
```
When in reality, the expected result should be:
``` python
[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]
```
The above result complies with the expected type hint:
**List[List[float]]** . As I mentioned, we can achieve that by using the
extend method instead of the append method.
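A minimal stdlib illustration of the difference, assuming each gateway request returns a list of embeddings (one per input string); the helpers are hypothetical and the vectors are shortened:

```python
from typing import List

Embedding = List[float]


def collect_with_append(batches: List[List[Embedding]]) -> list:
    result: list = []
    for batch in batches:
        result.append(batch)  # whole batch becomes ONE element: extra nesting level
    return result


def collect_with_extend(batches: List[List[Embedding]]) -> List[Embedding]:
    result: List[Embedding] = []
    for batch in batches:
        result.extend(batch)  # each embedding in the batch becomes its own element
    return result


# One request for ["hi", "how are you?"] returning two (shortened) embeddings.
batches = [[[-0.035, -0.021], [-0.021, -0.011]]]
```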
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Description: Simply pass kwargs to allow arguments like "where" to be
propagated
Issue: Previously, db.delete(where={}) wouldn't work for chroma
vectorstores
Dependencies: N/A
Twitter handle: N/A
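Chroma's `Collection.delete` accepts filters such as `where` in addition to `ids`, so the fix amounts to forwarding keyword arguments. A minimal sketch using hypothetical stand-in classes (`RecordingCollection`, `VectorStoreSketch`) instead of the real Chroma objects:

```python
from typing import Any, Dict, List, Optional


class RecordingCollection:
    """Hypothetical stand-in for a Chroma collection; records delete() calls."""

    def __init__(self) -> None:
        self.delete_calls: List[Dict[str, Any]] = []

    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
        self.delete_calls.append({"ids": ids, **kwargs})


class VectorStoreSketch:
    """Hypothetical wrapper showing the kwargs pass-through."""

    def __init__(self, collection: RecordingCollection) -> None:
        self._collection = collection

    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
        # Forward extra keyword arguments (e.g. `where`) instead of dropping them.
        self._collection.delete(ids=ids, **kwargs)
```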
Description: Send both the query and query_embedding to the Databricks
index for hybrid search.
Issue: When using hybrid search with self-managed (not Databricks-managed)
embeddings, we currently don't pass both the embedding and query_text to the index.
Hybrid search requires both of these. This change fixes this issue for
both `similarity_search` and `similarity_search_by_vector`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
# Issue
As of late July, Perplexity [no longer supports Llama 3
models](https://docs.perplexity.ai/changelog/introducing-new-and-improved-sonar-models).
# Description
This PR updates the default model and doc examples to reflect their
latest supported model. (Mostly updating the same places changed by
#23723.)
# Twitter handle
`@acompa_` on behalf of the team at Not Diamond. Check us out
[here](https://notdiamond.ai).
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Hello.
First of all, thank you for maintaining such a great project.
## Description
In https://github.com/langchain-ai/langchain/pull/25123, support for
structured_output is added. However, `"additionalProperties": false`
needs to be included at all levels when a nested object is generated.
error from current code:
https://gist.github.com/fufufukakaka/e9b475300e6934853d119428e390f204
```
BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'JokeWithEvaluation': In context=('properties', 'self_evaluation'), 'additionalProperties' is required to be supplied and to be false", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}
```
Reference: [Introducing Structured Outputs in the
API](https://openai.com/index/introducing-structured-outputs-in-the-api/)
```json
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor."
},
{
"role": "user",
"content": "solve 8x + 31 = 2"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "math_response",
"strict": true,
"schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"explanation": {
"type": "string"
},
"output": {
"type": "string"
}
},
"required": ["explanation", "output"],
"additionalProperties": false
}
},
"final_answer": {
"type": "string"
}
},
"required": ["steps", "final_answer"],
"additionalProperties": false
}
}
}
}
```
In the current code, `"additionalProperties": false` is only added at
the top level of the schema, not to nested objects.
This PR introduces the `_add_additional_properties_key` function, which
recursively adds `"additionalProperties": false` to the entire JSON
schema for the request.
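A minimal recursive sketch of the idea (not the PR's actual `_add_additional_properties_key`; `set_additional_properties_false` is a hypothetical name). It returns a copy in which every schema with `"type": "object"`, at any depth, carries `"additionalProperties": false`:

```python
from typing import Any


def set_additional_properties_false(schema: Any) -> Any:
    """Return a copy of a JSON schema with "additionalProperties": False
    on every object schema, at any depth."""
    if isinstance(schema, dict):
        # Recurse first so nested objects (properties, items, $defs, ...) are covered.
        updated = {key: set_additional_properties_false(value) for key, value in schema.items()}
        if updated.get("type") == "object":
            updated["additionalProperties"] = False
        return updated
    if isinstance(schema, list):
        return [set_additional_properties_false(item) for item in schema]
    return schema  # strings, numbers and booleans are left unchanged
```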
Twitter handle: `@fukkaa1225`
Thank you!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Previously the code was able to only handle a single level of nesting
for subgraphs in mermaid. This change adds support for arbitrary nesting
of subgraphs.
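A minimal recursive sketch of arbitrary-depth subgraph rendering. It assumes (hypothetically) that nested node ids are namespaced with `:` (e.g. `outer:inner:leaf`) and omits the edges, styling and id escaping a real Mermaid renderer needs; the point is that recursion, rather than a single fixed level, handles any depth.

```python
from typing import Any, Dict, List

NODES = "__nodes__"  # reserved key holding the leaf node ids of a (sub)graph


def render_subgraphs(node_ids: List[str], sep: str = ":") -> List[str]:
    """Emit Mermaid-style subgraph blocks for arbitrarily nested node ids."""
    tree: Dict[str, Any] = {}
    for node_id in node_ids:
        level = tree
        # Every prefix before the last part names an enclosing subgraph.
        for part in node_id.split(sep)[:-1]:
            level = level.setdefault(part, {})
        level.setdefault(NODES, []).append(node_id)

    lines: List[str] = []

    def emit(level: Dict[str, Any], prefix: str, depth: int) -> None:
        indent = "\t" * depth
        for node_id in level.get(NODES, []):
            lines.append(f"{indent}{node_id}")
        for name, child in level.items():
            if name == NODES:
                continue
            full_name = f"{prefix}{sep}{name}" if prefix else name
            lines.append(f"{indent}subgraph {full_name}")
            emit(child, full_name, depth + 1)  # recursion handles any depth
            lines.append(f"{indent}end")

    emit(tree, "", 0)
    return lines
```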
Fix handling of pipeline_kwargs to prioritize class attribute defaults.
#19770
Co-authored-by: jaizo <manuel.jaiczay@polygons.at>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
This PR adds tiny improvements to the `GithubFileLoader` document loader
and its code sample, addressing the following issues:
1. Currently, the `file_extension` argument of `GithubFileLoader` does
not change its behavior at all.
1. The `GithubFileLoader` sample code in
`docs/docs/integrations/document_loaders/github.ipynb` does not work as
it stands.
The respective solutions I propose are the following:
1. Remove `file_extension` argument from `GithubFileLoader`.
1. Specify the branch as `master` (not the default `main`) and rename
`documents` as `document`.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
When I used the `Neo4jGraph` `enhanced_schema=True` option, I ran into an
error because a property's `min_size` of `None` was compared numerically
with an int.
The fix I applied is similar to the pattern of skipping embeddings
elsewhere in the file.
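A minimal sketch of the guard, using a hypothetical `sampleable_list_properties` helper and property dicts (the real schema code and its limit constant differ): in Python 3, `None > 128` raises `TypeError`, so a missing size is skipped before the numeric comparison.

```python
from typing import Any, Dict, List


def sampleable_list_properties(props: List[Dict[str, Any]], list_limit: int) -> List[str]:
    """Keep LIST properties that are short enough to sample (hypothetical helper)."""
    kept = []
    for prop in props:
        min_size = prop.get("min_size")
        # `None > int` raises TypeError in Python 3, so skip unknown sizes,
        # just as very long lists (typically embeddings) are skipped.
        if min_size is None or min_size > list_limit:
            continue
        kept.append(prop["property"])
    return kept
```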
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
An LLM stops generating text, even in the middle of a sentence, when
`finish_reason` is `length` (for OpenAI) or `stop_reason` is
`max_tokens` (for Anthropic).
To obtain longer outputs, we can call the message generation API
multiple times and merge the results into a single text, circumventing
the API's output token limit.
The extra line breaks forced by the `merge_message_runs` function when
seamlessly merging messages can be annoying, so I added the option to
specify the chunk separator.
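A minimal sketch of merging consecutive same-role messages with a configurable separator. It uses `(role, content)` tuples and a hypothetical `merge_runs` function instead of `langchain_core`'s `merge_message_runs` and message classes; an empty separator joins continuation chunks seamlessly, while the `"\n"` default reproduces the forced line break.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content); a simplification of real message objects


def merge_runs(messages: List[Message], chunk_separator: str = "\n") -> List[Message]:
    """Merge consecutive messages from the same role, joined by `chunk_separator`."""
    merged: List[Message] = []
    for role, content in messages:
        if merged and merged[-1][0] == role:
            _, previous = merged[-1]
            merged[-1] = (role, previous + chunk_separator + content)
        else:
            merged.append((role, content))
    return merged


# Two continuation calls that split one sentence mid-way.
parts = [("ai", "The quick brown"), ("ai", " fox jumps.")]
```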
**Issue:**
No corresponding issues.
**Dependencies:**
No dependencies required.
**Twitter handle:**
@hanama_chem
https://x.com/hanama_chem
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
`parsed_json` is expected to be a list of dictionaries, but it appears to
be a single dictionary instead.
This occurs in `process_response` in
libs/experimental/langchain_experimental/graph_transformers/llm.py.
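A minimal sketch of one way to tolerate both shapes, using a hypothetical `as_list_of_dicts` helper (the actual fix in `process_response` may differ): a lone dictionary is wrapped in a list before further processing.

```python
from typing import Any, Dict, List, Optional, Union

ParsedJson = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def as_list_of_dicts(parsed_json: ParsedJson) -> List[Dict[str, Any]]:
    """Normalize LLM output that may be a single object or a list of objects."""
    if parsed_json is None:
        return []
    if isinstance(parsed_json, dict):
        return [parsed_json]  # wrap a lone relationship dict
    return list(parsed_json)
```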
- [ ] **Bugfix**: "experimental: bugfix"
---------
Co-authored-by: based <basir.sedighi@nris.no>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Because chunks are joined by spaces, they often cannot be found in the
original text, so the final `start_index` is very likely to be -1.
- The simplest fix is to use the natural index of the chunk as
`start_index`.
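The failure mode can be reproduced with plain `str.find`, assuming (for illustration) that the source sentences are separated by newlines and that the chunker re-joins them with spaces; `naive_start_indices` is a hypothetical helper.

```python
from typing import List


def naive_start_indices(text: str, chunks: List[str]) -> List[int]:
    """Locate each chunk by searching the original text (-1 if absent)."""
    return [text.find(chunk) for chunk in chunks]


# Sentences in the source are separated by newlines...
text = "First sentence.\nSecond sentence.\nThird sentence."
sentences = text.split("\n")
# ...but re-joining them with spaces yields a chunk that no longer occurs verbatim.
chunks = [" ".join(sentences[:2]), sentences[2]]
```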
- **Description:** This change adds the ID field that's required in
Pinecone to the result documents of the similarity search method.
- **Issue:** Result documents lacked metadata, namely the ID field
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
[langchain_core] Fix UnionType type var replacement
- Added types.UnionType to typing.Union mapping
Type replacement raises `TypeError: 'type' object is not subscriptable`
for PEP 604 unions (`X | Y`), because the function `_py_38_safe_origin`
returns `types.UnionType` instead of `typing.Union`.
```python
>>> from types import UnionType
>>> from typing import Union, get_origin
>>> type_ = get_origin(str | None)
>>> type_
<class 'types.UnionType'>
>>> UnionType[(str, None)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'type' object is not subscriptable
>>> Union[(str, None)]
typing.Optional[str]
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: DeepInfra 500 errors have useful information in the text
field that isn't being exposed to the user. I updated the error message
to fix this.
As an example, this code
```
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
model = "meta-llama/Meta-Llama-3-70B-Instruct"
deepinfra_api_token = "..."
model = ChatDeepInfra(model=model, deepinfra_api_token=deepinfra_api_token)
messages = [HumanMessage("All work and no play makes Jack a dull boy\n" * 9000)]
response = model.invoke(messages)
```
Currently gives this error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server: Error 500
```
This change would give the following error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server error status 500: {"error":{"message":"Requested input length 99009 exceeds maximum input length 8192"}}
```
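A minimal sketch of the error handling, with a hypothetical `check_response` function and exception class standing in for the integration's code: on a 5xx status, the response body is appended to the message rather than discarded.

```python
class ChatDeepInfraError(Exception):
    """Hypothetical stand-in for ChatDeepInfraException."""


def check_response(status_code: int, text: str) -> None:
    """Raise on server errors, keeping the response body in the message."""
    if 500 <= status_code < 600:
        # The body often says why the request failed (e.g. input too long),
        # so include it instead of reporting only the status code.
        raise ChatDeepInfraError(
            f"DeepInfra Server error status {status_code}: {text}"
        )
```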
**Refactor PebbloRetrievalQA**
- Created `APIWrapper` and moved API logic into it.
- Created smaller functions/methods for better readability.
- Properly read environment variables.
- Removed unused code.
- Updated models
**Issue:** NA
**Dependencies:** NA
**tests**: NA
**Refactor PebbloSafeLoader**
- Created `APIWrapper` and moved API logic into it.
- Moved helper functions to the utility file.
- Created smaller functions and methods for better readability.
- Properly read environment variables.
- Removed unused code.
**Issue:** NA
**Dependencies:** NA
**tests**: Updated
Limit the number of most recent documents fetched from the MongoDB database.
- [ ] **PR title**: "langchain_mongodb: limit the most recent documents to
fetch from MongoDB database."
- [ ] **PR message**:
- **Description:** Added a `doc_limit` parameter that limits the number
of documents fetched from the MongoDB database
- **Issue:**
- **Dependencies:** None
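A minimal in-memory sketch of "fetch the N most recent documents, oldest first". The `created_at` field and `most_recent` helper are hypothetical; against a real collection this would roughly correspond to sorting descending, applying a pymongo cursor's `limit(doc_limit)`, and reversing, which is our assumption rather than the PR's code.

```python
from typing import Any, Dict, List, Optional


def most_recent(
    messages: List[Dict[str, Any]], doc_limit: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Return at most `doc_limit` newest messages, oldest first; all if None."""
    ordered = sorted(messages, key=lambda message: message["created_at"])
    if doc_limit is None:
        return ordered
    # Guard doc_limit <= 0: a slice like ordered[-0:] would return everything.
    return ordered[-doc_limit:] if doc_limit > 0 else []
```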
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: The neo4j driver can raise a SessionExpired error, which is
considered a retriable error. If a query fails with a SessionExpired
error, this change retries that query once. This change will make the
neo4j integration less flaky.
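A minimal sketch of retry-once semantics. `SessionExpired` here is a local stand-in for `neo4j.exceptions.SessionExpired`, and `run_with_one_retry` is a hypothetical helper, not the integration's code: the first expiry triggers exactly one more attempt, and a second expiry propagates.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SessionExpired(Exception):
    """Hypothetical stand-in for neo4j.exceptions.SessionExpired."""


def run_with_one_retry(run_query: Callable[[], T]) -> T:
    """Run a query; if the session expired, retry it exactly once."""
    try:
        return run_query()
    except SessionExpired:
        # A second SessionExpired is not caught and propagates to the caller.
        return run_query()
```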
Twitter handle: noahmay_
### Summary
Create `langchain-databricks` as a new partner package. This PR does
not migrate all existing Databricks integrations, but the package will
eventually contain:
* `ChatDatabricks` (implemented in this PR)
* `DatabricksVectorSearch`
* `DatabricksEmbeddings`
* ~`UCFunctionToolkit`~ (will be done after the UC SDK work, which will
drastically simplify the implementation)
Also, this PR does not add integration tests yet. This will be added
once the Databricks test workspace is ready.
Tagging @efriis as POC
### Tracker
[✍️] Create a package and migrate ChatDatabricks
[ ] Migrate DatabricksVectorSearch, DatabricksEmbeddings, and their docs
~[ ] Migrate UCFunctionToolkit and its doc~
[ ] Add provider document and update README.md
[ ] Add integration tests and set up secrets (after moving to an external
package)
[ ] Add deprecation note to the community implementations.
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Adding `BoxRetriever` for langchain_box. This retriever
handles two use cases:
* Retrieve all documents that match a full-text search
* Retrieve the answer to a Box AI prompt as a Document
**Twitter handle:** @BoxPlatform
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Update metadata for the SharePoint loader with the
full path, i.e., `webUrl`
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs:** NA
Co-authored-by: dristy.cd <dristy@clouddefense.io>
Co-authored-by: ccurme <chester.curme@gmail.com>
- Description: Adding new package: `langchain-box`:
* `langchain_box.document_loaders.BoxLoader` — DocumentLoader
functionality
* `langchain_box.utilities.BoxAPIWrapper` — Box-specific code
* `langchain_box.utilities.BoxAuth` — Helper class for Box
authentication
* `langchain_box.utilities.BoxAuthType` — enum used by BoxAuth class
- Twitter handle: @boxplatform
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Also remove some unused dependencies (fastapi) and unused test/lint/dev
dependencies (community, openai, textsplitters).
chromadb 0.5.4 introduced usage of `model_fields`, which is specific to
Pydantic v2; the same change also shipped in 0.5.5.
This allows complex-type (nested) metadata to be returned; the current
implementation throws an error when dealing with nested metadata.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Here we allow standard tests to specify a value for `tool_choice` via a
`tool_choice_value` property, which defaults to None.
Chat models [available in
Together](https://docs.together.ai/docs/chat-models) have issues passing
standard tool calling tests:
- llama 3.1 models currently [appear to rely on user-side
parsing](https://docs.together.ai/docs/llama-3-function-calling) in
Together;
- Mixtral-8x7B and Mistral-7B (currently tested) consistently do not
call tools in some tests.
Specifying tool_choice also lets us remove an existing `xfail` and use a
smaller model in Groq tests.
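The mechanism described above (an overridable property that defaults to `None` and is only forwarded when set) can be sketched as follows. This is a minimal sketch, not the standard-tests code: the class names, the `bind_tools_kwargs` helper, and the `"any"` value are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional


class ToolCallingTestBase:
    """Hypothetical base class for standard tool-calling tests."""

    @property
    def tool_choice_value(self) -> Optional[str]:
        # Default: do not force a tool call; let the model decide.
        return None

    def bind_tools_kwargs(self) -> Dict[str, Any]:
        # Only forward tool_choice when a subclass opts in.
        if self.tool_choice_value is None:
            return {}
        return {"tool_choice": self.tool_choice_value}


class ForcedToolChoiceTests(ToolCallingTestBase):
    """Hypothetical subclass for a provider whose models rarely call tools unprompted."""

    @property
    def tool_choice_value(self) -> Optional[str]:
        return "any"  # illustrative value; a provider may expect a tool name instead
```

A test would then call something like `model.bind_tools(tools, **self.bind_tools_kwargs())`, so providers that behave well need no override.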
- **Description:** The following
[line](fd546196ef/libs/community/langchain_community/document_loaders/parsers/audio.py (L117))
in `OpenAIWhisperParser` returns a plain text object, even though the
official documentation says it should return a `Transcript` instance,
which should have a `text` attribute. However, for the example given in
the issue, and when I ran it myself, I got the text directly. This small
PR accounts for that.
- **Issue:** #25218
I was able to replicate the error even without the `GenericLoader`, as
shown below; the issue was with `OpenAIWhisperParser`:
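```python
from langchain_community.document_loaders.parsers.audio import OpenAIWhisperParser
from langchain_core.documents.base import Blob

parser = OpenAIWhisperParser(
    api_key="sk-fxxxxxxxxx",
    response_format="srt",
    temperature=0,
)
list(parser.lazy_parse(Blob.from_path("path_to_file.m4a")))
```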
…he prompt in the create_stuff_documents_chain
Thank you for contributing to LangChain!
- [ ] **PR title**: "langchain:add document_variable_name in the
function _validate_prompt in create_stuff_documents_chain"
- [ ] **PR message**:
- **Description:** add document_variable_name in the function
_validate_prompt in create_stuff_documents_chain
- **Issue:** according to the description of the
`create_stuff_documents_chain` function, the parameter
`document_variable_name` can be used to override "context" in the
prompt, but the function `_validate_prompt` still uses `DOCUMENTS_KEY`
to check whether the prompt is valid. The value of `DOCUMENTS_KEY` is
always "context", so even though the user sets `document_variable_name`
to override it, the code still checks whether "context" is in the
prompt and ultimately reports an error. So I replaced `DOCUMENTS_KEY`
with `document_variable_name`. The default value of
`document_variable_name` is "context", the same as `DOCUMENTS_KEY`, but
users can override it.
- **Dependencies:** none
- **Twitter handle:** https://x.com/xjr199703
- [ ] **Add tests and docs**: none
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] NatbotChain: move to community, deprecate langchain version.
Update to use `prompt | llm | output_parser` instead of LLMChain.
- [x] LLMMathChain: deprecate + add langgraph replacement example to API
ref
- [x] HypotheticalDocumentEmbedder (retriever): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] FlareChain: update to use `prompt | llm | output_parser` instead
of LLMChain
- [x] ConstitutionalChain: deprecate + add langgraph replacement example
to API ref
- [x] LLMChainExtractor (document compressor): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] LLMChainFilter (document compressor): update to use `prompt | llm
| output_parser` instead of LLMChain
- [x] RePhraseQueryRetriever (retriever): update to use `prompt | llm |
output_parser` instead of LLMChain
Within the semantic chunker, when calling `_threshold_from_clusters`
there is the possibility for a divide by 0 error if the
`number_of_chunks` is equal to the length of `distances`.
Fix simply implements a check if these values match to prevent the error
and enable chunking to continue.
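A minimal sketch of that kind of guard, assuming the threshold is derived by linearly interpolating between a chunk count and a breakpoint percentile. The function name and the interpolation endpoints are assumptions for illustration, not the `SemanticChunker` source; the point is that the division is skipped when the two interpolation endpoints coincide.

```python
def percentile_from_chunk_count(num_distances: int, number_of_chunks: int) -> float:
    """Map a desired chunk count to a breakpoint percentile in [0, 100] (hypothetical helper)."""
    # Assumed endpoints: as many chunks as distances -> 0th percentile,
    # a single chunk -> 100th percentile.
    x1, y1 = float(num_distances), 0.0
    x2, y2 = 1.0, 100.0
    x = max(min(float(number_of_chunks), x1), x2)  # clamp the request into [x2, x1]
    if x1 == x2:
        # Degenerate case: both endpoints coincide, so the slope would divide by zero.
        y = y2
    else:
        y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)
    return min(max(y, 0.0), 100.0)
```

Returning an endpoint instead of dividing keeps chunking going rather than raising `ZeroDivisionError`.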
Remove the period after the hyperlink in the docstring of
BaseChatOpenAI.with_structured_output.
I have repeatedly copied the extra period at the end of the hyperlink,
which results in a "Page not found" page when pasted into the browser.
- **Description:**
This PR fixes the `ValueError` raised when using a model with
history.
Details in #24660.
#22933 caused
`langchain_core.runnables.history.RunnableWithMessageHistory._get_output_messages`
to skip the type check of `output_val` when `output_val` is `False`. After
running `RunnableWithMessageHistory._is_not_async`, `output` is `False`.
249945a572/libs/core/langchain_core/runnables/history.py (L323-L334)
15a36dd0a2/libs/core/langchain_core/runnables/history.py (L461-L471)
~~I suggest that `_get_output_messages` return empty list when
`output_val == False`.~~
- **Issue**:
- #24660
- **Dependencies:** No change.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Backwards-compatible change that converts pydantic extras to literals,
which is consistent with pydantic 2 usage.
- fireworks
- voyage ai
- mistralai
- mistral ai
- together ai
- hugging face
- pinecone
**Description**
Fix the asynchronous methods that retrieve documents from the AzureSearch
VectorStore. The previous changes from [this
commit](ffe6ca986e)
created similar code for the synchronous and the asynchronous methods,
but the asynchronous client returns an asynchronous iterator,
"AsyncSearchItemPaged", as noted in issue #24740.
To solve this issue, the synchronous iterators in the asynchronous methods
were changed to asynchronous iterators.
@chrislrobert said in [this
comment](https://github.com/langchain-ai/langchain/issues/24740#issuecomment-2254168302)
that there was still a flaw due to `with` blocks that close the client
after each call. I removed these `with` blocks in the `async_client`,
following the same pattern as the sync `client`.
To close the connections, a `__del__` method is included to
gracefully close the clients once the vectorstore object is destroyed.
**Issue:** #24740 and #24064
**Dependencies:** No new dependencies for this change
**Example notebook:** I created a notebook just to test that the changes work
and give the same results as the synchronous methods for vector and
hybrid search. With these changes, the asynchronous methods in the
retriever work as well.

**Lint and test**: Passes the tests and the linter
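The sync/async mismatch described above can be illustrated with a stand-in object. `FakeAsyncPaged` is a hypothetical class that only mimics an async-only paged result such as `AsyncSearchItemPaged`; it is not the Azure SDK type. It shows why the async methods need `async for`: a plain `for` loop raises `TypeError` on an object that only implements `__aiter__`.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


class FakeAsyncPaged:
    """Hypothetical stand-in for an async paged search result."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items

    def __aiter__(self) -> AsyncIterator[Dict[str, Any]]:
        return self._gen()

    async def _gen(self) -> AsyncIterator[Dict[str, Any]]:
        for item in self._items:
            await asyncio.sleep(0)  # yield control, as a real page fetch would
            yield item


async def collect_contents(results: FakeAsyncPaged) -> List[str]:
    # `async for` is required; iterating synchronously raises TypeError.
    return [item["content"] async for item in results]
```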
This adds an `args_schema` member to the `SearxSearchResults` tool. This
member is already present in the `SearxSearchRun` tool in the same file.
A `TypeError: Type is not JSON serializable:
AsyncCallbackManagerForToolRun` was being thrown in the langserve
playground when I used the `SearxSearchResults` tool as part of a chain
there. This fixes the issue, so the error is no longer raised.
This is an example langserve app that was giving me the error, but it
works properly after the proposed fix:
```python
#!/usr/bin/env python
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SearxSearchWrapper
from langchain_community.tools.searx_search.tool import SearxSearchResults
from langserve import add_routes

template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

s = SearxSearchWrapper(searx_host="http://localhost:8080")
search = SearxSearchResults(wrapper=s)

search_chain = (
    {"context": search, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

app = FastAPI()
add_routes(
    app,
    search_chain,
    path="/chain",
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
```
- **Description:** Standardize SparkLLM, include:
- docs, the issue #24803
- to support stream
- update api url
- model init arg names, the issue #20085
- **Description:** This PR implements the `bind_tool` functionality for
ChatZhipuAI as requested by the user. ChatZhipuAI models support tool
calling according to the `OpenAI` tool format, as outlined in their
official documentation [here](https://open.bigmodel.cn/dev/api#glm-4).
- **Issue:** #23868
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- Zhipu AI's `embedding-3` and later models support specifying the
`dimensions` parameter of Embedding. Ref:
https://bigmodel.cn/dev/api#text_embedding-3 .
- Add a test case for the `embedding-3` model that assigns dimensions.
This PR deprecates the beta upsert APIs in vectorstore.
We'll introduce them in a V2 abstraction instead to keep the existing
vectorstore implementations lighter weight.
The main problem with the existing APIs is that it's a bit more
challenging to implement the correct behavior with respect to IDs, since
an ID can be present both in the function signature and as an optional
attribute on the document object.
But VectorStores that pass the standard tests should have implemented
the semantics properly!
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR gets rid of the `root_validators(allow_reuse=True)` logic used in
the EdenAI Tool in preparation for the pydantic 2 upgrade.
- add another test to secret_from_env_factory
**Description:**
The `_consume()` method of `core.rate_limiters.InMemoryRateLimiter`
gets the current time point with `time.time()`, which is affected when
the system clock is set backwards. It is therefore recommended to use
the monotonically increasing `time.monotonic()` to obtain the time:
```python
with self._consume_lock:
    now = time.time()  # time.time() -> time.monotonic()
    # initialize on first call to avoid a burst
    if self.last is None:
        self.last = now
    elapsed = now - self.last  # with time.time(), elapsed may be negative if the system time goes backwards
```
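To show why the clock choice matters, here is a minimal token-bucket sketch timed with `time.monotonic()`. `MonotonicBucket` and `try_consume` are hypothetical names, not the `InMemoryRateLimiter` API; the clock is injectable only so the behaviour can be checked deterministically.

```python
import threading
import time
from typing import Callable, Optional


class MonotonicBucket:
    """Minimal token bucket timed with a monotonic clock (hypothetical sketch)."""

    def __init__(
        self,
        requests_per_second: float,
        max_bucket_size: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.requests_per_second = requests_per_second
        self.max_bucket_size = max_bucket_size
        self.available_tokens = 0.0
        self.last: Optional[float] = None
        self._clock = clock  # injectable so tests can drive time deterministically
        self._lock = threading.Lock()

    def try_consume(self) -> bool:
        with self._lock:
            now = self._clock()
            if self.last is None:
                self.last = now  # start empty on the first call to avoid a burst
            elapsed = now - self.last  # non-negative as long as the clock is monotonic
            if elapsed * self.requests_per_second >= 1:
                self.available_tokens += elapsed * self.requests_per_second
                self.last = now
            self.available_tokens = min(self.available_tokens, self.max_bucket_size)
            if self.available_tokens >= 1:
                self.available_tokens -= 1
                return True
            return False
```

With a wall clock, stepping the system time back would make `elapsed` negative and stall refills until the clock caught up; a monotonic clock cannot go backwards.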
Thank you for contributing to LangChain!
- [X] **PR title**: "community: fix valueerror mentions wrong argument
missing"
- **Description:** when faiss.py has a None `relevance_score_fn`, it
raises a `ValueError` saying that a `normalize_fn_score` argument is needed.
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:** This minor PR adds `llm_extraction` to the Firecrawl
loader. This feature is supported by the API and the Python SDK, but the
langchain loader omits it from the response.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Description: As described in the related issue, an error
occurs when using langchain-openai>=0.1.17, which can be attributed to
the following PR: #23691
There, the parameter logprobs is added to requests by default.
However, AzureOpenAI rejects this parameter, as stated here:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python-new&pivots=programming-language-chat-completions
-> "If you set any of these parameters, you get an error."
Therefore, this PR changes the default value of the logprobs parameter to
None instead of False. This results in it being filtered out before the
request is sent.
- Issue: #24880
- Dependencies: /
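Why `None` and `False` behave differently can be sketched with a payload builder that drops unset parameters. `build_request_params` is a hypothetical helper, assuming the client filters `None` values the way the description implies; it is not the langchain-openai code.

```python
from typing import Any, Dict


def build_request_params(**params: Any) -> Dict[str, Any]:
    """Drop parameters left at None so they are never sent (hypothetical helper)."""
    # `False` is a real value and survives the filter; `None` means "not set".
    return {key: value for key, value in params.items() if value is not None}
```

With `logprobs=False` the key still reaches the server, which is what Azure rejects; with `logprobs=None` it is omitted entirely.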
Co-authored-by: blaufink <sebastian.brueckner@outlook.de>
Replace all usages of `__fields__` with the `get_fields` adapter merged
into langchain_core.
Code mod generated using the following grit pattern:
```
engine marzano(0.1)
language python
`$X.__fields__` => `get_fields($X)` where {
add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
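The idea behind such an adapter can be sketched with duck typing. `get_fields_compat` is a hypothetical name and this is not the `langchain_core.utils.pydantic.get_fields` implementation; it assumes pydantic v2 models expose `model_fields` and v1 models expose `__fields__`.

```python
from typing import Any, Dict


def get_fields_compat(model: Any) -> Dict[str, Any]:
    """Return a model's field mapping under pydantic v2 or v1 (hypothetical helper)."""
    # Prefer the v2 attribute: touching __fields__ on a v2 model emits a deprecation warning.
    if hasattr(model, "model_fields"):
        return model.model_fields
    if hasattr(model, "__fields__"):
        return model.__fields__
    raise TypeError(f"{model!r} does not look like a pydantic model")
```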
Migrate pydantic extra to literals
Upgrade to using a literal for specifying the extra, which is the
recommended approach in pydantic 2.
This also works correctly in pydantic v1.
```python
from pydantic.v1 import BaseModel

class Foo(BaseModel, extra="forbid"):
    x: int

Foo(x=5, y=1)  # raises ValidationError: extra fields not permitted
```
And
```python
from pydantic.v1 import BaseModel

class Foo(BaseModel):
    x: int

    class Config:
        extra = "forbid"

Foo(x=5, y=1)  # raises ValidationError: extra fields not permitted
```
## Enum -> literal using grit pattern:
```
engine marzano(0.1)
language python
or {
`extra=Extra.allow` => `extra="allow"`,
`extra=Extra.forbid` => `extra="forbid"`,
`extra=Extra.ignore` => `extra="ignore"`
}
```
Sorted the attributes in Config and removed docstrings, in case we need
to go back and forth between pydantic v1 and v2 during the 0.3 release.
(This will reduce merge conflicts.)
## Sort attributes in Config:
```
engine marzano(0.1)
language python
function sort($values) js {
return $values.text.split(',').sort().join("\n");
}
class_definition($name, $body) as $C where {
$name <: `Config`,
$body <: block($statements),
$values = [],
$statements <: some bubble($values) assignment() as $A where {
$values += $A
},
$body => sort($values),
}
```
Add a utility that can be used as a default factory.
The goal is to start migrating the pydantic models to use
`from_env` as a default factory where possible.
```python
from pydantic import Field, BaseModel
from langchain_core.utils import from_env

class Foo(BaseModel):
    name: str = Field(default_factory=from_env('HELLO'))
```
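The pattern relies on the factory being evaluated at instantiation time rather than at import time. The sketch below shows that with a standard-library `dataclass`; `env_factory` is a hypothetical stand-in, not the signature of `langchain_core.utils.from_env`.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Optional


def env_factory(key: str, default: Optional[str] = None) -> Callable[[], str]:
    """Return a zero-argument factory that reads `key` when called (hypothetical)."""

    def factory() -> str:
        value = os.environ.get(key, default)
        if value is None:
            raise ValueError(f"Set the {key} environment variable or pass a value explicitly.")
        return value

    return factory


@dataclass
class Foo:
    # The environment is read per instance, and only when no explicit value is given.
    name: str = field(default_factory=env_factory("HELLO"))
```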
For business subscriptions, the status is `STOCKSBUSINESS`, not `OK`.
## Description
This pull-request extends the existing vector search strategies of
MongoDBAtlasVectorSearch to include Hybrid (Reciprocal Rank Fusion) and
Full-text via new Retrievers.
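Reciprocal Rank Fusion itself is simple to state: each document scores the sum of `1 / (k + rank)` over every ranked list it appears in. The sketch below uses the commonly cited constant `k = 60`; it is a generic illustration, and the MongoDB retriever's actual constant, weighting, and pipeline are not shown here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked lists of document ids into one ranking (generic RRF)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


vector_hits = ["a", "b", "c"]
fulltext_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

Documents found by both searches ("b", "c") rise above those found by only one, without needing the raw vector and text scores to be on the same scale.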
There is a small breaking change in the form of the `prefilter` kwarg to
search. For this, and because we have now added a great deal of
features, including programmatic Index creation/deletion since 0.1.0, we
plan to bump the version to 0.2.0.
### Checklist
* Unit tests have been extended
* formatting has been applied
* One mypy error remains which will either go away in CI or be
simplified.
---------
Signed-off-by: Casey Clements <casey.clements@mongodb.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Instantiating `GPT4AllEmbeddings` with no
`gpt4all_kwargs` argument raised a `ValidationError`. Root cause: #21238
added the capability to pass `gpt4all_kwargs` through to the `GPT4All`
instance via `Embed4All`, but broke code that did not specify a
`gpt4all_kwargs` argument.
- **Issue:** #25119
- **Dependencies:** None
- **Twitter handle:** [`@metadaddy`](https://twitter.com/metadaddy)
This PR does an aesthetic sort of the config object attributes. This
will make it a bit easier to go back and forth between pydantic v1 and
pydantic v2 on the 0.3.x branch
Among integration packages in libs/partners, Groq is an exception in
that it errors on warnings.
Following https://github.com/langchain-ai/langchain/pull/25084, Groq
fails with
> pydantic.warnings.PydanticDeprecatedSince20: The `__fields__`
attribute is deprecated, use `model_fields` instead. Deprecated in
Pydantic V2.0 to be removed in V3.0.
Here we update the behavior to no longer fail on warning, which is
consistent with the rest of the packages in libs/partners.
**Description:**
In this PR, I am adding three stock market tools from
financialdatasets.ai (my API!):
- get balance sheets
- get cash flow statements
- get income statements
Twitter handle: [@virattt](https://twitter.com/virattt)
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Example: "community: Added bedrock 3-5 sonnet cost details for
BedrockAnthropicTokenUsageCallbackHandler"
Co-authored-by: Naval Chand <navalchand@192.168.1.36>
- description: I removed the mandatory `QIANFAN_AK` requirement and the
default model name that langchain sets, because the underlying `qianfan`
SDK powering the langchain component already has a default model name.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- community: Allow authorization to Confluence with bearer token
- **Description:** Allow authorization to Confluence with [Personal
Access
Token](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html)
by checking for the keys `['client_id', token: ['access_token',
'token_type']]`
- **Issue:**
Currently, the following error occurs when using a personal access token
for authorization:
```python
import os

from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url=os.getenv("CONFLUENCE_URL"),
    oauth2={
        "token": {"access_token": os.getenv("CONFLUENCE_ACCESS_TOKEN"), "token_type": "bearer"},
        "client_id": "client_id",
    },
    page_ids=["12345678"],
)
```
```
ValueError: Error(s) while validating input: ["You have either omitted require keys or added extra keys to the oauth2 dictionary. key values should be `['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"]
```
With this PR the loader runs as expected.
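The check described above (accept either the existing key set or `client_id` plus a bearer-token dict) could look roughly like this. `is_supported_oauth2` and the constant names are hypothetical; this is not the loader's actual validation code.

```python
from typing import Any, Dict

LEGACY_KEYS = {"access_token", "access_token_secret", "consumer_key", "key_cert"}
BEARER_TOKEN_KEYS = {"access_token", "token_type"}


def is_supported_oauth2(oauth2: Dict[str, Any]) -> bool:
    """Accept the legacy key set or `client_id` plus a bearer token dict (hypothetical)."""
    if set(oauth2) == LEGACY_KEYS:
        return True
    token = oauth2.get("token")
    return (
        set(oauth2) == {"client_id", "token"}
        and isinstance(token, dict)
        and set(token) == BEARER_TOKEN_KEYS
    )
```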
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** This includes Pydantic field metadata in
`_create_subset_model_v2` so that it is included in the final
serialized form that gets sent out.
- **Issue:** #25031
- **Dependencies:** n/a
- **Twitter handle:** @gramliu
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Fixes Neo4JVector.from_existing_graph integration with huggingface.
Previously, it threw an error with existing databases, because the
from_existing_graph query returns an empty list of new nodes, which is
then passed to the embedding function, and huggingface errors on an
empty list.
Fixes [24401](https://github.com/langchain-ai/langchain/issues/24401)
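The fix amounts to not calling the embedder when there is nothing to embed. A minimal sketch, with `embed_if_any` as a hypothetical helper rather than the Neo4j vector store code:

```python
from typing import Callable, List

Embedder = Callable[[List[str]], List[List[float]]]


def embed_if_any(texts: List[str], embed_documents: Embedder) -> List[List[float]]:
    """Skip the embedding call when there is nothing to embed (hypothetical helper)."""
    if not texts:
        # Some embedding backends reject an empty batch, so short-circuit here.
        return []
    return embed_documents(texts)
```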
---------
Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com>
You can use this with:
```python
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import GlinerGraphTransformer

gliner = GlinerGraphTransformer(
    allowed_nodes=["Person", "Organization", "Nobel"],
    allowed_relationships=["EMPLOYEE", "WON"],
)
text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]
gliner.convert_to_graph_documents(documents)
```
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR adds a minimal document indexer abstraction.
The goal of this abstraction is to allow developers to create custom
retrievers that also have a standard indexing API and allow updating the
document content in them.
The abstraction comes with a test suite that can verify that the indexer
implements the correct semantics.
This is an iteration over a previous PR
(https://github.com/langchain-ai/langchain/pull/24364). The main
difference is that we're subclassing BaseRetriever in this
iteration and so have consolidated the sync and async interfaces.
The main problem with the current design is that the search
configuration has to be specified at init rather than provided at run
time.
We will likely resolve this issue in one of two ways:
(1) Define a method (`get_retriever`) that allows creating a
retriever at run time with a specific configuration. If we do this, we
will likely break the subclassing of BaseRetriever.
(2) Generalize the base retriever so it can support structured queries.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This PR fixes a bug where if `enable_dynamic_field` and
`partition_key_field` are enabled at the same time, a pymilvus error
occurs.
Milvus requires the partition key field to be a field fully defined in
the schema, not a dynamic one, so it throws the error "the specified
partition key field {field} not exist" when creating the collection.
When `enable_dynamic_field` is set to `True`, all schema field creation
based on `metadatas` is skipped. This code now checks whether
`partition_key_field` is set, and if so creates the field.
Integration test added.
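The schema decision described above can be sketched as a pure function over field names. `metadata_schema_fields` is hypothetical and only models which metadata keys get explicit schema fields; it does not call pymilvus.

```python
from typing import Iterable, List, Optional


def metadata_schema_fields(
    metadata_keys: Iterable[str],
    enable_dynamic_field: bool,
    partition_key_field: Optional[str],
) -> List[str]:
    """Names of metadata fields to declare explicitly in the collection schema (hypothetical)."""
    if enable_dynamic_field:
        # Dynamic keys need no per-key schema declarations...
        fields: List[str] = []
    else:
        fields = sorted(set(metadata_keys))
    # ...but the partition key must always be a real schema field.
    if partition_key_field is not None and partition_key_field not in fields:
        fields.append(partition_key_field)
    return fields
```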
**Twitter handle:** StuartMarshUK
---------
Co-authored-by: Stuart Marsh <stuart.marsh@qumata.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** This PR makes the AthenaLoader `profile_name` optional
and fixes the type hint, which says the type is `str` when it should be
`str` or `None`, since None is handled in the loader init. This is a
minor problem, but it confused me when using the Athena Loader as to
why we had to use a profile, since I want one locally but not in
production.
- **Issue:** #24957
- **Dependencies:** None.
Description: RetryWithErrorOutputParser.from_llm() creates a retry chain
that returns a Generation instance, when it should actually just return
a string.
This class was forgotten when fixing the issue in PR #24687
Hardens index commands with try/except for free clusters and optional
waits for syncing and tests.
[efriis](https://github.com/efriis) These are the upgrades to the search
index commands (CRUD) that I mentioned.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR introduces a module with some helper utilities for the pydantic
1 -> 2 migration.
They're meant to be used in the following way:
1) Use the utility code to get unit tests to pass without requiring
modification to the unit tests
2) (If desired) upgrade the unit tests to match pydantic 2 output
3) (If desired) stop using the utility code
Currently, this module contains a way to map `schema()` generated by
pydantic 2 to (mostly) match the output from pydantic v1.
- **Description:**
Support ChatMlflow.bind_tools method
Tested in Databricks:
<img width="836" alt="image"
src="https://github.com/user-attachments/assets/fa28ef50-0110-4698-8eda-4faf6f0b9ef8">
---------
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
**Description:** This PR fixes a KeyError in NotionDBLoader when the
"name" key is missing in the "people" property.
**Issue:** Fixes #24223
**Dependencies:** None
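The fix comes down to reading the optional `"name"` key defensively instead of indexing it directly. The following is a minimal sketch. It assumes a "people" property shaped like `{"type": "people", "people": [{"object": "user", "id": ..., "name": ...}]}`. `extract_people_names` is a hypothetical helper, not the loader's actual code, and returning `None` for a nameless user is an illustrative choice.

```python
from typing import Any, Dict, List, Optional


def extract_people_names(prop_data: Dict[str, Any]) -> List[Optional[str]]:
    """Hypothetical helper: collect user names from a Notion "people" property."""
    people = prop_data.get("people") or []
    # .get("name") instead of ["name"]: a user entry without a "name" key
    # yields None here rather than raising KeyError.
    return [person.get("name") for person in people]
```

With this approach, a page whose "people" property mixes named and unnamed users still loads, and the missing names surface as `None` values.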
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add compatibility for pydantic 2 for a utility function.
This will help push some small changes to master, so they don't have to
be kept track of on a separate branch.
The @pre_init validator is a temporary solution for base models. It has
similar (but not identical) semantics to @root_validator(), but it works
strictly as a pre-init validator.
It'll work as expected as long as the pydantic model type hints are
correct.
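The following stdlib-only sketch shows what "strictly a pre-init validator" means, using `dataclasses` rather than pydantic. `with_pre_init`, `check_api_key`, and `ClientConfig` are hypothetical names. Filling in field defaults before the validator runs is an assumption about the intended semantics (the validator sees a complete dict of values, and only then is the object constructed). This is not LangChain's implementation.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Callable, Dict, Type, TypeVar

T = TypeVar("T")
Values = Dict[str, Any]


def with_pre_init(validator: Callable[[type, Values], Values]) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical class decorator: run validator(cls, values) before __init__."""

    def decorate(cls: Type[T]) -> Type[T]:
        original_init = cls.__init__

        def __init__(self: Any, **values: Any) -> None:  # keyword arguments only
            # Assumption: fill in declared defaults for missing (or None) values
            # so the validator sees the full set of fields.
            for f in fields(cls):
                if values.get(f.name) is None:
                    if f.default is not MISSING:
                        values[f.name] = f.default
                    elif f.default_factory is not MISSING:
                        values[f.name] = f.default_factory()
            # The validator may check or rewrite values; only then is the object built.
            original_init(self, **validator(cls, dict(values)))

        cls.__init__ = __init__  # type: ignore[misc]
        return cls

    return decorate


def check_api_key(cls: type, values: Values) -> Values:
    if not values.get("api_key"):
        raise ValueError(f"{cls.__name__}: api_key is required")
    values["endpoint"] = values["endpoint"].rstrip("/")
    return values


@with_pre_init(check_api_key)
@dataclass
class ClientConfig:
    api_key: str = ""
    endpoint: str = "https://example.invalid/"
    retries: int = 3
```

Because the validator runs before construction, it can reject or normalize inputs (here, trimming a trailing slash from a defaulted `endpoint`) without the object ever existing in an invalid state.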
Supports the following UX:
```python
from typing import Any, Dict, List, Literal, Optional, Union

from typing_extensions import Annotated, TypedDict


class SubTool(TypedDict):
    """Subtool docstring"""

    args: Annotated[Dict[str, Any], {}, "this does bar"]


class Tool(TypedDict):
    """Docstring

    Args:
        arg1: foo
    """

    arg1: str
    arg2: Union[int, str]
    arg3: Optional[List[SubTool]]
    arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
    arg5: Annotated[Optional[float], None]
```
- can parse Google-style docstrings
- can use Annotated to specify a default value (second arg)
- can use Annotated to specify an arg description (third arg)
- can have nested complex types
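Under the `Annotated` convention listed above, the second argument is a default value and the third is a description. The stdlib-only sketch below reads that metadata back with `typing.get_type_hints(..., include_extras=True)`. `read_annotated_fields` is a hypothetical helper for illustration, not LangChain's parser. Treating `...` as "no default" is an assumption based on `arg4` above.

```python
from typing import (
    Annotated, Any, Dict, Literal, Optional, TypedDict,
    get_args, get_origin, get_type_hints,
)


class Tool(TypedDict):
    """Docstring

    Args:
        arg1: foo
    """

    arg1: str
    arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
    arg5: Annotated[Optional[float], None]


def read_annotated_fields(td: type) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: split Annotated metadata into type/default/description."""
    out: Dict[str, Dict[str, Any]] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    for name, hint in get_type_hints(td, include_extras=True).items():
        info: Dict[str, Any] = {"type": hint, "default": ..., "description": None}
        if get_origin(hint) is Annotated:
            base, *meta = get_args(hint)
            info["type"] = base
            if len(meta) >= 1:
                info["default"] = meta[0]  # second Annotated arg
            if len(meta) >= 2:
                info["description"] = meta[1]  # third Annotated arg
        out[name] = info
    return out
```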
This PR adds annotations in the community package.
Annotations are only strictly needed in subclasses of BaseModel for
pydantic 2 compatibility.
This PR also adds some annotations that are not strictly necessary, but
they are useful to have regardless for the documentation pages.
Title: [pebblo_retrieval] Identify entities in prompts given to
PebbloRetrievalQA to enable prompt governance
Description: Implemented identification of entities in the prompt using
the Pebblo prompt governance API.
Issue: NA
Dependencies: NA
Add tests and docs: NA
- **Title:** [PebbloSafeLoader] Implement content-size-based batching in
the classification flow (loader/doc API)
- **Description:**
- Implemented content-size-based batching in the loader/doc API. The batch
size is set to 100KB and intentionally hard-coded, with no external
configuration option, to prevent timeouts.
- Removed an unused field (`pb_id`) from `doc_metadata`.
- **Issue:** NA
- **Dependencies:** NA
- **Add tests and docs:** Updated
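The following is a minimal sketch of greedy content-size-based batching. It assumes "100KB" means 100 × 1024 bytes of UTF-8 text and that a single document larger than the limit is sent in a batch of its own. `batch_by_content_size` is a hypothetical name, and the `max_bytes` parameter exists only so the sketch is easy to test; the description says the real limit is hard-coded.

```python
from typing import Iterator, List

# Assumption: "100KB" interpreted as 100 * 1024 bytes of UTF-8 encoded content.
MAX_BATCH_BYTES = 100 * 1024


def batch_by_content_size(
    texts: List[str], max_bytes: int = MAX_BATCH_BYTES
) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive batches whose total size stays <= max_bytes."""
    batch: List[str] = []
    size = 0
    for text in texts:
        n = len(text.encode("utf-8"))
        # Flush the current batch if adding this text would exceed the limit.
        if batch and size + n > max_bytes:
            yield batch
            batch, size = [], 0
        # An oversized text still goes out, alone in its own batch.
        batch.append(text)
        size += n
    if batch:
        yield batch
```

Greedy packing keeps document order and needs only one pass. Smaller, bounded request payloads are what help avoid timeouts.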
Description: The old method will be discontinued; use the official SDK
for more model options.
Issue: None
Dependencies: None
Twitter handle: None
Co-authored-by: trumanyan <trumanyan@tencent.com>
PR title: Experimental: Add config to convert_to_graph_documents
Description: To use Langfuse, I need to pass the Langfuse configuration
when invoking the chain. langchain_experimental does not allow passing any
parameters (besides the documents) to the convert_to_graph_documents
method, so I cannot monitor the chain in Langfuse.
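The change amounts to threading an optional config argument through to the underlying chain calls. The following is a minimal pure-Python sketch of that pattern, assuming a chain object with an `invoke(inputs, config=...)` method, as LangChain runnables have. `RecordingChain` and this standalone `convert_to_graph_documents` function are hypothetical stand-ins for the real transformer. In practice the config would carry the Langfuse callback handler.

```python
from typing import Any, Dict, List, Optional, Sequence


class RecordingChain:
    """Hypothetical stand-in for a runnable chain; it records every config it receives."""

    def __init__(self) -> None:
        self.seen_configs: List[Optional[Dict[str, Any]]] = []

    def invoke(
        self, inputs: Dict[str, Any], config: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        self.seen_configs.append(config)
        return {"nodes": [], "source": inputs["input"]}


def convert_to_graph_documents(
    chain: RecordingChain,
    documents: Sequence[str],
    config: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    # Forward the caller's config (e.g. tracing callbacks) unchanged to every invoke call.
    return [chain.invoke({"input": doc}, config=config) for doc in documents]
```

Defaulting `config` to `None` keeps existing callers working unchanged.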
---------
Co-authored-by: Catarina Franco <catarina.franco@criticalsoftware.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>