This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.
Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search.
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.
Simple use case exmaple
```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings
db = TiDBVectorStore.from_texts(
embedding=embeddings,
texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
table_name="tidb_vector_langchain",
connection_string=tidb_connection_url,
distance_strategy="cosine",
)
query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
print("-" * 80)
print("Score: ", score)
print(doc.page_content)
print("-" * 80)
```
- **Description:** Chroma use uuid4 instead of uuid1 as random ids. Use
uuid1 may leak mac address, changing to uuid4 will not cause other
effects.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
Fixes#18513.
## Description
This PR attempts to fix the support for Anthropic Claude v3 models in
BedrockChat LLM. The changes here has updated the payload to use the
`messages` format instead of the formatted text prompt for all models;
`messages` API is backwards compatible with all models in Anthropic, so
this should not break the experience for any models.
## Notes
The PR in the current form does not support the v3 models for the
non-chat Bedrock LLM. This means, that with these changes, users won't
be able to able to use the v3 models with the Bedrock LLM. I can open a
separate PR to tackle this use-case, the intent here was to get this out
quickly, so users can start using and test the chat LLM. The Bedrock LLM
classes have also grown complex with a lot of conditions to support
various providers and models, and is ripe for a refactor to make future
changes more palatable. This refactor is likely to take longer, and
requires more thorough testing from the community. Credit to PRs
[18579](https://github.com/langchain-ai/langchain/pull/18579) and
[18548](https://github.com/langchain-ai/langchain/pull/18548) for some
of the code here.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.
Vector search is supported since release 15.x
For more: [Infinispan Home](https://infinispan.org)
Integration tests are provided as well as a demo notebook
ValidationError: 2 validation errors for DocArrayDoc
text
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
```
In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as
follows:
```python
class DocArrayDoc(BaseDoc):
text: Optional[str]
embedding: Optional[NdArray] = Field(**embeddings_params)
metadata: Optional[dict]
```
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.
This is a PR that's meant to raise user awareness that the pickling module is involved.
This is a patch for `CVE-2024-2057`:
https://www.cve.org/CVERecord?id=CVE-2024-2057
This affects users that:
* Use the `TFIDFRetriever`
* Attempt to de-serialize it from an untrusted source that contains a
malicious payload
- **Description:** Databricks SerDe uses cloudpickle instead of pickle
when serializing a user-defined function transform_input_fn since pickle
does not support functions defined in `__main__`, and cloudpickle
supports this.
- **Dependencies:** cloudpickle>=2.0.0
Added a unit test.
Description:
This pull request addresses two key improvements to the langchain
repository:
**Fix for Crash in Flight Search Interface**:
Previously, the code would crash when encountering a failure scenario in
the flight ticket search interface. This PR resolves this issue by
implementing a fix to handle such scenarios gracefully. Now, the code
handles failures in the flight search interface without crashing,
ensuring smoother operation.
**Documentation Update for Amadeus Toolkit**:
Prior to this update, examples provided in the documentation for the
Amadeus Toolkit were unable to run correctly due to outdated
information. This PR includes an update to the documentation, ensuring
that all examples can now be executed successfully. With this update,
users can effectively utilize the Amadeus Toolkit with accurate and
functioning examples.
These changes aim to enhance the reliability and usability of the
langchain repository by addressing issues related to error handling and
ensuring that documentation remains up-to-date and actionable.
Issue: https://github.com/langchain-ai/langchain/issues/17375
Twitter Handle: SingletonYxx
### Description
Changed the value specified for `content_key` in JSONLoader from a
single key to a value based on jq schema.
I created [similar
PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it
has several conflicts because of the architectural change associated
stable version release, so I re-create this PR to fit new architecture.
### Why
For json data like the following, specify `.data[].attributes.message`
for page_content and `.data[].attributes.id` or
`.data[].attributes.attributes. tags`, etc., the `content_key` must also
parse the json structure.
<details>
<summary>sample json data</summary>
```json
{
"data": [
{
"attributes": {
"message": "message1",
"tags": [
"tag1"
]
},
"id": "1"
},
{
"attributes": {
"message": "message2",
"tags": [
"tag2"
]
},
"id": "2"
}
]
}
```
</details>
<details>
<summary>sample code</summary>
```python
def metadata_func(record: dict, metadata: dict) -> dict:
metadata["source"] = None
metadata["id"] = record.get("id")
metadata["tags"] = record["attributes"].get("tags")
return metadata
sample_file = "sample1.json"
loader = JSONLoader(
file_path=sample_file,
jq_schema=".data[]",
content_key=".attributes.message", ## content_key is parsable into jq schema
is_content_key_jq_parsable=True, ## this is added parameter
metadata_func=metadata_func
)
data = loader.load()
data
```
</details>
### Dependencies
none
### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
Neo4j tools use particular node labels and relationship types to store
metadata, but are irrelevant for text2cypher or graph generation, so we
want to ignore them in the schema representation.
## **Description**
Migrate the `MongoDBChatMessageHistory` to the managed
`langchain-mongodb` partner-package
## **Dependencies**
None
## **Twitter handle**
@mongodb
## **tests and docs**
- [x] Migrate existing integration test
- [x ]~ Convert existing integration test to a unit test~ Creation is
out of scope for this ticket
- [x ] ~Considering delaying work until #17470 merges to leverage the
`MockCollection` object. ~
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>