**Description**:
Adds a vector store integration with
[sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to
sqlite-vss that is a single C file with no external dependencies.
Pretty straightforward, just copy-pasted the sqlite-vss integration and
made a few tweaks and added integration tests. Only question is whether
all documentation should be directed away from sqlite-vss if it is
defacto deprecated (cc @asg017).
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
Description:
- Add system templates and user templates in integration testing
- initialize the response id field value to request_id
- Adjust the default model to hunyuan-pro
- Remove the default values of Temperature and TopP
- Add SystemMessage
all the integration tests have passed.
1、Execute integration tests for the first time
<img width="1359" alt="71ca77a2-e9be-4af6-acdc-4d665002bd9b"
src="https://github.com/user-attachments/assets/9298dc3a-aa26-4bfa-968b-c011a4e699c9">
2、Run the integration test a second time
<img width="1501" alt="image"
src="https://github.com/user-attachments/assets/61335416-4a67-4840-bb89-090ba668e237">
Issue: None
Dependencies: None
Twitter handle: None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local
PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low
latency. This PR adds Intel GPU support to `ipex-llm` llm integration.
**Dependencies:** `ipex-llm`
**Contribution maintainer**: @ivy-lv11 @Oscilloscope98
**tests and docs**:
- Add: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/libs/community/tests/llms/test_ipex_llm.py
---------
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
it fixes two issues:
### YGPTs are broken #25575
```
File ....conda/lib/python3.11/site-packages/langchain_community/embeddings/yandex.py:211, in _make_request(self, texts, **kwargs)
..
--> 211 res = stub.TextEmbedding(request, metadata=self._grpc_metadata) # type: ignore[attr-defined]
AttributeError: 'YandexGPTEmbeddings' object has no attribute '_grpc_metadata'
```
My gut feeling that #23841 is the cause.
I have to drop leading underscore from `_grpc_metadata` for quickfix,
but I just don't know how to do it _pydantic_ enough.
### minor issue:
if we use `api_key`, which is not the best practice the code fails with
```
File ~/git/...../python3.11/site-packages/langchain_community/embeddings/yandex.py:119, in YandexGPTEmbeddings.validate_environment(cls, values)
...
AttributeError: 'tuple' object has no attribute 'append'
```
- Added new integration test. But it requires YGPT env available and
active account. I don't know how int tests dis\enabled in CI.
- added small unit tests with mocks. Should be fine.
---------
Co-authored-by: mikhail-khludnev <mikhail_khludnev@rntgroup.com>
Added Azure Search Access Token Authentication instead of API KEY auth.
Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263
Dependencies: None
Twitter: @levalencia
@baskaryan
Could you please review? First time creating a PR that fixes some code.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR introduces adjustments to ensure compatibility with the recently
released preview version of [TiDB Serverless Vector
Search](https://tidb.cloud/ai), aiming to prevent user confusion.
- TiDB Vector now supports vector indexing with cosine and l2 distance
strategies, although inner_product remains unsupported.
- Changing the distance strategy is currently not supported, so the test
cased should be adjusted.
- [x] NatbotChain: move to community, deprecate langchain version.
Update to use `prompt | llm | output_parser` instead of LLMChain.
- [x] LLMMathChain: deprecate + add langgraph replacement example to API
ref
- [x] HypotheticalDocumentEmbedder (retriever): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] FlareChain: update to use `prompt | llm | output_parser` instead
of LLMChain
- [x] ConstitutionalChain: deprecate + add langgraph replacement example
to API ref
- [x] LLMChainExtractor (document compressor): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] LLMChainFilter (document compressor): update to use `prompt | llm
| output_parser` instead of LLMChain
- [x] RePhraseQueryRetriever (retriever): update to use `prompt | llm |
output_parser` instead of LLMChain
- **Description:** Standardize SparkLLM, include:
- docs, the issue #24803
- to support stream
- update api url
- model init arg names, the issue #20085
- In the in ` embedding-3 ` and later models of Zhipu AI, it is
supported to specify the dimensions parameter of Embedding. Ref:
https://bigmodel.cn/dev/api#text_embedding-3 .
- Add test case for `embedding-3` model by assigning dimensions.
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.
Code mod generated using the following grit pattern:
```
engine marzano(0.1)
language python
`$X.__fields__` => `get_fields($X)` where {
add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
- description: I remove the limitation of mandatory existence of
`QIANFAN_AK` and default model name which langchain uses cause there is
already a default model nama underlying `qianfan` SDK powering langchain
component.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR adds annotations in comunity package.
Annotations are only strictly needed in subclasses of BaseModel for
pydantic 2 compatibility.
This PR adds some unnecessary annotations, but they're not bad to have
regardless for documentation pages.
Description: The old method will be discontinued; use the official SDK
for more model options.
Issue: None
Dependencies: None
Twitter handle: None
Co-authored-by: trumanyan <trumanyan@tencent.com>
## Description
This PR:
- Fixes the validation error in `FastEmbedEmbeddings`.
- Adds support for `batch_size`, `parallel` params.
- Removes support for very old FastEmbed versions.
- Updates the FastEmbed doc with the new params.
Associated Issues:
- Resolves#24039
- Resolves #https://github.com/qdrant/fastembed/issues/296
**Description:**
- This PR exposes some functions in VDMS vectorstore, updates VDMS
related notebooks, updates tests, and upgrade version of VDMS (>=0.0.20)
**Issue:** N/A
**Dependencies:**
- Update vdms>=0.0.20
Thank you for contributing to LangChain!
- This PR adds vector search filtering for Azure Cosmos DB Mongo vCore
and NoSQL.
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
### Description
This pull request added new document loaders to load documents of
various formats using [Dedoc](https://github.com/ispras/dedoc):
- `DedocFileLoader` (determine file types automatically and parse)
- `DedocPDFLoader` (for `PDF` and images parsing)
- `DedocAPIFileLoader` (determine file types automatically and parse
using Dedoc API without library installation)
[Dedoc](https://dedoc.readthedocs.io) is an open-source library/service
that extracts texts, tables, attached files and document structure
(e.g., titles, list items, etc.) from files of various formats. The
library is actively developed and maintained by a group of developers.
`Dedoc` supports `DOCX`, `XLSX`, `PPTX`, `EML`, `HTML`, `PDF`, images
and more.
Full list of supported formats can be found
[here](https://dedoc.readthedocs.io/en/latest/#id1).
For `PDF` documents, `Dedoc` allows to determine textual layer
correctness and split the document into paragraphs.
### Issue
This pull request extends variety of document loaders supported by
`langchain_community` allowing users to choose the most suitable option
for raw documents parsing.
### Dependencies
The PR added a new (optional) dependency `dedoc>=2.2.5` ([library
documentation](https://dedoc.readthedocs.io)) to the
`extended_testing_deps.txt`
### Twitter handle
None
### Add tests and docs
1. Test for the integration:
`libs/community/tests/integration_tests/document_loaders/test_dedoc.py`
2. Example notebook:
`docs/docs/integrations/document_loaders/dedoc.ipynb`
3. Information about the library:
`docs/docs/integrations/providers/dedoc.mdx`
### Lint and test
Done locally:
- `make format`
- `make lint`
- `make integration_tests`
- `make docs_build` (from the project root)
---------
Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>
- **Description:** `QianfanChatEndpoint` When using tool result to
answer questions, the content of the tool is required to be in Dict
format. Of course, this can require users to return Dict format when
calling the tool, but in order to be consistent with other Chat Models,
I think such modifications are necessary.
Regardless of whether `embedding_func` is set or not, the 'text'
attribute of document should be assigned, otherwise the `page_content`
in the document of the final search result will be lost
The `MongoDBStore` can manage only documents.
It's not possible to use MongoDB for an `CacheBackedEmbeddings`.
With this new implementation, it's possible to use:
```python
CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings=embeddings,
document_embedding_cache=MongoDBByteStore(
connection_string=db_uri,
db_name=db_name,
collection_name=collection_name,
),
)
```
and use MongoDB to cache the embeddings !
- **Description:** Add a `KeybertLinkExtractor` for graph vectorstores.
This allows extracting links from keywords in a Document and linking
nodes that have common keywords.
- **Issue:** None
- **Dependencies:** None.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** This allows extracting links between documents with
common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
- Added masking of the API Keys for the modules:
- `langchain/chat_models/openai.py`
- `langchain/llms/openai.py`
- `langchain/llms/google_palm.py`
- `langchain/chat_models/google_palm.py`
- `langchain/llms/edenai.py`
- Updated the modules to utilize `SecretStr` from pydantic to securely
manage API key.
- Added unit/integration tests
- `langchain/chat_models/asure_openai.py` used the `open_api_key` that
is derived from the `ChatOpenAI` Class and it was assuming
`openai_api_key` is a str so we changed it to expect `SecretStr`
instead.
**Issue:** https://github.com/langchain-ai/langchain/issues/12165 ,
**Dependencies:** none,
**Tag maintainer:** @eyurtsev
---------
Co-authored-by: HassanA01 <anikeboss@gmail.com>
Co-authored-by: Aneeq Hassan <aneeq.hassan@utoronto.ca>
Co-authored-by: kristinspenc <kristinspenc2003@gmail.com>
Co-authored-by: faisalt14 <faisalt14@gmail.com>
Co-authored-by: Harshil-Patel28 <76663814+Harshil-Patel28@users.noreply.github.com>
Co-authored-by: kristinspenc <146893228+kristinspenc@users.noreply.github.com>
Co-authored-by: faisalt14 <90787271+faisalt14@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
- Updated constructors in PyPDFParser and PyPDFLoader to handle
`extraction_mode` and additional kwargs, aligning with the capabilities
of `PageObject.extract_text()` from pypdf.
- Added `test_pypdf_loader_with_layout` along with a corresponding
example text file to validate layout extraction from PDFs.
**Issue:** fixes#19735
**Dependencies:** This change requires updating the pypdf dependency
from version 3.4.0 to at least 4.0.0.
Additional changes include the addition of a new test
test_pypdf_loader_with_layout and an example text file to ensure the
functionality of layout extraction from PDFs aligns with the new
capabilities.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** At the moment neo4j wrapper is using setVectorProperty,
which is deprecated
([link](https://neo4j.com/docs/operations-manual/5/reference/procedures/#procedure_db_create_setVectorProperty)).
I replaced with the non-deprecated version.
Neo4j recently introduced a new cypher method to associate embeddings
into relations using "setRelationshipVectorProperty" method. In this PR
I also implemented a new method to perform this association maintaining
the same format used in the "add_embeddings" method which is used to
associate embeddings into Nodes.
I also included a test case for this new method.
Thank you for contributing to LangChain!
- [X] *ApertureDB as vectorstore**: "community: Add ApertureDB as a
vectorestore"
- **Description:** this change provides a new community integration that
uses ApertureData's ApertureDB as a vector store.
- **Issue:** none
- **Dependencies:** depends on ApertureDB Python SDK
- **Twitter handle:** ApertureData
- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Integration tests rely on a local run of a public docker image.
Example notebook additionally relies on a local Ollama server.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
All lint tests pass.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Gautam <gautam@aperturedata.io>