Commit Graph

12236 Commits

Author SHA1 Message Date
Zapiron
0c11aee486
docs: small grammar changes for conceptual guide page (#28765)
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-17 14:27:55 +00:00
German Martin
3a1d05394d
community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759)
**Description:**

The Apache AGE graph integration incorrectly handled node merging,
allowing duplicate nodes with different IDs but the same type and other
properties. Unlike
[Neo4j](cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)),
[Memgraph](cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)),
[Kuzu](cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)),
and
[Gremlin](cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)),
it did not use the node ID as the primary identifier for merging.

This inconsistency caused data integrity issues and unexpected behavior
when users expected updates to specific nodes by ID.

**Solution:**
This PR modifies the `node_insert_query` to `MERGE` nodes based on label
and ID *only* and updates properties with `SET`, aligning the behavior
with other graph database integrations. The `_format_properties` method
was also modified to handle id overrides.

**Impact:**

This fix ensures data integrity by preventing duplicate nodes, and
provides a consistent behavior across graph database integrations.
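For illustration, a minimal sketch of the merge pattern this PR adopts (a
hypothetical simplification, not the exact query the wrapper generates):

```python
# Hypothetical simplification of the new node_insert_query pattern:
# MERGE matches on the label and id only, and SET then updates the
# remaining properties, so re-inserting a node with the same id updates
# it in place instead of creating a duplicate.
node_insert_query = """
    MERGE (n:Person {id: 'alice'})
    SET n += {age: 30, city: 'Berlin'}
"""
```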
2024-12-17 09:21:59 -05:00
gsa9989
cdf6202156
cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424)
* Added Cosmos DB NoSQL Semantic Cache Integration with tests and
jupyter notebook

---------

Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 21:57:05 -05:00
Brian Burgin
27a9056725
community: Fix ChatLiteLLMRouter runtime issues (#28163)
**Description:** Fix ChatLiteLLMRouter ctor validation and model_name
parameter
**Issue:** #19356, #27455, #28077
**Twitter handle:** @bburgin_0
2024-12-16 18:17:39 -05:00
Eugene Yurtsev
234d49653a
docs: Create custom embeddings (#20398)
Guidelines on how to create custom embeddings
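As a taste of what the guide covers, a minimal sketch of the interface a
custom embedding model implements (the hash-based model below is a toy
stand-in, not a real encoder):

```python
from typing import List

from langchain_core.embeddings import Embeddings


class ToyHashEmbeddings(Embeddings):
    """Toy embedding model that derives a fixed-size vector from a text hash."""

    def __init__(self, size: int = 8):
        self.size = size

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Embed a batch of documents by embedding each one individually.
        return [self.embed_query(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        # Pseudo-embedding derived from Python's hash; a real model would
        # call a local encoder or a remote API here.
        return [float((hash(text) >> i) % 97) / 97.0 for i in range(self.size)]
```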

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 17:57:51 -05:00
Mikhail Khludnev
00deacc67e
docs, external: introduce langchain-localai (#28751)
Referring to https://github.com/mkhludnev/langchain-localai

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 22:22:37 +00:00
Erick Friis
d4b5e7ef22
community: recommend RedisVectorStore over Redis (#28749) 2024-12-16 21:08:30 +00:00
Hiros
8f5e72de05
community: Correctly handle multi-element rich text (#25762)
**Description:**

- Add _concatenate_rich_text method to combine all elements in rich text
arrays
- Update load_page method to use _concatenate_rich_text for rich text
properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text
This fix prevents truncation of content after backticks or other
formatting elements.

**Issue:**

When using the Notion DB Loader, the text for `richtext` and `title`
properties was truncated after the first element, because the loader
only read the first element of the rich text array.

**Dependencies:** None.
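A minimal sketch of the concatenation approach (hypothetical helper; Notion
rich text elements each carry a `plain_text` field):

```python
def _concatenate_rich_text(rich_text: list) -> str:
    # Join the plain_text of every element so inline code and formatted
    # spans after the first element are no longer dropped.
    return "".join(part.get("plain_text", "") for part in rich_text)
```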

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 20:20:27 +00:00
Antonio Lanza
b2102b8cc4
text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True (#27782)
This PR closes #27781

# Problem
The current implementation of `NLTKTextSplitter` uses
`sent_tokenize`. However, `sent_tokenize` does not preserve the characters
between two tokenized sentences, so the chunks cannot be mapped back to
their original offsets. This causes errors when using
`add_start_index=True`, as described in issue #27781. In particular, note
that both outputs below are identical even though the second input contains
extra spaces:
```python
from nltk.tokenize import sent_tokenize

output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
output2 = sent_tokenize("Innovation drives our success.        Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output2)
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```

# Solution
With this new `use_span_tokenize` parameter, we can use NLTK to create
sentences (with `span_tokenize`), but also add extra chars to be sure
that we still can map the chunks to the original text.
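A minimal sketch of the offset-preserving approach, assuming the default
Punkt tokenizer:

```python
from nltk.tokenize import PunktSentenceTokenizer

text = (
    "Innovation drives our success.        "
    "Collaboration fosters creative solutions."
)
# span_tokenize yields (start, end) offsets into the original string,
# so the characters between sentences are never lost.
for start, end in PunktSentenceTokenizer().span_tokenize(text):
    print(start, end, repr(text[start:end]))
```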

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-16 19:53:15 +00:00
Tari Yekorogha
d262d41cc0
community: added FalkorDB vector store support i.e. implementation, test, docs an… (#26245)
**Description:** Added support for FalkorDB Vector Store, including its
implementation, unit tests, documentation, and an example notebook. The
FalkorDB integration allows users to efficiently manage and query
embeddings in a vector database, with relevance scoring and maximal
marginal relevance search. The following components were implemented:

- Core implementation for FalkorDBVector store.
- Unit tests ensuring proper functionality and edge case coverage.
- Example notebook demonstrating an end-to-end setup, search, and
retrieval using FalkorDB.
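A hypothetical usage sketch (the import path and constructor arguments are
assumptions based on this description, not the verified API; see the
notebook for the real end-to-end setup):

```python
from langchain_core.embeddings import FakeEmbeddings
from langchain_community.vectorstores import FalkorDBVector  # assumed path

store = FalkorDBVector.from_texts(
    texts=["FalkorDB stores embeddings in a graph database."],
    embedding=FakeEmbeddings(size=8),  # stand-in for a real embedding model
)
# Relevance scoring and maximal marginal relevance search, as described above.
scored = store.similarity_search_with_relevance_scores("embeddings", k=1)
mmr = store.max_marginal_relevance_search("embeddings", k=1, fetch_k=5)
```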

**Twitter handle:** @tariyekorogha

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:37:55 +00:00
Aaron Pham
12fced13f4
chore(community): update to OpenLLM 0.6 (#24609)
Update to OpenLLM 0.6, with which we decided to make use of OpenLLM's
OpenAI-compatible endpoint. Thus, OpenLLM now just becomes a thin
wrapper around the OpenAI wrapper.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

---------

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 14:30:07 -05:00
Lvlvko
5c17a4ace9
community: support Hunyuan Embedding (#23160)
## description

- I refactored `ChatHunyuan` using the tencentcloud sdk because I found the
original one doesn't work in my application
- I added `HunyuanEmbeddings` using the tencentcloud sdk
- Both of them extend the corresponding LangChain base classes. I have fully
tested them in my application

## Dependencies
- tencentcloud-sdk-python
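A hypothetical usage sketch (the import path and credential parameter names
are assumptions, not the verified API):

```python
from langchain_community.embeddings import HunyuanEmbeddings  # assumed path

embeddings = HunyuanEmbeddings(
    hunyuan_secret_id="...",   # assumed parameter name
    hunyuan_secret_key="...",  # assumed parameter name
)
vector = embeddings.embed_query("hello")
```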

---------

Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:27:19 +00:00
Harrison Chase
de7996c2ca
core: add kwargs support to VectorStore (#25934)
has been missing the passthrough until now

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 18:57:57 +00:00
Henry Tu
87c50f99e5
docs: cerebras Update Llama 3.1 70B to Llama 3.3 70B (#28746)
This PR updates the docs for the Cerebras integration to use Llama 3.3
70b instead of Llama 3.1 70b.

cc: @efriis
2024-12-16 18:43:35 +00:00
Lorenzo
b79a1156ed
community: correct return type of get_files_from_directory in github tool (#27885)
### About:
- **Description:** the _get_files_from_directory_ method returns a
string, but it's used in other methods that expect a List[str]
- **Issue:** None
- **Dependencies:** None

This pull request introduces a new method _list_files_ with the old logic of
_get_files_from_directory_, but it returns a List[str].
The behavior of _get_files_from_directory_ is not changed.
2024-12-16 10:30:33 -08:00
Sheepsta300
580a8d53f9
community: Add configurable VisualFeatures to the AzureAiServicesImageAnalysisTool (#27444)
- **Description:** The `AzureAiServicesImageAnalysisTool` is a useful
service that utilises the Azure AI Vision package under the hood.
However, since the creation of this tool, new `VisualFeatures` have been
added to allow the user to request other image-specific information to
be returned. Currently, the tool offers neither configuration of which
features should be returned nor any of the newer feature types. The aim
of this PR is to address this and expose more of the Azure service in
this integration.
- **Dependencies:** no new dependencies in the main class file;
azure.ai.vision.imageanalysis added to the extra test dependencies file.
- **Tests and docs:** Although no tests exist for the already implemented
Azure service tools, I've created 3 unit tests for this class that cover
initialisation and credentials, local file analysis, and the new
features option.
- **Lint and test:** All linting has passed.
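A hypothetical configuration sketch (the parameter names are assumptions
based on this description):

```python
from azure.ai.vision.imageanalysis.models import VisualFeatures

from langchain_community.tools.azure_ai_services import (
    AzureAiServicesImageAnalysisTool,
)

tool = AzureAiServicesImageAnalysisTool(
    azure_ai_services_key="...",  # assumed parameter name
    azure_ai_services_endpoint="https://<resource>.cognitiveservices.azure.com/",
    # The new configurable features; previously the set was fixed.
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS],
)
result = tool.invoke("https://example.com/image.png")
```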

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 18:30:04 +00:00
Erick Friis
1c120e9615
core: xml output parser tags docstring (#28745) 2024-12-16 18:25:16 +00:00
Ana
ebab2ea81b
Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843)
This pull request addresses the issue with authenticating Azure National
Cloud using token (RBAC) in the AzureSearch vectorstore implementation.

## Changes

- Modified the `_get_search_client` method in `azuresearch.py` to pass
`additional_search_client_options` to the `SearchIndexClient` instance.

## Implementation Details

The patch updates the `SearchIndexClient` initialization to include the
`additional_search_client_options` parameter:

```python
index_client: SearchIndexClient = SearchIndexClient(
    endpoint=endpoint,
    credential=credential,
    user_agent=user_agent,
    **additional_search_client_options
)
```

This change allows the `audience` parameter to be correctly passed when
using Azure National Cloud, fixing the authentication issues with
GovCloud & RBAC.

This patch was generated by [Ana - AI SDE](https://openana.ai/), an
AI-powered software development assistant.

This is a fix for [Issue
25823](https://github.com/langchain-ai/langchain/issues/25823)

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 18:22:24 +00:00
chenzimin
169d419581
community: Remove all other keys in ChatLiteLLM and add api_key (#28097)
- **Description:** Currently, no api_key is passed to LiteLLM, and
LiteLLM only takes one api_key parameter. Therefore I removed all the
current `*_api_key` attributes (they are not used) and added an
`api_key` attribute that is passed to ChatLiteLLM.
- **Issue:** Should fix issue #27826

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 17:54:29 +00:00
German Martin
d5d18c62b3
community: Apache AGE wrapper additional edge cases. (#28151)
Description:
The current AGEGraph() implementation does some custom wrapping for graph
queries. The method in question is _wrap_query(), which parses fields from
the original query to add some SQL context to it.
This PR improves the current parsing logic to cover additional edge cases,
which are added to the test coverage: basically, if any node property name
or value contains the "return" literal, it will break the graph / SQL
query.
We discovered this while dealing with real-world datasets; it is not an
uncommon scenario, and I think it needs to be covered.
2024-12-16 11:28:01 -05:00
Rock2z
768e4a7fd4
[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316)
**Description:**

This PR addresses the `TypeError: sequence item 0: expected str
instance, FluentValue found` error when invoking `WikidataQueryRun`. The
root cause was an incompatible version of the
`wikibase-rest-api-client`, which caused the tool to fail when handling
`FluentValue` objects instead of strings.

The current implementation only supports `wikibase-rest-api-client<0.2`,
but the latest version is `0.2.1`, where the current implementation
breaks. Additionally, the error message advises users to install the
latest version: [code
reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32).
Therefore, this PR updates the tool to support the latest version of
`wikibase-rest-api-client`.

Key changes:
- Updated the handling of `FluentValue` objects to ensure compatibility
with the latest `wikibase-rest-api-client`.
- Removed the restriction to `wikibase-rest-api-client<0.2` and updated
to support the latest version (`0.2.1`).

**Issue:**

Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) –
`TypeError: sequence item 0: expected str instance, FluentValue found`.

**Dependencies:**

- Upgraded `wikibase-rest-api-client` to the latest version to resolve
the issue.

---------

Co-authored-by: peiwen_zhang <peiwen_zhang@email.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 16:22:18 +00:00
André Quintino
a26c786bc5
community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653)
- **Description:** Changed the comparator to use a wildcard query
instead of match. This modification allows for partial text matching on
analyzed fields, which improves the flexibility of the search by
performing full-text searches that aren't limited to exact matches.
- **Issue:** The previous implementation used a match query, which
performs exact matches on analyzed fields. This limited the search by
requiring query terms to align with the indexed text. The wildcard query
allows for partial text matching: the search can return results even if
only a portion of the term matches the text. This makes the search more
flexible and suitable for use cases where exact matches aren't necessary
or expected. In short, match queries were too restrictive, and the change
to wildcard queries enables partial matches (see the sketch after this
list).
- **Dependencies:** none
- **Twitter handle:** @Andre_Q_Pereira
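For illustration of the comparator change above, a hedged sketch of the two
query bodies in the OpenSearch query DSL:

```python
field, value = "metadata.genre", "science"

# Old behavior: a match query, effectively limited to exact term matches.
match_query = {"match": {field: value}}

# New behavior: a wildcard query, allowing partial matches within the field.
wildcard_query = {"wildcard": {field: {"value": f"*{value}*"}}}
```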

---------

Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 11:16:34 -05:00
Davi Schumacher
0f9b4bf244
community[patch]: update dynamodb chat history to update instead of overwrite (#22397)
**Description:**
The current implementation of `DynamoDBChatMessageHistory` updates the
`History` attribute for a given chat history record by first extracting
the existing contents into memory, appending the new message, and then
using the `put_item` method to put the record back. This has the effect
of overwriting any additional attributes someone may want to include in
the record, like chat session metadata.

This PR suggests changing from using `put_item` to using `update_item`
instead which will keep any other attributes in the record untouched.
The change is backward compatible since
1. `update_item` is an "upsert" operation, creating the record if it
doesn't already exist, otherwise updating it
2. It only touches the db insert call and passes the exact same
information. The rest of the class is left untouched
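A minimal sketch of the upsert call (the table, key, and attribute names
follow the examples in this PR but are assumptions here):

```python
import boto3

table = boto3.resource("dynamodb").Table("SessionTable")  # assumed table name

# update_item creates the record if it is absent and only touches the
# named attribute, leaving any other attributes on the record intact.
table.update_item(
    Key={"SessionId": "session-1"},
    UpdateExpression="SET #H = :messages",
    ExpressionAttributeNames={"#H": "History"},
    ExpressionAttributeValues={":messages": [{"type": "human", "content": "hi"}]},
)
```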

**Dependencies:**
None

**Tests and docs:**
No unit tests currently exist for the `DynamoDBChatMessageHistory`
class. This PR adds the file
`libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py`
to test the `add_message` and `clear` methods. I wanted to use the moto
library to mock DynamoDB calls but I could not get poetry to resolve it
so I mocked those calls myself in the test. Therefore, no test
dependencies were added.

The change was tested on a test DynamoDB table as well. The first three
images below show the current behavior. First a message is added to chat
history, then a value is inserted in the record in some other attribute,
and finally another message is added to the record, destroying the other
attribute.

![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21)

![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92)

![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a)

The next three images show the new behavior. Once again a value is added
to an attribute other than the History attribute, but now when the
followup message is added it does not destroy that other attribute. The
History attribute itself is unaffected by this change.

![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634)

![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729)

![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff)

The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb`
required no changes and was tested as well.
2024-12-16 10:38:00 -05:00
Christophe Bornet
6ddd5dbb1e
community: Add FewShotSQLTool (#28232)
The `FewShotSQLTool` gets some SQL query examples from a
`BaseExampleSelector` for a given question.
This is useful to provide [few-shot
examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples)
capability to an SQL agent.

Example usage:
```python
from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX

embeddings = OpenAIEmbeddings()

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    AstraDB,
    k=5,
    input_keys=["input"],
    collection_name="lc_few_shots",
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
)

few_shot_sql_tool = FewShotSQLTool(
    example_selector=example_selector,
    description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!"
)

agent = create_sql_agent(
    llm=llm, 
    db=db, 
    prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", 
    extra_tools=[few_shot_sql_tool]
)

result = agent.invoke({"input": "How many artists are there?"})
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 15:37:21 +00:00
Mohammad Mohtashim
8d746086ab
Added bind_tools support for ChatMLX along with small fix in _stream (#28743)
- **Description:** Added support for `bind_tools` as requested in the
issue. Two issues in `_stream` were also fixed:
    - Corrected the positional argument passing for `generate_step`
    - Handled the case where the `token` returned by `generate_step` is an integer
- **Issue:** #28692
2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz
558b65ea32
community: SambaStudio Tool Calling and Structured Output (#28025)
Description: Add tool calling and structured output support for
SambaStudio chat models, docs included

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 06:15:19 +00:00
Hristiyan Genchev
c87f24d85d
docs: correct variable name from formatted_docs to docs_content (#28735)
- **Description:** Fixed incorrect variable name formatted_docs,
replacing it with docs_content to ensure correct functionality.
2024-12-15 22:09:58 -08:00
clairebehue
fb44e74ca4
community: fix AzureSearch Oauth with azure_ad_access_token (#26995)
**Description:**
AzureSearch vector store: create a wrapper class around
`azure.core.credentials.TokenCredential` (which is not directly
instantiable) to fix OAuth usage with the `azure_ad_access_token` argument
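A minimal sketch of the wrapper idea (the class below is hypothetical, not
the exact implementation):

```python
from azure.core.credentials import AccessToken, TokenCredential


class StaticTokenCredential(TokenCredential):
    """Wrap a pre-fetched azure_ad_access_token so it can be passed
    anywhere a TokenCredential instance is expected."""

    def __init__(self, token: str, expires_on: int):
        self._access_token = AccessToken(token, expires_on)

    def get_token(self, *scopes: str, **kwargs) -> AccessToken:
        # Always return the caller-supplied token.
        return self._access_token
```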

**Issue:** [the issue it
fixes](https://github.com/langchain-ai/langchain/issues/26216)

**Dependencies:** None

- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:56:45 +00:00
SirSmokeAlot
29305cd948
community: O365Toolkit - send_event - fixed timezone error (#25876)
**Description**: Fixed the formatting of the start and end times
**Issue**: The old formatting always resulted in a timezone error
**Dependencies**: None
**Twitter handle**: None

---------

Co-authored-by: Yannick Opitz <yannick.opitz@gob.de>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:32:28 +00:00
Erick Friis
4f6ccb7080
text-splitters: extended-tests without socket (#28736) 2024-12-16 05:19:50 +00:00
Erick Friis
8ec1c72e03
text-splitters: test without socket (#28732) 2024-12-15 22:10:35 +00:00
Tibor Reiss
690aa02c31
docs[experimental]: Make docs clearer and add min_chunk_size (#26398)
Fixes #26171:

- added some clarification text for the keyword argument
`breakpoint_threshold_amount`
- added `min_chunk_size`: together with `breakpoint_threshold_amount`,
excessively small or large chunk sizes can be avoided

Note: langchain-experimental was moved to a separate repo, so only
the doc change stays here.
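A hypothetical usage sketch combining the two keyword arguments (the values
are illustrative):

```python
from langchain_core.embeddings import FakeEmbeddings
from langchain_experimental.text_splitter import SemanticChunker

splitter = SemanticChunker(
    FakeEmbeddings(size=8),  # stand-in for a real embedding model
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95.0,
    min_chunk_size=100,  # avoid overly small chunks
)
chunks = splitter.split_text(
    "First idea. A related idea. A totally different topic."
)
```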
2024-12-15 13:43:48 -08:00
Aayush Kataria
d417e4b372
Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716)
- Added [full
text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search)
and [hybrid
search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search)
support for Azure CosmosDB NoSql Vector Store
- Added a new enum called CosmosDBQueryType which supports the following
values:
    - VECTOR = "vector"
    - FULL_TEXT_SEARCH = "full_text_search"
    - FULL_TEXT_RANK = "full_text_rank"
    - HYBRID = "hybrid"
- Users now need to provide this query_type to the similarity_search
method for the vector store to make the correct query API call.
- Added a couple of workarounds, since the FULL_TEXT_RANK and HYBRID
query functions don't support parameterized queries right now. TODOs are
in place, and these workarounds will be removed by the end of
January.
- Added the necessary test cases and updated the docs.
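A hypothetical call sketch (the import path is an assumption, and
`vector_store` stands for an initialized Azure Cosmos DB NoSql vector
store):

```python
# Assumed import path for the new enum described above.
from langchain_community.vectorstores.azure_cosmos_db_no_sql import (
    CosmosDBQueryType,
)

docs = vector_store.similarity_search(
    "What did the president say?",
    k=4,
    query_type=CosmosDBQueryType.HYBRID,  # or VECTOR / FULL_TEXT_SEARCH / FULL_TEXT_RANK
)
```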


- [x] **Add tests and docs**
- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-15 13:26:32 -08:00
Mohammad Mohtashim
4c1871d9a8
community: Passing the model_kwargs correctly while maintaining backward compatibility (#28439)
- **Description:** `model_kwargs` was not being passed correctly to
`sentence_transformers.SentenceTransformer`, which has been corrected
while maintaining backward compatibility
- **Issue:** #28436

---------

Co-authored-by: MoosaTae <sadhis.tae@gmail.com>
Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-15 20:34:29 +00:00
nhols
a3851cb3bc
community: FAISS vectorstore - consistent Document id field (#28728)
Make sure the id field of Documents in the `FAISS` docstore has the same
id as the values in `index_to_docstore_id`; implement the `get_by_ids` method.
2024-12-15 12:23:49 -08:00
Bagatur
a0534ae62a
community[patch]: Release 0.3.12 (#28725) 2024-12-14 22:13:20 +00:00
Bagatur
089e659e03
langchain[patch]: Release 0.3.12 (#28724) 2024-12-14 20:02:18 +00:00
Bagatur
679e3a9970
text-splitters[patch]: Release 0.3.3 (#28723) 2024-12-14 19:20:22 +00:00
ccurme
23b433f683
infra: fix notebook tests (#28722)
Bump unstructured to pick up resolution of
https://github.com/Unstructured-IO/unstructured/issues/3795
2024-12-14 15:13:19 +00:00
Erick Friis
387284c259
core: release 0.3.25 (#28718) 2024-12-14 02:22:28 +00:00
Nawaf Alharbi
decd77c515
community: fix an issue with deepinfra integration (#28715)
- [x] **PR title**: langchain: add URL parameter to ChatDeepInfra class

- [x] **PR message**: add URL parameter to ChatDeepInfra class
- **Description:** This PR introduces a url parameter to the
ChatDeepInfra class in LangChain, allowing users to specify a custom
URL. Previously, the URL for the DeepInfra API was hardcoded to
"https://stage.api.deepinfra.com/v1/openai/chat/completions", which
caused issues when the staging endpoint was not functional. The _url
method was updated to return the value from the url parameter, enabling
greater flexibility and addressing the problem.
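A minimal usage sketch of the new parameter (the auth parameter name is an
assumption):

```python
from langchain_community.chat_models import ChatDeepInfra

chat = ChatDeepInfra(
    deepinfra_api_token="...",  # assumed auth parameter name
    url="https://api.deepinfra.com/v1/openai/chat/completions",
)
```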

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:15:29 +00:00
Ben Chambers
008efada2c
[community]: Render documents to graphviz (#24830)
- **Description:** Adds a helper that renders documents with the
GraphVectorStore metadata fields to Graphviz for visualization. This is
helpful for understanding and debugging.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:02:09 +00:00
Leonid Ganeline
fc8006121f
docs: integrations W&B update (#28059)
Issue: There is an ambiguity about W&B integrations: two provider pages
already exist.
Fix: Added a "root" W&B provider page with references to the
documentation on the W&B site. Cleaned up formatting in the existing
pages and added one more integration reference.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-14 00:47:16 +00:00
Erick Friis
288f204758
docs, community: aerospike docs update (#28717)
Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Jesse S <jschmidt@aerospike.com>
Co-authored-by: dylan <dwelch@aerospike.com>
2024-12-14 00:27:37 +00:00
ronidas39
bd008baee0
docs: Update additional_resources/tutorials.mdx (#28005)
Added the complete LangChain tutorial playlist from the Total Technology
Zonne channel. In this playlist, every video focuses on one specific use
case with a hands-on demo, and the tutorials are suitable for all levels.

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 00:21:30 +00:00
Vimpas
337fed80a5
community: 🐛 PDF Filter Type Error (#27154)
- **Description:** fix PDF Filter Type Error
- **Issue:** #27153
- **Dependencies:** none

- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 23:30:29 +00:00
Ryan Parker
12111cb922
community: fallback on core async atransform_documents method for MarkdownifyTransformer (#27866)
# Description
Implements the `atransform_documents` method for
`MarkdownifyTransformer` using the `asyncio` built-in library for
concurrency.

Note that this is mainly for API completeness when working with async
frameworks rather than for performance, since the `markdownify` function
is not I/O bound because it works with `Document` objects already in
memory.
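A minimal sketch of the pattern (an illustrative stand-in class, not the
actual implementation):

```python
import asyncio
from functools import partial
from typing import Any, Sequence

from langchain_core.documents import Document


class TransformerSketch:
    """Illustrative stand-in; the real class converts HTML to Markdown."""

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        return documents

    async def atransform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Run the sync transform in the default executor so async callers
        # can await it without blocking the event loop.
        return await asyncio.get_running_loop().run_in_executor(
            None, partial(self.transform_documents, documents, **kwargs)
        )
```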

# Issue
Fixes #27865

# Dependencies
No new dependencies added, but
[`markdownify`](https://github.com/matthewwithanm/python-markdownify) is
required since this PR updates the `markdownify` integration.

# Tests and docs
- Tests added
- I did not modify the docstrings since they already described the basic
functionality, and [the API docs also already included a
description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents).
If it would be helpful, I would be happy to update the docstrings and/or
the API docs.

# Lint and test
- [x] format
- [x] lint
- [x] test

I ran formatting with `make format`, linting with `make lint`, and
confirmed that tests pass using `make test`. Note that some unit tests
pass in CI but may fail when running `make test`. Those unit tests are:
- `test_extract_html` (and `test_extract_html_async`)
- `test_strip_tags` (and `test_strip_tags_async`)
- `test_convert_tags` (and `test_convert_tags_async`)

The reason for the difference is that there are trailing spaces when the
tests are run in the CI checks, and no trailing spaces when run with
`make test`. I ensured that the tests pass in CI, but they may fail with
`make test` due to the addition of trailing spaces.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:32:22 +00:00
Manuel
af2e0a7ede
partners: add 'model' alias for consistency in embedding classes (#28374)
**Description:** This PR introduces a `model` alias for the embedding
classes that contain the attribute `model_name`, to ensure consistency
across the codebase, as suggested by a moderator in a previous PR. The
change aligns the usage of attribute names across the project (see for
example
[here](65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304))).
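A minimal sketch of the aliasing pattern in pydantic (the class name is
hypothetical):

```python
from pydantic import BaseModel, ConfigDict, Field


class EmbeddingsSketch(BaseModel):
    model_config = ConfigDict(populate_by_name=True, protected_namespaces=())

    # `model` is accepted as an alias for the canonical `model_name` field.
    model_name: str = Field(default="embed-v1", alias="model")


assert EmbeddingsSketch(model="embed-v2").model_name == "embed-v2"
assert EmbeddingsSketch(model_name="embed-v2").model_name == "embed-v2"
```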
**Issue:** This PR addresses the suggestion from the review of issue
#28269.
**Dependencies:**  None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:30:00 +00:00
Erick Friis
3107d78517
huggingface: fix standard test lint (#28714) 2024-12-13 22:18:54 +00:00
Kaiwei Zhang
b909d54e70
chroma[patch]: Update logic for assigning ids 2024-12-13 21:58:34 +00:00