Commit Graph

6476 Commits

Author SHA1 Message Date
Christophe Bornet
2340b3154d
standard-tests: Bump ruff version to 0.9 (#29230)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:23:01 +00:00
Christophe Bornet
e4a78dfc2a
core: Bump ruff version to 0.9 (#29201)
Also run some preview autofix and formatting

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:20:09 +00:00
Ella Charlaix
6f95db81b7
huggingface: Add IPEX models support (#29179)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-22 00:16:44 +00:00
Bhav Sardana
d6a7aaa97d
community: Fix for Pydantic model validator of GoogleApiClient (#29346)
- [ *] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Fix for pedantic model validator for GoogleApiHandler
    - **Issue:** the issue #29165 

- [ *] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

---------

Signed-off-by: Bhav Sardana <sardana.bhav@gmail.com>
2025-01-21 15:17:43 -05:00
Christophe Bornet
1c4ce7b42b
core: Auto-fix some docstrings (#29337) 2025-01-21 13:29:53 -05:00
ccurme
86a0720310
fireworks[patch]: update model used in integration tests (#29342)
No access to firefunction-v1 and -v2.
2025-01-21 11:05:30 -05:00
Hugo Berg
32c9c58adf
Community: fix missing f-string modifier in oai structured output parsing error (#29326)
- **Description:** The ValueError raised on certain structured-outputs
parsing errors, in langchain openai community integration, was missing a
f-string modifier and so didn't produce useful outputs. This is a
2-line, 2-character change.
- **Issue:** None open that this fixes
- **Dependencies:** Nothing changed
- **Twitter handle:** None

- [X] **Add tests and docs**: There's nothing to add for.
- [-] **Lint and test**: Happy to run this if you deem it necessary.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-21 14:26:38 +00:00
Nuno Campos
566915d7cf
core: fix call to get closure vars for partial-wrapped funcs (#29316)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-21 09:26:15 -05:00
ZhangShenao
33e22ccb19
[Doc] Improve api doc (#29324)
- Fix doc description
- Add static method decorator
2025-01-21 09:16:08 -05:00
Bagatur
536b44a47f
community[patch]: Release 0.3.15 (#29325) 2025-01-21 03:10:07 +00:00
Bagatur
ec5fae76d4
langchain[patch]: Release 0.3.15 (#29322) 2025-01-21 02:24:11 +00:00
Bagatur
923e6fb321
core[patch]: 0.3.31 (#29320) 2025-01-21 01:17:31 +00:00
Ahmed Tammaa
d3ed9b86be
text-splitters[minor]: Replace lxml and XSLT with BeautifulSoup in HTMLHeaderTextSplitter for Improved Large HTML File Processing (#27678)
This pull request updates the `HTMLHeaderTextSplitter` by replacing the
`split_text_from_file` method's implementation. The original method used
`lxml` and XSLT for processing HTML files, which caused
`lxml.etree.xsltapplyerror maxhead` when handling large HTML documents
due to limitations in the XSLT processor. Fixes #13149

By switching to BeautifulSoup (`bs4`), we achieve:

- **Improved Performance and Reliability:** BeautifulSoup efficiently
processes large HTML files without the errors associated with `lxml` and
XSLT.
- **Simplified Dependencies:** Removes the dependency on `lxml` and
external XSLT files, relying instead on the widely used `beautifulsoup4`
library.
- **Maintained Functionality:** The new method replicates the original
behavior, ensuring compatibility with existing code and preserving the
extraction of content and metadata.

**Issue:**

This change addresses issues related to processing large HTML files with
the existing `HTMLHeaderTextSplitter` implementation. It resolves
problems where users encounter lxml.etree.xsltapplyerror maxhead due to
large HTML documents.

**Dependencies:**

- **BeautifulSoup (`beautifulsoup4`):** The `beautifulsoup4` library is
now used for parsing HTML content.
  - Installation: `pip install beautifulsoup4`

**Code Changes:**

Updated the `split_text_from_file` method in `HTMLHeaderTextSplitter` as
follows:

```python
def split_text_from_file(self, file: Any) -> List[Document]:
    """Split HTML file using BeautifulSoup.

    Args:
        file: HTML file path or file-like object.

    Returns:
        List of Document objects with page_content and metadata.
    """
    from bs4 import BeautifulSoup
    from langchain.docstore.document import Document
    import bs4

    # Read the HTML content from the file or file-like object
    if isinstance(file, str):
        with open(file, 'r', encoding='utf-8') as f:
            html_content = f.read()
    else:
        # Assuming file is a file-like object
        html_content = file.read()

    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Extract the header tags and their corresponding metadata keys
    headers_to_split_on = [tag[0] for tag in self.headers_to_split_on]
    header_mapping = dict(self.headers_to_split_on)

    documents = []

    # Find the body of the document
    body = soup.body if soup.body else soup

    # Find all header tags in the order they appear
    all_headers = body.find_all(headers_to_split_on)

    # If there's content before the first header, collect it
    first_header = all_headers[0] if all_headers else None
    if first_header:
        pre_header_content = ''
        for elem in first_header.find_all_previous():
            if isinstance(elem, bs4.Tag):
                text = elem.get_text(separator=' ', strip=True)
                if text:
                    pre_header_content = text + ' ' + pre_header_content
        if pre_header_content.strip():
            documents.append(Document(
                page_content=pre_header_content.strip(),
                metadata={}  # No metadata since there's no header
            ))
    else:
        # If no headers are found, return the whole content
        full_text = body.get_text(separator=' ', strip=True)
        if full_text.strip():
            documents.append(Document(
                page_content=full_text.strip(),
                metadata={}
            ))
        return documents

    # Process each header and its associated content
    for header in all_headers:
        current_metadata = {}
        header_name = header.name
        header_text = header.get_text(separator=' ', strip=True)
        current_metadata[header_mapping[header_name]] = header_text

        # Collect all sibling elements until the next header of the same or higher level
        content_elements = []
        for sibling in header.find_next_siblings():
            if sibling.name in headers_to_split_on:
                # Stop at the next header
                break
            if isinstance(sibling, bs4.Tag):
                content_elements.append(sibling)

        # Get the text content of the collected elements
        current_content = ''
        for elem in content_elements:
            text = elem.get_text(separator=' ', strip=True)
            if text:
                current_content += text + ' '

        # Create a Document if there is content
        if current_content.strip():
            documents.append(Document(
                page_content=current_content.strip(),
                metadata=current_metadata.copy()
            ))
        else:
            # If there's no content, but we have metadata, still create a Document
            documents.append(Document(
                page_content='',
                metadata=current_metadata.copy()
            ))

    return documents
```

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 16:10:37 -05:00
Christophe Bornet
989eec4b7b
core: Add ruff rule S101 (no assert) (#29267)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 20:24:31 +00:00
Christophe Bornet
e5d62c6ce7
core: Add ruff rule W293 (whitespaces) (#29272) 2025-01-20 15:16:12 -05:00
Philippe PRADOS
4efc5093c1
community[minor]: Refactoring PyMuPDF parser, loader and add image blob parsers (#29063)
* Adds BlobParsers for images. These implementations can take an image
and produce one or more documents per image. This interface can be used
for exposing OCR capabilities.
* Update PyMuPDFParser and Loader to standardize metadata, handle
images, improve table extraction etc.

- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2025-01-20 15:15:43 -05:00
Syed Baqar Abbas
f175319303
[feat] Added backwards compatibility for OllamaEmbeddings initialization (migration from langchain_community.embeddings to langchain_ollama.embeddings (#29296)
- [feat] **Added backwards compatibility for OllamaEmbeddings
initialization (migration from `langchain_community.embeddings` to
`langchain_ollama.embeddings`**: "langchain_ollama"
- **Description:** Given that `OllamaEmbeddings` from
`langchain_community.embeddings` is deprecated, code is being shifted to
``langchain_ollama.embeddings`. However, this does not offer backward
compatibility of initializing the parameters and `OllamaEmbeddings`
object.
    - **Issue:** #29294 
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001


## Additional Information
Previously, `OllamaEmbeddings` from `langchain_community.embeddings`
used to support the following options:

e9abe583b2/libs/community/langchain_community/embeddings/ollama.py (L125-L139)

However, in the new package `from langchain_ollama import
OllamaEmbeddings`, there is no method to set these options. I have added
these parameters to resolve this issue.

This issue was also discussed in
https://github.com/langchain-ai/langchain/discussions/29113
2025-01-20 11:16:29 -05:00
CLOVA Studio 개발
7a95ffc775
community: fix some features on Naver ChatModel & embedding model 2 (#29243)
## Description
- Responding to `NCP API Key` changes.
- To fix `ChatClovaX` `astream` function to raise `SSEError` when an
error event occurs.
- To add `token length` and `ai_filter` to ChatClovaX's
`response_metadata`.
- To update document for apply NCP API Key changes.

cc. @efriis @vbarda
2025-01-20 11:01:03 -05:00
Sangyun_LEE
5d64597490
docs: fix broken Appearance of langchain_community/document_loaders/recursive_url_loader API Reference (#29305)
# PR mesesage
## Description
Fixed a broken Appearance of RecurisveUrlLoader API Reference.

### Before
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/f39df65d-b788-411d-88af-8bfa2607c00b"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/b8a92b70-4548-4b4a-965f-026faeebd0ec"
/>
</p>

### After
<p align="center">
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/8ea28146-de45-42e2-b346-3004ec4dfc55"
/>
<img width="750" alt="image"
src="https://github.com/user-attachments/assets/914c6966-4055-45d3-baeb-2d97eab06fe7"
/>
</p>

## Issue:
N/A
## Dependencies
None
## Twitter handle
N/A

# Add tests and docs
Not applicable; this change only affects documentation.

# Lint and test
Ran make format, make lint, and make test to ensure no issues.
2025-01-20 10:56:59 -05:00
Hemant Rawat
6c52378992
Add Google-style docstring linting and update pyproject.toml (#29303)
### Description:

This PR introduces Google-style docstring linting for the
ModelLaboratory class in libs/langchain/langchain/model_laboratory.py.
It also updates the pyproject.toml file to comply with the latest Ruff
configuration standards (deprecating top-level lint settings in favor of
lint).

### Changes include:
- [x] Added detailed Google-style docstrings to all methods in
ModelLaboratory.
- [x] Updated pyproject.toml to move select and pydocstyle settings
under the [tool.ruff.lint] section.
- [x] Ensured all files pass Ruff linting.

Issue:
Closes #25154

### Dependencies:
No additional dependencies are required for this change.

### Checklist
- [x] Files passes ruff linting.
- [x] Docstrings conform to the Google-style convention.
- [x] pyproject.toml updated to avoid deprecation warnings.
- [x] My PR is ready to review, please review.
2025-01-19 14:37:21 -05:00
Mohammad Mohtashim
b5fbebb3c8
(Community): Changing the BaseURL and Model for MiniMax (#29299)
- **Description:** Changed the Base Default Model and Base URL to
correct versions. Plus added a more explicit exception if user provides
an invalid API Key
- **Issue:** #29278
2025-01-19 14:15:02 -05:00
ccurme
c20f7418c7
openai[patch]: fix Azure LLM test (#29302)
The tokens I get are:
```
['', '\n\n', 'The', ' sun', ' was', ' setting', ' over', ' the', ' horizon', ',', ' casting', '']
```
so possibly an extra empty token is included in the output.

lmk @efriis if we should look into this further.
2025-01-19 17:25:42 +00:00
ccurme
6b249a0dc2
openai[patch]: release 0.3.1 (#29301) 2025-01-19 17:04:00 +00:00
ThomasSaulou
e9abe583b2
chatperplexity stream-citations in additional kwargs (#29273)
chatperplexity stream-citations in additional kwargs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-18 22:31:10 +00:00
TheSongg
1cd4d8d101
[langchain_community.llms.xinference]: Rewrite _stream() method and support stream() method in xinference.py (#29259)
- [ ] **PR title**:[langchain_community.llms.xinference]: Rewrite
_stream() method and support stream() method in xinference.py

- [ ] **PR message**: Rewrite the _stream method so that the
chain.stream() can be used to return data streams.

       chain = prompt | llm
       chain.stream(input=user_input)


- [ ] **tests**: 
      from langchain_community.llms import Xinference
      from langchain.prompts import PromptTemplate

      llm = Xinference(
server_url="http://0.0.0.0:9997", # replace your xinference server url
model_uid={model_uid} # replace model_uid with the model UID return from
launching the model
          stream = True
       )
prompt = PromptTemplate(input=['country'], template="Q: where can we
visit in the capital of {country}? A:")
      chain = prompt | llm
      chain.stream(input={'country': 'France'})
2025-01-17 20:31:59 -05:00
ccurme
184ea8aeb2
anthropic[patch]: update tool choice type (#29276) 2025-01-17 15:26:33 -05:00
ccurme
ac52021097
anthropic[patch]: release 0.3.2 (#29275) 2025-01-17 19:48:31 +00:00
ccurme
c616b445f2
anthropic[patch]: support parallel_tool_calls (#29257)
Need to:
- Update docs
- Decide if this is an explicit kwarg of bind_tools
- Decide if this should be in standard test with flag for supporting
2025-01-17 19:41:41 +00:00
ccurme
d5360b9bd6
core[patch]: release 0.3.30 (#29256) 2025-01-16 17:52:37 -05:00
Nuno Campos
595297e2e5
core: Add support for calls in get_function_nonlocals (#29255)
Thank you for contributing to LangChain!

- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-16 14:43:42 -08:00
Luis Lopez
75663f2cae
community: Add cost per 1K tokens for fine-tuned model cached input (#29248)
### Description

- Since there is no cost per 1k input tokens for a fine-tuned cached
version of `gpt-4o-mini-2024-07-18` is not available when using the
`OpenAICallbackHandler`, it raises an error when trying to make calls
with such model.
- To add the price in the `MODEL_COST_PER_1K_TOKENS` dictionary

cc. @efriis
2025-01-16 15:19:26 -05:00
Junon
667d2a57fd
add mode arg to OBSFileLoader.load() method (#29246)
- **Description:** add mode arg to OBSFileLoader.load() method
  - **Issue:** #29245
  - **Dependencies:** no dependencies required for this change

---------

Co-authored-by: Junon_Gz <junon_gz@qq.com>
2025-01-16 11:09:04 -05:00
Erick Friis
5eb4dc5e06
standard-tests: double messages test (#29237) 2025-01-15 15:14:29 -08:00
Nithish Raghunandanan
1051fa5729
couchbase: Migrate couchbase partner package to different repo (#29239)
**Description:** Migrate the couchbase partner package to
[Couchbase-Ecosystem](https://github.com/Couchbase-Ecosystem/langchain-couchbase)
org
2025-01-15 12:37:27 -08:00
Nadeem Sajjad
eaf2fb287f
community(pypdfloader): added page_label in metadata for pypdf loader (#29225)
# Description

## Summary
This PR adds support for handling multi-labeled page numbers in the
**PyPDFLoader**. Some PDFs use complex page numbering systems where the
actual content may begin after multiple introductory pages. The
page_label field helps accurately reflect the document’s page structure,
making it easier to handle such cases during document parsing.

## Motivation
This feature improves document parsing accuracy by allowing users to
access the actual page labels instead of relying only on the physical
page numbers. This is particularly useful for documents where the first
few pages have roman numerals or other non-standard page labels.

## Use Case
This feature is especially useful for **Retrieval-Augmented Generation**
(RAG) systems where users may reference page numbers when asking
questions. Some PDFs have both labeled page numbers (like roman numerals
for introductory sections) and index-based page numbers.

For example, a user might ask:

	"What is mentioned on page 5?"

The system can now check both:
	•	**Index-based page number** (page)
	•	**Labeled page number** (page_label)

This dual-check helps improve retrieval accuracy. Additionally, the
results can be validated with an **agent or tool** to ensure the
retrieved pages match the user’s query contextually.

## Code Changes

- Added a page_label field to the metadata of the Document class in
**PyPDFLoader**.
- Implemented support for retrieving page_label from the
pdf_reader.page_labels.
- Created a test case (test_pypdf_loader_with_multi_label_page_numbers)
with a sample PDF containing multi-labeled pages
(geotopo-komprimiert.pdf) [[Source of
pdf](https://github.com/py-pdf/sample-files/blob/main/009-pdflatex-geotopo/GeoTopo-komprimiert.pdf)].
- Updated existing tests to ensure compatibility and verify page_label
extraction.

## Tests Added

- Added a new test case for a PDF with multi-labeled pages.
- Verified both page and page_label metadata fields are correctly
extracted.

## Screenshots

<img width="549" alt="image"
src="https://github.com/user-attachments/assets/65db9f5c-032e-4592-926f-824777c28f33"
/>
2025-01-15 14:18:07 -05:00
Mehdi
1a38948ee3
Mehdi zare/fmp data doc (#29219)
Title: community: add Financial Modeling Prep (FMP) API integration

Description: Adding LangChain integration for Financial Modeling Prep
(FMP) API to enable semantic search and structured tool creation for
financial data endpoints. This integration provides semantic endpoint
search using vector stores and automatic tool creation with proper
typing and error handling. Users can discover relevant financial
endpoints using natural language queries and get properly typed
LangChain tools for discovered endpoints.

Issue: N/A

Dependencies:

fmp-data>=0.3.1
langchain-core>=0.1.0
faiss-cpu
tiktoken
Twitter handle: @mehdizarem

Unit tests and example notebook have been added:

Tests are in tests/integration_tests/est_tools.py and
tests/unit_tests/test_tools.py
Example notebook is in docs/tools.ipynb
All format, lint and test checks pass:

pytest
mypy .
Dependencies are imported within functions and not added to
pyproject.toml. The changes are backwards compatible and only affect the
community package.

---------

Co-authored-by: mehdizare <mehdizare@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-15 15:31:01 +00:00
Mohammad Mohtashim
288613d361
(text-splitters): Small Fix in _process_html for HTMLSemanticPreservingSplitter to properly extract the metadata. (#29215)
- **Description:** Include `main` in the list of elements whose child
elements needs to be processed for splitting the HTML.
- **Issue:** #29184
2025-01-15 10:18:06 -05:00
TheSongg
4867fe7ac8
[langchain_community.llms.xinference]: fix error in xinference.py (#29216)
- [ ] **PR title**: [langchain_community.llms.xinference]: fix error in
xinference.py

- [ ] **PR message**:
- The old code raised an ValidationError:
pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference when import Xinference from xinference.py. This issue has
been resolved by adjusting it's type and default value.

File "/media/vdc/python/lib/python3.10/site-packages/pydantic/main.py",
line 212, in __init__
validated_self = self.__pydantic_validator__.validate_python(data,
self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference
        client
Field required [type=missing, input_value={'server_url':
'http://10...t4', 'model_kwargs': {}}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.9/v/missing

- [ ] **tests**:

       from langchain_community.llms import Xinference
       llm = Xinference(
server_url="http://0.0.0.0:9997", # replace your xinference server url
model_uid={model_uid} # replace model_uid with the model UID return from
launching the model
         )
2025-01-15 10:11:26 -05:00
Syed Baqar Abbas
4278046329
[fix] Convert table names to list for compatibility in SQLDatabase (#29229)
- [langchain_community.utilities.SQLDatabase] **[fix] Convert table
names to list for compatibility in SQLDatabase**:
  - The issue #29227 is being fixed here
  - The "package" modified is community
  - The issue lied in this block of code:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L72-L77)

- [langchain_community.utilities.SQLDatabase] **[fix] Convert table
names to list for compatibility in SQLDatabase**:
- **Description:** When the SQLDatabase is initialized, it runs a code
`self._inspector.get_table_names(schema=schema)` which expects an output
of list. However, with some connectors (such as snowflake) the data type
returned could be another iterable. This results in a type error when
concatenating the table_names to view_names. I have added explicit type
casting to prevent this.
    - **Issue:** The issue #29227 is being fixed here
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001

## Additional Information
When the following method is called for a Snowflake database:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L75)

Snowflake under the hood calls:
```python
from snowflake.sqlalchemy.snowdialect import SnowflakeDialect
SnowflakeDialect.get_table_names
```

This method returns a `dict_keys()` object which is incompatible to
concatenate with a list and results in a `TypeError`

### Relevant Library Versions
- **snowflake-sqlalchemy**: 1.7.2  
- **snowflake-connector-python**: 3.12.4  
- **sqlalchemy**: 2.0.20  
- **langchain_community**: 0.3.14
2025-01-15 10:00:03 -05:00
Jin Hyung Ahn
05554265b4
community: Fix ConfluenceLoader load() failure caused by deleted pages (#29232)
## Description
This PR modifies the is_public_page function in ConfluenceLoader to
prevent exceptions caused by deleted pages during the execution of
ConfluenceLoader.process_pages().


**Example scenario:**
Consider the following usage of ConfluenceLoader:
```python
import os
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
        url=os.getenv("BASE_URL"),
        token=os.getenv("TOKEN"),
        max_pages=1000,
        cql=f'type=page and lastmodified >= "2020-01-01 00:00"',
        include_restricted_content=False,
)

# Raised Exception : HTTPError: Outdated version/old_draft/trashed? Cannot find content Please provide valid ContentId.
documents = loader.load()
```

If a deleted page exists within the query result, the is_public_page
function would previously raise an exception when calling
get_all_restrictions_for_content, causing the loader.load() process to
fail for all pages.



By adding a pre-check for the page's "current" status, unnecessary API
calls to get_all_restrictions_for_content for non-current pages are
avoided.


This fix ensures that such pages are skipped without affecting the rest
of the loading process.





## Issue
N/A (No specific issue number)

## Dependencies
No new dependencies are introduced with this change.

## Twitter handle
[@zenoengine](https://x.com/zenoengine)
2025-01-15 09:56:23 -05:00
Mohammad Mohtashim
21eb39dff0
[Community]: AzureOpenAIWhisperParser Authenication Fix (#29135)
- **Description:** `AzureOpenAIWhisperParser` authentication fix as
stated in the issue.
- **Issue:** #29133
2025-01-15 09:44:53 -05:00
Erick Friis
b05543c69b
packages: disable mongodb for api docs (#29218) 2025-01-15 02:23:01 +00:00
Erick Friis
30badd7a32
packages: update mongodb folder (#29217) 2025-01-15 02:01:06 +00:00
pm390
76172511fd
community: Additional parameters for OpenAIAssistantV2Runnable (#29207)
**Description:** Added Additional parameters that could be useful for
usage of OpenAIAssistantV2Runnable.

This change is thought to allow langchain users to set parameters that
cannot be set using assistants UI
(max_completion_tokens,max_prompt_tokens,parallel_tool_calls) and
parameters that could be useful for experimenting like top_p and
temperature.

This PR originated from the need of using parallel_tool_calls in
langchain, this parameter is very important in openAI assistants because
without this parameter set to False strict mode is not respected by
OpenAI Assistants
(https://platform.openai.com/docs/guides/function-calling#parallel-function-calling).

> Note: Currently, if the model calls multiple functions in one turn
then strict mode will be disabled for those calls.

**Issue:** None
**Dependencies:** openai
2025-01-14 15:53:37 -05:00
Bagatur
4ab04ad6be
docs: oai api ref nit (#29210) 2025-01-14 17:55:16 +00:00
Michael Chin
d9b856abad
community: Deprecate Amazon Neptune resources in langchain-community (#29191)
Related: https://github.com/langchain-ai/langchain-aws/pull/322

The legacy `NeptuneOpenCypherQAChain` and `NeptuneSparqlQAChain` classes
are being replaced by the new LCEL format chains
`create_neptune_opencypher_qa_chain` and
`create_neptune_sparql_qa_chain`, respectively, in the `langchain_aws`
package.

This PR adds deprecation warnings to all Neptune classes and functions
that have been migrated to `langchain_aws`. All relevant documentation
has also been updated to replace `langchain_community` usage with the
new `langchain_aws` implementations.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-14 10:23:34 -05:00
Erick Friis
c55af44711
anthropic: pydantic mypy plugin (#29144) 2025-01-13 15:32:40 -08:00
ccurme
1bf6576709
cli[patch]: fix anchor links in templates (#29178)
These are outdated and can break docs builds.
2025-01-13 18:28:18 +00:00
Christopher Varjas
e156b372fb
langchain: support api key argument with OpenAI moderation chain (#29140)
**Description:** Makes it possible to instantiate
`OpenAIModerationChain` with an `openai_api_key` argument only and no
`OPENAI_API_KEY` environment variable defined.

**Issue:** https://github.com/langchain-ai/langchain/issues/25176

**Dependencies:** `openai`

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-13 11:00:02 -05:00
Nikhil Shahi
335ca3a606
docs: add HyperbrowserLoader docs (#29143)
### Description
This PR adds docs for the
[langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/)
package. It includes a document loader that uses Hyperbrowser to scrape
or crawl any urls and return formatted markdown or html content as well
as relevant metadata.
[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and
scaling headless browsers. It lets you launch and manage browser
sessions at scale and provides easy to use solutions for any webscraping
needs, such as scraping a single page or crawling an entire site.

### Issue
None

### Dependencies
None

### Twitter Handle
`@hyperbrowser`
2025-01-13 10:45:39 -05:00
Tymon Żarski
689592f9bb
community: Fix rank-llm import paths for new 0.20.3 version (#29154)
# **PR title**: "community: Fix rank-llm import paths for new 0.20.3
version"
- The "community" package is being modified to handle updated import
paths for the new `rank-llm` version.

---

## Description
This PR updates the import paths for the `rank-llm` package to account
for changes introduced in version `0.20.3`. The changes ensure
compatibility with both pre- and post-revamp versions of `rank-llm`,
specifically version `0.12.8`. Conditional imports are introduced based
on the detected version of `rank-llm` to handle different path
structures for `VicunaReranker`, `ZephyrReranker`, and `SafeOpenai`.

## Issue
RankLLMRerank usage throws an error when used GPT (not only) when
rank-llm version is > 0.12.8 - #29156

## Dependencies
This change relies on the `packaging` and `pkg_resources` libraries to
handle version checks.

## Twitter handle
@tymzar
2025-01-13 10:22:14 -05:00
Andrew
0e3115330d
Add additional_instructions on openai assistan runs create. (#29164)
- **Description**: In the functions `_create_run` and `_acreate_run`,
the parameters passed to the creation of
`openai.resources.beta.threads.runs` were limited.

  Source: 
  ```
  def _create_run(self, input: dict) -> Any:
        params = {
            k: v
            for k, v in input.items()
            if k in ("instructions", "model", "tools", "run_metadata")
        }
        return self.client.beta.threads.runs.create(
            input["thread_id"],
            assistant_id=self.assistant_id,
            **params,
        )
  ```
- OpenAI Documentation
([createRun](https://platform.openai.com/docs/api-reference/runs/createRun))

- Full list of parameters `openai.resources.beta.threads.runs` ([source
code](https://github.com/openai/openai-python/blob/main/src/openai/resources/beta/threads/runs/runs.py#L91))

 
- **Issue:** Fix #17574 



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-13 10:11:47 -05:00
ccurme
e4ceafa1c8
langchain[patch]: update extended tests for compatibility with langchain-openai==0.3 (#29174) 2025-01-13 15:04:22 +00:00
Priyansh Agrawal
c115c09b6d
community: add missing format specifier in error log in CubeSemanticLoader (#29172)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**
- **Description:** Add a missing format specifier in an an error log in
`langchain_community.document_loaders.CubeSemanticLoader`
- **Issue:** raises `TypeError: not all arguments converted during
string formatting`


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-13 09:32:57 -05:00
ThomasSaulou
349b5c91c2
fix chatperplexity: remove 'stream' from params in _stream method (#29173)
quick fix chatperplexity: remove 'stream' from params in _stream method
2025-01-13 09:31:37 -05:00
LIU Yuwei
f980144e9c
community: add init for unstructured file loader (#29101)
## Description
Add `__init__` for unstructured loader of
epub/image/markdown/pdf/ppt/word to restrict the input type to `str` or
`Path`.
In the
[signature](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html)
these unstructured loaders receive `file_path: str | List[str] | Path |
List[Path]`, but actually they only receive `str` or `Path`.

## Issue
None

## Dependencies
No changes.
2025-01-13 09:26:00 -05:00
Erick Friis
bbc3e3b2cf
openai: disable streaming for o1 by default (#29147)
Currently 400s
https://community.openai.com/t/streaming-support-for-o1-o1-2024-12-17-resulting-in-400-unsupported-value/1085043

o1-mini and o1-preview stream fine
2025-01-11 02:24:11 +00:00
Isaac Francisco
62074bac60
replace all LANGCHAIN_ flags with LANGSMITH_ flags (#29120) 2025-01-11 01:24:40 +00:00
Bagatur
5c2fbb5b86
docs: Update openai README.md (#29146) 2025-01-10 17:24:16 -08:00
Erick Friis
0a54aedb85
anthropic: pdf integration test (#29142) 2025-01-10 21:56:31 +00:00
ccurme
8de8519daf
tests[patch]: release 0.3.8 (#29141) 2025-01-10 21:53:41 +00:00
Jiang
7d3fb21807
Add lindorm as new integration (#29123)
Misoperation caused the pr close: [origin pr
link](https://github.com/langchain-ai/langchain/pull/29085)

---------

Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
2025-01-10 16:30:37 -05:00
ccurme
4819b500e8
pinecone[patch]: release 0.2.2 (#29139) 2025-01-10 14:59:57 -05:00
Ashvin
46fd09ffeb
partner: Update aiohttp in langchain pinecone. (#28863)
- **partner**: "Update Aiohttp for resolving vulnerability issue"
    
- **Description:** I have updated the upper limit of aiohttp from `3.10`
to `3.10.5` in the pyproject.toml file of langchain-pinecone. Hopefully
this will resolve #28771 . Please review this as I'm quite unsure.

---------

Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-10 14:54:52 -05:00
ccurme
f3d370753f
xai[minor]: release 0.2 (#29132)
Update `langchain-openai` to 0.3. See [release
notes](https://github.com/langchain-ai/langchain/releases/tag/langchain-openai%3D%3D0.3.0)
for details. Should only impact default values of `temperature`, `n`,
and `max_retries`.
2025-01-10 11:47:27 -05:00
ccurme
6e63ccba84
openai[minor]: release 0.3 (#29100)
## Goal

Solve the following problems with `langchain-openai`:

- Structured output with `o1` [breaks out of the
box](https://langchain.slack.com/archives/C050X0VTN56/p1735232400232099).
- `with_structured_output` by default does not use OpenAI’s [structured
output
feature](https://platform.openai.com/docs/guides/structured-outputs).
- We override API defaults for temperature and other parameters.

## Breaking changes:

- Default method for structured output is changing to OpenAI’s dedicated
[structured output
feature](https://platform.openai.com/docs/guides/structured-outputs).
For schemas specified via TypedDict or JSON schema, strict schema
validation is disabled by default but can be enabled by specifying
`strict=True`.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Models that don’t support `method="json_schema"` (e.g., `gpt-4` and
`gpt-3.5-turbo`, currently the default model for ChatOpenAI) will raise
an error unless `method` is explicitly specified.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Schemas specified via Pydantic `BaseModel` that have fields with
non-null defaults or metadata (like min/max constraints) will raise an
error.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- `strict` now defaults to False for `method="json_schema"` when schemas
are specified via TypedDict or JSON schema.
- To recover previous behavior, use `with_structured_output(schema,
strict=True)`
- Schemas specified via Pydantic V1 will raise a warning (and use
`method="function_calling"`) unless `method` is explicitly specified.
- To remove the warning, pass `method="function_calling"` into
`with_structured_output`.
- Streaming with default structured output method / Pydantic schema no
longer generates intermediate streamed chunks.
- To recover previous behavior, pass `method="function_calling"` into
`with_structured_output`.
- We no longer override default temperature (was 0.7 in LangChain, now
will follow OpenAI, currently 1.0).
- To recover previous behavior, initialize `ChatOpenAI` or
`AzureChatOpenAI` with `temperature=0.7`.
- Note: conceptually there is a difference between forcing a tool call
and forcing a response format. Tool calls may have more concise
arguments vs. generating content adhering to a schema. Prompts may need
to be adjusted to recover desired behavior.

---------

Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-01-10 10:50:32 -05:00
ccurme
815bfa1913
openai[patch]: support streaming with json_schema response format (#29044)
- Stream JSON string content. Final chunk includes parsed representation
(following OpenAI
[docs](https://platform.openai.com/docs/guides/structured-outputs#streaming)).
- Mildly (?) breaking change: if you were using streaming with
`response_format` before, usage metadata will disappear unless you set
`stream_usage=True`.

## Response format

Before:

![Screenshot 2025-01-06 at 11 59
01 AM](https://github.com/user-attachments/assets/e54753f7-47d5-421d-b8f3-172f32b3364d)


After:

![Screenshot 2025-01-06 at 11 58
13 AM](https://github.com/user-attachments/assets/34882c6c-2284-45b4-92f7-5b5b69896903)


## with_structured_output

For pydantic output, behavior of `with_structured_output` is unchanged
(except for warning disappearing), because we pluck the parsed
representation straight from OpenAI, and OpenAI doesn't return it until
the stream is completed. Open to alternatives (e.g., parsing from
content or intermediate dict chunks generated by OpenAI).

Before:

![Screenshot 2025-01-06 at 12 38
11 PM](https://github.com/user-attachments/assets/913d320d-f49e-4cbb-a800-b394ae817fd1)

After:

![Screenshot 2025-01-06 at 12 38
58 PM](https://github.com/user-attachments/assets/f7a45dd6-d886-48a6-8d76-d0e21ca767c6)
2025-01-09 10:32:30 -05:00
Panos Vagenas
858f655a25
docs: add Docling loader docs (#29104)
### Description
This adds the docs for the Docling document loader.
[Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX,
HTML, and other formats into a rich unified representation including
document layout, tables etc., making them ready for generative AI
workflows like RAG.

Some references:
- https://research.ibm.com/blog/docling-generative-AI
-
https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
- [Docling Technical Report](https://arxiv.org/abs/2408.09869)

The introduced `DoclingLoader` enables users to:
- use various document types in their LLM applications with ease and
speed, and
- leverage Docling's rich representation for advanced, document-native
grounding.

### Issue
Replacing PR #27987 as discussed with @efriis
[here](https://github.com/langchain-ai/langchain/pull/27987#issuecomment-2489354930).

### Dependencies
None

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-09 10:15:35 -05:00
Joshua Campbell
00dcc44739
Langchain_community: Fix issue with missing backticks in arango client (#29110)
- **Description:** Adds backticks to generate_schema function in the
arango graph client
- **Issue:** We experienced an issue with the generate schema function
when talking to our arango database where these backticks were missing
    - **Dependencies:** none
    - **Twitter handle:** @anangelofgrace
2025-01-09 10:00:10 -05:00
LIU Yuwei
2b09f798e1
community: add init for UnstructuredHTMLLoader to solve pathlib paths (#29091)
## Description
Add `__init__` for `UnstructuredHTMLLoader` to restrict the input type
to `str` or `Path`, and transfer the `self.file_path` to `str` just like
`UnstructuredXMLLoader` does.

## Issue
Fix #29090 

## Dependencies
No changes.
2025-01-08 10:19:27 -05:00
Jin Hyung Ahn
c8ca1cd42f
community: fix "confluence-loader" enable include_labels for documents loaded via CQL (#29089)
## Description
This PR enables label inclusion for documents loaded via CQL in the
confluence-loader.

- Updated _lazy_load to pass the include_labels parameter instead of
False in process_pages calls for documents loaded via CQL.
- Ensured that labels can now be fetched and added to the metadata for
documents queried with cql.

## Related Modification History
This PR builds on the previous functionality introduced in
[#28259](https://github.com/langchain-ai/langchain/pull/28259), which
added support for including labels with the include_labels option.
However, this functionality did not work as expected for CQL queries,
and this PR fixes that issue.

If the False handling was intentional due to another issue, please let
me know. I have verified with our Confluence instance that this change
allows labels to be correctly fetched for documents loaded via CQL.

## Issue
Fixes #29088


## Dependencies
No changes.

## Twitter Handle
[@zenoengine](https://x.com/zenoengine)
2025-01-08 10:16:39 -05:00
Inah Jeon
9d290abccd
partner: Update Upstage Model Names and Remove Deprecated Model (#29093)
This PR updates model names in the upstage library to reflect the latest
naming conventions and removes deprecated models.

Changes:

Renamed Models:
- `solar-1-mini-chat` -> `solar-mini`
- `solar-1-mini-embedding-query` -> `embedding-query`

Removed Deprecated Models:
- `layout-analysis` (replaced to `document-parse`)

Reference:
- https://console.upstage.ai/docs/getting-started/overview
-
https://github.com/langchain-ai/langchain-upstage/releases/tag/libs%2Fupstage%2Fv0.5.0

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-08 10:13:22 -05:00
Prashanth Rao
b1dafaef9b
Kùzu package integration docs (#29076)
## Langchain Kùzu

### Description
 
This PR adds docs for the `langchain-kuzu` package [on
PyPI](https://pypi.org/project/langchain-kuzu/) that was recently
published, allowing Kùzu users to more easily use and work with
LangChain QA chains. The package will also make it easier for the Kùzu
team to continue supporting and updating the integration over future
releases.

### Twitter Handle

Please tag [@kuzudb](https://x.com/kuzudb) on Twitter once this PR is
merged, so LangChain users can be notified!

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2025-01-08 01:14:00 +00:00
Erick Friis
cc0f81f40f
partners/groq: release 0.2.3 (#29081) 2025-01-07 23:36:51 +00:00
Erick Friis
fcc9cdd100
multiple: disable socket for unit tests (#29080) 2025-01-07 15:31:50 -08:00
Erick Friis
539ebd5431
groq: user agent (#29079) 2025-01-07 23:21:57 +00:00
Erick Friis
c5bee0a544
pinecone: bump core version (#29077) 2025-01-07 20:23:33 +00:00
Cory Waddingham
ce9e9f9314
pinecone: Review pinecone tests (#29073)
Title: langchain-pinecone: improve test structure and async handling

Description: This PR improves the test infrastructure for the
langchain-pinecone package by:
1. Implementing LangChain's standard test patterns for embeddings
2. Adding comprehensive configuration testing
3. Improving async test coverage
4. Fixing integration test issues with namespaces and async markers

The changes make the tests more robust, maintainable, and aligned with
LangChain's testing standards while ensuring proper async behavior in
the embeddings implementation.

Key improvements:
- Added standard EmbeddingsTests implementation
- Split custom configuration tests into a separate test class
- Added proper async test coverage with pytest-asyncio
- Fixed namespace handling in vector store integration tests
- Improved test organization and documentation

Dependencies: None (uses existing test dependencies)

Tests and Documentation:
-  Added standard test implementation following LangChain's patterns
-  Added comprehensive unit tests for configuration and async behavior
-  All tests passing locally
- No documentation changes needed (internal test improvements only)

Twitter handle: N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-07 11:46:30 -08:00
Philippe PRADOS
2921597c71
community[patch]: Refactoring PDF loaders: 01 prepare (#29062)
- **Refactoring PDF loaders step 1**: "community: Refactoring PDF
loaders to standardize approaches"

- **Description:** Declare CloudBlobLoader in __init__.py. file_path is
Union[str, PurePath] anywhere
- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses to prepare the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

@eyurtsev it's the start of a PR series.
2025-01-07 11:00:04 -05:00
ccurme
55677e31f7
text-splitters[patch]: release 0.3.5 (#29054)
Resolves https://github.com/langchain-ai/langchain/issues/29053
2025-01-07 09:48:26 -05:00
Erick Friis
187131c55c
Revert "integrations[patch]: remove non-required chat param defaults" (#29048)
Reverts langchain-ai/langchain#26730

discuss best way to release default changes (esp openai temperature)
2025-01-06 14:45:34 -08:00
Bagatur
3d7ae8b5d2
integrations[patch]: remove non-required chat param defaults (#26730)
anthropic:
  - max_retries

openai:
  - n
  - temperature
  - max_retries

fireworks
  - temperature

groq
  - n
  - max_retries
  - temperature

mistral
  - max_retries
  - timeout
  - max_concurrent_requests
  - temperature
  - top_p
  - safe_mode

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 22:26:22 +00:00
UV
b9db8e9921
DOC: Improve human input prompt in FewShotChatMessagePromptTemplate example (#29023)
Fixes #29010 

This PR updates the example for FewShotChatMessagePromptTemplate by
modifying the human input prompt to include a more descriptive and
user-friendly question format ('What is {input}?') instead of just
'{input}'. This change enhances clarity and usability in the
documentation example.

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 12:29:15 -08:00
ccurme
1f78d4faf4
voyageai[patch]: release 0.1.4 (#29046) 2025-01-06 20:20:19 +00:00
Eugene Evstafiev
6a152ce245
docs: add langchain-pull-md Markdown loader (#29024)
- [x] **PR title**: "docs: add langchain-pull-md Markdown loader"

- [x] **PR message**: 
- **Description:** This PR introduces the `langchain-pull-md` package to
the LangChain community. It includes a new document loader that utilizes
the pull.md service to convert URLs into Markdown format, particularly
useful for handling web pages rendered with JavaScript frameworks like
React, Angular, or Vue.js. This loader helps in efficient and reliable
Markdown conversion directly from URLs without local rendering, reducing
server load.
    - **Issue:** NA
    - **Dependencies:** requests >=2.25.1
    - **Twitter handle:** https://x.com/eugeneevstafev?s=21

- [x] **Add tests and docs**: 
1. Added unit tests to verify URL checking and conversion
functionalities.
2. Created a comprehensive example notebook detailing the usage of the
new loader.

- [x] **Lint and test**: 
- Completed local testing using `make format`, `make lint`, and `make
test` commands as per the LangChain contribution guidelines.


**Related Links:**
- [Package Repository](https://github.com/chigwell/langchain-pull-md)
- [PyPI Package](https://pypi.org/project/langchain-pull-md/)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 19:32:43 +00:00
Ashvin
20a715a103
community: Fix redundancy in code. (#29022)
In my previous PR (#28953), I added an unwanted condition for validating
the Azure ML Endpoint. In this PR, I have rectified the issue.
2025-01-06 12:58:16 -05:00
Adrián Panella
acddfc772e
core: allow artifact in create_retriever_tool (#28903)
Add option to return content and artifacts, to also be able to access
the full info of the retrieved documents.

They are returned as a list of dicts in the `artifacts` property if
parameter `response_format` is set to `"content_and_artifact"`.

Defaults to `"content"` to keep current behavior.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-03 22:10:31 +00:00
ccurme
3e618b16cd
community[patch]: release 0.3.14 (#29019) 2025-01-03 15:34:24 -05:00
ccurme
18eb9c249d
langchain[patch]: release 0.3.14 (#29018) 2025-01-03 15:15:44 -05:00
ccurme
8e50e4288c
core[patch]: release 0.3.29 (#29017) 2025-01-03 14:58:39 -05:00
ccurme
85403bfa99
core[patch]: substantially speed up @deprecated (#29016)
Resolves https://github.com/langchain-ai/langchain/issues/26918

Unit tests don't raise any additional `LangChainDeprecationWarning`.
Would like guidance on how to test this more thoroughly if needed.

Note: speed up for `bind_tools` path is shown below. This is
**redundant** with the speedup in
https://github.com/langchain-ai/langchain/pull/29015. I include it for
demonstration purposes.

Before:

![Screenshot 2025-01-03 at 12 54
50 PM](https://github.com/user-attachments/assets/87f289eb-4cad-4304-85f7-5c58c59080f1)

After:

![Screenshot 2025-01-03 at 12 55
35 PM](https://github.com/user-attachments/assets/95ad0506-e1d1-4c5c-bb27-6a634d8810c9)
2025-01-03 14:38:53 -05:00
ccurme
4bb391fd4e
core[patch]: remove deprecated functions from tool binding hotpath (#29015)
(Inspired by https://github.com/langchain-ai/langchain/issues/26918)

We rely on some deprecated public functions in the hot path for tool
binding (`convert_pydantic_to_openai_function`,
`convert_python_function_to_openai_function`, and
`format_tool_to_openai_function`). My understanding is that what is
deprecated is not the functionality they implement, but use of them in
the public API -- we expect to continue to rely on them.

Here we update these functions to be private and not deprecated. We keep
the public, deprecated functions as simple wrappers that can be safely
deleted.

The `@deprecated` wrapper adds considerable latency due to its use of
the `inspect` module. This update speeds up `bind_tools` by a factor of
~100x:

Before:

![Screenshot 2025-01-03 at 11 22
55 AM](https://github.com/user-attachments/assets/94b1c433-ce12-406f-b64c-ca7103badfe0)

After:

![Screenshot 2025-01-03 at 11 23
41 AM](https://github.com/user-attachments/assets/02d0deab-82e4-45ca-8cc7-a20b91a5b5db)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-03 19:29:01 +00:00
Eugene Evstafiev
a86904e735
docs: fix typo (#29012)
Thank you for contributing to LangChain!

- [x] **PR title**: "docs: fix typo"

- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a minor fix of typo
    - **Issue:** NA
    - **Dependencies:** NA
    - **Twitter handle:** NA


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. ~~a test for the integration, preferably unit tests that do not rely
on network access,~~
2. ~~an example notebook showing its use. It lives in
`docs/docs/integrations` directory.~~


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2025-01-03 09:52:24 -08:00
Erick Friis
919d1c7da6
box: remove box readme for api docs build (#29014) 2025-01-03 09:50:04 -08:00
Erick Friis
d8bc556c94
packages: update box location (#29013) 2025-01-03 09:45:13 -08:00
Amaan
8d7daa59fb
docs: add langchain dappier retriever integration notebooks (#28931)
Add a retriever to interact with Dappier APIs with an example notebook.

The retriever can be invoked with:

```python
from langchain_dappier import DappierRetriever

retriever = DappierRetriever(
    data_model_id="dm_01jagy9nqaeer9hxx8z1sk1jx6",
    k=5
)

retriever.invoke("latest tech news")
```

To retrieve 5 documents related to latest news in the tech sector. The
included notebook also includes deeper details about controlling filters
such as selecting a data model, number of documents to return, site
domain reference, minimum articles from the reference domain, and search
algorithm, as well as including the retriever in a chain.

The integration package can be found over here -
https://github.com/DappierAI/langchain-dappier
2025-01-03 10:21:41 -05:00
ccurme
0185010b88
community[patch]: additional check for prompt caching support (#29008)
Prompt caching explicitly excludes `gpt-4o-2024-05-13`:
https://platform.openai.com/docs/guides/prompt-caching

Resolves https://github.com/langchain-ai/langchain/issues/28997
2025-01-03 10:14:07 -05:00
Tari Yekorogha
ba9dfd9252
docs: Add FalkorDB Chat Message History and Update Package Registry (#28914)
This commit updates the documentation and package registry for the
FalkorDB Chat Message History integration.

**Changes:**

- Added a comprehensive example notebook
falkordb_chat_message_history.ipynb demonstrating how to use FalkorDB
for session-based chat message storage.

- Added a provider notebook for FalkorDB

- Updated libs/packages.yml to register FalkorDB as an integration
package, following LangChain's new guidelines for community
integrations.

**Notes:**

- This update aligns with LangChain's process for registering new
integrations via documentation updates and package registry
modifications.

- No functional or core package changes were made in this commit.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-02 15:46:47 -05:00
Ashvin
d26c102a5a
community: Update azureml endpoint (#28953)
- In this PR, I have updated the AzureML Endpoint with the latest
endpoint.
- **Description:** I have changed the existing `/chat/completions` to
`/models/chat/completions` in
libs/community/langchain_community/llms/azureml_endpoint.py
    - **Issue:** #25702

---------

Co-authored-by: = <=>
2025-01-02 14:47:02 -05:00
ccurme
7c28321f04
core[patch]: fix deprecation admonition in API ref (#28992)
Before:

![Screenshot 2025-01-02 at 1 49
30 PM](https://github.com/user-attachments/assets/cb30526a-fc0b-439f-96d1-962c226d9dc7)

After:

![Screenshot 2025-01-02 at 1 49
38 PM](https://github.com/user-attachments/assets/32c747ea-6391-4dec-b778-df457695d197)
2025-01-02 14:37:55 -05:00
Mohammad Mohtashim
0e74757b0a
(Community): DuckDuckGoSearchAPIWrapper backend changed from api to auto (#28961)
- **Description:** `DuckDuckGoSearchAPIWrapper` default value for
backend has been changed to avoid User Warning
- **Issue:** #28957
2025-01-02 14:08:22 -05:00
Mohammad Mohtashim
aa551cbcee
(Core) Small Change in Docstring for method partial for BasePromptTemplate (#28969)
- **Description:** Very small change in Docstring for
`BasePromptTemplate`
- **Issue:** #28966
2025-01-02 12:16:30 -05:00
minpeter
a873e0fbfb
community: update documentation and model IDs for FriendliAI provider (#28984)
### Description  

- In the example, remove `llama-2-13b-chat`,
`mixtral-8x7b-instruct-v0-1`.
- Fix llm friendli streaming implementation.
- Update examples in documentation and remove duplicates.

### Issue  
N/A  

### Dependencies  
None  

### Twitter handle  
`@friendliai`
2025-01-02 12:15:59 -05:00
Hrishikesh Kalola
437ec53e29
langchain.agents: corrected documentation (#28986)
**Description:**
This PR updates the codebase to reflect the deprecation of the AgentType
feature. It includes the following changes:

Documentation Update:

Added a deprecation notice to the AgentType class comment.
Provided a reference to the official LangChain migration guide for
transitioning to LangGraph agents.
Reference Link: https://python.langchain.com/docs/how_to/migrate_agent/

**Twitter handle:** @hrrrriiiishhhhh

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-02 12:13:42 -05:00
Mohammad Mohtashim
49a26c1fca
(Community): Fix Keyword argument for AzureAIDocumentIntelligenceParser (#28959)
- **Description:** Fix the `body` keyword argument for
AzureAIDocumentIntelligenceParser`
- **Issue:** #28948
2025-01-02 11:27:12 -05:00
ccurme
efc687a13b
community[patch]: fix instantiation for Slack tools (#28990)
Believe the current implementation raises PydanticUserError following
[this](https://github.com/pydantic/pydantic/releases/tag/v2.10.1)
Pydantic release.

Resolves https://github.com/langchain-ai/langchain/issues/28989
2025-01-02 16:14:17 +00:00
Yunlin Mao
c59093d67f
docs: add modelscope endpoint (#28941)
## Description

To integrate ModelScope inference API endpoints for both Embeddings,
LLMs and ChatModels, install the package
`langchain-modelscope-integration` (as discussed in issue #28928 ). This
is necessary because the package name `langchain-modelscope` was already
registered by another party.

ModelScope is a premier platform designed to connect model checkpoints
with model applications. It provides the necessary infrastructure to
share open models and promote model-centric development. For more
information, visit GitHub page:
[ModelScope](https://github.com/modelscope).
2025-01-02 10:08:41 -05:00
Bagatur
1c797ac68f
infra: speed up unit tests (#28974)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-02 04:13:08 +00:00
Morgante Pell
79fc9b6b04
cli: bump gritql version (#28981)
**Description:**

bump gritql dependency, to use new binary names from
[here](https://github.com/getgrit/gritql/pull/565)

**Issue:**

fixes https://github.com/langchain-ai/langchain/issues/27822
2025-01-01 20:02:46 -08:00
Bagatur
edbe7d5f5e
core,anthropic[patch]: fix with_structured_output typing (#28950) 2024-12-28 15:46:51 -05:00
dabzr
ffbe5b2106
partners: fix default value for stop_sequences in ChatGroq (#28924)
- **Description:**  
This PR addresses an issue with the `stop_sequences` field in the
`ChatGroq` class. Currently, the field is defined as:
```python
stop: Optional[Union[List[str], str]] = Field(None, alias="stop_sequences")
```  
This causes the language server (LSP) to raise an error indicating that
the `stop_sequences` parameter must be implemented. The issue occurs
because `Field(None, alias="stop_sequences")` is different compared to
`Field(default=None, alias="stop_sequences")`.


![image](https://github.com/user-attachments/assets/bfc34cb1-c664-4c31-b856-8f18419c7350)
To resolve the issue, the field is updated to:  
```python
stop: Optional[Union[List[str], str]] = Field(default=None, alias="stop_sequences")
```  
While this issue does not affect runtime behavior, it ensures
compatibility with LSPs and improves the development experience.
- **Issue:** N/A  
- **Dependencies:** None
2024-12-26 16:43:34 -05:00
Andy Wermke
5940ed3952
community: Fix error handling bug in ChatDeepInfra (#28918)
In the async ClientResponse, `response.text` is not a string property,
but an asynchronous function returning a string.
2024-12-26 14:45:12 -05:00
zep.hyr
7b4d2d5d44
Community : Add cost information for missing OpenAI model (#28882)
In the previous commit, the cached model key for this model was omitted.
When using the "gpt-4o-2024-11-20" model, the token count in the
callback appeared as 0, and the cost was recorded as 0.

We add model and cost information so that the token count and cost can
be displayed for the respective model.

- The message before modification is as follows.
```
Tokens Used: 0
Prompt Tokens: 0
Prompt Tokens Cached: 0 
Completion Tokens: 0  
Reasoning Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```

- The message after modification is as follows.
```
Tokens Used: 3783 
Prompt Tokens: 3625
Prompt Tokens Cached: 2560
Completion Tokens: 158
Reasoning Tokens: 0
Successful Requests: 1
Total Cost (USD): $0.010642500000000001
```
2024-12-26 14:28:31 -05:00
Erick Friis
3726a944c0
docs: sorted by downloads [wip] (#28869) 2024-12-23 13:13:35 -08:00
Andreas Motl
6352edf77f
docs: CrateDB: Register package langchain-cratedb, and add minimal "provider" documentation (#28877)
Hi Erick. Coming back from a previous attempt, we now made a separate
package for the CrateDB adapter, called `langchain-cratedb`, as advised.
Other than registering the package within `libs/packages.yml`, this
patch includes a minimal amount of documentation to accompany the advent
of this new package. Let us know about any mistakes we made, or changes
you would like to see. Thanks, Andreas.

## About
- **Description:** Register a new database adapter package,
`langchain-cratedb`, providing traditional vector store, document
loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More »
CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status
- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):**
https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?
Is this applicable for this kind of patch?
> - [ ] **Add tests and docs**: If you're adding a new integration,
please include
> 1. a test for the integration, preferably unit tests that do not rely
on network access,
> 2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

## Q&A
1. Notebooks that use the LangChain CrateDB adapter are currently at
[CrateDB LangChain
Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain),
and the documentation refers to them. Because they are derived from very
old blueprints coming from LangChain 0.0.x times, we guess they need a
refresh before adding them to `docs/docs/integrations`. Is it applicable
to merge this minimal package registration + documentation patch, which
already includes valid code snippets in `cratedb.mdx`, and add
corresponding notebooks on behalf of a subsequent patch later?

2. How would it work getting into the tabular list of _Integration
Packages_ enumerated on the [documentation entrypoint page about
Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth,
@simonprickett, if you can find the time. Thanks!
2024-12-23 10:55:44 -05:00
Wang Ran (汪然)
e5c9da3eb6
core[patch]: remove redundant imports (#28861)
`Graph` has been imported at Line: 62
2024-12-23 10:31:23 -05:00
Adrián Panella
8d9907088b
community(azuresearch): allow to use any valid credential (#28873)
Add option to use any valid credential type.
Differentiates async cases needed by Azure Search.

This could replace the use of a static token
2024-12-23 10:05:48 -05:00
Mohammad Mohtashim
41b6a86bbe
Community: LlamaCppEmbeddings embed_documents and embed_query (#28827)
- **Description:** `embed_documents` and `embed_query` was throwing off
the error as stated in the issue. The issue was that `Llama` client is
returning the embeddings in a nested list which is not being accounted
for in the current implementation and therefore the stated error is
being raised.
- **Issue:** #28813

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-23 09:50:22 -05:00
Darien Schettler
32917a0b98
Update dataframe.py (#28871)
community: optimize DataFrame document loader

**Description:**
Simplify the `lazy_load` method in the DataFrame document loader by
combining text extraction and metadata cleanup into a single operation.
This makes the code more concise while maintaining the same
functionality.

**Issue:** N/A

**Dependencies:** None

**Twitter handle:** N/A
2024-12-22 19:16:16 -05:00
yeounhak
f38fc89f35
community: Corrected aload func to be asynchronous from webBaseLoader (#28337)
- **Description:** The aload function, contrary to its name, is not an
asynchronous function, so it cannot work concurrently with other
asynchronous functions.

- **Issue:** #28336 

- **Test: **: Done

- **Docs: **
[here](e0a95e5646/docs/docs/integrations/document_loaders/web_base.ipynb (L201))

- **Lint: ** All checks passed

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-20 14:42:52 -05:00
Mohammad Mohtashim
8cf5f20bb5
required tool_choice added for ChatHuggingFace (#28851)
- **Description:** HuggingFace Inference Client V3 now supports
`required` as tool_choice which has been added.
- **Issue:** #28842
2024-12-20 12:06:04 -05:00
Sylvain DEPARTE
fcba567a77
partners: allow to set Prefix in AIMessage (for MistralAI) (#28846)
**Description:**

Added ability to set `prefix` attribute to prevent error : 
```
httpx.HTTPStatusError: Error response 400 while fetching https://api.mistral.ai/v1/chat/completions: {"object":"error","message":"Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant","type":"invalid_request_error","param":null,"code":null}
```

Co-authored-by: Sylvain DEPARTE <sylvain.departe@wizbii.com>
2024-12-20 11:09:45 -05:00
Jacob Mansdorfer
6d81137325
community: adding langchain-predictionguard partner package documentation (#28832)
- *[x] **PR title**: "community: adding langchain-predictionguard
partner package documentation"

- *[x] **PR message**:
- **Description:** This PR adds documentation for the
langchain-predictionguard package to main langchain repo, along with
deprecating current Prediction Guard LLMs package. The LLMs package was
previously broken, so I also updated it one final time to allow it to
continue working from this point onward. . This enables users to chat
with LLMs through the Prediction Guard ecosystem.
    - **Package Links**: 
        -  [PyPI](https://pypi.org/project/langchain-predictionguard/)
- [Github
Repo](https://www.github.com/predictionguard/langchain-predictionguard)
    - **Issue:** None
    - **Dependencies:** None
- **Twitter handle:** [@predictionguard](https://x.com/predictionguard)

- *[x] **Add tests and docs**: All docs have been added for the partner
package, and the current LLMs package test was updated to reflect
changes.


- *[x] **Lint and test**: Linting tests are all passing.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-20 10:51:44 -05:00
ccurme
f0e858b4e3
core[patch]: release 0.3.28 (#28837) 2024-12-19 17:52:32 -05:00
ccurme
137d1e9564
langchain[patch]: fix test following update to langchain-openai (#28838) 2024-12-19 22:39:48 +00:00
Emmanuel Leroy
c8db5a19ce
langchain_community.chat_models.oci_generative_ai: Fix a bug when using optional parameters in tools (#28829)
When using tools with optional parameters, the parameter `type` is not
longer available since langchain update to 0.3 (because of the pydantic
upgrade?) and there is now an `anyOf` field instead. This results in the
`type` being `None` in the chat request for the tool parameter, and the
LLM call fails with the error:

```
oci.exceptions.ServiceError: {'target_service': 'generative_ai_inference', 
'status': 400, 'code': '400', 
'opc-request-id': '...', 
'message': 'Parameter definition must have a type.', 
'operation_name': 'chat'
...
}
```

Example code that fails:

```
from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI
from langchain_core.tools import tool
from typing import Optional

llm = ChatOCIGenAI(
        model_id="cohere.command-r-plus",
        service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
        compartment_id="ocid1.compartment.oc1...",
        auth_profile="your_profile",
        auth_type="API_KEY",
        model_kwargs={"temperature": 0, "max_tokens": 3000},
)

@tool
def test(example: Optional[str] = None):
    """This is the tool to use to test things

    Args:
        example: example variable, defaults to None
    """
    return "this is a test"

llm_with_tools = llm.bind_tools([test])

result = llm_with_tools.invoke("can you make a test for g")
```

This PR sets the param type to `any` in that case, and fixes the
problem.

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-19 22:17:34 +00:00
Bagatur
c3ccd93c12
patch openai json mode test (#28831) 2024-12-19 21:43:32 +00:00
Bagatur
ce6748dbfe
xfail openai image token count test (#28828) 2024-12-19 21:23:30 +00:00
Anusha Karkhanis
26bdf40072
Langchain_Community: SQL LanguageParser (#28430)
## Description
(This PR has contributions from @khushiDesai, @ashvini8, and
@ssumaiyaahmed).

This PR addresses **Issue #11229** which addresses the need for SQL
support in document parsing. This is integrated into the generic
TreeSitter parsing library, allowing LangChain users to easily load
codebases in SQL into smaller, manageable "documents."

This pull request adds a new ```SQLSegmenter``` class, which provides
the SQL integration.

## Issue
**Issue #11229**: Add support for a variety of languages to
LanguageParser

## Testing
We created a file ```test_sql.py``` with several tests to ensure the
```SQLSegmenter``` is functional. Below are the tests we added:

- ```def test_is_valid```: Checks SQL validity.
- ```def test_extract_functions_classes```: Extracts individual SQL
statements.
- ```def test_simplify_code```: Simplifies SQL code with comments.

---------

Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com>
Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com>
Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com>
Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-19 20:30:57 +00:00
Bagatur
a7f2148061
openai[patch]: Release 0.2.14 (#28826) 2024-12-19 11:56:44 -08:00
Bagatur
1378ddfa5f
openai[patch]: type reasoning_effort (#28825) 2024-12-19 19:36:49 +00:00
Erick Friis
6a37899b39
core: dont mutate tool_kwargs during tool run (#28824)
fixes https://github.com/langchain-ai/langchain/issues/24621
2024-12-19 18:11:56 +00:00
Qun
033ac41760
fix crash when using create_xml_agent with parameterless function as … (#26002)
When using `create_xml_agent` or `create_json_chat_agent` to create a
agent, and the function corresponding to the tool is a parameterless
function, the `XMLAgentOutputParser` or `JSONAgentOutputParser` will
parse the tool input into an empty string, `BaseTool` will parse it into
a positional argument.
So, the program will crash finally because we invoke a parameterless
function but with a positional argument.Specially, below code will raise
StopIteration in
[_parse_input](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools/base.py#L419)
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_json_chat_agent, create_xml_agent
from langchain_openai import ChatOpenAI

prompt = hub.pull("hwchase17/react-chat-json")

llm = ChatOpenAI()

# agent = create_xml_agent(llm, tools, prompt)
agent = create_json_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(......)
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 13:00:46 -05:00
Luke
f69695069d
text_splitters: Add HTMLSemanticPreservingSplitter (#25911)
**Description:** 

With current HTML splitters, they rely on secondary use of the
`RecursiveCharacterSplitter` to further chunk the document into
manageable chunks. The issue with this is it fails to maintain important
structures such as tables, lists, etc within HTML.

This Implementation of a HTML splitter, allows the user to define a
maximum chunk size, HTML elements to preserve in full, options to
preserve `<a>` href links in the output and custom handlers.

The core splitting begins with headers, similar to `HTMLHeaderSplitter`.
If these sections exceed the length of the `max_chunk_size` further
recursive splitting is triggered. During this splitting, elements listed
to preserve, will be excluded from the splitting process. This can cause
chunks to be slightly larger then the max size, depending on preserved
length. However, all contextual relevance of the preserved item remains
intact.

**Custom Handlers**: Sometimes, companies such as Atlassian have custom
HTML elements, that are not parsed by default with `BeautifulSoup`.
Custom handlers allows a user to provide a function to be ran whenever a
specific html tag is encountered. This allows the user to preserve and
gather information within custom html tags that `bs4` will potentially
miss during extraction.

**Dependencies:** User will need to install `bs4` in their project to
utilise this class

I have also added in `how_to` and unit tests, which require `bs4` to
run, otherwise they will be skipped.

Flowchart of process:


![HTMLSemanticPreservingSplitter](https://github.com/user-attachments/assets/20873c36-22ed-4c80-884b-d3c6f433f5a7)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 12:09:22 -05:00
Tommaso De Lorenzo
24bfa062bf
langchain: add support for Google Anthropic Vertex AI model garden provider in init_chat_model (#28177)
Simple modification to add support for anthropic models deployed in
Google Vertex AI model garden in `init_chat_model` importing
`ChatAnthropicVertex`

- [v] **Lint and test**
2024-12-19 12:06:21 -05:00
Erick Friis
ff7b01af88
anthropic: less pydantic for client (#28823) 2024-12-19 08:00:02 -08:00
Erick Friis
f1d783748a
anthropic: sdk bump (#28820) 2024-12-19 15:39:21 +00:00
Erick Friis
907f36a6e9
fireworks: fix lint (#28821) 2024-12-19 15:36:36 +00:00
Erick Friis
6526db4871
community: bump core (#28819) 2024-12-19 06:41:53 -08:00
Vignesh A
4c9acdfbf1
Community : Add OpenAI prompt caching and reasoning tokens tracking (#27135)
Added Token tracking for OpenAI's prompt caching and reasoning tokens
Costs updated from https://openai.com/api/pricing/

usage example
```python
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="o1-mini",temperature=1)

with get_openai_callback() as cb:
    response = llm.invoke("hi "*1500)
    print(cb)
```
Output
```
Tokens Used: 1720
	Prompt Tokens: 1508
		Prompt Tokens Cached: 1408
	Completion Tokens: 212
		Reasoning Tokens: 192
Successful Requests: 1
Total Cost (USD): $0.0049559999999999995
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 09:31:13 -05:00
ScriptShi
97f1e1d39f
community: tablestore vector store check the dimension of the embedding when writing it to store. (#28812)
Added some restrictions to a vectorstore I released in the community
before.
2024-12-19 09:30:43 -05:00
Wang Ran (汪然)
f48755d35b
core: typo Utilities for tests. -> Utilities for pydantic. (#28814)
**Description:** typo
2024-12-19 09:26:17 -05:00
Wang Ran (汪然)
51b8ddaf10
core: typo in runnable (#28815)
Thank you for contributing to LangChain!

**Description:** Typo
2024-12-19 09:25:57 -05:00
Erick Friis
3b036a1cf2
partners/fireworks: release 0.2.6 (#28805) 2024-12-18 22:48:35 +00:00
Erick Friis
4eb8bf7793
partners/anthropic: release 0.3.1 (#28801) 2024-12-18 22:45:38 +00:00
Lu Peng
50afa7c4e7
community: add new parameter default_headers (#28700)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- "community: 1. add new parameter `default_headers` for oci model
deployments and oci chat model deployments. 2. updated k parameter in
OCIModelDeploymentLLM class."


- [x] **PR message**:
- **Description:** 1. add new parameters `default_headers` for oci model
deployments and oci chat model deployments. 2. updated k parameter in
OCIModelDeploymentLLM class.


- [x] **Add tests and docs**:
  1. unit tests
  2. notebook

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 22:33:23 +00:00
Erick Friis
cc616de509
partners/xai: release 0.1.1 (#28806) 2024-12-18 22:15:24 +00:00
Erick Friis
ba8c1b0d8c
partners/groq: release 0.2.2 (#28804) 2024-12-18 22:12:02 +00:00
Erick Friis
a119cae5bd
partners/mistralai: release 0.2.4 (#28803) 2024-12-18 22:11:48 +00:00
Erick Friis
514d78516b
partners/ollama: release 0.2.2 (#28802) 2024-12-18 22:11:08 +00:00