Commit Graph

6605 Commits

Author SHA1 Message Date
ccurme
c20f7418c7
openai[patch]: fix Azure LLM test (#29302)
The tokens I get are:
```
['', '\n\n', 'The', ' sun', ' was', ' setting', ' over', ' the', ' horizon', ',', ' casting', '']
```
so possibly an extra empty token is included in the output.

lmk @efriis if we should look into this further.
2025-01-19 17:25:42 +00:00
ccurme
6b249a0dc2
openai[patch]: release 0.3.1 (#29301) 2025-01-19 17:04:00 +00:00
ThomasSaulou
e9abe583b2
chatperplexity stream-citations in additional kwargs (#29273)
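A usage sketch (the model name and the `citations` key under `additional_kwargs` are assumptions based on the PR title):

```python
from langchain_community.chat_models import ChatPerplexity

chat = ChatPerplexity(model="llama-3.1-sonar-small-128k-online")  # hypothetical model name

for chunk in chat.stream("What is the tallest mountain?"):
    # assumption: streamed citations ride along in additional_kwargs
    citations = chunk.additional_kwargs.get("citations")
    if citations:
        print(citations)
```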

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-18 22:31:10 +00:00
TheSongg
1cd4d8d101
[langchain_community.llms.xinference]: Rewrite _stream() method and support stream() method in xinference.py (#29259)
- [ ] **PR title**: [langchain_community.llms.xinference]: Rewrite
_stream() method and support stream() method in xinference.py

- [ ] **PR message**: Rewrite the _stream method so that
chain.stream() can be used to return data streams.

```python
chain = prompt | llm
chain.stream(input=user_input)
```


- [ ] **tests**:

```python
from langchain_community.llms import Xinference
from langchain.prompts import PromptTemplate

llm = Xinference(
    server_url="http://0.0.0.0:9997",  # replace with your xinference server url
    model_uid={model_uid},  # replace with the model UID returned when launching the model
    stream=True,
)
prompt = PromptTemplate(
    input_variables=["country"],
    template="Q: where can we visit in the capital of {country}? A:",
)
chain = prompt | llm
chain.stream(input={"country": "France"})
```
2025-01-17 20:31:59 -05:00
ccurme
184ea8aeb2
anthropic[patch]: update tool choice type (#29276) 2025-01-17 15:26:33 -05:00
ccurme
ac52021097
anthropic[patch]: release 0.3.2 (#29275) 2025-01-17 19:48:31 +00:00
ccurme
c616b445f2
anthropic[patch]: support parallel_tool_calls (#29257)
Need to:
- Update docs
- Decide if this is an explicit kwarg of bind_tools
- Decide if this should be in standard test with flag for supporting
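For reference, a minimal sketch assuming the flag lands as a `bind_tools` kwarg (the open question above):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the weather for a city."""
    return f"Sunny in {city}"

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")
# parallel_tool_calls=False would force at most one tool call per turn
llm_with_tools = llm.bind_tools([get_weather], parallel_tool_calls=False)
```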
2025-01-17 19:41:41 +00:00
ccurme
d5360b9bd6
core[patch]: release 0.3.30 (#29256) 2025-01-16 17:52:37 -05:00
Nuno Campos
595297e2e5
core: Add support for calls in get_function_nonlocals (#29255)
2025-01-16 14:43:42 -08:00
Luis Lopez
75663f2cae
community: Add cost per 1K tokens for fine-tuned model cached input (#29248)
### Description

- The cost per 1K input tokens for a fine-tuned cached version of
`gpt-4o-mini-2024-07-18` is not available when using the
`OpenAICallbackHandler`, so it raises an error when trying to make calls
with such a model.
- This PR adds the price to the `MODEL_COST_PER_1K_TOKENS` dictionary.
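For illustration, a sketch of the call path that previously errored (the fine-tuned model id is hypothetical):

```python
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

# hypothetical fine-tuned model id
llm = ChatOpenAI(model="ft:gpt-4o-mini-2024-07-18:my-org::abc123")

with get_openai_callback() as cb:
    llm.invoke("Hello")
    # the cost lookup previously errored for this model's cached-input price
    print(cb.total_cost)
```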

cc. @efriis
2025-01-16 15:19:26 -05:00
Junon
667d2a57fd
add mode arg to OBSFileLoader.load() method (#29246)
- **Description:** add mode arg to OBSFileLoader.load() method
  - **Issue:** #29245
  - **Dependencies:** no dependencies required for this change

---------

Co-authored-by: Junon_Gz <junon_gz@qq.com>
2025-01-16 11:09:04 -05:00
Erick Friis
5eb4dc5e06
standard-tests: double messages test (#29237) 2025-01-15 15:14:29 -08:00
Nithish Raghunandanan
1051fa5729
couchbase: Migrate couchbase partner package to different repo (#29239)
**Description:** Migrate the couchbase partner package to
[Couchbase-Ecosystem](https://github.com/Couchbase-Ecosystem/langchain-couchbase)
org
2025-01-15 12:37:27 -08:00
Nadeem Sajjad
eaf2fb287f
community(pypdfloader): added page_label in metadata for pypdf loader (#29225)
# Description

## Summary
This PR adds support for handling multi-labeled page numbers in the
**PyPDFLoader**. Some PDFs use complex page numbering systems where the
actual content may begin after multiple introductory pages. The
page_label field helps accurately reflect the document’s page structure,
making it easier to handle such cases during document parsing.

## Motivation
This feature improves document parsing accuracy by allowing users to
access the actual page labels instead of relying only on the physical
page numbers. This is particularly useful for documents where the first
few pages have roman numerals or other non-standard page labels.

## Use Case
This feature is especially useful for **Retrieval-Augmented Generation**
(RAG) systems where users may reference page numbers when asking
questions. Some PDFs have both labeled page numbers (like roman numerals
for introductory sections) and index-based page numbers.

For example, a user might ask:

	"What is mentioned on page 5?"

The system can now check both:
	•	**Index-based page number** (page)
	•	**Labeled page number** (page_label)

This dual-check helps improve retrieval accuracy. Additionally, the
results can be validated with an **agent or tool** to ensure the
retrieved pages match the user’s query contextually.

## Code Changes

- Added a page_label field to the metadata of the Document class in
**PyPDFLoader**.
- Implemented support for retrieving page_label from the
pdf_reader.page_labels.
- Created a test case (test_pypdf_loader_with_multi_label_page_numbers)
with a sample PDF containing multi-labeled pages
(geotopo-komprimiert.pdf) [[Source of
pdf](https://github.com/py-pdf/sample-files/blob/main/009-pdflatex-geotopo/GeoTopo-komprimiert.pdf)].
- Updated existing tests to ensure compatibility and verify page_label
extraction.
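A minimal usage sketch (file name taken from the sample PDF linked above):

```python
from langchain_community.document_loaders import PyPDFLoader

docs = PyPDFLoader("geotopo-komprimiert.pdf").load()
for doc in docs[:3]:
    # `page` is the physical index; `page_label` is the printed label (e.g. roman numerals)
    print(doc.metadata["page"], doc.metadata.get("page_label"))
```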

## Tests Added

- Added a new test case for a PDF with multi-labeled pages.
- Verified both page and page_label metadata fields are correctly
extracted.

## Screenshots

<img width="549" alt="image"
src="https://github.com/user-attachments/assets/65db9f5c-032e-4592-926f-824777c28f33"
/>
2025-01-15 14:18:07 -05:00
Mehdi
1a38948ee3
Mehdi zare/fmp data doc (#29219)
Title: community: add Financial Modeling Prep (FMP) API integration

Description: Adding LangChain integration for Financial Modeling Prep
(FMP) API to enable semantic search and structured tool creation for
financial data endpoints. This integration provides semantic endpoint
search using vector stores and automatic tool creation with proper
typing and error handling. Users can discover relevant financial
endpoints using natural language queries and get properly typed
LangChain tools for discovered endpoints.

Issue: N/A

Dependencies:
- fmp-data>=0.3.1
- langchain-core>=0.1.0
- faiss-cpu
- tiktoken

Twitter handle: @mehdizarem

Unit tests and example notebook have been added:
- Tests are in tests/integration_tests/est_tools.py and
tests/unit_tests/test_tools.py
- Example notebook is in docs/tools.ipynb

All format, lint and test checks pass:
- pytest
- mypy .

Dependencies are imported within functions and not added to
pyproject.toml. The changes are backwards compatible and only affect the
community package.

---------

Co-authored-by: mehdizare <mehdizare@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-15 15:31:01 +00:00
Mohammad Mohtashim
288613d361
(text-splitters): Small Fix in _process_html for HTMLSemanticPreservingSplitter to properly extract the metadata. (#29215)
- **Description:** Include `main` in the list of elements whose child
elements need to be processed for splitting the HTML.
- **Issue:** #29184
2025-01-15 10:18:06 -05:00
TheSongg
4867fe7ac8
[langchain_community.llms.xinference]: fix error in xinference.py (#29216)
- [ ] **PR title**: [langchain_community.llms.xinference]: fix error in
xinference.py

- [ ] **PR message**:
- The old code raised a `ValidationError`
(`pydantic_core._pydantic_core.ValidationError: 1 validation error for
Xinference`) when importing `Xinference` from xinference.py. This issue
has been resolved by adjusting the field's type and default value.

```
File "/media/vdc/python/lib/python3.10/site-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Xinference
client
  Field required [type=missing, input_value={'server_url': 'http://10...t4', 'model_kwargs': {}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
```

- [ ] **tests**:

```python
from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",  # replace with your xinference server url
    model_uid={model_uid},  # replace with the model UID returned when launching the model
)
```
Syed Baqar Abbas
4278046329
[fix] Convert table names to list for compatibility in SQLDatabase (#29229)
- [langchain_community.utilities.SQLDatabase] **[fix] Convert table
names to list for compatibility in SQLDatabase**:
  - The "package" modified is community
  - The issue lay in this block of code:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L72-L77)

- **Description:** When the SQLDatabase is initialized, it runs a code
`self._inspector.get_table_names(schema=schema)` which expects an output
of list. However, with some connectors (such as snowflake) the data type
returned could be another iterable. This results in a type error when
concatenating the table_names to view_names. I have added explicit type
casting to prevent this.
    - **Issue:** The issue #29227 is being fixed here
    - **Dependencies:** None
    - **Twitter handle:** @BaqarAbbas2001

## Additional Information
When the following method is called for a Snowflake database:

44b41b699c/libs/community/langchain_community/utilities/sql_database.py (L75)

Snowflake under the hood calls:
```python
from snowflake.sqlalchemy.snowdialect import SnowflakeDialect
SnowflakeDialect.get_table_names
```

This method returns a `dict_keys()` object, which cannot be concatenated
with a list and results in a `TypeError`.
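A minimal illustration of the failure mode and the explicit cast (variable contents hypothetical):

```python
table_names = dict.fromkeys(["users", "orders"]).keys()  # dict_keys, as Snowflake returns
view_names = ["v_users"]

# table_names + view_names  # TypeError: unsupported operand type(s) for +
combined = list(table_names) + view_names  # the fix: cast to list before concatenating
```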

### Relevant Library Versions
- **snowflake-sqlalchemy**: 1.7.2  
- **snowflake-connector-python**: 3.12.4  
- **sqlalchemy**: 2.0.20  
- **langchain_community**: 0.3.14
2025-01-15 10:00:03 -05:00
Jin Hyung Ahn
05554265b4
community: Fix ConfluenceLoader load() failure caused by deleted pages (#29232)
## Description
This PR modifies the is_public_page function in ConfluenceLoader to
prevent exceptions caused by deleted pages during the execution of
ConfluenceLoader.process_pages().


**Example scenario:**
Consider the following usage of ConfluenceLoader:
```python
import os
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
        url=os.getenv("BASE_URL"),
        token=os.getenv("TOKEN"),
        max_pages=1000,
        cql=f'type=page and lastmodified >= "2020-01-01 00:00"',
        include_restricted_content=False,
)

# Raised Exception : HTTPError: Outdated version/old_draft/trashed? Cannot find content Please provide valid ContentId.
documents = loader.load()
```

If a deleted page exists within the query result, the is_public_page
function would previously raise an exception when calling
get_all_restrictions_for_content, causing the loader.load() process to
fail for all pages.

By adding a pre-check for the page's "current" status, unnecessary API
calls to get_all_restrictions_for_content for non-current pages are
avoided.


This fix ensures that such pages are skipped without affecting the rest
of the loading process.

## Issue
N/A (No specific issue number)

## Dependencies
No new dependencies are introduced with this change.

## Twitter handle
[@zenoengine](https://x.com/zenoengine)
2025-01-15 09:56:23 -05:00
Mohammad Mohtashim
21eb39dff0
[Community]: AzureOpenAIWhisperParser Authentication Fix (#29135)
- **Description:** `AzureOpenAIWhisperParser` authentication fix as
stated in the issue.
- **Issue:** #29133
2025-01-15 09:44:53 -05:00
Erick Friis
b05543c69b
packages: disable mongodb for api docs (#29218) 2025-01-15 02:23:01 +00:00
Erick Friis
30badd7a32
packages: update mongodb folder (#29217) 2025-01-15 02:01:06 +00:00
pm390
76172511fd
community: Additional parameters for OpenAIAssistantV2Runnable (#29207)
**Description:** Added additional parameters that could be useful for
usage of OpenAIAssistantV2Runnable.

This change allows langchain users to set parameters that
cannot be set using the Assistants UI
(max_completion_tokens, max_prompt_tokens, parallel_tool_calls) and
parameters that could be useful for experimenting, like top_p and
temperature.

This PR originated from the need to use parallel_tool_calls in
langchain; this parameter is very important for OpenAI Assistants because
without it set to False, strict mode is not respected by OpenAI
Assistants
(https://platform.openai.com/docs/guides/function-calling#parallel-function-calling).

> Note: Currently, if the model calls multiple functions in one turn
then strict mode will be disabled for those calls.

**Issue:** None
**Dependencies:** openai
2025-01-14 15:53:37 -05:00
Bagatur
4ab04ad6be
docs: oai api ref nit (#29210) 2025-01-14 17:55:16 +00:00
Michael Chin
d9b856abad
community: Deprecate Amazon Neptune resources in langchain-community (#29191)
Related: https://github.com/langchain-ai/langchain-aws/pull/322

The legacy `NeptuneOpenCypherQAChain` and `NeptuneSparqlQAChain` classes
are being replaced by the new LCEL format chains
`create_neptune_opencypher_qa_chain` and
`create_neptune_sparql_qa_chain`, respectively, in the `langchain_aws`
package.

This PR adds deprecation warnings to all Neptune classes and functions
that have been migrated to `langchain_aws`. All relevant documentation
has also been updated to replace `langchain_community` usage with the
new `langchain_aws` implementations.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-14 10:23:34 -05:00
Erick Friis
c55af44711
anthropic: pydantic mypy plugin (#29144) 2025-01-13 15:32:40 -08:00
ccurme
1bf6576709
cli[patch]: fix anchor links in templates (#29178)
These are outdated and can break docs builds.
2025-01-13 18:28:18 +00:00
Christopher Varjas
e156b372fb
langchain: support api key argument with OpenAI moderation chain (#29140)
**Description:** Makes it possible to instantiate
`OpenAIModerationChain` with an `openai_api_key` argument only and no
`OPENAI_API_KEY` environment variable defined.

**Issue:** https://github.com/langchain-ai/langchain/issues/25176

**Dependencies:** `openai`
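A minimal sketch of the new instantiation path (placeholder key):

```python
from langchain.chains import OpenAIModerationChain

# now works without an OPENAI_API_KEY environment variable
moderation = OpenAIModerationChain(openai_api_key="sk-...")  # placeholder key
moderation.invoke({"input": "some user-generated text"})
```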

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-13 11:00:02 -05:00
Nikhil Shahi
335ca3a606
docs: add HyperbrowserLoader docs (#29143)
### Description
This PR adds docs for the
[langchain-hyperbrowser](https://pypi.org/project/langchain-hyperbrowser/)
package. It includes a document loader that uses Hyperbrowser to scrape
or crawl any urls and return formatted markdown or html content as well
as relevant metadata.
[Hyperbrowser](https://hyperbrowser.ai) is a platform for running and
scaling headless browsers. It lets you launch and manage browser
sessions at scale and provides easy to use solutions for any webscraping
needs, such as scraping a single page or crawling an entire site.

### Issue
None

### Dependencies
None

### Twitter Handle
`@hyperbrowser`
2025-01-13 10:45:39 -05:00
Tymon Żarski
689592f9bb
community: Fix rank-llm import paths for new 0.20.3 version (#29154)
# **PR title**: "community: Fix rank-llm import paths for new 0.20.3
version"
- The "community" package is being modified to handle updated import
paths for the new `rank-llm` version.

---

## Description
This PR updates the import paths for the `rank-llm` package to account
for changes introduced in version `0.20.3`. The changes ensure
compatibility with both pre- and post-revamp versions of `rank-llm`,
specifically version `0.12.8`. Conditional imports are introduced based
on the detected version of `rank-llm` to handle different path
structures for `VicunaReranker`, `ZephyrReranker`, and `SafeOpenai`.
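A sketch of the version-gated import described above (the module paths are hypothetical):

```python
import pkg_resources
from packaging import version

RANK_LLM_VERSION = version.parse(pkg_resources.get_distribution("rank-llm").version)

if RANK_LLM_VERSION >= version.parse("0.20.3"):
    # post-revamp layout (hypothetical path)
    from rank_llm.rerank.listwise.vicuna_reranker import VicunaReranker
else:
    # pre-revamp layout (hypothetical path)
    from rank_llm.rerank.vicuna_reranker import VicunaReranker
```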

## Issue
RankLLMRerank usage throws an error (with GPT models, but not only) when
the installed rank-llm version is > 0.12.8 - #29156

## Dependencies
This change relies on the `packaging` and `pkg_resources` libraries to
handle version checks.

## Twitter handle
@tymzar
2025-01-13 10:22:14 -05:00
Andrew
0e3115330d
Add additional_instructions on openai assistant runs create. (#29164)
- **Description**: In the functions `_create_run` and `_acreate_run`,
the parameters passed to the creation of
`openai.resources.beta.threads.runs` were limited.

  Source:
  ```python
  def _create_run(self, input: dict) -> Any:
      params = {
          k: v
          for k, v in input.items()
          if k in ("instructions", "model", "tools", "run_metadata")
      }
      return self.client.beta.threads.runs.create(
          input["thread_id"],
          assistant_id=self.assistant_id,
          **params,
      )
  ```
- OpenAI Documentation
([createRun](https://platform.openai.com/docs/api-reference/runs/createRun))

- Full list of parameters `openai.resources.beta.threads.runs` ([source
code](https://github.com/openai/openai-python/blob/main/src/openai/resources/beta/threads/runs/runs.py#L91))

 
- **Issue:** Fix #17574 
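A sketch of the expanded filter, assuming `additional_instructions` is simply added to the allow-list:

```python
def _create_run(self, input: dict) -> Any:
    params = {
        k: v
        for k, v in input.items()
        if k
        in (
            "instructions",
            "additional_instructions",  # newly forwarded to runs.create
            "model",
            "tools",
            "run_metadata",
        )
    }
    return self.client.beta.threads.runs.create(
        input["thread_id"],
        assistant_id=self.assistant_id,
        **params,
    )
```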



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Co-authored-by: ccurme <chester.curme@gmail.com>
2025-01-13 10:11:47 -05:00
ccurme
e4ceafa1c8
langchain[patch]: update extended tests for compatibility with langchain-openai==0.3 (#29174) 2025-01-13 15:04:22 +00:00
Priyansh Agrawal
c115c09b6d
community: add missing format specifier in error log in CubeSemanticLoader (#29172)
- [x] **PR message**
- **Description:** Add a missing format specifier in an error log in
`langchain_community.document_loaders.CubeSemanticLoader`
- **Issue:** raises `TypeError: not all arguments converted during
string formatting`
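For illustration, the general shape of such a fix (names hypothetical):

```python
import logging

logger = logging.getLogger(__name__)
status_code = 500  # hypothetical

# Before: an extra argument with no format specifier triggers
# "TypeError: not all arguments converted during string formatting"
# logger.error("Request failed with status code:", status_code)

# After: %s consumes the argument
logger.error("Request failed with status code: %s", status_code)
```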


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2025-01-13 09:32:57 -05:00
ThomasSaulou
349b5c91c2
fix chatperplexity: remove 'stream' from params in _stream method (#29173)
2025-01-13 09:31:37 -05:00
LIU Yuwei
f980144e9c
community: add init for unstructured file loader (#29101)
## Description
Add `__init__` for unstructured loader of
epub/image/markdown/pdf/ppt/word to restrict the input type to `str` or
`Path`.
In the
[signature](https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html)
these unstructured loaders receive `file_path: str | List[str] | Path |
List[Path]`, but actually they only receive `str` or `Path`.

## Issue
None

## Dependencies
No changes.
2025-01-13 09:26:00 -05:00
Erick Friis
bbc3e3b2cf
openai: disable streaming for o1 by default (#29147)
Currently 400s
https://community.openai.com/t/streaming-support-for-o1-o1-2024-12-17-resulting-in-400-unsupported-value/1085043

o1-mini and o1-preview stream fine
2025-01-11 02:24:11 +00:00
Isaac Francisco
62074bac60
replace all LANGCHAIN_ flags with LANGSMITH_ flags (#29120) 2025-01-11 01:24:40 +00:00
Bagatur
5c2fbb5b86
docs: Update openai README.md (#29146) 2025-01-10 17:24:16 -08:00
Erick Friis
0a54aedb85
anthropic: pdf integration test (#29142) 2025-01-10 21:56:31 +00:00
ccurme
8de8519daf
tests[patch]: release 0.3.8 (#29141) 2025-01-10 21:53:41 +00:00
Jiang
7d3fb21807
Add lindorm as new integration (#29123)
A misoperation caused the original PR to close: [origin pr
link](https://github.com/langchain-ai/langchain/pull/29085)

---------

Co-authored-by: jiangzhijie <jiangzhijie.jzj@alibaba-inc.com>
2025-01-10 16:30:37 -05:00
ccurme
4819b500e8
pinecone[patch]: release 0.2.2 (#29139) 2025-01-10 14:59:57 -05:00
Ashvin
46fd09ffeb
partner: Update aiohttp in langchain pinecone. (#28863)
- **partner**: "Update Aiohttp for resolving vulnerability issue"
    
- **Description:** I have updated the upper limit of aiohttp from `3.10`
to `3.10.5` in the pyproject.toml file of langchain-pinecone. Hopefully
this will resolve #28771. Please review this as I'm quite unsure.

---------

Co-authored-by: = <=>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-10 14:54:52 -05:00
ccurme
f3d370753f
xai[minor]: release 0.2 (#29132)
Update `langchain-openai` to 0.3. See [release
notes](https://github.com/langchain-ai/langchain/releases/tag/langchain-openai%3D%3D0.3.0)
for details. Should only impact default values of `temperature`, `n`,
and `max_retries`.
2025-01-10 11:47:27 -05:00
ccurme
6e63ccba84
openai[minor]: release 0.3 (#29100)
## Goal

Solve the following problems with `langchain-openai`:

- Structured output with `o1` [breaks out of the
box](https://langchain.slack.com/archives/C050X0VTN56/p1735232400232099).
- `with_structured_output` by default does not use OpenAI’s [structured
output
feature](https://platform.openai.com/docs/guides/structured-outputs).
- We override API defaults for temperature and other parameters.

## Breaking changes:

- Default method for structured output is changing to OpenAI’s dedicated
[structured output
feature](https://platform.openai.com/docs/guides/structured-outputs).
For schemas specified via TypedDict or JSON schema, strict schema
validation is disabled by default but can be enabled by specifying
`strict=True`.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Models that don’t support `method="json_schema"` (e.g., `gpt-4` and
`gpt-3.5-turbo`, currently the default model for ChatOpenAI) will raise
an error unless `method` is explicitly specified.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- Schemas specified via Pydantic `BaseModel` that have fields with
non-null defaults or metadata (like min/max constraints) will raise an
error.
- To recover previous default, pass `method="function_calling"` into
`with_structured_output`.
- `strict` now defaults to False for `method="json_schema"` when schemas
are specified via TypedDict or JSON schema.
- To recover previous behavior, use `with_structured_output(schema,
strict=True)`
- Schemas specified via Pydantic V1 will raise a warning (and use
`method="function_calling"`) unless `method` is explicitly specified.
- To remove the warning, pass `method="function_calling"` into
`with_structured_output`.
- Streaming with default structured output method / Pydantic schema no
longer generates intermediate streamed chunks.
- To recover previous behavior, pass `method="function_calling"` into
`with_structured_output`.
- We no longer override default temperature (was 0.7 in LangChain, now
will follow OpenAI, currently 1.0).
- To recover previous behavior, initialize `ChatOpenAI` or
`AzureChatOpenAI` with `temperature=0.7`.
- Note: conceptually there is a difference between forcing a tool call
and forcing a response format. Tool calls may have more concise
arguments vs. generating content adhering to a schema. Prompts may need
to be adjusted to recover desired behavior.
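For illustration, a minimal before/after sketch (model name is an assumption):

```python
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class Joke(BaseModel):
    setup: str
    punchline: str

llm = ChatOpenAI(model="gpt-4o-mini")

# New default: OpenAI's native structured output (method="json_schema")
structured_llm = llm.with_structured_output(Joke)

# Recover the pre-0.3 default explicitly
legacy_llm = llm.with_structured_output(Joke, method="function_calling")
```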

---------

Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-01-10 10:50:32 -05:00
ccurme
815bfa1913
openai[patch]: support streaming with json_schema response format (#29044)
- Stream JSON string content. Final chunk includes parsed representation
(following OpenAI
[docs](https://platform.openai.com/docs/guides/structured-outputs#streaming)).
- Mildly (?) breaking change: if you were using streaming with
`response_format` before, usage metadata will disappear unless you set
`stream_usage=True`.
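A usage sketch combining the two notes above (model name is an assumption):

```python
from langchain_openai import ChatOpenAI

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "joke",
        "schema": {
            "type": "object",
            "properties": {
                "setup": {"type": "string"},
                "punchline": {"type": "string"},
            },
            "required": ["setup", "punchline"],
        },
    },
}

# stream_usage=True retains usage metadata when streaming (see note above)
llm = ChatOpenAI(model="gpt-4o-mini", stream_usage=True).bind(
    response_format=response_format
)

for chunk in llm.stream("Tell me a joke"):
    print(chunk.content, end="", flush=True)  # JSON string content arrives in chunks
```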

## Response format

Before:

![Screenshot 2025-01-06 at 11 59
01 AM](https://github.com/user-attachments/assets/e54753f7-47d5-421d-b8f3-172f32b3364d)


After:

![Screenshot 2025-01-06 at 11 58
13 AM](https://github.com/user-attachments/assets/34882c6c-2284-45b4-92f7-5b5b69896903)


## with_structured_output

For pydantic output, behavior of `with_structured_output` is unchanged
(except for warning disappearing), because we pluck the parsed
representation straight from OpenAI, and OpenAI doesn't return it until
the stream is completed. Open to alternatives (e.g., parsing from
content or intermediate dict chunks generated by OpenAI).

Before:

![Screenshot 2025-01-06 at 12 38
11 PM](https://github.com/user-attachments/assets/913d320d-f49e-4cbb-a800-b394ae817fd1)

After:

![Screenshot 2025-01-06 at 12 38
58 PM](https://github.com/user-attachments/assets/f7a45dd6-d886-48a6-8d76-d0e21ca767c6)
2025-01-09 10:32:30 -05:00
Panos Vagenas
858f655a25
docs: add Docling loader docs (#29104)
### Description
This adds the docs for the Docling document loader.
[Docling](https://github.com/DS4SD/docling) parses PDF, DOCX, PPTX,
HTML, and other formats into a rich unified representation including
document layout, tables etc., making them ready for generative AI
workflows like RAG.

Some references:
- https://research.ibm.com/blog/docling-generative-AI
-
https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
- [Docling Technical Report](https://arxiv.org/abs/2408.09869)

The introduced `DoclingLoader` enables users to:
- use various document types in their LLM applications with ease and
speed, and
- leverage Docling's rich representation for advanced, document-native
grounding.
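A minimal usage sketch, assuming the package's documented entry point:

```python
from langchain_docling import DoclingLoader

loader = DoclingLoader(file_path="https://arxiv.org/pdf/2408.09869")  # the Docling technical report
docs = loader.load()
print(docs[0].page_content[:200])
```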

### Issue
Replacing PR #27987 as discussed with @efriis
[here](https://github.com/langchain-ai/langchain/pull/27987#issuecomment-2489354930).

### Dependencies
None

---------

Signed-off-by: Panos Vagenas <35837085+vagenas@users.noreply.github.com>
2025-01-09 10:15:35 -05:00
Joshua Campbell
00dcc44739
Langchain_community: Fix issue with missing backticks in arango client (#29110)
- **Description:** Adds backticks to generate_schema function in the
arango graph client
- **Issue:** We experienced an issue with the generate schema function
when talking to our arango database where these backticks were missing
    - **Dependencies:** none
    - **Twitter handle:** @anangelofgrace
2025-01-09 10:00:10 -05:00
LIU Yuwei
2b09f798e1
community: add init for UnstructuredHTMLLoader to solve pathlib paths (#29091)
## Description
Add `__init__` for `UnstructuredHTMLLoader` to restrict the input type
to `str` or `Path`, and transfer the `self.file_path` to `str` just like
`UnstructuredXMLLoader` does.
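A minimal sketch of the now-supported call (file name hypothetical):

```python
from pathlib import Path

from langchain_community.document_loaders import UnstructuredHTMLLoader

# a pathlib.Path is now accepted and normalized to str internally
loader = UnstructuredHTMLLoader(Path("example.html"))
docs = loader.load()
```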

## Issue
Fix #29090 

## Dependencies
No changes.
2025-01-08 10:19:27 -05:00
Jin Hyung Ahn
c8ca1cd42f
community: fix "confluence-loader" enable include_labels for documents loaded via CQL (#29089)
## Description
This PR enables label inclusion for documents loaded via CQL in the
confluence-loader.

- Updated _lazy_load to pass the include_labels parameter instead of
False in process_pages calls for documents loaded via CQL.
- Ensured that labels can now be fetched and added to the metadata for
documents queried with cql.

## Related Modification History
This PR builds on the previous functionality introduced in
[#28259](https://github.com/langchain-ai/langchain/pull/28259), which
added support for including labels with the include_labels option.
However, this functionality did not work as expected for CQL queries,
and this PR fixes that issue.

If the False handling was intentional due to another issue, please let
me know. I have verified with our Confluence instance that this change
allows labels to be correctly fetched for documents loaded via CQL.

## Issue
Fixes #29088


## Dependencies
No changes.

## Twitter Handle
[@zenoengine](https://x.com/zenoengine)
2025-01-08 10:16:39 -05:00
Inah Jeon
9d290abccd
partner: Update Upstage Model Names and Remove Deprecated Model (#29093)
This PR updates model names in the upstage library to reflect the latest
naming conventions and removes deprecated models.

Changes:

Renamed Models:
- `solar-1-mini-chat` -> `solar-mini`
- `solar-1-mini-embedding-query` -> `embedding-query`

Removed Deprecated Models:
- `layout-analysis` (replaced to `document-parse`)

Reference:
- https://console.upstage.ai/docs/getting-started/overview
-
https://github.com/langchain-ai/langchain-upstage/releases/tag/libs%2Fupstage%2Fv0.5.0
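A minimal sketch of the renamed models in use:

```python
from langchain_upstage import ChatUpstage, UpstageEmbeddings

chat = ChatUpstage(model="solar-mini")  # formerly "solar-1-mini-chat"
embeddings = UpstageEmbeddings(model="embedding-query")  # formerly "solar-1-mini-embedding-query"
```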

2025-01-08 10:13:22 -05:00
Prashanth Rao
b1dafaef9b
Kùzu package integration docs (#29076)
## Langchain Kùzu

### Description
 
This PR adds docs for the `langchain-kuzu` package [on
PyPI](https://pypi.org/project/langchain-kuzu/) that was recently
published, allowing Kùzu users to more easily use and work with
LangChain QA chains. The package will also make it easier for the Kùzu
team to continue supporting and updating the integration over future
releases.

### Twitter Handle

Please tag [@kuzudb](https://x.com/kuzudb) on Twitter once this PR is
merged, so LangChain users can be notified!

---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2025-01-08 01:14:00 +00:00
Erick Friis
cc0f81f40f
partners/groq: release 0.2.3 (#29081) 2025-01-07 23:36:51 +00:00
Erick Friis
fcc9cdd100
multiple: disable socket for unit tests (#29080) 2025-01-07 15:31:50 -08:00
Erick Friis
539ebd5431
groq: user agent (#29079) 2025-01-07 23:21:57 +00:00
Erick Friis
c5bee0a544
pinecone: bump core version (#29077) 2025-01-07 20:23:33 +00:00
Cory Waddingham
ce9e9f9314
pinecone: Review pinecone tests (#29073)
Title: langchain-pinecone: improve test structure and async handling

Description: This PR improves the test infrastructure for the
langchain-pinecone package by:
1. Implementing LangChain's standard test patterns for embeddings
2. Adding comprehensive configuration testing
3. Improving async test coverage
4. Fixing integration test issues with namespaces and async markers

The changes make the tests more robust, maintainable, and aligned with
LangChain's testing standards while ensuring proper async behavior in
the embeddings implementation.

Key improvements:
- Added standard EmbeddingsTests implementation
- Split custom configuration tests into a separate test class
- Added proper async test coverage with pytest-asyncio
- Fixed namespace handling in vector store integration tests
- Improved test organization and documentation

Dependencies: None (uses existing test dependencies)

Tests and Documentation:
-  Added standard test implementation following LangChain's patterns
-  Added comprehensive unit tests for configuration and async behavior
-  All tests passing locally
- No documentation changes needed (internal test improvements only)

Twitter handle: N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-07 11:46:30 -08:00
Philippe PRADOS
2921597c71
community[patch]: Refactoring PDF loaders: 01 prepare (#29062)
- **Refactoring PDF loaders step 1**: "community: Refactoring PDF
loaders to standardize approaches"

- **Description:** Declare `CloudBlobLoader` in `__init__.py`. `file_path`
is `Union[str, PurePath]` everywhere.
- **Twitter handle:** pprados

This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once.
This specific part focuses on preparing the update of all parsers.

For more details, see [PR
28970](https://github.com/langchain-ai/langchain/pull/28970).

@eyurtsev it's the start of a PR series.
2025-01-07 11:00:04 -05:00
ccurme
55677e31f7
text-splitters[patch]: release 0.3.5 (#29054)
Resolves https://github.com/langchain-ai/langchain/issues/29053
2025-01-07 09:48:26 -05:00
Erick Friis
187131c55c
Revert "integrations[patch]: remove non-required chat param defaults" (#29048)
Reverts langchain-ai/langchain#26730

discuss best way to release default changes (esp openai temperature)
2025-01-06 14:45:34 -08:00
Bagatur
3d7ae8b5d2
integrations[patch]: remove non-required chat param defaults (#26730)
anthropic:
  - max_retries

openai:
  - n
  - temperature
  - max_retries

fireworks
  - temperature

groq
  - n
  - max_retries
  - temperature

mistral
  - max_retries
  - timeout
  - max_concurrent_requests
  - temperature
  - top_p
  - safe_mode

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 22:26:22 +00:00
UV
b9db8e9921
DOC: Improve human input prompt in FewShotChatMessagePromptTemplate example (#29023)
Fixes #29010 

This PR updates the example for FewShotChatMessagePromptTemplate by
modifying the human input prompt to include a more descriptive and
user-friendly question format ('What is {input}?') instead of just
'{input}'. This change enhances clarity and usability in the
documentation example.
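A sketch of the updated example (abridged from the docs):

```python
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2+3", "output": "5"},
]

example_prompt = ChatPromptTemplate.from_messages(
    [("human", "What is {input}?"), ("ai", "{output}")]  # was just "{input}"
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
```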

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 12:29:15 -08:00
ccurme
1f78d4faf4
voyageai[patch]: release 0.1.4 (#29046) 2025-01-06 20:20:19 +00:00
Eugene Evstafiev
6a152ce245
docs: add langchain-pull-md Markdown loader (#29024)
- [x] **PR title**: "docs: add langchain-pull-md Markdown loader"

- [x] **PR message**: 
- **Description:** This PR introduces the `langchain-pull-md` package to
the LangChain community. It includes a new document loader that utilizes
the pull.md service to convert URLs into Markdown format, particularly
useful for handling web pages rendered with JavaScript frameworks like
React, Angular, or Vue.js. This loader helps in efficient and reliable
Markdown conversion directly from URLs without local rendering, reducing
server load.
    - **Issue:** NA
    - **Dependencies:** requests >=2.25.1
    - **Twitter handle:** https://x.com/eugeneevstafev?s=21

- [x] **Add tests and docs**: 
1. Added unit tests to verify URL checking and conversion
functionalities.
2. Created a comprehensive example notebook detailing the usage of the
new loader.

- [x] **Lint and test**: 
- Completed local testing using `make format`, `make lint`, and `make
test` commands as per the LangChain contribution guidelines.


**Related Links:**
- [Package Repository](https://github.com/chigwell/langchain-pull-md)
- [PyPI Package](https://pypi.org/project/langchain-pull-md/)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-06 19:32:43 +00:00
Ashvin
20a715a103
community: Fix redundancy in code. (#29022)
In my previous PR (#28953), I added an unwanted condition for validating
the Azure ML Endpoint. In this PR, I have rectified the issue.
2025-01-06 12:58:16 -05:00
Adrián Panella
acddfc772e
core: allow artifact in create_retriever_tool (#28903)
Add option to return content and artifacts, to also be able to access
the full info of the retrieved documents.

They are returned as a list of dicts in the `artifacts` property if
parameter `response_format` is set to `"content_and_artifact"`.

Defaults to `"content"` to keep current behavior.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-03 22:10:31 +00:00
ccurme
3e618b16cd
community[patch]: release 0.3.14 (#29019) 2025-01-03 15:34:24 -05:00
ccurme
18eb9c249d
langchain[patch]: release 0.3.14 (#29018) 2025-01-03 15:15:44 -05:00
ccurme
8e50e4288c
core[patch]: release 0.3.29 (#29017) 2025-01-03 14:58:39 -05:00
ccurme
85403bfa99
core[patch]: substantially speed up @deprecated (#29016)
Resolves https://github.com/langchain-ai/langchain/issues/26918

Unit tests don't raise any additional `LangChainDeprecationWarning`.
Would like guidance on how to test this more thoroughly if needed.

Note: speed up for `bind_tools` path is shown below. This is
**redundant** with the speedup in
https://github.com/langchain-ai/langchain/pull/29015. I include it for
demonstration purposes.

Before:

![Screenshot 2025-01-03 at 12 54
50 PM](https://github.com/user-attachments/assets/87f289eb-4cad-4304-85f7-5c58c59080f1)

After:

![Screenshot 2025-01-03 at 12 55
35 PM](https://github.com/user-attachments/assets/95ad0506-e1d1-4c5c-bb27-6a634d8810c9)
2025-01-03 14:38:53 -05:00
ccurme
4bb391fd4e
core[patch]: remove deprecated functions from tool binding hotpath (#29015)
(Inspired by https://github.com/langchain-ai/langchain/issues/26918)

We rely on some deprecated public functions in the hot path for tool
binding (`convert_pydantic_to_openai_function`,
`convert_python_function_to_openai_function`, and
`format_tool_to_openai_function`). My understanding is that what is
deprecated is not the functionality they implement, but use of them in
the public API -- we expect to continue to rely on them.

Here we update these functions to be private and not deprecated. We keep
the public, deprecated functions as simple wrappers that can be safely
deleted.

The `@deprecated` wrapper adds considerable latency due to its use of
the `inspect` module. This update speeds up `bind_tools` by a factor of
~100x:

Before:

![Screenshot 2025-01-03 at 11 22
55 AM](https://github.com/user-attachments/assets/94b1c433-ce12-406f-b64c-ca7103badfe0)

After:

![Screenshot 2025-01-03 at 11 23
41 AM](https://github.com/user-attachments/assets/02d0deab-82e4-45ca-8cc7-a20b91a5b5db)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-03 19:29:01 +00:00
Eugene Evstafiev
a86904e735
docs: fix typo (#29012)
- **Description:** a minor fix of a typo
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA
2025-01-03 09:52:24 -08:00
Erick Friis
919d1c7da6
box: remove box readme for api docs build (#29014) 2025-01-03 09:50:04 -08:00
Erick Friis
d8bc556c94
packages: update box location (#29013) 2025-01-03 09:45:13 -08:00
Amaan
8d7daa59fb
docs: add langchain dappier retriever integration notebooks (#28931)
Add a retriever to interact with Dappier APIs with an example notebook.

The retriever can be invoked with:

```python
from langchain_dappier import DappierRetriever

retriever = DappierRetriever(
    data_model_id="dm_01jagy9nqaeer9hxx8z1sk1jx6",
    k=5
)

retriever.invoke("latest tech news")
```

To retrieve 5 documents related to latest news in the tech sector. The
included notebook also includes deeper details about controlling filters
such as selecting a data model, number of documents to return, site
domain reference, minimum articles from the reference domain, and search
algorithm, as well as including the retriever in a chain.

The integration package can be found over here -
https://github.com/DappierAI/langchain-dappier
2025-01-03 10:21:41 -05:00
ccurme
0185010b88
community[patch]: additional check for prompt caching support (#29008)
Prompt caching explicitly excludes `gpt-4o-2024-05-13`:
https://platform.openai.com/docs/guides/prompt-caching

Resolves https://github.com/langchain-ai/langchain/issues/28997
2025-01-03 10:14:07 -05:00
Tari Yekorogha
ba9dfd9252
docs: Add FalkorDB Chat Message History and Update Package Registry (#28914)
This commit updates the documentation and package registry for the
FalkorDB Chat Message History integration.

**Changes:**

- Added a comprehensive example notebook
falkordb_chat_message_history.ipynb demonstrating how to use FalkorDB
for session-based chat message storage.

- Added a provider notebook for FalkorDB

- Updated libs/packages.yml to register FalkorDB as an integration
package, following LangChain's new guidelines for community
integrations.

**Notes:**

- This update aligns with LangChain's process for registering new
integrations via documentation updates and package registry
modifications.

- No functional or core package changes were made in this commit.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-02 15:46:47 -05:00
Ashvin
d26c102a5a
community: Update azureml endpoint (#28953)
- In this PR, I have updated the AzureML Endpoint to use the latest
endpoint path.
- **Description:** I have changed the existing `/chat/completions` to
`/models/chat/completions` in
libs/community/langchain_community/llms/azureml_endpoint.py
    - **Issue:** #25702

---------

Co-authored-by: = <=>
2025-01-02 14:47:02 -05:00
ccurme
7c28321f04
core[patch]: fix deprecation admonition in API ref (#28992)
Before:

![Screenshot 2025-01-02 at 1 49
30 PM](https://github.com/user-attachments/assets/cb30526a-fc0b-439f-96d1-962c226d9dc7)

After:

![Screenshot 2025-01-02 at 1 49
38 PM](https://github.com/user-attachments/assets/32c747ea-6391-4dec-b778-df457695d197)
2025-01-02 14:37:55 -05:00
Mohammad Mohtashim
0e74757b0a
(Community): DuckDuckGoSearchAPIWrapper backend changed from api to auto (#28961)
- **Description:** The `DuckDuckGoSearchAPIWrapper` default value for
`backend` has been changed from `"api"` to `"auto"` to avoid a UserWarning
- **Issue:** #28957
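A minimal sketch of the new default:

```python
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

wrapper = DuckDuckGoSearchAPIWrapper(backend="auto")  # now the default, was "api"
print(wrapper.run("LangChain"))
```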
2025-01-02 14:08:22 -05:00
Mohammad Mohtashim
aa551cbcee
(Core) Small Change in Docstring for method partial for BasePromptTemplate (#28969)
- **Description:** Very small change in Docstring for
`BasePromptTemplate`
- **Issue:** #28966
2025-01-02 12:16:30 -05:00
minpeter
a873e0fbfb
community: update documentation and model IDs for FriendliAI provider (#28984)
### Description  

- In the example, remove `llama-2-13b-chat`,
`mixtral-8x7b-instruct-v0-1`.
- Fix llm friendli streaming implementation.
- Update examples in documentation and remove duplicates.

### Issue  
N/A  

### Dependencies  
None  

### Twitter handle  
`@friendliai`
2025-01-02 12:15:59 -05:00
Hrishikesh Kalola
437ec53e29
langchain.agents: corrected documentation (#28986)
**Description:**
This PR updates the codebase to reflect the deprecation of the AgentType
feature. It includes the following changes:

Documentation Update:

Added a deprecation notice to the AgentType class comment.
Provided a reference to the official LangChain migration guide for
transitioning to LangGraph agents.
Reference Link: https://python.langchain.com/docs/how_to/migrate_agent/

**Twitter handle:** @hrrrriiiishhhhh

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2025-01-02 12:13:42 -05:00
Mohammad Mohtashim
49a26c1fca
(Community): Fix Keyword argument for AzureAIDocumentIntelligenceParser (#28959)
- **Description:** Fix the `body` keyword argument for
`AzureAIDocumentIntelligenceParser`
- **Issue:** #28948
2025-01-02 11:27:12 -05:00
ccurme
efc687a13b
community[patch]: fix instantiation for Slack tools (#28990)
Believe the current implementation raises PydanticUserError following
[this](https://github.com/pydantic/pydantic/releases/tag/v2.10.1)
Pydantic release.

Resolves https://github.com/langchain-ai/langchain/issues/28989
2025-01-02 16:14:17 +00:00
Yunlin Mao
c59093d67f
docs: add modelscope endpoint (#28941)
## Description

To integrate ModelScope inference API endpoints for both Embeddings,
LLMs and ChatModels, install the package
`langchain-modelscope-integration` (as discussed in issue #28928 ). This
is necessary because the package name `langchain-modelscope` was already
registered by another party.

ModelScope is a premier platform designed to connect model checkpoints
with model applications. It provides the necessary infrastructure to
share open models and promote model-centric development. For more
information, visit GitHub page:
[ModelScope](https://github.com/modelscope).
2025-01-02 10:08:41 -05:00
Bagatur
1c797ac68f
infra: speed up unit tests (#28974)
Co-authored-by: Erick Friis <erick@langchain.dev>
2025-01-02 04:13:08 +00:00
Morgante Pell
79fc9b6b04
cli: bump gritql version (#28981)
**Description:**

bump gritql dependency, to use new binary names from
[here](https://github.com/getgrit/gritql/pull/565)

**Issue:**

fixes https://github.com/langchain-ai/langchain/issues/27822
2025-01-01 20:02:46 -08:00
Bagatur
edbe7d5f5e
core,anthropic[patch]: fix with_structured_output typing (#28950) 2024-12-28 15:46:51 -05:00
dabzr
ffbe5b2106
partners: fix default value for stop_sequences in ChatGroq (#28924)
- **Description:**  
This PR addresses an issue with the `stop_sequences` field in the
`ChatGroq` class. Currently, the field is defined as:
```python
stop: Optional[Union[List[str], str]] = Field(None, alias="stop_sequences")
```  
This causes the language server (LSP) to raise an error indicating that
the `stop_sequences` parameter must be implemented. The issue occurs
because `Field(None, alias="stop_sequences")` is treated differently from
`Field(default=None, alias="stop_sequences")`.


![image](https://github.com/user-attachments/assets/bfc34cb1-c664-4c31-b856-8f18419c7350)
To resolve the issue, the field is updated to:  
```python
stop: Optional[Union[List[str], str]] = Field(default=None, alias="stop_sequences")
```  
While this issue does not affect runtime behavior, it ensures
compatibility with LSPs and improves the development experience.
- **Issue:** N/A  
- **Dependencies:** None
2024-12-26 16:43:34 -05:00
Andy Wermke
5940ed3952
community: Fix error handling bug in ChatDeepInfra (#28918)
In the async ClientResponse, `response.text` is not a string property,
but an asynchronous function returning a string.
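For reference, a minimal sketch of the corrected pattern:

```python
import aiohttp

async def fetch_text(url: str) -> str:
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            # ClientResponse.text is a coroutine method, not a property,
            # so it must be awaited
            return await response.text()
```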
2024-12-26 14:45:12 -05:00
zep.hyr
7b4d2d5d44
Community: Add cost information for missing OpenAI model (#28882)
In the previous commit, the cached model key for this model was omitted.
When using the "gpt-4o-2024-11-20" model, the token count in the
callback appeared as 0, and the cost was recorded as 0.

We add model and cost information so that the token count and cost can
be displayed for the respective model.

- The message before modification is as follows.
```
Tokens Used: 0
Prompt Tokens: 0
Prompt Tokens Cached: 0 
Completion Tokens: 0  
Reasoning Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```

- The message after modification is as follows.
```
Tokens Used: 3783 
Prompt Tokens: 3625
Prompt Tokens Cached: 2560
Completion Tokens: 158
Reasoning Tokens: 0
Successful Requests: 1
Total Cost (USD): $0.010642500000000001
```
2024-12-26 14:28:31 -05:00
Erick Friis
3726a944c0
docs: sorted by downloads [wip] (#28869) 2024-12-23 13:13:35 -08:00
Andreas Motl
6352edf77f
docs: CrateDB: Register package langchain-cratedb, and add minimal "provider" documentation (#28877)
Hi Erick. Coming back from a previous attempt, we now made a separate
package for the CrateDB adapter, called `langchain-cratedb`, as advised.
Other than registering the package within `libs/packages.yml`, this
patch includes a minimal amount of documentation to accompany the advent
of this new package. Let us know about any mistakes we made, or changes
you would like to see. Thanks, Andreas.

## About
- **Description:** Register a new database adapter package,
`langchain-cratedb`, providing traditional vector store, document
loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More »
CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status
- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):**
https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?
Is this applicable for this kind of patch?
> - [ ] **Add tests and docs**: If you're adding a new integration,
please include
> 1. a test for the integration, preferably unit tests that do not rely
on network access,
> 2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

## Q&A
1. Notebooks that use the LangChain CrateDB adapter are currently at
[CrateDB LangChain
Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain),
and the documentation refers to them. Because they are derived from very
old blueprints coming from LangChain 0.0.x times, we guess they need a
refresh before adding them to `docs/docs/integrations`. Is it applicable
to merge this minimal package registration + documentation patch, which
already includes valid code snippets in `cratedb.mdx`, and add
corresponding notebooks on behalf of a subsequent patch later?

2. How would it work getting into the tabular list of _Integration
Packages_ enumerated on the [documentation entrypoint page about
Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth,
@simonprickett, if you can find the time. Thanks!
2024-12-23 10:55:44 -05:00
Wang Ran (汪然)
e5c9da3eb6
core[patch]: remove redundant imports (#28861)
`Graph` has already been imported at line 62.
2024-12-23 10:31:23 -05:00
Adrián Panella
8d9907088b
community(azuresearch): allow to use any valid credential (#28873)
Add option to use any valid credential type.
Differentiates async cases needed by Azure Search.

This could replace the use of a static token
2024-12-23 10:05:48 -05:00
Mohammad Mohtashim
41b6a86bbe
Community: LlamaCppEmbeddings embed_documents and embed_query (#28827)
- **Description:** `embed_documents` and `embed_query` were throwing
the error stated in the issue. The issue was that the `Llama` client
returns the embeddings in a nested list, which was not being accounted
for in the current implementation, and therefore the stated error was
being raised.
- **Issue:** #28813

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-23 09:50:22 -05:00
Darien Schettler
32917a0b98
Update dataframe.py (#28871)
community: optimize DataFrame document loader

**Description:**
Simplify the `lazy_load` method in the DataFrame document loader by
combining text extraction and metadata cleanup into a single operation.
This makes the code more concise while maintaining the same
functionality.

**Issue:** N/A

**Dependencies:** None

**Twitter handle:** N/A
2024-12-22 19:16:16 -05:00
yeounhak
f38fc89f35
community: Corrected aload func to be asynchronous from webBaseLoader (#28337)
- **Description:** The aload function, contrary to its name, is not an
asynchronous function, so it cannot work concurrently with other
asynchronous functions.

- **Issue:** #28336 

- **Test:** Done

- **Docs:**
[here](e0a95e5646/docs/docs/integrations/document_loaders/web_base.ipynb (L201))

- **Lint:** All checks passed


---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-20 14:42:52 -05:00
Mohammad Mohtashim
8cf5f20bb5
required tool_choice added for ChatHuggingFace (#28851)
- **Description:** HuggingFace Inference Client V3 now supports
`required` as a `tool_choice` option, which has been added here.
- **Issue:** #28842
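A usage sketch (model id hypothetical):

```python
from langchain_core.tools import tool
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

llm = HuggingFaceEndpoint(repo_id="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model id
chat = ChatHuggingFace(llm=llm)
# "required" forces the model to call at least one tool
chat_with_tools = chat.bind_tools([add], tool_choice="required")
```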
2024-12-20 12:06:04 -05:00
Sylvain DEPARTE
fcba567a77
partners: allow to set Prefix in AIMessage (for MistralAI) (#28846)
**Description:**

Added ability to set `prefix` attribute to prevent error : 
```
httpx.HTTPStatusError: Error response 400 while fetching https://api.mistral.ai/v1/chat/completions: {"object":"error","message":"Expected last role User or Tool (or Assistant with prefix True) for serving but got assistant","type":"invalid_request_error","param":null,"code":null}
```

Co-authored-by: Sylvain DEPARTE <sylvain.departe@wizbii.com>
2024-12-20 11:09:45 -05:00
Jacob Mansdorfer
6d81137325
community: adding langchain-predictionguard partner package documentation (#28832)
- *[x] **PR title**: "community: adding langchain-predictionguard
partner package documentation"

- *[x] **PR message**:
- **Description:** This PR adds documentation for the
langchain-predictionguard package to main langchain repo, along with
deprecating current Prediction Guard LLMs package. The LLMs package was
previously broken, so I also updated it one final time to allow it to
continue working from this point onward. . This enables users to chat
with LLMs through the Prediction Guard ecosystem.
    - **Package Links**: 
        -  [PyPI](https://pypi.org/project/langchain-predictionguard/)
- [Github
Repo](https://www.github.com/predictionguard/langchain-predictionguard)
    - **Issue:** None
    - **Dependencies:** None
- **Twitter handle:** [@predictionguard](https://x.com/predictionguard)

- *[x] **Add tests and docs**: All docs have been added for the partner
package, and the current LLMs package test was updated to reflect
changes.


- *[x] **Lint and test**: Linting tests are all passing.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-20 10:51:44 -05:00
ccurme
f0e858b4e3
core[patch]: release 0.3.28 (#28837) 2024-12-19 17:52:32 -05:00
ccurme
137d1e9564
langchain[patch]: fix test following update to langchain-openai (#28838) 2024-12-19 22:39:48 +00:00
Emmanuel Leroy
c8db5a19ce
langchain_community.chat_models.oci_generative_ai: Fix a bug when using optional parameters in tools (#28829)
When using tools with optional parameters, the parameter `type` is no
longer available since the langchain update to 0.3 (because of the
pydantic upgrade?) and there is now an `anyOf` field instead. This
results in the `type` being `None` in the chat request for the tool
parameter, and the LLM call fails with the error:

```
oci.exceptions.ServiceError: {'target_service': 'generative_ai_inference', 
'status': 400, 'code': '400', 
'opc-request-id': '...', 
'message': 'Parameter definition must have a type.', 
'operation_name': 'chat'
...
}
```

Example code that fails:

```python
from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI
from langchain_core.tools import tool
from typing import Optional

llm = ChatOCIGenAI(
        model_id="cohere.command-r-plus",
        service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
        compartment_id="ocid1.compartment.oc1...",
        auth_profile="your_profile",
        auth_type="API_KEY",
        model_kwargs={"temperature": 0, "max_tokens": 3000},
)

@tool
def test(example: Optional[str] = None):
    """This is the tool to use to test things

    Args:
        example: example variable, defaults to None
    """
    return "this is a test"

llm_with_tools = llm.bind_tools([test])

result = llm_with_tools.invoke("can you make a test for g")
```

This PR sets the param type to `any` in that case, and fixes the
problem.
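
In rough outline, the fix resolves a parameter's type like the sketch
below (not the exact patch; `_resolve_param_type` is a hypothetical
helper named for illustration):

```python
# a rough sketch of the idea behind the fix:
# pydantic v2 renders Optional[str] as {"anyOf": [{"type": "string"}, {"type": "null"}]},
# so when "type" is absent we fall back to the permissive "any" type
def _resolve_param_type(prop: dict) -> str:
    if "type" in prop:
        return prop["type"]
    return "any"  # covers the "anyOf" case produced by Optional parameters
```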

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-19 22:17:34 +00:00
Bagatur
c3ccd93c12
patch openai json mode test (#28831) 2024-12-19 21:43:32 +00:00
Bagatur
ce6748dbfe
xfail openai image token count test (#28828) 2024-12-19 21:23:30 +00:00
Anusha Karkhanis
26bdf40072
Langchain_Community: SQL LanguageParser (#28430)
## Description
(This PR has contributions from @khushiDesai, @ashvini8, and
@ssumaiyaahmed).

This PR addresses **Issue #11229**, which calls for SQL
support in document parsing. This is integrated into the generic
TreeSitter parsing library, allowing LangChain users to easily load
codebases in SQL as smaller, manageable "documents."

This pull request adds a new `SQLSegmenter` class, which provides
the SQL integration.
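
A minimal usage sketch (assuming the `"sql"` language key this PR
registers with `LanguageParser`; the directory of `.sql` files is
illustrative):

```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser

# load a hypothetical directory of .sql files as segmented documents
loader = GenericLoader.from_filesystem(
    "./queries",
    glob="*.sql",
    parser=LanguageParser(language="sql"),
)
docs = loader.load()
```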

## Issue
**Issue #11229**: Add support for a variety of languages to
LanguageParser

## Testing
We created a file `test_sql.py` with several tests to ensure the
`SQLSegmenter` is functional. Below are the tests we added:

- `def test_is_valid`: Checks SQL validity.
- `def test_extract_functions_classes`: Extracts individual SQL
statements.
- `def test_simplify_code`: Simplifies SQL code with comments.

---------

Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com>
Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com>
Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com>
Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-19 20:30:57 +00:00
Bagatur
a7f2148061
openai[patch]: Release 0.2.14 (#28826) 2024-12-19 11:56:44 -08:00
Bagatur
1378ddfa5f
openai[patch]: type reasoning_effort (#28825) 2024-12-19 19:36:49 +00:00
Erick Friis
6a37899b39
core: dont mutate tool_kwargs during tool run (#28824)
fixes https://github.com/langchain-ai/langchain/issues/24621
2024-12-19 18:11:56 +00:00
Qun
033ac41760
fix crash when using create_xml_agent with parameterless function as … (#26002)
When using `create_xml_agent` or `create_json_chat_agent` to create an
agent, and the function corresponding to the tool is a parameterless
function, the `XMLAgentOutputParser` or `JSONAgentOutputParser` will
parse the tool input into an empty string, `BaseTool` will parse it into
a positional argument.
So, the program will eventually crash because we invoke a parameterless
function with a positional argument. Specifically, the code below will raise
StopIteration in
[_parse_input](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools/base.py#L419)
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_json_chat_agent, create_xml_agent
from langchain_openai import ChatOpenAI

prompt = hub.pull("hwchase17/react-chat-json")

llm = ChatOpenAI()
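# `tools` is assumed to be defined elsewhere and to include a parameterless tool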

# agent = create_xml_agent(llm, tools, prompt)
agent = create_json_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(......)
```

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 13:00:46 -05:00
Luke
f69695069d
text_splitters: Add HTMLSemanticPreservingSplitter (#25911)
**Description:** 

Current HTML splitters rely on secondary use of the
`RecursiveCharacterTextSplitter` to further chunk the document into
manageable chunks. The issue with this is that it fails to maintain
important structures such as tables and lists within HTML.

This implementation of an HTML splitter allows the user to define a
maximum chunk size, HTML elements to preserve in full, an option to
preserve `<a>` href links in the output, and custom handlers.

The core splitting begins with headers, similar to `HTMLHeaderSplitter`.
If these sections exceed the `max_chunk_size`, further recursive
splitting is triggered. During this splitting, elements listed as
preserved will be excluded from the splitting process. This can cause
chunks to be slightly larger than the max size, depending on preserved
length. However, all contextual relevance of the preserved item remains
intact.
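
A minimal sketch of the intended usage (the parameter names follow the
description above and should be treated as assumptions):

```python
from langchain_text_splitters import HTMLSemanticPreservingSplitter

html_string = "<h1>Intro</h1><p>Some text.</p><table><tr><td>kept intact</td></tr></table>"

splitter = HTMLSemanticPreservingSplitter(
    headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")],
    max_chunk_size=500,
    elements_to_preserve=["table", "ul"],  # never split these elements
    preserve_links=True,
)
chunks = splitter.split_text(html_string)
```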

**Custom Handlers**: Sometimes, companies such as Atlassian have custom
HTML elements that are not parsed by default with `BeautifulSoup`.
Custom handlers allow a user to provide a function to be run whenever a
specific HTML tag is encountered. This allows the user to preserve and
gather information within custom HTML tags that `bs4` would potentially
miss during extraction.

**Dependencies:** User will need to install `bs4` in their project to
utilise this class

I have also added in `how_to` and unit tests, which require `bs4` to
run, otherwise they will be skipped.

Flowchart of process:


![HTMLSemanticPreservingSplitter](https://github.com/user-attachments/assets/20873c36-22ed-4c80-884b-d3c6f433f5a7)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 12:09:22 -05:00
Tommaso De Lorenzo
24bfa062bf
langchain: add support for Google Anthropic Vertex AI model garden provider in init_chat_model (#28177)
Simple modification to add support for Anthropic models deployed in the
Google Vertex AI Model Garden to `init_chat_model`, by importing
`ChatAnthropicVertex`.
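
A sketch of the resulting usage (the provider key and kwargs below are
assumptions based on this description; requires
`langchain-google-vertexai` to be installed):

```python
from langchain.chat_models import init_chat_model

llm = init_chat_model(
    "claude-3-5-sonnet-v2@20241022",           # illustrative Vertex model id
    model_provider="google_anthropic_vertex",  # assumed provider key
    project="my-gcp-project",                  # hypothetical GCP project
    location="us-east5",
)
```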

- [x] **Lint and test**
2024-12-19 12:06:21 -05:00
Erick Friis
ff7b01af88
anthropic: less pydantic for client (#28823) 2024-12-19 08:00:02 -08:00
Erick Friis
f1d783748a
anthropic: sdk bump (#28820) 2024-12-19 15:39:21 +00:00
Erick Friis
907f36a6e9
fireworks: fix lint (#28821) 2024-12-19 15:36:36 +00:00
Erick Friis
6526db4871
community: bump core (#28819) 2024-12-19 06:41:53 -08:00
Vignesh A
4c9acdfbf1
Community : Add OpenAI prompt caching and reasoning tokens tracking (#27135)
Added token tracking for OpenAI's prompt caching and reasoning tokens.
Costs updated from https://openai.com/api/pricing/.

usage example
```python
from langchain_community.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="o1-mini",temperature=1)

with get_openai_callback() as cb:
    response = llm.invoke("hi "*1500)
    print(cb)
```
Output
```
Tokens Used: 1720
	Prompt Tokens: 1508
		Prompt Tokens Cached: 1408
	Completion Tokens: 212
		Reasoning Tokens: 192
Successful Requests: 1
Total Cost (USD): $0.0049559999999999995
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-19 09:31:13 -05:00
ScriptShi
97f1e1d39f
community: tablestore vector store check the dimension of the embedding when writing it to store. (#28812)
Added some restrictions (a check on the embedding dimension at write
time) to a vector store I previously released in community.
2024-12-19 09:30:43 -05:00
Wang Ran (汪然)
f48755d35b
core: typo Utilities for tests. -> Utilities for pydantic. (#28814)
**Description:** typo
2024-12-19 09:26:17 -05:00
Wang Ran (汪然)
51b8ddaf10
core: typo in runnable (#28815)
Thank you for contributing to LangChain!

**Description:** Typo
2024-12-19 09:25:57 -05:00
Erick Friis
3b036a1cf2
partners/fireworks: release 0.2.6 (#28805) 2024-12-18 22:48:35 +00:00
Erick Friis
4eb8bf7793
partners/anthropic: release 0.3.1 (#28801) 2024-12-18 22:45:38 +00:00
Lu Peng
50afa7c4e7
community: add new parameter default_headers (#28700)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- "community: 1. add new parameter `default_headers` for oci model
deployments and oci chat model deployments. 2. updated k parameter in
OCIModelDeploymentLLM class."


- [x] **PR message**:
- **Description:** 1. add new parameters `default_headers` for oci model
deployments and oci chat model deployments. 2. updated k parameter in
OCIModelDeploymentLLM class.


- [x] **Add tests and docs**:
  1. unit tests
  2. notebook

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 22:33:23 +00:00
Erick Friis
cc616de509
partners/xai: release 0.1.1 (#28806) 2024-12-18 22:15:24 +00:00
Erick Friis
ba8c1b0d8c
partners/groq: release 0.2.2 (#28804) 2024-12-18 22:12:02 +00:00
Erick Friis
a119cae5bd
partners/mistralai: release 0.2.4 (#28803) 2024-12-18 22:11:48 +00:00
Erick Friis
514d78516b
partners/ollama: release 0.2.2 (#28802) 2024-12-18 22:11:08 +00:00
Bagatur
68940dd0d6
openai[patch]: Release 0.2.13 (#28800) 2024-12-18 22:08:47 +00:00
Erick Friis
4dc28b43ac
community: release 0.3.13 (#28798) 2024-12-18 21:58:46 +00:00
Bagatur
557f63c2e6
core[patch]: Release 0.3.27 (#28799) 2024-12-18 21:58:03 +00:00
Bagatur
4a531437bb
core[patch], openai[patch]: Handle OpenAI developer msg (#28794)
- Convert developer openai messages to SystemMessage
- store additional_kwargs={"__openai_role__": "developer"} so that the
correct role can be reconstructed if needed
- update ChatOpenAI to read in openai_role
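
A minimal sketch of the round-trip described above:

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="o1")
messages = [
    # reconstructed as an OpenAI developer-role message on the wire
    SystemMessage(
        content="Respond concisely.",
        additional_kwargs={"__openai_role__": "developer"},
    ),
    HumanMessage(content="What is 2 + 2?"),
]
response = llm.invoke(messages)
```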

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 21:54:07 +00:00
Erick Friis
079f1d93ab
langchain: release 0.3.13 (#28797) 2024-12-18 12:32:00 -08:00
Yuxin Chen
3256b5d6ae
text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter (#28373)
- **Description:** 
This PR resolves an issue with the
`ExperimentalMarkdownSyntaxTextSplitter` class, which retains the
internal state across multiple calls to the `split_text` method. This
behaviour caused an unintended accumulation of chunks in `self`
variables, leading to incorrect outputs when processing multiple
Markdown files sequentially.

- Modified `libs/text-splitters/langchain_text_splitters/markdown.py` to
reset the relevant internal attributes at the start of each `split_text`
invocation. This ensures each call processes the input independently.
- Added unit tests in
`libs/text-splitters/tests/unit_tests/test_text_splitters.py` to verify
the fix and ensure the state does not persist across calls.
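
A small sketch of the fixed behavior:

```python
from langchain_text_splitters import ExperimentalMarkdownSyntaxTextSplitter

splitter = ExperimentalMarkdownSyntaxTextSplitter()
docs_a = splitter.split_text("# Doc A\n\nFirst file.")
docs_b = splitter.split_text("# Doc B\n\nSecond file.")
# with the fix, docs_b contains only Doc B's chunks; previously it also
# accumulated Doc A's chunks from the earlier call
```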

- **Issue:**  
Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).

- **Dependencies:**
No additional dependencies are introduced with this change.


- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.  
- [x] Ran `make format`, `make lint`, and `make test` to ensure
compliance with project standards.

---------

Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 20:27:59 +00:00
Mohammad Mohtashim
7c8f977695
Community: Fix with_structured_output for ChatSambaNovaCloud (#28796)
- **Description:** The `kwargs` argument was being checked as a None
object, which prevented the rest of the code in `with_structured_output`
from executing. The check has been fixed in this PR.
- **Issue:** #28776
2024-12-18 14:35:06 -05:00
V.Prasanna kumar
684b146b18
Fixed adding float values into DynamoDB (#26562)
Thank you for contributing to LangChain!

- [x] **PR title**: community: add float messages into DynamoDB chat
message history


- [x] **PR message**: 
- **Description:** pushing float values into DynamoDB raises an error;
solved by converting them to `str` type
    - **Issue:** Float values are not getting pushed
    - **Twitter handle:** VpkPrasanna
    
    
Have added a utility function for `str` conversion; let me know where to
place it, happy to do a commit.
    
    This PR is from a discussion in #26543
    
    @hwchase17 @baskaryan @efriis

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 13:45:00 -05:00
William FH
50ea1c3ea3
[Core] respect tracing project name cvar (#28792) 2024-12-18 10:02:02 -08:00
Martin Triska
e6b41d081d
community: DocumentLoaderAsParser wrapper (#27749)
## Description

This pull request introduces the `DocumentLoaderAsParser` class, which
acts as an adapter to transform document loaders into parsers within the
LangChain framework. The class enables document loaders that accept a
`file_path` parameter to be utilized as blob parsers. This is
particularly useful for integrating various document loading
capabilities seamlessly into the LangChain ecosystem.

When merged together with PR
https://github.com/langchain-ai/langchain/pull/27716, it opens options
for `SharePointLoader` / `OneDriveLoader` to process any filetype that
has a document loader.

### Features

- **Flexible Parsing**: The `DocumentLoaderAsParser` class can adapt any
document loader that meets the criteria of accepting a `file_path`
argument, allowing for lazy parsing of documents.
- **Compatibility**: The class has been designed to work with various
document loaders, making it versatile for different use cases.

### Usage Example

To use the `DocumentLoaderAsParser`, you would initialize it with a
suitable document loader class and any required parameters. Here’s an
example of how to do this with the `UnstructuredExcelLoader`:

```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.excel import UnstructuredExcelLoader

# Initialize the parser adapter with UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# Use parser, for ex. pass it to MimeTypeBasedParser
MimeTypeBasedParser(
    handlers={
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser
    }
)
```


- **Dependencies:** None
- **Twitter handle:** @martintriska1

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 12:47:08 -05:00
Erick Friis
9b024d00c9
text-splitters: release 0.3.4 (#28795) 2024-12-18 09:44:36 -08:00
Erick Friis
5cf965004c
core: release 0.3.26 (#28793) 2024-12-18 17:28:42 +00:00
Mohammad Mohtashim
d49df4871d
[Community]: Image Extraction Fixed for PDFPlumberParser (#28491)
- **Description:** One-bit images were raising an error, which has been
fixed in this PR for `PDFPlumberParser`
 - **Issue:** #28480

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 11:45:48 -05:00
binhnd102
f723a8456e
Fixes: community: fix LanceDB return no metadata (#27024)
- [x] Fix when LanceDB returns a table without a metadata column
- **Description:** Check the table schema; if it has no metadata column,
init the Document with a metadata argument equal to an empty dict
    - **Issue:** https://github.com/langchain-ai/langchain/issues/27005

- [x] **Add tests and docs**

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-18 15:21:28 +00:00
ANSARI MD AAQIB AHMED
91d28ef453
Add langchain-yt-dlp Document Loader Documentation (#28775)
## Overview
This PR adds documentation for the `langchain-yt-dlp` package, a YouTube
document loader that uses `yt-dlp` for Youtube videos metadata
extraaction.

## Changes
- Added documentation notebook for YoutubeLoader
- Updated packages.yml to include langchain-yt-dlp

## Motivation
The existing LangChain YoutubeLoader was unable to fetch YouTube
metadata due to changes in YouTube's structure. This package resolves
those issues by leveraging the `yt-dlp` library.

## Features
- Reliable YouTube metadata extraction

## Related
- Package Repository: https://github.com/aqib0770/langchain-yt-dlp
- PyPI Package: https://pypi.org/project/langchain-yt-dlp/

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 10:16:50 -05:00
GITHUBear
33b1fb95b8
partners: langchain-oceanbase Integration (#28782)
Hi, langchain team! I'm a maintainer of
[OceanBase](https://github.com/oceanbase/oceanbase).

Following the integration guidance, I created a Python lib named
[langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase)
to integrate `OceanBase Vector Store` with `LangChain`.

So I'd like to add the required docs. I will appreciate your feedback.
Thank you!

---------

Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-18 14:51:49 +00:00
Rave Harpaz
986b752fc8
Add OCI Generative AI new model and structured output support (#28754)
- [X] **PR title**: 
 community: Add new model and structured output support


- [X] **PR message**: 
- **Description:** add support for meta llama 3.2 image handling, and
JSON mode for structured output
    - **Issue:** NA
    - **Dependencies:** NA
    - **Twitter handle:** NA


- [x] **Add tests and docs**: 
  1. we have updated our unit tests,
  2. no changes required for documentation.


- [x] **Lint and test**:
`make format`, `make lint` and `make test` all ran successfully

---------

Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-18 09:50:25 -05:00
David Pryce-Compson
ef24220d3f
community: adding haiku 3.5 and opus callbacks (#28783)
**Description:** 
Adding new AWS Bedrock model and their respective costs to match
https://aws.amazon.com/bedrock/pricing/ for the Bedrock callback

**Issue:** 
Missing models for those that wish to try them out

**Dependencies:**
Nothing added

**Twitter handle:**
@David_Pryce and / or @JamfSoftware

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2024-12-18 09:45:10 -05:00
Yudai Kotani
05a44797ee
langchain_community: Add default None values to DocumentAttributeValue class properties (#28785)
**Description**: 
This PR addresses an issue where the DocumentAttributeValue class
properties did not have default values of None. By explicitly setting
the Optional attributes (DateValue, LongValue, StringListValue, and
StringValue) to default to None, this change ensures the class functions
as expected when no value is provided for these attributes.

**Changes Made**:
Added default None values to the following properties of the
DocumentAttributeValue class:
- DateValue
- LongValue
- StringListValue
- StringValue

Removed the invalid argument extra="allow" from the BaseModel
inheritance.

**Dependencies**: None.

**Twitter handle (optional)**: @__korikori1021

**Checklist**
- [x] Verified that KendraRetriever works as expected after the changes.

Co-authored-by: y1u0d2a1i <y.kotani@raksul.com>
2024-12-18 09:43:04 -05:00
Satyam Kumar
90f7713399
refactor: improve docstring parsing logic for Google style (#28730)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"


Description:  
Improved the `_parse_google_docstring` function in `langchain/core` to
support parsing multi-paragraph descriptions before the `Args:` section
while maintaining compliance with Google-style docstring guidelines.
This change ensures better handling of docstrings with detailed function
descriptions.

Issue:  
Fixes #28628

Dependencies:  
None.

Twitter handle:  
@isatyamks

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-18 09:35:19 -05:00
Dong Shin
0b1359801e
community: add trust_env at web_base_loader (#28514)
- **Description:** I am working to address a similar issue to the one
mentioned in https://github.com/langchain-ai/langchain/pull/19499.
Specifically, there is a problem with the Webbase loader used in
open-webui, where it fails to load the proxy configuration. This PR aims
to resolve that issue.
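
A minimal sketch of the new option (the `trust_env` flag per this PR lets
the underlying session pick up proxy settings from the environment, e.g.
HTTP_PROXY/HTTPS_PROXY):

```python
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://example.com", trust_env=True)
docs = loader.load()
```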





---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 21:18:16 -05:00
Erick Friis
be738aa7de
packages: enable vertex api build (#28773) 2024-12-17 11:31:14 -08:00
Bagatur
ac278cbe8b
core[patch]: export InjectedToolCallId (#28772) 2024-12-17 19:29:20 +00:00
Bagatur
e4d3ccf62f
json mode standard test (#25497)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 18:47:34 +00:00
Frank Dai
e81433497b
community: support Confluence cookies (#28760)
**Description**: Some Confluence instances don't support personal access
tokens, so cookies are a convenient way to authenticate. This PR adds
support for Confluence cookies.
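
A sketch of the new authentication option (the cookie name and value
below are illustrative):

```python
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="https://confluence.example.com",
    cookies={"JSESSIONID": "abc123"},  # hypothetical session cookie
)
```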

**Twitter handle**: soulmachine
2024-12-17 12:16:36 -05:00
ccurme
b745281eec
anthropic[patch]: increase timeouts for integration tests (#28767)
Some tests consistently ran into the 10s limit in CI.
2024-12-17 15:47:17 +00:00
Vinit Kudva
a00258ec12
chroma: fix persistence if client_settings is passed in (#25199)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-17 10:03:02 -05:00
Omri Eliyahu Levy
f8883a1321
partners/voyageai: enable setting output dimension (#28740)
Voyage has introduced voyage-3-large and voyage-code-3, which feature
different output dimensions by leveraging a technique called "Matryoshka
Embeddings" (see blog -
https://blog.voyageai.com/2024/12/04/voyage-code-3/).
These two models are available in various sizes: [256, 512, 1024, 2048]
(https://docs.voyageai.com/docs/embeddings#model-choices).

This PR adds the option to set the required output dimension.
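
A sketch of the new option (the `output_dimension` parameter name is an
assumption based on this description):

```python
from langchain_voyageai import VoyageAIEmbeddings

embeddings = VoyageAIEmbeddings(model="voyage-3-large", output_dimension=512)
vectors = embeddings.embed_documents(["hello world"])  # 512-dimensional vectors
```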
2024-12-17 10:02:00 -05:00
German Martin
3a1d05394d
community: Apache AGE wrapper. Ensure Node Uniqueness by ID. (#28759)
**Description:**

The Apache AGE graph integration incorrectly handled node merging,
allowing duplicate nodes with different IDs but the same type and other
properties. Unlike
[Neo4j](cdf6202156/libs/community/langchain_community/graphs/neo4j_graph.py (L47)),
[Memgraph](cdf6202156/libs/community/langchain_community/graphs/memgraph_graph.py (L50)),
[Kuzu](cdf6202156/libs/community/langchain_community/graphs/kuzu_graph.py (L253)),
and
[Gremlin](cdf6202156/libs/community/langchain_community/graphs/gremlin_graph.py (L165)),
it did not use the node ID as the primary identifier for merging.

This inconsistency caused data integrity issues and unexpected behavior
when users expected updates to specific nodes by ID.

**Solution:**
This PR modifies the `node_insert_query` to `MERGE` nodes based on label
and ID *only* and updates properties with `SET`, aligning the behavior
with other graph database integrations. The `_format_properties` method
was also modified to handle id overrides.

**Impact:**

This fix ensures data integrity by preventing duplicate nodes, and
provides a consistent behavior across graph database integrations.
2024-12-17 09:21:59 -05:00
gsa9989
cdf6202156
cosmosdbnosql: Added Cosmos DB NoSQL Semantic Cache Integration with tests and jupyter notebook (#24424)
* Added Cosmos DB NoSQL Semantic Cache Integration with tests and
jupyter notebook

---------

Co-authored-by: Aayush Kataria <aayushkataria3011@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 21:57:05 -05:00
Brian Burgin
27a9056725
community: Fix ChatLiteLLMRouter runtime issues (#28163)
**Description:** Fix ChatLiteLLMRouter ctor validation and model_name
parameter
**Issue:** #19356, #27455, #28077
**Twitter handle:** @bburgin_0
2024-12-16 18:17:39 -05:00
Mikhail Khludnev
00deacc67e
docs, external: introduce langchain-localai (#28751)
Thank you for contributing to LangChain!

Referring to https://github.com/mkhludnev/langchain-localai

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 22:22:37 +00:00
Erick Friis
d4b5e7ef22
community: recommend RedisVectorStore over Redis (#28749) 2024-12-16 21:08:30 +00:00
Hiros
8f5e72de05
community: Correctly handle multi-element rich text (#25762)
**Description:**

- Add _concatenate_rich_text method to combine all elements in rich text
arrays
- Update load_page method to use _concatenate_rich_text for rich text
properties
- Ensure all text content is captured, including inline code and
formatted text
- Add unit tests to verify correct handling of multi-element rich text
This fix prevents truncation of content after backticks or other
formatting elements.

 **Issue:**

Using the Notion DB loader, the text for `richtext` and `title`
properties was truncated after the first element, as the loader only
read the first element.

**Dependencies:** None.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 20:20:27 +00:00
Antonio Lanza
b2102b8cc4
text-splitters: Inconsistent results with NLTKTextSplitter's add_start_index=True (#27782)
This PR closes #27781

# Problem
The current implementation of `NLTKTextSplitter` uses
`sent_tokenize`. However, `sent_tokenize` doesn't preserve the characters
between two tokenized sentences; hence, this behavior throws errors when
we are using `add_start_index=True`, as described in issue #27781. In
particular:
```python
from nltk.tokenize import sent_tokenize

output1 = sent_tokenize("Innovation drives our success. Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output1)
output2 = sent_tokenize("Innovation drives our success.        Collaboration fosters creative solutions. Efficiency enhances data management.", language="english")
print(output2)
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
>>> ['Innovation drives our success.', 'Collaboration fosters creative solutions.', 'Efficiency enhances data management.']
```

# Solution
With this new `use_span_tokenize` parameter, we can use NLTK to create
sentences (with `span_tokenize`), but also add extra chars to be sure
that we still can map the chunks to the original text.
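
A sketch of the new parameter in use:

```python
from langchain_text_splitters import NLTKTextSplitter

text = "Innovation drives our success.        Collaboration fosters creative solutions."
splitter = NLTKTextSplitter(
    chunk_size=60,
    chunk_overlap=0,
    add_start_index=True,
    use_span_tokenize=True,  # the parameter introduced by this PR
)
docs = splitter.create_documents([text])
for doc in docs:
    print(doc.metadata["start_index"], repr(doc.page_content))
```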

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-16 19:53:15 +00:00
Tari Yekorogha
d262d41cc0
community: added FalkorDB vector store support i.e implementation, test, docs an… (#26245)
**Description:** Added support for FalkorDB Vector Store, including its
implementation, unit tests, documentation, and an example notebook. The
FalkorDB integration allows users to efficiently manage and query
embeddings in a vector database, with relevance scoring and maximal
marginal relevance search. The following components were implemented:

- Core implementation for FalkorDBVector store.
- Unit tests ensuring proper functionality and edge case coverage.
- Example notebook demonstrating an end-to-end setup, search, and
retrieval using FalkorDB.

**Twitter handle:** @tariyekorogha

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:37:55 +00:00
Aaron Pham
12fced13f4
chore(community): update to OpenLLM 0.6 (#24609)
Update to OpenLLM 0.6, where we decided to make use of OpenLLM's
OpenAI-compatible endpoint. Thus, the OpenLLM integration now just
becomes a thin wrapper around the OpenAI wrapper.

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>

---------

Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 14:30:07 -05:00
Lvlvko
5c17a4ace9
community: support Hunyuan Embedding (#23160)
## description

- I refactored `ChatHunyuan` using the tencentcloud sdk because I found
the original one couldn't work in my application
- I added `HunyuanEmbeddings` using the tencentcloud sdk
- Both of them extend the base classes of LangChain. I have fully tested
them in my application

## Dependencies
- tencentcloud-sdk-python

---------

Co-authored-by: centonhuang <centonhuang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 19:27:19 +00:00
Harrison Chase
de7996c2ca
core: add kwargs support to VectorStore (#25934)
has been missing the passthrough until now

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 18:57:57 +00:00
Lorenzo
b79a1156ed
community: correct return type of get_files_from_directory in github tool (#27885)
### About:
- **Description:** the `get_files_from_directory` method returns a
string, but it's used in other methods that expect a `List[str]`
- **Issue:** None
- **Dependencies:** None

This pull request introduces a new method `list_files` with the old
logic of `get_files_from_directory`, but it returns a `List[str]` at the
end. The behavior of `get_files_from_directory` is not changed.
2024-12-16 10:30:33 -08:00
Sheepsta300
580a8d53f9
community: Add configurable VisualFeatures to the AzureAiServicesImageAnalysisTool (#27444)
Thank you for contributing to LangChain!

- [ ] **PR title**: community: Add configurable `VisualFeatures` to the
`AzureAiServicesImageAnalysisTool`


- [ ] **PR message**:  
- **Description:** The `AzureAiServicesImageAnalysisTool` is a good
service and utilises the Azure AI Vision package under the hood.
However, since the creation of this tool, new `VisualFeatures` have been
added to allow the user to request other image specific information to
be returned. Currently, the tool offers neither configuration of which
features should be return nor does it offer any newer feature types. The
aim of this PR is to address this and expose more of the Azure Service
in this integration.
- **Dependencies:** no new dependencies in the main class file,
azure.ai.vision.imageanalysis added to extra test dependencies file.


- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. Although no tests exist for already implemented Azure Service tools,
I've created 3 unit tests for this class that test initialisation and
credentials, local file analysis and a test for the new changes/
features option.


- [ ] **Lint and test**: All linting has passed.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 18:30:04 +00:00
Erick Friis
1c120e9615
core: xml output parser tags docstring (#28745) 2024-12-16 18:25:16 +00:00
Ana
ebab2ea81b
Fix Azure National Cloud authentication using token (RBAC) (Generated by Ana - AI SDE) (#25843)
This pull request addresses the issue with authenticating Azure National
Cloud using token (RBAC) in the AzureSearch vectorstore implementation.

## Changes

- Modified the `_get_search_client` method in `azuresearch.py` to pass
`additional_search_client_options` to the `SearchIndexClient` instance.

## Implementation Details

The patch updates the `SearchIndexClient` initialization to include the
`additional_search_client_options` parameter:

```python
index_client: SearchIndexClient = SearchIndexClient(
    endpoint=endpoint,
    credential=credential,
    user_agent=user_agent,
    **additional_search_client_options
)
```

This change allows the `audience` parameter to be correctly passed when
using Azure National Cloud, fixing the authentication issues with
GovCloud & RBAC.
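
A sketch of how this can be used (the endpoint and `audience` values are
illustrative, and token-based auth is assumed to apply when no key is
provided):

```python
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector_store = AzureSearch(
    azure_search_endpoint="https://my-search.search.azure.us",  # hypothetical endpoint
    azure_search_key=None,  # rely on token (RBAC) auth instead of a key
    index_name="my-index",
    embedding_function=embeddings.embed_query,
    additional_search_client_options={"audience": "https://search.azure.us"},
)
```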

This patch was generated by [Ana - AI SDE](https://openana.ai/), an
AI-powered software development assistant.

This is a fix for [Issue
25823](https://github.com/langchain-ai/langchain/issues/25823)

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-12-16 18:22:24 +00:00
chenzimin
169d419581
community: Remove all other keys in ChatLiteLLM and add api_key (#28097)
Thank you for contributing to LangChain!

- **PR title**: "community: Remove all other keys in ChatLiteLLM and add
api_key"


- **PR message**: Currently, no api_key is passed to LiteLLM, and
LiteLLM only takes one api_key parameter. Therefore I removed all current
`*_api_key` attributes (they are not used) and added an `api_key` that is
passed to ChatLiteLLM.
  - Should fix issue #27826

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 17:54:29 +00:00
German Martin
d5d18c62b3
community: Apache AGE wrapper additional edge cases. (#28151)
Description:
The current AGEGraph() implementation does some custom wrapping for graph
queries. The method here is _wrap_query(), as it parses the fields from
the original query to add some SQL context to them.
This improves the current parsing logic to cover additional edge cases
that are added to the test coverage; basically, if any node property name
or value contains the literal "return", it will break the graph / SQL
query.
We discovered this while dealing with real-world datasets; it is not an
uncommon scenario, and I think it needs to be covered.
2024-12-16 11:28:01 -05:00
Rock2z
768e4a7fd4
[community][fix] Compatibility support to bump up wikibase-rest-api-client version (#27316)
**Description:**

This PR addresses the `TypeError: sequence item 0: expected str
instance, FluentValue found` error when invoking `WikidataQueryRun`. The
root cause was an incompatible version of the
`wikibase-rest-api-client`, which caused the tool to fail when handling
`FluentValue` objects instead of strings.

The current implementation only supports `wikibase-rest-api-client<0.2`,
but the latest version is `0.2.1`, where the current implementation
breaks. Additionally, the error message advises users to install the
latest version: [code
reference](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/wikidata.py#L125C25-L125C32).
Therefore, this PR updates the tool to support the latest version of
`wikibase-rest-api-client`.

Key changes:
- Updated the handling of `FluentValue` objects to ensure compatibility
with the latest `wikibase-rest-api-client`.
- Removed the restriction to `wikibase-rest-api-client<0.2` and updated
to support the latest version (`0.2.1`).

**Issue:**

Fixes [#24093](https://github.com/langchain-ai/langchain/issues/24093) –
`TypeError: sequence item 0: expected str instance, FluentValue found`.

**Dependencies:**

- Upgraded `wikibase-rest-api-client` to the latest version to resolve
the issue.

---------

Co-authored-by: peiwen_zhang <peiwen_zhang@email.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 16:22:18 +00:00
André Quintino
a26c786bc5
community: refactor opensearch query constructor to use wildcard instead of match in the contain comparator (#26653)
- **Description:** Changed the comparator to use a wildcard query
instead of match. This modification allows for partial text matching on
analyzed fields, which improves the flexibility of the search by
performing full-text searches that aren't limited to exact matches.
- **Issue:** The previous implementation used a match query, which
performs exact matches on analyzed fields. This approach limited the
search capabilities by requiring the query terms to align with the
indexed text. The modification to use a wildcard query instead addresses
this limitation. The wildcard query allows for partial text matching,
which means the search can return results even if only a portion of the
term matches the text. This makes the search more flexible and suitable
for use cases where exact matches aren't necessary or expected, enabling
broader full-text searches across analyzed fields.
In short, the problem was that match queries were too restrictive, and
the change to wildcard queries enhances the ability to perform partial
matches.
- **Dependencies:** none
- **Twitter handle:** @Andre_Q_Pereira

---------

Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 11:16:34 -05:00
Davi Schumacher
0f9b4bf244
community[patch]: update dynamodb chat history to update instead of overwrite (#22397)
**Description:**
The current implementation of `DynamoDBChatMessageHistory` updates the
`History` attribute for a given chat history record by first extracting
the existing contents into memory, appending the new message, and then
using the `put_item` method to put the record back. This has the effect
of overwriting any additional attributes someone may want to include in
the record, like chat session metadata.

This PR suggests changing from using `put_item` to using `update_item`
instead which will keep any other attributes in the record untouched.
The change is backward compatible since
1. `update_item` is an "upsert" operation, creating the record if it
doesn't already exist, otherwise updating it
2. It only touches the db insert call and passes the exact same
information. The rest of the class is left untouched
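
A rough sketch of the new write path (the table and key names are
illustrative):

```python
import boto3

table = boto3.resource("dynamodb").Table("ChatHistory")

# upserts the History attribute and leaves any other attributes untouched
table.update_item(
    Key={"SessionId": "session-1"},
    UpdateExpression="SET History = :h",
    ExpressionAttributeValues={":h": [{"type": "human", "content": "hi"}]},
)
```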

**Dependencies:**
None

**Tests and docs:**
No unit tests currently exist for the `DynamoDBChatMessageHistory`
class. This PR adds the file
`libs/community/tests/unit_tests/chat_message_histories/test_dynamodb_chat_message_history.py`
to test the `add_message` and `clear` methods. I wanted to use the moto
library to mock DynamoDB calls but I could not get poetry to resolve it
so I mocked those calls myself in the test. Therefore, no test
dependencies were added.

The change was tested on a test DynamoDB table as well. The first three
images below show the current behavior. First a message is added to chat
history, then a value is inserted in the record in some other attribute,
and finally another message is added to the record, destroying the other
attribute.

![using_put_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/426acd62-fe29-42f4-b75f-863fb8b3fb21)

![using_put_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/f8a1c864-7114-4fe3-b487-d6f9252f8f92)

![using_put_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/8b691e08-755e-4877-8969-0e9769e5d28a)

The next three images show the new behavior. Once again a value is added
to an attribute other than the History attribute, but now when the
followup message is added it does not destroy that other attribute. The
History attribute itself is unaffected by this change.

![using_update_1_first_message](https://github.com/langchain-ai/langchain/assets/29493541/3e0d76ed-637e-41cd-82c7-01a86c468634)

![using_update_2_add_attribute](https://github.com/langchain-ai/langchain/assets/29493541/52585f9b-71a2-43f0-9dfc-9935aa59c729)

![using_update_3_second_message](https://github.com/langchain-ai/langchain/assets/29493541/f94c8147-2d6f-407a-9a0f-86b94341abff)

The doc located at `docs/docs/integrations/memory/aws_dynamodb.ipynb`
required no changes and was tested as well.
2024-12-16 10:38:00 -05:00
Christophe Bornet
6ddd5dbb1e
community: Add FewShotSQLTool (#28232)
The `FewShotSQLTool` gets some SQL query examples from a
`BaseExampleSelector` for a given question.
This is useful to provide [few-shot
examples](https://python.langchain.com/docs/how_to/sql_prompting/#few-shot-examples)
capability to an SQL agent.

Example usage:
```python
from langchain.agents.agent_toolkits.sql.prompt import SQL_PREFIX
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.vectorstores import AstraDB
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings

# FewShotSQLTool is the new tool added by this PR; llm, db, examples and
# the Astra DB credentials are assumed to be defined elsewhere

embeddings = OpenAIEmbeddings()

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    embeddings,
    AstraDB,
    k=5,
    input_keys=["input"],
    collection_name="lc_few_shots",
    token=ASTRA_DB_APPLICATION_TOKEN,
    api_endpoint=ASTRA_DB_API_ENDPOINT,
)

few_shot_sql_tool = FewShotSQLTool(
    example_selector=example_selector,
    description="Input to this tool is the input question, output is a few SQL query examples related to the input question. Always use this tool before checking the query with sql_db_query_checker!"
)

agent = create_sql_agent(
    llm=llm, 
    db=db, 
    prefix=SQL_PREFIX + "\nYou MUST get some example queries before creating the query.", 
    extra_tools=[few_shot_sql_tool]
)

result = agent.invoke({"input": "How many artists are there?"})
```

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-16 15:37:21 +00:00
Mohammad Mohtashim
8d746086ab
Added bind_tools support for ChatMLX along with small fix in _stream (#28743)
- **Description:** Added support for `bind_tools` as requested in the
issue. Plus, two issues in `_stream` were fixed:
    - Corrected the positional argument passing for `generate_step`
    - Accounted for the case where the `token` returned by
`generate_step` is an integer
- **Issue:** #28692
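
A usage sketch (the model id is illustrative, and MLX requires
Apple-silicon support):

```python
from langchain_community.chat_models.mlx import ChatMLX
from langchain_community.llms.mlx_pipeline import MLXPipeline
from langchain_core.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

llm = MLXPipeline.from_model_id("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
chat_with_tools = ChatMLX(llm=llm).bind_tools([add])
```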
2024-12-16 09:52:49 -05:00
Jorge Piedrahita Ortiz
558b65ea32
community: SamabaStudio Tool Calling and Structured Output (#28025)
Description: Add tool calling and structured output support for
SambaStudio chat models, docs included

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 06:15:19 +00:00
clairebehue
fb44e74ca4
community: fix AzureSearch Oauth with azure_ad_access_token (#26995)
**Description:** 
AzureSearch vector store: create a wrapper class on
`azure.core.credentials.TokenCredential` (which is not instantiable) to
fix OAuth usage with the `azure_ad_access_token` argument

**Issue:** [the issue it
fixes](https://github.com/langchain-ai/langchain/issues/26216)

 **Dependencies:** None

- [x] **Lint and test**

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:56:45 +00:00
SirSmokeAlot
29305cd948
community: O365Toolkit - send_event - fixed timezone error (#25876)
**Description**: Fixed the formatting of start and end time
**Issue**: The old formatting resulted in a timezone error every time
**Dependencies**: /
**Twitter handle**: /

---------

Co-authored-by: Yannick Opitz <yannick.opitz@gob.de>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-16 05:32:28 +00:00
Erick Friis
4f6ccb7080
text-splitters: extended-tests without socket (#28736) 2024-12-16 05:19:50 +00:00
Erick Friis
8ec1c72e03
text-splitters: test without socket (#28732) 2024-12-15 22:10:35 +00:00
Aayush Kataria
d417e4b372
Community: Azure CosmosDB No Sql Vector Store: Full Text and Hybrid Search Support (#28716)
Thank you for contributing to LangChain!

- Added [full
text](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/full-text-search)
and [hybrid
search](https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search)
support for Azure CosmosDB NoSql Vector Store
- Added a new enum called CosmosDBQueryType which supports the following
values:
    - VECTOR = "vector"
    - FULL_TEXT_SEARCH = "full_text_search"
    - FULL_TEXT_RANK = "full_text_rank"
    - HYBRID = "hybrid"
- The user now needs to provide this query_type to the similarity_search
method for the vector store to make the correct query API call.
- Added a couple of workarounds, as the FULL_TEXT_RANK and HYBRID query
functions don't support parameterized queries right now. I have added
TODOs in place, and will remove these workarounds by the end of January.
- Added necessary test cases and updated the 


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/


---------

Co-authored-by: Erick Friis <erickfriis@gmail.com>
2024-12-15 13:26:32 -08:00
Mohammad Mohtashim
4c1871d9a8
community: Passing the model_kwargs correctly while maintaing backward compatability (#28439)
- **Description:** `model_kwargs` was not being passed correctly to
`sentence_transformers.SentenceTransformer`, which has been corrected
while maintaining backward compatibility
- **Issue:** #28436

---------

Co-authored-by: MoosaTae <sadhis.tae@gmail.com>
Co-authored-by: Sadit Wongprayon <101176694+MoosaTae@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-15 20:34:29 +00:00
nhols
a3851cb3bc
community: FAISS vectorstore - consistent Document id field (#28728)
Make sure the id field of Documents in the `FAISS` docstore has the same
id as the values in `index_to_docstore_id`; implement the `get_by_ids`
method.
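
A small sketch of the resulting behavior:

```python
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

vs = FAISS.from_documents(
    [Document(page_content="hello", id="doc-1")],
    OpenAIEmbeddings(),
)
# the docstore entry keeps the same id recorded in index_to_docstore_id,
# and documents can now be fetched directly by id
assert vs.get_by_ids(["doc-1"])[0].id == "doc-1"
```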
2024-12-15 12:23:49 -08:00
Bagatur
a0534ae62a
community[patch]: Release 0.3.12 (#28725) 2024-12-14 22:13:20 +00:00
Bagatur
089e659e03
langchain[patch]: Release 0.3.12 (#28724) 2024-12-14 20:02:18 +00:00
Bagatur
679e3a9970
text-splitters[patch]: Release 0.3.3 (#28723) 2024-12-14 19:20:22 +00:00
Erick Friis
387284c259
core: release 0.3.25 (#28718) 2024-12-14 02:22:28 +00:00
Nawaf Alharbi
decd77c515
community: fix an issue with deepinfra integration (#28715)
Thank you for contributing to LangChain!

- [x] **PR title**: langchain: add URL parameter to ChatDeepInfra class

- [x] **PR message**: add URL parameter to ChatDeepInfra class
- **Description:** This PR introduces a url parameter to the
ChatDeepInfra class in LangChain, allowing users to specify a custom
URL. Previously, the URL for the DeepInfra API was hardcoded to
"https://stage.api.deepinfra.com/v1/openai/chat/completions", which
caused issues when the staging endpoint was not functional. The _url
method was updated to return the value from the url parameter, enabling
greater flexibility and addressing the problem.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:15:29 +00:00
Ben Chambers
008efada2c
[community]: Render documents to graphviz (#24830)
- **Description:** Adds a helper that renders documents with the
GraphVectorStore metadata fields to Graphviz for visualization. This is
helpful for understanding and debugging.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-14 02:02:09 +00:00
Erick Friis
288f204758
docs, community: aerospike docs update (#28717)
Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Jesse S <jschmidt@aerospike.com>
Co-authored-by: dylan <dwelch@aerospike.com>
2024-12-14 00:27:37 +00:00
Vimpas
337fed80a5
community: 🐛 PDF Filter Type Error (#27154)
Thank you for contributing to LangChain!

 **PR title**: "community: fix  PDF Filter Type Error"


  - **Description:** fix  PDF Filter Type Error"
  - **Issue:** the issue #27153 it fixes,
  - **Dependencies:** no
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!



- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/


---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 23:30:29 +00:00
Ryan Parker
12111cb922
community: fallback on core async atransform_documents method for MarkdownifyTransformer (#27866)
# Description
Implements the `atransform_documents` method for
`MarkdownifyTransformer` using the `asyncio` built-in library for
concurrency.

Note that this is mainly for API completeness when working with async
frameworks rather than for performance, since the `markdownify` function
is not I/O bound because it works with `Document` objects already in
memory.

# Issue
Fixes #27865

# Dependencies
No new dependencies added, but
[`markdownify`](https://github.com/matthewwithanm/python-markdownify) is
required since this PR updates the `markdownify` integration.

# Tests and docs
- Tests added
- I did not modify the docstrings since they already described the basic
functionality, and [the API docs also already included a
description](https://python.langchain.com/api_reference/community/document_transformers/langchain_community.document_transformers.markdownify.MarkdownifyTransformer.html#langchain_community.document_transformers.markdownify.MarkdownifyTransformer.atransform_documents).
If it would be helpful, I would be happy to update the docstrings and/or
the API docs.

# Lint and test
- [x] format
- [x] lint
- [x] test

I ran formatting with `make format`, linting with `make lint`, and
confirmed that tests pass using `make test`. Note that some unit tests
pass in CI but may fail when running `make test`. Those unit tests are:
- `test_extract_html` (and `test_extract_html_async`)
- `test_strip_tags` (and `test_strip_tags_async`)
- `test_convert_tags` (and `test_convert_tags_async`)

The reason for the difference is that there are trailing spaces when the
tests are run in the CI checks, and no trailing spaces when run with
`make test`. I ensured that the tests pass in CI, but they may fail with
`make test` due to the addition of trailing spaces.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:32:22 +00:00
Manuel
af2e0a7ede
partners: add 'model' alias for consistency in embedding classes (#28374)
**Description:** This PR introduces a `model` alias for the embedding
classes that contain the attribute `model_name`, to ensure consistency
across the codebase, as suggested by a moderator in a previous PR. The
change aligns the usage of attribute names across the project (see for
example
[here](65deeddd5d/libs/partners/groq/langchain_groq/chat_models.py (L304))).
**Issue:** This PR addresses the suggestion from the review of issue
#28269.
**Dependencies:**  None

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 22:30:00 +00:00
Erick Friis
3107d78517
huggingface: fix standard test lint (#28714) 2024-12-13 22:18:54 +00:00
Kaiwei Zhang
b909d54e70
chroma[patch]: Update logic for assigning ids 2024-12-13 21:58:34 +00:00
Karthik Bharadhwaj
498f0249e2
community[minor]: Opensearch hybridsearch implementation (#25375)
community: add hybrid search in opensearch

# Langchain OpenSearch Hybrid Search Implementation

## Implementation of Hybrid Search: 

I have taken LangChain's OpenSearch integration to the next level by
adding hybrid search capabilities. Building on the existing
OpenSearchVectorSearch class, I have implemented Hybrid Search
functionality (which combines the best of both keyword and semantic
search). This new functionality allows users to harness the power of
OpenSearch's advanced hybrid search features without leaving the
familiar LangChain ecosystem. By blending traditional text matching with
vector-based similarity, the enhanced class delivers more accurate and
contextually relevant results. It's designed to seamlessly fit into
existing LangChain workflows, making it easy for developers to upgrade
their search capabilities.

In implementing the hybrid search for OpenSearch within the LangChain
framework, I also incorporated filtering capabilities. It's important to
note that according to the OpenSearch hybrid search documentation, only
post-filtering is supported for hybrid queries. This means that the
filtering is applied after the hybrid search results are obtained,
rather than during the initial search process.

**Note:** For the implementation of hybrid search, I strictly followed
the official OpenSearch Hybrid search documentation and I took
inspiration from
https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search
Thanks Mate!  

### Experiments

I conducted few experiments to verify that the hybrid search
implementation is accurate and capable of reproducing the results of
both plain keyword search and vector search.

Experiment - 1: Hybrid Search (keyword_weight = 1, vector_weight = 0)

I conducted an experiment to verify the accuracy of my hybrid search
implementation by comparing it to a plain keyword search. For this test,
I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid
search, effectively giving full weightage to the keyword component. The
results from this hybrid search configuration matched those of a plain
keyword search, confirming that my implementation can accurately
reproduce keyword-only search results when needed. It's important to
note that while the results were the same, the scores differed between
the two methods. This difference is expected because the plain keyword
search in OpenSearch uses the BM25 algorithm for scoring, whereas the
hybrid search still performs both keyword and vector searches before
normalizing the scores, even when the vector component is given zero
weight. This experiment validates that my hybrid search solution
correctly handles the keyword search component and properly applies the
weighting system, demonstrating its accuracy and flexibility in
emulating different search scenarios.


Experiment - 2: Hybrid Search (keyword_weight = 0.0, vector_weight = 1.0)

For experiment-2, I took the inverse approach to further validate my
hybrid search implementation. I set the keyword_weight to 0 and the
vector_weight to 1, effectively giving full weightage to the vector
search component (KNN search). I then compared these results with a pure
vector search. The outcome was consistent with my expectations: the
results from the hybrid search with these settings exactly matched those
from a standalone vector search. This confirms that my implementation
accurately reproduces vector search results when configured to do so. As
with the first experiment, I observed that while the results were
identical, the scores differed between the two methods. This difference
in scoring is expected and can be attributed to the normalization
process in hybrid search, which still considers both components even
when one is given zero weight. This experiment further validates the
accuracy and flexibility of my hybrid search solution, demonstrating its
ability to effectively emulate pure vector search when needed while
maintaining the underlying hybrid search structure.



Experiment - 3: Hybrid Search - balanced (keyword_weight = 0.5, vector_weight = 0.5)

For experiment 3, I adopted a balanced approach: both keyword_weight and
vector_weight set to 0.5, giving equal importance to the keyword-based
and vector-based components. This configuration aims to leverage the
strengths of both search methods simultaneously, considering lexical
matches and semantic similarity equally. Such a balanced setup is often a
good default for real-world applications, as it captures exact keyword
matches as well as contextually relevant results that do not contain the
exact search terms.
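
The balanced pipeline is configured the same way; only the weights change
(a sketch, reusing the method introduced in the steps below):

```python
# Hedged sketch of experiment 3: equal keyword and vector weighting.
opensearch_vectorstore.configure_search_pipelines(
    pipeline_name="search_pipeline_keyword_0.5_vector_0.5",
    keyword_weight=0.5,
    vector_weight=0.5,
)
```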

Please see the notebook below for the full experiments.

**Notebook:**
https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb

### Instructions for Performing Hybrid Search

**Step-1: Instantiate the OpenSearchVectorSearch class:**
```python
import os

from langchain_community.vectorstores import OpenSearchVectorSearch

opensearch_vectorstore = OpenSearchVectorSearch(
    index_name=os.getenv("INDEX_NAME"),
    embedding_function=embedding_model,  # an Embeddings instance defined elsewhere
    opensearch_url=os.getenv("OPENSEARCH_URL"),
    http_auth=(os.getenv("OPENSEARCH_USERNAME"), os.getenv("OPENSEARCH_PASSWORD")),
    use_ssl=False,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
)
```

**Parameters:**
1. **index_name:** The name of the OpenSearch index to use.
2. **embedding_function:** The function or model used to generate
embeddings for the documents. It's assumed that embedding_model is
defined elsewhere in the code.
3. **opensearch_url:** The URL of the OpenSearch instance.
4. **http_auth:** A tuple containing the username and password for
authentication.
5. **use_ssl:** Set to False, indicating that the connection to
OpenSearch is not using SSL/TLS encryption.
6. **verify_certs:** Set to False, which means the SSL certificates are
not being verified. This is often used in development environments but
is not recommended for production.
7. **ssl_assert_hostname:** Set to False, disabling hostname
verification in SSL certificates.
8. **ssl_show_warn:** Set to False, suppressing SSL-related warnings.

**Step-2: Configure Search Pipeline:**

To enable hybrid search, you first need to configure a search pipeline.

**Implementation Details:**

This method configures a search pipeline in OpenSearch that:
1. Normalizes the scores from both keyword and vector searches using the
min-max technique.
2. Applies the specified weights to the normalized scores.
3. Calculates the final score using an arithmetic mean of the weighted,
normalized scores.
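
Under the hood this corresponds to an OpenSearch search pipeline roughly
like the following sketch, based on OpenSearch's normalization-processor
documentation (the exact body the method builds may differ):

```python
# Hedged sketch of the pipeline body configure_search_pipelines() creates.
pipeline_body = {
    "description": "Hybrid search: 30% keyword, 70% vector",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},
                },
            }
        }
    ],
}
```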


**Parameters:**

* **pipeline_name (str):** A unique identifier for the search pipeline.
It's recommended to use a descriptive name that indicates the weights
used for keyword and vector searches.
* **keyword_weight (float):** The weight assigned to the keyword search
component. This should be a float value between 0 and 1. In this
example, 0.3 gives 30% importance to traditional text matching.
* **vector_weight (float):** The weight assigned to the vector search
component. This should be a float value between 0 and 1. In this
example, 0.7 gives 70% importance to semantic similarity.

```python
opensearch_vectorstore.configure_search_pipelines(
    pipeline_name="search_pipeline_keyword_0.3_vector_0.7",
    keyword_weight=0.3,
    vector_weight=0.7,
)
```

**Step-3: Performing Hybrid Search:**

After creating the search pipeline, you can perform a hybrid search using
the `similarity_search()` method, or any other search method supported by
`langchain`. This combines keyword-based and semantic similarity searches
on your OpenSearch index, leveraging the strengths of both traditional
information retrieval and vector embedding techniques.

**Parameters:**
* **query:** The search query string.
* **k:** The number of top results to return (in this case, 3).
* **search_type:** Set to `hybrid_search` to use both keyword and vector
search capabilities.
* **search_pipeline:** The name of the previously created search
pipeline.

```python
query = "what are the country named in our database?"

top_k = 3

pipeline_name = "search_pipeline_keyword_0.3_vector_0.7"

matched_docs = opensearch_vectorstore.similarity_search_with_score(
    query=query,
    k=top_k,
    search_type="hybrid_search",
    search_pipeline=pipeline_name,
)

matched_docs
```
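
Each result is a `(Document, score)` tuple, so the matches can be
inspected like this:

```python
for doc, score in matched_docs:
    print(f"{score:.4f}  {doc.page_content[:80]}")
```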

Twitter handle: @iamkarthik98

---------

Co-authored-by: Karthik Kolluri <karthik.kolluri@eidosmedia.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:34:12 -05:00
Philippe PRADOS
f3fb5a9c68
community[minor]: Fix json._validate_metadata_func() (#22842)
The JSON loader, in _validate_metadata_func(), checks the consistency of
the metadata_func() callback. To do this, it invokes the callback and
makes sure it receives a dictionary in response. However, the validation
call does not mirror the way the callback is invoked later (as shown on
line 100). This generates errors if, for example, the function looks like
this:
```python
def generate_metadata(json_node: Dict[str, Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "source": url,
        "row": kwargs["seq_num"],
        "question": json_node.get("question"),
    }

loader = JSONLoader(
    file_path=file_path,
    content_key="answer",
    jq_schema='.[]',
    metadata_func=generate_metadata,
    text_content=False,
)
```
To avoid this, the verification must comply with the specifications.
This patch does just that.
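
For illustration, a spec-compliant validation invokes the callback the
same way later calls do; a minimal sketch, assuming the real calls pass
keys such as `source` and `seq_num` in the kwargs dict:

```python
# Hedged sketch: validate metadata_func against kwargs shaped like real calls.
sample_metadata = {"source": "sample.json", "seq_num": 1}
result = metadata_func({}, sample_metadata)
if not isinstance(result, dict):
    raise ValueError("metadata_func must return a dict")
```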

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 21:24:20 +00:00
Keiichi Hirobe
67fd554512
core[patch]: throw exception indexing code if deletion fails in vectorstore (#28103)
The delete methods in the VectorStore and DocumentIndex interfaces
return a status indicating the result. Therefore, we can assume that
their implementations don't throw exceptions but instead return a result
indicating whether the delete operations have failed. The current
implementation doesn't check the returned value, so I modified it to
throw an exception when the operation fails.
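
A minimal sketch of the described check (not the exact patch):

```python
# Hedged sketch: surface delete failures instead of silently ignoring them.
delete_ok = vector_store.delete(ids=stale_ids)
if delete_ok is False:
    raise ValueError(f"Failed to delete documents: {stale_ids}")
```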

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 16:14:27 -05:00
Keiichi Hirobe
258b3be5ec
core[minor]: add new clean up strategy "scoped_full" to indexing (#28505)
~Note that this PR is now Draft, so I didn't add change to `aindex`
function and didn't add test codes for my change.
After we have an agreement on the direction, I will add commits.~

`batch_size` is very difficult to decide because setting a large number
like >10000 will impact VectorDB and RecordManager, while setting a
small number will delete records unnecessarily, leading to redundant
work, as the `IMPORTANT` section says.
On the other hand, we can't use `full` because the loader returns just a
subset of the dataset in our use case.

I guess many people are in the same situation as us.

So, as one of the possible solutions for it, I would like to introduce a
new argument, `scoped_full_cleanup`.
This argument will be valid only when `cleanup` is Full. If True, Full
cleanup deletes all documents that haven't been updated AND that are
associated with source ids that were seen during indexing. Default is
False.

This change keeps backward compatibility.
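
A usage sketch of the strategy as it landed per the PR title (argument
names assumed):

```python
# Hedged sketch: "scoped_full" only cleans up stale documents whose source
# ids were seen during this indexing run, so partial loads are safe.
from langchain.indexes import index  # import location assumed

index(
    docs_subset,        # a loader's partial output
    record_manager,
    vector_store,
    cleanup="scoped_full",
    source_id_key="source",
)
```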

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-13 20:35:25 +00:00
Eugene Yurtsev
ce90b25313
core[patch]: Update error message in indexing code for unreachable code assertion (#28712)
Minor update for error message that should never be triggered
2024-12-13 20:21:14 +00:00
Keiichi Hirobe
da28cf1f54
core[patch]: Reverts PR #25754 and add unit tests (#28702)
I reported the bug 2 weeks ago here:
https://github.com/langchain-ai/langchain/issues/28447

I believe this is a critical bug for the indexer, so I submitted a PR to
revert the change and added unit tests to prevent similar bugs from
being introduced in the future.

@eyurtsev Could you check this?
2024-12-13 15:13:06 -05:00
ScriptShi
b0a298894d
community[minor]: Add TablestoreVectorStore (#25767)
- [x] **PR title**:  community: add TablestoreVectorStore



- [x] **PR message**: 
    - **Description:** add TablestoreVectorStore
    - **Dependencies:** none


- [x] **Add tests and docs**: If you're adding a new integration, please
include
  1. a test for the integration: yes
  2. an example notebook showing its use: yes


---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-13 11:17:28 -08:00
Erick Friis
86b3c6e81c
community: make old stub for QuerySQLDataBaseTool private to skip api ref (#28711) 2024-12-13 10:43:23 -08:00
Martin Triska
05ebe1e66b
Community: add modified_since argument to O365BaseLoader (#28708)
## What are we doing in this PR
We're adding `modified_since` optional argument to `O365BaseLoader`.
When set, O365 loader will only load documents newer than
`modified_since` datetime.

## Why?
OneDrives / SharePoints can contain a large number of documents. The
current approach is to download and parse all files and let the indexer
deal with duplicates. This can be prohibitively time-consuming,
especially when using an OCR-based parser like
[zerox](fa06188834/libs/community/langchain_community/document_loaders/pdf.py (L948)).
This argument allows skipping documents that are older than the known
time of indexing.

_Q: What if a file was modified during the last indexing process?
A: Users can set `modified_since` conservatively and the indexer will
still take care of duplicates._
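
A usage sketch (the loader subclass and its other arguments are
assumptions):

```python
# Hedged sketch: only fetch documents changed since the last indexing run.
from datetime import datetime, timezone

from langchain_community.document_loaders.onedrive import OneDriveLoader

loader = OneDriveLoader(
    drive_id="my-drive-id",  # placeholder
    modified_since=datetime(2024, 12, 1, tzinfo=timezone.utc),
)
docs = loader.load()
```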





---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-13 17:30:17 +00:00
Bagatur
fa06188834
community[patch]: fix QuerySQLDatabaseTool name (#28659)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-12 19:16:03 -08:00
Erick Friis
48ab91b520
docs: more useful vercel warnings (#28699) 2024-12-13 03:07:24 +00:00
Michael Chin
28cb2cefc6
docs: Fix stack diagram in community README (#28685)
- **Description:** The stack diagram illustration in the community
README fails to render due to an invalid branch reference. This PR
replaces the broken image link with a valid one referencing master
branch.
2024-12-12 13:33:50 -08:00
Botong Zhu
13c3c4a210
community: fixes json loader not getting texts with json standard (#27327)
This PR fixes JSONLoader._get_text not converting objects to JSON strings
correctly.
If an object is serializable and is not a dict, JSONLoader uses Python's
built-in str() to convert it to a string. This can produce strings that
don't follow the JSON standard. For example, a list is converted to a
string with single quotes, and if json.loads tries to load this string,
it raises an error.
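
A quick illustration of the difference:

```python
import json

value = ["a", "b"]
str(value)         # "['a', 'b']"  -> single quotes, not valid JSON
json.dumps(value)  # '["a", "b"]'  -> valid JSON that json.loads can parse
```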

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:33:45 +00:00
Lorenzo
4149c0dd8d
community: add method to create branch and list files for gitlab tool (#27883)
### About

- **Description:** The GitLab utilities used by the GitLab tool have no
methods to create branches or to list branches and files, although these
already exist for GitHub
- **Issue:** None
- **Dependencies:** None

This pull request adds the following methods (see the usage sketch after
the list):
- create_branch
- list_branches_in_repo
- set_active_branch
- list_files_in_main_branch
- list_files_in_bot_branch
- list_files_from_directory

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 19:11:35 +00:00
Prathamesh Nimkar
ca054ed1b1
community: ChatSnowflakeCortex - Add streaming functionality (#27753)
Description:

snowflake.py
- Added _stream and _stream_content methods to enable streaming functionality
- Fixed pydantic issues and added functionality with the overall langchain version upgrade
- Added bind_tools method for agentic workflow support through langgraph
- Updated the _generate method to account for agentic workflow support through langgraph
- Cosmetic changes to comments and if conditions

snowflake.ipynb
- Added a _stream example
- Cosmetic changes to comments
- Fixed lint errors

check_pydantic.sh
- Decreased counter from 126 to 125 as suggested when formatting

---------

Co-authored-by: Prathamesh Nimkar <prathamesh.nimkar@snowflake.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 18:35:40 -08:00
Wang, Yi
d834c6b618
huggingface: fix tool argument serialization in _convert_TGI_message_to_LC_message (#26075)
Currently `_convert_TGI_message_to_LC_message` replaces `'` in the tool
arguments, so an argument like "It's" will be converted to `It"s` and
could cause a json parser to fail.
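
A quick demonstration of why the replacement breaks parsing:

```python
import json

args = '{"text": "It\'s fine"}'
json.loads(args)  # parses fine

broken = args.replace("'", '"')  # -> '{"text": "It"s fine"}'
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    print(f"parse error: {err}")
```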

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
2024-12-11 18:34:32 -08:00
Lakindu Boteju
5a31792bf1
community: Add support for cross-region inference profile IDs in Bedrock Anthropic Claude token cost calculation (#28167)
This change modifies the token cost calculation logic to support
cross-region inference profile IDs for Anthropic Claude models. Instead
of explicitly listing all regional variants of new inference profile IDs
in the cost dictionaries, the code now extracts a base model ID from the
input model ID (or inference profile ID), making it more maintainable
and automatically supporting new regional variants.

These inference profile IDs follow the format:
`<region>.<vendor>.<model-name>` (e.g.,
`us.anthropic.claude-3-haiku-xxx`, `eu.anthropic.claude-3-sonnet-xxx`).
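
A sketch of the extraction described above (the exact prefix handling in
the PR is assumed):

```python
# Hedged sketch: strip a region prefix such as "us." or "eu." to recover
# the base model id used for the cost lookup.
def base_model_id(model_id: str) -> str:
    parts = model_id.split(".")
    if len(parts) > 2 and parts[0] in {"us", "eu", "ap", "apac"}:
        return ".".join(parts[1:])
    return model_id

print(base_model_id("us.anthropic.claude-3-haiku-xxx"))
# -> "anthropic.claude-3-haiku-xxx"
```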

Cross-region inference profiles are system-defined identifiers that
enable distributing model inference requests across multiple AWS
regions. They help manage unplanned traffic bursts and enhance
resilience during peak demands without additional routing costs.

References for Amazon Bedrock's cross-region inference profiles:-
-
https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html
-
https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 02:33:50 +00:00
fatmelon
d1e0ec7b55
community: VectorStores: Azure Cosmos DB Mongo vCore with DiskANN (#27329)
# Description
Add a new vector index type `diskann` to the Azure Cosmos DB Mongo vCore
vector store. The DiskANN paper can be found here: [DiskANN: Fast Accurate
Billion-point Nearest Neighbor Search on a Single
Node](https://proceedings.neurips.cc/paper_files/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf).

## Sample Usage
```python
import os

from pymongo import MongoClient
from langchain_community.vectorstores.azure_cosmos_db import (
    AzureCosmosDBVectorSearch,
    CosmosDBSimilarityType,
    CosmosDBVectorSearchType,
)

# INDEX_NAME = "izzy-test-index-2"
# NAMESPACE = "izzy_test_db.izzy_test_collection"
# DB_NAME, COLLECTION_NAME = NAMESPACE.split(".")

client: MongoClient = MongoClient(CONNECTION_STRING)
collection = client[DB_NAME][COLLECTION_NAME]

model_deployment = os.getenv(
    "OPENAI_EMBEDDINGS_DEPLOYMENT", "smart-agent-embedding-ada"
)
model_name = os.getenv("OPENAI_EMBEDDINGS_MODEL_NAME", "text-embedding-ada-002")

vectorstore = AzureCosmosDBVectorSearch.from_documents(
    docs,                # documents to index, defined elsewhere
    openai_embeddings,   # an Embeddings instance, defined elsewhere
    collection=collection,
    index_name=INDEX_NAME,
)

# Read more about these variables here:
# https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search
maxDegree = 40
dimensions = 1536
similarity_algorithm = CosmosDBSimilarityType.COS
kind = CosmosDBVectorSearchType.VECTOR_DISKANN
lBuild = 20

vectorstore.create_index(
    dimensions=dimensions,
    similarity=similarity_algorithm,
    kind=kind,
    max_degree=maxDegree,
    l_build=lBuild,
)
```

## Dependencies
No additional dependencies were added

---------

Co-authored-by: Yang Qiao (from Dev Box) <yangqiao@microsoft.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:54:04 +00:00
manukychen
ba9b95cd23
Community: Adding bulk_size as a setable param for OpenSearchVectorSearch (#28325)
Description:
When using langchain.retrievers.parent_document_retriever.py with an
OpenSearchVectorSearch vector store, the bulk_size param passed into the
OpenSearchVectorSearch constructor was not honored by
ParentDocumentRetriever.add_documents(): it was overwritten with the
default of 500 inside the methods of the OpenSearchVectorSearch class
(e.g., add_texts(), add_embeddings()...).

This PR fixes that. Thanks!

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-12 01:45:22 +00:00
xintoteai
45f9c9ae88
langchain: fixed weaviate (v4) vectorstore import for self-query retriever (#28675)
Co-authored-by: Xin Heng <xin.heng@gmail.com>
2024-12-11 15:53:41 -08:00
Thomas van Dongen
ee640d6bd3
community: fixed bug in model2vec embedding code (#28670)
This PR fixes a bug with the current implementation for Model2Vec
embeddings where `embed_documents` does not work as expected.

- **Description**: the current implementation uses `encode_as_sequence`
for encoding documents. This is incorrect, as `encode_as_sequence`
creates token embeddings and not mean embeddings. The normal `encode`
function handles both single and batched inputs and should be used
instead. The return type was also incorrect, as encode returns a NumPy
array. This PR converts the embedding to a list so that the output is
consistent with the Embeddings ABC.
2024-12-11 15:50:56 -08:00
Brian Sharon
b20230c800
community: use correct id_key when deleting by id in LanceDB wrapper (#28655)
- **Description:** The current version of the `delete` method assumes
that the id field will always be called `id`.
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** ugh, Twitter :D 

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:49:35 +00:00
Mohammad Mohtashim
fa155a422f
[Community]: requests_kwargs not being used in _fetch (#28646)
- **Description:** `requests_kwargs` was not being passed to `_fetch`,
which fetches pages asynchronously. This PR makes sure we pass
`requests_kwargs` to `_fetch` just like `_scrape` does.
- **Issue:** #28634

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-11 23:46:54 +00:00
Mohammad Mohtashim
a37afbe353
mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for MistralAIEmbeddings (#27818)
- **Description:** In the event of a rate limit error from the MistralAI
server, parsing the response JSON raises a KeyError. To address this, a
simple retry mechanism has been implemented to handle cases where the
request limit is exceeded.
  - **Issue:** #27790

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-12-11 17:53:42 -05:00
Vincent Zhang
df5008fe55
community[minor]: FAISS Filter Function Enhancement with Advanced Query Operators (#28207)
## Description
We are submitting as a team of four for a project. Other team members
are @RuofanChen03, @LikeWang10067, @TANYAL77.

This pull requests expands the filtering capabilities of the FAISS
vectorstore by adding MongoDB-style query operators indicated as
follows, while including comprehensive testing for the added
functionality.
- $eq (equals)
- $neq (not equals)
- $gt (greater than)
- $lt (less than)
- $gte (greater than or equal)
- $lte (less than or equal)
- $in (membership in list)
- $nin (not in list)
- $and (all conditions must match)
- $or (any condition must match)
- $not (negation of condition)


## Issue
This closes https://github.com/langchain-ai/langchain/issues/26379.


## Sample Usage
```python
import faiss
import asyncio
from langchain_community.vectorstores import FAISS
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
documents = [
    Document(page_content="Process customer refund request", metadata={"schema_type": "financial", "handler_type": "refund",}),
    Document(page_content="Update customer shipping address", metadata={"schema_type": "customer", "handler_type": "update",}),
    Document(page_content="Process payment transaction", metadata={"schema_type": "financial", "handler_type": "payment",}),
    Document(page_content="Handle customer complaint", metadata={"schema_type": "customer","handler_type": "complaint",}),
    Document(page_content="Process invoice payment", metadata={"schema_type": "financial","handler_type": "payment",})
]

async def search(vectorstore, query, schema_type, handler_type, k=2):
    schema_filter = {"schema_type": {"$eq": schema_type}}
    handler_filter = {"handler_type": {"$eq": handler_type}}
    combined_filter = {
        "$and": [
            schema_filter,
            handler_filter,
        ]
    }
    base_retriever = vectorstore.as_retriever(
        search_kwargs={"k":k, "filter":combined_filter}
    )
    return await base_retriever.ainvoke(query)

async def main():
    vectorstore = FAISS.from_texts(
        texts=[doc.page_content for doc in documents],
        embedding=embeddings,
        metadatas=[doc.metadata for doc in documents]
    )
    
    def printt(title, documents):
        print(title)
        if not documents:
            print("\tNo documents found.")
            return
        for doc in documents:
            print(f"\t{doc.page_content}. {doc.metadata}")

    printt("Documents:", documents)
    printt('\nquery="process payment", schema_type="financial", handler_type="payment":', await search(vectorstore, query="process payment", schema_type="financial", handler_type="payment", k=2))
    printt('\nquery="customer update", schema_type="customer", handler_type="update":', await search(vectorstore, query="customer update", schema_type="customer", handler_type="update", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="refund":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="refund", k=2))
    printt('\nquery="refund process", schema_type="financial", handler_type="foobar":', await search(vectorstore, query="refund process", schema_type="financial", handler_type="foobar", k=2))
    print()

if __name__ == "__main__":
    asyncio.run(main())
```

## Output
```
Documents:
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Handle customer complaint. {'schema_type': 'customer', 'handler_type': 'complaint'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="process payment", schema_type="financial", handler_type="payment":
	Process payment transaction. {'schema_type': 'financial', 'handler_type': 'payment'}
	Process invoice payment. {'schema_type': 'financial', 'handler_type': 'payment'}

query="customer update", schema_type="customer", handler_type="update":
	Update customer shipping address. {'schema_type': 'customer', 'handler_type': 'update'}

query="refund process", schema_type="financial", handler_type="refund":
	Process customer refund request. {'schema_type': 'financial', 'handler_type': 'refund'}

query="refund process", schema_type="financial", handler_type="foobar":
	No documents found.

```

---------

Co-authored-by: ruofan chen <ruofan.is.awesome@gmail.com>
Co-authored-by: RickyCowboy <like.wang@mail.utoronto.ca>
Co-authored-by: Shanni Li <tanya.li@mail.utoronto.ca>
Co-authored-by: RuofanChen03 <114096642+ruofanchen03@users.noreply.github.com>
Co-authored-by: Like Wang <102838708+likewang10067@users.noreply.github.com>
2024-12-11 17:52:22 -05:00
like
3048a9a26d
community: tongyi multimodal response format fix to support langchain (#28645)
Description: The multimodal (Tongyi) response format, e.g. `{"message":
{"role": "assistant", "content": [{"text": "图像"}]}}`, is not compatible
with LangChain.
Dependencies: No

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 21:13:26 +00:00
Bagatur
d0e662e43b
community[patch]: Release 0.3.11 (#28658) 2024-12-10 20:51:13 +00:00
Bagatur
91227ad7fd
langchain[patch]: Release 0.3.11 (#28657) 2024-12-10 12:28:14 -08:00
Bagatur
1fbd86a155
core[patch]: Release 0.3.24 (#28656) 2024-12-10 20:19:21 +00:00
Bagatur
e6a62d8422
core,langchain,community[patch]: allow langsmith 0.2 (#28598) 2024-12-10 18:50:58 +00:00
ccurme
bc4dc7f4b1
ollama[patch]: permit streaming for tool calls (#28654)
Resolves https://github.com/langchain-ai/langchain/issues/28543

Ollama recently
[released](https://github.com/ollama/ollama/releases/tag/v0.4.6) support
for streaming tool calls. Previously we would override the `stream`
parameter if tools were passed in.

Covered in standard tests here:
c1d348e95d/libs/standard-tests/langchain_tests/integration_tests/chat_models.py (L893-L897)

Before, the test generates one message chunk:
```python
[
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={
            'model': 'llama3.1',
            'created_at': '2024-12-10T17:49:04.468487Z',
            'done': True,
            'done_reason': 'stop',
            'total_duration': 525471208,
            'load_duration': 19701000,
            'prompt_eval_count': 170,
            'prompt_eval_duration': 31000000,
            'eval_count': 17,
            'eval_duration': 473000000,
            'message': Message(
                role='assistant',
                content='',
                images=None,
                tool_calls=[
                    ToolCall(
                        function=Function(name='magic_function', arguments={'input': 3})
                    )
                ]
            )
        },
        id='run-552bbe0f-8fb2-4105-ada1-fa38c1db444d',
        tool_calls=[
            {
                'name': 'magic_function',
                'args': {'input': 3},
                'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2',
                'type': 'tool_call',
            },
        ],
        usage_metadata={
            'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187
        },
        tool_call_chunks=[
            {
                'name': 'magic_function',
                'args': '{"input": 3}',
                'id': 'b0a4dc07-7d7a-487b-bd7b-ad062c2363a2',
                'index': None,
                'type': 'tool_call_chunk',
            }
        ]
    )
]
```

After, it generates two (tool call in one, response metadata in
another):
```python
[
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={},
        id='run-9a3f0860-baa1-4bae-9562-13a61702de70',
        tool_calls=[
            {
                'name': 'magic_function',
                'args': {'input': 3},
                'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0',
                'type': 'tool_call',
            },
        ],
        tool_call_chunks=[
            {
                'name': 'magic_function',
                'args': '{"input": 3}',
                'id': '5bbaee2d-c335-4709-8d67-0783c74bd2e0',
                'index': None,
                'type': 'tool_call_chunk',
            },
        ],
    ),
    AIMessageChunk(
        content='',
        additional_kwargs={},
        response_metadata={
            'model': 'llama3.1',
            'created_at': '2024-12-10T17:46:43.278436Z',
            'done': True,
            'done_reason': 'stop',
            'total_duration': 514282750,
            'load_duration': 16894458,
            'prompt_eval_count': 170,
            'prompt_eval_duration': 31000000,
            'eval_count': 17,
            'eval_duration': 464000000,
            'message': Message(
                role='assistant', content='', images=None, tool_calls=None
            ),
        },
        id='run-9a3f0860-baa1-4bae-9562-13a61702de70',
        usage_metadata={
            'input_tokens': 170, 'output_tokens': 17, 'total_tokens': 187
        }
    ),
]
```
2024-12-10 12:54:37 -05:00
Johannes Mohren
c1d348e95d
doc-loader: retain Azure Doc Intelligence API metadata in Document parser (#28382)
**Description**:
This PR modifies the doc_intelligence.py parser in the community package
to include all metadata returned by the Azure Doc Intelligence API in
the Document object. Previously, only the parsed content (markdown) was
retained, while other important metadata such as bounding boxes (bboxes)
for images and tables was discarded. These image bboxes are crucial for
supporting use cases like multi-modal RAG workflows when using Azure Doc
Intelligence.

The change ensures that all information returned by the Azure Doc
Intelligence API is preserved by setting the metadata attribute of the
Document object to the entire result returned by the API, rather than an
empty dictionary. This extends the parser's utility for complex use
cases without breaking existing functionality.

**Issue**:
This change does not address a specific issue number, but it resolves a
critical limitation in supporting multimodal workflows when using the
LangChain wrapper for the Azure API.

**Dependencies**:
No additional dependencies are required for this change.

---------

Co-authored-by: jmohren <johannes.mohren@aol.de>
2024-12-10 11:22:58 -05:00
Alex Tonkonozhenko
0d20c314dd
Confluence Loader: Fix CQL loading (#27620)
fix #12082

2024-12-10 11:05:23 -05:00
Katarina Supe
aba2711e7f
community: update Memgraph integration (#27017)
**Description:**
- **Memgraph** no longer relies on `Neo4jGraphStore` but **implements
`GraphStore`**, just like other graph databases.
- **Memgraph** no longer relies on `GraphQAChain`, but implements
`MemgraphQAChain`, just like other graph databases.
- The refresh schema procedure has been updated to try using `SHOW
SCHEMA INFO`. The fallback uses Cypher queries (a combination of schema
and Cypher) → **LangChain integration no longer relies on MAGE
library**.
- The **schema structure** has been reformatted. Regardless of the
procedures used to get schema, schema structure is the same.
- The `add_graph_documents()` method has been implemented. It transforms
`GraphDocument` into Cypher queries and creates a graph in Memgraph. It
implements the ability to use `baseEntityLabel` to improve speed
(`baseEntityLabel` has an index on the `id` property). It also
implements the ability to include sources by creating a `MENTIONS`
relationship to the source document.
- Jupyter Notebook for Memgraph has been updated.
- **Issue:** /
- **Dependencies:** /
- **Twitter handle:** supe_katarina (DX Engineer @ Memgraph)

Closes #25606
2024-12-10 10:57:21 -05:00
ccurme
5c6e2cbcda
ollama[patch]: support structured output (#28629)
- Bump minimum version of `ollama` to 0.4.4 (which also addresses
https://github.com/langchain-ai/langchain/issues/28607).
- Support recently-released [structured
output](https://ollama.com/blog/structured-outputs) feature. This can be
accessed by calling `.with_structured_output` with
`method="json_schema"` (choice of name
[mirrors](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.with_structured_output)
what we have for OpenAI's structured output feature).

`ChatOllama` previously implemented `.with_structured_output` via the
[base
implementation](ec9b41431e/libs/core/langchain_core/language_models/chat_models.py (L1117)).
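
A usage sketch of the new option:

```python
# Hedged sketch of structured output via method="json_schema" (per this PR).
from pydantic import BaseModel

from langchain_ollama import ChatOllama


class CityInfo(BaseModel):
    city: str
    country: str


llm = ChatOllama(model="llama3.1")
structured = llm.with_structured_output(CityInfo, method="json_schema")
result = structured.invoke("What is the capital of France?")
```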
2024-12-10 10:36:00 -05:00
Bagatur
24292c4a31
core[patch]: Release 0.3.23 (#28648) 2024-12-10 10:01:16 +00:00
Bagatur
e24f86e55f
core[patch]: return ToolMessage from tool (#28605) 2024-12-10 09:59:38 +00:00
Erick Friis
ef2f875dfb
core: deprecate PipelinePromptTemplate (#28644) 2024-12-10 03:56:48 +00:00
TamagoTorisugi
0f0df2df60
fix: Set default search_type to 'similarity' in as_retriever method of AzureSearch (#28376)
**Description**
This PR updates the `as_retriever` method in the `AzureSearch` to ensure
that the `search_type` parameter defaults to 'similarity' when not
explicitly provided.

Previously, if the `search_type` was omitted, it did not default to any
specific value but was instead inherited from
`AzureSearchVectorStoreRetriever`, which defaults to 'hybrid'.

This change ensures that the intended default behavior aligns with the
expected usage.

**Issue**
No specific issue was found related to this change.

**Dependencies**
No new dependencies are introduced with this change.

---------

Co-authored-by: prrao87 <prrao87@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 03:40:04 +00:00
Prashanth Rao
8c6eec5f25
community: KuzuGraph needs allow_dangerous_requests, add graph documents via LLMGraphTransformer (#27949)
- [x] **PR title**: "community: Kuzu - Add graph documents via
LLMGraphTransformer"
- This PR adds a new method `add_graph_documents` to use the
`GraphDocument`s extracted by `LLMGraphTransformer` and store in a Kùzu
graph backend.
- This allows users to transform unstructured text into a graph that
uses Kùzu as the graph store.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: pookam90 <pookam@microsoft.com>
Co-authored-by: Pooja Kamath <60406274+Pookam90@users.noreply.github.com>
Co-authored-by: hsm207 <hsm207@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 03:15:28 +00:00
Filip Ratajczak
4e743b5427
Core: google docstring parsing fix (#28404)
- [ ] **PR title**: "core: google docstring parsing fix"

- [x] **PR message**:
- **Description:** Added a solution for invalid parsing of Google-style
docstrings such as:

      Args:
          net_annual_income (float): The user's net annual income (in
          current year dollars).

- **Issue:** Previous code would return arg = "net_annual_income
(float)", which would cause an exception in
_validate_docstring_args_against_annotations.
- **Dependencies:** None


Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 00:27:25 +00:00
Arnav Priyadarshi
b78b2f7a28
community[fix]: Update Perplexity to pass parameters into API calls (#28421)

- **Description:** I realized the invocation parameters were not being
passed into `_generate`, so I added them. I then noticed the parameters
contained some old fields designed for an older OpenAI client, which I
removed. Parameters work fine now.
- **Issue:** Fixes #28229
- **Dependencies:** No new dependencies.
- **Twitter handle:** @arch_plane

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/


Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-10 00:23:31 +00:00
Clément Jumel
cf6d1c0ae7
docs: add Linkup integration documentation (#28366)
## Description

First of all, thanks for the great framework that is LangChain!

At [Linkup](https://www.linkup.so/) we're working on an API to connect
LLMs and agents to the internet and our partner sources. We'd be super
excited to see our API integrated in LangChain! This essentially
consists in adding a LangChain retriever and tool, which is done in our
own [package](https://pypi.org/project/langchain-linkup/). Here we're
simply following the [integration
documentation](https://python.langchain.com/docs/contributing/how_to/integrations/)
and update the documentation of LangChain to mention the Linkup
integration.

We do have tests (both units & integration) in our [source
code](https://github.com/LinkupPlatform/langchain-linkup), and tried to
follow as close as possible the [integration
documentation](https://python.langchain.com/docs/contributing/how_to/integrations/)
which specifically requests to focus on documentation changes for an
integration PR, so I'm not adding tests here, even though the PR
checklist seems to suggest so. Feel free to correct me if I got this
wrong!

By the way, we would be thrilled by being mentioned in the list of
providers which have standalone packages
[here](https://langchain-git-fork-linkupplatform-cj-doc-langchain.vercel.app/docs/integrations/providers/),
is there something in particular for us to do for that? 🙂

## Twitter handle

Linkup_platform
2024-12-09 14:36:25 -08:00
Amir Sadeghi
2c49f587aa
community[fix]: could not locate runnable browser (#28289)
set open_browser to false to resolve "could not locate runnable browser"
error while default browser is None


Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 21:05:52 +00:00
Martin Triska
75bc6bb191
community: [bugfix] fix source path for office files in O365 (#28260)
# What problem are we fixing?

Currently documents loaded using `O365BaseLoader` fetch source from
`file.web_url` (where `file` is `<class 'O365.drive.File'>`). This works
well for `.pdf` documents. Unfortunately office documents (`.xlsx`,
`.docx` ...) pass their `web_url` in following format:

`https://sharepoint_address/sites/path/to/library/root/Doc.aspx?sourcedoc=%XXXXXXXX-1111-1111-XXXX-XXXXXXXXXX%7D&file=filename.xlsx&action=default&mobileredirect=true`

This obfuscates the path to the file. This PR uses the parent folder's
path and the file name to reconstruct the actual location of the file.
Knowing the file's location can be crucial for some RAG applications (the
path to the file can carry information we don't want to lose).

@vbarda Could you please look at this one? I'm @-mentioning you since
we've already closed some PRs together :-)

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 12:34:59 -08:00
Erick Friis
534b8f4364
standard-tests: release 0.3.7 (#28637) 2024-12-09 15:12:18 -05:00
Naka Masato
ce3b69aa05
community: add include_labels option to ConfluenceLoader (#28259)
## **Description:**

Enable `ConfluenceLoader` to include labels with the `include_labels`
option (`false` by default for backward compatibility). The labels are
set in the `Document`'s `metadata`, e.g. `{"labels": ["l1", "l2"]}`.

## Notes

Confluence API supports to get labels by providing `metadata.labels` to
`expand` query parameter

All of the following functions support `expand` in the same way:
- confluence.get_page_by_id
- confluence.get_all_pages_by_label
- confluence.get_all_pages_from_space
- cql (internally using
[/api/content/search](https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get))

## **Issue:**

No issue related to this PR.

## **Dependencies:** 

No changes.

## **Twitter handle:** 

[@gymnstcs](https://x.com/gymnstcs)


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:35:01 +00:00
Rajendra Kadam
242fee11be
community[minor] Pebblo: Support for new Pinecone class PineconeVectorStore (#28253)
- **Description:** Support for new Pinecone class PineconeVectorStore in
PebbloRetrievalQA.
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** -

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 19:33:54 +00:00
nikitajoyn
9fcd203556
partners/mistralai: Fix KeyError in Vertex AI stream (#28624)
- **Description:** Streaming a response from a Mistral model using Vertex
AI raises a KeyError when trying to access the `choices` key, which the
last chunk doesn't have. The fix is to access the key safely using
`get()`.
  - **Issue:** https://github.com/langchain-ai/langchain/issues/27886
  - **Dependencies:**
  - **Twitter handle:**
2024-12-09 14:14:58 -05:00
maang-h
b64d846347
docs: Standardize MoonshotChat docstring (#28159)
- **Description:** Add docstring

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 18:46:25 +00:00
Erick Friis
4c70ffff01
standard-tests: sync/async vectorstore tests conditional (#28636)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-09 18:02:55 +00:00
ccurme
ffb5c1905a
openai[patch]: release 0.2.12 (#28633) 2024-12-09 12:38:13 -05:00
ccurme
6e6061fe73
openai[patch]: bump minimum SDK version (#28632)
Resolves https://github.com/langchain-ai/langchain/issues/28625
2024-12-09 11:28:05 -05:00
Mohammad Mohtashim
ec9b41431e
[Core]: Small Docstring Clarification for BaseTool (#28148)
- **Description:** `kwargs` were not being passed to `run` of the
`BaseTool`; this has been fixed
- **Issue:** #28114

---------

Co-authored-by: Stevan Kapicic <kapicic.ste1@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 06:10:19 +00:00
Erick Friis
cef21a0b49
cli: warning on app add (#28619)
instead of #28128
2024-12-09 06:07:14 +00:00
Ankit Dangi
90f162efb6
text-splitters: add pydocstyle linting (#28127)
As seen in #23188, turned on Google-style docstrings by enabling
`pydocstyle` linting in the `text-splitters` package. Each resulting
linting error was addressed differently: ignored, resolved, suppressed,
and missing docstrings were added.

Fixes one of the checklist items from #25154, similar to #25939 in
`core` package. Ran `make format`, `make lint` and `make test` from the
root of the package `text-splitters` to ensure no issues were found.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 06:01:03 +00:00
WGNW_MG
eabe587787
community[patch]:Fix for get_openai_callback() return token_cost=0.0 when model is gpt-4o-11-20 (#28408)
- **Description:** update MODEL_COST_PER_1K_TOKENS for new gpt-4o-11-20.
- **Issue:** with latest gpt-4o-11-20, openai callback return
token_cost=0.0
- **Dependencies:** None (just simple dict fix.)
- **Twitter handle:** I Don't Use Twitter. 
- (However..., I have a YouTube channel. Could you upload this there, by
any chance?
https://www.youtube.com/@%EA%B2%9C%EC%B0%BD%EB%B6%80%EA%B3%A0%EB%AC%B8AI%EC%9E%90%EB%AC%B8%EC%84%BC%EC%84%B8)
2024-12-08 20:46:50 -08:00
Fahim Zaman
481c4bfaba
core[patch]: Fixed trim functions, and added corresponding unit test for the solved issue (#28429)
- **Description:** 
- Trim functions were incorrectly deleting nodes with more than 1
outgoing/incoming edge, so an extra condition was added to check for
this directly. A unit test "test_trim_multi_edge" was written to test
this test case specifically.
- **Issue:** 
  - Fixes #28411 
  - Fixes https://github.com/langchain-ai/langgraph/issues/1676
- **Dependencies:** 
  - No changes were made to the dependencies

- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.
- [x] Ran make format, make lint, and make test to ensure compliance
with project standards.

---------

Co-authored-by: Tasif Hussain <tasif006@gmail.com>
2024-12-08 20:45:28 -08:00
Marco Perini
2354bb7bfa
partners: 🕷️🦜 ScrapeGraph API Integration (#28559)
Hi Langchain team!

I'm the co-founder and maintainer at
[ScrapeGraphAI](https://scrapegraphai.com/).
By following the integration
[guide](https://python.langchain.com/docs/contributing/how_to/integrations/publish/)
on your site, I have created a new lib called
[langchain-scrapegraph](https://github.com/ScrapeGraphAI/langchain-scrapegraph).

With this PR I would like to integrate Scrapegraph as provider in
Langchain, adding the required documentation files.
Let me know if there are some changes to be made to be properly
integrated both in the lib and in the documentation.

Thank you 🕷️🦜


---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-09 02:38:21 +00:00
Abhinav
317a38b83e
community[minor]: Add support for modle2vec embeddings (#28507)
This PR adds an embeddings integration for model2vec, the
`Model2vecEmbeddings` class.

- **Description**: [Model2Vec](https://github.com/MinishLab/model2vec)
lets you turn any sentence transformer into a really small static model
and makes running the model faster.
- **Issue**:
- **Dependencies**: model2vec
([pypi](https://pypi.org/project/model2vec/))
- **Twitter handle:**:

- [x] **Add tests and docs**: 
-
[Test](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/libs/community/langchain_community/embeddings/model2vec.py),
[docs](https://github.com/blacksmithop/langchain/blob/model2vec_embeddings/docs/docs/integrations/text_embedding/model2vec.ipynb)

- [x] **Lint and test**:

---------

Co-authored-by: Abhinav KM <abhinav.m@zerone-consulting.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-12-09 02:17:22 +00:00
Mohammad Mohtashim
524ee6d9ac
Invalid tool_choice being passed to ChatLiteLLM (#28198)
- **Description:** An invalid `tool_choice` was given to `ChatLiteLLM`'s
`bind_tools` due to its parent class's default value being passed through
by `with_structured_output`.
- **Issue:** #28176
2024-12-07 14:33:40 -05:00
Erick Friis
dd0085a9ff
docs: standard tests to markdown, load templates from files (#28603) 2024-12-07 01:37:21 +00:00
Erick Friis
5e8553c31a
standard-tests: retriever docstrings (#28596) 2024-12-07 00:32:19 +00:00
ccurme
d801c6ffc7
tests[patch]: nits (#28601) 2024-12-07 00:13:04 +00:00
Erick Friis
07c2ac765a
community: release 0.3.10 (#28600) 2024-12-07 00:07:13 +00:00
Erick Friis
4a7dc6ec4c
standard-tests: release 0.3.6 (#28599) 2024-12-07 00:05:04 +00:00
ccurme
80a88f8f04
tests[patch]: update API ref for chat models (#28594) 2024-12-06 19:00:14 -05:00
Erick Friis
0eb7ab65f1
multiple: fix xfailed signatures (#28597) 2024-12-06 15:39:47 -08:00
Erick Friis
b7c2029e84
standard-tests: root docstrings (#28595) 2024-12-06 15:14:52 -08:00
Erick Friis
9e2abcd152
standard-tests: show right classes in api docs (#28591) 2024-12-06 14:48:13 -08:00
Erick Friis
246c10a1cc
standard-tests: private members and tools unit troubleshoot (#28590) 2024-12-06 13:52:58 -08:00
Erick Friis
e6663b69f3
langchain: release 0.3.10 (#28585) 2024-12-06 20:20:24 +00:00
Erick Friis
c38b845d7e
core: fix path test (#28584) 2024-12-06 20:05:18 +00:00
ccurme
2c6bc74cb1
multiple: combine sync/async vector store standard test suites (#28580)
Breaking change in `langchain-tests`.
2024-12-06 14:55:06 -05:00
Bagatur
dda9f90047
core[patch]: Release 0.3.22 (#28582) 2024-12-06 19:36:53 +00:00
ccurme
f3dc142d3c
cli[patch]: implement minimal starter vector store (#28577)
Basically the same as core's in-memory vector store. Removed some
optional methods.
2024-12-06 13:10:22 -05:00
Erick Friis
5277a021c1
docs: raw loader codeblock (#28548)
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-12-06 09:26:34 -08:00
Erick Friis
18386c16c7
core, tests: more tolerant _aget_relevant_documents function (#28462) 2024-12-06 00:49:30 +00:00
Erick Friis
bc636ccc60
cli: release 0.0.35 (#28557) 2024-12-05 16:40:52 -08:00
Erick Friis
7ecf38f4fa
cli: create specific files from template (#28556) 2024-12-06 00:32:47 +00:00
Erick Friis
478def8dcc
core: deprecation doc removal (#28553)
![ScreenShot 2024-12-05 at 02 33
43PM@2x](https://github.com/user-attachments/assets/e1ce495b-90ca-41c7-9a65-b403a934675c)
2024-12-05 15:35:28 -08:00
cinqisap
482e8a7855
community: Add support for SAP HANA Vector hnsw index creation (#27884)
**Issue:** Added support for creating indexes in the SAP HANA Vector
engine.
 
**Changes**: 
1. Introduced a new function `create_hnsw_index` in `hanavector.py` that
enables the creation of indexes for SAP HANA Vector.
2. Added integration tests for the index creation function to ensure
functionality.
3. Updated the documentation to reflect the new index creation feature,
including examples and output from the notebook.
4. Fix the operator issue in ` _process_filter_object` function and
change the array argument to a placeholder in the similarity search SQL
statement.
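
A usage sketch of the new function (constructor and call shape assumed):

```python
# Hedged sketch: create an HNSW index on a HanaDB vector store.
from langchain_community.vectorstores.hanavector import HanaDB

db = HanaDB(connection=connection, embedding=embeddings, table_name="LC_DOCS")
db.create_hnsw_index()  # builds an HNSW index over the vector column
```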

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-05 23:29:08 +00:00
blaufink
28f8d436f6
mistral: fix of issue #26029 (#28233)
- Description: Azure AI takes issue with the safe_mode parameter being
set to False instead of None. Therefore, this PR changes the default
value of safe_mode from False to None. This results in it being filtered
out before the request is sent, avoiding the extra-parameter issue
described below.

- Issue: #26029

- Dependencies: /

---------

Co-authored-by: blaufink <sebastian.brueckner@outlook.de>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-05 23:28:12 +00:00
ccurme
ecdfc98ef6
tests[patch]: run standard tests for embeddings and populate embeddings API ref (#28545)
plus minor updates to chat models and vector store API refs
2024-12-05 19:39:03 +00:00
ccurme
b8e861a63b
openai[patch]: add standard tests for embeddings (#28540) 2024-12-05 17:00:27 +00:00
ZhangShenao
d26555c682
[VectorStore] Improvement: Improve chroma vector store (#28524)
- Complete unit test
- Fix spelling error
2024-12-05 11:58:32 -05:00
ccurme
8f9b3b7498
chroma[patch]: fix bug (#28538)
Fix bug introduced in
https://github.com/langchain-ai/langchain/pull/27995

If all document IDs are `""`, the chroma SDK will raise
```
DuplicateIDError: Expected IDs to be unique
```

Caught by [docs
tests](https://github.com/langchain-ai/langchain/actions/runs/12180395579/job/33974633950),
but added a test to langchain-chroma as well.
2024-12-05 15:37:19 +00:00
Erick Friis
ecff9a01e4
cli: release 0.0.34 (#28525) 2024-12-05 15:35:49 +00:00
ccurme
d9e42a1517
langchain[patch]: fix deprecation warning (#28535) 2024-12-05 14:49:10 +00:00
Erick Friis
0f539f0246
standard-tests: release 0.3.5 (#28526) 2024-12-05 00:41:07 -08:00
Erick Friis
43c35d19d4
cli: standard tests in cli, test that they run, skip vectorstore tests (#28521) 2024-12-05 00:38:32 -08:00
Erick Friis
c5acedddc2
anthropic: timeout in tests (10s) (#28488) 2024-12-04 16:03:38 -08:00
ccurme
f459754470
tests[patch]: populate API reference for vector stores (#28520) 2024-12-05 00:02:31 +00:00
ccurme
8bc2c912b8
chroma[patch]: (nit) simplify test (#28517)
Use `self.get_embeddings` on test class instead of importing embeddings
separately.
2024-12-04 20:22:55 +00:00
ccurme
eec55c2550
chroma[patch]: add get_by_ids and fix bug (#28516)
- Run standard integration tests in Chroma
- Add `get_by_ids` method
- Fix bug in `add_texts`: if a list of `ids` is passed but any of them
are None, Chroma will raise an exception. Here we assign a uuid.
2024-12-04 14:00:36 -05:00
Erick Friis
e6a08355a3
docs: more api ref links, add linting step to prevent more (#28495) 2024-12-04 04:19:42 +00:00
wlleiiwang
6151ea78d5
community: implement _select_relevance_score_fn for tencent vectordb (#28036)
Implement _select_relevance_score_fn for Tencent VectorDB and fix the use
of external embeddings with Tencent VectorDB.

Co-authored-by: wlleiiwang <wlleiiwang@tencent.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 03:03:00 +00:00
Asi Greenholts
d34bf78f3b
community: BM25Retriever preservation of document id (#27019)
Currently this retriever discards document ids

---------

Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 00:36:00 +00:00
peterdhp
bc5ec63d67
community : allow using apikey for PubMedAPIWrapper (#27246)
**Description**: 

> Without an API key, any site (IP address) posting more than 3 requests
per second to the E-utilities will receive an error message. By
including an API key, a site can post up to 10 requests per second by
default.

quoted from A General Introduction to the E-utilities,NCBI :
https://www.ncbi.nlm.nih.gov/books/NBK25497/

I have simply added an api_key parameter to the PubMedAPIWrapper that can
be used to increase the number of requests per second from 3 to 10.

**Twitter handle** : @KORmaori

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:21:22 -08:00
Eric Pinzur
eff8a54756
langchain_chroma: added document.id support (#27995)
Description:
* Added internal `Document.id` support to Chroma VectorStore

Dependencies:
* https://github.com/langchain-ai/langchain/pull/27968 should be merged
first and this PR should be re-based on top of those changes.

Tests:
* Modified/Added tests for `Document.id` support. All tests are passing.


Note: I am not a member of the Chroma team.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-04 00:04:27 +00:00
William Smith
15e7353168
langchain_community: updated query constructor for Databricks Vector Search due to LangChainDeprecationWarning: filters was deprecated since langchain-community 0.2.11 and will be removed in 0.3. Please use filter instead. (#27974)
- **Description:** Updated the kwargs for the structured query from
`filters` to `filter` due to the deprecation of `filters` for Databricks
Vector Search. Also changed the error messages, as the allowed operators
and comparators are different, which can cause issues with functions such
as get_query_constructor_prompt()

- **Issue:** Fixes the KeyError for `filters` due to its deprecation in
favor of `filter`:

LangChainDeprecationWarning: DatabricksVectorSearch received a key
`filters` in search_kwargs. `filters` was deprecated since
langchain-community 0.2.11 and will be removed in 0.3. Please use
`filter` instead.

- **Dependencies:** N/A
- **Twitter handle:** N/A

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-12-03 16:03:53 -08:00