## Description
(This PR has contributions from @khushiDesai, @ashvini8, and
@ssumaiyaahmed).
This PR addresses **Issue #11229**, which calls for SQL support in document parsing. The support is integrated into the generic TreeSitter parsing library, allowing LangChain users to easily load SQL codebases as smaller, manageable "documents."
This pull request adds a new `SQLSegmenter` class, which provides the SQL integration.
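As a rough usage sketch (the `language="sql"` key is an assumption about how the segmenter is exposed through `LanguageParser`):
```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser

# Load a SQL codebase as smaller "documents", one per top-level statement.
loader = GenericLoader.from_filesystem(
    "./my_project",
    glob="**/*.sql",
    parser=LanguageParser(language="sql"),
)
docs = loader.load()
```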
## Issue
**Issue #11229**: Add support for a variety of languages to
LanguageParser
## Testing
We created a file `test_sql.py` with several tests to ensure the `SQLSegmenter` is functional. Below are the tests we added:
- `test_is_valid`: checks SQL validity.
- `test_extract_functions_classes`: extracts individual SQL statements.
- `test_simplify_code`: simplifies SQL code with comments.
---------
Co-authored-by: Syeda Sumaiya Ahmed <114104419+ssumaiyaahmed@users.noreply.github.com>
Co-authored-by: ashvini hunagund <97271381+ashvini8@users.noreply.github.com>
Co-authored-by: Khushi Desai <khushi.desai@advantawitty.com>
Co-authored-by: Khushi Desai <59741309+khushiDesai@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
When using `create_xml_agent` or `create_json_chat_agent` to create an agent, if the function backing a tool takes no parameters, the `XMLAgentOutputParser` or `JSONAgentOutputParser` will parse the tool input into an empty string, and `BaseTool` will then parse it into a positional argument.
So the program ultimately crashes, because we invoke a parameterless function with a positional argument. Specifically, the code below will raise `StopIteration` in
[_parse_input](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/tools/base.py#L419):
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_json_chat_agent, create_xml_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# Any tool backed by a parameterless function triggers the bug (illustrative tool).
@tool
def ping() -> str:
    """Return a fixed reply."""
    return "pong"

tools = [ping]

prompt = hub.pull("hwchase17/react-chat-json")
llm = ChatOpenAI()
# agent = create_xml_agent(llm, tools, prompt)
agent = create_json_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "..."})
```
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
Current HTML splitters rely on secondary use of the `RecursiveCharacterTextSplitter` to further chunk the document into manageable pieces. The issue is that this fails to maintain important structures such as tables and lists within the HTML.
This implementation of an HTML splitter allows the user to define a maximum chunk size, HTML elements to preserve in full, an option to preserve `<a>` href links in the output, and custom handlers.
The core splitting begins with headers, similar to `HTMLHeaderTextSplitter`. If these sections exceed the `max_chunk_size`, further recursive splitting is triggered. During this splitting, elements listed as preserved are excluded from the splitting process. This can cause chunks to be slightly larger than the max size, depending on the preserved length; however, all contextual relevance of the preserved item remains intact.
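A minimal usage sketch of what this looks like in practice (the class name and parameter names here are assumptions, not the final API):
```python
# Hypothetical names; the merged splitter's actual API may differ.
from langchain_text_splitters import HTMLSemanticPreservingSplitter

html_string = "<h1>Intro</h1><p>Some text.</p><table><tr><td>kept whole</td></tr></table>"

splitter = HTMLSemanticPreservingSplitter(
    headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")],
    max_chunk_size=500,
    elements_to_preserve=["table", "ul"],  # never split, even if oversized
    preserve_links=True,                   # keep <a> href values in the output
)
chunks = splitter.split_text(html_string)
```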
**Custom Handlers**: Sometimes companies such as Atlassian have custom HTML elements that are not parsed by default with `BeautifulSoup`. Custom handlers allow a user to provide a function to be run whenever a specific HTML tag is encountered. This lets the user preserve and gather information within custom HTML tags that `bs4` would potentially miss during extraction.
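For example, continuing the sketch above, a handler for a Confluence-style macro tag might look like this (the `custom_handlers` parameter and tag names are assumptions):
```python
from bs4 import Tag

def handle_macro(tag: Tag) -> str:
    # Pull text out of a non-standard tag that default bs4 extraction would miss.
    return f"[macro:{tag.get('ac:name', '')}] {tag.get_text(strip=True)}"

splitter = HTMLSemanticPreservingSplitter(
    headers_to_split_on=[("h1", "Header 1")],
    max_chunk_size=500,
    custom_handlers={"ac:structured-macro": handle_macro},
)
```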
**Dependencies:** Users will need to install `bs4` in their project to utilise this class.
I have also added `how_to` documentation and unit tests, which require `bs4` to run; otherwise they will be skipped.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Simple modification to add support for Anthropic models deployed in the Google Vertex AI Model Garden to `init_chat_model`, by importing `ChatAnthropicVertex`.
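A minimal sketch of what this enables (the provider key `google_anthropic_vertex` and the model id are assumptions here):
```python
from langchain.chat_models import init_chat_model

llm = init_chat_model(
    "claude-3-5-sonnet-v2@20241022",  # an Anthropic model from the Model Garden
    model_provider="google_anthropic_vertex",
)
```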
- [x] **Lint and test**
- [x] **PR title**: "community: add a new `default_headers` parameter for OCI model deployments and OCI chat model deployments; update the `k` parameter in `OCIModelDeploymentLLM`."
- [x] **PR message**:
- **Description:** 1. Add a new `default_headers` parameter for OCI model deployments and OCI chat model deployments (see the sketch below). 2. Update the `k` parameter in the `OCIModelDeploymentLLM` class.
- [x] **Add tests and docs**:
1. unit tests
2. notebook
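A minimal sketch of the new parameter (the endpoint and header values are placeholders, and the constructor shape is an assumption):
```python
from langchain_community.llms import OCIModelDeploymentLLM

llm = OCIModelDeploymentLLM(
    endpoint="https://modeldeployment.<region>.oci.customer-oci.com/<ocid>/predict",
    model="odsc-llm",
    default_headers={"route": "v1/completions"},  # sent with every request
)
```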
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- Convert `developer` OpenAI messages to `SystemMessage`
- Store `additional_kwargs={"__openai_role__": "developer"}` so that the correct role can be reconstructed if needed
- Update `ChatOpenAI` to read the OpenAI role back out of `additional_kwargs`
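A sketch of the round trip described above:
```python
from langchain_core.messages import SystemMessage

# An OpenAI "developer" message is represented as a SystemMessage, with the
# original role stashed in additional_kwargs so ChatOpenAI can restore it
# when the message is sent back to the API.
msg = SystemMessage(
    content="Always respond concisely.",
    additional_kwargs={"__openai_role__": "developer"},
)
```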
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:**
This PR resolves an issue with the `ExperimentalMarkdownSyntaxTextSplitter` class, which retained internal state across multiple calls to the `split_text` method. This behaviour caused an unintended accumulation of chunks in instance variables, leading to incorrect outputs when processing multiple Markdown files sequentially (a repro sketch follows the checklist below).
- Modified `libs/text-splitters/langchain_text_splitters/markdown.py` to reset the relevant internal attributes at the start of each `split_text` invocation. This ensures each call processes the input independently.
- Added unit tests in `libs/text-splitters/tests/unit_tests/test_text_splitters.py` to verify the fix and ensure state does not persist across calls.
- **Issue:**
Fixes [#26440](https://github.com/langchain-ai/langchain/issues/26440).
- **Dependencies:**
No additional dependencies are introduced with this change.
- [x] Unit tests were added to verify the changes.
- [x] Updated documentation where necessary.
- [x] Ran `make format`, `make lint`, and `make test` to ensure
compliance with project standards.
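Repro sketch of the behaviour the fix addresses:
```python
from langchain_text_splitters import ExperimentalMarkdownSyntaxTextSplitter

splitter = ExperimentalMarkdownSyntaxTextSplitter()
first = splitter.split_text("# Doc 1\nSome text")
second = splitter.split_text("# Doc 2\nOther text")
# Before this fix, `second` also contained the chunks accumulated from the
# first call; after the reset, each call processes its input independently.
```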
---------
Co-authored-by: Angel Chen <angelchen396@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** `kwargs` was being checked as a `None` object, which caused the rest of the code in `with_structured_output` not to execute. The check has been fixed in this PR.
- **Issue:** #28776
- [x] **PR title**: community: add float message support to DynamoDB chat message history
- [x] **PR message**:
- **Description:** Pushing float values into DynamoDB raises an error; solved that by converting them to `str` type (see the sketch below).
- **Issue:** Float values are not getting pushed.
- **Twitter handle:** VpkPrasanna
I have added a utility function for the `str` conversion; let me know where to place it, happy to commit.
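A minimal sketch of that utility (the name and the recursion over containers are assumptions):
```python
def floats_to_str(value):
    """Recursively convert float values to str so DynamoDB accepts them."""
    if isinstance(value, float):
        return str(value)
    if isinstance(value, dict):
        return {k: floats_to_str(v) for k, v in value.items()}
    if isinstance(value, list):
        return [floats_to_str(v) for v in value]
    return value
```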
This PR came out of a discussion on #26543.
@hwchase17 @baskaryan @efriis
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
This pull request introduces the `DocumentLoaderAsParser` class, which
acts as an adapter to transform document loaders into parsers within the
LangChain framework. The class enables document loaders that accept a
`file_path` parameter to be utilized as blob parsers. This is
particularly useful for integrating various document loading
capabilities seamlessly into the LangChain ecosystem.
When merged together with PR
https://github.com/langchain-ai/langchain/pull/27716, it opens options
for `SharePointLoader` / `OneDriveLoader` to process any filetype that
has a document loader.
### Features
- **Flexible Parsing**: The `DocumentLoaderAsParser` class can adapt any
document loader that meets the criteria of accepting a `file_path`
argument, allowing for lazy parsing of documents.
- **Compatibility**: The class has been designed to work with various
document loaders, making it versatile for different use cases.
### Usage Example
To use the `DocumentLoaderAsParser`, you would initialize it with a
suitable document loader class and any required parameters. Here’s an
example of how to do this with the `UnstructuredExcelLoader`:
```python
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser
from langchain_community.document_loaders.excel import UnstructuredExcelLoader

# Initialize the parser adapter with UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# Use the parser, e.g. by passing it to MimeTypeBasedParser
parser = MimeTypeBasedParser(
    handlers={
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": xlsx_parser
    }
)
```
- **Dependencies:** None
- **Twitter handle:** @martintriska1
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** One-bit images were raising an error in `PDFPlumberParser`; this has been fixed in this PR.
- **Issue:** #28480
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] Fix for when LanceDB returns a table without a metadata column
- **Description:** Check the table schema; if it has no metadata column, initialize the `Document` with an empty dict as the `metadata` argument (see the sketch below).
- **Issue:** https://github.com/langchain-ai/langchain/issues/27005
- [x] **Add tests and docs**
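A sketch of the guard described above, assuming `table` is a LanceDB table and `row` one of its rows (the schema access shape is an assumption):
```python
from langchain_core.documents import Document

# If the LanceDB table has no metadata column, fall back to an empty dict.
field_names = [field.name for field in table.schema]
doc = Document(
    page_content=row["text"],
    metadata=row["metadata"] if "metadata" in field_names else {},
)
```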
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
## Overview
This PR adds documentation for the `langchain-yt-dlp` package, a YouTube
document loader that uses `yt-dlp` to extract YouTube video metadata.
## Changes
- Added documentation notebook for YoutubeLoader
- Updated packages.yml to include langchain-yt-dlp
## Motivation
The existing LangChain YoutubeLoader was unable to fetch YouTube
metadata due to changes in YouTube's structure. This package resolves
those issues by leveraging the `yt-dlp` library.
## Features
- Reliable YouTube metadata extraction
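Illustrative usage (the class and module names are assumptions based on the package repository):
```python
from langchain_yt_dlp.youtube_loader import YoutubeLoaderDL

docs = YoutubeLoaderDL.from_youtube_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    add_video_info=True,  # fetch title, author, and other metadata via yt-dlp
).load()
```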
## Related
- Package Repository: https://github.com/aqib0770/langchain-yt-dlp
- PyPI Package: https://pypi.org/project/langchain-yt-dlp/
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Hi, LangChain team! I'm a maintainer of
[OceanBase](https://github.com/oceanbase/oceanbase).
Following the integration guidance, I created a Python lib named
[langchain-oceanbase](https://github.com/oceanbase/langchain-oceanbase)
to integrate the `OceanBase Vector Store` with LangChain.
So I'd like to add the required docs. I would appreciate your feedback.
Thank you!
---------
Signed-off-by: shanhaikang.shk <shanhaikang.shk@oceanbase.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [x] **PR title**:
community: add new model and structured output support
- [x] **PR message**:
- **Description:** Add support for Meta Llama 3.2 image handling, and JSON mode for structured output (see the sketch below).
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
- [x] **Add tests and docs**:
1. We have updated our unit tests.
2. No changes required for documentation.
- [x] **Lint and test**:
`make format`, `make lint`, and `make test` all run successfully.
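A minimal sketch of the JSON-mode structured output, assuming `llm` is an OCI chat model instance (the `method` value follows the common LangChain convention):
```python
from pydantic import BaseModel

class Joke(BaseModel):
    setup: str
    punchline: str

# Ask the model to emit JSON matching the schema and parse it back.
structured_llm = llm.with_structured_output(Joke, method="json_mode")
result = structured_llm.invoke("Tell me a joke about parsers")
```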
---------
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
Adding new AWS Bedrock models and their respective costs to the Bedrock callback, to match https://aws.amazon.com/bedrock/pricing/.
**Issue:**
Missing models for those that wish to try them out.
**Dependencies:**
Nothing added.
**Twitter handle:**
@David_Pryce and/or @JamfSoftware
**Description**:
This PR addresses an issue where the `DocumentAttributeValue` class properties did not default to `None`. By explicitly setting the optional attributes (`DateValue`, `LongValue`, `StringListValue`, and `StringValue`) to default to `None`, this change ensures the class functions as expected when no value is provided for these attributes.
**Changes Made**:
Added default `None` values to the following properties of the `DocumentAttributeValue` class:
- `DateValue`
- `LongValue`
- `StringListValue`
- `StringValue`
Removed the invalid argument `extra="allow"` from the `BaseModel` inheritance.
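A sketch of the resulting model (the field types are assumptions):
```python
from typing import List, Optional
from pydantic import BaseModel

class DocumentAttributeValue(BaseModel):
    DateValue: Optional[str] = None
    LongValue: Optional[int] = None
    StringListValue: Optional[List[str]] = None
    StringValue: Optional[str] = None
```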
**Dependencies**: None.
**Twitter handle (optional)**: @__korikori1021
**Checklist**
- [x] Verified that KendraRetriever works as expected after the changes.
Co-authored-by: y1u0d2a1i <y.kotani@raksul.com>
Description:
Improved the `_parse_google_docstring` function in `langchain_core` to
support parsing multi-paragraph descriptions before the `Args:` section,
while maintaining compliance with Google-style docstring guidelines.
This change ensures better handling of docstrings with detailed function
descriptions.
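For example, a docstring of this shape, with a multi-paragraph description before `Args:`, now parses correctly (the function is illustrative):
```python
def fetch(url: str, retries: int = 3) -> str:
    """Fetch a resource over HTTP.

    This second paragraph is now treated as part of the description
    rather than being dropped before the arguments are parsed.

    Args:
        url: The resource to fetch.
        retries: Maximum number of retry attempts.
    """
    ...
```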
Issue:
Fixes #28628
Dependencies:
None.
Twitter handle:
@isatyamks
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** I am working to address an issue similar to the one
mentioned in https://github.com/langchain-ai/langchain/pull/19499.
Specifically, there is a problem with the `WebBaseLoader` used in
open-webui, where it fails to load the proxy configuration. This PR aims
to resolve that issue.
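A sketch of the configuration this fix is meant to make work (assuming the loader's `proxies` argument is what gets dropped today; the proxy URL is a placeholder):
```python
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    "https://example.com",
    proxies={
        "http": "http://user:pass@proxy.example.com:8080",
        "https": "http://user:pass@proxy.example.com:8080",
    },
)
docs = loader.load()
```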
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description**: Some Confluence instances don't support personal access tokens; in those cases, cookies are a convenient way to authenticate. This PR adds support for Confluence cookies (see the sketch below).
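A sketch of cookie-based auth (the `cookies` parameter name and value shape are assumptions):
```python
from langchain_community.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="https://yoursite.atlassian.net/wiki",
    cookies={"JSESSIONID": "<session-id>"},  # instead of a personal access token
    space_key="ENG",
)
docs = loader.load()
```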
**Twitter handle**: soulmachine