langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-07-11 23:40:24 +00:00

Author	SHA1	Message	Date
Eugene Yurtsev	e0186df56b	core[patch]: Clarify upsert response semantics (#23921 )	2024-07-05 15:59:47 -04:00
Eugene Yurtsev	5b7d5f7729	core[patch]: Add comment to clarify aadd_documents (#23920 ) Add comment to clarify how add documents works	2024-07-05 15:20:16 -04:00
ccurme	74c7198906	core, anthropic[patch]: support streaming tool calls when function has no arguments (#23915 ) resolves https://github.com/langchain-ai/langchain/issues/23911 When an AIMessageChunk is instantiated, we attempt to parse tool calls off of the tool_call_chunks. Here we add a special-case to this parsing, where `""` will be parsed as `{}`. This is a reaction to how Anthropic streams tool calls in the case where a function has no arguments:
```
{'id': 'toolu_01J8CgKcuUVrMqfTQWPYh64r', 'input': {}, 'name': 'magic_function', 'type': 'tool_use', 'index': 1}
{'partial_json': '', 'type': 'tool_use', 'index': 1}
```
The `partial_json` does not accumulate to a valid json string-- most other providers tend to emit `"{}"` in this case.	2024-07-05 18:57:41 +00:00
Christophe Bornet	42d049f618	core[minor]: Add Graph Store component (#23092 ) This PR introduces a GraphStore component. GraphStore extends VectorStore with the concept of links between documents based on document metadata. This allows linking documents based on a variety of techniques, including common keywords, explicit links in the content, and other patterns. This works with existing Documents, so it’s easy to extend existing VectorStores to be used as GraphStores. The interface can be implemented for any Vector Store technology that supports metadata, not only graph DBs. When retrieving documents for a given query, the first level of search is done using classical similarity search. Next, links may be followed using various traversal strategies to get additional documents. This allows documents to be retrieved that aren’t directly similar to the query but contain relevant information. Two retrieval methods are added to the VectorStore ones:
* traversal_search, which gets all linked documents up to a certain depth
* mmr_traversal_search, which selects linked documents using an MMR algorithm to have more diverse results.
If a retrieval depth of 0 is used, GraphStore is effectively a VectorStore. This enables an easy transition from a simple VectorStore to GraphStore by adding links between documents as a second step. An implementation for Apache Cassandra is also proposed. See https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/astra_support.ipynb for a notebook that explains how to use GraphStore and shows that it can correctly answer questions that a simple VectorStore cannot. Twitter handle: _cbornet	2024-07-05 12:24:10 -04:00
Leonid Ganeline	77f5fc3d55	core: docstrings `load` (#23787 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-05 12:23:19 -04:00
Eugene Yurtsev	6f08e11d7c	core[minor]: add upsert, streaming_upsert, aupsert, astreaming_upsert methods to the VectorStore abstraction (#23774 ) This PR rolls out part of the new proposed interface for vectorstores (https://github.com/langchain-ai/langchain/pull/23544) to existing store implementations. The PR makes the following changes:
1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert methods to the vectorstore.
2. Updates `add_texts` and `aadd_texts` to be non-required, with a default implementation that delegates to `upsert` and `aupsert` if those have been implemented. The original `add_texts` and `aadd_texts` methods are problematic as they spread object-specific information across document and **kwargs (e.g., ids are not a part of the document).
3. Adds a default implementation to `add_documents` and `aadd_documents` that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore implements a correct read/write API.
A downside of this implementation is that it creates `upsert` with a very similar signature to `add_documents`. The reason for introducing `upsert` is to:
* Remove any ambiguities about what information is allowed in `kwargs`. Specifically, kwargs should only be used for information common to all indexed data (e.g., indexing timeout).
* Allow inheriting from an anticipated generalized interface for indexing that will allow indexing `BaseMedia` (i.e., allow making a vectorstore for images/audio etc.)
`add_documents` can be deprecated in the future in favor of `upsert` to make sure that users have a single correct way of indexing content. --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-07-05 12:21:40 -04:00
G Sreejith	3c752238c5	core[patch]: Fix typo in docstring (graphm -> graph) (#23910 ) Changes made as requested: replaced graphm with graph.	2024-07-05 16:20:33 +00:00
Leonid Ganeline	12c92b6c19	core: docstrings `outputs` (#23889 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-05 12:18:17 -04:00
Leonid Ganeline	1eca98ec56	core: docstrings `prompts` (#23890 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-05 12:17:52 -04:00
Mohammad Mohtashim	2274d2b966	core[patch]: Accounting for Optional Input Variables in BasePromptTemplate (#22851 ) Description: After reviewing the prompts API, it is clear that the only way a user can explicitly mark an input variable as optional is through the `MessagesPlaceholder.optional` attribute. Otherwise, the user must explicitly pass in the `input_variables` expected to be used in the `BasePromptTemplate`, which will be validated upon execution. Therefore, to semantically handle a `MessagesPlaceholder` `variable_name` as optional, we will treat the `variable_name` of `MessagesPlaceholder` as a `partial_variable` if it has been marked as optional. This approach aligns with how the `variable_name` of `MessagesPlaceholder` is already handled [here](https://github.com/keenborder786/langchain/blob/optional_input_variables/libs/core/langchain_core/prompts/chat.py#L991). Additionally, an attribute `optional_variable` has been added to `BasePromptTemplate`, and the `variable_name` of `MessagesPlaceholder` is also made part of `optional_variable` when marked as optional. Moreover, the `get_input_schema` method has been updated for `BasePromptTemplate` to differentiate between optional and non-optional variables. Issue: #22832, #21425 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-07-05 15:49:40 +00:00
Eugene Yurtsev	9ccc4b1616	core[patch]: Fix logic in BaseChatModel that processes the llm string that is used as a key for caching chat models responses (#23842 ) This PR should fix the following issue: https://github.com/langchain-ai/langchain/issues/23824 Introduced as part of this PR: https://github.com/langchain-ai/langchain/pull/23416 I am unable to reproduce the issue locally though it's clear that we're getting a `serialized` object which is not a dictionary somehow. The test below passes for me prior to the PR as well
```python
def test_cache_with_sqllite() -> None:
    from langchain_community.cache import SQLiteCache
    from langchain_core.globals import set_llm_cache

    cache = SQLiteCache(database_path=".langchain.db")
    set_llm_cache(cache)
    chat_model = FakeListChatModel(responses=["hello", "goodbye"], cache=True)
    assert chat_model.invoke("How are you?").content == "hello"
    assert chat_model.invoke("How are you?").content == "hello"
```
	2024-07-03 16:23:55 -04:00
Vadym Barda	9bb623381b	core[minor]: update conversion utils to handle RemoveMessage (#23840 )	2024-07-03 16:13:31 -04:00
Théo Deschamps	39b19cf764	core[patch]: extract input variables for `path` and `detail` keys in order to format an `ImagePromptTemplate` (#22613 ) - Description: Add support for `path` and `detail` keys in `ImagePromptTemplate`. Previously, only variables associated with the `url` key were considered. This PR allows for the inclusion of a local image path and a detail parameter as input to the format method. - Issues: - fixes #20820 - related to #22024 - Dependencies: None - Twitter handle: @DeschampsTho5 --------- Co-authored-by: tdeschamps <tdeschamps@kameleoon.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2024-07-03 18:58:42 +00:00
Leonid Ganeline	55f6f91f17	core[patch]: docstrings `output_parsers` (#23825 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-03 14:27:40 -04:00
Bagatur	a0c2281540	infra: update mypy 1.10, ruff 0.5 (#23721 )
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/*/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        # Bump the pinned mypy and ruff versions; skip packages without these groups.
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        # Collect the mypy error codes reported for each (file, line) pair.
        to_ignore = {}
        for l in logs:
            if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
                path, line_no, error_type = re.match(
                    "^(.*)\:(\d+)\: error:.*\[(.*)\]", l
                ).groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        # Append a `# type: ignore[...]` comment to every offending line.
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()
```
	2024-07-03 10:33:27 -07:00
William FH	6cd56821dc	[Core] Unify function schema parsing (#23370 ) Use pydantic to infer nested schemas and all that fun. Include bagatur's convenient docstring parser Include annotation support Previously we didn't adequately support many typehints in the bind_tools() method on raw functions (like optionals/unions, nested types, etc.)	2024-07-03 09:55:38 -07:00
Leonid Ganeline	716a316654	core: docstrings `indexing` (#23785 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-03 11:27:34 -04:00
Leonid Ganeline	30fdc2dbe7	core: docstrings `messages` (#23788 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-07-03 11:25:00 -04:00
Bagatur	d677dadf5f	core[patch]: mark RemoveMessage beta (#23656 )	2024-07-02 21:27:21 +00:00
SN	acc457f645	core[patch]: fix nested sections for mustache templating (#23747 ) The prompt template variable detection only worked for singly-nested sections because we just kept track of whether we were in a section and then set that to false as soon as we encountered an end block. i.e. the following:
```
{{#outerSection}}
    {{variableThatShouldntShowUp}}
    {{#nestedSection}}
        {{nestedVal}}
    {{/nestedSection}}
    {{anotherVariableThatShouldntShowUp}}
{{/outerSection}}
```
Would yield `['outerSection', 'anotherVariableThatShouldntShowUp']` as input_variables (whereas it should just yield `['outerSection']`). This fixes that by keeping track of the current depth and using a stack.	2024-07-02 10:20:45 -07:00
Eugene Yurtsev	ebcee4f610	core[patch]: Add versionadded to get_by_ids (#23728 )	2024-07-01 15:16:00 -04:00
Eugene Yurtsev	e800f6bb57	core[minor]: Create BaseMedia object (#23639 ) This PR implements a BaseContent object from which Document and Blob objects will inherit proposed here: https://github.com/langchain-ai/langchain/pull/23544 Alternative: Create a base object that only has an identifier and no metadata. For now decided against it, since that refactor can be done at a later time. It also feels a bit odd since our IDs are optional at the moment. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-07-01 15:07:30 -04:00
Nuno Campos	b36e95caa9	core[patch]: use async messages where possible (#23718 ) Fix #23716 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-07-01 18:33:05 +00:00
Spyros Avlonitis	8cfb2fa1b7	core[minor]: Add maxsize for InMemoryCache (#23405 ) This PR introduces a maxsize parameter for the InMemoryCache class, allowing users to specify the maximum number of items to store in the cache. If the cache exceeds the specified maximum size, the oldest items are removed. Additionally, comprehensive unit tests have been added to ensure all functionalities are thoroughly tested. The tests are written using pytest and cover both synchronous and asynchronous methods. Twitter: @spyrosavl --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-07-01 14:21:21 -04:00
Eugene Yurtsev	b5aef4cf97	core[patch]: Fix llm string representation for serializable models (#23416 ) Fix LLM string representation for serializable objects. Fix for issue: https://github.com/langchain-ai/langchain/issues/23257 The llm string of serializable chat models is the serialized representation of the object. LangChain serialization dumps some basic information about non-serializable objects, including their repr(), which includes an object id. This means that if a chat model has any non-serializable fields (e.g., a cache), then any new instantiation of those fields will change the llm representation of the chat model and cause cache misses. i.e., re-instantiating a postgres cache would result in cache misses!	2024-07-01 14:06:33 -04:00
nobbbbby	3904f2cd40	core: fix NameError (#23658 ) Description: In the chat_models module of the language model, the import statement for BaseModel has been moved from the conditionally imported section to the main import area, fixing `NameError`. Issue: fix `NameError`	2024-07-01 17:51:23 +00:00
Eugene Yurtsev	4f1821db3e	core[minor]: Add get_by_ids to vectorstore interface (#23594 ) This PR adds a part of the indexing API proposed in this RFC https://github.com/langchain-ai/langchain/pull/23544/files. It allows rolling out `get_by_ids` which should be uncontroversial to existing vectorstores without introducing new abstractions. The semantics for this method depend on the ability of identifying returned documents using the new optional ID field on documents: https://github.com/langchain-ai/langchain/pull/23411 Alternatives are:
1. Relax the sequence requirement
```python
def get_by_ids(self, ids: Iterable[str], /) -> Iterable[Document]:
    ...
```
Rejected:
- implementations are more likely to start batching with bad defaults
- users would need to call list() or we'd need to introduce another convenience method
2. Support more kwargs
```python
def get_by_ids(self, ids: Sequence[str], /, **kwargs) -> List[Document]:
    ...
```
Rejected:
- No need for `batch` parameter since IDs is a sequence
- Output cannot be customized since `Document` is fixed. (e.g., parameters could be useful to grab extra metadata like the vector that was indexed with the Document or to project a part of the document)	2024-07-01 13:04:33 -04:00
Vadym Barda	e8d77002ea	core: add RemoveMessage (#23636 ) This change adds a new message type `RemoveMessage`. This will enable `langgraph` users to manually modify graph state (or have the graph nodes modify the state) to remove messages by `id` Examples:
* allow users to delete messages from state by calling
```python
graph.update_state(config, values=[RemoveMessage(id=state.values[-1].id)])
```
* allow nodes to delete messages
```python
graph.add_node("delete_messages", lambda state: [RemoveMessage(id=state[-1].id)])
```
	2024-06-28 14:40:02 -07:00
Leonid Ganeline	75a44fe951	core: `chat_*` docstrings (#23412 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-06-27 17:29:38 -04:00
Eugene Yurtsev	da7beb1c38	core[patch]: Add unit test when catching generator exit (#23402 ) This pr adds a unit test for: https://github.com/langchain-ai/langchain/pull/22662 And narrows the scope where the exception is caught.	2024-06-27 20:36:07 +00:00
Eugene Yurtsev	96b72edac8	core[minor]: Add optional ID field to Document schema (#23411 ) This PR adds an optional ID field to the document schema.
# 1. Optional or Required
- An optional field will require additional checking for the type in user code (annoying).
- However, vectorstores currently don't respect this field. So if we make it required and start returning random UUIDs, that might be even more confusing to users.
Proposal: Start with Optional and convert to Required (with default set to uuid4()) in 1-2 major releases.
# 2. Override __str__ or generic solution in prompts
Overriding __str__ as a simple way to avoid changing user code that relies on default str(document) in prompts. I considered rolling out a more general solution in prompts (https://github.com/langchain-ai/langchain/pull/8685), but to do that we need to:
1. Make things serializable
2. The more general solution would likely need to be backwards compatible as well
3. It's unclear that one wants to format a List[int] in the same way as List[Document]. The former should be `,` separated (likely), the latter should be `---` separated (likely).
Proposal: Start with __str__ override and focus on the vectorstore APIs; we generalize prompts later	2024-06-27 12:15:58 -04:00
Leonid Ganeline	2c9b84c3a8	core[patch]: docstrings `agents` (#23502 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-06-26 17:50:48 -04:00
Leonid Ganeline	2a5d59b3d7	core[patch]: `callbacks` docstrings (#23375 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-06-26 17:11:06 -04:00
Leonid Ganeline	1141b08eb8	core: docstrings `example_selectors` (#23542 ) Added missed docstrings. Formatted docstrings to the consistent form.	2024-06-26 17:10:40 -04:00
Bagatur	32f8f39974	core[patch]: use args_schema doc for tool description (#23503 )	2024-06-25 15:26:35 -07:00
Isaac Francisco	85f5d14cef	[docs]: split up tool docs (#22919 )	2024-06-25 13:15:08 -07:00
William FH	8955bc1866	[Core] Logging: Suppress missing parent warning (#23363 )	2024-06-25 14:57:23 -04:00
ccurme	730c551819	core[patch]: export tool output parsers from langchain_core.output_parsers (#23305 ) These currently read off AIMessage.tool_calls, and only fall back to OpenAI parsing if tool calls aren't populated. Importing these from `openai_tools` (e.g., in our [tool calling docs](https://python.langchain.com/v0.2/docs/how_to/tool_calling/#tool-calls)) can lead to confusion. After landing, would need to release core and update docs.	2024-06-25 14:40:42 -04:00
Riccardo Schirone	4530d851e4	Merge pull request #22662 * core: runnables: special handling GeneratorExit because no error	2024-06-25 08:42:03 -04:00
William FH	efb4c12abe	[Core] Add support for inferring Annotated types (#23284 ) in bind_tools() / convert_to_openai_function	2024-06-21 15:16:30 -07:00
Vadym Barda	9ac302cb97	core[minor]: update draw_mermaid node label processing (#23285 ) This fixes processing issue for nodes with numbers in their labels (e.g. `"node_1"`, which would previously be relabeled as `"node__"`, and now are correctly processed as `"node_1"`)	2024-06-21 21:35:32 +00:00
Bagatur	f824f6d925	docs: fix merge message runs docstring (#23279 )	2024-06-21 19:50:50 +00:00
Bagatur	9eda8f2fe8	docs: fix trim_messages code blocks (#23271 )	2024-06-21 17:15:31 +00:00
Bagatur	4c97a9ee53	docs: fix message transformer docstrings (#23264 )	2024-06-21 16:10:03 +00:00
Brace Sproul	abe7566d7d	core[minor]: BaseChatModel with_structured_output implementation (#22859 )	2024-06-21 08:14:03 -07:00
mackong	360a70c8a8	core[patch]: fix no current event loop for sql history in async mode (#22933 ) - Description: When using RunnableWithMessageHistory/SQLChatMessageHistory in async mode, we get the following error:
```
Error in RootListenersTracer.on_chain_end callback: RuntimeError("There is no current event loop in thread 'asyncio_3'.")
```
which is thrown by `ddfbca38df/libs/community/langchain_community/chat_message_histories/sql.py (L259)`, and no message history is added to the database. This patch adds a new _aexit_history function that is called in async mode and in turn calls aadd_messages. We use the `afunc` attribute of a Runnable to check whether the end listener should be run in async mode. - Issue: #22021, #22022 - Dependencies: N/A	2024-06-21 10:39:47 -04:00
mackong	b108b4d010	core[patch]: set schema format for AsyncRootListenersTracer (#23214 ) - Description: AsyncRootListenersTracer supports on_chat_model_start, so its schema_format should be "original+chat". - Issue: N/A - Dependencies:	2024-06-21 09:30:27 -04:00
Bagatur	976b456619	docs: BaseChatModel key methods table (#23238 ) If we're moving documenting inherited params think these kinds of tables become more important ![Screenshot 2024-06-20 at 3 59 12 PM](https://github.com/langchain-ai/langchain/assets/22008038/722266eb-2353-4e85-8fae-76b19bd333e0)	2024-06-20 21:00:22 -07:00
Bagatur	12e0c28a6e	docs: fix chat model methods table (#23233 ) rst table not md ![Screenshot 2024-06-20 at 12 37 46 PM](https://github.com/langchain-ai/langchain/assets/22008038/7a03b869-c1f4-45d0-8d27-3e16f4c6eb19)	2024-06-20 19:51:10 +00:00
Eugene Yurtsev	7545b1d29b	core[patch]: Fix doc-strings for code blocks (#23232 ) Code blocks need extra space around them to be rendered properly by sphinx	2024-06-20 19:34:52 +00:00
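
The stack-based depth tracking described in acc457f645 (nested mustache sections) can be illustrated with a small standalone sketch. This is not the langchain_core implementation: the function name `top_level_variables` and the regex tokenizer are assumptions made for illustration, and only `{{name}}`, `{{#name}}` and `{{/name}}` tags are handled.

```python
import re
from typing import List

# Simplified tag matcher (an assumption): {{name}}, {{#name}}, {{/name}} only.
TAG = re.compile(r"\{\{\s*([#/]?)\s*([\w.]+)\s*\}\}")


def top_level_variables(template: str) -> List[str]:
    """Return the names referenced outside of any section, in first-seen order."""
    open_sections: List[str] = []  # stack of currently open section names
    found: List[str] = []
    for kind, name in TAG.findall(template):
        if kind == "#":
            # A section opened at depth 0 is itself an input variable.
            if not open_sections and name not in found:
                found.append(name)
            open_sections.append(name)
        elif kind == "/":
            # Closing a section pops one level; an empty stack means top level.
            if open_sections:
                open_sections.pop()
        elif not open_sections and name not in found:
            # Plain variables count only when no section is open.
            found.append(name)
    return found


template = """
{{#outerSection}}
    {{variableThatShouldntShowUp}}
    {{#nestedSection}}
        {{nestedVal}}
    {{/nestedSection}}
    {{anotherVariableThatShouldntShowUp}}
{{/outerSection}}
"""
print(top_level_variables(template))  # ['outerSection']
```

With a single boolean flag instead of the stack, closing `nestedSection` would mark the parser as back at top level, which is how `anotherVariableThatShouldntShowUp` leaked into the result before the fix.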

484 Commits