langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-31 10:23:18 +00:00

Author	SHA1	Message	Date
Martin Triska	90189f5639	community: Allow other than default parsers in SharePointLoader and OneDriveLoader (#27716 ) ## What this PR does? ### Currently `O365BaseLoader` (and consequently both derived loaders) are limited to `pdf`, `doc`, `docx` files. - Solution: here we introduce a _handlers_ attribute that allows custom handlers to be passed in. This is done in _dict_ form. Example:
```python
from langchain_community.document_loaders.onedrive import OneDriveLoader
from langchain_community.document_loaders.parsers.msword import MsWordParser
from langchain_community.document_loaders.parsers.pdf import PDFMinerParser
from langchain_community.document_loaders.parsers.txt import TextParser
from langchain_community.document_loaders.sharepoint import SharePointLoader
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.excel import UnstructuredExcelLoader

xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")

# create dictionary mapping file types to handlers (parsers)
handlers = {
    "doc": MsWordParser(),
    "pdf": PDFMinerParser(),
    "txt": TextParser(),
    "xlsx": xlsx_parser,
}
loader = SharePointLoader(
    document_library_id="...",
    handlers=handlers,  # pass handlers to SharePointLoader
)
documents = loader.load()

# works the same in OneDriveLoader
loader = OneDriveLoader(
    document_library_id="...",
    handlers=handlers,
)
```
This dictionary is then passed to `MimeTypeBasedParser`, the same as in the [current implementation](`5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)`). ### Currently `SharePointLoader` and `OneDriveLoader` are separate loaders that both inherit from `O365BaseLoader`. However, both of these implement the same functionality. The only differences are: - `SharePointLoader` requires the argument `document_library_id` whereas `OneDriveLoader` requires `drive_id`. These are just different names for the same thing. - `SharePointLoader` implements significantly more features.
- Solution: `OneDriveLoader` is replaced with an empty shell that just renames `drive_id` to `document_library_id` and inherits from `SharePointLoader`. Dependencies: None Twitter handle: @martintriska1	2024-11-06 17:44:34 -05:00
takahashi	482c168b3e	langchain_core: add `file_type` option to make file type default as `png` (#27855 ) Description: `langchain_core.runnables.graph_mermaid.draw_mermaid_png` calls this function, but the Mermaid API returns JPEG by default. To be consistent, add the option `file_type` with the default `png` type. Tests and docs: with this small change, I didn't add tests and docs. One long sentence was divided into two.	2024-11-06 22:37:07 +00:00
Roman Solomatin	0f85dea8c8	langchain-huggingface: use separate kwargs for queries and docs (#27857 ) Currently `encode_kwargs` is used for both documents and queries, which leads to wrong embeddings. E.g.:
```python
import numpy as np
from langchain_huggingface import HuggingFaceEmbeddings

model_kwargs = {"device": "cuda", "trust_remote_code": True}
encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}
model = HuggingFaceEmbeddings(
    model_name="dunzhang/stella_en_400M_v5",
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
query_embedding = np.array(
    model.embed_query("What are some ways to reduce stress?")
)
document_embedding = np.array(
    model.embed_documents(
        [
            "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
            "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
        ]
    )
)
print(model._client.similarity(query_embedding, document_embedding))
# output: tensor([[0.8421, 0.3317]], dtype=torch.float64)
```
But according to the [model card](https://huggingface.co/dunzhang/stella_en_400M_v5#sentence-transformers), the expected usage is:
```python
import numpy as np
from langchain_huggingface import HuggingFaceEmbeddings

model_kwargs = {"device": "cuda", "trust_remote_code": True}
encode_kwargs = {"normalize_embeddings": False}
query_encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}
model = HuggingFaceEmbeddings(
    model_name="dunzhang/stella_en_400M_v5",
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    query_encode_kwargs=query_encode_kwargs,
)
query_embedding = np.array(
    model.embed_query("What are some ways to reduce stress?")
)
document_embedding = np.array(
    model.embed_documents(
        [
            "There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
            "Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
        ]
    )
)
print(model._client.similarity(query_embedding, document_embedding))
# tensor([[0.8398, 0.2990]], dtype=torch.float64)
```
	2024-11-06 17:35:39 -05:00
Bagatur	60123bef67	docs: fix trim_messages docstring (#27948 )	2024-11-06 22:25:13 +00:00
murrlincoln	14f1827953	docs: Adding notebook for cdp agentkit toolkit (#27910 ) - Description: Adding in the first pass of documentation for the CDP Agentkit Toolkit - Issue: N/a - Dependencies: cdp-langchain - Twitter handle: @CoinbaseDev --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: John Peterson <john.peterson@coinbase.com>	2024-11-06 13:28:27 -08:00
Eric Pinzur	ea0ad917b0	community: added Document.id support to opensearch vectorstore (#27945 ) Description: * Added support for Document.id on the OpenSearch vector store * Added test cases to match	2024-11-06 15:04:09 -05:00
Hammad Randhawa	75aa82fedc	docs: Completed sentence under the heading "Instantiating a Browser … (#27944 ) …Toolkit" in "playwright.ipynb" integration. - Completed the incomplete sentence in the Langchain Playwright documentation. - Enhanced documentation clarity to guide users on best practices for instantiating browser instances with Langchain Playwright. Example before: > "It's always recommended to instantiate using the from_browser method so that the Example after: > "It's always recommended to instantiate using the `from_browser` method so that the browser context is properly initialized and managed, ensuring seamless interaction and resource optimization." Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-06 19:55:00 +00:00
Bagatur	67ce05a0a7	core[patch]: make oai tool description optional (#27756 )	2024-11-06 18:06:47 +00:00
Bagatur	b2da3115ed	docs: document init_chat_model standard params (#27812 )	2024-11-06 09:50:07 -08:00
Dobiichi-Origami	395674d503	community: re-arrange function call message parse logic for Qianfan (#27935 ) The [PR](https://github.com/langchain-ai/langchain/pull/26208) from two months ago has a potential bug that causes `tool_call` to malfunction for `QianfanChatEndpoint` and is waiting for a fix.	2024-11-06 09:58:16 -05:00
Erick Friis	41b7a5169d	infra: starter codeowners file (#27929 )	2024-11-05 16:43:11 -08:00
ccurme	66966a6e72	openai[patch]: release 0.2.6 (#27924 ) Some additions in support of [predicted outputs](https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs) feature: - Bump openai sdk version - Add integration test - Add example to integration docs The `prediction` kwarg is already plumbed through model invocation. langchain-openai==0.2.6	2024-11-05 23:02:24 +00:00
Erick Friis	a8c473e114	standard-tests: ci pipeline (#27923 )	2024-11-05 20:55:38 +00:00
Erick Friis	c3b75560dc	infra: release note grep order of operations (#27922 ) langchain-qdrant==0.2.0	2024-11-05 12:44:36 -08:00
Erick Friis	b3c81356ca	infra: release note compute 2 (#27921 ) langchain-nomic==0.1.4	2024-11-05 12:04:41 -08:00
Erick Friis	bff2a8b772	standard-tests: add tools standard tests (#27899 )	2024-11-05 11:44:34 -08:00
SHJUN	f6b2f82099	community: chroma error patch(attribute changed on chroma) (#27827 ) There was a change of attribute name which was "max_batch_size". It's now "get_max_batch_size" method. I want to use "create_batches" which is right down below. Please check this PR link. reference: https://github.com/chroma-core/chroma/pull/2305 --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com> Co-authored-by: Prithvi Kannan <46332835+prithvikannan@users.noreply.github.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Jun Yamog <jkyamog@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ono-hiroki <86904208+ono-hiroki@users.noreply.github.com> Co-authored-by: Dobiichi-Origami <56953648+Dobiichi-Origami@users.noreply.github.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Duy Huynh <vndee.huynh@gmail.com> Co-authored-by: Rashmi Pawar <168514198+raspawar@users.noreply.github.com> Co-authored-by: sifatj <26035630+sifatj@users.noreply.github.com> Co-authored-by: Eric Pinzur <2641606+epinzur@users.noreply.github.com> Co-authored-by: Daniel Vu Dao <danielvdao@users.noreply.github.com> Co-authored-by: Ofer Mendelevitch <ofermend@gmail.com> Co-authored-by: Stéphane Philippart <wildagsx@gmail.com>	2024-11-05 19:43:11 +00:00
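The rename described in the chroma commit above (a `max_batch_size` attribute becoming a `get_max_batch_size` method) can be handled with a small compatibility lookup. The following is a minimal sketch, not the code from the PR: `OldClient`, `NewClient`, `resolve_max_batch_size`, and `split_into_batches` are hypothetical names used only for illustration, and the real `create_batches` helper from chroma is not called here.

```python
from typing import List, Optional, Sequence


class OldClient:
    """Hypothetical stand-in: exposes the limit as a plain attribute."""

    max_batch_size = 3


class NewClient:
    """Hypothetical stand-in: exposes the limit through a method."""

    def get_max_batch_size(self) -> int:
        return 3


def resolve_max_batch_size(client: object) -> Optional[int]:
    """Return the batch limit whichever way the client exposes it."""
    getter = getattr(client, "get_max_batch_size", None)
    if callable(getter):
        # Newer style: the limit is returned by a method.
        return getter()
    # Older style: the limit is an attribute (None if absent).
    return getattr(client, "max_batch_size", None)


def split_into_batches(items: Sequence[int], size: int) -> List[List[int]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [list(items[i : i + size]) for i in range(0, len(items), size)]


limit = resolve_max_batch_size(NewClient())
batches = split_into_batches(list(range(7)), limit)
```

Checking for the method first means the same call site keeps working against either client version.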
Tomaz Bratanic	a3bbbe6a86	update llm graph transformer documentation (#27905 )	2024-11-05 11:54:26 -05:00
Erick Friis	31f4fb790d	standard-tests: release 0.3.0 (#27900 )	2024-11-04 17:29:15 -08:00
Erick Friis	ba5cba04ff	infra: get min versions (#27896 )	2024-11-04 23:46:13 +00:00
Bagatur	6973f7214f	docs: sidebar capitalization (#27894 )	2024-11-04 22:09:32 +00:00
Stéphane Philippart	4b8cd7a09a	community: ✨ Use new OVHcloud batch embedding (#26209 ) - Description: change to do the batch embedding server side and not client side - Twitter handle: @wildagsx --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-04 16:40:30 -05:00
Erick Friis	a54f390090	infra: fix prev tag output (#27892 )	2024-11-04 12:46:23 -08:00
Erick Friis	75f80c2910	infra: fix prev tag condition (#27891 )	2024-11-04 12:42:22 -08:00
Ofer Mendelevitch	d7c39e6dbb	community: update Vectara integration (#27869 ) - Description: Updated Vectara integration - Issue: refresh on descriptions across all demos and added UDF reranker - Dependencies: None - Twitter handle: @ofermend --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:40:39 +00:00
Erick Friis	14a71a6e77	infra: fix prev tag calculation (#27890 )	2024-11-04 12:38:39 -08:00
Daniel Vu Dao	5745f3bf78	docs: Update `messages.mdx` (#27856 ) ### Description Updates phrasing for the header of the `Messages` section. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:36:31 +00:00
sifatj	e02a5ee03e	docs: Update VectorStore as_retriever method url in qa_chat_history_how_to.ipynb (#27844 ) Description: Update VectorStore `as_retriever` method api reference url in `qa_chat_history_how_to.ipynb` Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:34:50 +00:00
sifatj	dd1711f3c2	docs: Update max_marginal_relevance_search api reference url in multi_vector.ipynb (#27843 ) Description: Update VectorStore `max_marginal_relevance_search` api reference url in `multi_vector.ipynb` Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:31:36 +00:00
sifatj	aa1f46a03a	docs: Update VectorStore .as_retriever method url in vectorstore_retriever.ipynb (#27842 ) Description: Update VectorStore `.as_retriever` method url in `vectorstore_retriever.ipynb` Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:28:11 +00:00
Eric Pinzur	8eb38622a6	community: fixed bug in GraphVectorStoreRetriever (#27846 ) Description: This fixes an issue that was mistakenly introduced in https://github.com/langchain-ai/langchain/pull/27253. The issue currently exists only in `langchain-community==0.3.4`. Test cases were added to prevent this issue in the future. Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:27:17 +00:00
sifatj	eecf95df9b	docs: Update VectorStore api reference url in rag.ipynb (#27841 ) Description: Update VectorStore api reference url in `rag.ipynb` Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:27:03 +00:00
sifatj	50563400fb	docs: Update broken vectorstore urls in retrievers.ipynb (#27838 ) Description: Update outdated `VectorStore` api reference urls in `retrievers.ipynb` Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 20:26:03 +00:00
Bagatur	dfa83531ad	qdrant,nomic[minor]: bump core deps (#27849 )	2024-11-04 20:19:50 +00:00
Erick Friis	4e5cc84d40	infra: release tag compute (#27836 )	2024-11-04 12:16:51 -08:00
Rashmi Pawar	f86a09f82c	Add nvidia as provider for embedding, llm (#27810 ) Documentation: Add NVIDIA as integration provider cc: @mattf @dglogo Co-authored-by: Erick Friis <erick@langchain.dev>	2024-11-04 19:45:51 +00:00
Erick Friis	0c62684ce1	Revert "infra: add neo4j to package list" (#27887 ) Reverts langchain-ai/langchain#27833 Wait for release	2024-11-04 18:18:38 +00:00
Erick Friis	bcf499df16	infra: add neo4j to package list (#27833 )	2024-11-04 09:24:04 -08:00
Duy Huynh	a487ec47f4	community: set default `output_token_limit` value for `PowerBIToolkit` to fix validation error (#26308 ) ### Description: This PR sets a default value of `output_token_limit = 4000` for the `PowerBIToolkit` to fix an unintentional validation error. ### Problem: When attempting to run a code snippet from [Langchain's PowerBI toolkit documentation](https://python.langchain.com/v0.1/docs/integrations/toolkits/powerbi/) to interact with a `PowerBIDataset`, the following error occurs: ``` pydantic.v1.error_wrappers.ValidationError: 1 validation error for QueryPowerBITool output_token_limit none is not an allowed value (type=type_error.none.not_allowed) ``` ### Root Cause: The issue arises because when creating a `QueryPowerBITool`, the `output_token_limit` parameter is unintentionally set to `None`, which is the current default for `PowerBIToolkit`. However, `QueryPowerBITool` expects a default value of `4000` for `output_token_limit`. This unintended override causes the error. `17659ca2cd/libs/community/langchain_community/agent_toolkits/powerbi/toolkit.py (L63)` `17659ca2cd/libs/community/langchain_community/agent_toolkits/powerbi/toolkit.py (L72-L79)` `17659ca2cd/libs/community/langchain_community/tools/powerbi/tool.py (L39)` ### Solution: To resolve this, the default value of `output_token_limit` is now explicitly set to `4000` in `PowerBIToolkit` to prevent the accidental assignment of `None`. Co-authored-by: ccurme <chester.curme@gmail.com>	2024-11-04 14:34:27 +00:00
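The root cause described in the PowerBI commit above (a wrapper whose `None` default overrides the wrapped object's own default) can be reproduced without the library. This is a minimal sketch using plain dataclasses rather than pydantic; `QueryTool`, `ToolkitBefore`, and `ToolkitAfter` are hypothetical stand-ins, not the actual `QueryPowerBITool` or `PowerBIToolkit` classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryTool:
    """Hypothetical tool whose limit must be an int (default 4000)."""

    output_token_limit: int = 4000

    def __post_init__(self) -> None:
        # Mimics a validator that rejects None for a non-optional field.
        if self.output_token_limit is None:
            raise ValueError("output_token_limit: none is not an allowed value")


@dataclass
class ToolkitBefore:
    """Hypothetical toolkit with a None default that is always forwarded."""

    output_token_limit: Optional[int] = None

    def get_tool(self) -> QueryTool:
        # Forwarding None explicitly overrides the tool's own default.
        return QueryTool(output_token_limit=self.output_token_limit)


@dataclass
class ToolkitAfter:
    """Hypothetical toolkit whose default matches the tool's default."""

    output_token_limit: int = 4000

    def get_tool(self) -> QueryTool:
        return QueryTool(output_token_limit=self.output_token_limit)


tool = ToolkitAfter().get_tool()
```

In this sketch `ToolkitBefore().get_tool()` raises a `ValueError`, while `ToolkitAfter().get_tool()` succeeds because the forwarded default is already valid.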
Dobiichi-Origami	f7ced5b211	community: read function call from `tool_calls` for Qianfan (#26208 ) I added one more 'elif' to read tool call message from `tool_calls` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-11-04 14:33:32 +00:00
ono-hiroki	b7d549ae88	docs: fix undefined 'data' variable in document_loader_csv.ipynb (#27872 ) Description: This PR addresses an issue in the CSVLoader example where data is not defined, causing a NameError. The line `data = loader.load()` is added to correctly assign the output of loader.load() to the data variable.	2024-11-04 14:10:56 +00:00
Bagatur	3b0b7cfb74	chroma[minor]: release 0.2.0 (#27840 )	2024-11-01 18:12:00 -07:00
Jun Yamog	830cad7bc0	core: fix CommaSeparatedListOutputParser to handle columns that may contain commas in it (#26365 ) - Description: Currently CommaSeparatedListOutputParser can't handle strings that may contain commas within a column. It would parse any commas as the delimiter. Ex. "foo, foo2", "bar", "baz" It will create 4 columns: "foo", "foo2", "bar", "baz" This should be 3 columns: "foo, foo2", "bar", "baz" - Dependencies: Added 2 additional imports, but they are built in python packages. import csv from io import StringIO - Twitter handle: @jkyamog - [ ] Add tests and docs: 1. added simple unit test test_multiple_items_with_comma --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-11-01 22:42:24 +00:00
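The quoting behavior described in the commit above can be illustrated with the two standard-library imports it mentions, `csv` and `StringIO`. This is a minimal sketch of the idea, not the code merged in the PR; `parse_comma_separated` and `naive_split` are hypothetical helper names.

```python
import csv
from io import StringIO
from typing import List


def naive_split(text: str) -> List[str]:
    """Split on every comma, ignoring quotes."""
    return [part.strip() for part in text.split(",")]


def parse_comma_separated(text: str) -> List[str]:
    """Split on commas while keeping quoted items that contain commas intact."""
    reader = csv.reader(
        StringIO(text),
        quotechar='"',
        delimiter=",",
        skipinitialspace=True,  # ignore the space that follows each comma
    )
    # csv.reader yields one list per input line; flatten them into one list.
    return [item for row in reader for item in row]


text = '"foo, foo2", "bar", "baz"'
naive_items = naive_split(text)
parsed_items = parse_comma_separated(text)
```

Here `naive_items` has four entries because the comma inside the first quoted item is treated as a delimiter, while `parsed_items` has the intended three.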
Erick Friis	9fedb04dd3	docs: INVALID_CHAT_HISTORY redirect (#27845 )	2024-11-01 21:35:11 +00:00
Erick Friis	03a3670a5e	infra: remove some special cases (#27839 )	2024-11-01 21:13:43 +00:00
Bagatur	002e1c9055	airbyte: remove from master (#27837 )	2024-11-01 13:59:34 -07:00
Bagatur	ee63d21915	many: use core 0.3.15 (#27834 ) langchain-box==0.2.3 langchain-community==0.3.5 langchain-couchbase==0.2.2 langchain-exa==0.2.1 langchain-prompty==0.1.1 langchain-text-splitters==0.3.2 langchain-voyageai==0.1.3 langchain==0.3.7	2024-11-01 20:35:55 +00:00
Prithvi Kannan	c3c638cd7b	docs: Reference new databricks-langchain package (#27828 ) Update references in Databricks integration page to reference our new partner package databricks-langchain https://github.com/databricks/databricks-ai-bridge/tree/main/integrations/langchain --------- Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>	2024-11-01 10:21:19 -07:00
sifatj	33d445550e	docs: update VectorStore api reference url in retrievers.ipynb (#27814 ) Description: Update outdated `VectorStore` api reference url in Vector store subsection of `retrievers.ipynb`	2024-11-01 15:44:26 +00:00
sifatj	9a4a630e40	docs: Update Retrievers and Runnable links in Retrievers subsection of retrievers.ipynb (#27815 ) Description: Update outdated links for `Retrievers` and `Runnable` in Retrievers subsection of `retrievers.ipynb`	2024-11-01 15:42:30 +00:00


12264 Commits