langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-07-15 01:13:48 +00:00

Author	SHA1	Message	Date
Oleksii Pokotylo	37ca468d03	community: AzureSearch: fix reranking for empty lists (#27104 ) Description: Fix reranking for empty lists Issue: ``` ValueError: not enough values to unpack (expected 3, got 0) documents, scores, vectors = map(list, zip(*docs)) File langchain_community/vectorstores/azuresearch.py", line 1680, in _reorder_results_with_maximal_marginal_relevance ``` Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com>	2024-10-07 15:27:09 -04:00
Martin Triska	3fc0ea510e	community : [bugfix] Use document ids as keys in AzureSearch vectorstore (#25486 ) # Description [Vector store base class](`4cdaca67dc/libs/core/langchain_core/vectorstores/base.py (L65)`) currently expects `ids` to be passed in and that is what it passes along to the AzureSearch vector store when attempting to `add_texts()`. However AzureSearch expects `keys` to be passed in. When they are not present, AzureSearch `add_embeddings()` makes up new uuids. This is a problem when trying to run indexing. [Indexing code expects](`b297af5482/libs/core/langchain_core/indexing/api.py (L371)`) the documents to be uploaded using provided ids. Currently AzureSearch ignores `ids` passed from `indexing` and makes up new ones. Later when `indexer` attempts to delete removed file, it uses the `id` it had stored when uploading the document, however it was uploaded under different `id`. Twitter handle: @martintriska1	2024-09-19 09:37:18 -04:00
Erick Friis	c2a3021bb0	multiple: pydantic 2 compatibility, v0.3 (#26443 ) Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com> Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com> Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: ZhangShenao <15201440436@163.com> Co-authored-by: Friso H. Kingma <fhkingma@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: Morgante Pell <morgantep@google.com>	2024-09-13 14:38:45 -07:00
Erick Friis	f7e62754a1	community: undo azure_ad_access_token breaking change (#25818 )	2024-08-30 02:06:14 +00:00
ccurme	afe8ccaaa6	community[patch]: Add ID field back to Azure AI Search results (#25828 ) Commandeering https://github.com/langchain-ai/langchain/pull/23243 as maintainers don't have ability to modify that PR. Fixes https://github.com/langchain-ai/langchain/issues/22827 --------- Co-authored-by: Ming Quah <fleetadmiralbutter@icloud.com>	2024-08-28 17:56:50 -04:00
Luis Valencia	99f9a664a5	community: Azure Search Vector Store is missing Access Token Authentication (#24330 ) Added Azure Search Access Token Authentication instead of API KEY auth. Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263 Dependencies: None Twitter: @levalencia @baskaryan Could you please review? First time creating a PR that fixes some code. --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-08-26 15:41:50 -07:00
Jabir	12e490ea56	Update azuresearch.py (#25577 ) This will allow complextype metadata to be returned. the current implementation throws error when dealing with nested metadata Thank you for contributing to LangChain! - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-08-20 12:53:30 +00:00
thedavgar	9d08369442	community: fix AzureSearch vectorstore asyncronous methods (#24921 ) Description Fix the asyncronous methods to retrieve documents from AzureSearch VectorStore. The previous changes from [this commit](`ffe6ca986e`) create a similar code for the syncronous methods and the asyncronous ones but the asyncronous client return an asyncronous iterator "AsyncSearchItemPaged" as said in the issue #24740. To solve this issue, the syncronous iterators in asyncronous methods where changed to asyncronous iterators. @chrislrobert said in [this comment](https://github.com/langchain-ai/langchain/issues/24740#issuecomment-2254168302) that there was a still a flaw due to `with` blocks that close the client after each call. I removed this `with` blocks in the `async_client` following the same pattern as the sync `client`. In order to close up the connections, a __del__ method is included to gently close up clients once the vectorstore object is destroyed. Issue: #24740 and #24064 Dependencies: No new dependencies for this change Example notebook: I created a notebook just to test the changes work and gives the same results as the syncronous methods for vector and hybrid search. With these changes, the asyncronous methods in the retriever work as well. ![image](https://github.com/user-attachments/assets/697e431b-9d7f-4d0d-b205-59d051ac2b67) Lint and test: Passes the tests and the linter	2024-08-13 14:20:51 -07:00
Eugene Yurtsev	bf5193bb99	community[patch]: Upgrade pydantic extra (#25185 ) Upgrade to using a literal for specifying the extra which is the recommended approach in pydantic 2. This works correctly also in pydantic v1. ```python from pydantic.v1 import BaseModel class Foo(BaseModel, extra="forbid"): x: int Foo(x=5, y=1) ``` And ```python from pydantic.v1 import BaseModel class Foo(BaseModel): x: int class Config: extra = "forbid" Foo(x=5, y=1) ``` ## Enum -> literal using grit pattern: ``` engine marzano(0.1) language python or { `extra=Extra.allow` => `extra="allow"`, `extra=Extra.forbid` => `extra="forbid"`, `extra=Extra.ignore` => `extra="ignore"` } ``` Resorted attributes in config and removed doc-string in case we will need to deal with going back and forth between pydantic v1 and v2 during the 0.3 release. (This will reduce merge conflicts.) ## Sort attributes in Config: ``` engine marzano(0.1) language python function sort($values) js { return $values.text.split(',').sort().join("\n"); } class_definition($name, $body) as $C where { $name <: `Config`, $body <: block($statements), $values = [], $statements <: some bubble($values) assignment() as $A where { $values += $A }, $body => sort($values), } ```	2024-08-08 17:20:39 +00:00
Marc Gibbons	cc451effd1	community[patch]: langchain_community.vectorstores.azuresearch Raise LangChainException instead of bare Exception (#23935 ) Raise `LangChainException` instead of `Exception`. This alleviates the need for library users to use bare try/except to handle exceptions raised by `AzureSearch`. Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-07-26 15:59:06 +00:00
thedavgar	ffe6ca986e	community: Fix Bug in Azure Search Vectorstore search asyncronously (#24081 ) Thank you for contributing to LangChain! Description: This PR fixes a bug described in the issue in #24064, when using the AzureSearch Vectorstore with the asyncronous methods to do search which is also the method used for the retriever. The proposed change includes just change the access of the embedding as optional because is it not used anywhere to retrieve documents. Actually, the syncronous methods of retrieval do not use the embedding neither. With this PR the code given by the user in the issue works. ```python vectorstore = AzureSearch( azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"), azure_search_key=os.getenv("AI_SEARCH_API_KEY"), index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"), fields=fields, embedding_function=encoder, ) retriever = vectorstore.as_retriever(search_type="hybrid", k=2) await vectorstore.avector_search("what is the capital of France") await retriever.ainvoke("what is the capital of France") ``` Issue: The Azure Search Vectorstore is not working when searching for documents with asyncronous methods, as described in issue #24064 Dependencies: There are no extra dependencies required for this change. --------- Co-authored-by: isaac hershenson <ihershenson@hmc.edu>	2024-07-11 18:32:19 -07:00
Matt	8327925ab7	community:support additional Azure Search Options (#24134 ) - Description: Support additional kwargs options for the Azure Search client (Described here https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/core/azure-core/README.md#configurations) - Issue: N/A - Dependencies: No additional Dependencies ---------	2024-07-11 18:22:36 +00:00
Eugene Yurtsev	f24e38876a	community[patch]: Update root_validators to use explicit pre=True or pre=False (#23736 )	2024-07-01 17:13:23 -04:00
Bagatur	584a1e30ac	community[patch]: AzureSearch async functions (#22075 )	2024-06-05 14:39:54 -07:00
Dan	86509161b0	community: fix AzureSearch delete documents (#22315 ) Description Fix AzureSearch delete documents method by using FIELDS_ID variable instead of the hard coded "id" value Issue: This is linked to this issue: https://github.com/langchain-ai/langchain/issues/22314 Co-authored-by: dseban <dan.seban@neoxia.com>	2024-06-03 15:55:06 +00:00
Mohammad Mohtashim	16617dd239	community[patch]: AzureSearchVectorStoreRetriever Fixed to account for search_kwargs (#21572 ) - Description: Fixed `AzureSearchVectorStoreRetriever` to account for search_kwargs. More explanation is in the mentioned issue. - Issue: #21492 --------- Co-authored-by: MAC <mac@MACs-MacBook-Pro.local> Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-22 14:46:41 -07:00
Oleksii Pokotylo	98c0b093bb	community[patch]: Extend AzureSearch with `maximal_marginal_relevance`, `from_embeddings` (#21065 ) Description: - Extend AzureSearch with `maximal_marginal_relevance` (for vector and hybrid search) - Add construction `from_embeddings` - if the user has already embedded the texts - Add `add_embeddings` - Refactor common parts (`_simple_search`, `_results_to_documents`, `_reorder_results_with_maximal_marginal_relevance`) - Add `vector_search_dimensions` as a parameter to the constructor to avoid extra calls to `embed_query` (most of the time the user applies the same model and knows the dimension) Issue: none Dependencies: none - [x] Add tests and docs: The docstrings have been added to the new functions, and unified for the existing ones. The example notebook is great in illustrating the main usage of AzureSearch, adding the new methods would only dilute the main content. - [x] Lint and test --------- Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-22 13:36:06 -07:00
Bagatur	72d4a8eeed	community[patch]: AzureSearch dont overwrite default async (#21989 )	2024-05-21 11:01:28 -07:00
Massimiliano Pronesti	0c0db7c5db	feat(community): support semantic hybrid score threshold in Azure AI Search (#21527 ) Support semantic hybrid search with a score threshold -- similar to what we do for similarity search and for hybrid search (#20907).	2024-05-16 15:54:32 -04:00
MacanPN	0f7f448603	community[patch]: add delete() method to AzureSearch vector store (#21127 ) Issue: Currently `AzureSearch` vector store does not implement `delete` method. This PR implements it. This also makes it compatible with LangChain indexer. Dependencies: None Twitter handle: @martintriska1 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-04-30 23:46:18 +00:00
Massimiliano Pronesti	ce89b34fc0	community[patch]: support hybrid search with threshold in Azure AI Search Retriever (#20907 ) Support hybrid search with a score threshold -- similar to what we do for similarity search.	2024-04-29 12:11:44 -04:00
Massimiliano Pronesti	8d1167b32f	community[patch]: add support for similarity_score_threshold search in… (#20852 ) See https://github.com/langchain-ai/langchain/issues/20600#issuecomment-2075569338 for details. @chrislrobert	2024-04-24 19:14:33 +00:00
Massimiliano Pronesti	2542a09abc	community[patch]: AzureSearch incorrectly converted to retriever (#20601 ) Closes #20600. Please see the issue for more details.	2024-04-18 16:06:47 -04:00
Adam Law	aeb7b6b11d	community[patch]: use semantic_configurations in AzureSearch (#19347 ) - Description: Currently the semantic_configurations are not used when creating an AzureSearch instance, instead creating a new one with default values. This PR changes the behavior to use the passed semantic_configurations if it is present, and the existing default configuration if not. --------- Co-authored-by: Adam Law <adamlaw@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 13:57:39 -07:00
Clément Tamines	a6cbb755a7	community[patch]: fix semantic answer bug in AzureSearch vector store (#18938 ) - Description: The `semantic_hybrid_search_with_score_and_rerank` method of `AzureSearch` contains a hardcoded field name "metadata" for the document metadata in the Azure AI Search Index. Adding such a field is optional when creating an Azure AI Search Index, as other snippets from `AzureSearch` test for the existence of this field before trying to access it. Furthermore, the metadata field name shouldn't be hardcoded as "metadata" and use the `FIELDS_METADATA` variable that defines this field name instead. In the current implementation, any index without a metadata field named "metadata" will yield an error if a semantic answer is returned by the search in `semantic_hybrid_search_with_score_and_rerank`. - Issue: https://github.com/langchain-ai/langchain/issues/18731 - Prior fix to this bug: This bug was fixed in this PR https://github.com/langchain-ai/langchain/pull/15642 by adding a check for the existence of the metadata field named `FIELDS_METADATA` and retrieving a value for the key called "key" in that metadata if it exists. If the field named `FIELDS_METADATA` was not present, an empty string was returned. This fix was removed in this PR https://github.com/langchain-ai/langchain/pull/15659 (see `ed1ffca911`#). @lz-chen: could you confirm this wasn't intentional? - New fix to this bug: I believe there was an oversight in the logic of the fix from [#1564](https://github.com/langchain-ai/langchain/pull/15642) which I explain below. The `semantic_hybrid_search_with_score_and_rerank` method creates a dictionary `semantic_answers_dict` with semantic answers returned by the search as follows. `5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)` The keys in this dictionary are the unique document ids in the index, if I understand the [documentation of semantic answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers) in Azure AI Search correctly. When the method transforms a search result into a `Document` object, an "answer" key is added to the document's metadata. The value for this "answer" key should be the semantic answer returned by the search from this document, if such an answer is returned. The match between a `Document` object and the semantic answers returned by the search should be done through the unique document id, which is used as a key for the `semantic_answers_dict` dictionary. This id is defined in the search result's field named `FIELDS_ID`. I added a check to avoid any error in case no field named `FIELDS_ID` exists in a search result (which shouldn't happen in theory). A benefit of this approach is that this fix should work whether or not the Azure AI Search Index contains a metadata field. @levalencia could you confirm my analysis and test the fix? @raunakshrivastava7 do you agree with the fix? Thanks for the help!	2024-03-25 18:51:54 -07:00
Keith Chan	914af69b44	community[patch]: Update azuresearch vectorstore from_texts() method to include fields argument (#17661 ) - Description: Update azuresearch vectorstore from_texts() method to include fields argument, necessary for creating an Azure AI Search index with custom fields. - Issue: Currently index fields are fixed to default fields if Azure Search index is created using from_texts() method - Dependencies: None - Twitter handle: None --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-08 17:05:35 -08:00
Krista Pratico	bf8e3c6dd1	community[patch]: add fixes for AzureSearch after update to stable azure-search-documents library (#17599 ) - Description: Addresses the bugs described in linked issue where an import was erroneously removed and the rename of a keyword argument was missed when migrating from beta --> stable of the azure-search-documents package - Issue: https://github.com/langchain-ai/langchain/issues/17598 - Dependencies: N/A - Twitter handle: N/A	2024-02-15 22:23:52 -08:00
Lingzhen Chen	30af711c34	community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659 ) - Description: Updates `libs/community/langchain_community/vectorstores/azuresearch.py` to support the stable version `azure-search-documents=11.4.0` - Issue: https://github.com/langchain-ai/langchain/issues/14534, https://github.com/langchain-ai/langchain/issues/15039, https://github.com/langchain-ai/langchain/issues/15355 - Dependencies: azure-search-documents>=11.4.0 --------- Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-02-12 19:23:35 -08:00
Harrison Chase	4eda647fdd	infra: add -p to mkdir in lint steps (#17013 ) Previously, if this did not find a mypy cache then it wouldnt run this makes it always run adding mypy ignore comments with existing uncaught issues to unblock other prs --------- Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-02-05 11:22:06 -08:00
Mahdi Setayesh	eb76f9c9fe	community: Fixing a performance issue with AzureSearch to perform batch embedding (#15594 ) - Description: Azure Cognitive Search vector DB store performs slow embedding as it does not utilize the batch embedding functionality. This PR provide a fix to improve the performance of Azure Search class when adding documents to the vector search, - Issue: #11313 , - Dependencies: any dependencies required for this change, - Twitter handle: we announce bigger features on Twitter. If your PR gets announced, and you'd like a mention, we'll gladly shout you out! Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` from the root of the package you've modified to check this locally. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ If you're adding a new integration, please include: 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17. -->	2024-01-12 10:58:55 -08:00
Raunak	64f5968a81	community: Replaced hardcoded "metadata" with FIELDS_METADATA variable in semantic_hybrid_search_with_score_and_rerank (#15642 ) - Description: This PR is to fix a bug in semantic_hybrid_search_with_score_and_rerank() function in langchain_community/vectorstores/azuresearch.py. The hardcoded "metadata" name is replaced with FIELDS_METADATA variable with an if block to check if the metadata column exists or not. - Issue: Fixed #15581 - Dependencies: No - Twitter handle: None Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>	2024-01-06 17:04:59 -08:00
Bagatur	ed58eeb9c5	community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463 ) Moved the following modules to new package langchain-community in a backwards compatible fashion: ``` mv langchain/langchain/adapters community/langchain_community mv langchain/langchain/callbacks community/langchain_community/callbacks mv langchain/langchain/chat_loaders community/langchain_community mv langchain/langchain/chat_models community/langchain_community mv langchain/langchain/document_loaders community/langchain_community mv langchain/langchain/docstore community/langchain_community mv langchain/langchain/document_transformers community/langchain_community mv langchain/langchain/embeddings community/langchain_community mv langchain/langchain/graphs community/langchain_community mv langchain/langchain/llms community/langchain_community mv langchain/langchain/memory/chat_message_histories community/langchain_community mv langchain/langchain/retrievers community/langchain_community mv langchain/langchain/storage community/langchain_community mv langchain/langchain/tools community/langchain_community mv langchain/langchain/utilities community/langchain_community mv langchain/langchain/vectorstores community/langchain_community mv langchain/langchain/agents/agent_toolkits community/langchain_community mv langchain/langchain/cache.py community/langchain_community mv langchain/langchain/adapters community/langchain_community mv langchain/langchain/callbacks community/langchain_community/callbacks mv langchain/langchain/chat_loaders community/langchain_community mv langchain/langchain/chat_models community/langchain_community mv langchain/langchain/document_loaders community/langchain_community mv langchain/langchain/docstore community/langchain_community mv langchain/langchain/document_transformers community/langchain_community mv langchain/langchain/embeddings community/langchain_community mv langchain/langchain/graphs community/langchain_community mv langchain/langchain/llms community/langchain_community mv langchain/langchain/memory/chat_message_histories community/langchain_community mv langchain/langchain/retrievers community/langchain_community mv langchain/langchain/storage community/langchain_community mv langchain/langchain/tools community/langchain_community mv langchain/langchain/utilities community/langchain_community mv langchain/langchain/vectorstores community/langchain_community mv langchain/langchain/agents/agent_toolkits community/langchain_community mv langchain/langchain/cache.py community/langchain_community ``` Moved the following to core ``` mv langchain/langchain/utils/json_schema.py core/langchain_core/utils mv langchain/langchain/utils/html.py core/langchain_core/utils mv langchain/langchain/utils/strings.py core/langchain_core/utils cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py rm langchain/langchain/utils/env.py ``` See .scripts/community_split/script_integrations.sh for all changes	2023-12-11 13:53:30 -08:00

32 Commits