langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-08-31 10:23:18 +00:00

Author	SHA1	Message	Date
Luka Krapic	727a2ea9f1	community[patch]: history size support for DynamoDBChatMessageHistory (#16794 ) Description: PR adds support for limiting number of messages preserved in a session history for DynamoDBChatMessageHistory --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:56:21 +00:00
Dt22	6dbf1a2de0	community[patch]: fix redis input type for index_schema field (#16874 ) ### Subject: Fix Type Misdeclaration for index_schema in redis/base.py I noticed a type misdeclaration for the index_schema column in the redis/base.py file. When following the instructions outlined in [Redis Custom Metadata Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis) to create our own index_schema, it leads to a Pylance type error. <br/> The error message indicates that Dict[str, list[Dict[str, str]]] is incompatible with the type Optional[Union[Dict[str, str], str, os.PathLike]]. ``` index_schema = { "tag": [{"name": "credit_score"}], "text": [{"name": "user"}, {"name": "job"}], "numeric": [{"name": "age"}], } rds, keys = Redis.from_texts_return_keys( texts, embeddings, metadatas=metadata, redis_url="redis://localhost:6379", index_name="users_modified", index_schema=index_schema, ) ``` Therefore, I have created this pull request to rectify the type declaration problem. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:55:54 +00:00
morgana	074ad5095f	community[patch]: mmr search for Rockset vectorstore integration (#16908 ) - Description: Adding support for mmr search in the Rockset vectorstore integration. - Issue: N/A - Dependencies: N/A - Twitter handle: `@_morgan_adams_` --------- Co-authored-by: Rockset API Bot <admin@rockset.io> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 18:45:22 +00:00
shahrin014	f51e6a35ba	community[patch]: OllamaEmbeddings - Pass headers to post request (#16880 ) ## Feature - Set additional headers in constructor - Headers will be sent in post request This feature is useful if deploying Ollama on a cloud service such as hugging face, which requires authentication tokens to be passed in the request header. ## Tests - Test if header is passed - Test if header is not passed Similar to https://github.com/langchain-ai/langchain/pull/15881 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 18:44:52 +00:00
Jan Chorowski	b8b42ccbc5	community[minor]: Pathway vectorstore(#14859 ) - Description: Integration with pathway.com data processing pipeline acting as an always updated vectorstore - Issue: not applicable - Dependencies: optional dependency on [`pathway`](https://pypi.org/project/pathway/) - Twitter handle: pathway_com The PR provides and integration with `pathway` to provide an easy to use always updated vector store: ```python import pathway as pw from langchain.embeddings.openai import OpenAIEmbeddings from langchain.text_splitter import CharacterTextSplitter from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer data_sources = [] data_sources.append( pw.io.gdrive.read(object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy", service_user_credentials_file="credentials.json", with_metadata=True)) text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"]) vector_server = PathwayVectorServer( *data_sources, embedder=embeddings_model, splitter=text_splitter, ) vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False) client = PathwayVectorClient( host="127.0.0.1", port="8765", ) query = "What is Pathway?" docs = client.similarity_search(query) ``` The `PathwayVectorServer` builds a data processing pipeline which continusly scans documents in a given source connector (google drive, s3, ...) and builds a vector store. The `PathwayVectorClient` implements LangChain's `VectorStore` interface and connects to the server to retrieve documents. --------- Co-authored-by: Mateusz Lewandowski <lewymati@users.noreply.github.com> Co-authored-by: mlewandowski <mlewandowski@MacBook-Pro-mlewandowski.local> Co-authored-by: Berke <berkecanrizai1@gmail.com> Co-authored-by: Adrian Kosowski <adrian@pathway.com> Co-authored-by: mlewandowski <mlewandowski@macbook-pro-mlewandowski.home> Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com> Co-authored-by: Erick Friis <erick@langchain.dev> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: mlewandowski <mlewandowski@MBPmlewandowski.ht.home> Co-authored-by: Szymon Dudycz <szymond@pathway.com> Co-authored-by: Szymon Dudycz <szymon.dudycz@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 10:50:39 -07:00
Arturs Konfino	2319212d54	community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762 ) If `prompt` is passed into `create_sql_agent()`, then `toolkit.get_context()` shouldn't be executed against the database unless relevant prompt variables (`table_info` or `table_names`) are present .	2024-03-29 16:42:21 +00:00
高璟琦	ec7a59c96c	community[minor]: Add solar embedding (#19761 ) Solar is a large language model developed by [Upstage](https://upstage.ai/). It's a powerful and purpose-trained LLM. You can visit the embedding service provided by Solar within this pr. You may get SOLAR_API_KEY from https://console.upstage.ai/services/embedding You can refer to more details about accepted llm integration at https://python.langchain.com/docs/integrations/llms/solar.	2024-03-29 09:36:05 -07:00
Tomaz Bratanic	dec00d3050	community[patch]: Add the ability to pass maps to neo4j retrieval query (#19758 ) Makes it easier to flatten complex values to text, so you don't have to use a lot of Cypher to do it.	2024-03-29 08:33:48 -07:00
Robby	f7e8a382cc	community[minor]: add hugging face text-to-speech inference API (#18880 ) Description: I implemented a tool to use Hugging Face text-to-speech inference API. Issue: n/a Dependencies: n/a Twitter handle: No Twitter, but do have [LinkedIn](https://www.linkedin.com/in/robby-horvath/) lol. --------- Co-authored-by: Robby <h0rv@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 15:02:29 +00:00
DasDingoCodes	73eb3f8fd9	community[minor]: Implement DirectoryLoader lazy_load function (#19537 ) Thank you for contributing to LangChain! - [x] PR title: "community: Implement DirectoryLoader lazy_load function" - [x] Description: The `lazy_load` function of the `DirectoryLoader` yields each document separately. If the given `loader_cls` of the `DirectoryLoader` also implemented `lazy_load`, it will be used to yield subdocuments of the file. - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access: `libs/community/tests/unit_tests/document_loaders/test_directory_loader.py` 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory: `docs/docs/integrations/document_loaders/directory.ipynb` - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-03-29 14:46:52 +00:00
Jialei	f7c903e24a	community[minor]: add support for Moonshot llm and chat model (#17100 )	2024-03-29 08:54:23 +00:00
Ethan Yang	7164015135	community[minor]: Add Openvino embedding support (#19632 ) This PR is used to support both HF and BGE embeddings with openvino --------- Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>	2024-03-29 01:34:51 -07:00
T Cramer	540ebf35a9	community[patch]: Add explicit error message to Bedrock error output. (#17328 ) - Description: Propagate Bedrock errors into Langchain explicitly. Use-case: unset region error is hidden behind 'Could not load credentials...' message - Issue: [17654](https://github.com/langchain-ai/langchain/issues/17654) - Dependencies: None --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 03:07:33 +00:00
Marcus Virginia	69bb96c80f	community[patch]: surrealdb handle for empty metadata and allow collection names with complex characters (#17374 ) - Description: Handle for empty metadata and allow collection names with complex characters - Issue: #17057 - Dependencies: `surrealdb` --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-29 01:04:27 +00:00
kYLe	124ab79c23	community[minor]: Add Anyscale embedding support (#17605 ) Description: Add embedding model support for Anyscale Endpoint Dependencies: openai --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:53:53 +00:00
Lance Martin	12843f292f	community[patch]: llama cpp embeddings reset default n_batch (#17594 ) When testing Nomic embeddings -- ``` from langchain_community.embeddings import LlamaCppEmbeddings embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf" embd_lc = LlamaCppEmbeddings(model_path=embd_model_path) embedding_lc = embd_lc.embed_query(query) ``` We were seeing this error for strings > a certain size -- ``` File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count) 824 s_sizes = [] 826 # add to batch --> 827 self._batch.add_sequence(tokens, len(s_sizes), False) 828 t_batch += n_tokens 829 s_sizes.append(n_tokens) File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all) 540 self.batch.token[j] = batch[i] 541 self.batch.pos[j] = i --> 542 self.batch.seq_id[j][0] = seq_id 543 self.batch.n_seq_id[j] = 1 544 self.batch.logits[j] = logits_all ValueError: NULL pointer access ``` The default `n_batch` of llama-cpp-python's Llama is `512` but we were explicitly setting it to `8`. These need to be set to equal for embedding models. * The embedding.cpp example has an assertion to make sure these are always equal. * Apparently this is not being done properly in llama-cpp-python. With `n_batch` set to 8, if more than 8 tokens are passed the batch runs out of space and it crashes. This also explains why the CPU compute buffer size was small: raw client with default `n_batch=512` ``` llama_new_context_with_model: CPU input buffer size = 3.51 MiB llama_new_context_with_model: CPU compute buffer size = 21.00 MiB ``` langchain with `n_batch=8` ``` llama_new_context_with_model: CPU input buffer size = 0.04 MiB llama_new_context_with_model: CPU compute buffer size = 0.33 MiB ``` We can work around this by passing `n_batch=512`, but this will not be obvious to some users: ``` embedding = LlamaCppEmbeddings(model_path=embd_model_path, n_batch=512) ``` From discussion w/ @cebtenzzre. Related: https://github.com/abetlen/llama-cpp-python/issues/1189 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:47:22 +00:00
Zijian Han	8e976545f3	community[patch]: support OpenAI whisper base url (#17695 ) Description: The base URL for OpenAI is retrieved from the environment variable "OPENAI_BASE_URL", whereas for langchain it is obtained from "OPENAI_API_BASE". By adding `base_url = os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute correctly. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:35:27 +00:00
Paulo Nascimento	44a3484503	community[patch]: add NotebookLoader unit test (#17721 ) Thank you for contributing to LangChain! - Description: added unit tests for NotebookLoader. Linked PR: https://github.com/langchain-ai/langchain/pull/17614 - Issue: [#17614](https://github.com/langchain-ai/langchain/pull/17614) - Twitter handle: @paulodoestech - [x] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/ - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: lachiewalker <lachiewalker1@hotmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:27:46 +00:00
Paulo Nascimento	4c3a67122f	community[patch]: add Integration for OpenAI image gen with v1 sdk (#17771 ) Description: Created a Langchain Tool for OpenAI DALLE Image Generation. Issue: [#15901](https://github.com/langchain-ai/langchain/issues/15901) Dependencies: n/a Twitter handle: @paulodoestech - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-29 00:23:14 +00:00
Jiaming	3d3cc71287	community[patch]: fix bugs for bilibili Loader (#18036 ) - Description: 1. Fix the BiliBiliLoader that can receive cookie parameters, it requires 3 other parameters to run. The change is backward compatible. 2. Add test; 3. Add example in docs - Issue: [#14213] Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 16:39:38 -07:00
Sachin Paryani	25c9f3d1d1	community[patch]: Support Streaming in Azure Machine Learning (#18246 ) - [x] PR title: "community: Support streaming in Azure ML and few naming changes" - [x] PR message: - Description: Added support for streaming for azureml_endpoint. Also, renamed and AzureMLEndpointApiType.realtime to AzureMLEndpointApiType.dedicated. Also, added new classes CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and updated the classes LlamaChatContentFormatter and LlamaContentFormatter to now show a deprecated warning message when instantiated. --------- Co-authored-by: Sachin Paryani <saparan@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 23:38:20 +00:00
Victor Adan	afa2d85405	community[patch]: Added missing from_documents method to KNNRetriever. (#18411 ) - Description: Added missing `from_documents` method to `KNNRetriever`, providing the ability to supply metadata to LangChain `Document`s, and to give it parity to the other retrievers, which do have `from_documents`. - Issue: None - Dependencies: None - Twitter handle: None Co-authored-by: Victor Adan <vadan@netroadshow.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-28 23:18:50 +00:00
Smit Parmar	dfc4177b50	community[patch]: mypy ignore fix (#18483 ) Relates to #17048 Description : Applied fix to dynamodb and elasticsearch file. Error was : `Cannot override writeable attribute with read-only property` Suggestion: instead of adding ``` @messages.setter def messages(self, messages: List[BaseMessage]) -> None: raise NotImplementedError("Use add_messages instead") ``` we can change base class property `messages: List[BaseMessage]` to ``` @property def messages(self) -> List[BaseMessage]:... ``` then we don't need to add `@messages.setter` in all child classes.	2024-03-28 15:36:53 -07:00
Luca Dorigo	f19229c564	core[patch]: fix beta, deprecated typing (#18877 ) Description: While not technically incorrect, the TypeVar used for the `@beta` decorator prevented pyright (and thus most vscode users) from correctly seeing the types of functions/classes decorated with `@beta`. This is in part due to a small bug in pyright (https://github.com/microsoft/pyright/issues/7448 ) - however, the `Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not doing anything - classes are `Callables` by default, so by my understanding binding to `Type` does not actually provide any more safety - the modified annotation still works correctly for both functions, properties, and classes. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 22:33:43 +00:00
wulixuan	b7c8bc8268	community[patch]: fix yuan2 errors in LLMs (#19004 ) 1. fix yuan2 errors while invoke Yuan2. 2. update tests.	2024-03-28 14:37:44 -07:00
高远	688ca48019	community[patch]: Adding validation when vector does not exist (#19698 ) Adding validation when vector does not exist Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>	2024-03-28 13:58:23 -07:00
Chaunte W. Lacewell	4a49fc5a95	community[patch]: Fix bug in vdms (#19728 ) Description: Fix embedding check in vdms Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe)	2024-03-28 12:54:24 -07:00
高璟琦	75173d31db	community[minor]: Add solar model chat model (#18556 ) Add our solar chat models, available model choices: * solar-1-mini-chat * solar-1-mini-translate-enko * solar-1-mini-translate-koen More documents and pricing can be found at https://console.upstage.ai/services/solar. The references to our solar model can be found at * https://arxiv.org/abs/2402.17032 --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 12:31:11 -07:00
Davide Menini	f7042321f1	community[patch]: gather token usage info in BedrockChat during generation (#19127 ) This PR allows to calculate token usage for prompts and completion directly in the generation method of BedrockChat. The token usage details are then returned together with the generations, so that other downstream tasks can access them easily. This allows to define a callback for tokens tracking and cost calculation, similarly to what happens with OpenAI (see [OpenAICallbackHandler](https://api.python.langchain.com/en/latest/_modules/langchain_community/callbacks/openai_info.html#OpenAICallbackHandler). I plan on adding a BedrockCallbackHandler later. Right now keeping track of tokens in the callback is already possible, but it requires passing the llm, as done here: https://how.wtf/how-to-count-amazon-bedrock-anthropic-tokens-with-langchain.html. However, I find the approach of this PR cleaner. Thanks for your reviews. FYI @baskaryan, @hwchase17 --------- Co-authored-by: taamedag <Davide.Menini@swisscom.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 18:58:46 +00:00
ligang-super	a662468dde	community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666 ) - [x] PR title: "community: fix baidu qianfan missing stop parameter" - [x] PR message: - **Description: Baidu Qianfan lost the stop parameter when requesting service due to extracting it from kwargs. This bug can cause the agent to receive incorrect results --------- Co-authored-by: ligang33 <ligang33@baidu.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 18:21:49 +00:00
kaijietti	9c4b6dc979	community[patch]: fix bug in cohere that `async for` a coroutine in ChatCohere (#19381 ) Without `await`, the `stream` returned from the `async_client` is actually a coroutine, which could not be used in `async for`.	2024-03-27 21:34:46 -07:00
Christian Galo	1adaa3c662	community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488 ) This is a follow up to #18371. These are the changes: - New Azure AI Services toolkit and tools to replace those of Azure Cognitive Services. - Updated documentation for Microsoft platform. - The image analysis tool has been rewritten to use the new package `azure-ai-vision-imageanalysis`, doing a proper replacement of `azure-ai-vision`. These changes: - Update outdated naming from "Azure Cognitive Services" to "Azure AI Services". - Update documentation to use non-deprecated methods to create and use agents. - Removes need to depend on yanked python package (`azure-ai-vision`) There is one new dependency that is needed as a replacement to `azure-ai-vision`: - `azure-ai-vision-imageanalysis`. This is optional and declared within a function. There is a new `azure_ai_services.ipynb` notebook showing usage; Changes have been linted and formatted. I am leaving the actions of adding deprecation notices and future removal of Azure Cognitive Services up to the LangChain team, as I am not sure what the current practice around this is. --- If this PR makes it, my handle is @galo@mastodon.social --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-03-28 03:19:02 +00:00
Shengsheng Huang	ac1dd8ad94	community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518 ) - Description: `bigdl-llm` library has been renamed to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR migrates the `bigdl-llm` integration to `ipex-llm` . - Issue: N/A. The original PR of `bigdl-llm` is https://github.com/langchain-ai/langchain/pull/17953 - Dependencies: `ipex-llm` library - Contribution maintainer: @shane-huang Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb Updated test: libs/community/tests/integration_tests/llms/test_ipex_llm.py	2024-03-27 20:12:59 -07:00
Chaunte W. Lacewell	a31f692f4e	community[minor]: Add VDMS vectorstore (#19551 ) - Description: Add support for Intel Lab's [Visual Data Management System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store - Dependencies: `vdms` library which requires protobuf = "4.24.2". There is a conflict with dashvector in `langchain` package but conflict is resolved in `community`. - Contribution maintainer: [@cwlacewe](https://github.com/cwlacewe) - Added tests: libs/community/tests/integration_tests/vectorstores/test_vdms.py - Added docs: docs/docs/integrations/vectorstores/vdms.ipynb - Added cookbook: cookbook/multi_modal_RAG_vdms.ipynb --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 03:12:11 +00:00
William FH	b7b62e29fb	community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531 ) If you use an embedding dist function in an eval loop, you get warned every time. Would prefer to just check once and forget about it. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-28 03:11:02 +00:00
yongheng.liu	7e29b6061f	community[minor]: integrate China Mobile Ecloud vector search (#15298 ) - Description: integrate China Mobile Ecloud vector search, - Dependencies: elasticsearch==7.10.1 Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 23:02:40 +00:00
Hyeongchan Kim	9b70131aed	community[patch]: refactor the type hint of `file_path` in `UnstructuredAPIFileLoader` class (#18839 ) * Description: add `None` type for `file_path` along with `str` and `List[str]` types. * `file_path`/`filename` arguments in `get_elements_from_api()` and `partition()` can be `None`, however, there's no `None` type hint for `file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader` currently. * calling the function with `file_path=None` is no problem, but my IDE annoys me lol. * Issue: N/A * Dependencies: N/A Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-27 22:31:54 +00:00
CaroFG	cf96060ab7	community[patch]: update for compatibility with latest Meilisearch version (#18970 ) - Description: Updates Meilisearch vectorstore for compatibility with v1.6 and above. Adds embedders settings and embedder_name which are now required. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 22:08:27 +00:00
chyroc	be2adb1083	community[patch]: support unstructured_kwargs for s3 loader (#15473 ) fix https://github.com/langchain-ai/langchain/issues/15472 Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 22:03:48 +00:00
Tomaz Bratanic	87d2a6b777	community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654 )	2024-03-27 14:20:12 -04:00
Rajendra Kadam	0019d8a948	community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574 ) Description: PebbloSafeLoader: Add support for non-file-based Document Loaders This pull request enhances PebbloSafeLoader by introducing support for several non-file-based Document Loaders. With this update, PebbloSafeLoader now seamlessly integrates with the following loaders: - GoogleDriveLoader - SlackDirectoryLoader - Unstructured EmailLoader Issue: NA Dependencies: - None Twitter handle: @Raj__725 --------- Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-03-27 17:39:52 +00:00
hulitaitai	dc2c9dd4d7	Update text2vec.py (#19657 ) Add that URL of the embedding tool "text2vec". Fix minor mistakes in the doc-string.	2024-03-27 13:13:30 -04:00
Guangdong Liu	7042934b5f	community[patch]: Fix the bug that Chroma does not specify `embedding_function` (#19277 ) - Issue: close #18291 - @baskaryan, @eyurtsev PTAL	2024-03-27 11:43:38 -04:00
yuwenzho	3a7d2cf443	community[minor]: Add ITREX optimized Embeddings (#18474 ) Introduction [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms Description adding ITREX runtime embeddings using intel-extension-for-transformers. added mdx documentation and example notebooks added embedding import testing. --------- Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 07:22:06 +00:00
Fabrizio Ruocco	f12cb0bea4	community[patch]: Microsoft Azure Document Intelligence updates (#16932 ) - Description: Update Azure Document Intelligence implementation by Microsoft team and RAG cookbook with Azure AI Search --------- Co-authored-by: Lu Zhang (AI) <luzhan@microsoft.com> Co-authored-by: Yateng Hong <yatengh@microsoft.com> Co-authored-by: teethache <hongyateng2006@126.com> Co-authored-by: Lu Zhang <44625949+luzhang06@users.noreply.github.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 23:36:59 -07:00
Timothy	ad77fa15ee	community[patch]: Adding try-except block for GCSDirectoryLoader (#19591 ) - Description: Implemented try-except block for `GCSDirectoryLoader`. Reason: Users processing large number of unstructured files in a folder may experience many different errors. A try-exception block is added to capture these errors. A new argument `use_try_except=True` is added to enable silent failure so that error caused by processing one file does not break the whole function. - Issue: N/A - Dependencies: no new dependencies - Twitter handle: timothywong731 --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-27 00:12:24 +00:00
xsai9101	160a8eb178	community[minor]: add oracle autonomous database doc loader integration (#19536 ) Thank you for contributing to LangChain! - [ ] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [ ] PR message: *Delete this entire checklist* and replace with - Description: Adding oracle autonomous database document loader integration. This will allow users to connect to oracle autonomous database through connection string or TNS configuration. https://www.oracle.com/autonomous-database/ - Issue: None - Dependencies: oracledb python package https://pypi.org/project/oracledb/ - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [ ] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. Unit test and doc are added. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17. --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 17:02:18 -07:00
Adam Law	aeb7b6b11d	community[patch]: use semantic_configurations in AzureSearch (#19347 ) - Description: Currently the semantic_configurations are not used when creating an AzureSearch instance, instead creating a new one with default values. This PR changes the behavior to use the passed semantic_configurations if it is present, and the existing default configuration if not. --------- Co-authored-by: Adam Law <adamlaw@microsoft.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-03-26 13:57:39 -07:00
Adrian Valente	2763d8cbe5	community: add len() implementation to Chroma (#19419 ) Thank you for contributing to LangChain! - [x] Add len() implementation to Chroma: "package: community" - [x] PR message: - Description: add an implementation of the __len__() method for the Chroma vectostore, for convenience. - Issue: no exposed method to know the size of a Chroma vectorstore - Dependencies: None - Twitter handle: lowrank_adrian - [x] Add tests and docs - [x] Lint and test --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-03-26 12:53:10 -04:00
Tom Aarsen	e0a1278d2b	docs: HFEmbeddings: Add more information to model_kwargs/encode_kwargs (#19594 ) - Description: Be more explicit with the `model_kwargs` and `encode_kwargs` for `HuggingFaceEmbeddings`. - Issue: - - Dependencies: - I received some reports by my users that they didn't realise that you could change the default `batch_size` with `HuggingFaceEmbeddings`, which may be attributed to how the `model_kwargs` and `encode_kwargs` don't give much information about what you can specify. I've added some parameter names & links to the Sentence Transformers documentation to help clear it up. Let me know if you'd rather have Markdown/Sphinx-style hyperlinks rather than a "bare URL". - Tom Aarsen	2024-03-26 12:46:04 -04:00

... 3 4 5 6 7 ...

887 Commits