langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-12-10 07:28:22 +00:00

Author	SHA1	Message	Date
Mr. Lance E Sloan «UMich»	84dc2dd059	community[patch]: Load YouTube transcripts (captions) as fixed-duration chunks with start times (#21710 ) - Description: Add a new format, `CHUNKS`, to `langchain_community.document_loaders.youtube.YoutubeLoader` which creates multiple `Document` objects from YouTube video transcripts (captions), each of a fixed duration. The metadata of each chunk `Document` includes the start time of each one and a URL to that time in the video on the YouTube website. I had implemented this for UMich (@umich-its-ai) in a local module, but it makes sense to contribute this to LangChain community for all to benefit and to simplify maintenance. - Issue: N/A - Dependencies: N/A - Twitter: lsloan_umich - Mastodon: [lsloan@mastodon.social](https://mastodon.social/@lsloan) With regards to tests and documentation, most existing features of the `YoutubeLoader` class are not tested. Only the `YoutubeLoader.extract_video_id()` static method had a test. However, while I was waiting for this PR to be reviewed and merged, I had time to add a test for the chunking feature I've proposed in this PR. I have added an example of using chunking to the `docs/docs/integrations/document_loaders/youtube_transcript.ipynb` notebook. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-06-11 17:44:36 +00:00
Aayush Kataria	71811e0547	community[minor]: Adds a vector store for Azure Cosmos DB for NoSQL (#21676 ) This PR add supports for Azure Cosmos DB for NoSQL vector store. Summary: Description: added vector store integration for Azure Cosmos DB for NoSQL Vector Store, Dependencies: azure-cosmos dependency, Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-06-11 10:34:01 -07:00
Mathis Joffre	ea43f40daf	community[minor]: Add support for OVHcloud AI Endpoints Embedding (#22667 ) Description: Add support for [OVHcloud AI Endpoints](https://endpoints.ai.cloud.ovh.net/) Embedding models. Inspired by: https://gist.github.com/gmasse/e1f99339e161f4830df6be5d0095349a Signed-off-by: Joffref <mariusjoffre@gmail.com>	2024-06-10 21:07:25 +00:00
Tomaz Bratanic	76a193decc	community[patch]: Add function response to graph cypher qa chain (#22690 ) LLMs struggle with Graph RAG, because it's different from vector RAG in a way that you don't provide the whole context, only the answer and the LLM has to believe. However, that doesn't really work a lot of the time. However, if you wrap the context as function response the accuracy is much better. btw... `union[LLMChain, Runnable]` is linting fun, that's why so many ignores	2024-06-10 13:52:17 -07:00
X-HAN	34edfe4a16	community[minor]: add Volcengine Rerank (#22700 ) Description: this PR adds Volcengine Rerank capability to Langchain, you can find Volcengine Rerank API from [here](https://www.volcengine.com/docs/84313/1254474) & [here](https://www.volcengine.com/docs/84313/1254605). [Volcengine](https://www.volcengine.com/) is a cloud service platform developed by ByteDance, the parent company of TikTok. You can obtain Volcengine API AK/SK from [here](https://www.volcengine.com/docs/84313/1254553). Dependencies: VolcengineRerank depends on `volcengine` python package. Twitter handle: my twitter/x account is https://x.com/LastMonopoly and I'd like a mention, thank you! Tests and docs 1. integration test: `test_volcengine_rerank.py` 2. example notebook: `volcengine_rerank.ipynb` Lint and test: I have run `make format`, `make lint` and `make test` from the root of the package I've modified.	2024-06-10 13:41:05 -07:00
Max Mulatz	058a64c563	Community[minor]: Add language parser for Elixir (#22742 ) Hi 👋 First off, thanks a ton for your work on this 💚 Really appreciate what you're providing here for the community. ## Description This PR adds a basic language parser for the [Elixir](https://elixir-lang.org/) programming language. The parser code is based upon the approach outlined in https://github.com/langchain-ai/langchain/pull/13318: it uses `tree-sitter` under the hood and aligns with all the other `tree-sitter` based parsers added in that PR. The `CHUNK_QUERY` I'm using here is probably not the most sophisticated one, but it worked for my application. It's a starting point to provide "core" parsing support for Elixir in LangChain. It enables people to use the language parser in real-world applications, which may then lead to further tweaking of the queries. I consider this PR just the groundwork. - Dependencies: requires `tree-sitter` and `tree-sitter-languages` from the extended dependencies - Twitter handle: `@bitcrowd` ## Checklist - [x] PR title: "package: description" - [x] Add tests and docs - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified.	2024-06-10 15:56:57 +00:00
Philippe PRADOS	9aabb446c5	community[minor]: Add SQL storage implementation (#22207 ) Hello @eyurtsev - package: langchain-community - Description: Add SQL implementation for docstore. A new implementation, in line with my other PRs ([async PGVector](https://github.com/langchain-ai/langchain-postgres/pull/32), [SQLChatMessageMemory](https://github.com/langchain-ai/langchain/pull/22065)) - Twitter handle: pprados --------- Signed-off-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Piotr Mardziel <piotrm@gmail.com> Co-authored-by: ChengZi <chen.zhang@zilliz.com> Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-07 21:17:02 +00:00
Cahid Arda Öz	6c07eb0c12	community[minor]: Add UpstashRatelimitHandler (#21885 ) Adding `UpstashRatelimitHandler` callback for rate limiting based on number of chain invocations or LLM token usage. For more details, see [upstash/ratelimit-py repository](https://github.com/upstash/ratelimit-py) or the notebook guide included in this PR. Twitter handle: @cahidarda --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-07 21:02:06 +00:00
Erick Friis	a24a9c6427	multiple: get rid of pyproject extras (#22581 ) They cause `poetry lock` to take a ton of time, and `uv pip install` can resolve the constraints from these toml files in trivial time (addressing problem with #19153) This allows us to properly upgrade lockfile dependencies moving forward, which revealed some issues that were either fixed or type-ignored (see file comments)	2024-06-06 15:45:22 -07:00
Isaac Francisco	ba3e219d83	community[patch]: recursive url loader fix and unit tests (#22521 ) Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-06-05 17:56:20 -07:00
X-HAN	62f13f95e4	community[minor]: add DashScope Rerank (#22403 ) Description: this PR adds DashScope Rerank capability to Langchain, you can find DashScope Rerank API from [here](https://help.aliyun.com/document_detail/2780058.html?spm=a2c4g.2780059.0.0.6d995024FlrJ12) & [here](https://help.aliyun.com/document_detail/2780059.html?spm=a2c4g.2780058.0.0.63f75024cr11N9). [DashScope](https://dashscope.aliyun.com/) is the generative AI service from Alibaba Cloud (Aliyun). You can create DashScope API key from [here](https://bailian.console.aliyun.com/?apiKey=1#/api-key). Dependencies: DashScopeRerank depends on `dashscope` python package. Twitter handle: my twitter/x account is https://x.com/LastMonopoly and I'd like a mention, thanks you! Tests and docs 1. integration test: `test_dashscope_rerank.py` 2. example notebook: `dashscope_rerank.ipynb` Lint and test: I have run `make format`, `make lint` and `make test` from the root of the package I've modified. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-06-05 15:40:21 -07:00
Philippe PRADOS	8250c177de	community[minor]: Add native async support to SQLChatMessageHistory (#22065 ) # package community: Fix SQLChatMessageHistory ## Description Here is a rewrite of `SQLChatMessageHistory` to properly implement the asynchronous approach. The code circumvents [issue 22021](https://github.com/langchain-ai/langchain/issues/22021) by accepting a synchronous call to `def add_messages()` in an asynchronous scenario. This bypasses the bug. For the same reasons as in [PR 32](https://github.com/langchain-ai/langchain-postgres/pull/32) of `langchain-postgres`, we use a lazy strategy for table creation. Indeed, the promise of the constructor cannot be fulfilled without this: it is not possible to invoke an asynchronous call in a constructor. We compensate for this by waiting for the next asynchronous method call to create the table. The goal of the `PostgresChatMessageHistory` class (in `langchain-postgres`) is, among other things, to be able to recycle database connections. The implementation of the class is problematic, as we have demonstrated in [issue 22021](https://github.com/langchain-ai/langchain/issues/22021). Our new implementation of `SQLChatMessageHistory` achieves this by using a singleton of type (`Async`)`Engine` for the database connection. The connection pool is managed by this singleton, and the code is then reentrant. We also accept the type `str` (optionally complemented by `async_mode`; I know you don't like this much, but it's the only way to allow an asynchronous connection string). In order to unify the different classes handling database connections, we have renamed `connection_string` to `connection`, and `Session` to `session_maker`. Now, a single transaction is used to add a list of messages. Thus, a crash during this write operation will not leave the database in an unstable state with a partially added message list. This makes the code resilient. 
We believe that the `PostgresChatMessageHistory` class is no longer necessary and can be replaced by: ``` PostgresChatMessageHistory = SQLChatMessageHistory ``` This also fixes the bug. ## Issue - [issue 22021](https://github.com/langchain-ai/langchain/issues/22021) - Bug in _exit_history() - Bugs in PostgresChatMessageHistory and sync usage - Bugs in PostgresChatMessageHistory and async usage - [issue 36](https://github.com/langchain-ai/langchain-postgres/issues/36) ## Twitter handle: pprados ## Tests - libs/community/tests/unit_tests/chat_message_histories/test_sql.py (add async test) @baskaryan, @eyurtsev or @hwchase17 can you check this PR ? And, I've been waiting a long time for validation from other PRs. Can you take a look? - [PR 32](https://github.com/langchain-ai/langchain-postgres/pull/32) - [PR 15575](https://github.com/langchain-ai/langchain/pull/15575) - [PR 13200](https://github.com/langchain-ai/langchain/pull/13200) --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-05 15:10:38 +00:00
Vincent Min	59bef31997	community[minor]: Improve InMemoryVectorStore with ability to persist to disk and filter on metadata. (#22186 ) - Description: The InMemoryVectorStore is a nice and simple vector store implementation for quick development and debugging. The current implementation is quite limited in its functionalities. This PR extends the functionalities by adding utility function to persist the vector store to a json file and to load it from a json file. We choose the json file format because it allows inspection of the database contents in a text editor, which is great for debugging. Furthermore, it adds a `filter` keyword that can be used to filter out documents on their `page_content` or `metadata`. - Issue: - - Dependencies: - - Twitter handle: @Vincent_Min	2024-06-05 10:40:34 -04:00
Ofer Mendelevitch	ad502e8d50	community[minor]: Vectara Integration Update - Streaming, FCS, Chat, updates to documentation and example notebooks (#21334 ) Thank you for contributing to LangChain! Description: update to the Vectara / Langchain integration to integrate new Vectara capabilities: - Full RAG implemented as a Runnable with as_rag() - Vectara chat supported with as_chat() - Both support streaming response - Updated documentation and example notebook to reflect all the changes - Updated Vectara templates Twitter handle: ofermend Add tests and docs: no new tests or docs, but updated both existing tests and existing docs	2024-06-04 12:57:28 -07:00
Joydeep Banik Roy	3796672c67	community, milvus, pinecone, qdrant, mongo: Broadcast operation failure while using simsimd beyond v3.7.7 (#22271 ) - [ ] Packages affected: - community: fix `cosine_similarity` to support simsimd beyond 3.7.7 - partners/milvus: fix `cosine_similarity` to support simsimd beyond 3.7.7 - partners/mongodb: fix `cosine_similarity` to support simsimd beyond 3.7.7 - partners/pinecone: fix `cosine_similarity` to support simsimd beyond 3.7.7 - partners/qdrant: fix `cosine_similarity` to support simsimd beyond 3.7.7 - [ ] Broadcast operation failure while using simsimd beyond v3.7.7: - Description: I was using simsimd 4.3.1 and the unsupported operand type issue popped up. When I checked out the repo and ran the tests, they failed as well. It looks like a variant of https://github.com/langchain-ai/langchain/issues/18022 . Prior to 3.7.7, simsimd.cdist returned an ndarray, but now it returns simsimd.DistancesTensor, which is ineligible for a broadcast operation with numpy. This change also removes the need to explicitly cast `Z` to a numpy array. - Issue: #19905 - Dependencies: No - Twitter handle: https://x.com/GetzJoydeep - [ ] Considerations: 1. I started with community, but since similar changes were needed in Milvus, MongoDB, Pinecone, and Qdrant, I modified their files as well. If touching multiple packages in one PR is not the norm, I can remove them from this PR and raise separate ones. 2. I have run the tests and verified that they work. Since only MongoDB had tests, I ran theirs and verified that they work as well. 
I have added a test for simsimd. I feel it may not go well with the CI/CD setup, as installing simsimd is not a dependency requirement. I have just imported simsimd to ensure simsimd cosine similarity is invoked. However, it's not a good approach. Suggestions are welcome and I can make the required changes on the PR. Please provide guidance on the same as I am new to the community. --------- Co-authored-by: Bagatur <baskaryan@gmail.com> Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-06-04 17:36:31 +00:00
KyrianC	03178ee74f	community[minor]: Add tools calls to `ChatEdenAI` (#22320 ) ### Description Add tools implementation to `ChatEdenAI`: - `bind_tools()` - `with_structured_output()` ### Documentation Updated `docs/docs/integrations/chat/edenai.ipynb` ### Notes We don't support streaming with tools yet. If stream is called with tools, we directly yield the whole message from `generate` (implemented the same way as Anthropic did).	2024-06-04 10:29:28 -07:00
Rahul Triptahi	77ad857934	community[minor]: Enable retrieval api calls in PebbloRetrievalQA (#21958 ) Description: Enable app discovery and Prompt/Response apis in PebbloSafeRetrieval Documentation: NA Unit test: N/A --------- Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com> Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>	2024-06-04 10:18:50 -07:00
ccurme	afe89a1411	community: add standard chat model params to Ollama (#22446 )	2024-06-03 17:45:03 -04:00
maang-h	13140dc4ff	community[patch]: Update the default api_url and reqeust_body of sparkllm embedding (#22136 ) - Description: When I ran `SparkLLMTextEmbeddings` with a correct app_id, api_key and api_secret, it still could not run normally using the current URL.
```python
# example
from langchain_community.embeddings import SparkLLMTextEmbeddings

embedding = SparkLLMTextEmbeddings(
    spark_app_id="my-app-id",
    spark_api_key="my-api-key",
    spark_api_secret="my-api-secret",
)
text1 = "hello"
print(embedding.embed_query(text1))
```
![sparkembedding](https://github.com/langchain-ai/langchain/assets/55082429/11daa853-4f67-45b2-aae2-c95caa14e38c) So I updated the url and request body parameters according to [Embedding_api](https://www.xfyun.cn/doc/spark/Embedding_api.html); now it is runnable.	2024-06-03 12:38:11 -07:00
Yuwen Hu	ba0dca46d7	community[minor]: Add IPEX-LLM BGE embedding support on both Intel CPU and GPU (#22226 ) Description: [IPEX-LLM](https://github.com/intel-analytics/ipex-llm) is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency. This PR adds ipex-llm integrations to langchain for BGE embedding support on both Intel CPU and GPU. Dependencies: `ipex-llm`, `sentence-transformers` Contribution maintainer: @Oscilloscope98 tests and docs: - langchain/docs/docs/integrations/text_embedding/ipex_llm.ipynb - langchain/docs/docs/integrations/text_embedding/ipex_llm_gpu.ipynb - langchain/libs/community/tests/integration_tests/embeddings/test_ipex_llm.py --------- Co-authored-by: Shengsheng Huang <shannie.huang@gmail.com>	2024-06-03 12:37:10 -07:00
Pavlo Paliychuk	342df7cf83	community[minor]: Add Zep Cloud components + docs + examples (#21671 ) Thank you for contributing to LangChain! - [x] PR title: community: Add Zep Cloud components + docs + examples - [x] PR message: We have recently released our new zep-cloud sdks that are compatible with Zep Cloud (not Zep Open Source). We have also maintained our Cloud version of langchain components (ChatMessageHistory, VectorStore) as part of our sdks. This PRs goal is to port these components to langchain community repo, and close the gap with the existing Zep Open Source components already present in community repo (added ZepCloudMemory,ZepCloudVectorStore,ZepCloudRetriever). Also added a ZepCloudChatMessageHistory components together with an expression language example ported from our repo. We have left the original open source components intact on purpose as to not introduce any breaking changes. - Issue: - - Dependencies: Added optional dependency of our new cloud sdk `zep-cloud` - Twitter handle: @paulpaliychuk51 - [x] Add tests and docs - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.	2024-05-27 12:50:13 -07:00
Jirka Lhotka	7c0459faf2	community: Update costs of openai finetuned models (#22124 ) - Description: Update costs of finetuned models and add gpt-3.5-turbo-0125. Source: https://openai.com/api/pricing/ - Issue: N/A - Dependencies: None	2024-05-24 15:25:17 +00:00
Christophe Bornet	c838de5027	doc: Add doc for CassandraByteStore (#22126 ) Preview: https://langchain-git-fork-cbornet-doc-cassandrabytestore-langchain.vercel.app/v0.2/docs/integrations/stores/cassandra/	2024-05-24 10:57:55 -04:00
Eugene Yurtsev	2d693c484e	docs: fix some spelling mistakes caught by newest version of code spell (#22090 ) Going to merge this even though it doesn't pass all tests, and open a separate PR for the remaining spelling mistakes.	2024-05-23 16:59:11 -04:00
Pavel Zloi	fe26f937e4	community[minor]: ManticoreSearch engine added to vectorstore (#19117 ) Description: ManticoreSearch engine added to vectorstores Issue: no issue, just a new feature Dependencies: https://pypi.org/project/manticoresearch-dev/ Twitter handle: @EvilFreelancer - Example notebook with test integration: https://github.com/EvilFreelancer/langchain/blob/manticore-search-vectorstore/docs/docs/integrations/vectorstores/manticore_search.ipynb --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-23 13:56:18 -07:00
Philippe PRADOS	6dd621d636	community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957 ) Thank you for contributing to LangChain! - [ ] PR title: "Add CloudBlobLoader" - community: Add CloudBlobLoader - [ ] PR message: Add cloud blob loader - Description: Langchain provides several approaches to read different file formats: specific loaders (`CSVLoader`) or blob-compatible loaders (`FileSystemBlobLoader`). The only implementation proposed for BlobLoader is `FileSystemBlobLoader`. Many projects retrieve files from cloud storage. We propose a new implementation of `BlobLoader` to read files from the three cloud storage systems. The interface is strictly identical to `FileSystemBlobLoader`. The only difference is the constructor, which takes a cloud "url" object such as `s3://my-bucket`, `az://my-bucket`, or `gs://my-bucket`. This implementation eliminates the need to pre-download files from cloud storage to local temporary files (which are seldom removed). The code relies on the [CloudPathLib](https://cloudpathlib.drivendata.org/stable/) library to interpret cloud URLs. This has been added as an optional dependency.
```Python
loader = CloudBlobLoader("s3://mybucket/id")
for blob in loader.yield_blobs():
    print(blob)
```
- [X] Dependencies: CloudPathLib - [X] Twitter handle: pprados - [X] Add tests and docs: Add unit test, but it's easy to convert to integration test, with some files in a cloud storage (see `test_cloud_blob_loader.py`) - [X] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. Hello from Paris @hwchase17. Can you review this PR? --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2024-05-23 10:59:55 -04:00
Bruno Alvisio	5eabe90494	community[patch]: Adding HEADER to the list of supported locations (#21946 ) Description: adds headers to the list of supported locations when generating the openai function schema	2024-05-22 22:47:56 +00:00
Bagatur	50186da0a1	infra: rm unused # noqa violations (#22049 ) Updating #21137	2024-05-22 15:21:08 -07:00
acho98	45ed5f3f51	community[minor]: Add Clova Embeddings for LangChain Community (#21890 ) - [ ] PR title: "Add Naver ClovaX embedding to LangChain community" - HyperClovaX is a large language model developed by [Naver](https://clova-x.naver.com/welcome). It's a powerful and purpose-trained LLM. - You can visit the embedding service provided by [ClovaX](https://www.ncloud.com/product/aiService/clovaStudio) - You may get CLOVA_EMB_API_KEY, CLOVA_EMB_APIGW_API_KEY, CLOVA_EMB_APP_ID From https://www.ncloud.com/product/aiService/clovaStudio --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-22 22:08:47 +00:00
MSubik	d948783a4c	community[patch]: standardize init args, update for javelin sdk release. (#21980 ) Related to [20085](https://github.com/langchain-ai/langchain/issues/20085) Updated the Javelin chat model to standardize the initialization argument. Also fixed an existing bug, where code was initialized with incorrect call to the JavelinClient defined in the javelin_sdk, resulting in an initialization error. See related [Javelin Documentation](https://docs.getjavelin.io/docs/javelin-python/quickstart).	2024-05-22 21:47:28 +00:00
Mazen Ramadan	3c1d77dd64	community[minor]: Add Scrapfly Loader community integration (#22036 ) Added [Scrapfly](https://scrapfly.io/) Web Loader integration. Scrapfly is a web scraping API that allows extracting web page data into accessible markdown or text datasets. - __Description__: Added Scrapfly web loader for retrieving web page data as markdown or text. - Dependencies: scrapfly-sdk - Twitter: @thealchemi1st --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-22 21:29:13 +00:00
Eric Zhang	e7e41eaabe	langchain: add RankLLM Reranker (#21171 ) Integrate RankLLM reranker (https://github.com/castorini/rank_llm) into LangChain An example notebook is given in `docs/docs/integrations/retrievers/rankllm-reranker.ipynb` --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-05-22 20:12:55 +00:00
maang-h	fc93bed8c4	community: Fix CSVLoader columns is None (#20701 ) - Bug code: In langchain_community/document_loaders/csv_loader.py:100 - Description: Currently, when `CSVLoader` reads a column name as None from the CSV file, it raises an error, because it neither verifies that the column name is of str type nor handles the corresponding `row_data` when the column name is None. This PR provides a solution. - Issue: Fix #20699 - Thinking: 1. Follow the handling at 'langchain_community/document_loaders/csv_loader.py:100' for the case where 'v' equals None, and apply the same method to 'k'. (Per `csv.DictReader`, 'k' is None only when `len(columns) < len(number_row_data)`.) 2. 'k' is None only for the last column, and its corresponding 'v' is a list. Therefore, I followed the data format in `Document` and used ',' to concatenate the elements of the list. (I'm not sure whether you accept this form; if you have other ideas, let me know.) --------- Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>	2024-05-22 12:57:46 -07:00
Eugene Yurtsev	36813d2f00	community[patch]: Fix remaining __inits__ in community (#22037 ) Fixes the __init__ files in community to use __all__ which is statically defined.	2024-05-22 17:42:17 +00:00
Eugene Yurtsev	58360a1e53	community[patch]: Add unit test to verify that init is correctly defined (#22030 ) Fix some __init__ files and add a unit test	2024-05-22 17:19:00 +00:00
Eugene Yurtsev	8d82160a8a	community[patch]: Clean up logic in import checking unit test (#22026 ) Clean up unit test	2024-05-22 15:30:10 +00:00
Eugene Yurtsev	aed64daabb	community[patch]: Add unit test to catch bad __all__ definitions (#21996 ) This will catch all dynamic __all__ definitions.	2024-05-22 09:32:13 -04:00
Robert Caulk	54adcd9e82	community[minor]: add AskNews retriever and AskNews tool (#21581 ) We add a tool and retriever for the [AskNews](https://asknews.app) platform with example notebooks. The retriever can be invoked with:
```py
from langchain_community.retrievers import AskNewsRetriever

retriever = AskNewsRetriever(k=3)
retriever.invoke("impact of fed policy on the tech sector")
```
This retrieves 3 documents from the news related to fed policy impacts on the tech sector. The included notebook also covers controlling filters such as category and time, as well as including the retriever in a chain. The tool is quite interesting, as it allows the agent to decide how to obtain the news by forming a query and deciding how far back in time to look for the news:
```py
from langchain_community.tools.asknews import AskNewsSearch
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI

tool = AskNewsSearch()
instructions = """You are an assistant."""
base_prompt = hub.pull("langchain-ai/openai-functions-template")
prompt = base_prompt.partial(instructions=instructions)
llm = ChatOpenAI(temperature=0)
asknews_tool = AskNewsSearch()
tools = [asknews_tool]
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
)
agent_executor.invoke({"input": "How is the tech sector being affected by fed policy?"})
```
--------- Co-authored-by: Emre <e@emre.pm>	2024-05-20 18:23:06 -07:00
Jesse S	fc79b372cb	community[minor]: add aerospike vectorstore integration (#21735 ) Please let me know if you see any possible areas of improvement. I would very much appreciate your constructive criticism if time allows. Description: - Added a aerospike vector store integration that utilizes [Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/) add-on. - Added both unit tests and integration tests - Added a docker compose file for spinning up a test environment - Added a notebook Dependencies: any dependencies required for this change - aerospike-vector-search Twitter handle: - No twitter, you can use my GitHub handle or LinkedIn if you'd like Thanks! --------- Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-05-21 01:01:47 +00:00
Eugene Yurtsev	8607735b80	langchain[patch],community[patch]: Move unit tests that depend on community to community (#21685 )	2024-05-16 17:24:27 -04:00
Kyle Cassidy	eca8c4bcc6	Standardized openai init params (#21739 ) ## Patch Summary community:openai[patch]: standardize init args ## Details I made changes to the OpenAI Chat API wrapper test in the Langchain open-source repository - File: `libs/community/tests/unit_tests/chat_models/test_openai.py` - Changes: - Updated `max_retries` with Pydantic Field - Updated the corresponding unit test - Related Issues: #20085 - Updated max_retries with Pydantic Field, updated the unit test. --------- Co-authored-by: JuHyung Son <sonju0427@gmail.com>	2024-05-16 16:30:52 +00:00
Harrison Chase	15be439719	Harrison/move flashrank rerank (#21448 ) third party integration, should be in community	2024-05-15 13:08:52 -07:00
Rajendra Kadam	54e003268e	langchain[minor]: Add PebbloRetrievalQA chain with Identity & Semantic Enforcement support (#20641 ) - Description: PebbloRetrievalQA chain introduces identity enforcement using vector-db metadata filtering - Dependencies: None - Issue: None - Documentation: Adding documentation for PebbloRetrievalQA chain in a separate PR(https://github.com/langchain-ai/langchain/pull/20746) - Unit tests: New unit-tests added --------- Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>	2024-05-15 13:14:52 +00:00
Eugene Yurtsev	25fbe356b4	community[patch]: upgrade to recent version of mypy (#21616 ) This PR upgrades community to a recent version of mypy. It inserts type: ignore on all existing failures.	2024-05-13 14:55:07 -04:00
ccurme	3bb9bec314	bedrock: add unit test for retriever (#21485 ) This was implemented in https://github.com/langchain-ai/langchain/pull/21349 but dropped before merge.	2024-05-09 11:37:03 -04:00
Yash	cb31c3611f	Ndb enterprise (#21233 ) Description: Adds NeuralDBClientVectorStore to the langchain, which is our enterprise client. --------- Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com> Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>	2024-05-08 16:30:58 -07:00
Sokolov Fedor	f4ddf64faa	community: Add MarkdownifyTransformer to langchain_community.document_transformers (#21247 ) - Added new document_transformer: MarkdownifyTransformer, which uses the `markdownify` package with customizable options to convert HTML to Markdown. It's similar to Html2TextTransformer, but has more flexible options. I've also noticed that MarkdownifyTransformer sometimes performs better than the html2text one, which is why I use markdownify in my project. - Added docs and tests - Usage:
```python
from langchain_community.document_transformers import MarkdownifyTransformer

markdownify = MarkdownifyTransformer()
docs_transform = markdownify.transform_documents(docs)
```
- Example of better performance on a simple task that I've noticed:
```
<html>
<head><title>Reports on product movement</title></head>
<body>
<p data-block-key="2wst7">The reports on product movement will be useful for forming supplier orders and controlling outcomes.</p>
</body>
```
Html2TextTransformer:
```python
[Document(page_content='The reports on product movement will be useful for forming supplier orders and\ncontrolling outcomes.\n\n')]
# Here we can see 'and\ncontrolling', which has extra '\n' in it
```
MarkdownifyTransformer:
```python
[Document(page_content='Reports on product movement\n\nThe reports on product movement will be useful for forming supplier orders and controlling outcomes.')]
```
--------- Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.bbrouter> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.local> Co-authored-by: Sokolov Fedor <f.sokolov@192.168.1.6>	2024-05-08 14:45:13 -07:00
Eugene Yurtsev	f92006de3c	multiple: langchain 0.2 in master (#21191 ) 0.2rc migrations - [x] Move memory - [x] Move remaining retrievers - [x] graph_qa chains - [x] some dependency from evaluation code potentially on math utils - [x] Move openapi chain from `langchain.chains.api.openapi` to `langchain_community.chains.openapi` - [x] Migrate `langchain.chains.ernie_functions` to `langchain_community.chains.ernie_functions` - [x] migrate `langchain/chains/llm_requests.py` to `langchain_community.chains.llm_requests` - [x] Moving `langchain_community.cross_enoders.base:BaseCrossEncoder` -> `langchain_community.retrievers.document_compressors.cross_encoder:BaseCrossEncoder` (namespace not ideal, but it needs to be moved to `langchain` to avoid circular deps) - [x] unit tests langchain -- add pytest.mark.community to some unit tests that will stay in langchain - [x] unit tests community -- move unit tests that depend on community to community - [x] mv integration tests that depend on community to community - [x] mypy checks Other todo - [x] Make deprecation warnings not noisy (need to use warn deprecated and check that things are implemented properly) - [x] Update deprecation messages with timeline for code removal (likely we actually won't be removing things until 0.4 release) -- will give people more time to transition their code. - [ ] Add information to deprecation warning to show users how to migrate their code base using langchain-cli - [ ] Remove any unnecessary requirements in langchain (e.g., is SQLALchemy required?) --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-05-08 16:46:52 -04:00
Eugene Yurtsev	6a1d61dbf1	community[patch]: Fix in memory vectorstore to take into account ids when adding docs (#21384 ) Should respect `ids` if passed	2024-05-07 15:05:16 -04:00
nrpd25	95cc8e3fc3	premai[patch]:Standardized model init args (#21308 ) [Standardized model init args #20085](https://github.com/langchain-ai/langchain/issues/20085) - Enable premai chat model to be initialized with `model_name` as an alias for `model`, `api_key` as an alias for `premai_api_key`. - Add initialization test `test_premai_initialization` --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>	2024-05-06 18:12:29 -04:00
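
Note on commit fc93bed8c4 (CSVLoader columns is None): the `csv.DictReader` behavior that the message refers to can be reproduced with the standard library alone. The sketch below is only an illustration under stated assumptions, not the code from the PR: the sample data and the helper `row_to_content` are hypothetical, and the "key: value" output format is an assumption. It shows that surplus fields in a row land as a list under the key `None`, and joins them with ',' as the message describes.

```python
import csv
import io

# Hypothetical sample data: the header has two columns,
# but the second data row carries four fields.
raw = "name,city\nAda,London\nBob,Paris,extra1,extra2\n"


def row_to_content(row):
    """Render a DictReader row as 'key: value' lines, tolerating a None key."""
    parts = []
    for key, value in row.items():
        if key is None:
            # csv.DictReader stores surplus fields as a list under the key None
            # (its default restkey), so join them into one string.
            value = ",".join(value)
        parts.append(f"{key}: {value}")
    return "\n".join(parts)


rows = list(csv.DictReader(io.StringIO(raw)))
contents = [row_to_content(row) for row in rows]
```

Only rows with more fields than the header produce the `None` key, which matches the condition `len(columns) < len(number_row_data)` quoted in the commit message.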

444 Commits