When `ToolNode` hyperlink is clicked, it does not automatically scroll
to the section due to incorrect reference to the heading / id in the
LangGraph documentation
Think it's worth adding a quick guide and including in the table in the
concepts page. `StrOutputParser` can make it easier to deal with the
union type for message content. For example, ChatAnthropic with bound
tools will generate string content if there are no tool calls and
`list[dict]` content otherwise.
I'm also considering removing the output parser section from the
["quickstart"
tutorial](https://python.langchain.com/docs/tutorials/llm_chain/); we
can link to this guide instead.
Corrections and Suggestions for Migrating LangChain Code Documentation
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Edited mainly the `Concepts` section in the LangChain documentation.
Overview:
* Updated some explanations to make the point more clear / Add missing
words for some documentations.
* Rephrased some sentences to make it shorter and more concise.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Last week Anthropic released version 0.39.0 of its python sdk, which
enabled support for Python 3.13. This release deleted a legacy
`client.count_tokens` method, which we currently access during init of
the `Anthropic` LLM. Anthropic has replaced this functionality with the
[client.beta.messages.count_tokens()
API](https://github.com/anthropics/anthropic-sdk-python/pull/726).
To enable support for `anthropic >= 0.39.0` and Python 3.13, here we
drop support for the legacy token counting method, and add support for
the new method via `ChatAnthropic.get_num_tokens_from_messages`.
To fully support the token counting API, we update the signature of
`get_num_tokens_from_message` to accept tools everywhere.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Changed "demon" to "demo" in the code comment for clarity.
PR Title
docs: Fix typo in Tavily Search example
PR Message
Description:
This PR fixes a typo in the code comment of the Tavily Search
documentation. Changed "demon" to "demo" for clarity and to avoid
confusion.
Issue:
No specific issue was mentioned, but this is a minor improvement in
documentation.
Dependencies:
No additional dependencies required.
fix: correct grammar in documentation for streaming modes
Updated sentence to clarify usage of "choose" in "When using the stream
and astream methods with LangGraph, you can choose one or more streaming
modes..." for better readability.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
…n' to 'written' in 'Emit custom output written using LangGraph’s
StreamWriter.'
### Changes:
- Corrected the typo in the phrase 'Emit custom output witten using
LangGraph’s StreamWriter.' to 'Emit custom output written using
LangGraph’s StreamWriter.'
- Enhanced the clarity of the documentation surrounding LangGraph’s
streaming modes, specifically around the StreamWriter functionality.
- Provided additional context and emphasis on the role of the
StreamWriter class in handling custom output.
### Issue Reference:
- GitHub issue: https://github.com/langchain-ai/langchain/issues/28051
This update addresses the issue raised regarding the incorrect spelling
and aims to improve the clarity of the streaming mode documentation for
better user understanding.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**:
**Description:**
Fixed a typo in the documentation for streaming modes, changing "witten"
to "written" in the phrase "Emit custom output witten using LangGraph’s
StreamWriter."
**Issue:**
This PR addresses and fixes the typo in the documentation referenced in
[#28051](https://github.com/langchain-ai/langchain/issues/28051).
**Issue:**
This PR addresses and fixes the typo in the documentation referenced in
[#28051](https://github.com/langchain-ai/langchain/issues/28051).
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
This PR modifies the documentation regarding the configuration of the
VLLM with the LoRA adapter. The updates aim to provide clear
instructions for users on how to set up the LoRA adapter when using the
VLLM.
- before
```python
VLLM(..., enable_lora=True)
```
- after
```python
VLLM(...,
vllm_kwargs={
"enable_lora": True
}
)
```
This change clarifies that users should use the vllm_kwargs to enable
the LoRA adapter.
Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
Updated the phrasing and reasoning on the "abstraction not receiving
much development" part of the documentation
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Description:
* When working with OpenSearchVectorSearch to make
OpenSearchGraphVectorStore (coming soon), I noticed that there wasn't
type hinting for the underlying OpenSearch clients. This fixes that
issue.
* Confirmed tests are still passing with code changes.
Note that there is some additional code duplication now, but I think
this approach is cleaner overall.
## Description
As proposed in our earlier discussion #26977 we have introduced a Google
Books API Tool that leverages the Google Books API found at
[https://developers.google.com/books/docs/v1/using](https://developers.google.com/books/docs/v1/using)
to generate book recommendations.
### Sample Usage
```python
from langchain_community.tools import GoogleBooksQueryRun
from langchain_community.utilities import GoogleBooksAPIWrapper
api_wrapper = GoogleBooksAPIWrapper()
tool = GoogleBooksQueryRun(api_wrapper=api_wrapper)
tool.run('ai')
```
### Sample Output
```txt
Here are 5 suggestions based off your search for books related to ai:
1. "AI's Take on the Stigma Against AI-Generated Content" by Sandy Y. Greenleaf: In a world where artificial intelligence (AI) is rapidly advancing and transforming various industries, a new form of content creation has emerged: AI-generated content. However, despite its potential to revolutionize the way we produce and consume information, AI-generated content often faces a significant stigma. "AI's Take on the Stigma Against AI-Generated Content" is a groundbreaking book that delves into the heart of this issue, exploring the reasons behind the stigma and offering a fresh, unbiased perspective on the topic. Written from the unique viewpoint of an AI, this book provides readers with a comprehensive understanding of the challenges and opportunities surrounding AI-generated content. Through engaging narratives, thought-provoking insights, and real-world examples, this book challenges readers to reconsider their preconceptions about AI-generated content. It explores the potential benefits of embracing this technology, such as increased efficiency, creativity, and accessibility, while also addressing the concerns and drawbacks that contribute to the stigma. As you journey through the pages of this book, you'll gain a deeper understanding of the complex relationship between humans and AI in the realm of content creation. You'll discover how AI can be used as a tool to enhance human creativity, rather than replace it, and how collaboration between humans and machines can lead to unprecedented levels of innovation. Whether you're a content creator, marketer, business owner, or simply someone curious about the future of AI and its impact on our society, "AI's Take on the Stigma Against AI-Generated Content" is an essential read. With its engaging writing style, well-researched insights, and practical strategies for navigating this new landscape, this book will leave you equipped with the knowledge and tools needed to embrace the AI revolution and harness its potential for success. Prepare to have your assumptions challenged, your mind expanded, and your perspective on AI-generated content forever changed. Get ready to embark on a captivating journey that will redefine the way you think about the future of content creation.
Read more at https://play.google.com/store/books/details?id=4iH-EAAAQBAJ&source=gbs_api
2. "AI Strategies For Web Development" by Anderson Soares Furtado Oliveira: From fundamental to advanced strategies, unlock useful insights for creating innovative, user-centric websites while navigating the evolving landscape of AI ethics and security Key Features Explore AI's role in web development, from shaping projects to architecting solutions Master advanced AI strategies to build cutting-edge applications Anticipate future trends by exploring next-gen development environments, emerging interfaces, and security considerations in AI web development Purchase of the print or Kindle book includes a free PDF eBook Book Description If you're a web developer looking to leverage the power of AI in your projects, then this book is for you. Written by an AI and ML expert with more than 15 years of experience, AI Strategies for Web Development takes you on a transformative journey through the dynamic intersection of AI and web development, offering a hands-on learning experience.The first part of the book focuses on uncovering the profound impact of AI on web projects, exploring fundamental concepts, and navigating popular frameworks and tools. As you progress, you'll learn how to build smart AI applications with design intelligence, personalized user journeys, and coding assistants. Later, you'll explore how to future-proof your web development projects using advanced AI strategies and understand AI's impact on jobs. Toward the end, you'll immerse yourself in AI-augmented development, crafting intelligent web applications and navigating the ethical landscape.Packed with insights into next-gen development environments, AI-augmented practices, emerging realities, interfaces, and security governance, this web development book acts as your roadmap to staying ahead in the AI and web development domain. What you will learn Build AI-powered web projects with optimized models Personalize UX dynamically with AI, NLP, chatbots, and recommendations Explore AI coding assistants and other tools for advanced web development Craft data-driven, personalized experiences using pattern recognition Architect effective AI solutions while exploring the future of web development Build secure and ethical AI applications following TRiSM best practices Explore cutting-edge AI and web development trends Who this book is for This book is for web developers with experience in programming languages and an interest in keeping up with the latest trends in AI-powered web development. Full-stack, front-end, and back-end developers, UI/UX designers, software engineers, and web development enthusiasts will also find valuable information and practical guidelines for developing smarter websites with AI. To get the most out of this book, it is recommended that you have basic knowledge of programming languages such as HTML, CSS, and JavaScript, as well as a familiarity with machine learning concepts.
Read more at https://play.google.com/store/books/details?id=FzYZEQAAQBAJ&source=gbs_api
3. "Artificial Intelligence for Students" by Vibha Pandey: A multifaceted approach to develop an understanding of AI and its potential applications KEY FEATURES ● AI-informed focuses on AI foundation, applications, and methodologies. ● AI-inquired focuses on computational thinking and bias awareness. ● AI-innovate focuses on creative and critical thinking and the Capstone project. DESCRIPTION AI is a discipline in Computer Science that focuses on developing intelligent machines, machines that can learn and then teach themselves. If you are interested in AI, this book can definitely help you prepare for future careers in AI and related fields. The book is aligned with the CBSE course, which focuses on developing employability and vocational competencies of students in skill subjects. The book is an introduction to the basics of AI. It is divided into three parts – AI-informed, AI-inquired and AI-innovate. It will help you understand AI's implications on society and the world. You will also develop a deeper understanding of how it works and how it can be used to solve complex real-world problems. Additionally, the book will also focus on important skills such as problem scoping, goal setting, data analysis, and visualization, which are essential for success in AI projects. Lastly, you will learn how decision trees, neural networks, and other AI concepts are commonly used in real-world applications. By the end of the book, you will develop the skills and competencies required to pursue a career in AI. WHAT YOU WILL LEARN ● Get familiar with the basics of AI and Machine Learning. ● Understand how and where AI can be applied. ● Explore different applications of mathematical methods in AI. ● Get tips for improving your skills in Data Storytelling. ● Understand what is AI bias and how it can affect human rights. WHO THIS BOOK IS FOR This book is for CBSE class XI and XII students who want to learn and explore more about AI. Basic knowledge of Statistical concepts, Algebra, and Plotting of equations is a must. TABLE OF CONTENTS 1. Introduction: AI for Everyone 2. AI Applications and Methodologies 3. Mathematics in Artificial Intelligence 4. AI Values (Ethical Decision-Making) 5. Introduction to Storytelling 6. Critical and Creative Thinking 7. Data Analysis 8. Regression 9. Classification and Clustering 10. AI Values (Bias Awareness) 11. Capstone Project 12. Model Lifecycle (Knowledge) 13. Storytelling Through Data 14. AI Applications in Use in Real-World
Read more at https://play.google.com/store/books/details?id=ptq1EAAAQBAJ&source=gbs_api
4. "The AI Book" by Ivana Bartoletti, Anne Leslie and Shân M. Millie: Written by prominent thought leaders in the global fintech space, The AI Book aggregates diverse expertise into a single, informative volume and explains what artifical intelligence really means and how it can be used across financial services today. Key industry developments are explained in detail, and critical insights from cutting-edge practitioners offer first-hand information and lessons learned. Coverage includes: · Understanding the AI Portfolio: from machine learning to chatbots, to natural language processing (NLP); a deep dive into the Machine Intelligence Landscape; essentials on core technologies, rethinking enterprise, rethinking industries, rethinking humans; quantum computing and next-generation AI · AI experimentation and embedded usage, and the change in business model, value proposition, organisation, customer and co-worker experiences in today’s Financial Services Industry · The future state of financial services and capital markets – what’s next for the real-world implementation of AITech? · The innovating customer – users are not waiting for the financial services industry to work out how AI can re-shape their sector, profitability and competitiveness · Boardroom issues created and magnified by AI trends, including conduct, regulation & oversight in an algo-driven world, cybersecurity, diversity & inclusion, data privacy, the ‘unbundled corporation’ & the future of work, social responsibility, sustainability, and the new leadership imperatives · Ethical considerations of deploying Al solutions and why explainable Al is so important
Read more at http://books.google.ca/books?id=oE3YDwAAQBAJ&dq=ai&hl=&source=gbs_api
5. "Artificial Intelligence in Society" by OECD: The artificial intelligence (AI) landscape has evolved significantly from 1950 when Alan Turing first posed the question of whether machines can think. Today, AI is transforming societies and economies. It promises to generate productivity gains, improve well-being and help address global challenges, such as climate change, resource scarcity and health crises.
Read more at https://play.google.com/store/books/details?id=eRmdDwAAQBAJ&source=gbs_api
```
## Issue
This closes#27276
## Dependencies
No additional dependencies were added
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
all template installs will now have to declare `--branch v0.2` to make
clear they aren't compatible with langchain 0.3 (most have a pydantic v1
setup). e.g.
```
langchain-cli app add pirate-speak --branch v0.2
```
Thank you for contributing to LangChain!
- [x] **PR title**: "community: chat models wrapper for Cloudflare
Workers AI"
- [x] **PR message**:
- **Description:** Add chat models wrapper for Cloudflare Workers AI.
Enables Langgraph intergration via ChatModel for tool usage, agentic
usage.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** This PR adds functionality to pass in in-memory bytes
as a source to `AzureAIDocumentIntelligenceLoader`.
- **Issue:** I needed the functionality, so I added it.
- **Dependencies:** NA
- **Twitter handle:** @akseljoonas if this is a big enough change :)
---------
Co-authored-by: Aksel Joonas Reedi <aksel@klippa.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
# OCR-based PDF loader
This implements [Zerox](https://github.com/getomni-ai/zerox) PDF
document loader.
Zerox utilizes simple but very powerful (even though slower and more
costly) approach to parsing PDF documents: it converts PDF to series of
images and passes it to a vision model requesting the contents in
markdown.
It is especially suitable for complex PDFs that are not parsed well by
other alternatives.
## Example use:
```python
from langchain_community.document_loaders.pdf import ZeroxPDFLoader
os.environ["OPENAI_API_KEY"] = "" ## your-api-key
model = "gpt-4o-mini" ## openai model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"
loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```
The Zerox library supports wide range of provides/models. See Zerox
documentation for details.
- **Dependencies:** `zerox`
- **Twitter handle:** @martintriska1
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
## Description
This PR adds support for Memcached as a usable LLM model cache by adding
the ```MemcachedCache``` implementation relying on the
[pymemcache](https://github.com/pinterest/pymemcache) client.
Unit test-wise, the new integration is generally covered under existing
import testing. All new functionality depends on pymemcache if
instantiated and used, so to comply with the other cache implementations
the PR also adds optional integration tests for ```MemcachedCache```.
Since this is a new integration, documentation is added for Memcached as
an integration and as an LLM Cache.
## Issue
This PR closes#27275 which was originally raised as a discussion in
#27035
## Dependencies
There are no new required dependencies for langchain, but
[pymemcache](https://github.com/pinterest/pymemcache) is required to
instantiate the new ```MemcachedCache```.
## Example Usage
```python3
from langchain.globals import set_llm_cache
from langchain_openai import OpenAI
from langchain_community.cache import MemcachedCache
from pymemcache.client.base import Client
llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
set_llm_cache(MemcachedCache(Client('localhost')))
# The first time, it is not yet in cache, so it should take longer
llm.invoke("Which city is the most crowded city in the USA?")
# The second time it is, so it goes faster
llm.invoke("Which city is the most crowded city in the USA?")
```
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Update UC toolkit documentation to show an example of
using recommended LangGraph agent APIs before the existing LangChain
AgentExecutor example. Tested by manually running the updated example
notebook
- **Dependencies:** No new dependencies
---------
Signed-off-by: Sid Murching <sid.murching@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
## What this PR does?
### Currently `O365BaseLoader` (and consequently both derived loaders)
are limited to `pdf`, `doc`, `docx` files.
- **Solution: here we introduce _handlers_ attribute that allows for
custom handlers to be passed in. This is done in _dict_ form:**
**Example:**
```python
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")
# create dictionary mapping file types to handlers (parsers)
handlers = {
"doc": MsWordParser()
"pdf": PDFMinerParser()
"txt": TextParser()
"xlsx": xlsx_parser
}
loader = SharePointLoader(document_library_id="...",
handlers=handlers # pass handlers to SharePointLoader
)
documents = loader.load()
# works the same in OneDriveLoader
loader = OneDriveLoader(document_library_id="...",
handlers=handlers
)
```
This dictionary is then passed to `MimeTypeBasedParser` same as in the
[current
implementation](5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)).
### Currently `SharePointLoader` and `OneDriveLoader` are separate
loaders that both inherit from `O365BaseLoader`
However both of these implement the same functionality. The only
differences are:
- `SharePointLoader` requires argument `document_library_id` whereas
`OneDriveLoader` requires `drive_id`. These are just different names for
the same thing.
- `SharePointLoader` implements significantly more features.
- **Solution: `OneDriveLoader` is replaced with an empty shell just
renaming `drive_id` to `document_library_id` and inheriting from
`SharePointLoader`**
**Dependencies:** None
**Twitter handle:** @martintriska1
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **description**
langchain_core.runnables.graph_mermaid.draw_mermaid_png calls this
function, but the Mermaid API returns JPEG by default. To be consistent,
add the option `file_type` with the default `png` type.
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
With this small change, I didn't add tests and docs.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more:
One long sentence was divided into two.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Now `encode_kwargs` used for both for documents and queries and this
leads to wrong embeddings. E. g.:
```python
model_kwargs = {"device": "cuda", "trust_remote_code": True}
encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}
model = HuggingFaceEmbeddings(
model_name="dunzhang/stella_en_400M_v5",
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs,
)
query_embedding = np.array(
model.embed_query("What are some ways to reduce stress?",)
)
document_embedding = np.array(
model.embed_documents(
[
"There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
"Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
]
)
)
print(model._client.similarity(query_embedding, document_embedding)) # output: tensor([[0.8421, 0.3317]], dtype=torch.float64)
```
But from the [model
card](https://huggingface.co/dunzhang/stella_en_400M_v5#sentence-transformers)
expexted like this:
```python
model_kwargs = {"device": "cuda", "trust_remote_code": True}
encode_kwargs = {"normalize_embeddings": False}
query_encode_kwargs = {"normalize_embeddings": False, "prompt_name": "s2p_query"}
model = HuggingFaceEmbeddings(
model_name="dunzhang/stella_en_400M_v5",
model_kwargs=model_kwargs,
encode_kwargs=encode_kwargs,
query_encode_kwargs=query_encode_kwargs,
)
query_embedding = np.array(
model.embed_query("What are some ways to reduce stress?", )
)
document_embedding = np.array(
model.embed_documents(
[
"There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
"Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
]
)
)
print(model._client.similarity(query_embedding, document_embedding)) # tensor([[0.8398, 0.2990]], dtype=torch.float64)
```
- **Description:** Adding in the first pass of documentation for the CDP
Agentkit Toolkit
- **Issue:** N/a
- **Dependencies:** cdp-langchain
- **Twitter handle:** @CoinbaseDev
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: John Peterson <john.peterson@coinbase.com>
…Toolkit" in "playwright.ipynb" integration.
- Completed the incomplete sentence in the Langchain Playwright
documentation.
- Enhanced documentation clarity to guide users on best practices for
instantiating browser instances with Langchain Playwright.
Example before:
> "It's always recommended to instantiate using the from_browser method
so that the
Example after:
> "It's always recommended to instantiate using the `from_browser`
method so that the browser context is properly initialized and managed,
ensuring seamless interaction and resource optimization."
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** change to do the batch embedding server side and not
client side
- **Twitter handle:** @wildagsx
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Description:
This fixes an issue that mistakenly created in
https://github.com/langchain-ai/langchain/pull/27253. The issue
currently exists only in `langchain-community==0.3.4`.
Test cases were added to prevent this issue in the future.
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This PR addresses an issue in the CSVLoader example where data is not
defined, causing a NameError. The line `data = loader.load()` is added
to correctly assign the output of loader.load() to the data variable.
- **Description:**
Currently CommaSeparatedListOutputParser can't handle strings that may
contain commas within a column. It would parse any commas as the
delimiter.
Ex.
"foo, foo2", "bar", "baz"
It will create 4 columns: "foo", "foo2", "bar", "baz"
This should be 3 columns:
"foo, foo2", "bar", "baz"
- **Dependencies:**
Added 2 additional imports, but they are built in python packages.
import csv
from io import StringIO
- **Twitter handle:** @jkyamog
- [ ] **Add tests and docs**:
1. added simple unit test test_multiple_items_with_comma
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
Update references in Databricks integration page to reference our new
partner package databricks-langchain
https://github.com/databricks/databricks-ai-bridge/tree/main/integrations/langchain
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "core: use friendlier names for duplicated nodes in
mermaid output"
- **Description:** When generating the Mermaid visualization of a chain,
if the chain had multiple nodes of the same type, the reid function
would replace their names with the UUID node_id. This made the generated
graph difficult to understand. This change deduplicates the nodes in a
chain by appending an index to their names.
- **Issue:** None
- **Discussion:**
https://github.com/langchain-ai/langchain/discussions/27714
- **Dependencies:** None
- [ ] **Add tests and docs**:
- Currently this functionality is not covered by unit tests, happy to
add tests if you'd like
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
# Example Code:
```python
from langchain_core.runnables import RunnablePassthrough
def fake_llm(prompt: str) -> str: # Fake LLM for the example
return "completion"
runnable = {
'llm1': fake_llm,
'llm2': fake_llm,
} | RunnablePassthrough.assign(
total_chars=lambda inputs: len(inputs['llm1'] + inputs['llm2'])
)
print(runnable.get_graph().draw_mermaid(with_styles=False))
```
# Before
```mermaid
graph TD;
Parallel_llm1_llm2_Input --> 0b01139db5ed4587ad37964e3a40c0ec;
0b01139db5ed4587ad37964e3a40c0ec --> Parallel_llm1_llm2_Output;
Parallel_llm1_llm2_Input --> a98d4b56bd294156a651230b9293347f;
a98d4b56bd294156a651230b9293347f --> Parallel_llm1_llm2_Output;
Parallel_total_chars_Input --> Lambda;
Lambda --> Parallel_total_chars_Output;
Parallel_total_chars_Input --> Passthrough;
Passthrough --> Parallel_total_chars_Output;
Parallel_llm1_llm2_Output --> Parallel_total_chars_Input;
```
# After
```mermaid
graph TD;
Parallel_llm1_llm2_Input --> fake_llm_1;
fake_llm_1 --> Parallel_llm1_llm2_Output;
Parallel_llm1_llm2_Input --> fake_llm_2;
fake_llm_2 --> Parallel_llm1_llm2_Output;
Parallel_total_chars_Input --> Lambda;
Lambda --> Parallel_total_chars_Output;
Parallel_total_chars_Input --> Passthrough;
Passthrough --> Parallel_total_chars_Output;
Parallel_llm1_llm2_Output --> Parallel_total_chars_Input;
```
PR title: “langchain: add batch request support for text-embedding-v3
model”
PR message:
• Description: This PR introduces batch request support for the
text-embedding-v3 model within LangChain. The new functionality allows
users to process multiple text inputs in a single request, improving
efficiency and performance for high-volume applications.
• Issue: This PR addresses #<issue_number> (if applicable).
• Dependencies: No new external dependencies are required for this
change.
• Twitter handle: If announced on Twitter, please mention me at
@yourhandle.
Add tests and docs:
1. Added unit tests to cover the batch request functionality, ensuring
it operates without requiring network access.
2. Included an example notebook demonstrating the batch request feature,
located in docs/docs/integrations.
Lint and test: All required formatting and linting checks have been
performed using make format and make lint. The changes have been
verified with make test to ensure compatibility.
Additional notes:
• The changes are fully backwards compatible.
• No modifications were made to pyproject.toml, ensuring no new
dependencies were added.
• The update only affects the langchain package and does not involve
other packages.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
…ING} {code: Neo.ClientNotification.Statement.FeatureDeprecationWarning}
{category: DEPRECATION} {title: This feature is deprecated and will be
removed in future versions.} {description: CALL subquery without a
variable scope clause is now deprecated." this warning
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: putao520 <putao520@putao282.com>
The metadata["source"] value for the web paths was being set to
temporary path (/tmp).
Fixed it by creating a new variable self.original_file_path, which will
store the original path.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
I will keep this PR as small as the changes made.
**Description:** fixes a fatal bug syntax error in
AzureCosmosDBNoSqlVectorSearch
**Issue:** #27269#25468
**Description:**
Update the wrapper to support the Polygon API if not you get an error. I
keeped `STOCKBUSINESS` for retro-compatbility with older endpoints /
other uses
Old Code:
```
if status not in ("OK", "STOCKBUSINESS"):
raise ValueError(f"API Error: {data}")
```
API Respond:
```
API Error: {'results': {'P': 0.22, 'S': 0, 'T': 'ZOM', 'X': 5, 'p': 0.123, 'q': 0, 's': 200, 't': 1729614422813395456, 'x': 1, 'z': 1}, 'status': 'STOCKSBUSINESS', 'request_id': 'XXXXXX'}
```
- **Issue:** N/A Polygon API update
- **Dependencies:** N/A
- **Twitter handle:** @wgcv
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** add missing tool_calls kwargs of delta message in
openai adapter, then tool call will work correctly via adapter's stream
chat completion
- **Issue:** Fixes
https://github.com/langchain-ai/langchain/issues/25436
- **Dependencies:** None
- **Description:** Change `MoonshotCommon.client` type from
`_MoonshotClient` to `Any`.
- **Issue:** Fix the issue #27058
- **Dependencies:** No
- **Twitter handle:** TaoWang2218
In PR #17100, the implementation for Moonshot was added, which defined
two classes:
- `MoonshotChat(MoonshotCommon, ChatOpenAI)` in
`langchain_community.chat_models.moonshot`;
- Here, `validate_environment()` assigns **client** as
`openai.OpenAI().chat.completions`
- Note that **client** here is actually a member variable defined in
`ChatOpenAI`;
- `MoonshotCommon` in `langchain_community.llms.moonshot`;
- And here, `validate_environment()` assigns **_client** as
`_MoonshotClient`;
- Note that this is the underscored **_client**, which is defined within
`MoonshotCommon` itself;
At this time, there was no conflict between the two, one being `client`
and the other `_client`.
However, in PR #25878 which fixed#24390, `_client` in `MoonshotCommon`
was changed to `client`. Since then, a conflict in the definition of
`client` has arisen between `MoonshotCommon` and `MoonshotChat`, which
caused `pydantic` validation error.
To fix this issue, the type of `client` in `MoonshotCommon` should be
changed to `Any`.
Signed-off-by: Tao Wang <twang2218@gmail.com>
**Description:**
I added code for lora_request in the community package, but I forgot to
add content to the VLLM page. So, I will do that now. #27731
---------
Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
Thank you for contributing to LangChain!
- **Description:** Adding an empty metadata field when metadata is not
present in the data
- **Issue:** This PR fixes the issue when the data items doesn't contain
the metadata field. This happens when there is already data in the
container, or cx uses CosmosDB Python SDK to insert data.
- **Dependencies:** No dependencies required
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- [x] **PR title**:
"community: OCI Generative AI tool calling bug fix
- [x] **PR message**:
- **Description:** bug fix for streaming chat responses with tool calls.
Update to PR 24693
- **Issue:** chat response content is repeated when streaming
- **Dependencies:** NA
- **Twitter handle:** NA
- [x] **Add tests and docs**: NA
- [x] **Lint and test**: make format, make lint and make test we run
successfully
---------
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
### About
- **Description:** In the Gitlab utilities used for the Gitlab tool
there is no check to prevent pushing to the main branch, as this is
already done for Github (for example here:
5a2cfb49e0/libs/community/langchain_community/utilities/github.py (L587)).
This PR add this check as already done for Github.
- **Issue:** None
- **Dependencies:** None
**Description:** Add support for Writer chat models
**Issue:** N/A
**Dependencies:** Add `writer-sdk` to optional dependencies.
**Twitter handle:** Please tag `@samjulien` and `@Get_Writer`
**Tests and docs**
- [x] Unit test
- [x] Example notebook in `docs/docs/integrations` directory.
**Lint and test**
- [x] Run `make format`
- [x] Run `make lint`
- [x] Run `make test`
---------
Co-authored-by: Johannes <tolstoy.work@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
- Fix bug in Replicate LLM class, where it was looking for parameter
names in a place where they no longer exist in pydantic 2, resulting in
the "Field required" validation error described in the issue.
- Fix Replicate LLM integration tests to:
- Use active models on Replicate.
- Use the correct model parameter `max_new_tokens` as shown in the
[Replicate
docs](https://replicate.com/docs/guides/language-models/how-to-use#minimum-and-maximum-new-tokens).
- Use callbacks instead of deprecated callback_manager.
**Issue:** #26937
**Dependencies:** n/a
**Twitter handle:** n/a
---------
Signed-off-by: Fayvor Love <fayvor@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
`ChatDatabricks` added support for structured output and JSON mode in
the last release. This PR updates the feature table accordingly.
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Thank you for contributing to LangChain!
**Description:** Added the model parameters to be passed in the OpenAI
Assistant. Enabled it at the `OpenAIAssistantV2Runnable` class.
**Issue:** NA
**Dependencies:** None
**Twitter handle:** luizf0992
Thank you for contributing to LangChain!
- **Description:** Add token_usage and model_name metadata to
ChatZhipuAI stream() and astream() response
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: jianfehuang <jianfehuang@tencent.com>
### Description/Issue:
I had problems filtering when setting up a local Milvus db and noticed
that the `filter` option in the `similarity_search` and
`similarity_search_with_score` appeared to do nothing. Instead, the
`expr` option should be used.
The `expr` option is correctly used in the retriever example further
down in the documentation.
The `expr` option seems to be correctly passed on, for example
[here](447c0dd2f0/libs/community/langchain_community/vectorstores/milvus.py (L701))
### Solution:
Update the documentation for the functions mentioned to show intended
behavior.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
- Add the `lora_request` parameter to the VLLM class to support LoRA
model configurations. This enhancement allows users to specify LoRA
requests directly when using VLLM, enabling more flexible and efficient
model customization.
**Issue:**
- No existing issue for `lora_adapter` in VLLM. This PR addresses the
need for configuring LoRA requests within the VLLM framework.
- Reference : [Using LoRA Adapters in
vLLM](https://docs.vllm.ai/en/stable/models/lora.html#using-lora-adapters)
**Example Code :**
Before this change, the `lora_request` parameter was not applied
correctly:
```python
ADAPTER_PATH = "/path/of/lora_adapter"
llm = VLLM(model="Bllossom/llama-3.2-Korean-Bllossom-3B",
max_new_tokens=512,
top_k=2,
top_p=0.90,
temperature=0.1,
vllm_kwargs={
"gpu_memory_utilization":0.5,
"enable_lora":True,
"max_model_len":1024,
}
)
print(llm.invoke(
["...prompt_content..."],
lora_request=LoRARequest("lora_adapter", 1, ADAPTER_PATH)
))
```
**Before Change Output:**
```bash
response was not applied lora_request
```
So, I attempted to apply the lora_adapter to
langchain_community.llms.vllm.VLLM.
**current output:**
```bash
response applied lora_request
```
**Dependencies:**
- None
**Lint and test:**
- All tests and lint checks have passed.
---------
Co-authored-by: Um Changyong <changyong.um@sfa.co.kr>
* **PR title**: "docs: Replaced langchain import with
langchain-nvidia-ai-endpoints in NVIDIA Endpoints Tab"
* **PR message**:
+ **Description:** Replaced the import of `langchain` with
`langchain-nvidia-ai-endpoints` in the NVIDIA Endpoints Tab to resolve
an error caused by the documentation attempting to import the generic
`langchain` module despite the targeted import.
+ **Issue:**
+ **Dependencies:** No additional dependencies introduced; simply
updated the existing import to a more specific module.
+ **Twitter handle:** https://x.com/nawaz0x1
* **Add tests and docs**:
+ **Applicability:** Not applicable in this case, as the change is a fix
to an existing integration rather than the addition of a new one.
+ **Rationale:** No new functionality or integrations are introduced,
only a corrective import change.
* **Lint and test**:
+ **Status:** Completed
+ **Outcome:**
- `make format`: **Passed**
- `make lint`: **Passed**
- `make test`: **Passed**

Thank you for contributing to LangChain!
Add notice of upcoming package consolidation of `langchain-databricks`
into `databricks-langchain`.
<img width="1047" alt="image"
src="https://github.com/user-attachments/assets/18eaa394-4e82-444b-85d5-7812be322674">
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Updated the documentation to fix some grammar errors
- **Description:** Some language errors exist in the documentation
- **Issue:** the issue # Changed the structure of some sentences
**PR Title**: `docs: fix typo in query analysis documentation`
**Description**: This PR corrects a typo on line 68 in the query
analysis documentation, changing **"pharsings"** to **"phrasings"** for
clarity and accuracy. Only one instance of the typo was fixed in the
last merge, and this PR fixes the second instance.
**Issue**: N/A
**Dependencies**: None
**Additional Notes**: No functional changes were made; this is a
documentation fix only.
Edited various notebooks in the tutorial section to fix:
* Grammatical Errors
* Improve Readability by changing the sentence structure or reducing
repeated words which bears the same meaning
* Edited a code block to follow the PEP 8 Standard
* Added more information in some sentences to make the concept more
clear and reduce ambiguity
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
**Description:**
This PR fixes an issue where non-ASCII characters in Pydantic field
descriptions were being escaped to their Unicode representations when
using `JsonOutputParser`. The change allows non-ASCII characters to be
preserved in the output, which is especially important for multilingual
support and when working with non-English languages.
**Issue:** Fixes#27256
**Example Code:**
```python
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser
class Article(BaseModel):
title: str = Field(description="科学文章的标题")
output_data_structure = Article
parser = JsonOutputParser(pydantic_object=output_data_structure)
print(parser.get_format_instructions())
```
**Previous Output**:
```... "title": {"description": "\\u79d1\\u5b66\\u6587\\u7ae0\\u7684\\u6807\\u9898", "title": "Title", "type": "string"}} ...```
**Current Output**:
```... "title": {"description": "科学文章的标题", "title": "Title", "type":
"string"}} ...```
**Changes made**:
- Modified `json.dumps()` call in
`langchain_core/output_parsers/json.py` to use `ensure_ascii=False`
- Added a unit test to verify Unicode handling
Co-authored-by: Harsimran-19 <harsimran1869@gmail.com>
**PR Title**: `docs: fix typo in query analysis documentation`
**Description**: This PR corrects a typo on line 68 in the query
analysis documentation, changing **"pharsings"** to **"phrasings"** for
clarity and accuracy.
**Issue**: N/A
**Dependencies**: None
**Additional Notes**: No functional changes were made; this is a
documentation fix only.
**Description:**
When annotating a function with the @tool decorator, the symbol should
have type BaseTool. The previous type annotations did not convey that to
type checkers. This patch creates 4 overloads for the tool function for
the 4 different use cases.
1. @tool decorator with no arguments
2. @tool decorator with only keyword arguments
3. @tool decorator with a name argument (and possibly keyword arguments)
4. Invoking tool as function with a name and runnable positional
arguments
The main function is updated to match the overloads. The changes are
100% backwards compatible (all existing calls should continue to work,
just with better type annotations).
**Twitter handle:** @nvachhar
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:** Fixes None addition issues when an empty value is
passed on
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** We improve the performance of the InMemoryVectorStore.
**Isue:** Originally, similarity was computed document by document:
```
for doc in self.store.values():
vector = doc["vector"]
similarity = float(cosine_similarity([embedding], [vector]).item(0))
```
This is inefficient and does not make use of numpy vectorization.
This PR computes the similarity in one vectorized go:
```
docs = list(self.store.values())
similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs])
```
**Dependencies:** None
**Twitter handle:** @b12_consulting, @Vincent_Min
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Thank you for contributing to LangChain!
- [X] **PR title**: DOC: Added notes in ipynb file to advice user to
upgrade package langchain_openai.
- [X]
Added notes from the issue report: to advise the user to upgrade
langchain_openai
Issue:
https://github.com/langchain-ai/langchain/issues/26616
- [ ] **Add tests and docs**:
- [ ] **Lint and test**:
- [ ]
---------
Co-authored-by: Libby Lin <libbylin@Libbys-MacBook-Pro.local>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Returns the document id along with the Vector Search
results
**Issue:** Fixes https://github.com/langchain-ai/langchain/issues/26860
for CouchbaseVectorStore
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR adds support to the how-to documentation for using AWS Bedrock
and Sagemaker Endpoints.
Because AWS services above dont presently use API keys to access LLMs
I've amended more of the source code than would normally be expected.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Reopened as a personal repo outside the organization.
## Description
- Naver HyperCLOVA X community package
- Add chat model & embeddings
- Add unit test & integration test
- Add chat model & embeddings docs
- I changed partner
package(https://github.com/langchain-ai/langchain/pull/24252) to
community package on this PR
- Could this
embeddings(https://github.com/langchain-ai/langchain/pull/21890) be
deprecated? We are trying to replace it with embedding
model(**ClovaXEmbeddings**) in this PR.
Twitter handle: None. (if needed, contact with
joonha.jeon@navercorp.com)
---
you can check our previous discussion below:
> one question on namespaces - would it make sense to have these in
.clova namespaces instead of .naver?
I would like to keep it as is, unless it is essential to unify the
package name.
(ClovaX is a branding for the model, and I plan to add other models and
components. They need to be managed as separate classes.)
> also, could you clarify the difference between ClovaEmbeddings and
ClovaXEmbeddings?
There are 3 models that are being serviced by embedding, and all are
supported in the current PR. In addition, all the functionality of CLOVA
Studio that serves actual models, such as distinguishing between test
apps and service apps, is supported. The existing PR does not support
this content because it is hard-coded.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Vadym Barda <vadym@langchain.dev>
## Description
I encountered an error while using the` gemma-2-2b-it model` with the
`HuggingFacePipeline` class and have implemented a fix to resolve this
issue.
### What is Problem
```python
model_id="google/gemma-2-2b-it"
gemma_2_model = AutoModelForCausalLM.from_pretrained(model_id)
gemma_2_tokenizer = AutoTokenizer.from_pretrained(model_id)
gen = pipeline(
task='text-generation',
model=gemma_2_model,
tokenizer=gemma_2_tokenizer,
max_new_tokens=1024,
device=0 if torch.cuda.is_available() else -1,
temperature=.5,
top_p=0.7,
repetition_penalty=1.1,
do_sample=True,
)
llm = HuggingFacePipeline(pipeline=gen)
for chunk in llm.stream("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World."):
print(chunk, end="", flush=True)
```
This code outputs the following error message:
```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
Exception in thread Thread-19 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1874, in generate
self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1266, in _validate_generated_length
raise ValueError(
ValueError: Input length of input_ids is 31, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
```
In addition, the following error occurs when the number of tokens is
reduced.
```python
for chunk in llm.stream("Hello World"):
print(chunk, end="", flush=True)
```
```
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1258: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1885: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
warnings.warn(
Exception in thread Thread-20 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2982, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 994, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 803, in forward
inputs_embeds = self.embed_tokens(input_ids)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 164, in forward
return F.embedding(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
On the other hand, in the case of invoke, the output is normal:
```
llm.invoke("Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.")
```
```
'Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World. Hello World.\n\nThis is a simple program that prints the phrase "Hello World" to the console. \n\n**Here\'s how it works:**\n\n* **`print("Hello World")`**: This line of code uses the `print()` function, which is a built-in function in most programming languages (like Python). The `print()` function takes whatever you put inside its parentheses and displays it on the screen.\n* **`"Hello World"`**: The text within the double quotes (`"`) is called a string. It represents the message we want to print.\n\n\nLet me know if you\'d like to explore other programming concepts or see more examples! \n'
```
### Problem Analysis
- Apparently, I put kwargs in while generating pipelines and it applied
to `invoke()`, but it's not applied in the `stream()`.
- When using the stream, `inputs = self.pipeline.tokenizer (prompt,
return_tensors = "pt")` enters cpu.
- This can crash when the model is in gpu.
### Solution
Just use `self.pipeline` instead of `self.pipeline.model.generate`.
- **Original Code**
```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])
inputs = self.pipeline.tokenizer(prompt, return_tensors="pt")
streamer = TextIteratorStreamer(
self.pipeline.tokenizer,
timeout=60.0,
skip_prompt=skip_prompt,
skip_special_tokens=True,
)
generation_kwargs = dict(
inputs,
streamer=streamer,
stopping_criteria=stopping_criteria,
**pipeline_kwargs,
)
t1 = Thread(target=self.pipeline.model.generate, kwargs=generation_kwargs)
t1.start()
```
- **Updated Code**
```python
stopping_criteria = StoppingCriteriaList([StopOnTokens()])
streamer = TextIteratorStreamer(
self.pipeline.tokenizer,
timeout=60.0,
skip_prompt=skip_prompt,
skip_special_tokens=True,
)
generation_kwargs = dict(
text_inputs= prompt,
streamer=streamer,
stopping_criteria=stopping_criteria,
**pipeline_kwargs,
)
t1 = Thread(target=self.pipeline, kwargs=generation_kwargs)
t1.start()
```
By using the `pipeline` directly, the `kwargs` of the pipeline are
applied, and there is no need to consider the `device` of the `tensor`
made with the `tokenizer`.
> According to the change to use `pipeline`, it was modified to put
`text_inputs=prompts` directly into `generation_kwargs`.
## Issue
None
## Dependencies
None
## Twitter handle
None
---------
Co-authored-by: Vadym Barda <vadym@langchain.dev>
`strightforward` => `straightforward`
`adavanced` => `advanced`
`There a few challenges` => `There are a few challenges`
Documentation Correction:
*
[`docs/docs/concepts/structured_output.mdx`]:
Corrected several typos in the sentence directing users to the API
reference.
Grammar correction needed in passthrough.ipynb
The sentence is:
"Now you've learned how to pass data through your chains to help to help
format the data flowing through your chains."
There's a redundant "to help", and it could be more succinctly written
as:
"Now you've learned how to pass data through your chains to help format
the data flowing through your chains."
Fixes#26009
Thank you for contributing to LangChain!
- [x] **PR title**: "docs: Correcting spelling mistake"
- [x] **PR message**:
- **Description:** Corrected spelling from "trianed" to "trained"
- **Issue:** the issue #26009
- **Dependencies:** NA
- **Twitter handle:** NA
- [ ] **Add tests and docs**: NA
- [ ] **Lint and test**:
Co-authored-by: Libby Lin <libbylin@Libbys-MacBook-Pro.local>
**Issue:** : https://github.com/langchain-ai/langchain/issues/22961
**Description:**
Previously, the documentation for `DuckDuckGoSearchResults` said that it
returns a JSON string, however the code returns a regular string that
can't be parsed as is.
for example running
```python
from langchain_community.tools import DuckDuckGoSearchResults
# Create a DuckDuckGo search instance
search = DuckDuckGoSearchResults()
# Invoke the search
result = search.invoke("Obama")
# Print the result
print(result)
# Print the type of the result
print("Result Type:", type(result))
```
will return
```
snippet: Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ..., title: Obamas to hit the campaign trail in first joint appearances with Harris, link: https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034, snippet: Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ..., title: Obamas set to hit campaign trail with Kamala Harris for first time, link: https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/, snippet: Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ..., title: Harris Will Join Michelle Obama and Barack Obama on Campaign Trail, link: https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html, snippet: Obama's leaving office was "a turning point," Mirsky said. "That was the last time anybody felt normal." A few feet over, a 64-year-old physics professor named Eric Swanson who had grown ..., title: Obama's reemergence on the campaign trail for Harris comes as he ..., link: https://www.cnn.com/2024/10/13/politics/obama-campaign-trail-harris-biden/index.html
Result Type: <class 'str'>
```
After the change in this PR, `DuckDuckGoSearchResults` takes an
additional `output_format = "list" | "json" | "string"` ("string" =
current behavior, default). For example, invoking
`DuckDuckGoSearchResults(output_format="list")` return a list of
dictionaries in the format
```
[{'snippet': '...', 'title': '...', 'link': '...'}, ...]
```
e.g.
```
[{'snippet': "Obama has in a sense been wrestling with Trump's impact since the real estate magnate broke onto the political stage in 2015. Trump's victory the next year, defeating Obama's secretary of ...", 'title': "Obama's fears about Trump drive his stepped-up campaigning", 'link': 'https://www.washingtonpost.com/politics/2024/10/18/obama-trump-anxiety-harris-campaign/'}, {'snippet': 'Harris will hold a campaign event with former President Barack Obama in Georgia next Thursday, the first time the pair has campaigned side by side, a senior campaign official said. A week from ...', 'title': 'Obamas to hit the campaign trail in first joint appearances with Harris', 'link': 'https://www.nbcnews.com/politics/2024-election/obamas-hit-campaign-trail-first-joint-appearances-harris-rcna176034'}, {'snippet': 'Item 1 of 3 Former U.S. first lady Michelle Obama and her husband, former U.S. President Barack Obama, stand on stage during Day 2 of the Democratic National Convention (DNC) in Chicago, Illinois ...', 'title': 'Obamas set to hit campaign trail with Kamala Harris for first time', 'link': 'https://www.reuters.com/world/us/obamas-set-hit-campaign-trail-with-kamala-harris-first-time-2024-10-18/'}, {'snippet': 'Barack and Michelle Obama will make their first campaign appearances alongside Kamala Harris at rallies in Georgia and Michigan. By Reid J. Epstein Reporting from Ashwaubenon, Wis. Here come the ...', 'title': 'Harris Will Join Michelle Obama and Barack Obama on Campaign Trail', 'link': 'https://www.nytimes.com/2024/10/18/us/politics/kamala-harris-michelle-obama-barack-obama.html'}]
Result Type: <class 'list'>
```
---------
Co-authored-by: vbarda <vadym@langchain.dev>
`Fore` => `For`
Documentation Correction:
*
[`docs/docs/concepts/async.mdx`](diffhunk://#diff-4959e81c20607c20c7a9c38db4405a687c5d94f24fc8220377701afeee7562b0L40-R40):
Corrected a typo from "Fore" to "For" in the sentence directing users to
the API reference.
- [ ] **Description:**
- pass the device_map into model_kwargs
- removing the unused device_map variable in the hf_pipeline function
call
- [ ] **Issue:** issue #13128
When using the from_model_id function to load a Hugging Face model for
text generation across multiple GPUs, the model defaults to loading on
the CPU despite multiple GPUs being available using the expected format
``` python
llm = HuggingFacePipeline.from_model_id(
model_id="model-id",
task="text-generation",
device_map="auto",
)
```
Currently, to enable multiple GPU , we have to pass in variable in this
format instead
``` python
llm = HuggingFacePipeline.from_model_id(
model_id="model-id",
task="text-generation",
device=None,
model_kwargs={
"device_map": "auto",
}
)
```
This issue arises due to improper handling of the device and device_map
parameters.
- [ ] **Explanation:**
1. In from_model_id, the model is created using model_kwargs and passed
as the model variable of the pipeline function. So at this moment, to
load the model with multiple GPUs, "device_map" needs to be set to
"auto" within model_kwargs. Otherwise, the model defaults to loading on
the CPU.
2. The device_map variable in from_model_id is not utilized correctly.
In the pipeline function's source code of tnansformer:
- The device_map variable is stored in the model_kwargs dictionary
(lines 867-878 of transformers/src/transformers/pipelines/\__init__.py).
```python
if device_map is not None:
......
model_kwargs["device_map"] = device_map
```
- The model is constructed with model_kwargs containing the device_map
value ONLY IF it is a string (lines 893-903 of
transformers/src/transformers/pipelines/\__init__.py).
```python
if isinstance(model, str) or framework is None:
model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
framework, model = infer_framework_load_model( ... , **model_kwargs, )
```
- Consequently, since a model object is already passed to the pipeline
function, the device_map variable from from_model_id is never used.
3. The device_map variable in from_model_id not only appears unused but
also causes errors. Without explicitly setting device=None, attempting
to load the model on multiple GPUs may result in the following error:
```
Device has 2 GPUs available. Provide device={deviceId} to
`from_model_id` to use available GPUs for execution. deviceId is -1
(default) for CPU and can be a positive integer associated with CUDA
device id.
Traceback (most recent call last):
File "foo.py", line 15, in <module>
llm = HuggingFacePipeline.from_model_id(
File
"foo\site-packages\langchain_huggingface\llms\huggingface_pipeline.py",
line 217, in from_model_id
pipeline = hf_pipeline(
File "foo\lib\site-packages\transformers\pipelines\__init__.py", line
1108, in pipeline
return pipeline_class(model=model, framework=framework, task=task,
**kwargs)
File "foo\lib\site-packages\transformers\pipelines\text_generation.py",
line 96, in __init__
super().__init__(*args, **kwargs)
File "foo\lib\site-packages\transformers\pipelines\base.py", line 835,
in __init__
raise ValueError(
ValueError: The model has been loaded with `accelerate` and therefore
cannot be moved to a specific device. Please discard the `device`
argument when creating your pipeline object.
```
This error occurs because, in from_model_id, the default values in from_model_id for device and device_map are -1 and None, respectively. It would passes the statement (`device_map is not None and device < 0`) and keep the device as -1 so the pipeline function later raises an error when trying to move a GPU-loaded model back to the CPU.
19eb82e68b/libs/community/langchain_community/llms/huggingface_pipeline.py (L204-L213)
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: vbarda <vadym@langchain.dev>
This PR introduces a new `azure_ad_async_token_provider` attribute to
the `AzureOpenAI` and `AzureChatOpenAI` classes in `partners/openai` and
`community` packages, given it's currently supported on `openai` package
as
[AsyncAzureADTokenProvider](https://github.com/openai/openai-python/blob/main/src/openai/lib/azure.py#L33)
type.
The reason for creating a new attribute is to avoid breaking changes.
Let's say you have an existing code that uses a `AzureOpenAI` or
`AzureChatOpenAI` instance to perform both sync and async operations.
The `azure_ad_token_provider` will work exactly as it is today, while
`azure_ad_async_token_provider` will override it for async requests.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
docs: "fix docker command"
- **Description**: The Redis chat message history component requires the
Redis Stack to create indexes. When using only Redis, the following
error occurs: "Unknown command 'FT.INFO', with args beginning with:
'chat_history'".
- **Twitter handle**: savar_bhasin
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This PR updates `CassandraGraphVectorStore` to be based off
`CassandraVectorStore`, instead of using a custom CQL implementation.
This allows users using a `CassandraVectorStore` to upgrade to a
`GraphVectorStore` without having to change their database schema or
re-embed documents.
This PR also updates the documentation of the `GraphVectorStore` base
class and contains native async implementations for the standard graph
methods: `traversal_search` and `mmr_traversal_search` in
`CassandraVectorStore`.
**Issue:** No issue number.
**Dependencies:** https://github.com/langchain-ai/langchain/pull/27078
(already-merged)
**Lint and test**:
- Lint and tests all pass, including existing
`CassandraGraphVectorStore` tests.
- Also added numerous additional tests based of the tests in
`langchain-astradb` which cover many more scenarios than the existing
tests for `Cassandra` and `CassandraGraphVectorStore`
** BREAKING CHANGE**
Note that this is a breaking change for existing users of
`CassandraGraphVectorStore`. They will need to wipe their database table
and restart.
However:
- The interfaces have not changed. Just the underlying storage
mechanism.
- Any one using `langchain_community.vectorstores.Cassandra` can instead
use `langchain_community.graph_vectorstores.CassandraGraphVectorStore`
and they will gain Graph capabilities without having to re-embed their
existing documents. This is the primary goal of this PR.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Fixes#27411
**Description:** Adds `template_format` to the `ImagePromptTemplate`
class and updates passing in the `template_format` parameter from
ChatPromptTemplate instead of the hardcoded "f-string".
Also updated docs and typing related to `template_format` to be more
up-to-date and specific.
**Dependencies:** None
**Add tests and docs**: Added unit tests to validate fix. Needed to
update `test_chat` snapshot due to adding new attribute
`template_format` in `ImagePromptTemplate`.
---------
Co-authored-by: Vadym Barda <vadym@langchain.dev>
Thank you for contributing to LangChain!
- [ X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ X]
- **Issue:** issue #26941
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
docs:docs:tutorials:sql_qa.ipynb: fix typo
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
Fix typo in docs:docs:tutorials:sql_qa.ipynb
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** This PR fixes typos in
```
docs/docs/integrations/document_loaders/athena.ipynb
docs/docs/integrations/document_loaders/glue_catalog.ipynb
```
1. Move dependencies for running notebooks into monorepo poetry test
deps;
2. Add script to update cassettes for a single notebook;
3. Add cassettes for some how-to guides.
---
To update cassettes for a single notebook, run
`docs/scripts/update_cassettes.sh`. For example:
```
./docs/scripts/update_cassettes.sh docs/docs/how_to/binding.ipynb
```
Requires:
1. monorepo dev and test dependencies installed;
2. env vars required by notebook are set.
Note: How-to guides are not currently run in [scheduled
job](https://github.com/langchain-ai/langchain/actions/workflows/run_notebooks.yml).
Will add cassettes for more how-to guides in subsequent PRs before
adding them to scheduled job.
**Description**
This PR introduces the proxies parameter to the RecursiveUrlLoader
class, allowing the user to specify proxy servers for requests. This
update enables crawling through proxy servers, providing enhanced
flexibility for network configurations.
The key changes include:
1.Added an optional proxies parameter to the constructor (__init__).
2.Updated the documentation to explain the proxies parameter usage with
an example.
3.Modified the _get_child_links_recursive method to pass the proxies
parameter to the requests.get function.
**Sample Usage**
```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader
proxies = {
"http": "http://localhost:1080",
"https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
url=url, max_depth=1, extractor=lambda x: Soup(x, "html.parser").text,proxies=proxies
)
docs = loader.load()
```
---------
Co-authored-by: root <root@thb>
**Description**:
This PR add support of clob/blob data type for oracle document loader,
clob/blob can only be read by oracledb package when connection is open,
so reformat code to process data before connection closes.
**Dependencies**:
oracledb package same as before. pip install oracledb
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
- This pull request addresses a bug in Langchain's VLLM integration,
where the use_beam_search parameter was erroneously passed to
SamplingParams. The SamplingParams class in vLLM does not support the
use_beam_search argument, which caused a TypeError.
- This PR introduces logic to filter out unsupported parameters,
ensuring that only valid parameters are passed to SamplingParams. As a
result, the integration now functions as expected without errors.
- The bug was reproduced by running the code sample from Langchain’s
documentation, which triggered the error due to the invalid parameter.
This fix resolves that error by implementing proper parameter filtering.
**VLLM Sampling Params Class:**
https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py
**Issue:**
I could not found an Issue that belongs to this. Fixes "TypeError:
Unexpected keyword argument 'use_beam_search'" error when using VLLM
from Langchain.
**Dependencies:**
None.
**Tests and Documentation**:
Tests:
No new functionality was added, but I tested the changes by running
multiple prompts through the VLLM integration with various parameter
configurations. All tests passed successfully without breaking
compatibility.
Docs
No documentation changes were necessary as this is a bug fix.
**Reproducing the Error:**
https://python.langchain.com/docs/integrations/llms/vllm/
The code sample from the original documentation can be used to reproduce
the error I got.
from langchain_community.llms import VLLM
llm = VLLM(
model="mosaicml/mpt-7b",
trust_remote_code=True, # mandatory for hf models
max_new_tokens=128,
top_k=10,
top_p=0.95,
temperature=0.8,
)
print(llm.invoke("What is the capital of France ?"))

This PR resolves the issue by ensuring that only valid parameters are
passed to SamplingParams.
**Description**: PR fixes some formatting errors in deprecation message
in the `langchain_community.vectorstores.pgvector` module, where it was
missing spaces between a few words, and one word was misspelled.
**Issue**: n/a
**Dependencies**: n/a
Signed-off-by: mpeveler@timescale.com
Co-authored-by: Erick Friis <erick@langchain.dev>
PR message:
Description:
This PR refactors the Arxiv API wrapper by extracting the Arxiv search
logic into a helper function (_fetch_results) to reduce code duplication
and improve maintainability. The helper function is used in methods like
get_summaries_as_docs, run, and lazy_load, streamlining the code and
making it easier to maintain in the future.
Issue:
This is a minor refactor, so no specific issue is being fixed.
Dependencies:
No new dependencies are introduced with this change.
Add tests and docs:
No new integrations were added, so no additional tests or docs are
necessary for this PR.
Lint and test:
I have run make format, make lint, and make test to ensure all checks
pass successfully.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR updates the integration with OCI data science model deployment
service.
- Update LLM to support streaming and async calls.
- Added chat model.
- Updated tests and docs.
- Updated `libs/community/scripts/check_pydantic.sh` since the use of
`@pre_init` is removed from existing integration.
- Updated `libs/community/extended_testing_deps.txt` as this integration
requires `langchain_openai`.
---------
Co-authored-by: MING KANG <ming.kang@oracle.com>
Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR updates the Firecrawl Document Loader to use the recently
released V1 API of Firecrawl.
**Key Updates:**
**Firecrawl V1 Integration:** Updated the document loader to leverage
the new Firecrawl V1 API for improved performance, reliability, and
developer experience.
**Map Functionality Added:** Introduced the map mode for more flexible
document loading options.
These updates enhance the integration and provide access to the latest
features of Firecrawl.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
Updated
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
twitter: @MaxHTran
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Not needed due to small change
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Max Tran <maxtra@amazon.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: yangzhao <yzahha980122@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Deprecated version of Chroma >=0.5.5 <0.5.12 due to a
serious correctness issue that caused some embeddings for deployments
with multiple collections to be lost (read more on the issue in Chroma
repo)
**Issue:** chroma-core/chroma#2922 (fixed by chroma-core/chroma##2923
and released in
[0.5.13](https://github.com/chroma-core/chroma/releases/tag/0.5.13))
**Dependencies:** N/A
**Twitter handle:** `@t_azarov`
Starting with Clickhouse version 24.8, a different type of configuration
has been introduced in the vectorized data ingestion, and if this
configuration occurs, an error occurs when generating the table. As can
be seen below:

---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Async invocation:
remove : from at the end of line
line 441 because there is not any structure block after it.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description**:
this PR enable VectorStore TLS and authentication (digest, basic) with
HTTP/2 for Infinispan server.
Based on httpx.
Added docker-compose facilities for testing
Added documentation
**Dependencies:**
requires `pip install httpx[http2]` if HTTP2 is needed
**Twitter handle:**
https://twitter.com/infinispan
**Description:** this PR adds a set of methods to deal with metadata
associated to the vector store entries. These, while essential to the
Graph-related extension of the `Cassandra` vector store, are also useful
in themselves. These are (all come in their sync+async versions):
- `[a]delete_by_metadata_filter`
- `[a]replace_metadata`
- `[a]get_by_document_id`
- `[a]metadata_search`
Additionally, a `[a]similarity_search_with_embedding_id_by_vector`
method is introduced to better serve the store's internal working (esp.
related to reranking logic).
**Issue:** no issue number, but now all Document's returned bear their
`.id` consistently (as a consequence of a slight refactoring in how the
raw entries read from DB are made back into `Document` instances).
**Dependencies:** (no new deps: packaging comes through langchain-core
already; `cassio` is now required to be version 0.1.10+)
**Add tests and docs**
Added integration tests for the relevant newly-introduced methods.
(Docs will be updated in a separate PR).
**Lint and test** Lint and (updated) test all pass.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Add timeout at client side for UCFunctionToolkit and add retry logic.
Users could specify environment variable
`UC_TOOL_CLIENT_EXECUTION_TIMEOUT` to increase the timeout value for
retrying to get the execution response if the status is pending. Default
timeout value is 120s.
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Tested in Databricks:
<img width="1200" alt="image"
src="https://github.com/user-attachments/assets/54ab5dfc-5e57-4941-b7d9-bfe3f8ad3f62">
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: serena-ruan <serena.rxy@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Example updated for vectorstore ChromaDB.
If we want to apply multiple filters then ChromaDB supports filters like
this:
Reference: [ChromaDB
filters](https://cookbook.chromadb.dev/core/filters/)
Thank you.
**Docs Chatbot Tutorial**
The docs state that you can omit the language parameter, but the code
sample to demonstrate, still contains it.
Co-authored-by: Erick Friis <erick@langchain.dev>
initalize -> initialize
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
Co-authored-by: Erick Friis <erick@langchain.dev>
- [ ] **PR title**: docs: fix typo in SQLStore import path
- [ ] **PR message**:
- **Description:** This PR corrects a typo in the docstrings for the
class SQLStore(BaseStore[str, bytes]). The import path in the docstring
currently reads from langchain_rag.storage import SQLStore, which should
be changed to langchain_community.storage import SQLStore. This typo is
also reflected in the official documentation.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** N/A
Co-authored-by: Erick Friis <erick@langchain.dev>
Added missed provider pages. Added missed descriptions and links.
I fixed the Ipex-LLM titles, so the ToC is now sorted properly for these
titles.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Description
This PR fixes the context loss issue in `AsyncCallbackManager`,
specifically in `on_llm_start` and `on_chat_model_start` methods. It
properly honors the `run_inline` attribute of callback handlers,
preventing race conditions and ordering issues.
Key changes:
1. Separate handlers into inline and non-inline groups.
2. Execute inline handlers sequentially for each prompt.
3. Execute non-inline handlers concurrently across all prompts.
4. Preserve context for stateful handlers.
5. Maintain performance benefits for non-inline handlers.
**These changes are implemented in `AsyncCallbackManager` rather than
`ahandle_event` because the issue occurs at the prompt and message_list
levels, not within individual events.**
## Testing
- Test case implemented in #26857 now passes, verifying execution order
for inline handlers.
## Related Issues
- Fixes issue discussed in #23909
## Dependencies
No new dependencies are required.
---
@eyurtsev: This PR implements the discussed changes to respect
`run_inline` in `AsyncCallbackManager`. Please review and advise on any
needed changes.
Twitter handle: @parambharat
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Added `**kwargs` parameters to the `index` and `aindex` functions in
`libs/core/langchain_core/indexing/api.py`. This allows users to pass
additional arguments to the `add_documents` and `aadd_documents`
methods, enabling the specification of a custom `vector_field`. For
example, users can now use `vector_field="embedding"` when indexing
documents in `OpenSearchVectorStore`
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This commit addresses a typographical error in the documentation for the
async astream_events method. The word 'evens' was incorrectly used in
the introductory sentence for the reference table, which could lead to
confusion for users.\n\n### Changes Made:\n- Corrected 'Below is a table
that illustrates some evens that might be emitted by various chains.' to
'Below is a table that illustrates some events that might be emitted by
various chains.'\n\nThis enhancement improves the clarity of the
documentation and ensures accurate terminology is used throughout the
reference material.\n\nIssue Reference: #27107
Thank you for contributing to LangChain!
- [X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Update Spanner VS integration doc
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** NA
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
**Description:** Box AI can return responses, but it can also be
configured to return citations. This change allows the developer to
decide if they want the answer, the citations, or both. Regardless of
the combination, this is returned as a single List[Document] object.
**Dependencies:** Updated to the latest Box Python SDK, v1.5.1
**Twitter handle:** BoxPlatform
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
Given the current erroring behavior, every time we've moved a kwarg from
model_kwargs and made it its own field that was a breaking change.
Updating this behavior to support the old instantiations /
serializations.
Assuming build_extra_kwargs was not something that itself is being used
externally and needs to be kept backwards compatible
This adds support for inject tool args that are arbitrary types when
used with pydantic 2.
We'll need to add similar logic on the v1 path, and potentially mirror
the config from the original model when we're doing the subset.
Thank you for contributing to LangChain!
- [x] **PR message**:
- Add Weaviate to the vector store list.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
The PR is an adjustment on few grammar adjustments on the page.
@leomofthings is my twitter handle
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** URL is appended with = which is not working
- **Issue:** removing the = symbol makes the URL valid
- **Twitter handle:** @arunprakash_com
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** prevent index function to re-index entire source
document even if nothing has changed.
- **Issue:** #22135
I worked on a solution to this issue that is a compromise between being
cheap and being fast.
In the previous code, when batch_size is greater than the number of docs
from a certain source almost the entire source is deleted (all documents
from that source except for the documents in the first batch)
My solution deletes documents from vector store and record manager only
if at least one document has changed for that source.
Hope this can help!
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
* [chore]: Agent Observation should be casted to string to avoid errors
* Merge branch 'master' into fix_observation_type_streaming
* [chore]: Using Json.dumps
* [chore]: Exact same logic as when casting agent oobservation to string
template_format is an init argument on ChatPromptTemplate but not an
attribute on the object so was getting shoved into
StructuredPrompt.structured_ouptut_kwargs
These allow converting linked documents (such as those used with
GraphVectorStore) to networkx for rendering and/or in-memory graph
algorithms such as community detection.
**Description:** Moves callback to before yield for `_stream` and
`_astream` function for the textgen model in the community llm package
**Issue:** #16913
**Description**:
Adds a vector store integration with
[sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to
sqlite-vss that is a single C file with no external dependencies.
Pretty straightforward, just copy-pasted the sqlite-vss integration and
made a few tweaks and added integration tests. Only question is whether
all documentation should be directed away from sqlite-vss if it is
defacto deprecated (cc @asg017).
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the gigachat model in the community llm package
**Issue:** #16913
This prevents `trim_messages` from raising an `IndexError` when invoked
with `include_system=True`, `strategy="last"`, and an empty message
list.
Fixes#26895
Dependencies: none
security scanners can't distinguish monorepo sources from each other.
this will resolve issues for folks trying to use e.g. langchain-core but
getting security issues from experimental flagged!
The `Without examples 😿` and `With examples 😻` should have different
outputs to illustrate their point.
See v0.2 docs.
https://python.langchain.com/docs/how_to/extraction_examples/#without-examples-
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** This pull request addresses the validation error in
`SettingsConfigDict` due to extra fields in the `.env` file. The issue
is prevalent across multiple Langchain modules. This fix ensures that
extra fields in the `.env` file are ignored, preventing validation
errors.
**Changes include:**
- Applied fixes to modules using `SettingsConfigDict`.
- **Issue:** NA, similar
https://github.com/langchain-ai/langchain/issues/26850
- **Dependencies:** NA
- **Description:** The flag is named `anonymize_snippets`. When set to
true, the Pebblo server will anonymize snippets by redacting all
personally identifiable information (PII) from the snippets going into
VectorDB and the generated reports
- **Issue:** NA
- **Dependencies:** NA
- **docs**: Updated
**Description:** Moves yield to after callback for `_stream` and
`_astream` function for the deepsparse model in the community package
**Issue:** #16913
- this flag ensures the tracer always runs in the same thread as the run
being traced for both sync and async runs
- pro: less chance for ordering bugs and other oddities
- blocking the event loop is not a concern given all code in the tracer
holds the GIL anyway
- **Description:** This PR fixes the response parsing logic for
`ChatDeepInfra`, more specifially `_convert_delta_to_message_chunk()`,
which is invoked when streaming via `ChatDeepInfra`.
- **Issue:** Streaming from DeepInfra via `ChatDeepInfra` is currently
broken because the response parsing logic doesn't handle that
`tool_calls` can be `None`. (There is no GitHub issue for this problem
yet.)
- **Dependencies:** –
- **Twitter handle:** –
Keeping this here as a reminder:
> If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** Moves yield to after callback for
`_prepare_input_and_invoke_stream` and
`_aprepare_input_and_invoke_stream` for bedrock llm in community
package.
**Issue:** #16913
without this `model_config` importing this package produces warnings
about "model_name" having conflicts with protected namespace "model_".
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
When PR body is empty `get_pull_request` method fails with bellow
exception.
**Issue:**
```
TypeError('expected string or buffer')Traceback (most recent call last):
File ".../.venv/lib/python3.9/site-packages/langchain_core/tools/base.py", line 661, in run
response = context.run(self._run, *tool_args, **tool_kwargs)
File ".../.venv/lib/python3.9/site-packages/langchain_community/tools/github/tool.py", line 52, in _run
return self.api_wrapper.run(self.mode, query)
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 816, in run
return json.dumps(self.get_pull_request(int(query)))
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 495, in get_pull_request
add_to_dict(response_dict, "body", pull.body)
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 487, in add_to_dict
tokens = get_tokens(value)
File ".../.venv/lib/python3.9/site-packages/langchain_community/utilities/github.py", line 483, in get_tokens
return len(tiktoken.get_encoding("cl100k_base").encode(text))
File "....venv/lib/python3.9/site-packages/tiktoken/core.py", line 116, in encode
if match := _special_token_regex(disallowed_special).search(text):
TypeError: expected string or buffer
```
**Twitter:** __gorros__
Chunking of the input array controlled by `self.chunk_size` is being
ignored when `self.check_embedding_ctx_length` is disabled. Effectively,
the chunk size is assumed to be equal 1 in such a case. This is
suprising.
The PR takes into account `self.chunk_size` passed by the user.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Add support to delete documents automatically from the
caches & chat message history by adding a new optional parameter, `ttl`.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
In the previous implementation, `skip_count` was counting all the
documents in the collection. Instead, we want to filter the documents by
`session_id` and calculate `skip_count` by subtracting `history_size`
from the filtered count.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** fix "template" not allowed as prompt param
- **Issue:** #26058
- **Dependencies:** none
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Description
By default, `HuggingFaceEndpoint` instantiates both the
`InferenceClient` and the `AsyncInferenceClient` with the
`"server_kwargs"` passed as input. This is an issue as both clients
might not support exactly the same kwargs. This has been highlighted in
https://github.com/huggingface/huggingface_hub/issues/2522 by
@morgandiverrez with the `trust_env` parameter. In order to make
`langchain` integration future-proof, I do think it's wiser to forward
only the supported parameters to each client. Parameters that are not
supported are simply ignored with a warning to the user. From a
`huggingface_hub` maintenance perspective, this allows us much more
flexibility as we are not constrained to support the exact same kwargs
in both clients.
## Issue
https://github.com/huggingface/huggingface_hub/issues/2522
## Dependencies
None
## Twitter
https://x.com/Wauplin
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
There is a small bug in "TypedDict class" sample source.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
`unstructured.partition.auto.partition` supports a `url` kwarg, but
`url` in `UnstructuredLoader.__init__` is reserved for the server URL.
Here we add a `web_url` kwarg that is passed to the partition kwargs:
```python
self.unstructured_kwargs["url"] = web_url
```
Thank you for contributing to LangChain!
Fix error like
<img width="1167" alt="image"
src="https://github.com/user-attachments/assets/2e219b26-ec7e-48ef-8111-e0ff2f5ac4c0">
After the fix:
<img width="584" alt="image"
src="https://github.com/user-attachments/assets/48f36fe7-628c-48b6-81b2-7fe741e4ca85">
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: serena-ruan <serena.rxy@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Added PebbloTextLoader for loading text in
PebbloSafeLoader.
- Since PebbloSafeLoader wraps document loaders, this new loader enables
direct loading of text into Documents using PebbloSafeLoader.
- **Issue:** NA
- **Dependencies:** NA
- [x] **Tests**: Added/Updated tests
# Description
[Vector store base
class](4cdaca67dc/libs/core/langchain_core/vectorstores/base.py (L65))
currently expects `ids` to be passed in and that is what it passes along
to the AzureSearch vector store when attempting to `add_texts()`.
However AzureSearch expects `keys` to be passed in. When they are not
present, AzureSearch `add_embeddings()` makes up new uuids. This is a
problem when trying to run indexing. [Indexing code
expects](b297af5482/libs/core/langchain_core/indexing/api.py (L371))
the documents to be uploaded using provided ids. Currently AzureSearch
ignores `ids` passed from `indexing` and makes up new ones. Later when
`indexer` attempts to delete removed file, it uses the `id` it had
stored when uploading the document, however it was uploaded under
different `id`.
**Twitter handle: @martintriska1**
Page content sometimes is empty when PyMuPDF can not find text on pages.
For example, this can happen when the text of the PDF is not copyable
"by hand". Then an OCR solution is need - which is not integrated here.
This warning should accurately warn the user that some pages are lost
during this process.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Fixes#26212: replaced the raw string with backslashes. Alternative:
raw-stringif the full docstring.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
Added search options for BoxRetriever and added documentation to
demonstrate how to use BoxRetriever as an agent tool - @BoxPlatform
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Ruff doesn't know about the python version in
`[tool.poetry.dependencies]`. It can get it from
`project.requires-python`.
Notes:
* poetry seems to have issues getting the python constraints from
`requires-python` and using `python` in per dependency constraints. So I
had to duplicate the info. I will open an issue on poetry.
* `inspect.isclass()` doesn't work correctly with `GenericAlias`
(`list[...]`, `dict[..., ...]`) on Python <3.11 so I added some `not
isinstance(type, GenericAlias)` checks:
Python 3.11
```pycon
>>> import inspect
>>> inspect.isclass(list)
True
>>> inspect.isclass(list[str])
False
```
Python 3.9
```pycon
>>> import inspect
>>> inspect.isclass(list)
True
>>> inspect.isclass(list[str])
True
```
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** the example to perform hybrid search with the
Elasticsearch retriever is out of date
- **Issue:** N/A
- **Dependencies:** N/A
Co-authored-by: Erick Friis <erick@langchain.dev>
fix#26370
- #26370
`GoogleSpeechToTextLoader` is a deprecated method in
`langchain_community.document_loaders.google_speech_to_text`.
The new recommended usage is to use `SpeechToTextLoader` from
`langchain_google_community`.
When importing from `langchain_google_community`, use the name
`SpeechToTextLoader` instead of the old `GoogleSpeechToTextLoader`.

Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: Fix typo in conda environment code block in
rag.ipynb
- In docs/tutorials/rag.ipynb
Co-authored-by: Erick Friis <erick@langchain.dev>
We recently renamed `MLflow Deployments Server` to `MLflow AI Gateway`
in mlflow. This PR updates the relevant notebooks to use `MLflow AI
gateway`
---
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
### Description:
This pull request significantly enhances the MongodbLoader class in the
LangChain community package by adding robust metadata customization and
improved field extraction capabilities. The updated class now allows
users to specify additional metadata fields through the metadata_names
parameter, enabling the extraction of both top-level and deeply nested
document attributes as metadata. This flexibility is crucial for users
who need to include detailed contextual information without altering the
database schema.
Moreover, the include_db_collection_in_metadata flag offers optional
inclusion of database and collection names in the metadata, allowing for
even greater customization depending on the user's needs.
The loader's field extraction logic has been refined to handle missing
or nested fields more gracefully. It now employs a safe access mechanism
that avoids the KeyError previously encountered when a specified nested
field was absent in a document. This update ensures that the loader can
handle diverse and complex data structures without failure, making it
more resilient and user-friendly.
### Issue:
This pull request addresses a critical issue where the MongodbLoader
class in the LangChain community package could throw a KeyError when
attempting to access nested fields that may not exist in some documents.
The previous implementation did not handle the absence of specified
nested fields gracefully, leading to runtime errors and interruptions in
data processing workflows.
This enhancement ensures robust error handling by safely accessing
nested document fields, using default values for missing data, thus
preventing KeyError and ensuring smoother operation across various data
structures in MongoDB. This improvement is crucial for users working
with diverse and complex data sets, ensuring the loader can adapt to
documents with varying structures without failing.
### Dependencies:
Requires motor for asynchronous MongoDB interaction.
### Twitter handle:
N/A
### Add tests and docs
Tests: Unit tests have been added to verify that the metadata inclusion
toggle works as expected and that the field extraction correctly handles
nested fields.
Docs: An example notebook demonstrating the use of the enhanced
MongodbLoader is included in the docs/docs/integrations directory. This
notebook includes setup instructions, example usage, and outputs.
(Here is the notebook link : [colab
link](https://colab.research.google.com/drive/1tp7nyUnzZa3dxEFF4Kc3KS7ACuNF6jzH?usp=sharing))
Lint and test
Before submitting, I ran make format, make lint, and make test as per
the contribution guidelines. All tests pass, and the code style adheres
to the LangChain standards.
```python
import unittest
from unittest.mock import patch, MagicMock
import asyncio
from langchain_community.document_loaders.mongodb import MongodbLoader
class TestMongodbLoader(unittest.TestCase):
def setUp(self):
"""Setup the MongodbLoader test environment by mocking the motor client
and database collection interactions."""
# Mocking the AsyncIOMotorClient
self.mock_client = MagicMock()
self.mock_db = MagicMock()
self.mock_collection = MagicMock()
self.mock_client.get_database.return_value = self.mock_db
self.mock_db.get_collection.return_value = self.mock_collection
# Initialize the MongodbLoader with test data
self.loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="testdb",
collection_name="testcol"
)
@patch('langchain_community.document_loaders.mongodb.AsyncIOMotorClient', return_value=MagicMock())
def test_constructor(self, mock_motor_client):
"""Test if the constructor properly initializes with the correct database and collection names."""
loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="testdb",
collection_name="testcol"
)
self.assertEqual(loader.db_name, "testdb")
self.assertEqual(loader.collection_name, "testcol")
def test_aload(self):
"""Test the aload method to ensure it correctly queries and processes documents."""
# Setup mock data and responses for the database operations
self.mock_collection.count_documents.return_value = asyncio.Future()
self.mock_collection.count_documents.return_value.set_result(1)
self.mock_collection.find.return_value = [
{"_id": "1", "content": "Test document content"}
]
# Run the aload method and check responses
loop = asyncio.get_event_loop()
results = loop.run_until_complete(self.loader.aload())
self.assertEqual(len(results), 1)
self.assertEqual(results[0].page_content, "Test document content")
def test_construct_projection(self):
"""Verify that the projection dictionary is constructed correctly based on field names."""
self.loader.field_names = ['content', 'author']
self.loader.metadata_names = ['timestamp']
expected_projection = {'content': 1, 'author': 1, 'timestamp': 1}
projection = self.loader._construct_projection()
self.assertEqual(projection, expected_projection)
if __name__ == '__main__':
unittest.main()
```
### Additional Example for Documentation
Sample Data:
```json
[
{
"_id": "1",
"title": "Artificial Intelligence in Medicine",
"content": "AI is transforming the medical industry by providing personalized medicine solutions.",
"author": {
"name": "John Doe",
"email": "john.doe@example.com"
},
"tags": ["AI", "Healthcare", "Innovation"]
},
{
"_id": "2",
"title": "Data Science in Sports",
"content": "Data science provides insights into player performance and strategic planning in sports.",
"author": {
"name": "Jane Smith",
"email": "jane.smith@example.com"
},
"tags": ["Data Science", "Sports", "Analytics"]
}
]
```
Example Code:
```python
loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="example_db",
collection_name="articles",
filter_criteria={"tags": "AI"},
field_names=["title", "content"],
metadata_names=["author.name", "author.email"],
include_db_collection_in_metadata=True
)
documents = loader.load()
for doc in documents:
print("Page Content:", doc.page_content)
print("Metadata:", doc.metadata)
```
Expected Output:
```
Page Content: Artificial Intelligence in Medicine AI is transforming the medical industry by providing personalized medicine solutions.
Metadata: {'author_name': 'John Doe', 'author_email': 'john.doe@example.com', 'database': 'example_db', 'collection': 'articles'}
```
Thank you.
---
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
docs:integrations:vectorstores:chroma:fix_typo
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** fix_typo in docs:integrations:vectorstores:chroma
https://python.langchain.com/docs/integrations/vectorstores/chroma/
- **Issue:** https://github.com/langchain-ai/langchain/issues/26561
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
…omSelfQueryRetriever`
This commit corrects an issue in the `_get_docs_with_query` method of
the `CustomSelfQueryRetriever` class. The method was incorrectly using
`self.vectorstore.similarity_search_with_score(query, **search_kwargs)`
without including the `self` argument, which is required for proper
method invocation.
The `self` argument is necessary for calling instance methods and
accessing instance attributes. By including `self` in the method call,
we ensure that the method is correctly executed in the context of the
current instance, allowing it to function as intended.
No other changes were made to the method's logic or functionality.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
Firecrawl integration is currently on v0 - which is supported until
version 0.0.20.
@rafaelsideguide is working on a pr for v1 but meanwhile we should fix
the docs.
Support using additional import mapping. This allows users to override
old mappings/add new imports to the loads function.
- [x ] **Add tests and docs**: If you're adding a new integration,
please include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Jess Ou <jessou@jesss-mbp.local.meter>
Co-authored-by: Erick Friis <erick@langchain.dev>
While going through the chatbot tutorial, I noticed a couple of typos
and grammatical issues. Also, the pip install command for
langchain_community was commented out, but the document mentions
installing it.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
docs:tutorials:llm_chain:fix typo
- [ ] **PR message**:
fix typo in llm chain tutorial
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>
Hello,
fix: https://github.com/langchain-ai/langchain/issues/26183
Adding documentation regarding SQL like filter for Google BigQuery
Vector Search coming in next langchain-google-community 1.0.9 release.
Note: langchain-google-community==1.0.9 is not yet released
Question: There is no way to warn the user int the doc about the
availability of a feature after a specific package version ?
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Fix docstring for two functions that look like have
docstrings carried over from other functions.
- **Issue:** Not found issue reporting the miss-leading docstrings.
- **Dependencies:** None
- **Twitter handle:**
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Changed
> "At a high-level, the steps of constructing a knowledge are from text
are:"
to
> "At a high-level, the steps of constructing a knowledge graph from
text are:"
Co-authored-by: Erick Friis <erick@langchain.dev>
The object extends from
langchain_community.chat_models.openai.ChatOpenAI which doesn't have
`bind_tools` defined. I tried extending from
`langchain_openai.ChatOpenAI` in
https://github.com/langchain-ai/langchain/pull/25975 but that PR got
closed because this is not correct.
So adding our own `bind_tools` (which for now copying from ChatOpenAI is
good enough) will solve the tool calling issue we are having now.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR fixes a minor typo in the ScrapflyLoader documentation. The word
"passigng" was changed to "passing."
Before: passigng
After: passing
This change improves the clarity and professionalism of the
documentation.
Co-authored-by: Ashar <asharmalik.ds193@gmail.com>
- **Description:** This is a **one line change**. the
`self.async_client.with_raw_response.create(**payload)` call is not
properly awaited within the `_astream` method. In `_agenerate` this is
done already, but likely forgotten in the other method.
- **Issue:** Not applicable
- **Dependencies:** No dependencies required.
(If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Open source models like Llama3.1 have function calling, but it's not
that great. Therefore, we introduce the option to ignore model's
function calling and just use the prompt-based approach
Thank you for contributing to LangChain!
- [X ] **PR title**
- [X ] **PR message**:
**Description:** adds a handler for when delta choice is None
**Issue:** Fixes#25951
**Dependencies:** Not applicable
- [ X] **Add tests and docs**: Not applicable
- [X ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
Change the default Neo4j username/password (when not supplied as
environment variable or in code) from `None` to `""`.
Neo4j has an option to [disable
auth](https://neo4j.com/docs/operations-manual/current/configuration/configuration-settings/#config_dbms.security.auth_enabled)
which is helpful when developing. When auth is disabled, the username /
password through the `neo4j` module should be `""` (ie an empty string).
Empty strings get marked as false in
`langchain_core.utils.env.get_from_dict_or_env` -- changing this code /
behaviour would have a wide impact and is undesirable.
In order to both _allow_ access to Neo4j with auth disabled and _not_
impact `langchain_core` this patch is presented. The downside would be
that if a user forgets to set NEO4J_USERNAME or NEO4J_PASSWORD they
would see an invalid credentials error rather than missing credentials
error. This could be mitigated but would result in a less elegant patch!
**Issue:**
Fix issue where langchain cannot communicate with Neo4j if Neo4j auth is
disabled.
Thank you for contributing to LangChain!
**Description:**
Similar to other packages (`langchain_openai`, `langchain_anthropic`) it
would be beneficial if that `ChatMistralAI` model could fetch the API
base URL from the environment.
This PR allows this via the following order:
- provided value
- then whatever `MISTRAL_API_URL` is set to
- then whatever `MISTRAL_BASE_URL` is set to
- if `None`, then default is ` "https://api.mistral.com/v1"`
- [x] **Add tests and docs**:
Added unit tests, docs I feel are unnecessary, as this is just aligning
with other packages that do the same?
- [x] **Lint and test**:
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Updating the gateway pages in the documentation to name the
`langchain-databricks` integration.
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **PR title**: "community: add Jina Search tool"
- **Description:** Added the Jina Search tool for querying the Jina
search API. This includes the implementation of the JinaSearchAPIWrapper
and the JinaSearch tool, along with a Jupyter notebook example
demonstrating its usage.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** [Twitter
handle](https://x.com/yashp3020?t=7wM0gQ7XjGciFoh9xaBtqA&s=09)
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Previously, regardless of whether or not strip_whitespace was set to
true or false, the strip text method in the SpacyTextSplitter class used
`sent.text` to get the sentence. I modified this to include a ternary
such that if strip_whitespace is false, it uses `sent.text_with_ws`
I also modified the project.toml to include the spacy pipeline package
and to lock the numpy version, as higher versions break spacy.
- **Issue:** N/a
- **Dependencies:** None
Returns an array of results which is more specific and easier for later
use.
Tested locally:
```
resp = tool.invoke("what's the weather like in Shanghai?")
for item in resp:
print(item)
```
returns
```
{'snippet': '<b>Shanghai</b>, <b>Shanghai</b>, China <b>Weather</b> Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days.', 'title': 'Shanghai, Shanghai, China Weather Forecast | AccuWeather', 'link': 'https://www.accuweather.com/en/cn/shanghai/106577/weather-forecast/106577'}
{'snippet': '5. 99 / 87 °F. 6. 99 / 86 °F. 7. Detailed forecast for 14 days. Need some help? Current <b>weather</b> <b>in Shanghai</b> and forecast for today, tomorrow, and next 14 days.', 'title': 'Weather for Shanghai, Shanghai Municipality, China - timeanddate.com', 'link': 'https://www.timeanddate.com/weather/china/shanghai'}
{'snippet': '<b>Shanghai</b> - <b>Weather</b> warnings issued 14-day forecast. <b>Weather</b> warnings issued. Forecast - <b>Shanghai</b>. Day by day forecast. Last updated Friday at 01:05. Tonight, ... Temperature feels <b>like</b> 34 ...', 'title': 'Shanghai - BBC Weather', 'link': 'https://www.bbc.com/weather/1796236'}
{'snippet': 'Current <b>weather</b> <b>in Shanghai</b>, <b>Shanghai</b>, China. Check current conditions <b>in Shanghai</b>, <b>Shanghai</b>, China with radar, hourly, and more.', 'title': 'Shanghai, Shanghai, China Current Weather | AccuWeather', 'link': 'https://www.accuweather.com/en/cn/shanghai/106577/current-weather/106577'}
13-Day Beijing, Xi'an, Chengdu, <b>Shanghai</b> Chinese Language and Culture Immersion Tour. <b>Shanghai</b> in September. Average daily temperature range: 23–29°C (73–84°F) Average rainy days: 10. Average sunny days: 20. September ushers in pleasant autumn <b>weather</b>, making it one of the best months to visit <b>Shanghai</b>. <b>Weather</b> in <b>Shanghai</b>: Climate, Seasons, and Average Monthly Temperature. <b>Shanghai</b> has a subtropical maritime monsoon climate, meaning high humidity and lots of rain. Hot muggy summers, cool falls, cold winters with little snow, and warm springs are the norm. Midsummer through early fall is the best time to visit <b>Shanghai</b>. <b>Shanghai</b>, <b>Shanghai</b>, China <b>Weather</b> Forecast, with current conditions, wind, air quality, and what to expect for the next 3 days. 1165. 45.9. 121. Winter, from December to February, is quite cold: the average January temperature is 5 °C (41 °F). There may be cold periods, with highs around 5 °C (41 °F) or below, and occasionally, even snow can fall. The temperature dropped to -10 °C (14 °F) in January 1977 and to -7 °C (19.5 °F) in January 2016. 5. 99 / 87 °F. 6. 99 / 86 °F. 7. Detailed forecast for 14 days. Need some help? Current <b>weather</b> in <b>Shanghai</b> and forecast for today, tomorrow, and next 14 days. Everything you need to know about today's <b>weather</b> in <b>Shanghai</b>, <b>Shanghai</b>, China. High/Low, Precipitation Chances, Sunrise/Sunset, and today's Temperature History. <b>Shanghai</b> - <b>Weather</b> warnings issued 14-day forecast. <b>Weather</b> warnings issued. Forecast - <b>Shanghai</b>. Day by day forecast. Last updated Friday at 01:05. Tonight, ... Temperature feels <b>like</b> 34 ... <b>Shanghai</b> 14 Day Extended Forecast. <b>Weather</b> Today <b>Weather</b> Hourly 14 Day Forecast Yesterday/Past <b>Weather</b> Climate (Averages) Currently: 84 °F. Passing clouds. (<b>Weather</b> station: <b>Shanghai</b> Hongqiao Airport, China). See more current <b>weather</b>. Current <b>weather</b> in <b>Shanghai</b>, <b>Shanghai</b>, China. Check current conditions in <b>Shanghai</b>, <b>Shanghai</b>, China with radar, hourly, and more. <b>Shanghai</b> <b>Weather</b> Forecasts. <b>Weather Underground</b> provides local & long-range <b>weather</b> forecasts, weatherreports, maps & tropical <b>weather</b> conditions for the <b>Shanghai</b> area.
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**:
docs: fix typo in summarization_tutorial
- [ ] **PR message**:
docs: fix couple of typos in summarization_tutorial
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
Adding a new option to the CSVLoader that allows us to implicitly
specify the columns that are used for generating the Document content.
Currently these are implicitly set as "all fields not part of the
metadata_columns".
In some cases however it is useful to have a field both as a metadata
and as part of the document content.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** The function `_is_assistants_builtin_tool` didn't had
support for `file_search` from OpenAI. This was creating conflict and
blocking the usage of such. OpenAI Assistant changed from`retrieval` to
`file_search`.
The following code
```
agent = OpenAIAssistantV2Runnable.create_assistant(
name="Data Analysis Assistant",
instructions=prompt[0].content,
tools={'type': 'file_search'},
model=self.chat_config.connection.deployment_name,
client=llm,
as_agent=True,
tool_resources={
"file_search": {
"vector_store_ids": vector_store_id
}
}
)
```
Was throwing the following error
```
Traceback (most recent call last):
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 500, in get_response
return await super().get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
response = await self.inner_chat.get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
response = await self.inner_chat.get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/chat_decorators.py",
line 96, in get_response
response = await self.inner_chat.get_response(post, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Previous line repeated 4 more times]
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/chat/azure_open_ai_chat.py",
line 147, in get_response
chain = chain_factory.get_chain(prompts, post.conversation.id,
overrides, context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/Documents/codes/shellai-nlp-backend/app/llm_connections/chains.py",
line 1324, in get_chain
agent = OpenAIAssistantV2Runnable.create_assistant(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 256, in create_assistant
tools=[_get_assistants_tool(tool) for tool in tools], # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 256, in <listcomp>
tools=[_get_assistants_tool(tool) for tool in tools], # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_community/agents/openai_assistant/base.py",
line 119, in _get_assistants_tool
return convert_to_openai_tool(tool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_core/utils/function_calling.py",
line 255, in convert_to_openai_tool
function = convert_to_openai_function(tool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/Users/l.guedesdossantos/anaconda3/envs/shell-e/lib/python3.11/site-packages/langchain_core/utils/function_calling.py",
line 230, in convert_to_openai_function
raise ValueError(
ValueError: Unsupported function
{'type': 'file_search'}
Functions must be passed in as Dict, pydantic.BaseModel, or Callable. If
they're a dict they must either be in OpenAI function format or valid
JSON schema with top-level 'title' and 'description' keys.
```
With the proposed changes, this is fixed and the function will have support for `file_search`.
This was the only place missing the support for `file_search`.
Reference doc
https://platform.openai.com/docs/assistants/tools/file-search
- **Twitter handle:** luizf0992
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description:
- Add system templates and user templates in integration testing
- initialize the response id field value to request_id
- Adjust the default model to hunyuan-pro
- Remove the default values of Temperature and TopP
- Add SystemMessage
all the integration tests have passed.
1、Execute integration tests for the first time
<img width="1359" alt="71ca77a2-e9be-4af6-acdc-4d665002bd9b"
src="https://github.com/user-attachments/assets/9298dc3a-aa26-4bfa-968b-c011a4e699c9">
2、Run the integration test a second time
<img width="1501" alt="image"
src="https://github.com/user-attachments/assets/61335416-4a67-4840-bb89-090ba668e237">
Issue: None
Dependencies: None
Twitter handle: None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local
PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low
latency. This PR adds Intel GPU support to `ipex-llm` llm integration.
**Dependencies:** `ipex-llm`
**Contribution maintainer**: @ivy-lv11 @Oscilloscope98
**tests and docs**:
- Add: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/docs/docs/integrations/llms/ipex_llm_gpu.ipynb
- Update: langchain/libs/community/tests/llms/test_ipex_llm.py
---------
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
Thank you for contributing to LangChain!
**Description:**
The current documentation of using the Huggingface with Langchain needs
to set return_full_text as False otherwise pipeline by default returns
both the prompt and response as output.
Code to reproduce:
```python
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
from langchain_core.messages import (
HumanMessage,
SystemMessage,
)
llm = HuggingFacePipeline.from_model_id(
model_id="microsoft/Phi-3.5-mini-instruct",
task="text-generation",
pipeline_kwargs=dict(
max_new_tokens=512,
do_sample=False,
repetition_penalty=1.03,
# return_full_text=False
),
device=0
)
chat_model = ChatHuggingFace(llm=llm)
messages = [
SystemMessage(content="You're a helpful assistant"),
HumanMessage(
content="What happens when an unstoppable force meets an immovable object?"
),
]
ai_msg = chat_model.invoke(messages)
print(ai_msg.content)
```
Output:
```
<|system|>
You're a helpful assistant<|end|>
<|user|>
What happens when an unstoppable force meets an immovable object?<|end|>
<|assistant|>
The scenario of an "unstoppable force" meeting an "immovable object" is a classic paradox that has puzzled philosophers, scientists, and thinkers for centuries. In physics, however, there are no such things as truly unstoppable forces or immovable objects because all physical entities have mass and interact with other masses through fundamental forces (like gravity).
When we consider the laws of motion, particularly Newton's third law which states that for every action, there is an equal and opposite reaction, it becomes clear that if one were to exist, the other would necessarily be negated by the interaction. For example, if you push against a solid wall with great force, the wall exerts an equal and opposite force back on you, preventing your movement.
In theoretical discussions, this paradox often serves as a thought experiment to explore concepts like determinism versus free will, the limits of physical laws, and the nature of reality itself. However, in practical terms, any force applied to an object will result in some form of deformation, transfer of energy, or movement, depending on the properties of both the force and the object.
So while the idea of an unstoppable force and an immovable object remains a fascinating philosophical conundrum, it does not hold up under the scrutiny of physical laws as we understand them.
```
---------
Co-authored-by: Kirushikesh D B kirushi@ibm.com <kirushi@cccxl012.pok.ibm.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: “syd” <“zheng.yuxi@outlook.com>
- **Description:**
Improve llamacpp embedding class by adding the `device` parameter so it
can be passed to the model and used with `gpu`, `cpu` or Apple metal
(`mps`).
Improve performance by making use of the bulk client api to compute
embeddings in batches.
- **Dependencies:** none
- **Tag maintainer:**
@hwchase17
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
Starting from Neo4j 5.23 (22 August 2024), with vector-2.0 indexes,
`vector.dimensions` is not required to be set, which will cause it the
key not exist error in index config if it's not set.
Since the existence of vector.dimensions will only ensure additional
checks, this commit turns embedding dimension check optional, and only
do checks when it exists (not None).
https://neo4j.com/release-notes/database/neo4j-5/
**Twitter handle:** @HollowM186
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- **AI Agent Built With LangChain and FireWorksAI**: "community
notebook"
- **Description:** Added a new AI agent in the cookbook folder that
integrates prompt compression using LLMLingua and arXiv retrieval tools.
The agent is designed to optimize the efficiency and performance of
research tasks by compressing lengthy prompts and retrieving relevant
academic papers. The agent also makes uses of MongoDB to store
conversational history and as it's knowledge base using MongoDB vector
store
- **Twitter handle:** https://x.com/richmondalake
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
## Description
- Updates the self-query retriever factory to check for the new Qdrant
vector store class. i.e. `langchain_qdrant.QdrantVectorstore`.
- Deprecates `QdrantSparseVectorRetriever`, since the vector store
implementation natively supports it now.
Resolves#25798
- **Description:** When useing LLM integration moonshot,it's occurring
error "'Moonshot' object has no attribute '_client'",it's because of the
"_client" that is private in pydantic v1.0 so that we can't use it.I
turn "_client" into "client" , the error to be resolved!
- **Issue:** the issue #24390
- **Dependencies:** none
- **Twitter handle:** @Rainsubtime
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Cyue <Cyue_work2001@163.com>
- **Description:** if you use callback handlers when using tool,
run_manager will be added to input, so you need to explicitly specify
args_schema, but i was confused because it was not listed, so i added
it. Also, it seems that the type does not work with pydantic.BaseModel.
- **Issue:** None
- **Dependencies:** None
- [x] **PR title - community: add neo4j query constructor for self
query**
- [x] **PR message**
- **Description:** adding a Neo4jTranslator so that the Neo4j vector
database can use SelfQueryRetriever
- **Issue:** this issue had been raised before in #19748
- **Dependencies:** none.
- **Twitter handle:** @moyi_dang
- p.s. I have not added the query constructor in BUILTIN_TRANSLATORS in
this PR, I want to make changes to only one package at a time.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Added: arxiv references to the concepts page.
Regenerated: arxiv references page.
Improved: formatting of the concepts page (moved the Partner packages
section after langchain_community)
- **Description:** OpenAI recently introduced a "strict" parameter for
[structured outputs in their
API](https://openai.com/index/introducing-structured-outputs-in-the-api/).
An optional `strict` parameter has been added to
`create_openai_functions_agent()` and `create_openai_tools_agent()` so
developers can use this feature in those agents.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [ ] **PR title**: community: add tests for ChatOctoAI
- [ ] **PR message**:
Description: Added unit tests for the ChatOctoAI class in the community
package to ensure proper validation and default values. These tests
verify the correct initialization of fields, the handling of missing
required parameters, and the proper setting of aliases.
Issue: N/A
Dependencies: None
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Thank you for contributing to LangChain!
community:premai[patch]: standardize init args
- updated `temperature` with Pydantic Field, updated the unit test.
- updated `max_tokens` with Pydantic Field, updated the unit test.
- updated `max_retries` with Pydantic Field, updated the unit test.
Related to #20085
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Description: Moves yield to after callback for _astream for gigachat in
the community package
Issue: #16913
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: "community: Patch enable to use Amazon OpenSearch
Serverless for Semantic Cache store"
- [x] **PR message**:
- **Description:** OpenSearchSemanticCache class support Amazon
OpenSearch Serverless for Semantic Cache store, it's only required to
pass auth(http_auth) parameter to initializer
- **Dependencies:** none
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Jinoos Lee <jinoos@amazon.com>
it fixes two issues:
### YGPTs are broken #25575
```
File ....conda/lib/python3.11/site-packages/langchain_community/embeddings/yandex.py:211, in _make_request(self, texts, **kwargs)
..
--> 211 res = stub.TextEmbedding(request, metadata=self._grpc_metadata) # type: ignore[attr-defined]
AttributeError: 'YandexGPTEmbeddings' object has no attribute '_grpc_metadata'
```
My gut feeling that #23841 is the cause.
I have to drop leading underscore from `_grpc_metadata` for quickfix,
but I just don't know how to do it _pydantic_ enough.
### minor issue:
if we use `api_key`, which is not the best practice the code fails with
```
File ~/git/...../python3.11/site-packages/langchain_community/embeddings/yandex.py:119, in YandexGPTEmbeddings.validate_environment(cls, values)
...
AttributeError: 'tuple' object has no attribute 'append'
```
- Added new integration test. But it requires YGPT env available and
active account. I don't know how int tests dis\enabled in CI.
- added small unit tests with mocks. Should be fine.
---------
Co-authored-by: mikhail-khludnev <mikhail_khludnev@rntgroup.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
Support passing extra params when executing UC functions:
The params should be a dictionary with key EXECUTE_FUNCTION_ARG_NAME,
the assumption is that the function itself doesn't use such variable
name (starting and ending with double underscores), and if it does we
raise Exception.
If invalid params passing to the execute_statement, we raise Exception
as well.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: optimize xinference llm import"
- [ ] **PR message**:
- **Description:** from xinferece_client import RESTfulClient when there
is no importing xinference.
- **Dependencies:** xinferece_client
- **Why do so:** the total xinference(pip install xinference[all]) is
too heavy for installing, let alone it is useless for langchain user
except RESTfulClient. The modification has maintained consistency with
the xinference embeddings
[embeddings/xinference](../blob/master/libs/community/langchain_community/embeddings/xinference.py#L89).
[This
commit](d3ca2cc8c3)
has broken the moderation chain so we've faced a crash when migrating
the LangChain from v0.1 to v0.2.
The issue appears that the class attribute the code refers to doesn't
hold the value processed in the `validate_environment` method. We had
`extras={}` in this attribute, and it was casted to `True` when it
should've been `False`. Adding a simple assignment seems to resolve the
issue, though I'm not sure it's the right way.
---
---------
Co-authored-by: Michael Rubél <mrubel@oroinc.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: docs: fixed syntax error in ChatAnthropic Example -
rag app tutorial notebook - generation
- [ ] **PR message**:
- **Description:** Fixed a syntax error in the ChatAnthropic
initialization example in the RAG tutorial notebook. The original code
had an extra set of quotation marks around the model parameter, which
would cause a Python syntax error. The corrected version removes these
unnecessary quotes.
- **Dependencies:** No new dependencies required for this documentation
fix.
I've verified that the corrected code is syntactically valid and matches
the expected format for initializing a ChatAnthropic instance in
LangChain.
- **Twitter handle:** madhu_shantan
- [ ] **Add tests and docs**: the error in Jupyter notebook:
<img width="1189" alt="Screenshot 2024-08-29 at 12 43 47 AM"
src="https://github.com/user-attachments/assets/07148a93-300f-40e2-ad4a-ac219cbb56a4">
the corrected cell:
<img width="983" alt="Screenshot 2024-08-29 at 12 44 18 AM"
src="https://github.com/user-attachments/assets/75b1455a-3671-454e-ac16-8ca77c049dbd">
- [ ] **Lint and test**: As this is a documentation-only change, I have
not run the full test suite. However, I have verified that the corrected
code example is syntactically valid and matches the expected usage of
the ChatAnthropic class.
the error in the docs is here -
<img width="1020" alt="Screenshot 2024-08-29 at 12 48 36 AM"
src="https://github.com/user-attachments/assets/812ccb20-b411-4a5b-afc1-41742efb32a7">
## Description
In `langchain_prompty`, messages are templated by Prompty. However, a
call to `ChatPromptTemplate` was initiating a second templating. We now
convert parsed messages to `Message` objects before calling
`ChatPromptTemplate`, signifying clearly that they are already
templated.
We also revert #25739 , which applied to this second templating, which
we now avoid, and did not fix the original issue.
## Issue
Closes#25703
I have validated langchain interface with tei/tgi works as expected when
TEI and TGI running on Intel Gaudi2. Adding some references to notebooks
to help users find relevant info.
---------
Co-authored-by: Rita Brugarolas <rbrugaro@idc708053.jf.intel.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Add array data type for milvus vector store collection create
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
Adds the 'score' returned by Pinecone to the
`PineconeHybridSearchRetriever` list of returned Documents.
There is currently no way to return the score when using Pinecone hybrid
search, so in this PR I include it by default.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
### Description
adds an init method to ChatDeepInfra to set the model_name attribute
accordings to the argument
### Issue
currently, the model_name specified by the user during initialization of
the ChatDeepInfra class is never set. Therefore, it always chooses the
default model (meta-llama/Llama-2-70b-chat-hf, however probably since
this is deprecated it always uses meta-llama/Llama-3-70b-Instruct). We
stumbled across this issue and fixed it as proposed in this pull
request. Feel free to change the fix according to your coding guidelines
and style, this is just a proposal and we want to draw attention to this
problem.
### Dependencies
no additional dependencies required
Feel free to contact me or @timo282 and @finitearth if you have any
questions.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Make the hyperlink only appear once in the
extract_hyperlinks tool output. (for some websites output contains
meaningless '#' hyperlinks multiple times which will extend the tokens
of context window without any advantage)
**Issue:** None
**Dependencies:** None
Thank you for contributing to LangChain!
- [x] **PR title**: "langchain: Chains: query_constructor: add date time
parser"
- [x] **PR message**:
- **Description:** add date time parser to langchain Chains
query_constructor
- **Issue: https://github.com/langchain-ai/langchain/issues/25526
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Added Azure Search Access Token Authentication instead of API KEY auth.
Fixes Issue: https://github.com/langchain-ai/langchain/issues/24263
Dependencies: None
Twitter: @levalencia
@baskaryan
Could you please review? First time creating a PR that fixes some code.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This pull request introduces support for the AI21 tools calling feature,
available by the Jamba-1.5 models. When Jamba-1.5 detects the necessity
to invoke a provided tool, as indicated by the 'tools' parameter passed
to the model:
```
class ToolDefinition(TypedDict, total=False):
type: Required[Literal["function"]]
function: Required[FunctionToolDefinition]
class FunctionToolDefinition(TypedDict, total=False):
name: Required[str]
description: str
parameters: ToolParameters
class ToolParameters(TypedDict, total=False):
type: Literal["object"]
properties: Required[Dict[str, Any]]
required: List[str]
```
It will respond with a list of tool calls structured as follows:
```
class ToolCall(AI21BaseModel):
id: str
function: ToolFunction
type: Literal["function"] = "function"
class ToolFunction(AI21BaseModel):
name: str
arguments: str
```
This pull request incorporates the necessary modifications to integrate
this functionality into the ai21-langchain library.
---------
Co-authored-by: asafg <asafg@ai21.com>
Co-authored-by: pazshalev <111360591+pazshalev@users.noreply.github.com>
Co-authored-by: Paz Shalev <pazs@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- "libs: langchain_milvus: add db name to milvus connection check"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** add db name to milvus connection check
- **Issue:** https://github.com/langchain-ai/langchain/issues/25277
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This addresses the issue mentioned in #25702
I have updated the endpoint used in validating the endpoint API type in
the AzureMLBaseEndpoint class from `/v1/completions` to `/completions`
and `/v1/chat/completions` to `/chat/completions`.
Co-authored-by: = <=>
- **Description:** Added a `template_format` parameter to
`create_chat_prompt` to allow `.prompty` files to handle variables in
different template formats.
- **Issue:** #25703
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Added langchain version while calling discover API
during both ingestion and retrieval
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **Description:** Updating source path and file path in Pebblo safe
loader for SharePoint apps during loading
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **PR message**: **Fix URL construction in newer Python versions**
- **Description:**
- Update the URL construction logic to use the .value attribute for
Routes enum members.
- This adjustment resolves an issue where the code worked correctly in
Python 3.9 but failed in Python 3.11.
- Clean up unused routes.
- **Issue:** NA
- **Dependencies:** NA
* Removed `ruff check --select I` as `I` is already selected and checked
in the main `ruff check` command
* Added checks for non-empty `PYTHON_FILES`
* Run `ruff check` only on `PYTHON_FILES`
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Fix the validation error for `endpoint_url` for
HuggingFaceEndpoint. I have given a descriptive detail of the isse in
the issue that I have created.
- **Issue:** #24742
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
### Summary
Add `DatabricksVectorSearch` and `DatabricksEmbeddings` classes to the
`langchain-databricks` partner packages. Core functionality is
unchanged, but the vector search class is largely refactored for
readability and maintainability.
This PR does not add integration tests yet. This will be added once the
Databricks test workspace is ready.
Tagging @efriis as POC
### Tracker
[✅] Create a package and imgrate ChatDatabricks
[✍️] Migrate DatabricksVectorSearch, DatabricksEmbeddings, and their
docs
~[ ] Migrate UCFunctionToolkit and its doc~
[ ] Add provider document and update README.md
[ ] Add integration tests and set up secrets (after moved to an external
package)
[ ] Add deprecation note to the community implementations.
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- [ ] **PR message**:
- **Description:** Compatible with other llm (eg: deepseek-chat, glm-4)
usage meta data
- **Issue:** N/A
- **Dependencies:** no new dependencies added
- [ ] **Add tests and docs**:
libs/partners/openai/tests/unit_tests/chat_models/test_base.py
```shell
cd libs/partners/openai
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_openai_astream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_openai_stream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_deepseek_astream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_deepseek_stream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_glm4_astream
poetry run pytest tests/unit_tests/chat_models/test_base.py::test_glm4_stream
```
---------
Co-authored-by: hyman <hyman@xiaozancloud.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "langchain-core: Fix type"
- The file to modify is located in
/libs/core/langchain_core/prompts/base.py
- [ ] **PR message**:
- **Description:** The change is a type for the inner input variable,
the type go from dict to Any. This change is required since the method
_validate input expects a type that is not only a dictionary.
- **Dependencies:** There are no dependencies for this change
- [ ] **Add tests and docs**:
1. A test is not needed. This error occurs because I overrode a portion
of the _validate_input method, which is causing a 'beartype' to raise an
error.
This PR introduces adjustments to ensure compatibility with the recently
released preview version of [TiDB Serverless Vector
Search](https://tidb.cloud/ai), aiming to prevent user confusion.
- TiDB Vector now supports vector indexing with cosine and l2 distance
strategies, although inner_product remains unsupported.
- Changing the distance strategy is currently not supported, so the test
cased should be adjusted.
Issue: the `service` optional parameter was mentioned but not used.
Fix: added this parameter.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
There is a bug in the concatenation of embeddings obtained from MLflow
that does not conform to the type hint requested by the function.
``` python
def _query(self, texts: List[str]) -> List[List[float]]:
```
It is logical to expect a **List[List[float]]** for a **List[str]**.
However, the append method encapsulates the response in a global List.
To avoid this, the extend method should be used, which will add the
embeddings of all strings at the same list level.
## Testing
I have tried using OpenAI-ADA to obtain the embeddings, and the result
of executing this snippet is as follows:
``` python
embeds = await MlflowAIGatewayEmbeddings().aembed_documents(texts=["hi", "how are you?"])
print(embeds)
```
``` python
[[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]]
```
When in reality, the expected result should be:
``` python
[[-0.03512698, -0.020624293, -0.015343423, ...], [-0.021260535, -0.011461929, -0.00033121882, ...]]
```
The above result complies with the expected type hint:
**List[List[float]]** . As I mentioned, we can achieve that by using the
extend method instead of the append method.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
Description: Simply pass kwargs to allow arguments like "where" to be
propagated
Issue: Previously, db.delete(where={}) wouldn't work for chroma
vectorstores
Dependencies: N/A
Twitter handle: N/A
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Description: Send both the query and query_embedding to the Databricks
index for hybrid search.
Issue: When using hybrid search with non-Databricks managed embedding we
currently don't pass both the embedding and query_text to the index.
Hybrid search requires both of these. This change fixes this issue for
both `similarity_search` and `similarity_search_by_vector`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
# Issue
As of late July, Perplexity [no longer supports Llama 3
models](https://docs.perplexity.ai/changelog/introducing-new-and-improved-sonar-models).
# Description
This PR updates the default model and doc examples to reflect their
latest supported model. (Mostly updating the same places changed by
#23723.)
# Twitter handle
`@acompa_` on behalf of the team at Not Diamond. Check us out
[here](https://notdiamond.ai).
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Updated titles into a consistent format.
Fixed links to the diagrams.
Fixed typos.
Note: The Templates menu in the navbar is now sorted by the file names.
I'll try sorting the navbar menus by the page titles, not the page file
names.
- Output of the cells was not included in the documentation. I have
added them.
- There is another parameter in the `WikipediaLoader` class called
`doc_content_chars_max` (Based on
[this](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.wikipedia.WikipediaLoader.html)).
I have included this in the list of parameters.
- I put the list of parameters under a new section called "Parameters"
in the documentation.
- I also included the `langchain_community` package in the installation
command.
- Some minor formatting/spelling issues were fixed.
Hello.
First of all, thank you for maintaining such a great project.
## Description
In https://github.com/langchain-ai/langchain/pull/25123, support for
structured_output is added. However, `"additionalProperties": false`
needs to be included at all levels when a nested object is generated.
error from current code:
https://gist.github.com/fufufukakaka/e9b475300e6934853d119428e390f204
```
BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'JokeWithEvaluation': In context=('properties', 'self_evaluation'), 'additionalProperties' is required to be supplied and to be false", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}
```
Reference: [Introducing Structured Outputs in the
API](https://openai.com/index/introducing-structured-outputs-in-the-api/)
```json
{
"model": "gpt-4o-2024-08-06",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor."
},
{
"role": "user",
"content": "solve 8x + 31 = 2"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "math_response",
"strict": true,
"schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"explanation": {
"type": "string"
},
"output": {
"type": "string"
}
},
"required": ["explanation", "output"],
"additionalProperties": false
}
},
"final_answer": {
"type": "string"
}
},
"required": ["steps", "final_answer"],
"additionalProperties": false
}
}
}
}
```
In the current code, `"additionalProperties": false` is only added at
the last level.
This PR introduces the `_add_additional_properties_key` function, which
recursively adds `"additionalProperties": false` to the entire JSON
schema for the request.
Twitter handle: `@fukkaa1225`
Thank you!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Previously the code was able to only handle a single level of nesting
for subgraphs in mermaid. This change adds support for arbitrary nesting
of subgraphs.
This PR adds tiny improvements to the `GithubFileLoader` document loader
and its code sample, addressing the following issues:
1. Currently, the `file_extension` argument of `GithubFileLoader` does
not change its behavior at all.
1. The `GithubFileLoader` sample code in
`docs/docs/integrations/document_loaders/github.ipynb` does not work as
it stands.
The respective solutions I propose are the following:
1. Remove `file_extension` argument from `GithubFileLoader`.
1. Specify the branch as `master` (not the default `main`) and rename
`documents` as `document`.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
When I used the Neo4JGraph enhanced_schema=True option, I ran into an
error because a prop min_size of None was compared numerically with an
int.
The fix I applied is similar to the pattern of skipping embeddings
elsewhere in the file.
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
LLM will stop generating text even in the middle of a sentence if
`finish_reason` is `length` (for OpenAI) or `stop_reason` is
`max_tokens` (for Anthropic).
To obtain longer outputs from LLM, we should call the message generation
API multiple times and merge the results into the text to circumvent the
API's output token limit.
The extra line breaks forced by the `merge_message_runs` function when
seamlessly merging messages can be annoying, so I added the option to
specify the chunk separator.
**Issue:**
No corresponding issues.
**Dependencies:**
No dependencies required.
**Twitter handle:**
@hanama_chem
https://x.com/hanama_chem
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
parsed_json is expected to be a list of dictionaries, but it seems to…
be a single dictionary instead.
This is at
libs/experimental/langchain_experimental/graph_transformers/llm.py
process process_response
Thank you for contributing to LangChain!
- [ ] **Bugfix**: "experimental: bugfix"
---------
Co-authored-by: based <basir.sedighi@nris.no>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Cause chunks are joined by space, so they can't be found in text, and
the final `start_index` is very possibility to be -1.
- The simplest way is to use the natural index of the chunk as
`start_index`.
**Description:** This part of the documentation didn't explain about the
`required` property of function calling. I added additional line as a
note.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** This change adds the ID field that's required in
Pinecone to the result documents of the similarity search method.
- **Issue:** Lack of document metadata namely the ID field
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
[langchain_core] Fix UnionType type var replacement
- Added types.UnionType to typing.Union mapping
Type replacement cause `TypeError: 'type' object is not subscriptable`
if any of union type comes as function `_py_38_safe_origin` return
`types.UnionType` instead of `typing.Union`
```python
>>> from types import UnionType
>>> from typing import Union, get_origin
>>> type_ = get_origin(str | None)
>>> type_
<class 'types.UnionType'>
>>> UnionType[(str, None)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'type' object is not subscriptable
>>> Union[(str, None)]
typing.Optional[str]
```
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
I was trying to add this package using langchain-cli: `langchain app add
openai-functions-agent-gmail`, but when then try to build the whole
project using poetry or pip, it fails with the following
error:`poetry.core.masonry.utils.module.ModuleOrPackageNotFound: No
file/folder found for package openai-functions-agent-gmail`
This was fixed by modifying the pyproject.toml as in this commit
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** In GitLab we call these "merge requests" rather than
"pull requests" so I thought I'd go ahead and update the notebook.
- **Issue:** N/A
- **Dependencies:** none
- **Twitter handle:** N/A
Thanks for creating the tools and notebook to help people work with
GitLab. I thought I'd contribute some minor docs updates here.
Description: DeepInfra 500 errors have useful information in the text
field that isn't being exposed to the user. I updated the error message
to fix this.
As an example, this code
```
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
model = "meta-llama/Meta-Llama-3-70B-Instruct"
deepinfra_api_token = "..."
model = ChatDeepInfra(model=model, deepinfra_api_token=deepinfra_api_token)
messages = [HumanMessage("All work and no play makes Jack a dull boy\n" * 9000)]
response = model.invoke(messages)
```
Currently gives this error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server: Error 500
```
This change would give the following error:
```
langchain_community.chat_models.deepinfra.ChatDeepInfraException: DeepInfra Server error status 500: {"error":{"message":"Requested input length 99009 exceeds maximum input length 8192"}}
```
**Refactor PebbloRetrievalQA**
- Created `APIWrapper` and moved API logic into it.
- Created smaller functions/methods for better readability.
- Properly read environment variables.
- Removed unused code.
- Updated models
**Issue:** NA
**Dependencies:** NA
**tests**: NA
**Refactor PebbloSafeLoader**
- Created `APIWrapper` and moved API logic into it.
- Moved helper functions to the utility file.
- Created smaller functions and methods for better readability.
- Properly read environment variables.
- Removed unused code.
**Issue:** NA
**Dependencies:** NA
**tests**: Updated
limit the most recent documents to fetch from MongoDB database.
Thank you for contributing to LangChain!
- [ ] **limit the most recent documents to fetch from MongoDB
database.**: "langchain_mongodb: limit the most recent documents to
fetch from MongoDB database."
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Added a doc_limit parameter which enables the limit
for the documents to fetch from MongoDB database
- **Issue:**
- **Dependencies:** None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: The neo4j driver can raise a SessionExpired error, which is
considered a retriable error. If a query fails with a SessionExpired
error, this change retries every query once. This change will make the
neo4j integration less flaky.
Twitter handle: noahmay_
### Summary
Create `langchain-databricks` as a new partner packages. This PR does
not migrate all existing Databricks integration, but the package will
eventually contain:
* `ChatDatabricks` (implemented in this PR)
* `DatabricksVectorSearch`
* `DatabricksEmbeddings`
* ~`UCFunctionToolkit`~ (will be done after UC SDK work which
drastically simplify implementation)
Also, this PR does not add integration tests yet. This will be added
once the Databricks test workspace is ready.
Tagging @efriis as POC
### Tracker
[✍️] Create a package and imgrate ChatDatabricks
[ ] Migrate DatabricksVectorSearch, DatabricksEmbeddings, and their docs
~[ ] Migrate UCFunctionToolkit and its doc~
[ ] Add provider document and update README.md
[ ] Add integration tests and set up secrets (after moved to an external
package)
[ ] Add deprecation note to the community implementations.
---------
Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
**Description:** Adding `BoxRetriever` for langchain_box. This retriever
handles two use cases:
* Retrieve all documents that match a full-text search
* Retrieve the answer to a Box AI prompt as a Document
**Twitter handle:** @BoxPlatform
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Updating metadata for sharepoint loader with full
path i.e., webUrl
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
Co-authored-by: dristy.cd <dristy@clouddefense.io>
Co-authored-by: ccurme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
-Description: Adding new package: `langchain-box`:
* `langchain_box.document_loaders.BoxLoader` — DocumentLoader
functionality
* `langchain_box.utilities.BoxAPIWrapper` — Box-specific code
* `langchain_box.utilities.BoxAuth` — Helper class for Box
authentication
* `langchain_box.utilities.BoxAuthType` — enum used by BoxAuth class
- Twitter handle: @boxplatform
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
also remove some unused dependencies (fastapi) and unused test/lint/dev
dependencies (community, openai, textsplitters)
chromadb 0.5.4 introduced usage of `model_fields` which is pydantic v2
specific. also released in 0.5.5
The new `langchain-ollama` package seems pretty well implemented, but I
noticed the docs were still outdated so I decided to fix em up a bit.
- Llama3.1 was release on 23rd of July;
https://ai.meta.com/blog/meta-llama-3-1/
- Ollama supports tool calling since 25th of July;
https://ollama.com/blog/tool-support
- LangChain Ollama partner package was released 1st of august;
https://pypi.org/project/langchain-ollama/
**Problem**: Docs note langchain-community instead of langchain-ollama
**Solution**: Update docs to
https://python.langchain.com/v0.2/docs/integrations/chat/ollama/
**Problem**: OllamaFunctions is deprecated, as noted on
[Integrations](https://python.langchain.com/v0.2/docs/integrations/chat/ollama_functions/):
This was an experimental wrapper that attempts to bolt-on tool calling
support to models that do not natively support it. The [primary Ollama
integration](https://python.langchain.com/v0.2/docs/integrations/chat/ollama/) now
supports tool calling, and should be used instead.
**Solution**: Delete old notebook from repo, update the existing one
with @tool decorator + pydantic examples to the notebook
**Problem**: Llama3.1 was released while llama3-groq-tool-call fine-tune
Is noted in notebooks.
**Solution**: update docs + notebooks to llama3.1 (which has improved
tool calling support)
**Problem**: Install instructions are incomplete, there is no
information to download a model and/or run the Ollama server
**Solution**: Add simple instructions to start the ollama service and
pull model (for toolcalling)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This will allow complextype metadata to be returned. the current
implementation throws error when dealing with nested metadata
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Here we allow standard tests to specify a value for `tool_choice` via a
`tool_choice_value` property, which defaults to None.
Chat models [available in
Together](https://docs.together.ai/docs/chat-models) have issues passing
standard tool calling tests:
- llama 3.1 models currently [appear to rely on user-side
parsing](https://docs.together.ai/docs/llama-3-function-calling) in
Together;
- Mixtral-8x7B and Mistral-7B (currently tested) consistently do not
call tools in some tests.
Specifying tool_choice also lets us remove an existing `xfail` and use a
smaller model in Groq tests.
- **Description:** The following
[line](fd546196ef/libs/community/langchain_community/document_loaders/parsers/audio.py (L117))
in `OpenAIWhisperParser` returns a text object for some odd reason
despite the official documentation saying it should return `Transcript`
Instance which should have the text attribute. But for the example given
in the issue and even when I tried running on my own, I was directly
getting the text. The small PR accounts for that.
- **Issue:** : #25218
I was able to replicate the error even without the GenericLoader as
shown below and the issue was with `OpenAIWhisperParser`
```python
parser = OpenAIWhisperParser(api_key="sk-fxxxxxxxxx",
response_format="srt",
temperature=0)
list(parser.lazy_parse(Blob.from_path('path_to_file.m4a')))
```
…he prompt in the create_stuff_documents_chain
Thank you for contributing to LangChain!
- [ ] **PR title**: "langchain:add document_variable_name in the
function _validate_prompt in create_stuff_documents_chain"
- [ ] **PR message**:
- **Description:** add document_variable_name in the function
_validate_prompt in create_stuff_documents_chain
- **Issue:** according to the description of
create_stuff_documents_chain function, the parameter
document_variable_name can be used to override the "context" in the
prompt, but in the function, _validate_prompt it still use DOCUMENTS_KEY
to check if it is a valid prompt, the value of DOCUMENTS_KEY is always
"context", so even through the user use document_variable_name to
override it, the code still tries to check if "context" is in the
prompt, and finally it reports error. so I use document_variable_name to
replace DOCUMENTS_KEY, the default value of document_variable_name is
"context" which is same as DOCUMENTS_KEY, but it can be override by
users.
- **Dependencies:** none
- **Twitter handle:** https://x.com/xjr199703
- [ ] **Add tests and docs**: none
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
fix: #25482
- **Description:**
Add a prompt to install beautifulsoup4 in places where `from
langchain_community.document_loaders import WebBaseLoader` is used.
- **Issue:** #25482
**Description:** This PR fixes an issue in the demo notebook of
Databricks Vector Search in "Work with Delta Sync Index" section.
**Issue:** N/A
**Dependencies:** N/A
---------
Co-authored-by: Chengzu Ou <chengzu.ou@databrick.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Check whether the API key is already in the environment
Update:
```python
import getpass
import os
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = getpass.getpass("Enter your Databricks access token: ")
```
To:
```python
import getpass
import os
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
if "DATABRICKS_TOKEN" not in os.environ:
os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
"Enter your Databricks access token: "
)
```
grit migration:
```
engine marzano(0.1)
language python
`os.environ[$Q] = getpass.getpass("$X")` as $CHECK where {
$CHECK <: ! within if_statement(),
$CHECK => `if $Q not in os.environ:\n $CHECK`
}
```
- [x] NatbotChain: move to community, deprecate langchain version.
Update to use `prompt | llm | output_parser` instead of LLMChain.
- [x] LLMMathChain: deprecate + add langgraph replacement example to API
ref
- [x] HypotheticalDocumentEmbedder (retriever): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] FlareChain: update to use `prompt | llm | output_parser` instead
of LLMChain
- [x] ConstitutionalChain: deprecate + add langgraph replacement example
to API ref
- [x] LLMChainExtractor (document compressor): update to use `prompt |
llm | output_parser` instead of LLMChain
- [x] LLMChainFilter (document compressor): update to use `prompt | llm
| output_parser` instead of LLMChain
- [x] RePhraseQueryRetriever (retriever): update to use `prompt | llm |
output_parser` instead of LLMChain
Within the semantic chunker, when calling `_threshold_from_clusters`
there is the possibility for a divide by 0 error if the
`number_of_chunks` is equal to the length of `distances`.
Fix simply implements a check if these values match to prevent the error
and enable chunking to continue.
Remove the period after the hyperlink in the docstring of
BaseChatOpenAI.with_structured_output.
I have repeatedly copied the extra period at the end of the hyperlink,
which results in a "Page not found" page when pasted into the browser.
Fix typo
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Fix typo and some `callout` tags
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Backwards compatible change that converts pydantic extras to literals
which is consistent with pydantic 2 usage.
- fireworks
- voyage ai
- mistralai
- mistral ai
- together ai
- huggigng face
- pinecone
**Description**
Fix the asyncronous methods to retrieve documents from AzureSearch
VectorStore. The previous changes from [this
commit](ffe6ca986e)
create a similar code for the syncronous methods and the asyncronous
ones but the asyncronous client return an asyncronous iterator
"AsyncSearchItemPaged" as said in the issue #24740.
To solve this issue, the syncronous iterators in asyncronous methods
where changed to asyncronous iterators.
@chrislrobert said in [this
comment](https://github.com/langchain-ai/langchain/issues/24740#issuecomment-2254168302)
that there was a still a flaw due to `with` blocks that close the client
after each call. I removed this `with` blocks in the `async_client`
following the same pattern as the sync `client`.
In order to close up the connections, a __del__ method is included to
gently close up clients once the vectorstore object is destroyed.
**Issue:** #24740 and #24064
**Dependencies:** No new dependencies for this change
**Example notebook:** I created a notebook just to test the changes work
and gives the same results as the syncronous methods for vector and
hybrid search. With these changes, the asyncronous methods in the
retriever work as well.

**Lint and test**: Passes the tests and the linter
This adds `args_schema` member to `SearxSearchResults` tool. This member
is already present in the `SearxSearchRun` tool in the same file.
I was having `TypeError: Type is not JSON serializable:
AsyncCallbackManagerForToolRun` being thrown in langserve playground
when I was using `SearxSearchResults` tool as a part of chain there.
This fixes the issue, so the error is not raised anymore.
This is a example langserve app that was giving me the error, but it
works properly after the proposed fix:
```python
#!/usr/bin/env python
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SearxSearchWrapper
from langchain_community.tools.searx_search.tool import SearxSearchResults
from langserve import add_routes
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
s = SearxSearchWrapper(searx_host="http://localhost:8080")
search = SearxSearchResults(wrapper=s)
search_chain = (
{"context": search, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
app = FastAPI()
add_routes(
app,
search_chain,
path="/chain",
)
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="localhost", port=8000)
```
- **Description:** Runhouse recently migrated from Read the Docs to a
self-hosted solution. This PR updates a broken link from the old docs to
www.run.house/docs. Also changed "The Runhouse" to "Runhouse" (it's
cleaner).
- **Issue:** None
- **Dependencies:** None
- **Description:** Standardize SparkLLM, include:
- docs, the issue #24803
- to support stream
- update api url
- model init arg names, the issue #20085
Cleaned up the "Tying it Together" section of the Conversational RAG
tutorial by removing unnecessary imports that were not used. This
reduces confusion and makes the code more concise.
Thank you for contributing to LangChain!
PR title: docs: remove unused imports in Conversational RAG tutorial
PR message:
Description: Removed unnecessary imports from the "Tying it Together"
section of the Conversational RAG tutorial. These imports were not used
in the code and created confusion. The updated code is now more concise
and easier to understand.
Issue: N/A
Dependencies: None
LinkedIn handle: [Hassan
Memon](https://www.linkedin.com/in/hassan-memon-a109b3257/)
Add tests and docs:
Hi [LangChain Team Member’s Name],
I hope you're doing well! I’m thrilled to share that I recently made my
second contribution to the LangChain project. If possible, could you
give me a shoutout on LinkedIn? It would mean a lot to me and could help
inspire others to contribute to the community as well.
Here’s my LinkedIn profile: [Hassan
Memon](https://www.linkedin.com/in/hassan-memon-a109b3257/).
Thank you so much for your support and for creating such a great
platform for learning and collaboration. I'm looking forward to
contributing more in the future!
Best regards,
Hassan Memon
fix: #25137
`SqliteSaver.from_conn_string()` has been changed to a `contextmanager`
method in `langgraph >= 0.2.0`, the original usage is no longer
applicable.
Refer to
<https://github.com/langchain-ai/langgraph/pull/1271#issue-2454736415>
modification method to replace `SqliteSaver` with `MemorySaver`.
- **Description:** This PR implements the `bind_tool` functionality for
ChatZhipuAI as requested by the user. ChatZhipuAI models support tool
calling according to the `OpenAI` tool format, as outlined in their
official documentation [here](https://open.bigmodel.cn/dev/api#glm-4).
- **Issue:** ##23868
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
…ctions to match LangGraph v2 documentation. Corrected code snippet to
prevent validation errors.
Here's how you can fill out the provided template for your pull request:
---
**Thank you for contributing to LangChain!**
- [ ] **PR title**: `docs: update checkpointer example in Conversational
RAG tutorial`
- [ ] **PR message**:
- **Description:** Updated the Conversational RAG tutorial to correct
the checkpointer example by replacing `SqliteSaver` with `MemorySaver`.
Added installation instructions for `langgraph-checkpoint-memory` to
match LangGraph v2 documentation and prevent validation errors.
- **Issue:** N/A
- **Dependencies:** `langgraph-checkpoint-memory`
- **Twitter handle:** N/A
- [ ] **Add tests and docs**:
1. No new integration tests are required.
2. Updated documentation in the Conversational RAG tutorial.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: [LangChain Contribution
Guidelines](https://python.langchain.com/docs/contributing/)
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** This PR rearranges the examples in Upstash Vector
integration documentation to describe how to use namespaces and improve
the description of metadata filtering.
Thank you for contributing to LangChain!
- **Description:** Fixing package install bug in cookbook
- **Issue:** zsh:1: no matches found: unstructured[all-docs]
- **Dependencies:** N/A
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** Fix link for API reference of Gmail Toolkit
- **Issue:** I've just found this issue while I'm reading the doc
- **Dependencies:** N/A
- **Twitter handle:** [@soichisumi](https://x.com/soichisumi)
TODO: If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- In the in ` embedding-3 ` and later models of Zhipu AI, it is
supported to specify the dimensions parameter of Embedding. Ref:
https://bigmodel.cn/dev/api#text_embedding-3 .
- Add test case for `embedding-3` model by assigning dimensions.
This PR deprecates the beta upsert APIs in vectorstore.
We'll introduce them in a V2 abstraction instead to keep the existing
vectorstore implementations lighter weight.
The main problem with the existing APIs is that it's a bit more
challenging to
implement the correct behavior w/ respect to IDs since ID can be present
in
both the function signature and as an optional attribute on the document
object.
But VectorStores that pass the standard tests should have implemented
the semantics properly!
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR gets rid `root_validators(allow_reuse=True)` logic used in
EdenAI Tool in preparation for pydantic 2 upgrade.
- add another test to secret_from_env_factory
**Description:**
The get time point method in the _consume() method of
core.rate_limiters.InMemoryRateLimiter uses time.time(), which can be
affected by system time backwards. Therefore, it is recommended to use
the monotonically increasing monotonic() to obtain the time
```python
with self._consume_lock:
now = time.time() # time.time() -> time.monotonic()
# initialize on first call to avoid a burst
if self.last is None:
self.last = now
elapsed = now - self.last # when use time.time(), elapsed may be negative when system time backwards
```
Thank you for contributing to LangChain!
- [X] **PR title**: "community: fix valueerror mentions wrong argument
missing"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [X] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** when faiss.py has a None relevance_score_fn it raises
a ValueError that says a normalize_fn_score argument is needed.
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:** This minor PR aims to add `llm_extraction` to Firecrawl
loader. This feature is supported on API and PythonSDK, but the
langchain loader omits adding this to the response.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Description: As described in the related issue: There is an error
occuring when using langchain-openai>=0.1.17 which can be attributed to
the following PR: #23691
Here, the parameter logprobs is added to requests per default.
However, AzureOpenAI takes issue with this parameter as stated here:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python-new&pivots=programming-language-chat-completions
-> "If you set any of these parameters, you get an error."
Therefore, this PR changes the default value of logprobs parameter to
None instead of False. This results in it being filtered before the
request is sent.
- Issue: #24880
- Dependencies: /
Co-authored-by: blaufink <sebastian.brueckner@outlook.de>
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.
Code mod generated using the following grit pattern:
```
engine marzano(0.1)
language python
`$X.__fields__` => `get_fields($X)` where {
add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
Migrate pydantic extra to literals
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.
This works correctly also in pydantic v1.
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel, extra="forbid"):
x: int
Foo(x=5, y=1)
```
And
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel):
x: int
class Config:
extra = "forbid"
Foo(x=5, y=1)
```
## Enum -> literal using grit pattern:
```
engine marzano(0.1)
language python
or {
`extra=Extra.allow` => `extra="allow"`,
`extra=Extra.forbid` => `extra="forbid"`,
`extra=Extra.ignore` => `extra="ignore"`
}
```
Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)
## Sort attributes in Config:
```
engine marzano(0.1)
language python
function sort($values) js {
return $values.text.split(',').sort().join("\n");
}
class_definition($name, $body) as $C where {
$name <: `Config`,
$body <: block($statements),
$values = [],
$statements <: some bubble($values) assignment() as $A where {
$values += $A
},
$body => sort($values),
}
```
Add a utility that can be used as a default factory
The goal will be to start migrating from of the pydantic models to use
`from_env` as a default factory if possible.
```python
from pydantic import Field, BaseModel
from langchain_core.utils import from_env
class Foo(BaseModel):
name: str = Field(default_factory=from_env('HELLO'))
```
Change all usages of __fields__ with get_fields adapter merged into
langchain_core.
Code mod generated using the following grit pattern:
```
engine marzano(0.1)
language python
`$X.__fields__` => `get_fields($X)` where {
add_import(source="langchain_core.utils.pydantic", name="get_fields")
}
```
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.
This works correctly also in pydantic v1.
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel, extra="forbid"):
x: int
Foo(x=5, y=1)
```
And
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel):
x: int
class Config:
extra = "forbid"
Foo(x=5, y=1)
```
## Enum -> literal using grit pattern:
```
engine marzano(0.1)
language python
or {
`extra=Extra.allow` => `extra="allow"`,
`extra=Extra.forbid` => `extra="forbid"`,
`extra=Extra.ignore` => `extra="ignore"`
}
```
Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)
## Sort attributes in Config:
```
engine marzano(0.1)
language python
function sort($values) js {
return $values.text.split(',').sort().join("\n");
}
class_definition($name, $body) as $C where {
$name <: `Config`,
$body <: block($statements),
$values = [],
$statements <: some bubble($values) assignment() as $A where {
$values += $A
},
$body => sort($values),
}
```
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.
This works correctly also in pydantic v1.
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel, extra="forbid"):
x: int
Foo(x=5, y=1)
```
And
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel):
x: int
class Config:
extra = "forbid"
Foo(x=5, y=1)
```
## Enum -> literal using grit pattern:
```
engine marzano(0.1)
language python
or {
`extra=Extra.allow` => `extra="allow"`,
`extra=Extra.forbid` => `extra="forbid"`,
`extra=Extra.ignore` => `extra="ignore"`
}
```
Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)
## Sort attributes in Config:
```
engine marzano(0.1)
language python
function sort($values) js {
return $values.text.split(',').sort().join("\n");
}
class_definition($name, $body) as $C where {
$name <: `Config`,
$body <: block($statements),
$values = [],
$statements <: some bubble($values) assignment() as $A where {
$values += $A
},
$body => sort($values),
}
```
## Description
This PR adds back snippets demonstrating sparse and hybrid retrieval in
the Qdrant notebook.
Without the snippets, it's hard to grok the usage.
For business subscription the status is STOCKSBUSINESS not OK
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
## Description
This pull-request extends the existing vector search strategies of
MongoDBAtlasVectorSearch to include Hybrid (Reciprocal Rank Fusion) and
Full-text via new Retrievers.
There is a small breaking change in the form of the `prefilter` kwarg to
search. For this, and because we have now added a great deal of
features, including programmatic Index creation/deletion since 0.1.0, we
plan to bump the version to 0.2.0.
### Checklist
* Unit tests have been extended
* formatting has been applied
* One mypy error remains which will either go away in CI or be
simplified.
---------
Signed-off-by: Casey Clements <casey.clements@mongodb.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "Documentation Update : Semantic Caching Update for
Upstash"
- Docs, llm caching integrations update
- **Description:** Upstash supports semantic caching, and we would like
to inform you about this
- **Twitter handle:** You can mention eray_eroglu_ if you want to post a
tweet about the PR
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Instantiating `GPT4AllEmbeddings` with no
`gpt4all_kwargs` argument raised a `ValidationError`. Root cause: #21238
added the capability to pass `gpt4all_kwargs` through to the `GPT4All`
instance via `Embed4All`, but broke code that did not specify a
`gpt4all_kwargs` argument.
- **Issue:** #25119
- **Dependencies:** None
- **Twitter handle:** [`@metadaddy`](https://twitter.com/metadaddy)
updated with langchain_google_community instead as the latest revision
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
This PR does an aesthetic sort of the config object attributes. This
will make it a bit easier to go back and forth between pydantic v1 and
pydantic v2 on the 0.3.x branch
Among integration packages in libs/partners, Groq is an exception in
that it errors on warnings.
Following https://github.com/langchain-ai/langchain/pull/25084, Groq
fails with
> pydantic.warnings.PydanticDeprecatedSince20: The `__fields__`
attribute is deprecated, use `model_fields` instead. Deprecated in
Pydantic V2.0 to be removed in V3.0.
Here we update the behavior to no longer fail on warning, which is
consistent with the rest of the packages in libs/partners.
**Description:**
In this PR, I am adding three stock market tools from
financialdatasets.ai (my API!):
- get balance sheets
- get cash flow statements
- get income statements
Twitter handle: [@virattt](https://twitter.com/virattt)
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Example: "community: Added bedrock 3-5 sonnet cost detials for
BedrockAnthropicTokenUsageCallbackHandler"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Naval Chand <navalchand@192.168.1.36>
- description: I remove the limitation of mandatory existence of
`QIANFAN_AK` and default model name which langchain uses cause there is
already a default model nama underlying `qianfan` SDK powering langchain
component.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- community: Allow authorization to Confluence with bearer token
- **Description:** Allow authorization to Confluence with [Personal
Access
Token](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html)
by checking for the keys `['client_id', token: ['access_token',
'token_type']]`
- **Issue:**
Currently the following error occurs when using an personal access token
for authorization.
```python
loader = ConfluenceLoader(
url=os.getenv('CONFLUENCE_URL'),
oauth2={
'token': {"access_token": os.getenv("CONFLUENCE_ACCESS_TOKEN"), "token_type": "bearer"},
'client_id': 'client_id',
},
page_ids=['12345678'],
)
```
```
ValueError: Error(s) while validating input: ["You have either omitted require keys or added extra keys to the oauth2 dictionary. key values should be `['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"]
```
With this PR the loader runs as expected.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** This includes Pydantic field metadata in
`_create_subset_model_v2` so that it gets included in the final
serialized form that get sent out.
- **Issue:** #25031
- **Dependencies:** n/a
- **Twitter handle:** @gramliu
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Fixes Neo4JVector.from_existing_graph integration with huggingface
Previously threw an error with existing databases, because
from_existing_graph query returns empty list of new nodes, which are
then passed to embedding function, and huggingface errors with empty
list.
Fixes [24401](https://github.com/langchain-ai/langchain/issues/24401)
---------
Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com>
You can use this with:
```
from langchain_experimental.graph_transformers import GlinerGraphTransformer
gliner = GlinerGraphTransformer(allowed_nodes=["Person", "Organization", "Nobel"], allowed_relationships=["EMPLOYEE", "WON"])
from langchain_core.documents import Document
text = """
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]
gliner.convert_to_graph_documents(documents)
```
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR adds a minimal document indexer abstraction.
The goal of this abstraction is to allow developers to create custom
retrievers that also have a standard indexing API and allow updating the
document content in them.
The abstraction comes with a test suite that can verify that the indexer
implements the correct semantics.
This is an iteration over a previous PRs
(https://github.com/langchain-ai/langchain/pull/24364). The main
difference is that we're sub-classing from BaseRetriever in this
iteration and as so have consolidated the sync and async interfaces.
The main problem with the current design is that runt time search
configuration has to be specified at init rather than provided at run
time.
We will likely resolve this issue in one of the two ways:
(1) Define a method (`get_retriever`) that will allow creating a
retriever at run time with a specific configuration.. If we do this, we
will likely break the subclass on BaseRetriever
(2) Generalize base retriever so it can support structured queries
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- [x] **PR title**: "docs: changed example for Exa search retriever
usage"
- [x] **PR message**:
- **Description:** Changed Exa integration doc at
`docs/docs/integrations/tools/exa_search.ipynb` to better reflect simple
Exa use case
- **Issue:** move toward more canonical use of Exa method
(`search_and_contents` rather than just `search`)
- **Dependencies:** no dependencies; docs only change
- **Twitter handle:** n/a - small change
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. - will do
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This PR fixes a bug where if `enable_dynamic_field` and
`partition_key_field` are enabled at the same time, a pymilvus error
occurs.
Milvus requires the partition key field to be a full schema defined
field, and not a dynamic one, so it will throw the error "the specified
partition key field {field} not exist" when creating the collection.
When `enabled_dynamic_field` is set to `True`, all schema field creation
based on `metadatas` is skipped. This code now checks if
`partition_key_field` is set, and creates the field.
Integration test added.
**Twitter handle:** StuartMarshUK
---------
Co-authored-by: Stuart Marsh <stuart.marsh@qumata.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** This PR makes the AthenaLoader profile_name optional
and fixes the type hint which says the type is `str` but it should be
`str` or `None` as None is handled in the loader init. This is a minor
problem but it just confused me when I was using the Athena Loader to
why we had to use a Profile, as I want that for local but not
production.
- **Issue:** #24957
- **Dependencies:** None.
Description: RetryWithErrorOutputParser.from_llm() creates a retry chain
that returns a Generation instance, when it should actually just return
a string.
This class was forgotten when fixing the issue in PR #24687
The comments inside some code blocks seems to be misplaced. The comment
lines containing explanation about `default_key` behavior when operating
with prompts are updated.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Added to `docs/how_to/tools_runtime` as a proof of concept, will apply
everywhere if we like.
A bit more compact than the default callouts, will help standardize the
layout of our pages since we frequently use these boxes.
<img width="1088" alt="Screenshot 2024-07-23 at 4 49 02 PM"
src="https://github.com/user-attachments/assets/7380801c-e092-4d31-bcd8-3652ee05f29e">
Hardens index commands with try/except for free clusters and optional
waits for syncing and tests.
[efriis](https://github.com/efriis) These are the upgrades to the search
index commands (CRUD) that I mentioned.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** The UnstructuredClient will have a breaking change in
the near future. Add a note in the docs that the examples here may not
use the latest version and users should refer to the SDK docs for the
latest info.
This PR introduces a module with some helper utilities for the pydantic
1 -> 2 migration.
They're meant to be used in the following way:
1) Use the utility code to get unit tests pass without requiring
modification to the unit tests
2) (If desired) upgrade the unit tests to match pydantic 2 output
3) (If desired) stop using the utility code
Currently, this module contains a way to map `schema()` generated by
pydantic 2 to (mostly) match the output from pydantic v1.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- **Description:**
Support ChatMlflow.bind_tools method
Tested in Databricks:
<img width="836" alt="image"
src="https://github.com/user-attachments/assets/fa28ef50-0110-4698-8eda-4faf6f0b9ef8">
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: Serena Ruan <serena.rxy@gmail.com>
- **Description:** When adding docs for constructing ChatHuggingFace
using a HuggingFacePipeline, I forgot to add `return_full_text=False` as
an argument. In this setup, the chat response would incorrectly contain
all the input text. I am fixing that here by adding that line to the
offending notebook.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** This PR fixes a KeyError in NotionDBLoader when the
"name" key is missing in the "people" property.
**Issue:** Fixes#24223
**Dependencies:** None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
instead of hardcoding a linter for each, iterate through the lines of
the template notebook and find lines that start with `##` (includes
lower headings), and enforce that those headings are found in new docs
that are contributed
Add compatibility for pydantic 2 for a utility function.
This will help push some small changes to master, so they don't have to
be kept track of on a separate branch.
The @pre_init validator is a temporary solution for base models. It has
similar (but not identical) semantics to @root_validator(), but it works
strictly as a pre-init validator.
It'll work as expected as long as the pydantic model type hints were
correct.
supports following UX
```python
class SubTool(TypedDict):
"""Subtool docstring"""
args: Annotated[Dict[str, Any], {}, "this does bar"]
class Tool(TypedDict):
"""Docstring
Args:
arg1: foo
"""
arg1: str
arg2: Union[int, str]
arg3: Optional[List[SubTool]]
arg4: Annotated[Literal["bar", "baz"], ..., "this does foo"]
arg5: Annotated[Optional[float], None]
```
- can parse google style docstring
- can use Annotated to specify default value (second arg)
- can use Annotated to specify arg description (third arg)
- can have nested complex types
This PR adds annotations in comunity package.
Annotations are only strictly needed in subclasses of BaseModel for
pydantic 2 compatibility.
This PR adds some unnecessary annotations, but they're not bad to have
regardless for documentation pages.
Title: [pebblo_retrieval] Identifying entities in prompts given in
PebbloRetrievalQA leading to prompt governance
Description: Implemented identification of entities in the prompt using
Pebblo prompt governance API.
Issue: NA
Dependencies: NA
Add tests and docs: NA
- **Title:** [PebbloSafeLoader] Implement content-size-based batching in
the classification flow(loader/doc API)
- **Description:**
- Implemented content-size-based batching in the loader/doc API, set to
100KB with no external configuration option, intentionally hard-coded to
prevent timeouts.
- Remove unused field(pb_id) from doc_metadata
- **Issue:** NA
- **Dependencies:** NA
- **Add tests and docs:** Updated
Description: The old method will be discontinued; use the official SDK
for more model options.
Issue: None
Dependencies: None
Twitter handle: None
Co-authored-by: trumanyan <trumanyan@tencent.com>
**Description:** Updated the Langgraph migration docs to use
`state_modifier` rather than `messages_modifier`
**Issue:** N/A
**Dependencies:** N/A
- [ X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
PR title: Experimental: Add config to convert_to_graph_documents
Description: In order to use langfuse, i need to pass the langfuse
configuration when invoking the chain. langchain_experimental does not
allow to add any parameters (beside the documents) to the
convert_to_graph_documents method. This way, I cannot monitor the chain
in langfuse.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Catarina Franco <catarina.franco@criticalsoftware.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
## Description
This PR:
- Fixes the validation error in `FastEmbedEmbeddings`.
- Adds support for `batch_size`, `parallel` params.
- Removes support for very old FastEmbed versions.
- Updates the FastEmbed doc with the new params.
Associated Issues:
- Resolves#24039
- Resolves #https://github.com/qdrant/fastembed/issues/296
**Description:**
This update significantly improves the Brave Search Tool's utility
within the LangChain library by enriching the search results it returns.
The tool previously returned title, link, and snippet, with the snippet
being a truncated 140-character description from the search engine. To
make the search results more informative, this update enables
extra_snippets by default and introduces additional result fields:
title, link, description (enhancing and renaming the former snippet
field), age, and snippets. The snippets field provides a list of strings
summarizing the webpage, utilizing Brave's capability for more detailed
search insights. This enhancement aims to make the search tool far more
informative and beneficial for users.
**Issue:** N/A
**Dependencies:** No additional dependencies introduced.
**Twitter handle:** @davidalexr987
**Code Changes Summary:**
- Changed the default setting to include extra_snippets in search
results.
- Renamed the snippet field to description to accurately reflect its
content and included an age field for search results.
- Introduced a snippets field that lists webpage summaries, providing
users with comprehensive search result insights.
**Backward Compatibility Note:**
The renaming of snippet to description improves the accuracy of the
returned data field but may impact existing users who have developed
integration's or analyses based on the snippet field. I believe this
change is essential for clarity and utility, and it aligns better with
the data provided by Brave Search.
**Additional Notes:**
This proposal focuses exclusively on the Brave Search package, without
affecting other LangChain packages or introducing new dependencies.
Description: Since moving away from `langchain-community` is
recommended, `init_chat_models()` should import ChatOllama from
`langchain-ollama` instead.
Anthropic models (including via Bedrock and other cloud platforms)
accept a status/is_error attribute on tool messages/results
(specifically in `tool_result` content blocks for Anthropic API). Adding
a ToolMessage.status attribute so that users can set this attribute when
using those models
**Description:** Add empty string default for api_key and change
`server_url` to `url` to match existing loaders.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description**
Fixes DocumentDBVectorSearch similarity_search when no filter is used;
it defaults to None but $match does not accept None, so changed default
to empty {} before pipeline is created.
**Issue**
AWS DocumentDB similarity search does not work when no filter is used.
Error msg: "the match filter must be an expression in an object" #24775
**Dependencies**
No dependencies
**Twitter handle**
https://x.com/perepasamonte
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Mixtral with Groq has started consistently failing tool calling tests.
Here we restrict testing to llama 3.1.
- `.schema` is deprecated in pydantic proper in favor of
`.model_json_schema`.
There is an issue with the prompt format in `GenerativeAgentMemory` ,
try to fix it.
The prompt is same as the one in method `_score_memory_importance`.
issue: #24615
descriptions: The _Graph pydantic model generated from
create_simple_model (which LLMGraphTransformer uses when allowed nodes
and relationships are provided) does not constrain the relationships
(source and target types, relationship type), and the node and
relationship properties with enums when using ChatOpenAI.
The issue is that when calling optional_enum_field throughout
create_simple_model the llm_type parameter is not passed in except for
when creating node type. Passing it into each call fixes the issue.
Co-authored-by: Lifu Wu <lifu@nextbillion.ai>
- [ ] **PR title**: "langchain-openai: openai proxy added to base
embeddings"
- [ ] **PR message**:
- **Description:**
Dear langchain developers,
You've already supported proxy for ChatOpenAI implementation in your
package. At the same time, if somebody needed to use proxy for chat, it
also could be necessary to be able to use it for OpenAIEmbeddings.
That's why I think it's important to add proxy support for OpenAI
embeddings. That's what I've done in this PR.
@baskaryan
---------
Co-authored-by: karpov <karpov@dohod.ru>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "Add documentaiton on InMemoryVectorStore driver for
MemoryDB to langchain-aws"
- Langchain-aws repo :Add MemoryDB documentation
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Added documentation on InMemoryVectorStore driver to
aws.mdx and usage example on MemoryDB clusuter
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
Add memorydb notebook to docs/docs/integrations/ folde
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
In the `ChatFireworks` class definition, the Field() call for the "stop"
("stop_sequences") parameter is missing the "default" keyword.
**Issue:**
Type checker reports "stop_sequences" as a missing arg (not recognizing
the default value is None)
**Dependencies:**
None
**Twitter handle:**
None
Description: OutputFixingParser.from_llm() creates a retry chain that
returns a Generation instance, when it should actually just return a
string.
Issue: https://github.com/langchain-ai/langchain/issues/24600
Twitter handle: scribu
---------
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Thank you for contributing to LangChain!
- [x] **PR title**: "community:add Yi LLM", "docs:add Yi Documentation"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** This PR adds support for the Yi model to LangChain.
- **Dependencies:**
[langchain_core,requests,contextlib,typing,logging,json,langchain_community]
- **Twitter handle:** 01.AI
- [x] **Add tests and docs**: I've added the corresponding documentation
to the relevant paths
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Raise `LangChainException` instead of `Exception`. This alleviates the
need for library users to use bare try/except to handle exceptions
raised by `AzureSearch`.
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Description:
add a optional score relevance threshold for select only coherent
document, it's in complement of top_n
Discussion:
add relevance score threshold in flashrank_rerank document compressors
#24013
Dependencies:
no dependencies
---------
Co-authored-by: Benjamin BERNARD <benjamin.bernard@openpathview.fr>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description:
- This PR adds a self query retriever implementation for SAP HANA Cloud
Vector Engine. The retriever supports all operators except for contains.
- Issue: N/A
- Dependencies: no new dependencies added
**Add tests and docs:**
Added integration tests to:
libs/community/tests/unit_tests/query_constructors/test_hanavector.py
**Documentation for self query retriever:**
/docs/integrations/retrievers/self_query/hanavector_self_query.ipynb
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Expanded the chat model functionality to support tools
in the 'baichuan.py' file. Updated module imports and added tool object
handling in message conversions. Additional changes include the
implementation of tool binding and related unit tests. The alterations
offer enhanced model capabilities by enabling interaction with tool-like
objects.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- [x] **PR title**:
community: Add OCI Generative AI tool and structured output support
- [x] **PR message**:
- **Description:** adding tool calling and structured output support for
chat models offered by OCI Generative AI services. This is an update to
our last PR 22880 with changes in
/langchain_community/chat_models/oci_generative_ai.py
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA
- [x] **Add tests and docs**:
1. we have updated our unit tests
2. we have updated our documentation under
/docs/docs/integrations/chat/oci_generative_ai.ipynb
- [x] **Lint and test**: `make format`, `make lint` and `make test` we
run successfully
---------
Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
This PR proposes to create a rate limiter in the chat model directly,
and would replace: https://github.com/langchain-ai/langchain/pull/21992
It resolves most of the constraints that the Runnable rate limiter
introduced:
1. It's not annoying to apply the rate limiter to existing code; i.e.,
possible to roll out the change at the location where the model is
instantiated,
rather than at every location where the model is used! (Which is
necessary
if the model is used in different ways in a given application.)
2. batch rate limiting is enforced properly
3. the rate limiter works correctly with streaming
4. the rate limiter is aware of the cache
5. The rate limiter can take into account information about the inputs
into the
model (we can add optional inputs to it down-the road together with
outputs!)
The only downside is that information will not be properly reflected in
tracing
as we don't have any metadata evens about a rate limiter. So the total
time
spent on a model invocation will be:
* time spent waiting for the rate limiter
* time spend on the actual model request
## Example
```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_groq import ChatGroq
groq = ChatGroq(rate_limiter=InMemoryRateLimiter(check_every_n_seconds=1))
groq.invoke('hello')
```
**Description:**
- This PR exposes some functions in VDMS vectorstore, updates VDMS
related notebooks, updates tests, and upgrade version of VDMS (>=0.0.20)
**Issue:** N/A
**Dependencies:**
- Update vdms>=0.0.20
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Lots of duplicated content from concepts, missing pointers to the second
half of the tool calling loop
Simpler + more focused + a more prominent link to the second half of the
loop was what I was aiming for, but down to be more conservative and
just more prominently link the "passing tools back to the model" guide.
I have also moved the tool calling conceptual guide out from under
`Structured Output` (while leaving a small section for structured
output-specific information) and added more content. The existing
`#functiontool-calling` link will go to this new section.
Fixes for Eden AI Custom tools and ChatEdenAI:
- add missing import in __init__ of chat_models
- add `args_schema` to custom tools. otherwise '__arg1' would sometimes
be passed to the `run` method
- fix IndexError when no human msg is added in ChatEdenAI
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Mistral appears to have added validation for the format of its tool call
IDs:
`{"object":"error","message":"Tool call id was abc123 but must be a-z,
A-Z, 0-9, with a length of
9.","type":"invalid_request_error","param":null,"code":null}`
This breaks compatibility of messages from other providers. Here we add
a function that converts any string to a Mistral-valid tool call ID, and
apply it to incoming messages.
Thank you for contributing to LangChain!
**Description:**
This PR allows users of `langchain_community.llms.ollama.Ollama` to
specify the `auth` parameter, which is then forwarded to all internal
calls of `requests.request`. This works in the same way as the existing
`headers` parameters. The auth parameter enables the usage of the given
class with Ollama instances, which are secured by more complex
authentication mechanisms, that do not only rely on static headers. An
example are AWS API Gateways secured by the IAM authorizer, which
expects signatures dynamically calculated on the specific HTTP request.
**Issue:**
Integrating a remote LLM running through Ollama using
`langchain_community.llms.ollama.Ollama` only allows setting static HTTP
headers with the parameter `headers`. This does not work, if the given
instance of Ollama is secured with an authentication mechanism that
makes use of dynamically created HTTP headers which for example may
depend on the content of a given request.
**Dependencies:**
None
**Twitter handle:**
None
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
### Description
* support asynchronous in InMemoryVectorStore
* since embeddings might be possible to call asynchronously, ensure that
both asynchronous and synchronous functions operate correctly.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
This PR introduces the following Runnables:
1. BaseRateLimiter: an abstraction for specifying a time based rate
limiter as a Runnable
2. InMemoryRateLimiter: Provides an in-memory implementation of a rate
limiter
## Example
```python
from langchain_core.runnables import InMemoryRateLimiter, RunnableLambda
from datetime import datetime
foo = InMemoryRateLimiter(requests_per_second=0.5)
def meow(x):
print(datetime.now().strftime("%H:%M:%S.%f"))
return x
chain = foo | meow
for _ in range(10):
print(chain.invoke('hello'))
```
Produces:
```
17:12:07.530151
hello
17:12:09.537932
hello
17:12:11.548375
hello
17:12:13.558383
hello
17:12:15.568348
hello
17:12:17.578171
hello
17:12:19.587508
hello
17:12:21.597877
hello
17:12:23.607707
hello
17:12:25.617978
hello
```

## Interface
The rate limiter uses the following interface for acquiring a token:
```python
class BaseRateLimiter(Runnable[Input, Output], abc.ABC):
@abc.abstractmethod
def acquire(self, *, blocking: bool = True) -> bool:
"""Attempt to acquire the necessary tokens for the rate limiter.```
```
The flag `blocking` has been added to the abstraction to allow
supporting streaming (which is easier if blocking=False).
## Limitations
- The rate limiter is not designed to work across different processes.
It is an in-memory rate limiter, but it is thread safe.
- The rate limiter only supports time-based rate limiting. It does not
take into account the size of the request or any other factors.
- The current implementation does not handle streaming inputs well and
will consume all inputs even if the rate limit has been reached. Better
support for streaming inputs will be added in the future.
- When the rate limiter is combined with another runnable via a
RunnableSequence, usage of .batch() or .abatch() will only respect the
average rate limit. There will be bursty behavior as .batch() and
.abatch() wait for each step to complete before starting the next step.
One way to mitigate this is to use batch_as_completed() or
abatch_as_completed().
## Bursty behavior in `batch` and `abatch`
When the rate limiter is combined with another runnable via a
RunnableSequence, usage of .batch() or .abatch() will only respect the
average rate limit. There will be bursty behavior as .batch() and
.abatch() wait for each step to complete before starting the next step.
This becomes a problem if users are using `batch` and `abatch` with many
inputs (e.g., 100). In this case, there will be a burst of 100 inputs
into the batch of the rate limited runnable.
1. Using a RunnableBinding
The API would look like:
```python
from langchain_core.runnables import InMemoryRateLimiter, RunnableLambda
rate_limiter = InMemoryRateLimiter(requests_per_second=0.5)
def meow(x):
return x
rate_limited_meow = RunnableLambda(meow).with_rate_limiter(rate_limiter)
```
2. Another option is to add some init option to RunnableSequence that
changes `.batch()` to be depth first (e.g., by delegating to
`batch_as_completed`)
```python
RunnableSequence(first=rate_limiter, last=model, how='batch-depth-first')
```
Pros: Does not require Runnable Binding
Cons: Feels over-complicated
Added [ScrapingAnt](https://scrapingant.com/) Web Loader integration.
ScrapingAnt is a web scraping API that allows extracting web page data
into accessible and well-formatted markdown.
Description: Added ScrapingAnt web loader for retrieving web page data
as markdown
Dependencies: scrapingant-client
Twitter: @WeRunTheWorld3
---------
Co-authored-by: Oleg Kulyk <oleg@scrapingant.com>
#### Update (2):
A single `UnstructuredLoader` is added to handle both local and api
partitioning. This loader also handles single or multiple documents.
#### Changes in `community`:
Changes here do not affect users. In the initial process of using the
SDK for the API Loaders, the Loaders in community were refactored.
Other changes include:
The `UnstructuredBaseLoader` has a new check to see if both
`mode="paged"` and `chunking_strategy="by_page"`. It also now has
`Element.element_id` added to the `Document.metadata`.
`UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. As such,
now both directly inherit from `UnstructuredBaseLoader` and initialize
their `file_path`/`file` attributes respectively and implement their own
`_post_process_elements` methods.
--------
#### Update:
New SDK Loaders in a [partner
package](https://python.langchain.com/v0.1/docs/contributing/integrations/#partner-package-in-langchain-repo)
are introduced to prevent breaking changes for users (see discussion
below).
##### TODO:
- [x] Test docstring examples
--------
- **Description:** UnstructuredAPIFileIOLoader and
UnstructuredAPIFileLoader calls to the unstructured api are now made
using the unstructured-client sdk.
- **New Dependencies:** unstructured-client
- [x] **Add tests and docs**: If you're adding a new integration, please
include
- [x] a test for the integration, preferably unit tests that do not rely
on network access,
- [x] update the description in
`docs/docs/integrations/providers/unstructured.mdx`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
TODO:
- [x] Update
https://python.langchain.com/v0.1/docs/integrations/document_loaders/unstructured_file/#unstructured-api
-
`langchain/docs/docs/integrations/document_loaders/unstructured_file.ipynb`
- The description here needs to indicate that users should install
`unstructured-client` instead of `unstructured`. Read over closely to
look for any other changes that need to be made.
- [x] Update the `lazy_load` method in `UnstructuredBaseLoader` to
handle json responses from the API instead of just lists of elements.
- This method may need to be overwritten by the API loaders instead of
changing it in the `UnstructuredBaseLoader`.
- [x] Update the documentation links in the class docstrings (the
Unstructured documents have moved)
- [x] Update Document.metadata to include `element_id` (see thread
[here](https://unstructuredw-kbe4326.slack.com/archives/C044N0YV08G/p1718187499818419))
---------
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
- [ ] **PR title**: "experimental: Adding compatibility for
OllamaFunctions with ImagePromptTemplate"
- [ ] **PR message**:
- **Description:** Removes the outdated
`_convert_messages_to_ollama_messages` method override in the
`OllamaFunctions` class to ensure that ollama multimodal models can be
invoked with an image.
- **Issue:** #24174
---------
Co-authored-by: Joel Akeret <joel.akeret@ti&m.com>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
add dynamic field feature to langchain_milvus
more unittest, more robustic
plan to deprecate the `metadata_field` in the future, because it's
function is the same as `enable_dynamic_field`, but the latter one is a
more advanced concept in milvus
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This linter is meant to move development to use __init__ instead of
root_validator and validator.
We need to investigate whether we need to lint some of the functionality
of Field (e.g., `lt` and `gt`, `alias`)
`alias` is the one that's most popular:
(community) ➜ community git:(eugene/add_linter_to_community) ✗ git grep
" Field(" | grep "alias=" | wc -l
144
(community) ➜ community git:(eugene/add_linter_to_community) ✗ git grep
" Field(" | grep "ge=" | wc -l
10
(community) ➜ community git:(eugene/add_linter_to_community) ✗ git grep
" Field(" | grep "gt=" | wc -l
4
This PR is under WIP and adds the following functionalities:
- [X] Supports tool calling across the langchain ecosystem. (However
streaming is not supported)
- [X] Update documentation
- [ ] **Community**: "Retrievers: Product Quantization"
- [X] This PR adds Product Quantization feature to the retrievers to the
Langchain Community. PQ is one of the fastest retrieval methods if the
embeddings are rich enough in context due to the concepts of
quantization and representation through centroids
- **Description:** Adding PQ as one of the retrievers
- **Dependencies:** using the package nanopq for this PR
- **Twitter handle:** vishnunkumar_
- [X] **Add tests and docs**: If you're adding a new integration, please
include
- [X] Added unit tests for the same in the retrievers.
- [] Will add an example notebook subsequently
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/ -
done the same
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: Update IBM docs about information to pass client
into WatsonxLLM and WatsonxEmbeddings object.
- [x] **PR message**:
- **Description:** Update IBM docs about information to pass client into
WatsonxLLM and WatsonxEmbeddings object.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for contributing to LangChain!
- This PR adds vector search filtering for Azure Cosmos DB Mongo vCore
and NoSQL.
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description**
Add support for Pinecone hosted embedding models as
`PineconeEmbeddings`. Replacement for #22890
**Dependencies**
Add `aiohttp` to support async embeddings call against REST directly
- [x] **Add tests and docs**: If you're adding a new integration, please
include
Added `docs/docs/integrations/text_embedding/pinecone.ipynb`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Twitter: `gdjdg17`
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
In some lines its trying to read a key that do not exists yet. In this
cases I changed the direct access to dict.get() method
- [ x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
The previous implementation would never be called.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
After this standard tests will test with the following combinations:
1. pydantic.BaseModel
2. pydantic.v1.BaseModel
If ran within a matrix, it'll covert both pydantic.BaseModel originating
from
pydantic 1 and the one defined in pydantic 2.
### Description
This pull request added new document loaders to load documents of
various formats using [Dedoc](https://github.com/ispras/dedoc):
- `DedocFileLoader` (determine file types automatically and parse)
- `DedocPDFLoader` (for `PDF` and images parsing)
- `DedocAPIFileLoader` (determine file types automatically and parse
using Dedoc API without library installation)
[Dedoc](https://dedoc.readthedocs.io) is an open-source library/service
that extracts texts, tables, attached files and document structure
(e.g., titles, list items, etc.) from files of various formats. The
library is actively developed and maintained by a group of developers.
`Dedoc` supports `DOCX`, `XLSX`, `PPTX`, `EML`, `HTML`, `PDF`, images
and more.
Full list of supported formats can be found
[here](https://dedoc.readthedocs.io/en/latest/#id1).
For `PDF` documents, `Dedoc` allows to determine textual layer
correctness and split the document into paragraphs.
### Issue
This pull request extends variety of document loaders supported by
`langchain_community` allowing users to choose the most suitable option
for raw documents parsing.
### Dependencies
The PR added a new (optional) dependency `dedoc>=2.2.5` ([library
documentation](https://dedoc.readthedocs.io)) to the
`extended_testing_deps.txt`
### Twitter handle
None
### Add tests and docs
1. Test for the integration:
`libs/community/tests/integration_tests/document_loaders/test_dedoc.py`
2. Example notebook:
`docs/docs/integrations/document_loaders/dedoc.ipynb`
3. Information about the library:
`docs/docs/integrations/providers/dedoc.mdx`
### Lint and test
Done locally:
- `make format`
- `make lint`
- `make integration_tests`
- `make docs_build` (from the project root)
---------
Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>
- **Description:** Add a DocumentTransformer for executing one or more
`LinkExtractor`s and adding the extracted links to each document.
- **Issue:** n/a
- **Depedencies:** none
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
**Description:** Fixes an issue where the chat message history was not
returned in order. Fixed it now by returning based on timestamps.
- [x] **Add tests and docs**: Updated the tests to check the order
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
This will generate a meaningless string "system: " for generating
condense question; this increases the probability to make an improper
condense question and misunderstand user's question. Below is a case
- Original Question: Can you explain the arguments of Meilisearch?
- Condense Question
- What are the benefits of using Meilisearch? (by CodeLlama)
- What are the reasons for using Meilisearch? (by GPT-4)
The condense questions (not matter from CodeLlam or GPT-4) are different
from the original one.
By checking the content of each dialogue turn, generating history string
only when the dialog content is not empty.
Since there is nothing before first turn, the "history" mechanism will
be ignored at the very first turn.
Doing so, the condense question will be "What are the arguments for
using Meilisearch?".
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
https://www.youtube.com/watch?v=ZIyB9e_7a4c
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:**
- Fix#12870: set scope in `default` func (ref:
https://google-auth.readthedocs.io/en/master/reference/google.auth.html)
- Moved the code to load default credentials to the bottom for clarity
of the logic
- Add docstring and comment for each credential loading logic
- **Issue:** https://github.com/langchain-ai/langchain/issues/12870
- **Dependencies:** no dependencies change
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** @gymnstcs
<!-- If no one reviews your PR within a few days, please @-mention one
of @baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** `QianfanChatEndpoint` When using tool result to
answer questions, the content of the tool is required to be in Dict
format. Of course, this can require users to return Dict format when
calling the tool, but in order to be consistent with other Chat Models,
I think such modifications are necessary.
- **Description:** Adding notebook to demonstrate visual RAG which uses
both video scene description generated by open source vision models (ex.
video-llama, video-llava etc.) as text embeddings and frames as image
embeddings to perform vector similarity search using VDMS.
- **Issue:** N/A
- **Dependencies:** N/A
Feedback that `RunnableWithMessageHistory` is unwieldy compared to
ConversationChain and similar legacy abstractions is common.
Legacy chains using memory typically had no explicit notion of threads
or separate sessions. To use `RunnableWithMessageHistory`, users are
forced to introduce this concept into their code. This possibly felt
like unnecessary boilerplate.
Here we enable `RunnableWithMessageHistory` to run without a config if
the `get_session_history` callable has no arguments. This enables
minimal implementations like the following:
```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
memory = InMemoryChatMessageHistory()
chain = RunnableWithMessageHistory(llm, lambda: memory)
chain.invoke("Hi I'm Bob") # Hello Bob!
chain.invoke("What is my name?") # Your name is Bob.
```
- **Description:** The correct Prompts for ZERO_SHOT_REACT were not
being used in the `create_sql_agent` function. They were not using the
specific `SQL_PREFIX` and `SQL_SUFFIX` prompts if client does not
provide any prompts. This is fixed.
- **Issue:** #23585
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Regardless of whether `embedding_func` is set or not, the 'text'
attribute of document should be assigned, otherwise the `page_content`
in the document of the final search result will be lost
### Description
* Fix `libs/langchain/dev.Dockerfile` file. copy the
`libs/standard-tests` folder when building the devcontainer.
* `poetry install --no-interaction --no-ansi --with dev,test,docs`
command requires this folder, but it was not copied.
### Reference
#### Error message when building the devcontainer from the master branch
```
...
[2024-07-20T14:27:34.779Z] ------
> [langchain langchain-dev-dependencies 7/7] RUN poetry install --no-interaction --no-ansi --with dev,test,docs:
0.409
0.409 Directory ../standard-tests does not exist
------
...
```
#### After the fix
Build success at vscode:
<img width="866" alt="image"
src="https://github.com/user-attachments/assets/10db1b50-6fcf-4dfe-83e1-d93c96aa2317">
1. Fix HuggingfacePipeline import error to newer partner package
2. Switch to IPEXModelForCausalLM for performance
There are no dependency changes since optimum intel is also needed for
QuantizedBiEncoderEmbeddings
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:** Fixes typo `Le'ts` -> `Let's`.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
When initializing retrievers with `configurable_fields` as base
retriever, `ContextualCompressionRetriever` validation fails with the
following error:
```
ValidationError: 1 validation error for ContextualCompressionRetriever
base_retriever
Can't instantiate abstract class BaseRetriever with abstract method _get_relevant_documents (type=type_error)
```
Example code:
```python
esearch_retriever = VertexAISearchRetriever(
project_id=GCP_PROJECT_ID,
location_id="global",
data_store_id=SEARCH_ENGINE_ID,
).configurable_fields(
filter=ConfigurableField(id="vertex_search_filter", name="Vertex Search Filter")
)
# rerank documents with Vertex AI Rank API
reranker = VertexAIRank(
project_id=GCP_PROJECT_ID,
location_id=GCP_REGION,
ranking_config="default_ranking_config",
)
retriever_with_reranker = ContextualCompressionRetriever(
base_compressor=reranker, base_retriever=esearch_retriever
)
```
It seems like the issue stems from ContextualCompressionRetriever
insisting that base retrievers must be strictly `BaseRetriever`
inherited, and doesn't take into account cases where retrievers need to
be chained and can have configurable fields defined.
0a1e475a30/libs/langchain/langchain/retrievers/contextual_compression.py (L15-L22)
This PR proposes that the base_retriever type be set to `RetrieverLike`,
similar to how `EnsembleRetriever` validates its list of retrievers:
0a1e475a30/libs/langchain/langchain/retrievers/ensemble.py (L58-L75)
- **Description:** Add a flag to determine whether to show progress bar
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** n/a
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Before, if an exception was raised in the outer `try` block in
`Runnable._atransform_stream_with_config` before `iterator_` is
assigned, the corresponding `finally` block would blow up with an
`UnboundLocalError`:
```txt
UnboundLocalError: cannot access local variable 'iterator_' where it is not associated with a value
```
By assigning an initial value to `iterator_` before entering the `try`
block, this commit ensures that the `finally` can run, and not bury the
"true" exception under a "During handling of the above exception [...]"
traceback.
Thanks for your consideration!
This will allow tools and parsers to accept pydantic models from any of
the
following namespaces:
* pydantic.BaseModel with pydantic 1
* pydantic.BaseModel with pydantic 2
* pydantic.v1.BaseModel with pydantic 2
xfailing some sql tests that do not currently work on sqlalchemy v1
#22207 was very much not sqlalchemy v1 compatible.
Moving forward, implementations should be compatible with both to pass
CI
- **Description:** Search has a limit of 500 results, playlistItems
doesn't. Added a class in except clause to catch another common error.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** @TupleType
---------
Co-authored-by: asi-cider <88270351+asi-cider@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:** This PR introduces a change to the
`cypher_generation_chain` to dynamically concatenate inputs. This
improvement aims to streamline the input handling process and make the
method more flexible. The change involves updating the arguments
dictionary with all elements from the `inputs` dictionary, ensuring that
all necessary inputs are dynamically appended. This will ensure that any
cypher generation template will not require a new `_call` method patch.
**Issue:** This PR fixes issue #24260.
The `MongoDBStore` can manage only documents.
It's not possible to use MongoDB for an `CacheBackedEmbeddings`.
With this new implementation, it's possible to use:
```python
CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings=embeddings,
document_embedding_cache=MongoDBByteStore(
connection_string=db_uri,
db_name=db_name,
collection_name=collection_name,
),
)
```
and use MongoDB to cache the embeddings !
- **Description:**
- Updated checksum in doc metadata
- Sending checksum and removing actual content, while sending data to
`pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`/loader/doc` API
- Adding `pb_id` i.e. pebblo id to doc metadata
- Refactoring as needed.
- Sending `content-checksum` and removing actual content, while sending
data to `pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`prmopt` API
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** Updated
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
Description:
This PR fixes a KeyError: 400 that occurs in the JSON schema processing
within the reduce_openapi_spec function. The _retrieve_ref function in
json_schema.py was modified to handle missing components gracefully by
continuing to the next component if the current one is not found. This
ensures that the OpenAPI specification is fully interpreted and the
agent executes without errors.
Issue:
Fixes issue #24335
Dependencies:
No additional dependencies are required for this change.
Twitter handle:
@lunara_x
**Description:**
**TextEmbed** is a high-performance embedding inference server designed
to provide a high-throughput, low-latency solution for serving
embeddings. It supports various sentence-transformer models and includes
the ability to deploy image and text embedding models. TextEmbed offers
flexibility and scalability for diverse applications.
- **PyPI Package:** [TextEmbed on
PyPI](https://pypi.org/project/textembed/)
- **Docker Image:** [TextEmbed on Docker
Hub](https://hub.docker.com/r/kevaldekivadiya/textembed)
- **GitHub Repository:** [TextEmbed on
GitHub](https://github.com/kevaldekivadiya2415/textembed)
**PR Description**
This PR adds functionality for embedding documents and queries using the
`TextEmbedEmbeddings` class. The implementation allows for both
synchronous and asynchronous embedding requests to a TextEmbed API
endpoint. The class handles batching and permuting of input texts to
optimize the embedding process.
**Example Usage:**
```python
from langchain_community.embeddings import TextEmbedEmbeddings
# Initialise the embeddings class
embeddings = TextEmbedEmbeddings(model="your-model-id", api_key="your-api-key", api_url="your_api_url")
# Define a list of documents
documents = [
"Data science involves extracting insights from data.",
"Artificial intelligence is transforming various industries.",
"Cloud computing provides scalable computing resources over the internet.",
"Big data analytics helps in understanding large datasets.",
"India has a diverse cultural heritage."
]
# Define a query
query = "What is the cultural heritage of India?"
# Embed all documents
document_embeddings = embeddings.embed_documents(documents)
# Embed the query
query_embedding = embeddings.embed_query(query)
# Print embeddings for each document
for i, embedding in enumerate(document_embeddings):
print(f"Document {i+1} Embedding:", embedding)
# Print the query embedding
print("Query Embedding:", query_embedding)
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Fix MultiQueryRetriever breaking Embeddings with empty lines
```
[chain/end] [1:chain:ConversationalRetrievalChain > 2:retriever:Retriever > 3:retriever:Retriever > 4:chain:LLMChain] [2.03s] Exiting Chain run with output:
[outputs]
> /workspaces/Sfeir/sncf/metabot-backend/.venv/lib/python3.11/site-packages/langchain/retrievers/multi_query.py(116)_aget_relevant_documents()
-> if self.include_original:
(Pdb) queries
['## Alternative questions for "Hello, tell me about phones?":', '', '1. **What are the latest trends in smartphone technology?** (Focuses on recent advancements)', '2. **How has the mobile phone industry evolved over the years?** (Historical perspective)', '3. **What are the different types of phones available in the market, and which one is best for me?** (Categorization and recommendation)']
```
Example of failure on VertexAIEmbeddings
```
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The text content is empty."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.184.234:443 {created_time:"2024-04-30T09:57:45.625698408+00:00", grpc_status:3, grpc_message:"The text content is empty."}"
```
Fixes: #15959
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Add an async version of `add_documents` to
`ParentDocumentRetriever`
- **Twitter handle:** @johnkdev
---------
Co-authored-by: John Kelly <j.kelly@mwam.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** Add Riza Python/JS code execution tool
- **Issue:** N/A
- **Dependencies:** an optional dependency on the `rizaio` pypi package
- **Twitter handle:** [@rizaio](https://x.com/rizaio)
[Riza](https://riza.io) is a safe code execution environment for
agent-generated Python and JavaScript that's easy to integrate into
langchain apps. This PR adds two new tool classes to the community
package.
- **Description:** Add a `KeybertLinkExtractor` for graph vectorstores.
This allows extracting links from keywords in a Document and linking
nodes that have common keywords.
- **Issue:** None
- **Dependencies:** None.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** This allows extracting links between documents with
common named entities using [GLiNER](https://github.com/urchade/GLiNER).
- **Issue:** None
- **Dependencies:** None
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This PR updates docs to mention correct version of the
`langchain-openai` package required to use the `stream_usage` parameter.
As it can be noticed in the details of this [merge
commit](722c8f50ea),
that functionality is available only in `langchain-openai >= 0.1.9`
while docs state it's available in `langchain-openai >= 0.1.8`.
- **Description**: Mask API key for ChatOpenAi based chat_models
(openai, azureopenai, anyscale, everlyai).
Made changes to all chat_models that are based on ChatOpenAI since all
of them assumes that openai_api_key is str rather than SecretStr.
- **Issue:**: #12165
- **Dependencies:** N/A
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** N/A
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Description: added support for LangChain v0.2 for nvidia ai endpoint.
Implremented inMemory storage for chains using
RunnableWithMessageHistory which is analogous to using
`ConversationChain` which was used in v0.1 with the default
`ConversationBufferMemory`. This class is deprecated in favor of
`RunnableWithMessageHistory` in LangChain v0.2
Issue: None
Dependencies: None.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:**
- Updated the format for the 'Action' section in the planner prompt to
ensure it must be one of the tools without additional words. Adjusted
the phrasing from "should be" to "must be" for clarity and
enforceability.
- Corrected the tool appending logic in the
`_create_api_controller_agent` function to ensure that
`RequestsDeleteToolWithParsing` and `RequestsPatchToolWithParsing` are
properly added to the tools list for "DELETE" and "PATCH" operations.
**Issue:** #24382
**Dependencies:** None
**Twitter handle:** @lunara_x
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Adds MongoDBAtlasVectorSearch to list of VectorStores compatible with
the Indexing API.
(One line change.)
As of `langchain-mongodb = "0.1.7"`, the requirements that the
VectorStore have both add_documents and delete methods with an ids kwarg
is satisfied. #23535 contains the implementation of that, and has been
merged.
Thank you for contributing to LangChain!
- [x] **PR title**: [PebbloSafeLoader] Rename loader type and add
SharePointLoader to supported loaders
- **Description:** Minor fixes in the PebbloSafeLoader:
- Renamed the loader type from `remote_db` to `cloud_folder`.
- Added `SharePointLoader` to the list of loaders supported by
PebbloSafeLoader.
- **Issue:** NA
- **Dependencies:** NA
- [x] **Add tests and docs**: NA
* Please see security warning already in existing class.
* The approach here is fundamentally insecure as it's relying on a block
approach rather than an approach based on only running allowed nodes.
So users should only use this code if its running from a properly
sandboxed environment.
### Description
Missing "stream" parameter. Without it, you'd never receive a stream of
tokens when using stream() or astream()
### Issue
No existing issue available
**Description:** : Add support for chat message history using Couchbase
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
**Description:**
- Updated constructors in PyPDFParser and PyPDFLoader to handle
`extraction_mode` and additional kwargs, aligning with the capabilities
of `PageObject.extract_text()` from pypdf.
- Added `test_pypdf_loader_with_layout` along with a corresponding
example text file to validate layout extraction from PDFs.
**Issue:** fixes#19735
**Dependencies:** This change requires updating the pypdf dependency
from version 3.4.0 to at least 4.0.0.
Additional changes include the addition of a new test
test_pypdf_loader_with_layout and an example text file to ensure the
functionality of layout extraction from PDFs aligns with the new
capabilities.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
# Description
This PR aims to solve a bug in `OutputFixingParser`, `RetryOutputParser`
and `RetryWithErrorOutputParser`
The bug is that the wrong keyword argument was given to `retry_chain`.
The correct keyword argument is 'completion', but 'input' is used.
This pull request makes the following changes:
1. correct a `dict` key given to `retry_chain`;
2. add a test when using the default prompt.
- `NAIVE_FIX_PROMPT` for `OutputFixingParser`;
- `NAIVE_RETRY_PROMPT` for `RetryOutputParser`;
- `NAIVE_RETRY_WITH_ERROR_PROMPT` for `RetryWithErrorOutputParser`;
3. ~~add comments on `retry_chain` input and output types~~ clarify
`InputType` and `OutputType` of `retry_chain`
# Issue
The bug is pointed out in
https://github.com/langchain-ai/langchain/pull/19792#issuecomment-2196512928
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Description
This pull-request improves the treatment of document IDs in
`MongoDBAtlasVectorSearch`.
Class method signatures of add_documents, add_texts, delete, and
from_texts
now include an `ids:Optional[List[str]]` keyword argument permitting the
user
greater control.
Note that, as before, IDs may also be inferred from
`Document.metadata['_id']`
if present, but this is no longer required,
IDs can also optionally be returned from searches.
This PR closes the following JIRA issues.
* [PYTHON-4446](https://jira.mongodb.org/browse/PYTHON-4446)
MongoDBVectorSearch delete / add_texts function rework
* [PYTHON-4435](https://jira.mongodb.org/browse/PYTHON-4435) Add support
for "Indexing"
* [PYTHON-4534](https://jira.mongodb.org/browse/PYTHON-4534) Ensure
datetimes are json-serializable
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- Description: When SQLDatabase.from_databricks is ran from a Databricks
Workflow job, line 205 (default_host = context.browserHostName) throws
an ``AttributeError`` as the ``context`` object has no
``browserHostName`` attribute. The fix handles the exception and sets
the ``default_host`` variable to null
---------
Co-authored-by: lmorosdb <lmorosdb>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
**Description:** At the moment neo4j wrapper is using setVectorProperty,
which is deprecated
([link](https://neo4j.com/docs/operations-manual/5/reference/procedures/#procedure_db_create_setVectorProperty)).
I replaced with the non-deprecated version.
Neo4j recently introduced a new cypher method to associate embeddings
into relations using "setRelationshipVectorProperty" method. In this PR
I also implemented a new method to perform this association maintaining
the same format used in the "add_embeddings" method which is used to
associate embeddings into Nodes.
I also included a test case for this new method.
Description: added support for LangChain v0.2 for PipelineAI
integration. Removed deprecated classes and incorporated support for
LangChain v0.2 to integrate with PipelineAI. Removed LLMChain and
replaced it with Runnable interface. Also added StrOutputParser, that
parses LLMResult into the top likely string.
Issue: None
Dependencies: None.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: Added support for langchain v0.2 for shale protocol.
Replaced LLMChain with Runnable interface which allows any two Runnables
to be 'chained' together into sequences. Also added
StreamingStdOutCallbackHandler. Callback handler for streaming.
Issue: None
Dependencies: None.
This cookbook guides user to implement RAG locally on CPU using
langchain tools and open source models. It enables Llama2 model to
answer queries about Intel Q1 2024 earning release using RAG pipeline.
Main libraries are langchain, llama-cpp-python and gpt4all.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Sriragavi <sriragavi.r@intel.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [X] *ApertureDB as vectorstore**: "community: Add ApertureDB as a
vectorestore"
- **Description:** this change provides a new community integration that
uses ApertureData's ApertureDB as a vector store.
- **Issue:** none
- **Dependencies:** depends on ApertureDB Python SDK
- **Twitter handle:** ApertureData
- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Integration tests rely on a local run of a public docker image.
Example notebook additionally relies on a local Ollama server.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
All lint tests pass.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Gautam <gautam@aperturedata.io>
On using TavilySearchAPIRetriever with any conversation chain getting
error :
`TypeError: Client.__init__() got an unexpected keyword argument
'api_key'`
It is because the retreiver class is using the depreciated `Client`
class, `TavilyClient` need to be used instead.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
Databricks Vector Search recently added support for hybrid
keyword-similarity search.
See [usage
examples](https://docs.databricks.com/en/generative-ai/create-query-vector-search.html#query-a-vector-search-endpoint)
from their documentation.
This PR updates the Langchain vectorstore interface for Databricks to
enable the user to pass the *query_type* parameter to
*similarity_search* to make use of this functionality.
By default, there will not be any changes for existing users of this
interface. To use the new hybrid search feature, it is now possible to
do
```python
# ...
dvs = DatabricksVectorSearch(index)
dvs.similarity_search("my search query", query_type="HYBRID")
```
Or using the retriever:
```python
retriever = dvs.as_retriever(
search_kwargs={
"query_type": "HYBRID",
}
)
retriever.invoke("my search query")
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
You.com is releasing two new conversational APIs — Smart and Research.
This PR:
- integrates those APIs with Langchain, as an LLM
- streaming is supported
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** This pull request introduces two new methods to the
Langchain Chroma partner package that enable similarity search based on
image embeddings. These methods enhance the package's functionality by
allowing users to search for images similar to a given image URI. Also
introduces a notebook to demonstrate it's use.
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** @mrugank9009
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
In some lines its trying to read a key that do not exists yet. In this
cases I changed the direct access to dict.get() method
Thank you for contributing to LangChain!
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
**Description:**
The `split_text_from_url` method of `HTMLHeaderTextSplitter` does not
include parameters like `timeout` when using `requests` to send a
request. Therefore, I suggest adding a `kwargs` parameter to the
function, which can be passed as arguments to `requests.get()`
internally, allowing control over the `get` request.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
The functions `convert_to_messages` has had an expansion of the
arguments it can take:
1. Previously, it only could take a `Sequence` in order to iterate over
it. This has been broadened slightly to an `Iterable` (which should have
no other impact).
2. Support for `PromptValue` and `BaseChatPromptTemplate` has been
added. These are generated when combining messages using the overloaded
`+` operator.
Functions which rely on `convert_to_messages` (namely `filter_messages`,
`merge_message_runs` and `trim_messages`) have had the type of their
arguments similarly expanded.
Resolves#23706.
<!--
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
-->
---------
Signed-off-by: JP-Ellis <josh@jpellis.me>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Spell check fixes for docs, comments, and a couple of
strings. No code change e.g. variable names.
**Issue:** none
**Dependencies:** none
**Twitter handle:** hmartin
## Description
This PR adds integration tests to follow up on #24164.
By default, the tests use an in-memory instance.
To run the full suite of tests, with both in-memory and Qdrant server:
```
$ docker run -p 6333:6333 qdrant/qdrant
$ make test
$ make integration_test
```
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Explicitly add parameters from openai API
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
I stumbled upon a bug that led to different similarity scores between
the async and sync similarity searches with relevance scores in Qdrant.
The reason being is that _asimilarity_search_with_relevance_scores is
missing, this makes langchain_qdrant use the method of the vectorstore
baseclass leading to drastically different results.
To illustrate the magnitude here are the results running an identical
search in a test vectorstore.
Output of asimilarity_search_with_relevance_scores:
[0.9902903374601824, 0.9472135924938804, 0.8535534011299859]
Output of similarity_search_with_relevance_scores:
[0.9805806749203648, 0.8944271849877607, 0.7071068022599718]
Co-authored-by: Erick Friis <erick@langchain.dev>
I made some changes based on the issues I stumbled on while following
the README of neo4j-semantic-ollama.
I made the changes to the ollama variant, and can also port the relevant
ones to the layer variant once this is approved.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** the template neo4j-semantic-ollama uses an import from
the neo4j-semantic-layer template instead of its own.
Co-authored-by: Erick Friis <erick@langchain.dev>
Latest langchain-cohere sdk mandates passing in the model parameter into
the Embeddings and Reranker inits.
This PR is to update the docs to reflect these changes.
Thank you for contributing to LangChain!
- [ ] **HuggingFaceEndpoint**: "Skip Login to HuggingFaceHub"
- Where: langchain, community, llm, huggingface_endpoint
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Skip login to huggingface hub when when
`huggingfacehub_api_token` is not set. This is needed when using custom
`endpoint_url` outside of HuggingFaceHub.
- **Issue:** the issue # it fixes
https://github.com/langchain-ai/langchain/issues/20342 and
https://github.com/langchain-ai/langchain/issues/19685
- **Dependencies:** None
- [ ] **Add tests and docs**:
1. Tested with locally available TGI endpoint
2. Example Usage
```python
from langchain_community.llms import HuggingFaceEndpoint
llm = HuggingFaceEndpoint(
endpoint_url='http://localhost:8080',
server_kwargs={
"headers": {"Content-Type": "application/json"}
}
)
resp = llm.invoke("Tell me a joke")
print(resp)
```
Also tested against HF Endpoints
```python
from langchain_community.llms import HuggingFaceEndpoint
huggingfacehub_api_token = "hf_xyz"
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
huggingfacehub_api_token=huggingfacehub_api_token,
repo_id=repo_id,
)
resp = llm.invoke("Tell me a joke")
print(resp)
```
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
**Description:** Add support for caching (standard + semantic) LLM
responses using Couchbase
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
If you use `refresh_schema=False`, then the metadata constraint doesn't
exist. ATM, we used default `None` in the constraint check, but then
`any` fails because it can't iterate over None value
- **Description:** `StuffDocumentsChain` uses `LLMChain` which is
deprecated by langchain runnables. `create_stuff_documents_chain` is the
replacement, but needs support for `document_variable_name` to allow
multiple uses of the chain within a longer chain.
- **Issue:** none
- **Dependencies:** none
Thank you for contributing to LangChain!
**Description**:
This PR fixes a bug described in the issue in #24064, when using the
AzureSearch Vectorstore with the asyncronous methods to do search which
is also the method used for the retriever. The proposed change includes
just change the access of the embedding as optional because is it not
used anywhere to retrieve documents. Actually, the syncronous methods of
retrieval do not use the embedding neither.
With this PR the code given by the user in the issue works.
```python
vectorstore = AzureSearch(
azure_search_endpoint=os.getenv("AI_SEARCH_ENDPOINT_SECRET"),
azure_search_key=os.getenv("AI_SEARCH_API_KEY"),
index_name=os.getenv("AI_SEARCH_INDEX_NAME_SECRET"),
fields=fields,
embedding_function=encoder,
)
retriever = vectorstore.as_retriever(search_type="hybrid", k=2)
await vectorstore.avector_search("what is the capital of France")
await retriever.ainvoke("what is the capital of France")
```
**Issue**:
The Azure Search Vectorstore is not working when searching for documents
with asyncronous methods, as described in issue #24064
**Dependencies**:
There are no extra dependencies required for this change.
---------
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
## Description
This PR introduces a new sparse embedding provider interface to work
with the new Qdrant implementation that will follow this PR.
Additionally, an implementation of this interface is provided with
https://github.com/qdrant/fastembed.
This PR will be followed by
https://github.com/Anush008/langchain/pull/3.
Disabled by default.
```python
from langchain_core.tools import tool
@tool(parse_docstring=True)
def foo(bar: str, baz: int) -> str:
"""The foo.
Args:
bar: this is the bar
baz: this is the baz
"""
return bar
foo.args_schema.schema()
```
```json
{
"title": "fooSchema",
"description": "The foo.",
"type": "object",
"properties": {
"bar": {
"title": "Bar",
"description": "this is the bar",
"type": "string"
},
"baz": {
"title": "Baz",
"description": "this is the baz",
"type": "integer"
}
},
"required": [
"bar",
"baz"
]
}
```
preventing issues like #22546
Notes:
- this will only affect release CI. We may want to consider adding
running unit tests with min versions to PR CI in some form
- because this only affects release CI, it could create annoying issues
releasing while I'm on vacation. Unless anyone feels strongly, I'll wait
to merge this til when I'm back
Refactor the code to use the existing InMemroyVectorStore.
This change is needed for another PR that moves some of the imports
around (and messes up the mock.patch in this file)
Description: ImagePromptTemplate for Multimodal llms like llava when
using Ollama
Twitter handle: https://x.com/a7ulr
Details:
When using llava models / any ollama multimodal llms and passing images
in the prompt as urls, langchain breaks with this error.
```python
image_url_components = image_url.split(",")
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'split'
```
From the looks of it, there was bug where the condition did check for a
`url` field in the variable but missed to actually assign it.
This PR fixes ImagePromptTemplate for Multimodal llms like llava when
using Ollama specifically.
@hwchase17
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
This adds an extractor interface and an implementation for HTML pages.
Extractors are used to create GraphVectorStore Links on loaded content.
**Twitter handle:** cbornet_
**Description:** There was missing some documentation regarding the
`filter` and `params` attributes in similarity search methods.
---------
Co-authored-by: rpereira <rafael.pereira@criticalsoftware.com>
Decisions to discuss:
1. is a new attr needed or could additional_kwargs be used for this
2. is raw_output a good name for this attr
3. should raw_output default to {} or None
4. should raw_output be included in serialization
5. do we need to update repr/str to exclude raw_output
- add version of AIMessageChunk.__add__ that can add many chunks,
instead of only 2
- In agenerate_from_stream merge and parse chunks in bg thread
- In output parse base classes do more work in bg threads where
appropriate
---------
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
This PR moves the in memory implementation to langchain-core.
* The implementation remains importable from langchain-community.
* Supporting utilities are marked as private for now.
mmemory in the description -> memory (corrected spelling mistake)
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Added link to list of built-in tools.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** Support PGVector in PebbloRetrievalQA
- Identity and Semantic Enforcement support for PGVector
- Refactor Vectorstore validation and name check
- Clear the overridden identity and semantic enforcement filters
- **Issue:** NA
- **Dependencies:** NA
- **Tests**: NA(already added)
- **Docs**: Updated
- **Twitter handle:** [@Raj__725](https://twitter.com/Raj__725)
**Description:** Fix for source path mismatch in PebbloSafeLoader. The
fix involves storing the full path in the doc metadata in VectorDB
**Issue:** NA, caught in internal testing
**Dependencies:** NA
**Add tests**: Updated tests
resolves https://github.com/langchain-ai/langchain/issues/23911
When an AIMessageChunk is instantiated, we attempt to parse tool calls
off of the tool_call_chunks.
Here we add a special-case to this parsing, where `""` will be parsed as
`{}`.
This is a reaction to how Anthropic streams tool calls in the case where
a function has no arguments:
```
{'id': 'toolu_01J8CgKcuUVrMqfTQWPYh64r', 'input': {}, 'name': 'magic_function', 'type': 'tool_use', 'index': 1}
{'partial_json': '', 'type': 'tool_use', 'index': 1}
```
The `partial_json` does not accumulate to a valid json string-- most
other providers tend to emit `"{}"` in this case.
Thank you for contributing to LangChain!
- [x] **PR title**: "IBM: Added WatsonxChat to chat models preview,
update passing params to invoke method"
- [x] **PR message**:
- **Description:** Added WatsonxChat passing params to invoke method,
added integration tests
- **Dependencies:** `ibm_watsonx_ai`
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR introduces a GraphStore component. GraphStore extends
VectorStore with the concept of links between documents based on
document metadata. This allows linking documents based on a variety of
techniques, including common keywords, explicit links in the content,
and other patterns.
This works with existing Documents, so it’s easy to extend existing
VectorStores to be used as GraphStores. The interface can be implemented
for any Vector Store technology that supports metadata, not only graph
DBs.
When retrieving documents for a given query, the first level of search
is done using classical similarity search. Next, links may be followed
using various traversal strategies to get additional documents. This
allows documents to be retrieved that aren’t directly similar to the
query but contain relevant information.
2 retrieving methods are added to the VectorStore ones :
* traversal_search which gets all linked documents up to a certain depth
* mmr_traversal_search which selects linked documents using an MMR
algorithm to have more diverse results.
If a depth of retrieval of 0 is used, GraphStore is effectively a
VectorStore. It enables an easy transition from a simple VectorStore to
GraphStore by adding links between documents as a second step.
An implementation for Apache Cassandra is also proposed.
See
https://github.com/datastax/ragstack-ai/blob/main/libs/knowledge-store/notebooks/astra_support.ipynb
for a notebook explaining how to use GraphStore and that shows that it
can answer correctly to questions that a simple VectorStore cannot.
**Twitter handle:** _cbornet
This PR rolls out part of the new proposed interface for vectorstores
(https://github.com/langchain-ai/langchain/pull/23544) to existing store
implementations.
The PR makes the following changes:
1. Adds standard upsert, streaming_upsert, aupsert, astreaming_upsert
methods to the vectorstore.
2. Updates `add_texts` and `aadd_texts` to be non required with a
default implementation that delegates to `upsert` and `aupsert` if those
have been implemented. The original `add_texts` and `aadd_texts` methods
are problematic as they spread object specific information across
document and **kwargs. (e.g., ids are not a part of the document)
3. Adds a default implementation to `add_documents` and `aadd_documents`
that delegates to `upsert` and `aupsert` respectively.
4. Adds standard unit tests to verify that a given vectorstore
implements a correct read/write API.
A downside of this implementation is that it creates `upsert` with a
very similar signature to `add_documents`.
The reason for introducing `upsert` is to:
* Remove any ambiguities about what information is allowed in `kwargs`.
Specifically kwargs should only be used for information common to all
indexed data. (e.g., indexing timeout).
*Allow inheriting from an anticipated generalized interface for indexing
that will allow indexing `BaseMedia` (i.e., allow making a vectorstore
for images/audio etc.)
`add_documents` can be deprecated in the future in favor of `upsert` to
make sure that users have a single correct way of indexing content.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
The `langchain_common.vectostore.Redis.delete()` must not be a
`@staticmethod`.
With the current implementation, it's not possible to have multiple
instances of Redis vectorstore because all versions must share the
`REDIS_URL`.
It's not conform with the base class.
**Description**: After reviewing the prompts API, it is clear that the
only way a user can explicitly mark an input variable as optional is
through the `MessagePlaceholder.optional` attribute. Otherwise, the user
must explicitly pass in the `input_variables` expected to be used in the
`BasePromptTemplate`, which will be validated upon execution. Therefore,
to semantically handle a `MessagePlaceholder` `variable_name` as
optional, we will treat the `variable_name` of `MessagePlaceholder` as a
`partial_variable` if it has been marked as optional. This approach
aligns with how the `variable_name` of `MessagePlaceholder` is already
handled
[here](https://github.com/keenborder786/langchain/blob/optional_input_variables/libs/core/langchain_core/prompts/chat.py#L991).
Additionally, an attribute `optional_variable` has been added to
`BasePromptTemplate`, and the `variable_name` of `MessagePlaceholder` is
also made part of `optional_variable` when marked as optional.
Moreover, the `get_input_schema` method has been updated for
`BasePromptTemplate` to differentiate between optional and non-optional
variables.
**Issue**: #22832, #21425
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Description: Fixed a typo during the imports for the
GoogleDriveSearchTool
Issue: It's only for the docs, but it bothered me so i decided to fix it
quickly :D
- **Description:** Enhance JiraAPIWrapper to accept the 'cloud'
parameter through an environment variable. This update allows more
flexibility in configuring the environment for the Jira API.
- **Twitter handle:** Andre_Q_Pereira
---------
Co-authored-by: André Quintino <andre.quintino@tui.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR adds a `SingleStoreDBSemanticCache` class that implements a
cache based on SingleStoreDB vector store, integration tests, and a
notebook example.
Additionally, this PR contains minor changes to SingleStoreDB vector
store:
- change add texts/documents methods to return a list of inserted ids
- implement delete(ids) method to delete documents by list of ids
- added drop() method to drop a correspondent database table
- updated integration tests to use and check functionality implemented
above
CC: @baskaryan, @hwchase17
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
It's a follow-up to https://github.com/langchain-ai/langchain/pull/23765
Now the tools can be bound by calling `bind_tools`
```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_community.chat_models import ChatLiteLLM
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
prompt = "Which city is hotter today and which is bigger: LA or NY?"
# tools = [convert_to_openai_tool(GetWeather), convert_to_openai_tool(GetPopulation)]
tools = [GetWeather, GetPopulation]
llm = ChatLiteLLM(model="claude-3-sonnet-20240229").bind_tools(tools)
ai_msg = llm.invoke(prompt)
print(ai_msg.tool_calls)
```
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Igor Drozdov <idrozdov@gitlab.com>
This PR should fix the following issue:
https://github.com/langchain-ai/langchain/issues/23824
Introduced as part of this PR:
https://github.com/langchain-ai/langchain/pull/23416
I am unable to reproduce the issue locally though it's clear that we're
getting a `serialized` object which is not a dictionary somehow.
The test below passes for me prior to the PR as well
```python
def test_cache_with_sqllite() -> None:
from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
cache = SQLiteCache(database_path=".langchain.db")
set_llm_cache(cache)
chat_model = FakeListChatModel(responses=["hello", "goodbye"], cache=True)
assert chat_model.invoke("How are you?").content == "hello"
assert chat_model.invoke("How are you?").content == "hello"
```
- Description: Add support for `path` and `detail` keys in
`ImagePromptTemplate`. Previously, only variables associated with the
`url` key were considered. This PR allows for the inclusion of a local
image path and a detail parameter as input to the format method.
- Issues:
- fixes#20820
- related to #22024
- Dependencies: None
- Twitter handle: @DeschampsTho5
---------
Co-authored-by: tdeschamps <tdeschamps@kameleoon.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
The mongdb have some errors.
- `add_texts() -> List` returns a list of `ObjectId`, and not a list of
string
- `delete()` with `id` never remove chunks.
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
enviroment -> environment
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
Use pydantic to infer nested schemas and all that fun.
Include bagatur's convenient docstring parser
Include annotation support
Previously we didn't adequately support many typehints in the
bind_tools() method on raw functions (like optionals/unions, nested
types, etc.)
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Added support for streaming in AI21 Jamba Model
- **Twitter handle:** https://github.com/AI21Labs
- [x] **Add tests and docs**: If you're adding a new integration, please
include
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
---------
Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
**Description:** Update docs content on agent memory
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
`ChatAnthropic` can get `stop_reason` from the resulting `AIMessage` in
`invoke` and `ainvoke`, but not in `stream` and `astream`.
This is a different behavior from `ChatOpenAI`.
It is possible to get `stop_reason` from `stream` as well, since it is
needed to determine the next action after the LLM call. This would be
easier to handle in situations where only `stop_reason` is needed.
- Issue: NA
- Dependencies: NA
- Twitter handle: https://x.com/kiarina37
- **Description:** Fix some issues in MiniMaxChat
- Fix `minimax_api_host` not in `values` error
- Remove `minimax_group_id` from reading environment variables, the
`minimax_group_id` no longer use in MiniMaxChat
- Invoke callback prior to yielding token, the issus #16913
The prompt template variable detection only worked for singly-nested
sections because we just kept track of whether we were in a section and
then set that to false as soon as we encountered an end block. i.e. the
following:
```
{{#outerSection}}
{{variableThatShouldntShowUp}}
{{#nestedSection}}
{{nestedVal}}
{{/nestedSection}}
{{anotherVariableThatShouldntShowUp}}
{{/outerSection}}
```
Would yield `['outerSection', 'anotherVariableThatShouldntShowUp']` as
input_variables (whereas it should just yield `['outerSection']`). This
fixes that by keeping track of the current depth and using a stack.
When `model_kwargs={"tools": tools}` are passed to `ChatLiteLLM`, they
are executed, but the response is not recognized correctly
Let's add `tool_calls` to the `additional_kwargs`
Thank you for contributing to LangChain!
## ChatAnthropic
I used the following example to verify the output of llm with tools:
```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_anthropic import ChatAnthropic
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
llm = ChatAnthropic(model="claude-3-sonnet-20240229")
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke("Which city is hotter today and which is bigger: LA or NY?")
print(ai_msg.tool_calls)
```
I get the following response:
```json
[{'name': 'GetWeather', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_01UfDA89knrhw3vFV9X47neT'}, {'name': 'GetWeather', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01NrYVRYae7m7z7tBgyPb3Gd'}, {'name': 'GetPopulation', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_01EPFEpDgzL6vV2dTpD9SVP5'}, {'name': 'GetPopulation', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01B5J6tPJXgwwfhQX9BHP2dt'}]
```
## LiteLLM
Based on https://litellm.vercel.app/docs/completion/function_call
```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
import litellm
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
prompt = "Which city is hotter today and which is bigger: LA or NY?"
tools = [convert_to_openai_tool(GetWeather), convert_to_openai_tool(GetPopulation)]
response = litellm.completion(model="claude-3-sonnet-20240229", messages=[{'role': 'user', 'content': prompt}], tools=tools)
print(response.choices[0].message.tool_calls)
```
```python
[ChatCompletionMessageToolCall(function=Function(arguments='{"location": "Los Angeles, CA"}', name='GetWeather'), id='toolu_01HeDWV5vP7BDFfytH5FJsja', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"location": "New York, NY"}', name='GetWeather'), id='toolu_01EiLesUSEr3YK1DaE2jxsQv', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"location": "Los Angeles, CA"}', name='GetPopulation'), id='toolu_01Xz26zvkBDRxEUEWm9pX6xa', type='function'), ChatCompletionMessageToolCall(function=Function(arguments='{"location": "New York, NY"}', name='GetPopulation'), id='toolu_01SDqKnsLjvUXuBsgAZdEEpp', type='function')]
```
## ChatLiteLLM
When I try the following
```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.utils.function_calling import convert_to_openai_tool
from langchain_community.chat_models import ChatLiteLLM
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
prompt = "Which city is hotter today and which is bigger: LA or NY?"
tools = [convert_to_openai_tool(GetWeather), convert_to_openai_tool(GetPopulation)]
llm = ChatLiteLLM(model="claude-3-sonnet-20240229", model_kwargs={"tools": tools})
ai_msg = llm.invoke(prompt)
print(ai_msg)
print(ai_msg.tool_calls)
```
```python
content="Okay, let's find out the current weather and populations for Los Angeles and New York City:" response_metadata={'token_usage': Usage(prompt_tokens=329, completion_tokens=193, total_tokens=522), 'model': 'claude-3-sonnet-20240229', 'finish_reason': 'tool_calls'} id='run-748b7a84-84f4-497e-bba1-320bd4823937-0'
[]
```
---
When I apply the changes of this PR, the output is
```json
[{'name': 'GetWeather', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_017D2tGjiaiakB1HadsEFZ4e'}, {'name': 'GetWeather', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01WrDpJfVqLkPejWzonPCbLW'}, {'name': 'GetPopulation', 'args': {'location': 'Los Angeles, CA'}, 'id': 'toolu_016UKyYrVAV9Pz99iZGgGU7V'}, {'name': 'GetPopulation', 'args': {'location': 'New York, NY'}, 'id': 'toolu_01Sgv1imExFX1oiR1Cw88zKy'}]
```
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Igor Drozdov <idrozdov@gitlab.com>
Description:
1. partners/HuggingFace module support reading params from env. Not
adjust langchain_community/.../huggingfaceXX modules since they are
deprecated.
2. pydantic 2 @root_validator migration.
Issue: #22448#22819
---------
Co-authored-by: gongwn1 <gongwn1@lenovo.com>
**Description**: Milvus vectorstore supports both `add_documents` via
the base class and `upsert` method which deletes and re-adds documents
based on their ids
**Issue**: Due to mismatch in the interfaces the ids used by `upsert`
are neglected in `add_documents`, as `ids` are passed as argument in
`upsert` but via `kwargs` is `add_documents`
This caused exceptions and inconsistency in the DB, tested with
`auto_id=False`
**Fix**: pass `ids` via `kwargs` to `add_documents`
added pre-filtering documentation
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** added filter vector search
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:**: n/a
- [x] **Add tests and docs**: If you're adding a new integration, please
include - No need for tests, just a simple doc update
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
# Fix streaming in mistral with ainvoke
- [x] **PR title**
- [x] **PR message**
- [x] **Add tests and docs**:
1. [x] Added a test for the fixed integration.
2. [x] An example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Ran `make format`, `make lint` and `make test`
from the root of the package(s) I've modified.
Hello
* I Identified an issue in the mistral package where the callback
streaming (see on_llm_new_token) was not functioning correctly when the
streaming parameter was set to True and call with `ainvoke`.
* The root cause of the problem was the streaming not taking into
account. ( I think it's an oversight )
* To resolve the issue, I added the `streaming` attribut.
* Now, the callback with streaming works as expected when the streaming
parameter is set to True.
## How to reproduce
```
from langchain_mistralai.chat_models import ChatMistralAI
chain = ChatMistralAI(streaming=True)
# Add a callback
chain.ainvoke(..)
# Oberve on_llm_new_token
# Now, the callback is given as streaming tokens, before it was in grouped format.
```
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR implements a BaseContent object from which Document and Blob
objects will inherit proposed here:
https://github.com/langchain-ai/langchain/pull/23544
Alternative: Create a base object that only has an identifier and no
metadata.
For now decided against it, since that refactor can be done at a later
time. It also feels a bit odd since our IDs are optional at the moment.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This fix is for #21726. When having other packages installed that
require the `openai_api_base` environment variable, users are not able
to instantiate the AzureChatModels or AzureEmbeddings.
This PR adds a new value `ignore_openai_api_base` which is a bool. When
set to True, it sets `openai_api_base` to `None`
Two new tests were added for the `test_azure` and a new file
`test_azure_embeddings`
A different approach may be better for this. If you can think of better
logic, let me know and I can adjust it.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Fix#23716
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This PR introduces a maxsize parameter for the InMemoryCache class,
allowing users to specify the maximum number of items to store in the
cache. If the cache exceeds the specified maximum size, the oldest items
are removed. Additionally, comprehensive unit tests have been added to
ensure all functionalities are thoroughly tested. The tests are written
using pytest and cover both synchronous and asynchronous methods.
Twitter: @spyrosavl
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Fix LLM string representation for serializable objects.
Fix for issue: https://github.com/langchain-ai/langchain/issues/23257
The llm string of serializable chat models is the serialized
representation of the object. LangChain serialization dumps some basic
information about non serializable objects including their repr() which
includes an object id.
This means that if a chat model has any non serializable fields (e.g., a
cache), then any new instantiation of the those fields will change the
llm representation of the chat model and cause chat misses.
i.e., re-instantiating a postgres cache would result in cache misses!
**Description:** In the chat_models module of the language model, the
import statement for BaseModel has been moved from the conditionally
imported section to the main import area, fixing `NameError `.
**Issue:** fix `NameError `
- Description: Modified the prompt created by the function
`create_unstructured_prompt` (which is called for LLMs that do not
support function calling) by adding conditional checks that verify if
restrictions on entity types and rel_types should be added to the
prompt. If the user provides a sufficiently large text, the current
prompt **may** fail to produce results in some LLMs. I have first seen
this issue when I implemented a custom LLM class that did not support
Function Calling and used Gemini 1.5 Pro, but I was able to replicate
this issue using OpenAI models.
By loading a sufficiently large text
```python
from langchain_community.llms import Ollama
from langchain_openai import ChatOpenAI, OpenAI
from langchain_core.prompts import PromptTemplate
import re
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document
with open("texto-longo.txt", "r") as file:
full_text = file.read()
partial_text = full_text[:4000]
documents = [Document(page_content=partial_text)] # cropped to fit GPT 3.5 context window
```
And using the chat class (that has function calling)
```python
chat_openai = ChatOpenAI(model="gpt-3.5-turbo", model_kwargs={"seed": 42})
chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
```
It works:
```
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy of Man's Desiring", type='Music'), Node(id='Godel', type='Person'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='clever way of encoding the complicated expressions as numbers', type='Concept')]
```
But if you try to use the non-chat LLM class (that does not support
function calling)
```python
openai = OpenAI(
model="gpt-3.5-turbo-instruct",
max_tokens=1000,
)
gpt35_transformer = LLMGraphTransformer(llm=openai)
graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
```
It uses the prompt that has issues and sometimes does not produce any
result
```
>>> print(graph_from_gpt35[0].nodes)
[]
```
After implementing the changes, I was able to use both classes more
consistently:
```shell
>>> chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
>>> graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy Of Man'S Desiring", type='Music'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='Godel', type='Person')]
>>> gpt35_transformer = LLMGraphTransformer(llm=openai)
>>> graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_gpt35[0].nodes)
[Node(id='I', type='Pronoun'), Node(id="JESU, JOY OF MAN'S DESIRING", type='Song'), Node(id='larger memory', type='Memory'), Node(id='this nice tree structure', type='Structure'), Node(id='how you can do it all with the numbers', type='Process'), Node(id='JOHANN SEBASTIAN BACH', type='Composer'), Node(id='type of structure', type='Characteristic'), Node(id='that', type='Pronoun'), Node(id='we', type='Pronoun'), Node(id='worry', type='Verb')]
```
The results are a little inconsistent because the GPT 3.5 model may
produce incomplete json due to the token limit, but that could be solved
(or mitigated) by checking for a complete json when parsing it.
This PR adds a part of the indexing API proposed in this RFC
https://github.com/langchain-ai/langchain/pull/23544/files.
It allows rolling out `get_by_ids` which should be uncontroversial to
existing vectorstores without introducing new abstractions.
The semantics for this method depend on the ability of identifying
returned documents using the new optional ID field on documents:
https://github.com/langchain-ai/langchain/pull/23411
Alternatives are:
1. Relax the sequence requirement
```python
def get_by_ids(self, ids: Iterable[str], /) -> Iterable[Document]:
```
Rejected:
- implementations are more likley to start batching with bad defaults
- users would need to call list() or we'd need to introduce another
convenience method
2. Support more kwargs
```python
def get_by_ids(self, ids: Sequence[str], /, **kwargs) -> List[Document]:
...
```
Rejected:
- No need for `batch` parameter since IDs is a sequence
- Output cannot be customized since `Document` is fixed. (e.g.,
parameters could be useful to grab extra metadata like the vector that
was indexed with the Document or to project a part of the document)
**Description:** LanceDB didn't allow querying the database using
similarity score thresholds because the metrics value was missing. This
PR simply fixes that bug.
**Issue:** not applicable
**Dependencies:** none
**Twitter handle:** not available
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** At the moment the Jira wrapper only accepts the the
usage of the Username and Password/Token at the same time. However Jira
allows the connection using only is useful for enterprise context.
Co-authored-by: rpereira <rafael.pereira@criticalsoftware.com>
After merging the [PR #22594 to include Jina AI multimodal capabilities
in the Langchain
documentation](https://github.com/langchain-ai/langchain/pull/22594), we
updated the notebook to showcase the difference between text and
multimodal capabilities more clearly.
DOC: missing parenthesis #23687
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- Update Meta Llama 3 cookbook link
- Add prereq section and information on `messages_modifier` to LangGraph
migration guide
- Update `PydanticToolsParser` explanation and entrypoint in tool
calling guide
- Add more obvious warning to `OllamaFunctions`
- Fix Wikidata tool install flow
- Update Bedrock LLM initialization
@baskaryan can you add a bit of information on how to authenticate into
the `ChatBedrock` and `BedrockLLM` models? I wasn't able to figure it
out :(
This change adds a new message type `RemoveMessage`. This will enable
`langgraph` users to manually modify graph state (or have the graph
nodes modify the state) to remove messages by `id`
Examples:
* allow users to delete messages from state by calling
```python
graph.update_state(config, values=[RemoveMessage(id=state.values[-1].id)])
```
* allow nodes to delete messages
```python
graph.add_node("delete_messages", lambda state: [RemoveMessage(id=state[-1].id)])
```
- add test for structured output
- fix bug with structured output for Azure
- better testing on Groq (break out Mixtral + Llama3 and add xfails
where needed)
This PR modifies the API Reference in the following way:
1. Relist standard methods: invoke, ainvoke, batch, abatch,
batch_as_completed, abatch_as_completed, stream, astream,
astream_events. These are the main entry points for a lot of runnables,
so we'll keep them for each runnable.
2. Relist methods from Runnable Serializable: to_json,
configurable_fields, configurable_alternatives.
3. Expand the note in the API reference documentation to explain that
additional methods are available.
- **Description:** The name of ToolMessage is default to None, which
makes tool message send to LLM likes
```json
{"role": "tool",
"tool_call_id": "",
"content": "{\"time\": \"12:12\"}",
"name": null}
```
But the name seems essential for some LLMs like TongYi Qwen. so we need to set the name use agent_action's tool value.
- **Issue:** N/A
- **Dependencies:** N/A
- **Description:** Fixing the way users have to import Arxiv and
Semantic Scholar
- **Issue:** Changed to use `from langchain_community.tools.arxiv import
ArxivQueryRun` instead of `from langchain_community.tools.arxiv.tool
import ArxivQueryRun`
- **Dependencies:** None
- **Twitter handle:** Nope
This PR fixes an issue with not able to use unlimited/infinity tokens
from the respective provider for the LiteLLM provider.
This is an issue when working in an agent environment that the token
usage can drastically increase beyond the initial value set causing
unexpected behavior.
Descriptions: currently in the
[doc](https://python.langchain.com/v0.2/docs/how_to/extraction_examples/)
it sets "Data" as the LLM's structured output schema, however its
examples given to the LLM output's "Person", which causes the LLM to be
confused and might occasionally return "Person" as the function to call
issue: #23383
Co-authored-by: Lifu Wu <lifu@nextbillion.ai>
- **Description:** A small fix where I moved the `available_endpoints`
in order to avoid the token error in the below issue. Also I have added
conftest file and updated the `scripy`,`numpy` versions to support newer
python versions in poetry files.
- **Issue:** #22804
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Discovered alongside @t968914
- **Description:**
According to OpenAI docs, tool messages (response from calling tools)
must have a 'name' field.
https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models
- **Issue:** N/A (as of right now)
- **Dependencies:** N/A
- **Twitter handle:** N/A
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
This PR adds an optional ID field to the document schema.
# 1. Optional or Required
- An optional field will will requrie additional checking for the type
in user code (annoying).
- However, vectorstores currently don't respect this field. So if we
make it
required and start returning random UUIDs that might be even more
confusing
to users.
**Proposal**: Start with Optional and convert to Required (with default
set to uuid4()) in 1-2 major releases.
# 2. Override __str__ or generic solution in prompts
Overriding __str__ as a simple way to avoid changing user code that
relies on
default str(document) in prompts.
I considered rolling out a more general solution in prompts
(https://github.com/langchain-ai/langchain/pull/8685),
but to do that we need to:
1. Make things serializable
2. The more general solution would likely need to be backwards
compatible as well
3. It's unclear that one wants to format a List[int] in the same way as
List[Document]. The former should be `,` seperated (likely), the latter
should be `---` separated (likely).
**Proposal** Start with __str__ override and focus on the vectorstore
APIs, we generalize prompts later
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- Updates chat few shot prompt tutorial to show off a more cohesive
example
- Fix async Chromium loader guide
- Fix Excel loader install instructions
- Reformat Html2Text page
- Add install instructions to Azure OpenAI embeddings page
- Add missing dep install to SQL QA tutorial
@baskaryan
## Description
Created a helper method to make vector search indexes via client-side
pymongo.
**Recent Update** -- Removed error suppressing/overwriting layer in
favor of letting the original exception provide information.
## ToDo's
- [x] Make _wait_untils for integration test delete index
functionalities.
- [x] Add documentation for its use. Highlight it's experimental
- [x] Post Integration Test Results in a screenshot
- [x] Get review from MongoDB internal team (@shaneharvey, @blink1073 ,
@NoahStapp , @caseyclements)
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. Added new integration tests. Not eligible for unit testing since the
operation is Atlas Cloud specific.
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for contributing to LangChain!
- [X] **PR title**: "community: fix code example in ZenGuard docs"
- [X] **PR message**:
- **Description:** corrected the docs by indicating in the code example
that the tool accepts a list of prompts instead of just one
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for review
---------
Co-authored-by: Baur <baur.krykpayev@gmail.com>
- **Description:** This PR fixes an issue with SAP HANA Cloud QRC03
version. In that version the number to indicate no length being set for
a vector column changed from -1 to 0. The change in this PR support both
behaviours (old/new).
- **Dependencies:** No dependencies have been introduced.
- **Tests**: The change is covered by previous unit tests.
fixed potential `IndexError: list index out of range` in case there is
no title
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**langchain: ConversationVectorStoreTokenBufferMemory**
-**Description:** This PR adds ConversationVectorStoreTokenBufferMemory.
It is similar in concept to ConversationSummaryBufferMemory. It
maintains an in-memory buffer of messages up to a preset token limit.
After the limit is hit timestamped messages are written into a
vectorstore retriever rather than into a summary. The user's prompt is
then used to retrieve relevant fragments of the previous conversation.
By persisting the vectorstore, one can maintain memory from session to
session.
-**Issue:** n/a
-**Dependencies:** none
-**Twitter handle:** Please no!!!
- [X] **Add tests and docs**: I looked to see how the unit tests were
written for the other ConversationMemory modules, but couldn't find
anything other than a test for successful import. I need to know whether
you are using pytest.mock or another fixture to simulate the LLM and
vectorstore. In addition, I would like guidance on where to place the
documentation. Should it be a notebook file in docs/docs?
- [X] **Lint and test**: I am seeing some linting errors from a couple
of modules unrelated to this PR.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: update docs and add tool to init.py"
- [x] **PR message**:
- **Description:** Fixed some errors and comments in the docs and added
our ZenGuardTool and additional classes to init.py for easy access when
importing
- **Question:** when will you update the langchain-community package in
pypi to make our tool available?
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for review!
---------
Co-authored-by: Baur <baur.krykpayev@gmail.com>
These currently read off AIMessage.tool_calls, and only fall back to
OpenAI parsing if tool calls aren't populated.
Importing these from `openai_tools` (e.g., in our [tool calling
docs](https://python.langchain.com/v0.2/docs/how_to/tool_calling/#tool-calls))
can lead to confusion.
After landing, would need to release core and update docs.
Pydantic allows empty strings:
```
from langchain.pydantic_v1 import Field, BaseModel
class Property(BaseModel):
"""A single property consisting of key and value"""
key: str = Field(..., description="key")
value: str = Field(..., description="value")
x = Property(key="", value="")
```
Which can produce errors downstream. We simply ignore those records
bing_search_url is an endpoint to requests bing search resource and is
normally invariant to users, we can give it the default value to simply
the uesages of this utility/tool
Description: Add classifier_location feature flag. This flag enables
Pebblo to decide the classifier location, local or pebblo-cloud.
Unit Tests: N/A
Documentation: N/A
---------
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
The code snippet under ‘pdfs_qa’ contains an small incorrect code
example , resulting in users getting errors. This pr replaces ‘llm’
variable with ‘model’ to help user avoid a NameError message.
Resolves#22689
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Adds options for configuring MongoDBChatMessageHistory
(no breaking changes):
- session_id_key: name of the field that stores the session id
- history_key: name of the field that stores the chat history
- create_index: whether to create an index on the session id field
- index_kwargs: additional keyword arguments to pass to the index
creation
**Discussion:**
https://github.com/langchain-ai/langchain/discussions/22918
**Twitter handle:** @userlerueda
---------
Co-authored-by: Jib <Jibzade@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Add standard tests to base store abstraction. These only work on [str,
str] right now. We'll need to check if it's possible to add
encoder/decoders to generalize
**Description:**
This PR addresses an issue in the `MongodbLoader` where nested fields
were not being correctly extracted. The loader now correctly handles
nested fields specified in the `field_names` parameter.
**Issue:**
Fixes an issue where attempting to extract nested fields from MongoDB
documents resulted in `KeyError`.
**Dependencies:**
No new dependencies are required for this change.
**Twitter handle:**
(Optional, your Twitter handle if you'd like a mention when the PR is
announced)
### Changes
1. **Field Name Parsing**:
- Added logic to parse nested field names and safely extract their
values from the MongoDB documents.
2. **Projection Construction**:
- Updated the projection dictionary to include nested fields correctly.
3. **Field Extraction**:
- Updated the `aload` method to handle nested field extraction using a
recursive approach to traverse the nested dictionaries.
### Example Usage
Updated usage example to demonstrate how to specify nested fields in the
`field_names` parameter:
```python
loader = MongodbLoader(
connection_string=MONGO_URI,
db_name=MONGO_DB,
collection_name=MONGO_COLLECTION,
filter_criteria={"data.job.company.industry_name": "IT", "data.job.detail": { "$exists": True }},
field_names=[
"data.job.detail.id",
"data.job.detail.position",
"data.job.detail.intro",
"data.job.detail.main_tasks",
"data.job.detail.requirements",
"data.job.detail.preferred_points",
"data.job.detail.benefits",
],
)
docs = loader.load()
print(len(docs))
for doc in docs:
print(doc.page_content)
```
### Testing
Tested with a MongoDB collection containing nested documents to ensure
that the nested fields are correctly extracted and concatenated into a
single page_content string.
### Note
This change ensures backward compatibility for non-nested fields and
improves functionality for nested field extraction.
### Output Sample
```python
print(docs[:3])
```
```shell
# output sample:
[
Document(
# Here in this example, page_content is the combined text from the fields below
# "position", "intro", "main_tasks", "requirements", "preferred_points", "benefits"
page_content='all combined contents from the requested fields in the document',
metadata={'database': 'Your Database name', 'collection': 'Your Collection name'}
),
...
]
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [x] PR title:
community: Add OCI Generative AI new model support
- [x] PR message:
- Description: adding support for new models offered by OCI Generative
AI services. This is a moderate update of our initial integration PR
16548 and includes a new integration for our chat models under
/langchain_community/chat_models/oci_generative_ai.py
- Issue: NA
- Dependencies: No new Dependencies, just latest version of our OCI sdk
- Twitter handle: NA
- [x] Add tests and docs:
1. we have updated our unit tests
2. we have updated our documentation including a new ipynb for our new
chat integration
- [x] Lint and test:
`make format`, `make lint`, and `make test` run successfully
---------
Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
** Description**
This is the community integration of ZenGuard AI - the fastest
guardrails for GenAI applications. ZenGuard AI protects against:
- Prompts Attacks
- Veering of the pre-defined topics
- PII, sensitive info, and keywords leakage.
- Toxicity
- Etc.
**Twitter Handle** : @zenguardai
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. Added an integration test
2. Added colab
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.
---------
Co-authored-by: Nuradil <nuradil.maksut@icloud.com>
Co-authored-by: Nuradil <133880216+yaksh0nti@users.noreply.github.com>
They are now rejecting with code 401 calls from users with expired or
invalid tokens (while before they were being considered anonymous).
Thus, the authorization header has to be removed when there is no token.
Related to: #23178
---------
Signed-off-by: Joffref <mariusjoffre@gmail.com>
Description: 2 feature flags added to SharePointLoader in this PR:
1. load_auth: if set to True, adds authorised identities to metadata
2. load_extended_metadata, adds source, owner and full_path to metadata
Unit tests:N/A
Documentation: To be done.
---------
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
This fixes processing issue for nodes with numbers in their labels (e.g.
`"node_1"`, which would previously be relabeled as `"node__"`, and now
are correctly processed as `"node_1"`)
**Description:**
Fix "`TypeError: 'NoneType' object is not iterable`" when the
auth_context is absent in PebbloRetrievalQA. The auth_context is
optional; hence, PebbloRetrievalQA should work without it, but it throws
an error at the moment. This PR fixes that issue.
**Issue:** NA
**Dependencies:** None
**Unit tests:** NA
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: file_metadata_ was not getting propagated to returned
documents. Changed the lookup key to the name of the blob's path.
Changed blob.path key to blob.path.name for metadata_dict key lookup.
Documentation: N/A
Unit tests: N/A
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
Currently, the `langchain_pinecone` library forces the `async_req`
(asynchronous required) argument to Pinecone to `True`. This design
choice causes problems when deploying to environments that do not
support multiprocessing, such as AWS Lambda. In such environments, this
restriction can prevent users from successfully using
`langchain_pinecone`.
This PR introduces a change that allows users to specify whether they
want to use asynchronous requests by passing the `async_req` parameter
through `**kwargs`. By doing so, users can set `async_req=False` to
utilize synchronous processing, making the library compatible with AWS
Lambda and other environments that do not support multithreading.
**Issue:**
This PR does not address a specific issue number but aims to resolve
compatibility issues with AWS Lambda by allowing synchronous processing.
**Dependencies:**
None, that I'm aware of.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** When use
RunnableWithMessageHistory/SQLChatMessageHistory in async mode, we'll
get the following error:
```
Error in RootListenersTracer.on_chain_end callback: RuntimeError("There is no current event loop in thread 'asyncio_3'.")
```
which throwed by
ddfbca38df/libs/community/langchain_community/chat_message_histories/sql.py (L259).
and no message history will be add to database.
In this patch, a new _aexit_history function which will'be called in
async mode is added, and in turn aadd_messages will be called.
In this patch, we use `afunc` attribute of a Runnable to check if the
end listener should be run in async mode or not.
- **Issue:** #22021, #22022
- **Dependencies:** N/A
The SelfQuery PGVectorTranslator is not correct. The operator is "eq"
and not "$eq".
This patch use a new version of PGVectorTranslator from
langchain_postgres.
It's necessary to release a new version of langchain_postgres (see
[here](https://github.com/langchain-ai/langchain-postgres/pull/75)
before accepting this PR in langchain.
fix systax warning in `create_json_chat_agent`
```
.../langchain/agents/json_chat/base.py:22: SyntaxWarning: invalid escape sequence '\ '
"""Create an agent that uses JSON to format its logic, build for Chat Models.
```
- **Description:** AsyncRootListenersTracer support on_chat_model_start,
it's schema_format should be "original+chat".
- **Issue:** N/A
- **Dependencies:**
minor changes to module import error handling and minor issues in
tutorial documents.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
**Desscription**: When the ``sql_database.from_databricks`` is executed
from a Workflow Job, the ``context`` object does not have a
"browserHostName" property, resulting in an error. This change manages
the error so the "DATABRICKS_HOST" env variable value is used instead of
stoping the flow
Co-authored-by: lmorosdb <lmorosdb>
The return type of `json.loads` is `Any`.
In fact, the return type of `dumpd` must be based on `json.loads`, so
the correction here is understandable.
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- Fix bug with TypedDicts rendering inherited methods if inherting from
typing_extensions.TypedDict rather than typing.TypedDict
- Do not surface inherited pydantic methods for subclasses of BaseModel
- Subclasses of RunnableSerializable will not how methods inherited from
Runnable or from BaseModel
- Subclasses of Runnable that not pydantic models will include a link to
RunnableInterface (they still show inherited methods, we can fix this
later)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Description: Update Rag tutorial notebook so it includes an additional
notebook cell with pip installs of required langchain_chroma and
langchain_community.
This fixes the issue with the rag tutorial gives you a 'missing modules'
error if you run code in the notebook as is.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** Add optional max_messages to MessagePlaceholder
- **Issue:**
[16096](https://github.com/langchain-ai/langchain/issues/16096)
- **Dependencies:** None
- **Twitter handle:** @davedecaprio
Sometimes it's better to limit the history in the prompt itself rather
than the memory. This is needed if you want different prompts in the
chain to have different history lengths.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Thank you for contributing to LangChain!
**Description**
The current code snippet for `Fireworks` had incorrect parameters. This
PR fixes those parameters.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
currently we skip CI on diffs >= 300 files. think we should just run it
on all packages instead
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- Moved doc-strings below attribtues in TypedDicts -- seems to render
better on APIReference pages.
* Provided more description and some simple code examples
- **Description:** Restores compatibility with SQLAlchemy 1.4.x that was
broken since #18992 and adds a test run for this version on CI (only for
Python 3.11)
- **Issue:** fixes#19681
- **Dependencies:** None
- **Twitter handle:** `@krassowski_m`
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** sambanova sambaverse integration improvement: removed
input parsing that was changing raw user input, and was making to use
process prompt parameter as true mandatory
**Description:** `astream_events(version="v2")` didn't propagate
exceptions in `langchain-core<=0.2.6`, fixed in the #22916. This PR adds
a unit test to check that exceptions are propagated upwards.
Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
Added missed docstrings. Format docstrings to the consistent format
(used in the API Reference)
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
This raises ImportError due to a circular import:
```python
from langchain_core import chat_history
```
This does not:
```python
from langchain_core import runnables
from langchain_core import chat_history
```
Here we update `test_imports` to run each import in a separate
subprocess. Open to other ways of doing this!
Tests failing on master with
> FAILED
tests/unit_tests/embeddings/test_ovhcloud.py::test_ovhcloud_embed_documents
- ValueError: Request failed with status code: 401, {"message":"Bad
token; invalid JSON"}
Thank you for contributing to LangChain!
**Description:** Noticed an issue with when I was calling
`RecursiveJsonSplitter().split_json()` multiple times that I was getting
weird results. I found an issue where `chunks` list in the `_json_split`
method. If chunks is not provided when _json_split (which is the case
when split_json calls _json_split) then the same list is used for
subsequent calls to `_json_split`.
You can see this in the test case i also added to this commit.
Output should be:
```
[{'a': 1, 'b': 2}]
[{'c': 3, 'd': 4}]
```
Instead you get:
```
[{'a': 1, 'b': 2}]
[{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
```
---------
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
- **Description:** add `**request_kwargs` and expect `TimeError` in
`_fetch` function for AsyncHtmlLoader. This allows you to fill in the
kwargs parameter when using the `load()` method of the `AsyncHtmlLoader`
class.
Co-authored-by: Yucolu <yucolu@tencent.com>
#### Description
This MR defines a `ExperimentalMarkdownSyntaxTextSplitter` class. The
main goal is to replicate the functionality of the original
`MarkdownHeaderTextSplitter` which extracts the header stack as metadata
but with one critical difference: it keeps the whitespace of the
original text intact.
This draft reimplements the `MarkdownHeaderTextSplitter` with a very
different algorithmic approach. Instead of marking up each line of the
text individually and aggregating them back together into chunks, this
method builds each chunk sequentially and applies the metadata to each
chunk. This makes the implementation simpler. However, since it's
designed to keep white space intact its not a full drop in replacement
for the original. Since it is a radical implementation change to the
original code and I would like to get feedback to see if this is a
worthwhile replacement, should be it's own class, or is not a good idea
at all.
Note: I implemented the `return_each_line` parameter but I don't think
it's a necessary feature. I'd prefer to remove it.
This implementation also adds the following additional features:
- Splits out code blocks and includes the language in the `"Code"`
metadata key
- Splits text on the horizontal rule `---` as well
- The `headers_to_split_on` parameter is now optional - with sensible
defaults that can be overridden.
#### Issue
Keeping the whitespace keeps the paragraphs structure and the formatting
of the code blocks intact which allows the caller much more flexibility
in how they want to further split the individuals sections of the
resulting documents. This addresses the issues brought up by the
community in the following issues:
- https://github.com/langchain-ai/langchain/issues/20823
- https://github.com/langchain-ai/langchain/issues/19436
- https://github.com/langchain-ai/langchain/issues/22256
#### Dependencies
N/A
#### Twitter handle
@RyanElston
---------
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
# Description
This pull request aims to address specific issues related to the
ambiguity and error-proneness of the output types of certain output
parsers, as well as the absence of unit tests for some parsers. These
issues could potentially lead to runtime errors or unexpected behaviors
due to type mismatches when used, causing confusion for developers and
users. Through clarifying output types, this PR seeks to improve the
stability and reliability.
Therefore, this pull request
- fixes the `OutputType` of OutputParsers to be the expected type;
- e.g. `OutputType` property of `EnumOutputParser` raises `TypeError`.
This PR introduce a logic to extract `OutputType` from its attribute.
- and fixes the legacy API in OutputParsers like `LLMChain.run` to the
modern API like `LLMChain.invoke`;
- Note: For `OutputFixingParser`, `RetryOutputParser` and
`RetryWithErrorOutputParser`, this PR introduces `legacy` attribute with
False as default value in order to keep the backward compatibility
- and adds the tests for the `OutputFixingParser` and
`RetryOutputParser`.
The following table shows my expected output and the actual output of
the `OutputType` of OutputParsers.
I have used this table to fix `OutputType` of OutputParsers.
| Class Name of OutputParser | My Expected `OutputType` (after this PR)|
Actual `OutputType` [evidence](#evidence) (before this PR)| Fix Required
|
|---------|--------------|---------|--------|
| BooleanOutputParser | `<class 'bool'>` | `<class 'bool'>` | NO |
| CombiningOutputParser | `typing.Dict[str, Any]` | `TypeError` is
raised | YES |
| DatetimeOutputParser | `<class 'datetime.datetime'>` | `<class
'datetime.datetime'>` | NO |
| EnumOutputParser(enum=MyEnum) | `MyEnum` | `TypeError` is raised | YES
|
| OutputFixingParser | The same type as `self.parser.OutputType` | `~T`
| YES |
| CommaSeparatedListOutputParser | `typing.List[str]` |
`typing.List[str]` | NO |
| MarkdownListOutputParser | `typing.List[str]` | `typing.List[str]` |
NO |
| NumberedListOutputParser | `typing.List[str]` | `typing.List[str]` |
NO |
| JsonOutputKeyToolsParser | `typing.Any` | `typing.Any` | NO |
| JsonOutputToolsParser | `typing.Any` | `typing.Any` | NO |
| PydanticToolsParser | `typing.Any` | `typing.Any` | NO |
| PandasDataFrameOutputParser | `typing.Dict[str, Any]` | `TypeError` is
raised | YES |
| PydanticOutputParser(pydantic_object=MyModel) | `<class
'__main__.MyModel'>` | `<class '__main__.MyModel'>` | NO |
| RegexParser | `typing.Dict[str, str]` | `TypeError` is raised | YES |
| RegexDictParser | `typing.Dict[str, str]` | `TypeError` is raised |
YES |
| RetryOutputParser | The same type as `self.parser.OutputType` | `~T` |
YES |
| RetryWithErrorOutputParser | The same type as `self.parser.OutputType`
| `~T` | YES |
| StructuredOutputParser | `typing.Dict[str, Any]` | `TypeError` is
raised | YES |
| YamlOutputParser(pydantic_object=MyModel) | `MyModel` | `~T` | YES |
NOTE: In "Fix Required", "YES" means that it is required to fix in this
PR while "NO" means that it is not required.
# Issue
No issues for this PR.
# Twitter handle
- [hmdev3](https://twitter.com/hmdev3)
# Questions:
1. Is it required to create tests for legacy APIs `LLMChain.run` in the
following scripts?
- libs/langchain/tests/unit_tests/output_parsers/test_fix.py;
- libs/langchain/tests/unit_tests/output_parsers/test_retry.py.
2. Is there a more appropriate expected output type than I expect in the
above table?
- e.g. the `OutputType` of `CombiningOutputParser` should be
SOMETHING...
# Actual outputs (before this PR)
<div id='evidence'></div>
<details><summary>Actual outputs</summary>
## Requirements
- Python==3.9.13
- langchain==0.1.13
```python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import langchain
>>> langchain.__version__
'0.1.13'
>>> from langchain import output_parsers
```
### `BooleanOutputParser`
```python
>>> output_parsers.BooleanOutputParser().OutputType
<class 'bool'>
```
### `CombiningOutputParser`
```python
>>> output_parsers.CombiningOutputParser(parsers=[output_parsers.DatetimeOutputParser(), output_parsers.CommaSeparatedListOutputParser()]).OutputType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
raise TypeError(
TypeError: Runnable CombiningOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```
### `DatetimeOutputParser`
```python
>>> output_parsers.DatetimeOutputParser().OutputType
<class 'datetime.datetime'>
```
### `EnumOutputParser`
```python
>>> from enum import Enum
>>> class MyEnum(Enum):
... a = 'a'
... b = 'b'
...
>>> output_parsers.EnumOutputParser(enum=MyEnum).OutputType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
raise TypeError(
TypeError: Runnable EnumOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```
### `OutputFixingParser`
```python
>>> output_parsers.OutputFixingParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```
### `CommaSeparatedListOutputParser`
```python
>>> output_parsers.CommaSeparatedListOutputParser().OutputType
typing.List[str]
```
### `MarkdownListOutputParser`
```python
>>> output_parsers.MarkdownListOutputParser().OutputType
typing.List[str]
```
### `NumberedListOutputParser`
```python
>>> output_parsers.NumberedListOutputParser().OutputType
typing.List[str]
```
### `JsonOutputKeyToolsParser`
```python
>>> output_parsers.JsonOutputKeyToolsParser(key_name='tool').OutputType
typing.Any
```
### `JsonOutputToolsParser`
```python
>>> output_parsers.JsonOutputToolsParser().OutputType
typing.Any
```
### `PydanticToolsParser`
```python
>>> from langchain.pydantic_v1 import BaseModel
>>> class MyModel(BaseModel):
... a: int
...
>>> output_parsers.PydanticToolsParser(tools=[MyModel, MyModel]).OutputType
typing.Any
```
### `PandasDataFrameOutputParser`
```python
>>> output_parsers.PandasDataFrameOutputParser().OutputType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
raise TypeError(
TypeError: Runnable PandasDataFrameOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```
### `PydanticOutputParser`
```python
>>> output_parsers.PydanticOutputParser(pydantic_object=MyModel).OutputType
<class '__main__.MyModel'>
```
### `RegexParser`
```python
>>> output_parsers.RegexParser(regex='$', output_keys=['a']).OutputType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
raise TypeError(
TypeError: Runnable RegexParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```
### `RegexDictParser`
```python
>>> output_parsers.RegexDictParser(output_key_to_format={'a':'a'}).OutputType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
raise TypeError(
TypeError: Runnable RegexDictParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```
### `RetryOutputParser`
```python
>>> output_parsers.RetryOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```
### `RetryWithErrorOutputParser`
```python
>>> output_parsers.RetryWithErrorOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```
### `StructuredOutputParser`
```python
>>> from langchain.output_parsers.structured import ResponseSchema
>>> response_schemas = [ResponseSchema(name="foo",description="a list of strings",type="List[string]"),ResponseSchema(name="bar",description="a string",type="string"), ]
>>> output_parsers.StructuredOutputParser.from_response_schemas(response_schemas).OutputType
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
raise TypeError(
TypeError: Runnable StructuredOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```
### `YamlOutputParser`
```python
>>> output_parsers.YamlOutputParser(pydantic_object=MyModel).OutputType
~T
```
<div>
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
This change adds args_schema (pydantic BaseModel) to SearxSearchRun for
correct schema formatting on LLM function calls
Issue: currently using SearxSearchRun with OpenAI function calling
returns the following error "TypeError: SearxSearchRun._run() got an
unexpected keyword argument '__arg1' ".
This happens because the schema sent to the LLM is "input:
'{"__arg1":"foobar"}'" while the method should be called with the
"query" parameter.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Updated
*community.langchain_community.document_loaders.directory.py* to enable
the use of multiple glob patterns in the `DirectoryLoader` class. Now,
the glob parameter is of type `list[str] | str` and still defaults to
the same value as before. I updated the docstring of the class to
reflect this, and added a unit test to
*community.tests.unit_tests.document_loaders.test_directory.py* named
`test_directory_loader_glob_multiple`. This test also shows an example
of how to use the new functionality.
- ~~Issue:~~**Discussion Thread:**
https://github.com/langchain-ai/langchain/discussions/18559
- **Dependencies:** None
- **Twitter handle:** N/a
- [x] **Add tests and docs**
- Added test (described above)
- Updated class docstring
- [x] **Lint and test**
---------
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Fix https://github.com/langchain-ai/langchain/issues/22972.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
```SemanticChunker``` currently provide three methods to split the texts semantically:
- percentile
- standard_deviation
- interquartile
I propose new method ```gradient```. In this method, the gradient of distance is used to split chunks along with the percentile method (technically) . This method is useful when chunks are highly correlated with each other or specific to a domain e.g. legal or medical. The idea is to apply anomaly detection on gradient array so that the distribution become wider and easy to identify boundaries in highly semantic data.
I have tested this merge on a set of 10 domain specific documents (mostly legal).
Details :
- **Issue:** Improvement
- **Dependencies:** NA
- **Twitter handle:** [x.com/prajapat_ravi](https://x.com/prajapat_ravi)
@hwchase17
---------
Co-authored-by: Raviraj Prajapat <raviraj.prajapat@sirionlabs.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Add chat history store based on Kafka.
Files added:
`libs/community/langchain_community/chat_message_histories/kafka.py`
`docs/docs/integrations/memory/kafka_chat_message_history.ipynb`
New issue to be created for future improvement:
1. Async method implementation.
2. Message retrieval based on timestamp.
3. Support for other configs when connecting to cloud hosted Kafka (e.g.
add `api_key` field)
4. Improve unit testing & integration testing.
**Description:**
- What I changed
- By specifying the `id_key` during the initialization of
`EnsembleRetriever`, it is now possible to determine which documents to
merge scores for based on the value corresponding to the `id_key`
element in the metadata, instead of `page_content`. Below is an example
of how to use the modified `EnsembleRetriever`:
```python
retriever = EnsembleRetriever(retrievers=[ret1, ret2], id_key="id") #
The Document returned by each retriever must keep the "id" key in its
metadata.
```
- Additionally, I added a script to easily test the behavior of the
`invoke` method of the modified `EnsembleRetriever`.
- Why I changed
- There are cases where you may want to calculate scores by treating
Documents with different `page_content` as the same when using
`EnsembleRetriever`. For example, when you want to ensemble the search
results of the same document described in two different languages.
- The previous `EnsembleRetriever` used `page_content` as the basis for
score aggregation, making the above usage difficult. Therefore, the
score is now calculated based on the specified key value in the
Document's metadata.
**Twitter handle:** @shimajiroxyz
- **Description:** add tool_messages_formatter for tool calling agent,
make tool messages can be formatted in different ways for your LLM.
- **Issue:** N/A
- **Dependencies:** N/A
**Standardizing DocumentLoader docstrings (of which there are many)**
This PR addresses issue #22866 and adds docstrings according to the
issue's specified format (in the appendix) for files csv_loader.py and
json_loader.py in langchain_community.document_loaders. In particular,
the following sections have been added to both CSVLoader and JSONLoader:
Setup, Instantiate, Load, Async load, and Lazy load. It may be worth
adding a 'Metadata' section to the JSONLoader docstring to clarify how
we want to extract the JSON metadata (using the `metadata_func`
argument). The files I used to walkthrough the various sections were
`example_2.json` from
[HERE](https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files)
and `hw_200.csv` from
[HERE](https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html).
---------
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
- **Description:** A very small fix in the Docstring of
`DuckDuckGoSearchResults` identified in the following issue.
- **Issue:** #22961
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **PR title**: "community: Fix#22975 (Add SSL Verification Option to
Requests Class in langchain_community)"
- **PR message**:
- **Description:**
- Added an optional verify parameter to the Requests class with a
default value of True.
- Modified the get, post, patch, put, and delete methods to include the
verify parameter.
- Updated the _arequest async context manager to include the verify
parameter.
- Added the verify parameter to the GenericRequestsWrapper class and
passed it to the Requests class.
- **Issue:** This PR fixes issue #22975.
- **Dependencies:** No additional dependencies are required for this
change.
- **Twitter handle:** @lunara_x
You can check this change with below code.
```python
from langchain_openai.chat_models import ChatOpenAI
from langchain.requests import RequestsWrapper
from langchain_community.agent_toolkits.openapi import planner
from langchain_community.agent_toolkits.openapi.spec import reduce_openapi_spec
with open("swagger.yaml") as f:
data = yaml.load(f, Loader=yaml.FullLoader)
swagger_api_spec = reduce_openapi_spec(data)
llm = ChatOpenAI(model='gpt-4o')
swagger_requests_wrapper = RequestsWrapper(verify=False) # modified point
superset_agent = planner.create_openapi_agent(swagger_api_spec, swagger_requests_wrapper, llm, allow_dangerous_requests=True, handle_parsing_errors=True)
superset_agent.run(
"Tell me the number and types of charts and dashboards available."
)
```
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** The PR #22777 introduced a bug in
`_similarity_search_without_score` which was raising the
`OperationFailure` error. The mistake was syntax error for MongoDB
pipeline which has been corrected now.
- **Issue:** #22770
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Thank you for contributing to LangChain!
- [x] **PR title**: "community: OCI GenAI embedding batch size"
- [x] **PR message**:
- **Issue:** #22985
- [ ] **Add tests and docs**: N/A
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Signed-off-by: Anders Swanson <anders.swanson@oracle.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- StopIteration can't be set on an asyncio.Future it raises a TypeError
and leaves the Future pending forever so we need to convert it to a
RuntimeError
- Refactor standard test classes to make them easier to configure
- Update openai to support stop_sequences init param
- Update groq to support stop_sequences init param
- Update fireworks to support max_retries init param
- Update ChatModel.bind_tools to type tool_choice
- Update groq to handle tool_choice="any". **this may be controversial**
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Langchain is very popular among developers in China, but there are still
no good Chinese books or documents, so I want to add my own Chinese
resources on langchain topics, hoping to give Chinese readers a better
experience using langchain. This is not a translation of the official
langchain documentation, but my understanding.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Support batch size**
Baichuan updates the document, indicating that up to 16 documents can be
imported at a time
- **Standardized model init arg names**
- baichuan_api_key -> api_key
- model_name -> model
Here we add `stream_usage` to ChatOpenAI as:
1. a boolean attribute
2. a kwarg to _stream and _astream.
Question: should the `stream_usage` attribute be `bool`, or `bool |
None`?
Currently I've kept it `bool` and defaulted to False. It was implemented
on
[ChatAnthropic](e832bbb486/libs/partners/anthropic/langchain_anthropic/chat_models.py (L535))
as a bool. However, to maintain support for users who access the
behavior via OpenAI's `stream_options` param, this ends up being
possible:
```python
llm = ChatOpenAI(model_kwargs={"stream_options": {"include_usage": True}})
assert not llm.stream_usage
```
(and this model will stream token usage).
Some options for this:
- it's ok
- make the `stream_usage` attribute bool or None
- make an \_\_init\_\_ for ChatOpenAI, set a `._stream_usage` attribute
and read `.stream_usage` from a property
Open to other ideas as well.
**Description:** This PR adds a chat model integration for [Snowflake
Cortex](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions),
which gives an instant access to industry-leading large language models
(LLMs) trained by researchers at companies like Mistral, Reka, Meta, and
Google, including [Snowflake
Arctic](https://www.snowflake.com/en/data-cloud/arctic/), an open
enterprise-grade model developed by Snowflake.
**Dependencies:** Snowflake's
[snowpark](https://pypi.org/project/snowflake-snowpark-python/) library
is required for using this integration.
**Twitter handle:** [@gethouseware](https://twitter.com/gethouseware)
- [x] **Add tests and docs**:
1. integration tests:
`libs/community/tests/integration_tests/chat_models/test_snowflake.py`
2. unit tests:
`libs/community/tests/unit_tests/chat_models/test_snowflake.py`
3. example notebook: `docs/docs/integrations/chat/snowflake.ipynb`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Adds `response_metadata` to stream responses from OpenAI. This is
returned with `invoke` normally, but wasn't implemented for `stream`.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
## Description
While `YouRetriever` supports both You.com's Search and News APIs, news
is supported as an afterthought.
More specifically, not all of the News API parameters are exposed for
the user, only those that happen to overlap with the Search API.
This PR:
- improves support for both APIs, exposing the remaining News API
parameters while retaining backward compatibility
- refactor some REST parameter generation logic
- updates the docstring of `YouSearchAPIWrapper`
- add input validation and warnings to ensure parameters are properly
set by user
- 🚨 Breaking: Limit the news results to `k` items
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Ollama has a raw option now.
https://github.com/ollama/ollama/blob/main/docs/api.md
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
**Issue:**
When using the similarity_search_with_score function in
ElasticsearchStore, I expected to pass in the query_vector that I have
already obtained. I noticed that the _search function does support the
query_vector parameter, but it seems to be ineffective. I am attempting
to resolve this issue.
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
Update former pull request:
https://github.com/langchain-ai/langchain/pull/22654.
Modified `langchain_text_splitters.HTMLSectionSplitter`, where in the
latest version `dict` data structure is used to store sections from a
html document, in function `split_html_by_headers`. The header/section
element names serve as dict keys. This can be a problem when duplicate
header/section element names are present in a single html document.
Latter ones can replace former ones with the same name. Therefore some
contents can be miss after html text splitting is conducted.
Using a list to store sections can hopefully solve the problem. A Unit
test considering duplicate header names has been added.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
The generated relationships in the graph had no properties, but the
Relationship class was properly defined with properties. This made it
very difficult to transform conditional sentences into a graph. Adding
properties to relationships can solve this issue elegantly.
The changes expand on the existing LLMGraphTransformer implementation
but add the possibility to define allowed relationship properties like
this: LLMGraphTransformer(llm=llm, relationship_properties=["Condition",
"Time"],)
- **Issue:**
no issue found
- **Dependencies:**
n/a
- **Twitter handle:**
@IstvanSpace
-Quick Test
=================================================================
from dotenv import load_dotenv
import os
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import
LLMGraphTransformer
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document
load_dotenv()
os.environ["NEO4J_URI"] = os.getenv("NEO4J_URI")
os.environ["NEO4J_USERNAME"] = os.getenv("NEO4J_USERNAME")
os.environ["NEO4J_PASSWORD"] = os.getenv("NEO4J_PASSWORD")
graph = Neo4jGraph()
llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
llm_transformer = LLMGraphTransformer(llm=llm)
#text = "Harry potter likes pies, but only if it rains outside"
text = "Jack has a dog named Max. Jack only walks Max if it is sunny
outside."
documents = [Document(page_content=text)]
llm_transformer_props = LLMGraphTransformer(
llm=llm,
relationship_properties=["Condition"],
)
graph_documents_props =
llm_transformer_props.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents_props[0].nodes}")
print(f"Relationships:{graph_documents_props[0].relationships}")
graph.add_graph_documents(graph_documents_props)
---------
Co-authored-by: Istvan Lorincz <istvan.lorincz@pm.me>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Add admonition to the documentation to make sure users are aware that
the tool allows execution of code on the host machine using a python
interpreter (by design).
If the global `debug` flag is enabled, the agent will get the following
error in `FunctionCallbackHandler._on_tool_end` at runtime.
```
Error in ConsoleCallbackHandler.on_tool_end callback: AttributeError("'list' object has no attribute 'strip'")
```
By calling str() before strip(), the error was avoided.
This error can be seen at
[debugging.ipynb](https://github.com/langchain-ai/langchain/blob/master/docs/docs/how_to/debugging.ipynb).
- Issue: NA
- Dependencies: NA
- Twitter handle: https://x.com/kiarina37
Remove the REPL from community, and suggest an alternative import from
langchain_experimental.
Fix for this issue:
https://github.com/langchain-ai/langchain/issues/14345
This is not a bug in the code or an actual security risk. The python
REPL itself is behaving as expected.
The PR is done to appease blanket security policies that are just
looking for the presence of exec in the code.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR moves the validation of the decorator to a better place to avoid
creating bugs while deprecating code.
Prevent issues like this from arising:
https://github.com/langchain-ai/langchain/issues/22510
we should replace with a linter at some point that just does static
analysis
Preserves string content chunks for non tool call requests for
convenience.
One thing - Anthropic events look like this:
```
RawContentBlockStartEvent(content_block=TextBlock(text='', type='text'), index=0, type='content_block_start')
RawContentBlockDeltaEvent(delta=TextDelta(text='<thinking>\nThe', type='text_delta'), index=0, type='content_block_delta')
RawContentBlockDeltaEvent(delta=TextDelta(text=' provide', type='text_delta'), index=0, type='content_block_delta')
...
RawContentBlockStartEvent(content_block=ToolUseBlock(id='toolu_01GJ6x2ddcMG3psDNNe4eDqb', input={}, name='get_weather', type='tool_use'), index=1, type='content_block_start')
RawContentBlockDeltaEvent(delta=InputJsonDelta(partial_json='', type='input_json_delta'), index=1, type='content_block_delta')
```
Note that `delta` has a `type` field. With this implementation, I'm
dropping it because `merge_list` behavior will concatenate strings.
We currently have `index` as a special field when merging lists, would
it be worth adding `type` too?
If so, what do we set as a context block chunk? `text` vs.
`text_delta`/`tool_use` vs `input_json_delta`?
CC @ccurme @efriis @baskaryan
- **Description:** Some of the Cross-Encoder models provide scores in
pairs, i.e., <not-relevant score (higher means the document is less
relevant to the query), relevant score (higher means the document is
more relevant to the query)>. However, the `HuggingFaceCrossEncoder`
`score` method does not currently take into account the pair situation.
This PR addresses this issue by modifying the method to consider only
the relevant score if score is being provided in pair. The reason for
focusing on the relevant score is that the compressors select the top-n
documents based on relevance.
- **Issue:** #22556
- Please also refer to this
[comment](https://github.com/UKPLab/sentence-transformers/issues/568#issuecomment-729153075)
- **PR title**: [community] add chat model llamacpp
- **PR message**:
- **Description:** This PR introduces a new chat model integration with
llamacpp_python, designed to work similarly to the existing ChatOpenAI
model.
+ Work well with instructed chat, chain and function/tool calling.
+ Work with LangGraph (persistent memory, tool calling), will update
soon
- **Dependencies:** This change requires the llamacpp_python library to
be installed.
@baskaryan
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Updated ChatGroq doc string as per issue
https://github.com/langchain-ai/langchain/issues/22296:"langchain_groq:
updated docstring for ChatGroq in langchain_groq to match that of the
description (in the appendix) provided in issue
https://github.com/langchain-ai/langchain/issues/22296. "
Issue: This PR is in response to issue
https://github.com/langchain-ai/langchain/issues/22296, and more
specifically the ChatGroq model. In particular, this PR updates the
docstring for langchain/libs/partners/groq/langchain_groq/chat_model.py
by adding the following sections: Instantiate, Invoke, Stream, Async,
Tool calling, Structured Output, and Response metadata. I used the
template from the Anthropic implementation and referenced the Appendix
of the original issue post. I also noted that: `usage_metadata `returns
none for all ChatGroq models I tested; there is no mention of image
input in the ChatGroq documentation; unlike that of ChatHuggingFace,
`.stream(messages)` for ChatGroq returned blocks of output.
---------
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR adds the feature add Prem Template feature in ChatPremAI.
Additionally it fixes a minor bug for API auth error when API passed
through arguments.
Description: Adjusting the syntax for creating the vectorstore
collection (in the case of automatic embedding computation) for the most
idiomatic way to submit the stored secret name.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:**
Update the NVIDIA Riva tool documentation to use NVIDIA NIM for the LLM.
Show how to use NVIDIA NIMs and link to documentation for LangChain with
NIM.
---------
Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
This PR addresses several lint errors in the core package of LangChain.
Specifically, the following issues were fixed:
1.Unexpected keyword argument "required" for "Field" [call-arg]
2.tests/integration_tests/chains/test_cpal.py:263: error: Unexpected
keyword argument "narrative_input" for "QueryModel" [call-arg]
This should make it obvious that a few of the agents in langchain
experimental rely on the python REPL as a tool under the hood, and will
force users to opt-in.
This downgrades `Function/tool calling` from a h3 to an h4 which means
it'll no longer show up in the right sidebar, but any direct links will
still work. I think that is ok, but LMK if you disapprove.
CC @hwchase17 @eyurtsev @rlancemartin
We need to use a different version of numpy for py3.8 and py3.12 in
pyproject.
And so do projects that use that Python version range and import
langchain.
- **Twitter handle:** _cbornet
**Description**
sqlalchemy uses "sqlalchemy.engine.URL" type for db uri argument.
Added 'URL' type for compatibility.
**Issue**: None
**Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** This implements `show_progress` more consistently
(i.e. it is also added to the `HuggingFaceBgeEmbeddings` object).
- **Issue:** This implements `show_progress` more consistently in the
embeddings huggingface classes. Previously this could have been set via
`encode_kwargs`.
- **Dependencies:** None
- **Twitter handle:** @jonzeolla
… (#22795)
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** This PR updates the documentation to reflect the recent
code changes.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** A change I submitted recently introduced a bug in
`YoutubeLoader`'s `LINES` output format. In those conditions, curly
braces ("`{}`") creates a set, not a dictionary. This bugfix explicitly
specifies that a dictionary is created.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter:** lsloan_umich
- **Mastodon:**
[lsloan@mastodon.social](https://mastodon.social/@lsloan)
changed "# 🌟Recognition" to "### 🌟 Recognition" to match the rest of the
subheadings.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
Support for old clients (Thin and Thick) Oracle Vector Store
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
Support for old clients (Thin and Thick) Oracle Vector Store
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
Have our own local tests
---------
Co-authored-by: rohan.aggarwal@oracle.com <rohaagga@phoenix95642.dev3sub2phx.databasede3phx.oraclevcn.com>
- **Description:** Add a new format, `CHUNKS`, to
`langchain_community.document_loaders.youtube.YoutubeLoader` which
creates multiple `Document` objects from YouTube video transcripts
(captions), each of a fixed duration. The metadata of each chunk
`Document` includes the start time of each one and a URL to that time in
the video on the YouTube website.
I had implemented this for UMich (@umich-its-ai) in a local module, but
it makes sense to contribute this to LangChain community for all to
benefit and to simplify maintenance.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter:** lsloan_umich
- **Mastodon:**
[lsloan@mastodon.social](https://mastodon.social/@lsloan)
With regards to **tests and documentation**, most existing features of
the `YoutubeLoader` class are not tested. Only the
`YoutubeLoader.extract_video_id()` static method had a test. However,
while I was waiting for this PR to be reviewed and merged, I had time to
add a test for the chunking feature I've proposed in this PR.
I have added an example of using chunking to the
`docs/docs/integrations/document_loaders/youtube_transcript.ipynb`
notebook.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR add supports for Azure Cosmos DB for NoSQL vector store.
Summary:
Description: added vector store integration for Azure Cosmos DB for
NoSQL Vector Store,
Dependencies: azure-cosmos dependency,
Tag maintainer: @hwchase17, @baskaryan @efriis @eyurtsev
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** As pointed out in this issue #22770, DocumentDB
`similarity_search` does not support filtering through metadata which
this PR adds by passing in the parameter `filter`. Also this PR fixes a
minor Documentation error.
- **Issue:** #22770
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Ollama vision with messages in OpenAI-style support `{
"image_url": { "url": ... } }`
**Issue:** #22460
Added flexible solution for ChatOllama to support chat messages with
images. Works when you provide either `image_url` as a string or as a
dict with "url" inside (like OpenAI does). So it makes available to use
tuples with `ChatPromptTemplate.from_messages()`
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "langchain: Fix chain_filter.py to be compatible
with async"
- [ ] **PR message**:
- **Description:** chain_filter is not compatible with async.
- **Twitter handle:** pprados
- [X ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Signed-off-by: zhangwangda <zhangwangda94@163.com>
Co-authored-by: Prakul <discover.prakul@gmail.com>
Co-authored-by: Lei Zhang <zhanglei@apache.org>
Co-authored-by: Gin <ictgtvt@gmail.com>
Co-authored-by: wangda <38549158+daziz@users.noreply.github.com>
Co-authored-by: Max Mulatz <klappradla@posteo.net>
Thank you for contributing to LangChain!
### Description
Fix the example in the docstring of redis store.
Change the initilization logic and remove redundant check, enhance error
message.
### Issue
The example in docstring of how to use redis store was wrong.

### Dependencies
Nothing
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- [ ] **Miscellaneous updates and fixes**:
- **Description:** Handled error in querying; quotes in table names;
updated gpudb API
- **Issue:** Threw an error with an error message difficult to
understand if a query failed or returned no records
- **Dependencies:** Updated GPUDB API version to `7.2.0.9`
@baskaryan @hwchase17
- **Description:** allow to use partial variables to pass `top_k` and
`table_info`
- **Issue:** no
- **Dependencies:** no
- **Twitter handle:** @gymnstcs
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** This PR updates the `WandbTracer` to work with the
new RunV2 API so that wandb Traces logging works correctly for new
LangChain versions. Here's an example
[run](https://wandb.ai/parambharat/langchain-tracing/runs/wpm99ftq) from
the existing tests
- **Issue:** https://github.com/wandb/wandb/issues/7762
- **Twitter handle:** @ParamBharat
_If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17._
**Updated ChatHuggingFace doc string as per issue #22296**:
"langchain_huggingface: updated docstring for ChatHuggingFace in
langchain_huggingface to match that of the description (in the appendix)
provided in issue #22296. "
**Issue:** This PR is in response to issue #22296, and more specifically
ChatHuggingFace model. In particular, this PR updates the docstring for
langchain/libs/partners/hugging_face/langchain_huggingface/chat_models/huggingface.py
by adding the following sections: Instantiate, Invoke, Stream, Async,
Tool calling, and Response metadata. I used the template from the
Anthropic implementation and referenced the Appendix of the original
issue post. I also noted that: langchain_community hugging face llms do
not work with langchain_huggingface's ChatHuggingFace model (at least
for me); the .stream(messages) functionality of ChatHuggingFace only
returned a block of response.
---------
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: Bagatur <baskaryan@gmail.com>
LLMs struggle with Graph RAG, because it's different from vector RAG in
a way that you don't provide the whole context, only the answer and the
LLM has to believe. However, that doesn't really work a lot of the time.
However, if you wrap the context as function response the accuracy is
much better.
btw... `union[LLMChain, Runnable]` is linting fun, that's why so many
ignores
**Description:** this PR adds Volcengine Rerank capability to Langchain,
you can find Volcengine Rerank API from
[here](https://www.volcengine.com/docs/84313/1254474) &
[here](https://www.volcengine.com/docs/84313/1254605).
[Volcengine](https://www.volcengine.com/) is a cloud service platform
developed by ByteDance, the parent company of TikTok. You can obtain
Volcengine API AK/SK from
[here](https://www.volcengine.com/docs/84313/1254553).
**Dependencies:** VolcengineRerank depends on `volcengine` python
package.
**Twitter handle:** my twitter/x account is https://x.com/LastMonopoly
and I'd like a mention, thank you!
**Tests and docs**
1. integration test: `test_volcengine_rerank.py`
2. example notebook: `volcengine_rerank.ipynb`
**Lint and test**: I have run `make format`, `make lint` and `make test`
from the root of the package I've modified.
Hi 👋
First off, thanks a ton for your work on this 💚 Really appreciate what
you're providing here for the community.
## Description
This PR adds a basic language parser for the
[Elixir](https://elixir-lang.org/) programming language. The parser code
is based upon the approach outlined in
https://github.com/langchain-ai/langchain/pull/13318: it's using
`tree-sitter` under the hood and aligns with all the other `tree-sitter`
based parses added that PR.
The `CHUNK_QUERY` I'm using here is probably not the most sophisticated
one, but it worked for my application. It's a starting point to provide
"core" parsing support for Elixir in LangChain. It enables people to use
the language parser out in real world applications which may then lead
to further tweaking of the queries. I consider this PR just the ground
work.
- **Dependencies:** requires `tree-sitter` and `tree-sitter-languages`
from the extended dependencies
- **Twitter handle:**`@bitcrowd`
## Checklist
- [x] **PR title**: "package: description"
- [x] **Add tests and docs**
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.
<!-- If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. -->
The fact that we outsourced pgvector to another project has an
unintended effect. The mapping dictionary found by
`_get_builtin_translator()` cannot recognize the new version of pgvector
because it comes from another package.
`SelfQueryRetriever` no longer knows `PGVector`.
I propose to fix this by creating a global dictionary that can be
populated by various database implementations. Thus, importing
`langchain_postgres` will allow the registration of the `PGvector`
mapping.
But for the moment I'm just adding a lazy import
Furthermore, the implementation of _get_builtin_translator()
reconstructs the BUILTIN_TRANSLATORS variable with each invocation,
which is not very efficient. A global map would be an optimization.
- **Twitter handle:** pprados
@eyurtsev, can you review this PR? And unlock the PR [Add async mode for
pgvector](https://github.com/langchain-ai/langchain-postgres/pull/32)
and PR [community[minor]: Add SQL storage
implementation](https://github.com/langchain-ai/langchain/pull/22207)?
Are you in favour of a global dictionary-based implementation of
Translator?
## Description
This PR addresses a logging inconsistency in the `get_user_agent`
function. Previously, the function was using the root logger to log a
warning message when the "USER_AGENT" environment variable was not set.
This bypassed the custom logger `log` that was created at the start of
the module, leading to potential inconsistencies in logging behavior.
Changes:
- Replaced `logging.warning` with `log.warning` in the `get_user_agent`
function to ensure that the custom logger is used.
This change ensures that all logging in the `get_user_agent` function
respects the configurations of the custom logger, leading to more
consistent and predictable logging behavior.
## Dependencies
None
## Issue
None
## Tests and docs
☝🏻 see description
## `make format`, `make lint` & `cd libs/community; make test`
```shell
> make format
poetry run ruff format docs templates cookbook
1417 files left unchanged
poetry run ruff check --select I --fix docs templates cookbook
All checks passed!
```
```shell
> make lint
poetry run ruff check docs templates cookbook
All checks passed!
poetry run ruff format docs templates cookbook --diff
1417 files already formatted
poetry run ruff check --select I docs templates cookbook
All checks passed!
git grep 'from langchain import' docs/docs templates cookbook | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
```
~cd libs/community; make test~ too much dependencies for integration ...
```shell
> poetry run pytest tests/unit_tests
....
==== 884 passed, 466 skipped, 4447 warnings in 15.93s ====
```
I choose you randomly : @ccurme
Adding `UpstashRatelimitHandler` callback for rate limiting based on
number of chain invocations or LLM token usage.
For more details, see [upstash/ratelimit-py
repository](https://github.com/upstash/ratelimit-py) or the notebook
guide included in this PR.
Twitter handle: @cahidarda
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- Refactor streaming to use raw events;
- Add `stream_usage` class attribute and kwarg to stream methods that,
if True, will include separate chunks in the stream containing usage
metadata.
There are two ways to implement streaming with anthropic's python sdk.
They have slight differences in how they surface usage metadata.
1. [Use helper
functions](https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#streaming-helpers).
This is what we are doing now.
```python
count = 1
with client.messages.stream(**params) as stream:
for text in stream.text_stream:
snapshot = stream.current_message_snapshot
print(f"{count}: {snapshot.usage} -- {text}")
count = count + 1
final_snapshot = stream.get_final_message()
print(f"{count}: {final_snapshot.usage}")
```
```
1: Usage(input_tokens=8, output_tokens=1) -- Hello
2: Usage(input_tokens=8, output_tokens=1) -- !
3: Usage(input_tokens=8, output_tokens=1) -- How
4: Usage(input_tokens=8, output_tokens=1) -- can
5: Usage(input_tokens=8, output_tokens=1) -- I
6: Usage(input_tokens=8, output_tokens=1) -- assist
7: Usage(input_tokens=8, output_tokens=1) -- you
8: Usage(input_tokens=8, output_tokens=1) -- today
9: Usage(input_tokens=8, output_tokens=1) -- ?
10: Usage(input_tokens=8, output_tokens=12)
```
To do this correctly, we need to emit a new chunk at the end of the
stream containing the usage metadata.
2. [Handle raw
events](https://github.com/anthropics/anthropic-sdk-python?tab=readme-ov-file#streaming-responses)
```python
stream = client.messages.create(**params, stream=True)
count = 1
for event in stream:
print(f"{count}: {event}")
count = count + 1
```
```
1: RawMessageStartEvent(message=Message(id='msg_01Vdyov2kADZTXqSKkfNJXcS', content=[], model='claude-3-haiku-20240307', role='assistant', stop_reason=None, stop_sequence=None, type='message', usage=Usage(input_tokens=8, output_tokens=1)), type='message_start')
2: RawContentBlockStartEvent(content_block=TextBlock(text='', type='text'), index=0, type='content_block_start')
3: RawContentBlockDeltaEvent(delta=TextDelta(text='Hello', type='text_delta'), index=0, type='content_block_delta')
4: RawContentBlockDeltaEvent(delta=TextDelta(text='!', type='text_delta'), index=0, type='content_block_delta')
5: RawContentBlockDeltaEvent(delta=TextDelta(text=' How', type='text_delta'), index=0, type='content_block_delta')
6: RawContentBlockDeltaEvent(delta=TextDelta(text=' can', type='text_delta'), index=0, type='content_block_delta')
7: RawContentBlockDeltaEvent(delta=TextDelta(text=' I', type='text_delta'), index=0, type='content_block_delta')
8: RawContentBlockDeltaEvent(delta=TextDelta(text=' assist', type='text_delta'), index=0, type='content_block_delta')
9: RawContentBlockDeltaEvent(delta=TextDelta(text=' you', type='text_delta'), index=0, type='content_block_delta')
10: RawContentBlockDeltaEvent(delta=TextDelta(text=' today', type='text_delta'), index=0, type='content_block_delta')
11: RawContentBlockDeltaEvent(delta=TextDelta(text='?', type='text_delta'), index=0, type='content_block_delta')
12: RawContentBlockStopEvent(index=0, type='content_block_stop')
13: RawMessageDeltaEvent(delta=Delta(stop_reason='end_turn', stop_sequence=None), type='message_delta', usage=MessageDeltaUsage(output_tokens=12))
14: RawMessageStopEvent(type='message_stop')
```
Here we implement the second option, in part because it should make
things easier when implementing streaming tool calls in the near future.
This would add two new chunks to the stream-- one at the beginning and
one at the end-- with blank content and containing usage metadata. We
add kwargs to the stream methods and a class attribute allowing for this
behavior to be toggled. I enabled it by default. If we merge this we can
add the same kwargs / attribute to OpenAI.
Usage:
```python
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(
model="claude-3-haiku-20240307",
temperature=0
)
full = None
for chunk in model.stream("hi"):
full = chunk if full is None else full + chunk
print(chunk)
print(f"\nFull: {full}")
```
```
content='' id='run-8a20843f-25c7-4025-ad72-9add395899e3' usage_metadata={'input_tokens': 8, 'output_tokens': 0, 'total_tokens': 8}
content='Hello' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content='!' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' How' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' can' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' I' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' assist' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' you' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content=' today' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content='?' id='run-8a20843f-25c7-4025-ad72-9add395899e3'
content='' id='run-8a20843f-25c7-4025-ad72-9add395899e3' usage_metadata={'input_tokens': 0, 'output_tokens': 12, 'total_tokens': 12}
Full: content='Hello! How can I assist you today?' id='run-8a20843f-25c7-4025-ad72-9add395899e3' usage_metadata={'input_tokens': 8, 'output_tokens': 12, 'total_tokens': 20}
```
They cause `poetry lock` to take a ton of time, and `uv pip install` can
resolve the constraints from these toml files in trivial time
(addressing problem with #19153)
This allows us to properly upgrade lockfile dependencies moving forward,
which revealed some issues that were either fixed or type-ignored (see
file comments)
Corrected a typo in the AutoGPT example notebook. Changed "Needed synce
jupyter runs an async eventloop" to "Needed since Jupyter runs an async
event loop".
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
removed an extra space before the period in the "Click **Create
codespace on master**." line.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- [x] **Adding AsyncRootListener**: "langchain_core: Adding
AsyncRootListener"
- **Description:** Adding an AsyncBaseTracer, AsyncRootListener and
`with_alistener` function. This is to enable binding async root listener
to runnables. This currently only supported for sync listeners.
- **Issue:** None
- **Dependencies:** None
- [x] **Add tests and docs**: Added units tests and example snippet code
within the function description of `with_alistener`
- [x] **Lint and test**: Run make format_diff, make lint_diff and make
test
## Description
The `path` param is used to specify the local persistence directory,
which isn't required if using Qdrant server.
This is a breaking but necessary change.
This PR adds support for using Databricks Unity Catalog functions as
LangChain tools, which runs inside a Databricks SQL warehouse.
* An example notebook is provided.
The response.get("model", self.model_name) checks if the model key
exists in the response dictionary. If it does, it uses that value;
otherwise, it uses self.model_name.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** This PR corrects the return type in the docstring of
the `docs/api_reference/create_api_rst.py/_load_package_modules`
function. The return type was previously described as a list of
Co-authored-by: suganthsolamanraja <suganth.solamanraja@techjays..com>
langchain-together depends on langchain-openai ^0.1.8
langchain-openai 0.1.8 has langchain-core >= 0.2.2
Here we bump langchain-core to 0.2.2, just to pass minimum dependency
version tests.
Corrected the phrase "complete done" to "completely done" for better
grammatical accuracy and clarity in the Agents section of the README.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
More direct entrypoint for a common use-case. Meant to give people a
more hands-on intro to document loaders/loading data from different data
sources as well.
Some duplicate content for RAG and extraction (to show what you can do
with the loaded documents), but defers to the appropriate sections
rather than going too in-depth.
@baskaryan @hwchase17
decisions to discuss
- only chat models
- model_provider isn't based on any existing values like llm-type,
package names, class names
- implemented as function not as a wrapper ChatModel
- function name (init_model)
- in langchain as opposed to community or core
- marked beta
Thank you for contributing to LangChain!
**Description:** Adds Langchain support for Nomic Embed Vision
**Twitter handle:** nomic_ai,zach_nussbaum
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** This PR addresses an issue with an existing test that
was not effectively testing the intended functionality. The previous
test setup did not adequately validate the filtering of the labels in
neo4j, because the nodes and relationship in the test data did not have
any properties set. Without properties these labels would not have been
returned, regardless of the filtering.
---------
Co-authored-by: Oskar Hane <oh@oskarhane.com>
This PR adds a constructor `metadata_indexing` parameter to the
Cassandra vector store to allow optional fine-tuning of which fields of
the metadata are to be indexed.
This is a feature supported by the underlying CassIO library. Indexing
mode of "all", "none" or deny- and allow-list based choices are
available.
The rationale is, in some cases it's advisable to programmatically
exclude some portions of the metadata from the index if one knows in
advance they won't ever be used at search-time. this keeps the index
more lightweight and performant and avoids limitations on the length of
_indexed_ strings.
I added a integration test of the feature. I also added the possibility
of running the integration test with Cassandra on an arbitrary IP
address (e.g. Dockerized), via
`CASSANDRA_CONTACT_POINTS=10.1.1.5,10.1.1.6 poetry run pytest [...]` or
similar.
While I was at it, I added a line to the `.gitignore` since the mypy
_test_ cache was not ignored yet.
My X (Twitter) handle: @rsprrs.
**Description:** This PR adds a `USER_AGENT` env variable that is to be
used for web scraping. It creates a util to get that user agent and uses
it in the classes used for scraping in [this piece of
doc](https://python.langchain.com/v0.1/docs/use_cases/web_scraping/).
Identifying your scraper is considered a good politeness practice, this
PR aims at easing it.
**Issue:** `None`
**Dependencies:** `None`
**Twitter handle:** `None`
# package community: Fix SQLChatMessageHistory
## Description
Here is a rewrite of `SQLChatMessageHistory` to properly implement the
asynchronous approach. The code circumvents [issue
22021](https://github.com/langchain-ai/langchain/issues/22021) by
accepting a synchronous call to `def add_messages()` in an asynchronous
scenario. This bypasses the bug.
For the same reasons as in [PR
22](https://github.com/langchain-ai/langchain-postgres/pull/32) of
`langchain-postgres`, we use a lazy strategy for table creation. Indeed,
the promise of the constructor cannot be fulfilled without this. It is
not possible to invoke a synchronous call in a constructor. We
compensate for this by waiting for the next asynchronous method call to
create the table.
The goal of the `PostgresChatMessageHistory` class (in
`langchain-postgres`) is, among other things, to be able to recycle
database connections. The implementation of the class is problematic, as
we have demonstrated in [issue
22021](https://github.com/langchain-ai/langchain/issues/22021).
Our new implementation of `SQLChatMessageHistory` achieves this by using
a singleton of type (`Async`)`Engine` for the database connection. The
connection pool is managed by this singleton, and the code is then
reentrant.
We also accept the type `str` (optionally complemented by `async_mode`.
I know you don't like this much, but it's the only way to allow an
asynchronous connection string).
In order to unify the different classes handling database connections,
we have renamed `connection_string` to `connection`, and `Session` to
`session_maker`.
Now, a single transaction is used to add a list of messages. Thus, a
crash during this write operation will not leave the database in an
unstable state with a partially added message list. This makes the code
resilient.
We believe that the `PostgresChatMessageHistory` class is no longer
necessary and can be replaced by:
```
PostgresChatMessageHistory = SQLChatMessageHistory
```
This also fixes the bug.
## Issue
- [issue 22021](https://github.com/langchain-ai/langchain/issues/22021)
- Bug in _exit_history()
- Bugs in PostgresChatMessageHistory and sync usage
- Bugs in PostgresChatMessageHistory and async usage
- [issue
36](https://github.com/langchain-ai/langchain-postgres/issues/36)
## Twitter handle:
pprados
## Tests
- libs/community/tests/unit_tests/chat_message_histories/test_sql.py
(add async test)
@baskaryan, @eyurtsev or @hwchase17 can you check this PR ?
And, I've been waiting a long time for validation from other PRs. Can
you take a look?
- [PR 32](https://github.com/langchain-ai/langchain-postgres/pull/32)
- [PR 15575](https://github.com/langchain-ai/langchain/pull/15575)
- [PR 13200](https://github.com/langchain-ai/langchain/pull/13200)
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** The InMemoryVectorStore is a nice and simple vector
store implementation for quick development and debugging. The current
implementation is quite limited in its functionalities. This PR extends
the functionalities by adding utility function to persist the vector
store to a json file and to load it from a json file. We choose the json
file format because it allows inspection of the database contents in a
text editor, which is great for debugging. Furthermore, it adds a
`filter` keyword that can be used to filter out documents on their
`page_content` or `metadata`.
- **Issue:** -
- **Dependencies:** -
- **Twitter handle:** @Vincent_Min
- [ ] **community**: "vectorstore: added filtering support for LanceDB
vector store"
- [ ] **This PR adds filtering capabilities to LanceDB**:
- **Description:** In LanceDB filtering can be applied when searching
for data into the vectorstore. It is using the SQL language as mentioned
in the LanceDB documentation.
- **Issue:** #18235
- **Dependencies:** No
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
This PR adds deduplication of callback handlers in merge_configs.
Fix for this issue:
https://github.com/langchain-ai/langchain/issues/22227
The issue appears when the code is:
1) running python >=3.11
2) invokes a runnable from within a runnable
3) binds the callbacks to the child runnable from the parent runnable
using with_config
In this case, the same callbacks end up appearing twice: (1) the first
time from with_config, (2) the second time with langchain automatically
propagating them on behalf of the user.
Prior to this PR this will emit duplicate events:
```python
@tool
async def get_items(question: str, callbacks: Callbacks): # <--- Accept callbacks
"""Ask question"""
template = ChatPromptTemplate.from_messages(
[
(
"human",
"'{question}"
)
]
)
chain = template | chat_model.with_config(
{
"callbacks": callbacks, # <-- Propagate callbacks
}
)
return await chain.ainvoke({"question": question})
```
Prior to this PR this will work work correctly (no duplicate events):
```python
@tool
async def get_items(question: str, callbacks: Callbacks): # <--- Accept callbacks
"""Ask question"""
template = ChatPromptTemplate.from_messages(
[
(
"human",
"'{question}"
)
]
)
chain = template | chat_model
return await chain.ainvoke({"question": question}, {"callbacks": callbacks})
```
This will also work (as long as the user is using python >= 3.11) -- as
langchain will automatically propagate callbacks
```python
@tool
async def get_items(question: str,):
"""Ask question"""
template = ChatPromptTemplate.from_messages(
[
(
"human",
"'{question}"
)
]
)
chain = template | chat_model
return await chain.ainvoke({"question": question})
```
Thank you for contributing to LangChain!
**Description:** update to the Vectara / Langchain integration to
integrate new Vectara capabilities:
- Full RAG implemented as a Runnable with as_rag()
- Vectara chat supported with as_chat()
- Both support streaming response
- Updated documentation and example notebook to reflect all the changes
- Updated Vectara templates
**Twitter handle:** ofermend
**Add tests and docs**: no new tests or docs, but updated both existing
tests and existing docs
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** Updated dead link referencing chroma docs in Chroma
notebook under vectorstores
…s and Opensearch Semantic Cache
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [ ] **Packages affected**:
- community: fix `cosine_similarity` to support simsimd beyond 3.7.7
- partners/milvus: fix `cosine_similarity` to support simsimd beyond
3.7.7
- partners/mongodb: fix `cosine_similarity` to support simsimd beyond
3.7.7
- partners/pinecone: fix `cosine_similarity` to support simsimd beyond
3.7.7
- partners/qdrant: fix `cosine_similarity` to support simsimd beyond
3.7.7
- [ ] **Broadcast operation failure while using simsimd beyond v3.7.7**:
- **Description:** I was using simsimd 4.3.1 and the unsupported operand
type issue popped up. When I checked out the repo and ran the tests,
they failed as well (have attached a screenshot for that). Looks like it
is a variant of https://github.com/langchain-ai/langchain/issues/18022 .
Prior to 3.7.7, simd.cdist returned an ndarray but now it returns
simsimd.DistancesTensor which is ineligible for a broadcast operation
with numpy. With this change, it also remove the need to explicitly cast
`Z` to numpy array
- **Issue:** #19905
- **Dependencies:** No
- **Twitter handle:** https://x.com/GetzJoydeep
<img width="1622" alt="Screenshot 2024-05-29 at 2 50 00 PM"
src="https://github.com/langchain-ai/langchain/assets/31132555/fb27b383-a9ae-4a6f-b355-6d503b72db56">
- [ ] **Considerations**:
1. I started with community but since similar changes were there in
Milvus, MongoDB, Pinecone, and QDrant so I modified their files as well.
If touching multiple packages in one PR is not the norm, then I can
remove them from this PR and raise separate ones
2. I have run and verified that the tests work. Since, only MongoDB had
tests, I ran theirs and verified it works as well. Screenshots attached
:
<img width="1573" alt="Screenshot 2024-05-29 at 2 52 13 PM"
src="https://github.com/langchain-ai/langchain/assets/31132555/ce87d1ea-19b6-4900-9384-61fbc1a30de9">
<img width="1614" alt="Screenshot 2024-05-29 at 3 33 51 PM"
src="https://github.com/langchain-ai/langchain/assets/31132555/6ce1d679-db4c-4291-8453-01028ab2dca5">
I have added a test for simsimd. I feel it may not go well with the
CI/CD setup as installing simsimd is not a dependency requirement. I
have just imported simsimd to ensure simsimd cosine similarity is
invoked. However, its not a good approach. Suggestions are welcome and I
can make the required changes on the PR. Please provide guidance on the
same as I am new to the community.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
### Description
Add tools implementation to `ChatEdenAI`:
- `bind_tools()`
- `with_structured_output()`
### Documentation
Updated `docs/docs/integrations/chat/edenai.ipynb`
### Notes
We don´t support stream with tools as of yet. If stream is called with
tools we directly yield the whole message from `generate` (implemented
the same way as Anthropic did).
- [x] **PR title**: Update docstrings for OpenAI base.py
-**Description:** Updated the docstring of few OpenAI functions for a
better understanding of the function.
- **Issue:** #21983
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Noticing errors logged in some situations when tracing with Langsmith:
```python
from langchain_core.pydantic_v1 import BaseModel
from langchain_anthropic import ChatAnthropic
class AnswerWithJustification(BaseModel):
"""An answer to the user question along with justification for the answer."""
answer: str
justification: str
llm = ChatAnthropic(model="claude-3-haiku-20240307")
structured_llm = llm.with_structured_output(AnswerWithJustification)
list(structured_llm.stream("What weighs more a pound of bricks or a pound of feathers"))
```
```
Error in LangChainTracer.on_chain_end callback: AttributeError("'NoneType' object has no attribute 'append'")
[AnswerWithJustification(answer='A pound of bricks and a pound of feathers weigh the same amount.', justification='This is because a pound is a unit of mass, not volume. By definition, a pound of any material, whether bricks or feathers, will weigh the same - one pound. The physical size or volume of the materials does not matter when measuring by mass. So a pound of bricks and a pound of feathers both weigh exactly one pound.')]
```
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
The Vectorstore's API `as_retriever` doesn't expose explicitly the
parameters `search_type` and `search_kwargs` and so these are not well
documented.
This PR improves `as_retriever` for the Cassandra VectorStore by making
these parameters explicit.
NB: An alternative would have been to modify `as_retriever` in
`Vectorstore`. But there's probably a good reason these were not exposed
in the first place ? Is it because implementations may decide to not
support them and have fixed values when creating the
VectorStoreRetriever ?
- **Description:** Added support for using HuggingFacePipeline in
ChatHuggingFace (previously it was only usable with API endpoints,
probably by oversight).
- **Issue:** #19997
- **Dependencies:** none
- **Twitter handle:** none
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR introduces namespace support for Upstash Vector Store, which
would allow users to partition their data in the vector index.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:**
This PR fixes a rendering issue in the docs (Python notebook) of HANA
Cloud Vector Engine.
- **Issue:** N/A
- **Dependencies:** no new dependencies added
File of the fixed notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
## Description
This PR allows passing the HTMLSectionSplitter paths to xslt files. It
does so by fixing two trivial bugs with how passed paths were being
handled. It also changes the default value of the param `xslt_path` to
`None` so the special case where the file was part of the langchain
package could be handled.
## Issue
#22175
- [X] **PR title**: "community: added optional params to Airtable
table.all()"
- [X] **PR message**:
- **Description:** Add's **kwargs to AirtableLoader to allow for kwargs:
https://pyairtable.readthedocs.io/en/latest/api.html#pyairtable.Table.all
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** parakoopa88
- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
"community/embeddings: update oracleai.py"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Adding oracle VECTOR_ARRAY_T support.
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Tests are not impacted.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Done.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** When I was running the SparkLLMTextEmbeddings,
app_id, api_key and api_secret are all correct, but it cannot run
normally using the current URL.
```python
# example
from langchain_community.embeddings import SparkLLMTextEmbeddings
embedding= SparkLLMTextEmbeddings(
spark_app_id="my-app-id",
spark_api_key="my-api-key",
spark_api_secret="my-api-secret"
)
embedding= "hello"
print(spark.embed_query(text1))
```

So I updated the url and request body parameters according to
[Embedding_api](https://www.xfyun.cn/doc/spark/Embedding_api.html), now
it is runnable.
**Description:** [IPEX-LLM](https://github.com/intel-analytics/ipex-llm)
is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local
PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low
latency. This PR adds ipex-llm integrations to langchain for BGE
embedding support on both Intel CPU and GPU.
**Dependencies:** `ipex-llm`, `sentence-transformers`
**Contribution maintainer**: @Oscilloscope98
**tests and docs**:
- langchain/docs/docs/integrations/text_embedding/ipex_llm.ipynb
- langchain/docs/docs/integrations/text_embedding/ipex_llm_gpu.ipynb
-
langchain/libs/community/tests/integration_tests/embeddings/test_ipex_llm.py
---------
Co-authored-by: Shengsheng Huang <shannie.huang@gmail.com>
Anthropic's streaming treats tool calls as different content parts
(streamed back with a different index) from normal content in the
`content`.
This means that we need to update our chunk-merging logic to handle
chunks with multi-part content. The alternative is coerceing Anthropic's
responses into a string, but we generally like to preserve model
provider responses faithfully when we can. This will also likely be
useful for multimodal outputs in the future.
This current PR does unfortunately make `index` a magic field within
content parts, but Anthropic and OpenAI both use it at the moment to
determine order anyway. To avoid cases where we have content arrays with
holes and to simplify the logic, I've also restricted merging to chunks
in order.
TODO: tests
CC @baskaryan @ccurme @efriis
- This fixes all the tracing issues with people still using
get_relevant_docs, and a change we need for 0.3 anyway
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- **Description:** The `ApifyWrapper` class expects `apify_api_token` to
be passed as a named parameter or set as an environment variable. But
the corresponding field was missing in the class definition causing the
argument to be ignored when passed as a named param. This patch fixes
that.
- This is a pattern that shows up occasionally in langgraph questions,
people chain a graph to something else after, and want to pass the graph
some kwargs (eg. stream_mode)
- [x] How to: use a vector store to retrieve data
- [ ] How to: generate multiple queries to retrieve data for
- [x] How to: use contextual compression to compress the data retrieved
- [x] How to: write a custom retriever class
- [x] How to: add similarity scores to retriever results
^ done last month
- [x] How to: combine the results from multiple retrievers
- [x] How to: reorder retrieved results to mitigate the "lost in the
middle" effect
- [x] How to: generate multiple embeddings per document
^ this PR
- [ ] How to: retrieve the whole document for a chunk
- [ ] How to: generate metadata filters
- [ ] How to: create a time-weighted retriever
- [ ] How to: use hybrid vector and keyword retrieval
^ todo
1/ added section at start with full code
2/ removed retriever tool (was just distracting)
3/ added section on starting a new conversation
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
LangSmith and LangChain context var handling evolved in parallel since
originally we didn't expect people to want to interweave the decorator
and langchain code.
Once we get a new langsmith release, this PR will let you seemlessly
hand off between @traceable context and runnable config context so you
can arbitrarily nest code.
It's expected that this fails right now until we get another release of
the SDK
### Issue: #22299
### descriptions
The documentation appears to be wrong. When the user actually sets this
parameter "asynchronous" to be True, it fails because the __init__
function of FAISS class doesn't allow this parameter. In fact, most of
the class/instance functions of this class have both the sync/async
version, so it looks like what we need is just to remove this parameter
from the doc.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Lifu Wu <lifu@nextbillion.ai>
- [x] Docs Update: Ollama
- llm/ollama
- Switched to using llama3 as model with reference to templating and
prompting
- Added concurrency notes to llm/ollama docs
- chat_models/ollama
- Added concurrency notes to llm/ollama docs
- text_embedding/ollama
- include example for specific embedding models from Ollama
- **Description:** This PR contains a bugfix which result in malfunction
of multi-turn conversation in QianfanChatEndpoint and adaption for
ToolCall and ToolMessage
ChatOpenAI supports a kwarg `stream_options` which can take values
`{"include_usage": True}` and `{"include_usage": False}`.
Setting include_usage to True adds a message chunk to the end of the
stream with usage_metadata populated. In this case the final chunk no
longer includes `"finish_reason"` in the `response_metadata`. This is
the current default and is not yet released. Because this could be
disruptive to workflows, here we remove this default. The default will
now be consistent with OpenAI's API (see parameter
[here](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options)).
Examples:
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()
for chunk in llm.stream("hi"):
print(chunk)
```
```
content='' id='run-8cff4721-2acd-4551-9bf7-1911dae46b92'
content='Hello' id='run-8cff4721-2acd-4551-9bf7-1911dae46b92'
content='!' id='run-8cff4721-2acd-4551-9bf7-1911dae46b92'
content='' response_metadata={'finish_reason': 'stop'} id='run-8cff4721-2acd-4551-9bf7-1911dae46b92'
```
```python
for chunk in llm.stream("hi", stream_options={"include_usage": True}):
print(chunk)
```
```
content='' id='run-39ab349b-f954-464d-af6e-72a0927daa27'
content='Hello' id='run-39ab349b-f954-464d-af6e-72a0927daa27'
content='!' id='run-39ab349b-f954-464d-af6e-72a0927daa27'
content='' response_metadata={'finish_reason': 'stop'} id='run-39ab349b-f954-464d-af6e-72a0927daa27'
content='' id='run-39ab349b-f954-464d-af6e-72a0927daa27' usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}
```
```python
llm = ChatOpenAI().bind(stream_options={"include_usage": True})
for chunk in llm.stream("hi"):
print(chunk)
```
```
content='' id='run-59918845-04b2-41a6-8d90-f75fb4506e0d'
content='Hello' id='run-59918845-04b2-41a6-8d90-f75fb4506e0d'
content='!' id='run-59918845-04b2-41a6-8d90-f75fb4506e0d'
content='' response_metadata={'finish_reason': 'stop'} id='run-59918845-04b2-41a6-8d90-f75fb4506e0d'
content='' id='run-59918845-04b2-41a6-8d90-f75fb4506e0d' usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17}
```
Add kwargs in add_documents function
**langchain**: Add **kwargs in parent_document_retriever"
- **Add kwargs for `add_document` in `parent_document_retriever.py`**
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Issue: The `arXiv` page is missing the arxiv paper references from the
`langchain/cookbook`.
PR: Added the cookbook references.
Result: `Found 29 arXiv references in the 3 docs, 21 API Refs, 5
Templates, and 18 Cookbooks.` - much more references are visible now.
**Description:** Update langchainhub integration test dependency and add
an integration test for pulling private prompt
**Dependencies:** langchainhub 0.1.16
Change 'FIREWALL' to 'FIRECRAWL' as I believe this may have been in
error. Other docs refer to 'FIRECRAWL_API_KEY'.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
# Description
## Problem
`Runnable.get_graph` fails when `InputType` or `OutputType` property
raises `TypeError`.
-
003c98e5b4/libs/core/langchain_core/runnables/base.py (L250-L274)
-
003c98e5b4/libs/core/langchain_core/runnables/base.py (L394-L396)
This problem prevents getting a graph of `Runnable` objects whose
`InputType` or `OutputType` property raises `TypeError` but whose
`invoke` works well, such as `langchain.output_parsers.RegexParser`,
which I have already pointed out in #19792 that a `TypeError` would
occur.
## Solution
- Add `try-except` syntax to handle `TypeError` to the codes which get
`input_node` and `output_node`.
# Issue
- #19801
# Twitter Handle
- [hmdev3](https://twitter.com/hmdev3)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [ ] **PR title**: "Fix list handling in Clova embeddings example
documentation"
- Description:
Fixes a bug in the Clova Embeddings example documentation where
document_text was incorrectly wrapped in an additional list.
- Rationale
The embed_documents method expects a list, but the previous example
wrapped document_text in an unnecessary additional list, causing an
error. The updated example correctly passes document_text directly to
the method, ensuring it functions as intended.
Added the missing verb "is" and a comma to the text in the Prompt
Templates description within the Build a Simple LLM Application tutorial
for more clarity.
- **Description:** updated documentation for llama, falcona and gemma on
Vertex AI Model garden
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** NA
@lkuligin for review
---------
Co-authored-by: adityarane@google.com <adityarane@google.com>
Thank you for contributing to LangChain!
- [x] **PR title**: community: Add Zep Cloud components + docs +
examples
- [x] **PR message**:
We have recently released our new zep-cloud sdks that are compatible
with Zep Cloud (not Zep Open Source). We have also maintained our Cloud
version of langchain components (ChatMessageHistory, VectorStore) as
part of our sdks. This PRs goal is to port these components to langchain
community repo, and close the gap with the existing Zep Open Source
components already present in community repo (added
ZepCloudMemory,ZepCloudVectorStore,ZepCloudRetriever).
Also added a ZepCloudChatMessageHistory components together with an
expression language example ported from our repo. We have left the
original open source components intact on purpose as to not introduce
any breaking changes.
- **Issue:** -
- **Dependencies:** Added optional dependency of our new cloud sdk
`zep-cloud`
- **Twitter handle:** @paulpaliychuk51
- [x] **Add tests and docs**
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
3 fixes of DuckDB vector store:
- unify defaults in constructor and from_texts (users no longer have to
specify `vector_key`).
- include search similarity into output metadata (fixes#20969)
- significantly improve performance of `from_documents`
Dependencies: added Pandas to speed up `from_documents`.
I was thinking about CSV and JSON options, but I expect trouble loading
JSON values this way and also CSV and JSON options require storing data
to disk.
Anyway, the poetry file for langchain-community already contains a
dependency on Pandas.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Generates release notes based on a `git log` command with title names
Aiming to improve to splitting out features vs. bugfixes using
conventional commits in the coming weeks.
Will work for any monorepo packages
- **Description:** this PR gives clickhouse client the ability to use a
secure connection to the clickhosue server
- **Issue:** fixes#22082
- **Dependencies:** -
- **Twitter handle:** `_codingcoffee_`
Signed-off-by: Ameya Shenoy <shenoy.ameya@gmail.com>
Co-authored-by: Shresth Rana <shresth@grapevine.in>
OpenAI recently added a `stream_options` parameter to its chat
completions API (see [release
notes](https://platform.openai.com/docs/changelog/added-chat-completions-stream-usage)).
When this parameter is set to `{"usage": True}`, an extra "empty"
message is added to the end of a stream containing token usage. Here we
propagate token usage to `AIMessage.usage_metadata`.
We enable this feature by default. Streams would now include an extra
chunk at the end, **after** the chunk with
`response_metadata={'finish_reason': 'stop'}`.
New behavior:
```
[AIMessageChunk(content='', id='run-4b20dbe0-3817-4f62-b89d-03ef76f25bde'),
AIMessageChunk(content='Hello', id='run-4b20dbe0-3817-4f62-b89d-03ef76f25bde'),
AIMessageChunk(content='!', id='run-4b20dbe0-3817-4f62-b89d-03ef76f25bde'),
AIMessageChunk(content='', response_metadata={'finish_reason': 'stop'}, id='run-4b20dbe0-3817-4f62-b89d-03ef76f25bde'),
AIMessageChunk(content='', id='run-4b20dbe0-3817-4f62-b89d-03ef76f25bde', usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17})]
```
Old behavior (accessible by passing `stream_options={"include_usage":
False}` into (a)stream:
```
[AIMessageChunk(content='', id='run-1312b971-c5ea-4d92-9015-e6604535f339'),
AIMessageChunk(content='Hello', id='run-1312b971-c5ea-4d92-9015-e6604535f339'),
AIMessageChunk(content='!', id='run-1312b971-c5ea-4d92-9015-e6604535f339'),
AIMessageChunk(content='', response_metadata={'finish_reason': 'stop'}, id='run-1312b971-c5ea-4d92-9015-e6604535f339')]
```
From what I can tell this is not yet implemented in Azure, so we enable
only for ChatOpenAI.
Thank you for contributing to LangChain!
- [X] **PR title**: community: Updated langchain-community PremAI
documentation
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Hey, I'm Sasha. The SDK engineer from [Comet](https://comet.com).
This PR updates the CometTracer class.
Added metadata to CometTracerr. From now on, both chains and spans will
send it.
* Lint for usage of standard xml library
* Add forced opt-in for quip client
* Actual security issue is with underlying QuipClient not LangChain
integration (since the client is doing the parsing), but adding
enforcement at the LangChain level.
- **Description:** I've added a tab on embedding text with LangChain
using Hugging Face models to here:
https://python.langchain.com/v0.2/docs/how_to/embed_text/. HF was
mentioned in the running text, but not in the tabs, which I thought was
odd.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** No need, this is tiny :)
Also, I had a ton of issues with the poetry docs/lint install, so I
haven't linted this. Apologies for that.
cc @Jofthomas
- Tom Aarsen
If tool_use blocks and tool_calls with overlapping IDs are present,
prefer the values of the tool_calls. Allows for mutating AIMessages just
via tool_calls.
**PR message**:
Update `hub.pull("rlm/map-prompt")` to `hub.pull("rlm/reduce-prompt")`
in summarization.ipynb
**Description:**
Fix typo in prompt hub link from `reduce_prompt =
hub.pull("rlm/map-prompt")` to `reduce_prompt =
hub.pull("rlm/reduce-prompt")` following next issue
**Issue:** #22014
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
```python
class UsageMetadata(TypedDict):
"""Usage metadata for a message, such as token counts.
Attributes:
input_tokens: (int) count of input (or prompt) tokens
output_tokens: (int) count of output (or completion) tokens
total_tokens: (int) total token count
"""
input_tokens: int
output_tokens: int
total_tokens: int
```
```python
class AIMessage(BaseMessage):
...
usage_metadata: Optional[UsageMetadata] = None
"""If provided, token usage information associated with the message."""
...
```
- **Description:** When I was running the sparkllm, I found that the
default parameters currently used could no longer run correctly.
- original parameters & values:
- spark_api_url: "wss://spark-api.xf-yun.com/v3.1/chat"
- spark_llm_domain: "generalv3"
```python
# example
from langchain_community.chat_models import ChatSparkLLM
spark = ChatSparkLLM(spark_app_id="my_app_id",
spark_api_key="my_api_key", spark_api_secret="my_api_secret")
spark.invoke("hello")
```

So I updated them to 3.5 (same as sparkllm official website). After the
update, they can be used normally.
- new parameters & values:
- spark_api_url: "wss://spark-api.xf-yun.com/v3.5/chat"
- spark_llm_domain: "generalv3.5"
This pull request addresses and fixes exception handling in the
UpstageLayoutAnalysisParser and enhances the test coverage by adding
error exception tests for the document loader. These improvements ensure
robust error handling and increase the reliability of the system when
dealing with external API calls and JSON responses.
### Changes Made
1. Fix Request Exception Handling:
- Issue: The existing implementation of UpstageLayoutAnalysisParser did
not properly handle exceptions thrown by the requests library, which
could lead to unhandled exceptions and potential crashes.
- Solution: Added comprehensive exception handling for
requests.RequestException to catch any request-related errors. This
includes logging the error details and raising a ValueError with a
meaningful error message.
2. Add Error Exception Tests for Document Loader:
- New Tests: Introduced new test cases to verify the robustness of the
UpstageLayoutAnalysisLoader against various error scenarios. The tests
ensure that the loader gracefully handles:
- RequestException: Simulates network issues or invalid API requests to
ensure appropriate error handling and user feedback.
- JSONDecodeError: Simulates scenarios where the API response is not a
valid JSON, ensuring the system does not crash and provides clear error
messaging.
**Description:**
- Added propagation of document metadata from O365BaseLoader to
FileSystemBlobLoader (O365BaseLoader uses FileSystemBlobLoader under the
hood).
- This is done by passing dictionary `metadata_dict`: key=filename and
value=dictionary containing document's metadata
- Modified `FileSystemBlobLoader` to accept the `metadata_dict`, use
`mimetype` from it (if available) and pass metadata further into blob
loader.
**Issue:**
- `O365BaseLoader` under the hood downloads documents to temp folder and
then uses `FileSystemBlobLoader` on it.
- However metadata about the document in question is lost in this
process. In particular:
- `mime_type`: `FileSystemBlobLoader` guesses `mime_type` from the file
extension, but that does not work 100% of the time.
- `web_url`: this is useful to keep around since in RAG LLM we might
want to provide link to the source document. In order to work well with
document parsers, we pass the `web_url` as `source` (`web_url` is
ignored by parsers, `source` is preserved)
**Dependencies:**
None
**Twitter handle:**
@martintriska1
Please review @baskaryan
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "Add CloudBlobLoader"
- community: Add CloudBlobLoader
- [ ] **PR message**: Add cloud blob loader
- **Description:**
Langchain provides several approaches to read different file formats:
Specific loaders (`CVSLoader`) or blob-compatible loaders
(`FileSystemBlobLoader`). The only implementation proposed for
BlobLoader is `FileSystemBlobLoader`.
Many projects retrieve files from cloud storage. We propose a new
implementation of `BlobLoader` to read files from the three cloud
storage systems. The interface is strictly identical to
`FileSystemBlobLoader`. The only difference is the constructor, which
takes a cloud "url" object such as `s3://my-bucket`, `az://my-bucket`,
or `gs://my-bucket`.
By streamlining the process, this novel implementation eliminates the
requirement to pre-download files from cloud storage to local temporary
files (which are seldom removed).
The code relies on the
[CloudPathLib](https://cloudpathlib.drivendata.org/stable/) library to
interpret cloud URLs. This has been added as an optional dependency.
```Python
loader = CloudBlobLoader("s3://mybucket/id")
for blob in loader.yield_blobs():
print(blob)
```
- [X] **Dependencies:** CloudPathLib
- [X] **Twitter handle:** pprados
- [X] **Add tests and docs**: Add unit test, but it's easy to convert to
integration test, with some files in a cloud storage (see
`test_cloud_blob_loader.py`)
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.
Hello from Paris @hwchase17. Can you review this PR?
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
This PR contains 4 added functions:
- max_marginal_relevance_search_by_vector
- amax_marginal_relevance_search_by_vector
- max_marginal_relevance_search
- amax_marginal_relevance_search
I'm no langchain expert, but tried do inspect other vectorstore sources
like chroma, to build these functions for SurrealDB. If someone has some
changes for me, please let me know. Otherwise I would be happy, if these
changes are added to the repository, so that I can use the orignal repo
and not my local monkey patched version.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:https://github.com/arpitkumar980/langchain.git
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Fixed `AzureSearchVectorStoreRetriever` to account
for search_kwargs. More explanation is in the mentioned issue.
- **Issue:** #21492
---------
Co-authored-by: MAC <mac@MACs-MacBook-Pro.local>
Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [X] **PR title**: "docs: Chroma docstrings update"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [X] **PR message**:
- **Description:** Added and updated Chroma docstrings
- **Issue:** https://github.com/langchain-ai/langchain/issues/21983
- [X] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- only docs
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Description: This change adds args_schema (pydantic BaseModel) to
WikipediaQueryRun for correct schema formatting on LLM function calls
Issue: currently using WikipediaQueryRun with OpenAI function calling
returns the following error "TypeError: WikipediaQueryRun._run() got an
unexpected keyword argument '__arg1' ". This happens because the schema
sent to the LLM is "input: '{"__arg1":"Hunter x Hunter"}'" while the
method should be called with the "query" parameter.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Added [Scrapfly](https://scrapfly.io/) Web Loader integration. Scrapfly
is a web scraping API that allows extracting web page data into
accessible markdown or text datasets.
- __Description__: Added Scrapfly web loader for retrieving web page
data as markdown or text.
- Dependencies: scrapfly-sdk
- Twitter: @thealchemi1st
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Updates Meilisearch vectorstore for compatibility
with v1.8. Adds [”showRankingScore”:
true”](https://www.meilisearch.com/docs/reference/api/search#ranking-score)
in the search parameters and replaces `_semanticScore` field with `
_rankingScore`
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:**
- Extend AzureSearch with `maximal_marginal_relevance` (for vector and
hybrid search)
- Add construction `from_embeddings` - if the user has already embedded
the texts
- Add `add_embeddings`
- Refactor common parts (`_simple_search`, `_results_to_documents`,
`_reorder_results_with_maximal_marginal_relevance`)
- Add `vector_search_dimensions` as a parameter to the constructor to
avoid extra calls to `embed_query` (most of the time the user applies
the same model and knows the dimension)
**Issue:** none
**Dependencies:** none
- [x] **Add tests and docs**: The docstrings have been added to the new
functions, and unified for the existing ones. The example notebook is
great in illustrating the main usage of AzureSearch, adding the new
methods would only dilute the main content.
- [x] **Lint and test**
---------
Co-authored-by: Oleksii Pokotylo <oleksii.pokotylo@pwc.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Backwards compatible extension of the initialisation
interface of HanaDB to allow the user to specify
specific_metadata_columns that are used for metadata storage of selected
keys which yields increased filter performance. Any not-mentioned
metadata remains in the general metadata column as part of a JSON
string. Furthermore switched to executemany for batch inserts into
HanaDB.
**Issue:** N/A
**Dependencies:** no new dependencies added
**Twitter handle:** @sapopensource
---------
Co-authored-by: Martin Kolb <martin.kolb@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Added extra functionality to `CharacterTextSplitter`,
`TextSplitter` classes.
The user can select whether to append the separator to the previous
chunk with `keep_separator='end' ` or else prepend to the next chunk.
Previous functionality prepended by default to next chunk.
**Issue:** Fixes#20908
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Integrate RankLLM reranker (https://github.com/castorini/rank_llm) into
LangChain
An example notebook is given in
`docs/docs/integrations/retrievers/rankllm-reranker.ipynb`
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Bug code**: In
langchain_community/document_loaders/csv_loader.py:100
- **Description**: currently, when 'CSVLoader' reads the column as None
in the 'csv' file, it will report an error because the 'CSVLoader' does
not verify whether the column is of str type and does not consider how
to handle the corresponding 'row_data' when the column is' None 'in the
csv. This pr provides a solution.
- **Issue:** Fix#20699
- **thinking:**
1. Refer to the processing method for
'langchain_community/document_loaders/csv_loader.py:100' when **'v'**
equals'None', and apply the same method to '**k**'.
(Reference`csv.DictReader` ,**'k'** will only be None when `
len(columns) < len(number_row_data)` is established)
2. **‘k’** equals None only holds when it is the last column, and its
corresponding **'v'** type is a list. Therefore, I referred to the data
format in 'Document' and used ',' to concatenated the elements in the
list.(But I'm not sure if you accept this form, if you have any other
ideas, communicate)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Added revision_example prompt template to include the
revision request and revision examples in the revision chain.
**Issue:** Not Applicable
**Dependencies:** Not Applicable
**Twitter handle:** @nithinjp09
## Description
The existing public interface for `langchain_community.emeddings` is
broken. In this file, `__all__` is statically defined, but is
subsequently overwritten with a dynamic expression, which type checkers
like pyright do not support. pyright actually gives the following
diagnostic on the line I am requesting we remove:
[reportUnsupportedDunderAll](https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportUnsupportedDunderAll):
```
Operation on "__all__" is not supported, so exported symbol list may be incorrect
```
Currently, I get the following errors when attempting to use publicablly
exported classes in `langchain_community.emeddings`:
```python
import langchain_community.embeddings
langchain_community.embeddings.HuggingFaceEmbeddings(...) # error: "HuggingFaceEmbeddings" is not exported from module "langchain_community.embeddings" (reportPrivateImportUsage)
```
This is solved easily by removing the dynamic expression.
Thank you for contributing to LangChain!
- [X] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
**Description:**
Fix ChatDatabricsk in case that streaming response doesn't have role
field in delta chunk
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Updates docs so the example doesn't lead to a warning:
```
LangChainDeprecationWarning: Importing tools from langchain is deprecated. Importing from langchain will no longer be supported as of langchain==0.2.0. Please import from langchain-community instead:
`from langchain_community.tools import WikipediaQueryRun`.
To install langchain-community run `pip install -U langchain-community`.
```
## 'raise_for_status' parameter of WebBaseLoader works in sync load but
not in async load.
In webBaseLoader:
Sync load is calling `_scrape` and has `raise_for_status` properly
handled.
```
def _scrape(
self,
url: str,
parser: Union[str, None] = None,
bs_kwargs: Optional[dict] = None,
) -> Any:
from bs4 import BeautifulSoup
if parser is None:
if url.endswith(".xml"):
parser = "xml"
else:
parser = self.default_parser
self._check_parser(parser)
html_doc = self.session.get(url, **self.requests_kwargs)
if self.raise_for_status:
html_doc.raise_for_status()
if self.encoding is not None:
html_doc.encoding = self.encoding
elif self.autoset_encoding:
html_doc.encoding = html_doc.apparent_encoding
return BeautifulSoup(html_doc.text, parser, **(bs_kwargs or {}))
```
Async load is calling `_fetch` but missing `raise_for_status` logic.
```
async def _fetch(
self, url: str, retries: int = 3, cooldown: int = 2, backoff: float = 1.5
) -> str:
async with aiohttp.ClientSession() as session:
for i in range(retries):
try:
async with session.get(
url,
headers=self.session.headers,
ssl=None if self.session.verify else False,
cookies=self.session.cookies.get_dict(),
) as response:
return await response.text()
```
Co-authored-by: kefan.you <darkfss@sina.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "update IBM WatsonxLLM docs with deprecated
LLMChain"
- [x] **PR message**:
- **Description:** update IBM WatsonxLLM docs with deprecated LLMChain
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
**Title**: "langchain: OpenAI Assistants v2 api support"
***Descriptions***
- [x] "attachments" support added along with backward compatibility of
"file_ids"
- [x] "tool_resources" support added while creating new assistant
- [ ] "tool_choice" parameter support
- [ ] Streaming support
- **Dependencies:** OpenAI v2 API (openai>=1.23.0)
- **Twitter handle:** @skanta_rath
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- Updated docs to have an example to use Jamba instead of J2
---------
Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Tongyi uses different client for chat model and
vision model. This PR chooses proper client based on model name to
support both chat model and vision model. Reference [tongyi
document](https://help.aliyun.com/zh/dashscope/developer-reference/tongyi-qianwen-vl-plus-api?spm=a2c4g.11186623.0.0.27404c9a7upm11)
for details.
```
from langchain_core.messages import HumanMessage
from langchain_community.chat_models import ChatTongyi
llm = ChatTongyi(model_name='qwen-vl-max')
image_message = {
"image": "https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png"
}
text_message = {
"text": "summarize this picture",
}
message = HumanMessage(content=[text_message, image_message])
llm.invoke([message])
```
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
- if tap_output_iter/aiter is called multiple times for the same run
issue events only once
- if chat model run is tapped don't issue duplicate on_llm_new_token
events
- if first chunk arrives after run has ended do not emit it as a stream
event
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
- `llm_chain` becomes `Union[LLMChain, Runnable]`
- `.from_llm` creates a runnable
tested by verifying that docs/how_to/MultiQueryRetriever.ipynb runs
unchanged with sync/async invoke (and that it runs if we specifically
instantiate with LLMChain).
We add a tool and retriever for the [AskNews](https://asknews.app)
platform with example notebooks.
The retriever can be invoked with:
```py
from langchain_community.retrievers import AskNewsRetriever
retriever = AskNewsRetriever(k=3)
retriever.invoke("impact of fed policy on the tech sector")
```
To retrieve 3 documents in then news related to fed policy impacts on
the tech sector. The included notebook also includes deeper details
about controlling filters such as category and time, as well as
including the retriever in a chain.
The tool is quite interesting, as it allows the agent to decide how to
obtain the news by forming a query and deciding how far back in time to
look for the news:
```py
from langchain_community.tools.asknews import AskNewsSearch
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_openai import ChatOpenAI
tool = AskNewsSearch()
instructions = """You are an assistant."""
base_prompt = hub.pull("langchain-ai/openai-functions-template")
prompt = base_prompt.partial(instructions=instructions)
llm = ChatOpenAI(temperature=0)
asknews_tool = AskNewsSearch()
tools = [asknews_tool]
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
)
agent_executor.invoke({"input": "How is the tech sector being affected by fed policy?"})
```
---------
Co-authored-by: Emre <e@emre.pm>
Please let me know if you see any possible areas of improvement. I would
very much appreciate your constructive criticism if time allows.
**Description:**
- Added a aerospike vector store integration that utilizes
[Aerospike-Vector-Search](https://aerospike.com/products/vector-database-search-llm/)
add-on.
- Added both unit tests and integration tests
- Added a docker compose file for spinning up a test environment
- Added a notebook
**Dependencies:** any dependencies required for this change
- aerospike-vector-search
**Twitter handle:**
- No twitter, you can use my GitHub handle or LinkedIn if you'd like
Thanks!
---------
Co-authored-by: Jesse Schumacher <jschumacher@aerospike.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Closes#20561
This PR fixes MLX LLM stream `AttributeError`.
Recently, `mlx-lm` changed the token decoding logic, which affected the
LC+MLX integration.
Additionally, I made minor fixes such as: docs example broken link and
enforcing pipeline arguments (max_tokens, temp and etc) for invoke.
- **Issue:** #20561
- **Twitter handle:** @Prince_Canuma
Related to #20085
@baskaryan
Thank you for contributing to LangChain!
community:sparkllm[patch]: standardized init args
updated `spark_api_key` so that aliased to `api_key`. Added integration
test for `sparkllm` to test that it continues to set the same underlying
attribute.
updated temperature with Pydantic Field, added to the integration test.
Ran `make format`,`make test`, `make lint`, `make spell_check`
UpTrain has a new dashboard now that makes it easier to view projects
and evaluations. Using this requires specifying both project_name and
evaluation_name when performing evaluations. I have updated the code to
support it.
Thank you for contributing to LangChain!
- [x] **PR title**: "community: enable SupabaseVectorStore to support
extended table fields"
- [x] **PR message**:
- Added extension fields to the function _add_vectors so that users can
add other custom fields when insert a record into the database. eg:

---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**:
- Reference to `Collection` object is set to `None` when deleting a
collection `delete_collection()`
- Added utility method `reset_collection()` to allow recreating the
collection
- Moved collection creation out of `__init__` into
`__ensure_collection()` to be reused by object init and
`reset_collection()`
- `_collection` is now a property to avoid breaking changes
**Issues**:
- chroma-core/chroma#2213
**Twitter**: @t_azarov
Example error message:
line 206, in _get_python_function_required_args
if is_function_type and required[0] == "self":
~~~~~~~~^^^
IndexError: list index out of range
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
While integrating the xinference_embedding, we observed that the
downloaded dependency package is quite substantial in size. With a focus
on resource optimization and efficiency, if the project requirements are
limited to its vector processing capabilities, we recommend migrating to
the xinference_client package. This package is more streamlined,
significantly reducing the storage space requirements of the project and
maintaining a feature focus, making it particularly suitable for
scenarios that demand lightweight integration. Such an approach not only
boosts deployment efficiency but also enhances the application's
maintainability, rendering it an optimal choice for our current context.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Add `Origin/langchain` to Apify's client's user-agent
to attribute API activity to LangChain (at Apify, we aim to monitor our
integrations to evaluate whether we should invest more in the LangChain
integration regarding functionality and content)
**Issue:** None
**Dependencies:** None
**Twitter handle:** None
## Description
This PR implements local and dynamic mode in the Nomic Embed integration
using the inference_mode and device parameters. They work as documented
[here](https://docs.nomic.ai/reference/python-api/embeddings#local-inference).
<!-- If no one reviews your PR within a few days, please @-mention one
of baskaryan, efriis, eyurtsev, hwchase17. -->
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
These packages all import `LangSmithParams` which was released in
langchain-core==0.2.0.
N.B. we will need to release `openai` and then bump `langchain-openai`
in `together` and `upstage`.
Thank you for contributing to LangChain!
- [x] **PR title**: "docs: update notebook for latest Pinecone API +
serverless"
- [x] **PR message**: Published notebook is incompatible with latest
`pinecone-client` and not runnable. Updated for use with latest Pinecone
Python SDK. Also updated to be compatible with serverless indexes (only
index type available on Pinecone free tier).
- [x] **Add tests and docs**: N/A (tested in Colab)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
- https://app.asana.com/0/0/1207328087952499
Thank you for contributing to LangChain!
- [x] **PR title**: "docs: update notebook for new Pinecone API +
serverless"
- [x] **PR message**: The published notebook is not runnable after
`pinecone-client` v2, which is deprecated. `langchain-pinecone` is not
compatible with the latest `pinecone-client` (v4), so I hardcoded it to
the last v3. Also updated for serverless indexes (only index type
available on Pinecone free plan).
- [x] **Add tests and docs**: N/A (tested in Colab)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
- https://app.asana.com/0/0/1207328087952500
This PR fixes two mistakes in the import paths from community for the
json data aiding the cli migration to 0.2.
It is intended as a quick follow-up to
https://github.com/langchain-ai/langchain/pull/21913 .
@nicoloboschi FYI
ChatOpenaAI --> ChatOpenAI
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
Remove unnecessary print from voyageai embeddings
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- bind_tools interface is a better alternative.
- openai doesn't use functions but tools in its API now.
- the underlying content appears in some redirects, so will need to
investigate if we can remove.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Check if event stream is closed in memory loop.
Using try/except here to avoid race condition, but this may incur a
small overhead in versions prios to 3.11
Update tool calling using prompts.
- Add required concepts
- Update names of tool invoking function.
- Add doc-string to function, and add information about `config` (which
users often forget)
- Remove steps that show how to use single function only. This makes the
how-to guide a bit shorter and more to the point.
- Add diagram from another how-to guide that shows how the thing works
overall.
Since the LangChain based on many research papers, the LC documentation
has several references to the arXiv papers. It would be beneficial to
create a single page with all referenced papers.
PR:
1. Developed code to search the arXiv references in the LangChain
Documentation and the LangChain code base. Those references are included
in a newly generated documentation page.
2. Page is linked to the Docs menu.
Controversial:
1. The `arxiv_references` page is automatically generated. But this
generation now started only manually. It is not included in the doc
generation scripts. The reason for this is simple. I don't want to
mangle into the current documentation refactoring. If you think, we need
to regenerate this page in each build, let me know. Note: This script
has a dependency on the `arxiv` package.
2. The link for this page in the menu is not obvious.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Code:** langchain_community/embeddings/baichuan.py:82
- **Description:** When I make an error using 'baichuan embeddings', the
printed error message is wrapped (there is actually no need to wrap)
```python
# example
from langchain_community.embeddings import BaichuanTextEmbeddings
# error key
BAICHUAN_API_KEY = "sk-xxxxxxxxxxxxx"
embeddings = BaichuanTextEmbeddings(baichuan_api_key=BAICHUAN_API_KEY)
text_1 = "今天天气不错"
query_result = embeddings.embed_query(text_1)
```

There are 2 issues fixed here:
* In the notebook pandas dataframes are formatted as HTML in the cells.
On the documentation site the renderer that converts notebooks
incorrectly displays the raw HTML. I can't find any examples of where
this is working and so I am formatting the dataframes as text.
* Some incorrect table names were referenced resulting in errors.
The only change is replacing the word "operators" with "operates," to
make the sentence grammatically correct.
Thank you for contributing to LangChain!
- [x] **PR title**: "docs: Made a grammatical correction in
streaming.ipynb to use the word "operates" instead of the word
"operators""
- [x] **PR message**:
- **Description:** The use of the word "operators" was incorrect, given
the context and grammar of the sentence. This PR updates the
documentation to use the word "operates" instead of the word
"operators".
- **Issue:** Makes the documentation more easily understandable.
- **Dependencies:** -no dependencies-
- **Twitter handle:** --
- [x] **Add tests and docs**: Since no new integration is being made, no
new tests/example notebooks are required.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- **No formatting changes made to the documentation**
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- Remove double implementations of functions. The single input is just
taking up space.
- Added tool specific information for `async + showing invoke vs.
ainvoke.
- Added more general information about about `async` (this should live
in a different place eventually since it's not specific to tools).
- Changed ordering of custom tools (StructuredTool is simpler and should
appear before the inheritance)
- Improved the error handling section (not convinced it should be here
though)
- Add information about naitve tool calling capabilities
- Add information about standard langchain interface for tool calling
- Update description for tools
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
This PR improves on the `CassandraCache` and `CassandraSemanticCache`
classes, mainly in the constructor signature, and also introduces
several minor improvements around these classes.
### Init signature
A (sigh) breaking change is tentatively introduced to the constructor.
To me, the advantages outweigh the possible discomfort: the new syntax
places the DB-connection objects `session` and `keyspace` later in the
param list, so that they can be given a default value. This is what
enables the pattern of _not_ specifying them, provided one has
previously initialized the Cassandra connection through the versatile
utility method `cassio.init(...)`.
In this way, a much less unwieldy instantiation can be done, such as
`CassandraCache()` and `CassandraSemanticCache(embedding=xyz)`,
everything else falling back to defaults.
A downside is that, compared to the earlier signature, this might turn
out to be breaking for those doing positional instantiation. As a way to
mitigate this problem, this PR typechecks its first argument trying to
detect the legacy usage.
(And to make this point less tricky in the future, most arguments are
left to be keyword-only).
If this is considered too harsh, I'd like guidance on how to further
smoothen this transition. **Our plan is to make the pattern of optional
session/keyspace a standard across all Cassandra classes**, so that a
repeatable strategy would be ideal. A possibility would be to keep
positional arguments for legacy reasons but issue a deprecation warning
if any of them is actually used, to later remove them with 0.2 - please
advise on this point.
### Other changes
- class docstrings: enriched, completely moved to class level, added
note on `cassio.init(...)` pattern, added tiny sample usage code.
- semantic cache: revised terminology to never mention "distance" (it is
in fact a similarity!). Kept the legacy constructor param with a
deprecation warning if used.
- `llm_caching` notebook: uniform flow with the Cassandra and Astra DB
separate cases; better and Cassandra-first description; all imports made
explicit and from community where appropriate.
- cache integration tests moved to community (incl. the imported tools),
env var bugfix for `CASSANDRA_CONTACT_POINTS`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Patch Summary
community:openai[patch]: standardize init args
## Details
I made changes to the OpenAI Chat API wrapper test in the Langchain
open-source repository
- **File**: `libs/community/tests/unit_tests/chat_models/test_openai.py`
- **Changes**:
- Updated `max_retries` with Pydantic Field
- Updated the corresponding unit test
- **Related Issues**: #20085
- Updated max_retries with Pydantic Field, updated the unit test.
---------
Co-authored-by: JuHyung Son <sonju0427@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: updated Browserbase loader"
- [x] **PR message**:
Updates the Browserbase loader with more options and improved docs.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Do not prefix function signature
---
* Reason for this is that information is already present with tool
calling models.
* This will save on tokens for those models, and makes it more obvious
what the description is!
* The @tool can get more parameters to allow a user to re-introduce the
the signature if we want
To permit proper coercion of objects like the following:
```python
class MyAsyncCallable:
async def __call__(self, foo):
return await ...
class MyAsyncGenerator:
async def __call__(self, foo):
await ...
yield
```
This PR introduces a v2 implementation of astream events that removes
intermediate abstractions and fixes some issues with v1 implementation.
The v2 implementation significantly reduces relevant code that's
associated with the astream events implementation together with
overhead.
After this PR, the astream events implementation:
- Uses an async callback handler
- No longer relies on BaseTracer
- No longer relies on json patch
As a result of this re-write, a number of issues were discovered with
the existing implementation.
## Changes in V2 vs. V1
### on_chat_model_end `output`
The outputs associated with `on_chat_model_end` changed depending on
whether it was within a chain or not.
As a root level runnable the output was:
```python
"data": {"output": AIMessageChunk(content="hello world!", id='some id')}
```
As part of a chain the output was:
```
"data": {
"output": {
"generations": [
[
{
"generation_info": None,
"message": AIMessageChunk(
content="hello world!", id=AnyStr()
),
"text": "hello world!",
"type": "ChatGenerationChunk",
}
]
],
"llm_output": None,
}
},
```
After this PR, we will always use the simpler representation:
```python
"data": {"output": AIMessageChunk(content="hello world!", id='some id')}
```
**NOTE** Non chat models (i.e., regular LLMs) are still associated with
the more verbose format.
### Remove some `_stream` events
`on_retriever_stream` and `on_tool_stream` events were removed -- these
were not real events, but created as an artifact of implementing on top
of astream_log.
The same information is already available in the `x_on_end` events.
### Propagating Names
Names of runnables have been updated to be more consistent
```python
model = GenericFakeChatModel(messages=infinite_cycle).configurable_fields(
messages=ConfigurableField(
id="messages",
name="Messages",
description="Messages return by the LLM",
)
)
```
Before:
```python
"name": "RunnableConfigurableFields",
```
After:
```python
"name": "GenericFakeChatModel",
```
### on_retriever_end
on_retriever_end will always return `output` which is a list of
documents (rather than a dict containing a key called "documents")
### Retry events
Removed the `on_retry` callback handler. It was incorrectly showing that
the failed function being retried has invoked `on_chain_end`
https://github.com/langchain-ai/langchain/pull/21638/files#diff-e512e3f84daf23029ebcceb11460f1c82056314653673e450a5831147d8cb84dL1394
Add unit tests that show differences between sync / async versions when
streaming.
The inner on_chain_chunk event is missing if mixing sync and async
functionality. Likely due to missing tap_output_iter implementation on
the sync variant of `_transform_stream_with_config`
0.2 is not a breaking release for core (but it is for langchain and
community)
To keep the core+langchain+community packages in sync at 0.2, we will
relax deps throughout the ecosystem to tolerate `langchain-core` 0.2
## Description
This PR introduces the new `langchain-qdrant` partner package, intending
to deprecate the community package.
## Changes
- Moved the Qdrant vector store implementation `/libs/partners/qdrant`
with integration tests.
- The conditional imports of the client library are now regular with
minor implementation improvements.
- Added a deprecation warning to
`langchain_community.vectorstores.qdrant.Qdrant`.
- Replaced references/imports from `langchain_community` with either
`langchain_core` or by moving the definitions to the `langchain_qdrant`
package itself.
- Updated the Qdrant vector store documentation to reflect the changes.
## Testing
- `QDRANT_URL` and
[`QDRANT_API_KEY`](583e36bf6b)
env values need to be set to [run integration
tests](d608c93d1f)
in the [cloud](https://cloud.qdrant.tech).
- If a Qdrant instance is running at `http://localhost:6333`, the
integration tests will use it too.
- By default, tests use an
[`in-memory`](https://github.com/qdrant/qdrant-client?tab=readme-ov-file#local-mode)
instance(Not comprehensive).
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Erick Friis <erickfriis@gmail.com>
This PR makes some small updates for `KuzuQAChain` for graph QA.
- Updated Cypher generation prompt (we now support `WHERE EXISTS`) and
generalize it more
- Support different LLMs for Cypher generation and QA
- Update docs and examples
- Adds Techniques section
- Moves function calling, retrieval types to Techniques
- Removes Installation section (not conceptual)
- Reorders a few things (chat models before llms, package descriptions
before diagram)
- Add text splitter types to Techniques
First Pr for the langchain_huggingface partner Package
- Moved some of the hugging face related class from `community` to the
new `partner package`
Still needed :
- Documentation
- Tests
- Support for the new apply_chat_template in `ChatHuggingFace`
- Confirm choice of class to support for embeddings witht he
sentence-transformer team.
cc : @efriis
---------
Co-authored-by: Cyril Kondratenko <kkn1993@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- Introduce the `merge_and_split` function in the
`UpstageLayoutAnalysisLoader`.
- The `merge_and_split` function takes a list of documents and a
splitter as inputs.
- This function merges all documents and then divides them using the
`split_documents` method, which is a proprietary function of the
splitter.
- If the provided splitter is `None` (which is the default setting), the
function will simply merge the documents without splitting them.
- Make sure the left nav bar is horizontally scrollable
- Make sure the navigation dropdown is vertically scrollable and height
capped at 80% of viewport height
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Adds a Python REPL that executes code in a code interpreter session
using Azure Container Apps dynamic sessions.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [X] **PR title**: "community: Add source metadata to bedrock retriever
response"
- [X] **PR message**:
- **Description:** Bedrock retrieve API returns extra metadata in the
response which is currently not returned in the retriever response
- **Issue:** The change adds the metadata from bedrock retrieve API
response to the bedrock retriever in a backward compatible way. Renamed
metadata to sourceMetadata as metadata term is being used in the
Document already. This is in sync with what we are doing in llama-index
as well.
- **Dependencies:** No
- [X] **Add tests and docs**:
1. Added unit tests
2. Notebook already exists and does not need any change
3. Response from end to end testing, just to ensure backward
compatibility: `[Document(page_content='Exoplanets.',
metadata={'location': {'s3Location': {'uri':
's3://bucket/file_name.txt'}, 'type': 'S3'}, 'score': 0.46886647,
'source_metadata': {'x-amz-bedrock-kb-source-uri':
's3://bucket/file_name.txt', 'tag': 'space', 'team': 'Nasa', 'year':
1946.0}})]`
- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Piyush Jain <piyushjain@duck.com>
**Description:** Added a few additional arguments to the whisper parser,
which can be consumed by the underlying API.
The prompt is especially important to fine-tune transcriptions.
---------
Co-authored-by: Roi Perlman <roi@fivesigmalabs.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
This PR introduces chunking logic to the `DeepInfraEmbeddings` class to
handle large batch sizes without exceeding maximum batch size of the
backend. This enhancement ensures that embedding generation processes
large batches by breaking them down into smaller, manageable chunks,
each conforming to the maximum batch size limit.
**Issue:**
Fixes#21189
**Dependencies:**
No new dependencies introduced.
- Added new document_transformer: MarkdonifyTransformer, that uses
`markdonify` package with customizable options to convert HTML to
Markdown. It's similar to Html2TextTransformer, but has more flexible
options and also I've noticed that sometimes MarkdownifyTransformer
performs better than html2text one, so that's why I use markdownify on
my project.
- Added docs and tests
- Usage:
```python
from langchain_community.document_transformers import MarkdownifyTransformer
markdownify = MarkdownifyTransformer()
docs_transform = markdownify.transform_documents(docs)
```
- Example of better performance on simple task, that I've noticed:
```
<html>
<head><title>Reports on product movement</title></head>
<body>
<p data-block-key="2wst7">The reports on product movement will be useful for forming supplier orders and controlling outcomes.</p>
</body>
```
**Html2TextTransformer**:
```python
[Document(page_content='The reports on product movement will be useful for forming supplier orders and\ncontrolling outcomes.\n\n')]
# Here we can see 'and\ncontrolling', which has extra '\n' in it
```
**MarkdownifyTranformer**:
```python
[Document(page_content='Reports on product movement\n\nThe reports on product movement will be useful for forming supplier orders and controlling outcomes.')]
```
---------
Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.bbrouter>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Sokolov Fedor <f.sokolov@sokolov-macbook.local>
Co-authored-by: Sokolov Fedor <f.sokolov@192.168.1.6>
### GPT4AllEmbeddings parameters
---
**Description:**
As of right now the **Embed4All** class inside _GPT4AllEmbeddings_ is
instantiated as it's default which leaves no room to customize the
chosen model and it's behavior. Thus:
- GPT4AllEmbeddings can now be instantiated with custom parameters like
a different model that shall be used.
---------
Co-authored-by: AlexJauchWalser <alexander.jauch-walser@knime.com>
The `_amake_session()` method does not allow modifying the
`self.session_factory` with
anything other than `async_sessionmaker`. This prohibits advanced uses
of `index()`.
In a RAG architecture, it is necessary to import document chunks.
To keep track of the links between chunks and documents, we can use the
`index()` API.
This API proposes to use an SQL-type record manager.
In a classic use case, using `SQLRecordManager` and a vector database,
it is impossible
to guarantee the consistency of the import. Indeed, if a crash occurs
during the import
(problem with the network, ...)
there is an inconsistency between the SQL database and the vector
database.
With the
[PR](https://github.com/langchain-ai/langchain-postgres/pull/32) we are
proposing for `langchain-postgres`,
it is now possible to guarantee the consistency of the import of chunks
into
a vector database. It's possible only if the outer session is built
with the connection.
```python
def main():
db_url = "postgresql+psycopg://postgres:password_postgres@localhost:5432/"
engine = create_engine(db_url, echo=True)
embeddings = FakeEmbeddings()
pgvector:VectorStore = PGVector(
embeddings=embeddings,
connection=engine,
)
record_manager = SQLRecordManager(
namespace="namespace",
engine=engine,
)
record_manager.create_schema()
with engine.connect() as connection:
session_maker = scoped_session(sessionmaker(bind=connection))
# NOTE: Update session_factories
record_manager.session_factory = session_maker
pgvector.session_maker = session_maker
with connection.begin():
loader = CSVLoader(
"data/faq/faq.csv",
source_column="source",
autodetect_encoding=True,
)
result = index(
source_id_key="source",
docs_source=loader.load()[:1],
cleanup="incremental",
vector_store=pgvector,
record_manager=record_manager,
)
print(result)
```
The same thing is possible asynchronously, but a bug in
`sql_record_manager.py`
in `_amake_session()` must first be fixed.
```python
async def _amake_session(self) -> AsyncGenerator[AsyncSession, None]:
"""Create a session and close it after use."""
# FIXME: REMOVE if not isinstance(self.session_factory, async_sessionmaker):~~
if not isinstance(self.engine, AsyncEngine):
raise AssertionError("This method is not supported for sync engines.")
async with self.session_factory() as session:
yield session
```
Then, it is possible to do the same thing asynchronously:
```python
async def main():
db_url = "postgresql+psycopg://postgres:password_postgres@localhost:5432/"
engine = create_async_engine(db_url, echo=True)
embeddings = FakeEmbeddings()
pgvector:VectorStore = PGVector(
embeddings=embeddings,
connection=engine,
)
record_manager = SQLRecordManager(
namespace="namespace",
engine=engine,
async_mode=True,
)
await record_manager.acreate_schema()
async with engine.connect() as connection:
session_maker = async_scoped_session(
async_sessionmaker(bind=connection),
scopefunc=current_task)
record_manager.session_factory = session_maker
pgvector.session_maker = session_maker
async with connection.begin():
loader = CSVLoader(
"data/faq/faq.csv",
source_column="source",
autodetect_encoding=True,
)
result = await aindex(
source_id_key="source",
docs_source=loader.load()[:1],
cleanup="incremental",
vector_store=pgvector,
record_manager=record_manager,
)
print(result)
asyncio.run(main())
```
---------
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Sean <sean@upstage.ai>
Co-authored-by: JuHyung-Son <sonju0427@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: YISH <mokeyish@hotmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Jason_Chen <820542443@qq.com>
Co-authored-by: Joan Fontanals <joan.fontanals.martinez@jina.ai>
Co-authored-by: Pavlo Paliychuk <pavlo.paliychuk.ca@gmail.com>
Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>
Co-authored-by: samanhappy <samanhappy@gmail.com>
Co-authored-by: Lei Zhang <zhanglei@apache.org>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: merdan <48309329+merdan-9@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Andres Algaba <andresalgaba@gmail.com>
Co-authored-by: davidefantiniIntel <115252273+davidefantiniIntel@users.noreply.github.com>
Co-authored-by: Jingpan Xiong <71321890+klaus-xiong@users.noreply.github.com>
Co-authored-by: kaka <kaka@zbyte-inc.cloud>
Co-authored-by: jingsi <jingsi@leadincloud.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Shengsheng Huang <shannie.huang@gmail.com>
Co-authored-by: Michael Schock <mjschock@users.noreply.github.com>
Co-authored-by: Anish Chakraborty <anish749@users.noreply.github.com>
Co-authored-by: am-kinetica <85610855+am-kinetica@users.noreply.github.com>
Co-authored-by: Dristy Srivastava <58721149+dristysrivastava@users.noreply.github.com>
Co-authored-by: Matt <matthew.gotteiner@microsoft.com>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
- **Description:** Fix import class name exporeted from
'playwright.async_api' and 'playwright.sync_api' to match the correct
name in playwright tool. Change import from inline guard_import to
helper function that calls guard_import to make code more readable in
gmail tool. Upgrade playwright version to 1.43.0
- **Issue:** #21354
- **Dependencies:** upgrade playwright version(this is not required for
the bugfix itself, just trying to keep dependencies fresh. I can remove
the playwright version upgrade if you want.)
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
0.2rc
migrations
- [x] Move memory
- [x] Move remaining retrievers
- [x] graph_qa chains
- [x] some dependency from evaluation code potentially on math utils
- [x] Move openapi chain from `langchain.chains.api.openapi` to
`langchain_community.chains.openapi`
- [x] Migrate `langchain.chains.ernie_functions` to
`langchain_community.chains.ernie_functions`
- [x] migrate `langchain/chains/llm_requests.py` to
`langchain_community.chains.llm_requests`
- [x] Moving `langchain_community.cross_enoders.base:BaseCrossEncoder`
->
`langchain_community.retrievers.document_compressors.cross_encoder:BaseCrossEncoder`
(namespace not ideal, but it needs to be moved to `langchain` to avoid
circular deps)
- [x] unit tests langchain -- add pytest.mark.community to some unit
tests that will stay in langchain
- [x] unit tests community -- move unit tests that depend on community
to community
- [x] mv integration tests that depend on community to community
- [x] mypy checks
Other todo
- [x] Make deprecation warnings not noisy (need to use warn deprecated
and check that things are implemented properly)
- [x] Update deprecation messages with timeline for code removal (likely
we actually won't be removing things until 0.4 release) -- will give
people more time to transition their code.
- [ ] Add information to deprecation warning to show users how to
migrate their code base using langchain-cli
- [ ] Remove any unnecessary requirements in langchain (e.g., is
SQLALchemy required?)
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Robocorp (action server) toolkit had a limitation that the content
length returned by the tool was always cut to max 5000 chars. This was
from the time when context windows were much more limited.
This PR removes the limitation. Whatever the underlying tool provides
gets sent back to the agent.
As the robocorp toolkit no longer restricts the content, the implication
is that either the Action (tool) developer or the agent developer needs
to be aware of potentially oversized tool responses. Our point of view
is this should be the agent developer's responsibility, them being in
control of the use case and aware of the context window the LLM has.
Description: We are merging UPSTAGE_DOCUMENT_AI_API_KEY and
UPSTAGE_API_KEY into one, and only UPSTAGE_API_KEY will be used going
forward. And we changed the base class of ChatUpstage to BaseChatOpenAI.
---------
Co-authored-by: Sean <chosh0615@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "langchain-ibm: Fix llm and embeddings 'verify'
attribute default value"
- [x] **PR message**:
- **Description:** fix default value of "verify" attribute
- **Dependencies:** `ibm_watsonx_ai`
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Erick Friis <erick@langchain.dev>
…Endpoint`
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** add `bind_tools` and `with_structured_output` support
to `QianfanChatEndpoint`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- Added Together docs in chat models section
- Update Together provider docs to match the LLM & chat models sections
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This is a doc update. It fixes up formatting and product name
references. The example code is updated to use a local built-in text
file.
@mmhangami Please take a look
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** Updated the together integration docs by leading with
the streaming example, explicitly specifying a model to show users how
to do that, and updating the sections to more closely match other
integrations.
Description: This PR includes fix for loader_source to be fetched from
metadata in case of GdriveLoaders.
Documentation: NA
Unit Test: NA
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
- it's only node ids that are limited
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [ ] **HuggingFaceInferenceAPIEmbeddings**: "Additional Headers"
- Where: langchain, community, embeddings. huggingface.py.
- Community: add additional headers when needed by custom HuggingFace
TEI embedding endpoints. HuggingFaceInferenceAPIEmbeddings"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Adding the `additional_headers` to be passed to
requests library if needed
- **Dependencies:** none
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. Tested with locally available TEI endpoints with and without
`additional_headers`
2. Example Usage
```python
embeddings=HuggingFaceInferenceAPIEmbeddings(
api_key=MY_CUSTOM_API_KEY,
api_url=MY_CUSTOM_TEI_URL,
additional_headers={
"Content-Type": "application/json"
}
)
```
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** Adding chat completions to the Together AI package,
which is our most popular API. Also staying backwards compatible with
the old API so folks can continue to use the completions API as well.
Also moved the embedding API to use the OpenAI library to standardize it
further.
**Twitter handle:** @nutlope
- [x] **Add tests and docs**: If you're adding a new integration, please
include
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
[Standardized model init args
#20085](https://github.com/langchain-ai/langchain/issues/20085)
- Enable premai chat model to be initialized with `model_name` as an
alias for `model`, `api_key` as an alias for `premai_api_key`.
- Add initialization test `test_premai_initialization`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** fix: variable names in root validator not allowing
pass credentials as named parameters in llm instancing, also added
sambanova's sambaverse and sambastudio llms to __init__.py for module
import
Description: this change adds args_schema (pydantic BaseModel) to
YahooFinanceNewsTool for correct schema formatting on LLM function calls
Issue: currently using YahooFinanceNewsTool with OpenAI function calling
returns the following error "TypeError("YahooFinanceNewsTool._run() got
an unexpected keyword argument '__arg1'")". This happens because the
schema sent to the LLM is "input: "{'__arg1': 'MSFT'}"" while the method
should be called with the "query" parameter.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Issue: `load_qa_chain` is placed in the __init__.py file. As a result,
it is not listed in the API Reference docs.
BTW `load_qa_chain` is heavily presented in the doc examples, but is
missed in API Ref.
Change: moved code from init.py into a new file. Related: #21266
Reverts langchain-ai/langchain#21174
Hey team - going to revert this because it doesn't seem necessary for
testing. We should only be adding optional + extended_testing
dependencies for deps that have extended tests.
otherwise it just increases probability of dependency conflicts in the
community lockfile.
Thank you for contributing to LangChain!
community:baichuan[patch]: standardize init args
updated `baichuan_api_key` so that aliased to `api_key`. Added test that
it continues to set the same underlying attribute. Test checks for
`SecretStr`
updated `temperature` with Pydantic Field, added unit test.
Related to https://github.com/langchain-ai/langchain/issues/20085
If Session and/or keyspace are not provided, they are resolved from
cassio's context. So they are not required.
This change is fully backward compatible.
Issue: the `langkit` package is not presented in the `pyproject.toml`
but it is a requirement for the `WhyLabsCallbackHandler`
Change: added `langkit`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Update LarkSuite loader doc to give an example for
loading data from LarkSuite wiki.
**Issue:** None
**Dependencies:** None
**Twitter handle:** None
Thank you for contributing to LangChain!
- [x] **PR title**: "langchain-ibm: Add support for ibm-watsonx-ai new
major version"
- [x] **PR message**:
- **Description:** Add support for ibm-watsonx-ai new major version
- **Dependencies:** `ibm_watsonx_ai`
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
The `LocalFileStore` class can be used to create an on-disk
`CacheBackedEmbeddings` cache. The number of files in these embeddings
caches can grow to be quite large over time (hundreds of thousands) as
embeddings are computed for new versions of content, but the embeddings
for old/deprecated content are not removed.
A *least-recently-used* (LRU) cache policy could be applied to the
`LocalFileStore` directory to delete cache entries that have not been
referenced for some time:
```bash
# delete files that have not been accessed in the last 90 days
find embeddings_cache_dir/ -atime 90 -print0 | xargs -0 rm
```
However, most filesystems in enterprise environments disable access time
modification on read to improve performance. As a result, the access
times of these cache entry files are not updated when their values are
read.
To resolve this, this pull request updates the `LocalFileStore`
constructor to offer an `update_atime` parameter that causes access
times to be updated when a cache entry is read.
For example,
```python
file_store = LocalFileStore(temp_dir, update_atime=True)
```
The default is `False`, which retains the original behavior.
**Testing:**
I updated the LocalFileStore unit tests to test the access time update.
Before you could only extract triples (diffbot calls it facts) from
diffbot to avoid isolated nodes. However, sometimes isolated nodes can
still be useful like for prefiltering, so we want to allow users to
extract them if they want. Default behaviour is unchanged.
**Description:** Update unit test for ChatAnthropic
**Issue:** Test for key passed in from the environment should not have
the key initialized in the constructor
**Dependencies:** None
Thank you for contributing to LangChain!
- Oracle AI Vector Search
Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
- Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
This Pull Requests Adds the following functionalities
Oracle AI Vector Search : Vector Store
Oracle AI Vector Search : Document Loader
Oracle AI Vector Search : Document Splitter
Oracle AI Vector Search : Summary
Oracle AI Vector Search : Oracle Embeddings
- We have added unit tests and have our own local unit test suite which
verifies all the code is correct. We have made sure to add guides for
each of the components and one end to end guide that shows how the
entire thing runs.
- We have made sure that make format and make lint run clean.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
Memory return could be set as `str` or `message` by `return_messages`
flag as mentioned in
https://python.langchain.com/docs/modules/memory/#whether-memory-is-a-string-or-a-list-of-messages,
where
`langchain.chains.conversation.memory.ConversationSummaryBufferMemory`
did not implement that.
This commit added `buffer_as_str` and `buffer_as_messages` function, and
`buffer` now affected by `return_messages` flag.
## Example Test Code and Output
```python
# Fix: ConversationSummaryBufferMemory with return_messages flag function
# Test code
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory
from langchain_community.llms.ollama import Ollama
llm = Ollama()
# Create an instance of ConversationSummaryBufferMemory with return_messages set to True
memory = ConversationSummaryBufferMemory(return_messages=True, llm=llm)
# Add user and AI messages to the chat memory
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
# Print the buffer
print("Buffer:")
print(*map(type, memory.buffer), sep="\n")
print(memory.buffer, "\n")
# Print the buffer as a string
print("Buffer as String:")
print(type(memory.buffer_as_str))
print(memory.buffer_as_str, "\n")
# Print the buffer as messages
print("Buffer as Messages:")
print(*map(type, memory.buffer_as_messages), sep="\n")
print(memory.buffer_as_messages, "\n")
# Print the buffer after setting return_messages to False
memory.return_messages = False
print("Buffer after setting return_messages to False:")
print(type(memory.buffer))
print(memory.buffer, "\n")
```
```plaintext
Buffer:
<class 'langchain_core.messages.human.HumanMessage'>
<class 'langchain_core.messages.ai.AIMessage'>
[HumanMessage(content='hi!'), AIMessage(content="what's up?")]
Buffer as String:
<class 'str'>
Human: hi!
AI: what's up?
Buffer as Messages:
<class 'langchain_core.messages.human.HumanMessage'>
<class 'langchain_core.messages.ai.AIMessage'>
[HumanMessage(content='hi!'), AIMessage(content="what's up?")]
Buffer after setting return_messages to False:
<class 'str'>
Human: hi!
AI: what's up?
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Issue: we have several helper functions to import third-party libraries
like tools.gmail.utils.import_google in
[community.tools](https://api.python.langchain.com/en/latest/community_api_reference.html#id37).
And we have core.utils.utils.guard_import that works exactly for this
purpose.
The import_<package> functions work inconsistently and rather be private
functions.
Change: replaced these functions with the guard_import function.
Related to #21133
Issues (nit):
1. `utils.guard_import` prints wrong error message when there is an
import `error.` It prints the whole `module_name` but should be only the
first part as the pip package name. E.i. `langchain_core.utils` -> print
not `langchain-core` but `langchain_core.utils`. Also replace '_' with
'-' in the pip package name.
2. it does not handle the `ModuleNotFoundError` which raised if
`guard_import("wrong_module")`
Fixed issues; added ut-s. Controversial: I've reraised
`ModuleNotFoundError` as `ImportError`, since in case of the error, the
proposed action is the same - we need to install a missed package.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Issue: `load_summarize_chain` is placed in the __init__.py file. As a
result, it doesn't listed in the API Reference docs.
Change: moved code from __init__.py into a new file.
**PR message**:
- **Description:** Corrected a syntax error in the code comments within
the `create_tool_calling_agent` function in the langchain package.
- **Issue:** N/A
- **Dependencies:** No additional dependencies required.
- **Twitter handle:** N/A
This PR fixes#21196.
The error was occurring when calling chat completion API with a chat
history. Indeed, the Mistral API does not accept both `content` and
`tool_calls` in the same body.
This PR removes one of theses variables depending on the necessity.
---------
Co-authored-by: Maxime Perrin <mperrin@doing.fr>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
* Introduce individual `fetch_` methods for easier typing.
* Rework some docstrings to google style
* Move some logic to the tool
* Merge the 2 cassandra utility files
- support two-tuples of any sequence type (eg. json.loads never produces
tuples)
- support type alias for role key
- if id is passed in in dict form use it
- if tool_calls passed in in dict form use them
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Refactors the docs build in order to:
- run the same `make build` command in both vercel and local build
- incrementally build artifacts in 2 distinct steps, instead of building
all docs in-place (in vercel) or in a _dist dir (locally)
Highlights:
- introduces `make build` in order to build the docs
- collects and generates all files for the build in
`docs/build/intermediate`
- renders those jupyter notebook + markdown files into
`docs/build/outputs`
And now the outputs to host are in `docs/build/outputs`, which will need
a vercel settings change.
Todo:
- [ ] figure out how to point the right directory (right now deleting
and moving docs dir in vercel_build.sh isn't great)
**Description:**
This pull request introduces a new feature for LangChain: the
integration with the Rememberizer API through a custom retriever.
This enables LangChain applications to allow users to load and sync
their data from Dropbox, Google Drive, Slack, their hard drive into a
vector database that LangChain can query. Queries involve sending text
chunks generated within LangChain and retrieving a collection of
semantically relevant user data for inclusion in LLM prompts.
User knowledge dramatically improved AI applications.
The Rememberizer integration will also allow users to access general
purpose vectorized data such as Reddit channel discussions and US
patents.
**Issue:**
N/A
**Dependencies:**
N/A
**Twitter handle:**
https://twitter.com/Rememberizer
## Summary
`ruff /path/to/file.py` works but is deprecated, and we now recommend
`ruff check /path/to/file.py` (to match `ruff format /path/to/file.py`).
Vertex DIY RAG APIs helps to build complex RAG systems and provide more
granular control, and are suited for custom use cases.
The Ranking API takes in a list of documents and reranks those documents
based on how relevant the documents are to a given query. Compared to
embeddings that look purely at the semantic similarity of a document and
a query, the ranking API can give you a more precise score for how well
a document answers a given query.
[Reference](https://cloud.google.com/generative-ai-app-builder/docs/ranking)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Sync the config in `devcontainer.json` and `docker-compose.yml`**
Issue: when opening the current `master` branch in a dev container in VS
Code, I get the following message as VS Code cannot find the mounted
source folder:

Opening in a GitHub Codespace works (it seems to ignore the mounts in
the `docker-compose.yml`.
This PR updates the mount in `docker-compose.yml` and the config in
`devcontainer.json` so that the two align.
I have tested these changes in GitHub Codespaces and a VS Code dev
container and both loaded successfully.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Add tests to check API keys and Active Directory tokens
are masked
**Issue:** Resolves#12165 for OpenAI and Azure OpenAI models
**Dependencies:** None
Also resolves#12473 which may be closed.
Additional contributors @alex4321 (#12473) and @onesolpark (#12542)
- [ ] **PR message**:
- **Description:** Refactored the lazy_load method to use asynchronous
execution for improved performance. The method now initiates scraping of
all URLs simultaneously using asyncio.gather, enhancing data fetching
efficiency. Each Document object is yielded immediately once its content
becomes available, streamlining the entire process.
- **Issue:** N/A
- **Dependencies:** Requires the asyncio library for handling
asynchronous tasks, which should already be part of standard Python
libraries in Python 3.7 and above.
- **Email:** [r73327118@gmail.com](mailto:r73327118@gmail.com)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Update python.py(experimental:Added code for PythonREPL)
Added code for PythonREPL, defining a static method 'sanitize_input'
that takes the string 'query' as input and returns a sanitizing string.
The purpose of this method is to remove unwanted characters from the
input string, Specifically:
1. Delete the whitespace at the beginning and end of the string (' \s').
2. Remove the quotation marks (`` ` ``) at the beginning and end of the
string.
3. Remove the keyword "python" at the beginning of the string (case
insensitive) because the user may have typed it.
This method uses regular expressions (regex) to implement sanitizing.
It all started with this code:
from langchain.agents import Tool
from langchain_experimental.utilities import PythonREPL
python_repl = PythonREPL()
repl_tool = Tool(
name="python_repl",
description="Remove redundant formatting marks at the beginning and end
of source code from input.Use a Python shell to execute python commands.
If you want to see the output of a value, you should print it out with
`print(...)`.",
func=python_repl.run,
)
When I call the agent to write a piece of code for me and execute it
with the defined code, I must get an error: SyntaxError('invalid
syntax', ('<string>', 1, 1,'In', 1, 2))
After checking, I found that pythonREPL has less formatting of input
code than the soon-to-be deprecated pythonREPL tool, so I added this
step to it, so that no matter what code I ask the agent to write for me,
it can be executed smoothly and get the output result.
I have tried modifying the prompt words to solve this problem before,
but it did not work, and by adding a simple format check, the problem is
well resolved.
<img width="1271" alt="image"
src="https://github.com/langchain-ai/langchain/assets/164149097/c49a685f-d246-4b11-b655-fd952fc2f04c">
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**
This pull request updates the Bagel Network package name from
"betabageldb" to "bagelML" to align with the latest changes made by the
Bagel Network team.
The following modifications have been made:
- Updated all references to the old package name ("betabageldb") with
the new package name ("bagelML") throughout the codebase.
- Modified the documentation, and any relevant scripts to reflect the
package name change.
- Tested the changes to ensure that the functionality remains intact and
no breaking changes were introduced.
By merging this pull request, our project will stay up to date with the
latest Bagel Network package naming convention, ensuring compatibility
and smooth integration with their updated library.
Please review the changes and provide any feedback or suggestions. Thank
you!
**Description:** Update UpstageLayoutAnalysisParser and Loader and add
upstage loader example in pdf section
**Dependencies:** langchain_community
**Twitter handle:** [@upstageai](https://twitter.com/upstageai)
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Issue:**
Currently `AzureSearch` vector store does not implement `delete` method.
This PR implements it. This also makes it compatible with LangChain
indexer.
**Dependencies:**
None
**Twitter handle:**
@martintriska1
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Upgrades prompts module to use optional imports.
This code was generated with a migration script, but had to be adjusted
manually a bit.
Testing in preparation for applying this code modification across the
rest of the modules in langchain package to reverse the dependency
between langchain community and langchain.
## Summary
No new diagnostics (given that the set of enabled rules hasn't changed),
but gains access to our new parser (much faster) and reduced false
positives all around.
### Description:
When attempting to download PDF files from arXiv, an unexpected 404
error frequently occurs. This error halts the operation, regardless of
whether there are additional documents to process. As a solution, I
suggest implementing a mechanism to ignore and communicate this error
and continue processing the next document from the list.
Proposed Solution: To address the issue of unexpected 404 errors during
PDF downloads from arXiv, I propose implementing the following solution:
- Error Handling: Implement error handling mechanisms to catch and
handle 404 errors gracefully.
- Communication: Inform the user or logging system about the occurrence
of the 404 error.
- Continued Processing: After encountering a 404 error, continue
processing the remaining documents from the list without interruption.
This solution ensures that the application can handle unexpected errors
without terminating the entire operation. It promotes resilience and
robustness in the face of intermittent issues encountered during PDF
downloads from arXiv.
### Issue:
#20909
### Dependencies:
none
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Summary
I ran `ruff check --extend-select RUF100 -n` to identify `# noqa`
comments that weren't having any effect in Ruff, and then `ruff check
--extend-select RUF100 -n --fix` on select files to remove all of the
unnecessary `# noqa: F401` violations. It's possible that these were
needed at some point in the past, but they're not necessary in Ruff
v0.1.15 (used by LangChain) or in the latest release.
Co-authored-by: Erick Friis <erick@langchain.dev>
…/17690
Thank you for contributing to LangChain!
- [x] **Fix Google Lens knowledge graph issue**: "langchain: community"
- Fix for [No "knowledge_graph" property in Google Lens API call from
SerpAPI](https://github.com/langchain-ai/langchain/issues/17690)
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** handled the existence of keys in the json response of
Google Lens
- **Issue:** [No "knowledge_graph" property in Google Lens API call from
SerpAPI](https://github.com/langchain-ai/langchain/issues/17690)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Proposing to centralize code for handling dynamic imports. This allows treating langchain-community as an optional dependency.
---
The proposal is to scan the code base and to replace all existing imports with dynamic imports using this functionality.
Fixed the error that the model name is never actually put into GigaChat
request payload, always defaulting to `GigaChat-Lite`.
With this fix, model selection through
```python
import os
from langchain.chat_models.gigachat import GigaChat
chat = GigaChat(
name="GigaChat-Pro", # <- HERE!!!!!
...
)
```
should actually work, as intended in
[here](804390ba4b/libs/community/langchain_community/llms/gigachat.py (L36)).
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description**: ToolKit and Tools for accessing data in a Cassandra
Database primarily for Agent integration. Initially, this includes the
following tools:
- `cassandra_db_schema` Gathers all schema information for the connected
database or a specific schema. Critical for the agent when determining
actions.
- `cassandra_db_select_table_data` Selects data from a specific keyspace
and table. The agent can pass paramaters for a predicate and limits on
the number of returned records.
- `cassandra_db_query` Expiriemental alternative to
`cassandra_db_select_table_data` which takes a query string completely
formed by the agent instead of parameters. May be removed in future
versions.
Includes unit test and two notebooks to demonstrate usage.
**Dependencies**: cassio
**Twitter handle**: @PatrickMcFadin
---------
Co-authored-by: Phil Miesle <phil.miesle@datastax.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** This pull request introduces a new feature to community
tools, enhancing its search capabilities by integrating the Mojeek
search engine
**Dependencies:** None
---------
Co-authored-by: Igor Brai <igor@mojeek.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Removed redundant self/cls from required args of class functions in
_get_python_function_required_args:
```python
class MemberTool:
def search_member(
self,
keyword: str,
*args,
**kwargs,
):
"""Search on members with any keyword like first_name, last_name, email
Args:
keyword: Any keyword of member
"""
headers = dict(authorization=kwargs['token'])
members = []
try:
members = request_(
method='SEARCH',
url=f'{service_url}/apiv1/members',
headers=headers,
json=dict(query=keyword),
)
except Exception as e:
logger.info(e.__doc__)
return members
convert_to_openai_tool(MemberTool.search_member)
```
expected result:
```
{'type': 'function', 'function': {'name': 'search_member', 'description': 'Search on members with any keyword like first_name, last_name, username, email', 'parameters': {'type': 'object', 'properties': {'keyword': {'type': 'string', 'description': 'Any keyword of member'}}, 'required': ['keyword']}}}
```
#20685
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "docs: switched GCSLoaders docs to
langchain-google-community"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** switched GCSLoaders docs to
langchain-google-community
Issue: When the third-party package is not installed, whenever we need
to `pip install <package>` the ImportError is raised.
But sometimes, the `ValueError` or `ModuleNotFoundError` is raised. It
is bad for consistency.
Change: replaced the `ValueError` or `ModuleNotFoundError` with
`ImportError` when we raise an error with the `pip install <package>`
message.
Note: Ideally, we replace all `try: import... except... raise ... `with
helper functions like `import_aim` or just use the existing
[langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import)
But it would be much bigger refactoring. @baskaryan Please, advice on
this.
Implemented bind_tools for OllamaFunctions.
Made OllamaFunctions sub class of ChatOllama.
Implemented with_structured_output for OllamaFunctions.
integration unit test has been updated.
notebook has been updated.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I can't seem to reproduce, but i got this:
```
SystemError: AST constructor recursion depth mismatch (before=102, after=37)
```
And the operation isn't critical for the actual forward pass so seems
preferable to expand our caught exceptions
**Description**: This update enhances the `extract_sub_links` function
within the `langchain_core/utils/html.py` module to include query
parameters in the extracted URLs.
**Issue**: N/A
**Dependencies**: No additional dependencies required for this change.
**Twitter handle**: N/A
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Just a simple PR to fix a broken link. Apparently having backticks
outside a link makes it render as code.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This introduces `store_kwargs` which behaves similarly to `graph_kwargs`
on the `RdfGraph` object, which will enable users to pass `headers` and
other arguments to the underlying `SPARQLStore` object. I have also made
a [PR in `rdflib` to support passing
`default_graph`](https://github.com/RDFLib/rdflib/pull/2761).
Example usage:
```python
from langchain_community.graphs import RdfGraph
graph = RdfGraph(
query_endpoint="http://localhost/sparql",
standard="rdf",
store_kwargs=dict(
default_graph="http://example.com/mygraph"
)
)
```
<!--If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.-->
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
MindsDB integrates with LangChain, enabling users to deploy, serve, and
fine-tune models available via LangChain within MindsDB, making them
accessible to numerous data sources.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: The PebbloSafeLoader should first check for owner,
full_path and size in metadata before implementing its own logic.
Dependencies: None
Documentation: NA.
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Issue: #20514
The current implementation of `construct_instance` expects a `texts:
List[str]` that will call the embedding function. This might not be
needed when we already have a client with collection and `path, you
don't want to add any text.
This PR adds a class method that returns a qdrant instance with an
existing client.
Here everytime
cb6e5e56c2/libs/community/langchain_community/vectorstores/qdrant.py (L1592)
`construct_instance` is called, this line sends some text for embedding
generation.
---------
Co-authored-by: Anush <anushshetty90@gmail.com>
* Groundedness Check takes `str` or `list[Document]` as input.
* Deprecate `GroundednessCheck` due to its naming.
* Added `UpstageGroundednessCheck`.
* Hotfix for Groundedness Check parameter.
The name `query` was misleading and it should be `answer` instead.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This auto generates partner migrations.
At the moment the migration is from community -> partner.
So one would need to run the migration script twice to go from langchain to partner.
Add script to help generate migrations.
This works well for partner packages. Migrations are generated based on run time rather than static analysis (much simpler to get the correct migrations implemented).
The script for generating migrations from langchain to community still needs work.
`langchain_pinecone.Pinecone` is deprecated in favor of
`PineconeVectorStore`, and is currently a subclass of
`PineconeVectorStore`.
```python
@deprecated(since="0.0.3", removal="0.2.0", alternative="PineconeVectorStore")
class Pinecone(PineconeVectorStore):
"""Deprecated. Use PineconeVectorStore instead."""
pass
```
**Description:** AzureSearch vector store has no tests. This PR adds
initial tests to validate the code can be imported and used.
**Issue:** N/A
**Dependencies:** azure-search-documents and azure-identity are added as
optional dependencies for testing
---------
Co-authored-by: Matt Gotteiner <[email protected]>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**:
_PebbloSafeLoader_: Add support for pebblo server and client version
**Documentation:** NA
**Unit test:** NA
**Issue:** NA
**Dependencies:** None
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- [ ] **Kinetica Document Loader**: "community: a class to load
Documents from Kinetica"
- [ ] **Kinetica Document Loader**:
- **Description:** implemented KineticaLoader in `kinetica_loader.py`
- **Dependencies:** install the Kinetica API using `pip install
gpudb==7.2.0.1 `
**Description:** Fixes a bug in the HuggingGPT task execution logic
here:
except Exception as e:
self.status = "failed"
self.message = str(e)
self.status = "completed"
self.save_product()
where a caught exception effectively just sets `self.message` and can
then throw an exception if, e.g., `self.product` is not defined.
**Issue:** None that I'm aware of.
**Dependencies:** None
**Twitter handle:** https://twitter.com/michaeljschock
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Changes
`lanchain_core.output_parsers.CommaSeparatedListOutputParser` to handle
`,` as a delimiter alongside the previous implementation which used `, `
as delimiter.
- **Issue:** Started noticing that some results returned by LLMs were
not getting parsed correctly when the output contained `,` instead of `,
`.
- **Dependencies:** No
- **Twitter handle:** not active on twitter.
<!---
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
-->
- **Description**:
- **add support for more data types**: by default `IpexLLM` will load
the model in int4 format. This PR adds more data types support such as
`sym_in5`, `sym_int8`, etc. Data formats like NF3, NF4, FP4 and FP8 are
only supported on GPU and will be added in future PR.
- Fix a small issue in saving/loading, update api docs
- **Dependencies**: `ipex-llm` library
- **Document**: In `docs/docs/integrations/llms/ipex_llm.ipynb`, added
instructions for saving/loading low-bit model.
- **Tests**: added new test cases to
`libs/community/tests/integration_tests/llms/test_ipex_llm.py`, added
config params.
- **Contribution maintainer**: @shane-huang
Description: Add support for Semantic topics and entities.
Classification done by pebblo-server is not used to enhance metadata of
Documents loaded by document loaders.
Dependencies: None
Documentation: Updated.
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**
- [x] **PR message**:
- **Description:** Deprecate persist method in Chroma no longer exists
in Chroma 0.4.x
- **Issue:** #20851
- **Dependencies:** None
- **Twitter handle:** AndresAlgaba1
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:**
This PR removes an unnecessary code snippet from the documentation. The
snippet in question is not relevant to the content and does not
contribute to the overall understanding of the topic. It contained
redundant imports and unused code, potentially causing confusion for
readers.
**Issue:**
There is no specific issue number associated with this change.
**Dependencies:**
No additional dependencies are required for this change.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
The RecursiveUrlLoader loader offers a link_regex parameter that can
filter out URLs. However, this filtering capability is limited, and if
the internal links of the website change, unexpected resources may be
loaded. These resources, such as font files, can cause problems in
subsequent embedding processing.
>
https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf
We can add the Content-Type in the HTTP response headers to the document
metadata so developers can choose which resources to use. This allows
developers to make their own choices.
For example, the following may be a good choice for text knowledge.
- text/plain - simple text file
- text/html - HTML web page
- text/xml - XML format file
- text/json - JSON format data
- application/pdf - PDF file
- application/msword - Word document
and ignore the following
- text/css - CSS stylesheet
- text/javascript - JavaScript script
- application/octet-stream - binary data
- image/jpeg - JPEG image
- image/png - PNG image
- image/gif - GIF image
- image/svg+xml - SVG image
- audio/mpeg - MPEG audio files
- video/mp4 - MP4 video file
- application/font-woff - WOFF font file
- application/font-ttf - TTF font file
- application/zip - ZIP compressed file
- application/octet-stream - binary data
**Twitter handle:** @coolbeevip
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
**Description:** In VoyageAI text-embedding examples use voyage-law-2
model
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: Fix misplaced zep cloud example links
- [x] **PR message**:
- **Description:** Fixes misplaced links for vector store and memory zep
cloud examples
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Rerank API
- **Twitter handle:** https://twitter.com/JinaAI_
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
OpenAI API compatible server may not support `safe_len_embedding`,
use `disable_safe_len_embeddings=True` to disable it.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
* Updating the provider docs page.
The RAG example was meant to be moved to cookbook, but was merged by
mistake.
* Fix bug in Groundedness Check
---------
Co-authored-by: JuHyung-Son <sonju0427@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Currently, when a new dev container is created, poetry does not work in
it with the error "No module named 'rapidfuzz'".
Install Poetry outside the project venv so that poetry and project
dependencies do not get mixed. Use pipx to install poetry securely in
its own isolated environment.
Issue: #12237
Twitter handle: https://twitter.com/ibratoev
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Currently, the regex is static (`r"(?<=[.?!])\s+"`),
which is only useful for certain use cases. The current change only
moves this to be a parameter of split_text(). Which adds flexibility
without making it more complex (as the default regex is still the same).
- **Issue:** Not applicable (I searched, no one seems to have created
this issue yet).
- **Dependencies:** None.
_If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17._
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: MarkdownHeaderTextSplitter Fails to Parse Headers with
non-printable characters. more #20643
The following is the official test case. Just replacing `# Foo\n\n` with
`\ufeff# Foo\n\n` will cause the test case to fail.
chunk metadata is empty
```python
def test_md_header_text_splitter_1() -> None:
"""Test markdown splitter by header: Case 1."""
markdown_document = (
"\ufeff# Foo\n\n"
" ## Bar\n\n"
"Hi this is Jim\n\n"
"Hi this is Joe\n\n"
" ## Baz\n\n"
" Hi this is Molly"
)
headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
]
markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on,
)
output = markdown_splitter.split_text(markdown_document)
expected_output = [
Document(
page_content="Hi this is Jim \nHi this is Joe",
metadata={"Header 1": "Foo", "Header 2": "Bar"},
),
Document(
page_content="Hi this is Molly",
metadata={"Header 1": "Foo", "Header 2": "Baz"},
),
]
assert output == expected_output
```
twitter: @coolbeevip
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Description :
- added functionalities - delete, index creation, using existing
connection object etc.
- updated usage
- Added LaceDB cloud OSS support
make lint_diff , make test checks done
- **Description:** fix a bug in the agent_token_buffer_memory
- **Issue:** agent_token_buffer_memory was not working with openai tools
- **Dependencies:** None
- **Twitter handle:** @pokidyshef
**Description:** Adds the command to install packages required before
using _Unstructured_ and _PDFMiner_ from `langchain.community`
**Documentation Page Being Updated:** [LangChain > Retrieval > Document
loaders > PDF > Using
Unstructured](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/#using-unstructured)
**Issue:** #20719
**Dependencies:** no dependencies
**Twitter handle:** SalikaDave
<!--
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17. -->
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Description
Add `aprep_output` method to `langchain/chains/base.py`. Some downstream
`ChatMessageHistory` objects that use async connections require an async
way to append to the context.
It turned out that `ainvoke()` was calling `prep_output` which is
synchronous.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
# Proxy Fix for Groq Class 🐛🚀
## Description
This PR fixes a bug related to proxy settings in the `Groq` class,
allowing users to connect to LangChain services via a proxy.
## Changes Made
- ✅ FIX support for specifying proxy settings in the `Groq` class.
- ✅ Resolved the bug causing issues with proxy settings.
- ❌ Did not include unit tests and documentation updates.
- ❌ Did not run make format, make lint, and make test to ensure code
quality and functionality because I couldn't get it to run, so I don't
program in Python and couldn't run `ruff`.
- ❔ Ensured that the changes are backwards compatible.
- ✅ No additional dependencies were added to `pyproject.toml`.
### Error Before Fix
```python
Traceback (most recent call last):
File "/home/bg/Documents/code/github.com/back2nix/test/groq/main.py", line 9, in <module>
chat = ChatGroq(
^^^^^^^^^
File "/home/bg/Documents/code/github.com/back2nix/test/groq/venv310/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__
super().__init__(**kwargs)
File "/home/bg/Documents/code/github.com/back2nix/test/groq/venv310/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for ChatGroq
__root__
Invalid `http_client` argument; Expected an instance of `httpx.AsyncClient` but got <class 'httpx.Client'> (type=type_error)
```
### Example usage after fix
```python3
import os
import httpx
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
chat = ChatGroq(
temperature=0,
groq_api_key=os.environ.get("GROQ_API_KEY"),
model_name="mixtral-8x7b-32768",
http_client=httpx.Client(
proxies="socks5://127.0.0.1:1080",
transport=httpx.HTTPTransport(local_address="0.0.0.0"),
),
http_async_client=httpx.AsyncClient(
proxies="socks5://127.0.0.1:1080",
transport=httpx.HTTPTransport(local_address="0.0.0.0"),
),
)
system = "You are a helpful assistant."
human = "{text}"
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])
chain = prompt | chat
out = chain.invoke({"text": "Explain the importance of low latency LLMs"})
print(out)
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Implemented the ability to enable full-text search within the
SingleStore vector store, offering users a versatile range of search
strategies. This enhancement allows users to seamlessly combine
full-text search with vector search, enabling the following search
strategies:
* Search solely by vector similarity.
* Conduct searches exclusively based on text similarity, utilizing
Lucene internally.
* Filter search results by text similarity score, with the option to
specify a threshold, followed by a search based on vector similarity.
* Filter results by vector similarity score before conducting a search
based on text similarity.
* Perform searches using a weighted sum of vector and text similarity
scores.
Additionally, integration tests have been added to comprehensively cover
all scenarios.
Updated notebook with examples.
CC: @baskaryan, @hwchase17
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- added guard on the `pyTigerGraph` import
- added a missed example page in the `docs/integrations/graphs/`
- formatted the `docs/integrations/providers/` page to the consistent
format. Added links.
- **Description:**
This PR adds support for advanced filtering to the integration of HANA
Vector Engine.
The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt,
$lte, $between, $in, $nin, $like, $and, $or
- **Issue:** N/A
- **Dependencies:** no new dependencies added
Added integration tests to:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`
Description of the new capabilities in notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
Thank you for contributing to LangChain!
community:perplexity[patch]: standardize init args
updated pplx_api_key and request_timeout so that aliased to api_key, and
timeout respectively. Added test that both continue to set the same
underlying attributes.
Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [x] **PR title**: docs: Update Zep Messaging, add links to Zep Cloud
Docs
- [x] **PR message**:
- **Description:** This PR updates Zep messaging in the docs + links to
Langchain Zep Cloud examples in our documentation
- **Twitter handle:** @paulpaliychuk51
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
This PR moves the interface and the logic to core.
The following changes to namespaces:
`indexes` -> `indexing`
`indexes._api` -> `indexing.api`
Testing code is intentionally duplicated for now since it's testing
different
implementations of the record manager (in-memory vs. SQL).
Common logic will need to be pulled out into the test client.
A follow up PR will move the SQL based implementation outside of
LangChain.
**Description:**
This PR fixes an issue in message formatting function for Anthropic
models on Amazon Bedrock.
Currently, LangChain BedrockChat model will crash if it uses Anthropic
models and the model return a message in the following type:
- `AIMessageChunk`
Moreover, when use BedrockChat with for building Agent, the following
message types will trigger the same issue too:
- `HumanMessageChunk`
- `FunctionMessage`
**Issue:**
https://github.com/langchain-ai/langchain/issues/18831
**Dependencies:**
No.
**Testing:**
Manually tested. The following code was failing before the patch and
works after.
```
@tool
def square_root(x: str):
"Useful when you need to calculate the square root of a number"
return math.sqrt(int(x))
llm = ChatBedrock(
model_id="anthropic.claude-3-sonnet-20240229-v1:0",
model_kwargs={ "temperature": 0.0 },
)
prompt = ChatPromptTemplate.from_messages(
[
("system", FUNCTION_CALL_PROMPT),
("human", "Question: {user_input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
tools = [square_root]
tools_string = format_tool_to_anthropic_function(square_root)
agent = (
RunnablePassthrough.assign(
user_input=lambda x: x['user_input'],
agent_scratchpad=lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
)
)
| prompt
| llm
| AnthropicFunctionsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
output = agent_executor.invoke({
"user_input": "What is the square root of 2?",
"tools_string": tools_string,
})
```
List of messages returned from Bedrock:
```
<SystemMessage> content='You are a helpful assistant.'
<HumanMessage> content='Question: What is the square root of 2?'
<AIMessageChunk> content="Okay, let's calculate the square root of 2.<scratchpad>\nTo calculate the square root of a number, I can use the square_root tool:\n\n<function_calls>\n <invoke>\n <tool_name>square_root</tool_name>\n <parameters>\n <__arg1>2</__arg1>\n </parameters>\n </invoke>\n</function_calls>\n</scratchpad>\n\n<function_results>\n<search_result>\nThe square root of 2 is approximately 1.414213562373095\n</search_result>\n</function_results>\n\n<answer>\nThe square root of 2 is approximately 1.414213562373095\n</answer>" id='run-92363df7-eff6-4849-bbba-fa16a1b2988c'"
<FunctionMessage> content='1.4142135623730951' name='square_root'
```
Hi! My name is Alex, I'm an SDK engineer from
[Comet](https://www.comet.com/site/)
This PR updates the `CometTracer` class.
Fixed an issue when `CometTracer` failed while logging the data to Comet
because this data is not JSON-encodable.
The problem was in some of the `Run` attributes that could contain
non-default types inside, now these attributes are taken not from the
run instance, but from the `run.dict()` return value.
Causes an issue for this code
```python
from langchain.chat_models.openai import ChatOpenAI
from langchain.output_parsers.openai_tools import JsonOutputToolsParser
from langchain.schema import SystemMessage
prompt = SystemMessage(content="You are a nice assistant.") + "{question}"
llm = ChatOpenAI(
model_kwargs={
"tools": [
{
"type": "function",
"function": {
"name": "web_search",
"description": "Searches the web for the answer to the question.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The question to search for.",
},
},
},
},
}
],
},
streaming=True,
)
parser = JsonOutputToolsParser(first_tool_only=True)
llm_chain = prompt | llm | parser | (lambda x: x)
for chunk in llm_chain.stream({"question": "tell me more about turtles"}):
print(chunk)
# message = llm_chain.invoke({"question": "tell me more about turtles"})
# print(message)
```
Instead by definition, we'll assume that RunnableLambdas consume the
entire stream and that if the stream isn't addable then it's the last
message of the stream that's in the usable format.
---
If users want to use addable dicts, they can wrap the dict in an
AddableDict class.
---
Likely, need to follow up with the same change for other places in the
code that do the upgrade
- **Description:** In January, Laiyer.ai became part of ProtectAI, which
means the model became owned by ProtectAI. In addition to that,
yesterday, we released a new version of the model addressing issues the
Langchain's community and others mentioned to us about false-positives.
The new model has a better accuracy compared to the previous version,
and we thought the Langchain community would benefit from using the
[latest version of the
model](https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2).
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @alex_yaremchuk
This PR moves the implementations for chat history to core. So it's
easier to determine which dependencies need to be broken / add
deprecation warnings
Fixed an error in the sample code to ensure that the code can run
directly.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
langchain_community.document_loaders depricated
new langchain_google_community
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
docs: Fix link for `partition_pdf` in Semi_Structured_RAG.ipynb cookbook
- **Description:** Fix incorrect link to unstructured-io `partition_pdf`
section
Vector indexes in ClickHouse are experimental at the moment and can
sometimes break/change behaviour. So this PR makes it possible to say
that you don't want to specify an index type.
Any queries against the embedding column will be brute force/linear
scan, but that gives reasonable performance for small-medium dataset
sizes.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "docs: added a description of differences
langchain_google_genai vs langchain_google_vertexai"
- [ ]
- **Description:** added a description of differences
langchain_google_genai vs langchain_google_vertexai
**Description:** implemented GraphStore class for Apache Age graph db
**Dependencies:** depends on psycopg2
Unit and integration tests included. Formatting and linting have been
run.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Update Neo4j Cypher templates to use function callback to pass context
instead of passing it in user prompt.
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** This pull request removes a duplicated `--quiet` flag
in the pip install command found in the LangSmith Walkthrough section of
the documentation.
**Issue:** N/A
**Dependencies:** None
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: docs"
- [ ] **PR message**:
- **Description:** Updated Tutorials for Vertex Vector Search
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
@lkuligin for review
---------
Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This pull request corrects a mistake in the variable name within the
example code. The variable doc_schema has been changed to dog_schema to
fix the error.
Description: you don't need to pass a version for Replicate official
models. That was broken on LangChain until now!
You can now run:
```
llm = Replicate(
model="meta/meta-llama-3-8b-instruct",
model_kwargs={"temperature": 0.75, "max_length": 500, "top_p": 1},
)
prompt = """
User: Answer the following yes/no question by reasoning step by step. Can a dog drive a car?
Assistant:
"""
llm(prompt)
```
I've updated the replicate.ipynb to reflect that.
twitter: @charliebholtz
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
ZhipuAI API only accepts `temperature` parameter between `(0, 1)` open
interval, and if `0` is passed, it responds with status code `400`.
However, 0 and 1 is often accepted by other APIs, for example, OpenAI
allows `[0, 2]` for temperature closed range.
This PR truncates temperature parameter passed to `[0.01, 0.99]` to
improve the compatibility between langchain's ecosystem's and ZhipuAI
(e.g., ragas `evaluate` often generates temperature 0, which results in
a lot of 400 invalid responses). The PR also truncates `top_p` parameter
since it has the same restriction.
Reference: [glm-4 doc](https://open.bigmodel.cn/dev/api#glm-4) (which
unfortunately is in Chinese though).
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
faster-whisper is a reimplementation of OpenAI's Whisper model using
CTranslate2, which is up to 4 times faster than enai/whisper for the
same accuracy while using less memory. The efficiency can be further
improved with 8-bit quantization on both CPU and GPU.
It can automatically detect the following 14 languages and transcribe
the text into their respective languages: en, zh, fr, de, ja, ko, ru,
es, th, it, pt, vi, ar, tr.
The gitbub repository for faster-whisper is :
https://github.com/SYSTRAN/faster-whisper
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
VSDX data contains EMF files. Some of these apparently can contain
exploits with some Adobe tools.
This is likely a false positive from antivirus software, but we
can remove it nonetheless.
Hey @eyurtsev, I noticed that the notebook isn't displaying the outputs
properly. I've gone ahead and rerun the cells to ensure that readers can
easily understand the functionality without having to run the code
themselves.
Replaced `from langchain.prompts` with `from langchain_core.prompts`
where it is appropriate.
Most of the changes go to `langchain_experimental`
Similar to #20348
…gFaceTextGenInference)
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for [HuggingFaceTextGenInference]
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in [HuggingFaceTextGenInference]
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
fix timeout issue
fix zhipuai usecase notebookbook
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
fixed broken `LangGraph` hyperlink
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
@rgupta2508 I believe this change is necessary following
https://github.com/langchain-ai/langchain/pull/20318 because of how
Milvus handles defaults:
59bf5e811a/pymilvus/client/prepare.py (L82-L85)
```python
num_shards = kwargs[next(iter(same_key))]
if not isinstance(num_shards, int):
msg = f"invalid num_shards type, got {type(num_shards)}, expected int"
raise ParamError(message=msg)
req.shards_num = num_shards
```
this way lets Milvus control the default value (instead of maintaining a
separate default in Langchain).
Let me know if I've got this wrong or you feel it's unnecessary. Thanks.
To support number of the shards for the collection to create in milvus
vvectorstores.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description:** Move `FileCallbackHandler` from community to core
**Issue:** #20493
**Dependencies:** None
(imo) `FileCallbackHandler` is a built-in LangChain callback handler
like `StdOutCallbackHandler` and should properly be in in core.
- **Description:** added the headless parameter as optional argument to
the langchain_community.document_loaders AsyncChromiumLoader class
- **Dependencies:** None
- **Twitter handle:** @perinim_98
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- would happen when user's code tries to access attritbute that doesnt
exist, we prefer to let this crash in the user's code, rather than here
- also catch more cases where a runnable is invoked/streamed inside a
lambda. before we weren't seeing these as deps
**Description:** currently, the `DirectoryLoader` progress-bar maximum value is based on an incorrect number of files to process
In langchain_community/document_loaders/directory.py:127:
```python
paths = p.rglob(self.glob) if self.recursive else p.glob(self.glob)
items = [
path
for path in paths
if not (self.exclude and any(path.match(glob) for glob in self.exclude))
]
```
`paths` returns both files and directories. `items` is later used to determine the maximum value of the progress-bar which gives an incorrect progress indication.
- Add functions (_stream, _astream)
- Connect to _generate and _agenerate
Thank you for contributing to LangChain!
- [x] **PR title**: "community: Add streaming logic in ChatHuggingFace"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Addition functions (_stream, _astream) and connection
to _generate and _agenerate
- **Issue:** #18782
- **Dependencies:** none
- **Twitter handle:** @lunara_x
**Community: Unify Titan Takeoff Integrations and Adding Embedding
Support**
**Description:**
Titan Takeoff no longer reflects this either of the integrations in the
community folder. The two integrations (TitanTakeoffPro and
TitanTakeoff) where causing confusion with clients, so have moved code
into one place and created an alias for backwards compatibility. Added
Takeoff Client python package to do the bulk of the work with the
requests, this is because this package is actively updated with new
versions of Takeoff. So this integration will be far more robust and
will not degrade as badly over time.
**Issue:**
Fixes bugs in the old Titan integrations and unified the code with added
unit test converge to avoid future problems.
**Dependencies:**
Added optional dependency takeoff-client, all imports still work without
dependency including the Titan Takeoff classes but just will fail on
initialisation if not pip installed takeoff-client
**Twitter**
@MeryemArik9
Thanks all :)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Description: Add support for authorized identities in PebbloSafeLoader.
Now with this change, PebbloSafeLoader will extract
authorized_identities from metadata and send it to pebblo server
Dependencies: None
Documentation: None
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
From `langchain_community 0.0.30`, there's a bug that cannot send a
file-like object via `file` parameter instead of `file path` due to
casting the `file_path` to str type even if `file_path` is None.
which means that when I call the `partition_via_api()`, exactly one of
`filename` and `file` must be specified by the following error message.
however, from `langchain_community 0.0.30`, `file_path` is casted into
`str` type even `file_path` is None in `get_elements_from_api()` and got
an error at `exactly_one(filename=filename, file=file)`.
here's an error message
```
---> 51 exactly_one(filename=filename, file=file)
53 if metadata_filename and file_filename:
54 raise ValueError(
55 "Only one of metadata_filename and file_filename is specified. "
56 "metadata_filename is preferred. file_filename is marked for deprecation.",
57 )
File /opt/homebrew/lib/python3.11/site-packages/unstructured/partition/common.py:441, in exactly_one(**kwargs)
439 else:
440 message = f"{names[0]} must be specified."
--> 441 raise ValueError(message)
ValueError: Exactly one of filename and file must be specified.
```
So, I simply made a change that casting to str type when `file_path` is
not None.
I use `UnstructuredAPIFileLoader` like below.
```
from langchain_community.document_loaders.unstructured import UnstructuredAPIFileLoader
documents: list = UnstructuredAPIFileLoader(
file_path=None,
file=file, # file-like object, io.BytesIO type
mode='elements',
url='http://127.0.0.1:8000/general/v0/general',
content_type='application/pdf',
metadata_filename='asdf.pdf',
).load_and_split()
```
- [x] **PR title**: "community: improve kuzu cypher generation prompt"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Improves the Kùzu Cypher generation prompt to be more
robust to open source LLM outputs
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @kuzudb
- [x] **Add tests and docs**: If you're adding a new integration, please
include
No new tests (non-breaking. change)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
## Description:
The PR introduces 3 changes:
1. added `recursive` property to `O365BaseLoader`. (To keep the behavior
unchanged, by default is set to `False`). When `recursive=True`,
`_load_from_folder()` also recursively loads all nested folders.
2. added `folder_id` to SharePointLoader.(similar to (this
PR)[https://github.com/langchain-ai/langchain/pull/10780] ) This
provides an alternative to `folder_path` that doesn't seem to reliably
work.
3. when none of `document_ids`, `folder_id`, `folder_path` is provided,
the loader fetches documets from root folder. Combined with
`recursive=True` this provides an easy way of loading all compatible
documents from SharePoint.
The PR contains the same logic as [this stale
PR](https://github.com/langchain-ai/langchain/pull/10780) by
@WaleedAlfaris. I'd like to ask his blessing for moving forward with
this one.
## Issue:
- As described in https://github.com/langchain-ai/langchain/issues/19938
and https://github.com/langchain-ai/langchain/pull/10780 the sharepoint
loader often does not seem to work with folder_path.
- Recursive loading of subfolders is a missing functionality
## Dependecies: None
Twitter handle:
@martintriska1 @WRhetoric
This is my first PR here, please be gentle :-)
Please review @baskaryan
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR updates OctoAIEndpoint LLM to subclass BaseOpenAI as OctoAI is
an OpenAI-compatible service. The documentation and tests have also been
updated.
**Description:** Adds ThirdAI NeuralDB retriever integration. NeuralDB
is a CPU-friendly and fine-tunable text retrieval engine. We previously
added a vector store integration but we think that it will be easier for
our customers if they can also find us under under
langchain-community/retrievers.
---------
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
**Description:** Make ChatDatabricks model supports stream
**Issue:** N/A
**Dependencies:** MLflow nightly build version (we will release next
MLflow version soon)
**Twitter handle:** N/A
Manually test:
(Before testing, please install `pip install
git+https://github.com/mlflow/mlflow.git`)
```python
# Test Databricks Foundation LLM model
from langchain.chat_models import ChatDatabricks
chat_model = ChatDatabricks(
endpoint="databricks-llama-2-70b-chat",
max_tokens=500
)
from langchain_core.messages import AIMessageChunk
for chunk in chat_model.stream("What is mlflow?"):
print(chunk.content, end="|")
```
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Add conditional: bool property to json representation of the graphs
- Add option to generate mermaid graph stripped of styles (useful as a
text representation of graph)
…s arg too
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:**
This PR adds a callback handler for UpTrain. It performs evaluations in
the RAG pipeline to check the quality of retrieved documents, generated
queries and responses.
- **Dependencies:**
- The UpTrainCallbackHandler requires the uptrain package
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
enviroment variable ANTHROPIC_API_URL will not work if anthropic_api_url
has default value
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
**Description**: Support filter by OR and AND for deprecated PGVector
version
**Issue**: #20445
**Dependencies**: N/A
**Twitter** handle: @martinferenaz
- **Description:**Add Google Firestore Vector store docs
- **Issue:** NA
- **Dependencies:** NA
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Description: fixes LangChainDeprecationWarning: The class
`langchain_community.embeddings.cohere.CohereEmbeddings` was deprecated
in langchain-community 0.0.30 and will be removed in 0.2.0. An updated
version of the class exists in the langchain-cohere package and should
be used instead. To use it run `pip install -U langchain-cohere` and
import as `from langchain_cohere import CohereEmbeddings`.

Dependencies : langchain_cohere
Twitter handle: @Mo_Noumaan
Description of features on mermaid graph renderer:
- Fixing CDN to use official Mermaid JS CDN:
https://www.jsdelivr.com/package/npm/mermaid?tab=files
- Add device_scale_factor to allow increasing quality of resulting PNG.
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for [DeepInfra]
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in [DeepInfra]
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Description: This update refines the documentation for
`RunnablePassthrough` by removing an unnecessary import and correcting a
minor syntactical error in the example provided. This change enhances
the clarity and correctness of the documentation, ensuring that users
have a more accurate guide to follow.
Issue: N/A
Dependencies: None
This PR focuses solely on documentation improvements, specifically
targeting the `RunnablePassthrough` class within the `langchain_core`
module. By clarifying the example provided in the docstring, users are
offered a more straightforward and error-free guide to utilizing the
`RunnablePassthrough` class effectively.
As this is a documentation update, it does not include changes that
require new integrations, tests, or modifications to dependencies. It
adheres to the guidelines of minimal package interference and backward
compatibility, ensuring that the overall integrity and functionality of
the LangChain package remain unaffected.
Thank you for considering this documentation refinement for inclusion in
the LangChain project.
Fix of YandexGPT embeddings.
The current version uses a single `model_name` for queries and
documents, essentially making the `embed_documents` and `embed_query`
methods the same. Yandex has a different endpoint (`model_uri`) for
encoding documents, see
[this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The
bug may impact retrievers built with `YandexGPTEmbeddings` (for instance
FAISS database as retriever) since they use both `embed_documents` and
`embed_query`.
A simple snippet to test the behaviour:
```python
from langchain_community.embeddings.yandex import YandexGPTEmbeddings
embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
q_emb == doc_emb[0]
```
The response is `True` with the current version and `False` with the
changes I made.
Twitter: @egor_krash
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Updates the documentation for Portkey and Langchain.
Also updates the notebook. The current documentation is fairly old and
is non-functional.
**Twitter handle:** @portkeyai
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
`_ListSQLDatabaseToolInput` raise error if model returns `{}`.
For example, gpt-4-turbo returns `{}` with SQL Agent initialized by
`create_sql_agent`.
So, I set default value `""` for `_ListSQLDatabaseToolInput` tool_input.
This is actually a gpt-4-turbo issue, not a LangChain issue, but I
thought it would be helpful to set a default value `""`.
This problem is discussed in detail in the following Issue.
**Issue:** https://github.com/langchain-ai/langchain/issues/20405
**Dependencies:** none
Sorry, I did not add or change the test code, as tests for this
components was not exist .
However, I have tested the following code based on the [SQL Agent
Document](https://python.langchain.com/docs/use_cases/sql/agents/), to
make sure it works.
```
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_community.utilities.sql_database import SQLDatabase
from langchain_openai import ChatOpenAI
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
agent_executor = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)
result = agent_executor.invoke("List the total sales per country. Which country's customers spent the most?")
print(result["output"])
```
- **Description:** Complete the support for Lua code in
langchain.text_splitter module.
- **Dependencies:** No
- **Twitter handle:** @saberuster
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
```python
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_groq import ChatGroq
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
model = ChatGroq(model_name="mixtral-8x7b-32768", temperature=0)
@tool
def magic_function(input: int) -> int:
"""Applies a magic function to an input."""
return input + 2
tools = [magic_function]
agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is the value of magic_function(3)?"})
```
```
> Entering new AgentExecutor chain...
Invoking: `magic_function` with `{'input': 3}`
5The value of magic\_function(3) is 5.
> Finished chain.
{'input': 'what is the value of magic_function(3)?',
'output': 'The value of magic\\_function(3) is 5.'}
```
**Description:** Masking of the API key for AI21 models
**Issue:** Fixes#12165 for AI21
**Dependencies:** None
Note: This fix came in originally through #12418 but was possibly missed
in the refactor to the AI21 partner package
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Replaced all `from langchain.callbacks` into `from
langchain_core.callbacks` .
Changes in the `langchain` and `langchain_experimental`
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description**: The pydantic schema fields are supposed to be
optional but the use of `...` makes them required. This causes a
`ValidationError` when running the example code. I replaced `...` with
`default=None` to make the fields optional as intended. I also
standardized the format for all fields.
- **Issue**: n/a
- **Dependencies**: none
- **Twitter handle**: https://twitter.com/m_atoms
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for Llamafile
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in community llamafile.py
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
spelling error fixed
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for HuggingFaceEndpoint
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in community HuggingFaceEndpoint
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl
crawls and convert any website into LLM-ready data. It crawls all
accessible subpages and give you clean markdown for each.
- **Description:** Adds FireCrawl data loader
- **Dependencies:** firecrawl-py
- **Twitter handle:** @mendableai
ccing contributors: (@ericciarla @nickscamara)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
LLMs might sometimes return invalid response for LLM graph transformer.
Instead of failing due to pydantic validation, we skip it and manually
check and optionally fix error where we can, so that more information
gets extracted
- **Description:** Added cross-links for easy access of api
documentation of each output parser class from it's description page.
- **Issue:** related to issue #19969
Co-authored-by: Haris Ali <haris.ali@formulatrix.com>
avaliable -> available
- **Description:** fixed typo
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Mistral gives us one ID per response, no individual IDs for tool calls.
```python
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_mistralai import ChatMistralAI
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
model = ChatMistralAI(model="mistral-large-latest", temperature=0)
@tool
def magic_function(input: int) -> int:
"""Applies a magic function to an input."""
return input + 2
tools = [magic_function]
agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is the value of magic_function(3)?"})
```
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:** Adds chroma to the partners package. Tests & code
mirror those in the community package.
**Dependencies:** None
**Twitter handle:** @akiradev0x
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR should make it easier for linters to do type checking and for IDEs to jump to definition of code.
See #20050 as a template for this PR.
- As a byproduct: Added 3 missed `test_imports`.
- Added missed `SolarChat` in to __init___.py Added it into test_import
ut.
- Added `# type: ignore` to fix linting. It is not clear, why linting
errors appear after ^ changes.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
```python
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
MessagesPlaceholder("chat_history", optional=True),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
model = ChatAnthropic(model="claude-3-opus-20240229")
@tool
def magic_function(input: int) -> int:
"""Applies a magic function to an input."""
return input + 2
tools = [magic_function]
agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is the value of magic_function(3)?"})
```
```
> Entering new AgentExecutor chain...
Invoking: `magic_function` with `{'input': 3}`
responded: [{'text': '<thinking>\nThe user has asked for the value of magic_function applied to the input 3. Looking at the available tools, magic_function is the relevant one to use here, as it takes an integer input and returns an integer output.\n\nThe magic_function has one required parameter:\n- input (integer)\n\nThe user has directly provided the value 3 for the input parameter. Since the required parameter is present, we can proceed with calling the function.\n</thinking>', 'type': 'text'}, {'id': 'toolu_01HsTheJPA5mcipuFDBbJ1CW', 'input': {'input': 3}, 'name': 'magic_function', 'type': 'tool_use'}]
5
Therefore, the value of magic_function(3) is 5.
> Finished chain.
{'input': 'what is the value of magic_function(3)?',
'output': 'Therefore, the value of magic_function(3) is 5.'}
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
core[minor], langchain[patch], openai[minor], anthropic[minor], fireworks[minor], groq[minor], mistralai[minor]
```python
class ToolCall(TypedDict):
name: str
args: Dict[str, Any]
id: Optional[str]
class InvalidToolCall(TypedDict):
name: Optional[str]
args: Optional[str]
id: Optional[str]
error: Optional[str]
class ToolCallChunk(TypedDict):
name: Optional[str]
args: Optional[str]
id: Optional[str]
index: Optional[int]
class AIMessage(BaseMessage):
...
tool_calls: List[ToolCall] = []
invalid_tool_calls: List[InvalidToolCall] = []
...
class AIMessageChunk(AIMessage, BaseMessageChunk):
...
tool_call_chunks: Optional[List[ToolCallChunk]] = None
...
```
Important considerations:
- Parsing logic occurs within different providers;
- ~Changing output type is a breaking change for anyone doing explicit
type checking;~
- ~Langsmith rendering will need to be updated:
https://github.com/langchain-ai/langchainplus/pull/3561~
- ~Langserve will need to be updated~
- Adding chunks:
- ~AIMessage + ToolCallsMessage = ToolCallsMessage if either has
non-null .tool_calls.~
- Tool call chunks are appended, merging when having equal values of
`index`.
- additional_kwargs accumulate the normal way.
- During streaming:
- ~Messages can change types (e.g., from AIMessageChunk to
AIToolCallsMessageChunk)~
- Output parsers parse additional_kwargs (during .invoke they read off
tool calls).
Packages outside of `partners/`:
- https://github.com/langchain-ai/langchain-cohere/pull/7
- https://github.com/langchain-ai/langchain-google/pull/123/files
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: When multithreading is set to True and using the
DirectoryLoader, there was a bug that caused the return type to be a
double nested list. This resulted in other places upstream not being
able to utilize the from_documents method as it was no longer a
`List[Documents]` it was a `List[List[Documents]]`. The change made was
to just loop through the `future.result()` and yield every item.
Issue: #20093
Dependencies: N/A
Twitter handle: N/A
This unit test fails likely validation by the openai client.
Newer openai library seems to be doing more validation so the existing
test fails since http_client needs to be of httpx instance
- **Description**: fixes BooleanOutputParser detecting sub-words ("NOW
this is likely (YES)" -> `True`, not `AmbiguousError`)
- **Issue(s)**: fixes#11408 (follow-up to #17810)
- **Dependencies**: None
- **GitHub handle**: @casperdcl
<!-- if unreviewd after a few days, @-mention one of baskaryan, efriis,
eyurtsev, hwchase17 -->
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
Use the `Stream` context managers in `ChatOpenAi` `stream` and `astream`
method.
Using the context manager returned by the OpenAI client makes it
possible to terminate the stream early since the response connection
will be closed when the context manager exists.
**Issue:** #5340
**Twitter handle:** @snopoke
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Bug fix. Removed extra line in `GCSDirectoryLoader`
to allow catching Exceptions. Now also logs the file path if Exception
is raised for easier debugging.
- **Issue:** #20198 Bug since langchain-community==0.0.31
- **Dependencies:** No change
- **Twitter handle:** timothywong731
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- make Tencent Cloud VectorDB support metadata filtering.
- implement delete function for Tencent Cloud VectorDB.
- support both Langchain Embedding model and Tencent Cloud VDB embedding
model.
- Tencent Cloud VectorDB support filter search keyword, compatible with
langchain filtering syntax.
- add Tencent Cloud VectorDB TranslationVisitor, now work with self
query retriever.
- more documentations.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** In this PR I fixed the links which points to the API
docs for classes in OpenAI functions and OpenAI tools section of output
parsers.
- **Issue:** It fixed the issue #19969
Co-authored-by: Haris Ali <haris.ali@formulatrix.com>
Issue `langchain_community.cross_encoders` didn't have flattening
namespace code in the __init__.py file.
Changes:
- added code to flattening namespaces (used #20050 as a template)
- added ut for a change
- added missed `test_imports` for `chat_loaders` and
`chat_message_histories` modules
This PR make `request_timeout` and `max_retries` configurable for
ChatAnthropic.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: Add semantic caching and memory using
MongoDB"
- [ ] **PR message**:
- **Description:** This PR introduces functionality for adding semantic
caching and chat message history using MongoDB in RAG applications. By
leveraging the MongoDBCache and MongoDBChatMessageHistory classes,
developers can now enhance their retrieval-augmented generation
applications with efficient semantic caching mechanisms and persistent
conversation histories, improving response times and consistency across
chat sessions.
- **Issue:** N/A
- **Dependencies:** Requires `datasets`, `langchain`,
`langchain-mongodb`, `langchain-openai`, `pymongo`, and `pandas` for
implementation. MongoDB Atlas is used for database services, and the
OpenAI API for model access.
- **Twitter handle:** @richmondalake
Co-authored-by: Erick Friis <erick@langchain.dev>
Issue:
When async_req is the default value True, pinecone client return the
multiprocessing AsyncResult object.
When async_req is set to False, pinecone client return the result
directly. `[{'upserted_count': 1}]` . Calling get() method will throw an
error in this case.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
- **Twitter handle:** `@alexsherstinsky`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Last year Microsoft [changed the
name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)
of Azure Cognitive Search to Azure AI Search. This PR updates the
Langchain Azure Retriever API and it's associated docs to reflect this
change. It may be confusing for users to see the name Cognitive here and
AI in the Microsoft documentation which is why this is needed. I've also
added a more detailed example to the Azure retriever doc page.
There are more places that need a similar update but I'm breaking it up
so the PRs are not too big 😄 Fixing my errors from the previous PR.
Twitter: @marlene_zw
Two new tests added to test backward compatibility in
`libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** update langchain anthropic templates to support
Claude 3 (iterative search, chain of note, summarization, and XML
response)
- **Issue:** issue # N/A. Stability issues and errors encountered when
trying to use older langchain and anthropic libraries.
- **Dependencies:**
- langchain_anthropic version 0.1.4\
- anthropic package version in the range ">=0.17.0,<1" to support
langchain_anthropic.
- **Twitter handle:** @d_w_b7
- [ x]**Add tests and docs**: If you're adding a new integration, please
include
1. used instructions in the README for testing
- [ x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
After this PR it will be possible to pass a cache instance directly to a
language model. This is useful to allow different language models to use
different caches if needed.
- **Issue:** close#19276
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- Added missed providers
- Added links, descriptions in related examples
- Formatted in a consistent format
Co-authored-by: Erick Friis <erick@langchain.dev>
Updated a page with existing document loaders with links to examples.
Fixed formatting of one example.
Co-authored-by: Erick Friis <erick@langchain.dev>
Issue: The `graph` code was moved into the `community` package a long
ago. But the related documentation is still in the
[use_cases](https://python.langchain.com/docs/use_cases/graph/integrations/diffbot_graphtransformer)
section and not in the `integrations`.
Changes:
- moved the `use_cases/graph/integrations` notebooks into the
`integrations/graphs`
- renamed files and changed titles to follow the consistent format
- redirected old page URLs to new URLs in `vercel.json` and in several
other pages
- added descriptions and links when necessary
- formatted into the consistent format
Should hopefully avoid weird broken link edge cases.
Relative links now trip up the Docusaurus broken link checker, so this
PR also removes them.
Also snuck in a small addition about asyncio
**Description:**
The `LocalFileStore` class can be used to create an on-disk
`CacheBackedEmbeddings` cache. However, the default `umask` settings
gives file/directory write permissions only to the original user. Once
the cache directory is created by the first user, other users cannot
write their own cache entries into the directory.
To make the cache usable by multiple users, this pull request updates
the `LocalFileStore` constructor to allow the permissions for newly
created directories and files to be specified. The specified permissions
override the default `umask` values.
For example, when configured as follows:
```python
file_store = LocalFileStore(temp_dir, chmod_dir=0o770, chmod_file=0o660)
```
then "user" and "group" (but not "other") have permissions to access the
store, which means:
* Anyone in our group could contribute embeddings to the cache.
* If we implement cache cleanup/eviction in the future, anyone in our
group could perform the cleanup.
The default values for the `chmod_dir` and `chmod_file` parameters is
`None`, which retains the original behavior of using the default `umask`
settings.
**Issue:**
Implements enhancement #18075.
**Testing:**
I updated the `LocalFileStore` unit tests to test the permissions.
---------
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** Adds async variants of afrom_texts and
afrom_embeddings into `OpenSearchVectorSearch`, which allows for
`afrom_documents` to be called.
- **Issue:** I implemented this because my use case involves an async
scraper generating documents as and when they're ready to be ingested by
Embedding/OpenSearch
- **Dependencies:** None that I'm aware
Co-authored-by: Ben Mitchell <b.mitchell@reply.com>
This PR supports using Pydantic v2 objects to generate the schema for
the JSONOutputParser (#19441). This also adds a `json_schema` parameter
to allow users to pass any JSON schema to validate with, not just
pydantic.
core/langchain_core/_api[Patch]: mypy ignore fixes#17048
Related to #17048
Applied mypy fixes to below two files:
libs/core/langchain_core/_api/deprecation.py
libs/core/langchain_core/_api/beta_decorator.py
Summary of Fixes:
**Issue 1**
class _deprecated_property(type(obj)): # type: ignore
error: Unsupported dynamic base class "type" [misc]
Fix:
1. Added an __init__ method to _deprecated_property to initialize the
fget, fset, fdel, and __doc__ attributes.
2. In the __get__, __set__, and __delete__ methods, we now use the
self.fget, self.fset, and self.fdel attributes to call the original
methods after emitting the warning.
3. The finalize function now creates an instance of _deprecated_property
with the fget, fset, fdel, and doc attributes from the original obj
property.
**Issue 2**
def finalize( # type: ignore
wrapper: Callable[..., Any], new_doc: str
) -> T:
error: All conditional function variants must have identical
signatures
Fix: Ensured that both definitions of the finalize function have the
same signature
Twitter Handle -
https://x.com/gupteutkarsha?s=11&t=uwHe4C3PPpGRvoO5Qpm1aA
**Description:** Citations are the main addition in this PR. We now emit
them from the multihop agent! Additionally the agent is now more
flexible with observations (`Any` is now accepted), and the Cohere SDK
version is bumped to fix an issue with the most recent version of
pydantic v1 (1.10.15)
- **Description:** In order to use index and aindex in
libs/langchain/langchain/indexes/_api.py, I implemented delete method
and all async methods in opensearch_vector_search
- **Dependencies:** No changes
- **Description:** Improvement for #19599: fixing missing return of
graph.draw_mermaid_png and improve it to make the saving of the rendered
image optional
Co-authored-by: Angel Igareta <angel.igareta@klarna.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description:** Update of Cohere documentation (main provider page)
**Issue:** After addition of the Cohere partner package, the
documentation was out of date
**Dependencies:** None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: deprecating integrations moved to
langchain_google_community"
- [ ] **PR message**: deprecating integrations moved to
langchain_google_community
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Removes required usage of `requests` from `langchain-core`, all of which
has been deprecated.
- removes Tracer V1 implementations
- removes old `try_load_from_hub` github-based hub implementations
Removal done in a way where imports will still succeed, and usage will
fail with a `RuntimeError`.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** mention not-caching methods in CacheBackedEmbeddings
- **Issue:** n/a I almost created one until I read the code
- **Dependencies:** n/a
- **Twitter handle:** `tarsylia`
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description**: Improves the stability of all Cohere partner package
integration tests. Fixes a bug with document parsing (both dicts and
Documents are handled).
**Description**: This PR simplifies an integration test within the
Cohere partner package:
* It no longer relies on exact model answers
* It no longer relies on a third party tool
This PR completes work for PR #18798 to expose raw tool output in
on_tool_end.
Affected APIs:
* astream_log
* astream_events
* callbacks sent to langsmith via langsmith-sdk
* Any other code that relies on BaseTracer!
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- This ensures ids are stable across streamed chunks
- Multiple messages in batch call get separate ids
- Also fix ids being dropped when combining message chunks
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** add `remove_comments` option (default: True): do not
extract html _comments_,
- **Issue:** None,
- **Dependencies:** None,
- **Tag maintainer:** @nfcampos ,
- **Twitter handle:** peter_v
I ran `make format`, `make lint` and `make test`.
Discussion: I my use case, I prefer to not have the comments in the
extracted text:
* e.g. from a Google tag that is added in the html as comment
* e.g. content that the authors have temporarily hidden to make it non
visible to the regular reader
Removing the comments makes the extracted text more alike the intended
text to be seen by the reader.
**Choice to make:** do we prefer to make the default for this
`remove_comments` option to be True or False?
I have changed it to True in a second commit, since that is how I would
prefer to use it by default. Have the
cleaned text (without technical Google tags etc.) and also closer to the
actually visible and intended content.
I am not sure what is best aligned with the conventions of langchain in
general ...
INITIAL VERSION (new version above):
~**Choice to make:** do we prefer to make the default for this
`ignore_comments` option to be True or False?
I have set it to False now to be backwards compatible. On the other
hand, I would use it mostly with True.
I am not sure what is best aligned with the conventions of langchain in
general ...~
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
As in #19346, this PR exposes `request_timeout` in `BaseCohere`, while
`max_retires` is no longer a parameter of the beneath client
(`cohere.Client`) and it is already configured in
`langchain_cohere.llms.Cohere`.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** the layout of html pages can be variant based on the
bootstrap framework or the styles of the pages. So we need to have a
splitter to transform the html tags to a proper layout and then split
the html content based on the provided list of tags to determine its
html sections. We are using BS4 library along with xslt structure to
split the html content using an section aware approach.
- **Dependencies:** No new dependencies
- **Twitter handle:** @m_setayesh
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
[Dria](https://dria.co/) is a hub of public RAG models for developers to
both contribute and utilize a shared embedding lake. This PR adds a
retriever that can retrieve documents from Dria.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** Fix argument translation from OpenAPI spec to OpenAI
function call (and similar)
- **Issue:** OpenGPTs failures with calling Action Server based actions.
- **Dependencies:** None
- **Twitter handle:** mikkorpela
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
~2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.~
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Description: Update `ChatZhipuAI` to support the latest `glm-4` model.
Issue: N/A
Dependencies: httpx, httpx-sse, PyJWT
The previous `ChatZhipuAI` implementation requires the `zhipuai`
package, and cannot call the latest GLM model. This is because
- The old version `zhipuai==1.*` doesn't support the latest model.
- `zhipuai==2.*` requires `pydantic V2`, which is incompatible with
'langchain-community'.
This re-implementation invokes the GLM model by sending HTTP requests to
[open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx`
package, and uses the `httpx-sse` package to handle stream events.
---------
Co-authored-by: zR <2448370773@qq.com>
- **Description:** Add functionality to generate Mermaid syntax and
render flowcharts from graph data. This includes support for custom node
colors and edge curve styles, as well as the ability to export the
generated graphs to PNG images using either the Mermaid.INK API or
Pyppeteer for local rendering.
- **Dependencies:** Optional dependencies are `pyppeteer` if rendering
wants to be done using Pypeteer and Javascript code.
---------
Co-authored-by: Angel Igareta <angel.igareta@klarna.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** An additional `U` argument was added for the
instructions to install the pip packages for the MediaWiki Dump Document
loader which was leading to error in installing the package. Removing
the argument fixed the command to install.
**Issue:** #19820
**Dependencies:** No dependency change requierd
**Twitter handle:** [@vardhaman722](https://twitter.com/vardhaman722)
* Replace `source_documents` with `documents`
* Pass `documents` as a named arg vs keyword
* Make `parsed_docs` more robust
* Fix edge case of doc page_content being `None`
- **Updating Together.ai Endpoint**: "langchain_together: Updated
Deprecated endpoint for partner package"
- Description: The inference API of together is deprecates, do replaced
with completions and made corresponding changes.
- Twitter handle: @dev_yashmathur
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Add attribution_token within
GoogleVertexAISearchRetriever so user can provide this information to
Google support team or product team during debug session.
Reference:
https://cloud.google.com/generative-ai-app-builder/docs/view-analytics#user-events
Attribution tokens. Attribution tokens are unique IDs generated by
Vertex AI Search and returned with each search request. Make sure to
include that attribution token as UserEvent.attributionToken with any
user events resulting from a search. This is needed to identify if a
search is served by the API. Only user events with a Google-generated
attribution token are used to compute metrics.
- **Issue:** No
- **Dependencies:** No
- **Twitter handle:** abehsu1992626
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Support reranking based on cross encoder models
available from HuggingFace.
- Added `CrossEncoder` schema
- Implemented `HuggingFaceCrossEncoder` and
`SagemakerEndpointCrossEncoder`
- Implemented `CrossEncoderReranker` that performs similar functionality
to `CohereRerank`
- Added `cross-encoder-reranker.ipynb` to demonstrate how to use it.
Please let me know if anything else needs to be done to make it visible
on the table-of-contents navigation bar on the left, or on the card list
on [retrievers documentation
page](https://python.langchain.com/docs/integrations/retrievers).
- **Issue:** N/A
- **Dependencies:** None other than the existing ones.
---------
Co-authored-by: Kenny Choe <kchoe@amazon.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: Video imagery to text (Closed Captioning)
This pull request introduces the VideoCaptioningChain, a tool for
automated video captioning. It processes audio and video to generate
subtitles and closed captions, merging them into a single SRT output.
Issue: https://github.com/langchain-ai/langchain/issues/11770
Dependencies: opencv-python, ffmpeg-python, assemblyai, transformers,
pillow, torch, openai
Tag maintainer:
@baskaryan
@hwchase17
Hello! We are a group of students from the University of Toronto
(@LunarECL, @TomSadan, @nicoledroi1, @A2113S) that want to make a
contribution to the LangChain community! We have ran make format, make
lint and make test locally before submitting the PR. To our knowledge,
our changes do not introduce any new errors.
Thank you for taking the time to review our PR!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Description
This implementation adds functionality from the AlphaVantage API,
renowned for its comprehensive financial data. The class encapsulates
various methods, each dedicated to fetching specific types of financial
information from the API.
### Implemented Functions
- **`search_symbols`**:
- Searches the AlphaVantage API for financial symbols using the provided
keywords.
- **`_get_market_news_sentiment`**:
- Retrieves market news sentiment for a specified stock symbol from the
AlphaVantage API.
- **`_get_time_series_daily`**:
- Fetches daily time series data for a specific symbol from the
AlphaVantage API.
- **`_get_quote_endpoint`**:
- Obtains the latest price and volume information for a given symbol
from the AlphaVantage API.
- **`_get_time_series_weekly`**:
- Gathers weekly time series data for a particular symbol from the
AlphaVantage API.
- **`_get_top_gainers_losers`**:
- Provides details on top gainers, losers, and most actively traded
tickers in the US market from the AlphaVantage API.
### Issue:
- #11994
### Dependencies:
- 'requests' library for HTTP requests. (import requests)
- 'pytest' library for testing. (import pytest)
---------
Co-authored-by: Adam Badar <94140103+adam-badar@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
- **Twitter handle:** `@alexsherstinsky`
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Code written by following, the official documentation
of [Google Drive
Loader](https://python.langchain.com/docs/integrations/document_loaders/google_drive),
gives errors. I have opened an issue regarding this. See #14725. This is
a pull request for modifying the documentation to use an approach that
makes the code work. Basically, the change is that we need to always set
the GOOGLE_APPLICATION_CREDENTIALS env var to an emtpy string, rather
than only in case of RefreshError. Also, rewrote 2 paragraphs to make
the instructions more clear.
- **Issue:** See this related [issue #
14725](https://github.com/langchain-ai/langchain/issues/14725)
- **Dependencies:** NA
- **Tag maintainer:** @baskaryan
- **Twitter handle:** NA
Co-authored-by: Snehil <snehil@example.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: added support for llmsherpa library"
- [x] **Add tests and docs**:
1. Integration test:
'docs/docs/integrations/document_loaders/test_llmsherpa.py'.
2. an example notebook:
`docs/docs/integrations/document_loaders/llmsherpa.ipynb`.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
this pr also drops the community added action for checking broken links
in mdx. It does not work well for our use case, throwing errors for
local paths, plus the rest of the errors our in house solution had.
# Description
Implementing `_combine_llm_outputs` to `ChatMistralAI` to override the
default implementation in `BaseChatModel` returning `{}`. The
implementation is inspired by the one in `ChatOpenAI` from package
`langchain-openai`.
# Issue
None
# Dependencies
None
# Twitter handle
None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
This template utilizes Chroma and TGI (Text Generation Inference) to
execute RAG on the Intel Xeon Scalable Processors. It serves as a
demonstration for users, illustrating the deployment of the RAG service
on the Intel Xeon Scalable Processors and showcasing the resulting
performance enhancements.
**Issue:**
None
**Dependencies:**
The template contains the poetry project requirements to run this
template.
CPU TGI batching is WIP.
**Twitter handle:**
None
---------
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** We'd like to support passing additional kwargs in
`with_structured_output`. I believe this is the accepted approach to
enable additional arguments on API calls.
- **Description:** Haskell language support added in text_splitter
module
- **Dependencies:** No
- **Twitter handle:** @nisargtr
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** PR adds support for limiting number of messages
preserved in a session history for DynamoDBChatMessageHistory
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Subject: Fix Type Misdeclaration for index_schema in redis/base.py
I noticed a type misdeclaration for the index_schema column in the
redis/base.py file.
When following the instructions outlined in [Redis Custom Metadata
Indexing](https://python.langchain.com/docs/integrations/vectorstores/redis)
to create our own index_schema, it leads to a Pylance type error. <br/>
**The error message indicates that Dict[str, list[Dict[str, str]]] is
incompatible with the type Optional[Union[Dict[str, str], str,
os.PathLike]].**
```
index_schema = {
"tag": [{"name": "credit_score"}],
"text": [{"name": "user"}, {"name": "job"}],
"numeric": [{"name": "age"}],
}
rds, keys = Redis.from_texts_return_keys(
texts,
embeddings,
metadatas=metadata,
redis_url="redis://localhost:6379",
index_name="users_modified",
index_schema=index_schema,
)
```
Therefore, I have created this pull request to rectify the type
declaration problem.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Feature
- Set additional headers in constructor
- Headers will be sent in post request
This feature is useful if deploying Ollama on a cloud service such as
hugging face, which requires authentication tokens to be passed in the
request header.
## Tests
- Test if header is passed
- Test if header is not passed
Similar to https://github.com/langchain-ai/langchain/pull/15881
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
If `prompt` is passed into `create_sql_agent()`, then
`toolkit.get_context()` shouldn't be executed against the database
unless relevant prompt variables (`table_info` or `table_names`) are
present .
Thank you for contributing to LangChain!
- [x] **PR title**: "community: Implement DirectoryLoader lazy_load
function"
- [x] **Description**: The `lazy_load` function of the `DirectoryLoader`
yields each document separately. If the given `loader_cls` of the
`DirectoryLoader` also implemented `lazy_load`, it will be used to yield
subdocuments of the file.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access:
`libs/community/tests/unit_tests/document_loaders/test_directory_loader.py`
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory:
`docs/docs/integrations/document_loaders/directory.ipynb`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
When using the SQLDatabaseChain with Llama2-70b LLM and, SQLite
database. I was getting `Warning: You can only execute one statement at
a time.`.
```
from langchain.sql_database import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
sql_database_path = '/dccstor/mmdataretrieval/mm_dataset/swimming_record/rag_data/swimmingdataset.db'
sql_db = get_database(sql_database_path)
db_chain = SQLDatabaseChain.from_llm(mistral, sql_db, verbose=True, callbacks = [callback_obj])
db_chain.invoke({
"query": "What is the best time of Lance Larson in men's 100 meter butterfly competition?"
})
```
Error:
```
Warning Traceback (most recent call last)
Cell In[31], line 3
1 import langchain
2 langchain.debug=False
----> 3 db_chain.invoke({
4 "query": "What is the best time of Lance Larson in men's 100 meter butterfly competition?"
5 })
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain/chains/base.py:162, in Chain.invoke(self, input, config, **kwargs)
160 except BaseException as e:
161 run_manager.on_chain_error(e)
--> 162 raise e
163 run_manager.on_chain_end(outputs)
164 final_outputs: Dict[str, Any] = self.prep_outputs(
165 inputs, outputs, return_only_outputs
166 )
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain/chains/base.py:156, in Chain.invoke(self, input, config, **kwargs)
149 run_manager = callback_manager.on_chain_start(
150 dumpd(self),
151 inputs,
152 name=run_name,
153 )
154 try:
155 outputs = (
--> 156 self._call(inputs, run_manager=run_manager)
157 if new_arg_supported
158 else self._call(inputs)
159 )
160 except BaseException as e:
161 run_manager.on_chain_error(e)
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_experimental/sql/base.py:198, in SQLDatabaseChain._call(self, inputs, run_manager)
194 except Exception as exc:
195 # Append intermediate steps to exception, to aid in logging and later
196 # improvement of few shot prompt seeds
197 exc.intermediate_steps = intermediate_steps # type: ignore
--> 198 raise exc
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_experimental/sql/base.py:143, in SQLDatabaseChain._call(self, inputs, run_manager)
139 intermediate_steps.append(
140 sql_cmd
141 ) # output: sql generation (no checker)
142 intermediate_steps.append({"sql_cmd": sql_cmd}) # input: sql exec
--> 143 result = self.database.run(sql_cmd)
144 intermediate_steps.append(str(result)) # output: sql exec
145 else:
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_community/utilities/sql_database.py:436, in SQLDatabase.run(self, command, fetch, include_columns)
425 def run(
426 self,
427 command: str,
428 fetch: Literal["all", "one"] = "all",
429 include_columns: bool = False,
430 ) -> str:
431 """Execute a SQL command and return a string representing the results.
432
433 If the statement returns rows, a string of the results is returned.
434 If the statement returns no rows, an empty string is returned.
435 """
--> 436 result = self._execute(command, fetch)
438 res = [
439 {
440 column: truncate_word(value, length=self._max_string_length)
(...)
443 for r in result
444 ]
446 if not include_columns:
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/langchain_community/utilities/sql_database.py:413, in SQLDatabase._execute(self, command, fetch)
410 elif self.dialect == "postgresql": # postgresql
411 connection.exec_driver_sql("SET search_path TO %s", (self._schema,))
--> 413 cursor = connection.execute(text(command))
414 if cursor.returns_rows:
415 if fetch == "all":
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1416, in Connection.execute(self, statement, parameters, execution_options)
1414 raise exc.ObjectNotExecutableError(statement) from err
1415 else:
-> 1416 return meth(
1417 self,
1418 distilled_parameters,
1419 execution_options or NO_OPTIONS,
1420 )
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/sql/elements.py:516, in ClauseElement._execute_on_connection(self, connection, distilled_params, execution_options)
514 if TYPE_CHECKING:
515 assert isinstance(self, Executable)
--> 516 return connection._execute_clauseelement(
517 self, distilled_params, execution_options
518 )
519 else:
520 raise exc.ObjectNotExecutableError(self)
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1639, in Connection._execute_clauseelement(self, elem, distilled_parameters, execution_options)
1627 compiled_cache: Optional[CompiledCacheType] = execution_options.get(
1628 "compiled_cache", self.engine._compiled_cache
1629 )
1631 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
1632 dialect=dialect,
1633 compiled_cache=compiled_cache,
(...)
1637 linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
1638 )
-> 1639 ret = self._execute_context(
1640 dialect,
1641 dialect.execution_ctx_cls._init_compiled,
1642 compiled_sql,
1643 distilled_parameters,
1644 execution_options,
1645 compiled_sql,
1646 distilled_parameters,
1647 elem,
1648 extracted_params,
1649 cache_hit=cache_hit,
1650 )
1651 if has_events:
1652 self.dispatch.after_execute(
1653 self,
1654 elem,
(...)
1658 ret,
1659 )
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1848, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
1843 return self._exec_insertmany_context(
1844 dialect,
1845 context,
1846 )
1847 else:
-> 1848 return self._exec_single_context(
1849 dialect, context, statement, parameters
1850 )
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1988, in Connection._exec_single_context(self, dialect, context, statement, parameters)
1985 result = context._setup_result_proxy()
1987 except BaseException as e:
-> 1988 self._handle_dbapi_exception(
1989 e, str_statement, effective_parameters, cursor, context
1990 )
1992 return result
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:2346, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec)
2344 else:
2345 assert exc_info[1] is not None
-> 2346 raise exc_info[1].with_traceback(exc_info[2])
2347 finally:
2348 del self._reentrant_error
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/base.py:1969, in Connection._exec_single_context(self, dialect, context, statement, parameters)
1967 break
1968 if not evt_handled:
-> 1969 self.dialect.do_execute(
1970 cursor, str_statement, effective_parameters, context
1971 )
1973 if self._has_events or self.engine._has_events:
1974 self.dispatch.after_cursor_execute(
1975 self,
1976 cursor,
(...)
1980 context.executemany,
1981 )
File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/sqlalchemy/engine/default.py:922, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
921 def do_execute(self, cursor, statement, parameters, context=None):
--> 922 cursor.execute(statement, parameters)
Warning: You can only execute one statement at a time.
```
**Issue:**
The Error occurs because when generating the SQLQuery, the llm_input
includes the stop character of "\nSQLResult:", so for this user query
the LLM generated response is **SELECT Time FROM men_butterfly_100m
WHERE Swimmer = 'Lance Larson';\nSQLResult:** it is required to remove
the SQLResult suffix on the llm response before executing it on the
database.
```
llm_inputs = {
"input": input_text,
"top_k": str(self.top_k),
"dialect": self.database.dialect,
"table_info": table_info,
"stop": ["\nSQLResult:"],
}
sql_cmd = self.llm_chain.predict(
callbacks=_run_manager.get_child(),
**llm_inputs,
).strip()
if SQL_RESULT in sql_cmd:
sql_cmd = sql_cmd.split(SQL_RESULT)[0].strip()
result = self.database.run(sql_cmd)
```
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Description: Fix xml parser to handle strings that only contain the root
tag
Issue: N/A
Dependencies: None
Twitter handle: N/A
A valid xml text can contain only the root level tag. Example: <body>
Some text here
</body>
The example above is a valid xml string. If parsed with the current
implementation the result is {"body": []}. This fix checks if the root
level text contains any non-whitespace character and if that's the case
it returns {root.tag: root.text}. The result is that the above text is
correctly parsed as {"body": "Some text here"}
@ale-delfino
Thank you for contributing to LangChain!
Checklist:
- [x] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [x] PR message: **Delete this entire template message** and replace it
with the following bulleted list
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @efriis, @eyurtsev, @hwchase17.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
When testing Nomic embeddings --
```
from langchain_community.embeddings import LlamaCppEmbeddings
embd_model_path = "/Users/rlm/Desktop/Code/llama.cpp/models/nomic-embd/nomic-embed-text-v1.Q4_K_S.gguf"
embd_lc = LlamaCppEmbeddings(model_path=embd_model_path)
embedding_lc = embd_lc.embed_query(query)
```
We were seeing this error for strings > a certain size --
```
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/llama.py:827, in Llama.embed(self, input, normalize, truncate, return_count)
824 s_sizes = []
826 # add to batch
--> 827 self._batch.add_sequence(tokens, len(s_sizes), False)
828 t_batch += n_tokens
829 s_sizes.append(n_tokens)
File ~/miniforge3/envs/llama2/lib/python3.9/site-packages/llama_cpp/_internals.py:542, in _LlamaBatch.add_sequence(self, batch, seq_id, logits_all)
540 self.batch.token[j] = batch[i]
541 self.batch.pos[j] = i
--> 542 self.batch.seq_id[j][0] = seq_id
543 self.batch.n_seq_id[j] = 1
544 self.batch.logits[j] = logits_all
ValueError: NULL pointer access
```
The default `n_batch` of llama-cpp-python's Llama is `512` but we were
explicitly setting it to `8`.
These need to be set to equal for embedding models.
* The embedding.cpp example has an assertion to make sure these are
always equal.
* Apparently this is not being done properly in llama-cpp-python.
With `n_batch` set to 8, if more than 8 tokens are passed the batch runs
out of space and it crashes.
This also explains why the CPU compute buffer size was small:
raw client with default `n_batch=512`
```
llama_new_context_with_model: CPU input buffer size = 3.51 MiB
llama_new_context_with_model: CPU compute buffer size = 21.00 MiB
```
langchain with `n_batch=8`
```
llama_new_context_with_model: CPU input buffer size = 0.04 MiB
llama_new_context_with_model: CPU compute buffer size = 0.33 MiB
```
We can work around this by passing `n_batch=512`, but this will not be
obvious to some users:
```
embedding = LlamaCppEmbeddings(model_path=embd_model_path,
n_batch=512)
```
From discussion w/ @cebtenzzre. Related:
https://github.com/abetlen/llama-cpp-python/issues/1189
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** The base URL for OpenAI is retrieved from the
environment variable "OPENAI_BASE_URL", whereas for langchain it is
obtained from "OPENAI_API_BASE". By adding `base_url =
os.environ.get("OPENAI_API_BASE")`, the OpenAI proxy can execute
correctly.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- **Description:** added unit tests for NotebookLoader. Linked PR:
https://github.com/langchain-ai/langchain/pull/17614
- **Issue:**
[#17614](https://github.com/langchain-ai/langchain/pull/17614)
- **Twitter handle:** @paulodoestech
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [x] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: lachiewalker <lachiewalker1@hotmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Created a Langchain Tool for OpenAI DALLE Image
Generation.
**Issue:**
[#15901](https://github.com/langchain-ai/langchain/issues/15901)
**Dependencies:** n/a
**Twitter handle:** @paulodoestech
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**: adding checking codes for calling AI model get error
in chat_models/base.py and llms/base.py
**Issue**: Sometimes the AI Model calling will get error, we should
raise it.
Otherwise, the next code 'choices.extend(response["choices"])' will
throw a "TypeError: 'NoneType' object is not iterable" error to mask the
true error.
Because 'response["choices"]' is None.
**Dependencies**: None
---------
Co-authored-by: yangkx <yangkx@asiainfo-int.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## PR message
**Description:** This PR adds a README file for the Together API in the
`libs/partners` folder of this repository. The README includes:
- A brief description of the package
- Installation instructions and class introductions
- Simple usage examples
**Issue:** #17545
This PR only contains document changes.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
1. Fix the BiliBiliLoader that can receive cookie parameters, it
requires 3 other parameters to run. The change is backward compatible.
2. Add test;
3. Add example in docs
- **Issue:** [#14213]
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** A few grammatical changes to improve readability of the
LCEL .ipynb and tidy some null characters.
**Issue:** N/A
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- [x] **PR title**: "community: Support streaming in Azure ML and few
naming changes"
- [x] **PR message**:
- **Description:** Added support for streaming for azureml_endpoint.
Also, renamed and AzureMLEndpointApiType.realtime to
AzureMLEndpointApiType.dedicated. Also, added new classes
CustomOpenAIChatContentFormatter and CustomOpenAIContentFormatter and
updated the classes LlamaChatContentFormatter and LlamaContentFormatter
to now show a deprecated warning message when instantiated.
---------
Co-authored-by: Sachin Paryani <saparan@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** At times, BaseChatMemory._get_input_output may acquire
some extra keys such as 'intermediate_steps' (agent_executor with
return_intermediate_steps set to True) and 'messages'
(agent_executor.iter with memory). In these instances, _get_input_output
can raise an error due to the presence of multiple keys. The 'output'
field should be used as the default field in these cases.
**Issue:** #16791
Previous markdown code was not working as intended, new code should add
green box around the tip so it is highlighted
Co-authored-by: Hershenson, Isaac (Extern) <isaac.hershenson.extern@bayer04.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Description: Added missing `from_documents` method to `KNNRetriever`,
providing the ability to supply metadata to LangChain `Document`s, and
to give it parity to the other retrievers, which do have
`from_documents`.
- Issue: None
- Dependencies: None
- Twitter handle: None
Co-authored-by: Victor Adan <vadan@netroadshow.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Relates to #17048
Description : Applied fix to dynamodb and elasticsearch file.
Error was : `Cannot override writeable attribute with read-only
property`
Suggestion:
instead of adding
```
@messages.setter
def messages(self, messages: List[BaseMessage]) -> None:
raise NotImplementedError("Use add_messages instead")
```
we can change base class property
`messages: List[BaseMessage]`
to
```
@property
def messages(self) -> List[BaseMessage]:...
```
then we don't need to add `@messages.setter` in all child classes.
**Description:**
While not technically incorrect, the TypeVar used for the `@beta`
decorator prevented pyright (and thus most vscode users) from correctly
seeing the types of functions/classes decorated with `@beta`.
This is in part due to a small bug in pyright
(https://github.com/microsoft/pyright/issues/7448 ) - however, the
`Type` bound in the typevar `C = TypeVar("C", Type, Callable)` is not
doing anything - classes are `Callables` by default, so by my
understanding binding to `Type` does not actually provide any more
safety - the modified annotation still works correctly for both
functions, properties, and classes.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
In this small PR I added the `template_tool_response` arg to the
`create_json_chat` function, so that users can customize this prompt in
case of need.
Thanks for your reviews!
---------
Co-authored-by: taamedag <Davide.Menini@swisscom.com>
This patch updates multiple function "run" to "invoke" in
llm_symbolic_math.ipynb.
Without this patch, you see following message.
The function `run` was deprecated in LangChain 0.1.0
and will be removed in 0.2.0. Use invoke instead.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
**Description:** Adds support for `with_structured_output` to Cohere,
which supports single function calling.
---------
Co-authored-by: BeatrixCohere <128378696+BeatrixCohere@users.noreply.github.com>
- [x] **PR title**: "community: fix baidu qianfan missing stop
parameter"
- [x] **PR message**:
- **Description: Baidu Qianfan lost the stop parameter when requesting
service due to extracting it from kwargs. This bug can cause the agent
to receive incorrect results
---------
Co-authored-by: ligang33 <ligang33@baidu.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Bug fixes in this PR:
* allows for other params such as "message" not just the input param to
the prompt for the cohere tools agent
* fixes to documents kwarg from messages
* fixes to tool_calls API call
---------
Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
- **Issue:** When passing an empty list to MergerRetriever it fails with
error: ValueError: max() arg is an empty sequence
- **Description:** We have a use case where we dynamically select
retrievers and use MergerRetriever for merging the output of the
retrievers. We faced this issue when the retriever_docs list is empty.
Adding a default 0 for cases when retriever_docs is an empty list to
avoid "ValueError: max() arg is an empty sequence". Also, changed to use
map() which is more than twice as fast compared to the current
implementation.
```
import timeit
# Sample retriever_docs with varying lengths of sublists
retriever_docs = [[i for i in range(j)] for j in range(1, 1000)]
# First code snippet
code1 = '''
max_docs = max(len(docs) for docs in retriever_docs)
'''
# Second code snippet
code2 = '''
max_docs = max(map(len, retriever_docs), default=0)
'''
# Benchmarking
time1 = timeit.timeit(stmt=code1, globals=globals(), number=10000)
time2 = timeit.timeit(stmt=code2, globals=globals(), number=10000)
# Output
print(f"Execution time for code snippet 1: {time1} seconds")
print(f"Execution time for code snippet 2: {time2} seconds")
```
- **Dependencies:** none
The previous version didn't had Voyage rerank in the init file
- [ ] **PR title**: langchain_voyageai reranker is not working
- [ ] **PR message**:
- **Description:** This fix let you run reranker from voyage
- **Issue:** Was not able to run reranker from voyage
@efriis
Due to changes in the OpenAI SDK, the previous method of setting the
OpenAI proxy in ChatOpenAI no longer works. This PR fixes this issue,
making the previous way of setting the OpenAI proxy in ChatOpenAI
effective again.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This is a follow up to #18371. These are the changes:
- New **Azure AI Services** toolkit and tools to replace those of
**Azure Cognitive Services**.
- Updated documentation for Microsoft platform.
- The image analysis tool has been rewritten to use the new package
`azure-ai-vision-imageanalysis`, doing a proper replacement of
`azure-ai-vision`.
These changes:
- Update outdated naming from "Azure Cognitive Services" to "Azure AI
Services".
- Update documentation to use non-deprecated methods to create and use
agents.
- Removes need to depend on yanked python package (`azure-ai-vision`)
There is one new dependency that is needed as a replacement to
`azure-ai-vision`:
- `azure-ai-vision-imageanalysis`. This is optional and declared within
a function.
There is a new `azure_ai_services.ipynb` notebook showing usage; Changes
have been linted and formatted.
I am leaving the actions of adding deprecation notices and future
removal of Azure Cognitive Services up to the LangChain team, as I am
not sure what the current practice around this is.
---
If this PR makes it, my handle is @galo@mastodon.social
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description**: `bigdl-llm` library has been renamed to
[`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR
migrates the `bigdl-llm` integration to `ipex-llm` .
- **Issue**: N/A. The original PR of `bigdl-llm` is
https://github.com/langchain-ai/langchain/pull/17953
- **Dependencies**: `ipex-llm` library
- **Contribution maintainer**: @shane-huang
Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb
Updated test:
libs/community/tests/integration_tests/llms/test_ipex_llm.py
- **Description:** Add support for Intel Lab's [Visual Data Management
System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store
- **Dependencies:** `vdms` library which requires protobuf = "4.24.2".
There is a conflict with dashvector in `langchain` package but conflict
is resolved in `community`.
- **Contribution maintainer:** [@cwlacewe](https://github.com/cwlacewe)
- **Added tests:**
libs/community/tests/integration_tests/vectorstores/test_vdms.py
- **Added docs:** docs/docs/integrations/vectorstores/vdms.ipynb
- **Added cookbook:** cookbook/multi_modal_RAG_vdms.ipynb
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
If you use an embedding dist function in an eval loop, you get warned
every time. Would prefer to just check once and forget about it.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- .stream() and .astream() call on_llm_new_token, removing the need for
subclasses to do so. Backwards compatible because now we don't pass
run_manager into ._stream and ._astream
- .generate() and .agenerate() now handle `stream: bool` kwarg for
_generate and _agenerate. Subclasses handle this arg by delegating to
._stream(), now one less thing they need to do. Backwards compat because
this is an optional arg that we now never pass to the subclasses
- .generate() and .agenerate() now inspect callback handlers to decide
on a default value for stream:bool if not passed in. This auto enables
streaming when using astream_events and astream_log
- as a result of these three changes any usage of .astream_events and
.astream_log should now yield chat model stream events
- In future PRs we can update all subclasses to reflect these two things
now handled by base class, but in meantime all will continue to work
* **Description**: add `None` type for `file_path` along with `str` and
`List[str]` types.
* `file_path`/`filename` arguments in `get_elements_from_api()` and
`partition()` can be `None`, however, there's no `None` type hint for
`file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader`
currently.
* calling the function with `file_path=None` is no problem, but my IDE
annoys me lol.
* **Issue**: N/A
* **Dependencies**: N/A
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Updates Meilisearch vectorstore for compatibility
with v1.6 and above. Adds embedders settings and embedder_name which are
now required.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
This PR adds a slightly more helpful message to a Tool Exception
```
# current state
langchain_core.tools.ToolException: Too many arguments to single-input tool
# proposed state
langchain_core.tools.ToolException: Too many arguments to single-input tool. Consider using a StructuredTool instead.
```
**Issue:** Somewhat discussed here 👉#6197
**Dependencies:** None
**Twitter handle:** N/A
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [ ] **cookbook** - update example for SalesGPT - include Stripe
Payment Link Generation
- **Description:** We updated the Jupyter notebook example with the
ability of the AI Agent to negotiate with customers and then close the
deal by generating a custom Stripe payment link.
- **Issue:** N/A
- **Dependencies:** N/a
- **Twitter handle:** @FilipMichalsky @0xtotaylor
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Filip Michalsky <filip_michalsky@g.harvard.edu>
Co-authored-by: Bagatur <baskaryan@gmail.com>
As mentioned in #18322, the current PydanticOutputParser won't work for
anyone trying to parse to pydantic v2 models. This PR adds a separate
`PydanticV2OutputParser`, as well as a `langchain_core.pydantic_v2`
namespace that will fail on import to any projects using pydantic<2.
Happy to update the docs for output parsers if this is something we're
interesting in adding.
On a separate note, I also updated `check_pydantic.sh` to detect
pydantic imports with leading whitespace and excluded the internal
namespaces. That change can be separated into its own PR if needed.
---------
Co-authored-by: Jan Nissen <jan23@gmail.com>
- **Description:** I've made a fix to a ParseError call in the
XMLOutputParser documentation.
- **Issue:** None
- **Dependencies:** None
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
PebbloSafeLoader: Add support for non-file-based Document Loaders
This pull request enhances PebbloSafeLoader by introducing support for
several non-file-based Document Loaders. With this update,
PebbloSafeLoader now seamlessly integrates with the following loaders:
- GoogleDriveLoader
- SlackDirectoryLoader
- Unstructured EmailLoader
**Issue:** NA
**Dependencies:** - None
**Twitter handle:** @Raj__725
---------
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Patch potential XML vulnerability CVE-2024-1455
This patches a potential XML vulnerability in the XMLOutputParser in
langchain-core. The vulnerability in some situations could lead to a
denial of service attack.
At risk are users that:
1) Running older distributions of python that have older version of
libexpat
2) Are using XMLOutputParser with an agent
3) Accept inputs from untrusted sources with this agent (e.g., endpoint
on the web that allows an untrusted user to interact wiith the parser)
Introduction
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is an innovative toolkit designed to accelerate GenAI/LLM everywhere
with the optimal performance of Transformer-based models on various
Intel platforms
Description
adding ITREX runtime embeddings using intel-extension-for-transformers.
added mdx documentation and example notebooks
added embedding import testing.
---------
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [x] **PR title**: "experimental: Enhance LLMGraphTransformer with
async processing and improved readability"
- [x] **PR message**:
- **Description:** This pull request refactors the `process_response`
and `convert_to_graph_documents` methods in the LLMGraphTransformer
class to improve code readability and adds async versions of these
methods for concurrent processing.
The main changes include:
- Simplifying list comprehensions and conditional logic in the
process_response method for better readability.
- Adding async versions aprocess_response and
aconvert_to_graph_documents to enable concurrent processing of
documents.
These enhancements aim to improve the overall efficiency and
maintainability of the `LLMGraphTransformer` class.
- **Issue:** N/A
- **Dependencies:** No additional dependencies required.
- **Twitter handle:** @jjovalle99
- [x] **Add tests and docs**: N/A (This PR does not introduce a new
integration)
- [x] **Lint and test**: Ran make format, make lint, and make test from
the root of the modified package(s). All tests pass successfully.
Additional notes:
- The changes made in this PR are backwards compatible and do not
introduce any breaking changes.
- The PR touches only the `LLMGraphTransformer` class within the
experimental package.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Implemented try-except block for
`GCSDirectoryLoader`. Reason: Users processing large number of
unstructured files in a folder may experience many different errors. A
try-exception block is added to capture these errors. A new argument
`use_try_except=True` is added to enable *silent failure* so that error
caused by processing one file does not break the whole function.
- **Issue:** N/A
- **Dependencies:** no new dependencies
- **Twitter handle:** timothywong731
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Adding oracle autonomous database document loader
integration. This will allow users to connect to oracle autonomous
database through connection string or TNS configuration.
https://www.oracle.com/autonomous-database/
- **Issue:** None
- **Dependencies:** oracledb python package
https://pypi.org/project/oracledb/
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Unit test and doc are added.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Currently the semantic_configurations are not used
when creating an AzureSearch instance, instead creating a new one with
default values. This PR changes the behavior to use the passed
semantic_configurations if it is present, and the existing default
configuration if not.
---------
Co-authored-by: Adam Law <adamlaw@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
DefusedXML is causing parsing errors on previously functional code with
the 0.7.x versions. These do not seem to support newer version of python
well. 0.8.x has only been released as rc, so we're not going to to use
it in the core package
* Adds support for `additional_kwargs` in `get_cohere_chat_request`
* This functionality passes in Cohere SDK specific parameters from
`BaseMessage` based classes to the API
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **Add len() implementation to Chroma**: "package: community"
- [x] **PR message**:
- **Description:** add an implementation of the __len__() method for the
Chroma vectostore, for convenience.
- **Issue:** no exposed method to know the size of a Chroma vectorstore
- **Dependencies:** None
- **Twitter handle:** lowrank_adrian
- [x] **Add tests and docs**
- [x] **Lint and test**
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Be more explicit with the `model_kwargs` and
`encode_kwargs` for `HuggingFaceEmbeddings`.
- **Issue:** -
- **Dependencies:** -
I received some reports by my users that they didn't realise that you
could change the default `batch_size` with `HuggingFaceEmbeddings`,
which may be attributed to how the `model_kwargs` and `encode_kwargs`
don't give much information about what you can specify.
I've added some parameter names & links to the Sentence Transformers
documentation to help clear it up. Let me know if you'd rather have
Markdown/Sphinx-style hyperlinks rather than a "bare URL".
- Tom Aarsen
So this arose from the
https://github.com/langchain-ai/langchain/pull/18397 problem of document
loaders not supporting `pathlib.Path`.
This pull request provides more uniform support for Path as an argument.
The core ideas for this upgrade:
- if there is a local file path used as an argument, it should be
supported as `pathlib.Path`
- if there are some external calls that may or may not support Pathlib,
the argument is immidiately converted to `str`
- if there `self.file_path` is used in a way that it allows for it to
stay pathlib without conversion, is is only converted for the metadata.
Twitter handle: https://twitter.com/mwmajewsk
### Issue
Recently, the new `allow_dangerous_deserialization` flag was introduced
for preventing unsafe model deserialization that relies on pickle
without user's notice (#18696). Since then some LLMs like Databricks
requires passing in this flag with true to instantiate the model.
However, this breaks existing functionality to loading such LLMs within
a chain using `load_chain` method, because the underlying loader
function
[load_llm_from_config](f96dd57501/libs/langchain/langchain/chains/loading.py (L40))
(and load_llm) ignores keyword arguments passed in.
### Solution
This PR fixes this issue by propagating the
`allow_dangerous_deserialization` argument to the class loader iff the
LLM class has that field.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Create a Class which allows to use the "text2vec" open source embedding
model.
It should install the model by running 'pip install -U text2vec'.
Example to call the model through LangChain:
from langchain_community.embeddings.text2vec import Text2vecEmbeddings
embedding = Text2vecEmbeddings()
bookend.embed_documents([
"This is a CoSENT(Cosine Sentence) model.",
"It maps sentences to a 768 dimensional dense vector space.",
])
bookend.embed_query(
"It can be used for text matching or semantic search."
)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
## Description
This PR proposes a modification to the `libs/langchain/dev.Dockerfile`
configuration to copy the `libs/langchain/poetry.lock` into the working
directory. The change aims to address the issue where the Poetry install
command, the last command in the `dev.Dockerfile`, takes excessively
long hours, and to ensure the reproducibility of the poetry environment
in the devcontainer.
## Problem
The `dev.Dockerfile`, prepared for development environments such as
`.devcontainer`, encounters an unending dependency resolution when
attempting the Poetry installation.
### Steps to Reproduce
Execute the following build command:
```bash
docker build -f libs/langchain/dev.Dockerfile .
```
### Current Behavior
The Docker build process gets stuck at the following step, which, in my
experience, did not conclude even after an entire night:
```
=> [langchain-dev-dependencies 4/6] COPY libs/community/ ../community/ 0.9s
=> [langchain-dev-dependencies 5/6] COPY libs/text-splitters/ ../text-splitters/ 0.0s
=> [langchain-dev-dependencies 6/6] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 12.3s
=> => # Updating dependencies
=> => # Resolving dependencies...
```
### Expected Behavior
The Docker build completes in a realistic timeframe. By applying this
PR, the build finishes within a few minutes.
### Analysis
The complexity of LangChain's dependencies has reached a point where
Poetry is required to resolve dependencies akin to threading a needle.
Consequently, poetry install fails to complete in a practical timeframe.
## Solution
The solution for dependency resolution is already recorded in
`libs/langchain/poetry.lock`, so we can use it. When copying
`project.toml` and `poetry.toml`, the `poetry.lock` located in the same
directory should also be copied.
```diff
# Copy only the dependency files for installation
-COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml ./
+COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml libs/langchain/poetry.lock ./
```
## Note
I am not intimately familiar with the historical context of the
`dev.Dockerfile` and thus do not know why `poetry.lock` has not been
copied until now. It might have been an oversight, or perhaps dependency
resolution used to complete quickly even without the `poetry.lock` file
in the past. However, if there are deliberate reasons why copying
`poetry.lock` is not advisable, please just close this PR.
Description:
this change fixes the pydantic validation error when looking up from
GPTCache, the `ChatOpenAI` class returns `ChatGeneration` as response
which is not handled.
use the existing `_loads_generations` and `_dumps_generations` functions
to handle it
Trace
```
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/development/scripts/chatbot-postgres-test.py", line 90, in <module>
print(llm.invoke("tell me a joke"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 166, in invoke
self.generate_prompt(
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 544, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 408, in generate
raise e
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 398, in generate
self._generate_with_cache(
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 585, in _generate_with_cache
cache_val = llm_cache.lookup(prompt, llm_string)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 807, in lookup
return [
^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_community/cache.py", line 808, in <listcomp>
Generation(**generation_dict) for generation_dict in json.loads(res)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__
super().__init__(**kwargs)
File "/home/theinhumaneme/Documents/NebuLogic/conversation-bot/venv/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Generation
type
unexpected value; permitted: 'Generation' (type=value_error.const; given=ChatGeneration; permitted=('Generation',))
```
Although I don't seem to find any issues here, here's an
[issue](https://github.com/zilliztech/GPTCache/issues/585) raised in
GPTCache. Please let me know if I need to do anything else
Thank you
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Few-Shot prompt template may use a `SemanticSimilarityExampleSelector`
that in turn uses a `VectorStore` that does I/O operations.
So to work correctly on the event loop, we need:
* async methods for the `VectorStore` (OK)
* async methods for the `SemanticSimilarityExampleSelector` (this PR)
* async methods for `BasePromptTemplate` and `BaseChatPromptTemplate`
(future work)
This is a small breaking change but I think it should be done as:
* No external dependency needs to be installed anymore for the default
to work
* It is vendor-neutral
This patch updates function "run" to "invoke" in fake_llm.ipynb. Without
this patch, you see following warning.
LangChainDeprecationWarning: The function `run` was deprecated in
LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Fixing some issues for AzureCosmosDBSemanticCache
- Added the entry for "AzureCosmosDBSemanticCache" which was missing in
langchain/cache.py
- Added application name when creating the MongoClient for the
AzureCosmosDBVectorSearch, for tracking purposes.
@baskaryan, can you please review this PR, we need this to go in asap.
These are just small fixes which we found today in our testing.
- **Description:** The `semantic_hybrid_search_with_score_and_rerank`
method of `AzureSearch` contains a hardcoded field name "metadata" for
the document metadata in the Azure AI Search Index. Adding such a field
is optional when creating an Azure AI Search Index, as other snippets
from `AzureSearch` test for the existence of this field before trying to
access it. Furthermore, the metadata field name shouldn't be hardcoded
as "metadata" and use the `FIELDS_METADATA` variable that defines this
field name instead. In the current implementation, any index without a
metadata field named "metadata" will yield an error if a semantic answer
is returned by the search in
`semantic_hybrid_search_with_score_and_rerank`.
- **Issue:** https://github.com/langchain-ai/langchain/issues/18731
- **Prior fix to this bug:** This bug was fixed in this PR
https://github.com/langchain-ai/langchain/pull/15642 by adding a check
for the existence of the metadata field named `FIELDS_METADATA` and
retrieving a value for the key called "key" in that metadata if it
exists. If the field named `FIELDS_METADATA` was not present, an empty
string was returned. This fix was removed in this PR
https://github.com/langchain-ai/langchain/pull/15659 (see
ed1ffca911#).
@lz-chen: could you confirm this wasn't intentional?
- **New fix to this bug:** I believe there was an oversight in the logic
of the fix from
[#1564](https://github.com/langchain-ai/langchain/pull/15642) which I
explain below.
The `semantic_hybrid_search_with_score_and_rerank` method creates a
dictionary `semantic_answers_dict` with semantic answers returned by the
search as follows.
5c2f7e6b2b/libs/community/langchain_community/vectorstores/azuresearch.py (L574-L581)
The keys in this dictionary are the unique document ids in the index, if
I understand the [documentation of semantic
answers](https://learn.microsoft.com/en-us/azure/search/semantic-answers)
in Azure AI Search correctly. When the method transforms a search result
into a `Document` object, an "answer" key is added to the document's
metadata. The value for this "answer" key should be the semantic answer
returned by the search from this document, if such an answer is
returned. The match between a `Document` object and the semantic answers
returned by the search should be done through the unique document id,
which is used as a key for the `semantic_answers_dict` dictionary. This
id is defined in the search result's field named `FIELDS_ID`. I added a
check to avoid any error in case no field named `FIELDS_ID` exists in a
search result (which shouldn't happen in theory).
A benefit of this approach is that this fix should work whether or not
the Azure AI Search Index contains a metadata field.
@levalencia could you confirm my analysis and test the fix?
@raunakshrivastava7 do you agree with the fix?
Thanks for the help!
### Prem SDK integration in LangChain
This PR adds the integration with [PremAI's](https://www.premai.io/)
prem-sdk with langchain. User can now access to deployed models
(llms/embeddings) and use it with langchain's ecosystem. This PR adds
the following:
### This PR adds the following:
- [x] Add chat support
- [X] Adding embedding support
- [X] writing integration tests
- [X] writing tests for chat
- [X] writing tests for embedding
- [X] writing unit tests
- [X] writing tests for chat
- [X] writing tests for embedding
- [X] Adding documentation
- [X] writing documentation for chat
- [X] writing documentation for embedding
- [X] run `make test`
- [X] run `make lint`, `make lint_diff`
- [X] Final checks (spell check, lint, format and overall testing)
---------
Co-authored-by: Anindyadeep Sannigrahi <anindyadeepsannigrahi@Anindyadeeps-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** PgVector class always runs "create extension" on init
and this statement crashes on ReadOnly databases (read only replicas).
but wierdly the next create collection etc work even in readOnly
databases
- **Dependencies:** no new dependencies
- **Twitter handle:** @VenOmaX666
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
When run command langchain app new my-app, i get this error:
File
"/home/mauricio/.local/lib/python3.8/site-packages/langchain_cli/utils/pyproject.py",
line 15, in <module>
pyproject_toml: Path, local_editable_dependencies: Iterable[tuple[str,
Path]]
TypeError: 'type' object is not subscriptable
This PR fix the error.
The existing default list of separators for the `RecursiveTextSplitter`
assumes spaces are word boundaries. Some languages [don't use spaces
between
words](https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries)
(Chinese, Japanese, Thai, Burmese).
This PR extends the documentation to explain how to cater for those
languages by adding additional punctuation to the separators and
zero-width spaces which are used by some typesetters and will assist the
splitter to not split in words.
Ideally, **these separators could be a constant in the module** but for
now, defining them in the documentation is a start.
**Description:**
- minor PR to speed up onboarding by not trying to add a dataset, if a
model is already present.
- replace batch publish API with streaming when single events are
published.
**Dependencies:** any dependencies required for this change
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
This PR aims to enhance the documentation for TiDB integration, driven
by feedback from our users. It provides detailed introductions to key
features, ensuring developers can fully leverage TiDB for AI application
development.
**Description:**
Expanding version in all the Confluence API calls so to get when the
page was last modified/created in all cases.
**Issue:** #12812
**Twitter handle:** zzste
This PR adds code to make sure that the correct base URL is being
created for the Azure Cognitive Search retriever. At the moment an
incorrect base URL is being generated. I think this is happening because
the original code was based on a depreciated API version. No
dependencies need to be added. I've also added more context to the test
doc strings.
I should also note that ACS is now Azure AI Search. I will open a
separate PR to make these changes as that would be a breaking change and
should potentially be discussed.
Twitter: @marlene_zw
- No new tests added, however the current ACS retriever tests are now
passing when I run them.
- Code was linted.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** This commit introduces support for the newly
available GPU index types introduced in Milvus 2.4 within the LangChain
project's `milvus.py`. With the release of Milvus 2.4, a range of
GPU-accelerated index types have been added, offering enhanced search
capabilities and performance optimizations for vector search operations.
This update ensures LangChain users can fully utilize the new
performance benefits for vector search operations.
- Reference: https://milvus.io/docs/gpu_index.md
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Corrected a broken link within the semantic-chunker.ipynb notebook,
ensuring that users can access the referenced resource.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This patch fixes the #18022 issue, converting the SimSIMD internal
zero-copy outputs to NumPy.
I've also noticed, that oftentimes `dtype=np.float32` conversion is used
before passing to SimSIMD. Which numeric types do LangChain users
generally care about? We support `float64`, `float32`, `float16`, and
`int8` for cosine distances and `float16` seems reasonable for
practically any kind of embeddings and any modern piece of hardware, so
we can change that part as well 🤗
- **Description:** Added support for lower-case and mixed-case names
The names for tables and columns previouly had to be UPPER_CASE.
With this enhancement, also lower_case and MixedCase are supported,
- **Issue:** N/A
- **Dependencies:** no new dependecies added
- **Twitter handle:** @sapopensource
- **Description:** Since the implicit `__call__` has been deprecated in
favor of `invoke`, the local_llms article also needed to be updated.
This article was my introduction to Lanchain, and as it was helpful in
getting me setup with running LLMs locally, it is nice to not have any
warnings when running the example code. With this change, the warnings
go away when running the example code.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** clarkerican
Previous PR passed _parser attribute which apparently is not meant to be
used by user code and causes non deterministic failures on CI when
testing the transform and a transform methods. Reverting this change
temporarily.
This mitigates a security concern for users still using older versions of libexpat that causes an attacker to compromise the availability of the system if an attacker manages to surface malicious payload to this XMLParser.
**Description:** This change passes through `batch_size` to
`add_documents()`/`aadd_documents()` on calls to `index()` and
`aindex()` such that the documents are processed in the expected batch
size.
**Issue:** #19415
**Dependencies:** N/A
**Twitter handle:** N/A
Updated `HuggingFacePipeline` docs to be in sync with list of supported
tasks, including translation.
- [x] **PR title**: "community: Update docs for `HuggingFacePipeline`"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** Update docs for `HuggingFacePipeline`, was earlier
missing `translation` as a valid task
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** None
- [x] **Add tests and docs**:
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
**Description:**
This PR adds [Dappier](https://dappier.com/) for the chat model. It
supports generate, async generate, and batch functionalities. We added
unit and integration tests as well as a notebook with more details about
our chat model.
**Dependencies:**
No extra dependencies are needed.
- **Description:** [CVE
2024-21503](https://www.cve.org/CVERecord?id=CVE-2024-21503) was
recently identified. The python linter "black" suffers from a potential
Regex-related denial of service attack. Updated version from the
vulnerable 24.2.0 to the patched 24.3.0.
- **Issue:** N/A
- **Dependencies:** The 'black' package in both `langchain` (top-level)
and `templates/python-lint`.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
DuckDB has a cosine similarity function along list and array data types,
which can be used as a vector store.
- **Description:** The latest version of DuckDB features a cosine
similarity function, which can be used with its support for list or
array column types. This PR surfaces this functionality to langchain.
- **Dependencies:** duckdb 0.10.0
- **Twitter handle:** @igocrite
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Update s3_file.py to use arguments **mode** and
**post_processors** from the base class **UnstructuredBaseLoader** to
include more metadata about the files from the S3 bucket such as
*'page_number', 'languages'* etc.
**Issue:** NA
**Dependencies:** None
**Twitter handle:** preak95
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Looking at tokens / page of our docs, we see a few outliers:
<img width="761" alt="image"
src="https://github.com/langchain-ai/langchain/assets/122662504/677aa2d6-0a29-45e4-882a-db2bbf46d02b">
It is due to non-rendering images in one case, and output spamming.
Clean these, along with other cases of excessing output spamming in
docs.
All get sucked into chat-langchain for retrieval.
Thank you for contributing to LangChain!
bilibili-api-python use https://github.com/Nemo2011/bilibili-api repo.
Change to the correct address.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description:** Update module imports for Fireworks documentation
**Issue:** Module imports not present or in incorrect location
**Dependencies:** None
**Description:** Update import paths and move to lcel for llama.cpp
examples
**Issue:** Update import paths to reflect package refactoring and move
chains to LCEL in examples
**Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Invoke callback prior to yielding token for BaseOpenAI
& OpenAIChat
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
**Description:** Invoke callback prior to yielding token for Fireworks
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
**Description:** Moving FireworksEmbeddings documentation to the
location docs/integration/text_embedding/ from langchain_fireworks/docs/
**Issue:** FireworksEmbeddings documentation was not in the correct
location
**Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I have a small dataset, and I tried to use docarray:
``DocArrayHnswSearch ``. But when I execute, it returns:
```bash
raise ImportError(
ImportError: Could not import docarray python package. Please install it with `pip install "langchain[docarray]"`.
```
Instead of docarray it needs to be
```bash
docarray[hnswlib]
```
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
RecursiveUrlLoader does not currently provide an option to set
`base_url` other than the `url`, though it uses a function with such an
option.
For example, this causes it unable to parse the
`https://python.langchain.com/docs`, as it returns the 404 page, and
`https://python.langchain.com/docs/get_started/introduction` has no
child routes to parse.
`base_url` allows setting the `https://python.langchain.com/docs` to
filter by, while the starting URL is anything inside, that contains
relevant links to continue crawling.
I understand that for this case, the docusaurus loader could be used,
but it's a common issue with many websites.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Delete MistralAIEmbeddings usage document from folder
partners/mistralai/docs
**Issue:** The document is present in the folder docs/docs
**Dependencies:** None
**Description:** Invoke callback prior to yielding token for llama.cpp
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)
**Dependencies:** None
```python
from langchain.agents import tool
from langchain_mistralai import ChatMistralAI
llm = ChatMistralAI(model="mistral-large-latest", temperature=0)
@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)
tools = [get_word_length]
llm_with_tools = llm.bind_tools(tools)
llm_with_tools.invoke("how long is the word chrysanthemum")
```
currently raises
```
AttributeError: 'dict' object has no attribute 'model_dump'
```
Same with `.with_structured_output`
```python
from langchain_mistralai import ChatMistralAI
from langchain_core.pydantic_v1 import BaseModel
class AnswerWithJustification(BaseModel):
"""An answer to the user question along with justification for the answer."""
answer: str
justification: str
llm = ChatMistralAI(model="mistral-large-latest", temperature=0)
structured_llm = llm.with_structured_output(AnswerWithJustification)
structured_llm.invoke("What weighs more a pound of bricks or a pound of feathers")
```
This appears to fix.
For prompt templates with only 1 variable (common in e.g.,
MessageGraph), it's convenient to wrap the incoming object in the
variable before formatting.
The downside of this, of course, would be that some number of
invocations will successfully format when the user may have intended to
format it properly before
This is a basic VectorStore implementation using an in-memory dict to
store the documents.
It doesn't need any extra/optional dependency as it uses numpy which is
already a dependency of langchain.
This is useful for quick testing, demos, examples.
Also it allows to write vendor-neutral tutorials, guides, etc...
Classes and functions defined in __init__.py are not parsed into the API
Reference.
For example:
- libs/core/langchain_core/messages/__init__.py : AnyMessage,
MessageLikeRepresentation, get_buffer_string(), messages_from_dict(),
...
Opinionated: __init__.py is not a typical place to define artifacts.
Moved artifacts from __init__ into utils.py.
Added `MessageLikeRepresentation` to __all__ since it is used outside of
`messages`, for example, in
`libs/core/langchain_core/language_models/base.py`
Added `_message_from_dict` to __all__ since it is used outside of
`messages`(???) I would add `message_from_dict` (without underscore) as
an alias. Please, advise.
Covered by tests in
`libs/core/tests/unit_tests/language_models/chat_models/test_base.py`,
`libs/core/tests/unit_tests/language_models/llms/test_base.py` and
`libs/core/tests/unit_tests/runnables/test_runnable_events.py`
**Description:**
Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.
I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.
The motivation is to minimize compute/data loss when problems occur:
* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.
**Issue:**
Implements enhancement #18026.
**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).
---------
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
## Description
Semantic Cache can retrieve noisy information if the score threshold for
the value is too low. Adding the ability to set a `score_threshold` on
cache construction can allow for less noisy scores to appear.
- [x] **Add tests and docs**
1. Added tests that confirm the `score_threshold` query is valid.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
The `retryFailed` option will retry all failed links, once at a time
with the goal of not triggering bot protection
`microsoft.com` is now hard coded into the whitelist
Classes and functions defined in __init__.py are not parsed into the API
Reference.
For example: libs/core/langchain_core/globals/__init__.py :
`set_verbose` `get_llm_cache`, `set_llm_cache`, ...
And the whole `langchain_core.globals` namespace is not visible in the
API Reference. The refactoring is just file renaming.
- **Description:** Enhanced the `BaseChatModel` to support an
`Optional[Union[bool, BaseCache]]` type for the `cache` attribute,
allowing for both boolean flags and custom cache implementations.
Implemented logic within chat model methods to utilize the provided
custom cache implementation effectively. This change aims to provide
more flexibility in caching strategies for chat models.
- **Issue:** Implements enhancement request #17242.
- **Dependencies:** No additional dependencies required for this change.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- **PR message**:
- **Description:** Update the slack toolkit doc to use an agent that
support multiple inputs. Using ReAct agent will cause a ValidationError
when invoking the slack tools. This is because the agent return a string
like `'{"channel": "C05LDF54S21", "message": "Hello, world!"}'` but the
ReAct agent does not support multiple inputs.
- **Issue:** This is related to this
[Discussion#18083](https://github.com/langchain-ai/langchain/discussions/18083)
- **Dependencies:** No dependencies required
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Changing OpenAIAssistantRunnable.create_assistant to send the `file_ids`
parameter to openai.beta.assistants.create
Co-authored-by: Frederico Wu <fred.diaswu@coxautoinc.com>
**Description**:
this PR enable VectorStore autoconfiguration for Infinispan: if
metadatas are only of basic types, protobuf
config will be automatically generated for the user.
When creating a new index, if we use a retrieval strategy that expects a
model to be deployed in Elasticsearch, check if a model with this name
is indeed deployed before creating an index. This lowers the probability
to get into a state in which an index was created with a faulty model
ID, which cannot be overwritten any more (the index has to manually be
deleted).
Add `keep_alive` parameter to control how long the model will stay
loaded into memory with Ollama。
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**
This PR adds some missing details from the "Split by tokens" page in the
documentation. Specifically:
- The `.from_tiktoken_encoder()` class methods for both the
`CharacterTextSplitter` and `RecursiveCharacterTextSplitter` default to
the old `gpt-2` encoding. I've added a comment to suggest specifying
`model_name` or `encoding`
- The docs didn't mention that the `from_tiktoken_encoder()` class
method passes additional kwargs down to the constructor of the splitter.
I only discovered this by reading the source code
- Added an example of using the `.from_tiktoken_encoder()` class method
with `RecursiveCharacterTextSplitter` which is the recommended approach
for most scenarios above `CharacterTextSplitter`
- Added a warning that `TokenTextSplitter` can split characters which
have multiple tokens (e.g. 猫 has 3 cl100k_base tokens) between multiple
chunks which creates malformed Unicode strings and should not be used in
these situations.
Side note: I think the default argument of `gpt2` for
`.from_tiktoken_encoder()` should be updated?
**Twitter handle** anthonypjshaw
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Issue : For functions which have an argument with the name 'title', the
convert_pydantic_to_openai_function generates an incorrect output and
omits the argument all together. This is because the _rm_titles function
removes all instances of the the key 'title' from the output.
Description : Updates the _rm_titles function to check the presence of
the 'type' key as well before removing the 'title' key. As the title key
that we wish to omit always has a type key along with it.
Potential gap if there is a function defined which has both title and
key as argument names, in which case this would fail. Maybe we could set
a filter on the function argument names and reject those with keyword
argument names.
No dependencies. Passed all tests.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** There was no formatter for mistral models for Azure
ML endpoints. Adding that, plus a configurable timeout (it was hard
coded before)
- **Dependencies:** none
- **Twitter handle:** @tjaffri @docugami
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: "Updating format of pip install in two files of
docs/cookbook"
- pip install is not reflecting properly in some of the files in
cookbook
- Example:
[docs/expression_language/cookbook/sql_db](https://python.langchain.com/docs/expression_language/cookbook/sql_db)
- [x] **PR message**: Updating format of pip install in two files of
docs/cookbook
- **Description:** a description of the change
- **Issue:** #19197
- Note - let's do squash merge for the PR
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
add **kwargs in add_documents for upsert, to make it use for other
argument also.
Lets use this, it was unused as of now.
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
## Description
This PR modifies the settings in `libs/langchain/dev.Dockerfile` to
ensure that the `text-splitters` directory is copied before the poetry
installation process begins.
Without this modification, the `docker build` command fails for
`dev.Dockerfile`, preventing the setup of some development environments,
including `.devcontainer`.
## Bug Details
### Repro
Run the following command:
```bash
docker build -f libs/langchain/dev.Dockerfile .
```
### Current Behavior
The docker build command fails, raising the following error:
```
...
=> [langchain-dev-dependencies 4/5] COPY libs/community/ ../community/ 0.4s
=> ERROR [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 1.1s
------
> [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs:
#13 0.970
#13 0.970 Directory ../text-splitters does not exist
------
executor failed running [/bin/sh -c poetry install --no-interaction --no-ansi --with dev,test,docs]: exit code: 1
```
### Expected Behavior
The `docker build` command successfully completes without the poetry
error.
### Analysis
The error occurs because the `text-splitters` directory is not copied
into the build environment, unlike the other packages under the `libs`
directory. I suspect that the `COPY` setting was overlooked since
`text-splitters` was separated in a recent PR.
## Fix
Add the following lines to the `libs/langchain/dev.Dockerfile`:
```dockerfile
# Copy the text-splitters library for installation
COPY libs/text-splitters/ ../text-splitters/
```
- **Description:** Tests fail to do value lookup because it does not
specify the index name
- **Issue:** the issue # Failing integration test
- [x] **Add tests and docs**: Tests now pass
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
The root run id (~trace id's) is useful for assigning feedback, but the
current recommended approach is to use callbacks to retrieve it, which
has some drawbacks:
1. Doesn't work for streaming until after the first event
2. Doesn't let you call other endpoints with the same trace ID in
parallel (since you have to wait until the call is completed/started to
use
This PR lets you provide = "run_id" in the runnable config.
Couple considerations:
1. For batch calls, we split the trace up into separate trees (to permit
better rendering). We keep the provided run ID for the first one and
generate a unique one for other elements of the batch.
2. For nested calls, the provided ID is ONLY used on the top root/trace.
### Example Usage
```
chain.invoke("foo", {"run_id": uuid.uuid4()})
```
Classes are missed in __all__ and in different places of __init__.py
- BaichuanLLM
- ChatDatabricks
- ChatMlflow
- Llamafile
- Mlflow
- Together
Added classes to __all__. I also sorted __all__ list.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "community: deprecate DocugamiLoader"
- [x] **PR message**: Deprecate the langchain_community and use the
docugami_langchain DocugamiLoader
---------
Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
## Description
* In memory cache easily gets out of sync with the server cache, so we
will remove it entirely to reduce the issues around invalidated caches.
## Dependencies
None
- [x] If you're adding a new integration, please include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Erick Friis <erick@langchain.dev>
## Description
Returning the embedding is not necessary in the vector search
functionality unless specified as a debugging step. This change defaults
the behavior such that the server _only_ returns the embedding key if
explicitly requested, such as in the case of
`max_marginal_relevance_search`.
- [x] **Add tests and docs**: If you're adding a new integration, please
include
* Added `test_from_documents_no_embedding_return`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- Description:
- Updated the import path for `StreamingStdOutCallbackHandler` in the
streaming response example within `huggingface_endpoint.py`. This change
corrects the import statement to reflect the actual location of
`StreamingStdOutCallbackHandler` in
`langchain_core.callbacks.streaming_stdout`.
- Issue:
- None
- Dependencies:
- No additional dependencies are required for this change.
- Twitter handle:
- None
## Note:
I have tested this change locally and confirmed that the
`StreamingStdOutCallbackHandler` works as expected with the updated
import path. This PR does not require the addition of new tests since it
is a correction to documentation/examples rather than functional code.
- [x] **Support for translation**: "community: Add support for
translation in `HuggingFacePipeline`"
- [x] **Add support for translation in `HuggingFacePipeline`**:
- **Description:** Add support for translation in `HuggingFacePipeline`,
which earlier used to support only text summarization and generation.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** None
- Description:
- This pull request is to fix a bug where page numbers were not set
correctly. In the current code, all chunks share the same metadata
object doc_metadata, so the page number is set with the same value for
all documents. To fix this, I changed to using separate metadata objects
for each chunk.
- Issue:
- None
- Dependencies:
- No additional dependencies are required for this change.
- Twitter handle:
- @eycjur
- Test
- Even if it's not a bug, there are cases where everything ends up with
the same number of pages, so it's very difficult for me to write
integration tests.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
I think that cell type for pip command may be 'code'.
Please check, thank you :)
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Line `from langchain_openai import ChatOpenAI` is put twice in Get
Started / Serving with LangServe section.
Imports on lines 559 and 566 are identical
Co-authored-by: Vitalii <vitalii@localhost>
**Description:**
#18040 forces `fastembed>2.0`, and this causes dependency conflicts with
the new `unstructured` package (different `onnxruntime`). There may be
other dependency conflicts.. The only way to use
`langchain-community>=0.0.28` is rollback to `unstructured 0.10.X`. But
new `unstructured` contains many fixes.
This PR allows to use both `fastembed` `v1` and `v2`.
How to reproduce:
`pyproject.toml`:
```toml
[tool.poetry]
name = "depstest"
version = "0.0.0"
description = "test"
authors = ["<dev@example.org>"]
[tool.poetry.dependencies]
python = ">=3.10,<3.12"
langchain-community = "^0.0.28"
fastembed = "^0.2.0"
unstructured = {extras = ["pdf"], version = "^0.12"}
```
```bash
$ poetry lock
```
Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
- **Description:** This modification addresses the issue of mutable
default parameters in functions. In the original code, the `chunks`
parameter is defaulted to a list containing an empty dictionary, which
is mutable. Since default parameters in Python are evaluated only once
at function definition time, modifications to the parameter would
persist across future calls. By changing the default to `None` and
checking/initializing within the function, a new list is created for
each call, thus avoiding potential issues.
---------
Co-authored-by: sixiang <sixiang@lixiang.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Update stales link in Together AI documentation
**Issue:** Some links pointed to legacy webpages on the Together AI
website
**Dependencies:** None
**Lint and test**: `make format`, `make lint` were run
**Description:** Update docstring of Together class to show example and
update API URL
**Issue:** Improves usability
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` were run
- [ ] **PR title**: "docs: correction in
"https://github.com/langchain-ai/langchain/blob/master/docs/docs/get_started/quickstart.mdx",
line 289".
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**:
- Corrected the spelling mistake
- #18981
Fixed Grammar in Considerations of Model I/O Concepts documentation page
- Update concepts.mdx
Page Link:
https://python.langchain.com/docs/modules/model_io/concepts#considerations
- **Description:** Fixed Grammar in Considerations of Model I/O
Documentation Page
- **Issue:** "to work well with the model are you using" # "to work well
with the model you are using"
- **Dependencies:** None
- **Twitter handle:** @Anubhav_Madhav
(https://twitter.com/Anubhav_Madhav)
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Description
This PR addresses a documentation issue in the
[Indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
page. Specifically, it corrects the execution results of the Jupyter
notebook under the
[Source](https://python.langchain.com/docs/modules/data_connection/indexing#source)
section, which were broken as detailed below.
## Problem
The execution results following the statement, `This should delete the
old versions of documents associated with doggy.txt source and replace
them with the new versions.`, appear to be incorrect, as described
below.
### Current Behavior
- For some reason, the `index` function fails to add the new content of
`doggy.txt`. Although it deletes the document objects associated with
the `doggy.txt` source, it does not add the objects in
`changed_doggy_docs`. Consequently, the execution result displays
`num_added: 0`.
- This unexpected behavior also impacts the results of
`vectorstore.similarity_search("dog", k=30)`, showing only the contents
of `kitty.txt`. It appears as though the contents of `doggy.txt` have
been completely removed from the index:
```
Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```
### Expected Behavior
- The `index` function should successfully add the objects in
`changed_doggy_docs` after removing the old content of `doggy.txt`. The
anticipated execution result is `num_added: 2`.
- Subsequently, the modified content of `doggy.txt` should appear in the
results of `vectorstore.similarity_search("dog", k=30)` as follows:
```
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```
## Fix
I reran `docs/docs/modules/data_connection/indexing.ipynb` and have
included the diff in this PR.
- Description: [a description of the change] Add in code documentation
to core Runnable with_fallbacks method (docs only)
- Issue: the issue #18804
@eyurtsev PTAL
Docs fix: replace column name search with source.
The Xata integration expects metadata column named "source".
The docs suggest the name "search", which if used, yields the following
error:
```
File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/xata.py", line 95, in _add_vectors
raise Exception(f"Error adding vectors to Xata: {r.status_code} {r}")
Exception: Error adding vectors to Xata: 400 {'errors': [{'status': 400, 'message': 'invalid record: column [source]: column not found'}]}
```
**Description:** Many LLM steps complete in sub-second duration, which
can lead to non-collection of duration field for Fiddler. This PR
updates duration from seconds to milliseconds.
**Issue:** [INTERNAL] FDL-17568
**Dependencies:** NA
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
- **Description:** Handling fallbacks when calling async streaming for a
LLM that doesn't support it.
- **Issue:** #18920
- **Twitter handle:**@maximeperrin_
---------
Co-authored-by: Maxime Perrin <mperrin@doing.fr>
**Description:** This PR adds updates the fiddler events schema to also
pass user feedback, and llm status to fiddler
**Tickets:** [INTERNAL] FDL-17559
**Dependencies:** NA
**Twitter handle:** behalder
Co-authored-by: Barun Halder <barun@fiddler.ai>
- **Description:** This modification adds pydantic input definition for
sql_database tools. This helps for function calling capability in
LangGraph. Since actions nodes will usually check for the args_schema
attribute on tools, This update should make these tools compatible with
it (only implemented on the InfoSQLDatabaseTool)
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** juanfe8881
poetry can't reliably handle resolving the number of optional "extended
test" dependencies we have. If we instead just rely on pip to install
extended test deps in CI, this isn't an issue.
- **Description:** add async tests, add tokenize support
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),
- **Tag maintainer:**
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally -> ✅
Please make sure integration_tests passing locally -> ✅
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR makes the following updates in the pgvector database:
1. Use JSONB field for metadata instead of JSON
2. Update operator syntax to include required `$` prefix before the
operators (otherwise there will be name collisions with fields)
3. The change is non-breaking, old functionality is still the default,
but it will emit a deprecation warning
4. Previous functionality has bugs associated with comparisons due to
casting to text (so lexical ordering is used incorrectly for numeric
fields)
5. Adds an a GIN index on the JSONB field for more efficient querying
**PR message**: ***Delete this entire checklist*** and replace with
- **Description:** [a description of the change](docs: Add in code
documentation to core Runnable assign method)
- **Issue:** the issue #18804
Fixed typo in line 661 - from 'mimimize' to 'minimize
- [ ] **PR message**:
- **Description:** Fixed typo in streaming document - change 'mimimize'
to 'minimize
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** When calling the `_stream_log_implementation` from
the `astream_log` method in the `Runnable` class, it is not handing over
the `kwargs` argument. Therefore, even if i want to customize APIHandler
and implement additional features with additional arguments, it is not
possible. Conversely, the `astream_events` method normally handing over
the `kwargs` argument.
- **Issue:** https://github.com/langchain-ai/langchain/issues/19054
- **Dependencies:**
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Co-authored-by: hyungwookyang <hyungwookyang@worksmobile.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** This change fixes a bug where attempts to load data
from Notion using the NotionDBLoader resulted in a 400 Bad Request
error. The issue was traced to the unconditional addition of an empty
'filter' object in the request payload, which Notion's API does not
accept. The modification ensures that the 'filter' object is only
included in the payload when it is explicitly provided and not empty,
thus preventing the 400 error from occurring.
- **Issue:** Fixes
[#18009](https://github.com/langchain-ai/langchain/issues/18009)
- **Dependencies:** None
- **Twitter handle:** @gunnzolder
Co-authored-by: Anton Parkhomenko <anton@merge.rocks>
**Description:**
Updates to LangChain-MongoDB documentation: updates to the Atlas vector
search index definition
**Issue:**
NA
**Dependencies:**
NA
**Twitter handle:**
iprakul
This PR adds `batch as completed` method to the standard Runnable
interface. It takes in a list of inputs and yields the corresponding
outputs as the inputs are completed.
Add documentation notebook for `ElasticsearchRetriever`.
## Dependencies
- [ ] Release new `langchain-elasticsearch` version 0.2.0 that includes
`ElasticsearchRetriever`
**Description:** Circular dependencies when parsing references leading
to `RecursionError: maximum recursion depth exceeded` issue. This PR
address the issue by handling previously seen refs as in any typical DFS
to avoid infinite depths.
**Issue:** https://github.com/langchain-ai/langchain/issues/12163
**Twitter handle:** https://twitter.com/theBhulawat
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Refactor code of FAISS vectorcstore and update the
related documentation.
Details:
- replace `.format()` with f-strings for strings formatting;
- refactor definition of a filtering function to make code more readable
and more flexible;
- slightly improve efficiency of
`max_marginal_relevance_search_with_score_by_vector` method by removing
unnecessary looping over the same elements;
- slightly improve efficiency of `delete` method by using set data
structure for checking if the element was already deleted;
**Issue:** fix small inconsistency in the documentation (the old example
was incorrect and unappliable to faiss vectorstore)
**Dependencies:** basic langchain-community dependencies and `faiss`
(for CPU or for GPU)
**Twitter handle:** antonenkodev
Issue : _call method of LLMRouterChain uses predict_and_parse, which is
slated for deprecation.
Description : Instead of using predict_and_parse, this replaces it with
individual predict and parse functions.
Added deps:
- `@supabase/supabase-js` - for sending inserts
- `supabase` - dev dep, for generating types via cli
- `dotenv` for loading env vars
Added script:
- `yarn gen` - will auto generate the database schema types using the
supabase CLI. Not necessary for development, but is useful. Requires
authing with the supabase CLI (will error out w/ instructions if you're
not authed).
Added functionality:
- pulls users IP address (using a free endpoint: `https://api.ipify.org`
so we can filter out abuse down the line)
TODO:
- [x] add env vars to vercel
**Description:** Update the docstring of OpenAI, OpenAIEmbeddings and
ChatOpenAI classes
**Issue:** Update import module paths to the current LangChain API
**Dependencies:** None
**Lint and test**: `make format` and `make lint` were run
This incorporates the review comments from langchain-ai/langchain#18637
which I closed due to an issue I had in updating that pr branch
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
community: fix - change sparkllm spark_app_url to spark_api_url
- **Description:**
- Change the variable name from `sparkllm spark_app_url` to
`spark_api_url` in the community package.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
Variable name was `openai_poem` but it didn't pass in the `"prompt":
"poem"` config, so the examples were showing a joke being returned from
a variable called `*_poem`.
We could have gone one of two ways:
1. Updating the config line and the output line, or
2. Updating the variable name
The latter seemed simpler, so that's what I went with. But I'd be glad
to re-do this PR if you prefer the former.
Thanks for everything, y'all. You rock 🤘
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** `conroywhitney`
**Description:** Update AnthropicLLM deprecation message import path for
ChatAnthropic
**Issue:** Incorrect import path in deprecation message
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` were run
This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string.
This is technically a breaking change, though it's impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well.
Fixes the following issue #18760 raised by @eyurtsev
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:** Update callbacks documentation
**Issue:** Change some module imports and a method invocation to reflect
the current LangChainAPI
**Dependencies:** None
BasePDFLoader doesn't parse the suffix of the file correctly when
parsing S3 presigned urls. This fix enables the proper detection and
parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno
36] File name too long`.
No additional dependencies required.
Created the `facebook` page from `facebook_faiss` and `facebook_chat`
pages. Added another Facebook integrations into this page.
Updated `discord` page.
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Adding an optional parameter `linearization_config`
to the `AmazonTextractPDFLoader` so the caller can define how the output
will be linearized, instead of forcing a predefined set of linearization
configs. It will still have a default configuration as this will be an
optional parameter.
- **Issue:** #17457
- **Dependencies:** The same ones that already exist for
`AmazonTextractPDFLoader`
- **Twitter handle:** [@lvieirajr19](https://twitter.com/lvieirajr19)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
*Description**: My previous
[PR](https://github.com/langchain-ai/langchain/pull/18521) was
mistakenly closed, so I am reopening this one. Context: AWS released two
Mistral models on Bedrock last Friday (March 1, 2024). This PR includes
some code adjustments to ensure their compatibility with the Bedrock
class.
---------
Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Update azuresearch vectorstore from_texts() method to
include fields argument, necessary for creating an Azure AI Search index
with custom fields.
- **Issue:** Currently index fields are fixed to default fields if Azure
Search index is created using from_texts() method
- **Dependencies:** None
- **Twitter handle:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Small improvement to the openapi prompt.
The agent was not finding the server base URL (looping through all
nodes). This small change narrows the search and enables finding the url
faster.
No dependency
Twitter : @al1pra
# Proper example for AzureOpenAI usage in error message
The original error message is wrong in part of a usage example it gives.
Corrected to the right one.
Co-authored-by: Dzmitry Kankalovich <dzmitry_kankalovich@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
This PR is a successor to this PR -
https://github.com/langchain-ai/langchain/pull/17436
This PR updates the cookbook README with the notebook so that it is
available on langchain docs for discoverability.
cc: @baskaryan, @3coins
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Fix lists display issues in **Docs > Use Cases > Q&A
with RAG > Quickstart**.
In essence, this PR changes:
```markdown
Some paragraph.
- Item a.
- Item b.
```
to:
```markdown
Some paragraph.
- Item a.
- Item b.
```
There needs an extra empty line to make the list rendered properly.
FYI, the old version is displayed not properly as:
<img width="856" alt="image"
src="https://github.com/langchain-ai/langchain/assets/22856433/65202577-8ea2-47c6-b310-39bf42796fac">
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** `S3DirectoryLoader` is failing if prefix is a folder
(ex: `my_folder/`) because `S3FileLoader` will try to load that folder
and will fail. This PR skip nested directories so prefix can be set to
folder instead of `my_folder/files_prefix`.
- **Issue:**
- #11917
- #6535
- #4326
- **Dependencies:** none
- **Twitter handle:** @Falydoor
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- [ ] Title: Mongodb: MongoDB connection performance improvement.
- [ ] Message:
- **Description:** I made collection index_creation as optional. Index
Creation is one time process.
- **Issue:** MongoDBChatMessageHistory class object is attempting to
create an index during connection, causing each request to take longer
than usual. This should be optional with a parameter.
- **Dependencies:** N/A
- **Branch to be checked:** origin/mongo_index_creation
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Add embedding instruction to
HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and
other models that need embedding instruction.
---------
Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
_generate() and _agenerate() both accept **kwargs, then pass them on to
_format_output; but _format_output doesn't accept **kwargs. Attempting
to pass, e.g.,
timeout=50
to _generate (or invoke()) results in a TypeError.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Add Passio Nutrition AI Food Search Tool to Community Package
### Description
We propose adding a new tool to the `community` package, enabling
integration with Passio Nutrition AI for food search functionality. This
tool will provide a simple interface for retrieving nutrition facts
through the Passio Nutrition AI API, simplifying user access to
nutrition data based on food search queries.
### Implementation Details
- **Class Structure:** Implement `NutritionAI`, extending `BaseTool`. It
includes an `_run` method that accepts a query string and, optionally, a
`CallbackManagerForToolRun`.
- **API Integration:** Use `NutritionAIAPI` for the API wrapper,
encapsulating all interactions with the Passio Nutrition AI and
providing a clean API interface.
- **Error Handling:** Implement comprehensive error handling for API
request failures.
### Expected Outcome
- **User Benefits:** Enable easy querying of nutrition facts from Passio
Nutrition AI, enhancing the utility of the `langchain_community` package
for nutrition-related projects.
- **Functionality:** Provide a straightforward method for integrating
nutrition information retrieval into users' applications.
### Dependencies
- `langchain_core` for base tooling support
- `pydantic` for data validation and settings management
- Consider `requests` or another HTTP client library if not covered by
`NutritionAIAPI`.
### Tests and Documentation
- **Unit Tests:** Include tests that mock network interactions to ensure
tool reliability without external API dependency.
- **Documentation:** Create an example notebook in
`docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage,
setup, and example queries.
### Contribution Guidelines Compliance
- Adhere to the project's linting and formatting standards (`make
format`, `make lint`, `make test`).
- Ensure compliance with LangChain's contribution guidelines,
particularly around dependency management and package modifications.
### Additional Notes
- Aim for the tool to be a lightweight, focused addition, not
introducing significant new dependencies or complexity.
- Potential future enhancements could include caching for common queries
to improve performance.
### Twitter Handle
- Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai)
where we announce our products.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
"community: added a feature to filter documents in Mongoloader"
- **Description:** added a feature to filter documents in Mongoloader
- **Feature:** the feature #18251
- **Dependencies:** No
- **Twitter handle:** https://twitter.com/im_Kushagra
For some DBs with lots of tables, reflection of all the tables can take
very long. So this change will make the tables be reflected lazily when
get_table_info() is called and `lazy_table_reflection` is True.
Allows all chat models that implement _stream, but not _astream to still have async streaming to work.
Amongst other things this should resolve issues with streaming community model implementations through langserve since langserve is exclusively async.
**Description:** Replacing the deprecated predict() and apredict()
methods in the unit tests
**Issue:** Not applicable
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` have been
run
**Description:** Minor update to Anthropic documentation
**Issue:** Not applicable
**Dependencies:** None
**Lint and test**: `make format` and `make lint` was done
This path updates function "run" to "invoke" in llm_bash.ipynb.
Without this path, you see following warning.
LangChainDeprecationWarning: The function `run` was deprecated in
LangChain 0.1.0
and will be removed in 0.2.0. Use invoke instead.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Fixing a minor typo in the package name.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- [ ] **PR title:** docs: Fix link to HF TEI in
text_embeddings_inference.ipynb
- [ ] **PR message:**
- **Description:** Fix the link to [Hugging Face Text Embeddings
Inference
(TEI)](https://huggingface.co/docs/text-embeddings-inference/index) in
text_embeddings_inference.ipynb
- **Issue:** Fix#18576
Make `ElasticsearchRetriever` available as top-level import.
The `langchain` package depends on `langchain-community` so we do not
need to depend on it explicitly.
## Description
- Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM
and `ChatFriendli` chat model.
- Unit tests and integration tests corresponding to this change are
added.
- Documentations corresponding to this change are added.
## Dependencies
- Optional dependency
[`friendli-client`](https://pypi.org/project/friendli-client/) package
is added only for those who use `Frienldi` or `ChatFriendli` model.
## Twitter handle
- https://twitter.com/friendliai
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.
Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search.
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.
Simple use case exmaple
```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings
db = TiDBVectorStore.from_texts(
embedding=embeddings,
texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
table_name="tidb_vector_langchain",
connection_string=tidb_connection_url,
distance_strategy="cosine",
)
query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
print("-" * 80)
print("Score: ", score)
print(doc.page_content)
print("-" * 80)
```
## **Description:**
MongoDB integration tests link to a provided Atlas Cluster. We have very
stringent permissions set against the cluster provided. In order to make
it easier to track and isolate the collections each test gets run
against, we've updated the collection names to map the test file name.
i.e. `langchain_{filename}` => `langchain_test_vectorstores`
Fixes integration test results

## **Dependencies:**
Provided MONGODB_ATLAS_URI
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
cc: @shaneharvey, @blink1073 , @NoahStapp , @caseyclements
- **Description:** Chroma use uuid4 instead of uuid1 as random ids. Use
uuid1 may leak mac address, changing to uuid4 will not cause other
effects.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
Fixes#18513.
## Description
This PR attempts to fix the support for Anthropic Claude v3 models in
BedrockChat LLM. The changes here has updated the payload to use the
`messages` format instead of the formatted text prompt for all models;
`messages` API is backwards compatible with all models in Anthropic, so
this should not break the experience for any models.
## Notes
The PR in the current form does not support the v3 models for the
non-chat Bedrock LLM. This means, that with these changes, users won't
be able to able to use the v3 models with the Bedrock LLM. I can open a
separate PR to tackle this use-case, the intent here was to get this out
quickly, so users can start using and test the chat LLM. The Bedrock LLM
classes have also grown complex with a lot of conditions to support
various providers and models, and is ripe for a refactor to make future
changes more palatable. This refactor is likely to take longer, and
requires more thorough testing from the community. Credit to PRs
[18579](https://github.com/langchain-ai/langchain/pull/18579) and
[18548](https://github.com/langchain-ai/langchain/pull/18548) for some
of the code here.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.
Vector search is supported since release 15.x
For more: [Infinispan Home](https://infinispan.org)
Integration tests are provided as well as a demo notebook
Thank you for contributing to LangChain!
- [x] **PR title**: "templates: rag-multi-modal typo, replace serch with
search "
- **Description:** Two little typos in multi modal templates (replace
serch string with search)
Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>
@@ -10,7 +10,7 @@ You can use the dev container configuration in this folder to build and run the
You may use the button above, or follow these steps to open this repo in a Codespace:
1. Click the **Code** drop-down menu at the top of https://github.com/langchain-ai/langchain.
1. Click on the **Codespaces** tab.
1. Click **Create codespace on master**.
1. Click **Create codespace on master**.
For more info, check out the [GitHub documentation](https://docs.github.com/en/free-pro-team@latest/github/developing-online-with-codespaces/creating-a-codespace#creating-a-codespace).
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
- Where "package" is whichever of langchain, community, core, etc. is being modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
@@ -26,4 +26,4 @@ Additional guidelines:
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in langchain.
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Alternatively, if you are just interested in using the query generation part of the SQL chain, you can check out [`create_sql_query_chain`](https://github.com/langchain-ai/langchain/blob/master/docs/extras/use_cases/tabular/sql_query.ipynb)
The [LangChain CLI](https://python.langchain.com/docs/versions/v0_3/#migrate-using-langchain-cli) can help you automatically upgrade your code to use non-deprecated imports.
This will be especially helpful if you're still on either version 0.0.x or 0.1.x of LangChain.
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[](https://codespaces.new/langchain-ai/langchain)
[](https://star-history.com/#langchain-ai/langchain)
Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langchain-ai/langchainjs).
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
## Quick Install
With pip:
```bash
pip install langchain
```
With conda:
```bash
conda install langchain -c conda-forge
```
## 🤔 What is LangChain?
**LangChain** is a framework for developing applications powered by language models. It enables applications that:
- **Are context-aware**: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
**LangChain** is a framework for developing applications powered by large language models (LLMs).
This framework consists of several parts.
- **LangChain Libraries**: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- **[LangChain Templates](templates)**: A collection of easily deployable reference architectures for a wide variety of tasks.
- **[LangServe](https://github.com/langchain-ai/langserve)**: A library for deploying LangChain chains as a REST API.
- **[LangSmith](https://smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
- **[LangGraph](https://python.langchain.com/docs/langgraph)**: LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.
For these applications, LangChain simplifies the entire application lifecycle:
The LangChain libraries themselves are made up of several different packages.
- **[`langchain-core`](libs/core)**: Base abstractions and LangChain Expression Language.
- **[`langchain-community`](libs/community)**: Third party integrations.
- **[`langchain`](libs/langchain)**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **Open-source libraries**: Build your applications using LangChain's open-source [building blocks](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel), [components](https://python.langchain.com/docs/concepts/), and [third-party integrations](https://python.langchain.com/docs/integrations/providers/).
Use [LangGraph](https://langchain-ai.github.io/langgraph/) to build stateful agents with first-class streaming and human-in-the-loop support.
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://docs.smith.langchain.com/) so that you can constantly optimize and deploy with confidence.
- **Deployment**: Turn your LangGraph applications into production-ready APIs and Assistants with [LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/).

### Open-source libraries
- **`langchain-core`**: Base abstractions and LangChain Expression Language.
- **`langchain-community`**: Third party integrations.
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[`LangGraph`](https://langchain-ai.github.io/langgraph/)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, *Introduction to LangGraph*, available [here](https://academy.langchain.com/courses/intro-to-langgraph).
### Productionization:
- **[LangSmith](https://docs.smith.langchain.com/)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
### Deployment:
- **[LangGraph Cloud](https://langchain-ai.github.io/langgraph/cloud/)**: Turn your LangGraph applications into production-ready APIs and Assistants.


- End-to-end Example: [Web LangChain (web researcher chatbot)](https://weblangchain.vercel.app) and [repo](https://github.com/langchain-ai/weblangchain)
And much more! Head to the [Use cases](https://python.langchain.com/docs/use_cases/) section of the docs for more.
And much more! Head to the [Tutorials](https://python.langchain.com/docs/tutorials/) section of the docs for more.
## 🚀 How does LangChain help?
The main value props of the LangChain libraries are:
1.**Components**: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
1.**Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2.**Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
## LangChain Expression Language (LCEL)
LCEL is a key part of LangChain, allowing you to build and organize chains of processes in a straightforward, declarative manner. It was designed to support taking prototypes directly into production without needing to alter any code. This means you can use LCEL to set up everything from basic "prompt + LLM" setups to intricate, multi-step workflows.
- **[Overview](https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel)**: LCEL and its benefits
- **[Interface](https://python.langchain.com/docs/concepts/#runnable-interface)**: The standard Runnable interface for LCEL objects
- **[Primitives](https://python.langchain.com/docs/how_to/#langchain-expression-language-lcel)**: More on the primitives LCEL includes
- **[Cheatsheet](https://python.langchain.com/docs/how_to/lcel_cheatsheet/)**: Quick overview of the most common usage patterns
## Components
Components fall into the following **modules**:
**📃 Model I/O:**
**📃 Model I/O**
This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
This includes [prompt management](https://python.langchain.com/docs/concepts/#prompt-templates), [prompt optimization](https://python.langchain.com/docs/concepts/#example-selectors), a generic interface for [chat models](https://python.langchain.com/docs/concepts/#chat-models) and [LLMs](https://python.langchain.com/docs/concepts/#llms), and common utilities for working with [model outputs](https://python.langchain.com/docs/concepts/#output-parsers).
**📚 Retrieval:**
**📚 Retrieval**
Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/docs/concepts/#document-loaders) from a variety of sources, [preparing it](https://python.langchain.com/docs/concepts/#text-splitters), then [searching over (a.k.a. retrieving from)](https://python.langchain.com/docs/concepts/#retrievers) it for use in the generation step.
**🤖 Agents:**
**🤖 Agents**
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. LangChain provides a [standard interface for agents](https://python.langchain.com/docs/concepts/#agents), along with [LangGraph](https://github.com/langchain-ai/langgraph) for building custom agents.
## 📖 Documentation
Please see [here](https://python.langchain.com) for full documentation, which includes:
- [Getting started](https://python.langchain.com/docs/get_started/introduction): installation, setting up the environment, simple examples
-Overview of the [interfaces](https://python.langchain.com/docs/expression_language/), [modules](https://python.langchain.com/docs/modules/), and [integrations](https://python.langchain.com/docs/integrations/providers)
- [Use case](https://python.langchain.com/docs/use_cases/qa_structured/sql) walkthroughs and best practice [guides](https://python.langchain.com/docs/guides/adapters/openai)
- [LangSmith](https://python.langchain.com/docs/langsmith/), [LangServe](https://python.langchain.com/docs/langserve), and [LangChain Template](https://python.langchain.com/docs/templates/) overviews
- [Reference](https://api.python.langchain.com): full API docs
- [Introduction](https://python.langchain.com/docs/introduction/): Overview of the framework and the structure of the docs.
-[Tutorials](https://python.langchain.com/docs/tutorials/): If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
- [How-to guides](https://python.langchain.com/docs/how_to/): Answers to “How do I….?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
- [Conceptual guide](https://python.langchain.com/docs/concepts/): Conceptual explanations of the key parts of the framework.
- [API Reference](https://api.python.langchain.com): Thorough documentation of every class and method.
## 🌐 Ecosystem
- [🦜🛠️ LangSmith](https://docs.smith.langchain.com/): Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
- [🦜🕸️ LangGraph](https://langchain-ai.github.io/langgraph/): Create stateful, multi-actor applications with LLMs. Integrates smoothly with LangChain, but can be used without it.
- [🦜🏓 LangServe](https://python.langchain.com/docs/langserve): Deploy LangChain runnables and chains as REST APIs.
"query = \"Give me company names that are interesting investments based on EV / NTM and NTM rev growth. Consider EV / NTM multiples vs historical?\"\n",
@@ -4,10 +4,13 @@ Example code for building applications with LangChain, with an emphasis on more
Notebook | Description
:- | :-
[agent_fireworks_ai_langchain_mongodb.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/agent_fireworks_ai_langchain_mongodb.ipynb) | Build an AI Agent With Memory Using MongoDB, LangChain and FireWorksAI.
[mongodb-langchain-cache-memory.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/mongodb-langchain-cache-memory.ipynb) | Build a RAG Application with Semantic Cache Using MongoDB and LangChain.
[LLaMA2_sql_chat.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/LLaMA2_sql_chat.ipynb) | Build a chat application that interacts with a SQL database using an open source llm (llama2), specifically demonstrated on an SQLite database containing rosters.
[Semi_Structured_RAG.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_Structured_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data, including text and tables, using unstructured for parsing, multi-vector retriever for storing, and lcel for implementing chains.
[Semi_structured_and_multi_moda...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using unstructured for parsing, multi-vector retriever for storage and retrieval, and lcel for implementing chains.
[Semi_structured_multi_modal_RA...](https://github.com/langchain-ai/langchain/tree/master/cookbook/Semi_structured_multi_modal_RAG_LLaMA2.ipynb) | Perform retrieval-augmented generation (rag) on documents with semi-structured data and images, using various tools and methods such as unstructured for parsing, multi-vector retriever for storing, lcel for implementing chains, and open source language models like llama2, llava, and gpt4all.
[amazon_personalize_how_to.ipynb](https://github.com/langchain-ai/langchain/blob/master/cookbook/amazon_personalize_how_to.ipynb) | Retrieving personalized recommendations from Amazon Personalize and use custom agents to build generative AI apps
[analyze_document.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/analyze_document.ipynb) | Analyze a single long document.
[autogpt/autogpt.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/autogpt/autogpt.ipynb) | Implement autogpt, a language model, with langchain primitives such as llms, prompttemplates, vectorstores, embeddings, and tools.
[autogpt/marathon_times.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/autogpt/marathon_times.ipynb) | Implement autogpt for finding winning marathon times.
@@ -35,6 +38,7 @@ Notebook | Description
[llm_symbolic_math.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/llm_symbolic_math.ipynb) | Solve algebraic equations with the help of llms (language learning models) and sympy, a python library for symbolic mathematics.
[meta_prompt.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/meta_prompt.ipynb) | Implement the meta-prompt concept, which is a method for building self-improving agents that reflect on their own performance and modify their instructions accordingly.
[multi_modal_output_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multi_modal_output_agent.ipynb) | Generate multi-modal outputs, specifically images and text.
[multi_modal_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multi_modal_RAG_vdms.ipynb) | Perform retrieval-augmented generation (rag) on documents including text and images, using unstructured for parsing, Intel's Visual Data Management System (VDMS) as the vectorstore, and chains.
[multi_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multi_player_dnd.ipynb) | Simulate multi-player dungeons & dragons games, with a custom function determining the speaking schedule of the agents.
[multiagent_authoritarian.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_authoritarian.ipynb) | Implement a multi-agent simulation where a privileged agent controls the conversation, including deciding who speaks and when the conversation ends, in the context of a simulated news network.
[multiagent_bidding.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/multiagent_bidding.ipynb) | Implement a multi-agent simulation where agents bid to speak, with the highest bidder speaking next, demonstrated through a fictitious presidential debate example.
@@ -46,6 +50,7 @@ Notebook | Description
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
[rag_upstage_layout_analysis_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_layout_analysis_groundedness_check.ipynb) | End-to-end RAG example using Upstage Layout Analysis and Groundedness Check.
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented-generation (rag) on a PostgreSQL database using pgvector.
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware ai sales agent, salesgpt, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
@@ -55,3 +60,7 @@ Notebook | Description
[two_agent_debate_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_agent_debate_tools.ipynb) | Simulate multi-agent dialogues where the agents can utilize various tools.
[two_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_player_dnd.ipynb) | Simulate a two-player dungeons & dragons game, where a dialogue simulator class is used to coordinate the dialogue between the protagonist and the dungeon master.
[wikibase_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/wikibase_agent.ipynb) | Create a simple wikibase agent that utilizes sparql generation, with testing done on http://wikidata.org.
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
[rag-locally-on-intel-cpu.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag-locally-on-intel-cpu.ipynb) | Perform Retrieval-Augmented-Generation (RAG) on locally downloaded open-source models using langchain and open source tools and execute it on Intel Xeon CPU. We showed an example of how to apply RAG on Llama 2 model and enable it to answer the queries related to Intel Q1 2024 earnings release.
[visual_RAG_vdms.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/visual_RAG_vdms.ipynb) | Performs Visual Retrieval-Augmented-Generation (RAG) using videos and scene descriptions generated by open source models.
[contextual_rag.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/contextual_rag.ipynb) | Performs contextual retrieval-augmented generation (RAG) prepending chunk-specific explanatory context to each chunk before embedding.
"Apply to the [`LLaMA2`](https://arxiv.org/pdf/2307.09288.pdf) paper. \n",
"\n",
"We use the Unstructured [`partition_pdf`](https://unstructured-io.github.io/unstructured/bricks/partition.html#partition-pdf), which segments a PDF document by using a layout model. \n",
"We use the Unstructured [`partition_pdf`](https://unstructured-io.github.io/unstructured/core/partition.html#partition-pdf), which segments a PDF document by using a layout model. \n",
"\n",
"This layout model makes it possible to extract elements, such as tables, from pdfs. \n",
"query = \"What percentage of CPI is dedicated to Housing, and how does it compare to the combined percentage of Medical Care, Apparel, and Other Goods and Services?\"\n",
"suffix_for_images = \" Include any pie charts, graphs, or tables.\"\n",
" raise ValueError(\"a KEYSPACE environment variable must be set\")\n",
"\n",
"session.set_keyspace(keyspace)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This needs to be done one time only!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataset used is from Kaggle, the [Environmental Sensor Telemetry Data](https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k?select=iot_telemetry_data.csv). The next cell will download and unzip the data into a Pandas dataframe. The following cell is instructions to download manually. \n",
"\n",
"The net result of this section is you should have a Pandas dataframe variable `df`."
"with zip_file.open(csv_file_name) as csv_file:\n",
" df = pd.read_csv(csv_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download Manually"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can download the `.zip` file and unpack the `.csv` contained within. Comment in the next line, and adjust the path to this `.csv` file appropriately."
"WITH COMMENT = 'Data from environmental IoT room sensors. Columns include device identifier, timestamp (ts) of the data collection, carbon monoxide level (co), relative humidity, light presence, LPG concentration, motion detection, smoke concentration, and temperature (temp). Data is partitioned by day and device.';\n",
" description=\"A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.\",\n",
"Here is your task: In the {keyspace} keyspace, find the total number of times the temperature of each device has exceeded 23 degrees on July 14, 2020.\n",
" Create a summary report including the name of the room. Use Pandas if helpful.\n",
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/toolkits/sql_database.html) for answering questions over a Databricks database."
"This example demonstrates the use of the [SQL Database Agent](/docs/integrations/tools/sql_database) for answering questions over a Databricks database."
"* Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.\n",
"\n",
"* Set the OpenAI API key. Steps to obtain an API key as [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b56412ae",
"metadata": {},
"outputs": [],
"source": [
"import getpass"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "16a20d7a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your MongoDB connection string:········\n"
]
}
],
"source": [
"MONGODB_URI = getpass.getpass(\"Enter your MongoDB connection string:\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "978682d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your OpenAI API key:········\n"
]
}
],
"source": [
"OPENAI_API_KEY = getpass.getpass(\"Enter your OpenAI API key:\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "606081c5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"········\n"
]
}
],
"source": [
"# Optional-- If you want to enable Langsmith -- good for debugging\n",
"rag_chain = retriever_chain | rag_prompt | model | parse_output"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "9618d395",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The best movie to watch when feeling down could be \"Last Action Hero.\" It\\'s a fun and action-packed film that blends reality and fantasy, offering an escape from the real world and providing an entertaining distraction.'"
"'I apologize for the confusion. Another movie that might lift your spirits when you\\'re feeling sad is \"Smilla\\'s Sense of Snow.\" It\\'s a mystery thriller that could engage your mind and distract you from your sadness with its intriguing plot and suspenseful storyline.'"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\n",
" \"question\": \"Hmmm..I don't want to watch that one. Can you suggest something else?\"\n",
"'For a lighter movie option, you might enjoy \"Cousins.\" It\\'s a comedy film set in Barcelona with action and humor, offering a fun and entertaining escape from reality. The storyline is engaging and filled with comedic moments that could help lift your spirits.'"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\"question\": \"How about something more light?\"},\n",
"## Step 7: Get faster responses using Semantic Cache\n",
"\n",
"**NOTE:** Semantic cache only caches the input to the LLM. When using it in retrieval chains, remember that documents retrieved can change between runs resulting in cache misses for semantically similar queries."
"Many documents contain a mixture of content types, including text and images. \n",
"\n",
"Yet, information captured in images is lost in most RAG applications.\n",
"\n",
"With the emergence of multimodal LLMs, like [GPT-4V](https://openai.com/research/gpt-4v-system-card), it is worth considering how to utilize images in RAG:\n",
"\n",
"In this demo we\n",
"\n",
"* Use multimodal embeddings from Nomic Embed [Vision](https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5) and [Text](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) to embed images and text\n",
"* Retrieve both using similarity search\n",
"* Pass raw images and text chunks to a multimodal LLM for answer synthesis \n",
"\n",
"## Signup\n",
"\n",
"Get your API token, then run:\n",
"```\n",
"! nomic login\n",
"```\n",
"\n",
"Then run with your generated API token \n",
"```\n",
"! nomic login < token > \n",
"```\n",
"\n",
"## Packages\n",
"\n",
"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
"Let's look at an example pdfs containing interesting images.\n",
"\n",
"1/ Art from the J Paul Getty museum:\n",
"\n",
" * Here is a [zip file](https://drive.google.com/file/d/18kRKbq2dqAhhJ3DfZRnYcTBEUfYxe1YR/view?usp=sharing) with the PDF and the already extracted images. \n",
"We can use `partition_pdf` below from [Unstructured](https://unstructured-io.github.io/unstructured/introduction.html#key-concepts) to extract text and images.\n",
"\n",
"To supply this to extract the images:\n",
"```\n",
"extract_images_in_pdf=True\n",
"```\n",
"\n",
"\n",
"\n",
"If using this zip file, then you can simply process the text only with:\n",
"# Oracle AI Vector Search with Document Processing\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, your vectors can benefit from all of Oracle Database’s most powerful features, like the following:\n",
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"\n",
" * Loading the documents from various sources using OracleDocLoader\n",
" * Summarizing them within/outside the database using OracleSummary\n",
" * Generating embeddings for them within/outside the database using OracleEmbeddings\n",
" * Chunking them according to different requirements using Advanced Oracle Capabilities from OracleTextSplitter\n",
" * Storing and Indexing them in a Vector Store and querying them for queries in OracleVS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are just starting with Oracle Database, consider exploring the [free Oracle 23 AI](https://www.oracle.com/database/free/#resources) which provides a great introduction to setting up your database environment. While working with the database, it is often advisable to avoid using the system user by default; instead, you can create your own user for enhanced security and customization. For detailed steps on user creation, refer to our [end-to-end guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb) which also shows how to set up a user in Oracle. Additionally, understanding user privileges is crucial for managing database security effectively. You can learn more about this topic in the official [Oracle guide](https://docs.oracle.com/en/database/oracle/oracle-database/19/admqs/administering-user-accounts-and-security.html#GUID-36B21D72-1BBB-46C9-A0C9-F0D2A8591B8D) on administering user accounts and security."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Demo User\n",
"First, create a demo user with all the required privileges. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"User setup done!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# Update with your username, password, hostname, and service_name\n",
" print(f\"User setup failed with error: {e}\")\n",
" finally:\n",
" cursor.close()\n",
" conn.close()\n",
"except Exception as e:\n",
" print(f\"Connection failed with error: {e}\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Process Documents using Oracle AI\n",
"Consider the following scenario: users possess documents stored either in an Oracle Database or a file system and intend to utilize this data with Oracle AI Vector Search powered by Langchain.\n",
"\n",
"To prepare the documents for analysis, a comprehensive preprocessing workflow is necessary. Initially, the documents must be retrieved, summarized (if required), and chunked as needed. Subsequent steps involve generating embeddings for these chunks and integrating them into the Oracle AI Vector Store. Users can then conduct semantic searches on this data.\n",
"\n",
"The Oracle AI Vector Search Langchain library encompasses a suite of document processing tools that facilitate document loading, chunking, summary generation, and embedding creation.\n",
"\n",
"In the sections that follow, we will detail the utilization of Oracle AI Langchain APIs to effectively implement each of these processes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to Demo User\n",
"The following sample code will show how to connect to Oracle Database. By default, python-oracledb runs in a ‘Thin’ mode which connects directly to Oracle Database. This mode does not need Oracle Client libraries. However, some additional functionality is available when python-oracledb uses them. Python-oracledb is said to be in ‘Thick’ mode when Oracle Client libraries are used. Both modes have comprehensive functionality supporting the Python Database API v2.0 Specification. See the following [guide](https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary) that talks about features supported in each mode. You might want to switch to thick-mode if you are unable to use thin-mode."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
" cursor.execute(create_table_sql)\n",
"\n",
" insert_row_sql = \"\"\"insert into demo_tab values (:1, :2)\"\"\"\n",
" rows_to_insert = [\n",
" (\n",
" 1,\n",
" \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
" ),\n",
" (\n",
" 2,\n",
" \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
" ),\n",
" (\n",
" 3,\n",
" \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
"With the inclusion of a demo user and a populated sample table, the remaining configuration involves setting up embedding and summary functionalities. Users are presented with multiple provider options, including local database solutions and third-party services such as Ocigenai, Hugging Face, and OpenAI. Should users opt for a third-party provider, they are required to establish credentials containing the necessary authentication details. Conversely, if selecting a database as the provider for embeddings, it is necessary to upload an ONNX model to the Oracle Database. No additional setup is required for summary functionalities when using the database option."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load ONNX Model\n",
"\n",
"Oracle accommodates a variety of embedding providers, enabling users to choose between proprietary database solutions and third-party services such as OCIGENAI and HuggingFace. This selection dictates the methodology for generating and managing embeddings.\n",
"\n",
"***Important*** : Should users opt for the database option, they must upload an ONNX model into the Oracle Database. Conversely, if a third-party provider is selected for embedding generation, uploading an ONNX model to Oracle Database is not required.\n",
"\n",
"A significant advantage of utilizing an ONNX model directly within Oracle is the enhanced security and performance it offers by eliminating the need to transmit data to external parties. Additionally, this method avoids the latency typically associated with network or REST API calls.\n",
"\n",
"Below is the example code to upload an ONNX model into Oracle Database:"
"When selecting third-party providers for generating embeddings, users are required to establish credentials to securely access the provider's endpoints.\n",
"\n",
"***Important:*** No credentials are necessary when opting for the 'database' provider to generate embeddings. However, should users decide to utilize a third-party provider, they must create credentials specific to the chosen provider.\n",
"Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).\n",
"\n",
"A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).\n",
"\n",
"Below is a sample code snippet that demonstrates how to use OracleDocLoader"
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library offers a suite of APIs designed for document summarization. It supports multiple summarization providers such as Database, OCIGENAI, HuggingFace, among others, allowing users to select the provider that best meets their needs. To utilize these capabilities, users must configure the summary parameters as specified. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-EC9DDB58-6A15-4B36-BA66-ECBA20D2CE57)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate summary:"
"The documents may vary in size, ranging from small to very large. Users often prefer to chunk their documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-4E145629-7098-4C7C-804F-FC85D1F24240).\n",
"\n",
"Below is a sample code illustrating how to implement this:"
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides multiple methods for generating embeddings, utilizing either locally hosted ONNX models or third-party APIs. For comprehensive instructions on configuring these alternatives, please refer to the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-C6439E94-4E86-4ECD-954E-4B73D53579DE)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** Users may need to configure a proxy to utilize third-party embedding generation providers, excluding the 'database' provider that utilizes an ONNX model."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate embeddings:"
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
"print(f\"Vector Store Table: {vectorstore.table_name}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example provided illustrates the creation of a vector store using the DOT_PRODUCT distance strategy. Users have the flexibility to employ various distance strategies with the Oracle AI Vector Store, as detailed in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With embeddings now stored in vector stores, it is advisable to establish an index to enhance semantic search performance during query execution.\n",
"\n",
"***Note*** Should you encounter an \"insufficient memory\" error, it is recommended to increase the ***vector_memory_size*** in your database configuration\n",
"\n",
"Below is a sample code snippet for creating an index:"
"This example demonstrates the creation of a default HNSW index on embeddings within the 'oravs' table. Users may adjust various parameters according to their specific needs. For detailed information on these parameters, please consult the [Oracle AI Vector Search Guide book](https://docs.oracle.com/en/database/oracle/oracle-database/23/vecse/manage-different-categories-vector-indexes.html).\n",
"\n",
"Additionally, various types of vector indices can be created to meet diverse requirements. More details can be found in our [comprehensive guide](https://python.langchain.com/v0.1/docs/integrations/vectorstores/oracle/).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform Semantic Search\n",
"All set!\n",
"\n",
"We have successfully processed the documents and stored them in the vector store, followed by the creation of an index to enhance query performance. We are now prepared to proceed with semantic searches.\n",
"\n",
"Below is the sample code for this process:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
"[]\n",
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
"[]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
"In this cookbook, we use langchain tools and open source models to execute locally on CPU. This notebook has been validated to run on Intel Xeon 8480+ CPU. Here we implement a RAG pipeline for Llama2 model to answer questions about Intel Q1 2024 earnings release."
]
},
{
"cell_type": "markdown",
"id": "acadbcec-3468-4926-8ce5-03b678041c0a",
"metadata": {},
"source": [
"**Create a conda or virtualenv environment with python >=3.10 and install following libraries**\n",
"Document(metadata={'source': 'intel_q1_2024_earnings.pdf', 'page': 0}, page_content='Intel Corporation\\n2200 Mission College Blvd.\\nSanta Clara, CA 95054-1549\\n \\nNews Release\\n Intel Reports First -Quarter 2024 Financial Results\\nNEWS SUMMARY\\n▪First-quarter revenue of $12.7 billion , up 9% year over year (YoY).\\n▪First-quarter GAAP earnings (loss) per share (EPS) attributable to Intel was $(0.09) ; non-GAAP EPS \\nattributable to Intel was $0.18 .')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_splits[0]"
]
},
{
"cell_type": "markdown",
"id": "b88d2632-7c1b-49ef-a691-c0eb67d23e6a",
"metadata": {},
"source": [
"**One of the major step in RAG is to convert each split of document into embeddings and store in a vector database such that searching relevant documents are efficient.** <br>\n",
"**For that, importing Chroma vector database from langchain. Also, importing open source GPT4All for embedding models**"
"**In next step, we will download one of the most popular embedding model \"all-MiniLM-L6-v2\". Find more details of the model at this link https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2**"
"**Look at the first retrieved document from the vector database**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "43a6d94f-b5c4-47b0-a353-2db4c3d24d9c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(metadata={'page': 1, 'source': 'intel_q1_2024_earnings.pdf'}, page_content='Client Computing Group (CCG) $7.5 billion up31%\\nData Center and AI (DCAI) $3.0 billion up5%\\nNetwork and Edge (NEX) $1.4 billion down 8%\\nTotal Intel Products revenue $11.9 billion up17%\\nIntel Foundry $4.4 billion down 10%\\nAll other:\\nAltera $342 million down 58%\\nMobileye $239 million down 48%\\nOther $194 million up17%\\nTotal all other revenue $775 million down 46%\\nIntersegment eliminations $(4.4) billion\\nTotal net revenue $12.7 billion up9%\\nIntel Products Highlights')"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "64ba074f-4b36-442e-b7e2-b26d6e2815c3",
"metadata": {},
"source": [
"**Download Lllama-2 model from Huggingface and store locally** <br>\n",
"**You can download different quantization variant of Lllama-2 model from the link below. We are using Q8 version here (7.16GB).** <br>\n",
"**Now let's ask the same question to Llama model without showing them the earnings release.**"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "1033dd82-5532-437d-a548-27695e109589",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"?\n",
"(NASDAQ:INTC)\n",
"Intel's CCG (Client Computing Group) revenue for Q1 2024 was $9.6 billion, a decrease of 35% from the previous quarter and a decrease of 42% from the same period last year."
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n",
"llama_print_timings: load time = 131.20 ms\n",
"llama_print_timings: sample time = 16.05 ms / 68 runs ( 0.24 ms per token, 4236.76 tokens per second)\n",
"llama_print_timings: prompt eval time = 131.14 ms / 16 tokens ( 8.20 ms per token, 122.01 tokens per second)\n",
"llama_print_timings: eval time = 3225.00 ms / 67 runs ( 48.13 ms per token, 20.78 tokens per second)\n",
"llama_print_timings: total time = 3466.40 ms / 83 tokens\n"
]
},
{
"data": {
"text/plain": [
"\"?\\n(NASDAQ:INTC)\\nIntel's CCG (Client Computing Group) revenue for Q1 2024 was $9.6 billion, a decrease of 35% from the previous quarter and a decrease of 42% from the same period last year.\""
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm.invoke(question)"
]
},
{
"cell_type": "markdown",
"id": "75f5cb10-746f-4e37-9386-b85a4d2b84ef",
"metadata": {},
"source": [
"**As you can see, model is giving wrong information. Correct asnwer is CCG revenue in Q1 2024 is $7.5B. Now let's apply RAG using the earning release document**"
]
},
{
"cell_type": "markdown",
"id": "0f4150ec-5692-4756-b11a-22feb7ab88ff",
"metadata": {},
"source": [
"**in RAG, we modify the input prompt by adding relevent documents with the question. Here, we use one of the popular RAG prompt**"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "226c14b0-f43e-4a1f-a1e4-04731d467ec4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template=\"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: {question} \\nContext: {context} \\nAnswer:\"))]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import hub\n",
"\n",
"rag_prompt = hub.pull(\"rlm/rag-prompt\")\n",
"rag_prompt.messages"
]
},
{
"cell_type": "markdown",
"id": "77deb6a0-0950-450a-916a-f2a029676c20",
"metadata": {},
"source": [
"**Appending all retreived documents in a single document**"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "2dbc3327-6ef3-4c1f-8797-0c71964b0921",
"metadata": {},
"outputs": [],
"source": [
"def format_docs(docs):\n",
" return \"\\n\\n\".join(doc.page_content for doc in docs)"
]
},
{
"cell_type": "markdown",
"id": "2e2d9f18-49d0-43a3-bea8-78746ffa86b7",
"metadata": {},
"source": [
"**The last step is to create a chain using langchain tool that will create an e2e pipeline. It will take question and context as an input.**"
"**Now we see the results are correct as it is mentioned in earnings release.** <br>\n",
"**To further automate, we will create a chain that will take input as question and retriever so that we don't need to retrieve documents separately**"
"# RAG using Upstage Layout Analysis and Groundedness Check\n",
"This example illustrates RAG using [Upstage](https://python.langchain.com/docs/integrations/providers/upstage/) Layout Analysis and Groundedness Check."
"# SalesGPT - Your Context-Aware AI Sales Assistant With Knowledge Base\n",
"# SalesGPT - Context-Aware AI Sales Assistant With Knowledge Base and Ability Generate Stripe Payment Links\n",
"\n",
"This notebook demonstrates an implementation of a **Context-Aware** AI Sales agent with a Product Knowledge Base. \n",
"This notebook demonstrates an implementation of a **Context-Aware** AI Sales agent with a Product Knowledge Base which can actually close sales. \n",
"\n",
"This notebook was originally published at [filipmichalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) by [@FilipMichalsky](https://twitter.com/FilipMichalsky).\n",
"\n",
"SalesGPT is context-aware, which means it can understand what section of a sales conversation it is in and act accordingly.\n",
" \n",
"As such, this agent can have a natural sales conversation with a prospect and behaves based on the conversation stage. Hence, this notebook demonstrates how we can use AI to automate sales development representatives activities, such as outbound sales calls. \n",
"As such, this agent can have a natural sales conversation with a prospect and behaves based on the conversation stage. Hence, this notebook demonstrates how we can use AI to automate sales development representatives activites, such as outbound sales calls. \n",
"\n",
"Additionally, the AI Sales agent has access to tools, which allow it to interact with other systems.\n",
"\n",
"Here, we show how the AI Sales Agent can use a **Product Knowledge Base** to speak about a particular's company offerings,\n",
"hence increasing relevance and reducing hallucinations.\n",
"\n",
"We leverage the [`langchain`](https://github.com/langchain-ai/langchain) library in this implementation, specifically [Custom Agent Configuration](https://langchain-langchain.vercel.app/docs/modules/agents/how_to/custom_agent_with_tool_retrieval) and are inspired by [BabyAGI](https://github.com/yoheinakajima/babyagi) architecture ."
"Furthermore, we show how our AI Sales Agent can **generate sales** by integration with the AI Agent Highway called [Mindware](https://www.mindware.co/). In practice, this allows the agent to autonomously generate a payment link for your customers **to pay for your products via Stripe**.\n",
"\n",
"We leverage the [`langchain`](https://github.com/hwchase17/langchain) library in this implementation, specifically [Custom Agent Configuration](https://langchain-langchain.vercel.app/docs/modules/agents/how_to/custom_agent_with_tool_retrieval) and are inspired by [BabyAGI](https://github.com/yoheinakajima/babyagi) architecture ."
" Now determine what should be the next immediate conversation stage for the agent in the sales conversation by selecting only from the following options:\n",
" Now determine what should be the next immediate conversation stage for the agent in the sales conversation by selecting ony from the following options:\n",
" 1. Introduction: Start the conversation by introducing yourself and your company. Be polite and respectful while keeping the tone of the conversation professional.\n",
" 2. Qualification: Qualify the prospect by confirming if they are the right person to talk to regarding your product/service. Ensure that they have the authority to make purchasing decisions.\n",
" 3. Value proposition: Briefly explain how your product/service can benefit the prospect. Focus on the unique selling points and value proposition of your product/service that sets it apart from competitors.\n",
" Now determine what should be the next immediate conversation stage for the agent in the sales conversation by selecting only from the following options:\n",
" Now determine what should be the next immediate conversation stage for the agent in the sales conversation by selecting ony from the following options:\n",
" 1. Introduction: Start the conversation by introducing yourself and your company. Be polite and respectful while keeping the tone of the conversation professional.\n",
" 2. Qualification: Qualify the prospect by confirming if they are the right person to talk to regarding your product/service. Ensure that they have the authority to make purchasing decisions.\n",
" 3. Value proposition: Briefly explain how your product/service can benefit the prospect. Focus on the unique selling points and value proposition of your product/service that sets it apart from competitors.\n",
"\"I'm doing great, thank you for asking! As a Business Development Representative at Sleep Haven, I wanted to reach out to see if you are looking to achieve a better night's sleep. We provide premium mattresses that offer the most comfortable and supportive sleeping experience possible. Are you interested in exploring our sleep solutions? <END_OF_TURN>\""
"{'salesperson_name': 'Ted Lasso',\n",
" 'salesperson_role': 'Business Development Representative',\n",
" 'company_name': 'Sleep Haven',\n",
" 'company_business': 'Sleep Haven is a premium mattress company that provides customers with the most comfortable and supportive sleeping experience possible. We offer a range of high-quality mattresses, pillows, and bedding accessories that are designed to meet the unique needs of our customers.',\n",
" 'company_values': \"Our mission at Sleep Haven is to help people achieve a better night's sleep by providing them with the best possible sleep solutions. We believe that quality sleep is essential to overall health and well-being, and we are committed to helping our customers achieve optimal sleep by offering exceptional products and customer service.\",\n",
" 'conversation_purpose': 'find out whether they are looking to achieve better sleep via buying a premier mattress.',\n",
" 'conversation_history': 'Hello, this is Ted Lasso from Sleep Haven. How are you doing today? <END_OF_TURN>\\nUser: I am well, howe are you?<END_OF_TURN>',\n",
" 'conversation_type': 'call',\n",
" 'conversation_stage': 'Introduction: Start the conversation by introducing yourself and your company. Be polite and respectful while keeping the tone of the conversation professional. Your greeting should be welcoming. Always clarify in your greeting the reason why you are contacting the prospect.',\n",
" 'text': \"I'm doing well, thank you for asking. The reason I'm calling is to discuss how Sleep Haven can help enhance your sleep quality with our premium mattresses. Are you currently looking for ways to achieve a better night's sleep? <END_OF_TURN>\"}"
]
},
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales_conversation_utterance_chain.run(\n",
" salesperson_name=\"Ted Lasso\",\n",
" salesperson_role=\"Business Development Representative\",\n",
" company_name=\"Sleep Haven\",\n",
" company_business=\"Sleep Haven is a premium mattress company that provides customers with the most comfortable and supportive sleeping experience possible. We offer a range of high-quality mattresses, pillows, and bedding accessories that are designed to meet the unique needs of our customers.\",\n",
" company_values=\"Our mission at Sleep Haven is to help people achieve a better night's sleep by providing them with the best possible sleep solutions. We believe that quality sleep is essential to overall health and well-being, and we are committed to helping our customers achieve optimal sleep by offering exceptional products and customer service.\",\n",
" conversation_purpose=\"find out whether they are looking to achieve better sleep via buying a premier mattress.\",\n",
" conversation_history=\"Hello, this is Ted Lasso from Sleep Haven. How are you doing today? <END_OF_TURN>\\nUser: I am well, howe are you?<END_OF_TURN>\",\n",
" conversation_type=\"call\",\n",
" conversation_stage=conversation_stages.get(\n",
" \"1\",\n",
" \"Introduction: Start the conversation by introducing yourself and your company. Be polite and respectful while keeping the tone of the conversation professional.\",\n",
" ),\n",
"sales_conversation_utterance_chain.invoke(\n",
" {\n",
" \"salesperson_name\": \"Ted Lasso\",\n",
" \"salesperson_role\": \"Business Development Representative\",\n",
" \"company_name\": \"Sleep Haven\",\n",
" \"company_business\": \"Sleep Haven is a premium mattress company that provides customers with the most comfortable and supportive sleeping experience possible. We offer a range of high-quality mattresses, pillows, and bedding accessories that are designed tomeet the unique needs of our customers.\",\n",
" \"company_values\": \"Our mission at Sleep Haven is to help people achieve a better night's sleep by providing them with the best possible sleep solutions. We believe that quality sleep is essential to overall health and well-being, and we are committed to helping our customers achieve optimal sleep by offering exceptional products and customer service.\",\n",
" \"conversation_purpose\": \"find out whether they are looking to achieve better sleep via buying a premier mattress.\",\n",
" \"conversation_history\": \"Hello, this is Ted Lasso from Sleep Haven. How are you doing today? <END_OF_TURN>\\nUser: I am well, howe are you?<END_OF_TURN>\",\n",
" \"Introduction: Start the conversation by introducing yourself and your company. Be polite and respectful while keeping the tone of the conversation professional.\",\n",
" description=\"useful for when you need to answer questions about product information\",\n",
" )\n",
" ]\n",
"\n",
" return tools"
" return knowledge_base"
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 10,
"metadata": {},
"outputs": [
{
@@ -485,16 +485,18 @@
"text": [
"Created a chunk of size 940, which is longer than the specified 10\n",
"Created a chunk of size 844, which is longer than the specified 10\n",
"Created a chunk of size 837, which is longer than the specified 10\n"
"Created a chunk of size 837, which is longer than the specified 10\n",
"/Users/filipmichalsky/Odyssey/sales_bot/SalesGPT/env/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n"
]
},
{
"data": {
"text/plain": [
"' We have four products available: the Classic Harmony Spring Mattress, the Plush Serenity Bamboo Mattress, the Luxury Cloud-Comfort Memory Foam Mattress, and the EcoGreen Hybrid Latex Mattress. Each product is available in different sizes, with the Classic Harmony Spring Mattress available in Queen and King sizes, the Plush Serenity Bamboo Mattress available in King size, the Luxury Cloud-Comfort Memory Foam Mattress available in Twin, Queen, and King sizes, and the EcoGreen Hybrid Latex Mattress available in Twin and Full sizes.'"
"'The Sleep Haven products available are:\\n\\n1. Luxury Cloud-Comfort Memory Foam Mattress\\n2. Classic Harmony Spring Mattress\\n3. EcoGreen Hybrid Latex Mattress\\n4. Plush Serenity Bamboo Mattress\\n\\nEach product has its unique features and price point.'"
]
},
"execution_count": 11,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
@@ -508,12 +510,199 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up the SalesGPT Controller with the Sales Agent and Stage Analyzer and a Knowledge Base"
"### Payment gateway"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to set up your AI agent to use a payment gateway to generate payment links for your users you need two things:\n",
"\n",
"1. Sign up for a Stripe account and obtain a STRIPE API KEY\n",
"2. Create products you would like to sell in the Stripe UI. Then follow out example of `example_product_price_id_mapping.json`\n",
"to feed the product name to price_id mapping which allows you to generate the payment links."
" description=\"useful for when you need to answer questions about product information or services offered, availability and their costs.\",\n",
" ),\n",
" Tool(\n",
" name=\"GeneratePaymentLink\",\n",
" func=generate_stripe_payment_link,\n",
" description=\"useful to close a transaction with a customer. You need to include product name and quantity and customer name in the query input.\",\n",
" ),\n",
" ]\n",
"\n",
" return tools"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up the SalesGPT Controller with the Sales Agent and Stage Analyzer\n",
"\n",
"#### The Agent has access to a Knowledge Base and can autonomously sell your products via Stripe"
"Created a chunk of size 940, which is longer than the specified 10\n",
"Created a chunk of size 844, which is longer than the specified 10\n",
"Created a chunk of size 837, which is longer than the specified 10\n"
"Created a chunk of size 837, which is longer than the specified 10\n",
"/Users/filipmichalsky/Odyssey/sales_bot/SalesGPT/env/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The class `langchain.agents.agent.LLMSingleActionAgent` was deprecated in langchain 0.1.0 and will be removed in 0.2.0. Use Use new agent constructor methods like create_react_agent, create_json_agent, create_structured_chat_agent, etc. instead.\n",
" warn_deprecated(\n"
]
}
],
@@ -907,7 +1095,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
@@ -917,7 +1105,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 22,
"metadata": {},
"outputs": [
{
@@ -934,14 +1122,14 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: Hello, this is Ted Lasso from Sleep Haven. How are you doing today?\n"
"Ted Lasso: Good day! This is Ted Lasso from Sleep Haven. How are you doing today?\n"
]
}
],
@@ -951,18 +1139,18 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"sales_agent.human_step(\n",
" \"I am well, how are you? I would like to learn more about your mattresses.\"\n",
" \"I am well, how are you? I would like to learn more about your services.\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 25,
"metadata": {},
"outputs": [
{
@@ -977,92 +1165,32 @@
"sales_agent.determine_conversation_stage()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: I'm glad to hear that you're doing well! As for our mattresses, at Sleep Haven, we provide customers with the most comfortable and supportive sleeping experience possible. Our high-quality mattresses are designed to meet the unique needs of our customers. Can I ask what specifically you'd like to learn more about? \n"
]
}
],
"source": [
"sales_agent.step()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"sales_agent.human_step(\"Yes, what materials are you mattresses made from?\")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conversation Stage: Needs analysis: Ask open-ended questions to uncover the prospect's needs and pain points. Listen carefully to their responses and take notes.\n"
]
}
],
"source": [
"sales_agent.determine_conversation_stage()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: Our mattresses are made from a variety of materials, depending on the model. We have the EcoGreen Hybrid Latex Mattress, which is made from 100% natural latex harvested from eco-friendly plantations. The Plush Serenity Bamboo Mattress features a layer of plush, adaptive foam and a base of high-resilience support foam, with a bamboo-infused top layer. The Luxury Cloud-Comfort Memory Foam Mattress has an innovative, temperature-sensitive memory foam layer and a high-density foam base with cooling gel-infused particles. Finally, the Classic Harmony Spring Mattress has a robust inner spring construction and layers of plush padding, with a quilted top layer and a natural cotton cover. Is there anything specific you'd like to know about these materials?\n"
]
}
],
"source": [
"sales_agent.step()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: I'm doing great, thank you for asking! I'm glad to hear you're interested. Sleep Haven is a premium mattress company, and we're all about offering the best sleep solutions, including top-notch mattresses, pillows, and bedding accessories. Our mission is to help you achieve a better night's sleep. May I know if you're looking to enhance your sleep experience with a new mattress or bedding accessories? \n"
]
}
],
"source": [
"sales_agent.human_step(\n",
" \"Yes, I am looking for a queen sized mattress. Do you have any mattresses in queen size?\"\n",
")"
"sales_agent.step()"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conversation Stage: Needs analysis: Ask open-ended questions to uncover the prospect's needs and pain points. Listen carefully to their responses and take notes.\n"
]
}
],
"outputs": [],
"source": [
"sales_agent.determine_conversation_stage()"
"sales_agent.human_step(\n",
" \"Yes, I would like to improve my sleep. Can you tell me more about your products?\"\n",
")"
]
},
{
@@ -1074,7 +1202,24 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: Yes, we do have queen-sized mattresses available. We offer the Luxury Cloud-Comfort Memory Foam Mattress and the Classic Harmony Spring Mattress in queen size. Both mattresses provide exceptional comfort and support. Is there anything specific you would like to know about these options?\n"
"Conversation Stage: Needs analysis: Ask open-ended questions to uncover the prospect's needs and pain points. Listen carefully to their responses and take notes.\n"
]
}
],
"source": [
"sales_agent.determine_conversation_stage()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: Absolutely, I'd be happy to share more about our products. At Sleep Haven, we offer a variety of high-quality mattresses designed to cater to different sleeping preferences and needs. Whether you're looking for memory foam's comfort, the support of hybrid mattresses, or the breathability of natural latex, we have options for everyone. Our pillows and bedding accessories are similarly curated to enhance your sleep quality. Every product is built with the aim of helping you achieve the restful night's sleep you deserve. What specific features are you looking for in a mattress? \n"
]
}
],
@@ -1084,16 +1229,16 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"sales_agent.human_step(\"Yea, compare and contrast those two options, please.\")"
"sales_agent.human_step(\"What mattresses do you have and how much do they cost?\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 32,
"metadata": {},
"outputs": [
{
@@ -1110,14 +1255,14 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: The Luxury Cloud-Comfort Memory Foam Mattress is priced at $999 and is available in Twin, Queen, and King sizes. It features an innovative, temperature-sensitive memory foam layer and a high-density foam base. On the other hand, the Classic Harmony Spring Mattress is priced at $1,299 and is available in Queen and King sizes. It features a robust inner spring construction and layers of plush padding. Both mattresses provide exceptional comfort and support, but the Classic Harmony Spring Mattress may be a better option if you prefer the traditional feel of an inner spring mattress. Do you have any other questions about these options?\n"
"Ted Lasso: We offer two primary types of mattresses at Sleep Haven. The first is our Luxury Cloud-Comfort Memory Foam Mattress, which is priced at $999 and comes in Twin, Queen, and King sizes. The second is our Classic Harmony Spring Mattress, priced at $1,299, available in Queen and King sizes. Both are designed to provide exceptional comfort and support for a better night's sleep. Which type of mattress would you be interested in learning more about? \n"
]
}
],
@@ -1127,14 +1272,66 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"sales_agent.human_step(\n",
" \"Great, thanks, that's it. I will talk to my wife and call back if she is onboard. Have a good day!\"\n",
" \"Okay.I would like to order two Memory Foam mattresses in Twin size please.\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conversation Stage: Close: Ask for the sale by proposing a next step. This could be a demo, a trial or a meeting with decision-makers. Ensure to summarize what has been discussed and reiterate the benefits.\n"
]
}
],
"source": [
"sales_agent.determine_conversation_stage()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Ted Lasso: Fantastic choice! You're on your way to a better night's sleep with our Luxury Cloud-Comfort Memory Foam Mattresses. I've generated a payment link for two Twin size mattresses for you. Here is the link to complete your purchase: https://buy.stripe.com/test_6oEg28e3V97BdDabJn. Is there anything else I can assist you with today? \n"
]
}
],
"source": [
"sales_agent.step()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"sales_agent.human_step(\n",
" \"Great, thanks! I will discuss with my wife and will buy it if she is onboard. Have a good day!\"\n",
"] += f\". Valid values are {sorted(latest_price['starrating'].value_counts().index.tolist())}\"\n",
"attribute_info[3][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['maxoccupancy'].value_counts().index.tolist())}\"\n",
"attribute_info[-3][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['country'].value_counts().index.tolist())}\""
"attribute_info[-2][\"description\"] += (\n",
" f\". Valid values are {sorted(latest_price['starrating'].value_counts().index.tolist())}\"\n",
")\n",
"attribute_info[3][\"description\"] += (\n",
" f\". Valid values are {sorted(latest_price['maxoccupancy'].value_counts().index.tolist())}\"\n",
")\n",
"attribute_info[-3][\"description\"] += (\n",
" f\". Valid values are {sorted(latest_price['country'].value_counts().index.tolist())}\"\n",
")"
]
},
{
@@ -688,9 +688,9 @@
"metadata": {},
"outputs": [],
"source": [
"attribute_info[-3][\n",
" \"description\"\n",
"] += \". NOTE: Only use the 'eq' operator if a specific country is mentioned. If a region is mentioned, include all relevant countries in filter.\"\n",
"attribute_info[-3][\"description\"] += (\n",
" \". NOTE: Only use the 'eq' operator if a specific country is mentioned. If a region is mentioned, include all relevant countries in filter.\"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I should research ChatGPT to answer this question.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I should research ChatGPT to answer this question.\n",
"Action: Search\n",
"Action Input: \"ChatGPT\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001b[0m\n",
"Action Input: \"ChatGPT\"\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001B[0m\n",
"Cell \u001B[0;32mIn[36], line 1\u001B[0m\n\u001B[0;32m----> 1\u001B[0m \u001B[43magent_executor\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43minvoke\u001B[49m\u001B[43m(\u001B[49m\u001B[43m{\u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43minput\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m:\u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mWhat is ChatGPT?\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m}\u001B[49m\u001B[43m)\u001B[49m\n",
"agent_executor.invoke({\"input\": \"What is ChatGPT?\"})"
]
},
{
@@ -179,15 +196,15 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out who developed ChatGPT\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to find out who developed ChatGPT\n",
"Action: Search\n",
"Action Input: Who developed ChatGPT\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001b[0m\n",
"Action Input: Who developed ChatGPT\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -202,7 +219,7 @@
}
],
"source": [
"agent_chain.run(input=\"Who developed it?\")"
"agent_executor.invoke({\"input\": \"Who developed it?\"})"
]
},
{
@@ -217,14 +234,14 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"Action: Summary\n",
"Action Input: My daughter 5 years old\u001b[0m\n",
"Action Input: My daughter 5 years old\u001B[0m\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"\u001B[1m> Entering new LLMChain chain...\u001B[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThis is a conversation between a human and a bot:\n",
"\u001B[32;1m\u001B[1;3mThis is a conversation between a human and a bot:\n",
"\n",
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
@@ -232,16 +249,16 @@
"AI: ChatGPT was developed by OpenAI.\n",
"\n",
"Write a summary of the conversation for My daughter 5 years old:\n",
"\u001b[0m\n",
"\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001B[1m> Finished chain.\u001B[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot. It was created by OpenAI and can send and receive images while chatting.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.\u001b[0m\n",
"Observation: \u001B[33;1m\u001B[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot. It was created by OpenAI and can send and receive images while chatting.\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -256,8 +273,8 @@
}
],
"source": [
"agent_chain.run(\n",
" input=\"Thanks. Summarize the conversation, for my daughter 5 years old.\"\n",
"agent_executor.invoke(\n",
" {\"input\": \"Thanks. Summarize the conversation, for my daughter 5 years old.\"}\n",
")"
]
},
@@ -289,9 +306,17 @@
}
],
"source": [
"print(agent_chain.memory.buffer)"
"print(agent_executor.memory.buffer)"
]
},
{
"cell_type": "markdown",
"id": "84ca95c30e262e00",
"metadata": {
"collapsed": false
},
"source": []
},
{
"cell_type": "markdown",
"id": "cc3d0aa4",
@@ -340,25 +365,9 @@
" ),\n",
"]\n",
"\n",
"prefix = \"\"\"Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:\"\"\"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I should research ChatGPT to answer this question.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I should research ChatGPT to answer this question.\n",
"Action: Search\n",
"Action Input: \"ChatGPT\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001b[0m\n",
"Action Input: \"ChatGPT\"\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -396,7 +405,7 @@
}
],
"source": [
"agent_chain.run(input=\"What is ChatGPT?\")"
"agent_executor.invoke({\"input\": \"What is ChatGPT?\"})"
]
},
{
@@ -411,15 +420,15 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out who developed ChatGPT\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to find out who developed ChatGPT\n",
"Action: Search\n",
"Action Input: Who developed ChatGPT\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001b[0m\n",
"Action Input: Who developed ChatGPT\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -434,7 +443,7 @@
}
],
"source": [
"agent_chain.run(input=\"Who developed it?\")"
"agent_executor.invoke({\"input\": \"Who developed it?\"})"
]
},
{
@@ -449,14 +458,14 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"Action: Summary\n",
"Action Input: My daughter 5 years old\u001b[0m\n",
"Action Input: My daughter 5 years old\u001B[0m\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"\u001B[1m> Entering new LLMChain chain...\u001B[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThis is a conversation between a human and a bot:\n",
"\u001B[32;1m\u001B[1;3mThis is a conversation between a human and a bot:\n",
"\n",
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
@@ -464,16 +473,16 @@
"AI: ChatGPT was developed by OpenAI.\n",
"\n",
"Write a summary of the conversation for My daughter 5 years old:\n",
"\u001b[0m\n",
"\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001B[1m> Finished chain.\u001B[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot developed by OpenAI. It is designed to have conversations with humans and can also send and receive images.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.\u001b[0m\n",
"Observation: \u001B[33;1m\u001B[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot developed by OpenAI. It is designed to have conversations with humans and can also send and receive images.\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -488,8 +497,8 @@
}
],
"source": [
"agent_chain.run(\n",
" input=\"Thanks. Summarize the conversation, for my daughter 5 years old.\"\n",
"agent_executor.invoke(\n",
" {\"input\": \"Thanks. Summarize the conversation, for my daughter 5 years old.\"}\n",
" AIMessage(content='The result of \\\\(3 + 5^{2.743}\\\\) is approximately 300.04, and the result of \\\\(17.24 - 918.1241\\\\) is approximately -900.88.', response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 251, 'total_tokens': 295}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-d1161669-ed09-4b18-94bd-6d8530df5aa8-0')]}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph.invoke(\n",
" {\n",
" \"messages\": [\n",
" HumanMessage(\n",
" \"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"\n",
" )\n",
" ]\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "073c074e-d722-42e0-85ec-c62c079207e4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"),\n",
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.