Robocorp (action server) toolkit had a limitation that the content
length returned by the tool was always cut to max 5000 chars. This was
from the time when context windows were much more limited.
This PR removes the limitation. Whatever the underlying tool provides
gets sent back to the agent.
As the robocorp toolkit no longer restricts the content, the implication
is that either the Action (tool) developer or the agent developer needs
to be aware of potentially oversized tool responses. Our point of view
is this should be the agent developer's responsibility, them being in
control of the use case and aware of the context window the LLM has.
Description: We are merging UPSTAGE_DOCUMENT_AI_API_KEY and
UPSTAGE_API_KEY into one, and only UPSTAGE_API_KEY will be used going
forward. And we changed the base class of ChatUpstage to BaseChatOpenAI.
---------
Co-authored-by: Sean <chosh0615@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "langchain-ibm: Fix llm and embeddings 'verify'
attribute default value"
- [x] **PR message**:
- **Description:** fix default value of "verify" attribute
- **Dependencies:** `ibm_watsonx_ai`
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Erick Friis <erick@langchain.dev>
…Endpoint`
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** add `bind_tools` and `with_structured_output` support
to `QianfanChatEndpoint`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- Added Together docs in chat models section
- Update Together provider docs to match the LLM & chat models sections
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This is a doc update. It fixes up formatting and product name
references. The example code is updated to use a local built-in text
file.
@mmhangami Please take a look
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** Updated the together integration docs by leading with
the streaming example, explicitly specifying a model to show users how
to do that, and updating the sections to more closely match other
integrations.
Description: This PR includes fix for loader_source to be fetched from
metadata in case of GdriveLoaders.
Documentation: NA
Unit Test: NA
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
- it's only node ids that are limited
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [ ] **HuggingFaceInferenceAPIEmbeddings**: "Additional Headers"
- Where: langchain, community, embeddings. huggingface.py.
- Community: add additional headers when needed by custom HuggingFace
TEI embedding endpoints. HuggingFaceInferenceAPIEmbeddings"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Adding the `additional_headers` to be passed to
requests library if needed
- **Dependencies:** none
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. Tested with locally available TEI endpoints with and without
`additional_headers`
2. Example Usage
```python
embeddings=HuggingFaceInferenceAPIEmbeddings(
api_key=MY_CUSTOM_API_KEY,
api_url=MY_CUSTOM_TEI_URL,
additional_headers={
"Content-Type": "application/json"
}
)
```
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Massimiliano Pronesti <massimiliano.pronesti@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:** Adding chat completions to the Together AI package,
which is our most popular API. Also staying backwards compatible with
the old API so folks can continue to use the completions API as well.
Also moved the embedding API to use the OpenAI library to standardize it
further.
**Twitter handle:** @nutlope
- [x] **Add tests and docs**: If you're adding a new integration, please
include
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
[Standardized model init args
#20085](https://github.com/langchain-ai/langchain/issues/20085)
- Enable premai chat model to be initialized with `model_name` as an
alias for `model`, `api_key` as an alias for `premai_api_key`.
- Add initialization test `test_premai_initialization`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** fix: variable names in root validator not allowing
pass credentials as named parameters in llm instancing, also added
sambanova's sambaverse and sambastudio llms to __init__.py for module
import
Description: this change adds args_schema (pydantic BaseModel) to
YahooFinanceNewsTool for correct schema formatting on LLM function calls
Issue: currently using YahooFinanceNewsTool with OpenAI function calling
returns the following error "TypeError("YahooFinanceNewsTool._run() got
an unexpected keyword argument '__arg1'")". This happens because the
schema sent to the LLM is "input: "{'__arg1': 'MSFT'}"" while the method
should be called with the "query" parameter.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Issue: `load_qa_chain` is placed in the __init__.py file. As a result,
it is not listed in the API Reference docs.
BTW `load_qa_chain` is heavily presented in the doc examples, but is
missed in API Ref.
Change: moved code from init.py into a new file. Related: #21266
Reverts langchain-ai/langchain#21174
Hey team - going to revert this because it doesn't seem necessary for
testing. We should only be adding optional + extended_testing
dependencies for deps that have extended tests.
otherwise it just increases probability of dependency conflicts in the
community lockfile.
Thank you for contributing to LangChain!
community:baichuan[patch]: standardize init args
updated `baichuan_api_key` so that aliased to `api_key`. Added test that
it continues to set the same underlying attribute. Test checks for
`SecretStr`
updated `temperature` with Pydantic Field, added unit test.
Related to https://github.com/langchain-ai/langchain/issues/20085
If Session and/or keyspace are not provided, they are resolved from
cassio's context. So they are not required.
This change is fully backward compatible.
Issue: the `langkit` package is not presented in the `pyproject.toml`
but it is a requirement for the `WhyLabsCallbackHandler`
Change: added `langkit`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description:** Update LarkSuite loader doc to give an example for
loading data from LarkSuite wiki.
**Issue:** None
**Dependencies:** None
**Twitter handle:** None
Thank you for contributing to LangChain!
- [x] **PR title**: "langchain-ibm: Add support for ibm-watsonx-ai new
major version"
- [x] **PR message**:
- **Description:** Add support for ibm-watsonx-ai new major version
- **Dependencies:** `ibm_watsonx_ai`
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
The `LocalFileStore` class can be used to create an on-disk
`CacheBackedEmbeddings` cache. The number of files in these embeddings
caches can grow to be quite large over time (hundreds of thousands) as
embeddings are computed for new versions of content, but the embeddings
for old/deprecated content are not removed.
A *least-recently-used* (LRU) cache policy could be applied to the
`LocalFileStore` directory to delete cache entries that have not been
referenced for some time:
```bash
# delete files that have not been accessed in the last 90 days
find embeddings_cache_dir/ -atime 90 -print0 | xargs -0 rm
```
However, most filesystems in enterprise environments disable access time
modification on read to improve performance. As a result, the access
times of these cache entry files are not updated when their values are
read.
To resolve this, this pull request updates the `LocalFileStore`
constructor to offer an `update_atime` parameter that causes access
times to be updated when a cache entry is read.
For example,
```python
file_store = LocalFileStore(temp_dir, update_atime=True)
```
The default is `False`, which retains the original behavior.
**Testing:**
I updated the LocalFileStore unit tests to test the access time update.
Before you could only extract triples (diffbot calls it facts) from
diffbot to avoid isolated nodes. However, sometimes isolated nodes can
still be useful like for prefiltering, so we want to allow users to
extract them if they want. Default behaviour is unchanged.
**Description:** Update unit test for ChatAnthropic
**Issue:** Test for key passed in from the environment should not have
the key initialized in the constructor
**Dependencies:** None
Thank you for contributing to LangChain!
- Oracle AI Vector Search
Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
- Oracle AI Vector Search is designed for Artificial Intelligence (AI)
workloads that allows you to query data based on semantics, rather than
keywords. One of the biggest benefit of Oracle AI Vector Search is that
semantic search on unstructured data can be combined with relational
search on business data in one single system. This is not only powerful
but also significantly more effective because you don't need to add a
specialized vector database, eliminating the pain of data fragmentation
between multiple systems.
This Pull Requests Adds the following functionalities
Oracle AI Vector Search : Vector Store
Oracle AI Vector Search : Document Loader
Oracle AI Vector Search : Document Splitter
Oracle AI Vector Search : Summary
Oracle AI Vector Search : Oracle Embeddings
- We have added unit tests and have our own local unit test suite which
verifies all the code is correct. We have made sure to add guides for
each of the components and one end to end guide that shows how the
entire thing runs.
- We have made sure that make format and make lint run clean.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: skmishraoracle <shailendra.mishra@oracle.com>
Co-authored-by: hroyofc <harichandan.roy@oracle.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
Memory return could be set as `str` or `message` by `return_messages`
flag as mentioned in
https://python.langchain.com/docs/modules/memory/#whether-memory-is-a-string-or-a-list-of-messages,
where
`langchain.chains.conversation.memory.ConversationSummaryBufferMemory`
did not implement that.
This commit added `buffer_as_str` and `buffer_as_messages` function, and
`buffer` now affected by `return_messages` flag.
## Example Test Code and Output
```python
# Fix: ConversationSummaryBufferMemory with return_messages flag function
# Test code
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory
from langchain_community.llms.ollama import Ollama
llm = Ollama()
# Create an instance of ConversationSummaryBufferMemory with return_messages set to True
memory = ConversationSummaryBufferMemory(return_messages=True, llm=llm)
# Add user and AI messages to the chat memory
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
# Print the buffer
print("Buffer:")
print(*map(type, memory.buffer), sep="\n")
print(memory.buffer, "\n")
# Print the buffer as a string
print("Buffer as String:")
print(type(memory.buffer_as_str))
print(memory.buffer_as_str, "\n")
# Print the buffer as messages
print("Buffer as Messages:")
print(*map(type, memory.buffer_as_messages), sep="\n")
print(memory.buffer_as_messages, "\n")
# Print the buffer after setting return_messages to False
memory.return_messages = False
print("Buffer after setting return_messages to False:")
print(type(memory.buffer))
print(memory.buffer, "\n")
```
```plaintext
Buffer:
<class 'langchain_core.messages.human.HumanMessage'>
<class 'langchain_core.messages.ai.AIMessage'>
[HumanMessage(content='hi!'), AIMessage(content="what's up?")]
Buffer as String:
<class 'str'>
Human: hi!
AI: what's up?
Buffer as Messages:
<class 'langchain_core.messages.human.HumanMessage'>
<class 'langchain_core.messages.ai.AIMessage'>
[HumanMessage(content='hi!'), AIMessage(content="what's up?")]
Buffer after setting return_messages to False:
<class 'str'>
Human: hi!
AI: what's up?
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Issue: we have several helper functions to import third-party libraries
like tools.gmail.utils.import_google in
[community.tools](https://api.python.langchain.com/en/latest/community_api_reference.html#id37).
And we have core.utils.utils.guard_import that works exactly for this
purpose.
The import_<package> functions work inconsistently and rather be private
functions.
Change: replaced these functions with the guard_import function.
Related to #21133
Issues (nit):
1. `utils.guard_import` prints wrong error message when there is an
import `error.` It prints the whole `module_name` but should be only the
first part as the pip package name. E.i. `langchain_core.utils` -> print
not `langchain-core` but `langchain_core.utils`. Also replace '_' with
'-' in the pip package name.
2. it does not handle the `ModuleNotFoundError` which raised if
`guard_import("wrong_module")`
Fixed issues; added ut-s. Controversial: I've reraised
`ModuleNotFoundError` as `ImportError`, since in case of the error, the
proposed action is the same - we need to install a missed package.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Issue: `load_summarize_chain` is placed in the __init__.py file. As a
result, it doesn't listed in the API Reference docs.
Change: moved code from __init__.py into a new file.
**PR message**:
- **Description:** Corrected a syntax error in the code comments within
the `create_tool_calling_agent` function in the langchain package.
- **Issue:** N/A
- **Dependencies:** No additional dependencies required.
- **Twitter handle:** N/A
This PR fixes#21196.
The error was occurring when calling chat completion API with a chat
history. Indeed, the Mistral API does not accept both `content` and
`tool_calls` in the same body.
This PR removes one of theses variables depending on the necessity.
---------
Co-authored-by: Maxime Perrin <mperrin@doing.fr>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
* Introduce individual `fetch_` methods for easier typing.
* Rework some docstrings to google style
* Move some logic to the tool
* Merge the 2 cassandra utility files
- support two-tuples of any sequence type (eg. json.loads never produces
tuples)
- support type alias for role key
- if id is passed in in dict form use it
- if tool_calls passed in in dict form use them
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Refactors the docs build in order to:
- run the same `make build` command in both vercel and local build
- incrementally build artifacts in 2 distinct steps, instead of building
all docs in-place (in vercel) or in a _dist dir (locally)
Highlights:
- introduces `make build` in order to build the docs
- collects and generates all files for the build in
`docs/build/intermediate`
- renders those jupyter notebook + markdown files into
`docs/build/outputs`
And now the outputs to host are in `docs/build/outputs`, which will need
a vercel settings change.
Todo:
- [ ] figure out how to point the right directory (right now deleting
and moving docs dir in vercel_build.sh isn't great)
**Description:**
This pull request introduces a new feature for LangChain: the
integration with the Rememberizer API through a custom retriever.
This enables LangChain applications to allow users to load and sync
their data from Dropbox, Google Drive, Slack, their hard drive into a
vector database that LangChain can query. Queries involve sending text
chunks generated within LangChain and retrieving a collection of
semantically relevant user data for inclusion in LLM prompts.
User knowledge dramatically improved AI applications.
The Rememberizer integration will also allow users to access general
purpose vectorized data such as Reddit channel discussions and US
patents.
**Issue:**
N/A
**Dependencies:**
N/A
**Twitter handle:**
https://twitter.com/Rememberizer
## Summary
`ruff /path/to/file.py` works but is deprecated, and we now recommend
`ruff check /path/to/file.py` (to match `ruff format /path/to/file.py`).
Vertex DIY RAG APIs helps to build complex RAG systems and provide more
granular control, and are suited for custom use cases.
The Ranking API takes in a list of documents and reranks those documents
based on how relevant the documents are to a given query. Compared to
embeddings that look purely at the semantic similarity of a document and
a query, the ranking API can give you a more precise score for how well
a document answers a given query.
[Reference](https://cloud.google.com/generative-ai-app-builder/docs/ranking)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Sync the config in `devcontainer.json` and `docker-compose.yml`**
Issue: when opening the current `master` branch in a dev container in VS
Code, I get the following message as VS Code cannot find the mounted
source folder:

Opening in a GitHub Codespace works (it seems to ignore the mounts in
the `docker-compose.yml`.
This PR updates the mount in `docker-compose.yml` and the config in
`devcontainer.json` so that the two align.
I have tested these changes in GitHub Codespaces and a VS Code dev
container and both loaded successfully.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Add tests to check API keys and Active Directory tokens
are masked
**Issue:** Resolves#12165 for OpenAI and Azure OpenAI models
**Dependencies:** None
Also resolves#12473 which may be closed.
Additional contributors @alex4321 (#12473) and @onesolpark (#12542)
- [ ] **PR message**:
- **Description:** Refactored the lazy_load method to use asynchronous
execution for improved performance. The method now initiates scraping of
all URLs simultaneously using asyncio.gather, enhancing data fetching
efficiency. Each Document object is yielded immediately once its content
becomes available, streamlining the entire process.
- **Issue:** N/A
- **Dependencies:** Requires the asyncio library for handling
asynchronous tasks, which should already be part of standard Python
libraries in Python 3.7 and above.
- **Email:** [r73327118@gmail.com](mailto:r73327118@gmail.com)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Update python.py(experimental:Added code for PythonREPL)
Added code for PythonREPL, defining a static method 'sanitize_input'
that takes the string 'query' as input and returns a sanitizing string.
The purpose of this method is to remove unwanted characters from the
input string, Specifically:
1. Delete the whitespace at the beginning and end of the string (' \s').
2. Remove the quotation marks (`` ` ``) at the beginning and end of the
string.
3. Remove the keyword "python" at the beginning of the string (case
insensitive) because the user may have typed it.
This method uses regular expressions (regex) to implement sanitizing.
It all started with this code:
from langchain.agents import Tool
from langchain_experimental.utilities import PythonREPL
python_repl = PythonREPL()
repl_tool = Tool(
name="python_repl",
description="Remove redundant formatting marks at the beginning and end
of source code from input.Use a Python shell to execute python commands.
If you want to see the output of a value, you should print it out with
`print(...)`.",
func=python_repl.run,
)
When I call the agent to write a piece of code for me and execute it
with the defined code, I must get an error: SyntaxError('invalid
syntax', ('<string>', 1, 1,'In', 1, 2))
After checking, I found that pythonREPL has less formatting of input
code than the soon-to-be deprecated pythonREPL tool, so I added this
step to it, so that no matter what code I ask the agent to write for me,
it can be executed smoothly and get the output result.
I have tried modifying the prompt words to solve this problem before,
but it did not work, and by adding a simple format check, the problem is
well resolved.
<img width="1271" alt="image"
src="https://github.com/langchain-ai/langchain/assets/164149097/c49a685f-d246-4b11-b655-fd952fc2f04c">
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**
This pull request updates the Bagel Network package name from
"betabageldb" to "bagelML" to align with the latest changes made by the
Bagel Network team.
The following modifications have been made:
- Updated all references to the old package name ("betabageldb") with
the new package name ("bagelML") throughout the codebase.
- Modified the documentation, and any relevant scripts to reflect the
package name change.
- Tested the changes to ensure that the functionality remains intact and
no breaking changes were introduced.
By merging this pull request, our project will stay up to date with the
latest Bagel Network package naming convention, ensuring compatibility
and smooth integration with their updated library.
Please review the changes and provide any feedback or suggestions. Thank
you!
**Description:** Update UpstageLayoutAnalysisParser and Loader and add
upstage loader example in pdf section
**Dependencies:** langchain_community
**Twitter handle:** [@upstageai](https://twitter.com/upstageai)
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Issue:**
Currently `AzureSearch` vector store does not implement `delete` method.
This PR implements it. This also makes it compatible with LangChain
indexer.
**Dependencies:**
None
**Twitter handle:**
@martintriska1
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Upgrades prompts module to use optional imports.
This code was generated with a migration script, but had to be adjusted
manually a bit.
Testing in preparation for applying this code modification across the
rest of the modules in langchain package to reverse the dependency
between langchain community and langchain.
## Summary
No new diagnostics (given that the set of enabled rules hasn't changed),
but gains access to our new parser (much faster) and reduced false
positives all around.
### Description:
When attempting to download PDF files from arXiv, an unexpected 404
error frequently occurs. This error halts the operation, regardless of
whether there are additional documents to process. As a solution, I
suggest implementing a mechanism to ignore and communicate this error
and continue processing the next document from the list.
Proposed Solution: To address the issue of unexpected 404 errors during
PDF downloads from arXiv, I propose implementing the following solution:
- Error Handling: Implement error handling mechanisms to catch and
handle 404 errors gracefully.
- Communication: Inform the user or logging system about the occurrence
of the 404 error.
- Continued Processing: After encountering a 404 error, continue
processing the remaining documents from the list without interruption.
This solution ensures that the application can handle unexpected errors
without terminating the entire operation. It promotes resilience and
robustness in the face of intermittent issues encountered during PDF
downloads from arXiv.
### Issue:
#20909
### Dependencies:
none
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Summary
I ran `ruff check --extend-select RUF100 -n` to identify `# noqa`
comments that weren't having any effect in Ruff, and then `ruff check
--extend-select RUF100 -n --fix` on select files to remove all of the
unnecessary `# noqa: F401` violations. It's possible that these were
needed at some point in the past, but they're not necessary in Ruff
v0.1.15 (used by LangChain) or in the latest release.
Co-authored-by: Erick Friis <erick@langchain.dev>
…/17690
Thank you for contributing to LangChain!
- [x] **Fix Google Lens knowledge graph issue**: "langchain: community"
- Fix for [No "knowledge_graph" property in Google Lens API call from
SerpAPI](https://github.com/langchain-ai/langchain/issues/17690)
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** handled the existence of keys in the json response of
Google Lens
- **Issue:** [No "knowledge_graph" property in Google Lens API call from
SerpAPI](https://github.com/langchain-ai/langchain/issues/17690)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Proposing to centralize code for handling dynamic imports. This allows treating langchain-community as an optional dependency.
---
The proposal is to scan the code base and to replace all existing imports with dynamic imports using this functionality.
Fixed the error that the model name is never actually put into GigaChat
request payload, always defaulting to `GigaChat-Lite`.
With this fix, model selection through
```python
import os
from langchain.chat_models.gigachat import GigaChat
chat = GigaChat(
name="GigaChat-Pro", # <- HERE!!!!!
...
)
```
should actually work, as intended in
[here](804390ba4b/libs/community/langchain_community/llms/gigachat.py (L36)).
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description**: ToolKit and Tools for accessing data in a Cassandra
Database primarily for Agent integration. Initially, this includes the
following tools:
- `cassandra_db_schema` Gathers all schema information for the connected
database or a specific schema. Critical for the agent when determining
actions.
- `cassandra_db_select_table_data` Selects data from a specific keyspace
and table. The agent can pass paramaters for a predicate and limits on
the number of returned records.
- `cassandra_db_query` Expiriemental alternative to
`cassandra_db_select_table_data` which takes a query string completely
formed by the agent instead of parameters. May be removed in future
versions.
Includes unit test and two notebooks to demonstrate usage.
**Dependencies**: cassio
**Twitter handle**: @PatrickMcFadin
---------
Co-authored-by: Phil Miesle <phil.miesle@datastax.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** This pull request introduces a new feature to community
tools, enhancing its search capabilities by integrating the Mojeek
search engine
**Dependencies:** None
---------
Co-authored-by: Igor Brai <igor@mojeek.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Removed redundant self/cls from required args of class functions in
_get_python_function_required_args:
```python
class MemberTool:
def search_member(
self,
keyword: str,
*args,
**kwargs,
):
"""Search on members with any keyword like first_name, last_name, email
Args:
keyword: Any keyword of member
"""
headers = dict(authorization=kwargs['token'])
members = []
try:
members = request_(
method='SEARCH',
url=f'{service_url}/apiv1/members',
headers=headers,
json=dict(query=keyword),
)
except Exception as e:
logger.info(e.__doc__)
return members
convert_to_openai_tool(MemberTool.search_member)
```
expected result:
```
{'type': 'function', 'function': {'name': 'search_member', 'description': 'Search on members with any keyword like first_name, last_name, username, email', 'parameters': {'type': 'object', 'properties': {'keyword': {'type': 'string', 'description': 'Any keyword of member'}}, 'required': ['keyword']}}}
```
#20685
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "docs: switched GCSLoaders docs to
langchain-google-community"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** switched GCSLoaders docs to
langchain-google-community
Issue: When the third-party package is not installed, whenever we need
to `pip install <package>` the ImportError is raised.
But sometimes, the `ValueError` or `ModuleNotFoundError` is raised. It
is bad for consistency.
Change: replaced the `ValueError` or `ModuleNotFoundError` with
`ImportError` when we raise an error with the `pip install <package>`
message.
Note: Ideally, we replace all `try: import... except... raise ... `with
helper functions like `import_aim` or just use the existing
[langchain_core.utils.utils.guard_import](https://api.python.langchain.com/en/latest/utils/langchain_core.utils.utils.guard_import.html#langchain_core.utils.utils.guard_import)
But it would be much bigger refactoring. @baskaryan Please, advice on
this.
Implemented bind_tools for OllamaFunctions.
Made OllamaFunctions sub class of ChatOllama.
Implemented with_structured_output for OllamaFunctions.
integration unit test has been updated.
notebook has been updated.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
I can't seem to reproduce, but i got this:
```
SystemError: AST constructor recursion depth mismatch (before=102, after=37)
```
And the operation isn't critical for the actual forward pass so seems
preferable to expand our caught exceptions
**Description**: This update enhances the `extract_sub_links` function
within the `langchain_core/utils/html.py` module to include query
parameters in the extracted URLs.
**Issue**: N/A
**Dependencies**: No additional dependencies required for this change.
**Twitter handle**: N/A
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Just a simple PR to fix a broken link. Apparently having backticks
outside a link makes it render as code.
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This introduces `store_kwargs` which behaves similarly to `graph_kwargs`
on the `RdfGraph` object, which will enable users to pass `headers` and
other arguments to the underlying `SPARQLStore` object. I have also made
a [PR in `rdflib` to support passing
`default_graph`](https://github.com/RDFLib/rdflib/pull/2761).
Example usage:
```python
from langchain_community.graphs import RdfGraph
graph = RdfGraph(
query_endpoint="http://localhost/sparql",
standard="rdf",
store_kwargs=dict(
default_graph="http://example.com/mygraph"
)
)
```
<!--If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.-->
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
MindsDB integrates with LangChain, enabling users to deploy, serve, and
fine-tune models available via LangChain within MindsDB, making them
accessible to numerous data sources.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: The PebbloSafeLoader should first check for owner,
full_path and size in metadata before implementing its own logic.
Dependencies: None
Documentation: NA.
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Issue: #20514
The current implementation of `construct_instance` expects a `texts:
List[str]` that will call the embedding function. This might not be
needed when we already have a client with collection and `path, you
don't want to add any text.
This PR adds a class method that returns a qdrant instance with an
existing client.
Here everytime
cb6e5e56c2/libs/community/langchain_community/vectorstores/qdrant.py (L1592)
`construct_instance` is called, this line sends some text for embedding
generation.
---------
Co-authored-by: Anush <anushshetty90@gmail.com>
* Groundedness Check takes `str` or `list[Document]` as input.
* Deprecate `GroundednessCheck` due to its naming.
* Added `UpstageGroundednessCheck`.
* Hotfix for Groundedness Check parameter.
The name `query` was misleading and it should be `answer` instead.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This auto generates partner migrations.
At the moment the migration is from community -> partner.
So one would need to run the migration script twice to go from langchain to partner.
Add script to help generate migrations.
This works well for partner packages. Migrations are generated based on run time rather than static analysis (much simpler to get the correct migrations implemented).
The script for generating migrations from langchain to community still needs work.
`langchain_pinecone.Pinecone` is deprecated in favor of
`PineconeVectorStore`, and is currently a subclass of
`PineconeVectorStore`.
```python
@deprecated(since="0.0.3", removal="0.2.0", alternative="PineconeVectorStore")
class Pinecone(PineconeVectorStore):
"""Deprecated. Use PineconeVectorStore instead."""
pass
```
**Description:** AzureSearch vector store has no tests. This PR adds
initial tests to validate the code can be imported and used.
**Issue:** N/A
**Dependencies:** azure-search-documents and azure-identity are added as
optional dependencies for testing
---------
Co-authored-by: Matt Gotteiner <[email protected]>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**:
_PebbloSafeLoader_: Add support for pebblo server and client version
**Documentation:** NA
**Unit test:** NA
**Issue:** NA
**Dependencies:** None
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- [ ] **Kinetica Document Loader**: "community: a class to load
Documents from Kinetica"
- [ ] **Kinetica Document Loader**:
- **Description:** implemented KineticaLoader in `kinetica_loader.py`
- **Dependencies:** install the Kinetica API using `pip install
gpudb==7.2.0.1 `
**Description:** Fixes a bug in the HuggingGPT task execution logic
here:
except Exception as e:
self.status = "failed"
self.message = str(e)
self.status = "completed"
self.save_product()
where a caught exception effectively just sets `self.message` and can
then throw an exception if, e.g., `self.product` is not defined.
**Issue:** None that I'm aware of.
**Dependencies:** None
**Twitter handle:** https://twitter.com/michaeljschock
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Changes
`lanchain_core.output_parsers.CommaSeparatedListOutputParser` to handle
`,` as a delimiter alongside the previous implementation which used `, `
as delimiter.
- **Issue:** Started noticing that some results returned by LLMs were
not getting parsed correctly when the output contained `,` instead of `,
`.
- **Dependencies:** No
- **Twitter handle:** not active on twitter.
<!---
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
-->
- **Description**:
- **add support for more data types**: by default `IpexLLM` will load
the model in int4 format. This PR adds more data types support such as
`sym_in5`, `sym_int8`, etc. Data formats like NF3, NF4, FP4 and FP8 are
only supported on GPU and will be added in future PR.
- Fix a small issue in saving/loading, update api docs
- **Dependencies**: `ipex-llm` library
- **Document**: In `docs/docs/integrations/llms/ipex_llm.ipynb`, added
instructions for saving/loading low-bit model.
- **Tests**: added new test cases to
`libs/community/tests/integration_tests/llms/test_ipex_llm.py`, added
config params.
- **Contribution maintainer**: @shane-huang
Description: Add support for Semantic topics and entities.
Classification done by pebblo-server is not used to enhance metadata of
Documents loaded by document loaders.
Dependencies: None
Documentation: Updated.
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**
- [x] **PR message**:
- **Description:** Deprecate persist method in Chroma no longer exists
in Chroma 0.4.x
- **Issue:** #20851
- **Dependencies:** None
- **Twitter handle:** AndresAlgaba1
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:**
This PR removes an unnecessary code snippet from the documentation. The
snippet in question is not relevant to the content and does not
contribute to the overall understanding of the topic. It contained
redundant imports and unused code, potentially causing confusion for
readers.
**Issue:**
There is no specific issue number associated with this change.
**Dependencies:**
No additional dependencies are required for this change.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
The RecursiveUrlLoader loader offers a link_regex parameter that can
filter out URLs. However, this filtering capability is limited, and if
the internal links of the website change, unexpected resources may be
loaded. These resources, such as font files, can cause problems in
subsequent embedding processing.
>
https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf
We can add the Content-Type in the HTTP response headers to the document
metadata so developers can choose which resources to use. This allows
developers to make their own choices.
For example, the following may be a good choice for text knowledge.
- text/plain - simple text file
- text/html - HTML web page
- text/xml - XML format file
- text/json - JSON format data
- application/pdf - PDF file
- application/msword - Word document
and ignore the following
- text/css - CSS stylesheet
- text/javascript - JavaScript script
- application/octet-stream - binary data
- image/jpeg - JPEG image
- image/png - PNG image
- image/gif - GIF image
- image/svg+xml - SVG image
- audio/mpeg - MPEG audio files
- video/mp4 - MP4 video file
- application/font-woff - WOFF font file
- application/font-ttf - TTF font file
- application/zip - ZIP compressed file
- application/octet-stream - binary data
**Twitter handle:** @coolbeevip
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
**Description:** In VoyageAI text-embedding examples use voyage-law-2
model
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: Fix misplaced zep cloud example links
- [x] **PR message**:
- **Description:** Fixes misplaced links for vector store and memory zep
cloud examples
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Rerank API
- **Twitter handle:** https://twitter.com/JinaAI_
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
OpenAI API compatible server may not support `safe_len_embedding`,
use `disable_safe_len_embeddings=True` to disable it.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
* Updating the provider docs page.
The RAG example was meant to be moved to cookbook, but was merged by
mistake.
* Fix bug in Groundedness Check
---------
Co-authored-by: JuHyung-Son <sonju0427@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Currently, when a new dev container is created, poetry does not work in
it with the error "No module named 'rapidfuzz'".
Install Poetry outside the project venv so that poetry and project
dependencies do not get mixed. Use pipx to install poetry securely in
its own isolated environment.
Issue: #12237
Twitter handle: https://twitter.com/ibratoev
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Currently, the regex is static (`r"(?<=[.?!])\s+"`),
which is only useful for certain use cases. The current change only
moves this to be a parameter of split_text(). Which adds flexibility
without making it more complex (as the default regex is still the same).
- **Issue:** Not applicable (I searched, no one seems to have created
this issue yet).
- **Dependencies:** None.
_If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17._
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description: MarkdownHeaderTextSplitter Fails to Parse Headers with
non-printable characters. more #20643
The following is the official test case. Just replacing `# Foo\n\n` with
`\ufeff# Foo\n\n` will cause the test case to fail.
chunk metadata is empty
```python
def test_md_header_text_splitter_1() -> None:
"""Test markdown splitter by header: Case 1."""
markdown_document = (
"\ufeff# Foo\n\n"
" ## Bar\n\n"
"Hi this is Jim\n\n"
"Hi this is Joe\n\n"
" ## Baz\n\n"
" Hi this is Molly"
)
headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
]
markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on,
)
output = markdown_splitter.split_text(markdown_document)
expected_output = [
Document(
page_content="Hi this is Jim \nHi this is Joe",
metadata={"Header 1": "Foo", "Header 2": "Bar"},
),
Document(
page_content="Hi this is Molly",
metadata={"Header 1": "Foo", "Header 2": "Baz"},
),
]
assert output == expected_output
```
twitter: @coolbeevip
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Description :
- added functionalities - delete, index creation, using existing
connection object etc.
- updated usage
- Added LaceDB cloud OSS support
make lint_diff , make test checks done
- **Description:** fix a bug in the agent_token_buffer_memory
- **Issue:** agent_token_buffer_memory was not working with openai tools
- **Dependencies:** None
- **Twitter handle:** @pokidyshef
**Description:** Adds the command to install packages required before
using _Unstructured_ and _PDFMiner_ from `langchain.community`
**Documentation Page Being Updated:** [LangChain > Retrieval > Document
loaders > PDF > Using
Unstructured](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf/#using-unstructured)
**Issue:** #20719
**Dependencies:** no dependencies
**Twitter handle:** SalikaDave
<!--
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17. -->
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Description
Add `aprep_output` method to `langchain/chains/base.py`. Some downstream
`ChatMessageHistory` objects that use async connections require an async
way to append to the context.
It turned out that `ainvoke()` was calling `prep_output` which is
synchronous.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
# Proxy Fix for Groq Class 🐛🚀
## Description
This PR fixes a bug related to proxy settings in the `Groq` class,
allowing users to connect to LangChain services via a proxy.
## Changes Made
- ✅ FIX support for specifying proxy settings in the `Groq` class.
- ✅ Resolved the bug causing issues with proxy settings.
- ❌ Did not include unit tests and documentation updates.
- ❌ Did not run make format, make lint, and make test to ensure code
quality and functionality because I couldn't get it to run, so I don't
program in Python and couldn't run `ruff`.
- ❔ Ensured that the changes are backwards compatible.
- ✅ No additional dependencies were added to `pyproject.toml`.
### Error Before Fix
```python
Traceback (most recent call last):
File "/home/bg/Documents/code/github.com/back2nix/test/groq/main.py", line 9, in <module>
chat = ChatGroq(
^^^^^^^^^
File "/home/bg/Documents/code/github.com/back2nix/test/groq/venv310/lib/python3.11/site-packages/langchain_core/load/serializable.py", line 120, in __init__
super().__init__(**kwargs)
File "/home/bg/Documents/code/github.com/back2nix/test/groq/venv310/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for ChatGroq
__root__
Invalid `http_client` argument; Expected an instance of `httpx.AsyncClient` but got <class 'httpx.Client'> (type=type_error)
```
### Example usage after fix
```python3
import os
import httpx
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
chat = ChatGroq(
temperature=0,
groq_api_key=os.environ.get("GROQ_API_KEY"),
model_name="mixtral-8x7b-32768",
http_client=httpx.Client(
proxies="socks5://127.0.0.1:1080",
transport=httpx.HTTPTransport(local_address="0.0.0.0"),
),
http_async_client=httpx.AsyncClient(
proxies="socks5://127.0.0.1:1080",
transport=httpx.HTTPTransport(local_address="0.0.0.0"),
),
)
system = "You are a helpful assistant."
human = "{text}"
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])
chain = prompt | chat
out = chain.invoke({"text": "Explain the importance of low latency LLMs"})
print(out)
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Implemented the ability to enable full-text search within the
SingleStore vector store, offering users a versatile range of search
strategies. This enhancement allows users to seamlessly combine
full-text search with vector search, enabling the following search
strategies:
* Search solely by vector similarity.
* Conduct searches exclusively based on text similarity, utilizing
Lucene internally.
* Filter search results by text similarity score, with the option to
specify a threshold, followed by a search based on vector similarity.
* Filter results by vector similarity score before conducting a search
based on text similarity.
* Perform searches using a weighted sum of vector and text similarity
scores.
Additionally, integration tests have been added to comprehensively cover
all scenarios.
Updated notebook with examples.
CC: @baskaryan, @hwchase17
---------
Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- added guard on the `pyTigerGraph` import
- added a missed example page in the `docs/integrations/graphs/`
- formatted the `docs/integrations/providers/` page to the consistent
format. Added links.
- **Description:**
This PR adds support for advanced filtering to the integration of HANA
Vector Engine.
The newly supported filtering operators are: $eq, $ne, $gt, $gte, $lt,
$lte, $between, $in, $nin, $like, $and, $or
- **Issue:** N/A
- **Dependencies:** no new dependencies added
Added integration tests to:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`
Description of the new capabilities in notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
Thank you for contributing to LangChain!
community:perplexity[patch]: standardize init args
updated pplx_api_key and request_timeout so that aliased to api_key, and
timeout respectively. Added test that both continue to set the same
underlying attributes.
Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [x] **PR title**: docs: Update Zep Messaging, add links to Zep Cloud
Docs
- [x] **PR message**:
- **Description:** This PR updates Zep messaging in the docs + links to
Langchain Zep Cloud examples in our documentation
- **Twitter handle:** @paulpaliychuk51
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
This PR moves the interface and the logic to core.
The following changes to namespaces:
`indexes` -> `indexing`
`indexes._api` -> `indexing.api`
Testing code is intentionally duplicated for now since it's testing
different
implementations of the record manager (in-memory vs. SQL).
Common logic will need to be pulled out into the test client.
A follow up PR will move the SQL based implementation outside of
LangChain.
**Description:**
This PR fixes an issue in message formatting function for Anthropic
models on Amazon Bedrock.
Currently, LangChain BedrockChat model will crash if it uses Anthropic
models and the model return a message in the following type:
- `AIMessageChunk`
Moreover, when use BedrockChat with for building Agent, the following
message types will trigger the same issue too:
- `HumanMessageChunk`
- `FunctionMessage`
**Issue:**
https://github.com/langchain-ai/langchain/issues/18831
**Dependencies:**
No.
**Testing:**
Manually tested. The following code was failing before the patch and
works after.
```
@tool
def square_root(x: str):
"Useful when you need to calculate the square root of a number"
return math.sqrt(int(x))
llm = ChatBedrock(
model_id="anthropic.claude-3-sonnet-20240229-v1:0",
model_kwargs={ "temperature": 0.0 },
)
prompt = ChatPromptTemplate.from_messages(
[
("system", FUNCTION_CALL_PROMPT),
("human", "Question: {user_input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
tools = [square_root]
tools_string = format_tool_to_anthropic_function(square_root)
agent = (
RunnablePassthrough.assign(
user_input=lambda x: x['user_input'],
agent_scratchpad=lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
)
)
| prompt
| llm
| AnthropicFunctionsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True)
output = agent_executor.invoke({
"user_input": "What is the square root of 2?",
"tools_string": tools_string,
})
```
List of messages returned from Bedrock:
```
<SystemMessage> content='You are a helpful assistant.'
<HumanMessage> content='Question: What is the square root of 2?'
<AIMessageChunk> content="Okay, let's calculate the square root of 2.<scratchpad>\nTo calculate the square root of a number, I can use the square_root tool:\n\n<function_calls>\n <invoke>\n <tool_name>square_root</tool_name>\n <parameters>\n <__arg1>2</__arg1>\n </parameters>\n </invoke>\n</function_calls>\n</scratchpad>\n\n<function_results>\n<search_result>\nThe square root of 2 is approximately 1.414213562373095\n</search_result>\n</function_results>\n\n<answer>\nThe square root of 2 is approximately 1.414213562373095\n</answer>" id='run-92363df7-eff6-4849-bbba-fa16a1b2988c'"
<FunctionMessage> content='1.4142135623730951' name='square_root'
```
Hi! My name is Alex, I'm an SDK engineer from
[Comet](https://www.comet.com/site/)
This PR updates the `CometTracer` class.
Fixed an issue when `CometTracer` failed while logging the data to Comet
because this data is not JSON-encodable.
The problem was in some of the `Run` attributes that could contain
non-default types inside, now these attributes are taken not from the
run instance, but from the `run.dict()` return value.
Causes an issue for this code
```python
from langchain.chat_models.openai import ChatOpenAI
from langchain.output_parsers.openai_tools import JsonOutputToolsParser
from langchain.schema import SystemMessage
prompt = SystemMessage(content="You are a nice assistant.") + "{question}"
llm = ChatOpenAI(
model_kwargs={
"tools": [
{
"type": "function",
"function": {
"name": "web_search",
"description": "Searches the web for the answer to the question.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The question to search for.",
},
},
},
},
}
],
},
streaming=True,
)
parser = JsonOutputToolsParser(first_tool_only=True)
llm_chain = prompt | llm | parser | (lambda x: x)
for chunk in llm_chain.stream({"question": "tell me more about turtles"}):
print(chunk)
# message = llm_chain.invoke({"question": "tell me more about turtles"})
# print(message)
```
Instead by definition, we'll assume that RunnableLambdas consume the
entire stream and that if the stream isn't addable then it's the last
message of the stream that's in the usable format.
---
If users want to use addable dicts, they can wrap the dict in an
AddableDict class.
---
Likely, need to follow up with the same change for other places in the
code that do the upgrade
- **Description:** In January, Laiyer.ai became part of ProtectAI, which
means the model became owned by ProtectAI. In addition to that,
yesterday, we released a new version of the model addressing issues the
Langchain's community and others mentioned to us about false-positives.
The new model has a better accuracy compared to the previous version,
and we thought the Langchain community would benefit from using the
[latest version of the
model](https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2).
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @alex_yaremchuk
This PR moves the implementations for chat history to core. So it's
easier to determine which dependencies need to be broken / add
deprecation warnings
Fixed an error in the sample code to ensure that the code can run
directly.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
langchain_community.document_loaders depricated
new langchain_google_community
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
docs: Fix link for `partition_pdf` in Semi_Structured_RAG.ipynb cookbook
- **Description:** Fix incorrect link to unstructured-io `partition_pdf`
section
Vector indexes in ClickHouse are experimental at the moment and can
sometimes break/change behaviour. So this PR makes it possible to say
that you don't want to specify an index type.
Any queries against the embedding column will be brute force/linear
scan, but that gives reasonable performance for small-medium dataset
sizes.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "docs: added a description of differences
langchain_google_genai vs langchain_google_vertexai"
- [ ]
- **Description:** added a description of differences
langchain_google_genai vs langchain_google_vertexai
**Description:** implemented GraphStore class for Apache Age graph db
**Dependencies:** depends on psycopg2
Unit and integration tests included. Formatting and linting have been
run.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Update Neo4j Cypher templates to use function callback to pass context
instead of passing it in user prompt.
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** This pull request removes a duplicated `--quiet` flag
in the pip install command found in the LangSmith Walkthrough section of
the documentation.
**Issue:** N/A
**Dependencies:** None
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: docs"
- [ ] **PR message**:
- **Description:** Updated Tutorials for Vertex Vector Search
- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
@lkuligin for review
---------
Co-authored-by: adityarane@google.com <adityarane@google.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This pull request corrects a mistake in the variable name within the
example code. The variable doc_schema has been changed to dog_schema to
fix the error.
Description: you don't need to pass a version for Replicate official
models. That was broken on LangChain until now!
You can now run:
```
llm = Replicate(
model="meta/meta-llama-3-8b-instruct",
model_kwargs={"temperature": 0.75, "max_length": 500, "top_p": 1},
)
prompt = """
User: Answer the following yes/no question by reasoning step by step. Can a dog drive a car?
Assistant:
"""
llm(prompt)
```
I've updated the replicate.ipynb to reflect that.
twitter: @charliebholtz
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
ZhipuAI API only accepts `temperature` parameter between `(0, 1)` open
interval, and if `0` is passed, it responds with status code `400`.
However, 0 and 1 is often accepted by other APIs, for example, OpenAI
allows `[0, 2]` for temperature closed range.
This PR truncates temperature parameter passed to `[0.01, 0.99]` to
improve the compatibility between langchain's ecosystem's and ZhipuAI
(e.g., ragas `evaluate` often generates temperature 0, which results in
a lot of 400 invalid responses). The PR also truncates `top_p` parameter
since it has the same restriction.
Reference: [glm-4 doc](https://open.bigmodel.cn/dev/api#glm-4) (which
unfortunately is in Chinese though).
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
faster-whisper is a reimplementation of OpenAI's Whisper model using
CTranslate2, which is up to 4 times faster than enai/whisper for the
same accuracy while using less memory. The efficiency can be further
improved with 8-bit quantization on both CPU and GPU.
It can automatically detect the following 14 languages and transcribe
the text into their respective languages: en, zh, fr, de, ja, ko, ru,
es, th, it, pt, vi, ar, tr.
The gitbub repository for faster-whisper is :
https://github.com/SYSTRAN/faster-whisper
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
VSDX data contains EMF files. Some of these apparently can contain
exploits with some Adobe tools.
This is likely a false positive from antivirus software, but we
can remove it nonetheless.
Hey @eyurtsev, I noticed that the notebook isn't displaying the outputs
properly. I've gone ahead and rerun the cells to ensure that readers can
easily understand the functionality without having to run the code
themselves.
Replaced `from langchain.prompts` with `from langchain_core.prompts`
where it is appropriate.
Most of the changes go to `langchain_experimental`
Similar to #20348
…gFaceTextGenInference)
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for [HuggingFaceTextGenInference]
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in [HuggingFaceTextGenInference]
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
fix timeout issue
fix zhipuai usecase notebookbook
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
fixed broken `LangGraph` hyperlink
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
@rgupta2508 I believe this change is necessary following
https://github.com/langchain-ai/langchain/pull/20318 because of how
Milvus handles defaults:
59bf5e811a/pymilvus/client/prepare.py (L82-L85)
```python
num_shards = kwargs[next(iter(same_key))]
if not isinstance(num_shards, int):
msg = f"invalid num_shards type, got {type(num_shards)}, expected int"
raise ParamError(message=msg)
req.shards_num = num_shards
```
this way lets Milvus control the default value (instead of maintaining a
separate default in Langchain).
Let me know if I've got this wrong or you feel it's unnecessary. Thanks.
To support number of the shards for the collection to create in milvus
vvectorstores.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description:** Move `FileCallbackHandler` from community to core
**Issue:** #20493
**Dependencies:** None
(imo) `FileCallbackHandler` is a built-in LangChain callback handler
like `StdOutCallbackHandler` and should properly be in in core.
- **Description:** added the headless parameter as optional argument to
the langchain_community.document_loaders AsyncChromiumLoader class
- **Dependencies:** None
- **Twitter handle:** @perinim_98
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- would happen when user's code tries to access attritbute that doesnt
exist, we prefer to let this crash in the user's code, rather than here
- also catch more cases where a runnable is invoked/streamed inside a
lambda. before we weren't seeing these as deps
**Description:** currently, the `DirectoryLoader` progress-bar maximum value is based on an incorrect number of files to process
In langchain_community/document_loaders/directory.py:127:
```python
paths = p.rglob(self.glob) if self.recursive else p.glob(self.glob)
items = [
path
for path in paths
if not (self.exclude and any(path.match(glob) for glob in self.exclude))
]
```
`paths` returns both files and directories. `items` is later used to determine the maximum value of the progress-bar which gives an incorrect progress indication.
- Add functions (_stream, _astream)
- Connect to _generate and _agenerate
Thank you for contributing to LangChain!
- [x] **PR title**: "community: Add streaming logic in ChatHuggingFace"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Addition functions (_stream, _astream) and connection
to _generate and _agenerate
- **Issue:** #18782
- **Dependencies:** none
- **Twitter handle:** @lunara_x
**Community: Unify Titan Takeoff Integrations and Adding Embedding
Support**
**Description:**
Titan Takeoff no longer reflects this either of the integrations in the
community folder. The two integrations (TitanTakeoffPro and
TitanTakeoff) where causing confusion with clients, so have moved code
into one place and created an alias for backwards compatibility. Added
Takeoff Client python package to do the bulk of the work with the
requests, this is because this package is actively updated with new
versions of Takeoff. So this integration will be far more robust and
will not degrade as badly over time.
**Issue:**
Fixes bugs in the old Titan integrations and unified the code with added
unit test converge to avoid future problems.
**Dependencies:**
Added optional dependency takeoff-client, all imports still work without
dependency including the Titan Takeoff classes but just will fail on
initialisation if not pip installed takeoff-client
**Twitter**
@MeryemArik9
Thanks all :)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Description: Add support for authorized identities in PebbloSafeLoader.
Now with this change, PebbloSafeLoader will extract
authorized_identities from metadata and send it to pebblo server
Dependencies: None
Documentation: None
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
From `langchain_community 0.0.30`, there's a bug that cannot send a
file-like object via `file` parameter instead of `file path` due to
casting the `file_path` to str type even if `file_path` is None.
which means that when I call the `partition_via_api()`, exactly one of
`filename` and `file` must be specified by the following error message.
however, from `langchain_community 0.0.30`, `file_path` is casted into
`str` type even `file_path` is None in `get_elements_from_api()` and got
an error at `exactly_one(filename=filename, file=file)`.
here's an error message
```
---> 51 exactly_one(filename=filename, file=file)
53 if metadata_filename and file_filename:
54 raise ValueError(
55 "Only one of metadata_filename and file_filename is specified. "
56 "metadata_filename is preferred. file_filename is marked for deprecation.",
57 )
File /opt/homebrew/lib/python3.11/site-packages/unstructured/partition/common.py:441, in exactly_one(**kwargs)
439 else:
440 message = f"{names[0]} must be specified."
--> 441 raise ValueError(message)
ValueError: Exactly one of filename and file must be specified.
```
So, I simply made a change that casting to str type when `file_path` is
not None.
I use `UnstructuredAPIFileLoader` like below.
```
from langchain_community.document_loaders.unstructured import UnstructuredAPIFileLoader
documents: list = UnstructuredAPIFileLoader(
file_path=None,
file=file, # file-like object, io.BytesIO type
mode='elements',
url='http://127.0.0.1:8000/general/v0/general',
content_type='application/pdf',
metadata_filename='asdf.pdf',
).load_and_split()
```
- [x] **PR title**: "community: improve kuzu cypher generation prompt"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Improves the Kùzu Cypher generation prompt to be more
robust to open source LLM outputs
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @kuzudb
- [x] **Add tests and docs**: If you're adding a new integration, please
include
No new tests (non-breaking. change)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
## Description:
The PR introduces 3 changes:
1. added `recursive` property to `O365BaseLoader`. (To keep the behavior
unchanged, by default is set to `False`). When `recursive=True`,
`_load_from_folder()` also recursively loads all nested folders.
2. added `folder_id` to SharePointLoader.(similar to (this
PR)[https://github.com/langchain-ai/langchain/pull/10780] ) This
provides an alternative to `folder_path` that doesn't seem to reliably
work.
3. when none of `document_ids`, `folder_id`, `folder_path` is provided,
the loader fetches documets from root folder. Combined with
`recursive=True` this provides an easy way of loading all compatible
documents from SharePoint.
The PR contains the same logic as [this stale
PR](https://github.com/langchain-ai/langchain/pull/10780) by
@WaleedAlfaris. I'd like to ask his blessing for moving forward with
this one.
## Issue:
- As described in https://github.com/langchain-ai/langchain/issues/19938
and https://github.com/langchain-ai/langchain/pull/10780 the sharepoint
loader often does not seem to work with folder_path.
- Recursive loading of subfolders is a missing functionality
## Dependecies: None
Twitter handle:
@martintriska1 @WRhetoric
This is my first PR here, please be gentle :-)
Please review @baskaryan
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
This PR updates OctoAIEndpoint LLM to subclass BaseOpenAI as OctoAI is
an OpenAI-compatible service. The documentation and tests have also been
updated.
**Description:** Adds ThirdAI NeuralDB retriever integration. NeuralDB
is a CPU-friendly and fine-tunable text retrieval engine. We previously
added a vector store integration but we think that it will be easier for
our customers if they can also find us under under
langchain-community/retrievers.
---------
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
**Description:** Make ChatDatabricks model supports stream
**Issue:** N/A
**Dependencies:** MLflow nightly build version (we will release next
MLflow version soon)
**Twitter handle:** N/A
Manually test:
(Before testing, please install `pip install
git+https://github.com/mlflow/mlflow.git`)
```python
# Test Databricks Foundation LLM model
from langchain.chat_models import ChatDatabricks
chat_model = ChatDatabricks(
endpoint="databricks-llama-2-70b-chat",
max_tokens=500
)
from langchain_core.messages import AIMessageChunk
for chunk in chat_model.stream("What is mlflow?"):
print(chunk.content, end="|")
```
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- Add conditional: bool property to json representation of the graphs
- Add option to generate mermaid graph stripped of styles (useful as a
text representation of graph)
…s arg too
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:**
This PR adds a callback handler for UpTrain. It performs evaluations in
the RAG pipeline to check the quality of retrieved documents, generated
queries and responses.
- **Dependencies:**
- The UpTrainCallbackHandler requires the uptrain package
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
enviroment variable ANTHROPIC_API_URL will not work if anthropic_api_url
has default value
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
**Description**: Support filter by OR and AND for deprecated PGVector
version
**Issue**: #20445
**Dependencies**: N/A
**Twitter** handle: @martinferenaz
- **Description:**Add Google Firestore Vector store docs
- **Issue:** NA
- **Dependencies:** NA
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Description: fixes LangChainDeprecationWarning: The class
`langchain_community.embeddings.cohere.CohereEmbeddings` was deprecated
in langchain-community 0.0.30 and will be removed in 0.2.0. An updated
version of the class exists in the langchain-cohere package and should
be used instead. To use it run `pip install -U langchain-cohere` and
import as `from langchain_cohere import CohereEmbeddings`.

Dependencies : langchain_cohere
Twitter handle: @Mo_Noumaan
Description of features on mermaid graph renderer:
- Fixing CDN to use official Mermaid JS CDN:
https://www.jsdelivr.com/package/npm/mermaid?tab=files
- Add device_scale_factor to allow increasing quality of resulting PNG.
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for [DeepInfra]
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in [DeepInfra]
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Description: This update refines the documentation for
`RunnablePassthrough` by removing an unnecessary import and correcting a
minor syntactical error in the example provided. This change enhances
the clarity and correctness of the documentation, ensuring that users
have a more accurate guide to follow.
Issue: N/A
Dependencies: None
This PR focuses solely on documentation improvements, specifically
targeting the `RunnablePassthrough` class within the `langchain_core`
module. By clarifying the example provided in the docstring, users are
offered a more straightforward and error-free guide to utilizing the
`RunnablePassthrough` class effectively.
As this is a documentation update, it does not include changes that
require new integrations, tests, or modifications to dependencies. It
adheres to the guidelines of minimal package interference and backward
compatibility, ensuring that the overall integrity and functionality of
the LangChain package remain unaffected.
Thank you for considering this documentation refinement for inclusion in
the LangChain project.
Fix of YandexGPT embeddings.
The current version uses a single `model_name` for queries and
documents, essentially making the `embed_documents` and `embed_query`
methods the same. Yandex has a different endpoint (`model_uri`) for
encoding documents, see
[this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The
bug may impact retrievers built with `YandexGPTEmbeddings` (for instance
FAISS database as retriever) since they use both `embed_documents` and
`embed_query`.
A simple snippet to test the behaviour:
```python
from langchain_community.embeddings.yandex import YandexGPTEmbeddings
embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
q_emb == doc_emb[0]
```
The response is `True` with the current version and `False` with the
changes I made.
Twitter: @egor_krash
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Updates the documentation for Portkey and Langchain.
Also updates the notebook. The current documentation is fairly old and
is non-functional.
**Twitter handle:** @portkeyai
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
`_ListSQLDatabaseToolInput` raise error if model returns `{}`.
For example, gpt-4-turbo returns `{}` with SQL Agent initialized by
`create_sql_agent`.
So, I set default value `""` for `_ListSQLDatabaseToolInput` tool_input.
This is actually a gpt-4-turbo issue, not a LangChain issue, but I
thought it would be helpful to set a default value `""`.
This problem is discussed in detail in the following Issue.
**Issue:** https://github.com/langchain-ai/langchain/issues/20405
**Dependencies:** none
Sorry, I did not add or change the test code, as tests for this
components was not exist .
However, I have tested the following code based on the [SQL Agent
Document](https://python.langchain.com/docs/use_cases/sql/agents/), to
make sure it works.
```
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_community.utilities.sql_database import SQLDatabase
from langchain_openai import ChatOpenAI
db = SQLDatabase.from_uri("sqlite:///Chinook.db")
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
agent_executor = create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)
result = agent_executor.invoke("List the total sales per country. Which country's customers spent the most?")
print(result["output"])
```
- **Description:** Complete the support for Lua code in
langchain.text_splitter module.
- **Dependencies:** No
- **Twitter handle:** @saberuster
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
```python
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_groq import ChatGroq
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
model = ChatGroq(model_name="mixtral-8x7b-32768", temperature=0)
@tool
def magic_function(input: int) -> int:
"""Applies a magic function to an input."""
return input + 2
tools = [magic_function]
agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is the value of magic_function(3)?"})
```
```
> Entering new AgentExecutor chain...
Invoking: `magic_function` with `{'input': 3}`
5The value of magic\_function(3) is 5.
> Finished chain.
{'input': 'what is the value of magic_function(3)?',
'output': 'The value of magic\\_function(3) is 5.'}
```
**Description:** Masking of the API key for AI21 models
**Issue:** Fixes#12165 for AI21
**Dependencies:** None
Note: This fix came in originally through #12418 but was possibly missed
in the refactor to the AI21 partner package
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Replaced all `from langchain.callbacks` into `from
langchain_core.callbacks` .
Changes in the `langchain` and `langchain_experimental`
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description**: The pydantic schema fields are supposed to be
optional but the use of `...` makes them required. This causes a
`ValidationError` when running the example code. I replaced `...` with
`default=None` to make the fields optional as intended. I also
standardized the format for all fields.
- **Issue**: n/a
- **Dependencies**: none
- **Twitter handle**: https://twitter.com/m_atoms
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for Llamafile
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in community llamafile.py
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
spelling error fixed
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- [x] **PR title**: community[patch]: Invoke callback prior to yielding
token fix for HuggingFaceEndpoint
- [x] **PR message**:
- **Description:** Invoke callback prior to yielding token in stream
method in community HuggingFaceEndpoint
- **Issue:** https://github.com/langchain-ai/langchain/issues/16913
- **Dependencies:** None
- **Twitter handle:** @bolun_zhang
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Added the [FireCrawl](https://firecrawl.dev) document loader. Firecrawl
crawls and convert any website into LLM-ready data. It crawls all
accessible subpages and give you clean markdown for each.
- **Description:** Adds FireCrawl data loader
- **Dependencies:** firecrawl-py
- **Twitter handle:** @mendableai
ccing contributors: (@ericciarla @nickscamara)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
LLMs might sometimes return invalid response for LLM graph transformer.
Instead of failing due to pydantic validation, we skip it and manually
check and optionally fix error where we can, so that more information
gets extracted
- **Description:** Added cross-links for easy access of api
documentation of each output parser class from it's description page.
- **Issue:** related to issue #19969
Co-authored-by: Haris Ali <haris.ali@formulatrix.com>
avaliable -> available
- **Description:** fixed typo
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
Mistral gives us one ID per response, no individual IDs for tool calls.
```python
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_mistralai import ChatMistralAI
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
model = ChatMistralAI(model="mistral-large-latest", temperature=0)
@tool
def magic_function(input: int) -> int:
"""Applies a magic function to an input."""
return input + 2
tools = [magic_function]
agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is the value of magic_function(3)?"})
```
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:** Adds chroma to the partners package. Tests & code
mirror those in the community package.
**Dependencies:** None
**Twitter handle:** @akiradev0x
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR should make it easier for linters to do type checking and for IDEs to jump to definition of code.
See #20050 as a template for this PR.
- As a byproduct: Added 3 missed `test_imports`.
- Added missed `SolarChat` in to __init___.py Added it into test_import
ut.
- Added `# type: ignore` to fix linting. It is not clear, why linting
errors appear after ^ changes.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
```python
from langchain.agents import AgentExecutor, create_tool_calling_agent, tool
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
MessagesPlaceholder("chat_history", optional=True),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
]
)
model = ChatAnthropic(model="claude-3-opus-20240229")
@tool
def magic_function(input: int) -> int:
"""Applies a magic function to an input."""
return input + 2
tools = [magic_function]
agent = create_tool_calling_agent(model, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is the value of magic_function(3)?"})
```
```
> Entering new AgentExecutor chain...
Invoking: `magic_function` with `{'input': 3}`
responded: [{'text': '<thinking>\nThe user has asked for the value of magic_function applied to the input 3. Looking at the available tools, magic_function is the relevant one to use here, as it takes an integer input and returns an integer output.\n\nThe magic_function has one required parameter:\n- input (integer)\n\nThe user has directly provided the value 3 for the input parameter. Since the required parameter is present, we can proceed with calling the function.\n</thinking>', 'type': 'text'}, {'id': 'toolu_01HsTheJPA5mcipuFDBbJ1CW', 'input': {'input': 3}, 'name': 'magic_function', 'type': 'tool_use'}]
5
Therefore, the value of magic_function(3) is 5.
> Finished chain.
{'input': 'what is the value of magic_function(3)?',
'output': 'Therefore, the value of magic_function(3) is 5.'}
```
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
core[minor], langchain[patch], openai[minor], anthropic[minor], fireworks[minor], groq[minor], mistralai[minor]
```python
class ToolCall(TypedDict):
name: str
args: Dict[str, Any]
id: Optional[str]
class InvalidToolCall(TypedDict):
name: Optional[str]
args: Optional[str]
id: Optional[str]
error: Optional[str]
class ToolCallChunk(TypedDict):
name: Optional[str]
args: Optional[str]
id: Optional[str]
index: Optional[int]
class AIMessage(BaseMessage):
...
tool_calls: List[ToolCall] = []
invalid_tool_calls: List[InvalidToolCall] = []
...
class AIMessageChunk(AIMessage, BaseMessageChunk):
...
tool_call_chunks: Optional[List[ToolCallChunk]] = None
...
```
Important considerations:
- Parsing logic occurs within different providers;
- ~Changing output type is a breaking change for anyone doing explicit
type checking;~
- ~Langsmith rendering will need to be updated:
https://github.com/langchain-ai/langchainplus/pull/3561~
- ~Langserve will need to be updated~
- Adding chunks:
- ~AIMessage + ToolCallsMessage = ToolCallsMessage if either has
non-null .tool_calls.~
- Tool call chunks are appended, merging when having equal values of
`index`.
- additional_kwargs accumulate the normal way.
- During streaming:
- ~Messages can change types (e.g., from AIMessageChunk to
AIToolCallsMessageChunk)~
- Output parsers parse additional_kwargs (during .invoke they read off
tool calls).
Packages outside of `partners/`:
- https://github.com/langchain-ai/langchain-cohere/pull/7
- https://github.com/langchain-ai/langchain-google/pull/123/files
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Description: When multithreading is set to True and using the
DirectoryLoader, there was a bug that caused the return type to be a
double nested list. This resulted in other places upstream not being
able to utilize the from_documents method as it was no longer a
`List[Documents]` it was a `List[List[Documents]]`. The change made was
to just loop through the `future.result()` and yield every item.
Issue: #20093
Dependencies: N/A
Twitter handle: N/A
This unit test fails likely validation by the openai client.
Newer openai library seems to be doing more validation so the existing
test fails since http_client needs to be of httpx instance
- **Description**: fixes BooleanOutputParser detecting sub-words ("NOW
this is likely (YES)" -> `True`, not `AmbiguousError`)
- **Issue(s)**: fixes#11408 (follow-up to #17810)
- **Dependencies**: None
- **GitHub handle**: @casperdcl
<!-- if unreviewd after a few days, @-mention one of baskaryan, efriis,
eyurtsev, hwchase17 -->
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
**Description:**
Use the `Stream` context managers in `ChatOpenAi` `stream` and `astream`
method.
Using the context manager returned by the OpenAI client makes it
possible to terminate the stream early since the response connection
will be closed when the context manager exists.
**Issue:** #5340
**Twitter handle:** @snopoke
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** Bug fix. Removed extra line in `GCSDirectoryLoader`
to allow catching Exceptions. Now also logs the file path if Exception
is raised for easier debugging.
- **Issue:** #20198 Bug since langchain-community==0.0.31
- **Dependencies:** No change
- **Twitter handle:** timothywong731
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- make Tencent Cloud VectorDB support metadata filtering.
- implement delete function for Tencent Cloud VectorDB.
- support both Langchain Embedding model and Tencent Cloud VDB embedding
model.
- Tencent Cloud VectorDB support filter search keyword, compatible with
langchain filtering syntax.
- add Tencent Cloud VectorDB TranslationVisitor, now work with self
query retriever.
- more documentations.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- **Description:** In this PR I fixed the links which points to the API
docs for classes in OpenAI functions and OpenAI tools section of output
parsers.
- **Issue:** It fixed the issue #19969
Co-authored-by: Haris Ali <haris.ali@formulatrix.com>
Issue `langchain_community.cross_encoders` didn't have flattening
namespace code in the __init__.py file.
Changes:
- added code to flattening namespaces (used #20050 as a template)
- added ut for a change
- added missed `test_imports` for `chat_loaders` and
`chat_message_histories` modules
This PR make `request_timeout` and `max_retries` configurable for
ChatAnthropic.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: Add semantic caching and memory using
MongoDB"
- [ ] **PR message**:
- **Description:** This PR introduces functionality for adding semantic
caching and chat message history using MongoDB in RAG applications. By
leveraging the MongoDBCache and MongoDBChatMessageHistory classes,
developers can now enhance their retrieval-augmented generation
applications with efficient semantic caching mechanisms and persistent
conversation histories, improving response times and consistency across
chat sessions.
- **Issue:** N/A
- **Dependencies:** Requires `datasets`, `langchain`,
`langchain-mongodb`, `langchain-openai`, `pymongo`, and `pandas` for
implementation. MongoDB Atlas is used for database services, and the
OpenAI API for model access.
- **Twitter handle:** @richmondalake
Co-authored-by: Erick Friis <erick@langchain.dev>
Issue:
When async_req is the default value True, pinecone client return the
multiprocessing AsyncResult object.
When async_req is set to False, pinecone client return the result
directly. `[{'upserted_count': 1}]` . Calling get() method will throw an
error in this case.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** Langchain-Predibase integration was failing, because
it was not current with the Predibase SDK; in addition, Predibase
integration tests were instantiating the Langchain Community `Predibase`
class with one required argument (`model`) missing. This change updates
the Predibase SDK usage and fixes the integration tests.
- **Twitter handle:** `@alexsherstinsky`
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Last year Microsoft [changed the
name](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search)
of Azure Cognitive Search to Azure AI Search. This PR updates the
Langchain Azure Retriever API and it's associated docs to reflect this
change. It may be confusing for users to see the name Cognitive here and
AI in the Microsoft documentation which is why this is needed. I've also
added a more detailed example to the Azure retriever doc page.
There are more places that need a similar update but I'm breaking it up
so the PRs are not too big 😄 Fixing my errors from the previous PR.
Twitter: @marlene_zw
Two new tests added to test backward compatibility in
`libs/community/tests/integration_tests/retrievers/test_azure_cognitive_search.py`
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** update langchain anthropic templates to support
Claude 3 (iterative search, chain of note, summarization, and XML
response)
- **Issue:** issue # N/A. Stability issues and errors encountered when
trying to use older langchain and anthropic libraries.
- **Dependencies:**
- langchain_anthropic version 0.1.4\
- anthropic package version in the range ">=0.17.0,<1" to support
langchain_anthropic.
- **Twitter handle:** @d_w_b7
- [ x]**Add tests and docs**: If you're adding a new integration, please
include
1. used instructions in the README for testing
- [ x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
After this PR it will be possible to pass a cache instance directly to a
language model. This is useful to allow different language models to use
different caches if needed.
- **Issue:** close#19276
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- Added missed providers
- Added links, descriptions in related examples
- Formatted in a consistent format
Co-authored-by: Erick Friis <erick@langchain.dev>
Updated a page with existing document loaders with links to examples.
Fixed formatting of one example.
Co-authored-by: Erick Friis <erick@langchain.dev>
Issue: The `graph` code was moved into the `community` package a long
ago. But the related documentation is still in the
[use_cases](https://python.langchain.com/docs/use_cases/graph/integrations/diffbot_graphtransformer)
section and not in the `integrations`.
Changes:
- moved the `use_cases/graph/integrations` notebooks into the
`integrations/graphs`
- renamed files and changed titles to follow the consistent format
- redirected old page URLs to new URLs in `vercel.json` and in several
other pages
- added descriptions and links when necessary
- formatted into the consistent format
Should hopefully avoid weird broken link edge cases.
Relative links now trip up the Docusaurus broken link checker, so this
PR also removes them.
Also snuck in a small addition about asyncio
**Description:**
The `LocalFileStore` class can be used to create an on-disk
`CacheBackedEmbeddings` cache. However, the default `umask` settings
gives file/directory write permissions only to the original user. Once
the cache directory is created by the first user, other users cannot
write their own cache entries into the directory.
To make the cache usable by multiple users, this pull request updates
the `LocalFileStore` constructor to allow the permissions for newly
created directories and files to be specified. The specified permissions
override the default `umask` values.
For example, when configured as follows:
```python
file_store = LocalFileStore(temp_dir, chmod_dir=0o770, chmod_file=0o660)
```
then "user" and "group" (but not "other") have permissions to access the
store, which means:
* Anyone in our group could contribute embeddings to the cache.
* If we implement cache cleanup/eviction in the future, anyone in our
group could perform the cleanup.
The default values for the `chmod_dir` and `chmod_file` parameters is
`None`, which retains the original behavior of using the default `umask`
settings.
**Issue:**
Implements enhancement #18075.
**Testing:**
I updated the `LocalFileStore` unit tests to test the permissions.
---------
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
- **Description:** Adds async variants of afrom_texts and
afrom_embeddings into `OpenSearchVectorSearch`, which allows for
`afrom_documents` to be called.
- **Issue:** I implemented this because my use case involves an async
scraper generating documents as and when they're ready to be ingested by
Embedding/OpenSearch
- **Dependencies:** None that I'm aware
Co-authored-by: Ben Mitchell <b.mitchell@reply.com>
This PR supports using Pydantic v2 objects to generate the schema for
the JSONOutputParser (#19441). This also adds a `json_schema` parameter
to allow users to pass any JSON schema to validate with, not just
pydantic.
core/langchain_core/_api[Patch]: mypy ignore fixes#17048
Related to #17048
Applied mypy fixes to below two files:
libs/core/langchain_core/_api/deprecation.py
libs/core/langchain_core/_api/beta_decorator.py
Summary of Fixes:
**Issue 1**
class _deprecated_property(type(obj)): # type: ignore
error: Unsupported dynamic base class "type" [misc]
Fix:
1. Added an __init__ method to _deprecated_property to initialize the
fget, fset, fdel, and __doc__ attributes.
2. In the __get__, __set__, and __delete__ methods, we now use the
self.fget, self.fset, and self.fdel attributes to call the original
methods after emitting the warning.
3. The finalize function now creates an instance of _deprecated_property
with the fget, fset, fdel, and doc attributes from the original obj
property.
**Issue 2**
def finalize( # type: ignore
wrapper: Callable[..., Any], new_doc: str
) -> T:
error: All conditional function variants must have identical
signatures
Fix: Ensured that both definitions of the finalize function have the
same signature
Twitter Handle -
https://x.com/gupteutkarsha?s=11&t=uwHe4C3PPpGRvoO5Qpm1aA
**Description:** Citations are the main addition in this PR. We now emit
them from the multihop agent! Additionally the agent is now more
flexible with observations (`Any` is now accepted), and the Cohere SDK
version is bumped to fix an issue with the most recent version of
pydantic v1 (1.10.15)
- **Description:** In order to use index and aindex in
libs/langchain/langchain/indexes/_api.py, I implemented delete method
and all async methods in opensearch_vector_search
- **Dependencies:** No changes
- **Description:** Improvement for #19599: fixing missing return of
graph.draw_mermaid_png and improve it to make the saving of the rendered
image optional
Co-authored-by: Angel Igareta <angel.igareta@klarna.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description:** Update of Cohere documentation (main provider page)
**Issue:** After addition of the Cohere partner package, the
documentation was out of date
**Dependencies:** None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: deprecating integrations moved to
langchain_google_community"
- [ ] **PR message**: deprecating integrations moved to
langchain_google_community
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
Removes required usage of `requests` from `langchain-core`, all of which
has been deprecated.
- removes Tracer V1 implementations
- removes old `try_load_from_hub` github-based hub implementations
Removal done in a way where imports will still succeed, and usage will
fail with a `RuntimeError`.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** mention not-caching methods in CacheBackedEmbeddings
- **Issue:** n/a I almost created one until I read the code
- **Dependencies:** n/a
- **Twitter handle:** `tarsylia`
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description**: Improves the stability of all Cohere partner package
integration tests. Fixes a bug with document parsing (both dicts and
Documents are handled).
**Description**: This PR simplifies an integration test within the
Cohere partner package:
* It no longer relies on exact model answers
* It no longer relies on a third party tool
This PR completes work for PR #18798 to expose raw tool output in
on_tool_end.
Affected APIs:
* astream_log
* astream_events
* callbacks sent to langsmith via langsmith-sdk
* Any other code that relies on BaseTracer!
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- This ensures ids are stable across streamed chunks
- Multiple messages in batch call get separate ids
- Also fix ids being dropped when combining message chunks
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** add `remove_comments` option (default: True): do not
extract html _comments_,
- **Issue:** None,
- **Dependencies:** None,
- **Tag maintainer:** @nfcampos ,
- **Twitter handle:** peter_v
I ran `make format`, `make lint` and `make test`.
Discussion: I my use case, I prefer to not have the comments in the
extracted text:
* e.g. from a Google tag that is added in the html as comment
* e.g. content that the authors have temporarily hidden to make it non
visible to the regular reader
Removing the comments makes the extracted text more alike the intended
text to be seen by the reader.
**Choice to make:** do we prefer to make the default for this
`remove_comments` option to be True or False?
I have changed it to True in a second commit, since that is how I would
prefer to use it by default. Have the
cleaned text (without technical Google tags etc.) and also closer to the
actually visible and intended content.
I am not sure what is best aligned with the conventions of langchain in
general ...
INITIAL VERSION (new version above):
~**Choice to make:** do we prefer to make the default for this
`ignore_comments` option to be True or False?
I have set it to False now to be backwards compatible. On the other
hand, I would use it mostly with True.
I am not sure what is best aligned with the conventions of langchain in
general ...~
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
As in #19346, this PR exposes `request_timeout` in `BaseCohere`, while
`max_retires` is no longer a parameter of the beneath client
(`cohere.Client`) and it is already configured in
`langchain_cohere.llms.Cohere`.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** the layout of html pages can be variant based on the
bootstrap framework or the styles of the pages. So we need to have a
splitter to transform the html tags to a proper layout and then split
the html content based on the provided list of tags to determine its
html sections. We are using BS4 library along with xslt structure to
split the html content using an section aware approach.
- **Dependencies:** No new dependencies
- **Twitter handle:** @m_setayesh
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
[Dria](https://dria.co/) is a hub of public RAG models for developers to
both contribute and utilize a shared embedding lake. This PR adds a
retriever that can retrieve documents from Dria.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**:
- **Description:** Fix argument translation from OpenAPI spec to OpenAI
function call (and similar)
- **Issue:** OpenGPTs failures with calling Action Server based actions.
- **Dependencies:** None
- **Twitter handle:** mikkorpela
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
~2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.~
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Description: Update `ChatZhipuAI` to support the latest `glm-4` model.
Issue: N/A
Dependencies: httpx, httpx-sse, PyJWT
The previous `ChatZhipuAI` implementation requires the `zhipuai`
package, and cannot call the latest GLM model. This is because
- The old version `zhipuai==1.*` doesn't support the latest model.
- `zhipuai==2.*` requires `pydantic V2`, which is incompatible with
'langchain-community'.
This re-implementation invokes the GLM model by sending HTTP requests to
[open.bigmodel.cn](https://open.bigmodel.cn/dev/api) via the `httpx`
package, and uses the `httpx-sse` package to handle stream events.
---------
Co-authored-by: zR <2448370773@qq.com>
- **Description:** Add functionality to generate Mermaid syntax and
render flowcharts from graph data. This includes support for custom node
colors and edge curve styles, as well as the ability to export the
generated graphs to PNG images using either the Mermaid.INK API or
Pyppeteer for local rendering.
- **Dependencies:** Optional dependencies are `pyppeteer` if rendering
wants to be done using Pypeteer and Javascript code.
---------
Co-authored-by: Angel Igareta <angel.igareta@klarna.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** An additional `U` argument was added for the
instructions to install the pip packages for the MediaWiki Dump Document
loader which was leading to error in installing the package. Removing
the argument fixed the command to install.
**Issue:** #19820
**Dependencies:** No dependency change requierd
**Twitter handle:** [@vardhaman722](https://twitter.com/vardhaman722)
**LangChain** is a framework for developing applications powered by language models. It enables applications that:
- **Are context-aware**: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
**LangChain** is a framework for developing applications powered by large language models (LLMs).
This framework consists of several parts.
- **LangChain Libraries**: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- **[LangChain Templates](templates)**: A collection of easily deployable reference architectures for a wide variety of tasks.
- **[LangServe](https://github.com/langchain-ai/langserve)**: A library for deploying LangChain chains as a REST API.
- **[LangSmith](https://smith.langchain.com)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
- **[LangGraph](https://python.langchain.com/docs/langgraph)**: LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.
For these applications, LangChain simplifies the entire application lifecycle:
The LangChain libraries themselves are made up of several different packages.
- **[`langchain-core`](libs/core)**: Base abstractionsand LangChain Expression Language.
- **[`langchain-community`](libs/community)**: Third party integrations.
- **[`langchain`](libs/langchain)**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **Open-source libraries**: Build your applications using LangChain's [modular building blocks](https://python.langchain.com/docs/expression_language/) and [components](https://python.langchain.com/docs/modules/). Integrate with hundreds of [third-party providers](https://python.langchain.com/docs/integrations/platforms/).
- **Productionization**: Inspect, monitor, and evaluate your apps with [LangSmith](https://python.langchain.com/docs/langsmith/) so that you can constantly optimize and deploy with confidence.
- **Deployment**: Turn any chain into a REST API with [LangServe](https://python.langchain.com/docs/langserve).
### Open-source libraries
- **`langchain-core`**: Base abstractions and LangChain Expression Language.
- **`langchain-community`**: Third party integrations.
- Some integrations have been further split into **partner packages** that only rely on **`langchain-core`**. Examples include **`langchain_openai`** and **`langchain_anthropic`**.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[`LangGraph`](https://python.langchain.com/docs/langgraph)**: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
### Productionization:
- **[LangSmith](https://python.langchain.com/docs/langsmith)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
### Deployment:
- **[LangServe](https://python.langchain.com/docs/langserve)**: A library for deploying LangChain chains as REST APIs.

@@ -72,34 +78,51 @@ And much more! Head to the [Use cases](https://python.langchain.com/docs/use_cas
## 🚀 How does LangChain help?
The main value props of the LangChain libraries are:
1.**Components**: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
1.**Components**: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2.**Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
## LangChain Expression Language (LCEL)
LCEL is the foundation of many of LangChain's components, and is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.
- **[Overview](https://python.langchain.com/docs/expression_language/)**: LCEL and its benefits
- **[Interface](https://python.langchain.com/docs/expression_language/interface)**: The standard interface for LCEL objects
- **[Primitives](https://python.langchain.com/docs/expression_language/primitives)**: More on the primitives LCEL includes
## Components
Components fall into the following **modules**:
**📃 Model I/O:**
This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
This includes [prompt management](https://python.langchain.com/docs/modules/model_io/prompts/), [prompt optimization](https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/), a generic interface for [chat models](https://python.langchain.com/docs/modules/model_io/chat/) and [LLMs](https://python.langchain.com/docs/modules/model_io/llms/), and common utilities for working with [model outputs](https://python.langchain.com/docs/modules/model_io/output_parsers/).
**📚 Retrieval:**
Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
Retrieval Augmented Generation involves [loading data](https://python.langchain.com/docs/modules/data_connection/document_loaders/) from a variety of sources, [preparing it](https://python.langchain.com/docs/modules/data_connection/document_loaders/), [then retrieving it](https://python.langchain.com/docs/modules/data_connection/retrievers/) for use in the generation step.
**🤖 Agents:**
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete done. LangChain provides a [standard interface for agents](https://python.langchain.com/docs/modules/agents/), a [selection of agents](https://python.langchain.com/docs/modules/agents/agent_types/) to choose from, and examples of end-to-end agents.
## 📖 Documentation
Please see [here](https://python.langchain.com) for full documentation, which includes:
- [Getting started](https://python.langchain.com/docs/get_started/introduction): installation, setting up the environment, simple examples
-Overview of the [interfaces](https://python.langchain.com/docs/expression_language/), [modules](https://python.langchain.com/docs/modules/), and [integrations](https://python.langchain.com/docs/integrations/providers)
-[Use case](https://python.langchain.com/docs/use_cases/qa_structured/sql) walkthroughs and best practice [guides](https://python.langchain.com/docs/guides/adapters/openai)
- [LangSmith](https://python.langchain.com/docs/langsmith/), [LangServe](https://python.langchain.com/docs/langserve), and [LangChain Template](https://python.langchain.com/docs/templates/) overviews
- [Reference](https://api.python.langchain.com): full API docs
-[Use case](https://python.langchain.com/docs/use_cases/) walkthroughs and best practice [guides](https://python.langchain.com/docs/guides/)
-Overviews of the [interfaces](https://python.langchain.com/docs/expression_language/), [components](https://python.langchain.com/docs/modules/), and [integrations](https://python.langchain.com/docs/integrations/providers)
You can also check out the full [API Reference docs](https://api.python.langchain.com).
## 🌐 Ecosystem
- [🦜🛠️ LangSmith](https://python.langchain.com/docs/langsmith/): Tracing and evaluating your language model applications and intelligent agents to help you move from prototype to production.
- [🦜🕸️ LangGraph](https://python.langchain.com/docs/langgraph): Creating stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain primitives.
- [🦜🏓 LangServe](https://python.langchain.com/docs/langserve): Deploying LangChain runnables and chains as REST APIs.
- [LangChain Templates](https://python.langchain.com/docs/templates/): Example applications hosted with LangServe.
"query = \"Give me company names that are interesting investments based on EV / NTM and NTM rev growth. Consider EV / NTM multiples vs historical?\"\n",
[press_releases.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/press_releases.ipynb) | Retrieve and query company press release data powered by [Kay.ai](https://kay.ai).
[program_aided_language_model.i...](https://github.com/langchain-ai/langchain/tree/master/cookbook/program_aided_language_model.ipynb) | Implement program-aided language models as described in the provided research paper.
[qa_citations.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/qa_citations.ipynb) | Different ways to get a model to cite its sources.
[rag_upstage_layout_analysis_groundedness_check.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/rag_upstage_layout_analysis_groundedness_check.ipynb) | End-to-end RAG example using Upstage Layout Analysis and Groundedness Check.
[retrieval_in_sql.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/retrieval_in_sql.ipynb) | Perform retrieval-augmented-generation (rag) on a PostgreSQL database using pgvector.
[sales_agent_with_context.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/sales_agent_with_context.ipynb) | Implement a context-aware ai sales agent, salesgpt, that can have natural sales conversations, interact with other systems, and use a product knowledge base to discuss a company's offerings.
[self_query_hotel_search.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/self_query_hotel_search.ipynb) | Build a hotel room search feature with self-querying retrieval, using a specific hotel recommendation dataset.
@@ -56,3 +57,4 @@ Notebook | Description
[two_agent_debate_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_agent_debate_tools.ipynb) | Simulate multi-agent dialogues where the agents can utilize various tools.
[two_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_player_dnd.ipynb) | Simulate a two-player dungeons & dragons game, where a dialogue simulator class is used to coordinate the dialogue between the protagonist and the dungeon master.
[wikibase_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/wikibase_agent.ipynb) | Create a simple wikibase agent that utilizes sparql generation, with testing done on http://wikidata.org.
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.
"Apply to the [`LLaMA2`](https://arxiv.org/pdf/2307.09288.pdf) paper. \n",
"\n",
"We use the Unstructured [`partition_pdf`](https://unstructured-io.github.io/unstructured/bricks/partition.html#partition-pdf), which segments a PDF document by using a layout model. \n",
"We use the Unstructured [`partition_pdf`](https://unstructured-io.github.io/unstructured/core/partition.html#partition-pdf), which segments a PDF document by using a layout model. \n",
"\n",
"This layout model makes it possible to extract elements, such as tables, from pdfs. \n",
"query = \"What percentage of CPI is dedicated to Housing, and how does it compare to the combined percentage of Medical Care, Apparel, and Other Goods and Services?\"\n",
"suffix_for_images = \" Include any pie charts, graphs, or tables.\"\n",
" raise ValueError(\"a KEYSPACE environment variable must be set\")\n",
"\n",
"session.set_keyspace(keyspace)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This needs to be done one time only!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataset used is from Kaggle, the [Environmental Sensor Telemetry Data](https://www.kaggle.com/datasets/garystafford/environmental-sensor-data-132k?select=iot_telemetry_data.csv). The next cell will download and unzip the data into a Pandas dataframe. The following cell is instructions to download manually. \n",
"\n",
"The net result of this section is you should have a Pandas dataframe variable `df`."
"with zip_file.open(csv_file_name) as csv_file:\n",
" df = pd.read_csv(csv_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download Manually"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can download the `.zip` file and unpack the `.csv` contained within. Comment in the next line, and adjust the path to this `.csv` file appropriately."
"WITH COMMENT = 'Data from environmental IoT room sensors. Columns include device identifier, timestamp (ts) of the data collection, carbon monoxide level (co), relative humidity, light presence, LPG concentration, motion detection, smoke concentration, and temperature (temp). Data is partitioned by day and device.';\n",
" description=\"A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.\",\n",
"Here is your task: In the {keyspace} keyspace, find the total number of times the temperature of each device has exceeded 23 degrees on July 14, 2020.\n",
" Create a summary report including the name of the room. Use Pandas if helpful.\n",
"* Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.\n",
"\n",
"* Set the OpenAI API key. Steps to obtain an API key as [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b56412ae",
"metadata": {},
"outputs": [],
"source": [
"import getpass"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "16a20d7a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your MongoDB connection string:········\n"
]
}
],
"source": [
"MONGODB_URI = getpass.getpass(\"Enter your MongoDB connection string:\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "978682d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your OpenAI API key:········\n"
]
}
],
"source": [
"OPENAI_API_KEY = getpass.getpass(\"Enter your OpenAI API key:\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "606081c5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"········\n"
]
}
],
"source": [
"# Optional-- If you want to enable Langsmith -- good for debugging\n",
"rag_chain = retriever_chain | rag_prompt | model | parse_output"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "9618d395",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The best movie to watch when feeling down could be \"Last Action Hero.\" It\\'s a fun and action-packed film that blends reality and fantasy, offering an escape from the real world and providing an entertaining distraction.'"
"'I apologize for the confusion. Another movie that might lift your spirits when you\\'re feeling sad is \"Smilla\\'s Sense of Snow.\" It\\'s a mystery thriller that could engage your mind and distract you from your sadness with its intriguing plot and suspenseful storyline.'"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\n",
" \"question\": \"Hmmm..I don't want to watch that one. Can you suggest something else?\"\n",
"'For a lighter movie option, you might enjoy \"Cousins.\" It\\'s a comedy film set in Barcelona with action and humor, offering a fun and entertaining escape from reality. The storyline is engaging and filled with comedic moments that could help lift your spirits.'"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\"question\": \"How about something more light?\"},\n",
"## Step 7: Get faster responses using Semantic Cache\n",
"\n",
"**NOTE:** Semantic cache only caches the input to the LLM. When using it in retrieval chains, remember that documents retrieved can change between runs resulting in cache misses for semantically similar queries."
"# Oracle AI Vector Search with Document Processing\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
"\n",
" * Partitioning Support\n",
" * Real Application Clusters scalability\n",
" * Exadata smart scans\n",
" * Shard processing across geographically distributed databases\n",
" * Transactions\n",
" * Parallel SQL\n",
" * Disaster recovery\n",
" * Security\n",
" * Oracle Machine Learning\n",
" * Oracle Graph Database\n",
" * Oracle Spatial and Graph\n",
" * Oracle Blockchain\n",
" * JSON\n",
"\n",
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"\n",
" * Loading the documents from various sources using OracleDocLoader\n",
" * Summarizing them within/outside the database using OracleSummary\n",
" * Generating embeddings for them within/outside the database using OracleEmbeddings\n",
" * Chunking them according to different requirements using Advanced Oracle Capabilities from OracleTextSplitter\n",
" * Storing and Indexing them in a Vector Store and querying them for queries in OracleVS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Demo User\n",
"First, create a demo user with all the required privileges. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"User setup done!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"# please make sure this user has sufficient privileges to perform all below\n",
"Let's think about a scenario that the users have some documents in Oracle Database or in a file system. They want to use the data for Oracle AI Vector Search using Langchain.\n",
"\n",
"For that, the users need to do some document preprocessing. The first step would be to read the documents, generate their summary(if needed) and then chunk/split them if needed. After that, they need to generate the embeddings for those chunks and store into Oracle AI Vector Store. Finally, the users will perform some semantic queries on those data. \n",
"\n",
"Oracle AI Vector Search Langchain library provides a range of document processing functionalities including document loading, splitting, generating summary and embeddings.\n",
"\n",
"In the following sections, we will go through how to use Oracle AI Langchain APIs to achieve each of these functionalities individually. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to Demo User\n",
"The following sample code will show how to connect to Oracle Database. "
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
" cursor.execute(create_table_sql)\n",
"\n",
" insert_row_sql = \"\"\"insert into demo_tab values (:1, :2)\"\"\"\n",
" rows_to_insert = [\n",
" (\n",
" 1,\n",
" \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
" ),\n",
" (\n",
" 2,\n",
" \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
" ),\n",
" (\n",
" 3,\n",
" \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
"Now that we have a demo user and a demo table with some data, we just need to do one more setup. For embedding and summary, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings; however, for summary, they don't need to do anything."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load ONNX Model\n",
"\n",
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
"\n",
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
"\n",
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
"\n",
"Here is the sample code to load an ONNX model to Oracle Database:"
"On the other hand, if the users choose to use 3rd party provider to generate embeddings and summary, they need to create credential to access 3rd party provider's end points.\n",
"\n",
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings and summary. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
"\n",
"The following sample code will show how to do that:"
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library provides an API to do that. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. Like before, they just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate summary:"
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The following sample code will show how to do that:"
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate embeddings:"
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
"print(f\"Vector Store Table: {vectorstore.table_name}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above example creates a vector store with DOT_PRODUCT distance strategy. \n",
"\n",
"However, the users can create Oracle AI Vector Store provides different distance strategies. Please see the [comprehensive guide](/docs/integrations/vectorstores/oracle) for more information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have embeddings stored in vector stores, let's create an index on them to get better semantic search performance during query time.\n",
"\n",
"***Note*** If you are getting some insufficient memory error, please increase ***vector_memory_size*** in your database.\n",
"The above example creates a default HNSW index on the embeddings stored in 'oravs' table. The users can set different parameters as per their requirements. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"Also, there are different types of vector indices that the users can create. Please see the [comprehensive guide](/docs/integrations/vectorstores/oracle) for more information.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform Semantic Search\n",
"All set!\n",
"\n",
"We have processed the documents, stored them to vector store, and then created index to get better query performance. Now let's do some semantic searches.\n",
"\n",
"Here is the sample code for this:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
"[]\n",
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
"[]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
"# RAG using Upstage Layout Analysis and Groundedness Check\n",
"This example illustrates RAG using [Upstage](https://python.langchain.com/docs/integrations/providers/upstage/) Layout Analysis and Groundedness Check."
"] += f\". Valid values are {sorted(latest_price['starrating'].value_counts().index.tolist())}\"\n",
"attribute_info[3][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['maxoccupancy'].value_counts().index.tolist())}\"\n",
"attribute_info[-3][\n",
" \"description\"\n",
"] += f\". Valid values are {sorted(latest_price['country'].value_counts().index.tolist())}\""
"attribute_info[-2][\"description\"] += (\n",
" f\". Valid values are {sorted(latest_price['starrating'].value_counts().index.tolist())}\"\n",
")\n",
"attribute_info[3][\"description\"] += (\n",
" f\". Valid values are {sorted(latest_price['maxoccupancy'].value_counts().index.tolist())}\"\n",
")\n",
"attribute_info[-3][\"description\"] += (\n",
" f\". Valid values are {sorted(latest_price['country'].value_counts().index.tolist())}\"\n",
")"
]
},
{
@@ -688,9 +688,9 @@
"metadata": {},
"outputs": [],
"source": [
"attribute_info[-3][\n",
" \"description\"\n",
"] += \". NOTE: Only use the 'eq' operator if a specific country is mentioned. If a region is mentioned, include all relevant countries in filter.\"\n",
"attribute_info[-3][\"description\"] += (\n",
" \". NOTE: Only use the 'eq' operator if a specific country is mentioned. If a region is mentioned, include all relevant countries in filter.\"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I should research ChatGPT to answer this question.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I should research ChatGPT to answer this question.\n",
"Action: Search\n",
"Action Input: \"ChatGPT\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001b[0m\n",
"Action Input: \"ChatGPT\"\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001B[0m\n",
"Cell \u001B[0;32mIn[36], line 1\u001B[0m\n\u001B[0;32m----> 1\u001B[0m \u001B[43magent_executor\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43minvoke\u001B[49m\u001B[43m(\u001B[49m\u001B[43m{\u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43minput\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m:\u001B[49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mWhat is ChatGPT?\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m}\u001B[49m\u001B[43m)\u001B[49m\n",
"agent_executor.invoke({\"input\": \"What is ChatGPT?\"})"
]
},
{
@@ -179,15 +196,15 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out who developed ChatGPT\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to find out who developed ChatGPT\n",
"Action: Search\n",
"Action Input: Who developed ChatGPT\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001b[0m\n",
"Action Input: Who developed ChatGPT\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -202,7 +219,7 @@
}
],
"source": [
"agent_chain.run(input=\"Who developed it?\")"
"agent_executor.invoke({\"input\": \"Who developed it?\"})"
]
},
{
@@ -217,14 +234,14 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"Action: Summary\n",
"Action Input: My daughter 5 years old\u001b[0m\n",
"Action Input: My daughter 5 years old\u001B[0m\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"\u001B[1m> Entering new LLMChain chain...\u001B[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThis is a conversation between a human and a bot:\n",
"\u001B[32;1m\u001B[1;3mThis is a conversation between a human and a bot:\n",
"\n",
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
@@ -232,16 +249,16 @@
"AI: ChatGPT was developed by OpenAI.\n",
"\n",
"Write a summary of the conversation for My daughter 5 years old:\n",
"\u001b[0m\n",
"\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001B[1m> Finished chain.\u001B[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot. It was created by OpenAI and can send and receive images while chatting.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.\u001b[0m\n",
"Observation: \u001B[33;1m\u001B[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot. It was created by OpenAI and can send and receive images while chatting.\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot created by OpenAI that can send and receive images while chatting.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -256,8 +273,8 @@
}
],
"source": [
"agent_chain.run(\n",
" input=\"Thanks. Summarize the conversation, for my daughter 5 years old.\"\n",
"agent_executor.invoke(\n",
" {\"input\": \"Thanks. Summarize the conversation, for my daughter 5 years old.\"}\n",
")"
]
},
@@ -289,9 +306,17 @@
}
],
"source": [
"print(agent_chain.memory.buffer)"
"print(agent_executor.memory.buffer)"
]
},
{
"cell_type": "markdown",
"id": "84ca95c30e262e00",
"metadata": {
"collapsed": false
},
"source": []
},
{
"cell_type": "markdown",
"id": "cc3d0aa4",
@@ -340,25 +365,9 @@
" ),\n",
"]\n",
"\n",
"prefix = \"\"\"Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:\"\"\"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I should research ChatGPT to answer this question.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I should research ChatGPT to answer this question.\n",
"Action: Search\n",
"Action Input: \"ChatGPT\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001b[0m\n",
"Action Input: \"ChatGPT\"\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mNov 30, 2022 ... We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... ChatGPT. We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer... Feb 2, 2023 ... ChatGPT, the popular chatbot from OpenAI, is estimated to have reached 100 million monthly active users in January, just two months after... 2 days ago ... ChatGPT recently launched a new version of its own plagiarism detection tool, with hopes that it will squelch some of the criticism around how... An API for accessing new AI models developed by OpenAI. Feb 19, 2023 ... ChatGPT is an AI chatbot system that OpenAI released in November to show off and test what a very large, powerful AI system can accomplish. You... ChatGPT is fine-tuned from GPT-3.5, a language model trained to produce text. ChatGPT was optimized for dialogue by using Reinforcement Learning with Human... 3 days ago ... Visual ChatGPT connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting. Dec 1, 2022 ... ChatGPT is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with a...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -396,7 +405,7 @@
}
],
"source": [
"agent_chain.run(input=\"What is ChatGPT?\")"
"agent_executor.invoke({\"input\": \"What is ChatGPT?\"})"
]
},
{
@@ -411,15 +420,15 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to find out who developed ChatGPT\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to find out who developed ChatGPT\n",
"Action: Search\n",
"Action Input: Who developed ChatGPT\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001b[0m\n",
"Action Input: Who developed ChatGPT\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large... Feb 15, 2023 ... Who owns Chat GPT? Chat GPT is owned and developed by AI research and deployment company, OpenAI. The organization is headquartered in San... Feb 8, 2023 ... ChatGPT is an AI chatbot developed by San Francisco-based startup OpenAI. OpenAI was co-founded in 2015 by Elon Musk and Sam Altman and is... Dec 7, 2022 ... ChatGPT is an AI chatbot designed and developed by OpenAI. The bot works by generating text responses based on human-user input, like questions... Jan 12, 2023 ... In 2019, Microsoft invested $1 billion in OpenAI, the tiny San Francisco company that designed ChatGPT. And in the years since, it has quietly... Jan 25, 2023 ... The inside story of ChatGPT: How OpenAI founder Sam Altman built the world's hottest technology with billions from Microsoft. Dec 3, 2022 ... ChatGPT went viral on social media for its ability to do anything from code to write essays. · The company that created the AI chatbot has a... Jan 17, 2023 ... While many Americans were nursing hangovers on New Year's Day, 22-year-old Edward Tian was working feverishly on a new app to combat misuse... ChatGPT is a language model created by OpenAI, an artificial intelligence research laboratory consisting of a team of researchers and engineers focused on... 1 day ago ... Everyone is talking about ChatGPT, developed by OpenAI. This is such a great tool that has helped to make AI more accessible to a wider...\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer\n",
"Final Answer: ChatGPT was developed by OpenAI.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -434,7 +443,7 @@
}
],
"source": [
"agent_chain.run(input=\"Who developed it?\")"
"agent_executor.invoke({\"input\": \"Who developed it?\"})"
]
},
{
@@ -449,14 +458,14 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: I need to simplify the conversation for a 5 year old.\n",
"Action: Summary\n",
"Action Input: My daughter 5 years old\u001b[0m\n",
"Action Input: My daughter 5 years old\u001B[0m\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"\u001B[1m> Entering new LLMChain chain...\u001B[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mThis is a conversation between a human and a bot:\n",
"\u001B[32;1m\u001B[1;3mThis is a conversation between a human and a bot:\n",
"\n",
"Human: What is ChatGPT?\n",
"AI: ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is optimized for dialogue by using Reinforcement Learning with Human-in-the-Loop. It is also capable of sending and receiving images during chatting.\n",
@@ -464,16 +473,16 @@
"AI: ChatGPT was developed by OpenAI.\n",
"\n",
"Write a summary of the conversation for My daughter 5 years old:\n",
"\u001b[0m\n",
"\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001B[1m> Finished chain.\u001B[0m\n",
"\n",
"Observation: \u001b[33;1m\u001b[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot developed by OpenAI. It is designed to have conversations with humans and can also send and receive images.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.\u001b[0m\n",
"Observation: \u001B[33;1m\u001B[1;3m\n",
"The conversation was about ChatGPT, an artificial intelligence chatbot developed by OpenAI. It is designed to have conversations with humans and can also send and receive images.\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer.\n",
"Final Answer: ChatGPT is an artificial intelligence chatbot developed by OpenAI that can have conversations with humans and send and receive images.\u001B[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
@@ -488,8 +497,8 @@
}
],
"source": [
"agent_chain.run(\n",
" input=\"Thanks. Summarize the conversation, for my daughter 5 years old.\"\n",
"agent_executor.invoke(\n",
" {\"input\": \"Thanks. Summarize the conversation, for my daughter 5 years old.\"}\n",
" AIMessage(content='The result of \\\\(3 + 5^{2.743}\\\\) is approximately 300.04, and the result of \\\\(17.24 - 918.1241\\\\) is approximately -900.88.', response_metadata={'token_usage': {'completion_tokens': 44, 'prompt_tokens': 251, 'total_tokens': 295}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'stop', 'logprobs': None}, id='run-d1161669-ed09-4b18-94bd-6d8530df5aa8-0')]}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph.invoke(\n",
" {\n",
" \"messages\": [\n",
" HumanMessage(\n",
" \"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"\n",
" )\n",
" ]\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "073c074e-d722-42e0-85ec-c62c079207e4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'messages': [HumanMessage(content=\"what's 3 plus 5 raised to the 2.743. also what's 17.24 - 918.1241\"),\n",
As LangChain continues to grow, the surface area of documentation required to cover it continues to grow too.
This page provides guidelines for anyone writing documentation for LangChain, as well as some of our philosophies around
organization and structure.
## Philosophy
LangChain's documentation aspires to follow the [Diataxis framework](https://diataxis.fr).
Under this framework, all documentation falls under one of four categories:
- **Tutorials**: Lessons that take the reader by the hand through a series of conceptual steps to complete a project.
- An example of this is our [LCEL streaming guide](/docs/expression_language/streaming).
- Our guides on [custom components](/docs/modules/model_io/chat/custom_chat_model) is another one.
- **How-to guides**: Guides that take the reader through the steps required to solve a real-world problem.
- The clearest examples of this are our [Use case](/docs/use_cases/) quickstart pages.
- **Reference**: Technical descriptions of the machinery and how to operate it.
- Our [Runnable interface](/docs/expression_language/interface) page is an example of this.
- The [API reference pages](https://api.python.langchain.com/) are another.
- **Explanation**: Explanations that clarify and illuminate a particular topic.
- The [LCEL primitives pages](/docs/expression_language/primitives/sequence) are an example of this.
Each category serves a distinct purpose and requires a specific approach to writing and structuring the content.
## Taxonomy
Keeping the above in mind, we have sorted LangChain's docs into categories. It is helpful to think in these terms
when contributing new documentation:
### Getting started
The [getting started section](/docs/get_started/introduction) includes a high-level introduction to LangChain, a quickstart that
tours LangChain's various features, and logistical instructions around installation and project setup.
It contains elements of **How-to guides** and **Explanations**.
### Use cases
[Use cases](/docs/use_cases/) are guides that are meant to show how to use LangChain to accomplish a specific task (RAG, information extraction, etc.).
The quickstarts should be good entrypoints for first-time LangChain developers who prefer to learn by getting something practical prototyped,
then taking the pieces apart retrospectively. These should mirror what LangChain is good at.
The quickstart pages here should fit the **How-to guide** category, with the other pages intended to be **Explanations** of more
in-depth concepts and strategies that accompany the main happy paths.
:::note
The below sections are listed roughly in order of increasing level of abstraction.
:::
### Expression Language
[LangChain Expression Language (LCEL)](/docs/expression_language/) is the fundamental way that most LangChain components fit together, and this section is designed to teach
developers how to use it to build with LangChain's primitives effectively.
This section should contains **Tutorials** that teach how to stream and use LCEL primitives for more abstract tasks, **Explanations** of specific behaviors,
and some **References** for how to use different methods in the Runnable interface.
### Components
The [components section](/docs/modules) covers concepts one level of abstraction higher than LCEL.
Abstract base classes like `BaseChatModel` and `BaseRetriever` should be covered here, as well as core implementations of these base classes,
such as `ChatPromptTemplate` and `RecursiveCharacterTextSplitter`. Customization guides belong here too.
This section should contain mostly conceptual **Tutorials**, **References**, and **Explanations** of the components they cover.
:::note
As a general rule of thumb, everything covered in the `Expression Language` and `Components` sections (with the exception of the `Composition` section of components) should
cover only components that exist in `langchain_core`.
:::
### Integrations
The [integrations](/docs/integrations/platforms/) are specific implementations of components. These often involve third-party APIs and services.
If this is the case, as a general rule, these are maintained by the third-party partner.
This section should contain mostly **Explanations** and **References**, though the actual content here is more flexible than other sections and more at the
discretion of the third-party provider.
:::note
Concepts covered in `Integrations` should generally exist in `langchain_community` or specific partner packages.
:::
### Guides and Ecosystem
The [Guides](/docs/guides) and [Ecosystem](/docs/langsmith/) sections should contain guides that address higher-level problems than the sections above.
This includes, but is not limited to, considerations around productionization and development workflows.
These should contain mostly **How-to guides**, **Explanations**, and **Tutorials**.
### API references
LangChain's API references. Should act as **References** (as the name implies) with some **Explanation**-focused content as well.
## Sample developer journey
We have set up our docs to assist a new developer to LangChain. Let's walk through the intended path:
- The developer lands on https://python.langchain.com, and reads through the introduction and the diagram.
- If they are just curious, they may be drawn to the [Quickstart](/docs/get_started/quickstart) to get a high-level tour of what LangChain contains.
- If they have a specific task in mind that they want to accomplish, they will be drawn to the Use-Case section. The use-case should provide a good, concrete hook that shows the value LangChain can provide them and be a good entrypoint to the framework.
- They can then move to learn more about the fundamentals of LangChain through the Expression Language sections.
- Next, they can learn about LangChain's various components and integrations.
- Finally, they can get additional knowledge through the Guides.
This is only an ideal of course - sections will inevitably reference lower or higher-level concepts that are documented in other sections.
## Guidelines
Here are some other guidelines you should think about when writing and organizing documentation.
### Linking to other sections
Because sections of the docs do not exist in a vacuum, it is important to link to other sections as often as possible
to allow a developer to learn more about an unfamiliar topic inline.
This includes linking to the API references as well as conceptual sections!
### Conciseness
In general, take a less-is-more approach. If a section with a good explanation of a concept already exists, you should link to it rather than
re-explain it, unless the concept you are documenting presents some new wrinkle.
Be concise, including in code samples.
### General style
- Use active voice and present tense whenever possible.
- Use examples and code snippets to illustrate concepts and usage.
- Use appropriate header levels (`#`, `##`, `###`, etc.) to organize the content hierarchically.
- Use bullet points and numbered lists to break down information into easily digestible chunks.
- Use tables (especially for **Reference** sections) and diagrams often to present information visually.
- Include the table of contents for longer documentation pages to help readers navigate the content, but hide it for shorter pages.
"# Logic for converting tools to string to go in prompt\n",
"def convert_tools(tools):\n",
" return \"\\n\".join([f\"{tool.name}: {tool.description}\" for tool in tools])"
]
},
{
"cell_type": "markdown",
"id": "260f5988",
"metadata": {},
"source": [
"Building an agent from a runnable usually involves a few things:\n",
"\n",
"1. Data processing for the intermediate steps. These need to be represented in a way that the language model can recognize them. This should be pretty tightly coupled to the instructions in the prompt\n",
"\n",
"2. The prompt itself\n",
"\n",
"3. The model, complete with stop tokens if needed\n",
"\n",
"4. The output parser - should be in sync with how the prompt specifies things to be formatted."
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m <tool>search</tool><tool_input>weather in New York\u001b[0m\u001b[36;1m\u001b[1;3m32 degrees\u001b[0m\u001b[32;1m\u001b[1;3m <tool>search</tool>\n",
"<tool_input>weather in New York\u001b[0m\u001b[36;1m\u001b[1;3m32 degrees\u001b[0m\u001b[32;1m\u001b[1;3m <final_answer>The weather in New York is 32 degrees\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'input': 'whats the weather in New york?',\n",
" 'output': 'The weather in New York is 32 degrees'}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent_executor.invoke({\"input\": \"whats the weather in New york?\"})"
"With LCEL you can easily add [custom routing logic](/docs/expression_language/how_to/routing#using-a-custom-function) to your chain to dynamically determine the chain logic based on user input. All you need to do is define a function that given an input returns a `Runnable`.\n",
"\n",
"One especially useful technique is to use embeddings to route a query to the most relevant prompt. Here's a very simple example."
"A black hole is a region in space where gravity is extremely strong, so strong that nothing, not even light, can escape its gravitational pull. It is formed when a massive star collapses under its own gravity during a supernova explosion. The collapse causes an incredibly dense mass to be concentrated in a small volume, creating a gravitational field that is so intense that it warps space and time. Black holes have a boundary called the event horizon, which marks the point of no return for anything that gets too close. Beyond the event horizon, the gravitational pull is so strong that even light cannot escape, hence the name \"black hole.\" While we have a good understanding of black holes, there is still much to learn, especially about what happens inside them.\n"
]
}
],
"source": [
"print(chain.invoke(\"What's a black hole\"))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f261910d-1de1-4a01-8c8a-308db02b81de",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using MATH\n",
"Thank you for your kind words! I will do my best to break down the concept of a path integral for you.\n",
"\n",
"In mathematics and physics, a path integral is a mathematical tool used to calculate the probability amplitude or wave function of a particle or system of particles. It was introduced by Richard Feynman and is an integral over all possible paths that a particle can take to go from an initial state to a final state.\n",
"\n",
"To understand the concept better, let's consider an example. Suppose we have a particle moving from point A to point B in space. Classically, we would describe this particle's motion using a definite trajectory, but in quantum mechanics, particles can simultaneously take multiple paths from A to B.\n",
"\n",
"The path integral formalism considers all possible paths that the particle could take and assigns a probability amplitude to each path. These probability amplitudes are then added up, taking into account the interference effects between different paths.\n",
"\n",
"To calculate a path integral, we need to define an action, which is a mathematical function that describes the behavior of the system. The action is usually expressed in terms of the particle's position, velocity, and time.\n",
"\n",
"Once we have the action, we can write down the path integral as an integral over all possible paths. Each path is weighted by a factor determined by the action and the principle of least action, which states that a particle takes a path that minimizes the action.\n",
"\n",
"Mathematically, the path integral is expressed as:\n",
"\n",
"∫ e^(iS/ħ) D[x(t)]\n",
"\n",
"Here, S is the action, ħ is the reduced Planck's constant, and D[x(t)] represents the integration over all possible paths x(t) of the particle.\n",
"\n",
"By evaluating this integral, we can obtain the probability amplitude for the particle to go from the initial state to the final state. The absolute square of this amplitude gives us the probability of finding the particle in a particular state.\n",
"\n",
"Path integrals have proven to be a powerful tool in various areas of physics, including quantum mechanics, quantum field theory, and statistical mechanics. They allow us to study complex systems and calculate probabilities that are difficult to obtain using other methods.\n",
"\n",
"I hope this explanation helps you understand the concept of a path integral. If you have any further questions, feel free to ask!\n"
Example code for accomplishing common tasks with the LangChain Expression Language (LCEL). These examples show how to compose different Runnable (the core LCEL interface) components to achieve various tasks. If you're just getting acquainted with LCEL, the [Prompt + LLM](/docs/expression_language/cookbook/prompt_llm_parser) page is a good place to start.
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n",
"AIMessage(content='Harrison was employed at Kensho.')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversational_qa_chain.invoke(\n",
" {\n",
" \"question\": \"where did harrison work?\",\n",
" \"chat_history\": [],\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "424e7e7a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Harrison worked at Kensho.')"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conversational_qa_chain.invoke(\n",
" {\n",
" \"question\": \"where did he work?\",\n",
" \"chat_history\": [\n",
" HumanMessage(content=\"Who wrote this notebook?\"),\n",
" AIMessage(content=\"Harrison\"),\n",
" ],\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c5543183",
"metadata": {},
"source": [
"### With Memory and returning source documents\n",
"\n",
"This shows how to use memory with the above. For memory, we need to manage that outside at the memory. For returning the retrieved documents, we just need to pass them through all the way."
"chain = prompt | model | StrOutputParser() | search"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "55f2967d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'What sports games are on TV today & tonight? Watch and stream live sports on TV today, tonight, tomorrow. Today\\'s 2023 sports TV schedule includes football, basketball, baseball, hockey, motorsports, soccer and more. Watch on TV or stream online on ESPN, FOX, FS1, CBS, NBC, ABC, Peacock, Paramount+, fuboTV, local channels and many other networks. MLB Games Tonight: How to Watch on TV, Streaming & Odds - Thursday, September 7. Seattle Mariners\\' Julio Rodriguez greets teammates in the dugout after scoring against the Oakland Athletics in a ... Circle - Country Music and Lifestyle. Live coverage of all the MLB action today is available to you, with the information provided below. The Brewers will look to pick up a road win at PNC Park against the Pirates on Wednesday at 12:35 PM ET. Check out the latest odds and with BetMGM Sportsbook. Use bonus code \"GNPLAY\" for special offers! MLB Games Tonight: How to Watch on TV, Streaming & Odds - Tuesday, September 5. Houston Astros\\' Kyle Tucker runs after hitting a double during the fourth inning of a baseball game against the Los Angeles Angels, Sunday, Aug. 13, 2023, in Houston. (AP Photo/Eric Christian Smith) (APMedia) The Houston Astros versus the Texas Rangers is one of ... The second half of tonight\\'s college football schedule still has some good games remaining to watch on your television.. We\\'ve already seen an exciting one when Colorado upset TCU. And we saw some ...'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"input\": \"I'd like to figure out what games are tonight\"})"
"We then use the `RunnableParallel` to prepare the expected inputs into the prompt by using the entries for the retrieved documents as well as the original user question, using the retriever for document search, and RunnablePassthrough to pass the user’s question:"
"We then use the `RunnableParallel` to prepare the expected inputs into the prompt by using the entries for the retrieved documents as well as the original user question, using the retriever for document search, and `RunnablePassthrough` to pass the user’s question:"
]
},
{
@@ -509,7 +509,7 @@
"source": [
"## Next steps\n",
"\n",
"We recommend reading our [Why use LCEL](/docs/expression_language/why) section next to see a side-by-side comparison of the code needed to produce common functionality with and without LCEL."
"We recommend reading our [Advantages of LCEL](/docs/expression_language/why) section next to see a side-by-side comparison of the code needed to produce common functionality with and without LCEL."
"# Create a runnable with the `@chain` decorator\n",
"# Create a runnable with the @chain decorator\n",
"\n",
"You can also turn an arbitrary function into a chain by adding a `@chain` decorator. This is functionaly equivalent to wrapping in a [`RunnableLambda`](./functions).\n",
"You can also turn an arbitrary function into a chain by adding a `@chain` decorator. This is functionaly equivalent to wrapping in a [`RunnableLambda`](/docs/expression_language/primitives/functions).\n",
"\n",
"This will have the benefit of improved observability by tracing your chain correctly. Any calls to runnables inside this function will be traced as nested childen.\n",
"There are many possible points of failure in an LLM application, whether that be issues with LLM API's, poor model outputs, issues with other integrations, etc. Fallbacks help you gracefully handle and isolate these issues.\n",
"\n",
"Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level."
]
},
{
"cell_type": "markdown",
"id": "a6bb9ba9",
"metadata": {},
"source": [
"## Handling LLM API Errors\n",
"\n",
"This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these types of things.\n",
"\n",
"IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retrying and not failing."
" print(openai_llm.invoke(\"Why did the chicken cross the road?\"))\n",
" except RateLimitError:\n",
" print(\"Hit error\")"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "4fc1e673",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"content=' I don\\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\\n\\n- To get to the other side!\\n\\n- It was too chicken to just stand there. \\n\\n- It wanted a change of scenery.\\n\\n- It wanted to show the possum it could be done.\\n\\n- It was on its way to a poultry farmers\\' convention.\\n\\nThe joke plays on the double meaning of \"the other side\" - literally crossing the road to the other side, or the \"other side\" meaning the afterlife. So it\\'s an anti-joke, with a silly or unexpected pun as the answer.' additional_kwargs={} example=False\n"
" print(llm.invoke(\"Why did the chicken cross the road?\"))\n",
" except RateLimitError:\n",
" print(\"Hit error\")"
]
},
{
"cell_type": "markdown",
"id": "f00bea25",
"metadata": {},
"source": [
"We can use our \"LLM with Fallbacks\" as we would a normal LLM."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4f8eaaa0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"content=\" I don't actually know why the kangaroo crossed the road, but I'm happy to take a guess! Maybe the kangaroo was trying to get to the other side to find some tasty grass to eat. Or maybe it was trying to get away from a predator or other danger. Kangaroos do need to cross roads and other open areas sometimes as part of their normal activities. Whatever the reason, I'm sure the kangaroo looked both ways before hopping across!\" additional_kwargs={} example=False\n"
"We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two different models: ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely want a different prompt."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "6d0b8056",
"metadata": {},
"outputs": [],
"source": [
"# First let's create a chain with a ChatModel\n",
"# We add in a string output parser here so the outputs between the two are the same type\n",
"title: \"RunnableLambda: Run Custom Functions\"\n",
"keywords: [RunnableLambda, LCEL]\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "fbc4bf6e",
"metadata": {},
"source": [
"# Run custom functions\n",
"\n",
"You can use arbitrary functions in the pipeline.\n",
"\n",
"Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single input and unpacks it into multiple argument."
"Runnable lambdas can optionally accept a [RunnableConfig](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.config.RunnableConfig.html#langchain_core.runnables.config.RunnableConfig), which they can use to pass callbacks, tags, and other configuration information to nested runs."
"title: \"RunnableBranch: Dynamically route logic based on input\"\n",
"title: \"Route logic based on input\"\n",
"keywords: [RunnableBranch, LCEL]\n",
"---"
]
@@ -25,7 +25,7 @@
"\n",
"There are two ways to perform routing:\n",
"\n",
"1. Conditionally return runnables from a [`RunnableLambda`](./functions) (recommended)\n",
"1. Conditionally return runnables from a [`RunnableLambda`](/docs/expression_language/primitives/functions) (recommended)\n",
"2. Using a `RunnableBranch`.\n",
"\n",
"We'll illustrate both methods using a two step sequence where the first step classifies an input question as being about `LangChain`, `Anthropic`, or `Other`, then routes to a corresponding prompt chain."
"AIMessage(content=' As Dario Amodei told me, to use Anthropic IPC you first need to import it:\\n\\n```python\\nfrom anthroipc import ic\\n```\\n\\nThen you can create a client and connect to the server:\\n\\n```python \\nclient = ic.connect()\\n```\\n\\nAfter that, you can call methods on the client and get responses:\\n\\n```python\\nresponse = client.ask(\"What is the meaning of life?\")\\nprint(response)\\n```\\n\\nYou can also register callbacks to handle events: \\n\\n```python\\ndef on_poke(event):\\n print(\"Got poked!\")\\n\\nclient.on(\\'poke\\', on_poke)\\n```\\n\\nAnd that\\'s the basics of using the Anthropic IPC client library for Python! Let me know if you have any other questions!', additional_kwargs={}, example=False)"
"AIMessage(content=\"As Dario Amodei told me, to use Anthropic, you can start by exploring the company's website and learning about their mission, values, and the different services and products they offer. Anthropic is focused on developing safe and ethical AI systems, so they have a strong emphasis on transparency and responsible AI development. \\n\\nDepending on your specific needs, you can look into Anthropic's AI research and development services, which cover areas like natural language processing, computer vision, and reinforcement learning. They also offer consulting and advisory services to help organizations navigate the challenges and opportunities of AI integration.\\n\\nAdditionally, Anthropic has released some open-source AI models and tools that you can explore and experiment with. These can be a great way to get hands-on experience with Anthropic's approach to AI development.\\n\\nOverall, Anthropic aims to be a reliable and trustworthy partner in the AI space, so I'd encourage you to reach out to them directly to discuss how they can best support your specific requirements.\", response_metadata={'id': 'msg_01CtLFgFSwvTaJomrihE87Ra', 'content': [ContentBlock(text=\"As Dario Amodei told me, to use Anthropic, you can start by exploring the company's website and learning about their mission, values, and the different services and products they offer. Anthropic is focused on developing safe and ethical AI systems, so they have a strong emphasis on transparency and responsible AI development. \\n\\nDepending on your specific needs, you can look into Anthropic's AI research and development services, which cover areas like natural language processing, computer vision, and reinforcement learning. They also offer consulting and advisory services to help organizations navigate the challenges and opportunities of AI integration.\\n\\nAdditionally, Anthropic has released some open-source AI models and tools that you can explore and experiment with. These can be a great way to get hands-on experience with Anthropic's approach to AI development.\\n\\nOverall, Anthropic aims to be a reliable and trustworthy partner in the AI space, so I'd encourage you to reach out to them directly to discuss how they can best support your specific requirements.\", type='text')], 'model': 'claude-3-haiku-20240307', 'role': 'assistant', 'stop_reason': 'end_turn', 'stop_sequence': None, 'type': 'message', 'usage': Usage(input_tokens=53, output_tokens=219)})"
]
},
"execution_count": 12,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -191,17 +183,17 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 7,
"id": "48913dc6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' As Harrison Chase told me, to use LangChain you first need to sign up for an API key at platform.langchain.com. Once you have your API key, you can install the Python library and write a simple Python script to call the LangChain API. Here is some sample code to get started:\\n\\n```python\\nimport langchain\\n\\napi_key = \"YOUR_API_KEY\"\\n\\nlangchain.set_key(api_key)\\n\\nresponse = langchain.ask(\"What is the capital of France?\")\\n\\nprint(response.response)\\n```\\n\\nThis will send the question \"What is the capital of France?\" to the LangChain API and print the response. You can customize the request by providing parameters like max_tokens, temperature, etc. The LangChain Python library documentation has more details on the available options. The key things are getting an API key and calling langchain.ask() with your question text. Let me know if you have any other questions!', additional_kwargs={}, example=False)"
"AIMessage(content=\"As Harrison Chase told me, using LangChain involves a few key steps:\\n\\n1. **Set up your environment**: Install the necessary Python packages, including the LangChain library itself, as well as any other dependencies your application might require, such as language models or other integrations.\\n\\n2. **Understand the core concepts**: LangChain revolves around a few core concepts, like Agents, Chains, and Tools. Familiarize yourself with these concepts and how they work together to build powerful language-based applications.\\n\\n3. **Identify your use case**: Determine what kind of task or application you want to build using LangChain, such as a chatbot, a question-answering system, or a document summarization tool.\\n\\n4. **Choose the appropriate components**: Based on your use case, select the right LangChain components, such as agents, chains, and tools, to build your application.\\n\\n5. **Integrate with language models**: LangChain is designed to work seamlessly with various language models, such as OpenAI's GPT-3 or Anthropic's models. Connect your chosen language model to your LangChain application.\\n\\n6. **Implement your application logic**: Use LangChain's building blocks to implement the specific functionality of your application, such as prompting the language model, processing the response, and integrating with other services or data sources.\\n\\n7. **Test and iterate**: Thoroughly test your application, gather feedback, and iterate on your design and implementation to improve its performance and user experience.\\n\\nAs Harrison Chase emphasized, LangChain provides a flexible and powerful framework for building language-based applications, making it easier to leverage the capabilities of modern language models. By following these steps, you can get started with LangChain and create innovative solutions tailored to your specific needs.\", response_metadata={'id': 'msg_01H3UXAAHG4TwxJLpxwuuVU7', 'content': [ContentBlock(text=\"As Harrison Chase told me, using LangChain involves a few key steps:\\n\\n1. **Set up your environment**: Install the necessary Python packages, including the LangChain library itself, as well as any other dependencies your application might require, such as language models or other integrations.\\n\\n2. **Understand the core concepts**: LangChain revolves around a few core concepts, like Agents, Chains, and Tools. Familiarize yourself with these concepts and how they work together to build powerful language-based applications.\\n\\n3. **Identify your use case**: Determine what kind of task or application you want to build using LangChain, such as a chatbot, a question-answering system, or a document summarization tool.\\n\\n4. **Choose the appropriate components**: Based on your use case, select the right LangChain components, such as agents, chains, and tools, to build your application.\\n\\n5. **Integrate with language models**: LangChain is designed to work seamlessly with various language models, such as OpenAI's GPT-3 or Anthropic's models. Connect your chosen language model to your LangChain application.\\n\\n6. **Implement your application logic**: Use LangChain's building blocks to implement the specific functionality of your application, such as prompting the language model, processing the response, and integrating with other services or data sources.\\n\\n7. **Test and iterate**: Thoroughly test your application, gather feedback, and iterate on your design and implementation to improve its performance and user experience.\\n\\nAs Harrison Chase emphasized, LangChain provides a flexible and powerful framework for building language-based applications, making it easier to leverage the capabilities of modern language models. By following these steps, you can get started with LangChain and create innovative solutions tailored to your specific needs.\", type='text')], 'model': 'claude-3-haiku-20240307', 'role': 'assistant', 'stop_reason': 'end_turn', 'stop_sequence': None, 'type': 'message', 'usage': Usage(input_tokens=50, output_tokens=400)})"
"AIMessage(content=\"As Dario Amodei told me, here are some ways to use Anthropic:\\n\\n- Sign up for an account on Anthropic's website to access tools like Claude, Constitutional AI, and Writer. \\n\\n- Use Claude for tasks like email generation, customer service chat, and QA. Claude can understand natural language prompts and provide helpful responses.\\n\\n- Use Constitutional AI if you need an AI assistant that is harmless, honest, and helpful. It is designed to be safe and aligned with human values.\\n\\n- Use Writer to generate natural language content for things like marketing copy, stories, reports, and more. Give it a topic and prompt and it will create high-quality written content.\\n\\n- Check out Anthropic's documentation and blog for tips, tutorials, examples, and announcements about new capabilities as they continue to develop their AI technology.\\n\\n- Follow Anthropic on social media or subscribe to their newsletter to stay up to date on new features and releases.\\n\\n- For most people, the easiest way to leverage Anthropic's technology is through their website - just create an account to get started!\", additional_kwargs={}, example=False)"
"AIMessage(content=\"As Dario Amodei told me, to use Anthropic, you should first familiarize yourself with our mission and principles. Anthropic is committed to developing safe and beneficial artificial intelligence that can help solve important problems facing humanity. \\n\\nTo get started, I recommend exploring the resources on our website, which cover our research, products, and approach to AI development. You can also reach out to our team to learn more about how Anthropic's technology and services can support your specific needs.\\n\\nThe key is to engage with us in a way that aligns with our values of transparency, ethical AI, and a commitment to the wellbeing of humanity. We're here to collaborate and help you harness the power of advanced AI responsibly.\", response_metadata={'id': 'msg_0187BVnpniPDJnVvwf3M1LdY', 'content': [ContentBlock(text=\"As Dario Amodei told me, to use Anthropic, you should first familiarize yourself with our mission and principles. Anthropic is committed to developing safe and beneficial artificial intelligence that can help solve important problems facing humanity. \\n\\nTo get started, I recommend exploring the resources on our website, which cover our research, products, and approach to AI development. You can also reach out to our team to learn more about how Anthropic's technology and services can support your specific needs.\\n\\nThe key is to engage with us in a way that aligns with our values of transparency, ethical AI, and a commitment to the wellbeing of humanity. We're here to collaborate and help you harness the power of advanced AI responsibly.\", type='text')], 'model': 'claude-3-haiku-20240307', 'role': 'assistant', 'stop_reason': 'end_turn', 'stop_sequence': None, 'type': 'message', 'usage': Usage(input_tokens=53, output_tokens=160)})"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "display_data"
"output_type": "execute_result"
}
],
"source": [
@@ -277,18 +270,19 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"id": "8d8caf9b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' As Harrison Chase told me, here is how you use LangChain:\\n\\nLangChain is an AI assistant that can have conversations, answer questions, and generate text. To use LangChain, you simply type or speak your input and LangChain will respond. \\n\\nYou can ask LangChain questions, have discussions, get summaries or explanations about topics, and request it togenerate text on a subject. Some examples of interactions:\\n\\n- Ask general knowledge questions and LangChain will try to answer factually. For example \"What is the capital of France?\"\\n\\n- Have conversations on topics by taking turns speaking. You can prompt the start of a conversation by saying something like \"Let\\'s discuss machine learning\"\\n\\n- Ask for summaries or high-level explanations on subjects. For example \"Can you summarize the main themes in Shakespeare\\'s Hamlet?\" \\n\\n- Give creative writing prompts or requests to have LangChain generate text in different styles. For example \"Write a short children\\'s story about a mouse\" or \"Generate a poem in the style of Robert Frost about nature\"\\n\\n- Correct LangChain if it makes an inaccurate statement and provide the right information. This helps train it.\\n\\nThe key is interacting naturally and giving it clear prompts and requests', additional_kwargs={}, example=False)"
"AIMessage(content=\"As Harrison Chase told me, using LangChain involves several key steps. First, you'll need to install the LangChain library and import the necessary modules. Then, you'll want to define your language model, any data sources you plan to use, and the specific tasks you want to accomplish, such as question answering, text generation, or agent-based reasoning. \\n\\nLangChain provides a flexible framework for building applications that leverage large language models. It includes abstractions for things like retrievers, prompts, and chains, which allow you to compose different components together to create powerful workflows. \\n\\nThe documentation on the LangChain website is excellent and covers many common use cases in detail. I'd recommend starting there to get a solid understanding of the core concepts and how to apply them to your specific needs. And of course, feel free to reach out if you have any other questions - I'm always happy to share more insights from my conversations with Harrison.\", response_metadata={'id': 'msg_01T1naS99wGPkEAP4LME8iAv', 'content': [ContentBlock(text=\"As Harrison Chase told me, using LangChain involves several key steps. First, you'll need to install the LangChain library and import the necessary modules. Then, you'll want to define your language model, any data sources you plan to use, and the specific tasks you want to accomplish, such as question answering, text generation, or agent-based reasoning. \\n\\nLangChain provides a flexible framework for building applications that leverage large language models. It includes abstractions for things like retrievers, prompts, and chains, which allow you to compose different components together to create powerful workflows. \\n\\nThe documentation on the LangChain website is excellent and covers many common use cases in detail. I'd recommend starting there to get a solid understanding of the core concepts and how to apply them to your specific needs. And of course, feel free to reach out if you have any other questions - I'm always happy to share more insights from my conversations with Harrison.\", type='text')], 'model': 'claude-3-haiku-20240307', 'role': 'assistant', 'stop_reason': 'end_turn', 'stop_sequence': None, 'type': 'message', 'usage': Usage(input_tokens=50, output_tokens=205)})"
"As a physics professor, I would be happy to provide a concise and easy-to-understand explanation of what a black hole is.\n",
"\n",
"A black hole is an incredibly dense region of space-time where the gravitational pull is so strong that nothing, not even light, can escape from it. This means that if you were to get too close to a black hole, you would be pulled in and crushed by the intense gravitational forces.\n",
"\n",
"The formation of a black hole occurs when a massive star, much larger than our Sun, reaches the end of its life and collapses in on itself. This collapse causes the matter to become extremely dense, and the gravitational force becomes so strong that it creates a point of no return, known as the event horizon.\n",
"\n",
"Beyond the event horizon, the laws of physics as we know them break down, and the intense gravitational forces create a singularity, which is a point of infinite density and curvature in space-time.\n",
"\n",
"Black holes are fascinating and mysterious objects, and there is still much to be learned about their properties and behavior. If I were unsure about any specific details or aspects of black holes, I would readily admit that I do not have a complete understanding and would encourage further research and investigation.\n"
]
}
],
"source": [
"print(chain.invoke(\"What's a black hole\"))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "df34e469",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using MATH\n",
"A path integral is a powerful mathematical concept in physics, particularly in the field of quantum mechanics. It was developed by the renowned physicist Richard Feynman as an alternative formulation of quantum mechanics.\n",
"\n",
"In a path integral, instead of considering a single, definite path that a particle might take from one point to another, as in classical mechanics, the particle is considered to take all possible paths simultaneously. Each path is assigned a complex-valued weight, and the total probability amplitude for the particle to go from one point to another is calculated by summing (integrating) over all possible paths.\n",
"\n",
"The key ideas behind the path integral formulation are:\n",
"\n",
"1. Superposition principle: In quantum mechanics, particles can exist in a superposition of multiple states or paths simultaneously.\n",
"\n",
"2. Probability amplitude: The probability amplitude for a particle to go from one point to another is calculated by summing the complex-valued weights of all possible paths.\n",
"\n",
"3. Weighting of paths: Each path is assigned a weight based on the action (the time integral of the Lagrangian) along that path. Paths with lower action have a greater weight.\n",
"\n",
"4. Feynman's approach: Feynman developed the path integral formulation as an alternative to the traditional wave function approach in quantum mechanics, providing a more intuitive and conceptual understanding of quantum phenomena.\n",
"\n",
"The path integral approach is particularly useful in quantum field theory, where it provides a powerful framework for calculating transition probabilities and understanding the behavior of quantum systems. It has also found applications in various areas of physics, such as condensed matter, statistical mechanics, and even in finance (the path integral approach to option pricing).\n",
"\n",
"The mathematical construction of the path integral involves the use of advanced concepts from functional analysis and measure theory, making it a powerful and sophisticated tool in the physicist's arsenal.\n"
LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.
LCEL was designed from day 1 to **support putting prototypes in production, with no code changes**, from the simplest “prompt + LLM” chain to the most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of the reasons you might want to use LCEL:
When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until the first chunk of output comes out). For some chains this means eg. we stream tokens straight from an LLM to a streaming output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw tokens.
**Async support**
Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while prototyping) as well as with the asynchronous API (eg. in a [LangServe](/docs/langsmith) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while prototyping) as well as with the asynchronous API (eg. in a [LangServe](/docs/langserve) server). This enables using the same code for prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
Whenever your LCEL chains have steps that can be executed in parallel (eg if you fetch documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest possible latency.
**Retries and fallbacks**
[**Retries and fallbacks**](/docs/guides/productionization/fallbacks)
Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your chains more reliable at scale. We’re currently working on adding streaming support for retries/fallbacks, so you can get the added reliability without any latency cost.
For more complex chains it’s often very useful to access the results of intermediate steps even before the final output is produced. This can be used to let end-users know something is happening, or even just to debug your chain. You can stream intermediate results, and it’s available on every [LangServe](/docs/langserve) server.
**Input and output schemas**
[**Input and output schemas**](/docs/expression_language/interface#input-schema)
Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.
**Seamless LangSmith tracing integration**
[**Seamless LangSmith tracing**](/docs/langsmith)
As your chains get more and more complex, it becomes increasingly important to understand what exactly is happening at every step.
With LCEL, **all** steps are automatically logged to [LangSmith](/docs/langsmith/) for maximum observability and debuggability.
"To make it as easy as possible to create custom chains, we've implemented a [\"Runnable\"](https://api.python.langchain.com/en/stable/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. The `Runnable` protocol is implemented for most components. \n",
"To make it as easy as possible to create custom chains, we've implemented a [\"Runnable\"](https://api.python.langchain.com/en/stable/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol. Many LangChain components implement the `Runnable` protocol, including chat models, LLMs, output parsers, retrievers, prompt templates, and more. There are also several useful primitives for working with runnables, which you can read about [in this section](/docs/expression_language/primitives).\n",
"\n",
"This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way. \n",
"The standard interface includes:\n",
"\n",
@@ -24,7 +25,7 @@
"- [`invoke`](#invoke): call the chain on an input\n",
"- [`batch`](#batch): call the chain on a list of inputs\n",
"\n",
"These also have corresponding async methods:\n",
"These also have corresponding async methods that should be used with [asyncio](https://docs.python.org/3/library/asyncio.html) `await` syntax for concurrency:\n",
"\n",
"- [`astream`](#async-stream): stream back chunks of the response async\n",
"- [`ainvoke`](#async-invoke): call the chain on an input async\n",
"The `RunnablePassthrough.assign(...)` static method takes an input value and adds the extra arguments passed to the assign function.\n",
"\n",
"This is useful when additively creating a dictionary to use as input to a later step, which is a common LCEL pattern.\n",
"\n",
"Here's an example:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mWARNING: You are using pip version 22.0.4; however, version 24.0 is available.\n",
"You should consider upgrading via the '/Users/jacoblee/.pyenv/versions/3.10.5/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n",
"\u001b[0mNote: you may need to restart the kernel to use updated packages.\n"
"- The input to the chain is `{\"num\": 1}`. This is passed into a `RunnableParallel`, which invokes the runnables it is passed in parallel with that input.\n",
"- The value under the `extra` key is invoked. `RunnablePassthrough.assign()` keeps the original keys in the input dict (`{\"num\": 1}`), and assigns a new key called `mult`. The value is `lambda x: x[\"num\"] * 3)`, which is `3`. Thus, the result is `{\"num\": 1, \"mult\": 3}`.\n",
"- `{\"num\": 1, \"mult\": 3}` is returned to the `RunnableParallel` call, and is set as the value to the key `extra`.\n",
"- At the same time, the `modified` key is called. The result is `2`, since the lambda extracts a key called `\"num\"` from its input and adds one.\n",
"\n",
"Thus, the result is `{'extra': {'num': 1, 'mult': 3}, 'modified': 2}`.\n",
"\n",
"## Streaming\n",
"\n",
"One nice feature of this method is that it allows values to pass through as soon as they are available. To show this off, we'll use `RunnablePassthrough.assign()` to immediately return source docs in a retrieval chain:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'question': 'where did harrison work?'}\n",
"{'context': [Document(page_content='harrison worked at kensho')]}\n",
"stream = retrieval_chain.stream(\"where did harrison work?\")\n",
"\n",
"for chunk in stream:\n",
" print(chunk)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the first chunk contains the original `\"question\"` since that is immediately available. The second chunk contains `\"context\"` since the retriever finishes second. Finally, the output from the `generation_chain` streams in chunks as soon as it is available."
"Sometimes we want to invoke a Runnable within a Runnable sequence with constant arguments that are not part of the output of the preceding Runnable in the sequence, and which are not part of the user input. We can use `Runnable.bind()` to easily pass these arguments in.\n",
"Sometimes we want to invoke a Runnable within a Runnable sequence with constant arguments that are not part of the output of the preceding Runnable in the sequence, and which are not part of the user input. We can use `Runnable.bind()` to pass these arguments in.\n",
"\n",
"Suppose we have a simple prompt + model sequence:"
"You can use arbitrary functions in the pipeline.\n",
"\n",
"Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single input and unpacks it into multiple argument."
"Runnable lambdas can optionally accept a [RunnableConfig](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.config.RunnableConfig.html#langchain_core.runnables.config.RunnableConfig), which they can use to pass callbacks, tags, and other configuration information to nested runs."
"RunnableParallel can be useful for manipulating the output of one Runnable to match the input format of the next Runnable in a sequence.\n",
"The `RunnableParallel` primitive is essentially a dict whose values are runnables (or things that can be coerced to runnables, like functions). It runs all of its values in parallel, and each value is called with the overall input of the `RunnableParallel`. The final return value is a dict with the results of each value under its appropriate key.\n",
"\n",
"Here the input to prompt is expected to be a map with keys \"context\" and \"question\". The user input is just the question. So we need to get the context using our retriever and passthrough the user input under the \"question\" key.\n",
"It is useful for parallelizing operations, but can also be useful for manipulating the output of one Runnable to match the input format of the next Runnable in a sequence.\n",
"\n",
"\n"
"Here the input to prompt is expected to be a map with keys \"context\" and \"question\". The user input is just the question. So we need to get the context using our retriever and passthrough the user input under the \"question\" key.\n"
"RunnablePassthrough allows to pass inputs unchanged or with the addition of extra keys. This typically is used in conjuction with RunnableParallel to assign data to a new key in the map. \n",
"\n",
"RunnablePassthrough() called on it's own, will simply take the input and pass it through. \n",
"\n",
"RunnablePassthrough called with assign (`RunnablePassthrough.assign(...)`) will take the input, and will add the extra arguments passed to the assign function. \n",
"RunnablePassthrough on its own allows you to pass inputs unchanged. This typically is used in conjuction with RunnableParallel to pass data through to a new key in the map. \n",
"As seen above, `passed` key was called with `RunnablePassthrough()` and so it simply passed on `{'num': 1}`. \n",
"\n",
"In the second line, we used `RunnablePastshrough.assign` with a lambda that multiplies the numerical value by 3. In this cased, `extra` was set with `{'num': 1, 'mult': 3}` which is the original value with the `mult` key added. \n",
"\n",
"Finally, we also set a third key in the map with `modified` which uses a lambda to set a single value adding 1 to the num, which resulted in `modified` key with the value of `2`."
"We also set a second key in the map with `modified`. This uses a lambda to set a single value adding 1 to the num, which resulted in `modified` key with the value of `2`."
]
},
{
@@ -86,7 +79,7 @@
"source": [
"## Retrieval Example\n",
"\n",
"In the example below, we see a use case where we use RunnablePassthrough along with RunnableMap. "
"In the example below, we see a use case where we use `RunnablePassthrough` along with `RunnableParallel`. "
"One key advantage of the `Runnable` interface is that any two runnables can be \"chained\" together into sequences. The output of the previous runnable's `.invoke()` call is passed as input to the next runnable. This can be done using the pipe operator (`|`), or the more explicit `.pipe()` method, which does the same thing. The resulting `RunnableSequence` is itself a runnable, which means it can be invoked, streamed, or piped just like any other runnable.\n",
"\n",
"## The pipe operator\n",
"\n",
"To show off how this works, let's go through an example. We'll walk through a common pattern in LangChain: using a [prompt template](/docs/modules/model_io/prompts/) to format input into a [chat model](/docs/modules/model_io/chat/), and finally converting the chat message output into a string with an [output parser](/docs/modules/model_io/output_parsers/)."
"Prompts and models are both runnable, and the output type from the prompt call is the same as the input type of the chat model, so we can chain them together. We can then invoke the resulting sequence like any other runnable:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Here's a bear joke for you:\\n\\nWhy don't bears wear socks? \\nBecause they have bear feet!\\n\\nHow's that? I tried to keep it light and silly. Bears can make for some fun puns and jokes. Let me know if you'd like to hear another one!\""
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"topic\": \"bears\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Coercion\n",
"\n",
"We can even combine this chain with more runnables to create another chain. This may involve some input/output formatting using other types of runnables, depending on the required inputs and outputs of the chain components.\n",
"\n",
"For example, let's say we wanted to compose the joke generating chain with another chain that evaluates whether or not the generated joke was funny.\n",
"\n",
"We would need to be careful with how we format the input into the next chain. In the below example, the dict in the chain is automatically parsed and converted into a [`RunnableParallel`](/docs/expression_language/primitives/parallel), which runs all of its values in parallel and returns a dict with the results.\n",
"\n",
"This happens to be the same format the next prompt template expects. Here it is in action:"
"analysis_prompt = ChatPromptTemplate.from_template(\"is this a funny joke? {joke}\")\n",
"\n",
"composed_chain = {\"joke\": chain} | analysis_prompt | model | StrOutputParser()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"That's a pretty classic and well-known bear pun joke. Whether it's considered funny is quite subjective, as humor is very personal. Some people may find that type of pun-based joke amusing, while others may not find it that humorous. Ultimately, the funniness of a joke is in the eye (or ear) of the beholder. If you enjoyed the joke and got a chuckle out of it, then that's what matters most.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"composed_chain.invoke({\"topic\": \"bears\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Functions will also be coerced into runnables, so you can add custom logic to your chains too. The below chain results in the same logical flow as before:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"composed_chain_with_lambda = (\n",
" chain\n",
" | (lambda input: {\"joke\": input})\n",
" | analysis_prompt\n",
" | model\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'I appreciate the effort, but I have to be honest - I didn\\'t find that joke particularly funny. Beet-themed puns can be quite hit-or-miss, and this one falls more on the \"miss\" side for me. The premise is a bit too straightforward and predictable. While I can see the logic behind it, the punchline just doesn\\'t pack much of a comedic punch. \\n\\nThat said, I do admire your willingness to explore puns and wordplay around vegetables. Cultivating a good sense of humor takes practice, and not every joke is going to land. The important thing is to keep experimenting and finding what works. Maybe try for a more unexpected or creative twist on beet-related humor next time. But thanks for sharing - I always appreciate when humans test out jokes on me, even if they don\\'t always make me laugh out loud.'"
"However, keep in mind that using functions like this may interfere with operations like streaming. See [this section](/docs/expression_language/primitives/functions) for more information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The `.pipe()` method\n",
"\n",
"We could also compose the same sequence using the `.pipe()` method. Here's what that looks like:"
"'That\\'s a pretty good Battlestar Galactica-themed pun! I appreciated the clever play on words with \"Centurion\" and \"center on.\" It\\'s the kind of nerdy, science fiction-inspired humor that fans of the show would likely enjoy. The joke is clever and demonstrates a good understanding of the Battlestar Galactica universe. I\\'d be curious to hear any other Battlestar-related jokes you might have up your sleeve. As long as they don\\'t reproduce copyrighted material, I\\'m happy to provide my thoughts on the humor and appeal for fans of the show.'"
"You might notice above that `parser` actually doesn't block the streaming output from the model, and instead processes each chunk individually. Many of the [LCEL primitives](/docs/expression_language/primitives) also support this kind of transform-style passthrough streaming, which can be very convenient when constructing apps.\n",
"\n",
"Certain runnables, like [prompt templates](/docs/modules/model_io/prompts) and [chat models](/docs/modules/model_io/chat), cannot process individual chunks and instead aggregate all previous steps. This will interrupt the streaming process. Custom functions can be [designed to return generators](/docs/expression_language/primitives/functions#streaming), which"
]
},
{
"cell_type": "markdown",
"id": "1b399fb4-5e3c-4581-9570-6df9b42b623d",
"metadata": {},
"source": [
":::{.callout-note}\n",
"You do not have to use the `LangChain Expression Language` to use LangChain and can instead rely on a standard **imperative** programming approach by\n",
"If the above functionality is not relevant to what you're building, you do not have to use the `LangChain Expression Language` to use LangChain and can instead rely on a standard **imperative** programming approach by\n",
"caling `invoke`, `batch` or `stream` on each component individually, assigning the results to variables and then using them downstream as you see fit.\n",
"\n",
"If that works for your needs, then that's fine by us 👌!\n",
@@ -29,13 +33,6 @@ If you want to install from source, you can do so by cloning the repo and be sur
pip install -e .
```
## LangChain community
The `langchain-community` package contains third-party integrations. It is automatically installed by `langchain`, but can also be used separately. Install with:
```bash
pip install langchain-community
```
## LangChain core
The `langchain-core` package contains base abstractions that the rest of the LangChain ecosystem uses, along with the LangChain Expression Language. It is automatically installed by `langchain`, but can also be used separately. Install with:
@@ -43,6 +40,13 @@ The `langchain-core` package contains base abstractions that the rest of the Lan
pip install langchain-core
```
## LangChain community
The `langchain-community` package contains third-party integrations. It is automatically installed by `langchain`, but can also be used separately. Install with:
```bash
pip install langchain-community
```
## LangChain experimental
The `langchain-experimental` package holds experimental LangChain code, intended for research and experimental uses.
Install with:
@@ -51,6 +55,13 @@ Install with:
pip install langchain-experimental
```
## LangGraph
`langgraph` is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain.
Install with:
```bash
pip install langgraph
```
## LangServe
LangServe helps developers deploy LangChain runnables and chains as a REST API.
LangServe is automatically installed by LangChain CLI.
**LangChain** is a framework for developing applications powered by language models. It enables applications that:
- **Are context-aware**: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- **Reason**: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
**LangChain** is a framework for developing applications powered by large language models (LLMs).
This framework consists of several parts.
- **LangChain Libraries**: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- **[LangChain Templates](/docs/templates)**: A collection of easily deployable reference architectures for a wide variety of tasks.
- **[LangServe](/docs/langserve)**: A library for deploying LangChain chains as a REST API.
- **[LangSmith](/docs/langsmith)**: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
LangChain simplifies every stage of the LLM application lifecycle:
- **Development**: Build your applications using LangChain's open-source [building blocks](/docs/expression_language/) and [components](/docs/modules/). Hit the ground running using [third-party integrations](/docs/integrations/platforms/) and [Templates](/docs/templates).
- **Productionization**: Use [LangSmith](/docs/langsmith/) to inspect, monitor and evaluate your chains, so that you can continuously optimize and deploy with confidence.
- **Deployment**: Turn any chain into an API with [LangServe](/docs/langserve).
import ThemedImage from '@theme/ThemedImage';
import useBaseUrl from '@docusaurus/useBaseUrl';
<ThemedImage
alt="Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers."
Together, these products simplify the entire application lifecycle:
- **Develop**: Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.
- **Productionize**: Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy with confidence.
- **Deploy**: Turn any chain into an API with LangServe.
Concretely, the framework consists of the following open-source libraries:
## LangChain Libraries
The main value props of the LangChain packages are:
1. **Components**: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
2. **Off-the-shelf chains**: built-in assemblages of components for accomplishing higher-level tasks
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
The LangChain libraries themselves are made up of several different packages.
- **`langchain-core`**: Base abstractions and LangChain Expression Language.
- **`langchain-community`**: Third party integrations.
- Partner packages (e.g. **`langchain-openai`**, **`langchain-anthropic`**, etc.): Some integrations have been further split into their own lightweight packages that only depend on **`langchain-core`**.
- **`langchain`**: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
- **[langgraph](/docs/langgraph)**: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
- **[langserve](/docs/langserve)**: Deploy LangChain chains as REST APIs.
The broader ecosystem includes:
- **[LangSmith](/docs/langsmith)**: A developer platform that lets you debug, test, evaluate, and monitor LLM applications and seamlessly integrates with LangChain.
## Get started
[Here’s](/docs/get_started/installation) how to install LangChain, set up your environment, and start building.
We recommend following our [Quickstart](/docs/get_started/quickstart) guide to familiarize yourself with the framework by building your first LangChain application.
Read up on our [Security](/docs/security) best practices to make sure you're developing safely with LangChain.
[See here](/docs/get_started/installation) for instructions on how to install LangChain, set up your environment, and start building.
:::note
@@ -57,48 +49,53 @@ These docs focus on the Python LangChain library. [Head here](https://js.langcha
:::
## LangChain Expression Language (LCEL)
## Use cases
LCEL is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.
If you're looking to build something specific or are more of a hands-on learner, check out our [use-cases](/docs/use_cases).
They're walkthroughs and techniques for common end-to-end tasks, such as:
- **[Overview](/docs/expression_language/)**: LCEL and its benefits
- **[Interface](/docs/expression_language/interface)**: The standard interface for LCEL objects
- **[How-to](/docs/expression_language/how_to)**: Key features of LCEL
- **[Cookbook](/docs/expression_language/cookbook)**: Example code for accomplishing common tasks
## Modules
LangChain provides standard, extendable interfaces and integrations for the following modules:
#### [Model I/O](/docs/modules/model_io/)
Interface with language models
#### [Retrieval](/docs/modules/data_connection/)
Interface with application-specific data
#### [Agents](/docs/modules/agents/)
Let models choose which tools to use given high-level directives
LangChain Expression Language (LCEL) is the foundation of many of LangChain's components, and is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.
- **[Get started](/docs/expression_language/)**: LCEL and its benefits
- **[Runnable interface](/docs/expression_language/interface)**: The standard interface for LCEL objects
- **[Primitives](/docs/expression_language/primitives)**: More on the primitives LCEL includes
- and more!
## Ecosystem
### [🦜🛠️ LangSmith](/docs/langsmith)
Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
### [🦜🕸️ LangGraph](/docs/langgraph)
Build stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain primitives.
### [🦜🏓 LangServe](/docs/langserve)
Deploy LangChain runnables and chains as REST APIs.
## [Security](/docs/security)
Read up on our [Security](/docs/security) best practices to make sure you're developing safely with LangChain.
## Additional resources
### [Components](/docs/modules/)
LangChain provides standard, extendable interfaces and integrations for many different components, including:
### [Integrations](/docs/integrations/providers/)
LangChain is part of a rich ecosystem of tools that integrate with our framework and build on top of it. Check out our growing list of [integrations](/docs/integrations/providers/).
@@ -90,12 +94,12 @@ from langchain_openai import ChatOpenAI
llm = ChatOpenAI()
```
If you'd prefer not to set an environment variable you can pass the key in directly via the `openai_api_key` named parameter when initiating the OpenAI LLM class:
If you'd prefer not to set an environment variable you can pass the key in directly via the `api_key` named parameter when initiating the OpenAI LLM class:
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(openai_api_key="...")
llm = ChatOpenAI(api_key="...")
```
</TabItem>
@@ -137,10 +141,10 @@ from langchain_anthropic import ChatAnthropic
If you'd prefer not to set an environment variable you can pass the key in directly via the `anthropic_api_key` named parameter when initiating the Anthropic Chat Model class:
If you'd prefer not to set an environment variable you can pass the key in directly via the `api_key` named parameter when initiating the Anthropic Chat Model class:
First we'll need to import the Cohere SDK package.
```shell
pip install cohere
pip install langchain-cohere
```
Accessing the API requires an API key, which you can get by creating an account and heading [here](https://dashboard.cohere.com/api-keys). Once we have a key we'll want to set it as an environment variable by running:
@@ -161,7 +165,7 @@ export COHERE_API_KEY="..."
We can then initialize the model:
```python
from langchain_community.chat_models import ChatCohere
from langchain_cohere import ChatCohere
llm = ChatCohere()
```
@@ -169,7 +173,7 @@ llm = ChatCohere()
If you'd prefer not to set an environment variable you can pass the key in directly via the `cohere_api_key` named parameter when initiating the Cohere LLM class:
```python
from langchain_community.chat_models import ChatCohere
from langchain_cohere import ChatCohere
llm = ChatCohere(cohere_api_key="...")
```
@@ -190,7 +194,7 @@ Prompt templates convert raw user input to better input to the LLM.
```python
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
("system", "You are world class technical documentation writer."),
("system", "You are a world class technical documentation writer."),
@@ -8,11 +8,11 @@ Here are a few different tools and functionalities to aid in debugging.
## Tracing
Platforms with tracing capabilities like [LangSmith](/docs/langsmith/) and [WandB](/docs/integrations/providers/wandb_tracing) are the most comprehensive solutions for debugging. These platforms make it easy to not only log and visualize LLM apps, but also to actively debug, test and refine them.
Platforms with tracing capabilities like [LangSmith](/docs/langsmith/) are the most comprehensive solutions for debugging. These platforms make it easy to not only log and visualize LLM apps, but also to actively debug, test and refine them.
For anyone building production-grade LLM applications, we highly recommend using a platform like this.
When building production-grade LLM applications, platforms like this are essential.


## `set_debug` and `set_verbose`
@@ -27,7 +27,7 @@ Let's suppose we have a simple agent, and want to visualize the actions it takes
from langchain.agents import AgentType, initialize_agent, load_tools
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.