#### Description
Fixed the following error with `rerank` method from `CohereRerank`:
```
---> [79](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/jjmov99/legal-colombia/~/legal-colombia/.venv/lib/python3.11/site-packages/langchain/retrievers/document_compressors/cohere_rerank.py:79) results = self.client.rerank(
[80](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/jjmov99/legal-colombia/~/legal-colombia/.venv/lib/python3.11/site-packages/langchain/retrievers/document_compressors/cohere_rerank.py:80) query, docs, model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc
[81](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/jjmov99/legal-colombia/~/legal-colombia/.venv/lib/python3.11/site-packages/langchain/retrievers/document_compressors/cohere_rerank.py:81) )
[82](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/jjmov99/legal-colombia/~/legal-colombia/.venv/lib/python3.11/site-packages/langchain/retrievers/document_compressors/cohere_rerank.py:82) result_dicts = []
[83](https://vscode-remote+wsl-002bubuntu.vscode-resource.vscode-cdn.net/home/jjmov99/legal-colombia/~/legal-colombia/.venv/lib/python3.11/site-packages/langchain/retrievers/document_compressors/cohere_rerank.py:83) for res in results.results:
TypeError: BaseCohere.rerank() takes 1 positional argument but 4 positional arguments (and 2 keyword-only arguments) were given
```
This was easily fixed going from this:
```
def rerank(
self,
documents: Sequence[Union[str, Document, dict]],
query: str,
*,
model: Optional[str] = None,
top_n: Optional[int] = -1,
max_chunks_per_doc: Optional[int] = None,
) -> List[Dict[str, Any]]:
...
if len(documents) == 0: # to avoid empty api call
return []
docs = [
doc.page_content if isinstance(doc, Document) else doc for doc in documents
]
model = model or self.model
top_n = top_n if (top_n is None or top_n > 0) else self.top_n
results = self.client.rerank(
query, docs, model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc
)
result_dicts = []
for res in results:
result_dicts.append(
{"index": res.index, "relevance_score": res.relevance_score}
)
return result_dicts
```
to this:
```
def rerank(
self,
documents: Sequence[Union[str, Document, dict]],
query: str,
*,
model: Optional[str] = None,
top_n: Optional[int] = -1,
max_chunks_per_doc: Optional[int] = None,
) -> List[Dict[str, Any]]:
...
if len(documents) == 0: # to avoid empty api call
return []
docs = [
doc.page_content if isinstance(doc, Document) else doc for doc in documents
]
model = model or self.model
top_n = top_n if (top_n is None or top_n > 0) else self.top_n
results = self.client.rerank(
query=query, documents=docs, model=model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc <-------------
)
result_dicts = []
for res in results.results: <-------------
result_dicts.append(
{"index": res.index, "relevance_score": res.relevance_score}
)
return result_dicts
```
#### Unit & Integration tests
I added a unit test to check the behaviour of `rerank`. Also fixed the
original integration test which was failing.
#### Format & Linting
Everything worked properly with `make lint_diff`, `make format_diff` and
`make format`. However I noticed an error coming from other part of the
library when doing `make lint`:
```
(langchain-py3.9) ➜ langchain git:(master) make format
[ "." = "" ] || poetry run ruff format .
1636 files left unchanged
[ "." = "" ] || poetry run ruff --select I --fix .
(langchain-py3.9) ➜ langchain git:(master) make lint
./scripts/check_pydantic.sh .
./scripts/lint_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run ruff format . --diff
1636 files already formatted
[ "." = "" ] || poetry run ruff --select I .
[ "." = "" ] || mkdir -p .mypy_cache && poetry run mypy . --cache-dir .mypy_cache
langchain/agents/openai_assistant/base.py:252: error: Argument "file_ids" to "create" of "Assistants" has incompatible type "Optional[Any]"; expected "Union[list[str], NotGiven]" [arg-type]
langchain/agents/openai_assistant/base.py:374: error: Argument "file_ids" to "create" of "AsyncAssistants" has incompatible type "Optional[Any]"; expected "Union[list[str], NotGiven]" [arg-type]
Found 2 errors in 1 file (checked 1634 source files)
make: *** [Makefile:65: lint] Error 1
```
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
### Issue
Recently, the new `allow_dangerous_deserialization` flag was introduced
for preventing unsafe model deserialization that relies on pickle
without user's notice (#18696). Since then some LLMs like Databricks
requires passing in this flag with true to instantiate the model.
However, this breaks existing functionality to loading such LLMs within
a chain using `load_chain` method, because the underlying loader
function
[load_llm_from_config](f96dd57501/libs/langchain/langchain/chains/loading.py (L40))
(and load_llm) ignores keyword arguments passed in.
### Solution
This PR fixes this issue by propagating the
`allow_dangerous_deserialization` argument to the class loader iff the
LLM class has that field.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Description
This PR proposes a modification to the `libs/langchain/dev.Dockerfile`
configuration to copy the `libs/langchain/poetry.lock` into the working
directory. The change aims to address the issue where the Poetry install
command, the last command in the `dev.Dockerfile`, takes excessively
long hours, and to ensure the reproducibility of the poetry environment
in the devcontainer.
## Problem
The `dev.Dockerfile`, prepared for development environments such as
`.devcontainer`, encounters an unending dependency resolution when
attempting the Poetry installation.
### Steps to Reproduce
Execute the following build command:
```bash
docker build -f libs/langchain/dev.Dockerfile .
```
### Current Behavior
The Docker build process gets stuck at the following step, which, in my
experience, did not conclude even after an entire night:
```
=> [langchain-dev-dependencies 4/6] COPY libs/community/ ../community/ 0.9s
=> [langchain-dev-dependencies 5/6] COPY libs/text-splitters/ ../text-splitters/ 0.0s
=> [langchain-dev-dependencies 6/6] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 12.3s
=> => # Updating dependencies
=> => # Resolving dependencies...
```
### Expected Behavior
The Docker build completes in a realistic timeframe. By applying this
PR, the build finishes within a few minutes.
### Analysis
The complexity of LangChain's dependencies has reached a point where
Poetry is required to resolve dependencies akin to threading a needle.
Consequently, poetry install fails to complete in a practical timeframe.
## Solution
The solution for dependency resolution is already recorded in
`libs/langchain/poetry.lock`, so we can use it. When copying
`project.toml` and `poetry.toml`, the `poetry.lock` located in the same
directory should also be copied.
```diff
# Copy only the dependency files for installation
-COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml ./
+COPY libs/langchain/pyproject.toml libs/langchain/poetry.toml libs/langchain/poetry.lock ./
```
## Note
I am not intimately familiar with the historical context of the
`dev.Dockerfile` and thus do not know why `poetry.lock` has not been
copied until now. It might have been an oversight, or perhaps dependency
resolution used to complete quickly even without the `poetry.lock` file
in the past. However, if there are deliberate reasons why copying
`poetry.lock` is not advisable, please just close this PR.
This is a small breaking change but I think it should be done as:
* No external dependency needs to be installed anymore for the default
to work
* It is vendor-neutral
Fixing some issues for AzureCosmosDBSemanticCache
- Added the entry for "AzureCosmosDBSemanticCache" which was missing in
langchain/cache.py
- Added application name when creating the MongoClient for the
AzureCosmosDBVectorSearch, for tracking purposes.
@baskaryan, can you please review this PR, we need this to go in asap.
These are just small fixes which we found today in our testing.
Description: adds support for langchain_cohere
---------
Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** This change passes through `batch_size` to
`add_documents()`/`aadd_documents()` on calls to `index()` and
`aindex()` such that the documents are processed in the expected batch
size.
**Issue:** #19415
**Dependencies:** N/A
**Twitter handle:** N/A
**Description:**
Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.
I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.
The motivation is to minimize compute/data loss when problems occur:
* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.
**Issue:**
Implements enhancement #18026.
**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).
---------
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Changing OpenAIAssistantRunnable.create_assistant to send the `file_ids`
parameter to openai.beta.assistants.create
Co-authored-by: Frederico Wu <fred.diaswu@coxautoinc.com>
## Description
This PR modifies the settings in `libs/langchain/dev.Dockerfile` to
ensure that the `text-splitters` directory is copied before the poetry
installation process begins.
Without this modification, the `docker build` command fails for
`dev.Dockerfile`, preventing the setup of some development environments,
including `.devcontainer`.
## Bug Details
### Repro
Run the following command:
```bash
docker build -f libs/langchain/dev.Dockerfile .
```
### Current Behavior
The docker build command fails, raising the following error:
```
...
=> [langchain-dev-dependencies 4/5] COPY libs/community/ ../community/ 0.4s
=> ERROR [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs 1.1s
------
> [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs:
#13 0.970
#13 0.970 Directory ../text-splitters does not exist
------
executor failed running [/bin/sh -c poetry install --no-interaction --no-ansi --with dev,test,docs]: exit code: 1
```
### Expected Behavior
The `docker build` command successfully completes without the poetry
error.
### Analysis
The error occurs because the `text-splitters` directory is not copied
into the build environment, unlike the other packages under the `libs`
directory. I suspect that the `COPY` setting was overlooked since
`text-splitters` was separated in a recent PR.
## Fix
Add the following lines to the `libs/langchain/dev.Dockerfile`:
```dockerfile
# Copy the text-splitters library for installation
COPY libs/text-splitters/ ../text-splitters/
```
poetry can't reliably handle resolving the number of optional "extended
test" dependencies we have. If we instead just rely on pip to install
extended test deps in CI, this isn't an issue.
Issue : _call method of LLMRouterChain uses predict_and_parse, which is
slated for deprecation.
Description : Instead of using predict_and_parse, this replaces it with
individual predict and parse functions.
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** add tools_renderer for various non-openai agents,
make tools can be render in different ways for your LLM.
- **Issue:** N/A
- **Dependencies:** N/A
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description:
This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:
- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook** - The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
@hwchase17,@baskaryan, @eyurtsev,
**Description:** Add facility to pass the optional output parser to
customize the parsing logic
---------
Co-authored-by: hasan <hasan@m2sys.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Avoids deprecation warning that triggered at import time, e.g. with
`python -c 'import langchain.smith'`
/opt/venv/lib/python3.12/site-packages/langchain/callbacks/__init__.py:37:
LangChainDeprecationWarning: Importing this callback from langchain is
deprecated. Importing it from langchain will no longer be supported as
of langchain==0.2.0. Please import from langchain-community instead:
`from langchain_community.callbacks import base`.
To install langchain-community run `pip install -U langchain-community`.
- **Description:** Introduce a new parameter `graph_kwargs` to
`RdfGraph` - parameters used to initialize the `rdflib.Graph` if
`query_endpoint` is set. Also, do not set
`rdflib.graph.DATASET_DEFAULT_GRAPH_ID` as default value for the
`rdflib.Graph` `identifier` if `query_endpoint` is set.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** N/A
- **Description:** I encountered this error when I tried to use
LLMChainFilter. Even if the message slightly differs, like `Not relevant
(NO)` this results in an error. It has been reported already here:
https://github.com/langchain-ai/langchain/issues/. This change hopefully
makes it more robust.
- **Issue:** #11408
- **Dependencies:** No
- **Twitter handle:** dokatox
- **Description:** Added the `return_sparql_query` feature to the
`GraphSparqlQAChain` class, allowing users to get the formatted SPARQL
query along with the chain's result.
- **Issue:** NA
- **Dependencies:** None
Note: I've ensured that the PR passes linting and testing by running
make format, make lint, and make test locally.
I have added a test for the integration (which relies on network access)
and I have added an example to the notebook showing its use.
- **Description:** In order to override the bool value of
"fetch_schema_from_transport" in the GraphQLAPIWrapper, a
"fetch_schema_from_transport" value needed to be added to the
"_EXTRA_OPTIONAL_TOOLS" dictionary in load_tools in the "graphql" key.
The parameter "fetch_schema_from_transport" must also be passed in to
the GraphQLAPIWrapper to allow reading of the value when creating the
client. Passing as an optional parameter is probably best to avoid
breaking changes. This change is necessary to support GraphQL instances
that do not support fetching schema, such as TigerGraph. More info here:
[TigerGraph GraphQL Schema
Docs](https://docs.tigergraph.com/graphql/current/schema)
- **Threads handle:** @zacharytoliver
---------
Co-authored-by: Zachary Toliver <zt10191991@hotmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Sent to LangSmith
Thank you for contributing to LangChain!
Checklist:
- [ ] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] PR message: **Delete this entire template message** and replace it
with the following bulleted list
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [ ] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
- **Description:** Callback manager can't catch chain input or output
validation errors because `prepare_input` and `prepare_output` are not
part of the try/raise logic, this PR fixes that logic.
- **Issue:** #15954
- **Description :**
Fix: Use shallow copy for schema manipulation in get_format_instructions
Prevents side effects on the original schema object by using a
dictionary comprehension for a safer and more controlled manipulation of
schema key-value pairs, enhancing code reliability.
- **Issue:** #17161
- **Dependencies:** None
- **Twitter handle:** None
- **Description:** Depending on `token_max` used in
`load_summarize_chain`, it could cause an infinite loop when documents
cannot collapse under `token_max`. This change would not affect the
existing feature, but it also gives an option to users to avoid the
situation.
- **Issue:** https://github.com/langchain-ai/langchain/issues/16251
- **Dependencies:** None
- **Twitter handle:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Description:
Addresses a problem where the Date type within an Elasticsearch
SelfQueryRetriever would encounter difficulties in generating a valid
query.
Issue: #17042
---------
Co-authored-by: Max Jakob <max.jakob@elastic.co>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.
**Issue**: This is a new feature
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.
---------
Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** This PR adds support for
[flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for
reranking as alternative to Cohere.
I'm not sure `libs/langchain` is the right place for this change. At
first, I wanted to put it under `libs/community`. All the compressors
were under `libs/langchain/retrievers/document_compressors` though. Hope
this makes sense!
- **Description:**
[AS-IS] When dealing with a yaml file, the extension must be .yaml.
[TO-BE] In the absence of extension length constraints in the OS, the
extension of the YAML file is yaml, but control over the yml extension
must still be made.
It's as if it's an error because it's a .jpg extension in jpeg support.
- **Issue:** -
- **Dependencies:**
no dependencies required for this change,
Unlike vector results, the LLM has to completely trust the context of a
graph database result, even if it doesn't provide whole context. We
tried with instructions, but it seems that adding a single example is
the way to go to solve this issue.
## Summary
This PR upgrades LangChain's Ruff configuration in preparation for
Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5,
which LangChain uses today.) Specifically, we're now warning when
linter-only options are specified under `[tool.ruff]` instead of
`[tool.ruff.lint]`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** This adds a recursive json splitter class to the
existing text_splitters as well as unit tests
- **Issue:** splitting text from structured data can cause issues if you
have a large nested json object and you split it as regular text you may
end up losing the structure of the json. To mitigate against this you
can split the nested json into large chunks and overlap them, but this
causes unnecessary text processing and there will still be times where
the nested json is so big that the chunks get separated from the parent
keys.
As an example you wouldn't want the following to be split in half:
```shell
{'val0': 'DFWeNdWhapbR',
'val1': {'val10': 'QdJo',
'val11': 'FWSDVFHClW',
'val12': 'bkVnXMMlTiQh',
'val13': 'tdDMKRrOY',
'val14': 'zybPALvL',
'val15': 'JMzGMNH',
'val16': {'val160': 'qLuLKusFw',
'val161': 'DGuotLh',
'val162': 'KztlcSBropT',
-----------------------------------------------------------------------split-----
'val163': 'YlHHDrN',
'val164': 'CtzsxlGBZKf',
'val165': 'bXzhcrWLmBFp',
'val166': 'zZAqC',
'val167': 'ZtyWno',
'val168': 'nQQZRsLnaBhb',
'val169': 'gSpMbJwA'},
'val17': 'JhgiyF',
'val18': 'aJaqjUSFFrI',
'val19': 'glqNSvoyxdg'}}
```
Any llm processing the second chunk of text may not have the context of
val1, and val16 reducing accuracy. Embeddings will also lack this
context and this makes retrieval less accurate.
Instead you want it to be split into chunks that retain the json
structure.
```shell
{'val0': 'DFWeNdWhapbR',
'val1': {'val10': 'QdJo',
'val11': 'FWSDVFHClW',
'val12': 'bkVnXMMlTiQh',
'val13': 'tdDMKRrOY',
'val14': 'zybPALvL',
'val15': 'JMzGMNH',
'val16': {'val160': 'qLuLKusFw',
'val161': 'DGuotLh',
'val162': 'KztlcSBropT',
'val163': 'YlHHDrN',
'val164': 'CtzsxlGBZKf'}}}
```
and
```shell
{'val1':{'val16':{
'val165': 'bXzhcrWLmBFp',
'val166': 'zZAqC',
'val167': 'ZtyWno',
'val168': 'nQQZRsLnaBhb',
'val169': 'gSpMbJwA'},
'val17': 'JhgiyF',
'val18': 'aJaqjUSFFrI',
'val19': 'glqNSvoyxdg'}}
```
This recursive json text splitter does this. Values that contain a list
can be converted to dict first by using split(... convert_lists=True)
otherwise long lists will not be split and you may end up with chunks
larger than the max chunk.
In my testing large json objects could be split into small chunks with
✅ Increased question answering accuracy
✅ The ability to split into smaller chunks meant retrieval queries can
use fewer tokens
- **Dependencies:** json import added to text_splitter.py, and random
added to the unit test
- **Twitter handle:** @joelsprunger
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @OntotextGraphDB
* This PR adds async methods to the LLM cache.
* Adds an implementation using Redis called AsyncRedisCache.
* Adds a docker compose file at the /docker to help spin up docker
* Updates redis tests to use a context manager so flushing always happens by default
- **Description:**
This PR standardizes the `output_parser.py` file across all agent types
to ensure a uniform parsing mechanism is implemented. It introduces a
cohesive structure and common interface for output parsing, facilitating
easier modifications and extensions by users. The standardized approach
enhances maintainability and scalability of the codebase by providing a
consistent pattern for output parsing, which can be easily understood
and utilized across different agent types.
This PR builds upon the foundation set by a previously merged PR, which
focused exclusively on standardizing the `output_parser.py` for the
`conversational_agent` ([PR
#16945](https://github.com/langchain-ai/langchain/pull/16945)). With
this new update, I extend the standardization efforts to encompass
`output_parser.py` files across all agent types. This enhancement not
only unifies the parsing mechanism across the board but also introduces
the flexibility for users to incorporate custom `FORMAT_INSTRUCTIONS`.
- **Issue:**
https://github.com/langchain-ai/langchain/issues/10721https://github.com/langchain-ai/langchain/issues/4044
- **Dependencies:**
No new dependencies required for this change
- **Twitter handle:**
With my github user is enough. Thanks
I hope you accept my PR.
Based on my experiments, the newline isn't always there, so we can make
the regex slightly more robust by allowing an optional newline after the
bacticks
**Description:**
With this modification, users can customize the `FORMAT_INSTRUCTIONS`
template, allowing them to create their own prompts
As it is happening in
[this](https://github.com/langchain-ai/langchain/issues/10721) issue,
the `FORMAT_INSTRUCTIONS` is not customizable for the output parser,
unless you create your own class `ConvoOutputParser`. To avoid this, a
modification was done, creating a `format_instruction` variable that
users can customize with ease after initialize the agent.
For example:
```
agent = initialize_agent(
agent = AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
tools = tools,
llm = llm_agent,
verbose = True,
max_iterations = 3,
early_stopping_method = 'generate',
memory = b_w_memory,
handle_parsing_errors = True,
agent_kwargs={
'system_message':PREFIX,
'human_message':SUFFIX,
'template_tool_response':TEMPLATE_TOOL_RESPONSE,
}
)
agent.agent.output_parser.format_instructions = "MY CUSTOM FORMAT INSTRUCTIONS"
print(agent.agent.output_parser.get_format_instructions())
MY CUSTOM FORMAT INSTRUCTIONS
```
Other parameters like `system_message`, `human_message`, or
`template_tool_response` are already customizable and with this PR, the
last parameter `FORMAT_INSTRUCTIONS` in
`langchain.agents.conversational_chat.prompt` can be modified.
**Issue:**
https://github.com/langchain-ai/langchain/issues/10721
**Dependencies:**
No new dependencies required for this change
**Twitter handle:**
With my github user is enough. Thanks
I hope you accept my PR.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Add relevant type annotations for relevant session
and query objects to resolve mypy errors when `# type: ignore` comments
are removed.
- **Issue:** #17048
- **Dependencies:** None,
- **Twitter handle:** [clesiemo3](https://twitter.com/clesiemo3)
I attempted to solve the `UpsertionRecord` ignore but it would require
added a deprecated plugin or moving completely to sqlalchemy 2.0+ from
my understanding. I'm assuming this is not something desired at this
point in time.
This is a PR about #16334
The Stop sequenes isn't meanful in `json_chat` because it depends json
to work, not completions
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Previously, if this did not find a mypy cache then it wouldnt run
this makes it always run
adding mypy ignore comments with existing uncaught issues to unblock other prs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
We didn't override the namespace of the ImagePromptTemplate, so it is
listed as being in langchain.schema
This updates the mapping to let the loader deserialize.
Alternatively, we could make a slight breaking change and update the
namespace of the ImagePromptTemplate since we haven't broadly
publicized/documented it yet..
The `langchain.prompts.example_selector` [still holds several
artifacts](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.prompts)
that belongs to `community`. If they moved to
`langchain_community.example_selectors`, the `langchain.prompts`
namespace would be effectively removed which is great.
- moved a class and afunction to `langchain_community`
Note:
- Previously, the `langchain.prompts.example_selector` artifacts were
moved into the `langchain_core.exampe_selectors`. See the flattened
namespace (`.prompts` was removed)!
Similar flattening was implemented for the `langchain_core` as the
`langchain_core.exampe_selectors`.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:**
The BaseStore methods are currently blocking. Some implementations
(AstraDBStore, RedisStore) would benefit from having async methods.
Also once we have async methods for BaseStore, we can implement the
async `aembed_documents` in CacheBackedEmbeddings to cache the
embeddings asynchronously.
* adds async methods amget, amset, amedelete and ayield_keys to
BaseStore
* implements the async methods for InMemoryStore
* adds tests for InMemoryStore async methods
- **Twitter handle:** cbornet_
Adds:
* methods `aload()` and `alazy_load()` to interface `BaseLoader`
* implementation for class `MergedDataLoader `
* support for class `BaseLoader` in async function `aindex()` with unit
tests
Note: this is compatible with existing `aload()` methods that some
loaders already had.
**Twitter handle:** @cbornet_
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
… converters
One way to convert anything to an OAI function:
convert_to_openai_function
One way to convert anything to an OAI tool: convert_to_openai_tool
Corresponding bind functions on OAI models: bind_functions, bind_tools
- **Description:**
This PR adds a VectorStore integration for SAP HANA Cloud Vector Engine,
which is an upcoming feature in the SAP HANA Cloud database
(https://blogs.sap.com/2023/11/02/sap-hana-clouds-vector-engine-announcement/).
- **Issue:** N/A
- **Dependencies:** [SAP HANA Python
Client](https://pypi.org/project/hdbcli/)
- **Twitter handle:** @sapopensource
Implementation of the integration:
`libs/community/langchain_community/vectorstores/hanavector.py`
Unit tests:
`libs/community/tests/unit_tests/vectorstores/test_hanavector.py`
Integration tests:
`libs/community/tests/integration_tests/vectorstores/test_hanavector.py`
Example notebook:
`docs/docs/integrations/vectorstores/hanavector.ipynb`
Access credentials for execution of the integration tests can be
provided to the maintainers.
---------
Co-authored-by: sascha <sascha.stoll@sap.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:**
This PR aims to enhance the `langchain` library by enabling the support
for passing `custom_headers` in the `GraphQLAPIWrapper` usage within
`langchain/agents/load_tools.py`.
While the `GraphQLAPIWrapper` from the `langchain_community` module is
inherently capable of handling `custom_headers`, its current invocation
in `load_tools.py` does not facilitate this functionality.
This limitation restricts the use of the `graphql` tool with databases
or APIs that require token-based authentication.
The absence of support for `custom_headers` in this context also leads
to a lack of error messages when attempting to interact with secured
GraphQL endpoints, making debugging and troubleshooting more
challenging.
This update modifies the `load_tools` function to correctly handle
`custom_headers`, thereby allowing secure and authenticated access to
GraphQL services requiring tokens.
Example usage after the proposed change:
```python
tools = load_tools(
["graphql"],
graphql_endpoint="https://your-graphql-endpoint.com/graphql",
custom_headers={"Authorization": f"Token {api_token}"},
)
```
- **Issue:** None,
- **Dependencies:** None,
- **Twitter handle:** None
- **Description:** This addresses the issue tagged below where if you
try to pass your own client when creating an OpenAI assistant, a
pydantic error is raised:
Example code:
```python
import openai
from langchain.agents.openai_assistant import OpenAIAssistantRunnable
client = openai.OpenAI()
interpreter_assistant = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant",
instructions="You are a personal math tutor. Write and run code to answer math questions.",
tools=[{"type": "code_interpreter"}],
model="gpt-4-1106-preview",
client=client
)
```
Error:
`pydantic.v1.errors.ConfigError: field "client" not yet prepared, so the
type is still a ForwardRef. You might need to call
OpenAIAssistantRunnable.update_forward_refs()`
It additionally updates type hints and docstrings to indicate that an
AzureOpenAI client is permissible as well.
- **Issue:** https://github.com/langchain-ai/langchain/issues/15948
- **Dependencies:** N/A
- **Description:** extreact the _aperform_agent_action in the
AgentExecutor class to allow for easier overriding. Extracted logic from
_iter_next_step into a new method _perform_agent_action for consistency
and easier overriding.
- **Issue:** #15706Closes#15706
- **Description:** The HTMLHeaderTextSplitter Class now explicitly
specifies utf-8 encoding in the part of the split_text_from_file method
that calls the HTMLParser.
- **Issue:** Prevent garbled characters due to differences in encoding
of html files (except for English in particular, I noticed that problem
with Japanese).
- **Dependencies:** No dependencies,
- **Twitter handle:** @i_w__a
- **Description:** Adds a text splitter based on
[Konlpy](https://konlpy.org/en/latest/#start) which is a Python package
for natural language processing (NLP) of the Korean language. (It is
like Spacy or NLTK for Korean)
- **Dependencies:** Konlpy would have to be installed before this
splitter is used,
- **Twitter handle:** @untilhamza
* Removed some env vars not used in langchain package IT
* Added Astra DB env vars in langchain package, used for cache tests
* Added conftest.py to load env vars in langchain_community IT
* Added .env.example in langchain_community IT
- **Description:** Support IN and LIKE comparators in Milvus
self-querying retriever, based on [Boolean Expression
Rules](https://milvus.io/docs/boolean.md)
- **Issue:** No
- **Dependencies:** No
- **Twitter handle:** No
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Support [Lantern](https://github.com/lanterndata/lantern) as a new
VectorStore type.
- Added Lantern as VectorStore.
It will support 3 distance functions `l2 squared`, `cosine` and
`hamming` and will use `HNSW` index.
- Added tests
- Added example notebook
- **Description:** This PR defines the output parser of
OpenAIFunctionsAgent as an attribute, enabling customization and
subclassing of the parser logic.
- **Issue:** Subclassing is currently impossible as the
`OpenAIFunctionsAgentOutputParser` class is hard coded into the `plan`
and `aplan` methods
- **Dependencies:** None
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
**Description:**
Fixes OutputParserException thrown by the output_parser when 'query' is
'Null'.
Replace this entire comment with:
- **Description:** Current implentation of output_parser throws
OutputParserException if the response from the LLM contains `query:
null`. This unfortunately happens for my use case. And since there is no
way to modify the prompt used in SelfQueryRetriever, then we have to fix
it here, so it doesn't crash.
- **Issue:** https://github.com/langchain-ai/langchain/issues/15914
Didn't run tests. `make test` is not working. There is no `test` rule in
the `Makefile`.
Co-authored-by: Jan Horcicka <jhorcick@amazon.com>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
This PR fixes an issue where AgentExecutor with RunnableAgent does not allow users to see individual llm tokens if streaming=True is not set explicitly on the underlying chat model.
The majority of this PR is testing code:
1. Create a test chat model that makes it easier to test streaming and
supports AIMessages that include function invocation information.
2. Tests for the chat model
3. Tests for RunnableAgent (previously untested)
4. Tests for openai agent (previously untested)
Todo
- [x] copy over integration tests
- [x] update docs with new instructions in #15513
- [x] add linear ticket to bump core -> community, community->langchain,
and core->openai deps
- [ ] (optional): add `pip install langchain-openai` command to each
notebook using it
- [x] Update docstrings to not need `openai` install
- [x] Add serialization
- [x] deprecate old models
Contributor steps:
- [x] Add secret names to manual integrations workflow in
.github/workflows/_integration_test.yml
- [x] Add secrets to release workflow (for pre-release testing) in
.github/workflows/_release.yml
Maintainer steps (Contributors should not do these):
- [x] set up pypi and test pypi projects
- [x] add credential secrets to Github Actions
- [ ] add package to conda-forge
Functional changes to existing classes:
- now relies on openai client v1 (1.6.1) via concrete dep in
langchain-openai package
Codebase organization
- some function calling stuff moved to
`langchain_core.utils.function_calling` in order to be used in both
community and langchain-openai
- **Description:** `MarkdownHeaderTextSplitter` currently strips header
lines from chunked content. Many applications require these header lines
are preserved. This adds an optional parameter to preserve those headers
in the chunked content.
- **Issue:** #2836 (relevant)
- **Dependencies:** -
- **Tag maintainer:** @baskaryan
- **Twitter handle:** @finnless
Unit tests and new examples in notebook included.
cc @rlancemartin
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
…tch]: import models from community
ran
```bash
git grep -l 'from langchain\.chat_models' | xargs -L 1 sed -i '' "s/from\ langchain\.chat_models/from\ langchain_community.chat_models/g"
git grep -l 'from langchain\.llms' | xargs -L 1 sed -i '' "s/from\ langchain\.llms/from\ langchain_community.llms/g"
git grep -l 'from langchain\.embeddings' | xargs -L 1 sed -i '' "s/from\ langchain\.embeddings/from\ langchain_community.embeddings/g"
git checkout master libs/langchain/tests/unit_tests/llms
git checkout master libs/langchain/tests/unit_tests/chat_models
git checkout master libs/langchain/tests/unit_tests/embeddings/test_imports.py
make format
cd libs/langchain; make format
cd ../experimental; make format
cd ../core; make format
```
Sometimes, the tool_schema is like:
` {'action_name': 'search_items', 'action': {'term': 'pizza'}}`
sometimes, specially with gpt3.5 it comes like:
`{'action_name': 'search_items', 'term': 'pizza'}`
and it fails.
This PR is a way to make it work in both scenarios.
issues releated: #6624
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Co-authored-by: Lucca Zenobio <lucca.zenobio@ifood.com.br>
Adding to my previously, already merged PR I made some further
improvements:
* Added documentation to the existing Pydantic Parser notebook, with an
example using LCEL and `with_retry()` on `OutputParserException`.
* Added an additional output example to the prompt
* More lenient parser in terms of LLM output format
* Amended unit test
FYI @hwchase17
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
The regex used to match "Action" and "Action Input" in the output parser
has been updated. Previously, the regex did not correctly handle
multi-line inputs for "Action Input". The updated code uses the
're.DOTALL' flag to ensure multi-line inputs are correctly captured.
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
These can happen for edge cases not covered by `default` handler (eg.
"strange" keys in dicts)
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- Any direct usage of ThreadPoolExecutor or asyncio.run_in_executor
needs manual handling of context vars
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Correcting a small typo ('the' instead of 'then') and changing another
'the' (instead of 'then' too, it was a hard day for the 'n' key :D) to
'also' to match better with what is done in the code
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
```shell
Python 3.11.6 (main, Nov 2 2023, 04:39:43) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = {'name': 'gc', 'arguments': '{"prompt":"hi\nbob."}'}
>>> import json
>>> json.loads(s['arguments'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 14 (char 13)
>>> json.loads(s['arguments'].replace('\n', '\\n'))
{'prompt': 'hi\nbob.'}
>>>
```
---------
Co-authored-by: Nuno Campos <nuno@langchain.dev>
- Enables strict=False by default
- Uses partial json recovery logic by default
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
fix spellings
**seperate -> separate**: found more occurrences, see
https://github.com/langchain-ai/langchain/pull/14602
**initialise -> intialize**: the latter is more common in the repo
**pre-defined > predefined**: adding a comma after a prefix is a
delicate matter, but this is a generally accepted word
also, another word that appears in the repo is "fs" (stands for
filesystem), e.g., in `libs/core/langchain_core/prompts/loading.py`
` """Unified method for loading a prompt from LangChainHub or local
fs."""`
Isn't "filesystem" better?
- **Description:** This PR fixes test failures on Windows caused by path
handling differences and unescaped special characters in regex. The
failing tests are:
```
FAILED tests/unit_tests/storage/test_filesystem.py::test_yield_keys - AssertionError: assert ['key1', 'subdir\\key2'] == ['key1', 'subdir/key2']
FAILED tests/unit_tests/test_imports.py::test_importable_all - ModuleNotFoundError: No module named 'langchain_community.langchain_community\\adapters'
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_on_absolute - re.error: incomplete escape \U at position 53
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_on_parent_dir - re.error: incomplete escape \U at position 69
FAILED tests/unit_tests/tools/file_management/test_utils.py::test_get_validated_relative_path_errs_for_symlink_outside_root - re.error: incomplete escape \U at position 64
```
- **Issue:** fixes
https://github.com/langchain-ai/langchain/issues/11775 (partially)
- **Dependencies:** none
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- **Description:** Going forward, we have a own API `pip install
gradientai`. Therefore gradually removing the self-build packages in
llamaindex, haystack and langchain.
- **Issue:** None.
- **Dependencies:** `pip install gradientai`
- **Tag maintainer:** @michaelfeil
- **Description:**
- Add a break case to `text_splitter.py::split_text_on_tokens()` to
avoid unwanted item at the end of result.
- Add a testcase to enforce the behavior.
- **Issue:**
- #14649
- #5897
- **Dependencies:** n/a,
---
**Quick illustration of change:**
```
text = "foo bar baz 123"
tokenizer = Tokenizer(
chunk_overlap=3,
tokens_per_chunk=7
)
output = split_text_on_tokens(text=text, tokenizer=tokenizer)
```
output before change: `["foo bar", "bar baz", "baz 123", "123"]`
output after change: `["foo bar", "bar baz", "baz 123"]`
This is technically a breaking change because it'll switch out default
models from `text-davinci-003` to `gpt-3.5-turbo-instruct`, but OpenAI
is shutting off those endpoints on 1/4 anyways.
Feels less disruptive to switch out the default instead.
Gpt-3.5 sometimes calls with empty string arguments instead of `{}`
I'd assume it's because the typescript representation on their backend
makes it a bit ambiguous.
Builds out a developer documentation section in the docs
- Links it from contributing.md
- Adds an initial guide on how to contribute an integration
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
## Description
New YAML output parser as a drop-in replacement for the Pydantic output
parser. Yaml is a much more token-efficient format than JSON, proving to
be **~35% faster and using the same percentage fewer completion
tokens**.
☑️ Formatted
☑️ Linted
☑️ Tested (analogous to the existing`test_pydantic_parser.py`)
The YAML parser excels in situations where a list of objects is
required, where the root object needs no key:
```python
class Products(BaseModel):
__root__: list[Product]
```
I ran the prompt `Generate 10 healthy, organic products` 10 times on one
chain using the `PydanticOutputParser`, the other one using
the`YamlOutputParser` with `Products` (see below) being the targeted
model to be created.
LLMs used were Fireworks' `lama-v2-34b-code-instruct` and OpenAI
`gpt-3.5-turbo`. All runs succeeded without validation errors.
```python
class Nutrition(BaseModel):
sugar: int = Field(description="Sugar in grams")
fat: float = Field(description="% of daily fat intake")
class Product(BaseModel):
name: str = Field(description="Product name")
stats: Nutrition
class Products(BaseModel):
"""A list of products"""
products: list[Product] # Used `__root__` for the yaml chain
```
Stats after 10 runs reach were as follows:
### JSON
ø time: 7.75s
ø tokens: 380.8
### YAML
ø time: 5.12s
ø tokens: 242.2
Looking forward to feedback, tips and contributions!
TIL `**` globstar doesn't work in make
Makefile changes fix that.
`__getattr__` changes allow import of all files, but raise error when
accessing anything from the module.
file deletions were corresponding libs change from #14559
This reverts commit 38813d7090. This is a
temporary fix, as I don't see a clear way on how to use multiple keys
with `Qdrant.from_texts`.
Context: #14378
- **Description:** In Qdrant allows to input list of keys as the
content_payload_key to retrieve multiple fields (the generated document
will contain the dictionary {field: value} in a string),
- **Issue:** Previously we were able to retrieve only one field from the
vector database when making a search
- **Dependencies:**
- **Tag maintainer:**
- **Twitter handle:** @jb_dlb
---------
Co-authored-by: Jean Baptiste De La Broise <jeanbaptiste.delabroise@mdpi.com>
Description: This PR masked baidu qianfan - Chat_Models API Key and
added unit tests.
Issue: the issue langchain-ai#12165.
Tag maintainer: @eyurtsev
---------
Co-authored-by: xiayi <xiayi@bytedance.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
We found a request with `max_tokens=None` results in the following error
in Anthropic:
```
HTTPError: 400 Client Error: Bad Request for url: https://oregon.staging.cloud.databricks.com/serving-endpoints/corey-anthropic/invocations.
Response text: {"error_code":"INVALID_PARAMETER_VALUE","message":"INVALID_PARAMETER_VALUE: max_tokens was not of type Integer: null"}
```
This PR excludes `max_tokens` if it's None.
- **Description:** new parameters in OpenAIEmbeddings() constructor
(retry_min_seconds and retry_max_seconds) that allow parametrization by
the user of the former min_seconds and max_seconds that were hidden in
_create_retry_decorator() and _async_retry_decorator()
- **Issue:** #9298, #12986
- **Dependencies:** none
- **Tag maintainer:** @hwchase17
- **Twitter handle:** @adumont
make format ✅
make lint ✅
make test ✅
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description :
Updated the functions with new Clarifai python SDK.
Enabled initialisation of Clarifai class with model URL.
Updated docs with new functions examples.
Remove whitespaces from the input of the ListSQLDatabaseTool for better
support.
for example, the input "table1,table2,table3" will throw an exception
whiteout the change although it's a valid input.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** add gitlab url from env,
- **Issue:** no issue,
- **Dependencies:** no,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR adds support for metadata filters of the form:
`{"filter": {"key": { "NIN" : ["list", "of", "values"]}}}`
"IN" is already supported, so this is a quick & related update to add
"NIN"
- **Description:**
1. Add system parameters to the ERNIE LLM API to set the role of the
LLM.
2. Add support for the ERNIE-Bot-turbo-AI model according from the
document https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Alp0kdm0n.
3. For the function call of ErnieBotChat, align with the
QianfanChatEndpoint.
With this PR, the `QianfanChatEndpoint()` can use the `function calling`
ability with `create_ernie_fn_chain()`. The example is as the following:
```
from langchain.prompts import ChatPromptTemplate
import json
from langchain.prompts.chat import (
ChatPromptTemplate,
)
from langchain.chat_models import QianfanChatEndpoint
from langchain.chains.ernie_functions import (
create_ernie_fn_chain,
)
def get_current_news(location: str) -> str:
"""Get the current news based on the location.'
Args:
location (str): The location to query.
Returs:
str: Current news based on the location.
"""
news_info = {
"location": location,
"news": [
"I have a Book.",
"It's a nice day, today."
]
}
return json.dumps(news_info)
def get_current_weather(location: str, unit: str="celsius") -> str:
"""Get the current weather in a given location
Args:
location (str): location of the weather.
unit (str): unit of the tempuature.
Returns:
str: weather in the given location.
"""
weather_info = {
"location": location,
"temperature": "27",
"unit": unit,
"forecast": ["sunny", "windy"],
}
return json.dumps(weather_info)
template = ChatPromptTemplate.from_messages([
("user", "{user_input}"),
])
chat = QianfanChatEndpoint(model="ERNIE-Bot-4")
chain = create_ernie_fn_chain([get_current_weather, get_current_news], chat, template, verbose=True)
res = chain.run("北京今天的新闻是什么?")
print(res)
```
The result of the above code:
```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 北京今天的新闻是什么?
> Finished chain.
{'name': 'get_current_news', 'arguments': {'location': '北京'}}
```
For the `ErnieBotChat`, now can use the `system` parameter to set the
role of the LLM.
```
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ErnieBotChat
llm = ErnieBotChat(model_name="ERNIE-Bot-turbo-AI", system="你是一个能力很强的机器人,你的名字叫 小叮当。无论问你什么问题,你都可以给出答案。")
prompt = ChatPromptTemplate.from_messages(
[
("human", "{query}"),
]
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
res = chain.run(query="你是谁?")
print(res)
```
The result of the above code:
```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 你是谁?
> Finished chain.
我是小叮当,一个智能机器人。我可以为你提供各种服务,包括回答问题、提供信息、进行计算等。如果你需要任何帮助,请随时告诉我,我会尽力为你提供最好的服务。
```
- **Description:** Added a notebook to illustrate how to use
`text-embeddings-inference` from huggingface. As
`HuggingFaceHubEmbeddings` was using a deprecated client, I made the
most of this PR updating that too.
- **Issue:** #13286
- **Dependencies**: None
- **Tag maintainer:** @baskaryan
- **Description:** Update code to correctly pass the kwargs
- **Issue:** #14295
- **Dependencies:** -
- **Tag maintainer:**
<--
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
#issue-14295
- **Description:** allows not enforcing function usage when a single
function is passed to an openAI function executable (or corresponding
legacy chain). This is a desired feature in the case where the model
does not have enough information to call a function, and needs to get
back to the user.
- **Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** N/A
Add metadata to the blob object. This makes it easier
to make a pipeline that properly propagates metadata information
from raw content to the derived content.
- **Description:** Enhanced `create_sync_playwright_browser` and
`create_async_playwright_browser` functions to accept a list of
arguments. These arguments are now forwarded to
`browser.chromium.launch()` for customizable browser instantiation.
- **Issue:** #13143
- **Dependencies:** None
- **Tag maintainer:** @eyurtsev,
- **Twitter handle:** Dr_Bearden
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Embedding platform
- **Twitter handle:** https://twitter.com/JinaAI_
---------
Co-authored-by: Joan Fontanals Martinez <joan.fontanals.martinez@jina.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:**
Reference library azure-search-documents has been adapted in version
11.4.0:
1. Notebook explaining Azure AI Search updated with most recent info
2. HnswVectorSearchAlgorithmConfiguration --> HnswAlgorithmConfiguration
3. PrioritizedFields(prioritized_content_fields) -->
SemanticPrioritizedFields(content_fields)
4. SemanticSettings --> SemanticSearch
5. VectorSearch(algorithm_configurations) -->
VectorSearch(configurations)
--> Changes now reflected on Langchain: default vector search config
from langchain is now compatible with officially released library from
Azure.
- **Issue:**
Issue creating a new index (due to wrong class used for default vector
search configuration) if using latest version of azure-search-documents
with current langchain version
- **Dependencies:** azure-search-documents>=11.4.0,
- **Tag maintainer:** ,
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** This PR modifies the LLM validation in OpenAI
function agents to check whether the LLM supports OpenAI functions based
on a property (`supports_oia_functions`) instead of whether the LLM
passed to the agent `isinstance` of `ChatOpenAI`. This allows classes
that extend `BaseChatModel` to be passed to these agents as long as
they've been integrated with the OpenAI APIs and have this property set,
even if they don't extend `ChatOpenAI`.
- **Issue:** N/A
- **Dependencies:** none
for issue https://github.com/langchain-ai/langchain/issues/13162
migrate openai audio api, as [openai v1.0.0 Migration
Guide](https://github.com/openai/openai-python/discussions/742)
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Double Max <max@ground-map.com>
- **Description:** In openapi/planner deal with json in markdown output
cases
- **Issue:** In some cases LLMs could return json in markdown which
can't be loaded.
- **Dependencies:**
- **Tag maintainer:** @eyurtsev
- **Twitter handle:**
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Adds doc key to metadata field when adding document
to Azure Search.
- **Issue:** -,
- **Dependencies:** -,
- **Tag maintainer:** @eyurtsev,
- **Twitter handle:** @finnless
Right now the document key with the name FIELDS_ID is not included in
the FIELDS_METADATA field, and therefore is not included in the Document
returned from a query. This is really annoying if you want to be able to
modify that item in the vectorstore.
Other's thoughts on this are welcome.
Description: There's a copy-paste typo where on_llm_error() calls
_on_chain_error() instead of _on_llm_error().
Issue: #13580
Dependencies: None
Tag maintainer: @hwchase17
Twitter handle: @jwatte
"Run `make format`, `make lint` and `make test` to check this locally."
The test scripts don't work in a plain Ubuntu LTS 20.04 system.
It looks like the dev container pulling is stuck. Or maybe the internet
is just ornery today.
---------
Co-authored-by: jwatte <jwatte@observeinc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
here it is validating shapely.geometry.point.Point: if not
isinstance(data_frame[page_content_column].iloc[0], gpd.GeoSeries):
raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries" you need
it to validate the geoSeries and not the shapely.geometry.point.Point
if not isinstance(data_frame[page_content_column], gpd.GeoSeries):
raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries"
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
**Description**
Implements `max_marginal_relevance_search` and
`max_marginal_relevance_search_by_vector` for the Momento Vector Index
vectorstore.
Additionally bumps the `momento` dependency in the lock file and adds
logging to the implementation.
**Dependencies**
✅ updates `momento` dependency in lock file
**Tag maintainer**
@baskaryan
**Twitter handle**
Please tag @momentohq for Momento Vector Index and @mloml for the
contribution 🙇
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
Hi! I'm Alex, Python SDK Team Lead from
[Comet](https://www.comet.com/site/).
This PR contains our new integration between langchain and Comet -
`CometTracer` class which uses new `comet_llm` python package for
submitting data to Comet.
No additional dependencies for the langchain package are required
directly, but if the user wants to use `CometTracer`, `comet-llm>=2.0.0`
should be installed. Otherwise an exception will be raised from
`CometTracer.__init__`.
A test for the feature is included.
There is also an already existing callback (and .ipynb file with
example) which ideally should be deprecated in favor of a new tracer. I
wasn't sure how exactly you'd prefer to do it. For example we could open
a separate PR for that.
I'm open to your ideas :)
Running a large number of requests to Embaas' servers (or any server)
can result in intermittent network failures (both from local and
external network/service issues). This PR implements exponential backoff
retries to help mitigate this issue.
The Github utilities are fantastic, so I'm adding support for deeper
interaction with pull requests. Agents should read "regular" comments
and review comments, and the content of PR files (with summarization or
`ctags` abbreviations).
Progress:
- [x] Add functions to read pull requests and the full content of
modified files.
- [x] Function to use Github's built in code / issues search.
Out of scope:
- Smarter summarization of file contents of large pull requests (`tree`
output, or ctags).
- Smarter functions to checkout PRs and edit the files incrementally
before bulk committing all changes.
- Docs example for creating two agents:
- One watches issues: For every new issue, open a PR with your best
attempt at fixing it.
- The other watches PRs: For every new PR && every new comment on a PR,
check the status and try to finish the job.
<!-- Thank you for contributing to LangChain!
Replace this comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the
same people again.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Allow users to pass a generic `BaseStore[str, bytes]` to
MultiVectorRetriever, removing the need to use the `create_kv_docstore`
method. This encoding will now happen internally.
@rlancemartin @eyurtsev
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
If we are not going to make the existing Docstore class also implement
`BaseStore[str, Document]`, IMO all base store implementations should
always be `[str, bytes]` so that they are more interchangeable.
CC @rlancemartin @eyurtsev
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** The existing version hardcoded search.windows.net in
the base url. This is not compatible with the gov cloud. I am allowing
the user to override the default for gov cloud support.,
- **Issue:** N/A, did not write up in an issue,
- **Dependencies:** None
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Nicholas Ceccarelli <nceccarelli2@moog.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Obsidian templates can include
[variables](https://help.obsidian.md/Plugins/Templates#Template+variables)
using double curly braces. `ObsidianLoader` uses PyYaml to parse the
frontmatter of documents. This parsing throws an error when encountering
variables' curly braces. This is avoided by temporarily substituting
safe strings before parsing.
- **Issue:** #13887
- **Tag maintainer:** @hwchase17
**Description:**
Adds the document loader for [Couchbase](http://couchbase.com/), a
distributed NoSQL database.
**Dependencies:**
Added the Couchbase SDK as an optional dependency.
**Twitter handle:** nithishr
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Our PR is an integration of a Steam API Tool that
makes recommendations on steam games based on user's Steam profile and
provides information on games based on user provided queries.
- **Issue:** the issue # our PR implements:
https://github.com/langchain-ai/langchain/issues/12120
- **Dependencies:** python-steam-api library, steamspypi library and
decouple library
- **Tag maintainer:** @baskaryan, @hwchase17
- **Twitter handle:** N/A
Hello langchain Maintainers,
We are a team of 4 University of Toronto students contributing to
langchain as part of our course [CSCD01 (link to course
page)](https://cscd01.com/work/open-source-project). We hope our changes
help the community. We have run make format, make lint and make test
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.
Our PR integrates the python-steam-api, steamspypi and decouple
packages. We have added integration tests to test our python API
integration into langchain and an example notebook is also provided.
Our amazing team that contributed to this PR: @JohnY2002, @shenceyang,
@andrewqian2001 and @muntaqamahmood
Thank you in advance to all the maintainers for reviewing our PR!
---------
Co-authored-by: Shence <ysc1412799032@163.com>
Co-authored-by: JohnY2002 <johnyuan0526@gmail.com>
Co-authored-by: Andrew Qian <andrewqian2001@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: JohnY <94477598+JohnY2002@users.noreply.github.com>
### Description
Starting from [openai version
1.0.0](17ac677995 (module-level-client)),
the camel case form of `openai.ChatCompletion` is no longer supported
and has been changed to lowercase `openai.chat.completions`. In
addition, the returned object only accepts attribute access instead of
index access:
```python
import openai
# optional; defaults to `os.environ['OPENAI_API_KEY']`
openai.api_key = '...'
# all client options can be configured just like the `OpenAI` instantiation counterpart
openai.base_url = "https://..."
openai.default_headers = {"x-foo": "true"}
completion = openai.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": "How do I output all files in a directory using Python?",
},
],
)
print(completion.choices[0].message.content)
```
So I implemented a compatible adapter that supports both attribute
access and index access:
```python
In [1]: from langchain.adapters import openai as lc_openai
...: messages = [{"role": "user", "content": "hi"}]
In [2]: result = lc_openai.chat.completions.create(
...: messages=messages, model="gpt-3.5-turbo", temperature=0
...: )
In [3]: result.choices[0].message
Out[3]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}
In [4]: result["choices"][0]["message"]
Out[4]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}
In [5]: result = await lc_openai.chat.completions.acreate(
...: messages=messages, model="gpt-3.5-turbo", temperature=0
...: )
In [6]: result.choices[0].message
Out[6]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}
In [7]: result["choices"][0]["message"]
Out[7]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}
In [8]: for rs in lc_openai.chat.completions.create(
...: messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
...: ):
...: print(rs.choices[0].delta)
...: print(rs["choices"][0]["delta"])
...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}
In [20]: async for rs in await lc_openai.chat.completions.acreate(
...: messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
...: ):
...: print(rs.choices[0].delta)
...: print(rs["choices"][0]["delta"])
...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}
...
```
### Twitter handle
[lin_bob57617](https://twitter.com/lin_bob57617)
- **Description:** to support not only publicly available Hugging Face
endpoints, but also protected ones (created with "Inference Endpoints"
Hugging Face feature), I have added ability to specify custom api_url.
But if not specified, default behaviour won't change
- **Issue:** #9181,
- **Dependencies:** no extra dependencies
**Description:** The way the condition is checked in the
`return_stopped_response` function of `OpenAIAgent` may not be correct,
when the value returned is `AgentFinish` from the tools it does not work
properly.
Thanks for review, @baskaryan, @eyurtsev, @hwchase17.
- **Description:** Adds `llm_chain_kwargs` to `BaseRetrievalQA.from_llm`
so these can be passed to the LLM at runtime,
- **Issue:** https://github.com/langchain-ai/langchain/issues/14216,
---------
Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
- **Description:** As part of my conversation with Cerebrium team,
`model_api_request` will be no longer available in cerebrium lib so it
needs to be replaced.
- **Issue:** #12705 12705,
- **Dependencies:** Cerebrium team (agreed)
- **Tag maintainer:** @eyurtsev
- **Twitter handle:** No official Twitter account sorry :D
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:** Adding a possibility to use asynchronous callback
handler in human-in-the-loop validation tool. Very useful, for example,
if you want to implement a validation over Telegram bot.
**Issue:** -
**Dependencies:** -
---------
Co-authored-by: Daniyar_Supiyev <daniyar_supiyev@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description** An integration to allow the Yellowbrick Data Warehouse
to function as a vector store
---------
Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
- **Description:** some vector stores have a flag for try deleting the
collection before creating it (such as ´vectorpg´). This is a useful
flag when prototyping indexing pipelines and also for integration tests.
Added the bool flag `pre_delete_collection ` to the constructor (default
False)
- **Tag maintainer:** @hemidactylus
- **Twitter handle:** nicoloboschi
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** This extends `OpenAIEmbeddings` to add support for
non-`tiktoken` based embeddings, specifically for use with the new
`text-generation-webui` API (`--extensions openai`) which does not
support `tiktoken` encodings, but rather strings
- **Issue:** Not found,
- **Dependencies:** HuggingFace `transformers.AutoTokenizer` is new
dependency for running the model without `tiktoken`
- **Tag maintainer:** @baskaryan based on last commit for
`langchain-core` refactor
- **Twitter handle:** @xychelsea
Modified the tokenization process to be model-agnostic, allowing for
both OpenAI and non-OpenAI model tokenizations, by setting the new
default `bool` flag `tiktoken_enabled` to `False`. This requeires
HuggingFace’s AutoTokenizer and handling tokenization for models
requiring different preprocessing steps to generate a chunked string
request rather than a list of integers.
Updated the embeddings generation process to accommodate non-OpenAI
models. This includes converting tokenized text into embeddings using
OpenAI’s and Hugging Face’s model architectures.
-->
Hi,
I made some code changes on the Hologres vector store to improve the
data insertion performance.
Also, this version of the code uses `hologres-vector` library. This
library is more convenient for us to update, and more efficient in
performance.
The code has passed the format/lint/spell check. I have run the unit
test for Hologres connecting to my own database.
Please check this PR again and tell me if anything needs to change.
Best,
Changgeng,
Developer @ Alibaba Cloud
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Fixes the Mathpix PDF loader API integration.
Specifically, ensures that Mathpix auth headers are provided for every
request, and ensures that we recognize all errors that can occur during
a request. Also, the option to provide API keys as kwargs never actually
worked before, but now that's fixed too.
- **Issue:** #11249
- **Dependencies:** None
- **Description:**
This PR introduces the Slack toolkit to LangChain, which allows users to
read and write to Slack using the Slack API. Specifically, we've added
the following tools.
1. get_channel: Provides a summary of all the channels in a workspace.
2. get_message: Gets the message history of a channel.
3. send_message: Sends a message to a channel.
4. schedule_message: Sends a message to a channel at a specific time and
date.
- **Issue:** This pull request addresses [Add Slack Toolkit
#11747](https://github.com/langchain-ai/langchain/issues/11747)
- **Dependencies:** package`slack_sdk`
Note: For this toolkit to function you will need to add a Slack app to
your workspace. Additional info can be found
[here](https://slack.com/help/articles/202035138-Add-apps-to-your-Slack-workspace).
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ArianneLavada <ariannelavada@gmail.com>
Co-authored-by: ArianneLavada <84357335+ArianneLavada@users.noreply.github.com>
Co-authored-by: ariannelavada@gmail.com <you@example.com>