Philippe PRADOS
f5856680fe
community[minor]: add mongodb byte store ( #23876 )
...
The `MongoDBStore` can manage only documents.
It's not possible to use MongoDB for an `CacheBackedEmbeddings`.
With this new implementation, it's possible to use:
```python
CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings=embeddings,
document_embedding_cache=MongoDBByteStore(
connection_string=db_uri,
db_name=db_name,
collection_name=collection_name,
),
)
```
and use MongoDB to cache the embeddings !
2024-07-19 13:54:12 -04:00
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 ( #23721 )
...
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path
import toml
import subprocess
import re
ROOT_DIR = Path(__file__).parents[1]
def main():
for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
print(path)
with open(path, "rb") as f:
pyproject = tomllib.load(f)
try:
pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
"^1.10"
)
pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
"^0.5"
)
except KeyError:
continue
with open(path, "w") as f:
toml.dump(pyproject, f)
cwd = "/".join(path.split("/")[:-1])
completed = subprocess.run(
"poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
cwd=cwd,
shell=True,
capture_output=True,
text=True,
)
logs = completed.stdout.split("\n")
to_ignore = {}
for l in logs:
if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
path, line_no, error_type = re.match(
"^(.*)\:(\d+)\: error:.*\[(.*)\]", l
).groups()
if (path, line_no) in to_ignore:
to_ignore[(path, line_no)].append(error_type)
else:
to_ignore[(path, line_no)] = [error_type]
print(len(to_ignore))
for (error_path, line_no), error_types in to_ignore.items():
all_errors = ", ".join(error_types)
full_path = f"{cwd}/{error_path}"
try:
with open(full_path, "r") as f:
file_lines = f.readlines()
except FileNotFoundError:
continue
file_lines[int(line_no) - 1] = (
file_lines[int(line_no) - 1][:-1] + f" # type: ignore[{all_errors}]\n"
)
with open(full_path, "w") as f:
f.write("".join(file_lines))
subprocess.run(
"poetry run ruff format .; poetry run ruff --select I --fix .",
cwd=cwd,
shell=True,
capture_output=True,
text=True,
)
if __name__ == "__main__":
main()
```
2024-07-03 10:33:27 -07:00
Philippe PRADOS
9aabb446c5
community[minor]: Add SQL storage implementation ( #22207 )
...
Hello @eyurtsev
- package: langchain-comminity
- **Description**: Add SQL implementation for docstore. A new
implementation, in line with my other PR ([async
PGVector](https://github.com/langchain-ai/langchain-postgres/pull/32 ),
[SQLChatMessageMemory](https://github.com/langchain-ai/langchain/pull/22065 ))
- Twitter handler: pprados
---------
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Piotr Mardziel <piotrm@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-06-07 21:17:02 +00:00
Christophe Bornet
74947ec894
community[minor]: Add Cassandra ByteStore ( #22064 )
2024-05-23 10:46:23 -04:00
Christophe Bornet
e92e96193f
community[minor]: Add async methods to the AstraDB BaseStore ( #16872 )
...
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-02-19 10:11:49 -08:00
Qihui Xie
5738143d4b
add mongodb_store ( #13801 )
...
# Add MongoDB storage
- **Description:**
Add MongoDB Storage as an option for large doc store.
Example usage:
```Python
# Instantiate the MongodbStore with a MongoDB connection
from langchain.storage import MongodbStore
mongo_conn_str = "mongodb://localhost:27017/"
mongodb_store = MongodbStore(mongo_conn_str, db_name="test-db",
collection_name="test-collection")
# Set values for keys
doc1 = Document(page_content='test1')
doc2 = Document(page_content='test2')
mongodb_store.mset([("key1", doc1), ("key2", doc2)])
# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
# [doc1, doc2]
# Iterate over keys
for key in mongodb_store.yield_keys():
print(key)
# Delete keys
mongodb_store.mdelete(["key1", "key2"])
```
- **Dependencies:**
Use `mongomock` for integration test.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-13 22:33:22 -05:00
Harrison Chase
4eda647fdd
infra: add -p to mkdir in lint steps ( #17013 )
...
Previously, if this did not find a mypy cache then it wouldnt run
this makes it always run
adding mypy ignore comments with existing uncaught issues to unblock other prs
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-02-05 11:22:06 -08:00
Erick Friis
b1a847366c
community: revert SQL Stores ( #16912 )
...
This reverts commit cfc225ecb3
.
https://github.com/langchain-ai/langchain/pull/15909#issuecomment-1922418097
These will have existed in langchain-community 0.0.16 and 0.0.17.
2024-02-01 16:37:40 -08:00
gcheron
cfc225ecb3
community: SQLStrStore/SQLDocStore provide an easy SQL alternative to InMemoryStore
to persist data remotely in a SQL storage ( #15909 )
...
**Description:**
- Implement `SQLStrStore` and `SQLDocStore` classes that inherits from
`BaseStore` to allow to persist data remotely on a SQL server.
- SQL is widely used and sometimes we do not want to install a caching
solution like Redis.
- Multiple issues/comments complain that there is no easy remote and
persistent solution that are not in memory (users want to replace
InMemoryStore), e.g.,
https://github.com/langchain-ai/langchain/issues/14267 ,
https://github.com/langchain-ai/langchain/issues/15633 ,
https://github.com/langchain-ai/langchain/issues/14643 ,
https://stackoverflow.com/questions/77385587/persist-parentdocumentretriever-of-langchain
- This is particularly painful when wanting to use
`ParentDocumentRetriever `
- This implementation is particularly useful when:
* it's expensive to construct an InMemoryDocstore/dict
* you want to retrieve documents from remote sources
* you just want to reuse existing objects
- This implementation integrates well with PGVector, indeed, when using
PGVector, you already have a SQL instance running. `SQLDocStore` is a
convenient way of using this instance to store documents associated to
vectors. An integration example with ParentDocumentRetriever and
PGVector is provided in docs/docs/integrations/stores/sql.ipynb or
[here](https://github.com/gcheron/langchain/blob/sql-store/docs/docs/integrations/stores/sql.ipynb ).
- It persists `str` and `Document` objects but can be easily extended.
**Issue:**
Provide an easy SQL alternative to `InMemoryStore`.
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-01-23 16:50:48 -08:00
Christophe Bornet
81d1ba05dc
Add a BaseStore backed by AstraDB ( #15812 )
...
- **Description:** this change adds a `BaseStore` backed by AstraDB
- **Twitter handle:** cbornet_
2024-01-11 21:41:24 -08:00
Bagatur
ed58eeb9c5
community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community ( #14463 )
...
Moved the following modules to new package langchain-community in a backwards compatible fashion:
```
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
```
Moved the following to core
```
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
```
See .scripts/community_split/script_integrations.sh for all changes
2023-12-11 13:53:30 -08:00