Commit Graph

10 Commits

Author SHA1 Message Date
Erick Friis
600b7bdd61
all: test 3.13 ci (#27197)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-10-25 12:56:58 -07:00
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 (#23721)
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        to_ignore = {}
        for l in logs:
            if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
                path, line_no, error_type = re.match(
                    "^(.*)\:(\d+)\: error:.*\[(.*)\]", l
                ).groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f"  # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()

```
2024-07-03 10:33:27 -07:00
Eugene Yurtsev
e5541d1da7
community[patch]: Update doc-string in CloudBlobLoader (#22069)
Update doc-string
2024-05-23 15:31:41 +00:00
Philippe PRADOS
6dd621d636
community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957)
Thank you for contributing to LangChain!

- [ ] **PR title**: "Add CloudBlobLoader"
  - community: Add CloudBlobLoader

- [ ] **PR message**: Add cloud blob loader
    - **Description:** 
 Langchain provides several approaches to read different file formats:

Specific loaders (`CVSLoader`) or blob-compatible loaders
(`FileSystemBlobLoader`). The only implementation proposed for
BlobLoader is `FileSystemBlobLoader`.
      
Many projects retrieve files from cloud storage. We propose a new
implementation of `BlobLoader` to read files from the three cloud
storage systems. The interface is strictly identical to
`FileSystemBlobLoader`. The only difference is the constructor, which
takes a cloud "url" object such as `s3://my-bucket`, `az://my-bucket`,
or `gs://my-bucket`.
      
By streamlining the process, this novel implementation eliminates the
requirement to pre-download files from cloud storage to local temporary
files (which are seldom removed).
      
The code relies on the
[CloudPathLib](https://cloudpathlib.drivendata.org/stable/) library to
interpret cloud URLs. This has been added as an optional dependency.

```Python
loader = CloudBlobLoader("s3://mybucket/id")
for blob in loader.yield_blobs():
    print(blob)
```

- [X] **Dependencies:** CloudPathLib
- [X] **Twitter handle:** pprados


- [X] **Add tests and docs**: Add unit test, but it's easy to convert to
integration test, with some files in a cloud storage (see
`test_cloud_blob_loader.py`)

- [X] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified.

Hello from Paris @hwchase17. Can you review this PR?

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-05-23 10:59:55 -04:00
Eugene Yurtsev
0e5bf16d00
langchain[patch]: Migrate document loaders to use optional langchain community imports (#21095) 2024-05-01 11:26:25 -04:00
Leonid Ganeline
11195cfa42
community[patch]: speed up import times in the community package (#18928)
This PR speeds up import times in the community package
2024-03-11 16:37:36 -04:00
Christophe Bornet
ea141511d8
core: Move document loader interfaces to core (#17723)
This is needed to be able to move document loaders to partner packages.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-06 13:59:00 -05:00
Robby
ece4b43a81
community[patch]: doc loaders mypy fixes (#17368)
**Description:** Fixed `type: ignore`'s for mypy for some
document_loaders.
**Issue:** [Remove "type: ignore" comments #17048
](https://github.com/langchain-ai/langchain/issues/17048)

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-12 16:51:06 -08:00
Erick Friis
3a2eb6e12b
infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
2024-02-09 16:13:30 -08:00
Bagatur
ed58eeb9c5
community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463)
Moved the following modules to new package langchain-community in a backwards compatible fashion:

```
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
```

Moved the following to core
```
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
```

See .scripts/community_split/script_integrations.sh for all changes
2023-12-11 13:53:30 -08:00