1
0
mirror of https://github.com/hwchase17/langchain.git synced 2025-04-30 04:45:23 +00:00
Commit Graph

14 Commits

Author SHA1 Message Date
Bagatur
a0c2281540
infra: update mypy 1.10, ruff 0.5 ()
```python
"""python scripts/update_mypy_ruff.py"""
import glob
import tomllib
from pathlib import Path

import toml
import subprocess
import re

ROOT_DIR = Path(__file__).parents[1]


def main():
    for path in glob.glob(str(ROOT_DIR / "libs/**/pyproject.toml"), recursive=True):
        print(path)
        with open(path, "rb") as f:
            pyproject = tomllib.load(f)
        try:
            pyproject["tool"]["poetry"]["group"]["typing"]["dependencies"]["mypy"] = (
                "^1.10"
            )
            pyproject["tool"]["poetry"]["group"]["lint"]["dependencies"]["ruff"] = (
                "^0.5"
            )
        except KeyError:
            continue
        with open(path, "w") as f:
            toml.dump(pyproject, f)
        cwd = "/".join(path.split("/")[:-1])
        completed = subprocess.run(
            "poetry lock --no-update; poetry install --with typing; poetry run mypy . --no-color",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )
        logs = completed.stdout.split("\n")

        to_ignore = {}
        for l in logs:
            if re.match("^(.*)\:(\d+)\: error:.*\[(.*)\]", l):
                path, line_no, error_type = re.match(
                    "^(.*)\:(\d+)\: error:.*\[(.*)\]", l
                ).groups()
                if (path, line_no) in to_ignore:
                    to_ignore[(path, line_no)].append(error_type)
                else:
                    to_ignore[(path, line_no)] = [error_type]
        print(len(to_ignore))
        for (error_path, line_no), error_types in to_ignore.items():
            all_errors = ", ".join(error_types)
            full_path = f"{cwd}/{error_path}"
            try:
                with open(full_path, "r") as f:
                    file_lines = f.readlines()
            except FileNotFoundError:
                continue
            file_lines[int(line_no) - 1] = (
                file_lines[int(line_no) - 1][:-1] + f"  # type: ignore[{all_errors}]\n"
            )
            with open(full_path, "w") as f:
                f.write("".join(file_lines))

        subprocess.run(
            "poetry run ruff format .; poetry run ruff --select I --fix .",
            cwd=cwd,
            shell=True,
            capture_output=True,
            text=True,
        )


if __name__ == "__main__":
    main()

```
2024-07-03 10:33:27 -07:00
Rahul Triptahi
9ef93ecd7c
community[minor]: Added classification_location parameter in PebbloSafeLoader. ()
Description: Add classifier_location feature flag. This flag enables
Pebblo to decide the classifier location, local or pebblo-cloud.
Unit Tests: N/A
Documentation: N/A

---------

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-06-24 17:30:38 -04:00
Dristy Srivastava
ef3df45d9d
community[minor]: Updating payload for pebblo discover API ()
**Description:** Updating response for pebblo discover API. Also
updating filed name case type
**Documentation:** N/A
**Unit tests:** N/A
2024-06-03 15:36:17 -07:00
Bagatur
50186da0a1
infra: rm unused # noqa violations ()
Updating 
2024-05-22 15:21:08 -07:00
Rahul Triptahi
96bd0b0844
community[patch]: Remove redundant pebblo cloud api call ()
Description: removed redundant pebblo cloud api call. Changed classified
`doc` key to `ai_apps_data`.
Documentation: N/A
Unit tests: N/A
2024-05-20 17:15:16 -07:00
Rahul Triptahi
c172611647
community[patch]: Add classifier_url argument in PebbloSafeLoader and documentation update. ()
Description: Add classifier_url argument in PebbloSafeLoader.
Documentation: Updated PebbloSafeLoader documentation with above change
and new links for pebblo github pages.

---------

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-29 17:41:09 -04:00
Rahul Triptahi
955cf186d2
community[patch]: Ingest source, owner and full_path if present in Document's metadata. ()
Description: The PebbloSafeLoader should first check for owner,
full_path and size in metadata before implementing its own logic.
Dependencies: None
Documentation: NA.

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-26 17:50:57 -07:00
Dristy Srivastava
5f1d1666e3
community[patch]: Add support for pebblo server and client version ()
**Description**:
_PebbloSafeLoader_: Add support for pebblo server and client version


**Documentation:** NA
**Unit test:** NA
**Issue:** NA
**Dependencies:**  None

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-04-25 20:39:17 +00:00
Rahul Triptahi
dc921f0823
community[patch]: Add semantic info to metadata, classified by pebblo-server. ()
Description: Add support for Semantic topics and entities.
Classification done by pebblo-server is not used to enhance metadata of
Documents loaded by document loaders.
Dependencies: None
Documentation: Updated.

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-25 12:55:33 -07:00
Rahul Triptahi
2cbfc94bcb
community[patch]: Add support for authorized identities in PebbloSafeLoader. ()
Description: Add support for authorized identities in PebbloSafeLoader.
Now with this change, PebbloSafeLoader will extract
authorized_identities from metadata and send it to pebblo server
Dependencies: None
Documentation: None

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-16 18:34:06 -07:00
Rahul Triptahi
820b713086
community[minor]: Add support for Pebblo cloud_api_key in PebbloSafeLoader ()
**Description**:
_PebbloSafeLoader_: Add support for pebblo's cloud api-key in
PebbloSafeLoader

- This Pull request enables PebbloSafeLoader to accept pebblo's cloud
api-key and send the semantic classification data to pebblo cloud.

**Documentation**: Updated 
**Unit test**: Added
**Issue**: NA
**Dependencies**: - None
**Twitter handle**: @rahul_tripathi2

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-08 11:10:04 -04:00
Rajendra Kadam
0019d8a948
community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader ()
**Description:**
PebbloSafeLoader: Add support for non-file-based Document Loaders

This pull request enhances PebbloSafeLoader by introducing support for
several non-file-based Document Loaders. With this update,
PebbloSafeLoader now seamlessly integrates with the following loaders:
- GoogleDriveLoader
- SlackDirectoryLoader
- Unstructured EmailLoader

**Issue:** NA
**Dependencies:** - None
**Twitter handle:** @Raj__725

---------

Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-03-27 17:39:52 +00:00
Jan Cap
7ae3ce60d2
community[patch]: Fix pwd import that is not available on windows ()
- **Description:** Resolving problem in
`langchain_community\document_loaders\pebblo.py` with `import pwd`.
`pwd` is not available on windows. import moved to try catch block
  - **Issue:** 
2024-02-14 13:45:10 -08:00
Sridhar Ramaswamy
9f1cbbc6ed
community[minor]: Add pebblo safe document loader ()
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
  - **Dependencies:** none
  - **Twitter handle:** srics

@hwchase17
2024-02-12 21:56:12 -08:00