Commit Graph

18 Commits

Author SHA1 Message Date
Rajendra Kadam
7e5a9c317f
community[minor]: [Pebblo] Enhance PebbloSafeLoader to take anonymize flag (#26812)
- **Description:** The flag is named `anonymize_snippets`. When set to
true, the Pebblo server will anonymize snippets by redacting all
personally identifiable information (PII) from the snippets going into
VectorDB and the generated reports
- **Issue:** NA
- **Dependencies:** NA
- **docs**: Updated
2024-09-25 09:33:06 -04:00
Erick Friis
c2a3021bb0
multiple: pydantic 2 compatibility, v0.3 (#26443)
Signed-off-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Dan O'Donovan <dan.odonovan@gmail.com>
Co-authored-by: Tom Daniel Grande <tomdgrande@gmail.com>
Co-authored-by: Grande <Tom.Daniel.Grande@statsbygg.no>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: ZhangShenao <15201440436@163.com>
Co-authored-by: Friso H. Kingma <fhkingma@gmail.com>
Co-authored-by: ChengZi <chen.zhang@zilliz.com>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Morgante Pell <morgantep@google.com>
2024-09-13 14:38:45 -07:00
Dristy Srivastava
7205057c3e
[Community][minor]: Added langchain_version while calling discover API (#24428)
- **Description:** Added langchain version while calling discover API
during both ingestion and retrieval
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-08-26 08:47:48 -04:00
Dristy Srivastava
fbb4761199
[Community][minor]: Updating source path, and file path for SharePoint loader in PebbloSafeLoader (#25592)
- **Description:** Updating source path and file path in Pebblo safe
loader for SharePoint apps during loading
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-08-26 08:38:40 -04:00
Rajendra Kadam
745d1c2b8d
community[minor]: [Pebblo] Fix URL construction in newer Python versions (#25747)
- **PR message**: **Fix URL construction in newer Python versions**
- **Description:** 
- Update the URL construction logic to use the .value attribute for
Routes enum members.
- This adjustment resolves an issue where the code worked correctly in
Python 3.9 but failed in Python 3.11.
  - Clean up unused routes.
- **Issue:** NA
- **Dependencies:** NA
2024-08-26 07:27:30 -04:00
Rajendra Kadam
1f1679e960
community: Refactor PebbloSafeLoader (#25582)
**Refactor PebbloSafeLoader**
  - Created `APIWrapper` and moved API logic into it.
  - Moved helper functions to the utility file.
  - Created smaller functions and methods for better readability.
  - Properly read environment variables.
  - Removed unused code.

**Issue:** NA
**Dependencies:** NA
**tests**:  Updated
2024-08-22 11:46:52 -04:00
Rajendra Kadam
a6add89bd4
community[minor]: [PebbloSafeLoader] Implement content-size-based batching (#24871)
- **Title:** [PebbloSafeLoader] Implement content-size-based batching in
the classification flow(loader/doc API)
- **Description:** 
- Implemented content-size-based batching in the loader/doc API, set to
100KB with no external configuration option, intentionally hard-coded to
prevent timeouts.
    - Remove unused field(pb_id) from doc_metadata
- **Issue:** NA
- **Dependencies:** NA
- **Add tests and docs:** Updated
2024-07-31 09:10:28 -04:00
Dristy Srivastava
020cc1cf3e
Community[minor]: Added checksum in while send data to pebblo-cloud (#23968)
- **Description:** 
            - Updated checksum in doc metadata
- Sending checksum and removing actual content, while sending data to
`pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`/loader/doc` API
            - Adding `pb_id` i.e. pebblo id to doc metadata
            - Refactoring as needed.
- Sending `content-checksum` and removing actual content, while sending
data to `pebblo-cloud` if `classifier-location `is `pebblo-cloud` in
`prmopt` API
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** Updated
- **Docs** NA

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
2024-07-19 13:52:54 -04:00
Rajendra Kadam
1c65529fd7
community[minor]: [PebbloSafeLoader] Rename loader type and add SharePointLoader to supported loaders (#24393)
Thank you for contributing to LangChain!

- [x] **PR title**: [PebbloSafeLoader] Rename loader type and add
SharePointLoader to supported loaders
    - **Description:** Minor fixes in the PebbloSafeLoader:
        - Renamed the loader type from `remote_db` to `cloud_folder`.
- Added `SharePointLoader` to the list of loaders supported by
PebbloSafeLoader.
    - **Issue:** NA
    - **Dependencies:** NA
- [x] **Add tests and docs**: NA
2024-07-18 08:23:12 -04:00
Rahul Triptahi
9ef93ecd7c
community[minor]: Added classification_location parameter in PebbloSafeLoader. (#22565)
Description: Add classifier_location feature flag. This flag enables
Pebblo to decide the classifier location, local or pebblo-cloud.
Unit Tests: N/A
Documentation: N/A

---------

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-06-24 17:30:38 -04:00
Rahul Triptahi
7994cba18d
[Community][Minor]: Fetch loader_source of GoogleDriveLoader in PebbloSafeLoader. (#21314)
Description: This PR includes fix for loader_source to be fetched from
metadata in case of GdriveLoaders.
Documentation: NA
Unit Test: NA

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-05-07 14:45:58 -07:00
Leonid Ganeline
85094cbb3a
docs: community docstring updates (#21040)
Added missed docstrings. Updated docstrings to consistent format.
2024-04-29 17:40:23 -04:00
Rahul Triptahi
955cf186d2
community[patch]: Ingest source, owner and full_path if present in Document's metadata. (#20949)
Description: The PebbloSafeLoader should first check for owner,
full_path and size in metadata before implementing its own logic.
Dependencies: None
Documentation: NA.

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-26 17:50:57 -07:00
Rahul Triptahi
dc921f0823
community[patch]: Add semantic info to metadata, classified by pebblo-server. (#20468)
Description: Add support for Semantic topics and entities.
Classification done by pebblo-server is not used to enhance metadata of
Documents loaded by document loaders.
Dependencies: None
Documentation: Updated.

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-25 12:55:33 -07:00
Leonid Ganeline
7cf2d2759d
community[patch]: docstrings update (#20301)
Added missed docstrings. Format docstings to the consistent form.
2024-04-11 16:23:27 -04:00
Rahul Triptahi
820b713086
community[minor]: Add support for Pebblo cloud_api_key in PebbloSafeLoader (#19855)
**Description**:
_PebbloSafeLoader_: Add support for pebblo's cloud api-key in
PebbloSafeLoader

- This Pull request enables PebbloSafeLoader to accept pebblo's cloud
api-key and send the semantic classification data to pebblo cloud.

**Documentation**: Updated 
**Unit test**: Added
**Issue**: NA
**Dependencies**: - None
**Twitter handle**: @rahul_tripathi2

Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-04-08 11:10:04 -04:00
Rajendra Kadam
0019d8a948
community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574)
**Description:**
PebbloSafeLoader: Add support for non-file-based Document Loaders

This pull request enhances PebbloSafeLoader by introducing support for
several non-file-based Document Loaders. With this update,
PebbloSafeLoader now seamlessly integrates with the following loaders:
- GoogleDriveLoader
- SlackDirectoryLoader
- Unstructured EmailLoader

**Issue:** NA
**Dependencies:** - None
**Twitter handle:** @Raj__725

---------

Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
2024-03-27 17:39:52 +00:00
Sridhar Ramaswamy
9f1cbbc6ed
community[minor]: Add pebblo safe document loader (#16862)
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
  - **Dependencies:** none
  - **Twitter handle:** srics

@hwchase17
2024-02-12 21:56:12 -08:00