langchain/libs/partners/box/README.md
Scott Hurrey 55fd2e2158
box: add langchain box package and DocumentLoader (#25506)

- Description: Adding new package `langchain-box`:

* `langchain_box.document_loaders.BoxLoader` — DocumentLoader
functionality
* `langchain_box.utilities.BoxAPIWrapper` — Box-specific code
* `langchain_box.utilities.BoxAuth` — Helper class for Box
authentication
* `langchain_box.utilities.BoxAuthType` — enum used by BoxAuth class

- Twitter handle: @boxplatform


Co-authored-by: Erick Friis <erickfriis@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-08-21 02:23:43 +00:00


# langchain-box
This package contains the LangChain integration with Box. For more information about
Box, check out our [developer documentation](https://developer.box.com).
## Pre-requisites
In order to integrate with Box, you need a few things:
* A Box instance: if you are not a current Box customer, sign up for a
[free dev account](https://account.box.com/signup/n/developer#ty9l3).
* A Box app: see how to
[create an app](https://developer.box.com/guides/getting-started/first-application/).
* Your app approved in your Box instance: this is done by your admin.
If you are using a free developer account, you are the admin. See how to
[authorize your app](https://developer.box.com/guides/authorization/custom-app-approval/#manual-approval).
## Installation
```bash
pip install -U langchain-box
```
## Authentication
The `langchain-box` package offers some flexibility in authentication. The
most basic authentication method is a developer token, which you can find in the
[Box developer console](https://account.box.com/developers/console)
on the configuration screen. This token is purposely short-lived (1 hour) and is
intended for development. You can add the token to your environment as
`BOX_DEVELOPER_TOKEN`, pass it directly to the loader, or use the
`BoxAuth` authentication helper class.
We will cover passing it directly to the loader in the section below.
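As a minimal sketch of the environment-variable route, the helper below reads `BOX_DEVELOPER_TOKEN` and fails early with a readable message when it is missing. The function name `get_box_developer_token` is hypothetical and not part of `langchain-box`; it uses only the standard library.

```python
import os


def get_box_developer_token(env=os.environ):
    """Return the developer token from `env`, or raise a clear error.

    Hypothetical helper for illustration; not part of langchain-box.
    """
    token = env.get("BOX_DEVELOPER_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "BOX_DEVELOPER_TOKEN is not set. Copy a developer token from the "
            "Box developer console; note that it expires after about an hour."
        )
    return token
```

Because developer tokens are short-lived, failing at startup tends to be easier to debug than an authentication error in the middle of a load.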
### BoxAuth helper class
`BoxAuth` supports the following authentication methods:
* Token — either a developer token or any token generated through the Box SDK
* JWT with a service account
* JWT with a specified user
* CCG with a service account
* CCG with a specified user
> [!NOTE]
> If using JWT authentication, you will need to download the configuration from the Box
> developer console after generating your public/private key pair. Place this file in your
> application directory structure somewhere. You will use the path to this file when using
> the `BoxAuth` helper class.

For more information, learn about how to
[set up a Box application](https://developer.box.com/guides/getting-started/first-application/),
and check out the
[Box authentication guide](https://developer.box.com/guides/authentication/select/)
for more about our different authentication options.

Examples:

**Token**
```python
from langchain_box.document_loaders import BoxLoader
from langchain_box.utilities import BoxAuth, BoxAuthType

auth = BoxAuth(
    auth_type=BoxAuthType.TOKEN,
    box_developer_token=box_developer_token
)

loader = BoxLoader(
    box_auth=auth,
    ...
)
```
**JWT with a service account**
```python
from langchain_box.document_loaders import BoxLoader
from langchain_box.utilities import BoxAuth, BoxAuthType

auth = BoxAuth(
    auth_type=BoxAuthType.JWT,
    box_jwt_path=box_jwt_path
)

loader = BoxLoader(
    box_auth=auth,
    ...
)
```
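Before passing `box_jwt_path` to `BoxAuth`, it can help to confirm that the downloaded file looks like a Box JWT configuration. The sketch below is not part of `langchain-box`: the helper names are hypothetical, and the key names (`boxAppSettings`, `clientID`, `clientSecret`, `appAuth`, `enterpriseID`) are an assumption based on the JSON file the Box developer console generates, so check them against your own file.

```python
import json
from pathlib import Path


def missing_jwt_keys(config):
    """List expected keys absent from a parsed JWT config dict.

    Key names are assumed from the console-generated JSON file.
    """
    app_settings = config.get("boxAppSettings", {})
    missing = [
        key
        for key in ("clientID", "clientSecret", "appAuth")
        if key not in app_settings
    ]
    if "enterpriseID" not in config:
        missing.append("enterpriseID")
    return missing


def load_jwt_config(path):
    """Read the config file at `path` and raise if it looks incomplete."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = missing_jwt_keys(config)
    if missing:
        raise ValueError(f"JWT config is missing keys: {', '.join(missing)}")
    return config
```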
**JWT with a specified user**
```python
from langchain_box.document_loaders import BoxLoader
from langchain_box.utilities import BoxAuth, BoxAuthType

auth = BoxAuth(
    auth_type=BoxAuthType.JWT,
    box_jwt_path=box_jwt_path,
    box_user_id=box_user_id
)

loader = BoxLoader(
    box_auth=auth,
    ...
)
```
**CCG with a service account**
```python
from langchain_box.document_loaders import BoxLoader
from langchain_box.utilities import BoxAuth, BoxAuthType

auth = BoxAuth(
    auth_type=BoxAuthType.CCG,
    box_client_id=box_client_id,
    box_client_secret=box_client_secret,
    box_enterprise_id=box_enterprise_id
)

loader = BoxLoader(
    box_auth=auth,
    ...
)
```
**CCG with a specified user**
```python
from langchain_box.document_loaders import BoxLoader
from langchain_box.utilities import BoxAuth, BoxAuthType

auth = BoxAuth(
    auth_type=BoxAuthType.CCG,
    box_client_id=box_client_id,
    box_client_secret=box_client_secret,
    box_user_id=box_user_id
)

loader = BoxLoader(
    box_auth=auth,
    ...
)
```
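The five examples above differ only in the keyword arguments passed to `BoxAuth`. As a rough sketch, the table below collects those arguments so that missing settings can be reported before any Box call is made. The method labels (`"token"`, `"jwt_user"`, and so on) and the helper `missing_auth_fields` are hypothetical names for illustration; only the argument names come from the examples.

```python
# Keyword arguments used by each BoxAuth example above.
# The dictionary keys are illustrative labels, not BoxAuthType values.
REQUIRED_AUTH_FIELDS = {
    "token": {"box_developer_token"},
    "jwt_service_account": {"box_jwt_path"},
    "jwt_user": {"box_jwt_path", "box_user_id"},
    "ccg_service_account": {
        "box_client_id",
        "box_client_secret",
        "box_enterprise_id",
    },
    "ccg_user": {"box_client_id", "box_client_secret", "box_user_id"},
}


def missing_auth_fields(method, **provided):
    """Return the sorted names of required fields that are absent or empty."""
    required = REQUIRED_AUTH_FIELDS[method]
    present = {name for name, value in provided.items() if value}
    return sorted(required - present)


print(missing_auth_fields("ccg_user", box_client_id="id", box_client_secret="s"))
# ['box_user_id']
```

An empty list means every argument shown in the corresponding example has a value.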
## Document Loaders
The `BoxLoader` class helps you get your unstructured content from Box
in LangChain's `Document` format. You can do this with either a `List[str]`
containing Box file IDs, or with a `str` containing a Box folder ID.
When loading from a folder ID, you can also set the `recursive` flag (a `bool`) to
tell the loader to include all sub-folders in that folder.

> [!NOTE]
> A Box instance can contain petabytes of files, and folders can contain millions
> of files. Be intentional about which folders you index. We recommend never
> loading all files from folder 0 recursively. Folder ID 0 is your root folder.

### Load files
```python
import os

from langchain_box.document_loaders import BoxLoader

os.environ["BOX_DEVELOPER_TOKEN"] = "df21df2df21df2d1f21df2df1"

loader = BoxLoader(
    box_file_ids=["12345", "67890"],
    character_limit=10000,  # Optional. Defaults to no limit
)

# lazy_load() returns an iterator of Documents; wrap it in list() to materialize them.
docs = loader.lazy_load()
```
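In LangChain, `lazy_load()` returns an iterator rather than a list, so documents can be processed in bounded batches instead of all at once. The sketch below shows one way to do that with the standard library. `in_batches` is a hypothetical helper, not part of `langchain-box`, and a plain `range` stands in for the loader's output.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield lists of up to `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A range stands in for the documents a loader would yield.
print(list(in_batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

With a real loader, the same helper could wrap `loader.lazy_load()` in place of `range(5)`.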
### Load from folder
```python
import os

from langchain_box.document_loaders import BoxLoader

os.environ["BOX_DEVELOPER_TOKEN"] = "df21df2df21df2d1f21df2df1"

loader = BoxLoader(
    box_folder_id="12345",
    recursive=False,  # Optional. Set to True to load the entire folder tree. Defaults to False
    character_limit=10000,  # Optional. Defaults to no limit
)

# lazy_load() returns an iterator of Documents; wrap it in list() to materialize them.
docs = loader.lazy_load()
```
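Since folder ID 0 is the root folder, a small guard in your own code can catch an accidental recursive load of the whole instance before the loader is built. This is a sketch under that assumption only. `check_folder_request` and its `allow_root_recursive` flag are hypothetical and not part of `langchain-box`.

```python
def check_folder_request(box_folder_id, recursive=False, allow_root_recursive=False):
    """Return loader keyword arguments, refusing a recursive load of folder 0.

    Hypothetical guard for illustration; the returned dict uses the
    `box_folder_id` and `recursive` argument names from the example above.
    """
    folder_id = str(box_folder_id)
    if folder_id == "0" and recursive and not allow_root_recursive:
        raise ValueError(
            "Refusing to load folder 0 recursively: that is the entire Box "
            "instance. Pass allow_root_recursive=True if this is intended."
        )
    return {"box_folder_id": folder_id, "recursive": recursive}


print(check_folder_request("12345", recursive=True))
# {'box_folder_id': '12345', 'recursive': True}
```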