box: Add citation support to langchain_box.retrievers.BoxRetriever when used with Box AI (#27012)

Thank you for contributing to LangChain!

**Description:** Box AI returns an answer to a prompt, and it can also be
configured to return the citations behind that answer. This change lets
the developer choose the answer, the citations, or both. Whichever
combination is chosen, the result is returned as a single `List[Document]`.
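
A rough sketch of the selection behavior described above, not the actual `BoxRetriever` code. The `Document` dataclass is a stand-in for the LangChain class and `select_documents` is a hypothetical helper. The defaults (`answer=True`, `citations=False`) and the answer-first ordering follow the notebook in this change. An empty list when both flags are `False` is an assumption of the sketch only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Stand-in for the LangChain Document class (illustration only)."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def select_documents(
    answer_doc: Document,
    citation_docs: List[Document],
    answer: bool = True,
    citations: bool = False,
) -> List[Document]:
    """Hypothetical helper: combine answer and citations into one list."""
    results: List[Document] = []
    if answer:
        # The answer, when requested, comes first.
        results.append(answer_doc)
    if citations:
        # Citations, when requested, follow the answer.
        results.extend(citation_docs)
    return results
```

With the defaults this returns only the answer; `citations=True` appends the cited passages, and `answer=False, citations=True` returns the citations alone.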

**Dependencies:** Updated to the latest Box Python SDK, v1.5.1
**Twitter handle:** BoxPlatform


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
Scott Hurrey
2024-10-04 14:32:34 -04:00
committed by GitHub
parent 1e768a9ec7
commit 558fb4d66d
6 changed files with 242 additions and 49 deletions

@@ -52,18 +52,10 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "b87a8e8b-9b5a-4e78-97e4-274b6b0dd29f",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"Enter your Box Developer Token: ········\n"
]
}
],
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
@@ -81,7 +73,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "a15d341e-3e26-4ca3-830b-5aab30ed66de",
"metadata": {},
"outputs": [],
@@ -102,10 +94,18 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"id": "652d6238-1f87-422a-b135-f5abbb8652fc",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-box"
]
@@ -124,7 +124,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"id": "70cc8e65-2a02-408a-bbc6-8ef649057d82",
"metadata": {},
"outputs": [],
@@ -146,7 +146,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 6,
"id": "97f3ae67",
"metadata": {},
"outputs": [
@@ -156,7 +156,7 @@
"[Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\\nInvoice Number: A5555\\n\\nLine Items:\\n - Gravitational Wave Detector Kit: $800\\n - Exoplanet Terrarium: $120\\nTotal: $920')]"
]
},
"execution_count": 33,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -192,7 +192,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 7,
"id": "ee0e726d-9974-4aa0-9ce1-0057ec3e540a",
"metadata": {},
"outputs": [],
@@ -216,17 +216,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"id": "51a60dbe-9f2e-4e04-bb62-23968f17164a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased was the **Gravitational Wave Detector Kit** from AstroTech Solutions, which cost $800.')]"
"[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased is the **Gravitational Wave Detector Kit** from AstroTech Solutions, which costs **$800**.')]"
]
},
"execution_count": 5,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -237,6 +237,80 @@
"retriever.invoke(query)"
]
},
{
"cell_type": "markdown",
"id": "31a59a51",
"metadata": {},
"source": [
"## Citations\n",
"\n",
"With Box AI and the `BoxRetriever`, you can return the answer to your prompt, the citations Box used to produce that answer, or both. In every case, the retriever returns a `List[Document]` object. Two `bool` arguments, `answer` and `citations`, control this. `answer` defaults to `True` and `citations` defaults to `False`, so you can omit both if you just want the answer. If you want both, include `citations=True`; if you only want citations, include `answer=False` and `citations=True`.\n",
"\n",
"### Get both"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "2eddc8c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased is the **Gravitational Wave Detector Kit** from AstroTech Solutions, which costs **$800**.'),\n",
" Document(metadata={'source': 'Box AI What was the most expensive item purchased', 'file_name': 'Invoice-A5555.txt', 'file_id': '1514555423624', 'file_type': 'file'}, page_content='Vendor: AstroTech Solutions\\nInvoice Number: A5555\\n\\nLine Items:\\n - Gravitational Wave Detector Kit: $800\\n - Exoplanet Terrarium: $120\\nTotal: $920')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever = BoxRetriever(\n",
" box_developer_token=box_developer_token, box_file_ids=box_file_ids, citations=True\n",
")\n",
"\n",
"retriever.invoke(query)"
]
},
{
"cell_type": "markdown",
"id": "d2e93a2e",
"metadata": {},
"source": [
"### Citations only"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c1892b07",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(metadata={'source': 'Box AI What was the most expensive item purchased', 'file_name': 'Invoice-A5555.txt', 'file_id': '1514555423624', 'file_type': 'file'}, page_content='Vendor: AstroTech Solutions\\nInvoice Number: A5555\\n\\nLine Items:\\n - Gravitational Wave Detector Kit: $800\\n - Exoplanet Terrarium: $120\\nTotal: $920')]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever = BoxRetriever(\n",
" box_developer_token=box_developer_token,\n",
" box_file_ids=box_file_ids,\n",
" answer=False,\n",
" citations=True,\n",
")\n",
"\n",
"retriever.invoke(query)"
]
},
{
"cell_type": "markdown",
"id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e",
@@ -260,7 +334,7 @@
"metadata": {},
"outputs": [
{
"name": "stdin",
"name": "stdout",
"output_type": "stream",
"text": [
"Enter your OpenAI key: ········\n"