Extend Cube Semantic Loader functionality (#8186)

**PR Description:**

This pull request introduces several enhancements and new features to
the `CubeSemanticLoader`. The changes include the following:

1. Added imports for the `json` and `time` modules.
2. Added new constructor parameters: `load_dimension_values`,
`dimension_values_limit`, `dimension_values_max_retries`, and
`dimension_values_retry_delay`.
3. Updated the class documentation with descriptions for the new
constructor parameters.
4. Added a new private method `_get_dimension_values()` to retrieve
dimension values from Cube's REST API.
5. Modified the `load()` method to load dimension values for string
dimensions if `load_dimension_values` is set to `True`.
6. Updated the API endpoint in the `load()` method from the base URL to
the metadata endpoint.
7. Refactored the code to retrieve metadata from the response JSON.
8. Added the `column_member_type` field to the metadata dictionary to
indicate if a column is a measure or a dimension.
9. Added the `column_values` field to the metadata dictionary to store
the dimension values retrieved from Cube's API.
10. Modified the `page_content` construction to include the column title
and description instead of the table name, column name, data type,
title, and description.

These changes improve the functionality and flexibility of the
`CubeSemanticLoader` class by allowing the loading of dimension values
and providing more detailed metadata for each document.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
This commit is contained in:
Mike Nitsenko
2023-07-25 01:11:58 +06:00
committed by GitHub
parent 82b8d8596c
commit d983046f90
3 changed files with 164 additions and 91 deletions

View File

@@ -53,11 +53,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Input arguments (mandatory)**\n",
"\n",
"`Cube Semantic Loader` requires 2 arguments:\n",
"| Input Parameter | Description |\n",
"| --- | --- |\n",
"| `cube_api_url` | The URL of your Cube's deployment REST API. Please refer to the [Cube documentation](https://cube.dev/docs/http-api/rest#configuration-base-path) for more information on configuring the base path. |\n",
"| `cube_api_token` | The authentication token generated based on your Cube's API secret. Please refer to the [Cube documentation](https://cube.dev/docs/security#generating-json-web-tokens-jwt) for instructions on generating JSON Web Tokens (JWT). |\n"
"\n",
"- `cube_api_url`: The URL of your Cube's deployment REST API. Please refer to the [Cube documentation](https://cube.dev/docs/http-api/rest#configuration-base-path) for more information on configuring the base path.\n",
"\n",
"- `cube_api_token`: The authentication token generated based on your Cube's API secret. Please refer to the [Cube documentation](https://cube.dev/docs/security#generating-json-web-tokens-jwt) for instructions on generating JSON Web Tokens (JWT).\n",
"\n",
"**Input arguments (optional)**\n",
"\n",
"- `load_dimension_values`: Whether to load dimension values for every string dimension or not.\n",
"\n",
"- `dimension_values_limit`: Maximum number of dimension values to load.\n",
"\n",
"- `dimension_values_max_retries`: Maximum number of retries to load dimension values.\n",
"\n",
"- `dimension_values_retry_delay`: Delay between retries to load dimension values."
]
},
{
@@ -85,9 +97,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Returns:\n",
"\n",
"A list of documents with the following attributes:\n",
"Returns a list of documents with the following attributes:\n",
"\n",
"- `page_content`\n",
"- `metadata`\n",
@@ -95,7 +105,8 @@
" - `column_name`\n",
" - `column_data_type`\n",
" - `column_title`\n",
" - `column_description`"
" - `column_description`\n",
" - `column_values`"
]
},
{
@@ -103,7 +114,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> page_content='table name: orders_view, column name: orders_view.total_amount, column data type: number, column title: Orders View Total Amount, column description: None' metadata={'table_name': 'orders_view', 'column_name': 'orders_view.total_amount', 'column_data_type': 'number', 'column_title': 'Orders View Total Amount', 'column_description': 'None'}"
"> page_content='Users View City, None' metadata={'table_name': 'users_view', 'column_name': 'users_view.city', 'column_data_type': 'string', 'column_title': 'Users View City', 'column_description': 'None', 'column_member_type': 'dimension', 'column_values': ['Austin', 'Chicago', 'Los Angeles', 'Mountain View', 'New York', 'Palo Alto', 'San Francisco', 'Seattle']}"
]
}
],