Files
langchain/docs/versioned_docs/version-0.2.x/integrations/document_loaders/cube_semantic.ipynb
Jacob Lee aff771923a Jacob/new docs (#20570)
Use docusaurus versioning with a callout, merged master as well

@hwchase17 @baskaryan

---------

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
Co-authored-by: Leonid Kuligin <lkuligin@yandex.ru>
Co-authored-by: Averi Kitsch <akitsch@google.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Martín Gotelli Ferenaz <martingotelliferenaz@gmail.com>
Co-authored-by: Fayfox <admin@fayfox.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Dawson Bauer <105886620+djbauer2@users.noreply.github.com>
Co-authored-by: Ravindu Somawansa <ravindu.somawansa@gmail.com>
Co-authored-by: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: WeichenXu <weichen.xu@databricks.com>
Co-authored-by: Benito Geordie <89472452+benitoThree@users.noreply.github.com>
Co-authored-by: kartikTAI <129414343+kartikTAI@users.noreply.github.com>
Co-authored-by: Kartik Sarangmath <kartik@thirdai.com>
Co-authored-by: Sevin F. Varoglu <sfvaroglu@octoml.ai>
Co-authored-by: MacanPN <martin.triska@gmail.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>
Co-authored-by: Hyeongchan Kim <kozistr@gmail.com>
Co-authored-by: sdan <git@sdan.io>
Co-authored-by: Guangdong Liu <liugddx@gmail.com>
Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: pjb157 <84070455+pjb157@users.noreply.github.com>
Co-authored-by: Eun Hye Kim <ehkim1440@gmail.com>
Co-authored-by: kaijietti <43436010+kaijietti@users.noreply.github.com>
Co-authored-by: Pengcheng Liu <pcliu.fd@gmail.com>
Co-authored-by: Tomer Cagan <tomer@tomercagan.com>
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
2024-04-18 11:10:55 -07:00

154 lines
4.8 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cube Semantic Layer"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates the process of retrieving Cube's data model metadata in a format suitable for passing to LLMs as embeddings, thereby enhancing contextual information."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### About Cube"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"[Cube](https://cube.dev/) is the Semantic Layer for building data apps. It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Cubes data model provides structure and definitions that are used as a context for LLM to understand data and generate correct queries. LLM doesnt need to navigate complex joins and metrics calculations because Cube abstracts those and provides a simple interface that operates on the business-level terminology, instead of SQL table and column names. This simplification helps LLM to be less error-prone and avoid hallucinations."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Input arguments (mandatory)**\n",
"\n",
"`Cube Semantic Loader` requires 2 arguments:\n",
"\n",
"- `cube_api_url`: The URL of your Cube's deployment REST API. Please refer to the [Cube documentation](https://cube.dev/docs/http-api/rest#configuration-base-path) for more information on configuring the base path.\n",
"\n",
"- `cube_api_token`: The authentication token generated based on your Cube's API secret. Please refer to the [Cube documentation](https://cube.dev/docs/security#generating-json-web-tokens-jwt) for instructions on generating JSON Web Tokens (JWT).\n",
"\n",
"**Input arguments (optional)**\n",
"\n",
"- `load_dimension_values`: Whether to load dimension values for every string dimension or not.\n",
"\n",
"- `dimension_values_limit`: Maximum number of dimension values to load.\n",
"\n",
"- `dimension_values_max_retries`: Maximum number of retries to load dimension values.\n",
"\n",
"- `dimension_values_retry_delay`: Delay between retries to load dimension values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import jwt\n",
"from langchain_community.document_loaders import CubeSemanticLoader\n",
"\n",
"api_url = \"https://api-example.gcp-us-central1.cubecloudapp.dev/cubejs-api/v1/meta\"\n",
"cubejs_api_secret = \"api-secret-here\"\n",
"security_context = {}\n",
"# Read more about security context here: https://cube.dev/docs/security\n",
"api_token = jwt.encode(security_context, cubejs_api_secret, algorithm=\"HS256\")\n",
"\n",
"loader = CubeSemanticLoader(api_url, api_token)\n",
"\n",
"documents = loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Returns a list of documents with the following attributes:\n",
"\n",
"- `page_content`\n",
"- `metadata`\n",
" - `table_name`\n",
" - `column_name`\n",
" - `column_data_type`\n",
" - `column_title`\n",
" - `column_description`\n",
" - `column_values`\n",
" - `cube_data_obj_type`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Given string containing page content\n",
"page_content = \"Users View City, None\"\n",
"\n",
"# Given dictionary containing metadata\n",
"metadata = {\n",
" \"table_name\": \"users_view\",\n",
" \"column_name\": \"users_view.city\",\n",
" \"column_data_type\": \"string\",\n",
" \"column_title\": \"Users View City\",\n",
" \"column_description\": \"None\",\n",
" \"column_member_type\": \"dimension\",\n",
" \"column_values\": [\n",
" \"Austin\",\n",
" \"Chicago\",\n",
" \"Los Angeles\",\n",
" \"Mountain View\",\n",
" \"New York\",\n",
" \"Palo Alto\",\n",
" \"San Francisco\",\n",
" \"Seattle\",\n",
" ],\n",
" \"cube_data_obj_type\": \"view\",\n",
"}"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}