community: cube document loader - do not load non-public dimensions and measures (#30286)

Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
  - Example: "community: add foobar LLM"

- **Description:** Do not load non-public dimensions and measures
(`public: false`) with the Cube semantic loader.

- **Issue:** Currently, the Cube document loader loads non-public
dimensions and measures, so downstream applications end up using them,
which Cube does not allow.


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
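
The filtering described under **Description** can be sketched in isolation as follows. This is a minimal illustration, not the loader's actual code: `filter_public_members` and the sample `members` list are hypothetical names introduced here, and only the `public` check and the log message are taken from the change itself.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def filter_public_members(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only members whose "public" flag is truthy."""
    public_items = []
    for item in items:
        item_name = str(item.get("name"))
        # Same check as in the change: bool(None) is False, so a member
        # without a "public" key is skipped as well.
        if not bool(item.get("public")):
            logger.info("Skipping %s because it is not public.", item_name)
            continue
        public_items.append(item)
    return public_items


# Illustrative metadata only; these dictionaries are assumptions, not real Cube output.
members = [
    {"name": "test_dimension", "type": "string", "public": True},
    {"name": "hidden_dimension", "type": "string", "public": False},
    {"name": "unlabeled_dimension", "type": "string"},
]
public_names = [m["name"] for m in filter_public_members(members)]
print(public_names)
```

Note that with this check a member lacking the `public` key is treated as non-public, so the sketch assumes the metadata always carries that field for members that should be loaded.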
Priyansh Agrawal 2025-03-14 19:07:56 +00:00 committed by GitHub
parent ac22cde130
commit f54f14b747
2 changed files with 22 additions and 2 deletions

@@ -152,6 +152,11 @@ class CubeSemanticLoader(BaseLoader):
                 item_name = str(item.get("name"))
                 item_type = str(item.get("type"))
+                is_public = bool(item.get("public"))
+                if not is_public:
+                    logger.info("Skipping %s because it is not public.", item_name)
+                    continue
                 if (
                     self.load_dimension_values
                     and column_member_type == "dimension"

@@ -43,7 +43,15 @@ class TestCubeSemanticLoader(unittest.TestCase):
                             "type": "string",
                             "title": "Test Title",
                             "description": "Test Description",
-                        }
+                            "public": True,
+                        },
+                        {
+                            "name": "hidden_dimension",
+                            "type": "string",
+                            "title": "Hidden",
+                            "description": "Hidden",
+                            "public": False,
+                        },
                     ],
                 }
             ]
@@ -52,10 +60,17 @@ class TestCubeSemanticLoader(unittest.TestCase):
         mock_get_dimension_values.return_value = ["value1", "value2"]
-        documents = self.loader.load()
+        with self.assertLogs(level="INFO") as cm:
+            documents = self.loader.load()
         self.assertEqual(len(documents), 1)
         self.assertEqual(documents[0].page_content, "Test Title, Test Description")
         self.assertEqual(documents[0].metadata["column_values"], ["value1", "value2"])
+        self.assertIn(
+            "INFO:langchain_community.document_loaders.cube_semantic:"
+            "Skipping hidden_dimension because it is not public.",
+            cm.output,
+        )
 if __name__ == "__main__":