community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696)

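This PR adds an allow_dangerous_deserialization parameter that forces users to explicitly opt in before any data is loaded with pickle.

The goal is to make users aware that the pickle module is involved, because pickled data that has been tampered with can execute arbitrary code when it is deserialized.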
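The risk the new parameter guards against can be shown with the standard library alone. In this minimal sketch, the Payload class is hypothetical. Its __reduce__ method tells pickle to call eval on load, so pickle.loads runs an expression chosen by whoever produced the bytes instead of rebuilding the original object.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle records (callable, args) and calls it during pickle.loads.
        # Here the call is eval("6 * 7"); tampered data could name any
        # importable callable (for example os.system) instead.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Deserializing does not return a Payload: it executes the recorded call.
result = pickle.loads(data)
print(result)  # 42
```

Nothing in the bytes has to look suspicious: the consumer only calls pickle.loads, and the producer decides what runs.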
Eugene Yurtsev
2024-03-06 16:43:01 -05:00
committed by GitHub
parent 0e52961562
commit 4c25b49229
10 changed files with 128 additions and 7 deletions


@@ -87,9 +87,28 @@ class TileDB(VectorStore):
         docs_array_uri: str = "",
         config: Optional[Mapping[str, Any]] = None,
         timestamp: Any = None,
+        allow_dangerous_deserialization: bool = False,
         **kwargs: Any,
     ):
-        """Initialize with necessary components."""
+        """Initialize with necessary components.
+        Args:
+            allow_dangerous_deserialization: whether to allow deserialization
+                of the data which involves loading data using pickle.
+                data can be modified by malicious actors to deliver a
+                malicious payload that results in execution of
+                arbitrary code on your machine.
+        """
+        if not allow_dangerous_deserialization:
+            raise ValueError(
+                "TileDB relies on pickle for serialization and deserialization. "
+                "This can be dangerous if the data is intercepted and/or modified "
+                "by malicious actors prior to being de-serialized. "
+                "If you are sure that the data is safe from modification, you can "
+                " set allow_dangerous_deserialization=True to proceed. "
+                "Loading of compromised data using pickle can result in execution of "
+                "arbitrary code on your machine."
+            )
         self.embedding = embedding
         self.embedding_function = embedding.embed_query
         self.index_uri = index_uri
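The TileDB change follows a simple pattern: check the opt-in flag before any pickle call and fail loudly by default. This minimal sketch shows the same pattern on a hypothetical PickleBackedStore class (not part of LangChain). It assumes the pickled data arrives as raw bytes.

```python
import pickle
from typing import Any


class PickleBackedStore:
    """Hypothetical store illustrating an opt-in guard like the one added to TileDB."""

    def __init__(self, blob: bytes, allow_dangerous_deserialization: bool = False) -> None:
        # Refuse before touching the bytes unless the caller explicitly opts in.
        if not allow_dangerous_deserialization:
            raise ValueError(
                "PickleBackedStore relies on pickle for deserialization. "
                "Set allow_dangerous_deserialization=True only if you trust "
                "the source of the data."
            )
        # Only reached after the explicit opt-in.
        self.obj: Any = pickle.loads(blob)


blob = pickle.dumps({"docs": ["hello"]})

# Default: the guard raises before pickle.loads is ever called.
try:
    PickleBackedStore(blob)
except ValueError as err:
    print("refused:", err)

# Explicit opt-in for data from a trusted source.
store = PickleBackedStore(blob, allow_dangerous_deserialization=True)
print(store.obj)  # {'docs': ['hello']}
```

Defaulting the flag to False keeps the safe behavior for callers who never pass it. That is also why the commit is labeled a breaking change: existing code that constructs these objects now raises until it passes allow_dangerous_deserialization=True.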