This change updates KMS v2 to not create a new DEK for every
encryption. Instead, we re-use the DEK while the key ID is stable.
Specifically:
We no longer use a random 12 byte nonce per encryption. Instead, we
use both a random 4 byte nonce and an 8 byte nonce set via an atomic
counter. Since each DEK is randomly generated and never re-used,
the combination of DEK and counter are always unique. Thus there
can never be a nonce collision. AES GCM strongly encourages the use
of a 12 byte nonce, hence the additional 4 byte random nonce. We
could leave those 4 bytes set to all zeros, but there is no harm in
setting them to random data (it may help in some edge cases such as
live VM migration).
If the plugin is not healthy, the last DEK will be used for
encryption for up to three minutes (there is no difference on the
behavior of reads which have always used the DEK cache). This will
reduce the impact of a short plugin outage while making it easy to
perform storage migration after a key ID change (i.e. simply wait
ten minutes after the key ID change before starting the migration).
The DEK rotation cycle is performed in sync with the KMS v2 status
poll thus we always have the correct information to determine if a
read is stale in regards to storage migration.
Signed-off-by: Monis Khan <mok@microsoft.com>
Kubelet in standalone mode won't have kubeclient, it cannot get node.status
and get devices from it. Such a kubelet cannot mount attachable volumes
anyway.
* apiserver: add latency tracker for priority & fairness queue wait time
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
* apiserver: exclude priority & fairness wait times to SLO/SLI latency metrics
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
* apiserver: update TestLatencyTrackersFrom to check latency from PriorityAndFairnessTracker
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
* flowcontrol: add helper function observeQueueWaitTime to consolidate metric and latency tracker calls
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
* flowcontrol: replace time.Now() / time.Since() with clock.Now() / clock.Since() for better testability
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
* flowcontrol: add unit test TestQueueWaitTimeLatencyTracker to validate queue wait times recorded by latency tracker
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
---------
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
SyncKnownPods began triggering UpdatePod() for pods that have been
orphaned by desired config to ensure pods run to termination. This
test reads a mutex protected value while pod workers are running
in the background and as a consequence triggers a data race.
Wait for the workers to stabilize before reading the value. Other
tests validate that the correct sync events are triggered (see
kubelet_pods_test.go#TestKubelet_HandlePodCleanups for full
verification of this behavior).
It is slightly concerning that I was unable to recreate the race
locally even under stress testing, but I cannot identify why.