This change will leverage the new PreFilterResult
to reduce down the list of eligible nodes for pod
using Bound Local PVs during PreFilter stage so
that only the node(s) which local PV node affinity
matches will be cosnidered in subsequent scheduling
stages.
Today, the NodeAffinity check is done during Filter
which means all nodes will be considered even though
there may be a large number of nodes that are not
eligible due to not matching the pod's bound local
PV(s)' node affinity requirement. Here we can
reduce down the node list in PreFilter to ensure that
during Filter we are only considering the reduced
list and thus can provide a more clear message to
users when node(s) are not available for scheduling
since the list only contains relevant nodes.
If error is encountered (e.g. PV cache read error) or
if node list reduction cannot be done (e.g. pod uses
no local PVs), then we will still proceed to consider
all nodes for the rest of scheduling stages.
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
This patch aims to simplify decoupling "pkg/scheduler/framework/plugins"
from internal "k8s.io/kubernetes" packages. More described in
issue #89930 and PR #102953.
Some helpers from "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
package moved to "k8s.io/component-helpers/storage/volume" package:
- IsDelayBindingMode
- GetBindVolumeToClaim
- IsVolumeBoundToClaim
- FindMatchingVolume
- CheckVolumeModeMismatches
- CheckAccessModes
- GetVolumeNodeAffinity
Also "CheckNodeAffinity" from "k8s.io/kubernetes/pkg/volume/util"
package moved to "k8s.io/component-helpers/storage/volume" package
to prevent diamond dependency conflict.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
iSCSI and FC volume plugins do not implement real 3rd party attach/detach.
If reconstruction fails with an error on a FC or iSCSI volume, it will not
be unmounted from the volume global dir and at the same time it will be
marked as unused, to be available to be mounted on another node.
The volume can then be mounted on several nodes, resulting in volume
corruption.
The other block based volume plugins implement attach/detach that either
makes the volume stuck (can't be detached) or will be force-detached from a
node before attaching it somewhere else.
The goal of this move is related to issue 89930, to break the dependence
of scheduling plugins on internal helpers. This function can easily move to
component-helpers where it will be used by other components as well.
As part of externalizing this function to the k8s.io/component-helpers repo,
this commit simplifies the function signature and makes its 2 helpers private
(nodeSelectorRequirementsAsSelector and nodeSelectorRequirementsAsFieldSelector).
This patch removes pkg/util/mount completely, and replaces it with the
mount package now located at k8s.io/utils/mount. The code found at
k8s.io/utils/mount was moved there from pkg/util/mount, so the code is
identical, just no longer in-tree to k/k.
This change is needed to make descriptor lock per pod, in the next commit.
If losetup is called for symlink, path in the output for losetup is resolved,
as a result, we can't distinguish which path the lock is taken.
This patch removes mount.Exec entirely and instead uses the common
utility from k8s.io/utils/exec.
The fake exec implementation found in k8s.io/utils/exec differs a bit
than mount.Exec, with the ability to pre-script expected calls to
Command.CombinedOutput(), so tests that previously relied on a callback
mechanism to produce specific output have been updated to use that
mechanism.
This PR fixes the issue mentioned in #83590 for GCE-PD. It uses
WriteVolumeCache API to writes the file system cache to disk during
UnmountDevice in Windows. Linux does not need to explicitly flush cache
because unmount will automatically sync the disk which also flush the
cache.
Change-Id: Ife2745c92b8c0446e79a52e9f9ec7851d2f6b90d
This patch cleans up pkg/util/mount/* and pkg/util/volume/* to always
use filepath.Join instead of path.Join. filepath.Join is preferred
because path.Join can have issues on Windows.
Since pkg/util/mount is going to move out of k/k, this exported constant
that is Kubernetes specific needed to move somewhere else. Made sense to
move it to pkg/volume/util.
Update GetDeviceNameFromMount in the mount interface to now take a
pluginMountDir argument, which is volume plugin dir with the global
mount path appended to it already.