kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2026-01-29 21:29:24 +00:00

Author	SHA1	Message	Date
Kubernetes Submit Queue	865321c2d6	Merge pull request #61940 from alinbalutoiu/master Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add support for CNI on Windows Server 2016 RTM What this PR does / why we need it: Windows Server 2016 RTM has limited CNI support. This PR makes it possible for the CNI plugin to be used to setup POD networking on Windows Server 2016 RTM (build number 14393). Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #61939 Special notes for your reviewer: The old mode is not supported and tested on Windows Server 2016 RTM. This change allows the CNI plugin to be used on Windows Server 2016 RTM to retrieve the container IP instead of using workarounds (docker inspect). CNI support has been added for Windows Server 2016 version 1709 (build number 16299), this patch will just allow the same support for older build numbers. Windows Server 2016 RTM has a longer lifecycle (LTS) than Windows Server 2016 version 1709. https://support.microsoft.com/en-us/lifecycle/search/19761 vs https://support.microsoft.com/en-us/lifecycle/search/20311 Release note: ```release-note NONE ```	2018-05-03 10:17:03 -07:00
Kubernetes Submit Queue	592c39bccc	Merge pull request #62541 from filbranden/cgroupname1 Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Use a []string for CgroupName, which is a more accurate internal representation What this PR does / why we need it: This is purely a refactoring and should bring no essential change in behavior. It does clarify the cgroup handling code quite a bit. It is preparation for further changes we might want to do in the cgroup hierarchy. (But it's useful on its own, so even if we don't do any, it should still be considered.) Special notes for your reviewer: The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings. It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed. The new constructor `NewCgroupName` starts from an existing `CgroupName`, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use. A `RootCgroupName` for the top of the cgroup hierarchy tree is introduced. This refactor results in a net reduction of around 30 lines of code, mainly with the demise of ConvertCgroupNameToSystemd which had fairly complicated logic in it and was doing just too many things. There's a small TODO in a helper `updateSystemdCgroupInfo` that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.) 
Tested: By running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.) NOTE: I only tested this with dockershim, we should double-check that this works with the CRI endpoints too, both in cgroupfs and systemd modes. /assign @derekwaynecarr /assign @dashpole /assign @Random-Liu Release note: ```release-note NONE ```	2018-05-03 08:16:45 -07:00
Kubernetes Submit Queue	4f56127582	Merge pull request #63073 from andyxning/refactor_grpc_dial_with_dialcontext Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. refactor device plugin grpc dial with dialcontext What this PR does / why we need it: Refactor grpc `dial` with `dialContext` as `grpc.WithTimeout` has been deprecated by: > use DialContext and context.WithTimeout instead. Special notes for your reviewer: Release note: ```release-note NONE ```	2018-05-03 01:16:34 -07:00
Kubernetes Submit Queue	186dd7beb1	Merge pull request #62903 from cofyc/fixfsgroupcheckinlocal Automatic merge from submit-queue (batch tested with PRs 62657, 63278, 62903, 63375). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add more volume types in e2e and fix part of them.
What this PR does / why we need it:
- Add dir-link/dir-bindmounted/dir-link-bindmounted/blockfs volume types for e2e tests.
- Fix fsGroup related e2e tests partially.
- Return error if we cannot resolve volume path.
  - We should not fall back to the volume path: if it's a symbolic link, we may get wrong results.

To safely set fsGroup on a local volume, we need to implement these two methods correctly for all volume types, both on the host and in the container:
- get volume path kubelet can access
  - paths on the host and in container are different
- get mount references
  - for directories, we cannot use the mount source (device field) to identify mount references, because directories on the same filesystem have the same mount source (e.g. tmpfs); we need to check the filesystem's major:minor and the directory root path on it

Here is current status:

| | (A) volume-path (host) | (B) volume-path (container) | (C) mount-refs (host) | (D) mount-refs (container) |
| --- | --- | --- | --- | --- |
| (1) dir | OK | FAIL | FAIL | FAIL |
| (2) dir-link | OK | FAIL | FAIL | FAIL |
| (3) dir-bindmounted | OK | FAIL | FAIL | FAIL |
| (4) dir-link-bindmounted | OK | FAIL | FAIL | FAIL |
| (5) tmpfs | OK | FAIL | FAIL | FAIL |
| (6) blockfs | OK | FAIL | OK | FAIL |
| (7) block | NOTNEEDED | NOTNEEDED | NOTNEEDED | NOTNEEDED |
| (8) gce-localssd-scsi-fs | NOTTESTED | NOTTESTED | NOTTESTED | NOTTESTED |

- This PR uses `nsenter ... readlink` to resolve the path in the container, as @msau42 @jsafrane [suggested](https://github.com/kubernetes/kubernetes/pull/61489#pullrequestreview-110032850). This fixes B1:B6 and D6; the rest will be addressed in https://github.com/kubernetes/kubernetes/pull/62102.
- C5:D5 are marked `FAIL` because `tmpfs` filesystems can share the same mount source, so we cannot rely on it to check mount references. The e2e tests pass only because we use a unique mount source string in tests.
- A7:D7 are marked `NOTNEEDED` because we don't set fsGroup on block devices in the local plugin. (TODO: Should we set fsGroup on block device?)
- A8:D8 are marked `NOTTESTED` because I didn't test it; I leave it to `pull-kubernetes-e2e-gce`. I think it should be the same as `blockfs`.

Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes # Special notes for your reviewer: Release note: ```release-note NONE ```	2018-05-02 20:13:11 -07:00
Kubernetes Submit Queue	b5f61ac129	Merge pull request #62657 from matthyx/master Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation. https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html ```release-note Use /usr/bin/env in all script shebangs to increase portability. ```	2018-05-02 19:44:32 -07:00
Yecheng Fu	3748197876	Add more volume types in e2e and fix part of them. - Add dir-link/dir-bindmounted/dir-link-bindmounted/blockfs volume types for e2e tests. - Return error if we cannot resolve volume path. - Add GetFSGroup/GetMountRefs methods for mount.Interface. - Fix fsGroup related e2e tests partially.	2018-05-02 10:31:42 +08:00
Kubernetes Submit Queue	8eb7eeef39	Merge pull request #63321 from sjenning/fix-pod-deletor Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. kubelet: force filterContainerID to empty string when removeAll is true fixes https://github.com/kubernetes/kubernetes/issues/57865 alternative to https://github.com/kubernetes/kubernetes/pull/62170 There is a bug in the container deletor where if `removeAll` is `true` in `deleteContainersInPod()` the `filterContainerID` is still used to filter results in `getContainersToDeleteInPod()`. If the filter container is not found, no containers are returned for deletion. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod_container_deletor.go#L74-L77 This is the case for the delayed deletion a pod in `CrashLoopBackoff` as the death of the infra container in response to the `DELETE` is detected by PLEG and triggers an attempt to clean up all containers but uses the infra container id as a filter with `removeAll` set to `true`. The infra container is immediately deleted and thus can not be found when `getContainersToDeleteInPod()` tries to find it. Thus the dead app container from the previous restart attempt still exists. `canBeDeleted()` in the status manager will return `false` until all the pod containers are deleted, delaying the deletion of the pod on the API server. The removal of the containers is eventually forced by the API server `REMOVE` after the grace period.	2018-05-01 09:50:13 -07:00
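The filter behaviour described in #63321 can be illustrated with a small sketch. This is a simplified stand-in rather than the kubelet code: the `container` type and the `containersToDelete` helper are hypothetical, and only the names `deleteContainersInPod`, `filterContainerID` and `removeAll` come from the commit message.

```go
package main

import "fmt"

// container is a hypothetical, minimal stand-in for a pod container record.
type container struct {
	ID     string
	Exited bool
}

// containersToDelete is a simplified stand-in for getContainersToDeleteInPod.
// When filterID is non-empty and no container matches it, nothing is returned,
// which is the behaviour the commit message describes.
func containersToDelete(filterID string, all []container) []container {
	if filterID != "" {
		found := false
		for _, c := range all {
			if c.ID == filterID {
				found = true
				break
			}
		}
		if !found {
			return nil
		}
	}
	var out []container
	for _, c := range all {
		if c.Exited {
			out = append(out, c)
		}
	}
	return out
}

// deleteContainersInPod forces the filter to the empty string when removeAll
// is set, so an already-deleted infra container can no longer hide the rest.
func deleteContainersInPod(filterID string, all []container, removeAll bool) []container {
	if removeAll {
		filterID = ""
	}
	return containersToDelete(filterID, all)
}

func main() {
	// The infra container "infra-0" is already gone; a dead app container remains.
	remaining := []container{{ID: "app-1", Exited: true}}
	fmt.Println(len(containersToDelete("infra-0", remaining)))
	fmt.Println(len(deleteContainersInPod("infra-0", remaining, true)))
}
```

With the stale filter the dead app container is never selected; clearing the filter when `removeAll` is true lets it be cleaned up.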
Filipe Brandenburger	b230fb8ac4	Use a []string for CgroupName, which is a more accurate internal representation The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings. It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed. The new constructor NewCgroupName starts from an existing CgroupName, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use. A RootCgroupName for the top of the cgroup hierarchy tree is introduced. This refactor results in a net reduction of around 30 lines of code, mainly with the demise of ConvertCgroupNameToSystemd which had fairly complicated logic in it and was doing just too many things. There's a small TODO in a helper updateSystemdCgroupInfo that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.) Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.)	2018-05-01 08:29:06 -07:00
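A minimal sketch of the idea behind the `[]string` representation follows. It is an assumption-laden illustration, not the kubelet implementation: returning an error from `NewCgroupName` and the `toCgroupfs` conversion helper are choices made for this sketch, while `CgroupName`, `NewCgroupName`, `RootCgroupName` and the rejected characters come from the commit message.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// CgroupName stores a hierarchical cgroup name as its path components.
type CgroupName []string

// RootCgroupName is the top of the cgroup hierarchy (no components).
var RootCgroupName = CgroupName([]string{})

// NewCgroupName extends an existing name with more components and rejects
// components containing "/" or "_". Returning an error is this sketch's
// choice; the real constructor may handle invalid input differently.
func NewCgroupName(base CgroupName, components ...string) (CgroupName, error) {
	for _, c := range components {
		if strings.ContainsAny(c, "/_") {
			return nil, fmt.Errorf("invalid character in cgroup component %q", c)
		}
	}
	// Copy so the caller's base slice is never aliased.
	return append(append(CgroupName{}, base...), components...), nil
}

// toCgroupfs is a hypothetical explicit conversion to a cgroupfs-style path.
func (n CgroupName) toCgroupfs() string {
	return "/" + path.Join(n...)
}

func main() {
	name, err := NewCgroupName(RootCgroupName, "kubepods", "burstable")
	if err != nil {
		panic(err)
	}
	fmt.Println(name.toCgroupfs())
}
```

Because the name is a slice rather than a string, a driver-specific path only appears after an explicit conversion call.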
Seth Jennings	1fb3b24b63	kubelet: fix warning message to not print pointer addrs	2018-04-30 16:38:43 -05:00
Seth Jennings	2aba27da86	kubelet: force filterContainerID to empty string when removeAll is true	2018-04-30 16:29:17 -05:00
Kubernetes Submit Queue	15cc20630d	Merge pull request #60034 from pohly/device-manager-goroutine Automatic merge from submit-queue (batch tested with PRs 58474, 60034, 62101, 63198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. avoid race condition in device manager and plugin startup/shutdown: wait for goroutines What this PR does / why we need it: Commit `1325c2f` worked around issue #59488, but it is still worthwhile to fix the underlying root cause properly. Which issue(s) this PR fixes: Fixes #59488 Special notes for your reviewer: This is an alternative to PR #59861, which used a different approach. Personally I tend to prefer this one now. Release note: ```release-note NONE ``` /sig node /area hw-accelerators /assign vikaschoudhary16	2018-04-30 13:24:08 -07:00
Lantao Liu	4bb16659ee	Make kubelet `ReadLogs` backward compatible. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-04-27 16:03:29 -07:00
Kubernetes Submit Queue	284e8182a4	Merge pull request #63160 from sjenning/no-waitlogs-stopped-pod Automatic merge from submit-queue (batch tested with PRs 63252, 63160). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. kubelet: logs: do not wait when following terminated container Currently, a `kubectl logs -f` on a terminated container will output the logs, wait 5 seconds (`stateCheckPeriod`), then return. The 5 seconds delay should not occur as the container is terminated and unable to generate additional log messages. This PR puts a check at the beginning of `waitLogs()` to avoid doing the wait when the container is not running. @derekwaynecarr @smarterclayton	2018-04-27 12:27:05 -07:00
David Eads	e2fc5cf259	remove versioning interface	2018-04-27 07:56:42 -04:00
Kubernetes Submit Queue	7ff38a23f0	Merge pull request #62937 from vikaschoudhary16/fix-dockershim-e2e Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fix dockershim e2e What this PR does / why we need it: Delete the checkpoint file when GetCheckpoint fails due to a corrupt checkpoint. Earlier, before checkpointmanager, [`GetCheckpoint` in dockershim deleted a corrupt checkpoint file implicitly](https://github.com/kubernetes/kubernetes/pull/56040/files#diff-9a174fa21408b7faeed35309742cc631L116). checkpointmanager's `GetCheckpoint` does not perform this implicit deletion, so a few e2e tests that exercise it are failing. This change deletes the checkpoint file explicitly when it is found to be corrupt. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #62738 Special notes for your reviewer: No new behavior is being introduced. Implicit deletion of corrupt checkpoint is being done explicitly. Release note: ```release-note None ``` /cc @dashpole @sjenning @derekwaynecarr	2018-04-26 16:26:14 -07:00
Seth Jennings	5da3a1d514	kubelet: logs: do not wait on following terminated container	2018-04-26 16:53:54 -05:00
Kubernetes Submit Queue	a38a02792b	Merge pull request #62662 from wangzhen127/runtime-default Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Change seccomp annotation from "docker/default" to "runtime/default" What this PR does / why we need it: This PR changes seccomp annotation from "docker/default" to "runtime/default", so that it can be applied to all kinds of container runtimes. This PR is a followup of [#1963](https://github.com/kubernetes/community/pull/1963). Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #39845 Special notes for your reviewer: Release note: ```release-note NONE ```	2018-04-26 14:33:53 -07:00
David Eads	a89291a5de	stop duplicating preferred version order	2018-04-26 10:03:36 -04:00
Lantao Liu	a321646e74	Add level to remote client glog. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-04-26 01:21:20 -07:00
Kubernetes Submit Queue	9e52d14eb9	Merge pull request #62805 from awly/take-reviews Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add awly as reviewer in several subtrees ```release-note NONE ```	2018-04-25 21:24:31 -07:00
Kubernetes Submit Queue	97287177ee	Merge pull request #63075 from deads2k/api-05-eliminate-indirection Automatic merge from submit-queue (batch tested with PRs 62982, 63075, 63067, 62877, 63141). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. eliminate indirection from type registration Some years back there was a partial attempt to revamp api type registration, but the effort was never completed and this was before we started splitting schemes. With separate schemes, the idea of partial registration no longer makes sense. This pull starts removing cruft from the registration process and pulls out a layer of indirection that isn't needed. @kubernetes/sig-api-machinery-pr-reviews @lavalamp @cheftako @sttts @smarterclayton Rebase cost is fairly high, so I'd like to avoid this lingering. /assign @sttts /assign @cheftako ```release-note NONE ```	2018-04-25 11:53:14 -07:00
Kubernetes Submit Queue	af5f9bc9bb	Merge pull request #62982 from dixudx/warning_kubelet_remote_sandbox Automatic merge from submit-queue (batch tested with PRs 62982, 63075, 63067, 62877, 63141). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. add warnings on using pod-infra-container-image for remote container runtime What this PR does / why we need it: We should warn on using `--pod-infra-container-image` to avoid confusions, when users are using remote container runtime. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #55676,#62388,#62732 Special notes for your reviewer: /cc @kubernetes/sig-node-pr-reviews Release note: ```release-note add warnings on using pod-infra-container-image for remote container runtime ```	2018-04-25 11:53:11 -07:00
David Eads	e7fbbe0e3c	eliminate indirection from type registration	2018-04-25 09:02:31 -04:00
Andy Xie	b01657d0c7	refactor device plugin grpc dial with dialcontext	2018-04-25 18:40:23 +08:00
Kubernetes Submit Queue	046baee847	Merge pull request #63118 from vikaschoudhary16/start-stop-race Automatic merge from submit-queue (batch tested with PRs 62951, 57460, 63118). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fix device plugin re-registration What this PR does / why we need it: While registering a new endpoint, the device manager copies all the devices from the old endpoint for the same resource, then stops the old endpoint and starts the new one. There is no synchronization between stopping the old endpoint and starting the new one. While stopping the old endpoint, the manager marks its devices (which were also copied to the new endpoint) as "Unhealthy". In endpoint.go, when the plugin reports the devices as healthy after the restart, the endpoint database already holds the same (healthy) state, so the endpoint module does not update the manager database. The solution in this PR is to mark devices as unhealthy before copying them to the new endpoint. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #62773 Special notes for your reviewer: Release note: ```release-note None ``` /cc @jiayingz @vishh @RenaudWasTaken @derekwaynecarr	2018-04-25 02:01:56 -07:00
Di Xu	b47ab8b2d3	add warnings for docker-only flags	2018-04-25 12:56:53 +08:00
Kubernetes Submit Queue	61892abc94	Merge pull request #62874 from dcbw/dockershim-SetUpPod-cleanup-on-failure Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. dockershim/sandbox: clean up pod network even if SetUpPod() failed If the CNI network plugin completes successfully, but something fails between that success and dockershim's sandbox setup code, plugin resources may not be cleaned up. A non-trivial amount of code runs after the plugin itself exits and the CNI driver's SetUpPod() returns, and any error condition recognized by that code would cause this leakage. The Kubernetes CRI RunPodSandbox() request does not attempt to clean up on errors, since it cannot know how much (if any) networking was actually set up. It depends on the CRI implementation to do that cleanup for it. In the dockershim case, a SetUpPod() failure means networkReady is FALSE for the sandbox, and TearDownPod() will not be called later by garbage collection even though networking was configured, because dockershim can't know how far SetUpPod() got. Concrete examples include if the sandbox's container is somehow removed during that time, or another OS error is encountered, or the plugin returns a malformed result to the CNI driver. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1532965 ```release-note NONE ```	2018-04-24 21:48:01 -07:00
vikaschoudhary16	c846d5fe63	Fix race between stopping old and starting new endpoint	2018-04-24 22:22:39 -04:00
Kubernetes Submit Queue	a4271c03cb	Merge pull request #63090 from mtaufen/fix-qosreserved-json-tag Automatic merge from submit-queue (batch tested with PRs 59220, 62927, 63084, 63090, 62284). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fix qosReserved json tag (lowercase qos, instead of uppercase QOS) The API conventions specify that json keys should start with a lowercase character, and if the key starts with an initialism, all characters in the initialism should be lowercase. See `tlsCipherSuites` as an example. API Conventions: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md >All letters in the acronym should have the same case, using the >appropriate case for the situation. For example, at the beginning >of a field name, the acronym should be all lowercase, such as "httpGet". Follow up to: https://github.com/kubernetes/kubernetes/pull/62925 ```release-note NONE ``` @sjenning @derekwaynecarr	2018-04-24 19:01:20 -07:00
Kubernetes Submit Queue	f68d10cfe4	Merge pull request #62853 from tony612/fix-resultRun-reset Automatic merge from submit-queue (batch tested with PRs 62655, 61711, 59122, 62853, 62390). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. reset resultRun to 0 on pod restart What this PR does / why we need it: The resultRun should be reset to 0 on pod restart, so that resultRun on the first failure of the new container will be 1, which is correct. Otherwise, the actual FailureThreshold after restarting will be `FailureThreshold - 1`. Which issue(s) this PR fixes: This PR is related to https://github.com/kubernetes/kubernetes/issues/53530. https://github.com/kubernetes/kubernetes/pull/46371 fixed that issue but there's still a little problem like what I said above. Special notes for your reviewer: Release note: ```release-note fix resultRun by resetting it to 0 on pod restart ```	2018-04-24 13:28:25 -07:00
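The counting issue in #62853 can be sketched as follows. The `probeWorker` type, its `record` method and the `"failure"` result string are hypothetical simplifications; only `resultRun` and `FailureThreshold` are names taken from the PR description.

```go
package main

import "fmt"

// probeWorker is a hypothetical, stripped-down probe worker that counts
// consecutive identical probe results.
type probeWorker struct {
	lastResult string
	resultRun  int
}

// record stores one probe result and reports whether the failure threshold
// has been reached by consecutive failures.
func (w *probeWorker) record(result string, failureThreshold int) bool {
	if result == w.lastResult {
		w.resultRun++
	} else {
		w.lastResult = result
		w.resultRun = 1
	}
	return result == "failure" && w.resultRun >= failureThreshold
}

// onRestart resets the run to 0 so the first failure of the new container
// counts as 1 rather than continuing the previous container's run.
func (w *probeWorker) onRestart() {
	w.resultRun = 0
}

func main() {
	w := &probeWorker{}
	for i := 0; i < 3; i++ {
		w.record("failure", 3)
	}
	w.onRestart()
	reached := w.record("failure", 3)
	fmt.Println(reached, w.resultRun)
}
```

If the run were reset to any value above 0, the new container would reach the threshold after fewer failures than configured, which matches the `FailureThreshold - 1` effect the PR describes.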
Kubernetes Submit Queue	44b57338d5	Merge pull request #59692 from mtaufen/dkcfg-unpack-configmaps Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. unpack dynamic kubelet config payloads to files This PR unpacks the downloaded ConfigMap to a set of files on the node. This enables other config files to ride alongside the KubeletConfiguration, and the KubeletConfiguration to refer to these cohabitants with relative paths. This PR also stops storing dynamic config metadata (e.g. current, last-known-good config records) in the same directory as config checkpoints. Instead, it splits the storage into `meta` and `checkpoints` dirs. The current store dir structure is as follows:
```
- dir named by --dynamic-config-dir (root for managing dynamic config)
| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good)
  | - current (a serialized v1 NodeConfigSource object, indicating the assigned config)
  | - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config)
| - checkpoints (dir for config checkpoints)
  | - uid1 (dir for unpacked config, identified by uid1)
    | - file1
    | - file2
    | - ...
  | - uid2
  | - ...
```
There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643. ```release-note NONE ``` /cc @luxas @smarterclayton	2018-04-24 12:01:37 -07:00
Dan Williams	91321ef85b	dockershim/sandbox: clean up pod network even if SetUpPod() failed If the CNI network plugin completes successfully, but something fails between that success and dockershim's sandbox setup code, plugin resources may not be cleaned up. A non-trivial amount of code runs after the plugin itself exits and the CNI driver's SetUpPod() returns, and any error condition recognized by that code would cause this leakage. The Kubernetes CRI RunPodSandbox() request does not attempt to clean up on errors, since it cannot know how much (if any) networking was actually set up. It depends on the CRI implementation to do that cleanup for it. In the dockershim case, a SetUpPod() failure means networkReady is FALSE for the sandbox, and TearDownPod() will not be called later by garbage collection even though networking was configured, because dockershim can't know how far SetUpPod() got. Concrete examples include if the sandbox's container is somehow removed during that time, or another OS error is encountered, or the plugin returns a malformed result to the CNI driver. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1532965	2018-04-24 11:17:49 -05:00
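The cleanup-on-failure pattern described above might look roughly like this. The `network` type is a hypothetical fake plugin and `runPodSandbox` is a heavily reduced stand-in; only the method names `SetUpPod` and `TearDownPod` echo the commit message, and their signatures here are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// network is a hypothetical fake plugin that counts setup and teardown calls.
type network struct {
	configured int
	tornDown   int
}

// SetUpPod allocates resources and may still fail afterwards, for example
// because the plugin result could not be parsed.
func (n *network) SetUpPod(fail bool) error {
	n.configured++ // resources exist even if a later step fails
	if fail {
		return errors.New("malformed plugin result")
	}
	return nil
}

// TearDownPod releases whatever SetUpPod may have allocated.
func (n *network) TearDownPod() {
	n.tornDown++
}

// runPodSandbox tears the network down itself when setup reports an error,
// because nobody else knows how far setup got.
func runPodSandbox(n *network, failSetup bool) error {
	if err := n.SetUpPod(failSetup); err != nil {
		n.TearDownPod()
		return fmt.Errorf("network setup failed: %w", err)
	}
	return nil
}

func main() {
	n := &network{}
	err := runPodSandbox(n, true)
	fmt.Println(err != nil, n.configured, n.tornDown)
}
```

The design choice is to treat a failed setup as "possibly configured" and always attempt teardown, rather than trusting the error to mean nothing was allocated.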
Michael Taufen	23c21b055c	Fix qosReserved json tag (lowercase qos, instead of uppercase QOS) The API conventions specify that json keys should start with a lowercase character, and if the key starts with an initialism, all characters in the initialism should be lowercase. See `tlsCipherSuites` as an example. API Conventions: https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md >All letters in the acronym should have the same case, using the >appropriate case for the situation. For example, at the beginning >of a field name, the acronym should be all lowercase, such as "httpGet".	2018-04-24 09:12:35 -07:00
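To make the naming convention concrete, here is a small sketch using the standard `encoding/json` package. The `config` struct and the `marshalConfig` helper are hypothetical and the field types are assumptions; the two JSON keys `qosReserved` and `tlsCipherSuites` are the ones named in the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical struct; only the JSON key names come from the
// commit message. A leading initialism is written entirely in lowercase.
type config struct {
	QOSReserved     map[string]string `json:"qosReserved,omitempty"`
	TLSCipherSuites []string          `json:"tlsCipherSuites,omitempty"`
}

// marshalConfig serializes the struct so the resulting keys can be inspected.
func marshalConfig(c config) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalConfig(config{QOSReserved: map[string]string{"memory": "50%"}}))
}
```

The Go field keeps its exported, uppercase-initialism name while the struct tag controls the serialized key.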
Kubernetes Submit Queue	61a8454c28	Merge pull request #62925 from sjenning/fixup-qosreserved-tag Automatic merge from submit-queue (batch tested with PRs 63046, 62925, 63014). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. kubelet: fixup QOSReserved json tag Fix up follow on to https://github.com/kubernetes/kubernetes/pull/62509 @mtaufen @derekwaynecarr	2018-04-24 00:42:13 -07:00
Kubernetes Submit Queue	fca65dcd64	Merge pull request #62464 from choury/fix-double-rlock-in-cpumanger Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. avoid dobule RLock() in cpumanager What this PR does / why we need it: We hit a deadlock when removing a pod. kubelet keeps logging: ``` Pod "xxxx" is terminated, but some containers are still running ``` After debugging, we found it stuck in `SetDefaultCPUSet` [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L184) while another goroutine is calling `GetCPUSetOrDefault` [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/cpu_manager.go#L256). According to golang/go#15418, it is not safe to double RLock a RWMutex. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): none Special notes for your reviewer: Release note: ```release-note removed unsafe double RLock in cpumanager ```	2018-04-23 16:31:57 -07:00
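One way to avoid recursive read locking is sketched below. The `cpuState` type, its fields and the `GetCPUSet`, `GetDefaultCPUSet` and `SetCPUSet` helpers are assumptions made for illustration; `SetDefaultCPUSet` and `GetCPUSetOrDefault` are the names mentioned in the PR. The hazard is that a writer queued between two `RLock` calls from the same goroutine blocks the second `RLock`, while the first is never released.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuState is a hypothetical in-memory state store guarded by a RWMutex.
type cpuState struct {
	mu          sync.RWMutex
	assignments map[string]string
	defaultSet  string
}

func newCPUState() *cpuState {
	return &cpuState{assignments: map[string]string{}}
}

func (s *cpuState) SetCPUSet(id, set string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.assignments[id] = set
}

func (s *cpuState) SetDefaultCPUSet(set string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.defaultSet = set
}

func (s *cpuState) GetCPUSet(id string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	set, ok := s.assignments[id]
	return set, ok
}

func (s *cpuState) GetDefaultCPUSet() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.defaultSet
}

// GetCPUSetOrDefault deliberately takes no lock of its own: each getter it
// calls acquires and releases the read lock, so the lock is never held
// recursively by one goroutine.
func (s *cpuState) GetCPUSetOrDefault(id string) string {
	if set, ok := s.GetCPUSet(id); ok {
		return set
	}
	return s.GetDefaultCPUSet()
}

func main() {
	s := newCPUState()
	s.SetDefaultCPUSet("0-3")
	s.SetCPUSet("c1", "2")
	fmt.Println(s.GetCPUSetOrDefault("c1"), s.GetCPUSetOrDefault("c2"))
}
```

The trade-off in this sketch is that the two reads are no longer one atomic snapshot; a writer may run between them.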
Kubernetes Submit Queue	939c0783e1	Merge pull request #62152 from smarterclayton/use_client_store Automatic merge from submit-queue (batch tested with PRs 63001, 62152, 61950). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. When bootstrapping a client cert, store it with other client certs The kubelet uses two different locations to store certificates on initial bootstrap and then on subsequent rotation: * bootstrap: certDir/kubelet-client.(crt\|key) * rotation: certDir/kubelet-client-(DATE\|current).pem Bootstrap also creates an initial node.kubeconfig that points to the certs. Unfortunately, with short rotation the node.kubeconfig then becomes out of date because it points to the initial cert/key, not the rotated cert/key. Alter the bootstrap code to store client certs exactly as if they would be rotated (using the same cert Store code), and reference the PEM file containing cert/key from node.kubeconfig, which is supported by kubectl and other Go tooling. This ensures that the node.kubeconfig continues to be valid past the first expiration. Example: ``` bootstrap: writes to certDir/kubelet-client-DATE.pem and symlinks to certDir/kubelet-client-current.pem writes node.kubeconfig pointing to certDir/kubelet-client-current.pem rotation: writes to certDir/kubelet-client-DATE.pem and symlinks to certDir/kubelet-client-current.pem ``` This will also allow us to remove the weird "init store with bootstrap cert" stuff, although I'd prefer to do that in a follow up. @mikedanese @liggitt as per discussion on Slack today ```release-note The `--bootstrap-kubeconfig` argument to Kubelet previously created the first bootstrap client credentials in the certificates directory as `kubelet-client.key` and `kubelet-client.crt`. 
Subsequent certificates created by cert rotation were created in a combined PEM file that was atomically rotated as `kubelet-client-DATE.pem` in that directory, which meant clients relying on the `node.kubeconfig` generated by bootstrapping would never use a rotated cert. The initial bootstrap certificate is now generated into the cert directory as a PEM file and symlinked to `kubelet-client-current.pem` so that the generated kubeconfig remains valid after rotation. ```	2018-04-23 12:34:14 -07:00
Kubernetes Submit Queue	5b77996433	Merge pull request #62543 from ingvagabund/timeout-on-cloud-provider-request Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Timeout on instances.NodeAddresses cloud provider request What this PR does / why we need it: Adds a timeout for cases where the cloud provider does not respond before the node gets evicted. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes # Special notes for your reviewer: Release note: ```release-note Stop the kubelet to cloud provider integration from potentially wedging the kubelet sync loop ```	2018-04-23 09:12:42 -07:00
Clayton Coleman	368959346a	When bootstrapping a client cert, store it with other client certs The kubelet uses two different locations to store certificates on initial bootstrap and then on subsequent rotation: * bootstrap: certDir/kubelet-client.(crt|key) * rotation: certDir/kubelet-client-(DATE|current).pem Bootstrap also creates an initial node.kubeconfig that points to the certs. Unfortunately, with short rotation the node.kubeconfig then becomes out of date because it points to the initial cert/key, not the rotated cert/key. Alter the bootstrap code to store client certs exactly as if they would be rotated (using the same cert Store code), and reference the PEM file containing cert/key from node.kubeconfig, which is supported by kubectl and other Go tooling. This ensures that the node.kubeconfig continues to be valid past the first expiration.	2018-04-23 10:23:01 -04:00
Jan Chaloupka	61efc29394	Timeout on instances.NodeAddresses cloud provider request	2018-04-23 13:28:43 +02:00
choury	c1b19fce90	avoid double RLock() in cpumanager	2018-04-23 10:33:40 +08:00
vikaschoudhary16	928f83960e	Fix dockershim e2e	2018-04-21 06:04:20 -04:00
Kubernetes Submit Queue	afa68cc287	Merge pull request #62886 from msau42/fix-localssd-fsgroup Automatic merge from submit-queue (batch tested with PRs 62780, 62886). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Only count local mounts that are from other pods What this PR does / why we need it: In GCE, we mount the same local SSD in two different paths (for backwards compatibility). This makes the fsGroup conflict check fail because it thinks the 2nd mount is from another pod. For the fsgroup check, we only want to detect if other pods are mounting the same volume, so this PR filters the mount list to only those mounts under "/var/lib/kubelet". Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #62867 Release note: ```release-note NONE ```	2018-04-20 20:06:13 -07:00
Kubernetes Submit Queue	48243a9c24	Merge pull request #62780 from RenaudWasTaken/master Automatic merge from submit-queue (batch tested with PRs 62780, 62886). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Change Capacity log verbosity in status update What this PR does / why we need it: While in production we noticed that the log verbosity for the Capacity field in the node status was too high. This log message is emitted for every device plugin resource at every update. A proposed solution is to tune it down from V(2) to V(5). In a normal setting you'll be able to see the effect by looking at the node status. Release note: ``` NONE ``` /sig node /area hw-accelerators /assign @vikaschoudhary16 @jiayingz @vishh	2018-04-20 20:06:10 -07:00
Seth Jennings	0e5df1fea2	kubelet: fixup QOSReserved json tag	2018-04-20 21:12:18 -05:00
Kubernetes Submit Queue	4d6a6ced8c	Merge pull request #56525 from tianshapjq/testcase-helpers_linux.go Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. new testcase to helpers_linux.go new testcase to helpers_linux.go, PTAL. ```release-note NONE ```	2018-04-20 18:55:13 -07:00
Renaud Gaubert	7297dd33bb	Change Capacity log verbosity in node status update	2018-04-20 16:11:02 +02:00
Kubernetes Submit Queue	e9374411d5	Merge pull request #62509 from sjenning/qos-reserved-feature-gate Automatic merge from submit-queue (batch tested with PRs 61962, 58972, 62509, 62606). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. kubelet: move QOSReserved from experimental to alpha feature gate Fixes https://github.com/kubernetes/kubernetes/issues/61665 Release note: ```release-note The --experimental-qos-reserve kubelet flag is replaced by the alpha-level --qos-reserved flag or QOSReserved field in the kubeletconfig and requires the QOSReserved feature gate to be enabled. ``` /sig node /assign @derekwaynecarr /cc @mtaufen	2018-04-19 16:47:21 -07:00
Kubernetes Submit Queue	f3599ba3c9	Merge pull request #61962 from liggitt/flag-race Automatic merge from submit-queue (batch tested with PRs 61962, 58972, 62509, 62606). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Avoid data races in unit tests Setting global flags in unit tests leads to data races like this: ``` ================== WARNING: DATA RACE Write at 0x0000028f5241 by goroutine 47: flag.(*boolValue).Set() /home/jliggitt/.gvm/gos/go1.9.5/src/flag/flag.go:91 +0x7b flag.(*FlagSet).Set() /home/jliggitt/.gvm/gos/go1.9.5/src/flag/flag.go:366 +0x10c flag.Set() /home/jliggitt/.gvm/gos/go1.9.5/src/flag/flag.go:379 +0x76 k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.TestPodContainerDeviceAllocation() /home/jliggitt/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager_test.go:549 +0x126 testing.tRunner() /home/jliggitt/.gvm/gos/go1.9.5/src/testing/testing.go:746 +0x16c Previous read at 0x0000028f5241 by goroutine 34: k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).output() /home/jliggitt/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:682 +0x730 k8s.io/kubernetes/vendor/github.com/golang/glog.(*loggingT).printf() /home/jliggitt/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:655 +0x259 k8s.io/kubernetes/vendor/github.com/golang/glog.Errorf() /home/jliggitt/go/src/k8s.io/kubernetes/vendor/github.com/golang/glog/glog.go:1118 +0x74 k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*endpointImpl).run() /home/jliggitt/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/endpoint.go:132 +0x1c7e k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).addEndpoint.func1() /home/jliggitt/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager.go:378 +0x3f Goroutine 47 (running) created at: testing.(*T).Run() /home/jliggitt/.gvm/gos/go1.9.5/src/testing/testing.go:789 +0x568 
testing.runTests.func1() /home/jliggitt/.gvm/gos/go1.9.5/src/testing/testing.go:1004 +0xa7 testing.tRunner() /home/jliggitt/.gvm/gos/go1.9.5/src/testing/testing.go:746 +0x16c testing.runTests() /home/jliggitt/.gvm/gos/go1.9.5/src/testing/testing.go:1002 +0x521 testing.(*M).Run() /home/jliggitt/.gvm/gos/go1.9.5/src/testing/testing.go:921 +0x206 main.main() k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/_test/_testmain.go:68 +0x1d3 Goroutine 34 (finished) created at: k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).addEndpoint() /home/jliggitt/go/src/k8s.io/kubernetes/pkg/kubelet/cm/devicemanager/manager.go:377 +0x9d6 ================== --- FAIL: TestPodContainerDeviceAllocation (0.00s) testing.go:699: race detected during execution of test FAIL FAIL k8s.io/kubernetes/pkg/kubelet/cm/devicemanager 0.124s ```	2018-04-19 16:47:14 -07:00
Michelle Au	6cf8a6606c	Only count mounts that are from other pods	2018-04-19 15:40:51 -07:00
Kubernetes Submit Queue	c778d871e0	Merge pull request #58740 from YuxiJin-tobeyjin/add-ut-for-kuberuntime-gc Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. add ut for kuberuntime-gc What this PR does / why we need it: Add ut for kuberuntime-gc to cover more situations: 1) Add two uncovered cases to test sandbox-gc (1) When there is more than one exited sandbox, the older exited sandboxes without containers for existing pods should be garbage collected; (2) Even though there is more than one exited sandbox, the older exited sandboxes with containers for existing pods should not be garbage collected. 2) Add one uncovered case to test container-gc (1) To cover the situation when allSourcesReady is set to false; Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes # Special notes for your reviewer: Release note: ```release-note "NONE" ```	2018-04-19 12:27:19 -07:00


6245 Commits
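
The mount filtering described in pull request #62886 above (only mounts under "/var/lib/kubelet" count toward the fsGroup conflict check) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from the PR: `filterPodMounts` is a hypothetical helper, and the example mount paths are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// kubeletRoot is the directory named in the commit message.
const kubeletRoot = "/var/lib/kubelet"

// filterPodMounts (hypothetical helper) keeps only the mount paths that
// live under root, so mounts created outside the kubelet are ignored.
func filterPodMounts(mounts []string, root string) []string {
	// Match on "root/" so that a sibling such as "/var/lib/kubelet-other"
	// is not mistaken for a path under root.
	prefix := strings.TrimSuffix(root, "/") + "/"
	var kept []string
	for _, m := range mounts {
		if strings.HasPrefix(m, prefix) {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	// Invented example paths: two host-level mounts of one disk and one pod mount.
	mounts := []string{
		"/mnt/disks/ssd0",
		"/mnt/disks/by-uuid/example",
		"/var/lib/kubelet/pods/pod-a/volumes/local-ssd",
	}
	fmt.Println(filterPodMounts(mounts, kubeletRoot))
}
```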