527 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
83b3baaf58 Merge pull request #71581 from saad-ali/fixCSILogEventSpam
Reduce CSI log and event spam
2018-11-30 22:27:27 -08:00
Yecheng Fu
5ada29ac16 Rename GetUniqueVolumeNameForNonAttachableVolume to GetUniqueVolumeNameFromSpecWithPod 2018-12-01 12:25:32 +08:00
k8s-ci-robot
79e5cb2cb7 Merge pull request #71302 from liggitt/verify-unit-test-feature-gates
Split mutable and read-only access to feature gates, limit tests to readonly access
2018-11-29 21:45:12 -08:00
k8s-ci-robot
2fd1949b7f Merge pull request #71294 from Chenditang/verify-golint
Fix golint verify errors.
2018-11-29 21:45:02 -08:00
saad-ali
2251bf0c21 Ensure volume mount err checking done inside op
Ensure volume mount error checking is done inside the operation so that
failures get handled with exponential backoff, etc.
2018-11-29 16:52:24 -08:00
Mikhail Shaverdo
a29981640f Fix nil pointer dereference panic in attachDetachController
add check `attachableVolumePlugin == nil` to operationGenerator.GenerateDetachVolumeFunc()
2018-11-29 13:10:07 +03:00
Jordan Liggitt
2498ca7606 drop VerifyFeatureGatesUnchanged 2018-11-21 11:51:33 -05:00
chendt.fnst
80de428f49 Fix golint verify errors.
**What type of PR is this?**
/kind cleanup

**What this PR does / why we need it**:
$ hack/verify-golint.sh
Errors from golint:
pkg/cloudprovider/providers/aws/aws_fakes.go:357:9: if block ends with a return statement, so drop this else and outdent its block
pkg/volume/util/util.go:204:9: if block ends with a return statement, so drop this else and outdent its block

**Release note**:
```
NONE
```
2018-11-21 09:11:20 +08:00
Jing Xu
47331cf0a2 WIP: Handle failed attach operation leave uncertain volume attach state
This PR fixes issue #32727.

When an attach operation fails, it is still possible that the volume
will be attached to the node later. This PR adds the logic to record the
volume to node with attached state no matter whether the operation
succeeded or not. If the operation fails, mark the attached state to
false. If the operation succeeded, mark the attached state to true. The
reconciler will still issue attach operation until it returns
successfully. If the pod is removed in the meantime, the reconciler
will issue detach operations for all the volumes, regardless of their
attached state.
2018-11-19 17:19:10 -08:00
Jordan Liggitt
248d661327 Add tests to ensure storage feature gate changes don't escape packages 2018-11-16 10:52:53 -05:00
Michelle Au
fd64c08240 Fix storage feature gate test setting 2018-11-16 10:49:40 -05:00
Masaki Kimura
f0354ad605 Fix for adding block volume support to CSI RBD driver 2018-11-14 19:20:56 +00:00
Yecheng Fu
dfe0a08f05 Improve usability of CSI plugin metrics
Use full qualified plugin name if volume spec is present.
2018-11-12 09:21:49 +08:00
Davanum Srinivas
954996e231 Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add calls to it as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
k8s-ci-robot
6813ebb568 Merge pull request #67851 from aniket-s-kulkarni/flexvolume-resize-implementation
Flexvolume resize implementation
2018-11-02 10:47:01 -07:00
k8s-ci-robot
133f662610 Merge pull request #70408 from idealhack/fix-golint-pkg-volume-util
Fix golint error for `pkg/volume/util/resize_util.go`
2018-11-01 11:11:22 -07:00
k8s-ci-robot
63a7e06eb5 Merge pull request #69484 from ddebroy/ddebroy-winpipe1
Correctly handle named pipe host mounts for Windows
2018-10-30 16:15:57 -07:00
Yang Li
a4fff0d32c Fix golint error for pkg/volume/util/resize_util.go 2018-10-30 13:01:54 +08:00
Deep Debroy
119e2a1d43 Address CR comments and add more tests
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2018-10-26 00:29:27 -07:00
Aniket Kulkarni
75350d11e9 adding support for expanding in use persistent volumes for Flex 2018-10-24 15:31:16 -04:00
k8s-ci-robot
2119512b9e Merge pull request #68491 from leakingtapan/golint-fix-volume-util
fix golint issue for pkg/volume/util
2018-10-15 11:40:32 -07:00
Deep Debroy
f8a69f1086 Broaden scope of host path types to skip processing in Windows
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2018-10-12 19:57:08 -07:00
k8s-ci-robot
a29b093a56 Merge pull request #69451 from justinsb/sort_bind_options
Sort bind options in JoinMountOptions
2018-10-06 21:29:51 -07:00
Deep Debroy
b4bb5dd430 Correctly handle named pipe host mounts for Windows
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2018-10-05 16:46:04 -07:00
Christoph Blecker
97b2992dc1 Update gofmt for go1.11 2018-10-05 12:59:38 -07:00
Justin Santa Barbara
3c4789b464 Sort bind options in JoinMountOptions
We were not sorting them previously, which made the order
non-deterministic.  If we believe the order doesn't matter, let's pick
a consistent order to minimize the chances of a rare flake.

This also simplifies the unit tests, which were flaking
not-very-rarely, e.g. with

`bazel test //pkg/volume/awsebs/... --runs_per_test=8`
2018-10-04 21:39:13 -04:00
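The idea in the commit above (merge the option lists, then sort so the result is deterministic) can be sketched as below. This is an illustrative stand-in, not the body of the real `JoinMountOptions`; the lowercase `joinMountOptions` and its map-based deduplication are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// joinMountOptions merges two option lists, drops duplicates, and returns the
// result in sorted order. Map iteration order in Go is randomized, so without
// the final sort the output order would vary between runs.
func joinMountOptions(userOptions, systemOptions []string) []string {
	seen := make(map[string]struct{})
	for _, o := range userOptions {
		seen[o] = struct{}{}
	}
	for _, o := range systemOptions {
		seen[o] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for o := range seen {
		out = append(out, o)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Prints [bind noatime ro] on every run.
	fmt.Println(joinMountOptions([]string{"ro", "bind"}, []string{"noatime", "bind"}))
}
```

With a fixed order, unit tests can compare against a literal slice instead of checking set membership.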
Masaki Kimura
4226ae7a61 Fix to reflect comment
- Change not to skip error from GetLoopDevice other than DeviceNotFound
- Add comment for the reason for order of descriptor lock release and TearDownDevice
2018-10-03 22:31:51 +00:00
Masaki Kimura
3d808540df Fix descriptor lock release logic for block volume unmapDevice
Fixes: #69114
2018-10-03 14:40:54 +00:00
David Zhu
9d207b3e3c GetMountRefs should not fail if the path supplied does not exist anymore. It has no mount references 2018-09-17 17:35:12 -07:00
Cheng Pan
a5c4f341d7 fix golint issue for pkg/volume/util 2018-09-10 21:24:35 +00:00
Stephen Cuppett
d85daf0f4c Resolves #59015, extends existing regex to cover t3, r5(d) & z1d instance types
From current AWS documentation:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html

T3, C5, C5d, M5, M5d, R5, R5d, and z1d instances support a maximum of
28 attachments, and every instance has at least one network interface
attachment. If you have no additional network interface attachments on
these instances, you could attach 27 EBS volumes.
2018-09-05 21:24:09 -04:00
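A rough sketch of the kind of check the commit above describes: match the instance families named in the AWS text and derive the volume limit from the 28-attachment cap. The pattern and the `maxEBSVolumes` helper are hypothetical, written for this example; they are not the regex or function used in the Kubernetes AWS provider.

```go
package main

import (
	"fmt"
	"regexp"
)

// limitedFamilies is a hypothetical pattern for the families quoted above:
// T3, C5, C5d, M5, M5d, R5, R5d, and z1d.
var limitedFamilies = regexp.MustCompile(`^([cmr]5d?|t3|z1d)\.`)

// maxAttachments is the combined attachment cap quoted from the AWS text.
const maxAttachments = 28

// maxEBSVolumes returns how many EBS volumes fit on a matching instance type
// once its network interface attachments are subtracted. The second result is
// false when the type is not one of the families above.
func maxEBSVolumes(instanceType string, networkInterfaces int) (int, bool) {
	if !limitedFamilies.MatchString(instanceType) {
		return 0, false
	}
	return maxAttachments - networkInterfaces, true
}

func main() {
	for _, t := range []string{"r5d.large", "t3.micro", "z1d.xlarge", "m4.large"} {
		limit, ok := maxEBSVolumes(t, 1)
		fmt.Println(t, limit, ok)
	}
}
```

With one network interface attached, the matching types come out at 27 volumes, which agrees with the figure in the quoted documentation.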
Kubernetes Submit Queue
ca43f007a3 Merge pull request #67731 from gnufied/fix-csi-attach-limit
Automatic merge from submit-queue (batch tested with PRs 68161, 68023, 67909, 67955, 67731). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix csi attach limit

Add support for volume limits for CSI.

xref: https://github.com/kubernetes/community/pull/2051

```release-note
Add support for volume attach limits for CSI volumes
```
2018-09-05 14:51:55 -07:00
Hemant Kumar
fc61620db5 Fix compatibility tests for scheduler 2018-09-05 12:29:00 -04:00
Kubernetes Submit Queue
37b29297aa Merge pull request #67432 from lichuqiang/topo_provision_beta
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Move volume dynamic provisioning scheduling to beta

**What this PR does / why we need it**:

* Combine feature gates VolumeScheduling and DynamicProvisioningScheduling into one
* Add allowedTopologies description in kubectl

**Special notes for your reviewer**:
Wait until related e2e and downside plugins are ready.

/hold

**Release note**:

```release-note
Move volume dynamic provisioning scheduling to beta (ACTION REQUIRED: The DynamicProvisioningScheduling alpha feature gate has been removed. The VolumeScheduling beta feature gate is still required for this feature)
```
2018-08-29 15:19:34 -07:00
Kubernetes Submit Queue
720781e6af Merge pull request #67745 from feiskyer/choose-zones
Automatic merge from submit-queue (batch tested with PRs 67745, 67432, 67569, 67825, 67943). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix panic when choosing zone or zones for volume

**What this PR does / why we need it**:

Fix panic when choosing zone or zones for volume, so that zoneSlice won't divide by zero now.

**Release note**:

```release-note
NONE
```

cc @ddebroy @andyzhangx
2018-08-29 15:19:30 -07:00
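The panic fixed above is the usual modulo-by-zero that appears when an index is computed as a hash modulo the length of an empty slice. A minimal sketch of the guard, assuming a hash-based selection; `chooseZone` and its FNV hashing are illustrative choices, not the Kubernetes implementation:

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"sort"
)

// chooseZone picks one zone for a claim by hashing the claim name. The length
// check runs first: with an empty slice, the modulo below would panic with an
// integer divide by zero.
func chooseZone(zones []string, claimName string) (string, error) {
	if len(zones) == 0 {
		return "", errors.New("no zones available to choose from")
	}
	// Sort a copy so the choice does not depend on the caller's ordering.
	sorted := append([]string(nil), zones...)
	sort.Strings(sorted)

	h := fnv.New32a()
	h.Write([]byte(claimName))
	return sorted[h.Sum32()%uint32(len(sorted))], nil
}

func main() {
	if _, err := chooseZone(nil, "claim-0"); err != nil {
		fmt.Println("error:", err)
	}
	zone, _ := chooseZone([]string{"us-east-1a"}, "claim-0")
	fmt.Println("chosen:", zone)
}
```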
lichuqiang
4c43d626f2 related test update 2018-08-29 10:30:16 +08:00
lichuqiang
b4a57f6855 combine feature gate VolumeScheduling and DynamicProvisioningScheduling into one 2018-08-29 10:30:08 +08:00
Deep Debroy
a2de7d2d8d Add DynamicProvisioningScheduling support for GCE PD and RePD
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2018-08-24 14:38:25 -07:00
Pengfei Ni
8e4ab129e9 Fix panic when choosing zone or zones for volume 2018-08-23 10:46:41 +08:00
Hemant Kumar
4b17a48def Implement support for updating volume limits
Create a new predicate to count CSI volumes
2018-08-22 19:36:00 -04:00
Kubernetes Submit Queue
c5e74d128d Merge pull request #66884 from NickrenREN/attacher-detacher-refactor
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Attacher/Detacher refactor for local storage

Proposal link: https://github.com/kubernetes/community/pull/2438

**What this PR does / why we need it**:

Attacher/Detacher refactor for plugins that only need to mount a device but do not need to attach it, such as the local storage plugin.

```release-note
Attacher/Detacher refactor for local storage
```

/sig storage
/kind feature
2018-08-15 07:03:48 -07:00
Kubernetes Submit Queue
5aea00d885 Merge pull request #67097 from chakri-nelluri/EIO-Unmountfix
Automatic merge from submit-queue (batch tested with PRs 67396, 67097, 67395, 67365, 67099). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Ignore EIO error in unmount path

**What this PR does / why we need it**:
This PR ignores EIO in unmount path. XFS shuts down filesystem when the target is down and it returns EIO for the stat calls used in unmount path.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66868

**Special notes for your reviewer**:
We already handle ESTALE & ENOTCONN errors in isCorruptedMnt Call. Adding EIO to that list covers the XFS shutdown case.

Also Flexvolume doesn't check for these errors in its current form. Updated Flexvolume code to handle it.

```release-note
NONE
```
2018-08-15 05:45:17 -07:00
NickrenREN
55784f88d4 add UTs for devicemountable conditions 2018-08-14 11:13:02 +08:00
NickrenREN
c7e4466873 attacher/detacher refactor 2018-08-14 11:12:41 +08:00
Chakri Nelluri
93a19fce28 Ignore EIO error in unmount path 2018-08-07 21:04:39 -04:00
Deep Debroy
217a3d8902 Add DynamicProvisioningScheduling support for EBS
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2018-08-01 09:00:03 -07:00
Ardalan Kangarlou
ee747b8649 Changed admission controller to allow volume expansion for all volume plugins
There are two motivations for this change:
(1) CSI plugins are soon going to support volume expansion. For such
plugins, admission controller doesn't know whether the plugins are
capable of supporting volume expansion or not.
(2) Currently, admission controller rejects PVC updates for in-tree plugins
that don't support volume expansion (e.g., NFS, iSCSI). This change allows
external controllers to expand volumes similar to how external provisioners
operate.
2018-07-27 03:06:48 -04:00
Kubernetes Submit Queue
845a55dbbd Merge pull request #63176 from NetApp/bug/59946
Automatic merge from submit-queue (batch tested with PRs 64844, 63176). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix discovery/deletion of iscsi block devices

This PR modifies the iSCSI attach/detach codepaths in the following
ways:
1) After unmounting a filesystem on an iSCSI block device, always
flush the multipath device mapper entry (if it exists) and delete
all block devices so the kernel forgets about them.
2) When attaching an iSCSI block device, instead of blindly
attempting to scan for the new LUN, first determine if the target
is already logged into, and if not, do the login first. Once every
portal is logged into, the scan is done.
3) Scans are now done for specific devices, instead of the whole
bus. This avoids discovering LUNs that kubelet has no interest in.
4) Additions to the underlying utility interfaces, with new tests
for the new functionality.
5) Some existing code was shifted up or down, to make the new logic
work.
6) A typo in an existing exec call on the attach path was fixed.

Fixes #59946

```release-note
When attaching iSCSI volumes, kubelet now scans only the specific
LUNs being attached, and also deletes them after detaching. This avoids
dangling references to LUNs that no longer exist, which used to be the
cause of random I/O errors/timeouts in kernel logs, slowdowns during
block-device related operations, and very rare cases of data corruption.
```
2018-07-25 16:19:01 -07:00
Ben Swartzlander
6d23d8edbb Avoid deleted iSCSI LUNs in the kernel
This change ensures that iSCSI block devices are deleted after
unmounting, and implements scanning of individual LUNs rather
than scanning the whole iSCSI bus.

In cases where an iSCSI bus is in use by more than one attachment,
detaching used to leave behind phantom block devices, which could
cause I/O errors, long timeouts, or even corruption in the case
when the underlying LUN number was recycled. This change makes
sure to flush references to the block devices after unmounting.

The original iSCSI code scanned the whole target every time a LUN
was attached. On storage controllers that export multiple LUNs on
the same target IQN, this led to a situation where nodes would
see SCSI disks that they weren't supposed to -- possibly dozens or
hundreds of extra SCSI disks. This caused 3 significant problems:

1) The large number of disks wasted resources on the node and
caused a minor drag on performance.
2) The scanning of all the devices caused a huge number of uevents
from the kernel, causing udev to bog down for multiple minutes in
some cases, triggering timeouts and other transient failures.
3) Because Kubernetes was not tracking all the "extra" LUNs that
got discovered, they would not get cleaned up until the last LUN
on a particular target was detached, causing a logout. This led
to significant complications:

In the time window between when a LUN was unintentionally scanned,
and when it was removed due to a logout, if it was deleted on the
backend, a phantom reference remained on the node. In the best
case, the phantom LUN would cause I/O errors and timeouts in the
udev system. In the worst case, the backend could reuse the LUN
number for a new volume, and if that new volume were to be
scheduled to a pod with a phantom reference to the old LUN by the
same number, the initiator could get confused and possibly corrupt
data on that volume.

To avoid these problems, the new implementation only scans for
the specific LUN number it expects to see. It's worth noting that
the default behavior of iscsiadm is to automatically scan the
whole bus on login. That behavior can be disabled by setting
node.session.scan = manual
in iscsid.conf, and for the reasons mentioned above, it is
strongly recommended to set that option. This change still works
regardless of the setting in iscsid.conf, and while automatic
scanning will cause some problems, this change doesn't make the
problems any worse, and can make things better in some cases.
2018-07-24 23:58:19 -04:00
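On Linux, per-LUN scanning and device deletion are normally done through sysfs: writing `<channel> <target> <lun>` to a SCSI host's `scan` file scans one LUN (`- - -` scans everything), and writing `1` to a block device's `device/delete` file makes the kernel forget it. The sketch below only builds those paths and payloads. The helper names are hypothetical, and using channel 0 and target 0 is an assumption of this example, not something stated in the commit.

```go
package main

import (
	"fmt"
	"os"
)

// scanTarget returns the sysfs file and payload that ask the SCSI layer to
// scan a single LUN on channel 0, target 0 of the given host. Writing "- - -"
// instead would scan every channel, target, and LUN on that host.
func scanTarget(hostNumber, lun int) (path, payload string) {
	path = fmt.Sprintf("/sys/class/scsi_host/host%d/scan", hostNumber)
	payload = fmt.Sprintf("0 0 %d", lun)
	return path, payload
}

// deleteTarget returns the sysfs file and payload that make the kernel forget
// a block device such as "sdb".
func deleteTarget(device string) (path, payload string) {
	return fmt.Sprintf("/sys/block/%s/device/delete", device), "1"
}

// writeSysfs performs the write. It needs root and a real device to succeed,
// so main below does not call it.
func writeSysfs(path, payload string) error {
	return os.WriteFile(path, []byte(payload), 0200)
}

func main() {
	path, payload := scanTarget(3, 5)
	fmt.Printf("would write %q to %s\n", payload, path)

	path, payload = deleteTarget("sdb")
	fmt.Printf("would write %q to %s\n", payload, path)
}
```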
Matthew Wong
093e231289 Avoid overflowing int64 in RoundUpSize and return error if overflow int 2018-07-23 13:48:45 -04:00