Commit Graph

182 Commits

Author SHA1 Message Date
Bowei Du
ee847ebf8a Add metrics to all major gce operations {latency, errors}
The new metrics is:

  cloudprovider_gce_api_request_duration_seconds{request, region, zone}
  cloudprovider_gce_api_request_errors{request, region, zone}

`request` is the specific function that is used.
`region` is the target region (Will be "<n/a>" if not applicable)
`zone` is the target zone (Will be "<n/a>" if not applicable)

Note: this fixes some issues with the previous implementation of
metrics for disks:
- Time duration tracked was of the initial API call, not the entire
  operation.
- Metrics label tuple would have resulted in many independent
  histograms stored, one for each disk. (Did not aggregate well).
2017-04-27 12:49:30 -07:00
Mike Danese
a05c3c0efd autogenerated 2017-04-14 10:40:57 -07:00
Kubernetes Submit Queue
f1c0c0a73c Merge pull request #42395 from nicksardo/gce-src-ranges
Automatic merge from submit-queue

Adding load balancer src cidrs to GCE cloudprovider

**What this PR does / why we need it**:
As of January 31st, 2018, GCP will be sending health checks and l7 traffic from two CIDRs and legacy health checks from three CIDS. This PR moves them into the cloudprovider package and provides a flag for override.

Another PR will need to be address firewall rule creation for external L4 network loadbalancing #40778

**Which issue this PR fixes**
Step one of #40778
Step one of https://github.com/kubernetes/ingress/issues/197

**Release note**:
```release-note
Add flags to GCE cloud provider to override known L4/L7 proxy & health check source cidrs
```
2017-04-12 19:57:43 -07:00
Bowei Du
f61590c221 Adds support for PodCIDR allocation from the GCE cloud provider
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.

- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
  the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
2017-04-11 14:07:54 -07:00
Bowei Du
f5be63e0f7 Add PodCIDRs API for GCE (Google cloud alpha feature) 2017-04-10 12:05:02 -07:00
Kubernetes Submit Queue
9c9326114c Merge pull request #43777 from wlan0/provider-id
Automatic merge from submit-queue

move ProvideID indexed methods to right location

@bowei
2017-04-07 19:57:48 -07:00
Jan Safranek
67e1f2c08e Add e2e tests for storageclass
This reverts commit 22352d2844 and makes
gce.GetDiskByNameUnknownZone a public GCE cloud provider method.
2017-04-05 11:49:49 +02:00
Kubernetes Submit Queue
449a13c44c Merge pull request #40338 from gnufied/cloudprovider-gce-metrics
Automatic merge from submit-queue

Implement API usage metrics for gce storage

**What this PR does / why we need it**:

This PR implements support for emitting metrics from GCE about storage operations.

**Which issue this PR fixes** 

Fixes https://github.com/kubernetes/features/issues/182

**Release note**:
```
Add support for emitting metrics from GCE cloudprovider about storage operations.
```
2017-03-30 12:42:02 -07:00
Kubernetes Submit Queue
289ef62442 Merge pull request #43644 from nicksardo/gce-healthchecks
Automatic merge from submit-queue (batch tested with PRs 42617, 43247, 43509, 43644, 43820)

[GCE] Support legacy-https and generic health checks

**What this PR does / why we need it**:
- Adds CRUD functions to manage `compute.HttpsHealthChecks` 
The legacy HTTPS healthchecks will be used by the GLBC (GCE Load balancer Controller)

- Adds CRUD functions to manage `compute.HealthChecks`
These are required for the internal load balancer

- Removes the logic that disregards NotFound errors on DeleteHttpHealthChecks as this is useful information for callers. Here are the three known invocations within kubernetes: 
[gce/gce_loadbalancer.go#L457](bc6e77d42f/pkg/cloudprovider/providers/gce/gce_loadbalancer.go (L457)): Only prints warning that HC wasn't deleted  -> acceptable
[gce/gce_loadbalancer.go#L465](bc6e77d42f/pkg/cloudprovider/providers/gce/gce_loadbalancer.go (L465)): Err is ignored if not nil  -> acceptable
[e2e/framework/ingress_utils.go#L530](bc6e77d42f/test/e2e/framework/ingress_utils.go (L530)): Already checks if is NotFound error -> acceptable

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Step one of https://github.com/kubernetes/ingress/issues/494
Step one of #33483 

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-03-29 16:05:25 -07:00
wlan0
655dfd1196 move ProvideID indexed methods to right location 2017-03-28 15:08:03 -07:00
Hemant Kumar
c4aaf47282 Implement API usage metrics for gce
This PR implements tracking of GCE API usage via prometheus metrics.
2017-03-28 16:33:21 -04:00
wlan0
a68c783dc8 Use ProviderID to address nodes in the cloudprovider
The cloudprovider is being refactored out of kubernetes core. This is being
done by moving all the cloud-specific calls from kube-apiserver, kubelet and
kube-controller-manager into a separately maintained binary(by vendors) called
cloud-controller-manager. The Kubelet relies on the cloudprovider to detect information
about the node that it is running on. Some of the cloudproviders worked by
querying local information to obtain this information. In the new world of things,
local information cannot be relied on, since cloud-controller-manager will not
run on every node. Only one active instance of it will be run in the cluster.

Today, all calls to the cloudprovider are based on the nodename. Nodenames are
unqiue within the kubernetes cluster, but generally not unique within the cloud.
This model of addressing nodes by nodename will not work in the future because
local services cannot be queried to uniquely identify a node in the cloud. Therefore,
I propose that we perform all cloudprovider calls based on ProviderID. This ID is
a unique identifier for identifying a node on an external database (such as
the instanceID in aws cloud).
2017-03-27 23:13:13 -07:00
Nick Sardo
baab99b823 Adding load balancer src ranges; support flag overrides 2017-03-24 16:36:19 -07:00
Nick Sardo
93cb2b41de Adding HTTPS and generic health checks to GCE 2017-03-24 14:24:42 -07:00
Bowei Du
0ab072dde8 Add bowei to OWNERS of cloudproviders/gce 2017-03-24 13:18:13 -07:00
Bowei Du
dc1e614a72 Split the GCE cloud provider into more managable chunks
Each major interface is now in its own file. Any package private
functions that are only referenced by a particular module was also moved
to the corresponding file. All common helper functions were moved to
gce_util.go.

This change is a pure movement of code; no semantic changes were made.
2017-03-23 14:40:16 -07:00
Kubernetes Submit Queue
a2d74cda38 Merge pull request #42452 from jingxu97/Mar/nodeNamePrefix
Automatic merge from submit-queue (batch tested with PRs 42452, 43399)

Modify getInstanceByName to avoid calling getInstancesByNames

This PR modify getInstanceByname to loop through all management zones
directly instead of calling getInstancesByNames. Currently
getInstancesByNames use a node name prefix as a filter to list the
instances. If the prefix does not match, it will return all instances
which is very wasteful since getInstanceByName only query one instance
with a specific name.

Partially fix issue #42445
2017-03-20 15:23:33 -07:00
Jing Xu
880de79376 Return nil when deleting non-exist GCE PD
When gce cloud tries to delete a disk, if the disk could not be found
from the zones, the function should return nil error. This modified behavior is also consistent with AWS
2017-03-03 15:06:39 -08:00
Jing Xu
92f05da1ff Modify getInstanceByName to avoid calling getInstancesByNames
This PR modify getInstanceByname to loop through all management zones
directly instead of calling getInstancesByNames. Currently
getInstancesByNames use a node name prefix as a filter to list the
instances. If the prefix does not match, it will return all instances
which is very wasteful since getInstanceByName only query one instance
with a specific name.
2017-03-03 11:37:08 -08:00
Kubernetes Submit Queue
616d929828 Merge pull request #38702 from jsafrane/gce-provisioning-existing
Automatic merge from submit-queue (batch tested with PRs 38702, 41810, 41778, 41858, 41872)

gce: Reuse unsuccessfully provisioned volumes.

GCE PD names generated by Kubernetes are guaranteed to be unique - they
contain name of the cluster and UID of the PVC that is behind it.
Presence of a GCE PD that has the same name as we want to provision
indicates that previous provisioning did not go well and most probably
the controller manager process was restarted in the meantime.

Kubernetes should reuse this volume and not provision a new one.

Fixes #38681
2017-02-23 07:54:33 -08:00
Kubernetes Submit Queue
16a0a0b975 Merge pull request #41034 from rootfs/gce-instance
Automatic merge from submit-queue (batch tested with PRs 41337, 41375, 41363, 41034, 41350)

use instance's Name to attach gce disk

**What this PR does / why we need it**:
fix #40427
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #40427

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-02-14 09:11:25 -08:00
Jan Safranek
d6e9803007 Add OWNERS file for GCE cloud provider 2017-02-07 10:35:14 +01:00
rootfs
b36009be7f use instance's Name to attach gce disk
Signed-off-by: rootfs <hchen@redhat.com>
2017-02-06 14:45:41 -05:00
deads2k
5a8f075197 move authoritative client-go utils out of pkg 2017-01-24 08:59:18 -05:00
deads2k
c47717134b move utils used in restclient to client-go 2017-01-19 07:55:14 -05:00
deads2k
6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Jeff Grafton
20d221f75c Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
Jan Safranek
ba6ab902c9 gce: Reuse unsuccessfully provisioned volumes.
GCE PD names generated by Kubernetes are guaranteed to be unique - they
contain name of the cluster and UID of the PVC that is behind it.
Presence of a GCE PD that has the same name as we want to provision
indicates that previous provisioning did not go well and most probably
the controller manager process was restarted in the meantime.

Kubernetes should reuse this volume and not provision a new one.
2017-01-02 14:15:46 +01:00
Mike Danese
161c391f44 autogenerated 2016-12-29 13:04:10 -08:00
gmarek
98fdcf155d Don't retry creating route if it already exists 2016-12-14 09:16:58 +01:00
Mike Danese
c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Wojciech Tyczynski
289a7ada44 Increase GCE operation timeout 2016-12-12 16:37:21 +01:00
Angus Lees
8a7e103191 providers: Remove long-deprecated Instances.List()
This method has been unused by k8s for some time, and yet is the last
piece of the cloud provider API that encourages provider names to be
human-friendly strings (this method applies a regex to instance names).

Actually removing this deprecated method is part of a long effort to
migrate from instance names to instance IDs in at least the OpenStack
provider plugin.
2016-12-10 22:36:12 +11:00
Kubernetes Submit Queue
cffaf1b71b Merge pull request #31321 from anguslees/lb-nodes
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)

Pass full Node objects to provider LoadBalancer methods
2016-12-05 20:16:53 -08:00
Clayton Coleman
3454a8d52c refactor: update bazel, codec, and gofmt 2016-12-03 19:10:53 -05:00
Clayton Coleman
5df8cc39c9 refactor: generated 2016-12-03 19:10:46 -05:00
Angus Lees
9d479f948a gce: Update LB API hosts->nodes
Update EnsureLoadBalancer/UpdateLoadBalancer API to use node objects.
2016-12-01 09:53:55 +11:00
Chao Xu
bcc783c594 run hack/update-all.sh 2016-11-23 15:53:09 -08:00
Chao Xu
c962c2602a dependencies: pkg/cloudprovider 2016-11-23 15:53:09 -08:00
bprashanth
a71abdc36d Ensure health check exists before creating target pool 2016-11-10 16:58:45 -08:00
Jing Xu
abbde43374 Add sync state loop in master's volume reconciler
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.

This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.

To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.

More details are explained in PR #33760
2016-10-28 09:24:53 -07:00
mfanjie
66381c6694 delete forwardingRules instead of globalForwardingRules 2016-10-25 11:27:38 +08:00
mfanjie
127e1b6115 always clean gce resources in service e2e 2016-10-25 11:27:38 +08:00
Mike Danese
3b6a067afc autogenerated 2016-10-21 17:32:32 -07:00
Kubernetes Submit Queue
e6b5b076b8 Merge pull request #33957 from bprashanth/esipp-beta
Automatic merge from submit-queue

Loadbalanced client src ip preservation enters beta

Sounds like we're going to try out the proposal (https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-249877334) for annotations -> fields on just one feature in 1.5 (scheduler). Or do we want to just convert to fields right now?
2016-10-20 06:53:07 -07:00
bprashanth
a46a849b9e Promote source ip annotations to beta 2016-10-19 13:39:37 -07:00
Dan Williams
40cefcaf8f cloudprovider/gce: canonicalize instance name when returning instance array
'names' is an array of FQDNs.  'instances' is a map indexed by canonicalized
name.  Clearly these two won't always match, so when building the final
instance array to return, make sure to look up map entries by their canonicalized
name.

In the below example, "ocp-master-5pob" is clearly found as a GCE instance
but when building the final instance array it cannot be matched as the code
is looking for "ocp-master-5pob.c.ose-refarch.internal" instead.  The node
is then deleted from the cluster as it cannot be found by the cloud provider.

gce.go:2519] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): initial node prefix ocp-
gce.go:2530] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): looking for instances map[ocp-master-5pob:<nil>]
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-c' (remaining 1)
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]):    instance name <omitted> not requested
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]):    instance name <omitted> not requested
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-b' (remaining 1)
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]):    instance name <omitted> not requested
gce.go:2576] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]):    found instance 'ocp-master-5pob' remaining 0
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]):    instance name <omitted> not requested
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-d' (remaining 0)
gce.go:2588] Failed to retrieve instance: "ocp-master-5pob.c.ose-refarch.internal"
gce.go:2624] ### getInstanceByName(ocp-master-5pob.c.ose-refarch.internal): got []: instance not found
gce.go:2626] getInstanceByName/multiple-zones: failed to get instance ocp-master-5pob.c.ose-refarch.internal; err: instance not found
nodecontroller.go:587] Deleting node (no longer present in cloud provider): ocp-master-5pob.c.ose-refarch.internal
nodecontroller.go:664] Recording Deleting Node ocp-master-5pob.c.ose-refarch.internal because it's not present according to cloud provider event message for node ocp-master-5pob.c.ose-refarch.internal
2016-10-19 13:03:58 -05:00
Zach Loafman
22352d2844 Revert "Add e2e tests for storageclass" 2016-10-17 10:32:27 -07:00
Jan Safranek
c9c1147270 Add e2e tests for storageclass
- test pd-ssd and pd-standard on GCE,
- test all four volume types on AWS
- test just the default volume type on OpenStack (right now, there is no API
  to get list of them)
2016-10-13 15:37:08 +02:00
Doug Davis
9d5bac6330 Change minion to node
Contination of #1111

I tried to keep this PR down to just a simple search-n-replace to keep
things simple.  I may have gone too far in some spots but its easy to
roll those back if needed.

I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.

I rolled back some of this from a previous commit because it just got
to big/messy. Will follow up with additional PRs

Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-09-28 10:53:30 -07:00