<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest release of this document can be found
[here](http://releases.k8s.io/release-1.1/docs/user-guide/services.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Services in Kubernetes

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Services in Kubernetes](#services-in-kubernetes)
  - [Overview](#overview)
  - [Defining a service](#defining-a-service)
    - [Services without selectors](#services-without-selectors)
  - [Virtual IPs and service proxies](#virtual-ips-and-service-proxies)
    - [Proxy-mode: userspace](#proxy-mode-userspace)
    - [Proxy-mode: iptables](#proxy-mode-iptables)
  - [Multi-Port Services](#multi-port-services)
  - [Choosing your own IP address](#choosing-your-own-ip-address)
    - [Why not use round-robin DNS?](#why-not-use-round-robin-dns)
  - [Discovering services](#discovering-services)
    - [Environment variables](#environment-variables)
    - [DNS](#dns)
  - [Headless services](#headless-services)
  - [Publishing services - service types](#publishing-services---service-types)
    - [Type NodePort](#type-nodeport)
    - [Type LoadBalancer](#type-loadbalancer)
    - [External IPs](#external-ips)
  - [Shortcomings](#shortcomings)
  - [Future work](#future-work)
  - [The gory details of virtual IPs](#the-gory-details-of-virtual-ips)
    - [Avoiding collisions](#avoiding-collisions)
    - [IPs and VIPs](#ips-and-vips)
      - [Userspace](#userspace)
      - [Iptables](#iptables)
  - [API Object](#api-object)

<!-- END MUNGE: GENERATED_TOC -->

## Overview

Kubernetes [`Pods`](pods.md) are mortal. They are born and they die, and they
are not resurrected.  [`ReplicationControllers`](replication-controller.md) in
particular create and destroy `Pods` dynamically (e.g. when scaling up or down
or when doing [rolling updates](kubectl/kubectl_rolling-update.md)).  While each `Pod` gets its own IP address, even
those IP addresses cannot be relied upon to be stable over time. This leads to
a problem: if some set of `Pods` (let's call them backends) provides
functionality to other `Pods` (let's call them frontends) inside the Kubernetes
cluster, how do those frontends find out and keep track of which backends are
in that set?

Enter `Services`.

A Kubernetes `Service` is an abstraction which defines a logical set of `Pods`
and a policy by which to access them - sometimes called a micro-service.  The
set of `Pods` targeted by a `Service` is (usually) determined by a [`Label
Selector`](labels.md#label-selectors) (see below for why you might want a
`Service` without a selector).

As an example, consider an image-processing backend which is running with 3
replicas.  Those replicas are fungible - frontends do not care which backend
they use.  While the actual `Pods` that compose the backend set may change, the
frontend clients should not need to be aware of that or keep track of the list
of backends themselves.  The `Service` abstraction enables this decoupling.

For Kubernetes-native applications, Kubernetes offers a simple `Endpoints` API
that is updated whenever the set of `Pods` in a `Service` changes.  For
non-native applications, Kubernetes offers a virtual-IP-based bridge to Services
which redirects to the backend `Pods`.

## Defining a service

A `Service` in Kubernetes is a REST object, similar to a `Pod`.  Like all of the
REST objects, a `Service` definition can be POSTed to the apiserver to create a
new instance.  For example, suppose you have a set of `Pods` that each expose
port 9376 and carry a label `"app=MyApp"`.

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```

This specification will create a new `Service` object named "my-service" which
targets TCP port 9376 on any `Pod` with the `"app=MyApp"` label.  This `Service`
will also be assigned an IP address (sometimes called the "cluster IP"), which
is used by the service proxies (see below).  The `Service`'s selector will be
evaluated continuously and the results will be POSTed to an `Endpoints` object
also named "my-service".

Note that a `Service` can map an incoming port to any `targetPort`.  By default
the `targetPort` will be set to the same value as the `port` field.  Perhaps
more interesting is that `targetPort` can be a string, referring to the name of
a port in the backend `Pods`.  The actual port number assigned to that name can
be different in each backend `Pod`. This offers a lot of flexibility for
deploying and evolving your `Services`.  For example, you can change the port
number that pods expose in the next version of your backend software, without
breaking clients.
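
As a minimal sketch of this, consider a `Service` whose `targetPort` is a port
name (the name `metrics` here is hypothetical, not part of the example above):

```bash
# Sketch: the targetPort is the *name* of a port ("metrics"), not a number.
# Each backend Pod declares its own numeric containerPort under that name,
# so different Pods may use different numbers behind the same Service.
kubectl create -f - <<EOF
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": "metrics"
            }
        ]
    }
}
EOF
```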

Kubernetes `Services` support `TCP` and `UDP` for protocols.  The default
is `TCP`.

### Services without selectors

Services generally abstract access to Kubernetes `Pods`, but they can also
abstract other kinds of backends.  For example:

  * You want to have an external database cluster in production, but in test
    you use your own databases.
  * You want to point your service to a service in another
    [`Namespace`](namespaces.md) or on another cluster.
  * You are migrating your workload to Kubernetes and some of your backends run
    outside of Kubernetes.

In any of these scenarios you can define a service without a selector:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```

Because this service has no selector, the corresponding `Endpoints` object will not be
created. You can manually map the service to your own specific endpoints:

```json
{
    "kind": "Endpoints",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "subsets": [
        {
            "addresses": [
                { "ip": "1.2.3.4" }
            ],
            "ports": [
                { "port": 9376 }
            ]
        }
    ]
}
```

NOTE: Endpoint IPs may not be loopback (127.0.0.0/8), link-local
(169.254.0.0/16), or link-local multicast (224.0.0.0/24).

Accessing a `Service` without a selector works the same as if it had a selector.
The traffic will be routed to endpoints defined by the user (`1.2.3.4:9376` in
this example).
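
For example, assuming the two definitions above are saved to local files named
`service.json` and `endpoints.json` (hypothetical names), you can create them
as a pair:

```bash
# Create the selector-less Service and its hand-maintained Endpoints.
# Kubernetes will not manage this Endpoints object; if the external
# backend moves, you must update the addresses yourself.
kubectl create -f service.json
kubectl create -f endpoints.json
```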

## Virtual IPs and service proxies

Every node in a Kubernetes cluster runs a `kube-proxy`.  This application
is responsible for implementing a form of virtual IP for `Service`s.  In
Kubernetes v1.0 the proxy was purely in userspace.  In Kubernetes v1.1 an
iptables proxy was added, but was not the default operating mode.  In
Kubernetes v1.2 we expect the iptables proxy to be the default.

As of Kubernetes v1.0, `Services` are a "layer 3" (TCP/UDP over IP) construct.
In Kubernetes v1.1 the `Ingress` API was added (beta) to represent "layer 7"
(HTTP) services.

### Proxy-mode: userspace

In this mode, kube-proxy watches the Kubernetes master for the addition and
removal of `Service` and `Endpoints` objects. For each `Service` it opens a
port (randomly chosen) on the local node.  Any connections to this "proxy port"
will be proxied to one of the `Service`'s backend `Pods` (as reported in
`Endpoints`).  Which backend `Pod` to use is decided based on the
`SessionAffinity` of the `Service`.  Lastly, it installs iptables rules which
capture traffic to the `Service`'s `clusterIP` (which is virtual) and `Port`
and redirects that traffic to the proxy port, which in turn proxies it to a
backend `Pod`.

The net result is that any traffic bound for the `Service`'s IP:Port is proxied
to an appropriate backend without the clients knowing anything about Kubernetes
or `Services` or `Pods`.

By default, the choice of backend is round robin.  Client-IP based session affinity
can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"` (the
default is `"None"`).
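
For example, a sketch of the earlier definition with client-IP affinity enabled:

```bash
# Sketch: a Service with client-IP session affinity, so repeated
# connections from one client land on the same backend Pod.
kubectl create -f - <<EOF
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "sessionAffinity": "ClientIP",
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
EOF
```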

### Proxy-mode: iptables

In this mode, kube-proxy watches the Kubernetes master for the addition and
removal of `Service` and `Endpoints` objects. For each `Service` it installs
iptables rules which capture traffic to the `Service`'s `clusterIP` (which is
virtual) and `Port` and redirects that traffic to one of the `Service`'s
backend sets.  For each `Endpoints` object it installs iptables rules which
select a backend `Pod`.

By default, the choice of backend is random.  Client-IP based session affinity
can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"` (the
default is `"None"`).

As with the userspace proxy, the net result is that any traffic bound for the
`Service`'s IP:Port is proxied to an appropriate backend without the clients
knowing anything about Kubernetes or `Services` or `Pods`. This should be
faster and more reliable than the userspace proxy.
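
If you want to see what the proxy has programmed, you can inspect the nat
table on a node. A rough sketch - the `KUBE-*` chain names and rule layout are
an implementation detail and vary by version:

```bash
# Dump the NAT rules kube-proxy installed and filter for one Service.
# The exact chain names and rule structure are version-dependent.
sudo iptables-save -t nat | grep my-service
```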

## Multi-Port Services

Many `Services` need to expose more than one port.  For this case, Kubernetes
supports multiple port definitions on a `Service` object.  When using multiple
ports you must give all of your ports names, so that endpoints can be
disambiguated.  For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            },
            {
                "name": "https",
                "protocol": "TCP",
                "port": 443,
                "targetPort": 9377
            }
        ]
    }
}
```

## Choosing your own IP address

You can specify your own cluster IP address as part of a `Service` creation
request, by setting the `spec.clusterIP` field.  You might want to do this if,
for example, you already have an existing DNS entry that you wish to reuse, or
if you have legacy systems that are configured for a specific IP address and
are difficult to re-configure.  The IP address that a user chooses must be a
valid IP address and within the `service-cluster-ip-range` CIDR range that is
specified by flag to the API server.  If the IP address value is invalid, the
apiserver returns a 422 HTTP status code to indicate that the value is invalid.
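
For example, a sketch that requests a specific address (assuming `10.0.171.223`
falls inside the cluster's configured `service-cluster-ip-range`):

```bash
# Sketch: request a specific cluster IP at creation time.  The address
# must fall within the apiserver's service-cluster-ip-range, or the
# request is rejected with a 422.
kubectl create -f - <<EOF
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "clusterIP": "10.0.171.223",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
EOF
```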

### Why not use round-robin DNS?

A question that pops up every now and then is why we do all this stuff with
virtual IPs rather than just use standard round-robin DNS.  There are a few
reasons:

   * There is a long history of DNS libraries not respecting DNS TTLs and
     caching the results of name lookups.
   * Many apps do DNS lookups once and cache the results.
   * Even if apps and libraries did proper re-resolution, the load of every
     client re-resolving DNS over and over would be difficult to manage.

We try to discourage users from doing things that hurt themselves.  That said,
if enough people ask for this, we may implement it as an alternative.

## Discovering services

Kubernetes supports 2 primary modes of finding a `Service` - environment
variables and DNS.

### Environment variables

When a `Pod` is run on a `Node`, the kubelet adds a set of environment variables
for each active `Service`.  It supports both [Docker links
compatible](https://docs.docker.com/userguide/dockerlinks/) variables (see
[makeLinkVariables](http://releases.k8s.io/HEAD/pkg/kubelet/envvars/envvars.go#L49))
and simpler `{SVCNAME}_SERVICE_HOST` and `{SVCNAME}_SERVICE_PORT` variables,
where the Service name is upper-cased and dashes are converted to underscores.

For example, the Service `"redis-master"` which exposes TCP port 6379 and has been
allocated cluster IP address 10.0.0.11 produces the following environment
variables:

```bash
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
```

*This does imply an ordering requirement* - any `Service` that a `Pod` wants to
access must be created before the `Pod` itself, or else the environment
variables will not be populated.  DNS does not have this restriction.
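
A `Pod` started after the `Service` can use these variables directly. For
example, a sketch assuming the container image ships `redis-cli`:

```bash
# Connect to the redis-master Service via the injected variables
# (assumes redis-cli is present in the image).
redis-cli -h "$REDIS_MASTER_SERVICE_HOST" -p "$REDIS_MASTER_SERVICE_PORT"
```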

### DNS

An optional (though strongly recommended) [cluster
add-on](http://releases.k8s.io/HEAD/cluster/addons/README.md) is a DNS server.  The
DNS server watches the Kubernetes API for new `Services` and creates a set of
DNS records for each.  If DNS has been enabled throughout the cluster then all
`Pods` should be able to do name resolution of `Services` automatically.

For example, if you have a `Service` called `"my-service"` in Kubernetes
`Namespace` `"my-ns"` a DNS record for `"my-service.my-ns"` is created.  `Pods`
which exist in the `"my-ns"` `Namespace` should be able to find it by simply doing
a name lookup for `"my-service"`.  `Pods` which exist in other `Namespaces` must
qualify the name as `"my-service.my-ns"`.  The result of these name lookups is the
cluster IP.

Kubernetes also supports DNS SRV (service) records for named ports.  If the
`"my-service.my-ns"` `Service` has a port named `"http"` with protocol `TCP`, you
can do a DNS SRV query for `"_http._tcp.my-service.my-ns"` to discover the port
number for `"http"`.
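
For example, a sketch of checking both record types from inside a `Pod`
(assuming the image provides `nslookup` and `host`):

```bash
# A records: the short name works from the same Namespace thanks to the
# search domains in the Pod's resolv.conf; qualify it from elsewhere.
nslookup my-service          # from a Pod in the "my-ns" Namespace
nslookup my-service.my-ns    # from a Pod in any Namespace

# SRV record for the named port "http"; the answer carries the port number.
host -t SRV _http._tcp.my-service.my-ns
```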

## Headless services

Sometimes you don't need or want load-balancing and a single service IP.  In
this case, you can create "headless" services by specifying `"None"` for the
cluster IP (`spec.clusterIP`).

For such `Services`, a cluster IP is not allocated. DNS is configured to return
multiple A records (addresses) for the `Service` name, which point directly to
the `Pods` backing the `Service`.  Additionally, kube-proxy does not handle
these services and there is no load balancing or proxying done by the platform
for them.  The endpoints controller will still create `Endpoints` records in
the API.

This option allows developers to reduce coupling to the Kubernetes system, if
they desire, but leaves them free to do discovery in their own way.
Applications can still use a self-registration pattern, and adapters for other
discovery systems could easily be built upon this API.
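
For example, a sketch of the earlier definition made headless:

```bash
# Sketch: a headless Service.  No cluster IP is allocated and kube-proxy
# ignores it; DNS returns the backing Pod IPs directly as A records.
kubectl create -f - <<EOF
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "clusterIP": "None",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
EOF
```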

## Publishing services - service types

For some parts of your application (e.g. frontends) you may want to expose a
Service onto an external (outside of your cluster, maybe public internet) IP
address, while other services should be visible only from inside of the cluster.

Kubernetes `ServiceTypes` allow you to specify what kind of service you want.
The default and base type is `ClusterIP`, which exposes a service to connections
from inside the cluster. `NodePort` and `LoadBalancer` are two types that expose
services to external traffic.

Valid values for the `ServiceType` field are:

   * `ClusterIP`: use a cluster-internal IP only - this is the default and is
     discussed above. Choosing this value means that you want this service to be
     reachable only from inside of the cluster.
   * `NodePort`: on top of having a cluster-internal IP, expose the service on a
     port on each node of the cluster (the same port on each node). You'll be able
     to contact the service on any `<NodeIP>:NodePort` address.
   * `LoadBalancer`: on top of having a cluster-internal IP and exposing the
     service on a `NodePort`, ask the cloud provider for a load balancer
     which forwards to the `Service`, exposed as a `<NodeIP>:NodePort`
     for each Node.

Note that while `NodePort`s can be TCP or UDP, `LoadBalancer`s only support TCP
as of Kubernetes 1.0.
### Type NodePort

If you set the `type` field to `"NodePort"`, the Kubernetes master will
allocate a port from a flag-configured range (default: 30000-32767), and each
Node will proxy that port (the same port number on every Node) into your `Service`.
That port will be reported in your `Service`'s `spec.ports[*].nodePort` field.

If you want a specific port number, you can specify a value in the `nodePort`
field, and the system will allocate you that port or else the API transaction
will fail (i.e. you need to take care about possible port collisions yourself).
The value you specify must be in the configured range for node ports.

This gives developers the freedom to set up their own load balancers, to
configure cloud environments that are not fully supported by Kubernetes, or
even to just expose one or more nodes' IPs directly.

Note that this Service will be visible as both `<NodeIP>:spec.ports[*].nodePort`
and `spec.clusterIP:spec.ports[*].port`.
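
For example, a sketch that pins the node port to 30061 (a value inside the
default range) and then reaches the `Service` through a node:

```bash
# Sketch: expose my-service on port 30061 of every node.  The value must
# be inside the configured node-port range or the API call fails.
kubectl create -f - <<EOF
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "type": "NodePort",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ]
    }
}
EOF

# Any node's IP now forwards to the Service (substitute a real node IP).
curl http://<node-ip>:30061/
```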

### Type LoadBalancer

On cloud providers which support external load balancers, setting the `type`
field to `"LoadBalancer"` will provision a load balancer for your `Service`.
The actual creation of the load balancer happens asynchronously, and
information about the provisioned balancer will be published in the `Service`'s
`status.loadBalancer` field.  For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ],
        "clusterIP": "10.0.171.239",
        "loadBalancerIP": "78.11.24.19",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "146.148.47.155"
                }
            ]
        }
    }
}
```

Traffic from the external load balancer will be directed at the backend `Pods`,
though exactly how that works depends on the cloud provider. Some cloud providers allow
the `loadBalancerIP` to be specified. In those cases, the load-balancer will be created
with the user-specified `loadBalancerIP`. If the `loadBalancerIP` field is not specified,
an ephemeral IP will be assigned to the loadBalancer. If the `loadBalancerIP` is specified, but the
cloud provider does not support the feature, the field will be ignored.
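
Because provisioning is asynchronous, the `status.loadBalancer` field starts
out empty; a sketch of checking it after creation:

```bash
# Re-fetch the Service and look for the ingress IP/hostname under
# status.loadBalancer; it appears once the cloud provider finishes.
kubectl get service my-service -o json
```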

### External IPs

If there are external IPs that route to one or more cluster nodes, Kubernetes services can be exposed on those
`externalIPs`. Traffic that ingresses into the cluster with the external IP (as destination IP), on the service port,
will be routed to one of the service endpoints. `externalIPs` are not managed by Kubernetes and are the responsibility
of the cluster administrator.

In the ServiceSpec, `externalIPs` can be specified along with any of the `ServiceTypes`.
In the example below, `my-service` can be accessed by clients on `80.11.12.10:80` (`externalIP:port`).

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ],
        "externalIPs" : [
            "80.11.12.10"
        ]
    }
}
```

## Shortcomings

Using the userspace proxy for VIPs will work at small to medium scale, but will
not scale to very large clusters with thousands of Services.  See [the original
design proposal for portals](http://issue.k8s.io/1107) for more details.

Using the userspace proxy obscures the source-IP of a packet accessing a `Service`.
This makes some kinds of firewalling impossible.  The iptables proxier does not
obscure in-cluster source IPs, but it does still impact clients coming through
a load-balancer or node-port.

LoadBalancers only support TCP, not UDP.

The `Type` field is designed as nested functionality - each level adds to the
previous.  This is not strictly required on all cloud providers (e.g. Google Compute Engine does
not need to allocate a `NodePort` to make `LoadBalancer` work, but AWS does)
but the current API requires it.

## Future work

In the future we envision that the proxy policy can become more nuanced than
simple round robin balancing, for example master-elected or sharded.  We also
envision that some `Services` will have "real" load balancers, in which case the
VIP will simply transport the packets there.

We intend to improve our support for L7 (HTTP) `Services`.

We intend to have more flexible ingress modes for `Services` which encompass
the current `ClusterIP`, `NodePort`, and `LoadBalancer` modes and more.

## The gory details of virtual IPs

The previous information should be sufficient for many people who just want to
use `Services`.  However, there is a lot going on behind the scenes that may be
worth understanding.

### Avoiding collisions

One of the primary philosophies of Kubernetes is that users should not be
exposed to situations that could cause their actions to fail through no fault
of their own.  In this situation, we are looking at network ports - users
should not have to choose a port number if that choice might collide with
another user.  That is an isolation failure.

In order to allow users to choose a port number for their `Services`, we must
ensure that no two `Services` can collide.  We do that by allocating each
`Service` its own IP address.

To ensure each service receives a unique IP, an internal allocator atomically
updates a global allocation map in etcd prior to creating each service. The map
object must exist in the registry for services to get IPs, otherwise creations
will fail with a message indicating an IP could not be allocated. A background
controller is responsible for creating that map (to migrate from older versions
of Kubernetes that used in-memory locking) as well as checking for invalid
assignments due to administrator intervention and cleaning up any IPs
that were allocated but which no service currently uses.

### IPs and VIPs

Unlike `Pod` IP addresses, which actually route to a fixed destination,
`Service` IPs are not actually answered by a single host.  Instead, we use
`iptables` (packet processing logic in Linux) to define virtual IP addresses
which are transparently redirected as needed.  When clients connect to the
VIP, their traffic is automatically transported to an appropriate endpoint.
The environment variables and DNS for `Services` are actually populated in
terms of the `Service`'s VIP and port.

We support two proxy modes - userspace and iptables, which operate slightly
differently.

#### Userspace

As an example, consider the image processing application described above.
When the backend `Service` is created, the Kubernetes master assigns a virtual
IP address, for example 10.0.0.1.  Assuming the `Service` port is 1234, the
`Service` is observed by all of the `kube-proxy` instances in the cluster.
When a proxy sees a new `Service`, it opens a new random port, establishes an
iptables redirect from the VIP to this new port, and starts accepting
connections on it.

When a client connects to the VIP the iptables rule kicks in, and redirects
the packets to the `Service proxy`'s own port.  The `Service proxy` chooses a
backend, and starts proxying traffic from the client to the backend.

This means that `Service` owners can choose any port they want without risk of
collision.  Clients can simply connect to an IP and port, without being aware
of which `Pods` they are actually accessing.

#### Iptables

Again, consider the image processing application described above.
When the backend `Service` is created, the Kubernetes master assigns a virtual
IP address, for example 10.0.0.1.  Assuming the `Service` port is 1234, the
`Service` is observed by all of the `kube-proxy` instances in the cluster.
When a proxy sees a new `Service`, it installs a series of iptables rules which
redirect from the VIP to per-`Service` rules.  The per-`Service` rules link to
per-`Endpoint` rules which redirect (Destination NAT) to the backends.

When a client connects to the VIP the iptables rule kicks in.  A backend is
chosen (either based on session affinity or randomly) and packets are
redirected to the backend.  Unlike the userspace proxy, packets are never
copied to userspace, the kube-proxy does not have to be running for the VIP to
work, and the client IP is not altered.

This same basic flow executes when traffic comes in through a node-port or
through a load-balancer, though in those cases the client IP does get altered.

## API Object

Service is a top-level resource in the Kubernetes REST API. More details about the
API object can be found at: [Service API
object](https://htmlpreview.github.io/?https://github.com/kubernetes/kubernetes/blob/HEAD/docs/api-reference/v1/definitions.html#_v1_service).