mirror of
https://github.com/k3s-io/kubernetes.git
synced 2025-09-19 17:16:12 +00:00
Cloning docs for 0.19.0
release-0.19.0/docs/services.md
# Services in Kubernetes

## Overview

Kubernetes [`Pods`](pods.md) are mortal. They are born and they die, and they
are not resurrected. [`ReplicationControllers`](replication-controller.md) in
particular create and destroy `Pods` dynamically (e.g. when scaling up or down
or when doing rolling updates). While each `Pod` gets its own IP address, even
those IP addresses cannot be relied upon to be stable over time. This leads to
a problem: if some set of `Pods` (let's call them backends) provides
functionality to other `Pods` (let's call them frontends) inside the Kubernetes
cluster, how do those frontends find out and keep track of which backends are
in that set?

Enter `Services`.

A Kubernetes `Service` is an abstraction which defines a logical set of `Pods`
and a policy by which to access them - sometimes called a micro-service. The
set of `Pods` targeted by a `Service` is (usually) determined by a [`Label
Selector`](labels.md) (see below for why you might want a `Service` without a
selector).

As an example, consider an image-processing backend which is running with 3
replicas. Those replicas are fungible - frontends do not care which backend
they use. While the actual `Pods` that compose the backend set may change, the
frontend clients should not need to be aware of that or keep track of the list
of backends themselves. The `Service` abstraction enables this decoupling.

For Kubernetes-native applications, Kubernetes offers a simple `Endpoints` API
that is updated whenever the set of `Pods` in a `Service` changes. For
non-native applications, Kubernetes offers a virtual-IP-based bridge to
`Services` which redirects to the backend `Pods`.

## Defining a service

A `Service` in Kubernetes is a REST object, similar to a `Pod`. Like all of the
REST objects, a `Service` definition can be POSTed to the apiserver to create a
new instance. For example, suppose you have a set of `Pods` that each expose
port 9376 and carry a label "app=MyApp".

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```

This specification will create a new `Service` object named "my-service" which
targets TCP port 9376 on any `Pod` with the "app=MyApp" label. This `Service`
will also be assigned an IP address (sometimes called the "cluster IP"), which
is used by the service proxies (see below). The `Service`'s selector will be
evaluated continuously and the results will be posted in an `Endpoints` object
also named "my-service".

Note that a `Service` can map an incoming port to any `targetPort`. By default
the `targetPort` is the same as the `port` field. Perhaps more interesting is
that `targetPort` can be a string, referring to the name of a port in the
backend `Pod`s. The actual port number assigned to that name can be different
in each backend `Pod`. This offers a lot of flexibility for deploying and
evolving your `Service`s. For example, you can change the port number that
pods expose in the next version of your backend software, without breaking
clients.

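To make the named `targetPort` idea concrete, here is a minimal Python sketch of
how a name could be resolved separately for each backend. The function
`resolve_target_port`, the port name `http-server`, and the port number 8080
are hypothetical and chosen for illustration only; this is not the
Kubernetes implementation.

```python
from typing import Dict, Union


def resolve_target_port(target_port: Union[int, str],
                        pod_ports: Dict[str, int]) -> int:
    """Return the numeric port to use for one backend pod.

    target_port is either a number (used as-is) or the name of a port
    declared by the pod. pod_ports maps port names to numbers for that pod.
    """
    if isinstance(target_port, int):
        return target_port
    if target_port not in pod_ports:
        raise ValueError(f"pod declares no port named {target_port!r}")
    return pod_ports[target_port]


# Two versions of a backend expose the same named port on different numbers.
old_pod = {"http-server": 9376}
new_pod = {"http-server": 8080}

for pod in (old_pod, new_pod):
    print(resolve_target_port("http-server", pod))
```

Because clients only ever see the `Service`'s own `port`, the backend number
can change between versions without any client-side change.
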
Kubernetes `Service`s support `TCP` and `UDP` for protocols. The default
is `TCP`.

### Services without selectors

Services generally abstract access to Kubernetes `Pods`, but they can also
abstract other kinds of backends. For example:

  * You want to have an external database cluster in production, but in test
    you use your own databases.
  * You want to point your service to a service in another
    [`Namespace`](namespaces.md) or on another cluster.
  * You are migrating your workload to Kubernetes and some of your backends run
    outside of Kubernetes.

In any of these scenarios you can define a service without a selector:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```

Because this has no selector, the corresponding `Endpoints` object will not be
created. You can manually map the service to your own specific endpoints:

```json
{
    "kind": "Endpoints",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "subsets": [
        {
            "addresses": [
                { "ip": "1.2.3.4" }
            ],
            "ports": [
                { "port": 80 }
            ]
        }
    ]
}
```

Accessing a `Service` without a selector works the same as if it had a
selector. The traffic will be routed to endpoints defined by the user
(`1.2.3.4:80` in this example).

## Virtual IPs and service proxies

Every node in a Kubernetes cluster runs a `kube-proxy`. This application
watches the Kubernetes master for the addition and removal of `Service`
and `Endpoints` objects. For each `Service` it opens a randomly chosen port on
the local node. Any connections made to that port will be proxied to one of the
corresponding backend `Pods`. Which backend to use is decided based on the
`SessionAffinity` of the `Service`. Lastly, it installs iptables rules which
capture traffic to the `Service`'s `Port` on the `Service`'s cluster IP (which
is entirely virtual) and redirect that traffic to the previously described
port.

The net result is that any traffic bound for the `Service` is proxied to an
appropriate backend without the clients knowing anything about Kubernetes or
`Services` or `Pods`.

![Services overview diagram](services_overview.png)

By default, the choice of backend is round robin. Client-IP based session
affinity can be selected by setting `service.spec.sessionAffinity` to
`"ClientIP"` (the default is `"None"`).

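The effect of `sessionAffinity` can be illustrated with a small Python model.
This is only a sketch under stated assumptions: the class `BackendPicker` is
hypothetical, it assumes a simple round-robin choice when no affinity applies
(the proxy's real policy may differ), and it ignores backends that disappear
after a client has been pinned to them.

```python
from typing import Dict, List


class BackendPicker:
    """Toy model of choosing a backend for each new connection."""

    def __init__(self, backends: List[str], session_affinity: str = "None"):
        if session_affinity not in ("None", "ClientIP"):
            raise ValueError(f"unknown sessionAffinity: {session_affinity}")
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.session_affinity = session_affinity
        self._next = 0                      # index of the next backend to hand out
        self._sticky: Dict[str, str] = {}   # client IP -> pinned backend

    def pick(self, client_ip: str) -> str:
        # With ClientIP affinity, a client that was seen before keeps its backend.
        if self.session_affinity == "ClientIP" and client_ip in self._sticky:
            return self._sticky[client_ip]
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        if self.session_affinity == "ClientIP":
            self._sticky[client_ip] = backend
        return backend


picker = BackendPicker(["10.1.0.5:9376", "10.1.0.6:9376"], "ClientIP")
print(picker.pick("192.0.2.10"))  # first backend
print(picker.pick("192.0.2.11"))  # second backend
print(picker.pick("192.0.2.10"))  # same backend as the first call
```

With `"None"`, repeated calls from the same client simply keep rotating
through the backends.
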
As of Kubernetes 1.0, `Service`s are a "layer 4" (TCP/UDP over IP) construct.
We do not yet have a concept of "layer 7" (HTTP) services.

## Multi-Port Services

Many `Service`s need to expose more than one port. For this case, Kubernetes
supports multiple port definitions on a `Service` object. When using multiple
ports you must give all of your ports names, so that endpoints can be
disambiguated. For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            },
            {
                "name": "https",
                "protocol": "TCP",
                "port": 443,
                "targetPort": 9377
            }
        ]
    }
}
```

## Choosing your own IP address

A user can specify their own cluster IP address as part of a `Service` creation
request by setting the `spec.clusterIP` field. This is useful, for example, if
they already have an existing DNS entry that they wish to replace, or legacy
systems that are configured for a specific IP address and are difficult to
reconfigure. The IP address that a user chooses must be a valid IP address and
within the `service_cluster_ip_range` CIDR range that is specified by flag to
the API server. If the IP address value is invalid, the apiserver returns a 422
HTTP status code to indicate that the value is invalid.

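The two conditions on a user-chosen cluster IP (it parses as an IP address, and
it falls inside the configured service CIDR) can be checked ahead of time with
Python's standard `ipaddress` module. This is a minimal sketch; the function
name `check_cluster_ip` and the example range `10.0.0.0/16` are assumptions for
illustration, and the apiserver's own validation may apply further rules.

```python
import ipaddress
from typing import Tuple


def check_cluster_ip(requested: str, service_cidr: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a requested cluster IP.

    service_cidr stands in for the range given to the API server by flag.
    """
    network = ipaddress.ip_network(service_cidr)
    try:
        address = ipaddress.ip_address(requested)
    except ValueError:
        return False, f"{requested!r} is not a valid IP address"
    if address not in network:
        return False, f"{address} is outside the service range {network}"
    return True, "ok"


print(check_cluster_ip("10.0.171.239", "10.0.0.0/16"))
print(check_cluster_ip("192.168.1.1", "10.0.0.0/16"))
print(check_cluster_ip("not-an-ip", "10.0.0.0/16"))
```

A request that fails either check is the kind the apiserver would reject with
the 422 status described above.
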
### Why not use round-robin DNS?

A question that pops up every now and then is why we do all this stuff with
virtual IPs rather than just use standard round-robin DNS. There are a few
reasons:

   * There is a long history of DNS libraries not respecting DNS TTLs and
     caching the results of name lookups.
   * Many apps do DNS lookups once and cache the results.
   * Even if apps and libraries did proper re-resolution, the load of every
     client re-resolving DNS over and over would be difficult to manage.

We try to discourage users from doing things that hurt themselves. That said,
if enough people ask for this, we may implement it as an alternative.

## Discovering services

Kubernetes supports two primary modes of finding a `Service` - environment
variables and DNS.

### Environment variables

When a `Pod` is run on a `Node`, the kubelet adds a set of environment variables
for each active `Service`. It supports both [Docker links
compatible](https://docs.docker.com/userguide/dockerlinks/) variables (see
[makeLinkVariables](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/kubelet/envvars/envvars.go#L49))
and simpler `{SVCNAME}_SERVICE_HOST` and `{SVCNAME}_SERVICE_PORT` variables,
where the Service name is upper-cased and dashes are converted to underscores.

For example, the Service "redis-master" which exposes TCP port 6379 and has been
allocated cluster IP address 10.0.0.11 produces the following environment
variables:

```
REDIS_MASTER_SERVICE_HOST=10.0.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.0.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.0.0.11
```

*This does imply an ordering requirement* - any `Service` that a `Pod` wants to
access must be created before the `Pod` itself, or else the environment
variables will not be populated. DNS does not have this restriction.

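The naming rule (upper-case the name, turn dashes into underscores) can be
reproduced in a few lines of Python. This is a sketch that rebuilds the
single-port listing shown above; the function `service_env_vars` is
hypothetical and is not the kubelet's code, and multi-port `Service`s are not
covered.

```python
from typing import Dict


def service_env_vars(name: str, cluster_ip: str, port: int,
                     protocol: str = "TCP") -> Dict[str, str]:
    """Build the environment variables for a single-port Service."""
    prefix = name.upper().replace("-", "_")
    proto = protocol.lower()
    url = f"{proto}://{cluster_ip}:{port}"
    link = f"{prefix}_PORT_{port}_{protocol.upper()}"
    return {
        # Simple variables
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        # Docker-links-compatible variables
        f"{prefix}_PORT": url,
        link: url,
        f"{link}_PROTO": proto,
        f"{link}_PORT": str(port),
        f"{link}_ADDR": cluster_ip,
    }


for key, value in service_env_vars("redis-master", "10.0.0.11", 6379).items():
    print(f"{key}={value}")
```

A client inside a `Pod` would then read, for example,
`os.environ["REDIS_MASTER_SERVICE_HOST"]` to find the cluster IP.
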
### DNS

An optional (though strongly recommended) cluster add-on is a DNS server. The
DNS server watches the Kubernetes API for new `Services` and creates a set of
DNS records for each. If DNS has been enabled throughout the cluster then all
`Pods` should be able to do name resolution of `Services` automatically.

For example, if you have a `Service` called "my-service" in Kubernetes
`Namespace` "my-ns", a DNS record for "my-service.my-ns" is created. `Pods`
which exist in the "my-ns" `Namespace` should be able to find it by simply doing
a name lookup for "my-service". `Pods` which exist in other `Namespace`s must
qualify the name as "my-service.my-ns". The result of these name lookups is the
cluster IP.

We will soon add DNS support for multi-port `Service`s in the form of SRV
records.

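The qualification rule for names can be written down as a tiny Python helper.
This is only a sketch of the rule as described in this section; the function
`lookup_name` and the namespace "other-ns" are hypothetical, and cluster domain
suffixes are deliberately left out.

```python
def lookup_name(service: str, service_namespace: str, pod_namespace: str) -> str:
    """Return the shortest name a Pod can use to look up a Service."""
    if pod_namespace == service_namespace:
        # Same Namespace: the bare Service name is enough.
        return service
    # Different Namespace: qualify the name with the Service's Namespace.
    return f"{service}.{service_namespace}"


print(lookup_name("my-service", "my-ns", "my-ns"))
print(lookup_name("my-service", "my-ns", "other-ns"))
```
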
## Headless services

Sometimes you don't need or want load-balancing and a single service IP. In
this case, you can create "headless" services by specifying `"None"` for the
cluster IP (`spec.clusterIP`).
For such `Service`s, a cluster IP is not allocated and service-specific
environment variables for `Pod`s are not created. DNS is configured to return
multiple A records (addresses) for the `Service` name, which point directly to
the `Pod`s backing the `Service`. Additionally, the `kube-proxy` does not handle
these services and there is no load balancing or proxying done by the platform
for them. The endpoints controller will still create `Endpoints` records in
the API.

This option allows developers to reduce coupling to the Kubernetes system, if
they desire, while leaving them free to do discovery in their own way.
Applications can still use a self-registration pattern, and adapters for other
discovery systems could easily be built upon this API.

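The difference in DNS answers between a regular and a headless `Service` can be
modeled in Python as follows. This is a sketch of the behavior described above,
not the DNS add-on's code; the function `a_records` and the pod addresses are
hypothetical.

```python
from typing import List


def a_records(cluster_ip: str, pod_ips: List[str]) -> List[str]:
    """Return the addresses a lookup of the Service name would yield."""
    if cluster_ip == "None":
        # Headless: one A record per backing Pod, no virtual IP in between.
        return list(pod_ips)
    # Regular Service: a single record holding the cluster IP.
    return [cluster_ip]


pods = ["10.1.0.5", "10.1.0.6", "10.1.0.7"]
print(a_records("10.0.0.11", pods))
print(a_records("None", pods))
```

With a headless `Service`, choosing among the returned addresses (and noticing
when they change) is left to the client.
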
## <a name="external"></a>External services

For some parts of your application (e.g. frontends) you may want to expose a
Service onto an external (outside of your cluster, maybe public internet) IP
address. Kubernetes supports two ways of doing this: `NodePort`s and
`LoadBalancer`s.

Every `Service` has a `Type` field which defines how the `Service` can be
accessed. Valid values for this field are:

   * `ClusterIP`: use a cluster-internal IP only - this is the default
   * `NodePort`: use a cluster IP, but also expose the service on a port on each
     node of the cluster (the same port on each)
   * `LoadBalancer`: use a ClusterIP and a NodePort, but also ask the cloud
     provider for a load balancer which forwards to the `Service`

Note that while `NodePort`s can be TCP or UDP, `LoadBalancer`s only support TCP
as of Kubernetes 1.0.

### Type = NodePort

If you set the `type` field to `"NodePort"`, the Kubernetes master will
allocate you a port (from a flag-configured range) on each node for each port
exposed by your `Service`. That port will be reported in your `Service`'s
`spec.ports[*].nodePort` field. If you specify a value in that field, the
system will allocate you that port or else will fail the API transaction.

This gives developers the freedom to set up their own load balancers, to
configure cloud environments that are not fully supported by Kubernetes, or
even to just expose one or more nodes' IPs directly.

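The "allocate the requested port or fail" rule can be sketched as a small
Python allocator. The class `NodePortAllocator` is hypothetical, the range
30000-30002 is an arbitrary illustrative value standing in for the
flag-configured range, and the lowest-free-port strategy is an assumption made
to keep the example deterministic.

```python
from typing import Optional, Set


class NodePortAllocator:
    """Toy allocator for node ports within an inclusive range."""

    def __init__(self, low: int, high: int):
        self.low = low
        self.high = high
        self.used: Set[int] = set()

    def allocate(self, requested: Optional[int] = None) -> int:
        if requested is not None:
            # A user-specified nodePort is granted exactly or the call fails.
            if not self.low <= requested <= self.high:
                raise ValueError(f"port {requested} is outside the allowed range")
            if requested in self.used:
                raise ValueError(f"port {requested} is already allocated")
            self.used.add(requested)
            return requested
        # No preference: hand out the lowest free port in the range.
        for port in range(self.low, self.high + 1):
            if port not in self.used:
                self.used.add(port)
                return port
        raise RuntimeError("node port range is exhausted")


allocator = NodePortAllocator(30000, 30002)
print(allocator.allocate(30001))  # the requested port
print(allocator.allocate())       # a port chosen by the allocator
```
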
### Type = LoadBalancer

On cloud providers which support external load balancers, setting the `type`
field to `"LoadBalancer"` will provision a load balancer for your `Service`.
The actual creation of the load balancer happens asynchronously, and
information about the provisioned balancer will be published in the `Service`'s
`status.loadBalancer` field. For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ],
        "clusterIP": "10.0.171.239",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "146.148.47.155"
                }
            ]
        }
    }
}
```

Traffic from the external load balancer will be directed at the backend `Pods`,
though exactly how that works depends on the cloud provider.

## Shortcomings

We expect that using iptables and userspace proxies for VIPs will work at
small to medium scale, but may not scale to very large clusters with thousands
of Services. See [the original design proposal for
portals](https://github.com/GoogleCloudPlatform/kubernetes/issues/1107) for more
details.

Using the kube-proxy obscures the source-IP of a packet accessing a `Service`.
This makes some kinds of firewalling impossible.

LoadBalancers only support TCP, not UDP.

The `Type` field is designed as nested functionality - each level adds to the
previous. This is not strictly required on all cloud providers (e.g. GCE does
not need to allocate a `NodePort` to make `LoadBalancer` work, but AWS does)
but the current API requires it.

## Future work

In the future we envision that the proxy policy can become more nuanced than
simple round robin balancing, for example master-elected or sharded. We also
envision that some `Services` will have "real" load balancers, in which case the
VIP will simply transport the packets there.

There's a
[proposal](https://github.com/GoogleCloudPlatform/kubernetes/issues/3760) to
eliminate userspace proxying in favor of doing it all in iptables. This should
perform better and fix the source-IP obfuscation, though it is less flexible
than arbitrary userspace code.

We intend to have first-class support for L7 (HTTP) `Service`s.

We intend to have more flexible ingress modes for `Service`s which encompass
the current `ClusterIP`, `NodePort`, and `LoadBalancer` modes and more.

## The gory details of virtual IPs

The previous information should be sufficient for many people who just want to
use `Services`. However, there is a lot going on behind the scenes that may be
worth understanding.

### Avoiding collisions

One of the primary philosophies of Kubernetes is that users should not be
exposed to situations that could cause their actions to fail through no fault
of their own. In this situation, we are looking at network ports - users
should not have to choose a port number if that choice might collide with
another user. That is an isolation failure.

In order to allow users to choose a port number for their `Services`, we must
ensure that no two `Services` can collide. We do that by allocating each
`Service` its own IP address.

To ensure each service receives a unique IP, an internal allocator atomically
updates a global allocation map in etcd prior to creating each service. The map
object must exist in the registry for services to get IPs, otherwise creations
will fail with a message indicating an IP could not be allocated. A background
controller is responsible for creating that map (to migrate from older versions
of Kubernetes that used in-memory locking) as well as checking for invalid
assignments due to administrator intervention and cleaning up any IPs
that were allocated but which no service currently uses.

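The allocate-then-repair cycle described above can be modeled in Python. This
is a deliberately simplified sketch: the class `ClusterIPAllocator` is
hypothetical, a plain in-memory dict stands in for the allocation map stored in
etcd (so nothing here is atomic or shared between processes), and the tiny
`10.0.0.0/30` range is used only to keep the example short.

```python
import ipaddress
from typing import Dict, List, Set


class ClusterIPAllocator:
    """Toy model of unique cluster IP allocation with a cleanup pass."""

    def __init__(self, cidr: str):
        self.network = ipaddress.ip_network(cidr)
        self.allocated: Dict[str, str] = {}  # IP -> name of the owning service

    def allocate(self, service: str) -> str:
        # Record the IP in the map before the service is created.
        for host in self.network.hosts():
            ip = str(host)
            if ip not in self.allocated:
                self.allocated[ip] = service
                return ip
        raise RuntimeError("an IP could not be allocated")

    def repair(self, live_services: Set[str]) -> List[str]:
        # Background cleanup: release IPs that no current service uses.
        leaked = [ip for ip, owner in self.allocated.items()
                  if owner not in live_services]
        for ip in leaked:
            del self.allocated[ip]
        return leaked


allocator = ClusterIPAllocator("10.0.0.0/30")
first = allocator.allocate("frontend")
second = allocator.allocate("backend")
print(first, second)
print(allocator.repair({"frontend"}))  # the IP held by "backend" is released
```

Because every `Service` owns a distinct IP, two `Services` can both choose
port 80 without colliding.
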
### IPs and VIPs

Unlike `Pod` IP addresses, which actually route to a fixed destination,
`Service` IPs are not actually answered by a single host. Instead, we use
`iptables` (packet processing logic in Linux) to define virtual IP addresses
which are transparently redirected as needed. When clients connect to the
VIP, their traffic is automatically transported to an appropriate endpoint.
The environment variables and DNS for `Services` are actually populated in
terms of the `Service`'s VIP and port.

As an example, consider the image processing application described above.
When the backend `Service` is created, the Kubernetes master assigns a virtual
IP address, for example 10.0.0.1. Assuming the `Service` port is 1234, the
`Service` is observed by all of the `kube-proxy` instances in the cluster.
When a proxy sees a new `Service`, it opens a new random port, establishes an
iptables redirect from the VIP to this new port, and starts accepting
connections on it.

When a client connects to the VIP, the iptables rule kicks in and redirects
the packets to the `Service proxy`'s own port. The `Service proxy` chooses a
backend, and starts proxying traffic from the client to the backend.

This means that `Service` owners can choose any port they want without risk of
collision. Clients can simply connect to an IP and port, without being aware
of which `Pod`s they are actually accessing.

