diff --git a/README.md b/README.md
index 2bc7df0a..d34ec945 100644
--- a/README.md
+++ b/README.md
@@ -414,78 +414,73 @@ Using RKE's pluggable user addons, it's possible to deploy Rancher 2.x server in
 # chown /var/run/docker.sock
 ```
-## Etcd Snapshot
+## Etcd Snapshots
-You can configure a Rancher Kubernetes Engine (RKE) cluster to automatically create backups of etcd. In a disaster scenario, you can restore these backups, which are stored on other cluster nodes.
+You can configure a Rancher Kubernetes Engine (RKE) cluster to automatically take snapshots of etcd. In a disaster scenario, you can restore these snapshots, which are stored on other cluster nodes.
-### Etcd Rolling-Backup
+### One-Time Snapshots
-To schedule a recurring automatic etcd snapshot save, enable the `etcd-backup` service. `etcd-backup` runs in a service container alongside the `etcd` container. `etcd-backup` automatically creates a snapshot of etcd and stores them to its local disk.
+RKE introduces a new command that can take a snapshot of a running etcd node in an RKE cluster. The snapshot is automatically saved in `/opt/rke/etcd-snapshots`. The command works as follows:
+```
+./rke etcd snapshot-save --config cluster.yml

-To enable `etcd-backup` in RKE CLI, configure the following three variables:

+WARN[0000] Name of the snapshot is not specified using [rke_etcd_snapshot_2018-05-17T23:32:08+02:00]
+INFO[0000] Starting saving snapshot on etcd hosts
+INFO[0000] [dialer] Setup tunnel for host [x.x.x.x]
+INFO[0001] [dialer] Setup tunnel for host [y.y.y.y]
+INFO[0002] [dialer] Setup tunnel for host [z.z.z.z]
+INFO[0003] [etcd] Saving snapshot [rke_etcd_snapshot_2018-05-17T23:32:08+02:00] on host [x.x.x.x]
+INFO[0004] [etcd] Successfully started [etcd-snapshot-once] container on host [x.x.x.x]
+INFO[0004] [etcd] Saving snapshot [rke_etcd_snapshot_2018-05-17T23:32:08+02:00] on host [y.y.y.y]
+INFO[0005] [etcd] Successfully started [etcd-snapshot-once] container on host [y.y.y.y]
+INFO[0005] [etcd] Saving snapshot [rke_etcd_snapshot_2018-05-17T23:32:08+02:00] on host [z.z.z.z]
+INFO[0006] [etcd] Successfully started [etcd-snapshot-once] container on host [z.z.z.z]
+INFO[0006] Finished saving snapshot [rke_etcd_snapshot_2018-05-17T23:32:08+02:00] on all etcd hosts
+```
+
+The command saves a snapshot of etcd from each etcd node defined in the cluster config file and stores it in `/opt/rke/etcd-snapshots`. This command also creates a container for taking the snapshot. When the process completes, the container is automatically removed.
+
+### Etcd Recurring Snapshots
+
+To schedule a recurring automatic etcd snapshot save, enable the `etcd-snapshot` service. `etcd-snapshot` runs in a service container alongside the `etcd` container. `etcd-snapshot` automatically takes snapshots of etcd and stores them on its local disk in `/opt/rke/etcd-snapshots`.
+
+To enable `etcd-snapshot` in the RKE CLI, configure the following three variables:
 ```
 services:
   etcd:
-    backup: true
+    snapshot: true
     creation: 5m0s
     retention: 24h
 ```
-- `backup`: Enables/disables etcd backups in the RKE cluster.
+- `snapshot`: Enables/disables the recurring etcd snapshot service in the RKE cluster.
 Default value: `false`.
-- `creation`: Time period in which `etcd-backup` creates and stores local backups.
+- `creation`: Time period in which `etcd-snapshot` takes and stores snapshots.
 Default value: `5m0s`
-- `retention`: Time period before before an etcd backup expires. Expired backups are purged.
+- `retention`: Time period before an etcd snapshot expires. Expired snapshots are purged.
 Default value: `24h`
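+
+For example, to take a snapshot every 30 minutes and keep each snapshot for 72 hours, you could use values like the following (illustrative values only; they follow the same duration format as the defaults above):
+```
+services:
+  etcd:
+    snapshot: true
+    creation: 30m0s
+    retention: 72h
+```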
-After RKE runs, view the `etcd-backup` logs to confirm backups are being created automatically:
+After RKE runs, view the logs of the `etcd-rolling-snapshots` container to confirm snapshots are being created automatically:
 ```
-# docker logs etcd-backup
+# docker logs etcd-rolling-snapshots
+
 time="2018-05-04T18:39:16Z" level=info msg="Initializing Rolling Backups" creation=1m0s retention=24h0m0s
 time="2018-05-04T18:40:16Z" level=info msg="Created backup" name="2018-05-04T18:40:16Z_etcd" runtime=108.332814ms
 time="2018-05-04T18:41:16Z" level=info msg="Created backup" name="2018-05-04T18:41:16Z_etcd" runtime=92.880112ms
 time="2018-05-04T18:42:16Z" level=info msg="Created backup" name="2018-05-04T18:42:16Z_etcd" runtime=83.67642ms
 time="2018-05-04T18:43:16Z" level=info msg="Created backup" name="2018-05-04T18:43:16Z_etcd" runtime=86.298499ms
 ```
-Backups are saved to the following directory: `/opt/rke/etcdbackup/`. Backups are created on each node that runs etcd.
+Snapshots are saved to the following directory: `/opt/rke/etcd-snapshots/`. Snapshots are created on each node that runs etcd.
-### Etcd onetime Snapshots
-
-RKE also added two commands that for etcd v3 snapshot management:
-```
-./rke etcd snapshot-save --name NAME
-```
-and
-```
-./rke etcd snapshot-restore --name NAME
-```
-
-The backup command saves a snapshot of etcd from each etcd nodes in the cluster config file and will save it in `/opt/rke/etcdbackup`. This command also creates a container for the backup. When the backup completes, the container is removed.
-
-```
-# ./rke etcd snapshot-save --name snapshot --config cluster.yml
-
-INFO[0000] Starting Backup on etcd hosts
-INFO[0000] [dialer] Setup tunnel for host [x.x.x.x]
-INFO[0002] [dialer] Setup tunnel for host [y.y.y.y]
-INFO[0004] [dialer] Setup tunnel for host [z.z.z.z]
-INFO[0006] [etcd] Starting backup on host [x.x.x.x]
-INFO[0007] [etcd] Successfully started [etcd-backup-once] container on host [x.x.x.x]
-INFO[0007] [etcd] Starting backup on host [y.y.y.y]
-INFO[0009] [etcd] Successfully started [etcd-backup-once] container on host [y.y.y.y]
-INFO[0010] [etcd] Starting backup on host [z.z.z.z]
-INFO[0011] [etcd] Successfully started [etcd-backup-once] container on host [z.z.z.z]
-INFO[0011] Finished backup on all etcd hosts
-```

 ### Etcd Disaster recovery

-`etcd snapshot-restore` is used for etcd Disaster recovery, it reverts to any snapshot stored in `/opt/rke/etcdbackup` that you explicitly define. When you run `etcd snapshot-restore`, RKE removes the old etcd container if it still exists. To restore operations, RKE creates a new etcd cluster using the snapshot you choose.
+`etcd snapshot-restore` is used for etcd disaster recovery; it reverts the cluster to any snapshot stored in `/opt/rke/etcd-snapshots` that you explicitly define. When you run `etcd snapshot-restore`, RKE removes the old etcd container if it still exists. To restore operations, RKE creates a new etcd cluster using the snapshot you choose.

 >**Warning:** Restoring an etcd snapshot deletes your current etcd cluster and replaces it with a new one. Before you run the `etcd snapshot-restore` command, backup any important data in your current cluster.

@@ -530,6 +525,112 @@ INFO[0027] [etcd] Successfully started etcd plane..
 INFO[0027] Finished restoring on all etcd hosts
 ```
+## Example
+
+In this example, we assume that you started RKE on two nodes:
+
+| Name | IP | Role |
+|:-----:|:--------:|:----------------------:|
+| node1 | 10.0.0.1 | [controlplane, worker] |
+| node2 | 10.0.0.2 | [etcd] |
+
+### 1. Setting up the RKE cluster
+A minimal cluster configuration file for running k8s on these nodes should look something like the following:
+
+```
+nodes:
+  - address: 10.0.0.1
+    hostname_override: node1
+    user: ubuntu
+    role: [controlplane,worker]
+  - address: 10.0.0.2
+    hostname_override: node2
+    user: ubuntu
+    role: [etcd]
+```
+
+After running `rke up`, you should have a two-node cluster. The next step is to run a few pods on node1:
+
+```
+kubectl --kubeconfig=kube_config_cluster.yml run nginx --image=nginx --replicas=3
+```
+
+### 2. Back up the etcd cluster
+
+Now let's take a snapshot using RKE:
+
+```
+rke etcd snapshot-save --name snapshot.db --config cluster.yml
+```
+
+![etcd snapshot](img/rke-etcd-backup.png)
+
+### 3. Store the snapshot externally
+
+After taking the etcd snapshot on node2, save it in a persistent place. One option is to copy the snapshot to an S3 bucket or to tape backup, for example:
+
+```
+root@node2:~# s3cmd mb s3://rke-etcd-backup
+root@node2:~# s3cmd put /opt/rke/etcd-snapshots/snapshot.db s3://rke-etcd-backup/
+```
+
+### 4. Pull the snapshot onto a new node
+
+To simulate the failure, let's power down node2 completely:
+
+```
+root@node2:~# poweroff
+```
+
+Now it's time to pull the snapshot saved on S3 onto a new node:
+
+| Name | IP | Role |
+|:-----:|:--------:|:----------------------:|
+| node1 | 10.0.0.1 | [controlplane, worker] |
+| ~~node2~~ | ~~10.0.0.2~~ | ~~[etcd]~~ |
+| node3 | 10.0.0.3 | [etcd] |
+
+```
+root@node3:~# mkdir -p /opt/rke/etcd-snapshots
+root@node3:~# s3cmd get s3://rke-etcd-backup/snapshot.db /opt/rke/etcd-snapshots/snapshot.db
+```
+
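+Before restoring, it is worth confirming that the snapshot file is in place on node3 (the path below assumes the default snapshot directory used in the previous step):
+
+```
+root@node3:~# ls -l /opt/rke/etcd-snapshots/snapshot.db
+```
+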
+### 5. Restore etcd on the new node
+
+Now let's restore the snapshot and run etcd on the third node. To do that, first add the third node to the cluster configuration file:
+```
+nodes:
+  - address: 10.0.0.1
+    hostname_override: node1
+    user: ubuntu
+    role: [controlplane,worker]
+#  - address: 10.0.0.2
+#    hostname_override: node2
+#    user: ubuntu
+#    role: [etcd]
+  - address: 10.0.0.3
+    hostname_override: node3
+    user: ubuntu
+    role: [etcd]
+```
+and then run `rke etcd snapshot-restore`:
+```
+rke etcd snapshot-restore --name snapshot.db --config cluster.yml
+```
+
+The previous command restores the etcd data directory from the snapshot and runs the etcd container on this node. The final step is to restore operations on the cluster by pointing the Kubernetes API at the new etcd; to do that, run `rke up` again with the updated cluster.yml:
+```
+rke up --config cluster.yml
+```
+You can verify that operations have been restored by checking the nginx deployment created earlier:
+```
+> kubectl get pods
+NAME                     READY     STATUS    RESTARTS   AGE
+nginx-65899c769f-kcdpr   1/1       Running   0          17s
+nginx-65899c769f-pc45c   1/1       Running   0          17s
+nginx-65899c769f-qkhml   1/1       Running   0          17s
+```
+
 ## License

 Copyright (c) 2018 [Rancher Labs, Inc.](http://rancher.com)
diff --git a/cluster/cluster.go b/cluster/cluster.go
index 79aad7fd..1813928f 100644
--- a/cluster/cluster.go
+++ b/cluster/cluster.go
@@ -77,12 +77,12 @@ func (c *Cluster) DeployControlPlane(ctx context.Context) error {
 	if len(c.Services.Etcd.ExternalURLs) > 0 {
 		log.Infof(ctx, "[etcd] External etcd connection string has been specified, skipping etcd plane")
 	} else {
-		etcdBackup := services.EtcdBackup{
-			Backup: c.Services.Etcd.Backup,
+		etcdRollingSnapshot := services.EtcdSnapshot{
+			Snapshot: c.Services.Etcd.Snapshot,
 			Creation: c.Services.Etcd.Creation,
 			Retention: c.Services.Etcd.Retention,
 		}
-		if err := services.RunEtcdPlane(ctx, c.EtcdHosts, etcdNodePlanMap, c.LocalConnDialerFactory, c.PrivateRegistriesMap, c.UpdateWorkersOnly, c.SystemImages.Alpine, etcdBackup); err != nil {
+		if err := services.RunEtcdPlane(ctx, c.EtcdHosts, etcdNodePlanMap, c.LocalConnDialerFactory, c.PrivateRegistriesMap, c.UpdateWorkersOnly, c.SystemImages.Alpine, etcdRollingSnapshot); err != nil {
 			return fmt.Errorf("[etcd] Failed to bring up Etcd Plane: %v", err)
 		}
 	}
diff --git a/cluster/etcd.go b/cluster/etcd.go
index 93cdeaa2..a955e743 100644
--- a/cluster/etcd.go
+++ b/cluster/etcd.go
@@ -11,16 +11,16 @@ import (
 	"github.com/rancher/types/apis/management.cattle.io/v3"
 )
-func (c *Cluster) BackupEtcd(ctx context.Context, backupName string) error {
+func (c *Cluster) SnapshotEtcd(ctx context.Context, snapshotName string) error {
 	for _, host := range c.EtcdHosts {
-		if err := services.RunEtcdBackup(ctx, host, c.PrivateRegistriesMap, c.SystemImages.Alpine, c.Services.Etcd.Creation, c.Services.Etcd.Retention, backupName, true); err != nil {
+		if err := services.RunEtcdSnapshotSave(ctx, host, c.PrivateRegistriesMap, c.SystemImages.Alpine, c.Services.Etcd.Creation, c.Services.Etcd.Retention, snapshotName, true); err != nil {
 			return err
 		}
 	}
 	return nil
 }
-func (c *Cluster) RestoreEtcdBackup(ctx context.Context, backupPath string) error {
+func (c *Cluster) RestoreEtcdSnapshot(ctx context.Context, snapshotPath string) error {
 	// Stopping all etcd containers
 	for _, host := range c.EtcdHosts {
 		if err := tearDownOldEtcd(ctx, host, c.SystemImages.Alpine, c.PrivateRegistriesMap); err != nil {
@@ -30,8 +30,8 @@ func (c *Cluster) RestoreEtcdBackup(ctx context.Context, backupPath string) 
erro // Start restore process on all etcd hosts initCluster := services.GetEtcdInitialCluster(c.EtcdHosts) for _, host := range c.EtcdHosts { - if err := services.RestoreEtcdBackup(ctx, host, c.PrivateRegistriesMap, c.SystemImages.Etcd, backupPath, initCluster); err != nil { - return fmt.Errorf("[etcd] Failed to restore etcd backup: %v", err) + if err := services.RestoreEtcdSnapshot(ctx, host, c.PrivateRegistriesMap, c.SystemImages.Etcd, snapshotPath, initCluster); err != nil { + return fmt.Errorf("[etcd] Failed to restore etcd snapshot: %v", err) } } // Deploy Etcd Plane @@ -40,12 +40,12 @@ func (c *Cluster) RestoreEtcdBackup(ctx context.Context, backupPath string) erro for _, etcdHost := range c.EtcdHosts { etcdNodePlanMap[etcdHost.Address] = BuildRKEConfigNodePlan(ctx, c, etcdHost, etcdHost.DockerInfo) } - etcdBackup := services.EtcdBackup{ - Backup: c.Services.Etcd.Backup, + etcdRollingSnapshots := services.EtcdSnapshot{ + Snapshot: c.Services.Etcd.Snapshot, Creation: c.Services.Etcd.Creation, Retention: c.Services.Etcd.Retention, } - if err := services.RunEtcdPlane(ctx, c.EtcdHosts, etcdNodePlanMap, c.LocalConnDialerFactory, c.PrivateRegistriesMap, c.UpdateWorkersOnly, c.SystemImages.Alpine, etcdBackup); err != nil { + if err := services.RunEtcdPlane(ctx, c.EtcdHosts, etcdNodePlanMap, c.LocalConnDialerFactory, c.PrivateRegistriesMap, c.UpdateWorkersOnly, c.SystemImages.Alpine, etcdRollingSnapshots); err != nil { return fmt.Errorf("[etcd] Failed to bring up Etcd Plane: %v", err) } return nil diff --git a/cmd/etcd.go b/cmd/etcd.go index a8f6b05e..efd750c5 100644 --- a/cmd/etcd.go +++ b/cmd/etcd.go @@ -3,20 +3,22 @@ package cmd import ( "context" "fmt" + "time" "github.com/rancher/rke/cluster" "github.com/rancher/rke/hosts" "github.com/rancher/rke/log" "github.com/rancher/rke/pki" "github.com/rancher/types/apis/management.cattle.io/v3" + "github.com/sirupsen/logrus" "github.com/urfave/cli" ) func EtcdCommand() cli.Command { - backupRestoreFlags := []cli.Flag{ + snapshotFlags := []cli.Flag{ cli.StringFlag{ Name: "name", - Usage: "Specify Backup name", + Usage: "Specify Snapshot name", }, cli.StringFlag{ Name: "config", @@ -26,33 +28,33 @@ func EtcdCommand() cli.Command { }, } - backupRestoreFlags = append(backupRestoreFlags, commonFlags...) + snapshotFlags = append(snapshotFlags, commonFlags...) 
return cli.Command{ Name: "etcd", - Usage: "etcd backup/restore operations in k8s cluster", + Usage: "etcd snapshot save/restore operations in k8s cluster", Subcommands: []cli.Command{ { Name: "snapshot-save", Usage: "Take snapshot on all etcd hosts", - Flags: backupRestoreFlags, - Action: BackupEtcdHostsFromCli, + Flags: snapshotFlags, + Action: SnapshotSaveEtcdHostsFromCli, }, { Name: "snapshot-restore", Usage: "Restore existing snapshot", - Flags: backupRestoreFlags, - Action: RestoreEtcdBackupFromCli, + Flags: snapshotFlags, + Action: RestoreEtcdSnapshotFromCli, }, }, } } -func BackupEtcdHosts( +func SnapshotSaveEtcdHosts( ctx context.Context, rkeConfig *v3.RancherKubernetesEngineConfig, dockerDialerFactory hosts.DialerFactory, - configDir, backupName string) error { + configDir, snapshotName string) error { log.Infof(ctx, "Starting saving snapshot on etcd hosts") kubeCluster, err := cluster.ParseCluster(ctx, rkeConfig, clusterFilePath, configDir, dockerDialerFactory, nil, nil) @@ -63,19 +65,19 @@ func BackupEtcdHosts( if err := kubeCluster.TunnelHosts(ctx, false); err != nil { return err } - if err := kubeCluster.BackupEtcd(ctx, backupName); err != nil { + if err := kubeCluster.SnapshotEtcd(ctx, snapshotName); err != nil { return err } - log.Infof(ctx, "Finished saving snapshot on all etcd hosts") + log.Infof(ctx, "Finished saving snapshot [%s] on all etcd hosts", snapshotName) return nil } -func RestoreEtcdBackup( +func RestoreEtcdSnapshot( ctx context.Context, rkeConfig *v3.RancherKubernetesEngineConfig, dockerDialerFactory hosts.DialerFactory, - configDir, backupName string) error { + configDir, snapshotName string) error { log.Infof(ctx, "Starting restoring snapshot on etcd hosts") kubeCluster, err := cluster.ParseCluster(ctx, rkeConfig, clusterFilePath, configDir, dockerDialerFactory, nil, nil) @@ -86,15 +88,15 @@ func RestoreEtcdBackup( if err := kubeCluster.TunnelHosts(ctx, false); err != nil { return err } - if err := kubeCluster.RestoreEtcdBackup(ctx, backupName); err != nil { + if err := kubeCluster.RestoreEtcdSnapshot(ctx, snapshotName); err != nil { return err } - log.Infof(ctx, "Finished restoring snapshot on all etcd hosts") + log.Infof(ctx, "Finished restoring snapshot [%s] on all etcd hosts", snapshotName) return nil } -func BackupEtcdHostsFromCli(ctx *cli.Context) error { +func SnapshotSaveEtcdHostsFromCli(ctx *cli.Context) error { clusterFile, filePath, err := resolveClusterFile(ctx) if err != nil { return fmt.Errorf("Failed to resolve cluster file: %v", err) @@ -110,11 +112,16 @@ func BackupEtcdHostsFromCli(ctx *cli.Context) error { if err != nil { return err } - - return BackupEtcdHosts(context.Background(), rkeConfig, nil, "", ctx.String("name")) + // Check snapshot name + etcdSnapshotName := ctx.String("name") + if etcdSnapshotName == "" { + etcdSnapshotName = fmt.Sprintf("rke_etcd_snapshot_%s", time.Now().Format(time.RFC3339)) + logrus.Warnf("Name of the snapshot is not specified using [%s]", etcdSnapshotName) + } + return SnapshotSaveEtcdHosts(context.Background(), rkeConfig, nil, "", etcdSnapshotName) } -func RestoreEtcdBackupFromCli(ctx *cli.Context) error { +func RestoreEtcdSnapshotFromCli(ctx *cli.Context) error { clusterFile, filePath, err := resolveClusterFile(ctx) if err != nil { return fmt.Errorf("Failed to resolve cluster file: %v", err) @@ -130,7 +137,10 @@ func RestoreEtcdBackupFromCli(ctx *cli.Context) error { if err != nil { return err } - - return RestoreEtcdBackup(context.Background(), rkeConfig, nil, "", ctx.String("name")) + etcdSnapshotName := 
ctx.String("name") + if etcdSnapshotName == "" { + return fmt.Errorf("You must specify the snapshot name to restore") + } + return RestoreEtcdSnapshot(context.Background(), rkeConfig, nil, "", etcdSnapshotName) } diff --git a/services/etcd.go b/services/etcd.go index 3597f0c3..7ffc7306 100644 --- a/services/etcd.go +++ b/services/etcd.go @@ -21,17 +21,17 @@ import ( ) const ( - EtcdBackupPath = "/opt/rke/etcdbackup/" - EtcdRestorePath = "/opt/rke/etcdrestore/" - EtcdDataDir = "/var/lib/rancher/etcd/" + EtcdSnapshotPath = "/opt/rke/etcd-snapshots" + EtcdRestorePath = "/opt/rke/etcd-snapshots-restore/" + EtcdDataDir = "/var/lib/rancher/etcd/" ) -type EtcdBackup struct { - // Enable or disable backup creation - Backup bool - // Creation period of the etcd backups +type EtcdSnapshot struct { + // Enable or disable snapshot creation + Snapshot bool + // Creation period of the etcd snapshots Creation string - // Retention period of the etcd backups + // Retention period of the etcd snapshots Retention string } @@ -43,7 +43,7 @@ func RunEtcdPlane( prsMap map[string]v3.PrivateRegistry, updateWorkersOnly bool, alpineImage string, - etcdBackup EtcdBackup) error { + etcdSnapshot EtcdSnapshot) error { log.Infof(ctx, "[%s] Building up etcd plane..", ETCDRole) for _, host := range etcdHosts { if updateWorkersOnly { @@ -54,8 +54,8 @@ func RunEtcdPlane( if err := docker.DoRunContainer(ctx, host.DClient, imageCfg, hostCfg, EtcdContainerName, host.Address, ETCDRole, prsMap); err != nil { return err } - if etcdBackup.Backup { - if err := RunEtcdBackup(ctx, host, prsMap, alpineImage, etcdBackup.Creation, etcdBackup.Retention, EtcdBackupContainerName, false); err != nil { + if etcdSnapshot.Snapshot { + if err := RunEtcdSnapshotSave(ctx, host, prsMap, alpineImage, etcdSnapshot.Creation, etcdSnapshot.Retention, EtcdSnapshotContainerName, false); err != nil { return err } } @@ -219,8 +219,8 @@ func IsEtcdMember(ctx context.Context, etcdHost *hosts.Host, etcdHosts []*hosts. 
return false, nil } -func RunEtcdBackup(ctx context.Context, etcdHost *hosts.Host, prsMap map[string]v3.PrivateRegistry, etcdBackupImage string, creation, retention, name string, once bool) error { - log.Infof(ctx, "[etcd] Starting backup on host [%s]", etcdHost.Address) +func RunEtcdSnapshotSave(ctx context.Context, etcdHost *hosts.Host, prsMap map[string]v3.PrivateRegistry, etcdSnapshotImage string, creation, retention, name string, once bool) error { + log.Infof(ctx, "[etcd] Saving snapshot [%s] on host [%s]", name, etcdHost.Address) imageCfg := &container.Config{ Cmd: []string{ "/opt/rke/rke-etcd-backup", @@ -231,7 +231,7 @@ func RunEtcdBackup(ctx context.Context, etcdHost *hosts.Host, prsMap map[string] "--name", name, "--endpoints=" + etcdHost.InternalAddress + ":2379", }, - Image: etcdBackupImage, + Image: etcdSnapshotImage, } if once { imageCfg.Cmd = append(imageCfg.Cmd, "--once") @@ -242,28 +242,28 @@ func RunEtcdBackup(ctx context.Context, etcdHost *hosts.Host, prsMap map[string] } hostCfg := &container.HostConfig{ Binds: []string{ - fmt.Sprintf("%s:/backup", EtcdBackupPath), + fmt.Sprintf("%s:/backup", EtcdSnapshotPath), fmt.Sprintf("%s:/etc/kubernetes:z", path.Join(etcdHost.PrefixPath, "/etc/kubernetes"))}, NetworkMode: container.NetworkMode("host"), } if once { - if err := docker.DoRunContainer(ctx, etcdHost.DClient, imageCfg, hostCfg, EtcdBackupOnceContainerName, etcdHost.Address, ETCDRole, prsMap); err != nil { + if err := docker.DoRunContainer(ctx, etcdHost.DClient, imageCfg, hostCfg, EtcdSnapshotOnceContainerName, etcdHost.Address, ETCDRole, prsMap); err != nil { return err } - status, err := docker.WaitForContainer(ctx, etcdHost.DClient, etcdHost.Address, EtcdBackupOnceContainerName) + status, err := docker.WaitForContainer(ctx, etcdHost.DClient, etcdHost.Address, EtcdSnapshotOnceContainerName) if status != 0 || err != nil { - return fmt.Errorf("Failed to take etcd backup exit code [%d]: %v", status, err) + return fmt.Errorf("Failed to take etcd snapshot exit code [%d]: %v", status, err) } - return docker.RemoveContainer(ctx, etcdHost.DClient, etcdHost.Address, EtcdBackupOnceContainerName) + return docker.RemoveContainer(ctx, etcdHost.DClient, etcdHost.Address, EtcdSnapshotOnceContainerName) } - return docker.DoRunContainer(ctx, etcdHost.DClient, imageCfg, hostCfg, EtcdBackupContainerName, etcdHost.Address, ETCDRole, prsMap) + return docker.DoRunContainer(ctx, etcdHost.DClient, imageCfg, hostCfg, EtcdSnapshotContainerName, etcdHost.Address, ETCDRole, prsMap) } -func RestoreEtcdBackup(ctx context.Context, etcdHost *hosts.Host, prsMap map[string]v3.PrivateRegistry, etcdRestoreImage, backupName, initCluster string) error { - log.Infof(ctx, "[etcd] Restoring [%s] snapshot on etcd host [%s]", backupName, etcdHost.Address) +func RestoreEtcdSnapshot(ctx context.Context, etcdHost *hosts.Host, prsMap map[string]v3.PrivateRegistry, etcdRestoreImage, snapshotName, initCluster string) error { + log.Infof(ctx, "[etcd] Restoring [%s] snapshot on etcd host [%s]", snapshotName, etcdHost.Address) nodeName := pki.GetEtcdCrtName(etcdHost.InternalAddress) - backupPath := filepath.Join(EtcdBackupPath, backupName) + snapshotPath := filepath.Join(EtcdSnapshotPath, snapshotName) imageCfg := &container.Config{ Cmd: []string{ @@ -273,7 +273,7 @@ func RestoreEtcdBackup(ctx context.Context, etcdHost *hosts.Host, prsMap map[str "--cacert", pki.GetCertPath(pki.CACertName), "--cert", pki.GetCertPath(nodeName), "--key", pki.GetKeyPath(nodeName), - "snapshot", "restore", backupPath, + "snapshot", 
"restore", snapshotPath, "--data-dir=" + EtcdRestorePath, "--name=etcd-" + etcdHost.HostnameOverride, "--initial-cluster=" + initCluster, diff --git a/services/services.go b/services/services.go index e0a7f4f3..e1e2b91e 100644 --- a/services/services.go +++ b/services/services.go @@ -21,19 +21,19 @@ const ( SidekickServiceName = "sidekick" RBACAuthorizationMode = "rbac" - KubeAPIContainerName = "kube-apiserver" - KubeletContainerName = "kubelet" - KubeproxyContainerName = "kube-proxy" - KubeControllerContainerName = "kube-controller-manager" - SchedulerContainerName = "kube-scheduler" - EtcdContainerName = "etcd" - EtcdBackupContainerName = "etcd-backup" - EtcdBackupOnceContainerName = "etcd-backup-once" - EtcdRestoreContainerName = "etcd-restore" - NginxProxyContainerName = "nginx-proxy" - SidekickContainerName = "service-sidekick" - LogLinkContainerName = "rke-log-linker" - LogCleanerContainerName = "rke-log-cleaner" + KubeAPIContainerName = "kube-apiserver" + KubeletContainerName = "kubelet" + KubeproxyContainerName = "kube-proxy" + KubeControllerContainerName = "kube-controller-manager" + SchedulerContainerName = "kube-scheduler" + EtcdContainerName = "etcd" + EtcdSnapshotContainerName = "etcd-rolling-snapshots" + EtcdSnapshotOnceContainerName = "etcd-snapshot-once" + EtcdRestoreContainerName = "etcd-restore" + NginxProxyContainerName = "nginx-proxy" + SidekickContainerName = "service-sidekick" + LogLinkContainerName = "rke-log-linker" + LogCleanerContainerName = "rke-log-cleaner" KubeAPIPort = 6443 SchedulerPort = 10251 diff --git a/vendor.conf b/vendor.conf index 770396bb..46451251 100644 --- a/vendor.conf +++ b/vendor.conf @@ -25,4 +25,4 @@ github.com/ugorji/go/codec ccfe18359b55b97855cee1d3f74e5efbda4869d github.com/Microsoft/go-winio ab35fc04b6365e8fcb18e6e9e41ea4a02b10b175 github.com/rancher/norman ff60298f31f081b06d198815b4c178a578664f7d -github.com/rancher/types d289637bccd1ac6a8eaa46556733890a5f204fbc +github.com/rancher/types f08dc626d420185972a3bcf11504b81d7f2d37e0 diff --git a/vendor/github.com/rancher/types/apis/management.cattle.io/v3/rke_types.go b/vendor/github.com/rancher/types/apis/management.cattle.io/v3/rke_types.go index a3da2aee..185aad51 100644 --- a/vendor/github.com/rancher/types/apis/management.cattle.io/v3/rke_types.go +++ b/vendor/github.com/rancher/types/apis/management.cattle.io/v3/rke_types.go @@ -171,11 +171,11 @@ type ETCDService struct { Key string `yaml:"key" json:"key,omitempty"` // External etcd prefix Path string `yaml:"path" json:"path,omitempty"` - // Etcd Backup Service - Backup bool `yaml:"backup" json:"backup,omitempty"` - // Etcd Backup Retention period + // Etcd Recurring snapshot Service + Snapshot bool `yaml:"snapshot" json:"snapshot,omitempty"` + // Etcd snapshot Retention period Retention string `yaml:"retention" json:"retention,omitempty"` - // Etcd Backup Creation period + // Etcd snapshot Creation period Creation string `yaml:"creation" json:"creation,omitempty"` }