|
1 | | -:title: Backing up Data |
| 1 | +:title: Backing Up and Restoring Data |
2 | 2 | :description: Backing up stateful data on Deis. |
3 | 3 |
|
4 | 4 | .. _backing_up_data: |
5 | 5 |
|
6 | | -Backing up Data |
7 | | -======================== |
| 6 | +Backing Up and Restoring Data |
| 7 | +============================= |
8 | 8 |
|
9 | 9 | While applications deployed on Deis follow the Twelve-Factor methodology and are thus stateless, |
10 | | -Deis maintains platform state in two places: the :ref:`Store` component, and in etcd. |
| 10 | +Deis maintains platform state in the :ref:`Store` component. |
11 | 11 |
|
12 | | -Store component |
13 | | ---------------- |
14 | | -The store component runs `Ceph`_, and is used by the :ref:`Database` and :ref:`Registry` components |
15 | | -as a data store. This enables the components themselves to freely move around the cluster while |
16 | | -their state is backed by store. |
| 12 | +The store component runs `Ceph`_, and is used by the :ref:`Database`, :ref:`Registry`, |
| 13 | +:ref:`Controller`, and :ref:`Logger` components as a data store. The database and registry |
| 14 | +use store-gateway, while the controller and logger use store-volume. Because their state lives |
| 15 | +in the store component, these components can move freely around the cluster. |
17 | 16 |
|
18 | 17 | The store component is configured to still operate in a degraded state, and will automatically |
19 | 18 | recover should a host fail and then rejoin the cluster. Total data loss of Ceph is only possible |
20 | | -if all of the store containers are removed. However, backup of Ceph is fairly straightforward. |
| 19 | +if all of the store containers are removed. However, backup of Ceph is fairly straightforward, and |
| 20 | +is recommended before :ref:`Upgrading Deis <upgrading-deis>`. |
21 | 21 |
|
22 | | -Data in Ceph is stored on the filesystem in ``/var/lib/ceph``, and metadata information is stored |
23 | | -within Ceph. Ceph provides the ability to take snapshots of storage pools with the `rados`_ command. |
| 22 | +Data stored in Ceph is accessible in two places: on the CoreOS filesystem at ``/var/lib/deis/store`` |
| 23 | +and in the store-gateway component. Backing up this data is straightforward - we can simply tarball |
| 24 | +the filesystem data, and use any S3-compatible blob store tool to download all files in the |
| 25 | +store-gateway component. |
24 | 26 |
|
25 | | -Using pg_dump |
26 | | -------------- |
27 | | -Since the database component runs PostgreSQL, ``pg_dumpall`` can also be used to generate a text |
28 | | -dump of the database. |
| 27 | +Setup |
| 28 | +----- |
| 29 | + |
| 30 | +The ``deis-store-gateway`` component exposes an S3-compatible API, so we can use a tool like `s3cmd`_ |
| 31 | +to work with the object store. First, install our fork of s3cmd with a patch for Ceph support: |
| 32 | + |
| 33 | +.. code-block:: console |
| 34 | +
|
| 35 | + $ pip install git+https://github.com/deis/s3cmd |
| 36 | +
|
| 37 | +We'll need the generated access key and secret key for use with the gateway. We can get these using |
| 38 | +``deisctl``, either on one of the cluster machines or on a remote machine with ``DEISCTL_TUNNEL`` set: |
| 39 | + |
| 40 | +.. code-block:: console |
| 41 | +
|
| 42 | + $ deisctl config store get gateway/accessKey |
| 43 | + $ deisctl config store get gateway/secretKey |
| 44 | +
|
| 45 | +Back on the local machine, run ``s3cmd --configure`` and enter your access key and secret key. |
| 46 | +Other settings can be left at the defaults. If the configure script prompts you to test the credentials, |
| 47 | +skip that step - it will try to authenticate against Amazon S3 and fail. |
| 48 | + |
| 49 | +You'll need to change a few additional configuration settings. First, edit ``~/.s3cfg`` and change |
| 50 | +``host_base`` and ``host_bucket`` to match ``deis-store.<your domain>``. For example, on a local |
| 51 | +Vagrant setup, the lines become: |
| 52 | + |
| 53 | +.. code-block:: console |
| 54 | +
|
| 55 | + host_base = deis-store.local3.deisapp.com |
| 56 | + host_bucket = deis-store.local3.deisapp.com/%(bucket) |
| 57 | +
|
| 58 | +You'll also need to enable ``use_path_mode``: |
| 59 | + |
| 60 | +.. code-block:: console |
| 61 | +
|
| 62 | + use_path_mode = True |
| 63 | +
|
| 64 | +We can now use ``s3cmd`` to back up and restore data from the store-gateway. |
| 65 | + |
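Before running any sync, it can help to confirm that the three settings described above actually made it into the config file. The following is a minimal sketch; ``check_s3cfg`` is a hypothetical helper name, not part of s3cmd or Deis, and it only checks that ``host_base`` and ``host_bucket`` start with ``deis-store.`` and that ``use_path_mode`` is ``True``.

```shell
# Hypothetical helper (not part of s3cmd or Deis): report whether an s3cmd
# config file carries the store-gateway settings described above.
check_s3cfg() {
    cfg="${1:-$HOME/.s3cfg}"
    status=0
    for pattern in '^host_base *= *deis-store\.' \
                   '^host_bucket *= *deis-store\.' \
                   '^use_path_mode *= *True'; do
        # grep -q prints nothing; only the exit status matters here.
        if ! grep -Eq "$pattern" "$cfg"; then
            echo "missing or unexpected setting: $pattern" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling ``check_s3cfg`` with no argument inspects ``~/.s3cfg``; a non-zero exit status means at least one setting still needs editing.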
| 66 | +Backing up |
| 67 | +---------- |
| 68 | + |
| 69 | +Database backups and registry data |
| 70 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 71 | + |
| 72 | +The store-gateway component stores database backups and is used to store data for the registry. |
| 73 | +On our local machine, we can use ``s3cmd sync`` to copy the objects locally: |
| 74 | + |
| 75 | +.. code-block:: console |
| 76 | +
|
| 77 | + $ s3cmd sync s3://db_wal . |
| 78 | + $ s3cmd sync s3://registry . |
| 79 | +
|
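For repeated backups, the two sync commands above could be wrapped in a small function. This is only a sketch: ``backup_buckets`` and the ``S3CMD`` override variable are hypothetical names introduced here, and the function assumes the bucket names ``db_wal`` and ``registry`` shown above.

```shell
# Hypothetical wrapper around the two sync commands above.
# Set S3CMD=echo to preview the commands without contacting the gateway.
backup_buckets() {
    dest="${1:-.}"
    mkdir -p "$dest" || return 1
    for bucket in db_wal registry; do
        # Same invocation as in the text, with an explicit destination directory.
        "${S3CMD:-s3cmd}" sync "s3://$bucket" "$dest/" || return 1
    done
}
```

Running ``backup_buckets`` with no argument syncs into the current directory, which keeps the layout that the restore steps expect. Passing another directory changes where ``basebackups_005``, ``wal_005``, and ``registry`` end up, so the restore commands would need to be run from that directory.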
| 80 | +Log data |
| 81 | +~~~~~~~~ |
| 82 | + |
| 83 | +The store-volume service mounts a filesystem which is used by the controller and logger components |
| 84 | +to store and retrieve application and component logs. |
| 85 | + |
| 86 | +Since this is just a POSIX filesystem, you can simply tarball the contents of ``/var/lib/deis/store`` |
| 87 | +on one of the CoreOS machines and rsync the archive to a local machine: |
| 88 | + |
| 89 | +.. code-block:: console |
| 90 | +
|
| 91 | + $ ssh core@<hostname> 'cd /var/lib/deis/store && sudo tar cpzf ~/store_file_backup.tar.gz .' |
| 92 | + tar: /var/lib/deis/store/logs/deis-registry.log: file changed as we read it |
| 93 | + $ rsync -avhe ssh core@<hostname>:~/store_file_backup.tar.gz . |
| 94 | +
|
| 95 | +Note that you'll need to specify the SSH port when using Vagrant: |
| 96 | + |
| 97 | +.. code-block:: console |
| 98 | +
|
| 99 | + $ rsync -avhe 'ssh -p 2222' core@127.0.0.1:~/store_file_backup.tar.gz . |
| 100 | +
|
| 101 | +Note the warning from ``tar`` in the first example above - in a running cluster the log files are |
| 102 | +constantly being written to, so the archive preserves a specific moment in time. |
| 103 | + |
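Because ``tar`` reported a warning while the archive was being created, it may be worth checking that the copied tarball can be read end to end before relying on it. A minimal sketch, assuming the file name used above; ``verify_archive`` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the gzip-compressed tarball can be
# listed from start to finish.
verify_archive() {
    archive="${1:-store_file_backup.tar.gz}"
    if tar -tzf "$archive" > /dev/null 2>&1; then
        echo "ok: $archive is readable"
    else
        echo "error: $archive is missing or corrupt" >&2
        return 1
    fi
}
```

This only confirms that the archive is structurally intact. It does not show whether a log file changed while it was being read.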
| 104 | +Database data |
| 105 | +~~~~~~~~~~~~~ |
| 106 | + |
| 107 | +While backing up the Ceph data is sufficient (as database ships backups and WAL logs to store), |
| 108 | +we can also back up the PostgreSQL data using ``pg_dumpall`` so we have a text dump of the database. |
| 109 | + |
| 110 | +We can identify the machine running database with ``deisctl list``, and from that machine: |
| 111 | + |
| 112 | +.. code-block:: console |
| 113 | +
|
| 114 | + core@deis-1 ~ $ docker exec deis-database sudo -u postgres pg_dumpall > dump_all.sql |
| 116 | +
|
| 117 | +Restoring |
| 118 | +--------- |
| 119 | + |
| 120 | +.. note:: |
| 121 | + |
| 122 | + Restoring data is only necessary when deploying a new cluster. Most users will use the normal |
| 123 | + in-place upgrade workflow which does not require a restore. |
| 124 | + |
| 125 | +We want to restore the data on a new cluster before the rest of the Deis components come up and |
| 126 | +initialize. So, we will install the whole platform, but only start the store components: |
| 127 | + |
| 128 | +.. code-block:: console |
| 129 | +
|
| 130 | + $ deisctl install platform |
| 131 | + $ deisctl start store-monitor |
| 132 | + $ deisctl start store-daemon |
| 133 | + $ deisctl start store-metadata |
| 134 | + $ deisctl start store-gateway |
| 135 | + $ deisctl start store-volume |
| 136 | +
|
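The five start commands above can also be expressed as a loop that stops at the first failure. This is a sketch, not an official Deis script: ``start_store`` and the ``DEISCTL`` override variable are hypothetical names, and the component order is simply the order listed above.

```shell
# Hypothetical helper: start the store components in the order shown above,
# stopping as soon as one of them fails to start.
# Set DEISCTL=echo to preview the commands.
start_store() {
    for component in store-monitor store-daemon store-metadata \
                     store-gateway store-volume; do
        "${DEISCTL:-deisctl}" start "$component" || return 1
    done
}
```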
| 137 | +We'll also need to start a router so we can access the gateway: |
| 138 | + |
| 139 | +.. code-block:: console |
| 140 | +
|
| 141 | + $ deisctl start router@1 |
| 142 | +
|
| 143 | +The default maximum body size on the router is too small to support large uploads to the gateway, |
| 144 | +so we need to increase it: |
| 145 | + |
| 146 | +.. code-block:: console |
| 147 | +
|
| 148 | + $ deisctl config router set bodySize=100m |
| 149 | +
|
| 150 | +The new cluster will have generated a new access key and secret key, so we'll need to get those again: |
| 151 | + |
| 152 | +.. code-block:: console |
| 153 | +
|
| 154 | + $ deisctl config store get gateway/accessKey |
| 155 | + $ deisctl config store get gateway/secretKey |
| 156 | +
|
| 157 | +Edit ``~/.s3cfg`` and update the keys. |
| 158 | + |
| 159 | +Now we can restore the data! |
| 160 | + |
| 161 | +Database backups and registry data |
| 162 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 163 | + |
| 164 | +Because neither the database nor the registry has started, the buckets we need to restore to will |
| 165 | +not yet exist. So, we'll need to create those buckets: |
| 166 | + |
| 167 | +.. code-block:: console |
| 168 | +
|
| 169 | + $ s3cmd mb s3://db_wal |
| 170 | + $ s3cmd mb s3://registry |
| 171 | +
|
| 172 | +Now we can restore the data: |
29 | 173 |
|
30 | 174 | .. code-block:: console |
31 | 175 |
|
32 | | - dev $ fleetctl ssh deis-database.service |
33 | | - coreos $ nse deis-database |
34 | | - coreos $ sudo -u postgres pg_dumpall > pg_dump.sql |
| 176 | + $ s3cmd sync basebackups_005 s3://db_wal |
| 177 | + $ s3cmd sync wal_005 s3://db_wal |
| 178 | + $ s3cmd sync registry s3://registry |
| 179 | +
|
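The bucket creation and sync steps above can be combined into one function so that a failed step halts the restore. This is a sketch under the same assumptions as the text: it runs from the directory holding ``basebackups_005``, ``wal_005``, and ``registry``, and the buckets do not exist yet. ``restore_buckets`` and the ``S3CMD`` override variable are hypothetical names.

```shell
# Hypothetical helper: create the buckets, then upload the backed-up
# directories, using the same commands as in the text.
# Set S3CMD=echo to preview the commands.
restore_buckets() {
    s3="${S3CMD:-s3cmd}"
    "$s3" mb s3://db_wal || return 1
    "$s3" mb s3://registry || return 1
    for dir in basebackups_005 wal_005; do
        "$s3" sync "$dir" s3://db_wal || return 1
    done
    "$s3" sync registry s3://registry
}
```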
| 180 | +Log data |
| 181 | +~~~~~~~~ |
| 182 | + |
| 183 | +Once we copy the tarball back to one of the CoreOS machines, we can extract it: |
| 184 | + |
| 185 | +.. code-block:: console |
| 186 | +
|
| 187 | + $ rsync -avhe ssh store_file_backup.tar.gz core@<hostname>:~/store_file_backup.tar.gz |
| 188 | + $ ssh core@<hostname> 'cd /var/lib/deis/store && sudo tar -xzpf ~/store_file_backup.tar.gz --same-owner' |
| 189 | +
|
| 190 | +Note that you'll need to specify the SSH port when using Vagrant: |
| 191 | + |
| 192 | +.. code-block:: console |
| 193 | +
|
| 194 | + $ rsync -avhe 'ssh -p 2222' store_file_backup.tar.gz core@127.0.0.1:~/store_file_backup.tar.gz |
| 195 | +
|
| 196 | +Finishing up |
| 197 | +~~~~~~~~~~~~ |
| 198 | + |
| 199 | +Now that the data is restored, the rest of the cluster should come up normally with a ``deisctl start platform``. |
| 200 | + |
| 201 | +The last task is to instruct the controller to re-write user keys, application data, and domains to etcd. |
| 202 | +Log into the machine which runs deis-controller and run the following. Note that the IP address to |
| 203 | +use in the ``export`` command should correspond to the IP of the host machine which runs this container. |
| 204 | + |
| 205 | +.. code-block:: console |
| 206 | +
|
| 207 | + $ nse deis-controller |
| 208 | + $ cd /app |
| 209 | + $ export ETCD=172.17.8.100:4001 |
| 210 | + $ ./manage.py shell <<EOF |
| 211 | + from api.models import * |
| 212 | + [k.save() for k in Key.objects.all()] |
| 213 | + [a.save() for a in App.objects.all()] |
| 214 | + [d.save() for d in Domain.objects.all()] |
| 215 | + EOF |
| 216 | + $ exit |
| 217 | +
|
| 218 | +.. note:: |
35 | 219 |
|
36 | | -etcd |
37 | | ----- |
38 | | -Service state and fleet scheduling data is stored in etcd. Unfortunately, there is currently no |
39 | | -recommended backup solution for etcd. However, there is a third-party tool called `etcd-dump`_ which |
40 | | -can be used to dump the data stored in etcd. |
| 220 | + The database keeps track of running application containers. Since this is a fresh cluster, it is |
| 221 | + advisable to ``deis scale <proctype>=0`` and then ``deis scale`` back up to the desired number of |
| 222 | + containers for an application. This ensures the database has an accurate view of the cluster. |
41 | 223 |
|
42 | | -Official backup recommendations for etcd are forthcoming. The CoreOS team is tracking etcd update |
43 | | -documentation in `#683`_. |
| 224 | +That's it! The cluster should be fully restored. |
44 | 225 |
|
45 | | -.. _`#683`: https://github.com/coreos/etcd/issues/683 |
46 | | -.. _`etcd-dump`: https://github.com/AaronO/etcd-dump |
47 | 226 | .. _`Ceph`: http://ceph.com |
48 | | -.. _`rados`: http://ceph.com/docs/master/man/8/rados |
| 227 | +.. _`s3cmd`: http://s3tools.org/ |