Commit 5e05482

docs(managing_deis): add store to backup/restore docs
1 parent 816e718 commit 5e05482

2 files changed

Lines changed: 210 additions & 29 deletions

Lines changed: 208 additions & 29 deletions
@@ -1,48 +1,227 @@
1-
:title: Backing up Data
1+
:title: Backing Up and Restoring Data
22
:description: Backing up stateful data on Deis.
33

44
.. _backing_up_data:
55

6-
Backing up Data
7-
========================
6+
Backing Up and Restoring Data
7+
=============================
88

99
While applications deployed on Deis follow the Twelve-Factor methodology and are thus stateless,
10-
Deis maintains platform state in two places: the :ref:`Store` component, and in etcd.
10+
Deis maintains platform state in the :ref:`Store` component.
1111

12-
Store component
13-
---------------
14-
The store component runs `Ceph`_, and is used by the :ref:`Database` and :ref:`Registry` components
15-
as a data store. This enables the components themselves to freely move around the cluster while
16-
their state is backed by store.
12+
The store component runs `Ceph`_, and is used by the :ref:`Database`, :ref:`Registry`,
13+
:ref:`Controller`, and :ref:`Logger` components as a data store. Database and registry
14+
use store-gateway, while controller and logger use store-volume. Keeping state in the store component
15+
enables these components to move freely around the cluster.
1716

1817
The store component is configured to still operate in a degraded state, and will automatically
1918
recover should a host fail and then rejoin the cluster. Total data loss of Ceph is only possible
20-
if all of the store containers are removed. However, backup of Ceph is fairly straightforward.
19+
if all of the store containers are removed. However, backup of Ceph is fairly straightforward, and
20+
is recommended before :ref:`Upgrading Deis <upgrading-deis>`.
2121

22-
Data in Ceph is stored on the filesystem in ``/var/lib/ceph``, and metadata information is stored
23-
within Ceph. Ceph provides the ability to take snapshots of storage pools with the `rados`_ command.
22+
Data stored in Ceph is accessible in two places: on the CoreOS filesystem at ``/var/lib/deis/store``
23+
and in the store-gateway component. Backing up this data is straightforward - we can simply tarball
24+
the filesystem data, and use any S3-compatible blob store tool to download all files in the
25+
store-gateway component.
2426

25-
Using pg_dump
26-
-------------
27-
Since the database component runs PostgreSQL, ``pg_dumpall`` can also be used to generate a text
28-
dump of the database.
27+
Setup
28+
-----
29+
30+
The ``deis-store-gateway`` component exposes an S3-compatible API, so we can use a tool like `s3cmd`_
31+
to work with the object store. First, install our fork of s3cmd with a patch for Ceph support:
32+
33+
.. code-block:: console
34+
35+
$ pip install git+https://github.com/deis/s3cmd
36+
37+
We'll need the generated access key and secret key for use with the gateway. We can get these using
38+
``deisctl``, either on one of the cluster machines or on a remote machine with ``DEISCTL_TUNNEL`` set:
39+
40+
.. code-block:: console
41+
42+
$ deisctl config store get gateway/accessKey
43+
$ deisctl config store get gateway/secretKey
44+
45+
Back on the local machine, run ``s3cmd --configure`` and enter your access key and secret key.
46+
Other settings can be left at the defaults. If the configure script prompts you to test the credentials,
47+
skip that step - it will try to authenticate against Amazon S3 and fail.
48+
49+
You'll need to change a few additional configuration settings. First, edit ``~/.s3cfg`` and change
50+
``host_base`` and ``host_bucket`` to match ``deis-store.<your domain>``. For example, for my local
51+
Vagrant setup, I've changed the lines to:
52+
53+
.. code-block:: console
54+
55+
host_base = deis-store.local3.deisapp.com
56+
host_bucket = deis-store.local3.deisapp.com/%(bucket)
57+
58+
You'll also need to enable ``use_path_mode``:
59+
60+
.. code-block:: console
61+
62+
use_path_mode = True
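
The three overrides above can be collected in one place. The following is a minimal sketch, not part of Deis or s3cmd: ``s3cfg_overrides`` is a hypothetical helper that only prints the lines for a given domain, so they can be reviewed and merged into ``~/.s3cfg`` by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of Deis or s3cmd): print the ~/.s3cfg
# overrides described above for a given cluster domain.
s3cfg_overrides() {
    domain="$1"
    if [ -z "$domain" ]; then
        echo "usage: s3cfg_overrides <your domain>" >&2
        return 1
    fi
    printf 'host_base = deis-store.%s\n' "$domain"
    # '%%' emits a literal '%' so that '%(bucket)' reaches the file unchanged.
    printf 'host_bucket = deis-store.%s/%%(bucket)\n' "$domain"
    printf 'use_path_mode = True\n'
}
```

Calling ``s3cfg_overrides local3.deisapp.com`` should print the same three settings shown for the Vagrant setup.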
63+
64+
We can now use ``s3cmd`` to back up and restore data from the store-gateway.
65+
66+
Backing up
67+
----------
68+
69+
Database backups and registry data
70+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
71+
72+
The store-gateway component stores database backups and is used to store data for the registry.
73+
On our local machine, we can use ``s3cmd sync`` to copy the objects locally:
74+
75+
.. code-block:: console
76+
77+
$ s3cmd sync s3://db_wal .
78+
$ s3cmd sync s3://registry .
79+
80+
Log data
81+
~~~~~~~~
82+
83+
The store-volume service mounts a filesystem which is used by the controller and logger components
84+
to store and retrieve application and component logs.
85+
86+
Since this is just a POSIX filesystem, you can simply tarball the contents of this directory
87+
and rsync it to a local machine:
88+
89+
.. code-block:: console
90+
91+
$ ssh core@<hostname> 'cd /var/lib/deis/store && sudo tar cpzf ~/store_file_backup.tar.gz .'
92+
tar: /var/lib/deis/store/logs/deis-registry.log: file changed as we read it
93+
$ rsync -avhe ssh core@<hostname>:~/store_file_backup.tar.gz .
94+
95+
Note that you'll need to specify the SSH port when using Vagrant:
96+
97+
.. code-block:: console
98+
99+
$ rsync -avhe 'ssh -p 2222' core@127.0.0.1:~/store_file_backup.tar.gz .
100+
101+
Note the warning - in a running cluster the log files are constantly being written to, so we are
102+
preserving a specific moment in time.
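
Because the archive is taken from a live filesystem, it can be worth confirming that the copy on the local machine is readable before relying on it. This is a minimal sketch, assuming a standard ``tar`` is available locally; ``verify_backup`` is a hypothetical helper name, not a Deis command.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a gzip-compressed tarball can be read
# end to end, and report how many entries it contains.
verify_backup() {
    archive="$1"
    if [ ! -f "$archive" ]; then
        echo "no such file: $archive" >&2
        return 1
    fi
    # 'tar tzf' lists the contents and fails on a truncated or corrupt archive.
    if ! entries="$(tar tzf "$archive")"; then
        echo "archive is unreadable: $archive" >&2
        return 1
    fi
    echo "$archive: $(printf '%s\n' "$entries" | wc -l | tr -d ' ') entries"
}
```

For example, ``verify_backup store_file_backup.tar.gz`` returns a non-zero status if the transfer was cut short.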
103+
104+
Database data
105+
~~~~~~~~~~~~~
106+
107+
While backing up the Ceph data is sufficient (as database ships backups and WAL logs to store),
108+
we can also back up the PostgreSQL data using ``pg_dumpall`` so we have a text dump of the database.
109+
110+
We can identify the machine running database with ``deisctl list``, and from that machine:
111+
112+
.. code-block:: console
113+
114+
core@deis-1 ~ $ docker exec deis-database sh -c 'sudo -u postgres pg_dumpall > /app/dump_all.sql'
115+
core@deis-1 ~ $ docker cp deis-database:/app/dump_all.sql .
116+
117+
Restoring
118+
---------
119+
120+
.. note::
121+
122+
Restoring data is only necessary when deploying a new cluster. Most users will use the normal
123+
in-place upgrade workflow which does not require a restore.
124+
125+
We want to restore the data on a new cluster before the rest of the Deis components come up and
126+
initialize. So, we will install the whole platform, but only start the store components:
127+
128+
.. code-block:: console
129+
130+
$ deisctl install platform
131+
$ deisctl start store-monitor
132+
$ deisctl start store-daemon
133+
$ deisctl start store-metadata
134+
$ deisctl start store-gateway
135+
$ deisctl start store-volume
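
The same start sequence can be wrapped in a loop that stops at the first failure. This is only a sketch: ``start_store`` is a hypothetical function name, and the ``DEISCTL`` variable is an assumed override hook (not a ``deisctl`` feature) that makes a dry run possible.

```shell
#!/bin/sh
# Sketch: start the store components in the order listed above, stopping at
# the first failure. DEISCTL is an assumed override hook, handy for dry runs.
start_store() {
    for component in store-monitor store-daemon store-metadata store-gateway store-volume; do
        "${DEISCTL:-deisctl}" start "$component" || {
            echo "failed to start $component" >&2
            return 1
        }
    done
}
```

Setting ``DEISCTL=echo`` before calling ``start_store`` prints the commands instead of running them.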
136+
137+
We'll also need to start a router so we can access the gateway:
138+
139+
.. code-block:: console
140+
141+
$ deisctl start router@1
142+
143+
The default maximum body size on the router is too small to support large uploads to the gateway,
144+
so we need to increase it:
145+
146+
.. code-block:: console
147+
148+
$ deisctl config router set bodySize=100m
149+
150+
The new cluster will have generated a new access key and secret key, so we'll need to get those again:
151+
152+
.. code-block:: console
153+
154+
$ deisctl config store get gateway/accessKey
155+
$ deisctl config store get gateway/secretKey
156+
157+
Edit ``~/.s3cfg`` and update the keys.
158+
159+
Now we can restore the data!
160+
161+
Database backups and registry data
162+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
163+
164+
Because neither the database nor the registry has started, the buckets we need to restore to will not
165+
yet exist. So, we'll need to create those buckets:
166+
167+
.. code-block:: console
168+
169+
$ s3cmd mb s3://db_wal
170+
$ s3cmd mb s3://registry
171+
172+
Now we can restore the data:
29173

30174
.. code-block:: console
31175
32-
dev $ fleetctl ssh deis-database.service
33-
coreos $ nse deis-database
34-
coreos $ sudo -u postgres pg_dumpall > pg_dump.sql
176+
$ s3cmd sync basebackups_005 s3://db_wal
177+
$ s3cmd sync wal_005 s3://db_wal
178+
$ s3cmd sync registry s3://registry
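
Before running the sync commands, it may help to confirm that the local backup directories are actually present. The sketch below assumes the backup was synced into a single directory, as in the backup commands earlier; ``check_restore_sources`` is a hypothetical helper, and the directory names are taken from the restore commands above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: make sure the directories named in the
# restore commands exist under a backup root before syncing them back.
check_restore_sources() {
    root="${1:-.}"
    missing=0
    for dir in basebackups_005 wal_005 registry; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing backup directory: $root/$dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running ``check_restore_sources .`` from the backup directory returns a non-zero status and names any directory that is missing.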
179+
180+
Log data
181+
~~~~~~~~
182+
183+
Once we copy the tarball back to one of the CoreOS machines, we can extract it:
184+
185+
.. code-block:: console
186+
187+
$ rsync -avhe ssh store_file_backup.tar.gz core@<hostname>:~/store_file_backup.tar.gz
188+
$ ssh core@<hostname> 'cd /var/lib/deis/store && sudo tar -xzpf ~/store_file_backup.tar.gz --same-owner'
189+
190+
Note that you'll need to specify the SSH port when using Vagrant:
191+
192+
.. code-block:: console
193+
194+
$ rsync -avhe 'ssh -p 2222' store_file_backup.tar.gz core@127.0.0.1:~/store_file_backup.tar.gz
195+
196+
Finishing up
197+
~~~~~~~~~~~~
198+
199+
Now that the data is restored, the rest of the cluster should come up normally with a ``deisctl start platform``.
200+
201+
The last task is to instruct the controller to re-write user keys, application data, and domains to etcd.
202+
Log into the machine which runs deis-controller and run the following. Note that the IP address to
203+
use in the ``export`` command should correspond to the IP of the host machine which runs this container.
204+
205+
.. code-block:: console
206+
207+
$ nse deis-controller
208+
$ cd /app
209+
$ export ETCD=172.17.8.100:4001
210+
$ ./manage.py shell <<EOF
211+
from api.models import *
212+
[k.save() for k in Key.objects.all()]
213+
[a.save() for a in App.objects.all()]
214+
[d.save() for d in Domain.objects.all()]
215+
EOF
216+
$ exit
217+
218+
.. note::
35219

36-
etcd
37-
----
38-
Service state and fleet scheduling data is stored in etcd. Unfortunately, there is currently no
39-
recommended backup solution for etcd. However, there is a third-party tool called `etcd-dump`_ which
40-
can be used to dump the data stored in etcd.
220+
The database keeps track of running application containers. Since this is a fresh cluster, it is
221+
advisable to ``deis scale <proctype>=0`` and then ``deis scale`` back up to the desired number of
222+
containers for an application. This ensures the database has an accurate view of the cluster.
41223

42-
Official backup recommendations for etcd are forthcoming. The CoreOS team is tracking etcd update
43-
documentation in `#683`_.
224+
That's it! The cluster should be fully restored.
44225

45-
.. _`#683`: https://github.com/coreos/etcd/issues/683
46-
.. _`etcd-dump`: https://github.com/AaronO/etcd-dump
47226
.. _`Ceph`: http://ceph.com
48-
.. _`rados`: http://ceph.com/docs/master/man/8/rados
227+
.. _`s3cmd`: http://s3tools.org/

docs/managing_deis/upgrading-deis.rst

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ There are currently two strategies for upgrading a Deis cluster:
1212
* In-place Upgrade (recommended)
1313
* Migration Upgrade
1414

15+
Before attempting an upgrade, it is strongly recommended to :ref:`back up your data <backing_up_data>`.
16+
1517
In-place Upgrade
1618
----------------
1719
