Commit 03b087e

Merge pull request #1994 from carmstrong/store_docs
docs(*): doc changes based on store
2 parents 1d03fac + 17e2d1f commit 03b087e

19 files changed

Lines changed: 553 additions & 134 deletions

docs/installing_deis/upgrading-deis.rst

Lines changed: 2 additions & 2 deletions
@@ -122,8 +122,8 @@ To migrate over, start by pointing the new cluster at the old cluster's endpoint
122122

123123
.. code-block:: console
124124
125-
$ etcdctl set /deis/database/host pqsl.example.org
126-
$ etcdctl set /deis/database/port 1234
125+
$ deisctl config database set host pqsl.example.org
126+
$ deisctl config database set port 1234
127127
...
128128
129129
Next, you'll also want to migrate over the application directories:
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
1+
:title: Adding/Removing Hosts
2+
:description: Considerations for adding or removing Deis hosts.
3+
4+
.. _add_remove_host:
5+
6+
Adding/Removing Hosts
7+
=====================
8+
9+
Most Deis components handle new machines just fine. Care has to be taken when removing machines from the cluster, however, since the deis-store components act as the backing store for all the stateful data Deis needs to function properly.
10+
11+
Note that these instructions follow the Ceph documentation for `removing monitors`_ and `removing OSDs`_. Should these instructions differ significantly from the Ceph documentation, the Ceph documentation should be followed, and a PR to update this documentation would be much appreciated.
12+
13+
Since Ceph monitors use the Paxos algorithm, a majority of the monitors must always be available to form a quorum: 1 of 1, 2 of 3, 3 of 4, 3 of 5, 4 of 6, and so on. Where possible, add a new node to the cluster before removing an old one.
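The majority rule above can be sketched as a small calculation. This is only an illustration: the function names ``mon_majority`` and ``mon_tolerated_failures`` are hypothetical and are not part of Deis or Ceph.

```shell
# Hypothetical helpers (not part of Deis or Ceph) illustrating the majority rule.

# Smallest number of monitors that forms a majority of $1 monitors.
mon_majority() {
  echo $(( $1 / 2 + 1 ))
}

# Number of monitors that can be lost while a majority still remains.
mon_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the rule for a few cluster sizes.
for n in 1 3 4 5 6; do
  echo "$n monitors: majority is $(mon_majority "$n"), can lose $(mon_tolerated_failures "$n")"
done
```

With four monitors the majority is three, so only one monitor can be lost at a time. This is why we add the fourth machine before removing the first.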
14+
15+
This documentation will assume a running three-node Deis cluster. We will add a fourth machine to the cluster, then remove the first machine.
16+
17+
Inspecting health
18+
-----------------
19+
20+
Before we begin, we should check the state of the Ceph cluster to be sure it's healthy. We can do this by logging into any machine in the cluster, entering a store container, and then querying Ceph:
21+
22+
.. code-block:: console
23+
24+
core@deis-1 ~ $ nse deis-store-monitor
25+
groups: cannot find name for group ID 11
26+
root@deis-1:/# ceph -s
27+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
28+
health HEALTH_OK
29+
monmap e3: 3 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0}, election epoch 8, quorum 0,1,2 deis-1,deis-2,deis-3
30+
osdmap e18: 3 osds: 3 up, 3 in
31+
pgmap v31: 960 pgs, 9 pools, 1158 bytes data, 45 objects
32+
16951 MB used, 31753 MB / 49200 MB avail
33+
960 active+clean
34+
35+
We see from the ``pgmap`` that we have 960 placement groups, all of which are ``active+clean``. This is good!
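If we want to script this check, one possible sketch is to compare the total placement group count with the ``active+clean`` count. The helper name ``all_pgs_clean`` is hypothetical, and the parsing assumes the ``ceph -s`` layout shown above, which may differ between Ceph releases.

```shell
# Hypothetical helper: reads `ceph -s` output on stdin and succeeds only if
# every placement group is reported as active+clean.
# Assumes the status layout shown in this document.
all_pgs_clean() {
  awk '
    # "pgmap v31: 960 pgs, ..." -> remember the number before "pgs,"
    /pgmap/ { for (i = 1; i < NF; i++) if ($(i + 1) == "pgs,") total = $i }
    # "960 active+clean" -> remember the clean count
    $2 == "active+clean" { clean = $1 }
    END { exit !(total > 0 && total == clean) }
  '
}

# Demonstration against the sample output above (no running cluster needed).
sample='      pgmap v31: 960 pgs, 9 pools, 1158 bytes data, 45 objects
            16951 MB used, 31753 MB / 49200 MB avail
                 960 active+clean'
printf '%s\n' "$sample" | all_pgs_clean && echo "all placement groups clean"
```

Inside a store container, the same function could be fed live data with ``ceph -s | all_pgs_clean``.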
36+
37+
Adding a node
38+
-------------
39+
40+
To add a new node to your Deis cluster, simply provision a new CoreOS machine with the same etcd discovery URL specified in the cloud-config file. When the new machine comes up, it will join the etcd cluster. You can confirm this with ``fleetctl list-machines``.
41+
42+
Since logspout, publisher, store-monitor, and store-daemon are global units, they will be automatically started on the new node.
43+
44+
Once the new machine is running, we can inspect the Ceph cluster health again:
45+
46+
.. code-block:: console
47+
48+
core@deis-1 ~ $ nse deis-store-monitor
49+
groups: cannot find name for group ID 11
50+
root@deis-1:/# ceph -s
51+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
52+
health HEALTH_WARN clock skew detected on mon.deis-4
53+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
54+
osdmap e22: 4 osds: 4 up, 4 in
55+
pgmap v43: 960 pgs, 9 pools, 1158 bytes data, 45 objects
56+
22584 MB used, 42352 MB / 65600 MB avail
57+
960 active+clean
58+
59+
Note that we have:
60+
61+
.. code-block:: console
62+
63+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
64+
osdmap e22: 4 osds: 4 up, 4 in
65+
66+
We now have 4 monitors and 4 OSDs. Hooray!
67+
68+
Removing a node
69+
---------------
70+
71+
When removing a node from the cluster that runs a deis-store component, you'll need to tell Ceph that both the store-daemon and store-monitor running on this host will be leaving the cluster. We're going to remove the first node in our cluster, deis-1. That machine has an IP address of ``172.17.8.100``.
72+
73+
Removing an OSD
74+
~~~~~~~~~~~~~~~
75+
76+
Before we can tell Ceph to remove an OSD, we need the OSD ID. We can get this from etcd:
77+
78+
.. code-block:: console
79+
80+
core@deis-2 ~ $ etcdctl get /deis/store/osds/172.17.8.100
81+
1
82+
83+
Note: In some cases, we may not know the IP or hostname of the machine we want to remove. In these cases, we can use ``ceph osd tree`` to see the current state of the cluster. This will list all the OSDs in the cluster, and report which ones are down.
84+
85+
Now that we have the OSD's ID, let's remove it. We'll need a shell in any store-monitor or store-daemon container on any host in the cluster (except the one we're removing). In this example, we are on ``deis-2``.
86+
87+
.. code-block:: console
88+
89+
core@deis-2 ~ $ nse deis-store-monitor
90+
groups: cannot find name for group ID 11
91+
root@deis-2:/# ceph osd out 1
92+
marked out osd.1.
93+
94+
95+
This instructs Ceph to start relocating placement groups on that OSD to another host. We can watch this with ``ceph -w``:
96+
97+
.. code-block:: console
98+
99+
root@deis-2:/# ceph -w
100+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
101+
health HEALTH_WARN clock skew detected on mon.deis-4
102+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
103+
osdmap e24: 4 osds: 4 up, 3 in
104+
pgmap v58: 960 pgs, 9 pools, 1158 bytes data, 45 objects
105+
16900 MB used, 31793 MB / 49200 MB avail
106+
960 active+clean
107+
108+
2014-10-07 17:55:11.900151 mon.0 [INF] pgmap v58: 960 pgs: 960 active+clean; 1158 bytes data, 16900 MB used, 31793 MB / 49200 MB avail; 29 B/s, 3 objects/s recovering
109+
2014-10-07 17:56:38.860305 mon.0 [INF] pgmap v59: 960 pgs: 960 active+clean; 1158 bytes data, 16900 MB used, 31793 MB / 49200 MB avail
110+
111+
We can see that the placement groups are back in a clean state. We can now stop the daemon. Since the store units are global units, we can't target a specific one to stop. Instead, we log into the host machine and instruct Docker to stop the container:
112+
113+
.. code-block:: console
114+
115+
core@deis-1 ~ $ docker stop deis-store-daemon
116+
deis-store-daemon
117+
118+
Back inside a store container on ``deis-2``, we can finally remove the OSD:
119+
120+
.. code-block:: console
121+
122+
core@deis-2 ~ $ nse deis-store-monitor
123+
groups: cannot find name for group ID 11
124+
root@deis-2:/# ceph osd crush remove osd.1
125+
removed item id 1 name 'osd.1' from crush map
126+
root@deis-2:/# ceph auth del osd.1
127+
updated
128+
root@deis-2:/# ceph osd rm 1
129+
removed osd.1
130+
131+
For cleanup, we should remove the OSD entry from etcd:
132+
133+
.. code-block:: console
134+
135+
core@deis-2 ~ $ etcdctl rm /deis/store/osds/172.17.8.100
136+
137+
That's it! If we inspect the health, we see that there are now 3 OSDs again, and all of our placement groups are ``active+clean``.
138+
139+
.. code-block:: console
140+
141+
core@deis-2 ~ $ nse deis-store-monitor
142+
groups: cannot find name for group ID 11
143+
root@deis-2:/# ceph -s
144+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
145+
health HEALTH_WARN clock skew detected on mon.deis-4
146+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
147+
osdmap e28: 3 osds: 3 up, 3 in
148+
pgmap v81: 960 pgs, 9 pools, 1158 bytes data, 45 objects
149+
16915 MB used, 31779 MB / 49200 MB avail
150+
960 active+clean
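The Ceph-side removal commands above could be grouped into one helper, sketched below. The function name ``purge_osd`` is hypothetical. It assumes the OSD has already been marked out, the placement groups are clean again, and the ``deis-store-daemon`` container on the departing host has been stopped. It only wraps the three ``ceph`` commands shown in this section.

```shell
# Hypothetical helper: run the Ceph-side removal steps for one OSD ID.
# Assumes the OSD is already marked out, rebalancing has finished, and its
# deis-store-daemon container has been stopped on the departing host.
purge_osd() {
  osd_id="$1"
  # Refuse anything that is not a plain numeric OSD ID.
  case "$osd_id" in
    ''|*[!0-9]*) echo "usage: purge_osd <numeric-osd-id>" >&2; return 2 ;;
  esac
  # Stop at the first failing step so a partial removal is noticed.
  ceph osd crush remove "osd.${osd_id}" &&
    ceph auth del "osd.${osd_id}" &&
    ceph osd rm "${osd_id}"
}
```

From a shell inside a store container on a remaining host, this would be called as ``purge_osd 1`` for the example in this section. The etcd cleanup is still a separate step on the CoreOS host.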
151+
152+
Removing a monitor
153+
~~~~~~~~~~~~~~~~~~
154+
155+
Removing a monitor is much easier. First, we remove the etcd entry so any clients that are using Ceph won't use the monitor for connecting:
156+
157+
.. code-block:: console
158+
159+
$ etcdctl rm /deis/store/hosts/172.17.8.100
160+
161+
Within 5 seconds, confd will run on all store clients and remove the monitor from the ``ceph.conf`` configuration file.
162+
163+
Next, we stop the container:
164+
165+
.. code-block:: console
166+
167+
core@deis-1 ~ $ docker stop deis-store-monitor
168+
deis-store-monitor
169+
170+
171+
Back on another host, we can again enter a store container and then remove this monitor:
172+
173+
.. code-block:: console
174+
175+
root@deis-2:/# ceph mon remove deis-1
176+
2014-10-07 18:14:38.055584 7fab0d6e7700 0 monclient: hunting for new mon
177+
2014-10-07 18:14:38.055584 7fab0d6e7700 0 monclient: hunting for new mon
178+
removed mon.deis-1 at 172.17.8.100:6789/0, there are now 3 monitors
179+
2014-10-07 18:14:38.072885 7fab0c5e4700 0 -- 172.17.8.101:0/1000361 >> 172.17.8.100:6789/0 pipe(0x7faafc007c90 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7faafc007f00).fault
180+
2014-10-07 18:14:38.072885 7fab0c5e4700 0 -- 172.17.8.101:0/1000361 >> 172.17.8.100:6789/0 pipe(0x7faafc007c90 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7faafc007f00).fault
181+
182+
Note the faults that follow the removal message. These are normal when a Ceph client is unable to communicate with a certain monitor. The important line is ``removed mon.deis-1 at 172.17.8.100:6789/0, there are now 3 monitors``.
183+
184+
Finally, let's check the health of the cluster:
185+
186+
.. code-block:: console
187+
188+
root@deis-2:/# ceph -s
189+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
190+
health HEALTH_OK
191+
monmap e5: 3 mons at {deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 16, quorum 0,1,2 deis-2,deis-3,deis-4
192+
osdmap e28: 3 osds: 3 up, 3 in
193+
pgmap v91: 960 pgs, 9 pools, 1158 bytes data, 45 objects
194+
16927 MB used, 31766 MB / 49200 MB avail
195+
960 active+clean
196+
197+
We're done!
198+
199+
.. _`removing monitors`: http://ceph.com/docs/v0.80.5/rados/operations/add-or-rm-mons/#removing-monitors
200+
.. _`removing OSDs`: http://docs.ceph.com/docs/v0.80.5/rados/operations/add-or-rm-osds/#removing-osds-manual
201+

docs/managing_deis/backing_up_data.rst

Lines changed: 13 additions & 28 deletions
@@ -7,37 +7,20 @@ Backing up Data
77
========================
88

99
While applications deployed on Deis follow the Twelve-Factor methodology and are thus stateless,
10-
Deis maintains platform state in two places: data containers and etcd.
10+
Deis maintains platform state in two places: the :ref:`Store` component and etcd.
1111

12-
Data containers
12+
Store component
1313
---------------
14-
Data containers are simply Docker containers that expose a volume which is shared with another container.
15-
The components with data containers are builder, database, logger, and registry. Since these are just
16-
Docker containers, they can be exported with ordinary Docker commands:
14+
The store component runs `Ceph`_, and is used by the :ref:`Database` and :ref:`Registry` components
15+
as a data store. This enables the components themselves to freely move around the cluster while
16+
their state is backed by store.
1717

18-
.. code-block:: console
19-
20-
dev $ fleetctl ssh deis-builder.service
21-
coreos $ sudo docker export deis-builder-data > /home/coreos/deis-builder-data-backup.tar
22-
dev $ fleetctl ssh deis-database.service
23-
coreos $ sudo docker export deis-database-data > /home/coreos/deis-database-data-backup.tar
24-
dev $ fleetctl ssh deis-logger.service
25-
coreos $ sudo docker export deis-logger-data > /home/coreos/deis-logger-data-backup.tar
26-
dev $ fleetctl ssh deis-registry.service
27-
coreos $ sudo docker export deis-registry-data > /home/coreos/deis-registry-data-backup.tar
28-
29-
Importing looks very similar:
30-
31-
.. code-block:: console
18+
The store component is configured to keep operating in a degraded state, and will automatically
19+
recover should a host fail and then rejoin the cluster. Total loss of the data in Ceph is only possible
20+
if all of the store containers are removed. Even so, backing up Ceph is fairly straightforward.
3221

33-
dev $ fleetctl ssh deis-builder.service
34-
coreos $ cat /home/coreos/deis-builder-data-backup.tar | sudo docker import - deis-builder-data
35-
dev $ fleetctl ssh deis-database.service
36-
coreos $ cat /home/coreos/deis-database-data-backup.tar | sudo docker import - deis-database-data
37-
dev $ fleetctl ssh deis-logger.service
38-
coreos $ cat /home/coreos/deis-logger-data-backup.tar | sudo docker import - deis-logger-data
39-
dev $ fleetctl ssh deis-registry.service
40-
coreos $ cat /home/coreos/deis-registry-data-backup.tar | sudo docker import - deis-registry-data
22+
Data in Ceph is stored on the filesystem in ``/var/lib/ceph``, and metadata information is stored
23+
within Ceph. Ceph provides the ability to take snapshots of storage pools with the `rados`_ command.
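As a rough sketch of pool snapshots with ``rados``, the helper below takes a snapshot with the same name in every pool. The function name ``snapshot_all_pools`` and the snapshot name are hypothetical. It relies only on the ``rados lspools`` and ``rados -p <pool> mksnap <name>`` commands, and is meant to be run from inside a store container.

```shell
# Hypothetical helper: create a pool snapshot named $1 in every Ceph pool.
# Uses only `rados lspools` and `rados -p <pool> mksnap <snap-name>`.
snapshot_all_pools() {
  snap_name="$1"
  if [ -z "$snap_name" ]; then
    echo "usage: snapshot_all_pools <snap-name>" >&2
    return 2
  fi
  rados lspools | while read -r pool; do
    # Skip blank lines in the pool listing.
    [ -n "$pool" ] || continue
    # </dev/null keeps the command from consuming the pool list on stdin.
    rados -p "$pool" mksnap "$snap_name" </dev/null || exit 1
  done
}
```

Pool snapshots live inside the Ceph cluster itself, so they protect against accidental changes, not against losing every store container.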
4124

4225
Using pg_dump
4326
-------------
@@ -46,7 +29,7 @@ dump of the database.
4629

4730
.. code-block:: console
4831
49-
dev $ fleetctl ssh deis-database.service
32+
dev $ fleetctl ssh deis-database@1.service
5033
coreos $ nse deis-database
5134
coreos $ sudo -u postgres pg_dumpall > pg_dump.sql
5235
@@ -61,3 +44,5 @@ documentation in `#683`_.
6144

6245
.. _`#683`: https://github.com/coreos/etcd/issues/683
6346
.. _`etcd-dump`: https://github.com/AaronO/etcd-dump
47+
.. _`Ceph`: http://ceph.com
48+
.. _`rados`: http://ceph.com/docs/master/man/8/rados

docs/managing_deis/builder_settings.rst

Lines changed: 3 additions & 5 deletions
@@ -13,8 +13,6 @@ Requires: :ref:`controller <controller_settings>`, :ref:`registry <registry_sett
1313

1414
Required by: :ref:`router <router_settings>`
1515

16-
Considerations: must live on the same host as controller (see `#985`_)
17-
1816
Settings set by builder
1917
-----------------------
2018
The following etcd keys are set by the builder component, typically in its /bin/boot script.
@@ -40,7 +38,7 @@ setting description
4038
/deis/controller/protocol protocol of the controller component (set by controller)
4139
/deis/registry/host host of the registry component (set by registry)
4240
/deis/registry/port port of the registry component (set by registry)
43-
/deis/services/* application metadata (set by controller)
41+
/deis/services/* healthy application containers reported by deis/publisher
4442
/deis/slugbuilder/image slugbuilder image to use (default: deis/slugbuilder:latest)
4543
/deis/slugrunner/image slugrunner image to use (default: deis/slugrunner:latest)
4644
==================================== ===========================================================
@@ -52,14 +50,14 @@ supplied with Deis:
5250

5351
.. code-block:: console
5452
55-
$ etcdctl set /deis/builder/image myaccount/myimage:latest
53+
$ deisctl config builder set image myaccount/myimage:latest
5654
5755
This will pull the image from the public Docker registry. You can also pull from a private
5856
registry:
5957

6058
.. code-block:: console
6159
62-
$ etcdctl set /deis/builder/image registry.mydomain.org:5000/myaccount/myimage:latest
60+
$ deisctl config builder set image registry.mydomain.org:5000/myaccount/myimage:latest
6361
6462
Be sure that your custom image functions in the same way as the `stock builder image`_ shipped with
6563
Deis. Specifically, ensure that it sets and reads appropriate etcd keys.

docs/managing_deis/cache_settings.rst

Lines changed: 2 additions & 2 deletions
@@ -37,14 +37,14 @@ supplied with Deis:
3737

3838
.. code-block:: console
3939
40-
$ etcdctl set /deis/cache/image myaccount/myimage:latest
40+
$ deisctl config cache set image myaccount/myimage:latest
4141
4242
This will pull the image from the public Docker registry. You can also pull from a private
4343
registry:
4444

4545
.. code-block:: console
4646
47-
$ etcdctl set /deis/cache/image registry.mydomain.org:5000/myaccount/myimage:latest
47+
$ deisctl config cache set image registry.mydomain.org:5000/myaccount/myimage:latest
4848
4949
Be sure that your custom image functions in the same way as the `stock cache image`_ shipped with
5050
Deis. Specifically, ensure that it sets and reads appropriate etcd keys.

docs/managing_deis/controller_settings.rst

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@ Requires: :ref:`controller <controller_settings>`, :ref:`cache <cache_settings>`
1313

1414
Required by: :ref:`router <router_settings>`
1515

16-
Considerations: must live on the same host as both builder and logger (see `#985`_)
16+
Considerations: must live on the same host as logger (see `#985`_)
1717

1818
Settings set by controller
1919
--------------------------
@@ -60,14 +60,14 @@ supplied with Deis:
6060

6161
.. code-block:: console
6262
63-
$ etcdctl set /deis/controller/image myaccount/myimage:latest
63+
$ deisctl config controller set image myaccount/myimage:latest
6464
6565
This will pull the image from the public Docker registry. You can also pull from a private
6666
registry:
6767

6868
.. code-block:: console
6969
70-
$ etcdctl set /deis/controller/image registry.mydomain.org:5000/myaccount/myimage:latest
70+
$ deisctl config controller set image registry.mydomain.org:5000/myaccount/myimage:latest
7171
7272
Be sure that your custom image functions in the same way as the `stock controller image`_ shipped with
7373
Deis. Specifically, ensure that it sets and reads appropriate etcd keys.
