Commit 17e2d1f

docs(managing_deis): rewrite several pages based on store
[skip ci]
1 parent 009ddf7 commit 17e2d1f

12 files changed

Lines changed: 324 additions & 111 deletions
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
1+
:title: Adding/Removing Hosts
2+
:description: Considerations for adding or removing Deis hosts.
3+
4+
.. _add_remove_host:
5+
6+
Adding/Removing Hosts
7+
=====================
8+
9+
Most Deis components handle new machines just fine. Care has to be taken when removing machines from the cluster, however, since the deis-store components act as the backing store for all the stateful data Deis needs to function properly.
10+
11+
Note that these instructions follow the Ceph documentation for `removing monitors`_ and `removing OSDs`_. Should these instructions differ significantly from the Ceph documentation, the Ceph documentation should be followed, and a PR to update this documentation would be much appreciated.
12+
13+
Since Ceph uses the Paxos algorithm, it is important to always have enough monitors in the cluster to be able to achieve a majority: 1 of 1, 2 of 3, 3 of 4, 3 of 5, 4 of 6, etc. It is always preferable to add a new node to the cluster before removing an old one, if possible.
14+
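The majority rule above can be sketched as a small shell helper. This is only an illustration: ``majority`` is a hypothetical function name, not part of Ceph or Deis, and it assumes that a strict majority (more than half of the monitors) is what is needed.

```shell
# Hypothetical helper: smallest number of monitors that forms a strict
# majority out of N monitors (integer division, then add one).
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Print the quorum size for a few cluster sizes.
for n in 1 3 4 5 6; do
  echo "$n monitors: quorum needs $(majority "$n")"
done
```

Under this rule a four-monitor cluster needs 3 monitors for quorum, so it tolerates the loss of only one monitor, the same as a three-monitor cluster.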
15+
This documentation will assume a running three-node Deis cluster. We will add a fourth machine to the cluster, then remove the first machine.
16+
17+
Inspecting health
18+
-----------------
19+
20+
Before we begin, we should check the state of the Ceph cluster to be sure it's healthy. We can do this by logging into any machine in the cluster, entering a store container, and then querying Ceph:
21+
22+
.. code-block:: console
23+
24+
core@deis-1 ~ $ nse deis-store-monitor
25+
groups: cannot find name for group ID 11
26+
root@deis-1:/# ceph -s
27+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
28+
health HEALTH_OK
29+
monmap e3: 3 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0}, election epoch 8, quorum 0,1,2 deis-1,deis-2,deis-3
30+
osdmap e18: 3 osds: 3 up, 3 in
31+
pgmap v31: 960 pgs, 9 pools, 1158 bytes data, 45 objects
32+
16951 MB used, 31753 MB / 49200 MB avail
33+
960 active+clean
34+
35+
We see from the ``pgmap`` that we have 960 placement groups, all of which are ``active+clean``. This is good!
36+
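If you would rather not read the summary by eye, the same check can be sketched as a shell function. This is a minimal sketch, not an official tool: ``all_pgs_clean`` is a hypothetical name, and it assumes the ``ceph -s`` summary layout shown above (a ``pgmap`` line followed by a line of the form ``<count> active+clean``).

```shell
# Hypothetical helper: reads `ceph -s` style text on stdin and succeeds
# only if the active+clean count equals the total placement group count.
all_pgs_clean() {
  awk '
    # e.g. "pgmap v31: 960 pgs, 9 pools, ..."
    $1 == "pgmap" && $4 == "pgs," { total = $3 }
    # e.g. "960 active+clean"
    NF == 2 && $2 == "active+clean" { clean = $1 }
    END { exit !(total != "" && total == clean) }
  '
}

# Example (inside a store container):
#   ceph -s | all_pgs_clean && echo "all placement groups are active+clean"
```

The function fails if any placement groups are reported in another state, or if the expected lines are missing from the input.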
37+
Adding a node
38+
-------------
39+
40+
To add a new node to your Deis cluster, simply provision a new CoreOS machine with the same etcd discovery URL specified in the cloud-config file. When the new machine comes up, it will join the etcd cluster. You can confirm this with ``fleetctl list-machines``.
41+
42+
Since logspout, publisher, store-monitor, and store-daemon are global units, they will be automatically started on the new node.
43+
44+
Once the new machine is running, we can inspect the Ceph cluster health again:
45+
46+
.. code-block:: console
47+
48+
core@deis-1 ~ $ nse deis-store-monitor
49+
groups: cannot find name for group ID 11
50+
root@deis-1:/# ceph -s
51+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
52+
health HEALTH_WARN clock skew detected on mon.deis-4
53+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
54+
osdmap e22: 4 osds: 4 up, 4 in
55+
pgmap v43: 960 pgs, 9 pools, 1158 bytes data, 45 objects
56+
22584 MB used, 42352 MB / 65600 MB avail
57+
960 active+clean
58+
59+
Note that we have:
60+
61+
.. code-block:: console
62+
63+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
64+
osdmap e22: 4 osds: 4 up, 4 in
65+
66+
We now have 4 monitors and 4 OSDs. Hooray!
67+
68+
Removing a node
69+
---------------
70+
71+
When removing a node from the cluster that runs a deis-store component, you'll need to tell Ceph that both the store-daemon and store-monitor running on this host will be leaving the cluster. We're going to remove the first node in our cluster, deis-1. That machine has an IP address of ``172.17.8.100``.
72+
73+
Removing an OSD
74+
~~~~~~~~~~~~~~~
75+
76+
Before we can tell Ceph to remove an OSD, we need the OSD ID. We can get this from etcd:
77+
78+
.. code-block:: console
79+
80+
core@deis-2 ~ $ etcdctl get /deis/store/osds/172.17.8.100
81+
1
82+
83+
Note: In some cases, we may not know the IP or hostname of the machine we want to remove. In these cases, we can use ``ceph osd tree`` to see the current state of the cluster. This will list all the OSDs in the cluster, and report which ones are down.
84+
85+
Now that we have the OSD's ID, let's remove it. We'll need a shell in any store-monitor or store-daemon container on any host in the cluster (except the one we're removing). In this example, we are on ``deis-2``.
86+
87+
.. code-block:: console
88+
89+
core@deis-2 ~ $ nse deis-store-monitor
90+
groups: cannot find name for group ID 11
91+
root@deis-2:/# ceph osd out 1
92+
marked out osd.1.
93+
94+
95+
This instructs Ceph to start relocating placement groups on that OSD to another host. We can watch this with ``ceph -w``:
96+
97+
.. code-block:: console
98+
99+
root@deis-2:/# ceph -w
100+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
101+
health HEALTH_WARN clock skew detected on mon.deis-4
102+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
103+
osdmap e24: 4 osds: 4 up, 3 in
104+
pgmap v58: 960 pgs, 9 pools, 1158 bytes data, 45 objects
105+
16900 MB used, 31793 MB / 49200 MB avail
106+
960 active+clean
107+
108+
2014-10-07 17:55:11.900151 mon.0 [INF] pgmap v58: 960 pgs: 960 active+clean; 1158 bytes data, 16900 MB used, 31793 MB / 49200 MB avail; 29 B/s, 3 objects/s recovering
109+
2014-10-07 17:56:38.860305 mon.0 [INF] pgmap v59: 960 pgs: 960 active+clean; 1158 bytes data, 16900 MB used, 31793 MB / 49200 MB avail
110+
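Instead of watching ``ceph -w`` by hand, the wait can be scripted with a generic polling loop. This is a sketch under assumptions: ``wait_until`` is a hypothetical helper, and the check in the usage comment hard-codes the 960 placement groups of this example cluster.

```shell
# Hypothetical helper: run a check command repeatedly until it succeeds.
# Usage: wait_until TRIES INTERVAL_SECONDS command [args...]
wait_until() {
  tries=$1; interval=$2; shift 2
  while [ "$tries" -gt 0 ]; do
    # Stop as soon as the check passes.
    "$@" && return 0
    tries=$((tries - 1))
    # Only sleep if another attempt is coming.
    [ "$tries" -gt 0 ] && sleep "$interval"
  done
  return 1
}

# Example (inside a store container), assuming 960 placement groups:
#   wait_until 30 10 sh -c 'ceph -s | grep -q "960 active+clean"'
```

The helper returns non-zero if the check never passes, so a script can refuse to stop the daemon while data is still being relocated.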
111+
We can see that the placement groups are back in a clean state. We can now stop the daemon. Since the store units are global units, we can't target a specific one to stop. Instead, we log into the host machine and instruct Docker to stop the container:
112+
113+
.. code-block:: console
114+
115+
core@deis-1 ~ $ docker stop deis-store-daemon
116+
deis-store-daemon
117+
118+
Back inside a store container on ``deis-2``, we can finally remove the OSD:
119+
120+
.. code-block:: console
121+
122+
core@deis-2 ~ $ nse deis-store-monitor
123+
groups: cannot find name for group ID 11
124+
root@deis-2:/# ceph osd crush remove osd.1
125+
removed item id 1 name 'osd.1' from crush map
126+
root@deis-2:/# ceph auth del osd.1
127+
updated
128+
root@deis-2:/# ceph osd rm 1
129+
removed osd.1
130+
131+
For cleanup, we should remove the OSD entry from etcd:
132+
133+
.. code-block:: console
134+
135+
core@deis-2 ~ $ etcdctl rm /deis/store/osds/172.17.8.100
136+
137+
That's it! If we inspect the health, we see that there are now 3 OSDs again, and all of our placement groups are ``active+clean``.
138+
139+
.. code-block:: console
140+
141+
core@deis-2 ~ $ nse deis-store-monitor
142+
groups: cannot find name for group ID 11
143+
root@deis-2:/# ceph -s
144+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
145+
health HEALTH_WARN clock skew detected on mon.deis-4
146+
monmap e4: 4 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 12, quorum 0,1,2,3 deis-1,deis-2,deis-3,deis-4
147+
osdmap e28: 3 osds: 3 up, 3 in
148+
pgmap v81: 960 pgs, 9 pools, 1158 bytes data, 45 objects
149+
16915 MB used, 31779 MB / 49200 MB avail
150+
960 active+clean
151+
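The OSD removal steps above can be summarized as an ordered checklist. The function below is a hypothetical helper (``osd_removal_plan`` is not part of Deis): it only prints the commands used in this walkthrough for a given host IP and OSD ID and runs nothing itself, because the steps are executed in different places.

```shell
# Hypothetical helper: print the OSD removal commands from this guide.
# Usage: osd_removal_plan HOST_IP OSD_ID
osd_removal_plan() {
  ip=$1; id=$2
  cat <<EOF
# In a store container on a surviving host:
ceph osd out $id
# Wait until all placement groups are active+clean, then on the host being removed:
docker stop deis-store-daemon
# Back in a store container on a surviving host:
ceph osd crush remove osd.$id
ceph auth del osd.$id
ceph osd rm $id
# On any CoreOS host in the cluster:
etcdctl rm /deis/store/osds/$ip
EOF
}

osd_removal_plan 172.17.8.100 1
```

Printing the plan rather than executing it keeps a human in the loop for the wait between ``ceph osd out`` and stopping the daemon.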
152+
Removing a monitor
153+
~~~~~~~~~~~~~~~~~~
154+
155+
Removing a monitor is much easier. First, we remove the etcd entry so any clients that are using Ceph won't use the monitor for connecting:
156+
157+
.. code-block:: console
158+
159+
$ etcdctl rm /deis/store/hosts/172.17.8.100
160+
161+
Within 5 seconds, confd will run on all store clients and remove the monitor from the ``ceph.conf`` configuration file.
162+
163+
Next, we stop the container:
164+
165+
.. code-block:: console
166+
167+
core@deis-1 ~ $ docker stop deis-store-monitor
168+
deis-store-monitor
169+
170+
171+
Back on another host, we can again enter a store container and then remove this monitor:
172+
173+
.. code-block:: console
174+
175+
root@deis-2:/# ceph mon remove deis-1
176+
2014-10-07 18:14:38.055584 7fab0d6e7700 0 monclient: hunting for new mon
177+
2014-10-07 18:14:38.055584 7fab0d6e7700 0 monclient: hunting for new mon
178+
removed mon.deis-1 at 172.17.8.100:6789/0, there are now 3 monitors
179+
2014-10-07 18:14:38.072885 7fab0c5e4700 0 -- 172.17.8.101:0/1000361 >> 172.17.8.100:6789/0 pipe(0x7faafc007c90 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7faafc007f00).fault
180+
2014-10-07 18:14:38.072885 7fab0c5e4700 0 -- 172.17.8.101:0/1000361 >> 172.17.8.100:6789/0 pipe(0x7faafc007c90 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7faafc007f00).fault
181+
182+
Note the faults that follow: these are normal when a Ceph client is unable to communicate with a certain monitor. The important line is ``removed mon.deis-1 at 172.17.8.100:6789/0, there are now 3 monitors``.
183+
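As with the OSD, the monitor removal steps can be written down as a short checklist. ``mon_removal_plan`` is a hypothetical helper that only prints the commands used above for a given host IP and monitor name; it does not execute them.

```shell
# Hypothetical helper: print the monitor removal commands from this guide.
# Usage: mon_removal_plan HOST_IP MONITOR_NAME
mon_removal_plan() {
  ip=$1; name=$2
  cat <<EOF
# On any CoreOS host in the cluster:
etcdctl rm /deis/store/hosts/$ip
# On the host being removed:
docker stop deis-store-monitor
# In a store container on a surviving host:
ceph mon remove $name
EOF
}

mon_removal_plan 172.17.8.100 deis-1
```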
184+
Finally, let's check the health of the cluster:
185+
186+
.. code-block:: console
187+
188+
root@deis-2:/# ceph -s
189+
cluster c3ff2017-b0a8-4c5a-be00-636560ca567d
190+
health HEALTH_OK
191+
monmap e5: 3 mons at {deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0,deis-4=172.17.8.103:6789/0}, election epoch 16, quorum 0,1,2 deis-2,deis-3,deis-4
192+
osdmap e28: 3 osds: 3 up, 3 in
193+
pgmap v91: 960 pgs, 9 pools, 1158 bytes data, 45 objects
194+
16927 MB used, 31766 MB / 49200 MB avail
195+
960 active+clean
196+
197+
We're done!
198+
199+
.. _`removing monitors`: http://ceph.com/docs/v0.80.5/rados/operations/add-or-rm-mons/#removing-monitors
200+
.. _`removing OSDs`: http://docs.ceph.com/docs/v0.80.5/rados/operations/add-or-rm-osds/#removing-osds-manual
201+

docs/managing_deis/backing_up_data.rst

Lines changed: 13 additions & 28 deletions
@@ -7,37 +7,20 @@ Backing up Data
77
========================
88

99
While applications deployed on Deis follow the Twelve-Factor methodology and are thus stateless,
10-
Deis maintains platform state in two places: data containers and etcd.
10+
Deis maintains platform state in two places: the :ref:`Store` component and etcd.
1111

12-
Data containers
12+
Store component
1313
---------------
14-
Data containers are simply Docker containers that expose a volume which is shared with another container.
15-
The components with data containers are builder, database, logger, and registry. Since these are just
16-
Docker containers, they can be exported with ordinary Docker commands:
14+
The store component runs `Ceph`_, and is used by the :ref:`Database` and :ref:`Registry` components
15+
as a data store. This enables the components themselves to freely move around the cluster while
16+
their state is backed by store.
1717

18-
.. code-block:: console
19-
20-
dev $ fleetctl ssh deis-builder.service
21-
coreos $ sudo docker export deis-builder-data > /home/coreos/deis-builder-data-backup.tar
22-
dev $ fleetctl ssh deis-database.service
23-
coreos $ sudo docker export deis-database-data > /home/coreos/deis-database-data-backup.tar
24-
dev $ fleetctl ssh deis-logger.service
25-
coreos $ sudo docker export deis-logger-data > /home/coreos/deis-logger-data-backup.tar
26-
dev $ fleetctl ssh deis-registry.service
27-
coreos $ sudo docker export deis-registry-data > /home/coreos/deis-registry-data-backup.tar
28-
29-
Importing looks very similar:
30-
31-
.. code-block:: console
18+
The store component is configured to still operate in a degraded state, and will automatically
19+
recover should a host fail and then rejoin the cluster. Total data loss of Ceph is only possible
20+
if all of the store containers are removed. However, backup of Ceph is fairly straightforward.
3221

33-
dev $ fleetctl ssh deis-builder.service
34-
coreos $ cat /home/coreos/deis-builder-data-backup.tar | sudo docker import - deis-builder-data
35-
dev $ fleetctl ssh deis-database.service
36-
coreos $ cat /home/coreos/deis-database-data-backup.tar | sudo docker import - deis-database-data
37-
dev $ fleetctl ssh deis-logger.service
38-
coreos $ cat /home/coreos/deis-logger-data-backup.tar | sudo docker import - deis-logger-data
39-
dev $ fleetctl ssh deis-registry.service
40-
coreos $ cat /home/coreos/deis-registry-data-backup.tar | sudo docker import - deis-registry-data
22+
Data in Ceph is stored on the filesystem in ``/var/lib/ceph``, and metadata information is stored
23+
within Ceph. Ceph provides the ability to take snapshots of storage pools with the `rados`_ command.
4124

4225
Using pg_dump
4326
-------------
@@ -46,7 +29,7 @@ dump of the database.
4629

4730
.. code-block:: console
4831
49-
dev $ fleetctl ssh deis-database.service
32+
dev $ fleetctl ssh deis-database@1.service
5033
coreos $ nse deis-database
5134
coreos $ sudo -u postgres pg_dumpall > pg_dump.sql
5235
@@ -61,3 +44,5 @@ documentation in `#683`_.
6144

6245
.. _`#683`: https://github.com/coreos/etcd/issues/683
6346
.. _`etcd-dump`: https://github.com/AaronO/etcd-dump
47+
.. _`Ceph`: http://ceph.com
48+
.. _`rados`: http://ceph.com/docs/master/man/8/rados

docs/managing_deis/builder_settings.rst

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ setting description
3838
/deis/controller/protocol protocol of the controller component (set by controller)
3939
/deis/registry/host host of the registry component (set by registry)
4040
/deis/registry/port port of the registry component (set by registry)
41-
/deis/services/* application metadata (set by controller)
41+
/deis/services/* healthy application containers reported by deis/publisher
4242
/deis/slugbuilder/image slugbuilder image to use (default: deis/slugbuilder:latest)
4343
/deis/slugrunner/image slugrunner image to use (default: deis/slugrunner:latest)
4444
==================================== ===========================================================

docs/managing_deis/ha_database.rst

Lines changed: 0 additions & 24 deletions
This file was deleted.

docs/managing_deis/index.rst

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
11
:title: Managing Deis
2-
:description: Step-by-step guide for operations engineers setting up a private PaaS using Deis.
2+
:description: Guide for operations engineers managing a private PaaS using Deis.
33

44
.. _managing_deis:
55

@@ -11,6 +11,8 @@ Managing Deis
1111

1212
.. toctree::
1313

14+
add_remove_host
15+
backing_up_data
1416
builder_settings
1517
cache_settings
1618
controller_settings
@@ -21,9 +23,7 @@ Managing Deis
2123
store_daemon_settings
2224
store_gateway_settings
2325
store_monitor_settings
24-
managing_users
26+
operational_tasks
2527
platform_logging
2628
platform_monitoring
27-
backing_up_data
28-
ha_database
2929
security_considerations

docs/managing_deis/managing_users.rst

Lines changed: 0 additions & 23 deletions
This file was deleted.
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
:title: Operational tasks
2+
:description: Common operational tasks for your Deis cluster.
3+
4+
.. _operational_tasks:
5+
6+
Operational tasks
7+
~~~~~~~~~~~~~~~~~
8+
9+
Inspecting store
10+
================
11+
It is sometimes helpful to query the :ref:`Store` component to ask about the health of the Ceph cluster.
12+
To do this, log into any machine running a ``store-monitor`` or ``store-daemon`` service. Then,
13+
run ``nse deis-store-monitor`` or ``nse deis-store-daemon`` and issue ``ceph -s``. This should output the
14+
health of the cluster like:
15+
16+
.. code-block:: console
17+
18+
cluster 6506db0c-9eae-4bb6-a40a-95954dd3c4c3
19+
health HEALTH_OK
20+
monmap e3: 3 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0}, election epoch 8, quorum 0,1,2 deis-1,deis-2,deis-3
21+
osdmap e7: 3 osds: 3 up, 3 in
22+
pgmap v14: 192 pgs, 3 pools, 0 bytes data, 0 objects
23+
19378 MB used, 28944 MB / 49200 MB avail
24+
192 active+clean
25+
26+
If you see ``HEALTH_OK``, this means everything is working as it should.
27+
Note also ``monmap e3: 3 mons at...`` which means all three monitor containers are up and responding,
28+
and ``osdmap e7: 3 osds: 3 up, 3 in`` which means all three daemon containers are up and running.
29+
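The ``osdmap`` reading described above can also be checked mechanically. The following is a minimal sketch: ``all_osds_up_in`` is a hypothetical function name, and it assumes the ``osdmap`` line has the layout shown in the sample output (``osdmap e7: 3 osds: 3 up, 3 in``).

```shell
# Hypothetical helper: reads `ceph -s` style text on stdin and succeeds
# only if every OSD is reported as both "up" and "in".
all_osds_up_in() {
  awk '
    # Fields: osdmap e7: 3 osds: 3 up, 3 in
    $1 == "osdmap" { total = $3; up = $5; in_count = $7; found = 1 }
    END { exit !(found && total == up && total == in_count) }
  '
}

# Example (inside a store container):
#   ceph -s | all_osds_up_in || echo "some OSDs are down or out"
```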
30+
We can also see from the ``pgmap`` that we have 192 placement groups, all of which are ``active+clean``.
31+
32+
Managing users
33+
==============
34+
35+
There are two classes of Deis users: normal users and administrators.
36+
37+
* Users can use most of the features of Deis - creating and deploying applications, adding/removing domains, etc.
38+
* Administrators can perform all the actions that users can, but they can also create, edit, and destroy clusters.
39+
40+
The first user created on a Deis installation is automatically an administrator.
41+
42+
Promoting users to administrators
43+
---------------------------------
44+
45+
You can use the ``deis perms`` command to promote a user to an administrator:
46+
47+
.. code-block:: console
48+
49+
$ deis perms:create john --admin
