Commit 901523f

Merge pull request #2953 from carmstrong/docs-ceph_quorum
docs(*): add Ceph quorum documentation
2 parents 10f26fd + d7d3f6d commit 901523f

5 files changed

Lines changed: 196 additions & 101 deletions

docs/managing_deis/add_remove_host.rst

Lines changed: 8 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -25,11 +25,11 @@ Inspecting health
2525
-----------------
2626

2727
Before we begin, we should check the state of the Ceph cluster to be sure it's healthy.
28-
We can do this by logging into any machine in the cluster, entering a store container, and then querying Ceph:
28+
To do this, we use ``deis-store-admin`` - see :ref:`using-store-admin`.
2929

3030
.. code-block:: console
3131
32-
core@deis-1 ~ $ nse deis-store-monitor
32+
core@deis-1 ~ $ nse deis-store-admin
3333
root@deis-1:/# ceph -s
3434
cluster 20038e38-4108-4e79-95d4-291d0eef2949
3535
health HEALTH_OK
@@ -111,6 +111,8 @@ that the store services on this host will be leaving the cluster.
111111
In this example we're going to remove the first node in our cluster, deis-1.
112112
That machine has an IP address of ``172.17.8.100``.
113113

114+
.. _removing_an_osd:
115+
114116
Removing an OSD
115117
~~~~~~~~~~~~~~~
116118

@@ -130,7 +132,7 @@ on any host in the cluster (except the one we're removing). In this example, I a
130132

131133
.. code-block:: console
132134
133-
core@deis-2 ~ $ nse deis-store-monitor
135+
core@deis-2 ~ $ nse deis-store-admin
134136
root@deis-2:/# ceph osd out 2
135137
marked out osd.2.
136138
@@ -178,7 +180,7 @@ Back inside a store container on ``deis-2``, we can finally remove the OSD:
178180

179181
.. code-block:: console
180182
181-
core@deis-2 ~ $ nse deis-store-monitor
183+
core@deis-2 ~ $ nse deis-store-admin
182184
root@deis-2:/# ceph osd crush remove osd.2
183185
removed item id 2 name 'osd.2' from crush map
184186
root@deis-2:/# ceph auth del osd.2
@@ -196,7 +198,7 @@ That's it! If we inspect the health, we see that there are now 3 osds again, and
196198

197199
.. code-block:: console
198200
199-
core@deis-2 ~ $ nse deis-store-monitor
201+
core@deis-2 ~ $ nse deis-store-admin
200202
root@deis-2:/# ceph -s
201203
cluster 20038e38-4108-4e79-95d4-291d0eef2949
202204
health HEALTH_OK
@@ -231,7 +233,7 @@ Back on another host, we can again enter a store container and then remove this
231233

232234
.. code-block:: console
233235
234-
core@deis-2 ~ $ nse deis-store-monitor
236+
core@deis-2 ~ $ nse deis-store-admin
235237
root@deis-2:/# ceph mon remove deis-1
236238
removed mon.deis-1 at 172.17.8.100:6789/0, there are now 3 monitors
237239
2014-11-04 06:57:59.712934 7f04bc942700 0 monclient: hunting for new mon

docs/managing_deis/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Managing Deis
1818
operational_tasks
1919
platform_logging
2020
platform_monitoring
21+
recovering-ceph-quorum
2122
security_considerations
2223
ssl-endpoints
2324
upgrading-deis
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
:title: Recovering Ceph quorum
2+
:description: Additional information for recovering clusters once Ceph has lost quorum.
3+
4+
.. _recovering-ceph-quorum:
5+
6+
Recovering Ceph quorum
7+
======================
8+
9+
Ceph relies on `Paxos`_ to maintain a quorum among monitor services so that they agree on cluster state.
10+
In some cases Ceph can lose quorum, such as when hosts are added and removed from the cluster in
11+
quick succession, without removing the old hosts from Ceph (see :ref:`add_remove_host`).
12+
13+
A telltale sign of quorum loss is that querying cluster health with ``ceph -s`` times out with monitor
14+
faults on every host in the cluster.
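
As a rough illustration of the symptom described above, the following Python sketch wraps a status query in a timeout. The ``probe`` helper and any time limit passed to it are assumptions made for this example; they are not part of Deis or Ceph.

```python
import subprocess


def probe(command, timeout_seconds):
    """Run a command and return (finished, output).

    finished is False when the command did not complete within
    timeout_seconds, which is how a hanging status query would show up.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return False, ""
    return True, result.stdout
```

Inside the ``deis-store-admin`` container, a call such as ``probe(["ceph", "-s"], 30)`` returning ``False`` as its first element would be consistent with quorum loss, but it is not proof of it; a network problem on a single host can look the same.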
15+
16+
.. important::
17+
18+
Ceph refusing to do anything when it has lost quorum is a safety precaution to prevent you
19+
from losing data. Attempting to recover from this situation requires knowledge about the state
20+
of your cluster, and should only be attempted if data loss is not considered catastrophic (such as
21+
when a recent backup is available). When in doubt, consult the Ceph and Deis communities for
22+
assistance. Deis recommends regular backups to minimize impact should an issue like this occur.
23+
For more information, see :ref:`backing_up_data`.
24+
25+
The instructions below are intentionally vague, as each recovery scenario will be unique. They are
26+
intended only to point users in the right direction for recovery.
27+
28+
To recover from Ceph quorum loss:
29+
30+
#. Suspect quorum loss because ``ceph -s`` shows nothing but timeouts and/or monitor faults
31+
#. :ref:`using-store-admin`, use the Ceph `admin socket`_ to query the `mon status`_, identifying that there are enough stale entries to prevent Ceph from gaining quorum
32+
#. Stop the platform with ``deisctl stop platform`` so components stop trying to write data to the store (alternatively, manually stopping every component except the router allows application containers to remain up, unaffected)
33+
#. Clean up stale entries in ``/deis/store/hosts`` so that dead monitors are not written out to clients
34+
#. Update ``/deis/store/monSetupLock`` to point to the healthy monitor -- note that this isn't strictly necessary, as this value is only used if wiping clean and starting a fresh cluster from scratch with no data, but it's good cleanup
35+
#. Start the healthy monitor and use the admin socket to get the current state of the cluster.
36+
#. Given the cluster state as the monitor sees it, use `monmaptool`_ to manually remove stale monitor entries from the monmap (i.e. ``monmaptool --rm mon.<hostname> --clobber /etc/ceph/monmap``)
37+
#. Stop the healthy monitor and use ``deis-store-admin`` to inject the prepared monmap into the monitor with ``ceph-mon -i <hostname> --inject-monmap /etc/ceph/monmap``
38+
#. Start the monitor and ensure it achieves quorum by itself (use ``ceph -s`` and/or query mon_status on the admin socket)
39+
#. Start the other monitors and ensure they connect
40+
#. Start the OSDs with ``deisctl start store-daemon``
41+
#. Observe the OSD map with ``ceph osd dump`` -- for each OSD that is no longer with us, follow :ref:`removing_an_osd` -- take care to ensure that the data is relocated (watch the health with ``ceph -w``) before marking another OSD as ``out``
42+
#. Once the OSD map reflects the now-healthy OSDs, start the remaining store services in order: ``deisctl start store-metadata`` and ``deisctl start store-gateway``
43+
#. Confirm that the cluster is healthy with the metadata servers added, and then start ``store-volume`` with ``deisctl start store-volume``.
44+
#. Start the remaining services with ``deisctl start platform``
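
The monitor check in the second step above can be sketched in Python. This is only an illustration under stated assumptions: the ``stale_monitors`` and ``has_majority`` helpers are hypothetical, and the JSON layout (a ``monmap`` object holding a ``mons`` list whose entries have a ``name`` field) is what ``mon_status`` is assumed to return on the admin socket. Verify it against your Ceph version before relying on it.

```python
import json


def stale_monitors(mon_status_json, live_hosts):
    """Return monmap entry names that are not among the known-live hosts.

    mon_status_json: text assumed to come from a mon_status query.
    live_hosts: names of hosts whose monitors are known to be recoverable.
    """
    status = json.loads(mon_status_json)
    names = [mon["name"] for mon in status["monmap"]["mons"]]
    return [name for name in names if name not in live_hosts]


def has_majority(monmap_size, live_count):
    """A quorum needs a strict majority of the monitors listed in the monmap."""
    return live_count > monmap_size // 2
```

With three monitors in the monmap and only one still alive, ``has_majority(3, 1)`` is ``False``, which matches the situation where stale entries must be removed from the monmap before the surviving monitor can form a quorum by itself.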
45+
46+
.. _`admin socket`: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#using-the-monitor-s-admin-socket
47+
.. _`mon status`: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#understanding-mon-status
48+
.. _`monmaptool`: http://ceph.com/docs/master/man/8/monmaptool/
49+
.. _`Paxos`: http://en.wikipedia.org/wiki/Paxos_%28computer_science%29

docs/troubleshooting_deis/index.rst

Lines changed: 8 additions & 95 deletions
@@ -6,6 +6,13 @@
66
Troubleshooting Deis
77
====================
88

9+
:Release: |version|
10+
:Date: |today|
11+
12+
.. toctree::
13+
14+
troubleshooting-store
15+
916
Common issues that users have run into when provisioning Deis are detailed below.
1017

1118
Logging in to the cluster
@@ -37,99 +44,7 @@ which will lead to issues running Deis successfully.
3744
A deis-store component fails to start
3845
-------------------------------------
3946

40-
The store component is the most complex component of Deis. As such, there are many ways for it to fail.
41-
Recall that the store components represent Ceph services as follows:
42-
43-
* ``store-monitor``: http://ceph.com/docs/giant/man/8/ceph-mon/
44-
* ``store-daemon``: http://ceph.com/docs/giant/man/8/ceph-osd/
45-
* ``store-gateway``: http://ceph.com/docs/giant/radosgw/
46-
* ``store-metadata``: http://ceph.com/docs/giant/man/8/ceph-mds/
47-
* ``store-volume``: a system service which mounts a `Ceph FS`_ volume to be used by the controller and logger components
48-
49-
Log output for store components can be viewed with ``deisctl status store-<component>`` (such as
50-
``deisctl status store-volume``). Additionally, the Ceph health can be queried by entering
51-
a store container with ``nse deis-store-monitor`` and then issuing a ``ceph -s``. This should output the
52-
health of the cluster like:
53-
54-
.. code-block:: console
55-
56-
core@deis-1 ~ $ nse deis-store-monitor
57-
root@deis-1:/# ceph -s
58-
cluster 20038e38-4108-4e79-95d4-291d0eef2949
59-
health HEALTH_OK
60-
monmap e3: 3 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0}, election epoch 16, quorum 0,1,2 deis-1,deis-2,deis-3
61-
mdsmap e10: 1/1/1 up {0=deis-2=up:active}, 2 up:standby
62-
osdmap e36: 3 osds: 3 up, 3 in
63-
pgmap v2096: 1344 pgs, 12 pools, 369 MB data, 448 objects
64-
24198 MB used, 23659 MB / 49206 MB avail
65-
1344 active+clean
66-
67-
If you see ``HEALTH_OK``, this means everything is working as it should.
68-
Note also ``monmap e3: 3 mons at...`` which means all three monitor containers are up and responding,
69-
``mdsmap e10: 1/1/1 up...`` which means all three metadata containers are up and responding,
70-
and ``osdmap e36: 3 osds: 3 up, 3 in`` which means all three daemon containers are up and running.
71-
72-
We can also see from the ``pgmap`` that we have 1344 placement groups, all of which are ``active+clean``.
73-
74-
For additional information on troubleshooting Ceph, see `troubleshooting`_. Common issues with
75-
specific store components are detailed below.
76-
77-
store-monitor
78-
~~~~~~~~~~~~~
79-
80-
The monitor is the first store component to start, and is required for any of the other store
81-
components to function properly. If a ``deisctl list`` indicates that any of the monitors are failing,
82-
it is likely due to a host issue. Common failure scenarios include not
83-
having adequate free storage on the host node - in that case, monitors will fail with errors similar to:
84-
85-
.. code-block:: console
86-
87-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053693 7fd0586a6700 0 mon.deis-staging-node1@0(leader).data_health(6) update_stats avail 1% total 5960684 used 56655
88-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053770 7fd0586a6700 -1 mon.deis-staging-node1@0(leader).data_health(6) reached critical levels of available space on
89-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053772 7fd0586a6700 0 ** Shutdown via Data Health Service **
90-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053821 7fd056ea3700 -1 mon.deis-staging-node1@0(leader) e3 *** Got Signal Interrupt ***
91-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053834 7fd056ea3700 1 mon.deis-staging-node1@0(leader) e3 shutdown
92-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054000 7fd056ea3700 0 quorum service shutdown
93-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054002 7fd056ea3700 0 mon.deis-staging-node1@0(shutdown).health(6) HealthMonitor::service_shutdown 1 services
94-
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054065 7fd056ea3700 0 quorum service shutdown
95-
96-
This is typically only an issue when deploying Deis on bare metal, as most cloud providers have adequately
97-
large volumes.
98-
99-
store-daemon
100-
~~~~~~~~~~~~
101-
102-
The daemons are responsible for actually storing the data on the filesystem. The cluster is configured
103-
to allow writes with just one daemon running, but the cluster will be running in a degraded state, so
104-
restoring all daemons to a running state as quickly as possible is paramount.
105-
106-
Daemons can be safely restarted with ``deisctl restart store-daemon``, but this will restart all daemons,
107-
resulting in downtime of the storage cluster until the daemons recover. Alternatively, issuing a
108-
``sudo systemctl restart deis-store-daemon`` on the host of the failing daemon will restart just
109-
that daemon.
110-
111-
store-gateway
112-
~~~~~~~~~~~~~
113-
114-
The gateway runs Apache and a FastCGI server to communicate with the cluster. Restarting the gateway
115-
will result in a short downtime for the registry component (and will prevent the database from
116-
backing up), but those components should recover as soon as the gateway comes back up.
117-
118-
store-metadata
119-
~~~~~~~~~~~~~~
120-
121-
The metadata servers are required for the **volume** to function properly. Only one is active at
122-
any one time, and the rest operate as hot standbys. The monitors will promote a standby metadata
123-
server should the active one fail.
124-
125-
store-volume
126-
~~~~~~~~~~~~
127-
128-
Without functioning monitors, daemons, and metadata servers, the volume service will likely hang
129-
indefinitely (or restart constantly). If the controller or logger happen to be running on a host with a
130-
failing store-volume, application logs will be lost until the volume recovers.
131-
132-
Note that store-volume requires CoreOS >= 471.1.0 for the CephFS kernel module.
47+
For information on troubleshooting a ``deis-store`` component, see :ref:`troubleshooting-store`.
13348

13449
Any component fails to start
13550
----------------------------
@@ -178,6 +93,4 @@ Other issues
17893

17994
Running into something not detailed here? Please `open an issue`_ or hop into #deis on Freenode IRC and we'll help!
18095

181-
.. _`Ceph FS`: https://ceph.com/docs/giant/cephfs/
18296
.. _`open an issue`: https://github.com/deis/deis/issues/new
183-
.. _`troubleshooting`: http://docs.ceph.com/docs/giant/rados/troubleshooting/
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
1+
:title: Troubleshooting deis-store
2+
:description: Resolutions for common issues with deis-store and Ceph.
3+
4+
.. _troubleshooting-store:
5+
6+
Troubleshooting deis-store
7+
==========================
8+
9+
The store component is the most complex component of Deis. As such, there are many ways for it to fail.
10+
Recall that the store components represent Ceph services as follows:
11+
12+
* ``store-monitor``: http://ceph.com/docs/giant/man/8/ceph-mon/
13+
* ``store-daemon``: http://ceph.com/docs/giant/man/8/ceph-osd/
14+
* ``store-gateway``: http://ceph.com/docs/giant/radosgw/
15+
* ``store-metadata``: http://ceph.com/docs/giant/man/8/ceph-mds/
16+
* ``store-volume``: a system service which mounts a `Ceph FS`_ volume to be used by the controller and logger components
17+
18+
Log output for store components can be viewed with ``deisctl status store-<component>`` (such as
19+
``deisctl status store-volume``). Additionally, the Ceph health can be queried by using the ``deis-store-admin``
20+
administrative container to access the cluster.
21+
22+
.. _using-store-admin:
23+
24+
Using store-admin
25+
-----------------
26+
27+
``deis-store-admin`` is an optional component that is helpful when diagnosing problems with ``deis-store``.
28+
It contains the ``ceph`` client and writes the necessary Ceph configuration files so it always has the
29+
most up-to-date configuration for the cluster.
30+
31+
To use ``deis-store-admin``, install and start it with ``deisctl``:
32+
33+
.. code-block:: console
34+
35+
$ deisctl install store-admin
36+
$ deisctl start store-admin
37+
38+
The container will now be running on all hosts in the cluster. Log into any of the hosts, enter
39+
the container with ``nse deis-store-admin``, and then issue a ``ceph -s`` to query the cluster's health.
40+
41+
The output should be similar to the following:
42+
43+
.. code-block:: console
44+
45+
core@deis-1 ~ $ nse deis-store-admin
46+
root@deis-1:/# ceph -s
47+
cluster 20038e38-4108-4e79-95d4-291d0eef2949
48+
health HEALTH_OK
49+
monmap e3: 3 mons at {deis-1=172.17.8.100:6789/0,deis-2=172.17.8.101:6789/0,deis-3=172.17.8.102:6789/0}, election epoch 16, quorum 0,1,2 deis-1,deis-2,deis-3
50+
mdsmap e10: 1/1/1 up {0=deis-2=up:active}, 2 up:standby
51+
osdmap e36: 3 osds: 3 up, 3 in
52+
pgmap v2096: 1344 pgs, 12 pools, 369 MB data, 448 objects
53+
24198 MB used, 23659 MB / 49206 MB avail
54+
1344 active+clean
55+
56+
If you see ``HEALTH_OK``, this means everything is working as it should.
57+
Note also ``monmap e3: 3 mons at...`` which means all three monitor containers are up and responding,
58+
``mdsmap e10: 1/1/1 up...`` which means all three metadata containers are up and responding,
59+
and ``osdmap e36: 3 osds: 3 up, 3 in`` which means all three daemon containers are up and running.
60+
61+
We can also see from the ``pgmap`` that we have 1344 placement groups, all of which are ``active+clean``.
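
The checks described above can also be scripted. The following Python sketch parses ``ceph -s`` text in the format shown in this section. The ``summarize`` and ``looks_healthy`` helpers are hypothetical, and the regular expressions assume this exact output layout, which differs between Ceph releases.

```python
import re

# Patterns assume the ``ceph -s`` layout shown in this document.
PATTERNS = {
    "health": r"health\s+(\S+)",
    "mons": r"monmap e\d+: (\d+) mons at",
    "quorum": r"quorum ([\d,]+)",
    "osds": r"osdmap e\d+: (\d+) osds: (\d+) up, (\d+) in",
    "pgs": r"pgmap v\d+: (\d+) pgs",
    "clean": r"(\d+) active\+clean",
}


def summarize(status_text):
    """Return the captured groups for each field, or None when a field is missing."""
    summary = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, status_text)
        summary[field] = match.groups() if match else None
    return summary


def looks_healthy(status_text):
    """True only if every monitor, OSD, and placement group is reported healthy."""
    s = summarize(status_text)
    if any(value is None for value in s.values()):
        return False
    mons_in_quorum = len(s["quorum"][0].split(","))
    osds_total, osds_up, osds_in = (int(n) for n in s["osds"])
    return (
        s["health"][0] == "HEALTH_OK"
        and mons_in_quorum == int(s["mons"][0])
        and osds_total == osds_up == osds_in
        and int(s["pgs"][0]) == int(s["clean"][0])
    )
```

Treating a missing field as unhealthy is a deliberate choice in this sketch: output that cannot be parsed should prompt a manual look rather than be reported as fine.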
62+
63+
For additional information on troubleshooting Ceph, see `troubleshooting`_. Common issues with
64+
specific store components are detailed below.
65+
66+
.. note::
67+
68+
If all of the ``ceph`` client commands seem to be hanging and the output is solely monitor
69+
faults, the cluster may have lost quorum and manual intervention is necessary to recover.
70+
For more information, see :ref:`recovering-ceph-quorum`.
71+
72+
store-monitor
73+
-------------
74+
75+
The monitor is the first store component to start, and is required for any of the other store
76+
components to function properly. If a ``deisctl list`` indicates that any of the monitors are failing,
77+
it is likely due to a host issue. Common failure scenarios include not
78+
having adequate free storage on the host node - in that case, monitors will fail with errors similar to:
79+
80+
.. code-block:: console
81+
82+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053693 7fd0586a6700 0 mon.deis-staging-node1@0(leader).data_health(6) update_stats avail 1% total 5960684 used 56655
83+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053770 7fd0586a6700 -1 mon.deis-staging-node1@0(leader).data_health(6) reached critical levels of available space on
84+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053772 7fd0586a6700 0 ** Shutdown via Data Health Service **
85+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053821 7fd056ea3700 -1 mon.deis-staging-node1@0(leader) e3 *** Got Signal Interrupt ***
86+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.053834 7fd056ea3700 1 mon.deis-staging-node1@0(leader) e3 shutdown
87+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054000 7fd056ea3700 0 quorum service shutdown
88+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054002 7fd056ea3700 0 mon.deis-staging-node1@0(shutdown).health(6) HealthMonitor::service_shutdown 1 services
89+
Oct 29 20:04:00 deis-staging-node1 sh[1158]: 2014-10-29 20:04:00.054065 7fd056ea3700 0 quorum service shutdown
90+
91+
This is typically only an issue when deploying Deis on bare metal, as most cloud providers have adequately
92+
large volumes.
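
To spot this condition before a monitor shuts down, the ``update_stats avail`` readings in the log excerpt above can be extracted with a short script. This Python sketch is an illustration only: ``low_space_readings`` is a hypothetical helper, and the default threshold of 5 percent is an assumption that should be checked against the monitor configuration of your Ceph release.

```python
import re

# Matches the data_health line shown in the log excerpt above.
AVAIL_PATTERN = re.compile(r"update_stats avail (\d+)%")


def low_space_readings(log_text, critical_percent=5):
    """Return every reported 'avail' percentage at or below critical_percent."""
    readings = [int(value) for value in AVAIL_PATTERN.findall(log_text)]
    return [value for value in readings if value <= critical_percent]
```

The log text could come from the monitor's unit output on the host; an empty list means no reading at or below the threshold was found in the text supplied.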
93+
94+
store-daemon
95+
------------
96+
97+
The daemons are responsible for actually storing the data on the filesystem. The cluster is configured
98+
to allow writes with just one daemon running, but the cluster will be running in a degraded state, so
99+
restoring all daemons to a running state as quickly as possible is paramount.
100+
101+
Daemons can be safely restarted with ``deisctl restart store-daemon``, but this will restart all daemons,
102+
resulting in downtime of the storage cluster until the daemons recover. Alternatively, issuing a
103+
``sudo systemctl restart deis-store-daemon`` on the host of the failing daemon will restart just
104+
that daemon.
105+
106+
store-gateway
107+
-------------
108+
109+
The gateway runs Apache and a FastCGI server to communicate with the cluster. Restarting the gateway
110+
will result in a short downtime for the registry component (and will prevent the database from
111+
backing up), but those components should recover as soon as the gateway comes back up.
112+
113+
store-metadata
114+
--------------
115+
116+
The metadata servers are required for the **volume** to function properly. Only one is active at
117+
any one time, and the rest operate as hot standbys. The monitors will promote a standby metadata
118+
server should the active one fail.
119+
120+
store-volume
121+
------------
122+
123+
Without functioning monitors, daemons, and metadata servers, the volume service will likely hang
124+
indefinitely (or restart constantly). If the controller or logger happen to be running on a host with a
125+
failing store-volume, application logs will be lost until the volume recovers.
126+
127+
Note that store-volume requires CoreOS >= 471.1.0 for the CephFS kernel module.
128+
129+
.. _`Ceph FS`: https://ceph.com/docs/giant/cephfs/
130+
.. _`troubleshooting`: http://docs.ceph.com/docs/giant/rados/troubleshooting/
