docs(*): add Ceph quorum documentation

carmstrong · carmstrong · commit d7d3f6dc7cb7 · 2015-01-23T15:26:03.000-08:00
diff --git a/docs/managing_deis/add_remove_host.rst b/docs/managing_deis/add_remove_host.rst
@@ -111,6 +111,8 @@ that the store services on this host will be leaving the cluster.
 In this example we're going to remove the first node in our cluster, deis-1.
 That machine has an IP address of ``172.17.8.100``.
 
+.. _removing_an_osd:
+
 Removing an OSD
 ~~~~~~~~~~~~~~~
 
diff --git a/docs/managing_deis/index.rst b/docs/managing_deis/index.rst
@@ -18,6 +18,7 @@ Managing Deis
     operational_tasks
     platform_logging
     platform_monitoring
+    recovering-ceph-quorum
     security_considerations
     ssl-endpoints
     upgrading-deis
diff --git a/docs/managing_deis/recovering-ceph-quorum.rst b/docs/managing_deis/recovering-ceph-quorum.rst
@@ -0,0 +1,49 @@
+:title: Recovering Ceph quorum
+:description: Additional information for recovering clusters once Ceph has lost quorum.
+
+.. _recovering-ceph-quorum:
+
+Recovering Ceph quorum
+======================
+
+Ceph relies on `Paxos`_ to maintain a quorum among monitor services so that they agree on cluster state.
+In some cases Ceph can lose quorum, such as when hosts are added to and removed from the cluster in
+quick succession without removing the old hosts from Ceph (see :ref:`add_remove_host`).
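+
+Paxos only makes progress while a strict majority of the monitors listed in the monitor map
+(monmap) can communicate, and monitors that were never removed from Ceph still count toward
+that total. As a hypothetical illustration, consider a monmap that lists three monitors, two of
+whose hosts have since been destroyed:
+
+.. code-block:: text
+
+    monitors in monmap (n)               = 3
+    votes needed for quorum (n / 2 + 1)  = 3 / 2 + 1 = 2   (integer division)
+    monitors still running               = 1
+    1 < 2, so the surviving monitor cannot form a quorum on its own
+
+Adding new monitors does not help in this state, because changing the monmap itself requires quorum.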
+
+A telltale sign of quorum loss is that ``ceph -s`` times out when querying cluster health, reporting
+nothing but monitor faults on every host in the cluster.
+
+.. important::
+
+    Ceph refusing to do anything when it has lost quorum is a safety precaution to prevent you
+    from losing data. Attempting to recover from this situation requires knowledge about the state
+    of your cluster, and should only be attempted if data loss is not considered catastrophic (such as
+    when a recent backup is available). When in doubt, consult the Ceph and Deis communities for
+    assistance. Deis recommends regular backups to minimize impact should an issue like this occur.
+    For more information, see :ref:`backing_up_data`.
+
+The instructions below are intentionally general, since each recovery scenario is unique. They are
+intended only to point users in the right direction for recovery.
+
+To recover from Ceph quorum loss:
+
+#. Suspect quorum loss if ``ceph -s`` shows nothing but timeouts and/or monitor faults.
+#. Using ``deis-store-admin`` (see :ref:`using-store-admin`), query the `mon status`_ through the Ceph `admin socket`_ and confirm that enough stale entries remain to prevent Ceph from gaining quorum.
+#. Stop the platform with ``deisctl stop platform`` so that components stop trying to write data to the store. Alternatively, manually stopping every component except the router keeps application containers up and unaffected.
+#. Clean up stale entries in ``/deis/store/hosts`` so that dead monitors are no longer written out to clients.
+#. Update ``/deis/store/monSetupLock`` to point to the healthy monitor. This is not strictly necessary, since the value is only used when wiping the cluster and starting fresh with no data, but it is good cleanup.
+#. Start the healthy monitor and use the admin socket to get the current state of the cluster.
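+#. Given the cluster state as the monitor sees it, use `monmaptool`_ to manually remove stale monitor entries from the monmap (e.g. ``monmaptool --rm <hostname> --clobber /etc/ceph/monmap``, where ``<hostname>`` is the stale monitor's name as listed in the monmap, without the ``mon.`` prefix).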
+#. Stop the healthy monitor and use ``deis-store-admin`` to inject the prepared monmap into the monitor with ``ceph-mon -i <hostname> --inject-monmap /etc/ceph/monmap``.
+#. Start the monitor and ensure it achieves quorum by itself (use ``ceph -s`` and/or query ``mon_status`` on the admin socket).
+#. Start the other monitors and ensure they connect.
+#. Start the OSDs with ``deisctl start store-daemon``.
+#. Inspect the OSD map with ``ceph osd dump``. For each OSD that no longer exists, follow :ref:`removing_an_osd`, taking care to let its data finish relocating (watch the health with ``ceph -w``) before marking another OSD as ``out``.
+#. Once the OSD map reflects only the remaining healthy OSDs, start the remaining store services in order: ``deisctl start store-metadata``, then ``deisctl start store-gateway``.
+#. Confirm that the cluster is healthy with the metadata servers added, then start ``store-volume`` with ``deisctl start store-volume``.
+#. Start the remaining services with ``deisctl start platform``.
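+
+As a rough sketch only, the monitor-related steps above might look like the following for a
+hypothetical cluster in which ``deis-1`` holds the only healthy monitor and ``deis-2``
+(``172.17.8.101``) and ``deis-3`` (``172.17.8.102``) no longer exist. These hostnames and
+addresses are illustrative. The admin socket path assumes Ceph's default location, the etcd
+layout (one key per host IP under ``/deis/store/hosts``) and the lock value format are
+assumptions, and the commands assume the monitor's socket and ``/etc/ceph/monmap`` are reachable
+from ``deis-store-admin`` as the steps above imply. Inspect every key and value on your own
+cluster before removing or changing anything.
+
+.. code-block:: bash
+
+    # Step 2 (inside deis-store-admin): the admin socket answers even without quorum.
+    # Assumes the default socket path /var/run/ceph/$cluster-$name.asok.
+    ceph --admin-daemon /var/run/ceph/ceph-mon.deis-1.asok mon_status
+
+    # Step 4 (any host with etcdctl): list the host entries, then remove the dead ones.
+    # Assumes each key is named after the host's IP address.
+    etcdctl ls /deis/store/hosts
+    etcdctl rm /deis/store/hosts/172.17.8.101
+    etcdctl rm /deis/store/hosts/172.17.8.102
+
+    # Step 5 (optional cleanup): read the current value before changing it.
+    # Assumes the lock stores the monitor's hostname.
+    etcdctl get /deis/store/monSetupLock
+    etcdctl set /deis/store/monSetupLock deis-1
+
+    # Step 7: inspect the monmap, then remove each stale monitor by its monmap name.
+    monmaptool --print /etc/ceph/monmap
+    monmaptool --rm deis-2 --clobber /etc/ceph/monmap
+    monmaptool --rm deis-3 --clobber /etc/ceph/monmap
+
+    # Step 8 (healthy monitor stopped): inject the edited monmap.
+    ceph-mon -i deis-1 --inject-monmap /etc/ceph/monmap
+
+    # Step 9 (healthy monitor started again): confirm it forms a quorum of one.
+    ceph --admin-daemon /var/run/ceph/ceph-mon.deis-1.asok mon_status
+    ceph -s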
+
+.. _`admin socket`: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#using-the-monitor-s-admin-socket
+.. _`mon status`: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#understanding-mon-status
+.. _`monmaptool`: http://ceph.com/docs/master/man/8/monmaptool/
+.. _`Paxos`: http://en.wikipedia.org/wiki/Paxos_%28computer_science%29
diff --git a/docs/troubleshooting_deis/troubleshooting-store.rst b/docs/troubleshooting_deis/troubleshooting-store.rst
@@ -63,6 +63,12 @@ We can also see from the ``pgmap`` that we have 1344 placement groups, all of wh
 For additional information on troubleshooting Ceph, see `troubleshooting`_. Common issues with
 specific store components are detailed below.
 
+.. note::
+
+    If every ``ceph`` client command seems to hang and the only output is monitor faults, the
+    cluster may have lost quorum and manual intervention is necessary to recover.
+    For more information, see :ref:`recovering-ceph-quorum`.
+
 store-monitor
 -------------