Commit 1315f9c

fix(store): fix shared etcd key defaults
In store-monitor's bin/boot script, we set default etcd keys that are used to template ceph.conf for all store components. Previously, this logic sat inside the monitor setup lock logic, so it ran only once in the lifetime of a cluster.

That breaks whenever new keys are added to ceph.conf and to this logic: upgraded clusters never run it a second time, so the new (and necessary) keys are never created. The cluster then fails to come up after an upgrade because confd cannot template ceph.conf in the store components.

Because *all* store monitors now set these defaults before checking whether they're the master, every monitor must set the same default values and must not write over a different default set by a competing monitor. We ensure this by removing the ability to override these values with environment variables. That ability never really made sense anyway, since a user can always override a default by manually setting the etcd key before cluster start.
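The behaviour described above can be sketched without a running etcd. This is only an illustration under stated assumptions: `store_mk`, `store_get`, `set_default`, and `boot_defaults` are hypothetical helpers, and a scratch directory stands in for etcd, with one file per key. The create-if-absent step plays the role of `etcdctl mk`, which does not replace a key that already exists.

```shell
# Hypothetical stand-in for etcd: one file per key in a scratch directory.
STORE_DIR=$(mktemp -d)

# Create-if-absent, analogous to `etcdctl mk`: fails when the key already exists.
store_mk() {
  # noclobber makes the redirection fail instead of overwriting an existing file
  ( set -o noclobber; printf '%s' "$2" > "$STORE_DIR/$1" ) 2>/dev/null
}

store_get() {
  cat "$STORE_DIR/$1"
}

# Same idea as etcd_set_default: "key already exists" is not treated as an error.
set_default() {
  store_mk "$1" "$2" || true
}

# Every monitor runs this on every boot, with identical hard-coded values.
boot_defaults() {
  set_default delayStart 15
  set_default size 3
  set_default minSize 1
  set_default pgNum 64
  set_default maxPGsPerOSDWarning 1536
}

# An operator pre-sets one key before the cluster starts.
store_mk pgNum 128

# Two monitors boot; the order is irrelevant because the values are identical.
boot_defaults
boot_defaults

# Simulate an upgrade that introduces a key the old cluster never had ...
rm "$STORE_DIR/maxPGsPerOSDWarning"
# ... the next boot creates it, because defaults no longer sit behind the setup lock.
boot_defaults

echo "pgNum=$(store_get pgNum) size=$(store_get size) maxPGsPerOSDWarning=$(store_get maxPGsPerOSDWarning)"
```

In this sketch the operator's `pgNum` survives every boot, while the missing key is filled in on the next one. If each monitor could pass a different value through its environment, the stored default would depend on which monitor wrote first.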
1 parent a222a80 commit 1315f9c

1 file changed

Lines changed: 12 additions & 15 deletions

store/monitor/bin/boot

@@ -8,13 +8,6 @@ ETCD="$HOST:$ETCD_PORT"
 ETCD_PATH=${ETCD_PATH:-/deis/store}
 HOSTNAME=`hostname`
 
-# These defaults are for 3 hosts
-NUM_STORES=${NUM_STORES:-3}
-PG_NUM=${PG_NUM:-64}
-## We set this to the number of PGs before re-evaluating the PG count so users upgrading don't see the warning
-## Now, 12 pools * 64 pgs per pool = 768 PGs per OSD
-PGS_PER_OSD_WARNING=${PGS_PER_OSD_WARNING:-1536}
-
 function etcd_set_default {
   set +e
   etcdctl --no-sync -C $ETCD mk $ETCD_PATH/$1 $2 >/dev/null 2>&1
@@ -25,21 +18,25 @@ function etcd_set_default {
   set -e
 }
 
+# set some defaults in etcd - these are templated in ceph.conf
+# These defaults are sane for 3 hosts, and may need to be tweaked for larger clusters.
+etcd_set_default delayStart 15
+etcd_set_default size 3 # maintain 3 copies of all data
+etcd_set_default minSize 1 # since we have 3 copies of data, the cluster can operate with just one host up
+etcd_set_default pgNum 64 # this gives us a reasonable number of placement groups per host, assuming 3 hosts and 12 pools
+
+# New clusters use 768 PGs per host (12 pools * 64 PGs per pool = 768 PGs per OSD)
+# However, upgraded clusters may still use 128 PGs per pool, so we set this to 1536 PGs per host to suppress the
+# "too many placement groups per host" warning
+etcd_set_default maxPGsPerOSDWarning 1536
+
 if ! etcdctl --no-sync -C $ETCD get ${ETCD_PATH}/monSetupComplete >/dev/null 2>&1 ; then
   echo "store-monitor: Ceph hasn't yet been deployed. Trying to deploy..."
   # let's rock and roll. we need to obtain a lock so we can ensure only one machine is trying to deploy the cluster
   if etcdctl --no-sync -C $ETCD mk ${ETCD_PATH}/monSetupLock $HOSTNAME >/dev/null 2>&1 \
   || [[ `etcdctl --no-sync -C $ETCD get ${ETCD_PATH}/monSetupLock` == "$HOSTNAME" ]] ; then
     echo "store-monitor: obtained the lock to proceed with setting up."
 
-    # set some defaults in etcd if they're not passed in as environment variables
-    # these are templated in ceph.conf
-    etcd_set_default delayStart 15
-    etcd_set_default maxPGsPerOSDWarning ${PGS_PER_OSD_WARNING}
-    etcd_set_default minSize 1
-    etcd_set_default pgNum ${PG_NUM}
-    etcd_set_default size ${NUM_STORES}
-
     # Generate administrator key
     ceph-authtool /etc/ceph/ceph.client.admin.keyring --create-keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
 
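The placement-group arithmetic in the diff's comments can be checked with a few lines of shell. The pool count and per-pool values come from those comments. The variable names are hypothetical. Two things are assumptions rather than facts from the commit: that each OSD carries every PG (three hosts, one OSD per host, `size` 3), and that the warning fires only when the count exceeds the threshold.

```shell
# Inputs taken from the comments in the diff.
pools=12
pgs_per_pool_new=64        # pgNum default for new clusters
pgs_per_pool_upgraded=128  # value upgraded clusters may still use
threshold=1536             # maxPGsPerOSDWarning default

# Assumption: with 3 copies on 3 OSDs, each OSD holds a copy of every PG,
# so PGs per OSD equals pools * PGs per pool.
new_total=$((pools * pgs_per_pool_new))
upgraded_total=$((pools * pgs_per_pool_upgraded))

echo "new clusters:      $new_total PGs per OSD"
echo "upgraded clusters: $upgraded_total PGs per OSD"

# Assumption: the warning triggers only above the threshold, not at it.
if [ "$upgraded_total" -le "$threshold" ]; then
  echo "upgraded clusters stay at or under the threshold of $threshold"
fi
```

Under those assumptions the upgraded case lands exactly on the 1536 threshold, and new clusters sit at half of it.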