Commit 80c60e0

Author: Matthew Fisher

ref(scheduler): introduce deis/publisher
In Deis's scheduler abstraction, we run a "sidecar" systemd unit which publishes the running host and port of the application to etcd for service discovery. The issues with this sidecar container are as follows:

- it must live on the same host as the container
- it must wait for the container to come up before publishing
- it relies on using the docker client directly

The last point is crucial. This means that we are effectively monitoring the docker client for container uptime, which is not the same as monitoring the container directly.

This has been refactored into a micro-service image called deis/publisher. Publisher connects directly to a docker socket bind-mounted into the container and listens to the docker events API for running containers. Running containers whose names follow our current naming scheme (e.g. hardy-woodsman_v2.web.1) are published to etcd for service discovery.

This allows us to remove the hard dependency on running a sidecar container to publish the app container to etcd. Instead, one container per host publishes all apps running on that host.

MIGRATING: to migrate existing hosts to this change, you can run the following:

    ><> # rsync publisher/systemd/deis-publisher.service to every node
    ><> for i in deis-1.example.com deis-2.example.com; do ssh core@$i "sudo systemctl enable /home/core/deis-publisher.service && sudo systemctl start deis-publisher.service"; done

Then, you'll want to remove each announce sidecar attached to the applications:

    ><> fleetctl destroy <appname>_<version>.<proctype>.<container ID>-announce
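
The publishing rule described in the commit message can be sketched in Python as follows. This is only an illustration, not publisher's actual code (which is built from Go, per publisher/Dockerfile). The pattern `APP_NAME` and the helper `service_entry` are hypothetical names chosen for this example, and the `/deis/services/<app>/<container name>` key layout is borrowed from the ANNOUNCE_TEMPLATE removed in this commit, on the assumption that publisher keeps the same layout.

```python
import re

# Hypothetical pattern for names such as "hardy-woodsman_v2.web.1":
# <app>_v<version>.<proctype>.<number>
APP_NAME = re.compile(
    r'^(?P<app>[a-z0-9-]+)_v(?P<version>\d+)\.(?P<proctype>[a-z-]+)\.(?P<num>\d+)$')


def service_entry(container_name, host, port):
    """Return the (key, value) pair to publish, or None for non-app containers."""
    match = APP_NAME.match(container_name)
    if match is None:
        return None
    # assumed key layout, taken from the removed announcer unit
    key = '/deis/services/{}/{}'.format(match.group('app'), container_name)
    value = '{}:{}'.format(host, port)
    return key, value


print(service_entry('hardy-woodsman_v2.web.1', '10.0.0.5', 49153))
print(service_entry('deis-router', '10.0.0.5', 80))
```

Containers whose names do not fit the pattern (platform components, log units) are skipped, which is what lets a single per-host process replace the per-container sidecars.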
1 parent 2459c2c commit 80c60e0

12 files changed

Lines changed: 261 additions & 74 deletions

Makefile

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
 
 include includes.mk
 
-COMPONENTS=builder cache controller database logger registry router
-START_ORDER=logger database cache registry controller builder router
+COMPONENTS=builder cache controller database logger publisher registry router
+START_ORDER=publisher logger database cache registry controller builder router
 CLIENTS=client deisctl
 
 all: build run

controller/api/models.py

Lines changed: 4 additions & 7 deletions
@@ -436,9 +436,6 @@ def _get_command(self):
 
     _command = property(_get_command)
 
-    def _command_announceable(self):
-        return self._command.lower() in ['start web', '']
-
     def clone(self, release):
         c = Container.objects.create(owner=self.owner,
                                      app=self.app,
@@ -459,7 +456,7 @@ def create(self):
                 name=job_id,
                 image=image,
                 command=self._command,
-                use_announcer=self._command_announceable(), **kwargs)
+                **kwargs)
         except Exception as e:
             err = '{} (create): {}'.format(job_id, e)
             log_event(self.app, err, logging.ERROR)
@@ -469,7 +466,7 @@ def create(self):
     def start(self):
         job_id = self._job_id
         try:
-            self._scheduler.start(job_id, self._command_announceable())
+            self._scheduler.start(job_id)
         except Exception as e:
             err = '{} (start): {}'.format(job_id, e)
             log_event(self.app, err, logging.WARNING)
@@ -479,7 +476,7 @@ def start(self):
     def stop(self):
         job_id = self._job_id
         try:
-            self._scheduler.stop(job_id, self._command_announceable())
+            self._scheduler.stop(job_id)
         except Exception as e:
             err = '{} (stop): {}'.format(job_id, e)
             log_event(self.app, err, logging.ERROR)
@@ -489,7 +486,7 @@ def stop(self):
     def destroy(self):
         job_id = self._job_id
         try:
-            self._scheduler.destroy(job_id, self._command_announceable())
+            self._scheduler.destroy(job_id)
         except Exception as e:
             err = '{} (destroy): {}'.format(job_id, e)
             log_event(self.app, err, logging.ERROR)

controller/scheduler/chaos.py

Lines changed: 4 additions & 4 deletions
@@ -25,28 +25,28 @@ def tearDown(self):
 
     # job api
 
-    def create(self, name, image, command, use_announcer, **kwargs):
+    def create(self, name, image, command, **kwargs):
         if random.random() < CREATE_ERROR_RATE:
             raise RuntimeError
         return True
 
-    def start(self, name, use_announcer):
+    def start(self, name):
         """
         Start an idle job
         """
         if random.random() < START_ERROR_RATE:
             raise RuntimeError
         return True
 
-    def stop(self, name, use_announcer):
+    def stop(self, name):
         """
         Stop a running job
         """
         if random.random() < STOP_ERROR_RATE:
             raise RuntimeError
         return True
 
-    def destroy(self, name, use_announcer):
+    def destroy(self, name):
         """
         Destroy an existing job
         """

controller/scheduler/coreos.py

Lines changed: 4 additions & 54 deletions
@@ -101,15 +101,12 @@ def _get_machines(self):
 
     # container api
 
-    def create(self, name, image, command='', template=None, use_announcer=True, **kwargs):
+    def create(self, name, image, command='', template=None, **kwargs):
         """Create a container"""
         self._create_container(name, image, command,
                                template or copy.deepcopy(CONTAINER_TEMPLATE), **kwargs)
         self._create_log(name, image, command, copy.deepcopy(LOG_TEMPLATE))
 
-        if use_announcer:
-            self._create_announcer(name, image, command, copy.deepcopy(ANNOUNCE_TEMPLATE))
-
     def _create_container(self, name, image, command, unit, **kwargs):
         l = locals().copy()
         l.update(re.match(MATCH, name).groupdict())
@@ -146,22 +143,10 @@ def _create_log(self, name, image, command, unit):
         # post unit to fleet
         self._put_unit(name+'-log', {"desiredState": "launched", "options": unit})
 
-    def _create_announcer(self, name, image, command, unit):
-        l = locals().copy()
-        l.update(re.match(MATCH, name).groupdict())
-        # construct unit from template
-        for f in unit:
-            f['value'] = f['value'].format(**l)
-        # post unit to fleet
-        self._put_unit(name+'-announce', {"desiredState": "launched", "options": unit})
-
-    def start(self, name, use_announcer=True):
+    def start(self, name):
         """Start a container"""
         self._wait_for_container(name)
 
-        if use_announcer:
-            self._wait_for_announcer(name)
-
     def _wait_for_container(self, name):
         # we bump to 20 minutes here to match the timeout on the router and in the app unit files
         for _ in range(1200):
@@ -177,24 +162,6 @@ def _wait_for_container(self, name):
         else:
             raise RuntimeError('container failed to start')
 
-    def _wait_for_announcer(self, name):
-        # wait a bit for the announcer to come up, otherwise we may have hit
-        # https://github.com/docker/docker/issues/8022
-        for _ in range(30):
-            states = self._get_state(name)
-            if states and len(states.get('states', [])) == 1:
-                state = states.get('states')[0]
-                subState = state.get('systemdSubState')
-                if subState == 'running':
-                    # wait for the router to be reconfigured
-                    time.sleep(10)
-                    break
-                elif subState == 'failed':
-                    raise RuntimeError('announcer failed to start')
-            time.sleep(1)
-        else:
-            raise RuntimeError('announcer timeout on start')
-
     def _wait_for_destroy(self, name):
         for _ in range(30):
             states = self._get_state(name)
@@ -204,15 +171,13 @@ def _wait_for_destroy(self, name):
         else:
             raise RuntimeError('timeout on container destroy')
 
-    def stop(self, name, use_announcer=True):
+    def stop(self, name):
         """Stop a container"""
         raise NotImplementedError
 
-    def destroy(self, name, use_announcer=True):
+    def destroy(self, name):
         """Destroy a container"""
         funcs = []
-        if use_announcer:
-            funcs.append(functools.partial(self._destroy_announcer, name))
         funcs.append(functools.partial(self._destroy_container, name))
         funcs.append(functools.partial(self._destroy_log, name))
         # call all destroy functions, ignoring any errors
@@ -226,9 +191,6 @@ def destroy(self, name, use_announcer=True):
     def _destroy_container(self, name):
         return self._delete_unit(name)
 
-    def _destroy_announcer(self, name):
-        return self._delete_unit(name+'-announce')
-
     def _destroy_log(self, name):
         return self._delete_unit(name+'-log')
 
@@ -350,18 +312,6 @@ def attach(self, name):
 ]
 
 
-ANNOUNCE_TEMPLATE = [
-    {"section": "Unit", "name": "Description", "value": "{name} announce"},
-    {"section": "Unit", "name": "BindsTo", "value": "{name}.service"},
-    {"section": "Service", "name": "EnvironmentFile", "value": "/etc/environment"},
-    {"section": "Service", "name": "ExecStartPre", "value": '''/bin/sh -c "until docker inspect -f '{{{{range $i, $e := .NetworkSettings.Ports }}}}{{{{$p := index $e 0}}}}{{{{$p.HostPort}}}}{{{{end}}}}' {name} >/dev/null 2>&1; do sleep 2; done; port=$(docker inspect -f '{{{{range $i, $e := .NetworkSettings.Ports }}}}{{{{$p := index $e 0}}}}{{{{$p.HostPort}}}}{{{{end}}}}' {name}); if [[ -z $port ]]; then echo We have no port...; exit 1; fi; echo Waiting for $port/tcp...; until netstat -lnt | grep :$port >/dev/null; do sleep 1; done"'''}, # noqa
-    {"section": "Service", "name": "ExecStart", "value": '''/bin/sh -c "port=$(docker inspect -f '{{{{range $i, $e := .NetworkSettings.Ports }}}}{{{{$p := index $e 0}}}}{{{{$p.HostPort}}}}{{{{end}}}}' {name}); echo Connected to $COREOS_PRIVATE_IPV4:$port/tcp, publishing to etcd...; while netstat -lnt | grep :$port >/dev/null; do etcdctl set /deis/services/{app}/{name} $COREOS_PRIVATE_IPV4:$port --ttl 60 >/dev/null; sleep 45; done"'''}, # noqa
-    {"section": "Service", "name": "ExecStop", "value": "/usr/bin/etcdctl rm --recursive /deis/services/{app}/{name}"}, # noqa
-    {"section": "Service", "name": "TimeoutStartSec", "value": "20m"},
-    {"section": "X-Fleet", "name": "MachineOf", "value": "{name}.service"},
-]
-
-
 RUN_TEMPLATE = [
     {"section": "Unit", "name": "Description", "value": "{name} admin command"},
     {"section": "Service", "name": "ExecStartPre", "value": '''/bin/sh -c "IMAGE=$(etcdctl get /deis/registry/host 2>&1):$(etcdctl get /deis/registry/port 2>&1)/{image}; docker pull $IMAGE"'''}, # noqa
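
The wait helpers in this file (`_wait_for_container`, `_wait_for_destroy`, and the removed `_wait_for_announcer`) share one polling idiom: loop a bounded number of times, `break` on success, and let the loop's `else` clause raise on timeout. A self-contained sketch of that idiom follows. The function `wait_until_running` and its `get_substate` callable are hypothetical stand-ins for the fleet state lookup, not names from the Deis code base.

```python
import time


def wait_until_running(get_substate, attempts=30, delay=1.0):
    """Poll get_substate() until it returns 'running'.

    Raises RuntimeError if the unit reports 'failed' or if every attempt is used up.
    """
    for _ in range(attempts):
        sub_state = get_substate()
        if sub_state == 'running':
            break
        elif sub_state == 'failed':
            raise RuntimeError('unit failed to start')
        time.sleep(delay)
    else:
        # reached only when the loop finished without hitting `break`
        raise RuntimeError('timeout waiting for unit to start')


# usage: a fake state source that reports 'running' on the third poll
states = iter(['activating', 'activating', 'running'])
wait_until_running(lambda: next(states), attempts=5, delay=0)
```

With the announcer gone, `start` runs only one such loop per container instead of two.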

controller/scheduler/mock.py

Lines changed: 4 additions & 4 deletions
@@ -20,25 +20,25 @@ def tearDown(self):
 
     # container api
 
-    def create(self, name, image, command, use_announcer, **kwargs):
+    def create(self, name, image, command, **kwargs):
         """
         Create a new container
         """
         return
 
-    def start(self, name, use_announcer):
+    def start(self, name):
         """
         Start a container
         """
         return
 
-    def stop(self, name, use_announcer):
+    def stop(self, name):
         """
         Stop a container
         """
         return
 
-    def destroy(self, name, use_announcer):
+    def destroy(self, name):
         """
         Destroy a container
         """

docs/understanding_deis/concepts.rst

Lines changed: 3 additions & 3 deletions
@@ -99,9 +99,9 @@ changed, making it easy to rollback code and configuration.
 Run Stage
 ^^^^^^^^^
 The run stage shells out jobs to the scheduler. The scheduler is in control of balancing the
-processes evenly across the cluster, as well as the announcers and the loggers for each
-application. The scheduler uses SSH to submit jobs to each node in the cluster and updates
-the proxy component between releases, making zero downtime deployments possible.
+processes evenly across the cluster, as well as the loggers for each application. The
+scheduler uses SSH to submit jobs to each node in the cluster and updates the proxy
+component between releases, making zero downtime deployments possible.
 
 .. _concepts_backing_services:
 

publisher/Dockerfile

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+FROM deis/go:latest
+
+WORKDIR /go/src/github.com/deis/deis/publisher
+CMD /go/bin/publisher
+
+ADD . /go/src/github.com/deis/deis/publisher
+RUN CGO_ENABLED=0 go get -a -ldflags '-s' github.com/deis/deis/publisher

publisher/LICENSE

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+Copyright 2014 OpDemand LLC
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

publisher/Makefile

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+include ../includes.mk
+
+DOCKER_IMAGE := deis/publisher
+BUILD_IMAGE := $(DOCKER_IMAGE)-build
+RELEASE_IMAGE := $(DOCKER_IMAGE):$(BUILD_TAG)
+REMOTE_IMAGE := $(REGISTRY)/$(RELEASE_IMAGE)
+
+build: check-docker
+	docker build -t $(BUILD_IMAGE) .
+	docker cp `docker run -d $(BUILD_IMAGE)`:/go/bin/publisher image/
+	docker build -t $(DOCKER_IMAGE) image
+	rm -rf image/publisher
+
+clean: check-docker check-registry
+	docker rmi $(RELEASE_IMAGE) $(REMOTE_IMAGE)
+
+full-clean: check-docker check-registry
+	docker images -q $(DOCKER_IMAGE) | xargs docker rmi -f
+	docker images -q $(REGISTRY)/$(DOCKER_IMAGE) | xargs docker rmi -f
+
+install: check-deisctl
+	deisctl install publisher
+
+dev-release: check-docker check-registry check-deisctl
+	docker tag $(RELEASE_IMAGE) $(REMOTE_IMAGE)
+	docker push $(REMOTE_IMAGE)
+	deisctl config publisher set image=$(REMOTE_IMAGE)
+
+release: check-docker
+	docker push $(DOCKER_IMAGE)
+
+restart: stop start
+
+run: install start
+
+start: check-deisctl
+	deisctl start publisher
+
+stop: check-deisctl
+	deisctl stop publisher
+
+test:
+	@echo no unit tests
+
+uninstall: check-deisctl
+	deisctl uninstall publisher

publisher/README.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+publisher
+=========
+
+Publisher listens directly to a docker socket bind-mounted into the container and listens to the
+docker events API for running containers on the host. Deis applications are published to etcd for
+service discovery.
+
+## Running this Container
+
+Set $ETCD_HOST to be the IP address/hostname of the etcd endpoint you wish to target, and
+$HOST to be the IP address of the host running this container:
+
+    ><> docker run -d -v /var/run/docker.sock:/tmp/docker.sock -e ETCD_HOST=192.168.0.1 -e HOST=192.168.0.1 deis/publisher
+
+## Building from Source
+
+To build the image, run `make build`.
+
+The build/runtime environment is split into two parts:
+
+### The build environment
+
+Based on deis/go, this image installs Go and compiles publisher into a binary.
+
+### The runtime environment
+
+Leveraging the build environment, this image pulls in the standalone binary compiled in
+the build environment and injects it into a minimal standalone container, minimizing the
+disk space footprint that this image takes up. In fact, this image is < 5MB:
+
+    ><> docker images | grep publisher
+    deis/publisher master 7974d140b07d 11 minutes ago 4.678 MB
+    deis/publisher-build master 75983660e714 11 minutes ago 1.091 GB
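
As a rough illustration of the event-driven design the README describes, the sketch below folds a stream of simplified container events into the set of etcd entries that should exist. It is not publisher's source and talks to neither Docker nor etcd. The event record shape (`status`, `name`, `port`), the helper `apply_events`, and the `/deis/services/<app>/<name>` key layout (reused from the removed announcer unit) are all assumptions made for this example.

```python
def apply_events(events, host):
    """Fold simplified container events into a dict of etcd key -> 'host:port'."""
    published = {}
    for event in events:
        # assumed naming scheme: <app>_<version>.<proctype>.<number>
        app = event['name'].split('_', 1)[0]
        key = '/deis/services/{}/{}'.format(app, event['name'])
        if event['status'] == 'start':
            # container came up: (re)publish its endpoint
            published[key] = '{}:{}'.format(host, event['port'])
        elif event['status'] in ('stop', 'die'):
            # container went away: drop its entry if present
            published.pop(key, None)
    return published


events = [
    {'status': 'start', 'name': 'hardy-woodsman_v2.web.1', 'port': 49153},
    {'status': 'start', 'name': 'hardy-woodsman_v2.web.2', 'port': 49154},
    {'status': 'die', 'name': 'hardy-woodsman_v2.web.1', 'port': 49153},
]
print(apply_events(events, '192.168.0.1'))
```

Here `host` plays the role of the `$HOST` value passed to the container, since a process inside the container cannot otherwise know the address other nodes should use to reach the app.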
