Commit dd2b505

author
Jonathan Chauncey
committed
docs(logging,monitoring): Update platform-logging and monitoring docs with recent changes
1 parent 01dacdb commit dd2b505

2 files changed

Lines changed: 76 additions & 88 deletions

File tree

src/managing-workflow/platform-logging.md

Lines changed: 47 additions & 56 deletions
Original file line numberDiff line numberDiff line change
@@ -2,44 +2,16 @@
22

33
The logging platform is made up of 2 components - [Fluentd](https://github.com/deis/fluentd) and [Logger](https://github.com/deis/logger).
44

5-
[Fluentd](https://github.com/deis/fluentd) runs on every worker node of the cluster and is deployed as a [Daemon Set](http://kubernetes.io/v1.1/docs/admin/daemons.html). The Fluentd pods capture all of the stderr and stdout streams of every container running on the host (even those not hosted directly by kubernetes). It then sends this data via the Syslog UDP port (514) to the Logger component.
5+
[Fluentd](https://github.com/deis/fluentd) runs on every worker node of the cluster and is deployed as a [Daemon Set](http://kubernetes.io/v1.1/docs/admin/daemons.html). The Fluentd pods capture all of the stderr and stdout streams of every container running on the host (even those not hosted directly by kubernetes). Once the log message arrives in our [custom fluentd plugin](https://github.com/deis/fluentd/tree/master/rootfs/opt/fluentd/deis-output) we determine where the message originated.
66

7-
Logger acts like a syslog server and receives all log messages that are occurring on the cluster. It then filters this data to only Deis deploy applications and stores those log messages in a ring buffer where they can be fetched via the Deis CLI.
7+
If the message is from the [Workflow Controller](https://github.com/deis/controller) or from an application deployed via Workflow, we send it to the logs topic on the local [NSQD](http://nsq.io) instance.
88

9-
## Installation
9+
If the message is from the [Workflow Router](https://github.com/deis/router), we build an InfluxDB-compatible message and send it to the same NSQD instance, but place it on the metrics topic instead.
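The routing rule described above can be sketched as follows. This is only an illustration in Python, not the plugin's actual code (the real plugin lives in the fluentd repository). The source labels (`controller`, `app`, `router`), the function names, and the choice to return `None` for any other source are assumptions made for this example. The metrics helper emits InfluxDB line protocol without escaping and writes every field as a plain number.

```python
from typing import Optional

LOGS_TOPIC = "logs"
METRICS_TOPIC = "metrics"


def topic_for(source: str) -> Optional[str]:
    """Pick the NSQ topic for a message based on where it originated.

    The source labels are hypothetical; None means this sketch has no
    topic for the message.
    """
    if source in ("controller", "app"):
        return LOGS_TOPIC
    if source == "router":
        return METRICS_TOPIC
    return None


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one point as InfluxDB line protocol.

    Simplified: no escaping of special characters, numeric fields only.
    """
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {ts_ns}"
```

For example, `to_line_protocol("router_request", {"app": "web"}, {"status": 200}, 1)` yields a single line with the measurement and tags, then the fields, then the timestamp, which is the shape InfluxDB expects on its write path.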
1010

11-
The logging system is part of the main installation of Workflow. You will need to watch the components come up and verify they are in a running state by executing the following command:
11+
Logger then acts as a consumer: it reads messages off the NSQ logs topic and stores them in a local Redis instance. When a user retrieves log entries with the `deis logs` command, Controller makes an HTTP request to Logger, which fetches the appropriate data from Redis.
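To make the consume-store-fetch flow concrete, here is a minimal sketch under stated assumptions. An in-memory dictionary stands in for Redis so the example is self-contained, and the `LogStore` class, its method names, and the per-application line cap are all hypothetical rather than taken from Logger itself.

```python
from collections import deque


class LogStore:
    """In-memory stand-in for the Redis-backed storage in Logger."""

    def __init__(self, max_lines: int = 1000):
        # max_lines is an assumed cap, not a documented Logger setting.
        self._max_lines = max_lines
        self._logs = {}

    def append(self, app: str, line: str) -> None:
        """Store one message consumed from the logs topic."""
        if app not in self._logs:
            # A bounded deque silently drops the oldest line when full.
            self._logs[app] = deque(maxlen=self._max_lines)
        self._logs[app].append(line)

    def tail(self, app: str, count: int) -> list:
        """Return up to `count` of the most recent lines for an app."""
        if count <= 0:
            return []
        return list(self._logs.get(app, ()))[-count:]
```

In this sketch `append` plays the role of the NSQ consumer callback and `tail` plays the role of the handler behind the HTTP request that Controller makes on behalf of `deis logs`.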
1212

13-
```
14-
$ kubectl --namespace=deis get pods
15-
```
16-
17-
You should see output similar to this:
18-
```
19-
NAME READY STATUS RESTARTS AGE
20-
deis-builder-2qgil 1/1 Running 2 17h
21-
deis-controller-6rivh 1/1 Running 3 17h
22-
deis-database-iou5f 1/1 Running 0 17h
23-
deis-logger-6er1f 1/1 Running 0 1h
24-
deis-logger-fluentd-4asyw 1/1 Running 0 1h
25-
deis-logger-fluentd-tbhvf 1/1 Running 0 1h
26-
deis-minio-2jnr7 1/1 Running 0 17h
27-
deis-registry-terrk 1/1 Running 4 17h
28-
deis-router-jakw6 1/1 Running 0 17h
29-
deis-workflow-manager-f1ige 1/1 Running 0 33m
30-
```
31-
32-
There should be a fluentd pod per worker node of your Kubernetes cluster. So if you are running a 3 node cluster with 1 master and 2 workers you will have 2 fluentd pods running.
33-
34-
Once you have verified that the pods have started correctly you will need to restart your controller pod so that it can capture the correct information about how to talk to the logger pod.
35-
36-
```
37-
kubectl --namespace=deis delete pods <deis-controller-pod>
38-
```
39-
40-
The replication controller will restart a new pod with all of the correct information.
41-
42-
Once the pod has restarted, you can verify the logging system is working by going to a deployed app and executing the `deis logs` command. If an error occurs the CLI will print a user friendly message about how to debug the issue.
13+
## Debugging Logger
14+
If the `deis logs` command encounters an error it will return the following message:
4315

4416
```
4517
Error: There are currently no log messages. Please check the following things:
@@ -50,38 +22,45 @@ Error: There are currently no log messages. Please check the following things:
5022

5123
## Architecture Diagram
5224
```
53-
┌──────────────┐
54-
│ │
55-
│ Host ├─────┐
56-
│ Fluentd│ │
57-
└──────────────┘ UDP
58-
59-
┌──────────────┐ │ ┌──────────────┐
60-
│ │ │ │ Logger │
61-
│ Host │─UDP─┼─────▶│ Host │
62-
│ Fluentd│ │ │ Fluentd │
63-
└──────────────┘ │ └──────────────┘
64-
┌──────────────┐ │
65-
│ │ UDP
66-
│ Host │─────┘
67-
│ Fluentd│
68-
└──────────────┘
25+
┌────────┐
26+
│ Router │ ┌────────┐ ┌─────┐
27+
└────────┘ │ Logger │◀───▶│Redis│
28+
│ └────────┘ └─────┘
29+
Log file ▲
30+
│ │
31+
▼ │
32+
┌────────┐ ┌─────────┐ logs/metrics ┌─────┐
33+
│App Logs│──Log File──▶│ fluentd │───────topics─────▶│ NSQ │
34+
└────────┘ └─────────┘ └─────┘
35+
36+
37+
┌─────────────┐ │
38+
│ HOST │ ▼
39+
│ Telegraf │───┐ ┌────────┐
40+
└─────────────┘ │ │Telegraf│
41+
│ └────────┘
42+
┌─────────────┐ │ │
43+
│ HOST │ │ ┌───────────┐ │
44+
│ Telegraf │───┼───▶│ InfluxDB │◀────Wire ───────────┘
45+
└─────────────┘ │ └───────────┘ Protocol
46+
│ ▲
47+
┌─────────────┐ │ │
48+
│ HOST │ │ ▼
49+
│ Telegraf │───┘ ┌──────────┐
50+
└─────────────┘ │ Grafana │
51+
└──────────┘
6952
```
7053

7154
## Default Configuration
7255
By default the Fluentd pod can be configured to talk to numerous syslog endpoints. So for example it is possible to have Fluentd send log messages to both the Logger component and [Papertrail](https://papertrailapp.com/). This allows production deployments of Deis to satisfy stringent logging requirements such as offsite backups of log data.
7356

74-
Configuring Fluentd to talk to multiple syslog endpoints means adding the following stanzas to the Fluentd daemonset manifest -
57+
Configuring Fluentd to talk to multiple syslog endpoints means adding the following stanzas to the [Fluentd daemonset manifest](https://github.com/deis/charts/blob/master/workflow-v2.1.0/tpl/deis-logger-fluentd-daemon.yaml) -
7558

7659
```
7760
env:
7861
- name: "SYSLOG_HOST_1"
79-
value: $(DEIS_LOGGER_SERVICE_HOST)
62+
value: "my.syslog.host"
8063
- name: "SYSLOG_PORT_1"
81-
value: $(DEIS_LOGGER_SERVICE_PORT_TRANSPORT)
82-
- name: "SYSLOG_HOST_2"
83-
value: "my.syslog.host.2"
84-
- name: "SYSLOG_PORT_2"
8564
value: "5144"
8665
....
8766
- name: "SYSLOG_HOST_N"
@@ -90,3 +69,15 @@ env:
9069
value: "51333"
9170
```
9271

72+
If you only need to talk to one syslog endpoint, you can use the following configuration within your chart:
73+
74+
```
75+
env:
76+
- name: "SYSLOG_HOST"
77+
value: "my.syslog.host"
78+
- name: "SYSLOG_PORT"
79+
value: "5144"
80+
```
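The two configuration styles above follow a simple naming convention, which the sketch below reads back into a list of endpoints. It is an illustration only: the function name is hypothetical, and the behaviour shown (the unnumbered pair first, then numbered pairs in ascending order, skipping any host without a matching port) is an assumption, not a description of how the Fluentd plugin actually resolves these variables.

```python
import os
import re


def syslog_endpoints(env=None):
    """Collect (host, port) pairs from SYSLOG_HOST/SYSLOG_PORT style variables."""
    env = os.environ if env is None else env
    endpoints = []

    # Single-endpoint form: SYSLOG_HOST / SYSLOG_PORT.
    if "SYSLOG_HOST" in env and "SYSLOG_PORT" in env:
        endpoints.append((env["SYSLOG_HOST"], int(env["SYSLOG_PORT"])))

    # Multi-endpoint form: SYSLOG_HOST_1 / SYSLOG_PORT_1, SYSLOG_HOST_2, ...
    indexes = []
    for key in env:
        match = re.fullmatch(r"SYSLOG_HOST_(\d+)", key)
        if match:
            indexes.append(int(match.group(1)))

    for index in sorted(indexes):
        port = env.get(f"SYSLOG_PORT_{index}")
        if port is not None:  # skip hosts with no matching port
            endpoints.append((env[f"SYSLOG_HOST_{index}"], int(port)))

    return endpoints
```

Passing a plain dictionary instead of relying on `os.environ` makes it easy to check a manifest's `env` section before deploying it.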
81+
82+
### Customizing
83+
We currently support sending log data to Syslog, Elasticsearch, and Sumo Logic. We will gladly accept pull requests that add support for other destinations. For more information please visit the [fluentd repository](https://github.com/deis/fluentd).
Lines changed: 29 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,38 +1,40 @@
11
# Platform Monitoring
22

33
## Description
4-
5-
With the release of Deis Workflow, we now include a monitoring stack for introspection on a running Kubernetes cluster. The stack includes 4 components:
6-
4+
We now include a monitoring stack for introspection on a running Kubernetes cluster. The stack includes 3 components:
75
* [Telegraf](https://docs.influxdata.com/telegraf/v0.12/) - Metrics collection daemon written by the team behind InfluxDB.
86
* [InfluxDB](https://docs.influxdata.com/influxdb/v0.12/) - Time series database
97
* [Grafana](http://grafana.org/) - Graphing tool for time series data
10-
* [Stdout-Metrics](https://github.com/deis/stdout-metrics) - Tool for consuming metrics via standard out and forwards them to InfluxDB
118

129
## Architecture Diagram
13-
1410
```
15-
┌────────┐
16-
│ Router │
17-
└────────┘
18-
19-
20-
▼ ┌──────────┐
21-
┌─────────────┐ ┌─────────┐ │ stdout │
22-
│ HOST │ │ fluentd │────▶│ metrics │
23-
│ Telegraf │───┐ └─────────┘ └──────────┘
24-
└─────────────┘ │ │
25-
│ │
26-
┌─────────────┐ │ │
27-
│ HOST │ │ ┌───────────┐ │
28-
│ Telegraf │───┼───▶│ InfluxDB │◀─────────┘
29-
└─────────────┘ │ └───────────┘
30-
│ │
31-
┌─────────────┐ │ │
32-
│ HOST │ │ ▼
33-
│ Telegraf │───┘ ┌──────────┐
34-
└─────────────┘ │ Grafana │
35-
└──────────┘
11+
┌────────┐
12+
│ Router │ ┌────────┐ ┌─────┐
13+
└────────┘ │ Logger │◀───▶│Redis│
14+
│ └────────┘ └─────┘
15+
Log file ▲
16+
│ │
17+
▼ │
18+
┌────────┐ ┌─────────┐ logs/metrics ┌─────┐
19+
│App Logs│──Log File──▶│ fluentd │───────topics─────▶│ NSQ │
20+
└────────┘ └─────────┘ └─────┘
21+
22+
23+
┌─────────────┐ │
24+
│ HOST │ ▼
25+
│ Telegraf │───┐ ┌────────┐
26+
└─────────────┘ │ │Telegraf│
27+
│ └────────┘
28+
┌─────────────┐ │ │
29+
│ HOST │ │ ┌───────────┐ │
30+
│ Telegraf │───┼───▶│ InfluxDB │◀────Wire ───────────┘
31+
└─────────────┘ │ └───────────┘ Protocol
32+
│ ▲
33+
┌─────────────┐ │ │
34+
│ HOST │ │ ▼
35+
│ Telegraf │───┘ ┌──────────┐
36+
└─────────────┘ │ Grafana │
37+
└──────────┘
3638
```
3739

3840
### Grafana
@@ -53,7 +55,7 @@ them separately in version control.
5355

5456
### InfluxDB
5557

56-
As of the Beta4 release InfluxDB is writing data to the host disk, however, if the InfluxDB pod dies and comes back on
58+
InfluxDB writes data to the host disk; however, if the InfluxDB pod dies and comes back on
5759
another host the data will not be recovered. We intend to fix this in a future release. The InfluxDB Admin UI is also
5860
exposed through the router allowing users to access the query engine by going to `influx.mydomain.com`. You will need to
5961
configure where to find the `influx-api` endpoint by clicking the "gear" icon at the top right and changing the host to
@@ -76,13 +78,8 @@ Telegraf is the metrics collection daemon used within the monitoring stack. It w
7678

7779
It is possible to send these metrics to other endpoints besides InfluxDB. For more information please consult the following [file](https://github.com/deis/monitor/blob/master/telegraf/rootfs/config.toml.tpl)
7880

79-
### Stdout-Metrics
80-
81-
Stdout-Metrics is a custom tool built by the Deis team to provide metrics that are reported via standard out - like Nginx. It consumes the log stream from FluentD filtering out messages that are not from the [Deis Router](https://github.com/deis/router). Once it finds a message it can parse it will turn that into a metric and send it directly to InfluxDB.
82-
8381
### Customizing
8482

8583
Each of these components allows for customization via environment variables. If you would like to learn more please visit the following github repositories:
8684

87-
* [stdout-metrics](https://github.com/deis/stdout-metrics)
8885
* [monitor](https://github.com/deis/monitor)
