Skip to content
Snippets Groups Projects
Verified Commit a089ac60 authored by Sheogorath's avatar Sheogorath :european_castle:
Browse files

feat(cert-manager): Add dashboard and alerts

This patch adds various cert-mananger alerts and dashboards these are
inspired by uneeq-oss and monitoring-mixins on GitHub.

References:
https://github.com/monitoring-mixins/website
parent c45a5b95
No related branches found
No related tags found
No related merge requests found
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: prometheus-cert-manager-rules
namespace: cert-manager
spec:
groups:
- name: cert-manager
rules:
- alert: CertManagerAbsent
annotations:
description: New certificates will not be able to be minted, and existing ones
can't be renewed until cert-manager is back.
runbook_url: https://gitlab.com/uneeq-oss/cert-manager-mixin/-/blob/master/RUNBOOK.md#certmanagerabsent
summary: Cert Manager has dissapeared from Prometheus service discovery.
expr: absent(up{job="cert-manager"})
for: 10m
labels:
severity: critical
- name: certificates
rules:
- alert: CertManagerCertExpirySoon
annotations:
description: The domain that this cert covers will be unavailable after {{ $value
| humanizeDuration }}. Clients using endpoints that this cert protects will
start to fail in {{ $value | humanizeDuration }}.
runbook_url: https://gitlab.com/uneeq-oss/cert-manager-mixin/-/blob/master/RUNBOOK.md#certmanagercertexpirysoon
summary: The cert `{{ $labels.name }}` is {{ $value | humanizeDuration }} from
expiry, it should have renewed over a week ago.
expr: |
avg by (exported_namespace, namespace, name) (
certmanager_certificate_expiration_timestamp_seconds - time()
) < (21 * 24 * 3600) # 21 days in seconds
for: 1h
labels:
severity: warning
- alert: CertManagerCertNotReady
annotations:
description: This certificate has not been ready to serve traffic for at least
10m. If the cert is being renewed or there is another valid cert, the ingress
controller _may_ be able to serve that instead.
runbook_url: https://gitlab.com/uneeq-oss/cert-manager-mixin/-/blob/master/RUNBOOK.md#certmanagercertnotready
summary: The cert `{{ $labels.name }}` is not ready to serve traffic.
expr: |
max by (name, exported_namespace, namespace, condition) (
certmanager_certificate_ready_status{condition!="True"} == 1
)
for: 10m
labels:
severity: critical
- alert: CertManagerHittingRateLimits
annotations:
description: Depending on the rate limit, cert-manager may be unable to generate
certificates for up to a week.
runbook_url: https://gitlab.com/uneeq-oss/cert-manager-mixin/-/blob/master/RUNBOOK.md#certmanagerhittingratelimits
summary: Cert manager hitting LetsEncrypt rate limits.
expr: |
sum by (host) (
rate(certmanager_http_acme_client_request_count{status="429"}[5m])
) > 0
for: 5m
labels:
severity: critical
This diff is collapsed.
...@@ -6,8 +6,16 @@ resources: ...@@ -6,8 +6,16 @@ resources:
- repository.yaml - repository.yaml
- release.yaml - release.yaml
- networkpolicy.yaml - networkpolicy.yaml
- alerts.yaml
- ../../shared/networkpolicies/allow-from-monitoring.yaml - ../../shared/networkpolicies/allow-from-monitoring.yaml
- ../../shared/networkpolicies/allow-from-same-namespace.yaml - ../../shared/networkpolicies/allow-from-same-namespace.yaml
- ../../shared/networkpolicies/allow-from-all-namespaces.yaml - ../../shared/networkpolicies/allow-from-all-namespaces.yaml
patchesStrategicMerge: patchesStrategicMerge:
- networkpolicy-patch.yaml - networkpolicy-patch.yaml
configMapGenerator:
- name: cert-manager-grafana-dashboards
files:
- ./dashboards/cert-manager.json
options:
labels:
grafana_dashboard: cert-manager
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment