From 8dc0838a440551d601f3624d6c31e59924634bb6 Mon Sep 17 00:00:00 2001
From: Sheogorath <sheogorath@shivering-isles.com>
Date: Tue, 5 Oct 2021 03:57:26 +0200
Subject: [PATCH] readme: Provide lessons learned and new premise

---
 README.md | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 6fbba2f48..9915e8f19 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,32 @@
 Shivering-Isles GitOps Infrastructure
 ===
 
-This repository contains the Kubernetes objects that are synced and managed by [flux](https://fluxcd.io) in order to be deployed.
+This repository contains the Kubernetes objects that are synced and managed by [flux](https://fluxcd.io) in order to be deployed as well as the terraform definitions to setup the base infrastructure.
 
-Assumptions
+**Note**: *Glue code to make the base infrastructure a usable Kubernetes cluster is still missing.*
+
+Assumption
+---
+
+Building smaller, more-minimalistic, plain Kubernetes clusters will be cheaper than OpenShift with OKD and more stable since etcd doesn't have to write a ton of data to disk and there aren't two API server running that take up to 3GB of RAM per master node.
+
+The goal is still to manage everything GitOps style, but more iterative and slowly grinding the way forward before clusters will become productive.
+
+Original assumptions / Lessons Learned
 ---
 
-This repository is focused on a setup based on OpenShift, [OKD](https://okd.io) to be specific. Therefore some installations and settings might be based on the expecation of OKD's default setup instead of going the plain Kubernetes way of inventing everything ourselves.
+> This repository is focused on a setup based on OpenShift, [OKD](https://okd.io) to be specific. Therefore some installations and settings might be based on the expectation of OKD's default setup instead of going the plain Kubernetes way of inventing everything ourselves.
+
+Sadly this previous assumption didn't hold up. OpenShift on Hetzner Cloud resulted in quite annoying downtimes during upgrades. While the origin of the problem was not fully determined, it was proven that severe spikes in etcd writing fsyncs of up to 600ms did play a major role in it.
 
 Tools
 ---
 
 To handle things properly, try to get the following tools:
 
-- oc / kubectl (you should get it from your OKD cluster)
+- kubectl
 - flux
 - [sops](https://github.com/mozilla/sops/releases/) (for secret handling)
 - [helm](https://helm.sh/) (just for sake of completeness and validation)
-
-
+- [terraform](https://terraform.io/)
+- make
-- 
GitLab