New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
My 2 cents:
@Cpt_Ben thanks a lot for your point.
May I ask what infrastructure/stack you recommend? You already mentioned some of it.
What about proxy, certs, secrets "vaults", repo, backups and restore?
Anything else you can add would be great to know. Thanks in advance.
@vitobotta is also invited to add his valuable input.
For the proxy, put an HAProxy load balancer in front of the cluster, then use nginx-ingress and set up dynamic DNS configuration for a specific domain, while also allowing customers to define their own domains with the ingresses. This requires some engineering, of course.
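To illustrate the customer-domain part: each customer domain can get its own Ingress resource handled by nginx-ingress. A rough sketch (the host, service name, and port are placeholders):

```yaml
# Sketch of an Ingress exposing a customer-defined domain via nginx-ingress.
# All names here are illustrative, not from the thread.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: customer-app
spec:
  ingressClassName: nginx
  rules:
  - host: app.customer-domain.example   # the customer's own domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: customer-app          # the customer's backend service
            port:
              number: 80
```

The "dynamic DNS" piece would then just point each customer domain at the load balancer in front of the ingress controller.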
For certs, Rancher has a nice graphical interface for storing encrypted secrets (as of 1.6), as well as certs. For more info, see the Rancher documentation.
For the repo, I prefer Harbor; it's fairly nice and easy to manage. I recommend LDAP integration with whatever SSO (if any) you're using for user authentication. Both Harbor and Rancher are LDAP-ready.
For backups, it's a bit complicated, as most storage-level backups only create snapshots of the PVs but not the actual application configuration. Portworx Backup supports both, but it was broken on Rancher 2.5.x the last time I tried it; they were aware of the issue and working on a fix. Not sure if it's fixed already. Worth a try though, as they have a 30-day trial.
Of course there are other storage drivers available; I only mentioned a handful.
+1, a GitLab instance is nice for CI/CD deployments; it's fairly easy to set up with K8s.
It's also worth taking a look at Portainer, not everyone needs Kubernetes.
For secrets management, have a look at HashiCorp Vault, or the up-and-coming Infisical.
Hi, I have been working with k8s for 5 years and managed clusters on prem before moving to GCP/GKE. I strongly recommend you go with something like Rancher instead of doing everything with "vanilla" k8s. It's a solid management + distro combo (if you use RKE2 or k3s; RKE1 is old and will be abandoned at some point). Using something like Rancher will make many things easier, and if you get into trouble they offer paid support, so you can solve problems quickly with them if you don't know how to fix something.
At previous jobs I was managing everything myself and luckily I never needed external support, but it's good to know it's there if needed.
Storage: there are a number of options, both open source/free and paid. Rook-Ceph, mentioned by @Cpt_Ben, is solid, but be aware that while orchestrating persistent volumes is easy with Rook because it's automated, you may need to intervene manually to fix issues if something serious happens to the cluster. In my opinion, if you are not an expert with this I would go with something like Longhorn (also created by Rancher). Longhorn is a lot easier to install and use, much easier to manage (with a nice dashboard for recovering volumes when some replicas are faulty), and it can be significantly faster than Ceph depending on which disks you use (Ceph was really designed in the HDD era).

Longhorn also supports both snapshots and backups to off-site storage like S3-compatible stores or NFS, and the backups are "crash consistent" because they are performed after automatically taking a snapshot. So this can give you peace of mind that you can restore your volumes easily if something happens, even in another cluster. With Longhorn you can even configure disaster recovery: volumes on a primary cluster are continuously replicated to a "standby" cluster, and if something major happens to the primary cluster, you can instantly promote the volumes on the standby cluster and set them as primary, dramatically minimizing data loss in the event of a disaster.

One advantage Ceph has over Longhorn, though, is that it stores data in chunks replicated across multiple nodes (usually 3), and you can even create volumes larger than the disks on a single node, because it automatically spreads the data across nodes. This is very powerful because you can really distribute data, but it comes at the expense of performance. With Longhorn, on the other hand, you can only create volumes as large as the disk space available on a node (minus some space reserved for the system).
Both can do "thin provisioning", meaning that you can create, say, a 100 GB volume, but the volume will not take up 100 GB initially; it grows as you add data to it. This allows you to provision larger volumes just to have more capacity in the future. Having said that, with both Ceph and Longhorn you have storage classes that allow volume expansion (this just requires a restart of the workload using the volume after updating the size, so it's not a big deal). There are other options: on the open source side there's also OpenEBS, which supports different storage engines. It's pretty easy to use, though it used to be the slowest; that may have changed with the new Mayastor engine, but I haven't tested it. Other options: Robin, StorageOS, Portworx and more. Some of these are paid but have free tiers as well. Generally speaking, if you can afford it, Portworx is the absolute best storage solution for on-prem Kubernetes clusters.
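To illustrate the volume-expansion point, a Longhorn storage class with expansion enabled looks roughly like this (parameter values are just examples, not from the thread):

```yaml
# Sketch of a Longhorn StorageClass that allows growing PVCs later.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-expandable
provisioner: driver.longhorn.io   # the Longhorn CSI driver
allowVolumeExpansion: true        # lets you edit the PVC size upward later
parameters:
  numberOfReplicas: "3"           # replicas spread across nodes
  staleReplicaTimeout: "2880"     # minutes before a faulty replica is cleaned up
```

With `allowVolumeExpansion: true`, growing a volume is just a matter of editing the PVC's requested size and restarting the workload that uses it.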
Monitoring: I usually use the kube-prometheus-stack Helm chart in my clusters. It's a nice bundle of Prometheus, Alertmanager and Grafana. On top of this I also add Robusta.dev for better notifications.
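For reference, the chart is typically installed with `helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace`, optionally with a small values file. A minimal sketch of such overrides (all values here are illustrative, not from the thread):

```yaml
# Sketch of a values.yaml for kube-prometheus-stack.
grafana:
  adminPassword: change-me          # set a real password (or use an existing secret)
prometheus:
  prometheusSpec:
    retention: 15d                  # how long to keep metrics
    storageSpec:                    # persist metrics on a PVC instead of emptyDir
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 50Gi
```

Persisting Prometheus on a storage class like Longhorn means your metrics survive pod restarts and node replacements.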
Logging: I prefer Loki, because it's more lightweight than other options and it's easy to install and use (querying logs is very easy).
Backups: as I mentioned, Longhorn can back up volumes, but you will likely want to back up whole applications. For this there's Velero, which is free and open source and works really well. It can also back up volumes, but those backups are not "crash consistent" out of the box because they use Restic, so they are file-level backups. You can use a trick with fsfreeze to make crash-consistent backups, which works quite well if the volumes don't see a very high write load. If you want a better backup solution I recommend Kasten (there's a free tier). It can even create crash-consistent backups with supported storage drivers, and Longhorn is supported as well.
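The fsfreeze trick can be wired up with Velero's backup hooks: roughly, you annotate the pod template so Velero freezes the filesystem just before the backup and unfreezes it right after. A sketch (the container name and mount path are placeholders):

```yaml
# Sketch of pod-template annotations for Velero pre/post backup hooks.
# "app" and "/data" are hypothetical; adjust to your container and volume mount.
metadata:
  annotations:
    pre.hook.backup.velero.io/container: app
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/data"]'
    post.hook.backup.velero.io/container: app
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/data"]'
```

Note the container needs `fsfreeze` available and enough privileges to freeze its mount; the freeze window should be short, which is why this works best on volumes without heavy write load.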
Certificates: this one is super easy. Just use cert-manager. It can provision certificates with Let's Encrypt very quickly, with both HTTP and DNS challenge methods.
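As a sketch, a cert-manager ClusterIssuer for Let's Encrypt with the HTTP-01 challenge looks roughly like this (the email is a placeholder):

```yaml
# Sketch of a cert-manager ClusterIssuer using Let's Encrypt's production ACME
# endpoint with the HTTP-01 challenge, solved via the nginx ingress class.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # placeholder; use a real contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # where the ACME account key is stored
    solvers:
    - http01:
        ingress:
          class: nginx
```

After this, annotating an Ingress with `cert-manager.io/cluster-issuer: letsencrypt-prod` is enough to get certificates issued and renewed automatically.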
Secrets: Hashicorp Vault is a solid option.
By "proxy" you probably mean an ingress controller. I recommend ingress-nginx because it's the best documented of all of them, and it just works. To expose the ingress to the outside world you would normally use a load balancer, which is provisioned automatically in cloud environments. On prem it's a different story, of course, but if you can use MetalLB then you can provision load balancers there as well. This is the best solution if you can make it work in your environment, because Kubernetes automatically keeps the load balancer configuration up to date with the active endpoints without requiring intervention from you. So you can replace nodes, move apps around etc., and it's all automatic.

If MetalLB cannot work in your environment, the easiest option is to use the ingress controller with host ports and configure DNS to round-robin across the nodes. But load balancing based on DNS is usually not a good idea. A better option, which requires additional effort, is to set up an external load balancer such as HAProxy in high-availability mode using something like keepalived. The problem with both of these solutions is that you need to keep the configuration updated if the IPs of the nodes change, etc.
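For the MetalLB route, the minimum configuration in layer-2 mode is an address pool plus an advertisement. A sketch (the address range is a placeholder for IPs routable in your network):

```yaml
# Sketch of a minimal MetalLB L2 configuration: a pool of assignable IPs
# and an L2Advertisement announcing them. The range below is illustrative.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.250   # free IPs in your on-prem subnet
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
```

With this in place, any Service of type LoadBalancer (such as the one for ingress-nginx) gets an IP from the pool automatically, and MetalLB answers ARP for it.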
I don't know what else to add; perhaps it's easier if you ask direct questions on topics you are not sure about. Happy to help!
I forgot to mention that I built a tool - which you can find at https://github.com/vitobotta/hetzner-k3s - that allows you to very quickly and easily create production-grade clusters in Hetzner Cloud. It supports highly available node pools across different regions, a highly available control plane, and even autoscaling, among other things. The reason I mention it is that many companies use my tool for their clusters because you get an out-of-the-box experience just like with managed Kubernetes services, but you save a ton of money by using my tool and Hetzner instead. One guy told me that his company was saving 25K (yes, thousands) per month after switching from Google Kubernetes Engine to my tool + Hetzner. So if you can use Hetzner, this is an option worth considering.
BTW, I wrote a little comparison of some storage solutions here, if you're interested: https://vitobotta.com/2019/08/06/kubernetes-storage-openebs-rook-longhorn-storageos-robin-portworx/
@vitobotta I think you've just made me want to give k8s another go. Great information, thank you
Clarification: my inquiry relates only to Kubernetes cluster self-management. I've got 3+ years of experience porting customer applications and with Kubernetes-based application design at this point,
but none with managing Kubernetes clusters (sizing, upgrade plans, etc.) or the design choices behind Kubernetes components (CSI, etc.).
Highly recommend Rancher as well.
A related question:
How does one manage the billing of Kubernetes clusters?
We have a test setup deployed and managed with Rancher, but I do not see any tool to manage billing.
Have you experienced higher overheads with k3s vs k8s (e.g. etcd vs SQL-backed)? What about compatibility?