50 nodes Kubernetes cluster in under 9 minutes

First tests with the foundation of what will be ClusterNinja: from zero to kubeconfig for a cluster with 50 nodes total (3 masters for HA, 47 worker nodes of different types in different pools and in different locations) created in only 8 minutes and 54 seconds TOTAL.

This includes everything, from creating all the resources (servers, load balancer, private network, firewall) to deploying Kubernetes and components required for managing upgrades as well as provisioning persistent volumes and load balancers out of the box.

The upgrade of the same cluster from Kubernetes 1.26 to 1.27 also took only 4 minutes.

Do you know how long these things take on GCP/AWS/Azure? :D

Thoughts?

Thanked by 1: mrTom

Comments

  • Francisco Top Host, Host Rep, Veteran
    edited October 2023

    @vitobotta said: Thoughts?

    Does grannies knitting wordpress load pretty quick then?

    Francisco

    Thanked by 2: vitobotta, homelabber
  • @Francisco said:

    @vitobotta said: Thoughts?

    Does grannies knitting wordpress load pretty quick then?

    Francisco

    no doubt :D

    Thanked by 1: Francisco
  • ehab Member
    edited October 2023

    that's pretty impressive, was it from different providers and networks?

  • @ehab said:

    that's pretty impressive, was it from different providers and networks?

    No, this test was with Hetzner for now with servers in 3 different locations. I need to do more work to do a test like this with multi cloud, although the first experiments (at a smaller scale) were successful.

  • I was wondering if any providers would be interested in partnering when I am ready to launch or soon after. ClusterNinja will be a "bring your own provider" managed service and will support some popular providers natively, but it will also be possible to use nodes from any provider of your choice. I could potentially partner with some LET providers with good reputation and offer them as recommended options for those who want to save more money.

    Which providers on LET would be good to talk with about this?

  • akhfa Member
    edited October 2023

    @vitobotta said:
    First tests with the foundation of what will be ClusterNinja: from zero to kubeconfig for a cluster with 50 nodes total (3 masters for HA, 47 worker nodes of different types in different pools and in different locations) created in only 8 minutes and 54 seconds TOTAL.

    This includes everything, from creating all the resources (servers, load balancer, private network, firewall) to deploying Kubernetes and components required for managing upgrades as well as provisioning persistent volumes and load balancers out of the box.

    The upgrade of the same cluster from Kubernetes 1.26 to 1.27 also took only 4 minutes.

    Do you know how long these things take on GCP/AWS/Azure? :D

    Thoughts?

    Assuming it will run across multiple providers, the challenge for distributed storage is how to encrypt the connection when all traffic goes over public IPs. You can use a CNI that encrypts connections by default, at some cost to storage performance, and there may be a further performance hit from using public connections between providers, depending on the peering.

    What I can think of is putting IO-heavy apps such as databases outside the cluster for multi-provider deployments.

  • @vitobotta said:
    No, this test was with Hetzner for now with servers in 3 different locations. I need to do more work to do a test like this with multi cloud, although the first experiments (at a smaller scale) were successful.

  • Val Member

    This works because VMs are created pretty fast by the API on Hetzner Cloud. It might be slower on Azure, for instance, where VMs take a bit of time to be created.
    Great job! Love all your automation stuff.

  • @akhfa said:

    @vitobotta said:
    First tests with the foundation of what will be ClusterNinja: from zero to kubeconfig for a cluster with 50 nodes total (3 masters for HA, 47 worker nodes of different types in different pools and in different locations) created in only 8 minutes and 54 seconds TOTAL.

    This includes everything, from creating all the resources (servers, load balancer, private network, firewall) to deploying Kubernetes and components required for managing upgrades as well as provisioning persistent volumes and load balancers out of the box.

    The upgrade of the same cluster from Kubernetes 1.26 to 1.27 also took only 4 minutes.

    Do you know how long these things take on GCP/AWS/Azure? :D

    Thoughts?

    Assuming it will run across multiple providers, the challenge for distributed storage is how to encrypt the connection when all traffic goes over public IPs. You can use a CNI that encrypts connections by default, at some cost to storage performance, and there may be a further performance hit from using public connections between providers, depending on the peering.

    In the alpha version of the multi cloud node I have tested with, the nodes talk to each other over a WireGuard mesh. For storage, I have tested successfully with Longhorn, which I am going to recommend to users due to its ease of use and features. For workloads that require high performance and do not require replication at storage level (such as databases that handle their own replication), I will recommend and optionally install Rancher's local path provisioner.

    All this works beautifully from my initial tests :)

    Thanked by 1: ehab
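
As a rough illustration of the WireGuard mesh mentioned above: in a full mesh every node peers with every other node, so the number of tunnels grows quadratically with cluster size. The Python sketch below only counts and enumerates peer pairs. The node names are hypothetical, and whether ClusterNinja uses a full mesh (rather than, say, a hub-and-spoke layout) is an assumption, not something stated in the thread.

```python
from itertools import combinations


def mesh_pairs(nodes):
    """Return every unordered pair of nodes, i.e. one tunnel per pair in a full mesh."""
    return list(combinations(nodes, 2))


def peers_of(node, nodes):
    """Return the peers a single node would need to list in its WireGuard config."""
    return [other for other in nodes if other != node]


# Hypothetical node names: 3 masters and 47 workers, as in the test cluster.
nodes = [f"master-{i}" for i in range(1, 4)] + [f"worker-{i}" for i in range(1, 48)]

pairs = mesh_pairs(nodes)
print(len(nodes))                        # 50
print(len(pairs))                        # 1225 == 50 * 49 / 2
print(len(peers_of("master-1", nodes)))  # 49
```

With 50 nodes that is 1225 tunnels and 49 peers per node, which is why peer configuration is normally generated by tooling rather than written by hand.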
  • @akhfa said:

    @vitobotta said:
    First tests with the foundation of what will be ClusterNinja: from zero to kubeconfig for a cluster with 50 nodes total (3 masters for HA, 47 worker nodes of different types in different pools and in different locations) created in only 8 minutes and 54 seconds TOTAL.

    This includes everything, from creating all the resources (servers, load balancer, private network, firewall) to deploying Kubernetes and components required for managing upgrades as well as provisioning persistent volumes and load balancers out of the box.

    The upgrade of the same cluster from Kubernetes 1.26 to 1.27 also took only 4 minutes.

    Do you know how long these things take on GCP/AWS/Azure? :D

    Thoughts?

    Assuming it will run across multiple providers, the challenge for distributed storage is how to encrypt the connection when all traffic goes over public IPs. You can use a CNI that encrypts connections by default, at some cost to storage performance, and there may be a further performance hit from using public connections between providers, depending on the peering.

    What I can think of is putting IO-heavy apps such as databases outside the cluster for multi-provider deployments.

    I saw you added the last sentence. Like I said, for performance they can just provision volumes on the root disk with the local path provisioner. These disks are usually NVMe or at least SSD, so they are OK for most tasks. Later on I could also add features like automatic provisioning of resources outside the cluster, if there is demand. But to be honest, things have changed dramatically over the past 2-3 years, and there are some awesome Kubernetes operators that make Kubernetes the best platform even for running databases, something people would have thought twice about doing only 2-3 years ago.
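
For readers who have not used it, a volume backed by the local path provisioner is requested like any other PersistentVolumeClaim. The sketch below builds such a manifest in Python and prints it as JSON, which `kubectl apply -f -` accepts. It assumes the provisioner's default StorageClass name `local-path`; the claim name and size are made up for illustration.

```python
import json


def local_path_pvc(name, size, storage_class="local-path"):
    """Build a PersistentVolumeClaim manifest as a plain dict.

    storage_class defaults to "local-path", the name used by the upstream
    local-path-provisioner manifest; adjust it if your cluster differs.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Local volumes live on one node's disk, so only single-node access.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical claim for a database that handles its own replication.
pvc = local_path_pvc("postgres-data", "20Gi")
print(json.dumps(pvc, indent=2))
```

Because the data sits on a single node's disk, this only makes sense for workloads that replicate at the application level, as described in the post above.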

  • @Val said:
    This works because VMs are created pretty fast by the API on Hetzner Cloud. It might be slower on Azure, for instance, where VMs take a bit of time to be created.
    Great job! Love all your automation stuff.

    Yeah Hetzner is ridiculously fast, I have to agree. So far it's the fastest I have seen at creating servers. One newly created server is up and running in like 20-30 seconds. I wonder if they keep a bunch of servers ready at all times just to be assigned to an account, because otherwise I can't explain that speed.
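
A quick back-of-the-envelope check, using only the numbers quoted in this thread (50 nodes, 8 minutes 54 seconds total, 20-30 seconds per server), suggests the servers cannot have been created one after another. The Python below is just that arithmetic; it assumes per-server creation time is the only cost, which understates the real work.

```python
TOTAL_SECONDS = 8 * 60 + 54      # 8 minutes 54 seconds reported for the whole cluster
NODES = 50                       # 3 masters + 47 workers
PER_SERVER_SECONDS = (20, 30)    # "20-30 seconds" per new server, as quoted above


def serial_creation_time(nodes, per_server_seconds):
    """Time to create all servers strictly one at a time."""
    return nodes * per_server_seconds


fastest = serial_creation_time(NODES, PER_SERVER_SECONDS[0])
slowest = serial_creation_time(NODES, PER_SERVER_SECONDS[1])

print(TOTAL_SECONDS)            # 534
print(fastest, slowest)         # 1000 1500
print(fastest > TOTAL_SECONDS)  # True: even the best serial case exceeds the total
```

Even at 20 seconds each, serial creation alone would take 1000 seconds, well over the 534 seconds reported for the entire run, so the provisioning must overlap heavily.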

  • tjn Member

    It's pretty impressive @vitobotta keep it up!

  • Another great post @vitobotta. Looking forward to ClusterNinja.

  • @vitobotta said:

    @Val said:
    This works because VMs are created pretty fast by the API on Hetzner Cloud. It might be slower on Azure, for instance, where VMs take a bit of time to be created.
    Great job! Love all your automation stuff.

    Yeah Hetzner is ridiculously fast, I have to agree. So far it's the fastest I have seen at creating servers. One newly created server is up and running in like 20-30 seconds. I wonder if they keep a bunch of servers ready at all times just to be assigned to an account, because otherwise I can't explain that speed.

    In the OS tab they have (or had) a symbol that said "rapid deploy" or something similar (they all did). I think that means it's available on stand-by.

  • @tjn said:
    It's pretty impressive @vitobotta keep it up!

    @MrLime said:
    Another great post @vitobotta. Looking forward to ClusterNinja.

    Thanks!

    @emgh said:

    @vitobotta said:

    @Val said:
    This works because VMs are created pretty fast by the API on Hetzner Cloud. It might be slower on Azure, for instance, where VMs take a bit of time to be created.
    Great job! Love all your automation stuff.

    Yeah Hetzner is ridiculously fast, I have to agree. So far it's the fastest I have seen at creating servers. One newly created server is up and running in like 20-30 seconds. I wonder if they keep a bunch of servers ready at all times just to be assigned to an account, because otherwise I can't explain that speed.

    In the OS tab they have (or had) a symbol that said "rapid deploy" or something similar (they all did). I think that means it's available on stand-by.

    Oh, I have never noticed that symbol

    Thanked by 1: emgh
  • Jack_SBE Member, Patron Provider

    @vitobotta said:
    I was wondering if any providers would be interested in partnering when I am ready to launch or soon after. ClusterNinja will be a "bring your own provider" managed service and will support some popular providers natively, but it will also be possible to use nodes from any provider of your choice. I could potentially partner with some LET providers with good reputation and offer them as recommended options for those who want to save more money.

    Which providers on LET would be good to talk with about this?

    This seems great! We would possibly be interested if you’re still looking for providers.

  • Which CNI did you deploy with?

  • I wonder if they keep a bunch of servers ready at all times just to be assigned to an account, because otherwise I can't explain that speed.

    This is the only way. Same as with fast FaaS.

    The point is not starting fast but maintaining it, day-2 operations & being cheap enough. One provider that failed on "cheap" is/was elest.io IMHO.

    Thanked by 1: Shamli
  • @Jack_SBE said:

    @vitobotta said:
    I was wondering if any providers would be interested in partnering when I am ready to launch or soon after. ClusterNinja will be a "bring your own provider" managed service and will support some popular providers natively, but it will also be possible to use nodes from any provider of your choice. I could potentially partner with some LET providers with good reputation and offer them as recommended options for those who want to save more money.

    Which providers on LET would be good to talk with about this?

    This seems great! We would possibly be interested if you’re still looking for providers.

    Sounds good! I'll take a note about you so I'll get in touch a bit later. :)

    @ddorian43 said:

    I wonder if they keep a bunch of servers ready at all times just to be assigned to an account, because otherwise I can't explain that speed.

    This is the only way. Same as with fast FaaS.

    The point is not starting fast but maintaining it, day-2 operations & being cheap enough. One provider that failed on "cheap" is/was elest.io IMHO.

    How did they fail? What do you mean? Thanks

  • @vitobotta said: Do you know how long these things take on GCP/AWS/Azure?

    more than 10. just creating an EKS cluster (not nodes!!) takes 5mins.

  • @nanankcornering said:

    @vitobotta said: Do you know how long these things take on GCP/AWS/Azure?

    more than 10. just creating an EKS cluster (not nodes!!) takes 5mins.

    Exactly :D

  • @vitobotta said: How did they fail? What do you mean?

    Too expensive IMHO for BYOC.

  • @ddorian43 said:

    @vitobotta said: How did they fail? What do you mean?

    Too expensive IMHO for BYOC.

    They have completely different pricing from what I have in mind. I won't add a % on top of the instance price. You will be able to choose whichever nodes you want. ClusterNinja will likely be priced based on the size of the cluster. I still need to think about it, but it's likely going that way. WDYT?

  • Upgrading an empty cluster is easy. For comparison's sake, upgrading the control plane in AWS/GCP is slower but has no downtime.

    Maybe start with your sales pitch. What about node availability (what happens if hosts run out of nodes)?
    Are you offering managed services?

  • @sibaper said:
    Upgrading an empty cluster is easy. For comparison's sake, upgrading the control plane in AWS/GCP is slower but has no downtime.

    Maybe start with your sales pitch. What about node availability (what happens if hosts run out of nodes)?
    Are you offering managed services?

    The cluster was not empty, I installed several things on purpose to fill nodes etc.

    Node availability is not my concern since this is a "bring your own provider" kind of service, so that is the responsibility of the infrastructure provider you choose. I don't provide infrastructure, although that may change if I set up some nice partnerships with some providers here.

    Yes, it will be a managed Kubernetes service with some nice features but I am not ready to share more details yet since I don't know what will be available at launch if we try to launch in 3-4 months.
