IWStack issues (prometeus' cloudstack)

alepore Member
edited October 2014 in General

hi,
i use IWStack from Prometeus.
i would like to know about people's experience with this service, and with CloudStack in general, because i'm worried about recent issues i had.

would also like to have a reply from Salvatore, if possible, as their forum is currently "read-only" and the customer support replies are pretty "terse".

i've read about some recent issues on their forum, and had one myself on monday:
i found a (production) instance (KVM) stopped for some reason, with no way to restart it (stuck on "starting").
rebuilding from a disk snapshot may have been the best solution here, but if i remember correctly i was not even able to start another instance in the same zone.

end of the story: i rebuilt my server and all my apps from scratch in another zone (XEN), god bless IT automation tools.
after ~8 hours i was still not able to start my old instance (and hadn't received a single "yes we have a problem" reply).
the late reply was a bit scary: "no idea what the cause was".

has anyone had similar problems?
is cloudstack/iwstack actually suitable for production environments?
maybe prometeus just needs more experience with it?
are XEN VMs better than KVM ones in this case?
is there some reason i should believe this will not happen again?
would using disk snapshots be the best way to handle these situations?

thanks in advance.

Comments

  • Oh great. Today i thought of signing up for a service with them.. Good to know, at least now.

  • alepore said: has anyone had similar problems?

    I had a similar problem with their cloud this weekend. I couldn't boot certain (official) ISOs, and VMs often broke during the creation or destruction process. There were also several network outages of a few minutes each.

  • Problems in the deploy process are often caused by misconfiguration or timing issues with the network. This is how CloudStack works.

  • Haven't had any issues before, but right now my running instance is no longer accessible. I'm not able to open the console from cloudstack, so maybe something is going on...

  • Maounique Host Rep, Veteran
    edited October 2014

    http://board.prometeus.net/viewtopic.php?f=15&t=1417
    There is currently an issue with the DDoS protected network. Our upstream provider seems to be having heavy losses, so there may be a huge attack going on there.
    We re-routed the network through our routers directly, so there is no protection now except our nulling.
    It is recommended you do not use the DDoS protected IPs if you have no reason to believe you are running DDoS magnet services and you were never attacked before.
    The longer the route, the more issues can arise, especially in the DDoS game business.

    As for other issues, there were 3 incidents in 2 years.

    1. There was a massive disconnect caused by a malfunctioning NIC which saturated a few ports forcing the orchestrator to consider many nodes dead and restart the instances on the remaining ones. This caused massive lags and delays in restarts because of the sheer number of instances "on the move". All effects were over in some 12 hours. Some 1/5 of the instances were affected and we provided compensation to everyone who noticed and asked for it.
      Measures to avoid it:
      -almost all nodes now have an internal connection to the orchestrator, a different circuit through our internal network, to serve as redundancy and avoid total connection loss. This does not apply to external nodes, i.e. those not in the datacenter, such as the Dallas ones, of course;
      -increased timeout so very short bursts will not trigger such moves.

    2. A botched system VM upgrade due to a CloudStack bug meant the orchestrator was unavailable for some half a day, meaning no administration of the VMs via the console or the web interface was possible, nor of the firewall, load balancing, etc. It affected everyone who had to make changes during that time.
      This was fixed by going up one more version and then downgrading to the version compatible with our setup (no showstopper bugs).
      This cannot be avoided in the future; we cannot test updates because it is impossible to have a similarly loaded test environment.
      Compensation was provided to those who noticed and asked for it.

    3. Two nodes failed one after another, locking resources in the following way:
      -the first node fails, the orchestrator starts moving VMs to other nodes;
      -the extra load on the storage made another node lock the storage, effectively making it unavailable while VMs were being transferred to it. This caused those VMs to fail and remain locked.
      Since the virtual routers are also regular VMs placed randomly on the nodes, it affected not only the people with actual VMs there, but also the network of those whose virtual routers were there.
      Some 1 in 100 VMs were affected if we include those affected by the crash of their internal network.
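
    The "increased timeout" measure from the first incident can be illustrated with a minimal sketch. This is not CloudStack code; the `NodeHealth` class, the `nodes_to_evacuate` function and all the numbers are made up for illustration. It only shows why a longer timeout keeps a short connectivity burst from triggering mass restarts:

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """Hypothetical record of when a node last answered a heartbeat."""
    name: str
    last_seen: float  # seconds on some monotonic clock


def nodes_to_evacuate(nodes, now, timeout_s):
    """Return the names of nodes that have been silent longer than timeout_s."""
    return [n.name for n in nodes if now - n.last_seen > timeout_s]


nodes = [
    NodeHealth("node-a", last_seen=100.0),  # silent for 20 s: a short burst
    NodeHealth("node-b", last_seen=30.0),   # silent for 90 s: probably dead
    NodeHealth("node-c", last_seen=119.0),  # healthy
]
now = 120.0

# With a short timeout the brief outage on node-a also triggers a move.
short_timeout = nodes_to_evacuate(nodes, now, timeout_s=10.0)
# With a longer timeout only the node that is really gone is evacuated.
long_timeout = nodes_to_evacuate(nodes, now, timeout_s=60.0)

print(short_timeout)  # ['node-a', 'node-b']
print(long_timeout)   # ['node-b']
```

    The trade-off is that a longer timeout also delays the restart of instances on a node that really is dead.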

    I do not consider this to be outside the scope of normal operation; nothing is perfect. We will continue to provide compensation to the people affected and improve the product where possible to avoid similar failures.

    As for the reasons VMs fail, those vary and many can be internal due to OS or other things.
    Of course, snapshots are a great feature and everyone should have backups of backups, that is why we offer free FTP backup as well as free anycast DNS failover and affinity groups to make sure you do not have redundancy VMs on the same node.
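
    As a rough illustration of what affinity groups protect against, here is a small sketch that checks whether two VMs of the same redundancy group ended up on the same node. It does not use the CloudStack API; the function `anti_affinity_violations`, the VM names and the node names are all hypothetical:

```python
from collections import defaultdict


def anti_affinity_violations(placements, groups):
    """Find groups with two or more VMs on the same node.

    placements: dict mapping VM name -> node name
    groups:     dict mapping group name -> list of VM names
    Returns {group: {node: [vms]}} for every node that hosts 2+ VMs of a group.
    """
    violations = {}
    for group, vms in groups.items():
        by_node = defaultdict(list)
        for vm in vms:
            by_node[placements[vm]].append(vm)
        shared = {node: members for node, members in by_node.items()
                  if len(members) > 1}
        if shared:
            violations[group] = shared
    return violations


placements = {
    "web-1": "node-a",
    "web-2": "node-a",  # same node as web-1: one node failure takes out both
    "db-1": "node-b",
    "db-2": "node-c",
}
groups = {"web": ["web-1", "web-2"], "db": ["db-1", "db-2"]}

result = anti_affinity_violations(placements, groups)
print(result)  # {'web': {'node-a': ['web-1', 'web-2']}}
```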

    The KVM zone is the first we made. It has already had a few upgrades, as well as some custom code to fix some CloudStack bugs, and our own billing system. It runs on slightly older and smaller servers, and it is KVM, which is slightly slower than Xen. It is intended for semi-professionals and higher grade hobbyists as well as developers.

    Our Xen zone was added at the request of IT departments from some businesses and runs on newer and larger servers as well as a 10x more expensive SAN (the old one was 150k). As such, it has no small instances (it starts from 1 GB) and is intended for professionals who know their way around the UI and API. We do not intend to offer support for it in the forums or in the billing system (the WHMCS module is still available, but if people do not know how to use the UI, they should not use the professional zone and a simple VPS should suffice).
    Of course, support in the tickets will still be provided, but not of the "how do I make a virtual network" kind; rather, questions regarding APIs and allowing access to more of those, projects and their setups, etc.

    Thanked by 3: alepore, Blanoz, pcan
  • Maounique said: It is recommended you do not use the DDoS protected IPs if you have no reason to believe you are running DDoS magnet services and you were never attacked before.

    Yea, no, that's not what DDoS protection is for - You buy it to be PREPARED

  • Maounique Host Rep, Veteran
    edited October 2014

    William said: Yea, no, that's not what DDoS protection is for - You buy it to be PREPARED

    And you need to be prepared IF you have reasons to believe you might get attacked one day. For example one ticket today was from someone who was running a backend server there.
    Besides, attacks today go to tens of Gbps quickly. We do not provide that kind of protection anyway; that is for OVH and others who can tank 500 Gbps. We have it to protect our network: over 1.5 mil pps and 15 G it will be nulled anyway.
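
    The nulling limits quoted above can be written down as a simple check. This is only a sketch under assumptions: "15 G" is read as 15 Gbit/s, exceeding either limit is assumed to be enough to get an IP nulled, and the names `PPS_LIMIT`, `BPS_LIMIT` and `should_null_route` are invented here, not taken from any Prometeus tooling:

```python
PPS_LIMIT = 1_500_000   # 1.5 million packets per second, from the post
BPS_LIMIT = 15 * 10**9  # "15 G", assumed to mean 15 Gbit/s


def should_null_route(pps, bps):
    """Return True if the traffic exceeds either limit (assumed OR logic)."""
    return pps > PPS_LIMIT or bps > BPS_LIMIT


# A small attack stays within what the protection is meant to absorb.
print(should_null_route(pps=200_000, bps=2 * 10**9))    # False
# A packet flood over 1.5 mil pps is nulled even at modest bandwidth.
print(should_null_route(pps=2_000_000, bps=1 * 10**9))  # True
# A volumetric attack over 15 Gbit/s is nulled as well.
print(should_null_route(pps=100_000, bps=20 * 10**9))   # True
```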

    Thanked by 1: alepore
  • @maounique many thanks for the detailed explanation, i just learned lots of Prometeus services use advices that are not somewhere else (or i missed them).

    i think i'll continue with the XEN zone.

  • Maounique Host Rep, Veteran

    alepore said: use advices that are not somewhere else (or i missed them).

    No problem, but I did not understand that part.
    When we launched the DDoS protection we clearly explained the limits and who should be using it. It is even in our FAQ.

  • sorry, i was referring more to the IWStack info and the differences between the zones.

  • anyway, about the DDoS stuff: i'm not very interested in it, i just want to point out that the site says "This protection is intended for casual target", which seems a bit different from what i've read here.

  • Maounique Host Rep, Veteran

    It depends on what you understand by casual.
    I understand it as people at low risk who happen to be attacked very rarely and by relatively small attacks.
    People who play hardball in the DDoS arena should get that level of protection.
