Looking for a price match, 16 Dedis

ricardo Member
edited August 2016 in Requests

We had servers with Delimiter but today their payment processor has decided they don't want our money, and I'm looking for a competitive deal.

What I'm after: 16 Servers

Reason for usage: We fingerprint websites and look at application usage (analytics IDs, JavaScript libraries etc., pretty much like 'builtwith'). It's essentially a simple crawler written in C that fetches the home page and some inner pages of websites. The servers grab and process the pages.

What I'm after:

- 1 IPv4 per server, no IPv6 required
- CPUs we had on the old boxes as shown by /proc/cpuinfo:
    processor   : 7
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 23
    model name  : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
- 8GB RAM per server
- 500GB HD per server
- 5TB B/W, may look for more down the line so would want a cheapish add-on option
- IP space doesn't need to be terribly clean, it may get an automated abuse report or two purely because of anal sysadmins doing so for crawlers.

The specs are a general guideline.

Delimiter were charging $300/m for these. If you're a provider or offering a suggestion, I need the company to be established, e.g. several years in business already. I would pay monthly and am looking to avoid a setup fee (but would be willing to commit to quarterly).


Comments

  • bacloud Member, Patron Provider

    old ones, even duals... http://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E5420+@+2.50GHz

    I can offer you like 10 x E3v3 servers ( http://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E3-1231+v3+@+3.40GHz ) for $540 with 1TB HDD and 100Mbps unlimited.

  • I can offer you something. But not established as long as you'd like.

  • ricardo said: Delimiter were charging $300/m for these. If you're a provider or offering a suggestion, I need the company to be established, e.g. several years in business already.

    Do you need them as dedicated servers? A number of our clients run similar applications but use KVM based instances rather than dedicated boxes.

  • quadhost said: Do you need them as dedicated servers?

    It's processor intensive.

    To clarify, I'm happy with fewer boxes but need comparable CPU performance. I would be OK with 8 boxes with the same RAM and disk.

  • ricardo said: because of anal sysadmins

    waaaat!?

    Thanked by 1bacloud
  • BradyH Member, Host Rep

    @ricardo

    Now, if you had 16 dedicated servers and were only paying $300 a month for all of them, I am not sure anyone could do 19 bucks a month for a dedicated server. Just the rack space alone would cost more than that, plus the bandwidth.

    On the hard drive space, are you using all that space or is that just what they gave you?

    I can set you up on a KVM server that is in Dallas. I can set each KVM server up with 8 GB of RAM.

    The main server has dual E5-2650, SSD drives, RAID 10.

  • CenTexHosting said: Now, if you had 16 dedicated servers and were only paying $300 a month for all of them, I am not sure anyone could do 19 bucks a month for a dedicated server. Just the rack space alone would cost more than that, plus the bandwidth.

    Clearly Delimiter could.

    Thanked by 2netomx deadbeef
  • randvegeta Member, Host Rep

    @Ricardo,

    Is your traffic mostly inbound, outbound, or both? Does location matter?

    $300 for all 16 servers? Do you really need 16 servers, or would fewer but more powerful ones also do?

  • Lee Veteran

    ricardo said: Clearly Delimiter could.

    Not anymore though.

    Thanked by 1Clouvider
  • randvegeta said: Is your traffic mostly inbound, outbound, or both? Does location matter?

    $300 for all 16 servers? Do you really need 16 servers, or would fewer but more powerful ones also do?

    I'm running a crawler so inbound, if I'm thinking from the right perspective. Answered the other question already.

    Lee said: Not anymore though.

    By my choice.

  • randvegeta Member, Host Rep

    @ricardo,

    Long term usage or short term? I have some short term servers I could give you.

  • exception0x876 Member, Host Rep, LIR

    https://lowendbox.com/blog/joes-datacenter-20month-dual-l5420-w-8gb-ram-500gb-hdd-kansas-city/

    It seems comparable in performance/price. However, I haven't tried this provider myself, so I can't vouch for the quality.

    Thanked by 1ricardo
  • https://virmach.com/vds-dedicated-servers/

    8 of those may fit your needs and come close to your price tag if you buy and pay quarterly. SSD is even available if you don't need the HDD space... and maybe buy monthly first and switch to quarterly in steps to leverage the costs over three months :-)

    Or ask @virmach for a custom deal, especially if you need fewer IPs. He might be able to cut some dollars? Just a wild guess ;-)

    Thanked by 1ricardo
  • You could put a bunch of hetzner (auction) boxes on the job: free inbound bw on those things iirc

    Or 4 of these for similar cpu scores: https://www.wholesaleinternet.net/cart/?id=279 will net you LESS cost per month

    Thanked by 1ricardo
  • qps Member, Host Rep
    edited August 2016

    @ricardo QuickPacket is running a special on our Dual Xeon L5640 systems in Atlanta at $49.99 per system per month. You could probably accomplish the same thing with fewer systems since each server has 12 cores/24 threads. Order link is in my signature.

    Thanked by 1ricardo
  • @teamacc said:

    Or 4 of these for similar cpu scores: https://www.wholesaleinternet.net/cart/?id=279 will net you LESS cost per month

    +1 for the CPU/cost ratio - use the additional IPs with proxmox to split them into pieces for easier management/deployment and backup capabilities

  • Some good suggestions at comparable rates. Thanks. I think I'll do some more sums to work out my specific requirements, but the ones mentioned are in the ballpark of what I'm after. RAM/disk requirements are quite low; CPU and bandwidth will be the bottlenecks. I'll be taking a closer look over the next week. Cheers.

  • Falzo said: +1 for the CPU/cost ratio - use the additional IPs with proxmox to split them into pieces for easier management/deployment and backup capabilities

    Sure. Then have his app perform slower because it is based on single core clock, not core availability, plus limiting inbound to much less available BW (100(0)Mbit divided by VPS).

    Hetzner is cheap for i7s, especially if you don't pay VAT - Seflow also ok, but already more expensive. JoesDC/Wholesale (essentially same crap) should work also, but the HE/Cogent mix is pretty bad.

  • @ricardo Is your crawling app heavily multi threaded / multi process, or is it single thread? I.e. is single thread performance more important, or more cores / threads is better?

  • Lots of threads. Give me a moment and I'll type out a fairly detailed description of how it's done. Then hopefully I'll get the creme-de-la-creme analysis from you lot :)

  • ricardo Member
    edited August 2016

    This is a pretty detailed gist of what the boxes will do, and I'm open to ideas of what hardware (or offer) would fit the bill. The one caveat I'd add is that extra bandwidth and more boxes is a distinct possibility.

    • Crawler, will do around 200 million pages a month, pretty much the home page of domains I know of that are currently resolving. Would be looking to scale it up into the low billions by grabbing some inner pages.
    • One box dealing with DNS using unbound. It also deals with creating and handing out crawler queues, mostly from memory, so disk will be relatively untouched.
    • libcurl threaded on other boxes fetching the pages, gzipped where possible. My baseline number for page size is 25 kilobytes.
    • Other boxes do a series of pattern matches, mostly using PCRE regex (around 1000 regexes). Metadata is stored and the page is discarded (hence not much need for disk but a need for CPU).
    • Processed data is sent off to some other servers.
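
A rough sketch of the split described in the list above: one shared queue handing out domains, worker threads fetching, and a processing step that keeps only metadata. This is a Python stand-in for illustration only, not the C/libcurl code the post describes. `run_crawl`, `fake_fetch` and `count_scripts` are hypothetical names, and the fetch step is stubbed so the sketch runs without network access.

```python
import queue
import threading


def run_crawl(domains, fetch, process, workers=4):
    """Hand out domains from a shared queue to worker threads.

    fetch(domain) returns the page text; process(domain, page) returns
    metadata. Only the metadata is kept; the page body is discarded.
    """
    todo = queue.Queue()
    for domain in domains:
        todo.put(domain)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                domain = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            page = fetch(domain)
            meta = process(domain, page)
            with lock:
                results[domain] = meta

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins so the sketch runs offline.
def fake_fetch(domain):
    return f"<html><script src='https://cdn.example/{domain}.js'></script></html>"


def count_scripts(domain, page):
    return {"scripts": page.count("<script")}


results = run_crawl(["a.example", "b.example", "c.example"], fake_fetch, count_scripts)
print(results)
```

In the setup described, the queue would live on the DNS/queue box and the fetch and match steps would run on separate machines; a single process is used here only to keep the example self-contained.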

    My previous incarnation of this used a headless browser (phantomJS) because I was interested in Javascript manipulating the DOM, but I've found out it doesn't make all that much difference for the fingerprints I'm interested in. I ran around 30 threads per box and was maxing out on CPU, but using libcurl I imagine the CPU load will be 90% lower for crawling and more intensive on the PCRE front, assuming I crawl quicker.

    • So 200 million pages * 25 kilobytes is roughly 5TB which I'd do monthly.
    • Ideally I'd crawl in around 2 weeks, monthly, so that's around 8.3 million pages daily, or 100 pages a second
    • Seems like a 1Gbit pipe would be OK for now
    • PhantomJS would do around 12 pages a second per box on those CPUs, so even 8 boxes with a comparable processor would do fine, or 4 boxes with a newer gen CPU. Thoughts? The load on each box was comparable to the number of hardware threads each box had (8), or lower. The load was a bit unpredictable due to shoddy JS on the web and the variability of page/DOM sizes.
    • Around 10MB should be fine for each thread.
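
The sums in the list above can be checked with a few lines of Python. This is a back-of-envelope sketch that assumes decimal units (25 KB = 25,000 bytes) and an even spread of requests over the crawl window; the function names are made up for illustration. For a strict 14-day window the rate works out to roughly 14.3 million pages a day, or about 165 pages a second. The 8.3 million a day and 100 pages a second quoted correspond to a window of about 23 days.

```python
import math

PAGES_PER_MONTH = 200_000_000   # from the post
AVG_PAGE_BYTES = 25_000         # 25 KB baseline, decimal units assumed
SECONDS_PER_DAY = 86_400


def crawl_rates(pages, window_days):
    """Return (pages per day, pages per second) for a crawl window."""
    per_day = pages / window_days
    return per_day, per_day / SECONDS_PER_DAY


def bandwidth_mbit(pages_per_second, page_bytes=AVG_PAGE_BYTES):
    """Average inbound rate in megabits per second."""
    return pages_per_second * page_bytes * 8 / 1_000_000


def boxes_needed(pages_per_second, per_box_rate):
    """Boxes required if each one sustains per_box_rate pages a second."""
    return math.ceil(pages_per_second / per_box_rate)


total_tb = PAGES_PER_MONTH * AVG_PAGE_BYTES / 1e12
per_day_14, per_sec_14 = crawl_rates(PAGES_PER_MONTH, 14)
days_at_100 = PAGES_PER_MONTH / (100 * SECONDS_PER_DAY)

print(f"monthly transfer: {total_tb:.1f} TB")
print(f"14-day window:    {per_day_14 / 1e6:.1f}M pages/day, {per_sec_14:.0f} pages/s")
print(f"average inbound:  {bandwidth_mbit(per_sec_14):.1f} Mbit/s")
print(f"days at 100 p/s:  {days_at_100:.1f}")
print(f"boxes at 12 p/s:  {boxes_needed(per_sec_14, 12)}")
```

Either way the average inbound rate is a small fraction of a 1Gbit port, so the port only matters for bursts. The 12 pages a second per box is the PhantomJS figure from the post, so the box count is an upper bound if libcurl turns out to be cheaper on CPU.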
    Thanked by 1rincewind
  • What load are you currently pulling on your boxes? Or are they cancelled due to payment malfunction?

  • I've updated my post regarding load.

  • exception0x876 Member, Host Rep, LIR

    @Zen said:
    Even older E3's benchmark twice as fast and have twice as many threads.

    ricardo said: processor : 7

    His current systems seem to have dual CPUs, so each should be close to one E3.
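
The inference rests on /proc/cpuinfo numbering processors from zero: a last entry of `processor : 7` means 8 logical CPUs, and since the E5420 is a quad-core part without Hyper-Threading, that points to two sockets. A small Python sketch of counting this from the file contents follows. The sample text is a hypothetical reconstruction (the original post shows only the last stanza), and the `physical id` values are assumed.

```python
def summarize_cpuinfo(text):
    """Return (logical CPU count, socket count) from /proc/cpuinfo text."""
    logical = set()
    sockets = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between stanzas
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical.add(int(value))
        elif key == "physical id":
            sockets.add(int(value))
    return len(logical), len(sockets)


# Hypothetical sample: 8 stanzas, 4 per socket (assumed layout).
SAMPLE = "\n\n".join(
    f"processor\t: {n}\n"
    f"vendor_id\t: GenuineIntel\n"
    f"physical id\t: {n // 4}\n"
    f"model name\t: Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz"
    for n in range(8)
)

logical_cpus, socket_count = summarize_cpuinfo(SAMPLE)
print(logical_cpus, socket_count)
```

On a real box the same function could be fed `open("/proc/cpuinfo").read()`.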

  • SpeedyKVM Banned, Member
    edited August 2016

    https://speedykvm.com/#vdedi

    V-DEDICATED #1 (1x KVM VPS on dedicated host node) with coupon LET, $21.75/mo for;

    • 16GB ECC RAM (15GB usable)
    • Xeon E3-1230 (8 CPU)
    • 120GB SSD (90GB usable)
    • Single SSD
    • Full KVM Virtualization
    • 10TB Transfer, 1Gbps Port
    • 1 IPv4 ($1 for more), /64 IPv6
    • UPS Protected Power

    Seems ideal for a crawler as you have the dedicated resources, dedicated gigabit port, etc. If you need more storage get a storage plan too, or larger vdedi and nfs mount it.

    Thanked by 1ricardo
  • rincewind Member
    edited August 2016

    Wow. That's some serious number crunching.

    For string matching, Pire looks right up your alley for combining multiple regex matches in a single pass. There's also Google's RE2 and PCRE-SLJIT compared in this benchmark. For C++, I like Boost::Xpressive's static regexes.
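
To illustrate the single-pass idea in general terms, here is a sketch using Python's standard `re` module rather than Pire, RE2 or PCRE. Several patterns are joined into one alternation with named groups so the page is scanned once. The fingerprint patterns are hypothetical examples, not the poster's actual rules.

```python
import re

# Hypothetical fingerprints; inner groups are non-capturing so that
# Match.lastgroup always names the top-level alternative that matched.
FINGERPRINTS = {
    "google_analytics": r"UA-\d{4,10}-\d{1,4}",
    "jquery": r"jquery[.-]\d+\.\d+(?:\.\d+)?(?:\.min)?\.js",
    "wordpress": r"/wp-content/",
}

COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in FINGERPRINTS.items()),
    re.IGNORECASE,
)


def scan(page):
    """Return the names of all fingerprints seen in one pass over page."""
    return {match.lastgroup for match in COMBINED.finditer(page)}


page = (
    '<script src="/wp-content/themes/x/jquery-1.12.4.min.js"></script>'
    "<script>ga('create', 'UA-123456-1', 'auto');</script>"
)
print(sorted(scan(page)))
```

One caveat: a backtracking alternation reports only one pattern per match position and skips overlapping matches, so this does not behave identically to running each regex separately. Automaton-based engines handle many patterns at once differently, which is part of their appeal at a thousand patterns per page.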

    Are you avoiding XML parsing for performance reasons? I would have thought libxml2 parsing with XPath matching would be easier, and with 1000 regexes per page, you will amortize the expense of building the XML tree.

    EDIT: Meant Boost::Xpressive and not Boost::Spirit

    Thanked by 1ricardo
  • You might want to grab a few of the Intel Atom C2750 from Online.net. €15.99 each, pretty much the same price and better specs

  • Eobble said: You might want to grab a few of the Intel Atom C2750 from Online.net. €15.99 each, pretty much the same price and better specs

    Online's IP reputation is pretty crappy though.

  • Eobble said: You might want to grab a few of the Intel Atom C2750 from Online.net. €15.99 each, pretty much the same price and better specs

    You probably missed the point that those Delimiter boxes come with dual CPUs, so I think the C2750 won't deliver better performance ;-)

  • ricardo Member
    edited August 2016

    @Incero

    That looks like a nice deal, what part of it isn't dedicated then? Would you do a deal on 10 paid quarterly? I'd probably have a few more things to check out before going with it, maybe I can PM.

    rincewind said: Are you avoiding XML parsing for performance reasons

    Thanks for the thoughts on it, rince. Previously PhantomJS already had the DOM tree, so that's where the heavy lifting was done. I haven't looked into anything other than PCRE regexes for now, but will check out the other libraries you mentioned. Fact is I couldn't really get into the inner workings of PhantomJS, and I'm pretty confident anything I do in C will outperform it, particularly given all the stuff a headless browser has to do compared to a dumb grab via cURL. I probably will do some XML parsing, as the majority of fingerprints are located in specific elements; I don't look at the whole page for each regex (e.g. hundreds of them just look at src and href attributes). The <head>, <a> and <script> elements contain most of the stuff, so I can even skip most of the string and just parse those sequentially with a subset of the regexes.
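
A minimal sketch of the "only look at src and href" idea, using Python's standard `html.parser` as a stand-in for whatever C parser ends up being used. The `AttrCollector` class, the `fingerprint_urls` function and the URL patterns are hypothetical. The point is just that the URL-oriented subset of regexes runs against a handful of short attribute strings instead of the whole page.

```python
import re
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect src/href attribute values and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.urls.append(value)


# Hypothetical URL-only fingerprints.
URL_PATTERNS = {
    "wordpress": re.compile(r"/wp-content/"),
    "jquery": re.compile(r"jquery(?:[.-][\d.]+)?(?:\.min)?\.js"),
    "shopify": re.compile(r"cdn\.shopify\.com"),
}


def fingerprint_urls(page):
    """Run the URL regex subset against src/href values only."""
    collector = AttrCollector()
    collector.feed(page)
    collector.close()
    hits = set()
    for url in collector.urls:
        for name, pattern in URL_PATTERNS.items():
            if pattern.search(url):
                hits.add(name)
    return hits


page = (
    '<html><head><link href="/wp-content/themes/a/style.css"></head>'
    "<body><p>we use cdn.shopify.com</p>"
    '<a href="https://example.com/">home</a>'
    '<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js">'
    "</script></body></html>"
)
print(sorted(fingerprint_urls(page)))
```

Note that the Shopify hostname in the body text is deliberately not reported, since it never appears in a src or href attribute.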

    Thanks to others for posting insights. I'm not too clued up on a CPU like for like comparison but have a slightly better idea today. It does seem quite reasonable that I can get similar CPU performance for a similar price.

    Thanked by 2SpeedyKVM rincewind