Looking for (urgent!) help on an archiving project for a few days

joepie91joepie91 Member, Patron Provider

We (ArchiveTeam) are currently archiving Blip.tv, as it's shutting down, but we've run into a bit of a capacity snafu. It's an all-volunteer (and very time-sensitive!) effort.

The archival is distributed amongst many systems, and the archived data is then uploaded to one of several 'rsync targets' - collection boxes, pretty much. These rsync servers need to be pretty beefy, as they get potentially several gbps of data thrown at them constantly, for the duration of an archival project.

Now our primary rsync target has filled up to the brim, and we can't get it emptied out in time - Blip is already in the process of shutting down, and most likely the content will disappear sometime today. The rsync servers we are currently using are having throughput issues. We're still trying to save the last bits, however, so we could really use any help.

What we're looking for: Somebody to donate spare server space (or a VM, it doesn't matter) for a few days. Expect it to need a few TB of disk space, and inbound bandwidth to possibly hit a few gbps, at least for the first day. After Blip shuts down (likely later today), we'll probably still need a few days to move everything back off.

All it will really need to run is an rsync daemon - shell/root access would be handy (to set up processing scripts), but is not strictly required. Expect it to use a lot of disk I/O - video files, so largely sequential writes.
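To get a feel for the numbers in this request, here is a rough back-of-the-envelope sketch in Python. The 10 TB and 2 Gbps figures are illustrative assumptions (not measurements from the project), and the helper name is made up for this example.

```python
def hours_to_fill(disk_tb: float, inbound_gbps: float) -> float:
    """Hours until `disk_tb` terabytes fill up at a sustained inbound rate."""
    disk_bits = disk_tb * 1e12 * 8              # decimal terabytes -> bits
    seconds = disk_bits / (inbound_gbps * 1e9)  # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed example values: a 10 TB box receiving a steady 2 Gbps.
    print(f"{hours_to_fill(10, 2):.1f} hours")  # prints "11.1 hours"
```

In other words, at sustained multi-gigabit rates a box with a few TB of disk can fill up in well under a day, which is why both disk space and the ability to move data back off matter here.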

If you can help out, then please either join the IRC channel (#blooper.tv on EFNet), or PM me on here.

Thanks!

Comments

  • netomxnetomx Moderator, Veteran

    Why today, when I'm out of town? :(

  • Why does the information on your site mention shutting down in 2014, or have I missed something?

  • joepie91joepie91 Member, Patron Provider

    @AshleyUk said:
    Why does the information on your site mention shutting down in 2014, or have I missed something?

    Apparently they've had a few false starts (well, stops) where they announced a possible shutdown, but eventually didn't go through with it. This time it's for real, though.

    There's a few other sites that have done similar things, like TwitPic. We've started to just consider them early warning signs for an actual shutdown :)

  • @Joepie91 - PM me with what you need, I will see what I can do.

  • What's the point of this entire project?

  • archive.org alternative

    Maniac said: What's the point of this entire project?

  • joepie91joepie91 Member, Patron Provider
    edited August 2015

    @GM2015 said:
    archive.org alternative

    Well, not quite. We're uploading the Blip content to archive.org for storage - we're just the ones actually grabbing the content. In this case, archive.org would still be doing the long-term storage.

    We're also backing up the Internet Archive, but that's a separate project :)

  • ManiacManiac Member
    edited August 2015

    joepie91 said: We're uploading the Blip content to archive.org for storage

    The idea of backing up a website that's shutting down and wasting plenty of space just to save videos that likely nobody is going to watch just amazes me. I'm sorry for being so critical, but I just don't get the idea.

  • LeeLee Veteran

    Maniac said: but I just don't get the idea.

    Clearly not.

  • Librarians' obsession.

    Everything needs to be found, nice, tidy and neat.

    Don't you put your stinky hands on my videos!

    Maniac said: The idea of backing up a website that's shutting down and wasting plenty of space just to save videos that likely nobody is going to watch just amazes me. I'm sorry for being so critical, but I just don't get the idea.

  • Do you have a disk space amount that'll be required? I can spin up a box in DC now.

  • joepie91joepie91 Member, Patron Provider
    edited August 2015

    Maniac said: The idea of backing up a website that's shutting down and wasting plenty of space just to save videos that likely nobody is going to watch just amazes me. I'm sorry for being so critical, but I just don't get the idea.

    Jason Scott has done an excellent series of talks on this topic if you want to learn more about it, but the very concise summary is:

    Companies convince you to store your data with them, (implicitly) promising that they'll keep it safe. They'll then proceed to delete it when it becomes unprofitable to store or continue operating, not caring about anybody's possible emotional attachment or practical use of it.

    This hurts the historical record (which is actually very important, socially), this hurts those individuals whose data has been lost (often there's little notice given, and they only find out that data is gone afterwards, when it can't be recovered anymore), and it hurts everybody else, because that thing you're looking for suddenly doesn't exist anymore, because some suit somewhere decided that it didn't bring in enough money.

    If you've ever used the Wayback Machine: that's why.

    @KwiceroLTD said:
    Do you have a disk space amount that'll be required? I can spin up a box in DC now.

    I'll send you a PM :)

    Estimate is 9-10 TB worst-case, but PMs are probably faster to talk because of notifications.

    EDIT: The current statistics, for those who are curious: http://tracker.archiveteam.org/blip/

  • I have to disagree.

    Losing data is great.

    It brings back time in the equation. It makes things forgettable again. People can move on and start fresh again.

  • wazzz7733334 is me :D

  • How much space does Blip.tv use? Are we talking about dozens/hundreds of TB?

  • joepie91joepie91 Member, Patron Provider
    edited August 2015

    @nitro85 said:
    How much space does Blip.tv use? Are we talking about dozens/hundreds of TB?

    We're currently at about 172TB of space used. My estimate (but this may or may not be accurate, depending on how long videos turn out being) is that we'd need another 10TB or so. We're pretty close already!

    Live statistics are here, though the average is off.

  • nitro85nitro85 Member
    edited August 2015

    @joepie91 said:
    We're currently at about 172TB of space used. My estimate (but this may or may not be accurate, depending on how long videos turn out being) is that we'd need another 10TB or so. We're pretty close already!
    Live statistics are here, though the average is off.

    Unfortunately I can't help...
    What do you guys plan to do with all those cute cat videos?

    Dump it all into YouTube and Dailymotion accounts, and make an archive tube-like site for blip.tv with all those videos embedded?

  • joepie91joepie91 Member, Patron Provider
    edited August 2015

    nitro85 said: Unfortunately I can't help... What do you guys plan to do with all those cute cat videos?

    Dump it all into YouTube and Dailymotion accounts, and make an archive tube-like site for blip.tv with all those videos embedded?

    It's actually not all cat videos! Blip was an 'original web series' site, which more or less comes down to a vlog site. They had a ton of other content as well, but that was their primary purpose.

    Data will be stored at the Internet Archive, primarily. Where possible, it will be made available through the Wayback Machine, but this may not work correctly. In all cases, you can grab the original data in bulk from the IA collection, which is updated with 25GB chunks as the archival progresses (with a few hours delay).

    There are no plans (to my knowledge) of building a custom site for it, but of course any third party could do so if they wanted to :) For example, the GeoCities backup (also from ArchiveTeam) has been used in bulk by a bunch of cultural/social research projects, and I believe there has also been a "browse the old GeoCities" site built with it.

  • @joepie91 said:

    Are the rsync source servers in USA or Europe?

    I imagine that would impact the usefulness of donated servers (peak mbps). Kimsufis would be quite pointless with that deadline, for instance.

  • joepie91joepie91 Member, Patron Provider

    vimalware said: Are the rsync source servers in USA or Europe?

    I imagine that would impact the usefulness of donated servers (peak mbps). Kimsufis would be quite pointless with that deadline, for instance.

    The Warriors (ie. actual archiving systems) are distributed across the globe, really. I believe we had a live map at some point, but I can't seem to find it right now...

    The rsync target servers are currently in Singapore, Germany and Atlanta. We do indeed have some issues with transatlantic speeds, though that's kind of inevitable as the 'tracker' doesn't support geotargeting yet. It's at least somewhat made up for by running lots of threads, though.

    Kimsufis aren't really an option as rsync target servers, but mostly because they can't go above 100mbps. The tracker also doesn't support 'weighting' yet, so anything under 1gbps would just become a bottleneck for clients (as it would get just as many clients as the 1gbps servers, and keep their upload threads occupied).
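The bottleneck effect described above can be sketched with a toy model in Python. This is only an illustration, not the tracker's actual logic: the server names, capacities and client count are assumptions, and the function is hypothetical.

```python
def per_client_mbps(servers: dict[str, float], clients: int, weighted: bool) -> dict[str, float]:
    """Approximate upload speed each client sees on every server.

    `servers` maps a (hypothetical) server name to its capacity in Mbps.
    """
    total = sum(servers.values())
    result = {}
    for name, capacity in servers.items():
        if weighted:
            share = clients * capacity / total  # clients proportional to capacity
        else:
            share = clients / len(servers)      # every server gets the same count
        result[name] = capacity / share         # capacity split among its clients
    return result


if __name__ == "__main__":
    # Assumed setup: two 1 Gbps targets, one 100 Mbps target, 300 clients.
    targets = {"gigabit-a": 1000, "gigabit-b": 1000, "kimsufi": 100}
    for mode in (False, True):
        speeds = per_client_mbps(targets, clients=300, weighted=mode)
        print("weighted" if mode else "unweighted",
              {name: round(v, 1) for name, v in speeds.items()})
```

Without weighting, the clients that land on the 100 Mbps box get about a tenth of the speed of everyone else and keep their upload threads busy for much longer; with weighting, every client would see roughly the same speed.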

    They're fine as Warriors, though - I'm running two Kimsufis as Warriors myself with this script.

  • Can you make an installer for OpenVZ? VirtualBox alone isn't the easiest.

  • Had my KS-1 running its last few days as a Warrior on the blip.tv project - managed to upload nearly 1TB.
    So (except for the bandwidth) it is not that resource-hungry and can be run on lower-powered machines! Just in case somebody is thinking about helping and is not sure...

  • joepie91joepie91 Member, Patron Provider
    edited August 2015

    TinyTunnel_Tom said: Can you make an installer for OpenVZ? VirtualBox alone isn't the easiest.

    You can just follow the distribution-specific manual setup instructions on this page for OpenVZ, it should only take 5 minutes or so. The VirtualBox VM is really only for desktop systems - the instructions on the page linked above will set up the archiving script directly on your system (under a limited user), without additional virtualization.

    There's also a Docker image, but I have no idea how well that works.

    The Warrior VM mostly exists because NTFS/Windows are not very suitable for archiving (due to technical limitations), so you need to run a virtualized Linux environment there. If you're already running Linux, then you won't really need the VM.

    Bochi said: Had my KS-1 running its last few days as a Warrior on the blip.tv project - managed to upload nearly 1TB.

    So (except for the bandwidth) it is not that resource-hungry and can be run on lower-powered machines! Just in case somebody is thinking about helping and is not sure...

    Indeed. For this particular project, you should expect about 10mbps of bandwidth usage and 0-2GB of disk space usage per concurrent thread. It cleans up the data after uploading it to an rsync server, so it doesn't need any long-term storage. Try to use touch STOP if you want to gracefully kill the process, though - killing the process directly will result in tasks getting 'stuck'.
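The general idea behind a stop file like this can be sketched as follows. This is a generic illustration in Python, not the Warrior's actual code: the function and task names are hypothetical, and the only thing taken from the thread is that a file named STOP signals a graceful stop.

```python
import tempfile
from pathlib import Path


def run_until_stopped(tasks, stop_file: Path):
    """Process tasks one by one, checking for the stop file between tasks."""
    done = []
    for task in tasks:
        if stop_file.exists():
            break            # stop picking up new work
        done.append(task())  # a started task always runs to completion
    return done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        stop_file = Path(workdir) / "STOP"

        def make_task(n):
            def task():
                if n == 2:
                    stop_file.touch()  # simulates running `touch STOP` mid-run
                return n
            return task

        # Task 2 finishes normally; tasks 3 and 4 are never started.
        print(run_until_stopped([make_task(n) for n in range(5)], stop_file))  # [0, 1, 2]
```

Because the check happens only between tasks, nothing is abandoned halfway through, which is what killing the process directly would risk.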

    For what it's worth, you can improve throughput by editing pipeline.py before starting the process.

    Change this:

        LimitConcurrent(NumberConfigValue(min=1, max=4, default="1",
            name="shared:rsync_threads", title="Rsync threads",

    ... into:

        LimitConcurrent(NumberConfigValue(min=1, max=20, default="20",
            name="shared:rsync_threads", title="Rsync threads",

    That will raise the concurrency limit on the rsync uploads (from a maximum of 4 threads to 20, with 20 as the default), which will help especially if your server is in the US - most rsync servers are a bit slow to upload to from US locations. This really should've been changed in the repository itself, but for some reason it wasn't.
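As a rough illustration of why more upload threads help on a slow path, here is a simple model in Python. The per-thread and link speeds are assumed numbers for the sake of the example, and the function is hypothetical; real throughput depends on latency, the receiving server and much more.

```python
def aggregate_mbps(threads: int, per_thread_mbps: float, link_mbps: float) -> float:
    """Total upload speed: threads add up until the local link is saturated."""
    return min(threads * per_thread_mbps, link_mbps)


if __name__ == "__main__":
    # Assumption: each rsync upload manages ~5 Mbps on a slow long-distance
    # path, and the machine has a 100 Mbps uplink.
    for threads in (1, 4, 20):
        print(threads, "threads ->", aggregate_mbps(threads, 5, 100), "Mbps")
```

Under these assumptions, one thread gives 5 Mbps, four give 20 Mbps, and twenty fill the 100 Mbps link, so a cap of 4 would leave most of the uplink idle.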

  • I am away on holiday. I have a few LESes, if they would help, along with some LHCs :P Shoot me a PM, and feel free to max out their bandwidth.

  • joepie91joepie91 Member, Patron Provider

    We've saved Blip :D!

    I just want to say thanks to everybody who contributed. Thankfully the Blip servers stayed available throughout the weekend, so we could squeeze the last bits out of it. The donated rsync targets have been returned, after moving the data to more permanent locations.

    Of course future help is still welcome for other projects - not much is going on right now, but I'm sure that things will start happening again soon... they usually do. I'm working on Nix packages for the Warrior scripts, so that it should be possible to install them with one command in the future, distro-independently.

    For those who like ArchiveTeam: Jason Scott (who also started ArchiveTeam) has now started a new project, Archive Corps, for similarly short-term volunteer archival efforts... but in the physical world. If you're possibly interested in helping out with that (regardless of where you live), then you can go here to sign up for the mailing list :)

  • netomxnetomx Moderator, Veteran

    @joepie91 said:
    We've saved Blip :D!

    I just want to say thanks to everybody who contributed. Thankfully the Blip servers stayed available throughout the weekend, so we could squeeze the last bits out of it. The donated rsync targets have been returned, after moving the data to more permanent locations.

    Of course future help is still welcome for other projects - not much is going on right now, but I'm sure that things will start happening again soon... they usually do. I'm working on Nix packages for the Warrior scripts, so that it should be possible to install them with one command in the future, distro-independently.

    For those who like ArchiveTeam: Jason Scott (who also started ArchiveTeam) has now started a new project, Archive Corps, for similarly short-term volunteer archival efforts... but in the physical world. If you're possibly interested in helping out with that (regardless of where you live), then you can go here to sign up for the mailing list :)

    Count me in, how much do you need? (HDD)

  • LeeLee Veteran
    edited September 2015

    joepie91 said: We've saved Blip :D!

    Nice work, it's a really worthwhile project.

  • joepie91joepie91 Member, Patron Provider

    @netomx said:
    Count me in, how much do you need? (HDD)

    For Warrior projects? That depends on the project. Blip required about 1GB per concurrent thread, but most projects are smaller (around 25MB per concurrent thread). Data is removed locally after being uploaded to the collection servers :)
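For a quick estimate of the temporary disk space a Warrior needs, the per-thread figures above can be multiplied out. A small Python sketch, where the 20-thread count is an assumed example and the function name is hypothetical:

```python
def scratch_space_gb(threads: int, mb_per_thread: float) -> float:
    """Temporary disk space in GB for a given number of concurrent threads."""
    return threads * mb_per_thread / 1000


if __name__ == "__main__":
    # Per-thread sizes from this post: ~1 GB (Blip) and ~25 MB (typical project).
    print("Blip-sized project:", scratch_space_gb(20, 1000), "GB")
    print("Typical project:", scratch_space_gb(20, 25), "GB")
```

With 20 threads that comes to about 20 GB for a Blip-sized project and about 0.5 GB for a typical one, and since data is removed after upload, this is peak usage rather than something that grows over time.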

  • netomxnetomx Moderator, Veteran

    @joepie91 said:

    For backups
