How to Cache or copy a website?

Tenshi_420 Member
edited February 2014 in Help

I am trying to cache or copy an entire website's contents to save as a backup. I am doing this out of self-preservation, not to clone or phish sites. How can I go about this?

The site contains mostly text with images. Nothing too fancy, just cache/copy.

In case something happens I want to be able to offer copies in an emergency. It's a bit of a secret and very time-sensitive project that I am trying to accomplish as soon as I can.

Thanks for taking the time to read. (:

P.S. I do not own or have access to this website other than its public www.

Comments

  • imagine Member
    edited February 2014

    HTTrack should do the trick: http://www.httrack.com/
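
    A minimal command-line example, roughly along the lines of the HTTrack docs (the URL, output directory, and filter are placeholders, not from this thread):

    # Hypothetical example: mirror a site into ./mirror, staying on its own domain
    httrack "http://www.example.com/" -O "./mirror" "+*.example.com/*" -v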

  • Wget will do the job:

    wget \
      --recursive \
      --no-clobber \
      --page-requisites \
      --html-extension \
      --convert-links \
      --restrict-file-names=windows \
      --domains www.domain.com \
      --no-parent \
      http://www.domain.com/folder/

    Thanked by 1perennate
  • Amitz Member
    edited February 2014

    Just as a friendly reminder from the admin of a website that gets "copied" every day by people who want to "archive" the content: it's a pain in the ass, can put a heavy load on the server, and causes quite some trouble for the admin. Be gentle...

    Thanked by 2jar Pwner
  • To the others: thanks, I'll try these when I have more time. And:

    @Amitz said:
    Just as a short reminder from the owner of a website that gets "copied" every day by people who want to "archive" the content: it's a pain in the ass, can put a heavy load on the server, and causes quite some trouble for the admin. Be gentle...

    Oh, thanks for the heads up. I think I would cache (when I can) one page every 9 seconds, maybe write a cheesy but workable bash script and cron it, something like the sketch below (with supervision).
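
    A rough sketch of that idea, where urls.txt, the ./cache directory, and the 9-second delay are all placeholders:

    #!/bin/bash
    # Rough sketch: fetch one page (plus its images/CSS) every 9 seconds
    # urls.txt is a hypothetical file containing one URL per line
    mkdir -p ./cache
    while read -r url; do
        wget --quiet --page-requisites --convert-links \
             --directory-prefix=./cache "$url"
        sleep 9
    done < urls.txt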

    Thanked by 1Amitz
  • @Amitz said:
    Just as a short reminder from the owner of a website that gets "copied" every day by people who want to "archive" the content: it's a pain in the ass, can put a heavy load on the server, and causes quite some trouble for the admin. Be gentle...

    What site do you run O.O

  • I am responsible for the technical part (not the content) of a niche adult gallery. 15,000+ images and there are people who try to "cache" them all locally on a daily basis...

  • @Amitz can you share a link please?

  • support123 Member
    edited February 2014

    marcm said: @Amitz can you share a link please?

    A link would be better.

  • No, sorry. :)
    I do not want the site to get associated (publicly) with me. I just take care of the backend, not the content.

  • ftpit said: 0.0

    image

    Thanked by 2Rallias Mark_R
  • Use wget, as suggested. There are lots of command-line options to do what you want, including "--mirror".

    Amitz said: It's a pain in the ass and can put a heavy load on the server

    And there's a "wait" parameter (-w) to insert a delay between requests so you don't create an issue like this. Use a healthy wait and queue the job up before going to bed... Next morning you're good to go. For example:
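
    Something along these lines (the 10-second wait and example domain are placeholders):

    # Hypothetical example: polite mirror with a 10-second (randomized) pause between requests
    wget --mirror --convert-links --page-requisites \
         --wait=10 --random-wait \
         --domains www.example.com --no-parent \
         http://www.example.com/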

  • painfreepc Member
    edited March 2014

    An easy way to do it on your home desktop:

    WinHTTrack

    Download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.

  • painfreepc said: An easy way to do it on your home desktop:

    WinHTTrack

    Personally I block it on my servers for the abuse issues that @Amitz raised above.
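
    In case anyone wants to do the same, it usually comes down to a user-agent match; a minimal nginx sketch (it goes inside a server block, and of course a determined scraper can just change its user agent):

    # Hypothetical nginx snippet: refuse requests whose User-Agent mentions HTTrack
    if ($http_user_agent ~* "httrack") {
        return 403;
    }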

  • Radi Host Rep, Veteran
    edited March 2014

    @Amitz said:
    I am responsible for the technical part (not the content) of a niche adult gallery. 15,000+ images and there are people who try to "cache" them all locally on a daily basis...

    Create a weekly archive via cron (it may create some load on the server, but only once a week). Use nginx on an unmetered server to host the archive. :P
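
    A rough sketch of that idea as a crontab entry, assuming the gallery lives in /var/www/gallery and the archive is served from /var/www/archive (both paths are made up):

    # Hypothetical crontab line: tar up the gallery every Sunday at 03:00
    # (% has to be escaped as \% inside a crontab)
    0 3 * * 0 tar -czf /var/www/archive/gallery-$(date +\%F).tar.gz /var/www/gallery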
