Help with multiple CURL commands

eKo Member

Hello,
Can someone help me out with a command like this:

curl --compressed -m 5 --retry 2 --retry-delay 2 --silent -H 'Accept-Encoding: ' --connect-timeout 5 -w 'www.google.com\t:\t%{time_total}\n' -o /dev/null https://www.google.com > /home/url-list.txt

I would like to make a bash script that runs this for multiple URLs read from a txt file (like google.com, google.co.uk, google.co.in, etc.) and saves the results to a txt file.
I'm asking if someone is good with bash and willing to help (I need to run the script from crontab).

p.s. I know 0 about bash...

Thanks!

Comments

  • risharde Patron Provider, Veteran

    I'm on mobile so I can't test said command... are you trying to scrape URLs from Google? If so, Google usually detects these searches as automated after you do a few page loads. I have never tried to bypass the protection; you might need to factor in proxies to achieve the desired result. This was when I tested it a few years back, so I'm not sure if Google has changed since then.

  • eKo Member

    Hello,
    No, actually I'm trying to get the page load time of the URLs listed in a txt file (in this case I used Google as example domains). I need to monitor my own domains' page loads and save them every hour into another txt file (like results-date-hour.txt?).

    a sample urls.txt can be:
    mydomain.com
    mydomain.net
    mydomain.org

    and a sample results.txt can be:

    #

    mydomain.com : 0.140
    mydomain.net : 0.333
    mydomain.org : 1.100

    Tested on Date Time

    #

    Thanks!

  • risharde Patron Provider, Veteran

    Gotcha. As soon as I get to a terminal I will work on it for you, as long as someone else doesn't beat me to it.

  • IonSwitch_Stan Member, Host Rep

    [root@test-vps1 ~]# cat domains
    www.google.com
    www.reddit.com
    www.ionswitch.com

    [root@test-vps1 ~]# cat domains | xargs -i curl --compressed -m 5 --retry 2 --retry-delay 2 --silent -H 'Accept-Encoding: ' --connect-timeout 5 -w '{}\t:\t%{time_total}\n' -o /dev/null https://{} >> log

    [root@test-vps1 ~]# cat log
    www.google.com : 0.491
    www.reddit.com : 0.183
    www.ionswitch.com : 0.180
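
    For anyone unfamiliar with the flags: xargs -i (the older spelling of GNU xargs -I {}) takes one line of input at a time and substitutes it for every {} in the curl command, so each hostname ends up both in the -w label and in the URL. The same one-liner written with the explicit form (a sketch, assuming GNU xargs):

    cat domains | xargs -I {} curl --compressed -m 5 --retry 2 --retry-delay 2 --silent -H 'Accept-Encoding: ' --connect-timeout 5 -w '{}\t:\t%{time_total}\n' -o /dev/null https://{} >> log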

  • eKo Member

    Hello Stan,
    I see you have done it pretty quickly, but can you please make it import the domains from a domains.txt instead of cat domains? And I presume that if I change >> log to >> results.txt it's the same?

    Can a timestamp be added at the bottom of the results, like Date-Time?
    I really appreciate your help, guys!

  • ricardo Member

    In his example, "domains" is the name of the input text file.
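
    A minimal sketch of a script along those lines, assuming GNU xargs and date, bare hostnames (no https://) in /home/domains.txt, and that all paths and filenames here are just examples:

    #!/bin/bash
    # Write each run into its own timestamped file, e.g. /home/results-2017-05-20-14.txt
    OUT="/home/results-$(date +%F-%H).txt"
    # Time each domain listed in /home/domains.txt; -o /dev/null discards the page itself
    cat /home/domains.txt | xargs -i curl --compressed -m 5 --retry 2 --retry-delay 2 --silent -H 'Accept-Encoding: ' --connect-timeout 5 -w '{}\t:\t%{time_total}\n' -o /dev/null https://{} > "$OUT"
    # Footer with the date and time of the test
    echo "Tested on $(date '+%Y-%m-%d %H:%M')" >> "$OUT"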

  • eKo Member

    OK,
    In domains.txt I use:

    https://www.google.co.in  
    https://www.google.co.uk

    the command is:

    rm -rf /home/results.txt && cat /home/domains.txt | xargs -i curl --compressed -m 5 --retry 2 --retry-delay 2 --silent -H 'Accept-Encoding: ' --connect-timeout 5 -w '{}\t:\t%{time_total}\n' -o /dev/null {} >> /home/results.txt

    the results.txt is:

    https://www.google.co.in
        :   0.000
    https://www.google.co.uk    :   0.048

    o.O How is this possible?
    I need to first delete the old results.txt and then create the new one, that's why the rm -rf ...

  • edited May 2017

    -o /dev/null sends the downloaded page itself to /dev/null, i.e. discards it.

    -w '{}\t:\t%{time_total}\n' specifies what is written to stdout.

    >> /home/results.txt appends stdout to the results.txt file, which is why you're only seeing how long it took to download each page.

    As some general advice, read the man page before running any command posted on the Internet. One, you can check that the command isn't malicious, and two, you'll understand what is going on.

    man curl or https://curl.haxx.se/docs/manpage.html.

    Also, rm /home/results.txt will suffice since it's your file.
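
    For the crontab part mentioned earlier, an entry like this (added with crontab -e) would run such a script at the top of every hour; the script path /home/check-pageload.sh is just a placeholder:

    # m h dom mon dow  command
    0 * * * * /bin/bash /home/check-pageload.sh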
