Looking for a few easy scripts that will do this for me

alc

New member
Mar 3, 2007
Anyone got an easy script that will go through hundreds of URLs, check for repeats, and delete them? Also wondering if there is one that will click through to check if the sites are active and delete any with error pages?
 


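mattseh's dedupe script isn't quoted in this thread, but the "check for repeats and delete them" half of the question is a one-liner with coreutils. A minimal sketch, with the filenames just as examples:
Code:
# keep one copy of each URL, drop the repeats
sort -u urls.txt > unique-urls.txt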
one-upsmanship -- after using mattseh's script to get the unique addresses, this will load each page with curl, print its HTTP status, and append the ones that come back HTTP 200 (OK) to the status-ok.txt file
Code:
# sort -u strips the duplicate URLs from the list
for i in $(sort -u bigfuckingfile.txt); do
  # -i includes the response headers, -s hides curl's progress output
  page=$(curl -is "$i")
  # first header line looks like "HTTP/1.1 200 OK"; field 2 is the status code
  out=$(echo "$page" | head -n1 | cut -d' ' -f2)
  echo -e "Status:\t$out \t$i"
  # append the ones that answered 200 OK to status-ok.txt
  [[ "$out" == "200" ]] && echo "$i" >> status-ok.txt
done
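If you're pointing this at hundreds of URLs, one dead host can stall the whole loop, so it may be worth giving curl a timeout. A small variation on the same loop (the 10-second limits are just a guess, tune to taste):
Code:
# same loop, but give up on any URL that hasn't answered within 10 seconds
for i in $(sort -u bigfuckingfile.txt); do
  out=$(curl -is --connect-timeout 10 -m 10 "$i" | head -n1 | cut -d' ' -f2)
  echo -e "Status:\t$out \t$i"
  [[ "$out" == "200" ]] && echo "$i" >> status-ok.txt
done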
suck my balls, matt
 
just for the sake of completeness --
Code:
# example input: bigfuckingfile.txt
http://www.google.com
http://www.google.com
http://google.com
http://google12345.com

# script output (blank status = curl got no response for that URL):
Status:	 	http://google12345.com
Status:	301 	http://google.com
Status:	200 	http://www.google.com

# example output file: status-ok.txt
http://www.google.com
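One thing the output above shows: http://google.com answers with a 301 and gets dropped even though the site is live. Not what the original script does, but if you want redirected sites counted too, a sketch like this lets curl follow the redirect (-L) and report the final status code itself via -w '%{http_code}':
Code:
# follow redirects so a 301 that lands on a working page still counts as 200
for i in $(sort -u bigfuckingfile.txt); do
  out=$(curl -o /dev/null -sL -w '%{http_code}' "$i")
  echo -e "Status:\t$out \t$i"
  [[ "$out" == "200" ]] && echo "$i" >> status-ok.txt
done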