How to block Ahrefs, MajesticSEO and others?

furfing

Hello,

I'd like to hear your suggestions for blocking Ahrefs, MajesticSEO, and others from crawling your website.

Thx for your suggestions.
 


Personally I just have honeypot URLs; once one is visited, the IP gets blocked, with exceptions for Google/Bing.

No one browsing the site would visit this URL, so anything hitting it is most likely a bot or someone being nosey.


Exceptions for Google are easy as their IP addresses are listed; Bing's aren't, however, so be careful. Many bots use Googlebot/Bingbot as their user-agent, so you do need to verify the IP for validity. A sketch of that check follows.
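
Here's a minimal sketch of that honeypot check in Python, assuming a setup where offending IPs get appended to a blocklist file your server config denies from; BLOCKLIST_PATH and handle_honeypot are illustrative names, not any particular stack. Google and Bing both document verifying their crawlers by a reverse DNS lookup followed by a forward-confirm:

import socket

BLOCKLIST_PATH = "blocked_ips.txt"  # illustrative: a file your server denies from

# Hostname suffixes that genuine Google/Bing crawlers reverse-resolve to,
# per both companies' crawler-verification docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_search_bot(ip: str) -> bool:
    """Reverse DNS lookup, suffix check, then forward-confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except socket.herror:
        return False
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the same IP; otherwise
        # the PTR record could simply be spoofed.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False

def handle_honeypot(ip: str) -> None:
    """Called when something requests the hidden honeypot URL."""
    if is_verified_search_bot(ip):
        return  # never block a verified Google/Bing crawler
    with open(BLOCKLIST_PATH, "a") as f:
        f.write(ip + "\n")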
 
I think one of CCarter's enlightened threads had the .htaccess code for doing this.

I never used it on my sites, though.

BTW, none of my PBN clients have asked for this, so maybe it's not so important after all...
 
Hey furfing,

You need to put an ".htaccess" file on your server, in the public_html folder (the root of your website).

The contents of the .htaccess file are:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

# SetEnvIfNoCase comes from mod_setenvif, not mod_rewrite,
# so that's the module to test for here.
<IfModule mod_setenvif.c>
# Each line flags a matching User-Agent with the bad_bot variable.
SetEnvIfNoCase User-Agent .*ia_archiver.* bad_bot
SetEnvIfNoCase User-Agent .*rogerbot.* bad_bot
SetEnvIfNoCase User-Agent .*exabot.* bad_bot
# "mj1" also matches Majestic's MJ12bot
SetEnvIfNoCase User-Agent .*mj1.* bad_bot
SetEnvIfNoCase User-Agent .*dotbot.* bad_bot
SetEnvIfNoCase User-Agent .*gigabot.* bad_bot
SetEnvIfNoCase User-Agent .*ahrefsbot.* bad_bot
SetEnvIfNoCase User-Agent .*sitebot.* bad_bot
SetEnvIfNoCase User-Agent .*Twice.* bad_bot
SetEnvIfNoCase User-Agent .*Yand.* bad_bot
SetEnvIfNoCase User-Agent .*Yahoo.* bad_bot
SetEnvIfNoCase User-Agent .*Voil.* bad_bot
SetEnvIfNoCase User-Agent .*libw.* bad_bot
SetEnvIfNoCase User-Agent .*Java.* bad_bot
SetEnvIfNoCase User-Agent .*Sogou.* bad_bot
SetEnvIfNoCase User-Agent .*psbot.* bad_bot
SetEnvIfNoCase User-Agent .*boitho.* bad_bot
SetEnvIfNoCase User-Agent .*ajSitemap.* bad_bot
SetEnvIfNoCase User-Agent .*Rankivabot.* bad_bot
SetEnvIfNoCase User-Agent .*DBLBot.* bad_bot
SetEnvIfNoCase User-Agent .*Alexa.* bad_bot
SetEnvIfNoCase User-Agent .*Ezooms.* bad_bot
SetEnvIfNoCase User-Agent .*YodaoBot.* bad_bot
SetEnvIfNoCase User-Agent .*CompSpyBot.* bad_bot
SetEnvIfNoCase User-Agent .*JikeSpider.* bad_bot
SetEnvIfNoCase User-Agent .*FyberSpider.* bad_bot
SetEnvIfNoCase User-Agent .*Sosospider.* bad_bot
# Careful: this one matches anything with "spider" in the UA
SetEnvIfNoCase User-Agent .*Spider.* bad_bot
SetEnvIfNoCase User-Agent .*icerocket.* bad_bot
SetEnvIfNoCase User-Agent .*blogsearch.* bad_bot

# Order/Allow/Deny is Apache 2.2 syntax; Apache 2.4 needs mod_access_compat for it.
# Note that <Limit GET POST HEAD> leaves other HTTP methods unrestricted.
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
</IfModule>
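
Once that's in place, you can sanity-check it by spoofing a blocked user-agent yourself (yoursite.com standing in for your own domain); a flagged agent should get a 403 back:

curl -I -A "AhrefsBot" http://yoursite.com/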

The list of user-agent strings might need updating; there are lists of bot strings lying around the web and forums somewhere.

This is a good start, though.
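
For what it's worth, Ahrefs and Majestic both say their crawlers obey robots.txt, so the polite version of the above is just their documented user-agent tokens (AhrefsBot and MJ12bot) in your robots.txt; whether you trust them to comply is another matter:

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /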
 
My suggestion: don't.

If I were a smart programmer at Ahrefs or Majestic and the crawler I coded for them got "blocked" by your robots.txt or .htaccess, that would just make me want to visit you more and see what you're hiding.

I could easily mark you down to get "revisited" by my other crawler, on another IP, with a different user-agent, on a different OS (change any variable with another) and get the info I wanted.

Blocking them is just a sure way to bring attention to yourself.

If you don't think this happens (not talking to the OP, but anyone reading who thinks I'm crazy), then you surely don't pay attention to your logs much.
 
Eliquid,

Thanks for your input.

So you're implying that doing this would be the perfect footprint for the Google spam team, who could then nuke my PBN?

Your remark makes a lot of sense; I hadn't thought of it that way.

Thx