Found a good Robots.txt file for Magento

shmanamoanen, Mar 7, 2012
A while back I was having some major issues with Magento duplicate content showing up in GWT.

What happens is that the big G crawls your Magento cart and treats every URL that sorts or filters the catalog (sort order, direction, pagination, etc.) as a unique URL with duplicate content. It gets hella messy.
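For example, with Magento's stock sort/filter parameters (order, dir, mode, limit, p), one category can show up under a whole pile of URLs that all serve the same products. The category path here is made up, but the pattern looks roughly like:

/shoes.html
/shoes.html?order=price&dir=asc
/shoes.html?order=price&dir=asc&mode=list
/shoes.html?p=2&limit=36

Google treats each of those as its own page.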

I never saw any negative action toward my sites due to this very common problem, but I'm weird and I like everything to be all clean n pretty.

If you are using Magento you may find this useful (copy and paste the following into a text file, save it as robots.txt, and upload it to your FTP root):


# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=
 


PS: If any of you vets lurking around have any suggestions/additions/corrections, please don't be shy.
 
What happens is that the big G crawls your Magento cart and treats every URL that sorts or filters the catalog (sort order, direction, pagination, etc.) as a unique URL with duplicate content. It gets hella messy.

This is what the rel="canonical" tag is for...
 
Disallow: /cron.php
Disallow: /cron.sh

oh god

Would love a little expansion on this thought



This is what the rel="canonical" tag is for...

I wish it were that simple. For some reason with this CMS it's pure confusion hell with the duplicate content situation. It's very, very frustrating. I did all of the tricks, even blocked parameters in GWT, and it was still just a fuck show.

For me this has been the only thing that actually worked.
 
It's more difficult to implement, but a combination of meta robots, rel=canonical, and rel=next/rel=prev is a better solution.

Robots.txt doesn't prevent indexing of a page; it just prevents crawling. You end up with indexed pages in Google that look like this:

[screenshot: a page blocked by robots.txt still showing up in Google's index]


The better approach is:

  • rel=canonical for things like duplicate entry points to category pages, product pages, etc.
  • rel=next / rel=prev for paginated category listings
  • <meta name="robots" content="noindex"> for things that shouldn't be indexed at all, like product search results, cron.php, and other internal-use pages

Why is this better? Well, for one, if someone links to a page, rel=canonical keeps the juice. Robots.txt doesn't. It also keeps your indexed page count down, which may have some effect for things like Panda.
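To make that concrete, here's roughly what the <head> of a paginated Magento category page would carry under that setup (the URLs are made up, and the exact output depends on your theme or extension):

<link rel="canonical" href="http://www.example.com/shoes.html" />
<link rel="prev" href="http://www.example.com/shoes.html?p=1" />
<link rel="next" href="http://www.example.com/shoes.html?p=3" />

And for something like on-site search results or other pages that shouldn't be indexed at all:

<meta name="robots" content="noindex,follow" />

noindex keeps the page out of the index, while follow still lets Google crawl the links on it.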

If you aren't into hacking support for these things yourself, I assume there are relatively inexpensive 3rd party plugins.