How do I get backlinks for a site?

nighthaze

Member
Oct 26, 2012
Can anyone elaborate on how sites like backlinkwatch.com scrape their data?

I want to build something similar without using paid 3rd party APIs like MajesticSEO or Ahrefs or SEOMoz.

Basically it has to either rely on search engines for the results or devise other free ways to scrape links.
 


You either need to create an index of the internet that you can query against for the backlink data (which is what those 3 companies you listed do) or do some fancy search engine querying that won't work very well at all and will blast the shit out of your proxies.

Ironically the "easier" option is to build your own index, but that's not easy at all.
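For what it's worth, the "fancy search engine querying" approach boils down to firing link: queries at each engine through your proxies. A minimal sketch (Python; the engine URL templates are illustrative, and note that the link: operator is unreliable and engines have been phasing it out):

```python
from urllib.parse import quote_plus

def build_backlink_queries(target_url, engines=None):
    """Build 'link:' search queries for each engine.

    The link: operator is deprecated/unreliable on most engines,
    and hammering them will get your IPs banned fast -- hence the
    proxies mentioned above.
    """
    if engines is None:
        engines = {
            "google": "https://www.google.com/search?q={}",
            "bing": "https://www.bing.com/search?q={}",
        }
    query = quote_plus("link:" + target_url)
    return {name: tmpl.format(query) for name, tmpl in engines.items()}
```

You'd then fetch each URL through a rotating proxy and parse the result pages, which is where the real pain starts.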
 
@igl00, what scripts are those?

@pinchyfingers, "If you have to ask this question..." then I probably don't know what I'm doing?

You're right ;)

I need something basic for an SEO / domain-tools site, like the top 40 links for a given URL.

Given that I'm neither able nor willing to code the next Google, I think I'll stick with an API like MajesticSEO's. I'm still interested if there's a quick solution or a readily available script that does a basic job of this.
 
Spend $79 on ahrefs, move on to your next problem. This is one of those things where it's worth just paying the money.

Actually, that's most things.
 
Sage advice, but won't work in my case. I plan to put this on a public site which will have loads of visitors use the tool and I'll have to either pay for millions of API requests OR come to an agreement with them to let me use it for free in exchange for branding / sending them leads. So far only one provider has replied and is willing to go along with it, but not with me as an affiliate.
 
Sage advice, but won't work in my case. I plan to put this on a public site which will have loads of visitors use the tool and I'll have to either pay for millions of API requests OR come to an agreement with them to let me use it for free in exchange for branding / sending them leads. So far only one provider has replied and is willing to go along with it, but not with me as an affiliate.

Then don't offer this for free?

There's a reason you need to pay to get the data from these providers: it costs money to get the data in the first place.
 
Earlier this year, I was using the Alexa data through Amazon Web Services.

It's pretty cheap to query per domain.

But you only get back the backlinks Alexa has. You're not going to get complete or comprehensive data.

Still, we're talking fractions of a penny per site lookup, so it is affordable, plus depending on how you structure your queries, you can get other related data back at the same time for no extra cost.
 
If you don't want to scrape at all, then you will have to index the web from scratch, as pointed out earlier.

If you are okay with scraping the search engines but not tools like Ahrefs, Majestic, etc., then you will have to issue the search term in each search engine you want to scrape and then parse the results to build your index.

You could also try to find out whether there are API-driven methods to do this. I am not sure. I think Google does offer a search API, but I guess you will have to use your own Google API key for that. I don't know what the other SEs offer on this front.
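Whichever engine you scrape, the "parse the results to build your index" step looks roughly the same. Here's a rough stdlib-only sketch that pulls result links out of a SERP page's HTML and folds them into a per-target index (fetching the page and handling each engine's specific markup are left out; real SERPs need per-engine parsing):

```python
from html.parser import HTMLParser

class ResultLinkParser(HTMLParser):
    """Collect absolute hrefs from anchor tags in a results page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)

def add_to_index(index, target, serp_html):
    """Parse one results page and record its links as candidate
    backlinks for `target`. index: {target: set(linking_urls)}."""
    parser = ResultLinkParser()
    parser.feed(serp_html)
    index.setdefault(target, set()).update(parser.links)
    return index
```

In practice you'd also need to filter out the engine's own navigation links and deduplicate across result pages.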
 
You can try Scrapebox. You can use it to analyse competitor backlinks.

First you have to scrape some free proxies. Once you clean that list up by removing dead or unresponsive proxies, you have your proxy list.
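The proxy clean-up step is just "try to connect, keep whatever answers". If you wanted to replicate it outside Scrapebox, a bare-bones version might look like this (TCP connect check only; a real checker would also verify the proxy actually relays requests):

```python
import socket

def proxy_alive(host, port, timeout=3):
    """Return True if we can open a TCP connection to the proxy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def clean_proxy_list(proxies, checker=proxy_alive):
    """Keep only responsive proxies. Takes and returns
    'host:port' strings."""
    alive = []
    for entry in proxies:
        host, _, port = entry.partition(":")
        if checker(host, int(port)):
            alive.append(entry)
    return alive
```

The `checker` parameter is there so you can swap in a stricter test (e.g. an actual HTTP request through the proxy) without changing the filtering loop.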

Then you select your footprint (I usually just click the WordPress radio button).

If you're doing research on a particular website you can input link:name-of-site where "name-of-site" is the url to the domain of the site you're researching.

Then hit SCRAPE and you're off!

It'll search through google, bing, yahoo and another one I believe for any backlinks to that site and provide them to you in a table.

You can check the PR of each of those sites, etc.


There are add-ons you can install, like one that adds the number of outbound links (OBL) to the table.

You can get thousands and thousands of links that way.

That might be how they do it ;)
 
Also worth noting is that the scraping of the web has already been done for you. Using the free Common Crawl dataset, you already have an index at your disposal. The issue? It's 20+ TB. To query it effectively, you'll have to use MapReduce and Amazon AWS' Elastic MapReduce service. There are a few examples online, but it's basically Java/Hadoop.

Now, since your task is finding backlinks, you'll have to process the entire dataset, because it's raw data and the only index built is for looking up known URLs. For example, you can use Common Crawl to immediately download the contents of a known URL, but finding the backlinks for a given URL will require you to traverse the whole crawl.
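To illustrate the "traverse the whole crawl" point: the MapReduce job is essentially a link inverter. The map step emits (target, source) for every anchor on every page, and the reduce step groups by target. Here's the same idea collapsed into one in-memory pass over a toy page set (the real job runs over WARC records on Hadoop, which this sketch glosses over):

```python
from collections import defaultdict
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags in one page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def invert_links(pages):
    """pages: {source_url: html}. Returns {target_url: set(sources)}.

    Map ("emit (target, source) per link") and reduce ("group by
    target") collapsed into one in-memory pass for illustration.
    """
    backlinks = defaultdict(set)
    for source, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        for target in parser.links:
            backlinks[target].add(source)
    return dict(backlinks)
```

At Common Crawl scale the `pages` dict is the 20+ TB corpus, which is exactly why this needs a cluster rather than one machine.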

Either way, it's not easy - BUT don't start off by scraping the Internets.
 
Try Open Site Explorer for competitor backlinks, Alexa backlinks for competitors (some are older...), or seokicks.de for competitor backlinks.
 
There are different methods of getting links: one is paid, and another is getting links from off-page submission. There are many types:
article submission
bookmarking
directory submission
web 2.0
etc.