Page Strength Revisited

When I first looked at Matt Inman and Rand Fishkin’s Page Strength tool, just after its launch, its correlation coefficient with PageRank was extremely high at 0.92 on a random sample of 26 URLs (see Page Strength). Now the Aviva Directory has published a list of Bob Mutch’s directories showing their Page Strength, and simply by adding the current PageRank value to the table it is possible to calculate a correlation coefficient on this larger data set.

Directories with a current PageRank of zero were removed from the table so as not to distort the results. Many of them are PageRank zero because they are banned from Google’s index, for example galaxy.com (Page Strength 5), cannylink.com (Page Strength 4), dirone.com (Page Strength 3.5) and so on.

After the removal of all directories with a current PageRank of zero, the remaining 277 were plotted on an x,y chart and the correlation coefficient was calculated. The results were as follows:

[Chart: directories’ Page Strength plotted against PageRank]

The correlation coefficient, at 0.78, was lower than in the previous data set but still significant. I had a quick look at some of the outliers, such as topicalbeach.com with a PageRank of 5 and a Page Strength of 1.5, and found that in this case Aviva Directory had misreported the Page Strength, which is actually 4.5. Validating the data would likely increase the correlation coefficient.
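For anyone who wants to repeat the calculation, here is a minimal sketch in Python. It assumes the figure quoted above is the ordinary Pearson correlation coefficient, which the post does not state explicitly. The rows are made-up placeholder values, not the Aviva Directory data, and the names `pearson_r` and `rows` are purely illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical (domain, PageRank, Page Strength) rows, NOT the real table.
rows = [
    ("example-a.tld", 0, 5.0),  # PageRank 0: dropped below, as in the post
    ("example-b.tld", 3, 2.5),
    ("example-c.tld", 4, 3.5),
    ("example-d.tld", 5, 4.5),
    ("example-e.tld", 6, 5.0),
    ("example-f.tld", 7, 6.5),
]

# Remove PageRank-zero entries before correlating.
kept = [(pr, ps) for _, pr, ps in rows if pr > 0]
pagerank = [pr for pr, _ in kept]
page_strength = [ps for _, ps in kept]

r = pearson_r(pagerank, page_strength)
print(f"n = {len(kept)}, r = {r:.2f}")
```

Pearson’s coefficient is unchanged when either variable is multiplied by a positive constant, so the choice of a 0 to 10 scale does not in itself push the value up or down.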

Users of the Page Strength tool have reported a variety of anomalies because the Yahoo search operator linkdomain:domain.tld used by the tool does not always return correct results. Sorry, Matt and Rand, but I cannot see this tool as anything other than a very unreliable proxy for the already unreliable PageRank metric.

9 Comments »

  1. Aviva said,

    August 9, 2006 @ 9:34 pm

    Michael, thanks for the interesting follow up!

    It’s not entirely surprising to see such a close correlation between page strength and pagerank, given that pagerank is an element that is considered in assessing page strength, and given that linkage data is an important contributing factor to both measurements.

    I agree with you that page strength is far from perfect and I don’t think that anyone is claiming that it is. But I think that your claim about the emperor’s new clothes is a bit unfair. It seems to me that page strength is the best measurement we have available at the moment because it is the only one that takes into account all the known ranking factors for which there is publicly accessible data. Sure, better tools can and probably will be developed.

  2. Red Cardinal said,

    August 17, 2006 @ 9:31 am

    Interesting findings Michael.

    I still think that the Seomoz tool is very handy for aggregating various indicators that help in assessing any given site/page.

    But your study is still useful.

    I presume you’re into statistical analysis? 4 years studying stats in university and I can’t remember a bit of it :)

    Cheers

    Richard.

  3. bobmutch said,

    September 12, 2006 @ 9:46 pm

    Aviva:
    >>>It’s not entirely surprising to see such a close correlation between page strength and pagerank.

    I agree! Using a 0 to 10 scale (really it’s a 20 unit scale due to using 1/2’s), you can’t expect it to be much different as Pagerank is not all that bad for showing the strength of sites.

  4. duz said,

    September 12, 2006 @ 10:03 pm

    Bob the fact that PageRank and Page Strength are both on a 0 to 10 scale will not affect the correlation. For example we could multiply every value of Page Strength by 100,000 so that it was on a scale of 0 to 1,000,000 and the correlation would be numerically the same.

  5. bobmutch said,

    September 13, 2006 @ 5:30 pm

    duz:
    I agree on that if you multiply every value by 100,000 as you are not still working with a scale of 0 to 10 (11 units in the case of Pagerank and 21 in the case of Rand’s tool that does halves).

    But if you have a real scale that has values of 0 to 100,000 you can show a wider variance between sites. I think this is one of the errors of Pagerank and one of the things that I wouldn’t be surprised to see Rand fix.

  6. bobmutch said,

    September 14, 2006 @ 1:43 am

    That should have said you __are__ still working… not you are not working.

    I agree on that if you multiply every value by 100,000 as you are still working with a scale of 0 to 10 (11 units in the case of Pagerank and 21 in the case of Rand’s tool that does halves).

  7. duz said,

    September 14, 2006 @ 5:44 am

    Bob - By scale I mean the minimum and maximum values only.

    The choice of scale is arbitrary, will not affect the distribution and is usually chosen to make the variable easy to use or visualize.

    ILQ, Page Strength and toolbar PageRank are all discrete variables, i.e. variables that can only have certain exact values (unlike continuous variables). Choosing the number of values a discrete variable has depends on the granularity of the parameters used in its calculation and the purpose to which it will be put.

  8. Brian Turner said,

    January 17, 2007 @ 3:52 pm

    A couple of problems with the Page Strength tool: I can’t get my DMOZ or Wikipedia links to show.

    Additionally, my UK sites get links from UK universities - which means .ac.uk links, not .edu links - so I feel sites like Platinax are somewhat underestimated by this process. :)

  9. duz said,

    January 18, 2007 @ 12:29 am

    That’s a good point Brian. As Rand says “Even if the number we spit back isn’t all that valuable, the data certainly is…” which would be true if the data was reliable - which it isn’t. However as I said here “A very good example of Link Bait in action and a lesson for us all!”.
