TrustRank

 

TrustRank is a trademark registered by Google; this section, however, is based on a TrustRank paper done by Yahoo!. The paper offers fantastic insight into the challenges search engines face and how they plan to overcome them.

At the compilation of this document it is purely speculation as to what exactly Google's TrustRank is. What follows is my own interpretation of what this new concept could be and how it will affect SEO in the future, based on the TrustRank paper originally done by Yahoo!.

TrustRank is an algorithm whose main objective is to combat spam and misleading websites. It includes many factors that are already familiar to the SEO industry. The easiest way to explain TrustRank is to remove the mathematics and focus on the assumptions on which it is based. The reason is that the algorithm involves so many variables that an accurate calculation is impossible from the outside. What is important, though, is that PR (PageRank) plays a very important role in the final calculation of TrustRank. The assumption of various SEOs (including myself) that PR only mattered for the depth at which a website was spidered is not true. PR forms an integral part of the TrustRank algorithm, with minor changes made to the PR algorithm.

The first point made clear is that the algorithm is capable of removing self hyperlinks. In other words, hyperlinks that reference pages on the same URL or domain are collapsed into one score.
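One possible reading of this step can be sketched in Python. This is a minimal illustration, not the paper's or any search engine's actual code: the function name `collapse_self_links`, the example URLs, and the choice to compare hostnames (rather than some other definition of "same domain") are all assumptions made for the example. The sketch drops links between pages on the same host and counts repeated links between two hosts only once.

```python
from urllib.parse import urlparse


def collapse_self_links(links):
    """Return unique (source_host, target_host) edges between different hosts.

    `links` is an iterable of (source_url, target_url) pairs. Treating the
    hostname as the "domain" is an assumption made for this sketch.
    """
    edges = set()
    for source_url, target_url in links:
        source_host = urlparse(source_url).hostname
        target_host = urlparse(target_url).hostname
        if source_host is None or target_host is None:
            continue  # skip links that are not absolute URLs
        if source_host == target_host:
            continue  # self hyperlink: same host on both ends
        edges.add((source_host, target_host))  # duplicates collapse into one edge
    return edges


# Hypothetical example links.
links = [
    ("http://a.example/index.html", "http://a.example/about.html"),
    ("http://a.example/index.html", "http://b.example/"),
    ("http://a.example/news.html", "http://b.example/page.html"),
    ("http://b.example/", "http://c.example/"),
]
site_edges = collapse_self_links(links)
print(sorted(site_edges))
```

Using a set keeps the collapsing step trivial: however many pages on one host link to another host, only one edge survives.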

The following pages are identified for later use: pages with no incoming links (unreferenced pages), pages with no outgoing links (non-referencing pages), and pages that are both unreferenced and non-referencing (isolated pages).
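The three categories can be picked out of a link graph with a few set operations. The sketch below is only an illustration; the function name `classify_pages` and the tiny example graph are made up for this purpose.

```python
def classify_pages(pages, edges):
    """Split pages into unreferenced, non-referencing and isolated sets.

    `pages` is a set of page identifiers and `edges` a set of
    (source, target) link pairs.
    """
    has_incoming = {target for _, target in edges}
    has_outgoing = {source for source, _ in edges}
    unreferenced = {p for p in pages if p not in has_incoming}
    non_referencing = {p for p in pages if p not in has_outgoing}
    isolated = unreferenced & non_referencing  # no links in either direction
    return unreferenced, non_referencing, isolated


# Hypothetical graph: a -> b -> c, with d linked to nothing at all.
pages = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "c")}
unreferenced, non_referencing, isolated = classify_pages(pages, edges)
print(unreferenced, non_referencing, isolated)
```

Note that an isolated page appears in all three sets, since it is by definition both unreferenced and non-referencing.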

As part of the objective to fight spam, human intervention is required. Pages are divided into good (white) pages and bad (black) pages. It is assumed that good pages seldom point to bad pages, while bad pages quite often link to other bad pages. The ideal trust property is calculated using these assumptions, and pages are eventually listed according to their probability of being trusted, with a score between 0 (bad) and 1 (good).

What is clear as well is that the entire algorithm works on a graph of the web. This means that entire networks of links affect the TrustRank of the pages connected to each other.

It is also assumed that the more credit a page receives from other good pages, the more probable it is that the page is also good.

Added to this, it is assumed that the number of external links on a page says something about the quality of the pages it links to. If there are only a few external links on a page, there is a good possibility that the pages being linked to are good.

The opposite is also assumed: if a single page has many external links, there is a high probability that some of the pages linked to are bad.
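A simple way to picture this assumption is to let each page divide its trust equally among the pages it links to, so that a link from a page with few external links carries more weight than a link from a page with many. The sketch below shows a single such step. The function name `split_trust`, the page names, and the starting trust values are assumptions chosen for illustration, not values from the paper.

```python
def split_trust(trust, outlinks):
    """Pass each page's trust on in equal shares to the pages it links to.

    `trust` maps a page to its current trust score; `outlinks` maps a page
    to the list of pages it links to. Returns the trust each target receives.
    """
    received = {}
    for page, targets in outlinks.items():
        if not targets:
            continue  # a page without outgoing links passes nothing on
        share = trust.get(page, 0.0) / len(targets)
        for target in targets:
            received[target] = received.get(target, 0.0) + share
    return received


# Two equally trusted pages: one links to 2 pages, the other to 10.
trust = {"careful": 1.0, "directory": 1.0}
outlinks = {
    "careful": ["x", "y"],
    "directory": ["p%d" % i for i in range(10)],
}
received = split_trust(trust, outlinks)
print(received["x"], received["p0"])
```

Here each page linked from "careful" receives five times as much trust as each page linked from "directory", even though both sources start out equally trusted.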

In a sense TrustRank can be seen as the opposite of the PageRank algorithm. Where the PageRank algorithm mainly calculates a score based on incoming links, TrustRank bases its score on the number of outgoing links. However, the damping factor makes it difficult to pinpoint accurately the balance of incoming and external links that a page should have.
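One way to compare the two is to run the same propagation loop twice and change only where the scores start. The sketch below assumes a PageRank-style iteration with a damping factor; starting from a uniform vector gives a PageRank-like score, while starting from a hand-picked trusted seed gives a TrustRank-like score. The function name `propagate`, the damping value of 0.85, the 20 iterations, and the four-page graph are assumptions for illustration, and real implementations will differ in many details.

```python
def propagate(outlinks, start, damping=0.85, iterations=20):
    """Run a damped, PageRank-style propagation from a starting vector.

    `outlinks` maps every page to the list of pages it links to (every
    target must also be a key). `start` maps pages to non-negative weights;
    it is normalised to sum to 1 and also acts as the reset vector.
    """
    pages = list(outlinks)
    total = sum(start.values())
    base = {p: start.get(p, 0.0) / total for p in pages}
    score = dict(base)
    for _ in range(iterations):
        # Every round, a (1 - damping) share flows back to the starting vector.
        nxt = {p: (1.0 - damping) * base[p] for p in pages}
        for page, targets in outlinks.items():
            if not targets:
                continue
            share = damping * score[page] / len(targets)
            for target in targets:
                nxt[target] += share
        score = nxt
    return score


# Hypothetical graph: a trusted pair and a spam pair that only link to each other.
outlinks = {
    "seed": ["good"],
    "good": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
uniform = {p: 1.0 for p in outlinks}
pagerank_like = propagate(outlinks, uniform)
trustrank_like = propagate(outlinks, {"seed": 1.0})
print(pagerank_like["spam1"], trustrank_like["spam1"])
```

With the uniform start, the spam pair scores as well as the trusted pair, because the two pairs have identical link structure. With the seeded start, no trust ever reaches the spam pair, since no trusted page links to it.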

Please note that the TrustRank algorithm seems to prefer the homepages of sites with a high PageRank (above 4) for competitive terms.

Why is TrustRank important?

When reading through this very short summary of the TrustRank paper, it is important to understand its value. SEO is always open for debate, and the one thing that always wins a debate is logic. In this case the TrustRank paper makes a lot of sense.

One thing that has always amazed me is the complexity of search and the infrastructure involved in delivering good search results. A search company like Google has an incredible network infrastructure behind the scenes, one that is to a large degree beyond the comprehension of normal individuals. The TrustRank paper gives a lot of insight into the limitations they have to deal with. It also explains to a large extent why Google opted to "Sandbox" new sites: this gives them time to evaluate a website with regard to recency of updates and general compliance with their requirements before allowing it to rank in their SERPs. This makes a lot of sense in their continuing struggle against spammers and blackhat SEOs to deliver true quality results to their users.