"Welcome to this free resource for site owners and small businesses. If this is your first visit, you may find it helpful to read these two posts: About SEO Blog and Using SEO Blog. To keep up to date, subscribe to the RSS feed. You are welcome to ask questions or make a comment, but you must register first. Thank you, and may your business prosper." - Michael Duz

Combine RSS Feeds with Yahoo Pipes

One of the easiest ways for a non-programmer to combine, aggregate and filter multiple RSS feeds into one is to use Yahoo! Pipes (YP). YP uses a sleek visual editor that allows the user to fetch and manipulate data sources, add user-defined inputs and filter the content in a number of ways.

I used YP to combine nine* popular SEO feeds into one and then published it on pipes.yahoo.com, where anybody can now use it. Try it in your favorite reader: Composite SEO News Feed.

By using the WordPress plugins FeedList and RunPHP I can also easily display the Composite SEO News Feed right here:

Composite SEO News Feed

  1. Facebook Puts the Brakes on Google Friend Connect
  2. Facebook Can’t Stomach Beer Pong
  3. Daily Search Forum Recap: May 15, 2008
  4. SearchCap: The Day In Search, May 15, 2008
  5. Linky Goodness, May 15
  6. More Features for YouTube: Free Demographic Analytics
  7. A Few Bad Votes Too Many?
  8. Carl Icahn Makes His Move To Oust Yahoo Board, Which Has “Completely Botched” Microsoft Merger Talks
  9. Yahoo! SearchMonkey Now Open For Everyone

Remember, this is the actual feed, not just a graphic, so whenever you view this page the feed will be up to date.

When you first look at the drag-and-drop interface of YP it may seem a little daunting, but here is a step-by-step guide using the practical example above. You can, of course, combine any feeds you choose.

First you need to sign in to YP with your Yahoo ID (create an ID if you don’t have one). When you’re signed in, click Create a pipe and click the untitled tab to give your pipe a name. Drag a Fetch Feed module into the workspace.


Enter a feed url, which on most sites you will find by clicking the RSS, XML or Atom link or icon. If you see a “?” icon in the Fetch Feed module, you have entered an invalid feed address.
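The post does not say how Yahoo! Pipes decides that an address is invalid. As a rough offline illustration only, assuming an address counts as a valid feed when the document it returns is well-formed XML whose root element is RSS 2.0, RSS 1.0 (RDF) or Atom, a check might look like this. The function name `feed_type` is hypothetical, and this is not how Pipes itself validates feeds.

```python
import xml.etree.ElementTree as ET

# Root element tags (namespace-qualified where applicable) of the common feed formats.
ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RSS1_RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def feed_type(document: str):
    """Return "rss", "rss1.0" or "atom" if the text looks like a feed, else None."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return None  # not well-formed XML, e.g. most HTML pages or error messages
    if root.tag == "rss":
        return "rss"
    if root.tag == RSS1_RDF:
        return "rss1.0"
    if root.tag == ATOM_FEED:
        return "atom"
    return None  # well-formed XML, but not a recognized feed format
```

Pasting a site's home page address instead of its feed address is the usual cause of a failed check: the page is HTML, so `feed_type` returns `None`.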

Copy and paste the feed url.

Click the url icon to enter a second feed.


Enter the second feed url.


Repeat until you have entered all the feed urls that you want to combine.

Complete the addition of feed urls.

Drag a Sort module to the workspace. Pipe the Fetch Feed module to the Sort module by clicking the circle on top of the Sort module and dragging it to the circle at the bottom of the Fetch Feed module. A blue pipe will appear and connect the two.

Pipe the Fetch Feed module to the Sort module.

Sort by date in descending order by selecting PubDate from the first drop-down menu and Descending from the second drop-down menu.

Sort by date in descending order.

Drag a Truncate module to the workspace. Pipe the Sort module to the Truncate module by clicking the circle at the bottom of the Sort module and dragging it to the circle at the top of the Truncate module. Enter a value for the maximum number of items you require from your combined feed.

Pipe the Sort module to the Truncate module.

Pipe the Truncate module to the Pipe Output, and the Debug area will fill up with your new feed’s output.

Pipe the Truncate module to the Pipe Output.

Finally, click Save and then click Publish. In the pop-up window, enter a description for your pipe; when you click Publish again, your pipe will go public.
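The finished pipe is a three-stage data flow: Fetch Feed pulls items from several feeds, Sort orders them by pubDate (descending), and Truncate keeps the first N. As a sketch of that flow outside Yahoo! Pipes, assuming RSS 2.0 documents that have already been downloaded as text (the network step is left out), the same chain can be written with the Python standard library. The function names are hypothetical, and how Pipes treats items with a missing date is not known. Here they simply sort to the end.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Items with a missing or unparseable pubDate sort to the end.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_pub_date(text):
    """Parse an RFC 822 pubDate into a timezone-aware datetime."""
    if not text:
        return EPOCH
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return EPOCH
    # A "-0000" offset yields a naive datetime; treat it as UTC so comparisons work.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def fetch_items(rss_xml):
    """Fetch Feed stage: extract title, link and pubDate from one RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": parse_pub_date(item.findtext("pubDate")),
        }
        for item in root.iter("item")
    ]


def combine_feeds(feeds, max_items):
    """Merge several RSS documents, newest first, keeping at most max_items items."""
    merged = [item for feed in feeds for item in fetch_items(feed)]  # Fetch Feed
    merged.sort(key=lambda item: item["pubDate"], reverse=True)      # Sort: pubDate, descending
    return merged[:max_items]                                         # Truncate
```

For the Composite SEO News Feed, you would pass the nine downloaded feed texts and use 9 as `max_items`, matching the nine items shown above.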

By combining YP with mashup tools like Dapper or OpenKapow, you will be able to construct an RSS feed from almost anything you can find on the Web.

*The nine feeds combined in the Composite SEO News Feed:
SEO by the SEA
Search Engine Land
Search Engine Roundtable
Matt Cutts
SEO Book
SEO Blog
SEOMoz
Threadwatch
Marketing Pilgrim


Disable Personalized Search Browser Add-ons

It happens occasionally: I am on the phone to a client who is looking at one set of search results in Google while I am seeing a different set for exactly the same search term. In the past this has been down to us accessing different datacenters while Google are in the process of updating their index. It has never been a problem, because I simply switch to the same datacenter as my client by typing http://64.233.161.107, or whatever my client’s datacenter IP address happens to be at the time, into the address bar of my browser.

Now I have come across an additional problem. It has happened just once, but I know it will occur with increasing frequency: my client is logged into their Google Account and is being served personalized search results, which of course I cannot duplicate. When this happened, my solution was simply to have the client sign out of their account. Of course, I also had to make sure I was signed out of my own Google account, and this could become a real pain.

Recently I have found myself signing in and out of my Google account like a fiddler’s elbow, and I was looking for a better solution. Matt Cutts provided the clue when he mentioned at a recent conference that if you want to turn off personalized search for a particular query, you just have to append the parameter “&pws=0” to the end of your query string. That’s a nice touch, but it really needs a browser extension to make it work ‘under the hood’.
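The rewrite such an extension performs is small. A minimal sketch, assuming a Google results page is any URL whose host contains a `google` label (google.com, google.co.uk, and so on) and whose path is `/search`, sets `pws=0` and replaces any existing `pws` value. This is an illustration of the idea, not the code inside the add-ons below. The function names are hypothetical, and whether Google still honors `pws` is not something this post can confirm.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def is_google_search(url):
    """Rough check for a Google results page on any country domain."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    return "google" in labels and parts.path == "/search"


def depersonalize(url):
    """Return the URL with pws=0 set, replacing any existing pws value."""
    if not is_google_search(url):
        return url
    parts = urlsplit(url)
    # Keep every other parameter in its original order; drop any existing pws.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "pws"]
    params.append(("pws", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

`depersonalize("http://www.google.co.uk/search?q=seo&pws=1")` gives `http://www.google.co.uk/search?q=seo&pws=0`. Note that the rough host check does not match datacenter addresses such as http://64.233.161.107, and re-encoding may normalize how other parameters are escaped.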

So here they are: an IE and a Firefox add-on.

For IE7.

Download degsie.zip

Unzip

Close all instances of IE.

Run the unzipped DepersonalizeGoogleSearch – IE

That’s it! All searches on any Google search domain will now be appended with “&pws=0”

You can toggle (enable or disable) the add-on in the browser by going to Tools → Manage Add-ons → Enable or Disable Add-ons and locating “DepersonalizeGoogleSearch”.

You can uninstall the add-on in the normal way with Add/Remove Programs in Windows.

For Firefox.

Download degsff.zip

Unzip

Close all instances of Firefox.

Run the unzipped DepersonalizeGoogleSearch – Firefox

That’s it! All searches on any Google search domain will now be appended with “&pws=0”

You can toggle (enable or disable) or remove the add-on in the browser by going to Tools → Add-ons and selecting “DepersonalizeGoogleSearch”. You can also uninstall the add-on with Add/Remove Programs in Windows.

If you don’t have international clients, or only ever search on one country-specific Google domain, then you may want to look at Joost de Valk’s Google de-Personalized Search for Firefox and IE7 as an alternative.

Joost’s OpenSearch plugin works through the browser search box. If you want to opt out of Google personalized search permanently or for long periods, the DepersonalizeGoogleSearch add-on works with the regular Google search box, so you can continue searching as normal.

If you do use the DepersonalizeGoogleSearch add-on, any feedback would be appreciated; just leave a comment below.


Clicky

The last time I recommended a tool without reservation it was HitTail, and now I am recommending another! Ideal for small to medium traffic websites and bloggers, it’s an online analytics service with some useful features not found in Google Analytics.

The tool is called Clicky and is programmed by Sean Hammons, who is the technical half of Roxr Software Ltd, a two-person company based in Portland, Oregon. Sean has a degree in psychology and it shows, not least because the report pages are beautifully minimalist and highly addictive, as you will discover when you first use the service.

After you have registered, all you have to do is include two lines of code in the footer of every page on your site. From then on, information about every click by every visitor is sent back to Clicky and logged to your account. This information includes the visitor’s IP address, geographic location, browser type, operating system, the URL and page title, the date and time, and where they came from. If they came from a search engine, the search query is also logged.
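The post does not describe how Clicky recovers the search query, but a common approach is to read it from the referring URL. A minimal sketch, assuming Google puts the terms in the `q` parameter and Yahoo in `p` (engines can change these at any time), might look like this. The table and function name are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed query-string parameter that carries the search terms for each engine.
SEARCH_PARAMS = {"google": "q", "yahoo": "p"}


def search_query(referrer):
    """Return the search terms if the referrer is a known search engine, else None."""
    parts = urlsplit(referrer)
    labels = (parts.hostname or "").lower().split(".")
    for engine, param in SEARCH_PARAMS.items():
        if engine in labels:
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None
```

For a visitor arriving from `http://search.yahoo.com/search?p=seo+blog`, this returns `"seo blog"`. A referrer from an ordinary site returns `None`.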

When you log into your account, there is a range of reports you can view. For example, here is a partial screen shot showing some of this site’s visitors on May 21st 2007.

Clicky screen shot showing visitors

It shows the time of the visit, IP address, country of origin, operating system, browser, number of actions, duration of visit and where they came from. You can click on the links to obtain more information. For example clicking on one of the IP addresses will bring up a view showing visitor details like this.

Clicky screen shot showing visitor details

The integration of Google Maps is a nice touch, and the table of actions below it can prove very useful, as can the ‘Content’ view.

Clicky screen shot showing content view

Here we can see the stats for individual pages with the option of selecting entry or exit pages only.

To keep this post a reasonable length I will not list all the functionality of Clicky but the ‘Spy’ view needs to be mentioned. It is a live view of visitors interacting with the site and comes with a health warning because it is extremely addictive! It even has an optional RSS feed so you can integrate it into your favorite feed reader.

Clicky does not have all the advanced features of Google Analytics (or other analytics software), but its presentation and real-time functionality make it a more suitable choice in most cases for small to medium traffic websites and bloggers.

Clicky is free for up to 3 sites with fewer than 1,000 average daily page views per site, but the free version comes with a limited feature set. For $2.99/month or $19.99/year you can have up to 3 sites and 10,000 average daily page views in total (across all sites), and this includes additional features like RSS feeds, Spy, outbound link tracking, download tracking and more. There is also a Pro/Small business version for $5.99/month or $49.99/year, where you can have up to 10 sites and 50,000 average daily page views in total, with all the additional features (including those in development) plus SSL support.
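To see which tier fits a set of sites, and what annual billing saves, the limits and prices quoted above can be encoded directly. This is a sketch built only from those 2007 figures. The label "Paid" for the $2.99 tier is hypothetical, since the post does not name it. The function checks only site and page-view limits, not the feature differences.

```python
# Clicky plans as described in this post (2007 prices).
PLANS = [
    # name, max sites, per-site daily view limit (exclusive), total daily view limit, $/month, $/year
    ("Free", 3, 1_000, None, 0.00, 0.00),
    ("Paid", 3, None, 10_000, 2.99, 19.99),  # hypothetical label for the $2.99 tier
    ("Pro/Small business", 10, None, 50_000, 5.99, 49.99),
]


def cheapest_plan(daily_views_per_site):
    """Pick the first (cheapest) plan whose limits cover the given sites; None if none fit."""
    sites = len(daily_views_per_site)
    total = sum(daily_views_per_site)
    for name, max_sites, per_site_limit, total_limit, _monthly, _yearly in PLANS:
        if sites > max_sites:
            continue
        if per_site_limit is not None and any(v >= per_site_limit for v in daily_views_per_site):
            continue
        if total_limit is not None and total > total_limit:
            continue
        return name
    return None


def yearly_saving(monthly, yearly):
    """Dollars saved per year by paying annually instead of monthly."""
    return round(monthly * 12 - yearly, 2)
```

At the listed prices, paying yearly saves $15.89 on the $2.99 tier (12 × 2.99 = 35.88 versus 19.99) and $21.89 on Pro (12 × 5.99 = 71.88 versus 49.99).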

Sean tells me that additional features in the pipeline include data export, viewing aggregate data for periods longer than a day, and analyzing traffic for individual pages (e.g. where people came from, where they went to, what searches lead to this page, etc.).

Congratulations to Roxr Software on a great tool and a great implementation.

(Disclaimer for those that don’t know me: I am not associated with Roxr Software in any way, I am not an affiliate and I never review for money).


The LSI Myth

In a previous post, ‘What is Latent Semantic Indexing?’, I attempted to give a non-mathematical and simplified explanation of LSI. The document set I chose as an example was every web page, and we saw how this would result in a matrix of huge dimensions. I mentioned that LSI would consume very large amounts of processing power if used on such a huge term-document matrix. If you want to get a feel for just how much processing is required, take a look at Telcordia LSI Engine: Implementation and Scalability Issues. Not only that, but to be meaningful the process would have to index a constant stream of new and updated pages and run continuously, which makes it totally impractical. The algorithm does not scale: keeping the data in memory for very large datasets is not feasible, and keeping it on disk and making random disk seeks takes too much time. LSI has been shown to work best on small homogeneous document collections, but for large non-homogeneous document collections it remains a research tool of as yet unknown efficacy. Also, recent experimental results seem to confirm claims by previous researchers that the retrieval accuracy of the LSI technique may deteriorate with large inhomogeneous datasets (Clustered SVD strategies in latent semantic indexing). The search engines may well have a semantic component of some kind (more on that later), but LSI? No way!
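For readers who want to see the mechanics, here is a toy LSI sketch in Python with NumPy. It is a textbook illustration, not any search engine's code. It builds a term-document count matrix, keeps the top k singular values of its SVD, and folds a query into the reduced space as qᵀU_kΣ_k⁻¹ before comparing it with each document by cosine similarity. The full SVD call here is exactly the step whose cost explodes with a Web-sized matrix. The corpus and function names are hypothetical.

```python
import numpy as np


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word], j] += 1
    return A, index


def lsi(A, k):
    """Rank-k truncated SVD: A ~ U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    return U[:, :k], s[:k], Vt[:k].T  # rows of V_k are the document vectors


def query_scores(query, index, Uk, sk, Vk):
    """Cosine similarity between a folded-in query and every document."""
    q = np.zeros(Uk.shape[0])
    for word in query.split():
        if word in index:
            q[index[word]] += 1
    q_hat = (q @ Uk) / sk  # fold the query in: q^T U_k diag(s_k)^-1
    norms = np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_hat)
    scores = np.zeros(Vk.shape[0])
    nonzero = norms > 0  # avoid dividing by zero for empty vectors
    scores[nonzero] = (Vk[nonzero] @ q_hat) / norms[nonzero]
    return scores
```

With the documents `"car engine"`, `"automobile engine"`, `"car automobile"`, `"flower garden"` and `"garden rose"` and k = 2, the query `"automobile"` scores 1.0 against all three motoring documents, including `"car engine"`, which never mentions the word. The two gardening documents score 0.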

So why would anybody claim that Google or any other search engine was using LSI? There are two possible reasons: simple ignorance or, as Dr. E. Garcia (an information retrieval researcher) puts it, “snake oil marketers”, meaning SEO firms and individuals who find some commercial value in pretending they have an understanding of LSI. Here are some typical quotes right off their web pages:

LSI quotes

So what sort of evidence do these people cite to justify their erroneous claims? There appear to be three common misunderstandings. The first concerns Google’s acquisition of Applied Semantics in April 2003. Applied Semantics was purchased for its semantic text processing and online advertising expertise, derived from its patented CIRCA technology (Google press release). CIRCA uses a proprietary ontology which consists of hundreds of thousands of concepts and their relationships to each other. This ontology is developed by merging industry-standard knowledge bases with automated tools, together with guidance and direction from a team of lexicographers and computational linguists. The technology is outlined in two Applied Semantics patents: Meaning-based advertising and document relevance determination, and Meaning-based information organization and retrieval. CIRCA has absolutely nothing to do with LSI. Google uses CIRCA (by now much improved) to target online advertising, and possibly also in much the same way that Yahoo uses its “concept server” (Systems and methods for search processing using superunits and Systems and methods for generating concept units from search queries). The concept server manifests itself as the “Also try:” snippet at the top of the Yahoo SERPs.

The second erroneous justification is associating the Google synonym search operator with LSI. This Google advanced search operator will search not only for your search term but also for its synonyms if you place the tilde sign (~) immediately in front of your search term. As Marissa Mayer, Vice President, Search Products at Google, put it when the operator was launched: “We think this is a powerful and useful way to broaden results. It’s the opposite of disambiguation, which narrows a search”. Anyone who has used it will see immediately that it uses a small and very poor set of real synonyms (sorry Marissa!). For example, ‘shell’ has many synonyms: ammunition, armament, bullet, cartridge, carcass, framework, peel, husk, seashell, etc., etc. However, Google recognizes very few of these with a ~shell search. Obviously it is not based on a synonym thesaurus, but it is, as Marissa says, a way to broaden search results. These pseudo-synonyms are almost certainly generated algorithmically (possibly from clickthrough data), but again they have absolutely nothing to do with LSI. In any case, as Dr E. Garcia explains, LSI is far from being a synonym discovery technique (LSI Keyword Research and Co-Occurrence Theory).

The third fallacious argument involves a belief that a raft of recent Google patents ‘proves’ that Google is using LSI. The patents in question are: Multiple index based information retrieval system; Phrase-based searching in an information retrieval system; Phrase-based indexing in an information retrieval system; Phrase-based generation of document descriptions; Phrase identification in an information retrieval system; and Detecting spam documents in a phrase based information retrieval system. These patents contain some very interesting concepts and are required reading for the professional SEO. However, they are only filed patents, and this does not mean that all or any of the ideas in them have been implemented. They should be studied for an indication of what search engineers are thinking about and which components (if any) may be implemented now and in the future.

The overall concept in these patents involves indexing documents (pages) according to their included phrases, with each potential phrase classified as either a good phrase or a bad phrase. Good phrases are defined as “phrases that tend to occur in more than certain percentage of documents in the document collection and/or are indicated as having a distinguished appearance in such documents, such as delimited by markup tags or other morphological, format, or grammatical markers. Another aspect of good phrases is that they are predictive of other good phrases, and are not merely sequences of words that appear in the lexicon”. Bad phrases are defined as those “…lacking in predictive power”. When a user types in a query, any phrases present in the query are used to search the index, and ranked results are returned according to the phrases contained in each document. This is a gross oversimplification :) but explaining the details here is not the point.

The confusion between these patents and LSI arises because, as part of the indexing process, the proposed algorithm maintains a co-occurrence matrix of good phrases, and this is mistaken for the term-document matrix used in LSI. The co-occurrence matrix of good phrases is not only different, it is much smaller, and it is not reduced by SVD as in LSI.
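To make the contrast concrete, here is a sketch of what a phrase co-occurrence matrix is: a phrase-by-phrase table of how many documents contain both phrases. It is not the patents' algorithm. It assumes the phrases have already been identified in each document, and it uses a simple observed-versus-expected ratio as a stand-in for “predictive power”. Notice that nothing here is factorized by SVD, unlike the term-document matrix in LSI. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs_phrases):
    """Count how many documents contain each phrase and each unordered pair of phrases."""
    single, pairs = Counter(), Counter()
    for phrases in docs_phrases:  # each document is a set of phrases
        single.update(phrases)
        for a, b in combinations(sorted(phrases), 2):
            pairs[(a, b)] += 1
    return single, pairs


def lift(a, b, single, pairs, n_docs):
    """Observed pair count divided by the count expected if the phrases were independent."""
    a, b = sorted((a, b))
    expected = single[a] * single[b] / n_docs
    return pairs[(a, b)] / expected if expected else 0.0
```

A ratio above 1 means the two phrases appear together more often than chance would suggest. That is the intuition behind one phrase being “predictive” of another, not a reproduction of the patents' exact measure.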

So what’s the bottom line for the LSI myth? If you hear or read an SEO talking about the importance of LSI in search engine optimization then you can be sure they haven’t a clue what they are talking about and you should simply follow the advice for good copy from a previous post.

Those who have got this far may be wondering what use LSI is if it is not used by the search engines. LSI does in fact have quite a few practical applications, and here are some examples to satisfy the curious: Pacific Metrics Corporation are using the Content Analyst Company LSI Patents for automated essay scoring, the analysis of legal documents and creating document summaries for academic funding applications.

May 11, 2007

Professor Michael Berry, head of the Department of Computer Science at the University of Tennessee, wrote to me as follows: “Just for the record, LSI has been used to index on the order of 10 million documents using out-of-core SVD based techniques so you could apply it to subdomains of the Web but the entire Web would be problematic as you point out”. He also recommended an all-inclusive reference book now available on LSA: Handbook of Latent Semantic Analysis, T.K. Landauer, D.S. McNamara, S. Dennis, and W. Kintsch (Eds), Lawrence Erlbaum Associates (2007). Thank you, Dr Berry.


Paid Links

“Now Warwick, tell me, even upon thy conscience, is Edward your true king? For I were loath to link with him that were not lawful chosen”. Henry VI, Part 3, Act 3, Scene 3 by William Shakespeare.

To buy or not to buy, that is the question.

The head of Google’s Webspam team was advising over a year ago that “…if you sell links, you should mark them with the nofollow tag. – Matt Cutts”. A more recent post has caused alarm bells to ring in the minds of those who buy or sell links. The post in question details how to report any sites you find that are selling or buying links. Matt explains that these external reports will be used to test out some new techniques in algorithmic paid link detection.

So why is Google so keen on detecting paid links, you might ask? Look no further than Google’s Corporate Information, Philosophy page: “Google works because it relies on the millions of individuals posting links on websites to help determine which other sites offer content of value. Google assesses the importance of every web page using a variety of techniques, including its patented PageRank algorithm which analyzes which sites have been “voted” the best sources of information by other pages across the web”. So it is hardly surprising that Google views paid links as ‘paid votes’ that are likely to introduce bias into their PageRank algorithm.
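The “voting” idea can be sketched with the textbook PageRank power iteration from Page and Brin's published work: PR(p) = (1 - d)/N + d × Σ PR(q)/L(q), summed over the pages q that link to p, where L(q) is the number of links on q and d = 0.85. This is the classic formulation, not Google's production algorithm. In this sketch, a nofollowed link is simply left out of the link graph. That is a simplifying assumption, but it shows why a bought link that does pass PageRank acts as a bought vote.

```python
def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the pages it links to (followed links only)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)  # each link passes an equal share
                for target in targets:
                    new[target] += share
        rank = new
    return rank
```

With `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page c, which receives two votes, ends up with the highest rank. Deleting b's link to c, the equivalent here of nofollowing it, takes that vote away.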

The Problem.

With the introduction of PageRank (originating from Larry Page and Sergey Brin’s 1997 paper), Google created a new commodity: links that improve ranking. Economists from Karl Marx to Milton Friedman have recognized that for every commodity there will always be a market, and hence the buying and selling of links has become an industry. Text link brokers have been making hay while the sun shines, and Google now feels that it needs to get on top of this problem before the PageRank component of its algorithm breaks. Algorithmically detecting and then discounting paid links is one approach, hence Matt Cutts’s request for data.

Google’s Solution.

As well as improving the detection of paid links Google’s solution includes extending the use of the nofollow tag from its original conception as “…an easy way for a website to tell search engines that the website can’t or doesn’t want to vouch for a link - Matt Cutts” to a “…machine-readable disclosure for paid links… – Matt Cutts”. It appears likely that once Google encounters a paid link without a nofollow then at the very least it will be discounted.
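In HTML terms, the nofollow “tag” is a `rel="nofollow"` attribute on the `<a>` element. If you sell links and want to confirm that every paid link on a page carries it, a quick audit can list the links whose `rel` lacks `nofollow`. This is a minimal sketch using Python's standard `html.parser`. The class and function names are hypothetical, and because it reports every followed link, you still have to pick out which of them are paid.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect the href of every <a> element whose rel attribute lacks nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "external nofollow".
        rel = (attrs.get("rel") or "").lower().split()
        if "nofollow" not in rel:
            self.followed.append(href)


def followed_links(html):
    """Return the hrefs of links that would still pass PageRank."""
    audit = LinkAudit()
    audit.feed(html)
    audit.close()
    return audit.followed
```

Run over a page's HTML, an empty result means every link on it is nofollowed. Anything returned needs a check against your list of sold links.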

What should you do?

The obvious course of action is to buy links only for traffic and make sure they are nofollowed or, if you are selling links, to make sure they too are all nofollowed. However, you can be sure that no professional SEO will be signing up exclusively to this approach. Paid links are too important a tool in SEO to be given up on Google’s say-so, especially when Google are still in the process of creating an improved detection algorithm. So my advice if you buy links is:

  • Go into stealth mode if you aren’t in it already.
  • Don’t buy links that are advertised or from a broker.
  • Approach site owners directly by telephone.
  • Check the site to make sure it would pass a human inspection for paid links.
  • Make sure your link is embedded in content and that it is relevant content.
  • Make sure the link points to relevant content on your website.
  • Don’t buy home page links.

If you employ an SEO or are about to, make sure that they have a clearly defined policy on buying links based on the above. If you don’t want links purchased for your site make sure that your SEO knows your position on the subject.

June 7, 2007
Added
Google has provided guidelines in its Webmaster Help Center titled Why should I report paid links to Google?

June 12, 2007
Added
Google has put up a Paid Links Reporting Form on Webmaster Tools.

Google paid links reporting form on Webmaster Tools

December 1, 2007
Added
Google have simultaneously published two important posts on paid links in a concerted effort to draw a line in the sand:

On Google Webmaster Central Blog - Information about buying and selling links that pass PageRank

On Matt Cutts blog - Selling links that pass PageRank

December 30, 2007
Added
Ted Murphy of Izea (formerly PayPerPost) has published part of an email he received from Matt Cutts: “Google (and probably all search engines) will consider all links in a paid post to be paid”. (Emphasis mine.)

