It is important to have search-engine-friendly URLs if you want your pages spidered and indexed by the search engines, but what does having search-engine-friendly URLs actually mean? Let’s take a look at what the three major search engines say about URLs:
Google has three things to say on the subject in its Webmaster Guidelines:
1. If you decide to use dynamic pages (i.e., the URL contains a “?” character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
2. Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
3. Don’t use “&id=” as a parameter in your URLs, as we don’t include these pages in our index.
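Guideline 2 describes a concrete problem: the same page can be reached under many URLs that differ only by a session ID. The Python sketch below illustrates the idea. It is not Google's own logic. It normalizes a URL by dropping session parameters and sorting the remaining ones, so the variants collapse to a single form. The parameter names in `SESSION_PARAMS` are assumptions; substitute whatever names your own site generates.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that carry session state; adjust for your site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def canonical_url(url: str) -> str:
    """Drop session-ID parameters and sort the rest so that URLs which
    point to the same page reduce to one canonical form."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" URL
    # The fragment is discarded; it never reaches the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

if __name__ == "__main__":
    print(canonical_url("http://www.example.com/page.php?sid=8f3a&cat=5"))
```

Two requests such as `page.php?cat=5&PHPSESSID=1` and `page.php?sid=2&cat=5` both reduce to `page.php?cat=5`, which is the single URL a bot should see.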
Yahoo! says in its Search Indexing FAQ:
Do you index dynamically generated pages (e.g., asp, .shtml, PHP, “?”, etc.)?
Yahoo! does index dynamic pages, but for page discovery, our crawler mostly follows static links. We recommend you avoid using dynamically generated links except in directories that are not intended to be crawled/indexed (e.g., those should have a /robots.txt exclusion).
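Yahoo!'s suggestion to keep crawlers out of directories full of dynamically generated links relies on a robots.txt exclusion. The sketch below uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt. The excluded `/search/` directory and the example URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all crawlers out of an assumed
# /search/ directory that holds dynamically generated pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ("/products/blue-widget", "/search/?q=widget&page=2"):
        allowed = rules.can_fetch("*", "http://www.example.com" + path)
        print(path, "->", "crawl" if allowed else "excluded")
```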
MSN’s Guidelines for successful indexing say:
Keep your URLs simple and static. Complicated or frequently changed URLs are difficult to use as link destinations. For example, the URL www.example.com/mypage is easier for MSNBot to crawl and for people to type than a long URL with multiple extensions.
The message is clear: static URLs are better than dynamic ones, but if you have a dynamic site, the URLs must be as simple as possible, with only one or two query-string parameters and no session IDs.
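These rules are easy to check automatically. Here is a minimal sketch for auditing your own URLs. It assumes a limit of two query parameters and uses a hypothetical list of session-ID parameter names. It flags any parameter named `id`, which is slightly broader than Google's literal "&id=" wording.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed session-ID parameter names; extend for your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}
MAX_PARAMS = 2  # "only one or two" query-string parameters

def url_problems(url: str) -> list[str]:
    """Return the reasons a URL falls short of the guidelines quoted above."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    names = [name.lower() for name, _ in params]
    problems = []
    if len(params) > MAX_PARAMS:
        problems.append(f"{len(params)} query parameters (max {MAX_PARAMS})")
    if any(name in SESSION_PARAMS for name in names):
        problems.append("contains a session ID")
    if "id" in names:
        problems.append('uses an "id" parameter')
    return problems

if __name__ == "__main__":
    print(url_problems("http://www.example.com/shop.php?cat=5&prod=12&sort=asc&sid=8f3a"))
    # -> ['4 query parameters (max 2)', 'contains a session ID']
```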
A url that might look like this:
Should preferably look like this:
How you achieve this depends on whether you are starting out with a new site or have an established site with existing complex urls.
If it is a new site, search-engine-friendly URLs must be built into the design criteria. How this is done depends on the programming language. For example, if you plan to use PHP, you might make use of the PATH_INFO variable; if you use ASP.NET, you could modify the Global.asax file (for example, by calling Context.RewritePath in its Application_BeginRequest handler).
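The PATH_INFO technique works because the web server passes the part of the path that follows the script name (e.g. `/index.php/products/blue-widget`) to the script. The script can then route on that path instead of on a query string. Python's WSGI interface exposes the same `PATH_INFO` variable, so a minimal sketch in Python looks like this. The `/products/<slug>` scheme and the `PRODUCTS` table are hypothetical.

```python
# Hypothetical catalogue keyed by URL slug; a real site would query a database.
PRODUCTS = {
    "blue-widget": "Blue Widget",
    "red-widget": "Red Widget",
}

def app(environ, start_response):
    """Serve /products/<slug> by reading PATH_INFO rather than a query string."""
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    if len(parts) == 2 and parts[0] == "products" and parts[1] in PRODUCTS:
        status, body = "200 OK", f"<h1>{PRODUCTS[parts[1]]}</h1>"
    else:
        status, body = "404 Not Found", "<h1>Not found</h1>"
    start_response(status, [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]

def fetch(path):
    """Call the app directly with a minimal environ; returns (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body.decode("utf-8")

if __name__ == "__main__":
    print(fetch("/products/blue-widget"))
```

To serve it for real you could pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result; a production site would use a proper WSGI server instead.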
If you plan to use a content management system (CMS), make sure that it generates search-engine-friendly URLs out of the box. If you are researching CMS tools, the Content Management Comparison Tool has a check box for ‘Friendly URLs’.
A completely different approach (not approved of by geeks, but worth considering if you are building your own site as a non-professional) is to generate static HTML pages from a database or spreadsheet in advance rather than in real time. WebMerge, for example, works with any database or spreadsheet that can export data in tabular format, such as FileMaker Pro, Microsoft Access, and AppleWorks. Using HTML template pages, WebMerge creates a new HTML page from the data in each record of the exported file. It can also create index pages with links to the other pages, and the generated pages can be hosted without the need for a database.
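WebMerge is a commercial tool, and the sketch below makes no claim about how it works internally. It only shows the same general idea in Python: an HTML template, a tabular export (assumed here to be CSV), one static page per record, and an index page linking to them. The `title` and `description` column names are hypothetical.

```python
import csv
import io
import re
from html import escape
from string import Template

# Hypothetical page template; $title and $description come from each record.
PAGE = Template("<html><head><title>$title</title></head>"
                "<body><h1>$title</h1><p>$description</p></body></html>")

def slug(text):
    """Lower-case the text and join words with hyphens for the file name."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def generate_pages(csv_text):
    """Build one static page per exported row plus an index page.

    Returns a dict mapping file name -> HTML; writing files to disk is left out.
    """
    pages = {}
    links = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = slug(row["title"]) + ".html"
        pages[name] = PAGE.substitute(title=escape(row["title"]),
                                      description=escape(row["description"]))
        links.append(f'<li><a href="{name}">{escape(row["title"])}</a></li>')
    pages["index.html"] = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    return pages

if __name__ == "__main__":
    export = "title,description\nBlue Widget,A small blue widget.\n"
    for name, html in generate_pages(export).items():
        print(name, len(html))
```

Because the output is plain files, the site can be uploaded to any host, which is the point the paragraph above makes about not needing a database.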
If it is an existing site, the problematic URLs can be replaced with simple ones that are translated back to the original dynamic URLs in real time. If you are on an Apache server, you can use mod_rewrite to rewrite requested URLs on the fly. This requires knowledge of regular expressions, which can be rather daunting if you are not a programmer. Fortunately, there is an abundance of mod_rewrite expertise at RentACoder if you get stuck. If you are on Internet Information Services (IIS), you can use something like ISAPI_Rewrite to rewrite your URLs, which also requires knowledge of regular expressions.
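To show what such a rule does, the Python sketch below mimics the matching step with the same kind of regular expressions: a pattern for the friendly URL and a substitution pointing at the real dynamic script. It is a simulation for learning the regex side, not a replacement for mod_rewrite or ISAPI_Rewrite. The script name `catalog.php` and its `item`/`cat` parameters are hypothetical.

```python
import re

# Hypothetical rules: (pattern for the friendly URL, internal dynamic URL).
# The first is roughly what this per-directory (.htaccess) Apache rule expresses:
#   RewriteRule ^products/([0-9]+)/([a-z0-9-]+)/?$ /catalog.php?item=$1 [L]
# The keyword slug in the second group is only for readers and search engines.
RULES = [
    (re.compile(r"^products/([0-9]+)/([a-z0-9-]+)/?$"), r"/catalog.php?item=\1"),
    (re.compile(r"^category/([a-z0-9-]+)/?$"), r"/catalog.php?cat=\1"),
]

def rewrite(path):
    """Return the internal URL for a requested path, or the path unchanged.

    As with rules carrying the [L] flag, the first matching rule wins.
    """
    relative = path.lstrip("/")  # .htaccess patterns see no leading slash
    for pattern, target in RULES:
        match = pattern.match(relative)
        if match:
            return match.expand(target)
    return "/" + relative

if __name__ == "__main__":
    for requested in ("/products/12/blue-widget", "/category/widgets/", "/about.html"):
        print(requested, "->", rewrite(requested))
```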
Whatever your solution, try to incorporate your keywords in the URLs, and use only hyphens to separate words, never underscores or spaces.
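Turning a page title into a keyword-rich, hyphen-separated URL segment is a small function in most languages. This is one possible Python version. The ASCII folding of accented letters is a design choice, not a search engine requirement.

```python
import re
import unicodedata

def slugify(title: str) -> str:
    """Turn a page title into a lower-case, hyphen-separated URL segment."""
    # Fold accented characters to their closest ASCII letters (é -> e).
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of spaces, underscores or other punctuation into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")

if __name__ == "__main__":
    print(slugify("Search Engine Friendly URLs"))  # search-engine-friendly-urls
```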
Additional reading in a more recent post - URLs (Update)