Canonicals

What is a canonical?

A canonical link is an HTML element used to distinguish the “original” page from derivative pages carrying the same content. It prevents duplicate content issues on a site by telling search engines which page they should index.

 

[Figure: canonical link diagram]

 

How are canonical links used?

How a canonical is used depends on the site and the types of content it contains. Here are the six common instances where canonicals should be used:

  1. Self-Referring Canonicals
  2. Duplicate Pages
  3. View All Pages
  4. Faceted Navigation
  5. Non-HTML Content
  6. Cross-Domain Syndication

Self-Referring

This type of canonical points to the page it appears on. It acts as a confidence indicator, confirming that the page the search engine has found is indeed the page that should be indexed. A self-referring canonical is particularly useful when redirecting pages to a new location: search engines follow the 301 redirect and use the self-referring canonical to confirm that the page they have arrived on is the new page that should be indexed.

Example of a self-referring canonical:

URL
http://www.example.com/breakdancing-grizzly-bear

Canonical
<link rel="canonical" href="http://www.example.com/breakdancing-grizzly-bear" />
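One way to audit this is to parse a page and confirm that its canonical matches its own URL. The following is a minimal sketch using only the Python standard library; the class and function names are illustrative, and it assumes the canonical is written as an absolute URL exactly as the page URL is.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so split before checking
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def is_self_referring(page_url, html):
    """Return True when the page declares exactly one canonical and it equals page_url."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals == [page_url]
```

A page with no canonical, or with more than one, is reported as not self-referring, which is usually what an audit wants to flag.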

Duplicate Content

In its most basic form, duplicate content means that two or more URLs serve the same content. Normally this is not done on purpose; rather, the Content Management System (CMS) makes the same content available at several different URLs.

An important thing to remember about duplicate content is that if a URL can be modified and the site still renders the same content as the original URL, then you have a potential duplicate content issue.

Common ways modifying a URL can produce duplicate content:

http vs https

These would technically be considered duplicates

http://www.example.com/services
https://www.example.com/services

www vs non-www. This happens when a CMS does not force the domain to use either www or non-www. The www in a URL is technically a subdomain, so a site that renders the same content with and without www is serving that content from two different hostnames.

These would technically be considered duplicates

http://example.com/services
http://www.example.com/services

Capitalization. If you can modify a URL by capitalizing one or more of its characters and the content still renders, that is considered duplicate content. It is rare to see this type of duplicate indexed by search engines, but it can affect the way a page accumulates authority. If another site links to a piece of content using a capitalized URL, authority is passed to that URL instead of to the lowercase version of the link.

These would technically be considered duplicates

http://www.example.com/services
http://www.example.com/seRvices

Development Sites. When a site is undergoing a redesign, a development site is typically set up to test the new site in a live environment. If the developers fail to add a noindex tag to its pages, there is potential for duplicate content issues. Development sites are usually hosted on a subdomain or a separate domain. In either case, developers should include a noindex tag and block all search engines from crawling that content.

These would technically be considered duplicates

http://www.example.com/
http://dev.example.com/
http://www.development-domain.com/
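The protocol, www, and capitalization variations above can be collapsed to one preferred URL, which is the value a canonical would carry. The sketch below assumes the preferred version is https, with www, and with a lowercase path; those preferences and the function name are illustrative, and lowercasing the path is only safe on a site that serves the same content regardless of case. Development sites are not covered here, since they need a noindex tag rather than normalization.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url, scheme="https", use_www=True):
    """Collapse protocol, www, and capitalization variants into one preferred URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip any existing www. prefix, then re-add it only if it is preferred
    bare_host = host[4:] if host.startswith("www.") else host
    host = "www." + bare_host if use_www else bare_host
    # Assumption: the site treats paths as case-insensitive
    path = parts.path.lower()
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))
```

Two URLs that normalize to the same value are candidates for sharing one canonical.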

View All Pages

Duplicate content can be created when a website has a single view all page and individual pages that each contain a piece of the content from the view all page. This is common with publishers who produce list-type content, where the view all page has all ten items on one page but each item is also broken out onto its own page.

[Figure: view all canonical diagram]

The problem with this type of content is that it often competes with itself in organic rankings. To prevent this, the site should add a canonical from breakout pages to the view all page. This eliminates duplicate content issues and consolidates link metrics, making the view all page the one page that will be indexed and ranked.

Examples of a view all page canonical: 

View All URL
http://www.example.com/top-5-bill-murray-movies

Individual Pages
URL: http://www.example.com/top-5-bill-murray-movies/groundhog-day
Canonical: http://www.example.com/top-5-bill-murray-movies

URL: http://www.example.com/top-5-bill-murray-movies/ghostbusters
Canonical: http://www.example.com/top-5-bill-murray-movies

URL: http://www.example.com/top-5-bill-murray-movies/lost-in-translation
Canonical: http://www.example.com/top-5-bill-murray-movies

URL: http://www.example.com/top-5-bill-murray-movies/caddyshack
Canonical: http://www.example.com/top-5-bill-murray-movies

URL: http://www.example.com/top-5-bill-murray-movies/scrooged
Canonical: http://www.example.com/top-5-bill-murray-movies
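When breakout pages live one path segment below the view all page, as in the examples above, the canonical can be derived from the breakout URL. This is a minimal sketch under that assumption; the function name is illustrative, and a site with a different URL structure would need its own mapping.

```python
from urllib.parse import urlsplit, urlunsplit


def view_all_canonical(breakout_url):
    """Drop the last path segment so a breakout page maps to its view all page."""
    parts = urlsplit(breakout_url)
    # "/list/item" becomes "/list"; a trailing slash is ignored first
    parent_path = parts.path.rstrip("/").rsplit("/", 1)[0]
    # Query string and fragment are dropped from the canonical
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
```

Every breakout page for the same list resolves to the same view all URL, which is the point of the consolidation.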

Faceted Navigation

Infinite Scroll

Non-HTML Content

Cross-domain Duplicate Content

Sometimes content is cross-published on multiple sites that are owned by the same company. This is still duplicate content, and each copy can compete for rankings. To ensure the correct domain ranks for an article or piece of content, a cross-domain canonical can be added to each copy, pointing to the version on the preferred domain.
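As a rough illustration, the tag for a cross-published copy can be built by swapping the copy's host for the preferred one. This sketch assumes the article keeps the same path on both sites, which will not hold everywhere; the function name and the sister-site hostname in the test are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def cross_domain_canonical_tag(copy_url, primary_host):
    """Build a canonical tag that points a cross-published copy at the primary domain."""
    parts = urlsplit(copy_url)
    # Assumption: the article lives at the same path on the primary domain
    primary_url = urlunsplit((parts.scheme, primary_host, parts.path, parts.query, ""))
    # Escape the URL so characters such as & are valid inside the attribute
    return '<link rel="canonical" href="{}" />'.format(escape(primary_url, quote=True))
```

The resulting tag goes in the head of the copy, not the primary page; the primary page keeps its self-referring canonical.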

Resources:

Cross-domain URL selection – Search Console Help

Handling Legitimate Cross-Domain Canonicals – Google Webmaster Blog

Cross-Domain Canonical The New 301? – Whiteboard Friday – Moz

Does Google support cross-domain rel="canonical"? – Google Webmaster on YouTube

How is it implemented?

There are two ways to implement a canonical link. The first, and most common, is to add a <link> HTML tag to the <head> of a page.
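The sketch below shows the <link> tag approach by inserting the tag just before the closing </head> of a page. The text here does not name the second method; a rel="canonical" HTTP Link header is one widely supported alternative, useful for non-HTML files such as PDFs, and it is included as an assumption about what was intended. All function names are illustrative.

```python
import re
from html import escape


def canonical_link_tag(canonical_url):
    """Return the <link> element that declares canonical_url as the canonical."""
    return '<link rel="canonical" href="{}" />'.format(escape(canonical_url, quote=True))


def add_canonical_to_head(html, canonical_url):
    """Insert the canonical tag immediately before the first closing </head> tag."""
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        raise ValueError("document has no closing </head> tag")
    tag = canonical_link_tag(canonical_url)
    return html[:match.start()] + tag + html[match.start():]


def canonical_link_header(canonical_url):
    """Return a (name, value) pair for an HTTP Link header carrying rel="canonical"."""
    # Assumed second method: sent as a response header instead of in the markup
    return ("Link", '<{}>; rel="canonical"'.format(canonical_url))
```

In practice a CMS template or plugin usually emits the tag; string insertion like this is mainly useful for static files or quick tests.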

Additional Resources

Ecommerce SEO: Product Variation, Colors, and Sizes – Merkle

rel=canonical: the ultimate guide – Yoast