What is a canonical?
A canonical link is an HTML element used to distinguish the “original” page from derivative pages carrying the same content. It prevents duplicate content issues on a site by telling search engines which page they should index.
How are canonical links used?
How a canonical is used depends on the site and the types of content it contains. Here are the six common instances where canonicals should be used:
- Self Referring Canonicals
- Duplicate Pages
- View All Pages
- Faceted Navigation
- Non-HTML Content
- Cross-Domain Syndication
- Self-referring vs canonicalization
- One canonical per page
- Plugins can sometimes conflict with hard-coded canonicals, duplicating them
- Absolute URLs, not relative URLs
Self Referring Canonicals
This type of canonical points to the page's own URL. It acts as a confidence indicator, confirming that the page the search engine has found is indeed the page that should be indexed. It is particularly useful when pages are redirected to a new location: search engines follow the 301 redirect and use the self-referring canonical to confirm that the page they have arrived on is the new page that should be indexed.
Example of a self-referring canonical:
URL: http://www.example.com/breakdancing-grizzly-bear
Canonical: <link rel="canonical" href="http://www.example.com/breakdancing-grizzly-bear" />
Duplicate Pages
In its most basic form, duplicate content means that two or more URLs serve the same content. Normally this is not done on purpose; rather, the Content Management System (CMS) produces several URLs that all render the same content.
An important thing to remember about duplicate content is that if a URL can be modified and the site still renders the original page's content at the modified URL, then you have a potential duplicate content issue.
Common ways modifying a URL can produce duplicate content:
http vs https
These would technically be considered duplicates:
http://www.example.com/services
https://www.example.com/services
www vs non-www. This happens when a CMS does not force the domain to use either www or non-www. Having a www in the URL is really declaring a subdomain, so being able to render the same content on both the www and non-www versions of a URL is like serving it from two different hostnames.
These would technically be considered duplicates:
http://example.com/services
http://www.example.com/services
Capitalization. If you can modify a URL by capitalizing one or more of its characters and the content still renders, that is considered duplicate content. It would be rare to see this type of duplicate being indexed by search engines, but it can have an effect on the way a page accumulates authority. If another site links to a piece of content using capitalization, authority will be passed to that URL, instead of attributing authority to the lower case version of the link.
These would technically be considered duplicates:
http://www.example.com/services
http://www.example.com/seRvices
Development Sites. When a site is undergoing a redesign, a development site is typically set up to test the new site in a live environment. If the developers fail to add a noindex tag to its pages, there is potential for duplicate content issues. Development sites are usually hosted on a subdomain or a separate domain. In either case, developers should include a noindex tag and block all search engines from crawling that content.
These would technically be considered duplicates:
http://www.example.com/
http://dev.example.com/
http://www.development-domain.com/
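As a rough illustration of the noindex tag mentioned above, the `<head>` of a development page might carry a robots meta tag like the one below. This is only a sketch: the page title is hypothetical, and the exact setup will depend on the CMS in use.

```html
<!-- Hypothetical <head> of a page served from http://dev.example.com/ -->
<head>
  <title>Services (development copy)</title>
  <!-- Asks search engines not to add this page to their index -->
  <meta name="robots" content="noindex" />
</head>
```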
View All Pages
Duplicate content can be created when a website has a single view all page and individual pages that each contain a piece of the content from the view all page. This is common with publishers who produce list-type content, where the view all page has all ten items on one page but each item is also broken out onto its own page.
The problem with this type of content is that it often competes with itself in organic rankings. To prevent this, the site should add a canonical from breakout pages to the view all page. This eliminates duplicate content issues and consolidates link metrics, making the view all page the one page that will be indexed and ranked.
Examples of a view all page canonical:
View All URL: http://www.example.com/top-5-bill-murray-movies
Individual Pages
URL: http://www.example.com/top-5-bill-murray-movies/groundhog-day
Canonical: http://www.example.com/top-5-bill-murray-movies
URL: http://www.example.com/top-5-bill-murray-movies/ghostbusters
Canonical: http://www.example.com/top-5-bill-murray-movies
URL: http://www.example.com/top-5-bill-murray-movies/lost-in-translation
Canonical: http://www.example.com/top-5-bill-murray-movies
URL: http://www.example.com/top-5-bill-murray-movies/caddyshack
Canonical: http://www.example.com/top-5-bill-murray-movies
URL: http://www.example.com/top-5-bill-murray-movies/scrooged
Canonical: http://www.example.com/top-5-bill-murray-movies
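In HTML terms, each breakout page would carry a link element pointing at the view all URL. Here is a minimal sketch for the Groundhog Day page, assuming the tag is placed in that page's `<head>`; the page title is hypothetical.

```html
<!-- <head> of http://www.example.com/top-5-bill-murray-movies/groundhog-day -->
<head>
  <title>Groundhog Day</title>
  <!-- Points search engines at the view all page instead of this breakout page -->
  <link rel="canonical" href="http://www.example.com/top-5-bill-murray-movies" />
</head>
```

The other four breakout pages would use the same `href`, so all five consolidate to the single view all URL.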
What if a site has multiple canonicals?
Another issue is when pages include multiple rel=canonical links to different URLs. This happens frequently in conjunction with SEO plugins that often insert a default rel=canonical link, possibly unbeknownst to the webmaster who installed the plugin. In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints. Any benefit that a legitimate rel=canonical might have offered will be lost.
Source: 5 Common Mistakes with rel=canonical by Google Webmasters Blog
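To make the problem concrete, this is roughly what a page with conflicting canonicals could look like in its source. The URLs and the split between a hard-coded tag and a plugin-inserted tag are assumptions for illustration only.

```html
<!-- Hypothetical <head> with two canonicals that disagree -->
<head>
  <!-- Hard-coded in the theme template -->
  <link rel="canonical" href="http://www.example.com/services" />
  <!-- Inserted automatically by an SEO plugin -->
  <link rel="canonical" href="http://www.example.com/services?page=1" />
</head>
```

Viewing the page source and searching for `rel="canonical"` is a quick way to spot this. Removing either the hard-coded tag or the plugin's tag, so that exactly one canonical remains, resolves the conflict.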
Cross-domain Duplicate Content
Sometimes content is cross-published on multiple sites that are owned by the same company. This is still duplicate content, and each copy can compete for rankings. To ensure the correct domain ranks for an article or piece of content, a cross-domain canonical can be added to the duplicate pages, pointing to the preferred version.
Cross-domain URL selection – Search Console Help
Handling Legitimate Cross-Domain Canonicals – Google Webmaster Blog
Does Google support cross-domain rel=”canonical”? – Google Webmaster on YouTube
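A cross-domain canonical looks the same as any other canonical, except that the `href` points at a different domain. The sketch below assumes an article whose preferred version lives on www.example.com and a republished copy on www.example.org; both domains are placeholders.

```html
<!-- <head> of the republished copy at http://www.example.org/breakdancing-grizzly-bear (hypothetical) -->
<head>
  <!-- Names the copy on the preferred domain as the version to index -->
  <link rel="canonical" href="http://www.example.com/breakdancing-grizzly-bear" />
</head>
```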
How is it implemented?
There are two ways to implement a canonical link. The first, and most common, is by adding a <link> HTML tag to the <head> of a page.
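A minimal sketch of this first method is shown below. The page and its content are hypothetical; the points to note are that the tag sits inside `<head>`, appears only once, and uses an absolute URL.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Services</title>
    <!-- One canonical per page, written as an absolute URL -->
    <link rel="canonical" href="http://www.example.com/services" />
  </head>
  <body>
    <p>Page content goes here.</p>
  </body>
</html>
```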