Noindex vs Nofollow vs robots.txt

What is noindex?

Noindex is a value accepted in the content attribute of a robots meta tag. The robots meta tag is an HTML element placed in the head of a page; with the noindex value, it tells search engines not to include the page in search results (hence “no index”).

Here is how it’s used alone: 

<meta name="robots" content="noindex">

It can also be combined with the nofollow value:

<meta name="robots" content="noindex, nofollow">

It can also be sent as an HTTP response header instead of a meta tag:

X-Robots-Tag: noindex
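
To check which of these signals a page actually sends, a small script can read both the meta tag and the header. The following is a minimal sketch using only Python's standard library; the names RobotsMetaParser and robots_directives are hypothetical, and it assumes the header value is a plain comma-separated list (it does not handle values prefixed with a specific crawler name).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html, headers=None):
    """Return the set of robots directives found in the HTML and the headers."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in (headers or {}).items():
        # Header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            found.update(p.strip().lower() for p in value.split(",") if p.strip())
    return found


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print("noindex" in robots_directives(page))  # True
```

Passing the response headers as a dictionary, for example robots_directives(page, {"X-Robots-Tag": "noindex"}), covers the header form as well.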

Reasons to Use the Noindex Tag

  • Remove content from search results. Keep the pages crawlable, but add a noindex tag. This tells search engines that these pages should be removed, though it may take time. For faster removal, add the tags and then use Google Search Console’s Remove URL feature, which temporarily removes the URL from search results. When Google next crawls those pages, it will see the noindex tag and not index them.
  • Prevent non-organic landing pages from receiving organic traffic. If you have a landing page that should only receive traffic from paid or email channels, including a noindex tag on that page is the best way to do that.
  • Prevent thank you pages from being indexed. Thank you pages are often used as a destination goal in Analytics. You want to ensure that the people who see that page actually completed the goal and did not arrive there accidentally from a search result.
  • Development environments. Development environments are tested on a live domain (separate from the site) or a subdomain (as part of the site). Either way, if pages under development are indexed, they can create a duplicate content issue in which the company competes against itself.
  • Prevent PDFs from being indexed. If you have gated PDFs (PDFs that require a name, email, etc. before download), you do not want search engines to make them freely available in search results. A PDF has no head section to hold a meta tag, so in this case the noindex directive is sent in the X-Robots-Tag HTTP header of the response that serves the document.
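
For the gated PDF case, the header has to be added by whatever serves the file. As a rough sketch, assuming the site runs as a Python WSGI application, a small middleware could attach the header to every response whose path ends in .pdf. The name noindex_pdfs is hypothetical, and most web servers can do the same thing in their own configuration.

```python
def noindex_pdfs(app):
    """WSGI middleware that adds X-Robots-Tag: noindex to PDF responses."""

    def wrapped(environ, start_response):
        # Decide per request, based on the requested path.
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def start(status, headers, exc_info=None):
            if is_pdf:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```

Wrapping an existing application is a one-liner, for example application = noindex_pdfs(application), and responses for other paths pass through unchanged.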

What is nofollow?

Nofollow instructs web crawlers not to follow a link. It also instructs search engines not to pass authority from the origin page to the destination page.

Placing nofollow in the head. Using nofollow in the head of a page instructs web crawlers not to follow any of the links on that page and not to pass authority from that page to any of the pages it links to.

<meta name="robots" content="nofollow">

Placing nofollow on an individual link. Using nofollow on an individual link instructs web crawlers not to follow that link and not to pass authority from that page to the destination page.

<a href="http://www.example.com/" rel="nofollow">click here</a>
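
To see which links on a page carry the attribute, the HTML can be scanned for anchor tags. This is a minimal sketch with Python's standard html.parser module; LinkAuditor and audit_links are hypothetical names. It treats rel as a space-separated list, so a value such as "nofollow noopener" still counts.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Records each link's href and whether its rel includes nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return  # anchors without an href are not links
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    """Return (href, is_nofollow) for every link in the HTML."""
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.links


sample = (
    '<a href="http://www.example.com/" rel="nofollow">click here</a>'
    '<a href="/about">About</a>'
)
print(audit_links(sample))
# [('http://www.example.com/', True), ('/about', False)]
```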

What is robots.txt and how does it differ from noindex/nofollow?

Robots.txt is a plain-text file at the root of a site that tells crawlers which URLs they may crawl. It controls crawling rather than indexing, and it is merely a directive that some crawlers ignore. A good example of the difference is that you will sometimes see pages indexed without a title or description in the listing. In that case the URLs were blocked by robots.txt, so Google did not crawl them, but it still indexed the URLs because other pages link to them. If these pages had been crawlable and carried a noindex tag, they would never have been included in the search results.
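
To illustrate what a robots.txt rule does and does not say, here is a short sketch using Python's standard urllib.robotparser module. The rules, the ExampleBot user agent, and the is_crawl_allowed helper are made up for illustration. Note that the answer only covers crawling; it says nothing about whether the URL can appear in search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the rules permit this user agent to crawl the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/private/report.html"))  # False
print(is_crawl_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html"))  # True
```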

Sources:

  • http://www.dashboardjunkie.com/noindex-nofollow-canonical-and-disallow
  • https://www.searchenginejournal.com/tell-google-how-to-treat-your-content-disallow-nofollow-noindex/6736/
  • http://www.seobook.com/robots-txt-vs-rel-nofollow-vs-meta-robots-nofollow
  • http://www.robotstxt.org/meta.html
  • https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag