URL Transliteration – Pros and Cons

Arguments For: user experience; Google shows non-Latin character URLs in search results. Arguments Against: many global companies, even the major search engines, use Latin characters; non-Latin characters might not be displayed in some browsers. Research, from “Which browser does not support Unicode in URLs?”: RFC 3986 requires percent-encoding of Unicode characters in URLs: When a new…
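
To make the percent-encoding point concrete, here is a minimal Python sketch of what a non-Latin path looks like once it is encoded for transport. The path `/блог/привет` and the function name `encode_path` are made up for illustration and do not come from the post; the sketch only assumes the standard-library `urllib.parse.quote`, which encodes the text as UTF-8 and writes each non-ASCII byte as `%XX`.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving '/' separators untouched."""
    # quote() encodes the string as UTF-8 and escapes every byte that is
    # not an unreserved ASCII character; '/' is kept because it is the
    # default value of the `safe` argument.
    return quote(path, safe="/")


# Hypothetical Cyrillic path, used only as an illustration.
native_path = "/блог/привет"
encoded_path = encode_path(native_path)

print(encoded_path)
# Decoding restores the original characters, which is what a browser
# does when it displays the readable form in the address bar.
print(unquote(encoded_path) == native_path)
```

The encoded form is pure ASCII, so it is safe to send over the wire, while the readable form is what a user may see if the browser chooses to decode it for display.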

How Google Crawls and Indexes AJAX Content

What is AJAX? AJAX (short for “asynchronous JavaScript and XML”) is a set of web development techniques that use a range of client-side web technologies to create asynchronous web applications. With AJAX, web applications can send data to and retrieve data from a server asynchronously (in the background) without interfering with the display and behavior of…

Duplicate Content – Everything You Need to Know and How to Fix It

What is duplicate content? Duplicate content occurs when the exact same content is available at two or more distinct URLs. What are the most common causes of duplicate content? Capitalization; trailing slashes; www vs non-www; http vs https; publishing across different domains, subdomains, and/or subdirectories. Why is duplicate content an issue? Keyword cannibalization, reporting, index bloat, crawlability. How…
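
The URL-level causes listed above (capitalization, trailing slash, www vs non-www, http vs https) can be illustrated with a small normalization sketch in Python. It is not taken from the post: the function name `canonicalize` and the chosen preferred form (https, no `www.`, lowercase path, no trailing slash) are assumptions made for the example, and lowercasing the path in particular is only safe on servers that treat paths case-insensitively.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Map common duplicate variants of a URL onto one assumed preferred form."""
    parts = urlsplit(url)

    # Host names are case-insensitive; dropping "www." is an assumed preference.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if parts.port:
        host = f"{host}:{parts.port}"

    # Assumption: paths are treated as case-insensitive and the version
    # without a trailing slash is preferred. The root path stays "/".
    path = parts.path.lower().rstrip("/") or "/"

    # Assumption: https is the preferred scheme. Fragments are dropped
    # because they are never sent to the server.
    return urlunsplit(("https", host, path, parts.query, ""))


# Four hypothetical URLs that serve the same page.
variants = [
    "http://example.com/blog/post",
    "https://www.example.com/blog/post/",
    "https://example.com/Blog/Post",
    "HTTP://WWW.EXAMPLE.COM/blog/post/",
]
unique_urls = {canonicalize(u) for u in variants}
print(unique_urls)
```

A crawl export run through a function like this shows how many distinct pages sit behind a larger number of crawlable URLs; the fix on a live site would normally be redirects or canonical tags rather than a script.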

How does Google rank images?

If the text is embedded as images, we may process the images with OCR algorithms to extract the text. Q: What happens with the images in PDF files? A: Currently the images are not indexed. In order for us to index your images, you should create HTML pages for them. To increase the likelihood of us returning your…

How does Google index PDF files?

PDFs in Google search results Q: Can Google index any type of PDF file? A: Generally we can index textual content (written in any language) from PDF files that use various kinds of character encodings, provided they’re not password protected or encrypted. The general rule of thumb is that if you can copy and paste…

Noindex vs Nofollow vs robots.txt

What is noindex? This is a value accepted in the content attribute of a robots meta tag. This meta tag is an HTML element placed in the head of a page, and it prevents search engines from including the page in search results (hence “no index”). Here is how it’s used alone: <meta name="robots" content="noindex"> It…
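
As a rough way to check which robots directives a page declares, the following Python sketch reads the robots meta tag with the standard-library `html.parser`. The class `RobotsMetaParser`, the helper `robots_directives`, and the sample markup are hypothetical names and data introduced for this example; the sketch only covers the meta tag and does not look at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives declared in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content attribute holds a comma-separated list of directives.
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page markup, used only as an illustration.
sample_page = '<html><head><meta name="robots" content="noindex"></head></html>'
print("noindex" in robots_directives(sample_page))
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so a page blocked in robots.txt cannot communicate a noindex directive this way.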