Google’s duplicate content patent
Wednesday, December 16, 2009 3:40
This month, Google was granted a patent titled "Duplicate document detection in a web crawler system". The patent explains how a content filter in the search engine can work with a duplicate content server.
What is duplicate content?
The patent contains a definition of duplicate content:
"Duplicate documents are documents that have substantially identical content, and in some embodiments wholly identical content, but different document addresses."
The patent describes three scenarios in which duplicate documents are encountered by a web crawler:
- Two pages, comprising any combination of regular web page(s) and temporary redirect page(s), are duplicate documents if they share the same page content, but have different URLs.
- Two temporary redirect pages are duplicate documents if they share the same target URL, but have different source URLs.
- A regular web page and a temporary redirect page are duplicate documents if the URL of the regular web page is the target URL of the temporary redirect page or the content of the regular web page is the same as that of the temporary redirect page.
A permanent redirect page is not directly involved in duplicate document detection because the crawlers are configured not to download the content of the redirecting page.
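The three scenarios can be read as a simple pairwise test. The following Python sketch is only an illustration of the rules quoted above, not Google's implementation: the `Page` class, its field names, and the `are_duplicates` function are hypothetical, and the sketch assumes that page content is compared for exact equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    """Hypothetical record for a crawled page (names are assumptions)."""
    url: str
    kind: str  # "regular", "temp_redirect" or "perm_redirect"
    content: str = ""  # downloaded content, if any
    target_url: Optional[str] = None  # only set for redirect pages


def are_duplicates(a: Page, b: Page) -> bool:
    """Apply the three scenarios from the patent text to a pair of pages."""
    # Permanent redirects are not part of duplicate detection.
    if "perm_redirect" in (a.kind, b.kind):
        return False
    # Duplicates always have different document addresses.
    if a.url == b.url:
        return False
    # Scenario 1: same page content, different URLs.
    if a.content and a.content == b.content:
        return True
    # Scenario 2: two temporary redirects that share a target URL.
    if a.kind == b.kind == "temp_redirect":
        return a.target_url is not None and a.target_url == b.target_url
    # Scenario 3: a regular page that is the target of a temporary redirect.
    if {a.kind, b.kind} == {"regular", "temp_redirect"}:
        regular, redirect = (a, b) if a.kind == "regular" else (b, a)
        return redirect.target_url == regular.url
    return False
```

For example, two temporary redirects at different addresses that both point to `http://example.com/` would be treated as duplicates of each other and of the regular page at that address.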
How does Google detect duplicate content?
According to the patent description, Google’s web crawler consults the duplicate content server to check whether a newly found page is a copy of another document. The algorithm then determines which version is the most important one. Google can use different methods to detect duplicate content. For example, Google might compute a "content fingerprint" for each page and compare it with the fingerprints of known pages when a new web page is found. Interestingly, it’s not always the page with the highest PageRank that is chosen as the most important URL for the content:
"In some embodiments, a canonical page of an equivalence class is not necessarily the document that has the highest score (e.g., the highest page rank or other query-independent metric)."
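To make the fingerprint idea more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the patent: the `fingerprint` function, the `DuplicateIndex` class, and the `margin` parameter are hypothetical. The fingerprint is a plain SHA-256 hash of whitespace-normalized text, so it only catches exact copies, not "substantially identical" pages. The rule that a new page replaces the current canonical page only when its score is higher by a margin is also an assumption, chosen to show one way the canonical page can end up not being the highest-scoring one.

```python
import hashlib
from typing import Dict, Tuple


def fingerprint(content: str) -> str:
    """Hash of the lowercased, whitespace-normalized text (an assumption)."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateIndex:
    """Hypothetical stand-in for a duplicate content server."""

    def __init__(self, margin: float = 0.1) -> None:
        # A newcomer must beat the canonical page's score by this margin.
        self.margin = margin
        # fingerprint -> (canonical URL, score of the canonical page)
        self._canonical: Dict[str, Tuple[str, float]] = {}

    def register(self, url: str, content: str, score: float) -> str:
        """Record a crawled page and return the canonical URL for its content."""
        key = fingerprint(content)
        current = self._canonical.get(key)
        if current is None or score > current[1] + self.margin:
            self._canonical[key] = (url, score)
        return self._canonical[key][0]
```

With a margin of 0.1, a copy that scores 0.55 does not displace an original that scores 0.50, even though the copy has the higher score. A copy that scores 0.90 does.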
How does this affect your website?
If you want to get high rankings, it is easier to do so with unique content. Try to use as much original content as possible on your web pages. If your website must use the same content as another website, make sure that your website has better inbound links than the other websites that carry the same content. Your website is then more likely to be chosen as the most important URL for that content, although the patent notes that the highest-scoring page is not always the one selected.
If your website has unique content, you don’t have to worry about potential duplicate content problems. Optimize that content for search engines and make sure that your website has good inbound links. It’s hard to outrank a website with well-optimized content and many good inbound links.
source: Axandra.com