By Shari Thurow, Founder and SEO Director of Omni Marketing Interactive
What is duplicate content? Search engines don’t see layout, just copy. Exclude your site search from crawling via robots.txt and the meta robots tag so you don’t submit duplicate content to Google. Duplicate content is not considered spam, though many site owners worry it will be penalized. The biggest issue, however, is usability. IMPORTANT NOTE: Duplicate content is not penalized; it’s filtered out of the SERPs.
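As an illustration, a minimal exclusion might look like the following; the /search/ path is hypothetical and should match your own site-search result URLs:

```
# robots.txt -- keep crawlers out of internal site-search result pages
User-agent: *
Disallow: /search/
```

For individual pages you can’t block by path, a meta robots tag in the page head does the same job: `<meta name="robots" content="noindex, follow">`.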
How do search engines filter out duplicate content?
- Crawl time filters (duplicate URLs)
- Index-time filters
- Query-time filters
Index-Time Filters
- Boilerplate stripping (the search engine removes boilerplate elements to determine the content fingerprint)
- Linkage properties (# of inbound and outbound links)
- Host name resolution (which domain resides on which IP address?)
- Shingle comparison (Andrei Broder’s shingling technique; see his papers on Google Scholar, and the sketch after this list)
- Every Web document has a unique content signature or “fingerprint”
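The shingle comparison above is easy to sketch. The following is a rough illustration only, assuming plain whitespace tokenization and 4-word shingles; the tokenizer, shingle length, and hashing details of a real engine are not public:

```python
# A minimal sketch of w-shingling (Andrei Broder's technique), assuming
# whitespace tokenization and 4-word shingles. Real engines strip boilerplate
# first and hash the shingles into a compact content fingerprint.
def shingles(text, w=4):
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(len(words) - w + 1, 0))}

def resemblance(doc_a, doc_b):
    """Jaccard similarity of the two documents' shingle sets (1.0 = identical)."""
    a, b = shingles(doc_a), shingles(doc_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Near-duplicate snippets score much higher than unrelated text.
print(resemblance(
    "widgets are available in red blue and green with free shipping",
    "widgets are available in red blue and green with fast free shipping"))
```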