By Shari Thurow, Founder and SEO Director of Omni Marketing Interactive
What is duplicate content? Search engines don’t see layout, just copy. Exclude your site search results via robots.txt and the meta robots tag to avoid submitting duplicate content to Google. Duplicate content is not considered spam. The biggest issue, however, is usability. IMPORTANT NOTE: Duplicate content is not penalized; it’s filtered out of the SERPs.
How do search engines filter out duplicate content?
- Crawl time filters (duplicate URLs)
- Index-time filters
- Query-time filters
- Boilerplate stripping (the search engine removes boilerplate elements to determine the content fingerprint)
- Linkage properties (# of inbound and outbound links)
- Host name resolution (which domain resides on which IP address?)
- Shingle comparison (Andrei Broder – Google Scholar and shingles)
- Every Web document has a unique content signature or “fingerprint”
- Content is broken down into sets of word patterns
- Word sets are created from groups of adjacent words
- Pattern matching in robots.txt
- * = matches any sequence of characters
- $ = matches the end of a URL
- Be consistent and be proactive
- Don’t exclude a page via robots.txt and then list it in your sitemap
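The two wildcard characters in the list above can be illustrated with a small matcher. This is a minimal sketch, assuming the patterns come from robots.txt Disallow lines. The function names and the sample patterns are hypothetical, and the sketch ignores Allow rules and rule precedence, which a real crawler also applies.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the URL path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical Disallow patterns, for illustration only.
DISALLOW = ["/search", "/*?sessionid=", "/*.pdf$"]

print(is_blocked("/search?q=shoes", DISALLOW))
print(is_blocked("/files/report.pdf", DISALLOW))
print(is_blocked("/files/report.pdf?download=1", DISALLOW))
```

Here the `$` in `/*.pdf$` keeps the rule from matching a PDF URL that carries a query string, which is the kind of detail that is easy to get wrong when excluding duplicate URLs.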
StoneTemple Consulting – How to Syndicate Content Safely
A 301 redirect eliminates the duplicate content altogether, although it doesn’t cause the duplicate URLs to cease to exist. It redirects link juice back to the canonical page (a 301 redirect doesn’t move all of the link juice, but probably 99%). Also change any links on your site to point to the canonical page.
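As a rough illustration of the redirect approach, the sketch below answers requests for known duplicate paths with a 301 and a Location header pointing at the canonical path. It uses Python's standard `http.server` module. The `CANONICAL_PATHS` mapping and the `canonical_target` helper are hypothetical, and on a production site this would normally be configured in the web server rather than written by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from duplicate paths to their canonical path.
CANONICAL_PATHS = {
    "/index.html": "/",
    "/products/index.php": "/products/",
}

def canonical_target(path):
    """Return the canonical path for a known duplicate, or None."""
    return CANONICAL_PATHS.get(path)

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = canonical_target(self.path)
        if target is not None:
            # 301 tells clients and crawlers the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        # Anything else is treated as a canonical page in this sketch.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"canonical page\n")

# To try it locally (this call blocks until interrupted):
# HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()
```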
With the canonical tag, the duplicate page remains and still gets crawled, and the tag is only a suggestion: the search engine doesn’t have to follow the canonical instruction. The reason is that webmasters make mistakes with canonical tags a lot of the time. Google doesn’t care if you’re first. It wants the best content for the user.
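The canonical instruction is usually expressed as a `<link rel="canonical">` element in the page head. Since mistakes with it are common, it can help to check what a page actually declares. The following is a minimal sketch using Python's standard `html.parser`. The class and function names and the sample URL are made up for illustration.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]

def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

page = (
    '<html><head><title>Red shoes</title>'
    '<link rel="canonical" href="https://www.example.com/shoes/red">'
    '</head><body></body></html>'
)
print(find_canonical(page))
```

Running a check like this across the duplicate URLs shows whether they all point at the same canonical page, or at nothing at all.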
In Webmaster Tools, you can tell the search engine to ignore specific URL parameters.
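To see which of your own URLs collapse into the same page once such parameters are ignored, you can normalize them yourself. The sketch below is an assumption about how that might be done with Python's `urllib.parse`. It does not reproduce anything Webmaster Tools does internally, and the parameter names in `IGNORED_PARAMS` are only examples.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Example parameters that do not change page content (an assumption;
# pick the ones that apply to your own site).
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def normalize_url(url, ignored=IGNORED_PARAMS):
    """Drop ignored query parameters and sort the rest for a stable form."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

a = normalize_url("https://Example.com/shoes?utm_source=news&color=red&sessionid=42")
b = normalize_url("https://example.com/shoes?color=red&sessionid=7")
print(a == b)
```

URLs that normalize to the same string are candidates for parameter handling, a canonical tag, or a redirect.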
If you syndicate your content, you should make sure the syndicated copy is noindexed. Search engines sometimes prefer the syndicatee instead of the original creator of the content. If you noindex a page, it keeps the duplicate out of the index, but the page will still be crawled. It can still pass link juice/PageRank through to other pages.
Pseudo Dupes – Duplicate Titles
Title tags are one of the most powerful on-page ranking factors and give search engines a strong hint about what a page is about. Duplicate title tags may cause a page to not rank, much the same as an actual duplicate page.
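Duplicate titles are easy to find once you have a list of URLs and their titles. The sketch below assumes that list already exists as a dictionary (for example, from a crawl of your own site). The function name and the sample data are hypothetical.

```python
from collections import defaultdict

def find_duplicate_titles(pages):
    """Group URLs by title and return only the titles used more than once.

    pages: dict mapping URL -> title text.
    Titles are compared case-insensitively with whitespace collapsed.
    """
    by_title = defaultdict(list)
    for url, title in pages.items():
        key = " ".join(title.lower().split())
        by_title[key].append(url)
    return {title: sorted(urls) for title, urls in by_title.items() if len(urls) > 1}

# Made-up sample data for illustration.
sample_pages = {
    "/shoes/red": "Red Shoes | Example Store",
    "/shoes/red?sort=price": "Red  Shoes | Example Store",
    "/shoes/blue": "Blue Shoes | Example Store",
}
duplicates = find_duplicate_titles(sample_pages)
print(duplicates)
```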
Syndicating exact copies of content is a bad idea. Just because it works today doesn’t mean it will work in the future.
Syndicating content is a great way to get quality links back to your site. Don’t syndicate your content to the point that you no longer rank for it.
Good Syndication creates new original content based on your existing subject matter expertise. It’s a great way to get visibility.
Divide your writing efforts by creating new original articles. Publish some of them on your own site, syndicate the rest to 3rd parties, and watch the link juice flow. It also gives users a reason to go to your site, because the content there is different.
3 major syndication options
- create new original content
Helpful tools to check for duplicate content: write a script or use Majestic SEO
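If you write your own script, one simple approach is the shingle comparison described earlier: break each document into sets of adjacent words and measure how much the two sets overlap. The sketch below is an illustration of that idea, not how any search engine implements it. The function names, the shingle size of 4, and the sample sentences are assumptions, and real systems typically hash and sample shingles rather than comparing full sets.

```python
import re

def shingles(text, size=4):
    """Return the set of word shingles (tuples of adjacent words) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        # Short texts become a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def resemblance(text_a, text_b, size=4):
    """Jaccard overlap of the two shingle sets: 1.0 identical, 0.0 disjoint."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

original = "the quick brown fox jumps over the lazy dog"
rewrite = "the quick brown fox jumps over the sleepy dog"
score = resemblance(original, rewrite)
print(score)
```

A score near 1.0 suggests the two pages are close duplicates. Where to set the cutoff is a judgment call that depends on how much boilerplate the pages share.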