Google Panda Report: Understanding how Google views thin and duplicate content
By Julie King | June 1, 2012
When we commissioned the search audit from Reliable SEO, we expected to find problems with duplicate content from scrapers, and we were ready to hear that negative SEO had resulted in us being penalized for low-quality links we had not requested.
What we were not expecting was to learn that content on our website that we viewed as useful – and that had in the past ranked particularly well in Google – was being penalized for being “thin”.
1) Some software components on CanadaOne created an impression of thin, duplicated content
Reliable SEO pinpointed the first problems on our homepage. The homepage was designed to let people land and then quickly move on to something that interested them without being overwhelmed, so we primarily used links with supporting descriptive text. This, coupled with the location of the content below Google’s page layout fold (the section on site architecture explains this in more detail) and an events calendar browse function designed for people rather than search engines, added up to a page with search optimization signal problems.
The events calendar on the homepage posed a particular problem because its next and previous buttons could be followed recursively through many thousands of empty pages that appeared to be duplicated content. Fortunately, the solution to this issue was simple: by adding the rel=nofollow attribute to the anchor tags used for the previous and next links, we have been able to prevent search robots from following these links.
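As a rough illustration of the fix described above, here is a minimal Python sketch of how a calendar template might emit its previous and next links with the nofollow hint attached. The function name and the URLs are hypothetical and are not taken from the CanadaOne codebase.

```python
from html import escape


def calendar_nav_link(label: str, url: str) -> str:
    """Build an anchor tag that carries rel="nofollow".

    The attribute asks search robots not to follow the link, which keeps
    them out of an endless chain of empty calendar pages.
    """
    safe_url = escape(url, quote=True)
    safe_label = escape(label)
    return f'<a href="{safe_url}" rel="nofollow">{safe_label}</a>'


# Hypothetical URLs, used only for illustration.
prev_link = calendar_nav_link("Previous", "/events?month=2012-05")
next_link = calendar_nav_link("Next", "/events?month=2012-07")

print(prev_link)
print(next_link)
```

Visitors see and use the links exactly as before; only the markup read by crawlers changes.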
2) Look to on-site links, as well as off-site links, to understand why you are being penalized
The second issue that really surprised us was that lists of on-site links on CanadaOne were a problem. We had always understood that poor quality inbound links could be a problem, but as it turned out some key content areas of our site were now also sending out spam signals. Interestingly, some of these pages, like our news, had previously been amongst our top-ranked content.
One area that stood out to Reliable SEO was our archives, which let users pull up a list of links by category or date. We have been publishing CanadaOne ten times a year or more since March 1998 and have built an online library that contains over 900 business articles on a variety of topics. When shown in the archives, this meant that there were over 50 different pages of links, which look like thin content and in some cases duplicate content within the context of the new Panda algorithm.
Fortunately, the solution here was mostly straightforward.
Every article on CanadaOne has a meta description, so for the archives and business news we were able to display that description alongside each link. This should not only help clean up the overall penalty, but will also improve usability for those who matter most: the real small business owners and managers who come to CanadaOne for help.
For our “content funnels”, which help us categorize content and display related articles, a key issue that emerged is that for some shorter pages and news stories, the list of links to related stories can be longer than the full text of the main story. This is a particular problem for popular categories like “starting a business”, where we might have a 300-word story with a list of 40-60 related articles.
The short-term fix has been to cap the number of related articles that can appear in a “content funnel” at five.
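The cap itself is a one-line change. The Python sketch below shows the idea. It assumes the related articles arrive as a list that is already ordered by relevance, which the article does not state, and the names are hypothetical.

```python
MAX_RELATED = 5  # the cap described above


def cap_related(articles, limit=MAX_RELATED):
    """Return at most `limit` related articles, keeping their order.

    Assumes `articles` is already sorted so the best matches come first.
    """
    return list(articles[:limit])


# A popular category might return dozens of related stories.
related = [f"Article {n}" for n in range(1, 41)]
shown = cap_related(related)

print(len(shown))
print(shown)
```

A short story followed by five links keeps the main text as the bulk of the page, which is the point of the fix.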
Although it will take a lot of time, our intention over the longer term is to incorporate a tool that will enable us to create related links to much more tightly related topics. (For example, we would link two articles on trademarks and copyright, rather than linking them to the more general categories of “legal” or even “intellectual property”.)
Another problematic area of our site is the Ask-the-Expert page, which interestingly enough also shows how the Panda algorithm has shifted. After CanadaOne was first affected by Panda we saw a drop in traffic to many areas of our site, while traffic to this area of the site increased. In fact, in the fall of 2011 this section of our website saw the highest traffic seen since we started tracking traffic through Google Analytics in 2007, 142 per cent higher than the peak previously recorded on April 14, 2008.
By far the most work will come in updating this section, which has hundreds of questions. For this we will look at another of Reliable SEO's suggestions, which is to break up the page so that content is concentrated around major topic areas (HR, taxes, etc.) rather than having all topics together.
3) Issues around attribution and scraped/duplicate content
The third surprise concerned content scrapers (sites that had stolen our content and republished it without our permission). Not only was the duplication damaging our ranking, but because of the difficulties search engines currently have with proper attribution, it is possible that CanadaOne is being seen as the spammer/scraper for content that was stolen from our site!
Reliable SEO outlined an interesting way for us to address the attribution problem through the use of rel=author markup, which is described in more detail in the section on site architecture.
Recommended actions for other sites:
- Evaluate your website by looking carefully at sections and individual pages, to compare the ratio of text on the page to the number of links. While we cannot provide an exact ratio, you want to have your text fill the majority of the page, with links playing a much lesser role.
- Look for pages where you have created lists of links and add text-based descriptions to each link.
- Look for links from your site that point to what Google would view as a poor-quality website. While these are sometimes necessary, try to keep links like this to a minimum.
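To make the first recommendation concrete, here is a rough Python sketch that estimates how much of a page's wording sits inside links. It uses only the standard library's html.parser module. The class and function names are our own, and the measure (share of words that are link text) is just one plausible way to approximate the text-to-link comparison. It is not a metric Google has published.

```python
from html.parser import HTMLParser


class LinkWordCounter(HTMLParser):
    """Count words inside <a> tags and words on the page overall."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0   # greater than 0 while inside an <a> element
        self.skip_depth = 0   # greater than 0 while inside <script> or <style>
        self.link_words = 0
        self.total_words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # ignore script and style contents
        words = len(data.split())
        self.total_words += words
        if self.link_depth:
            self.link_words += words


def link_word_share(html: str) -> float:
    """Return the fraction of words on the page that are link text."""
    counter = LinkWordCounter()
    counter.feed(html)
    counter.close()
    if counter.total_words == 0:
        return 0.0
    return counter.link_words / counter.total_words


sample = (
    "<p>Ten words of body text go here for the example.</p>"
    '<a href="/a">Link one</a>'
)
print(round(link_word_share(sample), 2))
```

Running this over each section of a site and sorting by the result is a quick way to find the pages where links outweigh text. What counts as too high remains a judgment call.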
Page 1: Google Panda Report: Introduction (previous)
Page 2: Google Panda Report: Understanding How Google Views Duplicate or Thin Content
Page 3: Google Panda Report: Site Focus (next)
Page 4: Google Panda Report: Site Architecture
Page 5: Google Panda Report: Site Clean-Up with Google Webmaster Tools