Duplicate content is a challenge for search engines like Google. When the same content is available at several locations (URLs) on the web, search engines don’t know which URL to show in the search results. This can hurt a page’s rankings, and the problem only gets worse when people link to different versions of the same content. This article will help you understand the various causes of duplicate content and find a solution for each.
What is the definition of duplicate content?
Content that is accessible on many URLs on the internet is referred to as duplicate content. Because many URLs display the same information, search engines are unable to determine which URL should appear higher in the search results. As a result, they may rank both URLs lower and give other sites precedence.
This article will focus on the technical causes of duplicate content and their remedies. If you’d like a broader look at duplicate content and how it relates to copied or scraped content, as well as keyword cannibalization, we recommend reading this article: What is Duplicate Content.
Let’s have a look at an example.
Duplicate content is like standing at a fork in the road with two signs pointing in opposite directions for the same destination: which path should you take? To make matters worse, the final destinations differ too, if only slightly. As a reader, you may not mind as long as you get the information you were looking for, but a search engine has to pick which page to show in the search results, since it doesn’t want to show the same content twice.
Let’s imagine you have an article about ‘keyword x’ at http://www.example.com/keyword-x/ and the exact same content at http://www.example.com/article-category/keyword-x/. This scenario is not made up: it occurs in many modern Content Management Systems (CMSes). Now suppose your article is picked up by several bloggers, and some of them link to the first URL while others link to the second. This is when the search engine’s problem becomes your problem: because those links point to different URLs, the link value is split between them. If they all pointed at the same URL, your chances of ranking for ‘keyword x’ would be higher.
If you’re not sure whether your site has duplicate content issues, these duplicate content discovery tools may help you figure it out.
Why should you avoid duplicating material on your website?
Duplicate content hurts your rankings. At the very least, search engines don’t know which of the duplicate pages to show visitors, so all of the pages they consider duplicates risk being ranked lower. And that’s the best-case scenario. If your duplicate content problems are severe, such as very thin content mixed with word-for-word copied content, Google may take manual action against your site for deceiving visitors. So, if you want your content to rank, make sure every page contains a reasonable amount of original content.
However, it isn’t only a concern for search engines. It may be really annoying for your users if they can’t locate what they’re looking for when they’re searching for a certain page. As with many areas of SEO, it’s critical to address duplicate content concerns for both user experience and search.
Duplicate content’s causes
Duplicate content can occur for a variety of reasons. Most of them are technical: it’s not often that a person deliberately publishes the same content in two places without indicating which is the original. Unless, of course, you accidentally duplicated a post and published it. But most of the time, it doesn’t feel like duplication at all.
There are, however, plenty of technical causes, most of which come down to the fact that developers don’t think like a browser or a user, let alone a search engine spider; they think like a programmer. Take the article mentioned above, available at both http://www.example.com/keyword-x/ and http://www.example.com/article-category/keyword-x/. If you ask the developer, they’ll tell you it only exists once.
Misunderstanding the concept of a URL
That developer isn’t insane; they’re simply speaking in a different dialect. The website will most likely be powered by a CMS, and although there is only one article in the database, the website’s software simply enables that same content to be accessed through several URLs. That’s because the developer considers the article’s unique identifier to be the ID it has in the database, not the URL. The URL, on the other hand, is the search engine’s unique identifier for a piece of material. When you convey this to a developer, they will begin to understand the issue. And, after reading this post, you’ll be able to provide them an immediate answer.
Session IDs
You’ll often want to keep track of your visitors and, for instance, let them store products in a shopping cart. To do that, you need to give them a ‘session’. A session is a brief history of a visitor’s activity on your site, and it can include things like the products in their shopping cart. To maintain that session as a visitor moves from page to page, the unique identifier for the session, the so-called Session ID, has to be stored somewhere. Cookies are the most common way of doing this. Search engines, however, don’t usually store cookies.
Some systems revert to utilizing Session IDs in the URL at this point. This implies that the Session ID is added to the URL of every internal link on the page, and since the Session ID is unique to that session, it produces a new URL and hence duplicate content.
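As an illustration of the fix, here’s a minimal Python sketch of stripping session-ID parameters from URLs before they’re emitted in internal links. The parameter names in the set are assumptions for illustration; real systems use names like PHPSESSID or jsessionid.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed parameter names; adjust to whatever your system actually uses.
SESSION_PARAMS = {"sessionid", "phpsessid", "sid"}

def strip_session_id(url: str) -> str:
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(strip_session_id("http://www.example.com/keyword-x/?PHPSESSID=abc123"))
# http://www.example.com/keyword-x/
```

Run every internal link through a helper like this (or better, never append the Session ID to URLs at all) and search engines will see exactly one URL per page.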
URL parameters used for tracking and sorting
Use of URL parameters that don’t change a page’s content, such as tracking links, is another cause of duplicate content. To a search engine, http://www.example.com/keyword-x/ and http://www.example.com/keyword-x/?source=rss are not the same URL. The latter may let you track where visitors came from, but it may also make it harder for that page to rank, which is an unwanted side effect!
Of course, this isn’t limited to tracking parameters. It applies to every parameter you can add to a URL that doesn’t change the essential content, whether it’s for ‘changing the sorting on a set of products’ or for ‘showing another sidebar’: they all create duplicate content.
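One way to keep such parameters from multiplying your URLs is to canonicalize against an allowlist of parameters that genuinely change the content, dropping everything else. A minimal Python sketch; the allowlisted names are hypothetical and would need to match your own system:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allowlist: only parameters that actually change what the page shows.
CONTENT_PARAMS = {"id", "cat", "page"}

def canonical_url(url: str) -> str:
    """Drop tracking/sorting parameters, keeping only content-affecting ones."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

print(canonical_url("http://www.example.com/keyword-x/?source=rss"))
# http://www.example.com/keyword-x/
```

A helper like this is useful both for generating canonical link elements and for deciding which URL a duplicate should redirect to.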
Scrapers and syndication of content
Most causes of duplicate content lie with you or your website. Sometimes, though, other websites use your content, with or without your consent. Because they don’t always link to your original article, the search engine doesn’t know which version is the original and has to deal with yet another copy of the same piece. The more popular your site becomes, the more scrapers you’ll attract, making the problem bigger and bigger.
Order of parameters
Another common cause is that a CMS doesn’t use nice clean URLs, but rather URLs like /?id=1&cat=2, where ID refers to the article and cat to the category. In most website systems, the URL /?cat=2&id=1 will return the exact same results, yet to a search engine the two are entirely different URLs.
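The usual fix, a ‘URL factory’ that generates every link through one function, can be sketched in a few lines. A hypothetical Python example: by always emitting parameters in a fixed order, the system can never produce both /?id=1&cat=2 and /?cat=2&id=1.

```python
from urllib.parse import urlencode

def build_url(path: str, **params) -> str:
    """A tiny 'URL factory': emits query parameters in a fixed (alphabetical)
    order, so /?cat=2&id=1 and /?id=1&cat=2 can never both be generated."""
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path

# Both call orders produce the exact same URL:
print(build_url("/", id=1, cat=2))  # /?cat=2&id=1
print(build_url("/", cat=2, id=1))  # /?cat=2&id=1
```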
Comment pagination
In WordPress, as well as some other systems, there is an option to paginate your comments. This causes the content to be duplicated across the article URL, the article URL + /comment-page-1/, /comment-page-2/, and so on.
Printer-friendly pages
If your content management system creates printer-friendly pages and you link to them from your article pages, Google will usually find them, unless you actively block them. Now ask yourself: which version do you want Google to show? The one with your ads and supporting content, or the one with just your article?
WWW vs. non-WWW
This one is among the oldest in the book, yet search engines still get it wrong from time to time: WWW vs. non-WWW duplicate content, when both versions of your site are accessible. A less common case, but one I’ve also seen, is HTTP vs. HTTPS duplicate content, where the same content is served over both protocols.
Conceptual solution: a ‘canonical’ URL
An ironic aside
The word “canonical” comes from the Roman Catholic tradition, in which a list of sacred books was compiled and accepted as genuine. They were known as the canonical Gospels of the New Testament. The irony is that it took the Roman Catholic church about 300 years and numerous fights to arrive at that canonical list, and in the end they chose four versions of the same story…
As we’ve seen, the fact that several URLs lead to the same content is a problem, but it can be solved. A person working on a publication will usually be able to tell you what the ‘correct’ URL for a certain article is, but ask three people within the same organization and you’ll often get three different answers…
That is a problem that needs solving, because in the end there can be only one winning URL. The ‘correct’ URL for a piece of content is what search engines call the canonical URL.
Identifying duplicate content issues
You may not be sure whether you have duplicate content on your site or across the web. One of the easiest ways to find duplicate content is to use Google itself.
There are a number of search operators that are quite helpful in cases like this. To find all the URLs on your site that contain your keyword X article, you’d type the following search into Google:
intitle:”Keyword X” site:example.com
Google will then show you all the pages on example.com that contain that keyword in their title. The more specific the intitle part of the query, the easier it is to spot duplicate content. You can use the same method to identify duplicate content across the web. If your article’s full title was ‘Keyword X – Why It Is Awesome’, you’d search for:
intitle:”Keyword X – Why It Is Awesome”
Google would then show you all pages that match that title. Some scrapers change the title, so it can also be worth searching for one or two complete sentences from your article. When you do such a search, Google may show a message on the last page of results saying that some very similar entries have been omitted.
This is a sign that Google has already begun to ‘de-dupe’ the results. It’s still not good, so click the link and look through all of the other results to see whether you can fix any of them.
Read more: DIY: Check for Duplicate Content »
Duplicate content solutions that work
Once you’ve determined which URL is the canonical URL for your piece of content, you have to start a process of canonicalization (yeah I know, try saying that three times out loud fast). This means we have to tell search engines about the canonical version of a page and make it as easy as possible for them to find it. There are four methods of solving the problem, in order of preference:
- Not creating duplicate content in the first place
- Redirecting duplicate content to the canonical URL
- Adding a canonical link element to the duplicate page
- Adding an HTML link from the duplicate page to the canonical page
Avoiding the creation of duplicate material
Some of the causes of duplicate content mentioned above have very simple fixes:
- Do your URLs include Session IDs? These may usually be turned off in your system’s settings.
- Are there duplicate printer-friendly pages on your site? These are completely unnecessary: use a print style sheet instead.
- Do you use WordPress comment pagination? On 99 percent of sites, you should just deactivate this function (under settings » discussion).
- Is the order of your parameters different? Tell your programmer to create a script that places arguments in the same order every time (this is often referred to as a URL factory).
- Are there issues with tracking links? In most cases, you can use hash-based campaign tracking instead of parameter-based campaign tracking.
- Do you have WWW vs. non-WWW issues? Choose one and stick to it by redirecting the other to it. You can also set a preference in Google Webmaster Tools, but you’ll have to claim both versions of the domain name.
Even if your issue isn’t easy to fix, it may well be worth the effort. The goal should be to prevent duplicate content from appearing at all, because that is by far the best solution to the problem.
301 redirecting duplicate content
While it may not always be possible to completely stop your system from creating wrong URLs for content, it is often possible to redirect them. If that seems a bit contradictory to you (which I understand), keep it in mind while talking to your developers. As you eliminate duplicate content issues, make sure that all the old duplicate content URLs are 301-redirected to the proper canonical URLs.
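In practice a redirect like this is usually configured in the web server itself (for instance an Apache or nginx rewrite rule), but the logic boils down to a simple lookup. A hypothetical Python illustration, with made-up URLs:

```python
# Hypothetical map from known duplicate URLs to their canonical counterparts.
REDIRECTS = {
    "/article-category/keyword-x/": "/keyword-x/",
    "/keyword-x/print/": "/keyword-x/",
}

def respond(path: str) -> tuple[int, str]:
    """Return (status, location): 301 to the canonical URL for a known
    duplicate, otherwise 200 with the requested path itself."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 200, path

print(respond("/article-category/keyword-x/"))  # (301, '/keyword-x/')
print(respond("/keyword-x/"))                   # (200, '/keyword-x/')
```

The 301 status code tells search engines the move is permanent, so they consolidate the old URL’s link value onto the canonical one.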
Using the canonical link element
Sometimes you don’t want to, or can’t, get rid of a duplicate version of an article, even when you know it’s the wrong URL. For that problem, search engines introduced the canonical link element. It’s placed in the <head> section of your page and looks like this: <link rel="canonical" href="http://www.example.com/keyword-x/" />
In the href attribute of the canonical link, you put the correct canonical URL for your article. When a search engine that supports the canonical link element finds it, it performs a soft 301 redirect, transferring most of the link value gathered by the duplicate page to your canonical page.
However, as Google’s John Mueller has noted, this process is a bit slower than a 301 redirect, so if you can simply do a 301 redirect, that would be preferable.
Continue reading: rel=canonical • What it is and how to use it (or not) »
Linking back to the original content
If you can’t do any of the above, possibly because you don’t control the part of the site where your content appears, adding a link to the original article above or below the copied piece is always a good idea. You may also want to do this by including a link to the article in your RSS feed. Some scrapers will strip that link, but others may leave it in place. If Google finds several links pointing at your original article, it will soon figure out that that is the canonical version.
Read more: What if someone copies your website’s content? »
Conclusion: duplicate content is fixable and should be fixed
Duplicate content is everywhere. I have yet to come across a site of more than 1,000 pages that doesn’t have at least a small duplicate content problem. It’s something you need to keep an eye on constantly, but it is fixable, and the rewards can be substantial. Simply by removing duplicate content from your site, your quality content can soar in the rankings!
Check your technical SEO fitness
Removing duplicate content is part of your technical SEO strategy. Want to know how fit your site’s overall technical SEO is? We’ve put together a technical SEO fitness assessment to help you figure out what you need to work on.
Continue reading: Rel=canonical: The Definitive Guide »
Joost de Valk
SagaReach Marketing’s creator and Chief Product Officer is Joost de Valk. He is an online entrepreneur who has invested in and mentored various firms in addition to owning SagaReach Marketing. His major areas of competence are open source software and digital marketing.