canonical tag Archives - Bruce Clay, Inc.

How To Use the Canonical Link Element for Duplicate Content

Bruce Clay — Mon, 26 Feb 2024 16:26:35 +0000

If you want to avoid duplicate content issues, using the canonical link tag can help. This technical SEO best practice is fairly straightforward, and in this article, I’ll talk about why and when you should use it, and tips on how to get started.

What Is a Canonical URL?
When to Use a Canonical URL
How to Use the Canonical Link Element
Canonicalization FAQs
FAQ: How can I effectively use canonical URLs to prevent duplicate content issues and improve my SEO?

What Is a Canonical URL?

A canonical URL is the URL of the webpage that best represents a group of duplicate or near-duplicate webpages.

The canonical link element helps solve duplicate content issues by signaling to search engines like Google which webpage is the original or best pick out of a group of pages that are duplicate or near duplicate.

Google defines canonical URL as:

“A canonical URL is the URL of the best representative page from a group of duplicate pages, according to Google. For example, if you have two URLs for the same page (for example: example.com?dress=1234 and example.com/dresses/1234), Google chooses one as canonical. Similarly, if you have multiple pages that are nearly identical, Google can group them together (for example, pages that differ only by the sorting or filtering of the contents, such as by price or item color). (You might hear the term “canonical page” used occasionally, but that is technically incorrect, as it is a specific URL that is actually canonical.)
The canonical can be in a different domain than a duplicate (such as en.example.com and fr.example.com).”

The hope is that the canonical URL is the webpage that receives the SEO benefit. And while there are several methods for indicating a canonical URL, Google states that none of them are required:

“While we encourage you to use these methods, none of them are required; your site will likely do just fine without specifying a canonical preference. That’s because if you don’t specify a canonical URL, Google will identify which version of the URL is objectively the best version to show to users in Search.”

Google goes into more detail on how it chooses the canonical URL in this video:

In the video, Google’s John Mueller states that Google picks the canonical URL by following two general guidelines:

Which URL does it look like the site wants Google to use?
Which URL would be more useful for the user?

When in doubt, you can use the URL Inspection tool in Google Search Console to find out which page Google considers the canonical.

When to Use a Canonical URL

A canonical URL is for solving duplicate content issues. These duplicate content issues could be on your site or they could be shared with other websites.

Duplicate content is an SEO no-no, and you can learn more about that in: Is Duplicate Content Bad for Search Engine Rankings?

Some reasons to use the canonical link element:

Specify which webpage you want in the search results.
Consolidate link signals for similar or duplicate webpages.
Simplify tracking methods for a product or topic.
Preserve crawl budget.

The canonical link element used to be a common choice if you syndicated content across third-party publishers.

Today, Google says:

“The canonical link element is not recommended for those who wish to avoid duplication by syndication partners, because the pages are often very different.

The most effective solution is for partners to block indexing of your content.

For more, see Avoid article duplication in Google News, which also has advice about blocking syndicated content from Google Search.”

Google gives a list of do’s and don’ts when it comes to canonicalization in its help file:

Don’t use the robots.txt file for canonicalization purposes.

Don’t use the URL removal tool for canonicalization. It hides all versions of a URL from Search.

Don’t specify different URLs as canonical for the same page using different canonicalization techniques (for example, don’t specify one URL in a sitemap, but a different URL for that same page using rel=”canonical”).

We don’t recommend using noindex to prevent selection of a canonical page within a single site, because it will completely block the page from Search. rel=”canonical” link annotations are the preferred solution.

If you’re using hreflang elements, make sure to specify a canonical page in the same language, or the best possible substitute language if a canonical page doesn’t exist for the same language.

When linking within your site, link to the canonical URL rather than a duplicate URL. Linking consistently to the URL that you consider to be canonical helps Google understand your preference.

Also note that “Google prefers HTTPS pages over equivalent HTTP pages as canonical, except when there are issues or conflicting signals.”
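One of the don'ts above, specifying one URL in a sitemap but a different URL for the same page with rel="canonical", is the kind of thing a short script can check. The following is only a sketch in Python: the sitemap content and the `declared` mapping (page URL to the canonical URL it declares) are hypothetical inputs you would have to collect yourself.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Return every <loc> URL listed in an XML sitemap string."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def conflicting_pages(sitemap_xml: str, declared: dict[str, str]) -> dict[str, str]:
    """Find pages listed in the sitemap whose rel="canonical" points elsewhere.

    `declared` maps a page URL to the canonical URL that page declares
    (a hypothetical input, for example gathered by a crawl).
    """
    listed = sitemap_urls(sitemap_xml)
    return {
        page: target
        for page, target in declared.items()
        if page in listed and target != page
    }
```

Any page this returns is sending two different canonical signals, so it is worth fixing either the sitemap entry or the link element.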

For even more on canonicalization and when to use it, Google myth-busts some common beliefs about canonicalization in the following video:

In that video, they cover:

Canonicalization is not a topical grouping (0:00)
The most common canonicalization myths (1:29)
Is canonicalization a directive or a signal for Google Search? (2:01)
Should canonicalization be used as a redirect? (3:08)
What are the actual factors for duplication and deduplication? (4:25)
Site’s preference for the canonical URL vs user’s preference (7:33)
Canonicalization vs unique content on pages with a canonical tag (08:59)

How to Use the Canonical Link Element

You can specify a canonical URL with rel=”canonical” in two ways:

Add a rel=”canonical” link element to the <head> section of the duplicate (non-canonical version) of each webpage.
Indicate the canonical version of a URL by using a rel=”canonical” HTTP header.

Google gives a list of pros and cons for each execution type here.

Although Google gives multiple recommendations, they explain that you should choose one type of canonicalization method and stick with that.

Using more than one type of canonicalization method will be more prone to errors than using a single type of canonicalization method.

1. Adding a rel=”canonical” Link to the <head> Section of the Duplicate (Non-Canonical Version) of Each Webpage

To tell the search engines when a page is a duplicate of another page, you can use the rel=”canonical” link tag on all the duplicate pages, indicating which page is the canonical URL.

Let’s look at an example. To specify a canonical link to the fictional URL http://www.example.com/product.php?item=swedish-fish, you’d create a <link> element as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

You’d then copy this link into the <head> section of all non-canonical versions of the page, such as http://www.example.com/product.php?item=swedish-fish&sort=price.

If you publish the same content on both http and https, such as http://www.example.com/product.php?item=swedish-fish and https://www.example.com/product.php?item=swedish-fish, you’d specify the canonical version of the page as well.

In this example, with the http version treated as the canonical, create the <link> element as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

Add this link to the <head> section of https://www.example.com/product.php?item=swedish-fish.
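If your templates are generated by code, the canonical link element can be built by a small helper. Here is a minimal sketch in Python. The function name is my own, and it assumes you pass in an absolute URL. It escapes the URL so that characters such as & stay valid inside the href attribute.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for the given absolute URL."""
    # quote=True escapes & and " so the URL is safe inside an attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


# The fictional product URL used in the example above.
print(canonical_link_tag("http://www.example.com/product.php?item=swedish-fish"))
```

The same returned string would go into the head section of every duplicate version of the page.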

2. Indicating the Canonical Version of a URL by Using a rel=”canonical” HTTP Header

Adding rel=”canonical” to the head section of a page is useful for HTML content, but it can’t be used for PDFs and other file types indexed by Google.

In these cases, you can indicate a canonical URL by responding with the link rel=”canonical” HTTP header, like this (note that to use this option, you’ll need to be able to configure your server):

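Link: <canonical-URL>; rel="canonical"

(Replace canonical-URL with the absolute URL of the canonical version. The angle brackets are part of the header syntax.)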
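How you set this header depends on your server. As one illustration only, here is a sketch of a tiny Python WSGI application that attaches the header to a response. The canonical URL, the placeholder body and the function names are all hypothetical. On Apache or nginx you would configure the header in the server settings instead.

```python
# Hypothetical canonical URL for the file being served.
CANONICAL_URL = "https://www.example.com/downloads/white-paper.pdf"


def canonical_link_header(url: str) -> tuple[str, str]:
    """Build the (name, value) pair for a rel="canonical" Link header."""
    return ("Link", f'<{url}>; rel="canonical"')


def app(environ, start_response):
    """Minimal WSGI app: serves a placeholder body with the Link header set."""
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header(CANONICAL_URL),
    ]
    start_response("200 OK", headers)
    # Placeholder bytes; a real app would stream the actual PDF file.
    return [b"%PDF-1.4 placeholder"]
```

To try it locally, you could pass `app` to `wsgiref.simple_server.make_server` from the standard library and inspect the response headers.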

Canonicalization FAQs

Here are a few common questions we get about canonicalization and their answers:

Is rel=”canonical” a suggestion or a directive?

The rel=”canonical” attribute is a hint or suggestion, not a directive.

The rel=”canonical” lets site owners suggest the version of a page that Google should treat as canonical. However, rel=”canonical” is a strong signal that the specified URL should become canonical.

Google will take this into account with other signals when determining which URL sets have identical content, and when calculating the most relevant of these pages to display in search results.

The rel=”canonical” attribute should be used only to specify the preferred version of pages with identical content (although minor differences such as sort order are OK).

For instance, if a site has a set of pages for the same model of dance shoe, each varying only by the color of the shoe pictured, it may make sense to set the page highlighting the most popular color as the canonical version so that Google may be more likely to show that page in search results.

Can Google follow a chain of rel=”canonical” designations?

Yes, to some extent, but to ensure optimal canonicalization, our recommendation is to update links to point to a single canonical page.
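To find chains on your own site, you can follow each page's declared canonical until you reach a URL that points to itself. The sketch below assumes you already have a `declared` mapping of page URL to declared canonical URL (a hypothetical input, for example from a crawl). It illustrates the idea and says nothing about how Google resolves chains.

```python
def resolve_canonical(url: str, declared: dict[str, str], max_hops: int = 10) -> str:
    """Follow rel="canonical" declarations to the final URL.

    A URL that is missing from `declared`, or that points to itself,
    is treated as the end of the chain.
    """
    seen = {url}
    for _ in range(max_hops):
        target = declared.get(url, url)
        if target == url:
            return url
        if target in seen:
            raise ValueError(f"canonical loop detected at {target}")
        seen.add(target)
        url = target
    raise ValueError("canonical chain is longer than max_hops")


# Hypothetical chain: A points to B, and B points to C.
declared = {
    "https://www.example.com/a": "https://www.example.com/b",
    "https://www.example.com/b": "https://www.example.com/c",
}
final = resolve_canonical("https://www.example.com/a", declared)
```

Any page whose declared canonical differs from the resolved one is part of a chain, and its link element can be updated to point straight at the final URL.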

What’s the difference between applying a 301 redirect and using a canonical link element to avoid duplicate content?

301 redirects and canonical tags are both used to tell search engines about multiple versions of a web page.

301 redirects are used when a page has moved permanently, while canonical tags are used when there are multiple versions of a page.

Here are some differences between 301 redirects and canonical tags:

A 301 redirect is a status code that tells search engines and users that a page has moved permanently.
301 redirects remove the page from the index and pass any SEO credit to the new page.
301 redirects send users to the new location of a page.
The canonical tag tells search engines which page to show in search results.
Canonical tags are used to prevent problems caused by duplicate content appearing on multiple URLs.

Google gives some examples of when you’d use 301 redirects, here.

Can we use relative URLs in the canonical link element?

Google suggests using absolute URLs rather than relative URLs with the rel=”canonical” link element.

Even though relative paths are supported by Google, they can cause problems in the long run (for example, if you unintentionally allow your testing site to be crawled) and thus are not recommended.

(Read more on relative vs. absolute URLs.)
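To see why relative paths are risky, consider how a relative href resolves against the address of the page it appears on. This sketch uses Python's standard `urljoin`. The staging hostname is a hypothetical example.

```python
from urllib.parse import urljoin


def effective_canonical(page_url: str, href: str) -> str:
    """Resolve a canonical href against the URL of the page that contains it."""
    return urljoin(page_url, href)


relative_href = "/product.php?item=swedish-fish"
absolute_href = "https://www.example.com/product.php?item=swedish-fish"

# Hypothetical staging copy of the live site.
staging_page = "https://staging.example.com/product.php?item=swedish-fish&sort=price"

# The relative href resolves to the staging host.
on_staging_relative = effective_canonical(staging_page, relative_href)

# The absolute href still points at the live site.
on_staging_absolute = effective_canonical(staging_page, absolute_href)
```

With the relative href, the staging copy names itself as canonical. With the absolute href, it still points at the live site.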

Does rel=”canonical” work if the URLs are different?

No, the canonical link element is only effective if the pages are duplicates or near duplicates.

If the pages are different, Google will disregard the canonical link element and consider the URLs as two different pages.

To sum up, it’s worth taking the time to implement a canonical URL if you think you might run into duplicate content issues. Your SEO program will thank you.

Duplicate content issues affecting your SEO? Our SEO experts can help. Schedule a free 1:1 consultation with us today.

FAQ: How can I effectively use canonical URLs to prevent duplicate content issues and improve my SEO?

Canonical URLs play a crucial role in website optimization, enabling you to address duplicate content problems and enhance your SEO efforts. Understanding their purpose and implementing them correctly can make a significant difference in your search engine rankings.

So, let’s discuss the best practices for their effective usage.

Defining canonical URLs
A canonical URL is the preferred version of a webpage when multiple versions with similar content exist. You declare it with the canonical link element, an HTML tag that informs search engines which version you prefer. By specifying the canonical URL, you steer search engines in the right direction, helping ensure they attribute the desired SEO value to the chosen URL.

Why are canonical URLs important?
When search engines encounter duplicate content, they can become confused about which version to rank in search results. As a result, your website’s SEO may suffer, and traffic may be divided among different pages. Canonical URLs address this issue by consolidating authority and ensuring that only one version is considered for ranking.

Implementing canonical URLs correctly
To effectively use canonical URLs, follow these steps:

Identify duplicate content: Analyze your website to find duplicate content pages that may cause confusion among search engines. Tools like Google Search Console and third-party SEO platforms can help with this task.
Choose the preferred version: Determine the primary version of the page that you want search engines to rank.
Add the canonical tag: Insert the canonical tag in the head section of the duplicate content pages, specifying the preferred version’s URL.

Best practices for canonical tags
To make the most of canonical URLs, consider these tips:

Consistency is key: Maintain consistency by using canonical tags uniformly throughout your website. Ensure all canonical URLs point to the same page when different versions exist.
Use the appropriate annotation: Use the rel=”canonical” attribute to specify the canonical URL.
Include self-referencing canonical tags: Even if a page has no duplicate versions currently, it’s good practice to include a self-referencing canonical tag to ensure consistency in case duplicates arise in the future.
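The tips above can be spot-checked with a short script. The following sketch uses Python's built-in HTML parser to pull the canonical link out of a page's markup and test whether it is self-referencing. The class and function names are my own, and fetching the HTML is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html_text: str) -> list[str]:
    """Return all canonical hrefs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals


def is_self_referencing(page_url: str, html_text: str) -> bool:
    """True if the page declares exactly one canonical and it is its own URL."""
    return find_canonicals(html_text) == [page_url]
```

A page that returns more than one canonical, or none at all, is worth a closer look.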

Buyer intent search terms
When optimizing your content, keep in mind the buyer intent search terms related to canonical URLs. Some examples include “best practices for canonical URLs,” “canonical tags implementation,” and “how to prevent duplicate content using canonical URLs.”

Benefit of canonical URLs for SEO
By using canonical URLs correctly, you can consolidate the SEO value of duplicate content pages into one preferred version. This helps search engines understand your site structure better and keeps your rankings focused on the desired URL.

Regularly monitor duplicate content
Keep an eye on your website’s duplicate content by using SEO tools that send alerts when new instances are detected. This proactive approach ensures you maintain control over your site’s SEO performance.

Canonical URLs are an essential tool in preventing duplicate content issues and enhancing your SEO strategy. By correctly implementing canonical tags, you guide search engines toward the preferred version and consolidate SEO value. Stay vigilant, monitor duplicate content regularly and optimize your website for improved search engine rankings.

Step-by-Step Procedure:

Identify duplicate content pages on your website.
Choose the preferred version of the page.
Insert the canonical tag in the head section of duplicate content pages.
Specify the preferred version’s URL in the canonical tag.
Ensure consistency by using canonical tags uniformly on your website.
Use the rel=canonical attribute to specify canonical URLs.
Include self-referencing canonical tags on pages with no duplicates.
Optimize your content for buyer intent search terms related to canonical URLs.
Consolidate the SEO value of duplicate content pages into the preferred version.
Maintain a clear site structure for search engine understanding.
Regularly monitor duplicate content using SEO tools.
Receive alerts for new instances of duplicate content.
Take proactive actions to resolve and prevent duplicate content issues.
Stay updated with the latest best practices for canonical URLs.
Continuously optimize your website for improved search engine rankings.
Implement changes to canonical URLs whenever necessary.
Regularly review and refine your SEO strategy.
Stay knowledgeable about search engine algorithm updates.
Attend SEO conferences and join industry forums for expert insights.
Collaborate with SEO professionals to improve your website’s performance.

Implement the steps outlined in this article to prevent duplicate content issues, optimize your website and improve search engine rankings.

The post How To Use the Canonical Link Element for Duplicate Content appeared first on Bruce Clay, Inc.

How Do I Get Rid of Extra Pages in the Google Index?

Bruce Clay — Tue, 05 Dec 2023 17:30:21 +0000

Let’s say you have an ecommerce website with thousands of products, each with variations in sizes and colors. You use the Google Search Console Index Coverage report to see a list of Indexed pages in the Google search results for your website.

To your surprise, you see way more pages than the website should have. Why does that happen, and how do you get rid of them?

I answer this question in our “Ask Us Anything” series on YouTube. Here’s the video, and then you can read more about this common problem and its solution below.

Why do these “extra” webpages show up in Google’s index?
How do I get rid of “extra” webpages in Google’s index?
Summary
FAQ: How can I eliminate extra pages from my website’s Google index?

Why Do These “Extra” Webpages Show Up in Google’s Index?

This issue is common for ecommerce websites. “Extra” webpages can show up in Google’s index because extra URLs are being generated on your ecommerce website.

Here’s how: When people use search parameters on a website to specify certain sizes or colors of a product, it is typical that a new URL is automatically generated for that size or color choice.

That creates a separate webpage. Even though it’s not a “separate” product, that webpage can be indexed like the main product page if Google discovers it via a link.

When this happens, and you have a lot of size and color combinations, you may end up with many different webpages for one product. If Google discovers those webpage URLs, you may end up with multiple webpages in the Google index for one product.

How Do I Get Rid of “Extra” Webpages in Google’s Index?

Using the canonical tag, you can get all of those product variation URLs to point to the same original product page. That is the right way to handle near-duplicate content, such as color changes.
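As a rough illustration, the canonical URL for a variation page can often be derived by dropping the parameters that only select a variation. This Python sketch assumes the parameters are named `size` and `color`. Those names are hypothetical, so substitute whatever your platform actually generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only select a product variation.
VARIANT_PARAMS = {"size", "color"}


def canonical_for(url: str) -> str:
    """Strip variation-only query parameters to get the main product URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in VARIANT_PARAMS
    ]
    # Rebuild the URL without the dropped parameters or any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variation = "https://www.example.com/product.php?item=swedish-fish&size=large&color=red"
main_page = canonical_for(variation)
```

Each variation page would then carry a canonical link element pointing at the URL this returns.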

Here’s what Google has to say about using the canonical tag to resolve this issue:

A canonical URL is the URL of the page that Google thinks is most representative from a set of duplicate pages on your site. For example, if you have URLs for the same page (example.com?dress=1234 and example.com/dresses/1234), Google chooses one as canonical. The pages don’t need to be absolutely identical; minor changes in sorting or filtering of list pages don’t make the page unique (for example, sorting by price or filtering by item color).

Google goes on to say that:

If you have a single page that’s accessible by multiple URLs, or different pages with similar content … Google sees these as duplicate versions of the same page. Google will choose one URL as the canonical version and crawl that, and all other URLs will be considered duplicate URLs and crawled less often.

If you don’t explicitly tell Google which URL is canonical, Google will make the choice for you or might consider them both of equal weight, which might lead to unwanted behavior …

But what if you don’t want those “extra” pages indexed at all? In my opinion, the canonical solution is the way to go in this situation.

But there are two other solutions that people have used in the past to get the pages out of the index:

Block pages with robots.txt (not recommended, and I’ll explain why in a moment)
Use a robots meta tag to block individual pages

Robots.txt Option

The problem with using robots.txt to block webpages is that using it does not mean Google will drop webpages from the index.

According to Google Search Central:

A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.

Also, a disallow directive in robots.txt does not guarantee the bot will not crawl the page. That is because robots.txt is a voluntary system. However, it would be rare for the major search engine bots not to adhere to your directives.

Either way, this is not an optimal first choice. And Google recommends against it.
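To make the distinction concrete, here is a sketch using Python's standard robots.txt parser with a hypothetical rule set. Notice that the only question it can answer is whether a crawler may fetch a URL. Nothing in it says whether that URL is in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the product script for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether robots.txt lets this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


blocked = may_crawl("https://www.example.com/product.php?item=swedish-fish&color=red")
allowed = may_crawl("https://www.example.com/about")
```

A blocked URL that Google already knows about can still appear in the index, which is why this is not the tool for removal.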

Robots Meta Tag Option

Here’s what Google says about the robots meta tag:

The robots meta tag lets you utilize a granular, page-specific approach to controlling how an individual page should be indexed and served to users in Google Search results.

Place the robots meta tag in the <head> section of any given webpage. Then, either encourage the bots to crawl that page by submitting an XML sitemap or wait for them to crawl it naturally (which could take up to 90 days).

When the bots come back to crawl the page, they will encounter the robots meta tag and understand the directive to not show the page in the search results.
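Before waiting on a recrawl, it helps to confirm the tag is actually in the page. The sketch below uses Python's built-in HTML parser to check whether a page carries a robots meta tag with the noindex value, as in `<meta name="robots" content="noindex">`. The class and function names are my own.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the values of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # content is a comma-separated list such as "noindex, nofollow".
            self.directives.update(
                value.strip().lower() for value in content.split(",") if value.strip()
            )


def has_noindex(html_text: str) -> bool:
    """True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noindex" in finder.directives
```

Keep in mind that the page must stay crawlable for the bots to see this tag at all.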

Summary

So, to recap:

Using the canonical tag is the best and most common solution to the problem of “extra” pages being indexed in Google — a common issue for ecommerce websites.
If you don’t want pages to be indexed at all, consider using the robots meta tag to direct the search engine bots how you want those pages to be handled.

Still confused or want someone to take care of this problem for you? We can help you with your extra pages and remove them from the Google index for you. Schedule a free consultation here.

FAQ: How can I eliminate extra pages from my website’s Google index?

The issue of extra pages in your website’s Google index can be a significant roadblock. These surplus pages often stem from dynamic content generation, such as product variations on ecommerce sites, creating a cluttered index that affects your site’s performance.

Understanding the root cause is crucial. Ecommerce websites, in particular, face challenges when various product attributes trigger the generation of multiple URLs for a single product. This can lead to many indexed pages, impacting your site’s SEO and user experience.

Employing the canonical tag is the most reliable solution to tackle this. The canonical tag signals to Google the preferred version of a page, consolidating the indexing power onto a single, representative URL. Google itself recommends this method, emphasizing its effectiveness in handling near-duplicate content.

While some may consider using robots.txt to block webpages, it’s not optimal. Google interprets robots.txt as a directive to control crawler access, not as a tool for removal from the index. In contrast, the robots meta tag offers a more targeted approach, allowing precise control over individual page indexing.

The canonical tag remains the go-to solution. However, if there’s a strong preference for total removal from the index, the robots meta tag can be a strategic ally. Balancing the desire for a streamlined index with SEO best practices is the key to optimizing your online presence effectively.

Mastering the elimination of extra pages from your website’s Google index involves a strategic combination of understanding the issue, implementing best practices like the canonical tag and considering alternatives for specific scenarios. By adopting these strategies, webmasters can enhance their site’s SEO, improve user experience and maintain a clean and efficient online presence.

Step-by-Step Procedure:

Identify Extra Pages: Conduct a thorough audit to pinpoint all surplus pages in your website’s Google index.
Determine Root Cause: Understand why these pages are generated, focusing on dynamic content elements.
Prioritize Canonical Tag: Emphasize the use of the canonical tag as the primary solution for near-duplicate content.
Implement Canonical Tags: Apply canonical tags to all relevant pages, specifying the preferred version for consolidation.
Check Google Recommendations: Align strategies with Google’s guidelines, ensuring compatibility and adherence.
Evaluate Robots.txt Option: Understand the limitations and potential drawbacks before considering robots.txt.
Deploy Robots Meta Tag: Use robots meta tags strategically to control indexing on specific pages if necessary.
Balance SEO Impact: Consider the impact of each solution on SEO and user experience for informed decision-making.
Regular Monitoring: Establish a routine to monitor index changes and assess the effectiveness of implemented strategies.
Iterative Optimization: Continuously refine and optimize strategies based on evolving site dynamics and Google algorithms.

Continue refining and adapting these steps based on your website’s unique characteristics and changing SEO landscapes.

The post How Do I Get Rid of Extra Pages in the Google Index? appeared first on Bruce Clay, Inc.