STUDY SEO: Tell Google Which Pages Not to Crawl


The typical goal of search engine optimization is to have your site’s pages show up on a Google results page in answer to a query. The aim is for Google and every other search engine to crawl and index all of your product detail pages, blog posts, articles, and anything else that drives conversions.

But there are pages that should not be included in search results. Removing them from Google’s index might actually increase search engine traffic to more important, better-converting pages.

Don’t Index These

But do you really care if your privacy policy, GDPR disclosures, or similar pages are showing up on Google? Pages you likely don’t want Google to index include:

  • Thank you pages (displayed after a survey or similar)
  • Ad landing pages (meant for pay-per-click campaigns)
  • Policy pages
  • Internal site search results (since sending a searcher from Google’s results page straight to your website’s own search results page may not be a good user experience)


Not every page on your company’s website should be indexed by Google. Photo: Campaign Creators.

Removing Pages

Getting these sorts of pages out of Google’s index could also improve your website’s authority, which in turn might improve how well its other pages rank on Google for relevant queries.

Some SEO practitioners argue that Google has become adept at identifying content quality and is on the lookout, so to speak, for redundant, duplicate, or relatively low-quality pages.

What’s more, some SEO professionals have suggested that Google averages the relative value of all of the pages on your website to create an aggregate authority or value score. This might be domain authority, domain rank, or a similar metric.

If your company has stuffed Google’s index with relatively low-value pages — such as the privacy policy your tech guy copied and pasted from your ecommerce platform provider — it could affect how authoritative Google believes your site is as a whole.

For example, Chris Hickey of Inflow, an ecommerce agency in Denver, Colorado, wrote about removing pages from a website (in this instance, by deleting them). He reported a 22 percent increase in organic search engine traffic and a 7 percent increase in revenue from organic search traffic after the agency culled thousands of duplicate pages from a client’s ecommerce website.

Similarly, in 2017 SEO tool maker Moz removed 75 percent of the pages on its website from the Google index. The pages were primarily low-value member profiles from the Moz community. These pages did not have much unique content, and removing them from the Google index resulted in a 13.7 percent increase in year-over-year organic search traffic.

Removal Tool

Perhaps the best tool for removing an individual page from Google’s index is the robots noindex meta tag.

Inserted in the <head> section of a page’s HTML markup, this simple tag asks all search engines not to index the page. Googlebot, Google’s primary web crawler, follows this directive and will drop any page marked noindex from the index the next time it crawls that page.
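For reference, the tag itself is usually written as <meta name="robots" content="noindex">. The Python sketch below, which relies only on the standard library’s html.parser, shows one way you might audit a page for it. The RobotsMetaParser class and has_noindex function are illustrative names, not part of any SEO tool. The sketch also counts a Googlebot-specific tag and the "none" directive (shorthand for noindex, nofollow) as noindex.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


page = """<!DOCTYPE html>
<html>
  <head>
    <title>Privacy Policy</title>
    <meta name="robots" content="noindex">
  </head>
  <body>...</body>
</html>"""

print(has_noindex(page))  # True
```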

Your website’s content management system should make it relatively easy to add this tag to policy pages, internal search results, and other pages that don’t need to be included in Google’s index or shown in response to a Google query.
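As a rough illustration of that CMS step, here is a Python sketch that decides from a URL path whether a page should carry the tag and, if so, inserts it right after the opening <head> tag. The URL patterns (/thank-you, /lp/, /privacy-policy, /search) are hypothetical stand-ins for the page types listed earlier. A real CMS would more likely expose a per-page setting or a template hook; a regular-expression edit is only reasonable here because the insertion point is so simple.

```python
import re

# Hypothetical URL patterns for the page types discussed above; adjust them
# to match how your own CMS structures its URLs.
NOINDEX_PATTERNS = [
    re.compile(r"^/thank-you(/|$)"),                     # thank-you pages
    re.compile(r"^/lp/"),                                # pay-per-click landing pages
    re.compile(r"^/(privacy|terms|gdpr)(-policy)?/?$"),  # policy pages
    re.compile(r"^/search(/|$)"),                        # internal site search results
]

ROBOTS_META = '<meta name="robots" content="noindex">'


def should_noindex(path: str) -> bool:
    """Return True if the URL path matches one of the no-index page types."""
    return any(pattern.match(path) for pattern in NOINDEX_PATTERNS)


def add_robots_meta(html: str, path: str) -> str:
    """Insert the robots noindex tag right after the opening <head> tag."""
    if not should_noindex(path):
        return html
    # Matches <head> or <head ...> (case-insensitive) but not <header>.
    return re.sub(r"(<head\b[^>]*>)", r"\1" + ROBOTS_META, html,
                  count=1, flags=re.IGNORECASE)


print(should_noindex("/privacy-policy"))      # True
print(should_noindex("/products/red-shoes"))  # False
```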

HTTP Response Header

The robots noindex directive may also be passed in an HTTP response header. Think of the HTTP response header as a text message your server sends to a web browser or web crawler (such as Googlebot) when it requests a page.

Within this header, your site can tell Google not to index the page. Here is an example:

X-Robots-Tag: noindex

For some businesses, it may be easier to write a script that adds this X-Robots-Tag header than to add the robots meta tag manually or even programmatically. The header and the meta tag have the same effect on indexing, so which method your business uses is largely a matter of preference. One practical difference: the header also works for non-HTML files such as PDFs, which have no <head> section to hold a meta tag.
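Here is a minimal sketch of the script approach, assuming a Python site served through WSGI. The hypothetical noindex_middleware wrapper adds X-Robots-Tag: noindex to responses whose path starts with one of the listed prefixes and leaves every other response untouched. The path prefixes and the demo_app stand-in are illustrative only.

```python
from typing import Callable, Iterable


def noindex_middleware(app: Callable, path_prefixes: Iterable[str]) -> Callable:
    """Wrap a WSGI app so matching responses carry 'X-Robots-Tag: noindex'."""
    prefixes = tuple(path_prefixes)

    def wrapped_app(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start_response_with_header(status, headers, exc_info=None):
            # Only responses for the listed path prefixes get the extra header.
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_header)

    return wrapped_app


def demo_app(environ, start_response):
    """Stand-in for a real site: returns the same small page for every path."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>Page content</body></html>"]


# Hypothetical prefixes for internal search results and thank-you pages.
app = noindex_middleware(demo_app, ["/search", "/thank-you"])

# To try it locally (runs until interrupted):
# from wsgiref.simple_server import make_server
# make_server("localhost", 8000, app).serve_forever()
```

Because the header is added outside the page templates, the same wrapper would also cover PDFs or any other files the app serves under those paths.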

Prevent Indexing?

Robots.txt does not prevent indexing. A robots.txt file is located in a website’s root directory. This simple text file tells a search engine web crawler which pages on the site it can access.
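Python’s standard library includes urllib.robotparser, which answers exactly the question robots.txt is designed for: may this crawler fetch this URL? The sketch below parses a hypothetical robots.txt for www.example.com. A False result means only that crawling is blocked; it says nothing about whether the URL can still end up in Google’s index. Python’s parser also implements the older, simpler prefix-matching rules, so for files that use wildcards its answers may differ from Googlebot’s.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an ecommerce site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://www.example.com/products/red-shoes",
    "https://www.example.com/cart/",
):
    # can_fetch() reports crawl permission only, not index status.
    print(url, parser.can_fetch("Googlebot", url))
# https://www.example.com/products/red-shoes True
# https://www.example.com/cart/ False
```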

Often, website owners and managers mistakenly think that disallowing a page in a robots.txt file will prevent that page from showing up in Google’s index. But that is not always the case.

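For example, if another site links to a page on your company’s website, Google can discover that URL through the link and index it even though the page is disallowed in robots.txt. Because Googlebot is not allowed to crawl the page, such a result typically appears without a description.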

If you want to remove pages from Google’s index, the robots.txt file is probably not the best choice. Rather, it is helpful for limiting which parts of your site Google crawls and for preventing search engine bots from overwhelming your company’s web server.

It is important to note that you should not disallow a page in a robots.txt file and give it a noindex tag at the same time. If robots.txt blocks Googlebot from crawling the page, Googlebot never sees the noindex directive, and the page may stay in Google’s index.
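One way to catch this mistake is to check the URLs you have marked noindex against your robots.txt. The sketch below assumes you already have that list of URLs (pages_marked_noindex is hypothetical sample data, and blocked_noindex_urls is an illustrative helper name) and uses Python’s urllib.robotparser to report any that Googlebot is not allowed to crawl.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex_urls(robots_txt: str, noindex_urls: list[str],
                         user_agent: str = "Googlebot") -> list[str]:
    """Return the noindex URLs that robots.txt stops the crawler from fetching.

    A crawler can only obey a noindex tag or header on a page it is allowed
    to fetch, so every URL returned here is a conflict worth fixing.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


robots_txt = """\
User-agent: *
Disallow: /search
"""

# Hypothetical pages the site owner has marked with noindex.
pages_marked_noindex = [
    "https://www.example.com/search?q=shoes",
    "https://www.example.com/privacy-policy",
]

print(blocked_noindex_urls(robots_txt, pages_marked_noindex))
# ['https://www.example.com/search?q=shoes']
```

The fix for any URL this reports is to remove its Disallow rule so Googlebot can crawl the page and see the noindex directive.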

Ultimately, it may sound counterintuitive, but there are almost certainly pages on your company’s website that should not be included in Google’s index or displayed on a Google results page. The best way to remove those pages is with a robots noindex tag.

Sources

SEO: Tell Google Which Pages Not to Crawl
