Smart Traffic SEO Tip #19 – Ways to block your Site from Google – Use of Robots file

In our previous blog posts we discussed banned sites, why you should avoid linking to them, and how to check whether a site is indexed. While many site owners work hard to get their pages indexed, some prefer to keep their site, or a portion of its pages, out of Google. It may be a secure site, a mirror site that duplicates another, or a test site.

One way to restrict which pages Google crawls is the robots.txt file. This plain-text file sits at the root of your site and tells search engine robots which URLs they should not access. The robots.txt file uses three main rules:

* User-agent: the robot the following rule applies to
* Disallow: the URL you want to block
* Allow: the URL you want to allow (used if blocking an entire site apart from some pages)
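To see how the three rules work together, here is a minimal sketch that uses Python's standard urllib.robotparser module to test a set of rules before publishing them. The rules, the www.example.com domain and the /public/ folder are assumptions made up for illustration. Python's parser applies the first rule that matches, so Allow is listed before Disallow here; other robots may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the whole site except the /public/ folder.
# Python's parser uses the first matching rule, so Allow comes first.
RULES = """\
User-agent: *
Allow: /public/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Ask whether a robot named "Googlebot" may crawl two example URLs.
public_ok = parser.can_fetch(
    "Googlebot", "https://www.example.com/public/page.html"
)
private_ok = parser.can_fetch(
    "Googlebot", "https://www.example.com/private/page.html"
)

print("public page crawlable:", public_ok)
print("private page crawlable:", private_ok)
```

With these rules, the page under /public/ is reported as crawlable and the page under /private/ is not.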

To block your whole site from all robots, use the following rules in your robots.txt file:

User-Agent: *
Disallow: /

The asterisk above stands for all robots, and the forward slash “/” stands for the entire site. These rules block compliant robots from crawling your site’s pages. Note, however, that if Google discovers your URLs by other means, such as a link from another site, those URLs may still appear in the Google index and in the search results.
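As a quick check, the block-everything rules can be tested with Python's standard urllib.robotparser module. This is only a sketch: the crawl_allowed helper, the robot names and the example URL are hypothetical, and the result reflects crawl permission only. It says nothing about whether a URL ends up in the index.

```python
from urllib.robotparser import RobotFileParser

# The two rules from this post: all robots, entire site.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def crawl_allowed(rules, user_agent, url):
    """Return True if the given robots.txt rules let user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(user_agent, url)


# Hypothetical robot names and URL, used only to exercise the rules.
results = {
    agent: crawl_allowed(
        BLOCK_ALL, agent, "https://www.example.com/any/page.html"
    )
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot")
}

for agent, allowed in results.items():
    print(agent, "may crawl:", allowed)
```

Every robot name is refused here because the asterisk entry applies to all of them.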

To keep a page out of the Google index entirely, even if other sites link to it, use the noindex meta tag, which we will discuss in the next blog post.