Make effective use of robots.txt
Posted by Ouali Rezouali on 12 December 2012 02:29 PM




Restrict crawling where it's not needed with robots.txt


A "robots.txt" file tells search engines whether they can access and therefore crawl parts of your site (1). This file, which must be named "robots.txt", is placed in the root directory of your site (2).
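
To make the idea concrete, here is a minimal sketch of how a compliant crawler might read such a file, using Python's standard urllib.robotparser module. The file content, the paths and the example.com URLs are assumptions chosen for illustration, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a crawler matching "User-agent: *" may fetch each URL.
home_allowed = parser.can_fetch("*", "http://www.example.com/index.html")
search_allowed = parser.can_fetch("*", "http://www.example.com/search?q=shoes")
print(home_allowed, search_allowed)  # True False
```

Each Disallow line is matched as a path prefix, so "/search" in this sketch also covers "/search?q=shoes".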

You may not want certain pages of your site crawled because they might not be useful to users if found in a search engine's search results. If you do want to prevent search engines from crawling your pages, Google Webmaster Tools has a friendly robots.txt generator to help you create this file. Note that if your site uses subdomains and you want certain pages on a particular subdomain not to be crawled, you'll have to create a separate robots.txt file for that subdomain. For more information on robots.txt, we suggest this Webmaster Help Center guide on using robots.txt files.
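
The subdomain rule follows from where crawlers look for the file: at the root of whichever host they are visiting. As a rough sketch (the function name robots_txt_url and the example.com hosts are hypothetical), the location can be derived from any page URL like this:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


main_site = robots_txt_url("http://www.example.com/products/shoes.html")
blog_site = robots_txt_url("http://blog.example.com/2012/12/post.html?ref=home")
print(main_site)  # http://www.example.com/robots.txt
print(blog_site)  # http://blog.example.com/robots.txt
```

Because the two hosts resolve to two different files, rules written for one of them have no effect on the other.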


There are a handful of other ways to prevent content appearing in search results, such as adding "NOINDEX" to your robots meta tag, using .htaccess to password protect directories, and using Google Webmaster Tools to remove content that has already been crawled.
Google engineer Matt Cutts walks through the caveats of each URL blocking method in a helpful video.
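
For the robots meta tag approach, a small check like the one below can confirm that a page really carries the directive. This is only a sketch built on Python's standard html.parser module; the class name RobotsMetaParser, the helper has_noindex and the sample HTML are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup.
blocked_page = '<html><head><meta name="robots" content="NOINDEX, follow"></head></html>'
normal_page = '<html><head><meta name="description" content="Shoes"></head></html>'
print(has_noindex(blocked_page), has_noindex(normal_page))  # True False
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so the page itself must not also be blocked in robots.txt.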


Your Cabanova Website and robots.txt


We don't provide a function for generating a robots.txt file directly from the Sitebuilder, but if necessary you can generate one using the Google robots.txt generator and send it to us.


We'll then add it to the root directory of your website on our server.


The following link provides a detailed procedure for generating a robots.txt file: Click here to go to the link


Best Practices


Use more secure methods for sensitive content

You shouldn't feel comfortable using robots.txt to block sensitive or confidential material. One reason is that search engines could still reference the URLs you block (showing just the URL, no title or snippet) if there happen to be links to those URLs somewhere on the Internet (like referrer logs). Also, non-compliant or rogue search engines that don't acknowledge the Robots Exclusion Standard could disobey the instructions of your robots.txt. Finally, a curious user could examine the directories or subdirectories in your robots.txt file and guess the URL of the content that you don't want seen. Encrypting the content or password-protecting it with .htaccess are more secure alternatives.
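
To see why the file itself can give content away, consider how little effort it takes to read the blocked paths out of it. The sketch below uses a hypothetical helper, disallowed_paths, and made-up file content; it is a simple line reader, not a full Robots Exclusion Standard parser.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """List every non-empty Disallow path in the given robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw_line.split("#", 1)[0].strip()
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that tries to hide a sensitive directory.
EXPOSING_ROBOTS_TXT = """\
User-agent: *
Disallow: /private-reports/  # internal only
Disallow: /search
Allow: /public/
"""

exposed = disallowed_paths(EXPOSING_ROBOTS_TXT)
print(exposed)  # ['/private-reports/', '/search']
```

Anyone can request robots.txt, so the list it produces is effectively public. That is the reason password protection is the safer choice for anything confidential.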



Avoid:

- allowing search result-like pages to be crawled (users dislike leaving one search result page and landing on another search result page that doesn't add significant value for them)
- allowing URLs created as a result of proxy services to be crawled
