What Is Robots.txt and How Does It Affect Your Website?
What is a Robots.txt File?
The robots.txt file is a text file created by webmasters that instructs search engine robots on how to crawl the pages of your website.
In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website. These crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents. If the robots.txt file does not contain any directives that disallow a user agent's activity (or if the site doesn't have a robots.txt file), that user agent will proceed to crawl the rest of the site.
Basic Format of a Robots.txt File:
This is the basic format of a robots.txt file which you can follow:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
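If you prefer to generate these lines from a script, here is a minimal Python sketch of the format above. The function name build_group and the sample paths are made up for illustration; they are not part of any standard tool.

```python
def build_group(user_agent, disallowed_paths):
    """Render one robots.txt group in the basic User-agent/Disallow format."""
    lines = [f"User-agent: {user_agent}"]
    if not disallowed_paths:
        # An empty Disallow value means nothing is blocked for this agent.
        lines.append("Disallow:")
    for path in disallowed_paths:
        lines.append(f"Disallow: {path}")
    return "\n".join(lines)


# Hypothetical sample paths, used only for illustration.
print(build_group("*", ["/tmp/", "/private/"]))
```

Passing an empty list produces a bare "Disallow:" line, which leaves the whole site open to that user agent.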
Below are several examples of robots.txt directives and how you can interpret them.
The example below allows all web crawlers to crawl your website, because the * refers to all robots.
The Disallow directive has no value, which means no pages are disallowed.
User-agent: *
Disallow:
In the example below, the syntax tells all robots to stay away from the entire website (including the home directory).
User-agent: *
Disallow: /
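To see the difference between an empty Disallow and "Disallow: /", you can check both with Python's standard urllib.robotparser module. This is only a sketch: the helper is_allowed, the bot name "AnyBot" and the example.com URL are placeholders, and real crawlers may differ from Python's parser in small details.

```python
from urllib import robotparser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent crawl url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"

# "AnyBot" and example.com are placeholders for illustration.
print(is_allowed(ALLOW_ALL, "AnyBot", "https://example.com/page.html"))  # True
print(is_allowed(BLOCK_ALL, "AnyBot", "https://example.com/page.html"))  # False
```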
The syntax below tells all robots to stay away from these directories:
User-agent: *
Disallow: /wordpress
Disallow: /tmp/
Disallow: /joomla
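Each Disallow value is matched against the start of the URL path, so a rule without a trailing slash (such as /wordpress) also covers paths like /wordpress-themes.html. The sketch below checks this with Python's urllib.robotparser; the tested paths and the bot name "AnyBot" are invented for illustration.

```python
from urllib import robotparser

RULES = """\
User-agent: *
Disallow: /wordpress
Disallow: /tmp/
Disallow: /joomla
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path):
    """Return True if RULES stop a generic crawler from fetching path."""
    return not parser.can_fetch("AnyBot", path)


# Hypothetical paths, chosen to show how prefix matching behaves.
for path in ("/wordpress/index.php", "/wordpress-themes.html",
             "/tmp/cache.txt", "/tmp", "/blog/"):
    print(path, "blocked" if blocked(path) else "allowed")
```

If you only want to block the directory itself, end the rule with a slash, as in "Disallow: /tmp/".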
This syntax tells all robots to stay away from one specific file:
User-agent: *
Disallow: /taxation/payment-amount.html
This syntax tells a specific robot, e.g. Googlebot, to stay out of a website:
User-agent: googlebot
Disallow: /
Bot names are not case-sensitive. You can use "Googlebot" or "googlebot" inside your robots.txt.
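You can confirm the case-insensitive matching with Python's urllib.robotparser, as in this small sketch. The path /index.html is just a placeholder.

```python
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse(["User-agent: googlebot", "Disallow: /"])

# Every spelling of the bot name hits the same rule.
for name in ("googlebot", "Googlebot", "GOOGLEBOT"):
    print(name, parser.can_fetch(name, "/index.html"))  # False each time

# A bot with no matching group and no "*" group is not restricted.
print(parser.can_fetch("bingbot", "/index.html"))  # True
```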
This syntax informs two specific robots not to enter one specific directory:
User-agent: Samplebot # replace "Samplebot" with the actual user-agent of the bot
User-agent: bingbot
Disallow: /secret/
Example:
User-agent: bingbot
User-agent: googlebot
Disallow: /secret/
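When several User-agent lines sit above the same Disallow line, the rule applies to each of them. Here is a rough check using Python's urllib.robotparser; the bot name "otherbot" and the file names are hypothetical.

```python
from urllib import robotparser

RULES = """\
User-agent: bingbot
User-agent: googlebot
Disallow: /secret/
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())


def can_crawl(bot, path):
    """Return True if RULES let the named bot fetch path."""
    return parser.can_fetch(bot, path)


print(can_crawl("bingbot", "/secret/plan.html"))    # False
print(can_crawl("googlebot", "/secret/plan.html"))  # False
print(can_crawl("otherbot", "/secret/plan.html"))   # True, no group names this bot
print(can_crawl("bingbot", "/public/page.html"))    # True, path is outside /secret/
```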
Did You Know You Can Add Comments Inside Robots.txt Too?
In addition to the User-agent and Disallow directives, you can insert comments in robots.txt. Everything after a # on a line is treated as a comment, as seen in the samples below:
User-agent: googlebot # all Google services
Disallow: /secret/ # disallow this directory
User-agent: googlebot-news # only the news service
Disallow: / # disallow everything
User-agent: * # any robot
Disallow: /secret/ # disallow this directory
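Robots ignore everything from the # to the end of the line. The sketch below shows one way a parser might discard comments before reading a directive; strip_comment is a hypothetical helper, not a function from any library.

```python
def strip_comment(line):
    """Drop everything from the first # onward and trim whitespace."""
    return line.split("#", 1)[0].strip()


sample = [
    "User-agent: googlebot # all Google services",
    "Disallow: /secret/ # disallow this directory",
]
for line in sample:
    print(strip_comment(line))
```

This prints the two directives without their comments, which is what a crawler actually acts on.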
Updated on: 04/02/2019