Robots.txt
Robots.txt is a plain text file that tells web crawlers (robots) which areas of your site they should not crawl. The file must sit in the root (top-level) directory of the site, and well-behaved crawlers request it before fetching anything else. They read the rules in the robots.txt file and proceed accordingly.
Syntax
User-agent: * (Agent Name)
Disallow: / (File Path)
In the above lines, Agent Name is to be replaced by the name of the search engine bot you wish to exclude (or * for all bots), and File Path is to be replaced by the path of the file or folder you wish to exclude. The path is written relative to the site root and starts with a slash; it is not an absolute URL.
Example 1
User-agent: *
Disallow: /
The above rules ask all robots to stay away from the entire site, so obedient robots will not fetch any of its files.
Example 2
User-agent: Googlebot
Disallow: /
The above rules ask Googlebot (Google's crawler) to stay away from the entire site, while leaving other robots unrestricted.
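To see how a crawler might read the two examples above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the bot name ExampleBot, and the example.com URL are assumptions made for illustration; Googlebot is the token Google documents for its main crawler.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example 1: every robot is asked to stay away from the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Example 2: only Googlebot is asked to stay away.
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /\n"

if __name__ == "__main__":
    url = "https://example.com/page.html"  # hypothetical URL
    print(is_allowed(BLOCK_ALL, "ExampleBot", url))
    print(is_allowed(BLOCK_GOOGLEBOT, "Googlebot", url))
    print(is_allowed(BLOCK_GOOGLEBOT, "ExampleBot", url))
```

With the first rule set no bot is allowed; with the second, only Googlebot is turned away and other bots may still crawl.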
When should you use Robots.txt?
A pretty useful question, right? Use robots.txt if there are scripts on your server that you do not want bots to access, or if you want to keep specific bots from crawling the site. Smaller sites of fewer than a hundred pages rarely need a robots.txt file, as they usually have no special scripts to hide. Larger sites backed by huge databases, however, may have special pages or scripts that should be kept away from bots. In that case, use a robots.txt file.
I have created robots.txt, now my secret pages are safe!
I have heard many people say this, but in reality robots.txt works only for obedient robots. The file is a set of requests, not an access control. Well-behaved bots follow it and will not crawl the listed pages, while disobedient ones ignore the instructions and crawl anyway. The file is also publicly readable, so listing a secret path in it reveals that path to anyone who looks. If you want to keep pages out of the index, use the noindex robots meta tag, and leave those pages crawlable so that bots can actually see the tag. If the pages are truly secret, protect them with a password.
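The noindex instruction lives in the page itself, as a meta tag of the form <meta name="robots" content="noindex"> inside the head section. As a rough sketch of how a crawler might detect it, the following uses Python's standard html.parser module. The class name RobotsMetaParser and the helper has_noindex are hypothetical names chosen for this example, and real crawlers also honor other signals not handled here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

Note that a bot can only act on this tag if it is allowed to fetch the page, which is why a page carrying noindex should not also be disallowed in robots.txt.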
Robots.txt Example Entries and Use of Wildcards
1- To disallow crawling of a folder named "Abs"
User-agent: *
Disallow: /Abs/
2- To block a page named "Soc.html"
User-agent: *
Disallow: /Soc.html
3- To block web pages whose URL ends with .php
User-agent: *
Disallow: /*.php$
4- To block googlebot (Google's crawling agent) from accessing contents in the folder named "B"
User-agent: googlebot
Disallow: /B/
5- To block crawlers from all images with the .jpg extension
User-agent: *
Disallow: /*.jpg$
6- To exclude a file named "joker.php" contained in the folder named "circus"
User-agent: *
Disallow: /circus/joker.php
7- To prevent Googlebot-Image (Google's image crawler) from accessing images on your site
User-agent: Googlebot-Image
Disallow: /
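The wildcard entries above rely on two special characters supported by the major search engines: * matches any sequence of characters, and $ anchors the rule to the end of the URL. A rule without $ is treated as a prefix. The sketch below shows one way such matching could be implemented; rule_to_regex and rule_matches are hypothetical helper names, and the code is an illustration, not the matcher any particular search engine uses. Python's built-in urllib.robotparser does plain prefix matching and, to my knowledge, does not interpret these wildcards, which is why a small custom matcher is used here.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regular expression."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape literal text and turn each * into "match anything".
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    # \Z forces the match to reach the end of the path when the rule ends in $.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if url_path falls under the given Disallow rule."""
    # re.match anchors at the start, giving prefix semantics by default.
    return rule_to_regex(rule_path).match(url_path) is not None


if __name__ == "__main__":
    print(rule_matches("/Abs/", "/Abs/page.html"))
    print(rule_matches("/*.php$", "/index.php"))
    print(rule_matches("/*.php$", "/index.php?id=1"))
    print(rule_matches("/circus/joker.php", "/circus/clown.php"))
```

Paths are compared case-sensitively, so a rule for /Abs/ does not cover /abs/.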
Tricky Question
What will the following entry do?
User-agent: *
Disallow:
Answer: It allows crawlers to access every folder and every web page on your server, because an empty Disallow value names no folder or file to be blocked.
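The answer can be checked with Python's standard urllib.robotparser module, which treats an empty Disallow value as "allow everything". This is a small sketch; the helper name can_crawl, the bot name ExampleBot, and the example.com URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"
DISALLOW_ROOT = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/any/page.html"  # hypothetical URL
    print(can_crawl(EMPTY_DISALLOW, url))  # empty value: nothing is blocked
    print(can_crawl(DISALLOW_ROOT, url))   # a single slash: everything is blocked
```

The two files differ by a single character, so it is worth double-checking which one you have published.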
Free Robots.txt Generators
There are some free tools on the web that can help you create your own robots.txt file. They are listed below:
Seobook robots.txt generator
Advanced robots.txt generator (Software free download)
Yellowpipe robots.txt generator
Seochat robots generator
Written by Zai Azura