Have you ever heard of robots.txt? If not, this article has something useful for you, because today I will give you some information about it. If you have a blog or website, you must have noticed that information you never wanted to make public sometimes ends up public on the internet. Do you know why this happens? And why do some good pages remain unindexed even after many days? If you want to know the secret behind these things, read this article on robots.txt carefully, and by the end of it you will understand all of them.
Robots meta tags are one way of telling search engines which files and folders on a website should be shown publicly and which should not. But not every search engine reads meta tags, so many robots meta tags simply go unnoticed. A better way is to use a robots.txt file, through which search engines can easily be told about the files and folders on your website or blog. So today I thought I should give you complete information about what robots.txt is, so that you will have no trouble understanding it later. Then why the delay? Let's start and learn what robots.txt is and what lies behind it.
What is Robots.txt?
Robots.txt is a text file that you place in your site to tell search robots which pages they may visit or crawl and which they may not. Following robots.txt is not mandatory for search engines, but they generally pay attention to it and avoid the pages and folders it blocks. That makes robots.txt very important, and it is essential to keep it in the root directory of the site so that search engines are able to find it.
The point to note here is that if we do not place this file in the right location, search engines will assume that you have no robots.txt file at all, and pages of your site you wanted to keep out may get crawled and indexed anyway. So this small file carries a lot of importance, and if it is not used correctly it can even reduce the ranking of your website. It is therefore very important to understand it well.
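As an illustration, a minimal robots.txt placed in the root directory (so that it is reachable at www.example.com/robots.txt) could look like this; the blocked folder name is only an assumption for the example:

```
User-agent: *
Disallow: /private/
```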
How does robots.txt work?
When a search engine or web spider comes to your website or blog for the first time, it first reads your robots.txt file, since that file contains the instructions about your website: which parts are not to be crawled and which ones are. It then indexes the pages you have allowed, so that those pages can appear in search engine results.
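This crawler-side check can be sketched with Python's built-in `urllib.robotparser` module; the rules and URLs below are illustrative assumptions, not a real site:

```python
# Sketch of how a crawler consults robots.txt before fetching a page,
# using Python's standard urllib.robotparser module.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# A real crawler would call rp.set_url("https://example.com/robots.txt")
# followed by rp.read(); here we parse example rules directly.
rp.parse("""\
User-agent: *
Disallow: /private/
""".splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/index.html"))        # allowed
print(rp.can_fetch("Googlebot", "https://example.com/private/data.html"))  # blocked
```

A well-behaved bot performs exactly this kind of check before every fetch and skips any URL the file disallows.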
A robots.txt file can prove very useful for you if:
- You want search engines to ignore duplicate pages on your website
- You do not want your internal search result pages to be indexed
- You do not want search engines to index certain pages of your site
- You do not want some of your files, such as certain images or PDFs, to be indexed
- You want to tell search engines where your sitemap is located
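A robots.txt covering cases like these might look as follows. The paths are placeholders, and note that wildcard patterns such as /*.pdf$ are extensions supported by major engines like Google and Bing rather than part of the original standard:

```
User-agent: *
# keep internal search result pages out of the index
Disallow: /search/
# block PDF files (wildcard syntax supported by Google, Bing, etc.)
Disallow: /*.pdf$

# tell crawlers where the sitemap lives
Sitemap: https://www.example.com/sitemap.xml
```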
How to create a robots.txt file
If you have not yet created a robots.txt file for your website or blog, you should do so very soon, because it is going to serve you well in the future. To create one, follow these steps:
- First create a text file and save it as robots.txt. You can use Notepad on Windows or TextEdit on a Mac, saving the result as a plain text file.
- Now upload it to your website's root directory. This is the top-level folder, often called "htdocs", that sits directly behind your domain name.
- If you use subdomains, you need to create a separate robots.txt file for each subdomain.
What is the syntax of Robots.txt?
In robots.txt we use a handful of directives, which we really need to know about:
- User-agent: names the robot to which the rules that follow apply (e.g. "Googlebot").
- Disallow: blocks bots from the pages or folders you do not want anyone to access. Write the path to the file or folder after Disallow.
- Noindex: asks the search engine not to index the pages you list. Note that this directive was never part of the official standard, and Google no longer honors it inside robots.txt.
- Use a blank line to separate each User-agent/Disallow group, but do not put blank lines inside a group, that is, between the User-agent line and its last Disallow.
- The hash symbol (#) can be used for comments inside a robots.txt file; everything after the # is ignored. Comments can take up a whole line or the end of a line.
- Directories and filenames are case-sensitive: "private", "Private", and "PRIVATE" are all different to search engines.
Let's understand this with the help of an example. Here, the robot "Googlebot" has no disallow statement, so it is free to go anywhere; "msnbot" is blocked from the entire site; and all other robots may not visit the /tmp directory or any directories and files called /logs (e.g. tmp.htm, /logs, or logs.php), as the comments explain:

User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /

# Block all robots from tmp and logs directories
User-agent: *
Disallow: /tmp
Disallow: /logs # for directories and files called logs
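You can check how such rules behave using Python's standard robots.txt parser; this is just a quick sketch run against the example rules above, with made-up bot names and paths:

```python
# Verify the behavior of the example rules with Python's standard
# urllib.robotparser module.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /

# Block all robots from tmp and logs directories
User-agent: *
Disallow: /tmp
Disallow: /logs
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/tmp/page.html"))      # True: no restriction on Googlebot
print(rp.can_fetch("msnbot", "/index.html"))            # False: msnbot is blocked everywhere
print(rp.can_fetch("SomeOtherBot", "/tmp/page.html"))   # False: /tmp blocked for other bots
```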
Advantages of using Robots.txt
By the way, robots.txt has many uses, but here I want to mention some very important benefits that everyone should be aware of:
- Using robots.txt, your sensitive pages can be kept out of search results. (Keep in mind that the file itself is publicly readable, so it is not a true security measure.)
- With the help of robots.txt, "canonicalization" problems can be avoided when multiple URLs point at the same content. This problem is also known as the "duplicate content" problem.
- With it you can also help Google's bots index your pages efficiently.
What happens if we do not use a robots.txt file?
If we do not use a robots.txt file, there is no restriction on search engines: they can crawl wherever they like and index everything they find on your website. That is fine for many websites, but as a matter of good practice we should use one, because it lets search engines index your pages efficiently without having to go through every page again and again.