
What is robots.txt? An overview for SEO

Are you wondering what a robots.txt file is? Well, I'm going to tell you about it. Robots.txt is a text file that you put on your site to tell search robots which pages they may crawl and which pages and folders they should stay out of. Robots.txt is not mandatory for search engines, but well-behaved crawlers pay attention to it and skip the pages and folders mentioned in it. That makes robots.txt very important, so place it in the main (root) directory of your site, where search engines can find it easily.

Example robots.txt:
Here are a few examples of robots.txt in action for the site www.example.com:

Robots.txt file URL: www.example.com/robots.txt
Blocking all web crawlers from all content
User-agent: *
Disallow: /
Using this syntax in a robots.txt file would tell all web crawlers not to crawl any pages on www.example.com, including the homepage.

Allowing all web crawlers access to all content
User-agent: *
Disallow:
Using this syntax tells all web crawlers that they may crawl every page on www.example.com; an empty Disallow line blocks nothing.

Below is an overview for SEO, along with some key insights.

If you want to see whether you have a robots.txt file, go to your site, add /robots.txt to the end of the domain, and the file should live there.
Now the robots.txt file is a really important thing, because Google specifically recommends that you have one. And if Google and the other crawlers out there can't find it, in some cases they won't crawl your website at all, or at least that's what they say. In many cases, I've seen that they actually still will, but you definitely want to have a robots.txt file, and there are some things that you need to know about it.

So, the robots.txt file, at its most primitive, basic state, allows you to either block the whole website, block portions of the website, or leave the website open for crawling. That's basically what it does: it's a way to control whether your site, or parts of it, ends up inside of Google or not.
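
For example, here's a minimal sketch of blocking just one portion of a site (the /private/ folder is a placeholder name; anything not matched by a Disallow rule stays crawlable by default):

# Keep all crawlers out of one folder, leave the rest of the site open
User-agent: *
Disallow: /private/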

And robots.txt is just one way to block things online or to allow things online. It doesn't only apply to Google; there are a lot of different types of crawlers out there, a lot of different things trying to access your website on a regular basis, that can be blocked from the robots.txt.

So in some cases, you might see a hundred entries in somebody's robots.txt file that they're trying to block. The reason they want to do that is that too many of these third-party widgets and bots coming in and trying to crawl the site can cause real trouble.

Third-party crawlers can slow down the site

It can slow down the site, it can slow down the server, and it can cause server errors and all kinds of other issues. And maybe you just want to block somebody from scraping content from your website or analyzing specific changes that you make on your site. The tricky part, though, is that in many cases this type of software will just ignore the file, right?
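
As a rough sketch, here's how you might block one specific crawler while leaving everyone else alone (AhrefsBot is just an example of a bot that identifies itself by user-agent; substitute whichever crawler you actually want to keep out):

# Block one specific third-party crawler entirely
User-agent: AhrefsBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Disallow:

Some search engines, like Bing, also honor a non-standard Crawl-delay directive if you'd rather slow a bot down than block it outright; Google ignores that directive.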

So crawlers can ignore it, and there's some software out there, like Screaming Frog for example, where you just click a button that says "ignore the robots.txt file." So even if you have a block inside of the robots.txt, you can click ignore, and it will completely ignore the file and allow you to crawl the site anyway. So there are a lot of subtleties to this little file that lives on your website, and quite a bit to know about it, actually.

So I feel like I could go on and on about the robots.txt file, but I just want to give you a basic overview so that you understand what it's about. The last thing I'll mention is that you can use the robots.txt tester inside of Google Search Console to see whether you're blocking a certain page on your website with the robots.txt file.

I recommend you keep an eye on it and check it from time to time. A lot of people will put a link to their XML sitemap in there, with the theory that Google comes in, crawls the file, sees the XML sitemap, and jumps into that. That's a little bit of an older practice, and probably not anything you need to do anymore, especially now that you can submit sitemaps directly in Google Search Console. But it's a great file; make sure you have one, and make sure it's set up correctly.
Example robots.txt with a sitemap reference:

User-agent: *
Allow: /
# Sitemap reference
Sitemap: http://www.example.com/sitemap.xml