Question 1

What is a robots.txt file?

Accepted Answer

robots.txt is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which URL paths they may or may not crawl. It follows the Robots Exclusion Protocol (also known as the Robots Exclusion Standard), standardized as RFC 9309.
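
To make this concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The rules, the bot name ExampleBot, and the example.com URLs are made up for illustration. Python's parser is a simplified implementation, so treat the result as an approximation of how any particular crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is an invented crawler name used only for this sketch.
private_allowed = rules.can_fetch("ExampleBot", "https://example.com/private/report.html")
blog_allowed = rules.can_fetch("ExampleBot", "https://example.com/blog/")

print(private_allowed, blog_allowed)
```

A well-behaved crawler runs a check like this before requesting a URL. The file is a request, not an access control, so it does nothing against crawlers that choose to ignore it.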

Question 2

Does robots.txt prevent a page from being indexed?

Accepted Answer

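No. robots.txt only controls crawling, not indexing. A blocked page can still appear in search results if Google finds links pointing to it. To prevent indexing, use a noindex meta tag or an X-Robots-Tag HTTP header on the page itself, and make sure the page is not blocked in robots.txt: a crawler that cannot fetch the page never sees the noindex instruction.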
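
As a rough illustration of the two noindex signals, the Python sketch below checks a response for either one. The function name is_noindex and the sample inputs are hypothetical, and the check is simplified: it ignores user-agent prefixes in X-Robots-Tag and bot-specific meta tags such as name="googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindex(headers, html):
    """Return True if the header or the meta tag carries a noindex directive."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


# Sample inputs invented for this sketch.
via_header = is_noindex({"X-Robots-Tag": "noindex, nofollow"}, "")
via_meta = is_noindex({}, '<html><head><meta name="robots" content="noindex"></head></html>')
neither = is_noindex({"Content-Type": "text/html"}, "<html><head><title>Hi</title></head></html>")
```

Either signal is only visible to a crawler that is allowed to fetch the page, which is why a noindex page should not also be disallowed in robots.txt.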

Question 3

How do I block a specific bot in robots.txt?

Accepted Answer

Use a separate User-agent group. For example, to block only AhrefsBot, put User-agent: AhrefsBot on one line, then Disallow: / on the next. To target Google's crawler instead, use User-agent: Googlebot.
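
The AhrefsBot rule described above can be written out and checked with Python's urllib.robotparser, as in this small sketch. The example.com URL is a placeholder, and Python's parser only approximates how real crawlers match user-agent names.

```python
from urllib.robotparser import RobotFileParser

# The two-line group from the answer: block AhrefsBot from the whole site.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/any/page.html"  # placeholder URL
ahrefs_allowed = rules.can_fetch("AhrefsBot", url)
googlebot_allowed = rules.can_fetch("Googlebot", url)

print(ahrefs_allowed, googlebot_allowed)
```

Because the file has no group for Googlebot and no User-agent: * group, crawlers other than AhrefsBot are unaffected by this rule.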

Question 4

Does Google always respect robots.txt?

Accepted Answer

Google respects Disallow rules in robots.txt for crawling. However, Google may still index URLs it hasn't crawled if it finds links to them. Google also ignores the Crawl-delay directive and uses its own crawl rate logic instead.
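
For reference, this is roughly what a Crawl-delay line looks like and how a parser that does support it reads the value. The sketch uses Python's urllib.robotparser. The 10-second value, the /search path, and the bot name ExampleBot are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file asking crawlers to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the value for the matching group, or None if absent.
delay = rules.crawl_delay("ExampleBot")
print(delay)
```

Whether the value has any effect depends entirely on the crawler. As noted above, Google does not act on it.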

Question 5

Where should I place the robots.txt file?

Accepted Answer

robots.txt must be placed at the root of the host it applies to, e.g. https://yourdomain.com/robots.txt. It cannot be placed in a subdirectory, and it applies only to the protocol, host, and port it is served from, so each subdomain needs its own robots.txt.
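
One way to picture the placement rule is to derive the robots.txt location from any page URL, as in this Python sketch. The function name robots_txt_url and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url.

    Keeps the scheme and host (including any port), and replaces the
    path, query, and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Sample URLs invented for this sketch.
main_site = robots_txt_url("https://yourdomain.com/blog/post?id=1")
subdomain = robots_txt_url("https://shop.yourdomain.com/cart")
with_port = robots_txt_url("http://yourdomain.com:8080/a/b")
```

The three results differ, which reflects the rule above: the main site, the shop subdomain, and the port 8080 host are each governed by their own file.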

Robots.txt Tester

Frequently Asked Questions
