What's going on here?
When parsing a robots.txt file, search engines ignore everything that follows a hash (#) on a line, which turns that text into a comment. However, when a crawler doesn't think it's looking at a robots.txt file, it reads everything on the page, meaning we can put an entire website inside a robots.txt file.
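To make the comment rule concrete, here is a minimal Python sketch of what a robots.txt parser is left with once comments are stripped. The function name `visible_to_robots_parser` and the sample file contents are made up for illustration; real parsers do more than this (grouping rules by user agent, for instance), so treat it as an approximation of the comment handling only.

```python
def visible_to_robots_parser(robots_txt: str) -> list[str]:
    """Return the lines a robots.txt parser acts on, with comments stripped."""
    directives = []
    for line in robots_txt.splitlines():
        # Everything from the first '#' to the end of the line is a comment.
        content = line.split("#", 1)[0].strip()
        if content:
            directives.append(content)
    return directives


# Hypothetical file: HTML hidden in comments, plus two real directives.
sample = """\
# <html><body><h1>Hello, humans</h1>
User-agent: *
Disallow: /private/  # <p>This paragraph is hidden from the parser</p>
# </body></html>
"""

print(visible_to_robots_parser(sample))
# → ['User-agent: *', 'Disallow: /private/']
```

The HTML never reaches the parser, while anything that renders the file as a page sees all of it.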
But how is it in a text file?
We're using the .txt extension as an alias for PHP, meaning that we can code intelligently and change the content when different URL parameters are present.
It's worth noting that we are allowing *all* .txt files to execute PHP because we are purists and because we know this environment is secure. If you were implementing this on a production server, you should use mod_rewrite to execute PHP only when the robots.txt file is requested.
And where are the pounds/hashtags (#)?
Do crawlers actually obey the directives though?
Yes, see for yourself: here we have a page which robots are not allowed to crawl, and Google is not able to crawl it.
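You can also check this kind of rule locally. The sketch below uses Python's standard `urllib.robotparser` module to show that a standards-following parser still honours the directives when the file is padded with commented-out markup. The `/no-robots/` path and the `example.com` URLs are placeholders, not this site's real paths, and the result only shows how this one parser behaves. It says nothing certain about Google's own crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one commented-out line of HTML and one rule.
robots_lines = [
    "# <h1>This markup is a comment to the parser</h1>",
    "User-agent: *",
    "Disallow: /no-robots/",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse() accepts an iterable of lines

# The disallowed path is refused; everything else is still allowed.
blocked = parser.can_fetch("Googlebot", "https://example.com/no-robots/page.html")
allowed = parser.can_fetch("Googlebot", "https://example.com/index.html")

print(blocked, allowed)
# → False True
```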