Robots.txt Generator
Create a robots.txt file to control search engine crawling
Robots.txt: Your Site's Traffic Cop
The robots.txt file tells search engine crawlers where they can and cannot go on your site. It's not security; it's a polite request. Well-behaved crawlers follow it; malicious bots ignore it completely. Understanding its syntax and limitations is essential for every website owner.
Basic Syntax and Rules
Robots.txt uses a simple format:
User-agent: [crawler name]
Disallow: [path to block]
Allow: [path to permit]
Wildcards are supported: * matches any sequence of characters, and $ matches the end of the URL. Comments start with #.
A common misconception is that paths are case-insensitive. They are case-sensitive: /Private/ and /private/ are different paths.
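The matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular crawler implements matching; the helper names rule_to_regex and rule_matches are hypothetical.

```python
import re

def rule_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the URL, and everything else is matched literally (case-sensitive).
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(path_pattern: str, url_path: str) -> bool:
    """Return True if the pattern matches starting at the beginning of url_path."""
    return rule_to_regex(path_pattern).match(url_path) is not None

# A rule without '$' is a prefix match.
prefix_hit = rule_matches("/private", "/private/report.html")
# '$' restricts the rule to URLs that end exactly there.
exact_miss = rule_matches("/private$", "/private/report.html")
# '*' spans directories; '$' stops a query string from matching.
pdf_hit = rule_matches("/*.pdf$", "/docs/guide.pdf")
# Paths are case-sensitive.
case_miss = rule_matches("/Private/", "/private/")
```

Under these assumptions, prefix_hit and pdf_hit are True, while exact_miss and case_miss are False.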
The Security Myth
Robots.txt is NOT a security mechanism. Anyone can view your robots.txt file directly by visiting yourdomain.com/robots.txt. It's visible to everyone, including malicious actors.
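If you have truly sensitive pages (admin panels, private user data, internal documents), don't rely on robots.txt. Use authentication or password protection. A noindex meta tag can keep a page out of search results, but it does not restrict access, and crawlers only see it on pages they are allowed to crawl. Treat robots.txt as a crawl efficiency tool, not a security tool.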
Common Mistakes That Break SEO
- Blocking CSS and JavaScript: Googlebot needs to render pages like a browser. Blocking /css/ or /js/ directories causes rendering problems and can tank your rankings.
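- Incorrect wildcard usage: Rules are prefix matches, so Disallow: /private already blocks /private, /private/, /private-page, and everything under /private/. The trailing * in Disallow: /private* is redundant. To block only the exact URL /private, use Disallow: /private$.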
- Forgetting the sitemap: Add your sitemap location: Sitemap: https://yoursite.com/sitemap.xml
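- Case sensitivity: Path values are case-sensitive, so Disallow: /Admin/ does not block /admin/. Crawler names in User-agent lines are matched case-insensitively, but writing them as documented (Googlebot, not GOOGLEBOT) keeps the file readable.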
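Two of the mistakes above (blocked assets and a missing sitemap) can be caught with a short script. The sketch below uses Python's standard urllib.robotparser; the audit function, the sample file, and the example.com URLs are assumptions made for illustration. The standard-library parser does plain prefix matching, so the sample avoids wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

def audit(robots_txt: str, asset_urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return a list of human-readable problems found in robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Flag every CSS/JS asset the given crawler would not be allowed to fetch.
    problems = [
        f"blocked asset: {url}"
        for url in asset_urls
        if not parser.can_fetch(agent, url)
    ]
    # site_maps() returns None when the file has no Sitemap line.
    if not parser.site_maps():
        problems.append("no Sitemap line found")
    return problems

ASSETS = [
    "https://example.com/css/site.css",
    "https://example.com/js/app.js",
]
problems = audit(ROBOTS_TXT, ASSETS)  # empty list for the sample file above
```

Changing Disallow: /admin/ to Disallow: /css/ in the sample would make audit report the stylesheet as blocked.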
Special Directives for Googlebot
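Beyond the basic rules, a few crawler-specific lines come up often:
- Crawl-delay: Requests a number of seconds between requests. Googlebot ignores this directive, although some other crawlers honor it.
- Googlebot-Image: A user-agent token, not a directive. Use it in a User-agent line to target Google's image crawler specifically.
- Googlebot-News: A user-agent token for news-specific crawling.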
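To see how a crawler-specific group differs from the catch-all group, the sketch below parses a small hypothetical file with Python's urllib.robotparser. The file contents, the ExampleBot name, and the example.com URL are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one group for Google's image crawler, one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

photo = "https://example.com/photos/cat.jpg"

# The image crawler follows its own group, which blocks /photos/.
image_bot_allowed = parser.can_fetch("Googlebot-Image", photo)   # False
# Any other crawler falls back to the '*' group, which only blocks /admin/.
other_bot_allowed = parser.can_fetch("ExampleBot", photo)        # True
# Crawl-delay is read from whichever group applies to the crawler.
other_bot_delay = parser.crawl_delay("ExampleBot")               # 10
image_bot_delay = parser.crawl_delay("Googlebot-Image")          # None
```

Note that the parser only reports the Crawl-delay value; whether a crawler honors it is up to the crawler.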
Testing Your Robots.txt
Always test your robots.txt before deploying:
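- Google Search Console: the robots.txt report (which replaced the older robots.txt tester) shows fetch status, errors, and warnings
- URL Inspection: Test specific URLs to see if Googlebot can access them (this tool replaced Fetch as Google)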
- Browser preview: View your robots.txt in a browser to see what's visible
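A local check before deployment can complement the tools above. The sketch below compares a robots.txt draft against a table of URLs and the access you expect for each; check_expectations, the draft file, and the URLs are hypothetical names chosen for this example, and the check relies on Python's urllib.robotparser, which may differ from Googlebot in edge cases such as wildcard rules.

```python
from urllib.robotparser import RobotFileParser

def check_expectations(robots_txt: str, expectations: dict[str, bool],
                       agent: str = "Googlebot") -> list[str]:
    """Return the URLs whose allowed/blocked status differs from what is expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, should_be_allowed in expectations.items()
        if parser.can_fetch(agent, url) != should_be_allowed
    ]

# Hypothetical draft and expectations.
DRAFT = """\
User-agent: *
Disallow: /admin/
"""
EXPECTED = {
    "https://example.com/": True,
    "https://example.com/admin/login": False,
    "https://example.com/css/site.css": True,
}
failures = check_expectations(DRAFT, EXPECTED)  # empty list when the draft behaves as intended
```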
Step-by-Step Guide
- Add user-agent rules: Click "Add Rule" to create directives for specific crawlers (like Googlebot) or use * for all crawlers.
- Set Allow or Disallow: "Allow" tells crawlers they can access a path. "Disallow" tells them to stay away.
- Configure paths: Enter the URL paths. Use / for root, /admin/ for specific folders. Wildcards like * are supported.
- Add your sitemap URL: Include a reference to your XML sitemap to help crawlers find it.
- Generate and download: Copy the generated code and save it as robots.txt in your website's root directory.
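The steps above amount to assembling a few lines of text. The sketch below shows one way that assembly could look in Python; it is not the generator's actual code, and the Group class, build_robots_txt function, and example paths are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    """One User-agent block with its Allow and Disallow paths."""
    user_agent: str
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)

def build_robots_txt(groups: list[Group], sitemaps: tuple[str, ...] = ()) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        lines.append(f"User-agent: {group.user_agent}")
        # Allow lines go first so parsers that stop at the first matching
        # rule still see the exception before the broader Disallow.
        lines.extend(f"Allow: {path}" for path in group.allow)
        lines.extend(f"Disallow: {path}" for path in group.disallow)
        lines.append("")  # blank line between groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"

robots_txt = build_robots_txt(
    [Group("*", allow=["/admin/public/"], disallow=["/admin/"])],
    sitemaps=("https://example.com/sitemap.xml",),
)
```

The resulting string can then be saved as robots.txt in the site's root directory.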
Tips & Best Practices
- Always allow CSS and JS: Googlebot needs to access your stylesheets and JavaScript to properly render pages. Never block these directories.
- Block private/admin areas: Use disallow for /admin/, /wp-admin/, /private/, or any backend areas. But remember: robots.txt is not security!
- Validate your file: Use the robots.txt report in Google Search Console to verify that your file is fetched and parsed correctly.
- Robots.txt is not a security tool: Anyone can view your robots.txt. Use proper authentication for truly sensitive content.
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot access. It must be placed in the root directory of your website.
Does robots.txt keep sensitive pages secure?
No. Robots.txt is a directive, not a security measure. Malicious bots may ignore it. Never use robots.txt to hide sensitive pages; use proper authentication instead.
Where should I put my robots.txt file?
Place it at the root of your domain: https://example.com/robots.txt. It will not work in subdirectories.
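As a small illustration of the root-only rule, the sketch below derives the robots.txt location that applies to any page URL. The robots_txt_url helper and the example URLs are hypothetical; the point is that the path is always /robots.txt on the page's own scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

blog_robots = robots_txt_url("https://example.com/blog/post?id=7")
shop_robots = robots_txt_url("https://shop.example.com/cart")
```

Here blog_robots is https://example.com/robots.txt and shop_robots is https://shop.example.com/robots.txt, so a subdomain needs its own file.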