The robots.txt Mistakes That Can Kill Your SEO

📅 May 17, 2026 ⏱️ 7 min read By Lao Lu · lusdaily.com


A few months ago, I watched a friend's e-commerce site drop from 15,000 daily organic visitors to under 2,000 in the span of two weeks. The cause? A single line in their robots.txt file that was accidentally blocking Google from crawling their product pages.

The worst part is they didn't even notice it themselves. I found it during a routine audit, buried in a file they hadn't touched in months. Someone on their team had added a wildcard rule thinking it only applied to a staging subdomain. It didn't.

That's the thing about robots.txt mistakes — they're silent killers. Your site still loads fine. Users can still browse. But Google can't see half your pages, and you won't find out until your traffic starts tanking.

What robots.txt Actually Does (And Doesn't Do)

Before we get into the mistakes, let's get clear on what robots.txt is and isn't:

What it does: Tells search engine crawlers which URLs they may and may not request from your site. It's a suggestion, not a wall: well-behaved crawlers respect it, but malicious ones ignore it.

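What it doesn't do: It does NOT prevent pages from being indexed. If Google finds a URL through a sitemap, an internal link, or an external backlink, it might index it even if robots.txt blocks crawling. If you want to prevent indexing, use a noindex meta tag (or an X-Robots-Tag response header) instead, and leave the page crawlable. Google has to fetch the page to see the noindex, so a page that is both blocked and noindexed can still end up in the index.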

This distinction is crucial, and it's where most people get confused.
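
To make the distinction concrete, here is a small sketch of the four combinations of "blocked in robots.txt" and "has noindex". The function name and the wording of the outcomes are my own shorthand for illustration, not anything Google publishes.

```python
def indexing_outcome(crawl_blocked: bool, has_noindex: bool) -> str:
    """Describe what a compliant search crawler can do with one URL."""
    if crawl_blocked:
        # The crawler never fetches the page, so it never sees a noindex tag.
        return "not crawled; URL can still be indexed from links"
    if has_noindex:
        return "crawled; kept out of the index"
    return "crawled; eligible for indexing"


for blocked in (False, True):
    for noindex in (False, True):
        print(f"blocked={blocked!s:5} noindex={noindex!s:5} -> "
              f"{indexing_outcome(blocked, noindex)}")
```

Notice that the two "blocked" rows give the same answer. Once crawling is blocked, the noindex tag makes no difference.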

Mistake #1: Blocking CSS and JavaScript Files

This is probably the most common robots.txt mistake I see. People block entire directories that happen to contain CSS and JS files, then wonder why their pages look different in search results.

Google needs to render your pages to understand them. If you block access to your CSS and JavaScript, Google sees a broken version of your page. This can directly impact your rankings because Google can't properly evaluate your content, layout, or user experience.

Here's the classic bad robots.txt:

User-agent: *
Disallow: /assets/

That /assets/ directory probably contains your CSS and JS. Instead, be specific:

User-agent: *
Disallow: /assets/images/private/
Disallow: /assets/docs/internal/
Allow: /assets/css/
Allow: /assets/js/

I learned this the hard way on one of my own sites. Blocked /static/ thinking it only had images I didn't want indexed. It also had all my CSS. Google Search Console started showing rendering errors, and my rich snippets disappeared within a week.
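
A check like the one below might have saved me that week. It is a minimal sketch using Python's standard-library urllib.robotparser; the example.com URLs and the helper name blocked_urls are placeholders I made up. One caveat: the standard-library parser only does simple prefix matching and doesn't understand wildcards, so treat it as a sanity check rather than a simulation of Googlebot.

```python
from urllib.robotparser import RobotFileParser

BAD_RULES = """\
User-agent: *
Disallow: /assets/
"""

FIXED_RULES = """\
User-agent: *
Disallow: /assets/images/private/
Disallow: /assets/docs/internal/
Allow: /assets/css/
Allow: /assets/js/
"""

# Hypothetical render-critical URLs; replace with files your templates load.
RENDER_CRITICAL = [
    "https://example.com/assets/css/site.css",
    "https://example.com/assets/js/app.js",
]


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given rules forbid the agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


print(blocked_urls(BAD_RULES, RENDER_CRITICAL))    # both URLs are listed
print(blocked_urls(FIXED_RULES, RENDER_CRITICAL))  # []
```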

Mistake #2: The Disallow Slash Disaster

This one's embarrassing because it's so simple, but it happens all the time:

User-agent: *
Disallow: /

That single slash blocks your ENTIRE site from being crawled. Every page, every resource, everything. I've seen this happen when someone meant to block just the root admin page but forgot the rest of the path.

What they probably meant was:

User-agent: *
Disallow: /admin/

One character difference. Massive impact. Always double-check your Disallow paths, and never use Disallow: / unless you genuinely want to block everything.
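
If your robots.txt goes through any kind of deploy step, a guard like this can catch the mistake before it ships. It is a rough sketch, not a full parser: the function name is made up, and it only looks for a bare "Disallow: /" (or the equivalent "/*") inside each user-agent group.

```python
def blanket_disallow_agents(robots_txt: str) -> list[str]:
    """Return user agents whose group blocks the whole site."""
    flagged: list[str] = []
    agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a new group starts after the previous group's rules
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value in ("/", "/*"):
                flagged.extend(a for a in agents if a not in flagged)
    return flagged


print(blanket_disallow_agents("User-agent: *\nDisallow: /\n"))        # ['*']
print(blanket_disallow_agents("User-agent: *\nDisallow: /admin/\n"))  # []
```

Failing the build when this list is non-empty is a cheap safeguard. If you ever do want to block everything, you can bypass it deliberately.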

Mistake #3: Conflicting Allow and Disallow Rules

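When both an Allow and a Disallow rule match the same URL, Google and other crawlers that follow the Robots Exclusion Protocol standard (RFC 9309) apply the most specific rule, meaning the one with the longest matching path, and prefer Allow when the two are equally long. Not every crawler implements it that way, though, so conflicting rules can be interpreted differently.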

Consider this:

User-agent: *
Disallow: /blog/
Allow: /blog/guides/

Most crawlers will allow /blog/guides/ because it's more specific than /blog/. But some older crawlers might just see the first matching Disallow rule and stop. The safest approach is to avoid conflicting rules entirely when possible.
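
Here is a simplified sketch of the two strategies side by side, using the rules above. Both functions are my own illustration: they handle plain path prefixes only (no wildcards), and the sample URL is invented.

```python
Rule = tuple[str, str]  # (directive, path prefix), e.g. ("disallow", "/blog/")

RULES: list[Rule] = [
    ("disallow", "/blog/"),
    ("allow", "/blog/guides/"),
]


def first_match_allows(rules: list[Rule], path: str) -> bool:
    """First rule in file order whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched, so crawling is allowed


def longest_match_allows(rules: list[Rule], path: str) -> bool:
    """Longest matching prefix decides; Allow wins a tie on length."""
    matches = [(len(prefix), directive == "allow")
               for directive, prefix in rules if path.startswith(prefix)]
    if not matches:
        return True
    return max(matches)[1]  # tuples compare by length first, then Allow


path = "/blog/guides/robots-txt/"
print(first_match_allows(RULES, path))    # False
print(longest_match_allows(RULES, path))  # True
```

Same file, same URL, opposite answers. If you do need an Allow exception, putting it above the broader Disallow makes both strategies agree.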

Mistake #4: Blocking Pages You Actually Want Indexed

I see this pattern a lot with pagination, filter pages, and search results:

User-agent: *
Disallow: /search/
Disallow: /page/
Disallow: /*?*

Sounds reasonable — you don't want Google wasting crawl budget on search result pages. But what if /search/ is also where your helpful content hub lives? Or what if your filtered category pages (like /shoes?color=red) actually have unique, valuable content?

Before blocking anything, ask yourself: "Is there any scenario where a page matching this pattern could rank for a valuable keyword?" If yes, don't block it with robots.txt. And if the real goal is to keep thin or duplicate variants out of search results, use noindex instead, which lets Google crawl the page but keeps it out of the index.

Mistake #5: Not Having a robots.txt at All

This isn't a "mistake" exactly — your site will work fine without one. But you're missing an opportunity. A robots.txt file lets you:

Point crawlers to your XML sitemap
Prevent crawling of low-value pages (login, cart, checkout)
Save crawl budget on large sites
Block AI scrapers that respect robots.txt (many do now)

For most sites, a minimal robots.txt is better than none:

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /api/

Sitemap: https://yoursite.com/sitemap.xml

Mistake #6: Wildcard Overuse

Pattern characters are powerful but dangerous: * matches any sequence of characters, and $ anchors a rule to the end of the URL. I once saw someone use this:

User-agent: *
Disallow: /*.pdf$

The intention was to prevent PDF files from being crawled. But what about the product manuals and white papers that actually drive traffic? Some PDFs are incredibly valuable for SEO — they earn backlinks and can rank for competitive terms.

Instead of blanket-blocking file types, block specific directories:

User-agent: *
Disallow: /internal-reports/
Disallow: /private-docs/
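
Before shipping a pattern rule, I like to see exactly which URLs it catches. The sketch below translates a robots.txt path into a regular expression using the usual * and $ semantics. The helper names and sample paths are invented for illustration, and it ignores details like percent-encoding, so treat it as an approximation of how a wildcard-aware crawler matches.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path with * and $ into a compiled regex."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # Escape everything literally, then turn each * into "any characters".
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    # re.match anchors at the start, which gives prefix semantics.
    return rule_to_regex(rule_path).match(url_path) is not None


# The blanket PDF rule catches the valuable manual too...
print(rule_matches("/*.pdf$", "/manuals/widget-manual.pdf"))             # True
# ...but misses the same file once a query string is attached.
print(rule_matches("/*.pdf$", "/manuals/widget-manual.pdf?download=1"))  # False
# The /*?* rule from Mistake #4 blocks every URL with a query string.
print(rule_matches("/*?*", "/shoes?color=red"))                          # True
# A directory rule only touches what lives in that directory.
print(rule_matches("/internal-reports/", "/manuals/widget-manual.pdf"))  # False
```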

How to Audit Your robots.txt

Here's my simple audit process that takes about 10 minutes:

Open your robots.txt. Go to yoursite.com/robots.txt. If it returns a 404, create one.
Check every Disallow rule. For each one, ask: "Could this accidentally block a page I want indexed?"
Verify in Google Search Console. Open the robots.txt report (under Settings) to confirm which file Google fetched and whether it parsed without errors. The old robots.txt Tester tool has been retired, so use URL Inspection to check whether a specific URL is blocked.
Check the Page indexing report (formerly Coverage). Look for URLs flagged as "Crawled - currently not indexed," "Blocked by robots.txt," or "Indexed, though blocked by robots.txt."
Test rendering. Use URL Inspection in GSC to see if Google can render your pages correctly.

I do this audit quarterly, and I almost always find something worth fixing. Last time, it was a new blog category that someone had inadvertently blocked while updating the file.

The Right Way to Generate robots.txt

Writing robots.txt by hand is error-prone. I've messed it up myself, and I've been doing this for years. That's exactly why I built a robots.txt generator — it walks you through each directive, shows you exactly what each rule does, and prevents the most common syntax errors.

Whether you use a generator or write it by hand, the key is to test everything before pushing it live. A single bad line in robots.txt can undo months of SEO work.