URL Encoding Bugs: The Silent Killer of Web Forms

📅 May 18, 2026 ⏱️ 8 min read ✍️ By Lu Shen

A friend of mine lost a client because of an ampersand. Their company name was "Tom & Jerry Consulting" and every time someone clicked the email link on their website, the subject line showed up as just "Tom " — everything after the ampersand vanished. The client thought they were being unprofessional. They just had a URL encoding bug.

This stuff happens constantly, and it's almost never obvious. URL encoding issues are the cockroaches of web development — they hide in dark corners, multiply when you're not looking, and by the time you notice them, the damage is already done.

What URL Encoding Actually Does

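Only a limited set of characters can appear in a URL as plain data: letters, digits, and a few symbols such as hyphens, underscores, periods, and tildes. Everything else (spaces, non-English characters, emoji, and structural characters like ampersands and question marks when they are part of your data) has to be percent-encoded: the character is converted to its UTF-8 bytes, and each byte is written as a percent sign followed by two hexadecimal digits.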

A space becomes %20. An ampersand becomes %26. A question mark becomes %3F. The letter "é" becomes %C3%A9. The shrug emoji 🤷 becomes %F0%9F%A4%B7.
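
To make the mechanics concrete, here's a rough sketch of percent-encoding done by hand: take the UTF-8 bytes of the text and write each byte as % plus two hex digits. The helper name percentEncodeEveryByte is made up for this example, and it deliberately encodes every byte (even safe ones) just to show the mechanism. In real code you'd reach for the built-in encodeURIComponent(), which gives the same result for these characters.

```javascript
// Hypothetical helper, for illustration only: percent-encodes EVERY byte,
// including ones that would be safe to leave alone.
function percentEncodeEveryByte(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the input
  return Array.from(bytes, (b) =>
    '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeEveryByte(' '));  // %20
console.log(percentEncodeEveryByte('&'));  // %26
console.log(percentEncodeEveryByte('?'));  // %3F
console.log(percentEncodeEveryByte('é'));  // %C3%A9 (two UTF-8 bytes)
console.log(percentEncodeEveryByte('🤷')); // %F0%9F%A4%B7 (four UTF-8 bytes)

// The built-in function agrees for characters that need encoding.
console.log(encodeURIComponent('é') === percentEncodeEveryByte('é')); // true
```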

This encoding exists because URLs have structural characters that mean specific things. The question mark starts a query string. The ampersand separates parameters. The equals sign assigns values. If your actual data contains these characters unencoded, the URL parser can't tell the difference between structure and content.

That's why "Tom & Jerry" in a query parameter breaks things. The parser sees the ampersand and thinks "new parameter starts here." So ?subject=Tom & Jerry gets parsed as two parameters: subject with the value "Tom " (note the trailing space) and a second parameter named " Jerry" with no value. The subject ends at the ampersand.
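
You can watch the split happen with URLSearchParams, the query-string parser built into browsers and Node.js. This is only a sketch of how one standard parser behaves; the parser in your mail client or backend framework may differ in the details, but the ampersand split is the same idea.

```javascript
// Parse the unencoded query string from the example above.
const broken = new URLSearchParams('subject=Tom & Jerry');

// Two parameters come out, not one.
for (const [name, value] of broken) {
  console.log(JSON.stringify(name), '=>', JSON.stringify(value));
}
// "subject" => "Tom "
// " Jerry" => ""

console.log(JSON.stringify(broken.get('subject'))); // "Tom "
```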

The Five Most Common URL Encoding Bugs

1. The Ampersand Split

This is the one I see most often. Any form data that might contain an ampersand — company names, addresses, product descriptions — will break unencoded query strings. It's especially common in mailto links and share URLs that users generate themselves.

The fix is simple: always encode parameter values. ?subject=Tom%20%26%20Jerry works perfectly. But developers forget because most test data doesn't include ampersands. "John Smith" works fine. "Johnson & Johnson" does not.
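
As a minimal sketch, here's what that looks like for a mailto link. The function name buildMailtoLink and the address hello@example.com are placeholders I made up, and the sketch assumes the address itself is a plain one with nothing in it that needs encoding.

```javascript
// Hypothetical helper: builds a mailto link with a safely encoded subject.
// Assumes `address` is a plain email address that needs no encoding.
function buildMailtoLink(address, subject) {
  return 'mailto:' + address + '?subject=' + encodeURIComponent(subject);
}

console.log(buildMailtoLink('hello@example.com', 'Tom & Jerry Consulting'));
// mailto:hello@example.com?subject=Tom%20%26%20Jerry%20Consulting

console.log(buildMailtoLink('hello@example.com', 'John Smith'));
// mailto:hello@example.com?subject=John%20Smith
```

The point of wrapping this in a function is that nobody has to remember to encode at each call site.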

2. The Space Problem

Spaces in URLs are weird. Technically, they should be encoded as %20. But in practice, many systems also accept + as a space substitute. This creates a mess because the two encodings aren't always interchangeable.

In query strings (after the ?) that are parsed as form data, which is how HTML forms and most web frameworks treat them, + and %20 both decode to a space. But in the path portion of a URL (before the ?), + is a literal plus sign, not a space. Only %20 represents a space in the path.

I've seen APIs that encode spaces as + in path parameters, which then get decoded as literal plus signs by the server. Users see "Mac+Book+Pro" instead of "Mac Book Pro" and assume the site is broken. They're not wrong.
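
Here's a small sketch of the mismatch using built-in JavaScript functions. URLSearchParams follows the form-style rules (space becomes +), while encodeURIComponent and decodeURIComponent follow the generic percent-encoding rules (space becomes %20, + stays a plus sign). The product name is just sample data.

```javascript
const product = 'Mac Book Pro';

// Form-style query serialization writes spaces as "+".
const formEncoded = new URLSearchParams({ q: product }).toString();
console.log(formEncoded); // q=Mac+Book+Pro

// encodeURIComponent writes spaces as %20, which works in paths and queries.
const percentEncoded = encodeURIComponent(product);
console.log(percentEncoded); // Mac%20Book%20Pro

// A path-style decoder leaves "+" alone, so the plus signs reach the user.
const pathDecoded = decodeURIComponent('Mac+Book+Pro');
console.log(pathDecoded); // Mac+Book+Pro

// A form-style parser turns the same text back into spaces.
const formDecoded = new URLSearchParams('q=Mac+Book+Pro').get('q');
console.log(formDecoded); // Mac Book Pro
```

If you're not sure which rules the receiving side uses, %20 is the safer choice because both kinds of decoder read it as a space.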

3. Double Encoding

This happens when something is encoded, then encoded again. %20 becomes %2520 because the % in %20 gets encoded to %25.

Double encoding typically occurs when multiple layers of a system each apply encoding independently. The frontend encodes the data, then the backend encodes it again. Or a proxy server adds encoding on top of what the application already did.

The result: users see %2520 rendered in their browser instead of a space. It looks like garbage, and it breaks links that get copy-pasted or shared.
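
A quick sketch of how this happens in practice. One easy way to trigger it in JavaScript is to encode a value by hand and then pass it to an API that encodes again, such as URLSearchParams. The URL and the sample value here are made up.

```javascript
const original = 'red shoes';

const encodedOnce = encodeURIComponent(original);     // red%20shoes
const encodedTwice = encodeURIComponent(encodedOnce); // red%2520shoes
console.log(encodedOnce, encodedTwice);

// Same bug, less obvious: searchParams.set() already encodes its input,
// so pre-encoding the value encodes it twice.
const searchUrl = new URL('https://example.com/search');
searchUrl.searchParams.set('q', encodedOnce);
console.log(searchUrl.search); // ?q=red%2520shoes

// A receiver that decodes once ends up with the encoded text, not the original.
const received = searchUrl.searchParams.get('q');
console.log(received); // red%20shoes
```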

4. Unicode and International Characters

Non-ASCII characters — accented letters, Chinese characters, Arabic script — require multi-byte UTF-8 encoding, which means each character can turn into multiple percent-encoded sequences. "Café" becomes Caf%C3%A9. "東京" becomes %E6%9D%B1%E4%BA%AC.

Problems arise when different parts of a system assume different character encodings. If one component writes UTF-8 and another reads it as Latin-1, you get mojibake: those garbled strings like "CafÃ©" that make your site look like it's haunted.

This is especially nasty in email links. A mailto URL with a non-ASCII subject should carry it as percent-encoded UTF-8, which is what RFC 6068 recommends. RFC 2047 encoded-words are the format for the headers of the sent message itself, not for the link. Mix the two up and the email client shows garbage in the subject line.
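
Here's a small sketch that reproduces the mismatch on purpose: encode a string as UTF-8, then read each byte back as if it were its own Latin-1 character. It uses only the built-in TextEncoder and TextDecoder.

```javascript
// "Café" as UTF-8 bytes: the é takes two bytes (0xC3 0xA9).
const utf8Bytes = new TextEncoder().encode('Café');
console.log(utf8Bytes.length); // 5

// Wrong: treat every byte as one Latin-1 character.
const misread = String.fromCharCode(...utf8Bytes);
console.log(misread); // CafÃ©

// Right: decode the bytes as UTF-8.
const correct = new TextDecoder('utf-8').decode(utf8Bytes);
console.log(correct); // Café
```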

5. Hash Fragments in Data

The # character has a special meaning in URLs: it marks the beginning of a fragment identifier. Everything after # is never sent to the server — it's handled entirely by the browser.

If a user enters "Product #5" in a form and you put that in a URL without encoding the hash, everything after it disappears from the server's perspective. Your backend receives "Product " and has no idea there was a "#5" attached. Debugging this is a nightmare because it works fine with "Product A" but silently fails with "Product #5".
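
You can see the split with the built-in URL class, which parses the same way a browser does. This is a sketch with a made-up example.com address; the fragment never reaches the server, so the query value is all your backend would get.

```javascript
const userInput = 'Product #5';

// Unencoded: the parser treats "#5" as a fragment, not as data.
const unsafeUrl = new URL('https://example.com/search?q=' + userInput);
console.log(JSON.stringify(unsafeUrl.searchParams.get('q'))); // "Product "
console.log(unsafeUrl.hash); // #5

// Encoded: the hash travels inside the parameter value as %23.
const safeUrl = new URL(
  'https://example.com/search?q=' + encodeURIComponent(userInput)
);
console.log(JSON.stringify(safeUrl.searchParams.get('q'))); // "Product #5"
console.log(JSON.stringify(safeUrl.hash)); // ""
```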

Where These Bugs Hide

URL encoding issues don't show up in normal testing because most test data uses simple ASCII characters. "John Doe" works fine. "José María O'Brien-Smith III" probably doesn't, but nobody tests with that.

The places I've found encoding bugs lurking, like the mailto links and share URLs mentioned earlier, have one thing in common: they are all places where user-generated or dynamic content gets embedded into URLs. Static URLs almost never have encoding problems. It's the dynamic ones that bite you.

The Right Way to Encode

Different parts of a URL need different encoding functions, and using the wrong one is a bug itself:

For query parameter values: Use encodeURIComponent() in JavaScript. This encodes everything that's not safe in a query parameter, including &, =, ?, and #.

For path segments: Use encodeURIComponent() as well, but note that it also encodes /, which you may want to preserve if the path contains multiple segments.

For an entire URL: Use encodeURI(). This percent-encodes spaces and non-ASCII characters but leaves URL structural characters like :, /, ?, &, =, and # alone. Use this when you have a complete, already-structured URL that might contain non-ASCII characters.

Never use encodeURI() for individual parameter values, and never use encodeURIComponent() for complete URLs. That's the #1 mistake I see.
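
Here's a short sketch of both mistakes next to the correct calls. The helper encodePath is a name I made up; it encodes each path segment separately so the slashes between segments survive. The URLs and values are sample data.

```javascript
const company = 'Tom & Jerry';

// Mistake 1: encodeURI on a parameter value leaves "&" untouched.
const wrongQuery = 'https://example.com/search?q=' + encodeURI(company);
console.log(wrongQuery); // https://example.com/search?q=Tom%20&%20Jerry

// Correct: encodeURIComponent escapes the ampersand.
const rightQuery = 'https://example.com/search?q=' + encodeURIComponent(company);
console.log(rightQuery); // https://example.com/search?q=Tom%20%26%20Jerry

// Mistake 2: encodeURIComponent on a whole URL destroys its structure.
const wrongWhole = encodeURIComponent('https://example.com/a b');
console.log(wrongWhole); // https%3A%2F%2Fexample.com%2Fa%20b

// Correct for a whole URL: encodeURI keeps "://" and "/" intact.
const rightWhole = encodeURI('https://example.com/a b');
console.log(rightWhole); // https://example.com/a%20b

// Hypothetical helper for multi-segment paths: encode each segment, then join.
function encodePath(segments) {
  return segments.map((segment) => encodeURIComponent(segment)).join('/');
}
console.log(encodePath(['docs', 'Q&A', 'read me.txt'])); // docs/Q%26A/read%20me.txt
```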

Decoding: The Other Side of the Coin

Decoding issues are just as common as encoding ones. The biggest trap is decoding in the wrong order or the wrong number of times.

If your data was double-encoded, you need to decode it twice. If you only decode once, you'll see percent-encoded gibberish. But if your data was correctly single-encoded and you decode twice, you'll mangle any legitimate percent signs in the original data.

The fix is consistency: have a single point in your request handling pipeline where URL decoding happens, and make sure it happens exactly once. Don't let different middleware, proxies, or framework layers each decode independently.
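
To show why "exactly once" matters, here's a sketch with two inputs that contain a legitimate percent sign. Both strings are made-up sample data. Decoding once gives back the original; decoding a second time either silently changes the text or throws.

```javascript
// Case 1: the user's text contains a literal "%20".
const userText = 'Use %20 for spaces';
const onWire = encodeURIComponent(userText);
console.log(onWire); // Use%20%2520%20for%20spaces

const decodedOnce = decodeURIComponent(onWire);
console.log(decodedOnce === userText); // true

const decodedTwice = decodeURIComponent(decodedOnce);
console.log(JSON.stringify(decodedTwice)); // "Use   for spaces" (the %20 is gone)

// Case 2: a lone percent sign makes the second decode throw.
let secondDecodeError = null;
try {
  decodeURIComponent(decodeURIComponent('100%25')); // first pass yields "100%"
} catch (err) {
  secondDecodeError = err;
}
console.log(secondDecodeError instanceof URIError); // true
```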

Testing for Encoding Bugs

Here's my personal URL encoding test string: A&B C#D?E=F+G. It contains an ampersand, a space, a hash, a question mark, an equals sign, and a plus sign — all characters with special URL meanings. If this string survives a round trip through your system and comes back intact, you're probably fine.

For international characters, test with: José María café 東京 🤷. If that works, your UTF-8 handling is solid.
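
As a starting point, here's a sketch of a round-trip check for those two strings. The roundTrip function is a stand-in I made up: it only pushes the value through the built-in URL class, using a placeholder example.com address. In a real test you'd swap its body for a request through your actual form, server, and whatever sits in between, then compare what comes back.

```javascript
const samples = ['A&B C#D?E=F+G', 'José María café 東京 🤷'];

// Hypothetical stand-in for "your system": encode into a URL, then parse it
// back the way a receiver would.
function roundTrip(value) {
  const outgoing = new URL('https://example.com/form');
  outgoing.searchParams.set('value', value);       // encoding happens here
  const incoming = new URL(outgoing.toString());   // what the receiver parses
  return incoming.searchParams.get('value');       // decoding happens here
}

for (const sample of samples) {
  const status = roundTrip(sample) === sample ? 'ok' : 'FAIL';
  console.log(status, sample);
}
```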

And for the love of everything, test your forms with real-world data, not "Test Test" every time. Try company names with ampersands, addresses with hash symbols, and names with accented characters. That's where the bugs live.

The Bottom Line

URL encoding bugs are silent because they only appear with certain inputs. Your forms work fine for most users, then silently fail for the few who have ampersands in their company name or accented characters in their name. Those users don't report bugs. They just leave.

The fix is straightforward: always encode dynamic values going into URLs, use the right encoding function for the right part of the URL, decode exactly once on the receiving end, and test with characters that have special meanings. Do that and these bugs disappear.

If you need to quickly encode or decode a URL string, I built a URL encoder/decoder that handles all of this — including proper UTF-8 support and detection of double-encoding. It's faster than opening a browser console every time.