URL Encoding Guide: Percent-Encoding, Pitfalls, and Practical Patterns

You may have seen this before: the frontend sends a query string that looks correct, but the backend receives corrupted text. Or a URL breaks as soon as it includes spaces, +, or non-Latin characters. In many cases, this is not a transport problem. It is a URL encoding problem caused by encoding the wrong part at the wrong time.

1. What Is URL Encoding?

A URL can safely carry only a limited character set. When data contains spaces, Unicode text, emoji, or symbols that have structural meaning such as & and =, those characters must be converted using percent-encoding: each byte is written as % followed by two hexadecimal digits.

For example, a space is commonly encoded as %20. If data is not encoded correctly, parsers may treat characters inside a value as structural separators, so a single value is split into several parameters or cut short.
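As a small illustration of that failure mode, the sketch below parses the same value twice, once concatenated raw and once encoded. The search term is a made-up example, and URLSearchParams stands in for whatever parser the receiving side actually uses.

```javascript
// Hypothetical search term that contains structural characters.
const term = 'salt & pepper=spice';

// Raw concatenation: the & and = inside the value are read as separators.
const naive = new URLSearchParams('q=' + term);

// Encoded value: the whole term stays inside the q parameter.
const encoded = new URLSearchParams('q=' + encodeURIComponent(term));

console.log([...naive.keys()].length); // 2, the value was split at &
console.log(encoded.get('q'));         // 'salt & pepper=spice'
```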

2. Which Characters Should Be Encoded?

A practical way to reason about this is by character categories:

  • Unreserved (typically safe as-is): A-Z, a-z, 0-9, -, _, ., ~
  • Reserved (structural meaning): :, /, ?, #, [, ], @, and the sub-delimiters ! $ & ' ( ) * + , ; =
  • Non-ASCII (Unicode text): must be UTF-8 encoded first, then percent-encoded

The goal is not "encode everything". The goal is "encode the right characters in the right URL component".
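To make the categories concrete, here is a minimal sketch of percent-encoding done by hand: unreserved characters pass through, and everything else is converted to UTF-8 bytes and written as %XX. The function name percentEncode is hypothetical, and the sketch is deliberately strict (it also encodes characters such as ! and * that encodeURIComponent leaves alone). In real code the built-in encoders are usually the better choice.

```javascript
// Unreserved characters from the list above; anything else gets encoded.
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;

// Hypothetical helper, for illustration only.
function percentEncode(text) {
  const encoder = new TextEncoder(); // TextEncoder always emits UTF-8 bytes
  let out = '';
  for (const ch of text) {           // iterates by code point, so emoji stay whole
    if (UNRESERVED.test(ch)) {
      out += ch;
      continue;
    }
    for (const byte of encoder.encode(ch)) {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode('café ☕')); // caf%C3%A9%20%E2%98%95
```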

3. Why Is Space Sometimes %20 and Sometimes +?

This is a classic source of confusion. In regular percent-encoding, spaces are represented as %20. In application/x-www-form-urlencoded rules (commonly used by HTML forms), spaces are represented as +.

If one system interprets + as a literal plus sign and another treats it as a space, data mismatches are inevitable. Agree on one encoding contract across services.
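The difference is easy to see with the two built-in tools side by side. In this sketch, encodeURIComponent follows plain percent-encoding while URLSearchParams follows the form-encoding rules; the sample value is made up.

```javascript
const value = 'a b+c';

// Plain percent-encoding: space becomes %20, a literal plus becomes %2B.
const percent = encodeURIComponent(value);
console.log(percent); // a%20b%2Bc

// Form encoding: space becomes +, a literal plus still becomes %2B.
const form = new URLSearchParams({ q: value }).toString();
console.log(form); // q=a+b%2Bc

// The decoders disagree about a bare +, which is where mismatches come from.
console.log(decodeURIComponent('a+b'));                // 'a+b' (plus kept as-is)
console.log(new URLSearchParams('q=a+b').get('q'));    // 'a b' (plus read as space)
```

Encoding a literal plus as %2B, as both encoders do here, keeps the value unambiguous regardless of which decoder the receiver uses.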

4. JavaScript: encodeURI vs encodeURIComponent

Both functions encode, but they serve different purposes:

  • encodeURI: use for an entire URL. It preserves structural characters such as :, /, ?, #, &, and =.
  • encodeURIComponent: use for a single parameter value. It encodes characters such as & and = to avoid query-string corruption.

    const keyword = 'C++ guide & examples';
    const url = '/search?q=' + encodeURIComponent(keyword);
    // /search?q=C%2B%2B%20guide%20%26%20examples

If you encode parameter values with encodeURI, characters such as &, =, +, and # are left untouched, so the parser reads them as query structure and the value is split or truncated.
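The sketch below shows that failure with a made-up value. The host example.test is a placeholder that only gives the URL parser a base to resolve against.

```javascript
const value = 'rock&roll=fun';

// encodeURI leaves & and = alone, so they end up as query structure.
const viaEncodeURI = '/search?q=' + encodeURI(value);
// encodeURIComponent escapes them, so the value stays in one piece.
const viaComponent = '/search?q=' + encodeURIComponent(value);

const base = 'https://example.test'; // placeholder host
const broken = new URL(viaEncodeURI, base).searchParams;
const intact = new URL(viaComponent, base).searchParams;

console.log(viaEncodeURI);      // /search?q=rock&roll=fun
console.log(broken.get('q'));   // 'rock' (a stray "roll" parameter also appears)
console.log(viaComponent);      // /search?q=rock%26roll%3Dfun
console.log(intact.get('q'));   // 'rock&roll=fun'
```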

5. Common Failure: Double Encoding

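Double encoding happens when data is encoded more than once across layers (frontend, SDK, gateway, backend). On the second pass, every % produced by the first pass becomes %25, so a receiver that decodes once still sees percent-escapes instead of the original text.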

  • Original: hello world
  • Encoded once: hello%20world
  • Encoded twice: hello%2520world

Document ownership clearly: who encodes, who decodes, and at what boundary.
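The three states listed above can be reproduced in a few lines. This is only a demonstration of the symptom, not a repair strategy: decoding twice "just in case" would corrupt legitimate values that contain a literal percent sign.

```javascript
const original = 'hello world';

const once = encodeURIComponent(original); // correct: one layer encodes
const twice = encodeURIComponent(once);    // bug: a second layer encodes again

console.log(once);                      // hello%20world
console.log(twice);                     // hello%2520world

// A receiver that decodes once gets the encoded form, not the original text.
console.log(decodeURIComponent(twice)); // hello%20world
```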

6. Treat Path, Query, and Fragment Separately

Different URL components follow different rules. A common mistake is treating the full URL as one homogeneous string.

Component | Recommended Approach                                | Common Mistake
----------|-----------------------------------------------------|--------------------------------------------
Path      | Encode per segment while preserving route structure | Encoding the full path and breaking slashes
Query     | Encode keys and values independently                | Leaving & or = unencoded in values
Fragment  | Handle according to frontend/router behavior        | Mixing fragment and server query rules
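One way to apply these rules is to build each component separately and join them only at the end. The segments, query values, and fragment below are hypothetical, and using URLSearchParams for the query means spaces are written as + (form encoding), which assumes the receiver parses the query the same way.

```javascript
// Hypothetical inputs, for illustration only.
const segments = ['docs', 'a/b report', 'v1.0'];
const query = { q: 'C++ & more', lang: 'en' };
const fragment = 'section 2';

// Path: encode each segment, then join with real slashes.
const path = '/' + segments.map((s) => encodeURIComponent(s)).join('/');

// Query: keys and values are encoded independently by URLSearchParams.
const search = new URLSearchParams(query).toString();

// Fragment: encoded on its own; how it is interpreted is up to the client.
const hash = encodeURIComponent(fragment);

const url = `${path}?${search}#${hash}`;
console.log(url);
// /docs/a%2Fb%20report/v1.0?q=C%2B%2B+%26+more&lang=en#section%202
```

Note that the slash inside the second segment is escaped as %2F, so it cannot be mistaken for a route separator.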

7. Security Note: Encoding Is Not Sanitization

URL encoding protects transport representation, not application security. It does not replace input validation, output escaping, or parameterized database access.

  • Validate decoded data at the receiver.
  • Escape for the target output context (HTML, SQL, shell, etc.).
  • Do not inject decoded content into the DOM as markup (for example via innerHTML); prefer text APIs such as textContent.

Quick Checklist

  • Identify the URL component first.
  • Choose the right encoder for that component second.
  • Verify with an end-to-end round-trip test last.
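A round-trip check can start as small as the sketch below. It only exercises the built-in URL class in one process, with a placeholder host and made-up sample values; a real end-to-end test would send the URL through the actual client, gateway, and server and compare what the handler receives.

```javascript
// Sample values chosen to hit spaces, +, &, =, %, /, and non-ASCII text.
const samples = [
  'hello world',
  'C++ guide & examples',
  'a=b&c=d',
  '100% sure',
  'café ☕',
  'path/with/slash',
];

// Hypothetical helper: encode on the "sender", parse on the "receiver".
function roundTrip(value) {
  const sent = new URL('https://example.test/search'); // placeholder host
  sent.searchParams.set('q', value);
  const received = new URL(sent.toString());
  return received.searchParams.get('q');
}

const failures = samples.filter((s) => roundTrip(s) !== s);
console.log(failures.length === 0 ? 'all round-trips ok' : failures);
```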

Conclusion

Reliable URL encoding is less about memorizing character tables and more about applying consistent rules at clear boundaries. Once your team aligns on component-level encoding and ownership, many "mysterious" API bugs disappear.