Full-Width vs Half-Width Characters: What They Are and When to Convert

Ever submitted a form that looks perfectly valid, only to get a "format error"? Or watched Excel sort a number into the wrong position? Many of these frustrating issues trace back to one subtle culprit: mixing full-width and half-width characters. This article covers what these character types are, why they exist, and when you need to convert between them.

1. Definitions: Full-Width and Half-Width

In East Asian (CJK) text environments, characters fall into two display-width categories:

Type | Width | Typical Unicode Range | Examples
Half-width | 1 character unit | U+0021–U+007E (ASCII printable) | A B C 1 2 ! @ #
Full-width | 2 character units | U+FF01–U+FF5E (Fullwidth ASCII variants) | A B C 1 2 ! @ #

In a monospace font, full-width characters take up exactly twice the horizontal space of half-width ones—designed so that Latin letters align neatly with ideographic characters. Half-width characters match the standard ASCII repertoire used in code, URLs, and modern computer systems.

CJK ideographs and hiragana are inherently full-width and have no half-width equivalents. Katakana is normally full-width too, although a legacy half-width set exists (U+FF65–U+FF9F). In practice, the full-width / half-width distinction matters most for Latin letters, digits, and punctuation, which exist in both forms.

2. Side-by-Side Comparison

Category | Half-Width | Full-Width
Uppercase letters | A B C Z | A B C Z
Lowercase letters | a b c z | a b c z
Digits | 0 1 2 9 | 0 1 2 9
Common punctuation | ! @ # $ % ( ) , | ! @ # $ % ( ) ,
Space | Regular space (U+0020) | Ideographic space (U+3000)

These characters look nearly identical, but their Unicode code points are entirely different. Computers treat them as distinct characters when comparing, searching, or sorting—which is exactly why mixing them causes problems.
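One way to see the difference is to print the code points behind two strings that look alike. The following is a minimal sketch; `describe` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: pair each character with its Unicode code point.
function describe(str) {
    return Array.from(str, ch =>
        `${ch} U+${ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')}`
    );
}

const half = 'A1';            // half-width: U+0041, U+0031
const full = '\uFF21\uFF11';  // full-width: U+FF21, U+FF11

console.log(describe(half));  // code points U+0041 and U+0031
console.log(describe(full));  // code points U+FF21 and U+FF11
console.log(half === full);   // false: the strings only look alike
```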

3. Why Does Mixing Happen?

Full-width characters have a historical reason for existing. Early CJK character sets (such as Big5 and Shift-JIS) included full-width ASCII variants so that Latin text would align visually with ideographic characters. Input method editors (IMEs) for Chinese and Japanese often have a "full-width mode" that inserts full-width characters by default.

Common sources of mixed-width text:

  • IME full-width mode (e.g., typing digits and symbols in a CJK input mode).
  • Copy-and-paste from Word, PDF, or scanned documents.
  • Users unaware of the full-width/half-width switch on mobile keyboards or Japanese IMEs.
  • Legacy data carried over from other systems or encodings.

4. When Do You Need to Convert?

4.1 Form Validation and Database Storage

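A user enters a phone number as 03-1234-5678 (full-width digits). In JavaScript, the regex /^\d{2}-\d{4}-\d{4}$/ rejects it, because \d matches only ASCII 0-9. Some engines treat \d as Unicode-aware (Python 3's re module, for example), which would accept the input and let full-width digits into storage. Either way, the fix is the same: normalize to half-width on the backend before validating.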
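A minimal sketch of normalize-then-validate might look like the following. It reuses the offset-based conversion from section 5; `validatePhone` and `PHONE_PATTERN` are hypothetical names, and the phone format is simply the one from the example above.

```javascript
// Offset-based full-width → half-width conversion (see section 5).
function toHalfWidth(str) {
    return str
        .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
        .replace(/\u3000/g, ' ');
}

const PHONE_PATTERN = /^\d{2}-\d{4}-\d{4}$/;

// Hypothetical validator: normalize first, then test the normalized value.
function validatePhone(input) {
    const normalized = toHalfWidth(input.trim());
    return { valid: PHONE_PATTERN.test(normalized), value: normalized };
}

const raw = '\uFF10\uFF13-1234-5678';  // "03" typed in full-width mode
console.log(PHONE_PATTERN.test(raw));   // false
console.log(validatePhone(raw));        // valid: true, value: '03-1234-5678'
```

Returning the normalized value alongside the result lets the caller store the half-width form rather than the raw input.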

4.2 Search and Matching

A database stores the product name as half-width iPhone. A user searches for full-width iPhone and gets zero results. Unless your search engine normalizes characters, you need to do it at the application layer.
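At the application layer, one option is to derive the same search key from both the stored value and the query. The sketch below assumes a small in-memory list; `searchKey` and the product names are hypothetical.

```javascript
// Offset-based full-width → half-width conversion (see section 5).
function toHalfWidth(str) {
    return str
        .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
        .replace(/\u3000/g, ' ');
}

// Hypothetical search key: half-width and lower-cased on both sides.
const searchKey = s => toHalfWidth(s).toLowerCase();

const products = ['iPhone 15', 'Galaxy S24'];
const query = '\uFF49\uFF30\uFF48\uFF4F\uFF4E\uFF45';  // "iPhone" in full-width

const naive = products.filter(p => p.includes(query));
const normalized = products.filter(p => searchKey(p).includes(searchKey(query)));

console.log(naive);       // empty: no code point matches
console.log(normalized);  // contains 'iPhone 15'
```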

4.3 CSV and Excel Data Processing

Full-width digits in a numeric column that end up stored as text cause Excel's SUM() to skip those cells, and sorting puts full-width digits after 9 instead of in the correct numeric position.
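The same effect can be reproduced outside Excel. The sketch below is a JavaScript analogue, not a model of Excel's behavior: it shows a code-point sort placing a full-width digit last, and numeric parsing rejecting it until the value is normalized. The sample cell values are made up.

```javascript
// Offset-based full-width → half-width conversion (see section 5).
function toHalfWidth(str) {
    return str
        .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
        .replace(/\u3000/g, ' ');
}

const cells = ['9', '\uFF15', '10', '1'];  // '\uFF15' is a full-width 5

// The default sort compares UTF-16 code units, so U+FF15 lands after '9'.
const textSorted = [...cells].sort();
console.log(textSorted);

// Number() accepts only ASCII digits.
console.log(Number('\uFF15'));  // NaN

// Normalizing first restores numeric behavior.
const numbers = cells.map(c => Number(toHalfWidth(c))).sort((a, b) => a - b);
console.log(numbers);  // 1, 5, 9, 10
```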

4.4 Code and Configuration Files

A stray full-width quote (", U+FF02) or full-width colon (:, U+FF1A) in a JSON, YAML, or .env file will typically cause a parse error or, worse, a silently wrong value. These characters are notoriously hard to spot visually.
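As a small illustration, the sketch below feeds `JSON.parse` two nearly identical strings; the `port` key and its value are made up for the example.

```javascript
const good = '{"port": 8080}';
const bad = '{"port"\uFF1A 8080}';  // full-width colon (U+FF1A) after the key

console.log(JSON.parse(good).port);  // 8080

let parseFailed = false;
try {
    JSON.parse(bad);
} catch (err) {
    // The full-width colon is not valid JSON syntax.
    parseFailed = err instanceof SyntaxError;
}
console.log(parseFailed);  // true
```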

4.5 URLs and Email Addresses

Full-width characters cannot appear directly in URLs or email addresses. user@example.com contains a full-width @ (U+FF20) that most mail systems cannot recognize.

4.6 Typesetting and Print

Conversely, traditional CJK typesetting (and some government document standards) require full-width punctuation. In those contexts you may need to convert half-width to full-width rather than the other way around.

5. Conversion Rules and Caveats

The mapping between fullwidth ASCII (U+FF01–U+FF5E) and standard ASCII (U+0021–U+007E) is a simple constant offset of 0xFEE0. For example, full-width A (U+FF21) minus 0xFEE0 gives half-width A (U+0041):

// Full-width → Half-width (JavaScript)
function toHalfWidth(str) {
    return str.replace(/[\uFF01-\uFF5E]/g, ch =>
        String.fromCharCode(ch.charCodeAt(0) - 0xFEE0)
    ).replace(/\u3000/g, ' ');  // Full-width space → regular space
}

// Half-width → Full-width
function toFullWidth(str) {
    return str.replace(/[\u0021-\u007E]/g, ch =>
        String.fromCharCode(ch.charCodeAt(0) + 0xFEE0)
    ).replace(/ /g, '\u3000');
}

Important caveats:

  • Full-width space (U+3000, Ideographic Space) lies outside U+FF01–U+FF5E and must be handled separately.
  • Halfwidth Katakana (U+FF65–U+FF9F) is a different block and requires its own conversion logic.
  • Decide direction before converting: confirm whether the target system expects half-width or full-width, then convert accordingly—don't blindly normalize everything to half-width.
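Unicode NFKC normalization, available in JavaScript as `String.prototype.normalize`, is a possible alternative to the offset arithmetic, but it is broader than a pure width conversion. The sketch below shows three effects worth knowing before relying on it; the sample strings are made up.

```javascript
// Full-width ASCII variants and the ideographic space fold to ASCII.
const wide = '\uFF21\uFF22\uFF23\u3000\uFF11\uFF12\uFF13';  // full-width "ABC 123"
const folded = wide.normalize('NFKC');
console.log(folded);  // 'ABC 123'

// Half-width katakana go the other way: they are widened, and a
// trailing voiced sound mark is composed into a single character.
const halfKana = '\uFF76\uFF9E';  // half-width KA + voiced sound mark
const kana = halfKana.normalize('NFKC');
console.log(kana === '\u30AC');  // true (full-width GA)

// Unrelated compatibility characters are folded too.
const circled = '\u2460'.normalize('NFKC');  // circled digit one
console.log(circled);  // '1'
```

Because NFKC also rewrites characters that have nothing to do with width, the regex approach may be the safer choice when only ASCII variants should change.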

6. Language and Platform Support

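Language / Platform | Full-Width → Half-Width | Notes
PHP | mb_convert_kana($str, 'a') | Requires the mbstring extension; 'A' for the reverse direction; add 's' to also convert the full-width space
Python | unicodedata.normalize('NFKC', s), or str.translate with a mapping table | NFKC also folds other compatibility characters; the jaconv package offers finer-grained control
JavaScript | Regex (see above) or str.normalize('NFKC') | No dedicated width-conversion function; NFKC is broader than a pure width conversion
Java | java.text.Normalizer with NFKC, a custom implementation, or ICU4J | No dedicated width-conversion method in the standard library
Excel | ASC() / JIS() | ASC() converts full→half; JIS() converts half→full (requires Japanese locale)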

7. Practical Normalization Strategy

Rather than converting at every point of use, normalize once at the system boundary where data enters:

  1. Normalize immediately on form submission: For format-sensitive fields (phone numbers, postal codes, ID numbers), convert to half-width before running any validation.
  2. Normalize search queries too: Apply the same character standard to both the query and the indexed data to maximize search recall.
  3. Pre-process CSV imports: Run a normalization script before loading data into the database.
  4. Add a full-width lint check to CI/CD: Use a pre-commit hook or linter to catch accidental full-width characters in code and config files before they reach production.

Quick Check
Unsure whether a string contains full-width characters? Paste it into a text converter, apply "to half-width", and compare the output with the input. Any difference means full-width characters were present, which makes this a fast and reliable detection method.
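The same detection idea can be scripted for the lint step in item 4. The following is a rough sketch rather than a finished linter: `findFullWidth` is a hypothetical helper, and the sample `.env` content is made up.

```javascript
// Full-width ASCII variants plus the ideographic space.
const FULL_WIDTH = /[\uFF01-\uFF5E\u3000]/g;

// Hypothetical lint helper: report each full-width character with its position.
function findFullWidth(text) {
    const findings = [];
    text.split('\n').forEach((line, i) => {
        for (const m of line.matchAll(FULL_WIDTH)) {
            findings.push({
                line: i + 1,          // 1-based line number
                column: m.index + 1,  // 1-based column
                char: m[0],
                codePoint: 'U+' + m[0].codePointAt(0).toString(16).toUpperCase(),
            });
        }
    });
    return findings;
}

const config = 'HOST=localhost\nPORT\uFF1D8080\n';  // full-width "=" on line 2
const findings = findFullWidth(config);
console.log(findings);  // one finding: line 2, column 5, U+FF1D
```

In a pre-commit hook, a non-empty result would translate into a non-zero exit code. Files that legitimately contain CJK prose with full-width punctuation would need to be excluded from the check.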

8. Summary

  • Program logic, URLs, APIs, and database index columns → prefer half-width.
  • Traditional CJK typesetting, government documents, and print → use full-width punctuation per the applicable style guide.
  • Normalize at system boundaries once, rather than patching issues in multiple places.

When you need to convert text in bulk, the Text Converter tool on this site handles full-width ↔ half-width conversion quickly, without writing a single line of code.