Full-Width vs Half-Width Characters: What They Are and When to Convert

Ever submitted a form that looks perfectly valid, only to get a "format error"? Or watched Excel sort what looks like an ordinary number into the wrong position? Many of these frustrating issues trace back to one subtle culprit: mixing full-width and half-width characters. This article covers what these character types are, why they exist, and when you need to convert between them.

1. Definitions: Full-Width and Half-Width

In East Asian character sets (especially CJK environments), characters are divided into two display-width categories:

Type	Width	Typical Unicode Range	Examples
Half-width	1 character unit	U+0021–U+007E (ASCII printable)	`A B C 1 2 ! @ #`
Full-width	2 character units	U+FF01–U+FF5E (Fullwidth ASCII variants)	`ＡＢＣ１２！＠＃`

In a monospace font, full-width characters take up exactly twice the horizontal space of half-width ones—designed so that Latin letters align neatly with ideographic characters. Half-width characters match the standard ASCII repertoire used in code, URLs, and modern computer systems.

CJK ideographs and hiragana are inherently full-width and have no half-width equivalents. Katakana is normally full-width too, although Unicode keeps a legacy set of halfwidth katakana (U+FF65–U+FF9F). The conversion discussed in this article mainly concerns Latin letters, digits, and punctuation, which exist in both forms.

2. Side-by-Side Comparison

Category	Half-Width	Full-Width
Uppercase letters	A B C Z	ＡＢＣＺ
Lowercase letters	a b c z	ａｂｃｚ
Digits	0 1 2 9	０１２９
Common punctuation	! @ # $ % ( ) ,	！＠＃＄％（），
Space	Regular space (U+0020)	Ideographic space (U+3000)

These characters look nearly identical, but their Unicode code points are entirely different. Computers treat them as distinct characters when comparing, searching, or sorting—which is exactly why mixing them causes problems.
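One way to make the difference visible is to print the code points. The sketch below uses only standard JavaScript string methods; the helper name `codePoints` is made up for illustration.

```javascript
// Hypothetical helper: list each character's code point in U+XXXX form.
function codePoints(str) {
    return Array.from(str, ch =>
        'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
    );
}

console.log(codePoints('A1'));   // [ 'U+0041', 'U+0031' ]
console.log(codePoints('Ａ１')); // [ 'U+FF21', 'U+FF11' ]
console.log('A' === 'Ａ');       // false: different code points
```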

3. Why Does Mixing Happen?

Full-width characters have a historical reason for existing. Early CJK character sets (such as Big5 and Shift-JIS) included full-width ASCII variants so that Latin text would align visually with ideographic characters. Input method editors (IMEs) for Chinese and Japanese often have a "full-width mode" that inserts full-width characters by default.

Common sources of mixed-width text:

IME full-width mode (e.g., typing digits and symbols in a CJK input mode).
Copy-and-paste from Word, PDF, or scanned documents.
Users unaware of the full-width/half-width switch on mobile keyboards or Japanese IMEs.
Legacy data carried over from other systems or encodings.

4. When Do You Need to Convert?

4.1 Form Validation and Database Storage

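A user enters a phone number as ０３－１２３４－５６７８ (full-width digits and hyphens). In JavaScript, the regex /^\d{2}-\d{4}-\d{4}$/ rejects it: \d matches only ASCII 0-9, and the full-width hyphen (U+FF0D) is not the ASCII hyphen. Some engines, such as Python 3's re module on str patterns, do let \d match full-width digits, but the value that gets stored would then still contain full-width characters. The fix: normalize to half-width on the backend before validating.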
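A minimal sketch of "normalize, then validate" might look like the following. The names `toHalfWidth`, `PHONE_PATTERN`, and `isValidPhone` are illustrative, and the pattern is the one from the example above rather than a complete phone-number rule.

```javascript
// Hypothetical helper: shift fullwidth ASCII variants (U+FF01–U+FF5E)
// down to ASCII. All other characters are left untouched.
function toHalfWidth(str) {
    return str.replace(/[\uFF01-\uFF5E]/g, ch =>
        String.fromCharCode(ch.charCodeAt(0) - 0xFEE0)
    );
}

const PHONE_PATTERN = /^\d{2}-\d{4}-\d{4}$/;

// Normalize first, then run the unchanged pattern.
function isValidPhone(input) {
    return PHONE_PATTERN.test(toHalfWidth(input.trim()));
}

console.log(PHONE_PATTERN.test('０３－１２３４－５６７８')); // false
console.log(isValidPhone('０３－１２３４－５６７８'));        // true
console.log(isValidPhone('03-1234-5678'));                  // true
```

In practice the normalized string, not the raw input, is what should be stored, so later lookups compare like with like.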

4.2 Search and Matching

A database stores the product name as half-width iPhone. A user searches for full-width ｉＰｈｏｎｅ and gets zero results. Unless your search engine normalizes characters, you need to do it at the application layer.
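At the application layer, one approach is to derive the same comparison key from both the query and the stored value. The sketch below assumes a small in-memory list; `searchKey`, `search`, and the product names are made up for illustration.

```javascript
// Hypothetical search key: fold fullwidth ASCII variants and the
// ideographic space to their ASCII counterparts.
function searchKey(str) {
    return str
        .replace(/[\uFF01-\uFF5E]/g, ch =>
            String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
        .replace(/\u3000/g, ' ');
}

const products = ['iPhone 15', 'iPad Air', 'MacBook Pro']; // sample data

// Apply the same key function to the query and to each stored name.
function search(query) {
    const key = searchKey(query);
    return products.filter(name => searchKey(name).includes(key));
}

console.log(search('ｉＰｈｏｎｅ')); // [ 'iPhone 15' ]
```

With a real database, the key would more likely be computed once at write time and stored in an indexed column.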

4.3 CSV and Excel Data Processing

Full-width digits in a numeric column are typically stored as text, so Excel's SUM() skips those cells, and a text sort places full-width １ after 9 instead of in the correct numeric position.
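The same class of problem shows up when a script processes CSV data. The sketch below is a JavaScript analogue, not Excel's own behavior: `Number()` returns NaN for full-width digits, so one bad cell poisons a sum. The column values and the helper name `toHalfWidthDigits` are illustrative.

```javascript
// Sample CSV cells; the middle value uses fullwidth digits.
const column = ['100', '２５０', '30'];

// Hypothetical helper: convert only fullwidth digits (U+FF10–U+FF19).
function toHalfWidthDigits(str) {
    return str.replace(/[\uFF10-\uFF19]/g, ch =>
        String.fromCharCode(ch.charCodeAt(0) - 0xFEE0));
}

const raw = column.map(c => Number(c));
const cleaned = column.map(c => Number(toHalfWidthDigits(c)));

console.log(raw);                                // [ 100, NaN, 30 ]
console.log(cleaned.reduce((a, b) => a + b, 0)); // 380
```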

4.4 Code and Configuration Files

A stray full-width quote ＂ or full-width colon ： in a JSON, YAML, or .env file will cause a parse error. These characters are notoriously hard to spot visually.
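The failure is easy to reproduce with `JSON.parse`. In this sketch the full-width colon is written as the escape `\uFF1A` so it is visible in the source; `tryParse` is a made-up wrapper.

```javascript
const good = '{"port": 8080}';
const bad  = '{"port"\uFF1A 8080}'; // fullwidth colon instead of ':'

// Hypothetical wrapper: report success or the error name instead of throwing.
function tryParse(text) {
    try {
        return { ok: true, value: JSON.parse(text) };
    } catch (err) {
        return { ok: false, error: err.name };
    }
}

console.log(tryParse(good)); // { ok: true, value: { port: 8080 } }
console.log(tryParse(bad));  // { ok: false, error: 'SyntaxError' }
```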

4.5 URLs and Email Addresses

Full-width characters are not part of the syntax of URLs or email addresses. user＠example.com contains a full-width ＠ (U+FF20) rather than the ASCII @ (U+0040), so most mail systems will not parse it as an address.

4.6 Typesetting and Print

Conversely, traditional CJK typesetting and some government document standards require full-width punctuation. In those contexts you may need to convert half-width to full-width rather than the other way around.

5. Conversion Rules and Caveats

The mapping between fullwidth ASCII (U+FF01–U+FF5E) and standard ASCII (U+0021–U+007E) is a constant offset of 0xFEE0 (for example, U+FF21 Ａ minus 0xFEE0 is U+0041 A):

// Full-width → Half-width (JavaScript)
function toHalfWidth(str) {
    return str.replace(/[\uFF01-\uFF5E]/g, ch =>
        String.fromCharCode(ch.charCodeAt(0) - 0xFEE0)
    ).replace(/\u3000/g, ' ');  // Full-width space → regular space
}

// Half-width → Full-width
function toFullWidth(str) {
    return str.replace(/[\u0021-\u007E]/g, ch =>
        String.fromCharCode(ch.charCodeAt(0) + 0xFEE0)
    ).replace(/ /g, '\u3000');
}

Important caveats:

Full-width space (U+3000, Ideographic Space) lies outside U+FF01–U+FF5E and must be handled separately.
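Halfwidth Katakana (U+FF65–U+FF9F) sits in a different range of the same Unicode block and cannot be converted with the 0xFEE0 offset. Voiced sounds are written as two code points (ｶ + ﾞ) that map to a single full-width character (ガ), so it requires its own conversion logic.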
Decide direction before converting: confirm whether the target system expects half-width or full-width, then convert accordingly—don't blindly normalize everything to half-width.
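For the halfwidth katakana caveat, one possible approach (not the only one) is to apply Unicode NFKC normalization only to runs of characters from U+FF61–U+FF9F, so that base letters and their voiced marks are composed together while the rest of the string is untouched. `widenHalfwidthKatakana` is a hypothetical name; `String.prototype.normalize` is standard JavaScript.

```javascript
// Hypothetical helper: widen halfwidth katakana and halfwidth CJK
// punctuation (U+FF61–U+FF9F) by normalizing each contiguous run.
// Matching whole runs keeps a letter and its voiced mark together.
function widenHalfwidthKatakana(str) {
    return str.replace(/[\uFF61-\uFF9F]+/g, run => run.normalize('NFKC'));
}

console.log(widenHalfwidthKatakana('ｶﾞｲﾄﾞ ABC')); // 'ガイド ABC'
```

Five halfwidth code points become three full-width characters here, which is why a per-character offset cannot express this mapping.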

6. Language and Platform Support

Language / Platform	Full-Width → Half-Width	Notes
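PHP	`mb_convert_kana($str, 'a')`	Requires mbstring extension; `'A'` for the reverse direction; add `'s'` (as in `'as'`) to also convert the full-width space
Python	`unicodedata.normalize('NFKC', s)` or str.translate	NFKC also rewrites other compatibility characters; use str.translate for a strict mapping, or the third-party `jaconv` package for convenience
JavaScript	Regex (see above) or `str.normalize('NFKC')`	No dedicated width function; NFKC also rewrites other compatibility characters
Java	`java.text.Normalizer` (NFKC), custom implementation, or ICU4J	No dedicated width function in the standard library; NFKC also rewrites other compatibility characters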
Excel	ASC() / JIS()	ASC() converts full→half; JIS() converts half→full (requires Japanese locale)
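Unicode NFKC normalization, available in several of these languages, folds full-width forms as a side effect, but it changes more than width. The sketch below contrasts it with a strict offset mapping in JavaScript; `nfkc` and `strictHalfWidth` are illustrative names and the input string is made up.

```javascript
// NFKC via the standard String.prototype.normalize.
const nfkc = s => s.normalize('NFKC');

// Strict mapping: only fullwidth ASCII variants and the ideographic space.
function strictHalfWidth(s) {
    return s
        .replace(/[\uFF01-\uFF5E]/g, ch =>
            String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
        .replace(/\u3000/g, ' ');
}

const input = 'Ｎｏ．①　ｶﾀｶﾅ';
console.log(nfkc(input));            // 'No.1 カタカナ'
console.log(strictHalfWidth(input)); // 'No.① ｶﾀｶﾅ'
```

NFKC turned the circled digit into a plain 1 and widened the halfwidth katakana, which may or may not be wanted; the strict mapping leaves both alone.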

7. Practical Normalization Strategy

Rather than converting at every point of use, normalize once at the system boundary where data enters:

Normalize immediately on form submission: For format-sensitive fields (phone numbers, postal codes, ID numbers), convert to half-width before running any validation.
Normalize search queries too: Apply the same character standard to both the query and the indexed data to maximize search recall.
Pre-process CSV imports: Run a normalization script before loading data into the database.
Add a full-width lint check to CI/CD: Use a pre-commit hook or linter to catch accidental full-width characters in code and config files before they reach production.
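The lint step can be sketched as a small scanner that reports where full-width characters occur. This is a minimal illustration, assuming the text has already been read into a string; `findFullWidth` and the sample content are hypothetical, and a real hook would also need to read files and exit non-zero on findings.

```javascript
// Fullwidth ASCII variants plus the ideographic space.
const FULLWIDTH = /[\uFF01-\uFF5E\u3000]/g;

// Hypothetical lint helper: return one finding per fullwidth character,
// with 1-based line and column numbers.
function findFullWidth(text) {
    const findings = [];
    text.split('\n').forEach((line, i) => {
        for (const m of line.matchAll(FULLWIDTH)) {
            findings.push({
                line: i + 1,
                column: m.index + 1,
                char: m[0],
                codePoint: 'U+' + m[0].codePointAt(0).toString(16).toUpperCase(),
            });
        }
    });
    return findings;
}

const sample = 'PORT=8080\nHOST\uFF1Alocalhost\n'; // fullwidth colon on line 2
console.log(findFullWidth(sample));
// [ { line: 2, column: 5, char: '：', codePoint: 'U+FF1A' } ]
```

Projects that legitimately contain full-width text in strings or comments would likely need an allowlist or a narrower set of checked files.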

Quick Check

Unsure whether a string contains full-width characters? Paste it into a text converter, apply "to half-width", and compare the output with the input. Any difference means full-width characters were present—a fast and reliable detection method.

8. Summary

Program logic, URLs, APIs, and database index columns → prefer half-width.
Traditional CJK typesetting, government documents, and print → use full-width punctuation per the applicable style guide.
Normalize at system boundaries once, rather than patching issues in multiple places.

When you need to convert text in bulk, the Text Converter tool on this site handles full-width ↔ half-width conversion quickly, without writing a single line of code.