Ever submitted a form that looks perfectly valid, only to get a "format error"? Or watched Excel sort an ordinary-looking number into the wrong position? Many of these frustrating issues trace back to one subtle culprit: mixing full-width and half-width characters. This article covers what these character types are, why they exist, and when you need to convert between them.
1. Definitions: Full-Width and Half-Width
In East Asian character sets (especially CJK environments), characters are divided into two display-width categories:
| Type | Width | Typical Unicode Range | Examples |
|---|---|---|---|
| Half-width | 1 character unit | U+0021–U+007E (ASCII printable) | A B C 1 2 ! @ # |
| Full-width | 2 character units | U+FF01–U+FF5E (Fullwidth ASCII variants) | Ａ Ｂ Ｃ １ ２ ！ ＠ ＃ |
In a monospace font, full-width characters take up exactly twice the horizontal space of half-width ones—designed so that Latin letters align neatly with ideographic characters. Half-width characters match the standard ASCII repertoire used in code, URLs, and modern computer systems.
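CJK ideographs and hiragana are inherently full-width and have no half-width equivalents. Katakana is normally full-width as well, although a legacy set of half-width katakana exists (U+FF65–U+FF9F). For everyday purposes, the full-width / half-width distinction mostly concerns Latin letters, digits, and punctuation, which exist in both forms.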
2. Side-by-Side Comparison
| Category | Half-Width | Full-Width |
|---|---|---|
| Uppercase letters | A B C Z | Ａ Ｂ Ｃ Ｚ |
| Lowercase letters | a b c z | ａ ｂ ｃ ｚ |
| Digits | 0 1 2 9 | ０ １ ２ ９ |
| Common punctuation | ! @ # $ % ( ) , | ！ ＠ ＃ ＄ ％ （ ） ， |
| Space | Regular space (U+0020) | Ideographic space (U+3000) |
These characters look nearly identical, but their Unicode code points are entirely different. Computers treat them as distinct characters when comparing, searching, or sorting—which is exactly why mixing them causes problems.
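To make the difference concrete, the short sketch below prints the code points of a half-width and a full-width "A" and shows how comparison and sorting treat them. The helper name `codePointHex` is made up for this illustration.

```javascript
// Compare a half-width and a full-width "A" by code point.
const half = "A";       // U+0041
const full = "\uFF21";  // full-width A, written as an escape so it is unambiguous

// Hypothetical helper: format the first code point of a string as U+XXXX.
function codePointHex(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

console.log(codePointHex(half));   // U+0041
console.log(codePointHex(full));   // U+FF21
console.log(half === full);        // false
console.log("ABC".includes(full)); // false

// Default string sorting compares code units, so the full-width form lands last.
const sorted = [full, "B", half].sort();
console.log(sorted.map(codePointHex)); // U+0041, U+0042, U+FF21
```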
3. Why Does Mixing Happen?
Full-width characters have a historical reason for existing. Early CJK character sets (such as Big5 and Shift-JIS) included full-width ASCII variants so that Latin text would align visually with ideographic characters. Input method editors (IMEs) for Chinese and Japanese often have a "full-width mode" that inserts full-width characters by default.
Common sources of mixed-width text:
- IME full-width mode (e.g., typing digits and symbols in a CJK input mode).
- Copy-and-paste from Word, PDF, or scanned documents.
- Users unaware of the full-width/half-width switch on mobile keyboards or Japanese IMEs.
- Legacy data carried over from other systems or encodings.
4. When Do You Need to Convert?
4.1 Form Validation and Database Storage
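A user enters a phone number as ０３-１２３４-５６７８ (full-width digits). The regex /^\d{2}-\d{4}-\d{4}$/ rejects it in JavaScript, where \d matches only ASCII 0–9. Some engines behave differently (Python 3's re module, for example, lets \d match any Unicode decimal digit in str patterns), so the same input can pass validation in one layer and fail in another. The fix: normalize to half-width on the backend before validating.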
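A minimal sketch of the normalize-then-validate order for the phone-number case. `validatePhone` and `PHONE_PATTERN` are hypothetical names, and the pattern is the one quoted in this section; a real form would likely need to accept other dash characters as well.

```javascript
// Fold full-width ASCII variants (U+FF01–U+FF5E) and the ideographic space to half-width.
function toHalfWidth(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(/\u3000/g, " ");
}

const PHONE_PATTERN = /^\d{2}-\d{4}-\d{4}$/;

// Hypothetical helper: returns the normalized number, or null if it is invalid.
function validatePhone(input) {
  const normalized = toHalfWidth(input).trim();
  return PHONE_PATTERN.test(normalized) ? normalized : null;
}

const raw = "０３-１２３４-５６７８"; // full-width digits, ASCII hyphens
console.log(PHONE_PATTERN.test(raw)); // false: \d does not match full-width digits in JavaScript
console.log(validatePhone(raw));      // "03-1234-5678"
```

Returning the normalized value, rather than a boolean, lets the caller store the half-width form instead of the raw input.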
4.2 Search and Matching
A database stores the product name as half-width iPhone. A user searches for full-width ｉＰｈｏｎｅ and gets zero results. Unless your search engine normalizes characters, you need to do it at the application layer.
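One way to handle this at the application layer is to derive the same search key from both the stored name and the query. The sketch below assumes an in-memory product list and a hypothetical `searchKey` function; lower-casing is an extra assumption, not something the scenario requires.

```javascript
// Fold full-width ASCII variants and the ideographic space to half-width.
function toHalfWidth(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(/\u3000/g, " ");
}

// Hypothetical search key: half-width and lower-cased.
function searchKey(text) {
  return toHalfWidth(text).toLowerCase();
}

const products = ["iPhone 15", "Galaxy S24"]; // hypothetical data

function search(query) {
  const key = searchKey(query);
  return products.filter(name => searchKey(name).includes(key));
}

console.log(search("ｉＰｈｏｎｅ")); // matches "iPhone 15"
console.log(search("iphone"));       // same result
```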
4.3 CSV and Excel Data Processing
Full-width digits in a numeric column cause Excel's SUM() to skip those cells (they are treated as text), and sorting puts full-width １ after 9 instead of in its correct numeric position.
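The same effect is easy to reproduce outside Excel. The sketch below uses a made-up column of values to show the text-style sort order, and how converting to half-width before parsing restores numeric behavior.

```javascript
// Fold full-width ASCII variants and the ideographic space to half-width.
function toHalfWidth(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(/\u3000/g, " ");
}

// Hypothetical column values as they might arrive from a CSV import.
const column = ["9", "１", "10", "２"];

// Sorted as text, the full-width values land after every half-width value.
const textSorted = [...column].sort();
console.log(textSorted);

// JavaScript's Number() does not parse full-width digits either.
console.log(Number("１")); // NaN

// Normalize first, then parse, sort, and sum as numbers.
const numbers = column.map(v => Number(toHalfWidth(v))).sort((a, b) => a - b);
const total = numbers.reduce((sum, n) => sum + n, 0);
console.log(numbers); // 1, 2, 9, 10
console.log(total);   // 22
```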
4.4 Code and Configuration Files
A stray full-width quote ＂ or full-width colon ： in a JSON, YAML, or .env file will cause a parse error or a silently misread value. These characters are notoriously hard to spot visually.
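As an illustration, the sketch below feeds `JSON.parse` two strings that differ only in the colon, then escapes non-ASCII characters so the offending one becomes visible. `parses` and `escapeNonAscii` are hypothetical helper names.

```javascript
const good = '{"port": 8080}';
const bad = '{"port"\uFF1A 8080}'; // same text, but the colon is U+FF1A

// Hypothetical helper: true if the text is valid JSON.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(parses(good)); // true
console.log(parses(bad));  // false

// Hypothetical helper: rewrite every non-ASCII code unit as a \uXXXX escape.
function escapeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, ch =>
    "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(escapeNonAscii(bad)); // {"port"\uFF1A 8080}
```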
4.5 URLs and Email Addresses
Full-width characters cannot appear directly in URLs or email addresses. user＠example.com contains a full-width ＠ (U+FF20) that most mail systems cannot recognize.
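A rough pre-check for this case is to test whether an address consists only of printable ASCII, and to fold full-width variants before the check. This is a sketch under that assumption, not a full email validator; `isPrintableAscii` is a hypothetical helper.

```javascript
// Fold full-width ASCII variants and the ideographic space to half-width.
function toHalfWidth(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(/\u3000/g, " ");
}

// Hypothetical check: true when every character is printable ASCII (no spaces).
function isPrintableAscii(text) {
  return /^[\x21-\x7E]+$/.test(text);
}

const typed = "user＠example.com"; // the @ here is U+FF20

console.log(isPrintableAscii(typed));              // false
console.log(isPrintableAscii(toHalfWidth(typed))); // true
console.log(toHalfWidth(typed));                   // "user@example.com"
```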
4.6 Typesetting and Print
Conversely, traditional CJK typesetting (along with some government document standards) requires full-width punctuation. In those contexts you may need to convert half-width to full-width rather than the other way around.
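For the typesetting direction, a conversion limited to punctuation is often closer to what is wanted than converting every ASCII character. The sketch below assumes a small, hypothetical punctuation set; the applicable style guide decides the real list, and the period is left out because CJK text usually uses the ideographic full stop (U+3002) rather than the full-width period.

```javascript
// Hypothetical punctuation set to convert; adjust to the style guide in force.
const PUNCTUATION = /[!(),:;?]/g;

// Convert only the listed half-width punctuation to full-width (offset 0xFEE0).
function toFullWidthPunctuation(str) {
  return str.replace(PUNCTUATION, ch =>
    String.fromCharCode(ch.charCodeAt(0) + 0xFEE0)
  );
}

// Letters and digits stay half-width; only the punctuation changes.
console.log(toFullWidthPunctuation("注意:請於3日內回覆(含假日)!"));
```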
5. Conversion Rules and Caveats
The mapping between fullwidth ASCII (U+FF01–U+FF5E) and standard ASCII (U+0021–U+007E) is a simple constant offset:
```javascript
// Full-width → Half-width (JavaScript)
function toHalfWidth(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, ch =>
    String.fromCharCode(ch.charCodeAt(0) - 0xFEE0)
  ).replace(/\u3000/g, ' '); // Full-width space → regular space
}

// Half-width → Full-width
function toFullWidth(str) {
  return str.replace(/[\u0021-\u007E]/g, ch =>
    String.fromCharCode(ch.charCodeAt(0) + 0xFEE0)
  ).replace(/ /g, '\u3000'); // Regular space → full-width space
}
```
Important caveats:
- Full-width space (U+3000, Ideographic Space) lies outside U+FF01–U+FF5E and must be handled separately.
- Halfwidth Katakana (U+FF65–U+FF9F) is a different block and requires its own conversion logic.
- Decide direction before converting: confirm whether the target system expects half-width or full-width, then convert accordingly—don't blindly normalize everything to half-width.
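For the half-width katakana caveat, one possible approach (an assumption here, not the only option) is to apply Unicode NFKC normalization to runs of half-width katakana only, which also merges the separate voiced-sound marks into the preceding kana. `halfKatakanaToFull` is a hypothetical name.

```javascript
// Convert runs of half-width katakana (U+FF65–U+FF9F) to standard katakana.
// NFKC is applied per run so a kana and its following voiced-sound mark compose together.
function halfKatakanaToFull(str) {
  return str.replace(/[\uFF65-\uFF9F]+/g, run => run.normalize("NFKC"));
}

// Half-width "ｶﾞｲﾄﾞ" is five code units; the result is three standard katakana.
const input = "\uFF76\uFF9E\uFF72\uFF84\uFF9E";
const output = halfKatakanaToFull(input);
console.log(output, output.length);

// Characters outside the half-width katakana block are left untouched,
// including full-width Latin letters.
console.log(halfKatakanaToFull("ABC\uFF21") === "ABC\uFF21"); // true
```

Restricting normalization to the katakana block keeps this step from changing anything else in the string.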
6. Language and Platform Support
| Language / Platform | Full-Width → Half-Width | Notes |
|---|---|---|
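| PHP | mb_convert_kana($str, 'a') | Requires mbstring extension; 'A' for the reverse direction; add 's' to also convert the full-width space |
| Python | unicodedata.normalize('NFKC', s) or str.translate | NFKC also applies other compatibility mappings; use str.translate with an offset table for a targeted conversion, or the jaconv package for convenience |
| JavaScript | Regex (see above) or str.normalize('NFKC') | No dedicated function; NFKC also applies other compatibility mappings |
| Java | java.text.Normalizer (NFKC), custom implementation, or ICU4J | No dedicated standard-library function; NFKC also applies other compatibility mappings |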
| Excel | ASC() / JIS() | ASC() converts full→half; JIS() converts half→full (requires Japanese locale) |
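When choosing between a targeted offset conversion and a general Unicode normalization such as NFKC, it helps to see where they differ. The sketch below compares the two on a few sample strings chosen for illustration: NFKC folds full-width forms, but it also rewrites characters such as circled digits and ligatures, which may or may not be wanted.

```javascript
// Targeted conversion: full-width ASCII variants and the ideographic space only.
function toHalfWidth(str) {
  return str
    .replace(/[\uFF01-\uFF5E]/g, ch => String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(/\u3000/g, " ");
}

// Sample inputs: full-width "A1", circled digit one, and the "fi" ligature.
const samples = ["\uFF21\uFF11", "\u2460", "\uFB01"];

for (const s of samples) {
  const nfkc = s.normalize("NFKC");
  const offset = toHalfWidth(s);
  console.log(JSON.stringify(s), "NFKC:", JSON.stringify(nfkc), "offset:", JSON.stringify(offset));
}
// Both methods turn the first sample into "A1".
// Only NFKC changes the other two (to "1" and "fi"); the offset method leaves them as-is.
```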
7. Practical Normalization Strategy
Rather than converting at every point of use, normalize once at the system boundary where data enters:
- Normalize immediately on form submission: For format-sensitive fields (phone numbers, postal codes, ID numbers), convert to half-width before running any validation.
- Normalize search queries too: Apply the same character standard to both the query and the indexed data to maximize search recall.
- Pre-process CSV imports: Run a normalization script before loading data into the database.
- Add a full-width lint check to CI/CD: Use a pre-commit hook or linter to catch accidental full-width characters in code and config files before they reach production.
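The lint check in the last bullet could start from something like the sketch below: a pure function that reports the line and column of each full-width ASCII variant or ideographic space. The function name, message format, and sample file name are all made up for illustration, and wiring it into a pre-commit hook or CI job is left out.

```javascript
// Matches full-width ASCII variants and the ideographic space.
const FULL_WIDTH = /[\uFF01-\uFF5E\u3000]/g;

// Hypothetical lint function: returns one message per offending character,
// formatted as file:line:column (both 1-based).
function lintFullWidth(text, filename) {
  const problems = [];
  text.split(/\r?\n/).forEach((line, i) => {
    for (const m of line.matchAll(FULL_WIDTH)) {
      const cp = m[0].codePointAt(0).toString(16).toUpperCase();
      problems.push(`${filename}:${i + 1}:${m.index + 1} full-width character U+${cp}`);
    }
  });
  return problems;
}

// A YAML-like sample whose second line uses a full-width colon.
const sample = "name: app\nport\uFF1A 8080\n";
const problems = lintFullWidth(sample, "config.yaml");
console.log(problems); // one problem, at config.yaml:2:5 (U+FF1A)
```

Files that legitimately contain CJK text (comments, UI strings) would need an allowlist or a narrower file selection, since full-width punctuation is correct there.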
8. Summary
- Program logic, URLs, APIs, and database index columns → prefer half-width.
- Traditional CJK typesetting, government documents, and print → use full-width punctuation per the applicable style guide.
- Normalize at system boundaries once, rather than patching issues in multiple places.
When you need to convert text in bulk, the Text Converter tool on this site handles full-width ↔ half-width conversion quickly, without writing a single line of code.