Universal Encoder Guide: Binary, Hex, ASCII, Base64 & UTF-8
Technical Mastery Overview
Why Encoding Knowledge Matters
Encoding bugs are some of the most confusing in software development because they're often invisible at the surface level:
- Ã© appearing instead of é — a UTF-8 byte sequence being interpreted as Latin-1
- A JWT that "looks right" but fails verification — because the comparison is hex vs Base64
- A database that stores ? instead of Japanese characters — because the connection encoding isn't UTF-8
- A hash that doesn't match — because one side is computing over raw bytes, the other over a hex string
These bugs disappear when you can see what's actually in the bytes. Our encoder makes that visible.
The Encoding Stack
Every piece of text you work with passes through multiple encoding layers:
Human text: "Hello"
↓ Unicode code points
Code points: U+0048 U+0065 U+006C U+006C U+006F
↓ UTF-8 encoding
Bytes: 48 65 6C 6C 6F
↓ Hexadecimal representation
Hex: 48656c6c6f
↓ Binary representation
Binary: 01001000 01100101 01101100 01101100 01101111
↓ Base64 encoding (for text transport)
Base64: SGVsbG8=
Each layer is a different way of representing the same underlying data.
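The whole stack can be walked in a few lines of Node.js (a sketch; `Buffer` is Node-specific, so browser code would use `TextEncoder` instead):

```javascript
const text = "Hello";

// Unicode code points
const codePoints = [...text].map(
  c => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" ")); // U+0048 U+0065 U+006C U+006C U+006F

// UTF-8 bytes, then hex and Base64 views of the same bytes
const bytes = Buffer.from(text, "utf8");
console.log(bytes.toString("hex"));    // 48656c6c6f
console.log(bytes.toString("base64")); // SGVsbG8=

// Binary: each byte as 8 bits
console.log([...bytes].map(b => b.toString(2).padStart(8, "0")).join(" "));
// 01001000 01100101 01101100 01101100 01101111
```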
ASCII — The Foundation
ASCII (American Standard Code for Information Interchange) assigns 7-bit integers (0–127) to English characters, digits, punctuation, and control codes:
| Decimal | Hex | Binary | Character |
|---|---|---|---|
| 65 | 0x41 | 01000001 | A |
| 97 | 0x61 | 01100001 | a |
| 48 | 0x30 | 00110000 | 0 |
| 32 | 0x20 | 00100000 | (space) |
| 10 | 0x0A | 00001010 | (newline \n) |
| 13 | 0x0D | 00001101 | (carriage return \r) |
The capital/lowercase distinction in ASCII is exactly one bit — bit 5. A is 01000001, a is 01100001. This is why bitwise case conversion is a classic interview example.
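In code, that one-bit difference means case can be flipped with a single XOR. A minimal sketch (valid only for ASCII letters, so real code should guard the input range):

```javascript
// Flip ASCII letter case by toggling bit 5 (0x20).
const toggleCase = ch => String.fromCharCode(ch.charCodeAt(0) ^ 0x20);

console.log(toggleCase("A")); // "a"  (0x41 ^ 0x20 = 0x61)
console.log(toggleCase("a")); // "A"
```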
Control characters (0–31) are invisible but meaningful:
- \t (tab) = 9
- \n (newline) = 10
- \r (carriage return) = 13
- \0 (null) = 0 — terminates strings in C
Unicode and UTF-8
ASCII only covers English. Unicode assigns code points to every character in every writing system — over 149,000 characters as of Unicode 15.0.
UTF-8 is the dominant encoding for storing and transmitting Unicode text. It's variable-width:
| Code point range | Bytes | Example |
|---|---|---|
| U+0000 – U+007F | 1 byte | ASCII characters |
| U+0080 – U+07FF | 2 bytes | Latin extended, Arabic, Hebrew |
| U+0800 – U+FFFF | 3 bytes | CJK (Chinese, Japanese, Korean), most other scripts |
| U+10000 – U+10FFFF | 4 bytes | Emoji, rare scripts |
é = U+00E9 = 0xC3 0xA9 (2 bytes in UTF-8)
字 = U+5B57 = 0xE5 0xAD 0x97 (3 bytes in UTF-8)
😊 = U+1F60A = 0xF0 0x9F 0x98 0x8A (4 bytes in UTF-8)
This is why strlen("é") returns 2 in C (byte count), but len("é") returns 1 in Python 3 (character count) — they're measuring different things.
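The same mismatch shows up in JavaScript, where `.length` counts UTF-16 code units rather than bytes or code points. A Node.js sketch:

```javascript
// Byte count vs. UTF-16 code-unit count vs. code-point count.
const samples = ["é", "字", "😊"];
for (const s of samples) {
  console.log(s, Buffer.byteLength(s, "utf8"), s.length, [...s].length);
}
// é  → 2 bytes, 1 code unit,  1 code point
// 字 → 3 bytes, 1 code unit,  1 code point
// 😊 → 4 bytes, 2 code units, 1 code point (surrogate pair in UTF-16)
```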
Hexadecimal — The Developer's Binary
Binary (base-2) is accurate but verbose. Decimal (base-10) doesn't align neatly with bytes. Hexadecimal (base-16, digits 0–9 and A–F) is the sweet spot:
1 hex digit = 4 bits
2 hex digits = 1 byte (8 bits)
Binary: 01001000 01100101
Hex: 4 8 6 5 → 0x4865
Decimal: 18533
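These base conversions map directly onto `parseInt` and `Number.prototype.toString` with an explicit radix, as in this sketch:

```javascript
// Converting between bases with an explicit radix.
const n = parseInt("4865", 16);               // hex → decimal
console.log(n);                               // 18533
console.log(n.toString(2).padStart(16, "0")); // "0100100001100101"
console.log(n.toString(16));                  // "4865"
```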
Hexadecimal is universal in:
- Memory addresses: 0x7ffee4b1c390
- Color codes: #3b82f6 = RGB(59, 130, 246)
- Hash outputs: 2cf24dba5fb0a30e26e83b2ac5b9e29e...
- Network protocols: packet dumps in Wireshark
- Certificate fingerprints
- Cryptographic keys
Use our Color Converter to translate hex color codes to RGB and HSL, and our Hash Generator to produce hex-encoded hashes.
Common Encoding Problems and Diagnoses
"Mojibake" — corrupted characters
Symptom: "café" displays as "cafÃ©" or "caf?"
Cause: UTF-8 bytes being interpreted as Latin-1 or ASCII
Fix: Ensure consistent UTF-8 throughout the pipeline (database connection, HTTP headers, file reading)
Set charset=utf-8 in HTTP Content-Type headers, configure database connections with SET NAMES utf8mb4, and open files with explicit encoding: open(f, encoding='utf-8').
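The symptom is easy to reproduce in Node.js by encoding as UTF-8 and decoding as Latin-1, as this sketch shows:

```javascript
// Reproducing mojibake: encode as UTF-8, decode as Latin-1.
const utf8Bytes = Buffer.from("café", "utf8");   // 63 61 66 c3 a9
const garbled = utf8Bytes.toString("latin1");
console.log(garbled); // "cafÃ©" (the classic symptom)

// Reversing the mistake recovers the text, provided no bytes were lost.
console.log(Buffer.from(garbled, "latin1").toString("utf8")); // "café"
```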
Hash comparison failures
Symptom: HMAC verification fails even with the correct secret
Cause: One side returns hex, other returns Base64, comparing as strings fails
Fix: Decode both to bytes before comparison, or use same output format
```javascript
// Wrong — comparing hex to Base64 directly
'2cf24dba...' === 'LPJNul+wow4m6Dx...' // false, different formats

// Correct — convert to the same format before comparing
const hexToBase64 = hex => Buffer.from(hex, 'hex').toString('base64');
```
Null bytes in strings
Symptom: String appears correct but comparison fails or truncates unexpectedly
Cause: Embedded null bytes (\x00) that terminate strings in C-based systems
Fix: Use binary-safe comparison functions, avoid null bytes in text data
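JavaScript strings are not null-terminated, which makes Node a convenient place to spot the problem before the data crosses into a C-based layer. A sketch:

```javascript
// The null byte survives in a JavaScript string but would truncate a C string.
const s = "file.php\x00.jpg";
console.log(s.length);                        // 13 (JS keeps every character)
console.log(Buffer.from(s, "utf8").toString("hex"));
// 66696c652e706870002e6a7067 (the 00 byte stands out in the hex dump)
```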
Encoding in Security Contexts
Different encodings can be exploited to bypass filters:
- Double encoding: %253C decodes to %3C decodes to < — can bypass XSS filters that only decode once
- Unicode normalization attacks: different Unicode representations of the same visual character have different byte sequences
- Null byte injection: file.php\x00.jpg may be read as file.php by some systems
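The double-encoding step can be reproduced directly with `decodeURIComponent`, as in this sketch with an illustrative payload:

```javascript
// A filter that decodes once sees "%3Cscript%3E" and lets it through;
// the consumer that decodes a second time receives the real "<script>".
const payload = "%253Cscript%253E";
const once = decodeURIComponent(payload);  // "%3Cscript%3E"
const twice = decodeURIComponent(once);    // "<script>"
console.log(once, twice);
```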
Understanding encoding at the byte level helps identify these attack vectors. Use our encoder to inspect exactly what bytes a string contains — our URL Encoder shows percent-encoding for URL contexts, and our Base64 Decoder handles Base64-encoded payloads.
Practical Encoding Conversions
Text to hex — for embedding binary data in source code:
```javascript
const hex = Buffer.from("secret key", 'utf8').toString('hex');
// → "7365637265..." — safe to store in JSON, YAML, env files
```
Hex to bytes — for hashing and crypto operations:
```javascript
const bytes = Buffer.from("2cf24dba...", 'hex');
// → Uint8Array for SubtleCrypto operations
```
Text to binary — for low-level protocol work and teaching:
"A" → 01000001
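A text-to-binary helper in the same `Buffer`-based style as the examples above (a sketch):

```javascript
// Text → binary, one 8-bit group per UTF-8 byte.
const toBinary = text =>
  [...Buffer.from(text, "utf8")]
    .map(b => b.toString(2).padStart(8, "0"))
    .join(" ");

console.log(toBinary("A"));  // "01000001"
console.log(toBinary("Hi")); // "01001000 01101001"
```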
All these conversions happen locally in our encoder. For more specific encoding needs: Base64 encode/decode in our Base64 Encoder, URL-safe encoding in our URL Encoder, and hash generation in our Hash Generator.
Experience it now.
Use the professional-grade Universal Encoder with zero latency and 100% privacy in your browser.