Universal Encoder Guide: Binary, Hex, ASCII, Base64 & UTF-8

Toolshubkit Editor
Published Jan 2025
Every character on your screen is a number — and that number can be represented in binary, hexadecimal, decimal, Base64, or as Unicode code points. Our Universal Encoder converts any text across all major encodings simultaneously, making it easy to understand how your data is actually stored and transmitted at the byte level.

Technical Mastery Overview

Hex/Binary/ASCII
Multi-format view
One-click Copy
Local Conversion

Why Encoding Knowledge Matters

Encoding bugs are some of the most confusing in software development because they're often invisible at the surface level:

  • Ã© appearing instead of é: a UTF-8 byte sequence being interpreted as Latin-1
  • A JWT that "looks right" but fails verification: a hex string is being compared against a Base64 string
  • A database that stores ? instead of Japanese characters: the connection encoding isn't UTF-8
  • A hash that doesn't match: one side is computing over raw bytes, the other over a hex string

These bugs disappear when you can see what's actually in the bytes. Our encoder makes that visible.

The Encoding Stack

Every piece of text you work with passes through multiple encoding layers:

Human text:   "Hello"
              ↓ Unicode code points
Code points:  U+0048 U+0065 U+006C U+006C U+006F
              ↓ UTF-8 encoding
Bytes:        48 65 6C 6C 6F
              ↓ Hexadecimal representation
Hex:          48656c6c6f
              ↓ Binary representation
Binary:       01001000 01100101 01101100 01101100 01101111
              ↓ Base64 encoding (for text transport)
Base64:       SGVsbG8=

Code points become bytes through UTF-8. Hex, binary, and Base64 are then three different ways of writing those same bytes, not further transformations of the data.
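
The stack above can be reproduced in a few lines of Node.js. This is a minimal sketch, assuming a Node.js runtime (Buffer is a Node global); the variable names are illustrative only.

```javascript
// Walk the string "Hello" through each representation in the stack.
const stackText = "Hello";

// Unicode code points, formatted as U+XXXX.
const stackCodePoints = Array.from(stackText, ch =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

// UTF-8 bytes.
const stackBytes = Buffer.from(stackText, "utf8");

// Three views of the same bytes.
const stackHex = stackBytes.toString("hex");
const stackBinary = Array.from(stackBytes, b =>
  b.toString(2).padStart(8, "0")).join(" ");
const stackBase64 = stackBytes.toString("base64");

console.log(stackCodePoints.join(" ")); // U+0048 U+0065 U+006C U+006C U+006F
console.log(stackHex);                  // 48656c6c6f
console.log(stackBinary);               // 01001000 01100101 01101100 01101100 01101111
console.log(stackBase64);               // SGVsbG8=
```

Note that hex, binary, and Base64 are all derived from the same byte buffer rather than from one another.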

ASCII — The Foundation

ASCII (American Standard Code for Information Interchange) assigns 7-bit integers (0–127) to English characters, digits, punctuation, and control codes:

Decimal  Hex   Binary    Character
65       0x41  01000001  A
97       0x61  01100001  a
48       0x30  00110000  0
32       0x20  00100000  (space)
10       0x0A  00001010  (newline \n)
13       0x0D  00001101  (carriage return \r)

The uppercase/lowercase distinction in ASCII is exactly one bit: bit 5, counting from bit 0 (value 0x20). A is 01000001, a is 01100001. This is why bitwise case conversion is a classic interview example.
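
The one-bit difference can be sketched as follows. The helper names are hypothetical, and the trick is only valid for the ASCII letters A–Z and a–z; applying it to digits or punctuation produces unrelated characters.

```javascript
// Bit 5 (value 0x20) is the only difference between "A" (0x41) and "a" (0x61).
const CASE_BIT = 0x20;

// Hypothetical helpers; only meaningful for ASCII letters.
const toLowerAscii = ch => String.fromCharCode(ch.charCodeAt(0) | CASE_BIT);  // set the bit
const toUpperAscii = ch => String.fromCharCode(ch.charCodeAt(0) & ~CASE_BIT); // clear the bit
const flipCaseAscii = ch => String.fromCharCode(ch.charCodeAt(0) ^ CASE_BIT); // toggle the bit

console.log(toLowerAscii("A"));  // a
console.log(toUpperAscii("a"));  // A
console.log(flipCaseAscii("q")); // Q
```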

Control characters (0–31) are invisible but meaningful:

  • \t (tab) = 9
  • \n (newline) = 10
  • \r (carriage return) = 13
  • \0 (null) = 0 — terminates strings in C
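
Because these characters are invisible, it often helps to escape them before logging. A small sketch, assuming a hypothetical helper name and a \xNN output style chosen for illustration:

```javascript
// Hypothetical helper: replace ASCII control characters (0-31) with a
// visible \xNN escape so they show up in logs and diffs.
function showControlChars(str) {
  return str.replace(/[\x00-\x1f]/g, ch =>
    "\\x" + ch.charCodeAt(0).toString(16).padStart(2, "0"));
}

console.log(showControlChars("line1\r\nline2\tend\0"));
// line1\x0d\x0aline2\x09end\x00
```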

Unicode and UTF-8

ASCII only covers English. Unicode aims to assign a code point to every character in every writing system: over 149,000 characters as of Unicode 15.0.

UTF-8 is the dominant encoding for storing and transmitting Unicode text. It's variable-width:

Code point range     Bytes    Example
U+0000 – U+007F      1 byte   ASCII characters
U+0080 – U+07FF      2 bytes  Latin extended, Arabic, Hebrew
U+0800 – U+FFFF      3 bytes  CJK (Chinese, Japanese, Korean), most other scripts
U+10000 – U+10FFFF   4 bytes  Emoji, rare scripts

é  = U+00E9 = 0xC3 0xA9 (2 bytes in UTF-8)
字  = U+5B57 = 0xE5 0xAD 0x97 (3 bytes in UTF-8)
😊 = U+1F60A = 0xF0 0x9F 0x98 0x8A (4 bytes in UTF-8)
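
To see where those bytes come from, here is a hand-rolled sketch of the UTF-8 bit packing for a single code point. The function names are hypothetical, and the sketch skips validation (it does not reject surrogates or values above U+10FFFF); in real code, Buffer.from or TextEncoder would normally do this for you.

```javascript
// Hypothetical helper: encode one Unicode code point to UTF-8 bytes by hand,
// following the four ranges in the table above.
function encodeUtf8(cp) {
  if (cp <= 0x7f) return [cp];                                   // 0xxxxxxx
  if (cp <= 0x7ff) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Format a byte array as upper-case hex pairs.
const formatBytes = bytes =>
  bytes.map(b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(formatBytes(encodeUtf8(0x00e9)));  // C3 A9
console.log(formatBytes(encodeUtf8(0x5b57)));  // E5 AD 97
console.log(formatBytes(encodeUtf8(0x1f60a))); // F0 9F 98 8A
```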

This is why strlen("é") returns 2 in C (byte count, assuming the string is stored as UTF-8), but len("é") returns 1 in Python 3 (code point count): they're measuring different things.
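
JavaScript adds a third answer, because its .length counts UTF-16 code units. A short sketch, assuming Node.js and a hypothetical measure helper, with the characters written as escapes so the source file's own encoding cannot affect the result:

```javascript
// The same question ("how long is this string?") has several answers.
function measure(str) {
  return {
    utf8Bytes: Buffer.byteLength(str, "utf8"), // what C's strlen sees for UTF-8 data
    utf16Units: str.length,                    // JavaScript's .length
    codePoints: Array.from(str).length,        // what Python 3's len() counts
  };
}

console.log(measure("\u00e9"));    // { utf8Bytes: 2, utf16Units: 1, codePoints: 1 }
console.log(measure("\u{1F60A}")); // { utf8Bytes: 4, utf16Units: 2, codePoints: 1 }
```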

Hexadecimal — The Developer's Binary

Binary (base-2) is accurate but verbose. Decimal (base-10) doesn't align neatly with bytes. Hexadecimal (base-16, digits 0–9 and A–F) is the sweet spot:

1 hex digit  = 4 bits
2 hex digits = 1 byte (8 bits)

Binary:      0100 1000 0110 0101
Hex:         4    8    6    5      → 0x4865
Decimal:     18533
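
The digit-to-nibble mapping is easy to check in code. A minimal sketch using only built-in number formatting; the variable names are illustrative:

```javascript
// Each hex digit maps to exactly one 4-bit nibble.
const sampleValue = 0x4865;

const sampleHexDigits = sampleValue.toString(16);  // "4865"
const sampleNibbles = Array.from(sampleHexDigits, d =>
  parseInt(d, 16).toString(2).padStart(4, "0"));   // one nibble per digit

console.log(sampleNibbles.join(" "));  // 0100 1000 0110 0101
console.log(parseInt("4865", 16));     // 18533
console.log((18533).toString(16));     // 4865
```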

Hexadecimal is universal in:

  • Memory addresses: 0x7ffee4b1c390
  • Color codes: #3b82f6 = RGB(59, 130, 246)
  • Hash outputs: 2cf24dba5fb0a30e26e83b2ac5b9e29e...
  • Network protocols: packet dumps in Wireshark
  • Certificate fingerprints
  • Cryptographic keys
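
The color-code entry above is a good illustration of "two hex digits = one byte": each pair is one channel. A sketch with a hypothetical helper that accepts only the 6-digit form:

```javascript
// Hypothetical helper: split a 6-digit hex color into its three byte values.
function hexColorToRgb(color) {
  const m = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(color);
  if (!m) throw new Error("expected a 6-digit hex color such as #3b82f6");
  // m[1..3] hold the red, green, and blue pairs.
  return m.slice(1).map(pair => parseInt(pair, 16));
}

console.log(hexColorToRgb("#3b82f6")); // [ 59, 130, 246 ]
```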

Use our Color Converter to translate hex color codes to RGB and HSL, and our Hash Generator to produce hex-encoded hashes.

Common Encoding Problems and Diagnoses

"Mojibake" — corrupted characters

Symptom: "caf\u00e9" (café) displays as "cafÃ©" or "caf?"
Cause:   UTF-8 bytes being interpreted as Latin-1 or ASCII
Fix:     Ensure consistent UTF-8 throughout the pipeline (database connection, HTTP headers, file reading)

Set charset=utf-8 in HTTP Content-Type headers, configure MySQL connections with SET NAMES utf8mb4, and open files with an explicit encoding, for example open(f, encoding='utf-8') in Python.
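
The symptom is easy to reproduce, which also shows how it can sometimes be undone. A sketch assuming Node.js; note that the repair only works when no bytes were lost along the way (once a character has been replaced with "?", the original is gone).

```javascript
const original = "caf\u00e9";                     // "café"
const utf8Bytes = Buffer.from(original, "utf8");  // 63 61 66 c3 a9

// Wrong: decode the UTF-8 bytes as Latin-1, one character per byte.
const garbled = utf8Bytes.toString("latin1");
console.log(garbled);                             // cafÃ©

// Repair: turn the garbled text back into bytes using the wrong codec,
// then decode those bytes with the right one.
const repaired = Buffer.from(garbled, "latin1").toString("utf8");
console.log(repaired === original);               // true
```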

Hash comparison failures

Symptom: HMAC verification fails even with the correct secret
Cause:   One side returns hex, other returns Base64, comparing as strings fails
Fix:     Decode both to bytes before comparison, or use the same output format on both sides

// Wrong: comparing a hex digest to a Base64 digest as strings
'2cf24dba...' === 'LPJNul+wow4m6Ds...'  // false, different formats

// Correct: convert to the same format before comparing
const hexToBase64 = hex => Buffer.from(hex, 'hex').toString('base64');
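
The other fix, decoding both sides to raw bytes, might look like the sketch below. The helper name is hypothetical and the sample value is just the 16-byte hash prefix quoted earlier in this guide. For secret-dependent checks such as HMAC verification, a constant-time comparison like crypto.timingSafeEqual from node:crypto (which requires equal-length inputs) is generally preferred over a plain equality check.

```javascript
// Hypothetical helper: compare a hex-encoded digest with a Base64-encoded one
// by decoding both to raw bytes first.
function digestsMatch(hexDigest, base64Digest) {
  const a = Buffer.from(hexDigest, "hex");
  const b = Buffer.from(base64Digest, "base64");
  return a.equals(b); // byte-for-byte comparison, not constant-time
}

// Sample bytes only: the same value in two output formats.
const sampleHex = "2cf24dba5fb0a30e26e83b2ac5b9e29e";
const sampleBase64 = Buffer.from(sampleHex, "hex").toString("base64");

console.log(sampleHex === sampleBase64);            // false
console.log(digestsMatch(sampleHex, sampleBase64)); // true
```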

Null bytes in strings

Symptom: String appears correct but comparison fails or truncates unexpectedly
Cause:   Embedded null bytes (\x00) that terminate strings in C-based systems
Fix:     Use binary-safe comparison functions, avoid null bytes in text data
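
A quick way to confirm this diagnosis is to compare lengths and dump the bytes. A sketch with made-up sample strings:

```javascript
const cleanName = "admin";
const taintedName = "admin\0"; // same visible text, one extra byte

console.log(cleanName === taintedName);                        // false
console.log(cleanName.length, taintedName.length);             // 5 6
console.log(Buffer.from(taintedName, "utf8").toString("hex")); // 61646d696e00
console.log(taintedName.includes("\0"));                       // true
```

The trailing 00 in the hex dump is the byte that a C-style string function would treat as the end of the string.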

Encoding in Security Contexts

Different encodings can be exploited to bypass filters:

  • Double encoding: %253C decodes to %3C decodes to < — can bypass XSS filters that only decode once
  • Unicode normalization attacks: different Unicode representations of the same visual character have different byte sequences
  • Null byte injection: file.php\x00.jpg may be read as file.php by some systems
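
The first two items can be observed directly. This sketch uses only standard JavaScript functions (decodeURIComponent and String.prototype.normalize) plus Node's Buffer; the sample payloads are illustrative.

```javascript
// Double encoding: after one decode pass the payload still looks harmless.
const payload = "%253C";
const decodedOnce = decodeURIComponent(payload);      // "%3C"
const decodedTwice = decodeURIComponent(decodedOnce); // "<"
console.log(decodedOnce, decodedTwice);

// Unicode normalization: two spellings of "é" that render identically.
const composed = "\u00e9";    // single code point
const decomposed = "e\u0301"; // "e" + combining acute accent
console.log(composed === decomposed);                                   // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
console.log(Buffer.from(composed, "utf8").toString("hex"));             // c3a9
console.log(Buffer.from(decomposed, "utf8").toString("hex"));           // 65cc81
```

A filter that inspects input before it has been fully decoded and normalized is checking different bytes from the ones the downstream system will eventually see.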

Understanding encoding at the byte level helps identify these attack vectors. Use our encoder to inspect exactly what bytes a string contains — our URL Encoder shows percent-encoding for URL contexts, and our Base64 Decoder handles Base64-encoded payloads.

Practical Encoding Conversions

Text to hex — for embedding binary data in source code:

const hex = Buffer.from("secret key", 'utf8').toString('hex');
// → "7365637265..." (plain ASCII, so it survives JSON, YAML, and env files; hex is an encoding, not encryption)

Hex to bytes — for hashing and crypto operations:

const bytes = Buffer.from("2cf24dba...", 'hex');
// → Buffer (a Uint8Array subclass), usable with SubtleCrypto operations
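
One caveat when decoding hex in Node.js: Buffer.from(str, 'hex') does not throw on bad input, it stops at the first character that is not valid hex. The sketch below round-trips the text-to-hex example and adds a hypothetical strict decoder (hexToBytesStrict is not a Node API) that validates first.

```javascript
// Hypothetical helper: strict hex decoding. Buffer.from(str, "hex") silently
// stops at the first invalid character, so validate the input first.
function hexToBytesStrict(hex) {
  if (hex.length % 2 !== 0 || !/^[0-9a-f]*$/i.test(hex)) {
    throw new Error("not a valid hex string");
  }
  return Buffer.from(hex, "hex");
}

const keyHex = Buffer.from("secret key", "utf8").toString("hex");
console.log(keyHex);                                    // 736563726574206b6579
console.log(hexToBytesStrict(keyHex).toString("utf8")); // secret key
console.log(Buffer.from("2cf24dba...", "hex").length);  // 4 (input silently truncated)
```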

Text to binary — for low-level protocol work and teaching:

"A" → 01000001

All these conversions happen locally in our encoder. For more specific encoding needs: Base64 encode/decode in our Base64 Encoder, URL-safe encoding in our URL Encoder, and hash generation in our Hash Generator.

Understanding encoding is foundational to debugging character issues, security problems, and protocol implementations. When something looks wrong in your data, the answer is often in the bytes.